DATA WRANGLING - Afternoons
This course, "Data Wrangling," will enable you to:
- design database schemas for efficient data representation
- implement database schemas using MySQL
- navigate data management issues in organizations
- learn how to learn new technologies
- learn the basics of programming in Python
- import and export data to/from CSV and Excel, changing schemas as needed
- conduct basic analyses in Excel
- prepare a project workflow that imports data from different sources and produces reports
There are no prerequisites for this course; it is appropriate even if you've never done any programming or behind-the-scenes work with computers. If you do have significant, recent experience or training with programming and databases, this is not the course for you. It's also not suitable for those who have already taken a Database Management course (because the SQL part is identical). In that case I require you to drop the course, though you are welcome to audit parts that you think will extend your knowledge (and to discuss a possible Independent Study to cover the Python parts of the course). In particular, this course is not suitable as an interdisciplinary course for Computer Science master's students.
This course is taught by James Howison. My office is UTA 5.404. My office hours are 1-2pm on Friday.
The TA for this class is Rachel Simons, an iSchool PhD student.
Our class meets twice a week: Wednesdays and Fridays at 3:00–4:30, in the iSchool computing lab classroom (UTA 1.210A). If you miss a class, it is your responsibility to catch up; I suggest identifying a classmate who will walk you through any material that you missed. It is not acceptable to miss a class and then expect personal tutoring during office hours on what you've missed, unless you've already walked through the material with your classmate.
The majority of this class happens on the class server, so really we'll just be editing text files and uploading them from the local computers. I will be teaching using the Mac computers in the lab. This is primarily to give the class a consistent experience in the choice of text editor and in uploading files to the server. You are welcome to use your own laptop or a Windows computer, but I won't be able to stop to help you with those.
There are no required texts for the course, but you will find these resources to be useful.
An intro book for MySQL that's available online at UT is: Learning MySQL
There are assignments each week for this course, covering the material addressed on Wednesday and Friday. The assignments are due at 11:59 pm on the Monday of the following week. Late assignments will receive zero, but you can drop your 2 lowest grades. Since we have class on Wednesday, the TA and I need Tuesday for grading to give prompt feedback. The assignments, and their grading rubrics, will be released on Canvas during class, so we'll go over each assignment and ensure everyone knows what's required. Each assignment will be turned in online, usually by uploading a PDF or text file, and/or providing a URL to your assignment on the class server.
A major portion of the course is producing a data wrangling workflow that imports data from different sources into a database and then produces reports from that data. The workflow will have to run "live," handling data it has not seen before (although in consistent formats). You will use both your database and Python skills in this project. You will also choose an analysis tool to display your report. You will learn about this tool independently (drawing on the class materials on how to learn technologies) and keep a journal of your learning. You may choose Excel, but you may also choose from other tools we will discuss, including Crystal Reports, Tableau, R, ManyEyes and so on. You will make a screencast to demonstrate your workflow, as well as showing it live in class.
We will discuss example projects in class.
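As a rough illustration of the shape of such a workflow (import from a source, store in a database, produce a report), here is a minimal sketch. It is not a template for the project. To stay self-contained it uses Python's built-in sqlite3 module in place of the MySQL server used in class, and the sample data, the "loans" table and its column names are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical sample input. A real workflow would read files it has not
# seen before, as long as they arrive in this same consistent format.
SAMPLE_CSV = """title,year,checkouts
Dune,1965,12
Emma,1815,7
Dune,1965,3
"""


def load_rows(conn, csv_file):
    """Import CSV rows (columns: title, year, checkouts) into the loans table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS loans "
        "(title TEXT, year INTEGER, checkouts INTEGER)"
    )
    reader = csv.DictReader(csv_file)
    # Convert the numeric columns from text before inserting.
    rows = [(r["title"], int(r["year"]), int(r["checkouts"])) for r in reader]
    conn.executemany("INSERT INTO loans VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def checkout_report(conn):
    """Return (title, total checkouts) pairs, highest total first."""
    cursor = conn.execute(
        "SELECT title, SUM(checkouts) AS total FROM loans "
        "GROUP BY title ORDER BY total DESC, title"
    )
    return cursor.fetchall()


def run_demo():
    """Run the whole sketch against an in-memory database."""
    conn = sqlite3.connect(":memory:")
    load_rows(conn, io.StringIO(SAMPLE_CSV))
    return checkout_report(conn)


if __name__ == "__main__":
    for title, total in run_demo():
        print(f"{title}: {total}")
```

Keeping the import step and the report step in separate functions means a new data source only requires another loader, while the report query stays unchanged.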
iSchool Open Day
Although not required for this class, you may want to present your workflow as a Student Project at the iSchool's Open Day (May 8, 2015, 12:00–3:00). The Open House is an opportunity to present student projects, including projects from this course (but also from other courses or semesters!). Students interested in presenting projects submit brief 1-2 paragraph proposals by April 10. Proposals will be submitted online; look out for the upload link, which will be provided later via email lists.
Proposals should cover the following elements: 1) A clearly stated objective and an overall description of the work to be performed; 2) The deliverables, outcomes or the expected culminating products and the methods you will employ to achieve these outcomes; 3) Explanation of how the project fits into your education (learning objectives) and professional goals.
Proposals are subject to review by the Open House Committee to ensure a professional-quality presentation, and there’s an opportunity to withdraw a proposal by May 1 if you don’t think it will be ready in time.
Student projects may be displayed at the Open House in a wide variety of formats (iPads, laptops, screens, physical objects, art – posters are NOT required). Students should plan to have enough power to run their technology for the three hours of the exhibition. A limited number of outlets will be available, and students should request all technical needs when they submit project proposals (by April 10). Talk with the IT Lab early on about borrowing equipment, and reserve any equipment you need in advance.
Good follow up classes for this course would be:
385M Database Management (you'll redo some SQL, but learn PHP and how to build database-backed websites. You won't reuse the Python syntax, but you will reuse your understanding of loops, lists and dicts.) There are three sections of this for Fall 2015 (Gwidzka, Gunn, and Wallace). It's Wallace's first section of this, so you should check with him whether he's going to do something differently.
385T Special Topics in Information Science: Metadata Generation and Interfaces for Massive Datasets with Unmil Karadkar. That class will really use your growing scripting skills.
385T Special Topics in Information Science: Information Modelling with Karen Wickett. If you enjoyed the modeling we did for SQL (ERDs, boxes and lines, cardinalities), then this course will really extend that way beyond SQL.
Below is the class schedule, including links to the assignments (those are bolded).