This course, "Data Wrangling," will enable you to:
- design database schemas for efficient data representation
- implement database schemas using MySQL
- learn how to learn new technologies
- learn the basics of programming in Python
- import and export data to/from CSV and Excel, changing schemas as needed
- conduct basic analyses in Excel
- prepare a project workflow that imports data from different sources and produces reports
There are no prerequisites for this course; it is appropriate even if you've never done any programming or behind-the-scenes work with computers. I'll address each topic "from scratch." If you do have significant recent experience or training with programming and databases, this is not the course for you; you will find it too introductory. In that case I require you to drop the course, but you are welcome to audit the parts that you think will extend your knowledge. In particular, this course is not suitable as an interdisciplinary course for Computer Science students. From Spring 2017 onward, enrollment in the course is restricted to Information School students; students from outside the school must contact the instructor for permission to register.
This course is taught by James Howison. My office is UTA 5.404. My office hours are 4:15 to 5:15 Tuesday (i.e., just after class on Tuesday). Please understand that I will have to leave at exactly 5:15 because I have to pick my kiddo up from daycare.
Unlike almost all other iSchool classes, our class meets twice a week: Tuesday 3-4:15 and Thursday 10:30-11:45 in the iSchool computing lab classroom (UTA 1.210A).
If you miss a class, it is your responsibility to catch up. I provide screencasts that cover much of the material (but not all), so please identify a classmate early on who can help you catch up when no screencast is available for that week. Office hours are not for personal replays of missed classes, but neither do I want you to fall behind: please watch the screencasts or meet with classmates first, then come to office hours and we'll get you back on track.
Most of the work in this class happens on the class server, so in practice we'll be editing text files on the lab computers and uploading them to the server. I will be teaching with the Atom programmer's editor, which is available free for Mac, Linux, and Windows and will provide a consistent experience for us all. You are welcome to use your own laptop and should be able to install Atom and the plugins we will use. However, I won't be able to stop the class to help with issues on your laptop; instead, I will ask you to use one of the lab computers to continue the exercises. Outside class time, you can work with the Purple Shirts in the Computer Lab to get things working on your laptop.
To work off campus from your own computer, you will need to have the UTexas VPN installed; see http://vpn.utexas.edu. I recommend installing the Cisco AnyConnect client. Note that the VPN is not administered by me or by the Information School, so please direct VPN issues to UTexas ITS support.
There are no required texts for the course, but you will find these resources to be useful:
An introductory MySQL book available online through UT is Learning MySQL.
The bulk of your course grade (75%) comes from weekly assignments. Assignments are set throughout each week and cover the material addressed that week. The weekly assignments are due at 11:59 pm on Sunday, which ensures that we can grade them before the Tuesday class. Late assignments receive a zero, but you can drop your 2 lowest grades. Even so, don't simply skip an assignment once it is late: the assignments test and drive your learning, and your performance helps guide me on what material to cover. Not completing an assignment is a sure way to fall behind!
The assignments and grading rubrics will be released on Canvas during the Thursday class, so that we can go over each assignment together and make sure everyone knows what's required. Each assignment is turned in online, usually by uploading a PDF or text file and/or providing a URL to your assignment on the class server.
If you've uploaded a PDF, comments will be left on the PDF in addition to any text comments in Canvas. You can see the comments on the PDF by viewing the feedback on your submission in Canvas.
The remaining 25% of the course grade comes from a project to produce a data wrangling workflow. The project will be done in pairs. We will start with an individual assignment to generate ideas, followed by opportunities to review each other's ideas and form groups. Working in pairs for this course does not mean dividing up the work; I require each group member to understand everything about their project. If there is a part of your project that you don't understand or couldn't work with, then you are missing a crucial learning opportunity. We work in pairs to work together (supporting each other's learning), not to reduce the amount or diversity of the work that we do.
When I use the word "workflow," I mean a process that starts with data you obtain and moves through reading it, analyzing it, and providing results. The workflow includes the data, the scripts that read and query the data, and the final output. We will discuss example projects in class early in the semester.
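To make the idea of a workflow concrete, here is a minimal Python sketch of its stages: read a CSV, load it into a database table, query it, and write a report. It is illustrative only, not a project template. The station/rainfall data is hypothetical, and the sketch uses Python's built-in `sqlite3` module as a stand-in for the MySQL server we use in class so that it runs without any setup. Python's MySQL drivers generally use `%s` placeholders rather than the `?` placeholders shown here.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for a CSV file you have obtained.
RAW_CSV = """station,month,rainfall_mm
A,Jan,12.5
A,Feb,8.0
B,Jan,20.0
B,Feb,14.0
"""


def read_rows(text):
    """Read: parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def load(conn, rows):
    """Load: create a table and insert rows with a parameterized query.

    SQLite stands in for MySQL here (an assumption for portability).
    """
    conn.execute("CREATE TABLE readings (station TEXT, month TEXT, rainfall_mm REAL)")
    conn.executemany(
        "INSERT INTO readings (station, month, rainfall_mm) VALUES (?, ?, ?)",
        [(r["station"], r["month"], float(r["rainfall_mm"])) for r in rows],
    )


def analyze(conn):
    """Query: average rainfall per station using an aggregate and GROUP BY."""
    cur = conn.execute(
        "SELECT station, AVG(rainfall_mm) FROM readings "
        "GROUP BY station ORDER BY station"
    )
    return cur.fetchall()


def report(results):
    """Report: write the query results out as CSV text."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["station", "avg_rainfall_mm"])
    writer.writerows(results)
    return out.getvalue()


conn = sqlite3.connect(":memory:")
load(conn, read_rows(RAW_CSV))
summary = analyze(conn)
print(report(summary))
```

Running it prints a header plus one average per station. In an actual project each stage would more likely be its own script, reading real files and querying the class MySQL database.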
The project builds up through the semester (e.g., after we've learned database design you will do a design for your workflow), culminating in a full workflow that you demonstrate and describe in a final report. In previous years I required a screencast, but that is no longer required, so please ignore any references to producing one.
More details are on the assignment page: Project Workflow and Report
Example projects from previous years:
- State education, teachers' salaries, public schools. Report.
- Visualizing impact of weather on border crossings. Report.
- How have "Game of Thrones" character names influenced baby names in the US? Report.
- Lightning strikes and baseball games
- Average age of Oscar winners by gender
- How do solar flares affect temperatures on Earth and on Mars?
The weekly assignments are individual work. However, on one condition, I give you explicit permission to work with other classmates on the assignments or on your projects. Under the same condition, you are also welcome to seek input from people outside the class, such as friends and family. Neither "working together" nor "seeking input" means having others do the work for you; you should always be certain that you are learning and that you understand the code you submit. The one condition is that you add a note to your homework (ideally as a comment in the Canvas submission) explaining how the work was done, with whom you worked, and how you worked together. For example, you might say "Daria and I worked on this in the lab together. When we started out we were confused about X, but I figured it out and shared that with Daria. Our code is very similar because we worked together." Or perhaps "I was confused about how to pad a string with spaces, and after working at it for 30 minutes I chatted about it with my partner, who suggested the xyz method. I was pleased when I got that working myself." When you have worked together your code will have similarities, but you must not turn in identical code; instead, take the code you worked on together and personalize it with comments that explain what is happening in the code. The comments must be your own individual work.
If you have questions about this policy, please ask in the Discussion forum and I will answer there: Assignment Discussions. I have this policy because learning to program involves both individual hard work and learning how to get help from others. Sometimes talking a problem through with another class member is just what is needed.
iSchool Open Day
Although it is not required for this class, you may want to present your workflow as a Student Project at the iSchool's Open Day (typically in May). The Open Day is an opportunity to present student projects, including projects from this course (but also from other courses or semesters!).
We will meet each Tuesday and Thursday until May 4th, but will not have classes on:
- Tuesday 24 January (Professor traveling for research)
- Tuesday 28 February (Professor traveling for research)
- Tuesday 14 March (Spring Break)
- Thursday 16 March (Spring Break)
| Date | Class | Unit | Topic |
|---|---|---|---|
| 19-Jan-17 | meet | Project Intro | Project Intro |
| 26-Jan-17 | meet | Data Modeling | Data Modeling I - has_many, datatypes |
| 31-Jan-17 | meet | | Data Modeling II - habtm |
| 2-Feb-17 | meet | | Data Modeling III - has_many through |
| 7-Feb-17 | meet | | Project Speed Dating & ERD to phpMyAdmin |
| 9-Feb-17 | meet | SQL Queries | SQL Queries I - Intro and SELECT basics |
| 14-Feb-17 | meet | | SQL Queries III - Join, Single table |
| 16-Feb-17 | meet | | SQL Queries IV - Join, many tables |
| 21-Feb-17 | meet | | SQL Queries V - Aggregate Functions |
| 23-Feb-17 | meet | | SQL Queries VI - GROUP BY |
| 2-Mar-17 | meet | | SQL Queries VII - INSERT, UPDATE, DELETE |
| 7-Mar-17 | meet | Excel Queries | Excel - Queries - Sorts, Filters, joins |
| 9-Mar-17 | meet | | Excel - Pivot Tables and Graphs |
| 21-Mar-17 | meet | Python I | Server Introduction: Part 1, Part 2, Atom SFTP and Creating Files. |
| 23-Mar-17 | meet | | Python - Hello World and Strings |
| 28-Mar-17 | meet | | Python - if/else/while |
| 30-Mar-17 | meet | | Python - Lists and dicts |
| 6-Apr-17 | meet | Python II | Python - CSV |
| 11-Apr-17 | meet | | Python - Writing CSV files |
| 13-Apr-17 | meet | | Python - Basic SQL queries from Python |
| 18-Apr-17 | meet | | Python - Parameterized SQL Queries |
| 20-Apr-17 | meet | | Python - CSV to SQL via Python (putting it all together) |
| 25-Apr-17 | meet | | Python - Data transforms/Dates |
| 27-Apr-17 | meet | | Python - Regex and splits |
| 2-May-17 | meet | | Python - Working with large files/Workshop |
| 4-May-17 | meet | | "What's Next?" Local/Server installs |