Course Syllabus

This course, "Data Wrangling", will enable you to:

  • design database schemas for efficient data representation
  • implement database schemas using MySQL
  • navigate data management issues in organizations
  • learn how to learn new technologies
  • learn the basics of programming in Python
  • import and export data to/from CSV and Excel, changing schemas as needed
  • conduct basic analyses in Excel
  • prepare a project workflow that imports data from different sources and produces reports

There are no prerequisites for this course; it is appropriate even if you've never done any programming or behind-the-scenes work with computers. I'll address each topic "from scratch". If you do have significant, recent experience or training with programming and databases, this is not the course for you; you will find it too introductory. In this case I require you to drop the course, and I welcome you to audit the parts that you think will extend your knowledge. In particular, this course is not suitable as an interdisciplinary course for Computer Science students.

Professor

This course is taught by James Howison.  My office is UTA 5.404.  My office hours are 11-12 Monday (i.e., just after class on Monday).  

The TA for this class is Eunyoung Moon, an iSchool PhD student. Eunyoung will attend some classes to help me work with you as you practice, and will help with grading. She can be reached via email at eymoon@utexas.edu.

Class Meetings

Unlike almost all other iSchool classes, our class meets twice a week: Monday and Wednesday 9:00-10:15, in the iSchool computing lab classroom (UTA 1.210A). If you miss a class, it is your responsibility to catch up; I provide screencasts that cover much of the material (but not all). Please identify a classmate early on who will help you catch up on material if there is no screencast available for that week. Office hours are not for personal replays of missed classes, but neither do I want you to fall behind, so please watch the screencasts or meet with classmates, then come to office hours and we'll get you back on track.

Computing resources

The majority of this class happens on the class server, so really we'll just be editing text files and uploading them from the local computers. I will be teaching using the Mac computers in the lab. This is primarily to give the class a consistent experience in the choice of text editor and in uploading files to the server. You are welcome to use your own laptop or Windows computer, but I won't be able to stop the class to help you with those; instead, I will ask you to use one of the lab computers to continue the exercises. You can work with Purple shirts in the Computer Lab outside class time to get things working on your laptop.

You will need your iSchool account username and password for the class.  See the excellent FAQs and video tutorials provided by the school IT group. In particular see the screencast on resetting your password.

Course Texts

There are no required texts for the course, but you will find these resources to be useful.

Recommended Texts:

An intro book for MySQL that's available online at UT is: Learning MySQL

Introductions to programming in Python are available online, including the tutorial from Codecademy. We'll be covering everything with materials in class, but you will find these tutorials useful.

Weekly Assignments

The bulk of your course grade (70%) comes from Weekly Assignments. There are assignments throughout each week for this course, covering the material addressed that week. The assignments are due at 11:59 pm on the Sunday of the following week (this is to ensure that we can grade them before the Wednesday class). Late assignments will receive zero, but you can drop your 2 lowest grades. The assignments and grading rubric will be released on Canvas during class, so we'll go over each assignment and ensure everyone knows what's required. Each assignment will be turned in online, usually by uploading a PDF or text file, and/or providing a URL to your assignment on the class server.

If you've uploaded a PDF, the TA or I might have left comments on it, which you can see by viewing feedback:

https://guides.instructure.com/m/4212/l/352349-how-do-i-view-assignment-feedback-comments-from-my-instructor-using-crocodoc-annotations

 

Project

A portion of the course grade (30%) comes from an individual project to produce a data wrangling workflow that imports data from different sources to a database and then produces reports from that data. We will discuss example projects in class early in the semester. The project builds up through the semester (e.g., after we've learned Database Design you will do a design for your workflow), culminating in a full workflow that you demonstrate and describe through the report. In the past I required a screencast, but that is not required now. There are more details on the specific Assignment page: Project Workflow and Report

Example projects from previous years:

  • Visualizing impact of weather on border crossings. Screencast and Report.
  • Lightning strikes and baseball games
  • Average age of Oscar winners by gender
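
As a rough illustration only (not a course template or a required approach), a tiny workflow of the shape described above might look like the sketch below. It assumes Python's built-in sqlite3 module in place of MySQL on the class server so that it runs anywhere, and the table names, column names, and sample data are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for two different sources.
GAMES_CSV = """game_id,city,attendance
1,Austin,12000
2,Dallas,18000
3,Austin,15000
"""
STRIKES_CSV = """city,strikes
Austin,4
Dallas,9
"""


def load_csv(conn, table, text):
    """Insert every row of a CSV string into an existing table."""
    rows = list(csv.DictReader(io.StringIO(text)))
    columns = list(rows[0].keys())
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})",
        [tuple(row[c] for c in columns) for row in rows],
    )


def build_report(conn):
    """Return (city, games, strikes) rows that combine the two sources."""
    query = """
        SELECT g.city, COUNT(*) AS games, s.strikes
        FROM games AS g
        JOIN strikes AS s ON s.city = g.city
        GROUP BY g.city, s.strikes
        ORDER BY g.city
    """
    return conn.execute(query).fetchall()


# An in-memory database (assumption: stands in for the class MySQL server).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE games (game_id INTEGER, city TEXT, attendance INTEGER)")
conn.execute("CREATE TABLE strikes (city TEXT, strikes INTEGER)")
load_csv(conn, "games", GAMES_CSV)
load_csv(conn, "strikes", STRIKES_CSV)

report = build_report(conn)
for city, games, strikes in report:
    print(f"{city}: {games} games, {strikes} lightning strikes")
```

The three stages (import each source, store it in tables, query across tables for a report) are kept in separate steps so that each one can be checked on its own.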

Collaboration policy

Both the assignments and the project are individual. However, on one condition, I give you explicit permission to work together with other classmates on the assignments or on your projects. With the same condition, you are also welcome to seek input from people outside the class, such as friends and family. Neither "working together" nor "seeking input" means having others do the work for you; you should always be certain that you are learning and that you understand the code that you have submitted.

The one condition is that you add a note to your homework (ideally through a comment in the Canvas submission) indicating how the work was done and identifying with whom you worked and how. For example, you might say "Daria and I worked on this in the lab together. When we started out we were confused about X, but I figured it out and shared that with Daria. Our code is very similar because we worked together". Or perhaps "I was confused about how to pad a string with spaces, and after working at it for 30 minutes I chatted about it with my partner, who suggested the xyz method. I was pleased when I got that working myself."

If you have questions on this policy, please ask. I have this policy because learning to program involves both individual hard work and learning how to get help from others. Sometimes talking a problem through with another class member is just what is needed.

iSchool Open Day

Although not required for this class, you may want to present your workflow as a Student Project at the iSchool's Open Day (typically in May). The Open Day is an opportunity to present student projects, including projects from this course (but also from other courses or semesters!).

Schedule

Note that we have a class on the Monday of Thanksgiving week, but not on the Wednesday. The Monday class is a project workshop, so if you have to travel that week just be sure that your project is well underway!

You can see the class schedule below. Each class is a page within the Modules List. On that list the unindented items are links to pages (one per class), with links to the Screencasts and Handouts needed for the class. Where I have provided a screencast for the class, you should organize your time to watch it before class. We will go over the material together in class, but if you've prepared before class everything will run more smoothly. The indented items on the Modules page are assignments.

 
