Course Syllabus

This course, "Data Wrangling" will enable you to:

  • design database schemas for efficient data representation
  • implement database schemas using MySQL
  • learn how to learn new technologies
  • learn the basics of programming in Python
  • import and export data to/from CVS and Excel, changing schemas as needed
  • conduct basic analyses in Excel
  • prepare a project workflow that imports data from different sources and produces reports

There are no prerequisites for this course; it is appropriate even if you've never done any programming or behind the scenes work with computers. I'll address each topic "from scratch." If you do have significant, recent, experience, or training with programming and databases this is not the course for you; you will find it too introductory. In this case I require you to drop the course and welcome you to audit parts that you think will extend your knowledge.  In particular this course is not suitable as a interdisciplinary course for Computer Science students. From Spring 2017 onward enrollment in the course is restricted to Information School students. Students from outside the school have to contact the Instructor for permission to register.

Professor

This course is taught by James Howison.  My office is UTA 5.404.  My office hours are 4:15 to 5:15 Tuesday (i.e., just after class on Tuesday). Please understand that I will have to leave at exactly 5:15 because I have to pick my kiddo up from daycare. 

Class Meetings

Unlike almost all other iSchool classes, our class meets twice a week: Tuesday 3-4:15 and Thursday 10:30-11:45 in the iSchool computing lab classroom (UTA 1.210A).  

If you miss a class it is your responsibility to catch up; I provide Screencasts that cover much of the material (but not all). Please identify a classmate early on who will help you catch up on material if there is no Screencast available for that week. Office hours are not for personal replays of missed classes, but neither do I want you to fall behind, so please watch the screencasts or meeting with colleagues then come to office hours and we'll get you back on track. 

Computing resources

The majority of this class happens on the class server, so really we'll just be editing text files and uploading them from the local computers. I will be teaching using the Atom programmer's editor. This is available free for Mac, Linux, and Windows and will provide a consistent experience for us all. You are welcome to use your own laptop and should be able to install Atom and the plugins we will use. However I won't be able to stop the class to help you with issues on your laptop, rather I will ask you to use one of the lab computers to continue the exercises.  You can work with Purple shirts in the Computer Lab outside class time to get things working on your laptop.

In order to work from off campus from your own computer you will need to have the UTexas VPN installed see http://vpn.utexas.edu I recommend installing the Cisco AnyConnect client. Note that the VPN is not administered by me or by the Information School, so you must direct VPN issues to UTexas ITS support.

Course Texts

There are no required texts for the course, but you will find these resources to be useful:

Recommended Texts:

An intro book for MySQL that's available online at UT is: Learning MySQL

We'll be covering everything with materials in the course, but Introductions to programming in Python are available online, including the tutorial from CodeAcademy and CodeWars.

Weekly Assignments

The bulk of your course grade (75%) comes from Weekly Assignments. There are assignments throughout each week for this course, covering the material addressed that week.  The weekly assignments are due 11:59 pm on Sunday (this is to ensure that we can grade them before the Tuesday class). Late assignments will receive zero but you can drop your 2 lowest grades. However don't simply not turn in an assignment once it is late; the assignments test and drive your learning and your performance helps guide me on material. So not completing an assignment is a sure way to fall behind!

The assignments, and grading rubric, will be released on Canvas during the Thursday class, so we'll go over the assignment and ensure everyone knows what's required. Each assignment will be turned in online, usually by uploading a PDF or Text file, and/or providing a URL to your assignment on the class server.

If you've uploaded a PDF there will be comments left on the PDF, in addition to any text comments in Canvas. You can see the comments on the PDF via by "viewing feedback":

https://guides.instructure.com/m/4212/l/352349-how-do-i-view-assignment-feedback-comments-from-my-instructor-using-crocodoc-annotations

Project

A portion of the course (25%) is a project to produce a data wrangling workflow. The project will be done in pairs. We will have an initial individual assignment to generate ideas, then opportunities to review each other's ideas and form groups. Working in pairs for this course does not mean dividing up the work; I require each group member to understand everything about their project. If there is a part of your project that you don't understand or couldn't work with then you are missing a crucial learning opportunity. We work in pairs to work together (supporting each other's learning) not to reduce the amount or diversity of the work that we do.

When I use the word "workflow" I mean a process that moves from data you obtain through reading it, analyzing it, and providing results. The workflow includes the data, the scripts that read and query the data, and the final output. We will discuss example projects in class early in the semester. 

The project builds up through the semester (e.g., after we've learned Database Design you will do a design for your workflow), culminating in a full workflow that you demonstrate and describe through the report. In the past I required a screencast but that is not required now (so ignore any references to producing a screencast).

There are more details on the specific Assignment page: Project Workflow and Report

Example projects from previous years:

  • State education, teachers salaries, public schools. Report.
  • Visualizing impact of weather on border crossings. Report.
  • How have "Game of Thrones" character names influenced baby names in the US? Report.
  • Lightening strikes and baseball games
  • Average age of Oscar winners by gender
  • How do solar flares affects temperatures on Earth and on Mars

Collaboration policy

The weekly assignments are individual work. However, on one condition, I give you explicit permission to work together with other classmates on the assignments or on your projects. With the same condition, you are also welcome to seek input from people outside the class, such as friends and family. Neither "working together" nor "seeking input" means having others do the work for you; you should always be certain that you are learning and that you understand the code that you have submitted. The one condition is that you add a note to your homework (ideally through a comment in the Canvas submission) indicating how the work was done and identifying with whom you worked and how. For example you might say "Daria and I worked on this in the lab together, when we started out we were confused about X but I figured it out and shared that with Daria. Our code is very similar because we worked together". Or perhaps "I was confused about how to pad a string with spaces, and after working at it for 30 minutes I chatted about it with my partner who suggested the xyz method. I was pleased when I got that working myself." When you have worked together your code will have similarities, but you must not turn in identical code; rather you should take code you've worked on together and personalize it through comments that explain what is happening in the code. The comments must be your own, individual, work.

If you have questions on this policy please ask in the Discussion forum and I will answer there: Assignment Discussions. I have this policy because learning to program is both individual hard work and learning how to get help from others. Sometimes chatting through with another class member is just what is needed.

iSchool Open Day

Although not required for this class, you may want to present your workflow as a Student Project at the iSchool's Open Day (typically in May).  The Open House is an opportunity to present student projects, including projects from this course (but also from other courses or semesters!).

Schedule

We will meet each Tuesday and Thursday until May 4th, but will not have classes on:

  • Tuesday 24 January (Professor traveling for research)
  • Tuesday 28 February (Professor traveling for research)
  • Tuesday 14 March (Spring Break)
  • Thursday 16 March (Spring Break)
Date Meet? Topic Topic Details
17-Jan-17 meet Intro Intro/Card exercise
19-Jan-17 meet Project Intro Project Intro
24-Jan-17 no meeting
26-Jan-17 meet Data Modeling Data Modeling I - has_many, datatypes
31-Jan-17 meet Data Modeling II - habtm
2-Feb-17 meet Data Modeling III - has_many through 
7-Feb-17 meet Project Speed Dating & ERD to phpMyAdmin
9-Feb-17 meet SQL Queries SQL Queries I - Intro and SELECT basics
14-Feb-17 meet SQL Queries III - Join, Single table
16-Feb-17 meet SQL Queries IV - Join, many tables
21-Feb-17 meet SQL Queries V - Aggregate Functions
23-Feb-17 meet SQL Queries VI - GROUP BY
28-Feb-17 no meeting
2-Mar-17 meet SQL Queries VII - INSERT, UPDATE, DELETE
7-Mar-17 meet Excel Queries Excel - Queries - Sorts, Filters, joins
9-Mar-17 meet Excel - Pivot Tables and Graphs
14-Mar-17 spring break
16-Mar-17 spring break
21-Mar-17 meet Python I Server Introduction: Part 1Part 2, Atom SFTP and Creating Files.
23-Mar-17 meet Python - Hello World and Strings
28-Mar-17 meet Python - if/else/while
30-Mar-17 meet Python - Lists and dicts
4-Apr-17 meet
6-Apr-17 meet Python II Python - CSV
11-Apr-17 meet Python - Writing CSV files
13-Apr-17 meet Python - Basic SQL queries from Python
18-Apr-17 meet Python - Parameterized SQL Queries
20-Apr-17 meet Python - CSV to SQL via Python (putting it all together)
25-Apr-17 meet Python - Data transforms/Dates
27-Apr-17 meet Python - Regex and splits
2-May-17 meet Python - Working with large files/Workshop
4-May-17 meet "What's Next?" Local/Server installs

Course Summary:

Date Details Due