This course, "Data Wrangling," will enable you to:
- design database schemas for efficient data representation
- implement database schemas using MySQL
- navigate data management issues in organizations
- learn how to learn new technologies
- learn the basics of programming in Python
- import and export data to/from CSV and Excel, changing schemas as needed
- conduct basic analyses in Excel
- prepare a project workflow that imports data from different sources and produces reports
There are no prerequisites for this course; it is appropriate even if you've never done any programming or behind-the-scenes work with computers. I'll address each topic "from scratch". If you have significant, recent experience or training with programming and databases, this is not the course for you; you will find it too introductory. In that case I require you to drop the course, though you are welcome to audit the parts that you think will extend your knowledge. In particular, this course is not suitable as an interdisciplinary course for Computer Science students.
This course is taught by James Howison. My office is UTA 5.404. My office hours are 11-12 Tuesday (i.e., just after class on Tuesday).
The TA for this class is Ayse Gursoy, an iSchool PhD student. Ayse will attend some classes to help me work with you as you practice, and will help with grading. She can be reached via email at email@example.com.
Unlike almost all other iSchool classes, our class meets twice a week: Tuesdays and Thursdays 9:00-10:15, in the iSchool computing lab classroom (UTA 1.210A). If you miss a class it is your responsibility to catch up; I provide Screencasts that cover much of the material (but not all). Please identify a classmate early on who will help you catch up on material if there is no Screencast available for that week. It is not acceptable to miss a class and then expect personal tutoring on what you've missed during office hours, unless you've already walked through the material with your classmate.
The majority of this class happens on the class server, so really we'll just be editing text files on the local computers and uploading them. I will be teaching using the Mac computers in the lab. This is primarily to give the class a consistent experience in the choice of text editor and in uploading files to the server. You are welcome to use your own laptop or a Windows computer, but I won't be able to stop the class to help you with those; instead, I will ask you to use one of the lab computers to continue the exercises. You can work with Purple shirts in the Computer Lab outside class time to get things working on your laptop.
There are no required texts for the course, but you will find these resources to be useful.
An intro book for MySQL that's available online at UT is: Learning MySQL
The bulk of your course grade (70%) comes from Weekly Assignments. There are assignments throughout each week for this course, covering the material addressed that week. The assignments are due at 9:00 am on the Monday of the following week (this is to ensure that we can grade them before the Tuesday class). Late assignments will receive zero, but you can drop your 2 lowest grades. The assignments and grading rubric will be released on Canvas during class, so we'll go over each assignment and ensure everyone knows what's required. Each assignment will be turned in online, usually by uploading a PDF or text file, and/or providing a URL to your assignment on the class server.
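As a rough illustration of the drop-your-2-lowest rule, here is a short Python sketch. It assumes every weekly assignment is scored out of 100 and counts equally; neither assumption is stated above, and the function name is hypothetical. The rubric on Canvas remains the authority on how grades are actually calculated.

```python
def weekly_contribution(scores, drop=2, weight_percent=70):
    """Points (out of weight_percent) earned from weekly assignments.

    Assumption: each score is out of 100 and all assignments count
    equally. A late assignment is entered as 0.
    """
    kept = sorted(scores)[drop:]      # discard the `drop` lowest scores
    if not kept:                      # nothing left to average
        return 0.0
    average = sum(kept) / len(kept)
    return weight_percent * average / 100


# Made-up scores for illustration; the 0 stands for a late assignment.
example = weekly_contribution([90, 0, 80, 100, 70])
print(example)
```

With these made-up scores, the two lowest (0 and 70) are dropped, the remaining three average 90, and the weekly assignments contribute 63 of the 70 available points.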
A portion of the course (30%) is an individual project to produce a data wrangling workflow that imports data from different sources to a database and then produces reports from that data. We will discuss example projects in class early in the semester. You will make a Screencast to demonstrate your workflow. The project builds up through the semester (e.g., after we've learned Database Design you will do a design for your workflow), culminating in a full workflow that you demonstrate and describe through the Screencast. There are more details on the specific Assignment page: Project Workflow, Screencast, and short report.
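To give a feel for the shape of such a workflow, here is a minimal Python sketch that imports CSV data into a database table and then produces a small report. It is only an illustration under stated assumptions: it uses Python's built-in `sqlite3` module as a stand-in for MySQL so that it runs anywhere, and the table, column names, and data are all made up.

```python
import csv
import io
import sqlite3

# Made-up CSV data; in a real project this would be a file from one of
# your sources.
GAMES_CSV = """game_date,home_team,attendance
2015-04-06,Team A,100
2015-04-07,Team A,300
2015-04-06,Team B,500
"""


def load_games(conn, csv_text):
    """Import step: create a (hypothetical) games table and fill it from CSV."""
    conn.execute(
        "CREATE TABLE games (game_date TEXT, home_team TEXT, attendance INTEGER)"
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(r["game_date"], r["home_team"], int(r["attendance"])) for r in reader]
    conn.executemany("INSERT INTO games VALUES (?, ?, ?)", rows)
    return len(rows)


def attendance_report(conn):
    """Report step: average attendance per home team, sorted by team name."""
    cursor = conn.execute(
        "SELECT home_team, AVG(attendance) FROM games "
        "GROUP BY home_team ORDER BY home_team"
    )
    return cursor.fetchall()


# sqlite3 in-memory database used here as a stand-in for the class MySQL server.
conn = sqlite3.connect(":memory:")
load_games(conn, GAMES_CSV)
for team, average in attendance_report(conn):
    print(f"{team}: {average:.0f}")
```

A full project would add more sources and a schema designed for them, but the import, store, and report steps follow this same pattern.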
Example projects from previous years:
- Visualizing the impact of weather on border crossings. Screencast and Report.
- Lightning strikes and baseball games
- Average age of Oscar winners by gender
iSchool Open Day
Although not required for this class, you may want to present your workflow as a Student Project at the iSchool's Open Day (typically in May). The Open Day is an opportunity to present student projects, including projects from this course (but also from other courses or semesters!).
Note that we have a class on Tuesday of the week of Thanksgiving, but not the Thursday. The Tuesday class is a project workshop, so if you have to travel that week just be sure that your project is well underway!
You can see the class schedule by looking at the Modules List. We will explore the topics in the order presented there. The unindented items are links to pages (one per class), with links to the Screencasts and Handouts needed for the class. Where I have provided a Screencast for the class, you must organize your time to watch it before class; we will review the material and I will answer questions that arise, but I want to use class time primarily for practicing, rather than lecturing. The indented items on the Modules page are assignments.
The syllabus page shows a table-oriented view of the course schedule, and the basics of course grading.