Sp16 - APPLIED DATA MINING (27515)

Room and times

Thursdays, 3-6pm in UTA 1.210A (the lab at the iSchool)

Instructor

Byron Wallace
byron.wallace@utexas.edu
Office: 5.532
Office Hours: Tuesdays, 3-4pm* 
* or by request

Course materials

We will primarily be using Data Science from Scratch by Joel Grus as our text. Additionally, we will rely on many online materials/resources and readings.

I will regularly post IPython notebooks, code, etc. on our GitHub at: https://github.com/bwallace/INF-385T-applied-data-mining.

For your Python distribution, I suggest the "Anaconda" distribution of Python 3. 

Course overview and objectives

This course will provide an overview of the applications, methods, tools and technologies that constitute "data science" and data mining. We will be building these from the ground up, using Python. This will be a hands-on course.

Due to the diversity of subjects that comprise this emerging field, the class will necessarily have more breadth than depth. At the beginning of the course we will cover 'core' data mining topics, such as basic probability and statistics and machine learning methods. After this material is covered, the focus of the class will shift to surveying applications of data mining and to class projects, which will be a major component of the course. These projects will allow you to pursue your own interests (and conduct new research in so doing!).

Briefly, in this course we will:

  • Learn about the methods underlying modern data mining, and how to use them
  • Survey applications of data mining
  • Explore the future and implications (ethical and otherwise) of data science

Specific topics will include:

  • Python programming for data analysis (including relevant libraries for modeling + data munging)
  • Supervised and unsupervised learning
  • Data processing and data mining workflows
  • Applications of data mining

We will primarily be using the Python programming language. I will not assume that you are an expert programmer or that you are familiar specifically with Python, but I will assume that you are comfortable programming: there will be programming. The first assignment is aimed in part at assessing your preparedness for the class.

The final project will constitute a large part of your grade. You will work with me to select an appropriate final project; broadly, this will involve working with a dataset to perform an analysis or accomplish a specific task of interest (e.g., clustering classic literature based on word counts; building a predictive model in a domain that interests you; etc.). One component of your final project grade will be a presentation of the work that you will give in class. You can use whatever language/technology stack you'd like for your final project, depending on what best suits your needs (but I of course recommend Python!). If you're not using Python, however, please talk to me first.

Put succinctly: this course will familiarize you with data mining methods and applications, and you will have the opportunity to pursue a project involving data mining that interests you.

Grading

Grading will primarily be based on three components: homework assignments, a midterm exam and a final project. There will also be in-class exercises, which will generally be graded as pass/fail (based on participation). You will be allowed to drop your lowest exercise score.

The final project will be the single most important element of this class; you should therefore start thinking about it early! The sooner you start talking with me about your idea(s), the better.

In-class exercises 10%
Homeworks 30%
Midterm Exam 20%
Final Project 40%
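
As a rough illustration of how these weights combine, here is a minimal Python sketch. It is not an official grade calculator. The function name course_grade, the 0-100 scale for every component, and the scoring of a passed exercise as 100 and a failed one as 0 are all assumptions made for this example.

```python
# Weights taken from the grading breakdown above.
WEIGHTS = {
    "exercises": 0.10,
    "homeworks": 0.30,
    "midterm": 0.20,
    "final_project": 0.40,
}


def mean(values):
    """Plain arithmetic mean; returns 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def course_grade(exercise_scores, homework_scores, midterm, final_project):
    """Hypothetical helper: weighted course grade on an assumed 0-100 scale.

    exercise_scores: assumed 100 for a pass and 0 for a fail.
    The lowest exercise score is dropped, as the syllabus allows.
    """
    if len(exercise_scores) > 1:
        kept = sorted(exercise_scores)[1:]  # drop the single lowest score
    else:
        kept = list(exercise_scores)
    return (
        WEIGHTS["exercises"] * mean(kept)
        + WEIGHTS["homeworks"] * mean(homework_scores)
        + WEIGHTS["midterm"] * midterm
        + WEIGHTS["final_project"] * final_project
    )


if __name__ == "__main__":
    # One missed exercise is dropped, so the exercise component is full credit.
    print(course_grade([100, 0, 100], [90, 80], midterm=70, final_project=95))
```

Because the final project carries 40% of the weight, a 10-point change in the project score moves the overall grade by 4 points, twice the effect of the same change on the midterm.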

Course policies

Late homework policy: homework is due at the beginning of class. Late assignments will be accepted for two weeks after the class in which they were due, but at a penalty of 10 points per week. Failure to turn in an assignment at the start of the class in which it is due therefore results in an immediate 10-point penalty. After two weeks, assignments will receive a 0.
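
The late penalty can be read as a simple rule, sketched below in Python. This is one interpretation, not an official statement of the policy. It assumes that assignments are scored out of 100, that lateness is measured in days from the start of the class in which the assignment was due, and that an assignment turned in exactly 7 (or 14) days late still counts as one (or two) weeks late. The function name late_adjusted_score is hypothetical.

```python
import math


def late_adjusted_score(raw_score, days_late):
    """Hypothetical helper: apply the late policy to a raw score (assumed 0-100).

    days_late <= 0 means the assignment was turned in on time.
    """
    if days_late <= 0:
        return raw_score
    if days_late > 14:
        return 0  # more than two weeks late: no credit
    # Any lateness counts as a started week: 1 for up to 7 days, 2 for up to 14.
    weeks_late = math.ceil(days_late / 7)
    return max(0, raw_score - 10 * weeks_late)


if __name__ == "__main__":
    for days in (0, 1, 8, 15):
        print(days, late_adjusted_score(90, days))
```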

Students with disabilities may request appropriate academic accommodations from the Division of Diversity and Community Engagement, Services for Students with Disabilities, 512-471-6259, http://www.utexas.edu/diversity/ddce/ssd/

Note: the schedule below is subject to change!

Course Summary:

Date Details