Fa17 - GEOMETRIC FNDTNS OF DATA SCI (65830)

 

 Instructor:

Professor Chandrajit Bajaj

  • Office hours – Mon., Wed. 11:00 a.m. - 12:00 p.m., POB 2.324A
  • Contact: bajaj at cs.utexas.edu

NOTE: Most questions should be submitted on Canvas rather than emailed to the instructor. Please try to reserve an office-hour slot a day in advance to avoid conflicts.

 

Teaching Assistant

Anikesh Kamath

  • Office hours – Tues., Thurs. 11:00 a.m. - 12:00 p.m. and 4:00 - 5:00 p.m., GDC 1.302
  • Contact: akamath@utexas.edu

Note: Please reserve an office-hour slot a day in advance to avoid conflicts.

Lecture Time and Location: M W 9 – 10:30 a.m. in BUR 112

Prerequisites

The course shall dwell on the geometric, mathematical and statistical foundations necessary to understand and computationally exploit scalable data analysis and visualization. We shall explore the dimensionality, sparsity and resolution of data (sequence, stream, graph-based, time-series, images, video, hyper-spectral) emanating from multiple sensors (big and small, slow and fast) and accumulated via the interactive WWW. Issues of measurement error, noise and outliers shall be central to bounding the precision, bias and accuracy of the data analysis. We shall learn to characterize and measure differences in high dimensions, along with the computational concepts that underlie dimension reduction, sampling, sketching, and optimization in machine learning and deep learning. The geometric insight and characterization gained provide the basis for designing approximation algorithms for NP-hard problems, and for improving existing ones, with better accuracy/speed tradeoffs.

 

An initial listing of lecture topics is given in the syllabus below. It is subject to modification, depending on the class background and the speed at which we cover ground. Homework exercises shall be assigned roughly every two weeks. Assignment solutions that are turned in late shall suffer a 10% per day reduction in credit, and a 100% reduction once solutions are posted. There will be a mid-term exam in class, with content similar to the homework exercises. A list of topics will also be assigned as individual (or pair/group) data science projects, with a written report and oral presentation at the end of the semester. The project shall be graded and takes the place of a final exam.
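The late-submission rule can be sketched as a small calculation. This is only an illustration, not the official grading procedure: it assumes the 10% is deducted linearly from the earned score for each whole day late (not compounded), and the function and parameter names are hypothetical.

```python
def late_credit(score, days_late, solutions_posted=False, daily_penalty=0.10):
    """Return the credit kept for a submission under the stated late policy.

    Assumptions (not confirmed by the syllabus): the penalty is linear,
    i.e. 10% of the earned score per whole day late, and credit never
    drops below zero.
    """
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    if solutions_posted:
        # 100% reduction once solutions are posted.
        return 0.0
    kept_fraction = max(0.0, 1.0 - daily_penalty * days_late)
    return score * kept_fraction


if __name__ == "__main__":
    # A submission that earned 80 points, turned in two days late.
    print(round(late_credit(80, 2), 2))
```

Under a compounding reading of the policy the numbers would differ slightly, so check with the instructor or TA if the exact amount matters.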

The course is aimed at senior-year undergraduate students, especially in the CS, CSEM, ECE and MATH programs, but others are welcome. You’ll need math at the level of at least 2nd-year calculus, plus linear algebra, plus either more continuous math (e.g., CS and ECE students) or more discrete math (e.g., CSEM students).

Textbook and Course Material.

 

COURSE OUTLINE. 

Date | Topic | Reading | Assignments
08-30-2017 | 1. Geometry of Data, High Dimensional Spaces, Foundational Mathematics and Computer Science [notes] | [BHK] Ch 2, Ch 12 - Appendix; [B2] |
09-06-2017 | 2. Geometry of Spherical Balls, Gaussians, Sampling in High Dimension, Law of Large Numbers [notes] | [BHK] Ch 2, Ch 12 - Appendix | [A1], due on 09-18-2017, 11:59pm
09-11-2017 | 3. Probability and Geometry, Multivariate Gaussians, Binomial, Poisson, Normal, Central Limit Theorem [notes] | [BHK] Ch 2, Ch 12 - Appendix |
09-13-2017 | 4. Geometry of Concentration Bounds, Markov, Chebyshev, Separating a Mixture of Gaussians [notes] | [BHK] Ch 2, Ch 12 |
09-18-2017 | 5. Probabilistic Approximation Algorithms, Chernoff Bounds, Geometry of Monte-Carlo Integration [notes] [slides 19 - 22] | [BHK] Ch 12 | [A1 Solution]; Scripts for Exercise 5
09-20-2017 | 6. Pairwise Independence, Low Discrepancy Sampling, Finite Fields [notes] [Slides 23-35] | [BHK] Chap 12 | [A2], due on 10-06-2017
09-25-2017 | 7. Bayes Rule, Unbiased Estimators, Transformation of Random Variables [notes] | [BHK] Chap 12 |
09-27-2017 | 8. Geometry of MC & QMC Integration & Koksma-Hlawka Bound, Hölder Inequalities [notes] [Slides 23-35] | [BHK] Chap 12 |
10-02-2017 | 9. Geometry of Norms, Matrices, Low-Rank Approximations [notes] | [BHK] Chap 2, Appendix |
10-04-2017 | 10. Geometry of Matrix Sampling, Matrix Sketching Algorithms [notes] | [BHK] Chap 6 | [A2 solutions]
10-09-2017 | 11. Geometry of Best Fit Spaces, Spectral Decomposition, Eigenvectors [notes] | [BHK] Chap 3, Appendix | [A3], due on 10-23-2017
10-11-2017 | 12. Geometry of Maximum Noise Fraction, Applications of Low Rank Matrix Decomposition [notes] | [BHK] Chap 10 |
10-16-2017 | 13. Geometry of SVD, Pseudo-Inverse, Over- and Under-constrained Linear Systems, Optimization [notes] | [BHK] Chap 6 |
10-18-2017 | 14. Geometry of Johnson-Lindenstrauss, Restricted Isometry, Compressed Sensing, Sparse Vectors, Optimization [notes] | [BHK] Chap 2, 10 |
10-23-2017 | 15. Recap of Lessons Learnt. Question/Answer Session (Lecture Notes 1 - 14, Assignments #1 - #3) | [BHK] Chapters 1-3, 6, 10, Appendix | [A3 Solutions]
10-25-2017 | MIDTERM in CLASS [solutions] | |
10-30-2017 | 16. Geometry of Constrained Optimization, Lagrange Multipliers [notes] | [BHK] Chap 10 | [A4]; Solution Template; due 11-13-2017
11-01-2017 | 17. Geometry of Convex Programming, Linear Programming, Primal/Dual, Positive Semi-Definiteness [notes] | [BHK] Chap 10; [CVX] Chap 2 - 5 |
11-06-2017 | 18. Geometry of Machine Learning & VC Dimension [notes] | [BHK] Chap 5 | Project Topics
11-08-2017 | 19. Geometry of Support Vector Machines, Lagrangian Duals, Kernel Trick [notes] | [BHK] Chap 5; [CVX] Chap 5 |
11-13-2017 | 20. Geometry of Spectral Methods for Learning: PCA [notes] | [BHK] Chap 5 | [A4 Solutions]
11-15-2017 | 21. Geometry of Spectral Methods for Learning: Fisher LDA [notes] | [BHK] Chap 5 |
11-20-2017 | 22. Geometry of Spectral Methods for Learning: CCA [notes] | [BHK] Chap 5 |
11-22-2017 | THANKSGIVING BREAK (no class) | |
11-27-2017 | 23. Geometry of Clustering and Spectral Analysis [notes] | [BHK] Chap 7 |
11-29-2017 | 24. Geometry of Deep Learning: CNN, etc. [notes] | [BHK] Chap 5 |
12-04-2017 | PROJECT PRESENTATIONS | |
12-06-2017 | PROJECT PRESENTATIONS | |
12-11-2017 | WRAPUP | |

 

Class Project Topics List

Class Project Teams 

Grp. No. | Class Member Teams | Project Topic | PPT Day
1 | Peter Ackley, Nghia (Tommy) Huynh | Amazon Dataset | Dec 4
2 | Dhwanit Agarwal, Natasa Dragovic | Amazon RecSys | Dec 4
3 | Meghana Palukuri, Supawit Chockchowwat | Predicting pairwise docking | Dec 4
4 | Jose Magana, Ethan Arnold | Amazon Dataset | Dec 4
5 | Cameron Moeller | Ebola Health Survey | Dec 6
6 | Sidharth Kapur, Shaayaan Sayed | Fingerprint Detection | Dec 4
7 | Neil Patil | Bird Species Classification | Dec 6
8 | Joshua Pham, Marco A Guajardo | Bird Species Classification | Dec 4
9 | Eric Rincon | Object Classification | Dec 6
10 | Michael Scaria | Bird Species Classification | Dec 6
11 | Guneet Dhillon, Brahma Pavse | Image Classification with Adversarial Examples | Dec 4
12 | Max Granat | Fingerprint Matching | Dec 6
13 | Andrew Li | Digit Classification | Dec 6
14 | Tiffany Tsai | Bird Species Classification | Dec 6

 

Project FAQ

1. How long should the project report be?

Ans. As long as you like. For full points, please address each of the evaluation questions as succinctly as possible. Note that the report is due on Dec 11 at 11:59pm. You will receive feedback on your presentation next week, and you should incorporate it into your final report.

Tests

There will be one in-class midterm exam and one final project. The important deadline dates are:

  • Midterm: Wednesday, October 25, 9:00am - 10:30am
  • Final Project Due: Monday, Dec 11, 11:59pm

 

Assignments

There will be four written HW assignments and one final project report. Please refer to the schedule above for the assignment and final project report due dates.

 

Course Requirements and Grading

Grades will be based on these factors:

  • In-class attendance and participation (5%)
  • HW assignments (45%, plus a little extra credit). There are 4 assignments; some may include extra questions for extra points, which will be specified in the assignment sheet each time.
  • In-class midterm exam (20%)
  • Final Presentation & Report (30%)
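As a rough illustration of how the weights above combine, the following sketch computes a weighted course percentage. The component names and the function are hypothetical, and it assumes each component is first expressed as a percentage; how extra credit is actually folded in is not specified here, so the sketch simply lets the homework percentage exceed 100.

```python
# Weights taken from the grading breakdown above; the keys are illustrative names.
WEIGHTS = {
    "participation": 0.05,
    "homework": 0.45,
    "midterm": 0.20,
    "project": 0.30,
}


def course_score(component_scores):
    """Combine per-component percentages into one weighted course percentage.

    Assumption: each value is a percentage (0-100); the homework value may
    exceed 100 when extra-credit points are earned.
    """
    missing = set(WEIGHTS) - set(component_scores)
    if missing:
        raise KeyError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * component_scores[name] for name in WEIGHTS)


if __name__ == "__main__":
    example = {"participation": 100, "homework": 90, "midterm": 80, "project": 85}
    print(round(course_score(example), 2))  # 87.0
```

The weights sum to 100%, so a student with full marks on every component and no extra credit ends at exactly 100%.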

 

 

Students with Disabilities. Students with disabilities may request appropriate academic accommodations from the Division of Diversity and Community Engagement, Services for Students with Disabilities, 471-6259, http://www.utexas.edu/diversity/ddce/ssd 

 

Accommodations for Religious Holidays. By UT Austin policy, you must notify the instructor of your pending absence at least fourteen days prior to the date of observance of a religious holiday. If you must miss a class or an examination in order to observe a religious holiday, you will be given an opportunity to complete the missed work within a reasonable time before or after the absence, provided proper notification is given.

 

Statement on Scholastic Dishonesty. Anyone who violates the rules for the HW assignments or who cheats on in-class tests or the final exam is in danger of receiving an F for the course. Additional penalties may be levied by the Computer Science department, CSEM and the University. See http://www.cs.utexas.edu/academics/conduct/

 

 

 
