Sp19 - STAT/DISCRETE METHS SCI COMPUT (62905)

Instructor:

Professor Chandrajit Bajaj

  • Lecture Hours: Mon, Wed 11:00 a.m. - 12:15 p.m., GDC 5.304
  • Office hours: Mon, Tue, Wed, 1:15 p.m. - 2:30 p.m., POB 2.324A
  • Contact: bajaj@cs.utexas.edu

NOTE: Most questions should be submitted to Canvas rather than emailed to the instructor. Please try to make a reservation a day in advance for office hours to avoid conflicts.

 

Teaching Assistant

Yi Wang

  • Office hours: Tue, Thu, 4:00 p.m. - 5:30 p.m., POB 2.102
  • Contact: panzer.wy@utexas.edu

Note: Please try to make a reservation a day in advance for office hours to avoid conflicts.

 

Lecture Time and Location: M W 11:00 a.m. – 12:15 p.m. in GDC 5.304

Course Motivation and Synopsis

As businesses and academic enterprises gather ever-increasing amounts of data and information, new challenges arise for data analysts. There is also a growing demand for reliable software that can parse these big data sets and make robust inferences from the information they contain.

This course dwells on the geometric foundations as well as the computational aspects of data science, machine learning and statistical inference. The topics span scalable data analysis and geometric optimization, weaving together discrete and continuous mathematics, computer science and statistics. Students will delve, in both breadth and depth, into dimensionality, sparsity, resolution, resolvability, recovery and prediction for a variety of data (sequences, streams, graphs, time series, images, video, hyperspectral), emanating from multiple sensors (big and small, slow and fast) and accumulated via the interactive WWW. Measurement errors, noise and outliers will be central to bounding the precision, bias and accuracy of the data analysis. The geometric insight and characterization gained provide the basis for designing new, and improving existing, approximation algorithms for NP-hard problems with better accuracy/speed tradeoffs.

An initial listing of lecture topics is given in the syllabus below. It is subject to modification, depending on the class background and the speed at which we cover ground. Homework exercises will be given roughly every two weeks. Assignment solutions turned in late will suffer a 10% per day reduction in credit, and a 100% reduction once solutions are posted. There will be a mid-term exam in class, with content similar to the homework exercises. A list of topics will also be assigned as individual (or pair-group) data science projects, with a written/oral presentation at the end of the semester. This project will be graded and is in lieu of a final exam.
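The late-submission rule can be made concrete with a small calculation. The sketch below is illustrative only and assumes the 10% deduction is taken from the original score for each day late (so a 90-point submission two days late keeps 72 points), floored at zero; the function name `late_credit` is hypothetical and not part of any course tooling.

```python
def late_credit(raw_score: float, days_late: int, solutions_posted: bool) -> float:
    """Return the credit kept for a late homework submission.

    Assumptions (not stated in the syllabus): the 10%-per-day penalty is
    applied linearly to the original score and cannot go below zero.
    """
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    # Once solutions are posted, late work receives no credit (100% reduction).
    if solutions_posted:
        return 0.0
    # Keep (100 - 10 * days_late)% of the score, never less than 0%.
    percent_kept = max(0, 100 - 10 * days_late)
    return raw_score * percent_kept / 100


print(late_credit(90, 0, False))  # on time → 90.0
print(late_credit(90, 2, False))  # two days late → 72.0
print(late_credit(90, 1, True))   # after solutions were posted → 0.0
```

Working in whole percentage points keeps the arithmetic exact for integer scores.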

The course is aimed at graduate students. Students in the 5-year master's programs, especially in CS, CSEM, ECE and Math, are welcome. You'll need math at the level of first-year graduate courses, plus linear algebra, geometry, introductory functional analysis, and probability and statistics (e.g., for CS and ECE students) or more discrete math (e.g., for CSEM and Math students).

Course Material.

  1. [B1] Chandrajit Bajaj. A Mathematical Primer for Computational Data Sciences (frequently updated).
  2. [BHK] Avrim Blum, John Hopcroft, Ravindran Kannan. Foundations of Data Science.
  3. [CVX] Stephen Boyd, Lieven Vandenberghe. Convex Optimization.
  4. [GBC] Ian Goodfellow, Yoshua Bengio, Aaron Courville. Deep Learning.
  5. [JK] Prateek Jain, Purushottam Kar. Non-Convex Optimization for Machine Learning.
  6. [MU] Michael Mitzenmacher, Eli Upfal. Probability and Computing: Randomized Algorithms and Probabilistic Analysis.
  7. [SD] Shai Shalev-Shwartz, Shai Ben-David. Understanding Machine Learning: From Theory to Algorithms.
  8. [MRT] Mehryar Mohri, Afshin Rostamizadeh, Ameet Talwalkar. Foundations of Machine Learning.
  9. [HF] Hermann Flaschka. Principles of Analysis.
  10. Extra reference materials.

 

TENTATIVE COURSE OUTLINE (in flux)

Date Topic Reading Assignments
01-23-2019

1. Introduction to Data Science, Geometry of Data, High Dimensional Spaces   [notes]

Learning, Models, Applications I [notes] 

Mathematical Review [see refs in Reading]

 [BHK] Ch 1, 12-Appendix

[B1] Ch 1,2

[CVX] [MRT] Appendix

 [A1] out today

due before 02-04-2019, 11:59pm

01-28-2019

2. Learning Models, Applications II [notes]

Geometry of Vector, Matrix, Functional Norms  and Approximations [notes] 

Functional Approximation [notes]

[HF] Sec 1,2,3

[BHK] Ch 12-Appendix

 

01-30-2019

3. Noisy Regression & Probability Primer [notes]

[MU] Ch 1 -4

[B1] Appendix

02-04-2019

4. Machine Learning Theory, Empirical Risk Minimization, PAC Model [notes]

[MRT] Chap 1, 2

[MU] Chap 1

[A1] solution 

 [A2] out today

due before 02-18-2019, 11:59pm

02-06-2019

5. Sampling, High Dimensional Probability and Geometry, Mixture of Gaussians [notes]

[BHK] Ch 3

See Refs in Notes

 
02-11-2019

6. Transform Sampling [notes], Low-Discrepancy Quasi-Monte Carlo Sampling [slides], Integration Error: Koksma-Hlawka (H-K) Inequality [notes]

See Refs in Notes and Slides

 

 

02-13-2019

7. Compression via Low Rank Matrix Approximation with Applications [notes]

[BHK] [CVX] Appendix

See References in Notes

 

02-18-2019

8. Matrix Sampling, Matrix Sketching Algorithms [notes]

[BHK] Chap 6

See Refs in Notes

02-20-2019

9. Geometry of Optimization I [notes], Duality I [notes]

[CVX] Chap 1, 2, 3

See Refs in Notes

[A2] solution

 [A3] out today

due before 03-06-2019, 11:59pm

02-25-2019

10. Geometry of Optimization II [notes], Primal-Dual II [notes]

 

[B2]  Ch 5

[CVX] Ch 5, Ch 8

 

02-27-2019

11. Review: Geometry of Optimization [notes] [notes]

[CVX] Ch 5, Ch 8

[B2]  Ch 5

 

03-04-2019

12. Lagrange Multipliers, Lagrangian Dual [notes]

[BHK] Ch 12

[B2]  Ch 5

 

 

 

03-06-2019

13. Machine Learning: SVM, Kernel SVM Classification [notes]

See References in notes

[A3]  solution 

 [A4] out today

due before 03-27-2019, 11:59pm

03-11-2019

14. Spectral Methods for Learning: PCA, Kernel PCA [notes]

See References in notes

 

03-13-2019

 MIDTERM in Class

 

 

03-25-2019

15. Spectral Methods for Learning II: Fisher LDA, KDA [notes]

See Refs in Notes

 

03-27-2019

16. Geometry of Unsupervised Clustering I: K-Means, Kernel, Optimization [notes]

Geometry of Unsupervised Clustering II: Min Cut, Normalized Cut [notes]

[BHK] Ch 7

[A4]  solution 

[A5] out 

due before 04-10-2019, 11:59pm

04-01-2019

17. Random Projections, Johnson-Lindenstrauss, RIP Matrices, Compressive Sensing [notes]

[BHK] Ch 2

 

04-03-2019

18. Learning Models from Data: Convex and Non-Convex Optimization [notes]

[JK]  Ch 3,4

 

04-08-2019

 19. Robust Sparse Recovery; Alternating Minimization  [notes2]

[JK]  Ch 9

Final Projects Assigned

 

04-10-2019

20. Robust Sparse Recovery; Recovery Guarantees [notes]

[JK] Ch 7,8

[A5] Solution 

04-15-2019

21. Statistical Estimation, Stochastic Optimization: Parametric & Non-Parametric Distribution Estimation [Notes], Maximum Likelihood, MAP [Notes]

 

[CVX] Ch 7

 

04-17-2019

22. Statistical Machine Learning I: Alternating Maximization, Expectation Maximization (soft Clustering) [Notes]

[JK]  Ch 5

 

 

 

04-22-2019

23. Statistical Machine Learning II: Latent Variable Models, AM-LVM, Separating Gaussian Mixtures, Mixed Linear Regression [Notes]

Stochastic Gradient Descent: Simulated Annealing, Fokker-Planck [notes]

[JK] Ch 5

04-24-2019

24. Statistical Independence, Independent and Canonical Correlation Analysis [notes]

 

See Refs in Notes

04-29-2019

 

25. Geometry of Deep Learning I [notes]

[BHK] Ch 5

See Refs in Notes

 

05-01-2019

26.  Geometry of Deep Learning II  [notes]

 

[BHK] Ch 5

[GBC] Chap 6,9

 

 

05-06-2019

27. Geometry of Deep Learning III    [notes]

[GBC]  Chap 7-8

See also Ref

05-10-2019

28. Geometry of Deep Learning IV, Applications [notes]

[GBC] Chap 10-12

05-13-2019

Final PROJECT PRESENTATION DAY

 

POB 2.402, ViZ Lab

Final Project Report Due

 

Project FAQ

1. How long should the project report be?

Answer: See directions in the Class Project List. For full points, please address each of the evaluation questions as succinctly as possible. Note that the deadline for the report is May 11 midnight. You will receive feedback on your presentations; please incorporate it into your final report.

Tests

There will be one in-class midterm exam and one final project. The important deadline dates are:

  • Midterm: Wednesday, March 13, 11:00 a.m. - 12:15 p.m., GDC 5.304
  • Final Project  Written Report, Due: May 10, 11:59pm

 

Assignments

There will be five written HW assignments and one final project report. Please refer to the schedule above for assignment and final project report due dates.

 

Course Requirements and Grading

Grades will be based on the following factors:

  • In-class attendance and participation (10%)
  • HW assignments (45%, with potential for extra credit)

Five assignments. Some assignments may include extra questions for extra points (these will be specified on each assignment sheet).

  • In-class midterm exam (20%) 
  • Final Presentation & Report (25%) 
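Because the four weights sum to 100%, the course grade is a weighted average of the component scores. The sketch below assumes each component is already expressed as a percentage; `course_grade` and the component keys are hypothetical names, and since the syllabus does not say how extra credit is capped, it is not modeled here.

```python
# Weights in percent, taken from the grading list above.
WEIGHTS = {
    "participation": 10,
    "homework": 45,
    "midterm": 20,
    "project": 25,
}
assert sum(WEIGHTS.values()) == 100  # the components cover the whole grade


def course_grade(scores: dict[str, float]) -> float:
    """Weighted course grade from per-component percentages (0-100).

    Hypothetical helper: extra-credit handling is not modeled, so a homework
    score above 100 simply passes through unchanged.
    """
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


print(course_grade({"participation": 100, "homework": 90, "midterm": 80, "project": 85}))  # → 87.75
```

For example, 10% of 100, 45% of 90, 20% of 80 and 25% of 85 add up to 87.75.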

 

 

Students with Disabilities. Students with disabilities may request appropriate academic accommodations from the Division of Diversity and Community Engagement, Services for Students with Disabilities, 471-6259, http://www.utexas.edu/diversity/ddce/ssd . 

 

Accommodations for Religious Holidays. By UT Austin policy, you must notify the instructor of your pending absence at least fourteen days prior to the date of observance of a religious holiday. If you must miss a class or an examination in order to observe a religious holiday, you will be given an opportunity to complete the missed work within a reasonable time before or after the absence, provided proper notification is given.

 

Statement on Scholastic Dishonesty. Anyone who violates the rules for the HW assignments or who cheats on in-class tests or the final exam is in danger of receiving an F for the course. Additional penalties may be levied by the Computer Science department, CSEM and the University. See http://www.cs.utexas.edu/academics/conduct/
