Fa18 - GEOMETRIC FNDTNS OF DATA SCI (51735)
Instructor:
Professor Chandrajit Bajaj
- Office hours – Mon, Tue, Wed. 1:15 p.m. - 2:30 p.m. POB 2.324A
- Contact: bajaj at cs.utexas.edu
NOTE: Please submit most questions through Canvas rather than emailing the instructor. To avoid conflicts, please try to reserve an office-hour slot a day in advance.
Teaching Assistant
Yi Wang
- Office hours – Tue, Thu. 3:00 p.m. - 4:30 p.m. POB 2.102
- Contact: panzer.wy@utexas.edu
Note: To avoid conflicts, please try to reserve an office-hour slot a day in advance.
Lecture Time and Location: M W 9:30 – 10:45 a.m. in GDC 4.302
Course Motivation and Synopsis
As businesses and academic enterprises gather ever-increasing amounts of data and information, new challenges arise for data analysts. There is also a growing demand for reliable software that can parse these big data sets and make robust inferences from the information they contain.
This course covers the geometric foundations and computational aspects of data science, machine learning, and statistical inference. The topics span scalable data analysis and geometric optimization, weaving together discrete and continuous mathematics, computer science, and statistics. Students shall explore, in both breadth and depth, dimensionality, sparsity, resolution, resolvability, recovery, and prediction for a variety of data (sequence, stream, graph-based, time-series, images, video, hyper-spectral) emanating from multiple sensors (big and small, slow and fast) and accumulated via the interactive WWW. Issues of measurement error, noise, and outliers shall be central to bounding the precision, bias, and accuracy of the data analysis. The geometric insight and characterization gained provide the basis for designing and improving approximation algorithms for NP-hard problems with better accuracy/speed tradeoffs.
An initial listing of lecture topics is given in the course outline below. This is subject to modification, depending on the class's background and the speed at which we cover ground. Homework exercises shall be given roughly every two weeks. Assignment solutions turned in late shall suffer a 10% per day reduction in credit, and a 100% reduction once solutions are posted. There will be an in-class mid-term exam whose content will be similar to the homework exercises. A list of topics will also be assigned as individual (or pair/group) data science projects, with a written report and an oral presentation at the end of the semester. This project shall be graded and is in lieu of a final exam.
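As a rough illustration of the late policy stated above, the sketch below computes the credit kept for a late submission. It rests on assumptions not spelled out in the syllabus: the 10% per day is taken as a linear deduction from the earned score, lateness is counted in whole days, and the function name `late_credit` and its parameters are hypothetical. Check with the instructor for the exact rule.

```python
def late_credit(score, days_late, solutions_posted=False, penalty_per_day=0.10):
    """Return the credit kept for a submission under the late policy.

    Assumptions (not confirmed by the syllabus): the 10% per-day reduction
    is linear in the number of whole days late, and credit never goes
    below zero.
    """
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    if solutions_posted:
        # 100% reduction once solutions are posted.
        return 0.0
    kept_fraction = max(0.0, 1.0 - penalty_per_day * days_late)
    return score * kept_fraction


# A submission that earned 90 points and was turned in 2 days late.
print(f"{late_credit(90, 2):.1f}")  # prints 72.0
```

Under this linear reading, a submission loses all credit after ten days even if solutions have not yet been posted.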
The course is aimed at senior undergraduates and students in the 5-year master's program, especially those in CS, CSEM, and MATH, but others are welcome. You’ll need math at the level of a senior undergraduate, plus linear algebra, plus introductory functional analysis, probability and statistics (e.g., for CS and ECE students) or more discrete math (e.g., for CSEM students).
Course Material.
- [B1] Chandrajit Bajaj. A Mathematical Primer for Computational Data Sciences (frequently updated).
- [BHK] Avrim Blum, John Hopcroft, and Ravindran Kannan. Foundations of Data Science.
- [CVX] Stephen Boyd, Lieven Vandenberghe. Convex Optimization.
- [GBC] Ian Goodfellow, Yoshua Bengio, Aaron Courville. Deep Learning.
- [JK] Prateek Jain, Purushottam Kar. Non-Convex Optimization for Machine Learning.
- [MU] Michael Mitzenmacher, Eli Upfal. Probability and Computing (Randomized Algorithms and Probabilistic Analysis).
- [SD] Shai Shalev-Shwartz, Shai Ben-David. Understanding Machine Learning: From Theory to Algorithms.
- Extra reference materials.
TENTATIVE COURSE OUTLINE (in Flux).
Date | Topic | Reading | Assignments |
---|---|---|---|
08-29-2018 | 1. Introduction to Data Science, Geometry of Data, High Dimensional Spaces [notes]; Learning Models, Applications I [notes] | [BHK] Ch 1; [B2] Ch 1 | [A1] out today, due before 09-12-2018, 11:59pm |
09-05-2018 | 2. Geometry of Vector, Matrix, Functional Norms and Approximations [notes]; Supplementary Notes [notes]; Learning Models, Applications II [notes] | [B2] Ch 2; [BHK] Ch 12-Appendix | |
09-10-2018 | 3. Probability Primer [notes] | [BHK] Ch 12; [B1] Appendix | |
09-12-2018 | 4. Sampling, High Dimensional Probability and Geometry [notes]; Low Discrepancy Sampling [slides] | [B2] Ch 5; [BHK] Ch 2 | [A1] due; [A2] out today |
09-17-2018 | 5. Spectral Decomposition, SVD, Applications [notes] | [BHK] Ch 3; See Refs in Notes | |
09-19-2018 | 6. Applications of Low Rank Matrix Approximation [notes] | [BHK] Ch 3; See Refs in Notes | |
09-24-2018 | 7. Geometry of Matrix Norms, Optimization, Under- and Over-constrained Linear Systems [notes] | [CVX] Ch 1,2, Appendix; See Refs in Notes | |
09-26-2018 | 8. Geometry of Convex Optimization, Duality-I [notes] | [CVX] Ch 3,4; See Refs in Notes | |
10-01-2018 | 9. Geometry of Optimization, Primal-Dual [notes] | See Refs in Notes | |
10-03-2018 | 10. Geometry of Machine Learning, Perceptron, SVM, Kernel SVM [notes] | [BHK] Ch 5; [B2] Ch 5 | |
10-08-2018 | 11. Geometry of Machine Learning, PCA, Primal-Dual [notes]; Applications [notes] | [BHK] Ch 10,12; [B2] Ch 5 | |
10-10-2018 | 12. Spectral Methods for Learning: Sparse and Kernel PCA [notes] | [BHK] Ch 12; [B2] Ch 5 | [A3] due; [A4] out today |
10-15-2018 | 13. Geometry of Spectral Methods for Learning: Fisher LDA, KDA, Applications [notes] | See References in notes | |
10-17-2018 | 14. Geometry of Spectral Methods for Learning: CCA, QR, Applications [notes] | | |
10-22-2018 | 15. Matrix Sampling, Matrix Sketching (PAC) Algorithms [notes] | [BHK] Ch 2, Appendix; [B2] Ch 5 | |
10-24-2018 | 16. Random Projections, Johnson-Lindenstrauss, Compressive Sensing, Sparse Recovery [notes] | | [A4] due on Oct 28; [A5] out Oct 28 |
10-29-2018 | 17. Convex and Non-Convex Projected Gradient Descent [notes] | [JK] Ch 3,4 | |
10-31-2018 | 18. Generalized Projected Gradient Descent: convergence criteria [notes] | [B2] Ch 2; [JK] Ch 2,3 | |
11-05-2018 | 19. Sparse Robust Recovery; Alternating Minimization I [notes] | [JK] Ch 6; [BHK] Ch 10 | |
11-07-2018 | 20. Sparse Robust Recovery; Alternating Minimization II [notes] | See Refs in Notes | [A5] due Nov 11, 11:59pm |
11-12-2018 | 21. Review (Lecture Topics 1-20, Assignments 1-5) | Notes, Solutions to Assignments | |
11-14-2018 | 21. Mid-Term Exam (in class) | | Projects Assigned |
11-19-2018 | 22. Bayes Rule [notes], Expectation Maximization, Maximum Likelihood [notes] | [BHK] Ch 5,12; [CVX] Ch 6 | |
11-21-2018 | Thanksgiving Holiday | | |
11-26-2018 | 23. Alternating Maximization for Expectation Maximization, Gaussian Mixture Models [notes] | See Refs in Notes | Additional Notes on Final Project (Background & Datasets) Digital Pathology |
11-28-2018 | 24. Geometry of Clustering I: Kernel, Optimization [notes]; 25. Geometry of Clustering II: Spectral Analysis [notes] | [BHK] Ch 7; See Refs in Notes | |
12-03-2018 | 26. CNN, RNN, Geometry of Deep Learning, Applications, Next Steps [notes] | [BHK] Ch 5; [GBC] Chap 6-9 | |
12-05-2018 | 28. CNN, RNN, Geometry of Deep Learning, Applications, Next Steps [notes] | [GBC] Chap 10-12; See also Refs in Notes | |
12-10-2018 | Final PROJECT PRESENTATION DAY (Project Doc) | POB 2.402, ViZ Lab | |
12-10-2018 | Final Project Report Due | Due by 9am | |
Project FAQ
1. How long should the project report be?
Answer: See directions in the Class Project List. For full points, please address each of the evaluation questions as succinctly as possible. Note that the report is due on December 10. You will get feedback on your presentation, which should also be incorporated into your final report.
Tests
There will be one in-class midterm exam and one final project. The important deadline dates are:
- Midterm: Wednesday, November 14, 9:30am - 10:45am (in class, per the course outline above)
- Final Project Due: Dec 10, 11:59pm
Assignments
There will be five written HW assignments and one final project report. Please refer to the schedule above for the due dates of the assignments and the final project report.
Course Requirements and Grading
Grades will be based on these factors
- In-class attendance and participation (10%)
- HW assignments (45%, with potential for extra credit)
5 assignments. Some assignments may include extra questions worth extra points; these will be specified on each assignment sheet.
- In-class midterm exam (20%)
- Final Presentation & Report (25%)
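The weights listed above combine into a course percentage as a weighted sum. The following is a minimal sketch of that arithmetic; the component names, the `course_grade` function, and the choice to let homework exceed 100 (to reflect extra credit) are illustrative assumptions, and the syllabus does not say how extra credit is capped or how percentages map to letter grades.

```python
# Weights taken from the grading breakdown: 10% + 45% + 20% + 25% = 100%.
WEIGHTS = {
    "participation": 0.10,
    "homework": 0.45,
    "midterm": 0.20,
    "final_project": 0.25,
}


def course_grade(scores):
    """Return the weighted course percentage.

    `scores` maps each component name to a percentage (0-100). Homework may
    exceed 100 when extra-credit points are earned; whether that is capped
    is an assumption left open here.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise KeyError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


example = {"participation": 100, "homework": 90, "midterm": 80, "final_project": 85}
print(f"{course_grade(example):.2f}")  # prints 87.75
```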
Students with Disabilities. Students with disabilities may request appropriate academic accommodations from the Division of Diversity and Community Engagement, Services for Students with Disabilities, 471-6259, http://www.utexas.edu/diversity/ddce/ssd
Accommodations for Religious Holidays. By UT Austin policy, you must notify the instructor of your pending absence at least fourteen days prior to the date of observance of a religious holiday. If you must miss a class or an examination in order to observe a religious holiday, you will be given an opportunity to complete the missed work within a reasonable time before or after the absence, provided proper notification is given.
Statement on Scholastic Dishonesty. Anyone who violates the rules for the HW assignments or who cheats on in-class tests or the final exam is in danger of receiving an F for the course. Additional penalties may be levied by the Computer Science department, CSEM, and the University. See http://www.cs.utexas.edu/academics/conduct/