Sp16 - COMP STAT APPL TO BIOINFMATICS (51415)
Sp16 - COMP STAT APPL TO BIOINFMATICS (51415)
CSE 383M (62965) Statistical and Discrete Methods for Scientific Computing
CS395T (51415) Computational Statistics with Application to Bioinformatics
The University of Texas at Austin Department of Computer Science
Professor Chandrajit Bajaj
- Office hours – Monday 11:00 a.m. - 12:00 p.m., 2:00 p.m - 3:00 p.m. POB 2.324A
- Contact: bajaj at cs.utexas.edu
NOTE: Most questions should be submitted to Canvas rather than by sending emails to the instructor. Please make reservation a day before for the office hour in advance to avoid conflict.
- Office hours – Tuesday 10:00 a.m. - 11:00 a.m., Thursday 2:00 p.m. - 3:00 p.m., POB 2.102
- Contact: geocklam at cs.utexas.edu
Note: Please make reservation a day before for the office hour in advance to avoid conflict.
Lecture Time and Location: M W 12:30 – 2:00 p.m. in GDC 3.516
This Course dwells on fundamentals and algorithms for data in high dimensions with provably probabilistic bounds on the accuracy. The emphasis shall be on high dimensional linear and non-linear geometric data with applications in Bioinformatics and Scientific Computing. The course is aimed at first or second year graduate students, especially in the CSEM, CS, and ECE programs, but others are welcome. You’ll need math at the level of at least 2nd year calculus, plus linear algebra, plus either more continuous math (e.g., CS and ECE students) or more discrete math (e.g., CSEM students).
Textbook and Course Material.
- [Baj1] Chandrajit Bajaj. Multi-scale Bio-Modeling and Visualization.
- [Baj2] Chandrajit Bajaj. Molecular Structural Bioinformatics: A Computational Science Perspective.
- [BG] Chandrajit Bajaj and Andrew Gillette. A Mathematical Primer For Computational Structural Biology.
- [BHK] Avrim Blum, John Hopcroft and Ravindran Kannan. Foundations of Data Science.
- Extra reference materials.
TENTATIVE COURSE OUTLINE.
|01-20-2016||High Dimensional Spaces, Probability, Law of Large Numbers [notes]|| [BHK] Chap2, Appendix
|01-25-2016||Cube, Spheres, Gaussians in High Dimensions, Central Limit Theorem [notes]||[BHK] Chap2, Appendix||
due on 02-05-2016, 11:59pm
|01-27-2016||Bernoulli Trials, Probability Distributions, Binomial, Poisson, Normal [notes]||[BHK] Appendix|
|02-01-2016||Markov, Chebyshev Tail Bounds, Chernoff Bounds [notes]||[BHK] Appendix|
|02-03-2016||Uniform Sampling of Spheres, Gaussians in High Dimension, Separating Mixtures of Gaussians [notes]||[BHK] Chap2, Chap3||A1 Solution|
Distribution Sampling, Unbiased Estimators, pairwise independence [notes]
|ss Ch14&15, [BHK] Appendix||
due on 02-24-2016, 11:59pm
Sampling, k-wise independence [notes]
Discrepancy and dispersion, Monte Carlo and integration [notes]
Quasi Monte Carlo (van der Corput, Halton, Hammersley), application to molecular docking and protein folding [notes]
|02-22-2016||Random walks, Markov Chain Monte Carlo, Metropolis Hasting, Gibbs sampling [notes]||
[MU] Ch7 & 10
|02-24-2016||Random Matrices, Fast Approximate Matrix Multiplication [notes]||[BHK] Chap7||
due on 03-13-2016, 11:59pm
|02-29-2016||Matrix Sketching, Non-uniform Sampling [notes]||[BHK] Chap7|
|03-02-2016||Eigenvalues, Eigenvectors, Singular Vectors, Real Spectral Theorem [notes]||[BHK] Appendix, Chap3|
|03-07-2016||SVD, Low Rank Matrix Approximation||[BHK] Chap3|
Fast SVD, Principal Components, and bioinformatics
|[BHK] Chap2||A3 Solution|
Approximate Nearest Neighbors (ANN), Locality Sensitive Hashing (LSH), and applications
|See Reading List for ANN and LSH papers|
|03-30-2016||Dynamic packing grid (DPG) and spherical range queries||See Reading list for DPG paper [CB1]|
Compressive Sensing, Sparse Vectors
|[BHK] Chap 10|| A4,
due on 04-17-2016, 11:59pm
|04-06-2016||Basis pursuit (Orthogonal Matching Pursuit and variants) and application in spectroscopy||See Reading list for papers on OMP and FTIR spectroscopy|
|04-11-2016||Machine Learning :Probably Approximately Correct, Occam Razor, Hoeffding Bounds, Regularizers||[BHK] Chap6|
|04-13-2016||Perceptron, Learning, Threshold Function, SVM||[BHK] Chap6||A4 Solution|
|04-18-2016||Non-linear SVM Kernel Methods;||[BHK] Chap6|
|04-20-2016||Shatter Functions, VC-Dimension||[BHK] Chap6|
|04-25-2016||Clustering and SDP Kernels||
due on 05-06-2016, 11:59pm
|04-27-2016||Multivariate Geometric Optimization, Convex Programming,||
Global Polynomial Optimization over Semi-algebraic sets; Polynomial Equation System Solving; Toeplitz, Hankel, Resultants, Bezoutians
|05-04-2016||Applications in Bioinformatics||
|05-12-2016||Final Exam GDC 4.304 2 - 5pm||Solution|
There will be one in-class midterm exam and one final exam. The proposed dates for the two tests are:
- Midterm: Wednesday, March 23
- Final: Thursday, May 12 GDC 4.304 2 - 5pm
There will be five written HW assignments. Please refer to the above schedule for assignments due time. You are allowed to discuss with each other, but all solutions must be written individually.
Course Requirements and Grading
Grades will be based on these factors
- In-class attendance and participation (extra 10%)
- HW assignments (60% and a little more extra credit)
5 assignments. Some assignments may have extra questions for extra points you can earn. (They will be specified in the assignment sheet each time.)
- In-class midterm exam (20%)
- In-class final exam (20%)
Students with Disabilities. Students with disabilities may request appropriate academic accommodations from the Division of Diversity and Community Engagement, Services for Students with Disabilities, 471-6259, http://www.utexas.edu/diversity/ddce/ssd
Accommodations for Religious Holidays. By UT Austin policy, you must notify the instructor of your pending absence at least fourteen days prior to the date of observance of a religious holiday. If you must miss a class or an examination in order to observe a religious holiday, you will be given an opportunity to complete the missed work within a reasonable time before or after the absence, provided proper notification is given.
Statement on Scholastic Dishonesty. Anyone who violates the rules for the HW assignments or who cheats in in-class tests or the final exam is in danger of receiving an F for the course. Additional penalties may be levied by the Computer Science department, CSEM and the University. See http://www.cs.utexas.edu/academics/conduct/
The syllabus page shows a table-oriented view of the course schedule, and the basics of course grading. You can add any other comments, notes, or thoughts you have about the course structure, course policies or anything else.
To add some comments, click the "Edit" link at the top.