Sp16 - COMP STAT APPL TO BIOINFMATICS (51415)

Sp16 - COMP STAT APPL TO BIOINFMATICS (51415)

CSE 383M (62965) Statistical and Discrete Methods for Scientific Computing

CS395T (51415) Computational Statistics with Application to Bioinformatics

Spring 2016

The University of Texas at Austin 
Department of Computer Science 

 

Instructor:

Professor Chandrajit Bajaj

  • Office hours – Monday 11:00 a.m. - 12:00 p.m., 2:00 p.m - 3:00 p.m. POB 2.324A
  • Contact: bajaj at cs.utexas.edu

NOTE: Most questions should be submitted to Canvas rather than by sending emails to the instructor. Please make reservation a day before for the office hour in advance to avoid conflict. 

 

Teaching Assistant

George Lam

  • Office hours – Tuesday 10:00 a.m. - 11:00 a.m., Thursday 2:00 p.m. - 3:00 p.m., POB 2.102
  • Contact: geocklam at cs.utexas.edu

Note: Please make reservation a day before for the office hour in advance to avoid conflict. 

 

Lecture Time and Location: M W 12:30 – 2:00 p.m. in GDC 3.516

 

Prerequisites

This Course dwells on fundamentals and algorithms for data in high dimensions with provably probabilistic bounds on the accuracy. The emphasis shall be on high dimensional linear and non-linear geometric data with applications in Bioinformatics and Scientific Computing. The course is aimed at first or second year graduate students, especially in the CSEM, CS, and ECE programs, but others are welcome. You’ll need math at the level of at least 2nd year calculus, plus linear algebra, plus either more continuous math (e.g., CS and ECE students) or more discrete math (e.g., CSEM students). 

 

Textbook and Course Material.

 

TENTATIVE COURSE OUTLINE. 

Date Topic Reading Assignments
01-20-2016 High Dimensional Spaces, Probability, Law of Large Numbers [notes]  [BHK] Chap2, Appendix
 
01-25-2016 Cube, Spheres, Gaussians in High Dimensions, Central Limit Theorem [notes]  [BHK] Chap2, Appendix

 A1

due on 02-05-2016, 11:59pm

01-27-2016 Bernoulli Trials, Probability Distributions, Binomial, Poisson, Normal [notes]  [BHK] Appendix  
02-01-2016 Markov, Chebyshev Tail Bounds, Chernoff Bounds [notes]  [BHK] Appendix  
02-03-2016  Uniform Sampling of Spheres, Gaussians in High Dimension, Separating Mixtures of Gaussians [notes]  [BHK] Chap2, Chap3 A1 Solution
02-08-2016

Distribution Sampling, Unbiased Estimators, pairwise independence [notes]

ss Ch14&15,  [BHK] Appendix    A2

due on 02-24-2016, 11:59pm

02-10-2016

Sampling, k-wise independence [notes]

 [MU] Ch13  
02-15-2016

Discrepancy and dispersion, Monte Carlo and integration [notes]

 [MU] Ch13  
02-17-2016

Quasi Monte Carlo (van der Corput, Halton, Hammersley), application to molecular docking and protein folding [notes]

 [MU] Ch10  
02-22-2016 Random walks, Markov Chain Monte Carlo, Metropolis Hasting, Gibbs sampling [notes]

 [BHK] Chap5,

 [MU] Ch7 & 10

A2 Solution
02-24-2016 Random Matrices, Fast Approximate Matrix Multiplication [notes]   [BHK] Chap7 A3

due on 03-13-2016, 11:59pm

02-29-2016 Matrix Sketching, Non-uniform Sampling [notes]  [BHK] Chap7  
03-02-2016 Eigenvalues, Eigenvectors, Singular Vectors, Real Spectral Theorem [notes]  [BHK] Appendix, Chap3  
03-07-2016 SVD, Low Rank Matrix Approximation  [BHK] Chap3  
03-09-2016

Fast SVD, Principal Components, and bioinformatics

 [BHK] Chap3

 
03-14-2016

Spring break

 
03-16-2016  Spring break    
03-21-2016

Johnson-Lindenstrauss Lemma

 [BHK] Chap2  A3 Solution 
03-23-2016 Midterm Solution
03-28-2016

Approximate Nearest Neighbors (ANN), Locality Sensitive Hashing (LSH), and applications

 See Reading List for ANN and LSH papers  
03-30-2016  Dynamic packing grid (DPG) and spherical range queries  See Reading list for DPG paper [CB1]
04-04-2016

Compressive Sensing, Sparse Vectors

 [BHK] Chap 10     A4

due on 04-17-2016, 11:59pm

04-06-2016 Basis pursuit (Orthogonal Matching Pursuit and variants) and application in spectroscopy  See Reading list for  papers on OMP and FTIR spectroscopy  
04-11-2016 Machine Learning :Probably Approximately Correct, Occam Razor, Hoeffding Bounds, Regularizers  [BHK] Chap6  
04-13-2016 Perceptron, Learning, Threshold Function, SVM  [BHK] Chap6 A4 Solution
04-18-2016 Non-linear SVM Kernel Methods;   [BHK] Chap6
04-20-2016  Shatter Functions, VC-Dimension  [BHK] Chap6  
04-25-2016  Clustering and SDP Kernels 

[CB2]  paper

 [BHK] Chap8

  A5

due on 05-06-2016, 11:59pm

04-27-2016 Multivariate Geometric Optimization, Convex Programming,

[CB3]  paper

 Reading list

 
05-02-2016

Global Polynomial Optimization over Semi-algebraic sets; Polynomial Equation System Solving; Toeplitz, Hankel, Resultants, Bezoutians

Reading List  

 
05-04-2016 Applications in Bioinformatics

Reading List

 [paper]

A5 solution
05-12-2016 Final Exam  GDC 4.304  2 - 5pm   Solution

 

Tests

There will be one in-class midterm exam and one final exam. The proposed dates for the two tests are:

  • Midterm: Wednesday, March 23
  • Final: Thursday, May 12  GDC 4.304  2 - 5pm

 

Assignments

There will be five written HW assignments. Please refer to the above schedule for assignments due time. You are allowed to discuss with each other, but all solutions must be written individually.

 

Course Requirements and Grading

Grades will be based on these factors

  • In-class attendance and participation (extra 10%)
  • HW assignments (60% and a little more extra credit) 

5 assignments. Some assignments may have extra questions for extra points you can earn. (They will be specified in the assignment sheet each time.)

  • In-class midterm exam (20%) 
  • In-class final exam (20%) 

 

Students with Disabilities. Students with disabilities may request appropriate academic accommodations from the Division of Diversity and Community Engagement, Services for Students with Disabilities, 471-6259, http://www.utexas.edu/diversity/ddce/ssd 

 

Accommodations for Religious Holidays. By UT Austin policy, you must notify the instructor of your pending absence at least fourteen days prior to the date of observance of a religious holiday. If you must miss a class or an examination in order to observe a religious holiday, you will be given an opportunity to complete the missed work within a reasonable time before or after the absence, provided proper notification is given.

 

Statement on Scholastic Dishonesty. Anyone who violates the rules for the HW assignments or who cheats in in-class tests or the final exam is in danger of receiving an F for the course. Additional penalties may be levied by the Computer Science department,  CSEM  and the University. See http://www.cs.utexas.edu/academics/conduct/

 

 

 

Course Summary:

Date Details