Fall 2016: EE 520: Special Topics in Communications and Signal Processing:

Foundations of Statistical Machine Learning

Instructor: Prof. Namrata Vaswani, email: namrata@iastate.edu

Web link: http://home.engineering.iastate.edu/~namrata/MachineLearning_class/

Time: Mon-Wed 10:50 - 12:10

Location: Coover 1012

 

This will be a special topics / seminar course in which we will discuss some recent work on statistical machine learning algorithms and their performance guarantees. We will start by learning the math - random matrix theory (a.k.a. high-dimensional probability) - that is used in many of the proofs. The course should be of interest to graduate students from Electrical and Computer Engineering, Mathematics, Statistics, Computer Science, Industrial Engineering, and other departments. A few topics can be added based on the students' research interests.

 

Background required: solid knowledge of undergraduate-level probability and linear algebra, as expected of an EE major (EE 322 and MATH 207 at Iowa State).

 

Recommended co-requisites: EE 523, MATH 510.

Motivation for this course: see Fig 3.6 on page 53 of https://www.math.uci.edu/~rvershyn/papers/HDP-book/HDP-book.pdf. In 2D, most points of a standard Gaussian point cloud lie inside and on a circle, while in n dimensions most points lie very close to the surface of a hypersphere of radius \sqrt{n} (a thin spherical shell).
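A quick numerical check of this (a minimal sketch in Python/numpy; the 10,000-point sample and the choice n = 1000 are just for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    for n in (2, 1000):
        X = rng.standard_normal((10000, n))   # 10,000 standard Gaussian points in R^n
        norms = np.linalg.norm(X, axis=1)     # Euclidean norm of each point
        print(n, norms.mean(), norms.std(), np.sqrt(n))
    # For n = 2 the norms are spread out (mean ~1.25, std ~0.65); for n = 1000 they
    # concentrate tightly around sqrt(n) ~ 31.6 with std ~0.7, i.e., a thin spherical shell.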

 

Topics and Notes for each

 

A.     Practice exercises, until the Oct 7 class: I encourage students to do one, two, or all of these:

a.      Vershynin book: Ex. 1.2.2, Ex. 1.2.3, Ex. 2.2.8, Ex. 2.5.10, Ex. 2.5.11, Ex. 3.1.7, Ex. 3.3.7

b.      As with any math course, you really learn the material only when you work out problems; if you use the Internet or any other source for help, please mention it.

c.       While we do not have homework in this course, I will count your responses towards class participation points.

B.      Probability background:

a.       Recap of EE 322 - 1 lecture

                                                    i.      EE 322 notes, EE 322 problem sets (many of these are harder than what I use for my EE 322 offerings)

b.      New(ish) material - 1 lecture

                                                    i.      Notes

C.    Linear Algebra background:

a.       parts of Chapters 0, 1, 2, 4, 5 of Matrix Analysis by Horn and Johnson

                                                    i.      Notes

D.    Topics from Vershynin's tutorial and some from his book

a.       Links:

                                                    i.      Tutorial: https://arxiv.org/abs/1011.3027 

                                                 ii.      Book:  https://www.math.uci.edu/~rvershyn/papers/HDP-book/HDP-book.html#  (much more detailed)

b.      Sub-Gaussian & sub-exponential random variables (scalar case)

                                                    i.      Chap 2, 2.4, 2.5, 2.6, 2.7, 2.8 (excluding most proofs)
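(Working definitions, stated roughly from memory - see Chap 2 for the precise versions: X is sub-Gaussian if P(|X| \ge t) \le 2\exp(-t^2/K^2) for all t \ge 0, equivalently \|X\|_{\psi_2} := \inf\{s > 0 : E\exp(X^2/s^2) \le 2\} is finite; X is sub-exponential if P(|X| \ge t) \le 2\exp(-t/K), equivalently \|X\|_{\psi_1} is finite. Gaussian and bounded random variables are sub-Gaussian, and the square of a sub-Gaussian random variable is sub-exponential.)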

c.       Random vectors - concentration of the norm of a vector with independent sub-Gaussian entries, isotropic random vectors, sub-Gaussian random vectors, nice sub-Gaussian random vectors and examples, the spherical random vector is a nice sub-Gaussian.

                                                    i.      Chap 3, 3.1, 3.2.2, 3.3.1, 3.3.2, 3.3.3, 3.4 (except 3.4.4), including proofs

d.      The tail-integral formula for E[X] when X \ge 0, and its use to convert a high-probability bound into a bound on the expected value (sketch below).
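A sketch of the idea, with illustrative constants: for X \ge 0, E[X] = \int_0^\infty P(X > t)\, dt. So if a high-probability bound of the form P(X > C(\sqrt{n} + u)) \le 2 e^{-u^2} holds for all u \ge 0, then substituting t = C(\sqrt{n} + u) gives E[X] \le C\sqrt{n} + 2C \int_0^\infty e^{-u^2}\, du = C(\sqrt{n} + \sqrt{\pi}).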

e.       Epsilon-nets - what they are and how they are used

                                                    i.      Chap 4, parts of 4.1, 4.2 (only Definitions 4.2.1, 4.2.2, Corollary 4.2.13), 4.4.1
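(Two facts we lean on, stated roughly from memory - see Chap 4 for the exact statements: the unit sphere S^{n-1} admits an \epsilon-net N with |N| \le (1 + 2/\epsilon)^n (Corollary 4.2.13); and for any matrix A and any \epsilon-net N of the sphere with \epsilon < 1, \|A\| \le (1-\epsilon)^{-1} \max_{x \in N} \|Ax\|, so a union bound over the finitely many net points controls the operator norm.)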

f.        Sub-Gaussian matrices & proofs of two key theorems (in Chap 4 of book).

                                                    i.      Most useful result: Theorem 4.6.1 and its generalizations (See exercises)

                                                 ii.      See Chap 4, Sections 4.6 and 4.4.
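(Rough statement of Theorem 4.6.1, from memory - see the book for the precise version: if A is an m x n matrix whose rows are independent, mean-zero, isotropic, sub-Gaussian random vectors with \|A_i\|_{\psi_2} \le K, then for every t \ge 0, with probability at least 1 - 2\exp(-t^2), \sqrt{m} - CK^2(\sqrt{n} + t) \le s_{\min}(A) \le s_{\max}(A) \le \sqrt{m} + CK^2(\sqrt{n} + t); i.e., a tall random matrix of this kind is an approximate isometry.)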

g.      Lipschitz concentration results, no proofs

                                                    i.      Define Lipschitz and locally Lipschitz functions.

                                                 ii.      Chapter 5

h.      Matrix Bernstein

                                                    i.      Chap 5

                                                 ii.      Original result, the matrix dilation idea, extension to sums of arbitrary n1 x n2 matrices

                                               iii.      Proof: to be done
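(Rough statement of matrix Bernstein, from memory - see Chap 5 for the exact version: let X_1, ..., X_N be independent, mean-zero, n x n symmetric random matrices with \|X_i\| \le K almost surely. Then for every t \ge 0, P( \|\sum_i X_i\| \ge t ) \le 2n \exp( - (t^2/2) / (\sigma^2 + Kt/3) ), where \sigma^2 = \|\sum_i E[X_i^2]\|. The rectangular n1 x n2 case follows by applying this to the Hermitian dilations [[0, S_i],[S_i^T, 0]].)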

i.        Matrix Bernstein versus the result for matrices with sub-Gaussian rows (Theorem 4.6.1)

                                                    i.      Notes

                                                 ii.      Contains both results stated using matching notation, along with a discussion of when to use which and why (written for a specific problem Vaswani has worked on)

j.        Applications covered so far in class:

                                                    i.      Covariance matrix estimation – Sec 4.7:

1.      Application of Theorem 4.6.1

                                                 ii.      Sparse recovery, Compressive Sensing:

1.      Given the Restricted Isometry Property (RIP), i.e., (1-\delta_s)\|x\|_2^2 \le \|Ax\|_2^2 \le (1+\delta_s)\|x\|_2^2 for all s-sparse vectors x, compressive sensing recovery works; see https://statweb.stanford.edu/~candes/papers/RIP.pdf for example.

2.      Proof of RIP for matrices with sub-Gaussian rows: an easy proof via Theorem 4.6.1

                                               iii.      Sketching via Johnson-Lindenstrauss (JL) Lemma:

1.      JL Lemma from Exercise 5.3.3 is a direct application of Theorem 4.6.1 or really of its proof

2.      JL Lemma implication: we can often use a “random matrix” for “dimension reduction” (see the small numerical sketch after this applications list)

                                                iv.      Phaseless PCA / Low-Rank Phase Retrieval

1.      See Lemma 3.10 of https://arxiv.org/abs/1902.04972; this involves using the sub-exponential Bernstein inequality and epsilon-net arguments.

                                                  v.      PCA in Sparse Data-Dependent Noise

1.      Compare use of matrix Bernstein and Theorem 4.6.1 for a specific problem

2.      Original reference: the ISIT 2018 paper with the same title.
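The small numerical sketch referenced under the Johnson-Lindenstrauss item above (a minimal illustration in Python/numpy; the dimensions and the scaled Gaussian sketching matrix are just one convenient choice):

    import numpy as np

    rng = np.random.default_rng(1)
    n, d, k = 200, 10000, 300                      # 200 points in R^10000, projected down to R^300
    X = rng.standard_normal((n, d))
    A = rng.standard_normal((k, d)) / np.sqrt(k)   # random Gaussian "sketching" matrix
    Y = X @ A.T                                    # dimension-reduced points

    # pairwise distances are approximately preserved after projection
    for i, j in [(0, 1), (2, 3), (4, 5)]:
        ratio = np.linalg.norm(Y[i] - Y[j]) / np.linalg.norm(X[i] - X[j])
        print(round(ratio, 3))                     # ratios close to 1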

k.      Application to spectral clustering

                                                    i.      To be done

l.        Chaining and background needed for Chaining

                                                    i.      TBD, may be covered by Prof. Ramamoorthy.

E.     Leave-one-out argument as used for Matrix Completion and Phase Retrieval:

a.       Interspersed with the Vershynin book material; for instance, we can talk about some of these papers after items a, b, c above are done.

b.      Some papers:

                                   i.    Implicit Regularization in Nonconvex Statistical Estimation: Gradient Descent Converges Linearly for Phase Retrieval, Matrix Completion, and Blind Deconvolution

                                 ii.    Gradient Descent with Random Initialization: Fast Global Convergence for Nonconvex Phase Retrieval

c.       3-5 lectures

F.     Broad overview of some other topics - for breadth

a.       Introduction to statistical learning concepts, and a brief overview of other ML topics

                                                    i.      from EE 425X:

                                                 ii.      Notes

                                               iii.      Notes-2 (on Deep learning)

b.      A few guest lectures

                                                    i.      The first set of two, on Stochastic Bandits, is next week.

c.       2-4 lectures

G.    Student term papers

a.       Decide by Sept 30.

 

A subset of papers that students can choose from; they may also pick others, but should consult with the professor.

List coming soon.

OLD MATERIAL FROM 2016 TEACHING:

Introduction slides: Intro

Optimization: a subset of slides by Vandenberghe and Boyd

Term papers and Scribing – New 9/14/2016

Scribe topics

1. Add to the probability and linear algebra notes
2. Find linear algebra tricks in all the papers we present
3. Find probability tricks in all the papers we present
4.

Phase retrieval via Alternating Minimization (Jain, Netrapalli, Sanghavi)
Solving Random Quadratic Systems of Equations Is Nearly as Easy as Solving Linear Systems (Chen and Candes)
Low-rank Matrix Completion using Alternating Minimization (Jain, Netrapalli, Sanghavi)

Tentatively: R. Vershynin, Estimation in High Dimensions: A Geometric Perspective