**Fall 2016: EE 520: Special Topics in Communications and Signal Processing:**

**Foundations of Statistical Machine Learning**

Instructor: Prof. Namrata Vaswani, email: namrata@iastate.edu

Web link: http://home.engineering.iastate.edu/~namrata/MachineLearning_class/

Time : Mon-Wed 10:50 - 12:10

Location: Coover 1012

This will be a special topics / seminars course in which we will discuss some recent works on statistical machine learning algorithms and their performance guarantees, but we will start by learning the math - random matrix theory (a.k.a. high-dimensional probability) - that is used in many of the proofs. The course should be of interest to graduate students from Electrical and Computer Engineering, Mathematics, Statistics, Computer Science, Industrial Engineering and others. A few topics can be added based on the students' research interests.

**Background required:**
solid knowledge of undergraduate-level probability and linear algebra as expected of an EE major, e.g., EE 322 and MATH 207 at Iowa State

**Recommended co-requisites:** EE 523, MATH 510.

**Office Hours:** Tues-Wed 2-3 or by appointment

**Phone:** 515-294-4012

**Office:** 3121 Coover Hall

**Grading:**

- 10% class participation
- 40% scribe notes for one paper that I will present
- 50% term paper (read and present on a paper/topic for one lecture, submit slides and a short report)

**Disability accommodation:** If you have a documented disability and anticipate needing accommodations in this course, please make arrangements to meet with me soon. You will need to provide documentation of your disability to the Disability Resources (DR) office, located on the main floor of the Student Services Building, Room 1076, 515-294-7220.

**Motivation for this course:** See Fig. 3.6 on page 53 of https://www.math.uci.edu/~rvershyn/papers/HDP-book/HDP-book.pdf: in 2D, most points of a standard Gaussian point cloud lie inside and on a circle, while in n dimensions, most points lie only near the surface of a hypersphere of radius \sqrt{n}.
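This norm concentration is easy to see numerically. A minimal sketch (the dimension and sample count are my own illustrative choices, not from the course notes):

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 1000, 2000                      # ambient dimension, number of points
X = rng.standard_normal((N, n))        # standard Gaussian point cloud in R^n
norms = np.linalg.norm(X, axis=1)

# The norms concentrate tightly around sqrt(n): the mean is close to
# sqrt(1000) ~ 31.6 and the spread is O(1), independent of n.
print(norms.mean(), norms.std())
```

In contrast, for n = 2 the same experiment produces norms spread over the whole range from 0 to a few units, which is the 2D picture in the figure.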

**Topics and Notes for each**

**A. Practice Exercises until the Oct 7 class. I encourage students to do one, two, or all of these:**

**a.** Vershynin book: Ex. 1.2.2, Ex. 1.2.3, Ex. 2.2.8, Ex. 2.5.10, Ex. 2.5.11, Ex. 3.1.7, Ex. 3.3.7

b. As with any math course, you really get the material only when you work out problems; if you use the Internet or any other source to help you, please mention it.

c. While we do not have homework in this course, I will use your responses to count towards class participation points.

**B. Probability background:**

a. Recap of EE 322 - 1 lecture

i. EE 322 notes, EE 322 problem sets (many of these are harder than what I use for my EE 322 offerings)

b. New(ish) material - 1 lecture

i. Notes
**C. Linear Algebra background:**

a. Parts of Chapters 0, 1, 2, 4, 5 of Matrix Analysis by Horn and Johnson

i. Notes

**D. Topics from Vershynin's tutorial and some from his book**

a. Links:

i. Tutorial: https://arxiv.org/abs/1011.3027

ii. Book: https://www.math.uci.edu/~rvershyn/papers/HDP-book/HDP-book.html# (much more detailed)

b. Sub-Gaussian & sub-exponential random variables - scalar case

i. Chap 2: Sections 2.4, 2.5, 2.6, 2.7, 2.8 (excluding most proofs)
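As a quick numerical sanity check on scalar sub-Gaussian tails (a sketch I am adding for illustration): Rademacher variables are the simplest sub-Gaussian example, and Hoeffding's inequality bounds the tail of their sample mean by 2 exp(-n t^2 / 2).

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials, t = 200, 20000, 0.1
# Rademacher (+-1) variables are sub-Gaussian; Hoeffding's inequality gives
# P(|sample mean| >= t) <= 2 exp(-n t^2 / 2).
X = rng.choice([-1.0, 1.0], size=(trials, n))
empirical_tail = np.mean(np.abs(X.mean(axis=1)) >= t)
hoeffding_bound = 2 * np.exp(-n * t ** 2 / 2)
print(empirical_tail, hoeffding_bound)   # empirical frequency sits below the bound
```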

c. Random vectors - concentration of the norm of a vector with independent sub-Gaussian entries, isotropic random vectors, sub-Gaussian random vectors, nice sub-Gaussian random vectors and examples, a spherical random vector is a nice sub-Gaussian one.

i. Chap 3: 3.1, 3.2.2, 3.3.1, 3.3.2, 3.3.3, 3.4 (except 3.4.4), including proofs

d. The formula E[X] = \int_0^\infty P(X > t) dt for X > 0, and its use to convert a high-probability bound into a bound on the expected value.
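As a worked instance of this conversion (my own hedged sketch, with c a generic constant): if \mathbb{P}(X > u + s) \le e^{-cs^2} for all s \ge 0, then integrating the tail bound gives

```latex
\mathbb{E}[X] = \int_0^\infty \mathbb{P}(X > t)\,dt
\le u + \int_0^\infty \mathbb{P}(X > u + s)\,ds
\le u + \int_0^\infty e^{-c s^2}\,ds
= u + \frac{1}{2}\sqrt{\frac{\pi}{c}} ,
```

so a high-probability bound of order u yields an expectation bound of order u plus a constant.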

e. Epsilon-net - what it is and how it is used

i. Chap 4: parts of 4.1, 4.2 (only Definitions 4.2.1, 4.2.2, Corollary 4.2.13), 4.4.1
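One way the net idea is used: the operator norm of A is a supremum of ||Ax|| over the unit sphere, and maximizing over a finite, well-spread set of points already nearly attains it. A rough numerical illustration (using random unit vectors as a cheap stand-in for an actual epsilon-net; all sizes are my own choices):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
true_norm = np.linalg.norm(A, 2)       # ||A|| = sup of ||Ax|| over the unit sphere

# Sample many unit vectors as a crude stand-in for an epsilon-net of S^4.
pts = rng.standard_normal((200000, 5))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)
net_max = np.max(np.linalg.norm(pts @ A.T, axis=1))

# The finite maximum always lower-bounds ||A||, and for a fine enough net it
# is within a (1 - 2*eps)^{-1} factor of it (Chap 4 of the book).
print(net_max, true_norm)
```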

f. Sub-Gaussian matrices & proofs of two key theorems (in Chap 4 of the book).

i. Most useful result: Theorem 4.6.1 and its generalizations (see exercises)

ii. See Chap 4, Sections 4.6 and 4.4.
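The content of Theorem 4.6.1, numerically: for a tall N x n matrix with independent isotropic sub-Gaussian rows, all singular values cluster in sqrt(N) +- C sqrt(n). A quick sketch (Gaussian rows; the sizes are my own choices):

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 4000, 50
A = rng.standard_normal((N, n))        # independent isotropic (sub-)Gaussian rows
s = np.linalg.svd(A, compute_uv=False)

# Theorem 4.6.1-style behavior:
#   sqrt(N) - C sqrt(n) <= s_min <= s_max <= sqrt(N) + C sqrt(n)
print(s[-1], np.sqrt(N), s[0])
```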

g. Lipschitz concentration results, no proofs

i. Define Lipschitz and locally Lipschitz functions.

ii. Chapter 5

h. Matrix Bernstein

i. Chap 5

ii. Original result, the matrix dilation idea, extension to sums of any n1 x n2 matrices

iii. Proof: to be done
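To see the inequality in action, here is a hedged numerical sketch (my own construction: a sum of N independent, zero-mean random symmetric matrices with operator norm at most 1, for which the variance parameter v = ||sum_k E[Z_k^2]|| = N/d is exact):

```python
import numpy as np

rng = np.random.default_rng(0)
d, N = 30, 5000
total = np.zeros((d, d))
for _ in range(N):
    sign = rng.choice([-1.0, 1.0])
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)
    total += sign * np.outer(u, u)     # Z_k = +-uu^T: E[Z_k] = 0, ||Z_k|| <= 1

# Here E[Z_k^2] = E[uu^T] = I/d, so v = N/d, and matrix Bernstein predicts
# ||sum_k Z_k|| <~ sqrt(2 v log(2d)) + log(2d)/3 with high probability.
bernstein = np.sqrt(2 * (N / d) * np.log(2 * d)) + np.log(2 * d) / 3
print(np.linalg.norm(total, 2), bernstein)
```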

**i. Matrix Bernstein versus the sub-Gaussian rows' matrices result (Theorem 4.6.1)**

i. Notes

ii. The notes contain both results stated using matching notation, and also a discussion of when to use which and why (written for a specific problem Vaswani has worked on).

**j. Applications covered so far in class:**

i. Covariance matrix estimation - Sec 4.7:

1. Application of Theorem 4.6.1
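A quick numerical companion to Sec 4.7 (the diagonal true covariance is my own illustrative choice): with N >> n samples, the sample covariance is close to the truth in operator norm, with error on the order of ||Sigma|| (sqrt(n/N) + n/N).

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 20, 20000                            # dimension, number of samples
Sigma = np.diag(np.linspace(1.0, 2.0, n))   # illustrative true covariance
X = rng.multivariate_normal(np.zeros(n), Sigma, size=N)
Sigma_hat = X.T @ X / N                     # sample covariance (mean known to be 0)

err = np.linalg.norm(Sigma_hat - Sigma, 2)
rate = np.linalg.norm(Sigma, 2) * (np.sqrt(n / N) + n / N)
print(err, rate)                            # err is on the order of rate
```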

**ii. Sparse recovery, Compressive Sensing:**

1. Given the RIP, compressive sensing works; see https://statweb.stanford.edu/~candes/papers/RIP.pdf for example.

2. Proof of the RIP for sub-Gaussian rows' matrices: easy proof via Theorem 4.6.1
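To make the RIP concrete, here is a Monte Carlo sanity check (not a proof, and only a lower bound on the true RIP constant since we sample sparse vectors rather than covering all of them; the dimensions and sparsity are my own choices): a scaled Gaussian matrix acts as a near-isometry on randomly drawn s-sparse unit vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, s = 400, 120, 5                  # signal dim, measurements, sparsity
A = rng.standard_normal((m, n)) / np.sqrt(m)

worst = 0.0
for _ in range(2000):
    x = np.zeros(n)
    support = rng.choice(n, size=s, replace=False)
    x[support] = rng.standard_normal(s)
    x /= np.linalg.norm(x)
    # RIP asks that | ||Ax||^2 - 1 | <= delta for ALL s-sparse unit x;
    # sampling only gives a lower bound on the true RIP constant delta.
    worst = max(worst, abs(np.linalg.norm(A @ x) ** 2 - 1.0))

print(worst)                           # small empirical RIP-like constant
```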

**iii. Sketching via the Johnson-Lindenstrauss (JL) Lemma:**

1. The JL Lemma from Exercise 5.3.3 is a direct application of Theorem 4.6.1, or really of its proof

2. JL Lemma implication: we can often use a "random matrix" for "dimension reduction"
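A hedged sketch of JL-style dimension reduction (the point count and dimensions are my own choices): projecting with a scaled Gaussian matrix approximately preserves all pairwise distances.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
n_pts, d, k = 30, 5000, 800            # points, ambient dim, projected dim
X = rng.standard_normal((n_pts, d))
G = rng.standard_normal((d, k)) / np.sqrt(k)   # random JL projection
Y = X @ G

# The ratio of projected to original pairwise distances stays close to 1.
ratios = [np.linalg.norm(Y[i] - Y[j]) / np.linalg.norm(X[i] - X[j])
          for i, j in combinations(range(n_pts), 2)]
print(min(ratios), max(ratios))
```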

**iv. Phaseless PCA / Low-Rank Phase Retrieval**

**1.** See Lemma 3.10 of https://arxiv.org/abs/1902.04972; this involves using the sub-exponential Bernstein inequality and epsilon-net arguments.

**v. PCA in Sparse Data-Dependent Noise**

1. Compare the use of matrix Bernstein and Theorem 4.6.1 for a specific problem

2. Original reference: ISIT 2018 paper with the same title.

k. Application to spectral clustering

i. To be done

l. Chaining and the background needed for it

i. TBD, may be covered by Prof. Ramamoorthy.

**E. Leave-one-out argument as used for Matrix Completion and Phase Retrieval:**

a. Interspersed with the Vershynin book material; for instance, we can talk about some of these after a, b, c above are done.

b. Some papers:

i. Gradient Descent with Random Initialization: Fast Global Convergence for Nonconvex Phase Retrieval

c. 3-5 lectures

**F. Broad overview of some other topics - for breadth**

a. Introduction to statistical learning concepts, and a brief overview of other ML topics

i. From EE 425X:

ii. Notes

iii. Notes-2 (on Deep Learning)

b. A few guest lectures

i. The first set of two, on Stochastic Bandits, is next week.

c. 2-4 lectures

**G. Student term papers**

a. Decide by Sept 30.

**A subset of papers that students can choose from; they can also pick others, but should consult with the professor first**

- List coming soon.

**OLD MATERIAL FROM 2016 TEACHING:**

**Introduction slides:** Intro

**Optimization:** subset of slides of Vandenberghe and Boyd

**Papers**

- Phase Retrieval via Alternating Minimization (Jain, Netrapalli, Sanghavi)
- Solving Random Quadratic Systems of Equations is Nearly as Easy as Solving Linear Equations (Chen and Candes)
- Low-rank Matrix Completion using Alternating Minimization (Jain, Netrapalli, Sanghavi)
- R. Vershynin, Estimation in High Dimensions: A Geometric Perspective
- Older EE 520 on Matrix Completion and Robust PCA: here
- Even older EE 520 on Compressive Sensing: here

**Term papers and Scribing - New 9/14/2016**

Scribe topics:

1. Add to the probability and linear algebra notes
2. Find linear algebra tricks in all the papers we present
3. Find probability tricks in all the papers we present

- Phase Retrieval via Alternating Minimization (Jain, Netrapalli, Sanghavi)
- Solving Random Quadratic Systems of Equations is Nearly as Easy as Solving Linear Equations (Chen and Candes)
- Low-rank Matrix Completion using Alternating Minimization (Jain, Netrapalli, Sanghavi)

Tentatively: R. Vershynin, Estimation in High Dimensions: A Geometric Perspective