Fall 2022: High-Dimensional Probability for Machine Learning
Instructor: Prof. Namrata Vaswani
Contact: namrata@iastate.edu, 3021 Coover Hall, 4-4012
Time: 12:40–2:00 pm, Tues/Thurs
Location: Carver 0204
Overview: This course will teach (a) key topics from non-asymptotic random matrix theory (bounds on the minimum and maximum singular values of many classes of high-dimensional random matrices, as well as on sums of a large number of random matrices), and (b) other high-dimensional probability topics, such as chaining, that are commonly used in foundational data science and machine learning research. It will conclude with a discussion of a few recent papers that use the material; these papers can change from year to year. Most of the material involves heavy use of linear algebra and only basic probability concepts. The goal is to enable students to understand or develop provably accurate and fast algorithms for practically relevant ML problems.
Prerequisites: MATH 510 or 507 (or equivalent); EE 523 or STAT 542 (or equivalent). (A strong grasp of EE/STAT 322 and MATH 317 concepts may suffice too.)
Motivation for learning this material: See Fig. 3.6 on page 53 of https://www.math.uci.edu/~rvershyn/papers/HDPbook/HDPbook.pdf: in 2D, most points of a standard Gaussian point cloud lie inside and on a circle, while in n dimensions, most points lie only near the surface of a sphere of radius \sqrt{n}.
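This concentration-of-norm phenomenon is easy to verify numerically. A minimal sketch (not part of the course materials; the dimensions and sample size are arbitrary illustrative choices): the norm of a standard Gaussian vector in R^n concentrates around sqrt(n), with fluctuations that stay O(1) as n grows.

```python
import numpy as np

rng = np.random.default_rng(0)

for n in (2, 1000):
    # 10,000 points drawn from a standard Gaussian in R^n
    X = rng.standard_normal((10_000, n))
    norms = np.linalg.norm(X, axis=1)
    # the ratio tends to 1 and the spread stays O(1) as n grows:
    # the points cluster near the sphere of radius sqrt(n)
    print(n, norms.mean() / np.sqrt(n), norms.std())
```

For n = 2 the norms are spread over a wide range relative to sqrt(2); for n = 1000 almost all norms fall within a few units of sqrt(1000) ≈ 31.6.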
Textbook: High-Dimensional Probability: An Introduction with Applications in Data Science, by Roman Vershynin; download here: https://www.math.uci.edu/~rvershyn/papers/HDPbook/HDPbook.html#
Updated slides/notes for this course: https://drive.google.com/drive/folders/1WAa5eK9J93T5V_SWMxQWtMJ3UDsF__H
Syllabus, grading, and introduction: see this file
Accommodation: Please email me and I can provide most reasonable accommodations, or call 515-294-7220. More details are in the Syllabus file above.
Class notes
o Folder with updated class notes
o https://drive.google.com/drive/u/0/folders/1WAa5eK9J93T5V_SWMxQWtMJ3UDsF__H
o https://www.dropbox.com/scl/fo/a4t4yldr8n3rd7pcchrll/h?rlkey=skfrbpqd4sj8y92tjwjij55nu&dl=0
o Introduction
o Introduction and brief discussion of applications: first few slides of file
o Probability background and recap
o Recap of EE 322: file, Dropbox link (the Dropbox link is the latest version, but it may sometimes break during editing)
o Probability background: file, Dropbox link
o Week 1, 2
o Concentration bounds for different types of scalar r.v.s: Chap 2
o Scalar Bernoulli, general bounded, sub-Gaussian, and sub-exponential random variables summary: file, Dropbox link
o Detailed derivations: handwritten
§ Detailed derivations of equivalent definitions and properties for sub-Gaussian r.v.s
§ Detailed derivations of equivalent definitions and properties for sub-exponential r.v.s
o Application in boosting randomized algorithms and analyzing degrees of random graphs: also see book Chap 2 and the homework
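The boosting application can be checked numerically. A hedged sketch (not from the course materials; the Exp(1) target, sample sizes, and tolerance are arbitrary illustrative choices): repeating a crude randomized estimator k times and taking the median drives the failure probability down exponentially in k, via the Hoeffding/Chernoff bounds of Chap 2.

```python
import numpy as np

rng = np.random.default_rng(0)

def one_shot_estimate(n=30):
    # crude randomized estimator: mean of n samples from Exp(1) (true mean = 1)
    return rng.exponential(1.0, size=n).mean()

def boosted_estimate(k, n=30):
    # median-of-k boosting: the median fails only if at least half of the
    # k independent runs fail, which happens with probability exp(-c k)
    return np.median([one_shot_estimate(n) for _ in range(k)])

def failure_rate(estimator, trials=2000, tol=0.25):
    # empirical probability that an estimate misses the true mean 1 by > tol
    fails = sum(abs(estimator() - 1.0) > tol for _ in range(trials))
    return fails / trials

p1 = failure_rate(one_shot_estimate)            # one run
p9 = failure_rate(lambda: boosted_estimate(9))  # median of 9 runs
print(p1, p9)  # the boosted failure rate is far smaller
```

The same median trick is what makes Monte Carlo algorithms with constant success probability usable: k = O(log(1/δ)) repetitions bring the failure probability down to δ.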
o Week 2, 3, 4
o Linear algebra review and preliminaries
o Linear algebra review and preliminaries: file, Dropbox link
o Week 5
o High-dimensional random vectors and matrices, use of epsilon-nets: Chap 3 (3.1–3.4), Chap 4 (4.1–4.7)
o High-dimensional random vectors and matrices, use of epsilon-nets: file, Dropbox link
o Handwritten notes:
§ Chap 3 detailed handwritten notes
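The two-sided singular-value bounds from this part of the course can be sanity-checked numerically. A sketch (the dimensions are arbitrary choices): for an N×n matrix A with i.i.d. standard Gaussian entries, with high probability sqrt(N) − sqrt(n) ≲ σ_min(A) ≤ σ_max(A) ≲ sqrt(N) + sqrt(n), cf. the book's Theorem 4.6.1.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 2000, 100                     # tall N x n standard Gaussian matrix
A = rng.standard_normal((N, n))
s = np.linalg.svd(A, compute_uv=False)

# extreme singular values concentrate near sqrt(N) +/- sqrt(n)
print(s.max(), np.sqrt(N) + np.sqrt(n))
print(s.min(), np.sqrt(N) - np.sqrt(n))
```

For these dimensions the predictions are sqrt(2000) ± sqrt(100) ≈ 54.7 and 34.7, and the observed extreme singular values land within a couple of units of them.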
o Week
o Quadratic forms, Symmetrization: Chap 6, 6.1–6.6
o Quadratic forms and Symmetrization: file, Dropbox link
o Concentration without independence
o Sec 5.1, 5.2:
o Proof of Matrix Bernstein inequality, Sec 5.4:
o Parts of Chap 5, TBD – term papers
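A quick illustration of what the matrix Bernstein inequality buys over the triangle inequality — a hedged sketch, not from the course materials (matrix sizes and the ±1 entry distribution are arbitrary choices): the operator norm of a sum of m independent mean-zero random matrices grows like sqrt(m), far below the triangle-inequality bound, which grows like m.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 50, 400

# sum of m independent, mean-zero random symmetric matrices
S = np.zeros((n, n))
per_matrix_norms = []
for _ in range(m):
    B = rng.choice([-1.0, 1.0], size=(n, n))
    Z = (B + B.T) / 2                # symmetric, mean-zero entries
    per_matrix_norms.append(np.linalg.norm(Z, 2))
    S += Z

op_norm = np.linalg.norm(S, 2)       # spectral norm of the sum
trivial = sum(per_matrix_norms)      # triangle-inequality bound: grows like m
print(op_norm, trivial)              # ||S|| grows only like sqrt(m)
```

Matrix Bernstein makes the sqrt(m) scaling precise: ||S|| = O(sqrt(v log n) + L log n), where v is the norm of the matrix variance and L bounds each summand.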
o Random processes and Chaining
o Short summary: file, Dropbox link
o Parts of Chap 7, 8
o Applications of Chaps 1–4
o Low-rank and sparse recovery problems, AltMin, GD algorithms: file
§ Also contains a detailed proof of the application to the LRPR problem (my work)
o Application to PCA and community detection in networks: handwritten
o More: TBD
Old offering of part of this course as EE 520 in Fall 2019: https://www.ece.iastate.edu/~namrata/MachineLearning_class/
Homeworks
o Policies:
o Okay to use any source – books, internet, friends; cite whatever source, webpage, or classmate helped you.
o Carefully LaTeXed solutions are required (do not just copy what you find online or what a friend tells you).
o HW 3, 4 can be handwritten. I would suggest LaTeXing the ones with long solutions, but it is not required.
o A subset of problems will be graded (to be decided after the submission date).
o Homework 1: due Tuesday, Sept 13
o 1.2.2, 1.2.3
o 2.2.8, 2.2.9
o Homework 2: due Thursday, Sept 22
o 2.4.2, 2.4.3, 2.4.4
o 2.5.10, 2.6.5, 2.6.6; show that a sub-Gaussian r.v. is sub-exponential.
o Extra credit: 2.5.5, 2.5.11, 2.6.9
§ Hints for extra credit: 2.5.5, 2.5.11
o Solutions: see Chap 2 slides
o Homework 3: due Monday, Oct 31
o 3.1.5, 3.4.3; find an example where the claim of Ex 3.4.5 is true
o Quantify Remark 3.2.5 (extra credit)
o 4.4.3, 4.4.4, 4.1.6
Homework 4: due end semester
o
Community detection in networks: explore further beyond what is
covered in Sec 4.5 of the book. This can be a little group project for the
entire class. I will make a Discord channel for discussions on this. a.
Depending on each person’s interests, this exploration can be one or all of i.
more general community models, ii. better algorithms that do not require q to
be large to work iii. theoretical results for a and b and intuition on why the
proof goes through, along with refs for the proofs.\
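As one possible starting point for this exploration, here is a minimal sketch (not from the course materials) of spectral community detection on a two-community stochastic block model, the setting of Sec 4.5; the parameters p (within-community edge probability), q (across-community), and the sign-based recovery rule are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 200, 0.6, 0.1   # two communities of size n/2; p = within, q = across

# ground-truth labels: first half +1, second half -1
labels = np.concatenate([np.ones(n // 2), -np.ones(n // 2)])
# edge probability is p within a community, q across communities
probs = np.where(np.outer(labels, labels) > 0, p, q)
A = np.triu(rng.random((n, n)) < probs, 1)
A = (A + A.T).astype(float)          # symmetric 0/1 adjacency, no self-loops

# spectral clustering: sign of the eigenvector of the 2nd largest eigenvalue
# (the top eigenvector is roughly all-ones; the next one carries communities)
vals, vecs = np.linalg.eigh(A)       # eigh sorts eigenvalues in ascending order
pred = np.sign(vecs[:, -2])
acc = max(np.mean(pred == labels), np.mean(pred == -labels))
print(acc)                           # near 1 when p - q is large enough
```

Shrinking the gap p − q toward the information-theoretic threshold, or replacing the plain adjacency spectrum with a regularized or nonbacktracking variant, are exactly the directions the exploration above could take.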
o Chapter 6: rewrite (by hand or typed) the proofs of Sec 6.1 and 6.2
o Old from 2021
o Ex 4.4.3 (this proof will need the proof of 4.4.2)
o Ex 4.4.4
o Ex 4.1.6, and write out the proof of Lemma 4.1.5
o Ex 4.7.3
o Lower bound sigma_min(A) by using an epsilon-net argument (this will use the upper bound on ||A||)
o 3.1.5
o 3.4.3
o Quantify Remark 3.2.5
o Find an example where the claim of Ex 3.4.5 is true
o Extra credit: 3.3.1, 3.3.7
o (Can handwrite or LaTeX, but write clearly; since you are allowed to use any source, including friends or the internet, to find the solution idea, I would like to see effort in writing a clear-cut proof.)
Term Paper and Scribe topics
- See the Discord channels on each.