Fall 2022: High-Dimensional Probability for Machine Learning
Instructor: Prof. Namrata Vaswani
Contact: namrata@iastate.edu, 3021 Coover Hall, 4-4012
Time: 12:40–2:00 pm, Tues/Thurs
Location: Carver 0204
Overview: This course will teach (a) key topics from non-asymptotic random matrix theory (bounds on the minimum and maximum singular values of many classes of high-dimensional random matrices, as well as on sums of a large number of random matrices), and (b) other high-dimensional probability topics, such as chaining, that are commonly used in foundational data science and machine learning research. It will conclude with a discussion of a few recent papers that use the material; these papers can change from year to year. Most of the material involves heavy use of linear algebra and only basic probability concepts. The goal is to enable students to understand or develop provably accurate and fast algorithms for practically relevant ML problems.
Prerequisites: MATH 510 or 507 (or equivalent); EE 523 or STAT 542 (or equivalent). (A strong grasp of EE/STAT 322 and MATH 317 concepts may suffice too.)
Motivation for learning this material: See Fig. 3.6 on page 53 of https://www.math.uci.edu/~rvershyn/papers/HDPbook/HDPbook.pdf: in 2D, most points of a standard Gaussian point cloud lie inside and on a circle, while in n dimensions, most points lie only near the surface of a sphere of radius \sqrt{n}.
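This concentration-of-norm phenomenon is easy to verify numerically. A minimal sketch (not part of the course materials; the dimensions and sample size are arbitrary illustrative choices): the norm of a standard Gaussian vector in R^n concentrates around sqrt(n), with fluctuations that stay O(1) as n grows.

```python
import numpy as np

rng = np.random.default_rng(0)

for n in (2, 1000):
    # 10,000 points drawn from a standard Gaussian in R^n
    X = rng.standard_normal((10_000, n))
    norms = np.linalg.norm(X, axis=1)
    # the ratio tends to 1 and the spread stays O(1) as n grows:
    # the points cluster near the sphere of radius sqrt(n)
    print(n, norms.mean() / np.sqrt(n), norms.std())
```

For n = 2 the norms are spread over a wide range relative to sqrt(2); for n = 1000 almost all norms fall within a few units of sqrt(1000) ≈ 31.6.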
Textbook: High-Dimensional Probability: An Introduction with Applications in Data Science, by Roman Vershynin; download here: https://www.math.uci.edu/~rvershyn/papers/HDPbook/HDPbook.html#
Updated slides/notes for this course: https://drive.google.com/drive/folders/1WAa5eK9J93T5V_SWMxQWtMJ3UDsF__H
Syllabus, grading, and introduction: see this file
Accommodation: Please email me and I can provide most reasonable accommodations, or call 515-294-7220. More details are in the Syllabus file above.
Class notes
o Folder with updated class notes
o https://drive.google.com/drive/u/0/folders/1WAa5eK9J93T5V_SWMxQWtMJ3UDsF__H
o https://www.dropbox.com/scl/fo/a4t4yldr8n3rd7pcchrll/h?rlkey=skfrbpqd4sj8y92tjwjij55nu&dl=0
o Introduction
o Introduction and brief discussion of applications: first few slides of file
o Probability background and recap
o Recap of EE 322: file, Dropbox link (the Dropbox link is the latest version, but it may sometimes break during editing)
o Probability background: file, Dropbox link
o Week 1, 2
o Concentration bounds for different types of scalar r.v.s: Chap 2
o Scalar Bernoulli, general bounded, sub-Gaussian, and sub-exponential random variables summary: file, Dropbox link
o Detailed derivations: handwritten
§ Detailed derivations of equivalent definitions and properties for sub-Gaussian r.v.s
§ Detailed derivations of equivalent definitions and properties for sub-exponential r.v.s
o Application in boosting randomized algorithms and analyzing degrees of random graphs: also see book Chap 2 and the homework
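The boosting application can be checked numerically. A hedged sketch (not from the course materials; the Exp(1) target, sample sizes, and tolerance are arbitrary illustrative choices): repeating a crude randomized estimator k times and taking the median drives the failure probability down exponentially in k, via the Hoeffding/Chernoff bounds of Chap 2.

```python
import numpy as np

rng = np.random.default_rng(0)

def one_shot_estimate(n=30):
    # crude randomized estimator: mean of n samples from Exp(1) (true mean = 1)
    return rng.exponential(1.0, size=n).mean()

def boosted_estimate(k, n=30):
    # median-of-k boosting: the median fails only if at least half of the
    # k independent runs fail, which happens with probability exp(-c k)
    return np.median([one_shot_estimate(n) for _ in range(k)])

def failure_rate(estimator, trials=2000, tol=0.25):
    # empirical probability that an estimate misses the true mean 1 by > tol
    fails = sum(abs(estimator() - 1.0) > tol for _ in range(trials))
    return fails / trials

p1 = failure_rate(one_shot_estimate)            # one run
p9 = failure_rate(lambda: boosted_estimate(9))  # median of 9 runs
print(p1, p9)  # the boosted failure rate is far smaller
```

The same median trick is what makes Monte Carlo algorithms with constant success probability usable: k = O(log(1/δ)) repetitions bring the failure probability down to δ.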
o Week 2, 3, 4
o Linear algebra review and preliminaries
o Linear algebra review and preliminaries: file, Dropbox link
o Week 5
o High-dimensional random vectors and matrices, use of epsilon-nets: Chap 3 (3.1–3.4), Chap 4 (4.1–4.7)
o High-dimensional random vectors and matrices, use of epsilon-nets: file, Dropbox link
o Handwritten notes:
§ Chap 3 detailed handwritten notes
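The two-sided singular-value bounds from this part of the course can be sanity-checked numerically. A sketch (the dimensions are arbitrary choices): for an N×n matrix A with i.i.d. standard Gaussian entries, with high probability sqrt(N) − sqrt(n) ≲ σ_min(A) ≤ σ_max(A) ≲ sqrt(N) + sqrt(n), cf. the book's Theorem 4.6.1.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 2000, 100                     # tall N x n standard Gaussian matrix
A = rng.standard_normal((N, n))
s = np.linalg.svd(A, compute_uv=False)

# extreme singular values concentrate near sqrt(N) +/- sqrt(n)
print(s.max(), np.sqrt(N) + np.sqrt(n))
print(s.min(), np.sqrt(N) - np.sqrt(n))
```

For these dimensions the predictions are sqrt(2000) ± sqrt(100) ≈ 54.7 and 34.7, and the observed extreme singular values land within a couple of units of them.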
o Week
o Quadratic forms, Symmetrization: Chap 6, 6.1–6.6
o Quadratic forms and Symmetrization: file, Dropbox link
o Concentration without independence
o Sec 5.1, 5.2:
o Proof of Matrix Bernstein inequality, Sec 5.4:
o Parts of Chap 5, TBD – term papers
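A quick illustration of what the matrix Bernstein inequality buys over the triangle inequality — a hedged sketch, not from the course materials (matrix sizes and the ±1 entry distribution are arbitrary choices): the operator norm of a sum of m independent mean-zero random matrices grows like sqrt(m), far below the triangle-inequality bound, which grows like m.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 50, 400

# sum of m independent, mean-zero random symmetric matrices
S = np.zeros((n, n))
per_matrix_norms = []
for _ in range(m):
    B = rng.choice([-1.0, 1.0], size=(n, n))
    Z = (B + B.T) / 2                # symmetric, mean-zero entries
    per_matrix_norms.append(np.linalg.norm(Z, 2))
    S += Z

op_norm = np.linalg.norm(S, 2)       # spectral norm of the sum
trivial = sum(per_matrix_norms)      # triangle-inequality bound: grows like m
print(op_norm, trivial)              # ||S|| grows only like sqrt(m)
```

Matrix Bernstein makes the sqrt(m) scaling precise: ||S|| = O(sqrt(v log n) + L log n), where v is the norm of the matrix variance and L bounds each summand.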
o Random processes and Chaining
o Short summary: file, Dropbox link
o Parts of Chap 7, 8
o Applications of Chaps 1–4
o Low-rank and sparse recovery problems, AltMin, GD algorithms: file
§ Also contains a detailed proof of the application to the LRPR problem (my work)
o Application to PCA and community detection in networks: handwritten
o More: TBD
Old offering of part of this course as EE 520 in Fall 2019: https://www.ece.iastate.edu/~namrata/MachineLearning_class/
Homeworks
o Policies:
o Okay to use any source – books, internet, friends; cite whatever source, webpage, or classmate helped you.
o Carefully LaTeXed solutions are required (do not just copy what you find online or what a friend tells you).
o HW 3, 4 can be handwritten. I would suggest LaTeXing the ones with long solutions, but it is not required.
o A subset of problems will be graded (to be decided after the submission date).
o Homework 1: due Tuesday, Sept 13
o 1.2.2, 1.2.3
o 2.2.8, 2.2.9
o Homework 2: due Thursday, Sept 22
o 2.4.2, 2.4.3, 2.4.4
o 2.5.10, 2.6.5, 2.6.6; show that a sub-Gaussian r.v. is sub-exponential.
o Extra credit: 2.5.5, 2.5.11, 2.6.9
§ Hints for extra credit: 2.5.5, 2.5.11
o Solutions: see Chap 2 slides
o Homework 3: due Monday, Oct 31
o 3.1.5, 3.4.3; find an example where the claim of Ex 3.4.5 is true
o Quantify Remark 3.2.5 (extra credit)
o 4.4.3, 4.4.4, 4.1.6
Homework 4: due end semester
o
Community detection in networks: explore further beyond what is
covered in Sec 4.5 of the book. This can be a little group project for the
entire class. I will make a Discord channel for discussions on this. a.
Depending on each person’s interests, this exploration can be one or all of i.
more general community models, ii. better algorithms that do not require q to
be large to work iii. theoretical results for a and b and intuition on why the
proof goes through, along with refs for the proofs.\
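As one possible starting point for this exploration, here is a minimal sketch (not from the course materials) of spectral community detection on a two-community stochastic block model, the setting of Sec 4.5; the parameters p (within-community edge probability), q (across-community), and the sign-based recovery rule are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 200, 0.6, 0.1   # two communities of size n/2; p = within, q = across

# ground-truth labels: first half +1, second half -1
labels = np.concatenate([np.ones(n // 2), -np.ones(n // 2)])
# edge probability is p within a community, q across communities
probs = np.where(np.outer(labels, labels) > 0, p, q)
A = np.triu(rng.random((n, n)) < probs, 1)
A = (A + A.T).astype(float)          # symmetric 0/1 adjacency, no self-loops

# spectral clustering: sign of the eigenvector of the 2nd largest eigenvalue
# (the top eigenvector is roughly all-ones; the next one carries communities)
vals, vecs = np.linalg.eigh(A)       # eigh sorts eigenvalues in ascending order
pred = np.sign(vecs[:, -2])
acc = max(np.mean(pred == labels), np.mean(pred == -labels))
print(acc)                           # near 1 when p - q is large enough
```

Shrinking the gap p − q toward the information-theoretic threshold, or replacing the plain adjacency spectrum with a regularized or nonbacktracking variant, are exactly the directions the exploration above could take.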
o Chapter 6: rewrite (by hand or typed) the proofs of Sec 6.1 and 6.2
o Old from 2021
o Ex 4.4.3 (this proof will need the proof of 4.4.2)
o Ex 4.4.4
o Ex 4.1.6, and write out the proof of Lemma 4.1.5
o Ex 4.7.3
o Lower bound sigma_min(A) by using an epsilon-net argument (this will use the upper bound on ||A||)
o 3.1.5
o 3.4.3
o Quantify Remark 3.2.5
o Find an example where the claim of Ex 3.4.5 is true
o Extra credit: 3.3.1, 3.3.7
o (Can handwrite or LaTeX, but write clearly; since you are allowed to use any source, including friends or the internet, to find the solution idea, I would like to see effort in writing a clear-cut proof.)
Term Paper and Scribe topics
- See the Discord channels on each.