COMS 4995-004 Optimization for Machine Learning (Fall 2019)

Time & venue
Tuesdays and Thursdays 8:40 am-9:55 am in Kent Hall 413
Instructor
Satyen Kale
Course email
teaching at satyenkale dot com [NOTE: I will not be checking my Columbia email, nor will I respond to course inquiries sent to my personal email]
Office hours
Tuesdays/Thursdays right after class, by appointment only; please email me at least a day in advance to set up a meeting. Location: CEPSR 7th floor, room 7LW1A (across from room 724).
TAs
Gurpreet Singh. Office hours: Mondays 12:00 - 1:00 pm. Location: Mudd 1st floor TA room.
Victor Dong. Office hours: Wednesdays 3:30 - 4:30 pm. Location: Mudd 1st floor TA room.
Chao Qin. Office hours: Thursdays 2:00 - 3:00 pm. Location: Uris Hall 4N.
Online forum
All class discussion will be held on Piazza.

Announcements

Dec 3
The final exam will be in IAB 405 on Thursday, 12/5 from 8:40-10:40 am.
Nov 14
Homework 4 is out.
Nov 12
The second make up class date has been finalized: Wednesday, Nov 13 from 8:40-9:55 am at Math 417.
Oct 28
The second make up class date has been finalized: Monday, Nov 4 from 8:40-9:55 am at Math 417. [Update: the Nov 4 make up class is cancelled because Nov 4 is a university holiday; see the Nov 12 announcement for the rescheduled date.]
Oct 22
Due to some travel, the classes for Thursday, Oct 24 and Thursday, Nov 7 are cancelled. I will do two make up classes. One make up class is scheduled for Friday, Nov 8 from 8:40-9:55 am at Mudd 633 (note: this is not our usual classroom). I am still working on setting up the second make up class. It will also be from 8:40-9:55 am, date and location TBD.
Oct 22
Project details are out. The project deliverables are due by 5 pm on Nov 24, 2019.
Sep 26
Homework 2 is out.
Sep 16
Office hours have been posted.
Sep 12
Homework 1 is out.
Aug 27
Sign up for Piazza using this link.

Schedule

9/3 Administrivia, ML basics, ML as optimization Read: BCN {Chapters 1, 2}, EH {Chapter 1}, CPA {Intro slides}
9/5 ML as optimization, Convexity Read: BCN {Chapters 1, 2}, EH {Chapter 1}, CPA {Intro slides}
9/10 Convexity Read: MJ {Chapter 1}, EH {Chapter 2, till Section 2.1}
9/12 Convexity Read: MJ {Chapter 1}, EH {Chapter 2, till Section 2.1}
Lecture 4 notes
9/17 Convexity Read: MJ {Chapter 1}, EH {Chapter 2, till Section 2.1}
Lecture 5 notes
9/19 Gradient descent Read: MJ {Chapter 2}
Lecture 6 notes
9/24 Gradient descent Read: MJ {Chapter 2}
Lecture 7 notes
9/26 Projected gradient descent Read: MJ {Chapter 3, till Section 3.4}
Lecture 8 notes
10/1 Projected gradient descent Read: MJ {Chapter 4 and Chapter 5, Section 5.1}
Lecture 9 notes
10/3 Stochastic gradient descent Read: MJ {Chapter 5}
Lecture 10 notes
10/8 Stochastic gradient descent Read: MJ {Chapter 5}
Lecture 11 notes
10/10 Stochastic gradient descent Read: MJ {Chapter 5}
Lecture 12 notes
10/15 Minibatched SGD, variance reduction methods Read: MJ {Chapter 5, Section 5.6}, CPA {Nov 29 lecture slides}
No lecture notes available; please use readings.
10/17 In-class midterm
10/22 SAG, SAGA, SVRG Read: CPA {Nov 29 lecture slides}
SVRG paper
Lecture 14 notes
10/24 Class cancelled; make up class TBD
10/29 SVRG Read: SVRG paper
Lecture 15 notes
10/31 Wrap up analysis of SVRG; Frank-Wolfe algorithm Read: SVRG paper, CMJ {Slides for lecture #9}, CEH {Lecture 7 notes}
Lecture 16 notes
11/5 No class (university holiday)
11/8 Class cancelled; make up class on 11/9
11/9 Frank-Wolfe algorithm; Accelerated Gradient Descent Read: CMJ {Slides for lecture #9}, CEH {Lecture 7 notes}, CMH {Lecture 7 notes}
Lecture 17 notes
11/12 Accelerated Gradient Descent Read: CEH {Lecture 7 notes}, CMH {Lecture 7 notes}
Lecture 18 notes
11/13 Mirror Descent Read: CEH {Lecture 5 notes (5.2.2 onward)}, SB {Chapter 4 (till 4.3)}
Lecture 19 notes
11/14 Mirror Descent Read: CEH {Lecture 5 notes (5.2.2 onward)}, SB {Chapter 4 (till 4.3)}
No lecture notes available; material from this class is included in the lecture notes for the next class, and the readings cover the material taught in this class.
11/19 Mirror Descent Read: CEH {Lecture 5 notes (5.2.2 onward)}, SB {Chapter 4 (till 4.3)}
Lecture 21 notes
11/21 Newton's method Read: CMH {Lecture 23}, BV {Chapter 9 (Section 9.5)}
Lecture 22 notes

Course information

Description
The course covers the theory of optimization for problems arising in machine learning. It will be highly mathematical and will involve the analysis of optimization algorithms. Topics covered will be a subset of the following: convex analysis, first-order methods (cutting plane, gradient descent, stochastic gradient methods, and variants), second-order methods (Newton, quasi-Newton such as L-BFGS, etc.), acceleration techniques for gradient methods (Nesterov, adaptive preconditioning, variance reduction, etc.), projection-free methods (Frank-Wolfe/conditional gradient), non-convex optimization, online learning, regularization, generalization, and zero-order optimization.
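To give a flavor of the algorithmic style of the course, here is a minimal sketch of gradient descent in Python. This is an illustration only, not course material: the quadratic objective, step size, and iteration count are arbitrary choices for the example.

    import numpy as np

    def gradient_descent(grad, x0, step_size=0.1, num_steps=100):
        # Fixed-step gradient descent: repeatedly step against the gradient.
        x = x0
        for _ in range(num_steps):
            x = x - step_size * grad(x)
        return x

    # Example: minimize f(x) = 0.5 * ||x - b||^2, whose gradient is x - b
    # and whose unique minimizer is x = b.
    b = np.array([1.0, -2.0])
    x_min = gradient_descent(lambda x: x - b, x0=np.zeros(2))
    print(x_min)  # approaches [1.0, -2.0]

The course makes statements of this kind precise: for example, how the step size should depend on smoothness and convexity properties of the objective, and how fast the iterates approach a minimizer.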
Prerequisites
Introductory machine learning: course similar to COMS 4771.
Multivariate calculus: textbook by Marsden and Weinstein; MIT open courseware.
Linear algebra: immersive linear algebra; lecture notes from UC Davis; MIT open courseware; book chapter by Goodfellow, Bengio, and Courville; additional review notes.
Probability: textbook by Grinstead and Snell; book chapter by Goodfellow, Bengio, and Courville; review notes by Arian Maleki and Tom Do.
Algorithms: Chapter 0 of the textbook by Dasgupta, Papadimitriou, and Vazirani (with discussion of asymptotic notation); "booksite" by Sedgewick and Wayne.
Mathematical maturity: notes on writing math in paragraph style from SJSU; notes on writing proofs from SJSU.
Python: You must be comfortable writing code to process and analyze data in Python. Some Python tips are available here.
Course work
Homework assignments: 20%.
Implementation project based on a research paper: 30%. Details were posted on Oct 22; deliverables are due by 5 pm on Nov 24, 2019.
Midterm exam: 20%. Covers topics from the first half of the course.
Final exam: 30%. Covers topics from the entire course, but with an emphasis on the second part of the course.
Lecture notes
All lectures will be on the board and no slides will be provided; you are strongly advised to take your own notes during lecture. Lecture notes scribed by TAs will be posted here, possibly with a delay of a couple of weeks. These notes will only be lightly edited, so they may contain errors. Pointers to readings for each lecture will be provided; they draw on materials available online from the textbooks and related courses listed below.
Textbooks
Convex Optimization: Algorithms and Complexity (SB) by Sébastien Bubeck
Convex Optimization (BV) by Stephen Boyd and Lieven Vandenberghe
Introductory Lectures on Convex Optimization (YN) by Yurii Nesterov
Lecture notes on Optimization for Machine Learning (EH) by Elad Hazan
Lecture notes on Optimization for Machine Learning (MJ) by Martin Jaggi and Bernd Gärtner
Optimization Methods for Large-Scale Machine Learning (BCN) by Léon Bottou, Frank E. Curtis and Jorge Nocedal
Related courses
Optimization for Machine Learning (CEH) by Elad Hazan
Optimization for Machine Learning (CMJ) by Martin Jaggi
Convex Optimization and Approximation (CMH) by Moritz Hardt
Convex Optimization (CPA) by Pradeep Ravikumar and Aarti Singh
Course materials
Copyright for course materials (e.g., lecture slides, homeworks, exams, solutions) is owned by their respective authors; course materials may not be redistributed without explicit permission.
Getting help
You are encouraged to use office hours and Piazza to discuss and ask questions about course material and reading assignments, and to ask for high-level clarification on and possible approaches to homework problems. If you need to ask a detailed question specific to your solution, please do so on Piazza and mark the post as "private" so only the instructors can see it.
Questions, of course, are also welcome during lecture. If something is not clear to you during lecture, there is a chance it may also not be clear to other students. So please raise your hand to ask for clarification during lecture. Some questions may need to be handled "off-line"; we'll do our best to handle these questions in office hours or on Piazza.

Homework assignments

Formatting policy
Your write-up must be neatly typeset as a PDF document.
It is strongly recommended that you prepare your write-up as a LaTeX document (intro, wikibook). If you do not use LaTeX, you must ensure that the mathematical typesetting is of equal quality and legibility. (Microsoft Office 2013 or later seems to do a reasonable job.)
If you use LaTeX, you should use the following template: homework.tex, homework.cls, homework.pdf. If you do not use LaTeX, you must use an equivalent template: in particular, ensure that your name, your UNI, and the UNIs of the students you discussed the homework with appear on the first page.
Submission policy
You must submit your write-up as a single PDF file, called uni.pdf where uni is replaced with your UNI (e.g., abc1234.pdf), on Courseworks by 1:00 pm of the specified due date. If any code is required, separate instructions will be provided.
Late policy
Late assignments will not be accepted.
9/12 Homework 1 [hw1.tex, also needs __includes.tex]. Solution. Due: 9/26 by 1:00 pm.
9/26 Homework 2 [hw2.tex, also needs __includes.tex]. Solution. Due: 10/12 by 1:00 pm.
10/31 Homework 3 [hw3.tex, also needs __includes.tex]. Solution. Due: 11/14 by 1:00 pm.
11/14 Homework 4 [hw4.tex, also needs __includes.tex]. Solution. Due: 11/29 by 1:00 pm.

Academic Rules of Conduct

Academic honesty
You are expected to adhere to the Academic Honesty policy of the Computer Science Department, as well as the following course-specific policies.
Collaboration
You are welcome and encouraged to discuss course materials, reading assignments, and homework assignments with each other in small groups (two to three people). You must list all discussants in your homework write-up. Discussion about homework assignments may include brainstorming and verbally discussing possible solution approaches, but must not go as far as one person telling others how to solve a problem. In addition, you must write up your solutions by yourself, and you may not look at another student's homework write-up/solutions (whether partial or complete).
Collaboration of any kind on exams is not permitted.
Outside references
Outside reference materials and sources (i.e., texts and sources beyond the assigned reading materials for the course) may be used on homework assignments only if given explicit written permission from the instructor. Such references must be appropriately acknowledged in the homework write-up. You must always write up your solutions in your own words.
Sources obtained by searching the internet for answers or hints on homework assignments are never permitted.
You are permitted to use texts and sources on course prerequisites (e.g., your linear algebra textbook). If you need to look up a result in such a source, provide a citation in your homework write-up.
If you inadvertently come across the solution to a homework problem, you must acknowledge this source and document the circumstance in your homework write-up, and then do your best to produce a solution without looking at the source. You must, as always, write your solution in your own words.
Violations
Violation of any portion of these policies will result in a penalty to be assessed at the instructor's discretion. This may include receiving a zero grade for the assignment in question AND a failing grade for the whole course, even for the first infraction. Such students are also reported to the relevant Deans' offices that handle cases of academic dishonesty.
Credits: Website design by Daniel Hsu, and policies and LaTeX template are by Rocco Servedio.