Optimization method implementation project
Project report and code due: 11:59 pm Nov 24, 2019, on Courseworks
Synopsis
The project involves implementing an optimization method from a research paper of your choosing (some suggestions are given below, and a few more will be added over the next few days). You will work in teams of 2-3 people to implement the optimization method, run experiments with the algorithm on some standard datasets, and compare it to baseline methods discussed in the paper or in class. The goal is to understand the optimization method in detail, to understand its benefits and shortcomings, and to critically evaluate the claims made in the research paper. Your findings will be summarized in a short (at most 4 pages) report and a short in-class presentation (details TBD). Your report should look like the "Experiments" section of an ICML, NeurIPS, or ICLR machine learning paper. You are not asked to invent new algorithms or convergence theory, but only to extend knowledge and insights about the experimental performance of an optimization method, and possibly to interpret this in view of existing theory. You will also need to submit your code on Courseworks.
Your grade on this project will be based on the following:
- Project report (submitted on Courseworks, 1/3 of project grade);
- Code (submitted on Courseworks, 1/3 of project grade);
- In-class presentation (PDF submitted on Courseworks, 1/3 of project grade).
Ground rules
The project should be done in teams of two or three students from the class; collaboration across teams is subject to the usual policies. Furthermore, the usual policies on outside references and academic honesty will be strictly enforced. Please form teams as soon as possible, start thinking about which paper you want to study, and email me the names of your team members and the paper you selected. Each team must study a different paper; if several teams want the same paper, I will allocate it to one of the interested teams uniformly at random. So I recommend coming up with around three papers you would be interested in studying.
Project report, code, and presentation details
The project report PDF, code, and a PDF of the presentation must be submitted (together in a single ZIP file) on Courseworks by 11:59 pm Nov 24. Put your code in a directory called "code" in the ZIP file.
Report
The report should be written in LaTeX using this template from Martin Jaggi's course. The report is strictly limited to 4 pages, although the bibliography may extend onto a 5th page.
The report should describe the following aspects:
- Optimization method studied
Give the pseudocode of the optimization method, a short description of the motivation for the development of the method, and the claimed benefits from the paper.
- Experimental setup
Mention the baseline algorithms used for comparison, datasets used, hyperparameter choices, and evaluation metrics.
- Experimental results
Describe your experimental findings concisely and clearly.
- Critical evaluation of claims
Critically evaluate the claims made in the research paper based on the experimental results.
- Conclusions from study
Describe your conclusions and inferences from the experimental results.
The report should be well-written and polished. Try to convey a clear story covering the most relevant aspects of your findings. Proofread your report before submission, and use a spell-checker. Plots are a great way to convey information that would be hard to express in writing. Make sure that your plots are understandable, with labeled axes, sensible axis limits, and a caption.
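For instance, a minimal matplotlib sketch of a properly labeled convergence plot (the curves, labels, and file name here are hypothetical placeholders, not real results):

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical convergence curves: suboptimality vs. epochs for two methods.
epochs = np.arange(1, 51)
baseline = 1.0 / epochs       # placeholder values, not real measurements
method = 1.0 / epochs**2      # placeholder values, not real measurements

plt.figure(figsize=(5, 3.5))
plt.semilogy(epochs, baseline, label="SGD (baseline)")
plt.semilogy(epochs, method, label="Paper's method")
plt.xlabel("Epoch")
plt.ylabel("Objective suboptimality (log scale)")
plt.xlim(1, 50)
plt.legend()
plt.tight_layout()
plt.savefig("convergence.pdf")  # vector formats scale well in LaTeX
```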
Code
In addition to the report, you must submit your code. Code should be written in Python, and we will only allow access to the following libraries:
- Python standard library
- cvxopt
- matplotlib
- numpy
- pandas
- scipy
- sklearn
- statsmodels
- pytorch
- tensorflow
While you are allowed to use pytorch or tensorflow, you must implement the main optimization method from your paper yourself; a sketch of what this means is given below.
If there is a method you would like to use that is not part of these toolboxes or libraries, you must implement it yourself. If there is any other library that you wish to use, you must get it approved by emailing me first.
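To illustrate the level of detail expected, here is a minimal numpy-only sketch of SVRG, a variance-reduction method related to several papers on the suggestion list. The toy least-squares problem, step size, and all names are illustrative; your implementation should follow the paper you chose.

```python
import numpy as np

def svrg(grad_i, n, w0, step, n_epochs, m):
    """Minimal SVRG sketch. grad_i(w, i) returns the gradient of the
    i-th component function at w; n is the number of components;
    m is the number of inner iterations per snapshot."""
    w = w0.copy()
    for _ in range(n_epochs):
        w_snap = w.copy()
        # Full gradient at the snapshot point.
        mu = np.mean([grad_i(w_snap, i) for i in range(n)], axis=0)
        for _ in range(m):
            i = np.random.randint(n)
            # Variance-reduced stochastic gradient estimate.
            g = grad_i(w, i) - grad_i(w_snap, i) + mu
            w -= step * g
    return w

# Toy problem: f(w) = (1/2n) * sum_i (x_i^T w - y_i)^2, with a planted solution.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
y = X @ rng.standard_normal(5)
grad_i = lambda w, i: (X[i] @ w - y[i]) * X[i]
w = svrg(grad_i, n=100, w0=np.zeros(5), step=0.02, n_epochs=30, m=200)
print(np.linalg.norm(X @ w - y))  # residual should be close to zero
```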
Your code should adhere to the following rules:
- No datasets: Do not submit any datasets with your code. Instead, provide URLs linking to the datasets on the internet (you may use cloud storage services like Dropbox for this purpose).
- Reproducibility: In your submission, provide a script (e.g. run.py) which produces all results and plots you use in the report (see the sketch after this list). A README file should be provided explaining how to run the code, including where the code expects to find the datasets after downloading them.
- Documentation: The code should be well-documented; this should be done in the source itself via comments.
- Size: When compressed in a ZIP file, the code should not exceed 1 MB in size.
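As a rough illustration (the file names and directory layout here are hypothetical; document your own in the README), a run.py driver might be structured as follows:

```python
"""run.py: reproduces every result and plot used in the report.

Datasets are NOT included in the submission; download them from the
URLs listed in the README into ./data/ before running this script.
"""
import os

DATA_DIR = "data"  # hypothetical layout; adjust and document in the README

def main():
    if not os.path.isdir(DATA_DIR):
        raise SystemExit("Datasets not found: see README for download URLs.")
    # 1. Load the datasets from DATA_DIR.
    # 2. Run the paper's method and each baseline with the exact
    #    hyperparameters reported in the experimental-setup section.
    # 3. Save every figure referenced in the report, e.g.
    #    plt.savefig("figures/convergence.pdf").

if __name__ == "__main__":
    main()
```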
In-class presentation
Each team will be required to present their findings in a short (probably around 10 minutes including 2 minutes for questions, details TBD) in-class presentation. You may use any software to prepare your presentation, but the final presentation should be exported to a PDF file which is included in the ZIP file submitted on Courseworks along with the report and the code.
The in-class presentations will be on Nov 26 and Dec 3, 2019. Each team will summarize their findings in the presentation. Be sure to include your critical evaluation and conclusions in the presentation.
Some generic advice (from Alekh Agarwal and Alex Slivkins):
- Aim to make your presentation as accessible as possible. Explain what the topic/problem is and why it is exciting. In a short talk, transmitting the high-level ideas and the excitement is more important than going into details.
- Try to distill crucial ideas and find simple high-level explanations for them. Include hard details if you can state them clearly and concisely, in a way accessible to the class.
- Strive to make the slides clear and uncrowded. Write the main points; there is no need to use full sentences or to put everything you intend to say on the slide.
- Use large enough font size so that it is visible from the back row.
- No need to make the slides fancy. In particular, a bulleted list with plain text is totally fine!
Paper suggestions
Here are some suggestions for papers to study. Feel free to find your own papers, but make sure to discuss them with me.
- Stochastic Dual Coordinate Ascent Methods for Regularized Loss Minimization by Shalev-Shwartz and Zhang.
- Accelerated Proximal Stochastic Dual Coordinate Ascent for Regularized Loss Minimization by Shalev-Shwartz and Zhang.
- A Universal Catalyst for First-Order Optimization by Lin, Mairal, and Harchaoui.
- Katyusha: The First Direct Acceleration of Stochastic Gradient Methods by Allen-Zhu.
- Second-Order Stochastic Optimization for Machine Learning in Linear Time by Agarwal, Bullins, and Hazan.
- Block-Coordinate Frank-Wolfe Optimization for Structural SVMs by Lacoste-Julien, Jaggi, Schmidt and Pletscher.
- Stochastic Optimization with Importance Sampling by Zhao and Zhang.
- Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n) by Bach and Moulines.
- Stochastic Variance Reduction for Nonconvex Optimization by Reddi, Hefny, Sra, Poczos, and Smola.
- Variance Reduction for Faster Non-Convex Optimization by Allen-Zhu and Hazan.
- The Case for Full-Matrix Adaptive Regularization by Agarwal, Bullins, Chen, Hazan, Singh, Zhang, and Zhang.
- Shampoo: Preconditioned Stochastic Tensor Optimization by Gupta, Koren, and Singer.
- Memory-Efficient Adaptive Optimization by Anil, Gupta, Koren, and Singer.
- Ellipsoidal Trust Region Methods for Neural Network Training by Adolphs, Kohler, and Lucchi.
- Optimizing Neural Networks with Kronecker-factored Approximate Curvature by Martens and Grosse.
- Communication Efficient Distributed Optimization using an Approximate Newton-type Method by Shamir, Srebro and Zhang.
- Momentum-Based Variance Reduction in Non-Convex SGD by Cutkosky and Orabona.
- Training Deep Networks without Learning Rates Through Coin Betting by Orabona and Tommasi.