
Hi! PARIS Reading groups “Optimization for Machine Learning”

The Hi! PARIS reading groups study a topic through scientific articles, from both a theoretical and a practical point of view. They are opportunities for interaction between our corporate donors and our affiliated academic teams around selected topics of interest.

Each edition is planned for 2-4 sessions presenting one topic through 3-4 research papers. Each session combines a presentation of mathematical models and theoretical advances by a researcher with simulations in a Python notebook by an engineer.

Registration

Please register for the event using your professional email address to receive your personal conference link. Please do not share your personalised link with others; it is unique to you. You will receive an email regarding your registration status.

Optimization for Machine Learning

We are pleased to announce the next edition of our reading groups, focusing on optimization for machine learning. This series will explore various optimization methods and their applications in improving the performance and efficiency of machine learning models. We will delve into optimization techniques used in large-scale machine learning, such as the Stochastic Gradient method and its variants, and discuss both theoretical and practical considerations. Additionally, we will examine the effectiveness of gradient-based optimization methods in over-parameterized non-linear systems and neural networks, including the implications of the PL* condition for Stochastic Gradient Descent. These discussions aim to provide a comprehensive understanding of how optimization techniques can significantly enhance machine learning.
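As a taste of the material, here is a minimal sketch of plain stochastic gradient descent on a synthetic least-squares problem. The problem setup and hyper-parameter values are illustrative assumptions, not drawn from any of the sessions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic least-squares problem: minimize (1/2n) * ||A w - b||^2
n, d = 1000, 20
A = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
b = A @ w_true + 0.1 * rng.normal(size=n)

w = np.zeros(d)
step_size = 0.01
for t in range(5000):
    i = rng.integers(n)                  # sample one data point uniformly
    grad = (A[i] @ w - b[i]) * A[i]      # unbiased stochastic gradient
    w -= step_size * grad

print("distance to w_true:", np.linalg.norm(w - w_true))
```

The three sessions below each refine this basic loop in a different direction: its variants and their theory (session 1), why it works so well on over-parameterized networks (session 2), and how to remove the learning rate schedule (session 3).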

Session 1/3
Tuesday 8 October, 2024 – 2.00-3.30pm (Online)
  • Speaker: Radu Alexandru Dragomir, Télécom Paris – IP Paris (Hi! PARIS Chair Holder)
  • Title: Optimization Methods for Large-Scale Machine Learning
  • Abstract: In this talk, I will give an overview of optimization methods used in large-scale machine learning. These mainly include the Stochastic Gradient method and its variants, such as those using momentum, adaptive step sizes, or variance reduction, as well as second-order methods. This presentation aims to be a general introduction to both theoretical and practical considerations and will be based on the review paper by Bottou et al. (2018).

  • Paper: https://arxiv.org/pdf/1606.04838
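Ahead of this session, the sketch below shows one of the variants the talk will cover, SGD with heavy-ball momentum, on the same kind of illustrative least-squares problem as above; the setup and hyper-parameter values are again assumptions for illustration, not taken from the talk:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative least-squares setup (an assumption, not from the talk)
n, d = 1000, 20
A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

w = np.zeros(d)
velocity = np.zeros(d)
step_size, beta = 0.002, 0.9             # beta is the momentum coefficient

for t in range(5000):
    i = rng.integers(n)                  # sample one data point
    grad = (A[i] @ w - b[i]) * A[i]      # stochastic gradient at w
    velocity = beta * velocity - step_size * grad
    w += velocity                        # heavy-ball momentum update

print("final mean squared residual:", np.mean((A @ w - b) ** 2))
```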

Session 2/3
Tuesday 12 November, 2024 – 2.00-3.30pm (Online) 
  • Speaker: Sholom Schechtman, Télécom SudParis
  • Title: Loss landscapes and optimization in over-parameterized non-linear systems and neural networks
  • Abstract: The success of deep learning is due, to a large extent, to the remarkable effectiveness of gradient-based optimization methods applied to large neural networks. Recently, a possible explanation of this observation was proposed: close to its minimizers, a neural network satisfies the so-called PL* condition. The aim of this talk is to present this condition, establish its implications for Stochastic Gradient Descent, and show that it is satisfied for typical, sufficiently wide neural networks.
  • Paper: https://arxiv.org/abs/2003.00307
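To give a concrete flavour of the PL* condition before the session, the sketch below checks it on the simplest over-parameterized model, linear least squares with more parameters than equations, where gradient descent provably converges linearly to zero loss. The factor-2 normalization of the inequality and the problem sizes are my own assumptions for illustration; the talk extends this kind of analysis to wide neural networks:

```python
import numpy as np

rng = np.random.default_rng(2)

# Over-parameterized linear least squares: more parameters (d) than
# equations (n), so the minimum of L(w) = 0.5 * ||A w - b||^2 is exactly 0.
n, d = 20, 100
A = rng.normal(size=(n, d))
b = rng.normal(size=n)

# For this linear model, ||grad L(w)||^2 >= 2 * mu * L(w) holds with
# mu = smallest eigenvalue of A A^T -- a PL*-type inequality.
evals = np.linalg.eigvalsh(A @ A.T)
mu, smoothness = evals.min(), evals.max()

w = np.zeros(d)
eta = 1.0 / smoothness                   # classical step size for smooth GD
for t in range(200):
    residual = A @ w - b
    loss = 0.5 * residual @ residual
    grad = A.T @ residual
    assert grad @ grad >= 2 * mu * loss - 1e-9   # PL* inequality in action
    w -= eta * grad

# Linear (geometric) convergence to zero loss, as PL* predicts
print("final loss:", 0.5 * np.linalg.norm(A @ w - b) ** 2)
```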

Session 3/3
Tuesday 10 December, 2024 – 2.00-3.30pm (Online) 
  • Speaker: Joon Kwon, INRAE
  • Title: The Road Less Scheduled
  • Abstract: Existing learning rate schedules that do not require specification of the optimization stopping step T are greatly out-performed by learning rate schedules that depend on T. We propose an approach that avoids the need for this stopping time by eschewing the use of schedules entirely, while exhibiting state-of-the-art performance compared to schedules across a wide family of problems ranging from convex problems to large-scale deep learning problems. Our Schedule-Free approach introduces no additional hyper-parameters over standard optimizers with momentum. Our method is a direct consequence of a new theory we develop that unifies scheduling and iterate averaging. An open source implementation of our method is available.
  • Paper: https://arxiv.org/pdf/2405.15682
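The Schedule-Free method replaces a learning rate schedule with an interpolation between an SGD-style iterate and its running average. Below is a minimal sketch of the Schedule-Free SGD update as I read it from the paper; the quadratic test problem and hyper-parameter values are illustrative assumptions, and the authors' open-source implementation mentioned in the abstract remains the reference:

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative stochastic least-squares problem (an assumption, not from the paper)
n, d = 1000, 20
A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

# Schedule-Free SGD tracks three sequences: z (SGD-like iterate),
# x (running average, the point you evaluate/deploy), and
# y (an interpolation between them, where gradients are taken).
z = np.zeros(d)
x = np.zeros(d)
gamma, beta = 0.01, 0.9                  # step size and interpolation coefficient

for t in range(5000):
    y = (1 - beta) * z + beta * x        # gradient evaluation point
    i = rng.integers(n)                  # sample one data point
    grad = (A[i] @ y - b[i]) * A[i]      # stochastic gradient at y
    z = z - gamma * grad                 # base SGD step, constant step size
    c = 1.0 / (t + 1)                    # uniform averaging weight
    x = (1 - c) * x + c * z              # online average of the z iterates

print("mean squared residual at x:", np.mean((A @ x - b) ** 2))
```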
France 2030

This work has benefited from a government grant managed by the ANR under France 2030 with the reference “ANR-22-CMAS-0002”.