Summer School 2025 – Program & Speakers

Learn about this year’s Summer School program & speakers!

Keynotes

JEAN-PHILIPPE VERT (Bioptimus)

Title: “Foundation models in biology”

Monday, July 7 (3:30 PM – 4:30 PM) – Amphitheater Gay Lussac

Abstract: Foundation models, such as large language models or vision transformers, were initially developed to capture the essence of data that humans understand—like language and images. However, they have also proven to be powerful tools for modeling complex scientific data that humans barely comprehend, such as natural data in various scientific domains.

In this course, I will introduce the audience to the techniques and applications of foundation models in biology, including how they offer new insights into protein folding, cellular function, and the organization of tissues in both normal and disease states.

Bio: Jean-Philippe Vert is co-founder and CEO of Bioptimus, a pioneering AI-first biotech company building foundation models to transform our understanding of biology and accelerate biomedical innovation. He is also Chief R&D Officer at Owkin, where he has helped drive the application of AI to drug discovery and clinical research since 2022.

A leading figure in AI for biology, Jean-Philippe has over 20 years of experience at the frontier of machine learning and life sciences. Prior to his current roles, he was a research scientist at Google Brain, where he led efforts in core machine learning and computational biology. He has held professorships at ENS Paris and Mines ParisTech, was a visiting professor at UC Berkeley, and began his research career at Kyoto University.

He holds degrees from École Polytechnique and a PhD in mathematics from Paris University, and is a member of the National Academy of Technologies of France. He has authored over 190 publications and is recognized globally for his contributions to AI, biomedical data modeling, and translational research.

LUDOVIC DENOYER (H)

Title: “From LLMs to Agents, a Concrete Example: Learning Cost-Efficient Web Agents”

Tuesday, July 8 (1:00 PM – 2:00 PM) – Amphitheater Gay Lussac

Abstract: Moving from training Large Language Models (LLMs) and Vision-Language Models (VLMs) to building models that can drive the behavior of autonomous agents is far from trivial. It requires not only adapting learning methodologies, but also designing new data generation pipelines to support decision-making and action-taking.

As a concrete example, we introduce Surfer-H, a cost-efficient web agent capable of performing user-defined tasks through VLM integration. At its core is Holo1, a new open-weight family of VLMs specialized in web navigation and information extraction. Holo1 is trained on a carefully curated blend of data sources, including open-access web content, synthetic examples, and self-generated agentic trajectories. This enables Surfer-H to reason over complex visual-textual web environments efficiently and robustly.

Bio: Ludovic Denoyer is the Agent Research Team Lead at H Company.

CHARLES-ALBERT LEHALLE (Ecole polytechnique)

Title: “Machine Learning and Data Sciences for Financial Markets”

Wednesday, July 9 (1:00 PM – 2:00 PM) – Amphitheater Gay Lussac

Abstract: After outlining the expected role of the financial system, I will review both the current and potential applications of AI by market participants. In particular:

– AI for users of financial markets, including applications such as robo-advisors
– Machine learning for improved intermediation, focusing on how nonlinear methods can enhance risk management, including the hedging of derivative products
– Data science for stronger connections to the real economy

In the final part, I will focus on the use of alternative data, such as satellite imagery, credit card transactions, geolocation, and text data, and discuss their biases as well as methods to address them.

No prerequisites. However, the following books may be of interest for those who wish to explore the topic further:

Supplementary Materials: Slides will be shared later if needed. Additional reading materials can be provided upon request.

Bio: Charles-Albert Lehalle is currently a Professor at Ecole polytechnique in Paris, where he teaches and conducts research on liquidity, price formation, and the use of AI in financial markets. Previously, he was Global Head of Quantitative Research & Development at the Abu Dhabi Investment Authority (ADIA) for three years. He started his career in charge of embedded AI solutions at the Renault Research Center and moved to the financial industry with the emergence of automated trading in 2005. He was Global Head of Quantitative Research at Crédit Agricole Cheuvreux and Head of Quantitative Research on Market Microstructure at Crédit Agricole Corporate Investment Bank, before joining Capital Fund Management (CFM) for seven years.

On the academic side, Prof. Lehalle received the 2016 Best Paper Award in Finance from the Europlace Institute for Finance (EIF) and has published more than eighty academic papers and book chapters. He co-authored the books “Market Microstructure in Practice” (World Scientific Publisher, 2nd edition 2018), which analyzes the main features of modern markets, and “Financial Markets in Practice” (World Scientific Publisher, 2022), which explains how the connected network of intermediaries that makes up the financial system shapes price formation. With Prof. Agostino Capponi, he co-edited the book “Machine Learning and Data Sciences for Financial Markets: A Guide to Contemporary Practices” (Cambridge University Press, 2023).

Prof. Lehalle is also a member of the Scientific Directory of the Louis Bachelier Institute and a lecturer at UC Berkeley and in the “Probability and Finance” Master of Paris 6 Sorbonne Université and Ecole Polytechnique.

ERIC XING (MBZUAI)

Title: “Toward AI-Driven Digital Organism: Multiscale Foundation Models for Predicting, Simulating, and Programming Biology at All Levels”

Thursday, July 10 (1:00 PM – 2:00 PM) – Amphitheater Gay Lussac

Abstract: At the core of medicine, pharmacy, public health, longevity, agriculture, and environment, is biology at work. Biology in the physical world is too complex to manipulate and always expensive and risky to tamper with. In this talk, I present a vision of using AI to model and simulate biology and life. I will lay out an engineering-viable approach to construct an AI-Driven Digital Organism (AIDO), leveraging self-supervised pretraining and adaptation of large-scale foundation models, and I report some results of the AIDO foundation models for DNA, RNA, Protein, Structure, and Cell, respectively; their abilities to tackle biological problems at the full spectrum of granularities, from sequence, to structure, to network, to phenotype, to diseases, and to drug responses; and early efforts on how to integrate these FMs as components into a holistic multiscale system. We envision that AIDO opens up a safe, affordable and high-throughput alternative platform for predicting, simulating and programming biology at all levels from molecules to cells to organisms, and is poised to trigger a new wave of better-guided wet-lab experimentation and better-informed first-principle reasoning, which can eventually help us better decode and improve life, and industrialize the full workflow of biomedicine.

Bio: Professor Eric Xing is the President of the Mohamed bin Zayed University of Artificial Intelligence, and a Professor of Computer Science at Carnegie Mellon University. His main research interests are the development of machine learning and statistical methodology, and large-scale distributed computational systems and architectures, for solving problems involving automated learning, reasoning, and decision-making in artificial, biological, and social systems. In recent years, he has been focusing on building large language models, world models, agent models, and foundation models for biology.

Prof. Xing has served on the editorial boards of several leading journals including JASA, AOAS, JMLR; was a recipient of several awards including NSF Career, Sloan, Carnegie Science Award, and best papers in conferences such as ACL, NeurIPS, OSDI, and ISMB; and is a fellow of several societies including AAAI, ACM, ASA, IEEE, and IMS.

Tutorials

SOLENNE GAUCHER (École polytechnique)

Title: “Introduction to Fair Statistical Learning”

Monday, July 7 – Amphitheater Gay Lussac

Abstract: This mini-course offers an introduction to fairness challenges in statistical learning. As a relatively new area of research, the field raises several fundamental and still unresolved questions: How should unfairness in algorithmic predictions be formally defined? How can fairness constraints be enforced, and what are the associated trade-offs? What are the mathematical implications of including, or excluding, sensitive attributes (such as gender or ethnicity) when attempting to correct inequalities, particularly in the context of affirmative action?

The course will begin with a survey of the main conceptual and mathematical frameworks for defining fairness and integrating fairness constraints into learning algorithms. We will then focus on statistical fairness, an approach that seeks to ensure a balanced distribution of predictions across demographic groups. Classical results in this area will be presented, along with a discussion of their implementation in practice.

Bio: Solenne Gaucher is an Assistant Professor in Machine Learning and Fair AI at École Polytechnique. Before that, she was a postdoctoral researcher at ENSAE, working in the FairPlay group under the supervision of Vianney Perchet.

Her research focuses on sequential learning and sequential decision-making problems, with a particular interest in fair machine learning.

AYMERIC DIEULEVEUT (École polytechnique)

Title: “Introduction to conformal prediction”

Tuesday, July 8 – Amphitheater Gay Lussac

Abstract: The goal of this tutorial is to provide a detailed and rigorous introduction, along with a comprehensive overview, of the rapidly growing field of conformal prediction. In particular, it focuses on thoroughly discussing methods, theoretical results, and practical trade-offs to enable participants to effectively and purposefully apply these techniques in their own domains and use cases.

This tutorial was developed in collaboration with Margaux Zaffran.

Prerequisites: Knowledge of general machine learning and of basic probability and statistics theory is needed.

Bio: Aymeric Dieuleveut is a researcher specializing in statistical machine learning, stochastic optimization, and high-dimensional statistics. He is a tenured Professor at École polytechnique in Paris and, since February 2025, has been serving as one of the Scientific Co-Directors of Hi! PARIS as well as Director of Hi! PACE, the institute’s teaching branch.

His work focuses on developing mathematical guarantees and designing novel algorithms for machine learning. He explores various aspects of AI trustworthiness, seeking to better understand model behavior through rigorous analysis of training dynamics. To that end, his research delves into first-order and stochastic optimization methods, uncertainty quantification, and the development of algorithms that enhance reliability through privacy, decentralization, and adversarial robustness.

Aymeric’s goal is to bridge theoretical foundations with real-world applications, and his research is supported by industrial collaborations in sectors such as energy and rail transportation.

Before joining École polytechnique in 2019, he was a postdoctoral researcher at EPFL. He earned his Ph.D. in mathematical statistics from the École Normale Supérieure (Ulm) in Paris in 2017, where he also completed his undergraduate and master’s studies.

JEREMIE MARY (Criteo AI Lab)

Title: “Introduction to Modern Large-Scale Machine Learning”

Wednesday, July 9 – Amphitheater Gay Lussac

Abstract: This tutorial provides an introduction to the key ideas and tools behind today’s large-scale machine learning systems, particularly those used to train and deploy models that combine text and images. We begin by examining the motivations and challenges of working with models containing billions of parameters. Students will learn how the Transformer architecture has evolved since 2017, how modern attention mechanisms such as FlashAttention function, and how to train models efficiently on large-scale datasets. The course also covers how text is tokenized using modern tokenizers and how large models can be used in zero-shot settings with human feedback, highlights the current successes of the agent-based approach, and explores emerging techniques such as diffusion models in natural language processing (NLP). Finally, we address how to make models smaller and faster through model compression, and examine alternative architectures to Transformers, such as RWKV and state-space models.

Bio: Jeremie Mary is a Senior Staff Researcher at Criteo AI Lab, where he focuses on reinforcement learning and image processing. He joined Criteo in 2017 and also serves as an Associate Professor at Inria (the French National Institute for Research in Digital Science and Technology). His research spans generative adversarial networks (GANs), recommender systems, and reinforcement learning.

ÉMILIE KAUFMANN (CNRS, CRIStAL)

Title: “Bandits for sequential decision making”

Thursday, July 10 – Amphitheater Gay Lussac

Abstract: In this tutorial, I will present the multi-armed bandit model, a simple model for sequential resource allocation tasks, such as the design of a recommender system. We will review fundamental algorithms to solve the exploration/exploitation dilemma that arises in some bandit problems, and then explain how they can be used in more complex settings. Participants will be able to try some bandit algorithms themselves using Python in a Jupyter notebook.
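For readers who want a feel for the exploration/exploitation dilemma before the session, here is a minimal sketch of UCB1, one classical bandit algorithm. It is an illustration only and is not taken from the tutorial material: the function name `ucb1`, the Bernoulli reward model, and the arm means are assumptions chosen for the example.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Run UCB1 on simulated Bernoulli arms; return the pull count of each arm.

    `arm_means` are hypothetical success probabilities used only to simulate
    rewards; the algorithm itself never reads them directly.
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    pulls = [0] * n_arms      # number of times each arm was played
    totals = [0.0] * n_arms   # sum of rewards collected from each arm

    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # initialization: play every arm once
        else:
            # Index = empirical mean (exploitation) + bonus that shrinks
            # as an arm is pulled more often (exploration).
            arm = max(
                range(n_arms),
                key=lambda a: totals[a] / pulls[a]
                + math.sqrt(2.0 * math.log(t) / pulls[a]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        pulls[arm] += 1
        totals[arm] += reward
    return pulls


if __name__ == "__main__":
    # Three simulated arms; the last one has the highest mean reward.
    print(ucb1([0.2, 0.5, 0.8], horizon=5000))
```

Over a long enough horizon, most pulls should go to the arm with the highest mean, while the weaker arms are still sampled occasionally.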

Prerequisites: Basic knowledge of probability and linear algebra; familiarity with Python and Jupyter notebooks.

Supplementary materials: Slides will be provided before the tutorial session. In the meantime, a good reference is the book “Bandit Algorithms” by Tor Lattimore and Csaba Szepesvári, available here: https://tor-lattimore.com/downloads/book/book.pdf

Bio: Émilie Kaufmann is a CNRS researcher at CRIStAL, Université de Lille, and a member of the Inria team Scool. Her research focuses on statistics and machine learning, with a particular interest in sequential learning. She studies stochastic models, particularly variants of the Multi-Armed Bandit (MAB) model, a key framework for sequential resource allocation, as well as Markov Decision Processes (MDPs). Her work spans reinforcement learning (maximizing rewards while learning) and adaptive testing (accelerating learning through adaptive data collection).

On the applied side, she is currently exploring how bandit strategies can be leveraged for adaptive early-stage clinical trials and how contextual bandits can support precision medicine.

ANNA KORBA (ENSAE)

Title: “Langevin diffusions: from MCMC to generative modeling”

Thursday, July 10 – Amphitheater Gay Lussac

Abstract: Langevin diffusions are stochastic processes that enable sampling from complex, often unnormalized probability distributions by leveraging gradient information. This tutorial introduces their foundations and explores how they form the basis of modern Markov Chain Monte Carlo (MCMC) methods. We then trace their evolution into score-based generative models, which adapt these dynamics to generate realistic data. Participants will gain both theoretical and practical insights into using Langevin-based methods in probabilistic machine learning.
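To make the idea of gradient-based sampling concrete, here is a minimal sketch of the unadjusted Langevin algorithm (ULA) in one dimension. It is an illustration only and is not taken from the tutorial material: the function names, the Gaussian target, and the step size are assumptions chosen for the example.

```python
import math
import random


def ula_samples(grad_potential, n_steps, step_size, x0=0.0, seed=0):
    """Unadjusted Langevin algorithm for a 1-D target proportional to exp(-U(x)).

    Only the gradient of the potential U is needed, so the normalizing
    constant of the target density never has to be computed.
    """
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_steps):
        noise = rng.gauss(0.0, 1.0)
        # Euler-Maruyama discretization of dX_t = -U'(X_t) dt + sqrt(2) dB_t.
        x = x - step_size * grad_potential(x) + math.sqrt(2.0 * step_size) * noise
        samples.append(x)
    return samples


def grad_gaussian_potential(x, mean=1.0, std=2.0):
    """Gradient of U(x) = (x - mean)^2 / (2 std^2), i.e. a Gaussian target."""
    return (x - mean) / (std ** 2)


if __name__ == "__main__":
    draws = ula_samples(grad_gaussian_potential, n_steps=50_000, step_size=0.1)
    kept = draws[5_000:]  # discard an initial burn-in period
    mean = sum(kept) / len(kept)
    var = sum((d - mean) ** 2 for d in kept) / len(kept)
    print(f"empirical mean: {mean:.2f}, empirical variance: {var:.2f}")
```

Because the discretization is not corrected by an accept/reject step, the chain targets a slightly biased distribution; a smaller step size reduces this bias at the cost of slower mixing.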

Bio: Anna Korba is an Assistant Professor of Machine Learning at ENSAE/CREST in the Statistics Department.

Her research focuses on machine learning, with expertise in kernel methods, optimal transport, optimization, particle systems, and preference learning. She is particularly interested in sampling and optimization methods and continues to explore new approaches in these areas.

Industry Round Table

July 7, 2025 | 8:45 AM – 10:30 AM

The Industry round table is composed of representatives from Hi! PARIS Corporate Donors. This event is an opportunity for the audience to learn about AI & Data Science initiatives being taken by each of the participating companies.

After an opening introduction by the panel moderators, each of the industry panel members will be invited to provide a 5-minute presentation. This will be followed by a jointly moderated session to identify areas of practical interest that can spawn impactful research. The audience will have the opportunity to ask their questions to the panelists. The industry panel will be an interactive event with an opportunity to open communication channels for further research opportunities between the industry and academia.

AI Business Solutions with Ekimetrics

July 8, 2025 | 10:30 AM – 12:00 PM

Speaker: Théo Alves da Costa, Partner AI & Sustainability at Ekimetrics

Ekimetrics will explore one of today’s most pressing challenges: how to scale the impact of AI. The talk will delve into the strategies needed to reconcile the precision required for critical use cases with the complexities of deploying AI at scale.

The discussion will also highlight how to build sustainable AI solutions that support long-term, responsible business models.

Research Sessions by the Hi! PARIS Engineering Team

Session 1: Practical Research Session

July 7, 2025 | 4:30 PM – 6:00 PM

In this hands-on tutorial, the Hi! PARIS Engineering Team will guide participants through the process of developing a Python package. The session will cover package structure, best practices for maintaining Python code using pytest, and how to turn loose scripts into clean, reusable packages.

To streamline development, the tutorial will introduce Continuous Integration (CI) using GitHub Actions to automate building, testing, and generating package distributions (wheels). Participants will also learn how to use setuptools and twine to build and publish packages on PyPI, the official Python package index.

The session includes four live demos, walking participants through each step of turning raw code into a fully functioning, shareable Python package.

Speakers:
– Nassima Ould Ouali (Machine Learning Research Engineer, École Polytechnique & Hi! PARIS)
– Awais Hussain Sani (Senior Machine Learning Research Engineer, Hi! PARIS)

Session 2: Student-Oriented Practical Research Session

July 9, 2025 | 2:00 PM – 3:30 PM

This student-oriented practical session, led by the Hi! PARIS Engineering Team, offers a hands-on introduction to Machine Learning deployment.

Participants will begin by deploying a web-based machine learning app using just a few lines of code with Streamlit. The session will then introduce Amazon SageMaker to demonstrate how to deploy a machine learning model as a scalable endpoint.

To conclude, participants will learn how to build a Docker container that integrates and deploys a model to the cloud, highlighting best practices commonly used in industry.

Speakers:
– Marguerite Leang (Machine Learning Research Engineer, Hi! PARIS)
– Awais Hussain Sani (Senior Machine Learning Research Engineer, Hi! PARIS)

Poster Session & Poster Award

Posters will be displayed on the Ecole polytechnique campus from the poster session on Day 3 (Wednesday 9 July, 4:00-5:30pm), when they will be presented, until the poster award on Day 4 (Thursday 10 July, 5:30-6:30pm).

An award, including a financial prize, will be given for the best poster.

Please note. You must print your poster yourself. There will be no printing on site.

Format. The preferred format of the poster is A0 paper, portrait mode (height: 119 cm, width: 84 cm). We will provide you with pins or tape to hang your poster on the wall.

Guidelines. For your convenience, see below some guidelines for poster presentation borrowed from the ICML Conference.
There are many great guides to making accessible and inclusive talks and posters; we advise everyone to consider all the points made in the RECSYS guidelines, the ACM guide, and the W3C guide.

We would like to highlight the following items:

– Keep your posters clear, simple, and uncrowded. Use large, sans-serif fonts, with ample white space between sentences and paragraphs. Use bold for emphasis (instead of italics, underline, or capitalization), and avoid special text effects (e.g., shadows).
– Choose high contrast colors; dark text on a cream background works best.
– Avoid flashing text or graphics. For any graphics, add a brief text description of the graphic right next to it.
– Choose color schemes that can be easily identified by people with all types of color vision and do not rely on color to convey a message (see How to Design for Color Blindness and Color Universal Design for further details).
– Use examples that are understandable and respectful to a diverse, multicultural audience.

You can find an example of a good poster and an example of a poor poster here: https://guides.nyu.edu/posters

Social Events

Two social events are scheduled as part of the Hi! PARIS Summer School 2025:

Day 1 (Monday 7 July, 6:00-7:00pm) – Opening welcome cocktail.
Day 3 (Wednesday 9 July, 6:00-9:00pm) – Cocktail