How to collaboratively train AI – Aymeric Dieuleveut

We are delighted to present the project of Aymeric Dieuleveut, Hi! PARIS Fellow 2021.

Aymeric Dieuleveut is an assistant professor of statistics at École Polytechnique. His Hi! PARIS project, entitled “FLAG”, aims to create new algorithms for federated learning that are supported by theoretical guarantees.

One might think that artificial intelligence is a topic for computer scientists. But Aymeric Dieuleveut considers himself 95% mathematician and only 5% computer scientist. This young researcher was trained in statistics and probability before becoming an assistant professor at École Polytechnique, in the Centre for Applied Mathematics (CMAP). That is why he initially approached artificial intelligence from the mathematical side, namely statistical learning. “Artificial intelligence is a wide topic. Thanks to AI algorithms, we are training computers to learn rules and structures hidden in the data, and then to use those rules to solve some tasks, for instance translating a sentence into a foreign language”. As a Ph.D. student in the Sierra Team of the Computer Science Department of École Normale Supérieure in Paris, he explored the relationship between statistical properties (how much information does a given dataset contain?) and algorithmic properties (what tasks can be performed with this dataset, and how long will they take?).

Building on this experience, Aymeric Dieuleveut’s goal as a researcher is to develop useful algorithms and to establish mathematical guarantees on their performance. “Even though I work on theory, I want these results to have an impact on daily life applications”. At the core of his work lie stochastic algorithms. These are used to train neural networks, that is, to find the model that best fits the data, and they have been extensively studied since deep learning’s revival in the 2000s. Scientists understand them well, at least when the computation is performed on a single machine and a single dataset. “But another domain called federated learning has gained momentum since 2016. In this case, many people, each having their own dataset, seek to collaborate to train a model without sharing their data for privacy reasons”. How can new stochastic algorithms be designed in that context?

Federated learning is already used in practice. When you type a text message, your smartphone automatically suggests candidates for the next word. The underlying model is learned from all users, but with some personalization: if you type your first name, it may suggest your last name, but only on the smartphone that you regularly use. Another use case arises when hospitals want to collaborate to train an algorithm on medical data. Each hospital has a rather small dataset and could benefit from the information held by the others, without sharing its own data. Privacy is a key challenge not only for individuals and organizations such as hospitals, but also for countries, which are increasingly eager to protect their citizens’ personal data. That is why federated learning is attracting growing interest from researchers.
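To make the hospital scenario concrete, here is a minimal sketch of one common baseline, federated averaging, in which each participant runs a few stochastic gradient steps on its own data and only the resulting model weights are sent back to be averaged. It is an illustration under simplifying assumptions, not the algorithms developed in the FLAG project: the data are synthetic, the model is a two-parameter linear regression, all participants are available at every round, and every name (`local_sgd`, `federated_average`, `make_client`, `train`) is hypothetical.

```python
import random


def predict(weights, x):
    """Linear model: dot product of weights and features."""
    return sum(w * xi for w, xi in zip(weights, x))


def local_sgd(weights, data, rng, lr=0.05, epochs=5):
    """A few passes of stochastic gradient descent on one client's private data."""
    w = list(weights)
    samples = list(data)
    for _ in range(epochs):
        rng.shuffle(samples)
        for x, y in samples:
            err = predict(w, x) - y
            # Gradient of 0.5 * err**2 with respect to w is err * x.
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


def federated_average(client_weights, client_sizes):
    """Average the client models, weighted by local dataset size."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
        for j in range(dim)
    ]


def train(clients, dim, rounds=30, seed=0):
    """Server loop: broadcast the model, collect local updates, average them."""
    rng = random.Random(seed)
    w = [0.0] * dim
    sizes = [len(data) for data in clients]
    for _ in range(rounds):
        # Only model weights leave each client; the raw data never does.
        updates = [local_sgd(w, data, rng) for data in clients]
        w = federated_average(updates, sizes)
    return w


def make_client(n, true_w, rng):
    """Synthetic private dataset (assumption): y = true_w . x + small noise."""
    data = []
    for _ in range(n):
        x = [1.0, rng.uniform(-1.0, 1.0)]  # constant feature plus one input
        y = predict(true_w, x) + rng.gauss(0.0, 0.01)
        data.append((x, y))
    return data


TRUE_W = [2.0, -3.0]
data_rng = random.Random(42)
# Three "hospitals" with small datasets of different sizes.
clients = [make_client(n, TRUE_W, data_rng) for n in (20, 35, 50)]
model = train(clients, dim=2)
print("learned weights:", model)
```

In this toy setting all participants draw their data from the same distribution, so plain averaging recovers weights close to `TRUE_W`. Note that sending weights instead of data does not by itself amount to a formal privacy guarantee.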

Confidentiality is not the only constraint to consider in federated learning. Scientists must deal with heterogeneity, because participants do not always share the same objectives. There are also communication constraints: devices may not all be available at the same time, and their bandwidth is limited. You certainly do not want your smartphone to download a multi-gigabyte text model each morning. Finally, aggregating different datasets often results in missing values: for instance, one hospital may have kept track of blood pressure while another has not. “Our project funded by Hi! PARIS has several research directions, and we tackle a combination of several of those challenges. Our goal is to build the next generation of algorithms for large scale Federated Learning, supported by strong theoretical guarantees, and practical implementations together with open-source code”, emphasizes Aymeric Dieuleveut, who hopes that these guarantees will help build trust in machine learning technologies.