Learn about this year’s Summer School program



HEC Paris

Title: Voice Analytics for Business

With the rise of voice-enabled digital interfaces, including Siri and Alexa, firms have a growing interest in understanding how the characteristics of a voice may affect business outcomes, such as engagement or sales. Drawing from the physics of sound, we will discuss the key acoustic properties of voices – including pitch, volume, harmonics, and tempo – with demonstrations of how to extract these features from voice clips. We will illustrate how to analyze these acoustic features holistically in the context of voice, through an example of research on the role of voice in consumer decision-making. Session attendees will have an opportunity to conduct a similar analysis using a voice dataset, from which they will extract business insights.
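
As a rough illustration of what extracting acoustic features can look like, the sketch below estimates two of the properties named above, volume and pitch, from a synthetic tone. It is not the method used in the session: the function names, the autocorrelation approach, and the 75 to 400 Hz search range are assumptions chosen for the example, and a real voice clip would need framing, windowing, and a dedicated audio library.

```python
import math


def make_tone(freq_hz, sample_rate, duration_s, amplitude=0.5):
    """Synthetic stand-in for a voice clip: a pure sine wave (assumption)."""
    n = int(sample_rate * duration_s)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]


def rms_volume(samples):
    """Root-mean-square amplitude, a simple proxy for loudness."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def estimate_pitch(samples, sample_rate, fmin=75.0, fmax=400.0):
    """Estimate the fundamental frequency by autocorrelation.

    The lag (in samples) at which the signal best matches a shifted copy
    of itself approximates one period; pitch is sample_rate / lag.
    """
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


if __name__ == "__main__":
    clip = make_tone(freq_hz=200.0, sample_rate=8000, duration_s=0.1)
    print("volume (RMS):", rms_volume(clip))
    print("pitch (Hz):", estimate_pitch(clip, 8000))
```

Harmonics and tempo need more machinery (a spectrum estimate and onset detection, respectively), so they are left out of this sketch.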



Title: Learning on messy, tabular data

Many, if not most, data science projects are run on tabular data: data from one or multiple tables with columns of diverse nature. Tabular data comes with its own challenges: many entries are discrete (categories or entities), entries may be missing, and the data may need to be enriched by joining multiple tables. Additional data-integration challenges arise when the tables are assembled from different sources that follow different conventions. In this lecture, I will present various machine-learning methods dedicated to such data. I will illustrate these methods with examples using the skrub and scikit-learn Python packages.

Poonacha Medappa

Tilburg University

Title: Integrating AI and Machine Learning into Economics and Management Research

The capabilities of machines are advancing rapidly, with examples such as ChatGPT’s human-like reasoning and creativity, Copilot’s capacity to act as a pair programmer, Facebook’s facial recognition technology, and Google’s AI and ML frameworks such as TensorFlow. With these advancements, researchers now have a large toolset of approaches to perform data-driven research and provide insights that were previously infeasible. But, as researchers, how will these advancements change our research identity and the nature of our research? For instance, face recognition algorithms do not follow predetermined rules, based on human understanding, for detecting the pixel combinations that make up a face. Instead, these algorithms use a vast dataset of labeled photos to estimate a function f(x), which predicts the presence y of a face based on pixels x. This approach has similarities to econometrics and raises important questions, which we will address in this workshop. Specifically, we will answer three questions: (a) Are these algorithms simply applying conventional methods to extensive and innovative datasets? (b) If these are new empirical tools, how do they relate to existing knowledge? and (c) How can we as researchers incorporate these methods into our own research?

The first half of the workshop will be an interactive lecture covering the background and implications of ML and AI techniques for economic research. The second half will be a hands-on exercise in which we will develop a data-driven research question using these new and advanced computational techniques. The aim is to see the power that we now have in conceptualizing new constructs and finding interesting insights.

Johan Hombert

HEC Paris

Title: Scoring Strategically: Application to Finance

This tutorial starts with a brief introduction to fintech lending and the use of credit scoring in credit markets. The main part of the tutorial is an interactive game in which participants play the role of a fintech lender. Context: Banks increasingly use alternative data and machine learning to screen customers and set interest rates. For example, a lender using digital footprints to predict loan default will have a competitive edge over traditional lenders. However, there are important pitfalls to avoid when using alternative data and machine learning to score consumers, such as the winner’s curse and discrimination. This tutorial and its interactive game provide an introduction to these issues.
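
To make the winner's curse concrete, here is a toy simulation. It is not the game played in the tutorial. It assumes that several lenders each observe the borrower's true default risk with independent Gaussian noise, that each prices purely on its own estimate, and that the borrower accepts the cheapest offer. The function name and all parameter values are hypothetical.

```python
import random


def simulate_winners_curse(num_borrowers=20000, num_lenders=5,
                           noise_sd=0.05, seed=0):
    """Return (mean bias of the winning lender, mean bias of one fixed lender).

    Bias is estimated default risk minus true default risk. A negative value
    means the lender underestimates risk on the loans it actually makes.
    """
    rng = random.Random(seed)
    winner_bias = 0.0
    single_bias = 0.0
    for _ in range(num_borrowers):
        true_risk = rng.uniform(0.05, 0.30)
        estimates = [true_risk + rng.gauss(0.0, noise_sd)
                     for _ in range(num_lenders)]
        # The lowest risk estimate yields the lowest rate, so it wins the loan.
        winner_bias += min(estimates) - true_risk
        # Any single lender's estimate is unbiased before selection.
        single_bias += estimates[0] - true_risk
    return winner_bias / num_borrowers, single_bias / num_borrowers


if __name__ == "__main__":
    winner, single = simulate_winners_curse()
    print("mean bias of winning estimate:", winner)
    print("mean bias of one lender's estimate:", single)
```

Under these assumptions each lender's score is unbiased on average, yet the score that wins a loan is systematically too optimistic, because winning selects the most favorable error among the competing lenders.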

Télécom Paris

Title: Self-supervised learning in computer vision and medical imaging

Many tasks in computer vision and medical imaging, such as object detection, image classification, and semantic segmentation, have reached astonishing results in recent years. This has been possible mainly because large (N > 10^6), labeled datasets were available. However, in many applications large datasets are lacking, and annotations can be costly, time-consuming, and difficult to obtain. Furthermore, it has recently been shown that transferring representations learnt on large datasets, such as ImageNet, is useful only when there is a high visual similarity between the pre-training and target domains, that is, a small domain gap. To address this, several self-supervised pre-training strategies have recently emerged. They leverage annotation-free pretext tasks to provide surrogate supervision signals for feature learning. These methods can be trained on large, unannotated datasets and then transferred to small, labeled datasets. In the first part of this tutorial, you will learn the most important and widely used self-supervised strategies for computer vision and medical imaging. In particular, we will study contrastive learning in depth using a geometric approach. In the second part, you will test the described methods on both toy examples and real data using PyTorch.
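
As a small illustration of contrastive learning, the sketch below computes an InfoNCE-style loss on hand-written embedding vectors. This is one common contrastive objective and is not necessarily the formulation covered in the tutorial. The function names, the cosine similarity measure, and the temperature of 0.1 are assumptions, and plain Python lists stand in for the PyTorch tensors a real implementation would use.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE-style loss over a batch.

    For anchor i, positives[i] is its positive view (for example, another
    augmentation of the same image); every other positives[j] acts as a
    negative. The loss is the cross-entropy of picking the right view.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Log-sum-exp with the max subtracted for numerical stability.
        top = max(logits)
        log_sum = top + math.log(sum(math.exp(l - top) for l in logits))
        total += log_sum - logits[i]
    return total / len(anchors)


if __name__ == "__main__":
    anchors = [[1.0, 0.0], [0.0, 1.0]]
    views = [[0.9, 0.1], [0.1, 0.9]]
    print("matched views:", info_nce(anchors, views))
    print("mismatched views:", info_nce(anchors, views[::-1]))
```

The loss is small when each anchor is closest to its own view and large when the views are mismatched, which is the signal that pulls matching representations together and pushes the others apart.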

Title: An introduction to diffusion models

In this tutorial, we will present the recent state-of-the-art method for generative modeling: diffusion models. We will study their theoretical aspects (time reversal of stochastic processes, …) and practical implementation. The goal of the tutorial is to give every attendee the necessary tools to experiment with simple diffusion models both mathematically and practically. If time permits, we will also cover some extensions of diffusion models and discuss their links with other areas of mathematics such as optimal transport.
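
For readers who want a concrete starting point, the sketch below implements only the forward (noising) half of a discrete-time diffusion model, in the style of denoising diffusion probabilistic models. It is an illustration under stated assumptions, not the tutorial's material: the linear noise schedule, its endpoints (1e-4 and 0.02 over 1000 steps), and the function names are choices made for the example. The reverse, generative half requires a trained network and is omitted.

```python
import math
import random


def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances on a linear schedule (assumed values)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + step * t for t in range(num_steps)]


def alpha_bars(betas):
    """Cumulative products of (1 - beta): the fraction of signal variance kept."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def noise_sample(x0, t, abar, rng):
    """Draw x_t given x_0 in closed form:

    x_t = sqrt(abar[t]) * x_0 + sqrt(1 - abar[t]) * eps,  eps ~ N(0, I).
    """
    signal = math.sqrt(abar[t])
    noise = math.sqrt(1.0 - abar[t])
    return [signal * x + noise * rng.gauss(0.0, 1.0) for x in x0]


if __name__ == "__main__":
    abar = alpha_bars(linear_betas(1000))
    rng = random.Random(0)
    x0 = [1.0, -1.0, 0.5]
    for t in (0, 499, 999):
        print(t, abar[t], noise_sample(x0, t, abar, rng))
```

As t grows, abar[t] shrinks toward zero, so x_t loses almost all information about x_0 and approaches standard Gaussian noise. A generative model is then trained to reverse this process step by step.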