
AI Seminar Cycle

The AI Seminar Cycle is organized by Hi! PARIS in collaboration with the ELLIS program on Theory, Algorithms, and Computations of Modern Learning Systems and the Paris ELLIS Unit.
This partnership aims to broaden the visibility of the seminar among the European AI research community.

No Free Lunch: “From a Simultaneous (Machine) Learning Impossibility to Heisenberg Uncertainty Principle”

Thursday, October 2 (2 PM – 3:30 PM) – Hybrid: Amphitheater Becquerel, École polytechnique (click to register) | Online

Research Session: “No Labels, No Training: Leveraging Language for Detecting Anomalous Events in Videos”

Wednesday, November 5 (11 AM – 12 PM) – Online (click to register)

Spyros Gidaris: “Latent Representations for Better Generative Image Modeling”

Wednesday, December 10 (11 AM – 12 PM) – Hybrid (click to register)

Abstract: This talk explores how latent representations shape modern generative models. While latent spaces (like those in VQ-VAE and VQ-GAN) are central to today’s generative architectures—from diffusion models to autoregressive approaches—their structure and properties are often overlooked. I will present three works that refine or leverage latent representations for better generative modeling. 

First, EQ-VAE addresses a key limitation of the autoencoders used in latent-based generative models: their latent spaces lack equivariance to simple semantics-preserving transformations such as rotation or scaling, which makes generation harder. We introduce a simple regularization method that enforces equivariance, reducing the complexity of the latent space without degrading reconstruction quality. This improves multiple state-of-the-art models (DiT, SiT, MaskGIT) and speeds up training.
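
The equivariance constraint above can be sketched as a simple auxiliary loss; note that `encoder` and `transform` are placeholder names and the MSE penalty is an illustrative assumption, not EQ-VAE's exact objective:

```python
import torch
import torch.nn.functional as F

def equivariance_loss(encoder, images, transform):
    """Penalize the gap between encoding a transformed image and
    transforming the encoding, i.e. push E(T(x)) toward T(E(x))
    for a semantics-preserving transform T (e.g. a flip or rescale)."""
    z = encoder(images)               # latents of the original images
    z_t = encoder(transform(images))  # latents of the transformed images
    return F.mse_loss(transform(z), z_t)
```

In training, a term like this would be added to the usual reconstruction loss with a small weight.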
Next, ReDi integrates pretrained semantic features into latent diffusion models. Instead of just generating low-level image latents, we jointly model them with high-level semantic features (e.g., from DINOv2). This unified approach boosts image quality and training efficiency while enabling “Representation Guidance”, a simple way to steer generation using learned semantics.
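
One way to picture the joint modeling is to diffuse a channel-wise concatenation of the image latents and spatially aligned semantic features; the function below is a hypothetical sketch with a toy cosine noise schedule, not ReDi's actual interface:

```python
import torch

def joint_diffusion_input(image_latents, semantic_feats, t, noise=None):
    """Form a single noisy input so one diffusion model denoises
    low-level latents and high-level features together.
    Shapes: both inputs are (batch, channels, H, W) with matching H, W;
    t is a scalar tensor in [0, 1]."""
    x = torch.cat([image_latents, semantic_feats], dim=1)  # channel-wise join
    if noise is None:
        noise = torch.randn_like(x)
    alpha = torch.cos(t * torch.pi / 2)  # signal weight
    sigma = torch.sin(t * torch.pi / 2)  # noise weight
    return alpha * x + sigma * noise
```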

Finally, DINO-Foresight tackles video prediction. We predict future frames in the semantic feature space of pretrained vision foundation models (e.g., from DINOv2), avoiding pixel-level inefficiencies. This makes forecasting simpler, faster, and more robust, enabling flexible adaptation to downstream tasks.
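
Forecasting in feature space can be sketched as a small head that maps a window of past feature maps to the next one; the module below is illustrative (names, sizes, and the MLP architecture are assumptions, not DINO-Foresight's design):

```python
import torch
import torch.nn as nn

class FeatureForecaster(nn.Module):
    """Predict the next frame's feature map from `context` past frames,
    all in the feature space of a frozen vision backbone."""
    def __init__(self, feat_dim=384, hidden=512, context=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(context * feat_dim, hidden),
            nn.GELU(),
            nn.Linear(hidden, feat_dim),
        )

    def forward(self, past_feats):
        # past_feats: (batch, context, tokens, feat_dim)
        b, c, n, d = past_feats.shape
        # fold the temporal context into each token's feature vector
        x = past_feats.permute(0, 2, 1, 3).reshape(b, n, c * d)
        return self.net(x)  # predicted features: (batch, tokens, feat_dim)
```

Downstream heads (e.g. for segmentation) could then be applied to the predicted features directly.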

Together, these works highlight how better latent representations can simplify, accelerate, and improve generative modeling.

Robotics & AI

Wednesday, January 7 (11 AM – 12 PM) – Hybrid (click to register)

Research Session on Vision

Wednesday, February 4 (11 AM – 12 PM) – Hybrid (click to register)

Research Session on Machine Learning

Wednesday, March 4 (11 AM – 12 PM) – Online (click to register)

Research Session on Machine Learning

Wednesday, June 3 (11 AM – 12 PM) – Online (click to register)