One of the biggest challenges? Gaining access to the data itself. Media archives are valuable and often closely guarded, especially in an era when content can be used to train large commercial models without consent.
That’s why Ollion’s collaboration with Aday, an established company that archives this data, is key. It enables his team to work directly with extensive corpora of journalistic text while respecting the usage agreements set by publishers. With access unlocked, the questions become familiar to any social scientist: What are we seeing? What are we missing? And what does it mean?
Interestingly, the team avoids the biggest AI models. “They’re too slow, too opaque, and too resource-hungry,” Ollion says. Instead, they work with smaller models that are faster, more transparent, and often better suited to the job.
“Limitations can be a strength. They force us to think more clearly about what we want to learn.”