Clémentine Fourrier

Welcome!

The website author's logo, an orange slice (pun on her first name). Logo made by Alix Chagué

I’m an AI researcher at Hugging Face, leading our evaluation efforts and collabs (on LLMs/agents). The OpenEvals team maintains lighteval and the evaluation guidebook, and builds (or helps the community build) cool evaluations. We previously worked on the Open LLM Leaderboard.

On the side, I give a hand to our AI for good/AI for science initiatives.

I enjoy programming, making science open and understandable, books, and delicious food. My motto would likely be: “So much to do, so little time”.

Contact:

You’ll find me as clefourrier over the web (Twitter, LinkedIn, BlueSky, …), or you can reach me at myfirstname at 🤗 dot co. Open to both collabs and mentoring, within available bandwidth.

If you want a fast answer, better make it short and to the point ^^

Timeline

Apr 2025: 🗞️ BusinessInsider: Figuring out which AI model is right for you is harder than you think
Apr 2025: 🗞️ VentureBeat : Beyond generic benchmarks: How Yourbench lets enterprises evaluate AI models against actual data
Apr 2025: 📜 Arxiv : YourBench: Easy Custom Evaluation Sets for Everyone
Mar 2025: 🎤 CNRS NLP working group : Panorama of LLM evaluations
Mar 2025: 📝 Blog : Fixing the Open LLM Leaderboard with Math-Verify
Mar 2025: 🗞️ Epsiloon Magazine : IA: le quiz ultime
Feb 2025: 🗞️ France 24 TV : Tech24 on AI
Feb 2025: 🗞️ French AI Summit Conclusions: French LLM Leaderboard showcase
Feb 2025: ⭐ Finalist of the 2025 French Innovators Awards, AI section : 100 French scientists whose research changes our lives, by the French journal Le Point
Feb 2025: 📜 Arxiv : SmolLM2: When Smol Goes Big – Data-Centric Training of a Small Language Model
Jan 2025: 📝 Blog : CO2 emissions and model performance: Insights from the Open LLM Leaderboard
Dec 2024: 📜 Arxiv : Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation
Oct 2024: ⚙️ Release : Evaluation Guidebook
Jul 2024: 🗞️ The Economist : How to tell which AI model is best
Jul 2024: 🎧 Latent Space : Benchmarks 201
Jun 2024: 🗞️ VentureBeat : Hugging Face’s updated leaderboard shakes up the AI evaluation game
Jun 2024: 🗞️ La Tribune : Pour contrer la crise de l’évaluation des IA, Hugging Face rehausse les exigences
Jun 2024: 📝 Blog : Performances are plateauing, let’s make the leaderboard steep again
May 2024: 📝 Blog : Let’s talk about LLM evaluation
May 2024: ⭐ Invited to the Elysée event gathering France’s top AI talents
May 2024: 📜 ICLR : GAIA: a benchmark for General AI Assistants
Apr 2024: 🗞️ La Recherche : 2023, l’année des grands modèles de langue ouverts
Apr 2024: 🗞️ TechCrunch : Hugging Face releases a benchmark for testing generative AI on health tasks
Apr 2024: 📜 Arxiv : The Hallucinations Leaderboard – An Open Effort to Measure Hallucinations in Large Language Models
Feb 2024: ⚙️ Release : Lighteval
Dec 2023: 📝 Blog : 2023, year of Open LLMs
Dec 2023: 📝 Blog : Open LLM Leaderboard: DROP deep dive
Oct 2023: 📜 Arxiv : Zephyr: Direct Distillation of LM Alignment
Jun 2023: 📝 Blog : What’s going on with the Open LLM Leaderboard?
Apr 2023: ⚙️ Release : Open LLM Leaderboard
Mar 2023: 🎧 Parlons Tech : L’IA générative à la Loupe
Jan 2023: 📝 Blog : Introduction to Graph Machine Learning
Nov 2022: 📜 Arxiv : Bloom: A 176b-parameter open-access multilingual language model
Oct 2022: 📜 PhD : Neural Approaches to Historical Word Reconstruction
May 2022: 📜 ACL : Probing Multilingual Cognate Prediction Models
Apr 2022: 📜 Arxiv : Entities, Dates, and Languages: Zero-Shot on Historical Texts with T0
Aug 2021: 📜 ACL : Can Cognate Prediction Be Modelled as a Low-Resource Machine Translation Task?
May 2020: 📜 LREC : Methodological Aspects of Developing and Managing an Etymological Lexical Resource: Introducing EtymDB-2.0
Feb 2020: 📜 Arxiv : The Alzheimer’s Disease Prediction Of Longitudinal Evolution (TADPOLE) Challenge: Results after 1 Year Follow-up

Previous research/interest topics

currently: evaluation of LLMs and agents (2023-now)
graph machine learning (2022)
reconstructing dead languages using neural networks (2019-2022)
neurodegenerative disease prediction from longitudinal data (2017-2018)
using 3D meshes and grids for geology and structural modeling (2014-2015)

It’s likely I’ll learn new things again! (robotics maybe? ¯\_(ツ)_/¯ )

About this site

This site is deliberately static and very lightweight, for ecology, accessibility, and this. I use markdown and pandoc. My logo was made by Alix Chagué.