Mathieu Petitbois

AI Researcher | PhD Candidate at Ubisoft La Forge & LERIA

I am a final-year PhD candidate at Ubisoft La Forge and LERIA, advised by Ludovic Denoyer, Sylvain Lamprier, and Rémy Portelas. My research interests include Deep Reinforcement Learning, Imitation Learning, and Stylized Policy Learning with applications in video games. In particular, I am passionate about developing learning agents that are diverse, controllable, and trustworthy, so that they can be deployed in both simulated and real-world applications.

Publications

Google Scholar
SciQL
Offline Reinforcement Learning of High-Quality Behaviors Under Robust Style Alignment
Mathieu Petitbois, Rémy Portelas, Sylvain Lamprier
Agents in the Wild: Safety, Security, and Beyond Workshop at ICLR 2026
Principled Design for Trustworthy AI Workshop at ICLR 2026
QPHIL
QPHIL: Quantizing Planner for Hierarchical Implicit Q-Learning
Mathieu Petitbois*, Alexi Canesse*, Ludovic Denoyer, Sylvain Lamprier, Rémy Portelas
*Equal contribution.
7th Robot Learning Workshop at ICLR 2025
IJCNN 2025
SWR
Offline Learning of Controllable Diverse Behaviors
Mathieu Petitbois, Ludovic Denoyer, Sylvain Lamprier, Rémy Portelas
Generative Models for Robot Learning Workshop at ICLR 2025

Experience

Apr. 2022 – May 2026
Bordeaux, France
Research Assistant
June 2023 – May 2026
Developed and integrated generative models (Transformers, Diffusion, VAEs, VLMs) to bridge generative modeling and sequential decision-making in complex environments (Godot, Atari, MuJoCo, Newton Physics). Helped integrate research outcomes into game development through scientific consulting, and presented findings at international conferences.
Research Engineer
Oct. 2022 – May 2023
Designed and implemented a library to extract behavioral diversity from player data, enabling the learning of diverse, controllable policies for behavior personalization.
Research Intern
Apr. 2022 – Sep. 2022
Implemented and evaluated offline and online RL methods in simulated environments using MuJoCo and Panda3D. Demonstrated that offline RL improves sample efficiency and reduces training time given sufficient data, contributing to a company-wide shift from online RL to data-driven offline RL for policy learning in games.
June 2021 – Aug. 2021
Research Intern · Palaiseau, France

Reimplemented the model-based approach from the World Models paper (Ha & Schmidhuber, NeurIPS 2018) in PyTorch.

Education

PhD in Deep Reinforcement Learning for Stylized Behavior Modeling · Bordeaux & Angers, France

Supervised by Ludovic Denoyer, Sylvain Lamprier, and Rémy Portelas.

Sep. 2021 – Sep. 2022
M.Sc. in Data Science, Highest Honors · Palaiseau, France

Main courses: Deep Learning, Reinforcement Learning, Computer Vision.

Sep. 2019 – Sep. 2022
Master's Degree in Engineering · Palaiseau, France

Main courses: Dynamic Programming, Logic Programming, Linear Programming, Genetic Algorithms, Probability Theory, Optimization, Applied Statistics, Statistical Learning, Control Theory, Bayesian Filtering.
