The pros and cons of reinforcement learning in physical science
Today’s artificial intelligence (AI) systems are built on data generated by humans. They’re trained on huge repositories of writing, images and videos, most of which have been scraped from the Internet without the knowledge or consent of their creators. It’s a vast and sometimes ill-gotten treasure trove of information – but for machine-learning pioneer David Silver, it’s nowhere near enough.
“I think if you provide the knowledge that humans already have, it doesn’t really answer the deepest question for AI, which is how it can learn for itself to solve problems,” Silver told an audience at the 12th Heidelberg Laureate Forum (HLF) in Heidelberg, Germany, on Monday.
Silver’s proposed solution is to move from the “era of human data”, in which AI passively ingests information like a student cramming for an exam, into what he calls the “era of experience” in which it learns like a baby exploring its world. In his HLF talk on Monday, Silver played a sped-up video of a baby repeatedly picking up toys, manipulating them and putting them down while crawling and rolling around a room. To murmurs of appreciation from the audience, he declared, “I think that provides a different perspective of how a system might learn.”
Silver, a computer scientist at University College London, UK, has been instrumental in making this experiential learning happen in the virtual worlds of computer science and mathematics. As head of reinforcement learning at Google DeepMind, he was instrumental in developing AlphaZero, an AI system that taught itself to play the ancient stones-and-grid game of Go. It did this via a so-called “reward function” that pushed it to improve over many iterations, without ever being taught the game’s rules or strategy.
More recently, Silver coordinated a follow-up project called AlphaProof that treats formal mathematics as a game. In this case, the reward is based on producing correct, formally verified proofs. While it isn’t yet outperforming the best human mathematicians, in 2024 it achieved silver-medal standard on problems at the International Mathematical Olympiad.
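What makes formal mathematics so well suited to this game-playing framing is that the reward signal is unambiguous: AlphaProof works in the Lean proof assistant, where a candidate proof either passes the checker or it doesn’t. The snippet below is a deliberately trivial illustration of that all-or-nothing signal (Olympiad problems are vastly harder).

```lean
-- A trivially simple formal statement and proof in Lean 4. If this type-checks,
-- the "game" is won; if not, it is lost -- there is no partial credit.
theorem add_comm_example (a b : Nat) : a + b = b + a := by
  exact Nat.add_comm a b
```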
Learning in the physics playroom
Could a similar experiential learning approach work in the physical sciences? At an HLF panel discussion on Tuesday afternoon, particle physicist Thea Klaeboe Åarrestad began by outlining one possible application. Whenever CERN’s Large Hadron Collider (LHC) is running, Åarrestad explained, she and her colleagues in the CMS experiment must control the magnets that keep protons on the right path as they zoom around the collider. Currently, this task is performed by a person, working in real time.

In principle, Åarrestad continued, a reinforcement-learning AI could take over that job after learning by experience what works and what doesn’t. There’s just one problem: if it got anything wrong, the protons would smash into a wall and melt the beam pipe. “You don’t really want to do that mistake twice,” Åarrestad deadpanned.
For Åarrestad’s fellow panellist Kyle Cranmer, a particle physicist who works on data science and machine learning at the University of Wisconsin-Madison, US, this nightmare scenario encapsulates the challenge of using reinforcement learning in the physical sciences. In situations where you can run many experiments very quickly and essentially for free – as is the case with AlphaGo and its descendants – you can expect reinforcement learning to work well, Cranmer explained. But once you’re interacting with a real, physical system, even non-destructive experiments cost real time and money.
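A back-of-the-envelope simulation makes the point. The sketch below is a deliberately crude caricature – an invented one-dimensional “beam” that drifts at random while a Q-learning agent nudges it with corrective kicks, with made-up numbers and no resemblance to CERN’s real control or machine-protection systems. It simply counts how many times the learner crashes the beam while it is still figuring out what works: failures that are free in a simulation, but that would each mean a melted beam pipe on the real machine.

```python
# A toy illustration (NOT CERN's real control system) of why trial-and-error
# learning is cheap in simulation but costly on real hardware: count how many
# times a naive learner "melts the beam pipe" before it finds a safe policy.
import random

def run_episode(q, epsilon=0.2, alpha=0.2, gamma=0.95):
    """One training episode: the 'beam' drifts randomly each step and the agent
    applies a corrective kick of -1, 0 or +1. Straying outside the pipe
    (positions 0..10) is a crash with a large negative reward."""
    pos = 5.0
    crashed = False
    for _ in range(50):
        state = min(max(int(pos), 0), 10)
        # epsilon-greedy choice over the three corrective kicks
        if random.random() < epsilon:
            a = random.choice([-1, 0, 1])
        else:
            a = max([-1, 0, 1], key=lambda k: q[(state, k)])
        pos += random.uniform(-1.0, 1.0) + a   # random drift plus our kick
        if pos < 0 or pos > 10:
            reward, crashed = -100.0, True     # the mistake you can't afford for real
        else:
            reward = 1.0                       # stayed safely in the pipe
        nxt = min(max(int(pos), 0), 10)
        best_next = 0.0 if crashed else max(q[(nxt, k)] for k in (-1, 0, 1))
        # standard Q-learning update of the value of (state, action)
        q[(state, a)] += alpha * (reward + gamma * best_next - q[(state, a)])
        if crashed:
            break
    return crashed

# train for 2000 simulated episodes and count the crashes along the way
q = {(s, a): 0.0 for s in range(11) for a in (-1, 0, 1)}
crashes = sum(run_episode(q) for _ in range(2000))
print(f"Simulated beam losses during training: {crashes}")
```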
Another challenge, Cranmer continued, is that particle physics already has good theories that predict some quantities to multiple decimal places. “It’s not low-hanging fruit for getting an AI to come up with a replacement framework de novo,” Cranmer said. A better option, he suggested, might be to put AI to work on modelling atmospheric fluid dynamics, where the interesting behaviour is emergent and lacks a tractable first-principles description. “Those are super-exciting places to use ideas from machine learning,” he said.
Not for nuclear arsenals
Silver, who was also on Tuesday’s panel, agreed that reinforcement learning isn’t always the right solution. “We should do this in areas where mistakes are small and it can learn from those small mistakes to avoid making big mistakes,” he said. To general laughter, he added that he would not recommend “letting an AI loose on nuclear arsenals”, either.
Reinforcement learning aside, both Åarrestad and Cranmer are highly enthusiastic about AI. For Cranmer, one of the most exciting aspects of the technology is the way it gets scientists from different disciplines talking to each other. The HLF, which aims to connect early-career researchers with senior figures in mathematics and computer science, is itself a good example, with many talks in the weeklong schedule devoted to AI in one form or another.
For Åarrestad, though, AI’s most exciting possibility relates to physics itself. Because the LHC produces far more data than humans and present-day algorithms can handle, Åarrestad explained, much of it is currently discarded. The idea that, as a result, she and her colleagues could be throwing away major discoveries sometimes keeps her up at night. “Is there new physics below 1 TeV?” Åarrestad wondered.
Someday, maybe, an AI might be able to tell us.