Alex Lupsasca is a black hole researcher at Vanderbilt University in Nashville, where he studies the mathematics of event horizons, the boundaries that nothing, not even light, can cross. It is the kind of work that takes months of calculation and theoretical refinement, done slowly and largely alone.
Last summer, he discovered new symmetries in the equations governing the shape of a black hole's event horizon, and he published the result. A few months later, when he met OpenAI's chief research officer, he was urged to test the latest version of the company's AI agent on the very problem he had just solved. Lupsasca asked the system to find the same symmetries. At first, it couldn't.
Key Facts: AI and the New Era of Scientific Discovery
| Topic | Key fact |
| --- | --- |
| First AI scientist | Adam, a robotic AI system that autonomously tested hypotheses about yeast biology in the 2000s; considered the first fully automated scientific discovery system |
| Nobel Prize milestone | The 2024 Nobel Prizes in chemistry and physics both went to pioneers of AI tools, including Demis Hassabis of Google DeepMind for AlphaFold |
| AlphaFold impact | AlphaFold 2 (2021) predicts protein structures; AlphaFold 3 predicts protein-molecule interactions; used at Isomorphic Labs (London) to target previously "undruggable" proteins |
| Key human-AI discovery | Physicist Alex Lupsasca (Vanderbilt) used GPT-5 pro to rediscover black hole event horizon symmetries; the model arrived at them independently, 9 months after the original paper. Lupsasca joined OpenAI for Science |
| Math proof via AI | Mathematician Ernest Ryu (UCLA) proved a convergence theorem in optimization through 12 hours of back-and-forth with GPT-5 pro; he subsequently joined OpenAI |
| Drug discovery milestone | Insilico Medicine (Boston/Shanghai) used AI to discover both a novel disease-causing protein in idiopathic pulmonary fibrosis (IPF) and the drug molecule (rentosertib) to block it; published in Nature Medicine, June 2025 |
| Key skeptic | Gary Marcus, cognitive scientist, NYU; argues LLMs are generating "junk science" at scale and says AI needs far better causal reasoning before it can truly do science independently |
| "AI slop" in journals | PLOS and Frontiers stopped accepting public health dataset papers in 2025 due to AI-generated submissions; Merriam-Webster named "slop" its 2025 word of the year |
| OpenAI for Science | New OpenAI team building AI tools specifically for researchers; headed by Kevin Weil; includes recruited scientists such as Lupsasca and Ryu |
| Reference | Science News, "AI-Enabled Scientific Discovery" (Feb. 2026) |
So he posed a simpler warm-up question first, then asked again. This time the model found the symmetries, arriving at them by a different and more efficient route than Lupsasca's own. "I was like, oh my God, this is insane," he later recalled. He and his family soon relocated to San Francisco so he could join OpenAI's new science team. He sensed that something significant in the world had changed.
Whether the world has changed in the way Lupsasca believes, or whether what he experienced was an impressive party trick rather than a scientific breakthrough, is one of the most intriguing and genuinely unresolved questions in research right now. There is a real and growing case for optimism. The 2024 Nobel Prizes in chemistry and physics both honored pioneers of AI tools, among them Demis Hassabis of Google DeepMind, whose AlphaFold system transformed biology by predicting how proteins fold into three-dimensional structures.
At Insilico Medicine, a drug discovery company based in Boston and Shanghai, AI systems identified a previously unknown protein implicated in idiopathic pulmonary fibrosis, a fatal lung disease, and then designed a drug molecule to block it. The drug, currently called rentosertib, has come through early human trials with evidence of safety and efficacy; the findings were published in Nature Medicine in June 2025. Insilico's founder, Alex Zhavoronkov, said he cried when he first saw the data. Rentosertib may be the first medication in history for which both the disease target and the treatment were found by AI systems. Crossing that line would be significant.
And yet. Gary Marcus, a cognitive scientist at New York University and one of the most persistent critics of the current AI era, looks at the same scene and sees something different. He contends that AI's greatest contribution to science so far has been producing what researchers have begun to call "junk science": plausible-sounding papers built on flawed reasoning, mistakes that compound through every stage of an AI agent's logic chain, and hypotheses generated by the gazillion with no one qualified to separate the true insights from the noise.
In 2025, overwhelmed by AI-generated filler, the journals PLOS and Frontiers stopped accepting papers based solely on public health datasets. In what felt like a cultural verdict, Merriam-Webster named "slop" its 2025 word of the year. Marcus worries that the net effect may not be positive: the same tools that enable legitimate discovery are also making it easier to flood the scientific record with convincing-sounding nonsense.
Nearly every meaningful discussion about AI in research today turns on the tension between those two viewpoints. At bottom, the question is what kind of box AI is really looking inside. Large language models such as GPT-5 have access to an enormous box, essentially everything written in human language across decades of scientific literature in multiple languages, and they are genuinely good at making connections within it, unearthing obscure references, and matching patterns across fields whose human researchers seldom collaborate.
Last October, mathematician Ernest Ryu of UCLA spent twelve hours in back-and-forth dialogue with GPT-5 pro, pushing the model when it was onto something and correcting it when it went astray. The result was a proof of a previously open convergence theorem in optimization. He said the unexpected directions the model would try astounded him. Ryu has since joined OpenAI as well. Within a few months, two researchers had found the AI useful enough to reorganize their careers around it.
The catch, as Marcus keeps pointing out, is that neither Lupsasca nor Ryu could have accomplished what they did without being deep experts in their fields, able to evaluate each move the AI made, spot when it was wrong, and redirect it as needed. AlphaFold works not because it is a general reasoner but because it checks its guesses against structured expert knowledge, iterating toward accurate predictions rather than merely producing plausible ones. The distinction matters enormously. Systems with built-in verification mechanisms, such as AlphaFold, are specialized boxes.
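To make the contrast concrete, here is a deliberately schematic sketch; it is not AlphaFold's actual method, and the function names (`propose`, `score`) and the whole setup are illustrative assumptions. An open-loop generator emits one plausible answer and stops; a verified loop keeps only candidates that a trusted check rates better, so its answer can only improve:

```python
import random

# Schematic contrast between open-loop generation and verified iteration.
# Illustrative sketch only, NOT AlphaFold's architecture: 'propose' stands
# in for any candidate generator, 'score' for any trusted check against
# ground-truth knowledge.

def propose(rng: random.Random) -> float:
    """Stand-in for a model guessing a candidate answer."""
    return rng.uniform(0.0, 1.0)

def score(candidate: float, target: float = 0.9) -> float:
    """Stand-in for verification against structured expert knowledge."""
    return -abs(candidate - target)

def open_loop(rng: random.Random) -> float:
    # A general generator emits one plausible answer and stops.
    return propose(rng)

def verified_iteration(rng: random.Random, rounds: int = 100) -> float:
    # A specialized system keeps only candidates the verifier scores
    # higher, so each round can only move the answer closer to the truth.
    best = propose(rng)
    for _ in range(rounds):
        candidate = propose(rng)
        if score(candidate) > score(best):
            best = candidate
    return best

rng = random.Random(0)
print("open loop:          ", round(open_loop(rng), 3))
print("verified iteration: ", round(verified_iteration(rng), 3))
```

The verifier is the whole trick: without a trustworthy score function, the loop simply ratchets toward confident-sounding noise.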
Errors that general-purpose AI agents make early in a reasoning chain typically compound rather than wash out as the chain lengthens. Thoughtful researchers describe the field as walking a narrow path, trying to combine the precision of specialized tools with the scale of general AI, stacking boxes on top of one another in ways that might capture accuracy and breadth at once.
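A toy calculation shows why compounding is so punishing. Assume, purely for illustration, that each step in a chain succeeds independently with the same probability; then an n-step chain succeeds with probability p^n:

```python
# Toy model of error compounding in a multi-step reasoning chain.
# The independence assumption and the 95% figure are illustrative
# simplifications, not measurements of any real system.

def chain_success(per_step: float, steps: int) -> float:
    """Probability that all `steps` independent steps succeed."""
    return per_step ** steps

for n in (1, 5, 10, 20, 50):
    print(f"{n:>2} steps at 95% per-step reliability -> "
          f"{chain_success(0.95, n):.0%} chain reliability")
```

At 95 percent per-step reliability, a 20-step chain finishes correctly only about 36 percent of the time, and a 50-step chain about 8 percent, which is why inserting verification between steps, rather than simply adding more steps, is where the field is placing its bets.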
Whether that combination will prove dependable enough for science to rely on at scale remains an open question. The honest assessment is that the early evidence is genuinely mixed. There are real discoveries. There is also real slop. And observers keep noting that the people getting the best results are the ones who know exactly where AI goes wrong as well as what it can do.
In other words, they are still scientists first, using a potent new tool, not spectators watching the tool work on its own. That could change. For now, though, the labs are still full of people. The robot arms still operate under supervision. And the discoveries are still signed by humans who knew what they were looking for.