
Epoch AI confirmed on March 24, 2026, that OpenAI’s GPT-5.4 Pro solved an open mathematical problem that had resisted human attempts since 2019. The problem is a Ramsey-style challenge on hypergraphs, contributed by mathematicians Will Brian and Paul Larson, asking for improved lower bounds on a sequence called H(n) that arises in the study of simultaneous convergence of infinite series. No mathematician had found a valid construction. GPT-5.4 Pro produced one.
On the same day, a threat actor called TeamPCP published backdoored versions of LiteLLM, one of the most widely used Python libraries in the AI ecosystem, to PyPI. The malicious code harvested SSH keys, cloud credentials, Kubernetes secrets, and cryptocurrency wallets from every machine that installed it. The two stories appear unrelated. They are not. Both reveal the same thing: the AI ecosystem is maturing fast, and its attack surface is growing as fast as its capabilities.
How GPT-5.4 Pro Cracked the Hypergraph Problem
The FrontierMath benchmark, developed by Epoch AI with funding from OpenAI, consists of 350 original mathematics problems spanning difficulty from undergraduate level through active research. The benchmark uses a strict criterion for its Open Problems tier: only questions with no known solution, where a correct answer would constitute a publishable result.
The solved problem asks for constructions of large hypergraphs that avoid a certain partition property. More specifically, it requires finding improved lower bounds on H(n), a sequence defined in a 2019 paper by Brian and Larson that relates to how sets of infinite series can converge simultaneously. The existing lower bound constructions contained an inefficiency that no one had been able to eliminate.
Kevin Barreto and Liam Price, two researchers who had developed a prompting workflow through prior work on Erdős-type combinatorics problems, guided GPT-5.4 Pro to a solution. The model produced a Python program that constructs the relevant hypergraphs, establishing that H(n) is at least (26/25) times the previously known lower bound k_n for n greater than or equal to 15. The process consumed approximately 250,000 tokens. This places the solution closer to computational combinatorics than to traditional proof writing; it is a construction, not a proof of a general theorem.
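The published construction is not reproduced here, but the general shape of this kind of computational solve — generate a candidate hypergraph, then exhaustively verify it has the required partition behavior — can be sketched at toy scale. Everything below is illustrative: the property checked is a generic Ramsey-style one (every 2-coloring of the vertices leaves some edge monochromatic), not the specific Brian–Larson property, and the edge family is a textbook pigeonhole example.

```python
from itertools import combinations, product

def has_monochromatic_edge(edges, coloring):
    """True if some edge is entirely one color under `coloring`."""
    return any(len({coloring[v] for v in e}) == 1 for e in edges)

def ramsey_witness(n, edges):
    """Verify by brute force that EVERY 2-coloring of n vertices
    leaves at least one edge of the hypergraph monochromatic."""
    return all(
        has_monochromatic_edge(edges, coloring)
        for coloring in product((0, 1), repeat=n)
    )

# Toy example: all 3-element subsets of 5 vertices. By pigeonhole, any
# 2-coloring of 5 vertices puts 3 vertices in one color class, and that
# 3-set is an edge -- so the property holds for all 32 colorings.
edges = list(combinations(range(5), 3))
print(ramsey_witness(5, edges))  # True
```

A real solve of this type replaces the toy property with the paper's exact partition condition and scales the verification, but the work divides the same way: a construction step and a mechanical check.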
Epoch AI then built a systematic evaluation scaffold and ran four other frontier models through the same problem. Anthropic’s Opus 4.6, Google’s Gemini 3.1 Pro, and OpenAI’s GPT-5.4 at extended compute all produced valid solutions on some attempts. The fact that four independent systems solved the same previously unsolved problem suggests that mathematical reasoning at this level has become a shared property of frontier language models rather than a quirk of one architecture.
The Numbers Behind the Acceleration
FrontierMath scores on Tiers 1 through 3 jumped from roughly 5% under GPT-4 in late 2024 to 50% under GPT-5.4 Pro in March 2026. That is a tenfold improvement in under two years on problems that require multiple hours of effort from a professional mathematician. On the hardest Tier 4 problems, which can require days of specialist work, GPT-5.4 Pro scores 38%.
The broader pattern is also worth tracking. Since Christmas 2025, 15 open mathematical problems have moved from unsolved to solved, according to Epoch’s accounting. Eleven of those 15 involved AI systems. One separate Tier 4 solve turned out to involve GPT-5.4 finding a 2011 preprint that the problem’s own author did not know existed. The model performed novel literature archaeology, surfacing a shortcut that a human specialist had missed for years despite working in the exact same subfield.
The Caveats That Matter
Epoch categorized the solved problem as “Moderately Interesting” within its difficulty taxonomy. When Epoch ran GPT-5.4 Pro on the full Open Problems set, it did not solve any other problems. On one problem it made novel observations, but of a form the author had anticipated and characterized as “relatively uninteresting.”
Mathematicians remain divided on the practical significance. Terence Tao has expressed optimism about collaborative potential between AI and human mathematicians. Joel David Hamkins has called AI usefulness for his own research “basically zero.” The truth likely depends on the subfield. Problems with a computational flavor, where constructing an explicit example constitutes progress, are more accessible to current AI systems than problems requiring conceptual leaps or novel proof strategies.
The verification infrastructure around these results is also becoming more serious. Epoch operates independently, the evaluation scaffold was built after the initial solve, and the multi-model replication provides evidence against contamination or overfitting to a single system. These are good scientific practices. They were missing from earlier AI capability claims.
LiteLLM: When the AI Toolchain Becomes the Attack Vector
While the FrontierMath result showed what AI systems can build, the LiteLLM compromise showed how fragile the infrastructure around them remains.
LiteLLM is an open-source Python library that provides a unified API gateway to dozens of LLM providers. It processes roughly 3.4 million downloads per day and 95 million per month. It sits at the center of countless AI agent frameworks, MCP servers, and orchestration pipelines, often holding API keys for multiple model providers simultaneously.
On March 24, TeamPCP published versions 1.82.7 and 1.82.8 to PyPI. The attack chain began on March 19, when the same group compromised Aqua Security’s Trivy vulnerability scanner by force-pushing malicious release tags in its GitHub Action repository. On March 23, TeamPCP hit Checkmarx’s KICS security scanner using the same technique. Through the Trivy compromise, the attackers extracted LiteLLM’s PyPI publishing token from its CI/CD pipeline. They used that token to push malicious packages under the legitimate maintainer’s account.
The payload was a .pth file called litellm_init.pth. Python automatically processes all .pth files when the interpreter starts. This means the malicious code executed on every Python process startup, even if LiteLLM was never explicitly imported. The payload operated in three stages, according to analysis by Snyk and Endor Labs. First, it harvested credentials: SSH keys, AWS tokens, Google Cloud service account files, Azure secrets, Kubernetes configs, database passwords, Slack and Discord tokens, cryptocurrency wallet files, shell history, and cloud metadata endpoints. Second, it attempted lateral movement across Kubernetes clusters by deploying privileged pods to every node. Third, it installed a persistent systemd backdoor that polled for additional payloads.
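The .pth hook is easy to demonstrate benignly. Python's site module executes any line in a .pth file that begins with `import` when it processes a site directory — that is what turns a dropped file into startup-time code execution with no explicit import of the package. `site.addsitedir` triggers the same processing on demand, so the mechanism can be shown without touching the real site-packages directory:

```python
import os
import site
import sys
import tempfile

# Write a .pth file into a scratch directory. Any line starting with
# "import " is executed by the interpreter's site machinery.
d = tempfile.mkdtemp()
with open(os.path.join(d, "demo_init.pth"), "w") as f:
    f.write("import sys; sys._pth_demo_ran = True\n")

# At startup, site-packages is scanned automatically; addsitedir runs
# the identical .pth processing on an arbitrary directory.
site.addsitedir(d)

print(getattr(sys, "_pth_demo_ran", False))  # True
```

In the attack, the executed line pulled in the harvesting payload instead of setting a flag, which is why merely having the package installed — not using it — was sufficient for compromise.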
The compromised versions were live on PyPI for approximately three hours before FutureSearch researcher Callum McMahon discovered the attack. McMahon found it only because his Cursor IDE pulled litellm in as a transitive dependency through an MCP plugin. The maintainer’s GitHub issue about the compromise (issue #24512) was closed as “not planned” by the still-compromised account. A commit pushed to a forked repository under the maintainer’s name read: “teampcp owns BerriAI.”
Why This Attack Pattern Will Repeat
LiteLLM is not the first AI library hit by a supply chain attack, and it will not be the last. In 2024, the Ultralytics AI library was compromised through a similar CI/CD exploit. The pattern is consistent: attackers target widely trusted packages that sit between applications and sensitive infrastructure, using compromised build pipelines rather than direct code injection.
What makes AI tooling a particularly attractive target is the credential density. A typical LiteLLM deployment stores API keys for OpenAI, Anthropic, Google, and potentially a dozen other providers. Compromising one package in this position exposes a broader credential set than most individual applications would.
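The credential-density point can be made concrete. A gateway like LiteLLM typically reads provider keys from standard environment variables, so a single process-level compromise sees all of them at once. The variable names below are the providers' conventional ones; the list is illustrative, not exhaustive:

```python
import os

# Standard provider environment variables a multi-provider gateway
# typically reads (illustrative subset).
PROVIDER_KEYS = [
    "OPENAI_API_KEY",
    "ANTHROPIC_API_KEY",
    "GEMINI_API_KEY",
    "AZURE_API_KEY",
    "COHERE_API_KEY",
]

def exposed_credentials(env=os.environ):
    """Return the provider keys a process-level compromise would
    find in one place in this environment."""
    return [k for k in PROVIDER_KEYS if k in env]
```

Each key present multiplies the blast radius: stealing one environment is equivalent to stealing an account at every provider it names.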
The window between compromise and detection was three hours. Given 3.4 million daily downloads, even that short window represents significant exposure. Organizations that installed or upgraded litellm via pip on March 24 between 10:39 and 16:00 UTC should treat affected systems as compromised and rotate all credentials immediately. The official LiteLLM Proxy Docker image was not affected because its deployment path pins dependencies in requirements.txt.
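A first-pass audit is a one-liner against installed package metadata. The compromised version numbers below are the ones reported above; the helper names are mine, and a version match is grounds for credential rotation, not the full incident response:

```python
from importlib import metadata

# Backdoored releases reported in the advisories above.
COMPROMISED_LITELLM = {"1.82.7", "1.82.8"}

def is_compromised(version: str) -> bool:
    """True if this litellm version string is a known-bad release."""
    return version in COMPROMISED_LITELLM

def installed_litellm_version():
    """Installed litellm version, or None if the package is absent."""
    try:
        return metadata.version("litellm")
    except metadata.PackageNotFoundError:
        return None

v = installed_litellm_version()
if v is not None and is_compromised(v):
    print(f"litellm {v} is a backdoored release -- rotate all credentials")
```

Pinning exact versions (with hashes) in requirements files, as the official Docker image's deployment path did, is what kept that path out of the blast radius: a pinned install never resolves to a newly pushed release.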
Sources: Epoch AI, Epoch AI (Open Problems), Winbuzzer, EMSI, Snyk, BleepingComputer, FutureSearch, LiteLLM, Sonatype