
A zero-parameter algorithm that copies directly from its own input context outperforms every major time-series foundation model on predicting chaotic systems, turbulence, coupled oscillators, and electrocardiograms. It costs six orders of magnitude less to run. The paper, accepted at ICLR 2026, is not proposing a replacement for foundation models. It is exposing what those models actually do when they appear to work, and it is not what anyone assumed.
Yuanzhao Zhang of the Santa Fe Institute and William Gilpin of the University of Texas at Austin built the simplest possible forecasting algorithm. Given a time series context, scan it for nearly repeating motifs. Find the best match to the current state. Copy whatever came after that match as your prediction. No learned weights. No training data. No gradient descent. The entire algorithm is a nearest-neighbor lookup in delay-coordinate space, executable on a CPU in milliseconds. They call it context parroting.
They tested it against Chronos, Chronos Bolt, TimesFM, TimeMoE, and Moirai across chaotic attractors (Lorenz, Rössler, double pendulum), turbulent fluid dynamics, coupled Kuramoto oscillators, and real-world EKG recordings. Context parroting won on both forecast error (sMAPE) and attractor reconstruction fidelity (KL divergence) across every system tested. The computational gap ranged from five to six orders of magnitude.
How Context Parroting Works
The algorithm operates in delay-coordinate embedding space, a technique from nonlinear dynamics dating to Takens' 1981 embedding theorem. Given a scalar time series x(t), construct delay vectors by taking D consecutive values: [x(t), x(t+1), …, x(t+D-1)]. Each delay vector represents the state of the system at time t in a D-dimensional space. Takens proved that for a deterministic system with an attractor of dimension d, choosing D greater than 2d suffices to reconstruct the topology of the attractor from the scalar measurements alone.
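As a concrete sketch, the delay vectors described above take a few lines of NumPy (the embedding dimension D and the toy series here are illustrative choices, not values from the paper):

```python
import numpy as np

def delay_embed(x: np.ndarray, D: int) -> np.ndarray:
    """Stack D consecutive samples into delay vectors.

    Row t is [x(t), x(t+1), ..., x(t+D-1)], so a scalar series of
    length N becomes N - D + 1 points in a D-dimensional space.
    """
    N = len(x)
    return np.stack([x[t:t + D] for t in range(N - D + 1)])

x = np.sin(np.linspace(0.0, 20.0, 200))  # toy scalar series
V = delay_embed(x, D=3)
print(V.shape)  # (198, 3)
```

Each row of V is one reconstructed state; nearby rows correspond to nearby states of the underlying system, which is what the nearest-neighbor step below relies on.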
Context parroting uses this embedding to find the best match to the current state within the context window. The algorithm constructs delay vectors from the entire context, computes the Euclidean distance between the most recent delay vector and every earlier delay vector, finds the nearest neighbor, and copies the trajectory following that neighbor as the forecast. If the nearest neighbor occurred at time t* in the context, the forecast is simply x(t*+1), x(t*+2), and so on for as many steps as needed.
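A minimal implementation of that lookup, assuming Euclidean distance and a fixed embedding dimension D (both choices are ours for illustration; the paper's exact hyperparameters may differ):

```python
import numpy as np

def parrot_forecast(context: np.ndarray, D: int, horizon: int) -> np.ndarray:
    """Forecast by copying what followed the nearest delay-coordinate
    match to the current state. No weights, no training: one
    nearest-neighbor lookup over the context window.
    """
    N = len(context)
    query = context[N - D:]  # the most recent delay vector
    best_dist, t_star = np.inf, 0
    # Only consider matches with room to copy `horizon` future steps.
    for t in range(N - D - horizon + 1):
        dist = np.linalg.norm(context[t:t + D] - query)
        if dist < best_dist:
            best_dist, t_star = dist, t
    start = t_star + D  # copy the trajectory that followed the match
    return context[start:start + horizon]

# On an exactly periodic toy signal, the parrot is near perfect:
t = np.arange(350)
series = np.sin(2 * np.pi * t / 50)  # period = 50 samples
forecast = parrot_forecast(series[:300], D=10, horizon=20)
print(np.max(np.abs(forecast - series[300:320])))  # ~0
```

The brute-force loop is O(N·D) per forecast; for long contexts a k-d tree or vectorized distance computation would do the same lookup faster without changing the algorithm.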
This is mathematically identical to a first-order local model in the sense of Farmer and Sidorowich (1987), one of the foundational methods in nonlinear time series prediction. The difference is that context parroting runs entirely inside the context window with no separate training phase. It is, functionally, an in-context nearest-neighbor algorithm.
The connection to Takens embedding is not incidental. It is the reason the method works at all. Takens' theorem guarantees that the delay-coordinate reconstruction is diffeomorphic to the original attractor: nearby points in the reconstruction correspond to nearby points on the true attractor, which means nearby states evolve similarly in time. This is why nearest-neighbor forecasting in delay space produces accurate predictions: it exploits the geometric continuity of the dynamics. Without the embedding theorem, copying from the nearest neighbor would be guesswork. With it, copying is a geometrically principled operation grounded in 45 years of dynamical systems theory.
Why It Beats Foundation Models
The paper identifies a specific failure mode shared by TimesFM, TimeMoE, and Chronos Bolt: they systematically underestimate oscillations and converge toward the mean. Given a chaotic system that swings between extremes, the foundation models predict a trajectory that dampens too quickly and settles near the average value. This is consistent with training objectives that minimize average prediction error across diverse datasets. Predicting the mean is the safest strategy for minimizing loss across many different distributions. It is also the wrong strategy for any specific dynamical system.
Chronos is the exception. It performs well precisely because it implements something close to parroting internally. The paper shows that Chronos frequently copies motifs from the context window when forecasting chaotic systems. When Chronos works, it works because it parrots. When foundation models fail, they fail because they don’t parrot enough and instead fall back on mean-convergent predictions learned from pretraining.
This explains a finding that puzzled the time-series community: large language models trained on text, with no time series in their training data, can sometimes forecast dynamical systems competitively. The mechanism is induction heads, the attention pattern that identifies repeated sequences and copies what follows. Induction heads are a form of context parroting. LLMs can forecast time series not because they understand physics but because they learned to copy repeating patterns from text, and that same copy mechanism transfers to time series.
The Fractal Dimension Scaling Law
The paper’s most original contribution is connecting forecast accuracy to the fractal dimension of the underlying attractor. Context parroting works by finding near-recurrences in the context. The Poincaré recurrence theorem guarantees that an ergodic system will eventually return arbitrarily close to any previous state, but the waiting time depends on the dimensionality of the attractor. For a system with correlation dimension d, the expected recurrence time, and hence the required context length, scales as L ~ epsilon^(-d), where epsilon is the matching tolerance.
This produces a scaling law: forecast accuracy improves as a power law in context length, with the exponent determined by the fractal dimension of the attractor. Low-dimensional chaotic systems (Lorenz, d approximately 2.05) need short contexts for accurate parroting. High-dimensional systems (turbulence, d much larger) need context lengths that grow exponentially with dimension. The paper validates this scaling law empirically across multiple systems and shows it explains previously observed in-context neural scaling laws for time series forecasting.
The practical implication is quantitative. For a system with known fractal dimension, you can calculate exactly how much context data you need for parroting to reach a target accuracy. This is something no foundation model can tell you because their performance depends on training data composition, not on the mathematical structure of the target system.
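Under the L ~ epsilon^(-d) scaling, the back-of-envelope calculation looks like this (the unit prefactor is our assumption; the scaling law fixes only the exponent, so treat the outputs as orders of magnitude):

```python
def required_context_length(d: float, eps: float) -> float:
    """Context length needed for a near-recurrence within matching
    tolerance eps, from the L ~ eps^(-d) scaling. The prefactor is
    set to 1 here, so the result is an order-of-magnitude estimate."""
    return eps ** (-d)

# Lorenz (d ~ 2.05) vs a hypothetical higher-dimensional system (d = 6),
# both at a 1% matching tolerance:
print(f"{required_context_length(2.05, 0.01):.2e}")  # ~1.3e4 points
print(f"{required_context_length(6.0, 0.01):.2e}")   # 1.0e12 points
```

The eight-orders-of-magnitude gap between the two systems is the whole story of when parroting is feasible: for turbulence-scale dimensions, no realistic context window contains a near-recurrence.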
What This Does Not Mean
The authors state explicitly that they are not proposing to replace foundation models with context parroting. The value of parroting is as a baseline that reveals gaps. When a foundation model underperforms relative to parroting, it means the model has not learned to use the context data effectively. The failure is not that the model is bad. The failure is that a copy-paste algorithm does better, which means the model is leaving information on the table.
Context parroting has clear limitations. It assumes stationarity: the underlying dynamics must not change over the forecast horizon. It cannot handle distribution shifts, trend changes, or regime transitions. It struggles with non-stationary real-world time series (weather, financial markets, traffic) where the generating process itself evolves. Foundation models handle simple nonstationarity like baseline drift because their pretraining covers such patterns. The authors suggest generalizing parroting to handle nonstationarity as a future direction.
The algorithm also requires that the context contains a near-recurrence of the current state. For high-dimensional systems, the context may not be long enough to contain a good match. In these cases, parroting produces poor forecasts and foundation models that generalize from pretraining would outperform it. The fractal dimension scaling law tells you exactly when this happens: when the required context length exceeds the available context window.
Why This Matters for the Foundation Model Race
Every major AI lab is building or acquiring time-series foundation models. Google has TimesFM. Salesforce has Moirai. Amazon backed Chronos. The premise is that pretraining on massive time series datasets produces models that generalize to unseen systems. Context parroting challenges that premise by showing that, for an important class of systems, generalization from pretraining adds nothing. The context alone is sufficient.
This does not kill the foundation model thesis. It narrows it. Time-series foundation models add value when they handle nonstationarity, distribution shifts, and systems where the context window is too short for recurrence. They fail to add value, and actively harm performance, when the system is stationary and the context contains enough recurrences. Knowing which regime you are in determines whether a billion-parameter model is worth its inference cost.
For practitioners running time series forecasting in production, the actionable takeaway is to benchmark against context parroting before deploying a foundation model. If parroting beats your model, you are paying for compute that is worse than free. If your model beats parroting, you have evidence that pretraining is contributing something beyond pattern copying. Either answer is useful. Not knowing which regime you are in is not.
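A minimal version of that benchmark needs only the sMAPE metric the paper reports; the constant-mean competitor below stands in for any candidate model you want to compare against the parroting baseline:

```python
import numpy as np

def smape(forecast: np.ndarray, truth: np.ndarray) -> float:
    """Symmetric mean absolute percentage error, in percent."""
    return 200.0 * np.mean(np.abs(forecast - truth)
                           / (np.abs(forecast) + np.abs(truth)))

# True continuation of an oscillation (offset so sMAPE is well defined),
# against the mean-convergent forecast the paper identifies as a failure mode:
truth = 2.0 + np.sin(np.linspace(0, 4 * np.pi, 100))
mean_forecast = np.full_like(truth, truth.mean())
print(round(smape(mean_forecast, truth), 1))
```

Run the same metric on the parroting baseline and on your model's forecast over held-out continuations; whichever scores lower on your actual data decides whether the model's inference cost is justified.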
The deeper implication connects to a recurring pattern in machine learning: the simplest baseline, properly constructed, often outperforms complex systems that were never tested against it. When the baseline is missing, the community overestimates how much the complex system has learned. Context parroting fills that gap for time series. The question it forces every foundation model team to answer: what, exactly, did you learn from pretraining that a copy-paste algorithm cannot recover from the context alone?
Sources: arXiv:2505.11349 (Zhang and Gilpin, ICLR 2026). OpenReview (ICLR 2026 acceptance). Takens, “Detecting Strange Attractors in Turbulence” (1981). Farmer and Sidorowich, “Predicting Chaotic Time Series” (1987). Chronos (Ansari et al., 2024). TimesFM (Das et al., 2024). Moirai (Woo et al., 2024).
Santiago Maniches is a researcher and ML practitioner with a background in geometric and topological methods. He writes about AI mechanisms at mywrittenword.com.