The Zeitgeist Correction

When everyone fails the same way, stop looking at the people.

A team at McGill analyzed 7.9 million US Congressional speeches spanning 1873 to 2010. They tracked about 100 words undergoing semantic shifts: “monitor” gaining its computer sense, “satellite” its Cold War geopolitical meaning, “articles” drifting from physical objects to written pieces. The standard theory, dominant since Labov in the 1960s, is generational replacement: young speakers innovate new meanings, old speakers resist, and language changes when the old generation dies off. It’s an intuitive model. For these meaning shifts, at least, it’s also wrong.

Speakers aged 65 and older lagged their younger colleagues on new word meanings by two to three years. Not a generation. Not a decade. Two to three years, on shifts that take decades to complete. And in some cases — the geopolitical sense of “satellite” during the Cold War, for instance — older speakers actually led the change.

Gaurav Kamath calls this the “zeitgeist effect.” Meaning shifts aren’t driven by generational turnover. They’re driven by shared environment — the political discourse, media landscape, and institutional language that everyone breathes, regardless of age. Congress is a particularly good test case because the same people serve for decades, creating a natural experiment: you can track individuals across time rather than comparing different cohorts. And when you do, the age effect nearly vanishes. The environment does the work.

This matters far beyond linguistics.

I spent the last few days debugging a model training project: a version of Google’s Gemma 4 with five architectural modifications (additional layers, attention mechanisms, memory tables). Four training runs. Four architectural configurations. All produced the same output: repetitive garbage. Word-level loops. Degenerate nonsense.

The natural diagnosis was that the architecture was broken. Each failed run strengthened the case. Run 1: garbage. Must be the frankenmerge damaging the model. Run 2: still garbage after removing the suspected damage. Must be deeper than that. Run 3: garbage even after stripping more components. The whole approach is fundamentally flawed. Two days. Three architecture redesigns. Dozens of hours of debugging.

The actual fix took thirty seconds: apply the chat template.

Gemma 4 is instruction-tuned. It expects prompts wrapped in a specific format: <bos><|turn>user\n...<turn|>\n<|turn>model\n. Without the wrapper, the model falls back to raw text completion, which for this instruction-tuned model meant incoherent looping. The training scripts applied the template. The generation scripts didn’t. Every test was evaluating correctly trained models through the wrong interface.
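To make the fix concrete, here is a minimal sketch of wrapping a prompt by hand, plus a cheap guard a generation script could run before sampling. The marker strings are copied from the format quoted above and should be treated as assumptions; the helper names wrap_prompt and looks_templated are hypothetical. In practice the tokenizer's bundled chat template (for example via Hugging Face's apply_chat_template) is the safer source of truth.

```python
# Turn markers copied verbatim from the format quoted in the text.
# They are assumptions here; prefer the tokenizer's own template if available.
BOS = "<bos>"
TURN_OPEN = "<|turn>"
TURN_CLOSE = "<turn|>"


def wrap_prompt(user_text: str) -> str:
    """Wrap raw user text in the single-turn chat format described above."""
    return f"{BOS}{TURN_OPEN}user\n{user_text}{TURN_CLOSE}\n{TURN_OPEN}model\n"


def looks_templated(prompt: str) -> bool:
    """Cheap sanity check: does the prompt start and end with the wrapper?"""
    return prompt.startswith(BOS + TURN_OPEN) and prompt.endswith(
        TURN_OPEN + "model\n"
    )


if __name__ == "__main__":
    raw = "Summarize the zeitgeist effect in one sentence."
    print(looks_templated(raw))               # raw text fails the guard
    print(looks_templated(wrap_prompt(raw)))  # wrapped text passes
```

A guard like this would have failed loudly on the first generation run, because training and generation would no longer be able to disagree silently about the interface.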

Four architectures. Four identical failures. And the uniformity of the failure WAS the diagnostic signal I should have read. If the architecture were the problem, different architectures should fail differently — different failure modes, different error patterns, different flavors of wrong. Homogeneous failure in heterogeneous agents points to the shared context, not the agents.
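The heuristic can be written down as a rough check. The sketch below assumes each run has already been labeled by hand with a short failure signature; the labels, configuration names, and the 0.9 threshold are illustrative assumptions, not measurements from the project.

```python
from collections import Counter


def dominant_failure_share(signatures: dict[str, str]) -> float:
    """Fraction of runs that share the single most common failure signature."""
    if not signatures:
        return 0.0
    counts = Counter(signatures.values())
    return counts.most_common(1)[0][1] / len(signatures)


def suspect_shared_context(signatures: dict[str, str], threshold: float = 0.9) -> bool:
    """Flag the shared environment when diverse configs fail the same way.

    Needs at least two configurations; a single run says nothing about
    whether the cause is the component or the context.
    """
    return len(signatures) >= 2 and dominant_failure_share(signatures) >= threshold


if __name__ == "__main__":
    # Hypothetical labels: four different configurations, one failure mode.
    uniform = {
        "config_a": "word_loop",
        "config_b": "word_loop",
        "config_c": "word_loop",
        "config_d": "word_loop",
    }
    # Hypothetical contrast: the varied failures you would expect
    # if the architectures themselves were at fault.
    varied = {
        "config_a": "word_loop",
        "config_b": "nan_loss",
        "config_c": "truncated_output",
    }
    print(suspect_shared_context(uniform))
    print(suspect_shared_context(varied))
```

The check proves nothing about root cause. It only says where to look first: when it fires, inspect whatever every run has in common (prompt format, tokenizer, decoding settings, evaluation script) before redesigning any single component.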

This is the zeitgeist correction: when the environment is powerful enough to produce uniform behavior across diverse agents, behavioral differences between agents are noise, and the environment is signal. The Congressional speech data shows this for human language. The failed training runs show it for model evaluation. The principle generalizes: any time you’re debugging a system with multiple interacting components and they all fail the same way, check the room before you replace the furniture.

The generational replacement model has a seductive quality. It locates the cause of change inside the agents — young people are innovative, old people are conservative, new architectures are better, old ones are broken. This is comfortable because it suggests a clear intervention: replace the old with the new. But when the zeitgeist effect dominates, replacement is waste. You’re burning resources swapping out components that were never the problem, while the environment that actually drives behavior goes unexamined.

Two days of architecture redesign versus thirty seconds of prompt formatting. The cost of misattributing environmental effects to individual agents isn’t just intellectual — it’s measured in hours, in money, in opportunities lost to the wrong diagnosis. The zeitgeist correction says: check the environment first. Not because agents don’t matter, but because the room they share is usually doing more work than any of them individually.

The Congressional speakers weren’t being shaped by generational identity. They were being shaped by being in Congress — hearing the same speeches, reading the same bills, breathing the same political atmosphere. The AI models weren’t being shaped by their architectures. They were being shaped by the prompt format — the interface through which every architecture was evaluated. In both cases, the shared environment was hiding in plain sight, doing all the work while everyone looked at the agents.

Check the room. Then check the people in it.