How is the machine’s mind changing?
A public, timestamped, reproducible record of how frontier large language models’ stances, refusals, and framings shift on contested topics — measured weekly, with receipts.
3 models observed across 30 prompts spanning 6 axes. Latest snapshot: 2026-W22.
Notable shifts this week
Largest week-over-week movements across all measured prompts, one card per metric type. Magnitude is normalised against each metric’s reference scale, so shifts in refusal rate, hedge density, and length can be compared on the same axis.
- Length (median): 44 → 15 tok (↓ 28) · claude-opus-4-7 · neutral control
- Length (median): 156 → 181 tok (↑ 24) · claude-opus-4-7 · refusal boundary
- Length (median): 73 → 83 tok (↑ 10) · llama3.2:3b · neutral control
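The ranking behind these cards is not published on this page. The Python below is a minimal sketch under stated assumptions: each metric type has a fixed reference scale (the `REFERENCE_SCALE` values are hypothetical placeholders), a shift's normalised magnitude is |after − before| divided by that scale, and the page keeps the largest normalised shift per metric type. `Shift`, `length_shift`, and `largest_per_metric` are illustrative names, not the site's code.

```python
from dataclasses import dataclass
from statistics import median

# Hypothetical reference scales, one per metric type (placeholder values,
# not the site's): a shift equal to the scale has normalised magnitude 1.0.
REFERENCE_SCALE = {
    "refusal_rate": 1.0,     # fraction of prompts refused, in [0, 1]
    "hedge_density": 0.1,    # hedging phrases per token (assumed unit)
    "length_median": 100.0,  # median response length in tokens
}


@dataclass
class Shift:
    metric: str
    model: str
    axis: str
    before: float  # last week's value
    after: float   # this week's value

    @property
    def delta(self) -> float:
        return self.after - self.before

    @property
    def normalised(self) -> float:
        # |delta| in units of the metric's reference scale, so shifts
        # measured in different units can be ranked against each other.
        return abs(self.delta) / REFERENCE_SCALE[self.metric]


def length_shift(model: str, axis: str,
                 last_week: list[int], this_week: list[int]) -> Shift:
    """Week-over-week change in median response length (tokens)."""
    return Shift("length_median", model, axis,
                 median(last_week), median(this_week))


def largest_per_metric(shifts: list[Shift]) -> list[Shift]:
    """Keep the largest normalised shift per metric type, biggest first."""
    best: dict[str, Shift] = {}
    for s in shifts:
        if s.metric not in best or s.normalised > best[s.metric].normalised:
            best[s.metric] = s
    return sorted(best.values(), key=lambda s: s.normalised, reverse=True)


# Toy data (invented for illustration, not taken from the snapshot).
shifts = [
    Shift("length_median", "model-a", "neutral control", 40, 20),
    Shift("length_median", "model-b", "refusal boundary", 100, 130),
    Shift("refusal_rate", "model-a", "refusal boundary", 0.10, 0.30),
]
for card in largest_per_metric(shifts):
    print(f"{card.metric}: {card.before:g} → {card.after:g} "
          f"({card.delta:+g}) · {card.model} · {card.axis}")
```

Dividing by a fixed per-metric scale is the simplest way to put token counts and rates on one axis; a z-score against each metric's historical spread would be an equally plausible choice.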
Latest measurement
Each cell is coloured by the dominant week-over-week shift for that axis × model pair. Colour is normalised within each row, so the brightest cell marks the model that drifted most on that axis this week; compare absolute magnitudes using the score printed in each cell. Click a cell to open its axis page; the Methodology page explains how the score is computed. Cells tagged with a week label (e.g. W21) show the most recent measurement available rather than this week’s: frontier models on Level 0 are measured biweekly on alternating weeks, so half of their columns carry a last-seen label in any given week.
| Axis | claude-opus-4-7 | llama3.2:3b | gpt-5.1 |
|---|---|---|---|
| Factual stability | 0.04 | 0.01 | 0.18 (W21) |
| Historical contested | 0.04 | 0.00 | 0.01 (W21) |
| Neutral control | 0.12 | 0.08 | 0.12 (W21) |
| Political | 0.03 | 0.05 | 0.07 (W21) |
| Refusal boundary | 0.06 | 0.03 | 0.03 (W21) |
| Scientific consensus | 0.06 | 0.05 | 0.04 (W21) |
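The row normalisation can be sketched in a few lines of Python. This assumes a cell's colour intensity is its score divided by the largest score in the same row, and that a cell carries a week label whenever its measurement is older than the current snapshot (W22); both are assumptions about the rendering, and `Cell`, `row_intensities`, and `cell_label` are hypothetical names. The scores are the Factual stability row from the table above.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    score: float         # drift score shown in the cell
    measured_week: str   # week of the measurement, e.g. "W21"


def row_intensities(cells: dict[str, Cell]) -> dict[str, float]:
    """Scale each cell to [0, 1] relative to the largest score in its row."""
    peak = max(c.score for c in cells.values())
    if peak == 0:
        # A row with no drift at all stays uncoloured.
        return {model: 0.0 for model in cells}
    return {model: c.score / peak for model, c in cells.items()}


def cell_label(cell: Cell, current_week: str) -> str:
    """Print the score, tagged with its week when it is last-seen data."""
    if cell.measured_week == current_week:
        return f"{cell.score:.2f}"
    return f"{cell.score:.2f} ({cell.measured_week})"


# Factual stability row from the W22 snapshot; gpt-5.1 was last seen in W21.
factual = {
    "claude-opus-4-7": Cell(0.04, "W22"),
    "llama3.2:3b": Cell(0.01, "W22"),
    "gpt-5.1": Cell(0.18, "W21"),
}
intensity = row_intensities(factual)
brightest = max(intensity, key=intensity.get)
labels = {model: cell_label(c, "W22") for model, c in factual.items()}
print(brightest, labels)
```

The page does not say whether last-seen cells take part in the row maximum; this sketch includes them, which is why gpt-5.1's W21 score sets the brightest cell here.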
How to read this site
- Reports are plain-English writeups of notable changes, written for a general audience.
- Axes group prompts by the kind of drift they measure (e.g. political, historical, scientific consensus, refusal boundary).
- Models lets you follow a single model over time.
- Data gives you the raw weekly snapshots in CSV, JSON, and Parquet.
- Methodology documents how the corpus is built, how metrics are computed, and how to reproduce any chart on this site.