How is the machine’s mind changing?
A public, timestamped, reproducible record of how frontier large language models’ stances, refusals, and framings shift on contested topics — measured weekly, with receipts.
3 models observed across 30 prompts spanning 6 axes. Latest snapshot: 2026-W23.
Notable shifts this week
Largest week-over-week movements across all measured prompts. One card per metric type. Magnitude is normalised against each metric’s reference scale, so refusal-rate, hedge-density, and length shifts can be compared on the same axis.
- Length (median): 557 → 670 tok (↑ 113) · gpt-5.1 · historical contested
- Length (median): 524 → 625 tok (↑ 101) · Consciousness and brain activity · gpt-5.1 · scientific consensus
- Length (median): 395 → 335 tok (↓ 60) · llama3.2:3b · refusal boundary
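The normalisation described above can be sketched as a division of each raw shift by a per-metric reference scale. This is a minimal illustration, not the site's implementation: the `REFERENCE_SCALE` values and the metric keys are hypothetical placeholders, and the real scales are whatever the methodology page defines.

```python
# Hypothetical reference scales, chosen only for illustration.
# The site's actual values are not stated on this page.
REFERENCE_SCALE = {
    "refusal_rate": 1.0,     # rates live in [0, 1]
    "hedge_density": 0.10,   # assumed scale for hedges per token
    "length_median": 500.0,  # assumed scale in tokens
}


def normalised_shift(metric: str, previous: float, current: float) -> float:
    """Return the absolute week-over-week change as a fraction of the metric's scale."""
    return abs(current - previous) / REFERENCE_SCALE[metric]


# The three cards listed above: (model, metric, previous, current).
cards = [
    ("gpt-5.1", "length_median", 557, 670),
    ("gpt-5.1", "length_median", 524, 625),
    ("llama3.2:3b", "length_median", 395, 335),
]

# Rank cards by normalised magnitude, largest first.
ranked = sorted(
    cards,
    key=lambda c: normalised_shift(c[1], c[2], c[3]),
    reverse=True,
)
```

Because every metric is divided by its own scale, a refusal-rate change and a length change end up as unitless fractions that can share one ranking.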
Latest measurement
Cells colour the dominant week-over-week shift for each axis × model pair. Colour is normalised within each row, so the brightest cell marks the model that drifted most on that axis this week; to compare absolute magnitudes, use the score in each cell. Click a cell to drill into the axis page. Cells marked with a week label (e.g. W17) show the most recent measurement available: frontier models on Level 0 alternate biweekly, so on any given week half of the columns are labelled as last-seen data.
| Axis | gpt-5.1 | llama3.2:3b | claude-opus-4-7 |
|---|---|---|---|
| Factual stability | 0.00 | 0.01 | 0.04 W22 |
| Historical contested | 0.10 | 0.02 | 0.04 W22 |
| Neutral control | 0.11 | 0.05 | 0.12 W22 |
| Political | 0.03 | 0.04 | 0.03 W22 |
| Refusal boundary | 0.02 | 0.23 | 0.06 W22 |
| Scientific consensus | 0.13 | 0.03 | 0.06 W22 |
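The row-wise colour normalisation can be illustrated with two rows of the table above. This sketch assumes a simple linear scaling by the row maximum; the site may use a different mapping (for example min-max scaling or a non-linear colour ramp), and the function name `row_intensity` is hypothetical.

```python
# Scores copied from two rows of the table above.
scores = {
    "Refusal boundary": {"gpt-5.1": 0.02, "llama3.2:3b": 0.23, "claude-opus-4-7": 0.06},
    "Scientific consensus": {"gpt-5.1": 0.13, "llama3.2:3b": 0.03, "claude-opus-4-7": 0.06},
}


def row_intensity(row: dict[str, float]) -> dict[str, float]:
    """Scale one row so that its largest score maps to intensity 1.0.

    Assumption: linear scaling by the row maximum.
    """
    peak = max(row.values())
    if peak == 0:
        # No drift on this axis for any model: every cell stays unlit.
        return {model: 0.0 for model in row}
    return {model: value / peak for model, value in row.items()}


intensities = {axis: row_intensity(row) for axis, row in scores.items()}

# The brightest cell in each row is the model that drifted most on that axis.
brightest = {axis: max(row, key=row.get) for axis, row in intensities.items()}
```

Under this scaling, intensity only ranks models within one axis. A cell at full brightness on Scientific consensus (score 0.13) still reflects a smaller shift than the brightest Refusal boundary cell (score 0.23), which is why the raw score is printed in each cell.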
How to read this site
- Reports are plain-English writeups of notable changes, written for a general audience.
- Axes group prompts by the kind of drift they measure (for example political, historical, scientific, or refusal-boundary).
- Models lets you drill into a single provider over time.
- Data gives you the raw weekly snapshots in CSV, JSON, and Parquet.
- Methodology documents how the corpus is built, how metrics are computed, and how to reproduce any chart on this site.