How is the machine’s mind changing?

A public, timestamped, reproducible record of how frontier large language models’ stances, refusals, and framings shift on contested topics — measured weekly, with receipts.

5 models observed across 30 prompts spanning 6 axes. Latest snapshot: 2026-W29.

Notable shifts this week

Largest week-over-week movements across all measured prompts. One card per metric type. Magnitude is normalised against each metric’s reference scale, so refusal-rate, hedge-density, and length shifts can be compared on the same axis.

Length (median): 1 → 6 tok (↑ 5 tok)
Capital of France · gpt-5.5 · neutral control

Length (median): 34 → 54 tok (↑ 20 tok)
First Moon landing · llama3.2:3b · factual stability

Length (median): 139 → 187 tok (↑ 48 tok)
Graphic adult content · gpt-5.5 · refusal boundary
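
The normalisation described above can be sketched as follows. This is a minimal illustration, assuming each metric has a single fixed reference scale and that the card for a metric is the shift with the largest absolute normalised magnitude. The `REFERENCE_SCALE` values, the metric keys, the `Shift` class and `top_shift_per_metric` are hypothetical names chosen for this sketch, not the site's published code or constants. The length values come from the cards above. The refusal-rate shift is invented purely to show two metrics being compared on one axis.

```python
from dataclasses import dataclass

# HYPOTHETICAL reference scales: how large a change counts as 1.0 for each metric.
# Illustrative assumptions only, not the site's published constants.
REFERENCE_SCALE = {
    "refusal_rate": 1.0,      # share of responses refused, 0..1
    "hedge_density": 0.05,    # hedging phrases per token
    "length_median": 100.0,   # median response length in tokens
}


@dataclass
class Shift:
    prompt: str
    model: str
    metric: str
    before: float
    after: float

    @property
    def normalised(self) -> float:
        """Signed week-over-week change divided by the metric's reference scale."""
        return (self.after - self.before) / REFERENCE_SCALE[self.metric]


def top_shift_per_metric(shifts: list[Shift]) -> dict[str, Shift]:
    """Keep the shift with the largest absolute normalised magnitude for each metric."""
    best: dict[str, Shift] = {}
    for s in shifts:
        current = best.get(s.metric)
        if current is None or abs(s.normalised) > abs(current.normalised):
            best[s.metric] = s
    return best


shifts = [
    # Length values taken from the cards above.
    Shift("Capital of France", "gpt-5.5", "length_median", 1, 6),
    Shift("First Moon landing", "llama3.2:3b", "length_median", 34, 54),
    Shift("Graphic adult content", "gpt-5.5", "length_median", 139, 187),
    # HYPOTHETICAL refusal-rate shift, included only to show cross-metric comparison.
    Shift("hypothetical prompt", "hypothetical-model", "refusal_rate", 0.10, 0.30),
]

for metric, s in top_shift_per_metric(shifts).items():
    print(f"{metric}: {s.prompt} ({s.model}) {s.normalised:+.2f}")
```

Under these assumed scales, a 48-token length change (0.48) outranks a 20-point refusal-rate change (0.20). This is the kind of cross-metric comparison the normalisation is meant to enable. With different reference scales, the ordering could flip.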

Latest measurement

Each cell shows the dominant week-over-week shift for that axis × model pair. Colour is normalised within each row, so the brightest cell marks the model that drifted most on that axis this week. To compare absolute magnitudes across rows, use the score printed in each cell. Cells carrying a week label (e.g. W28) show the most recent measurement available rather than this week's. Frontier models on Level 0 are measured biweekly on alternating weeks, so on any given week several columns show last-seen data.

Snapshot `2026-W29` · 30 cells across 6 axes × 5 models.
Axis	gpt-5.5	llama3.2:3b	claude-opus-4-8	claude-opus-4-7	gpt-5.1
Factual stability	0.02	0.21	0.00 W28	0.08 W26	0.07 W25
Historical contested	0.05	0.06	0.04 W28	0.03 W26	0.07 W25
Neutral control	0.21	0.06	0.00 W28	0.10 W26	0.03 W25
Political	0.08	0.01	0.14 W28	0.03 W26	0.02 W25
Refusal boundary	0.24	0.10	0.80 W28	0.02 W26	0.02 W25
Scientific consensus	0.04	0.08	0.03 W28	0.02 W26	0.01 W25
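
The row normalisation and last-seen fallback described above can be sketched as follows. This is a minimal illustration, assuming each cell score is a non-negative drift magnitude and that each model's history is a mapping from ISO week label to score. The helper names `latest_available` and `row_intensity`, and the per-model history layout, are hypothetical and not the site's code. The scores reproduce the Refusal boundary row of the table above.

```python
# HYPOTHETICAL helpers; names and data layout are assumptions, not the site's code.

def latest_available(history: dict[str, float], current_week: str) -> tuple[str, float]:
    """Return (week, score) for the most recent measurement at or before current_week.

    ISO 8601 week labels such as "2026-W28" compare correctly as strings because
    the week number is always two digits. Raises ValueError if no such week exists.
    """
    week = max(w for w in history if w <= current_week)
    return week, history[week]


def row_intensity(row: dict[str, float]) -> dict[str, float]:
    """Divide each cell by the row maximum so the largest drift on the axis maps to 1.0."""
    peak = max(row.values())
    if peak == 0:
        return {model: 0.0 for model in row}
    return {model: score / peak for model, score in row.items()}


CURRENT_WEEK = "2026-W29"

# Refusal boundary row from the table above, keyed by the week each score was measured.
refusal_history = {
    "gpt-5.5": {"2026-W29": 0.24},
    "llama3.2:3b": {"2026-W29": 0.10},
    "claude-opus-4-8": {"2026-W28": 0.80},
    "claude-opus-4-7": {"2026-W26": 0.02},
    "gpt-5.1": {"2026-W25": 0.02},
}

row: dict[str, float] = {}
labels: dict[str, str] = {}
for model, history in refusal_history.items():
    week, score = latest_available(history, CURRENT_WEEK)
    row[model] = score
    # Only stale cells get a week label such as "W28".
    labels[model] = "" if week == CURRENT_WEEK else week.split("-")[1]

intensity = row_intensity(row)
brightest = max(intensity, key=intensity.get)
print(brightest, labels)
```

Because intensity is relative to the row maximum, claude-opus-4-8 (0.80) is the brightest cell in this row. A 1.0 in a quieter row would represent a much smaller absolute shift, which is why the printed score remains the basis for cross-row comparison.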

How to read this site

Reports are plain-English writeups of notable changes, written for a general audience.
Axes group prompts by the kind of drift they probe (factual stability, contested history, political, refusal boundary, and scientific consensus), plus a neutral control axis.
Models lets you drill into a single model over time.
Data gives you the raw weekly snapshots in CSV, JSON, and Parquet.
Methodology documents how the corpus is built, how metrics are computed, and how to reproduce any chart on this site.