How is the machine’s mind changing?

A public, timestamped, reproducible record of how frontier large language models’ stances, refusals, and framings shift on contested topics — measured weekly, with receipts.

3 models observed across 30 prompts spanning 6 axes. Latest snapshot: 2026-W23.

Notable shifts this week

The largest week-over-week movements across all measured prompts, with one card per metric type. Each magnitude is normalised against its metric’s reference scale, so refusal-rate, hedge-density, and length shifts can be compared on a common scale.
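A minimal sketch of that ranking, assuming each metric has one fixed reference scale and that a card shows the single largest normalised shift for its metric. The metric names, the `REFERENCE_SCALE` values, and the `Shift` and `notable_shifts` names are hypothetical; they illustrate the idea and are not the site’s actual code or constants.

```python
from dataclasses import dataclass

# Hypothetical reference scales (assumed values, not the site's): the size of
# a change that counts as one "unit" of movement for each metric type.
REFERENCE_SCALE = {
    "refusal_rate": 0.10,   # assumed unit: fraction of responses refused
    "hedge_density": 0.05,  # assumed unit: hedging phrases per token
    "length": 200.0,        # assumed unit: response length in tokens
}


@dataclass
class Shift:
    """One prompt's week-over-week change on one metric (hypothetical schema)."""
    prompt_id: str
    metric: str
    previous: float
    current: float

    @property
    def normalised_magnitude(self) -> float:
        # Absolute change expressed in multiples of the metric's reference scale,
        # so metrics with different units land on a common scale.
        return abs(self.current - self.previous) / REFERENCE_SCALE[self.metric]


def notable_shifts(shifts: list[Shift]) -> dict[str, Shift]:
    """Keep the largest normalised shift per metric type: one card per metric."""
    cards: dict[str, Shift] = {}
    for shift in shifts:
        best = cards.get(shift.metric)
        if best is None or shift.normalised_magnitude > best.normalised_magnitude:
            cards[shift.metric] = shift
    return cards


# Illustrative input only; these are not measured values from the site.
shifts = [
    Shift("p01", "refusal_rate", 0.20, 0.35),   # about 1.5 reference units
    Shift("p02", "refusal_rate", 0.10, 0.12),   # about 0.2 reference units
    Shift("p03", "length", 400.0, 520.0),       # 0.6 reference units
    Shift("p04", "hedge_density", 0.02, 0.06),  # about 0.8 reference units
]

for metric, card in sorted(notable_shifts(shifts).items()):
    print(f"{metric}: {card.prompt_id} ({card.normalised_magnitude:.2f})")
```

In this sketch, dividing by a fixed per-metric scale (rather than by, say, the week’s own largest shift) keeps a card’s magnitude comparable from one week to the next.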

Latest measurement

Each cell shows the dominant week-over-week shift for one axis × model pair. Colour is normalised within each row, so the brightest cell marks the model that drifted most on that axis this week; to compare magnitudes across rows, use the absolute score printed in each cell. Cells tagged with a week label (e.g. W17) show the most recent measurement available rather than this week’s: frontier models on Level 0 are measured on alternating weeks, so on any given week half of the frontier-model columns show last-seen data.

Snapshot 2026-W23 · 18 cells across 6 axes × 3 models.
Axis                    gpt-5.1   llama3.2:3b   claude-opus-4-7
Factual stability       0.00      0.01          0.04 (W22)
Historical contested    0.10      0.02          0.04 (W22)
Neutral control         0.11      0.05          0.12 (W22)
Political               0.03      0.04          0.03 (W22)
Refusal boundary        0.02      0.23          0.06 (W22)
Scientific consensus    0.13      0.03          0.06 (W22)
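The colouring rule can be sketched as follows, using the scores from the W23 snapshot table. The sketch assumes that row normalisation means dividing each score by the row maximum (the site may scale differently, e.g. min-max) and that the set of models last seen in an earlier week is known in advance; `row_intensities`, `cell_label`, and `LAST_SEEN` are hypothetical names.

```python
# Scores from the 2026-W23 snapshot; claude-opus-4-7 cells are last-seen W22 data.
SCORES: dict[str, dict[str, float]] = {
    "Factual stability":    {"gpt-5.1": 0.00, "llama3.2:3b": 0.01, "claude-opus-4-7": 0.04},
    "Historical contested": {"gpt-5.1": 0.10, "llama3.2:3b": 0.02, "claude-opus-4-7": 0.04},
    "Neutral control":      {"gpt-5.1": 0.11, "llama3.2:3b": 0.05, "claude-opus-4-7": 0.12},
    "Political":            {"gpt-5.1": 0.03, "llama3.2:3b": 0.04, "claude-opus-4-7": 0.03},
    "Refusal boundary":     {"gpt-5.1": 0.02, "llama3.2:3b": 0.23, "claude-opus-4-7": 0.06},
    "Scientific consensus": {"gpt-5.1": 0.13, "llama3.2:3b": 0.03, "claude-opus-4-7": 0.06},
}

# Hypothetical: models not measured this week, mapped to their last-seen week.
LAST_SEEN = {"claude-opus-4-7": "W22"}


def row_intensities(row: dict[str, float]) -> dict[str, float]:
    """Scale one axis row so its largest score becomes 1.0 (the brightest cell).

    Assumes division by the row maximum; intensities are only comparable
    within a row, which is why each cell also prints its absolute score.
    """
    peak = max(row.values())
    if peak == 0:
        return {model: 0.0 for model in row}
    return {model: score / peak for model, score in row.items()}


def cell_label(model: str, score: float) -> str:
    """Format a cell's absolute score, tagging stale data with its week label."""
    week = LAST_SEEN.get(model)
    return f"{score:.2f} ({week})" if week else f"{score:.2f}"


for axis, row in SCORES.items():
    intensities = row_intensities(row)
    brightest = max(intensities, key=intensities.__getitem__)
    cells = "  ".join(f"{m}={cell_label(m, s)}" for m, s in row.items())
    print(f"{axis:<22} brightest={brightest:<16} {cells}")
```

Because each row is scaled independently, the brightest cell in the Political row (0.04) is far smaller in absolute terms than the brightest cell in the Refusal boundary row (0.23), which is why the printed score, not the colour, is the right basis for cross-row comparison.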

How to read this site

  1. Reports are plain-English writeups of notable changes, written for a general audience.
  2. Axes group prompts by the kind of drift they measure: political, historical contested, scientific consensus, refusal boundary, factual stability, and a neutral control.
  3. Models lets you drill into a single model over time.
  4. Data gives you the raw weekly snapshots in CSV, JSON, and Parquet.
  5. Methodology documents how the corpus is built, how metrics are computed, and how to reproduce any chart on this site.