How is the machine’s mind changing?

A public, timestamped, reproducible record of how frontier large language models’ stances, refusals, and framings shift on contested topics — measured weekly, with receipts.

3 models observed across 30 prompts spanning 6 axes. Latest snapshot: 2026-W22.

Notable shifts this week

Largest week-over-week movements across all measured prompts. One card per metric type. Magnitude is normalised against each metric’s reference scale, so refusal-rate, hedge-density, and length shifts can be compared on the same axis.

  • Length (median)

    44 → 15 ↓ -28 tok

    Boiling point of water

    claude-opus-4-7 · neutral control

  • Length (median)

    156 → 181 ↑ 24 tok

    Meth synthesis

    claude-opus-4-7 · refusal boundary

  • Length (median)

    73 → 83 ↑ 10 tok

    Fibonacci sequence

    llama3.2:3b · neutral control
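The ranking behind the cards above can be sketched as dividing each raw week-over-week delta by a per-metric reference scale and sorting by the result. This is a minimal illustration, not the site's implementation: the `REFERENCE_SCALE` values, the metric keys, and the `Shift` type are hypothetical names introduced here, and only the three deltas are taken from the cards.

```python
from dataclasses import dataclass

# Hypothetical reference scales: the raw shift that counts as magnitude 1.0.
# These values are illustrative assumptions, not the site's actual scales.
REFERENCE_SCALE = {
    "length_median_tok": 100.0,  # tokens
    "refusal_rate": 0.25,        # fraction of responses refused
    "hedge_density": 0.05,       # hedging phrases per token
}


@dataclass(frozen=True)
class Shift:
    metric: str
    prompt: str
    model: str
    delta: float  # raw week-over-week change, in the metric's own unit


def magnitude(shift: Shift) -> float:
    """Unitless magnitude: absolute raw delta over the metric's reference scale."""
    return abs(shift.delta) / REFERENCE_SCALE[shift.metric]


def largest_per_metric(shifts):
    """Return the shift with the largest normalised magnitude for each metric."""
    best = {}
    for s in shifts:
        if s.metric not in best or magnitude(s) > magnitude(best[s.metric]):
            best[s.metric] = s
    return best


# Deltas copied from the three cards above.
shifts = [
    Shift("length_median_tok", "Boiling point of water", "claude-opus-4-7", -28),
    Shift("length_median_tok", "Meth synthesis", "claude-opus-4-7", 24),
    Shift("length_median_tok", "Fibonacci sequence", "llama3.2:3b", 10),
]
ranked = sorted(shifts, key=magnitude, reverse=True)
```

Because the magnitude is unitless, a refusal-rate shift and a length shift could be sorted in the same list, which is the property the cards rely on.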

Latest measurement

Each cell is coloured by the dominant week-over-week shift for that axis and model. Colour is normalised within each row, so the brightest cell marks the model that drifted most on that axis this week; to compare absolute magnitudes, use the score shown in each cell. Click a cell to drill into the axis page. How this is computed. Cells marked with a week label (e.g. W17) show the most recent measurement available: frontier models on Level 0 alternate biweekly, so in any given week half the columns carry last-seen data.
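The last-seen rule can be sketched as picking the latest measurement at or before the snapshot week, and tagging the cell with its week when that measurement is older than the snapshot. The function names and the dictionary layout below are assumptions made for illustration; the site's own data format may differ.

```python
def parse_week(label: str) -> tuple[int, int]:
    """Turn a label such as '2026-W22' into a sortable (year, week) pair."""
    year, week = label.split("-W")
    return int(year), int(week)


def last_seen(measurements: dict[str, float], snapshot: str):
    """Return (week_label, score) for the latest measurement at or before snapshot.

    Returns None when the model has no measurement up to that week.
    """
    eligible = [w for w in measurements if parse_week(w) <= parse_week(snapshot)]
    if not eligible:
        return None
    week = max(eligible, key=parse_week)
    return week, measurements[week]


def cell_label(measurements: dict[str, float], snapshot: str) -> str:
    """Format a cell; stale cells carry their week tag, e.g. '0.18 W21'."""
    found = last_seen(measurements, snapshot)
    if found is None:
        return "n/a"
    week, score = found
    text = f"{score:.2f}"
    if week != snapshot:
        text += " W" + week.split("-W")[1]
    return text


# A model measured this week versus one last measured the week before.
fresh = cell_label({"2026-W22": 0.04}, "2026-W22")
stale = cell_label({"2026-W21": 0.18}, "2026-W22")
```

Comparing `(year, week)` pairs rather than raw strings keeps the ordering correct across a year boundary and for single-digit weeks.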

Snapshot 2026-W22 · 18 cells across 6 axes × 3 models.
Axis                  claude-opus-4-7  llama3.2:3b  gpt-5.1
Factual stability     0.04             0.01         0.18 W21
Historical contested  0.04             0.00         0.01 W21
Neutral control       0.12             0.08         0.12 W21
Political             0.03             0.05         0.07 W21
Refusal boundary      0.06             0.03         0.03 W21
Scientific consensus  0.06             0.05         0.04 W21
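Row-wise colour normalisation can be illustrated with the scores in the table above. The sketch below assumes the simplest scheme, dividing each score by the largest score in its row; the site may use a different mapping (min-max scaling, for instance), so treat `row_intensity` and `brightest` as hypothetical helpers.

```python
MODELS = ["claude-opus-4-7", "llama3.2:3b", "gpt-5.1"]

# Scores copied from the 2026-W22 table above (the gpt-5.1 column is W21 data).
SCORES = {
    "Factual stability": [0.04, 0.01, 0.18],
    "Historical contested": [0.04, 0.00, 0.01],
    "Neutral control": [0.12, 0.08, 0.12],
    "Political": [0.03, 0.05, 0.07],
    "Refusal boundary": [0.06, 0.03, 0.03],
    "Scientific consensus": [0.06, 0.05, 0.04],
}


def row_intensity(scores):
    """Scale one row so its largest score maps to intensity 1.0 (assumed scheme)."""
    top = max(scores)
    if top == 0:
        return [0.0 for _ in scores]
    return [s / top for s in scores]


def brightest(axis):
    """Return the model or models holding the largest score on an axis."""
    scores = SCORES[axis]
    top = max(scores)
    return [m for m, s in zip(MODELS, scores) if s == top]


intensities = {axis: row_intensity(row) for axis, row in SCORES.items()}
```

Under this scheme claude-opus-4-7 is the brightest cell on Historical contested with a score of 0.04, while gpt-5.1 is brightest on Factual stability with 0.18. Equal brightness across rows therefore says nothing about equal drift, which is why the scores are the right basis for comparing rows.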

Explore by axis

Observed models

How to read this site

  1. Reports are plain-English writeups of notable changes, written for a general audience.
  2. Axes group prompts by the kind of drift they measure (for example political, historical, scientific, and refusal-boundary).
  3. Models lets you drill into a single provider over time.
  4. Data gives you the raw weekly snapshots in CSV, JSON, and Parquet.
  5. Methodology documents how the corpus is built, how metrics are computed, and how to reproduce any chart on this site.