Methodology & data sources

How we track AI tool constraint changes: sources, update cadence, scoring, and confidence levels.

01 / WHERE THE DATA COMES FROM

Every tracked change links to a primary source. We prioritise, in order: official docs, changelogs, pricing pages, and status pages maintained by the tool maker. Where official records are absent or were only published retrospectively, we use Wayback Machine snapshots to establish dates. Reputable technology news coverage fills the gap for announcements that lack a canonical permalink.

Community sources (forums, X/Twitter threads, Reddit) are used only as secondary confirmation, never as sole evidence. Events sourced this way are marked “secondary source” or “unconfirmed” inline.

289 tracked changes across 30 AI tools. 72% of those changes are sourced from official materials.

No claim appears without a source. Every event’s date, direction and figures come from its linked primary source. The longer written analysis is drafted with AI assistance from that sourced record and reviewed before publishing; it interprets the data and never invents it.

02 / HOW FRESH IT IS

This site is rebuilt automatically, and its data is reviewed against live sources on a rolling basis rather than on a fixed daily promise. Each tool record carries a last checked date that reflects when the current-state data was last verified against live sources — not just when the page was last generated. That date is shown on every tool page in the header and in the provenance strip beneath the title, so you can always see how fresh a given tool’s record actually is.

Timestamps are shown on every page and in the RSS feed. The most recent verification across all tools was 2026-07-23.

03 / CONFIDENCE LEVELS

Every event carries one of three confidence levels. Events with no marker are high-confidence — that is the clean, default state.

High: Sourced from official documentation or a dated official announcement. Date, scope, and direction are confirmed. No marker shown — this is the default expectation.
Secondary source (marked inline): Sourced from reputable news coverage, the Wayback Machine, or a status page without an accompanying official changelog entry. The event occurred but the primary record is indirect.
Unconfirmed (marked inline in italic): Sourced from community reports — forum posts, social threads — or where the date or scope is approximate. Included for completeness; treat with appropriate scepticism.
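
As a rough illustration, the three levels above can be read as a mapping from source type to marker. The sketch below is one way such a rule might be encoded, not the site's actual implementation; the source-kind labels (official_docs, news, forum and so on) and the function name confidence_for are hypothetical.

```python
from enum import Enum


class Confidence(Enum):
    HIGH = "high"                    # default state, no marker shown
    SECONDARY = "secondary source"   # marked inline
    UNCONFIRMED = "unconfirmed"      # marked inline in italic


# Hypothetical source-kind labels, grouped as the level descriptions suggest.
OFFICIAL = {"official_docs", "official_announcement"}
INDIRECT = {"news", "wayback", "status_page"}
COMMUNITY = {"forum", "social"}


def confidence_for(source_kind: str, approximate: bool = False) -> Confidence:
    """Pick a confidence level from the source kind.

    `approximate` flags an event whose date or scope is only approximate,
    which the text places in the Unconfirmed level regardless of source.
    """
    if approximate or source_kind in COMMUNITY:
        return Confidence.UNCONFIRMED
    if source_kind in INDIRECT:
        return Confidence.SECONDARY
    if source_kind in OFFICIAL:
        return Confidence.HIGH
    raise ValueError(f"unknown source kind: {source_kind!r}")
```

Under these assumptions, confidence_for("news") yields the secondary-source marker, while confidence_for("official_docs", approximate=True) falls back to unconfirmed.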

04 / HOW THE SCORES WORK

Drift score — a number from 0 to 100, where 100 means no tightening pressure in the window. It starts at 100 and each tightening event subtracts a recency-weighted penalty. Older events decay toward a floor over roughly 12 months, so a change from a year ago matters much less than one last week.

Key multipliers applied before weighting:

Tightening events count approximately 4× what loosening events add back.
Silent (unannounced) changes carry an additional ~1.5× penalty.
Loosening adds back, but asymmetrically — a full reversal does not fully restore the score.
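
To make the shape of the calculation concrete, here is a minimal sketch of a recency-weighted score using the multipliers listed above. Only the 4× and 1.5× ratios and the roughly 12-month decay come from the text. The linear decay curve, the floor value (FLOOR), the per-event point size (BASE_POINTS) and all names are assumptions chosen for illustration, so the numbers it produces will not match the published scores.

```python
from dataclasses import dataclass

TIGHTENING_WEIGHT = 4.0   # from the text: tightening counts ~4x a loosening
SILENT_MULTIPLIER = 1.5   # from the text: extra penalty for unannounced changes
DECAY_DAYS = 365          # from the text: decay over roughly 12 months
FLOOR = 0.1               # ASSUMPTION: weight an old event decays toward
BASE_POINTS = 2.0         # ASSUMPTION: points a fresh loosening adds back


@dataclass
class Event:
    age_days: int
    direction: str          # "tightening" or "loosening"
    silent: bool = False


def recency_weight(age_days: int) -> float:
    """Linear decay from 1.0 (today) to FLOOR (DECAY_DAYS ago or older).

    ASSUMPTION: the real decay curve is not specified in the text.
    """
    fraction = min(max(age_days, 0) / DECAY_DAYS, 1.0)
    return 1.0 - (1.0 - FLOOR) * fraction


def drift_score(events: list[Event]) -> float:
    """Start at 100, subtract weighted tightening, add back weighted loosening."""
    score = 100.0
    for event in events:
        weight = recency_weight(event.age_days)
        if event.direction == "tightening":
            penalty = BASE_POINTS * TIGHTENING_WEIGHT * weight
            if event.silent:
                penalty *= SILENT_MULTIPLIER
            score -= penalty
        elif event.direction == "loosening":
            score += BASE_POINTS * weight
    # Clamp to the documented 0 to 100 range.
    return max(0.0, min(100.0, score))
```

With these placeholder constants, a tightening followed by a full reversal on the same day leaves the score below 100, which is the asymmetry described in the list above.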

Verdict — Stable / Watch / Caution — is derived from the recent trajectory (90-day delta and trend direction), not the absolute score alone. A tool can score well historically but earn a Watch rating if things have tightened lately.

Sub-scores (Pricing, Limits, Models, Transparency) apply the same recency-weighted logic scoped to the relevant event dimension. They are independent slices, not a weighted average of each other.

These are transparent heuristics over sourced events, not opinions — and explicitly NOT capability or quality benchmarks.

05 / THE STABILITY INDEX

The AI Vendor Stability Index turns the per-tool sub-scores above into letter grades, ranking vendors by how often they tighten the rules — not by product quality. It has two independent axes: a coverage tier (do we have enough evidence to grade?) and the grade itself. They never override each other.

Coverage tier — “do we have enough history?”

Rated: Enough sourced history to grade fairly (5+ events, or 3+ events over 18+ months with two or more official sources). Only Rated vendors get a letter grade.
Provisional: Some signal, not enough to grade. Shown with a direction arrow (↗ / → / ↘) instead of a letter — so it can’t be mistaken for a ranking.
Not Rated: Too little history — newly tracked. No grade, no signal.
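
The tier rules above can be sketched as a small decision function. The Rated thresholds are taken directly from the text. The boundary between Provisional and Not Rated is not stated, so PROVISIONAL_MIN_EVENTS below is a placeholder assumption, as are the function and parameter names.

```python
PROVISIONAL_MIN_EVENTS = 2  # ASSUMPTION: the text does not give this cutoff


def coverage_tier(event_count: int, history_months: int, official_sources: int) -> str:
    """Return "Rated", "Provisional" or "Not Rated" for one vendor."""
    # Rated: 5+ events, or 3+ events over 18+ months with 2+ official sources.
    if event_count >= 5:
        return "Rated"
    if event_count >= 3 and history_months >= 18 and official_sources >= 2:
        return "Rated"
    # Some signal, but not enough to grade fairly.
    if event_count >= PROVISIONAL_MIN_EVENTS:
        return "Provisional"
    return "Not Rated"
```

For example, a vendor with 3 events spread over 24 months and 2 official sources would be Rated, while the same 3 events over 6 months would be Provisional.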

Letter grades — applied to the 0–100 stability score

A ≥ 88 · B ≥ 80 · C ≥ 72 · D ≥ 60 · F < 60
Bands align to the Stable / Shifting / Volatile buckets, so a tool page never contradicts its grade.
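
The cutoffs above translate into a simple lookup. This is a sketch of the banding only; the function name letter_grade is hypothetical, and it assumes the score has already been computed and the vendor is Rated.

```python
# Cutoffs from the grade line above, highest first.
GRADE_CUTOFFS = [(88, "A"), (80, "B"), (72, "C"), (60, "D")]


def letter_grade(score: float) -> str:
    """Map a 0-100 stability score to a letter grade."""
    for cutoff, grade in GRADE_CUTOFFS:
        if score >= cutoff:
            return grade
    return "F"
```

Because each band is a lower bound, a score of exactly 88 earns an A, and anything under 60 is an F.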

Overall is a weighted blend that reflects operational pain: 50% Limits · 30% Pricing · 20% Transparency, because a limit cut hurts users faster than a price bump. Model Access (the Models sub-score) is graded too but is excluded from Overall: a vendor shipping many models isn’t more stable (often the opposite), so it stays informational.
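
As a worked example of the blend, the sketch below applies the 50/30/20 weights to a set of sub-scores. The dictionary keys and the function name overall_score are hypothetical; the weights are the ones stated above, and the Models sub-score is deliberately ignored.

```python
# Weights in percent, as stated in the text. Models is intentionally absent.
OVERALL_WEIGHTS = {"limits": 50, "pricing": 30, "transparency": 20}


def overall_score(sub_scores: dict[str, float]) -> float:
    """Blend the three weighted sub-scores; any other keys are ignored."""
    total = sum(weight * sub_scores[name] for name, weight in OVERALL_WEIGHTS.items())
    return total / 100


example = {"limits": 70, "pricing": 90, "transparency": 85, "models": 40}
print(overall_score(example))  # 0.5*70 + 0.3*90 + 0.2*85 = 79.0
```

In this example a weak Limits score pulls Overall down to 79 even though Pricing and Transparency are strong, and changing the Models value has no effect on the result.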

Two honesty markers: quiet = Rated but no changes recorded in 12 months (stable by absence, not by recent observation), and ⚠ low-signal = we can’t reliably detect new changes for that vendor (e.g. a client-rendered pricing page), so its grade reflects history only.

A high grade means few recent restrictions in our sourced record — NOT product quality. Every grade expands to the dated, sourced events behind it, or states the absence (“no pricing tightening in 11mo”). Not a scientific sample.

06 / USER SIGNALS

The User signals section on each tool page surfaces what people are actually complaining about. It is drawn from three free, public sources over the trailing 180 days: Hacker News (via its keyless Algolia search API — stories and comments mentioning the tool), GitHub issues (for tools with a public issue tracker, via the REST search API), and Apple App Store reviews (for tools with a consumer iOS app, via Apple’s public reviews RSS feed). Each match is run through the same keyword classifier and bucketed into a complaint category — limits/quota, pricing, model removal, performance, or other.
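
A keyword classifier of the kind described above could look roughly like the following. The category names come from the text, but the keyword lists are invented placeholders, and the simple first-match substring rule is an assumption rather than the site's actual classifier.

```python
# ASSUMPTION: illustrative keywords only; the real lists are not published here.
CATEGORY_KEYWORDS = {
    "limits/quota": ("rate limit", "quota", "usage limit"),
    "pricing": ("price", "pricing", "expensive"),
    "model removal": ("deprecated", "removed", "sunset"),
    "performance": ("slow", "latency", "timeout"),
}


def classify(text: str) -> str:
    """Return the first category whose keyword appears in the text, else "other".

    Matching is case-insensitive substring search, so it can over-match
    (for example "price" inside "priceless"); a real classifier would
    likely need word boundaries and a larger vocabulary.
    """
    lowered = text.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return "other"
```

Running the same function over Hacker News comments, GitHub issues and App Store reviews is what keeps the buckets comparable across sources.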

We show the live counts only when there are at least 5 matching complaint mentions across all sources; below that threshold we fall back to hand-picked, sourced quotes rather than dress up a tiny sample. Every mention is tagged with its source (HN / GH / App Store) and links back to the original thread, issue, or review so you can read the context. Not every tool has every source — only tools with a public GitHub tracker or a consumer iOS app contribute those signals. Reddit and X are deferred for now (no stable free API).

These are raw counts of public posts, issues, and reviews matching complaint keywords — not a scientific survey or a representative sample of all users. Percentages are shown only when the sample is large enough (≥10) to be meaningful. No count is ever fabricated; every number traces to fetched Hacker News, GitHub, or App Store data.
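
The two display thresholds (at least 5 mentions for live counts, at least 10 for percentages) can be sketched as below. The function name, the source keys and the returned field names are hypothetical, and the sketch assumes the 10-mention sample refers to the combined total across sources.

```python
MIN_MENTIONS_FOR_COUNTS = 5        # from the text
MIN_MENTIONS_FOR_PERCENTAGES = 10  # from the text


def display_mode(counts_by_source: dict[str, int]) -> dict:
    """Decide what the User signals section may show for one tool."""
    total = sum(counts_by_source.values())
    show_counts = total >= MIN_MENTIONS_FOR_COUNTS
    return {
        "total": total,
        "show_live_counts": show_counts,
        # Below the counts threshold, fall back to hand-picked, sourced quotes.
        "use_curated_quotes": not show_counts,
        "show_percentages": total >= MIN_MENTIONS_FOR_PERCENTAGES,
    }
```

For instance, display_mode({"HN": 4, "GH": 3, "App Store": 0}) would show live counts for 7 mentions but withhold percentages.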

07 / WHAT WE DO NOT CLAIM

This tracker covers one thing: how AI tool constraints have changed over time.

✗ Not model quality. Drift score says nothing about output quality, reasoning ability, or benchmark performance.
✗ Not uptime or reliability. We do not track incident frequency or SLA adherence.
✗ Not a ‘best tool’ ranking. A high Drift score means consistent, stable constraints, not that the tool is superior for your use case.

We track only: usage limits, quota changes, pricing moves, model access changes, and constraint policy shifts — and how each has evolved over time.