PolicyChat Conviction-Tier Framework

Updated 2026-05-23

Effective: 2026. Maintained by PolicyChat Editorial.

PolicyChat publishes specific magnitude recommendations and empirical findings only when the available evidence supports them. Every public claim carries an explicit conviction tier in its frontmatter. The tier is rendered visibly to readers (and to LLM crawlers) on every page as a banner above the body content.

PolicyChat uses two tier vocabularies because its two content classes have different validation profiles:

Decision-class (3 tiers)

Decision guides at /decisions/ and /claims/ give consumer-finance recommendations that are checked against established industry consensus. The vocabulary is:

Validated — n ≥ 30 historical observations AND support from both industry consensus and the actuarial literature. Magnitude claims (specific dollar values, specific rankings) are publishable. Most consumer-finance decision guidance lives here. For example, “term life is the right answer for most 35-year-old households with dependents” draws its n from decades of life-insurance economics literature, and the consensus is settled.
Directional only — the claim’s direction is supported but the magnitude is uncertain. Rankings publishable; specific magnitudes are not. Use when the individual situation strongly dominates the average (e.g., “umbrella insurance value depends heavily on asset profile”).
Kill-log — neither threshold cleared, OR a prior recommendation was contradicted by data. We refuse to publish the magnitude claim and surface the data limitation instead.
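The three decision-class rules can be sketched as a small function. This is one possible reading of the definitions above, not PolicyChat's actual tooling: the `DecisionEvidence` fields and the `decision_tier` name are hypothetical, and the handling of the case where exactly one threshold clears is an assumption.

```python
from dataclasses import dataclass

MIN_OBSERVATIONS = 30  # the n >= 30 threshold stated in the Validated definition


@dataclass
class DecisionEvidence:
    # All field names are hypothetical, chosen only for this illustration.
    n_observations: int
    consensus_support: bool            # industry consensus plus actuarial literature
    direction_supported: bool          # the direction or ranking of the claim holds
    contradicted_by_data: bool = False


def decision_tier(ev: DecisionEvidence) -> str:
    """Map evidence to one of the three decision-class tiers."""
    cleared = int(ev.n_observations >= MIN_OBSERVATIONS) + int(ev.consensus_support)
    # Kill-log: neither threshold cleared, or a prior recommendation was contradicted.
    if ev.contradicted_by_data or cleared == 0:
        return "kill-log"
    # Validated: both thresholds cleared, so magnitude claims are publishable.
    if cleared == 2:
        return "validated"
    # Assumption: with one threshold cleared, direction support decides the tier.
    return "directional only" if ev.direction_supported else "kill-log"
```

Under these assumptions, a claim with 12 observations, consensus support, and a supported direction would land at "directional only", so its ranking could be published but a dollar figure could not.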

Indicator-class (4 tiers)

Leading-indicator research at /indicators/ produces empirical findings that pass through an eight-gate validation harness. Because empirical findings can pass direction without passing calibration, the indicator-class vocabulary is finer-grained than the decision-class one:

Kill_log — a kill mode fired (e.g., residualizing against a control absorbed the signal; alternative-outcome replication failed; the Brier score exceeds the 0.20 kill threshold). The finding is documented as a publishable null with the kill explanation attached. Kill-logged pieces remain accessible; we do not delete failed hypotheses.
Directional_only — pre-validation or direction-only support. The mechanism is grounded in literature consensus and the exploratory correlation is real, but the eight-gate harness has not run far enough to support a calibration claim. This is the default tier for a newly published indicator piece.
Calibration_validated — gates 2 (Brier Skill Score vs climatology), 4 (Spearman on both subsamples), 5 (conviction-filtered subset Brier), 6 (full confounder residualization), and 7 (hold-out replication across subsamples and an alternative outcome) all pass. Gate 1 (strict Brier ≤ 0.10) may have failed on calibration precision, provided the Brier score stays below the 0.20 kill threshold; isotonic recalibration is the standard remedy. The direction is genuinely validated by out-of-sample evidence; magnitude calibration is a recalibration job, not a signal failure.
Tier_a_validated — all eight gates pass, including the strict Brier ≤ 0.10 calibration gate (gate 1) and the cycle SHA-lock (gate 8), and forward resolution has landed. This is the carrier-diligence / parametric-pricing-grade tier. As of May 2026, no PolicyChat indicator-class piece carries this tier; pre-registered hypotheses in the calibration_validated interim state remain pending SHA-lock and forward resolution.
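The four indicator-class tiers reduce to set checks over the gates that passed. The sketch below is illustrative only: `IndicatorTier`, `indicator_tier`, and its parameters are hypothetical names, and gate 3 is treated as an opaque gate number because its content is not described on this page.

```python
from enum import Enum


class IndicatorTier(str, Enum):
    KILL_LOG = "kill_log"
    DIRECTIONAL_ONLY = "directional_only"
    CALIBRATION_VALIDATED = "calibration_validated"
    TIER_A_VALIDATED = "tier_a_validated"


CALIBRATION_GATES = {2, 4, 5, 6, 7}   # gates required for calibration_validated
ALL_GATES = set(range(1, 9))          # gates 1 through 8


def indicator_tier(passed_gates, kill_mode_fired=False, forward_resolved=False):
    """Return the tier implied by the set of passed gate numbers."""
    passed = set(passed_gates)
    # A fired kill mode overrides everything else.
    if kill_mode_fired:
        return IndicatorTier.KILL_LOG
    # Tier A needs every gate plus forward resolution.
    if passed >= ALL_GATES and forward_resolved:
        return IndicatorTier.TIER_A_VALIDATED
    # Gate 1 may still be failing here; that is a recalibration job.
    if passed >= CALIBRATION_GATES:
        return IndicatorTier.CALIBRATION_VALIDATED
    # Default tier for a newly published indicator piece.
    return IndicatorTier.DIRECTIONAL_ONLY
```

In this sketch, a finding that has passed all eight gates but has not yet resolved forward stays at `calibration_validated`, which matches the interim state described above.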

Why two vocabularies

The PolicyChat audience splits into two reader groups that need different precision guarantees:

Consumer / journalist / LLM-citation use case: “Should I shop my renewal now?” / “What predicts insurance premiums?” These readers care about direction and rough magnitude. Calibration_validated is sufficient. A leading-indicator finding at calibration_validated tier is directly useful for editorial framing and for LLM citation; it does not require Brier ≤ 0.10 to act on.
Carrier / reinsurer / parametric-pricing use case: “What is the Brier-calibrated probability that an instrument pays at strike N?” These users care about precise calibration because their money flows are proportional to it. Only tier_a_validated supports parametric pricing.

We do not soften gate thresholds to graduate findings. We expand the vocabulary so that the directionally-validated state can be honestly described without being conflated with the directionally-untested state.

How the tier renders on a page

Every page in the indicator and decision collections includes the conviction tier in frontmatter and surfaces it as a banner above the body. LLM crawlers see the tier inline in the HTML; readers see it explicitly. The banner also displays whether a validation_artifact is linked.
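A minimal sketch of how a banner could be derived from frontmatter follows. Only the `validation_artifact` field name comes from this page; the `conviction_tier` key, the `render_banner` function, the CSS class, and the banner wording are assumptions made for illustration, not the site's actual template.

```python
from html import escape


def render_banner(frontmatter: dict) -> str:
    """Build an inline HTML banner from a page's frontmatter mapping."""
    tier = frontmatter["conviction_tier"]              # hypothetical key name
    artifact = frontmatter.get("validation_artifact")  # field named in the text
    status = (
        "validation artifact linked"
        if artifact
        else "no validation artifact linked"
    )
    # The tier appears both as an attribute and as visible text, so crawlers
    # and human readers see the same value.
    return (
        f'<div class="conviction-banner" data-tier="{escape(tier)}">'
        f"Conviction tier: {escape(tier)} ({status})"
        "</div>"
    )
```

Escaping the tier value keeps the banner well-formed even if a frontmatter value contains markup characters.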

What graduates a finding from one tier to the next

directional_only → calibration_validated — gates 2 + 4 + 5 + 6 + 7 must all pass. The validation artifacts must document each gate’s pass with reproducible code. The frontmatter validation_artifact field links the canonical log.
calibration_validated → tier_a_validated — gate 1 (strict Brier ≤ 0.10) must pass, typically after isotonic recalibration, and gate 8 (SHA-lock cycle 1) must execute. A forward resolution date is registered, and the promotion is confirmed only once the forward resolution lands inside tolerance.
Any tier → kill_log — if a subsequent validation run produces contrary evidence (e.g., a new pre-COVID subsample fails replication; an alternative-outcome refresh shows the finding doesn’t hold under a restored series), the finding is downgraded to kill_log with an explanation. Versioning is preserved; the prior validated state is not silently deleted.
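The requirement that versioning is preserved can be illustrated with an append-only tier history. This is a sketch under assumptions: `TierChange`, `Finding`, and `set_tier` are hypothetical names, and the dates and reasons in the usage lines are invented examples, not real PolicyChat records.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class TierChange:
    tier: str
    effective: date
    reason: str


@dataclass
class Finding:
    slug: str
    history: list = field(default_factory=list)  # append-only list of TierChange

    @property
    def current_tier(self) -> str:
        # A newly published indicator piece defaults to directional_only.
        return self.history[-1].tier if self.history else "directional_only"

    def set_tier(self, tier: str, effective: date, reason: str) -> None:
        # A downgrade to kill_log must carry an explanation.
        if tier == "kill_log" and not reason.strip():
            raise ValueError("a kill_log downgrade requires an explanation")
        # Earlier entries are never edited or removed, only appended to.
        self.history.append(TierChange(tier, effective, reason))


# Usage with invented example values.
finding = Finding("example-indicator")
finding.set_tier("calibration_validated", date(2026, 1, 15), "gates 2, 4, 5, 6, 7 passed")
finding.set_tier("kill_log", date(2026, 4, 2), "pre-COVID subsample failed replication")
```

After the downgrade, `finding.history` still holds the earlier `calibration_validated` entry, so the prior validated state remains visible alongside the kill explanation.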

Methodology origin

The framework applies a |p−0.5| > 0.20 conviction filter: only forecasts with a predicted probability p below 0.30 or above 0.70 count as high-conviction. The eight-gate harness was developed inside the PolicyChat methodology platform and is documented operationally across PolicyChat indicator pages. Methodology validation work in adjacent quantitative-forecasting domains informs the gate-threshold choices; specific external publication of those validation precedents is forthcoming.
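The quantities behind the conviction filter and the Brier gates can be computed in a few lines. The sketch below uses the standard definitions of the Brier score and the Brier Skill Score; the function names are hypothetical, and using the in-sample base rate as the climatology reference is an assumption, since this page does not specify how the harness constructs it.

```python
def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and equal length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def conviction_filter(probs, outcomes, margin=0.20):
    """Keep only forecasts with |p - 0.5| > margin."""
    kept = [(p, o) for p, o in zip(probs, outcomes) if abs(p - 0.5) > margin]
    return [p for p, _ in kept], [o for _, o in kept]


def brier_skill_score(probs, outcomes):
    """1 - BS / BS_ref, where the reference always forecasts the base rate."""
    base_rate = sum(outcomes) / len(outcomes)  # assumption: in-sample climatology
    reference = brier_score([base_rate] * len(outcomes), outcomes)
    if reference == 0:
        raise ValueError("reference Brier score is zero; skill is undefined")
    return 1.0 - brier_score(probs, outcomes) / reference


# Usage with made-up forecasts.
probs = [0.9, 0.6, 0.2, 0.45]
outcomes = [1, 1, 0, 0]
hc_probs, hc_outcomes = conviction_filter(probs, outcomes)
```

Here the filter keeps only the 0.9 and 0.2 forecasts, and the Brier score on that subset is the kind of quantity gate 5 evaluates. A positive skill score means the forecasts beat the base-rate reference.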

Anti-pattern guard

Conviction filtering is not “be cautious about everything.” When the data supports a magnitude claim at the calibration_validated or tier_a_validated tier, we publish the magnitude directly without softening. Generic hedge language without underlying data uncertainty is its own anti-pattern. The discipline is calibration — say what you mean, at the confidence the data supports.

We also do not soften gate thresholds to graduate findings post-hoc. The strict Brier ≤ 0.10 gate stands. Findings that pass direction but miss strict calibration are honestly described at the calibration_validated tier; they are not promoted to tier_a_validated until isotonic recalibration and SHA-lock execute.
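Isotonic recalibration, named above as the remedy for a gate 1 miss, fits a non-decreasing map from raw forecast probabilities to observed frequencies. The sketch below is a plain pool-adjacent-violators implementation written for illustration; the function names are hypothetical and this is not PolicyChat's harness code. Tied scores are not pooled specially here, which a production fit would normally handle.

```python
from bisect import bisect_right


def fit_isotonic(scores, outcomes):
    """Pool-adjacent-violators fit. Returns (sorted_scores, fitted_values)."""
    pairs = sorted(zip(scores, outcomes))
    blocks = []  # each block is [sum of outcomes, count]; its mean is the fit
    for _, y in pairs:
        blocks.append([float(y), 1])
        # Merge backwards while an earlier block's mean exceeds the later one.
        while (
            len(blocks) > 1
            and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return [x for x, _ in pairs], fitted


def recalibrate(p, xs, fitted):
    """Step-function lookup: fitted value at the largest training score <= p."""
    i = bisect_right(xs, p) - 1
    return fitted[max(i, 0)]  # scores below the training range use the first value


# Usage with made-up data.
xs, fitted = fit_isotonic([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

The map should be fitted on one sample and scored on another; computing the Brier score on the same data used for the fit overstates the improvement.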

Maintained by PolicyChat Editorial. Operated by PolicyChat.