Methodology
TL;DR
The Political Temperature Index produces a daily 0-100 score measuring US political tension. It ingests ~400 articles and ~200 social posts per run from 140+ unique sources across 27 active integrations, clusters them into topic groups via AI, and calculates five sub-indices: Volume (30%), Polarization (30%), Intensity (20%), Rhetoric (10%), and Friction (10%). The weighted composite is smoothed with an asymmetric EMA (fast heating, slow cooling) and boosted by breaking event multipliers.
Overview
The Political Temperature Index (PTI) is a daily measurement of US political tension on a 0-100 scale. It combines data from multiple sources to produce an objective, reproducible snapshot of the current political climate, updated twice daily (7 AM and 7 PM ET). The score, sub-indices, and top drivers together describe the US political climate today in a form that newsrooms, researchers, and the public can cite consistently over time.
140 Unique Sources · 5 Sub-Indices · 2,000 Sources Tracked · 1,100+ Events Cataloged
Data Sources
News Articles (8 integrations · 72 sources)
We analyze 400+ articles daily from sources across the political spectrum using GDELT (primary, with dynamic query generation from recent top stories), GDELT Global Entity Graph (GEG) for neural sentiment, NewsData.io, NewsAPI.org, Google News RSS (with redirect resolution for accurate source attribution), The Guardian, MediaStack, and an RSS Aggregator pulling from 65 feeds including NPR, AP News, Reuters, PBS NewsHour, ABC News, Politico, The Hill, Roll Call, C-SPAN, USA Today, 14 independent media outlets (7 left, 7 right), 12 Substack political commentators (Taibbi, HCR, Greenwald, Silver, Yglesias, and more), and 10 state political news outlets (Texas Tribune, CalMatters, Politico Florida, Michigan Advance, Pennsylvania Capital-Star, Georgia Recorder, Arizona Mirror, Wisconsin Examiner, Ohio Capital Journal, NC Policy Watch). Source bias is rated using a curated database of 2,000 sources with AllSides ratings, fuzzy name matching (Levenshtein + Dice coefficient), and LLM-based auto-rating for unknown sources.
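The fuzzy name matching described above can be sketched in Python. This is a minimal illustration, not the production matcher: the bigram-based Dice coefficient, the 0.8 acceptance threshold, and the `match_source` helper are assumptions for demonstration.

```python
def dice_coefficient(a: str, b: str) -> float:
    """Dice similarity over character bigrams."""
    def bigrams(s):
        return {s[i:i + 2] for i in range(len(s) - 1)}
    x, y = bigrams(a.lower()), bigrams(b.lower())
    if not x or not y:
        return 0.0
    return 2 * len(x & y) / (len(x) + len(y))


def levenshtein(a: str, b: str) -> int:
    """Edit distance via the classic dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def match_source(name, known, threshold=0.8):
    """Return (best_match, bias) when similarity clears the threshold, else None."""
    best, best_sim = None, 0.0
    for candidate in known:
        max_len = max(len(name), len(candidate)) or 1
        lev_sim = 1 - levenshtein(name.lower(), candidate.lower()) / max_len
        sim = max(dice_coefficient(name, candidate), lev_sim)
        if sim > best_sim:
            best, best_sim = candidate, sim
    return (best, known[best]) if best_sim >= threshold else None
```

A lookup like `match_source("Reuters", known_sources)` would return the stored bias rating when similarity clears the threshold, and `None` for unknown outlets, which the pipeline would then route to LLM-based auto-rating.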
Social Media (5 integrations · 54 sources)
Posts from 15 politically-relevant subreddits (5 left-leaning, 4 right-leaning, 6 centrist), political content from Bluesky, posts from major Mastodon instances, 14 political communities on Lemmy across 5 instances (lemmy.world, lemmy.ml, sh.itjust.works, lemm.ee, lemmy.dbzer0.com), comments from 55+ monitored YouTube channels across the political spectrum, and keyword search on Threads (pending Meta app review). Search terms and hashtags are dynamically populated from recent top drivers and breaking events, ensuring emerging topics are captured automatically. Each platform uses custom bias scoring: Reddit (subreddit-based), Bluesky (content analysis), Mastodon (instance-based), Lemmy (community-based), YouTube (channel-based with pre-assigned scores).
Legislative, Judicial & Regulatory (11 sources)
Federal congressional votes via Congress.gov and GovTrack APIs. Floor speeches and debate transcripts via the Congressional Record. State legislature bills from LegiScan (all 50 states) and state legislative activity from OpenStates. Federal and appellate court opinions from CourtListener. Supreme Court coverage from SCOTUSblog (decisions, orders, cert petitions). Executive orders and regulations via the Federal Register API. Federal rulemaking activity from Regulations.gov (proposed and final rules from political agencies). Government accountability reports from GAO. Election data from Google Civic Information API.
Campaign Finance (1 source)
The Federal Election Commission (FEC) API tracks independent expenditures and PAC spending. High spending on negative ads correlates with increased political tension.
Cross-Validation & Elections (4 sources)
Polymarket prediction market data provides a market-based signal for political uncertainty, replacing PredictIt (winding down). Wikipedia Pageviews tracks attention spikes on 20 key political pages as a public interest proxy. GDELT TV News measures television mention volume for political terms. Snopes fact-check articles supplement PolitiFact for broader misinformation and rhetoric detection.
Sub-Indices
The final temperature is calculated from five sub-indices:
Rhetoric Heat (10%)
Measures toxicity and inflammatory language in news headlines and social posts. Primary scoring uses DB-stored toxicity and inflammatory scores (weighted by source reliability). When ensemble mode is enabled and 2+ AI models are available, blends 70% DB-based score with 30% multi-model ensemble (OpenAI Moderation, HuggingFace BERT, Groq) for improved accuracy. Falls back to 7-day historical average when fewer than 10 toxicity samples are available, or to a neutral baseline (20) if no historical data exists.
rhetoric = (mean_toxicity × 0.4) + (mean_inflammatory × 0.4) + (gdelt_neg_tone × 0.2)
+ escalation_boost (headline pattern scan, up to +45 pts)
+ fact_check_boost (when rulings present)
Coverage Polarization (30%)
Uses a 4-component formula measuring how differently left and right media cover the political landscape. Narrative divergence (40%) detects when both sides frame the same story with opposing sentiment. Topic siloing (30%) catches when left and right cover entirely different stories. Coverage blindspots (20%) penalize stories covered by only one side, weighted by article count. A legacy intensity component (10%) preserves backward compatibility.
narrative_divergence (40%): left vs right sentiment gap per topic
topic_siloing (30%): coverage overlap between left and right
blindspots (20%): one-sided coverage, article-weighted
legacy_intensity (10%): partisan_intensity × imbalance
polarization = weighted sum across all topic clusters
Volume Signal (30%)
Detects unusual spikes in political content compared to the day-of-week baseline (e.g., Tuesdays vs past Tuesdays), falling back to a 30-day average when same-day samples are insufficient. Historical event validation identified volume as the primary spike indicator.
baseline = same_day_of_week_avg (8 weeks) || 30d_avg
time-of-day extrapolation: 0.35× before 7:00 AM ET, 0.55× before 9:00 AM ET, 0.8× before 1:00 PM ET
carry-forward floor: if today < 40% of avg AND partial data → yesterday × 0.7
volume = sigmoid(2.2 × ((today_count / completeness) / baseline - 1)) × 100
Legislative Friction (10%)
Tracks political friction across four government branches over a 7-day window. Each branch's party-line score (0-1, where 1.0 = pure party-line voting) is weighted and combined. Missing branches default to 0.25 (low friction). During recess, federal friction decays 20% per day toward the 0.25 baseline from the last known value, preventing abrupt drops when campaigns and investigations keep tension high. A bipartisan cooling factor reduces friction when >40% of federal votes are bipartisan (party-line score < 0.2).
federal (Senate+House): 50% | executive (Fed Register): 20%
state (LegiScan): 15% | judicial (CourtListener): 15%
recess decay: 0.25 + (last - 0.25) × 0.8^days
bipartisan cooling: factor = 1.0 - (ratio - 0.4) × 0.5
friction = weighted_sum(party_line_scores) × cooling × 100
Story Intensity (20%)
Detects breaking news severity using AI-based scoring as the primary method. Reuses OpenAI Moderation scores (toxicity 40%, inflammatory 60%) at zero extra API cost. Falls back to keyword matching if <10 AI samples available. GDELT tone captures both extremes: negative tone (crisis, conflict) and extreme positive tone above +20 (sarcastic celebration, "crisis averted" articles) both contribute to intensity. Includes story velocity detection for rapidly emerging stories. Normalization factors (velocity scaling, cluster heat ceiling, tone intensity) are adaptively computed from the 95th percentile of the last 14 days of intensity metrics, preventing both saturation and under-sensitivity as data accumulates. Falls back to static constants when fewer than 7 days of historical data exist.
AI_score = (toxicity × 0.40) + (inflammatory × 0.60)
tone: negative = |tone| / 10, positive = max(0, tone - 20) / 15
intensity = (AI × 0.20) + (velocity × 0.35) + (severity × 0.25)
+ (tone × 0.05) + (social × 0.05) + (market × 0.05) + (econ × 0.05)
(unavailable components redistribute weight to available ones)
Weight Reasoning
The sub-index weights were originally tuned against a historical political-event set covering 2015-2026 (elections, protests, Supreme Court decisions, mass shootings, impeachments, etc.). Current public response metrics are stricter: they include only source-verified events that overlap PTI temperature history.
Validation Results
Optimization significantly improved historical event detection. Previous weights (Rhetoric: 35%, Polarization: 30%, Volume: 15%, Friction: 20%) over-weighted rhetoric, causing false positives from routine partisan media coverage. The current 5-index model with Story Intensity and multi-model ensemble sentiment better captures breaking events without manual intervention. Current verified-event response metrics are reported live on the Track Record page, evaluating source-verified events only when actual temperature data exists for the relevant window. Cataloged events that are pending review, or verified events outside available temperature history, are excluded from headline response percentages. Baseline "quiet periods" are reviewed separately to measure false positive rates.
Temperature Scale
Frozen: 0-14
Cool: 15-29
Mild: 30-44
Warm: 45-59
Hot: 60-74
Boiling: 75-89
Meltdown: 90-100
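The band boundaries above map directly to a lookup table. A minimal sketch (the `band_for` helper name is illustrative):

```python
BANDS = [(14, "Frozen"), (29, "Cool"), (44, "Mild"), (59, "Warm"),
         (74, "Hot"), (89, "Boiling"), (100, "Meltdown")]

def band_for(score: float) -> str:
    """Map a 0-100 temperature to its named band."""
    for upper, label in BANDS:
        if score <= upper:
            return label
    return "Meltdown"  # scores are clamped to 100 upstream
```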
Temperature Smoothing
Raw scores are smoothed using an asymmetric Exponential Moving Average (EMA) that responds quickly to rising tension but cools gradually, matching how political tension behaves in the real world.
Asymmetric EMA
Heating up (fast): α = 0.75 → responds quickly to events
Cooling down (slow): α = 0.45 → takes ~2-3 days to decay
smoothed = α × raw + (1 - α) × previous
Event Decay
Major events create a boost that decays exponentially over time:
boost = initial_boost × e^(-t/τ)
τ (decay time): critical=7d, major=5d, moderate=3d, minor=2d
Known Latency
Because of exponential smoothing, rapid-onset events take 1–2 update cycles (4–8 hours) to fully register in the published temperature. This prevents false spikes from noise but means the index can lag during the first hours of a major crisis. The fast heating coefficient (α = 0.75) minimizes this gap, but early-breaking events may show a lower temperature than expected until the next pipeline run completes.
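The asymmetric EMA and event decay described above can be sketched together. The coefficients and decay constants come from the formulas in this section; the function names are illustrative:

```python
import math

ALPHA_HEAT, ALPHA_COOL = 0.75, 0.45   # fast heating, slow cooling
TAU_DAYS = {"critical": 7, "major": 5, "moderate": 3, "minor": 2}

def smooth(raw: float, previous: float) -> float:
    """Asymmetric EMA: pick alpha by whether the raw score is rising or falling."""
    alpha = ALPHA_HEAT if raw > previous else ALPHA_COOL
    return alpha * raw + (1 - alpha) * previous

def event_boost(initial: float, severity: str, days_elapsed: float) -> float:
    """Exponential decay of an event's boost: initial × e^(-t/τ)."""
    return initial * math.exp(-days_elapsed / TAU_DAYS[severity])
```

For example, a raw jump from 40 to 80 smooths to 70 in one step, while the reverse move only cools to 62, producing the multi-day decay the text describes.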
Event Discovery System
The system actively detects breaking events that drive temperature spikes using dynamic cluster-based detection with AI metrics, replacing the former hardcoded keyword approach.
Detection Methods
- 1. Volume Spike Detection compares topic cluster volumes to 7-day averages (1.5x = minor, 2.5x = moderate, 4x = major, 6x = critical)
- 2. Cluster-Based Breaking Detection reads topic clusters with AI-scored toxicity and inflammatory metrics, compares volume to historical averages, and derives severity from data rather than keyword matching
- 3. Story Velocity compares today's cluster article counts to yesterday's to detect rapidly emerging stories. High-impact clusters (war, crisis, conflict) use relaxed matching (1-word overlap). Web search severity provides a velocity floor when cluster-based detection returns low scores during genuine crises.
- 4. Driver-Based Detection uses LLM-clustered topics from daily drivers with metric-based severity derived from cluster toxicity and inflammatory scores
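Method 1's volume-spike thresholds translate to a small classifier. A sketch using the ratios stated above (the function name is an assumption):

```python
def spike_severity(cluster_volume: float, seven_day_avg: float):
    """Classify a topic cluster's volume spike against its 7-day average."""
    if seven_day_avg <= 0:
        return None
    ratio = cluster_volume / seven_day_avg
    if ratio >= 6.0:
        return "critical"
    if ratio >= 4.0:
        return "major"
    if ratio >= 2.5:
        return "moderate"
    if ratio >= 1.5:
        return "minor"
    return None  # normal day-to-day variation
```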
Severity Multipliers
Critical · Major · Moderate · Minor
Dynamic Severity Classification
Event severity is derived from actual article metrics rather than keyword pattern matching. Each topic cluster's articles are scored for toxicity and inflammatory content, and these averages determine severity upgrades:
base severity: 50+ articles = critical, 30+ = major, 15+ = moderate
AI intensity = avg(toxicity, inflammatory) from cluster articles
intensity > 0.3: upgrade 1 level | intensity > 0.5: upgrade 2 levels
heat contribution > 20%: upgrade 1 additional level (cap: critical)
Stacking Protection
Uses either the event multiplier or the decay boost from past events (whichever is larger), never both. Total boost is capped at 22 points to prevent event noise from jumping temperature bands. Events with residual decay below 2.0 points are pruned so stale events don't accumulate.
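A minimal sketch of the stacking rule, using the stated 22-point cap and 2.0-point pruning floor (the helper name, and representing past-event decay boosts as a list, are assumptions):

```python
def total_event_boost(multiplier_boost: float, decay_boosts: list,
                      cap: float = 22.0, prune_below: float = 2.0) -> float:
    """Apply stacking protection to event boosts."""
    # prune stale events whose residual decay has dropped below 2.0 points
    live = [b for b in decay_boosts if b >= prune_below]
    # use the fresh event multiplier OR the decayed carry-over, never both
    boost = max(multiplier_boost, sum(live))
    # cap so event noise can't jump the temperature across bands
    return min(boost, cap)
```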
Event Quality Filters
Before saving, detected events pass through three quality filters to prevent noise:
- 1. Foreign story filter checks events against 50+ static country/region indicators (Pakistan, Sudan, Ukraine, etc.) plus dynamically discovered foreign terms. The system analyzes 30 days of topic clusters to find terms that consistently co-occur with known foreign content, so new foreign entities (e.g., "navalny", "starmer") are identified automatically within days. All matches use word-boundary regex and are checked for US relevance; stories with no US connection are excluded.
- 2. Vague title filter rejects keyword-only titles like "Breaking: Terror" or titles shorter than 15 characters that lack specificity.
- 3. Editorialized filter rejects titles containing loaded language like "terror campaign", "slams", or "blasts" that indicate opinion rather than factual reporting.
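The title-based filters (2 and 3) can be sketched as a single predicate. The specific patterns below are illustrative examples drawn from the text, not the full production lists:

```python
import re

# illustrative patterns only; the production lists are longer
VAGUE_PREFIX = re.compile(r"^(breaking|alert)[:\s]", re.IGNORECASE)
LOADED_TERMS = ("terror campaign", "slams", "blasts")

def passes_quality_filters(title: str) -> bool:
    t = title.strip()
    if len(t) < 15:
        return False                      # too short to be specific
    if VAGUE_PREFIX.match(t) and len(t.split()) <= 3:
        return False                      # keyword-only "Breaking: X" titles
    if any(term in t.lower() for term in LOADED_TERMS):
        return False                      # editorialized, opinion-style language
    return True
```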
Domestic vs International Weighting
Since this measures US political tension, international stories receive reduced weight based on their domestic relevance:
2+ US keywords match (aid, vote, sanctions, etc.): 1.0× full weight
1 US keyword match: 0.5× significant reduction
No US connection: 0.1× heavy reduction
Non-foreign stories: 1.0× (unaffected)
Events → Top Drivers Bridge
Critical and major breaking events are automatically promoted into the Top Drivers list using fuzzy topic matching (Jaccard word overlap). This ensures a significant event always appears in both the breaking news section and the ranked driver list, even when detected between the twice-daily scoring runs.
similarity = |words_A ∩ words_B| / |words_A ∪ words_B|
match threshold: 0.4 (40% word overlap)
injected heat: critical = 25%, major = 15%
Driver Quality Pipeline
The "Top Drivers" list is produced by a multi-stage pipeline that ensures accuracy, prevents missing major stories, and filters out noise.
Pre-Clustering Intelligence
Before the LLM clusters headlines, the system fetches Google News RSS to extract the top 20 trending headlines and top 15 keywords. These are injected as "Known Top Stories" into the LLM prompt so it doesn't miss major stories that may be underrepresented in the article sample. Trending keywords also seed additional DB queries to pull in relevant articles.
Dynamic Keyword System
All keyword lists across the pipeline are dynamically populated from recent data instead of relying on hardcoded static lists. This ensures emerging topics are automatically captured without manual intervention.
Social search terms: core terms + recent driver/event keywords
GDELT queries: static baseline + dynamic queries from high-heat drivers
Article counts: Document Frequency map classifies keywords at runtime
→ Words in >10% of articles = "generic" (searched with AND)
→ Specific keywords searched with OR (prevents count inflation)
Headline filtering: TF-IDF weighted relevance scoring
→ Rare terms (low DF) rank higher: score = Σ log(1 / (df + 0.01))
→ Ensures driver headlines match their topic, not unrelated stories
Article Sampling
Up to 350 articles from a 48-hour window are selected using multi-source sampling: high-heat articles, left-leaning sources, right-leaning sources, high-volume topic representatives, and trending keyword matches. Articles are deduplicated by normalized title (removing GDELT syndication duplicates) and filtered to exclude non-political content (local crime, accidents, recipes).
LLM Clustering
Headlines are clustered by a multi-provider LLM system (Groq, Cerebras, Gemini, Mistral, OpenAI, and Anthropic Claude with automatic fallback). The clustering prompt enforces factual, neutral cluster names with no loaded language like "terror", "hate", or "slams". Domestic US stories are prioritized over international.
Post-Clustering Validation
After clustering, the system checks the top 10 keywords by article count. If any keyword with 30+ articles is not represented in any cluster, a warning is logged. This catches cases where a major story (e.g., 442 Epstein articles) was missed by the LLM.
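This coverage check is purely mechanical. A sketch, assuming clusters are dicts with `name` and `keywords` fields (a representation chosen for illustration):

```python
def coverage_warnings(keyword_counts, clusters, min_articles=30, top_n=10):
    """Return high-volume keywords absent from every cluster's name and keyword set."""
    top = sorted(keyword_counts, key=keyword_counts.get, reverse=True)[:top_n]
    covered = {kw.lower() for c in clusters for kw in c.get("keywords", [])}
    covered |= {w.lower() for c in clusters for w in c["name"].split()}
    return [kw for kw in top
            if keyword_counts[kw] >= min_articles and kw.lower() not in covered]
```

With 442 "epstein" articles and no matching cluster, the keyword would surface in the warning list.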
Dynamic Related Topic Groups
Duplicate clusters about the same story are merged using dynamically discovered topic relationships. The system analyzes 30 days of keyword co-occurrence in topic clusters using a Union-Find algorithm to discover which terms consistently appear together.
Co-occurrence graph: keyword pairs in same cluster across distinct days
Strong edges: 3+ days of co-occurrence → connected component
Example: "epstein" ↔ "maxwell" ↔ "bondi" auto-discovered
Merged with static groups (static = floor, dynamic = superset)
Cache: 6hr TTL, refreshed from topic_clusters history
Cluster Quality Scoring
A pure algorithmic validator (no LLM calls) runs after clustering to measure quality across four dimensions, producing an overall score (0-100). Warns when score drops below 70.
Within-cluster similarity: Jaccard overlap of headlines in same cluster
Cross-cluster separation: Low overlap between different clusters
Coverage rate: % of input headlines assigned to a cluster
Foreign contamination: % of clusters matching foreign indicators
Volume-Based Heat Caps
The LLM assigns heat levels based on headline tone, but can't see article volume. Bidirectional corrections are applied:
Volume floors: 500+ → extreme, 200+ → high, 50+ → medium
Volume caps: <3 → max low, 3-9 → max medium, 10-19 → max high
Prevents the LLM from rating a 5-article story as 'extreme'
Trending Cross-Validation
After LLM clustering, drivers are cross-validated against trending headlines from 4 external sources (Google News, Memeorandum, Reddit Rising, AP/Reuters). This catches politically significant stories the ingested article corpus missed.
Step 1: Fetch trending headlines from 4 aggregators
Step 2: Match each headline against clusters (Jaccard, keyword overlap)
Step 3: Unmatched headlines confirmed by 2+ sources → synthetic gap topics
Step 4: LLM identifies 0-3 semantic gaps (different wording, same story)
Gap topics injected with 1.2× trending heat bonus (capped at 3)
Trending Validation Penalty
Drivers with zero trending headline matches receive a 0.85× heat penalty. This creates natural turnover: stale, unvalidated drivers lose heat while trending stories gain it.
Driver Freshness Tracking
Each driver tracks when it first appeared and its article velocity trend (rising, stable, declining, or stale). Decay penalties increase with age:
Rising or Day 1: 1.0× (no penalty)
Stable ≤ 3 days: 0.95×
Declining: max(0.5, 1.0 - days × 0.1)
Stale: max(0.3, 1.0 - days × 0.15)
Compound: unvalidated + declining/stale + 3+ days = 0.5×
Web-Grounded Search (Parallel Enrichment)
Brave News Search and Brave Web Search run in parallel with headline clustering, with Tavily news search corroborating and Gemini grounding held for fallback/audit checks. The search stack independently discovers what US political stories are happening right now. This provides an external ground-truth signal that doesn't depend on the ingested article corpus. If the web search fails or times out, the pipeline continues unaffected.
Web search results are merged into the driver pipeline via three operations:
1. CONFIRM: web story matches existing cluster (Jaccard ≥ 0.25 or
2+ keyword overlap) → cluster gets 1.1× heat bonus
2. RESCUE: web story is international but US-relevant (usRelevance ≥ 0.6)
and matches a cluster previously filtered as foreign
→ cluster restored with isDomestic=true + 1.15× heat bonus
3. INJECT: web story is not minor, usRelevance ≥ 0.5, matches no
existing cluster, AND has 1+ corroborating signal from independent
sources → synthetic cluster created with 1.15× heat bonus
AI Sentiment Analysis
The system uses a multi-model ensemble for improved sentiment accuracy, combining three sentiment-analysis models (OpenAI Moderation, HuggingFace RoBERTa, and Groq LLM) for robust toxicity and inflammatory detection.
Ensemble Weights
OpenAI Moderation — general toxicity
HuggingFace BERT — domain-specific models
Groq LLM — contextual analysis
Sarcasm Detection
Before sentiment analysis, content is checked for sarcasm/irony using HuggingFace models combined with heuristic signals (punctuation patterns, capitalization, sentiment shifts). Detected sarcasm is weighted differently to avoid misclassification.
Fallback Chain
If the ensemble is unavailable (API limits, network issues), the system falls back to: (1) GDELT GEG neural sentiment, (2) OpenAI Moderation alone, (3) 7-day historical average. This ensures temperature calculations continue even during API outages.
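The fallback chain amounts to trying each scorer in order. A sketch where each tier is passed in as a callable (an interface chosen for illustration):

```python
def sentiment_with_fallback(text, ensemble, gdelt_geg, openai_moderation,
                            historical_avg):
    """Try each scoring tier in order; fall through on errors or missing scores."""
    for scorer in (ensemble, gdelt_geg, openai_moderation):
        try:
            score = scorer(text)
            if score is not None:
                return score
        except Exception:
            continue  # API limit / network issue: try the next tier
    return historical_avg  # final fallback: 7-day historical average
```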
How Top Stories Are Selected
The dashboard's top stories ("drivers") go through a multi-stage pipeline to ensure relevance and accuracy.
1. Topic Clustering
~400 articles per run are grouped into topic clusters by an LLM. The prompt enforces neutral, factual cluster names and requires 2+ headlines per cluster. Clusters are deduplicated across pipeline runs using synonym-aware Jaccard similarity.
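Synonym-aware Jaccard deduplication can be sketched as follows; the `SYNONYMS` map is a stand-in for whatever normalization the production system applies, and the 0.4 threshold matches the bridge formula given earlier:

```python
def jaccard(a: set, b: set) -> float:
    """Word-set Jaccard similarity."""
    return len(a & b) / len(a | b) if (a | b) else 0.0

# stand-in synonym map; real groups are discovered from cluster history
SYNONYMS = {"potus": "president", "scotus": "court"}

def normalize(name: str) -> set:
    return {SYNONYMS.get(w, w) for w in name.lower().split()}

def is_duplicate(name_a: str, name_b: str, threshold: float = 0.4) -> bool:
    """Two cluster names describe the same story when word overlap is high."""
    return jaccard(normalize(name_a), normalize(name_b)) >= threshold
```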
2. Heat Ranking
Clusters are ranked by a "heat" score combining article count (discovery-based, not keyword search), sentiment intensity, and velocity (rate of publication). Wire syndication copies (AP/Reuters) are deduplicated so a single story distributed to 30 outlets doesn't inflate its rank. Low-article clusters receive proportional penalties (<3 articles = 0.4x heat).
3. Web-Search Validation
A parallel search stack (Brave News, Brave Web, Tavily news, and Gemini grounding fallback) independently discovers current US political stories. Clusters confirmed by web search receive a severity-proportional boost (critical stories: 1.5x, major: 1.3x, moderate: 1.15x). After driver selection, a post-ranking check verifies the final top 10 against critical/major web stories and injects up to 2 missing high-importance stories.
4. Quality Filters
A pre-publish audit removes foreign stories, non-political content (entertainment, sports, ceremonial events), stale drivers persisting without fresh articles, and near-duplicate entries. LLM-based significance scoring demotes soft news (a politician at a sporting event ranks lower than policy substance). Cluster names are validated against their own headlines to catch hallucinated or incoherent labels.
Heat Source Analysis
The dashboard shows where political "heat" (inflammatory/toxic language) is coming from, with intelligent intensity comparison.
Heat Intensity Comparison
Compares each group's heat contribution % to their coverage share % to determine if they're "running hot" (more inflammatory than expected) or "running cool":
intensity_ratio = heat_percentage / coverage_percentage
> 1.15: ↑ Running hot (orange), more heat than coverage suggests
< 0.85: ↓ Running cool (cyan), less heat than expected
0.85-1.15: → Expected (gray), heat matches coverage share

This prevents misleading conclusions. If a group produces 45% of heat but also 45% of coverage, it isn't actually running "hotter" than expected.
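A sketch of the comparison, using the 1.15 and 0.85 cutoffs stated above:

```python
def heat_intensity(heat_pct: float, coverage_pct: float) -> str:
    """Compare a group's share of heat to its share of coverage."""
    if coverage_pct <= 0:
        return "expected"
    ratio = heat_pct / coverage_pct
    if ratio > 1.15:
        return "running hot"   # more inflammatory than coverage share suggests
    if ratio < 0.85:
        return "running cool"  # less inflammatory than expected
    return "expected"          # heat roughly matches coverage share
```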
Confidence Score
A data quality metric (0-100) displayed on the dashboard that indicates how reliable the temperature reading is based on available data.
Score bands: 80+ · 60-79 · 40-59 · <40
Based on four components: article volume (30%), source match rate (30%), sentiment coverage (25%), and source diversity (15%). Click the confidence badge on the dashboard to see each component's score. When confidence is low, smoothing trusts historical data more heavily.
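The composite is a straightforward weighted sum of the four component scores. A sketch (the component key names are illustrative):

```python
WEIGHTS = {
    "article_volume": 0.30,
    "source_match_rate": 0.30,
    "sentiment_coverage": 0.25,
    "source_diversity": 0.15,
}

def confidence(components: dict) -> float:
    """Weighted composite of component scores, each on a 0-100 scale."""
    return sum(components[name] * weight for name, weight in WEIGHTS.items())
```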
Confidence Intervals
Each temperature reading includes a confidence interval, a range reflecting how much the true value could vary given data quality. Toggle the "CI BAND" button on the trend chart to visualize it as a shaded region around the temperature line.
Score bands: ≥ 80 · 60–79 · 40–59 · < 40
Confidence scores and interval bounds are stored per day, enabling historical uncertainty analysis.
External Validation
To verify that PTI measures something real, we compare it against independent external indicators that should correlate with political tension.
PTI vs CBOE Volatility Index (VIX)
The VIX measures market-implied volatility and tends to spike during political crises, major policy announcements, and elections. We calculate the Pearson correlation coefficient between PTI and VIX over a rolling 30-day window and track "coincident spikes", days where both indices rose meaningfully (PTI > +3, VIX > +2).
Live results are displayed on the Track Record page with a dual-axis chart, correlation coefficient, and spike table. VIX data is sourced from the Federal Reserve Economic Data (FRED) API.
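The correlation and spike detection are standard calculations. A sketch, assuming daily series aligned by index and day-over-day deltas for spike detection (the exact spike definition beyond the stated +3/+2 thresholds is an assumption):

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5 if var_x and var_y else 0.0

def coincident_spikes(pti, vix, pti_jump=3.0, vix_jump=2.0):
    """Indices of days where both series rose past their thresholds."""
    return [i for i in range(1, min(len(pti), len(vix)))
            if pti[i] - pti[i - 1] > pti_jump and vix[i] - vix[i - 1] > vix_jump]
```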
Data Quality Monitoring
The pipeline continuously validates its own output to catch data quality issues and ensure consistency across its two daily runs.
Anomaly Detection
Automatically flags impossible values (out-of-range scores), frozen scores (identical temperature for 3+ consecutive days), unexplained spikes (large jumps without high sub-scores), data droughts (too few articles on a weekday), and cross-run inconsistencies (sub-score swings or driver churn between the same day's pipeline runs).
Post-Pipeline Consistency Validation
Before saving the daily temperature, the system validates internal consistency: temperature delta from yesterday must be ≤ 25 points (unless a critical event was detected), sub-scores must be internally coherent, and no sub-score should change more than 30 points without a corresponding data signal. Issues are logged as warnings for investigation, never blocking the pipeline.
Multi-Source Cross-Validation
Stories discovered through web search are verified against at least one independent source (trending headlines from 4 aggregators, breaking events, or gap detection) before being promoted to top drivers. This prevents single-source stories from contaminating the driver list.
Fallback Tracking
When any scoring component falls back to a secondary method (e.g., historical averages instead of live data, static normalization instead of adaptive), the event is tracked and logged. This provides visibility into when the system is operating at reduced data quality.
Source Balance
Our source database includes outlets across the full political spectrum. Bias ratings are based on Media Bias/Fact Check and AllSides data. The pre-publish audit checks the left/right distribution of articles used in each scoring run and flags imbalances exceeding a 2:1 ratio for investigation.
Complete Data Sources
News (8 integrations · 72 sources)
- GDELT
- GDELT GEG (neural sentiment)
- NewsData.io
- NewsAPI.org
- Google News RSS
- The Guardian
- MediaStack
- RSS Aggregator (65 feeds)
Social (5 integrations · 54 sources)
- Reddit
- Bluesky
- Mastodon
- Lemmy
- YouTube (55+ channels)
- Threads (pending review)
Legislative (11)
- Congress.gov
- GovTrack
- Congressional Record
- Federal Register
- Regulations.gov
- GAO Reports
- LegiScan (state bills)
- OpenStates (state legislatures)
- CourtListener (courts)
- SCOTUSblog (Supreme Court)
- Google Civic (elections)
Finance (1)
- FEC (Federal Election Commission)
Cross-Validation (4)
- Polymarket (prediction markets)
- Wikipedia Pageviews
- GDELT TV News
- Snopes (fact-checks)
Social Media Bias Scoring
Each social media platform uses a different method to assign political bias scores (-3 to +3 scale, where negative = left, positive = right).
Reddit
Subreddit-based lookup. Each subreddit has a pre-assigned bias score based on its typical political leaning (e.g., r/progressive = -2.5, r/Conservative = +2.0).
Bluesky
Content analysis using 80+ political indicator keywords. Posts are classified as left/center/right based on politician mentions, issue keywords, and media references.
Mastodon
Instance-based bias. Each Mastodon instance (server) has an assigned lean based on its community demographics (e.g., kolektiva.social = -2.0, noagendasocial.com = +1.5).
Lemmy
Community@instance key lookup with content fallback. Specific political communities have assigned biases; others use content analysis for classification.
YouTube
Channel-based bias scoring. 55+ monitored political channels across the spectrum have pre-assigned bias scores based on their editorial direction.
Threads (Pending)
Keyword search for political content. Pending Meta app review for the threads_keyword_search permission. Will use content analysis similar to Bluesky.
Data Window & Freshness
Most calculations use a 7-day rolling window to ensure consistency across all metrics and smooth out daily noise. This means the temperature reflects the past week's political activity, not just the last 24 hours. When today's data is not yet available (between midnight UTC and the morning pipeline run), the dashboard displays the most recent available date with an amber "Latest" badge indicating data staleness.
Trend Visualization
The 30-day trend chart uses temperature-aware gradient coloring where the line color changes based on the Y-axis temperature value, making it easy to spot when tension was high (pink/red) versus calm (blue/green). Event annotations from breaking events (moderate severity and above) and historical events are overlaid on the chart, showing what political events drove temperature changes.
Sub-index sparklines show 7-day historical trends for each component, making it easy to spot which sub-index is driving recent temperature changes.
Temperature Delta Explainer
When the temperature changes by more than 1 point from the previous day, the dashboard shows a "What Changed" breakdown listing the top 3 sub-indices that contributed most to the shift. Each contribution is weighted by the sub-index's percentage of the total score.
weightedDelta = (today - yesterday) × sub_index_weight
Only sub-indices with |weightedDelta| > 0.5 are shown
Example: VOL +3.2 POL -1.1 INT +0.8
Historical Event Auto-Promotion
Significant political events are automatically promoted into the historical event database for long-term accuracy tracking and trend chart annotations after the evening publication.
Candidate Selection
Candidates are drawn from two sources: today's top drivers (ranked by heat contribution) and today's breaking events (critical/major severity). Candidates are merged, deduplicated by fuzzy name matching, and enriched with a days-seen count (how many of the last 5 days the topic appeared as a driver). This helps the AI distinguish genuinely new events from ongoing coverage of the same story.
AI Classification
An LLM evaluates each candidate for historical significance, assigning a severity (critical, major, moderate, minor) and an impact score (0-100). Candidates classified as minor or with low impact are rejected. The LLM receives the days-seen context to avoid promoting multi-day status updates ("Day 5 of protests") as separate events.
Hard Quality Filters
After AI classification, a deterministic filter rejects events that don't meet quality thresholds. This catches LLM over-promotion that the AI classification alone may miss:
Impact thresholds: critical < 20, major < 12, moderate < 8 → rejected
Title length: < 4 words → rejected (too vague)
Vague patterns: "continues", "ongoing", "heats up", "escalates" → rejected
Action verb required: must contain passed, signed, fired, arrested,
ruled, killed, launched, indicted, overturned, impeached, etc.
AI Summaries
The dashboard generates natural language summaries that explain the day's political temperature in context. Summaries are produced by LLM analysis of the day's articles, drivers, and scoring data.
Summary Types
Morning summaries are generated at the first daily publication (7 AM ET), providing context for the day ahead. Evening summaries are generated at the final publication (7 PM ET), wrapping up the day's developments. Breaking summaries are generated when significant events are detected mid-cycle.
Summary Chaining
Each summary references the prior one for narrative continuity. Morning summaries reference the previous evening's context, and evening summaries reference the morning's. This creates a coherent thread across the day rather than isolated snapshots.
Summary Refresh
Top story descriptions are automatically monitored for staleness. If a driver's summary matches generic patterns (e.g., vague descriptions that don't capture the specific story), it is refreshed via LLM with the latest headline context. Driver names are also updated when stories evolve between pipeline runs.
Accuracy Validation
The system continuously validates its own accuracy against real-world political events using a multi-layered approach.
Historical Event Validation
Source-verified political events from 2015–2026 are used as ground truth when they overlap available PTI temperature history. For each evaluated event, the system checks whether the temperature produced a severity-specific spike response. Absolute heat-level coverage is reported separately and is not counted as a response hit. Events pending source review, and verified events without temperature rows, are excluded from headline response math until they can be evaluated.
Post-Publication Accuracy Review
Automated reviews run after each dashboard publication. They evaluate the verified-event response hit rate when enough evaluated events are available, track false positives (days where the temperature spiked without a corresponding real-world event), and monitor severity classification accuracy. If the hit rate drops below acceptable levels, the system generates diagnostic alerts.
Missed Event Diagnostics
When an event lacks a response hit, the system diagnoses the specific reason: whether the temperature spike was too small, the heat floor was not met, or there was insufficient data that day. This feedback loop drives targeted improvements to scoring sensitivity.
Bidirectional Severity Feedback
Event severity classifications are validated in both directions. Events initially rated as "moderate" are flagged for upgrade if temperature data shows they had major impact, and vice versa. This ensures the historical record stays calibrated over time.
2026 Election Tracker
The Election Tracker extends the temperature index to individual 2026 midterm races, combining multiple data signals into a per-race heat score.
Composite Heat Score
Each race receives a daily composite heat score (0–100) weighted across four signals: news article heat (40%), prediction market closeness (25%), campaign spending intensity (20%), and polling closeness (15%). Closer races naturally score higher on the market and polling components.
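The 40/25/20/15 weighting above can be expressed directly. A minimal sketch, assuming all four inputs are pre-normalized to a 0-100 scale (the function name and clamping are illustrative):

```python
def race_heat(news_heat: float, market_closeness: float,
              spending_intensity: float, polling_closeness: float) -> float:
    """Composite heat score per the stated 40/25/20/15 split.
    Inputs are assumed pre-normalized to 0-100."""
    score = (0.40 * news_heat
             + 0.25 * market_closeness
             + 0.20 * spending_intensity
             + 0.15 * polling_closeness)
    return max(0.0, min(100.0, score))  # clamp to the 0-100 scale
```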
Data Sources
Races are auto-discovered from FEC candidate filings. Prediction market odds come from Kalshi (primary) and Polymarket (secondary). Expert ratings from Cook Political Report and Sabato's Crystal Ball establish baseline competitiveness. Campaign finance data tracks total raised and independent expenditures. News articles are dynamically matched to races via candidate name and state/race pattern detection.
Race Narratives
Sub-topics are detected within each race's coverage (e.g., fundraising, endorsements, polling surprises, scandals). Sentiment and volume trends are tracked to show whether a race is heating up or cooling down. Competitive races receive additional web search enrichment for recent narrative context.
Rating–Market Divergence
Each race is checked for divergence between expert ratings (Cook Political Report, Sabato's Crystal Ball) and prediction market odds (Polymarket, Kalshi). A “significant” divergence means experts and markets disagree on the likely winner (e.g., Cook rates Lean R but markets imply Lean D). A “mild” divergence means they agree on the party but differ by 2+ rating levels (e.g., Cook rates Safe R but markets imply Lean R). These flags highlight races where conventional wisdom and real-money predictions are telling different stories.
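The two divergence tiers can be sketched with an ordinal rating scale. The numeric encoding below (negative = R, positive = D) is an assumption for illustration; the source code may encode ratings differently:

```python
RATING_SCALE = {  # assumed ordinal encoding; negative = R, positive = D
    "Safe R": -3, "Likely R": -2, "Lean R": -1, "Tossup": 0,
    "Lean D": 1, "Likely D": 2, "Safe D": 3,
}

def divergence(expert_rating: str, market_rating: str) -> str:
    """'significant' = opposite likely winners; 'mild' = same party,
    2+ rating levels apart; otherwise 'none'."""
    e, m = RATING_SCALE[expert_rating], RATING_SCALE[market_rating]
    if e * m < 0:          # experts and markets favor different parties
        return "significant"
    if abs(e - m) >= 2:    # same direction, but 2+ levels apart
        return "mild"
    return "none"
```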
Limitations
- The index measures tension, not correctness or validity of political positions
- Social media data includes Reddit, Bluesky, Mastodon, Lemmy, and YouTube; Threads integration is pending Meta app review
- Sarcasm detection improves accuracy but may still misclassify some ironic content
- Historical event data spans 2015-2026 (1,100+ cataloged events); public response metrics require source verification and overlapping temperature data
- Source matching covers ~83% of articles; unmatched sources are auto-rated by LLM or default to center bias
- Temperature reflects a 7-day EMA window, so single-day events may take time to fully manifest
Appendix: Technical Parameters
All values below are imported directly from source code at build time. Expand any section for full details.
A. Temperature Smoothing
The composite temperature uses an asymmetric Exponential Moving Average (EMA): it heats up quickly in response to spikes but cools down slowly, mimicking how political tension lingers after major events.
| Parameter | Value | Purpose |
|---|---|---|
| alphaUp | 0.75 | Fast heating: 75% of a spike bleeds through immediately |
| alphaDown | 0.45 | Slow cooling: takes ~2–3 days to return to baseline |
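The two-rate smoothing can be sketched in a few lines. The alpha values come from the table above; the function itself is a minimal illustration, not the production implementation:

```python
ALPHA_UP = 0.75    # fast heating
ALPHA_DOWN = 0.45  # slow cooling

def smooth(prev: float, raw: float) -> float:
    """Asymmetric EMA: pick the fast rate when the raw score rises,
    the slow rate when it falls."""
    alpha = ALPHA_UP if raw > prev else ALPHA_DOWN
    return prev + alpha * (raw - prev)
```

A jump from 50 to a raw 70 lands at 65 the same day; the subsequent cooldown bleeds off less than half the gap per step, so elevated readings linger for days.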
Each sub-index also has its own smoothing rate:
| Sub-Index | α Up | α Down | Note |
|---|---|---|---|
| rhetoricHeat | 0.5 | 0.4 | Inflammatory content fades faster |
| coveragePolarization | 0.3 | 0.2 | Slow-moving narrative gaps |
| volumeSignal | 0.45 | 0.35 | Light smoothing |
| legislativeFriction | 1 | 1 | No smoothing (7-day vote window) |
B. Event Boosts & Decay
Breaking events add a fixed number of temperature points (no feedback loop). The boost decays exponentially over time, and multiple events are compressed logarithmically to prevent runaway stacking.
Initial Boost (fixed, per severity)
| Severity | Boost | Decay τ | Multiplier |
|---|---|---|---|
| critical | +18 pts | 7 days | 1.5× |
| major | +10 pts | 5 days | 1.3× |
| moderate | +5 pts | 3 days | 1.1× |
| minor | +2 pts | 2 days | 1× |
Boost Caps & Compression
| Parameter | Value | Purpose |
|---|---|---|
| Normal boost cap | 22 pts | Max event contribution during baseline conditions |
| Crisis boost cap | 35 pts | Max event contribution when 2+ crisis indicators active |
| Log compression cap | 35 pts | Asymptotic maximum when multiple events stack |
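The decay-and-compression behavior can be sketched as below. Boost points and τ values come from the severity table; the specific saturating curve used for log compression is an assumption (the source code's exact curve may differ):

```python
import math

BOOSTS = {  # severity -> (initial points, decay tau in days)
    "critical": (18.0, 7.0), "major": (10.0, 5.0),
    "moderate": (5.0, 3.0), "minor": (2.0, 2.0),
}
LOG_CAP = 35.0  # asymptotic ceiling when multiple events stack

def event_boost(events) -> float:
    """events: list of (severity, age_in_days). Sum exponentially
    decayed boosts, then compress toward LOG_CAP so stacked events
    cannot run away."""
    raw = sum(pts * math.exp(-age / tau)
              for sev, age in events
              for pts, tau in [BOOSTS[sev]])
    return LOG_CAP * (1.0 - math.exp(-raw / LOG_CAP))
```

A single fresh critical event contributes roughly 14 points after compression; five simultaneous critical events still stay under the 35-point asymptote.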
Event Detection Thresholds
| Severity | Min Spike | Min Absolute Temp | Velocity |
|---|---|---|---|
| critical | 6× baseline | ≥68 | ≥6× |
| major | 4× baseline | ≥58 | ≥4× |
| moderate | 2.5× baseline | ≥52 | ≥2.5× |
| minor | 1.5× baseline | ≥45 | N/A |
C. Cross-Index Interaction Terms
When multiple sub-indices spike simultaneously, interaction terms amplify or dampen the composite score. Each uses a smooth sigmoid ramp (no hard cutoff) with a 10-point transition zone around the midpoint.
Crisis Amplifier (max +12 pts)
Activates when volume > 55 AND rhetoric > 30. High news volume paired with inflammatory language signals a genuine crisis.
Breakdown Amplifier (max +9 pts)
Activates when polarization > 50 AND friction > 40. Narrative divergence plus legislative gridlock signals institutional breakdown.
Boring Dampener (max −5 pts)
Activates when volume > 55 but rhetoric < 20 and intensity < 20. High volume of non-inflammatory, low-intensity coverage reduces the score.
Double-Count Guard
When both AI intensity and event severity exceed 45, a 15%–30% reduction is applied to prevent the same signal from being counted in both sub-indices.
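The sigmoid ramps above can be sketched as follows. The exact slope of the transition zone is an assumption (here the ~10-point zone is mapped to a logistic scale of 2.5); thresholds and the +12 cap come from the Crisis Amplifier description:

```python
import math

def ramp(value: float, midpoint: float, zone: float = 10.0) -> float:
    """Smooth 0 -> 1 logistic ramp centered at midpoint, with most of
    the transition happening within ~zone points around it."""
    return 1.0 / (1.0 + math.exp(-(value - midpoint) / (zone / 4)))

def crisis_amplifier(volume: float, rhetoric: float) -> float:
    """Up to +12 pts when volume exceeds 55 AND rhetoric exceeds 30.
    Multiplying the ramps gives a soft AND, not a hard cutoff."""
    return 12.0 * ramp(volume, 55.0) * ramp(rhetoric, 30.0)
```

Sitting exactly at both midpoints yields a quarter of the maximum (+3 pts); well past both thresholds the amplifier saturates near +12.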
D. Sub-Index Formulas & Weights
Volume Signal (30%)
Compares today's article count against an 8-week same-day-of-week baseline using a sigmoid with steepness k=2.2. Early pipeline runs extrapolate using time-of-day completeness factors.
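The sigmoid mapping can be sketched as below. The k=2.2 steepness comes from the text; centering the sigmoid at a ratio of 1 (today matches baseline → score 50) is an assumption for illustration:

```python
import math

def volume_signal(today: float, baseline: float, k: float = 2.2) -> float:
    """Map the today/baseline article ratio onto 0-100 via a sigmoid.
    A ratio of 1 (normal volume) scores 50 under this centering."""
    ratio = today / max(baseline, 1.0)  # guard against a zero baseline
    return 100.0 / (1.0 + math.exp(-k * (ratio - 1.0)))
```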
Polarization (30%)
Five components measure narrative divergence between left- and right-leaning sources.
Intensity (20%)
Seven weighted components measure story urgency and breaking-news velocity.
Keyword co-occurrence: 1.3× for 2 keywords in same headline, 1.5× for 3+.
Rhetoric (10%)
A standard 3-component formula is used by default; an extended 6-component formula applies when full LLM scoring is available.
Escalation pattern scan adds up to +45 pts (headline-level detection of 16 patterns across 6 categories).
Friction (10%)
Weighted across government branches.
E. AI & Ensemble Sentiment
Sentiment scores are produced by an ensemble of models. When models disagree significantly, confidence is reduced rather than picking a winner.
| Model | Weight |
|---|---|
| openai | 40% |
| huggingface | 40% |
| groq | 20% |
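The disagreement-lowers-confidence behavior can be sketched as follows. The model weights come from the table; the spread-to-confidence mapping and the [-1, 1] score range are illustrative assumptions:

```python
WEIGHTS = {"openai": 0.4, "huggingface": 0.4, "groq": 0.2}

def ensemble_sentiment(scores: dict) -> tuple:
    """Weighted mean of available model scores; confidence shrinks as
    the models disagree instead of picking a single winner."""
    avail = {m: s for m, s in scores.items() if m in WEIGHTS}
    total = sum(WEIGHTS[m] for m in avail)  # renormalize if a model is missing
    mean = sum(WEIGHTS[m] * s for m, s in avail.items()) / total
    spread = max(avail.values()) - min(avail.values())
    confidence = max(0.0, 1.0 - spread)  # scores assumed in [-1, 1]
    return mean, confidence
```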
F. Confidence Intervals & Quality
Confidence intervals are calibrated empirically from historical prediction residuals, not from fixed bucket sizes. The system also monitors for anomalies that may indicate data quality issues.
Empirical CI Calibration
| Parameter | Value | Purpose |
|---|---|---|
| Lookback window | 60 days | Historical residuals analyzed |
| Coverage target | 90% | Percentile of residual distribution |
| Min data required | 14 days | Falls back to fixed CI below this |
| Weekend adjustment | 1.1× | Wider bands on weekends (less data) |
| Width bounds | 2–15 pts | Floor and ceiling for half-width |
| Fallback half-width | 8 pts | Used when insufficient historical data |
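The calibration parameters in the table compose into a simple procedure. A minimal sketch (the percentile method and the order of clamping vs. weekend widening are assumptions):

```python
def ci_half_width(residuals, is_weekend: bool = False) -> float:
    """Empirical CI half-width from recent absolute residuals,
    using the lookback, coverage, bounds, and fallback from the table."""
    window = sorted(abs(r) for r in residuals[-60:])  # 60-day lookback
    if len(window) < 14:                              # min data required
        return 8.0                                    # fixed fallback
    idx = min(len(window) - 1, int(0.90 * len(window)))
    width = window[idx]                               # ~90% coverage target
    if is_weekend:
        width *= 1.1                                  # wider bands on weekends
    return max(2.0, min(15.0, width))                 # 2-15 pt bounds
```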
Confidence Levels
Anomaly Detection
Automatic alerts fire for: impossible values (<0 or >100), frozen scores (unchanged 3+ days), unexplained spikes (>15 pts with max sub-score <70), data droughts (<20 articles on weekdays), sub-score swings (>20 pts same day), and driver churn (>75% of top drivers changed without a breaking event).
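A few of these rules can be sketched as plain threshold checks. This is a simplified illustration with thresholds copied from the text; the real checks (e.g. the sub-score condition on spikes, driver churn) involve more inputs than shown here:

```python
def detect_anomalies(today: float, history: list, article_count: int,
                     is_weekday: bool = True) -> list:
    """Flag a subset of the alert conditions listed above."""
    alerts = []
    if not 0 <= today <= 100:
        alerts.append("impossible value")
    if len(history) >= 3 and all(h == today for h in history[-3:]):
        alerts.append("frozen score")
    if is_weekday and article_count < 20:
        alerts.append("data drought")
    if history and today - history[-1] > 15:
        # real rule also requires max sub-score < 70 to call it unexplained
        alerts.append("unexplained spike candidate")
    return alerts
```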
G. Regime Detection
A 7-day trailing analysis detects whether the political environment is in crisis, elevated, or baseline mode. During crises, the system shifts weight from volume toward intensity (intensity matters more than raw article count during genuine emergencies).
| Regime | Condition | Weight Adjustment |
|---|---|---|
| Crisis | 2+ sub-indices avg >70 AND intensity avg >60 | Volume −5%, Intensity +5% |
| Elevated | Intensity avg >50 OR 2+ sub-indices avg >55 | Volume −2%, Intensity +2% |
| Baseline | Everything else | No adjustment |
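The classification rules in the table translate directly to code. A minimal sketch (whether intensity itself counts toward the "2+ sub-indices" tallies is an assumption here):

```python
def detect_regime(sub_avgs: dict) -> str:
    """Classify the 7-day trailing regime per the table above.
    sub_avgs maps sub-index name -> 7-day trailing average."""
    intensity = sub_avgs.get("intensity", 0.0)
    high_70 = sum(1 for v in sub_avgs.values() if v > 70)
    high_55 = sum(1 for v in sub_avgs.values() if v > 55)
    if high_70 >= 2 and intensity > 60:
        return "crisis"      # volume -5%, intensity +5%
    if intensity > 50 or high_55 >= 2:
        return "elevated"    # volume -2%, intensity +2%
    return "baseline"        # no weight adjustment
```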