Methodology
TL;DR
The Political Temperature Index produces a daily 0-100 score measuring US political tension. It ingests ~400 articles and ~200 social posts per run from 149 unique sources across 31 integrations, clusters them into topic groups via AI, and calculates five sub-indices: Volume (30%), Polarization (30%), Intensity (20%), Rhetoric (10%), and Friction (10%). The weighted composite is smoothed with an asymmetric EMA (fast heating, slow cooling) and boosted by breaking event multipliers.
Overview
The Political Temperature Index (PTI) is a daily measurement of US political tension on a 0-100 scale. It combines data from multiple sources to provide an objective snapshot of the political climate, updated three times daily (7 AM, 1 PM, and 7 PM ET).
- 149 unique sources
- 5 sub-indices
- 2,000 sources tracked (bias database)
- 870+ events validated
Data Sources
News Articles (8 integrations · 72 sources)
We analyze 400+ articles daily from sources across the political spectrum using GDELT (primary, with dynamic query generation from recent top stories), GDELT Global Entity Graph (GEG) for neural sentiment, NewsData.io, NewsAPI.org, Google News RSS (with redirect resolution for accurate source attribution), The Guardian, and MediaStack. An RSS Aggregator pulls from 65 feeds including NPR, AP News, Reuters, PBS NewsHour, ABC News, Politico, The Hill, Roll Call, C-SPAN, USA Today, 14 independent media outlets (7 left, 7 right), 12 Substack political commentators (Taibbi, HCR, Greenwald, Silver, Yglesias, and more), and 10 state political news outlets (Texas Tribune, CalMatters, Politico Florida, Michigan Advance, Pennsylvania Capital-Star, Georgia Recorder, Arizona Mirror, Wisconsin Examiner, Ohio Capital Journal, NC Policy Watch). Source bias is rated using a curated database of 2,000 sources with AllSides ratings, fuzzy name matching (Levenshtein + Dice coefficient), and LLM-based auto-rating for unknown sources.
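Source names rarely match the bias database exactly ("Fox News" vs "foxnews.com"), which is why fuzzy matching is needed. A minimal sketch of the bigram Dice coefficient, one of the two measures named above (this set-based variant and the function name are our illustration):

```python
def dice_coefficient(a: str, b: str) -> float:
    """Bigram Dice similarity between two source names (0.0-1.0)."""
    a, b = a.lower(), b.lower()
    if len(a) < 2 or len(b) < 2:
        return 1.0 if a == b else 0.0
    bigrams = lambda s: {s[i:i + 2] for i in range(len(s) - 1)}
    ba, bb = bigrams(a), bigrams(b)
    # 2 x shared bigrams over total bigrams
    return 2 * len(ba & bb) / (len(ba) + len(bb))
```

In practice a matcher like this would be combined with Levenshtein distance and a match threshold before falling back to LLM auto-rating.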
Social Media (6 integrations · 60 sources)
Posts from 15 politically-relevant subreddits (5 left-leaning, 4 right-leaning, 6 centrist), political content from Bluesky, posts from major Mastodon instances, 14 political communities on Lemmy across 5 instances (lemmy.world, lemmy.ml, sh.itjust.works, lemm.ee, lemmy.dbzer0.com), comments from 50+ monitored YouTube channels across the political spectrum, and keyword search on Threads (pending Meta app review). Search terms and hashtags are dynamically populated from recent top drivers and breaking events, ensuring emerging topics are captured automatically. Each platform uses custom bias scoring: Reddit (subreddit-based), Bluesky (content analysis), Mastodon (instance-based), Lemmy (community-based), YouTube (channel-based with pre-assigned scores).
Legislative, Judicial & Regulatory (11 sources)
Federal congressional votes via Congress.gov and GovTrack APIs. Floor speeches and debate transcripts via the Congressional Record. State legislature bills from LegiScan (all 50 states) and state legislative activity from OpenStates. Federal and appellate court opinions from CourtListener. Supreme Court coverage from SCOTUSblog (decisions, orders, cert petitions). Executive orders and regulations via the Federal Register API. Federal rulemaking activity from Regulations.gov (proposed and final rules from political agencies). Government accountability reports from GAO. Election data from Google Civic Information API.
Campaign Finance (1 source)
The Federal Election Commission (FEC) API supplies independent expenditure and PAC spending data for tracking campaign finance activity. High spending on negative ads correlates with increased political tension.
Cross-Validation (4 sources)
Polymarket prediction market data provides a market-based signal for political uncertainty, replacing PredictIt (winding down). Wikipedia Pageviews tracks attention spikes on 20 key political pages as a public interest proxy. GDELT TV News measures television mention volume for political terms. Snopes fact-check articles supplement PolitiFact for broader misinformation and rhetoric detection.
Sub-Indices
The final temperature is calculated from five sub-indices:
Rhetoric Heat (10%)
Measures toxicity and inflammatory language in news headlines and social posts. Primary scoring uses DB-stored toxicity and inflammatory scores (weighted by source reliability). When ensemble mode is enabled and 2+ AI models are available, blends 70% DB-based score with 30% multi-model ensemble (OpenAI Moderation, HuggingFace BERT, Groq) for improved accuracy. Falls back to 7-day historical average when fewer than 10 toxicity samples are available, or to a neutral baseline (20) if no historical data exists.
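The blend-and-fallback logic can be sketched as follows (function shape and names are illustrative; the 70/30 blend, the 10-sample minimum, and the neutral baseline of 20 come from the description above):

```python
def rhetoric_score(db_score: float, ensemble_score: float = None,
                   n_samples: int = 0, history_7d: float = None,
                   baseline: float = 20.0) -> float:
    """Blend DB toxicity with an optional ensemble; fall back when data is thin."""
    if n_samples < 10:
        # too few toxicity samples: 7-day average, else neutral baseline
        return history_7d if history_7d is not None else baseline
    if ensemble_score is not None:
        return 0.7 * db_score + 0.3 * ensemble_score  # ensemble mode
    return db_score
```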
rhetoric = (mean_toxicity × 0.4) + (mean_inflammatory × 0.4) + (gdelt_neg_tone × 0.2)
+ escalation_boost (headline pattern scan, up to +45 pts)
+ fact_check_boost (when rulings present)
Coverage Polarization (30%)
Uses a 4-component formula measuring how differently left and right media cover the political landscape. Narrative divergence (40%) detects when both sides frame the same story with opposing sentiment. Topic siloing (30%) catches when left and right cover entirely different stories. Coverage blindspots (20%) penalize stories covered by only one side, weighted by article count. A legacy intensity component (10%) preserves backward compatibility.
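As a sketch, the four components (each assumed already normalized to 0-100) combine as a simple weighted sum; the function name is ours:

```python
def coverage_polarization(divergence: float, siloing: float,
                          blindspots: float, legacy: float) -> float:
    """Weighted blend of the four polarization components (each 0-100)."""
    return (divergence * 0.40 + siloing * 0.30
            + blindspots * 0.20 + legacy * 0.10)
```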
narrative_divergence (40%): left vs right sentiment gap per topic
topic_siloing (30%): coverage overlap between left and right
blindspots (20%): one-sided coverage, article-weighted
legacy_intensity (10%): partisan_intensity × imbalance
polarization = weighted sum across all topic clusters
Volume Signal (30%)
Detects unusual spikes in political content compared to the day-of-week baseline (e.g., this Tuesday vs past Tuesdays), falling back to a 30-day average when there are too few same-day samples. Historical event validation identified volume as the primary indicator, which is why it carries the joint-highest weight.
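The sigmoid squash maps deviation from baseline onto 0-100, with 50 meaning "exactly at baseline". A sketch (illustrative function; the gain of 3 matches the formula in this section):

```python
import math

def volume_signal(today_count: float, completeness: float,
                  baseline: float) -> float:
    """Sigmoid-squashed deviation from the day-of-week baseline (0-100)."""
    extrapolated = today_count / completeness  # project a partial day to a full day
    x = 3 * (extrapolated / baseline - 1)      # gain of 3 around the baseline
    return 100 / (1 + math.exp(-x))
```

At exactly the baseline the signal reads 50; a 3x volume spike pushes it above 99.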
baseline = same_day_of_week_avg (8 weeks) || 30d_avg
time-of-day extrapolation: 0.35× before 7:00 AM EST, 0.55× before 9:00 AM EST, 0.8× before 1:00 PM EST
carry-forward floor: if today < 40% of avg AND partial data → yesterday × 0.7
volume = sigmoid(3 × ((today_count / completeness) / baseline - 1)) × 100
Legislative Friction (10%)
Tracks political friction across four government branches over a 7-day window. Each branch's party-line score (0-1, where 1.0 = pure party-line voting) is weighted and combined. Missing branches default to 0.25 (low friction). During recess, federal friction decays 20% per day toward the 0.25 baseline from the last known value, preventing abrupt drops when campaigns and investigations keep tension high. A bipartisan cooling factor reduces friction when >40% of federal votes are bipartisan (party-line score < 0.2).
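The recess decay and bipartisan cooling described above can be sketched as follows (illustrative functions; the 0.25 baseline, 20%/day decay, and 0.4 bipartisan threshold are from the text):

```python
def recess_decay(last_value: float, days_in_recess: int,
                 baseline: float = 0.25) -> float:
    """Decay federal friction 20% per day toward the 0.25 baseline."""
    return baseline + (last_value - baseline) * 0.8 ** days_in_recess

def bipartisan_cooling(bipartisan_ratio: float) -> float:
    """Cooling factor applied when >40% of federal votes are bipartisan."""
    if bipartisan_ratio <= 0.4:
        return 1.0  # no cooling below the threshold
    return 1.0 - (bipartisan_ratio - 0.4) * 0.5
```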
federal (Senate+House): 50% | executive (Fed Register): 20%
state (LegiScan): 15% | judicial (CourtListener): 15%
recess decay: 0.25 + (last - 0.25) × 0.8^days
bipartisan cooling: factor = 1.0 - (ratio - 0.4) × 0.5
friction = weighted_sum(party_line_scores) × cooling × 100
Story Intensity (20%)
Detects breaking news severity using AI-based scoring as the primary method. Reuses OpenAI Moderation scores (toxicity 40%, inflammatory 60%) at zero extra API cost. Falls back to keyword matching if <10 AI samples available. GDELT tone captures both extremes: negative tone (crisis, conflict) and extreme positive tone above +20 (sarcastic celebration, "crisis averted" articles) both contribute to intensity. Includes story velocity detection for rapidly emerging stories. Normalization factors (velocity scaling, cluster heat ceiling, tone intensity) are adaptively computed from the 95th percentile of the last 14 days of intensity metrics, preventing both saturation and under-sensitivity as data accumulates. Falls back to static constants when fewer than 7 days of historical data exist.
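The adaptive normalization is essentially a rolling 95th-percentile ceiling with a static fallback. A minimal sketch using the nearest-rank percentile (names and the default constant are our assumptions):

```python
import math

def adaptive_ceiling(history: list, static_default: float = 100.0,
                     min_days: int = 7) -> float:
    """95th-percentile ceiling from recent metrics, else a static constant."""
    if len(history) < min_days:
        return static_default  # fall back when history is thin
    ordered = sorted(history)
    idx = max(0, math.ceil(0.95 * len(ordered)) - 1)  # nearest-rank percentile
    return ordered[idx]
```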
AI_score = (toxicity × 0.40) + (inflammatory × 0.60)
tone: negative = |tone| / 10, positive = max(0, tone - 20) / 15
intensity = (AI × 0.30) + (velocity × 0.25) + (severity × 0.25)
+ (tone × 0.05) + (social × 0.05) + (market × 0.05) + (econ × 0.05)
(unavailable components redistribute weight to available ones)
Weight Reasoning
The sub-index weights were determined through empirical validation against 870+ historical political events from 2015-2026 (elections, protests, Supreme Court decisions, mass shootings, impeachments, etc.). A grid search optimization was performed to find weights that maximize correlation between our calculated temperatures and known high-tension periods.
Why These Specific Weights?
Validation Results
Optimization improved historical event detection from 50% to 88%+ accuracy. Previous weights (Rhetoric: 35%, Polarization: 30%, Volume: 15%, Friction: 20%) over-weighted rhetoric, causing false positives from routine partisan media coverage. The current 5-index model with Story Intensity and multi-model ensemble sentiment better captures breaking events without manual intervention. The model is validated daily against a database of 870+ historical events (2015-2026) including baseline "quiet periods" to measure false positive rates.
Temperature Scale
Frozen: 0-14
Cool: 15-29
Mild: 30-44
Warm: 45-59
Hot: 60-74
Boiling: 75-89
Meltdown: 90-100
Temperature Smoothing
Raw scores are smoothed using an asymmetric Exponential Moving Average (EMA) that responds quickly to rising tension but cools gradually, matching how political tension behaves in the real world.
Asymmetric EMA
Heating up (fast): α = 0.60 → responds quickly to events
Cooling down (slow): α = 0.38 → takes ~3-4 days to decay
smoothed = α × raw + (1 - α) × previous
Event Decay
Major events create a boost that decays exponentially over time:
boost = initial_boost × e^(-t/τ)
τ (decay time): critical=7d, major=5d, moderate=3d, minor=2d
Event Discovery System
The system actively detects breaking events that drive temperature spikes using dynamic cluster-based detection with AI metrics, replacing the former hardcoded keyword approach.
Detection Methods
- 1. Volume Spike Detection compares topic cluster volumes to 7-day averages (3x = major, 5x = critical)
- 2. Cluster-Based Breaking Detection reads topic clusters with AI-scored toxicity and inflammatory metrics, compares volume to historical averages, and derives severity from data rather than keyword matching
- 3. Story Velocity compares today's cluster article counts to yesterday's to detect rapidly emerging stories. High-impact clusters (war, crisis, conflict) use relaxed matching (1-word overlap). Web search severity provides a velocity floor when cluster-based detection returns low scores during genuine crises.
- 4. Driver-Based Detection uses LLM-clustered topics from daily drivers with metric-based severity derived from cluster toxicity and inflammatory scores
Severity Multipliers
Critical · Major · Moderate · Minor
Dynamic Severity Classification
Event severity is derived from actual article metrics rather than keyword pattern matching. Each topic cluster's articles are scored for toxicity and inflammatory content, and these averages determine severity upgrades:
base severity: 50+ articles = critical, 30+ = major, 15+ = moderate
AI intensity = avg(toxicity, inflammatory) from cluster articles
intensity > 0.3: upgrade 1 level | intensity > 0.5: upgrade 2 levels
heat contribution > 20%: upgrade 1 additional level (cap: critical)
Stacking Protection
Uses either the event multiplier or the decay boost from past events (whichever is larger), never both. Total boost is capped at 15 points to prevent event noise from jumping temperature bands. Events with residual decay below 2.0 points are pruned so stale events don't accumulate.
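A sketch of the stacking rule (names are illustrative; the 15-point cap and 2.0-point pruning threshold are from the text):

```python
def event_boost(live_multiplier_pts: float, decayed_boosts: list,
                cap: float = 15.0, prune_below: float = 2.0) -> float:
    """Apply the larger of the live multiplier or best surviving decay boost."""
    surviving = [b for b in decayed_boosts if b >= prune_below]  # prune stale events
    best_decay = max(surviving, default=0.0)
    return min(cap, max(live_multiplier_pts, best_decay))  # never both, capped
```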
Event Quality Filters
Before saving, detected events pass through three quality filters to prevent noise:
- 1. Foreign story filter checks events against 50+ static country/region indicators (Pakistan, Sudan, Ukraine, etc.) plus dynamically discovered foreign terms. The system analyzes 30 days of topic clusters to find terms that consistently co-occur with known foreign content, so new foreign entities (e.g., "navalny", "starmer") are identified automatically within days. All matches use word-boundary regex and are checked for US relevance; stories with no US connection are excluded.
- 2. Vague title filter rejects keyword-only titles like "Breaking: Terror" or titles shorter than 15 characters that lack specificity.
- 3. Editorialized filter rejects titles containing loaded language like "terror campaign", "slams", or "blasts" that indicate opinion rather than factual reporting.
Domestic vs International Weighting
Since this measures US political tension, international stories receive reduced weight based on their domestic relevance:
2+ US keywords match (aid, vote, sanctions, etc.): 1.0× full weight
1 US keyword match: 0.5× significant reduction
No US connection: 0.1× heavy reduction
Non-foreign stories: 1.0× (unaffected)
Events → Top Drivers Bridge
Critical and major breaking events are automatically promoted into the Top Drivers list using fuzzy topic matching (Jaccard word overlap). This ensures a significant event always appears in both the breaking news section and the ranked driver list, even when detected between the three-times-daily scoring runs.
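The fuzzy matching is plain Jaccard overlap on title words; a sketch (function names are ours; the 0.4 threshold is from the formula in this section):

```python
def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity between two topic strings."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)

def event_matches_driver(event_title: str, driver_topic: str,
                         threshold: float = 0.4) -> bool:
    """True when word overlap clears the 40% match threshold."""
    return jaccard(event_title, driver_topic) >= threshold
```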
similarity = |words_A ∩ words_B| / |words_A ∪ words_B|
match threshold: 0.4 (40% word overlap)
injected heat: critical = 25%, major = 15%
Driver Quality Pipeline
The "Top Drivers" list is produced by a multi-stage pipeline that ensures accuracy, prevents missing major stories, and filters out noise.
Pre-Clustering Intelligence
Before the LLM clusters headlines, the system fetches Google News RSS to extract the top 20 trending headlines and top 15 keywords. These are injected as "Known Top Stories" into the LLM prompt so it doesn't miss major stories that may be underrepresented in the article sample. Trending keywords also seed additional DB queries to pull in relevant articles.
Dynamic Keyword System
All keyword lists across the pipeline are dynamically populated from recent data instead of relying on hardcoded static lists. This ensures emerging topics are automatically captured without manual intervention.
Social search terms: core terms + recent driver/event keywords
GDELT queries: static baseline + dynamic queries from high-heat drivers
Article counts: Document Frequency map classifies keywords at runtime
→ Words in >10% of articles = "generic" (searched with AND)
→ Specific keywords searched with OR (prevents count inflation)
Headline filtering: TF-IDF weighted relevance scoring
→ Rare terms (low DF) rank higher: score = Σ log(1 / (df + 0.01))
→ Ensures driver headlines match their topic, not unrelated stories
Article Sampling
Up to 350 articles from a 48-hour window are selected using multi-source sampling: high-heat articles, left-leaning sources, right-leaning sources, high-volume topic representatives, and trending keyword matches. Articles are deduplicated by normalized title (removing GDELT syndication duplicates) and filtered to exclude non-political content (local crime, accidents, recipes).
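A sketch of title-based deduplication (the normalization rules shown are our illustration):

```python
import re

def normalize_title(title: str) -> str:
    """Lowercase and strip punctuation so syndicated copies share one key."""
    return re.sub(r"[^a-z0-9 ]", "", title.lower()).strip()

def dedupe_articles(articles: list) -> list:
    """Keep the first article seen for each normalized title."""
    seen, unique = set(), []
    for article in articles:
        key = normalize_title(article["title"])
        if key not in seen:
            seen.add(key)
            unique.append(article)
    return unique
```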
LLM Clustering
Headlines are clustered by a multi-provider LLM system (6 providers with automatic fallback). The clustering prompt enforces factual, neutral cluster names with no loaded language like "terror", "hate", or "slams". Domestic US stories are prioritized over international.
Post-Clustering Validation
After clustering, the system checks the top 10 keywords by article count. If any keyword with 30+ articles is not represented in any cluster, a warning is logged. This catches cases where a major story (e.g., 442 Epstein articles) was missed by the LLM.
Dynamic Related Topic Groups
Duplicate clusters about the same story are merged using dynamically discovered topic relationships. The system analyzes 30 days of keyword co-occurrence in topic clusters using a Union-Find algorithm to discover which terms consistently appear together.
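A minimal Union-Find sketch of the co-occurrence grouping (the input shape is our assumption: a map of keyword pairs to the number of distinct days they co-occurred):

```python
class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def related_groups(cooccurrence_days: dict, min_days: int = 3) -> list:
    """Merge keywords whose pairs co-occurred on 3+ distinct days."""
    uf = UnionFind()
    for (a, b), days in cooccurrence_days.items():
        if days >= min_days:  # only strong edges join a component
            uf.union(a, b)
    groups = {}
    for keyword in uf.parent:
        groups.setdefault(uf.find(keyword), set()).add(keyword)
    return [g for g in groups.values() if len(g) > 1]
```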
Co-occurrence graph: keyword pairs in same cluster across distinct days
Strong edges: 3+ days of co-occurrence → connected component
Example: "epstein" ↔ "maxwell" ↔ "bondi" auto-discovered
Merged with static groups (static = floor, dynamic = superset)
Cache: 6hr TTL, refreshed from topic_clusters history
Cluster Quality Scoring
A pure algorithmic validator (no LLM calls) runs after clustering to measure quality across four dimensions, producing an overall score (0-100). Warns when score drops below 70.
Within-cluster similarity: Jaccard overlap of headlines in same cluster
Cross-cluster separation: Low overlap between different clusters
Coverage rate: % of input headlines assigned to a cluster
Foreign contamination: % of clusters matching foreign indicators
Volume-Based Heat Caps
The LLM assigns heat levels based on headline tone, but can't see article volume. Bidirectional corrections are applied:
Volume floors: 500+ → extreme, 200+ → high, 50+ → medium
Volume caps: <3 → max low, 3-9 → max medium, 10-19 → max high
Prevents LLM rating 5-article story as 'extreme'
Trending Cross-Validation
After LLM clustering, drivers are cross-validated against trending headlines from 4 external sources (Google News, Memeorandum, Reddit Rising, AP/Reuters). This catches politically significant stories the ingested article corpus missed.
Step 1: Fetch trending headlines from 4 aggregators
Step 2: Match each headline against clusters (Jaccard, keyword overlap)
Step 3: Unmatched headlines confirmed by 2+ sources → synthetic gap topics
Step 4: LLM identifies 0-3 semantic gaps (different wording, same story)
Gap topics injected with 1.2× trending heat bonus (capped at 3)
Trending Validation Penalty
Drivers with zero trending headline matches receive a 0.85× heat penalty. This creates natural turnover: stale, unvalidated drivers lose heat while trending stories gain it.
Driver Freshness Tracking
Each driver tracks when it first appeared and its article velocity trend (rising, stable, declining, or stale). Decay penalties increase with age:
Rising or Day 1: 1.0× (no penalty)
Stable ≤ 3 days: 0.95×
Declining: max(0.5, 1.0 - days × 0.1)
Stale: max(0.3, 1.0 - days × 0.15)
Compound: unvalidated + declining/stale + 3+ days = 0.5×
Web-Grounded Search (Parallel Enrichment)
A Gemini 2.5 Flash call with Google Search grounding runs in parallel with headline clustering to independently discover what US political stories are happening right now. This provides an external ground-truth signal that doesn't depend on the ingested article corpus. If the web search fails or times out, the pipeline continues unaffected.
Web search results are merged into the driver pipeline via three operations:
1. CONFIRM: web story matches existing cluster (Jaccard ≥ 0.25 or
2+ keyword overlap) → cluster gets 1.1× heat bonus
2. RESCUE: web story is international but US-relevant (usRelevance ≥ 0.6)
and matches a cluster previously filtered as foreign
→ cluster restored with isDomestic=true + 1.15× heat bonus
3. INJECT: web story is not minor, usRelevance ≥ 0.5, matches no
existing cluster, AND has 1+ corroborating signal from independent
sources → synthetic cluster created with 1.15× heat bonus
AI Sentiment Analysis
The system uses a multi-model ensemble for improved sentiment accuracy, combining three AI providers for robust toxicity and inflammatory detection.
Ensemble Weights
OpenAI Moderation: general toxicity
HuggingFace BERT: domain-specific models
Groq LLM: contextual analysis
Sarcasm Detection
Before sentiment analysis, content is checked for sarcasm/irony using HuggingFace models combined with heuristic signals (punctuation patterns, capitalization, sentiment shifts). Detected sarcasm is weighted differently to avoid misclassification.
Fallback Chain
If the ensemble is unavailable (API limits, network issues), the system falls back to: (1) GDELT GEG neural sentiment, (2) OpenAI Moderation alone, (3) 7-day historical average. This ensures temperature calculations continue even during API outages.
Heat Source Analysis
The dashboard shows where political "heat" (inflammatory/toxic language) is coming from, with intelligent intensity comparison.
Heat Intensity Comparison
Compares each group's heat contribution % to their coverage share % to determine if they're "running hot" (more inflammatory than expected) or "running cool":
intensity_ratio = heat_percentage / coverage_percentage
> 1.15: ↑ Running hot (orange), more heat than coverage suggests
< 0.85: ↓ Running cool (cyan), less heat than expected
0.85-1.15: → Expected (gray), heat matches coverage share
This prevents misleading conclusions. If a group produces 45% of heat but also 45% of coverage, they're not actually running "hotter" than expected.
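The classification reduces to one ratio and two thresholds:

```python
def heat_intensity(heat_pct: float, coverage_pct: float) -> str:
    """Classify a group by comparing heat share to coverage share."""
    ratio = heat_pct / coverage_pct
    if ratio > 1.15:
        return "running hot"   # more inflammatory than coverage share suggests
    if ratio < 0.85:
        return "running cool"  # less heat than expected
    return "expected"
```

A group with 45% of heat and 45% of coverage lands at ratio 1.0, i.e. "expected".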
Confidence Score
A data quality metric (0-100) displayed on the dashboard that indicates how reliable the temperature reading is based on available data.
Score bands: 80+, 60-79, 40-59, <40
Based on four components: article volume (30%), source match rate (30%), sentiment coverage (25%), and source diversity (15%). Click the confidence badge on the dashboard to see each component's score. When confidence is low, smoothing trusts historical data more heavily.
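With the stated weights, the composite confidence is a straightforward blend (component scores assumed 0-100; the function name is ours):

```python
def confidence_score(article_volume: float, source_match_rate: float,
                     sentiment_coverage: float, source_diversity: float) -> float:
    """Weighted blend of the four data-quality components (each 0-100)."""
    return (article_volume * 0.30 + source_match_rate * 0.30
            + sentiment_coverage * 0.25 + source_diversity * 0.15)
```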
Data Quality Monitoring
The pipeline continuously validates its own output to catch data quality issues and ensure consistency across its three daily runs.
Anomaly Detection
Automatically flags impossible values (out-of-range scores), frozen scores (identical temperature for 3+ consecutive days), unexplained spikes (large jumps without high sub-scores), data droughts (too few articles on a weekday), and cross-run inconsistencies (sub-score swings or driver churn between the same day's pipeline runs).
Post-Pipeline Consistency Validation
Before saving the daily temperature, the system validates internal consistency: temperature delta from yesterday must be ≤ 25 points (unless a critical event was detected), sub-scores must be internally coherent, and no sub-score should change more than 30 points without a corresponding data signal. Issues are logged as warnings for investigation, never blocking the pipeline.
Multi-Source Cross-Validation
Stories discovered through web search are verified against at least one independent source (trending headlines from 4 aggregators, breaking events, or gap detection) before being promoted to top drivers. This prevents single-source stories from contaminating the driver list.
Fallback Tracking
When any scoring component falls back to a secondary method (e.g., historical averages instead of live data, static normalization instead of adaptive), the event is tracked and logged. This provides visibility into when the system is operating at reduced data quality.
Source Balance
We monitor the left/right distribution of our news sources to ensure balanced coverage. If the ratio exceeds 70/30 in either direction, an alert is logged for transparency. Our source bias ratings are based on Media Bias/Fact Check and AllSides data.
Complete Data Sources
News (8 integrations · 72 sources)
- GDELT
- GDELT GEG (neural sentiment)
- NewsData.io
- NewsAPI.org
- Google News RSS
- The Guardian
- MediaStack
- RSS Aggregator (65 feeds)
Social (6 integrations · 60 sources)
- Reddit (15 subreddits)
- Bluesky
- Mastodon
- Lemmy
- YouTube (55+ channels)
- Threads (pending review)
Legislative (11)
- Congress.gov
- GovTrack
- Congressional Record
- Federal Register
- Regulations.gov
- GAO Reports
- LegiScan (state bills)
- OpenStates (state legislatures)
- CourtListener (courts)
- SCOTUSblog (Supreme Court)
- Google Civic (elections)
Finance (1)
- FEC (Federal Election Commission)
Cross-Validation (4)
- Polymarket (prediction markets)
- Wikipedia Pageviews
- GDELT TV News
- Snopes (fact-checks)
Social Media Bias Scoring
Each social media platform uses a different method to assign political bias scores (-3 to +3 scale, where negative = left, positive = right).
Reddit
Subreddit-based lookup. Each subreddit has a pre-assigned bias score based on its typical political leaning (e.g., r/progressive = -2.5, r/Conservative = +2.0).
Bluesky
Content analysis using 80+ political indicator keywords. Posts are classified as left/center/right based on politician mentions, issue keywords, and media references.
Mastodon
Instance-based bias. Each Mastodon instance (server) has an assigned lean based on its community demographics (e.g., kolektiva.social = -2.0, noagendasocial.com = +1.5).
Lemmy
Community@instance key lookup with content fallback. Specific political communities have assigned biases; others use content analysis for classification.
YouTube
Channel-based bias scoring. 55+ monitored political channels across the spectrum have pre-assigned bias scores based on their editorial direction.
Threads (Pending)
Keyword search for political content. Pending Meta app review for the threads_keyword_search permission. Will use content analysis similar to Bluesky.
Data Window & Freshness
Most calculations use a 7-day rolling window to ensure consistency across all metrics and smooth out daily noise. This means the temperature reflects the past week's political activity, not just the last 24 hours. When today's data is not yet available (between midnight UTC and the morning pipeline run), the dashboard displays the most recent available date with an amber "Latest" badge indicating data staleness.
Trend Visualization
The 30-day trend chart uses temperature-aware gradient coloring where the line color changes based on the Y-axis temperature value, making it easy to spot when tension was high (pink/red) versus calm (blue/green). Event annotations from breaking events (moderate severity and above) and historical events are overlaid on the chart, showing what political events drove temperature changes.
Sub-index sparklines show 7-day historical trends for each component, making it easy to spot which sub-index is driving recent temperature changes.
Temperature Delta Explainer
When the temperature changes by more than 1 point from the previous day, the dashboard shows a "What Changed" breakdown listing the top 3 sub-indices that contributed most to the shift. Each contribution is weighted by the sub-index's percentage of the total score.
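A sketch of the breakdown (sub-index names and input shape are illustrative; the weights and the 0.5 cutoff are from the text):

```python
WEIGHTS = {"volume": 0.30, "polarization": 0.30, "intensity": 0.20,
           "rhetoric": 0.10, "friction": 0.10}

def what_changed(today: dict, yesterday: dict,
                 min_abs: float = 0.5, top_n: int = 3) -> list:
    """Top sub-index contributions to the day-over-day temperature shift."""
    deltas = {name: (today[name] - yesterday[name]) * w
              for name, w in WEIGHTS.items()}
    significant = {n: d for n, d in deltas.items() if abs(d) > min_abs}
    return sorted(significant.items(), key=lambda kv: abs(kv[1]),
                  reverse=True)[:top_n]
```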
weightedDelta = (today - yesterday) × sub_index_weight
Only sub-indices with |weightedDelta| > 0.5 are shown
Example: VOL +3.2 POL -1.1 INT +0.8
Historical Event Auto-Promotion
Significant political events are automatically promoted into the historical event database for long-term accuracy tracking and trend chart annotations. This runs daily at 23:30 UTC.
Candidate Selection
Candidates are drawn from two sources: today's top drivers (ranked by heat contribution) and today's breaking events (critical/major severity). Candidates are merged, deduplicated by fuzzy name matching, and enriched with a days-seen count (how many of the last 5 days the topic appeared as a driver). This helps the AI distinguish genuinely new events from ongoing coverage of the same story.
AI Classification
An LLM evaluates each candidate for historical significance, assigning a severity (critical, major, moderate, minor) and an impact score (0-100). Candidates classified as minor or with low impact are rejected. The LLM receives the days-seen context to avoid promoting multi-day status updates ("Day 5 of protests") as separate events.
Hard Quality Filters
After AI classification, a deterministic filter rejects events that don't meet quality thresholds. This catches LLM over-promotion that the AI classification alone may miss:
Impact thresholds: critical < 20, major < 12, moderate < 8 → rejected
Title length: < 4 words → rejected (too vague)
Vague patterns: "continues", "ongoing", "heats up", "escalates" → rejected
Action verb required: must contain passed, signed, fired, arrested,
ruled, killed, launched, indicted, overturned, impeached, etc.
Limitations
- The index measures tension, not correctness or validity of political positions
- Social media data includes Reddit, Bluesky, Mastodon, Lemmy, and YouTube; Threads integration is pending Meta app review
- Sarcasm detection improves accuracy but may still misclassify some ironic content
- Historical validation data spans 2015-2026 (870+ events); live data collection from launch date
- Source matching covers ~83% of articles; unmatched sources are auto-rated by LLM or default to center bias
- Temperature reflects a 7-day EMA window, so single-day events may take time to fully manifest