
ENGINEERING METHODOLOGY

How it actually works.

Nova News is not a “writer.” It is a distributed extraction-and-verification pipeline. Roughly 4,000 articles per day flow in from 150+ news sources spanning the full political spectrum, plus direct feeds from primary-source APIs (SEC EDGAR, Congress.gov, federal courts, central banks, arXiv, bioRxiv, FDA, WHO). The pipeline deduplicates them, atomizes their factual claims, cross-references each claim across independent ownership groups and authoritative databases, scores them, and runs every synthesized story through grounding verification, saga-drift detection, and editorial classification before it can reach the feed.

This page starts with why we built it — the structural failure modes of modern news and what we’re trying to do differently — then walks through how the system actually works, layer by layer. If anything described here doesn’t match the system you’re using, that’s a bug, not a feature, and we want to know about it.

01

Why Nova Exists

The news has a structural problem: every outlet’s incentives point away from informing you. Most readers feel the result — everything seems louder, angrier, more partisan, more repetitive — without seeing the mechanism. The mechanism is a stack of biases that compound when news is funded by attention.

engagement_bias: Outlets optimize for time-on-site. Outrage, conflict, and novelty travel further than facts because they keep you scrolling.
framing_bias: "Crackdown" or "operation"? "Insurgent" or "freedom fighter"? Same fact, different language. Word choice carries the editorial.
selection_bias: What gets covered shapes what you think the world is. What's omitted matters as much as what's included, and omission is invisible.
authority_bias: Twelve outlets rerunning the same wire story isn't twelve sources of confirmation. Repetition isn't verification.
filter_bubble: Algorithmic feeds reward agreement. The opinions you already hold shape what facts you see; disagreement gets demoted before it ever reaches you.
recency_bias: Newest is loudest. A 30-second update buries the multi-month context that gives the news its meaning.

Each of these is a rational response to the wrong incentive. Outlets that don’t optimize for attention lose to ones that do. Algorithms that don’t feed you content you already agree with lose to ones that do. The news you read is shaped by the structural pressures on the people making it, not by anyone’s editorial intentions.

Nova’s response is structural, not editorial. We don’t hire a different kind of journalist. We built a system whose objective function is verifiable information density — getting you informed and back to your life as efficiently as possible, and not a second more. The architecture changes what we’re rewarded for producing.

No advertising

Nova has no ads. None. Not as a launch promo, not as a freemium tier. The business model has no incentive to keep you scrolling, manufacture controversy, or soften coverage of paying advertisers. This is structural, not a stance.

Full spectrum, by construction

Every story is built from sources spanning the full political range, plus primary-source APIs (filings, court opinions, central-bank statements, scientific preprints). You see what each side reported and where they disagreed.

Ownership-aware

The system tracks corporate parents. Twelve outlets owned by three companies count as three independent sources, not twelve. Wire-service repetition does not inflate confidence.

Verification as the product

Most news sites treat verification as overhead. Nova treats it as the deliverable. Every claim has a confidence score, a source list, and a chain of evidence — auditable, not asserted.

The rest of this page is the how. Pipeline, sources, claim extraction, verification, gates, what ships to your feed, where we still get it wrong.

02

The Pipeline

The pipeline’s objective is to produce stories whose every factual claim is supported by independent evidence. If a story can’t clear that bar, it is discarded. Most ingested content does not survive — either because it duplicates something we already have, fails verification, or fails the editorial-significance gate. What you see is what made it through.

01 INGEST: ~4k articles/day
02 DEDUPE: 3 layers (discard)
03 EXTRACT: claim atomization (discard)
04 VERIFY: 5 verifiers + grounding (discard)
05 SYNTHESIZE: EventNarrative
06 GATE: drift + tiers 1–5 (discard)
→ PUBLISH TO FEED

Each stage has its own independent quality gate, and an article can be killed at any point. Most are.

Deduplication runs at three independent layers, because the same news event can echo across the system in three different shapes. (1) At ingest, near-identical articles from the same temporal window collapse into a single record — the wire-service echo effect where dozens of outlets republish the same AP or Reuters rewrite. (2) At synthesis, events with overlapping actors, matching event_keywords, and embedding similarity collapse into a single EventNarrative, so “Apple announces earnings” and “Tim Cook reports Q3 numbers” resolve to one story. (3) Canonical entities are clustered at cosine ≥ 0.88, so “Apple,” “Apple Inc.,” and “Apple Inc” resolve to a single record across the graph.
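
The third layer's clustering rule can be sketched as greedy single-link clustering over entity embeddings. A minimal sketch, assuming plain list-of-floats embeddings; the production clusterer and its embedding model are not shown here:

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def cluster_entities(names, embeddings, threshold=0.88):
    """Greedy single-link clustering: each entity joins the first
    cluster whose representative matches at >= threshold, otherwise
    it starts a new cluster of its own."""
    clusters = []  # list of (representative_embedding, member_names)
    for name, emb in zip(names, embeddings):
        for rep, members in clusters:
            if cosine(rep, emb) >= threshold:
                members.append(name)
                break
        else:
            clusters.append((emb, [name]))
    return [members for _, members in clusters]
```

With toy 2-D embeddings, "Apple" and "Apple Inc." land in one cluster while an unrelated entity starts its own.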

A staged-replay harness re-runs the extraction pipeline at progressively larger scales (250 → 500 → 1,000 articles) and gates the next stage on the previous stage’s quality metrics. If a stage fails its gate — bad event-merge rate, bad entity-resolution rate, drop in claim quality — the harness halts before scaling further and writes an alert artifact. The live pipeline keeps running so users aren’t blacked out; operators are notified within hours of the offending change. We do not auto-roll-back production. A human looks at the alert and decides.
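
In outline, the harness's gate-then-scale loop looks like this. The `run_stage` callable and the `gates` mapping are hypothetical interfaces; only the control flow (scale up, check gates, halt and alert, never auto-roll-back) matches the description above:

```python
def staged_replay(run_stage, gates, scales=(250, 500, 1000)):
    """Replay the pipeline at increasing scales, gating each step.
    `run_stage(n)` replays n articles and returns a metrics dict;
    `gates` maps metric name -> minimum acceptable value.
    On a gate failure, stop scaling and return an alert payload;
    production keeps running and a human decides what to do."""
    for n in scales:
        metrics = run_stage(n)
        failed = {k: v for k, v in metrics.items()
                  if k in gates and v < gates[k]}
        if failed:
            return {"halted_at": n, "failed_gates": failed}
    return {"halted_at": None, "failed_gates": {}}
```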

03

Source Diversity & Credibility

We ingest from 150+ news outlets spanning the full political and editorial spectrum—wire services, legacy print, broadcast, digital-native, international, and partisan media from both sides—plus direct feeds from primary source APIs: SEC filings, Congressional records, FDA approvals, WHO outbreak reports, court opinions, central bank statements, arXiv, and bioRxiv.

This is deliberate. You cannot detect bias if you only read one side. You cannot verify facts if you only have secondary reporting. The system needs the full picture to do its job.

Credibility ensemble weights: Tranco rank 30%, Wayback age 25%, Wikipedia 20%, Safe Browsing 15%, DNS/SSL 10%.
Example composite scores: reuters.com 0.95, nytimes.com 0.90, axios.com 0.82, partisan-site.info 0.63.

Every source domain is scored automatically using an ensemble of five signals. No human editors assign ratings—the credibility score is derived entirely from observable, auditable data. TLD quality, HTTPS presence, subdomain depth, and suspicious URL patterns are penalized at the infrastructure layer.
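
The composite is a plain weighted sum of the five signals, using the ensemble weights above. A minimal sketch, assuming each signal has already been normalized to [0, 1] upstream:

```python
# Ensemble weights as listed above; signal values are assumed
# pre-normalized to [0, 1].
WEIGHTS = {
    "tranco_rank": 0.30,
    "wayback_age": 0.25,
    "wikipedia": 0.20,
    "safe_browsing": 0.15,
    "dns_ssl": 0.10,
}

def credibility_score(signals):
    """Weighted composite of the five normalized reliability signals.
    Missing signals contribute zero rather than raising."""
    return round(sum(WEIGHTS[k] * signals.get(k, 0.0) for k in WEIGHTS), 2)
```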

Sources are classified into nine categories: wire service, legacy broadcast, legacy print, digital native, partisan media, opinion outlet, international, local, and aggregator. Each category carries a different baseline credibility range, which is then adjusted by the domain-specific ensemble score.

Spectrum coverage

Sources from far-left to far-right, rated on a 7-point political lean scale. Each source carries a credibility score calibrated by the ensemble model.

Primary sources

Direct feeds from SEC, Congress, FDA, WHO, ECB, federal courts, arXiv, bioRxiv, and other authoritative bodies. These are ground truth, weighted above all secondary reporting.

Domain reliability

Scored on Tranco rank, Wayback longevity, Wikipedia notability, Safe Browsing status, and DNS/SSL health. Updated continuously.

Ownership mapping

Corporate parent tracking. Fox News (Fox Corporation) and the Wall Street Journal (News Corp) sit under common Murdoch-family control and count as one ownership group, not two. The system maintains the full ownership graph across all ingested outlets.
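
Counting independent sources then reduces to counting distinct ownership groups rather than distinct outlets. A sketch with a toy ownership map; the real graph is far larger and maintained internally:

```python
# Hypothetical ownership map for illustration only.
OWNERSHIP = {
    "wsj.com": "murdoch_group",
    "nypost.com": "murdoch_group",
    "foxnews.com": "murdoch_group",
    "reuters.com": "thomson_reuters",
    "ap.org": "associated_press",
}

def independent_source_count(domains):
    """Count distinct ownership groups, not distinct outlets.
    Unmapped domains fall back to counting as themselves."""
    return len({OWNERSHIP.get(d, d) for d in domains})
```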

04

Claim Atomization & Confidence

News articles are unstructured text mixed with opinion, framing, speculation, and background. To verify facts, we first have to extract them. The system breaks every article into atomic, individually verifiable claims—each tagged with a calibrated confidence score.

Why this matters: Framing is often used to mask weak evidence. “Amidst growing concerns” is editorial color, not a fact. By isolating the claim, we can compare reporting purely on the verifiable substance.

Input: Raw Article Segment

“Amidst growing concerns over fiscal irresponsibility, the Senate narrowly passed the controversial infrastructure bill late Tuesday in a move that stunned Washington insiders...”

Output: Atomic Claims
  • Senate passed infrastructure bill (conf: 1.0)
  • Time: Tuesday, late session (conf: 0.9)
  • Vote margin: narrow (conf: 0.8)
  • "Growing concerns": editorial framing (discarded)

Claim quality axes (worked example): specificity 0.92, attribution 0.85, falsifiability 0.95, newsworthiness 0.80, composite 0.88.
Display tiers: below 0.6 hidden; 0.6–0.7 low; 0.7–0.85 inferred; 0.85 and above verified.

Confidence scores are not raw LLM outputs. Every extracted claim runs through a four-axis quality scorer:

Specificity

Does the claim contain named entities, dates, or numbers? "GDP grew 2.4% in Q3" outscores "the economy improved" by roughly 3× on this axis.

Attribution

Is the claim attributed to a named source? Unattributed assertions are penalized. Direct quotes with speaker identification score highest.

Falsifiability

Could this claim hypothetically be disproven? Unfalsifiable statements — opinions, value judgments, tautologies — score down or get filtered.

Newsworthiness

Is this about the real world? Product descriptions, advice, listicles, and speculation are filtered. Only claims about verifiable real-world events pass.

The four axes combine into a composite score from 0.0 to 1.0. Borderline claims (composite between 0.3 and 0.7) escalate to a dedicated LLM judge that returns a three-way verdict: good_claim, weak_claim, or not_a_claim. Each verdict is stored with reasoning. Over time those labels become a corpus for recalibrating the scorer — a closed measurement loop, not a one-shot guess.
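
The scoring-and-escalation flow can be sketched as follows. The equal axis weights are an assumption for illustration; the text above specifies only the composite range and the 0.3–0.7 escalation band:

```python
# Equal weights are an illustrative assumption, not the production values.
AXIS_WEIGHTS = {"specificity": 0.25, "attribution": 0.25,
                "falsifiability": 0.25, "newsworthiness": 0.25}

def composite(axes):
    """Weighted combination of the four quality axes into [0, 1]."""
    return sum(AXIS_WEIGHTS[k] * axes[k] for k in AXIS_WEIGHTS)

def route_claim(axes):
    """Borderline composites (0.3-0.7) escalate to the LLM judge;
    the rest are accepted or dropped by the rule-based scorer."""
    score = composite(axes)
    if score < 0.3:
        return "drop"
    if score <= 0.7:
        return "escalate_to_llm_judge"
    return "accept"
```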

A note on the word “verified.” When Nova displays a claim as verified, that is not a statement that the claim is true. It means the claim is substantiated — supported across multiple independent sources, traceable to source text, and clears the quality and grounding thresholds. A claim can be widely reported, fully grounded, structurally sound, and still be wrong (see Source contamination in Limitations). “Verified” means substantiated enough to show with confidence, not true. Nothing on a news feed should be treated as the latter.

A core invariant for entities: confidence only increases, never decreases. When an entity appears in a later extraction with higher confidence, the score upgrades via max(existing, new); a weak observation never overwrites a strong one. This is what protects established facts — a person’s employer, a company’s stock ticker — from being silently degraded by a transient ambiguous mention.
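
A minimal sketch of the invariant; the entity fields here are illustrative:

```python
def update_entity(entity, observation):
    """Monotonic confidence: score upgrades via max(existing, new).
    Attribute fields are only overwritten when the observation is at
    least as confident, so a weak transient mention never degrades an
    established record."""
    new_conf = max(entity["confidence"], observation["confidence"])
    if observation["confidence"] >= entity["confidence"]:
        merged = {**entity, **{k: v for k, v in observation.items()
                               if k != "confidence"}}
    else:
        merged = dict(entity)
    merged["confidence"] = new_conf
    return merged
```

A low-confidence mention with a garbled ticker leaves the established record untouched, while a stronger observation upgrades both the fields and the score.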

05

Multi-Source Verification

We do not trust any single source. A claim is only promoted to “verified” status when it meets specific thresholds across independent ownership groups. The minimum confidence for a verified claim is 0.85. Below 0.60, a claim is considered ungrounded and filtered from the UI entirely.

Example: a claim at conf 0.91 reported by AP, Reuters, NYT, BBC, WSJ, and Fox News. WSJ and Fox News share an ownership group, so they collapse to one independent source.
Verification tiers, highest to lowest: human fact-checks → resolved references → direct citations → cross-source triangulation.

Verification follows a four-tier hierarchy, from highest to lowest authority. The system attempts each tier in order and stops at the first successful verification:

Tier 1: Human fact-checks

Google Fact Check API surfaces existing fact-checks from organizations like PolitiFact, Snopes, and Full Fact. If a claim has already been human-verified, we use that verdict directly.

Tier 2: Resolved references

Phrases like "a Harvard study" or "a recent FDA ruling" are resolved to actual documents via Semantic Scholar, Congress.gov, SEC EDGAR, and other APIs. If the referenced source exists, the claim is grounded.

Tier 3: Direct citations

DOIs, bill numbers, case citations, and filing identifiers are extracted and validated against authoritative databases. Five specialized verifiers handle research papers, legislation, SEC filings, court documents, and economic data.

Tier 4: Cross-source triangulation

Claims are scored on independent source count, graph-path plausibility, and temporal consistency. Three or more truly independent sources trigger automatic confidence boost. Ownership grouping prevents corporate siblings from inflating the count.
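
The attempt-in-order control flow is simple to sketch; the tier functions here are hypothetical stand-ins for the real verifiers:

```python
def verify_claim(claim, tiers):
    """Attempt each verification tier in authority order and stop at
    the first success. `tiers` is an ordered list of (name, fn) pairs
    where fn(claim) returns a verdict dict, or None on no result."""
    for name, fn in tiers:
        verdict = fn(claim)
        if verdict is not None:
            return {"tier": name, **verdict}
    return {"tier": None, "status": "unverified"}
```

A claim that no existing fact-check covers falls through to reference resolution, and so on down the hierarchy.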

At the extraction layer, every claim is also verified against its original article text. The grounding verifier checks whether supporting text exists, whether quoted statements are actually present, and whether numeric values match. Claims scoring below a grounding threshold of 0.6 are flagged as potential hallucinations; below 0.3, they are discarded.
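
A toy version of this check: score a claim by how many of its quoted strings and numeric values literally appear in the article, then apply the 0.3/0.6 thresholds from the text. The production verifier is richer (entity matching, fuzzy quote handling); this only shows the shape:

```python
import re

def grounding_score(claim, article):
    """Fraction of the claim's quoted strings and numeric values that
    literally appear in the source article text."""
    quotes = re.findall(r'"([^"]+)"', claim)
    numbers = re.findall(r'\d[\d,.%]*', claim)
    probes = quotes + numbers
    if not probes:
        return 1.0  # nothing checkable: trivially grounded
    hits = sum(1 for p in probes if p in article)
    return hits / len(probes)

def grounding_action(score):
    """Thresholds from the text: < 0.3 discard, < 0.6 flag as a
    potential hallucination, otherwise pass."""
    if score < 0.3:
        return "discard"
    if score < 0.6:
        return "flag"
    return "pass"
```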

06

Bias as Data

Most news products try to hide bias. We treat it as structured data. Every source in our system carries a political lean classification on a seven-point scale. Every claim tracks which perspectives reported it. This is not editorial judgment—it is a data model.

7-point political lean scale: Far Left / Left / Center-Left / Center / Center-Right / Right / Far Right, with a per-story source distribution across the scale.
EventNarrative decomposition: left_only, disputed (left_claim vs. right_claim), confirmed, right_only, plus a spectrum_coverage map per story.

Contradiction detection works via semantic similarity combined with sentiment analysis. When two claims have high semantic overlap but opposite sentiment polarity, they are flagged as a ContradictionPair and surfaced in the feed with both positions attributed.
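
Given a similarity score and two sentiment polarities in [-1, 1], the flagging rule reduces to a single predicate. A sketch; the 0.8 similarity threshold is illustrative, not a production value:

```python
def is_contradiction_pair(similarity, sentiment_a, sentiment_b,
                          sim_threshold=0.8):
    """Flag two claims as a ContradictionPair when they have high
    semantic overlap but opposite sentiment polarity."""
    opposite_polarity = sentiment_a * sentiment_b < 0
    return similarity >= sim_threshold and opposite_polarity
```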

Every synthesized story is decomposed into an EventNarrative structure: confirmed facts (all perspectives agree), disputed claims (left vs. right with source attribution), coverage unique to one side of the spectrum, and a spectrum coverage map showing which parts of the political landscape have reported on the event. You see this directly in the feed. We don’t pick a winner. We show you the data.

07

Quality Gates

LLMs want to please the user. Left unchecked, they smooth over uncertainty, add plausible-sounding filler, and produce confident text about things they don’t actually know. To prevent that, every story passes through two evaluation layers: one at the claim level (does this single fact hold up?) and one at the article level (does this whole story belong here?). A draft can be killed at either layer.

DRAFT → GROUNDING (rule-based; score < 0.6 flagged or discarded) → SAGA DRIFT (stronger model; drift rejected) → EDITORIAL (tiers 1–5; tiers 4–5 filtered from feed) → PUBLISH. Three independent models.

Claim-level evaluation

Each extracted claim is scored on the four axes shown in Section 04 (specificity, attribution, falsifiability, newsworthiness), producing a composite quality score. The display thresholds are: below 0.6 the claim is hidden from your feed; between 0.6 and 0.7 a low-confidence indicator is shown; between 0.7 and 0.85 it is marked as inferred; at 0.85 or above it is presented as verified.
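
Those thresholds map directly to a presentation tier:

```python
def display_tier(composite):
    """Map a composite claim score to its feed presentation, using
    the display thresholds stated above."""
    if composite < 0.6:
        return "hidden"
    if composite < 0.7:
        return "low_confidence"
    if composite < 0.85:
        return "inferred"
    return "verified"
```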

Claims that land in the borderline band (0.3–0.7) escalate to a dedicated LLM judge that returns a three-way verdict: good_claim, weak_claim, or not_a_claim. Each verdict is stored with reasoning, building a labeled corpus that's used to recalibrate the rule-based scorer over time.

A separate grounding check runs alongside the quality scorer. The grounding verifier compares each claim against its source article text, looking for unmatched quotes, unmatched named entities, and unmatched numeric values. Claims that fail to ground are flagged or discarded depending on how badly they miss. This is the “the LLM made it up” check, and it runs on every claim, every time.

Article-level gates

Saga drift. When a new article is being attached to an ongoing saga (an active war, a multi-year trial, a regulatory investigation), a separate, stronger reasoning model compares its framing against the saga’s established trajectory. If the new article introduces a contradiction without a corrective signal — reframing “indictment” as “exoneration” without citing new evidence, say — it is rejected from the saga. Roughly one in five candidate articles is rejected here.

Editorial tier. Every synthesized article is classified Tier 1 (essential, major news everyone should know) through Tier 5 (noise — historical rehash, opinion, blog posts, listicles, press releases). Tiers 4 and 5 are filtered from the main feed. The tier and a one-sentence reasoning are stored on the article, so every editorial decision is auditable.

Why two layers, not one

grounding is rule-based with deterministic penalties.

saga_drift uses a stronger reasoning model.

editorial_tier uses a smaller, cheaper one.

No single model can reliably self-validate — an LLM that hallucinates a claim will also hallucinate the verification of it. Different models, different prompts, different failure modes. A draft that passes both layers was checked from independent angles by systems that share no parameters.

08

What You See

The feed you read is the output of all of this. Every story includes:

  • Source count and confidence—how many independent outlets confirm this, and how confident the system is
  • Contradictions—when sources disagree, you see both positions with full attribution
  • Primary sources—direct links to official documents, filings, transcripts, and studies
  • Political perspectives—what different parts of the spectrum are reporting, decomposed via the EventNarrative structure
  • Story continuity—ongoing stories are linked into multi-event sagas with chronological timelines

We show our work. If you want to know why the system believes something, the evidence chain is there.

09

Limitations

We want you to trust this tool, which means we need to tell you exactly when you shouldn’t.

Nuance loss

Summarization is lossy. In complex legal or scientific stories, the system may simplify crucial details. We link to full sources so you can go deeper.

Source contamination

If a falsehood spreads to all major outlets simultaneously, our consensus model will validate it. Multi-source verification only works when sources are actually independent.

Latency

Verification takes time. A story moves through ingest, dedup, claim extraction, grounding, multi-source verification, synthesis, and three quality gates before it reaches the feed. End-to-end latency from primary report to publish is typically tens of minutes. For breaking news where social media is real-time, we are slower by design. Speed is not the optimization target — verifiability is.

Confidence calibration

Our confidence scores are calibrated but not perfect. A 0.9 confidence claim is wrong more often than 10% of the time in some domains. Nightly staged-replay QA catches regressions, but calibration is an ongoing effort.

LLM limitations

The extraction and synthesis layers use large language models. They can misinterpret sarcasm, miss cultural context, or struggle with highly technical content. The adversarial critic catches many of these, but not all.

Correction latency

When users submit corrections, the system learns from them for future extractions. But corrections to already-published stories propagate on a delay — typically within the next extraction cycle.

This is an ongoing engineering effort. Every threshold cited on this page lives in production code; every diagram corresponds to a real call site. The pipeline improves continuously as we add sources, refine extraction, retire bad heuristics, and learn from user corrections. If something on this page no longer matches the system, that’s a bug, not a feature—and we want to know about it.