DisclosureFeed

Sources & Methodology

DisclosureFeed is a real-time, machine-readable intelligence feed of formally filed cybersecurity and data-breach disclosures drawn from public regulatory sources worldwide.

Source list

We ingest only primary regulatory sources plus declared press supplementation. Current Tier-1 sources (V1):

Extraction pipeline

Each source document passes through:

  1. Sanitization — Unicode normalization, dangerous-HTML stripping, bidi override removal, prompt-injection guard.
  2. Triage — Claude Haiku 4.5 classifier decides if the document is a breach disclosure and selects an extraction template.
  3. Structured extraction — Claude Sonnet 4.6 with Instructor + Pydantic schema validation produces a BreachDisclosure v1 object with per-field source-span citations and per-field confidence scores.
  4. Hard-pass review — Claude Opus 4.7 with extended thinking re-extracts any document where Pass-2 overall confidence is < 0.80 or any single field is < 0.70.
  5. Entity resolution — name + LEI/CIK lookup via the GLEIF and SEC EDGAR registries.
  6. Cross-jurisdiction dedup — same incident filed across multiple jurisdictions (SEC + state AG + OCR) is grouped under a canonical incident id.
  7. PII redaction — natural-person names appearing in incident narratives are replaced with type-tags before customer-visible fields are emitted.
  8. Human review queue — any record below the confidence threshold is reviewed by a DisclosureFeed operator.
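Two of the steps above are specific enough to sketch in code: the bidi removal in step 1 and the escalation rule in step 4. The following is a minimal illustration, not the production implementation. It uses only the Python standard library rather than Instructor or Pydantic. The function names, the choice of NFC as the normalization form, and the decision to strip all bidi embedding and isolate controls (not only the two override characters) are assumptions made for the example. Only the 0.80 and 0.70 thresholds come from the pipeline description.

```python
import unicodedata

# Unicode bidirectional control characters.
# Assumption: we drop embeddings and isolates as well as the overrides.
BIDI_CONTROLS = {
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",  # LRE, RLE, PDF, LRO, RLO
    "\u2066", "\u2067", "\u2068", "\u2069",            # LRI, RLI, FSI, PDI
}

OVERALL_THRESHOLD = 0.80  # overall confidence floor stated in step 4
FIELD_THRESHOLD = 0.70    # per-field confidence floor stated in step 4


def strip_bidi_and_normalize(text: str) -> str:
    """Normalize to NFC (assumed form) and remove bidi control characters."""
    normalized = unicodedata.normalize("NFC", text)
    return "".join(ch for ch in normalized if ch not in BIDI_CONTROLS)


def needs_hard_pass(overall: float, field_confidence: dict[str, float]) -> bool:
    """Return True when the step 4 re-extraction rule applies."""
    if overall < OVERALL_THRESHOLD:
        return True
    # A single weak field is enough to trigger re-extraction.
    return any(score < FIELD_THRESHOLD for score in field_confidence.values())
```

Both comparisons are strict, so a document scoring exactly 0.80 overall with every field at exactly 0.70 would not be escalated under this reading of the rule.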

AI-assisted output disclosure (EU AI Act Art. 50)

Every record carries ai_assisted: true. Every API response envelope carries meta.ai_assisted: true. Every dashboard view shows the disclosure both inline and in the footer. The transparency obligations of EU AI Act Article 50 apply from August 2, 2026.
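As a rough sketch of how the flag could appear at both levels of an API response, the helper below stamps each record and the envelope. The function name wrap_response and the "data" key are hypothetical and chosen only for illustration. Only ai_assisted on the record and meta.ai_assisted on the envelope are taken from the description above.

```python
from typing import Any


def wrap_response(records: list[dict[str, Any]]) -> dict[str, Any]:
    """Build a response envelope with the AI-assistance flag at both levels."""
    # Copy each record so the caller's dictionaries are not mutated.
    flagged = [{**record, "ai_assisted": True} for record in records]
    # "data" is a placeholder key; the real envelope layout is not specified here.
    return {"meta": {"ai_assisted": True}, "data": flagged}
```

Setting the flag in one place at serialization time, rather than at each call site, makes it harder for a response to leave without the disclosure.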

Accuracy SLOs

Corrections

Email corrections@disclosurefeed.com. The SLA is 48 hours from receipt.

Provenance

Every record carries: