
This site is an independent technical reference. It is not affiliated with or endorsed by Recorded Future, Mandiant, Google Cloud, CrowdStrike, Microsoft, Anomali, ThreatConnect, EclecticIQ, Intel 471, Flashpoint, Palo Alto Networks, Unit 42, Cisco, Fortinet, SentinelOne, IBM, Dropzone AI, Prophet Security, Torq, Cyware, Radiant Security, Tenable, Qualys, Rapid7, DomainTools, SOCRadar, or any other vendor, project, or framework named on this site. MISP, OpenCTI, TheHive, and YARA are trademarks of their respective maintainers. All other trademarks belong to their respective owners. Pricing, feature, and platform-capability information was verified in April 2026 and may have changed since publication.

Some outbound links on this site may be affiliate links. Affiliate relationships do not influence ranking, verdicts, pricing data, or editorial positions. Where a verdict or comparison could be paid-placement-adjacent, we mark it explicitly; otherwise assume zero vendor input.


AI IoC enrichment in 2026: what actually works

LLM-driven IoC enrichment pipelines, false-positive rates, commercial tool verdicts, and a reference OSS architecture.

Last verified: April 2026

What IoC enrichment is

An IoC enrichment pipeline takes a raw indicator (IP, domain, file hash, URL, email address) and attaches context that makes it actionable. Pre-AI enrichment pipelines used rule-based lookups: submit the IP to VirusTotal, AbuseIPDB, and MISP; check if any feed already knows it as malicious; apply a verdict based on vote count. The analyst received a scored verdict but no narrative context.

AI enrichment adds four capabilities that rule-based lookups cannot provide. First, narrative context: the LLM can surface that this IP was used in a 2024 SWIFT-targeting campaign by a Lazarus Group affiliate, pulling from threat reports that mention the IP in prose. Second, cross-feed deduplication: when VirusTotal says malicious, AbuseIPDB says clean, and MISP has conflicting tags, the LLM can reconcile the signals and produce a weighted verdict with explicit uncertainty. Third, MITRE ATT&CK mapping: the LLM can map the enriched indicator to TTPs and techniques without manual analyst effort. Fourth, confidence-scored output: a well-prompted enrichment agent can output structured STIX with an explicit confidence level and the data sources behind it.
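The cross-feed reconciliation step can be sketched as a weighted vote. The feed weights, thresholds, and High/Medium/Low cut-offs below are illustrative assumptions, not values any vendor publishes:

```python
# Sketch: reconcile conflicting feed verdicts into one weighted verdict.
# Weights and thresholds are illustrative assumptions.

FEED_WEIGHTS = {"virustotal": 0.5, "abuseipdb": 0.3, "misp": 0.2}

def reconcile(verdicts: dict[str, str]) -> dict:
    """verdicts maps feed name -> 'malicious' | 'clean' | 'unknown'."""
    score = 0.0        # weighted 'malicious' votes
    coverage = 0.0     # total weight of feeds that returned a verdict
    for feed, verdict in verdicts.items():
        if verdict == "unknown":
            continue
        weight = FEED_WEIGHTS.get(feed, 0.1)
        coverage += weight
        if verdict == "malicious":
            score += weight
    if coverage == 0.0:
        return {"verdict": "unknown", "confidence": "Low"}
    ratio = score / coverage
    verdict = "malicious" if ratio >= 0.5 else "clean"
    if ratio in (0.0, 1.0) and coverage >= 0.7:
        confidence = "High"      # feeds agree, good coverage
    elif coverage >= 0.5:
        confidence = "Medium"    # disagreement, or thinner coverage
    else:
        confidence = "Low"      # explicit low-data acknowledgement
    return {"verdict": verdict, "confidence": confidence,
            "coverage": round(coverage, 2)}
```

The point is not the exact weights but that disagreement and thin coverage degrade confidence explicitly rather than silently.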

The Traffic Light Protocol (TLP) governs how enriched intelligence can be shared. Most AI enrichment tools in 2026 produce TLP:AMBER output by default, requiring human review before the enriched indicator is promoted to TLP:CLEAR (formerly TLP:WHITE) for broader sharing. This default is correct. Do not deploy enrichment pipelines that auto-share attribution above TLP:GREEN without human review.
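The sharing rule can be expressed as a one-step promotion gate. This sketch uses TLP 2.0 labels (TLP:CLEAR replaced TLP:WHITE); the ladder and function are illustrative, not part of any tool:

```python
# Sketch: TLP promotion gate. Output starts at TLP:AMBER; promotion to
# TLP:CLEAR (broad sharing) requires explicit human review.
def next_tlp(current: str, human_reviewed: bool) -> str:
    ladder = ["TLP:AMBER", "TLP:GREEN", "TLP:CLEAR"]
    i = ladder.index(current)
    if i + 1 >= len(ladder):
        return current                  # already at the top
    target = ladder[i + 1]
    if target == "TLP:CLEAR" and not human_reviewed:
        return current                  # never auto-share above GREEN
    return target
```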

The commercial enrichment stack, 2026

Anomali Lens

Browser extension that provides contextual IoC enrichment inline on any web page. Highlights known IoCs in threat reports, news articles, and security blogs. Good for analyst workflow; limited for automated pipeline enrichment. Pricing: included in Anomali ThreatStream subscription.

Best for: analyst-side contextual lookup. Not for: automated pipeline enrichment at scale.

Recorded Future Pathfinder

Grounded on the Intelligence Cloud's private data. Strong on actor attribution from Insikt Group research. Produces analyst briefs from IoC clusters. Caveat: stated attribution confidence can exceed what the underlying data supports on thin-data indicators. Pricing: included in Core/Professional/Elite tiers.

Best for: teams already on RF, actor-attribution enrichment. Not for: hallucination-free YARA from IoC sets.

Mandiant Gemini in Threat Intelligence

Grounded on Mandiant's private DFIR research corpus and Frontline Intelligence feeds. Strongest on APT attribution where Mandiant has IR engagement history. Natural-language query against the knowledge graph is genuinely useful. Pricing: included in Mandiant Advantage Threat Intelligence module.

Best for: APT attribution, DFIR-adjacent enrichment. Not for: criminal-underground IoC enrichment (Mandiant's depth there is weaker than Intel 471).

Microsoft Security Copilot enrichment plugins

Enrichment plugins connect to Defender Threat Intelligence, VirusTotal, and external sources. NL query over the Microsoft Defender TI graph. Well-integrated for Microsoft-centric shops with E5 licensing. Pricing: Security Copilot ~$4/SCU/hr, requires E5 or standalone licence.

Best for: Microsoft-centric shops with E5. Not for: cross-vendor enrichment without Defender TI depth.

VirusTotal Code Insight

Google AI-powered analysis of file samples, scripts, and documents. Explains what a malicious file does in plain language. Strong for malware-focused enrichment of file and hash indicators. Pricing: included in VirusTotal Enterprise.

Best for: malware sample analysis, hash enrichment. Not for: network IoC (IP/domain) enrichment at depth.

CrowdStrike Charlotte AI

Enrichment integrated into the Falcon workflow. Strong on endpoint-telemetry-grounded enrichment (indicator correlation against Falcon sensor data). Significantly weaker on indicators without Falcon coverage. Pricing: included in Falcon Enterprise and Premium.

Best for: Falcon EDR customers enriching endpoint-sourced IoCs. Not for: non-Falcon teams or dark-web-sourced indicators.

False-positive reality

THE HARD TRUTH

LLMs hallucinate attribution. They will confidently tie an IP to a named threat actor based on pattern-matching a single forum post or a passing mention in a threat report. The confidence score is not calibrated to data quality. No commercial vendor publishes their false-positive attribution rate. This is not an oversight.

Research context: TAM-Eval (2024) tested LLM threat-actor attribution accuracy on a standardised IoC set and found accuracy rates of 60-75% on well-documented actors (where training data was rich), dropping to 30-45% on less-documented actors. MuTAP research on LLM-generated malware indicators documented similar hallucination rates on specific byte patterns. Mandiant's own false-positive disclosures in their Gemini-in-TI documentation warn explicitly that attribution outputs require human verification.

Mandatory mitigation patterns for production enrichment pipelines:

  1. Require cited sources: every enrichment output must reference the source data that grounds the attribution claim. If the LLM cannot cite a source, the attribution is rejected.
  2. Require a confidence score with an explicit low-data acknowledgement: the enrichment output must include a confidence level (High/Medium/Low) and, for Low, an explicit statement that the evidence base is insufficient for reliable attribution.
  3. Never auto-block on an enrichment-only signal: enrichment output is advisory, not actionable, until confirmed by a second source or human review. Auto-block workflows must have a second-source gate.
  4. Human-in-the-loop on attribution at Medium confidence and above: Medium and High attribution claims go to an analyst review queue before the enriched indicator propagates to block lists, SIEM rules, or sharing feeds.
  5. Feedback loop: track enrichment outputs against human-analyst verdicts. Build a false-positive rate dashboard per enrichment tool. Set an SLA for improving tools that fall below 70% accuracy on attribution.
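Patterns 1-4 reduce to a small gating function. The field names (`attribution`, `sources`, `confidence`, `low_data_ack`) are an assumed schema for illustration, not any vendor's output format:

```python
# Sketch of the gating logic in mitigation patterns 1-4.
# Field names are an assumed schema, not a vendor format.

REVIEW_QUEUE: list[dict] = []

def gate(enrichment: dict) -> str:
    """Return the disposition for one enrichment output:
    'rejected', 'review', or 'advisory'."""
    # Pattern 1: attribution without a cited source is rejected outright.
    if enrichment.get("attribution") and not enrichment.get("sources"):
        return "rejected"
    # Pattern 2: Low confidence must carry an explicit low-data flag.
    if enrichment.get("confidence") == "Low" and not enrichment.get("low_data_ack"):
        return "rejected"
    # Pattern 4: Medium+ attribution goes to human review before it
    # propagates to block lists, SIEM rules, or sharing feeds.
    if enrichment.get("attribution") and enrichment.get("confidence") in ("Medium", "High"):
        REVIEW_QUEUE.append(enrichment)
        return "review"
    # Pattern 3: everything else stays advisory; never auto-block on
    # an enrichment-only signal.
    return "advisory"
```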

Reference architecture: OSS enrichment pipeline

The MISP + Cortex + LLM orchestrator pattern, verified working as of April 2026:

# OSS enrichment pipeline (April 2026)

1. MISP  <-- pulls from:
   - CIRCL OSINT feed (daily)
   - abuse.ch (URLhaus, MalwareBazaar, ThreatFox) (hourly)
   - AlienVault OTX (daily)
   - CISA KEV watchlist (daily)

2. MISP emits event --> Cortex (via webhook)

3. Cortex fires analysers per new indicator:
   - VirusTotal (hash, IP, domain, URL)
   - AbuseIPDB (IP)
   - URLscan.io (URL, domain)
   - Shodan (IP)
   - GreyNoise (IP)
   - crt.sh (domain - cert transparency)
   - BGPview (IP - ASN context)
   - WhoisXML (domain - WHOIS history)

4. LLM agent (Claude Sonnet / GPT-5 via MCP):
   - reads Cortex output (JSON)
   - synthesises enrichment note
   - maps to MITRE ATT&CK TTPs
   - assigns confidence (H/M/L)
   - writes STIX-formatted note back to MISP

5. Human analyst reviews in MISP dashboard
   - Approve: indicator promotes to block list feed
   - Reject: logged with reason for feedback loop

Infrastructure: Cortex requires approximately 4 vCPU / 8GB RAM per 100 concurrent analyser threads. MISP single-instance on 8 vCPU / 32GB RAM handles moderate volume. LLM API cost at typical SOC enrichment volumes (500-2,000 indicators per day): $500-$1,500 per month using Claude Sonnet 4.5 or equivalent. Full stack on Hetzner: approximately $400-$800 per month. See open-source tools for the complete stack walkthrough.
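The write-back in step 4 can be sketched as follows: build a minimal STIX 2.1 Indicator by hand from the synthesised verdict. The helper and its arguments are illustrative, and the actual MISP write-back (e.g. via PyMISP) is omitted:

```python
# Sketch: turn a synthesised verdict into a minimal STIX 2.1 Indicator.
# Helper name and arguments are illustrative; MISP write-back omitted.
import json
import uuid
from datetime import datetime, timezone

def to_stix_indicator(ioc: str, ioc_type: str,
                      confidence_pct: int, note: str) -> dict:
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")
    # STIX 2.1 pattern syntax for a single comparison expression.
    pattern = f"[{ioc_type}:value = '{ioc}']"
    return {
        "type": "indicator",
        "spec_version": "2.1",
        "id": f"indicator--{uuid.uuid4()}",
        "created": now,
        "modified": now,
        "valid_from": now,
        "pattern": pattern,
        "pattern_type": "stix",
        "confidence": confidence_pct,   # STIX confidence is 0-100
        "description": note,
    }

stix = to_stix_indicator("203.0.113.7", "ipv4-addr", 65,
                         "Cross-feed verdict: malicious (Medium confidence)")
print(json.dumps(stix, indent=2))
```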

Reference architecture: hybrid enrichment

For teams that want commercial data depth without Recorded Future Elite pricing, the hybrid pattern works well: Recorded Future Core feeds for commercial data depth, OpenCTI as the knowledge graph (STIX-native, modern UX), an LLM orchestrator for enrichment synthesis, and TheHive for case management.

Cost comparison at typical mid-market scale (8 analysts, 1,000 indicators/day):

Architecture                                Year 1 cost estimate   Data depth
Pure OSS (MISP + OpenCTI + Cortex + LLM)    ~$12k-$25k / yr        Open feeds + Cortex analysers; no commercial research
Hybrid (RF Core + OpenCTI + LLM)            ~$80k-$140k / yr       Commercial feed depth + Insikt research + OSS graph layer
Full commercial (RF Professional)           ~$120k-$250k / yr      Full commercial stack, full Pathfinder, brand + identity intel

Use the ROI calculator to model your specific team size and indicator volume against these architectures.

What to benchmark when evaluating

Enrichment latency

Target: Under 30 seconds

For alert-triage workflows. Background enrichment can be longer.

Attribution false-positive rate

Target: Under 15%

Test with a known IoC set. Ask the vendor for their internal FP rate. If they refuse, treat output sceptically.
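The feedback-loop measurement behind this metric can be sketched with a per-tool tally; the record format is an assumption:

```python
# Sketch: per-tool attribution false-positive rate from the analyst
# feedback loop. The record format is an assumed schema.
from collections import defaultdict

def fp_rates(records: list[dict]) -> dict[str, float]:
    """records: [{'tool': ..., 'attribution_claimed': bool, 'verified': bool}]
    FP rate = share of claimed attributions that failed human verification."""
    claimed = defaultdict(int)
    failed = defaultdict(int)
    for r in records:
        if r["attribution_claimed"]:
            claimed[r["tool"]] += 1
            if not r["verified"]:
                failed[r["tool"]] += 1
    return {tool: failed[tool] / claimed[tool] for tool in claimed}
```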

Citation presence

Target: 100% of outputs

Every attribution claim must cite a source. No source = no attribution.

Human review hook

Target: Mandatory for Medium+

Tool must support a review queue before enriched output propagates.

STIX / TAXII compliance

Target: STIX 2.1 minimum

Required for interoperability with MISP, OpenCTI, and sharing feeds.

SIEM integration depth

Target: Native connector

API polling is acceptable; real-time push via connector is better.

FAQ

What is IoC enrichment?

IoC (Indicator of Compromise) enrichment is the process of taking a raw indicator (an IP address, domain, file hash, URL, or email) and adding threat context: known-malicious classification, threat actor attribution, related campaigns, MITRE ATT&CK technique mapping, confidence score, and TLP classification. Pre-AI enrichment used rule-based lookups against VirusTotal, AbuseIPDB, MISP, and ThreatFox. AI enrichment layers natural-language summarisation, cross-feed deduplication, narrative context from historical actor data, and judgement calls on conflicting signals from multiple feeds.

What is the false-positive rate for LLM IoC enrichment?

No vendor publishes false-positive attribution rates for LLM IoC enrichment. Independent research (TAM-Eval, adjacent MuTAP research) and Mandiant's own false-positive disclosures indicate that LLMs confidently attribute IoCs to named threat actors based on pattern-matching a single corroborating source, producing attributions that do not hold up to human analyst verification. The safe pattern: require cited sources in every enrichment output, require an explicit low-data confidence flag when the evidence base is thin, and never auto-block on enrichment-only signal. Human-in-the-loop for attribution above Medium confidence is a hard requirement.

Which commercial tools do AI IoC enrichment best?

In April 2026, the most capable commercial AI enrichment layers are: Recorded Future Pathfinder (grounded on the Intelligence Cloud's private data), Mandiant Gemini-in-TI (grounded on Mandiant's private research corpus), Microsoft Security Copilot enrichment plugins (grounded on Defender TI graph), and Anomali Lens (browser-extension based contextual enrichment). VirusTotal Code Insight (Google AI-powered analysis of files and scripts) is strong for malware-focused enrichment. The key differentiator across all of them: what private data is the LLM grounded on? A generic LLM with no grounding adds no value over asking ChatGPT.

Can I build an OSS IoC enrichment pipeline?

Yes, and it works well for most mid-market enrichment needs. The reference architecture: MISP pulls feeds (CIRCL, abuse.ch, OTX, CISA KEV). Cortex fires analysers per indicator (VirusTotal, AbuseIPDB, Shodan, URLscan.io, GreyNoise, crt.sh, BGPview). A Claude or GPT-5 agent via MCP reads the Cortex output and synthesises a STIX-formatted enrichment note back to MISP or OpenCTI, with a confidence flag. A human analyst reviews on the dashboard before any indicator promotes to a block list. Infrastructure cost: $400-$800 per month on Hetzner. LLM API cost: $500-$1,500 per month at typical SOC enrichment volumes. Full walkthrough at the open-source tools page.

What enrichment metrics should I benchmark?

Six metrics matter for evaluating an AI enrichment tool: enrichment latency (seconds from indicator ingestion to enriched output, target under 30 seconds for alert-triage workflows), false-positive attribution rate (percentage of attribution claims that fail human analyst verification, target under 15%), citation presence (does every enrichment output include a citable source?), human-review hook availability (can the system flag low-confidence outputs for review before they propagate?), STIX and TAXII compliance (critical for interoperability with MISP and OpenCTI), and integration depth with existing SIEM. Vendors that refuse to share false-positive attribution rates should be treated sceptically.

Updated 2026-04-27