
This site is an independent technical reference. It is not affiliated with or endorsed by Recorded Future, Mandiant, Google Cloud, CrowdStrike, Microsoft, Anomali, ThreatConnect, EclecticIQ, Intel 471, Flashpoint, Palo Alto Networks, Unit 42, Cisco, Fortinet, SentinelOne, IBM, Dropzone AI, Prophet Security, Torq, Cyware, Radiant Security, Tenable, Qualys, Rapid7, DomainTools, SOCRadar, or any other vendor, project, or framework named on this site. MISP, OpenCTI, TheHive, and YARA are trademarks of their respective maintainers. All other trademarks belong to their respective owners. Pricing, feature, and platform-capability information was verified in April 2026 and may have changed since publication.

Some outbound links on this site may be affiliate links. Affiliate relationships do not influence ranking, verdicts, pricing data, or editorial positions. Where a verdict or comparison could be paid-placement-adjacent we mark it explicitly; otherwise assume zero vendor input.

REFERENCE / OPEN SOURCE

The full open-source threat-intel stack, April 2026

MISP, OpenCTI, TheHive, Cortex, YARA, Sigma, and an LLM orchestrator as a complete zero-vendor CTI stack. Real architecture, verified April 2026.

Last verified: April 2026 | All tools actively maintained as of April 2026

Why OSS, in 2026

Three legitimate reasons to run an OSS CTI stack in 2026: cost (a capable three-person SOC can run a full agentic CTI workflow on OSS for roughly $300-$500 per month in infrastructure, plus LLM API spend), transparency (no vendor controls your data pipeline or changes your feed terms unilaterally), and integration flexibility (OpenCTI's GraphQL API and MISP's TAXII interface integrate with anything, while commercial platforms often wall integrations behind proprietary APIs).

Where OSS genuinely struggles compared to commercial: data depth. MISP plus abuse.ch feeds give you operational IoC data. They do not give you Insikt Group research, Intel 471 criminal-underground coverage, or Mandiant's IR-pedigree actor profiles. The data-depth gap is real and matters for advanced threat intelligence programs. For threat hunting against named APTs in specific sectors, commercial feeds are not optional.

Community-sourced threat data via sharing communities (ISACs, sector-specific MISP groups, CIRCL OSINT) narrows the gap for organisations that qualify for membership and actively participate. ISAC participation is free for sector members and provides threat intelligence that approaches commercial feed quality in specific sectors.

The core stack

MISP

Malware Information Sharing Platform

~6.5k stars (Apr 2026)

CIRCL Luxembourg + community

Primary IoC-sharing platform. STIX and TAXII native, large active community, 30k+ organisations worldwide. Best for: community IoC sharing, ISAC participation, feed aggregation. The de facto standard for operational threat intelligence sharing.
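MISP ingests and serves events as JSON over its REST API. A minimal stdlib-only sketch of building an event payload for submission; the field names follow MISP's documented event schema, but the event title and IoC values are fabricated for illustration, and the target URL/API key are placeholders:

```python
import json

def build_misp_event(info: str, iocs: list) -> dict:
    """Build a MISP event payload with one Attribute per (type, value) IoC.
    Attribute types such as "ip-dst" and "domain" follow MISP's schema."""
    return {
        "Event": {
            "info": info,            # human-readable event title
            "distribution": 0,       # 0 = your organisation only
            "threat_level_id": 3,    # 3 = low, 1 = high
            "analysis": 0,           # 0 = initial
            "Attribute": [
                {"type": t, "value": v, "to_ids": True} for t, v in iocs
            ],
        }
    }

event = build_misp_event(
    "Phishing infrastructure 2026-04",
    [("ip-dst", "203.0.113.7"), ("domain", "login-example.invalid")],
)
print(json.dumps(event, indent=2))
# In production, POST this to https://<misp-host>/events
# with an "Authorization: <API key>" header.
```

PyMISP wraps exactly this payload shape; the raw-dict version is shown only to make the schema visible.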

OpenCTI

Open Cyber Threat Intelligence Platform

~6k stars (Apr 2026)

Filigran (commercial with Community Edition)

STIX2-native knowledge-graph platform. GraphQL API, modern UX, strong connector ecosystem. Best for: threat actor / campaign / TTP knowledge graph, analyst-facing intelligence store. Community Edition is free; Enterprise tier adds performance and support for larger deployments.
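Analyst queries against OpenCTI go to a single /graphql endpoint as a JSON body. A sketch of building that request; the indicator field names below are illustrative and should be verified against your OpenCTI version's schema in the GraphQL playground:

```python
import json

# Illustrative query; confirm field names against your OpenCTI schema.
QUERY = """
query RecentIndicators($first: Int) {
  indicators(first: $first) {
    edges {
      node {
        pattern
        valid_from
      }
    }
  }
}
"""

def graphql_request(query: str, variables: dict) -> dict:
    """Build the JSON body for a POST to /graphql.
    Auth is a bearer token in the Authorization header."""
    return {"query": query, "variables": variables}

body = graphql_request(QUERY, {"first": 25})
print(json.dumps(body)[:60])
```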

TheHive

TheHive Project

~3.5k stars (Apr 2026)

StrangeBee (commercial with Community Edition)

Incident response case management. Tight MISP integration. Supports multi-tenancy. Best for: MSSP case management, incident tracking, analyst workflow coordination. The Community Edition is production-ready for small to mid-size teams.

Cortex

Cortex Observable Analysis Engine

~1.2k stars (Apr 2026)

StrangeBee / TheHive Project

Analyser and responder orchestration engine. 200+ community analysers for IoC enrichment (VirusTotal, AbuseIPDB, Shodan, URLscan.io, GreyNoise, and many more). Pairs tightly with TheHive. Best for: automated IoC enrichment pipeline, active-response automation.
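A Cortex job is started by POSTing an observable to a specific analyser. A sketch of building the job body; the payload fields follow Cortex's documented REST API, but the analyser IDs and IP below are illustrative, so verify endpoint paths against your Cortex version:

```python
import json

def cortex_job_payload(data: str, data_type: str, tlp: int = 2) -> dict:
    """Job body for POST /api/analyzer/<analyzerId>/run on a Cortex server.
    tlp: 0=WHITE, 1=GREEN, 2=AMBER, 3=RED."""
    return {"data": data, "dataType": data_type, "tlp": tlp}

# Enrich one IP across several analysers (analyser names are illustrative).
for analyzer in ["AbuseIPDB_1_0", "Shodan_Host_1_0"]:
    payload = cortex_job_payload("198.51.100.12", "ip")
    print(analyzer, json.dumps(payload))
```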

YARA

YARA Pattern Matching Tool

~8k stars (Apr 2026)

VirusTotal / Google (original), community

Pattern-matching for file contents. Malware identification from strings, byte patterns, PE structure. Used across commercial (VirusTotal, Recorded Future) and OSS stacks. Best for: malware hunting, file-based IoC matching, endpoint scanning.

Sigma

Generic SIEM Signature Format

~8.5k stars (Apr 2026)

SigmaHQ community

Generic signature format that translates to any SIEM query language (Splunk SPL, Sentinel KQL, Elastic EQL, Chronicle YARA-L). 3,000+ community rules. Best for: SIEM-agnostic detection engineering, detection rule sharing across organisations.
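The translation Sigma performs can be pictured with a toy converter: a selection map becomes a conjunctive query string. Real conversion is done by sigma-cli backends, which also handle modifiers, value lists, and per-SIEM field mappings; this sketch only illustrates the idea:

```python
def to_query(selection: dict) -> str:
    """Naive Splunk-flavoured rendering of a Sigma selection block:
    each field/value pair becomes field="value", joined with AND."""
    return " AND ".join(f'{field}="{value}"' for field, value in selection.items())

selection = {
    "EventID": 4688,
    "NewProcessName": "C:\\Windows\\System32\\certutil.exe",
}
print(to_query(selection))
# EventID="4688" AND NewProcessName="C:\Windows\System32\certutil.exe"
```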

Reference architecture with LLM orchestrator

The integrated stack, working in production as of April 2026:

# Full OSS agentic CTI stack (April 2026)

## Data ingress (MISP)
MISP pulls from:
  - CIRCL OSINT feed       (TAXII 2.0, requires free account)
  - abuse.ch URLhaus       (API, free)
  - abuse.ch MalwareBazaar (API, free)
  - abuse.ch ThreatFox     (API, free)
  - AlienVault OTX         (TAXII, free account)
  - CISA KEV               (JSON, free, no account)

## Knowledge graph (OpenCTI)
OpenCTI connector imports MISP events:
  - Converts to STIX2 objects
  - Builds relationships (IoC -> Campaign -> Actor -> TTP)
  - GraphQL API for analyst queries

## Analyser orchestration (Cortex)
Cortex fires on new indicators from MISP/OpenCTI:
  - VirusTotal (file hash, IP, domain, URL)
  - AbuseIPDB (IP reputation)
  - URLscan.io (URL/domain sandbox)
  - Shodan (IP infrastructure)
  - GreyNoise (IP context: scanner vs targeted)
  - crt.sh (domain cert transparency)
  - BGPview (IP ASN context)
  - WhoisXML (domain registration history)
  - + 40 more community analysers

## LLM orchestrator (Claude Sonnet / GPT-5 / Llama 4)
Agent reads Cortex output (JSON), then:
  1. Synthesises enrichment note from analyser results
  2. Maps to MITRE ATT&CK TTPs
  3. Assigns confidence (High/Medium/Low with rationale)
  4. Writes STIX-formatted note back to OpenCTI
  5. Flags Low-confidence outputs for analyst review

## Case management (TheHive)
  - Promoted-to-incident findings from OpenCTI -> TheHive
  - Analyst reviews case in TheHive UI
  - Approves or rejects enrichment notes
  - Escalates to Cortex responder for response actions

## Detection output
  - YARA rules (LLM draft, analyst review, corpus test)
  - Sigma rules (LLM draft, schema validation, 7-day staging)
  - SIEM alert rules pushed via CI pipeline

## Infrastructure (April 2026 pricing)
  MISP:     Hetzner CPX31   8vCPU/32GB/500GB  ~$55/mo
  OpenCTI:  Hetzner CPX51  16vCPU/64GB/1TB   ~$165/mo
  TheHive:  Hetzner CPX21   4vCPU/8GB/160GB   ~$25/mo
  Cortex:   Hetzner CPX31   8vCPU/32GB/500GB  ~$55/mo
  Elasticsearch: ~$50-100/mo
  LLM API: Claude Sonnet 4.5 ~$500-1,500/mo
  -----------------------------------------------
  Total: ~$850-$1,900/mo at moderate SOC volume
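Step 3 of the orchestrator above, confidence assignment, should not be left entirely to the model; it can be reduced to a deterministic post-check on the analyser verdicts. A sketch, assuming each analyser's result has been normalised to a malicious/suspicious/benign verdict (the thresholds are illustrative policy, not a standard):

```python
def assign_confidence(verdicts: dict) -> tuple:
    """Map per-analyser verdicts to High/Medium/Low with a one-line rationale.
    Verdicts are "malicious", "suspicious", or "benign"."""
    malicious = sum(v == "malicious" for v in verdicts.values())
    suspicious = sum(v == "suspicious" for v in verdicts.values())
    if malicious >= 2:
        return "High", f"{malicious} independent analysers returned malicious"
    if malicious == 1 or suspicious >= 2:
        return "Medium", "single malicious or multiple suspicious verdicts"
    return "Low", "no corroborated malicious verdicts; route to analyst review"

level, why = assign_confidence(
    {"VirusTotal": "malicious", "AbuseIPDB": "malicious", "GreyNoise": "benign"}
)
print(level, "-", why)  # High - 2 independent analysers returned malicious
```

Deterministic gating like this is what makes step 5 (flagging Low-confidence outputs for analyst review) auditable rather than a matter of model temperament.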

The MCP (Model Context Protocol) bridges for connecting Claude and other LLMs to the MISP and OpenCTI APIs are maintained by the community as of April 2026. Check the Anthropic MCP registry and the MISP project GitHub for current connector status. The architecture above works with direct API calls if MCP bridges are not yet available for your specific LLM.

MISP vs OpenCTI: which first?

Start with MISP if:

  • Primary use case is IoC sharing with a community
  • Team participates in an ISAC or ISAO
  • Existing TAXII feed subscriptions
  • Small team, needs fast operational value
  • Sector-specific sharing group exists (FS-ISAC, H-ISAC, MS-ISAC)

Start with OpenCTI if:

  • Building an internal knowledge graph of actors and TTPs
  • Team needs a modern analyst-facing UI
  • STIX-native relationship modelling is a requirement
  • GraphQL API integration with existing tooling needed
  • MSSP use case requiring multi-tenancy

Most mature teams run both: MISP as the community feed ingestion layer, OpenCTI as the analyst-facing enriched intelligence graph. The integration is well-documented; OpenCTI has a native MISP connector that imports events and converts them to STIX2 objects.

Hardware / hosting sizing

Component            Min spec                Recommended               Hetzner (Apr 2026)
MISP (mid-size)      4 vCPU / 16GB / 200GB   8 vCPU / 32GB / 500GB     CPX31 ~$55/mo
OpenCTI (mid-size)   8 vCPU / 32GB / 500GB   16 vCPU / 64GB / 1TB      CPX51 ~$165/mo
TheHive + Cortex     4 vCPU / 8GB / 100GB    8 vCPU / 16GB / 200GB     CPX21+CPX31 ~$80/mo
Elasticsearch / PG   4 vCPU / 16GB / 200GB   8 vCPU / 32GB / 500GB     CPX31 ~$55/mo
Total (comfortable)  20 vCPU / 72GB          40 vCPU / 144GB / 2.2TB   ~$355/mo

Prices: Hetzner public pricing, April 2026. AWS/GCP multiplier: approximately 2-3x. Self-hosted on co-lo hardware: approximately 0.3-0.5x at 1+ year depreciation.

LLM-generated YARA: does it work?

Honest verdict: workable as a draft accelerator, not production-ready automation. LLMs can draft YARA rule skeletons from natural-language descriptions or sample binary analysis, and the output is structurally valid YARA more often than not. The failure modes:

  • Hallucinated byte patterns: the LLM invents hex sequences that look plausible but are not present in the actual malware sample. Always validate against a known-malicious corpus.
  • Overly broad string matching: the LLM includes common library strings that appear in both malicious and benign software, producing high false-positive rates.
  • Missing PE structure context: YARA rules for PE files need to account for section offsets and import table structure; LLMs frequently get these wrong.
  • VirusTotal Retrohunt cost risk: deploying untested YARA rules to VirusTotal Retrohunt costs credits; bad LLM-generated rules burn Retrohunt credits with no value.

Best practice: LLM drafts the rule skeleton. Human analyst validates strings against a corpus of at least 50 known-malicious samples and 200 clean-baseline binaries. Rule enters a PR workflow with test results attached. Only then does it deploy to endpoint scanning or Retrohunt. See the Awesome-YARA community repository on GitHub for reference rule patterns that help the LLM produce better drafts.
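The clean-baseline half of that validation can be automated before the PR: scan the draft rule's strings against benign samples and measure the hit rate. A stdlib-only sketch with toy corpora (real pipelines would compile the full rule with yara-python and scan real binaries; the substring check here is deliberately simplistic):

```python
def string_fp_rate(rule_strings: list, clean_corpus: list) -> float:
    """Fraction of clean-baseline samples matching ANY draft string.
    A high rate means the LLM picked common library strings; reject the draft."""
    hits = sum(
        any(s in sample for s in rule_strings) for sample in clean_corpus
    )
    return hits / len(clean_corpus) if clean_corpus else 0.0

# Toy corpora: "kernel32.dll" appears in virtually every PE file, so that
# draft string trips on clean binaries while the C2 domain does not.
clean = [b"...kernel32.dll...", b"benign payload", b"hello world"]
draft = [b"kernel32.dll", b"c2.evil.invalid"]
print(string_fp_rate(draft, clean))  # 0.3333333333333333
```

A gate like `string_fp_rate(...) == 0.0` over the 200 clean-baseline binaries makes the "reject overly broad strings" step mechanical rather than a matter of reviewer attention.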

LLM-generated Sigma: does it work?

Better than YARA. Sigma's YAML structure is more tractable for LLMs. The primary failure mode is hallucinated field names (see the AI SIEM correlation page for the full Sigma workflow). With a schema-validation CI step (SigmaHQ provides validators for Splunk, Sentinel, Elastic, Chronicle), LLM-generated Sigma rules are a valid accelerator for detection engineering.

The SigmaHQ sigma-cli tool validates Sigma rules against product-specific schemas and converts them to the target SIEM query language. Adding this to a CI pipeline catches field-name hallucinations before the rule reaches staging. With this guard in place, the LLM-to-engineer-to-staging-to-production workflow reduces detection engineering time by 40-60% on well-understood TTP coverage gaps.
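The shape of that field-name guard fits in a few lines: validate every detection field in the (already parsed) Sigma rule against the target log source's known field set. sigma-cli does this properly against real product schemas; the field set and rule below are illustrative:

```python
# Illustrative subset of Sysmon process-creation fields; real sets
# come from the target SIEM's data model.
KNOWN_FIELDS = {"Image", "CommandLine", "ParentImage", "User", "Hashes"}

def hallucinated_fields(detection: dict) -> set:
    """Return detection field names absent from the known schema.
    Strips Sigma value modifiers such as |contains before checking."""
    used = set()
    for block in detection.values():
        if isinstance(block, dict):
            used.update(field.split("|")[0] for field in block)
    return used - KNOWN_FIELDS

detection = {
    "selection": {"Image|endswith": "\\certutil.exe",
                  "CommandLines|contains": "-urlcache"},  # typo: trailing "s"
    "condition": "selection",
}
print(hallucinated_fields(detection))  # {'CommandLines'}
```

Failing CI on a non-empty result is exactly the staging gate described above: the hallucinated field never reaches an analyst's SIEM.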

FAQ

Should I use MISP or OpenCTI?

Use MISP first if the team primarily shares and consumes IoCs within a community (ISAC, ISAO, sector-specific sharing groups). MISP is the dominant standard for community IoC sharing; 30k+ organisations worldwide use it. Use OpenCTI first if the team builds its own knowledge graph of threat actors, campaigns, and TTPs. OpenCTI's STIX2-native graph model is better for complex knowledge representation than MISP's event model. Most mature teams run both: MISP as the feed ingestion and community sharing layer, OpenCTI as the analyst-facing knowledge graph and enriched intelligence store.

Can an LLM generate YARA rules reliably?

Reliably is a high bar that LLMs do not yet meet for YARA. LLMs can draft rule skeletons that are structurally correct and contain plausible string patterns, but hallucinated byte patterns and overly broad string matches are common, and both produce unacceptable false-positive rates in production. The correct workflow: the LLM drafts the rule skeleton from a natural-language description or malware sample analysis, a human analyst validates the strings against a known-malicious sample corpus AND a clean-baseline corpus, and the rule enters a PR workflow with test results before deployment. Never deploy LLM-generated YARA rules without human validation and testing against a baseline corpus.

Can I build an agentic SOC on open source?

Yes, substantially. The OSS agentic SOC covers the enrichment and correlation layers well: MISP ingests feeds, Cortex fires analysers, an LLM agent (Claude API or local Llama 4) synthesises enrichment notes and drafts detection rules, and TheHive manages cases with analyst review. What OSS cannot provide: commercial feed depth (Insikt Group research, Intel 471's criminal-underground coverage), managed triage products like Dropzone AI, and enterprise-grade SLAs. A three-person SOC team with one dedicated platform engineer can run a fully functional agentic CTI workflow on OSS for roughly $300-$500 per month in infrastructure, plus $500-$1,500 per month in LLM API costs.

How much does the OSS stack cost to host?

Realistic infrastructure costs for the full MISP, OpenCTI, TheHive, Cortex stack at moderate SOC volume (500-2,000 indicators per day): MISP single instance (8 vCPU, 32GB RAM, 500GB SSD) on Hetzner CPX31 approximately $50-$80 per month. OpenCTI (12 vCPU, 64GB RAM, 1TB SSD minimum) approximately $150-$250 per month. TheHive plus Cortex (8 vCPU, 16GB RAM) approximately $50-$80 per month. Elasticsearch or PostgreSQL for data storage approximately $50-$100 per month. Total infrastructure: $300-$510 per month on Hetzner. On AWS or GCP, multiply by 2-3x. LLM API costs for enrichment at these volumes: $500-$1,500 per month using Claude Sonnet 4.5.

What STIX/TAXII feeds are available for free?

Free STIX and TAXII feeds in April 2026: CIRCL OSINT feed (TAXII 2.0, requires free CIRCL account), CISA Automated Indicator Sharing (AIS) TAXII feed (free for US entities), abuse.ch feeds (URLhaus, MalwareBazaar, ThreatFox - free API access), AlienVault OTX TAXII feed (free OTX account required), MISP threat sharing communities (sector ISACs/ISAOs - membership required but free for eligible organisations), Cybersecurity and Infrastructure Security Agency (CISA) KEV JSON feed (free, no account required). The CIRCL feed and abuse.ch feeds are the most valuable free sources for operational IoC data.
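The CISA KEV feed is the simplest of these to consume: one JSON document, no account, no auth. A sketch of parsing it; the field names (cveID, dueDate) follow the published KEV schema, but the sample record is fabricated for illustration:

```python
import json
from datetime import date

# In production, fetch the live document from CISA's KEV JSON feed URL.
# Sample document showing the schema's key fields; the record is illustrative.
KEV_SAMPLE = json.loads("""{
  "catalogVersion": "2026.04.01",
  "vulnerabilities": [
    {"cveID": "CVE-2026-0001", "vendorProject": "ExampleCorp",
     "product": "ExampleServer", "dueDate": "2026-04-20"}
  ]
}""")

def overdue(kev: dict, today: date) -> list:
    """CVE IDs whose remediation dueDate has already passed."""
    return [
        v["cveID"] for v in kev["vulnerabilities"]
        if date.fromisoformat(v["dueDate"]) < today
    ]

print(overdue(KEV_SAMPLE, date(2026, 4, 27)))  # ['CVE-2026-0001']
```

A nightly run of this check against the live catalogue is a cheap way to feed remediation deadlines into TheHive as cases.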

Can an LLM generate Sigma rules reliably?

Better than YARA. Sigma's YAML structure is more tractable for LLMs. GPT-5 and Claude Sonnet 4.5 can generate structurally valid Sigma rules from natural-language attack descriptions, with the primary failure mode being hallucinated log field names (inventing field names that do not exist in the target SIEM's data model). The mitigation: use a schema-validation CI step (SigmaHQ provides schema validators for major SIEMs including Splunk, Sentinel, Elastic, Chronicle) that catches field-name errors before the rule reaches staging. With schema validation, LLM-generated Sigma rules are a valid accelerator for detection engineering, not a replacement for the engineer.

Updated 2026-04-27