AI threat hunting in 2026: hypothesis at scale, human verdict

What AI threat hunting actually does in production: hypothesis generation, query construction, result triage, and the cost-per-hunt-hour economics. Without the vendor marketing.

Last verified: May 2026. Independent reference. No vendor input.

What AI threat hunting actually is

Threat hunting is the proactive search for adversary activity in the environment, independent of alert-driven triage. The SANS 2024 Threat Hunting Survey found median time-per-hunt was 10 to 40 hours of analyst effort, with 30 to 50% of that time consumed by data gathering and query construction. The remaining time was spent on hypothesis formation, result interpretation, and documentation. AI threat hunting in 2026 targets the query-construction overhead and the documentation overhead, not the hypothesis or the judgement.

In production, the workflow looks like this. The hunter starts with a hypothesis: "I want to check whether any of our hosts have made DNS queries to recently registered domains in the past 30 days." An LLM-assisted workflow converts the hypothesis into a SIEM query (KQL for Sentinel, SPL for Splunk, YARA-L for Chronicle), runs it, summarises the result set, and proposes follow-up queries (filter to domains registered in the past 7 days; correlate against your asset inventory; check whether any of the source hosts have other suspicious indicators). The hunter remains in the loop on every decision; the LLM is the friction-removal layer.
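The narrowing step in that example (30 days, then 7 days) can be sketched in a few lines of Python. This is a minimal illustration rather than a production pipeline: the event tuples, the registered_on lookup, the host and domain names, and the function name hosts_querying_new_domains are all hypothetical, and in practice the registration dates would come from a WHOIS or domain-intelligence feed.

```python
from datetime import date, timedelta

def hosts_querying_new_domains(dns_events, registered_on, today, max_age_days):
    """Return {domain: sorted hosts} for domains registered within max_age_days.

    dns_events: iterable of (host, domain) pairs (hypothetical shape).
    registered_on: dict of domain -> registration date (hypothetical feed).
    """
    cutoff = today - timedelta(days=max_age_days)
    hits = {}
    for host, domain in dns_events:
        reg = registered_on.get(domain)
        # Domains with an unknown registration date are skipped, not guessed.
        if reg is not None and reg >= cutoff:
            hits.setdefault(domain, set()).add(host)
    return {d: sorted(h) for d, h in hits.items()}

# Illustrative data only.
events = [
    ("ws-01", "example.com"),
    ("ws-02", "fresh-site.test"),
    ("ws-03", "fresh-site.test"),
    ("ws-01", "month-old.test"),
]
registered = {
    "example.com": date(1995, 8, 14),
    "fresh-site.test": date(2026, 4, 28),
    "month-old.test": date(2026, 4, 10),
}
today = date(2026, 5, 1)

broad = hosts_querying_new_domains(events, registered, today, 30)
narrow = hosts_querying_new_domains(events, registered, today, 7)
```

Here broad keeps both recently registered domains, while narrow keeps only the one registered in the past week. The hunter still decides which of the surviving hosts deserve a closer look.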

This is different from agentic SOC triage, where the agent acts autonomously on alerts. Hunting is hypothesis-led; triage is alert-led. The agentic layer for hunting is consistently semi-autonomous in 2026: the LLM proposes, the human decides. Fully autonomous hunting where the LLM forms hypotheses, executes queries, draws conclusions, and acts on them without human review is not a deployed pattern in any credible vendor or open-source stack as of 2026.

For the broader agentic SOC architecture context, see agentic SOC buildout. Hunting sits in the hypothesis layer, not the triage or response layer.

Hypothesis generation: what works

The LLM-assisted hypothesis generation pattern reads recent intelligence sources and proposes hunt hypotheses your environment may have detection gaps for. Sources commonly fed in: CISA Known Exploited Vulnerabilities catalogue, CISA advisories, vendor threat reports (Microsoft Threat Intelligence Center, Mandiant M-Trends, CrowdStrike Global Threat Report), MITRE ATT&CK technique updates, recent publicly disclosed breaches in your sector.

The LLM cross-references the intelligence against your stated detection coverage (commonly expressed as a MITRE ATT&CK Navigator layer) and proposes hunts for techniques you do not currently detect. A typical output: "Volt Typhoon has been observed using DCSync against AD domain controllers in critical infrastructure environments. Your detection layer covers DCSync attempts on the Domain Admins group but not on Enterprise Admins. Hunt suggestion: search for DCSync logs originating from non-Tier-0 hosts targeting EA accounts in the past 90 days."

The quality of hypothesis output depends heavily on the input intelligence quality and on the prompt engineering. Generic prompts produce generic hypotheses; environment-specific prompts that include asset inventory context and detection coverage data produce environment-specific hypotheses. The pattern works best when the team commits to maintaining a current ATT&CK Navigator layer of detection coverage; without it, the LLM has no grounding for the "do we cover this" question.
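The "do we cover this" check can be approximated deterministically before any LLM is involved. The sketch below assumes the coverage layer is an ATT&CK Navigator layer JSON with a techniques list of techniqueID and score entries, and it assumes a team convention that a score above zero means "we detect this"; that convention, the function name coverage_gaps, and the sample layer are illustrative assumptions.

```python
import json

def coverage_gaps(layer_json, intel_techniques):
    """Return intel technique IDs that have no positive score in the layer."""
    layer = json.loads(layer_json)
    covered = {
        t["techniqueID"]
        for t in layer.get("techniques", [])
        # A missing or null score is treated as "not covered" (assumed convention).
        if (t.get("score") or 0) > 0 and t.get("enabled", True)
    }
    return sorted(set(intel_techniques) - covered)

# Illustrative layer: PowerShell is covered, DCSync is listed but scored 0.
layer_json = json.dumps({
    "name": "detection coverage (illustrative)",
    "domain": "enterprise-attack",
    "techniques": [
        {"techniqueID": "T1059.001", "score": 1},
        {"techniqueID": "T1003.006", "score": 0},
    ],
})

# Technique IDs extracted from a threat report (hypothetical input).
intel = ["T1003.006", "T1059.001", "T1071.004"]
gaps = coverage_gaps(layer_json, intel)
```

The resulting gap list (here T1003.006 and T1071.004) is the grounding context handed to the LLM, so its hypotheses target techniques the environment does not detect rather than ones it already does.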

Query templating: where the friction lifts

Query templating is where the analyst-time savings concentrate. Constructing a Splunk SPL search or a Sentinel KQL query for a non-trivial hunt has historically required either familiarity with the SIEM's query language or substantial trial-and-error against the schema. LLMs in 2026 (Claude Sonnet 4.5, GPT-4o, Gemini 2.0) are competent at translating natural-language hunt hypotheses into SIEM query syntax, with schema awareness when given access to the index or table definitions.

The LLM-generated query is not always correct on first attempt. Common failure modes include hallucinated field names, incorrect time-range syntax, missing required filters that explode the result set, and overly broad searches that hit query timeouts. The mitigation is to run the LLM-generated query as a low-cost preview first (a narrow time window or a capped row count, for example the "head" command in SPL or the "take" operator in KQL), then refine based on the result count before running the full query. Most LLM workflows now build this iteration loop in by default.
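One way to express the preview-then-refine loop is as a small gate function. This is a sketch under stated assumptions: run_query is a hypothetical callable that returns a row count for a query over a given number of hours, the thresholds are arbitrary, and the linear extrapolation from a one-hour preview is a rough heuristic that will mislead on bursty data.

```python
def preview_gate(run_query, query, preview_hours=1, full_hours=720, max_rows=50_000):
    """Decide whether a query is safe to run over the full window.

    run_query(query, hours) -> int is a hypothetical SIEM wrapper that
    returns the number of matching rows in the most recent `hours`.
    """
    preview_count = run_query(query, preview_hours)
    if preview_count == 0:
        # Zero rows may be genuine, or may mean a hallucinated field name
        # or a broken time filter; ask for a human check either way.
        return ("check_query", "preview returned no rows")
    # Rough linear extrapolation from the preview window to the full window.
    estimate = preview_count * full_hours / preview_hours
    if estimate > max_rows:
        return ("refine", f"estimated {estimate:.0f} rows; add filters first")
    return ("run_full", f"estimated {estimate:.0f} rows")

# Stand-in for a SIEM: rows per hour for three made-up queries.
rows_per_hour = {"broad": 500, "narrow": 2, "bad_field": 0}

def fake_run(query, hours):
    return rows_per_hour[query] * hours

broad_decision = preview_gate(fake_run, "broad")
narrow_decision = preview_gate(fake_run, "narrow")
bad_decision = preview_gate(fake_run, "bad_field")
```

The gate never edits the query itself. It only reports whether the next step is a full run, a refinement, or a manual look at the query text.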

A common pattern is the LLM-then-deterministic-validator workflow. The LLM proposes a Sigma rule or a SIEM query; a deterministic linter (Sigma converter, SIEM query syntax validator) catches the obvious errors before the human reviews. This catches hallucinations cheaply and frees the human to focus on logic and intent.
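A deterministic field-name check is the simplest instance of this validator idea. The sketch below assumes the LLM output has already been parsed into a Sigma-style detection dictionary (field names as keys, with modifiers after a pipe) and that the team maintains a set of known schema fields. The function names and the sample field SourceHostTier are hypothetical, and a real linter would also check the condition expression.

```python
def sigma_fields(detection):
    """Collect field names from a parsed Sigma-style detection section."""
    fields = set()
    for name, block in detection.items():
        if name == "condition":
            continue  # the condition references selection names, not fields
        maps = block if isinstance(block, list) else [block]
        for m in maps:
            if isinstance(m, dict):
                # "Field|contains" -> "Field": modifiers follow the first pipe.
                fields.update(key.split("|", 1)[0] for key in m)
    return fields

def unknown_fields(detection, schema_fields):
    """Return fields the rule uses that the schema does not define."""
    return sorted(sigma_fields(detection) - set(schema_fields))

# Illustrative LLM output; SourceHostTier is a made-up field that the
# (equally illustrative) schema does not contain.
detection = {
    "selection": {"EventID": 4662, "SubjectUserName|endswith": "$"},
    "filter": {"SourceHostTier": "0"},
    "condition": "selection and not filter",
}
schema = {"EventID", "SubjectUserName", "Properties"}

problems = unknown_fields(detection, schema)
```

A non-empty problems list fails the check before a human spends time on the rule, which is the cheap hallucination catch described above.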

For the SIEM-correlation pattern more broadly, see AI SIEM correlation. The patterns overlap with hunting but the alert-driven workflow has different cost and risk characteristics.

Detection rule drafting from hunt findings

The output of a successful hunt is often a new detection rule that codifies the pattern for ongoing alerting. The LLM-assisted workflow extends to drafting the detection rule from the hunt query: convert the SPL or KQL into a Sigma rule that can be deployed across multiple SIEMs, write the description and false-positive notes, and propose the alert severity based on the hunt findings.
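As an illustration of what a drafted rule might carry, the sketch below builds a Sigma-style rule as a plain dictionary and refuses to produce one without false-positive notes or with an unrecognised severity. The helper name draft_sigma_rule and all sample values are hypothetical; the key names and level values follow the public Sigma rule format as far as I am aware, and serialising to YAML is left out to keep the example dependency-free.

```python
import uuid

VALID_LEVELS = {"informational", "low", "medium", "high", "critical"}

def draft_sigma_rule(title, description, logsource, detection,
                     falsepositives, level, tags=()):
    """Assemble a Sigma-style rule dict from hunt findings (sketch only)."""
    if level not in VALID_LEVELS:
        raise ValueError(f"unknown level: {level!r}")
    if not falsepositives:
        # The on-call analyst needs triage guidance, so an empty list is rejected.
        raise ValueError("false-positive notes are required")
    return {
        "title": title,
        "id": str(uuid.uuid4()),
        "status": "experimental",  # new rules start out unproven
        "description": description,
        "tags": list(tags),
        "logsource": dict(logsource),
        "detection": dict(detection),
        "falsepositives": list(falsepositives),
        "level": level,
    }

# Illustrative values, not a vetted detection.
rule = draft_sigma_rule(
    title="Directory replication request from non-DC host (illustrative)",
    description="Drafted from a hunt query; pending human review.",
    logsource={"product": "windows", "service": "security"},
    detection={"selection": {"EventID": 4662}, "condition": "selection"},
    falsepositives=["Legitimate directory synchronisation services"],
    level="high",
    tags=["attack.t1003.006"],
)
```

The dictionary is a draft artefact only; it still has to pass human review and the rest of the deployment gate before it alerts anyone.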

The validate-before-deploy gate is non-negotiable. Common failure modes for LLM-generated detection rules include rules with no false-positive guidance (the analyst on call will not know how to triage), rules with overly broad conditions that generate alert fatigue, and rules that reference fields that do not exist in your specific schema. The recommended workflow is LLM-drafts, human-reviews, CI-validates-syntax, deploy-to-staging, soak-test-for-false-positives, promote-to-production. Compressing this loop creates more alert fatigue than the rule prevents.
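The "no compressing the loop" rule can be enforced mechanically. The following sketch models the six stages as an ordered enumeration and only allows promotion to the next stage when that stage's gate has passed; the Stage names mirror the workflow above, while the promote function and its gate_passed flag are hypothetical stand-ins for whatever review or CI signal a team actually uses.

```python
from enum import IntEnum

class Stage(IntEnum):
    LLM_DRAFT = 0
    HUMAN_REVIEW = 1
    CI_SYNTAX = 2
    STAGING = 3
    SOAK_TEST = 4
    PRODUCTION = 5

def promote(current, target, gate_passed):
    """Move a rule one stage forward, or leave it where it is.

    Skipping a stage raises; a failed gate keeps the rule at `current`.
    """
    if target != current + 1:
        raise ValueError(f"cannot jump from {current.name} to {target.name}")
    if not gate_passed:
        return current
    return target

# A rule that passed review moves on; one that failed its soak test stays put.
after_review = promote(Stage.LLM_DRAFT, Stage.HUMAN_REVIEW, gate_passed=True)
after_soak_fail = promote(Stage.STAGING, Stage.SOAK_TEST, gate_passed=False)
```

Attempting promote(Stage.HUMAN_REVIEW, Stage.PRODUCTION, True) raises ValueError, which is the point: the shortcut is not expressible.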

For the false-positive cost angle, see ping fatigue on the sister site pingfatigue.com. The cost of a bad detection rule is paid in analyst attention, not in vendor invoices.

Cost model per hunt hour

The cost model that matters is dollars per hunt hour saved. A senior threat hunter in the US in 2026 runs $150,000 to $250,000 per year fully loaded; in Europe $90,000 to $160,000; in lower-cost regions less. At a fully loaded hourly rate of $100, a 10-hour hunt costs $1,000 in analyst time. If AI-assisted hunting reduces the time-to-completion by 40%, the per-hunt saving is $400.

The LLM API cost is small relative to the savings. A complex hunt with multiple LLM-assisted query iterations consumes 30,000 to 100,000 input tokens and 10,000 to 30,000 output tokens. At Claude Sonnet 4.5 rates ($3 per million input, $15 per million output), the LLM cost per hunt is roughly $0.24 to $0.75. The marginal cost is rounding error against the analyst-time saving.
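The arithmetic is easy to reproduce. The sketch below uses only the token counts and per-million-token rates quoted above, plus the $100 hourly rate, 10-hour hunt, and 40% reduction from the previous paragraph; the function name llm_cost_usd is illustrative.

```python
def llm_cost_usd(input_tokens, output_tokens, in_rate=3.0, out_rate=15.0):
    """API cost in dollars, with rates given per million tokens."""
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

low = llm_cost_usd(30_000, 10_000)     # 0.09 input + 0.15 output
high = llm_cost_usd(100_000, 30_000)   # 0.30 input + 0.45 output

# Analyst-time saving per hunt: 10 hours at $100/hour, reduced by 40%.
saving = 10 * 100 * 0.40

# How many times larger the saving is than the worst-case API cost.
ratio = saving / high
```

With these inputs the low and high ends come to about $0.24 and $0.75, and the $400 saving is several hundred times the worst-case API cost.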

The vendor-product cost is where the calculation becomes more complex. Security Copilot at $4 per SCU per hour, applied to hunting workflows, can consume 10-30 SCU-hours per active analyst per day; at that rate a 5-hunter team using Security Copilot daily consumes 50-150 SCU-hours per day, or roughly $50,000 to $200,000 per year depending on how many days the team is active. The ROI is positive if the team completes 3-4x more hunts at equivalent depth, but the absolute cost has crowded out OSS-first approaches at many cost-conscious shops.
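The annual figure depends on how consumption and active days are counted. The sketch below assumes the per-day figures are SCU-hours billed at $4 each and compares two assumptions about active days (250 working days versus 365 calendar days); both day counts and the function name annual_scu_cost are assumptions for illustration.

```python
def annual_scu_cost(scu_hours_per_day, active_days, rate_per_scu_hour=4.0):
    """Yearly spend in dollars for a given daily SCU-hour consumption."""
    return scu_hours_per_day * rate_per_scu_hour * active_days

hunters = 5
team_low = hunters * 10    # 50 SCU-hours per day
team_high = hunters * 30   # 150 SCU-hours per day

working_year = (annual_scu_cost(team_low, 250), annual_scu_cost(team_high, 250))
calendar_year = (annual_scu_cost(team_low, 365), annual_scu_cost(team_high, 365))
```

Under the working-day assumption the range is $50,000 to $150,000; under the calendar-day assumption it is $73,000 to $219,000. The "roughly $50,000 to $200,000" band sits across both.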

For the broader budget context including SOC headcount and SIEM cost, see securityoperationscost.com and siemcostcalculator.com.

Vendor and OSS options

Microsoft Security Copilot

$4 / SCU / hr

Microsoft-centric SOCs with Sentinel and Defender XDR. KQL query generation, alert triage, hypothesis drafting from Microsoft Threat Intelligence Center.

Splunk SOC Assistant (Cisco)

Bundled with Splunk ES, AI pricing TBD

Splunk Enterprise Security customers. SPL query generation, hunt template library.

Chronicle Duet AI for Security

Bundled with Chronicle SecOps

Google Cloud SecOps customers. YARA-L query generation, alert summarisation.

Dropzone AI

$10,000 - $30,000 / analyst / yr

Agentic SOC platform with hunting-style hypothesis generation. SIEM-agnostic.

Prophet Security

Per-analyst licensing

Agentic SOC with focus on alert investigation; hunting capabilities expanding in 2026.

OSS (Claude API plus Sigma plus MITRE Navigator)

$100 - $1,000 / analyst / mo (LLM API)

Cost-conscious or vendor-agnostic teams. Maximum flexibility, requires platform engineering investment.

FAQ

What does AI threat hunting actually mean?

AI threat hunting in production in 2026 means three concrete things. First, hypothesis generation: an LLM consumes recent intelligence (CISA advisories, vendor reports, MITRE ATT&CK updates) and proposes hunt hypotheses your environment may not have detection for. Second, query templating: the LLM converts a natural-language hypothesis into a SIEM query (Splunk SPL, Sentinel KQL, Sumo Logic, Chronicle YARA-L) the analyst can run. Third, result triage: when the query returns events, the LLM summarises the result set and proposes follow-up queries. The analyst remains in the loop on hypothesis selection and verdict; the LLM removes the friction of query construction.

Can AI replace a threat hunter?

No, not in 2026. AI removes the query-construction tax that consumed 30-50% of a typical hunter's time (per SANS 2024 Threat Hunting Survey) and frees that time for hypothesis quality and result interpretation. The hunter's domain knowledge, environment context, and judgement on what counts as suspicious remain irreplaceable. Hunters using LLM-assisted workflows in 2026 report 2-3x more hunts completed per quarter with equivalent or better depth.

What does AI threat hunting cost in 2026?

Three cost components. LLM API cost for hypothesis generation and query templating is typically $100 to $1,000 per analyst per month at production volumes (Claude API or Azure OpenAI Service). Vendor add-ons that bake hunting AI into existing platforms (Splunk SOC Assistant, Microsoft Security Copilot, Chronicle Duet AI for Security) are either bundled with the platform or billed by consumption; Security Copilot, for example, is priced at $4 per Security Compute Unit (SCU) per hour. Dedicated agentic hunting tools (Dropzone AI, Prophet Security) typically license per-analyst at $10,000 to $30,000 per year. Total stack cost for a 5-hunter team commonly runs $50,000 to $300,000 per year depending on platform choice.

Does AI hallucinate detection rules?

Yes, occasionally. LLMs sometimes fabricate field names that do not exist in the schema, reference MITRE ATT&CK technique IDs that have been deprecated, or produce invalid Sigma rule syntax. The mitigation is the same as for any LLM-generated code: never deploy without review, run candidate detection rules in a non-production SIEM workspace first, and have a CI step that validates Sigma syntax before merge. Many teams use an LLM-then-linter pattern for Sigma or YARA content, where the LLM proposes and a deterministic linter validates before any deployment.
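A CI step of this kind can start very small. The sketch below checks a parsed rule for the top-level keys that the Sigma format requires (title, logsource, detection, plus a condition inside detection) and checks that technique tags have a plausible shape. It does not tell you whether a technique ID has been deprecated; that needs the current ATT&CK data set. The function name lint_rule is hypothetical, and rules are plain dictionaries here to avoid a YAML dependency.

```python
import re

REQUIRED_KEYS = ("title", "logsource", "detection")
# Technique tags look like attack.t1003 or attack.t1003.006.
TECHNIQUE_TAG = re.compile(r"^attack\.t\d{4}(\.\d{3})?$")

def lint_rule(rule):
    """Return a list of human-readable problems; empty means the checks passed."""
    errors = [f"missing key: {k}" for k in REQUIRED_KEYS if k not in rule]
    detection = rule.get("detection")
    if isinstance(detection, dict) and "condition" not in detection:
        errors.append("detection has no condition")
    for tag in rule.get("tags", []):
        # Only tags that start like a technique ID are format-checked;
        # tactic tags such as attack.credential-access are left alone.
        if re.match(r"attack\.t\d", tag) and not TECHNIQUE_TAG.match(tag):
            errors.append(f"malformed technique tag: {tag}")
    return errors

good = {
    "title": "Example",
    "logsource": {"product": "windows"},
    "detection": {"selection": {"EventID": 4662}, "condition": "selection"},
    "tags": ["attack.credential-access", "attack.t1003.006"],
}
bad = {
    "title": "Example",
    "detection": {"selection": {"EventID": 4662}},
    "tags": ["attack.t10036"],
}

good_errors = lint_rule(good)
bad_errors = lint_rule(bad)
```

The good rule produces no errors; the bad one is reported for a missing logsource, a missing condition, and a malformed technique tag, all before a human reviewer sees it.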

What OSS alternatives exist for AI threat hunting?

The OSS path in 2026 combines several components. Claude API or Azure OpenAI for the LLM layer. SOC Prime Marketplace or the Sigma rule corpus for a detection-content baseline. MITRE ATT&CK Navigator and published research from the Center for Threat-Informed Defense (CTID) for hypothesis seeds. The CISA Known Exploited Vulnerabilities catalogue for prioritisation context. A custom orchestration script (plain Python, or a framework such as LangChain) that ties these into the analyst workflow. This stack delivers most of the AI threat hunting capability of commercial agentic SOC tools at meaningfully lower cost; the trade-off is operations effort to build and maintain the orchestration layer.
