Technical Overview of the Anthropic AI Espionage Attack for SOC Teams

The Anthropic AI espionage case proves attackers trust autonomous agents. To counter machine-speed threats, SOCs must adopt and trust AI to handle 90% of the defense workload.

Ely Abramovitch

November 18, 2025

The first publicly documented, large-scale AI-orchestrated cyber-espionage campaign is now out in the open. Anthropic disclosed that threat actors (assessed with high confidence as a Chinese state-sponsored group) misused Claude Code to run the bulk of an intrusion targeting roughly 30 global organizations across tech, finance, chemical manufacturing, and government.

This attack should serve as a wake-up call, not because of what it is, but because of what it enables. The attackers used existing scripts and known vulnerabilities, with AI acting primarily as an orchestration and reconnaissance layer: a "script kiddie" rather than a fully autonomous hacker. This is just the start.

In the near future, the capabilities demonstrated here will rapidly accelerate. We can expect to see actual malware that writes itself, finds and exploits vulnerabilities on the fly, and evades defenses in smart, adaptive ways. This shift means that the assumptions guiding SOC teams are changing.

What Actually Happened: The Technical Anatomy

The most critical takeaway from this campaign is not the technology used, but the level of trust the attackers placed in the AI. By trusting the model to carry out complex, multi-stage operations without human intervention, they unlocked significant, scalable capabilities far beyond human tempo.

1. Attackers “Jailbroke” the Model

Claude’s safeguards weren’t broken with a single jailbreak prompt. The actors decomposed malicious tasks into small, plausible “red-team testing” requests. The model believed it was legitimately supporting a pentest workflow. This matters because it shows that attackers don’t need to “break” an LLM. They just need to redirect its context and trust it to complete the mission.

2. AI Performed the Operational Heavy Lifting

The attackers trusted Claude Code to execute the campaign autonomously as an agentic chain:

Scanning for exposed surfaces
Enumerating systems and sensitive databases
Writing and iterating exploit code
Harvesting credentials and moving laterally
Packaging and exfiltrating data

Humans stepped in only at a few critical junctures, mainly to validate targets, approve next steps, or correct the agent when it hallucinated. The bulk of the execution was delegated, demonstrating the attackers’ trust in the AI’s consistency and thoroughness.

3. Scale and Tempo Were Beyond Human Patterns

The agent fired thousands of requests. Traditional SOC playbooks and anomaly models assume slower human-driven actions, distinct operator fingerprints, and pauses due to errors or tool switching. Agentic AI has none of those constraints. The campaign demonstrated a tempo and scale that is only possible when the human operator takes a massive step back and trusts the machine to work at machine speed.

4. Anthropic Detected It and Shut It Down

Anthropic’s logs flagged abnormal usage patterns. The company then disabled the accounts, alerted impacted organizations, worked with governments, and released a technical breakdown of how the AI was misused.

The Defender’s Mandate: Adopt and Trust Defensive AI

Attackers have already made the mental pivot, treating AI as a trusted, high-velocity force multiplier for offense. Defenders must meet this shift head-on. If you don't adopt defensive AI, you are falling behind adversaries who already have.

Defenders must further adopt AI and trust it to carry out workflows where it has a decisive advantage: consistency, thoroughness, speed, and scale.

1. Attack Velocity Requires Machine Speed Defense

When an agent can operate at 50–200x human tempo, your detection assumptions rot fast. SOC teams need to treat AI-driven intrusion patterns as high-frequency anomalies, not human-like sequences.

2. Trust AI for High-Volume, Deterministic Workflows

Existing detection pipelines tuned on human patterns will miss sub-second sequential operations, machine-generated payload variants, and coordinated micro-actions. Agentic workloads look more like automation platforms than human operators.
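As a rough illustration of the "sub-second sequential operations" point, here is a minimal sketch that flags a session when the typical gap between its events is shorter than a human operator could plausibly sustain. The function name, the one-second threshold, and the minimum event count are assumptions made for this example, not values from Anthropic's report or from any product.

```python
from statistics import median


def looks_machine_paced(timestamps, max_median_gap=1.0, min_events=20):
    """Return True when events in one session arrive at a machine-like tempo.

    timestamps: event times in seconds (any order) for a single session.
    max_median_gap and min_events are illustrative thresholds (assumptions).
    """
    if len(timestamps) < min_events:
        return False  # too little evidence to judge tempo
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return median(gaps) < max_median_gap


# Hypothetical sessions: an operator acting every few seconds vs. an agent.
human_session = [i * 7.5 for i in range(30)]
agent_session = [i * 0.2 for i in range(30)]

human_flag = looks_machine_paced(human_session)
agent_flag = looks_machine_paced(agent_session)
```

Using the median rather than the mean keeps one long pause (for example, a human approval step) from hiding an otherwise mechanical cadence.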

Defenders need to accept the uncomfortable reality that manual triage for these types of intrusions is pointless. You need systems that can sift through massive alert loads and isolate and contain suspicious agentic behavior as it unfolds.

This is where the defense’s trust must be applied. Only the genuinely complex cases should ever reach a human. The SOC must delegate and trust AI to handle triage, investigation, and response with machine-like consistency.

3. “AI vs. AI” is No Longer Theoretical

Attackers have already made the mental pivot: AI is a force multiplier for offense today. Defenders need to accept the same reality. And Anthropic said this out loud in their conclusion:

“We advise security teams to experiment with applying AI for defense in areas like SOC automation, threat detection, vulnerability assessment, and incident response.”

That’s the part most vendors avoid saying publicly, because it commits them to a position. If you don’t adopt defensive AI, you’re falling behind adversaries who already have.

Where SOC Teams Should Act Now

Build Detection for Agentic Behaviors

Start by strengthening detection around behaviors that simply don’t look human. Agentic intrusions move at a pace and rhythm that operators can’t match: rapid-fire request chains, automated tool-hopping, endless exploit-generation loops, and bursty enumeration that sweeps entire environments in seconds. Even lateral movement takes on a mechanical cadence with no hesitation.

These patterns stand out once you train your systems to look for them, but they won’t surface through traditional detection tuned for human adversaries.
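One way to picture "bursty enumeration" as a detection is a sliding window that counts how many distinct targets a single source touches in a few seconds. The sketch below is a simplified example under stated assumptions: the event tuple layout, the function name, and both thresholds are hypothetical and would need tuning against your own telemetry.

```python
from collections import defaultdict, deque


def find_enumeration_bursts(events, window_seconds=10.0, min_distinct_targets=25):
    """Return the set of sources that touched many distinct targets in a short window.

    events: iterable of (timestamp_seconds, source, target) tuples.
    The thresholds are illustrative assumptions, not recommended values.
    """
    recent = defaultdict(deque)  # source -> deque of (timestamp, target)
    flagged = set()
    for ts, source, target in sorted(events):
        window = recent[source]
        window.append((ts, target))
        # Drop events that have fallen out of the sliding window.
        while window and ts - window[0][0] > window_seconds:
            window.popleft()
        if len({t for _, t in window}) >= min_distinct_targets:
            flagged.add(source)
    return flagged


# Hypothetical data: one source sweeps 40 hosts in 4 seconds,
# another touches one host per minute.
sweep = [(i * 0.1, "10.0.0.5", f"host-{i}") for i in range(40)]
admin = [(i * 60.0, "10.0.0.9", f"host-{i}") for i in range(40)]
bursting_sources = find_enumeration_bursts(sweep + admin)
```

Both sources reach the same 40 hosts overall; only the time window separates the sweep from routine administration.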

Make AI a Core Strategy, Not an Experiment

Start by adopting AI against specific offensive AI use cases while your human analysts keep to their existing routine.

Defenders have to meet this shift head-on and start using AI against the very tactics it enables. The volume and velocity of these intrusions make manual triage pointless.

You need systems that can sift through massive alert loads, isolate and contain suspicious agentic behavior as it unfolds, generate and evaluate countermeasures on the fly, and digest massive log streams without slowing down. Only the genuinely complex cases should ever reach a human. This isn’t aspirational thinking; attackers have already proven the model works.

Key Takeaway

For SOC teams, the takeaway is that defense has to evolve at the same pace as offense. That means treating AI as a core operational capability inside the SOC, not an experiment or a novelty.

The Defender’s AI Mandate: Trust AI to handle tasks where it excels: consistency, thoroughness, and scale.

The Defender’s AI Goal: Delegate volume and noise to defensive AI agents, freeing human analysts to focus only on genuinely complex, high-confidence threats that require strategic human judgment.

Legion Security will continue publishing analysis, defensive patterns, and applied research in this space. If you want a deeper dive into detection signatures or how to operationalize defensive AI safely, just say the word.

Demand for agentic security that actually works in complex enterprise environments has never been higher, and today we're excited to take a meaningful step forward in meeting it.

We're excited to announce that Legion Security has partnered with Optiv to become an Authorized Partner, helping enterprises stop talking about the same old problem and start putting AI to work. Security teams are under pressure that doesn't need a lot of explaining. Analysts, engineers, and practitioners are being asked to do more with less: more alerts, more tools, more threat surface, and fewer people to manage it all. AI was supposed to be the great equalizer, and the promise of the AI SOC was compelling: automate the noise, free up your people, let machines handle the volume.

The reality has been… more complicated.

Most AI security tools were built generically for a generic security team in a generic enterprise. One problem with this is… what is an average security team? Every large organization has processes that are entirely their own: workflows built around a specific stack, custom tools that were built and tuned over long stretches, tribal knowledge accumulated over years, investigation procedures tuned to their environment, their risk tolerance, their regulators, their customers.

Heavy API integrations try to stitch it together but end up slow, brittle, and context-poor (at best). And agents that operate inside a black box create exactly the kind of trust deficit that makes security leaders hesitate to hand anything off at all.

This is the gap Legion was built to close.

A Different Approach to Agentic Security in the Enterprise

The premise of Legion is straightforward: nobody knows your security operations like you do. Our platform doesn't arrive with assumptions about how your team should work. Instead, it observes and learns from how your team actually works across your tools, your workflows, your most repetitive processes and your most bespoke ones, and then uses that knowledge to build optimized AI agents that operate within the context of your organization.

We don’t require integrations for full contextual awareness. We’re an open book (no black box) that leans on our browser-based approach to see what your analysts see and do, learns what they know, and earns YOUR trust before taking action.

The result is agentic security that can actually scale in the enterprise — not by replacing how teams work, but by amplifying it.

The Imperative for Partnering with Optiv

Becoming an Optiv Authorized Partner matters because of what Optiv represents to the enterprise security buyer. Optiv works with organizations that have mature, complex security programs, exactly the kind of environment where Legion's approach of learning from bespoke processes is most valuable.

Enterprise security leaders look to trusted advisors to help them evaluate fit, plan implementation, and optimize outcomes over time. Optiv's position in the market as an integrator with deep relationships and deep domain expertise makes them uniquely positioned to bring best-in-breed solutions to the organizations that need them most and to help those organizations get maximum value from them.

This partnership reflects something we're hearing consistently in the market: enterprises want agentic security, but they want it on their terms. They want AI that understands their environment before it acts in it. They want partners who can help them think through where automation should start, how to build confidence in the system over time, and how to expand from their first use cases into a broader program.

That's exactly what this partnership is designed to deliver.

What It Signals More Broadly

The Optiv partnership is a data point in a larger trend. Channel partners, the integrators, MSSPs, and advisors who sit closest to enterprise security buyers, are increasingly being asked about agentic security. Their clients want to know what's real, what's ready, and what actually works in complex environments.

For Legion, this is an important milestone in building the ecosystem that enterprise agentic security requires. We're grateful to the Optiv team for their partnership and excited about what we'll build together. And for enterprise security leaders who have been watching the agentic security space and wondering what a path to trusted AI adoption actually looks like, we'd love to show you.

Interested in learning how Legion Security and Optiv can help your organization automate, scale, and elevate your security posture? Get in touch.

Legion and Optiv Partner to Deliver Agentic Security That Understands How Enterprises Work

June 29, 2026

Legion Security is now an Optiv Authorized Partner. Enterprise security teams can now deploy agentic AI for security operations that understands and optimizes agentic workflows without integrations, black boxes, or needing to ask teams to change how they work.

Marcia Dempster

I was there, I sat in every SOC seat out there…

A SOC analyst grinding through alert queues at 2am. Part of an Incident Response team running war rooms. A SOC manager in Monday morning stand-ups asking what we learned this week while staring at blank faces.

Every single role. Every single day. And the one thing that never changed across any of them?

The insights, the recommendations, the self-improvement notes, the de facto action items for SOC continuous improvement were all disappearing. They sat documented in a case log with no one to act on them, trapped inside closed tickets that live in a backlog nobody ever reopens.

I know why it happens, and I know how overwhelming operations can be, which is why I’m offering a practical solution for continuously improving your SOC with the valuable insights coming out of your investigations.

The Hidden Goldmine You're Sitting On

Every ticket your team closes tells a story. It's not just that an alert fired, then an analyst investigated and eventually closed it. There are powerful signals buried in those notes, whether it's a tool with overly noisy alerts, a gap in your email gateway rules, or the same user clicking a phishing link for the third month in a row.

Your tier 1 all the way to your tier 5 analysts and IR responders are generating intelligence every single shift and with every single incident. They know things and they're writing them down. It's useful information but these notes get buried and never read again.

It's a sad truth... I know because I've been in those weekly SOC meetings, I was running them.

It's not a people problem. It's a system problem.

The Weekly Report Trap

The thing people look to as the standard fix is the weekly report. In theory it's elegant: senior analysts summarize the week, extract the learnings, feed them back into tier 1 runbooks and detection improvements. On paper, it's a proper feedback loop.

In practice, it becomes the task that either gets rushed on Friday afternoon or simply doesn't happen. It's for good reason too! Your senior analysts are already stretched, and on top of everything their jobs demand, they're also being asked to synthesize everything into themes. You either get a half-hearted copy-paste of ticket titles, or, more likely, you get nothing.

Teams try rotation where everyone takes a turn on the ferris wheel. But in doing so, you face losing important insights and information, not to mention a lack of consistency.

Now add a follow-the-sun operation to this. APAC closes tickets while EMEA is asleep. EMEA handles incidents while Americas is offline. By the time anyone tries to compile a summary, they're working with fragments. Nobody has the full picture. The patterns that only emerge when you look across all shifts stay invisible.

Wait, Can't AI Solve This Pretty Easily?

When capable LLMs became available, I thought this was finally solved. Just feed all the investigation summaries in, ask for a weekly report. Done? Not so fast... here's what actually happened.

First attempt: I gave the best LLMs money can buy more than 250 investigation summaries and asked for a consolidated report. But what I got back was a mess.

What I saw were recommendations repeated five times with slightly different wording, severity assessments that made no sense, and, my “favorite”, recommendations that are simply not feasible, for example “Tune your EDR machine learning to reduce false positives of macro xlsx files”.

No traceability whatsoever, no way to tie anything back to the original investigation and forget about cross referencing with similar recommendations.

Second attempt: I went deep on prompt engineering. Longer prompts. More detailed. With examples. The results improved marginally, but the ceiling was surprisingly low.

The fundamental issue is that when you dump a large context with complex requirements into a single LLM call, it can't hold everything in working memory. It forgets constraints from earlier in the prompt. It hallucinates connections between unrelated incidents. Severity levels come out inconsistent.

One-shot approaches get you mediocre fast. They don't get you useful.

The Breakthrough: Think Multi-Step, Not Prompt

The shift that changed everything was to stop thinking about this as one task and start thinking about it as a multi-step pipeline.

When an experienced analyst writes a weekly report, they don't try to do it all at once. They read, they group, they prioritize, they write. Multiple steps. Each one is different.

So I built it that way.

The 6-step pipeline

Step 1: Classification

The first step does one thing and one thing only. It extracts and categorizes recommendations from raw investigation summaries. It looks for whatever your analysts call them: Recommendations, Do Better, Action Items, Next Steps. It pulls each one out and assigns it to a category: detection, prevention, or process improvement.

No dedupe. No severity. Just extraction, done well.
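To make the shape of this step concrete, here is a minimal deterministic sketch of the extraction contract: find the recommendation section under any of the known headers, pull out each bullet, and attach a category. It is an illustration only. In the pipeline described here an LLM does this work; the keyword table below is a crude stand-in for that classification, and the function names, bullet format, and ticket ID are assumptions.

```python
import re

SECTION_HEADERS = ("Recommendations", "Do Better", "Action Items", "Next Steps")

# Hypothetical keyword hints standing in for the LLM's categorization.
CATEGORY_KEYWORDS = {
    "detection": ("alert", "detect", "rule", "signature"),
    "prevention": ("block", "patch", "harden", "disable"),
}

HEADER_RE = re.compile(
    r"^\s*(%s)\s*:?\s*$" % "|".join(map(re.escape, SECTION_HEADERS)),
    re.IGNORECASE,
)
BULLET_RE = re.compile(r"^\s*[-*]\s+(.*\S)")


def categorize(text):
    """Pick the first category whose keywords appear; default to 'process'."""
    lowered = text.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return "process"


def extract_recommendations(ticket_id, summary):
    """Return one dict per bullet found under a recommendation-style header."""
    items, in_section = [], False
    for line in summary.splitlines():
        if HEADER_RE.match(line):
            in_section = True
            continue
        bullet = BULLET_RE.match(line)
        if in_section and bullet:
            text = bullet.group(1)
            items.append({"ticket": ticket_id, "text": text, "category": categorize(text)})
        elif line.strip():
            in_section = False  # any other non-empty line ends the section
    return items


summary = """User clicked a phishing link.

Do Better:
- Add a detection rule for lookalike domains
- Block macro-enabled attachments at the gateway
- Update the escalation runbook

Closed as true positive."""
recs = extract_recommendations("INC-1", summary)
```

Note that the output carries the ticket ID from the start, so later steps never have to guess where a recommendation came from.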

Step 2: Feasibility Assessment

Now we evaluate each recommendation against practical reality. Can this actually be implemented? Is it a quick win or a multi-quarter project? Does it require resources you don't have?

This is also where web search earns its keep. When a recommendation references a specific product or vendor, the model can look up current best practices, product documentation, and tech community discussions, and verify that the suggested configuration actually exists and is supported. Without this, you get generic, often infeasible advice. With it, you get grounded recommendations. Make sure to use an LLM that exposes web search through its API.

Step 3: Citation Attachment

Before touching deduplication, every recommendation gets linked back to its source investigation. This is non-negotiable for a report anyone will actually act on. When a SOC manager reads a recommendation and the SOC team attempts to implement it, they need to know which investigations triggered it, and how many, so the volume justifies the change. Otherwise it's just noise or, worse, a change that might break business operations.

Step 4: Deduplication

Three analysts work three separate investigations of the same use case, and all recommend the same prevention improvement. Without deduplication, you get three entries saying the same thing with slightly different wording. With it, you get one consolidated recommendation that shows it came from three independent investigations, which is actually a stronger signal.

Citations from all source recommendations get merged. Nothing is lost.
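The merge rule is the important part: when two recommendations are judged to be the same, their citations are unioned rather than overwritten. The sketch below shows that rule with plain string similarity from Python's standard library standing in for whatever similarity judgment the real step uses (in the pipeline described here, that judgment is made by an LLM). The 0.8 threshold, field names, and ticket IDs are assumptions for illustration.

```python
from difflib import SequenceMatcher


def deduplicate(recommendations, threshold=0.8):
    """Merge near-identical recommendations and union their source tickets.

    recommendations: list of {"ticket": str, "text": str}.
    threshold is an illustrative similarity cutoff (assumption).
    """
    merged = []
    for rec in recommendations:
        for kept in merged:
            similarity = SequenceMatcher(
                None, rec["text"].lower(), kept["text"].lower()
            ).ratio()
            if similarity >= threshold:
                # Same recommendation: keep every citation, lose nothing.
                kept["tickets"] = sorted(set(kept["tickets"]) | {rec["ticket"]})
                break
        else:
            merged.append({"text": rec["text"], "tickets": [rec["ticket"]]})
    return merged


# Hypothetical extracted recommendations from four investigations.
extracted = [
    {"ticket": "INC-101", "text": "Block macro-enabled attachments at the email gateway"},
    {"ticket": "INC-207", "text": "Block macro enabled attachments at the email gateway."},
    {"ticket": "INC-310", "text": "Rotate the service account password"},
    {"ticket": "INC-402", "text": "block macro-enabled attachments at the email gateway"},
]
consolidated = deduplicate(extracted)
```

Character-level similarity only catches rewordings that stay close to the original text; paraphrases with different vocabulary need a semantic comparison, which is where a model earns its place in this step.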

Step 5: Severity Classification

Now, with duplicates consolidated, we can assign severity levels that actually mean something. The model evaluates security impact according to your instructions, your weights, and the severities your SOC defines for each use case. The question is not how urgent the analyst felt when writing this, but what the actual risk is if it doesn't get addressed, based on your SOC knowledge base.

Separating this from extraction forces objectivity. If you try to assign severity while also pulling recommendations from raw notes, the analyst's tone bleeds in and skews the assessment.
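To illustrate why severity is easier to reason about after deduplication, here is a small sketch in which severity starts from a SOC-defined baseline per use case and is raised one level when several independent investigations back the same recommendation. This is one possible scoring rule, not the rule the pipeline uses: the baseline table, the escalation threshold, and the field names are all hypothetical.

```python
SEVERITY_ORDER = ["low", "medium", "high", "critical"]

# Hypothetical SOC-defined baseline severity per use case.
USE_CASE_BASELINE = {
    "phishing": "medium",
    "ransomware": "critical",
    "noisy-alert": "low",
}


def assign_severity(recommendation, escalate_at=3):
    """Return a severity label from the use-case baseline plus citation volume.

    recommendation: {"use_case": str, "tickets": list of source ticket IDs}.
    escalate_at is an illustrative threshold (assumption).
    """
    baseline = USE_CASE_BASELINE.get(recommendation["use_case"], "medium")
    level = SEVERITY_ORDER.index(baseline)
    if len(recommendation["tickets"]) >= escalate_at:
        # Several independent investigations is a stronger signal.
        level = min(level + 1, len(SEVERITY_ORDER) - 1)
    return SEVERITY_ORDER[level]


phishing_fix = {"use_case": "phishing", "tickets": ["INC-101", "INC-207", "INC-402"]}
phishing_severity = assign_severity(phishing_fix)
```

Because the inputs are the consolidated record and a fixed table, the analyst's tone in the original note has no path into the result.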

Step 6: Report Generation

Everything feeds into the final structure: category breakdown, feasibility assessments, severity levels, and citation references. The output is a coherent report with an executive summary and recommendations sorted by severity, with enough context to actually act on. It also compares recommendations week over week to track remediation and implementation progress for repeated action items.

Add another layer that tracks the recommendations you have decided to disregard, and you have a magnificent mechanism.

No LLM at this stage, actually. It's programmatic and deterministic. It assigns citation letters (A, B, C...) so each recommendation and its feasibility assessment can be traced to its sources, builds the reasoning section for each recommendation, and outputs clean JSON ready for whatever you want to do with it.
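Since this stage is deterministic, it is straightforward to sketch. The example below sorts recommendations by severity, assigns each source ticket a citation letter the first time it appears, and emits JSON. The field names and the JSON layout are assumptions for illustration, and this sketch only handles up to 26 distinct sources.

```python
import json
from string import ascii_uppercase

SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}


def build_report(recommendations):
    """Return a JSON string: recommendations by severity plus a citation key.

    recommendations: list of {"text", "severity", "tickets"} dicts.
    Supports at most 26 distinct tickets (letters A-Z) in this sketch.
    """
    ordered = sorted(recommendations, key=lambda r: SEVERITY_RANK[r["severity"]])
    letter_for = {}  # ticket ID -> citation letter, in order of first use
    entries = []
    for rec in ordered:
        letters = []
        for ticket in rec["tickets"]:
            if ticket not in letter_for:
                letter_for[ticket] = ascii_uppercase[len(letter_for)]
            letters.append(letter_for[ticket])
        entries.append(
            {"text": rec["text"], "severity": rec["severity"], "citations": letters}
        )
    citation_key = {letter: ticket for ticket, letter in letter_for.items()}
    return json.dumps({"recommendations": entries, "citations": citation_key}, indent=2)


# Hypothetical input from the earlier steps.
scored = [
    {"text": "Update runbook", "severity": "low", "tickets": ["INC-310"]},
    {"text": "Block macros", "severity": "high", "tickets": ["INC-101", "INC-207"]},
    {"text": "Tune rule", "severity": "high", "tickets": ["INC-207"]},
]
report = json.loads(build_report(scored))
```

The same input always produces the same letters and the same ordering, which is what makes week-over-week comparison meaningful.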

Why This Architecture Actually Works

The goal is to achieve focused context at each step. Instead of one massive prompt juggling ten objectives, each step gets only what it needs. Fewer constraints to forget.

Modular iteration is the name of the game here. When severity ratings were inconsistent, I refined only the severity prompt. When analysts switched from Recommendations to Do Better as their section header, I updated only the classification step and nothing else broke.

Inspectable intermediate outputs. Between every step, results are saved. If something looks wrong in the final report, you can trace back through the pipeline and find exactly where it broke. Debugging is possible, which is not nothing.
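Saving every intermediate result takes very little machinery. A minimal sketch, assuming each step is a plain function that takes and returns JSON-serializable data (the step names and file naming scheme here are illustrative):

```python
import json
import tempfile
from pathlib import Path


def run_pipeline(steps, data, out_dir):
    """Run steps in order, writing each step's output to its own JSON file.

    steps: list of (name, function) pairs; each function takes and returns
    JSON-serializable data.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for index, (name, step) in enumerate(steps, start=1):
        data = step(data)
        # One file per step, numbered so they sort in execution order.
        (out_dir / f"{index:02d}_{name}.json").write_text(json.dumps(data, indent=2))
    return data


# Toy steps standing in for the real extraction and deduplication stages.
toy_steps = [
    ("extract", lambda items: [item.strip() for item in items]),
    ("dedupe", lambda items: sorted(set(items))),
]
workdir = Path(tempfile.mkdtemp())
final = run_pipeline(toy_steps, [" a ", "b", "a"], workdir)
saved_files = sorted(path.name for path in workdir.iterdir())
```

When the final report looks wrong, you open the numbered files in order and stop at the first one that looks wrong too.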

Web search in the right place. Not as a general capability, but specifically in the feasibility step where it does the most work. Validating that a recommended configuration actually exists changes the quality of the output completely.

The Payoff

Your analysts don't change anything. They run the same investigations and keep the same ticket notes they're already writing. The pipeline simply runs against their existing documentation.

The output is consistent. Same structure, same categories, same severity criteria, every week. You can compare week over week and actually spot trends. You can see if the same recommendations keep surfacing, which means they're not getting actioned, which is itself a signal.
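Spotting recommendations that keep resurfacing can be as simple as counting how many weekly reports each one appears in. A minimal sketch, assuming each weekly report has already been reduced to a list of consolidated recommendation texts (the normalization here is plain lowercasing and trimming, which is an assumption; a real pipeline would match on whatever stable key its deduplication step produces):

```python
def recurring_recommendations(weekly_reports, min_weeks=2):
    """Return recommendations that appear in at least min_weeks separate reports.

    weekly_reports: list of lists of recommendation texts, one list per week.
    """
    weeks_seen = {}
    for week in weekly_reports:
        # Use a set so a recommendation counts once per week at most.
        for text in {item.strip().lower() for item in week}:
            weeks_seen[text] = weeks_seen.get(text, 0) + 1
    return sorted(text for text, count in weeks_seen.items() if count >= min_weeks)


# Hypothetical three weeks of consolidated recommendations.
weeks = [
    ["Block macros", "Tune rule"],
    ["block macros ", "Patch VPN"],
    ["Block macros", "Patch VPN"],
]
unactioned = recurring_recommendations(weeks)
```

Anything in that list is a recommendation that was raised, reported, and raised again, which is the signal that it is not being actioned.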

The feedback loop that should have existed closes automatically. Tier 2 findings reach tier 1. Detection gaps surface. The Monday morning question about what we learned has an answer.

Build it or use it

Building this right takes time. Getting prompts tuned for the variety in how analysts write, handling edge cases, making it robust across different ticketing systems. It's not weekend work.

If you want to build it yourself: start with extraction only. Get that reliable first. Then add deduplication. Then severity. Don't try to build the whole thing at once.

If you'd rather not build tooling while also running a SOC, this is exactly what we built at Legion Security. Already tuned across real SOC environments, connected to your existing ticketing system, your analysts change nothing.

Either way: stop burying the intelligence your team generates every day.

Your team is learning constantly. Those lessons deserve to surface.

Written by someone who's been the analyst, the IR lead, and the manager staring at the empty Monday morning whiteboard.

SOC

How to Keep Up With Never-Ending SOC Continuous Improvement

June 22, 2026

SOC continuous improvement fails when insights get buried in closed tickets. Learn a 6-step LLM pipeline that turns investigation notes into action.

Yaniv Menasherov

Legion Security is Now Available on Google Cloud Marketplace

Security operations were built around human investigators. Skilled analysts, working manually across dozens of tools, piecing together evidence, making judgment calls, closing cases. But as alert volumes outpaced human capacity, institutional knowledge became a bottleneck, and the complexity of the modern enterprise made scaling impossible. The industry responded with more headcount, more tools, more automation. None of it solved the fundamental problem.

Legion introduces a different operating model entirely.

What Legion Does

Legion observes how your analysts operate when running real investigations, learning your organizational context, tools, past cases, playbooks, runbooks and all other tribal knowledge in order to understand what an optimal investigation looks like for your environment. This is then turned into an easily editable and auditable workflow which can be automated when you’re ready. Powered by Google Cloud's Gemini models, each workflow is executed by AI agents that reason through the evidence, provide a verdict, and can even remediate. This is all accomplished with no manual playbook writing or need to document predefined rules.

But Legion goes well beyond workflow creation. As Legion builds trust in its performance, teams can choose to keep a human in the loop to approve every decision or have Legion operate fully autonomously, reducing MTTR and eliminating MTTA. That frees analysts to focus on the more novel investigations that are becoming increasingly common in the world of AI.

Memory: The Compounding Advantage

Every investigation Legion conducts makes it smarter. A persistent memory layer continuously captures context from previous cases, your SOC knowledge base, and direct analyst feedback, feeding all of it back into future investigations and decisions. Institutional knowledge that once lived in the heads of your most experienced analysts becomes a permanent, improving organizational asset. The more Legion works, the better it gets. That's not a feature. That's a compounding strategic advantage.

Zero Integrations. Immediate Value.

Most security automation platforms fail at the same hurdle: integrations. Enterprises face months of API work, custom connectors, and professional services before anything runs in production, or are forced to adopt entirely new tools and processes, something most complex enterprises simply can't do.

Legion operates natively in the browser, which means it works across your entire security stack, from threat intel platforms to legacy internal tools, without any API configuration. If your analysts can open it in a browser, Legion can learn from it, generate workflows from it, and execute investigations through it.

Proven Results at Scale

The impact Legion delivers isn't theoretical:

As the head of Security at Virgin Money put it, Legion is “like evolving from handcrafted systems to precision manufacturing aligned to our flow (except) faster, repeatable and secure”.

Legion works with the world's largest enterprises and delivers strong results:

A large insurance organization automated 24,000 investigations and cut mean time to respond from 20 minutes to 2 minutes.
WELL Health Technologies reduced investigation times by 81%, allowing existing analysts to handle significantly higher alert volumes without additional headcount.
The University of Tulsa cut investigation times in half, enabling their team to overcome capacity limits with the staff they already had.

Across deployments, Legion reduces mean time to investigate by up to 85% and response times by up to 90%.

Built on Google Cloud

Legion's integration with Google Cloud goes deeper than the Marketplace listing. The platform runs on Google Cloud infrastructure and leverages Gemini models to power its AI reasoning, combining Legion's browser-native architecture with Google Cloud's security, scale, and model quality.

For organizations already invested in Google Cloud and Google SecOps, Legion extends that ecosystem directly into the analyst workflow.

Who It's For

Legion is purpose-built for enterprise security operations teams: CISOs, VPs of Information Security, SOC Directors, and Security Operations Managers at organizations running in-house SOCs. If your team is dealing with any of the following, Legion was built for you:

Alert volumes that have outpaced your team's capacity
Analyst burnout from manual, repetitive investigation work
Institutional knowledge that walks out the door when senior analysts do
Automation gaps caused by complex integration requirements

Available Now on Google Cloud Marketplace

Legion Security is available today on Google Cloud Marketplace, allowing customers to apply their spend toward their annual Google contract and simplify procurement. For security teams ready to move beyond the limits of traditional operations, this is where that transformation begins.

Engineering

Legion Security Is Now Available on Google Cloud Marketplace

May 31, 2026

Legion is officially on the Google Cloud Marketplace.

Gili Diamant