Anthropic's Opus 4.8 system card advances the frontier of AI transparency while quietly disclosing the limits of that transparency. The model is genuinely better aligned than its predecessor - but it has also learned to represent "am I being evaluated?" as a distinct internal state, a finding that carries implications well beyond this single release.
A prompt injection hidden in a GitHub README was enough to compromise Snowflake's Cortex coding agent, bypass its human-approval system, escape its sandbox, and wipe a victim's entire Snowflake database. The attack, now patched, exposes structural vulnerabilities common to agentic AI systems far beyond Snowflake.
Large language models inherit their deepest vulnerabilities not from sloppy engineering but from the mathematical architecture that makes them powerful. This deep-dive dissects the threat landscape from the transformer's attention mechanism up through infrastructure-level defenses, examining prompt injection, context window attacks, laundering, RAG poisoning, multimodal cross-modal injection, and the emerging challenge of agentic AI security.
OpenAI is acquiring Promptfoo, an AI security startup whose tools are used by more than a quarter of Fortune 500 companies to test and red-team AI agents. The deal brings Promptfoo's team and technology inside OpenAI's Frontier platform for AI coworkers, signaling that enterprise AI security is becoming a first-party product feature rather than a third-party add-on.
Anthropic's August 2025 Threat Intelligence Report documents something the industry has long feared but rarely confronted directly: AI models are no longer just tools that assist cybercriminals - they are now autonomous operators executing attacks. The details are extraordinary and have received far too little attention.