
When Mistral unveiled Forge at Nvidia GTC on March 17, the headline was straightforward: enterprises can now train AI models from scratch on their own proprietary data, going further than fine-tuning or retrieval-augmented generation (RAG).[1] That framing is accurate but undersells the architectural ambition. Read Mistral's own documentation and a different picture emerges: Forge is less a customization service than a bet that reliable enterprise AI agents are only possible if the underlying model was trained on the organization's own knowledge in the first place.[2]
Most enterprise AI deployments today run generic foundation models, sometimes fine-tuned, sometimes augmented with RAG pipelines. Fine-tuning adjusts a model's weights using task-specific examples but leaves the base model's knowledge distribution largely intact. RAG layers company data on top at inference time, without touching weights at all. Both approaches assume the base model is good enough to reason about your environment - it just needs to be pointed at the right documents.
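The structural difference is easy to see in miniature. The toy sketch below is purely illustrative (none of these names come from Mistral's or any vendor's API): RAG leaves the model untouched and injects retrieved company data into the prompt at inference time, while fine-tuning nudges the weights themselves toward task examples.

```python
# Toy illustration of RAG vs. fine-tuning; all names are hypothetical.

def rag_answer(base_model, query, document_store):
    """RAG: weights untouched; company data is prepended at inference time."""
    context = [doc for doc in document_store if query.lower() in doc.lower()]
    prompt = "\n".join(context) + "\n" + query
    return base_model(prompt)  # same generic model, augmented prompt

def fine_tune(weights, examples, lr=0.1):
    """Fine-tuning: nudge existing weights toward task-specific examples,
    leaving the rest of the base knowledge distribution intact."""
    for key, target in examples:
        weights[key] = weights.get(key, 0.0) + lr * (target - weights.get(key, 0.0))
    return weights

# Ten gradient-like updates move one weight most of the way to its target.
tuned = fine_tune({"internal_tool": 0.0}, [("internal_tool", 1.0)] * 10)
```

Pre-training from scratch, by contrast, would determine the initial weight distribution itself, which is the stage Forge opens up to enterprises.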
Mistral's argument is that this assumption breaks down for agents. When an AI agent must navigate internal systems, select the right tools, and make decisions constrained by organizational policy, a generic reasoning substrate produces generic errors: wrong terminology, misunderstood workflows, tool calls that don't account for internal abstractions. "Tool selection becomes more precise. Multi-step workflows become more reliable," Mistral's announcement states, but only when the model was trained on the organization's own codebases, compliance frameworks, and operational records - not on the public internet.[2]
To make this concrete, Forge supports three stages of the model lifecycle: pre-training on large internal datasets to build domain awareness; post-training to refine behavior for specific tasks; and reinforcement learning to align agents with internal policies and evaluation criteria. The RL component is notable: it enables continuous improvement as enterprise environments evolve, rather than a one-time training run that ages out as regulations or systems change.[2]
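The RL stage reduces, in its simplest form, to treating internal policy as a reward signal. The sketch below is a deliberately minimal bandit-style loop, not Forge's actual training method: behavior that scores well against policy gets reinforced, behavior that violates it gets penalized, and the loop can keep running as the policy itself changes.

```python
# Toy sketch (not Forge's actual RL method): reward-weighted updates that
# nudge an agent's preferences toward policy-compliant behavior.
import random

random.seed(0)
prefs = {"compliant": 0.0, "noncompliant": 0.0}

def policy_reward(action):
    """Internal policy expressed as a reward function."""
    return 1.0 if action == "compliant" else -1.0

for _ in range(200):
    # Sample a behavior, score it against policy, reinforce accordingly.
    action = random.choice(list(prefs))
    prefs[action] += 0.1 * policy_reward(action)

# After training, the compliant behavior dominates the preference scores.
```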
Forge supports both dense transformer architectures and mixture-of-experts (MoE) models, giving enterprises a choice between maximum general capability and compute efficiency. MoE models can match the capability of a dense model at significantly lower inference cost - relevant for enterprises that want to run a large customized model in production without the latency or cost of a comparably sized dense network.[2] Multimodal training is also supported, meaning organizations with large libraries of engineering diagrams, technical schematics, or image-annotated records can incorporate non-text data into training.
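The cost argument for MoE comes down to sparse activation: a router sends each token to only a few experts, so most of the network's parameters sit idle on any given forward pass. The following is a simplified numeric sketch, not a description of Mistral's architecture; the gating scores and expert functions are invented for illustration.

```python
# Hypothetical sketch of mixture-of-experts routing: only the top-k experts
# run per token, which is why MoE inference is cheaper than a dense layer
# of comparable total parameter count.
import math

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_layer(token, experts, gate_scores, k=2):
    """Run only the k highest-scoring experts; a dense layer would run all."""
    probs = softmax(gate_scores)
    top = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:k]
    # Renormalize over the selected experts and mix their outputs.
    norm = sum(probs[i] for i in top)
    return sum(probs[i] / norm * experts[i](token) for i in top)

# Four experts, but only two ever execute for this token.
experts = [lambda x, scale=s: x * scale for s in (1.0, 2.0, 3.0, 4.0)]
out = moe_layer(10.0, experts, gate_scores=[0.1, 0.2, 2.0, 0.3], k=2)
```

With k=2 of 4 experts active, roughly half the expert parameters are touched per token; production MoE models push that ratio much further, which is the efficiency Mistral is pointing at.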
Forge customers build on Mistral's open-weight model library, which includes smaller models like the recently released Mistral Small 4.[1] "The trade-offs that we make when we build smaller models is that they just cannot be as good on every topic as their larger counterparts," said Timothée Lacroix, Mistral co-founder and CTO, "and so the ability to customize them lets us pick what we emphasize and what we drop."[1] The model and infrastructure choices remain with the customer; Mistral advises but does not dictate.
One of the more striking design choices Mistral documents: Forge was built for AI agents to use directly, not just human engineers. Mistral's own Vibe agent can operate Forge autonomously - fine-tuning models, optimizing hyperparameters, scheduling training jobs, and generating synthetic data to improve evaluation scores - using plain English instructions.[2] The implication is that Forge is intended to sit inside a continuous AI development loop, not just serve as a one-time onboarding service.
That loop is reinforced by Forge's evaluation framework, which lets teams test models against internal benchmarks, compliance rules, and domain-specific tasks before deployment, then iterate using feedback from live operational workflows. For regulated industries where requirements change frequently - financial compliance, government policy, defense procurement standards - this matters more than any single model's initial benchmark performance.
Forge is also a positioning statement about data control. Enterprises building on Forge retain ownership of the trained model, can govern it using internal policy frameworks, and can deploy it within their own infrastructure rather than routing sensitive data through Mistral's cloud.[2] Early customers reflect exactly the sectors where this is non-negotiable: Singapore's DSO National Laboratories and Home Team Science and Technology Agency (HTX), the European Space Agency, Ericsson, and ASML - the Dutch semiconductor equipment maker that led Mistral's Series C at an €11.7 billion valuation.[1]
For government and defense clients, routing sensitive workloads through a third-party model API is often not a legal option. Forge's sovereignty framing gives Mistral a credible answer to procurement requirements that OpenAI and Anthropic, with their US cloud dependencies, cannot easily match in European and Asian government markets.
Forge also ships with Mistral's forward-deployed engineering (FDE) team embedded directly with customers - a model borrowed from Palantir and IBM, in which expert service and product are inseparable. "Understanding how to build the right evals and making sure that you have the right amount of data is something that enterprises usually don't have the right expertise for," said Elisa Salamanca, Mistral's head of product, "and that's what the FDEs bring to the table."[1] A company that has trained a production model on its own data, with Mistral's engineers embedded throughout that process, is not easily migrated to a competitor. The service relationship is the moat.
CEO Arthur Mensch says Mistral is on track to surpass €1 billion (approximately $1.1 billion) in annual recurring revenue this year, built almost entirely on enterprise clients while OpenAI and Anthropic led in consumer adoption.[1] Forge is the product that most clearly explains how: Mistral has no plausible path to winning on frontier benchmark scores against OpenAI's compute budget. Its path runs through sectors - European government, regulated finance, critical manufacturing infrastructure - where sovereignty, compliance, and deep domain specificity matter more than raw capability rankings.
The agentic framing sharpens that bet further. If the next two years see enterprises shift from AI assistants to AI agents embedded in operational workflows, the question of whether those agents reason correctly about internal terminology and constraints will become acute. Generic models may struggle. A model trained on forty years of ASML's semiconductor engineering documentation, fine-tuned on its internal tooling standards, and continuously improved with RL feedback from its own engineering teams will not.