
There is a reasonable intuition that transparency in AI should track capability: the more powerful and widely deployed a model becomes, the more its developers should disclose about how it was built, what it was trained on, and what risks it poses. The 2025 Foundation Model Transparency Index, a joint project from researchers at Stanford, UC Berkeley, Princeton, and MIT, finds the opposite.[1] Average disclosure scores across major AI developers fell from 58 to 40 in a single year - erasing the gains of the prior edition and returning to near-2023 levels - and the models scoring lowest on transparency are, by every external measure, among the most capable in the world.[1]
The Foundation Model Transparency Index (FMTI) assesses major AI developers across 100 indicators spanning three domains: upstream practices (training data, compute, labor), model characteristics (capabilities, risks, evaluations), and downstream disclosures (usage policies, deployment context, impact data).[1] The researchers spend roughly a year gathering information: reviewing published documentation, soliciting transparency reports directly from companies, and, for companies that decline to submit reports, manually assembling disclosures from public sources. The 2025 edition expanded coverage to 13 companies, including Alibaba, DeepSeek, and xAI for the first time.[1]
The index is imperfect - companies that submit their own transparency reports score considerably higher than those assessed purely from external information, which introduces a participation incentive that can distort rankings. The researchers acknowledge this directly.[1] But even accounting for that asymmetry, the directional finding is difficult to dismiss: across nearly every dimension, the industry collectively disclosed less in 2025 than it did in 2024.
The score distribution is stark. IBM leads the index at 95 out of 100 - the highest score in FMTI history - disclosing information on six indicators that no other company touches.[1] At the other end, xAI and Midjourney score 14. The companies that occupy the consequential middle - Anthropic, Google, OpenAI, Meta, Amazon - cluster together in a way the researchers find analytically significant: the five Frontier Model Forum members assessed in the index land in the exact middle of the rankings.[1]
The researchers' interpretation is pointed: these companies appear to share an incentive to avoid particularly low scores, while lacking any market or regulatory pressure to achieve high ones. The result is a kind of coordinated mediocrity - not bad enough to attract censure, not transparent enough to be genuinely useful to external auditors, policymakers, or the public.[1] Meta's score was cut in half year-over-year; Mistral's fell by more than two-thirds.[1] OpenAI dropped 14 points; Google, 6.[1]
The specific gaps are revealing. Training data - what a model was trained on, where that data came from, whether it overlaps with evaluation benchmarks - remains almost entirely opaque across the industry.[1] Training compute disclosure is similarly absent, especially for the most resource-intensive models. Environmental impact data is, for practical purposes, nonexistent: the FMTI finds "little to no information" on this dimension across the field, even as Stanford's broader AI Index documents that Grok 4's training alone emitted an estimated 72,816 tons of CO2 equivalent.[2] Companies disclose capability evaluations, but rarely provide enough methodological detail for those evaluations to be independently reproduced.[1]
The naive reading is competitive pressure: as models become more commercially valuable, the information that goes into building them becomes more strategically sensitive. Training data curation, compute optimization, and fine-tuning approaches represent genuine intellectual property, and no company wants to hand competitors a roadmap.
But that argument has limits. IBM - an enterprise-focused company competing in many of the same markets - scores 95, suggesting the competitive-sensitivity explanation is at least partly a choice rather than a constraint. The FMTI research team identifies a more structural explanation: enterprise-facing companies have clients who demand transparency as a procurement condition, while consumer-facing AI companies do not face equivalent pressure from their user base.[1] When accountability is voluntary and the customers are individuals rather than institutions, opacity becomes the path of least resistance.
The FMTI's own data offers two concrete signals worth taking seriously. Companies that prepare and submit their own transparency reports score roughly twice as high as those assessed from public information alone - a gap the researchers treat not as a methodological flaw but as evidence that disclosure responds to structured demands rather than internal goodwill.[1] Signatories to the EU AI Act's General Purpose AI Code of Practice score modestly higher overall, with the advantage driven mainly by stronger downstream disclosures on usage policies and deployment context rather than any meaningful opening up on training data.[1] Taken together, the data makes a narrow but important point: the floor rises when someone sets one.
The harder question is who that someone will be. Mandatory disclosure requirements have precedent in adjacent industries - financial reporting, pharmaceutical clinical trials, environmental impact assessments - and in each case, voluntary frameworks ran for years before regulators concluded that goodwill was insufficient and made disclosure mandatory. AI transparency appears to be tracing the same arc, only faster and with higher stakes. The 2024 FMTI scores improved when the industry was smaller, more scrutinized, and more anxious about its public standing. The 2025 decline coincides with a period in which frontier labs have grown more commercially entrenched and politically connected.
The U.S. regulatory environment has moved in the opposite direction. Several state-level AI transparency bills have stalled or been preempted, and federal action on foundation model disclosure remains absent. The Stanford AI Index notes that public trust in the U.S. government to regulate AI sits at just 31 percent - the lowest of any country surveyed.[2] That distrust may be warranted; it is also, at this moment, convenient for the companies whose disclosure scores are falling.
The FMTI's longitudinal data now spans three years, enough to see a pattern: transparency improved when the industry was smaller and under greater scrutiny, and has declined as the stakes - and the political headwinds against regulation - have grown. The index exists, its authors note, to give policymakers, journalists, clients, and investors the information to hold developers accountable. Whether any of those actors will use it is a separate question.