Donald Knuth does not do hyperbole. The 88-year-old Stanford professor, Turing Award laureate, and author of The Art of Computer Programming is famous for precision - this is the man who once cautioned that he had only proved a piece of code correct, not tried it. So when his latest paper opens with "Shock! Shock!", the field pays attention.[1]
The paper, dated February 28, 2026 and titled "Claude's Cycles," documents something genuinely unusual: an open combinatorics problem that Knuth had been wrestling with for several weeks, solved by Anthropic's Claude Opus 4.6 before Knuth could crack it himself.[1] "What a joy it is to learn not only that my conjecture has a nice solution," Knuth wrote, "but also to celebrate this dramatic advance in automatic deduction and creative problem solving."[1]
The question arose while Knuth was drafting a future volume of The Art of Computer Programming. He was working with a specific three-dimensional directed graph: m^3 vertices labeled ijk, one for each triple of residues modulo m, with three outgoing arcs per vertex. The challenge was to decompose all the arcs of this Cayley digraph into three directed Hamiltonian cycles - closed routes that visit every vertex exactly once before returning to the start - for all odd values of m greater than two.[1]
Knuth had solved the base case at m = 3. His colleague Filip Stappers had found solutions empirically for m up to 16. But a general constructive proof - the kind that works for all odd m - remained out of reach. It was Stappers who decided to pose the problem, verbatim, to Claude.[1]
In plain terms: Knuth had an unsolved puzzle about splitting every link in a network into three round trips, each of which visits every node exactly once, and neither he nor his colleague had been able to find a recipe that works in general.
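To make the target concrete, here is a minimal Python sketch of a checker for such a decomposition. It rests on assumptions the article does not spell out: the three arcs leaving vertex ijk are taken to increment i, j, or k by one modulo m, and a candidate solution is represented by a hypothetical function `assign(v)` that returns a permutation of (0, 1, 2) saying which direction each of the three cycles takes when it leaves v. The names `step`, `is_single_cycle`, and `is_hamiltonian_decomposition` are illustrative, not taken from the paper.

```python
from itertools import product

# Assumed generators of the digraph: increment i, j, or k (mod m).
DIRECTIONS = ((1, 0, 0), (0, 1, 0), (0, 0, 1))


def step(v, d, m):
    """Follow the arc in direction d (0, 1, or 2) from vertex v, modulo m."""
    return tuple((x + dx) % m for x, dx in zip(v, DIRECTIONS[d]))


def is_single_cycle(succ):
    """True if the successor map (dict: vertex -> vertex) is one cycle through every vertex."""
    start = next(iter(succ))
    v = start
    for steps in range(1, len(succ) + 1):
        v = succ[v]
        if v == start:
            # The first return to the start must happen only after visiting everything.
            return steps == len(succ)
    return False


def is_hamiltonian_decomposition(assign, m):
    """Check a candidate: cycle c leaves vertex v along direction assign(v)[c]."""
    vertices = list(product(range(m), repeat=3))
    # Each vertex must hand its three outgoing arcs to three different cycles.
    if any(sorted(assign(v)) != [0, 1, 2] for v in vertices):
        return False
    return all(
        is_single_cycle({v: step(v, assign(v)[c], m) for v in vertices})
        for c in range(3)
    )
```

A naive candidate shows why the problem is not trivial: `is_hamiltonian_decomposition(lambda v: (0, 1, 2), 3)` is false, because a cycle that always increments the same coordinate returns to its start after only m steps instead of m^3.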
What makes "Claude's Cycles" remarkable is not just the result but the record. Knuth narrates Claude's full search process across 31 distinct explorations, each logged to a running plan.md file at Stappers's insistence. The transcript reads like a research notebook: dead ends acknowledged, hypotheses discarded, new framings proposed.[1]
Claude's early attempts were systematic but unsuccessful. It reformulated the problem in terms of permutation assignments, tried linear and quadratic ansätze, ran brute-force depth-first search (too slow), and experimented with simulated annealing. After exploration 25, it concluded: "SA can find solutions but cannot give a general construction. Need pure math."
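A back-of-the-envelope count suggests why exhaustive search stalls. The sketch below assumes one natural reading of "permutation assignments": every vertex independently picks one of the 3! = 6 ways to hand its three outgoing arcs to the three cycles. The function name `assignment_count` is hypothetical, and the figure is an upper bound on naive enumeration, not a description of what Claude's search actually explored.

```python
def assignment_count(m):
    """Number of raw per-vertex permutation assignments on the m^3-vertex graph."""
    vertices = m ** 3
    return 6 ** vertices  # 3! = 6 choices at each vertex, chosen independently


def digits(n):
    """Number of decimal digits of a positive integer."""
    return len(str(n))
```

Even at m = 3 there are 6^27 raw assignments, a 22-digit number, and the exponent grows with the cube of m. A real search prunes most of that space, but the growth rate helps explain why a closed-form construction was the goal.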
The conceptual breakthrough came at exploration 30. Claude returned to a simulated-annealing solution found earlier and noticed that the permutation choice at each "fiber" depended on only a single coordinate. That structural observation led directly to a closed-form construction in exploration 31 - a short Python program that produced valid Hamiltonian decompositions for m = 3, 5, 7, 9, and 11. Stappers then verified it for all odd m between 3 and 101. "All three cycles are Hamiltonian, all arcs are used, perfect decomposition!"[1]
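The kind of structural observation described here can be tested mechanically on any solution table. The sketch below is illustrative only: it assumes a "fiber" means the set of vertices sharing the same value of (i + j + k) mod m, which the article does not define, and it runs on a synthetic table rather than Claude's actual solution. The helper `depends_only_on` and the names `toy` and `fiber` are hypothetical.

```python
from itertools import product


def depends_only_on(assign, key):
    """True if vertices sharing the same key(v) always carry the same permutation."""
    seen = {}
    for v, perm in assign.items():
        # Remember the first permutation seen for this key; any mismatch refutes the claim.
        if seen.setdefault(key(v), perm) != perm:
            return False
    return True


M = 5
PERMS = [(0, 1, 2), (1, 2, 0), (2, 0, 1)]


def fiber(v):
    """Assumed fiber index of a vertex: the coordinate sum modulo M."""
    return sum(v) % M


# Synthetic table (NOT a real solution): the permutation is chosen from the
# fiber index and the first coordinate only.
toy = {v: PERMS[(fiber(v) + v[0]) % 3] for v in product(range(M), repeat=3)}

print(depends_only_on(toy, lambda v: (fiber(v), v[0])))  # fiber plus first coordinate
print(depends_only_on(toy, fiber))                       # fiber alone
```

Running a check like this against a table produced by simulated annealing is one way a pattern could be spotted and then promoted to a formula. It confirms a regularity in one instance; it does not prove that the regularity holds for every m.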
Knuth devotes the second half of the paper to a formal proof of why Claude's construction works - supplying the mathematical rigor the model's search process could not. The collaboration is explicit: Claude found the construction; Knuth proved it.[1]
In short: an AI found the general recipe before one of the most celebrated living computer scientists did.
Context matters here. In April 2023, Knuth posed twenty questions to ChatGPT as an experiment and published the results on his Stanford page. His conclusion was pointed: he would continue to leave AI research to others and devote his time to developing concepts that are "authentic and trustworthy."[2] That assessment was consistent with his long-held view that programming is a craft requiring precision that probabilistic systems fundamentally lack.
The about-face in "Claude's Cycles" is therefore not casual. "It seems that I'll have to revise my opinions about 'generative AI' one of these days," Knuth wrote - careful, hedged, but unmistakable.[1] When the field's most credentialed skeptic signals a revision, the signal carries weight that no benchmark leaderboard can replicate.
The model at the center of this story is not a general-purpose chatbot. Claude Opus 4.6, released in early February 2026, is Anthropic's flagship hybrid reasoning model, built around extended and adaptive thinking modes that let it scale reasoning effort to a problem's difficulty.[3] It supports a 200,000-token context window (with a one-million-token beta) and up to 128,000 output tokens - enough to hold a sprawling mathematical search process in a single session.[3] Knuth notes that the model had been available for only three weeks when it solved his problem.[1]
The episode illustrates what extended thinking architectures are designed for: not pattern-matching to a memorized answer, but iterative hypothesis generation and revision under a hard constraint. The 31-step search log is, in effect, a stress test of that capability - run not by a benchmark designer but by one of the most demanding problem-posers in computer science.
For decades, the division of labor between human mathematicians and computers has been stable: machines verify and compute; humans conjecture and construct. "Claude's Cycles" does not upend that division, but it meaningfully blurs one edge of it. Claude did not prove a theorem - but it produced a novel construction that neither Knuth nor Stappers had found, and it did so through a documented process of iterative reasoning rather than lucky retrieval.
That distinction carries practical weight. Mathematical research is bottlenecked not by computation but by the generation of good ideas - the insight that reframes a problem, the structural observation that unlocks a proof. Those moments have always required human intuition. What this episode suggests is that reasoning models can now participate in that phase of the process, at least in well-defined combinatorial settings.
The implications extend beyond pure mathematics. Fields ranging from drug discovery to circuit design to cryptography are similarly structured: a vast search space, a precise correctness criterion, and progress gated on finding the right construction. If the pattern demonstrated here generalizes, AI systems may become routine collaborators in research domains where the bottleneck has never been computation at all.
There is also a methodological lesson in how this result was produced. Stappers's insistence on logging every step to plan.md turned a one-off session into a reproducible, auditable record. That practice - treating an AI's reasoning trace as a research artifact - may become standard in computational mathematics, much as experimental protocols evolved in the natural sciences.
It would be easy to overstate what happened. Claude did not independently conjure the problem, nor did it supply a proof. The construction it found required human verification at scale (Stappers) and formal proof (Knuth). Knuth himself frames it that way: the paper is a collaboration, and its title names the AI not as sole author but as contributor.
What the episode does establish is that reasoning models can now contribute meaningfully to open mathematical research - not by retrieving known results, but by searching a large combinatorial space, recognizing structural patterns, and generating a construction that no human had previously found. That is a qualitatively different capability from summarizing papers or writing boilerplate code. Knuth, a man who chooses words with the care of a typesetter, called it a "dramatic advance." That verdict is worth taking seriously.