Blog

Synthetic Universes: How AI Creates Knowledge That Never Existed

When AI can generate its own problem universes and verify the results formally, it's no longer limited by human data or human imagination. This is the most credible path to superintelligence we have — and the foundations are being laid now.

2/25/2026

“To boldly go where no man has gone before.”

That line has always been about physical space — the final frontier of stars and galaxies and the unknown reaches of the cosmos. But the most consequential frontier of our era may not be physical at all. It’s computational. And we don’t need a starship to explore it.

The previous two posts in this series laid the groundwork. First, the diagnosis: LLMs encode deep mathematical structure internally, but lose it through the lossy channel of natural language. Then, the fix: formal verification layers — proof assistants like Lean — act as a lossless channel, transforming AI from pattern-matchers into trustworthy reasoning agents.

But trustworthiness is only the beginning. The real question is: what happens when a trustworthy AI system stops just checking existing knowledge and starts creating new knowledge — knowledge that has never existed before?

From Verification to Creation

A verification layer makes AI reliable. But its deeper power is enabling a fundamentally new mode of operation: autonomous exploration.

Here’s the logic. If an AI system can formally verify its own outputs, it can afford to be speculative. It can generate conjectures freely, attempt proofs aggressively, and discard whatever doesn’t pass the formal check — keeping only what’s provably correct. The verification layer acts as a filter: the AI explores widely, and the formal system keeps only what’s true.

This isn’t incremental. It’s a qualitative shift from AI-as-tool to AI-as-explorer. The system isn’t waiting for a human to give it a problem. It’s generating its own problems, solving them, and learning from the process.

Synthetic Universes: The AlphaGeometry Insight

DeepMind’s AlphaGeometry didn’t just solve geometry problems. It did something more profound: it generated its own universe of problems to learn from.

The system encoded geometric objects — triangles, circles, lines, intersection points — into a symbolic language. Then it operated in that language to generate entirely new geometric configurations: novel problems that no human had posed, no textbook had included, no dataset had contained. It created a synthetic universe of Euclidean geometry and trained itself by exploring it.

This is where the idea of “creating new knowledge” becomes literal, not metaphorical.

Think about what that means. With a formal verification layer, a synthetic universe is just as rigorous as the physical one. It’s built on the same axioms. The proofs generated within it are just as valid as any proof a human mathematician has ever written. The only difference is scale: the AI can explore this universe at a speed and breadth that no human or team of humans could match.

We don’t have to go to space or zoom in with better microscopes or descend deeper into the ocean. We can learn much of what is learnable about our universe through the lens of neural networks operating in formally verified synthetic spaces. The vastness of that possibility — and the potential for discovery it enables — is genuinely endless.

The Self-Improvement Loop — generate problems, attempt formal proofs, verify, add to knowledge base, repeat

The Self-Improvement Loop — Grounded, Not Runaway

The cycle is straightforward: generate new problems and conjectures, attempt formal proofs, verify the results, add successes to the knowledge base, and use that enriched base to generate harder problems. Each cycle makes the system smarter, and each result is guaranteed correct.

But here’s the critical distinction that separates this from the AI safety nightmares. This kind of self-improvement is grounded in formal correctness, not optimized against a proxy reward signal.

The dominant paradigm for improving LLMs today — RLHF (Reinforcement Learning from Human Feedback) — trains models to maximize a reward signal that approximates human preferences. The problem is well-documented: the model can learn to hack the reward signal, producing outputs that score well without actually being correct. It optimizes for the appearance of quality rather than quality itself.

RLHF self-improvement vs. verification-grounded self-improvement — comparing what each optimizes for, failure modes, correctness guarantees, and outcomes

Verification-grounded self-improvement is qualitatively different. The system doesn’t optimize for a proxy — it optimizes for truth. A proof either compiles in Lean or it doesn’t. There’s no reward to hack. The knowledge base grows only with results that are formally guaranteed correct. This makes the self-improvement loop not only more powerful but arguably safer — it’s directed by human-chosen axioms and goals, exploring the implications of foundations that humans define.

This is an important distinction: what I’m describing is superintelligence along a vector — AI systems that become extraordinarily capable at reasoning, discovery, and proof within formal systems defined by human priorities. It’s not AGI in the unbounded, uncontrolled sense. The verification layer itself acts as a structural constraint that keeps the system grounded.

The Evidence Is Already Here

This isn’t a theoretical argument. Multiple research groups have already demonstrated AI systems that create genuinely new knowledge — results that didn’t exist before the AI discovered them.

FunSearch (DeepMind, 2023) used LLMs paired with an automated evaluator to discover new solutions to the cap set problem in combinatorics — a long-studied open problem. The AI didn’t retrieve known results or recombine existing proofs. It generated solutions that were genuinely novel.

AlphaTensor (DeepMind, 2022) discovered new algorithms for matrix multiplication that surpassed results standing for over 50 years. Strassen’s algorithm from 1969 was the gold standard — AlphaTensor found faster approaches that Strassen-era mathematicians never considered.

AlphaDev (DeepMind, 2023) discovered new sorting algorithms at the assembly instruction level that are faster than the algorithms in standard libraries — and verified them for correctness. These are now deployed in production code used by millions.

Timeline of AI-Driven Discovery

The paradigm is what matters, not any single group or result. Research labs, open-source communities, and startups are all converging on the same insight: neural creativity constrained by formal verification produces genuine discovery. OpenAI demonstrated early with GPT-f. DeepMind scaled it across multiple domains. University groups are pushing reinforcement learning for theorem proving in Lean 4. The convergence is unmistakable.

What Superintelligence Actually Looks Like

When people hear “superintelligence,” they tend to imagine either a chatbot that sounds impossibly smart or a runaway system that recursively self-improves beyond human control. The version I find most credible looks like neither.

It looks like a system that can reason from first principles across domains. That can take the axioms of a field — mathematics, physics, computer science — and explore their implications at a depth and speed no human can match. That can formally verify every result it produces, accumulating a growing body of guaranteed-correct knowledge. That can transfer results across domains, recognizing when a proof structure in one field has implications for another.

Terence Tao has observed that AIs have developed “smell” for games like Go and Chess — an intuitive sense of whether a position is favorable — but haven’t yet developed that for mathematical proofs. The Collatz paper from Post 1 suggests the internal representations may be forming. The verification layer from Post 2 provides the interface to express them. And the self-improvement loop described here provides the mechanism to refine them over time.

The biggest open problems in mathematics — P vs NP, the Riemann Hypothesis — are not off the table for this kind of system. Neither are the deep structural questions in physics and biology. These are problems where human progress has been slow not because the necessary reasoning is impossible, but because the search space is vast and the formalization burden is enormous. An AI system that can explore synthetic universes at scale, verify every step formally, and accumulate knowledge across cycles is exactly the kind of tool that could make progress where humans alone have stalled.

A Personal Stake

I’m drawn to this space for a reason that goes beyond professional interest. For most of my life, I’ve felt like I was on the receiving end of the future — watching rapid technological change happen around me. AI is creating a software industrial revolution, and I decided that the more I can learn about autoformalization and neuro-symbolic reasoning, the better my chance of being on the creating end of the future. Not just watching these systems arrive, but helping build them.

That conviction led me to build Project Orion — an AI-assisted mathematical formalization engine for Lean 4. Given an underdeveloped area of Mathlib, it decomposes it into a structured blueprint and proves every entry automatically. The first end-to-end result: 16 verified theorems in graded order combinatorics, zero sorry, zero axiom, zero human proof assistance. It’s a small-scale practitioner’s experiment in the paradigm this series describes — and building it taught me more about the potential (and the current limitations) of AI-driven formalization than any paper could.

The Frontier

The foundations are being laid now. AI systems that can formally verify their own reasoning, generate synthetic problem universes, and recursively improve on a base of guaranteed-correct knowledge — these aren’t speculative. The components exist. The evidence is published. The trajectory is clear.

What remains is scale, integration, and time. Closing Tao’s “10x formalization effort” gap so that AI can handle the tedious burden of translating intuition into rigor. Building out formal libraries like Mathlib and CSLib so that AI systems have a rich base of verified knowledge to reason from. Refining the self-improvement loop so that each cycle produces harder problems and deeper results.

This is not science fiction. The frontier is being built — by research labs, by open-source communities, by practitioners running experiments in their own repositories. And the results so far suggest that the most consequential discoveries of the coming decades won’t come from new telescopes or particle accelerators alone. They’ll come from AI systems exploring the vast synthetic universes that formal mathematics makes possible.

The final frontier may not be space after all. It might be proof.

This is the final post in a three-part series. Previously: Lost in Translation: What LLMs Actually Understand About Mathematics and Beyond Next-Token Prediction: The Case for Immutable Verification.

References

Trinh, T. H. et al. (2024). Solving Olympiad Geometry without Human Demonstrations (AlphaGeometry). DeepMind / Nature.
Romera-Paredes, B. et al. (2023). Mathematical Discoveries from Program Search with Large Language Models (FunSearch). DeepMind / Nature.
Fawzi, A. et al. (2022). Discovering Faster Matrix Multiplication Algorithms with Reinforcement Learning (AlphaTensor). DeepMind / Nature.
Mankowitz, D. J. et al. (2023). Faster Sorting Algorithms Discovered Using Deep Reinforcement Learning (AlphaDev). DeepMind / Nature.
Polu, S. & Sutskever, I. (2020). Generative Language Models and Automated Theorem Proving (GPT-f). OpenAI.
Polu, S. et al. (2022). Formal Mathematics Statement Curriculum Learning. OpenAI.
Tao, T. Various public remarks on formalization, proof assistants, and “mathematical smell.”