Machine Learning Explained: A Guide to ML, AI, & Deep Learning
A breakdown of Machine Learning (ML), its relationship with AI and Deep Learning, and its core paradigms: supervised, unsupervised, and reinforcement learning. The summary explores classic models and connects them to modern applications like Large Language Models (LLMs) and Reinforcement Learning with Human Feedback (RLHF).
Machine Learning (ML) is a subset of Artificial Intelligence (AI) that focuses on algorithms that learn patterns from training data to make accurate inferences about new, unseen data. It sits within a hierarchy where AI is the broadest field, ML is a subfield of AI, and Deep Learning (DL)—which uses neural networks with many layers—is a subfield of ML.
The central premise of ML is model training, a process in which a model's performance is optimized on a dataset that resembles its real-world task. A well-trained model can then apply the patterns it has learned to infer correct outputs for new data. Running the trained model to make predictions on new, live data is called AI inference.
The Three Learning Paradigms
Most machine learning can be grouped into three main paradigms:
1. Supervised Learning
Supervised learning trains a model to predict a correct output using labeled examples, often referred to as "ground truth." This process typically requires a human to provide the correctly labeled data.
⦁ Regression Models: Predict continuous numerical values, such as price predictions or temperature forecasts.
⦁ Linear Regression: Finds the best-fit straight line through data points.
⦁ Polynomial Regression: Captures non-linear relationships in the data.
⦁ Classification Models: Predict discrete classes or categories.
⦁ Binary Classification: Assigns an item to one of two categories (e.g., spam or not spam).
⦁ Multi-class Classification: Assigns an item to one of many categories.
⦁ Multi-label Classification: Assigns multiple relevant tags or labels to a single item.
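To make the taxonomy concrete, here is a minimal scikit-learn sketch of these supervised models; the toy data, polynomial degree, and feature choices are illustrative assumptions, not from the original article.

```python
# Minimal supervised-learning sketch with scikit-learn (toy data, illustrative only).
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

# Regression: predict a continuous value (e.g., a price) from one feature.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([1.1, 4.2, 8.9, 16.1, 24.8])            # roughly quadratic

linear = LinearRegression().fit(X, y)                 # best-fit straight line
poly = make_pipeline(PolynomialFeatures(degree=2),    # captures non-linearity
                     LinearRegression()).fit(X, y)
print(linear.predict([[6.0]]), poly.predict([[6.0]]))

# Binary classification: spam (1) vs. not spam (0) from two toy features.
X_cls = np.array([[0.1, 0.9], [0.8, 0.2], [0.2, 0.8], [0.9, 0.1]])
y_cls = np.array([0, 1, 0, 1])
clf = LogisticRegression().fit(X_cls, y_cls)
print(clf.predict([[0.85, 0.15]]))                    # predicted class label
```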
Modern techniques often use ensemble methods, which combine multiple models to achieve higher accuracy.
A related approach is semi-supervised learning, which uses a small amount of labeled data along with a large pool of unlabeled data. This method allows the model to generalize from the labeled examples to the unlabeled data, reducing the need for costly and time-consuming data labeling.
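As a hedged illustration of this idea, scikit-learn's LabelPropagation accepts unlabeled points marked with -1 and spreads labels from the labeled examples to them; the toy data below is an assumption for demonstration.

```python
# Semi-supervised sketch: a few labeled points plus unlabeled ones (marked -1).
import numpy as np
from sklearn.semi_supervised import LabelPropagation

X = np.array([[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]])
y = np.array([0,    -1,    -1,     1,    -1,    -1])   # -1 = unlabeled

model = LabelPropagation().fit(X, y)
print(model.transduction_)   # inferred labels for all six points
```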
2. Unsupervised Learning
Unsupervised learning works with unlabeled data to discover hidden structures and patterns on its own.
⦁ Clustering: Groups similar items together.
⦁ K-Means Clustering: Assigns items to a pre-determined number (k) of groups by repeatedly calculating group averages (centroids) until they stabilize. This is useful for tasks like customer segmentation (e.g., bargain hunters, loyal customers).
⦁ Hierarchical Clustering: Builds a tree of clusters by starting with each item as its own group and progressively merging the most similar groups. This allows for creating broad or fine-grained clusters depending on where the tree is "cut," which is useful for organizing IT tickets into themes.
⦁ Dimensionality Reduction: Reduces the complexity of data by representing it with a smaller number of features while retaining meaningful characteristics. This is often used for data preprocessing, compression, and visualization. Common algorithms include Principal Component Analysis (PCA) and autoencoders.
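A minimal sketch of both clustering and dimensionality reduction, assuming made-up two-feature "customer" data:

```python
# Unsupervised sketch: K-Means clustering and PCA with scikit-learn.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
bargain = rng.normal([20, 10], 2, size=(50, 2))   # low spend, many visits
loyal = rng.normal([80, 3], 2, size=(50, 2))      # high spend, few visits
X = np.vstack([bargain, loyal])

kmeans = KMeans(n_clusters=2, n_init=10).fit(X)   # k is chosen up front
print(kmeans.cluster_centers_)                    # stabilized centroids

pca = PCA(n_components=1).fit(X)                  # compress 2 features to 1
print(pca.explained_variance_ratio_)              # variance retained
```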
3. Reinforcement Learning (RL)
In reinforcement learning, an agent interacts with an environment. The agent observes the current state, chooses an action, and receives a reward or penalty from the environment. Through trial and error, the agent learns a policy that maximizes its long-term rewards.
A key challenge in RL is balancing exploration (trying new actions) with exploitation (repeating actions that have worked well in the past). A classic example is a self-driving car, where the state comes from GPS and cameras, actions are steering and braking, and rewards are given for safe progress while penalties are applied for hard braking or collisions.
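A toy epsilon-greedy bandit illustrates the exploration-exploitation balance; the reward probabilities and epsilon value are illustrative assumptions.

```python
# Exploration vs. exploitation: an epsilon-greedy agent on a toy 3-armed bandit.
import random

true_reward_prob = [0.2, 0.5, 0.8]   # unknown to the agent
estimates = [0.0, 0.0, 0.0]
counts = [0, 0, 0]
epsilon = 0.1                        # 10% of the time: explore

for step in range(10_000):
    if random.random() < epsilon:
        action = random.randrange(3)                 # explore: try anything
    else:
        action = estimates.index(max(estimates))     # exploit: best so far
    reward = 1.0 if random.random() < true_reward_prob[action] else 0.0
    counts[action] += 1
    # Incremental average: move the estimate toward the observed reward.
    estimates[action] += (reward - estimates[action]) / counts[action]

print(estimates)   # should approach [0.2, 0.5, 0.8]
```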
From Classic ML to Modern Applications
Techniques like regression, classification, and clustering a...
Marc Andreessen and Ben Horowitz on the State of AI
A discussion with Marc Andreessen and Ben Horowitz on the true nature of AI creativity, the limitations of intelligence in leadership, why the current AI boom is not a bubble, and the coming platform shifts and geopolitical race in robotics.
On AI, Creativity, and Intelligence
A common critique of Large Language Models (LLMs) is that they cannot produce genuinely new ideas or creative works, but rather remix their training data. Marc Andreessen challenges this by questioning the definition of human creativity and intelligence itself. He argues that true conceptual breakthroughs are exceedingly rare in human history; most progress, whether in technology or the arts, is the result of decades of prior work and "remixing" existing ideas. Even a genius like Beethoven was heavily influenced by his predecessors. The standard for AI, therefore, should not be a mythical ideal of pure invention but whether it can match or exceed the innovative capacity of the vast majority of humans. If a model can clear the bar of 99.99% of humanity, it represents a monumental leap.
Ben Horowitz echoes this sentiment from the world of music. Through his work with hip-hop legends, he notes that true "conceptual innovators" like Rakim or Dr. Dre are incredibly rare, representing a tiny fraction of all artists. Most artists, particularly in hip-hop, are interested in AI as a powerful creative tool, seeing it as a natural extension of their own methods of sampling and reinterpreting existing music to create something new.
Intelligence, Leadership, and Theory of Mind
The conversation challenges the assumption that superior intelligence inevitably leads to power or control. Andreessen points out the fallacy in this thinking by observing the world around us: "the PhDs all work for MBAs." While IQ shows a positive correlation (around 0.4 in social sciences) with successful life outcomes, it fails to explain the majority of what drives success and leadership.
Leadership requires a different set of skills beyond raw intellect. Horowitz emphasizes qualities like emotional understanding, courage, motivation, and the ability to navigate difficult conversations—seeing decisions through the eyes of the team rather than just one's own.
This leads to the concept of Theory of Mind: the ability to model the mental state of others. Andreessen highlights a fascinating finding from the U.S. military: a leadership problem arises if a leader's IQ is more than one standard deviation away from their subordinates, in either direction. A leader who is significantly smarter than their team can lose their "theory of mind" for them, becoming unable to model their thought processes and connect effectively. This suggests a superintelligent AI with a 1000 IQ might be too "alien" to manage human systems effectively.
Further, human cognition is not a disembodied process. Andreessen argues against mind-body dualism, suggesting that intelligence is a full-body experience involving everything from our gut biome to hormones. Today's AIs are "disembodied brains," and the true robotics revolution will begin when AI is integrated into physical forms that can experience and learn from the world.
The "AI Bubble" and Market Fundamentals
Ben Horowitz asserts that we are not in an AI bubble precisely because the question is still being debated. A true bubble is a psychological phenomenon characterized by "capitulation," where everyone, including skeptics, comes to believe it is not a bubble. Unlike the dot-com era where market size had to catch up to valuations, the AI space is currently characterized by immense, tangible short-term demand.
Andreessen brings the discussion back to two ground-truth fundamentals:
1. Does the technology actually work? Yes, it delivers on its promise.
2. Are customers paying for it? Yes, they are.
As long as these two conditions hold, the market is grounded in reality, not hype.
Platform Shifts and the Future of UX
While the current battle appears to be between incumbents like Google and new entrants like OpenAI, the ultimate product form factors for AI are still unknown. Andreessen draws a historical parallel to the...
The Mathematical Foundations of Intelligence [Professor Yi Ma]
Professor Yi Ma presents a unified mathematical theory of intelligence built on two principles: parsimony and self-consistency. He challenges the notion that large language models (LLMs) understand, arguing they are sophisticated memorization systems, and demonstrates how architectures like the Transformer can be derived from the first principle of compression.
Professor Yi Ma proposes that intelligence, both natural and artificial, can be understood scientifically through a mathematical framework built upon two fundamental principles: parsimony and self-consistency. This perspective aims to clarify common misunderstandings about AI, explain the true nature of deep learning models, and outline what is required to build genuinely intelligent systems.
The Two Pillars of Intelligence
The core of intelligence, particularly the kind responsible for forming memory and world models, revolves around discovering what is predictable and structured in the external world.
1. Parsimony (Compression): The first principle is the relentless pursuit of simplicity. Intelligence is the process of compressing high-dimensional sensory data to find its intrinsic low-dimensional structure. This is not merely data compression in a technical sense, but the fundamental act of extracting knowledge. Mechanisms like denoising, dimensionality reduction, and identifying statistical correlations are all manifestations of this principle. As Einstein said of science, the goal is to "make things as simple as possible, but not any simpler."
2. Self-Consistency (Closed-Loop Learning): The second principle, "not any simpler," ensures the learned model is faithful to reality. An intelligent agent must continuously use its internal model to predict future states, compare those predictions with new observations, and use any discrepancy (error) to correct and refine the model. This creates a closed feedback loop, an idea central to cybernetics. This process allows for the model to become increasingly accurate and self-consistent with the world, enabling continual and lifelong learning without direct supervision on the "ground truth" error in the data space. The low-dimensional nature of the world's data is what makes this closed-loop correction possible.
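A toy numpy sketch of such a closed loop, under the assumption of a one-parameter linear world model: the agent predicts the next observation, measures the discrepancy, and corrects its internal model, with no supervision beyond the observation stream itself.

```python
# Closed-loop sketch: predict, observe, and correct from the prediction error.
import numpy as np

rng = np.random.default_rng(0)
true_w = 0.9      # the world's actual (unknown) dynamics
w_hat = 0.0       # the agent's internal model
lr = 0.05

for t in range(2000):
    x = rng.normal()                          # current observed state
    prediction = w_hat * x                    # internal model predicts next state
    observed = true_w * x + rng.normal(0, 0.01)
    error = observed - prediction             # discrepancy with the new observation
    w_hat += lr * error * x                   # close the loop: refine the model

print(w_hat)   # converges toward 0.9 without any labeled ground truth
```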
LLMs: Memorization Masquerading as Understanding
A central point of confusion in modern AI is the nature of Large Language Models (LLMs). Professor Ma argues that we are conflating the mechanism of learning with the act of understanding.
⦁ Language is Already a Compressed Code: Natural language is not raw data. It is the result of millennia of human intelligence compressing knowledge about the physical world into a symbolic code. Language is a set of pointers to shared, grounded simulations in our minds.
⦁ Applying the Wrong Tool: Current LLMs apply the same compression mechanism used to learn from raw sensory data (like vision) to the already-compressed code of language. This process is effective at identifying and memorizing the statistical structures within the vast corpus of human text.
⦁ Memorization vs. Understanding: The result is a system that can regenerate text that is statistically plausible, effectively emulating how humans solve logical problems. However, this is akin to memorizing the process rather than understanding the underlying logic. It's the difference between memorizing mathematical proofs and mastering the deductive mechanism of mathematics itself.
The Leap from Empirical Knowledge to Scientific Abstraction
Professor Ma identifies a crucial "phase transition" in the development of intelligence that current AI has yet to make: the leap from empirical knowledge to scientific abstraction.
⦁ Empirical Knowledge: This is gained through passive observation, trial-and-error, and compression of sensory data. This is the level at which animals and current AI systems operate.
⦁ Scientific Abstraction: This involves the ability to hypothesize, create abstract concepts (e.g., infinity, parallel lines that never meet), and use rigorous, deductive logic. This form of intelligence allows us to create knowledge that is not directly present in the observed data.
The key open question for the future of AI is identifying the mechanism that enables this transi...
NVIDIA’s Jensen Huang on Reasoning Models, Robotics, and Refuting the “AI Bubble” Narrative
NVIDIA CEO Jensen Huang discusses the state of AI as we begin 2026, covering rapid improvements in reasoning, the profitability of inference, why AI will increase productivity without taking jobs, the future of robotics, the importance of open source, and which sectors are poised for their 'ChatGPT moment'.
Among the biggest AI surprises of 2025, the rapid improvements in reasoning, grounding, and the connection of models to search tools stand out. The industry effectively addressed skepticism around hallucination by making significant leaps in improving the quality and accuracy of AI-generated answers. A particularly pleasant surprise was the rapid, exponential growth in profitable inference tokens. Companies are now generating tokens of such high value that customers are willing to pay good money for them, indicating the creation of real economic value.
AI's Economic Impact: Jobs, Labor, and Productivity
A common narrative suggests AI will lead to widespread job loss, but this overlooks several key factors.
New Infrastructure and Skilled Labor Demand
The rise of AI has created a need for new "AI factories" to generate tokens, which in turn has spurred the emergence of three new types of industrial plants:
⦁ Chip Plants: Manufacturing the fundamental silicon.
⦁ Computer Plants: Assembling new types of supercomputers, where an entire rack can function as a single GPU.
⦁ AI Factories: The data centers that run the models.
The construction and operation of these facilities are creating enormous demand for skilled labor, including construction workers, plumbers, electricians, and network engineers, leading to significant wage growth in these professions.
The Task vs. Purpose Framework
It's crucial to distinguish between the tasks of a job and its purpose. AI automates tasks, not purposes. The example of radiology is illustrative: years ago, it was predicted that AI would eliminate the need for radiologists. While today nearly 100% of radiology applications are AI-powered, the number of radiologists has actually increased.
The task of a radiologist is to study scans, but the purpose is to diagnose disease.
By automating the task of studying scans, AI allows radiologists to analyze more scans, more deeply, leading to better diagnoses. This increases the hospital's productivity, allowing them to serve more patients and generating more revenue, which in turn creates demand for more radiologists. The same principle applies to software engineering, where the purpose is to solve problems, and coding is just one of the tasks.
Solving Labor Shortages with Robotics
Physical AI and robotics are not primarily about replacing workers, but about solving severe labor shortages in areas like manufacturing and trucking, which are exacerbated by an aging global population. Furthermore, a future with a billion robots will create the largest repair and maintenance industry the world has ever seen, generating entirely new categories of jobs.
The AI Technology Stack and Ecosystem
To understand the dynamics of the industry, it's helpful to view AI through a framework.
The Five-Layer Cake
The technology stack enabling AI can be visualized as a five-layer cake:
1. Energy: The fundamental input.
2. Chips: The specialized processors.
3. Infrastructure: The hardware (data centers, supercomputers) and software (orchestration) stack.
4. Models: The AI itself, which is a diverse system of models for various modalities beyond human language, including biology, chemistry, and physics.
5. Applications: The industry-specific tools built on top of the models (e.g., Harvey for law, Open Evidence for medicine, Cursor for coding).
The Myth of "God AI" and the Importance of Open Source
The narrative of a single, monolithic "God AI" that does everything is unhelpful and distracts from the practical reality. AI is a diverse field, and different industries require specialized models. No single entity is close to creating an AI that has supreme understanding of human language, genomics, molecular biology, and physics simultaneously.
In this diverse ecosystem, open source is essential. Without it, innovation in countless industries—from healthcare to manufacturi...
Post-training best-in-class models in 2025
An expert overview of post-training techniques for language models, covering the entire workflow from data generation and curation to advanced algorithms like Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Reinforcement Learning (RL), along with practical advice on evaluation and iteration.
Post-training is the crucial process of transforming a pre-trained base model, which can only perform token completion, into a sophisticated model capable of following instructions and answering questions. This process is an iterative cycle of data curation, training, and evaluation, essential for creating specialized and high-performing models.
Supervised Fine-Tuning (SFT): The Foundation
The first step in post-training is Supervised Fine-Tuning (SFT). This involves training the base model on a large, high-quality dataset of instruction-answer pairs, often exceeding one million samples for general-purpose models.
Data Quality and Structure
The quality of the SFT dataset is paramount. A good dataset must be:
⦁ Accurate: The answers must be factually correct.
⦁ Diverse: It should cover a wide range of topics and tasks.
⦁ Complex: The tasks should be challenging enough to facilitate model learning.
The typical data structure includes an optional system prompt, a user instruction, and the expected model output. During training, the loss is calculated only on the model's generated output, making the quality of the provided answers critically important. A common data generation pipeline involves using a powerful LLM to generate responses to seed prompts with specific constraints, followed by automated checks, filtering, and decontamination to prevent training on test data.
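A minimal sketch of this masking convention, assuming PyTorch/Hugging Face-style labels where -100 marks positions excluded from the loss; the token ids below are placeholders, not real tokenizer output.

```python
# Sketch of SFT label masking: loss is computed only on the answer tokens.
# -100 is PyTorch's CrossEntropyLoss ignore_index, the convention Hugging Face
# trainers use to skip prompt tokens when computing the loss.
import torch

prompt_ids = [101, 2023, 2003, 1996, 6137]     # system prompt + instruction
answer_ids = [1996, 3437, 2003, 2182, 102]     # the expected model output

input_ids = torch.tensor(prompt_ids + answer_ids)
labels = torch.tensor([-100] * len(prompt_ids) + answer_ids)

# A loss with ignore_index=-100 then scores only the answer positions, which
# is why the quality of the provided answers matters so much.
print(input_ids, labels, sep="\n")
```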
SFT Techniques and Parameters
⦁ Full Fine-Tuning: Updates all model parameters, maximizing potential quality but requiring significant computational resources.
⦁ Parameter-Efficient Fine-Tuning (PEFT):
⦁ LoRA (Low-Rank Adaptation): Freezes the base model's weights and introduces small, trainable matrices (adapters). This drastically reduces the number of trainable parameters (e.g., to 0.1%), saving VRAM and speeding up training.
⦁ QLoRA (Quantized LoRA): Further reduces memory requirements by loading a quantized (e.g., 4-bit) version of the model before applying LoRA. This is a trade-off, as it can lead to a degradation in quality compared to standard LoRA.
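A minimal PyTorch sketch of the LoRA idea (a frozen base layer plus trainable low-rank matrices); the rank, scaling, and initialization below are common conventions, not specifics from the talk.

```python
# Minimal LoRA sketch: freeze the base weight W and learn a low-rank update B @ A.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        self.base.requires_grad_(False)               # freeze all base parameters
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        # y = W x + (alpha / r) * B (A x); only A and B are trained.
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(512, 512))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable: {trainable}/{total}")   # a small fraction of all weights
```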
The most critical hyperparameter to tune is the learning rate. An excessively high learning rate can cause "loss spikes," leading to a collapse in model performance. Monitoring the training loss for a smooth, descending curve is a key indicator of a successful run.
Preference Alignment with DPO
Direct Preference Optimization (DPO) is a powerful technique for aligning a model's behavior and style with human preferences. It moves beyond simple instruction-following to refine the nuances of the model's responses.
DPO uses a preference dataset composed of prompts, "chosen" (preferred) answers, and "rejected" (less preferred) answers. The training objective is contrastive: it increases the likelihood of the model generating responses similar to the chosen examples while decreasing the likelihood of generating those similar to the rejected ones.
A key hyperparameter in DPO is beta, which controls how closely the model must adhere to the reference model. A low beta allows for more exploration, while a high beta keeps the model's behavior constrained.
DPO is highly effective at creating models that humans prefer, as measured by metrics like the Chatbot Arena Elo score. However, it's important to note that human preference is often weakly correlated with performance on academic benchmarks for tasks like math or reasoning.
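A sketch of the DPO objective as commonly formulated (the toy log-probabilities are assumptions): the loss rewards a larger chosen-vs-rejected margin relative to the reference model, scaled by beta.

```python
# Sketch of the DPO loss (illustrative tensors, not a full trainer).
import torch
import torch.nn.functional as F

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Each argument is a batch of summed log-probabilities of a response."""
    margin = (pi_chosen - pi_rejected) - (ref_chosen - ref_rejected)
    return -F.logsigmoid(beta * margin).mean()

# Toy log-probs: the policy already slightly prefers the chosen answers.
pi_c, pi_r = torch.tensor([-12.0]), torch.tensor([-14.0])
ref_c, ref_r = torch.tensor([-13.0]), torch.tensor([-13.5])
print(dpo_loss(pi_c, pi_r, ref_c, ref_r, beta=0.1))  # small beta: loose tie to ref
```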
Advanced Reasoning with Reinforcement Learning (RL)
For complex reasoning tasks like math and coding, Reinforcement Learning (RL) offers a powerful training paradigm. A popular approach, used for models like DeepSeek, involves a multi-stage process.
1. SFT Warm-up: The model is first fine-tuned on a specialized dataset where each answer is preceded by a "reasoning trace" or chain-of-thought. This teaches the model to structure its thinking process before providi...
Humanoid Robots: Hype vs. Reality
Ben Lorica and Evangelos Simoudis of Synapse Partners analyze the major themes from CES 2026 and the shifting landscape of US-China AI chip export policies.
Key Takeaways from CES 2026
The Humanoid Robotics Explosion
CES 2026 was overwhelmingly a "humanoids conference," with over 30 companies, predominantly from China, showcasing their latest developments. However, a closer look reveals the industry is still in its early stages.
⦁ State of the Technology: Many demonstrations were teleoperated, and the robots are not yet ready for deployment in real-world environments. The term "autonomous" is applied loosely, as most units had human "minders" nearby.
⦁ The Dexterity Challenge: A significant hurdle remains the "magnificence of the hand." While robots may have up to 20 degrees of freedom, they lack the sophisticated sensing capabilities of human hands, limiting their dexterity and practical application.
⦁ Applications and Investment: There is a strong emphasis on humanoids within the broader category of adaptive robots, perhaps disproportionately so. Real-world applications remain largely conceptual, with discussions centered on manufacturing and logistics. The recent influx of venture capital into companies like Figure, Apptronik, and 1X seems driven by the "shiny object" appeal, popularized by Elon Musk's Optimus, rather than a deep analysis of market applications. The CEO of 1X admitted their robots are still mostly teleoperated and perform a very narrow range of tasks.
⦁ Market Impact: A notable development is Hyundai's plan to manufacture 20,000-30,000 Atlas robots over the next few years for its new factory in Georgia. This move signals a potential shift in manufacturing and raises questions about future employment, as automation could curtail the number of jobs created by these new industrial investments.
For a more grounded perspective, the speakers recommend Rodney Brooks' recent essay, which offers critical observations from a veteran in the field of robotics.
Software-Defined Vehicles (SDVs) and In-Cabin AI
The automotive sector at CES showcased a significant push towards software-centric vehicles and integrating AI into the user experience.
⦁ Defining the SDV: An SDV is a vehicle where hardware and software are cleanly separated, allowing for most of the vehicle's functionality to be defined and updated via software. This is exemplified by Tesla and Rivian.
⦁ Architectural Evolution: The industry is transitioning from domain-based architectures to more advanced zonal architectures. The key advantages of a zonal approach include using fewer, more powerful hardware components and enabling a much larger portion of the vehicle (upwards of 80%) to be updated over-the-air (OTA).
⦁ AI in the Cabin: The focus of automotive AI is expanding beyond autonomous driving to the in-cabin experience. Major automakers announced significant partnerships:
⦁ Mercedes-Benz: Collaborating with Google and Microsoft on in-cabin agents.
⦁ BMW: Partnering with Amazon for its "Alexa Plus" integration.
⦁ Personal Autonomy: There was a greater emphasis on Level 4 autonomous vehicles for personal use, not just for robotaxi fleets. Nvidia deepened its long-standing relationship with Mercedes, and Ford announced an ambitious plan to deliver a Level 3 autonomous vehicle by 2028 for around $30,000.
⦁ Open Source Initiative: In a potentially transformative move, 30 automakers and tier-one suppliers announced a consortium to develop open-source software for SDVs, which could accelerate innovation and standardization across the industry.
The Shifting AI Chip Export Controls
The conversation shifted to the recent relaxation of US export controls on AI chips to China, a policy announced by the Trump administration in late 2025. The policy itself is ambiguous, seemingly based on a Truth Social post rather than a formal do...
Structured Dissent Patterns for Agentic Production Reliability
This talk introduces 'structured dissent,' a multi-agent orchestration pattern where believer, skeptic, and neutral agents debate decisions to overcome the 'confidently wrong' failure mode of single-agent LLM systems, improving reliability for high-stakes…
The Problem: Single LLMs Fail Silently
Single-agent Large Language Model (LLM) systems present a significant challenge in production environments: they fail silently and are often "confidently wrong." When a single LLM misses a critical detail, such as a hard-coded key or a SQL injection vulnerability, it doesn't express uncertainty. Instead, it provides a definitive, and incorrect, answer. This behavior stems from several inherent limitations:
⦁ No Uncertainty Quantification: A single agent doesn't communicate its level of confidence. It presents every answer as 100% certain.
⦁ Lack of Alternative Viewpoints: The output is confined to the perspective of the single model being used, with no mechanism to introduce alternative or challenging viewpoints.
⦁ No Self-Correction: Without an external challenge, a single agent has no impetus to reconsider its conclusions, even if they are flawed. As the speaker notes, "if it misses it, it's not going to tell you."
Structured Dissent: A Multi-Agent Debate Swarm
To address these failures, a multi-agent orchestration pattern called Structured Dissent is proposed. The core idea is to create a "Think Tank"—a Socratic debate for AI—where agents with opposing viewpoints discuss and challenge decisions before reaching a consensus. This introduces nuance and a mechanism for adversarial verification.
The swarm is typically composed of three distinct agent personas:
⦁ Believers: The optimists. They are solution-focused, seeking opportunities and positive outcomes.
⦁ Skeptics: The paranoids. They focus on failure modes, risks, and hidden costs, effectively acting as a security team.
⦁ Neutrals: The facilitators. They work to prevent groupthink, synthesize the arguments from believers and skeptics, and build a balanced consensus.
The Three-Phase Debate Process
The system operates in a structured, multi-round debate. The default configuration uses five agents (two believers, two skeptics, one neutral) engaged in a three-phase process:
1. Phase 1: Parallel Analysis: Each agent independently analyzes the initial input (e.g., a security scan report) and forms its initial opinion based on its persona.
2. Phase 2: Adversarial Debate: The agents see each other's analyses and begin to argue. Skeptics challenge the believers' optimistic timelines by pointing out complexities, while believers might counter with potential solutions. This is "adversarial verification in real time," where the agents act as judges for each other's reasoning.
3. Phase 3: Synthesis and Reporting: After the debate rounds, the agents present their final conclusions. The neutral agent, acting as a "foreperson," synthesizes these into a final report.
The output is not a simple binary answer. It includes:
⦁ A majority opinion.
⦁ A confidence score indicating the swarm's certainty.
⦁ A summary of resolved and unresolved conflicts.
⦁ Key minority opinions, ensuring that dissenting views are not lost.
If the confidence score falls within a certain range (e.g., 50-75%), the system flags the issue for human review, acknowledging that it needs "an adult."
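A minimal sketch of the verdict structure and review gate described above; the field names mirror the talk's description, while the routing rules outside the 50-75% band are assumptions.

```python
# Sketch of the swarm's final report object and the human-review gate.
from dataclasses import dataclass, field

@dataclass
class SwarmVerdict:
    majority_opinion: str
    confidence: float                 # 0.0 - 1.0
    resolved_conflicts: list = field(default_factory=list)
    unresolved_conflicts: list = field(default_factory=list)
    minority_opinions: list = field(default_factory=list)

def route(verdict: SwarmVerdict) -> str:
    if 0.50 <= verdict.confidence <= 0.75:
        return "escalate: flag for human review"   # the swarm needs "an adult"
    # Handling outside the band is an assumption for this sketch:
    return "auto-accept" if verdict.confidence > 0.75 else "auto-reject"

v = SwarmVerdict("SQL injection risk is real; patch before deploy", 0.62,
                 minority_opinions=["one believer: severity is overstated"])
print(route(v))   # -> escalate: flag for human review
```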
Use Case: MCP Server Security Analysis
The primary demonstration involves a security swarm built to analyze findings from open-source tools (like Bandit, Semgrep, Syft) on MCP (Model Context Protocol) servers.
⦁ Input: Reports from static analysis and dependency vulnerability scans (approx. 35,000 characters).
⦁ Process: The swarm debates the findings to assess the security posture of the MCP server.
⦁ Performance: A typical analysis takes 3-5 minutes and costs around $15 in API calls. This is a significant improvement over a manual security analyst review, which could take hours and cost thousands of dollars.
⦁ Output: The system generates an "executive appropriate" report (approx. 10,000 characters)...
The Asymmetric Design Cycle: AI's Compute Bottleneck
The fundamental bottleneck holding back AI progress is the asymmetric design cycle between AI models and the chips they run on. While new AI methods can be developed rapidly, designing and manufacturing the next generation of chips is a multi-year, multi-hundred-million-dollar process. This mismatch prevents the effective co-design and co-evolution of hardware, software, and AI workloads. The current paradigm often involves repurposing existing hardware, like GPUs originally designed for graphics, for AI tasks. While effective at matrix multiplication, these chips are not co-optimized for the specific neural network models being run. The vision is to dramatically shorten the chip design cycle, enabling a world where custom silicon can be created in tandem with new AI applications, bending the curve of the scaling laws that govern AI progress.
The Genesis: AlphaChip and the TPU Team
The journey began with the AlphaChip project at Google, which ultimately helped design four successive generations of Tensor Processing Units (TPUs). The project started by applying Reinforcement Learning (RL) to chip placement, a critical stage in the physical design process known as floorplanning.
The initial collaboration with Google's TPU team was met with significant skepticism. The research team, coming from an AI background, initially optimized for academic metrics like "half-perimeter wire length." The TPU engineers, however, were quick to point out that these metrics were irrelevant to them. They cared about a complex set of real-world constraints:
⦁ Routed wire length
⦁ Horizontal and vertical congestion
⦁ Timing violations
⦁ Power consumption
⦁ Area (together with power and performance, the "PPA" metrics)
To gain trust, the AlphaChip team adopted a highly iterative, customer-obsessed approach. They met with the TPU team weekly for years, showing them new data and working collaboratively to build cost functions that approximated the metrics the engineers truly valued. This deep partnership was crucial. For an engineer to choose an AI-generated layout over their own, they had to be convinced it was superior on every single metric they cared about, as they were ultimately responsible for the block's performance.
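The talk does not disclose the actual cost functions, but a composite objective of this general shape is plausible; the weights and proxy terms below are invented purely for illustration.

```python
# Hypothetical composite cost in the spirit described above: a weighted blend
# of proxy metrics, tuned with the engineers until it tracks what they value.
# All weights and proxy inputs here are invented for illustration.
def placement_cost(wirelength, h_congestion, v_congestion,
                   timing_violations, power, area,
                   w=(1.0, 0.5, 0.5, 2.0, 0.3, 0.2)):
    terms = (wirelength, h_congestion, v_congestion,
             timing_violations, power, area)
    return sum(wi * ti for wi, ti in zip(w, terms))
```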
A New Paradigm for Chip Design
The technical approach for AlphaChip was fundamentally different from traditional Electronic Design Automation (EDA) methods. Instead of using classical combinatorial optimization solvers, the team trained an RL agent to place the millions of components on a chip.
⦁ Learning from Experience: The RL agent learns through trial and error, interacting with a simulated environment. It learns from both positive and negative placement examples, iteratively improving its strategy. This ability to learn from experience allows the model to self-improve, much like a human expert who gets better with each new design, but at a vastly greater scale. (A toy version of this loop is sketched after this list.)
⦁ Superhuman and Unconventional Designs: The AI began to produce layouts that were radically different from human-designed ones. As Anna Goldie noted, "We saw these very strange like curved placements... donut shapes as well." Humans tend to create highly regular, grid-like layouts. The AI, however, discovered that curved and non-uniform shapes could reduce wire length, thereby improving power consumption and timing, even if they appeared counter-intuitive and complex.
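As referenced above, here is a toy policy-gradient loop in that spirit: a REINFORCE policy placing a handful of components to minimize wirelength. This is a deliberately tiny stand-in, not AlphaChip's actual architecture, which used learned graph representations over real netlists and far richer cost functions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem: place 6 components on a 4x4 grid; 2-pin nets connect pairs.
N_COMP, GRID = 6, 4
NETS = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 5)]

def wirelength(pos):
    """Half-perimeter wirelength; for 2-pin nets this is Manhattan distance.
    (A real placer would also penalize overlap, congestion, and timing.)"""
    return sum(abs(pos[a][0] - pos[b][0]) + abs(pos[a][1] - pos[b][1])
               for a, b in NETS)

# Policy: an independent softmax over grid cells for each component.
logits = np.zeros((N_COMP, GRID * GRID))
baseline = 0.0

for step in range(2000):
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    cells = [rng.choice(GRID * GRID, p=p[i]) for i in range(N_COMP)]
    pos = [(c // GRID, c % GRID) for c in cells]
    reward = -wirelength(pos)                 # lower wirelength = higher reward
    baseline = 0.9 * baseline + 0.1 * reward  # running baseline cuts variance
    for i, c in enumerate(cells):             # REINFORCE: (one-hot - probs)
        g = -p[i]
        g[c] += 1.0
        logits[i] += 0.1 * (reward - baseline) * g
```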
The project's success was validated when the first chip designed with AlphaChip's help was taped out and returned from the fab fully functional. With each subsequent TPU generation, the AI's layouts were adopted across more of the chip, and the performance delta between the AI's design and the human baseline grew, demonstrating AI's ability to scale with more data and experience.
Ricursive Intelligence: From Fabless to Designless
The success of AlphaChip inspired the founding of Ricursive Int...
Full story
tokenless.tech
How Ricursive Intelligence’s Founders are Using AI to Shape The Future of Chip Design | Tokenless
Anna Goldie and Azalia Mirhoseini of Ricursive Intelligence discuss how their work on Google's AlphaChip, which used AI to design TPUs, is now being extended to automate the entire chip design process. They explain their vision for a 'designless' industry…
Why Every Brain Metaphor in History Has Been Wrong [SPECIAL EDITION]
An exploration of scientific simplification, questioning the metaphors we use to understand the brain and intelligence. This summary delves into the tension between creating useful models and mistaking them for reality, featuring insights on the mind-as-software debate, the limits of prediction versus understanding, and the philosophical underpinnings of our quest for AGI.
Science operates by simplifying complex reality, but this necessary act raises a fundamental question: have we found a deep truth about the world, or are we mistaking our simplified model for the actual thing? This tension is embodied by the "spherical cow" joke in physics and is central to modern neuroscience and AI. As Professor Mazviita Chirimuuta explains in her book, The Brain Abstracted, we are limited creatures who must build models and leave things out. The critical disagreement, however, is what this success implies about reality itself.
This can be framed as a conflict between two perspectives:
⦁ Simplicius: Believes that science works because the universe is fundamentally simple and orderly underneath its apparent complexity. An elegant equation reflects reality.
⦁ Ignorantio: Argues that we simplify because we are cognitively limited. Our models are useful fictions—maps, not the territory—that work for our specific purposes, which doesn't prove that nature itself is simple.
Chirimuuta aligns with "learned ignorance" (docta ignorantia), the idea that true learning includes understanding the limits of what you know.
The Kaleidoscope Hypothesis: Is Reality Fundamentally Code?
Francois Chollet proposes the "kaleidoscope hypothesis," suggesting that beneath the messy surface of reality lies an intrinsic, underlying structure composed of simple, repeating "atoms of meaning." Much like a kaleidoscope creates infinite complexity from a few pieces of colored glass, the world is generated by the repetition and composition of these fundamental elements. Intelligence, in this view, is the process of mining experience to extract these abstractions.
Chirimuuta frames this not as a scientific certainty but as a philosophical bet, akin to Plato's theory of Forms. It's a wager that "real reality is neat, mathematical, and decomposable" beneath the complicated world of appearances.
The Ultimate Metaphor: Is the Mind Software?
The most pervasive simplification today is the idea that the mind is a computer running software. This has moved from a metaphor to what many consider a literal truth. Joscha Bach argues provocatively that this is not a metaphor at all: "Software is spirit." He posits that abstract patterns, like software or money, have real causal power, independent of their physical substrate. A program produces the same effects whether on a Mac, a PC, or potentially even neurons, because the causal power lies in the invariance of the pattern itself.
The counterargument is that this "sameness" is not inherent in nature but is imposed by a human observer. Physically, completely different things are happening inside different computer chips. The invariance exists only in our description. The causal power of money, for example, isn't in the paper or electrons but in the shared social agreements and interpretive practices of humans. The critique is that this view mistakes an elegant description for the fundamental structure of reality.
Historically, our metaphors for the brain have always tracked our most advanced technology:
⦁ Descartes: Hydraulic pumps in French royal gardens.
⦁ 19th Century: A telegraph network.
⦁ 20th Century: A telephone switchboard.
⦁ 21st Century: A digital computer.
As Jeff Beck bluntly states, "It will always be the case that our explanation for how the brain works will be by analogy to the most sophisticated technology that we have."
Ontology vs. Metaphysics: It Depends on Why You're Asking
Professor Luciano Floridi offers a framework to navigate this, distinguishing between metaphysics (reality as it is in itself, which is inaccessible) and ontology (the structure we impose on reality for a specific purpose). Our models of the world are not absolutely true or false; their value is relational.
Is it the same ship of Theseus? The question is a mistake. It provides no interface; what computer scientis...
Full story
tokenless.tech
Why Every Brain Metaphor in History Has Been Wrong [SPECIAL EDITION] | Tokenless
An exploration of scientific simplification, questioning the metaphors we use to understand the brain and intelligence. This summary delves into the tension between creating useful models and mistaking them for reality, featuring insights on the mind-as-software…
"We Made a Dream Machine That Runs on Your Gaming PC"
Shahbuland Matiana and Andrew Lapp from Overworld Labs introduce Waypoint 1, a 2 billion-parameter open-source world simulation model designed to run on consumer hardware at 60 FPS. They discuss its novel architecture, which combines a causal language model with an image diffusion model to denoise frames in real-time based on user prompts and controller inputs, emphasizing low-latency interaction and the importance of local execution for user privacy.
Overworld Labs has introduced Waypoint 1, a 2 billion-parameter world simulation model designed to run efficiently on consumer hardware. Unlike large-scale projects like Google's Genie, which rely on massive cloud infrastructure, Waypoint 1 is optimized for local execution on gaming PCs (e.g., NVIDIA 3070s, 4090s) and soon, Apple Silicon. The model, whose weights are being open-sourced, is capable of generating interactive, explorable worlds from text or image prompts at 60 frames per second.
The Vision: Sharable Lucid Dreams
The core motivation behind Overworld is to create a way to record and share the kinds of immersive, dynamic experiences found in dreams. Co-founder Shahbuland Matiana described a personal lucid dream that modern game engines cannot replicate:
"I was in this like house floating in space and there was a giant like dragon circling the the house... I draw a katana from my like waist and I parry the dragon's teeth as it goes try to bite me. I feel a clang reverberate through my whole body. The floorboards crack beneath my feet. The window shatter around me."
The goal of Waypoint 1 is to enable the creation of such fully immersive experiences where the world bends and reacts to the user's actions, and then allow those experiences to be shared with others. This technology aims to be a "killer application" for AI, moving beyond static video generation into truly interactive entertainment.
Technical Architecture: A Real-Time Diffusion Transformer
Waypoint 1's architecture is a novel hybrid of a causal language model and an image diffusion model, optimized for real-time interaction.
1. Image Compression: The process begins with an autoencoder that compresses video frames (e.g., 360p) into a much smaller latent representation, such as a 32x32 grid. The model operates entirely in this compressed latent space, not on raw pixels.
2. Frame Generation: The core of the system is a transformer model. However, instead of autoregressively predicting the next token like a standard LLM, it denoises the next 256 tokens (representing one full frame) in a single forward pass.
3. Conditioning: Each frame is generated conditioned on a history of preceding frames, a text prompt, and controller inputs from the last 1/60th of a second. This conditioning is managed through cross-attention mechanisms within the transformer blocks.
4. Low Latency: To ensure playability and responsiveness, the model generates only one frame at a time. This is a key distinction from many video diffusion models that use temporal autoencoders to compress multiple frames together, which saves computation but introduces significant input lag (e.g., only accepting input every 4th frame).
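A minimal sketch of the per-frame loop described in the four steps above, with a dummy denoiser standing in for the transformer. The tensor sizes and function names here are illustrative assumptions, not Waypoint 1's real dimensions.

```python
import numpy as np

rng = np.random.default_rng(0)
TOKENS, DIM, HISTORY = 256, 64, 8  # illustrative sizes, not the real model's

def denoise_frame(noise, history, prompt_emb, controls):
    """Placeholder for the transformer: in the real model this denoises all
    256 frame tokens in one forward pass, with the frame history, prompt, and
    controller inputs injected via cross-attention. Here it is a no-op."""
    return noise

history = [np.zeros((TOKENS, DIM)) for _ in range(HISTORY)]
prompt_emb = np.zeros(DIM)

def next_frame(controls):
    """One frame per 1/60 s tick: sample noise, denoise once, roll history."""
    noise = rng.standard_normal((TOKENS, DIM))
    frame = denoise_frame(noise, history, prompt_emb, controls)
    history.pop(0)
    history.append(frame)     # rolling context of preceding frames
    return frame              # the autoencoder's decoder turns this into pixels
```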
Optimization and Distillation
Achieving 60 FPS on consumer hardware requires significant optimization. The team uses a four-step rectified flow model with an Euler sampler. In this process, the model starts with random noise and, over four steps, predicts the vector that moves the latent representation closer to the "clean," ideal frame.
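In code, four-step Euler sampling of a rectified flow is just four velocity updates from noise (t = 0) toward the clean latent (t = 1); v_theta below is an assumed stand-in for the trained, conditioned model.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_latent(v_theta, cond, shape=(256, 64), steps=4):
    """Four Euler steps along a rectified flow: start from pure noise at t=0
    and integrate the predicted velocity toward the clean frame latent at
    t=1. `v_theta(x, t, cond)` stands in for the trained model (assumed)."""
    x = rng.standard_normal(shape)
    dt = 1.0 / steps
    for k in range(steps):
        x = x + v_theta(x, k * dt, cond) * dt  # x <- x + v(x, t) * dt
    return x
```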
A key insight is that reducing the number of diffusion steps primarily sacrifices diversity, not quality. For an autoregressive model like Waypoint 1, this is an acceptable trade-off. The strong conditioning from previous frames and user input already constrains the output, so the inherent diversity from a high-step diffusion process is less critical.
This speed is further enhanced by diffusion distillation (e.g., using methods like Distribution Matching Distillation or DMD), where a "student" model is trained to replicate the output of a larger model in fewer steps. This process effectively "bakes in" parameters like the classifier-free guidance (CFG) scale, which avoids the need for multiple forward passes during inference and dramatically speeds up generation.
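As a simplified sketch of what "baking in" CFG means, the toy below distills a guided teacher into a student by plain regression. Note this is not DMD itself, which matches distributions rather than individual outputs, and the linear toy models are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 64  # illustrative latent dimension

# Toy linear "networks": a real velocity model is a transformer; linear maps
# keep the distillation mechanics visible.
W_teacher = rng.standard_normal((D, D)) * 0.1
W_student = np.zeros((D, D))

def teacher_cfg(x, cfg_scale=3.0):
    """Teacher with classifier-free guidance: conditional and unconditional
    predictions blended by the CFG scale (two forward passes per call)."""
    v_cond = x @ W_teacher
    v_uncond = x @ (0.5 * W_teacher)  # stand-in for the unconditional branch
    return v_uncond + cfg_scale * (v_cond - v_uncond)

# Regression-style distillation: the student learns to match the teacher's
# *guided* output in a single pass, baking the CFG scale into its weights so
# inference needs no unconditional branch or guidance blending.
for step in range(500):
    x = rng.standard_normal((32, D))
    target = teacher_cfg(x)
    pred = x @ W_student
    grad = x.T @ (pred - target) / len(x)  # MSE gradient (up to a constant)
    W_student -= 0.01 * grad
```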
Privacy and the Future
The team strongly advocates for ...
Full story
tokenless.tech
"We Made a Dream Machine That Runs on Your Gaming PC" | Tokenless
Shahbuland Matiana and Andrew Lapp from Overworld Labs introduce Waypoint 1, a 2 billion-parameter open-source world simulation model designed to run on consumer hardware at 60 FPS. They discuss its novel architecture, which combines a causal language model…
This Startup Beat Gemini 3 on ARC-AGI — at Half the Cost
Poetic, a startup by ex-DeepMind researchers, has significantly advanced performance on the ARC-AGI benchmark by applying a recursive self-improvement system to Gemini 3. Co-founder Ian Fisher discusses how their approach of automating prompt and system engineering provides a substantial performance boost without needing access to model weights, and explores its potential as a path toward AGI.
Poetic, a new startup founded by former DeepMind researchers, has achieved a significant breakthrough on the ARC-AGI benchmark. By layering their proprietary system on top of Gemini 3, they achieved a 54% score on the private test set, a substantial leap from Gemini 3's baseline of approximately 33% and even surpassing the more advanced Gemini 3 Deep Think's 45% at half the cost.
The Core Technology: Recursive Self-Improvement
The central idea behind Poetic's success is a form of recursive self-improvement (RSI), which co-founder Ian Fisher describes as "the holy grail of AI." The goal is to create a system where the AI actively makes itself smarter.
Unlike methods that require fine-tuning or access to model weights, Poetic's approach operates purely at the system and prompt level. This is a crucial advantage when working with closed-source models available only through APIs. The methodology involves:
⦁ Ensemble Methods: The system calls the underlying model (e.g., Gemini 3) multiple times.
⦁ Independent Refinement: Each member of the ensemble works independently to refine its own answer.
⦁ Advanced Voting Schemes: The refined answers are combined using a sophisticated voting mechanism to produce a final, more accurate solution.
This system-level optimization is what differentiates Poetic from other prompt-engineering frameworks such as DSPy; it incorporates what Fisher refers to as "trade secret insights" that yield a significant performance difference. The entire ARC-AGI solver was an output of their system, which was trained on ARC-1 and then applied to ARC-2 without any specific training on the latter.
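Since the specifics are trade secrets, only the generic shape of the loop can be sketched. In the Python below, call_model is an assumed prompt-to-answer function, and a plain majority vote stands in for the "advanced voting schemes."

```python
import collections

def solve_with_ensemble(call_model, task, n=5, refine_rounds=2):
    """Sketch of the ensemble-refine-vote pattern described above; Poetic's
    actual refinement and voting schemes are proprietary, so this shows only
    the generic shape."""
    answers = [call_model(f"Solve:\n{task}") for _ in range(n)]
    for _ in range(refine_rounds):  # each member refines independently
        answers = [call_model(f"Solve:\n{task}\nYour previous answer:\n{a}\n"
                              "Check it carefully and return an improved answer.")
                   for a in answers]
    # Plain majority vote as a stand-in for the real voting mechanism.
    return collections.Counter(answers).most_common(1)[0][0]

# Usage: solve_with_ensemble(lambda prompt: my_llm(prompt), some_task)
```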
The Gemini 3 Catalyst
The release of Gemini 3 was a pivotal moment. While Poetic's system showed promising results on ARC-1 with other models (reaching 89%), switching to Gemini 3 pushed their performance to 95%. When they applied this new combination to the more challenging ARC-2, they had a "holy cow moment" as the performance jumped to the state-of-the-art 54%.
Fisher attributes this leap to Gemini 3's exceptional ability to generate code for visual problem-solving, a capability that surpassed previous models. He also notes that other powerful models like Anthropic's Opus can be swapped in for Gemini 3 to achieve similar results, albeit at a higher cost.
A Path to AGI and Practical Applications
Fisher views RSI as both a practical tool for immediate performance gains and a credible path toward AGI.
⦁ Immediate Value: The performance "bump" from Poetic's system can be highly valuable. On the ARC-AGI benchmark, which allows for two solution submissions, their method provided a single, higher-quality solution that outperformed the underlying model's two submissions, sometimes at a lower overall cost.
⦁ Long-Term Vision: While not the only path, Fisher believes RSI is "the most exciting path to AGI and beyond." The process on ARC-AGI was stopped manually due to cost constraints, suggesting that with more resources, the performance could have "hill-climbed" even further.
Automating the Prompt Engineer
The broader vision for Poetic is to automate the complex and often manual process of prompt engineering and agent creation. Fisher draws an analogy to the evolution of deep learning, which automated the manual process of feature engineering.
"We are quite intentionally automating ourselves, automating prompt engineers, automating people who are building agents. It's a power tool."
He contrasts their previous manual work at DeepMind—akin to building a car by hand—with Poetic's technology, which is like "building a factory to build cars." The goal is to create a system that automatically discovers the optimal prompts and system configurations, removing the human from the tedious trial-and-error loop. While continuing their research and targeting other high-impact benchmarks, the six-person team is now also focusing on bringing t...
Full story
tokenless.tech
This Startup Beat Gemini 3 on ARC-AGI — at Half the Cost | Tokenless
Poetic, a startup by ex-DeepMind researchers, has significantly advanced performance on the ARC-AGI benchmark by applying a recursive self-improvement system to Gemini 3. Co-founder Ian Fisher discusses how their approach of automating prompt and system engineering…
She Raised $64M to Build an AI Math Prodigy | Carina Hong, CEO of Axiom
Carina Hong, Founder & CEO of Axiom, discusses building a self-improving AI reasoning engine that combines generation and verification. Starting with formal mathematics, Axiom's system has achieved superhuman results on the notoriously difficult Putnam Exam by leveraging formal languages like Lean to overcome the probabilistic and unverifiable nature of standard LLMs. Hong explores how this technology can solve major bottlenecks in hardware and software verification, code migration, and database consistency, and what it means for the future of mathematical research.
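For readers unfamiliar with formal verification, a two-line Lean 4 example (illustrative, not from Axiom) shows why this sidesteps probabilistic judgment: if the file compiles, the kernel has checked the proof.

```lean
-- Once this compiles, Lean's kernel has verified the proof; there is no
-- probabilistic judgment involved, unlike sampling an answer from an LLM.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```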