OpenAI is laying the groundwork for a Q4 IPO and has started informal talks with Wall Street banks while building out its finance team.
OpenAI is moving faster in part because it’s worried Anthropic could beat it to market.
The Wall Street Journal
Exclusive | OpenAI Plans Fourth-Quarter IPO in Race to Beat Anthropic to Market
The rivals are competing to be the first major generative AI startup to tap the public markets.
Cool work that aligns with how humans learn.
The model writes its own answers two ways:
1) without cheating
2) with cheating (seeing the true answer)
It learns to make (1) close to (2) by minimizing the KL divergence.
This prevents catastrophic forgetting in continual learning.
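The trick can be sketched numerically: compare the model's next-token distribution when it answers blind against the distribution after it has seen the true answer, and penalize the gap. A toy sketch with invented logits over a tiny vocabulary (the real method operates on full LM token distributions):

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl(p, q):
    # KL(p || q): how far distribution p is from q; 0 when identical
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical next-token distributions over a 4-word vocabulary
no_hint   = softmax([1.0, 0.5, 0.2, -1.0])   # (1) answering blind
with_hint = softmax([3.0, 0.1, -0.5, -2.0])  # (2) after seeing the true answer

# Training would minimize this, pulling (1) toward (2)
print(kl(with_hint, no_hint))
```

Minimizing this KL nudges the blind model toward predictions it can already justify with the answer in view, which is the claimed mechanism for avoiding catastrophic forgetting.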
New paper from Google DeepMind studying how LLM representations of things like factuality evolve over a conversation.
Researchers find that in edge case conversations, e.g. about model consciousness or delusional content, model representations can change dramatically.
In a simulated argument where two language models argue about whether they are conscious or not (one pro, one anti) their representations for questions about consciousness flip back and forth as they play each role.
By contrast, contexts that are clearly framed as sci-fi stories result in less representational change.
Researchers find these results interesting as one way models adapt to context, and consistent with a "role-play" description in which models' representations evolve to reflect the current role, e.g. in an argument. (N.b. these conversations are mostly not on policy!)
They also raise challenges for the construct validity of dimensions discovered using interpretability methods — dimensions may not have the same meaning w.r.t. ground truth at different points in a context. This poses challenges for probing and steering for safety, etc.
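The construct-validity worry can be illustrated with a fixed linear probe read out at different points in a context. A toy sketch (the probe direction and hidden states are invented for illustration, not real model activations):

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def probe_score(hidden, direction):
    # Cosine similarity of a hidden state with a fixed "concept" direction,
    # i.e. what a linear probe would read out at this point in the context.
    return dot(hidden, direction) / (norm(hidden) * norm(direction))

# Hypothetical probe direction for "claims to be conscious"
direction = [1.0, 0.0, 0.5]

# Toy hidden states at three points in a role-play argument
turns = {
    "arguing pro":  [0.9, 0.1, 0.4],
    "arguing anti": [-0.8, 0.2, -0.3],
    "sci-fi story": [0.1, 0.9, 0.1],
}

for label, h in turns.items():
    print(label, round(probe_score(h, direction), 2))
```

The same direction reads strongly positive in one role and negative in the other, so a steering or monitoring system keyed to that direction would mean different things at different turns.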
arXiv.org
Linear representations in language models can change dramatically...
Language model representations often contain linear directions that correspond to high-level concepts. Here, we study the dynamics of these representations: how representations evolve along these...
Hong Kong has announced that the Stablecoin Ordinance has come into effect and that license applications are now being processed.
The regulatory framework for virtual asset trading, custody, advisory, and management services will be submitted to the Legislative Council this year, and automatic exchange of cross-border tax information is expected to commence in 2028.
www.info.gov.hk
Opening remarks by the Secretary for Financial Services and the Treasury at the Legislative Council Panel on Financial Affairs policy briefing (Chinese only)
Following are the opening remarks by the Secretary for Financial Services and the Treasury, Mr Christopher Hui, at today's (January 30) Legislative Council Panel on Financial Affairs policy briefing:
Chairman, Honourable Members:
2026 is the year of the nation's "15th Five-Year" plan…
Claude Sonnet 5: the "Fennec" leaks
"Fennec" is the leaked internal codename for Claude Sonnet 5, reportedly one full generation ahead of Gemini's "Snow Bunny."
A Vertex AI error log lists claude-sonnet-5@20260203, pointing to a February 3, 2026 release window.
Rumored to be 50% cheaper than Claude Opus 4.5 while outperforming it across metrics.
Retains the 1M token context window, but runs significantly faster.
Allegedly trained/optimized on Google TPUs, enabling higher throughput and lower latency.
Can spawn specialized sub-agents (backend, QA, researcher) that work in parallel from the terminal.
Agents run autonomously in the background: you give a brief, and they build the full feature like human teammates.
Insider leaks claim it surpasses 80.9% on SWE-Bench, effectively outscoring current coding models.
The 404 on the specific Sonnet 5 ID suggests the model already exists in Google’s infrastructure, awaiting activation.
Unverified leaks; treat timelines, pricing, and benchmarks with caution.
Meta introduced Self-Improving Pretraining
Reinvents pretraining: no more plain next-token prediction.
- Uses the LM from the previous self-improvement iteration to assign rewards to the sequences the new model is pretrained on
- Large gains in factuality, safety, and quality.
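One plausible reading of the recipe is reward-weighted pretraining: a frozen earlier-iteration model scores candidate sequences, and the scores reweight the new model's loss. This is an assumed sketch; the paper's exact objective may differ:

```python
import math

def reward_weighted_loss(seq_losses, rewards, temperature=1.0):
    """Combine per-sequence pretraining losses using softmax weights
    derived from rewards assigned by a frozen earlier-iteration model.
    Higher-reward (safer, more factual) sequences contribute more."""
    weights = [math.exp(r / temperature) for r in rewards]
    z = sum(weights)
    weights = [w / z for w in weights]
    return sum(w * l for w, l in zip(weights, seq_losses))

# Hypothetical batch: the second sequence is judged safer/more factual,
# so it dominates the training signal
losses  = [2.3, 2.1]
rewards = [0.1, 0.9]
print(reward_weighted_loss(losses, rewards))
```

With equal rewards this reduces to the ordinary mean loss, so the reward model only reshapes, rather than replaces, the pretraining signal.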
arXiv.org
Self-Improving Pretraining: using post-trained models to pretrain...
Ensuring safety, factuality and overall quality in the generations of large language models is a critical challenge, especially as these models are increasingly deployed in real-world...
DeepMind just stress-tested AI for math discovery.
They pointed Gemini at 700 “open” Erdős problems.
Result: 13 resolved.
• 5 via seemingly novel, autonomous solutions
• 8 by uncovering forgotten prior proofs in the literature
The twist? Many “open” problems weren’t hard, just obscure.
The paper also flags real risks for AI math at scale: literature blind spots and even subconscious plagiarism.
AI isn’t just solving math, it’s auditing the canon.
arXiv.org
Semi-Autonomous Mathematics Discovery with Gemini: A Case Study on...
We present a case study in semi-autonomous mathematics discovery, using Gemini to systematically evaluate 700 conjectures labeled 'Open' in Bloom's Erdős Problems database. We employ a hybrid...
Tether launches open-source Bitcoin mining OS to challenge proprietary software
The stablecoin issuer said it has open-sourced MiningOS, or MOS, an operating system designed to manage, monitor, and automate bitcoin mining operations across deployments ranging from small home setups to large industrial sites.
Tether first previewed plans for an open-source mining operating system last year, arguing that new entrants should be able to compete in bitcoin mining without depending on expensive, closed-source management tools.
Other firms like Jack Dorsey’s Block have also pushed for more open mining infrastructure.
tether.io
Tether Open-Sources the Next Generation of Bitcoin Mining Infrastructure with MOS, Mining OS, Mining SDK - Tether.io
2 February, 2026 – Tether, the largest company in the digital assets industry, today announced the open-sourcing of Mining OS (MOS), an operating system designed to manage, monitor, and automate Bitcoin mining operations at scale. MOS provides end-to-end…
How Claude broke Anthropic's hiring test and what came next
Anthropic recently open-sourced their legendary engineering take-home assignment.
The reason? Claude Opus 4.5, in just two hours, solved it better than any human candidate ever had.
The task: optimize code for a simulated accelerator similar to Google's TPU.
Baseline solution: 147,000 cycles. Claude got it down to 1,487 — a 99x speedup.
Igor Kotenkov decided not to just copy the AI solution. His argument: an AI-generated answer carries zero educational value if you don't understand what's happening under the hood.
Over a weekend, he went from the 147k baseline to 2,200 cycles — a 65x speedup. Six months ago, that would have passed the hiring bar.
Then he wrote a detailed breakdown: what SIMD and VLIW actually mean, how accelerator memory works, why processors hate branching, and how it all connects to decision tree inference from classic ML. Everything explained from scratch, no prior background required.
The takeaway? AI gives you answers. Understanding still takes human effort. And that effort is what turns into valuable content.
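One of the ideas the breakdown covers, branch-free decision-tree inference, fits in a few lines. An illustrative sketch in Python (not the actual take-home code, which targets a simulated accelerator):

```python
def predict_branchless(x, thresholds, leaves):
    """Complete binary tree stored as an array (root at index 0).
    Each step computes the child index arithmetically instead of
    using if/else, the data-dependent branching that SIMD/VLIW
    hardware handles poorly."""
    node = 0
    depth = len(leaves).bit_length() - 1  # leaf count is 2**depth
    for _ in range(depth):
        go_right = int(x >= thresholds[node])  # 0 or 1, no branch
        node = 2 * node + 1 + go_right
    return leaves[node - len(thresholds)]      # map node index to leaf slot

# Depth-2 tree: 3 internal thresholds, 4 leaf values
thresholds = [0.5, 0.2, 0.8]
leaves = [10, 20, 30, 40]
print(predict_branchless(0.9, thresholds, leaves))  # → 40 (right, right)
```

Because every input executes the identical instruction sequence, many inputs can be evaluated in lockstep across vector lanes, which is the core of the speedups in the write-up.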
www.ikot.blog on Notion
Anthropic Performance Team Take-Home for Dummies
“a notoriously difficult take-home exam”, eh? Let's dig into the solution!
Meet Phylo, a research lab studying agentic biology, backed by a $13.5M seed round co-led by a16z and Menlo Ventures, with Anthropic participating.
Phylo introduced a research preview of Biomni Lab, the first Integrated Biology Environment (IBE): a single place where hypotheses are generated, experiments are planned, data is analyzed, models are run, and results are produced in a way that's auditable and reproducible.
Biomni Lab uses agents to orchestrate hundreds of biological databases, software tools, molecular AI models, expert workflows, and even external research services in one workspace, supporting research end-to-end from question to experiment to result.
Agents handle the mechanics, while you define the question, then review, steer, and decide. Scientists end up spending more time on science: asking questions, understanding mechanisms, and eliminating diseases.
Phylo is a spin-out of Biomni.
phylo.bio
Built to evolve. Designed to discover. AI research and products for bio-medical super-intelligence — accelerating discovery by 100x.
Apple's Xcode now has direct integration with the Claude Agent SDK, giving developers the full functionality of Claude Code for building on Apple platforms, from iPhone to Mac to Apple Vision Pro.
Anthropic
Apple’s Xcode now supports the Claude Agent SDK
Anthropic is an AI safety and research company that's working to build reliable, interpretable, and steerable AI systems.
Google launched a first-of-its-kind nationwide randomized study with Included Health to evaluate AI in a real-world virtual care setting and better understand its capabilities and limitations
Google received Institutional Review Board (IRB) approval for a prospective, consented, nationwide randomized study to assess AI in a real-world virtual care setting. The new research will build on Google's foundational research on the use of AI for diagnostic and management reasoning, personalized health insights, and navigating health information.
This study is informed by years of foundational research across Google, investigating the capabilities required for a helpful and safe medical AI.
Google Research
Collaborating on a nationwide randomized study of AI in real-world virtual care
In partnership with Included Health, we will be launching a first-of-its-kind nationwide study to evaluate conversational AI within real-world virtual care workflows. This research will move beyond simulation and retrospective data and aim to gather rigorous…
Meet Q Labs, a research lab focused on solving generalization.
Alongside others (SSI, Flapping Airplanes), Q Labs sees data efficiency as the key problem, but it's taking an unconventional approach: a new learning algorithm approximating Solomonoff induction.
Why Solomonoff Induction? It's provably optimal for prediction. The idea is simple: search for all programs that fit the data and favor low-complexity ones.
Since it's uncomputable, they're building a practical approximation in the context of neural nets.
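The "favor short programs that fit the data" idea can be made concrete with a toy enumeration. Illustrative only: here a "program" is just a repeating bit pattern, and Q Labs' actual algorithm is neural and unpublished:

```python
from itertools import product

def fits(program, data):
    """A 'program' here is a repeating bit pattern; it fits if
    cycling it reproduces the observed data exactly."""
    return all(data[i] == program[i % len(program)] for i in range(len(data)))

def shortest_explanation(data, max_len=8):
    # Enumerate programs shortest-first. Under the Solomonoff prior
    # 2**(-length), returning the first (shortest) fit is equivalent
    # to favoring the lowest-complexity hypothesis.
    for length in range(1, max_len + 1):
        for program in product([0, 1], repeat=length):
            if fits(program, data):
                return list(program)
    return None

data = [0, 1, 0, 1, 0, 1]
print(shortest_explanation(data))  # → [0, 1], the 2-bit pattern
```

Exhaustive search like this blows up exponentially, which is why the real bet is on approximating the same preference inside neural nets rather than enumerating programs.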
qlabs.sh
Q - Research Lab Solving Generalization
Q is a research lab building learning algorithms beyond gradient descent to solve generalization -- the core open problem in AI.
Mistral introduced Voxtral Transcribe 2, next-gen speech-to-text models.
SOTA transcription, speaker diarization, sub-200ms real-time latency.
Voxtral Realtime is built for voice agents and live applications. Its natively streaming architecture delivers latency configurable to sub-200ms. And at 480ms, it stays within 1-2% WER of our offline model.
Mistral released the model as open weights under Apache 2.0.
The demo is worth a try: ignore the "No microphone found" message; clicking "Record" and allowing your browser to use the microphone fixes it. It transcribes very accurately in near real time. Really impressive.
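For context on those WER figures: word error rate is the word-level edit distance between hypothesis and reference, divided by reference length. A standard-definition sketch (not Mistral's evaluation code):

```python
def wer(reference, hypothesis):
    """Word error rate: (substitutions + insertions + deletions) / reference
    word count, via Levenshtein distance over word tokens."""
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[-1][-1] / len(ref)

# One substitution ("sat"→"sit") + one deletion ("the") over 6 words ≈ 0.33
print(wer("the cat sat on the mat", "the cat sit on mat"))
```

So "within 1-2% WER of the offline model" means the streaming model mis-transcribes roughly one to two extra words per hundred.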
huggingface.co
Voxtral Mini Realtime - a Hugging Face Space by mistralai
This app lets you speak into your microphone and see your words appear as live text. Just enter your Mistral API key, click the mic button, and talk – the app streams the audio to the Voxtral model...
OpenAI introduced Frontier, a new platform that helps enterprises build, deploy, and manage AI coworkers that can do real work.
Frontier gives agents the same skills people need to succeed at work:
- Understand how work gets done
- Use a computer and tools
- Improve quality over time
- Stay governed & observable
Enterprises can also pair OpenAI Forward Deployed Engineers with their teams, working side by side to develop best practices for building and running agents in production.
Anthropic introduced Claude Opus 4.6, its first Opus-class model with a 1M-token context window in beta.
Opus 4.6 plans more carefully, sustains agentic tasks for longer, operates reliably in massive codebases, and catches its own mistakes.
Opus 4.6 is SOTA on several evaluations including agentic coding, multi-discipline reasoning, knowledge work, and agentic search.
Anthropic is also shipping new features across Claude in Excel, Claude in PowerPoint, Claude Code, and the API to let Opus 4.6 do even more.
Claude in Excel now handles long-running and harder tasks with improved performance.
It can plan before acting, support richer functionalities like conditional formatting and data validation, and handle multi-step changes in one pass.
In Claude Code, Anthropic introduced agent teams.
Spin up multiple agents that coordinate autonomously and work in parallel—best for tasks that can be split up and tackled independently.
Agent teams are in research preview.
General framework of AI agents.
What if a single framework could unify every AI agent, from software chatbots to physical robots?
Dr. Hang Li from ByteDance proposes exactly that in a new JCST paper.
It's a universal blueprint where agents use LLMs as their "brain" for reasoning, are built via reinforcement learning, and operate using tools & long-term memory to complete tasks.
This general framework outperforms fragmented approaches, providing a unified theory for agent development across both software and hardware domains.
$11T by 2030. ARK Investment estimates on-chain assets could grow to roughly $11T by 2030, driven by deposits, public equities, credit, and funds moving on-chain.
This is where we are today:
• Stablecoin supply crossed $300B in 2025, with real transaction volumes now competing with legacy payment rails
• Tokenised real-world assets reached ~$19B, led by Treasuries and commodities
• Ethereum hosts the majority of that on-chain value
This is what will fuel the growth:
• Deposits moving on-chain for faster settlement and global liquidity
• Public equities and funds reducing issuance and operational costs via tokenisation
• Credit markets adopting programmable settlement and collateral workflows
• Banks, asset managers, fintechs, and payment networks actively launching on-chain rails
• Public blockchains increasingly used as back-end infrastructure, not front-end products
ARK Investment Management LLC's wording is telling:
“Ethereum remains the preferred blockchain for on-chain assets.”
$19B today.
$11T by 2030.
KPMG told its auditor, Grant Thornton UK, it should pass on cost savings from the rollout of AI and threatened to find a new accountant if it did not agree to a significant fee reduction, according to people familiar with the matter.
Financial Times
KPMG pressed its auditor to pass on AI cost savings
Big Four accounting firm’s move to cut fees for its own audit comes amid debate over pricing model
Waymo introduced the Waymo World Model, a frontier generative model for large-scale, hyper-realistic autonomous driving simulation, built on Google DeepMind's Genie 3.
Waymo
The Waymo World Model: A New Frontier For Autonomous Driving Simulation
We are excited to introduce the Waymo World Model, a frontier generative model that sets a new bar for large-scale, hyper-realistic autonomous driving simulation.