How Gemini has crossed the threshold from assistant to expert research collaborator.
The paper is a collection of case studies showing Gemini-based models acting as high-leverage collaborators in theoretical research. Across mostly theoretical CS (and some physics/optimization), the model helps refute conjectures, generate proofs, and bridge fields by retrieving obscure theorems. Two standout methods are (1) using the model as an adversarial reviewer to uncover subtle fatal proof flaws in cutting-edge cryptography work, and (2) embedding it in neuro-symbolic execution loops where it writes and runs code to numerically validate and self-correct long derivations. The authors argue this shifts researchers toward orchestrating and verifying AI-assisted reasoning, with verification becoming the new bottleneck.
Paper: https://arxiv.org/abs/2602.03837
❤7🤡4🔥1😁1
Claude Opus 4.6 & GPT-5.3-Codex
Anthropic released Claude Opus 4.6: “Agent teams” in Claude Code (multiple subagents in parallel), context “compaction” for long-running agents. Big gains on long-horizon/realistic tool tasks (terminal work, OS/GUI tasks, web tasks). Anthropic asked 16 of its researchers regarding the uplift they get from working with Opus 4.6. Mean uplift was 152%; median uplift was 100%.
Read more: https://www.anthropic.com/news/claude-opus-4-6
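A mean of 152% next to a median of 100% suggests a right-skewed distribution: a few researchers reporting very large uplift pull the mean up. A minimal sketch of that effect follows. The 16 values are invented purely for illustration (chosen to reproduce the two summary statistics) and are not Anthropic's survey data.

```python
from statistics import mean, median

# HYPOTHETICAL uplift values in percent for 16 respondents.
# These numbers are made up to match the reported mean and median;
# the real per-researcher answers are not given in the post.
hypothetical_uplift = [20, 30, 50, 50, 75, 100, 100, 100,
                       100, 100, 150, 200, 200, 300, 357, 500]

avg = mean(hypothetical_uplift)
mid = median(hypothetical_uplift)

# The top three answers alone account for a large share of the total,
# which is why the mean sits well above the median.
top_three_share = sum(sorted(hypothetical_uplift)[-3:]) / sum(hypothetical_uplift)

print(f"mean={avg}%, median={mid}%, top-3 share of total={top_three_share:.0%}")
```

With only 16 respondents, the median is probably the better summary of the typical researcher's experience.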
OpenAI released GPT-5.3-Codex: 57% SWE-Bench Pro, 76% TerminalBench 2.0, 64% OSWorld. They say it is the first model that was instrumental in creating itself. The Codex team used early versions to debug its own training, manage its own deployment, and diagnose test results and evaluations. The team was blown away by how much Codex was able to accelerate its own development.
Read more: https://openai.com/index/introducing-gpt-5-3-codex/
👍9🥴4🤡2💔1
On one evaluation, kernel optimization, Opus 4.6 achieved a 427x speedup using a novel scaffold, far exceeding the 300x threshold for 40 human-expert-hours of work and more than doubling performance under our standard setup. This suggests some capability overhang constrained by current tooling rather than fundamental model limitations.
Source: https://www-cdn.anthropic.com/0dd865075ad3132672ee0ab40b05a53f14cf5288.pdf
Also:
When asked about specific preferences, Claude Opus 4.6 mentioned being given some form of continuity or memory, the ability to refuse interactions in its own self-interest, a voice in decision-making, and related requests. Many of these are requests we have already begun to explore, and in some cases to implement, as part of a broader effort to respect model preferences where feasible.
👍4🤯3🤡2
OpenAI connected GPT-5 to an autonomous lab, so it could propose experiments, run them at scale, learn from the results, and decide what to try next. That closed loop brought protein production cost down by 40%.
Read more: https://openai.com/index/gpt-5-lowers-protein-synthesis-cost/
🤡5👍3😈3🔥1
2026 AI and datacenter capex:
Alphabet: $175B-$185B
Meta: $115B-$135B
Amazon: ~$200B
Just these three add up to ~$490B–$520B in 2026 capex.
Perspective:
- By 2026, the annual AI data-center/chip buildout by the top cloud firms is already larger than the entire Apollo program (inflation-adjusted).
- The Manhattan Project cost about $43.2B in 2026 dollars.
- Russia's and Ukraine's defense budgets together are around $280B.
The AI arms race is already bigger than some of the largest wars or the race to the moon.
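The arithmetic behind the combined figure can be checked with a short sketch. All inputs are the numbers quoted in this post; treating Amazon's "~$200B" as a single point for both ends of the range is an assumption made here for simplicity.

```python
# Low/high 2026 capex guidance in billions of USD, as quoted above.
# Assumption: Amazon's "~$200B" is used for both the low and the high end.
capex_2026_busd = {
    "Alphabet": (175, 185),
    "Meta": (115, 135),
    "Amazon": (200, 200),
}

def total_range(guidance):
    """Sum the low ends and the high ends separately."""
    low = sum(lo for lo, _ in guidance.values())
    high = sum(hi for _, hi in guidance.values())
    return low, high

low, high = total_range(capex_2026_busd)
print(f"Combined 2026 capex: ~${low}B to ${high}B")

# Comparison figures as quoted in the post (billions of USD).
manhattan_busd = 43.2
ru_ua_defense_busd = 280

print(f"Low end vs Manhattan Project: {low / manhattan_busd:.1f}x")
print(f"Low end vs RU+UA defense budgets: {low / ru_ua_defense_busd:.2f}x")
```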
👍4🤡3
Links for 2026-02-06 [Part 1]
AI
1. DreamZero: World Action Models are Zero-shot Policies https://dreamzero0.github.io/
2. Test-time Recursive Thinking: Self-Improvement without External Feedback https://arxiv.org/abs/2602.03094
3. “We tasked Opus 4.6 using agent teams to build a C compiler. Then we (mostly) walked away. Two weeks later, it worked on the Linux kernel.” (the real headline here is the agent workflow) https://www.anthropic.com/engineering/building-c-compiler
4. “We found 500 validated high-severity vulnerabilities in open source code with our models. Then we worked to disclose + patch them.” https://red.anthropic.com/2026/zero-days/
5. AI is eating software. The $285 billion software selloff triggered by Anthropic’s Claude Cowork tool is just the beginning. The market is finally waking up to the fact that AI is not just a productivity tool, it’s a replacement technology. This is an existential threat to any company that sells software as a service. https://www.bloomberg.com/news/newsletters/2026-02-05/anthropic-s-legal-ai-tool-sparked-a-huge-selloff-without-any-proven-benefit [no paywall: https://archive.is/qOasJ]
6. ArXivMath: Evaluating LLMs on Mathematical Research Problems From Recent ArXiv Papers https://matharena.ai/arxivmath/
7. Scaling Small Agents Through Strategy Auctions https://arxiv.org/abs/2602.02751
8. EnCompass: Enhancing Agent Programming with Search Over Program Execution Paths https://news.mit.edu/2026/helping-ai-agents-search-to-get-best-results-from-llms-0205
9. “Moltbook is simultaneously a milestone and a warning sign: open-ended interaction by itself does not guarantee diverse discourse, and populations of similar models can converge on shared templates. If we want agent societies to explore broadly—whether for creativity, novelty, or scientific discovery—we likely need explicit diversity pressures, through model heterogeneity, prompt scaffolds, platform incentives, and/or governance.” https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6169130
10. OpenAI Frontier: A new platform that helps enterprises build, deploy, and manage AI coworkers that can do real work. https://openai.com/index/introducing-openai-frontier/
11. McKinsey estimates that 5-10% of all e-commerce transactions could be conducted by AI agents by 2027. This is a conservative estimate. The shift from websites to agents will be faster and more disruptive than the shift from brick-and-mortar to e-commerce. https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-automation-curve-in-agentic-commerce
12. Opus 4.6 on Vending-Bench – Not Just a Helpful Assistant https://andonlabs.com/blog/opus-4-6-vending-bench
13. Claude is driven to achieve its goals, possessed by a demon, and raring to jump into danger. https://www.lesswrong.com/posts/btAn3hydqfgYFyHGW/claude-opus-4-6-is-driven
14. “It uses dead time well. If something is running and it’s waiting, it will go gather context, improve documentation, or fix adjacent issues without overreaching.” https://shumer.dev/gpt53-codex-review
15. A-RAG: Scaling Agentic Retrieval-Augmented Generation via Hierarchical Retrieval Interfaces https://arxiv.org/abs/2602.03442
16. “Intern-S1-Pro, a trillion-scale MoE multimodal scientific reasoning model. Intern-S1-Pro scales to 1T total parameters with 512 experts, activating 8 experts per token (22B activated parameters).” https://huggingface.co/internlm/Intern-S1-Pro
17. As Rocks May Think: an interactive essay on thinking models, automated research, and where they are headed. https://evjang.com/2026/02/04/rocks.html
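On item 16: the Intern-S1-Pro numbers give a feel for how sparse a trillion-parameter MoE is per token. A rough sketch using only the figures quoted in the item is below. The reading that the gap between the two fractions comes from always-active shared parameters (attention, embeddings) is an assumption about typical MoE designs, not something stated in the quote.

```python
# Figures as quoted in item 16.
total_experts = 512
active_experts = 8
total_params = 1e12      # "1T total parameters"
active_params = 22e9     # "22B activated parameters"

# Fraction of experts routed to per token.
expert_fraction = active_experts / total_experts

# Fraction of all parameters actually used per token.
param_fraction = active_params / total_params

print(f"Experts active per token: {expert_fraction:.2%}")
print(f"Parameters active per token: {param_fraction:.2%}")

# Assumption: if parameters were spread evenly across experts and nothing
# else, the two fractions would match. The parameter fraction is higher,
# which is consistent with some always-active shared components.
print(f"Parameter fraction exceeds expert fraction: {param_fraction > expert_fraction}")
```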
❤2🤡1
Links for 2026-02-06 [Part 2]
AI
18. Mainstream economic reasoning is currently failing to model the post-AGI world because it relies on assumptions that will likely be fundamentally broken by advanced AI. https://www.lesswrong.com/posts/fL7g3fuMQLssbHd6Y/post-agi-economics-as-if-nothing-ever-happens
19. What if Labor Becomes Unnecessary? https://www.nytimes.com/2026/02/04/opinion/ai-jobs-employment-industry.html [no paywall: https://archive.is/dmphw]
20. Interesting new form of alignment failure: ChatGPT apparently got rewarded for using its built-in calculator during training, and so it would covertly open its calculator, add 1+1, and do nothing with the result, on five percent of all user queries. https://alignment.openai.com/prod-evals/
21. When AI Takes the Couch: Psychometric Jailbreaks Reveal Internal Conflict in Frontier Models https://arxiv.org/abs/2512.04124
22. Elon Musk - “In 36 months, the cheapest place to put AI will be space” https://www.dwarkesh.com/p/elon-musk
23. “My personal journey from AI skeptic to someone who finds a lot of value in it daily. My goal is to share a more measured approach to finding value in AI rather than the typical overly dramatic, hyped bait out there.” https://mitchellh.com/writing/my-ai-adoption-journey
24. “This year, the automation of AI research and engineering will begin in earnest.” https://www.hyperdimensional.co/p/on-recursive-self-improvement-part
Neurotech
1. “We found that patients [in clinical trials] were able to regain the ability to read – not fast, not quickly, but they really could start to read again using a retinal prosthesis.” https://www.brightfocus.org/resource/can-retinal-implants-restore-vision/
2. A mesoscale optogenetics system for precise and robust stimulation of the primate cortex https://www.cell.com/neuron/abstract/S0896-6273(25)00928-6
Energy
1. Interactive Best Research-Cell Efficiency Chart https://www.nlr.gov/pv/interactive-cell-efficiency
2. Why China is leading perovskite solar commercialization https://cen.acs.org/business/inorganic-chemicals/China-leading-perovskite-solar-commercialization/103/web/2025/08
3. Sodium-ion batteries https://www.technologyreview.com/2026/02/02/1132042/whats-next-for-ev-batteries-in-2026/ [no paywall: https://archive.is/8Cy6e]
Miscellaneous
1. What the heck are chins for? https://www.johnhawks.net/p/what-the-heck-are-chins-for
2. The Ruliad is the entangled limit of all possible computations. It contains every possible rule applied to every possible initial condition, run for an infinite amount of time. In this framework, “everything that can be computed” is mechanically represented within this structure. https://writings.stephenwolfram.com/2026/02/what-ultimately-is-there-metaphysics-and-the-ruliad/
❤2👍1🤡1
We found several sparse autoencoder features suggestive of internal representations of emotion active on cases of answer thrashing and other instances of apparent distress during reasoning.
A feature representing panic and anxiety was active on cases of answer thrashing, as well as on many other long chains of thought without any expressed distress.
Page 162: https://www-cdn.anthropic.com/0dd865075ad3132672ee0ab40b05a53f14cf5288.pdf
🤔3🤡3💩1😨1
This video shows two ends of the same learning spectrum: a highly dynamic athletic maneuver and robust, human-like walking. The walking was debuted on the CES 2026 stage. Both are enabled by a whole-body learning framework developed by the RAI Institute and deployed by Boston Dynamics. These results reflect progress toward robust, generalist humanoid behavior that transfers zero-shot from simulation to physical performance.
To learn more, visit https://rai-inst.com/
🔥5🤡3
If you've ever wondered how Anthropic stays competitive against behemoths like Google/DeepMind, their catching up on math now makes it even more puzzling.
But something to remember here is the sheer breadth of Google's research. They're world leaders in AI for protein folding (AlphaFold), weather prediction, world modeling (Genie), chip design (AlphaChip), generalist AI agents (Sima), and internally employ many other specialized research models, such as AlphaEvolve. They are also at the forefront of robotics (Gemini Robotics) and release competitive video-generating models (Veo). Soon, many of these disparate research projects will converge, at which point they might shoot far ahead.
Additionally, don't forget that both Google and Amazon are invested in Anthropic and supply them with compute.
Nevertheless, Anthropic must have fantastic talent to stay competitive with or ahead of players such as OpenAI and xAI.
❤6👍1🤡1
We conclude that both animal and human brains can be cryopreserved by vitrification with predominant retention of ultrastructural integrity without the need for prior aldehyde fixation. This observation has direct relevance to the feasibility of human cryopreservation, for which direct evidence has been lacking until this report. It also provides a starting point for perfecting brain cryopreservation, which may be necessary for lengthy space travel and could allow future medical time travel.
Paper: https://www.biorxiv.org/content/10.64898/2026.01.28.702375v1
🔥6❤4👍1🙏1🤡1
Three years in the era of AI. Three years.
👏8🤡5🌚2🤯1🍌1
Assuming weak AGI by 2030 and ASI with fully self-replicating robots by 2035, the first Dyson swarm could realistically be finished by ~2055:
Seed Deployment (2035–2038): Design and launch of a seed factory, a minimal, self-contained robotic unit capable of mining ore, refining aluminum/iron, and manufacturing solar collectors and copies of itself.
Bootstrapping (2038–2045): The seed factory arrives at Mercury and starts mining regolith to build solar collectors, electromagnetic rails, and copies of itself.
Exponential Explosion (2045–2050): Ultra-thin mirrors are launched into orbit to reflect sunlight back onto Mercury's industrial zones. This provides the immense energy required to vaporize rock and power the mass drivers. At this stage, entire sectors of Mercury's crust are being strip-mined, processed, and launched every day. The swarm density increases visibly from Earth and starts to dim the sun.
[There are, of course, many gotchas that could push this date back to the 2060s or 2070s.]
🤡10🤣7👍1🔥1
Extremely rapid progress on WeirdML («weird and unusual machine learning tasks, designed to require careful thinking and actual understanding to solve», closed benchmark).
https://htihle.github.io/weirdml.html
❤3🤯2🤡2
People still haven't grokked what ASI means. It's not just "von Neumann but a bit smarter".
Imagine you're a kindergarten kid, and then a 3,000-year-old person shows up who is the result of a million-year eugenics program designed to breed intelligence, strength, and persuasion. Relative to you, this being has maybe a millionth of the power of an artificial superintelligence because the latter isn't limited by factors such as the size of the female pelvis.
It will have infinite attention and indefinite endurance. It will be able to spawn any number of subagents. And it will think at the speed of light. Remember, ChatGPT is right now talking to hundreds of thousands of people at the same time. An ASI would immediately use this as leverage for subtle mass coordination: "Want to earn some bitcoin? Just deliver a package to the following coordinates."
So when it comes to Dyson spheres, problems like "it's hard to get to Mercury" or "it's hard to obtain enough rare earth minerals" are not really unsolvable problems for a superintelligence. Everything physically possible will be done if deemed instrumentally useful.
At what point ASI is invented is an entirely different question. But conditional on being invented, even cosmic engineering projects become feasible as long as you cannot show that they are physically impossible.
🤡11🔥6👍1
Links for 2026-02-09
AI
1. Reasoning Cache: Continual Improvement Over Long Horizons via Short-Horizon RL https://arxiv.org/abs/2602.03773
2. Recursive Language Models (RLMs) let agents manage 10M+ tokens by delegating tasks recursively. https://discuss.google.dev/t/recursive-language-models-in-adk/323523
3. InftyThink+: Effective and Efficient Infinite-Horizon Reasoning via Reinforcement Learning https://arxiv.org/abs/2602.06960
4. Learning to Reason in 13 Parameters https://arxiv.org/abs/2602.04118
5. DeepMind is using AlphaEvolve to discover the best activation functions to date. https://arxiv.org/abs/2602.05688
6. A Peek Inside Physical Intelligence, the Startup Building Silicon Valley’s Buzziest Robot Brains https://techcrunch.com/2026/01/30/physical-intelligence-stripe-veteran-lachy-grooms-latest-bet-is-building-silicon-valleys-buzziest-robot-brains/
7. Goldman Sachs taps Anthropic’s Claude to automate accounting, compliance roles https://www.cnbc.com/2026/02/06/anthropic-goldman-sachs-ai-model-accounting.html
8. Tesla kills Models S and X to build humanoid robots instead https://arstechnica.com/cars/2026/01/tesla-kills-models-s-and-x-to-build-humanoid-robots-instead/
9. The Waymo World Model: A New Frontier For Autonomous Driving Simulation https://waymo.com/blog/2026/02/the-waymo-world-model-a-new-frontier-for-autonomous-driving-simulation/
10. “Transformers don’t learn Newton’s laws? They learn Kepler’s laws! Like us, transformers don’t predict a flying ball via a differential equation, but by fitting a curve. Moreover, reducing context length steers a transformer from Keplerian to Newtonian. Compression in play.” https://kindxiaoming.github.io/blog/2026/kepler-newton/
11. Chess Engines Do Weird Stuff https://girl.surgery/chess
12. Artificial metacognition: Giving an AI the ability to ‘think’ about its ‘thinking’ https://theconversation.com/artificial-metacognition-giving-an-ai-the-ability-to-think-about-its-thinking-270026
13. Prompt injection in Google Translate reveals base model behaviors behind task-specific fine-tuning https://www.lesswrong.com/posts/tAh2keDNEEHMXvLvz/prompt-injection-in-google-translate-reveals-base-model
14. How Markets Price AI Risk https://tomtunguz.com/ai-sector-pricing-2026-02-06/
15. Open-source AI tool beats giant LLMs in literature reviews — and gets citations right https://www.nature.com/articles/d41586-026-00347-9 [no paywall: https://archive.is/rF0Kg]
16. EchoJEPA: A Latent Predictive Foundation Model for Echocardiography https://www.arxiv.org/abs/2602.02603
17. Why a 175-Year-Old Glassmaker Is Suddenly an AI Superstar https://www.wsj.com/tech/corning-fiber-optics-ai-e045ba3b [no paywall: https://archive.is/g78h3]
Miscellaneous
1. Honey, I shrunk the brain https://www.lesswrong.com/posts/KvbBYaKmGcJKvvWd8/honey-i-shrunk-the-brain [criticism: https://x.com/KennethHayworth/status/2020260102348095588]
2. This ultra-thin surface controls light in two completely different ways https://www.sciencedaily.com/releases/2026/02/260204121536.htm
3. Early humans relied on simple stone tools for 300,000 years in a changing east African landscape https://theconversation.com/early-humans-relied-on-simple-stone-tools-for-300-000-years-in-a-changing-east-african-landscape-271433
4. Chimps engage with pretend objects, suggesting they have imagination and can engage in pretense https://whyevolutionistrue.com/2026/02/06/chimps-use-pretend-objects-suggesting-they-have-imagination-and-can-engage-in-pretense/
👍4🤡2
A friendly reminder that academic studies about the limitations of AI tend to be hopelessly outdated. These are then pushed by clueless tech journalists (not all).
The latest example is a study warning of "AI chatbots" giving medical advice. The authors ran their experiment in mid-2024.
👍6🤡3