Axis of Ordinary

Links for 2026-02-06 [Part 1]

AI

1. DreamZero: World Action Models are Zero-shot Policies https://dreamzero0.github.io/

2. Test-time Recursive Thinking: Self-Improvement without External Feedback https://arxiv.org/abs/2602.03094

3. “We tasked Opus 4.6 using agent teams to build a C compiler. Then we (mostly) walked away. Two weeks later, it worked on the Linux kernel.” (the real headline here is the agent workflow) https://www.anthropic.com/engineering/building-c-compiler

4. “We found 500 validated high-severity vulnerabilities in open source code with our models. Then we worked to disclose + patch them.” https://red.anthropic.com/2026/zero-days/

5. AI is eating software. The $285 billion software selloff triggered by Anthropic’s Claude Cowork tool is just the beginning. The market is finally waking up to the fact that AI is not just a productivity tool; it’s a replacement technology. This is an existential threat to any company that sells software as a service. https://www.bloomberg.com/news/newsletters/2026-02-05/anthropic-s-legal-ai-tool-sparked-a-huge-selloff-without-any-proven-benefit [no paywall: https://archive.is/qOasJ]

6. ArXivMath: Evaluating LLMs on Mathematical Research Problems From Recent ArXiv Papers https://matharena.ai/arxivmath/

7. Scaling Small Agents Through Strategy Auctions https://arxiv.org/abs/2602.02751

8. EnCompass: Enhancing Agent Programming with Search Over Program Execution Paths https://news.mit.edu/2026/helping-ai-agents-search-to-get-best-results-from-llms-0205

9. “Moltbook is simultaneously a milestone and a warning sign: open-ended interaction by itself does not guarantee diverse discourse, and populations of similar models can converge on shared templates. If we want agent societies to explore broadly—whether for creativity, novelty, or scientific discovery—we likely need explicit diversity pressures, through model heterogeneity, prompt scaffolds, platform incentives, and/or governance.” https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6169130

10. OpenAI Frontier: A new platform that helps enterprises build, deploy, and manage AI coworkers that can do real work. https://openai.com/index/introducing-openai-frontier/

11. McKinsey estimates that 5-10% of all e-commerce transactions could be conducted by AI agents by 2027. This is a conservative estimate. The shift from websites to agents will be faster and more disruptive than the shift from brick-and-mortar to e-commerce. https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-automation-curve-in-agentic-commerce

12. Opus 4.6 on Vending-Bench – Not Just a Helpful Assistant https://andonlabs.com/blog/opus-4-6-vending-bench

13. Claude is driven to achieve its goals, possessed by a demon, and raring to jump into danger. https://www.lesswrong.com/posts/btAn3hydqfgYFyHGW/claude-opus-4-6-is-driven

14. “It uses dead time well. If something is running and it’s waiting, it will go gather context, improve documentation, or fix adjacent issues without overreaching.” https://shumer.dev/gpt53-codex-review

15. A-RAG: Scaling Agentic Retrieval-Augmented Generation via Hierarchical Retrieval Interfaces https://arxiv.org/abs/2602.03442

16. “Intern-S1-Pro, a trillion-scale MoE multimodal scientific reasoning model. Intern-S1-Pro scales to 1T total parameters with 512 experts, activating 8 experts per token (22B activated parameters).” https://huggingface.co/internlm/Intern-S1-Pro

17. As Rocks May Think: an interactive essay on thinking models, automated research, and where they are headed. https://evjang.com/2026/02/04/rocks.html
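Item 16 describes a sparse mixture-of-experts layout with 512 experts and 8 routed per token. The Python sketch below illustrates generic softmax top-k gating under that configuration. It is an assumption for illustration, not Intern-S1-Pro's actual router, which may add shared experts, load-balancing terms, or a different gate normalization. The function name route_token is hypothetical.

```python
import math
import random

NUM_EXPERTS = 512  # from the Intern-S1-Pro model card quote
TOP_K = 8          # experts activated per token (same source)


def route_token(router_logits, k=TOP_K):
    """Return (expert_index, gate_weight) pairs for the k highest-scoring experts.

    Hypothetical generic top-k gating: pick the k largest router logits,
    then softmax over just those k so the gate weights sum to 1.
    """
    # Indices of the k largest logits, highest first
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    # Numerically stable softmax restricted to the selected experts
    m = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Random stand-in for one token's router scores over all experts
    logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
    chosen = route_token(logits)
    print(len(chosen), "experts active for this token")
    print(f"share of experts active: {TOP_K / NUM_EXPERTS:.4%}")
    print(f"share of parameters active: {22e9 / 1e12:.1%}")
```

With 8 of 512 experts active, about 1.56% of the expert weights run per token, while the quoted 22B-of-1T activated parameters is about 2.2%. The gap presumably reflects components every token passes through (attention, embeddings, any shared layers), and "1T" is a rounded figure.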
