🗒 Calathea Daily AI Roundups · 04 Jun 2026
1. Google DeepMind announces Gemma 4 12B, a unified encoder-free multimodal model
2. Unitree becomes first embodied-AI firm to clear China IPO review; H1 adjusted net profit to fall up to 22%
3. xAI partners with Vapi to bring Grok voice models to voice agents
4. Qianxun Intelligence raises ¥1.5B Series A+ as Spirit v1.6 tops RoboArena over Nvidia, Physical Intelligence
5. Microsoft announces Frontier Tuning, letting customers customize its AI models via reinforcement learning
6. Cursor details Composer 2 training: specialized weights cut inference cost an order of magnitude below Opus
7. CyberGym-E2E, a real-world benchmark for AI agents' end-to-end cybersecurity capabilities, published on arXiv
8. Fundamental's NEXUS tabular foundation model ships on Amazon SageMaker JumpStart
9. Pinecone integrates Nexus with Microsoft OneLake at Build 2026 for enterprise AI agent retrieval
10. Anthropic raised $50B in May, 54% of a near-record $92B in global venture funding
11. Fei-Fei Li lays out world models as renderers, simulators, and planners
12. Sean Goedecke argues prompts are technical debt too, requiring ownership, review, and tests
X (formerly Twitter)
Google DeepMind (@GoogleDeepMind) on X
RT @googlegemma: Meet Gemma 4 12B!
A unified, encoder-free multimodal model designed to bring high-performance intelligence directly to yo…
This is a strong list for learning agentic coding in Jun 2026.
Like one of the comments said, half comp engineering + half LLM ops 🙃
https://vxtwitter.com/divaagurlxw/status/2062419864908951606
vxTwitter / fixvx
💖 2.18K 🔁 252
diva (@divaagurlxw)
As an AI Engineer. Please learn
>Harness engineering, not just prompt engineering
>Context engineering, not just long prompts
>Prompt caching vs. semantic caching tradeoffs
>KV cache management, eviction, reuse, and memory pressure at scale
>Prefill…
🗒 Calathea Daily AI Roundups · 05 Jun 2026
1. Ideogram releases Ideogram 4.0, a 9.3B-parameter open-weight text-to-image model trained from scratch
2. AWS launches SageMaker Data Agent in Query Editor, turning natural language into SQL for Redshift and Athena
3. Airbnb CEO Brian Chesky plans a new AI company focused on user interaction and design
4. Anthropic says Claude hits 76% on open-ended coding problems, up 50 points in 6 months
5. White House issues Executive Order 14409 on AI cybersecurity, frontier model governance, vulnerability assessment
6. NVIDIA releases Nemotron 3.5 Content Safety with custom enterprise policies, reasoning traces, 140-language support
7. Cog ships first eval product: 100-hour enterprise evals with financial guarantee, vs METR's 16-hour cap
8. Scotch raises $20M Series A led by VMG Partners; AI liquor-retail OS tops $1B in payments processed
9. Heajun An et al. introduce VEXA testbed, find AI scam-risk explanations can appear grounded yet dilute risk
10. Membrane, a self-evolving contrastive safety memory for LLM agent defense, published on arXiv
11. Alibaba AAIG to detail REAL agent-risk matrix mapping 69 vulnerabilities across 37+ products at AICon Shanghai
Follow Calathea for all things AI: https://t.me/calatheaai
X (formerly Twitter)
Hugging Face (@huggingface) on X
RT @ideogram_ai: Introducing Ideogram 4.0: the best open image model in the world.
Think it. Make it. Own it.
Download the weights, fine-…
🗒 Calathea Daily AI Roundups · 06 Jun 2026
1. Ramp leads week's US megarounds with $750M at $44B valuation; three $500M AI, space deals follow
2. Trump signs National Security Presidential Memorandum on AI, directing multi-vendor model onboarding, compute buildout
3. Google reports Gemini Enterprise Agentic RAG hits 90.1% accuracy on FramesQA cross-corpus retrieval
4. OpenAI helped build Odessia Travel, an AI travel agent, in 5 months
5. Bilibili launches Build in Bilibili AI contest; viewer coin-votes, not judges, decide prizes through Aug 20
6. Thousand Token Wood runs five-agent trading economy on Qwen2.5-3B, served via vLLM on Modal
Follow Calathea for all things AI: https://t.me/calatheaai
Crunchbase News
The Week’s 10 Biggest Funding Rounds: Megarounds Proliferate, Led By Enterprise Software, AI, And Space Tech
Startup investors were in a spendy mood this week, backing more than a dozen rounds in the multiple hundreds of millions. Of those, the biggest one went to spend-management platform Ramp, which closed on $750 million, followed by three $500 million rounds…
/loop used to mean Ralph-style retry loops.
Now it’s starting to mean continuous orchestration via cron jobs: agents supervising agents, spawning threads, checking work, recovering state, and looping until verified.
Basically, AI systems building themselves. Great read.
https://x.com/mvanhorn/status/2063865685558903149
X (formerly Twitter)
Matt Van Horn (@mvanhorn) on X
WTF Is a Loop? Peter Steinberger vs. Boris Cherny
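The "looping until verified" idea in the post above can be sketched in a few lines. This is only an illustration under assumptions: `run_until_verified`, `LoopResult`, and `toy_work` are hypothetical names, and nothing here reflects how /loop is actually implemented.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LoopResult:
    verified: bool
    attempts: int
    history: List[str] = field(default_factory=list)


def run_until_verified(
    work: Callable[[int, List[str]], str],
    verify: Callable[[str], bool],
    max_attempts: int = 5,
) -> LoopResult:
    """Call work() repeatedly until verify() accepts its output or the budget runs out."""
    history: List[str] = []
    for attempt in range(1, max_attempts + 1):
        # Pass earlier outputs along so the worker can recover state
        # instead of starting cold on every iteration.
        output = work(attempt, history)
        history.append(output)
        if verify(output):
            return LoopResult(True, attempt, history)
    return LoopResult(False, max_attempts, history)


def toy_work(attempt: int, history: List[str]) -> str:
    # Hypothetical worker that only succeeds on its third attempt.
    return "tests pass" if attempt >= 3 else f"tests fail (attempt {attempt})"


result = run_until_verified(toy_work, lambda out: out == "tests pass")
print(result.verified, result.attempts)
```

The attempt budget is the part that separates a supervised loop from a blind retry: the supervisor stops and reports failure rather than looping forever.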
🗒 Calathea Daily AI Roundups · 08 Jun 2026
1. Korgul et al. release TRAP benchmark; web agents fall to prompt injection in 13% (GPT-5) to 43% (DeepSeek-R1) of tasks
2. Boris Cherny shares 5 tips for running Opus autonomously with /loop, cloud Claude Code and self-verification
3. arXiv paper finds attack selection in agentic AI control evaluations meaningfully decreases measured safety
4. Vercel AI Gateway recovers over 1 trillion tokens a month at zero markup over model labs
5. arXiv paper introduces benchmark and agentic framework for measuring user-level privacy leakage on social media
6. arXiv paper proposes detecting LLM deception via stability asymmetry: stable internal CoT, unstable external response
7. NHS England scales Microsoft 365 Copilot to 500,000+ staff; early trials show 43 min/day saved
8. TRACE, a framework to detect hidden LLM-agent sabotage, scores 0.713 F1, 0.844 recall on SHADE-Arena
9. Google's Gemma 4 MTP support officially merged into llama.cpp for lightweight QAT inference
10. WM-MS3M, an agentic world model for 6G O-RAN control, cuts forecast RMSE 35-80%, arXiv paper says
Follow Calathea for all things AI: https://t.me/calatheaai
arXiv.org
It's a TRAP! Task-Redirecting Agent Persuasion Benchmark for Web Agents
Web-based agents powered by large language models are increasingly used for tasks such as email management or professional networking. Their reliance on dynamic web content, however, makes them...
Claude Code just added nested subagents
Anthropic’s Boris Cherny says Claude Code can now let agents launch other agents, with an initial nesting depth capped at 5.
The goal is context management: instead of forcing one agent to carry the whole task, child agents can handle isolated workstreams in their own context.
Worth testing today if you use Claude Code for agentic coding workflows.
Source: https://x.com/bcherny/status/2064327225504403752
X (formerly Twitter)
Boris Cherny (@bcherny) on X
Just landed nested subagent support in Claude Code
Starting to experiment more with agents kicking off agents as a way to better manage context. Capped at depth=5 to start, going out in today’s release.
Lmk what you think!
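A rough sketch of what a depth cap on nested agents could look like, with each child keeping its own context. Only the cap of 5 comes from the post. The `Agent` class, `spawn`, `finish`, and the choice to count the root as depth 0 are assumptions made for illustration, not Claude Code's actual design.

```python
from dataclasses import dataclass, field
from typing import List

MAX_DEPTH = 5  # cap quoted in the post; root counted as depth 0 (assumption)


class DepthLimitError(RuntimeError):
    """Raised when a spawn would exceed MAX_DEPTH."""


@dataclass
class Agent:
    task: str
    depth: int = 0
    context: List[str] = field(default_factory=list)  # private to this agent
    children: List["Agent"] = field(default_factory=list)

    def spawn(self, task: str) -> "Agent":
        """Create a child with a fresh, empty context one level deeper."""
        if self.depth >= MAX_DEPTH:
            raise DepthLimitError(f"cannot nest below depth {MAX_DEPTH}")
        child = Agent(task=task, depth=self.depth + 1)
        self.children.append(child)
        return child

    def finish(self, parent: "Agent", summary: str) -> None:
        """Hand only a short summary back, not the child's whole context."""
        parent.context.append(f"[{self.task}] {summary}")


root = Agent(task="refactor billing module")
child = root.spawn("audit call sites")
child.context.extend(["read file A", "read file B", "read file C"])
child.finish(root, "3 call sites need updating")
print(len(child.context), root.context)
```

The parent ends up holding one summary line while the child's three working notes stay in the child, which is the context-management benefit the post describes.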
🗒 Calathea Daily AI Roundups · 09 Jun 2026
1. Stephanie Palazzolo reports Anthropic is launching Claude Fable, a “neutered” Mythos variant priced at 2x Opus
2. Sam Altman shares OpenAI’s current plan for its public benefit corporation path and public-good commitments
3. OpenAI says it filed a confidential S-1 and may go public sooner if that becomes the best path
4. OpenAI Devs says ChinaRxiv retranslated 23,000+ papers after replacing a complex OCR pipeline with GPT-5.5
5. Cognition launches FrontierCode, a coding benchmark that asks real maintainers whether AI-generated code is mergeable
6. Steve Yegge launches yegge.ai as his new consulting and writing hub, says Haiku one-shotted the site
7. arXiv paper finds RAG context injection can suppress specific brands from safety-trained LLM recommendations
8. arXiv paper finds RLHF provides shallow alignment, leaves partisan structure intact in LLMs
9. Google integrates Gemini into Apple’s Foundation Models framework via Firebase SDK, starting iOS 27
Follow Calathea for all things AI: https://t.me/calatheaai
X (formerly Twitter)
Stephanie Palazzolo (@steph_palazzolo) on X
Scoop: A neutered version of Mythos called Claude Fable is coming today. It's expensive—2x the price of Opus—but perhaps not as pricey as people might have thought from the initial Mythos pricing (5x Opus).
More on that and Apple WWDC in AI Agenda:
htt…
FYI for Claude subscription users on CLI:
Use /model claude-fable-5[1m] manually; otherwise you may be defaulted to the 200k context version instead of 1M.
Fable 5 is only included for subscriptions until June 22; after that Anthropic says usage credits will be required unless capacity improves.
Source: https://www.anthropic.com/news/claude-fable-5-mythos-5
Anthropic
Claude Fable 5 and Claude Mythos 5
Today we’re launching Claude Fable 5: a Mythos-class model that we’ve made safe for general use.
👍2
Tried Claude Fable 5 with this approach and it works well.
The model seems better when you stop over-prescribing the steps. Give it the objective, define what done looks like, add verification criteria, then let it find the path.
A lot of old Claude.md files may now be too restrictive.
Source: https://x.com/alexalbert__/status/2064467657483829441
X (formerly Twitter)
Alex Albert (@alexalbert__) on X
We've reset usage limits across our products!
For those just starting to test Fable, here's four tips for using it more effectively:
1. Give it bigger, more ambitious tasks than what previous models could handle.
2. Use xhigh/high effort as your default…
👍1
No Calathea headlines today. We are using the window to rebuild the crawler properly with Claude Fable 5.
The reason is simple: Fable looks like a completely new tier of coding model, not just an incremental upgrade. On FrontierCode Main, Fable 5 scores 46.3, ahead of Claude Opus 4.8 at 34.3 and GPT-5.5 at 25.5. That is roughly 35% above the previous Claude frontier baseline on a benchmark designed to test whether code is actually good enough to merge.
This radically changes the economics of technical debt.
The old rule was to avoid rewrites unless absolutely necessary, because rewrites were slow, expensive, and risky. That tradeoff is changing. Anthropic says:
“During early testing, Stripe reported that Fable 5 compressed months of engineering into days. In a 50-million-line Ruby codebase, the model performed a codebase-wide migration in a day that would otherwise have taken a whole team over two months by hand.”
Conservatively, that is a 98% reduction in elapsed time. When the cost of rewriting falls that much, the opportunity cost flips. Calathea is only about a month old, but counterintuitively, the bigger risk is not rewriting too early. The bigger risk is letting brittle crawler assumptions, weak freshness checks, and patched source logic harden into the foundation.
So today we are taking the time to rebuild the crawler, source coverage, freshness checks, and claude -p automation path properly. The new system will also track sources that normal news crawlers miss: changelogs, release notes, docs diffs, SDK updates, app changes, and decompiled product surfaces.
The goal is to catch not just AI news, but also strong writeups, opinion pieces, and the actual product changes behind them, then curate those into snippets that hold up for technical readers and everyday readers alike. The timing matters because Anthropic says Agent SDK and claude -p usage on subscription plans will move to a separate monthly Agent SDK credit starting June 15, 2026.
Hopefully, we’ll be back tomorrow evening. Until then.
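The two percentages in the post above (roughly 35% above Opus 4.8, and a 98% cut in elapsed time) can be re-derived from the quoted numbers. A small check, assuming "over two months" means at least 60 calendar days; the 43-working-day figure is likewise an assumption, included only to show the result is not sensitive to that choice.

```python
def percent_above(new: float, baseline: float) -> float:
    """Relative gain of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100


def percent_reduction(before: float, after: float) -> float:
    """Relative drop from `before` to `after`, in percent."""
    return (before - after) / before * 100


# FrontierCode Main scores quoted in the post.
fable_5, opus_4_8, gpt_5_5 = 46.3, 34.3, 25.5

gain_vs_opus = percent_above(fable_5, opus_4_8)
gain_vs_gpt = percent_above(fable_5, gpt_5_5)

# "A day" versus "over two months": 60 calendar days or about 43 working days
# (both day counts are assumptions; the quote gives no exact figure).
cut_calendar = percent_reduction(60, 1)
cut_working = percent_reduction(43, 1)

print(f"vs Opus 4.8: +{gain_vs_opus:.1f}%")
print(f"vs GPT-5.5: +{gain_vs_gpt:.1f}%")
print(f"elapsed time cut: {cut_calendar:.1f}% (calendar), {cut_working:.1f}% (working days)")
```

Either way of counting days lands close to the 98% quoted in the post, so the claim holds up as a rough figure.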
devin.ai
Claude Fable 5 is now available in Devin
Claude Fable 5 is now available in Devin across Cloud, Desktop, and CLI. It earns the #1 score on FrontierCode, our benchmark for real-world engineering tasks graded on mergeability and quality.
🔥2❤1
Tragic and sad day. This is a terrible display of hegemonic power through the weaponisation of intelligence. Intelligence is the new hegemony.
Original Anthropic post:
https://x.com/AnthropicAI/status/2065597531644743999
My TED talk here:
https://x.com/pokeapallascat/status/2065687739702587642
🗒 Calathea Daily AI Roundups · 15 Jun 2026
1. OpenRouter introduces Fusion API, a compound-model API that runs model panels behind one endpoint
2. Telegram adds rich formatting for chatbots, including tables, nested lists, media, formulas, and headers
3. Kimi launches K2.7 Code HighSpeed, a faster mode for its open-source multimodal coding model
4. Satya Nadella says the next enterprise AI moat is a proprietary learning loop, not a single model
5. Nex says Rio 3.5 closely matches a Nex N2 Pro and Qwen 3.5 model merge, raising open-weight attribution questions
6. Anthropic’s Fable 5 and Mythos 5 access fight turns into an AI export-control flashpoint
7. The UK seeks a US carve-out for Anthropic model access as AI export restrictions widen
8. ByteDance reportedly negotiates for 50,000+ Iluvatar AI chips as China pushes GPU substitution
9. Sakana AI launches Marlin, its first commercial product for autonomous business research
10. MagicLab releases Magic-VLA K02 and Magic-Mix models for embodied-AI robots
11. OpenAI launches Partner Network with $150M to expand enterprise AI deployment
12. KPMG reportedly pulls an AI usage report after apparent hallucination errors were found
Follow Calathea for all things AI: https://t.me/calatheaai
X (formerly Twitter)
OpenRouter (@OpenRouter) on X
Introducing the Fusion API, the smartest compound model in the market.
Fusion achieves Fable-level intelligence at half the price.
How it works 👇
🔥3
🗒 Calathea Daily AI Roundups · 16 Jun 2026
1. SpaceX says it exercised its option to acquire Cursor in an all-stock deal to advance frontier AI models
2. DeepSeek becomes China’s most valuable AI startup after over $7.4 billion fundraise, WSJ reports
3. Alibaba unveils robotics AI models as it shifts from chatbots to agentic systems
4. Microsoft turns to AWS for GitHub AI capacity as internal cloud crunch hits coding tools
5. Fin joins Salesforce to scale its AI customer agent through Salesforce distribution
6. METR tests no-CoT task-completion horizons to gauge frontier model monitoring risk
7. Meta rolls out Facebook AI Mode search using public posts for AI-generated results
8. Zibianliang open-sources XRZero-G0 robotics data box, cutting collection cost to 1/20
9. NVIDIA details fusion kernels to boost MoE training throughput for large-scale AI systems
10. Artificial Analysis v4.1 shifts Intelligence Index toward agentic workloads and adds per-task cost, time and token metrics
11. Tensordyne announces Napier AI Processor using logarithmic math for AI compute
12. Microsoft launches Discovery on Azure to support agentic AI workflows for Majorana 2 quantum-chip R&D
13. U.S. startups drew nearly 88% of 2026 AI funding, Crunchbase says, with $319 billion led by OpenAI and Anthropic
14. Snowflake makes AIM migration agent its recommended path for enterprise migrations with validation workflows
15. Anthropic’s developer API can now separately flag some Claude refusals as frontier-model safety cases, instead of grouping them under older generic refusal labels
Follow Calathea for all things AI: https://t.me/calatheaai
X (formerly Twitter)
SpaceX (@SpaceX) on X
SpaceX has exercised the option to acquire @cursor_ai in an all-stock transaction with the goal of building the world’s most useful AI models.
For the past few months, SpaceXAI has been jointly training a model with Cursor, which will be released in Cursor…
🗒 Calathea Daily AI Roundups · 17 Jun 2026
1. Z.ai's GLM-5.2 tops Artificial Analysis open-weights index at 51 with 744B parameters
2. GitHub cuts off GitHub Models for new orgs on June 16, keeping existing users until retirement
3. NVIDIA says Blackwell swept MLPerf Training v6.0, ranking fastest at scale and per accelerator
4. HPE, NVIDIA expand AI Factory with Vera CPU, Agent Toolkit for production agentic AI
5. Snowflake ships Adaptive Compute GA on AWS for spiky AI and analytics queries
6. Wolfram launches Version 15 with AI assistants for Wolfram Language and Mathematica notebooks
7. Interconnects reviews Olmo-style frontier post-training recipes with Finbarr Timbers
8. OpenAI discusses better evals for measuring and forecasting model progress as benchmarks saturate or get gamed
9. Anthropic shares economic research framework for tracking Claude Code usage, task value, and scaling patterns
10. Midjourney will announce its first hardware project at a 6 p.m. PT San Francisco livestream
Follow Calathea for all things AI: https://t.me/calatheaai
👍2
🗒 Calathea Daily AI Roundups · 19 Jun 2026
1. Artificial Analysis launches AA-Briefcase, a long-horizon benchmark for agentic knowledge work
2. GLM-5.2 tops Vals AI’s index across legal, finance, proof and vibe-coding benchmarks
3. Ollama doubles US-based GPU capacity for GLM-5.2 on Blackwell B300s
4. Google announces Agentic Resource Discovery, an open spec for agents to find and verify tools, skills, MCP servers and other agents
5. White House and Anthropic shift talks toward frontier AI security rules
6. Manus investors reportedly plan a $2B buyback from Meta after China security pressure
7. xAI puts Grok models on Databricks Agent Bricks for enterprise data-agent workflows
8. OpenAI Codex adds Record & Replay, turning demos into reusable skills on macOS
9. Cursor adds GitHub issue, review and workflow-run triggers plus computer use for cloud agents
10. Midjourney moves beyond image generation with a 60-second ultrasound-style body scanner and 2027 research spa
11. Claude Code fixes a usage-limit bug that affected about 3% of Max and Pro users
12. Databricks’ AI agent playbook highlights enterprise deployment failure cases, including stale-policy traces and PII test breaches
Follow Calathea for all things AI: https://t.me/calatheaai
X (formerly Twitter)
Artificial Analysis (@ArtificialAnlys) on X
Announcing AA-Briefcase, the benchmark for the next era of agentic knowledge work
AA-Briefcase is our new benchmark for testing models on long-horizon knowledge work tasks in complex projects built by industry experts. Models are evaluated on multi-week…
🗒 Calathea Daily AI Roundups · 20 Jun 2026
1. AlphaFold co-creator John Jumper is leaving Google DeepMind for Anthropic, extending a week of major Google AI talent moves after Noam Shazeer said he is joining OpenAI
2. Popular anon AI source Arfur Grok says Barret Zoph has left OpenAI, adding another frontier-lab talent move to the week
3. SemiAnalysis says Beijing launched a space computing center for radiation-hardened AI chips, orbital compute platforms, and space-optimized LLMs
4. Dean Ball reflects on joining OpenAI to build frontier AI policy as labs move closer to the center of AI governance
5. Cloudflare rolls out temporary accounts so AI agents can deploy Workers without signup
6. Dwarkesh Patel argues AI capabilities still rely heavily on massive training data, with sample efficiency as the key bottleneck
7. Elliot Arledge releases KernelBench-Hard and KernelBench-Mega results across H100, B200, and RTX PRO 6000, with Claude Opus 4.8 leading GPU kernel generation benchmarks
8. Nous Research announces Hermes Agent v0.17.0 Reach Release
9. Federal regulators ordered grid operators to speed power access for AI datacenters
10. Peter Steinberger shared that Codex can now hand off work threads from laptops to remote hosts
X (formerly Twitter)
John Jumper (@JohnJumperSci) on X
A bit of news: After nearly 9 years, I have decided to leave Google DeepMind and join Anthropic (after taking some time to recharge). I am incredibly grateful for my time at GDM. @demishassabis took a real chance letting me lead the AlphaFold team just six…
🗒 Calathea Daily AI Roundups · 22 Jun 2026
1. Sakana says Fugu matches Fable, Mythos benchmarks and routes around vendor restrictions
2. NVIDIA's Rubin AI infrastructure uses 45°C, 100% liquid cooling to improve AI server efficiency
3. Samsung Electronics deploys ChatGPT Enterprise and Codex worldwide in major OpenAI enterprise rollout
4. llama.cpp b9745 adds Step3.5/3.7 flash MTP3 support and speculative multi-head draft paths
5. Sakana AI’s Fugu Ultra hits Vercel AI Gateway, routing work to 1-3 agents
6. PyTorch patches bmm Triton launch bug hitting vLLM tests when CUDA tensors sit on cuda:1 but device is cuda:0
X (formerly Twitter)
Sakana AI (@SakanaAILabs) on X
Fugu stands shoulder-to-shoulder with leading models like Fable and Mythos across the industry's most rigorous engineering, scientific, and reasoning benchmarks.
Read the full blog: https://t.co/JqPwOUToGQ
Beyond Bigger Models: Why are Orchestration Models…
🗒 Calathea Daily AI Roundups · 23 Jun 2026
1. Google DeepMind and A24 launch reported $75M research partnership to build filmmaker-shaped AI tools
2. SpaceX signs $6.3B compute deal with Reflection AI for GB300 access at Colossus 2
3. Samsung and SK hynix plan over 300 trillion won for Gwangju AI chip cluster
4. OpenAI expands Daybreak with Codex Security plugin and full GPT-5.5-Cyber for trusted defenders
5. Baseten raises $1.5B Series F at $13B valuation after 20x revenue growth
6. Baseten serves GLM-5.2 at 280+ tokens/sec on NVIDIA Blackwell via inference-stack optimizations
7. Micron announces Anthropic pact covering AI memory, storage, Claude adoption and Series H investment
8. Google launches Interactions API as unified interface for Gemini models and agents
9. Cursor says it is training a new model with SpaceX in 3 Compile keynote announcements
10. A-Evolve-Training post-trains 30B Nemotron to 0.86 vs top human 0.87 via autonomous 4-round loop
11. Los Alamos National Laboratory’s Mission will include 2,300 NVIDIA Vera CPUs, with Veritas adding approximately 1,150
12. GitHub Copilot adds Claude provider preview, org agents and cloud agent GA for JetBrains IDEs
13. GDPval-AA compares GLM-5.2 with Claude Fable 5, GPT-5.5 and Gemini 3.5 Flash on 3 rendered briefs
14. NVIDIA details Halos for Robotics safety stack for physical AI robots working around people
🗒 Calathea Daily AI Roundups · 24 Jun 2026
1. OpenAI unveils Jalapeño, its first AI chip built with Broadcom for ChatGPT, Codex and API workloads
2. Menlo Ventures raised $3B, its largest in 50 years, to back AI startups across sectors
3. Qwen-AgentWorld-397B-A17B scores 58.71 on 7-domain AgentWorldBench, topping GPT-5.4
4. NVIDIA says DFlash boosts Blackwell LLM inference up to 15x via lightweight draft model
5. Qualcomm agrees to acquire Modular to expand hardware-independent AI compute platform
6. Unitree advertises the R1 humanoid robot from $4,900 with ready stock
7. Claude Code 2.1.187 ships 21 CLI changes with secret blocking and 5-minute remote MCP aborts
8. Replit details ViBench, A/B tests and Telescope for improving coding agents at scale
9. Nous Research’s Hermes Agent adds /learn to turn code, PDFs and docs into reusable verifiable skills
10. NVIDIA and AWS make cuVS vector indexing default in OpenSearch Serverless, add RTX PRO 4500 EC2 G7 for AI
11. SpaceX launches $25B notes offering for debt repayment and AI expansion
12. Mistral AI makes OCR 4 available via API, Studio, SageMaker, Foundry and self-hosting
13. GPT-5.6 speculation says model could be costlier or larger than GPT-5.5, based on screenshots
14. Legal-tech firm Legion sued the US over an order limiting foreign access to top-tier Anthropic models
X (formerly Twitter)
OpenAI (@OpenAI) on X
We’ve designed and built our first AI chip: Jalapeño.
Designed from the ground up by OpenAI and brought to production with @Broadcom, Jalapeño is purpose-built for the LLM workloads powering ChatGPT, Codex, the API, and future agentic products.
Chips are…