Calathea

🗒 Calathea Daily AI Roundups · 04 Jun 2026

1. Google DeepMind announces Gemma 4 12B, a unified encoder-free multimodal model

2. Unitree becomes first embodied-AI firm to clear China IPO review; H1 adjusted net profit to fall up to 22%

3. xAI partners with Vapi to bring Grok voice models to voice agents

4. Qianxun Intelligence raises ¥1.5B Series A+ as Spirit v1.6 tops RoboArena over Nvidia, Physical Intelligence

5. Microsoft announces Frontier Tuning, letting customers customize its AI models via reinforcement learning

6. Cursor details Composer 2 training: specialized weights cut inference cost an order of magnitude below Opus

7. CyberGym-E2E, a real-world benchmark for AI agents' end-to-end cybersecurity capabilities, published on arXiv

8. Fundamental's NEXUS tabular foundation model ships on Amazon SageMaker JumpStart

9. Pinecone integrates Nexus with Microsoft OneLake at Build 2026 for enterprise AI agent retrieval

10. Anthropic raised $50B in May, 54% of a near-record $92B in global venture funding

11. Fei-Fei Li lays out world models as renderers, simulators, and planners

12. Sean Goedecke argues prompts are technical debt too, requiring ownership, review, and tests

X (formerly Twitter)

Google DeepMind (@GoogleDeepMind) on X

RT @googlegemma: Meet Gemma 4 12B!

A unified, encoder-free multimodal model designed to bring high-performance intelligence directly to yo…

41 views · edited 12:27

Calathea

This is a strong list for learning agentic coding in Jun 2026.

As one of the comments said, half comp engineering + half LLM ops 🙃

https://vxtwitter.com/divaagurlxw/status/2062419864908951606

vxTwitter / fixvx
💖 2.18K 🔁 252

diva (@divaagurlxw)

As an AI Engineer. Please learn

>Harness engineering, not just prompt engineering

>Context engineering, not just long prompts

>Prompt caching vs. semantic caching tradeoffs

>KV cache management, eviction, reuse, and memory pressure at scale

>Prefill…

34 views · 06:04

Calathea

🗒 Calathea Daily AI Roundups · 05 Jun 2026

1. Ideogram releases Ideogram 4.0, a 9.3B-parameter open-weight text-to-image model trained from scratch
2. AWS launches SageMaker Data Agent in Query Editor, turning natural language into SQL for Redshift and Athena
3. Airbnb CEO Brian Chesky plans a new AI company focused on user interaction and design
4. Anthropic says Claude hits 76% on open-ended coding problems, up 50 points in 6 months
5. White House issues Executive Order 14409 on AI cybersecurity, frontier model governance, vulnerability assessment
6. NVIDIA releases Nemotron 3.5 Content Safety with custom enterprise policies, reasoning traces, 140-language support
7. Cog ships first eval product: 100-hour enterprise evals with financial guarantee, vs METR's 16-hour cap
8. Scotch raises $20M Series A led by VMG Partners; AI liquor-retail OS tops $1B in payments processed
9. Heajun An et al. introduce VEXA testbed, find AI scam-risk explanations can appear grounded yet dilute risk
10. Membrane, a self-evolving contrastive safety memory for LLM agent defense, published on arXiv
11. Alibaba AAIG to detail REAL agent-risk matrix mapping 69 vulnerabilities across 37+ products at AICon Shanghai

Follow Calathea for all things AI: https://t.me/calatheaai

X (formerly Twitter)

Hugging Face (@huggingface) on X

RT @ideogram_ai: Introducing Ideogram 4.0: the best open image model in the world.

Think it. Make it. Own it.

Download the weights, fine-…

38 views · edited 12:12

Calathea

🗒 Calathea Daily AI Roundups · 06 Jun 2026

1. Ramp leads week's US megarounds with $750M at $44B valuation; three $500M AI, space deals follow
2. Trump signs National Security Presidential Memorandum on AI, directing multi-vendor model onboarding, compute buildout
3. Google reports Gemini Enterprise Agentic RAG hits 90.1% accuracy on FramesQA cross-corpus retrieval
4. OpenAI helped build Odessia Travel, an AI travel agent, in 5 months
5. Bilibili launches Build in Bilibili AI contest; viewer coin-votes, not judges, decide prizes through Aug 20
6. Thousand Token Wood runs five-agent trading economy on Qwen2.5-3B, served via vLLM on Modal

Crunchbase News

The Week’s 10 Biggest Funding Rounds: Megarounds Proliferate, Led By Enterprise Software, AI, And Space Tech

Startup investors were in a spendy mood this week, backing more than a dozen rounds in the multiple hundreds of millions. Of those, the biggest one went to spend-management platform Ramp, which closed on $750 million, followed by three $500 million rounds…

35 views · edited 13:22

Calathea

/loop used to mean Ralph-style retry loops.

Now it’s starting to mean continuous orchestration via cron jobs: agents supervising agents, spawning threads, checking work, recovering state, and looping until verified.
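
As a rough illustration of the "looping until verified" idea (not how /loop itself is implemented; `run_until_verified`, `worker`, and `verifier` are hypothetical names made up for this sketch), a supervising loop can be pictured as retrying a worker with feedback until a checker accepts the result or an attempt budget runs out:

```python
from typing import Callable, Optional


def run_until_verified(
    worker: Callable[[Optional[str]], str],
    verifier: Callable[[str], Optional[str]],
    max_attempts: int = 5,
) -> tuple[str, int]:
    """Call worker until verifier accepts its output.

    worker receives the previous failure message (None on the first try).
    verifier returns None when the result passes, else a failure message.
    Both are hypothetical stand-ins for an agent call and a check.
    """
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        result = worker(feedback)
        feedback = verifier(result)
        if feedback is None:
            return result, attempt
    raise RuntimeError(f"not verified after {max_attempts} attempts: {feedback}")


# Toy stand-ins: the worker only gets it right once it has seen feedback.
def toy_worker(feedback: Optional[str]) -> str:
    return "fixed" if feedback else "draft"


def toy_verifier(result: str) -> Optional[str]:
    return None if result == "fixed" else "tests failed"


result, attempts = run_until_verified(toy_worker, toy_verifier)
print(result, attempts)
```

The attempt budget is the design choice that matters here: without it, a verifier that never passes would loop forever.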

Basically, AI systems building themselves. Great read.

https://x.com/mvanhorn/status/2063865685558903149

X (formerly Twitter)

Matt Van Horn (@mvanhorn) on X

WTF Is a Loop? Peter Steinberger vs. Boris Cherny

28 views · edited 10:00

Calathea

🗒 Calathea Daily AI Roundups · 08 Jun 2026

1. Korgul et al. release TRAP benchmark; web agents fall to prompt injection in 13% (GPT-5) to 43% (DeepSeek-R1) of tasks
2. Boris Cherny shares 5 tips for running Opus autonomously with /loop, cloud Claude Code and self-verification
3. arXiv paper finds attack selection in agentic AI control evaluations meaningfully decreases measured safety
4. Vercel AI Gateway recovers over 1 trillion tokens a month at zero markup over model labs
5. arXiv paper introduces benchmark and agentic framework for measuring user-level privacy leakage on social media
6. arXiv paper proposes detecting LLM deception via stability asymmetry: stable internal CoT, unstable external response
7. NHS England scales Microsoft 365 Copilot to 500,000+ staff; early trials show 43 min/day saved
8. TRACE, a framework to detect hidden LLM-agent sabotage, scores 0.713 F1, 0.844 recall on SHADE-Arena
9. Google's Gemma 4 MTP support officially merged into llama.cpp for lightweight QAT inference
10. WM-MS3M, an agentic world model for 6G O-RAN control, cuts forecast RMSE 35-80%, arXiv paper says

arXiv.org

It's a TRAP! Task-Redirecting Agent Persuasion Benchmark for Web Agents

Web-based agents powered by large language models are increasingly used for tasks such as email management or professional networking. Their reliance on dynamic web content, however, makes them...

33 views · 11:50

Calathea

Claude Code just added nested subagents

Anthropic’s Boris Cherny says Claude Code can now let agents launch other agents, with an initial nesting depth capped at 5.

The goal is context management: instead of forcing one agent to carry the whole task, child agents can handle isolated workstreams in their own context.
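
A toy sketch of that idea, not Claude Code's actual implementation or API: the `Agent` class and its `spawn` and `run` methods are hypothetical names invented here, and counting the root agent as depth 0 is an assumption. Only the cap of 5 comes from the post.

```python
from dataclasses import dataclass, field

MAX_DEPTH = 5  # the cap mentioned in the post; everything else is illustrative


@dataclass
class Agent:
    task: str
    depth: int = 0  # assumption: the root agent counts as depth 0
    context: list[str] = field(default_factory=list)  # private to this agent

    def spawn(self, subtask: str) -> "Agent":
        """Create a child with a fresh, empty context, one level deeper."""
        if self.depth + 1 > MAX_DEPTH:
            raise RuntimeError(f"nesting depth would exceed {MAX_DEPTH}")
        return Agent(task=subtask, depth=self.depth + 1)

    def run(self, subtasks: list[str]) -> str:
        """Delegate each subtask to a child; keep one summary line per child."""
        for subtask in subtasks:
            child = self.spawn(subtask)
            # Working notes stay in the child's context only.
            child.context.append(f"working notes for {subtask}")
            # The parent records a short summary instead of the full notes.
            self.context.append(f"summary: {subtask} done")
        return f"{self.task}: {len(self.context)} summaries"


root = Agent("refactor crawler")
print(root.run(["parse feeds", "dedupe", "freshness checks"]))
```

The point of the sketch is the asymmetry: each child accumulates its own notes, while the parent's context grows by one line per delegated workstream.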

Worth testing today if you use Claude Code for agentic coding workflows.

Source: https://x.com/bcherny/status/2064327225504403752

X (formerly Twitter)

Boris Cherny (@bcherny) on X

Just landed nested subagent support in Claude Code

Starting to experiment more with agents kicking off agents as a way to better manage context. Capped at depth=5 to start, going out in today’s release.

Lmk what you think!

38 views · 13:16

Calathea

🗒 Calathea Daily AI Roundups · 09 Jun 2026

1. Stephanie Palazzolo reports Anthropic is launching Claude Fable, a “neutered” Mythos variant priced at 2x Opus
2. Sam Altman shares OpenAI’s current plan for its public benefit corporation path and public-good commitments
3. OpenAI says it filed a confidential S-1 and may go public sooner if that becomes the best path
4. OpenAI Devs says ChinaRxiv retranslated 23,000+ papers after replacing a complex OCR pipeline with GPT-5.5
5. Cognition launches FrontierCode, a coding benchmark that asks real maintainers whether AI-generated code is mergeable
6. Steve Yegge launches yegge.ai as his new consulting and writing hub, says Haiku one-shotted the site
7. arXiv paper finds RAG context injection can suppress specific brands from safety-trained LLM recommendations
8. arXiv paper finds RLHF provides shallow alignment, leaves partisan structure intact in LLMs
9. Google integrates Gemini into Apple’s Foundation Models framework via Firebase SDK, starting iOS 27

X (formerly Twitter)

Stephanie Palazzolo (@steph_palazzolo) on X

Scoop: A neutered version of Mythos called Claude Fable is coming today. It's expensive—2x the price of Opus—but perhaps not as pricey as people might have thought from the initial Mythos pricing (5x Opus).

More on that and Apple WWDC in AI Agenda:

htt…

41 views · edited 16:37

Calathea

FYI for Claude subscription users on CLI:

Use /model claude-fable-5[1m] manually; otherwise you may default to the 200k context version instead of 1M.

Fable 5 is included in subscriptions only until June 22; after that, Anthropic says usage credits will be required unless capacity improves.

Source: https://www.anthropic.com/news/claude-fable-5-mythos-5

Anthropic

Claude Fable 5 and Claude Mythos 5

Today we’re launching Claude Fable 5: a Mythos-class model that we’ve made safe for general use.

👍2

45 views · 18:07

Calathea

Tried Claude Fable 5 with this approach and it works well.

The model seems better when you stop over-prescribing the steps. Give it the objective, define what done looks like, add verification criteria, then let it find the path.

A lot of old CLAUDE.md files may now be too restrictive.

Source: https://x.com/alexalbert__/status/2064467657483829441

X (formerly Twitter)

Alex Albert (@alexalbert__) on X

We've reset usage limits across our products!

For those just starting to test Fable, here's four tips for using it more effectively:
1. Give it bigger, more ambitious tasks than what previous models could handle.
2. Use xhigh/high effort as your default…

👍1

43 views · 06:28

Calathea

No Calathea headlines today. We are using the window to rebuild the crawler properly with Claude Fable 5.

The reason is simple: Fable looks like a completely new tier of coding model, not just an incremental upgrade. On FrontierCode Main, Fable 5 scores 46.3, ahead of Claude Opus 4.8 at 34.3 and GPT-5.5 at 25.5. That is 12 points, or roughly 35%, above Opus 4.8, the previous Claude frontier baseline, on a benchmark designed to test whether code is actually good enough to merge.
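
For readers who want to check the benchmark comparison, here is a minimal sketch of the arithmetic. It uses only the three FrontierCode Main scores quoted above; the `relative_gain` helper is a name made up for this example.

```python
# FrontierCode Main scores as quoted in the post.
scores = {"Claude Fable 5": 46.3, "Claude Opus 4.8": 34.3, "GPT-5.5": 25.5}


def relative_gain(new: float, old: float) -> float:
    """Percentage by which `new` exceeds `old`."""
    return (new - old) / old * 100


fable = scores["Claude Fable 5"]
for name in ("Claude Opus 4.8", "GPT-5.5"):
    base = scores[name]
    print(f"vs {name}: +{fable - base:.1f} points, +{relative_gain(fable, base):.0f}%")
```

Note the difference between points and percent: the gap to Opus 4.8 is 12 points, which is about 35% relative to Opus 4.8's own score.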

This radically changes the economics of technical debt.

The old rule was to avoid rewrites unless absolutely necessary, because rewrites were slow, expensive, and risky. That tradeoff is changing. Anthropic says:

“During early testing, Stripe reported that Fable 5 compressed months of engineering into days. In a 50-million-line Ruby codebase, the model performed a codebase-wide migration in a day that would otherwise have taken a whole team over two months by hand.”

Even taking "over two months" as just 60 days, that is about a 98% reduction in elapsed time. When the cost of rewriting falls that much, the opportunity cost flips. Calathea is only about a month old, but counterintuitively, the bigger risk is not that we rewrite too early. It is letting brittle crawler assumptions, weak freshness checks, and patched source logic harden into the foundation.
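
The elapsed-time figure can be sanity-checked with a short calculation. The day counts are assumptions, since the quote only says "a day" and "over two months": 60 calendar days, or about 42 working days, for the team baseline. The `time_reduction` helper is a name made up for this example.

```python
def time_reduction(new_days: float, old_days: float) -> float:
    """Percentage reduction in elapsed time going from old_days to new_days."""
    return (1 - new_days / old_days) * 100


# Assumed baselines for "over two months"; both are lower bounds,
# so the real reduction would be at least this large.
for label, baseline_days in (("calendar days", 60), ("working days", 42)):
    print(f"{label}: {time_reduction(1, baseline_days):.1f}% less elapsed time")
```

Either way of counting lands within half a point of 98%.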

So today we are taking the time to rebuild the crawler, source coverage, freshness checks, and claude -p automation path properly. The new system will also track sources that normal news crawlers miss: changelogs, release notes, docs diffs, SDK updates, app changes, and decompiled product surfaces.

The goal is to catch not just AI news, but also strong writeups, opinion pieces, and the actual product changes behind them, then curate those into snippets that hold up for technical readers and everyday readers alike. The timing matters because Anthropic says Agent SDK and claude -p usage on subscription plans will move to a separate monthly Agent SDK credit starting June 15, 2026.

Hopefully, we’ll be back tomorrow evening. Until then.

devin.ai

Claude Fable 5 is now available in Devin

Claude Fable 5 is now available in Devin across Cloud, Desktop, and CLI. It earns the #1 score on FrontierCode, our benchmark for real-world engineering tasks graded on mergeability and quality.

🔥2❤1

48 views · edited 16:34

Calathea

Tragic and sad day. This is a terrible display of hegemonic power through the weaponisation of intelligence. Intelligence is the new hegemony.

Original Anthropic post:
https://x.com/AnthropicAI/status/2065597531644743999

My TED talk here:
https://x.com/pokeapallascat/status/2065687739702587642

37 views · 07:03

Calathea