Apple's Xcode now has direct integration with the Claude Agent SDK, giving developers the full functionality of Claude Code for building on Apple platforms, from iPhone to Mac to Apple Vision Pro.
Anthropic
Apple’s Xcode now supports the Claude Agent SDK
Anthropic is an AI safety and research company that's working to build reliable, interpretable, and steerable AI systems.
Google launched a first-of-its-kind nationwide randomized study with Included Health to evaluate AI in a real-world virtual care setting and better understand its capabilities and limitations.
Google received Institutional Review Board (IRB) approval for the study, a prospective, consented, nationwide randomized trial assessing AI in a real-world virtual care setting. This new research will build upon Google's foundational research on the use of AI for diagnostic and management reasoning, personalized health insights, and navigating health information.
This study is informed by years of foundational research across Google, investigating the capabilities required for a helpful and safe medical AI.
Google Research
Collaborating on a nationwide randomized study of AI in real-world virtual care
In partnership with Included Health, we will be launching a first-of-its-kind nationwide study to evaluate conversational AI within real-world virtual care workflows. This research will move beyond simulation and retrospective data and aim to gather rigorous…
Meet Q Labs, a research lab focused on solving generalization.
Alongside others (SSI, Flapping Airplanes), Q Labs sees data efficiency as the key problem, but it's taking an unconventional approach to solving it: a new learning algorithm approximating Solomonoff induction.
Why Solomonoff Induction? It's provably optimal for prediction. The idea is simple: search for all programs that fit the data and favor low-complexity ones.
Since it's uncomputable, they're building a practical approximation in the context of neural nets.
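The search-and-weight idea can be shown with a toy sketch: enumerate a tiny hypothesis space standing in for "all programs", keep those that reproduce the data, and weight survivors by 2^-length so simpler hypotheses dominate the prediction. The hypothesis space and weighting below are purely illustrative, not Q Labs' actual algorithm:

```python
# Toy Solomonoff-style induction: filter programs that fit the data,
# then predict with a simplicity-weighted vote among the survivors.
data = [1, 2, 4, 8]  # observations at n = 0, 1, 2, 3

# A tiny hypothesis space standing in for "all programs".
candidates = ["2**n", "n+1", "n*n+1", "2**n + 0*n", "8 if n==3 else 2**n"]

def fits(expr: str, data: list[int]) -> bool:
    return all(eval(expr, {"n": n}) == y for n, y in enumerate(data))

survivors = [e for e in candidates if fits(e, data)]
weights = {e: 2 ** -len(e) for e in survivors}  # shorter => higher prior

# Predict the next observation as a weighted vote among surviving programs.
total = sum(weights.values())
prediction = sum(w * eval(e, {"n": len(data)}) for e, w in weights.items()) / total
print(sorted(survivors, key=len)[0], prediction)  # 2**n 16.0
```

Here every surviving program agrees on the next value, so the vote is unanimous; in general, the low-complexity survivors dominate the mixture.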
qlabs.sh
Q - Research Lab Solving Generalization
Q is a research lab building learning algorithms beyond gradient descent to solve generalization -- the core open problem in AI.
Mistral introduced Voxtral Transcribe 2, its next-gen speech-to-text models.
SOTA transcription, speaker diarization, sub-200ms real-time latency.
Voxtral Realtime is built for voice agents and live applications. Its natively streaming architecture delivers latency configurable to sub-200ms. And at 480ms, it stays within 1-2% WER of the offline model.
Mistral released the model as open weights under Apache 2.0.
The demo is worth a try - ignore the "No microphone found" message, clicking "Record" and allowing your browser to use a microphone fixes that. It transcribes very accurately in almost real-time. It's really impressive.
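For reference, the WER figure quoted above is word error rate: word-level edit distance divided by the reference word count. A minimal self-contained implementation:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # DP table for edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution or match
    return d[-1][-1] / len(ref)

print(wer("switch the lights off", "switch the light off"))  # 0.25
```

So "within 1-2% WER" means roughly one extra word-level error per 50-100 reference words compared with the offline model.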
huggingface.co
Voxtral Mini Realtime - a Hugging Face Space by mistralai
This app lets you speak into your microphone and see your words appear as live text. Just enter your Mistral API key, click the mic button, and talk – the app streams the audio to the Voxtral model...
OpenAI introduced Frontier, a new platform that helps enterprises build, deploy, and manage AI coworkers that can do real work.
Frontier gives agents the same skills people need to succeed at work:
- Understand how work gets done
- Use a computer and tools
- Improve quality over time
- Stay governed & observable
OpenAI also pairs Forward Deployed Engineers with your team, working side by side to develop best practices for building and running agents in production.
Anthropic introduced Claude Opus 4.6, its first Opus-class model with a 1M-token context window (in beta).
Opus 4.6 plans more carefully, sustains agentic tasks for longer, operates reliably in massive codebases, and catches its own mistakes.
Opus 4.6 is SOTA on several evaluations including agentic coding, multi-discipline reasoning, knowledge work, and agentic search.
Anthropic is also shipping new features across Claude in Excel, Claude in PowerPoint, Claude Code, and the API to let Opus 4.6 do even more.
Claude in Excel now handles long-running and harder tasks with improved performance.
It can plan before acting, support richer functionality like conditional formatting and data validation, and handle multi-step changes in one pass.
In Claude Code, Anthropic introduced agent teams: spin up multiple agents that coordinate autonomously and work in parallel. This works best for tasks that can be split up and tackled independently.
Agent teams are in research preview.
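The agent-teams pattern, splitting a task into independent subtasks and fanning them out in parallel, can be sketched generically. This is an illustrative pattern, not Claude Code's actual implementation; `run_agent` is a stand-in:

```python
from concurrent.futures import ThreadPoolExecutor

def run_agent(subtask: str) -> str:
    # Stand-in for dispatching one team member; a real agent would plan,
    # edit files, run tests, and report back.
    return f"done: {subtask}"

subtasks = ["fix lint errors", "write unit tests", "update docs"]
with ThreadPoolExecutor(max_workers=len(subtasks)) as pool:
    results = list(pool.map(run_agent, subtasks))  # subtasks run in parallel
print(results)
```

`pool.map` preserves input order, so the merged results line up with the subtask list even though the agents finish at different times.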
A general framework for AI agents.
What if a single framework could unify every AI agent, from software chatbots to physical robots?
Dr. Hang Li from ByteDance proposes exactly that in a new JCST paper.
It's a universal blueprint where agents use LLMs as their "brain" for reasoning, are built via reinforcement learning, and operate using tools & long-term memory to complete tasks.
The paper argues that this general framework improves on fragmented approaches, providing a unified theory for agent development across both software and hardware domains.
$11T by 2030. That's ARK Investment Management's estimate of how large on-chain assets could grow by 2030, driven by deposits, public equities, credit, and funds moving on-chain.
This is where we are today:
• Stablecoin supply crossed $300B in 2025, with real transaction volumes now competing with legacy payment rails
• Tokenised real-world assets reached ~$19B, led by Treasuries and commodities
• Ethereum hosts the majority of that on-chain value
This is what will fuel the growth:
• Deposits moving on-chain for faster settlement and global liquidity
• Public equities and funds reducing issuance and operational costs via tokenisation
• Credit markets adopting programmable settlement and collateral workflows
• Banks, asset managers, fintechs, and payment networks actively launching on-chain rails
• Public blockchains increasingly used as back-end infrastructure, not front-end products
ARK Investment Management LLC's wording is telling:
“Ethereum remains the preferred blockchain for on-chain assets.”
$19B today.
$11T by 2030.
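For scale, a quick back-of-the-envelope (assuming ~$19B today in 2025 and a 5-year horizon to 2030) shows the implied annual growth multiple:

```python
start, target, years = 19e9, 11e12, 5  # ~$19B (2025) -> ~$11T (2030)
annual_multiple = (target / start) ** (1 / years)
print(round(annual_multiple, 2))  # on-chain assets would need to grow ~3.6x every year
```

In other words, hitting ARK's number requires sustained triple-digit annual growth, far beyond what tokenised real-world assets have shown so far.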
KPMG told its auditor, Grant Thornton UK, it should pass on cost savings from the rollout of AI and threatened to find a new accountant if it did not agree to a significant fee reduction, the people said.
FT
KPMG pressed its auditor to pass on AI cost savings
Big Four accounting firm’s move to cut fees for its own audit comes amid debate over pricing model
Waymo introduced the Waymo World Model, a frontier generative model for large-scale, hyper-realistic autonomous driving simulation built on Google DeepMind’s Genie 3.
Waymo
The Waymo World Model: A New Frontier For Autonomous Driving Simulation
We are excited to introduce the Waymo World Model, a frontier generative model that sets a new bar for large-scale, hyper-realistic autonomous driving simulation.
Meta is preparing new Avocado models, a Manus browser agent, and integration with OpenClaw.
What’s new?
- Meta AI website got migrated to the new stack while retaining the same user experience.
- New effort selector, email, and calendar connectors are already available to users.
- Memory and Projects are in the works.
- Avocado and Avocado Thinking models have been spotted in testing. The router still redirects to the Llama model.
- Meta is testing different models from other providers underneath, including Gemini 3 Pro preview, Claude Sonnet 4.5 and GPT-5.2. This part was likely inherited from Manus AI but used only internally.
- Scheduled tasks feature is under development.
- A new model named Sierra is being tested to power the upcoming browser agent. This likely will be the same agent used in Manus AI currently.
- Big Brain mode is in the works, where multiple model responses will be combined into a final answer.
- OpenClaw integration has been spotted to let Meta AI connect to your OpenClaw gateway.
In short, Meta is on a path to slowly transform into Manus. OpenClaw integration might be big, but it's something other labs could easily adopt as well. Whether Avocado's performance will surpass recently released models remains to be seen; Avocado is expected to launch around this spring.
TestingCatalog
Meta AI readies Avocado, Manus agent and OpenClaw integration
Meta AI is testing Avocado models, MCP integrations, and Manus browser agent support, with scheduled tasks and OpenClaw compatibility launching soon.
Chinese New Year is less than 10 days away. Is this DeepSeek or Zhipu's GLM-5?
OpenRouter announced a new “stealth” large language model, Pony Alpha, described as a next-generation foundation model excelling in coding, reasoning, and roleplay tasks, and optimized for agentic workflows with precise tool-calling.
The model is free on OpenRouter, though the provider logs all prompts and completions, which may be used to improve the model.
Multiple Tech PhDs and Silicon Valley entrepreneurs speculate it could be DeepSeek-V4, Zhipu GLM’s new model, or Grok 4.2/Claude 5, with the “Pony” name and Year of the Horse hinting at a Chinese origin.
OpenRouter partner Kilo Code suggested in a blog that Pony Alpha is “a special evolution of a popular open-source global model,” making DeepSeek-V4 or Zhipu GLM-5 the most likely candidates.
Zhipu was up over 40% in Hong Kong at one point, hitting a new peak on AI optimism.
openrouter.ai
Pony Alpha
Pony is a cutting-edge foundation model with strong performance in coding, agentic workflows, reasoning, and roleplay, making it well suited for hands-on coding and real-world use.
**Note:** All prompts and completions for this model are logged by the provider…
Meet EchoJEPA, the first world model for medical video:
• 18M echocardiograms
• 300K patients
• Learns heart dynamics — not imaging noise
EchoJEPA discards what’s unpredictable and locks onto what matters clinically:
- chamber geometry
- wall motion
- valve dynamics
The results (frozen encoder, no fine-tuning):
• 20% ↓ error in LVEF
• 17% ↓ error in RVSP
• 79% accuracy with 1% labels (vs 42% for baselines w/ 100%)
• 2% degradation under acoustic artifacts (vs 17%)
• Zero-shot pediatric transfer beats all fine-tuned models
GitHub.
Google DeepMind introduced a new paper on learning temporally abstract world models and policies (options).
Key idea:
1. Use an LLM to propose features for a factorized product-of-experts world model;
2. Use this model to predict the abstract world state after each macro action, helping RL explore.
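The product-of-experts piece can be illustrated with a toy discrete example: each LLM-proposed feature contributes an expert distribution over the abstract state, and the factors are multiplied and renormalized. The states and probabilities below are made up for illustration:

```python
import math

def product_of_experts(expert_probs: list[dict[str, float]]) -> dict[str, float]:
    """Multiply per-expert distributions over the same discrete states, then renormalize."""
    states = expert_probs[0].keys()
    unnorm = {s: math.prod(p[s] for p in expert_probs) for s in states}
    z = sum(unnorm.values())
    return {s: v / z for s, v in unnorm.items()}

experts = [
    {"door_open": 0.9, "door_closed": 0.1},  # expert for one proposed feature
    {"door_open": 0.6, "door_closed": 0.4},  # expert for another
]
posterior = product_of_experts(experts)
print(posterior)
```

Because the factors multiply, any single confident expert can veto a state, which is what makes the factorized form useful for pruning the abstract state space.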
Google, UC Berkeley and an international team of researchers present Aletheia, a math research agent built on Gemini.
The system uses AI to systematically scan hundreds of complex conjectures, filtering through potential proofs with natural language verification before sending the best candidates to human experts for final review.
The team resolved 13 "open" problems from the Erdős database, generating 4 brand-new solutions and identifying 9 others that were actually solved in obscure corners of existing literature.
ByteDance dropped an advanced video generation model.
Seedance 2.0 has:
— native audio gen (lipsynced speech + music)
— a drastic step up in quality from Veo 3.1 / Sora 2
— supports multimodal input
— 2k resolution
Goes beyond cinematic video, and can do product demos as well. And it's really hard to tell it's AI.
WaveSpeedAI
Seedance 2.0 Complete Guide: Multimodal Video Creation - WaveSpeedAI Blog
Master Seedance 2.0's multimodal video generation with this comprehensive guide. Learn how to combine images, videos, audio, and text to create professional-quality videos with precise control over motion, style, and storytelling.
The PaddleOCR Document Parsing Skill is now live on ClawHub, ready to plug directly into OpenClaw workflows.
Instead of deploying OCR services or wiring APIs, developers can now invoke PaddleOCR as a standardized composable Skill node — embedding document understanding directly into Agents and automation pipelines.
Built on PaddleOCR-VL-1.5, the Skill delivers:
1. Multi-format parsing (PDF, JPG, PNG, BMP, TIFF)
2. Layout analysis — text, tables, formulas, headers
3. 110+ language coverage
4. Structured Markdown output preserving hierarchy
No deployment. No wrappers. Just configuration — and build your document intelligence chain inside OpenClaw.
GitHub
GitHub - PaddlePaddle/PaddleOCR: Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit…
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages. - PaddlePaddle/Paddl...
What if your model could learn from its own drafts during RL training?
NVIDIA introduced iGRPO: Iterative Group Relative Policy Optimization.
Researchers add a self-feedback loop to GRPO: the model drafts multiple solutions, picks its best one, then learns to refine beyond it.
Core idea:
Stage 1 → explore and select your strongest attempt. Stage 2 → condition on that attempt and beat it.
Same scalar reward. No critics, no generated critiques, no verification text. The best draft is the only feedback the model needs.
Results across 7B / 8B / 14B models:
• Nemotron-H-8B-Base-8K: 41.1% → 45.0% (+3.96 over GRPO)
• DeepSeek-R1-Distill-Qwen-7B: 68.3% → 69.9%
• OpenMath-Nemotron-14B: 76.7% → 78.0%
• OpenReasoning-Nemotron-7B on AceReason-Math: 85.62% AIME24 / 79.64% AIME25
The same two-stage wrapper also improves DAPO and GSPO. It's not tied to GRPO at all.
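The two-stage loop can be sketched schematically. Here `generate` and `reward` are stand-ins for the policy's sampler and the scalar task reward, and no actual gradient update is shown:

```python
import random

def generate(prompt: str, n: int) -> list[str]:
    # Stand-in for sampling n completions from the policy.
    return [f"{prompt} -> attempt {random.randint(0, 99)}" for _ in range(n)]

def reward(answer: str) -> float:
    # Stand-in for the scalar task reward (e.g. exact-match on the final answer).
    return (hash(answer) % 100) / 99

def igrpo_step(prompt: str, group_size: int = 4):
    # Stage 1 (explore): sample a group and select the strongest draft.
    drafts = generate(prompt, group_size)
    best = max(drafts, key=reward)
    # Stage 2 (refine): condition on the best draft and try to beat it.
    refinements = generate(f"{prompt} [best so far: {best}]", group_size)
    rewards = [reward(r) for r in refinements]
    baseline = sum(rewards) / len(rewards)
    # Group-relative advantages, as in GRPO, would drive the policy update.
    advantages = [r - baseline for r in rewards]
    return best, advantages
```

The only extra signal relative to plain GRPO is the selected draft fed back into the prompt; the reward stays a single scalar throughout.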
arXiv.org
iGRPO: Self-Feedback-Driven LLM Reasoning
Large Language Models (LLMs) have shown promise in solving complex mathematical problems, yet they still fall short of producing accurate and consistent solutions. Reinforcement Learning (RL) is a...
Google introduced DialogLab, a new open-source prototyping framework that uses a human-in-the-loop control strategy to achieve realistic human-AI group simulation, offering a necessary alternative to fully autonomous agents.
Evaluations with domain experts found that its "Human Control" mode (where you can edit, accept, or dismiss real-time AI suggestions) was preferred in realism, effectiveness, and engagement.
DialogLab transforms dialogue design from rigid scripts to spontaneous, adaptable group dynamics.
Google Research
Beyond one-on-one: Authoring, simulating, and testing dynamic human-AI group conversations
DialogLab is a research prototype that provides a unified interface to configure conversational scenes, define agent personas, manage group structures, specify turn-taking rules, and orchestrate transitions between scripted narratives and improvisation.
This new research introduces Agyn, an open-source multi-agent platform that models software engineering as a team-based organizational process rather than a monolithic task.
The system configures a team of four specialized agents: a manager, researcher, engineer, and reviewer. Each operates within its own isolated sandbox with role-specific tools, prompts, and language model configurations. The manager agent coordinates dynamically based on intermediate outcomes rather than following a fixed pipeline.
What makes the design interesting?
Different agents use different models depending on their role. The manager and researcher run on GPT-5 for stronger reasoning and broader context. The engineer and reviewer use GPT-5-Codex, a smaller code-specialized model optimized for iterative implementation and debugging. This mirrors how real teams allocate resources based on task requirements.
The workflow follows a GitHub-native process. Agents analyze issues, create pull requests, conduct inline code reviews, and iterate through revision cycles until the reviewer explicitly approves. No human intervention at any point. The number of steps isn't predetermined. It emerges from task complexity.
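The role-to-model allocation above can be written as a simple configuration, with the engineer/reviewer revision loop iterating until explicit approval. A schematic sketch (agent behavior is mocked; the model names follow the paper, everything else is illustrative):

```python
# Per-role configuration: model, isolated sandbox, and role-specific tools.
TEAM = {
    "manager":    {"model": "gpt-5",       "sandbox": "mgr", "tools": ["plan", "assign"]},
    "researcher": {"model": "gpt-5",       "sandbox": "res", "tools": ["search", "read"]},
    "engineer":   {"model": "gpt-5-codex", "sandbox": "eng", "tools": ["edit", "run_tests"]},
    "reviewer":   {"model": "gpt-5-codex", "sandbox": "rev", "tools": ["inline_review"]},
}

def resolve_issue(issue: str, max_rounds: int = 10) -> int:
    """Engineer/reviewer revision cycles until explicit approval; round count emerges."""
    for round_no in range(1, max_rounds + 1):
        patch = f"{issue}: patch v{round_no}"  # engineer implements a revision
        approved = "v2" in patch               # mocked reviewer: approves the 2nd draft
        if approved:
            return round_no
    return max_rounds

print(resolve_issue("issue-17"))  # revision rounds until approval in this mocked run
```

The point of the sketch is the structure: the loop has no fixed length, so the number of review cycles is an outcome of the task rather than a pipeline parameter.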
arXiv.org
Agyn: A Multi-Agent System for Team-Based Autonomous Software Engineering
Large language models have demonstrated strong capabilities in individual software engineering tasks, yet most autonomous systems still treat issue resolution as a monolithic or pipeline-based...