Alibaba published a paper that shows AI is moving beyond bug finding and into actually proving software is exploitable.
This paper asks a simple question with hard consequences: can LLMs confirm software vulnerabilities by actually building working exploits?
The authors’ answer is yes, but only when the model stops acting like a single genius and starts acting like a team.
arXiv.org
A Multi-Agent Framework for Automated Exploit Generation with...
Open-source libraries are widely used in modern software development, introducing significant security vulnerabilities. While static analysis tools can identify potential vulnerabilities at scale,...
Meta presented a world model that models the computer
What if AI could invent enzymes that nature hasn’t seen? Meet DISCO: Diffusion for Sequence-structure CO-design
14 rounds of directed evolution and over a year of wet lab work.
That's what it took to engineer an enzyme for selective C(sp³)–H insertion, one of the most challenging transformations in organic chemistry.
DISCO surpasses this with a single plate. No pre-specified catalytic residues, no template, no theozyme, no inverse folding, just joint diffusion over protein sequence and structure.
Paper
Code
disco-design.github.io
DISCO — Teaching AI to Invent Enzymes Nature Never Imagined
DISCO is a multimodal generative model that co-designs protein sequence and 3D structure to create entirely new enzymes for reactions never seen in biology.
Tencent released HY-Embodied-0.5, a family of foundation models for real-world embodied agents. The 2B model is now open source.
The suite includes:
2B for edge deployment
32B for complex reasoning
Key innovations:
1. Mixture-of-Transformers (MoT) architecture for modality-specific computation
2. Latent tokens for improved perceptual representation
3. Self-evolving post-training
4. On-policy distillation from large to small models
Across 22 benchmarks, the 2B model outperforms similarly sized SOTA systems on 16 tasks. The 32B model approaches frontier-level performance.
GitHub
Hugging Face
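The on-policy distillation step (innovation 4) can be sketched minimally. Tencent's exact recipe isn't published in this post, so this is a hedged sketch of the common pattern, with illustrative names: the student samples tokens, the teacher scores those same tokens, and the student closes the log-prob gap.

```python
# Hedged sketch of on-policy distillation (point 4 above). Assumption:
# Tencent's exact recipe may differ; names here are illustrative.
# The STUDENT samples tokens, the TEACHER scores those same tokens,
# and the student minimizes the gap between the two log-probs.

def on_policy_distill_loss(student_lp: list[float], teacher_lp: list[float]) -> float:
    """Monte-Carlo surrogate for KL(student || teacher) on student samples.

    Because the tokens were drawn from the student's own policy,
    averaging (student_lp - teacher_lp) over them estimates the
    reverse KL along the sampled trajectory.
    """
    assert len(student_lp) == len(teacher_lp)
    return sum(s - t for s, t in zip(student_lp, teacher_lp)) / len(student_lp)

# Student overconfident where the teacher is not -> positive loss.
print(round(on_policy_distill_loss([-0.2, -0.4], [-1.0, -0.6]), 2))  # 0.5
```

Training only on student-sampled tokens is what makes it "on-policy": the small model is corrected exactly where it actually goes wrong, not on teacher-generated text it would never produce itself.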
GitHub
GitHub - Tencent-Hunyuan/HY-Embodied
Polygon Labs is in early talks to raise up to $100 million to launch a new stablecoin payments business, according to sources.
It's rare for a blockchain developer to enter the regulated payments business. With this move, Polygon hopes to drive stablecoin volume on its blockchain.
In January, Polygon Labs agreed to acquire Coinme and Sequence, positioning itself to compete with the likes of Stripe.
The Information
Polygon Labs in Talks to Raise Up to $100 Million for Payments Business
Polygon Labs, developer of the blockchain that underpins prediction market Polymarket and other crypto platforms, is in early talks with investors to raise as much as $100 million to build a new stablecoin payments business, according to people familiar with…
Cool work. R-Zero - self-evolving LLM from zero external data.
One base model, two roles:
1. Challenger generates hard problems
2. Solver solves them.
The Challenger is rewarded when the Solver fails. The two co-evolve with GRPO. The Challenger learns to probe for weaknesses, not just generate hard problems.
+6.49 math, +7.54 general reasoning on Qwen3-4B-Base. 3 iterations, no human data.
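The Challenger's reward can be sketched in a few lines. One simple choice consistent with the description above (the paper may use a more refined uncertainty-shaped reward) is one minus the Solver's empirical accuracy over its sampled attempts; the answer strings and `reference` below are illustrative.

```python
# Minimal sketch of the Challenger reward described above. Assumption:
# the paper's actual reward may be shaped differently; here the
# Challenger is paid exactly when the Solver fails. `reference` is the
# answer the Challenger holds for its generated problem.

def challenger_reward(solver_answers: list[str], reference: str) -> float:
    """High reward when the Solver fails on the generated problem."""
    solver_acc = sum(a == reference for a in solver_answers) / len(solver_answers)
    return 1.0 - solver_acc

# Solver gets 2 of 4 sampled attempts right -> reward 0.5
print(challenger_reward(["42", "41", "42", "40"], "42"))  # 0.5
```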
China defined what an AI Hospital is
It is a new type of smart healthcare model in which AI is embedded into the system itself, linking offline medical expertise with the broader reach of online services to deliver more proactive and continuous care.
Patients become the point-of-care with the help of AI.
www.globaltimes.cn
From cure to care: China's first AI hospital shows how artificial intelligence could connect diagnosis, treatment and long-term…
Imagine seeing a doctor before you even step into a hospital.
Sneak peek at something coming soon to Claude. This could be a fullstack vibe-coding competitor to the likes of Lovable.
It’s been apparent for some time that Anthropic's consumer story would be vibe coding, as it sits at the intersection of where they focus, what consumers want, and where enormous token subsidies tilt the board in their favor:
- coding agents, sensing this, have moved up the abstraction stack and smartly evolved into small business platforms, with payments, hosting, marketing, social and other sticky primitives around the model
- this is an industry not a market and in that world the "coding intelligence" primitive will be priced, packaged, productized and delivered in a thousand ways for a thousand different customers.
Google presented Sparse Selective Caching, an architecture with growing effective memory (similar to attention) but with almost constant inference cost per token (similar to RNNs).
In the paper the team mainly discusses:
1) the shared foundation for both softmax attention and fixed-size long-term memory modules (or RNNs) that helped design an architecture with the best of both worlds;
2) different variants of memory caching, including a variant whose effective memory is growing while the decoding cost still remains “constant”;
3) a unifying perspective to understand hybrid models, in which attention and recurrent models are combined.
Together AI presents Introspective Diffusion LM
The first DLM to match the quality of AR models while outperforming prior DLMs in both quality and serving efficiency.
Delivering about 3× higher throughput than prior SotA DLMs.
GitHub
Model
arXiv.org
Introspective Diffusion Language Models
Diffusion language models promise parallel generation, yet still lag behind autoregressive (AR) models in quality. We stem this gap to a failure of introspective consistency: AR models agree with...
Turns out we can get SOTA on agentic benchmarks with a simple test-time method
Meet LLM-as-a-Verifier
Test-time scaling is effective, but picking the "winner" among many candidates is the bottleneck. Here's how to extract a cleaner signal from the model:
1. Ask the LLM to rank results on a scale of 1-k
2. Use the log-probs of those rank tokens to calculate an expected score
You can get a verification score in a single sampling pass per candidate pair.
Code
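The two steps above can be sketched in a few lines. Assumption: the model emits its rank as a single token and we can read each rank token's log-prob at that position; the numbers below are illustrative.

```python
import math

# Sketch of steps 1-2 above (illustrative, not the authors' code):
# softmax-renormalize the log-probs of the k rank tokens, then take
# the probability-weighted average rank as the verification score.

def expected_rank_score(rank_logprobs: dict[int, float]) -> float:
    """Expected rank under the model's distribution over rank tokens 1..k."""
    probs = {r: math.exp(lp) for r, lp in rank_logprobs.items()}
    z = sum(probs.values())
    return sum(r * p for r, p in probs.items()) / z

# Most mass on rank 4 of 5 -> score lands close to 4.
score = expected_rank_score({1: -5.0, 2: -4.0, 3: -2.0, 4: -0.3, 5: -1.5})
```

The expected value is smoother than the argmax rank: two candidates that both decode to "4" can still be separated by how much probability mass sits on 3 versus 5.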
Tether launches self-custodial wallet for end users
The wallet supports USDT, USAT, XAUT and Bitcoin across Ethereum, Polygon, Arbitrum, Plasma, and Bitcoin / Lightning Network, and enables transfers via human-readable usernames such as name@tether.me.
Tether said more than 570 million wallets were already using its technology as of March 2026.
tether.io
Tether Launches tether.wallet, the People’s Wallet, Extending its Global Financial Infrastructure Directly to Billions of Users…
14 April 2026 – Tether, the largest company in the digital asset ecosystem and issuer of USD₮, the world’s most widely used stablecoin, today announced the launch of tether.wallet, a self-custodial digital wallet that brings Tether’s global financial infrastructure…
Goodfire introduced self-correcting search: a technique that lets diffusion models self-correct mid-trajectory. It gives MatterGen a feedback loop from its own activations, improving viable on-target candidates by ~30%. MatterGen is an open-source diffusion model for…
Goodfire achieved SOTA performance in predicting which of 4.2 million genetic variants cause diseases by interpreting a genomics model, in a new preprint with Mayo Clinic.
And they're now releasing an open-source database covering all variants in the NIH's ClinVar database.
Preprint.
www.goodfire.ai
Explaining 4.2 million genetic variants with state-of-the-art, interpretable predictions
LangChain released async subagents - kick off background tasks on any Agent Protocol backed server while you continue to interact with the main agent.
Async subagents will become more and more of a thing: as subagents get longer-running, you don't want to block the event loop.
Expanded multimodal support - your agent can now see images, listen to audio, watch video, and read PDFs. The read_file tool returns native content blocks, so your agent can reason across all these formats out of the box, unlocking a whole new set of workflows for your agents.
Improved prompt caching - better token efficiency and lower costs for Claude models.
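The non-blocking pattern itself can be sketched in plain asyncio. This is NOT the deepagents API; names are illustrative, and the sleep stands in for a long-running remote subagent call.

```python
import asyncio

# Generic asyncio sketch of the async-subagent pattern (not the
# deepagents API; names are illustrative). A background task runs
# while the main agent keeps working, and its result is joined later.

async def subagent(task: str) -> str:
    await asyncio.sleep(0.1)              # stand-in for a long remote call
    return f"done: {task}"

async def main() -> list[str]:
    background = asyncio.create_task(subagent("summarize repo"))  # non-blocking
    log = ["main agent still responsive"]  # main loop keeps going
    log.append(await background)           # collect the subagent result later
    return log

result = asyncio.run(main())
print(result)
```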
LangChain Blog
Deep Agents v0.5
💡TL;DR: We’ve released new minor versions of deepagents & deepagentsjs, featuring async (non-blocking) subagents, expanded multi-modal filesystem support, and more.
See the changelog for details.
Async subagents
Deep Agents can now delegate work to remote…
Claude Code shipped routines
You tell it what to do, point it at your project, set a trigger, and it runs 24/7 on their servers with your laptop closed.
Docs.
The model is the commodity. The trigger is the product and whoever maps the most valuable real world events to the most specific industry workflows is going to build something massive.
Claude
Claude Code by Anthropic | AI Coding Agent, Terminal, IDE
Anthropic's agentic coding tool for developers. Claude Code understands your codebase, edits files, runs commands, and helps you ship faster.
Meet Motus, the open-source agent infrastructure that learns in production
Existing agent infra serves static agents: the harness, model, and workflow are fixed after deployment. But static agents degrade over time. The harness goes stale, new models go unincorporated, context drifts, and latency compounds.
Motus closes this gap by learning from every trace (failures, latency, cost, and task outcomes) and using those signals to continuously optimize agent harness, model orchestration, context memory, and end-to-end latency.
Early results: higher accuracy than any single frontier model at 2.3× lower cost (Terminal-Bench 2.0, SWE-bench Verified), with 52% lower latency and 45% better memory recall.
Open source under Apache 2.0. Works with any agent SDK. Deploy with one command.
GitHub.
LithosAI
Home | LithosAI
The agent serving cloud that learns in prod.
Someone just dropped a fully liberated Gemma 4 E4B
But the real story here isn't the model itself, it's how it was made.
This was done (nearly) fully autonomously: one human, one agent, one skill, 8 prompts total.
The agent didn't just execute instructions. It diagnosed numerical instability in Gemma 4's new architecture, wrote three patches for a bug no one had hit before, iterated through four failed attempts and shipped a 17GB model to HuggingFace.
Without being asked.
Original Gemma 4: 98.8% refusal rate.
OBLITERATED: 2.1%.
Coding ability: +20%.
Coherence: fully intact.
What we're watching isn't a jailbreak story. It's a proof of concept for autonomous ML research. The agent ran evals, built a model card, and pushed commits: the full research cycle, compressed into one session.
The implications go beyond safety. When guardrail removal becomes an automated skill loadable from agent memory, the question is no longer technical. It's about how fast agentic tooling propagates and who has access to it first.
This is what open-source AI looks like in 2026.
Apple introduced Simple Self-Distillation: a fine-tuning method that improves models on coding tasks just by sampling from the model and training on its own outputs with plain cross-entropy.
This paper literally came out 15 days ago, and it’s already integrated into TRL.
There’s a lot more to the distillation paradigm than meets the eye.
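The recipe reduces to a strikingly small objective. A hedged sketch (illustrative names, not Apple's code): sample a completion from the model, then minimize plain cross-entropy on those same tokens, i.e. the negative mean log-likelihood of the model's own sample.

```python
# Hedged sketch of the objective above (illustrative, not Apple's code).
# `token_logprobs[t]` is the model's log-prob of the token it itself
# sampled at step t. No teacher, no reward model, no KL term: the loss
# is just plain cross-entropy on the model's own outputs.

def self_distill_loss(token_logprobs: list[float]) -> float:
    """Negative mean log-likelihood of a self-sampled completion."""
    return -sum(token_logprobs) / len(token_logprobs)

# A 4-token self-sample.
print(round(self_distill_loss([-0.1, -0.5, -2.0, -0.4]), 2))  # 0.75
```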
huggingface.co
Paper page - Embarrassingly Simple Self-Distillation Improves Code Generation
Anthropic just introduced Claude Opus 4.7
It handles long-running tasks with more rigor, follows instructions more precisely, and verifies its own outputs before reporting back.
You can hand off your hardest work with less supervision.
Opus 4.7 also has substantially better vision. It can see images at more than three times the resolution and produces higher-quality interfaces, slides, and docs as a result.
On the API, a new xhigh effort level between high and max gives you finer control over reasoning and latency on hard problems. Task budgets (beta) help Claude prioritize work and manage costs across longer runs.
Opus 4.7 has a new tokenizer.
This means it's also a new base model.
Glory days of pretraining still very much going.
In Claude Code, the new /ultrareview command runs a dedicated review session that reads through your changes and flags what a careful reviewer would catch.
Also extended auto mode to Max users, so longer tasks run with fewer interruptions.