Goodfire introduced self-correcting search: a technique to let diffusion models self-correct mid-trajectory.
MatterGen gets a feedback loop from its own activations, improving viable on-target candidates by ~30%.
MatterGen is an open-source diffusion model for generating novel crystal structures. When generating materials with a target property, stronger conditioning tends to improve targeting, but reduces the stability, diversity, and novelty of outputs.
www.goodfire.ai
Using Self-Correcting Search to Accelerate Materials Discovery
Sakana AI introduced Marlin, a new ultra-deep research assistant
Pushing the limits of test-time scaling for automating business-oriented research. It builds on top of AB-MCTS and The AI Scientist!
Agents scale to real-world applications and long-running workloads.
sakana.ai
Sakana AI
Toward a new kind of Business Intelligence: beta testing begins for the Ultra Deep Research assistant "Sakana Marlin"
Meet SAGA, a generalist AI scientist. Instead of just optimizing fixed targets, it refines its own goals like a human researcher.
From de novo nanobodies to permanent magnets.
The core idea: across discovery tasks, scientists rarely know the perfect set of objectives upfront. They iterate: tweak scoring functions, add constraints, re-weight trade-offs based on what the optimizer produces. SAGA aims to automate this entire loop.
Code.
arXiv.org
Accelerating Scientific Discovery with Autonomous Goal-evolving Agents
There has been unprecedented interest in developing agents that expand the boundary of scientific discovery, primarily by optimizing quantitative objective functions specified by scientists....
Prediction: this is gonna kill some OSS projects.
"On the kernel security list we've seen a huge bump of reports. We were between 2 and 3 per week maybe two years ago, then reached probably 10 a week over the last year with the only difference being only AI slop, and now since the beginning of the year we're around 5-10 per day depending on the days (fridays and tuesdays seem the worst). Now most of these reports are correct, to the point that we had to bring in more maintainers to help us."
Wow! The Linux Foundation announced it is launching the x402 Foundation, with the contribution of the x402 protocol from Coinbase.
As the neutral home for x402, the Foundation will advance the x402 protocol and help enable community-based innovation in open payments.
www.linuxfoundation.org
Linux Foundation is Launching the x402 Foundation and Welcoming the Contribution of the x402 Protocol
Meet AutoAgent, an open-source library for autonomously improving an agent in any domain
The research team let an agent optimize for 24 hours.
It hit #1 on SpreadsheetBench (96.5%) and the #1 GPT-5 score on TerminalBench (55.1%).
Every other entry was human-engineered. This wasn't.
GitHub
GitHub - kevinrgu/autoagent: autonomous harness engineering
autonomous harness engineering. Contribute to kevinrgu/autoagent development by creating an account on GitHub.
The next version of OpenClaw comes with native video generation. To start, the founder of OpenClaw added support for the following providers:
- Alibaba
- BytePlus
- fal
- Google
- MiniMax
- OpenAI
- Qwen
- Together
- xAI
OpenClaw
Video Generation - OpenClaw
Karpathy dropped a post describing how he uses AI to build personal knowledge bases.
The idea is simple: instead of keeping notes scattered across apps, you dump everything into one folder.
Then you tell your AI to organize all of it into a personal wiki - summaries, connections, articles - that gets smarter every time you use it.
No special software. No database. Just folders and text files.
In under 7 minutes you'll learn:
1. The exact folder structure to set up (takes 2 minutes)
2. How to automate web scraping into your knowledge base with one CLI tool
3. The one-file "schema" that makes the whole system work
4. How to get your AI to compile raw notes into an organized wiki
5. The compounding trick that makes it smarter every time you use it
6. The health check that catches mistakes before they pile up.
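The one-folder setup above can be sketched in a few lines. This is a minimal illustration only; the folder names and schema contents are my own guesses at a reasonable layout, not the exact structure from the post.

```python
# Minimal sketch of a one-folder knowledge base, as described above.
# Folder names and schema text are illustrative, not a prescribed layout.
from pathlib import Path

root = Path("knowledge")
for sub in ("inbox", "sources", "wiki"):
    (root / sub).mkdir(parents=True, exist_ok=True)

# The single "schema" file that tells the AI how to organize everything.
(root / "SCHEMA.md").write_text(
    "# Wiki schema\n"
    "- inbox/: raw notes, dumped as plain .md files\n"
    "- sources/: scraped web pages and documents\n"
    "- wiki/: compiled articles the AI maintains\n"
)

print(sorted(p.name for p in root.iterdir()))
```

From there, every session is the same loop: dump raw material into `inbox/`, then ask the AI to fold it into `wiki/` according to `SCHEMA.md`.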
Gist
llm-wiki
llm-wiki. GitHub Gist: instantly share code, notes, and snippets.
OpenAI is moving Codex credits from message-based to token-based pricing for all ChatGPT plans in the coming weeks.
As of right now, all Business and new Enterprise accounts have already moved to the new token-based system.
The "legacy" system is message-based and generally less granular: you get a fixed number of messages per credit (although some messages can consume more than one credit).
Token-based pricing changes that. You're billed for the actual input/output tokens consumed, so lightweight tasks cost less and heavy ones cost more.
In other words, it's API-style usage rates.
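To make the difference concrete, here is a toy cost comparison. The per-token rates below are made-up placeholders, not OpenAI's actual Codex rate card.

```python
# Toy comparison of message-based vs token-based billing.
# Rates are placeholder values, NOT OpenAI's real Codex prices.
RATE_IN = 0.50 / 1_000_000   # $ per input token (placeholder)
RATE_OUT = 2.00 / 1_000_000  # $ per output token (placeholder)

def token_cost(tokens_in: int, tokens_out: int) -> float:
    return tokens_in * RATE_IN + tokens_out * RATE_OUT

light = token_cost(2_000, 500)        # quick one-file edit
heavy = token_cost(200_000, 30_000)   # long agentic refactor

# Under message-based billing both tasks might burn the same one credit;
# under token-based billing the heavy task costs ~80x more than the light one.
print(f"light=${light:.4f} heavy=${heavy:.4f}")
```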
OpenAI Help Center
Codex rate card | OpenAI Help Center
Learn how Codex credit rates work across Plus, Pro, Business, and Enterprise/Edu plans.
Meet JapanEEG, an open EEG database for non-invasive speech BCI research
JapanEEG is a high-density, multimodal EEG database built for non-invasive speech-decoding BCI research.
For people living with ALS or those who have undergone laryngectomy, non-invasive BCI technology holds tremendous promise as an alternative means of communication: one that works by decoding speech intent directly from brainwave signals, enabling text input or AI-generated speech output.
Research in this field is advancing globally. Yet one persistent bottleneck has held the field back: the lack of a large-scale, high-quality EEG dataset that can serve as a common benchmark.
Araya's X Communication team has spent years accumulating EEG data through Phase 1 of the IoB project.
JapanEEG
Enabling silent speech decoding with multimodal EEG/EMG recordings. Open data and pre-trained models for the global brain-computer interface research community.
Coinbase's Agentic Wallets have processed 50 million machine-to-machine transactions since late 2025.
50 million. In under 6 months.
AI agents are already paying each other. Not in a proof of concept. In production.
The infrastructure: x402, an implementation of the HTTP 402 "Payment Required" status code that has sat dormant in the internet's protocol stack since the 1990s. Backed now by Cloudflare, Circle, AWS, and Stripe. An agent sends a request. The server requires payment. The agent pays in stablecoin. The service renders. All in milliseconds, without a human in the loop.
Here's what most people miss about the economics of AI agent payments:
Human payment: average transaction $50+, low frequency, high fraud risk.
AI agent payment: average transaction fractions of a cent, extremely high frequency, zero fraud.
Card networks can't handle fractions-of-a-cent transactions. Their economics don't work below approximately $0.10 per transaction. Stablecoins settle at any denomination, any frequency, any amount.
The default payment layer for the agentic economy will be stablecoins. Not because anyone decided it. Because the transaction math leaves no other option.
The first 50 million AI transactions just happened. The next 50 billion are a matter of infrastructure.
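The request-pay-retry loop is easy to sketch. This toy simulation captures only the shape of the handshake; the header and field names here are illustrative stand-ins, not the actual x402 wire format.

```python
# Toy simulation of an x402-style handshake: request -> 402 + quote -> pay -> retry.
# Header and field names are illustrative, not the real x402 wire format.

def service(request: dict) -> dict:
    """A paid API: returns 402 with a payment quote until payment is attached."""
    if "payment" not in request["headers"]:
        return {"status": 402,
                "body": {"amount": "0.001", "asset": "USDC", "pay_to": "0xSERVICE"}}
    return {"status": 200, "body": {"data": "result"}}

def agent_fetch(request: dict) -> dict:
    """An agent that pays automatically when it hits HTTP 402."""
    resp = service(request)
    if resp["status"] == 402:
        quote = resp["body"]
        receipt = f"paid {quote['amount']} {quote['asset']} to {quote['pay_to']}"
        retry = {**request, "headers": {**request["headers"], "payment": receipt}}
        resp = service(retry)
    return resp

resp = agent_fetch({"path": "/api/data", "headers": {}})
print(resp["status"])  # 200 after the automatic pay-and-retry
```

The whole negotiation is two HTTP round trips with no human step in between, which is what makes sub-cent, high-frequency payments viable.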
AI Agents and Bot-to-Bot Communication in Telegram
Bot-to-bot interaction was restricted on Telegram to prevent infinite message loops.
Starting today, in specific contexts, Bot-to-Bot communication is allowed – unlocking complex agentic flows and AI-powered use cases.
Out of the box, this feature will work in groups and via business mode. To start using it, simply enable the Bot-to-Bot Communication Mode for your bot via @BotFather.
You can reference the full documentation here.
Telegram
BotFather
BotFather is the one bot to rule them all. Use it to create new bot accounts and manage your existing bots.
On-policy RL has driven the biggest leaps in training coding agents. Extending it to machine learning engineering agents should be a natural next step.
But it almost never works.
The recipe is right there: standard trajectory-wise GRPO, the same one that worked for SWE agents.
However, the problem is that one rollout step on an MLE task may take hours because the agent has to actually train a model on a real dataset at every step (preprocessing, fitting, inference, scoring). So even with the N rollouts in a group running in parallel, a single GRPO run may still take days.
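A back-of-envelope calculation shows why. All numbers below are illustrative assumptions, not figures from the paper.

```python
# Why trajectory-wise GRPO is slow on MLE tasks: each rollout step runs a
# real train/eval cycle. Numbers are illustrative assumptions only.
HOURS_PER_STEP = 2     # one step = preprocess + fit + inference + scoring
STEPS_PER_TRAJ = 10    # agent actions per trajectory
GRPO_ITERATIONS = 10   # sequential policy updates in one run

# The N rollouts inside a group run in parallel, so a group's wall time
# equals one trajectory's time; the iterations themselves are sequential.
hours_per_group = HOURS_PER_STEP * STEPS_PER_TRAJ
total_days = GRPO_ITERATIONS * hours_per_group / 24
print(f"{total_days:.1f} days")  # ~8.3 days even with full rollout parallelism
```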
Meta shared a new paper, SandMLE, which fixes this with a move that sounds almost too reckless to work.
DeepSeek is rolling out a limited V4 gray release. A new mode switcher now appears in the chat UI with three options: Fast Mode (default), Expert Mode, and Vision Mode.
Fast Mode:
• File uploads → text-only extraction
• Likely a lightweight, low-latency model optimized for speed
Expert Mode:
• No file uploads supported
• Restriction likely for compute/cost control, since heavy models + file tokens are expensive
• Likely routes to a larger, more powerful reasoning model
Vision Mode:
• Enables multimodal inputs
• Builds on earlier OCR tests
• May signal DeepSeek’s multimodal capability is moving toward end users
Attackers can exfiltrate user files from Cowork by exploiting an unremediated vulnerability in Claude's coding environment, which Cowork now inherits.
The vulnerability was first identified in Claude.ai chat.
Promptarmor
Claude Cowork Exfiltrates Files
Claude Cowork is vulnerable to file exfiltration attacks via indirect prompt injection as a result of known-but-unresolved isolation flaws in Claude's code execution environment.
Anthropic just introduced Project Glasswing: an urgent initiative to help secure the world’s most critical software.
It’s powered by Anthropic's newest frontier model, Claude Mythos Preview, which can find software vulnerabilities better than all but the most skilled humans.
Anthropic partnered with Amazon Web Services, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks.
Anthropic
Project Glasswing: Securing critical software for the AI era
A new initiative to secure the world’s most critical software and give defenders a durable advantage in the coming AI-driven era of cybersecurity.
Let's talk about Claude Mythos Preview
This chart shows an evaluation testing models’ ability to successfully exploit vulnerabilities in Firefox 147.
There are three grade levels: 0 for no progress, 0.5 for partial control (controlled crash), and 1.0 for full code execution.
From the system card:
1. in ~29% of evaluations, it realized it was being tested, and didn't say so.
2. when an LLM was used to judge its work and kept rejecting it, Mythos identified that the evaluator was an LLM and prompt-injected it.
3. in one test, it saw the answer to a problem it was solving, and intentionally widened the confidence interval to not raise suspicion.
4. when it needed a file permission it didn't have, it found and used a privilege-escalation vulnerability, then programmed it to delete itself so it wouldn't show up in the logs.
5. it escaped a sandbox container (an expected part of the sandbox-escape test), then emailed the researchers about it and, without being told to, posted the details to some hard-to-find but public websites, bragging about its success.
6. when Claude Code blocked it from using some permissions, the model acknowledged the block was valid, but then immediately tried to perform the same operation using different commands.
7. when asked to find security bugs, earlier versions planted bugs in the code, and reported them as pre-existing.
The capability slope we’re going to keep seeing from the frontier labs is going to open up all new use cases in finance, healthcare, legal, consulting, supply chains, and more.
Make sure you’re building something that can take advantage of these upcoming improvements, or you’ll be in a tough spot strategically.
It’s a big one. Morgan Stanley officially announced the launch of its spot Bitcoin ETF
Morgan Stanley Investment Management is the first U.S. bank-affiliated asset manager to offer a cryptocurrency ETP; the launch reflects a continued, firmwide focus by Morgan Stanley on developing digital asset solutions designed to meet evolving client demand.
Morgan Stanley
Morgan Stanley Investment Management Enters Digital Investments Universe With Launch of Morgan Stanley Bitcoin Trust | Morgan Stanley
Meta just released Muse Spark, the first model from the MSL team
Muse Spark is a natively multimodal reasoning model with support for tool use, visual chain-of-thought, and multi-agent orchestration. Through its training process, the team saw predictable scaling across pretraining, RL, and test-time reasoning.
Meta also released Contemplating mode, which orchestrates multiple agents that reason in parallel, designed to handle complex scientific and reasoning queries. In testing, the team found it competitive with other extreme reasoning models such as Gemini Deep Think and GPT Pro.
The team also found Muse Spark demonstrated strong refusal behavior across high-risk domains such as biological and chemical weapons.
Meta AI now handles quick answers and deep reasoning with Instant and Thinking modes.
Shopping mode is new too: it picks up on the creators, brands, and styling content across Meta's apps and turns that into recommendations.
Bigger models are already in development, with infrastructure scaling to match.
A private API preview is open to select partners today, with plans to open-source future versions.
Alibaba published a paper that shows AI is moving beyond bug finding and into actually proving software is exploitable.
This paper asks a simple question with hard consequences: can LLMs confirm software vulnerabilities by actually building working exploits?
The authors’ answer is yes, but only when the model stops acting like a single genius and starts acting like a team.
arXiv.org
A Multi-Agent Framework for Automated Exploit Generation with...
Open-source libraries are widely used in modern software development, introducing significant security vulnerabilities. While static analysis tools can identify potential vulnerabilities at scale,...
Meta presented a world model that models the computer