Meet LUMI-lab
LUMI-lab (Large-scale Unsupervised Modeling followed by Iterative experiments) is a self-driving laboratory that tightly closes the loop between an AI foundation model and automated robotics to accelerate LNP discovery for mRNA delivery.
To tackle data scarcity in emerging mRNA delivery domains, the team pretrained the model on 28M+ molecular structures, then iteratively improved it with closed-loop experimental data.
In this work, across ten active-learning cycles, LUMI-lab synthesized and evaluated 1,700+ new LNPs and unexpectedly identified a new design feature for efficient delivery: brominated lipid tails.
These brominated-tail ionizable lipids delivered mRNA into human lung cells more efficiently than approved benchmarks, despite representing only a small fraction of the initial chemical space explored.
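The closed loop above can be sketched as a generic active-learning cycle. Everything here (library size, batch size, scoring functions) is an illustrative stand-in, not LUMI-lab's actual code:

```python
import random

random.seed(0)

def surrogate_score(lipid):
    """Stand-in for the pretrained surrogate model's predicted efficiency."""
    return random.random()

def run_experiment(lipid):
    """Stand-in for the robotic synthesis + transfection assay."""
    return surrogate_score(lipid)

library = [f"lipid_{i}" for i in range(2000)]  # candidate chemical space
labeled = {}

for cycle in range(10):  # ten active-learning cycles
    # Rank untested candidates with the surrogate and pick a batch of 170.
    pool = [l for l in library if l not in labeled]
    batch = sorted(pool, key=surrogate_score, reverse=True)[:170]
    # "Run" the experiments and fold results back into the training set.
    for lipid in batch:
        labeled[lipid] = run_experiment(lipid)
    # The real system fine-tunes the foundation model on `labeled` here.

print(len(labeled))  # 1700 candidates evaluated over ten cycles
```

In the real system, each batch is synthesized and assayed by robots, and the foundation model is updated on the accumulated results before the next cycle.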
GitHub.
Check the video here.
GitHub
GitHub - bowenli-lab/LUMI-lab: Foundation model-driven lab
Foundation model-driven lab. Contribute to bowenli-lab/LUMI-lab development by creating an account on GitHub.
LLM personas can be elicited just by prompting. Even harmful ones.
LessWrong
In-context learning alone can induce weird generalisation — LessWrong
Benji Berczi, Kyuhee Kim, Cozmin Ududec, James Requeima …
A good model of the world requires not just great graphics but spatial and world intelligence: understanding how objects move and respond, which actions cause which outcomes, and how player interactions affect the world.
Moonlake's world model delivers that.
Moonlakeai
Building Multimodal Worlds with Moonlake's World Modeling Agent - Moonlake AI
What it takes to build an interactive, multimodal world — and how our agent created a bowling mini-game from a single prompt.
Google introduced Nano Banana 2
It uses Gemini’s understanding of the world and is powered by real-time information and images from web search. That means it can better reflect real-world conditions in high fidelity.
Check out "Window Seat," a demo that uses Nano Banana 2’s world understanding to generate accurate views from any window in the world, pulling live local weather info at 2K/4K specs. The precision is mind-blowing.
Rolling out today as the new default in the Gemini app, Search (across 141 countries), and Flow, and available in preview via Google AI Studio and Vertex AI. Also available in Google Antigravity.
Google
Nano Banana 2: Combining Pro capabilities with lightning-fast speed
Our latest image generation model offers advanced world knowledge, production-ready specs, subject consistency and more, all at Flash speed.
New from DeepSeek: DualPath
Researchers from Peking University, Tsinghua University, and #DeepSeek unveiled DualPath to fix the storage bandwidth bottleneck, which may be the secret killer of LLM agent performance.
Instead of letting data get stuck in a single storage traffic jam, DualPath creates a second highway for data to travel.
It loads saved model memory into idle decoding engines and then zips it over to the processing engines using high-speed internal networks, ensuring no part of the system sits idle while waiting for data.
The results are massive: DualPath boosts offline throughput by up to 1.87x and nearly doubles online serving speeds without violating performance targets.
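The "second highway" idea can be illustrated with a toy concurrent loader; the path names and 50/50 split are assumptions for illustration, not details from the paper:

```python
import threading
import queue

kv_blocks = list(range(100))   # KV-cache blocks sitting in shared storage
arrived = queue.Queue()        # blocks delivered to the prefill engine

def load_kv_blocks(blocks, path):
    for b in blocks:
        arrived.put((path, b))  # stands in for a storage read / NVLink copy

# Path 1: the prefill engine reads half the blocks straight from storage.
direct = threading.Thread(
    target=load_kv_blocks, args=(kv_blocks[:50], "storage->prefill"))
# Path 2: an idle decode engine reads the other half, then forwards it to
# the prefill engine over the high-speed internal interconnect.
relayed = threading.Thread(
    target=load_kv_blocks,
    args=(kv_blocks[50:], "storage->decode->interconnect->prefill"))

direct.start(); relayed.start()
direct.join(); relayed.join()

print(arrived.qsize())  # 100: both paths deliver concurrently
```

The point is only the shape of the idea: aggregate bandwidth doubles because two independent routes into the prefill engine run at the same time.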
arXiv.org
DualPath: Breaking the Storage Bandwidth Bottleneck in Agentic LLM...
The performance of multi-turn, agentic LLM inference is increasingly dominated by KV-Cache storage I/O rather than computation. In prevalent disaggregated architectures, loading the massive...
Sakana introduced Doc-to-LoRA and Text-to-LoRA, two related research projects exploring how to make LLM customization faster and more accessible.
By training a Hypernetwork to generate LoRA adapters on the fly, these methods allow models to instantly internalize new information or adapt to new tasks.
Biological systems naturally rely on two key cognitive abilities: durable long-term memory to store facts, and rapid adaptation to handle new tasks given limited sensory cues. While modern LLMs are highly capable, they still lack this flexibility. Traditionally, adding long-term memory or adapting an LLM to a specific downstream task requires an expensive and time-consuming model update, such as fine-tuning or context distillation, or relies on memory-intensive long prompts.
To bypass these limitations, this work focuses on the concept of cost amortization. Researchers pay the meta-training cost once to train a hypernetwork capable of producing task- or document-specific LoRAs on demand. This turns what used to be a heavy engineering pipeline into a single, inexpensive forward pass. Instead of performing per-task optimization, the hypernetwork meta-learns update rules to instantly modify an LLM given a new task description or a long document.
In experiments, Text-to-LoRA successfully specializes models to unseen tasks using just a natural language description. Building on this, Doc-to-LoRA is able to internalize factual documents. On a needle-in-a-haystack task, Doc-to-LoRA achieves near-perfect accuracy on instances five times longer than the base model's context window. It can even generalize to transfer visual information from a vision-language model into a text-only LLM, allowing it to classify images purely through internalized weights.
Importantly, both methods run with sub-second latency, enabling rapid experimentation while avoiding the overhead of traditional model updates. This approach is a step towards lowering the technical barriers of model customization, allowing end-users to specialize foundation models via simple text inputs.
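The amortization trick, a hypernetwork emitting LoRA factors in one forward pass, can be sketched with a toy linear hypernetwork. All sizes and the single-layer setup are illustrative; the real Sakana hypernetworks are larger and meta-trained over many tasks:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2          # hidden size and LoRA rank (toy values)
emb_dim = 16         # size of the task/document embedding

# Frozen base weight of one linear layer.
W = rng.standard_normal((d, d))

# Toy hypernetwork: a single linear map from an embedding to the
# flattened LoRA factors A (r x d) and B (d x r).
H = rng.standard_normal((emb_dim, r * d + d * r)) * 0.01

def generate_lora(embedding):
    flat = embedding @ H            # one forward pass, no optimization
    A = flat[: r * d].reshape(r, d)
    B = flat[r * d:].reshape(d, r)
    return A, B

task_emb = rng.standard_normal(emb_dim)  # e.g. an embedded task description
A, B = generate_lora(task_emb)
W_adapted = W + B @ A                    # standard low-rank LoRA update

x = rng.standard_normal(d)
print(np.allclose(W_adapted @ x, W @ x + B @ (A @ x)))  # True
```

Specializing the model then costs one matrix-vector product per layer instead of a fine-tuning run, which is where the sub-second latency comes from.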
Doc-to-LoRA
Paper
Code
Text-to-LoRA
Paper
Code
arXiv.org
Doc-to-LoRA: Learning to Instantly Internalize Contexts
Long input sequences are central to in-context learning, document understanding, and multi-step reasoning of Large Language Models (LLMs). However, the quadratic attention cost of Transformers...
Anthropic dropped a new feature that lets you import your entire memory from ChatGPT, Gemini, etc. into Claude, so it instantly knows everything about you. No more reminding Claude who you are.
Claude
Switch to Claude without starting over | Claude
Transfer your preferences, projects, and context from other AI providers into Claude. Switch without losing what makes your AI useful.
REMem: Reasoning with Episodic Memory in AI Agents
REMem addresses a capability gap in many RAG/memory systems: not just storing documents or facts, but also recollecting specific past events with their situational grounding (when/where/who/what) and then reasoning across multiple events on a timeline.
GitHub.
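A toy version of such situationally grounded recall; the schema and retrieval here are illustrative, not REMem's implementation:

```python
from dataclasses import dataclass

@dataclass
class Event:
    when: int   # timeline position (e.g. turn index)
    where: str
    who: str
    what: str

memory = [
    Event(1, "kitchen", "user", "asked for a pasta recipe"),
    Event(5, "kitchen", "agent", "suggested carbonara"),
    Event(9, "store", "user", "reported being out of eggs"),
]

def recall(memory, who=None, where=None):
    """Recollect grounded events, ordered along the timeline."""
    hits = [e for e in memory
            if (who is None or e.who == who)
            and (where is None or e.where == where)]
    return sorted(hits, key=lambda e: e.when)

# Reasoning across events: the latest user report updates the earlier plan.
print(recall(memory, who="user")[-1].what)  # reported being out of eggs
```

The contrast with plain RAG is that each record keeps its when/where/who grounding, so the agent can reason over an ordered timeline rather than a bag of facts.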
arXiv.org
REMem: Reasoning with Episodic Memory in Language Agent
Humans excel at remembering concrete experiences along spatiotemporal contexts and performing reasoning across those events, i.e., the capacity for episodic memory. In contrast, memory in language...
Visa is leaning hard into agentic commerce and stablecoins.
• Agentic commerce live in the US & CEMEA, expanding globally
• Stablecoin cards now in 50+ countries
• USDC settlement live in the US
• $4.6B annualized stablecoin volume
Visa is positioning itself as the infrastructure layer between crypto and traditional finance.
Researchers adapted the Avey architecture to the encoder paradigm and called the result Avey-B, a next-generation alternative to BERT with unlimited context length.
Avey is an alternative architecture to Transformers, introduced last year.
It scales linearly with context length and performs better at long-context tasks (needle-in-a-haystack).
They have now shown that it works just as well in a BERT-style model.
This approach definitely needs more attention.
HuggingFace.
GitHub
arXiv.org
Avey-B
Compact pretrained bidirectional encoders remain the backbone of industrial NLP under tight compute and memory budgets. Their effectiveness stems from self-attention's ability to deliver...
DoubleAI’s AI system beat a decade of expert GPU engineering
WarpSpeed just beat a decade of expert-engineered GPU kernels — every single one of them.
cuGraph is one of the most widely used GPU-accelerated libraries in the world. It spans dozens of graph algorithms, each written and continuously refined by some of the world’s top performance engineers.
DoubleAI’s WarpSpeed autonomously rewrote and re-optimized these kernels across three GPU architectures (A100, L4, A10G). DoubleAI released the hyper-optimized version on GitHub — install it with no change to your code.
The numbers:
• 3.6x average speedup over human experts
• 100% of kernels see a speedup
• 55% see more than 2x improvement
Winning Gold at IMO 2025.
Codeforces benchmarks.
From Reasoning to Super-Intelligence: A Search-Theoretic Perspective.
ByteDance published CUDA Agent
It trained a model that writes fast CUDA kernels. Not just correct ones — actually optimized ones.
It beats torch.compile by 2× on simple/medium kernels, ~92% on complex ones, and even outperforms Claude Opus 4.5 and Gemini 3 Pro by ~40% on the hardest setting.
The key idea is simple but kind of brilliant:
CUDA performance isn’t about correctness, it’s about hardware. Warps, memory bandwidth, bank conflicts — the stuff you only see in a profiler.
So instead of rewarding “did it compile?”, they reward actual GPU speed. Real profiling numbers. RL trained directly on performance.
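A minimal sketch of a speed-based reward in this spirit; the exact shaping used by CUDA Agent may differ, and `kernel_reward` is a hypothetical name:

```python
def kernel_reward(correct, baseline_ms, candidate_ms):
    """Reward measured GPU speed, not mere compilation or correctness."""
    if not correct:
        return -1.0                  # wrong output: hard penalty
    speedup = baseline_ms / candidate_ms
    return max(0.0, speedup - 1.0)   # only pay for beating the baseline

print(kernel_reward(True, 2.0, 1.0))   # 1.0  (2x faster than baseline)
print(kernel_reward(True, 2.0, 4.0))   # 0.0  (slower: no reward)
print(kernel_reward(False, 2.0, 0.5))  # -1.0 (fast but wrong)
```

The baseline and candidate times would come from actual profiler runs, so the policy is optimized against real hardware behavior rather than a static proxy.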
Paper.
cuda-agent.github.io
CUDA Agent | Large-Scale Agentic RL for CUDA Kernel Generation
CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation.
Visa announced a partnership with Bridge (a Stripe company) to launch stablecoin-linked cards in 100+ countries.
These cards will be backed by stablecoin balances, enabling efficient and integrated global coverage.
Fortune
Visa to expand card partnership with Stripe’s Bridge to over 100 countries | Fortune
The two firms previously launched stablecoin-backed cards for 18 countries in April.
Meta is testing a shopping research feature in its Meta AI web browser for select US users, positioning it against e-commerce tools in ChatGPT and Gemini.
Bloomberg.com
Meta Tests AI Shopping Research Tool to Rival ChatGPT, Gemini
Meta Platforms Inc. is testing a shopping research feature in its artificial intelligence chatbot, rivaling a similar tool offered by OpenAI’s ChatGPT and Google’s Gemini.
Alibaba's top AI researcher resigned immediately after Qwen's most successful model launch ever.
The day Junyang Lin announced his departure, Qwen released FOUR brand new models, including one that can run on just 7 gigs of RAM.
The models got rave reviews, including one from Elon Musk, who praised its "density of intelligence." The models were/are free to use.
Lin was seen as the most important developer at Qwen. He was also a big open source advocate. His departure led to speculation that he'd been forced out against his will. Chinese AI researchers You Jiacheng and Chen Cheng shared this view.
Why did this happen?
Some are saying it was a money thing. All of Alibaba's Qwen models until now have been completely open source, meaning that people can download them and run them locally, generating no revenue for Alibaba. Reportedly, company execs were frustrated that the open source models were not helping get users for Alibaba's revenue-generating services (e.g. Alibaba Cloud, subscription services etc).
Shortly before Lin quit, Alibaba had hired people who had worked on Google's Gemini, reportedly with an eye to increasing Daily Active Users (DAUs). If these reports are correct then we can expect that Alibaba will put more emphasis on monetizing its AI going forward. That should drive higher revenue, though it will likely mean the end of these powerful free Qwen models we've been seeing lately.
Word on the street is that Alibaba is tightening the screws to make money via proprietary cloud and API rather than open source.
Venturebeat
Did Alibaba just kneecap its powerful Qwen AI team? Key figures depart in wake of latest open source release
The takeaway? If you value Qwen's open source efforts, download and preserve the models now, while you still can.
Physical Intelligence made a memory system for their models and call it Multi-Scale Embodied Memory (MEM).
It provides both short-term and long-term memory to enable very long tasks.
Researchers tested it on cleaning a kitchen (and yes, washing dishes), making grilled cheese, and more.
One of the cool side effects of MEM is in-context adaptation: when the robot makes a mistake, like opening the fridge door from the wrong side, it remembers what happened and tries the task again in a different way.
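A toy two-tier memory in this spirit; class and method names are hypothetical, not Physical Intelligence's API:

```python
from collections import deque

class EmbodiedMemory:
    def __init__(self, short_capacity=5):
        self.short_term = deque(maxlen=short_capacity)  # recent observations
        self.long_term = []                             # durable task events

    def observe(self, event, important=False):
        self.short_term.append(event)
        if important:                    # e.g. a failed subtask
            self.long_term.append(event)

    def is_new_strategy(self, action):
        # In-context adaptation: don't retry strategies already seen to fail.
        return all(f"failed: {action}" != e for e in self.long_term)

mem = EmbodiedMemory()
mem.observe("failed: open fridge from left", important=True)
print(mem.is_new_strategy("open fridge from left"))   # False -> try another way
print(mem.is_new_strategy("open fridge from right"))  # True
```

The short-term buffer keeps the model grounded in what just happened, while durable long-term entries are what let a failed fridge-door attempt change the next try minutes later.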
www.pi.website
VLAs with Long and Short-Term Memory
Multi-Scale Embodied Memory (MEM) gives our models both long-term and short-term memory, enabling complex tasks longer than ten minutes.
MIT introduced NeuroSkill, a real-time agentic system that models human cognitive and emotional state by integrating Brain-Computer Interface signals with foundation models.
Its "Human State of Mind" is provided via SKILL.md.
The system runs fully offline on the edge.
Its NeuroLoop harness enables agentic workflows that engage users across cognitive and emotional levels, responding to both explicit and implicit requests through actionable tool calls.
Why does it matter?
Most AI agents respond only to explicit user requests. NeuroSkill explores the frontier of proactive agents that sense and respond to implicit human states, opening new possibilities for adaptive human-AI interaction.
arXiv.org
NeuroSkill(tm): Proactive Real-Time Agentic System Capable of...
Real-time proactive agentic system, capable of modeling Human State of Mind, using foundation EXG model and text embeddings model, running fully offline on the edge. Unlike all previously known...
Meta introduced Beyond Language Modeling: a deep dive into the design space of truly native multimodal models
Humans communicate through language and interact with the world through vision, yet most multimodal models are language-first.
First: how should we represent vision?
RAE uses a frozen SigLIP encoder as the image tokenizer. One encoder for both understanding and generation. Biggest wins on semantic and world-knowledge generation, where VAEs lose critical information through compression.
Next: how do we route modalities? Hand-crafted separation (MoT) or data-driven (MoE)?
MoE wins. But the interesting part is the emergent routing behavior. Without any hard architectural split, the network learns vision experts, language experts, and multimodal experts on its own. It even routes understanding and generation tokens to the same experts.
Why are frontier models still dense? Maybe they're using the wrong tokenizer.
VAE + MoE: no benefit from sparsity.
RAE + MoE: scales with sparsity.
Semantic vision tokens unlock what MoE was designed to do.
Native multimodal pretraining enables zero-shot language-guided navigation. No navigation-specific data, no caption annotations. The grounding emerges from pretraining alone.
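The data-driven routing behind this emergence is top-k gating; a minimal sketch with toy sizes and an untrained router:

```python
import numpy as np

rng = np.random.default_rng(0)
n_experts, d, k = 8, 16, 2   # toy sizes; real models are far larger

# Learned gate with no hand-crafted modality split anywhere.
W_gate = rng.standard_normal((d, n_experts))

def route(token):
    logits = token @ W_gate
    return set(np.argsort(logits)[-k:].tolist())  # each token picks top-k

vision_token = rng.standard_normal(d)
text_token = rng.standard_normal(d)
# After training, specialization (vision vs. language vs. multimodal
# experts) can emerge purely from which experts each token type selects.
print(len(route(vision_token)), len(route(text_token)))  # 2 2
```

Contrast with MoT-style hand-crafted separation, where the vision/language split is fixed in the architecture instead of learned by the gate.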
beyond-llms.github.io
Beyond Language Modeling: An Exploration of Multimodal Pretraining
Empirical insights on representation, data, architecture, and scaling for native multimodal pretraining.
Google open-sourced the Google Workspace CLI: Gmail, Drive, Docs, Sheets, Calendar, and Chat, all from the terminal, with 40+ agent skills included.
The CLI is becoming the universal API for agents.
GitHub
GitHub - googleworkspace/cli: Google Workspace CLI — one command-line tool for Drive, Gmail, Calendar, Sheets, Docs, Chat, Admin…
Google Workspace CLI — one command-line tool for Drive, Gmail, Calendar, Sheets, Docs, Chat, Admin, and more. Dynamically built from Google Discovery Service. Includes AI agent skills. - googlework...
Microsoft released Phi-4-reasoning-vision-15B, a compact and fast multimodal reasoning model that blends the strengths of different methods while reducing their limitations.
The new Phi multimodal model pushes the compute/performance Pareto frontier via careful design across many decisions, making it a good choice for edge applications.
Google introduced a new method to teach LLMs to reason like Bayesians.
By training models to mimic optimal probabilistic inference, the researchers improved their ability to update predictions and generalize across new domains.
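The target behavior can be made concrete with the simplest conjugate case, a Beta-Binomial update; this is a textbook illustration of what "optimal probabilistic inference" means, not Google's actual training setup:

```python
def posterior_mean(prior_a, prior_b, heads, tails):
    """Beta(a, b) prior + coin-flip evidence -> Beta(a+heads, b+tails)."""
    a, b = prior_a + heads, prior_b + tails
    return a / (a + b)

print(posterior_mean(1, 1, 0, 0))  # 0.5  (uniform prior belief)
print(posterior_mean(1, 1, 8, 2))  # 0.75 (belief after 8 heads, 2 tails)
```

The training signal is the exact posterior produced by updates like this one, and the LLM learns to match it in natural language.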
Google Research
Teaching LLMs to reason like Bayesians
We teach LLMs to reason in a Bayesian manner by training them to mimic the predictions of an optimal Bayesian model.