AI chip startup MatX, founded by two Google alumni, has raised more than $500 million in a new round to compete with Nvidia.
They’re building an LLM chip that delivers much higher throughput than any other chip while also achieving the lowest latency. Call it the MatX One.
The MatX One chip is based on a splittable systolic array, which has the energy and area efficiency that large systolic arrays are famous for, while also getting high utilization on smaller matrices with flexible shapes. The chip combines the low latency of SRAM-first designs with the long-context support of HBM. These elements, plus a fresh take on numerics, deliver higher throughput on LLMs than any announced system, while simultaneously matching the latency of SRAM-first designs. Higher throughput and lower latency give you smarter and faster models for your subscription dollar.
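A toy utilization model (not MatX's actual design) shows why a splittable array matters: a monolithic systolic array pads any smaller matrix up to its full tile size, wasting processing elements, while splitting the same silicon into smaller independent sub-arrays lets each work on a tile that actually fits.

```python
import math

def utilization(rows, cols, tile):
    """Fraction of processing elements doing useful work when an
    (rows x cols) output block is mapped onto tile x tile systolic arrays.
    Toy model: ignores pipeline fill/drain and dataflow details."""
    tiles_needed = math.ceil(rows / tile) * math.ceil(cols / tile)
    useful = rows * cols
    return useful / (tiles_needed * tile * tile)

# A monolithic 256x256 array on a 64x64 matmul wastes ~94% of its cells:
print(utilization(64, 64, 256))   # 0.0625
# The same silicon split into 64x64 sub-arrays is a perfect fit:
print(utilization(64, 64, 64))    # 1.0
```

The tile size, matrix shapes, and the utilization formula are illustrative assumptions; the point is only that flexible splitting recovers efficiency on small, oddly shaped matrices.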
Bloomberg.com
AI Chip Startup MatX Raises $500 Million to Compete With Nvidia
MatX, an AI chip startup founded by two alumni of Google’s semiconductor business, has raised more than $500 million in a new funding round to produce hardware that competes with Nvidia Corp.
🔥3🥰2👏2❤1
Anthropic rolled out Remote Control for Claude Code, letting users control a session begun in the terminal from the Claude mobile app or the web.
Remote Control is now available in Research Preview for Max users, and coming soon to Pro users.
Run claude rc to get started.
The aha moment from Openclaw was moving the control panel of your agent from desktop to where you are (mobile: whatsapp, telegram etc) and people lost their minds.
Very impressive to see the speed at which Anthropic saw that, built remote control and shipped.
Claude Code Docs
Continue local sessions from any device with Remote Control - Claude Code Docs
Continue a local Claude Code session from your phone, tablet, or any browser using Remote Control. Works with claude.ai/code and the Claude mobile app.
🔥3❤2🥰2
Berkeley and Princeton developed a new offline GCRL method based on multistep quasimetrics that can learn multistage tasks in the real world using the Bridge Dataset.
The tasks might seem simple, but they use exactly the same Bridge Dataset as older works, squeezing much more advanced multistage tasks from the same data!
❤1
This is insane. Next.js rebuilt on Vite, and it only took one week and $1,100 in tokens.
Code was always the cheap part, though; knowing what to build and why is the hard part.
Cloudflare didn't just throw tokens at Next.js: they had deep opinions about edge architecture and knew exactly where the framework needed to change. $1,100 in tokens + years of infra expertise.
The expertise is the expensive input nobody counts
The Cloudflare Blog
How we rebuilt Next.js with AI in one week
One engineer used AI to rebuild Next.js on Vite in a week. vinext builds up to 4x faster, produces 57% smaller bundles, and deploys to Cloudflare Workers with a single command.
🔥3❤2👏2
Meet LUMI-lab, a self-driving lab that closes the loop between an AI foundation model and robotics to accelerate lipid nanoparticle (LNP) discovery for mRNA delivery.
LUMI-lab (Large-scale Unsupervised Modeling followed by Iterative experiments) is a self-driving laboratory that tightly closes the loop between an AI foundation model and automated robotics to accelerate LNP discovery for mRNA delivery.
To tackle data scarcity in emerging mRNA delivery domains, the team pretrained the model on 28M+ molecular structures, then iteratively improved it with closed-loop experimental data.
In this work, across ten active-learning cycles, LUMI-lab synthesized and evaluated 1,700+ new LNPs and unexpectedly identified a new design feature for efficient delivery: brominated lipid tails.
These brominated-tail ionizable lipids delivered mRNA into human lung cells more efficiently than approved benchmarks, despite representing only a small fraction of the initial chemical space explored.
GitHub.
Check the video here.
GitHub
GitHub - bowenli-lab/LUMI-lab: Foundation model-driven lab
Foundation model-driven lab. Contribute to bowenli-lab/LUMI-lab development by creating an account on GitHub.
❤3🔥3👏2
LLM personas can be elicited just by prompting. Even harmful ones.
Lesswrong
In-context learning alone can induce weird generalisation — LessWrong
Benji Berczi, Kyuhee Kim, Cozmin Ududec, James Requeima …
💯4🔥2🥰2
A good model of the world requires not just great graphics but spatial and world intelligence so that you can understand how objects move and respond, what actions cause what outcomes, and what the effects of interactions by players are.
Moonlake's world model delivers that.
Moonlakeai
Building Multimodal Worlds with Moonlake's World Modeling Agent - Moonlake AI
What it takes to build an interactive, multimodal world — and how our agent created a bowling mini-game from a single prompt.
🔥3❤2🥰2
Google introduced Nano Banana 2
It uses Gemini’s understanding of the world and is powered by real-time information and images from web search. That means it can better reflect real-world conditions with high fidelity.
Check out "Window Seat," a demo using Nano Banana 2’s world understanding to generate more accurate views from any window in the world, pulling live local weather info with 2K/4K specs. The precision is mind blowing.
Rolling out today as the new default in the Gemini app, Search (across 141 countries), and Flow, and available in preview via Google AI Studio and Vertex AI. Also available in Google Antigravity.
Google
Nano Banana 2: Combining Pro capabilities with lightning-fast speed
Our latest image generation model offers advanced world knowledge, production-ready specs, subject consistency and more, all at Flash speed.
👏3🔥2🥰2
New from DeepSeek: DualPath
Researchers from Peking University, Tsinghua University, and #DeepSeek unveiled DualPath to fix the storage bandwidth bottleneck, which may be the secret killer of LLM agent performance.
Instead of letting data get stuck in a single storage traffic jam, DualPath creates a second highway for data to travel.
It loads saved model memory into idle decoding engines and then zips it over to the processing engines using high-speed internal networks, ensuring no part of the system sits idle while waiting for data.
The results are massive: DualPath boosts offline throughput by up to 1.87x and nearly doubles online serving speeds without violating performance targets.
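The "second highway" intuition can be put in numbers with a deliberately crude bandwidth model (my illustration, not DualPath's actual scheduler): if KV-cache bytes are split proportionally across the direct storage path and a relay path through idle decode engines, the effective bandwidth is the sum of the two.

```python
def load_time_ms(total_gb, direct_gbps, relay_gbps=0.0):
    """Time to load a KV cache when bytes are split proportionally
    across a direct storage path and an optional relay path.
    Toy model: assumes perfect overlap and no transfer overhead."""
    aggregate = direct_gbps + relay_gbps
    return total_gb / aggregate * 1000

single = load_time_ms(8, 4)        # one storage path only
dual = load_time_ms(8, 4, 4)       # relay path through idle engines added
print(single, dual)                # 2000.0 1000.0 -- the second path halves load time
```

The bandwidth figures are made up; the real speedup depends on how much relay bandwidth is actually idle at any moment, which is what DualPath's scheduling has to get right.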
arXiv.org
DualPath: Breaking the Storage Bandwidth Bottleneck in Agentic LLM...
The performance of multi-turn, agentic LLM inference is increasingly dominated by KV-Cache storage I/O rather than computation. In prevalent disaggregated architectures, loading the massive...
🔥3❤2🥰2
Sakana introduced Doc-to-LoRA and Text-to-LoRA, two related research projects exploring how to make LLM customization faster and more accessible.
By training a Hypernetwork to generate LoRA adapters on the fly, these methods allow models to instantly internalize new information or adapt to new tasks.
Biological systems naturally rely on two key cognitive abilities: durable long-term memory to store facts, and rapid adaptation to handle new tasks given limited sensory cues. While modern LLMs are highly capable, they still lack this flexibility. Traditionally, adding long-term memory or adapting an LLM to a specific downstream task requires an expensive and time-consuming model update, such as fine-tuning or context distillation, or relies on memory-intensive long prompts.
To bypass these limitations, this work focuses on the concept of cost amortization. Researchers pay the meta-training cost once to train a hypernetwork capable of producing task- or document-specific LoRAs on demand. This turns what used to be a heavy engineering pipeline into a single, inexpensive forward pass. Instead of performing per-task optimization, the hypernetwork meta-learns update rules to instantly modify an LLM given a new task description or a long document.
In experiments, Text-to-LoRA successfully specializes models to unseen tasks using just a natural language description. Building on this, Doc-to-LoRA is able to internalize factual documents. On a needle-in-a-haystack task, Doc-to-LoRA achieves near-perfect accuracy on instances five times longer than the base model's context window. It can even generalize to transfer visual information from a vision-language model into a text-only LLM, allowing it to classify images purely through internalized weights.
Importantly, both methods run with sub-second latency, enabling rapid experimentation while avoiding the overhead of traditional model updates. This approach is a step towards lowering the technical barriers of model customization, allowing end-users to specialize foundation models via simple text inputs.
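The amortization idea has a simple shape. The sketch below shows it with toy sizes and a single linear map standing in for the trained hypernetwork (the real system uses a learned model and a deep learning framework, not NumPy): a task embedding is mapped in one forward pass to the A and B factors of a LoRA adapter, which is then added to a frozen base weight.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, emb = 16, 4, 8          # hidden dim, LoRA rank, task-embedding dim (toy sizes)

# Frozen base weight of one linear layer.
W = rng.normal(size=(d, d))

# "Hypernetwork": one linear map from task embedding to flattened LoRA factors.
H = rng.normal(scale=0.01, size=(emb, r * d + d * r))

def generate_lora(task_embedding):
    flat = task_embedding @ H                    # single forward pass, no optimization
    A = flat[: r * d].reshape(r, d)              # down-projection
    B = flat[r * d :].reshape(d, r)              # up-projection
    return A, B

def adapted_forward(x, task_embedding, alpha=8.0):
    A, B = generate_lora(task_embedding)
    # Base weight plus the generated low-rank delta, with standard LoRA scaling.
    return x @ (W + (alpha / r) * (B @ A)).T

x = rng.normal(size=(1, d))
task = rng.normal(size=(emb,))
y = adapted_forward(x, task)
print(y.shape)    # (1, 16)
```

The expensive part, meta-training H so the generated adapters are actually useful, is exactly the cost the papers amortize; once it is paid, specialization is one matrix multiply per layer.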
Doc-to-LoRA
Paper
Code
Text-to-LoRA
Paper
Code
arXiv.org
Doc-to-LoRA: Learning to Instantly Internalize Contexts
Long input sequences are central to in-context learning, document understanding, and multi-step reasoning of Large Language Models (LLMs). However, the quadratic attention cost of Transformers...
🔥3👏3🥰2
Anthropic dropped a new feature that lets you import your entire memory from ChatGPT, Gemini, etc. into Claude, so it instantly knows everything about you. No more reminding Claude who you are.
Claude
Switch to Claude without starting over | Claude
Transfer your preferences, projects, and context from other AI providers into Claude. Switch without losing what makes your AI useful.
🔥5❤3💯2
REMem: Reasoning with Episodic Memory in AI Agents
REMem addresses a capability gap in many RAG/memory systems: not just storing documents or facts, but also recollecting specific past events with their situational grounding (when/where/who/what) and then reasoning across multiple events on a timeline.
GitHub.
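The distinction REMem draws can be made concrete with a minimal episodic store (my illustration of the concept, not REMem's implementation): events carry their situational grounding, and queries reason across the timeline rather than just matching one fact.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Event:
    when: datetime
    where: str
    who: str
    what: str

class EpisodicMemory:
    def __init__(self):
        self.events = []

    def record(self, event):
        self.events.append(event)

    def timeline(self, who=None):
        """Events in temporal order, optionally filtered by participant."""
        hits = [e for e in self.events if who is None or e.who == who]
        return sorted(hits, key=lambda e: e.when)

    def after(self, who, what):
        """What did `who` do after the event matching `what`?
        Answering this needs two events and their order, not one fact."""
        tl = self.timeline(who)
        for i, e in enumerate(tl):
            if what in e.what and i + 1 < len(tl):
                return tl[i + 1]
        return None

mem = EpisodicMemory()
mem.record(Event(datetime(2025, 1, 1, 9), "kitchen", "alice", "made coffee"))
mem.record(Event(datetime(2025, 1, 1, 10), "office", "alice", "joined standup"))
print(mem.after("alice", "made coffee").what)   # joined standup
```

A plain document store could answer "did alice make coffee?"; the episodic structure is what makes "what happened next, and where?" answerable.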
arXiv.org
REMem: Reasoning with Episodic Memory in Language Agent
Humans excel at remembering concrete experiences along spatiotemporal contexts and performing reasoning across those events, i.e., the capacity for episodic memory. In contrast, memory in language...
🔥2🥰2💯2
Visa is leaning hard into agentic commerce and stablecoins.
• Agentic commerce live in the US & CEMEA, expanding globally
• Stablecoin cards now in 50+ countries
• USDC settlement live in the US
• $4.6B annualized stablecoin volume
Visa is positioning itself as the infrastructure layer between crypto and traditional finance.
👍2🔥2🥰2👎1
Researchers adapted the Avey architecture to the encoder paradigm and called the result Avey-B, a next-generation alternative to BERT with unlimited context length.
Avey is an alternative architecture to Transformers from last year.
It scales linearly with context length and performs better at long-context tasks (needle-in-a-haystack).
They now showed that it works just as well in a BERT-style model.
This approach definitely needs more attention.
HuggingFace.
GitHub
arXiv.org
Avey-B
Compact pretrained bidirectional encoders remain the backbone of industrial NLP under tight compute and memory budgets. Their effectiveness stems from self-attention's ability to deliver...
❤3🔥3💯2
DoubleAI’s AI system beat a decade of expert GPU engineering
WarpSpeed just beat a decade of expert-engineered GPU kernels — every single one of them.
cuGraph is one of the most widely used GPU-accelerated libraries in the world. It spans dozens of graph algorithms, each written and continuously refined by some of the world’s top performance engineers.
DoubleAI’s WarpSpeed autonomously rewrote and re-optimized these kernels across three GPU architectures (A100, L4, A10G). DoubleAI released the hyper-optimized version on GitHub — install it with no change to your code.
The numbers:
• 3.6x average speedup over human experts
• 100% of kernels benefit from speedup
• 55% see more than 2x improvement
Winning Gold at IMO 2025.
Codeforces benchmarks.
From Reasoning to Super-Intelligence: A Search-Theoretic Perspective.
ByteDance published CUDA Agent
They trained a model that writes fast CUDA kernels. Not just correct ones, actually optimized ones.
It beats torch.compile by 2× on simple/medium kernels, ~92% on complex ones, and even outperforms Claude Opus 4.5 and Gemini 3 Pro by ~40% on the hardest setting.
The key idea is simple but kind of brilliant:
CUDA performance isn’t about correctness, it’s about hardware. Warps, memory bandwidth, bank conflicts — the stuff you only see in a profiler.
So instead of rewarding “did it compile?”, they reward actual GPU speed. Real profiling numbers. RL trained directly on performance.
Paper.
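The reward idea can be sketched in a few lines (a hedged paraphrase of the setup described above, not the paper's exact shaping): correctness is a gate, and beyond that the signal is the measured speedup over a reference such as torch.compile.

```python
def kernel_reward(candidate_time_ms, baseline_time_ms, outputs_match):
    """RL reward for a generated CUDA kernel, shaped on measured speed.

    Correctness gates the reward; a kernel that compiles but produces
    wrong results earns nothing. Beyond the gate, the signal is real
    profiler timing relative to a reference implementation."""
    if not outputs_match:
        return 0.0
    return baseline_time_ms / candidate_time_ms   # >1.0 means faster than baseline

print(kernel_reward(0.5, 1.0, True))    # 2.0, twice as fast as the reference
print(kernel_reward(0.5, 1.0, False))   # 0.0, wrong results earn nothing
```

Rewarding profiled wall-clock speed instead of "did it compile?" is what pushes the policy toward the hardware-level concerns (warps, bandwidth, bank conflicts) that actually determine performance.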
cuda-agent.github.io
CUDA Agent | Large-Scale Agentic RL for CUDA Kernel Generation
CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation.
❤1🔥1💯1
Visa announced a partnership with Bridge (a Stripe company) to launch stablecoin-linked cards in 100+ countries.
These cards will be backed by stablecoin balances, enabling efficient and integrated global coverage.
Fortune
Visa to expand card partnership with Stripe’s Bridge to over 100 countries | Fortune
The two firms previously launched stablecoin-backed cards for 18 countries in April.
🔥4❤1🥰1
Meta is testing a shopping research feature in its Meta AI web browser for select US users, positioning it against e-commerce tools in ChatGPT and Gemini
Bloomberg.com
Meta Tests AI Shopping Research Tool to Rival ChatGPT, Gemini
Meta Platforms Inc. is testing a shopping research feature in its artificial intelligence chatbot, rivaling a similar tool offered by OpenAI’s ChatGPT and Google’s Gemini.
🔥1🥰1👏1
Alibaba's top AI researcher resigned immediately after Qwen's most successful model launch ever.
The day Junyang Lin announced his departure, Qwen released FOUR brand new models, including one that can run on just 7 gigs of RAM.
The models got rave reviews, including one from Elon Musk, who praised its "density of intelligence." The models were/are free to use.
Lin was seen as the most important developer at Qwen. He was also a big open source advocate. His departure led to speculation that he'd been forced out against his will. Chinese AI researchers You Jiacheng and Chen Cheng shared this view.
Why did this happen?
Some are saying it was a money thing. All of Alibaba's Qwen models until now have been completely open source, meaning that people can download them and run them locally, generating no revenue for Alibaba. Reportedly, company execs were frustrated that the open source models were not helping get users for Alibaba's revenue-generating services (e.g. Alibaba Cloud, subscription services, etc.).
Shortly before Lin quit, Alibaba had hired people who had worked on Google's Gemini, reportedly with an eye to increasing Daily Active Users (DAUs). If these reports are correct then we can expect that Alibaba will put more emphasis on monetizing its AI going forward. That should drive higher revenue, though it will likely mean the end of these powerful free Qwen models we've been seeing lately.
Word on the street is that Alibaba is tightening the screws to make money via proprietary cloud and API rather than open source.
Venturebeat
Did Alibaba just kneecap its powerful Qwen AI team? Key figures depart in wake of latest open source release
The takeaway? If you value Qwen's open source efforts, download and preserve the models now, while you still can.
🔥1👏1💯1
Physical Intelligence made a memory system for their models and calls it Multi-Scale Embodied Memory (MEM).
It provides both short-term and long-term memory to enable very long tasks.
Researchers tested it on cleaning a kitchen (and yes, washing dishes), making grilled cheese, and more.
One of the cool side effects of MEM is in-context adaptation: when the robot makes a mistake, like opening the fridge door from the wrong side, it remembers what happened and tries the task again in a different way.
www.pi.website
VLAs with Long and Short-Term Memory
Multi-Scale Embodied Memory (MEM) gives our models both long-term and short-term memory, enabling complex tasks longer than ten minutes.
🔥6👏2💯1