All about AI, Web 3.0, BCI
3.44K subscribers
739 photos
26 videos
161 files
3.22K links
This channel is about AI, Web 3.0, and brain-computer interfaces (BCI)

owner @Aniaslanyan
MatX, an AI chip startup founded by two Google alumni, has raised more than $500 million in a new round to compete with Nvidia.

They’re building an LLM chip that delivers much higher throughput than any other chip while also achieving the lowest latency. Call it the MatX One.

The MatX One chip is based on a splittable systolic array, which has the energy and area efficiency that large systolic arrays are famous for, while also getting high utilization on smaller matrices with flexible shapes. The chip combines the low latency of SRAM-first designs with the long-context support of HBM. These elements, plus a fresh take on numerics, deliver higher throughput on LLMs than any announced system, while simultaneously matching the latency of SRAM-first designs. Higher throughput and lower latency give you smarter and faster models for your subscription dollar.
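To make the splittable-systolic-array idea concrete, here is a minimal toy sketch (my own illustration, not MatX's actual design): the matmul is decomposed into independent tile-sized units of work, and a splittable array can hand those units to separate sub-arrays, which is how it keeps utilization high even on small matrices with flexible shapes.

```python
import numpy as np

def tiled_matmul(A, B, tile=4):
    """Toy stand-in for a systolic array: the product is decomposed into
    independent (tile x tile) blocks of work. A 'splittable' array can
    assign these blocks to separate sub-arrays, so small or oddly shaped
    matrices still keep the hardware busy."""
    m, k = A.shape
    k2, n = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((m, n))
    for i in range(0, m, tile):          # one sub-array per output tile
        for j in range(0, n, tile):
            for p in range(0, k, tile):  # accumulate partial products
                C[i:i+tile, j:j+tile] += A[i:i+tile, p:p+tile] @ B[p:p+tile, j:j+tile]
    return C
```

The point of the decomposition: each output tile is an independent job, so scheduling flexibility, not raw array size, is what determines utilization.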
Anthropic rolled out Remote Control for Claude Code, letting users control a session begun in the terminal from the Claude mobile app or the web.

Remote Control is now available in Research Preview for Max users, and coming soon to Pro users.

Run claude rc to get started.

The aha moment with Openclaw was moving your agent's control panel from the desktop to wherever you are (mobile: WhatsApp, Telegram, etc.), and people lost their minds.

Very impressive how quickly Anthropic saw that, built Remote Control, and shipped it.
Berkeley and Princeton developed a new offline goal-conditioned RL (GCRL) method based on multistep quasimetrics that can learn multistage tasks in the real world using the Bridge Dataset.

The tasks might seem simple, but they use exactly the same Bridge Dataset as older works, squeezing much more advanced multistage tasks from the same data!
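For intuition on why quasimetrics suit goal-reaching: the learned value behaves like a directed distance that obeys the triangle inequality but need not be symmetric, which is exactly what lets subgoals be chained into multistage plans. A toy illustration (my own, not the paper's method) using shortest paths on a directed graph:

```python
import itertools

# Toy directed graph with asymmetric edge costs (think one-way doors):
# shortest-path distance is then a quasimetric, not a metric.
INF = float("inf")
nodes = ["a", "b", "c"]
edges = {("a", "b"): 1, ("b", "a"): 5, ("b", "c"): 2, ("c", "a"): 1}

def shortest_paths(nodes, edges):
    """Floyd-Warshall: all-pairs directed distances."""
    d = {(u, v): (0 if u == v else edges.get((u, v), INF))
         for u in nodes for v in nodes}
    for k, i, j in itertools.product(nodes, repeat=3):  # k is outermost
        d[i, j] = min(d[i, j], d[i, k] + d[k, j])
    return d

d = shortest_paths(nodes, edges)
# Asymmetry: reaching b from a is cheap, the reverse is not.
assert d["a", "b"] != d["b", "a"]
# Triangle inequality: routing through any waypoint never beats the
# quasimetric distance, which is what makes subgoal chaining sound.
assert all(d[i, j] <= d[i, k] + d[k, j]
           for i, k, j in itertools.product(nodes, repeat=3))
```

The multistage-task idea reduces to this: compose directed distances through intermediate waypoints, and the triangle inequality guarantees the composition is never better than what the quasimetric already encodes.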
This is insane: Next.js rebuilt on Vite, and it took only one week and $1,100 in tokens.

Code was always the cheap part, though; knowing what to build and why is the hard part.

Cloudflare didn't just throw tokens at Next.js: they had deep opinions about edge architecture and knew exactly where the framework needed to change. $1,100 in tokens plus years of infra expertise.

The expertise is the expensive input nobody counts.
Nvidia introduced synthetic data for terminal use.
Meet LUMI-lab (Large-scale Unsupervised Modeling followed by Iterative experiments), a self-driving laboratory that tightly closes the loop between an AI foundation model and automated robotics to accelerate lipid nanoparticle (LNP) discovery for mRNA delivery.

To tackle data scarcity in emerging mRNA delivery domains, the researchers pretrained the model on 28M+ molecular structures, then iteratively improved it with closed-loop experimental data.

In this work, across ten active-learning cycles, LUMI-lab synthesized and evaluated 1,700+ new LNPs and unexpectedly identified a new design feature for efficient delivery: brominated lipid tails.

These brominated-tail ionizable lipids delivered mRNA into human lung cells more efficiently than approved benchmarks, despite representing only a small fraction of the initial chemical space explored.

GitHub.
Check the video here.
A good model of the world requires not just great graphics but spatial and world intelligence: understanding how objects move and respond, which actions cause which outcomes, and what effects players' interactions have.

Moonlake's world model delivers that.
Google introduced Nano Banana 2

It uses Gemini’s understanding of the world and is powered by real-time information and images from web search. That means it can better reflect real-world conditions in high fidelity.

Check out "Window Seat," a demo that uses Nano Banana 2’s world understanding to generate more accurate views from any window in the world, pulling in live local weather info and rendering at 2K/4K. The precision is mind-blowing.

Rolling out today as the new default in the Gemini app, Search (across 141 countries), and Flow, and available in preview via Google AI Studio and Vertex AI. Also available in Google Antigravity.
New from DeepSeek: DualPath

Researchers from Peking University, Tsinghua University, and #DeepSeek unveiled DualPath to fix the storage bandwidth bottleneck, which may be the secret killer of LLM agent performance.

Instead of letting data get stuck in a single storage traffic jam, DualPath creates a second highway for data to travel.

It loads saved model memory into idle decoding engines and then zips it over to the processing engines using high-speed internal networks, ensuring no part of the system sits idle while waiting for data.

The results are massive: DualPath boosts offline throughput by up to 1.87x and nearly doubles online serving speeds without violating performance targets.
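A back-of-the-envelope model of why a second path roughly doubles loading speed. The numbers and function below are my own toy illustration, not from the paper:

```python
def load_time(total_gb, link_gbps, split):
    """Toy DualPath-style model: a fraction `split` of the saved model
    memory goes over the processing engine's own storage link; the rest
    is staged on an idle decoding engine and forwarded over the (much
    faster) internal network, so that link's cost is ignored here. Both
    paths run in parallel, so total time is the slower of the two."""
    direct = split * total_gb / link_gbps        # path 1: own storage link
    staged = (1 - split) * total_gb / link_gbps  # path 2: idle engine's link
    return max(direct, staged)

single_path = load_time(10, 2, 1.0)  # everything over one storage link
dual_path = load_time(10, 2, 0.5)    # balanced split across both links
```

With a balanced split, aggregate storage bandwidth doubles and load time halves, which lines up with the reported "nearly doubles" serving speeds when loading is the bottleneck.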
Sakana introduced Doc-to-LoRA and Text-to-LoRA, two related research projects exploring how to make LLM customization faster and more accessible.

By training a Hypernetwork to generate LoRA adapters on the fly, these methods allow models to instantly internalize new information or adapt to new tasks.

Biological systems naturally rely on two key cognitive abilities: durable long-term memory to store facts, and rapid adaptation to handle new tasks given limited sensory cues. While modern LLMs are highly capable, they still lack this flexibility. Traditionally, adding long-term memory or adapting an LLM to a specific downstream task requires an expensive and time-consuming model update, such as fine-tuning or context distillation, or relies on memory-intensive long prompts.

To bypass these limitations, this work focuses on the concept of cost amortization. Researchers pay the meta-training cost once to train a hypernetwork capable of producing task- or document-specific LoRAs on demand. This turns what used to be a heavy engineering pipeline into a single, inexpensive forward pass. Instead of performing per-task optimization, the hypernetwork meta-learns update rules to instantly modify an LLM given a new task description or a long document.

In experiments, Text-to-LoRA successfully specializes models to unseen tasks using just a natural language description. Building on this, Doc-to-LoRA is able to internalize factual documents. On a needle-in-a-haystack task, Doc-to-LoRA achieves near-perfect accuracy on instances five times longer than the base model's context window. It can even generalize to transfer visual information from a vision-language model into a text-only LLM, allowing it to classify images purely through internalized weights.

Importantly, both methods run with sub-second latency, enabling rapid experimentation while avoiding the overhead of traditional model updates. This approach is a step towards lowering the technical barriers of model customization, allowing end-users to specialize foundation models via simple text inputs.
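A minimal numpy sketch of the core mechanism described above: a hypernetwork maps a task embedding to LoRA factors in one forward pass, and the adapted weight is W + BA. All names and sizes here are illustrative; this is not Sakana's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, rank, d_task = 8, 2, 4

# Frozen base weight, and a toy "hypernetwork": a single linear map from
# a task embedding to the flattened LoRA factors A (rank x d) and B (d x rank).
W = rng.normal(size=(d_model, d_model))
H = rng.normal(size=(d_task, rank * d_model + d_model * rank)) * 0.01

def generate_lora(task_emb):
    """One inexpensive forward pass replaces per-task fine-tuning."""
    flat = task_emb @ H
    A = flat[:rank * d_model].reshape(rank, d_model)
    B = flat[rank * d_model:].reshape(d_model, rank)
    return A, B

def adapted_forward(x, task_emb):
    A, B = generate_lora(task_emb)
    return x @ (W + B @ A).T  # LoRA update: W' = W + B A

x = rng.normal(size=(d_model,))
y = adapted_forward(x, rng.normal(size=(d_task,)))
```

Note the amortization: the expensive part (training H) happens once; specializing to a new task is just the cheap `generate_lora` call.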

Doc-to-LoRA
Paper
Code

Text-to-LoRA
Paper
Code
Anthropic dropped a new feature that lets you import your entire memory from ChatGPT, Gemini, etc. into Claude, so it instantly knows everything about you. No more reminding Claude who you are.
REMem: Reasoning with Episodic Memory in AI Agents

REMem addresses a capability gap in many RAG/memory systems: not just storing documents or facts, but also recollecting specific past events with their situational grounding (when/where/who/what) and then reasoning across multiple events on a timeline.
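A toy sketch of what "episodic" buys you over plain fact storage. The `EpisodicMemory` class and its fields are my own illustration, not REMem's API: events keep their when/where/who grounding, so the agent can reason over an ordered timeline rather than isolated facts.

```python
from datetime import datetime

class EpisodicMemory:
    """Toy episodic store: each event keeps its situational grounding
    (when/where/who/what), enabling reasoning across multiple events
    on a timeline, not just lookup of individual facts."""
    def __init__(self):
        self.events = []

    def record(self, when, where, who, what):
        self.events.append({"when": when, "where": where,
                            "who": who, "what": what})

    def timeline(self, who):
        """All events involving `who`, in temporal order."""
        return sorted((e for e in self.events if e["who"] == who),
                      key=lambda e: e["when"])

mem = EpisodicMemory()
mem.record(datetime(2025, 3, 1), "lab", "alice", "calibrated sensor")
mem.record(datetime(2025, 3, 3), "office", "alice", "filed report")
mem.record(datetime(2025, 3, 2), "lab", "bob", "replaced battery")

# Multi-event reasoning: what did alice do, in what order?
steps = [e["what"] for e in mem.timeline("alice")]
```

The key difference from vanilla RAG: the query operates over ordered, grounded events, so "what happened between X and Y" becomes answerable.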

GitHub.
Visa is leaning hard into agentic commerce and stablecoins.

• Agentic commerce live in the US & CEMEA, expanding globally

• Stablecoin cards now in 50+ countries

• USDC settlement live in the US

• $4.6B annualized stablecoin volume

Visa is positioning itself as the infrastructure layer between crypto and traditional finance.
Researchers adapted the Avey architecture to the encoder paradigm and called the result Avey-B, a next-generation alternative to BERT with unlimited context length.

Avey is an alternative architecture to Transformers, introduced last year.

It scales linearly with context length and performs better on long-context tasks (needle-in-a-haystack).

They have now shown that it works just as well in a BERT-style model.

This approach definitely deserves more attention.

HuggingFace.
GitHub
DoubleAI’s AI system beat a decade of expert GPU engineering

WarpSpeed just beat a decade of expert-engineered GPU kernels — every single one of them.

cuGraph is one of the most widely used GPU-accelerated libraries in the world. It spans dozens of graph algorithms, each written and continuously refined by some of the world’s top performance engineers.

DoubleAI’s WarpSpeed autonomously rewrote and re-optimized these kernels across three GPU architectures (A100, L4, A10G). DoubleAI released the hyper-optimized version on GitHub — install it with no change to your code.

The numbers:

• 3.6x average speedup over human experts
• 100% of kernels benefit from a speedup
• 55% see more than a 2x improvement

Winning Gold at IMO 2025.

Codeforces benchmarks.

From Reasoning to Super-Intelligence: A Search-Theoretic Perspective.
ByteDance published CUDA Agent

They trained a model that writes fast CUDA kernels. Not just correct ones, but actually optimized ones.

It beats torch.compile by 2× on simple/medium kernels, ~92% on complex ones, and even outperforms Claude Opus 4.5 and Gemini 3 Pro by ~40% on the hardest setting.

The key idea is simple but kind of brilliant:

CUDA performance isn’t about correctness, it’s about hardware. Warps, memory bandwidth, bank conflicts — the stuff you only see in a profiler.

So instead of rewarding “did it compile?”, they reward actual GPU speed. Real profiling numbers. RL trained directly on performance.
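A toy version of that reward design (illustrative only, not ByteDance's actual reward): correctness acts as a gate, and the signal the policy actually climbs is measured GPU speed.

```python
def kernel_reward(correct, baseline_ms, candidate_ms):
    """Toy RL reward in the spirit described above: correctness is a
    gate, not the objective. A kernel that merely compiles and matches
    the reference earns ~0; reward grows only with a real measured
    speedup over the baseline (profiled wall-clock times in ms)."""
    if not correct:
        return -1.0                     # failed the correctness check
    speedup = baseline_ms / candidate_ms
    return max(0.0, speedup - 1.0)      # pay only for actual speed
```

Because the reward comes from profiled runtimes, the policy is pushed toward hardware-level wins (warp occupancy, memory bandwidth, bank conflicts) rather than toward merely passing tests.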

Paper.
Alibaba's top AI researcher resigned immediately after Qwen's most successful model launch ever.

The day Junyang Lin announced his departure, Qwen released FOUR brand new models, including one that can run on just 7 gigs of RAM.

The models got rave reviews, including one from Elon Musk, who praised its "density of intelligence." The models were/are free to use.

Lin was seen as the most important developer at Qwen. He was also a big open source advocate. His departure led to speculation that he'd been forced out against his will. Chinese AI researchers You Jiacheng and Chen Cheng shared this view.

Why did this happen?

Some are saying it was a money thing. All of Alibaba's Qwen models until now have been completely open source, meaning that people can download them and run them locally, generating no revenue for Alibaba. Reportedly, company execs were frustrated that the open source models were not helping get users for Alibaba's revenue-generating services (e.g. Alibaba Cloud, subscription services etc).

Shortly before Lin quit, Alibaba had hired people who had worked on Google's Gemini, reportedly with an eye to increasing Daily Active Users (DAUs). If these reports are correct then we can expect that Alibaba will put more emphasis on monetizing its AI going forward. That should drive higher revenue, though it will likely mean the end of these powerful free Qwen models we've been seeing lately.

Word on the street is that Alibaba is tightening the screws to make money via proprietary cloud and API rather than open source.
Physical Intelligence built a memory system for its models and calls it Multi-Scale Embodied Memory (MEM).

It provides both short-term and long-term memory to enable very long tasks.

Researchers tested it on cleaning a kitchen (and yes, washing dishes), making grilled cheese, and more.

One of the cool side effects of MEM is in-context adaptation: when the robot makes a mistake, like opening the fridge door from the wrong side, it remembers what happened and tries the task again in a different way.