All about AI, Web 3.0, BCI
3.44K subscribers
739 photos
26 videos
161 files
3.22K links
This channel is about AI, Web 3.0, and brain-computer interfaces (BCI)

owner @Aniaslanyan
Anthropic dropped a new feature that lets you import your entire memory from ChatGPT, Gemini, etc. into Claude, so it instantly knows everything about you. No more reminding Claude who you are.
REMem: Reasoning with Episodic Memory in AI Agents

REMem addresses a capability gap in many RAG/memory systems: not just storing documents or facts, but also recollecting specific past events with their situational grounding (when/where/who/what) and then reasoning across multiple events on a timeline.
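To make the idea concrete, here is a toy sketch of an episodic store that keeps the situational grounding of each event and supports timeline queries. All names are illustrative, not REMem's actual API:

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical sketch of episodic memory in the spirit of REMem: each event
# keeps its when/where/who/what grounding, so an agent can later reason
# across multiple events in chronological order.
@dataclass
class Episode:
    when: datetime
    where: str
    who: list
    what: str

class EpisodicMemory:
    def __init__(self):
        self.episodes = []

    def record(self, episode: Episode):
        self.episodes.append(episode)

    def timeline(self, who: str):
        """Return one participant's episodes in chronological order."""
        hits = [e for e in self.episodes if who in e.who]
        return sorted(hits, key=lambda e: e.when)

mem = EpisodicMemory()
mem.record(Episode(datetime(2025, 3, 2), "office", ["Alice"], "filed the report"))
mem.record(Episode(datetime(2025, 3, 1), "lab", ["Alice", "Bob"], "ran the experiment"))
print([e.what for e in mem.timeline("Alice")])
# ['ran the experiment', 'filed the report']
```

Note that the events were recorded out of order, but the timeline query restores the correct sequence, which is the prerequisite for cross-event reasoning.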

GitHub.
Visa is leaning hard into agentic commerce and stablecoins.

• Agentic commerce live in the US & CEMEA, expanding globally

• Stablecoin cards now in 50+ countries

• USDC settlement live in the US

• $4.6B annualized stablecoin volume

Visa is positioning itself as the infrastructure layer between crypto and traditional finance.
Researchers adapted the Avey architecture to the encoder paradigm and called the result Avey-B, a next-generation alternative to BERT with unlimited context length.

Avey is an alternative architecture to Transformers from last year.

It scales linearly with context length and performs better on long-context tasks (e.g. needle-in-a-haystack retrieval).

They now showed that it works just as well in a BERT-style model.

This approach definitely deserves more attention.

HuggingFace.
GitHub
DoubleAI’s AI system beat a decade of expert GPU engineering

WarpSpeed just beat a decade of expert-engineered GPU kernels — every single one of them.

cuGraph is one of the most widely used GPU-accelerated libraries in the world. It spans dozens of graph algorithms, each written and continuously refined by some of the world’s top performance engineers.

DoubleAI’s WarpSpeed autonomously rewrote and re-optimized these kernels across three GPU architectures (A100, L4, A10G). DoubleAI released the hyper-optimized version on GitHub — install it with no change to your code.

The numbers:

• 3.6x average speedup over human experts

• 100% of kernels benefit from a speedup

• 55% see more than 2x improvement

Winning Gold at IMO 2025.

Codeforces benchmarks.

From Reasoning to Super-Intelligence: A Search-Theoretic Perspective.
ByteDance published CUDA Agent

It trained a model that writes fast CUDA kernels. Not just correct ones — actually optimized ones.

It beats torch.compile by 2x on simple and medium kernels, by ~92% on complex ones, and even outperforms Claude Opus 4.5 and Gemini 3 Pro by ~40% on the hardest setting.

The key idea is simple but kind of brilliant:

CUDA performance isn’t about correctness, it’s about hardware. Warps, memory bandwidth, bank conflicts — the stuff you only see in a profiler.

So instead of rewarding “did it compile?”, they reward actual GPU speed. Real profiling numbers. RL trained directly on performance.
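A minimal sketch of what such a speed-based reward could look like (my illustration, not ByteDance's code): the kernel earns nothing unless it is correct, and a correct kernel's reward is its measured speedup over a reference implementation.

```python
# Performance-based RL reward sketch: correctness is a gate, and the actual
# signal is the profiled speedup over a baseline (e.g. torch.compile).
def kernel_reward(correct: bool, baseline_ms: float, candidate_ms: float) -> float:
    if not correct or candidate_ms <= 0:
        return 0.0  # compiling is not enough; a wrong kernel earns nothing
    return baseline_ms / candidate_ms  # > 1.0 means faster than the baseline

# A correct kernel profiled at 0.5 ms against a 1.2 ms baseline:
print(kernel_reward(True, 1.2, 0.5))   # 2.4
print(kernel_reward(False, 1.2, 0.5))  # 0.0
```

The point of gating on correctness is that profiling numbers from a wrong kernel are meaningless, so the policy can never trade accuracy for speed.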

Paper.
Alibaba's top AI researcher resigned immediately after Qwen's most successful model launch ever.

The day Junyang Lin announced his departure, Qwen released FOUR brand new models, including one that can run on just 7 gigs of RAM.

The models got rave reviews, including one from Elon Musk, who praised their "density of intelligence." The models remain free to use.

Lin was seen as the most important developer at Qwen. He was also a big open source advocate. His departure led to speculation that he'd been forced out against his will. Chinese AI researchers You Jiacheng and Chen Cheng shared this view.

Why did this happen?

Some are saying it was a money thing. All of Alibaba's Qwen models until now have been completely open source, meaning that people can download them and run them locally, generating no revenue for Alibaba. Reportedly, company execs were frustrated that the open source models were not helping get users for Alibaba's revenue-generating services (e.g. Alibaba Cloud, subscription services etc).

Shortly before Lin quit, Alibaba had hired people who had worked on Google's Gemini, reportedly with an eye to increasing Daily Active Users (DAUs). If these reports are correct then we can expect that Alibaba will put more emphasis on monetizing its AI going forward. That should drive higher revenue, though it will likely mean the end of these powerful free Qwen models we've been seeing lately.

Word on the street is that Alibaba is tightening the screws to make money via proprietary cloud and API rather than open source.
Physical Intelligence built a memory system for its models and calls it Multi-Scale Embodied Memory (MEM).

It provides both short-term and long-term memory to enable very long tasks.

Researchers tested it on cleaning a kitchen (and yes, washing dishes), making grilled cheese, and more.

One of the cool side effects of MEM is in-context adaptation: when the robot makes a mistake, like opening the fridge door from the wrong side, it remembers what happened and tries the task again in a different way.
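A toy sketch of the two ideas together, short/long-term memory plus learning from mistakes (my illustration, not Physical Intelligence's code):

```python
from collections import deque

# Two memory scales: a small rolling buffer of recent observations for
# short-horizon context, and a persistent log of task outcomes that survives
# across attempts, so a failed strategy is not repeated.
class MultiScaleMemory:
    def __init__(self, short_window: int = 5):
        self.short_term = deque(maxlen=short_window)  # recent context only
        self.long_term = []                           # persists across attempts

    def observe(self, event: str):
        self.short_term.append(event)

    def record_outcome(self, task: str, strategy: str, success: bool):
        self.long_term.append((task, strategy, success))

    def failed_strategies(self, task: str):
        return {s for t, s, ok in self.long_term if t == task and not ok}

mem = MultiScaleMemory()
mem.observe("gripper at fridge handle")
mem.record_outcome("open fridge", "pull left edge", success=False)

# On retry, the agent filters out what it remembers failing:
retry = [s for s in ["pull left edge", "pull right edge"]
         if s not in mem.failed_strategies("open fridge")]
print(retry)   # ['pull right edge']
```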
MIT introduced NeuroSkill, a real-time agentic system that models human cognitive and emotional state by integrating Brain-Computer Interface signals with foundation models.

"Human State of Mind" provided via SKILL dot md.

The system runs fully offline on the edge.

Its NeuroLoop harness enables agentic workflows that engage users across cognitive and emotional levels, responding to both explicit and implicit requests through actionable tool calls.

Why does it matter?

Most AI agents respond only to explicit user requests. NeuroSkill explores the frontier of proactive agents that sense and respond to implicit human states, opening new possibilities for adaptive human-AI interaction.
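The proactive-agent idea can be sketched in a few lines. This is purely illustrative: the state scores stand in for decoded BCI signals, and the tool names are hypothetical, not NeuroSkill's API.

```python
# Map an estimated cognitive/emotional state to a proactive tool call,
# so the agent can act on implicit needs, not just explicit requests.
def choose_tool_call(state: dict):
    # Implicit request: high cognitive load -> simplify what is on screen.
    if state.get("cognitive_load", 0.0) > 0.8:
        return "simplify_ui()"
    # Implicit request: frustration -> offer help before being asked.
    if state.get("frustration", 0.0) > 0.7:
        return "offer_assistance()"
    return None  # no implicit need detected; wait for an explicit request

print(choose_tool_call({"cognitive_load": 0.9}))  # simplify_ui()
print(choose_tool_call({"frustration": 0.2}))     # None
```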
Meta introduced Beyond Language Modeling: a deep dive into the design space of truly native multimodal models

Humans communicate through language and interact with the world through vision, yet most multimodal models are language-first.

First: how should we represent vision?
RAE uses a frozen SigLIP encoder as the image tokenizer. One encoder for both understanding and generation. Biggest wins on semantic and world-knowledge generation, where VAEs lose critical information through compression.

Next: how do we route modalities? Hand-crafted separation (MoT) or data-driven (MoE)?

MoE wins. But the interesting part is the emergent routing behavior. Without any hard architectural split, the network learns vision experts, language experts, and multimodal experts on its own. It even routes understanding and generation tokens to the same experts.

Why are frontier models still dense? Maybe they're using the wrong tokenizer.
• VAE + MoE: no benefit from sparsity

• RAE + MoE: scales with sparsity
Semantic vision tokens unlock what MoE was designed to do.
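For intuition, here is a minimal learned-routing sketch (my illustration, not Meta's implementation): a linear gate scores each token against every expert, and each token goes to its top-scoring expert. There is no hard modality split anywhere, yet vision and text tokens can still end up at different experts if the gate learns to separate them.

```python
import numpy as np

# Top-1 MoE routing: a learned gate assigns each token to one expert.
rng = np.random.default_rng(0)
d_model, n_experts, n_tokens = 8, 4, 6

gate = rng.normal(size=(d_model, n_experts))    # router weights (learned)
tokens = rng.normal(size=(n_tokens, d_model))   # mixed vision/text tokens

logits = tokens @ gate                          # (n_tokens, n_experts) scores
assignment = logits.argmax(axis=1)              # best expert per token
print(assignment.shape)                         # one expert index per token
```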

Native multimodal pretraining enables zero-shot language-guided navigation. No navigation-specific data, no caption annotations. The grounding emerges from pretraining alone.
Microsoft released Phi-4-reasoning-vision-15B, a compact and fast multimodal reasoning model that blends the strengths of different methods while reducing their limitations

The new Phi multimodal model pushes the compute/performance Pareto frontier through careful design across many decisions, making it a good choice for edge applications.
Google introduced a new method to teach LLMs to reason like Bayesians.

By training models to mimic optimal probabilistic inference, the method improves their ability to update their predictions and generalize to new domains.
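For reference, "optimal probabilistic inference" here means ordinary Bayesian updating. A worked example (the numbers are mine, not Google's):

```python
# Bayes' rule: P(H | e) = P(e|H) P(H) / [P(e|H) P(H) + P(e|~H) P(~H)]
def posterior(prior: float, likelihood: float, false_pos: float) -> float:
    evidence = likelihood * prior + false_pos * (1 - prior)
    return likelihood * prior / evidence

# Prior belief 1%, a test with 90% sensitivity and a 5% false-positive rate:
print(round(posterior(0.01, 0.9, 0.05), 3))   # 0.154
```

Even a strongly positive signal only lifts a 1% prior to about 15%, which is exactly the kind of counterintuitive update LLMs tend to get wrong and this training targets.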
We are finally moving past the era of using separate vision encoders and generative models to build multimodal AI

SenseTime and NTU introduced NEO-unify, a native, unified, end-to-end paradigm.

Instead of using middleman tools to translate images, this model interacts directly with raw pixels and words: no vision encoder (VE), no VAE.

It uses a single, unified brain to handle both understanding and generation at the same time, learning everything from scratch without relying on pre-made vision components or traditional bottlenecks.

The system achieves pixel-level fidelity comparable to specialized tools like Flux VAE and shows much higher data-scaling efficiency than counterparts like Bagel, excelling in both image editing and complex visual reasoning.
Google just solved a theoretical physics problem using Gemini

Google, Harvard, and CMU built a neuro-symbolic system using the Gemini Deep Think model and a tree-search framework to autonomously discover complex mathematical proofs.

The agent functions like a digital scientist, testing different analytical paths and using numerical feedback to refine its logic until it finds an exact solution.

This approach successfully solved a major open problem regarding gravitational radiation from cosmic strings, outperforming previous AI attempts by delivering full analytical solutions where others only found partial approximations.
A new paper by Yann LeCun and other top researchers argues that chasing general AI is a mistake and that we should build superhuman adaptable specialists instead.

The whole AI industry is obsessed with building machines that can do absolutely everything humans can do.

But this goal is fundamentally flawed because humans are actually highly specialized creatures optimized only for physical survival.

Instead of trying to force one giant model to master every possible task from folding laundry to predicting protein structures, they suggest building expert systems that learn generic knowledge through self-supervised methods.

By using internal world models to understand how things work, these specialized systems can quickly adapt to solve complex problems that human brains simply cannot handle.

This shift means we can stop wasting computing power on human traits and focus on building diverse tools that actually solve hard real-world problems.

So overall the researchers here propose a new target called Superhuman Adaptable Intelligence which focuses strictly on how fast a system learns new skills.

The paper explicitly argues that evolution shaped human intelligence strictly as a specialized tool for physical survival.

The researchers state that nature optimized our brains specifically for tasks necessary to stay alive in the physical world.

They explain that abilities like walking or seeing seem incredibly general to us only because they are absolutely critical for our existence.

The authors point out that humans are actually terrible at cognitive tasks outside this evolutionary comfort zone, like calculating massive mathematical probabilities.

The study highlights how a chess grandmaster only looks intelligent compared to other humans, while modern computers easily crush those human limits.

This supports their central point that humanity suffers from an illusion of generality simply because we cannot perceive our own biological blind spots.

They conclude that building machines to mimic this narrow human survival toolkit is a deeply flawed way to create advanced technology.
Photos from Shenzhen: huge crowd of Chinese people (lots of grannies!) lining up to get help installing OpenClaw.

It looks like tech enthusiasm, but the real fuel is job market anxiety. China's employment pressure has been building since COVID. From DeepSeek in 2025 to OpenClaw now, the media keeps pushing one message: master this tool, land a better job.

And the tech companies? They have every incentive to amplify the hype. More OpenClaw buzz means more cloud compute demand and more model API calls. The frenzy sells their infrastructure.

What looks like grassroots adoption is actually grassroots career panic, turbocharged by companies with something to sell.