All about AI, Web 3.0, BCI
This channel is about AI, Web 3.0, and brain-computer interfaces (BCI)

owner @Aniaslanyan
Alibaba published a paper that shows AI is moving beyond bug finding and into actually proving software is exploitable.

This paper asks a simple question with hard consequences: can LLMs confirm software vulnerabilities by actually building working exploits?

The authors’ answer is yes, but only when the model stops acting like a single genius and starts acting like a team.
Meta presented a world model that models the computer
What if AI could invent enzymes that nature hasn’t seen? Meet DISCO: Diffusion for Sequence-structure CO-design

14 rounds of directed evolution and over a year of wet lab work.

That's what it took to engineer an enzyme for selective C(sp³)–H insertion, one of the most challenging transformations in organic chemistry.

DISCO surpasses this with a single plate. No pre-specified catalytic residues, no template, no theozyme, no inverse folding, just joint diffusion over protein sequence and structure.

Paper
Code
Tencent released HY-Embodied-0.5, a family of foundation models for real-world embodied agents. The 2B model is now open source.

The suite includes:

2B for edge deployment
32B for complex reasoning

Key innovations:
1. Mixture-of-Transformers (MoT) architecture for modality-specific computation
2. Latent tokens for improved perceptual representation
3. Self-evolving post-training
4. On-policy distillation from large to small models

Across 22 benchmarks, the 2B model outperforms similarly sized SOTA systems on 16 of them. The 32B model approaches frontier-level performance.

GitHub
Hugging Face
Polygon Labs is in early talks to raise up to $100 million to launch a new stablecoin payments business, according to sources.

It's rare for a blockchain developer to enter the regulated payments business. With this move, Polygon hopes to drive stablecoin volume on its blockchain.

In January, Polygon Labs agreed to acquire Coinme and Sequence, positioning itself to compete with the likes of Stripe
Cool work. R-Zero - self-evolving LLM from zero external data.

One base model, two roles:

1. Challenger generates hard problems

2. Solver solves them.

The Challenger is rewarded when the Solver fails. The two co-evolve with GRPO. The Challenger learns to probe for weaknesses, not just generate hard problems.

+6.49 math, +7.54 general reasoning on Qwen3-4B-Base. 3 iterations, no human data.
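The Challenger's incentive can be sketched as a reward that peaks when problems sit at the edge of the Solver's ability. A minimal sketch, assuming an uncertainty-style shaping (the paper's exact reward may differ):

```python
def challenger_reward(solver_correct: list[bool]) -> float:
    """Toy Challenger reward, assuming an uncertainty-style shaping:
    highest when the Solver succeeds about half the time, so generated
    problems probe the edge of its ability. Illustrative only; not
    necessarily R-Zero's exact formula."""
    p = sum(solver_correct) / len(solver_correct)  # Solver success rate
    return 1.0 - 2.0 * abs(p - 0.5)  # peaks at p = 0.5, zero at 0 or 1
```

A problem the Solver always solves (or always fails) scores 0, while a 50/50 problem scores 1, which pushes the Challenger toward genuinely informative questions rather than unsolvable ones.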
China defined what an AI Hospital is

It is a new type of smart healthcare model in which AI is embedded into the system itself, linking offline medical expertise with the broader reach of online services to deliver more proactive and continuous care.

Patients become the point-of-care with the help of AI.
Sneak peek at something coming soon to Claude. This could be a full-stack vibe-coding competitor to the likes of Lovable.

It’s been apparent for some time that Anthropic's consumer story would be vibe coding, as it sits at the intersection of where they focus, what consumers want, and where enormous token subsidies tilt the board in their favor:

- coding agents, sensing this, have moved up the abstraction stack and smartly evolved into small business platforms, with payments, hosting, marketing, social and other sticky primitives around the model

- this is an industry, not a market, and in that world the "coding intelligence" primitive will be priced, packaged, productized, and delivered in a thousand ways for a thousand different customers.
Google presented Sparse Selective Caching, an architecture with growing effective memory (similar to attention) but with almost constant inference cost per token (similar to RNNs).

In the paper, the team mainly discusses:

1) the shared foundation for both softmax attention and fixed-size long-term memory modules (or RNNs), which helped design an architecture with the best of both worlds;

2) different variants of memory caching, including a variant whose effective memory grows while the decoding cost remains “constant”;

3) a unifying perspective to understand hybrid models, in which attention and recurrent models are combined.
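To make point 2 concrete, here is a toy, assumption-laden sketch (not the paper's actual mechanism) of a fixed-capacity cache that admits entries selectively: per-token cost depends on the capacity, not on sequence length, while the cache tries to keep the most useful entries:

```python
class SelectiveCache:
    """Toy fixed-capacity cache illustrating the trade-off in the post:
    RNN-like constant per-token cost, attention-like selectivity about
    what gets remembered. Scoring and eviction rules are illustrative."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = []  # list of (score, key, value)

    def write(self, score: float, key, value):
        self.entries.append((score, key, value))
        if len(self.entries) > self.capacity:
            # Evict the lowest-scoring entry. Cost is O(capacity)
            # per token, independent of how long the sequence is.
            self.entries.remove(min(self.entries, key=lambda e: e[0]))

    def read(self):
        return [(k, v) for _, k, v in self.entries]
```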
Turns out we can get SOTA on agentic benchmarks with a simple test-time method

Meet
LLM-as-a-Verifier

Test-time scaling is effective, but picking the "winner" among many candidates is the bottleneck. The approach extracts a cleaner signal from the model:

1. Ask the LLM to rank results on a scale of 1-k
2. Use the log-probs of those rank tokens to calculate an expected score

You can get a verification score in a single sampling pass per candidate pair.
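The expected-score computation in step 2 can be sketched as follows; `rank_logprobs[i]` is assumed to hold the log-prob the verifier assigned to rank token `i + 1` (the exact extraction from the API response is implementation-specific):

```python
import math

def expected_rank_score(rank_logprobs: list[float]) -> float:
    """Turn the log-probs of the k rank tokens (1..k) from one
    verifier pass into a probability-weighted expected rank."""
    # Softmax over the k rank-token log-probs (max-subtracted for
    # numerical stability).
    m = max(rank_logprobs)
    exps = [math.exp(lp - m) for lp in rank_logprobs]
    z = sum(exps)
    # Expected score: sum over ranks of rank * P(rank).
    return sum((i + 1) * e / z for i, e in enumerate(exps))
```

Because the score is an expectation over the full rank distribution rather than the single argmax token, two candidates that both get ranked "3" can still be separated by how much probability mass sits on the neighboring ranks.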

Code
Tether launches self-custodial wallet for end users

The wallet supports USDT, USAT, XAUT and Bitcoin across Ethereum, Polygon, Arbitrum, Plasma, and Bitcoin / Lightning Network, and enables transfers via human-readable usernames such as name@tether.me.

Tether said more than 570 million wallets were already using its technology as of March 2026.
LangChain released async subagents - kick off background tasks on any Agent Protocol backed server while you continue to interact with the main agent.

Async subagents will become more and more of a thing as subagents run longer and you don’t want to block the event loop.

Expanded multimodal support - your agent can now see images, listen to audio, watch video, and read PDFs. The read_file tool returns native content blocks, so your agent can reason across all these formats out of the box, unlocking a whole new set of workflows for your agents.

Improved prompt caching - better token efficiency and lower costs for Claude models.
Claude Code shipped routines

You tell it what to do, point it at your project, set a trigger, and it runs 24/7 on their servers with your laptop closed.

Docs.

The model is the commodity. The trigger is the product, and whoever maps the most valuable real-world events to the most specific industry workflows is going to build something massive.
Meet Motus, the open-source agent infrastructure that learns in production

Existing agent infra serves static agents: the harness, model, and workflow are fixed after deployment. But static agents degrade over time. The harness goes stale, new models go unincorporated, context drifts, and latency compounds.

Motus closes this gap by learning from every trace (failures, latency, cost, and task outcomes) and using those signals to continuously optimize agent harness, model orchestration, context memory, and end-to-end latency.

Early results: higher accuracy than any single frontier model at 2.3× lower cost (Terminal-Bench 2.0, SWE-bench Verified), with 52% lower latency and 45% better memory recall.

Open source under Apache 2.0. Works with any agent SDK. Deploy with one command.

GitHub.
Someone just dropped a fully liberated Gemma 4 E4B

But the real story here isn't the model itself, it's how it was made.

This was done (nearly) fully autonomously: one human, one agent, one skill, 8 prompts total.
The agent didn't just execute instructions. It diagnosed numerical instability in Gemma 4's new architecture, wrote three patches for a bug no one had hit before, iterated through four failed attempts and shipped a 17GB model to HuggingFace.

Without being asked.

Original Gemma 4: 98.8% refusal rate.
OBLITERATED: 2.1%.
Coding ability: +20%.
Coherence: fully intact.

What we're watching isn't a jailbreak story. It's a proof of concept for autonomous ML research. The agent ran evals, built a model card, and pushed commits: the full research cycle, compressed into one session.

The implications go beyond safety. When guardrail removal becomes an automated skill loadable from agent memory, the question is no longer technical. It's about how fast agentic tooling propagates and who has access to it first.

This is what open-source AI looks like in 2026.
Apple introduced Simple Self-Distillation: a fine-tuning method that improves models on coding tasks just by sampling from the model and training on its own outputs with plain cross-entropy

This paper literally came out 15 days ago, and it’s already integrated into TRL.

There’s a lot more to the distillation paradigm than meets the eye.
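In pseudocode, the loop described above is roughly the following; `sample` and `train_step` are hypothetical stand-ins for the real sampling and cross-entropy update (the actual TRL integration will look different):

```python
def self_distill(model, prompts, n_samples=4, steps=1):
    """Sketch of simple self-distillation: sample completions from the
    model itself, then fine-tune on those outputs with plain
    cross-entropy. `model.sample` and `model.train_step` are assumed
    interfaces, not a real library API."""
    data = []
    for p in prompts:
        for _ in range(n_samples):
            # The training data is the model's own outputs.
            data.append((p, model.sample(p)))
    for _ in range(steps):
        for p, y in data:
            model.train_step(p, y)  # plain cross-entropy on (p, y)
    return model
```

The notable part is what's absent: no reward model, no preference pairs, no teacher; the only supervision signal is the model's own sampled completions.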
Anthropic just introduced Claude Opus 4.7

It handles long-running tasks with more rigor, follows instructions more precisely, and verifies its own outputs before reporting back.

You can hand off your hardest work with less supervision.

Opus 4.7 also has substantially better vision. It can see images at more than three times the resolution and produces higher-quality interfaces, slides, and docs as a result.

On the API, a new xhigh effort level between high and max gives you finer control over reasoning and latency on hard problems. Task budgets (beta) help Claude prioritize work and manage costs across longer runs.

Opus 4.7 has a new tokenizer.
This means it's also a new base model.
The glory days of pretraining are still very much going.


In Claude Code, the new /ultrareview command runs a dedicated review session that reads through your changes and flags what a careful reviewer would catch.

Also extended auto mode to Max users, so longer tasks run with fewer interruptions.