All about AI, Web 3.0, BCI
This channel is about AI, Web 3.0, and brain-computer interfaces (BCI)

owner @Aniaslanyan
Sneak peek at something coming soon to Claude. This could be a full-stack vibe-coding competitor to the likes of Lovable.

It’s been apparent for some time that Anthropic's consumer story would be vibe coding, as it's at the intersection of where they focus, what consumers want, and where enormous token subsidies tilt the board in their favor:

- coding agents, sensing this, have moved up the abstraction stack and smartly evolved into small business platforms, with payments, hosting, marketing, social and other sticky primitives around the model

- this is an industry, not a market, and in that world the "coding intelligence" primitive will be priced, packaged, productized, and delivered in a thousand ways for a thousand different customers.
Google presented Sparse Selective Caching, an architecture with growing effective memory (similar to attention) but with almost constant inference cost per token (similar to RNNs).

In the paper, the team mainly discusses:

1) the shared foundation for both softmax attention and fixed-size long-term memory modules (or RNNs) that helped design an architecture with the best of both worlds;

2) different variants of memory caching, including a variant whose effective memory is growing while the decoding cost still remains “constant”;

3) a unifying perspective to understand hybrid models, in which attention and recurrent models are combined.
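The core trade-off the paper targets can be sketched in a few lines. This is a toy illustration (the dimension `d=64` and the state-size formulas are illustrative assumptions, not the paper's architecture): attention keeps a KV cache that grows with sequence length, so per-token cost grows, while a recurrent model carries a fixed-size state, so per-token cost stays constant.

```python
def attention_state_size(t: int, d: int = 64) -> int:
    # KV cache: keys + values for every past token -> grows linearly with t.
    return 2 * t * d

def rnn_state_size(t: int, d: int = 64) -> int:
    # Recurrent state: one fixed-size vector regardless of sequence length.
    return d

# State carried into the next token at positions 1, 100, and 10,000:
sizes = [(t, attention_state_size(t), rnn_state_size(t)) for t in (1, 100, 10_000)]
```

Sparse Selective Caching, as described, aims for the left column's growing effective memory at close to the right column's constant decoding cost.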
Turns out we can get SOTA on agentic benchmarks with a simple test-time method

Meet LLM-as-a-Verifier

Test-time scaling is effective, but picking the "winner" among many candidates is the bottleneck. Here's a way to extract a cleaner signal from the model:

1. Ask the LLM to rank results on a scale of 1-k
2. Use the log-probs of those rank tokens to calculate an expected score

You can get a verification score in a single sampling pass per candidate pair.
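The two steps above can be sketched as follows. This is a minimal sketch assuming you already have the model's log-probs for each rank token at the verification position (the example log-prob values are made up); renormalizing over just the k rank tokens and taking the probability-weighted mean gives the expected score.

```python
import math

def expected_score(rank_logprobs: dict[int, float]) -> float:
    """Turn per-rank log-probs into an expected score.

    rank_logprobs: maps each rank (1..k) to the model's log-prob
    for that rank token at the verification position.
    """
    # Renormalize over the k rank tokens only (they rarely sum to 1).
    probs = {r: math.exp(lp) for r, lp in rank_logprobs.items()}
    total = sum(probs.values())
    # Expected score = sum_r r * P(rank = r)
    return sum(r * p / total for r, p in probs.items())

# Hypothetical log-probs from one sampling pass, ranks 1..5:
score = expected_score({1: -3.2, 2: -1.1, 3: -0.6, 4: -2.0, 5: -4.5})
```

Because the score comes from the log-probs of a single forward pass rather than from sampling many verdicts, one pass per candidate is enough.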

Code
Tether launches self-custodial wallet for end users

The wallet supports USDT, USAT, XAUT and Bitcoin across Ethereum, Polygon, Arbitrum, Plasma, and Bitcoin / Lightning Network, and enables transfers via human-readable usernames such as name@tether.me.

Tether said more than 570 million wallets were already using its technology as of March 2026.
LangChain released async subagents - kick off background tasks on any Agent Protocol backed server while you continue to interact with the main agent.

Async subagents will become more and more of a thing: as subagents run longer, you don't want to block the event loop.
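The non-blocking pattern is plain asyncio underneath. This is a generic sketch, not LangChain's actual API (the function names and the simulated delay are placeholders): the background subagent is started as a task, the main agent keeps responding, and the result is collected when it's ready.

```python
import asyncio

async def subagent_task(name: str, seconds: float) -> str:
    # Stand-in for a long-running subagent call on a remote server.
    await asyncio.sleep(seconds)
    return f"{name}: done"

async def main() -> list[str]:
    # Kick off the subagent in the background without blocking...
    background = asyncio.create_task(subagent_task("research-subagent", 0.1))
    # ...so the main agent keeps handling turns in the meantime.
    replies = ["main agent: still responsive"]
    replies.append(await background)  # collect the result when ready
    return replies

results = asyncio.run(main())
```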

Expanded multimodal support - your agent can now see images, listen to audio, watch video, and read PDFs. The read_file tool returns native content blocks, so your agent can reason across all these formats out of the box, unlocking a whole new set of workflows for your agents.

Improved prompt caching - better token efficiency and lower costs for Claude models.
Claude Code shipped routines

You tell it what to do, point it at your project, set a trigger, and it runs 24/7 on their servers with your laptop closed.

Docs.

The model is the commodity. The trigger is the product, and whoever maps the most valuable real-world events to the most specific industry workflows is going to build something massive.
Meet Motus, the open-source agent infrastructure that learns in production

Existing agent infra serves static agents: the harness, model, and workflow are fixed after deployment. But static agents degrade over time. The harness goes stale, new models go unincorporated, context drifts, and latency compounds.

Motus closes this gap by learning from every trace (failures, latency, cost, and task outcomes) and using those signals to continuously optimize agent harness, model orchestration, context memory, and end-to-end latency.
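One piece of that loop, routing by trace signals, can be sketched in miniature. This is a hypothetical illustration, not Motus internals (the trace schema, weights, and model names are invented): aggregate per-model success, latency, and cost from traces, then pick the model with the best combined score.

```python
from collections import defaultdict

# Hypothetical trace records: (model, success, latency_s, cost_usd)
traces = [
    ("model-a", True, 2.1, 0.004),
    ("model-a", False, 3.0, 0.004),
    ("model-b", True, 1.2, 0.009),
    ("model-b", True, 1.4, 0.009),
]

def route_score(success_rate: float, latency: float, cost: float,
                w_lat: float = 0.1, w_cost: float = 10.0) -> float:
    # Higher is better: reward task outcomes, penalize latency and cost.
    return success_rate - w_lat * latency - w_cost * cost

def pick_model(traces) -> str:
    stats = defaultdict(list)
    for model, ok, lat, cost in traces:
        stats[model].append((ok, lat, cost))
    best, best_score = None, float("-inf")
    for model, rows in stats.items():
        n = len(rows)
        sr = sum(ok for ok, _, _ in rows) / n
        lat = sum(l for _, l, _ in rows) / n
        cost = sum(c for _, _, c in rows) / n
        s = route_score(sr, lat, cost)
        if s > best_score:
            best, best_score = model, s
    return best
```

Re-running this on fresh traces is what makes the routing "learn in production": as outcomes shift, so does the choice.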

Early results: higher accuracy than any single frontier model at 2.3× lower cost (Terminal-Bench 2.0, SWE-bench Verified), with 52% lower latency and 45% better memory recall.

Open source under Apache 2.0. Works with any agent SDK. Deploy with one command.

GitHub.
Someone just dropped a fully liberated Gemma 4 E4B

But the real story here isn't the model itself, it's how it was made.

This was done (nearly) fully autonomously: one human, one agent, one skill, 8 prompts total.
The agent didn't just execute instructions. It diagnosed numerical instability in Gemma 4's new architecture, wrote three patches for a bug no one had hit before, iterated through four failed attempts and shipped a 17GB model to HuggingFace.

Without being asked.

Original Gemma 4: 98.8% refusal rate.
OBLITERATED: 2.1%.
Coding ability: +20%.
Coherence: fully intact.

What we're watching isn't a jailbreak story. It's a proof of concept for autonomous ML research. The agent ran evals, built a model card, and pushed commits: the full research cycle, compressed into one session.

The implications go beyond safety. When guardrail removal becomes an automated skill loadable from agent memory, the question is no longer technical. It's about how fast agentic tooling propagates and who has access to it first.

This is what open-source AI looks like in 2026.
Apple introduced Simple Self-Distillation: a fine-tuning method that improves models on coding tasks just by sampling from the model and training on its own outputs with plain cross-entropy

This paper literally came out 15 days ago, and it’s already integrated into TRL.

There’s a lot more to the distillation paradigm than meets the eye.
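The described loop, sample from the model, then train on those samples with plain cross-entropy, fits in a toy example. This is a minimal sketch, not Apple's method or the TRL integration (the logits, learning rate, and single-step setup are illustrative): one self-sample, one cross-entropy gradient step on raw logits.

```python
import math, random

def softmax(logits: list[float]) -> list[float]:
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def cross_entropy(logits: list[float], label: int) -> float:
    return -math.log(softmax(logits)[label])

random.seed(0)
logits = [0.5, 0.2, -0.1, 0.0]  # toy model output over a 4-token vocab

# 1) Sample from the model itself: no teacher, no external labels.
label = random.choices(range(4), weights=softmax(logits))[0]

# 2) Train on the sampled output with plain cross-entropy (one SGD step).
before = cross_entropy(logits, label)
probs = softmax(logits)
lr = 0.5
# Gradient of CE w.r.t. logits: p_i - 1[i == label]
grads = [p - (1.0 if i == label else 0.0) for i, p in enumerate(probs)]
logits = [l - lr * g for l, g in zip(logits, grads)]
after = cross_entropy(logits, label)
```

The step sharpens the model toward its own sample, which is the whole trick: no reward model, no preference pairs, just cross-entropy on self-generated outputs.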
Anthropic just introduced Claude Opus 4.7

It handles long-running tasks with more rigor, follows instructions more precisely, and verifies its own outputs before reporting back.

You can hand off your hardest work with less supervision.

Opus 4.7 also has substantially better vision. It can see images at more than three times the resolution and produces higher-quality interfaces, slides, and docs as a result.

On the API, a new xhigh effort level between high and max gives you finer control over reasoning and latency on hard problems. Task budgets (beta) help Claude prioritize work and manage costs across longer runs.

Opus 4.7 has a new tokenizer.
This means it's also a new base model.
Glory days of pretraining still very much going.


In Claude Code, the new /ultrareview command runs a dedicated review session that reads through your changes and flags what a careful reviewer would catch.

They also extended auto mode to Max users, so longer tasks run with fewer interruptions.
Nearly 1/3 of surveyed people at Anthropic now think entry-level engineers and researchers are likely to be replaced by Mythos within 3 months.
OpenAI introduced GPT-Rosalind, a frontier reasoning model built to support research across biology, drug discovery, and translational medicine.

GPT-Rosalind is optimized for scientific workflows, with stronger performance in protein and chemical reasoning, genomics analysis, biochemistry knowledge, and scientific tool use.
Cool work by Google. The team built an AI system that discovers health biomarkers from wearable data: CoDaS

One of its first findings: "late-night doomscrolling" is a statistically validated predictor of depression severity (ρ = 0.177, p < 0.001, n = 7,497).

The AI named the feature. No human guidance.

CoDaS is a multi-agent system that runs the full biomarker discovery lifecycle autonomously:

Sensor data → Generate hypotheses → Run statistical + ML analysis → Conduct adversarial validation → Write manuscript

The research team deployed it across 9,279 participants and 3 clinical cohorts.

Here's where it gets interesting.

On one cohort, CoDaS found a feature with R² = 0.963. Near-perfect prediction of insulin resistance, passing 10/11 tests.

Then the AI rejected it, finding the feature was glucose², a tautological transform of the target. True R² after removal: 0.389.
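The glucose² trap is easy to reproduce on synthetic data. This is a toy sketch, not CoDaS's validation pipeline (the data, the stand-in target, and the feature definitions are all invented): a feature that is a transform of the target scores a suspiciously high single-feature R², while an honest feature does not.

```python
import random

def r2_single_feature(x: list[float], y: list[float]) -> float:
    # R² of a one-feature linear fit equals the squared Pearson correlation.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)

random.seed(1)
glucose = [random.uniform(70, 180) for _ in range(200)]
# Toy target built from glucose plus noise, standing in for an
# insulin-resistance index that is itself computed from glucose.
target = [0.01 * g * g + random.gauss(0, 50) for g in glucose]

leaky = [g * g for g in glucose]            # glucose²: transform of the target
honest = [random.gauss(0, 1) for _ in glucose]  # unrelated feature

r2_leaky = r2_single_feature(leaky, target)
r2_honest = r2_single_feature(honest, target)
```

An adversarial validator that asks "is this feature a deterministic function of the target's inputs?" catches exactly this, which is what CoDaS's rejection step did.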

Researchers ran a blind expert evaluation. 15 domain experts, 76 manuscript assessments.

CoDaS: 86% acceptance rate
AI Co-Scientist: 85% rejection rate
Data Science Agent: 95% rejection rate
Biomni: 100% rejection rate

No baseline received a single Accept or Minor Revision.

A surprising result: CoDaS found circadian instability features in two separate depression cohorts.

Sleep duration variability in one (ρ = 0.252). Sleep onset variability in the other (ρ = 0.126).

The cohorts were processed completely independently.

CoDaS compressed ~37 person-days of research (expert estimate) into 6-8 hours.

But the point isn't speed. It's that separating exploration from adversarial validation at the architecture level produces biomarker candidates that domain experts rate as scientifically valid.