Engineer Readings
[#AI #AICoding #AIAgents #AutonomousAgents #MultiAgentSystems #Cursor]

Scaling AI Coding: Lessons From Running Hundreds of Agents


Cursor shares how they pushed the limits of AI by running hundreds of autonomous coding agents at the same time on real software projects.

Instead of short tasks, these agents worked for weeks, edited shared codebases, and even helped build complex products like a web browser.

The biggest lesson?
Uncoordinated agents create chaos — but a planner + worker system keeps them aligned, focused, and productive over long periods.

The article shows that with the right structure, AI teams can tackle massive engineering challenges, similar to real human teams — and we’re just getting started.

🔗 Read more: https://cursor.com/blog/scaling-agents
🔥2
[#AI #context #Manus]

Context Engineering for AI Agents: Lessons from Building Manus

The Manus team shares key insights from building their AI agent system, focusing on context engineering rather than training custom models. The article covers critical strategies like designing around KV-cache for better performance, using the filesystem as unlimited context storage, and keeping error traces to help agents learn from mistakes.
Key takeaways: maximize cache hit rates by keeping prompts stable, mask tools instead of removing them to maintain context integrity, and leverage the filesystem for persistent memory beyond token limits.
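
Two of those takeaways can be shown in a toy sketch (none of this is Manus code; the tool names and structure are invented): an append-only context keeps the cached prefix stable, and tools are masked at selection time rather than removed from the prompt.

```python
# Stable prefix: editing anything here would invalidate the KV-cache.
SYSTEM = "You are an agent.\nTools: browse, shell, write_file\n"

class Context:
    """Append-only context: past events are never edited or reordered."""
    def __init__(self):
        self.events: list[str] = []
    def append(self, event: str) -> None:
        self.events.append(event)
    def render(self) -> str:
        return SYSTEM + "".join(self.events)

def cached_prefix_len(old: str, new: str) -> int:
    """Length of the shared prefix a KV-cache could reuse."""
    n = 0
    for a, b in zip(old, new):
        if a != b:
            break
        n += 1
    return n

def allowed_tools(all_tools: list[str], state: dict) -> list[str]:
    # Masking: the tool list in the prompt stays fixed; we only
    # restrict which tools may be *chosen* at this step.
    return [t for t in all_tools if t not in state.get("blocked", ())]
```

Because each turn only appends, the entire previous render is a cache hit; removing a tool from `SYSTEM` instead would force a re-prefill of everything after it.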

🔗 Read more: https://manus.im/de/blog/Context-Engineering-for-AI-Agents-Lessons-from-Building-Manus
[#AI #MachineLearning #LLM #AIInference #Hardware #Groq #LPU #AIAccelerators #DeepLearning #TechInnovation #ComputerArchitecture #AIHardware]

How Groq's LPU Achieves Blazing AI Inference Speed

Ever wondered how Groq runs a 1-trillion-parameter model like Kimi K2 in real-time? Their Language Processing Unit (LPU) is rewriting the rules of AI inference.

Key Innovations:
TruePoint Numerics – Strategic precision where it matters. 100 bits of intermediate accumulation enable 2-4× speedup over BF16 with zero accuracy loss. FP32 for critical operations, FP8 for error-tolerant layers.
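
The storage-vs-accumulation split can be demonstrated with FP16/FP32 as stand-ins, since NumPy has no FP8 type or 100-bit accumulator. This illustrates mixed-precision accumulation in general, not Groq's TruePoint implementation:

```python
import numpy as np

# 10,000 small values stored in low precision (FP16 standing in for FP8).
x = np.full(10_000, 1e-4, dtype=np.float16)

def dot_accumulate(values, acc_dtype):
    acc = acc_dtype(0.0)
    for v in values:
        acc = acc_dtype(acc + v)  # round to accumulator precision each step
    return float(acc)

low  = dot_accumulate(x, np.float16)  # stalls once the accumulator's ulp > 1e-4
high = dot_accumulate(x, np.float32)  # stays close to the true sum of ~1.0
```

The low-precision accumulator silently stops growing once each addend falls below half an ulp, while the wide accumulator keeps every contribution, which is why the accumulation width, not the storage width, decides accuracy.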

SRAM-First Architecture – Hundreds of megabytes of on-chip SRAM as primary storage (not cache). Traditional GPUs suffer from HBM latency (hundreds of nanoseconds); LPU eliminates the wait with instant weight access.

Static Scheduling – The compiler pre-computes the entire execution graph down to individual clock cycles. No cache coherency protocols, no runtime delays. Deterministic execution enables tensor parallelism without tail latency.

Tensor Parallelism – Unlike GPUs that scale throughput via data parallelism, LPUs distribute single operations across chips to reduce latency. This is why trillion-parameter models generate tokens in real-time.
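
The contrast can be shown with a toy NumPy matmul: tensor parallelism shards a single weight matrix across "chips" so one token's computation runs on all of them at once (the interconnect and hardware details are of course not modeled here):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 512))      # one token's activations
W = rng.standard_normal((512, 1024))   # a single layer's weight matrix

n_chips = 4
shards = np.split(W, n_chips, axis=1)  # each "chip" holds a column slice

# Each chip computes its slice of the output in parallel; the results
# are concatenated (an all-gather on real hardware).
partials = [x @ w for w in shards]
y_parallel = np.concatenate(partials, axis=1)
```

Data parallelism would instead give each chip different tokens against the full `W`, which raises throughput but leaves per-token latency unchanged.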

RealScale Interconnect – Plesiosynchronous chip-to-chip protocol aligns hundreds of LPUs to act as a single core. The compiler schedules both compute AND network timing.

The results? A first-gen LPU on a 14nm process delivers 40× performance improvements, and MMLU benchmarks show strong accuracy with no quality degradation.
Groq isn't optimizing around the edges: they rebuilt inference from the ground up for speed, scale, and efficiency.

🔗 Read the full technical breakdown: https://groq.com/blog/inside-the-lpu-deconstructing-groq-speed
[#llm #debugging]
It's refreshing to see articles and stories like this, where engineers actually dig into debugging complex problems instead of riding the hype that "an LLM can do everything for me."

https://mistral.ai/news/debugging-memory-leak-in-vllm
[research][google deepmind][llm][agents]
“AI agents are able to tackle increasingly complex tasks. To achieve more ambitious goals, AI agents need to be able to meaningfully decompose problems into manageable sub-components, and safely delegate their completion across to other AI agents and humans alike. Yet, existing task decomposition and delegation methods rely on simple heuristics, and are not able to dynamically adapt to environmental changes and robustly handle unexpected failures. Here we propose an adaptive framework for intelligent AI delegation - a sequence of decisions involving task allocation, that also incorporates transfer of authority, responsibility, accountability, clear specifications regarding roles and boundaries, clarity of intent, and mechanisms for establishing trust between the two (or more) parties. The proposed framework is applicable to both human and AI delegators and delegatees in complex delegation networks, aiming to inform the development of protocols in the emerging agentic web.”

https://arxiv.org/pdf/2602.11865
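
As a rough sketch of what such a delegation record might contain (the field names are my own invention, not the paper's):

```python
from dataclasses import dataclass

@dataclass
class Delegation:
    """One delegation decision: task allocation plus the explicit
    authority, responsibility, accountability, boundaries, intent,
    and trust signal the abstract calls for."""
    task: str
    delegator: str
    delegatee: str
    authority: list[str]      # actions the delegatee may take
    responsibility: str       # what "done" means
    accountable_to: str       # who answers for the outcome
    boundaries: list[str]     # explicitly out-of-scope actions
    intent: str               # why the task exists, so plans can adapt
    trust: float = 0.5        # updated from past outcomes

    def within_bounds(self, action: str) -> bool:
        return action in self.authority and action not in self.boundaries
```

The point of making authority and boundaries explicit data, rather than prompt text, is that a runtime can enforce them and audit every hand-off.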
[llm][research]

“We show that large language models can deanonymize users at scale. With internet access, our agent can re-identify pseudonymous Hacker News and Anthropic Interviewer users with high precision—matching hours of human investigation.

In a closed-world setting, we build a scalable LLM pipeline that:
1. extracts identity clues from raw text,
2. finds candidate matches via semantic search, and
3. verifies matches to reduce false positives.

Unlike prior work requiring structured data, our method works directly on unstructured content across platforms.

Across three datasets (HN→LinkedIn, Reddit→Reddit communities, and split Reddit histories), LLM methods vastly outperform classical baselines—up to 68% recall at 90% precision vs. near 0% for non-LLM approaches.

Bottom line: pseudonymity online is far more fragile than assumed, and privacy threat models need updating.”

https://arxiv.org/pdf/2602.16800
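
The three-step pipeline can be sketched with a toy bag-of-words similarity standing in for LLM clue extraction and embedding search (all names, profiles, and thresholds below are invented):

```python
import math
from collections import Counter

def extract_clues(text: str) -> Counter:
    """Step 1: 'extract identity clues' — here, just salient words."""
    words = [w.lower().strip(".,") for w in text.split() if len(w) > 4]
    return Counter(words)

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def find_candidates(clues: Counter, profiles: dict, k: int = 2) -> list[str]:
    """Step 2: 'semantic search' — rank candidate profiles by similarity."""
    ranked = sorted(profiles, key=lambda n: cosine(clues, extract_clues(profiles[n])),
                    reverse=True)
    return ranked[:k]

def verify(clues: Counter, profile_text: str, threshold: float = 0.3) -> bool:
    """Step 3: a stricter verification pass to cut false positives."""
    return cosine(clues, extract_clues(profile_text)) >= threshold

profiles = {
    "alice": "Writes about compiler internals and rust borrow checker",
    "bob": "Posts photos of sourdough bread and marathon training",
}
clues = extract_clues("Pseudonymous account discussing compiler internals in rust")
match = [n for n in find_candidates(clues, profiles) if verify(clues, profiles[n])]
```

The real system replaces each toy step with an LLM, but the shape is the same: recall comes from the broad candidate search, and precision from the separate verification pass.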
👍1🤔1
[ai][layoff][paper]
“If AI displaces human workers faster than the economy can reabsorb them, it risks eroding the very consumer demand firms depend on. We show that knowing this is not enough for firms to stop it. In a competitive task-based model, demand externalities trap rational firms in an automation arms race, displacing workers well beyond what is collectively optimal. The resulting loss harms both workers and firm owners. More competition and “better” AI amplify the excess; wage adjustments and free entry cannot eliminate it. Neither can capital income taxes, worker equity participation, universal basic income, upskilling, or Coasian bargaining. Only a Pigouvian automation tax can. The results suggest that policy should address not only the aftermath of AI labor displacement but also the competitive incentives that drive it.”


https://arxiv.org/html/2603.20617v1