Engineer Readings
[#AI #AICoding #AIAgents #AutonomousAgents #MultiAgentSystems #Cursor]

Scaling AI Coding: Lessons From Running Hundreds of Agents


Cursor shares how they pushed the limits of AI by running hundreds of autonomous coding agents at the same time on real software projects.

Instead of short tasks, these agents worked for weeks, edited shared codebases, and even helped build complex products like a web browser.

The biggest lesson?
Uncoordinated agents create chaos — but a planner + worker system keeps them aligned, focused, and productive over long periods.

The article shows that with the right structure, AI teams can tackle massive engineering challenges, similar to real human teams — and we’re just getting started.

🔗 Read more: https://cursor.com/blog/scaling-agents
🔥2
[#AI #context #Manus]

Context Engineering for AI Agents: Lessons from Building Manus

The Manus team shares key insights from building their AI agent system, focusing on context engineering rather than training custom models. The article covers critical strategies like designing around KV-cache for better performance, using the filesystem as unlimited context storage, and keeping error traces to help agents learn from mistakes.
Key takeaways: maximize cache hit rates by keeping prompts stable, mask tools instead of removing them to maintain context integrity, and leverage the filesystem for persistent memory beyond token limits.
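
Two of those takeaways can be shown in a toy sketch (none of this is Manus code; the tool names and structure are invented): an append-only context keeps the cached prefix stable, and tools are masked at selection time rather than removed from the prompt.

```python
# Stable prefix: editing anything here would invalidate the KV-cache.
SYSTEM = "You are an agent.\nTools: browse, shell, write_file\n"

class Context:
    """Append-only context: past events are never edited or reordered."""
    def __init__(self):
        self.events: list[str] = []
    def append(self, event: str) -> None:
        self.events.append(event)
    def render(self) -> str:
        return SYSTEM + "".join(self.events)

def cached_prefix_len(old: str, new: str) -> int:
    """Length of the shared prefix a KV-cache could reuse."""
    n = 0
    for a, b in zip(old, new):
        if a != b:
            break
        n += 1
    return n

def allowed_tools(all_tools: list[str], state: dict) -> list[str]:
    # Masking: the tool list in the prompt stays fixed; we only
    # restrict which tools may be *chosen* at this step.
    return [t for t in all_tools if t not in state.get("blocked", ())]
```

Because each turn only appends, the entire previous render is a cache hit; removing a tool from `SYSTEM` instead would force a re-prefill of everything after it.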

🔗 Read more: https://manus.im/de/blog/Context-Engineering-for-AI-Agents-Lessons-from-Building-Manus
[#AI #MachineLearning #LLM #AIInference #Hardware #Groq #LPU #AIAccelerators #DeepLearning #TechInnovation #ComputerArchitecture #AIHardware]

How Groq's LPU Achieves Blazing AI Inference Speed

Ever wondered how Groq runs a 1-trillion-parameter model like Kimi K2 in real-time? Their Language Processing Unit (LPU) is rewriting the rules of AI inference.

Key Innovations:
TruePoint Numerics – Strategic precision where it matters. 100 bits of intermediate accumulation enable 2-4× speedup over BF16 with zero accuracy loss. FP32 for critical operations, FP8 for error-tolerant layers.
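
The storage-vs-accumulation split can be demonstrated with FP16/FP32 as stand-ins, since NumPy has no FP8 type or 100-bit accumulator. This illustrates mixed-precision accumulation in general, not Groq's TruePoint implementation:

```python
import numpy as np

# 10,000 small values stored in low precision (FP16 standing in for FP8).
x = np.full(10_000, 1e-4, dtype=np.float16)

def dot_accumulate(values, acc_dtype):
    acc = acc_dtype(0.0)
    for v in values:
        acc = acc_dtype(acc + v)  # round to accumulator precision each step
    return float(acc)

low  = dot_accumulate(x, np.float16)  # stalls once the accumulator's ulp > 1e-4
high = dot_accumulate(x, np.float32)  # stays close to the true sum of ~1.0
```

The low-precision accumulator silently stops growing once each addend falls below half an ulp, while the wide accumulator keeps every contribution, which is why the accumulation width, not the storage width, decides accuracy.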

SRAM-First Architecture – Hundreds of megabytes of on-chip SRAM as primary storage (not cache). Traditional GPUs suffer from HBM latency (hundreds of nanoseconds); LPU eliminates the wait with instant weight access.

Static Scheduling – The compiler pre-computes the entire execution graph down to individual clock cycles. No cache coherency protocols, no runtime delays. Deterministic execution enables tensor parallelism without tail latency.

Tensor Parallelism – Unlike GPUs that scale throughput via data parallelism, LPUs distribute single operations across chips to reduce latency. This is why trillion-parameter models generate tokens in real-time.
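
The contrast can be shown with a toy NumPy matmul: tensor parallelism shards a single weight matrix across "chips" so one token's computation runs on all of them at once (the interconnect and hardware details are of course not modeled here):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 512))      # one token's activations
W = rng.standard_normal((512, 1024))   # a single layer's weight matrix

n_chips = 4
shards = np.split(W, n_chips, axis=1)  # each "chip" holds a column slice

# Each chip computes its slice of the output in parallel; the results
# are concatenated (an all-gather on real hardware).
partials = [x @ w for w in shards]
y_parallel = np.concatenate(partials, axis=1)
```

Data parallelism would instead give each chip different tokens against the full `W`, which raises throughput but leaves per-token latency unchanged.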

RealScale Interconnect – Plesiosynchronous chip-to-chip protocol aligns hundreds of LPUs to act as a single core. The compiler schedules both compute AND network timing.

The results? A first-gen LPU on a 14nm process delivers 40× performance improvements, and MMLU benchmarks show strong accuracy with no quality degradation.
Groq isn't optimizing around the edges: they rebuilt inference from the ground up for speed, scale, and efficiency.

🔗 Read the full technical breakdown: https://groq.com/blog/inside-the-lpu-deconstructing-groq-speed
[#llm #debugging]
It's refreshing to see articles and stories like this, where engineers actually dig into debugging complex problems instead of riding the hype that "an LLM can do everything for me."

https://mistral.ai/news/debugging-memory-leak-in-vllm
[research][google deepmind][llm][agents]
“AI agents are able to tackle increasingly complex tasks. To achieve more ambitious goals, AI agents need to be able to meaningfully decompose problems into manageable sub-components, and safely delegate their completion across to other AI agents and humans alike. Yet, existing task decomposition and delegation methods rely on simple heuristics, and are not able to dynamically adapt to environmental changes and robustly handle unexpected failures. Here we propose an adaptive framework for intelligent AI delegation - a sequence of decisions involving task allocation, that also incorporates transfer of authority, responsibility, accountability, clear specifications regarding roles and boundaries, clarity of intent, and mechanisms for establishing trust between the two (or more) parties. The proposed framework is applicable to both human and AI delegators and delegatees in complex delegation networks, aiming to inform the development of protocols in the emerging agentic web.”

https://arxiv.org/pdf/2602.11865
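
As a rough sketch of what such a delegation record might contain (the field names are my own invention, not the paper's):

```python
from dataclasses import dataclass

@dataclass
class Delegation:
    """One delegation decision: task allocation plus the explicit
    authority, responsibility, accountability, boundaries, intent,
    and trust signal the abstract calls for."""
    task: str
    delegator: str
    delegatee: str
    authority: list[str]      # actions the delegatee may take
    responsibility: str       # what "done" means
    accountable_to: str       # who answers for the outcome
    boundaries: list[str]     # explicitly out-of-scope actions
    intent: str               # why the task exists, so plans can adapt
    trust: float = 0.5        # updated from past outcomes

    def within_bounds(self, action: str) -> bool:
        return action in self.authority and action not in self.boundaries
```

The point of making authority and boundaries explicit data, rather than prompt text, is that a runtime can enforce them and audit every hand-off.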
[llm][research]

“We show that large language models can deanonymize users at scale. With internet access, our agent can re-identify pseudonymous Hacker News and Anthropic Interviewer users with high precision—matching hours of human investigation.

In a closed-world setting, we build a scalable LLM pipeline that:
1. extracts identity clues from raw text,
2. finds candidate matches via semantic search, and
3. verifies matches to reduce false positives.

Unlike prior work requiring structured data, our method works directly on unstructured content across platforms.

Across three datasets (HN→LinkedIn, Reddit→Reddit communities, and split Reddit histories), LLM methods vastly outperform classical baselines—up to 68% recall at 90% precision vs. near 0% for non-LLM approaches.

Bottom line: pseudonymity online is far more fragile than assumed, and privacy threat models need updating.”

https://arxiv.org/pdf/2602.16800
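
The three-step pipeline can be sketched with a toy bag-of-words similarity standing in for LLM clue extraction and embedding search (all names, profiles, and thresholds below are invented):

```python
import math
from collections import Counter

def extract_clues(text: str) -> Counter:
    """Step 1: 'extract identity clues' — here, just salient words."""
    words = [w.lower().strip(".,") for w in text.split() if len(w) > 4]
    return Counter(words)

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def find_candidates(clues: Counter, profiles: dict, k: int = 2) -> list[str]:
    """Step 2: 'semantic search' — rank candidate profiles by similarity."""
    ranked = sorted(profiles, key=lambda n: cosine(clues, extract_clues(profiles[n])),
                    reverse=True)
    return ranked[:k]

def verify(clues: Counter, profile_text: str, threshold: float = 0.3) -> bool:
    """Step 3: a stricter verification pass to cut false positives."""
    return cosine(clues, extract_clues(profile_text)) >= threshold

profiles = {
    "alice": "Writes about compiler internals and rust borrow checker",
    "bob": "Posts photos of sourdough bread and marathon training",
}
clues = extract_clues("Pseudonymous account discussing compiler internals in rust")
match = [n for n in find_candidates(clues, profiles) if verify(clues, profiles[n])]
```

The real system replaces each toy step with an LLM, but the shape is the same: recall comes from the broad candidate search, and precision from the separate verification pass.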
👍1🤔1
[ai][layoff][paper]
“If AI displaces human workers faster than the economy can reabsorb them, it risks eroding the very consumer demand firms depend on. We show that knowing this is not enough for firms to stop it. In a competitive task-based model, demand externalities trap rational firms in an automation arms race, displacing workers well beyond what is collectively optimal. The resulting loss harms both workers and firm owners. More competition and “better” AI amplify the excess; wage adjustments and free entry cannot eliminate it. Neither can capital income taxes, worker equity participation, universal basic income, upskilling, or Coasian bargaining. Only a Pigouvian automation tax can. The results suggest that policy should address not only the aftermath of AI labor displacement but also the competitive incentives that drive it.”


https://arxiv.org/html/2603.20617v1