TechLead Bits

The Developer Skill Shift

New AI tools appear every day, new coding models become smarter and smarter, AI agents can automate more and more work. It seems impossible to keep up with everything and easy to get lost in all this diversity. And the common question is "Are we gonna be replaced by AI soon?"

In the last few months I attended multiple AI events, watched the latest talks from conferences, read a lot of blogs and discussions. So what I can say for sure is that the developer role is changing. And this change is not about a future where anyone can just vibe-code anything. No. What we give as an input for LLMs will be multiplied in the final result. Garbage in -> garbage out.

This means that developers need stronger professional skills than ever. And it's not about knowledge of a particular framework or technology.

The new paradigm prioritizes a different set of skills:
🔸 System Design: knowledge of architectural patterns, trade-offs and cost analysis.
🔸 System Thinking: ability to understand a system as a whole, analyze the consequences of changes, build a system of checks and balances for AI agents.
🔸 Critical Thinking: ability to challenge AI generated results.
🔸 Product Thinking: understanding the business context and real problem to solve, product roadmap and evolution.
🔸 Communication Skills: ability to clarify requirements, discuss architecture, and explain our intentions to AI assistants clearly and in detail. Remember: "garbage in -> garbage out."

Of course, all those skills are not really new. But previously we expected them from more senior positions like senior developers, architects and leads. Now they are expected from every engineer.

But the important thing is that we are still engineers. We still transform our knowledge and expertise into working products.

AI doesn't replace expertise. It gives us tools to produce better results faster, making strong engineers even stronger.

#engineering #ai

TechLead Bits

10 Tips for AI-Assisted Coding

10 Tips To Level Up Your AI-Assisted Coding is an absolutely wonderful talk from the latest NDC London conference. Alex Stensby shares practical recommendations on how to improve interaction with coding agents:

🔸 Context is King. Context is a limited resource, so we must actively manage it:
- Provide enough context for the task: role, file references, specs, schemas, logs, etc.
- Explicitly request the agent to ask additional questions if more information is needed. It helps to prevent hallucinations.
- Start a new context for each new feature to keep the agent focused.
- Summarize the progress and save it to an md file for future work.
🔸 Rules & Docs make all the difference. Actively document contracts, specifications, and database schemas, use AGENTS.md to set the initial context of the project, and build a library of reusable skills.
🔸 Make a plan. Always use plan mode. It forces the LLM not to go with the first suitable solution but to reflect on the task. Review the plan, give feedback on it, "make it your plan".
🔸 Break it down. Break complex tasks into smaller manageable pieces of work, keep track of them in an md file or GitHub issues, track the progress.
🔸 Pick the right model. Use a cheaper model for simpler tasks. But Alex said that he uses the latest Opus 4.x for all coding tasks now.
🔸 Use the tools. Turn repetitive prompts into skills or slash commands, use subagents to run work in parallel or focus on specific task types (e.g., architect, code reviewer, QA specialist).
🔸 Git everything & learn from rabbit holes. Ask AI to learn from and reflect on its mistakes, save the results in md files and add them to the memory.
🔸 Power up with MCPs. Use MCPs carefully, especially with databases or other resources where AI can make dangerous changes. It's better to give it read-only access.
🔸 Release the agents. Use multiple agents to solve different tasks, orchestrate them, fork their context to explore other options, send them for remote execution when needed.
🔸 Be the human in the loop. Agents make mistakes. That's a fact. So we must review and verify work results.

I definitely recommend watching the full video, as it contains not only common tips for using AI agents but it also has a lot of practical examples of Claude Code usage.

#engineering #ai

YouTube

10 Tips To Level Up Your AI-Assisted Coding - Aleksander Stensby - NDC London 2026

This talk was recorded at NDC London in London, England. #ndclondon #ndcconferences #developer #softwaredeveloper

TechLead Bits

AI-Generated Architecture Diagrams

If you've ever built architecture diagrams, you know how time-consuming it can be.
I use drawio and sometimes I spend hours aligning elements to make the picture compact, clear and not overloaded with elements and connections between them (hello to my perfectionism).

The great news is that AI can now help with this task too, using the Drawio MCP. Initially I was quite skeptical. Anyone who has tried to generate a diagram from an architecture description will understand me 🙂.

Surprisingly, the result is good. Not perfect, but really good.

I tested it by creating new diagrams and modifying existing ones according to a template. New diagrams are a bit clumsy, so you need to clean them up. Modification according to the template shows better results: it aligned around 10 different diagrams to the same template within minutes.

Of course, as with any AI agent task, you need to tune the output, explain mistakes and save lessons learnt to the memory. But in the end I got the desired result much faster than doing it on my own.

#engineering #ai #tips

TechLead Bits

Claude Code: Behind the Scenes

Claude Code is one of the most popular developer tools today. But did you know that initially it was just a side project inside the company?

This and many other interesting details are discussed in an interview with Boris Cherny, the creator and head of Claude Code at Anthropic.

Key insights from the interview:
🔸 Boris built Claude Code as a bash chat-based tool while he was learning the public Anthropic APIs and how people use the model. It quickly became popular among other employees, which led to its later public success.
🔸 Claude writes ~80% of the code at Anthropic on average.
🔸 Boris ships 20-30 PRs a day by running 5 parallel Claude instances. He starts with a plan mode, iterates over the plan, then lets the agent do the implementation. Since the Opus 4.5 release, Claude writes 100% of his code.
🔸 There is no "right way" to use Claude Code. According to Boris, "The way we build Claude Code is to be hackable because we know every engineer's workflow is different. There's no one way to do things. There's no two engineers that have the same workflow."
🔸 The Claude team doesn't write Product Requirement Documents (specs). They just build dozens of working prototypes before shipping a feature.
🔸 Claude Code reviews every pull request at Anthropic and it catches ~80% of bugs. There are 2 rounds of review: the first is performed by an AI agent, the second is always done by a human who gives the final approval.
🔸 Claude Cowork is intended to provide Claude capabilities for non-engineers. The tool was built in ~10 days and the main engineering complexity here is about safety: building classifiers, a shipping VM, OS-level protections against accidental file deletion, and rethinking the permission model for non-technical users.
🔸 There are no technical grades inside Anthropic. Everyone has the same title "Member of Technical Staff", which highlights the assumption that everyone can do everything: product, design, infrastructure, research.

During the interview Boris repeated multiple times that it's more important to ship fast and get user feedback early than to wait and deliver a fully featured product. That's why they built their engineering culture around prototyping. I think it's one of the reasons Anthropic products are so successful.

By the way, if you haven't tried Claude Code yet, I highly recommend doing so. As a starting point you can use the Claude Code In Action course from Anthropic.

#ai #engineering #usecase

YouTube

Building Claude Code with Boris Cherny

Boris Cherny is the creator and Head of Claude Code at Anthropic. He previously spent five years at Meta as a Principal Engineer and is the author of the book Programming TypeScript.

In this episode of Pragmatic Engineer, we went through how Claude Code…

TechLead Bits

Uber: Agent-Centric Organization

Looking for inspiration on how to apply AI in your team? Then check out Uber: Leading engineering through an agentic shift, where the Dev Platform team presents Uber's AI adoption approach and its current state.

Uber's agent platform includes:
🔸 MCP Gateway & Registry. Central MCP gateway to expose external and internal MCPs and provide a secure sandbox for experiments.
🔸 ML Michelangelo platform. An agent builder with no-code and SDK options and built-in visualization, telemetry, and tracing.
🔸 AIFX. A tool to access internal agent infrastructure: provisioning, discovery, configuration, background tasks.
🔸 Minion. A background agent platform that integrates agents with CI/CD, Slack, and PRs.
🔸 Code Inbox. Unified inbox for PRs developers need to review. It tries to find the most relevant person to review the code, tracks review SLOs, and helps reassign or escalate PRs if necessary.
🔸 uReview. A review pipeline enriched with internal context, best practices, and guidelines.
🔸 Autocover. A system to generate unit tests. 3x higher test quality than tests generated by a generic agent, 5000+ tests generated per month.
🔸 Automigrate. A tool to implement large-scale changes. It consists of a problem identifier, a code transformer (OpenRewrite, Piranha or agents), validation, and a campaign manager (routing PRs to reviewers, splitting changes into reasonably sized PRs, rebases, change prioritization).

As you can see the team uses AI to support the entire development lifecycle. The whole strategy sounds like "Enable Uber engineers to focus on creative work by eliminating toil". By toil they mean upgrades, migration, bugfixes, writing docs, cleanup.

To sum up, Uber has a very reasonable strategy for AI adoption. Of course, they made a huge investment in their agent platform. But one simple recipe suits everyone: identify the most tedious repeatable tasks and give them to AI. People are burnt out, agents are not.

#ai #engineering #usecase

YouTube

Uber: Leading engineering through an agentic shift - The Pragmatic Summit

With Ty Smith and Anshu Chada, Uber Dev Platform. At The Pragmatic Summit: www.pragmaticsummit.com

Update on 11 March: the Uber team shared updated numbers as of March 2026:

- 84% of devs at Uber are agentic coding users (either using CLI-based agents or…

TechLead Bits

Are You Ready for Coding Agents?

Coding agents are booming. Only the lazy haven’t yet talked about how they built their own agent. But the reality is much more complex. To get the benefits from AI your development ecosystem should be ready for it.

Garbage in -> garbage out, remember?
Low coverage, flaky tests, undefined code style, long verification cycle, poor documentation. Add AI-generated code to this and you will increase the entropy and reduce overall system stability.

To produce predictable results the engineering infrastructure must be stable.

By infrastructure I mean:
🔸 Linters and automated code style verification.
🔸 High unit test coverage (>=80%).
🔸 Contract tests for all public APIs.
🔸 System, integration, and E2E tests that run at least once a day.
🔸 No flakiness. You must fully trust your tests and CI process, otherwise you cannot guarantee that the agent won't break anything.
🔸 Security gates. Secret management, vulnerability checks, SAST verification.
🔸 Documentation. Requirements, architecture, guides, internal agreements. Everything that helps the agent understand how we work.

The most non-obvious part here is test flakiness.
What's the problem with just rerunning the test?
Developers know the context, the agents do not. It means that they will try to fix the test, making it weaker, or modify the code, introducing a bug. The overall result is worse code generation and increased maintenance overhead. So each rerun must be treated as a bug report, not a solution.
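One way to treat reruns as bug reports is to flag every test that both passed and failed on the same commit. Here is a minimal Python sketch, assuming CI results are available as (test name, commit, passed) records. The record shape and the function name are hypothetical, not a real CI export format:

```python
from collections import defaultdict


def find_flaky_tests(runs):
    """Return names of tests that both passed and failed on the same commit.

    `runs` is an iterable of (test_name, commit_sha, passed) tuples.
    This record shape is an assumption, not a real CI export format.
    """
    outcomes = defaultdict(set)
    for test_name, commit_sha, passed in runs:
        outcomes[(test_name, commit_sha)].add(passed)

    # Same code, different results: the test (or the code under test)
    # is nondeterministic and deserves a bug report, not a rerun.
    return sorted({name for (name, _), seen in outcomes.items() if len(seen) == 2})


runs = [
    ("test_login", "abc123", False),
    ("test_login", "abc123", True),   # passed on rerun -> flaky
    ("test_checkout", "abc123", True),
    ("test_search", "abc123", False),
    ("test_search", "def456", True),  # fixed by a new commit -> not flaky
]
print(find_flaky_tests(runs))  # ['test_login']
```

A report like this can be turned into tickets automatically, so the agent never sees a flaky test as a signal to "fix" something.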

If you check how different companies adopt AI, you will notice that all success stories are built on existing powerful CI/CD processes that can safely check AI agent output (Google, Claude Code, Uber, Airbnb).
AI adoption doesn't just bring new tools and processes but also enforces the best engineering practices we already have.

#engineering #ci #ai #agents

TechLead Bits

The Minto Pyramid

Do you know how to structure your docs to make them readable?

The Minto Pyramid Principle is a very famous concept for organizing docs, presentations and your own thoughts. I came across it many times in different resources, so I decided to read the book as the original source of truth.

Surprisingly, the book was difficult for me, mostly because of the academic language and the many examples from economics and marketing. It took me 4 months to finish it and 3 more months to prepare this overview 😃.

The idea is simple:
🔸 Put the main message at the top. It can be a key point, idea, or even a question.
🔸 Then add supporting arguments. Keep them consistent: arguments should be of the same type and detail level.
🔸 On the next level add facts and details that back those arguments.
🔸 Each level should summarize what’s below it.
🔸 You can go through the pyramid in both directions: top-down (presentations, explanations) or bottom-up (research).
🔸 Use either induction or deduction when moving between arguments, but don’t mix both.

The overall concept looks obvious. It even reminds me of a math logic course at the university.

But in reality I've seen a lot of unstructured, difficult-to-follow documents. And good structure is important not only for humans, but now it's even more important for AI agents. That’s where this book can help. It doesn't just teach how to write readable docs, it teaches you how to organize your thoughts and ideas so others can easily understand you.

#booknook #softskills #communications #documentation

TechLead Bits

Tech Blogs Reading List

Any technical leader or architect needs to stay on top of the industry trends and develop a broad perspective on architectural solutions and engineering practices.

For that, I read blogs from big tech companies. They give me a sense of what’s going on, show real-world architecture examples, inspire with new ideas that I can try with my teams.

List of blogs:
- https://www.uber.com/en-IN/blog/engineering
- https://medium.com/airbnb-engineering
- https://engineering.fb.com/
- https://www.linkedin.com/blog/engineering
- https://netflixtechblog.com/
- https://medium.com/@Pinterest_Engineering
- https://engineering.atspotify.com/
- https://aws.amazon.com/blogs/architecture/
- https://github.blog/engineering/
- https://blog.booking.com/
- https://developers.openai.com/blog/
- https://research.google/blog/
- https://www.anthropic.com/engineering
- https://www.anthropic.com/research

I use Feedly to keep everything in one place and check it 1-2 times a week. Most of these blogs are available on the free plan.

I’m always curious what others are reading. So if you have good resources, feel free to share in the comments 📚.

#tips #learning #engineering

TechLead Bits

Spec-Driven Development

Do you like frameworks? I'm quite skeptical about them.

They try to solve everything at once and end up adding complexity where it’s not really needed. But engineers love inventing frameworks. Vibecoding is no exception. A new family of frameworks is called spec-driven development (SDD).

The main idea is to write a "spec" before writing code with AI.
A spec is a structured description of WHAT should be done and WHY. In classic terms, a spec is like an interface, generated code is like an implementation.

Main principles:
🔸 Spec-first. A thoughtful spec is written first, reviewed and then used in a development workflow.
🔸 Spec-driven. The spec is kept in a git repo and it’s used to evolve and maintain the feature.
🔸 Spec-sourced. Only the spec is edited by a human, the code is edited by an AI agent.

Development workflow:

intention -> requirements -> design -> tasks -> implementation

Popular implementations:
- https://github.com/github/spec-kit/
- https://github.com/Fission-AI/OpenSpec
- https://kiro.dev/
- https://github.com/bmad-code-org/BMAD-METHOD

From my perspective it looks like an attempt to bring some control over vibecoding. But as a result the agent generates a bunch of markdown files to review, and I cannot say it's much easier than code review (if not harder, since LLMs tend to be verbose).

There’s no common opinion on SDD yet. I know people who like it, and people who don't. Like any AI tool, it needs experimentation and adaptation to your specific tasks.

P.S. The tools mentioned contain interesting prompts that can be reused without SDD itself.

#engineering #ai #sdd

TechLead Bits

Rethinking High Availability

Have you noticed how the world has changed? Again.

A year ago I used to say that in public clouds three availability zones are enough. Each zone is in its own data center, data centers are located within ~100 km of each other, and the probability of losing two at the same time was considered very low.

There is no such assumption anymore.
Today, losing two or even all three availability zones in a single region is no longer extremely rare. It’s something that can actually happen.

What does it mean in practice?
If your business requires 99.5% system availability or higher, relying on a single region is not an option.

Technically it means:
- Using multiple regions even in public clouds (or even different cloud providers).
- Switching to a cold or hot standby setup.
- Storing backups in a different region.
- Regularly testing DR scenarios.
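To make the numbers tangible, here is a small Python sketch of the arithmetic behind an availability target and a second region. The per-region figure of 99% is purely illustrative, and the formula assumes regions fail independently and ignores failover time, which is optimistic for correlated incidents:

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def allowed_downtime_hours(availability):
    """Yearly downtime budget for a given availability target (0..1)."""
    return (1 - availability) * HOURS_PER_YEAR


def combined_availability(region_availabilities):
    """Availability of a system that is up if at least one region is up.

    Assumes regions fail independently and failover is instant.
    """
    all_down = 1.0
    for a in region_availabilities:
        all_down *= (1 - a)
    return 1 - all_down


print(round(allowed_downtime_hours(0.995), 1))        # 43.8
print(round(combined_availability([0.99, 0.99]), 4))  # 0.9999
```

So a 99.5% target leaves about 43.8 hours of downtime per year, and one long regional outage can consume that budget on its own.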

Disasters do happen.
The question is whether your system is ready or not.

#engineering #systemdesign #reliability

TechLead Bits

SDD: OpenSpec

Last week I wrote about spec-driven development (SDD) as a new wave of frameworks for vibecoding. I tried a few of them. If you’re just exploring SDD, I’d suggest starting with OpenSpec.

Why?
For me, it felt the simplest and least overloaded.

To get started, you basically need 3 commands:

/opsx:propose
Here you describe what you want to build: bugfix, feature, design or something else (but using SDD for bugfix still feels like overkill).
What I liked:
- You get a structured design (design.md)
- It explains why decisions were made
- It adds open questions that prompt further clarification
- You get tasks.md with detailed steps of what will be done
At this stage I had to iterate a few times because some assumptions were wrong. But in the end I identified gaps in the initial feature definition and got a clear plan of future changes. The good thing is that changes at this stage are cheap.

/opsx:apply
If you’re ok with the design, you can ask the agent to execute the plan. It's possible to execute all tasks at once, split them or run multiple agents for parallel execution. Implemented steps are marked as completed in tasks.md.

/opsx:validate
This is a control step. You can request the agent to validate that the implementation matches the design. Because… agents still drift and make mistakes.

Of course, OpenSpec contains other interesting commands, but you can add them later when you get used to SDD.

How is it different from Plan mode?
Plan mode mostly generates tasks.md. No design. No real spec.
SDD reframes the work into a design-first approach and adds additional prompts to do it well.
And honestly, I liked the preparation phase result much more.

I wouldn’t use it everywhere, but for refactoring or feature development it looks really good.

#engineering #ai #sdd

TechLead Bits

Building AI-Powered Team

AI adoption is one of the biggest challenges and at the same time one of the biggest opportunities for business.
Some teams report a significant productivity boost. Others say: “AI might be useful for some tasks.”

So what’s the difference?
AI adoption is not about access to the tools. Just buying licenses and giving them to engineers doesn't work. The team needs to rethink how they work and integrate these tools into their daily workflow. And that’s a classic change management task.

On this topic, I recently came across the GitHub Internal Playbook for building an AI-powered workforce. They highlight that AI adoption is not really a technical problem, it’s a human one.

GitHub suggests 8 pillars to drive adoption at the organization level:
- AI advocates. Internal champions who scale adoption through peer-to-peer influence and feedback.
- Clear policies. Defined rules for using AI.
- Learning & development. External and internal training and education.
- Metrics. Track adoption, engagement, and business impact.
- Ownership. A central owner who orchestrates the program and drives the overall strategy.
- Executive support. Visible leadership commitment and strategic vision.
- Right tools. Different tools for different roles.
- Communities. Peer-to-peer learning, knowledge sharing, and collaborative problem-solving.

And in reality, the key part here is the people on the ground, the experts who drive the change, adapt the tools to real tasks, and teach others. This is also covered in more detail in the companion article Activating your internal AI champions.

You can’t roll out AI top-down.
You can’t standardize it with one template for everyone. Every team has its own context. Without understanding it, any “unified approach” will fail.

#leadership #ai

TechLead Bits

Agent Harness

Harness is a new buzzword introduced by modern AI.
Let's check what it is and why it matters.

The term harness refers to the logic around the LLM that controls and guides how an agent operates. It's not the agent itself but the tools and guardrails that help it achieve better results.

A harness typically includes:
🔸 System prompts
🔸 Tools, skills, MCPs and their descriptions
🔸 State & memory (current task state, past runs, intermediate states)
🔸 Planning & task decomposition
🔸 Context engineering strategies
🔸 Safety & guardrails (allowed tools, rate limiting, prompt injection protection)
🔸 Bundled infrastructure (filesystem, sandbox, browser)
🔸 Subagent orchestration logic
🔸 Hooks/middleware for deterministic execution (compaction, continuation, lint checks)
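To illustrate a few of these pieces together, here is a toy Python sketch of a harness with a system prompt, a tool registry, an allow-list guardrail and a history log. The `Harness` class, its fields and the tool names are all hypothetical and do not reflect how any real product is implemented:

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Harness:
    """Toy harness: everything here wraps the model, nothing is the model."""
    system_prompt: str
    tools: dict[str, Callable[[str], str]]
    allowed_tools: set[str]
    history: list[str] = field(default_factory=list)  # state & memory

    def call_tool(self, name: str, arg: str) -> str:
        # Guardrail: a deterministic check that runs regardless of model output.
        if name not in self.allowed_tools:
            result = f"denied: tool '{name}' is not allowed"
        else:
            result = self.tools[name](arg)
        self.history.append(f"{name}({arg!r}) -> {result}")
        return result


harness = Harness(
    system_prompt="You are a careful coding assistant.",
    tools={
        "read_file": lambda path: f"<contents of {path}>",
        "delete_file": lambda path: f"deleted {path}",
    },
    allowed_tools={"read_file"},
)
print(harness.call_tool("read_file", "README.md"))    # <contents of README.md>
print(harness.call_tool("delete_file", "README.md"))  # denied: tool 'delete_file' is not allowed
```

The point of the sketch is that the guardrail lives in ordinary code, so it behaves the same way no matter what the model asks for.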

Well-known examples of harness ecosystems include Claude Code, Cursor, LangChain.

The overall trend is that each model provider now builds and promotes its own harness. But because each provider uses different system prompts, model tuning techniques and context management strategies, the same model in different ecosystems will produce different results.

So the same model does not mean the same agent. And the real competition is no longer between models. It’s between harnesses fighting for your workflow and your budget.

#ai #engineering

TechLead Bits

CliftonStrengths 34

I recently took the CliftonStrengths 34 assessment, so today I will share what it is and how it can be useful for your career.

CliftonStrengths is a framework that identifies your natural talents that help create value at work.
It was launched by Gallup in 2001, and since then more than 26 million people have taken it.
So it's based on real data and many years of consulting experience.

The main idea is that we should focus on our strengths to achieve results and not try to improve our weaknesses.

How it works:
- 200 questions
- 4 strength domains: executing, influencing, relationship building, and strategic thinking.
- 34 strength areas within those domains.
- All 34 themes are ranked in a personal order from the strongest to the weakest.
- The top 10 are our main talents to focus on.

The interesting part is that every strength has both a positive and a negative side. It can help you succeed or hold you back.
For example: Learner. A person with this strength quickly picks up new topics and constantly extends their knowledge. But it's easy to get stuck in a “forever student” mode.

The test is really helpful from a self-reflection perspective:
🔸 Once you know your strengths, you can rely on good parts and mitigate the downsides.
🔸 The less you do the work that isn’t natural to you, the more productive and energized you are.
🔸 We assume others think and work like we do. But they don't. And this is our advantage to use.

What did it give me? First of all, I realized that I really do the work I'm naturally good at (hello, imposter syndrome). Second, I started noticing my strengths in real situations and using them more consciously.

So the assessment is a helpful tool to understand what to focus on to achieve better results. And it’s a good starting point to rethink your day-to-day activities and align them more with what you enjoy.

#softskills #leadership #productivity

TechLead Bits

Inside the Context Window

What makes your work with agents efficient? Chosen model? Harness? Instructions clarity?
I would say that first of all it's the quality of the context you provide.

Context is everything the model sees before it generates a response.
Two facts to know about the context:
1. It's limited (and costs you money 💰 ).
2. The longer the context, the worse the results.

So context engineering is a set of practices to fill the context with just enough information to get the desired results. The main goal is to balance the amount of context given: not too little and vague, not too much and detailed.

The first step in context engineering is to understand what the context actually contains. And it’s not just your prompt.

Typical context structure:
🔸 System prompts & instructions: the hidden layer of system prompts, safety policies, behavioral rules, role definition. Usually it's part of the harness and you cannot change it.
🔸 Project context: AGENTS.md/CLAUDE.md, repo structure, settings. It's added as a first prompt to any session you open with the agent.
🔸 Available tools: skill descriptions, MCPs, available CLIs.
🔸 Retrieved information: loaded files, data from RAG system.
🔸 State & history: The current conversation, including user, model and tools responses.
🔸 Reasoning: intermediate reasoning results (thinking mode).
🔸 Long-term memory: knowledge base from previous conversations like user preferences, summaries of working sessions, facts the agent was asked to remember for future use.
🔸 Your prompt: the actual user request.

As you can see, the context is already filled with a lot of information before you even start the real work. To make agents efficient, keep their context clean and focused. Don't overload it with unnecessary information.
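A rough way to see how much of the window is taken before the real work starts is to estimate tokens per layer. The Python sketch below assumes about 4 characters per token, which is only a rule of thumb, and uses made-up layer sizes and a made-up window size:

```python
def estimate_tokens(text):
    """Very rough token estimate: about 4 characters per token (assumption)."""
    return max(1, len(text) // 4) if text else 0


def context_report(layers, window_tokens):
    """Return (layer, tokens, share of window) rows plus the total token count."""
    rows = []
    total = 0
    for name, text in layers.items():
        tokens = estimate_tokens(text)
        total += tokens
        rows.append((name, tokens, tokens / window_tokens))
    return rows, total


# Placeholder strings stand in for real content; sizes are illustrative.
layers = {
    "system prompt": "x" * 8000,
    "project context": "x" * 4000,
    "tool descriptions": "x" * 12000,
    "history": "x" * 40000,
    "user prompt": "x" * 400,
}
rows, total = context_report(layers, window_tokens=32000)
for name, tokens, share in rows:
    print(f"{name:18} {tokens:6} tokens  {share:5.1%}")
print("total:", total)
```

In this made-up example the user prompt is the smallest layer by far, which is the usual picture: most of the budget goes to everything around the prompt.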

#ai #engineering

TechLead Bits

ReasoningBank

Currently AI agents have one major limitation: they cannot learn. I mean they don't learn from their experience or from the results of completed tasks. Once the model is trained, all we can do is to tune our prompts or enrich results with domain data from RAG.

Researchers from Google started exploring how to overcome this limitation and introduced a concept called ReasoningBank.

The overall idea is simple:
1. The agent writes down the result of successful or failed tasks into a dedicated md file.
2. During task execution, the agent searches the ReasoningBank and pulls relevant memories into the context.
3. Then it uses an LLM-as-a-judge approach to self-evaluate the result, analyze the trajectory of reasoning, and extract success insights or failure reasons.

Each file has the following structure (very similar to skills):
- Title: identifier of the core strategy.
- Description: short summary of the memory item.
- Content: reasoning steps, decision explanation, or operational insights extracted from past experience.
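The structure above maps naturally onto a small record type plus a retrieval step. The Python sketch below is only an illustration: the `MemoryItem` and `retrieve` names are hypothetical, the sample memories are invented, and simple word overlap stands in for whatever similarity search a real implementation uses:

```python
from dataclasses import dataclass


@dataclass
class MemoryItem:
    title: str        # identifier of the core strategy
    description: str  # short summary of the memory item
    content: str      # reasoning steps or insights from past experience


def _words(text):
    return set(text.lower().split())


def retrieve(bank, task, top_k=1):
    """Return up to top_k items sharing the most words with the task.

    Word overlap is a stand-in for a real similarity search.
    """
    task_words = _words(task)
    scored = []
    for item in bank:
        overlap = len(task_words & _words(item.title + " " + item.description))
        if overlap > 0:
            scored.append((overlap, item))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in scored[:top_k]]


bank = [
    MemoryItem("Check pagination first",
               "Listing pages may hide results behind pagination",
               "Before concluding an item is missing, open every page of results."),
    MemoryItem("Run tests before editing",
               "Failing tests reveal the expected behaviour",
               "Run the existing test suite and read failures before changing code."),
]
hits = retrieve(bank, "find the order in the listing pages")
print(hits[0].title)  # Check pagination first
```

Only the retrieved items are added to the context, which is what keeps the extra memory from growing the prompt without limit.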

To be honest, benchmark results compared to other agent memory approaches do not look extremely impressive:

ReasoningBank without scaling outperformed memory-free agents by 8.3% on WebArena and 4.6% on SWE-Bench-Verified.

At the same time, this approach adds even more data to the context. And context, as we know, directly affects both model behavior quality and usage cost.

The official paper contains interesting research details, including particular prompts and measurements.

From my perspective, the idea and its implementation are very similar to skills or other long-term agent memories (e.g. in Claude Code). But the overall direction of making agents capable of learning from their own experience looks really promising.

#ai #engineering #news

TechLead Bits

AI Engineering

I strongly believe that if you want to use any technology effectively, you need to understand how it works under the hood. Especially in software engineering.

So if you haven’t looked into LLM internals yet, I’d highly recommend reading AI Engineering by Chip Huyen. The book was published in December 2024. And as AI is moving extremely fast, you might think it’s already outdated. Yes and no.

The book focuses on fundamentals. And they don’t really change that fast. You won’t find hype topics like skills, harnesses, or agent orchestration there. But for building a structured understanding of how AI works, you don't actually need them.

What I personally found useful:
🔸 Core LLM concepts: tokenization, training and post-training processes, dataset preparation. This part is very similar to the Machine Learning Crash Course from Google.
🔸 Model evaluation: a quite complex but interesting topic about evaluating and comparing model outputs. The book covers ranking, model specialization, public benchmarks and the AI-as-a-judge approach.
🔸 Prompt engineering: a good reference on context and prompting. Additionally, the author describes different security aspects of using prompts. That part really expanded my thinking about what can go wrong.
🔸 Finetuning: a deep dive into different ways to optimize models. You need to be a good mathematician to understand this part. So I was really glad I'm not an ML engineer 😃 (huge respect to all ML experts, it's really hard).
🔸 User feedback: basic patterns on how to collect feedback, what to measure and why, common pitfalls.

To sum up, this book is really great to structure your knowledge about modern AI systems. Once you have that foundation, it becomes much easier to navigate all the new tools, patterns and paradigms that appear almost every month.

#booknook #ai #engineering

TechLead Bits

Ralph Loop

The nondeterministic nature of AI sometimes produces very interesting engineering solutions. One such example is the Ralph (or Ralph Wiggum) Loop.

This is an AI coding pattern inspired by The Simpsons character Ralph Wiggum, known for saying weird things with high confidence 🙃.

The idea is simple: The agent can be dumb in a single iteration. But if it keeps retrying with feedback long enough, it eventually converges.

The loop steps:

Start a new agent\subagent -> load task + memory -> execute 1 selected task -> run validation -> save learnings -> commit progress -> repeat

The loop finishes when all tasks have passes:true or it reaches the maximum number of iterations (default is 10).
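The control flow can be sketched in a few lines of Python. This is a simplified illustration, not the original implementation: `run_agent` and `validate` are hypothetical callables standing in for a fresh agent session and the project's checks, the `learnings` list stands in for the memory files, and the commit step is left out:

```python
def ralph_loop(tasks, run_agent, validate, max_iterations=10):
    """Repeat one-task iterations until every task passes or the budget runs out.

    `tasks` is a list of dicts with "name" and "passes" keys; `run_agent` and
    `validate` are caller-supplied callables (hypothetical stand-ins).
    """
    learnings = []  # stands in for progress.txt / long-term memory files
    for iteration in range(1, max_iterations + 1):
        pending = [t for t in tasks if not t["passes"]]
        if not pending:
            break
        task = pending[0]                    # execute exactly one task
        output = run_agent(task, learnings)  # new session each iteration
        task["passes"] = validate(task, output)
        learnings.append(f"iteration {iteration}: {task['name']} -> {task['passes']}")
    return all(t["passes"] for t in tasks), learnings


# Toy stand-ins: the "agent" succeeds on its second attempt at each task.
attempts = {}


def fake_agent(task, learnings):
    attempts[task["name"]] = attempts.get(task["name"], 0) + 1
    return attempts[task["name"]]


def fake_validate(task, output):
    return output >= 2


tasks = [{"name": "add endpoint", "passes": False},
         {"name": "write tests", "passes": False}]
done, log = ralph_loop(tasks, fake_agent, fake_validate)
print(done, len(log))  # True 4
```

Note that the loop itself holds no conversation state: everything an iteration needs is passed in from outside.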

But the real value of the technique is not in the retries, it's in the context engineering strategy under the hood:
🔸 One loop executes only one task. This keeps the agent focused.
🔸 Each iteration starts a new agent session. State lives outside the context, which keeps it clean between iterations. State is stored in git history, progress.txt and long-term-memory files.
🔸 Tasks are delegated to subagents. The main context is not polluted with task execution details and validations.
🔸 AGENTS.md is updated on each iteration. It is a live artifact that contains discovered patterns, learnings and conventions, so future iterations can benefit from those findings and do not repeat previous mistakes.
🔸 AGENTS.md contains explicit validations for the feedback loop. It usually defines linter, typecheck, build and test execution commands.
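To make the "explicit validations" point concrete, here is a rough sketch of a validation step that runs a list of commands and turns failures into feedback for the next iteration. The function name and the placeholder commands are hypothetical. A real project would list its own linter, type checker, build and test commands.

```python
import subprocess
import sys


def run_validations(commands: list[list[str]]) -> list[str]:
    """Run each validation command and return feedback for the failures."""
    feedback = []
    for cmd in commands:
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            output = (result.stdout + result.stderr).strip()
            feedback.append(f"{' '.join(cmd)} failed: {output}")
    return feedback


# Placeholder commands that only simulate a passing and a failing check.
checks = [
    [sys.executable, "-c", "print('lint ok')"],
    [sys.executable, "-c", "import sys; print('2 tests failed'); sys.exit(1)"],
]
feedback = run_validations(checks)
```

An empty `feedback` list means the task can be marked as passed. Otherwise the messages are what the next agent session should see first.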

Ralph Loop is a really powerful pattern to get things done: it just repeats the task until it succeeds, making agent execution more reliable. "Deterministically bad" but effective.

But this approach only works if you have good task decomposition, clear completion criteria, and mature SDLC practices with strong validations and feedback loops. Otherwise the agent will just generate a ton of mess.

#ai #engineering #patterns

Geoffrey Huntley

Ralph Wiggum as a "software engineer"

How Ralph Wiggum went from 'The Simpsons' to the biggest name in AI right now - Venture Beat

😎Here's a cool little field report from a Y Combinator hackathon event where they put Ralph Wiggum to the test.

"We Put a Coding Agent in a While Loop and It Shipped


Skill Packaging

Engineering teams are actively building internal collections of skills for agents: code review, troubleshooting, design preparation, onboarding, security practices.

And it looks great until you hit the question: how do you distribute those skills across dozens of teams and multiple harnesses? For Claude you need to put skills into .claude, for Cursor into .cursor, for Gemini into .gemini, etc. And things become even messier when you need to roll out updates.

To solve this problem big companies mostly build their own in-house solutions. Smaller companies usually just copy files from some shared repository and manage this complexity manually.
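For illustration, the manual "copy files from a shared repository" approach might look like the sketch below. The harness folder names come from the text above, while the `skills` subfolder layout and the `sync_skills` helper are assumptions made for this example.

```python
import shutil
from pathlib import Path

# Harness folders named in the text; the "skills" subfolder is an assumption.
HARNESS_DIRS = (".claude", ".cursor", ".gemini")


def sync_skills(shared_repo: Path, target_repo: Path) -> list[Path]:
    """Copy every skill folder from shared_repo/skills into each harness dir."""
    written = []
    for skill in sorted((shared_repo / "skills").iterdir()):
        if not skill.is_dir():
            continue
        for harness in HARNESS_DIRS:
            dest = target_repo / harness / "skills" / skill.name
            # dirs_exist_ok lets a re-run overwrite files from an older copy
            shutil.copytree(skill, dest, dirs_exist_ok=True)
            written.append(dest)
    return written
```

Even this small script shows the maintenance cost: it has to be run in every repo, it knows nothing about versions, and files deleted from a skill upstream stay in the target folders.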

I don’t like reinventing the wheel, so when my team faced the same problem, we started looking for an existing solution we could reuse. And the only actively maintained tool we managed to find was apm by Microsoft.

APM is a package manager for prompts, skills, and MCPs. In other words, it’s like Maven or Go modules, but for agents.

APM package structure:

my-package/
├── apm.yml                         
└── .apm/
    ├── instructions/
    │   └── my.instructions.md       
    ├── skills/
    │   └── my-skill/                
    │       └── SKILL.md  
    ├── agents/                      
    └── prompts/

To install the package in a target repo you need to define apm.yml with a list of required dependencies:

name: my-project
version: 1.0.0
targets:
  - claude
  - copilot
dependencies:
  apm:
  - <git-address>/my-package
  - <git-address>/another-package
  mcp: []

After that you just run:

apm install

and the required skills will be installed into the corresponding harness folders (.claude, .copilot, etc.).

The tool works with GitHub and on-prem git installations like GitLab.

APM is not perfect. It has some unpleasant (but not critical) issues, and sometimes you can really feel that it was heavily vibe-coded in Python.

But despite all that, the tool actually works: you have a spec to define your skills/prompts packages, and you can distribute and update them with a simple apm update. And on top of the apm dependencies format it’s pretty easy to vibe-code your own internal skills marketplace.

#ai #engineering #agents

GitHub

GitHub - microsoft/apm: Agent Package Manager


The Fearless Organization

The most dangerous teams are the quiet teams. There are no disagreements, no bad news, no conflicts. Looks like harmony until the real incident.

That's the topic of the book The Fearless Organization: Creating Psychological Safety in the Workplace for Learning, Innovation, and Growth by Amy C. Edmondson.

Amy is a professor of Leadership and Management at the Harvard Business School. She has studied the phenomenon of psychological safety and its impact on team performance for many years across different organizations.

She defines psychological safety as follows:

a belief that one will not be punished or humiliated for speaking up with ideas, questions, concerns, or mistakes, and that the team is safe for inter-personal risk taking.

What it means in practice:
- people are not afraid to ask questions
- they do not hide problems
- they are not afraid to look stupid
- they do not avoid conflicts
- they can freely express their opinions and bring suggestions

Why does it matter?
There is a good example from the book that explains that. Imagine a doctor prescribes treatment for a child. A nurse notices that doctors usually prescribe drug A in such cases, but this time it is missing.
In a team with high psychological safety, the nurse will clarify this with the doctor and may help prevent a medical error.
In a team with low psychological safety, she may be afraid to ask. And the consequences can be dramatic.

The core idea is simple. But the book contains a lot of real stories where a low level of psychological safety leads to dramatic results (e.g. the Volkswagen emissions scandal, pilot mistakes that caused plane crashes). The author repeatedly highlights that the more complex and critical the profession is, the more important psychological safety becomes.

How does this relate to our daily work?
We as leaders are responsible for the psychological climate in the team: how well we listen to people, accept different opinions, react to questions, mistakes, or bad news. It's our daily routine that either helps the team become more effective, or leads people to hide problems and the real state of things.

Overall, I really liked the book. It explains the idea in simple language with many real examples. And what is important for me, all arguments and recommendations are supported by sociological research, experiments and practical psychology.
So psychological safety is not just an idea. It is a proven behavioral model and set of practices that can actually help leaders build better teams.

P.S. One of the best real examples of psychological safety is Pixar. I wrote about it earlier in my overview of Creativity, Inc.: Overcoming the Unseen Forces That Stand in the Way of True Inspiration: parts 1, 2, 3.

#booknook #softskills #leadership


Agent Readiness Framework

A few weeks ago I wrote that adopting coding agents requires strong engineering practices.
Test stability, linting, documentation, security controls matter much more than a particular harness or model.

The Agent Readiness framework is an attempt to formalize these criteria for a particular repository and define how much autonomy can be safely delegated to agents.

The framework evaluates repos across 8 dimensions:
- Style & Validation
- Build System
- Testing
- Documentation
- Dev Environment
- Code Quality
- Observability
- Security & Governance

Based on these dimensions, the framework defines 5 levels of repo maturity:
🔸 Level 1: Functional. Basic checks: README, linters, unit tests.
🔸 Level 2: Documented. Detailed documentation and basic automations: AGENTS.md, reproducible dev env, contribution guides.
🔸 Level 3: Standardized. E2E tests, observability, security scanning, maintained documentation.
🔸 Level 4: Optimized. Fast validation loops, canary deployments, build optimization. Process is optimized for fast feedback.
🔸 Level 5: Autonomous. Task decomposition, multi-service orchestration, self-healing logic, auto-remediation.

The idea is simple: the higher the maturity level, the more predictable and reliable the agent results are. But looking at these levels, I can see that most repos are actually somewhere between Level 1 and Level 3.
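As a rough illustration of how such levels could be computed from per-level checks, here is a small sketch. The scoring rule is an assumption made for this example (a level counts only if all of its checks pass and every lower level counts too). It is not the framework's documented algorithm, and the check lists are invented.

```python
LEVELS = {
    1: "Functional",
    2: "Documented",
    3: "Standardized",
    4: "Optimized",
    5: "Autonomous",
}


def maturity_level(passed: dict[int, list[bool]]) -> int:
    """Return the highest level reached without skipping a lower one.

    passed maps a level number to the pass/fail results of its checks.
    Assumed rule: a level counts only if all of its checks pass and every
    lower level counts too. Returns 0 if Level 1 is not met.
    """
    level = 0
    for number in sorted(LEVELS):
        checks = passed.get(number, [])
        if not checks or not all(checks):
            break
        level = number
    return level


# Invented example results for one repository.
repo_checks = {
    1: [True, True, True],   # README, linters, unit tests
    2: [True, True, True],   # agent instructions file, dev env, contribution guide
    3: [True, False, True],  # E2E tests, observability (missing), security scanning
    4: [True],               # fast validation loop, ignored because Level 3 failed
}
level = maturity_level(repo_checks)
print(level, LEVELS.get(level, "Below Level 1"))  # prints "2 Documented"
```

With this rule a repo cannot jump to a higher level by passing a few advanced checks while the basics are still missing.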

The framework authors also provide a tool to automatically measure these criteria and the maturity level, but it's available only after registration and uses proprietary APIs. You can find scanned examples at https://factory.ai/agent-readiness.

There is also an open-source alternative: https://github.com/kodustech/agent-readiness. The project doesn't look active, but it gets the job done. It analyzes the repo and generates a report with the overall maturity level, findings for each dimension, and suggestions for improvements. Some rules are not very accurate, and it looks like the project was mainly designed for Python and JS code verification. But anyway the tool gives you a good sense of what to pay attention to in your codebase.

What I like about this framework is that it shows that agent effectiveness is actually limited by the maturity of engineering practices. And it provides measurable and actionable results that are easy to convert into an improvement plan for a particular repo.

#ai #engineering
