The Developer Skill Shift
New AI tools appear every day, new coding models become smarter and smarter, AI agents can automate more and more work. It seems impossible to keep up with everything and easy to get lost in all this diversity. And the common question is "Are we gonna be replaced by AI soon?"
In the last few months I attended multiple AI events, watched the latest conference talks, and read a lot of blogs and discussions. What I can say for sure is that the developer role is changing. And this change is not about a future where anyone can just vibe-code anything. No. Whatever we give LLMs as input is multiplied in the final result. Garbage in -> garbage out.
This means that developers need stronger professional skills than ever. And it's not about knowing a particular framework or technology.
The new paradigm prioritizes a different set of skills:
🔸 System Design: knowledge of architectural patterns, trade-offs and cost analysis.
🔸 System Thinking: the ability to understand a system as a whole, analyze the consequences of changes, and build a system of checks and balances for AI agents.
🔸 Critical Thinking: the ability to challenge AI-generated results.
🔸 Product Thinking: understanding the business context and the real problem to solve, the product roadmap and its evolution.
🔸 Communication Skills: the ability to clarify requirements, discuss architecture, and explain our intentions to AI assistants clearly and in detail. Remember: garbage in -> garbage out.
Of course, none of these skills are really new. But previously we expected them from more senior positions like senior developers, architects and leads. Now they are expected from every engineer.
But the important thing is that we are still engineers. We still transform our knowledge and expertise into working products.
AI doesn't replace expertise. It gives us tools to produce better results faster, making strong engineers even stronger.
#engineering #ai
10 Tips for AI-Assisted Coding
10 Tips To Level Up Your AI-Assisted Coding is an absolutely wonderful talk from the latest NDC London conference. Alex Stensby shares practical recommendations on how to improve interaction with coding agents:
🔸 Context is King. Context is a limited resource, so we must actively manage it:
- Provide enough context for the task: role, file references, specs, schemas, logs, etc.
- Explicitly tell the agent to ask clarifying questions if it needs more information. It helps to prevent hallucinations.
- Start a new context for each new feature to keep the agent focused.
- Summarize the progress and save it to an md file for future work.
🔸 Rules & Docs make all the difference. Actively document contracts, specifications and the database schema, use AGENT.md to set the initial context of the project, and build a library of reusable skills.
🔸 Make a plan. Always use plan mode. It forces the LLM to reflect on the task instead of going with the first suitable solution. Review the plan, give feedback on it, "make it your plan".
🔸 Break it down. Break complex tasks into smaller, manageable pieces of work, keep them in an md file or GitHub issues, and track the progress.
🔸 Pick the right model. Use a cheaper model for simpler tasks. That said, Alex mentioned that he now uses the latest Opus 4.x for all coding tasks.
🔸 Use the tools. Turn repetitive prompts into skills or slash commands, and use subagents to run work in parallel or to focus on specific task types (e.g., architect, code reviewer, QA specialist).
🔸 Git everything & learn from rabbit holes. Ask the AI to reflect on and learn from its mistakes, save the results in md files and add them to the memory.
🔸 Power up with MCPs. Use MCPs carefully, especially with databases or other resources where AI can make dangerous changes. It's better to give the agent read-only access.
🔸 Release the agents. Use multiple agents to solve different tasks, orchestrate them, fork their context to explore other options, and send them for remote execution when needed.
🔸 Be the human in the loop. Agents make mistakes. That's a fact. So we must review and verify their work.
I definitely recommend watching the full video: it contains not only general tips for using AI agents but also a lot of practical examples of Claude Code usage.
#engineering #ai
YouTube
10 Tips To Level Up Your AI-Assisted Coding - Aleksander Stensby - NDC London 2026
This talk was recorded at NDC London in London, England. #ndclondon #ndcconferences #developer #softwaredeveloper
Attend the next NDC conference near you:
https://ndcconferences.com
https://ndclondon.com/
Subscribe to our YouTube channel and learn…
AI-Generated Architecture Diagrams
If you've ever built architecture diagrams, you know how time-consuming it can be.
I use drawio, and sometimes I spend hours aligning elements to make the picture compact, clear and not overloaded with elements and connections between them (hello to my perfectionism).
The great news is that AI can now help with this task too, using the Drawio MCP. Initially I was quite skeptical. Anyone who has tried to generate a diagram from an architecture description will understand me.
Surprisingly, the result is good. Not perfect, but really good.
I tested it on creating new diagrams and on modifying existing ones according to a template. New diagrams are a bit clumsy, so you need to clean them up. Modification according to a template gives better results: it aligned around 10 different diagrams to the same template within minutes.
Of course, as with any AI agent task, you need to tune the output, explain mistakes and save the lessons learnt to the memory. But in the end I got the desired result much faster than doing it on my own.
#engineering #ai #tips
Claude Code: Behind the Scenes
Claude Code is one of the most popular developer tools today. But did you know that initially it was just a side project inside the company?
This and many other interesting details are discussed in an interview with Boris Cherny, the creator and head of Claude Code at Anthropic.
Key insights from the interview:
🔸 Boris built Claude Code as a bash chat-based tool while he was learning the public Anthropic APIs and how people use the model. It quickly became popular among other employees, which led to its later public success.
🔸 Claude writes ~80% of the code at Anthropic on average.
🔸 Boris ships 20-30 PRs a day by running 5 parallel Claude instances. He starts in plan mode, iterates on the plan, then lets the agent do the implementation. Since the Opus 4.5 release, Claude writes 100% of his code.
🔸 There is no "right way" to use Claude Code. According to Boris, "The way we build Claude Code is to be hackable because we know every engineer's workflow is different. There's no one way to do things. There's no two engineers that have the same workflow."
🔸 The Claude team doesn't write Product Requirement Documents (specs). They just build dozens of working prototypes before shipping a feature.
🔸 Claude Code reviews every pull request at Anthropic and catches ~80% of bugs. There are 2 rounds of review: the first is performed by an AI agent, the second is always done by a human who gives the final approval.
🔸 Claude Cowork is intended to bring Claude capabilities to non-engineers. The tool was built in ~10 days, and the main engineering complexity is about safety: building classifiers, shipping a VM, OS-level protections against accidental file deletion, and rethinking the permission model for non-technical users.
🔸 There are no technical grades inside Anthropic. Everyone has the same title, "Member of Technical Staff", which reflects the assumption that everyone can do everything: product, design, infrastructure, research.
During the interview Boris repeated multiple times that it's more important to ship fast and get user feedback early than to wait and deliver a fully featured product. That's why they built their engineering culture around prototyping. I think it's one of the reasons Anthropic products are so successful.
By the way, if you haven't tried Claude Code yet, I highly recommend giving it a try. As a starting point you can use Anthropic's Claude Code in Action course.
#ai #engineering #usecase
YouTube
Building Claude Code with Boris Cherny
Boris Cherny is the creator and Head of Claude Code at Anthropic. He previously spent five years at Meta as a Principal Engineer and is the author of the book Programming TypeScript.
In this episode of Pragmatic Engineer, we went through how Claude Code…
Uber: Agent-Centric Organization
Looking for inspiration on how to apply AI in your team? Then check out Uber: Leading engineering through an agentic shift, where the Dev Platform team presents Uber's approach to AI adoption and its current state.
Uber's agent platform includes:
🔸 MCP Gateway & Registry. A central MCP gateway that exposes external and internal MCPs and provides a secure sandbox for experiments.
🔸 ML Michelangelo platform. An agent builder with no-code and SDK options and built-in visualization, telemetry and tracing.
🔸 AIFX. A tool to access the internal agent infrastructure: provisioning, discovery, configuration, background tasks.
🔸 Minion. A background agent platform that integrates agents with CI/CD, Slack and PRs.
🔸 Code Inbox. A unified inbox for the PRs a developer needs to review. It tries to find the most relevant person to review the code, tracks review SLOs, and helps to reassign or escalate PRs if necessary.
🔸 uReview. A review pipeline that is enriched with internal context, best practices and guidelines.
🔸 Autocover. A system that generates unit tests. The reported numbers: 3x higher test quality than tests generated by a generic agent, and 5000+ tests generated per month.
🔸 Automigrate. A tool to implement large-scale changes. It consists of a problem identifier, a code transformer (OpenRewrite, Piranha or agents), validation, and a campaign manager (routing PRs to reviewers, splitting changes into reasonably sized PRs, rebases, change prioritization).
As you can see, the team uses AI to support the entire development lifecycle. The whole strategy can be summed up as "Enable Uber engineers to focus on creative work by eliminating toil". By toil they mean upgrades, migrations, bugfixes, writing docs and cleanup.
To sum up, Uber has a very reasonable strategy for AI adoption. Of course, they made a huge investment in their agent platform. But one simple recipe suits everyone: identify the most tedious, repeatable tasks and give them to AI. People are burnt out, agents are not.
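To make the campaign manager idea a bit more concrete, here is a minimal sketch of one of its jobs: splitting a large-scale change into reviewable PRs. This is not Uber's implementation. The ownership rule (top-level directory = owning team), the file paths and the limit of 3 files per PR are all assumptions made for the example.

```python
from collections import defaultdict


def split_into_prs(changed_files, owner_of, max_files_per_pr=3):
    """Group changed files by owning team, then cut each group into PR-sized chunks."""
    by_owner = defaultdict(list)
    for path in sorted(changed_files):
        by_owner[owner_of(path)].append(path)

    prs = []
    for owner in sorted(by_owner):
        files = by_owner[owner]
        for start in range(0, len(files), max_files_per_pr):
            prs.append({
                "reviewer_team": owner,
                "files": files[start:start + max_files_per_pr],
            })
    return prs


def owner_by_top_dir(path):
    """Hypothetical ownership rule: the top-level directory names the owning team."""
    return path.split("/", 1)[0]


changed = [
    "payments/a.go", "payments/b.go", "payments/c.go", "payments/d.go",
    "maps/x.go",
]
prs = split_into_prs(changed, owner_by_top_dir)
for number, pr in enumerate(prs, start=1):
    print(number, pr["reviewer_team"], pr["files"])
```

Each resulting PR has a single owner and a bounded size, which is what keeps a migration across thousands of files reviewable by humans.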
#ai #engineering #usecase
YouTube
Uber: Leading engineering through an agentic shift - The Pragmatic Summit
With Ty Smith and Anshu Chada, Uber Dev Platform. At The Pragmatic Summit: www.pragmaticsummit.com
Update on 11 March: the Uber team shared updated numbers as of March 2026:
- 84% of devs at Uber are agentic coding users (either using CLI-based agents or…
Are You Ready for Coding Agents?
Coding agents are booming. It seems like everyone has already talked about how they built their own agent. But the reality is much more complex. To get the benefits from AI, your development ecosystem has to be ready for it.
Garbage in -> garbage out, remember?
Low coverage, flaky tests, undefined code style, long verification cycles, poor documentation. Add AI-generated code to this and you will increase the entropy and reduce overall system stability.
To produce predictable results, the engineering infrastructure must be stable.
By infrastructure I mean:
🔸 Linters and automated code style verification.
🔸 High unit test coverage (>=80%).
🔸 Contract tests for all public APIs.
🔸 System, integration, and E2E tests that run at least once a day.
🔸 No flakiness. You must fully trust your tests and CI process, otherwise you cannot guarantee that an agent won't break anything.
🔸 Security gates. Secret management, vulnerability checks, SAST verification.
🔸 Documentation. Requirements, architecture, guides, internal agreements. Everything that helps the agent understand how we work.
The least obvious part here is test flakiness.
What's the problem with just rerunning the test?
Developers know the context, agents do not. An agent will try to "fix" a flaky failure either by changing the test, which makes it weaker, or by modifying the code, which introduces a bug. The overall result is worse code generation and increased maintenance overhead. So each rerun must be treated as a bug report, not a solution.
If you look at how different companies adopt AI, you will notice that all the success stories are built on existing, powerful CI/CD processes that can safely check AI agent output (Google, Claude Code, Uber, Airbnb).
AI adoption doesn't just bring new tools and processes, it also enforces the best engineering practices we already have.
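The "rerun is a bug report" rule is easy to automate. Here is a minimal sketch, assuming your CI can export test results as (test name, commit, passed) records. The record format and the test names are hypothetical. A test that both passed and failed on the same commit is reported as flaky instead of being silently retried.

```python
from collections import defaultdict


def find_flaky_tests(results):
    """Return sorted names of tests that both passed and failed on the same commit."""
    outcomes = defaultdict(set)
    for test_name, commit, passed in results:
        outcomes[(test_name, commit)].add(passed)
    return sorted({
        name
        for (name, _commit), seen in outcomes.items()
        if seen == {True, False}
    })


ci_results = [
    ("test_checkout", "abc123", False),
    ("test_checkout", "abc123", True),   # passed on rerun with the same code
    ("test_login", "abc123", True),
    ("test_search", "abc123", False),    # fails consistently: a real failure, not flaky
    ("test_search", "abc123", False),
]

flaky = find_flaky_tests(ci_results)
print(flaky)  # ['test_checkout']
```

The output of such a check can be turned into tickets automatically, so flaky tests get fixed before an agent starts "repairing" them.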
#engineering #ci #ai #agents
The Minto Pyramid
Do you know how to structure your docs to make them readable?
The Minto Pyramid Principle is a well-known concept for organizing docs, presentations and your own thoughts. I came across it many times in different resources, so I decided to read the book as the original source.
Surprisingly, the book was difficult for me, mostly because of its academic language and the many examples from economics and marketing. It took me 4 months to finish it and 3 more months to prepare this overview.
The idea is simple:
🔸 Put the main message at the top. It can be a key point, an idea, or even a question.
🔸 Then add supporting arguments. Keep them consistent: arguments should be of the same type and level of detail.
🔸 On the next level add facts and details that back those arguments.
🔸 Each level should summarize what's below it.
🔸 You can go through the pyramid in both directions: top-down (presentations, explanations) or bottom-up (research).
🔸 Use either induction or deduction when moving between arguments, but don't mix both.
The overall concept looks obvious. It even reminds me of a math logic course at the university.
But in reality I've seen a lot of unstructured, difficult-to-follow documents. Good structure has always mattered for human readers, and now it matters even more for AI agents. That's where this book can help. It doesn't just teach you how to write readable docs, it teaches you how to organize your thoughts and ideas so others can easily understand you.
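The pyramid is essentially a tree, which may be easier to see in code. A minimal sketch: the `Point` class and the sample content are my own illustration, not something taken from the book. Walking the tree top-down gives the order you would use in a presentation or a document.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Point:
    """One node of the pyramid: a statement plus the points that support it."""
    text: str
    children: list[Point] = field(default_factory=list)


def render_top_down(point: Point, depth: int = 0) -> list[str]:
    """Return the pyramid as an indented outline, main message first."""
    lines = ["  " * depth + point.text]
    for child in point.children:
        lines.extend(render_top_down(child, depth + 1))
    return lines


# Made-up example content: message -> arguments -> supporting facts.
pyramid = Point("We should add contract tests for public APIs", [
    Point("They catch breaking changes before release", [
        Point("Schema changes are checked on every PR"),
    ]),
    Point("They are cheap to maintain", [
        Point("Contracts are generated from the API spec"),
    ]),
])

outline = render_top_down(pyramid)
print("\n".join(outline))
```

If a node cannot be summarized in one line that covers all of its children, that is usually a sign that the arguments below it are of different types or detail levels.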
#booknook #softskills #communications #documentation
Tech Blogs Reading List
Any technical leader or architect needs to stay on top of the industry trends and develop a broad perspective on architectural solutions and engineering practices.
For that, I read blogs from big tech companies. They give me a sense of what's going on, show real-world architecture examples, and inspire me with new ideas that I can try with my teams.
List of blogs:
- https://www.uber.com/en-IN/blog/engineering
- https://medium.com/airbnb-engineering
- https://engineering.fb.com/
- https://www.linkedin.com/blog/engineering
- https://netflixtechblog.com/
- https://medium.com/@Pinterest_Engineering
- https://engineering.atspotify.com/
- https://aws.amazon.com/blogs/architecture/
- https://github.blog/engineering/
- https://blog.booking.com/
- https://developers.openai.com/blog/
- https://research.google/blog/
- https://www.anthropic.com/engineering
- https://www.anthropic.com/research
I use Feedly to keep everything in one place and check it 1-2 times a week. Most of these blogs are available on the free plan.
I'm always curious what others are reading. So if you have good resources, feel free to share them in the comments.
#tips #learning #engineering
Spec-Driven Development
Do you like frameworks? I'm quite skeptical about them.
They try to solve everything at once and end up adding complexity where it's not really needed. But engineers love inventing frameworks. Vibecoding is no exception. A new family of frameworks is called spec-driven development (SDD).
The main idea is to write a "spec" before writing code with AI.
A spec is a structured description of WHAT should be done and WHY. In classic terms, a spec is like an interface, and the generated code is like an implementation.
Main principles:
🔸 Spec-first. A thoughtful spec is written first, reviewed and then used in the development workflow.
🔸 Spec-driven. The spec is kept in the git repo and is used to evolve and maintain the feature.
🔸 Spec-sourced. Only the spec is edited by a human, the code is edited by an AI agent.
Development workflow:
intention -> requirements -> design -> tasks -> implementation
Popular implementations:
- https://github.com/github/spec-kit/
- https://github.com/Fission-AI/OpenSpec
- https://kiro.dev/
- https://github.com/bmad-code-org/BMAD-METHOD
From my perspective it looks like an attempt to bring some control over vibecoding. But as a result the agent generates a bunch of markdown files to review, and I cannot say it's much easier than code review (if not harder, since LLMs tend to be verbose).
There's no common opinion on SDD yet. I know people who like it and people who don't. Like any AI tool, it needs experimentation and adaptation to your specific tasks.
P.S. The tools mentioned above contain interesting prompts that can be reused without SDD itself.
#engineering #ai #sdd
Rethinking High Availability
Have you noticed how the world has changed? Again.
A year ago I used to say that in public clouds three availability zones are enough. Each zone is in its own data center, data centers are located within ~100 km of each other, and the probability of losing two at the same time was considered very low.
There is no such assumption anymore.
Today, losing two or even all three availability zones in a single region is no longer extremely rare. It's something that can actually happen.
What does it mean in practice?
If your business requires 99.5% system availability or higher, relying on a single region is not an option.
Technically it means:
- Using multiple regions even in public clouds (or even different cloud providers).
- Switching to a cold or hot standby setup.
- Storing backups in a different region.
- Regularly testing DR scenarios.
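To put some numbers behind the availability target, here is a small Python sketch. It assumes that regions fail independently, which is a simplification (shared control planes or global outages break that assumption), and the function names are made up for this example.

```python
HOURS_PER_YEAR = 365 * 24


def allowed_downtime_hours(availability: float) -> float:
    """Yearly downtime budget implied by an availability target (0..1)."""
    return (1.0 - availability) * HOURS_PER_YEAR


def combined_availability(region_availability: float, regions: int) -> float:
    """Availability of N regions where one healthy region is enough.

    Assumes independent failures: the system is down only when every
    region is down at the same time.
    """
    return 1.0 - (1.0 - region_availability) ** regions
```

For example, allowed_downtime_hours(0.995) gives a budget of roughly 43.8 hours per year, so a single multi-day regional outage is enough to break the target. Under the independence assumption, two regions at 99% each combine to about 99.99%.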
Disasters do happen.
The question is whether your system is ready or not.
#engineering #systemdesign #reliability
SDD: OpenSpec
Last week I wrote about spec-driven development (SDD) as a new wave of frameworks for vibecoding. I tried a few of them. If you're just exploring SDD, I'd suggest starting with OpenSpec.
Why?
For me, it felt the simplest and least overloaded.
To get started, you basically need 3 commands:
/opsx:propose
Here you describe what you want to build: a bugfix, a feature, a design or something else (though using SDD for a bugfix still feels like overkill).
What I liked:
- You get a structured design (design.md)
- It explains why decisions were made
- It adds open questions for clarification to think more
- You get tasks.md with detailed steps of what will be done
At this stage I had to iterate a few times because some assumptions were wrong. But in the end I identified gaps in the initial feature definition and got a clear plan of future changes. The good thing is that changes at this stage are cheap.
/opsx:apply
If you're ok with the design, you can ask the agent to execute the plan. It's possible to execute all tasks at once, split them or run multiple agents for parallel execution. Implemented steps are marked as completed in tasks.md.
/opsx:validate
This is a control step. You can ask the agent to validate that the implementation matches the design. Because... agents still drift and make mistakes.
Of course, OpenSpec contains other interesting commands, but you can add them later when you get used to SDD.
How is it different from Plan mode?
Plan mode mostly generates tasks.md. No design. No real spec.
SDD reframes the work into a design-first approach and adds additional prompts to do it well.
And honestly, I liked the preparation phase result much more.
I wouldn't use it everywhere, but for refactoring or feature development it looks really good.
#engineering #ai #sdd
Building AI-Powered Team
AI adoption is one of the biggest challenges and at the same time one of the biggest opportunities for business.
Some teams report a significant productivity boost. Others say: "AI might be useful for some tasks."
So what's the difference?
AI adoption is not about access to the tools. Just buying licenses and giving them to engineers doesn't work. The team needs to rethink how it works and integrate these tools into its daily workflow. And that's already a classic change management task.
On this topic, I recently came across the GitHub Internal Playbook for building an AI-powered workforce. They highlight that AI adoption is not really a technical problem, it's a human one.
GitHub suggests 8 pillars to drive adoption at the organization level:
- AI advocates. Internal champions who scale adoption through peer-to-peer influence and feedback.
- Clear policies. Define the rules for using AI.
- Learning & development. External and internal training and education.
- Metrics. Track adoption, engagement, and business impact.
- Ownership. A central owner who orchestrates the program and drives the overall strategy.
- Executive support. Visible leadership commitment and strategic vision.
- Right tools. Different tools for different roles.
- Communities. Peer-to-peer learning, knowledge sharing, and collaborative problem-solving.
And in reality, the key part here is the people on the ground, the experts who drive the change, adapt the tools to real tasks, and teach others. This is also covered in more detail in the companion article Activating your internal AI champions.
You can't roll out AI top-down.
You can't standardize it with one template for everyone. Every team has its own context. Without understanding it, any "unified approach" will fail.
#leadership #ai
Agent Harness
Harness is a new buzzword introduced by modern AI.
Let's check what it is and why it matters.
The term harness refers to the logic around an LLM that controls and guides how an agent operates. It's not the agent itself but the tools and guardrails that help it achieve better results.
A harness typically includes:
🔸 System prompts
🔸 Tools, skills, MCPs and their descriptions
🔸 State & memory (current task state, past runs, intermediate states)
🔸 Planning & task decomposition
🔸 Context engineering strategies
🔸 Safety & guardrails (allowed tools, rate limiting, prompt injection protection)
🔸 Bundled infrastructure (filesystem, sandbox, browser)
🔸 Subagent orchestration logic
🔸 Hooks/middleware for deterministic execution (compaction, continuation, lint checks)
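A few of the items above can be sketched in a toy Python harness: a system prompt, a tool registry, an allow-list guardrail and conversation state wrapped around a model call. Everything here is an assumption made for illustration: the Harness class, the "TOOL <name> <arg>" reply convention and the model callable are invented and do not describe how any real product works.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical model interface: takes the full context text, returns a reply.
Model = Callable[[str], str]


@dataclass
class Harness:
    model: Model
    system_prompt: str
    tools: dict[str, Callable[[str], str]] = field(default_factory=dict)
    allowed_tools: set[str] = field(default_factory=set)
    history: list[str] = field(default_factory=list)  # state & memory

    def call_tool(self, name: str, arg: str) -> str:
        # Guardrail: refuse tools that are unknown or not on the allow-list.
        if name not in self.allowed_tools or name not in self.tools:
            return f"error: tool '{name}' is not allowed"
        return self.tools[name](arg)

    def run(self, user_prompt: str) -> str:
        # Context = system prompt + previous turns + the new request.
        context = "\n".join([self.system_prompt, *self.history, user_prompt])
        reply = self.model(context)
        # Toy convention (assumption): "TOOL <name> <arg>" requests a tool call.
        parts = reply.split(" ", 2)
        if len(parts) == 3 and parts[0] == "TOOL":
            reply = self.call_tool(parts[1], parts[2])
        self.history += [user_prompt, reply]
        return reply
```

Two Harness instances that share the same model callable but differ in allowed_tools will return different answers to the same prompt, which is the "same model, different agent" effect in miniature.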
Well-known examples of harness ecosystems include Claude Code, Cursor, LangChain.
The overall trend is that each model provider now builds and promotes its own harness. But because each provider uses different system prompts, model tuning techniques and context management strategies, the same model in different ecosystems will produce different results.
So the same model does not mean the same agent. And the real competition is no longer between models. It's between harnesses fighting for your workflow and your budget.
#ai #engineering
CliftonStrengths 34
I recently took the CliftonStrengths 34 assessment, so today I will share what it is and how it can be useful for your career.
CliftonStrengths is a framework that identifies your natural talents that help create value at work.
It was launched by Gallup in 2001, and since then more than 26 million people have taken it.
So it's based on real data and many years of consulting experience.
The main idea is that we should focus on our strengths to achieve results and not try to improve our weaknesses.
How it works:
- 200 questions
- 4 strength domains: executing, influencing, relationship building, and strategic thinking.
- 34 strength areas within those domains.
- All 34 themes are ranked in a personal order from the strongest to the weakest.
- The top 10 are our main talents to focus on.
The interesting part is that every strength has both a positive and a negative side. It can help you succeed or hold you back.
For example: Learner. A person with this strength quickly picks up new topics and constantly extends their knowledge. But it's easy to get stuck in a "forever student" mode.
The test is really helpful from a self-reflection perspective:
🔸 Once you know your strengths, you can rely on the good parts and mitigate the downsides.
🔸 The less you do work that isn't natural to you, the more productive and energized you are.
🔸 We assume others think and work like we do. But they don't. And this is an advantage we can use.
What did it give me? First of all, I realized that I really do the work I'm naturally good at (hello, imposter syndrome). Second, I started noticing my strengths in real situations and using them more consciously.
So the assessment is a helpful tool to understand what to focus on to achieve better results. And it's a good starting point to rethink your day-to-day activities and align them more with what you enjoy.
#softskills #leadership #productivity
Inside the Context Window
What makes your work with agents efficient? The chosen model? The harness? The clarity of instructions?
I would say that first of all it's the quality of the context you provide.
Context is everything the model sees before it generates a response.
Two facts to know about the context:
1. It's limited (and costs you money 💰).
2. The longer the context, the worse the results.
So context engineering is a set of practices to fill the context with just enough information to get the desired results. The main goal is to balance the amount of context given: not too little and vague, not too much and detailed.
The first step in context engineering is to understand what the context actually contains. And it's not just your prompt.
Typical context structure:
🔸 System prompts & instructions: the hidden layer of system prompts, safety policies, behavioral rules, role definition. Usually it's part of the harness and you cannot change it.
🔸 Project context: AGENTS.md/CLAUDE.md, repo structure, settings. It's added as the first prompt to any session you open with the agent.
🔸 Available tools: skill descriptions, MCPs, available CLIs.
🔸 Retrieved information: loaded files, data from a RAG system.
🔸 State & history: the current conversation, including user, model and tool responses.
🔸 Reasoning: intermediate reasoning results (thinking mode).
🔸 Long-term memory: knowledge base from previous conversations like user preferences, summaries of working sessions, facts the agent was asked to remember for future use.
🔸 Your prompt: the actual user request.
As you can see, the context is already filled with a lot of information before you even start the real work. To make agents efficient, keep their context clean and focused. Don't overload it with unnecessary information.
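One way to see how much of the window is gone before your prompt arrives is to estimate the size of each section. The sketch below is only a rough illustration: the 4-characters-per-token rule is a crude heuristic and not a real tokenizer, and the names Section, estimate_tokens, context_share and free_tokens are made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Section:
    name: str
    text: str


def estimate_tokens(text: str) -> int:
    # Crude heuristic (assumption): roughly 4 characters per token.
    return max(1, len(text) // 4) if text else 0


def context_share(sections: list[Section], window: int) -> dict[str, float]:
    """Fraction of the context window taken by each section."""
    return {s.name: estimate_tokens(s.text) / window for s in sections}


def free_tokens(sections: list[Section], window: int) -> int:
    """Tokens left for the actual work after all sections are loaded."""
    return window - sum(estimate_tokens(s.text) for s in sections)
```

Running context_share over the system prompt, project files, tool descriptions and memory gives a quick picture of which part is worth trimming first.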
#ai #engineering
ReasoningBank
Currently AI agents have one major limitation: they cannot learn. I mean they don't learn from their experience or from the results of completed tasks. Once the model is trained, all we can do is tune our prompts or enrich results with domain data from RAG.
Researchers from Google started exploring how to overcome this limitation and introduced the concept called ReasoningBank.
The overall idea is simple:
1. The agent writes down the result of successful or failed tasks into a dedicated md file.
2. During task execution, the agent searches the ReasoningBank and pulls relevant memories into the context.
3. Then it uses an LLM-as-a-judge approach to self-evaluate the result, analyze the trajectory of reasoning, and extract success insights or failure reasons.
Each file has the following structure (very similar to skills):
- Title: identifier of the core strategy.
- Description: short summary of the memory item.
- Content: reasoning steps, decision explanation, or operational insights extracted from past experience.
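The memory item structure and the lookup step can be sketched in Python as follows. This is an illustration under loud assumptions: the keyword-overlap scoring is a simple stand-in chosen to keep the example self-contained and is not the retrieval method from the paper, and the names MemoryItem and retrieve are invented here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryItem:
    title: str        # identifier of the core strategy
    description: str  # short summary of the memory item
    content: str      # reasoning steps or insights from past experience


def _words(text: str) -> set[str]:
    return set(text.lower().split())


def retrieve(bank: list[MemoryItem], task: str, top_k: int = 1) -> list[MemoryItem]:
    """Return up to top_k items whose title/description share words with the task."""
    task_words = _words(task)
    scored = [
        (len(task_words & _words(f"{m.title} {m.description}")), m) for m in bank
    ]
    # Keep only items with at least one shared word, best matches first.
    relevant = [(score, m) for score, m in scored if score > 0]
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [m for _, m in relevant[:top_k]]
```

The retrieved items would then be added to the agent's context before it starts the task, which is also where the extra context cost comes from.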
To be honest, benchmark results compared to other agent memory approaches do not look extremely impressive:
ReasoningBank without scaling outperformed memory-free agents by 8.3% on WebArena and 4.6% on SWE-Bench-Verified.
At the same time, this approach adds even more data to the context. And context, as we know, directly affects both model behavior quality and usage cost.
The official paper contains interesting research details, including particular prompts and measurements.
From my perspective, the idea and its implementation are very similar to skills or other long-term agent memories (e.g. in Claude Code). But the overall direction of making agents capable of learning from their own experience looks really promising.
#ai #engineering #news
AI Engineering
I strongly believe that if you want to use any technology effectively, you need to understand how it works under the hood. Especially in software engineering.
So if you haven't looked into LLM internals yet, I'd highly recommend reading AI Engineering by Chip Huyen. The book was published in December 2024. And as AI is moving extremely fast, you might think it's already outdated. Yes and no.
The book focuses on fundamentals. And they don't really change that fast. You won't find hype topics like skills, harnesses, or agent orchestration there. But for building a structured understanding of how AI works, you don't actually need them.
What I personally found useful:
🔸 Core LLM concepts: tokenization, training and post-training processes, dataset preparation. This part is very similar to the Machine Learning Crash Course from Google.
🔸 Model evaluation: quite a complex but interesting topic about model output results and their comparison. The book covers ranking, model specialization, public benchmarks and the AI-as-a-judge approach.
🔸 Prompt engineering: a good reference on context and prompting. Additionally, the author describes different security aspects of using prompts; that part really extended my thoughts about what can go wrong.
🔸 Finetuning: a deep dive into different ways to optimize models. You need to be a good mathematician to understand this part. So I was really glad I'm not an ML engineer (huge respect to all ML experts, it's really hard).
🔸 User feedback: basic patterns on how to collect feedback, what to measure and why, common pitfalls.
To sum up, this book is really great for structuring your knowledge about modern AI systems. Once you have that foundation, it becomes much easier to navigate all the new tools, patterns and paradigms that appear almost every month.
#booknook #ai #engineering
Ralph Loop
The non-deterministic nature of AI sometimes produces very interesting engineering solutions. One such example is the Ralph (or Ralph Wiggum) Loop.
This is an AI coding pattern inspired by The Simpsons character Ralph Wiggum, known for saying weird things with high confidence.
The idea is simple: The agent can be dumb in a single iteration. But if it keeps retrying with feedback long enough, it eventually converges.
The loop steps:
Start a new agent/subagent -> load task + memory -> execute 1 selected task -> run validation -> save learnings -> commit progress -> repeat
The loop finishes when all tasks have passes:true or it reaches the maximum number of iterations (default is 10).
But the real value of the technique is not in the retries, it's in the context engineering strategy under the hood:
🔸 One loop executes only one task. It keeps the agent focused.
🔸 Each iteration starts a new agent session. State lives outside the context, keeping it clean between iterations. State is stored in git history, progress.txt and long-term-memory files.
🔸 Tasks are delegated to subagents. The main context is not polluted with task execution details and validations.
🔸 AGENTS.md is updated on each iteration. It is a live artifact that contains discovered patterns, learnings and conventions so future iterations can benefit from those findings and do not repeat previous mistakes.
🔸 AGENTS.md contains explicit validations for the feedback loop. It usually defines linters and typechecks, build and test execution commands.
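The loop itself can be sketched in a few lines of Python. This is a simplified model and not the original implementation: run_agent is a hypothetical callable standing in for "start a fresh agent session, execute the task, run validation", and the learnings list stands in for AGENTS.md and progress.txt.

```python
from typing import Callable

# Hypothetical agent call: (task, learnings so far) -> (validation passed, note to remember)
AgentRun = Callable[[dict, list[str]], tuple[bool, str]]


def ralph_loop(tasks: list[dict], run_agent: AgentRun, max_iterations: int = 10) -> int:
    """Run one task per iteration until all tasks pass or the limit is hit.

    Returns the number of iterations actually used.
    """
    learnings: list[str] = []  # state lives outside the agent's context
    for iteration in range(1, max_iterations + 1):
        pending = [t for t in tasks if not t["passes"]]
        if not pending:
            return iteration - 1
        task = pending[0]  # exactly one task per iteration
        # Each call models a brand-new session that only sees the task and the learnings.
        passed, note = run_agent(task, list(learnings))
        learnings.append(note)  # save learnings for future iterations
        task["passes"] = passed
    return max_iterations
```

The important design choice is that nothing except the task list and the learnings survives between iterations, so a failed attempt cannot pollute the next one.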
Ralph Loop is a really powerful pattern to get things done: it just repeats the task until it succeeds, making agent execution more reliable. "Deterministically bad" but effective.
But this approach only works if you have good task decomposition, clear completion criteria, and mature SDLC practices with strong validations and feedback loops. Otherwise the agent will just generate a ton of mess.
#ai #engineering #patterns
Geoffrey Huntley
Ralph Wiggum as a "software engineer"
How Ralph Wiggum went from 'The Simpsons' to the biggest name in AI right now - Venture Beat
Here's a cool little field report from a Y Combinator hackathon event where they put Ralph Wiggum to the test.
"We Put a Coding Agent in a While Loop and It Shipped
Skill Packaging
Engineering teams are actively building internal collections of skills for agents: code review, troubleshooting, design preparation, onboarding, security practices.
And it looks great until you hit the question: how do you distribute those skills across dozens of teams and multiple harnesses? For Claude you need to put skills into .claude, for Cursor into .cursor, for Gemini into .gemini, etc. And things become even messier when you need to roll out updates.
To solve this problem, big companies mostly build their own in-house solutions. Smaller companies usually just copy files from some shared repository and manage this complexity manually.
I don't like reinventing the wheel, so when my team faced the same problem, we started looking for an existing solution we could reuse. And the only actively maintained tool we managed to find was apm by Microsoft.
APM is a package manager for prompts, skills, and MCPs. In other words, it's Maven or go mod for agents.
APM package structure:
my-package/
├── apm.yml
└── .apm/
    ├── instructions/
    │   └── my.instructions.md
    ├── skills/
    │   └── my-skill/
    │       └── SKILL.md
    ├── agents/
    └── prompts/
To install the package in a target repo you need to define apm.yml with a list of required dependencies:
name: my-project
version: 1.0.0
targets:
  - claude
  - copilot
dependencies:
  apm:
    - <git-address>/my-package
    - <git-address>/another-package
  mcp: []
After that you just run:
apm install
and the required skills will be installed into the corresponding harness folders (.claude, .copilot, etc.).
The tool works with GitHub and on-prem git installations like GitLab.
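To make the fan-out into harness folders concrete, here is a toy Python sketch. It is not how apm is implemented: the `HARNESS_DIRS` mapping only reuses the folder names mentioned in this post, and the `skills/<name>/SKILL.md` layout inside each folder and the `plan_install` helper are assumptions made for illustration.

```python
from pathlib import PurePosixPath
from typing import Dict, List

# Assumed mapping from target name to harness folder (names taken from the post).
HARNESS_DIRS: Dict[str, str] = {
    "claude": ".claude",
    "cursor": ".cursor",
    "gemini": ".gemini",
    "copilot": ".copilot",
}


def plan_install(skills: List[str], targets: List[str]) -> List[str]:
    """Return the destination path of every skill for every target harness."""
    plan: List[str] = []
    for target in targets:
        if target not in HARNESS_DIRS:
            raise ValueError(f"unknown target: {target}")
        root = PurePosixPath(HARNESS_DIRS[target])
        for skill in skills:
            # Hypothetical layout: one folder per skill with a SKILL.md inside.
            plan.append(str(root / "skills" / skill / "SKILL.md"))
    return plan


paths = plan_install(["my-skill"], ["claude", "copilot"])
```

One source package, many destinations: this is the part that becomes painful to maintain by hand once updates have to be rolled out to every harness.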
APM is not perfect. It has some unpleasant (but not critical) issues, and sometimes you can really feel that it was heavily vibe-coded in Python.
But despite all that, the tool actually works: you have a spec to define your skill and prompt packages, and you can distribute and update them with a simple apm update. And on top of the apm dependencies format it's pretty easy to vibe-code your own internal skills marketplace.
#ai #engineering #agents
GitHub - microsoft/apm: Agent Package Manager
The Fearless Organization
The most dangerous teams are the quiet teams. There are no disagreements, no bad news, no conflicts. It looks like harmony until a real incident happens.
That's the topic of the book The Fearless Organization: Creating Psychological Safety in the Workplace for Learning, Innovation, and Growth by Amy C. Edmondson.
Amy is a professor of Leadership and Management at Harvard Business School. She has studied psychological safety and its impact on team performance for many years across different organizations.
She defines psychological safety as follows:
"a belief that one will not be punished or humiliated for speaking up with ideas, questions, concerns, or mistakes, and that the team is safe for inter-personal risk taking."
What it means in practice:
- people are not afraid to ask questions
- they do not hide problems
- they are not afraid to look stupid
- they do not avoid conflicts
- they can freely express their opinions and bring suggestions
Why does it matter?
The book gives a good example. Imagine a doctor prescribes treatment for a child. A nurse notices that doctors usually prescribe drug A in such cases, but this time it is missing from the prescription.
In a team with high psychological safety, the nurse will clarify this with the doctor and may help prevent a medical error.
In a team with low psychological safety, she may be afraid to ask. And the consequences can be dramatic.
The core idea is simple. But the book contains a lot of real stories where a low level of psychological safety led to dramatic consequences (e.g. the Volkswagen emissions scandal, pilot mistakes that caused plane crashes). The author repeatedly highlights that the more complex and critical the profession is, the more important psychological safety becomes.
How does this relate to our daily work?
We as leaders are responsible for the psychological climate in the team: how well we listen to people, accept different opinions, react to questions, mistakes, or bad news. It's our daily routine that either helps the team become more effective, or leads people to hide problems and the real state of things.
Overall, I really liked the book. It explains the idea in simple language with many real examples. And, importantly for me, all arguments and recommendations are supported by sociological research, experiments and practical psychology.
So psychological safety is not just an idea. It is a proven behavioral model and a set of practices that can actually help leaders build better teams.
P.S. One of the best real examples of psychological safety is Pixar. I wrote about it earlier in my overview of Creativity, Inc.: Overcoming the Unseen Forces That Stand in the Way of True Inspiration: parts 1, 2, 3.
#booknook #softskills #leadership
Agent Readiness Framework
A few weeks ago I wrote that adopting coding agents requires strong engineering practices.
Test stability, linting, documentation, and security controls matter much more than a particular harness or model.
The Agent Readiness framework is an attempt to formalize these criteria for a particular repository and define how much autonomy can be safely delegated to agents.
The framework evaluates repos across 8 dimensions:
- Style & Validation
- Build System
- Testing
- Documentation
- Dev Environment
- Code Quality
- Observability
- Security & Governance
Based on these dimensions, the framework defines 5 levels of repo maturity:
🔸 Level 1: Functional. Basic checks: README, linters, unit tests.
🔸 Level 2: Documented. Detailed documentation and basic automations: AGENTS.md, reproducible dev env, contribution guides.
🔸 Level 3: Standardized. E2E tests, observability, security scanning, maintained documentation.
🔸 Level 4: Optimized. Fast validation loops, canary deployments, build optimization. The process is optimized for fast feedback.
🔸 Level 5: Autonomous. Task decomposition, multi-service orchestration, self-healing logic, auto-remediation.
The idea is simple: the higher the maturity level, the more predictable and reliable the agent results. But looking at these levels, I can see that most repos are actually somewhere between Level 1 and Level 3.
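One way to turn per-level checks into a single maturity level is sketched below in Python. This is only an assumption about how such scoring could work, not the framework's documented algorithm: the gating rule (a level counts only if every lower level also passes), the 0.8 threshold and the `maturity_level` helper are all hypothetical.

```python
from typing import Dict


def maturity_level(pass_rate_by_level: Dict[int, float], threshold: float = 0.8) -> int:
    """Return the highest level N where levels 1..N all meet the threshold (0 if none)."""
    achieved = 0
    for level in range(1, 6):
        # A missing level is treated as having no passing checks.
        if pass_rate_by_level.get(level, 0.0) < threshold:
            break
        achieved = level
    return achieved


# Share of passed checks per level for an imaginary repo.
repo = {1: 1.0, 2: 0.9, 3: 0.5, 4: 0.8, 5: 0.0}
level = maturity_level(repo)
```

With this rule, good Level 4 scores do not compensate for gaps at Level 3: the imaginary repo above stays at Level 2 until its Level 3 checks improve.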
The framework authors also provide a tool that measures these criteria and the maturity level automatically, but it is available only after registration and relies on proprietary APIs. You can find scanned examples at https://factory.ai/agent-readiness.
There is also an open-source alternative: https://github.com/kodustech/agent-readiness. The project doesn't look active, but it gets the job done. It analyzes the repo and generates a report with the overall maturity level, findings for each dimension, and suggestions for improvements. Some rules are not very accurate, and it looks like the project was mainly designed for Python and JS code verification. But the tool still gives you a good sense of what to pay attention to in your codebase.
What I like about this framework is that it shows that agent effectiveness is actually limited by the maturity of engineering practices. And it provides measurable and actionable results that are easy to convert into an improvement plan for a particular repo.
#ai #engineering