TechLead Bits
About software development with common sense.
Thoughts, tips and useful resources on technical leadership, architecture and engineering practices.

Author: @nelia_loginova
Are You Ready for Coding Agents?

Coding agents are booming. Only the lazy haven’t yet talked about how they built their own agent. But the reality is much more complex. To get the benefits from AI, your development ecosystem must be ready for it.

Garbage in -> garbage out, remember?
Low coverage, flaky tests, undefined code style, long verification cycle, poor documentation. Add AI-generated code to this and you will increase the entropy and reduce overall system stability.

To produce predictable results the engineering infrastructure must be stable.

By infrastructure I mean:
🔸 Linters and automated code style verification.
🔸 High unit test coverage (>=80%).
🔸 Contract tests for all public APIs.
🔸 System, integration, and E2E tests that run at least once a day.
🔸 No flakiness. You must fully trust your tests and CI process, otherwise you cannot guarantee that the agent won't break anything.
🔸 Security gates. Secret management, vulnerability checks, SAST verification.
🔸 Documentation. Requirements, architecture, guides, internal agreements. Everything that helps the agent understand how we work.

The most non-obvious part here is test flakiness.
What's the problem with just rerunning the test?
Developers know the context, the agents do not. It means that they will try to fix the test, making it weaker, or modify the code, introducing a bug. The overall result is worse code generation and increased maintenance overhead. So each rerun must be treated as a bug report, not a solution.
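
To make the "rerun is a bug report" idea concrete, here is a minimal sketch in Python. It runs the same test several times on unchanged code and labels it flaky when the outcomes differ. The names classify_test and make_alternating_test are hypothetical helpers invented for this illustration, not part of any test framework.

```python
from collections import Counter
from typing import Callable


def classify_test(test: Callable[[], None], runs: int = 5) -> str:
    """Run one test several times and label it stable-pass, stable-fail or flaky."""
    outcomes = Counter()
    for _ in range(runs):
        try:
            test()
            outcomes["pass"] += 1
        except AssertionError:
            outcomes["fail"] += 1
    if outcomes["fail"] == 0:
        return "stable-pass"
    if outcomes["pass"] == 0:
        return "stable-fail"
    # Mixed outcomes on identical code: file a bug instead of rerunning.
    return "flaky"


def make_alternating_test() -> Callable[[], None]:
    """Build a deterministic stand-in for a flaky test: it fails on every second call."""
    calls = {"n": 0}

    def test() -> None:
        calls["n"] += 1
        assert calls["n"] % 2 == 1

    return test


print(classify_test(make_alternating_test()))  # flaky
print(classify_test(lambda: None))             # stable-pass
```

In a real pipeline the same idea would sit on top of your test runner's results: a test that needed a rerun to pass gets a ticket, not a green check.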

If you check how different companies adopt AI, you will notice that all success stories are built on existing powerful CI/CD processes that can safely check AI agent output (Google, Claude Code, Uber, Airbnb).
AI adoption doesn't just bring new tools and processes, it also forces us to apply the best engineering practices we already have.

#engineering #ci #ai #agents
The Minto Pyramid

Do you know how to structure your docs to make them readable?

The Minto Pyramid Principle is a very famous concept for organizing docs, presentations and your own thoughts. I came across it many times in different resources, so I decided to read the book as the original source of truth.

Surprisingly, the book was difficult for me, mostly because of the academic language and the many examples from economics and marketing. It took me 4 months to finish it and 3 more months to prepare its overview 😃.

The idea is simple:
🔸 Put the main message at the top. It can be a key point, idea, or even a question.
🔸 Then add supporting arguments. Keep them consistent: arguments should be of the same type and detail level.
🔸 On the next level add facts and details that back those arguments.
🔸 Each level should summarize what’s below it.
🔸 You can go through the pyramid in both directions: top-down (presentations, explanations) or bottom-up (research).
🔸 Use either induction or deduction when moving between arguments, but don’t mix both.

The overall concept looks obvious. It even reminds me of a math logic course at the university.

But in reality I've seen a lot of unstructured, difficult-to-follow documents. And good structure is important not only for humans, but now it's even more important for AI agents. That’s where this book can help. It doesn't just teach how to write readable docs, it teaches you how to organize your thoughts and ideas so others can easily understand you.

#booknook #softskills #communications #documentation
Tech Blogs Reading List

Any technical leader or architect needs to stay on top of the industry trends and develop a broad perspective on architectural solutions and engineering practices.

For that, I read blogs from big tech companies. They give me a sense of what’s going on, show real-world architecture examples, and inspire new ideas that I can try with my teams.

List of blogs:
- https://www.uber.com/en-IN/blog/engineering
- https://medium.com/airbnb-engineering
- https://engineering.fb.com/
- https://www.linkedin.com/blog/engineering
- https://netflixtechblog.com/
- https://medium.com/@Pinterest_Engineering
- https://engineering.atspotify.com/
- https://aws.amazon.com/blogs/architecture/
- https://github.blog/engineering/
- https://blog.booking.com/
- https://developers.openai.com/blog/
- https://research.google/blog/
- https://www.anthropic.com/engineering
- https://www.anthropic.com/research

I use Feedly to keep everything in one place and check it 1-2 times a week. Most of these blogs are available on the free plan.

I’m always curious what others are reading. So if you have good resources, feel free to share in the comments 📚.

#tips #learning #engineering
Spec-Driven Development

Do you like frameworks? I'm quite skeptical about them.

They try to solve everything at once and end up adding complexity where it’s not really needed. But engineers love inventing frameworks. Vibecoding is no exception. A new family of frameworks is called spec-driven development (SDD).

The main idea is to write a "spec" before writing code with AI.
A spec is a structured description of WHAT should be done and WHY. In classic terms, a spec is like an interface, generated code is like an implementation.

Main principles:
🔸 Spec-first. A thoughtful spec is written first, reviewed and then used in a development workflow.
🔸 Spec-driven. The spec is kept in the git repo and it’s used to evolve and maintain the feature.
🔸 Spec-sourced. Only the spec is edited by a human, the code is edited by the AI agent.

Development workflow:
intention -> requirements -> design -> tasks -> implementation


Popular implementations:
- https://github.com/github/spec-kit/
- https://github.com/Fission-AI/OpenSpec
- https://kiro.dev/
- https://github.com/bmad-code-org/BMAD-METHOD

From my perspective it looks like an attempt to bring some control over vibecoding. But as a result the agent generates a bunch of markdown files to review, and I cannot say it's much easier than code review (if not harder, since LLMs tend to be verbose).

There’s no common opinion on SDD yet. I know people who like it, and people who don't. Like any AI tool, it needs experimentation and adaptation to your specific tasks.

P.S. The tools mentioned above contain interesting prompts that can be reused without SDD itself.

#engineering #ai #sdd
Rethinking High Availability

Have you noticed how the world has changed? Again.

A year ago I used to say that in public clouds three availability zones are enough. Each zone is in its own data center, data centers are located within ~100 km of each other, and the probability of losing two at the same time was considered very low.

There is no such assumption anymore.
Today, losing two or even all three availability zones in a single region is no longer extremely rare. It’s something that can actually happen.

What does it mean in practice?
If your business requires 99.5% system availability or higher, relying on a single region is not an option.

Technically it means:
- Using multiple regions even in public clouds (or even different cloud providers).
- Switching to a cold or hot standby setup.
- Storing backups in a different region.
- Regularly testing DR scenarios.
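
To put numbers behind the availability target, here is a small back-of-the-envelope calculation in Python. It is a sketch under one strong assumption: regions fail independently. That is exactly the kind of assumption that stopped holding for availability zones, so treat the combined figure as an optimistic upper bound. The function names are made up for this example.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def yearly_downtime_hours(availability: float) -> float:
    """Downtime budget per year for a given availability (0.995 means 99.5%)."""
    return (1.0 - availability) * HOURS_PER_YEAR


def combined_availability(single: float, regions: int) -> float:
    """Availability when any one of `regions` regions is enough to serve traffic.

    Assumes region failures are independent, which is an optimistic simplification.
    """
    return 1.0 - (1.0 - single) ** regions


print(round(yearly_downtime_hours(0.995), 1))     # 43.8 hours per year
print(round(combined_availability(0.99, 2), 6))   # 0.9999
```

So 99.5% leaves less than two days of downtime per year, and a second region only helps as much as its failures are really uncorrelated with the first one.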

Disasters do happen.
The question is whether your system is ready or not.

#engineering #systemdesign #reliability
SDD: OpenSpec

Last week I wrote about spec-driven development (SDD) as a new wave of frameworks for vibecoding. I tried a few of them. If you’re just exploring SDD, I’d suggest starting with OpenSpec.

Why?
For me, it felt the simplest and least overloaded.

To get started, you basically need 3 commands:

/opsx:propose
Here you describe what you want to build: bugfix, feature, design or something else (but using SDD for bugfix still feels like overkill).
What I liked:
- You get a structured design (design.md)
- It explains why decisions were made
- It adds open questions that need clarification and more thought
- You get tasks.md with detailed steps of what will be done
At this stage I had to iterate a few times because some assumptions were wrong. But in the end I identified gaps in the initial feature definition and got a clear plan of future changes. The good thing is that changes at this stage are cheap.

/opsx:apply
If you’re ok with the design, you can ask the agent to execute the plan. It's possible to execute all tasks at once, split them or run multiple agents for parallel execution. Implemented steps are marked as completed in tasks.md.

/opsx:validate
This is a control step. You can request the agent to validate that the implementation matches the design. Because… agents still drift and make mistakes.

Of course, OpenSpec contains other interesting commands, but you can add them later when you get used to SDD.

How is it different from Plan mode?
Plan mode mostly generates tasks.md. No design. No real spec.
SDD reframes the work into a design-first approach and adds additional prompts to do it well.
And honestly, I liked the preparation phase result much more.

I wouldn’t use it everywhere, but for refactoring or feature development it looks really good.

#engineering #ai #sdd
Building AI-Powered Team

AI adoption is one of the biggest challenges and at the same time one of the biggest opportunities for business.
Some teams report a significant productivity boost. Others say: “AI might be useful for some tasks.”

So what’s the difference?
AI adoption is not about access to the tools. Just buying licenses and giving them to engineers doesn't work. The team needs to rethink how they work and integrate these tools into their daily workflow. And that’s a classic change management task.

On this topic, I recently came across the GitHub Internal Playbook for building an AI-powered workforce. They highlight that AI adoption is not really a technical problem, it’s a human one.

GitHub suggests 8 pillars to drive adoption at the organization level:
- AI advocates. Internal champions who scale adoption through peer-to-peer influence and feedback.
- Clear policies. Defined rules for using AI.
- Learning & development. External and internal training and education.
- Metrics. Track adoption, engagement, and business impact.
- Ownership. A central owner who orchestrates the program and drives the overall strategy.
- Executive support. Visible leadership commitment and strategic vision.
- Right tools. Different tools for different roles.
- Communities. Peer-to-peer learning, knowledge sharing, and collaborative problem-solving.

And in reality, the key part here is the people on the ground, the experts who drive the change, adapt the tools to real tasks, and teach others. This is also covered in more detail in the companion article Activating your internal AI champions.

You can’t roll out AI top-down.
You can’t standardize it with one template for everyone. Every team has its own context. Without understanding it, any “unified approach” will fail.

#leadership #ai
Agent Harness

Harness is a new buzzword introduced by modern AI.
Let's check what it is and why it matters.

The term harness refers to the logic around the LLM that controls and guides how an agent operates. It's not the agent itself but the tools and guardrails that help it achieve better results.

A harness typically includes:
🔸 System prompts
🔸 Tools, skills, MCPs and their descriptions
🔸 State & memory (current task state, past runs, intermediate states)
🔸 Planning & task decomposition
🔸 Context engineering strategies
🔸 Safety & guardrails (allowed tools, rate limiting, prompt injection protection)
🔸 Bundled infrastructure (filesystem, sandbox, browser)
🔸 Subagent orchestration logic
🔸 Hooks/middleware for deterministic execution (compaction, continuation, lint checks)
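
A toy illustration of a few of these pieces (system prompt, tool allow-list, state, step limit) could look like the Python sketch below. Everything in it is an assumption made for the example: the Harness class, the "tool:<name>:<arg>" reply convention and fake_model are invented here and do not mirror how any real product is implemented.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical stand-in for an LLM call: takes the assembled prompt and returns
# either "tool:<name>:<arg>" or a final answer. Not a real provider API.
Model = Callable[[str], str]


@dataclass
class Harness:
    model: Model
    system_prompt: str
    tools: Dict[str, Callable[[str], str]]  # allow-list of tools (guardrail)
    max_steps: int = 5                      # hard stop against endless loops
    history: List[str] = field(default_factory=list)  # state kept by the harness

    def run(self, user_request: str) -> str:
        self.history.append(f"user: {user_request}")
        for _ in range(self.max_steps):
            prompt = "\n".join([self.system_prompt, *self.history])
            reply = self.model(prompt)
            if not reply.startswith("tool:"):
                return reply  # final answer
            parts = reply.split(":", 2)
            if len(parts) != 3 or parts[1] not in self.tools:
                self.history.append("tool-error: unknown or malformed tool call")
                continue
            self.history.append(f"tool-result: {self.tools[parts[1]](parts[2])}")
        return "stopped: step limit reached"


def fake_model(prompt: str) -> str:
    """Scripted model: asks for a tool once, then answers with the tool result."""
    if "tool-result: " in prompt:
        return "done: " + prompt.rsplit("tool-result: ", 1)[1]
    return "tool:upper:hello"


harness = Harness(fake_model, "You are a helpful agent.", {"upper": str.upper})
print(harness.run("shout hello"))  # done: HELLO
```

Note that fake_model never changes between runs. Swap the system prompt, the tool list or the step limit and the same "model" behaves differently, which is the whole point of the harness idea.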

Well-known examples of harness ecosystems include Claude Code, Cursor, LangChain.

The overall trend is that each model provider now builds and promotes its own harness. But because each provider uses different system prompts, model tuning techniques and context management strategies, the same model in different ecosystems will produce different results.

So the same model does not mean the same agent. And the real competition is no longer between models. It’s between harnesses fighting for your workflow and your budget.

#ai #engineering
CliftonStrengths 34

I recently took the CliftonStrengths 34 assessment, so today I will share what it is and how it can be useful for your career.

CliftonStrengths is a framework that identifies your natural talents that help create value at work.
It was launched by Gallup in 2001, and since then more than 26 million people have taken it.
So it's based on real data and many years of consulting experience.

The main idea is that we should focus on our strengths to achieve results and not try to improve our weaknesses.

How it works:
- 200 questions
- 4 strength domains: executing, influencing, relationship building, and strategic thinking.
- 34 strength areas within those domains.
- All 34 themes are ranked in a personal order from the strongest to the weakest.
- Top-10 are our main talents to focus on.

The interesting part is that every strength has both a positive and a negative side. It can help you succeed or hold you back.
For example: Learner. A person with this strength quickly picks up new topics and constantly extends their knowledge. But it's easy to get stuck in a “forever student” mode.

The test is really helpful from a self-reflection perspective:
🔸 Once you know your strengths, you can rely on the good parts and mitigate the downsides.
🔸 The less you do the work that isn’t natural to you, the more productive and energized you are.
🔸 We assume others think and work like we do. But they don't. And this is our advantage to use.

What did it give me? First of all, I realized that I really do the work I'm naturally good at (hello, imposter syndrome). Second, I started noticing my strengths in real situations and using them more consciously.

So the assessment is a helpful tool to understand what to focus on to achieve better results. And it’s a good starting point to rethink your day-to-day activities and align them more with what you enjoy.

#softskills #leadership #productivity
Inside the Context Window

What makes your work with agents efficient? The chosen model? The harness? The clarity of instructions?
I would say that first of all it's the quality of the context you provide.

Context is everything the model sees before it generates a response.
Two facts to know about the context:
1. It's limited (and costs you money 💰).
2. The longer the context, the worse the results.

So context engineering is a set of practices to fill the context with just enough information to get the desired results. The main goal is to balance the amount of context given: not too little and vague, not too much and detailed.

The first step in context engineering is to understand what the context actually contains. And it’s not just your prompt.

Typical context structure:
🔸 System prompts & instructions: the hidden layer of system prompts, safety policies, behavioral rules, role definition. Usually it's part of the harness and you cannot change it.
🔸 Project context: AGENTS.md/CLAUDE.md, repo structure, settings. It's added as a first prompt to any session you open with the agent.
🔸 Available tools: skill descriptions, MCPs, available CLIs.
🔸 Retrieved information: loaded files, data from a RAG system.
🔸 State & history: the current conversation, including user, model and tool responses.
🔸 Reasoning: intermediate reasoning results (thinking mode).
🔸 Long-term memory: knowledge base from previous conversations like user preferences, summaries of working sessions, facts the agent was asked to remember for future use.
🔸 Your prompt: the actual user request.

As you can see, the context is already filled with a lot of information before you even start the real work. To make agents efficient, keep their context clean and focused. Don't overload it with unnecessary information.
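
To see how quickly the window fills up, you can do rough accounting per context part. The Python sketch below is only an illustration: the 4-characters-per-token ratio is a crude rule of thumb (real tokenizers differ), and estimate_tokens, context_report and the sample sizes are all made up for the example.

```python
from typing import Dict


def estimate_tokens(text: str) -> int:
    """Very rough estimate: assume about 4 characters per token."""
    return max(1, len(text) // 4)


def context_report(parts: Dict[str, str], window: int) -> Dict[str, float]:
    """Share of the context window used by each part, in percent."""
    return {name: 100.0 * estimate_tokens(text) / window for name, text in parts.items()}


# Placeholder strings stand in for real prompt content of the given size.
parts = {
    "system prompt": "x" * 8000,
    "project context": "x" * 4000,
    "tool descriptions": "x" * 12000,
    "user prompt": "x" * 400,
}
report = context_report(parts, window=10_000)
for name, share in report.items():
    print(f"{name}: {share:.0f}%")
print(f"total: {sum(report.values()):.0f}%")  # total: 61%
```

In this made-up setup the actual request takes 1% of the window, while tool descriptions alone take 30%. That is usually the first place to look when cleaning up the context.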

#ai #engineering
ReasoningBank

Currently AI agents have one major limitation: they cannot learn. I mean they don't learn from their experience or from the results of completed tasks. Once the model is trained, all we can do is tune our prompts or enrich results with domain data from RAG.

Researchers from Google started exploring how to overcome this limitation and introduced a concept called ReasoningBank.

The overall idea is simple:
1. The agent writes down the result of successful or failed tasks into a dedicated md file.
2. During task execution, the agent searches the ReasoningBank and pulls relevant memories into the context.
3. Then it uses an LLM-as-a-judge approach to self-evaluate the result, analyze the trajectory of reasoning, and extract success insights or failure reasons.

Each file has the following structure (very similar to skills):
- Title: identifier of the core strategy.
- Description: short summary of the memory item.
- Content: reasoning steps, decision explanation, or operational insights extracted from past experience.
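
The item structure above maps naturally onto a small data class. The Python sketch below shows the shape of a memory item plus a deliberately naive retrieval step. Ranking by word overlap is an assumption made here for brevity, it is not the retrieval method from the paper, and the sample items are invented.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MemoryItem:
    title: str        # identifier of the core strategy
    description: str  # short summary of the memory item
    content: str      # reasoning steps or insights from past experience


def retrieve(bank: List[MemoryItem], task: str, top_k: int = 1) -> List[MemoryItem]:
    """Rank items by word overlap with the task (naive stand-in for real retrieval)."""
    task_words = set(task.lower().split())

    def score(item: MemoryItem) -> int:
        words = set(f"{item.title} {item.description}".lower().split())
        return len(task_words & words)

    ranked = sorted(bank, key=score, reverse=True)
    return [item for item in ranked[:top_k] if score(item) > 0]


bank = [
    MemoryItem(
        title="Check pagination before concluding",
        description="search results may span several pages",
        content="Open the next page before reporting that an item is missing.",
    ),
    MemoryItem(
        title="Prefer filters over scrolling",
        description="use site filters to narrow product lists",
        content="Apply category filters first, then scan the shortened list.",
    ),
]

for item in retrieve(bank, "fix pagination on search results page"):
    print(item.title)  # Check pagination before concluding
```

Whatever the retrieval method, the selected items end up in the prompt, so top_k directly controls how much extra context each task pays for.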

To be honest, benchmark results compared to other agent memory approaches do not look extremely impressive:
ReasoningBank without scaling outperformed memory-free agents by 8.3% on WebArena and 4.6% on SWE-Bench-Verified.


At the same time, this approach adds even more data to the context. And context, as we know, directly affects both model behavior quality and usage cost.

The official paper contains interesting research details, including particular prompts and measurements.

From my perspective, the idea and its implementation are very similar to skills or other long-term agent memories (e.g. in Claude Code). But the overall direction of making agents capable of learning from their own experience looks really promising.

#ai #engineering #news
AI Engineering

I strongly believe that if you want to use any technology effectively, you need to understand how it works under the hood. Especially in software engineering.

So if you haven’t looked into LLM internals yet, I’d highly recommend reading AI Engineering by Chip Huyen. The book was published in December 2024. And as AI is moving extremely fast, you might think it’s already outdated. Yes and no.

The book focuses on fundamentals. And they don’t really change that fast. You won’t find hype topics like skills, harnesses, or agent orchestration there. But for building a structured understanding of how AI works, you don't actually need them.

What I personally found useful:
🔸 Core LLM concepts: tokenization, training and post-training processes, dataset preparation. This part is very similar to the Machine Learning Crash Course from Google.
🔸 Model evaluation: quite a complex but interesting topic about model output results and their comparison. The book covers ranking, model specialization, public benchmarks and the AI-as-a-judge approach.
🔸 Prompt engineering: a good reference on context and prompting. Additionally, the author describes different security aspects of using prompts. That part really extended my thoughts about what can go wrong.
🔸 Finetuning: a deep dive into different ways to optimize models. You need to be a good mathematician to understand this part. So I was really glad I'm not an ML engineer 😃 (huge respect to all ML experts, it's really hard).
🔸 User feedback: basic patterns on how to collect feedback, what to measure and why, common pitfalls.

To sum up, this book is really great for structuring your knowledge about modern AI systems. Once you have that foundation, it becomes much easier to navigate all the new tools, patterns and paradigms that appear almost every month.

#booknook #ai #engineering
Ralph Loop

The non-deterministic nature of AI sometimes produces very interesting engineering solutions. One such example is the Ralph (or Ralph Wiggum) Loop.

This is an AI coding pattern inspired by The Simpsons character Ralph Wiggum, known for saying weird things with high confidence 🙃.

The idea is simple: The agent can be dumb in a single iteration. But if it keeps retrying with feedback long enough, it eventually converges.

The loop steps:
Start a new agent/subagent -> load task + memory -> execute 1 selected task -> run validation -> save learnings -> commit progress -> repeat

The loop finishes when all tasks have passes:true or when it reaches the maximum number of iterations (default is 10).
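
The control flow of the loop can be sketched in a few lines of Python. This is only a skeleton under stated assumptions: run_agent is a hypothetical callback that stands for "start a fresh agent session, execute the task, run validation", the learnings list stands in for progress files, and the git commit step is left out.

```python
from typing import Callable, Dict, List

Task = Dict[str, object]  # e.g. {"name": "login form", "passes": False}


def ralph_loop(
    tasks: List[Task],
    run_agent: Callable[[Task, List[str]], bool],
    max_iterations: int = 10,
) -> int:
    """Run one task per iteration until all pass or the limit is hit.

    Returns the number of iterations that actually executed a task.
    """
    learnings: List[str] = []  # state lives outside the agent's context
    for iteration in range(1, max_iterations + 1):
        pending = [t for t in tasks if not t["passes"]]
        if not pending:
            return iteration - 1
        task = pending[0]  # exactly one task per iteration
        task["passes"] = run_agent(task, learnings)
        learnings.append(f"iteration {iteration}: {task['name']} -> {task['passes']}")
    return max_iterations


def fake_agent(task: Task, learnings: List[str]) -> bool:
    """Scripted agent: fails the first attempt, succeeds once a note about the task exists."""
    return any(str(task["name"]) in note for note in learnings)


tasks = [{"name": "login form", "passes": False}, {"name": "signup form", "passes": False}]
print(ralph_loop(tasks, fake_agent))  # 4
```

The scripted agent needs two attempts per task, so two tasks finish in four iterations, and the second attempt only succeeds because the first one left a note behind.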

But the real value of the technique is not in retries, it's in context engineering strategy under the hood:
🔸 One loop executes only one task. It keeps the agent focused.
🔸 Each iteration starts a new agent session. State lives outside the context, keeping it clean between iterations. State is stored in git history, progress.txt and long-term-memory files.
🔸 Tasks are delegated to subagents. The main context is not polluted with task execution details and validations.
🔸 AGENTS.md is updated on each iteration. It is a live artifact that contains discovered patterns, learnings and conventions so future iterations can benefit from those findings and do not repeat previous mistakes.
🔸 AGENTS.md contains explicit validations for the feedback loop. It usually defines linters and typechecks, build and test execution commands.

Ralph Loop is a really powerful pattern to get things done: it just repeats the task until it succeeds making agent execution more reliable. "Deterministically bad" but effective.

But this approach only works if you have good task decomposition, clear completion criteria, and mature SDLC practices with strong validations and feedback loops. Otherwise the agent will just generate a ton of mess.

#ai #engineering #patterns
Skill Packaging

Engineering teams are actively building internal collections of skills for agents: code review, troubleshooting, design preparation, onboarding, security practices.

And it looks great until you hit the question: how do you distribute those skills across dozens of teams and multiple harnesses? For Claude you need to put skills into .claude, for Cursor into .cursor, for Gemini into .gemini, etc. And things become even messier when you need to roll out updates.

To solve this problem big companies mostly build their own in-house solutions. Smaller companies usually just copy files from some shared repository and manage this complexity manually.

I don’t like reinventing the wheel, so when my team faced the same problem, we started looking for an existing solution we could reuse. And the only actively maintained tool we managed to find was apm by Microsoft.

APM is a package manager for prompts, skills, and MCPs. In other words, it’s like Maven or Go modules, but for agents.

APM package structure:
my-package/
├── apm.yml
└── .apm/
    ├── instructions/
    │   └── my.instructions.md
    ├── skills/
    │   └── my-skill/
    │       └── SKILL.md
    ├── agents/
    └── prompts/

To install the package in a target repo you need to define apm.yml with a list of required dependencies:
name: my-project
version: 1.0.0
targets:
  - claude
  - copilot
dependencies:
  apm:
    - <git-address>/my-package
    - <git-address>/another-package
  mcp: []

After that you just run:
apm install

and the required skills will be installed into the corresponding harness folders (.claude, .copilot, etc.).

The tool works with GitHub and on-prem git installations like GitLab.

APM is not perfect. It had some unpleasant (but not critical) issues, and sometimes you can really feel that it was heavily vibe-coded in Python.

But despite all that, the tool actually works: you have a spec to define your skills/prompts packages, and you can distribute and update them with a simple apm update. And on top of the apm dependencies format it’s pretty easy to vibe-code your own internal skills marketplace.

#ai #engineering #agents
The Fearless Organization

The most dangerous teams are the quiet teams. There are no disagreements, no bad news, no conflicts. Looks like harmony until the real incident.

That's the topic of the book The Fearless Organization: Creating Psychological Safety in the Workplace for Learning, Innovation, and Growth by Amy C. Edmondson.

Amy is a professor of Leadership and Management at the Harvard Business School. She has studied the phenomenon of psychological safety and its impact on team performance for many years across different organizations.

She defines psychological safety as follows:
a belief that one will not be punished or humiliated for speaking up with ideas, questions, concerns, or mistakes, and that the team is safe for interpersonal risk taking.


What it means in practice:

- people are not afraid to ask questions
- they do not hide problems
- they are not afraid to look stupid
- they do not avoid conflicts
- they can freely express their opinions and bring suggestions

Why does it matter?
There is a good example from the book that explains that. Imagine a doctor prescribes treatment for a child. A nurse notices that doctors usually prescribe drug A in such cases, but this time it is missing.
In a team with high psychological safety, the nurse will clarify this with the doctor and may help prevent a medical error.
In a team with low psychological safety, she may be afraid to ask. And the consequences can be dramatic.

The core idea is simple. But the book contains a lot of real stories where a low level of psychological safety leads to dramatic results (e.g. Volkswagen emission scandal, pilot mistakes that caused plane crashes). The author repeatedly highlights that the more complex and critical the profession is, the more important psychological safety becomes.

How does this relate to our daily work?
We as leaders are responsible for the psychological climate in the team: how well we listen to people, accept different opinions, react to questions, mistakes, or bad news. It's our daily routine that either helps the team become more effective, or leads people to hide problems and the real state of things.

Overall, I really liked the book. It explains the idea in simple language with many real examples. And what is important for me, all arguments and recommendations are supported by sociological research, experiments and practical psychology.
So psychological safety is not just an idea. It is a proven behavioral model and set of practices that can actually help leaders build better teams.

P.S. One of the best real examples of psychological safety is Pixar. I wrote about it earlier in my overview of Creativity, Inc.: Overcoming the Unseen Forces That Stand in the Way of True Inspiration: parts 1,2,3.

#booknook #softskills #leadership
Agent Readiness Framework

A few weeks ago I wrote that adopting coding agents requires strong engineering practices.
Test stability, linting, documentation, security controls matter much more than a particular harness or model.

The Agent Readiness framework is an attempt to formalize these criteria for a particular repository and define how much autonomy can be safely delegated to agents.

The framework evaluates repos across 8 dimensions:
- Style & Validation
- Build System
- Testing
- Documentation
- Dev Environment
- Code Quality
- Observability
- Security & Governance

Based on these dimensions, the framework defines 5 levels of repo maturity:
🔸 Level 1: Functional. Basic checks: README, linters, unit tests.
🔸 Level 2: Documented. Detailed documentation and basic automations: AGENTS.md, reproducible dev env, contribution guides.
🔸 Level 3: Standardized. E2E tests, observability, security scanning, maintained documentation.
🔸 Level 4: Optimized. Fast validation loops, canary deployments, build optimization. Process is optimized for fast feedback.
🔸 Level 5: Autonomous. Task decomposition, multi-service orchestration, self-healing logic, auto-remediation.

The idea is simple: the higher the maturity level, the more predictable and reliable the agent's results. But looking at these levels, I can see that most repos are actually somewhere between Level 1 and Level 3.

The framework authors also provide a tool to automatically measure these criteria and the maturity level, but it's available only after registration and relies on proprietary APIs. You can find scanned examples at https://factory.ai/agent-readiness.

There is also an open-source alternative: https://github.com/kodustech/agent-readiness. The project doesn't look active, but it gets the job done. It analyzes the repo and generates a report with the overall maturity level, findings for each dimension, and suggestions for improvements. Some rules are not very accurate. It looks like the project was mainly designed for Python and JavaScript code verification. But anyway the tool gives you a good sense of what to pay attention to in your codebase.

What I like about this framework is that it shows that agent effectiveness is actually limited by the maturity of engineering practices. And it provides measurable and actionable results that are easy to convert into an improvement plan for a particular repo.

#ai #engineering
How Anthropic Writes Skills

Last week Anthropic published lessons learned about how they build agent skills internally. It's quite interesting to read recommendations from the company that introduced the concept in the first place.

Key ideas:
🔸 Don't be obvious. The model already knows how to code. A skill should provide instructions that change default agent behavior, not repeat the data the model was trained on.
🔸 Build a gotchas section. Add common mistakes and lessons learned. This helps the agent avoid repeating the same failures.
🔸 Use progressive disclosure. A skill is not just a SKILL.md. It can include additional files that are loaded on demand, reducing context overload.
🔸 Don't be too specific. Give the agent the information it needs, but leave the flexibility to adapt to the situation.
🔸 Separate configuration from instructions. Store setup data in config.json or collect required input from the user.
🔸 Write the description for the model, not for humans. A description should help the model understand when the skill should be invoked.
🔸 Use long-term memory. Skills can maintain their own data in a subdirectory and reuse it across executions.
🔸 Automate where possible. Not everything should be a prompt. Some actions can be automated with helper scripts and functions.

Unfortunately, the article doesn't provide any guidelines on how to evaluate skill effectiveness. It's still not clear how to understand if a skill actually works, how to compare two versions of the same skill, or how to detect that a skill is no longer useful.

The recommendations are based mostly on observations of how popular internal skills are structured. That's useful but not measurable.

So despite the fact that skills are the most powerful agent extension now, evaluating them remains one of the hardest engineering problems.

But that's another story.

#ai #engineering
Project Hail Mary

Technical books and articles are great, but sometimes my brain needs a break. Especially now, when AI is generating more and more new things to learn every day. One of my favorite ways to recharge is reading fiction, and I recently finished the very popular Project Hail Mary by Andy Weir.

I'm not a big sci-fi fan, but I definitely enjoyed this book.
Thanks to the recent movie adaptation, the story is probably familiar to many.

A man wakes up alone on a spaceship with no memory of who he is or why he's there. As his memories gradually return, he discovers that he's a scientist on a mission in another star system.
Humanity is facing extinction. The Sun is losing energy because of a mysterious organism called Astrophage. Nearby stars are also infected except Tau Ceti. A crew is sent there to find out why it's different and, hopefully, save Earth.
Unfortunately, only the main character survives the journey.
He starts his scientific research of the star and eventually notices a spacecraft on his radar.

And then the almost impossible happens: first contact with an alien. The problem is, how do you communicate when you don't even share the same way of producing speech? The answer is physics. I really liked the idea that laws of physics are universal, making math and science the foundation for building communication between two civilizations.

I won't spoil the rest, but the story is really engaging. Despite being a disaster novel, it contains a good dose of humor and places a strong emphasis on friendship, kindness, and mutual help. And when you finish it, you're left with a surprisingly warm feeling.

I watched the movie after finishing the book, and for once I can say the adaptation is actually good. Of course, it's much more compact and some details are simplified, but it stays remarkably close to the original story while preserving its emotional depth.

Overall, I loved it. I highly recommend both the book and the movie.

#offtop #booknook