<written by a human being>Yesterday I "discovered" a new way to work with coding agents most productively.
Of course, I already knew that agents have long been capable of spinning up other agents on their own - I've written about it several times. But during development I still want to have some degree of control over the process and what's happening, since AI does tend to forget certain things and drift away from the bigger goal.
Because despite a large context window, it's still not quite enough to keep the "final picture" in mind - though more than enough to focus on a specific piece of code.
But at some point, once the core foundation of the system being built is already in place and it's hard to go off the rails, I started noticing that most of my responses to the agent's questions were something along the lines of "Do whatever's architecturally correct." And the oversight process was effectively turning into mindless routine - the kind that makes sense to just delegate away.
So I started appending a request to the prompt for the next backlog task: spin up an independent agent on the senior Opus model to handle the task, monitor its execution, and make sure the branch gets pushed for review, revisions get made, and all project rules are followed. The request also says that at any decision point requiring a judgment call, the agent should act as an architect - prioritize architectural correctness and never drift toward hacks and patches just to pass the tests.
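To make that concrete, here's a minimal sketch of how such a request could be kept as a reusable prompt suffix. The exact wording, the build_task_prompt helper, and the PROJ-42 ticket are my illustration only, not a fixed recipe.

```python
# Hypothetical wording: the post describes the request only in general terms.
DELEGATION_SUFFIX = """\
Spin up an independent agent on the senior Opus model to handle this task.
Monitor its execution and make sure that:
- the branch gets pushed for review,
- review revisions get made,
- all project rules are followed.
At any decision point requiring a judgment call, act as an architect:
prioritize architectural correctness and never drift toward hacks and
patches just to pass the tests.
When the work is done, report back with a summary and close the ticket."""


def build_task_prompt(task_text: str) -> str:
    """Append the delegation request to a backlog task description."""
    return f"{task_text.strip()}\n\n{DELEGATION_SUFFIX}"


if __name__ == "__main__":
    # "PROJ-42" is a made-up ticket used only for illustration.
    print(build_task_prompt("PROJ-42: add pagination to the orders endpoint."))
```

Keeping the suffix in one place means every backlog task gets the same oversight rules, and tweaking them is a one-line change.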
This way I stopped switching my attention back to the development process every few minutes. Instead, 20-30 minutes later I get a report on the work done, a closed ticket, and readiness to pick up the next one.
And the next one can be picked up in the same session - because monitoring coding agents this way from the main session doesn't burn many tokens in it.
<written by a human being>A few months ago I wrote about how I was still working with no-code builders the old-fashioned way - manually clicking buttons and tweaking block parameters, since the architecture of these apps is closed off and you can't, for instance, reshape a workflow algorithm via API.
But after installing Playwright, it hit me that the time had come to delegate this too - because today I already trust the latest models with responsible work, and they perform really well, making fewer mistakes than I do.
I started with a backend application built on Directual, where I needed to rework around a dozen workflow scenarios, clear out the call queues in the database, and in doing so reduce the server load - which had been going through the roof lately due to some poorly designed processes.
The work was handled by Codex, which naturally first asked me to authenticate in the app, verified its access, and reported back that it was ready to begin.
I started carefully with a single scenario, reviewed everything it had done - found not a single error. Then I asked it to finish the job and optimize all the remaining scenarios that needed attention.
Twenty minutes later I received a report on the completed work, reviewed it, and broke into yet another blissful smile of satisfaction - one I still can't seem to get used to, even though it lights up my face every single day.
<written by a human being>Somewhere along the way, without really noticing, I switched into work mode on weekends - even though I used to rely on other activities to give my brain and attention that reset. Because without it, you don't start the next week with a fresh head, you start it with a loaded one, and your decision-making seriously suffers.
Maybe it'll stop after a while, but AI brought in a massive gust of fresh air that gave me a second wind - and even a third - so I go at development with huge enthusiasm and tackle complex problems that used to seem boring and not worth the time.
Now my work week ends together with my weekly Claude Code and Codex limits, which I scrape completely clean. And I wait impatiently for Monday, when they reset, so I can load the next task into my AI agents.
This, in my opinion, is exactly what a person's work should look like - it should literally make you impatient to act, with enthusiasm and energy that the process replenishes rather than drains.
What is this, ikigAI?
How Printing Press saves tokens, or your own personal sort of CLI
<written by a human being>Seems like everyone's gotten used to MCP by now, except there's one catch: after some (pretty short) amount of time the number of connected MCPs starts growing, and using them in every AI agent session gets expensive - because it means reading through bulky documents that, naturally, eat tokens.
When I'm working with a codebase, for example, I sync everything with a task manager, which requires hooking up the AI agent to it. With Linear that's an MCP server and a plugin, with Plane it's direct API calls through Python scripts (AI agents love writing those, what can you do).
So the API ends up way cheaper - there's a clean call to endpoints that return exactly the data you need in a structured format. What if you wrapped those endpoints in a CLI and called them directly from there?
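Here's a rough sketch of what "wrapping endpoints in a CLI" can look like, using only the Python standard library. The base URL, the endpoint paths, and the command names are hypothetical placeholders - this is not Plane's real API and not what Printing Press generates.

```python
import argparse
import json
import sys
import urllib.request

# Hypothetical values: replace with your task manager's real API.
BASE_URL = "https://tasks.example.com/api/v1"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="tasks", description="Thin CLI over a task-manager API")
    parser.add_argument("--token", required=True, help="API token")
    sub = parser.add_subparsers(dest="command", required=True)

    get_cmd = sub.add_parser("get", help="Fetch one issue by id")
    get_cmd.add_argument("issue_id")

    list_cmd = sub.add_parser("list", help="List issues in a project")
    list_cmd.add_argument("project")
    list_cmd.add_argument("--state", default="open")
    return parser


def build_url(args: argparse.Namespace) -> str:
    """Map a parsed command to the (hypothetical) endpoint it wraps."""
    if args.command == "get":
        return f"{BASE_URL}/issues/{args.issue_id}"
    return f"{BASE_URL}/projects/{args.project}/issues?state={args.state}"


def fetch(url: str, token: str) -> object:
    """Call the endpoint and return the parsed JSON body."""
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def main(argv=None) -> None:
    args = build_parser().parse_args(argv)
    # Print compact JSON so the agent reads only the data it asked for.
    print(json.dumps(fetch(build_url(args), args.token), separators=(",", ":")))


if __name__ == "__main__":
    if len(sys.argv) > 1:
        main()
    else:
        build_parser().print_help()
```

The agent would then call something like `tasks --token ... get DSP-160` and get back a single compact JSON object, instead of loading a whole MCP tool description into the session.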
That's already how a ton of tools work. Just install a CLI and enjoy the token savings. But what if there's no CLI for your tool? That's where Printing Press comes in - with a ready-made library and an algorithm for "printing" CLI tooling.
Obviously we're not doing any of this by hand - we just point a coding agent at the PP repo and ask it to tell us everything we need to know, install the right CLIs from a huge and constantly growing library. And if the tool you need isn't there, like with Plane - we "print" it ourselves with literally one command from Printing Press.
Sure, there'll be some fiddling involved, but in the end it's worth it - no more bloated MCPs for everyday simple tasks. Just lean, fit CLIs.
<written by a human being>Back when I had a regular job, I often had to explain my ideas, plans, concepts, tasks, and all the other joys of corporate life to colleagues and management. Whiteboards with markers or prepared presentations were the usual go-to - because obviously nobody's going to read a dry document, and making one isn't exactly a pleasure either.
These days I present that kind of thing to colleagues as HTML pages, which let you lay out information in a way that actually lands visually. I'm not much of a designer myself, so I hand that part off to the latest AI models, and they nail it every time.
There's not much prep work involved either - you can just throw a pile of inputs at the agent: various documents, meeting transcripts, your own raw thoughts, and a voice note of what you want the final presentation to actually say.
The agent structures all that chaos on its own, picks out only what matters, and synthesizes it into a clean, coherent layout for a clear visual result.
If you want, you can even throw in interactive elements that affect the output - sliders that adjust financial charts, buttons that change the result, whatever comes to mind.
The key thing here, obviously, is time saved. I used to spend a solid third of a workday on this kind of thing, at minimum. Now it's genuinely a 10-minute conversation with an agent.
<written by a human being>
A few months back, while designing the architecture for a potential system and planning the stack, I gave this task to three different models - Claude Opus, ChatGPT, and Grok, all in their top reasoning modes. I got three different recommendations that overlapped on individual modules, but each came with an impressive set of arguments for its proposed options.
Same input, different outputs. How do you choose between them? Present each model with the other two versions and ask it to critique all three - including its own - based on the consolidated reasoning from the others.
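The cross-critique loop can be sketched roughly like this. Each model is just a callable that takes a prompt and returns text; how you actually reach Claude, ChatGPT, or Grok is left out, and the function names and prompt wording are my assumptions.

```python
from typing import Callable, Dict

Model = Callable[[str], str]  # takes a prompt, returns the model's answer


def critique_round(task: str, proposals: Dict[str, str], models: Dict[str, Model]) -> Dict[str, str]:
    """Show every model all current proposals and collect revised proposals."""
    listing = "\n\n".join(f"[{name}]\n{text}" for name, text in proposals.items())
    revised = {}
    for name, ask in models.items():
        prompt = (
            f"Task:\n{task}\n\nProposals:\n{listing}\n\n"
            f"You wrote [{name}]. Critique all proposals, including your own, "
            "then give your revised recommendation."
        )
        revised[name] = ask(prompt)
    return revised


def cross_critique(task: str, models: Dict[str, Model], rounds: int = 2) -> Dict[str, str]:
    """Collect independent proposals, then run a few critique rounds."""
    proposals = {name: ask(task) for name, ask in models.items()}
    for _ in range(rounds):
        proposals = critique_round(task, proposals, models)
    return proposals


if __name__ == "__main__":
    # Stub models for illustration only; real ones would call an API.
    stubs = {name: (lambda prompt, n=name: f"{n} recommends stack X") for name in ("opus", "chatgpt", "grok")}
    for name, answer in cross_critique("Plan the stack for the system.", stubs).items():
        print(name, "->", answer)
```

Whether the answers have actually converged after a couple of rounds is still a judgment call you make by reading them; the sketch only automates the shuffling of proposals between models.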
After a couple of iterations of this kind of triage, a consensus emerges somewhere in the middle.
Fast forward a few months, and I'm kicking off development on a new system. PRDs are ready, functional and non-functional requirements are gathered, constraints are defined - time to start planning. Long hours of interviews, self-checks, spec analysis by independent agents, and the final architecture and stack are starting to take shape.
As always, with a healthy dose of skepticism toward any single model's decisions, I load the same set of input documents into two others - and the output is... practically identical recommendations for the stack. They diverge on a few individual elements, but that's genuinely 10% of the total volume.
Consensus reached without any cross-analysis between them. What is this - identical training data? Or is it actually the best choice for my case, and the models have gotten smart enough that they all arrived at the same conclusion independently? Have you noticed this?
SDD, aka Spec (Specification) Driven Development, or how vibe-coders reinvented systems analysis
<written by a human being>
For seasoned software developers, it's not news at all that before writing code, it's a good idea to have documentation of what's actually going to be built. Moreover, in enterprise environments this has long been a mandatory phase - and the practice of systems analysis, which is precisely what handles (or used to handle, in the pre-AI world) the development of that documentation, is a perfectly normal thing.
The moment it became possible to quickly write code that works the way the author intended, everyone rushed to do exactly that - and with predictable frustration discovered that things aren't quite as simple as advertised by the companies collecting real money for AI tokens.
Turns out the final product doesn't come together quickly, and rarely the way you'd want. A lot of things don't work, or turn out to be impossible despite the AI promising everything would fly straight to the moon. Features stumble at every step, you end up rewriting them a dozen times, burning through all your limits. And even if you managed to vibe-code something that more or less worked, it turned out there was a gaping security hole in it and hacking the thing was trivially easy. I actually talked about this last fall.
For those who've been in development for a long time, none of this is surprising. Pretty much the same result you'd get if you let a junior developer loose on a codebase. On the twentieth attempt, crooked and cobbled together, they might deliver something resembling a final product - but they'll miss a ton of details, including cybersecurity.
And lo and behold, the so-called SDD concept was born (or rather, people remembered the fundamentals). I think I've got material for several posts here, so stay tuned.
<written by a human being>
SDD, or Spec-Driven Development - the concept is dead simple. First you write the spec for what you're building, then, following that spec (that's the key part), you write the actual code.
It seems pretty logical, because nobody questions that before building a house you should first carefully prepare an architectural plan with all the details - including things that might seem like minor stuff you could just fix on the fly. But the more thorough the plan, the faster and smoother the build goes. This is exactly the "measure twice, cut once" moment (or "measure seven times," as the Russian saying goes).
I don't know where this myth came from that software development can work differently, but in practice the same thing happens with software. I'm talking about mature software, more enterprise-level - because yeah, you can build a shed without a detailed architectural plan. But it'll look like one too.
That's exactly why in a little while we're gonna see a whole wave of these shed-products, slapped together without a proper plan or solid architecture, that'll collapse at the first storm or any halfway serious load.
By the way, there's nothing wrong with that - honestly a lot of micro-software (not the Bill kind), built personally by a vibe-coder for some specific niche task, can totally get away without complex engineering. I write one-shot utilities for my own work and clients all the time, no plans, no architecture.
But when it comes to more or less serious systems - there's no way around doing the architecture first.
<written by a human being>
We figured out that SDD is the right and sensible approach to building software with AI agents. It's like drawing up the architectural plan before building a house. But what does it look like in practice?
Sooner or later I'll write my own skill for AI agents that writes specs to my requirements, but for now I'm using a skill from the Superpowers pack, which I talked about earlier. It's a Claude Code plugin that's basically a skill set, and it has a spec-writing skill. It's not tuned specifically for development though - more of a generic type - but it works fine overall, especially if you spend a long time grinding through the first part with the questionnaire and demanding that the necessary details get added.
At the end you get a document in Markdown format, which I strongly recommend reading, because the AI can go sideways in some places, forget something, or generously add stuff at its own discretion.
I also recommend doing a double or even triple check of the spec. First - ask the AI to launch an independent agent, in its own session, to validate the spec for errors, consistency, logic, and any other criteria that matter for this specific spec.
Second - take the spec that already went through that review-and-fix cycle and hand it off for validation to other models - ChatGPT, Grok, whatever else. Let them find the weak spots and inconsistencies. Different training data is really good at enabling that kind of unbiased take.
Only after that kind of multi-layer check should you start working with it. Specifically - break it into tasks and start executing.
Infinite development loop
<written by a human being>
Hit a problem I don't know how to get out of yet. Been developing with an AI agent for weeks now, the system is mostly written and works module by module, but it's still very rough and crooked. Talking about a LangGraph-based graph that mixes deterministic nodes with code scripts and LLM calls - which, of course, tend to deviate from what's actually required.
And that's exactly where the constant problems keep spawning: either the criteria used to evaluate the LLM's output are too narrow and the model is literally set up to never meet them. Or there's not enough context, but the deterministic nodes can't surface that by their very nature. Or the dev agent seems to find yet another bug that's supposed to fix everything, but in practice pulls things in a different direction, digging the tech debt deeper and pushing the final result further away.
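To show what I mean by mixing deterministic nodes with LLM calls, here's a deliberately tiny sketch in plain Python - not LangGraph's API. The LLM is a stub, and the required-context fields, the acceptance check, and the retry cap are all hypothetical. It's one possible way to make "not enough context" and "criteria too narrow" show up as separate signals instead of an endless fix loop.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class State:
    brief: str                      # input context for the LLM node
    draft: str = ""                 # latest LLM output
    attempts: int = 0
    log: List[str] = field(default_factory=list)


REQUIRED_CONTEXT = ("duration", "format")  # hypothetical fields for illustration


def check_context(state: State) -> bool:
    """Deterministic node: flag missing context before any LLM call."""
    missing = [key for key in REQUIRED_CONTEXT if key not in state.brief]
    if missing:
        state.log.append(f"missing context: {', '.join(missing)}")
    return not missing


def meets_criteria(draft: str) -> bool:
    """Deterministic node: a deliberately simple acceptance check."""
    return draft.strip().endswith(".") and len(draft.split()) <= 20


def run(state: State, llm: Callable[[str], str], max_attempts: int = 3) -> State:
    if not check_context(state):
        return state                      # stop early instead of looping on bad input
    while state.attempts < max_attempts:
        state.attempts += 1
        state.draft = llm(state.brief)    # non-deterministic node
        if meets_criteria(state.draft):
            state.log.append("accepted")
            return state
        state.log.append(f"attempt {state.attempts} rejected")
    state.log.append("gave up: criteria may be too narrow")
    return state


if __name__ == "__main__":
    result = run(State(brief="duration 30s, format mp4"), llm=lambda brief: "Cut the intro and add captions.")
    print(result.log)
```

The point of the log is that a run ending in "gave up" says something about the criteria, while a run ending in "missing context" says something about the input - two different tasks for the dev agent.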
Not the first time I've tried to change the approach, nudge the agent in what seems like the right direction - but we keep sliding back into an infinite loop of fixes. Every new fix spawns a few more "bugs," which keeps growing the task count that's supposed to shrink over time.
How to get out of this hole - no idea yet. And even though the graph system we're building is pretty complex, it's still not rocket science. But Opus 4.7 is doing a poor job navigating the development so far, which the lack of results makes obvious.
Once I figure it all out, I'll definitely share what I found. But for now - tell me, are you running into similar issues with agentic development?
The Tasks AI Still Can't Do (And Why It Pretends It Can)
<written by a human being>
So, you've written a bunch of specs for a system you're planning to build. What's next?
First, you need to make sense of all this stuff. In my case, for example, I ended up with 12 ADRs (Architecture Decision Records) and 13 accompanying specifications. All of this needs to be structurally organized within the future repository. There's actually a dedicated spec for that - one that locks down this very structure. That's what you start working from.
The next agent should plan the deployment of a working project environment for development. Draw up a plan of everything that'll be needed, and actually get to work in accordance with the specs - they should always stay in context.
On top of that, I spun up a separate agent to visualize the architecture and system topology for me in C4 (a model for describing software architecture in diagrams at several levels of detail). Claude is honestly pretty bad at this right now, but I'll get what I want.
I know you're already itching to start building - but it's genuinely too early for that. Not if you want to avoid ending up with vibe-coded, leaky slop, anyway. You still need to lock down all the rules for working with the repo, the code, the system, and the docs, learn how to preserve and pass context properly, and keep clean, organized track of project tasks.
I'm doing all of this in parallel with my posts, by the way - so I'm literally sharing a behind-the-scenes look straight from the source: what's actually happening in my VSCode.
<written by a human being>
A couple of days ago, a friend of mine who works at a bank asked me about existing tools for drawing C4-level diagrams - basically, visually representing the architectural topology of an information system.
I've been saying for a long time that reliable diagram visualization is a task AI currently handles quite poorly. Sure, it understands what charts and diagrams are, can recognize them and even "build" them - but only in theory, and only as text.
The one thing I actually managed to pull off was a BPMN diagram, and the saving grace there was that it's deterministic XML markup. So it's logical to assume that diagrams which are "drawn" with text should come naturally to AI.
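For a sense of why BPMN is friendlier to generate, here's a minimal sketch that emits the semantic part of a BPMN 2.0 process with Python's standard library. The process id, task names, and target namespace are made up, and this is not the diagram I actually built. Note that it has no diagram-interchange (layout) section, which modeling tools typically also need before they can draw anything.

```python
import xml.etree.ElementTree as ET

BPMN_NS = "http://www.omg.org/spec/BPMN/20100524/MODEL"
ET.register_namespace("bpmn", BPMN_NS)


def q(tag: str) -> str:
    """Qualify a tag name with the BPMN model namespace."""
    return f"{{{BPMN_NS}}}{tag}"


def build_process(process_id: str, task_names: list) -> str:
    """Return BPMN XML for a straight line: start -> tasks -> end."""
    definitions = ET.Element(
        q("definitions"),
        {"id": "Definitions_1", "targetNamespace": "http://example.com/bpmn"},  # placeholder namespace
    )
    process = ET.SubElement(definitions, q("process"), {"id": process_id, "isExecutable": "false"})

    nodes = [("StartEvent_1", "startEvent", None)]
    nodes += [(f"Task_{i}", "task", name) for i, name in enumerate(task_names, start=1)]
    nodes.append(("EndEvent_1", "endEvent", None))

    for node_id, tag, name in nodes:
        attrs = {"id": node_id}
        if name:
            attrs["name"] = name
        ET.SubElement(process, q(tag), attrs)

    # Connect each node to the next one with a sequence flow.
    for i, (source, target) in enumerate(zip(nodes, nodes[1:]), start=1):
        ET.SubElement(
            process,
            q("sequenceFlow"),
            {"id": f"Flow_{i}", "sourceRef": source[0], "targetRef": target[0]},
        )
    return ET.tostring(definitions, encoding="unicode")


if __name__ == "__main__":
    print(build_process("Process_1", ["Review spec", "Approve spec"]))
```

Every element and reference is explicit and checkable, which is probably why this kind of markup works better than asking a model to "draw" something.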
But no such luck. For some reason, visual representation is exactly where they struggle the most. There's Structurizr DSL, for instance, which supposedly lets you generate the right diagrams through code. Except deploying it turned out to be a fairly labor-intensive task - which felt like overkill for a single diagram.
There are UML primitives that let you assemble what you need, but first of all it's not pure C4, and second of all it still comes out crooked.
In the end, the simplest thing that actually gives me the result I need - for now - is HTML visualization. Not without some tap-dancing, of course, but it gets the job done overall. And visually you can make it look pretty decent too.
For the rest - we're waiting for model updates where the training data will include more diagrams.
AI Delegation Is About Doing What You Never Could
<written by a human being>
I've talked about LangGraph a few times already and how I'm building my video editing system on top of it. And a few days ago I started running into videos about ADK - Google's Agent Development Kit. It's a relatively fresh framework that literally two days ago added its own graph-based Workflow Runtime, which immediately puts it if not on the same level, then at least closer to LangGraph.
And of course, already beaten up by the infinite development loop on LangGraph, I went to take a closer look at ADK and the potential of migrating my system to it. It's got a graph too, which lets you run deterministic nodes - meaning regular code scripts - and everything goes through familiar Python classes.
Spinning up a quick simple agent with a repeatable loop and strict procedures using ADK is gonna be way faster - you literally just grab ready-made components. In LangGraph you build everything from scratch, but that also gives you more control over what's happening.
But if you need to build a genuinely complex and detailed process (which is exactly what my system turned out to be) - LangGraph wins in a lot of ways, precisely because of the high boilerplate - state schema, nodes, edges. In ADK you literally define an agent Agent(...) and that's it.
Another mismatch with my requirements is ADK's hard lock-in to the Google ecosystem. If you're already deep in there, it's only a plus. But my goal from the start was to build an independent system on my own host, with no strict vendor lock.
So for now I'm staying on LangGraph, but keeping one eye on ADK and how it develops. I'm sure with such an active community it'll catch up fast and get polished to production-ready.
Meta-agents
<written by a human being>
The structure of my work project pushed me toward using a concept I call Meta-agents.
I have one repository, from which I design a new repository for a future system from scratch. Me and the agents plan everything - architecture, stack, order of work, project management, infrastructure deployment, all of it.
We also plan the work of AI agents on developing this new system and, accordingly, their work in a repository separate from the current one. We prepared a set of instructions and even a skill set that the agents will use.
That's how we gradually arrived at planning the work of those agents too. Now we run smoke tests with them, where the initial prompt to an agent is something like "Complete task DSP-160" and that's it. The agent has to figure out everything else on its own, while me and the agent in the other repo watch its actions and analyze what went wrong and what to tweak in the instructions, rules, and skills.
So here's the picture we ended up with: we created an environment for agents, launch them, and with another Meta-agent we observe and fine-tune the habitat for the new ones. Ironic, right?
So far this approach has been working really well: literally after the first smoke run and one instruction tweak, the second session went almost without a hitch, which is great. Continuing to observe.
Give AI a Terminal - Genius. Give It Bubble - Disaster
<written by a human being>
A couple months ago I talked about how no-code solutions were still stuck in semi-manual mode for me - the Claude Chrome plugin worked unbearably slowly and ate an unbearable number of tokens, so it was faster and easier to just make edits by hand.
AI stayed this wise algorithm and UX advisor, helping me debug stuff and quickly come up with design solutions.
The next evolution step was feeding exports and dumps to agents. Pretty much any no-code tool lets you get data about what's happening inside the app or database one way or another - as tables, JSON files, or their own formats that end up being machine-readable anyway.
Bubble, for example, lets you export the whole app in .bubble format, which is basically minified JSON, and if you clean it up into standard format a coding agent starts understanding it pretty effectively.
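The cleanup step can be as small as this sketch, assuming the export really is valid JSON as described above. The function names and the file names in the usage comment are placeholders.

```python
import json
from pathlib import Path


def prettify_json(minified: str) -> str:
    """Re-serialize minified JSON with indentation and stable key order."""
    return json.dumps(json.loads(minified), indent=2, sort_keys=True, ensure_ascii=False)


def prettify_file(source: Path, target: Path) -> None:
    """Read a minified export and write a readable copy next to it."""
    target.write_text(prettify_json(source.read_text(encoding="utf-8")), encoding="utf-8")


if __name__ == "__main__":
    # With a real export this would be something like:
    # prettify_file(Path("my-app.bubble"), Path("my-app.pretty.json"))
    print(prettify_json('{"b":1,"a":{"c":[1,2]}}'))
```

Sorting keys isn't required, but it keeps two exports of the same app diffable, which helps when the agent needs to see what changed between versions.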
Or Directual, which I work with a lot - it lets you export all your scenarios or workflows, plus table structure, into clean JSON formats, which lets an AI agent go deep into any step of the algorithms.
The last piece that closed the loop was Playwright CLI, which lets an AI agent interactively navigate apps exactly the same way you do manually in a browser, except straight from the command line - native territory for our smart assistants. It can even take screenshots, just like a regular browser, and analyze the final UX/UI!
And that already opens up a whole layer of possibilities - from hunting down the nastiest bugs to iteratively improving the design of apps built on top of no-code solutions.
Right now, for example, I'm closing out some pretty old hanging bugs that have been buried deep in the backlog as not-super-urgent but that still mess up users' lives. How long would those have taken me - probably weeks, since there'd already been a few attempts at fixing them with zero results. But for an AI agent it's just another routine task.