Anticodeguy
Technomad & systems thinker exploring paths to freedom and prosperity

https://stan.store/anticodeguy
Infinite development loop
<written by a human being>


I've hit a problem I don't know how to get out of yet. I've been developing with an AI agent for weeks now: the system is mostly written and works module by module, but it's still very rough and crooked. I'm talking about a LangGraph-based graph that mixes deterministic nodes, code scripts and LLM calls - and the LLM calls, of course, tend to deviate from what's actually required.

And that's exactly where the constant problems keep spawning. Either the criteria used to evaluate the LLM's output are too narrow, so the model is literally set up to never meet them. Or there isn't enough context, but the deterministic nodes can't surface that by their very nature. Or the dev agent finds yet another bug that's supposed to fix everything, but in practice pulls things in a different direction, digging the tech debt deeper and pushing the final result further away.
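One way to keep an evaluation step from retrying forever is to cap the attempts and to treat "missing context" as a separate outcome from "criteria not met". Below is a minimal sketch in plain Python, not LangGraph's API; `Verdict`, `run_with_escalation` and the `generate`/`check` callables are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Verdict:
    """Outcome of checking one LLM output against the acceptance criteria."""
    passed: bool
    failed_criteria: List[str] = field(default_factory=list)
    missing_context: List[str] = field(default_factory=list)


def run_with_escalation(
    generate: Callable[[int], str],
    check: Callable[[str], Verdict],
    max_attempts: int = 3,
) -> Tuple[str, Optional[str], Verdict]:
    """Retry generation a bounded number of times, then hand off to a human.

    Returns (status, last_output, last_verdict); status is one of
    "accepted", "needs_context" or "needs_human".
    """
    output: Optional[str] = None
    verdict = Verdict(passed=False)
    for attempt in range(max_attempts):
        output = generate(attempt)
        verdict = check(output)
        if verdict.passed:
            return "accepted", output, verdict
        if verdict.missing_context:
            # Another retry cannot invent missing input: escalate right away.
            return "needs_context", output, verdict
    # The criteria were never met: stop looping and let a human review them.
    return "needs_human", output, verdict
```

With a checker that never passes, the loop stops after three drafts with status "needs_human" instead of spinning, which is a signal that the criteria themselves may need a second look.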

It's not the first time I've tried to change the approach and nudge the agent in what seems like the right direction, but we keep sliding back into an infinite loop of fixes. Every new fix spawns a few more "bugs," growing a task count that's supposed to shrink over time.

How to get out of this hole, I have no idea yet. And even though the graph system we're building is pretty complex, it's still not rocket science. But so far Opus 4.7 is doing a poor job of navigating the development, and the lack of results makes that obvious.

Once I figure it all out, I'll definitely share what I found. But for now - tell me, are you running into similar issues with agentic development?
The Tasks AI Still Can't Do (And Why It Pretends It Can)
<written by a human being>

So, you've written a bunch of specs for a system you're planning to build. What's next?

First, you need to make sense of all this stuff. In my case, for example, I ended up with 12 ADRs (Architecture Decision Records) and 13 accompanying specifications. All of this needs to be structurally organized within the future repository. There's actually a dedicated spec for that - one that locks down this very structure. That's what you start working from.

The next agent should plan the deployment of a working development environment for the project: draw up a plan of everything that'll be needed, and then actually get to work in accordance with the specs, which should always stay in context.

On top of that, I spun up a separate agent to visualize the architecture and system topology for me in C4 (a notation designed specifically for architecture diagrams). Claude is honestly pretty bad at this right now, but I'll get what I want.

I know you're already itching to start building - but it's genuinely too early for that. Not if you want to avoid ending up with vibe-coded, leaky slop, anyway. You still need to lock down all the rules for working with the repo, the code, the system, and the docs, learn how to preserve and pass context properly, and keep clean, organized track of project tasks.

I'm doing all of this in parallel with my posts, by the way - so I'm literally sharing a behind-the-scenes look straight from the source: what's actually happening in my VSCode.
<written by a human being>

A couple of days ago, a friend of mine who works at a bank asked me about existing tools for drawing C4-level diagrams - basically, visually representing the architectural topology of an information system.

I've been saying for a long time that reliable diagram visualization is a task AI currently handles quite poorly. Sure, it understands what charts and diagrams are, and can recognize them and even "build" them - but only in theory, and only as text.

The one thing I actually managed to pull off was a BPMN diagram, and the saving grace there was that it's deterministic XML markup. So it's logical to assume that diagrams which are "drawn" with text should come naturally to AI.

But no such luck. For some reason, visual representation is exactly where models struggle the most. There's Structurizr DSL, for instance, which supposedly lets you generate the right diagrams from code. Except deploying it turned out to be fairly labor-intensive, which felt like overkill for a single diagram.

There are UML primitives that let you assemble what you need, but first of all it's not pure C4, and second of all it still comes out crooked.

In the end, the simplest thing that actually gives me the result I need - for now - is HTML visualization. Not without some tap-dancing, of course, but it gets the job done overall. And visually you can make it look pretty decent too.
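As a rough idea of what the HTML route can look like, here is a minimal sketch that renders a C4-style "System Context" view as coloured boxes plus a list of labelled relationships. It is an illustration under assumptions, not the setup described in the post: the elements, relationships and colours are hypothetical, and layout and drawn arrows are deliberately left out.

```python
from html import escape

# Hypothetical elements of a C4 "System Context" view: (id, label, kind, description).
ELEMENTS = [
    ("customer", "Customer", "person", "Uses the product"),
    ("core", "Core system", "system", "The system being designed"),
    ("email", "Email service", "external", "Sends notifications"),
]
# Hypothetical relationships: (source id, destination id, label).
RELATIONS = [
    ("customer", "core", "Places orders"),
    ("core", "email", "Sends emails via"),
]
# Box colour per element kind (an arbitrary choice for this sketch).
COLORS = {"person": "#08427b", "system": "#1168bd", "external": "#999999"}


def render_context_diagram(elements, relations) -> str:
    """Render elements as coloured boxes and relationships as a labelled list."""
    labels = {eid: label for eid, label, _, _ in elements}
    boxes = "\n".join(
        f'<div class="box" style="background:{COLORS[kind]}">'
        f"<b>{escape(label)}</b><br><small>[{kind}]</small><br>{escape(desc)}</div>"
        for _, label, kind, desc in elements
    )
    arrows = "\n".join(
        f"<li>{escape(labels[src])} &rarr; {escape(labels[dst])}: {escape(text)}</li>"
        for src, dst, text in relations
    )
    return f"""<!DOCTYPE html>
<html><head><meta charset="utf-8"><style>
.box {{ display:inline-block; width:180px; margin:12px; padding:12px; color:#fff;
        border-radius:8px; font-family:sans-serif; vertical-align:top; }}
</style></head><body>
<h2>System Context</h2>
{boxes}
<ul>
{arrows}
</ul>
</body></html>"""
```

Writing the returned string to a file such as context.html and opening it in a browser gives a quick view; keeping the model (elements and relations) separate from the rendering makes it easy for an agent to edit one without breaking the other.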

For the rest - we're waiting for model updates where the training data will include more diagrams.
AI Delegation Is About Doing What You Never Could
<written by a human being>


I've talked about LangGraph a few times already and how I'm building my video editing system on top of it. And a few days ago I started running into videos about ADK - Google's Agent Development Kit. It's a relatively fresh framework that literally two days ago added its own graph-based Workflow Runtime, which immediately puts it, if not on the same level as LangGraph, then at least closer to it.

And of course, already beaten up by the infinite development loop on LangGraph, I went to take a closer look at ADK and the potential of migrating my system to it. It's got a graph too, which lets you run deterministic nodes - meaning regular code scripts - and everything goes through familiar Python classes.

Spinning up a quick simple agent with a repeatable loop and strict procedures using ADK is gonna be way faster - you literally just grab ready-made components. In LangGraph you build everything from scratch, but that also gives you more control over what's happening.

But if you need to build a genuinely complex and detailed process (which is exactly what my system turned out to be), LangGraph wins in a lot of ways, precisely because of the high boilerplate: state schema, nodes, edges. In ADK you literally define an Agent(...) and that's it.

Another mismatch with my requirements is ADK's hard lock-in to the Google ecosystem. If you're already deep in there, it's only a plus. But my goal from the start was to build an independent system on my own host, with no strict vendor lock.

So for now I'm staying on LangGraph, but keeping one eye on ADK and how it develops. I'm sure with such an active community it'll catch up fast and get polished to production-ready.
Meta-agents
<written by a human being>


The structure of my work project pushed me toward using a concept I call Meta-agents.

I have one repository from which I design, from scratch, a new repository for a future system. The agents and I plan everything - architecture, stack, order of work, project management, infrastructure deployment, all of it.

We also plan how AI agents will develop this new system and, accordingly, how they'll work in a repository separate from the current one. We prepared a set of instructions and even a skill set that the agents will use.

And we naturally evolved into planning the work of those agents. Now we run smoke tests with them, where the initial prompt to an agent sounds something like "Complete task DSP-160" and that's it. The agent has to figure out everything else on its own, while the agent in the other repo and I watch its actions and analyze what went wrong and what to tweak in the instructions, rules, and skills.

So here's the picture we ended up with: we created an environment for agents, launch them, and with another Meta-agent we observe and fine-tune the habitat for the new ones. Ironic, right?

So far this approach has been working really well: literally after the first smoke run and one instruction tweak, the second session went almost without a hitch, which is great. Continuing to observe.
Give AI a Terminal - Genius. Give It Bubble - Disaster
<written by a human being>


A couple of months ago I talked about how no-code work was still stuck in semi-manual mode for me - the Claude Chrome plugin was unbearably slow and ate an unbearable amount of tokens, so it was faster and easier to just make edits by hand.

AI remained a wise advisor on algorithms and UX, helping me debug things and quickly come up with design solutions.

The next evolutionary step was feeding exports and dumps to agents. Pretty much any no-code tool lets you get data about what's happening inside the app or database one way or another - as tables, JSON files, or its own formats that end up being machine-readable anyway.

Bubble, for example, lets you export the whole app in .bubble format, which is basically minified JSON, and if you pretty-print it back into standard, readable JSON, a coding agent starts understanding it pretty effectively.
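A minimal sketch of that clean-up step, assuming the export really is valid JSON once read as text (if it isn't, json.loads raises an error). The file names are hypothetical; sorting keys is an optional choice that makes two exports of the same app easier to diff.

```python
import json
from pathlib import Path


def prettify_json(raw: str) -> str:
    """Re-indent minified JSON and sort keys so successive exports diff cleanly."""
    return json.dumps(json.loads(raw), indent=2, ensure_ascii=False, sort_keys=True)


def prettify_export(src: Path, dst: Path) -> None:
    """Read a minified export (e.g. a hypothetical app.bubble) and write a readable copy."""
    dst.write_text(prettify_json(src.read_text(encoding="utf-8")), encoding="utf-8")


# Usage (hypothetical paths):
# prettify_export(Path("app.bubble"), Path("app.pretty.json"))
```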

Or Directual, which I work with a lot - it lets you export all your scenarios or workflows, plus table structure, into clean JSON formats, which lets an AI agent go deep into any step of the algorithms.

The last piece that closed the loop was Playwright CLI, which lets an AI agent interactively navigate apps exactly the same way you do manually in a browser, except straight from the command line - native territory for our smart assistants. It can even take screenshots, just like a regular browser, and analyze the final UX/UI!

And that already opens up a whole layer of possibilities - from hunting down the nastiest bugs to iteratively improving the design of apps built on top of no-code solutions.

Right now, for example, I'm closing out some pretty old bugs that have been buried deep in the backlog as not-super-urgent but still mess up users' lives. How long would those have taken me? Probably weeks, since there had already been a few attempts at fixing them with zero results. But for an AI agent it's just another routine task.
Lazy AI agents, or why attention to detail actually matters
<written by a human being>


I currently have two systems in development - a personal finance tracking system and a video editing system. Both have complex data processing pipelines.

And when an LLM runs through them, the agents have a tendency toward "lazy" solutions - taking the path of least resistance (just like humans, I guess). For example, there's a money transfer transaction to another account, recorded in a bank statement. By default, and logically, the system assigns it a transfer category. But at the same time it tries to find the other account in the database that the money was sent to. Except that transaction turned out to be a payment to a contractor - paid by transferring to their personal card.

Yeah, the system doesn't have enough data to make the right classification call, but that's exactly where I was counting on the LLM, not just a dumb algorithm - I figured it would stop and hand the case to me so I could decide what it actually is. But nope - it silently decides it's a standard transfer and the mismatch only surfaces during the final balance reconciliation, which means a long, painful unraveling of the entire data chain to find the source.
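The behaviour I was counting on can be written down as an explicit rule: only call something an internal transfer when the counterparty is a known own account, and route everything else to a human instead of defaulting. A minimal sketch; the Transaction fields, account IDs and category names are hypothetical, not the actual pipeline.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional, Set, Tuple


@dataclass
class Transaction:
    tx_id: str
    amount: Decimal                      # negative = money leaving the account
    counterparty_account: Optional[str]  # None if the statement doesn't show it
    description: str


def classify_transfer(tx: Transaction, own_accounts: Set[str]) -> Tuple[str, str]:
    """Return (category, reason); never guess when the evidence is missing."""
    if tx.counterparty_account in own_accounts:
        # Both sides belong to us, so this really is an internal transfer.
        return "internal_transfer", "counterparty is a known own account"
    if tx.counterparty_account is None:
        return "needs_review", "statement has no counterparty account"
    # Money went to an account we don't own: a contractor paid to a personal
    # card looks exactly like this, so ask instead of defaulting to "transfer".
    return "needs_review", f"unknown counterparty {tx.counterparty_account}"
```

Cases tagged "needs_review" are exactly the ones that would otherwise surface only at the final balance reconciliation.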

In moments like this you have to stop the slacking machine that supposedly has intelligence, and force it to walk through each transaction with you in detail to fix the pipeline design.

I've simplified the case a bit for clarity, but the point should be clear. It's still way too early to blindly trust LLM decisions when accuracy matters - like in finance. Especially when the data is incomplete.
<written by a human being>


Seems like the AI world has been unusually quiet lately - no major model drops, no talk of revolutionary quality leaps, image and video models are spinning their wheels too. Have we hit peak perfection?

Either way - it's actually a good time to stop chasing updates and go deep on the tools we already have.

After yesterday's dev session with AI agents, I realized that depth is exactly what I'm missing from their side. A genuine understanding of what we're actually doing - even when it's spelled out in the instructions and prompts. Somewhere around the middle of a session I suddenly realize we're not solving the root cause of a bug - we're skimming the surface, painting over a scratch, while completely missing the fact that underneath that paint there might be a rotting foundation that needs to be fixed first.

It reminds me of Elon Musk's first principles approach - the one that lets you cut through to the very essence of a problem, the foundation everything else is built on. How do you get AI agents to actually apply that same approach?

For now, the only way I've found is through a series of progressively deeper questions. But the agent still tries to patch the bug, move on, do something. It struggles to stop and ask itself - why am I doing this right now, what's the actual end goal?

That, to me, will be the real breakthrough in future models. Until then, you have to keep them on a short leash and watch them closely.
The One File Every AI Coding Project Needs
How not to drown in a constantly filling pool of AI tools, or your perfect AI stack
<written by a human being>


Recently I came across a video about AI stacks and realized that for a lot of people this can be an incredibly non-obvious choice - given the insane number of new tools that drop if not every day, then every week or two for sure.

How do you keep up with all of it? You don't! And there's not much point. Give it a bit more time and the market will consolidate, like it always does, and it'll become clear who's daddy - the leaders will split the market between them and there'll be a long tail of niche tools, each covering the needs of a specific segment and use case.

That's exactly the approach I use when building my own stack: I pick a couple of leaders I use on a daily basis and sprinkle in periodic use of niche tools.

For example, my go-to AI is Claude, and 97% of the time I work with it in CLI mode from VSCode. On the flagship model it covers a solid 80% of all tasks. I split some of the work with Codex too - mostly data processing, bug fixes, and client tasks, which it handles great.

And that's pretty much it. Everything else is niche usage. Like, I dictate long prompts into Wispr Flow, transcribe call recordings in ElevenLabs via API, generate images in ChatGPT.

If something comes up that needs a specific tool the flagships can't handle, I'll go find the right one. But honestly, over the past several months that just hasn't happened.
<written by a human being>


The AI-coding community is slowly starting to turn toward Codex, which lately seemed to have faded into the background and given up its lead to Claude Code and even Cursor's Composer.

Personally, in practice I've always used both, since I didn't see a major difference in the results - though some distinctions in specific cases were still noticeable. I've mentioned this a few times: tasks involving data, where accuracy matters, I more often hand off to Codex, and it handles them brilliantly. Opus handles them too, but typically over more iterations - meaning, roughly speaking, longer and more expensive.

Cost, by the way, is one of the key factors when choosing one tool or another, because when the output quality is roughly equal, price becomes the natural deciding edge. And the calculations one guy did a few months back, comparing equivalent subscription cost to API spend, put Claude in an undisputed lead: for a $100 monthly subscription you're buying around $1,300 worth of API value.
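The arithmetic behind that claim, using only the two figures quoted above (a $100 monthly plan and roughly $1,300 of equivalent API usage), works out to about a 13x multiple:

```python
def value_multiple(api_equivalent_usd: float, subscription_usd: float) -> float:
    """Dollars of pay-as-you-go API usage bought per dollar of subscription."""
    return api_equivalent_usd / subscription_usd


# Figures quoted in the post: $100/month plan, ~$1,300 of API-equivalent usage.
print(value_multiple(1300, 100))  # 13.0
```

The same ratio is a quick way to compare plans once real usage numbers for each tool are known; the inputs here are the estimate quoted in the post, not measured data.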

That math was the key motivator for me to get the Max subscription on Claude, which ended up becoming my main coding agent.

But now people are saying that Codex's limits on the same $100 subscription far exceed Claude's and feel practically unlimited. I haven't put that to the test yet, but it definitely got me thinking. I think I'll need to spend some time testing them side by side to compare the feel of the results, the usage limits, and the quirks of each one on different kinds of tasks.

But if you don't yet have a subscription to the top-tier plans of the flagship coding AIs, the choice just got a lot harder.
The One File That Stops AI Agents From Wasting Your Tokens
How many AI agents can you juggle at once?
<written by a human being>


AI agents give you an undeniable speed advantage on a lot of tasks. Over time I started spinning up multiple agents simultaneously, each working on a different project.

One's trying to hunt down a bug in a client's system. Another's editing a promo site for a new event for a different client. A third is spinning up a local dev environment for a new project.

And I'm switching between them, adding context, unblocking blockers and conflicts, reviewing output, giving the go-ahead or queuing the next task. Sounds pretty productive, right?

Except it has the opposite effect too - constant and frequent context switching, trying to hold the whole stack in your head while processing it all pretty fast, gives you the same feeling you get at a hard deadline when your ass is on fire, you can't keep up with anything and you just keep paddling to stay afloat.

And yet paradoxically, you get everything done and then some - more than you planned - but it doesn't feel that way. Apparently my cognitive hardware isn't used to operating in these conditions yet and still interprets it the old-fashioned way.

One task at a time - works great. But how do you deal with the fact that the AI is doing the task and you're just watching it? Sit there doing nothing waiting for the next iteration? No way, if there's time, might as well knock out other things in parallel! And that's how it starts spiraling.

You get this too? How do you deal?
<written by a human being>


I don't usually talk about new models, but today I'll join the trend, since I've already had a chance to get my hands on the new Opus - and I really liked it!

Literally from a single prompt it solved tasks - and did so faster, more precisely, without any hassle, deviations, or back-and-forth questions.

One of my automations broke and I asked it to fix it. Not only did it do that on the first try, but it also dug into the root cause - which turned out to be two consecutive system crashes that had corrupted the automation script file. On top of that, it kindly warned me that this isn't normal and suggested checking the system for serious errors. The previous Opus wasn't that perceptive.

Today I have several coding sessions ahead, and I'll be putting the new model through its paces in real battle conditions, for its intended purpose. If anything stands out beyond the noticeably improved thoughtfulness, I'll definitely share.

In the meantime, we're being promised that the new model has become more honest and accurate in its assessments. That was genuinely a problem before - it would give completely unrealistic timelines, like saying 2 days for a task it would finish in 20 minutes. And apparently it should be less of a yes-man now, and instead think harder about whether an action actually makes sense before doing it.

And the cherry on top - optimized token consumption, meaning the model got smarter without supposedly eating through more limits than the previous one. But all of that will show in practice. Let's go check it out!
Stop Re-Explaining Your Project to Claude Every Session
<written by a human being>


My impressions of the new Claude Opus 4.8 are very positive. Just a couple of posts ago I was writing about how running multiple agents simultaneously causes cognitive overload due to constant context-switching and the simple fact that agents demand your attention.

So yesterday I fired up the first task and it went off to do it. Purposefully, deeply - and it became pretty clear pretty fast that my help wasn't needed there. It didn't get stuck on obvious little things, made decisions (which, by the way, I thought were the right ones).

And I realized I could calmly spin up another agent in parallel. Did that, and it went off to work too. A few minutes passed - out of habit I started checking on both of them, but both were so busy that I had no choice but to launch a third!

I also thought that in this mode they'd burn through the 5-hour limit pretty fast, but I was wrong. All three delivered results, working for 10–15 minutes, spending around 15% of the limit, and completing the expected tasks.

Then there was one interesting case where I gave it a task involving client database remapping - something that implied a multi-stage pipeline: exporting raw data from different sources and several stages of processing it into its final form. However, the session token for exporting one of the sources had expired, and there was no way to download the data from there without my help. The previous version would've stopped immediately and reported the blocker. But 4.8 saw an opportunity to keep going - it first downloaded data from the available source, ran it through processing, and only then asked me for the token.
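What 4.8 did here matches a familiar pipeline pattern: do all the work that is unblocked and report blockers together at the end. A minimal sketch, assuming each source has an exporter function that raises a dedicated exception when it needs human input; SourceBlocked, the source names and the toy normalize stage are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Rows = List[dict]


class SourceBlocked(Exception):
    """Raised by an exporter that needs something only the human can provide."""


def run_pipeline(
    exporters: Dict[str, Callable[[], Rows]],
    process: Callable[[Rows], Rows],
) -> Tuple[Dict[str, Rows], Dict[str, str]]:
    """Export and process every available source; collect blockers instead of stopping.

    Returns (processed rows per source, blocker message per blocked source).
    """
    results: Dict[str, Rows] = {}
    blockers: Dict[str, str] = {}
    for name, export in exporters.items():
        try:
            raw = export()
        except SourceBlocked as exc:
            blockers[name] = str(exc)  # remember it, keep going with other sources
            continue
        results[name] = process(raw)
    return results, blockers


# Hypothetical sources for illustration.
def crm_export() -> Rows:
    return [{"id": 1, "Name": " Alice "}]


def billing_export() -> Rows:
    raise SourceBlocked("session token expired")


def normalize(rows: Rows) -> Rows:
    """Toy processing stage: lower-case keys and trim string values."""
    return [
        {k.lower(): v.strip() if isinstance(v, str) else v for k, v in row.items()}
        for row in rows
    ]


done, blocked = run_pipeline({"crm": crm_export, "billing": billing_export}, normalize)
print(done)     # {'crm': [{'id': 1, 'name': 'Alice'}]}
print(blocked)  # {'billing': 'session token expired'}
```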

That really impressed me. The new model keeps pushing toward the result, navigating around obstacles, and doesn't bother the operator until there's a real blocker. All in all, I'm more than satisfied so far. I don't see any reason to use 4.7 anymore.
<written by a human being>


My endless dev loop on the video editing system continues, but now I've actually got real hope. The fresh Opus model, the moment it saw the project, started tearing to shreds the development approach we'd taken with the previous model and proposing other solutions.

And right here you can already feel that promised honesty - it actually says what looks questionable, instead of just nodding along, meekly accepting my decisions and getting on with executing them.

The agent now runs longer and clearly tries to deliver a result that moves things further, closer to the end goal.

Second case: my home accounting. We're untangling a knot of data spanning many years of record-keeping, filling the gaps and consolidating a ton of sources, including bank exports, databases from old programs I used to run, and transaction history on the blockchain. And of course matching, say, transfers doesn't work right away - the balance doesn't come out to zero.

But Opus, on its very first pass through the current analysis cycle, suggested updating the approach: first build a taxonomy of the unmatched transactions and classify them, and only then glue them into final chains as separate clusters. And as it turned out, this approach actually worked and moved us forward far more effectively than all the previous days of work on the old algorithm.
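A minimal sketch of the "taxonomy first, then match inside each cluster" idea. The categorize rules, field names and sample transactions are hypothetical; the real taxonomy came out of the analysis itself and isn't described here.

```python
from collections import defaultdict
from decimal import Decimal
from typing import Dict, List, Tuple


def categorize(tx: dict) -> str:
    """Hypothetical taxonomy rules; a real taxonomy comes out of the analysis itself."""
    desc = tx["desc"].lower()
    if "exchange" in desc or "wallet" in desc:
        return "crypto"
    if "card" in desc:
        return "card_transfer"
    return "other"


def match_within_clusters(unmatched: List[dict]) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Step 1: split unmatched transactions into clusters by category.
    Step 2: inside each cluster, pair each outgoing leg with an incoming leg of
    the same absolute amount. Returns (matched id pairs, ids still unmatched)."""
    clusters: Dict[str, List[dict]] = defaultdict(list)
    for tx in unmatched:
        clusters[categorize(tx)].append(tx)

    pairs: List[Tuple[str, str]] = []
    leftover: List[str] = []
    for txs in clusters.values():
        incoming = [t for t in txs if t["amount"] > 0]
        for out in (t for t in txs if t["amount"] < 0):
            hit = next((t for t in incoming if t["amount"] == -out["amount"]), None)
            if hit is None:
                leftover.append(out["id"])
            else:
                incoming.remove(hit)  # each incoming leg can be used only once
                pairs.append((out["id"], hit["id"]))
        leftover.extend(t["id"] for t in incoming)
    return pairs, leftover


txs = [
    {"id": "c1", "amount": Decimal("-100.00"), "desc": "Misc payment"},
    {"id": "a1", "amount": Decimal("-100.00"), "desc": "Card transfer"},
    {"id": "a2", "amount": Decimal("100.00"), "desc": "Card top-up"},
    {"id": "b1", "amount": Decimal("-50"), "desc": "To exchange"},
    {"id": "b2", "amount": Decimal("50"), "desc": "Wallet deposit"},
]
print(match_within_clusters(txs))
# ([('a1', 'a2'), ('b1', 'b2')], ['c1'])
```

Without the clustering step, a naive pass over all outgoing legs in order would pair c1 with a2 simply because c1 comes first, which is the kind of silent mismatch that only shows up when the balance doesn't net to zero.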

One observation: you can now actually hand it complex tasks to reason through. Before, I tried to decompose them so I wouldn't overload the agent's context, since it would inevitably forget something from it. Now I get the feeling that the full context gets taken into account, all at once.

Two days in - the flight's beautiful. Off to test Max effort. By the way, the latest version of the Claude app got voice input - finally!
<written by a human being>


People ask me how to set up a proper workflow with AI. When you're steeped in it every single day, you barely even notice how it becomes part of the routine - open VSCode, launch an agent from the terminal with the --dangerously-skip-permissions flag, point it at the instructions, brief it on the task at hand, and respond to occasional HITL callbacks.

But for someone who's far from technical work, all of this feels unfamiliar, alien, not particularly meaningful, confusing, and complicated. Let's untangle it. I don't want to make this technically heavy, so I'll pull out the 20% of effort that delivers 80% of the results.

First - work with coding agents instead of classic chat. The thing is, in coding mode you can do everything you can do in chat mode, but it gives you a whole lot more leverage when working on any kind of project, even if that project has nothing to do with development or writing code.

Second - create a project working folder and launch your coding agent from that folder. This way it always has the ability to pull together the full context of what we're working with. And that "project" can be your entire business or personal life - it doesn't have to be some atomic scope dedicated to a narrow topic.

Third - agent instructions: essentially a text file that briefly and densely describes what we're working on in this project (folder), with pointers to individual files that deepen the context when needed.
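A minimal sketch of that setup as a script, assuming Claude Code, which reads a CLAUDE.md file from the folder it is launched in (Codex looks for AGENTS.md instead). The folder name, template wording and context files are hypothetical placeholders; the point is a short, dense instructions file that points to deeper files.

```python
from pathlib import Path
from typing import Dict

INSTRUCTIONS_FILE = "CLAUDE.md"  # Claude Code's project instructions file

# Hypothetical template: short and dense, with pointers to deeper context.
TEMPLATE = """# {name}

## What this project is
{summary}

## Read these when you need more context
{pointers}
"""


def scaffold_project(root: Path, name: str, summary: str, context_files: Dict[str, str]) -> Path:
    """Create the project folder, the context files it points to, and the
    instructions file. Files that already exist are never overwritten."""
    root.mkdir(parents=True, exist_ok=True)
    for rel_path, body in context_files.items():
        path = root / rel_path
        path.parent.mkdir(parents=True, exist_ok=True)
        if not path.exists():
            path.write_text(body, encoding="utf-8")

    pointers = "\n".join(f"- {rel}" for rel in context_files) or "- (none yet)"
    instructions = root / INSTRUCTIONS_FILE
    if not instructions.exists():
        instructions.write_text(
            TEMPLATE.format(name=name, summary=summary, pointers=pointers),
            encoding="utf-8",
        )
    return instructions


# Usage (hypothetical): scaffold once, then launch the agent from inside that folder.
# scaffold_project(Path("~/life").expanduser(), "My business and life admin",
#                  "Clients, finances and personal projects.",
#                  {"clients/overview.md": "# Clients\n", "finance/notes.md": "# Finance\n"})
```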

If you switch to this mode of working - a coding agent, launched from a project folder, with instructions - you'll already be getting a massive advantage from using AI.