Agent personalities
I've been working on quite a tedious and bulky task. In essence, it's about extracting structure from a medium-sized collection of free-form records. I came up with the following pipeline:
💣 take a sample of about 100 records
💣 review them "manually" (a single agent pass within one context window) to discover clusters
💣 create a prompt for an LLM using an agent
💣 iterative prompt refinement (with an agent, using some test runs through the LLM)
💣 big run through the LLM
💣 analysis of the run
💣 production of final artefacts
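The steps above can be sketched as a plain script. Everything here is a hypothetical placeholder (the agent and LLM calls are stubbed out), just to show the shape of the pipeline, not a real implementation:

```python
import random

# Hypothetical stubs standing in for the real agent/LLM calls.
def agent_review(sample):           return {"clusters": ["a", "b"]}
def agent_draft_prompt(clusters):   return "extract fields ..."
def llm_run(prompt, records):       return [{"ok": True} for _ in records]
def agent_refine_prompt(p, out):    return p + " (refined)"
def analyse(results):               return {"n": len(results)}

def extract_structure(records, sample_size=100):
    # 1. Take a sample of about 100 records.
    sample = random.sample(records, min(sample_size, len(records)))
    # 2. One agent pass over the sample to discover clusters.
    clusters = agent_review(sample)
    # 3-4. Draft a prompt with the agent, then refine it on test runs.
    prompt = agent_draft_prompt(clusters)
    for _ in range(3):
        prompt = agent_refine_prompt(prompt, llm_run(prompt, sample))
    # 5. Big run through the LLM over the full dataset.
    results = llm_run(prompt, records)
    # 6-7. Analyse the run and produce the final artefacts.
    return analyse(results)

print(extract_structure([{"id": i} for i in range(500)]))  # {'n': 500}
```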
As you can see, the pipeline is quite lengthy, and even the most advanced agents started to stumble on it. The interesting thing I want to share is how they stumbled.
Codex. I built this pipeline with this agent and, in general, I'm quite happy with it. I talk to this tool in free form, and it's enough to say something like "don't use that proxy, switch to the basic thing," and it understands. I had two real pains in the neck. First, when using a proxy, it starts to complain about the "apply_patch" tool. The tool produces some warnings, and although it looks like a petty problem, it blocks the work because Codex dwells on the topic for tens of minutes. Second, a context flush is like amnesia for it. I asked it to save the project state into special .md files and manually checked that we restore state from our files, not via the default context compaction. Probably it's my fault, but I don't understand what I did wrong.
Claude. When I started Claude in the environment where Codex had been slowly but surely solving my tasks, Claude sprang into action. It moved in a good direction. But it burned through its quota without producing any valuable result, so I cut it off.
Gemini. Probably the funniest story. One session. I didn't pay attention to the context window at all. I started it, gave it an instruction like "solve problem X," and forgot about it for a day while working with Codex. It solved the problem. Then I asked it, "Analyse your work, the problems you stumbled upon, and write down a memory note on how to avoid these problems in the future." It made the note, and a similar task was then performed flawlessly in 10 minutes. From that moment, Gemini became my workhorse, and it's largely thanks to it that I managed to solve my task in time.
For me, they now have three personalities. Claude is a stingy person who promises to perform a task if you pay, but you never see results. Codex is a very smart person, like a professor from a Disney movie: it's very pleasant to talk to him, and he can solve your problem, but you have to keep reminding him who you are. Gemini is a worker: given a clear instruction, everything gets done quickly and well.
If you know someone who might like this post, don't hesitate to share it!
Round Numbers
I really don’t want to push my luck, but it seems we’ve reached a perfectly round number of subscribers. So let me tell you a story.
Many, many years ago, an Indian shah was bored. Then a wise man came and presented him with the game of chess. The shah was thrilled and offered the man anything he wanted. The wise man asked for as much rice as the shah could place on a chessboard using the following rule: on the first square, put one grain of rice; on the second, two grains; and so on. Each next square should contain twice as many grains as the previous one.
I don’t actually know how this story ends because, obviously, 2**64 - 1 grains is quite a big number, and the shah could not possibly give the wise man everything he had asked for.
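A quick check of the scale of that number. With one grain on the first square and each square doubling the previous one, the total over 64 squares is a geometric series:

```python
# 1 + 2 + 4 + ... + 2**63 = 2**64 - 1 grains of rice.
total = sum(2**i for i in range(64))
assert total == 2**64 - 1
print(total)  # 18446744073709551615
```

That is about 18.4 quintillion grains, which is why the shah was in trouble.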
But this story gives us exactly the picture I started this post with.
Yesterday our community covered one row of this proverbial chessboard. Today — one cell of the next row. That’s the strange property of fast-growing populations.
As far as I remember, about 7% of all people who have ever lived are alive right now. So if you hear the joke “100% of people who eat cucumbers died,” don’t trust it. No more than 93%.
School of Data Analysis. Agent intensive
There's a storm of information in the channels I read about agentic programming. The School of Data Analysis released a free course on agentic programming, and a lot of people are discussing it. Mostly, they are trying to find gaps in it.
For example:
🐞 Polyakov, tools and MCP or
🐞 Kovalsky, on Polyakov
The idea of this channel is to share things from my perspective, so I want to share my notes on the course. These are the things that made a difference for me personally, given my current background in this area.
Let's start.
Agent
It's not a stable, well-defined concept at all. Roughly, you need an LLM (large language model), some prompts to guide it, tools to call, memory, guardrails (for safety), and planning skills (to fill the gap in the LLM's ability to make plans).
For me, guardrails and planning skills are the most interesting things to hear about.
LLM
The lecture said that an LLM is basically two files. One huge file with parameters, and one small program that runs it. For me, this statement is important because it demystifies the technology. Just two files. That's it.
Karpathy's LLM OS
I had read about this idea several times before. But now it really clicked.
In an agent, the LLM works like a CPU on a motherboard. It processes data in different modalities, acts through tools, and performs input/output operations. The "OS" framing sounds wrong to me, though: an OS is the first program that starts when you turn your computer on. A CPU on a motherboard fits the analogy much better.
Special tokens
You know, an LLM can't see letters. It sees tokens. Each token is a group of letters. This helps optimize both training and inference.
I already knew that. But for me there was still a gap between the JSON I send to the LLM API and the array of tokens actually fed into the model. There is one element that makes this gap narrower: special tokens.
I knew there were special tokens to start and stop generation. But it turned out that there are also tokens for roles in conversations and for actions, like text translation.
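To make the API-to-tokens gap concrete, here is a toy sketch of how the JSON messages you send to a chat API might be rendered into a single stream before tokenization. The `<|im_start|>`/`<|im_end|>` markers follow the ChatML convention; real models each use their own special tokens, so treat this purely as an illustration:

```python
def render_chatml(messages):
    """Render OpenAI-style message dicts into a ChatML-like string.
    In a real model, the special markers map to dedicated token IDs."""
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>")
    # Leave the assistant turn open so the model continues from here.
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

chat = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi!"},
]
print(render_chatml(chat))
```

The role names from the JSON end up wrapped in special tokens, which is exactly the element that narrows the gap between the API payload and the model's input.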
273
My head is about to break because of the School of Data Analysis course. Diagrams of interactions between MCP components give me nightmares. Let's talk about constants in physics.
Everyone knows that absolute zero is approximately -273 degrees Celsius. But what is the source of this constant? Is it experimental or theoretical?
A piece of totally impractical but dear-to-me knowledge is the following. Take the melting and boiling points of water as reference points and divide the range into 100 equal parts, using an expanding liquid like mercury as the measure. Then a decrease in temperature of 1 degree Celsius makes a gas shrink by approximately 1/273 of its volume at 0 degrees Celsius. So absolute zero is the point at which the gas shrinks to nothing.
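The same extrapolation in a few lines, assuming the idealized linear law V(t) = V0 * (1 + t/273) from the paragraph above (exact fractions avoid floating-point noise):

```python
from fractions import Fraction

# Relative volume of an ideal gas at t °C, using the 1/273-per-degree
# coefficient: V(t) = V0 * (1 + t/273).
def volume(t_celsius, v0=1):
    return v0 * (1 + Fraction(t_celsius, 273))

# The gas "shrinks to nothing" when 1 + t/273 = 0, i.e. at t = -273 °C.
print(volume(-273))  # 0
print(volume(100))   # 373/273 of the 0 °C volume, at the boiling point
```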
Phew. So much easier than two-stage embedding retrieval.
Shad Intensive. Memory. Guardrails.
Just want to share that I'm listening to this course. With a delay, but I'm trying to eat this mammoth piece by piece.
Today, in the "Memory and Guardrails" lecture, I didn't hear anything that gave me an insight worth sharing. Just common words about context length and context compaction in the memory part. In the guardrails part, they mentioned sources of danger: every user input, RAG sources, and APIs. I believe this is the usual computer-security paranoia: you can't trust anyone, and it's much better to turn your computer off, drop it in liquid cement, and let it set.
The only thing that really interests me is not what I understood, but what I didn't. Surprisingly, I don't quite get this "context window" concept. Probably it's just my hallucination. If not, and if there is something interesting here, I'll share it.
Shad Intensive. Evaluation.
It seems I’ll be eating this mammoth in small pieces for ages. So for now, here’s a link to a nice post on agent evaluation.
Telegram
Поляков считает: AI, код и кейсы
How to test AI agents: notes from the ShAD lectures
Continuing ShAD's Agents Week. The fourth lecture covers how to check agent quality: the topic everyone postpones, and the one that does the most damage to the reputation of AI in production, or of the integrator.
📋 What the lecture recommends
…
Are we doing meme shitposting today?
Anonymous Poll
86% Memes! Yeah!
7% I'll unsubscribe immediately
0% I've already unsubscribed
21% TGIF!!!
TGIF. Meme
First of all, let's check the results of the poll. One subscriber promised to unsubscribe, and the other nine voted for memes and shitposting. A naive approach would be to think that if I publish the meme, I'll lose one subscriber. But if you solve the proportion, it gives -27.6 subscribers. The result is stunning, so let's see: I expect to drop to 248.4 subscribers.
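The back-of-the-envelope math, assuming roughly 276 subscribers at the time (a number inferred from the figures above, not stated anywhere):

```python
subscribers = 276       # assumed current count
voters = 10             # poll participants
unsubscribe_votes = 1   # "I'll unsubscribe immediately"

# Naive reading: lose exactly one subscriber.
naive_loss = unsubscribe_votes

# Proportional reading: treat the poll as a representative sample.
proportional_loss = subscribers * unsubscribe_votes / voters
print(proportional_loss)                            # 27.6
print(round(subscribers - proportional_loss, 1))    # 248.4
```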
The topic of today's Friday meme is The Soup.
First of all: I stole these memes from The Wizard. I think it's a magnificent channel, and I ask you to promote it as widely as you can.
Now, let's cut to the chase.
Dad's soup
My dad cooks absolutely hellish food.
It’s a sort of averaged recipe, because there are lots of variations.
He takes soup — but reheating it is not my dad’s style.
He pours the soup into a frying pan and starts frying it.
He adds a huge amount of onion, garlic, tomato paste, flour for thickness, and mayonnaise on top.
The whole thing fries until smoke starts coming out.
Then he takes it off the heat, lets it cool on the balcony, brings it back, pours on even more mayonnaise, and starts eating.
He eats straight from the pan, scraping it with a spoon, muttering under his breath, “oh, damn.”
Sweat is standing on his forehead.
Sometimes he politely offers me some, but I refuse.
Needless to say, the aftermath is monstrous.
The stench is so intense that the wallpaper peels off the walls.
P.S. To be honest, all this subscribe/unsubscribe stuff is starting to get to me. I’d really appreciate some support — even a couple of emojis wouldn’t hurt.
P.P.S. Tomorrow I’ll try to pull myself together and write something clever. Probably continue the “Titanic” line.