Algorithms. Physics. Mathematics. Machine Learning.
DIY projects, fun with 3d, electronics, programming, vibe coding, math, ML algorithms.
Marionette Codex

Today I want to share one technical insight.

Suppose you are building your own project and you want to add an agent into it. At first glance, it sounds simple: send a prompt, get an answer, repeat, call a tool.

But very quickly it turns out that writing your own harness is not simple at all.

A useful agent must be able to inspect files, write tiny code snippets, run commands, patch something, continue the conversation, call tools, and do all this in a secure and flexible way. Supporting such a thing yourself is hell on Earth.

The nice trick is: you do not need to write all this stuff yourself.

You can just call Codex with a few flags. OpenAI documents codex exec as the non-interactive mode for scripts and automation, --json for machine-readable JSONL events, and codex exec resume for continuing an earlier automation run.

Something in this spirit:

codex exec --json --skip-git-repo-check --model <model> "<prompt>"


Then on the next turn:

codex exec resume <thread_id> --json --skip-git-repo-check --model <model> "<prompt>"


This leads to a very neat architecture:

one turn = one subprocess materialization
state = session id stored by Codex outside your app
control loop = fully yours
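
The loop above can be sketched roughly like this. It's a minimal sketch, not a definitive implementation: the command-line flags come from the post, but the JSONL event schema (in particular the key that carries the thread/session id) is an assumption here, so check it against your Codex version.

```python
import json
import subprocess

def extract_thread_id(events):
    """Scan parsed JSONL events for a thread/session id.
    The key names checked here are assumptions, not a documented schema."""
    for event in events:
        if isinstance(event, dict):
            for key in ("thread_id", "session_id"):
                if key in event:
                    return event[key]
    return None

def run_turn(prompt, model, thread_id=None):
    """One turn = one subprocess materialization of Codex.
    Returns (thread_id, parsed JSONL events)."""
    cmd = ["codex", "exec"]
    if thread_id:
        cmd += ["resume", thread_id]
    cmd += ["--json", "--skip-git-repo-check", "--model", model, prompt]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    events = [json.loads(line)
              for line in result.stdout.splitlines() if line.strip()]
    # Codex keeps the conversation state; we only keep the id.
    return extract_thread_id(events) or thread_id, events
```

Your app then just calls `run_turn` in its own loop, passing the returned `thread_id` back in on the next turn.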

And this is the part I like most.

You reuse the whole safety and tool-calling machinery already built into Codex, while still having total control over the agent loop in your own application. By default codex exec runs in a read-only sandbox, while --full-auto switches to a lower-friction mode with approvals on request and workspace-write sandboxing.

So the agent is not some giant subsystem embedded into your codebase.

It is just your loop around Codex, with Codex keeping the conversation state on its side.

If I had to compress the whole idea into one sentence:

Use Codex as an external agent engine: launch it turn by turn, keep the session id, and build your own loop around it.

UPD: the community shared a useful tip: one can use an agentic SDK for this kind of thing. Will try. Stay tuned.
πŸ”₯3πŸ‘Ύ2
Tasks and Ribbons

There is a famous problem that comes up from time to time. I stumbled upon it in several interviews with Big Tech companies.

It has different statements. Sometimes it's like: "there are a lot of meetings, you have their start and end times; what's the max number of meetings overlapping at some moment in time?". Sometimes: "there is a red ribbon that starts at 0 and ends at "a", and several blue ribbons whose start and end points you have in an array. Is any piece of the red ribbon visible?"

My favorite approach to this class of problems is to split each ribbon (or meeting) into a stream of events: start of meeting, end of meeting (or scanning line meets ribbon, scanning line leaves ribbon).

Then you sort this stream of events and process it. Here you have to pay attention, because the tuples (x, 1) and (x, -1), which correspond to opening and closing events, tend to sort in an unpleasant order. So either you sort in reverse order on the second tuple component, or you introduce the slightly weird notation (x, -1) for an opening event and (x, 1) for a closing one.

So, a lot of nuances. And you really have to think about all this weird stuff unless you materialize your counts as an array. In that case you can use a hash map to store the number of opening and closing events at each point. The code is nice and simple.
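
A minimal sketch of the event-stream approach for the meetings variant. One assumption to state up front: intervals are treated as half-open [start, end), so a meeting ending at x and one starting at x do not overlap. Under that convention the default tuple sort, which puts the closing event (x, -1) before the opening event (x, 1), is exactly the order we want:

```python
def max_overlap(meetings):
    """meetings: list of (start, end) pairs with start < end.
    Returns the max number of meetings overlapping at any moment."""
    events = []
    for start, end in meetings:
        events.append((start, 1))   # meeting opens
        events.append((end, -1))    # meeting closes
    # Default tuple order breaks ties as (x, -1) before (x, 1):
    # a close is processed before an open at the same point,
    # which matches the half-open [start, end) convention.
    events.sort()

    best = current = 0
    for _, delta in events:
        current += delta
        best = max(best, current)
    return best
```

For example, `max_overlap([(0, 10), (1, 5), (2, 7)])` returns 3, while `max_overlap([(1, 3), (3, 5)])` returns 1, since back-to-back meetings don't count as overlapping under this convention. If you want inclusive endpoints instead, that's precisely where the reversed sort (or the flipped-sign notation from above) comes in.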

And one more idea. When you are asked to maximize the dot product of two arrays of non-negative numbers (where you may reorder the elements), sort both arrays, multiply elements pairwise, and sum the results.
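
This is a one-liner in code (it follows from the rearrangement inequality: pairing the largest with the largest maximizes the sum):

```python
def max_dot_product(a, b):
    """Max sum of pairwise products over all reorderings,
    for non-negative numbers: sort both and pair them up."""
    return sum(x * y for x, y in zip(sorted(a), sorted(b)))
```

For example, `max_dot_product([1, 3, 2], [4, 1, 2])` pairs 1*1 + 2*2 + 3*4 and returns 17.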

If this topic is interesting and you want to discuss program line by line, react with πŸ”₯
πŸ”₯3πŸ‘2
In the friendly chat we have a friendly dispute on:

whether you should write code by hand when you are studying. You can find the arguments in the comments to this post. Your opinion is important!
Anonymous Poll
67%
You should write code by hand when studying
7%
It's enough just to read
27%
It's Friday, dudes!
The reference to the "friendly chat" didn't appear in the poll. So:

friendly channel

friendly chat

friendly message

Links are not quite working, figuring out why...
#shitposting #meme

It's Friday!

Last week I found a channel and I want to share some content because it's hilarious. I spent Monday skimming through it. It turned out that these memes are universal and can be applied to quite a wide range of situations. For example, I used the first picture today, while presenting my work from the past two weeks.
❀4
Yesterday I tried the first meme-shitposting in the history of this channel. The idea is to do it once a week. What do you think?
Anonymous Poll
14%
no memes, just ML, math and hardcore
67%
memes ok, no obscenities
10%
more memes, no math or physics
10%
Who am I? What do you want from me?
Titanic. Embarkation

In previous posts we started exploratory data analysis of the Titanic dataset. We already checked the list of features and checked whether age is a useful feature.

This time let's talk about the embarked feature. It records the port where a passenger boarded the ship. We can think about this feature in two modes: wearing an ML engineer hat, or wearing an analyst hat. For an ML engineer, a strong connection between a feature and the target is quite enough to include the feature in the dataset and train a model. An analyst is a somewhat more curious creature. They would ask more questions. What is the nature of the connection between embarkation and survival? How does this feature interact with other features?

Let's check the picture. I'm more than confident that it is not totally accurate, but I hope it is correct in the most important aspect: the order of ports. Before I checked this order, I had a very naive hypothesis that passengers who embarked earlier had a better chance to leave the ship, so their survival rate is higher. But that can't be the case for two reasons. First of all, the dataset contains records of all passengers who were on board at the moment of the disaster. Second, passengers from the last stop had the best chances.

The real reason seems to be a little more subtle. In "S" there were a lot of crew and third-class passengers, and later we will see that it was bad to be a third-class man on Titanic (oops, spoiler). Therefore, embarked could be a proxy feature for economic status.
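
Wearing the ML-engineer hat, checking the connection is just comparing survival rates across ports. A minimal sketch on synthetic toy records (the numbers below are made up, not real Titanic statistics; the real dataset encodes ports as S, C, Q in the Embarked column):

```python
from collections import defaultdict

def survival_rate_by_port(passengers):
    """passengers: list of (embarked, survived) pairs, survived is 0 or 1.
    Returns a dict mapping each port to its survival rate."""
    totals = defaultdict(int)
    survivors = defaultdict(int)
    for port, survived in passengers:
        totals[port] += 1
        survivors[port] += survived
    return {port: survivors[port] / totals[port] for port in totals}

# Synthetic toy data, NOT real Titanic numbers:
sample = [("S", 0), ("S", 0), ("S", 1),
          ("C", 1), ("C", 1),
          ("Q", 0), ("Q", 1)]
```

On the real data, a gap between the per-port rates is exactly the "strong connection" an ML engineer would settle for, while the analyst would go on to split each port by passenger class to test the proxy-for-economic-status hypothesis.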
❀2πŸ‘2πŸ”₯1
Milestone

I have been waiting for this milestone for quite a while. I already have an idea for the celebratory post.

For now, let's quietly marvel (o-o-o!) at this point.
πŸ‘7πŸŽ‰1
Agent personalities

I've been working on quite a tedious and bulky task. In general, it's about extracting a structure from medium-sized data in a free form. I came up with the following pipeline:

πŸ’£ take a sample of about 100 records
πŸ’£ review them "manually" - one agent pass within a single context window - to discover clusters
πŸ’£ create a prompt for an LLM using an agent
πŸ’£ iterative prompt refinement (with an agent, using some test runs through the LLM)
πŸ’£ big run through the LLM
πŸ’£ analysis of the run
πŸ’£ production of final artefacts

As you can see, the pipeline is quite lengthy, and even the most advanced agents started to stumble on it. The interesting thing I want to share is how they stumbled.

Codex. I built this pipeline using this agent and, in general, I'm quite happy with it. I talk to this instrument in free form, and it's enough to say something like "don't use that proxy, switch to that basic thing," and it understands. I had two real pains. First, when using a proxy, it starts to complain about the "apply_patch" tool. The tool produces some warnings, and although it looks like a petty problem, it blocks the work, because Codex starts to dwell on this topic for tens of minutes. Second, a context flush is like amnesia for it. I asked it to save the project state into special .md files and manually checked that we restore state from our files rather than relying on the default context compaction. Probably it's my fault, but I don't understand what I did wrong.

Claude. When I started Claude in the environment where Codex was slowly but surely solving my tasks, Claude started to act. It moved in a good direction. But it exhausted its quota without producing any valuable result, so I gave up on it.

Gemini. Probably the funniest story. One session. I didn't pay attention to the context window at all. I started it, gave it an instruction like "solve problem X," and forgot about it for a day while working with Codex. It solved the problem. Then I asked it: "Analyse your work, the problems you stumbled upon, and write down a memory note on how to avoid these problems in the future." It made this note, and a similar task was then performed ideally in 10 minutes. From that moment, Gemini became my workhorse, and basically thanks to it I managed to solve my task in time.

For me, they now have three personalities. Claude is a stingy person who promises to perform a task if you pay, but you don't see results. Codex is a very smart person, like a professor from a Disney movie: it's very nice to talk to him, and he can solve your problem, but you have to remind him who you are. Gemini is a worker: give it a good instruction, and everything will be done quickly and with nice quality.

If you know someone who might like this post, don't hesitate to share it!
πŸ‘2❀1πŸ”₯1