I'm not totally original
While I was digging into this area on my own (with heavy help from ChatGPT, of course), I never expected to end up with a completely new result.
On Monday my colleagues and I discussed how a linear model drops out of Bayesian probability formulas, and the book in the attachment was mentioned. It turns out "my" expressions are in there too; see the references in the picture.
From uplift to logit
Where we are
In the previous post we started to build a bridge from uplift to a score, which lets us solve a binary classification problem. What we achieved:
⚗️ calculated probabilities for all combinations of binary feature f and target…
Tiny math problem anniversary
Ten years ago I accidentally invented an easy but funny mathematical problem. Let "a" be a long decimal number. We permute its digits and get "b". Is (a - b) divisible by 9?
The answer is obvious. But I have seen advanced, contest-level math students for whom it wasn't easy at all to come up with a rigorous explanation of why.
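A short brute-force check of the claim in plain Python (the starting number is arbitrary). The underlying reason: 10 ≡ 1 (mod 9), so any number is congruent to its digit sum modulo 9, and a permutation of digits preserves the digit sum.

```python
import itertools

a = 987654
# 10 ≡ 1 (mod 9), so a number is congruent to its digit sum modulo 9.
# Permuting digits keeps the digit sum, hence a ≡ b (mod 9) for every b.
for perm in itertools.permutations(str(a)):
    b = int("".join(perm))
    assert (a - b) % 9 == 0
print("every digit permutation of", a, "differs from it by a multiple of 9")
```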
Titanic. Age.
We started talking about the Titanic dataset. Let's discuss the Age factor. We saw that Sex, Pclass and Embarked are strong features, and to study them we exploited the fact that they are categorical with low cardinality. Age is different: it's a continuous value from 0 to 80, with missing values. I tried the following approaches to figure out its signal:
👁 ROC with age as the score and survival as the target: 47% ROC AUC; no clean picture, although some age ranges seem to carry signal
👁 Train CatBoost with age as the only feature, survival as the target and logloss as the loss function: not bad on train, but random on eval
👁 In the spirit of Bayesian analysis, compare the age distributions of survivors and non-survivors. This seems the most promising exercise. There is a 0-10 range with higher density in the survivor distribution, while 18-28 is more prominent among those who died. A naive model which simply assigns 1 to the 0-10 range (and so on) gives 57% ROC AUC, which seems decent
👁 The Age field is filled for 80.1% of cases
👁 The survival rate is 40.6% (slightly higher) for passengers with known age and 29.4% (significantly lower) for passengers with missing age
All in all, the age feature seems weak, but useful.
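The naive 0-10 rule can be sketched in a few lines of plain Python. The tiny sample below is invented purely for illustration (it is NOT the real Titanic data), and roc_auc is a straightforward pairwise implementation of the standard definition:

```python
def roc_auc(y_true, scores):
    # Probability that a random positive outscores a random negative,
    # with ties counted as 1/2 (the standard ROC AUC definition)
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Invented toy rows: (age, survived) -- NOT real Titanic data
rows = [(4, 1), (8, 1), (6, 1), (27, 1), (40, 1),
        (20, 0), (22, 0), (25, 0), (55, 0), (30, 0)]
ages = [a for a, _ in rows]
survived = [s for _, s in rows]

# The naive rule from the post: score 1 inside the 0-10 range, else 0
scores = [1 if 0 <= a <= 10 else 0 for a in ages]
print(roc_auc(survived, scores))  # 0.8 on this toy sample
```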
126 and 01239
👉 The sets 1, 2, 6 and 0, 1, 2, 3, 9 are fun because their arithmetic mean is an integer. And 01239 is slightly more fun: both median and mean are inside the set.
📖 A well-known fact: if you look for a point that minimizes the total distance Σ |xᵢ − a|, you end up with the median. The explanation is also intuitive. Imagine these numbers are coordinates of houses, and you want to place a transformer station. You want to minimize the total length of wire. (I like a transformer station more than the legendary postman: a postman who can carry only one package at a time feels… suspicious.) If there are more houses on the right, it's profitable to move the station to the right; if there are more on the left, move it left. Equilibrium happens when the counts on both sides are equal. That's the median.
But the “sum of squares” story always felt less obvious to me: Σ (xᵢ − b)²
🦴 The most direct physical system is a net of springs connecting a nail (what?!) to all the houses. In this case the physics is quite trivial: the mythical nail would settle at the position where all forces are in equilibrium and, by the definition of force through potential energy, the sum of squared distances is minimized.
⚛️ This model looks totally awkward to me now. When I was young the analogy seemed great, but it stopped clicking: springs have a finite rest length, and it's hard to imagine a nail connected to houses by springs. Probabilities are a totally different thing. We all want to live in a universe with high probability; it's a respectable place in the middle of the multiverse. So, if x₁, ..., xₙ are samples and you want a single "most representative" number, you can assume the source distribution is Gaussian. Then maximizing the likelihood is the same as minimizing the sum of squared errors, and the answer becomes the mean. Much more natural than a city-wide spring spiderweb.
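Both claims are easy to sanity-check on the set 0, 1, 2, 3, 9 from the beginning of the post (plain Python, brute force over a grid of candidate points):

```python
data = [0, 1, 2, 3, 9]
# Candidate locations: a fine grid covering the data range
candidates = [i / 100 for i in range(0, 901)]

# Sum of absolute deviations is minimized at the median
best_abs = min(candidates, key=lambda a: sum(abs(x - a) for x in data))
# Sum of squared deviations is minimized at the mean
best_sq = min(candidates, key=lambda b: sum((x - b) ** 2 for x in data))

print(best_abs, best_sq)  # 2.0 (the median) and 3.0 (the mean)
```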
Agentic RAG Legal Challenge
Why am I doing it? I read a lot about agents, LLMs and the other hot stuff that is so popular nowadays. But besides my project at work, I have no hands-on experience with these technologies. So I decided to take part in this competition. I don't think I will win any prize, but I hope to get some experience and a slightly deeper understanding of all these RAG-like buzzwords.
Informal contest description: the main idea is to create a system which takes a bunch of PDF files as input, preprocesses them, and then answers questions. It's important to track telemetry: how fast the first token was generated, which chunk was retrieved from the RAG system, and the total number of tokens burned.
If you want to read more, feel free to check:
📖 participant guide
⚗️ short version of QA session
Please, react with fire (🔥), if you want to read more about this competition.
The Farmer Was Replaced — navigation
🛸 From time to time I write about the game The Farmer Was Replaced.
Unexpectedly it turned into a nice playground for experimenting with algorithms, navigation on toroidal grids, and even small distributed systems.
👇👇👇 Below is a collection of posts about TFWR.
If you want to start quickly
These five posts contain most of the ideas.
52 — The Farmer Was Replaced tg web_en web_ru
59 — Torus navigation tg web_en web_ru
67 — TFWR. Labyrinth tg web_en web_ru
71 — TFWR. Spawn drones tg web_en web_ru
91 — Sunflowers. MapReduce tg web_en web_ru
Full list
💣 Introduction
52 — The Farmer Was Replaced tg web_en web_ru
54 — TFWR. Navigation tg web_en web_ru
🧭 Navigation on the field
59 — Torus navigation tg web_en web_ru
🌽 Labyrinth and maze traversal 🌽
67 — TFWR. Labyrinth tg web_en web_ru
117 — Left hand rule in The Farmer Was Replaced tg web_en web_ru
119 — TFWR. Left hand maze traversal tg web_en web_ru
🏋️ Sorting tasks 🏋️
70 — Bubble sort of one cacti column in TFWR tg web_en web_ru
115 — 2D sort tg web_en web_ru
116 — 2D insertion sort. Implementation tg web_en web_ru
✈️ Drones and distributed programming ✈️
71 — TFWR. Spawn drones tg web_en web_ru
78 — TFWR. Drone coordination tg web_en web_ru
🗺 MapReduce experiments 🗺
86 — MapReduce in The Farmer Was Replaced. Part I tg web_en web_ru
90 — MapReduce in The Farmer Was Replaced. Part II tg web_en web_ru
91 — Sunflowers. MapReduce tg web_en web_ru
🧙♂️ Small optimization trick 🧙♂️
122 — Dict vs % tg web_en web_ru
📢 Discussion 📢
110 — For TFWR solutions discussions tg web_en web_ru
A fleeting moment of joy. Vibe analytics.
Thank you very much for the supportive reactions to the previous post about the RAG competition.
A few words about the progress. On Sunday I spent quite a solid chunk of time playing with the available data.
The idea of the competition is that you get an initial portion of data: 30 PDF documents and 100 questions. You are to prepare a pipeline which can digest these documents and build a system that quickly answers questions.
I started with the questions. The prompt "figure out groups of questions with minimal size so questions in each group are as similar as possible" gave me 7 groups of questions.
Then "analyze these questions and figure out a relational schema which allows us to address them in the most efficient way" gave me quite a complex relational schema, which was implemented in SQLite. It is definitely not an architecture that will win the contest, but it is a solid basis for creating a golden set of answers against which we can compare the answers of our quick RAG system.
Then "work without interruptions, create and check hypotheses, evaluate them, find proper answers to all questions in the contest, save explanations" gave me 100 answers. A short review of a few cases showed me that the references to documents are quite solid, and it totally makes sense to try this approach as the first submission.
There are several categories of questions. Some of them require a precise answer (a number, list, name or boolean), some require free text. We have a 1.0 score on precise answers and the nice value 0.693 on freeform answers.
I think it's a big step for our team, and it's very rewarding to get 7th place after one day of work, out of 144 participants with non-zero submissions and 300 registered teams.
TFWR. Simple pumpkin harvest
The goal of this post is not to give you a polished and optimized pumpkin harvester for the TFWR game. I want to show you the simplest possible harvester, as I know it, and explain step by step how it works.
def best_move(d1, d2, n1, n2):
    d, n = ((d1, n1), (d2, n2))[n2 < n1]
    for _ in range(n):
        move(d)

def nav(x2, y2):
    x1, y1 = get_pos_x(), get_pos_y()
    n = get_world_size()
    best_move(East, West, (x2 - x1) % n, (x1 - x2) % n)
    best_move(North, South, (y2 - y1) % n, (y1 - y2) % n)
My favorite navigation. Why it works is written here.
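To make the modulo trick concrete, here is a tiny pure-Python check (the function name torus_steps is mine, not a game builtin):

```python
def torus_steps(x1, x2, n):
    # On an n-cell ring, steps going "east" (increasing x, wrapping at n)
    # and "west" (decreasing x, wrapping below 0)
    east = (x2 - x1) % n
    west = (x1 - x2) % n
    return east, west

# From x=1 to x=11 on a 12-cell farm: 10 steps east or just 2 steps west,
# so best_move picks West. Note east + west == n whenever x1 != x2.
print(torus_steps(1, 11, 12))  # (10, 2)
```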
n = 12
set_world_size(n)
Just for the sake of the video let our farm be 12 cells long. If you don't have this function yet, you can measure your farm with
n = get_world_size()
Let's start with tilling the whole farm.
for y in range(n):
    for x in range(n):
        nav(x, y)
        if get_ground_type() == Grounds.Grassland:
            till()
Two loops allow us to visit every cell on the farm. It's a good idea to check the ground type, because without this check the script would flip the ground type to the opposite on every visit.
Now let's plan our pumpkin endeavor.
plan = []
for y in range(n):
    for x in range(n):
        plan.append((x, y))
We created an array of tuples with all cells in which we want to have mature pumpkins.
while plan:
    x, y = plan.pop(0)
    nav(x, y)
    if not can_harvest():
        plant(Entities.Pumpkin)
        plan.append((x, y))
while plan
is quite an interesting construction. It executes the loop body while there is something in the list plan. You could rewrite the condition as len(plan) > 0, but while plan is more pythonic.
Then we pop a pair of coordinates from the beginning of the list and go to the (x, y) cell. Now the trickiest part, at least for me, starts.
A pumpkin can be fully grown, in which case can_harvest() is True. If it is not, there are two options: the pumpkin is fine but still growing, or it is rotten. This code elegantly handles all these cases. It tries to plant a pumpkin: if one is already growing, plant does nothing; if it is rotten, plant replaces it with a new one. In either case it makes sense to check this pumpkin again, so we append the tuple of coordinates to the end of our list.
Once we are outside the loop, the plan is empty, which means all pumpkins are fully grown and we can harvest them.
Last but not least, we need to make an infinite loop to continue the harvest.
Code on github
6*6 to a really round number
One more number I can't leave unnoticed. It is not a truly round number, but I can see a few interesting facts here.
First of all, it's the voltage of household outlets. I can't help but mention the joke: "By using this outlet you get a 160V bonus".
I'm really waiting for 100000000₂, and there are only 6*6 subscribers left until this rooooound number.
Due to vector algebra, three-phase symmetry, and the properties of complex numbers, √3 ⋅ 220 ≈ 1.73 ⋅ 220 ≈ 380.
Simple wood farm
It's a simple wood farm for the beginning of the game.
def best_move(d1, n1, d2, n2):
    d, n = ((d1, n1), (d2, n2))[n2 < n1]
    for _ in range(n):
        move(d)

def nav(x2, y2):
    x1, y1 = get_pos_x(), get_pos_y()
    n = get_world_size()
    best_move(East, (x2 - x1) % n, West, (x1 - x2) % n)
    best_move(North, (y2 - y1) % n, South, (y1 - y2) % n)
n = 6
set_world_size(n)

while True:
    for y in range(n):
        for x in range(n):
            nav(x, y)
            p = Entities.Tree
            if (x + y) % 2:
                p = Entities.Bush
            if can_harvest():
                harvest()
            plant(p)
Let's check some interesting moments.
First of all, I like the nav function a lot. It lets you write clean and concise code. Write once, use forever. Until you need heavy optimization, but that's a different story.
n = 6
set_world_size(n)
I have quite an advanced farm; these lines shrink it back into the "beginning of the game" state. At the actual beginning of the game you had better use
n = get_world_size()
while True:
Let's repeat our harvesting.
for y in range(n):
    for x in range(n):
        nav(x, y)
We traverse the whole board.
p = Entities.Tree
if (x + y) % 2:
    p = Entities.Bush
a chessboard pattern for bushes and trees. Allows trees to grow fast.
if can_harvest():
    harvest()
quite an important trick. Prevents young trees from being chopped down. Without this "if" one can chop a tree without getting a reward for it.
plant(p)
plant the plant again.
Happy woodchopping!!!
Marionette Codex
Today I want to share one technical insight.
Suppose you are building your own project and you want to add an agent into it. At first glance, it sounds simple: send a prompt, get an answer, repeat, call a tool.
But very quickly it turns out that writing your own harness is not simple at all.
A useful agent must be able to inspect files, write tiny code snippets, run commands, patch something, continue the conversation, call tools, and do all this in a secure and flexible way. Supporting such a thing yourself is hell on Earth.
The nice trick is: you do not need to write all this stuff yourself.
You can just call Codex with a few flags. OpenAI documents codex exec as the non-interactive mode for scripts and automation, --json for machine-readable JSONL events, and codex exec resume for continuing an earlier automation run.
Something in this spirit:
codex exec --json --skip-git-repo-check --model <model> "<prompt>"
Then on the next turn:
codex exec resume <thread_id> --json --skip-git-repo-check --model <model> "<prompt>"
This leads to a very neat architecture:
one turn = one subprocess materialization
state = session id stored by Codex outside your app
control loop = fully yours
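A minimal sketch of such a control loop in Python. The command shapes follow the flags mentioned above, but the way the thread id appears in the JSONL stream is an assumption (the "thread_id" key below is a placeholder), so check the actual events your Codex version emits:

```python
import json
import subprocess

def build_cmd(prompt, model, thread_id=None):
    # One turn = one subprocess: `codex exec` on the first turn,
    # `codex exec resume <thread_id>` on every following one
    cmd = ["codex", "exec"]
    if thread_id is not None:
        cmd += ["resume", thread_id]
    cmd += ["--json", "--skip-git-repo-check", "--model", model, prompt]
    return cmd

def run_turn(prompt, model, thread_id=None):
    # ASSUMPTION: "thread_id" is a placeholder key; inspect the real JSONL
    # events to find where your Codex version reports the session id.
    out = subprocess.run(build_cmd(prompt, model, thread_id),
                         capture_output=True, text=True, check=True).stdout
    for line in out.splitlines():
        event = json.loads(line)
        thread_id = event.get("thread_id", thread_id)
    return thread_id
```

Your application then just calls run_turn repeatedly, passing the returned thread id back in; the conversation state lives entirely on the Codex side.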
And this is the part I like most.
You reuse the whole safety and tool-calling machinery already built into Codex, while still having total control over the agent loop in your own application. By default codex exec runs in a read-only sandbox, while --full-auto switches to a lower-friction mode with approvals on request and workspace-write sandboxing.
So the agent is not some giant subsystem embedded into your codebase.
It is just your loop around Codex, with Codex keeping the conversation state on its side.
If I had to compress the whole idea into one sentence:
Use Codex as an external agent engine: launch it turn by turn, keep the session id, and build your own loop around it.
UPD: the community shared a piece of knowledge: one can use an agentic SDK for such things. Will try. Stay tuned.
Tasks and Ribbons
There is a famous problem that comes up from time to time. I have stumbled upon it in several interviews with Big Tech companies.
It has different statements. Sometimes it's like: "there are a lot of meetings, you have their start and end times; what's the max number of meetings overlapping at some moment in time?". Sometimes: "there is a red ribbon that starts at 0 and ends at "a", and several blue ribbons whose start and end points you have in an array. Is any piece of the red ribbon visible?"
My favorite approach to this class of problems is to split each ribbon (or meeting) into a stream of events: start of meeting, end of meeting (or: scanning line meets ribbon, scanning line leaves ribbon).
Then you sort this stream of events and process it. Here you have to pay attention, because tuples (x, 1) and (x, -1), which correspond to opening and closing events, tend to sort in an unpleasant order. So either you sort in reverse order by the second tuple component, or you introduce the slightly odd notation (x, -1) for an opening event and (x, 1) for a closing one.
So, a lot of nuances, and you really have to think about all this weird stuff. Unless you materialize your counts: you can use hash maps to store the number of opening and closing events at each point. Then the code is nice and simple.
And one more idea. When you are asked to maximize the dot product of two arrays of non-negative numbers, sort them both, multiply elements pairwise and sum the results.
If this topic is interesting and you want to discuss program line by line, react with 🔥
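For the "max overlapping meetings" statement, the event-stream approach fits in a dozen lines of plain Python. Here the default tuple ordering works in our favor: at equal coordinates (x, -1) sorts before (x, 1), so a closing event is processed first and back-to-back meetings don't count as overlapping.

```python
def max_overlap(meetings):
    # One event per endpoint: +1 when a meeting opens, -1 when it closes
    events = []
    for start, end in meetings:
        events.append((start, 1))
        events.append((end, -1))
    # Default tuple sort: by coordinate, then by the +/-1 marker,
    # so at ties the closing event (-1) comes first
    events.sort()
    best = current = 0
    for _, delta in events:
        current += delta
        best = max(best, current)
    return best

print(max_overlap([(0, 2), (1, 3), (2, 4)]))  # 2
```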
In the friendly chat we are having a friendly dispute on:
whether you should write code by hand when you are studying. You can find the arguments in the comments to the post. Your opinion is important!
Anonymous Poll
64%: You are to write code by hand when studying
9%: It's enough just to read
27%: It's Friday, dudes!
The reference to the "friendly chat" didn't appear in the poll. So:
friendly channel
friendly chat
friendly message
Links are not quite working, figuring out why...
#shitposting #meme
It's Friday!
Last week I found a channel, and I want to share some of its content because it's hilarious. I spent Monday skimming through it. It turns out these memes are universal and can be applied to quite a wide range of situations. For example, I used the first picture today while presenting my work from the past two weeks.