Algorithms. Physics. Mathematics. Machine Learning.
171 subscribers
86 photos
7 videos
40 links
DIY projects, fun with 3d, electronics, programming, vibe coding, math, ML algorithms.
Download Telegram
This media is not supported in your browser
VIEW IN TELEGRAM
Bubble sort of one cacti column in TFWR

This game provides quite a nice opportunity to see how different sorting algorithms work. In this post, let's discuss the famous bubble sort.

You can check the code on github.

Let's check how it is built.

best_move is a helper function for nav. nav allows us to move the drone to any given coordinates on the field.

Then, in the script I set the farm size to 16, just to test moving the drone to the point (6, 6). Till the soil, so we can plant cacti. Then - plant cacti. And, finally, lines 32-36 - bubble sort.

For me, here it's interesting to clearly set variables and their physical sense. I don't want to do unnecessary work and I want to correctly handle corner cases. So:
* y_upper - is the last cell we want to swap cacti with

Let's stop here and think, what we can derive from this statement. It means that y_upper moves from (n-1) to 1 inclusive. So, small subtask - to set range for y_upper correctly. It's range(n-1, 0, -1).

After that, everything is simple. The inner cycle, which sets the drone's position, just loops from 0 to y_upper - 1, which gives a simple range(y_upper). Basically, that's it. The measure() function from the game is very handy for this task, comparison with the upper cell is just
measure() > measure(North)

as well as swapping with this cell
swap(North)


One more interesting thing. The map is on the torus, so in the beginning of sorting the drone "rotates" in one direction. But when more than half of the column is sorted, it starts to move back and forth.
TFWR. Spawn drones.

In "The Farmer Was Replaced" game there is quite an interesting option - one can spawn drones in order to operate the farm more efficiently. This option is advanced because of language tricks and game limitations. Nevertheless, the problems and opportunities are quite similar to those which "adult" programmers have with multithreading and multiprocessing. Therefore, it's quite an interesting topic to master.

In this post, I just want to show the very basic use of this technology. I drew a small smiley on a piece of paper and put the coordinates of pixels in this simple picture into a list of tuples. Then I ask one drone to visit these places one by one and call the spawn_drone() function at each place. This spawn_drone function requires a function as a parameter - hello, metaprogramming - and executes this function in the new drone. This time the function is very simple: wait_forever. It just does nothing. But an infinite loop inside the function keeps the drone alive. When the function ends, the corresponding drone ceases to exist.

All in all, you can now see a smiling face made of drones. Happy hacking!

Code on github
🔥2
GBDTE. Culprit is found.

I started to dig really deep and, with the help of the iron friend, I checked several hypotheses. I found the optimal expression for the ideal model for this dataset. And this result is worth a separate publication. It was interesting to find out that my intuition about the linear model and linear lift was totally correct: in the dataset I constructed, the optimal model's score linearly depends on time.

Today I decided to compare all these theoretical results with one step of my model. You can see the picture at the beginning of the article. This time the problem is in vibecoding. The model decided that it's a good idea to put all features into an extra part of the dataset. It's quite obvious what to do next: separate them and test again. Stay tuned.
LLM in education

Sorry for this quite a rather unoriginal topic, but it bothers me. I'm teaching my son programming. So far we have discussed it in a classical way: variables, loops, algorithms. We spend hours discussing how to build stacks, queues and heaps over arrays. We wrote qsort, merge sort in Python, and now in C++.

Recently it's almost impossible to avoid "vibe coding" – Visual Studio Code pastes like ten lines of code when you just define a function drawSun, and these lines are exactly the same you wanted to discuss and carefully build word by word.

I'm trying to adapt. We accept these lines, discuss them, and I'm trying to urge my son to learn the ideas behind them.

But I'm unsure. Maybe it's better to become just an LLM Luddite during the learning process? Or separate "vibe coding" lessons from the usual lessons? One time we are trying to build an application as fast as possible, and another time we are training to write all these nuts and bolts from scratch?

Please, feel free to share your thoughts on my case in comments. Opinions and best practices are totally appreciated.

There is no policy like "english only". I'm writing in English just for practicing the writing skill. There is autotranslate function turned on. You can write comments in English, Russian, whatever works for you best.

Нет никакой политики насчёт языка. Я пишу посты на Английском только для прокачки писательского скила. Стоит автопереводчик, давайте общаться, как кому удобнее.
Topics request

Now I have a list of topics for the nearest posts.

* video for the first vibe coding project
* why binary features are bad extrapolation features (explanation why quality goes down when interpolating features are included into extra variables)
* import in TWFR
* relation of ROC and trees
* for in python and in c-group languages
* superformula
* [(0,0), (1,0), (2,0), (3,0), (9,0)]
* list of interesting disney science works
* 3d model for toricelly jets
* jets intersection problem for viscous speed constraint
* left hand maze traversal + tremaux's algorithm
* temperature drop due to the salt in snow
* pay for google AI services
* viscosity-limited velocities for jets
* TFWR 300 reuses - random walk
* top works in Computer Science in last 30 years
* insertion sort with multidrones
* TFWRF - companion 3x3 resource farming
* operator % - properties, tasks, problems
* operator ^ - properties, tasks, problems
* quinto-quartian circle in music
* transposition of melody using %
* recall that weird algorithm of sorting with decks
* pumpkins with queue
* 4 definitions of binomial coefficient
* Remainder by 2 for binomial coefficient
* Shap, boosting features
* bridge from Logistic optimization to optimal Bayesian model
* compare logistic regression with optimal Bayesian model on Titanic
* derive neutral value for absent factor
* continue to work with Bayesian models on Titanic
* Prony's method
* Binomial coefficients as a crossroad
* two unpaired numbers in array with pairs
* fast iterator/slow iterator
* calculate polynome value with data in iterator

Feel free to post what bothers you in comments. I'll consider your themes for future posts. Let's build an interesting place together. (russian or english - either way works)

English:
pole
Martin: break your fast
morphological intuition flails
Algorithms. Physics. Mathematics. Machine Learning. pinned «Topics request Now I have a list of topics for the nearest posts. * video for the first vibe coding project * why binary features are bad extrapolation features (explanation why quality goes down when interpolating features are included into extra variables)…»
The First Vibe Coding Project

In the chat about TFWR, one of the subscribers asked how to learn Python and HTML. I answered like this: “Feel free to vibecode a small application that uses Python, HTML, and CSS, and then use an LLM to talk about this application.” Word for word, I promised to show how to create such applications. And here we are.

Let's assume that we have a subscription to one of the contemporary LLM systems. I'll use Codex from OpenAI, but you can do the same with Claude or one of Google's models. They are approximately the same for simple tasks like this one.

Then I want to create a repository on GitHub. Basically, it's just the “Create a new repository” button. I like to create a README file, so don't forget to press this button. Then clone the repository with the git clone command. All these steps are optional for you but mandatory for me, because I want to put a reference to this project at the end of this post.

The main prerequisite for this tutorial is that you have an OpenAI subscription. Let's start our practice.

I can't memorise the command, so I asked Google “how to install Codex CLI” and got npm install -g @openai/codex. I ran it in the clone of my repository and... Attention! This is the most interesting part. I asked it:

Create an app where I can manage a simple todo list. This is a learning project; I mainly want to understand how to write programs in Python, CSS, and HTML. Use Python for the backend, and HTML, CSS, and JS for the frontend. The app must let me enter a new task for the list, mark tasks as completed, and delete a task from the list. The user interface should include an input field, an add button, and a delete button. Each task in the list should be markable as done.


And basically, that is it. You can check the result in the github repo . A screenshot of the result is at the beginning of this post.

You can check both en and ru versions of my prompt.
This media is not supported in your browser
VIEW IN TELEGRAM
TFWR. Drone coordination.

Last time we discussed how to create drones. Today let's discuss how to coordinate their activity.

In the game there is quite a limited number of tricks that allow you to coordinate drones. Let's use the approach with "the central coordinator drone" and use the number of drones as a synchronization tool.

I think one of the simplest tasks is to plant sunflowers, so let's do it. The pool of tasks is the pool of columns. I want to keep alive the main drone we have on screen and ask it to spread tasks over the other drones. Let's discuss the nuts and bolts of this approach.

Let's check the spawn_drone function. It takes as a parameter a function which operates a new drone. I have 32 drones and 32 columns. So it's necessary to create a tool for the creation of these parameterized drone functions. Let's do it with the "factory function" approach. It takes the number of a column as a parameter and returns a drone function with proper parameterization.

def f(column):
def g():
one_drone_plant_job(column)
return g


A simple function to plant all sunflowers in a given column:

def one_drone_plant_job(column):
n = get_world_size()
for y in range(n):
nav(column, y)
if can_harvest():
harvest()
if get_ground_type() == Grounds.GrassLand:
till()
plant(Entities.Sunflower)


And, last but not least, the code for the manager

Let's create a list of tasks

tasks = []
for t in range(get_world_size()):
tasks.append(t)


Dispatch them between drones

while True:
tasks = []
for t in range(get_world_size()):
tasks.append(t)

while tasks:
t = tasks.pop()
while num_drones() == 16: # max_drones():
pass
spawn_drone(f(t))


This "pass" loop is an "interprocess synchronization" tool that uses the number of drones as a synchronization object. When the "while tasks" loop finishes, the job is done - the field is full of sunflowers.

Code on github
Programming flood

To discuss programming tasks in comments
A short update on GBDTE presentation.

The die is cast. I set the date for my presentation on Gradient Boosted Decision trees. It's Jan 15. I'm aiming for a 45 minutes presentation, so like 30 slides to make. The timeline isn’t actually tight, but there’s no opportunity to do nothing either.

So I decided to stop here, where I am now, and work on the presentation. The current point is:
🌳there is an efficient multithreaded Golang implementation with python bridge
🌳there is a synthetic MSE dataset
🌳there is a synthetic LogLoss dataset

datasets demonstrate that the model is working and can handle hundreds of thousands of records

The crazy picture is about my predecessors:
🌳2017 LinXGBoost: Extension of XGBoost to Generalized Local Linear Models arxiv
🌳2019 Gradient Boosting with Piece-Wise Linear Regression Trees (GBDT-PL) — IJCAI arxiv
🌳2023 Fast Linear Model Trees by PILOT arxiv
🌳2024 PINE / PINEBoost: Efficient Piecewise-Linear Trees for Gradient Boosting arxiv

I started to think about them because there is quite an interesting thing. There are papers in which the authors try to mix extrapolating and interpolating features. My point is that it is not a good idea, because it breaks a fundamental idea behind this method: group objects with one type of feature and capture trends with another.

My inadvertent experiment demonstrated that the performance drops when you do it. And I think I know how to explain why.
🔥1
the point of no return has been crossed

🔪The Rubicon has been crossed.
🔪We've crossed the point of no return.
🔪There’s no turning back now.
🔪The die is cast.
🔪The bridge behind us has been burned.

So, except "The die is cast" expressions are quite similar to one we use in Russian.
sin(a+b), cos(a+b)

Hello everyone!

Today, while doing my dishes, I suddenly realized that sin and cos of a sum can be derived through the consecutive matrix multiplication. But I definitely remember that in school our teacher derived it through the Pythagorean theorem. Can anyone explain why?

P.S. I honestly think that in school sin(a+b) and cos(a+b) should be axiomatic basis, but if, suddenly, someone wants to derive it, what it the best way to do it?

Всем привет!

Сегодня, перемывая посуду, я внезапно обнаружил что косинус и синус суммы выводятся из последовательного применения операции поворота координат. Но я абсолютно точно помню что в школе эту формулу выводили из теоремы Пифагора. У кого-нибудь есть идеи, почему?

З.Ы. На самом деле я считаю что эти формулы в школе нужно тупо выучить наизусть и выводить из них всю остальную тригонометрию. Но если уж решили выводить, то как лучше?
Heteronym of the day

SOW

🐷 noun: "female pig", sound: /saʊ/
🐷 verb: "to plant seeds", sound: /soʊ/
Sunflower achievement

We have already started to discuss multidrone programming. I applied this technique to get the "Sunflower" achievement. I'm not quite happy with the code yet, but here we are: it works.

code - code to get this achievement
doc - crucial functions

I want to clean up this code and then I'll post a more thorough explanation of how it works.
Media is too big
VIEW IN TELEGRAM
MapReduce in The Farmer Was Replaced

Code discussion. Part I.

27 years ago Google was born. One of its key ideas was PageRank – a “credibility” score for a web page. (the paper) To compute it quickly, you need a lot of computational power. One option is to build a specialized supercomputer, but Google as a company rejected that path. Instead, they decided to use large arrays of commodity‑class hardware, connected into a network. You have to carefully orchestrate programs that run on such a platform, and back then writing distributed computation was almost like witchcraft.

In 2004 a new era of distributed computation started – the era of MapReduce platforms (see the paper by Dean and Ghemawat). The idea is simple: split your data into pieces, run the same “map” function in parallel on many machines, then combine (“reduce”) the partial results into one final result.

This post is about adapting that approach to the sunflower‑collection challenge in the game “The Farmer Was Replaced”.

The in‑game task has three stages: prepare the soil, plant sunflowers, harvest. We want to use the 5× bonus, so we try to collect sunflowers in order of measure(): from highest to lower. I’m talking about a late part of the game, when I have a 32×32 field and 32 drones, so I can assign one drone to each column or row.

I wanted a tiny “framework” to assign tasks to drones and run them in parallel. Because we take measurements, we also need to collect data from different drones and merge it. The core of this framework is the function drone_map_reduce from the script.

The function drone_map_reduce takes three parameters:

⌖ objects – things to process
⌖ f_map – a function that processes a batch of objects
⌖ f_red – a reducer function that combines results from each batch into one big result

GitHub reference

The drone API we rely on:

spawn_drone(f) – starts a drone running function f and returns a drone id
⌖ a drone “disappears” when f finishes
wait_for(drone_id) – waits for the drone to finish and returns its result

First we create slots for drones and distribute tasks between them:

def drone_map_reduce(objects, f_map, f_red):
drone_jobs = []

for _ in range(max_drones()):
drone_jobs.append([])

ind = 0
for q in objects:
drone_jobs[ind % len(drone_jobs)].append(q)
ind += 1


Now each element in drone_jobs is a batch of work for one drone (or for the main thread).

Then we start the drones and also process some batches in the main thread:

    results = []
drones = []

while drone_jobs:
while drone_jobs and num_drones() < max_drones():
current = drone_jobs.pop()
drones.append(spawn_drone(f_map(current)))
if drone_jobs:
current = drone_jobs.pop()
results.append(f_map(current)())


After that we collect all results from the drones:

    for current_drone in drones:
results.append(wait_for(current_drone))


And finally we combine everything into one result using f_red:

    reduced = None

for r in results:
if reduced is None:
reduced = r
else:
reduced = f_red(reduced, r)

return reduced


A few important details:

⌖ To create a valid drone job, f_map expects a list of the same type of elements that you store in objects.
⌖ We create up to max_drones() batches, so in the inner loop we just spawn drones until num_drones() reaches max_drones().
⌖ The code still works when the number of tasks is less than the number of drones (for example, with set_world_size(3)); some batches just end up empty.
⌖ If your drone function doesn’t return anything (None) and you don’t care about results, you can pass a dummy reducer like no_reduce that ignores its arguments. Or even just None value for reducer function because it will be never called.

Next time: how to build actual game logic on the top of this instrument.
🔥2
GBDTE presentation. Unstable class.

I’m working now on a presentation for my upcoming public talk. Here’s a picture that illustrates a small but telling idea: an applicant with exactly three unique phone numbers in their applications looked suspicious in 2014, neutral in 2015, and honest in 2016.
Word of the day

I’m in ɔ of this world.
Three letters → one ɔ.
Competitive programming guides.

When somebody asks about best guides for competitive programming, my first reaction is:
* A. Shen: "Programming: theorems and problems"
* Cormen Leiserson Rivest Stein: «Introduction to Algorithms»

Web:
* max algo - static site with a lot of interesting and useful algorithms
* sirius pythonic algo courses
* sirius c++ algo courses

New for me resources:
* competitive programming chectsheet https://usaco.guide/
* demos for algorithms https://t.me/python_ready/1986
MapReduce in The Farmer Was Replaced

Code discussion. Part II.

In the previous post we discussed the code which uses map reduce approach to farm sunflowers. We reviewed the drone_map_reduce(objects, f_map, f_red) function line-by-line and realized that:
👉🏻 objects is an array of tasks descriptions
👉🏻 f_map - a factory function that creates a task from its description
👉🏻 f_red - reducer function, merges results into one big result if necessary

Today we are going to discuss how this framework applies to solving three sunflower tasks: till the entire field, plant sunflowers, harvest sunflowers with x5 bonus.

Till the field

def till_field():
columns = []

for x in range(get_world_size()):
columns.append(x)

def till_column(columns):
def g():
for q in columns:
for p in range(get_world_size()):
nav(q, p)
till_soil()
return g

drone_map_reduce(columns, till_column, no_reduce)

Let's check this code. First, we prepare a list of columns to till. Then we prepare the factory function till_column, which takes a list of columns to process and returns... Attention!!! - a function to till all these columns. It uses helper functions: nav for navigation and till_soil for tilling grassland into Soil (it doesn't till it back). We use quite an interesting trick - nested function g inside the function till_column. It captures the columns variable and therefore is self-aware, it can run with no parameters. In "normal" programming it is like "partial" function which binds parameters with values. This function is called inside the spawn_drone function and determines the drone's actions and its lifetime.

Plant sunflowers

def plant_sunflowers():
drone_jobs = []

for q in range(get_world_size()):
drone_jobs.append(q)

def drone_job(columns):
def g():
heights = dict()
for q in columns:
for p in range(get_world_size()):
nav(q, p)
plant(Entities.Sunflower)
heights[(q, p)] = measure()
return heights
return g

def reduce_hights(h1, h2):
for k in h2:
h1[k] = h2[k]
return h1

return drone_map_reduce(drone_jobs, drone_job, reduce_hights)


The first part is quite familiar - forming a list of columns to process. The drone_job function is again a factory of functions. Again, these functions are quite similar to previous till_column. But now we have an interesting detail. Each sunflower has its own petal count and we are to use these petals to determine which sunflower to collect first. So, after planting a sunflower we call the measure() function and put its result into a dictionary:
heights[(q, p)] = measure()

Keys of this dictionary are tuples of coordinates and the values - the numbers of petals.

And last, but not least detail: reduce_heights function. It takes two dictionaries as parameters, updates the first with the values from the second and returns this first dictionary. Here the logic of the whole affair is a little bit complicated and I intentionally did it. In my first version the code was clearer: every time I created a fresh dictionary and filled it with the content of two parameters. But in this case drone suddenly stopped for a noticeably long time. It turned out that the price of clean code was quadratic complexity of the algorithm. Therefore I changed it and now you see messier, but faster code.

I'm not sure that one post can contain so much code, so let's use links to github. Starting from this line I convert a dictionary from format (x,y) -> h to h -> [(x1, y1), (x2, y2), ...]. For each height of sunflower now we have a list of cells with sunflowers of this height in these cells.

Finally , we use our map_reduce approach to collect all sunflowers in a correct order. "gather_all" is a factory function for this task.
🔥1