Algorithms. Physics. Mathematics. Machine Learning.
DIY projects, fun with 3d, electronics, programming, vibe coding, math, ML algorithms.
It's alive!

Finally I ran the full cycle of training and applying my EGBDT model in JupyterLab.

I spent two days in a very unpleasant debug session because I broke a simple rule:

Always do EDA!

EDA—Exploratory Data Analysis—is simple: before you do anything with your data, get a taste of it. Check the mean of the target and features. Take a small sample and read its raw dump. Plot histograms of your factors. Do smoke tests.
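A minimal sketch of that routine, assuming the data lives in a CSV and is loaded into a pandas DataFrame (the file name and columns are hypothetical):

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("dataset.csv")   # hypothetical file name

# Smoke tests: shape, a raw sample, basic statistics of target and features
print(df.shape)
print(df.sample(5))               # read a small raw dump
print(df.describe())              # means, std, min/max of every column

# Histograms of the factors
df.hist(bins=50, figsize=(10, 8))
plt.tight_layout()
plt.show()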

Instead, I just downloaded the dataset and jumped straight into training. The best I saw was 0.2 MSE on train and 0.3 on test. I started suspecting deep, fundamental problems—some math interfering with my plans.

Then a very simple thought: plot the graphs. Nothing extraordinary—just a basis-function factor over time.

It turned out my iron friend used sin(t) instead of sin(50t). I was trying to approximate a high-frequency signal with a low-frequency one.
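The bug is obvious once you plot both candidates on t ∈ [0, 1]; a throwaway check like this would have caught it immediately:

import numpy as np
import matplotlib.pyplot as plt

t = np.linspace(0, 1, 1000)
plt.plot(t, np.sin(t), label="sin(t)")         # almost a straight line on [0, 1]
plt.plot(t, np.sin(50 * t), label="sin(50t)")  # roughly 8 full periods
plt.legend()
plt.show()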

Fixing that made the MSE zero. On the first iteration.

Incredible—and incredibly unsatisfying to spend two days on something so simple: skipping EDA at the start.
Some simple EDA steps: number of components in the whole dataset
The first group in the dataset
The key graph for catching the culprit: basis functions. With the wrong basis functions there were no periods, just one slope.
Retrophotos. Physics.

It’s a photo from my previous life as a physicist. To be honest, it’s one of the greatest surprises of my life. You take a glass-clear piece of diamond—perfectly transparent and homogeneous. You put it in the electron microscope, close the lid, pour liquid nitrogen into the vacuum pumps, and wait four hours. Then you start and tune the electron-beam system, cool the sample holder with liquid nitrogen, adjust the optical system—and then… you see this picture. It’s a natural diamond, and the growth sectors are clearly visible. You can see blue, orange, and green lines of luminescence.

Blue region — N3 center (λ ≈ 415 nm), an aggregated-nitrogen defect.
Green — H3 center, formed by irradiation + annealing (often enhanced by plastic deformation).
Yellow — NV⁰ center at 575 nm (nitrogen + vacancy).

The electron microscope was half of the setup. The other half was a fairly large spectrometer. We recorded spectra in different areas of the samples and tried to capture the diffusion of vacancies.

Those days gave me the habit of writing down, very carefully, everything I do in an experiment. When you're writing, everything feels obvious. A month later, it's anything but obvious, and you curse the guy who didn't put in enough effort to record the crucial details you now crave while trying to write an article.
Tree. From Gradient Boosted Decision Trees.

When playing with some technology or algorithm, my favorite moment is that elusive, transitional state when it’s still a little bit “wtf?” and yet already a solid—though not yet boring—tool. Gradient-Boosted Decision Trees with Extrapolation (GBDTE) is exactly there right now.

In earlier posts I explained how I built a dataset for testing this ML idea. The image shows one training step of the algorithm on that dataset. Let’s unpack what’s going on. In the next post I’ll introduce the four basis functions: 1, t, sin(kt), cos(kt). Our dataset contains eight groups of points, and each group is a linear combination of those basis functions. So the model’s task is two-part: first, classify points and assign them to a group; second, fit the best approximation for each group.
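A sketch of how such a dataset can be generated. The eight groups, the binary static factors, and the basis 1, t, sin(50t), cos(50t) follow the posts; the group weights, point counts, and column names are made up for illustration:

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_points_per_group = 200
rows = []

for g in range(8):
    # three binary static factors f0, f1, f2 encode the group id
    f0, f1, f2 = (g >> 0) & 1, (g >> 1) & 1, (g >> 2) & 1
    # one weight per basis function: 1, t, sin(50t), cos(50t)
    w = rng.normal(scale=0.3, size=4)
    t = rng.uniform(0, 1, n_points_per_group)
    y = w[0] + w[1] * t + w[2] * np.sin(50 * t) + w[3] * np.cos(50 * t)
    rows += [{"f0": f0, "f1": f1, "f2": f2, "t": ti, "y": yi} for ti, yi in zip(t, y)]

df = pd.DataFrame(rows)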

Let’s check ourselves. In a later post I highlight two of the most prominent groups. We’ll locate them in the tree, inspect their weights, then find them on the picture and compare our interpretation with the graphs.

Just for fun, let's inspect the bubble with id=4. Read it from top to bottom: 0.20 means that at t = 0 this component should have the value 0.2. The next value, 0.09, means a slightly rising trend. For sin and cos we have zeros, so there are no oscillations in this component. Now we can find the values of the f-parameters that describe this component. Take the right-left-left route from the root; in terms of factor values it's (0, 0, 1). Be careful: on the tree picture the factors are f0, f1, f2, while on the pictures with components they are f1, f2, f3. My bad. Check the picture below: you will see that our description of this component is completely correct. It works!

In the opposite direction: the second curve has a much steeper tilt, so we can expect a bigger value of the second weight. The curve rises, so the second weight in the leaf is positive. The intersection with the Oy axis is lower, so the first weight of the leaf should be close to zero. Some oscillations are visible, but not very prominent, so we can expect small non-zero third and fourth weights. The static factors are (0, 1, 0), which reads as a left-left-right sequence on the tree diagram and leads to the node with id=1. The weights are -0.01, 0.26, 0, -0.01. A perfect match, I'd say.

And the third curve: factors (1, 0, 0), route left-right-left, node id=2, weights 0.03, 0.18, 0.09, -0.02. This one finally has quite a prominent harmonic part, and it is clearly visible on the curve.
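The reading rule above is easy to automate: a leaf's four weights are just coefficients in front of the basis functions. A small sketch (the weights are copied from the figure, the function name is mine):

import numpy as np

def leaf_curve(w, t):
    # Reconstruct a component from leaf weights w against the basis 1, t, sin(50t), cos(50t).
    return w[0] + w[1] * t + w[2] * np.sin(50 * t) + w[3] * np.cos(50 * t)

t = np.linspace(0, 1, 200)
curve_id4 = leaf_curve((0.20, 0.09, 0.00, 0.00), t)    # flat, slightly rising, no oscillations
curve_id1 = leaf_curve((-0.01, 0.26, 0.00, -0.01), t)  # steeper trend, barely any harmonic part
curve_id2 = leaf_curve((0.03, 0.18, 0.09, -0.02), t)   # prominent harmonic part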

Quite an impressive result for one step of the algorithm, isn't it?

While the trends are generally good and the tree matches the graphs, one small issue bothers me: the MSE shows a mismatch at the level of 0.0003. I don't understand why. Yet.
Four basis functions.

To build the MSE dataset I used four basis functions: 1, t, sin(50t), cos(50t). Why these four?

𐂅 Constant never hurts — it lets the curve shift up or down.
𐂅 Linear captures any overall trend.
𐂅 sin and cos with the same frequency come as a pair: by mixing their weights you set both amplitude and phase (i.e., you can place the peaks where you need them).

Why 50?
Because on 𝑡∈[0,1] the period 2π/50 roughly matches the width of the letters in the title I’m approximating, so the oscillations align with the letter shapes.
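The amplitude/phase claim about the sin/cos pair is just the standard identity a·sin(50t) + b·cos(50t) = R·sin(50t + φ) with R = sqrt(a² + b²) and φ = atan2(b, a); a quick numeric check:

import numpy as np

a, b = 0.09, -0.02                    # arbitrary weights of the sin/cos pair
t = np.linspace(0, 1, 1000)
R, phi = np.hypot(a, b), np.arctan2(b, a)

lhs = a * np.sin(50 * t) + b * np.cos(50 * t)
rhs = R * np.sin(50 * t + phi)
print(np.allclose(lhs, rhs))          # True: the weights set amplitude and phase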
Three out of eight

Let’s inspect three of the eight components (the colored figure shows the full title). The first component captures the “hats” of the T’s and the upper parts of B and O; the second picks up interior points that happen to lie on steep slopes; the third follows portions of the vertical strokes.
The Absolute Evil of Default Parameters

Default parameters are evil. Seriously. It's a very convenient evil—so convenient that it crawled into my library.

On the first image there is a call to train a model with:

🐴n_stages = 1: one step of boosting (one tree)
🐴learning_rate = 1.0: jump immediately into the minimum of the current parabola
🐴max_depth = 3: use three static variables to separate objects into groups (8 groups)
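The call looks roughly like this; only the parameter names and values come from the screenshot, the class and method names here are hypothetical:

# Hypothetical names; the parameters are the ones listed above.
model = GBDTE(
    n_stages=1,         # one boosting step: a single tree
    learning_rate=1.0,  # jump straight into the minimum of the current parabola
    max_depth=3,        # three static splits -> up to 8 groups (leaves)
)
model.fit(X_train, y_train)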

And that seems to be it. So why do we have a non-zero loss? No clue.

God bless ChatGPT. I uploaded the dataset and explained the hypothesis: there’s something odd—try to find the best fit of the extra parameters to the target in each group defined by the static features. Then I went to brush my teeth. When I came back, the answer was ready: the dataset is OK, the MSE is about 1e-32—zero in terms of double precision.

I scratched my head, zipped my whole repository, uploaded it to ChatGPT, and asked: “Find me the source of 0.0007 RMSE.” Then I started my daily English routine. Fifteen minutes later the answer was ready:

F**ING DEFAULT PARAMETERS

My iron friend dug out this line:
🐴reg_lambda: float = 1e-4

This L2 regularization parameter forced the model to miss points in the training dataset and caused that small but annoying mismatch.
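Why even a tiny reg_lambda shows up in the loss: in XGBoost-style boosting with squared loss, a leaf's value is the sum of its residuals shrunk by lambda, w = sum(r) / (n + lambda), so each leaf slightly undershoots the exact mean of its points. A toy illustration (not my library's code):

import numpy as np

residuals = np.array([0.20, 0.21, 0.19, 0.20])  # toy residuals falling into one leaf
n, lam = len(residuals), 1e-4

exact = residuals.sum() / n            # leaf value without regularization
shrunk = residuals.sum() / (n + lam)   # XGBoost-style leaf value with the L2 penalty
print(exact - shrunk)                  # small but non-zero bias -> non-zero train MSE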

I can imagine myself trying to debug this. My iron friend saved me about two days of checking different parts of my system again and again. And, probably, the whole project would have been put on hold again.
Everybody wants to be like ewe

Track on YouTube

Just a small reminder:
🐑Homophones - "you" and "ewe": same pronunciation, different spellings and meanings
🐑Homographs - same spelling, different meanings, and sometimes different pronunciations: wind (noun) vs. wind (verb), cat (noun) vs. cat (verb)...
🐑Homonyms (broad sense) - an umbrella term covering same-sound and/or same-spelling pairs

That's not it.

🐑Oronyms: phrases that sound alike — “ice cream” / “I scream”.
🐑Homographs: same spelling, different meanings — lead (metal) / lead (guide)
🐑Heteronyms: same spelling, different pronunciations and meanings — wind (noun) / wind (verb).
🐑Capitonyms: meaning (and sometimes pronunciation) changes with capitalization — Polish / polish, March / march.
🐑Heterographs: same pronunciation, different spellings — you/ewe
🐑Polysemy: one word with related senses — head (of a person, of a department)
🐑Auto-antonyms (contronyms/Janus words): one word with opposite meanings — sanction (“approve” / “penalize”)

Mishearings

🐑Mondegreen: misheard lyric — “’Scuse me while I kiss this guy” for “kiss the sky.”
🐑Eggcorn: plausible but wrong substitution — “for all intensive purposes” for “intents and purposes.”
🐑Malapropism: wrong word, similar sound — “dance the flamingo” for flamenco.
🐑Spoonerism: swapping sounds — “You have hissed all my mystery lectures.”
The Farmer Was Replaced

In one of the ML channels I read, I found a reference to the game "The Farmer Was Replaced". In it, you write programs in a language that is quite similar to Python, but not quite Python, to solve farm puzzles.

Some of the tasks are quite basic: chop grass and collect hay, grow bushes and trees and collect wood. Carrots are completely normal too: you just spend some wood to plant them. The weird things start with pumpkins: each one rots with 20% probability, but when there is a patch of pumpkins ready to harvest, they merge together, and harvesting the resulting mega-pumpkin gives you a bonus harvest.

Then things go completely off the rails: the picture shows a dinosaur chasing an apple. Basically, you have to program an autopilot for the famous "snake" game.

There are also cacti. You have to sort them on a two-dimensional grid to get a bonus.

I really like it because it gives a chance to practice medium-difficulty programming tasks, at the 20-minute Meta/Google interview level, while playing a game.

It also brought back some memories I thought I'd forgotten. I will try to share them in this channel.
Memories Awakened by “The Farmer Was Replaced”

It seems like a very trivial question: “What should a drone do if it is on the lower (South) edge of the field and it has a move(South) command to perform?” But, strangely, this question leads to quite an interesting set of consequences.

In The Farmer Was Replaced the drone teleports to the other end of the field. That means the upper edge of the field is glued to the lower edge. As the drawing shows, a square with its upper and lower edges glued together becomes, from the topological point of view, a cylinder. Do the same with the left and right boundaries, and we get a torus.

There are other options. For instance, if the drone disappears on the lower edge and then appears on the left one (and similarly with upper and right), we have a sphere.

But that’s not all. You can also play with the orientation of the edges. If the drone disappears at the beginning of the lower edge and reappears at the end of the upper edge, we have reversed the orientation. Depending on how we finish gluing the edges, we get a Klein bottle or the projective plane. Unfortunately, you can’t embed these figures in 3D without self-intersection.
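In coordinates on the unit square, these gluings are the standard identifications (writing them out is my addition, the surfaces are as described above):

Torus: (x, 0) ~ (x, 1) and (0, y) ~ (1, y)
Klein bottle: (x, 0) ~ (1 − x, 1) and (0, y) ~ (1, y)
Projective plane: (x, 0) ~ (1 − x, 1) and (0, y) ~ (1, 1 − y)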

About memories. I first read about this in Anatoly Fomenko’s book. He also wrote “History: Fiction or Science?” (New Chronology) and, while he’s quite famous for that, he’s also a mathematician; his “Visual Geometry and Topology” is an interesting read. It even contains illustrations for “The Master and Margarita.” What I explained about gluing the edges of a square together is a simplified version of the introductory section of that book.
TFWR. Navigation.

In order to navigate the drone, there are get_pos_x(), get_pos_y(), and move() functions. move() takes East (x + 1), North (y + 1), West (x − 1), and South (y − 1) commands. get_world_size() is useful too. The world is square, so a single number (the side length) gives you everything you need.

nav(x, y) is a nice-to-have function. It moves the drone to position (x, y).

In the screenshot, a naive navigation function decides where to go—positive or negative direction—and moves accordingly. Why naive? Because it doesn't take into account the toroidal nature of the world (wrap-around at the boundaries).
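A sketch of a wrap-aware version, using only the functions the post mentions (get_pos_x, get_pos_y, move, get_world_size). It's written as plain Python for readability; the in-game dialect and the direction constants (East, North, West, South) are as the game defines them:

def nav(x, y):
    # Move the drone to (x, y), always taking the shorter way around the torus.
    size = get_world_size()
    while get_pos_x() != x:
        dx = (x - get_pos_x()) % size
        # going East costs dx moves, going West costs size - dx moves
        move(East if dx <= size - dx else West)
    while get_pos_y() != y:
        dy = (y - get_pos_y()) % size
        move(North if dy <= size - dy else South)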