Which of the two formats is more convenient for you to read? (Anonymous poll)
* Instant view ⚡️ (50%)
* Post + pictures after it 🦥 (50%)
* Who is here? ❓ (0%)
* I need more info: what are these approaches? ❓ (0%)
How to find linear superposition in chaos
Now we have a set of points which, while fairly random from a mathematical point of view, give us a depiction of the “Extra Boost” sign. For my method, I need to find several groups, each represented by a linear combination of basis functions. I set time \(t\) to go from left to right, from 0 to 1. The basis functions are \([1, t, \sin(kt), \cos(kt)]\), so the extrapolating function is \(f(t) = A + B\,t + C\sin(kt) + D\cos(kt)\).
Weights (A,B,C,D) can be estimated from the dataset using least squares, but we still need to pick k. After a set of experiments I chose k=50: it gives a convenient scale—the wavelength is roughly the width of a letter.
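As a minimal sketch of that fit (not my exact code; t and y stand for the coordinates of one group's points), numpy's least squares does the job:

import numpy as np

def fit_group(t, y, k=50.0):
    # Design matrix for the basis [1, t, sin(kt), cos(kt)]
    X = np.column_stack([np.ones_like(t), t, np.sin(k * t), np.cos(k * t)])
    # Least-squares weights (A, B, C, D)
    weights, *_ = np.linalg.lstsq(X, y, rcond=None)
    return weights

def predict(t, weights, k=50.0):
    A, B, C, D = weights
    return A + B * t + C * np.sin(k * t) + D * np.cos(k * t)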
With this setup I obtained the picture you see at the beginning of the article. Then I decided the tolerance was too large and reduced the band width.
Here we are: a narrow band.
Next, I removed points within the tolerance range and repeated the process. To my surprise, after the first iteration nothing changed.
You can see that the dots disappeared, but the curve didn’t change. After a while I understood why. It was vibe-coding: I asked my iron friend to find a curve that captures the highest number of points; instead, it wrote code that minimizes MSE. That approach has an interesting property: when you delete points lying on the curve, the MSE is unchanged, so the same curve remains optimal.
I told the iron friend that, instead of minimizing squared distance to the points, it should maximize the number of captured points. It proposed the RANSAC approach, which was new to me: repeatedly select four random points, fit the curve, count captured points, and keep the candidate with the most inliers. It worked.
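For reference, a hedged sketch of that loop (my own naming, tolerance and iteration count, not the generated code):

import numpy as np

def ransac_curve(t, y, k=50.0, tol=0.02, n_iters=2000, seed=0):
    # Keep the (A, B, C, D) curve that captures the most points within ±tol
    rng = np.random.default_rng(seed)
    basis = lambda tt: np.column_stack([np.ones_like(tt), tt, np.sin(k * tt), np.cos(k * tt)])
    best_w, best_inliers = None, np.zeros(len(t), dtype=bool)
    for _ in range(n_iters):
        idx = rng.choice(len(t), size=4, replace=False)   # 4 points pin down the 4 weights
        try:
            w = np.linalg.solve(basis(t[idx]), y[idx])    # exact fit through the sample
        except np.linalg.LinAlgError:
            continue
        inliers = np.abs(basis(t) @ w - y) < tol          # points inside the band
        if inliers.sum() > best_inliers.sum():
            best_w, best_inliers = w, inliers
    return best_w, best_inliers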
I ran the process iteratively, and it decomposed the figure into a superposition of functions. Unfortunately, the upper half of “B” wasn’t captured. I suspected the issue was the different heights of lowercase and uppercase letters and created a second version of the drawing.
The same procedure gave me the sign decomposed into eight components, each a superposition of the basis functions.
Finally, I encoded the group number as a 0–1 vector of static features f1,f2,f3 and exported the dataset as CSV. Hooray — now we have data to test the MSE mode of the EXTRA BOOST model.
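For illustration only: the exact mapping from group id to f1, f2, f3 isn't spelled out above, so the snippet below assumes a plain 3-bit binary encoding (eight groups fit exactly into three 0/1 features) and a made-up file name.

import numpy as np
import pandas as pd

# `groups`: one (t, y) pair of arrays per component; a dummy placeholder here
groups = [(np.linspace(0, 1, 5), np.zeros(5)) for _ in range(8)]

rows = []
for group_id, (t, y) in enumerate(groups):
    # group number -> three binary static features
    f1, f2, f3 = (group_id >> 0) & 1, (group_id >> 1) & 1, (group_id >> 2) & 1
    for ti, yi in zip(t, y):
        rows.append({"t": ti, "y": yi, "f1": f1, "f2": f2, "f3": f3})

pd.DataFrame(rows).to_csv("extra_boost_dataset.csv", index=False)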
How to set up an OpenAI helper in JupyterLab
For quite a while I was procrastinating on a simple task: setting up an AI assistant in JupyterLab. Here I want to write down the sequence of steps for future reference.
* set up environment: . ~/envs/env312 (my working venv)
* pip install jupyterlab
* pip install "jupyter-ai[all]"
* export OPENAI_API_KEY="sk-...your key..."
Start JupyterLab; then, inside a notebook, run:
%load_ext jupyter_ai_magics
%ai list openai-chat
It gives you a list of available models
%config AiMagics.default_language_model = "openai-chat:gpt-4o-mini"
A cost-efficient everyday option.
In the side panel there is a new pane, "jupyter ai chat". Select your model there and paste the OPENAI_API_KEY. It's a little bit ugly: it seems you have to both export it as an environment variable and plug it in here; I couldn't fight it.
Now we have: "Hi there! I'm Jupyternaut, your programming assistant."
jupyter ai documentation
Here we are. The magic command gives us what we want:
%%ai chatgpt --format code
create a picture of 17 points equally distant on a circle, pairwise connected
It's alive!
Finally I ran the full cycle of training and applying my EGBDT model in JupyterLab.
I spent two days in a very unpleasant debug session because I broke a simple rule:
Always do EDA!
EDA—Exploratory Data Analysis—is simple: before you do anything with your data, get a taste of it. Check the mean of the target and features. Take a small sample and read its raw dump. Plot histograms of your factors. Do smoke tests.
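A minimal pass could be as simple as this (the file name is the hypothetical one from the dataset post; adjust to taste):

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("extra_boost_dataset.csv")   # hypothetical file name
print(df.describe())                          # means and spreads of the target and factors
print(df.sample(5))                           # read a few raw rows with your own eyes
df.hist(figsize=(10, 6))                      # quick histograms of every column
plt.show()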
Instead, I just downloaded the dataset and jumped straight into training. The best I saw was 0.2 MSE on train and 0.3 on test. I started suspecting deep, fundamental problems—some math interfering with my plans.
Then a very simple thought: plot the graphs. Nothing extraordinary—just a basis-function factor over time.
It turned out my iron friend used sin(t) instead of sin(50t). I was trying to approximate a high-frequency signal with a low-frequency one.
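The mismatch is obvious the moment you plot the two factors side by side on \(t \in [0, 1]\) (a throwaway sketch):

import numpy as np
import matplotlib.pyplot as plt

t = np.linspace(0, 1, 1000)
plt.plot(t, np.sin(t), label="sin(t): what the code computed")
plt.plot(t, np.sin(50 * t), label="sin(50t): what the dataset contains")
plt.legend()
plt.show()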
Fixing that made the MSE zero. On the first iteration.
Incredible—and incredibly unsatisfying to spend two days on something so simple: skipping EDA at the start.
Retrophotos. Physics.
It’s a photo from my previous life as a physicist. To be honest, it’s one of the greatest surprises of my life. You take a glass-clear piece of diamond—perfectly transparent and homogeneous. You put it in the electron microscope, close the lid, pour liquid nitrogen into the vacuum pumps, and wait four hours. Then you start and tune the electron-beam system, cool the sample holder with liquid nitrogen, adjust the optical system—and then… you see this picture. It’s a natural diamond, and the growth sectors are clearly visible. You can see blue, orange, and green lines of luminescence.
Blue region — N3 center (λ ≈ 415 nm), an aggregated-nitrogen defect.
Green — H3 center, formed by irradiation + annealing (often enhanced by plastic deformation).
Yellow — NV⁰ center at 575 nm (nitrogen + vacancy).
The electron microscope was half of the setup. The other half was a fairly large spectrometer. We recorded spectra in different areas of the samples and tried to capture the diffusion of vacancies.
Those days gave me the habit of writing down everything you do in your experiments, very carefully. When you're writing, everything feels obvious. A month later, it's anything but obvious—and you curse that guy who didn’t put in enough effort to write down the crucial details you now crave while trying to write an article.
Tree. From Gradient Boosted Decision Trees.
In playing with some technology or algorithm, my favorite moment is that elusive, transitional state when it’s still a little bit “wtf?” and yet already a solid—though not yet boring—tool. Gradient-Boosted Decision Trees with Extrapolation (GBDTE) is exactly there right now.
In earlier posts I explained how I built a dataset for testing this ML idea. The image shows one training step of the algorithm on that dataset. Let’s unpack what’s going on. In the next post I’ll introduce the four basis functions: 1, t, sin(kt), cos(kt). Our dataset contains eight groups of points, and each group is a linear combination of those basis functions. So the model’s task is two-part: first, classify points and assign them to a group; second, fit the best approximation for each group.
Let’s check ourselves. In a later post I highlight two of the most prominent groups. We’ll locate them in the tree, inspect their weights, then find them on the picture and compare our interpretation with the graphs.
Just for fun, let's inspect the bubble with id=4. Read from top to bottom: 0.20 means that at t = 0 this component should have the value 0.2. The next value, 0.09, means we have a slightly rising trend. For sin and cos we have zeros, meaning there are no oscillations in this component. Now we can work out the values of the f-parameters that describe this component: take the "right-left-left" route from the root, which in terms of factor values is (0, 0, 1). Be careful: on the tree picture the factors are labeled f0, f1, f2, while on the pictures with components they are f1, f2, f3. My bad. Check the picture below: you will see that our description of this component is entirely correct. It works!
Now in the opposite direction: the second curve has a much steeper tilt, so we can expect a bigger value of the second weight. The curve rises, so the second weight in the leaf is positive. The intersection with the Oy axis is lower, so the first weight of the leaf should be close to zero. Some oscillations are visible, but they are not very prominent, so we can expect small non-zero third and fourth weights. The static factors are (0, 1, 0), which reads as a left-left-right sequence on the tree diagram and leads to the node with id=1. The weights are -0.01, 0.26, 0, -0.01. A perfect match, I think.
And the third curve: static factors (1, 0, 0), the left-right-left route, node id=2; weights 0.03, 0.18, 0.09, -0.02. This one finally has quite a prominent harmonic part, and it is clearly visible on the curve.
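To double-check this reading, one can redraw the three leaf curves directly from the weights quoted above, using the same basis as before (a quick sketch, not part of the model code):

import numpy as np
import matplotlib.pyplot as plt

leaves = {                                   # (A, B, C, D) read off the tree nodes above
    "id=4": (0.20, 0.09, 0.00, 0.00),
    "id=1": (-0.01, 0.26, 0.00, -0.01),
    "id=2": (0.03, 0.18, 0.09, -0.02),
}
t = np.linspace(0, 1, 500)
for name, (A, B, C, D) in leaves.items():
    plt.plot(t, A + B * t + C * np.sin(50 * t) + D * np.cos(50 * t), label=name)
plt.legend()
plt.show()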
Quite an impressive result for one step of the algorithm, isn't it?
While the trends are generally good and the tree corresponds to the graph, there is one small issue that bothers me: the MSE shows a mismatch at the level of 0.0003. I don't understand why. Yet.
Four basis functions.
To build the MSE dataset I used four basis functions: 1, t, sin(50t), cos(50t). Why these four?
* A constant never hurts — it lets the curve shift up or down.
* A linear term captures any overall trend.
* sin and cos with the same frequency come as a pair: by mixing their weights you set both amplitude and phase, i.e. you can place the peaks where you need them (see the identity below).
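The identity behind that last point: \(C\sin(kt) + D\cos(kt) = R\sin(kt + \varphi)\) with \(R = \sqrt{C^2 + D^2}\) and \(\varphi = \operatorname{atan2}(D, C)\), so the two weights together set the amplitude and the phase of the oscillation.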
Why 50?
Because on \(t \in [0, 1]\) the period \(2\pi/50\) roughly matches the width of the letters in the title I’m approximating, so the oscillations align with the letter shapes.
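In numbers: the period is \(T = 2\pi/50 \approx 0.126\) in \(t\)-units, so roughly \(50/(2\pi) \approx 8\) full oscillations fit into \([0, 1]\), about one wavelength per letter.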