Great news everyone!
I turned on autotranslate on this channel, so everyone can read the posts in their favorite language.
Designing an MSE Dataset
Intro
Today I want to break our strict posting sequence and jump to a current problem.
I want to open my GitHub project and give users a nice dataset to test our method. There’s already a good dataset for the log-loss function, and I was struggling to design one for MSE. So here we are: we have a model that can group objects and find the best linear combination of basis functions to minimize a loss. How do we create a nice, interesting dataset for such a model? Buckle up—let’s dive in.
DIY Dataset
First of all, I started with a small bit of handcrafting and wrote “Extra Boost” with dots on a piece of paper. Then I uploaded it to ChatGPT and started a vibe-coding session. In no time I turned the dots into a scatter plot. But, as you know, screen and math coordinate systems differ in the direction of the Y-axis, so my title was upside down.
It seems trivial—barely worth mentioning. And, probably, it is in the world of traditional programming. But in vibe-coding it became quite a nuisance. Like two silly servants in an old Soviet cartoon, the model did two things at once: it changed the dataset and flipped the sign of Y. As a result, I spent quite a while staring at an upside-down title. After a while I figured out what was going on and got a normal scatter plot.
“And now what?” — probably the question you have in your head right now. Don’t worry—I had the same question. Then I started to think about the properties of my approach:
* it can group objects by static features;
* it can find the best fit for several basis functions.
So I needed some interesting basis functions. OK: [1, t, sin(kt), cos(kt)]. The constant lets you move the plot up and down—always useful. t is for trends. The pair sin(kt) and cos(kt) lets us place a harmonic component where we want it; with the right amplitudes you can shift it left or right.
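To make the left/right shift concrete, the standard harmonic-addition identity collapses the sine/cosine pair into a single shifted sine:
\[ C\sin(kt) + D\cos(kt) = R\sin(kt + \varphi), \qquad R = \sqrt{C^2 + D^2}, \quad \varphi = \operatorname{atan2}(D, C). \]
So picking the two amplitudes is the same as picking the amplitude and phase of one harmonic.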
Let’s stop here. Where these basis functions show up in our “Extra Boost” title—I’ll explain in the next post.
How to find a linear superposition in chaos
Now we have a set of points which, while fairly random from a mathematical point of view, give us a depiction of the “Extra Boost” sign. For my method, I need to find several groups, each represented by a linear combination of basis functions. I set time \(t\) to go from left to right, from 0 to 1. The basis functions are [1, t, sin(kt), cos(kt)], so the extrapolating function is \( y(t) = A + B\,t + C\sin(kt) + D\cos(kt) \).
Weights (A,B,C,D) can be estimated from the dataset using least squares, but we still need to pick k. After a set of experiments I chose k=50: it gives a convenient scale—the wavelength is roughly the width of a letter.
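A minimal sketch of that least-squares fit, assuming the scatter points live in arrays t and y (the names and the toy data are mine, not from the project):

import numpy as np

def design_matrix(t, k=50):
    # columns are the basis functions [1, t, sin(kt), cos(kt)]
    return np.column_stack([np.ones_like(t), t, np.sin(k * t), np.cos(k * t)])

# toy stand-ins for the real scatter points, with t rescaled to [0, 1]
rng = np.random.default_rng(0)
t = rng.uniform(0, 1, 200)
y = 0.3 + 0.2 * t + 0.1 * np.sin(50 * t)

# least-squares estimate of the weights
A, B, C, D = np.linalg.lstsq(design_matrix(t), y, rcond=None)[0]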
With this setup I obtained the picture you see at the beginning of the article. Then I decided the tolerance was too large and reduced the band width.
Here we are: a narrow band.
Next, I removed points within the tolerance range and repeated the process. To my surprise, after the first iteration nothing changed.
You can see that the dots disappeared, but the curve didn’t change. After a while I understood why. It was vibe-coding: I asked my iron friend to find a curve that captures the highest number of points; instead, it wrote code that minimizes MSE. That approach has an interesting property: a point lying exactly on the curve has zero residual, so it contributes nothing to the least-squares gradient; deleting it leaves the optimum untouched, and the same curve remains optimal.
I told the iron friend that, instead of minimizing squared distance to the points, it should maximize the number of captured points. It proposed the RANSAC approach, which was new to me: repeatedly select four random points, fit the curve, count captured points, and keep the candidate with the most inliers. It worked.
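A minimal sketch of that RANSAC loop, reusing the design_matrix helper from the sketch above (the tolerance eps and iteration count are my assumptions):

def ransac_fit(t, y, k=50, eps=0.02, n_iter=2000, seed=0):
    # keep the 4-parameter candidate that captures the most points
    rng = np.random.default_rng(seed)
    X = design_matrix(t, k)
    best_w, best_count = None, -1
    for _ in range(n_iter):
        idx = rng.choice(len(t), size=4, replace=False)
        try:
            w = np.linalg.solve(X[idx], y[idx])  # exact fit through 4 points
        except np.linalg.LinAlgError:
            continue  # degenerate sample, skip it
        count = int((np.abs(X @ w - y) < eps).sum())  # inliers in the band
        if count > best_count:
            best_w, best_count = w, count
    return best_w, best_count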
I ran the process iteratively, and it decomposed the figure into a superposition of functions. Unfortunately, the upper half of “B” wasn’t captured. I suspected the issue was the different heights of lowercase and uppercase letters and created a second version of the drawing.
The same procedure gave me the sign decomposed into eight components, each a superposition of the basis functions.
Finally, I encoded the group number as a 0–1 vector of static features f1,f2,f3 and exported the dataset as CSV. Hooray — now we have data to test the MSE mode of the EXTRA BOOST model.
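A sketch of that last step, assuming the groups are numbered 0–7 and the encoding is binary, which is what three 0-1 features for eight groups suggest (the file name is hypothetical):

import pandas as pd

# toy group labels 0..7; in the real dataset they come from the RANSAC decomposition
group = rng.integers(0, 8, len(t))
df = pd.DataFrame({"t": t, "y": y, "group": group})
for bit in range(3):
    df[f"f{bit + 1}"] = (group >> bit) & 1  # 0-1 static features f1..f3
df.drop(columns="group").to_csv("extra_boost_mse.csv", index=False)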
How to set up an OpenAI helper in JupyterLab
For quite a while I had been procrastinating on a simple task: setting up an AI assistant in JupyterLab. Here I want to write down the sequence of steps for the record.
* activate the environment: . ~/envs/env312 (my working venv)
* pip install jupyterlab
* pip install "jupyter-ai[all]"
* export OPENAI_API_KEY="sk-...your key..."
Then start JupyterLab and, inside a notebook:
%load_ext jupyter_ai_magics
%ai list openai-chat
This lists the available models.
%config AiMagics.default_language_model = "openai-chat:gpt-4o-mini"
(a cost-efficient everyday option)
In the side panel there is a new pane, "Jupyter AI chat". Select your model there and paste your OPENAI_API_KEY. It's a little ugly: it seems you have to both export the key as an environment variable and plug it in here; I couldn't find a way around it.
Now we have: "Hi there! I'm Jupyternaut, your programming assistant."
jupyter ai documentation
Here we are. The magic command gives us what we want:
%%ai chatgpt --format code
create a picture of 17 points equally distant on a circle, pairwise connected
It's alive!
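For the record, a sketch of the kind of code such a prompt should produce; this is my own reconstruction, not the assistant's actual output:

import numpy as np
import matplotlib.pyplot as plt
from itertools import combinations

angles = 2 * np.pi * np.arange(17) / 17      # 17 equally spaced angles
xs, ys = np.cos(angles), np.sin(angles)
for i, j in combinations(range(17), 2):      # connect every pair of points
    plt.plot([xs[i], xs[j]], [ys[i], ys[j]], lw=0.3, color="steelblue")
plt.gca().set_aspect("equal")
plt.show()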
Finally, I ran the full cycle of training and applying my EGBDT model in JupyterLab.
I spent two days in a very unpleasant debug session because I broke a simple rule:
Always do EDA!
EDA—Exploratory Data Analysis—is simple: before you do anything with your data, get a taste of it. Check the mean of the target and features. Take a small sample and read its raw dump. Plot histograms of your factors. Do smoke tests.
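A minimal sketch of that checklist in pandas; the file name is a hypothetical stand-in:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("extra_boost_mse.csv")  # hypothetical file name
print(df.describe())      # means and spreads of the target and features
print(df.sample(5))       # a small raw sample; read it with your own eyes
df.hist(figsize=(8, 6))   # histograms of every factor
plt.show()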
Instead, I just downloaded the dataset and jumped straight into training. The best I saw was 0.2 MSE on train and 0.3 on test. I started suspecting deep, fundamental problems—some math interfering with my plans.
Then a very simple thought: plot the graphs. Nothing extraordinary—just a basis-function factor over time.
It turned out my iron friend used sin(t) instead of sin(50t). I was trying to approximate a high-frequency signal with a low-frequency one.
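Reconstructed in code (not the actual diff), the bug and the fix:

import numpy as np

t = np.linspace(0, 1, 200)
# bug: low-frequency basis, hopeless against a k = 50 signal
X_bug = np.column_stack([np.ones_like(t), t, np.sin(t), np.cos(t)])
# fix: the intended high-frequency basis
X_fix = np.column_stack([np.ones_like(t), t, np.sin(50 * t), np.cos(50 * t)])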
Fixing that made the MSE zero. On the first iteration.
Incredible—and incredibly unsatisfying to spend two days on something so simple: skipping EDA at the start.