Which of the two formats is more convenient for you to read? (Anonymous poll)
* Instant view ⚡️ (50%)
* Post + pictures after it 🦥 (50%)
* Who is here? ❓ (0%)
* I need more info: what are these approaches? ❓ (0%)
How to find linear superposition in chaos
Now we have a set of points which, while fairly random from a mathematical point of view, give us a depiction of the “Extra Boost” sign. For my method, I need to find several groups, each represented by a linear combination of basis functions. I set time \(t\) to go from left to right, from 0 to 1. The basis functions are \([1, t, \sin(kt), \cos(kt)]\), so the extrapolating function is \(f(t) = A + B\,t + C\sin(kt) + D\cos(kt)\).
Weights (A,B,C,D) can be estimated from the dataset using least squares, but we still need to pick k. After a set of experiments I chose k=50: it gives a convenient scale—the wavelength is roughly the width of a letter.
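As a minimal sketch of that fit (not my exact code; t and y stand for the coordinates of one group's points), numpy's least squares does the job:

import numpy as np

def fit_group(t, y, k=50.0):
    # Design matrix for the basis [1, t, sin(kt), cos(kt)]
    X = np.column_stack([np.ones_like(t), t, np.sin(k * t), np.cos(k * t)])
    # Least-squares weights (A, B, C, D)
    weights, *_ = np.linalg.lstsq(X, y, rcond=None)
    return weights

def predict(t, weights, k=50.0):
    A, B, C, D = weights
    return A + B * t + C * np.sin(k * t) + D * np.cos(k * t)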
With this setup I obtained the picture you see at the beginning of the article. Then I decided the tolerance was too large and reduced the band width.
Here we are: a narrow band.
Next, I removed points within the tolerance range and repeated the process. To my surprise, after the first iteration nothing changed.
You can see that the dots disappeared, but the curve didn’t change. After a while I understood why. It was vibe-coding: I asked my iron friend to find a curve that captures the highest number of points; instead, it wrote code that minimizes MSE. That approach has an interesting property: when you delete points lying on the curve, the MSE is unchanged, so the same curve remains optimal.
I told the iron friend that, instead of minimizing squared distance to the points, it should maximize the number of captured points. It proposed the RANSAC approach, which was new to me: repeatedly select four random points, fit the curve, count captured points, and keep the candidate with the most inliers. It worked.
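For reference, a hedged sketch of that loop (my own naming, tolerance and iteration count, not the generated code):

import numpy as np

def ransac_curve(t, y, k=50.0, tol=0.02, n_iters=2000, seed=0):
    # Keep the (A, B, C, D) curve that captures the most points within ±tol
    rng = np.random.default_rng(seed)
    basis = lambda tt: np.column_stack([np.ones_like(tt), tt, np.sin(k * tt), np.cos(k * tt)])
    best_w, best_inliers = None, np.zeros(len(t), dtype=bool)
    for _ in range(n_iters):
        idx = rng.choice(len(t), size=4, replace=False)   # 4 points pin down the 4 weights
        try:
            w = np.linalg.solve(basis(t[idx]), y[idx])    # exact fit through the sample
        except np.linalg.LinAlgError:
            continue
        inliers = np.abs(basis(t) @ w - y) < tol          # points inside the band
        if inliers.sum() > best_inliers.sum():
            best_w, best_inliers = w, inliers
    return best_w, best_inliers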
I ran the process iteratively, and it decomposed the figure into a superposition of functions. Unfortunately, the upper half of “B” wasn’t captured. I suspected the issue was the different heights of lowercase and uppercase letters and created a second version of the drawing.
The same procedure gave me the sign decomposed into eight components, each a superposition of the basis functions.
Finally, I encoded the group number as a 0–1 vector of static features f1,f2,f3 and exported the dataset as CSV. Hooray — now we have data to test the MSE mode of the EXTRA BOOST model.
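For illustration only: the exact mapping from group id to f1, f2, f3 isn't spelled out above, so the snippet below assumes a plain 3-bit binary encoding (eight groups fit exactly into three 0/1 features) and a made-up file name.

import numpy as np
import pandas as pd

# `groups`: one (t, y) pair of arrays per component; a dummy placeholder here
groups = [(np.linspace(0, 1, 5), np.zeros(5)) for _ in range(8)]

rows = []
for group_id, (t, y) in enumerate(groups):
    # group number -> three binary static features
    f1, f2, f3 = (group_id >> 0) & 1, (group_id >> 1) & 1, (group_id >> 2) & 1
    for ti, yi in zip(t, y):
        rows.append({"t": ti, "y": yi, "f1": f1, "f2": f2, "f3": f3})

pd.DataFrame(rows).to_csv("extra_boost_dataset.csv", index=False)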
How to set up an OpenAI helper in JupyterLab
For quite a while I was procrastinating on a simple task: setting up an AI assistant in JupyterLab. Here I want to write down the sequence of steps for future reference.
* set up environment: . ~/envs/env312 (my working venv)
* pip install jupyterlab
* pip install "jupyter-ai[all]"
* export OPENAI_API_KEY="sk-...your key..."
Start JupyterLab; then, inside a notebook, run:
%load_ext jupyter_ai_magics
%ai list openai-chat
It gives you a list of available models
%config AiMagics.default_language_model = "openai-chat:gpt-4o-mini"
A cost-efficient everyday option.
In the side panel there is a new pane, "jupyter ai chat". Select your model there and paste the OPENAI_API_KEY. It's a little bit ugly: it seems you have to both export it as an environment variable and plug it in here; I couldn't fight it.
Now we have: "Hi there! I'm Jupyternaut, your programming assistant."
jupyter ai documentation
Here we are. The magic command gives us what we want:
%%ai chatgpt --format code
create a picture of 17 points equally distant on a circle, pairwise connected
It's alive!
Finally I ran the full cycle of training and applying my EGBDT model in JupyterLab.
I spent two days in a very unpleasant debug session because I broke a simple rule:
Always do EDA!
EDA—Exploratory Data Analysis—is simple: before you do anything with your data, get a taste of it. Check the mean of the target and features. Take a small sample and read its raw dump. Plot histograms of your factors. Do smoke tests.
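A minimal pass could be as simple as this (the file name is the hypothetical one from the dataset post; adjust to taste):

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("extra_boost_dataset.csv")   # hypothetical file name
print(df.describe())                          # means and spreads of the target and factors
print(df.sample(5))                           # read a few raw rows with your own eyes
df.hist(figsize=(10, 6))                      # quick histograms of every column
plt.show()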
Instead, I just downloaded the dataset and jumped straight into training. The best I saw was 0.2 MSE on train and 0.3 on test. I started suspecting deep, fundamental problems—some math interfering with my plans.
Then a very simple thought: plot the graphs. Nothing extraordinary—just a basis-function factor over time.
It turned out my iron friend used sin(t) instead of sin(50t). I was trying to approximate a high-frequency signal with a low-frequency one.
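The mismatch is obvious the moment you plot the two factors side by side on \(t \in [0, 1]\) (a throwaway sketch):

import numpy as np
import matplotlib.pyplot as plt

t = np.linspace(0, 1, 1000)
plt.plot(t, np.sin(t), label="sin(t): what the code computed")
plt.plot(t, np.sin(50 * t), label="sin(50t): what the dataset contains")
plt.legend()
plt.show()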
Fixing that made the MSE zero. On the first iteration.
Incredible—and incredibly unsatisfying to spend two days on something so simple: skipping EDA at the start.
Retrophotos. Physics.
It’s a photo from my previous life as a physicist. To be honest, it’s one of the greatest surprises of my life. You take a glass-clear piece of diamond—perfectly transparent and homogeneous. You put it in the electron microscope, close the lid, pour liquid nitrogen into the vacuum pumps, and wait four hours. Then you start and tune the electron-beam system, cool the sample holder with liquid nitrogen, adjust the optical system—and then… you see this picture. It’s a natural diamond, and the growth sectors are clearly visible. You can see blue, orange, and green lines of luminescence.
Blue region — N3 center (λ ≈ 415 nm), an aggregated-nitrogen defect.
Green — H3 center, formed by irradiation + annealing (often enhanced by plastic deformation).
Yellow — NV⁰ center at 575 nm (nitrogen + vacancy).
The electron microscope was half of the setup. The other half was a fairly large spectrometer. We recorded spectra in different areas of the samples and tried to capture the diffusion of vacancies.
Those days gave me the habit of writing down everything you do in your experiments, very carefully. When you're writing, everything feels obvious. A month later, it's anything but obvious—and you curse that guy who didn’t put in enough effort to write down the crucial details you now crave while trying to write an article.
Tree. From Gradient Boosted Decision Trees.
In playing with some technology or algorithm, my favorite moment is that elusive, transitional state when it’s still a little bit “wtf?” and yet already a solid—though not yet boring—tool. Gradient-Boosted Decision Trees with Extrapolation (GBDTE) is exactly there right now.
In earlier posts I explained how I built a dataset for testing this ML idea. The image shows one training step of the algorithm on that dataset. Let’s unpack what’s going on. In the next post I’ll introduce the four basis functions: 1, t, sin(kt), cos(kt). Our dataset contains eight groups of points, and each group is a linear combination of those basis functions. So the model’s task is two-part: first, classify points and assign them to a group; second, fit the best approximation for each group.
Let’s check ourselves. In a later post I highlight two of the most prominent groups. We’ll locate them in the tree, inspect their weights, then find them on the picture and compare our interpretation with the graphs.
Just for fun, let's inspect the bubble with id=4. Read from top to bottom: 0.20 means that at t = 0 this component should have the value 0.2. The next value, 0.09, means we have a slightly rising trend. For sin and cos we have zeros, meaning there are no oscillations in this component. Now we can work out the values of the f-parameters that describe this component: take the "right-left-left" route from the root, which in terms of factor values is (0, 0, 1). Be careful: on the tree picture the factors are labeled f0, f1, f2, while on the pictures with components they are f1, f2, f3. My bad. Check the picture below: you will see that our description of this component is entirely correct. It works!
Now in the opposite direction: the second curve has a much steeper tilt, so we can expect a bigger value of the second weight. The curve rises, so the second weight in the leaf is positive. The intersection with the Oy axis is lower, so the first weight of the leaf should be close to zero. Some oscillations are visible, but they are not very prominent, so we can expect small non-zero third and fourth weights. The static factors are (0, 1, 0), which reads as a left-left-right sequence on the tree diagram and leads to the node with id=1. The weights are -0.01, 0.26, 0, -0.01. A perfect match, I think.
And the third curve: static factors (1, 0, 0), the left-right-left route, node id=2; weights 0.03, 0.18, 0.09, -0.02. This one finally has quite a prominent harmonic part, and it is clearly visible on the curve.
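To double-check this reading, one can redraw the three leaf curves directly from the weights quoted above, using the same basis as before (a quick sketch, not part of the model code):

import numpy as np
import matplotlib.pyplot as plt

leaves = {                                   # (A, B, C, D) read off the tree nodes above
    "id=4": (0.20, 0.09, 0.00, 0.00),
    "id=1": (-0.01, 0.26, 0.00, -0.01),
    "id=2": (0.03, 0.18, 0.09, -0.02),
}
t = np.linspace(0, 1, 500)
for name, (A, B, C, D) in leaves.items():
    plt.plot(t, A + B * t + C * np.sin(50 * t) + D * np.cos(50 * t), label=name)
plt.legend()
plt.show()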
Quite an impressive result for one step of the algorithm, isn't it?
While the trends are generally good and the tree corresponds to the graph, there is one small issue that bothers me: the MSE shows a mismatch at the level of 0.0003. I don't understand why. Yet.
Four basis functions.
To build the MSE dataset I used four basis functions: 1, t, sin(50t), cos(50t). Why these four?
* A constant never hurts — it lets the curve shift up or down.
* A linear term captures any overall trend.
* sin and cos with the same frequency come as a pair: by mixing their weights you set both amplitude and phase, i.e. you can place the peaks where you need them (see the identity below).
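The identity behind that last point: \(C\sin(kt) + D\cos(kt) = R\sin(kt + \varphi)\) with \(R = \sqrt{C^2 + D^2}\) and \(\varphi = \operatorname{atan2}(D, C)\), so the two weights together set the amplitude and the phase of the oscillation.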
Why 50?
Because on \(t \in [0, 1]\) the period \(2\pi/50\) roughly matches the width of the letters in the title I’m approximating, so the oscillations align with the letter shapes.
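In numbers: the period is \(T = 2\pi/50 \approx 0.126\) in \(t\)-units, so roughly \(50/(2\pi) \approx 8\) full oscillations fit into \([0, 1]\), about one wavelength per letter.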