Not exponential. This time.
Guys, there’s always an exponential lurking somewhere. When the rate is proportional to the amount, the amount decays exponentially: a capacitor discharging through a resistor, detergent washing out of your sweater, foul odor leaving the room when you open a window.
But not this time.
In the previous post we derived the velocity of a jet flowing out of a hole, given the pressure and the liquid density. And instead of the usual "rate is proportional to the amount" we get a small correction: the rate is proportional to the square root of the amount. A tiny change — and the whole solution behaves differently. Now Achilles can finally reach the tortoise in finite time.
It always fascinated me that exponential decay is kind of contradictory. It’s blazingly fast (geometric progression!), and at the same time infinitely slow — because it never reaches exactly zero.
Here, when the outflow isn’t limited by viscosity but by inertia, the exponential law turns into a parabola — and the liquid leaves the vessel in finite time.
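The gist, in symbols (my notation: A is the vessel cross-section, a the hole area, h the liquid height, h₀ its initial value; the picture may label things differently). Torricelli gives v = √(2gh), and volume conservation gives:

A·dh/dt = −a·√(2gh)
dh/√h = −(a/A)·√(2g)·dt
√h(t) = √h₀ − (a/(2A))·√(2g)·t

So h(t) is a parabola in t, and it reaches exactly zero at T = (A/a)·√(2h₀/g): no exponential tail.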
In the picture you can see the full derivation.
I had totally forgotten this funny fact from university physics. Thank you, Nikita, for bringing this question up in our conversation!
New school-level physics problem
There is an old physics problem: two holes in the wall of a jar with liquid. The depth of the first hole is x, the depth of the second is y. You are to find the horizontal distance from the jar wall to the point where the two jets intersect, and the vertical distance from the liquid surface to that intersection point.
It’s an ancient problem — you can find versions of it already in Torricelli’s works, where he derived the expression for jet velocity.
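For reference, a quick sketch of the classical answer (ignoring air resistance; D is the depth of the intersection below the surface and s its distance from the wall, my notation). A jet from depth x leaves horizontally with v = √(2g·x); after falling an extra depth D − x it has covered s = v·√(2(D − x)/g) = 2·√(x·(D − x)). Setting the reaches of the two jets equal, x·(D − x) = y·(D − y), gives D = x + y and s = 2·√(x·y).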
But if you slightly extend it — add one more hole at depth z, assume all holes have the same cross-section, say that the two jets merge after they meet, and then ask for the intersection point of the merged jet with the third one — hurray! You get a new physics problem, one that (as far as I know) isn’t in schoolbooks yet.
The answer to this new problem is not as simple and elegant as in the original one, but it’s still totally doable. And it’s also a nice excuse to discuss which conservation laws you can apply to the “merging” process — and which ones you can’t, and why.
For TFWR solutions discussions
In the comments to this message we are conducting code review for The Farmer Was Replaced programs.
NotebookLM
I’m working on a presentation about gradient-boosted decision trees. It’s definitely not my first rodeo: I’ve made decks in LaTeX/Beamer, Word, PowerPoint… probably a few other things too.
These days it feels almost wrong not to use AI tools for a task like this. After a bit of “deep research” I ended up with a shortlist of things to try, and decided to focus on NotebookLM.
To be fair, NotebookLM wasn’t my starting point. I’d already spent some time in Visual Studio Code writing a gbdte.md file and collecting visuals from old presentations, Jupyter notebooks, and some fresh sketches.
The first attempt was surprisingly good. I uploaded my materials into the left panel, found the Slide deck button on the right, clicked it — and got an automatically generated presentation that looked nice, had a consistent style, and the English was pretty decent.
The downside: everything is basically baked into images. You can’t tweak a single formula, fix one label, or move one arrow. In NotebookLM the only real control knob is the prompt — so you rewrite the prompt and pull the one-armed bandit lever again. I tried that a few times and didn’t like where it went.
So my final workflow is… kind of dumb. I screenshot slides from the NotebookLM deck, edit them in GIMP if needed, and paste the results into Google Slides.
I honestly don’t know if this is faster than building the presentation carefully, piece by piece, the way I did before the AGI era. But it’s a bit more fun — and it’s something you can do when you’re slightly tired, when the “proper” workflow feels like too much.
EGBDT LogLoss - Learning curves
There were already two posts about the synthetic LogLoss dataset (see the latest one). Let's discuss an experiment with this dataset.
The dataset has two groups of static features:
📈 f1…f8: features with increasing uplift
📉 f9…f16: features with decreasing uplift
And there are extra features [1, t] to capture bias and trend.
Now to the picture. This is a learning curve: the dependence of loss on the number of stages (i.e., how many trees are already in the model). I drew this plot mostly for debugging. I expected the loss to drop for the first 16 steps, and my initial results didn’t match because of a few bugs. Now the curves look OK at first glance — but there are a couple of interesting details worth staring at.
Loss drops on train for steps 1…16 — then stops
For points 1…16, the train loss steadily goes down. After that it mostly stops — which is exactly what I expected.
At each stage I’m using a decision stump (a tree of height 1). Such a tree effectively uses one feature per step. Each new feature can add new information and reduce the loss. Once the useful variables are exhausted, there’s nothing left to squeeze out.
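My runs use the GBDTE code, but the shape of such a curve is easy to reproduce with a stock library. A minimal sklearn sketch, where X_train, y_train, X_test, y_test are placeholders for the synthetic dataset splits:

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import log_loss

# X_train, y_train, X_test, y_test: placeholder splits of the synthetic dataset
# depth-1 trees = decision stumps, one effective feature per stage
model = GradientBoostingClassifier(max_depth=1, n_estimators=40, learning_rate=0.1)
model.fit(X_train, y_train)

# staged_predict_proba yields predictions after 1, 2, ... trees
train_curve = [log_loss(y_train, p) for p in model.staged_predict_proba(X_train)]
test_curve = [log_loss(y_test, p) for p in model.staged_predict_proba(X_test)]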
The train–test gap is huge
What I don’t fully understand is the big gap between train and test. It looks like overfitting.
It might be interesting to run the same setup with different parameters and see whether the gap can be reduced (learning rate / regularization / subsampling / minimum leaf size — all the usual knobs).
A weird flat segment on the test curve around steps 8→9
Another thing: on train, the loss decreases at each step. But on test, there’s an almost horizontal segment between the 8th and 9th points. Why?
My first guess: the first 8 trees mostly exploit one group of features, and around step 9 the model “switches” and starts using the other group for the first time. But then the question becomes: why do those features generalize worse? Are they weaker, noisier, more correlated, or do they interact with the train/test split in a strange way?
So many interesting questions.
Group emblem
Nobody asked me, but I’ll tell you how the channel emblem was born: one quick pen sketch + a single prompt in ChatGPT.
It's an approximate version of the sign for telegram channel. Channel name is phys_math_dev. Topics physics, mathematics, development. Phi stands for physics, sum for mathematics and showel for development. I want you to come up with breathtaking logo on this theme. Main colors are red and gold with glossy look
And… that’s the funny part: it worked on the first try.
2D sort
In The Farmer Was Replaced there is a subgame in which you have to sort a 2D field. I tried several options and liked the 2D insertion-sort algorithm the most. That's exactly what I want to talk about today.
First of all, let’s recall what we can do in the game. There is:
🌵 move(East|North|West|South) for movement;
🌵 till() to switch soil type;
🌵 plant() to, sorry, plant different plants;
🌵 swap(East|North|West|South) to swap the current cell with a neighbouring cell in a given direction;
🌵 harvest() to harvest ripe plants.
This story is about cacti. You get an enormous bonus if you harvest the whole field at once — and that happens when all the cacti are sorted.
What does “sorted” mean here?
🌵 For each cell: measure() <= measure(East) and measure() <= measure(North) (when those neighbours exist).
🌵 In other words: each row goes in non-decreasing left-to-right order and each column goes in non-decreasing bottom-to-top order.
Now let’s check the picture. In our algorithm we traverse the field right-to-left, top-to-bottom, and apply one “insertion” iteration to each new cactus we meet. For a new cactus a[p][q] the invariant is:
the subfield to the right and above is already sorted.
In one sorting iteration our task is to move a[p][q] to its place inside that already-sorted piece and not to break the order we already have.
At first I tried to reason directly:
if our cactus is lower than both of its upper and right neighbours, it should already be in place.
If it is taller than the right one but lower than the upper one, we swap with the right.
If it is taller than both neighbours… wow, wow, wow… stop.
Too many ifs. Hard to think about, hard to write a program — and we haven’t even started to handle borders (top row, rightmost column, missing neighbours).
When I got stuck on this, I realized it reminded me of something… exactly: sift_down in a heap. And there is a clever trick we can borrow.
Two stages.
Stage 1: take up to three cells — current, right, and up (skip the neighbours that don’t exist) — and find the minimum value among them.
This is a very common mini-routine, so it’s easy to implement even with strict movement/swap restrictions.
Stage 2: make a decision.
If the minimum is at our current position — do nothing, we’re done.
Otherwise swap the current cell with the cell that holds the minimum, move to that swapped position, and repeat.
That’s it. No giant decision tree.
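Outside the game, the same two-stage routine fits in a few lines of plain Python. A sketch on a nested list, with my own indexing (a[x][y], East = larger x, North = larger y); the in-game version additionally has to pay for drone movement and swaps:

def sift(a, x, y):
    # move a[x][y] into place inside the already-sorted region to the right and above
    w, h = len(a), len(a[0])
    while True:
        best = (x, y)                      # stage 1: minimum of current, East, North
        if x + 1 < w and a[x + 1][y] < a[best[0]][best[1]]:
            best = (x + 1, y)
        if y + 1 < h and a[x][y + 1] < a[best[0]][best[1]]:
            best = (x, y + 1)
        if best == (x, y):                 # stage 2: minimum already here, done
            return
        a[x][y], a[best[0]][best[1]] = a[best[0]][best[1]], a[x][y]
        x, y = best                        # follow the value and repeat

def sort_2d(a):
    w, h = len(a), len(a[0])
    for y in range(h - 1, -1, -1):         # top-to-bottom
        for x in range(w - 1, -1, -1):     # right-to-left within each row
            sift(a, x, y)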
Let’s check the complexity of this approach.
Worst case: T = O(n³).
We have n² elements and for each of them insertion can travel O(n) steps through the already-sorted region.
The bad news is that it can take quite long on a totally shuffled field.
The good news is that insertion sort takes advantage of partial sorting and won't do unnecessary work.
2D insertion sort. Implementation.
In the previous post I described a nice 2D array sorting approach based on insertion sort. Today you can see how it behaves in practice — watch the video. I personally love this kind of algorithm visualization.
Just watching the drone already gives a couple of insights. It doesn’t always travel far from the insertion point — and that’s the key property of insertion sort: the work depends not only on n, but also on how “sorted” the data already is. So an almost-sorted field gets fixed surprisingly fast.
Now compare it with two other classic quadratic algorithms.
Selection sort and bubble sort don’t really care what’s inside — they keep scanning the whole unsorted part anyway. That’s why their basic “effort budget” is always about n·(n−1)/2 comparisons, no matter how lucky the input is.
And here we have a nice bridge from “toy programming” in The Farmer Was Replaced to serious computer science.
A famous real-world trick: quicksort is great, but deep recursion and tiny partitions are expensive. So many implementations stop quicksort early, leaving the array only almost sorted — and then run insertion sort as a final polish pass.
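To make that concrete, here is a from-scratch Python sketch of the trick (toy code with a Hoare-style partition and an arbitrary cutoff of 16; function names are mine, not any particular library's implementation):

def quicksort_with_cutoff(a, lo=0, hi=None, cutoff=16):
    # classic quicksort, but partitions smaller than the cutoff are left unsorted
    if hi is None:
        hi = len(a) - 1
    if hi - lo + 1 <= cutoff:
        return
    pivot = a[(lo + hi) // 2]
    i, j = lo, hi
    while i <= j:
        while a[i] < pivot:
            i += 1
        while a[j] > pivot:
            j -= 1
        if i <= j:
            a[i], a[j] = a[j], a[i]
            i += 1
            j -= 1
    quicksort_with_cutoff(a, lo, j, cutoff)
    quicksort_with_cutoff(a, i, hi, cutoff)

def insertion_sort(a):
    # the final polish pass: cheap on an almost-sorted array
    for i in range(1, len(a)):
        x, j = a[i], i - 1
        while j >= 0 and a[j] > x:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x

def hybrid_sort(a):
    quicksort_with_cutoff(a)
    insertion_sort(a)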
Code
Left hand rule in The Farmer Was Replaced
When I first solved the TFWR maze, I reached for DFS without thinking. But when I tried to explain the game to a less seasoned programmer, I realized DFS quietly assumes you’re comfortable with recursion, sets, visited states… not exactly “fun-first”.
So I finally focused on the classic left-hand maze traversal that TFWR guides keep mentioning. And for the first time in my life, I actually coded it.
The four situations (picture 1)
The idea is simple: keep your left hand touching the wall.
➜ Wall on the left, open ahead → go forward.
☛ Left and forward blocked, right open → turn right.
➽ Everything except backward blocked → turn back.
➳ The weird one: no wall on the left → you just moved forward and discovered an opening on the left. To "restore contact" with the wall, turn left and step into that passage.
Now the nice part: all four cases collapse into one tiny routine:
ᐉ Turn left once, then while forward is blocked, turn right.
That’s it.
Let's look at the tools for this task.
d = [East, North, West, South]
Directions, counterclockwise starting from East. Don't use one-letter names in production, please.
dc = 0
dc = (dc + 1) % 4
dc = (dc - 1) % 4
Our initial direction is East. +1 turns counterclockwise, -1 turns clockwise. God bless Guido van Rossum: % 4 always gives numbers from 0 to 3 inclusive. In C++ it would be slightly less straightforward.
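A quick illustration of that difference (a plain Python one-liner; the C++ behaviour is noted in the comment):

print((0 - 1) % 4)   # 3 in Python
# in C++, (0 - 1) % 4 evaluates to -1, so you would need ((dc - 1) % 4 + 4) % 4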
All together now
d = [East, North, West, South]

# black magic to conjure the maze
plant(Entities.Bush)
substance = get_world_size() * 2**(num_unlocked(Unlocks.Mazes) - 1)
use_item(Items.Weird_Substance, substance)

dc = 0
while get_entity_type() != Entities.Treasure:
    dc = (dc + 1) % 4              # turn left once
    while not can_move(d[dc]):
        dc = (dc - 1) % 4          # turn right while forward is blocked
    move(d[dc])
harvest()
TFWR. Left hand maze traversal.
Yesterday I published the code for left-hand maze traversal. Today you can hang out and watch a video of how it works.
Microsoft scientists declared that they will be replaced by AI
Microsoft researchers have revealed the 40 jobs most exposed to AI—an…
Your AI partner
I want to dilute the hard stuff a little with some shitposting. Let's compare our assistants: how do they see us?
Prompt:
Draw the most honest possible picture of how I have treated you all this time.
That's my AI assistant. Feel free to post yours in the comments.
You know, it's surprisingly... I don't know... touching... I would like to be a less demanding partner, to be honest.
Dict vs %
In the previous version of the left-hand maze traversal the heavy lifting was done by %. It guarantees that when we turn left or right, our direction (a number 0, 1, 2, 3 stored in dc) stays in the 0–3 range. One can instead use a dict to map the current direction directly to the next one after a CW or CCW turn.
Let's compare:
old
dc = 0
while get_entity_type() != Entities.Treasure:
    dc = (dc + 1) % 4
    while not can_move(d[dc]):
        dc = (dc - 1) % 4
    move(d[dc])
new
dc = East
while get_entity_type() != Entities.Treasure:
    dc = l[dc]
    while not move(dc):
        dc = r[dc]
One more trick: move does nothing (and returns False) when there is a wall in front of us, so we can combine can_move and move into a single call.
Of course, it works because we introduced dictionaries:
l = {East:North, North:West, West:South, South:East}
r = {East:South, South:West, West:North, North:East}
code
GBDTE
It's quite hard to navigate the channel, so I created this navigation/summary post. It's about a pet project I started about ten years ago. The main idea is that we can use slightly modified gradient boosted decision trees to both group objects and find trends for these groups.
📈beginning - the very first picture, the whole idea
📈credit scoring - problem statement, temporal instability
📈dataset - dataset preparation, YTsaurus vs Oracle
📈Vanilla GBDTE - experiment with math in instant view
📈Small MSE Dataset - the first approach to synthetic dataset for MSE GBDTE
📉Extracting components - how to get perfect components from chaotic signal
📉Leaves and components - check tree leaves and plot components
📉Evil of defaults - a debugging session, culprit - default parameters
📉Big MSE dataset - scatterplot with more clear "Gradient Boosting" message
📉LogLoss dataset - non-stationary dataset for binary classification
🎲Experiment on LogLoss dataset - first approach for running the algorithm on the dataset
🎲bad results - a very important mistake! Why you shouldn't use interpolation factors as extrapolating ones
🎲illustration for unstable class - a picture for a presentation
🎲learning curves LogLoss - learning curves for LogLoss case (non-stationary binary classification)
Repository is public
After a short conversation with the iron friend, I decided to use the permissive Apache 2.0 license for the Gradient Boosted Decision Trees with Extrapolation repository and made it public.
The main argument was that contributors don't like contributing to repositories with restrictive licenses.
https://github.com/tarstars/gbdte/
The ideal model. Part 1.
I procrastinated on this topic for some time. Let's at least start this conversation.
What's the problem?
I have a synthetic dataset, so I know the rules by which it was built. There is one model trained on this dataset: GBDTE. I have LogLoss and ROC AUC scores as results of this training. But I want to know whether they are good or bad. I need a reference, and the reference is The Ideal Model.
I asked ChatGPT to come up with an approach to this problem, and the iron friend wrote down some expressions. I honestly think that the ultimate judge in ML is the cross-validation score, and the score here is good, so the expressions are not total rubbish.
But also I want to understand what's behind these expressions, how they were derived.
Let's check the picture. First of all, we use u1…u8 for f1…f8 because their uplift increases with time, and d1…d8 for f9…f16 because their uplift goes down. Then we build a discriminator between the cases L=0 and L=1: we write down the probability of target 1 given a set of factor values (up to proportionality), and the same for target 0. The score that discriminates between 0 and 1 is the logarithm of the ratio of these probabilities, and it can be expressed through the conditional probabilities of the factors given the target. I assume the whole thing is Bayes' theorem (with multiple variables).
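In my notation, and assuming the factors are conditionally independent given the target (the "naive" part), the score from the picture should look like this; the conditionals carry t because the uplifts drift with time:

score(u, d, t) = log( P(L=1 | u, d, t) / P(L=0 | u, d, t) )
               = log( P(L=1) / P(L=0) ) + Σᵢ log( P(uᵢ | L=1, t) / P(uᵢ | L=0, t) ) + Σⱼ log( P(dⱼ | L=1, t) / P(dⱼ | L=0, t) )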
The set of probabilities of u and d under the condition of target value is exactly what makes this dataset so special: expressions with drifting uplift. These expressions are like laws of physics for the virtual universe in which this dataset exists. Therefore we are trying to come up with the best prediction in this virtual universe.
I assume that the next step is substituting the conditional probabilities into the log expression. I'm going to check it — stay tuned.
P.S. The proper name for this approach is a Naive Bayes log-odds derivation.
Congratulations!
We’ve hit a round number of subscribers!
Thank you for being here.
I’m not Ramanujan, and not every number feels like a close friend. Still, when I watched the subscriber count slowly crawling up, I couldn’t help noticing some famous “nice” stops: 100, 125, 128… And now we have 137. This one I just can’t skip. It’s a crossroads of different disciplines. So buckle up — let’s dive in.
In Bohr’s model, if you calculate the speed of an electron in the ground state and express it as a fraction of the speed of light, you’ll get a number close to 1/137. This is the fine-structure constant — a key character in quantum theory. It’s a kind of bridge between light and matter: it tells you how strong (how “efficient”) the interaction between the electromagnetic field and matter is.
Now let's compute the exact decimal expansion of this ratio:
1/137 = 0.00729927007299270...
The period is 07299270. A palindrome. That's a pretty rare occurrence. I didn't check it deeply, but I do remember one thing: within the first couple of thousand natural numbers it's unique. Back in 10th grade I even had a homework task: write a program that prints the periodic and non-periodic parts of 1/n for different n. And yes — a long scroll of decimals for n = 1…2000 lived with me for quite a while.
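That homework program is a nice excuse for a few lines of Python even today. A from-scratch sketch by long division (decimal_period and the variable names are mine):

def decimal_period(n):
    # digits of 1/n; the period starts where a remainder repeats
    seen = {}                     # remainder -> index of the digit it produced
    digits = []
    r = 1 % n
    while r != 0 and r not in seen:
        seen[r] = len(digits)
        r *= 10
        digits.append(r // n)
        r %= n
    if r == 0:                    # terminating fraction, no period
        return "".join(map(str, digits)), ""
    start = seen[r]
    pre = "".join(map(str, digits[:start]))     # non-periodic part
    period = "".join(map(str, digits[start:]))  # periodic part
    return pre, period

print(decimal_period(137))   # ('', '00729927') - the same cycle, read from a different digit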
And one more small gem: 729 = 27×27 = 9×9×9. I don’t believe in numerology… but this bunch of facts is still weirdly satisfying.
Stay tuned!