Everybody wants to be like ewe
Track on YouTube
Just a small reminder:
🐑Homophones - "you" and "ewe": same pronunciation but different meanings
🐑Homographs - same spelling, different meanings, sometimes also different pronunciations (wind the noun vs. wind the verb; cat the noun vs. cat the verb...)
🐑Homonyms (broad sense) - umbrella term covering same-sound and/or same-spelling pairs
But that's not all.
🐑Oronyms: phrases that sound alike — “ice cream” / “I scream”.
🐑Homographs: same spelling, different meanings — lead (metal) / lead (guide).
🐑Heteronyms: same spelling, different pronunciations and meanings — wind (noun) / wind (verb).
🐑Capitonyms: meaning (and sometimes pronunciation) changes with capitalization — Polish / polish, March / march.
🐑Heterographs: different spellings, same pronunciation — you / ewe.
🐑Polysemy: one word with related senses — head (of a person, of a department).
🐑Auto-antonyms (contronyms/Janus words): one word with opposite meanings — sanction (“approve” / “penalize”).
Mishearings
🐑Mondegreen: misheard lyric — “’Scuse me while I kiss this guy” for “kiss the sky.”
🐑Eggcorn: plausible but wrong substitution — “for all intensive purposes” for “intents and purposes.”
🐑Malapropism: wrong word, similar sound — “dance the flamingo” for flamenco.
🐑Spoonerism: swapping sounds — “You have hissed all my mystery lectures.”
The Farmer Was Replaced
In one of the ML channels I read, I found a reference to the game "The Farmer Was Replaced". In it, you solve farm puzzles by writing programs in a language which is quite similar to Python, but is not quite Python.
Some of the tasks are quite basic: chop grass and collect hay, grow bushes and trees and collect wood. Carrots are totally normal: you just spend some wood to plant them. Weird things start with pumpkins: they rot with 20% probability, but when a whole patch of pumpkins is ready to harvest, they merge together, and when you harvest the resulting mega-pumpkin, you get a bonus harvest.
Then things go totally astray: in the picture, it's a dinosaur chasing an apple. Basically, you program an autopilot for the famous Snake game.
There are also cacti. You have to sort them on a two-dimensional grid to get a bonus.
I really like the game because it gives me a chance to practice medium programming tasks at the Meta/Google 20-minute interview level while playing.
It also brought back some memories I thought I'd forgotten. I will try to share them in this channel.
Memories Awakened by “The Farmer Was Replaced”
It seems like a very trivial question: “What should a drone do if it is on the lower (South) edge of the field and it has a move(South) command to perform?” But, strangely, this question leads to quite an interesting set of consequences.
In The Farmer Was Replaced the drone teleports to the other end of the field. It means that the upper edge of the field is glued to the lower edge. As I drew it, a square with upper and lower edges glued together becomes a cylinder from the topological point of view. Then the same goes for the left and right boundaries, and we have a torus.
There are other options. For instance, if the drone disappears on the lower edge and then appears on the left one (and similarly with upper and right), we have a sphere.
But that’s not all. You can also play with the orientation of the edges. If the drone that disappears at the beginning of the lower edge reappears at the end of the upper edge (rather than at its beginning), it means we glued the edges with reversed orientation. Depending on how we finish gluing the edges, we get a Klein bottle or the projective plane. Unfortunately, you can’t embed these surfaces in 3D without self-intersection.
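For reference, these gluings have a compact standard notation (a textbook fact, not something from the game): label each edge of the square with a letter and a direction, walk around the boundary, and write down each letter, inverted when you walk against its direction. The four surfaces then correspond to the classic edge words:
sphere: a b b⁻¹ a⁻¹
torus: a b a⁻¹ b⁻¹
Klein bottle: a b a b⁻¹
projective plane: a b a b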
About memories. I first read about this in Anatoly Fomenko’s book. He also wrote “History: Fiction or Science?” (New Chronology) and, while he’s quite famous for that, he’s also a mathematician; his “Visual Geometry and Topology” is an interesting read. It even contains illustrations for “The Master and Margarita.” What I explained about gluing the edges of a square together is a simplified version of the introductory section of that book.
TFWR. Navigation.
In order to navigate the drone, there are get_pos_x(), get_pos_y(), and move() functions. move() takes East (x + 1), North (y + 1), West (x − 1), and South (y − 1) commands. get_world_size() is useful too. The world is square, so a single number (the side length) gives you everything you need.
nav(x, y) is a nice-to-have function. It moves the drone to position (x, y).
In the screenshot, a naive navigation function decides where to go—positive or negative direction—and moves accordingly. Why naive? Because it doesn't take into account the toroidal nature of the world (wrap-around at the boundaries).
Hello everyone!
I really appreciate that all of you have signed up for my channel. It's kinda touching, and I feel a little bit awkward just writing about GBDT and TFWR (but what else could I write about…).
So, I’d like to have a small talk with you. Let’s discuss a few things in the comments:
🤖How has the LLM revolution affected you?
🤖What do you expect in the near future?
🤖How do you see your career plans, taking into account all this hubbub?
To get the ball rolling, I'll write just a little myself, so as not to bias the discussion, and I'll add more if things heat up.
Eggcorn of the day
Bone apple tea → bon appétit
Some time ago a friend of mine, who worked with native English speakers, told me that while they were working on a piece of technical documentation with a colleague, the word “she” suddenly appeared, referring to the “user.” They discussed it, and it turned out that the gender used for this word in English is opposite to the grammatical gender of its Russian translation.
When I was on Vocabulary.com, I found the phrase above and recalled that “she-user” conversation. ChatGPT assures me that “user” is gender-neutral, as is “soldier” in the example above.
I tried to find cases of gender reversal, in the form “word”: (English gender) / (Russian gender):
the sea: she / neuter (море)
a city: she / masculine (город)
a ship: she / masculine (корабль)
a car: she / masculine (автомобиль) or feminine (машина)
a country: she / feminine (страна)
a hurricane: she / masculine (ураган)
This is ChatGPT’s opinion. Do you agree? Have you had clashes like that in your own practice?
Dawn of the day
It dawned on me today that Harlequin ↔️ Harley Quinn. An example of paronomasia (a pun); more specifically, it works like an eggcorn/mondegreen applied to a proper name.
Torus navigation
Problem statement: issue control commands on a torus map to navigate the bot to a given point.
Essence
* the Pythonic (x2 - x1) % n gives the eastward toroidal distance
* it's easier to think in "direction first" terms
* go_best(East, West, (x2 - x1) % n, (x1 - x2) % n) is a solution
About three years ago I met this challenge on the CodinGame platform. When I tried to solve it again in TFWR, I had totally forgotten the approach, so here I want to make a note of what works well and what leads to cumbersome constructions. I always try to memorize approaches, not solutions. In this post I want to compare different ways of thinking about this problem and check the programs we get as a result.
My first approach
def navigate(x, y):
    # naive navigation: walk the coordinate deltas, ignoring wrap-around
    dx = x - get_pos_x()
    dy = y - get_pos_y()
    if dx:
        if dx > 0:
            for _ in range(dx):
                move(East)
        else:
            for _ in range(-dx):
                move(West)
    if dy:
        if dy > 0:
            for _ in range(dy):
                move(North)
        else:
            for _ in range(-dy):
                move(South)
It's a straightforward way to think about this problem: select a direction and walk in that direction. I think it's not bad in general, though it could be improved with helper functions. Then I tried to extend this approach to handle the torus topology. It worked, but the result was (obviously) so cumbersome that I decided not to publish it.
The next idea was to reverse the direction. The logic is: "we reverse our decision to go East or West if it doesn't give the best distance on the torus". I think this trick is funny, so let's look at it more closely.
dist = abs(dx)
# flip the naive direction when wrapping around is shorter
if (dx > 0) ^ (n - dist < dist):
    go(East, min(dist, n - dist))
else:
    go(West, min(dist, n - dist))
Here we use an interesting property of the XOR gate: it inverts the first argument when the second one is true. When I realized that ^ doesn't work in TFWR, I used != instead: for booleans, (a ^ b) == (a != b).
But this way is kinda creepy, and I dwelled for a while on what the right way to think about this problem is. The results:
1) We are working with remainders modulo n. Indeed, the possible coordinates are 0, ..., (n-1), and n is equivalent to 0.
2) We can think of our positions as vertices of an n-gon.
3) Two points x1 and x2 divide the n-gon into two arcs.
4) In Python we can calculate the lengths of the arcs as (x2 - x1) % n for the East direction and (x1 - x2) % n for West.
And that is it!
def go_best(dr, dl, nr, nl):
    # pick the direction with the smaller step count;
    # a boolean index stands in for the missing ternary operator
    d, s = [(dr, nr), (dl, nl)][nl < nr]
    for _ in range(s):
        move(d)

def nav(x2, y2):
    n = get_world_size()
    x1, y1 = get_pos_x(), get_pos_y()
    go_best(East, West, (x2 - x1) % n, (x1 - x2) % n)
    go_best(North, South, (y2 - y1) % n, (y1 - y2) % n)
P.S.
Here I use the "boolean as index" trick because there is no (v1 if cond else v2) construction in the TFWR dialect.
Note that it matters that (x2 - x1) % n is a Python expression: in C++, for example, the % operator works slightly differently (the result can be negative), and there you have to write something like (r % n + n) % n to get the same value.
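A quick illustration in plain Python:
# Python's % always returns a result with the sign of the divisor
print((2 - 5) % 8)   # 5: three steps West equals five steps East on an 8-torus
# In C++, (2 - 5) % 8 evaluates to -3, hence the (r % n + n) % n idiom.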
CodinGame
Practice conditions with the easy puzzle "Power of Thor - Episode 1" (25+ languages supported).
Liquid Nitrogen Station
An attentive reader of one of my previous posts, about the diamond plate in an electron microscope, could notice a recurring note: let's pour liquid nitrogen here, let's pour liquid nitrogen there. Sounds like a lot of liquid nitrogen. And it is. When I worked at the Physical Institute of the Russian Academy of Sciences, it was an everyday workout: to walk over with a special Dewar vessel and bring 17 liters of liquid nitrogen to the laboratory. About 13 liters went into pumping the air out of the microscope and cooling the specimen during the measurement. The remaining 4 liters evaporated overnight.
It's quite an interesting liquid, and when you have it in abundance, you can have a lot of fun. I heard that our colleagues froze ice cream with it. I myself conducted all the well-known experiments: freeze and cleave a piece of rubber; dip your hand into it. The most amazing one involved the porous plastic that was widespread in the Soviet Union for packing scientific equipment. Under normal conditions it's a springy material: you can compress it and it restores its original shape. But once you put it into liquid nitrogen, it becomes extremely brittle, and when you squeeze it, it turns into a small heap of fine powder.
One more experiment I conducted inadvertently. While I was pouring liquid nitrogen into the photodetector's bowl, a narrow stream escaped and soaked my jeans. So when I straightened my leg, my jeans cracked. Jeans. Cracked. It was fun.
Big MSE dataset for GBDT
In the previous post I demonstrated a small dataset that shows how GBDT works. Now I want to show you quite a big one: there are 10 000 points in it.
I wasn't happy with the quality of the text in the small dataset, so I decided to repeat the algorithm, taking more points this time.
While it's quite hard to place many points with a pen, it's easy to ask an iron friend to sample them.
At this point I got stuck for a while with an unexpected problem: I didn't like the fonts. I work in Linux, and mostly fonts don't bother me. Until now. I wanted a nice and interesting font for this task. I have no clue what "interesting" means here—probably a fat, rounded font. I don't know. And then a strange thing happened. When I googled something like "try different fonts online tool", I got nothing. Probably I was banned by Google that day, or I was extremely unlucky, but all the pages I came up with were non-functional or solved some other problem. At last, somehow I got to the... ta-da... fonts.google.com page. But the chain of thought wasn't straight.
Then progress started to go faster. My iron friend quickly rendered the title and scattered 10 000 points across it. Separation into components wasn't hard either. So I ended up with a dataset of 10 000 points separated into 128 groups.
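Here is a rough sketch of that pipeline (not my actual script; PIL/NumPy/SciPy, the font file, and the title text are stand-ins):
# Rough sketch: render a title, sample points inside the glyphs,
# and split them into connected components (the groups).
import numpy as np
from PIL import Image, ImageDraw, ImageFont
from scipy.ndimage import label

img = Image.new("L", (1200, 300), 0)
draw = ImageDraw.Draw(img)
font = ImageFont.truetype("DejaVuSans-Bold.ttf", 160)  # any fat, rounded font
draw.text((30, 50), "Big MSE", fill=255, font=font)

mask = np.array(img) > 128       # True inside the letters
groups, n_groups = label(mask)   # connected components of the glyph mask

rng = np.random.default_rng(0)
points = []
while len(points) < 10_000:      # rejection sampling inside the letters
    x = int(rng.integers(0, mask.shape[1]))
    y = int(rng.integers(0, mask.shape[0]))
    if mask[y, x]:
        points.append((x, y, groups[y, x]))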
Let's pause here. There's quite an interesting thing about tree height and the number of groups. I want to talk about it slowly and clearly, so let's do it in the next post.
Big MSE dataset. Depth of trees.
There are 128 groups in the big dataset, and to distinguish them perfectly it's necessary to use exactly 7 binary features. Why is it so? Because when we add one more level to a decision tree, we turn each leaf into two, replacing it with a decision rule. So if we start with one root and build 7 levels over it, we will have exactly 2^7 = 128 leaves: one leaf for each group.
This illustrates a common phrase about GBDT: "The height of the trees is the number of factors we want to work together." I had always heard this phrase, but it remained just a theory for me, because in real-world datasets you have no idea how many factors should work together.
In this dataset we know the exact number of factors that should work together (7), and we can test that intuition. And that's exactly the story these three plots tell. At the top, the trees have only two levels. The model has a limited ability to learn, and we see classic learning curves: the model grasps some generic knowledge and both the train and test curves go down; then it starts to learn noise and the test curve goes up. We can also see that, because of the low depth, it can't fit the train set with good quality. It's exactly what we saw in the previous post on the "predicted values on train data" plot.
If we check the lower graph, we can see that with depth 7 the MSE becomes much lower, and there is a better correspondence between the dataset and the model's inference.
The third image is about the learning rate: lr is 0.3 for the upper plots and 0.1 for the lower ones. You can see that the learning curves are less steep, but that doesn't help the asymptotic value of the train curve: when points are wrongly attributed to groups in the initial steps, there is no way to re-attribute them later.
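If you want to poke at the effect without my dataset, here is a self-contained stand-in (scikit-learn instead of my own GBDT code; the per-group target values are arbitrary):
# Depth experiment on a stand-in dataset: 7 binary features encode
# the group id, the target is a per-group constant plus noise.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(10_000, 7))   # 7 binary features
group = X @ (2 ** np.arange(7))            # group id, 0..127
y = np.sin(group) + rng.normal(0, 0.1, size=len(group))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
for depth in (2, 7):                       # shallow vs. "one leaf per group"
    model = GradientBoostingRegressor(max_depth=depth, n_estimators=300,
                                      learning_rate=0.3, random_state=0)
    model.fit(X_tr, y_tr)
    print(depth, mean_squared_error(y_te, model.predict(X_te)))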
GBDTE log‑loss dataset
In this post I want to solemnly declare: I'm not a mathematician. My friends who are, I'm totally sure, would solve this problem without effort using Bayes' formalism. I can only wave my hands.
So, what's the fuss?
I want to generate a synthetic dataset with properties similar to those of the initial fraudulent‑users dataset. And I want to control how much information the features bring about the target value. Moreover, I want to introduce a new variable to the dataset—t (time)—and add time dependence to the dataset's statistical properties.
Because of my math ineptitude, I use my intuition from physics problem-solving. First, let's draw the figure you see in the picture. The horizontal axis stands for the binary factor f (feature). The vertical axis stands for the binary target l (label). The height of the horizontal line that splits l=0 and l=1 has a well-defined meaning: it's the average value of our target, α. Let α = 0.5. That's the second equation, because the first is a + b + c + d = 1.
Then we can think about the average value of the factor. I want to have 16 factors in my dataset, and I want them, in total, to give slightly less information than would allow 100% recovery of the target. So the average factor value β should be 1/16 = 0.0625. β is the coverage—how often the factor equals 1. Third equation.
And finally, the lift. It's a ratio: in the numerator is the probability that the target equals 1 when the factor equals 1, and in the denominator is the average target value. In terms of our variables, d/(d+b) is the average target when f = 1, and α is the average target value.
When lift = 1, the factor gives no information about the target. When it's > 1, it shows how much stronger our position is when using this factor. For example, a lift of 1.3 shows that we would catch 30% more credit‑fraud users when using this factor. It's convenient to use log‑lift: it is 0 when there is no gain and has the same sign as the correlation between the target and the factor.
For my dataset I want time to run from 0 to 1, and I want two groups of factors: with the lift going up and with the lift going down. The expressions for these lifts are quite simple:
lift_up = 0.25 + 0.5*t
lift_down = 0.75 - 0.5*t
Now we have four variables and four equations to determine them. I solved the system using Gaussian elimination, and the result is in the lower picture.
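In symbols (the cell naming is my shorthand for the four areas of the figure: a = P(f=0, l=0), b = P(f=1, l=0), c = P(f=0, l=1), d = P(f=1, l=1)):
a + b + c + d = 1          (everything sums to one)
c + d = α = 0.5            (average target)
b + d = β = 0.0625         (coverage)
d / (b + d) = lift * α     (the lift definition)
Solving: d = lift * α * β, b = β - d, c = α - d, a = 1 - α - β + d.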
I'm going to implement these expressions in the synthetic dataset generation script. It's already done, but I wanted to recap the logic behind it. Next time—the dataset.
In my other channel I published a post ("ВДНХ. Профессор Соколов. 1988") about a physics exposition that my father and I stumbled upon in 1988. Quite exclusive material. In Russian.
Riddle of the day
"What number, when you remove one letter from its spelling, transforms into an even number?"
UPD: There are slightly different options for this riddle, like "I’m an odd number. If I lose one letter, I become even. What number am I?"
"What number, when you remove one letter from its spelling, transforms into an even number?"
UPD: There are slightly different options for this riddle, like "I’m an odd number. If I lose one letter, I become even. What number am I?"
When is a door not a door?
I first heard this joke in the 1997 animated movie Anastasia, and it’s stuck with me ever since. In the film it’s treated like one of those classic jokes everyone’s supposed to know.
Do you know the answer? 😏
Share your guess in the comments!
TFWR. Labyrinth
Let’s think about the labyrinth problem in The Farmer Was Replaced game.
First of all, let's state it. We have an n×n labyrinth, where n = get_world_size(). The treasure is at the position returned by measure(). Our current position is (get_pos_x(), get_pos_y()). We can move the drone by issuing move(East|North|West|South) commands. Our task is to find the treasure, which we can detect by the condition get_entity_type() == Entities.Treasure. Information about the maze map can be obtained through the boolean can_move(East|North|West|South) function.
To be honest, I have no clue why the "right hand" or "left hand" approach is so popular. I started with DFS (depth-first search), and I want to explain this approach in this post.
It seems quite popular to start with these "hand" approaches. But they fail when you try to gain more money from your treasure hunt. According to the rules, the prize doubles when, instead of harvesting the treasure, you use weird substance on it. In this case you reuse the labyrinth, and the treasure jumps to some other place. You can do this 30 times (and should, if you want an efficient farm). Also, when the treasure jumps, some walls in the labyrinth disappear. This disappearance has two consequences. First, the "hand" approaches stop working: wall-following relies on the maze having no loops, and once walls disappear, loops appear. On the other hand, the maze gets simpler, and if you use an appropriate algorithm, your drone finds the treasure faster and faster.
In this post I want to show the simplest possible DFS code, one that doesn't rely on a "strict tree structure" of the maze.
Let's do it. DIRECTIONS is an array with arguments for the move() function, visited is the set of positions we have visited so far, and OPPOSITE is a dictionary with opposite directions.
def go_best(d1, d2, l1, l2):
    # walk l1 steps in direction d1, or l2 steps in d2 if that is shorter
    d, l = d1, l1
    if l2 < l1:
        d, l = d2, l2
    for _ in range(l):
        move(d)

def nav(x2, y2):
    # shortest toroidal path to (x2, y2); see the torus navigation post
    n = get_world_size()
    x1, y1 = get_pos_x(), get_pos_y()
    go_best(East, West, (x2 - x1) % n, (x1 - x2) % n)
    go_best(North, South, (y2 - y1) % n, (y1 - y2) % n)

def apply_proper_substance():
    # the dose scales with the maze unlock level
    substance = get_world_size() * 2 ** (num_unlocked(Unlocks.Mazes) - 1)
    use_item(Items.Weird_Substance, substance)

set_world_size(8)
nav(3, 3)
plant(Entities.Bush)
apply_proper_substance()   # the bush becomes a maze

DIRECTIONS = [East, North, South, West]
OPPOSITE = {East: West, West: East, North: South, South: North}

def dfs(visited):
    if (get_pos_x(), get_pos_y()) in visited:
        return
    visited.add((get_pos_x(), get_pos_y()))
    if get_entity_type() == Entities.Treasure:
        harvest()
    for dir_to_move in DIRECTIONS:
        if can_move(dir_to_move):
            move(dir_to_move)
            dfs(visited)
            move(OPPOSITE[dir_to_move])   # step back after exploring

visited = set()
dfs(visited)
while True:
    pass   # keep the program alive in the game loop
The code is as simple as possible, and because of this it has a few obvious issues:
issue: the drone "jitters" forward and backward all the time
reason: it checks its current position against visited, so it really has to move into a cell to check it
how to fix: write a "projection" function that calculates the drone's position after a step, and check that against visited (see the sketch below)
issue: this code collects the treasure immediately
reason: I wanted this code to be as simple as possible
how to fix: add logic to break out of the recursion when the drone is over the treasure, and create an outer loop for the labyrinth upgrade
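A sketch of the first fix (the DELTAS dictionary is new here; everything else reuses the definitions above):
DELTAS = {East: (1, 0), West: (-1, 0), North: (0, 1), South: (0, -1)}

def project(direction):
    # the position the drone would have after one step, with wrap-around
    n = get_world_size()
    dx, dy = DELTAS[direction]
    return ((get_pos_x() + dx) % n, (get_pos_y() + dy) % n)

def dfs(visited):
    visited.add((get_pos_x(), get_pos_y()))
    if get_entity_type() == Entities.Treasure:
        harvest()
    for dir_to_move in DIRECTIONS:
        if can_move(dir_to_move) and project(dir_to_move) not in visited:
            move(dir_to_move)
            dfs(visited)
            move(OPPOSITE[dir_to_move])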
GBDTE: LogLoss dataset
My approach with the synthetic MSE dataset was successful: I created it, and all the experiments gave quite expected results. The next frontier is a synthetic dataset for testing the logloss function.
This is important for me because I started with this problem, and I want a clear demonstration, especially on synthetic data, that this approach works and that we can improve the stability of our model by incorporating time.
So I started with the same approach my friend and I used almost ten years ago when we published our article. We created a dataset with everything binary: a binary target and binary features. That means all regular features used for splits and the target can be either 0 or 1. The secret ingredient is time: a value in the range from 0 to 1, and a time‑dependent basis [1, t]. The goal of the model is to find weights w1 and w2 so that (w1*1 + w2*t) is the best possible score, which minimizes logloss on this dataset.
To make things interesting, we introduce time dependence into the dataset’s statistical properties. I decided that it’s convenient to make the lift time‑dependent.
Let’s stop here for a moment and discuss this term. We can calculate the target average over the whole dataset, and then over a group selected using a factor. Lift is the ratio of the average target over the selected subset to the average target over the whole dataset. My idea is to make this lift change over time.
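(In formula form: lift(f) = mean(l | f = 1) / mean(l), the same ratio as in the GBDTE post above.)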
There are two groups of static factors: f1..f8 with increasing lift and f9..f16 with decreasing lift. In the original setup the picture was slightly asymmetrical, and the points where the lifts were equal to one differed between the two groups, but this time I set the dependencies gamma_up = 0.5 + t and gamma_down = 1.5 - t.
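A condensed sketch of the generator (the helper name is made up for this post; α = 0.5 and β = 1/16 as in the lift post above):
import numpy as np

ALPHA, BETA = 0.5, 1 / 16   # average target and factor coverage
rng = np.random.default_rng(0)

def sample_row(t):
    # one observation at time t: a binary label and 16 binary factors
    label = rng.random() < ALPHA
    factors = []
    for j in range(16):
        gamma = 0.5 + t if j < 8 else 1.5 - t        # lift up / lift down
        d = gamma * ALPHA * BETA                     # P(f=1, l=1)
        b = BETA - d                                 # P(f=1, l=0)
        p = d / ALPHA if label else b / (1 - ALPHA)  # P(f=1 | label)
        factors.append(int(rng.random() < p))
    return t, factors, int(label)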
I expected to get a picture similar to what we had in our article, but this time I wasn’t as lucky as we were nine years ago. The picture was different. I have no idea why, and now I’m digging deep into the theory of time‑dependent binary datasets.
Bubble sort of one cacti column in TFWR
This game provides quite a nice opportunity to see how different sorting algorithms work. In this post, let's discuss the famous bubble sort.
You can check the code on GitHub.
Let's see how it's built.
best_move is a helper function for nav. nav moves the drone to any given coordinates on the field.
Then, in the script, I set the farm size to 16, just to test moving the drone to the point (6, 6). I till the soil so we can plant cacti, then plant the cacti. And finally, lines 32-36 are the bubble sort.
For me, the interesting part here is to define the variables and their physical meaning clearly. I don't want to do unnecessary work, and I want to handle corner cases correctly. So:
* y_upper is the last cell we want to swap cacti with
Let's stop here and think about what we can derive from this statement. It means that y_upper runs from (n-1) down to 1 inclusive. So, a small subtask: set the range for y_upper correctly. It's range(n-1, 0, -1).
After that, everything is simple. The inner loop, which sets the drone's position, just runs from 0 to y_upper - 1, which gives a simple range(y_upper). Basically, that's it. The measure() function from the game is very handy for this task; the comparison with the cell above is just
measure() > measure(North)
and swapping with that cell is
swap(North)
One more interesting thing: the map is on a torus, so at the beginning of the sort the drone "rotates" in one direction, but when more than half of the column is sorted, it starts to move back and forth.
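Putting the pieces together, the sorting loop looks roughly like this (my reconstruction from the description above; the exact code is in the GitHub repo):
x = get_pos_x()                        # sort the column the drone stands on
n = get_world_size()
for y_upper in range(n - 1, 0, -1):    # last cell that can still receive a swap
    for y in range(y_upper):           # visit cells 0 .. y_upper-1
        nav(x, y)
        if measure() > measure(North): # the bigger cactus bubbles up
            swap(North)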