Your AI partner
I want to dilute the hard stuff a little with some shitposting. Let's compare our assistants and how they see us.
Prompt:
Draw the most honest picture you can of how I have treated you all this time.
It's my AI assistant. Feel free to post yours in the comments.
You know, it's surprisingly... I don't know... touchy... I'd like to be a less demanding partner, to be honest.
😁1
Dict vs %
In the previous version of the left-hand maze traversal the heavy lifting was done by %. It guarantees that when we turn left or right, our direction (a number 0, 1, 2, 3 stored in dc) stays in the 0-3 range. Instead, one can use a dict that maps the current direction directly to the next one after a CW or CCW turn.
Let's compare:
old
dc = 0
while get_entity_type() != Entities.Treasure:
    dc = (dc + 1) % 4
    while not can_move(d[dc]):
        dc = (dc - 1) % 4
    move(d[dc])
new
dc = East
while get_entity_type() != Entities.Treasure:
    dc = l[dc]
    while not move(dc):
        dc = r[dc]
One more trick: move does nothing (and returns False) when a wall is in front of us, so can_move and move can be combined into a single call.
Of course, it works because we introduced dictionaries:
l = {East: North, North: West, West: South, South: East}
r = {East: South, South: West, West: North, North: East}
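Outside the game, a quick sanity check that the two formulations agree: encode the four directions as 0-3 in the same order the old version kept in d. This is a standalone sketch; East/North/West/South and d are stand-ins for the game's constants.

# Standalone check, not game code: directions as indices 0-3,
# in the order the old version stored them in d.
East, North, West, South = range(4)
d = [East, North, West, South]

l = {East: North, North: West, West: South, South: East}  # counterclockwise (left) turn
r = {East: South, South: West, West: North, North: East}  # clockwise (right) turn

for dc in range(4):
    assert l[d[dc]] == d[(dc + 1) % 4]  # dict left turn matches (dc + 1) % 4
    assert r[d[dc]] == d[(dc - 1) % 4]  # dict right turn matches (dc - 1) % 4
print("dict turns match the modulo arithmetic")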
❤1🔥1
GBDTE
It's quite hard to navigate the channel, so I created this navigation/summary post. It's about a pet project I started about ten years ago. The main idea is that we can use slightly modified gradient boosted decision trees to both group objects and find trends for these groups.
📈beginning - the very first picture, the whole idea
📈credit scoring - problem statement, temporal instability
📈dataset - dataset preparation, YTsaurus vs Oracle
📈Vanilla GBDTE - experiment with math in instant view
📈Small MSE Dataset - the first approach to synthetic dataset for MSE GBDTE
📉Extracting components - how to get perfect components from chaotic signal
📉Leaves and components - check tree leaves and plot components
📉Evil of defaults - a debugging session, culprit - default parameters
📉Big MSE dataset - scatterplot with more clear "Gradient Boosting" message
📉LogLoss dataset - non-stationary dataset for binary classification
🎲Experiment on LogLoss dataset - first approach for running the algorithm on the dataset
🎲bad results - a very important mistake! Why you shouldn't use interpolation factors as extrapolating ones
🎲illustration for unstable class - a picture for a presentation
🎲learning curves LogLoss - learning curves for LogLoss case (non-stationary binary classification)
👍1😐1
Repository is public
After a short conversation with the iron friend, I decided to use the permissive Apache 2.0 license for the Gradient Boosted Decision Trees with Extrapolation repository and made it public.
The main reasoning was that contributors don't like contributing to repositories with restrictive licenses.
https://github.com/tarstars/gbdte/
The ideal model. Part 1.
I procrastinated on this topic for some time. Let's at least start this conversation.
What's the problem?
I have a synthetic dataset, so I know the rules of how it was built. There is one model trained on this dataset: GBDTE. I have logloss and ROC AUC scores as results of this training. But I want to know whether it's good or bad. I need a reference. And the reference is The Ideal Model.
I asked ChatGPT to come up with an approach to this problem, and the iron friend wrote down some expressions. I honestly think that the main judge in ML is the cross-validation score, and the score is great, so the expressions are not total rubbish.
But also I want to understand what's behind these expressions, how they were derived.
Let's check the picture. First of all, we use u1...u8 for f1...f8 because their uplift increases with time, and d1...d8 for f9...f16 because the uplift of these features goes down. Then we start building a discriminator between the cases L=0 and L=1. We write down the proportionality for the probability of target 1 given a set of factor values, and the same for target 0. The score that discriminates between 0 and 1 is then the logarithm of the ratio of these probabilities. This expression can be calculated from the conditional probabilities of the factors given the target. I assume the whole thing is Bayes' theorem (with multiple variables).
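Schematically, my reading of those expressions, assuming the factors are conditionally independent given the target (the naive Bayes assumption):

S = \log\frac{P(L{=}1 \mid u_1,\dots,u_8, d_1,\dots,d_8)}{P(L{=}0 \mid u_1,\dots,u_8, d_1,\dots,d_8)}
  = \log\frac{P(L{=}1)}{P(L{=}0)} + \sum_{i=1}^{8}\log\frac{P(u_i \mid L{=}1)}{P(u_i \mid L{=}0)} + \sum_{i=1}^{8}\log\frac{P(d_i \mid L{=}1)}{P(d_i \mid L{=}0)}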
The set of probabilities of u and d under the condition of target value is exactly what makes this dataset so special: expressions with drifting uplift. These expressions are like laws of physics for the virtual universe in which this dataset exists. Therefore we are trying to come up with the best prediction in this virtual universe.
I assume that the next step is substituting the conditional probabilities into the log expression. I'm going to check it — stay tuned.
P.S. The proper name for this approach is a Naive Bayes log-odds derivation.
Congratulations!
We’ve hit a round number of subscribers!
Thank you for being here.
I’m not Ramanujan, and not every number feels like a close friend. Still, when I watched the subscriber count slowly crawling up, I couldn’t help noticing some famous “nice” stops: 100, 125, 128… And now we have 137. This one I just can’t skip. It’s a crossroads of different disciplines. So buckle up — let’s dive in.
In Bohr’s model, if you calculate the speed of an electron in the ground state and express it as a fraction of the speed of light, you’ll get a number close to 1/137. This is the fine-structure constant — a key character in quantum theory. It’s a kind of bridge between light and matter: it tells you how strong (how “efficient”) the interaction between the electromagnetic field and matter is.
Now let’s compute the exact value of this ratio:
1/137 = 0.00729927007299270...
The period is 07299270. A palindrome. That’s a pretty rare occasion. I didn’t check it deeply, but I do remember one thing: within the first couple thousand natural numbers it’s unique. Back in 10th grade I even had a homework task: write a program that prints the periodic and non-periodic parts of 1/n for different n. And yes — a long scroll of decimals for n = 1…2000 lived with me for quite a while.
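That homework is still a nice exercise; here is a minimal sketch of one way to do it, tracking remainders during long division (the first repeated remainder marks where the period starts; any cyclic shift of the repeating block, like 07299270 above, is an equally valid way to write it):

def decimal_parts(n):
    """Return (non-periodic, periodic) parts of the decimal expansion of 1/n."""
    seen = {}      # remainder -> index of the digit it produced
    digits = []
    r = 1
    while r and r not in seen:
        seen[r] = len(digits)
        r *= 10
        digits.append(str(r // n))
        r %= n
    if r == 0:                 # the expansion terminates, no periodic part
        return "".join(digits), ""
    start = seen[r]            # the period begins where this remainder first appeared
    return "".join(digits[:start]), "".join(digits[start:])

print(decimal_parts(137))   # ('', '00729927')
print(decimal_parts(6))     # ('1', '6')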
And one more small gem: 729 = 27×27 = 9×9×9. I don’t believe in numerology… but this bunch of facts is still weirdly satisfying.
Stay tuned!
🎉4
Levels of liquid in a Markov process = PageRank
This post is a compact version of what happened in a neighboring chat. Sorry to newcomers who already read it — and welcome!
The first two pictures are from a student who was wondering why we suddenly put zeroes on the right-hand side of the equations. The student was totally disoriented, because they called it “normalization”.
Here’s the idea.
A stationary situation is the one where nothing changes in time, i.e. all time derivatives are zero. So it looks like we have four equations for four unknowns and we can just solve them. Not yet.
When the original problem is a Markov process with transition intensities/probabilities, you can read it as a physical model:
🫙four jars (states),
🚰 tubes between them (directed edges),
💨 flow through each tube is proportional to the “pressure” in the source jar (so yes, a weird viscous toy model).
Physical intuition: the system relaxes. After some time, the levels stop changing — inflow equals outflow for every jar. Those “asymptotic probabilities” are exactly the stationary distribution.
Now the important part: liquid can’t appear or disappear. Total amount is conserved. Because of that, the balance equations are linearly dependent: one of them is redundant. That’s why you cross out one equation and replace it with the conservation law:
p₀ + p₁ + p₂ + p₃ = 1
That’s the “normalization”. It’s not some mystical extra trick — it’s just “total liquid is 1 m³”.
The third picture shows the result of numerical modeling for different initial states. You can clearly see the relaxation stage and then convergence to the same stable solution (dotted lines).
And here comes the nice historical echo. ~30 years ago, Larry Page and Sergey Brin solved basically the same stationary-flow problem on the web graph: links became transition probabilities, and the stationary distribution became a ranking score for pages. This approach is called PageRank — not because of “pages”, but because of Larry’s family name.
It worked amazingly well for the early Internet (when “a link is a vote” was closer to truth), and it helped Google rocket. It’s not the whole story of ranking anymore — but as a first big scalable idea, it was a game changer.
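To make the jar picture concrete, here is a tiny numerical sketch. The transition intensities below are made up purely for illustration; only the structure matters (rows summing to zero, one balance equation swapped for normalization).

import numpy as np

# Toy 4-state Markov process. Q[i, j] (i != j) is the flow intensity from
# jar i to jar j; the diagonal holds minus the total outflow, so rows sum to 0.
Q = np.array([
    [-3.0,  1.0,  1.0,  1.0],
    [ 2.0, -4.0,  1.0,  1.0],
    [ 1.0,  2.0, -4.0,  1.0],
    [ 1.0,  1.0,  2.0, -4.0],
])

# The stationary levels p solve p @ Q = 0, but those four balance equations
# are linearly dependent. Cross one out and put the conservation law
# p0 + p1 + p2 + p3 = 1 in its place.
A = Q.T.copy()
A[-1, :] = 1.0
b = np.array([0.0, 0.0, 0.0, 1.0])

p = np.linalg.solve(A, b)
print(p, p.sum())   # the asymptotic levels in the four jars; they sum to 1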
Vibe Coding Guide
I’m monitoring a few resources with good vibe-coding and agentic programming practices, and I try to apply this stuff in my everyday work.
There’s a manual with vibe-coding best practices. I’m going to read it and share the ideas that clicked for me here.
🔥3❤2
Slightly improved MapReduce sunflowers
I just realized that in the post I promised to share a new version of the map-reduce sunflowers, with a small improvement for watering cells with tall sunflowers.
I didn't get 10 fire reactions, but 4 is good enough for me; I really appreciate it.
So, the small improvement: the whole code
The exact line: we check whether a sunflower is the tallest possible one and, if it is, we give it water.
Different logic for coordinates: for each height we store a list of points (see the sketch below).
A slightly different merge stage.
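For the record, a loose sketch of the new bookkeeping. measure(x, y) and water(x, y) are hypothetical stand-ins for the game API, and SIZE / MAX_HEIGHT are assumed constants, so treat this as the idea rather than the actual code linked above.

import random

# Hypothetical stand-ins for the game API, only so the sketch runs on its own.
SIZE, MAX_HEIGHT = 10, 15
field = [[random.randint(0, MAX_HEIGHT) for _ in range(SIZE)] for _ in range(SIZE)]
def measure(x, y): return field[x][y]   # read the sunflower height at (x, y)
def water(x, y): pass                   # water the cell at (x, y)

# Map stage: group cells by sunflower height into height -> [(x, y), ...].
by_height = {}
for x in range(SIZE):
    for y in range(SIZE):
        h = measure(x, y)
        by_height.setdefault(h, []).append((x, y))
        if h == MAX_HEIGHT:             # tallest possible sunflower: water it right away
            water(x, y)

# Merge stage: walk the heights from tallest to shortest and process their points.
for h in sorted(by_height, reverse=True):
    for x, y in by_height[h]:
        pass                            # harvest / handle the cells of this height here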
To those who want to understand agents properly
There is a post (in Russian) from a guy I really respect. Formally, it’s an advertisement for Yet Another Agent’s channel — but this time it’s not about news or hype. It’s about fundamentals.
What I especially like: the post contains a curated list of publications worth reading first, before drowning in frameworks, tools, and buzzwords.
From uplift to logit
Where we are
In the previous post we started building a bridge from uplift to a score that lets us solve a binary classification problem. What we achieved:
⚗️ calculated the probabilities for all combinations of the binary feature f and the target L
⚗️ wrote a score expression S as the logarithm of a ratio of conditional probabilities
Notation
⚗️ L - a binary target with values 0 or 1 (l in the picture)
⚗️ f - a binary feature with values 0 or 1
⚗️ α = P(L=1) - the probability of the target being 1, i.e. the average value of the target
⚗️ β = P(f=1) - coverage, the probability of the feature being 1, i.e. the average value of the feature
⚗️ γ = P(L=1|f=1)/P(L=1) - uplift, the gain in the target when we select the subset with f=1
⚗️ S - the score of the ideal model predicting the target
Problem statement
For given α, β, γ:
🦐 create a dataset with non-stationary uplift γ
🦐 create baseline S for target prediction
Increment
For me these conditional probabilities are terra incognita. I don't have a solid intuition about what is right and what is wrong here. So I wanted a small sanity check for all this math, which is complex for me. And the test is quite simple: γ=1 means the factor gives no new information about the target, so the weights for this factor should become 0. In the previous attempt that wasn't obvious at all.
This time I slightly changed the notation and used conditional probabilities instead of absolute ones. Now it is totally obvious that when γ=1, the expressions under the logarithms become 1, the logarithms are equal to 0, and therefore the terms with N and M (the numbers of ones and zeros in the sets of factors) go away. It means the factors are not important, which is exactly what we assumed by taking γ=1. Now I'm quite confident in this result, so let's dive deeper.
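For a single binary feature, here is my sketch of that check in the notation above (the conditional probabilities come from Bayes' theorem applied to α, β, γ; this is a reading of the idea, not the exact expressions from the picture):

P(f{=}1 \mid L{=}1) = \frac{P(L{=}1 \mid f{=}1)\,P(f{=}1)}{P(L{=}1)} = \gamma\beta,
\qquad
P(f{=}1 \mid L{=}0) = \frac{\beta(1-\gamma\alpha)}{1-\alpha}

so the feature's contribution to the score is

\log\frac{P(f{=}1 \mid L{=}1)}{P(f{=}1 \mid L{=}0)} = \log\frac{\gamma(1-\alpha)}{1-\gamma\alpha},

which is exactly 0 when γ=1 (and the f=0 term vanishes the same way).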
Outcomes
🦐 we have watertight expressions for dataset generation for arbitrary γ
🦐 there is an expression for the optimal model S
🦐 the γ=1 case is checked and S behaves as expected
🦐 the expression for S is a bridge from probabilities to a logistic regression model
🦐 it's a universal approach: we can try it in the wild
Titanic - Machine Learning from Disaster
I want to close my gestalt. I invented a new ML method, I've been working as an ML engineer for 9 years, and I've taught ML at the GoTo school and at the Russian-Armenian University. But I have never submitted the Titanic competition on Kaggle. Let's try to build our own approach. I think it's always interesting to start from scratch, build your own solution, and only then read the work of others.
Kaggle
It's a famous machine learning platform. There is a plethora of datasets for different tasks: regression, classification, ranking, whatever. Kaggle provides free GPU resources for training your models, lets you check the quality of your solutions, and lets you compete with the best ML geeks on the planet. "Kaggle Grandmaster" is a title you want to have in your CV. So, not a bad place to hang around.
Titanic
The competition was published 13 years ago (what a number!). Nowadays it's a good tradition to start ML lessons with this dataset. Why? Because:
🛳 It's a binary classification task
🛳 you can sharpen your grasp of precision, accuracy, recall, ROC, and ROC AUC
🛳 a lot of missing values
🛳 multimodal data (strings + numbers)
Data
On the data page one can check the dataset's structure. It's split into two parts: a training part and a test part. You train a model on the training data, apply it to the test data, and submit the predictions. The platform checks your results and gives you feedback. Let's check the fields:
target
☠️ Survived - the target field: 1 if the passenger survived (38%), 0 otherwise.
features
🛂 PassengerId — Unique identifier for each passenger. Not used as a predictor.
🏛Pclass — Ticket class: 1 = 1st (upper), 2 = 2nd (middle), 3 = 3rd (lower). Proxy for socio-economic status.
👺Name — Passenger name. Can be used to extract title (Mr, Mrs, Miss, etc.) or family.
👧🏻 Sex — Biological sex (male / female). Strong predictor of survival (“women and children first”).
🧙🏿♂️ Age — Age in years. Fractional if under 1. Has missing values.
👩❤️👨 SibSp — Number of siblings or spouses aboard (Sibling + Spouse). Discrete count (0–8).
👨👩👦 Parch — Number of parents or children aboard (Parent + Child). Discrete count (0–6).
🎫 Ticket — Ticket number. Often alphanumeric; can be messy for modeling.
💲 Fare — Passenger fare paid. Continuous; may correlate with Pclass and Cabin.
🏡 Cabin — Cabin number. Heavily missing; when present, deck/position may be informative.
🧳 Embarked — Port of embarkation: C = Cherbourg, Q = Queenstown, S = Southampton.
EDA
It's always a good idea to look at your data. You can see how complete it is and how informative it is.
🏛Pclass
No missing values; three values with these frequencies:
1: 216
2: 184
3: 491
We can calculate the average survival rate for each of the three classes:
1: 63%
2: 47%
3: 24%
Now recall that the overall survival rate is 38%. So first class votes strongly for survival, second class votes slightly for survival, and third class votes strongly against survival.
👺Name
It's a good question whether the name contains any information about survival at all. Let's check. I want to use the hashing trick to figure it out. The experiment with the hashing trick demonstrated that the Name feature does contain information and gives 62% ROC AUC. So I dove into this field deeper. It turned out that it has a format like
Futrelle, Mrs. Jacques Heath (Lily May Peel)
I checked the distribution of Survived for different titles, and it is a good feature:
title: survival rate (count)
Master: 0.575 (40)
Miss: 0.703 (185)
Mr: 0.157 (517)
Other: 0.725 (149)
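A minimal pandas sketch of this title extraction, assuming the competition's standard train.csv; grouping everything beyond Mr/Miss/Master into "Other" follows the table above, though exact counts may differ slightly depending on how rare titles are mapped.

import pandas as pd

# Assumes the Kaggle train.csv with Name and Survived columns.
df = pd.read_csv("train.csv")

# Names look like "Futrelle, Mrs. Jacques Heath (Lily May Peel)":
# the title sits between the comma and the following period.
df["Title"] = df["Name"].str.extract(r",\s*([^.]+)\.", expand=False)
df["Title"] = df["Title"].where(df["Title"].isin(["Mr", "Miss", "Master"]), "Other")

# Survival rate and count per title.
print(df.groupby("Title")["Survived"].agg(["mean", "count"]))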
👧🏻 Sex
Definitely a strong feature; see the picture.
🧳 Embarked
Three different values, strong uplift => a valuable feature.
I'm going to continue this exploration, and I'd like to see whether this topic is interesting. Please vote with 🔥 if you like it.
Jupyter Lab
github: eda
🔥3
Old but gold
Let's discuss a programming task. You have a bunch of numeric pairs like 5, 5, 7, 7, 2, 2 and one unique number, like 13. You mix everything into one array and shuffle it. Now you have something like [7, 2, 13, 5, 2, 7, 5]. Let n be the number of elements and let it be huge, like 100000000. Numbers are 32- or 64-bit wide. The task is to find the unique number. In our case it would be 13.
Feel free to share in the comments the time and memory complexities of your approach, in the form T=O(...); M=O(...).
I think it would be fun to collect options for a few days and then compare algorithms.
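If you want to benchmark your idea before posting, here is a tiny generator for an instance of the problem; no spoilers, it only builds the shuffled array (sizes and the unique value are parameters you can crank up):

import random

def make_instance(n_pairs, unique=13, bits=32):
    # n_pairs distinct values (each taken twice) plus one unique number.
    values = random.sample(range(1, 2 ** bits), n_pairs + 1)
    if unique in values:
        values.remove(unique)
    else:
        values.pop()
    arr = values * 2 + [unique]
    random.shuffle(arr)
    return arr

print(make_instance(3))   # something like [7, 2, 13, 5, 2, 7, 5]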
How to vibe-code the Chizhik-Pyzhik app
In the alternative channel (Alina_Yerevan_frontend) there is a step-by-step instruction on how to create a small but fully functional web application. The post is in Russian.
❤1👍1
Old but gold
These seem to be two different problems. But really, it's the same problem.
1. Find the repeating and non-repeating parts of the decimal expansion of 1/n.
1/3 = 0.(3)
1/6 = 0.1(6)
2. For a singly linked list, check whether it contains a cycle.
Share your thoughts and ideas in the comments. Use the “spoiler” feature wisely. I’ll publish the canonical approach in 3 days, if needed.