Algorithms. Physics. Mathematics. Machine Learning.
390 subscribers
167 photos
10 videos
1 file
76 links
DIY projects, fun with 3d, electronics, programming, vibe coding, math, ML algorithms.
Download Telegram
Friday shitposting. Old Moscow photos.

It seems that Iโ€™m stuck in a deep perfectionist loop, endlessly polishing my post about stochastic wandering. Meanwhile, let me share some trash photos I took about 20 years ago.
๐Ÿ˜1
Stochastic AUROC

ยซ prev | content | next ยป

Five previous posts were a preparation for this one. We prepared age model which uses only age factor. The model gives us the Score with low cardinal number because there are only three values: -1, 0, 1. And we plotted ROC and calculated AUROC.

The question which has bugged me for eleven years, since the moment I heard the ROC and AUROC definitions in the School of Data Analysis, is: "How come there are only three points on ROC, when, by definition, there should be at least 891 points?" The magic number 891 is the number of passengers in the Titanic dataset, and three is the number of different model answers.

Let's start thinking from first principles. What is our ROC approach? To assign a score, to make a rank. But if we have only -1, 0, 1 values, the rank looks more like three unorganized crowds. The nature of these crowds shows itself when we are applying our algorithm: "move from the highest values of scores toward lower". When all scores are different, we have a strictly determined order. When a group of passengers has the same score, we can only pick randomly from this group. The target value for each passenger guides our path over the TPR-FPR plane. So, now we have random wandering.

Let's check the picture. Blue line - we were EXTREMELY lucky and picked passengers from the crowd in the right order: survived first. Orange - opposite extreme, deceased first. The lines close to the diagonal of the rectangle are random trajectories.

There are two points where all lines intersect. Their coordinates are determined by the number of survived and deceased passengers in each group. The number of survived passengers is the height of the rectangle, and the number of deceased passengers is the width.

Phew. Here we are. I hope it is clear now that, when we have a small number of different values from the model, we don't have enough information to rank objects - passengers - deterministically. The sequence which guides ROC becomes stochastic, and ROC becomes a random walk trajectory.

I think it is important because:

๐Ÿ‘‰ This approach closes a serious gap in understanding ROC properties for a wide class of applications. For example, we have a binary feature and a binary target. What is ROC in this case? Now we know.

๐Ÿ‘‰ The cardinality of the model's output is a property which is frequently ignored. But it can be either accidentally blown up by noisy calculations, or reduced by aggressive quantization. We have to be aware of these effects.
Hardcore. Stochastic wandering.

ยซ prev | content

My own investigation

When I started all this commotion, my intuition was as simple as:
๐Ÿ‘‰ tie in score โ†’ stochastic wandering on rectangular grid โ†’ 0s and 1s for up/right moves โ†’ binomial counting โ†’ we can calculate everything

Indeed, if we want to make p steps up and n steps right, we have C(p+n, p) routes - school program. To get distribution over AUC you have to write a function which calculates it from "001101" sequence. Suddenly Young (or Ferrers) diagrams appeared. Bars under the curve increase their height, therefore each zero-one sequence encodes a correct Young diagram. Height of column is number of ones to the left of our current position, we are to sum it up when hit zero. So, AUC numerator is sum of #(all ones to the left of each zero) (column based count) or sum of #(all zeroes to the right of each one) (row based count).

If you meditate long enough, you can understand that AUC numerator is exactly number of correctly ordered pairs (1 on the left side of 0). AUC itself is this number divided by p*n.

Here my mathematical skills left me alone and I dove into the topic with ChatGPT.

Vibe science

ChatGPT gave me closed-form expression for AUC numerator variance so quickly, that I decided that this result is trivial. Big mistake. I found myself in the middle of complex and beautiful mathematics. Vast collection of names and facts started to bury me. So I picked up the most bright and interesting one. Let the show start!

๐Ÿ‘‰ There is a mathematical object which addresses exactly the question we stated: how many paths there are with n negative samples, p positive samples and AUC numerator = q.
๐Ÿ‘‰ Its name - Gaussian (q-binomial) coefficient
๐Ÿ‘‰ Math expression is on the picture
๐Ÿ‘‰ Gaussian coefficient is a polynomial and numbers of paths are coefficients of this polynomial
๐Ÿ‘‰ It's a q-analog of binomial coefficient: when the polynomial variable is set to 1, gaussian coefficient becomes ordinary binomial coefficient
๐Ÿ‘‰ Distribution is unimodal, with maximum around p*n/2
๐Ÿ‘‰ Center of distribution: q = p*n/2
๐Ÿ‘‰ Variance of q: Var(q) = p*n*(p+n+1)/12
๐Ÿ‘‰ Variance of AUC: Var(AUC) = (p+n+1)/(12*p*n)
๐Ÿ‘‰ Mode of distribution corresponds to the central coefficient(s) of the Gaussian coefficient. Sometimes it is one coefficient, sometimes there is a plateau.

Practical implications

๐Ÿ‘น There are some packages which can calculate confidence intervals for ROC AUC, they are not popular
๐Ÿ‘น The approach can be useful for feature selection and experiments with low-cardinality-output models

That's almost it. I think it would be useful to collect all references I found in one more post.
Friday shitposting.

My own photos. The first one demonstrates that it is not always a good idea to show users a randomly generated sequence of letters. The second... Basically, I have no clue. Probably a secret training site for ninjas?
โค4๐Ÿ˜2
Self-portrait

To generate such a magnificent picture, say: "Redraw my photo in MS Paint style, with some likeness, but also off in a confusing way..." I'll try to write about myself under a similar prompt.

The beginning
Trushin Arseniy. Born in the USSR. Rotary phone, 3 Soviet kopecks per bun, 5 per metro ride, 48 for a big chunk of ice cream. Favorite reading - Soviet encyclopedia for youngsters, sister's schoolbooks. Tons of DIY stuff, small "Quantum" magazines. Father - radio amateur. Diodes, LEDs, batteries and lamps - favorite toys.

Science
Ordinary school, special class in the same school, phys-math brilliant macabre: school 54 between Frunzenskaya and Sportivnaya. First own computer - Soviet, 1992. Sura PK 8500. Now I wouldn't dare solve the problems I solved on that computer. Like prove that (1+e) creates a ring if e is a 17th root of 1. I vaguely recall this task, but I can recall that I solved it numerically, and the key point was a page from a student notebook, covered with digits and letters.

PhD in laser physics, 14 articles. Then - change of course. First interview - as a C++ developer. Then: Cadence, Yandex, Equifax, HeadHunter, App in the Air, Yandex again.

The best feedback on my educational activities: "Arseniy Sergeevich, when you explained physics to us, everything was unclear, but very interesting. With our current teacher it is unclear as well, but not interesting anymore."

9 students: MSU, MIPT, RAO. +1 article on ML.

MISC
English - my curse. Always B in school. This blog helps me maintain my current level. I write it from my head, then polish with ChatGPT.

This blog. You know, just working kills me. Even when I implemented my own idea, it was a fascinating year, but it was hard to push it 8x5. For me the panacea is to have something I'm thinking about just for fun. Because we can. This channel is a spaghetti of such thoughts. You can see that there are threads: about TFWR, GBDTE, agents... Some are moving, some are on pause.

Pelevin? Can't say that I'm a fan, but I read everything I have heard about. Of course, Strugatskie. "Monday...", "The Tale...", "A Billion Years..." - my moral DNA. "M&M" by Bulgakov. As the Master put it in "Nine Princes in Amber":

Goodbye and hello, as always.
๐Ÿ‘5โค2
Geometric proof of an algebraic formula

In a friendly chat I saw an idea for a nice toy. It's a well-known thing, and you can check the post with a magnificent picture, and a beautiful site with a lot of deep facts from math, physics, geometry and engineering.

I haven't printed anything for a while, so I opened Blender and created a model from scratch. Actually, two models. The first one used 5 scaled cubes, later joined together. The second one used one cube as a starting point: I stretched it and then extruded these terraces. If you want to see a short video on how to create such models in Blender from scratch, please press the pumpkin emoji.

Then I printed the model, and the funniest part started. You can use three pyramids to create a slab with a ridge on one side. Six pyramids make two ridged slabs. Move them together - and you have one slab. Each pyramid contains x terraces with unit height. So, the volume of this pyramid is 1*1 + 2*2 + 3*3 + ... + x*x. Six pyramids give you a slab with sides x, x + 1, 2*x + 1 (check the picture). And finally:

1*1 + 2*2 + ... + x*x = x * (x + 1) * (2*x + 1) / 6

One thing bothers me. The sum of squares on the left side is clearly an integer. But there is a fraction on the right side of the equation. What if for some x it is not an integer? Please share your thoughts on this subject.
๐ŸŽƒ1
This media is not supported in your browser
VIEW IN TELEGRAM
Friday challenge โœจ

Can you guess how these ice cubes started glowing?

Share your theories in the comments.