Algorithms. Physics. Mathematics. Machine Learning.
355 subscribers
162 photos
9 videos
1 file
75 links
DIY projects, fun with 3d, electronics, programming, vibe coding, math, ML algorithms.
Download Telegram
Cyberpunk we deserved

πŸ‘‰ There was a problem during the RL stage of ChatGPT 5.5, and now the system prompt has to include an instruction suppressing talk about goblins and raccoons.
πŸ‘‰ There is a Factorio mod that turns the game into bureaucracy. Biters bring you complaints, and you have to process them.
AUROC clearly explained

Β« prev | content | next Β»

The worst thing you can do is start explaining ML metrics to stakeholders. I did it a lot. Don’t ever do it. Seriously. But here, in our cozy space with physics, mathematics and ML, we can be luxurious and start discussing how to measure the quality of a model, as promised some time ago.

Our setup:

πŸ‘‰ target β€” the thing we are predicting: survival β€” binary β€” 0/1, false/true
πŸ‘‰ model β€” any function you can imagine; takes all features, returns the Score
πŸ‘‰ Score β€” a number; small ~ false (0) target, big ~ true (1) target

In our informal language, we calculate the score for each passenger using the chain: passenger β†’ features β†’ model β†’ Score. There is a question: whether this Score works, or what? How do we understand it?

Let’s cheat. For a moment, let’s use survived as the Score. Then ask passengers to form a rank by increasing score. Obviously, all survived people would be on the right end of this rank, and deceased people β€” on the left side. There would be a position in this rank where we can split it into two crowds. In the left crowd, for each passenger target = 0, and in the right β€” target = 1.

It gives us the idea. We ask our passengers to form a rank guided by score increase. Then we move from right to left and count survived and deceased. Just counting is not enough. At the end of the day, we would get only the number of passengers in these two classes. We need to memorize the dynamics of these two counts. But how?

A totally genius idea is to use these two meters as coordinates on the plane. Therefore, a survived passenger urges us to make a step up, and a deceased passenger β€” to make a step to the right.

If we cheated, we move all the way up, then all the way to the right. If passengers were randomly shuffled, we move in a narrow band near the diagonal of the rectangle.

Last step β€” scale the axes to transform our bounding rectangle into a square. Here we are. The coordinates are TPR and FPR, our trajectory is ROC, and the area under it is AUROC β€” Area Under the Receiver Operating Characteristic Curve.

I totally believe that you heard all this stuff. But I need a solid base for the madness which follows.
Friday shitposting. 1 May

Thirty-five years ago, in April 1991, I started this project.

The idea was quite simple: use the DRAW operator in QBASIC to draw a stylised β€œ1 MAY” banner. The DRAW operator offered only a small palette of graphical possibilities, and my drawing skills were far outmatched by my ambitions β€” so the project stalled.

Now I think that actually completing a task is often more important than doing it flawlessly. So here we are: the First of May greeting is finally ready.

A few words about the DRAW operator language. It uses a state-machine-style drawing approach with a traced cursor position. Initially, the cursor is in the centre of the screen. Then there are movement commands: R, U, L, and D for right, up, left, and down, plus E, F, G, and H for the 45-degree directions.

For example, U10 moves the cursor 10 points up and draws a segment. BU10 moves the cursor up without drawing. P11,15 paints the current area, flooding it with colour 11 until it reaches areas coloured 15. There are a few more details in this tiny language, but you get the idea.

For me, 35 years ago, it was quite a revelation that there could be a programming language inside another programming language. Nowadays, DSLs β€” domain-specific languages β€” are a powerful tool that every professional engineer should have in their tool belt.

It is also worth mentioning that the languages used by 3D printers and similar machines are quite close in spirit. It would be totally possible to convert this program into a 3D-printer version and print out the title.

Maybe one day...

Feel free to check out my project online - It runs in QBJS, a browser-based QBASIC environment implemented in JavaScript.

P.S. Almost an LLM-free post. The program is hand-written, and the screenshot is honest. Only the text of the post was polished.
πŸ”₯1
Pi !

While I'm preparing a post on random nature of ROC, let's celebrate this point. I was wondering which next number would trigger me, tried to check perfect squares, cubes... Nothing clicked. But today I saw 314 and it was like meeting an old friend.

It's totally irrational. And I'm not talking about Pi. Though it is totally irrational. Now I'm talking about my feelings. I don't know why, but this 314 always catches my attention. In school I managed to recite the first 23 decimal digits.

One of my schoolmates once stunned me. He totally believed that Pi = 3.14 15 16 17 ...

The collection of facts and warm memories seems to be endless. Donald Knuth decided to version his famous TeX program by revealing more and more digits of Pi, because that's how path to the perfection works. There is a serious biological article where biologists discovered a stunning property of anthill have circumference "approximately three times" larger then the diameter of the heap.
πŸ‘1
Friday shitposting. Old Moscow photos.

It seems that I’m stuck in a deep perfectionist loop, endlessly polishing my post about stochastic wandering. Meanwhile, let me share some trash photos I took about 20 years ago.
😁1
Stochastic AUROC

Β« prev | content | next Β»

Five previous posts were a preparation for this one. We prepared age model which uses only age factor. The model gives us the Score with low cardinal number because there are only three values: -1, 0, 1. And we plotted ROC and calculated AUROC.

The question which has bugged me for eleven years, since the moment I heard the ROC and AUROC definitions in the School of Data Analysis, is: "How come there are only three points on ROC, when, by definition, there should be at least 891 points?" The magic number 891 is the number of passengers in the Titanic dataset, and three is the number of different model answers.

Let's start thinking from first principles. What is our ROC approach? To assign a score, to make a rank. But if we have only -1, 0, 1 values, the rank looks more like three unorganized crowds. The nature of these crowds shows itself when we are applying our algorithm: "move from the highest values of scores toward lower". When all scores are different, we have a strictly determined order. When a group of passengers has the same score, we can only pick randomly from this group. The target value for each passenger guides our path over the TPR-FPR plane. So, now we have random wandering.

Let's check the picture. Blue line - we were EXTREMELY lucky and picked passengers from the crowd in the right order: survived first. Orange - opposite extreme, deceased first. The lines close to the diagonal of the rectangle are random trajectories.

There are two points where all lines intersect. Their coordinates are determined by the number of survived and deceased passengers in each group. The number of survived passengers is the height of the rectangle, and the number of deceased passengers is the width.

Phew. Here we are. I hope it is clear now that, when we have a small number of different values from the model, we don't have enough information to rank objects - passengers - deterministically. The sequence which guides ROC becomes stochastic, and ROC becomes a random walk trajectory.

I think it is important because:

πŸ‘‰ This approach closes a serious gap in understanding ROC properties for a wide class of applications. For example, we have a binary feature and a binary target. What is ROC in this case? Now we know.

πŸ‘‰ The cardinality of the model's output is a property which is frequently ignored. But it can be either accidentally blown up by noisy calculations, or reduced by aggressive quantization. We have to be aware of these effects.
Hardcore. Stochastic wandering.

Β« prev | content

My own investigation

When I started all this commotion, my intuition was as simple as:
πŸ‘‰ tie in score β†’ stochastic wandering on rectangular grid β†’ 0s and 1s for up/right moves β†’ binomial counting β†’ we can calculate everything

Indeed, if we want to make p steps up and n steps right, we have C(p+n, p) routes - school program. To get distribution over AUC you have to write a function which calculates it from "001101" sequence. Suddenly Young (or Ferrers) diagrams appeared. Bars under the curve increase their height, therefore each zero-one sequence encodes a correct Young diagram. Height of column is number of ones to the left of our current position, we are to sum it up when hit zero. So, AUC numerator is sum of #(all ones to the left of each zero) (column based count) or sum of #(all zeroes to the right of each one) (row based count).

If you meditate long enough, you can understand that AUC numerator is exactly number of correctly ordered pairs (1 on the left side of 0). AUC itself is this number divided by p*n.

Here my mathematical skills left me alone and I dove into the topic with ChatGPT.

Vibe science

ChatGPT gave me closed-form expression for AUC numerator variance so quickly, that I decided that this result is trivial. Big mistake. I found myself in the middle of complex and beautiful mathematics. Vast collection of names and facts started to bury me. So I picked up the most bright and interesting one. Let the show start!

πŸ‘‰ There is a mathematical object which addresses exactly the question we stated: how many paths there are with n negative samples, p positive samples and AUC numerator = q.
πŸ‘‰ Its name - Gaussian (q-binomial) coefficient
πŸ‘‰ Math expression is on the picture
πŸ‘‰ Gaussian coefficient is a polynomial and numbers of paths are coefficients of this polynomial
πŸ‘‰ It's a q-analog of binomial coefficient: when the polynomial variable is set to 1, gaussian coefficient becomes ordinary binomial coefficient
πŸ‘‰ Distribution is unimodal, with maximum around p*n/2
πŸ‘‰ Center of distribution: q = p*n/2
πŸ‘‰ Variance of q: Var(q) = p*n*(p+n+1)/12
πŸ‘‰ Variance of AUC: Var(AUC) = (p+n+1)/(12*p*n)
πŸ‘‰ Mode of distribution corresponds to the central coefficient(s) of the Gaussian coefficient. Sometimes it is one coefficient, sometimes there is a plateau.

Practical implications

πŸ‘Ή There are some packages which can calculate confidence intervals for ROC AUC, they are not popular
πŸ‘Ή The approach can be useful for feature selection and experiments with low-cardinality-output models

That's almost it. I think it would be useful to collect all references I found in one more post.
Friday shitposting.

My own photos. The first one demonstrates that it is not always a good idea to show users a randomly generated sequence of letters. The second... Basically, I have no clue. Probably a secret training site for ninjas?
❀2