This media is not supported in your browser
VIEW IN TELEGRAM
Friday challenge β¨
Can you guess how these ice cubes started glowing?
Share your theories in the comments.
Can you guess how these ice cubes started glowing?
Share your theories in the comments.
π₯3
A reading list on Gaussian binomial coefficients, q-binomial coefficients, and their connection to stochastic AUROC
Β« prev | content | next Β»
Part 1
Quick orientation / popular sources
π Wikipedia β Gaussian binomial coefficient The best first overview.
π MathWorld β q-Binomial Coefficient. A compact formula reference.
π NIST DLMF, Chapter 17 β q-Hypergeometric and Related Functions; q-Pochhammer symbols, q-binomial coefficients, and the q-binomial theorem. Not the friendliest introduction, but very reliable for formulas.
π An Invitation to Enumeration β q-analogues; One of the clearest online introductions to the topic. This is exactly the language needed for random ROC wandering, where a path is weighted by its area.
Gentle textbook-style entry points
π George Andrews, Kimmo Eriksson β Integer Partitions Probably the best textbook-style entry point. It has a chapter on Gaussian polynomials, lattice paths and q-binomial numbers, and the q-binomial theorem. A good source if you want understanding rather than just formulas.
π Peter Cameron β Notes on Counting: An Introduction to Enumerative Combinatorics A useful source for q-analogues from the finite-vector-space point of view. It explains why the Gaussian coefficient, when q is a prime power, counts k-dimensional subspaces of an n-dimensional vector space over GF(q). This is not the main route for ROC, but it helps explain why the topic is so large.
π Laszlo Babai-style notes β q-combinatorics Short lecture notes.
π Herbert Wilf β generatingfunctionology Not specifically about Gaussian binomial coefficients, but extremely useful for learning the general language of generating functions. For our purposes, the key habit is: a discrete distribution can be encoded as the coefficients of a polynomial. That is exactly what happens with the AUROC distribution.
π Richard Stanley β Enumerative Combinatorics, Volume 1 The classic serious textbook. Not a light introduction, but an excellent reference if you want to place q-binomial coefficients inside the broader world of inversions, partitions, posets, generating functions, and q-analogues.
Historical sources
π H. A. Rothe β Handbuch der reinen Mathematik, 1811 Historically important. The original is not an easy read, but Rothe is often mentioned as one of the early published sources for the q-binomial theorem.
π P. A. MacMahon β Combinatory Analysis, 1916 A central source for the combinatorial and generating-function tradition. If your interest is βarea under a path,β βinversions,β and βpartitions,β MacMahon is closer to our ROC story than the purely analytic q-series tradition.
π Leonard Carlitz β A set of polynomials, 1940 Not the easiest entry point, but Carlitz is important in the later q-polynomial and finite-field tradition.
π Frank Wilcoxon β Individual Comparisons by Ranking Methods, 1945 This is not about q-binomial coefficients, but it is important for the statistical side of the story. It belongs to the origin of rank-based methods, which are directly connected to AUROC.
π Mann, Whitney β On a Test of Whether one of Two Random Variables is Stochastically Larger than the Other, 1947 The key historical source for the MannβWhitney U statistic. AUROC can be viewed as a normalized MannβWhitney statistic. In our setting, random tie-breaking inside score blocks gives a finite distribution of a closely related statistic.
Hard but interesting sources
π George Andrews β The Theory of Partitions A classic work in partition theory. Useful if you want to understand Gaussian polynomials as generating functions for partitions. Deeper and harder than AndrewsβEriksson.
π Gasper, Rahman β Basic Hypergeometric Series also option2 The heavy analytic side of q-series. Not necessary for the ROC project at the beginning, but it is a standard deep reference if you move toward basic hypergeometric series.
Β« prev | content | next Β»
Part 1
Quick orientation / popular sources
π Wikipedia β Gaussian binomial coefficient The best first overview.
π MathWorld β q-Binomial Coefficient. A compact formula reference.
π NIST DLMF, Chapter 17 β q-Hypergeometric and Related Functions; q-Pochhammer symbols, q-binomial coefficients, and the q-binomial theorem. Not the friendliest introduction, but very reliable for formulas.
π An Invitation to Enumeration β q-analogues; One of the clearest online introductions to the topic. This is exactly the language needed for random ROC wandering, where a path is weighted by its area.
Gentle textbook-style entry points
π George Andrews, Kimmo Eriksson β Integer Partitions Probably the best textbook-style entry point. It has a chapter on Gaussian polynomials, lattice paths and q-binomial numbers, and the q-binomial theorem. A good source if you want understanding rather than just formulas.
π Peter Cameron β Notes on Counting: An Introduction to Enumerative Combinatorics A useful source for q-analogues from the finite-vector-space point of view. It explains why the Gaussian coefficient, when q is a prime power, counts k-dimensional subspaces of an n-dimensional vector space over GF(q). This is not the main route for ROC, but it helps explain why the topic is so large.
π Laszlo Babai-style notes β q-combinatorics Short lecture notes.
π Herbert Wilf β generatingfunctionology Not specifically about Gaussian binomial coefficients, but extremely useful for learning the general language of generating functions. For our purposes, the key habit is: a discrete distribution can be encoded as the coefficients of a polynomial. That is exactly what happens with the AUROC distribution.
π Richard Stanley β Enumerative Combinatorics, Volume 1 The classic serious textbook. Not a light introduction, but an excellent reference if you want to place q-binomial coefficients inside the broader world of inversions, partitions, posets, generating functions, and q-analogues.
Historical sources
π H. A. Rothe β Handbuch der reinen Mathematik, 1811 Historically important. The original is not an easy read, but Rothe is often mentioned as one of the early published sources for the q-binomial theorem.
π P. A. MacMahon β Combinatory Analysis, 1916 A central source for the combinatorial and generating-function tradition. If your interest is βarea under a path,β βinversions,β and βpartitions,β MacMahon is closer to our ROC story than the purely analytic q-series tradition.
π Leonard Carlitz β A set of polynomials, 1940 Not the easiest entry point, but Carlitz is important in the later q-polynomial and finite-field tradition.
π Frank Wilcoxon β Individual Comparisons by Ranking Methods, 1945 This is not about q-binomial coefficients, but it is important for the statistical side of the story. It belongs to the origin of rank-based methods, which are directly connected to AUROC.
π Mann, Whitney β On a Test of Whether one of Two Random Variables is Stochastically Larger than the Other, 1947 The key historical source for the MannβWhitney U statistic. AUROC can be viewed as a normalized MannβWhitney statistic. In our setting, random tie-breaking inside score blocks gives a finite distribution of a closely related statistic.
Hard but interesting sources
π George Andrews β The Theory of Partitions A classic work in partition theory. Useful if you want to understand Gaussian polynomials as generating functions for partitions. Deeper and harder than AndrewsβEriksson.
π Gasper, Rahman β Basic Hypergeometric Series also option2 The heavy analytic side of q-series. Not necessary for the ROC project at the beginning, but it is a standard deep reference if you move toward basic hypergeometric series.
Β« prev | content
Reading list. Part 2.
π Andrews, Askey, Roy β Special Functions A broad and serious reference on special functions. Useful if you want to see the q-binomial theorem as part of the larger world of special functions.
π Kathleen OβHara β Unimodality of Gaussian coefficients: A constructive proof, 1990 A famous paper on the unimodality of Gaussian coefficients. Difficult, but important if you are interested in the shape of the distribution: why the coefficients rise toward the middle and then fall.
π Doron Zeilberger β Kathy OβHaraβs Constructive Proof of the Unimodality of the Gaussian Polynomials, 1989 A more explanatory bridge to OβHaraβs proof. Still not easy, but easier than starting with the original paper.
π Pak, Panova β Strict unimodality of q-binomial coefficients, 2013 A serious paper on strict unimodality of q-binomial coefficients. Interesting if you want to go beyond βthere is a maximum near the centreβ and understand finer shape properties of the distribution.
π Dhand β A combinatorial proof of strict unimodality for q-binomial coefficients, 2014 A combinatorial proof of strict unimodality in large cases. Hard, but closer in spirit to the combinatorial nature of our problem.
Sources connecting this to ROC AUC
π Donald Bamber β The area above the ordinal dominance graph and the area below the receiver operating characteristic graph, 1975 A very important bridge between ROC AUC and the MannβWhitney U statistic. A good reference for the claim that the area under the ROC curve is a pairwise ranking statistic.
π Hanley, McNeil β The meaning and use of the area under a receiver operating characteristic curve, 1982 A classic applied paper on the meaning of ROC AUC. Useful for the interpretation of AUC as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative example.
π Green, Swets β Signal Detection Theory and Psychophysics, 1966 Classic background on signal detection theory and ROC curves. Less about discrete tied score blocks, more about the historical and conceptual origins of ROC analysis.
Suggested reading order for the stochastic ROC project
First:
π Wikipedia β Gaussian binomial coefficient
π An Invitation to Enumeration β q-analogues
Then:
π Andrews, Eriksson β Integer Partitions
π Bamber β area below ROC and the MannβWhitney connection
Then, for a more serious foundation:
π Stanley β Enumerative Combinatorics
π Mann, Whitney, 1947
And only then, if you want to go deeper into the shape of the distribution:
π OβHara β Unimodality of Gaussian coefficients
π Pak, Panova β Strict unimodality of q-binomial coefficients
Reading list. Part 2.
π Andrews, Askey, Roy β Special Functions A broad and serious reference on special functions. Useful if you want to see the q-binomial theorem as part of the larger world of special functions.
π Kathleen OβHara β Unimodality of Gaussian coefficients: A constructive proof, 1990 A famous paper on the unimodality of Gaussian coefficients. Difficult, but important if you are interested in the shape of the distribution: why the coefficients rise toward the middle and then fall.
π Doron Zeilberger β Kathy OβHaraβs Constructive Proof of the Unimodality of the Gaussian Polynomials, 1989 A more explanatory bridge to OβHaraβs proof. Still not easy, but easier than starting with the original paper.
π Pak, Panova β Strict unimodality of q-binomial coefficients, 2013 A serious paper on strict unimodality of q-binomial coefficients. Interesting if you want to go beyond βthere is a maximum near the centreβ and understand finer shape properties of the distribution.
π Dhand β A combinatorial proof of strict unimodality for q-binomial coefficients, 2014 A combinatorial proof of strict unimodality in large cases. Hard, but closer in spirit to the combinatorial nature of our problem.
Sources connecting this to ROC AUC
π Donald Bamber β The area above the ordinal dominance graph and the area below the receiver operating characteristic graph, 1975 A very important bridge between ROC AUC and the MannβWhitney U statistic. A good reference for the claim that the area under the ROC curve is a pairwise ranking statistic.
π Hanley, McNeil β The meaning and use of the area under a receiver operating characteristic curve, 1982 A classic applied paper on the meaning of ROC AUC. Useful for the interpretation of AUC as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative example.
π Green, Swets β Signal Detection Theory and Psychophysics, 1966 Classic background on signal detection theory and ROC curves. Less about discrete tied score blocks, more about the historical and conceptual origins of ROC analysis.
Suggested reading order for the stochastic ROC project
First:
π Wikipedia β Gaussian binomial coefficient
π An Invitation to Enumeration β q-analogues
Then:
π Andrews, Eriksson β Integer Partitions
π Bamber β area below ROC and the MannβWhitney connection
Then, for a more serious foundation:
π Stanley β Enumerative Combinatorics
π Mann, Whitney, 1947
And only then, if you want to go deeper into the shape of the distribution:
π OβHara β Unimodality of Gaussian coefficients
π Pak, Panova β Strict unimodality of q-binomial coefficients
Friday trash
Be careful, the first photo is really a kind of trash. 100 kilometers north of Moscow, kids are playing with a dead snake. If your toys were wooden, you were lucky.
Near Zvenigorod there is a bunch of identical houses. I don't know why, but it still looks funny to me. I heard that there is a business around some legislative procedures: before signing a contract, you place some dummy structures on your plot and pay less for registration, or something like that. The next year, all these houses were removed.
Round-leaved sundew. Near the artificial lake Sima.
Be careful, the first photo is really a kind of trash. 100 kilometers north of Moscow, kids are playing with a dead snake. If your toys were wooden, you were lucky.
Near Zvenigorod there is a bunch of identical houses. I don't know why, but it still looks funny to me. I heard that there is a business around some legislative procedures: before signing a contract, you place some dummy structures on your plot and pay less for registration, or something like that. The next year, all these houses were removed.
Round-leaved sundew. Near the artificial lake Sima.
π2
Implicit Curriculum aka skill graph
Why should I read all these articles? - I asked myself this week, when I suddenly found this diamond.
To explain why I jumped out of my chair, let me tell you about my previous team. It was Schoolbook. My second assignment after I joined it was to figure out how to build, store and edit a skill graph. It is quite an intuitive idea: in order to solve quadratic equations, you have to understand square roots, summation, multiplication and so on. These simple skills are prerequisites. You can represent these dependencies using a DAG (directed acyclic graph). There were a lot of professional teachers who built mathematics and Russian language graphs in our team. While the idea seems quite straightforward and intuitive, its implementation is the opposite of simple and clear. It took me half a year to understand how it should work and how to implement it in a production environment.
But what about the article? It gives a nice landscape of how skills emerge during model training on unstructured internet data. The described articles give important understanding of how skills emerge in large language models, how to evaluate them, how to detect phase transitions in a model and how to find internal representations of skills inside the model.
For me, these findings are very important because they are very close to the bridge between model training and human education. I believe that our brain is just a very complex classical computer and that insights can be transferred from human education practice to machine learning and back. After reading the article, I feel much more confident that the skill graph is a real thing, not just a figment of our teachers' imagination.
There is quite a well-shaped method of model evaluation. Probably, it can be transferred to the realities of educational platforms and give us metrics for educational efficiency.
Why should I read all these articles? - I asked myself this week, when I suddenly found this diamond.
To explain why I jumped out of my chair, let me tell you about my previous team. It was Schoolbook. My second assignment after I joined it was to figure out how to build, store and edit a skill graph. It is quite an intuitive idea: in order to solve quadratic equations, you have to understand square roots, summation, multiplication and so on. These simple skills are prerequisites. You can represent these dependencies using a DAG (directed acyclic graph). There were a lot of professional teachers who built mathematics and Russian language graphs in our team. While the idea seems quite straightforward and intuitive, its implementation is the opposite of simple and clear. It took me half a year to understand how it should work and how to implement it in a production environment.
But what about the article? It gives a nice landscape of how skills emerge during model training on unstructured internet data. The described articles give important understanding of how skills emerge in large language models, how to evaluate them, how to detect phase transitions in a model and how to find internal representations of skills inside the model.
For me, these findings are very important because they are very close to the bridge between model training and human education. I believe that our brain is just a very complex classical computer and that insights can be transferred from human education practice to machine learning and back. After reading the article, I feel much more confident that the skill graph is a real thing, not just a figment of our teachers' imagination.
There is quite a well-shaped method of model evaluation. Probably, it can be transferred to the realities of educational platforms and give us metrics for educational efficiency.
π3β€1
This media is not supported in your browser
VIEW IN TELEGRAM
The simplest cacti sort
At the last moment I got a request to write the smallest possible program which sorts cacti in The Farmer Was Replaced game.
Here is my version:
Let's work through this solution:
Here I both set the world size to 8 and introduce a new variable n for the world size.
Let's check an interesting idiom:
I introduce the flag disorder_detected and put this flag up before the loop. This way the loop will be executed at least once. Then, at the beginning of the loop, I put it down. It means that if there are no disorders, the field is ready for harvest. Disorders are: absence of cactus, or cacti in the wrong order.
Navigation is very simple. I have two nested loops. The inner loop moves the drone horizontally from left to right. The outer loop moves it vertically. The map is toroidal, therefore the drone appears on the left after crossing the right boundary; similarly, it appears at the bottom after crossing the upper boundary.
When we start the script, the field is empty. So we plant cactus on empty cell and count it as a disorder. Cactus requires Soil, we ensure it with till().
And finally, the main thing:
If our current cactus is higher than its neighbor to the right or above, we swap them. This way each pass of the drone brings more order into the field, and one day the field becomes sorted. On a sorted field no swaps occur, therefore no disorder is detected and the while loop ends.
The drone stops at the left bottom end of the field, and we can call harvest() function, which picks up all cacti with a big bonus.
At the last moment I got a request to write the smallest possible program which sorts cacti in The Farmer Was Replaced game.
Here is my version:
n = 8
set_world_size(n)
disorder_detected = True
while(disorder_detected):
disorder_detected = False
for y in range(n):
for x in range(n):
if get_entity_type() != Entities.Cactus:
if get_ground_type() != Grounds.Soil:
till()
if plant(Entities.Cactus):
disorder_detected = True
if measure() != None and measure(East) != None and measure() > measure(East):
if swap(East):
disorder_detected = True
if measure() != None and measure(North) != None and measure() > measure(North):
if swap(North):
disorder_detected = True
move(East)
move(North)
harvest()
Let's work through this solution:
n = 8
set_world_size(n)
Here I both set the world size to 8 and introduce a new variable n for the world size.
Let's check an interesting idiom:
disorder_detected = True
while(disorder_detected):
disorder_detected = False
...
if plant(Entities.Cactus):
disorder_detected = True
if swap(East):
disorder_detected = True
I introduce the flag disorder_detected and put this flag up before the loop. This way the loop will be executed at least once. Then, at the beginning of the loop, I put it down. It means that if there are no disorders, the field is ready for harvest. Disorders are: absence of cactus, or cacti in the wrong order.
Navigation is very simple. I have two nested loops. The inner loop moves the drone horizontally from left to right. The outer loop moves it vertically. The map is toroidal, therefore the drone appears on the left after crossing the right boundary; similarly, it appears at the bottom after crossing the upper boundary.
if get_entity_type() != Entities.Cactus:
if get_ground_type() != Grounds.Soil:
till()
if plant(Entities.Cactus):
disorder_detected = True
When we start the script, the field is empty. So we plant cactus on empty cell and count it as a disorder. Cactus requires Soil, we ensure it with till().
And finally, the main thing:
if measure() != None and measure(East) != None and measure() > measure(East):
if swap(East):
disorder_detected = True
if measure() != None and measure(North) != None and measure() > measure(North):
if swap(North):
disorder_detected = True
If our current cactus is higher than its neighbor to the right or above, we swap them. This way each pass of the drone brings more order into the field, and one day the field becomes sorted. On a sorted field no swaps occur, therefore no disorder is detected and the while loop ends.
The drone stops at the left bottom end of the field, and we can call harvest() function, which picks up all cacti with a big bonus.
π2
5. Longest Palindromic Substring
My friend is solving some LeetCode problems now, and I couldn't help but join him.
I know, I know... LeetCode is dead, all this is trash, and we shouldn't solve it. But for now the standards of interviews in Big Tech are settled, and you can, at least, laugh at us, who spend tons of time performing cargo cult rituals.
First of all, it's like a 20-minute problem. In these 20 minutes you have to:
β 0 min
βοΈ read the problem statement
βοΈ come up with a few test cases - it helps both to understand the problem better and to have test cases for the later testing session
βοΈ figure out a working solution for the problem
βοΈ explain your approach to the interviewer, give them T=O(...), M=O(...) estimations
βοΈ get approval from the interviewer to go on with this solution
- 5 min
βοΈ implement your approach
βοΈ test your implementation on your test cases
βοΈ fix it, if necessary
βοΈ check that your T=O(...), M=O(...) are true
- 18 min
βοΈ discuss pros and cons
- 20 min
I'm not a quick thinker, and in order to be able to take part in this competition, I thought a lot and came up with some tricks which I want to share. First of all, how to read the statement and how to approach a solution.
"the longest palindromic substring" - if you, like me, start to think "I read really beautiful algorithms which solve it in linear time, what was that book..." - poof, you have already lost this game. Nice thought. But you have to mention it really quickly and move on. What is the real complexity you want to aim at? Check s.length <= 1000 and think about the naive approach. All substrings - n squared, palindrome check - n => n cubed. Rough estimation for time - 10 secs in Python - doesn't look good.
Here you have to recall at least a glimpse of that wonderful algorithm you read about. There was something... about... middle points of palindromes! Let's check. n middle points, linear check => n squared. 0.01s for n = 1000. Sounds promising.
In the program I want to share a trick: decomposition. Of course, you can try to write a program in one long sheet of text. But if you split it into logical parts, it's better for you. Python gives you a great opportunity - iterators. In the function you can generate candidates and then select the best using just the max function.
In order to use just max, we are to be careful about an empty sequence. It is not a case here, because n >= 1. If an empty sequence is possible, default = ... in the max function is your friend.
For odd-length palindromes, candidate generation is quite boring. We start with the trivial one-letter palindrome and extend it, if it is possible. No problem.
Even-length palindromes have a real chance to be absent at all. One more trick - make a step back and set up variables in the previous position before they form a valid even palindrome. Then we can check whether the variables are still in this weird initial position. If they are, we don't have a candidate.
So, here we are. A few tricks, and the problem is solved.
My friend is solving some LeetCode problems now, and I couldn't help but join him.
I know, I know... LeetCode is dead, all this is trash, and we shouldn't solve it. But for now the standards of interviews in Big Tech are settled, and you can, at least, laugh at us, who spend tons of time performing cargo cult rituals.
First of all, it's like a 20-minute problem. In these 20 minutes you have to:
β 0 min
βοΈ read the problem statement
βοΈ come up with a few test cases - it helps both to understand the problem better and to have test cases for the later testing session
βοΈ figure out a working solution for the problem
βοΈ explain your approach to the interviewer, give them T=O(...), M=O(...) estimations
βοΈ get approval from the interviewer to go on with this solution
- 5 min
βοΈ implement your approach
βοΈ test your implementation on your test cases
βοΈ fix it, if necessary
βοΈ check that your T=O(...), M=O(...) are true
- 18 min
βοΈ discuss pros and cons
- 20 min
I'm not a quick thinker, and in order to be able to take part in this competition, I thought a lot and came up with some tricks which I want to share. First of all, how to read the statement and how to approach a solution.
"the longest palindromic substring" - if you, like me, start to think "I read really beautiful algorithms which solve it in linear time, what was that book..." - poof, you have already lost this game. Nice thought. But you have to mention it really quickly and move on. What is the real complexity you want to aim at? Check s.length <= 1000 and think about the naive approach. All substrings - n squared, palindrome check - n => n cubed. Rough estimation for time - 10 secs in Python - doesn't look good.
Here you have to recall at least a glimpse of that wonderful algorithm you read about. There was something... about... middle points of palindromes! Let's check. n middle points, linear check => n squared. 0.01s for n = 1000. Sounds promising.
In the program I want to share a trick: decomposition. Of course, you can try to write a program in one long sheet of text. But if you split it into logical parts, it's better for you. Python gives you a great opportunity - iterators. In the function you can generate candidates and then select the best using just the max function.
In order to use just max, we are to be careful about an empty sequence. It is not a case here, because n >= 1. If an empty sequence is possible, default = ... in the max function is your friend.
For odd-length palindromes, candidate generation is quite boring. We start with the trivial one-letter palindrome and extend it, if it is possible. No problem.
Even-length palindromes have a real chance to be absent at all. One more trick - make a step back and set up variables in the previous position before they form a valid even palindrome. Then we can check whether the variables are still in this weird initial position. If they are, we don't have a candidate.
So, here we are. A few tricks, and the problem is solved.
πΏ1