Developer's notes

Heavy tails

Recently I remembered the time when I was studying probability theory hard. I’m not going to bring up formulas and vague definitions here. Instead, let’s think about a simple and easily observed physical quantity: the height of an adult man (over 18). Think of your acquaintances, the people you see during your commute, and so on. Of course, I can’t speak for your neighborhood, but in mine men’s heights range from 160 cm to 195 cm, and values near those boundaries are fairly rare. Is it possible to meet a man taller or shorter than that? Surely. But how often? Not that often (I assume you don’t hang out with a basketball team). If you don’t believe that men taller than 195 cm are rare, you can read a bit about how few of them there are.

Hopefully, by this point, everyone has realized that men’s height is a value concentrated mostly within a certain range. Such a value is said to follow the Gaussian, or normal, distribution, which is described by those boring formulas I promised not to bring up here. Briefly, it means the following: if we plot height on the X axis and its share of the population on the Y axis, we get a bell-shaped curve with its peak at the average value, falling off symmetrically on both sides. The picture has already been drawn for us (scroll down to the phrase “Height is normally distributed”).
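To make the bell a bit less abstract, here is a small Python sketch that computes what share of men falls into a height range. It assumes, purely for illustration, that the 160 to 195 cm range from my neighborhood spans the mean plus or minus three standard deviations; the mean and standard deviation below are derived from that assumption, not from real statistics.

```python
import math


def normal_cdf(x: float, mean: float, sd: float) -> float:
    """Cumulative distribution function of the normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def share_between(lo: float, hi: float, mean: float, sd: float) -> float:
    """Share of the population whose value lies in [lo, hi]."""
    return normal_cdf(hi, mean, sd) - normal_cdf(lo, mean, sd)


# Assumption: 160..195 cm is mean +- 3 standard deviations.
MEAN = (160 + 195) / 2   # 177.5 cm
SD = (195 - 160) / 6     # about 5.83 cm

share = share_between(160, 195, MEAN, SD)
taller = 1.0 - normal_cdf(195, MEAN, SD)
print(f"between 160 and 195 cm: {share:.2%}")
print(f"taller than 195 cm:     {taller:.2%}")
```

Under this assumption roughly 99.7% of men land inside the range and only about one in a thousand is taller than 195 cm, which is exactly the “rare but possible” picture described above.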

Are all values normally distributed? No. Let’s consider another very practical value: a person’s income. Very often we can’t know other people’s income, but we can estimate it from their spending. If you think it over for a while, you’ll probably realize that there are far fewer rich people than poor ones. And if you take someone considered rich in your neighborhood, multiply their income by ten and try to find someone earning that much, you’ll immediately see that such people are even rarer: there are only around three thousand billionaires in the whole world. And again, all these facts were summarized as a mathematical law a long time ago. Look at the first picture there: there is a peak at a low value followed by an infinitely long tail stretching to the right. If we apply this law to income, we get the following picture: most people earn several hundred dollars a month, richer people become rarer and rarer, and the tail ends with isolated points representing people from the Forbes list. It would be much funnier to apply the same law to men’s height: most men would be 170 cm tall, yet it would be possible to find one as tall as the Eiffel Tower or the Empire State Building, or even one whose height equals the distance from the Earth to Pluto…
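The “multiply by ten and they get rarer” observation can be checked numerically. The sketch below assumes a Pareto distribution as a representative heavy-tailed law (a classic model for incomes); its parameters `X_MIN` and `ALPHA` are hypothetical, chosen only for illustration. For comparison, it also computes the tail of a normal height distribution with parameters roughly matching the 160 to 195 cm range from the text.

```python
import math


def pareto_survival(x: float, x_min: float, alpha: float) -> float:
    """P(X > x) for a Pareto distribution with scale x_min and shape alpha."""
    return 1.0 if x < x_min else (x_min / x) ** alpha


def normal_survival(x: float, mean: float, sd: float) -> float:
    """P(X > x) for a normal distribution."""
    return 0.5 * math.erfc((x - mean) / (sd * math.sqrt(2.0)))


# Hypothetical parameters, for illustration only.
X_MIN, ALPHA = 300.0, 1.5   # income, dollars per month
MEAN, SD = 177.5, 35 / 6    # height, cm

# Heavy tail: every tenfold increase makes people rarer by the SAME factor (10 ** ALPHA).
for income in (1_000.0, 10_000.0, 100_000.0):
    here = pareto_survival(income, X_MIN, ALPHA)
    ten_times = pareto_survival(10 * income, X_MIN, ALPHA)
    print(f"earning over ${10 * income:,.0f} is {here / ten_times:.1f}x rarer than over ${income:,.0f}")

# Normal tail: just 15 cm further out, people become dramatically rarer.
for height in (180.0, 195.0, 210.0):
    print(f"taller than {height:.0f} cm: {normal_survival(height, MEAN, SD):.1e}")
```

With the heavy tail, being ten times richer is always about 31.6 times rarer, no matter where you start, so the tail never really ends. With the normal law, the probability collapses so fast that a man as tall as the Eiffel Tower is practically impossible.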

Distributions like this are called heavy-tailed. Several years ago I started reading the famous book “The Black Swan”, but soon put it down because I had already learned the statistics.

#flood #education #math #probability_theory #books

Is there a chance?

Let’s continue the section “Probability theory for the little ones”. If you opened and skimmed the link from the previous post, you may have noticed several fairly obvious points, namely: height depends on many factors, and average height differs between regions. People from Montenegro are tall even by European standards, while people from Thailand are short. This means that if we took data from only one region, ideally a single ethnic group, and better still only people born in the same year, we would get the same “bell”, only narrower. By the way, its width is determined by the so-called standard deviation (the square root of the variance). The combined effect of many different factors is exactly why height follows the normal distribution.
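Here is a tiny Python illustration of the standard deviation as the square root of the variance, and of why mixing regions widens the bell. The two groups of heights are made-up numbers, not real data.

```python
import math
import statistics

# Hypothetical heights (cm) of men from two different regions.
region_a = [168, 170, 172, 174, 176]
region_b = [178, 180, 182, 184, 186]
everyone = region_a + region_b

for name, heights in (("region A", region_a), ("region B", region_b), ("both", everyone)):
    variance = statistics.pvariance(heights)  # population variance
    sd = statistics.pstdev(heights)           # standard deviation
    assert math.isclose(sd, math.sqrt(variance))  # sd is the square root of the variance
    print(f"{name:8}: mean={statistics.mean(heights):.1f} cm, "
          f"variance={variance:.1f}, sd={sd:.2f} cm")
```

Each region on its own has a standard deviation of about 2.8 cm, but the mixed population has about 5.7 cm: the difference between the regional averages adds to the spread, so a narrower, more homogeneous sample gives a narrower bell.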

Are there any other values that follow it? Surely! They occur especially often with “real” physical quantities: results of experiments or measurements, or the output of a production line, such as the volume of ice cream or the volume of espresso brewed by a coffee machine with the same settings. If you don’t believe it, take a look at this example. However, if the error (3 × standard deviation) is much smaller than the nominal value, the customer needn’t worry about it. In the linked example, the error lies within -1…+1 grams per 1000 grams.
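Why is the error taken as three standard deviations? About 99.73% of normally distributed values lie within plus or minus three standard deviations of the mean. The Python sketch below applies this “3-sigma rule” to the ±1 gram figure; it assumes the filling line is normally distributed and centered exactly on the nominal 1000 grams.

```python
import math

NOMINAL_G = 1000.0   # nominal weight of one bottle, grams
ERROR_G = 1.0        # error from the text: -1..+1 g per 1000 g
SD_G = ERROR_G / 3   # 3-sigma rule: error = 3 * standard deviation


def share_within(k_sigma: float) -> float:
    """Share of a normal population within mean +- k_sigma standard deviations."""
    return math.erf(k_sigma / math.sqrt(2.0))


outside = 1.0 - share_within(ERROR_G / SD_G)
print(f"standard deviation: {SD_G:.3f} g")
print(f"bottles outside 999..1001 g: {outside:.2%}")
print(f"relative error: {ERROR_G / NOMINAL_G:.1%}")
```

So roughly 3 bottles in 1000 fall outside the declared range, and a relative error of 0.1% is hardly something a customer would notice.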

One might think that the milk filling line is just a bad example and that in other industries everything must be absolutely precise. In fact, it isn’t: such errors are unavoidable, and the best we can do is keep them within limits under certain conditions. Chip manufacturing is a multi-billion-dollar business, yet even there nobody can guarantee that all CPUs produced on the same line have identical characteristics; instead, they are tested and classified afterward.

#flood #education #math #probability_theory

How to sum it up?

It’s the weekend, so I’d like to return to the topic I introduced a week ago, namely our probability theory section. Let’s revisit the milk filling line example: what matters to us is that its error is ±1 gram per kilo. This time, let’s work out what the error will be if we buy 100 one-kilo bottles of milk.

First of all, it’s important to understand that the weight of each particular bottle is random: it is scattered within 999…1001 grams and follows the normal distribution. So if I sum up one hundred random values, I can hardly claim that the result will always be something like 100 kilos + 42 grams (or any other fixed number). The result must be a random value, too.

Now let me translate the task into the terms of probability theory: there are one hundred independent, identically distributed random values, each with a mean of one kilo and a standard deviation of 1/3 gram (from the 3-sigma rule). Here is the tricky part for me: this problem can’t be solved without mathematical equations, but I particularly don’t want to use them in my posts. So I’ll just state that the means and the variances of independent values add up, and, most importantly, that the error depends on the standard deviation, which is the square root of the variance.
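If you’d rather see it than take it on faith, a quick Monte Carlo simulation in Python does the job without any formulas. It assumes independent, normally distributed bottles with a mean of 1000 g and a standard deviation of 1/3 g; the seed and the number of simulated purchases are arbitrary choices.

```python
import random
import statistics

BOTTLES = 100      # bottles in one purchase
MEAN_G = 1000.0    # mean weight of one bottle, grams
SD_G = 1.0 / 3.0   # standard deviation from the 3-sigma rule
TRIALS = 10_000    # number of simulated purchases (arbitrary)

rng = random.Random(42)  # fixed seed so the run is reproducible


def one_purchase() -> float:
    """Total weight of BOTTLES independent, normally distributed bottles."""
    return sum(rng.gauss(MEAN_G, SD_G) for _ in range(BOTTLES))


totals = [one_purchase() for _ in range(TRIALS)]

# Every purchase weighs slightly differently: the sum is itself random.
print("first three totals:", [round(t, 2) for t in totals[:3]])
# Its mean and variance are (close to) the sums of the individual ones.
print(f"mean of totals: {statistics.fmean(totals):.2f} g (expected {BOTTLES * MEAN_G:.0f})")
print(f"variance of totals: {statistics.pvariance(totals):.2f} (expected {BOTTLES * SD_G ** 2:.2f})")
```

The simulated totals differ from purchase to purchase, yet their average sits near 100 000 g and their variance near 100/9 ≈ 11.1, just as adding up the means and variances predicts.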

In numbers, we get the following: the mean of the sum is 100 kilos; the variance for one bottle is the square of the standard deviation, 1/9; thus the variance of the sum is 100/9; and, finally, the standard deviation of the sum is 10/3 gram, so applying the 3-sigma rule gives an error of 10 grams. Note that the relative error for 1 kilo is 1/1000 = 0.1%, while for 100 kilos it’s 10/100 000 = 0.01%. This result isn’t accidental: the total weight grows in proportion to the number of bottles, while the error grows only as the square root of that number.
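The same arithmetic, step by step, as a short Python sketch. It uses exact fractions where possible so that 1/9 and 100/9 show up literally; the names are mine, chosen for illustration.

```python
import math
from fractions import Fraction

BOTTLES = 100
BOTTLE_G = 1000               # grams per bottle
SD_ONE = Fraction(1, 3)       # grams, from the 3-sigma rule

var_one = SD_ONE ** 2                     # 1/9
var_sum = BOTTLES * var_one               # 100/9: variances of independent values add up
sd_sum = math.sqrt(var_sum)               # 10/3, about 3.33 g
error_sum = 3 * sd_sum                    # 3-sigma rule again: 10 g

rel_one = 3 * SD_ONE / BOTTLE_G               # 1/1000 = 0.1%
rel_sum = error_sum / (BOTTLES * BOTTLE_G)    # 10/100000 = 0.01%

print(f"variance of the sum: {var_sum}")
print(f"error of the sum: {error_sum:.1f} g")
print(f"relative error: {float(rel_one):.2%} for one bottle, {rel_sum:.2%} for {BOTTLES} bottles")


def relative_error(n: int) -> float:
    """Relative 3-sigma error of n bottles: grows like sqrt(n) on top, like n below."""
    return 3 * float(SD_ONE) * math.sqrt(n) / (n * BOTTLE_G)


for n in (1, 100, 10_000):
    print(f"{n:>6} bottles: {relative_error(n):.4%}")
```

Each hundredfold increase in the number of bottles cuts the relative error tenfold, which is the square-root effect mentioned above.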

There is more to add here, so to be continued.

#flood #education #math #probability_theory

Is there a limit?

Last time, when translating the task into probability theory terms, I assumed that each separate random value follows the normal distribution, and then I applied the rule that means and variances add up. Actually, there is a more interesting way to solve this task, namely the Central limit theorem (CLT). As you may guess, I won’t burden my readers with a proof of the theorem or with its rigorous mathematical statement. Without further ado, the theorem states: if there are n independent, identically distributed values with a finite mean and variance, then for large n their sum is approximately normally distributed, with the mean and variance obtained by adding up the individual ones, just as in the previous post.
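To see the CLT at work, the Python sketch below deliberately drops normality: it assumes, hypothetically, that each bottle is uniformly distributed between 999 and 1001 grams. It then checks what share of the simulated 100-bottle totals falls within one, two and three standard deviations, and compares that with what the normal distribution predicts. The seed and trial count are arbitrary.

```python
import math
import random

BOTTLES = 100
TRIALS = 10_000
LOW_G, HIGH_G = 999.0, 1001.0   # assumption: each bottle is uniform, NOT normal

rng = random.Random(7)  # fixed seed for reproducibility

# Mean and variance of one uniform bottle: (a + b) / 2 and (b - a)^2 / 12.
mean_one = (LOW_G + HIGH_G) / 2
var_one = (HIGH_G - LOW_G) ** 2 / 12

# Means and variances add up for the sum.
mean_sum = BOTTLES * mean_one
sd_sum = math.sqrt(BOTTLES * var_one)

totals = [sum(rng.uniform(LOW_G, HIGH_G) for _ in range(BOTTLES)) for _ in range(TRIALS)]

for k in (1, 2, 3):
    inside = sum(abs(t - mean_sum) <= k * sd_sum for t in totals) / TRIALS
    normal = math.erf(k / math.sqrt(2.0))  # share the normal distribution predicts
    print(f"within {k} sd: simulated {inside:.3f}, normal {normal:.3f}")
```

Even though no single bottle is normal, the totals land within one, two and three standard deviations in almost exactly the normal proportions (about 68%, 95% and 99.7%).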

Let me put it another way: we get the same result under weaker conditions. We no longer require each separate value to be normally distributed; instead, any distribution with a finite mean and variance will do. That rules out some heavy-tailed distributions, whose variance can be infinite and which may not even have a finite mean. However, having dropped the normality assumption, I’d need to calculate the variance of a single value from the given error without relying on the 3-sigma rule.
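Here is what that calculation might look like, as a Python sketch. It assumes, hypothetically, that each bottle’s weight is uniformly distributed within 1000 ± 1 g, a distribution for which the variance follows directly from the range. The 3-sigma rule is then applied only to the sum, which the CLT tells us is approximately normal.

```python
import math

BOTTLES = 100
ERROR_G = 1.0   # each bottle is within 1000 +- 1 g

# Assumption: weight is uniform on [1000 - ERROR_G, 1000 + ERROR_G].
# Variance of a uniform distribution on [a, b] is (b - a)^2 / 12, here ERROR_G^2 / 3.
var_one = (2 * ERROR_G) ** 2 / 12
sd_one = math.sqrt(var_one)            # about 0.577 g (vs 1/3 g under the 3-sigma rule)

sd_sum = math.sqrt(BOTTLES * var_one)  # variances add up: about 5.77 g
error_sum = 3 * sd_sum                 # the SUM is approximately normal (CLT), so 3-sigma applies

print(f"sd of one bottle: {sd_one:.3f} g")
print(f"sd of the sum:    {sd_sum:.2f} g")
print(f"3-sigma error of the sum: {error_sum:.1f} g")
```

Under this assumption the error of 100 bottles comes out at about 17.3 g rather than 10 g: a uniform distribution puts more weight near the edges of the range than a normal one with the same declared error, so each bottle has a larger variance.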

This short post shows why the normal distribution is so popular and important. I won’t dwell on it any longer; next time I’ll return to heavy-tailed distributions.

#flood #education #math #probability_theory
