Aspiring Data Science
318 subscribers
386 photos
10 videos
6 files
1.41K links
Заметки экономиста о программировании, прогнозировании и принятии решений, научном методе познания.
Контакт: @fingoldo

I call myself a data scientist because I know just enough math, economics & programming to be dangerous.
Download Telegram
#statistics #fitter

Как компактно описать 1d массив данных неизвестной природы в разумное время? Авторы fitter-а упоминают 80+ распределений из scipy (а по факту фиттятся уже 109). Это занимает даже на скромном массиве длины 1_000 уже 2.5 минуты. При этом многие распределения ну явно близняшки. На добрую половину наверняка не стоило тратить время. Как бы соптимайзить? Найти что-то вроде "опорных распределений"... Best bang for your buck.

get_common_distributions() даёт [ "cauchy", "chi2", "expon", "exponpow", "gamma", "lognorm", "norm", "powerlaw", "rayleigh", "uniform"], но по какому принципу они выбраны?

Хотелось бы получить о данных наиболее полную картину, используя как можно меньше распределений. К примеру, что, если входные данные - это микс 3 распределений?

https://github.com/cokelaer/fitter/issues/81
Forwarded from Artem Ryblov’s Data Science Weekly (Artem Ryblov)
Mindful Modeler by Christoph Molnar

The newsletter combines the best of two worlds: the performance mindset of machine learning and the mindfulness of statistical thinking.

Machine learning has become mainstream while falling short in the silliest ways: lack of interpretability, biased and missing data, wrong conclusions, … To statisticians, these shortcomings are often unsurprising. Statisticians are relentless in their quest to understand how the data came about. They make sure that their models reflect the data-generating process and interpret models accordingly.
In a sea of people who basically know how to model.fit() and model.predict() you can stand out by bringing statistical thinking to the arena.
Sign up for this newsletter to combine performance-driven machine learning with statistical thinking. Become a mindful modeller.

You'll learn about:
- Thinking like a statistician while performing like a machine learner
- Spotting non-obvious data problems
- Interpretable machine learning
- Other modelling mindsets such as causal inference and prompt engineering

Link
https://mindfulmodeler.substack.com/

Navigational hashtags: #armknowledgesharing #armnewsletters
General hashtags: #modelling #modeling #ml #machinelearning #statistics #modelinterpretation #data #interpretability #casualinference

@accelerated_learning
#chess #statistics #simulation #cheating #visualization #stockfish

Красивые визуализации шахмат от Дориана Квелле. Интересная метрика average loss after novelty, видно, что он сам придумал, классная идея.

"Yet, the standout performer is David Howell, with an average score of 0.9071. Despite his tendency to withdraw or join late to tournaments, this strategy seems to benefit him as he avoids facing high-performing players in later rounds. With a record of 120 wins, 14 draws, and only 6 losses, Howell clearly dominates the field."

"Yet, the left plot exposes a glaring inconsistency in Kramnik’s performance. Despite his high accuracy in playing engine-recommended moves, he registers a higher average loss per move than one would expect for a player of his Elo rating. This suggests that while Kramnik plays the engine move in most cases, he is also prone to significant blunders. Playing the fools mate because you’re suspecting a player of cheating doesn’t help either."

https://dorianquelle.github.io/blog/Cheating-In-Titled-Tuesday/
Forwarded from Artem Ryblov’s Data Science Weekly (Artem Ryblov)
Thinking Clearly with Data: A Guide to Quantitative Reasoning and Analysis by Ethan Bueno de Mesquita, Anthony Fowler

An introduction to data science or statistics shouldn’t involve proving complex theorems or memorizing obscure terms and formulas, but that is exactly what most introductory quantitative textbooks emphasize. In contrast, Thinking Clearly with Data focuses, first and foremost, on critical thinking and conceptual understanding in order to teach students how to be better consumers and analysts of the kinds of quantitative information and arguments that they will encounter throughout their lives.

Among much else, the book teaches how to assess whether an observed relationship in data reflects a genuine relationship in the world and, if so, whether it is causal; how to make the most informative comparisons for answering questions; what questions to ask others who are making arguments using quantitative evidence; which statistics are particularly informative or misleading; how quantitative evidence should and shouldn’t influence decision-making; and how to make better decisions by using moral values as well as data.

- An ideal textbook for introductory quantitative methods courses in data science, statistics, political science, economics, psychology, sociology, public policy, and other fields
- Introduces the basic toolkit of data analysis―including sampling, hypothesis testing, Bayesian inference, regression, experiments, instrumental variables, differences in differences, and regression discontinuity
- Uses real-world examples and data from a wide variety of subjects
- Includes practice questions and data exercises

Link: https://www.amazon.com/Thinking-Clearly-Data-Quantitative-Reasoning/dp/0691214352

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #datascience #correlation #regression #causation #randomizedexperiments #statistics

@data_science_links
Forwarded from asisakov
Please open Telegram to view this post
VIEW IN TELEGRAM