Aspiring Data Science – Telegram

Aspiring Data Science

@aspiringdatascience

370 subscribers

An economist's notes on programming, forecasting and decision-making, and the scientific method.
Contact: @fingoldo

I call myself a data scientist because I know just enough math, economics & programming to be dangerous.

Aspiring Data Science

#ml #metrics #brier

As is well known, the Brier score for a binary classifier is essentially the mean squared error between the actual outcomes and the predicted probabilities. In theory it is a number between 0 and 1, where 0 means perfect calibration (of all events predicted with probability 25%, exactly 25% occurred, and so on). I look at this metric a lot at work, because a well-calibrated model matters a great deal, especially when business decisions are made on probabilities. Today I learned something new. I started wondering what one can expect at all, in terms of the Brier score, from a model that predicts probabilities perfectly. To find out, let's craft realizations of a million events that follow probabilities known in advance:

import numpy as np

probs = np.random.uniform(size=1000_000)  # "true" event probabilities
realizations = np.random.uniform(size=len(probs))
realizations = (realizations < probs).astype(np.int8)  # 1 if the event happened

In theory, we now have an array of ones and zeros, realizations, generated by the "true" probabilities probs. If we flip the situation and treat probs as the probabilities predicted by a machine learning model, and realizations as what we actually observed in real life, then accuracy like this should be any ML practitioner's dream!
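A minimal sketch of how the score of such a "perfect" forecaster could be measured on the simulated data. The fixed seed and the use of np.random.default_rng are assumptions added for reproducibility; they are not part of the original snippet.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed: an assumption, for reproducibility only

probs = rng.uniform(size=1_000_000)  # "true" event probabilities
realizations = (rng.uniform(size=probs.size) < probs).astype(np.int8)

# Brier score: mean squared difference between forecast and outcome.
brier_perfect = float(np.mean((probs - realizations) ** 2))

# For a forecast that equals the true probability, the expected score is
# E[p * (1 - p)], which is 1/6 when p is uniform on [0, 1].
expected = float(np.mean(probs * (1 - probs)))

print(f"Brier of the 'perfect' forecaster: {brier_perfect:.4f}")
print(f"E[p(1-p)] on the same sample:      {expected:.4f}")
```

Even a forecaster that knows the true probabilities cannot score 0 here, because the outcomes themselves are random.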

measure of the accuracy of probabilistic predictions

76 views, edited 22:03

Aspiring Data Science

#ml #classificaion #probabilistic #brierscore

Coming back to the recent post about the Brier score, to summarize:

1) Brier = 0 is not reached merely when the probabilities are perfectly calibrated. For "zero" examples the predicted probabilities must be exactly 0, and for "one" examples exactly 1.
2) In a real task, the Brier score of even a very good model will never reach 0.
3) Moreover, every task has its own target distribution, so the minimum and maximum achievable Brier scores DIFFER. For example, for the uniform distribution mentioned above, the Brier score of the ideal model tends to 0.166, of an irrelevant model to 0.333, and of an "anti-model" to 0.5.
4) Things get stranger when the target distribution changes. For the "non-normal", and certainly not uniform, target from the picture in the comments, the Brier score of the ideal model is 0.221, of shuffled examples 0.238, and of DummyClassifier (which always predicts the actual frequency of class 1) 0.230.

That is, the absolute difference in Brier scores can be tiny even though what is being compared is an ideal model and "almost random" guessing.

Conclusion: in each case, estimate the bounds of the Brier score, at least by indirect methods, before deciding on the quality of a model.
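One indirect way to estimate such bounds is to simulate them. The sketch below is only an illustration: the helper names are hypothetical, the "irrelevant" model is taken to be a random shuffle of the true probabilities, and the "anti-model" is taken to be 1 - p. Both readings are assumptions about what the post means.

```python
import numpy as np


def brier(y_true, y_prob):
    """Mean squared error between outcomes and predicted probabilities."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    return float(np.mean((y_prob - y_true) ** 2))


def brier_reference_points(true_probs, seed=0):
    """Simulate outcomes from true_probs and score several reference forecasters."""
    rng = np.random.default_rng(seed)
    true_probs = np.asarray(true_probs, dtype=float)
    y = (rng.uniform(size=true_probs.size) < true_probs).astype(np.int8)
    return {
        "ideal": brier(y, true_probs),                          # knows the true p
        "dummy": brier(y, np.full_like(true_probs, y.mean())),  # constant base rate
        "shuffled": brier(y, rng.permutation(true_probs)),      # p unrelated to y
        "anti": brier(y, 1.0 - true_probs),                     # inverted p
    }


sampler = np.random.default_rng(1)
uniform_points = brier_reference_points(sampler.uniform(size=1_000_000))
# Beta(2, 5) is an arbitrary skewed stand-in, NOT the target from the post.
skewed_points = brier_reference_points(sampler.beta(2.0, 5.0, size=1_000_000))

for name, points in (("uniform", uniform_points), ("beta(2, 5)", skewed_points)):
    print(name, {key: round(value, 3) for key, value in points.items()})
```

Because the skewed distribution here is a stand-in, its numbers will not match the 0.221 / 0.238 / 0.230 quoted above; the point is only that the gap between the ideal and the trivial forecasters depends on the distribution.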

46 views, edited 04:41

Aspiring Data Science

#ml #catboost #metrics #bugs

The morning went by in heated arguments about precision. I found a suspected bug in how CatBoost computes precision.

https://github.com/catboost/catboost/issues/2422

Precision calculation error in Early Stopping. Request to add pos_label. · Issue #2422 · catboost/catboost

Problem: catboost version: 1.2 Operating System: Win CPU: + GPU: + I think that somewhere in the catboost code that computes precision, the predictions and the true values are swapped, so early stopping on prec...
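The sketch below does not reproduce CatBoost's code. It only illustrates, with a hypothetical NumPy helper and made-up labels, why a swap of predictions and true values would matter (precision silently turns into recall) and why the choice of pos_label changes the result.

```python
import numpy as np


def precision(y_true, y_pred, pos_label=1):
    """TP / (TP + FP) for the class given by pos_label (0.0 if nothing is predicted)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    predicted_pos = y_pred == pos_label
    if not predicted_pos.any():
        return 0.0
    return float(np.mean(y_true[predicted_pos] == pos_label))


# Made-up labels: 4 actual positives, 2 predicted positives, 1 true positive.
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 0, 0, 0, 1, 0, 0, 0, 0, 0])

correct = precision(y_true, y_pred)                  # TP / (TP + FP) = 1 / 2
swapped = precision(y_pred, y_true)                  # arguments swapped: TP / (TP + FN) = 1 / 4
negative = precision(y_true, y_pred, pos_label=0)    # precision for class 0 = 5 / 8

print(correct, swapped, negative)
```

With the arguments swapped, the helper returns the recall of the original problem, so a metric that looks like precision would drive early stopping toward a different optimum.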

121 views, Anatoly Alekseev, edited 11:11

Aspiring Data Science

#optimization #ml #metrics #python #numba #codegems

In short, the sklearn metrics turned out to be too slow, so I had to rewrite them in numba. Here is an example of classification_report that runs a thousand times faster and supports almost all of the functionality (except sample weights and micro averaging). I also optimized the auc metric (algorithm taken from fastauc) and a calibration metric (I compute binned predicted vs. actual values, then the MAE/std of their differences). On 8M samples everything runs in ~30 milliseconds, except auc, which takes ~300 ms. For comparison, the scikit-learn versions take from several seconds to several tens of seconds.

import numpy as np
from numba import njit


@njit()
def fast_classification_report(
    y_true: np.ndarray, y_pred: np.ndarray, nclasses: int = 2, zero_division: int = 0
):
    """Custom classification report, proof of concept.

    y_true and y_pred must hold integer class indices in [0, nclasses).
    """

    N_AVG_ARRAYS = 3  # precisions, recalls, f1s

    # storage inits
    weighted_averages = np.empty(N_AVG_ARRAYS, dtype=np.float64)
    macro_averages = np.empty(N_AVG_ARRAYS, dtype=np.float64)
    supports = np.zeros(nclasses, dtype=np.int64)  # true samples per class
    allpreds = np.zeros(nclasses, dtype=np.int64)  # predictions per class
    misses = np.zeros(nclasses, dtype=np.int64)  # wrong predictions per predicted class
    hits = np.zeros(nclasses, dtype=np.int64)  # correct predictions per class

    # count stats in a single pass
    for true_class, predicted_class in zip(y_true, y_pred):
        supports[true_class] += 1
        allpreds[predicted_class] += 1
        if predicted_class == true_class:
            hits[predicted_class] += 1
        else:
            misses[predicted_class] += 1

    # main calcs
    accuracy = hits.sum() / len(y_true)
    balanced_accuracy = np.nan_to_num(hits / supports, copy=True, nan=zero_division).mean()

    recalls = hits / supports
    precisions = hits / allpreds
    f1s = 2 * (precisions * recalls) / (precisions + recalls)

    # replace NaNs (from 0/0) with zero_division, in place, & compute averages
    i = 0
    for arr in (precisions, recalls, f1s):
        np.nan_to_num(arr, copy=False, nan=zero_division)
        weighted_averages[i] = (arr * supports).sum() / len(y_true)
        macro_averages[i] = arr.mean()
        i += 1

    return hits, misses, accuracy, balanced_accuracy, supports, precisions, recalls, f1s, macro_averages, weighted_averages
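The calibration metric is only described in words in the post above (binned predicted vs. actual values, then the MAE/std of their differences). The following plain-NumPy sketch is one possible reading of that description, not the author's code: the function name, the ten equal-width bins and the unweighted averaging over non-empty bins are all assumptions.

```python
import numpy as np


def calibration_mae_std(y_true, y_prob, n_bins=10):
    """MAE and std of (observed frequency - mean predicted probability) over bins."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)

    # Equal-width bins on [0, 1]; a probability of exactly 1.0 goes to the last bin.
    bin_idx = np.minimum((y_prob * n_bins).astype(np.int64), n_bins - 1)

    counts = np.bincount(bin_idx, minlength=n_bins)
    prob_sums = np.bincount(bin_idx, weights=y_prob, minlength=n_bins)
    hit_sums = np.bincount(bin_idx, weights=y_true, minlength=n_bins)

    filled = counts > 0  # skip empty bins to avoid 0/0
    diffs = hit_sums[filled] / counts[filled] - prob_sums[filled] / counts[filled]
    return float(np.mean(np.abs(diffs))), float(np.std(diffs))


rng = np.random.default_rng(0)
p = rng.uniform(size=200_000)
y = (rng.uniform(size=p.size) < p).astype(np.int8)

mae_good, std_good = calibration_mae_std(y, p)       # calibrated forecast
mae_bad, std_bad = calibration_mae_std(y, p ** 2)    # systematically too low

print(f"calibrated:    MAE={mae_good:.4f}, std={std_good:.4f}")
print(f"miscalibrated: MAE={mae_bad:.4f}, std={std_bad:.4f}")
```

The same single-pass counting could be moved under @njit in the style of the report above; that variant is not shown here.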

160 views, Anatoly Alekseev, edited 19:00

Aspiring Data Science

#sklearn #metrics #optimization #numba

A discussion has started on the sklearn GitHub about whether fast metrics, or even the use of Numba in sklearn, are needed. Perhaps you have an opinion too?

Speed up classification_report · Issue #26808 · scikit-learn/scikit-learn

Describe the workflow you want to enable I'm concerned with slow execution speed of the classification_report procedure which makes it barely suitable for production-grade workloads. On a 8M sa...

❤‍🔥3

116 views, Anatoly Alekseev, edited 18:41

Aspiring Data Science

#ml #mlops #mlflow #me #metrics #multimodel

This talk really resonated with me. I am currently building exactly this kind of system, with multiple metrics and several models of different classes. I am even adding ensembles right away. ME (Maximum Error) as a mandatory regression metric seems very useful; I had never heard of it before. For my part, I would add something calibration-related to the mandatory classification metrics: for example, MAE/std over the bins of the calibration curve.
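A small illustration of why Maximum Error can complement an averaged metric. The numbers are made up, and the helper is a plain-NumPy sketch (scikit-learn also ships sklearn.metrics.max_error).

```python
import numpy as np


def max_error(y_true, y_pred):
    """Largest absolute residual: the single worst prediction."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.max(np.abs(y_true - y_pred)))


# Made-up data: four good predictions and one badly missed outlier.
y_true = np.array([10.0, 12.0, 11.0, 13.0, 50.0])
y_pred = np.array([10.5, 11.5, 11.0, 12.5, 20.0])

mae = float(np.mean(np.abs(y_true - y_pred)))
me = max_error(y_true, y_pred)

print(f"MAE = {mae:.1f}, ME = {me:.1f}")
```

The average hides the fact that one prediction is off by 30 units; ME reports exactly that worst case.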

https://www.youtube.com/watch?v=VJWrSTAlxEs

Andrey Zubkov - What life with ML in production is no fun without

Data Fest 2023:
https://ods.ai/events/datafestonline2023
Track "MLOps":
https://ods.ai/tracks/df23-mlops

Our social media:
Telegram: https://t.me/datafest
VK: https://vk.com/datafest

191 views, Anatoly Alekseev, edited 21:53

Aspiring Data Science

#metrics #mse

Why is MSE so popular? The reasons are mostly based on theoretical properties, although there are a few properties that have value in some situations. Here are some of the main advantages of MSE as a measure of the performance of a model:

• It is fast and easy to compute.
• It is continuous and differentiable in most applications. Thus, it will be well behaved for most optimization algorithms.
• It is very intuitive in that it is simply an average of errors. Moreover, the squaring causes large errors to have a larger impact than small errors, which is good in many situations.
• Under commonly reasonable conditions (the most important being that the distribution is normal or a member of a related family), parameter estimates computed by minimizing MSE also have the desirable statistical property of being maximum likelihood estimates. This loosely means that of all possible parameter values, the one computed is the most likely to be correct.

We see that MSE satisfies the theoretical statisticians who design models, it satisfies the numerical analysts who design the training algorithms, and it satisfies the intuition of the users. All of the bases are covered.
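The maximum likelihood point can be checked numerically. The sketch below assumes a one-parameter linear model with Gaussian noise of known standard deviation (all values are made up for illustration): on a grid of candidate slopes, the slope that minimizes MSE is also the one that minimizes the Gaussian negative log-likelihood, because the latter is just a constant plus a positive multiple of the former.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.3                                    # noise level, assumed known
x = rng.uniform(-1.0, 1.0, size=500)
y = 2.0 * x + rng.normal(scale=sigma, size=x.size)   # true slope is 2.0

slopes = np.linspace(0.0, 4.0, 401)            # candidate parameter values
residuals = y[None, :] - slopes[:, None] * x[None, :]

# Criterion 1: mean squared error for every candidate slope.
mse = np.mean(residuals ** 2, axis=1)

# Criterion 2: Gaussian negative log-likelihood for every candidate slope.
nll = np.sum(
    0.5 * np.log(2.0 * np.pi * sigma ** 2) + residuals ** 2 / (2.0 * sigma ** 2),
    axis=1,
)

best_by_mse = float(slopes[np.argmin(mse)])
best_by_nll = float(slopes[np.argmin(nll)])

print(f"argmin MSE = {best_by_mse:.2f}, argmin NLL = {best_by_nll:.2f}")
```

With a different noise family (for example heavy-tailed errors) the two criteria would no longer coincide, which is the caveat hidden in "commonly reasonable conditions".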

123 views, Anatoly Alekseev, edited 17:21

Aspiring Data Science

#ml #metrics #masters

It has been seen that good average or total performance in the training set is not the only important optimization consideration. Consistent performance is also important.
It encourages good performance outside the training set, and it provides stability as models are evolved by selective updating. An often easy way to encourage consistency is to stratify the training set, compute for each stratum a traditional optimization criterion like one of those previously discussed, and then let the final optimization criterion be a function of the values for the strata.

The power of stratification can sometimes be astounding. It is surprisingly easy to train a model that seems quite good on a training set only to discover later that its performance is spotty. It may be that the good average training performance was based on spectacular performance in part of the training set and mediocre performance in the rest. When the conditions that engender mediocrity appear in real life and the model fails to perform up to expectations, it is only then that the researcher may study its historical performance more closely and discover the problem. It is always better to discover this sort of problem early in the design process.
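A minimal sketch of a stratified criterion of this kind. The quoted text does not say which traditional criterion or which combining function to use; per-stratum MSE combined by taking the worst stratum is an assumption made here for illustration, and the function name is hypothetical.

```python
import numpy as np


def stratified_criterion(y_true, y_pred, strata, aggregate=np.max):
    """Compute MSE per stratum, then combine the per-stratum values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    strata = np.asarray(strata)

    per_stratum = {}
    for label in np.unique(strata):
        mask = strata == label
        per_stratum[label.item()] = float(np.mean((y_true[mask] - y_pred[mask]) ** 2))

    return float(aggregate(list(per_stratum.values()))), per_stratum


# Made-up example: excellent on stratum 0 (8 samples), poor on stratum 1 (2 samples).
y_true = np.zeros(10)
y_pred = np.array([0.1] * 8 + [2.0] * 2)
strata = np.array([0] * 8 + [1] * 2)

overall_mse = float(np.mean((y_true - y_pred) ** 2))
worst_case, per_stratum = stratified_criterion(y_true, y_pred, strata)

print(f"overall MSE = {overall_mse:.3f}")
print(f"per-stratum MSE = {per_stratum}, worst stratum = {worst_case:.3f}")
```

Passing aggregate=np.mean instead weights every stratum equally regardless of its size, which is a milder way to encourage consistency.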

121 views, Anatoly Alekseev, 10:01

Aspiring Data Science

#ml #metrics #regression #masters

For regression tasks, Masters advises reserving, in addition to the test set, a separate confidence set that is representative of the general population, on which to compute confidence intervals for the errors. If the error distribution is not normal, plain quantiles can be used. In addition, he computes the probabilities that the confidence intervals themselves will not be violated, using the incomplete beta distribution.
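The sketch below is one possible reading of this recipe, not Masters' code. It takes the k-th smallest absolute error on a simulated confidence set as an upper error bound, then asks how likely it is that the bound's true coverage is below a chosen level q. That probability is the regularized incomplete beta function I_q(k, n-k+1), computed here through its exact binomial-tail form so only the standard library and NumPy are needed. The function name, the Student-t errors and the numbers 1000 / 950 / 0.93 are all illustrative assumptions.

```python
import math

import numpy as np


def prob_coverage_below(n, k, q):
    """P(true coverage of the k-th order statistic of n samples < q).

    Equals I_q(k, n - k + 1), i.e. P(Binomial(n, q) >= k).
    """
    if q <= 0.0:
        return 0.0
    if q >= 1.0:
        return 1.0
    total = 0.0
    for j in range(k, n + 1):
        log_term = (
            math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
            + j * math.log(q) + (n - j) * math.log1p(-q)
        )
        total += math.exp(log_term)
    return min(total, 1.0)


rng = np.random.default_rng(0)
# Simulated heavy-tailed (non-normal) errors on a confidence set of 1000 cases.
abs_errors = np.sort(np.abs(rng.standard_t(df=3, size=1000)))

n, k = abs_errors.size, 950
upper_bound = float(abs_errors[k - 1])        # empirical 95% bound on |error|
p_fail = prob_coverage_below(n, k, 0.93)      # chance its true coverage is < 93%

print(f"|error| <= {upper_bound:.3f} for {k}/{n} cases")
print(f"P(true coverage < 0.93) = {p_fail:.4f}")
```

The bound itself needs no normality assumption; only the independence of the confidence-set cases is assumed.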

132 views, Anatoly Alekseev, 16:45

Aspiring Data Science

#trading #backtesting #metrics

https://www.youtube.com/watch?v=3WXHFjQFGYs

4 - Performance Metrics | Quant Trading in Futures

How do we evaluate a trading strategy?
What metrics measure risk-adjusted returns in terms of volatility?
What metrics measure risk-adjusted returns in terms of worst-case-scenario loss?
What metrics measure risk-adjusted returns in terms of correlation to…

129 views, Anatoly Alekseev, 13:09

Aspiring Data Science

#recommenders #metrics

It turns out that whether I will give a movie a 5 after watching it and whether I want to watch that movie tonight are not the same thing )

https://www.youtube.com/watch?v=DAdnbffMkcE

What is good and what is bad: metrics for recommender systems / Irina Pchelintseva (Yandex)

With the support of AvitoTech, we are publishing all the videos from UseData Conf 2019 in open access for the first time. Learn, get inspired and adopt best practices from the speakers without leaving home.

Conference calendar - https://ontico.ru
--------
UseDataConf 2019

Abstracts…

115 views, Anatoly Alekseev, 07:52

Aspiring Data Science

#pit #calibration #metrics

Somehow I completely missed this PIT idea. I use scatter plots all the time; I should check out PIT plots too.

The Probability Integral Transform (PIT) and binned reliability diagrams (e.g., plotting binned probabilities vs. real hit frequencies) are both tools for evaluating the calibration of probabilistic predictions, but they have distinct advantages and limitations:

Advantages of PIT over Binned Probabilities:

• Continuous assessment: PIT uses the entire predicted distribution for each observation, providing a continuous view of calibration rather than relying on discretized bins. This avoids issues with arbitrarily choosing bin edges or having too few samples per bin, which can bias binned reliability diagrams.
• Higher resolution: PIT evaluates the full shape of the calibration, capturing subtle patterns in miscalibration that might be lost in coarse binning.
• Better for continuous variables: PIT is particularly advantageous for continuous outcomes (e.g., temperature, stock prices) where using bins can be challenging or lead to overly smoothed results.
• Works naturally for CDF predictions: If your model directly predicts cumulative probabilities (e.g., quantile regression or distributional models), PIT aligns naturally with this representation. Binned probabilities may not integrate smoothly with these types of predictions.
• Uniform distribution diagnostic: PIT values being uniformly distributed under perfect calibration provide a statistically robust test of calibration, allowing for formal hypothesis testing (e.g., Kolmogorov-Smirnov test or histogram-based goodness-of-fit tests).

Advantages of Binned Probabilities:

• Intuitive visualization: Binned reliability diagrams are easier for non-experts to understand, as they directly show how predicted probabilities correspond to observed frequencies.
• Focused on predicted probabilities: These diagrams emphasize the calibration of specific probability ranges (e.g., "Does a predicted 70% chance event happen 70% of the time?"), which is useful for discrete probabilistic predictions like classification.
• Handles classification tasks well: For binary classification tasks, binned probabilities are more direct and interpretable, especially when dealing with predicted probabilities rather than full distributions.

PS. I tried these PIT diagrams; for classifiers they do not work at all (
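For a continuous target, the PIT check described above can be sketched as follows. Everything here is an illustrative assumption: the forecasts are Gaussian, the data are simulated, the helper names are hypothetical, and the Kolmogorov-Smirnov distance from the uniform distribution is computed by hand rather than with a statistics library.

```python
import math

import numpy as np


def normal_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2) evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def pit_values(y, mu, sigma):
    """PIT: each observation passed through its own predicted CDF."""
    return np.array([normal_cdf(yi, mi, si) for yi, mi, si in zip(y, mu, sigma)])


def ks_distance_from_uniform(u):
    """Largest gap between the empirical CDF of u and the U(0, 1) CDF."""
    u = np.sort(u)
    n = u.size
    upper = np.arange(1, n + 1) / n - u
    lower = u - np.arange(0, n) / n
    return float(max(upper.max(), lower.max()))


rng = np.random.default_rng(0)
n = 20_000
mu = rng.normal(size=n)                       # predicted means
y = mu + rng.normal(scale=1.0, size=n)        # outcomes with true noise sd = 1

ks_calibrated = ks_distance_from_uniform(pit_values(y, mu, np.full(n, 1.0)))
ks_overconfident = ks_distance_from_uniform(pit_values(y, mu, np.full(n, 0.5)))

print(f"KS distance, calibrated forecast:    {ks_calibrated:.3f}")
print(f"KS distance, overconfident forecast: {ks_overconfident:.3f}")
```

For a binary outcome the predictive CDF is a step function, so the PIT values are not continuous unless they are randomized, which may be part of why the plots were of little use for classifiers.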

https://medium.com/@maltetichy/demystifying-the-probability-integral-transform-77b7de3a3af9

Demystifying the Probability Integral Transform

The Probability Integral Transform formalizes an intuitive and comprehensible approach to validating probabilistic predictions.

121 views, Anatoly Alekseev, edited 06:01

Aspiring Data Science

#metrics

https://www.youtube.com/watch?v=PeYQIyOyKB8

Maria Khalusova: Machine Learning Model Evaluation Metrics | PyData LA 2019

www.pydata.org

PyData is an educational program of NumFOCUS, a 501(c)3 non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each…

97 views, Anatoly Alekseev, 15:15

Aspiring Data Science

Forwarded from Artem Ryblov’s Data Science Weekly

The Kaggle Book by Konrad Banachewicz and Luca Massaron

Millions of data enthusiasts from around the world compete on Kaggle, the most famous data science competition platform of them all. Participating in Kaggle competitions is a surefire way to improve your data analysis skills, network with an amazing community of data scientists, and gain valuable experience to help grow your career.

The first book of its kind, The Kaggle Book assembles in one place the techniques and skills you'll need for success in competitions, data science projects, and beyond. Two Kaggle Grandmasters walk you through modeling strategies you won't easily find elsewhere, and the knowledge they've accumulated along the way. As well as Kaggle-specific tips, you'll learn more general techniques for approaching tasks based on image, tabular, textual data, and reinforcement learning. You'll design better validation schemes and work more comfortably with different evaluation metrics.

Whether you want to climb the ranks of Kaggle, build some more data science skills, or improve the accuracy of your existing models, this book is for you.

Link: Book

Navigational hashtags: #armknowledgesharing #armbooks
General hashtags: #ml #machinelearning #featureengineering #kaggle #metrics #validation #hyperparameters #tabular #cv #nlp

@data_science_weekly

98 views, Anatoly Alekseev, 14:36

Aspiring Data Science

#trading #metrics

https://medium.datadriveninvestor.com/performance-measures-for-quantitative-portfolio-and-strategy-evaluation-with-python-implementations-608e6b0c61b8

Performance Measures for Quantitative Portfolio and Strategy Evaluation with Python Implementations

A comprehensive list of most used metrics to evaluate portfolios and strategies’ performance with implementations in Python.

155 views, Anatoly Alekseev, 06:51