#ml #modelling #evaluation #masters
How can we estimate the probability that a trained model achieved its results through genuine skill rather than random coincidence?
Masters suggests retraining the model N times, shuffling the target each time, and counting the fraction of runs in which the metric of the "shuffled" model beats the original. Something like Boruta, but for the target rather than the predictors.
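A minimal sketch of this procedure, assuming a scikit-learn-style estimator and a "higher is better" metric (the helper name and signature are illustrative, not from Masters):

```python
import numpy as np
from sklearn.base import clone

def permutation_p_value(model, X, y, metric, n_permutations=100, seed=0):
    """Probability that a worthless model would score as well as the real
    one on the training set, estimated by refitting on shuffled targets."""
    rng = np.random.default_rng(seed)
    original_score = metric(y, clone(model).fit(X, y).predict(X))
    n_better = 0
    for _ in range(n_permutations):
        y_perm = rng.permutation(y)  # destroy the predictor-target link
        permuted = clone(model).fit(X, y_perm)
        if metric(y_perm, permuted.predict(X)) >= original_score:
            n_better += 1
    # Count the unpermuted run as one trial so p can never be exactly zero.
    return (n_better + 1) / (n_permutations + 1)
```

With, say, 100 permutations, only a handful of shuffled runs should beat the original before the model deserves any trust.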
Moreover, he decomposes the benefit of using a model into three factors: generalization ability, inherent bias of the data, and bias introduced by training:
TrainingGain = Ability + InherentBias + TrainingBias
Since the "shuffled" models are given no authentic predictor-target relationships, the first component vanishes for them:
PermutedGain = InherentBias + TrainingBias
Accordingly, by comparing the metrics of the original and permuted models, we can already estimate, on the training set alone, the true generalization ability and the contribution of noise to the result.
The inherent bias can often be obtained analytically; in financial problems, for example, it may be the return of buy & hold in a rising market. Then we can estimate:
TrainingBias = PermutedGain - InherentBias
"It tells us how much the process of training optimistically inflates the gain. If the TrainingBias is large, this constitutes evidence that the model may be too powerful relative to the amount of noise in the data. This information can be particularly handy early in model development, when it is used to compare some competing modeling methodologies to judge whether additional data is required.
Remember that permutation can also be used with cross validation, walk-forward testing, and most other out-of-sample evaluation methods in order to compute the probability that good observed out-of-sample results could have arisen from a worthless model. The principle is exactly the same, although of course it makes no sense to eliminate training bias from out-of-sample results."
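A sketch of the bookkeeping, assuming the permuted-run gains have already been collected (e.g. from a loop like the one above) and the inherent bias is known analytically; decompose_gain is a hypothetical helper, not Masters' code:

```python
import numpy as np

def decompose_gain(original_gain, permuted_gains, inherent_bias):
    """Hypothetical helper: split training-set gain into Masters' three parts."""
    training_bias = float(np.mean(permuted_gains)) - inherent_bias
    ability = original_gain - inherent_bias - training_bias
    return {"ability": ability,
            "inherent_bias": inherent_bias,
            "training_bias": training_bias}
```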
A sample computation with a weak classifier:
"Called fraud 141 of 100000 (0.14 percent)
Actual fraud 0.94 percent
p = 0.57600
Original gain = 0.55370 with original inherent bias = 0.52818
Mean permuted gain = 0.56517
Mean permuted inherent bias = 0.50323
Training bias = 0.06194 (0.56517 minus 0.50323)
Unbiased actual gain = 0.49176 (0.55370 minus 0.06194)
Unbiased gain above inherent bias = -0.01147 (0.49176 minus 0.50323)
In this no-power example, the number of cases called fraud (0.14 percent) is quite different from the actual number (0.94 percent). Such variation is common when there is little predictive power. The most important number in this table is the p-value, the probability that a worthless model could have performed as well or better just by luck. This p-value is 0.576, a strong indication that the model could be worthless. You really want to see a tiny p-value, ideally 0.01 or less, in order to have solid confidence in the model.
The per-case gain of the unpermuted model is 0.55370, which may seem decent until you notice that 0.52818 of it is inherent bias. So even after optimization of performance, it barely beats what a coin-toss model would achieve. Things get even worse when you see that the mean gain for permuted data is 0.56517, better than the original model!"
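For concreteness, the arithmetic behind this table can be replayed directly from the quoted numbers (note that the final line compares against the mean permuted inherent bias, exactly as in the table):

```python
original_gain = 0.55370
mean_permuted_gain = 0.56517
mean_permuted_inherent_bias = 0.50323

training_bias = mean_permuted_gain - mean_permuted_inherent_bias  # 0.06194
unbiased_gain = original_gain - training_bias                     # 0.49176
gain_above_bias = unbiased_gain - mean_permuted_inherent_bias     # -0.01147: no real edge
```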
Forwarded from Artem Ryblov’s Data Science Weekly (Artem Ryblov)
Mindful Modeler by Christoph Molnar
The newsletter combines the best of two worlds: the performance mindset of machine learning and the mindfulness of statistical thinking.
Machine learning has become mainstream while falling short in the silliest ways: lack of interpretability, biased and missing data, wrong conclusions, … To statisticians, these shortcomings are often unsurprising. Statisticians are relentless in their quest to understand how the data came about. They make sure that their models reflect the data-generating process and interpret models accordingly.
In a sea of people who basically know how to model.fit() and model.predict(), you can stand out by bringing statistical thinking to the arena.
Sign up for this newsletter to combine performance-driven machine learning with statistical thinking. Become a mindful modeller.
You'll learn about:
- Thinking like a statistician while performing like a machine learner
- Spotting non-obvious data problems
- Interpretable machine learning
- Other modelling mindsets such as causal inference and prompt engineering
Link
https://mindfulmodeler.substack.com/
Navigational hashtags: #armknowledgesharing #armnewsletters
General hashtags: #modelling #modeling #ml #machinelearning #statistics #modelinterpretation #data #interpretability #causalinference
@accelerated_learning