Forwarded from Artem Ryblov’s Data Science Weekly (Artem Ryblov)
Model Evaluation, Model Selection, and Algorithm Selection in Machine Learning by Sebastian Raschka
The correct use of model evaluation, model selection, and algorithm selection techniques is vital in academic machine learning research as well as in many industrial settings.
This article reviews different techniques that can be used for each of these three subtasks and discusses the main advantages and disadvantages of each technique with references to theoretical and empirical studies. Further, recommendations are given to encourage best yet feasible practices in research and applications of machine learning.
Link
https://arxiv.org/abs/1811.12808
Navigational hashtags: #armknowledgesharing #armarticles
General hashtags: #machinelearning #ml #modelevaluation #evaluation #selection #cv #crossvalidation
@accelerated_learning
#ml #modelling #evaluation #masters
How can one estimate the probability that a trained model achieved its results by nothing more than chance?
Masters proposes refitting the model N times, shuffling the target each time, and counting the fraction of runs in which the "shuffled" model's metric beats the original. Something like Boruta, but applied to the target rather than to the predictors.
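A minimal sketch of this target-permutation test, assuming a generic scikit-learn-style estimator and a "larger is better" metric (the helper name permutation_pvalue is illustrative, not from the book):

```python
import numpy as np
from sklearn.base import clone

def permutation_pvalue(model, X, y, metric, n_permutations=100, seed=0):
    """Fraction of target-shuffled refits whose training-set score is at least as good as the original fit."""
    rng = np.random.default_rng(seed)
    original = metric(y, clone(model).fit(X, y).predict(X))      # in-sample "training gain"
    permuted = []
    for _ in range(n_permutations):
        y_perm = rng.permutation(y)                              # break the predictor-target link
        permuted.append(metric(y_perm, clone(model).fit(X, y_perm).predict(X)))
    permuted = np.array(permuted)
    # conservative permutation p-value: chance a worthless model does at least this well
    p_value = (1 + (permuted >= original).sum()) / (n_permutations + 1)
    return p_value, original, permuted
```

A small p-value (ideally 0.01 or less, as the quoted example below notes) suggests the training-set gain is unlikely to arise once the predictor-target link is broken.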
Moreover, he decomposes the gain from using a model into three components: genuine predictive ability, inherent (data) bias, and training (model) bias:
TrainingGain = Ability + InherentBias + TrainingBias

Since the "shuffled" models never see an authentic relationship between predictors and target, the first component vanishes for them:
PermutedGain = InherentBias + TrainingBias

Accordingly, by comparing the metrics of the original and permuted models, the true generalization ability and the contribution of noise to the result can be estimated from the training set alone.
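Continuing the sketch above on toy data (make_classification, LogisticRegression and accuracy as the gain metric are illustrative choices, not Masters' setup), the ability estimate is simply the gap between the original gain and the mean permuted gain:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# toy data and model, purely for illustration
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
p_value, original_gain, permuted_gains = permutation_pvalue(
    LogisticRegression(max_iter=1000), X, y, accuracy_score, n_permutations=200)

# TrainingGain = Ability + InherentBias + TrainingBias
# PermutedGain = InherentBias + TrainingBias      (no authentic predictor-target link)
ability_estimate = original_gain - permuted_gains.mean()   # estimated from the train set alone
print(f"p-value: {p_value:.3f}, ability estimate: {ability_estimate:.3f}")
```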
Often the inherent bias can be derived analytically; in financial applications, for example, it may be the return of buy & hold in a rising market. One can then estimate

TrainingBias = PermutedGain - InherentBias

"It tells us how much the process of training optimistically inflates the gain. If the TrainingBias is large, this constitutes evidence that the model may be too powerful relative to the amount of noise in the data. This information can be particularly handy early in model development, when it is used to compare some competing modeling methodologies to judge whether additional data is required.
Remember that permutation can also be used with cross validation, walk-forward testing, and most other out-of-sample evaluation methods in order to compute the probability that good observed out-of-sample results could have arisen from a worthless model. The principle is exactly the same, although of course it makes no sense to eliminate training bias from out-of-sample results."
An example of the calculation for a weak classifier:
"Called fraud 141 of 100000 (0.14 percent)
Actual fraud 0.94 percent
p = 0.57600
Original gain = 0.55370 with original inherent bias = 0.52818
Mean permuted gain = 0.56517
Mean permuted inherent bias = 0.50323
Training bias = 0.06194 (0.56517 minus 0.50323)
Unbiased actual gain = 0.49176 (0.55370 minus 0.06194)
Unbiased gain above inherent bias = -0.01147 (0.49176 minus 0.50323)
In this no-power example, the number of cases called fraud (0.14 percent) is quite different from the actual number (0.94 percent). Such variation is common when there is little predictive power. The most important number in this table is the p-value, the probability that a worthless model could have performed as well or better just by luck. This p-value is 0.576, a strong indication that the model could be worthless. You really want to see a tiny p-value, ideally 0.01 or less, in order to have solid confidence in the model.
The per-case gain of the unpermuted model is 0.55370, which may seem decent until you notice that 0.52818 of it is inherent bias. So even after optimization of performance, it barely beats what a coin-toss model would achieve. Things get even worse when you see that the mean gain for permuted data is 0.56517, better than the original model!"
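For completeness, the bookkeeping behind the quoted numbers (values taken verbatim from the example; no modelling involved):

```python
original_gain          = 0.55370   # gain of the unpermuted model
mean_permuted_gain     = 0.56517
mean_permuted_inh_bias = 0.50323   # inherent bias estimated from the permuted runs

training_bias       = mean_permuted_gain - mean_permuted_inh_bias   # 0.06194
unbiased_gain       = original_gain - training_bias                 # 0.49176
gain_above_inherent = unbiased_gain - mean_permuted_inh_bias        # -0.01147: no edge beyond the coin-toss baseline

print(round(training_bias, 5), round(unbiased_gain, 5), round(gain_above_inherent, 5))
```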