#ml #uncertainty #catboost #medicine #blood
Every now and then you do come across high-quality ML research papers. Check out how rigorous the approach is; everything is done according to best practices.
"Code for the analysis can be found at https://github.com/oizin/glucose-data-driven-prediction.
Model validation
The dataset is randomly split into a 70% training (13 279 ICU admissions) and 30% test (5682 ICU admissions) sets. Sample splits are performed by ICU admission ID to avoid potential information leakage. We evaluate all models on the test set only after finalization of hyperparameter settings to ensure unbiased assessments of model generalizability. As the algorithms were computationally expensive to train, we perform hyperparameter tuning by randomly splitting the training set into 80% development and 20% validation sets."
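The key point in the quoted protocol is that the split is done over unique ICU admission IDs, not over individual rows, so readings from one admission never end up on both sides. A minimal stdlib sketch of that 70/30 then 80/20 scheme, assuming a flat list of per-row admission IDs; the function names are hypothetical and this is not the code from the authors' repository.

```python
import random


def split_by_admission(admission_ids, test_frac=0.3, val_frac=0.2, seed=0):
    """Split unique admission IDs into dev / validation / test ID sets.

    Every row of an ICU admission ends up in exactly one subset, so the
    same admission never appears on both sides of a split.
    """
    rng = random.Random(seed)
    unique_ids = sorted(set(admission_ids))
    rng.shuffle(unique_ids)

    # Outer split: 70% train / 30% test, by admission.
    n_test = round(len(unique_ids) * test_frac)
    test_ids = set(unique_ids[:n_test])
    train_ids = unique_ids[n_test:]

    # Inner split of the training part: 80% development / 20% validation.
    n_val = round(len(train_ids) * val_frac)
    val_ids = set(train_ids[:n_val])
    dev_ids = set(train_ids[n_val:])
    return dev_ids, val_ids, test_ids


def row_indices(admission_ids, selected_ids):
    """Row positions whose admission ID belongs to selected_ids."""
    return [i for i, a in enumerate(admission_ids) if a in selected_ids]


# Toy data: 100 hypothetical admissions with 3 glucose readings each.
rows = [adm for adm in range(100) for _ in range(3)]
dev, val, test = split_by_admission(rows)
print(len(dev), len(val), len(test))  # 56 14 30
```

For array data, scikit-learn's `GroupShuffleSplit` (passing the admission IDs as `groups` to `split`) does the same job.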
Well, except that they didn't quite get as far as SHAP. What caught my interest in the paper itself is that it compares multi-quantile regression with "uncertainty regression":
We develop 2 ML approaches using the Catboost gradient boosting library.39 These models were chosen as they present alternative approaches to predicting both a point estimate and uncertainty quantification through probabilistic forecasting. The first is a Catboost regression model with dual estimation of the expected outcome and the standard deviation of the prediction distribution, the ‘uncertainty regression’ model.43 This form of estimation can be performed using the class CatBoostRegressor with the argument loss_function=“RMSEWithUncertainty” in the Python version of Catboost 2.4. The second model is a combination of quantile regressions with models for quantiles of 0.025, 0.5, and 0.975, the “quantile regression” model.
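To make the difference between the two models concrete at the loss level, here is a stdlib-only sketch. As I understand it, CatBoost's `RMSEWithUncertainty` optimizes a Gaussian negative log-likelihood of this general kind, predicting a mean and a spread; the library's exact internal parametrization is not reproduced here. The quantile models would each minimize a pinball loss (in CatBoost, via strings like `loss_function="Quantile:alpha=0.025"`). All function names below are illustrative.

```python
import math
from statistics import NormalDist


def gaussian_nll(y, mu, sigma):
    """Negative log-likelihood of y under N(mu, sigma^2).

    Minimising this over (mu, sigma) is the 'mean + standard deviation'
    idea behind uncertainty regression.
    """
    return 0.5 * math.log(2 * math.pi) + math.log(sigma) + (y - mu) ** 2 / (2 * sigma ** 2)


def pinball_loss(y, q_pred, alpha):
    """Quantile (pinball) loss for a prediction of the alpha-quantile."""
    diff = y - q_pred
    return max(alpha * diff, (alpha - 1) * diff)


def gaussian_interval(mu, sigma, coverage=0.95):
    """Central interval implied by a predicted mean and standard deviation."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2)  # about 1.96 for 95%
    return mu - z * sigma, mu + z * sigma


# Uncertainty regression: one (mu, sigma) pair gives a symmetric interval.
print(gaussian_interval(100.0, 10.0))
# Quantile regression: separate models for alpha = 0.025, 0.5, 0.975,
# each trained with pinball_loss, so the interval may be asymmetric.
print(pinball_loss(10.0, 8.0, 0.975), pinball_loss(10.0, 12.0, 0.975))
```

The design difference: the Gaussian interval is symmetric around the mean by construction, while the 0.025/0.975 quantile models place each bound independently, which may matter for a skewed target such as glucose near the hypoglycemic range.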
Here is the advantage the quantiles provided:
In order to have clinical utility, it is important that the model can detect hyperglycemia and hypoglycemia. Detection of hyperglycemia was only slightly worse than values in the ICU normal blood glucose range. However, similar to previous research, our point estimates were unable to detect hypoglycemia at 2-hour forecasts.35 However, by forecasting an interval, we increase the potential to flag circumstances in which hypoglycemia is a risk, with 41% of hypoglycemic events captured within the prediction intervals.
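The quoted comparison boils down to two rates computed on observed hypoglycemic cases: how often the point forecast itself falls below the threshold, and how often the observed value lies inside the prediction interval. A minimal sketch, assuming a 70 mg/dL cut-off (my assumption, not taken from the paper) and made-up toy numbers; the function name is hypothetical.

```python
HYPO_THRESHOLD = 70.0  # mg/dL; an assumed cut-off, not taken from the paper


def hypo_detection(y_true, point, lower, upper, threshold=HYPO_THRESHOLD):
    """On observed hypoglycemic cases, compare point-estimate detection with interval capture.

    detected: share of cases where the point forecast itself is below threshold.
    captured: share of cases where the observed value lies inside [lower, upper].
    """
    cases = [(p, lo, hi, y) for y, p, lo, hi in zip(y_true, point, lower, upper) if y < threshold]
    if not cases:
        return float("nan"), float("nan")
    detected = sum(p < threshold for p, _, _, _ in cases) / len(cases)
    captured = sum(lo <= y <= hi for _, lo, hi, y in cases) / len(cases)
    return detected, captured


# Made-up toy forecasts (mg/dL), purely illustrative.
y_true = [65, 60, 120, 68, 150]
point = [95, 90, 118, 100, 140]
lower = [62, 70, 100, 60, 120]
upper = [130, 120, 140, 150, 170]
print(hypo_detection(y_true, point, lower, upper))  # (0.0, 0.6666666666666666)
```

In this toy case the point forecast detects none of the three hypoglycemic cases, while two of them fall inside the interval; this is the same kind of contrast the quote describes, not a reproduction of its numbers.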
If you have used either of these loss functions in your own work, I'd be glad to hear your conclusions about how useful they are.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8324237/
GitHub: oizin/glucose-data-driven-prediction. Code for paper: Incorporating real-world evidence into the development of patient blood glucose prediction algorithms for the ICU.
#energy #uncertainty #conformal
Residuals of the point predictions on the calibration set: an interesting choice of basis.
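Residuals on a calibration set are the core of split conformal prediction: fit any point model, compute absolute residuals on held-out calibration data, take a finite-sample-corrected quantile, and use it as a symmetric half-width around new predictions. Coverage of at least 1 - alpha is guaranteed only under exchangeability, which time series generally violate; that is the assumption time-series-specific methods try to relax. A stdlib sketch with illustrative function names, not MAPIE's implementation.

```python
import math


def conformal_halfwidth(cal_true, cal_pred, alpha=0.1):
    """Split-conformal half-width from absolute residuals on a calibration set.

    Uses the ceil((n + 1) * (1 - alpha))-th smallest residual, the usual
    finite-sample correction for split conformal prediction.
    """
    scores = sorted(abs(y - p) for y, p in zip(cal_true, cal_pred))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return scores[k - 1]


def conformal_interval(point_pred, halfwidth):
    """Symmetric interval around a new point prediction."""
    return point_pred - halfwidth, point_pred + halfwidth


# Toy calibration set whose absolute residuals are 1, 2, ..., 10.
cal_true = [0.0] * 10
cal_pred = [float(r) for r in range(1, 11)]
q = conformal_halfwidth(cal_true, cal_pred, alpha=0.25)
print(q, conformal_interval(50.0, q))  # 9.0 (41.0, 59.0)
```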
Alternatives:
1) quantile regression
2) the spread of ensemble predictions
3) replacing regression with multi-class classification
4) conformal=0+1
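One possible reading of "conformal = 0 + 1" is conformalized quantile regression (CQR): quantile regression (item 1) supplies the interval shape, and calibration-set scores in the spirit of the residual basis above correct its width. This is my interpretation of the note, not something it states. A stdlib sketch with hypothetical function names:

```python
import math


def cqr_correction(cal_true, cal_lo, cal_hi, alpha=0.1):
    """Conformalized quantile regression (CQR) correction from a calibration set.

    Score per point: how far y falls outside [lo, hi] (negative when inside).
    """
    scores = sorted(max(lo - y, y - hi) for y, lo, hi in zip(cal_true, cal_lo, cal_hi))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return scores[k - 1]


def cqr_interval(q_lo, q_hi, correction):
    """Widen (or, if correction < 0, shrink) the quantile-regression interval."""
    return q_lo - correction, q_hi + correction


# Toy calibration data: quantile-model bounds and observed values.
cal_true = [10.0, 10.0, 10.0]
cal_lo = [8.0, 11.0, 4.0]
cal_hi = [12.0, 13.0, 7.0]
c = cqr_correction(cal_true, cal_lo, cal_hi, alpha=0.5)
print(c, cqr_interval(100.0, 120.0, c))  # 1.0 (99.0, 121.0)
```

Unlike the plain residual approach, the corrected interval keeps whatever asymmetry and input-dependent width the quantile models learned.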
This is the first time I've seen anyone praise the CRPS metric; I dropped it myself, though I no longer remember why.
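For reference, CRPS scores the whole predictive distribution against the observed value, and for a point forecast it reduces to absolute error, so point and probabilistic forecasts can be compared on the same scale. A stdlib sketch of two standard formulas, the empirical (ensemble) form and the closed form for a Gaussian forecast; function names are illustrative.

```python
import math
from statistics import NormalDist


def crps_ensemble(samples, y):
    """CRPS of the empirical distribution of samples: E|X - y| - 0.5 * E|X - X'|."""
    m = len(samples)
    spread_to_obs = sum(abs(x - y) for x in samples) / m
    spread_within = sum(abs(a - b) for a in samples for b in samples) / (m * m)
    return spread_to_obs - 0.5 * spread_within


def crps_gaussian(mu, sigma, y):
    """Closed-form CRPS of a normal forecast N(mu, sigma^2) at observation y."""
    z = (y - mu) / sigma
    std = NormalDist()
    return sigma * (z * (2 * std.cdf(z) - 1) + 2 * std.pdf(z) - 1 / math.sqrt(math.pi))


# A single-member "ensemble" is a point forecast: CRPS reduces to |x - y|.
print(crps_ensemble([3.0], 5.0))  # 2.0
# A dense grid of N(0, 1) quantiles approximates the closed form.
grid = [NormalDist().inv_cdf((i + 0.5) / 500) for i in range(500)]
print(crps_ensemble(grid, 0.5), crps_gaussian(0.0, 1.0, 0.5))
```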
MapieTimeSeriesRegressor looks interesting.
https://www.youtube.com/watch?v=aIZf2cQ0r5U
YouTube: Harnessing uncertainty: the role of probabilistic time series forecasting in the renewable energy transition
How can probabilistic forecasting accelerate the renewable energy transition? The rapid growth of non-steerable and intermittent wind and solar power…