Aspiring Data Science

#ml #uncertainty #catboost #medicine #blood

Every now and then you do come across a high-quality research paper on ML. Check out how rigorous the approach is: everything follows best practices.

"Code for the analysis can be found at https://github.com/oizin/glucose-data-driven-prediction.
Model validation

The dataset is randomly split into a 70% training (13 279 ICU admissions) and 30% test (5682 ICU admissions) sets. Sample splits are performed by ICU admission ID to avoid potential information leakage. We evaluate all models on the test set only after finalization of hyperparameter settings to ensure unbiased assessments of model generalizability. As the algorithms were computationally expensive to train, we perform hyperparameter tuning by randomly splitting the training set into 80% development and 20% validation sets."
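The split by admission ID described in the quote can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the helper `split_by_group` and the toy `admission_id` array are hypothetical, and only NumPy is assumed.

```python
import numpy as np

def split_by_group(groups, test_fraction=0.3, seed=0):
    """Return boolean train/test masks such that no group ID lands in both sets."""
    rng = np.random.default_rng(seed)
    unique_ids = np.unique(groups)
    shuffled = rng.permutation(unique_ids)
    n_test = int(round(test_fraction * len(shuffled)))
    test_ids = shuffled[:n_test]
    # A row goes to the test set if its group (admission) was drawn for testing.
    test_mask = np.isin(groups, test_ids)
    return ~test_mask, test_mask

# Toy data (made up): 10 admissions, each with several measurement rows.
rng = np.random.default_rng(42)
admission_id = np.repeat(np.arange(10), rng.integers(3, 8, size=10))
train_mask, test_mask = split_by_group(admission_id, test_fraction=0.3)

train_ids = set(admission_id[train_mask])
test_ids = set(admission_id[test_mask])
```

Splitting on IDs rather than rows means the 70/30 ratio holds for admissions, not necessarily for rows, but all measurements of one admission stay on the same side of the split.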

Well, except that they didn't quite get as far as SHAP. What caught my interest in the paper itself is that it compares multi-quantile regression with "regression with uncertainty":

"We develop 2 ML approaches using the Catboost gradient boosting library [39]. These models were chosen as they present alternative approaches to predicting both a point estimate and uncertainty quantification through probabilistic forecasting. The first is a Catboost regression model with dual estimation of the expected outcome and the standard deviation of the prediction distribution, the 'uncertainty regression' model [43]. This form of estimation can be performed using the class CatBoostRegressor with the argument loss_function="RMSEWithUncertainty" in the Python version of Catboost 2.4. The second model is a combination of quantile regressions with models for quantiles of 0.025, 0.5, and 0.975, the 'quantile regression' model."
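To make the contrast between the two models concrete, here is a minimal NumPy sketch of the two ways of building a 95% interval. It does not call CatBoost. `pinball_loss` is a hypothetical helper implementing the standard quantile (pinball) loss, the symmetric interval assumes the mean and standard deviation are read as parameters of a normal distribution, and the skewed target is synthetic rather than taken from the paper.

```python
import numpy as np

def pinball_loss(y, q_pred, alpha):
    """Mean pinball (quantile) loss of the prediction q_pred at quantile level alpha."""
    diff = y - q_pred
    return np.mean(np.maximum(alpha * diff, (alpha - 1.0) * diff))

rng = np.random.default_rng(0)
y = rng.lognormal(mean=0.0, sigma=0.75, size=20_000)  # skewed, strictly positive

# Approach 1: mean and standard deviation, turned into a symmetric interval
# under a normality assumption.
mu, sigma = y.mean(), y.std()
gauss_lo, gauss_hi = mu - 1.96 * sigma, mu + 1.96 * sigma

# Approach 2: three separate quantiles (0.025, 0.5, 0.975).
# For a constant prediction, the pinball loss is minimized by the empirical quantile.
q_lo, q_med, q_hi = np.quantile(y, [0.025, 0.5, 0.975])

# Grid check that the 0.975 empirical quantile is (near-)optimal for alpha = 0.975.
grid = np.linspace(y.min(), y.max(), 2001)
losses = [pinball_loss(y, g, 0.975) for g in grid]
best = grid[int(np.argmin(losses))]

coverage_gauss = np.mean((y >= gauss_lo) & (y <= gauss_hi))
coverage_quant = np.mean((y >= q_lo) & (y <= q_hi))
```

On skewed data the symmetric interval can extend into impossible values (here, below zero), while the quantile interval follows the shape of the distribution. In CatBoost itself the corresponding settings would be `loss_function="RMSEWithUncertainty"`, as quoted above, and a quantile loss string such as `"Quantile:alpha=0.025"` per model.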

Here is the advantage the quantiles gave:

"In order to have clinical utility, it is important that the model can detect hyperglycemia and hypoglycemia. Detection of hyperglycemia was only slightly worse than values in the ICU normal blood glucose range. However, similar to previous research, our point estimates were unable to detect hypoglycemia at 2-hour forecasts [35]. However, by forecasting an interval, we increase the potential to flag circumstances in which hypoglycemia is a risk, with 41% of hypoglycemic events captured within the prediction intervals."
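One way to read the "41% of hypoglycemic events captured" figure is as the share of observed hypoglycemic values that fall inside the predicted interval. The sketch below computes that quantity under this reading. The helper name, the 70 mg/dL threshold, and the toy numbers are all assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def event_capture_rate(y_true, lower, upper, threshold=70.0):
    """Share of events (y_true < threshold) whose true value lies in [lower, upper]."""
    y_true, lower, upper = map(np.asarray, (y_true, lower, upper))
    is_event = y_true < threshold
    if not is_event.any():
        return float("nan")  # no events, so the rate is undefined
    inside = (y_true >= lower) & (y_true <= upper)
    return float(inside[is_event].mean())

# Toy values in mg/dL (made up).
y_true = np.array([55.0, 62.0, 68.0, 110.0, 140.0, 65.0])
lower = np.array([60.0, 58.0, 75.0, 90.0, 100.0, 50.0])
upper = np.array([150.0, 140.0, 160.0, 170.0, 200.0, 120.0])

# Four values are below the threshold; two of them sit inside their interval.
rate = event_capture_rate(y_true, lower, upper)
```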

If you have used one of these loss functions in your own work, I'd be glad to hear your conclusions about how useful they were.

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8324237/

Code for paper: Incorporating real-world evidence into the development of patient blood glucose prediction algorithms for the ICU (GitHub: oizin/glucose-data-driven-prediction)

Anatoly Alekseev, edited 22:01

#energy #uncertainty #conformal

Using the residuals of the point predictions on the calibration set as the basis for the uncertainty estimate is an interesting idea.
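The residual-based idea can be sketched as split conformal prediction. This is a minimal NumPy illustration under stated assumptions: the data is synthetic and i.i.d., the point predictor is a plain straight-line fit, and all names are hypothetical. It is not the method from the talk, and time series need extra care because they are generally not exchangeable.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    """Synthetic linear data with Gaussian noise (illustration only)."""
    x = rng.uniform(0.0, 10.0, size=n)
    y = 2.0 * x + 1.0 + rng.normal(0.0, 1.5, size=n)
    return x, y

x_train, y_train = make_data(500)
x_cal, y_cal = make_data(500)
x_test, y_test = make_data(2000)

# 1) Fit any point predictor on the training set (here: a straight line).
slope, intercept = np.polyfit(x_train, y_train, deg=1)

def predict(x):
    return slope * x + intercept

# 2) Absolute residuals of the point predictions on the calibration set.
scores = np.sort(np.abs(y_cal - predict(x_cal)))

# 3) Conformal quantile with the finite-sample (n + 1) correction.
alpha = 0.1
n = len(scores)
k = min(n, int(np.ceil((n + 1) * (1 - alpha))))
q_hat = scores[k - 1]

# 4) Prediction interval = point prediction +/- q_hat.
lower, upper = predict(x_test) - q_hat, predict(x_test) + q_hat
coverage = np.mean((y_test >= lower) & (y_test <= upper))
```

The interval width is the same for every input here, because a single residual quantile is added to each point prediction.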

Alternatives:
1) quantile regression
2) spread of ensemble predictions
3) replacing regression with multi-class classification
4) conformal = 0 + 1

This is the first time I've seen the CRPS metric praised. I gave up on it myself, though I no longer remember why.
MapieTimeSeriesRegressor looks interesting.
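For reference, CRPS (continuous ranked probability score) can be estimated from samples of the forecast distribution as E|X - y| - 0.5 E|X - X'|, where X and X' are independent draws from the forecast. A minimal NumPy sketch, with a hypothetical helper name and synthetic forecasts:

```python
import numpy as np

def crps_from_samples(samples, y):
    """Sample-based CRPS estimate for a single observation y; lower is better."""
    samples = np.asarray(samples, dtype=float)
    term_obs = np.mean(np.abs(samples - y))
    # Mean absolute difference over all pairs of forecast samples.
    term_spread = np.mean(np.abs(samples[:, None] - samples[None, :]))
    return term_obs - 0.5 * term_spread

rng = np.random.default_rng(0)
y_obs = 0.0
sharp = rng.normal(0.0, 1.0, size=1000)   # centred on the truth, narrow
wide = rng.normal(0.0, 5.0, size=1000)    # centred on the truth, too wide
biased = rng.normal(3.0, 1.0, size=1000)  # narrow, but off-target

crps_sharp = crps_from_samples(sharp, y_obs)
crps_wide = crps_from_samples(wide, y_obs)
crps_biased = crps_from_samples(biased, y_obs)
```

The score penalizes both an overly wide forecast and a biased one, and it is expressed in the same units as the target.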

https://www.youtube.com/watch?v=aIZf2cQ0r5U

Harnessing uncertainty: the role of probabilistic time series forecasting in the renewable energy transition

How can probabilistic forecasting accelerate the renewable energy transition? The rapid growth of non-steerable and intermittent wind and solar power…

Anatoly Alekseev, edited 19:54
