Aspiring Data Science

#llms #ai #gpt #math #ensembling #diversity

Недавно на ютубе смотрел видео о применении компьютера (в частности, языка Julia для исследования проблемы 196), спросил в комментах математика (явно советской школы), который написал статью по модификации проблемы, не пробовал ли он ИИ для генерации кода, да и вообще обкатки идей, на что получил кажущийся высокомерным ответ, мол, этот ваш ИИ слишком слаб для моих задач. Повеяло Сурдиным, в общем ) Я такой думаю, да что ж вы за гиганты мысли такие, что он для вас слаб и неточен, я вот прекрасно с ним работаю и оочень доволен. А вот и пример.

Q.: Suppose i'm running HPT/HPO for some model on a particular dataset, and I plan to later use more than one HP combinations to form an ensemble. Now, am I looking for HPs producing un-correlated predictions (zero MI), or anti-correlated predictions (corrcoeff=-1)?

A.: You're looking for hyperparameter (HP) combinations that produce uncorrelated predictions (zero mutual information, MI ≈ 0) rather than strictly anti-correlated ones (corrcoeff=−1).

Why uncorrelated predictions?
Ensemble Diversity: The key to a strong ensemble is combining diverse but individually strong models. If two models are perfectly anti-correlated (ρ=−1), their predictions will cancel each other out when averaged, leading to little to no improvement.
Reduction in Variance: The goal of an ensemble is to reduce variance and improve generalization. If the models are uncorrelated, they make different kinds of errors, so their combination will smooth out those errors.
Avoiding Overcompensation: A correlation of −1 means that one model's overestimation is exactly another model's underestimation, which could lead to instability in the ensemble.

How to achieve this?
Optimize for performance first: Choose hyperparameters that result in good individual models.
Select diverse models: Instead of maximizing anti-correlation, look for models that make errors in different places (measured by MI or correlation).
Use decorrelation techniques: Cluster models based on prediction similarity and pick those with the lowest pairwise correlation.
In short, low correlation (not necessarily negative) is desirable, but enforcing perfect anti-correlation is counterproductive.

Вот эта интересная идея, о которой я и не подумал, Cluster models based on prediction similarity and pick those with the lowest pairwise correlation. А ведь интересная визуализация может получиться. Хоть какое-то реальное применение для unsupervised ML 😅

Q.: What are other practical means to look for models that make errors in different places, except MI?

A.: Aside from mutual information (MI), there are several practical ways to identify models that make errors in different places when forming an ensemble:

Disagreement Rate (Prediction Diversity Index)
For classification, measure how often two models make different predictions on the same data points.
Higher values indicate greater diversity.

Jaccard Similarity for Misclassified Samples
Compute the Jaccard index for misclassified examples between two models:
Lower values mean models make mistakes on different samples.

Double-Fold Cross-Validation Differences
Train models on different folds and analyze how much their predictions differ on holdout sets.
If a model performs well on Fold 1 but poorly on Fold 2, and another model shows the opposite behavior, they might complement each other.

Shapley Value or Feature Importance Differences
Compare the features each model relies on using SHAP, permutation importance, or Gini importance.
If two models prioritize different features, they likely capture different aspects of the data.

Orthogonality of Prediction Vectors (in High-Dimensional Spaces)
Compute cosine similarity between prediction vectors:
Lower cosine similarity implies more independent predictions.

Wikipedia

Проблема 196

условное название нерешённой математической задачи

✍2

96 viewsAnatoly Alekseev, edited 12:50