#books #kagglebook #noise #augmentation #todo
Denoising with autoencoders
In his famous post (https://www.kaggle.com/c/porto-seguro-safe-driver-prediction/discussion/44629), Michael Jahrer describes how a DAE can not only remove noise but also automatically create new features, so the representation of the features is learned in a similar way to what happens in image competitions. In the post, he mentions that the secret sauce of the DAE recipe is not simply the layers, but the noise you put into the data in order to augment it. He also made clear that the technique requires stacking together the training and test data, implying that it has few applications beyond winning a Kaggle competition.
In order to train any kind of DAE, you need to inject noise that augments the training data and prevents the overparameterized neural network from simply memorizing inputs (in other words, overfitting). In the Porto Seguro competition, Michael Jahrer added noise using a technique called swap noise, which he described as follows: "Here I sample from the feature itself with a certain probability 'inputSwapNoise' in the table above. 0.15 means 15% of features replaced by values from another row."
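A minimal NumPy sketch of swap noise along these lines (the function name, the 0.15 default, and the vectorized implementation are illustrative assumptions, not Jahrer's original code):

```python
import numpy as np

def swap_noise(X, swap_prob=0.15, seed=None):
    """Replace each cell, with probability `swap_prob`, by the value
    found in the same column of a randomly chosen row."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X)
    n_rows, n_cols = X.shape
    # Boolean mask of the cells to corrupt (~15% of them by default)
    swap_mask = rng.random(X.shape) < swap_prob
    # For every cell, pick a random donor row to take the value from
    donor_rows = rng.integers(0, n_rows, size=X.shape)
    col_idx = np.tile(np.arange(n_cols), (n_rows, 1))
    X_noisy = X.copy()
    X_noisy[swap_mask] = X[donor_rows[swap_mask], col_idx[swap_mask]]
    return X_noisy
```

The corrupted copy is fed to the DAE as input, while the clean X serves as the reconstruction target.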
What is described is basically an augmentation technique called mixup (which is also used in image augmentation: https://arxiv.org/abs/1710.09412). In mixup for tabular data, you decide on a probability for mixing up; based on that probability, you replace some of the original values in a sample with values taken from another, more or less similar, sample from the same training data.
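For reference, the mixup of the paper linked above interpolates random sample pairs (and their targets) rather than swapping raw values. A minimal sketch for numeric tabular features (the alpha value and function name are illustrative assumptions):

```python
import numpy as np

def mixup(X, y, alpha=0.2, seed=None):
    """Build convex combinations of random sample pairs and their targets."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    lam = rng.beta(alpha, alpha, size=(len(X), 1))   # mixing coefficients
    perm = rng.permutation(len(X))                   # random partner for each row
    X_mix = lam * X + (1.0 - lam) * X[perm]
    y_mix = lam.ravel() * y + (1.0 - lam.ravel()) * y[perm]
    return X_mix, y_mix
```

Interpolation only makes sense for numeric columns (and numeric or one-hot targets); categorical columns are usually handled with swap noise instead.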
A big question with no answer yet: why are augmentations used all the time in computer vision tasks and almost never with tabular data? It is an excellent way to create diversity. Worth investigating.
Neural networks for tabular competitions
Use activations such as GeLU, SeLU, or Mish instead of ReLU; quite a few papers cite them as more suitable for modeling tabular data, and our own experience confirms that they tend to perform better.
Use augmentation with mixup (discussed in the section on autoencoders).
Use quantile transformation on numeric features to force them into uniform or Gaussian distributions (see the sketch below).
Leverage embedding layers, but also remember that embeddings do not model everything. In fact, they miss interactions between the embedded feature and all the others (so you have to force these interactions into the network with direct feature engineering).
Remember that embedding layers are reusable. They consist only of a matrix multiplication that reduces the input (a sparse one-hot encoding of the high-cardinality variable) to a dense vector of lower dimensionality. By recording and storing away the embeddings of a trained neural network, you can transform the same feature and use the resulting embeddings in many other algorithms, from gradient boosting to linear models (see the sketch below).
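A minimal sketch of the quantile transformation tip, using scikit-learn's QuantileTransformer (the column names and data are illustrative):

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import QuantileTransformer

# Hypothetical numeric columns of a tabular dataset
X_num = pd.DataFrame({"feat_a": np.random.lognormal(size=1000),
                      "feat_b": np.random.exponential(size=1000)})

# Map each feature onto an approximately Gaussian distribution
# (use output_distribution="uniform" for a uniform target instead)
qt = QuantileTransformer(output_distribution="normal",
                         n_quantiles=100, random_state=0)
X_num_gauss = qt.fit_transform(X_num)
```

And a sketch of extracting a trained embedding matrix so it can be reused elsewhere; the layer name, cardinality, and embedding size are assumptions, and the tiny Keras model is just a stand-in for whatever network you actually trained:

```python
import numpy as np
import tensorflow as tf

n_categories, emb_dim = 1000, 8   # cardinality and embedding size (assumed)

# Stand-in network with an embedding layer for one categorical feature
inp = tf.keras.Input(shape=(1,), dtype="int32")
emb = tf.keras.layers.Embedding(n_categories, emb_dim, name="cat_emb")(inp)
out = tf.keras.layers.Dense(1, activation="sigmoid")(tf.keras.layers.Flatten()(emb))
model = tf.keras.Model(inp, out)
# ... after model.fit(...) on your data ...

# The learned embedding matrix: one dense vector per category level
emb_matrix = model.get_layer("cat_emb").get_weights()[0]  # (n_categories, emb_dim)

# Reuse it: map raw category codes to their embedding rows and feed the
# resulting dense columns to gradient boosting, linear models, and so on
cat_codes = np.random.randint(0, n_categories, size=500)  # hypothetical codes
emb_features = emb_matrix[cat_codes]                      # (500, emb_dim)
```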
Kaggle competition: Porto Seguro's Safe Driver Prediction (predict if a driver will file an insurance claim next year).