Understanding the Impact of Feature Selection vs. Feature Extraction in Dimensionality Reduction for Big Data
In the era of big data, working with high-dimensional datasets presents major challenges in processing, visualization, and model performance. A recent study titled "Comparison of Feature Selection and Feature Extraction Role in Dimensionality Reduction of Big Data" (Journal of Techniques, 2023) offers a comprehensive evaluation of Feature Selection (FS) and Feature Extraction (FE) using the ANSUR II dataset, a U.S. Army anthropometric dataset with 109 features and 6,068 observations.
Study Goals
To compare FS and FE techniques in terms of:
- Dimensionality reduction
- Predictive performance
- Information retention
Techniques Explored
Feature Selection:
- Highly Correlated Filter: removes one feature from each pair whose correlation exceeds 0.88 (see the sketch below)
- Recursive Feature Elimination (RFE): iteratively eliminates the least important features until a target number remains
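As a rough illustration of how these two selection steps could look in practice (not the paper's actual code), here is a minimal pandas/scikit-learn sketch. The 0.88 correlation threshold comes from the study; the estimator used inside RFE and the number of features to keep are illustrative assumptions.

```python
# Minimal sketch (assumed implementation, not the paper's code):
# 1) drop one feature from each pair whose absolute correlation exceeds 0.88,
# 2) run Recursive Feature Elimination with an illustrative estimator.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

def drop_highly_correlated(X: pd.DataFrame, threshold: float = 0.88) -> pd.DataFrame:
    """Highly Correlated Filter: keep only one feature from each highly correlated pair."""
    corr = X.corr().abs()
    # Use the upper triangle so each pair is inspected exactly once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return X.drop(columns=to_drop)

def select_with_rfe(X: pd.DataFrame, y, n_features: int = 10) -> pd.DataFrame:
    """RFE: repeatedly fit a model and prune the weakest features (n_features is illustrative)."""
    rfe = RFE(RandomForestClassifier(n_estimators=100, random_state=0),
              n_features_to_select=n_features)
    rfe.fit(X, y)
    return X.loc[:, rfe.support_]
```

Both steps keep the original, interpretable measurement columns, which is the main appeal of feature selection noted in the takeaways below.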
Feature Extraction:
- Principal Component Analysis (PCA): transforms the original features into a smaller set of orthogonal components (see the sketch below)
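For comparison, a minimal PCA sketch, assuming the features are standardized first; the 95% explained-variance target is an illustrative choice rather than the study's exact setting.

```python
# Minimal sketch (assumed settings): standardize, then project onto orthogonal components.
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# n_components=0.95 keeps enough components to explain ~95% of the variance (illustrative).
pca_pipeline = make_pipeline(StandardScaler(), PCA(n_components=0.95))
X_components = pca_pipeline.fit_transform(X)  # X: numeric feature matrix (e.g. ANSUR II measurements)
print(pca_pipeline.named_steps["pca"].explained_variance_ratio_)
```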
Methodology
- Data preprocessing with a Missing Value Ratio filter to drop sparse features
- Baseline classification using ML models:
  - K-Nearest Neighbors (KNN)
  - Decision Tree
  - Support Vector Machine (SVM)
  - Neural Network
  - Random Forest
- Post-reduction classification using the same models (a combined sketch follows this list)
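Below is a hedged end-to-end sketch of this workflow. The Missing Value Ratio threshold (20%), the train/test split, and all model hyperparameters are assumptions for illustration; the study's exact settings may differ.

```python
# Hypothetical sketch of the study's workflow: drop features with too many missing values,
# then compare classifier accuracy on the full vs. reduced feature sets.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

def drop_sparse_features(X: pd.DataFrame, max_missing_ratio: float = 0.2) -> pd.DataFrame:
    """Missing Value Ratio filter: drop columns whose share of NaNs exceeds the threshold."""
    keep = X.columns[X.isna().mean() <= max_missing_ratio]
    return X[keep]

# The five model families used in the paper; hyperparameters here are defaults/illustrative.
models = {
    "KNN": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "SVM": SVC(),
    "Neural Network": MLPClassifier(max_iter=1000, random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
}

def evaluate(X: pd.DataFrame, y) -> dict:
    """Train/test each model on the given feature set and report test accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=0
    )
    return {
        name: accuracy_score(y_test, model.fit(X_train, y_train).predict(X_test))
        for name, model in models.items()
    }

# Usage: compare the full feature set against a reduced one (e.g. after RFE or PCA).
# baseline = evaluate(drop_sparse_features(X), y)
# reduced  = evaluate(X_reduced, y)
```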
Key Results
- KNN consistently performed best, maintaining 83% accuracy both before and after reduction
- RFE achieved the highest post-reduction accuracy among the reduction techniques, at 66%
- PCA reduced the feature count effectively but slightly lowered accuracy and interpretability
Takeaways
- Use Feature Selection when interpretability and maintaining the original feature structure are important
- Use Feature Extraction for noisy or highly redundant datasets
- The choice depends on your data and modeling objectives
Read the full paper here: DOI: 10.51173/jt.v5i1.1027
This is an excellent reference for anyone navigating the complexities of dimensionality reduction in ML pipelines. Whether you're optimizing models or just curious about FS vs. FE, this study is gold!
#MachineLearning #DataScience #FeatureEngineering #DimensionalityReduction #BigData #AI #KNN #PCA #RFE #MLResearch #DataAnalytics