𝗗𝗮𝘁𝗮 𝗦𝗰𝗶𝗲𝗻𝗰𝗲 𝗜𝗻𝘁𝗲𝗿𝘃𝗶𝗲𝘄 𝗤𝘂𝗲𝘀𝘁𝗶𝗼𝗻:
How do outliers impact kNN?
Outliers can significantly impact the performance of kNN, leading to inaccurate predictions due to the model's reliance on proximity for decision-making. Here’s a breakdown of how outliers influence kNN:
𝗛𝗶𝗴𝗵 𝗩𝗮𝗿𝗶𝗮𝗻𝗰𝗲
The presence of outliers can increase the model's variance, as predictions near outliers may fluctuate unpredictably depending on which neighbors are included. This makes the model less reliable for regression tasks with scattered or sparse data.
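A minimal sketch of this effect (synthetic data, values chosen purely for illustration): one extreme target value makes predictions for a nearby query swing heavily with the choice of k.

import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = np.linspace(0, 10, 50).reshape(-1, 1)
y = 2 * X.ravel() + rng.normal(0, 0.5, 50)
y[25] = 60.0                            # inject a single extreme target value

query = np.array([[X[25, 0] + 0.1]])    # a query sitting right next to the outlier
for k in (1, 3, 10):
    pred = KNeighborsRegressor(n_neighbors=k).fit(X, y).predict(query)
    print(f"k={k:2d}  prediction={pred[0]:6.2f}  (underlying trend ~ {2 * query[0, 0]:.2f})")

With k=1 the prediction is essentially the outlier itself; averaging over more neighbors dampens, but does not remove, its pull.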
𝗗𝗶𝘀𝘁𝗮𝗻𝗰𝗲 𝗠𝗲𝘁𝗿𝗶𝗰 𝗦𝗲𝗻𝘀𝗶𝘁𝗶𝘃𝗶𝘁𝘆
kNN relies on distance metrics, which can be significantly affected by outliers. In high-dimensional spaces, outliers can increase the range of distances, making it harder for the algorithm to distinguish between nearby points and those farther away. This issue can lead to an overall reduction in accuracy as the model’s ability to effectively measure "closeness" degrades.
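A rough numeric illustration (synthetic 10-dimensional data; placing the outlier at 50 on every feature is an assumption for the demo) of how one extreme point stretches the range of distances a query sees:

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(0, 1, size=(200, 10))            # a well-behaved cluster
X_out = np.vstack([X, np.full((1, 10), 50.0)])  # the same cluster plus one extreme point

query = np.zeros(10)
for name, data in (("clean", X), ("with outlier", X_out)):
    d = np.linalg.norm(data - query, axis=1)
    print(f"{name:12s}  min={d.min():5.2f}  max={d.max():7.2f}  spread={d.max() - d.min():7.2f}")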
𝗥𝗲𝗱𝘂𝗰𝗲𝗱 𝗣𝗲𝗿𝗳𝗼𝗿𝗺𝗮𝗻𝗰𝗲 𝗶𝗻 𝗖𝗹𝗮𝘀𝘀𝗶𝗳𝗶𝗰𝗮𝘁𝗶𝗼𝗻/𝗥𝗲𝗴𝗿𝗲𝘀𝘀𝗶𝗼𝗻 𝗧𝗮𝘀𝗸𝘀
Outliers near class boundaries can pull the decision boundary toward them, potentially misclassifying nearby points that should belong to a different class. This is particularly problematic when k is small, because individual points (such as outliers) carry more weight. The same effect appears in regression, where an extreme target value among the selected neighbors skews the averaged prediction.
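A small hand-made 2-D sketch of this boundary pull (the points and labels are made up for illustration): a single class-1 point placed inside class-0 territory flips the prediction for a nearby query at k=1 but is outvoted at k=5.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# two tight clusters plus one class-1 point sitting deep inside class-0 territory
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5],   # class 0
              [5, 5], [5, 6], [6, 5], [6, 6], [5.5, 5.5],   # class 1
              [1.6, 1.6]])                                   # class-1 outlier
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

query = np.array([[1.5, 1.5]])          # much closer to the class-0 cluster
for k in (1, 5):
    clf = KNeighborsClassifier(n_neighbors=k).fit(X, y)
    print(f"k={k}: predicted class {clf.predict(query)[0]}")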
𝗙𝗲𝗮𝘁𝘂𝗿𝗲 𝗜𝗻𝗳𝗹𝘂𝗲𝗻𝗰𝗲 𝗗𝗶𝘀𝗽𝗿𝗼𝗽𝗼𝗿𝘁𝗶𝗼𝗻
If certain features contain outliers, they can dominate the distance calculations and overshadow the contribution of other features. For example, an outlier in a high-magnitude feature may cause distances to be determined almost entirely by that feature, degrading the quality of neighbor selection. This is one reason features are typically standardized or robustly scaled before applying kNN.
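A hedged sketch of this dominance with toy numbers, plus the usual remedy of scaling the features, here with scikit-learn's RobustScaler, which uses the median and IQR and is therefore itself resistant to the outlier.

import numpy as np
from sklearn.preprocessing import RobustScaler

# feature 0 is small-scale; feature 1 is large-scale and contains an outlier (last row)
X = np.array([[0.10,  100.0],
              [0.20,  110.0],
              [0.30,  105.0],
              [0.40, 5000.0]])
query = np.array([[0.15, 102.0]])

raw_d = np.linalg.norm(X - query, axis=1)

scaler = RobustScaler().fit(X)
scaled_d = np.linalg.norm(scaler.transform(X) - scaler.transform(query), axis=1)

print("raw distances    :", np.round(raw_d, 2))     # effectively just the feature-1 differences
print("scaled distances :", np.round(scaled_d, 2))  # both features now contribute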
Company Name : Amazon
Role : Cloud Support Associate
Batch : 2024/2023 passouts
Link : https://www.amazon.jobs/en/jobs/2676989/cloud-support-associate
Company Name : Swiggy
Role : Associate Software Engineer
Batch : 2024/2023/2022 passouts
Link : https://docs.google.com/forms/d/1E029cjZV8Em6zPC0YJYAMDDP_NjPtDkwufqHfvkVG2E/viewform?edit_requested=true&pli=1