Amazon Data Science Interview Question:
In a linear regression model, what are the key assumptions that need to be satisfied for the model to be valid? How would you evaluate whether these assumptions hold in your dataset?
This is also the most common question I see across companies!
So the assumptions are -
Linearity
The relationship between the independent variables (predictors) and the dependent variable is linear. This means that the effect of each predictor on the outcome is constant and additive.
How to evaluate? - Scatter plots of predictors vs. the dependent variable, and residual vs. fitted value plots; a curved or systematic pattern in the residuals signals non-linearity (see the sketch below).
How to fix? - Apply feature transformations (e.g., log, square root, polynomial) or use non-linear models.
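As a quick illustration, here is a minimal sketch of the residuals-vs-fitted check on synthetic data (the DataFrame `df` and the columns `X1`, `X2`, `y` are made up for the example):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm

# Synthetic data standing in for a real dataset.
rng = np.random.default_rng(0)
df = pd.DataFrame({"X1": rng.normal(size=200), "X2": rng.normal(size=200)})
df["y"] = 2 * df["X1"] - df["X2"] + rng.normal(scale=0.5, size=200)

X = sm.add_constant(df[["X1", "X2"]])
model = sm.OLS(df["y"], X).fit()

# Residuals vs. fitted values: a shapeless cloud around zero is
# consistent with linearity; a curve suggests a non-linear effect.
plt.scatter(model.fittedvalues, model.resid, alpha=0.5)
plt.axhline(0, color="red", linestyle="--")
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.show()
```

The sketches for the remaining assumptions reuse this fitted `model` and design matrix `X`.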
Normality of Errors
The residuals are normally distributed. This matters mainly for hypothesis tests on the coefficients and for constructing confidence intervals; the coefficient estimates themselves do not require it.
How to evaluate? - A Q-Q plot of the residuals (points should hug the 45-degree line), a histogram of residuals, or a formal test such as Shapiro-Wilk. (Note: residual autocorrelation plots and the Durbin-Watson test check a different assumption - independence of errors - which is mainly a concern for time-series data.)
How to fix? - Transform the dependent variable (log, Box-Cox) and/or check for outliers.
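A minimal sketch of the normality checks, reusing the fitted `model` from the linearity sketch above:

```python
import scipy.stats as stats
import statsmodels.api as sm
import matplotlib.pyplot as plt

# Q-Q plot: residual quantiles vs. normal quantiles; points hugging
# the 45-degree line indicate approximately normal residuals.
sm.qqplot(model.resid, line="45", fit=True)
plt.show()

# Shapiro-Wilk test: a small p-value (e.g., < 0.05) is evidence
# that the residuals deviate from normality.
stat, p_value = stats.shapiro(model.resid)
print(f"Shapiro-Wilk: statistic={stat:.3f}, p-value={p_value:.3f}")
```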
Homoscedasticity (Constant Variance of Errors)
The variance of the residuals (errors) is constant across all levels of the independent variables. In other words, the spread of residuals should not increase or decrease as the predicted values increase.
How to evaluate? - Plot the residuals against fitted values; a "fan" or "cone" shape (spread that grows or shrinks with the fitted values) indicates heteroscedasticity. A formal test such as Breusch-Pagan can confirm it (see the sketch below).
How to fix? - Transform the dependent variable (log, Box-Cox), use weighted least squares regression, or report heteroscedasticity-robust standard errors.
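A minimal sketch of the Breusch-Pagan test, again reusing the fitted `model`:

```python
from statsmodels.stats.diagnostic import het_breuschpagan

# H0: constant residual variance. A small p-value suggests the
# variance changes with the predictors (heteroscedasticity).
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(
    model.resid, model.model.exog
)
print(f"Breusch-Pagan LM p-value: {lm_pvalue:.3f}")
```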
No Multicollinearity
The independent variables (predictors) are not highly correlated with each other. High correlation between predictors can lead to multicollinearity, which makes it difficult to determine the individual effect of each predictor on the dependent variable.
How to evaluate? - Calculate the Variance Inflation Factor (VIF) for each predictor; values above roughly 5-10 are a common red flag. A correlation matrix of the predictors is a quick first check.
How to fix? - Remove or combine highly correlated predictors (e.g., via Principal Component Analysis), or use regularized regression models like Ridge or Lasso.
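A minimal sketch of the VIF calculation, reusing the design matrix `X` (with constant) from the first sketch:

```python
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Rule of thumb: VIF above roughly 5-10 flags problematic
# multicollinearity. The constant term's VIF can be ignored.
vif = pd.Series(
    [variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
    index=X.columns,
)
print(vif)
```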
Data Science Interview Question:
How do outliers impact kNN?
Outliers can significantly degrade the performance of kNN, leading to inaccurate predictions, because the model relies entirely on proximity for decision-making. Here's a breakdown of how outliers influence kNN:
High Variance
The presence of outliers can increase the model's variance, as predictions near outliers may fluctuate unpredictably depending on which neighbors are included. This makes the model less reliable for regression tasks with scattered or sparse data.
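A minimal sketch (entirely synthetic data) of how a single target outlier makes small-k kNN regression predictions swing in its neighborhood:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(42)
X = np.sort(rng.uniform(0, 10, 50)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=50)
y[25] += 5.0  # inject one target outlier

query = X[23:28]  # query points in the outlier's neighborhood
for k in (1, 3, 15):
    preds = KNeighborsRegressor(n_neighbors=k).fit(X, y).predict(query)
    # Small k lets the outlier dominate its neighborhood; larger k
    # averages it away, trading variance for bias.
    print(f"k={k}: predictions near the outlier -> {np.round(preds, 2)}")
```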
Distance Metric Sensitivity
kNN relies on distance metrics, which can be significantly affected by outliers. In high-dimensional spaces, outliers can increase the range of distances, making it harder for the algorithm to distinguish between nearby points and those farther away. This issue can lead to an overall reduction in accuracy as the model's ability to effectively measure "closeness" degrades.
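A minimal sketch (synthetic data) of how one extreme point stretches the distance scale kNN works with:

```python
import numpy as np
from sklearn.metrics import pairwise_distances

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))                  # well-behaved points
X_out = np.vstack([X, 50 * np.ones((1, 5))])  # add one extreme outlier

for data, label in ((X, "without outlier"), (X_out, "with outlier")):
    d = pairwise_distances(data)
    # The outlier inflates the maximum pairwise distance by an order
    # of magnitude, compressing the "ordinary" distances together.
    print(f"{label}: max pairwise distance = {d.max():.1f}")
```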
Reduced Performance in Classification/Regression Tasks
Outliers near class boundaries can pull the decision boundary toward them, potentially misclassifying nearby points that should belong to a different class. This is particularly problematic if k is small, as individual points (like outliers) have a greater influence. The same happens in regression tasks as well.
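A minimal sketch (hand-crafted toy data) of an outlier flipping a nearby classification when k is small:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Class 0 clustered near the origin, class 1 far away, plus one
# class-1 outlier sitting inside the class-0 cluster.
X = np.array([
    [0.0, 0.0], [0.4, 0.1], [0.1, 0.4], [0.5, 0.5], [0.3, 0.6],  # class 0
    [5.0, 5.0], [5.2, 4.9], [4.8, 5.1], [5.1, 5.3], [4.9, 4.8],  # class 1
    [0.2, 0.2],                                                  # class-1 outlier
])
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

query = np.array([[0.25, 0.25]])  # clearly inside the class-0 region
for k in (1, 5):
    pred = KNeighborsClassifier(n_neighbors=k).fit(X, y).predict(query)
    print(f"k={k}: predicted class = {pred[0]}")
# k=1 follows the outlier (class 1); k=5 votes with the cluster (class 0).
```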
Feature Influence Disproportion
If certain features contain outliers, they can dominate the distance calculations and overshadow the impact of other features. For example, an outlier in a high-magnitude feature may cause distances to be determined largely by that feature, affecting the quality of the neighbor selection.
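A minimal sketch (synthetic data) of how scaling restores the informative feature. RobustScaler (median/IQR based) is one reasonable choice here since it is itself resistant to outliers; in this example a high-magnitude noise feature stands in for an outlier-dominated one:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler

rng = np.random.default_rng(7)
# Feature 0 carries the signal; feature 1 is high-magnitude noise,
# so raw Euclidean distances are dominated by feature 1.
X = np.column_stack([rng.normal(0, 1, 200), rng.normal(0, 1000, 200)])
y = (X[:, 0] > 0).astype(int)

raw = KNeighborsClassifier(n_neighbors=5)
scaled = make_pipeline(RobustScaler(), KNeighborsClassifier(n_neighbors=5))
print("raw accuracy:   ", cross_val_score(raw, X, y, cv=5).mean().round(2))
print("scaled accuracy:", cross_val_score(scaled, X, y, cv=5).mean().round(2))
```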
Company Name : Amazon
Role : Cloud Support Associate
Batch : 2024/2023 passouts
Link : https://www.amazon.jobs/en/jobs/2676989/cloud-support-associate
Company Name : Swiggy
Role : Associate Software Engineer
Batch : 2024/2023/2022 passouts
Link : https://docs.google.com/forms/d/1E029cjZV8Em6zPC0YJYAMDDP_NjPtDkwufqHfvkVG2E/viewform?edit_requested=true&pli=1