So what should an entry-level interview experience look like?
Having been on both sides of the process, I think this format, IMO, is the most effective one:
Round 1:
⭐️ 30 minutes LeetCode, 30 minutes SQL
The goal? Understand how the candidate approaches the problem - clarifying ambiguity, addressing edge cases, and writing working code.
Passing a few test cases is required, but passing all of them is not.
A better-than-brute-force solution is required; the optimal solution is not.
Round 2:
⭐️ Machine Learning/Statistics and Resume-based
The goal? Make sure they understand basic concepts - bias vs. variance, hypothesis testing, data cleaning, etc. - and how they have approached ML problem formulation, metric selection, and modelling in the past.
Round 3:
⭐️ Hiring Manager (+ senior team member) to review work on resume + culture fit
The goal? For the HM and a senior team member to assess whether the candidate is a culture fit with the team, to review their prior work, and to see whether the way they think about solving a data/ML problem would work in the team (or whether the person is coachable).
Join our channel for more information like this
Amazon Data Science Interview Question:
In a linear regression model, what are the key assumptions that need to be satisfied for the model to be valid? How would you evaluate whether these assumptions hold in your dataset?
This is also the most common question I see across companies!
So the assumptions are -
𝗟𝗶𝗻𝗲𝗮𝗿𝗶𝘁𝘆
The relationship between the independent variables (predictors) and the dependent variable is linear. This means that the effect of each predictor on the outcome is constant and additive.
How to evaluate? - Scatter plots of each predictor vs. the dependent variable, and residual vs. fitted value plots; a curved pattern in the residuals signals non-linearity.
How to fix? - Apply feature transformations (e.g., log, square root, polynomial) or use non-linear models.
𝗡𝗼𝗿𝗺𝗮𝗹𝗶𝘁𝘆 𝗼𝗳 𝗘𝗿𝗿𝗼𝗿𝘀
The residuals are normally distributed. This matters mainly for conducting valid statistical tests and constructing confidence intervals, especially in small samples.
How to evaluate? - A Q-Q plot of the residuals (points should fall roughly on the diagonal), a histogram of the residuals, or a formal test such as Shapiro-Wilk. (Residual autocorrelation plots and the Durbin-Watson test check independence of errors - a separate assumption, relevant mainly for time-series data.)
How to fix? - Transform the dependent variable (log, Box-Cox) and/or check for outliers.
𝗛𝗼𝗺𝗼𝘀𝗰𝗲𝗱𝗮𝘀𝘁𝗶𝗰𝗶𝘁𝘆 (𝗖𝗼𝗻𝘀𝘁𝗮𝗻𝘁 𝗩𝗮𝗿𝗶𝗮𝗻𝗰𝗲 𝗼𝗳 𝗘𝗿𝗿𝗼𝗿𝘀)
The variance of the residuals (errors) is constant across all levels of the independent variables. In other words, the spread of residuals should not increase or decrease as the predicted values increase.
How to evaluate? - Plot the residuals against the fitted values. A "fan" shape (spread of residuals that grows or shrinks with the fitted values) indicates heteroscedasticity; the Breusch-Pagan test is a formal check.
How to fix? - Transform the dependent variable (log, Box-Cox), use weighted least squares, or report heteroscedasticity-robust standard errors.
𝗡𝗼 𝗠𝘂𝗹𝘁𝗶𝗰𝗼𝗹𝗹𝗶𝗻𝗲𝗮𝗿𝗶𝘁𝘆
The independent variables (predictors) are not highly correlated with each other. High correlation between predictors can lead to multicollinearity, which makes it difficult to determine the individual effect of each predictor on the dependent variable.
How to evaluate? - Calculate the Variance Inflation Factor (VIF) for each predictor; a common rule of thumb flags VIF above 5-10. A correlation matrix of the predictors is a quick first check.
How to fix? - Remove or combine correlated predictors (e.g., via Principal Component Analysis), or use regularized regression such as Ridge or Lasso.