Machine Learning And AI

Hey Guys,
Welcome to my YouTube channel, Geeky Codes. On this channel, I create videos related to data science. I've started a series called the SQL 50 Day Plan, and this is the 3rd video in the series. Please do not forget to subscribe to my channel if you're…


https://geekycodesin.wordpress.com/2024/11/17/minimum-number-of-platforms-required-for-trains-a-problem-solution-approach/


𝗗𝗮𝘁𝗮 𝗦𝗰𝗶𝗲𝗻𝗰𝗲 𝗜𝗻𝘁𝗲𝗿𝘃𝗶𝗲𝘄 𝗤𝘂𝗲𝘀𝘁𝗶𝗼𝗻:
How do outliers impact kNN?

Outliers can significantly impact the performance of kNN, leading to inaccurate predictions due to the model's reliance on proximity for decision-making. Here’s a breakdown of how outliers influence kNN:

𝗛𝗶𝗴𝗵 𝗩𝗮𝗿𝗶𝗮𝗻𝗰𝗲
The presence of outliers can increase the model's variance, as predictions near outliers may fluctuate unpredictably depending on which neighbors are included. This makes the model less reliable for regression tasks with scattered or sparse data.

𝗗𝗶𝘀𝘁𝗮𝗻𝗰𝗲 𝗠𝗲𝘁𝗿𝗶𝗰 𝗦𝗲𝗻𝘀𝗶𝘁𝗶𝘃𝗶𝘁𝘆
kNN relies on distance metrics, which can be significantly affected by outliers. In high-dimensional spaces, outliers can increase the range of distances, making it harder for the algorithm to distinguish between nearby points and those farther away. This issue can lead to an overall reduction in accuracy as the model’s ability to effectively measure "closeness" degrades.

𝗥𝗲𝗱𝘂𝗰𝗲𝗱 𝗣𝗲𝗿𝗳𝗼𝗿𝗺𝗮𝗻𝗰𝗲 𝗶𝗻 𝗖𝗹𝗮𝘀𝘀𝗶𝗳𝗶𝗰𝗮𝘁𝗶𝗼𝗻/𝗥𝗲𝗴𝗿𝗲𝘀𝘀𝗶𝗼𝗻 𝗧𝗮𝘀𝗸𝘀
Outliers near class boundaries can pull the decision boundary toward them, potentially misclassifying nearby points that should belong to a different class. This is particularly problematic if k is small, as individual points (like outliers) have a greater influence. The same happens in regression: an outlier among the k nearest neighbors skews the averaged prediction, and the smaller k is, the larger the skew.
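The effect of a small k can be seen in a minimal sketch of kNN regression on one feature. The toy data (y = 2x plus a single outlier at x = 5) and the helper name `knn_regress` are assumptions made for illustration, not part of the original question.

```python
from statistics import mean

def knn_regress(train, x, k):
    """Predict y at x as the mean target of the k nearest 1-D training points."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return mean(y for _, y in nearest)

# Hypothetical toy data following y = 2x, with one outlier at x = 5.
train = [(1, 2), (2, 4), (3, 6), (4, 8), (5, 100), (6, 12), (7, 14)]

# With k=1 the prediction is the outlier's target itself.
pred_k1 = knn_regress(train, 5.1, k=1)
# With k=5 the outlier is averaged with four clean neighbors.
pred_k5 = knn_regress(train, 5.1, k=5)

print(pred_k1, pred_k5)
```

Neither prediction matches the underlying trend (about 10.2 at x = 5.1), but the larger k dilutes the outlier's influence considerably.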

𝗗𝗶𝘀𝗽𝗿𝗼𝗽𝗼𝗿𝘁𝗶𝗼𝗻𝗮𝘁𝗲 𝗙𝗲𝗮𝘁𝘂𝗿𝗲 𝗜𝗻𝗳𝗹𝘂𝗲𝗻𝗰𝗲
If certain features contain outliers, they can dominate the distance calculations and overshadow the impact of other features. For example, an outlier in a high-magnitude feature may cause distances to be determined largely by that feature, affecting the quality of the neighbor selection.
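As a rough illustration of this point, the sketch below compares nearest neighbors on raw features and on robustly scaled features. The (age, income) rows are made-up values, and scaling by median and interquartile range is one possible mitigation chosen for this example; the original post does not prescribe it.

```python
import math
from statistics import median, quantiles

def euclidean(a, b):
    """Plain Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def robust_scale(rows):
    """Rescale every column as (value - median) / IQR."""
    scaled_cols = []
    for col in zip(*rows):
        q1, _, q3 = quantiles(col, n=4)
        iqr = (q3 - q1) or 1.0  # guard against a zero IQR
        med = median(col)
        scaled_cols.append([(v - med) / iqr for v in col])
    return [tuple(row) for row in zip(*scaled_cols)]

def nearest_index(rows, i):
    """Index of the row closest to rows[i], excluding rows[i] itself."""
    others = [j for j in range(len(rows)) if j != i]
    return min(others, key=lambda j: euclidean(rows[i], rows[j]))

# Hypothetical (age, income) rows; the last one has an outlying income.
rows = [
    (30, 50_000),     # 0: query point
    (31, 58_000),     # 1: almost the same age, somewhat higher income
    (60, 50_500),     # 2: twice the age, almost the same income
    (25, 40_000),
    (45, 65_000),
    (38, 1_000_000),  # 5: income outlier
]

raw_nearest = nearest_index(rows, 0)                   # income dominates the distance
scaled_nearest = nearest_index(robust_scale(rows), 0)  # both features contribute

print(raw_nearest, scaled_nearest)
```

On the raw values the 60-year-old is treated as the closest match because income differences swamp age differences; after scaling, the 31-year-old becomes the nearest neighbor.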


https://geekycodesin.wordpress.com/2024/11/19/questions-asked-in-data-scientist-interviews-part-6/

Geeky Codes: Questions asked in Data Scientist Interviews Part 6

Q1. What is Data Science? List the differences between supervised and unsupervised learning. Data Science is a blend of various tools, algorithms, and machine learning principles with the goal to d…


Company Name : Amazon
Role : Cloud Support Associate
Batch : 2024/2023 passouts

Link : https://www.amazon.jobs/en/jobs/2676989/cloud-support-associate


tiktok data science interview questions.pdf

251 KB


Company Name : Swiggy
Role : Associate Software Engineer
Batch : 2024/2023/2022 passouts

Link : https://docs.google.com/forms/d/1E029cjZV8Em6zPC0YJYAMDDP_NjPtDkwufqHfvkVG2E/viewform?edit_requested=true&pli=1


https://geekycodes.in/2024/11/20/questions-asked-in-data-scientist-interviews-part-7-2/


https://geekycodes.in/2024/11/21/questions-asked-in-data-scientist-interviews-part-8-2/

Geeky Codes: Questions asked in Data Scientist Interviews Part 8

Q5. What is the difference between “long” and “wide” format data? In the wide-format, a subject’s repeated responses will be in a single row, and each response is in a separate column. In the long-…


https://geekycodes.in/2024/11/21/questions-asked-in-data-scientist-interviews-part-9/

Geeky Codes: Questions asked in Data Scientist Interviews Part 9

Q1 . What is correlation and covariance in statistics? Correlation is considered or described as the best technique for measuring and also for estimating the quantitative relationship between two v…


https://geekycodes.in/2024/11/26/questions-asked-in-data-scientist-interviews-part-10/


https://geekycodes.in/2024/11/26/questions-asked-in-data-scientist-interviews-part-11/


https://geekycodes.in/2024/11/26/questions-asked-in-data-scientist-interviews-part-12/


https://geekycodes.in/2024/11/28/understanding-the-singleton-pattern-in-nestjs/

Geeky Codes: Understanding the Singleton Pattern in NestJS

When building applications in NestJS, one of the fundamental design patterns that you’ll encounter is the singleton pattern. This pattern plays a crucial role in managing service lifecycles, …


https://geekycodes.in/2024/11/28/the-complete-guide-to-the-manual-testing-process/


https://geekycodes.in/2024/11/29/exploring-java-8-a-game-changer-in-modern-software-development/


https://geekycodes.in/2024/12/10/changing-the-default-port-in-sql-server-a-complete-guide/


Data science interview questions:

Position: Data Scientist
There were 3 interview rounds followed by 1 HR discussion.

Coding-related questions:

1. You are given 2 lists:
l1 = [1,2,2,3,4,5,6,6,7]
l2 = [1,2,4,5,5,6,6,7]
Return all the elements from list l2 that also appear in l1, respecting their frequency of occurrence (an element is returned at most as many times as it appears in l1).
For example, the elements of l2 that occur in l1 are: 1, 2 (only once), 4, 5, 6, 7
Expected Output: [1,2,4,5,6,6,7]
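One possible approach, sketched here with `collections.Counter`: walk through l2 and keep an element only while l1 still has unused occurrences of it. The function name `common_by_frequency` is made up for this example.

```python
from collections import Counter

def common_by_frequency(l1, l2):
    """Return elements of l2 that appear in l1, capped by their count in l1."""
    remaining = Counter(l1)  # occurrences of each value still available in l1
    result = []
    for item in l2:
        if remaining[item] > 0:
            result.append(item)
            remaining[item] -= 1
    return result

l1 = [1, 2, 2, 3, 4, 5, 6, 6, 7]
l2 = [1, 2, 4, 5, 5, 6, 6, 7]
print(common_by_frequency(l1, l2))  # [1, 2, 4, 5, 6, 6, 7]
```

The second 5 in l2 is dropped because l1 contains 5 only once, while both 6s are kept.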

2. You are given:
text = ' I am a data scientist working in Paypal'
Return the longest word along with its letter count.
Expected Output: ('scientist', 9)
(In case of ties, sort the words in alphabetical order and return the first one.)
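A minimal sketch of one way to answer this: pick the word that is longest, breaking ties alphabetically. Treating the alphabetical comparison as case-insensitive and splitting on whitespace are assumptions; the question does not specify either. The function name `longest_word` is illustrative.

```python
def longest_word(text):
    """Return (word, length) for the longest word; ties go to the alphabetically first."""
    words = text.split()  # splits on any whitespace and ignores leading/trailing spaces
    # Sort key: longer words first, then alphabetical order (case-insensitive, an assumption).
    best = min(words, key=lambda w: (-len(w), w.lower()))
    return best, len(best)

text = ' I am a data scientist working in Paypal'
print(longest_word(text))  # ('scientist', 9)
```

Note that this sketch raises `ValueError` for text with no words; an interviewer may ask how that case should be handled.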

3. You are given 2 tables in SQL. Table 1 contains one column holding a single value, '3', repeated 5 times. Table 2 contains one column holding 2 values: '2' repeated 3 times and '3' repeated 4 times. Tell me the number of records returned by an inner, left, right, outer, and cross join.
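The row counts can be reasoned out by counting matches per key. The sketch below does that arithmetic in Python instead of running SQL; it assumes an equality join on the single column and no NULL values, and the name `join_row_counts` is made up for this example.

```python
from collections import Counter

def join_row_counts(left, right):
    """Row counts for each join type on a single-column equality join (no NULLs)."""
    l, r = Counter(left), Counter(right)
    # Each matching key yields (rows in left) * (rows in right) combined rows.
    inner = sum(count * r[key] for key, count in l.items() if key in r)
    left_only = sum(count for key, count in l.items() if key not in r)
    right_only = sum(count for key, count in r.items() if key not in l)
    return {
        "inner": inner,
        "left": inner + left_only,
        "right": inner + right_only,
        "full_outer": inner + left_only + right_only,
        "cross": len(left) * len(right),
    }

table1 = [3] * 5            # '3' repeated 5 times
table2 = [2] * 3 + [3] * 4  # '2' repeated 3 times, '3' repeated 4 times
print(join_row_counts(table1, table2))
```

Under these assumptions the counts are inner 20, left 20, right 23, full outer 23, and cross 35: every '3' in table 1 matches all four '3's in table 2, and the three unmatched '2' rows appear only in the right and full outer joins.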

4. You are given a transaction table with 4 columns: txn_id (primary key), cust_id (foreign key), txn_date (datetime), and txn_amt. One cust_id can have multiple txn_ids. Your job is to find all the cust_ids that have at least 2 txns made within 10 seconds of each other. (This might help to identify fraudulent patterns.)
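The interview asks for SQL, where a window function such as LAG over each customer's ordered transactions is a common route. The same logic is sketched here in Python: sort each customer's timestamps and compare consecutive ones. The sample rows are made up, and "within 10 seconds" is read as a gap of 10 seconds or less, which is an assumption.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def customers_with_rapid_txns(txns, window=timedelta(seconds=10)):
    """Return cust_ids having at least two transactions no more than `window` apart."""
    by_customer = defaultdict(list)
    for _txn_id, cust_id, txn_date, _txn_amt in txns:
        by_customer[cust_id].append(txn_date)

    flagged = set()
    for cust_id, times in by_customer.items():
        times.sort()
        # If any two txns are within the window, some consecutive pair is too.
        if any(later - earlier <= window for earlier, later in zip(times, times[1:])):
            flagged.add(cust_id)
    return flagged

# Hypothetical rows: (txn_id, cust_id, txn_date, txn_amt)
txns = [
    (1, "C1", datetime(2024, 1, 1, 10, 0, 0), 50.0),
    (2, "C1", datetime(2024, 1, 1, 10, 0, 7), 20.0),  # 7 s after txn 1
    (3, "C2", datetime(2024, 1, 1, 10, 0, 0), 10.0),
    (4, "C2", datetime(2024, 1, 1, 10, 5, 0), 15.0),  # 5 min after txn 3
    (5, "C3", datetime(2024, 1, 1, 9, 0, 0), 99.0),   # single txn
]
print(customers_with_rapid_txns(txns))  # {'C1'}
```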

Case study questions:
1. Tell me the business model and revenue sources of Paypal.
Tell me the general consequences of changing the pricing of any product.

2. You are a data scientist who studies the impact of pricing changes. Suppose the business team comes to you and asks what will happen if they increase the merchant fee per txn by 5%.
What will be your recommendation and strategy?
What factors will you consider, and what will the proposed ML solution look like?

3. You see that some merchants are heavily misusing the refund facility (merchants get a refund for incorrect or disputed txns); they are claiming reimbursements by making fake txns.
List possible scenarios and ways to identify such merchants.

4. How will you decide the pricing of a premium product like the iPhone in India vs., say, South Africa? What factors will you consider?

Statistics Questions:
1. What is multicollinearity?
2. What are Type 1 and Type 2 errors? (Explain from a pricing experimentation point of view.)
3. What is the Weibull distribution?
4. What is the CLT? What is the difference between the t and normal distributions?
5. What is the Wald test?
6. What is the Ljung-Box test? Explain the null hypothesis of the ADF test in time series.
7. What is causality?

ML Questions:
1. What is logistic regression? What is deviance?
2. What is the difference between R-squared and adjusted R-squared?
3. How does Random Forest work?
4. What is the difference between bagging and boosting?
In terms of the bias-variance trade-off, what do bagging and boosting each attempt to solve?


I struggled with Data Science interviews until...

I followed this roadmap:

𝗣𝘆𝘁𝗵𝗼𝗻
👉🏼 Master the basics: syntax, loops, functions, and data structures (lists, dictionaries, sets, tuples)
👉🏼 Learn Pandas & NumPy for data manipulation
👉🏼 Matplotlib & Seaborn for data visualization

𝗦𝘁𝗮𝘁𝗶𝘀𝘁𝗶𝗰𝘀 & 𝗣𝗿𝗼𝗯𝗮𝗯𝗶𝗹𝗶𝘁𝘆
👉🏼 Descriptive statistics: mean, median, mode, standard deviation
👉🏼 Probability theory: distributions, Bayes' theorem, conditional probability
👉🏼 Hypothesis testing & A/B testing

𝗠𝗮𝗰𝗵𝗶𝗻𝗲 𝗟𝗲𝗮𝗿𝗻𝗶𝗻𝗴
👉🏼 Supervised vs. unsupervised learning
👉🏼 Key algorithms: Linear & Logistic Regression, Decision Trees, Random Forest, KNN, SVM
👉🏼 Model evaluation metrics: accuracy, precision, recall, F1 score, ROC-AUC
👉🏼 Cross-validation & hyperparameter tuning

𝗗𝗲𝗲𝗽 𝗟𝗲𝗮𝗿𝗻𝗶𝗻𝗴
👉🏼 Neural Networks & their architecture
👉🏼 Working with Keras & TensorFlow/PyTorch
👉🏼 CNNs for image data and RNNs for sequence data

𝗗𝗮𝘁𝗮 𝗖𝗹𝗲𝗮𝗻𝗶𝗻𝗴 & 𝗙𝗲𝗮𝘁𝘂𝗿𝗲 𝗘𝗻𝗴𝗶𝗻𝗲𝗲𝗿𝗶𝗻𝗴
👉🏼 Handling missing data, outliers, and data scaling
👉🏼 Feature selection techniques (e.g., correlation, mutual information)

𝗡𝗟𝗣 (𝗡𝗮𝘁𝘂𝗿𝗮𝗹 𝗟𝗮𝗻𝗴𝘂𝗮𝗴𝗲 𝗣𝗿𝗼𝗰𝗲𝘀𝘀𝗶𝗻𝗴)
👉🏼 Tokenization, stemming, lemmatization
👉🏼 Bag-of-Words, TF-IDF
👉🏼 Sentiment analysis & topic modeling

𝗖𝗹𝗼𝘂𝗱 𝗮𝗻𝗱 𝗕𝗶𝗴 𝗗𝗮𝘁𝗮
👉🏼 Understanding cloud services (AWS, GCP, Azure) for data storage & computing
👉🏼 Working with distributed data using Spark
👉🏼 SQL for querying large datasets

Don’t get overwhelmed by the breadth of topics. Start small—master one concept, then move to the next. 📈

You’ve got this! 💪🏼
