Top 16 #AI websites for #Interview Preparation for #Jobseekers!
1) Huru.ai
AI-powered interview prep with tailored questions.
2) Talkberry.ai
Language learning with simulated English job interviews.
3) Interviewigniter.com
AI roleplay simulations for post-interview evaluations.
4) AI Mock Interview - (Sqlpad.io)
Tailored interview practice with personalized feedback.
5) Rightjoin.co
Customized mock interviews based on resumes and job postings.
6) Interviewsby.ai
Custom mock interviews with real-time voice feedback.
7) Jobinterview-ai.com
Real-time AI-assisted English interview practice.
8) Interview Coach
AI-generated job-specific interview questions and guidance.
9) InterviewGPT.ai
AI-powered practice sessions and personalized feedback.
10) Interviewai.me
AI-generated personalized cover letters and interview questions.
11) Interviewprep-ai.com
Streamlined CV integration and customized interview practice.
12) Interview warmup (grow.google)
Practice platform for answering interview questions with transcription.
13) Metaview.ai
Interview Notes
14) Applyish.com
Apply Automatically
15) Hnresumetojobs.com
Resume to jobs
16) Matchthaoleai.com
Job search
Job trends for software developers in the next 5 years:
[1] Most startups are starting to realize that they need to control profits, not just revenues. Many unicorns that were on track for an IPO had to delay it due to a lack of profitability.
For software developers: This means that startups may be less likely to offer high salaries to attract new talent. However, it also means that startups are becoming more focused on profitability, which is a good thing for the industry in the long term.
[2] All smart CEOs have started to focus on their personal brands. This is evident on LinkedIn as well. People like to buy from people, not from companies. That is why almost every 'Shark' now features in their own advertisements. Why? Personal brand.
Take a cue and start cultivating your personal brand as well.
For software developers: Building a personal brand is a great way to attract new job opportunities and build a following. You can do this by writing blog posts, creating videos, or speaking at industry events.
[3] Gone are the days when you could depend on one job. Firms are not loyal to you, and any smart employee can see this. If you are not building backups already, you are doing yourself a disservice, because the volatile job market might one day hit you badly.
For software developers: Software developers are in high demand, so they have more flexibility to take on multiple jobs. This can be a great way to increase income and diversify your skills. For example, you could work as a full-time software engineer and also freelance as a software developer on the side.
[4] Rise of remote jobs:
To cut a long story short, if work can be done from home, why would you waste your time, effort, and energy travelling? Not every job fits into a 'work from home' culture, but many do.
For software developers: Software development is a job that can easily be done remotely, so many companies are now offering remote positions. This can be a great benefit for software developers who want more flexibility in their work-life balance.
[5] With the rise of financial & career education, most people would prefer decent money + family time (OVER) crazy money + no time.
And owning small businesses is one way of fulfilling this goal.
For software developers: Starting a small business can be a great way for software developers to have more control over their work and earn more money. Some examples of small businesses that software developers can start include developing and selling software products, providing software consulting services, or freelancing as a software developer.
[6] As more and more businesses move online, the demand for software developers will continue to grow. Software developers are responsible for building and maintaining the digital infrastructure that businesses need to operate in the digital age.
P.S. I am not an expert and these are speculations
Date - 30/12/2023
Company Name - Course i5
Role: Data Scientist
Q. How can outlier values be treated?
A. An outlier is an observation in a dataset that differs significantly from the rest of the data. In other words, an outlier is much larger or smaller than the rest of the values.
Common methods of treating outliers include trimming (removing the outlier), quantile-based flooring and capping, and mean/median imputation.
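The three treatments above can be sketched in a few lines of plain Python. This is only an illustration: the data is made up, the 10th/90th percentile cut-offs are an assumption (other bands are equally valid), and `percentile` is a small hypothetical helper rather than a library function.

```python
import statistics

def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in [0, 100]) of a pre-sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)

data = [12, 14, 15, 13, 16, 14, 15, 120]  # made-up sample; 120 is the obvious outlier
s = sorted(data)
floor, cap = percentile(s, 10), percentile(s, 90)  # assumed cut-offs

# 1) Trimming: drop values outside the chosen quantile band.
trimmed = [x for x in data if floor <= x <= cap]

# 2) Flooring and capping: clip values to the band instead of dropping them.
capped = [min(max(x, floor), cap) for x in data]

# 3) Median imputation: replace out-of-band values with the median.
median = statistics.median(data)
imputed = [x if floor <= x <= cap else median for x in data]
```

Note that a quantile band also touches the smallest value (12 here), which is expected behaviour for quantile-based treatment rather than a bug.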
Q. What is root cause analysis?
A. A root cause is a factor that contributed to a nonconformance and should be eliminated permanently through process improvement. It is the most fundamental reason behind a problem: the one that sets in motion the entire cause-and-effect chain leading to the problem(s). Root cause analysis (RCA) is a term that covers a variety of approaches, tools, and procedures used to identify the root causes of problems. Some RCA approaches are geared toward uncovering the actual root causes, others are more general problem-solving procedures, and yet others simply support the core root cause analysis activity.
Q. What is bias and variance in Data Science?
A. Bias is the error introduced by a model's simplifying assumptions, which make the target function easier to estimate. In its most basic form, bias is the difference between the model's average prediction and the true value. Variance refers to how much the estimate of the target function fluctuates when the model is trained on different training data. In contrast to bias, high variance arises when the model fits the fluctuations, or noise, in the training data.
Q. What is a confusion matrix?
A. A confusion matrix is a table that summarises a classification algorithm's performance. It helps you understand what your classification model is getting right and where it is going wrong. It gives four counts: "true positive" for event values that were correctly predicted, "false positive" for no-event values that were mistakenly predicted as events, "true negative" for no-event values that were correctly predicted, and "false negative" for event values that were mistakenly predicted as no-events.
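As a quick illustration, the four counts can be tallied by hand for a binary problem. The labels below are made up, and `confusion_counts` is a hypothetical helper written for this sketch, not a library call.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Tally TP, FP, TN and FN for a binary classification problem."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1   # event correctly predicted
            else:
                fp += 1   # no-event mistakenly predicted as event
        else:
            if actual == positive:
                fn += 1   # event mistakenly predicted as no-event
            else:
                tn += 1   # no-event correctly predicted
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # made-up ground truth
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # made-up predictions
counts = confusion_counts(y_true, y_pred)
accuracy = (counts["TP"] + counts["TN"]) / len(y_true)
```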
————————————————————-
Date: 02-01-2024
Company name: Oracle
Role: Data Analyst
Topic: outlier, dax, filter, recursive stored procedure
1. What are the ways to detect outliers?
Two common methods are used to detect outliers:
Box Plot Method: According to this method, a value is considered an outlier if it lies more than 1.5*IQR (interquartile range) above the top quartile (Q3) or more than 1.5*IQR below the bottom quartile (Q1), that is, above Q3 + 1.5*IQR or below Q1 - 1.5*IQR.
Standard Deviation Method: According to this method, an outlier is defined as a value that is greater than the mean + (3*standard deviation) or lower than the mean - (3*standard deviation).
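Both rules are easy to try out with Python's standard `statistics` module. The sketch below uses made-up data; the quartiles come from `statistics.quantiles` with its default method, and other quartile conventions can give slightly different fences.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Box plot rule: flag values beyond k*IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < low or x > high]

def sd_outliers(values, k=3):
    """Standard deviation rule: flag values more than k SDs from the mean."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [x for x in values if abs(x - mean) > k * sd]

# Made-up sample: 19 ordinary readings plus one extreme value.
data = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11,
        12, 10, 11, 13, 12, 11, 10, 12, 11, 100]

by_iqr = iqr_outliers(data)
by_sd = sd_outliers(data)
```

On small samples the standard deviation rule can miss outliers, because the extreme value inflates the standard deviation it is measured against.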
2. What is a Recursive Stored Procedure?
A stored procedure that calls itself until a boundary condition is reached is called a recursive stored procedure. Recursion lets programmers reuse the same block of code repeatedly, as and when required.
3. What is the shortcut to add a filter to a table in EXCEL?
The filter mechanism is used when you want to display only specific data from the entire dataset. By doing so, there is no change being made to the data. The shortcut to add a filter to a table is Ctrl+Shift+L.
4. What is DAX in Power BI?
DAX stands for Data Analysis Expressions. It's a collection of functions, operators, and constants used in formulas to calculate and return values. In other words, it helps you create new information from data you already have.
Date: 03-01-2024
Company : Accenture
Role : Data Scientist
Topic : Silhouette coeff, Trend&Seasonality, Bag of words, Self join
1. What do you understand by the term silhouette coefficient?
The silhouette coefficient is a measure of how well clustered together a data point is with respect to the other points in its cluster. It is a measure of how similar a point is to the points in its own cluster, and how dissimilar it is to the points in other clusters. The silhouette coefficient ranges from -1 to 1, with 1 being the best possible score and -1 being the worst possible score.
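For a single point, the usual formula is s = (b - a) / max(a, b), where a is the mean distance to the other points in its own cluster and b is the lowest mean distance to the points of any other cluster. A minimal sketch, assuming Euclidean distance, at least two points per cluster, and made-up 2-D points:

```python
import math

def silhouette(point_idx, points, labels):
    """Silhouette coefficient of one point (assumes >= 2 points per cluster)."""
    own = labels[point_idx]
    dists = {}  # cluster label -> distances from the chosen point
    for j, (p, lab) in enumerate(zip(points, labels)):
        if j != point_idx:
            dists.setdefault(lab, []).append(math.dist(points[point_idx], p))
    a = sum(dists[own]) / len(dists[own])               # cohesion
    b = min(sum(d) / len(d)                             # separation
            for lab, d in dists.items() if lab != own)
    return (b - a) / max(a, b)

points = [(0, 0), (0, 1), (5, 5), (5, 6)]
well_clustered = silhouette(0, points, [0, 0, 1, 1])    # sensible labels
badly_clustered = silhouette(0, points, [0, 1, 0, 1])   # scrambled labels
```

With sensible labels the score is close to 1; with scrambled labels it turns negative.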
2. What is the difference between trend and seasonality in time series?
Trends and seasonality are two characteristics of time series metrics that break many models. Trends are continuous increases or decreases in a metric’s value. Seasonality, on the other hand, reflects periodic (cyclical) patterns that occur in a system, usually rising above a baseline and then decreasing again.
3. What is Bag of Words in NLP?
Bag of Words is a commonly used model that relies on word frequencies or occurrences to train a classifier. It creates an occurrence matrix for documents or sentences, irrespective of their grammatical structure or word order.
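Here is a tiny sketch of such an occurrence matrix, built with only the standard library. The two sentences are made up, and splitting on whitespace is a simplifying assumption; real pipelines usually do more careful tokenisation.

```python
from collections import Counter

docs = ["the cat sat on the mat", "the dog sat on the log"]  # made-up corpus

# Tokenise naively on whitespace and build a sorted vocabulary.
tokenized = [doc.lower().split() for doc in docs]
vocab = sorted({word for doc in tokenized for word in doc})

# One row per document, one column per vocabulary word; word order is discarded.
matrix = [[Counter(doc)[word] for word in vocab] for doc in tokenized]
```

Each row can then be fed to a classifier as a feature vector.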
4. What is a Self-Join?
A self-join is a type of join in which a table is joined to itself, so it expresses a unary relationship. Each row of the table can be matched with itself and with other rows of the same table, subject to the join condition. As a result, a self-join is mostly used to combine and compare rows from the same database table.
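A classic use is pairing each employee with their manager when both live in the same table. The sketch below runs the SQL through Python's built-in `sqlite3` module; the `employees` table, its columns, and the names are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "Asha", None), (2, "Ravi", 1), (3, "Meena", 1), (4, "John", 2)],
)

# The same table appears twice under different aliases: e = employee, m = manager.
rows = conn.execute(
    """
    SELECT e.name, m.name
    FROM employees AS e
    JOIN employees AS m ON e.manager_id = m.id
    ORDER BY e.id
    """
).fetchall()
conn.close()
```

Asha has no manager, so the inner join leaves her out; a LEFT JOIN would keep her with a NULL manager.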
5. Explain the Law of Large Numbers.
The 'Law of Large Numbers' states that if an experiment is repeated independently a large number of times, the average of the individual results gets close to the expected value. Likewise, the sample variance and sample standard deviation converge towards the population variance and standard deviation.
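A quick simulation makes this concrete: the average of fair die rolls should drift toward the expected value of 3.5 as the number of rolls grows. This is only a sketch; the seed and the sample sizes are arbitrary choices.

```python
import random

def mean_of_die_rolls(n_rolls, rng):
    """Average of n_rolls simulated fair six-sided die rolls."""
    return sum(rng.randint(1, 6) for _ in range(n_rolls)) / n_rolls

rng = random.Random(0)        # fixed seed so the run is repeatable
expected_value = 3.5          # (1 + 2 + 3 + 4 + 5 + 6) / 6

small_sample = mean_of_die_rolls(10, rng)
large_sample = mean_of_die_rolls(100_000, rng)

print(f"10 rolls:      {small_sample:.3f}")
print(f"100,000 rolls: {large_sample:.3f}")
```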
Java_Built_in_Methods_Cheatsheet_1702836925.pdf
87.9 KB
Interview QnAs
Company - Bosch
Date- 09/01/2024
Role: Data Scientist
1. What is a logistic function? What is the range of values of a logistic function?
f(z) = 1 / (1 + e^(-z))
The values of a logistic function lie strictly between 0 and 1, while the input z can vary from -infinity to +infinity.
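The function is a one-liner in Python, but a naive version overflows for very negative z. The sketch below branches on the sign of z to stay numerically stable (one common way to do it, not the only one).

```python
import math

def logistic(z):
    """Logistic (sigmoid) function, written to avoid overflow in exp()."""
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    ez = math.exp(z)          # safe: z is negative here, so exp(z) < 1
    return ez / (1 + ez)

midpoint = logistic(0)        # exactly 0.5
near_zero = logistic(-50)     # tiny but positive
near_one = logistic(50)       # very close to 1
```

In floating point, very large inputs round to exactly 1.0 and very small ones to 0.0, even though the mathematical function never reaches either bound.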
2. What is the difference between R square and adjusted R square?
R square and adjusted R square are both used to assess model fit in linear regression. R square is the proportion of variation in the dependent variable that is explained by all the independent variables in the model, and it never decreases when another variable is added, whether or not that variable is useful. Adjusted R square penalises the number of independent variables, so it increases only when a new variable improves the fit more than would be expected by chance.
Thus adjusted R2 is never greater than R2, and it is strictly smaller unless the fit is perfect.
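The usual formula is adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - p - 1), where n is the number of observations and p the number of independent variables. A small sketch with made-up values, assuming the predictions come from a model with a single predictor:

```python
def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot

def adjusted_r_squared(r2, n, p):
    """Penalise R^2 for the number of predictors p, given n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

y_true = [3, 5, 7, 9, 11]                 # made-up observations
y_pred = [2.8, 5.1, 7.2, 8.9, 11.0]       # made-up predictions, assumed p = 1

r2 = r_squared(y_true, y_pred)
adj_r2 = adjusted_r_squared(r2, n=len(y_true), p=1)
```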
3. What is stratify in Train_test_split?
Stratification means that the train_test_split method returns training and test subsets that have the same proportions of class labels as the input dataset. So if the input data has 60% 0's and 40% 1's as class labels, the train and test datasets will have similar proportions.
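To see what stratification does conceptually, here is a pure-Python sketch that splits indices class by class. It is not scikit-learn's implementation; `stratified_split` is a hypothetical helper, and the 25% test size and seed are arbitrary assumptions.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.25, seed=0):
    """Split indices so each class keeps roughly the same share in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)                       # shuffle within each class
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

labels = [0] * 60 + [1] * 40                       # 60% zeros, 40% ones
train_idx, test_idx = stratified_split(labels)
test_share_of_zeros = sum(labels[i] == 0 for i in test_idx) / len(test_idx)
```

Both subsets keep the 60/40 class balance of the input.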
4. What is Backpropagation in an Artificial Neural Network?
Backpropagation is the method of fine-tuning the weights of a neural network based on the error obtained in the previous iteration. It propagates the error backwards through the network, using the chain rule to work out how much each weight contributed to it, so that the weights can be adjusted accordingly. Proper tuning of the weights reduces error rates and makes the model more reliable by improving its generalization.
Date - 10-01-2024
Company name: LTI
Role: ML Engineer
Topic: cart, cross validation, knn, pandas data structures
1. Mention the different types of data structures in pandas.
The pandas library supports two main data structures, Series and DataFrame, both built on top of NumPy. A Series is a one-dimensional labelled array and a DataFrame is a two-dimensional labelled table. Older versions also had a three-dimensional structure called Panel, with axes named items, major_axis, and minor_axis, but Panel was deprecated and has been removed from current pandas releases.
2. Why is KNN a non-parametric Algorithm?
The term "non-parametric" refers to not making any assumptions about the underlying data distribution. Such methods do not have a fixed number of parameters in the model.
KNN fits this description: it effectively keeps the whole training set, so the number of model parameters grows with the training data, with each training case acting as a parameter of the model. So, KNN is a non-parametric algorithm.
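A bare-bones KNN classifier shows why: there is no training step that compresses the data into a fixed set of weights, and every prediction looks at the stored training cases directly. This is a minimal sketch with made-up points, Euclidean distance, and k = 3 as assumptions.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Predict by majority vote among the k nearest stored training points."""
    neighbours = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# The "model" is just the training data itself.
train_points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
train_labels = ["a", "a", "a", "b", "b", "b"]

near_a = knn_predict(train_points, train_labels, (1.5, 1.5))
near_b = knn_predict(train_points, train_labels, (6.5, 6.5))
```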
3. Explain the CART Algorithm for Decision Trees.
CART stands for Classification and Regression Trees. It is a variation of the decision tree algorithm that can handle both classification and regression tasks. CART is a greedy algorithm: it searches for the best split at the top level, then repeats the same process at each subsequent level. It does not check whether a split will lead to the lowest possible impurity several levels down, so the solution is not guaranteed to be optimal. It often produces a reasonably good solution, though, which matters because finding the optimal tree is known to be an NP-complete problem, and no efficient (polynomial-time) algorithm for it is known.
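One greedy step can be sketched as follows: try every candidate threshold on a single numeric feature and keep the one with the lowest weighted Gini impurity. This is a simplified illustration with made-up data, not a full CART implementation (no recursion, one feature, classification only).

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(xs, ys):
    """Return (threshold, weighted impurity) of the best single split."""
    best = (None, float("inf"))
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2            # midpoint between neighbouring values
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best[1]:
            best = (threshold, score)
    return best

xs = [1, 2, 3, 10, 11, 12]                   # made-up feature values
ys = ["no", "no", "no", "yes", "yes", "yes"]
threshold, impurity = best_split(xs, ys)
```

A full tree would apply `best_split` again to each side until a stopping rule is met.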
4. Explain leave-p-out cross validation.
When using this exhaustive method, we hold out p points from the n data points in the dataset. We train the model on the remaining (n – p) data points and test it on the p held-out points. We repeat this process for all possible combinations of p points from the original dataset. Then, to get the final accuracy, we average the accuracies from all these iterations.
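The splits themselves are easy to enumerate with `itertools.combinations`. The sketch below only generates the index splits; fitting and scoring a model on each one is left out, and `leave_p_out_splits` is a hypothetical helper name.

```python
from itertools import combinations

def leave_p_out_splits(n, p):
    """Yield (train_indices, test_indices) for every way of holding out p of n points."""
    all_indices = set(range(n))
    for test in combinations(range(n), p):
        train = sorted(all_indices - set(test))
        yield train, list(test)

splits = list(leave_p_out_splits(n=5, p=2))
n_splits = len(splits)        # C(5, 2) = 10 train/test combinations
first_train, first_test = splits[0]
```

The number of splits grows as C(n, p), which is why this method quickly becomes expensive on larger datasets.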
————————————————————-
Stay Safe & Happy Learning💙
Today's Interview QnAs
Date : 11/01/2024
Company : KPMG
Role: Data Scientist
Q1. You are given a data set. The data set has missing values that spread along 1 standard deviation from the median. What percentage of data would remain unaffected? Why?
This question has enough hints for you to start thinking! Since the data is spread around the median, let's assume it's a normal distribution. In a normal distribution, ~68% of the data lies within 1 standard deviation of the mean (which coincides with the median and mode), so ~32% of the data lies outside that range. Therefore, ~32% of the data would remain unaffected by missing values.
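The ~68% figure can be checked with the error function: for a normal distribution, the share of data within k standard deviations of the mean is erf(k / sqrt(2)). A short check, assuming normality as the answer does:

```python
import math

# Share of a normal distribution within 1 standard deviation of the mean.
within_one_sd = math.erf(1 / math.sqrt(2))

# Share outside that band, i.e. the data unaffected by the missing values.
unaffected = 1 - within_one_sd

print(f"within 1 SD: {within_one_sd:.4f}, unaffected: {unaffected:.4f}")
```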
Q2. What do you mean by convex hull?
A convex hull represents the outer boundary of a group of data points. In a two-class problem, once a convex hull has been created for each class, we get the maximum margin hyperplane (MMH) as the perpendicular bisector between the two convex hulls. The MMH is the line that attempts to create the greatest separation between the two groups.
A convex hull of a set S is the intersection of all convex sets of which S is a subset. We denote the convex hull of S by [S].
Example:
S = {x : |x| = 1} implies [S] = {x : |x| ≤ 1}
S = {x : |x| ≥ 1} implies [S] = E^n (the whole n-dimensional Euclidean space)
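For a finite set of 2-D points, the hull can be computed with Andrew's monotone chain algorithm. This is one standard technique chosen for illustration (the answer above does not name any algorithm), and the points are made up.

```python
def cross(o, a, b):
    """Z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Return hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:                       # build the lower boundary left to right
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):             # build the upper boundary right to left
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]      # endpoints are shared, so drop duplicates

points = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]   # square plus an interior point
hull = convex_hull(points)
```

The interior point (1, 1) is dropped, leaving only the four corners of the square.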
Q3. You’ve built a random forest model with 10000 trees. You got delighted after getting training error as 0.00. But, the validation error is 34.23. What is going on? Haven’t you trained your model perfectly?
Ans: The model has overfitted. A training error of 0.00 means the classifier has mimicked the training data so closely that the patterns it learned are not present in unseen data. Hence, when the classifier was run on an unseen sample, it could not find those patterns and returned predictions with a higher error. In a random forest, this usually points to individual trees that are too deep or otherwise too flexible; adding trees by itself tends to stabilise predictions rather than cause overfitting. To avoid this situation, we should tune the hyperparameters (tree depth, minimum samples per leaf, number of trees) using cross-validation.
Q4. State one real-life application of convex hulls.
Ans: One application of convex hulls is the construction of convex relaxations. This is a way to find the 'closest' convex problem to a non-convex problem one is attempting to solve.
————————————————————-
Stay Safe & Happy Learning💙
Date: 13-01-2024
Company name: Musigma
Role: ML Engineer
Topic: random forest, linear regression, naive bayes, gradient descent
1. How is the Error calculated in a Linear Regression model?
The error is calculated in three steps:
Measure the distance of the observed y-values from the predicted y-values at each value of x.
Square each of these distances.
Calculate the mean of the squared distances.
MSE = (1/n) * Σ(actual – forecast)^2
The smaller the Mean Squared Error, the closer you are to finding the line of best fit.
How good or bad the final value is always depends on the context of the problem, but the main goal is to make it as small as possible.
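The steps above translate directly into code. A minimal sketch with made-up values:

```python
def mean_squared_error(actual, forecast):
    """MSE = (1/n) * sum of squared differences between actual and forecast."""
    squared_distances = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return sum(squared_distances) / len(squared_distances)

actual = [3.0, 5.0, 7.0]       # made-up observed y-values
forecast = [2.5, 5.0, 8.0]     # made-up predicted y-values

mse = mean_squared_error(actual, forecast)   # (0.25 + 0.0 + 1.0) / 3
```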
2. Explain the intuition behind the Gradient Descent algorithm.
Gradient descent is an optimization algorithm used when training a machine learning model. It iteratively tweaks the model's parameters to move a given function (ideally a convex one) toward a local minimum, that is, a point where the slope is 0.
To start, we select random values for the bias and weights, and then repeatedly step against the slope (gradient) until it is close to 0.
How far we update the bias and weights at each step is controlled by a variable called the learning rate. We have to choose the learning rate wisely because:
A small learning rate may cause the model to take a long time to learn.
A large learning rate may prevent the model from converging, because each update overshoots and we never settle at the minimum.
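The effect of the learning rate is easy to see on a toy function. The sketch below minimises f(w) = (w - 3)^2, whose minimum is at w = 3; the function, starting point, step count, and the three learning rates are all arbitrary choices for illustration.

```python
def gradient_descent(grad, start, learning_rate, steps):
    """Repeatedly step against the gradient and return the final parameter."""
    w = start
    for _ in range(steps):
        w -= learning_rate * grad(w)
    return w

def grad(w):
    """Derivative of f(w) = (w - 3)^2."""
    return 2 * (w - 3)

good = gradient_descent(grad, start=0.0, learning_rate=0.1, steps=100)
slow = gradient_descent(grad, start=0.0, learning_rate=0.001, steps=100)
diverged = gradient_descent(grad, start=0.0, learning_rate=1.1, steps=100)

print(f"lr=0.1   -> {good:.4f}")      # settles near the minimum at 3
print(f"lr=0.001 -> {slow:.4f}")      # still far from 3 after 100 steps
print(f"lr=1.1   -> {diverged:.3e}")  # overshoots further on every step
```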
3. How is a Random Forest related to Decision Trees?
Random forest is an ensemble learning method that works by constructing a multitude of decision trees. A random forest can be constructed for both classification and regression tasks.
A random forest usually outperforms a single decision tree and is much less prone to overfitting the data.
A decision tree trained on a specific dataset can become very deep and overfit. To create a random forest, many decision trees are trained on different bootstrap samples of the training dataset (typically with a random subset of features considered at each split), and their predictions are averaged (or put to a vote for classification) with the goal of decreasing the variance.
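The "train on different subsets, then average" idea can be sketched with plain Python. This is a simplified bagging example, not a full random forest: the base learner is a one-split regression tree (a stump) on a single feature, and there is no random feature selection. All function names and the toy data are assumptions made for the illustration.

```python
import random


def fit_stump(points):
    """Fit a one-split regression tree on (x, y) pairs."""
    xs = sorted({x for x, _ in points})
    overall_mean = sum(y for _, y in points) / len(points)
    # (sum of squared errors, threshold, left prediction, right prediction)
    best = (float("inf"), None, overall_mean, overall_mean)
    for lo, hi in zip(xs, xs[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in points if x <= threshold]
        right = [y for x, y in points if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((y - left_mean) ** 2 for y in left)
        sse += sum((y - right_mean) ** 2 for y in right)
        if sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    if threshold is None:  # sample had a single distinct x, no split possible
        return left_mean
    return left_mean if x <= threshold else right_mean


def fit_bagged(points, n_trees, rng):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    models = []
    for _ in range(n_trees):
        sample = [rng.choice(points) for _ in points]
        models.append(fit_stump(sample))
    return models


def predict_bagged(models, x):
    """Average the predictions of all stumps."""
    return sum(predict_stump(m, x) for m in models) / len(models)


# Toy step-shaped data: y jumps from 0 to 10 at x = 5.
points = [(x, 0.0 if x < 5 else 10.0) for x in range(10)]
models = fit_bagged(points, n_trees=25, rng=random.Random(0))
print(predict_bagged(models, 0), predict_bagged(models, 9))
```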
4. What are some disadvantages of using Naive Bayes Algorithm?
Some disadvantages of using Naive Bayes Algorithm are:
It relies on the strong assumption that the features are independent of each other given the class, which rarely holds in practice.
It is generally not suitable for datasets with large numbers of numerical attributes.
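If a feature value never appears with a class in the training dataset but does appear in the testing dataset, the model assigns it zero probability and will most likely predict wrongly, unless smoothing (for example, Laplace smoothing) is applied.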
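The last point is often called the zero-frequency problem. A small sketch of how additive (Laplace) smoothing avoids it, assuming a word-count style Naive Bayes; the word counts and the helper name are hypothetical.

```python
from collections import Counter


def word_likelihood(word, class_word_counts, vocabulary_size, alpha):
    """P(word | class) with additive smoothing; alpha=0 disables smoothing."""
    total = sum(class_word_counts.values())
    return (class_word_counts[word] + alpha) / (total + alpha * vocabulary_size)


# Hypothetical word counts seen for the "spam" class during training.
spam_counts = Counter({"free": 3, "win": 2, "money": 1})
vocabulary_size = 4  # "free", "win", "money", plus the unseen word "prize"

# "prize" never appeared in training, so without smoothing its likelihood is 0,
# which zeroes out the whole product of probabilities for the class.
unsmoothed = word_likelihood("prize", spam_counts, vocabulary_size, alpha=0)
smoothed = word_likelihood("prize", spam_counts, vocabulary_size, alpha=1)
print(unsmoothed, smoothed)
```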
————————————————————-
Stay Safe & Happy Learning💙
Date - 20/01/2024
Company Name - Bridgei2i Analytics
Role - Data Scientist
Q. Why does overfitting occur?
A. Overfitting happens when a model learns the detail and noise in the training data to the extent that it negatively impacts the performance of the model on new data. This means that the noise or random fluctuations in the training data are picked up and learned as concepts by the model.
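A toy illustration of learning noise as if it were a concept, assuming a 1-nearest-neighbor classifier that simply memorizes the training set. The data is hand-made: the true rule is "label 1 when x >= 5", and two training labels are flipped on purpose to act as noise.

```python
def nearest_neighbor_predict(train, x):
    """1-nearest-neighbor: return the label of the closest training point."""
    return min(train, key=lambda point: abs(point[0] - x))[1]


def error_rate(train, data):
    wrong = sum(nearest_neighbor_predict(train, x) != y for x, y in data)
    return wrong / len(data)


# Training set with two noisy labels: (2, 1) and (7, 0).
train = [(0, 0), (1, 0), (2, 1), (3, 0), (4, 0),
         (5, 1), (6, 1), (7, 0), (8, 1), (9, 1)]
# Clean, unseen points that follow the true rule.
test = [(0.4, 0), (1.6, 0), (2.4, 0), (3.6, 0),
        (5.4, 1), (6.6, 1), (7.4, 1), (8.6, 1)]

train_error = error_rate(train, train)  # every point is its own nearest neighbor
test_error = error_rate(train, test)    # memorized noise hurts nearby test points
print(train_error, test_error)
```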
Q. What is ensemble learning?
A. Ensemble learning is the process by which multiple models, such as classifiers or experts, are strategically generated and combined to solve a particular computational intelligence problem. Ensemble learning is primarily used to improve the performance of a model (classification, prediction, function approximation, etc.) or to reduce the likelihood of an unfortunate selection of a poor one.
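One simple way to combine classifiers is majority voting. The sketch below assumes three hypothetical models whose predictions are hard-coded; each one makes a single mistake on a different sample, so the vote recovers the right label every time.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model class predictions for one sample by plurality."""
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical predictions from three models on four samples.
true_labels = [1, 0, 1, 1]
model_a = [1, 0, 0, 1]  # wrong on sample 3
model_b = [1, 1, 1, 1]  # wrong on sample 2
model_c = [0, 0, 1, 1]  # wrong on sample 1

ensemble = [majority_vote(votes) for votes in zip(model_a, model_b, model_c)]
print(ensemble)
```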
Q. What is F1 score?
A. The F1 score is defined as the harmonic mean of precision and recall: F1 = 2 * (precision * recall) / (precision + recall). As a short reminder, the harmonic mean is an alternative to the more common arithmetic mean and is often useful when averaging rates. Because it is pulled toward the smaller of the two values, the F1 score is high only when both precision and recall are high.
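A short sketch of the calculation from raw counts. The counts are invented for the example, and returning 0.0 when there are no true positives is a convention chosen here, not something the answer above specifies.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall, computed from raw counts."""
    if true_positives == 0:
        return 0.0  # convention used in this sketch to avoid dividing by zero
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: precision = 8/10 = 0.8, recall = 8/16 = 0.5.
f1 = f1_score(true_positives=8, false_positives=2, false_negatives=8)
arithmetic_mean = (0.8 + 0.5) / 2
print(f1, arithmetic_mean)  # the harmonic mean sits closer to the lower value
```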
Q. What is pickling and unpickling?
A. “Pickling” is the process whereby a Python object hierarchy is converted into a byte stream, and “unpickling” is the inverse operation, whereby a byte stream (from a binary file or bytes-like object) is converted back into an object hierarchy.
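A minimal round trip with the standard library pickle module. The dictionary contents are arbitrary example data.

```python
import pickle

# An arbitrary nested Python object.
record = {"model": "random_forest", "params": {"n_estimators": 100}, "scores": [0.91, 0.88]}

payload = pickle.dumps(record)    # pickling: object hierarchy -> bytes
restored = pickle.loads(payload)  # unpickling: bytes -> new object hierarchy

print(type(payload).__name__, restored == record)
```

Only unpickle data from a source you trust, since loading a pickle can execute arbitrary code.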
Q. What is lambda function?
A. A Python lambda function is an anonymous function, meaning a function without a name. The def keyword is used to define a normal function in Python; similarly, the lambda keyword is used to define an anonymous function. A lambda is limited to a single expression.
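For comparison, here is the same function written with def and with lambda, plus a typical use of a lambda as a sort key. The names are chosen only for the example.

```python
def square_def(x):
    return x * x


# Equivalent anonymous function; assigned to a name here only so it can be compared.
square_lambda = lambda x: x * x

# Common use: a short throwaway function passed as an argument.
words = ["banana", "kiwi", "apple"]
by_length = sorted(words, key=lambda word: len(word))

print(square_def(4), square_lambda(4), by_length)
```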
Q. What is the trade-off between bias and variance?
A. Bias is the error introduced by the simplifying assumptions a model makes to make the target function easier to approximate. Variance is the amount by which the estimate of the target function changes given different training data. The trade-off is the tension between the two: reducing one tends to increase the other.
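A small simulation can make the tension concrete. It assumes a toy task of estimating a true mean of 5 from five noisy samples, and compares the plain sample mean with a deliberately "shrunk" estimator (half the sample mean). The setup, names, and numbers are all assumptions for illustration.

```python
import random


def bias_variance(estimates, true_value):
    """Empirical bias and variance of an estimator over repeated training sets."""
    mean_estimate = sum(estimates) / len(estimates)
    bias = mean_estimate - true_value
    variance = sum((e - mean_estimate) ** 2 for e in estimates) / len(estimates)
    return bias, variance


rng = random.Random(0)
true_mean = 5.0
plain, shrunk = [], []
for _ in range(2000):  # 2000 different "training sets"
    sample = [rng.gauss(true_mean, 2.0) for _ in range(5)]
    sample_mean = sum(sample) / len(sample)
    plain.append(sample_mean)         # low bias, higher variance
    shrunk.append(0.5 * sample_mean)  # strong assumption: high bias, lower variance

plain_bias, plain_var = bias_variance(plain, true_mean)
shrunk_bias, shrunk_var = bias_variance(shrunk, true_mean)
print("plain :", plain_bias, plain_var)
print("shrunk:", shrunk_bias, shrunk_var)
```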