Coding Interview ⛥
This channel contains the free resources and solution of coding problems which are usually asked in the interviews.
Java_Built_in_Methods_Cheatsheet_1702836925.pdf
87.9 KB
Interview QnAs
Company - Bosch
Date- 09/01/2024
Role: Data Scientist

1. What is a logistic function? What is the range of values of a logistic function?

f(z) = 1 / (1 + e^(-z))
The output of the logistic function lies strictly between 0 and 1, approaching 0 as z goes to -infinity and 1 as z goes to +infinity. The input z can take any real value from -infinity to +infinity.
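
A minimal sketch of the logistic function in Python, to make the range concrete. The function name `logistic` and the two-branch form (used only to avoid overflow for large negative z) are illustrative choices, not something from the interview answer.

```python
import math


def logistic(z: float) -> float:
    """Logistic (sigmoid) function 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For large negative z, exp(-z) would overflow, so use the
    # algebraically equivalent form e^z / (1 + e^z).
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Outputs stay between 0 and 1 whatever the input is.
for z in (-10, -1, 0, 1, 10):
    print(z, logistic(z))
```

The value at z = 0 is exactly 0.5, and the function is symmetric in the sense that logistic(z) + logistic(-z) = 1.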


2. What is the difference between R square and adjusted R square?

R square and adjusted R square are both used to assess the fit of a linear regression model. R square is the proportion of variance in the dependent variable that is explained by all the independent variables in the model. It never decreases when a variable is added, even if that variable is irrelevant. Adjusted R square corrects for this by penalizing the number of predictors: Adjusted R2 = 1 - (1 - R2)(n - 1)/(n - p - 1), where n is the number of observations and p is the number of predictors. It increases only when a new variable improves the fit more than would be expected by chance.

Thus Adjusted R2 is never greater than R2, and it can even be negative for a poor model.
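
A small sketch of the usual textbook formula, Adjusted R2 = 1 - (1 - R2)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors. The function and argument names are hypothetical, chosen for illustration.

```python
def adjusted_r2(r2: float, n_samples: int, n_predictors: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    dof = n_samples - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / dof


# Same R^2, more predictors: the adjusted value drops.
print(adjusted_r2(0.80, n_samples=101, n_predictors=5))
print(adjusted_r2(0.80, n_samples=101, n_predictors=20))
```

With the same R2 and the same sample size, adding predictors can only lower the adjusted value, which is the penalty the answer describes.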


3. What is stratify in train_test_split?

Stratification means that the train_test_split method returns training and test subsets that have approximately the same proportions of class labels as the input dataset. So if the input data has 60% 0's and 40% 1's as class labels, the train and test datasets will have similar proportions. In scikit-learn this is requested by passing the label array through the stratify parameter, e.g. stratify=y.
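
To show the idea without depending on scikit-learn, here is a pure-Python sketch of a stratified split. It is not scikit-learn's implementation; `stratified_split` and its parameters are hypothetical names, and the per-class rounding is a simplifying assumption.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx) with class proportions preserved."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Take the same fraction from every class.
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


labels = [0] * 60 + [1] * 40          # 60% zeros, 40% ones
train_idx, test_idx = stratified_split(labels, test_fraction=0.25)
test_share = sum(labels[i] for i in test_idx) / len(test_idx)
train_share = sum(labels[i] for i in train_idx) / len(train_idx)
print(len(train_idx), len(test_idx), train_share, test_share)
```

Both subsets keep the 60/40 ratio of the input, which is what stratify=y asks train_test_split to do.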


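4. What is Backpropagation in an Artificial Neural Network?

Backpropagation is the method used to compute how much each weight of a neural network contributed to the error: the error at the output is propagated backwards through the layers using the chain rule, giving the gradient of the loss with respect to every weight. An optimizer such as gradient descent then uses these gradients to update the weights after each batch or epoch. Proper tuning of the weights reduces the error and, together with regularization and validation, helps the model generalize.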
Date - 10-01-2024
Company name: LTI
Role: ML Engineer
Topic: cart, cross validation, knn, pandas data structures

1. Mention The Different Types Of Data Structures In pandas?

The two main data structures in the pandas library are Series and DataFrame, and both are built on top of NumPy. A Series is a one-dimensional labeled array and a DataFrame is a two-dimensional labeled table. Older versions of pandas also had a three-dimensional structure called Panel, with the axes items, major_axis, and minor_axis, but it has since been deprecated and removed; a DataFrame with a MultiIndex is the usual replacement.

2. Why is KNN a non-parametric Algorithm?

The term “non-parametric” refers to not making strong assumptions about the form of the underlying data distribution. These methods do not have a fixed number of parameters in the model.
In KNN, the model is effectively the training data itself: every training case is kept and used at prediction time, so the effective number of parameters grows with the size of the training set. So, KNN is a non-parametric algorithm.

3. Explain the CART Algorithm for Decision Trees.

CART stands for Classification and Regression Trees. It is a variation of the decision tree algorithm that can handle both classification and regression tasks. CART is a greedy algorithm: it searches for the best split at the top level (the one giving the lowest impurity, e.g. Gini impurity for classification or mean squared error for regression), then repeats the same process at each subsequent level. It does not check whether a split will lead to the lowest possible impurity several levels down, so the solution is not guaranteed to be optimal. In practice it is usually reasonably good, and that is the accepted compromise, because finding the optimal tree is an NP-complete problem with no known polynomial-time algorithm.
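
The greedy step can be sketched for a single numeric feature as follows. This is a simplified illustration assuming Gini impurity and midpoint thresholds; `gini` and `best_split` are hypothetical helper names, and a full CART implementation would repeat this over every feature and recurse into the two children.

```python
def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(xs, ys):
    """Greedy search for the threshold with the lowest weighted impurity."""
    best_threshold, best_impurity = None, float("inf")
    values = sorted(set(xs))
    # Candidate thresholds are midpoints between consecutive values.
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if weighted < best_impurity:
            best_threshold, best_impurity = threshold, weighted
    return best_threshold, best_impurity


xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = ["a", "a", "a", "b", "b", "b"]
print(best_split(xs, ys))
```

On this toy data the split at 6.5 separates the two classes perfectly, so the weighted impurity is 0. The search only looks one level ahead, which is exactly why the result is not guaranteed to be globally optimal.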

4. Explain leave-p-out cross validation.

When using this exhaustive method, we take p points out of the n data points in the dataset. We train the model on the remaining (n - p) points and test it on the p held-out points. We repeat this for every possible combination of p points, which gives C(n, p) train/test splits. To get the final accuracy, we average the accuracies over all of these iterations. Because the number of splits grows very quickly with n, the method is practical only for small datasets or small p.
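
A short sketch of how the splits can be enumerated with the standard library. The generator name `leave_p_out` is hypothetical; it yields index lists only and leaves model training and scoring to the caller.

```python
from itertools import combinations
from math import comb


def leave_p_out(n, p):
    """Yield (train_indices, test_indices) for every choice of p test points."""
    indices = range(n)
    for test in combinations(indices, p):
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, list(test)


splits = list(leave_p_out(5, 2))
print(len(splits), comb(5, 2))   # number of splits equals C(5, 2)
print(splits[0])                 # first split: train on 3 points, test on 2
```

Even for modest sizes the count explodes: with n = 100 and p = 2 there are already 4,950 splits, each requiring a full model fit.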

————————————————————-


Stay Safe & Happy Learning💙
Today's Interview QnAs
Date : 11/01/2024
Company : KPMG
Role: Data Scientist


Q1. You are given a data set. The data set has missing values that spread along 1 standard deviation from the median. What percentage of data would remain unaffected? Why?

This question has enough hints for you to start thinking! Since the data is spread around the median, let’s assume it follows a normal distribution. In a normal distribution the mean, median and mode coincide, and ~68% of the data lies within 1 standard deviation of the mean. If the missing values fall in that band, ~68% of the data is potentially affected, which leaves ~32% of the data unaffected by missing values.
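
The ~68% / ~32% figures can be checked with the standard library, assuming (as the answer does) that the data is normally distributed.

```python
from statistics import NormalDist

# Standard normal: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0.0, sigma=1.0)

# Probability mass within one standard deviation of the mean.
within_one_sd = std_normal.cdf(1.0) - std_normal.cdf(-1.0)
outside_one_sd = 1.0 - within_one_sd

print(round(within_one_sd, 4), round(outside_one_sd, 4))
```

The result does not depend on the particular mean or standard deviation, since any normal distribution can be rescaled to the standard one.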


Q2. What do you mean by convex hull?

In the context of support vector machines, the convex hull of a class is the smallest convex boundary that encloses all data points of that class. Once the convex hulls of two linearly separable classes have been constructed, the maximum margin hyperplane (MMH) is the perpendicular bisector of the shortest line segment connecting the two hulls, which gives the greatest separation between the two groups.
Formally, the convex hull of a set S is the intersection of all convex sets of which S is a subset. We denote the convex hull of S by [S].

Example:

S = {x : |x| = 1} implies [S] = {x : |x| <= 1}
S = {x : |x| >= 1} implies [S] = E^n
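
For a finite set of points in the plane, the convex hull can be computed directly. The sketch below uses Andrew's monotone chain algorithm, which is one common choice and not something the original answer specifies; the function names are illustrative.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    lower = []
    for p in pts:
        # Drop the last vertex while it would create a non-left turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


square_with_center = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]
print(convex_hull(square_with_center))
```

The interior point (1, 1) is discarded and only the four corners remain, matching the idea that the hull is the smallest convex set containing all the points.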


Q3. You’ve built a random forest model with 10000 trees. You got delighted after getting training error as 0.00. But, the validation error is 34.23. What is going on? Haven’t you trained your model perfectly?

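Ans: The model has overfitted. A training error of 0.00 means the classifier has memorized patterns in the training data, including noise, that are not present in unseen data. Hence, when the classifier was run on an unseen sample, it couldn’t find those patterns and returned predictions with a much higher error. In a random forest this usually comes from fully grown, very deep trees rather than from the number of trees: adding trees mainly stabilizes the predictions and does not by itself increase overfitting. To reduce the gap, tune hyperparameters such as the maximum depth, the minimum samples per leaf and the number of features considered per split using cross-validation, and check for leakage or a mismatch between the training and validation data.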


Q4. State one real-life application of convex hulls.

Ans: One application of convex hulls is the construction of convex relaxations. This is a way to find the 'closest' convex problem to a non-convex problem one is attempting to solve.

————————————————————-


Date: 13-01-2024
Company name: Musigma
Role: ML Engineer
Topic: random forest, linear regression, naive bayes, gradient descent

1. How is the Error calculated in a Linear Regression model?

Measure the distance of the observed y-values from the predicted y-values at each value of x.
Square each of these distances.
Calculate the mean of the squared distances.
MSE = (1/n) * Σ(actual - forecast)^2
The smaller the Mean Squared Error, the closer you are to finding the line of best fit.
How good or bad this final value is depends on the context of the problem, but the goal is to make it as small as possible.
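
The three steps above translate directly into a few lines of Python. The function name and the sample values are made up for illustration.

```python
def mean_squared_error(actual, predicted):
    """MSE = (1/n) * sum((actual - predicted)^2)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sum(squared) / len(squared)


actual = [3.0, 5.0, 7.0]
predicted = [2.5, 5.0, 8.0]
mse = mean_squared_error(actual, predicted)
print(mse)  # (0.25 + 0 + 1) / 3
```

Because the errors are squared, one large miss contributes far more than several small ones, which is worth keeping in mind when outliers are present.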

2. Explain the intuition behind the Gradient Descent algorithm.

Gradient descent is an optimization algorithm used when training a machine learning model. It iteratively adjusts the model's parameters in the direction opposite to the gradient of a cost function until it reaches a minimum (a point where the slope is 0). If the cost function is convex, that minimum is the global minimum; otherwise it may only be a local one.
To start, we select random values for the bias and weights, and then repeatedly compute the slope (gradient) of the cost function and step against it until the slope is close to 0.
The size of each update to the bias and weights is controlled by a variable called the learning rate. We have to choose the learning rate wisely because:
A small learning rate makes the model take a long time to learn.
A large learning rate can prevent the model from converging, because each update overshoots the minimum and may even diverge.
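
The effect of the learning rate is easy to see on a toy problem. The sketch below minimizes f(w) = (w - 3)^2, whose minimum is at w = 3; the function, the starting point and the three learning rates are assumptions picked for illustration.

```python
def gradient_descent(lr, steps=50, w=0.0):
    """Minimize f(w) = (w - 3)^2 starting from w, using a fixed learning rate."""
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)   # derivative of (w - 3)^2
        w -= lr * grad           # step against the gradient
    return w


print("lr=0.001:", gradient_descent(0.001))  # too small: still far from 3
print("lr=0.1:  ", gradient_descent(0.1))    # reasonable: very close to 3
print("lr=1.1:  ", gradient_descent(1.1))    # too large: overshoots and diverges
```

With a tiny learning rate the parameter has barely moved after 50 steps, while with a learning rate above 1 each step lands further from the minimum than the previous one.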

3. How is a Random Forest related to Decision Trees?

Random forest is an ensemble learning method that works by constructing a multitude of decision trees. A random forest can be constructed for both classification and regression tasks.
A random forest usually outperforms a single decision tree and is much less prone to overfitting the data.
A single decision tree grown without constraints can become very deep and overfit its training data. To create a random forest, many decision trees are trained on different bootstrap samples of the training dataset, and their predictions are then averaged (or put to a majority vote for classification) with the goal of decreasing the variance.

4. What are some disadvantages of using Naive Bayes Algorithm?

Some disadvantages of using Naive Bayes Algorithm are:
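It relies on the strong assumption that the features are conditionally independent of each other given the class, which rarely holds in practice.
It is generally not suitable for datasets with large numbers of numerical attributes.
If a feature value never appears with a class in the training dataset but does appear in the testing dataset, the model assigns it zero probability and the prediction will most likely be wrong (the zero-frequency problem). Smoothing techniques such as Laplace smoothing are commonly used to mitigate this.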

————————————————————-


Date - 20/01/2024
Company Name - Bridgei2i Analytics
Role - Data Scientist


Q. Why does overfitting occur?

A. Overfitting happens when a model learns the detail and noise in the training data to the extent that it negatively impacts the performance of the model on new data. This means that the noise or random fluctuations in the training data are picked up and learned as concepts by the model, typically because the model is too complex for the amount of training data available.


Q. What is ensemble learning?

A. Ensemble learning is the process by which multiple models, such as classifiers or experts, are strategically generated and combined to solve a particular computational intelligence problem. Ensemble learning is primarily used to improve the (classification, prediction, function approximation, etc.) performance of a model, or reduce the likelihood of an unfortunate selection of a poor one.


Q. What is F1 score?

A. The F1 score is defined as the harmonic mean of precision and recall: F1 = 2 * (precision * recall) / (precision + recall). As a short reminder, the harmonic mean is an alternative to the more common arithmetic mean and is often useful when averaging rates. Unlike the arithmetic mean, it is pulled towards the smaller of the two values, so the F1 score is high only when both precision and recall are high.
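
A quick sketch of the formula F1 = 2 * (precision * recall) / (precision + recall), compared with the plain arithmetic mean. The function name and the example values are illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # convention when both are zero
    return 2 * precision * recall / (precision + recall)


precision, recall = 0.9, 0.1
print("arithmetic mean:", (precision + recall) / 2)
print("F1 score:       ", f1_score(precision, recall))
```

A classifier with high precision but very low recall gets an arithmetic mean of 0.5 yet an F1 of only 0.18, which is why F1 is preferred when both metrics matter.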


Q. What is pickling and unpickling?

A.“Pickling” is the process whereby a Python object hierarchy is converted into a byte stream, and “unpickling” is the inverse operation, whereby a byte stream (from a binary file or bytes-like object) is converted back into an object hierarchy.
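
A minimal round trip using the standard `pickle` module. The dictionary contents are made up for illustration.

```python
import pickle

record = {"model": "knn", "k": 5, "weights": [0.1, 0.2]}

# Pickling: object hierarchy -> byte stream.
payload = pickle.dumps(record)

# Unpickling: byte stream -> a new, equal object hierarchy.
# Only unpickle data from sources you trust; loading a pickle can run code.
restored = pickle.loads(payload)

print(type(payload).__name__, restored == record, restored is record)
```

The restored object is equal to the original but is a separate copy. For files, `pickle.dump` and `pickle.load` work the same way on a binary file handle.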


Q. What is lambda function?

A. Python lambda functions are anonymous functions, meaning functions without a name. Just as the def keyword is used to define a normal function in Python, the lambda keyword is used to define an anonymous function. A lambda is limited to a single expression, whose value is returned.
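
A short comparison of def and lambda. Binding a lambda to a name, as done here for `square_lambda`, is only for side-by-side illustration; the typical use is passing one inline as an argument, as in the `sorted` call.

```python
# A normal named function.
def square(x):
    return x * x


# The equivalent anonymous function, bound to a name only for comparison.
square_lambda = lambda x: x * x

# Typical use: a throwaway key function passed inline.
pairs = [("b", 2), ("a", 3), ("c", 1)]
by_count = sorted(pairs, key=lambda pair: pair[1])

print(square(4), square_lambda(4))
print(by_count)
print(square.__name__, square_lambda.__name__)
```

The last line shows what "anonymous" means in practice: the def function knows its own name, while the lambda reports itself as `<lambda>`.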


Q. What is the trade-off between bias and variance?

A. Bias is the error introduced by the simplifying assumptions a model makes so that the target function is easier to approximate. Variance is the amount by which the estimate of the target function will change given different training data. The trade-off is the tension between the two: making a model more flexible lowers bias but raises variance, and making it simpler does the opposite.
Java Basic Programs.pdf
3.4 MB
Linux Helpful Commands List (2).pdf
458.1 KB
Date: 27-01-2024
Company name: Infosys
Role: ML Engineer
Topic: gradient descent, random forest, kmean vs knn, svm

1. Explain how Gradient Descent works in Linear Regression.

Gradient descent starts with random values for each coefficient in the linear regression model.
After this, the sum of the squared errors over all pairs of input and output values is calculated (the loss function).
In each iteration, the coefficients are updated in the direction that reduces the error, with the learning rate acting as a scale factor on the size of the update.
We keep repeating the process until a minimum sum of squared errors is reached or no further improvement is possible.
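
The procedure can be sketched for a one-feature model y = slope * x + intercept. This is an illustrative batch gradient descent on the mean squared error; the function name, the learning rate and the step count are assumptions, and the coefficients start at zero instead of random values so the run is reproducible.

```python
def fit_line(xs, ys, lr=0.05, steps=2000):
    """Fit y = slope * x + intercept by batch gradient descent on the MSE."""
    slope, intercept = 0.0, 0.0   # fixed start instead of random, for reproducibility
    n = len(xs)
    for _ in range(steps):
        # Prediction errors with the current coefficients.
        errors = [slope * x + intercept - y for x, y in zip(xs, ys)]
        # Partial derivatives of the MSE with respect to each coefficient.
        grad_slope = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_intercept = (2.0 / n) * sum(errors)
        # The learning rate scales the size of each update.
        slope -= lr * grad_slope
        intercept -= lr * grad_intercept
    return slope, intercept


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]    # generated from y = 2x + 1
slope, intercept = fit_line(xs, ys)
print(round(slope, 4), round(intercept, 4))
```

On this noise-free data the fitted coefficients approach the true values 2 and 1. In practice a fixed step count is often replaced by a stopping rule such as "stop when the loss improves by less than some tolerance".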


2. What does Random refer to in Random Forest?

Random forest is an extension of the bagging method, as it uses both bagging and feature randomness to create an uncorrelated forest of decision trees. Hence, random forest is random in the following ways:
At each split, only a random subset of the features is considered, which keeps the correlation among the decision trees low.
Each tree in the forest is trained on a bootstrap sample: data points drawn at random, with replacement, from the original dataset. On average such a sample contains roughly 2/3 (about 63%) of the distinct training points.
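
The "about 2/3 of the data" figure comes from sampling with replacement: a bootstrap sample of size n contains, on average, a fraction 1 - 1/e (about 63%) of the distinct original points. The simulation below illustrates this; the function name and the seed are arbitrary choices.

```python
import random


def unique_fraction_in_bootstrap(n, seed=0):
    """Fraction of distinct original points present in one bootstrap sample."""
    rng = random.Random(seed)
    # Draw n indices with replacement from range(n).
    sample = [rng.randrange(n) for _ in range(n)]
    return len(set(sample)) / n


frac = unique_fraction_in_bootstrap(10_000)
print(frac)  # close to 1 - 1/e
```

The points that a given tree never sees (the remaining third or so) are its out-of-bag samples, which can be used to estimate the forest's error without a separate validation set.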

3. What is the main difference between k-Means and k-Nearest Neighbours?

k-Means is a clustering algorithm that tries to partition a set of points into k sets such that the points in each cluster tend to be near each other. It is unsupervised because the points have no external classification.

k-Nearest Neighbors is a classification (or regression) algorithm that, in order to determine the classification of a point, combines the classification of the k nearest points. It is supervised because it is trying to classify a point based on the known classification of other points.


4. What are Support Vectors in SVMs?

Support vectors are the data points nearest to the hyperplane, the points of a data set that, if removed, would alter the position of the dividing hyperplane.
Using these support vectors, we maximize the margin of the classifier.
For computing predictions, only the support vectors are used.

————————————————————-



Today's Interview QnAs
Date - 25/01/2024
Company - The Math Company
Role- Data Analyst


1. How to create filters in Power BI?

Filters are an integral part of Power BI reports. They are used to slice and dice the data as per the dimensions we want. Filters are created in a couple of ways.

Using Slicers: A slicer is a visual under Visualization Pane. This can be added to the design view to filter our reports. When a slicer is added to the design view, it requires a field to be added to it. For example- Slicer can be added for Country fields. Then the data can be filtered based on countries.
Using Filter Pane: The Power BI team has added a filter pane to the reports, which is a single space where we can add different fields as filters. And these fields can be added depending on whether you want to filter only one visual(Visual level filter), or all the visuals in the report page(Page level filters), or applicable to all the pages of the report(report level filters)


2. How to sort data in Power BI?

Sorting is available in multiple formats. In the data view, a common sorting option of alphabetical order is there. Apart from that, we have the option of Sort by column, where one can sort a column based on another column. The sorting option is available in visuals as well. Sort by ascending and descending option by the fields and measure present in the visual is also available.


3. How to convert PDF to Excel?

Open the PDF document you want to convert in XLSX format in Acrobat DC.
Go to the right pane and click on the “Export PDF” option.
Choose spreadsheet as the Export format.
Select “Microsoft Excel Workbook.”
Now click “Export.”
Download the converted file or share it.


4. How to enable macros in excel?

Click the file tab and then click “Options.”
A dialog box will appear. In the “Excel Options” dialog box, click on the “Trust Center” and then “Trust Center Settings.”
Go to the “Macro Settings” and select “enable all macros.”
Click OK to apply the macro settings.
Javascript Notes.pdf
26.8 MB
🔺CLOUD COMPUTING☁️🔺 TUTORIAL SHORT NOTES.pdf
4 MB
Date: 06-02-2024
Company name: Elica
Role: Data Scientist
Topic: kmeans, KNN, LSTM, powerbi, SQL

1. Explain some cases where k-Means clustering fails to give good results

k-means has trouble clustering data where clusters are of various sizes and densities.
Outliers will cause the centroids to be dragged, or the outliers might get their own cluster instead of being ignored. Outliers should be clipped or removed before clustering.
As the number of dimensions increases, a distance-based similarity measure converges to a constant value between any given examples. Dimensionality should be reduced before clustering.

2. If your Time-Series Dataset is very long, what architecture would you use?

If the time-series dataset is very long, LSTMs are a good fit because they can process not only single data points but also entire sequences of data, and a time series is exactly such a sequence. For even stronger representational capacity, the LSTM can be made multi-layered (stacked). Another method for long time-series datasets is to use CNNs to extract information.

3. How would you define Power BI as an effective solution ?

Power BI is a strong business analytical tool that creates useful insights and reports by collating data from unrelated sources. This data can be extracted from any source like Microsoft Excel or hybrid data warehouses. Power BI drives an extreme level of utility and purpose using interactive graphical interface and visualizations.

4. Why is the KNN Algorithm known as Lazy Learner?

When the KNN algorithm gets the training data, it does not learn a model; it just stores the data. Instead of finding a discriminative function from the training data, it follows instance-based learning and uses the training data only when it actually needs to make a prediction on unseen data. Because KNN delays all of the work until prediction time rather than learning a model up front, it is referred to as a Lazy Learner.
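
The "lazy" behaviour is visible in a bare-bones implementation: fitting only stores the data, and all the distance computations happen at prediction time. This is a sketch assuming Euclidean distance and majority voting; the class `KNNClassifier` is a hypothetical name, not a library API.

```python
import math
from collections import Counter


class KNNClassifier:
    def __init__(self, k=3):
        self.k = k

    def fit(self, points, labels):
        # "Training" is just memorizing the data; nothing is learned here.
        self.points = list(points)
        self.labels = list(labels)
        return self

    def predict(self, query):
        # All the work happens now: rank stored points by distance to the query.
        ranked = sorted(
            zip(self.points, self.labels),
            key=lambda pair: math.dist(pair[0], query),
        )
        votes = Counter(label for _, label in ranked[: self.k])
        return votes.most_common(1)[0][0]


model = KNNClassifier(k=3).fit(
    [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)],
    ["a", "a", "a", "b", "b", "b"],
)
print(model.predict((0.2, 0.3)), model.predict((5.5, 5.5)))
```

The trade-off is that prediction cost grows with the size of the stored training set, which is why practical implementations use spatial indexes rather than a full sort.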

5. Explain the difference between drop and truncate.

In SQL, the DROP command removes an entire object such as a database or a table, including its structure, data, indexes and constraints. The TRUNCATE command, by contrast, removes all the rows from a table but keeps the table structure, so the table can be used again.

————————————————————-

