Coding Interview ⛥
This channel contains the free resources and solution of coding problems which are usually asked in the interviews.
top 100 Java interview questions .pdf
624.3 KB
Date - 30/12/2023
Company Name - Course i5
Role: Data Scientist

Q.   How can outlier values be treated?


A.  An outlier is an observation that differs significantly from the rest of the data, being much larger or much smaller than the other values.
Some methods of treating outliers are: trimming (removing the outlier), quantile-based flooring and capping, and mean/median imputation.


Q.   What is root cause analysis?


A.  A root cause is a factor that contributed to a nonconformance and should be permanently eliminated through process improvement. It is the most fundamental reason that sets in motion the entire cause-and-effect chain leading to the problem(s). Root cause analysis (RCA) is a term that covers a variety of approaches, tools, and procedures used to identify the root causes of problems. Some RCA approaches are geared towards uncovering actual root causes, others are more general problem-solving procedures, and yet others simply support the core activity of root cause analysis.


Q.  What is bias and variance in Data Science?

A.  Bias is the error introduced by the simplifying assumptions a model makes so that the target function is easier to estimate. In its most basic form, it is the difference between the model's average prediction and the true value. Variance refers to how much the estimate of the target function fluctuates when the model is trained on different training data. High variance arises when the model fits the fluctuations, or noise, in the training data.


Q.  What is a confusion matrix?

A.   A confusion matrix is a method of summarising a classification algorithm's performance. Calculating a confusion matrix can help you understand what your classification model is getting right and where it is going wrong. This gives us the following: "True positive" for event values that were correctly predicted as events. "False positive" for no-event values that were mistakenly predicted as events. "True negative" for no-event values that were correctly predicted as no-events. "False negative" for event values that were mistakenly predicted as no-events.
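
The four counts can be tallied by hand. This is a minimal sketch, assuming binary labels where 1 marks the event; the function name and the sample labels are made up for illustration.

```python
from collections import Counter

def confusion_counts(actual, predicted, positive=1):
    """Tally TP, FP, TN and FN for a binary classifier."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            # Predicted "event": correct if the actual label is also an event.
            counts["TP" if a == positive else "FP"] += 1
        else:
            # Predicted "no-event": wrong if the actual label was an event.
            counts["FN" if a == positive else "TN"] += 1
    return counts

actual    = [1, 0, 1, 1, 0, 0, 1, 0]   # hypothetical ground truth
predicted = [1, 0, 0, 1, 0, 1, 1, 0]   # hypothetical model output
cm = confusion_counts(actual, predicted)
print(dict(cm))
```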

————————————————————-
Date: 02-01-2024
Company name: Oracle
Role: Data Analyst
Topic: outlier, dax, filter, recursive stored procedure

1. What are the ways to detect outliers?

Outliers are detected using two methods:

Box Plot Method: According to this method, a value is considered an outlier if it lies more than 1.5*IQR (interquartile range) above the top quartile (Q3) or more than 1.5*IQR below the bottom quartile (Q1), that is, above Q3 + 1.5*IQR or below Q1 - 1.5*IQR.

Standard Deviation Method: According to this method, an outlier is defined as a value that falls outside the range mean ± (3*standard deviation).
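
Both rules fit in a few lines of Python. This is only a sketch: it assumes the usual fences Q3 + 1.5*IQR and Q1 - 1.5*IQR, takes quartiles from the standard library's default method (other tools may compute them slightly differently), and uses made-up sample data.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

def sigma_outliers(values, k=3.0):
    """Values more than k population standard deviations from the mean."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [v for v in values if abs(v - mean) > k * sd]

data = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 12, 95]  # hypothetical sample
print(iqr_outliers(data), sigma_outliers(data))
```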


2. What is a Recursive Stored Procedure?

A stored procedure that calls itself until a boundary condition is reached is called a recursive stored procedure. Recursion lets programmers run the same set of code repeatedly, as and when required.


3. What is the shortcut to add a filter to a table in EXCEL?

The filter mechanism is used when you want to display only specific data from the entire dataset. By doing so, there is no change being made to the data. The shortcut to add a filter to a table is Ctrl+Shift+L.

4. What is DAX in Power BI?

DAX stands for Data Analysis Expressions. It's a collection of functions, operators, and constants used in formulas to calculate and return values. In other words, it helps you create new info from data you already have.
Date: 03-01-2024
Company : Accenture
Role : Data Scientist
Topic : Silhouette coeff, Trend&Seasonality, Bag of words, Self join

1. What do you understand by the term silhouette coefficient?
The silhouette coefficient is a measure of how well clustered together a data point is with respect to the other points in its cluster. It is a measure of how similar a point is to the points in its own cluster, and how dissimilar it is to the points in other clusters. The silhouette coefficient ranges from -1 to 1, with 1 being the best possible score and -1 being the worst possible score.
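
For a single point, the score is usually written as s = (b - a) / max(a, b), where a is the mean distance to the other points in its own cluster and b is the smallest mean distance to the points of any other cluster. A rough sketch on one-dimensional data follows; the function name and the sample clusters are illustrative assumptions.

```python
def silhouette(point, own_cluster, other_clusters):
    """Silhouette of one point; own_cluster must not include the point itself."""
    def mean_distance(cluster):
        return sum(abs(point - q) for q in cluster) / len(cluster)

    a = mean_distance(own_cluster)                      # cohesion
    b = min(mean_distance(c) for c in other_clusters)   # separation
    return (b - a) / max(a, b)

# Hypothetical 1-D clusters: the point 1.0 sits close to its own cluster.
score = silhouette(1.0, own_cluster=[2.0, 3.0], other_clusters=[[10.0, 11.0]])
print(round(score, 3))
```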

2. What is the difference between trend and seasonality in time series?
Trends and seasonality are two characteristics of time series metrics that break many models. Trends are continuous increases or decreases in a metric’s value. Seasonality, on the other hand, reflects periodic (cyclical) patterns that occur in a system, usually rising above a baseline and then decreasing again.

3. What is Bag of Words in NLP?
Bag of Words is a commonly used model that depends on word frequencies or occurrences to train a classifier. This model creates an occurrence matrix for documents or sentences irrespective of their grammatical structure or word order.
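
A tiny occurrence matrix can be built with the standard library. This sketch assumes whitespace tokenisation and two made-up sentences; real pipelines usually add lowercasing, punctuation handling and stop-word removal.

```python
from collections import Counter

docs = ["the cat sat on the mat", "the dog sat"]  # hypothetical documents

# Vocabulary: every distinct word across all documents, in sorted order.
vocab = sorted({word for doc in docs for word in doc.split()})

# One row per document, one column per vocabulary word, cells hold counts.
matrix = []
for doc in docs:
    counts = Counter(doc.split())
    matrix.append([counts[word] for word in vocab])

print(vocab)
print(matrix)
```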

4. What is a Self-Join?

A self-join is a type of join in which a table is joined with itself, so it represents a unary relationship. The table is referenced twice under different aliases, and each row is matched against the rows of the same table that satisfy the join condition. As a result, a self-join is mostly used to combine and compare rows from the same database table.
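
A typical use is pairing employees with their managers when both live in one table. The sketch below runs the SQL through Python's built-in sqlite3 module; the employee table, its columns and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER, name TEXT, manager_id INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Asha", None), (2, "Ben", 1), (3, "Chen", 1)],
)

# The same table appears twice, under the aliases e (employee) and m (manager).
rows = conn.execute(
    """
    SELECT e.name, m.name
    FROM employee AS e
    JOIN employee AS m ON e.manager_id = m.id
    ORDER BY e.id
    """
).fetchall()
conn.close()
print(rows)
```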

5. Explain the Law of Large Numbers.

The ‘Law of Large Numbers’ states that if an experiment is repeated independently a large number of times, the average of the individual results is close to the expected value. It also follows that the sample variance and standard deviation converge towards the population variance and standard deviation.
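
A quick simulation illustrates the idea. It assumes a fair six-sided die, whose expected value is 3.5; the function name and the seed are arbitrary choices for this sketch.

```python
import random

def mean_of_die_rolls(n_rolls, seed=0):
    """Average of n_rolls simulated rolls of a fair six-sided die."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = sum(rng.randint(1, 6) for _ in range(n_rolls))
    return total / n_rolls

# The average drifts towards 3.5 as the number of rolls grows.
for n in (10, 1_000, 100_000):
    print(n, mean_of_die_rolls(n))
```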
Cold Email Template
Java_Built_in_Methods_Cheatsheet_1702836925.pdf
87.9 KB
Interview QnAs
Company - Bosch
Date- 09/01/2024
Role: Data Scientist

1. What is a logistic function? What is the range of values of a logistic function?

f(z) = 1 / (1 + e^(-z))
The values of a logistic function lie strictly between 0 and 1, while z can vary from -infinity to +infinity.
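
A direct translation into Python, as a sketch only. Very large negative inputs would overflow math.exp in this naive form, so the example sticks to moderate values of z.

```python
import math

def logistic(z):
    """Logistic (sigmoid) function: maps any real z into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

for z in (-10, 0, 10):
    print(z, logistic(z))
```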


2. What is the difference between R square and adjusted R square?

R square and adjusted R square values are used for model validation in linear regression. R square indicates the proportion of the variation in the dependent variable that is explained by all the independent variables together, i.e. it counts every independent variable, and it never decreases when another variable is added. Adjusted R square corrects for this by penalising the number of independent variables: Adjusted R2 = 1 - (1 - R2)(n - 1)/(n - p - 1), where n is the number of observations and p is the number of independent variables. It increases only when a new variable improves the model more than would be expected by chance.

Thus Adjusted R2 is never greater than R2.
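
The commonly used formula is Adjusted R2 = 1 - (1 - R2)(n - 1)/(n - p - 1), with n observations and p independent variables. Here is a small sketch; the R2 value and sample sizes are made-up numbers.

```python
def adjusted_r2(r2, n_samples, n_predictors):
    """Adjusted R-squared, which penalises the number of predictors."""
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)

r2 = 0.80  # hypothetical R-squared from a fitted model
adj = adjusted_r2(r2, n_samples=50, n_predictors=5)
print(round(adj, 4))
```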


3. What is stratify in Train_test_split?

Stratification means that the train_test_split method returns training and test subsets that have the same proportions of class labels as the input dataset. So if my input data has 60% 0's and 40% 1's as my class label, then my train and test dataset will also have the similar proportions.


4. What is Backpropagation in an Artificial Neural Network?

Backpropagation is the method of fine-tuning the weights of a neural network based on the error obtained in the previous epoch (i.e., iteration): the error is propagated backwards through the network to work out how much each weight contributed to it. Proper tuning of the weights allows you to reduce error rates and make the model reliable by increasing its generalization.
Date - 10-01-2024
Company name: LTI
Role: ML Engineer
Topic: cart, cross validation, knn, pandas data structures

1. Mention The Different Types Of Data Structures In pandas?

There are two data structures supported by the pandas library, Series and DataFrame. Both are built on top of NumPy. Series is the one-dimensional data structure in pandas and DataFrame is the two-dimensional one. Older versions of pandas also had a three-dimensional data structure known as Panel, with the axes items, major_axis, and minor_axis, but it has since been deprecated and removed.

2. Why is KNN a non-parametric Algorithm?

The term “non-parametric” refers to not making any assumptions on the underlying data distribution. These methods do not have any fixed numbers of parameters in the model.
Similarly in KNN, the model parameters grow with the training data by considering each training case as a parameter of the model. So, KNN is a non-parametric algorithm.

3. Explain the CART Algorithm for Decision Trees.

CART (Classification and Regression Trees) is a variation of the decision tree algorithm that can handle both classification and regression tasks. It is a greedy algorithm: it searches for the best split at the top level, then repeats the same process at each of the subsequent levels. It does not check whether a split will lead to the lowest possible impurity several levels down, so the solution is not guaranteed to be optimal. It is often reasonably good, though, and that matters because finding the optimal tree is an NP-complete problem for which no polynomial-time algorithm is known.

4. Explain leave-p-out cross validation.

When using this exhaustive method, we take p number of points out from the total number of data points in the dataset(say n). While training the model we train it on these (n – p) data points and test the model on p data points. We repeat this process for all the possible combinations of p from the original dataset. Then to get the final accuracy, we average the accuracies from all these iterations.
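
The number of train/test splits grows as "n choose p", which is why this method is only practical for small datasets. Below is a sketch that enumerates the index splits with itertools; the function name is illustrative, and fitting and scoring a model is left out.

```python
from itertools import combinations

def leave_p_out_splits(n, p):
    """Yield (train_indices, test_indices) for every way of holding out p of n points."""
    indices = range(n)
    for test_idx in combinations(indices, p):
        train_idx = [i for i in indices if i not in test_idx]
        yield train_idx, list(test_idx)

splits = list(leave_p_out_splits(n=4, p=2))
for train_idx, test_idx in splits:
    print("train", train_idx, "test", test_idx)
```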

————————————————————-


Stay Safe & Happy Learning💙
Today's Interview QnAs
Date : 11/01/2024
Company : KPMG
Role: Data Scientist


Q1. You are given a data set. The data set has missing values that spread along 1 standard deviation from the median. What percentage of data would remain unaffected? Why?

This question has enough hints for you to start thinking! Since the data is spread across the median, let’s assume it’s a normal distribution. We know, in a normal distribution, ~68% of the data lies in 1 standard deviation from mean (or mode, median), which leaves ~32% of the data unaffected. Therefore, ~32% of the data would remain unaffected by missing values.


Q2. What do you mean by convex hull?

In the context of support vector machines, the convex hull represents the outer boundary of each of the two groups of data points. Once the convex hulls have been constructed, we obtain the maximum margin hyperplane (MMH) as the perpendicular bisector between the two convex hulls; the MMH is the boundary that attempts to create the greatest separation between the two groups.
More formally, the convex hull of a set S is the intersection of all convex sets of which S is a subset. We denote the convex hull of S by [S].

Example:

S = {x : |x| = 1} implies [S] = {x : |x| ≤ 1}
S = {x : |x| ≥ 1} implies [S] = E^n


Q3. You’ve built a random forest model with 10000 trees. You got delighted after getting training error as 0.00. But, the validation error is 34.23. What is going on? Haven’t you trained your model perfectly?

Ans: The model has overfitted. A training error of 0.00 means the classifier has memorised patterns in the training data so closely that they do not carry over to unseen data. Hence, when this classifier was run on an unseen sample, it couldn’t find those patterns and returned predictions with higher error. In a random forest, this typically happens when the individual trees are grown too deep; using more trees than necessary mostly adds computation rather than accuracy. Hence, to avoid this situation, we should tune hyperparameters such as the tree depth and the number of trees using cross-validation.


Q4. State one real-life application of convex hulls.

Ans: One application of convex hulls is the computation or construction of convex relaxations. This can be seen as a way to find the 'closest' convex problem to the non-convex problem one is attempting to solve.

————————————————————-


Stay Safe & Happy Learning💙
Date: 13-01-2024
Company name: Musigma
Role: ML Engineer
Topic: random forest, linear regression, naive bayes, gradient descent

1. How is the Error calculated in a Linear Regression model?

Measuring the distance of the observed y-values from the predicted y-values at each value of x.
Squaring each of these distances.
Calculating the mean of the squared distances.
MSE = (1/n) * Σ(actual – forecast)^2
The smaller the Mean Squared Error, the closer you are to finding the line of best fit.
Whether the final value is good or bad always depends on the context of the problem, but the goal is to make it as small as possible.
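
The three steps map onto a one-line computation. A minimal sketch with made-up actual and forecast values:

```python
def mean_squared_error(actual, forecast):
    """Mean of the squared differences between observed and predicted values."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)

actual = [3, 5, 7]     # hypothetical observed y-values
forecast = [2, 5, 9]   # hypothetical predicted y-values
mse = mean_squared_error(actual, forecast)
print(mse)
```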

2. Explain the intuition behind the Gradient Descent algorithm.

Gradient descent is an optimization algorithm used when training a machine learning model. It tweaks the model's parameters iteratively to move a given function towards a local minimum (that is, slope = 0); for a convex function, that local minimum is also the global minimum.
To start, we select random values for the bias and weights, and then repeatedly step against the slope until the slope is close to 0.
The size of each update to the bias and weights is controlled by a variable called the learning rate. We have to choose the learning rate wisely because:
A small learning rate may make the model take a long time to learn.
A large learning rate may keep the model from converging, as each step overshoots and we never settle at the minimum.
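
The effect of the learning rate shows up even on a one-parameter toy problem. This sketch assumes the function f(w) = (w - 3)^2, whose minimum is at w = 3, and starts from w = 0 rather than a random value so the run is repeatable; the function name and the three learning rates are illustrative.

```python
def gradient_descent(lr, steps=50, start=0.0):
    """Minimise f(w) = (w - 3)**2 by repeatedly stepping against the slope."""
    w = start
    for _ in range(steps):
        slope = 2 * (w - 3)   # derivative of (w - 3)**2
        w -= lr * slope       # the learning rate scales the size of each step
    return w

w_good = gradient_descent(lr=0.1)     # settles close to the minimum at 3
w_slow = gradient_descent(lr=0.001)   # still far away after 50 steps
w_big = gradient_descent(lr=1.1)      # overshoots further on every step
print(w_good, w_slow, w_big)
```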

3. How is a Random Forest related to Decision Trees?

Random forest is an ensemble learning method that works by constructing a multitude of decision trees. A random forest can be constructed for both classification and regression tasks.
Random forest generally outperforms a single decision tree, and it is also less prone to overfitting the data than decision trees are.
A decision tree trained on a specific dataset can become very deep and overfit. To create a random forest, decision trees are trained on different subsets of the training dataset, and their predictions are then averaged with the goal of decreasing the variance.

4. What are some disadvantages of using Naive Bayes Algorithm?

Some disadvantages of using Naive Bayes Algorithm are:
It relies on the very strong assumption that the independent variables are not related to each other.
It is generally not suitable for datasets with large numbers of numerical attributes.
If a category value is absent from the training dataset but appears in the testing dataset, the model assigns it zero probability (the zero-frequency problem) unless a smoothing technique is applied.

————————————————————-


Stay Safe & Happy Learning💙
Date - 20/01/2024
Company Name - Bridgei2i Analytics
Role - Data Scientist


Q. Why does overfitting occur?

A. Overfitting happens when a model learns the detail and noise in the training data to the extent that it negatively impacts the performance of the model on new data. This means that the noise or random fluctuations in the training data are picked up and learned as concepts by the model.


Q. What is ensemble learning?

A. Ensemble learning is the process by which multiple models, such as classifiers or experts, are strategically generated and combined to solve a particular computational intelligence problem. Ensemble learning is primarily used to improve the (classification, prediction, function approximation, etc.) performance of a model, or reduce the likelihood of an unfortunate selection of a poor one.


Q. What is F1 score?

A. The F1 score is defined as the harmonic mean of precision and recall. As a short reminder, the harmonic mean is an alternative to the more common arithmetic mean and is often useful when averaging rates. In the F1 score, we compute the harmonic mean of precision and recall: F1 = 2 * (precision * recall) / (precision + recall).
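
Written out, F1 = 2 * precision * recall / (precision + recall). The sketch below computes it from raw counts; the counts are made-up, and a score of 0.0 is returned when there are no true positives, which is one common convention.

```python
def f1_score(tp, fp, fn):
    """F1 from true-positive, false-positive and false-negative counts."""
    if tp == 0:
        return 0.0  # no true positives: treat precision, recall and F1 as 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts: precision = 0.8, recall = 0.5.
f1 = f1_score(tp=8, fp=2, fn=8)
print(round(f1, 4))  # lower than the arithmetic mean of 0.65
```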


Q. What is pickling and unpickling?

A.“Pickling” is the process whereby a Python object hierarchy is converted into a byte stream, and “unpickling” is the inverse operation, whereby a byte stream (from a binary file or bytes-like object) is converted back into an object hierarchy.
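
A round trip with the standard pickle module looks like this. The dictionary is a made-up example object; note that unpickling data from an untrusted source is unsafe.

```python
import pickle

record = {"name": "model-v1", "weights": [0.1, 0.2, 0.3]}  # hypothetical object

blob = pickle.dumps(record)     # pickling: object hierarchy -> byte stream
restored = pickle.loads(blob)   # unpickling: byte stream -> object hierarchy

print(type(blob).__name__, restored == record)
```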


Q. What is lambda function?

A.  Python lambda functions are anonymous functions, meaning functions without a name. Just as the def keyword is used to define a normal function in Python, the lambda keyword is used to define an anonymous function.
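
A short comparison of the two forms. The names and the sample list are made up; lambdas are most often passed inline, for example as a sort key.

```python
# A normal function defined with def.
def square_def(x):
    return x * x

# The same logic as a lambda, bound to a name only for comparison.
square_lambda = lambda x: x * x

# Typical use: an inline key function for sorting by the second element.
pairs = [("b", 2), ("a", 3), ("c", 1)]
by_count = sorted(pairs, key=lambda pair: pair[1])

print(square_def(4), square_lambda(4), by_count)
```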


Q.  What is the trade-off between bias and variance?

A. Bias is the error from the simplifying assumptions made by the model to make the target function easier to approximate. Variance is the amount that the estimate of the target function will change given different training data. The trade-off is the tension between the error introduced by the bias and the error introduced by the variance: reducing one tends to increase the other.
Java Basic Programs.pdf
3.4 MB
Linux Helpful Commands List (2).pdf
458.1 KB
Date: 27-01-2024
Company name: Infosys
Role: ML Engineer
Topic: gradient descent, random forest, kmean vs knn, svm

1. Explain how gradient descent works in linear regression.

Gradient descent starts with random values for each coefficient in the linear regression model.
After this, the sum of the squared errors (the loss function) is calculated over all pairs of input and output values.
In each iteration, the coefficients are updated in the direction that reduces the error, with the learning rate acting as a scale factor on the size of the update.
We keep repeating this process until a minimum sum of squared errors is achieved or no further improvement is possible.
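
Here is a rough sketch of that loop for a single input variable. A few choices are assumptions made for repeatability: the coefficients start at zero instead of random values, the mean rather than the sum of squared errors is used, and the loop runs for a fixed number of epochs instead of checking for convergence. The data is made up and follows y = 2x + 1 exactly.

```python
def fit_line(xs, ys, lr=0.05, epochs=2000):
    """Fit y = w*x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = (w * x + b) - y
            grad_w += 2 * error * x / n   # d(MSE)/dw
            grad_b += 2 * error / n       # d(MSE)/db
        # Step against the gradient, scaled by the learning rate.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]   # generated from y = 2x + 1
w, b = fit_line(xs, ys)
print(round(w, 4), round(b, 4))
```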


2. What does Random refer to in Random Forest?

Random forest is an extension of the bagging method as it utilizes both bagging and feature randomness to create an uncorrelated forest of decision trees. Hence, random forest is Random in the following ways:
At each split, only a random subset of the features is considered, which ensures low correlation among the decision trees.
Each tree in the forest is trained on a bootstrap sample drawn at random, with replacement, from the original dataset; such a sample contains roughly two-thirds of the distinct training points.

3. What is the main difference between k-Means and k-Nearest Neighbours?

k-Means is a clustering algorithm that tries to partition a set of points into k sets such that the points in each cluster tend to be near each other. It is unsupervised because the points have no external classification.

k-Nearest Neighbors is a classification (or regression) algorithm that, in order to determine the classification of a point, combines the classification of the k nearest points. It is supervised because it is trying to classify a point based on the known classification of other points.


4. What are Support Vectors in SVMs?

Support vectors are the data points nearest to the hyperplane, the points of a data set that, if removed, would alter the position of the dividing hyperplane.
Using these support vectors, we maximize the margin of the classifier.
For computing predictions, only the support vectors are used.

————————————————————-



Stay Safe & Happy Learning💙
Today's Interview QnAs
Date - 25/01/2024
Company - The Math Company
Role- Data Analyst


1. How to create filters in Power BI?

Filters are an integral part of Power BI reports. They are used to slice and dice the data as per the dimensions we want. Filters are created in a couple of ways.

Using Slicers: A slicer is a visual under the Visualization Pane. It can be added to the design view to filter our reports. When a slicer is added to the design view, it requires a field to be added to it. For example, a slicer can be added for the Country field, and the data can then be filtered by country.
Using Filter Pane: The filter pane is a single space in the report where we can add different fields as filters. These fields can be added depending on whether you want to filter only one visual (visual-level filter), all the visuals on the report page (page-level filter), or all the pages of the report (report-level filter).


2. How to sort data in Power BI?

Sorting is available in multiple formats. In the data view, a common sorting option of alphabetical order is there. Apart from that, we have the option of Sort by column, where one can sort a column based on another column. The sorting option is available in visuals as well. Sort by ascending and descending option by the fields and measure present in the visual is also available.


3. How to convert pdf to excel?

Open the PDF document you want to convert in XLSX format in Acrobat DC.
Go to the right pane and click on the “Export PDF” option.
Choose spreadsheet as the Export format.
Select “Microsoft Excel Workbook.”
Now click “Export.”
Download the converted file or share it.


4. How to enable macros in excel?

Click the File tab and then click “Options.”
A dialog box will appear. In the “Excel Options” dialog box, click on the “Trust Center” and then “Trust Center Settings.”
Go to the “Macro Settings” and select “enable all macros.”
Click OK to apply the macro settings.
Javascript Notes.pdf
26.8 MB