Coding Interview ⛥
This channel contains the free resources and solution of coding problems which are usually asked in the interviews.
Date - 20/01/2024
Company Name - Bridgei2i Analytics
Role - Data Scientist


Q. Why does overfitting occur?

A. Overfitting happens when a model learns the detail and noise in the training data so closely that its performance on new data suffers. The model picks up random fluctuations in the training data and treats them as real patterns, which do not generalize to unseen data.


Q. What is ensemble learning?

A. Ensemble learning is the process by which multiple models, such as classifiers or experts, are strategically generated and combined to solve a particular computational intelligence problem. Ensemble learning is primarily used to improve the (classification, prediction, function approximation, etc.) performance of a model, or reduce the likelihood of an unfortunate selection of a poor one.
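
One simple way to combine models is majority voting. The sketch below is only an illustration: the three "models" are hypothetical threshold rules made up for this example, not trained classifiers.

```python
from collections import Counter

# Three hypothetical weak classifiers (illustrative threshold rules, not trained models).
def model_a(x):
    return 1 if x > 3 else 0

def model_b(x):
    return 1 if x > 5 else 0

def model_c(x):
    return 1 if x > 7 else 0

def majority_vote(models, x):
    """Return the label predicted by most of the models."""
    votes = Counter(m(x) for m in models)
    return votes.most_common(1)[0][0]

models = [model_a, model_b, model_c]
predictions = [majority_vote(models, x) for x in (2, 4, 6, 8)]
print(predictions)
```

With an odd number of binary voters there are no ties, so a single poor model is outvoted whenever the other two agree.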


Q. What is the F1 score?

A. The F1 score is the harmonic mean of precision and recall: F1 = 2 × (precision × recall) / (precision + recall). The harmonic mean is an alternative to the more common arithmetic mean and is often useful when averaging rates. Because it is dominated by the smaller of the two values, F1 is high only when both precision and recall are high.
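
A small sketch of the calculation, using made-up confusion-matrix counts (tp, fp, fn are illustrative values, not from any real model):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; returns 0.0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts (made up for illustration).
tp, fp, fn = 8, 2, 8
precision = tp / (tp + fp)   # 8 / 10
recall = tp / (tp + fn)      # 8 / 16
f1 = f1_score(precision, recall)
arithmetic_mean = (precision + recall) / 2
print(round(f1, 4), round(arithmetic_mean, 4))
```

The harmonic mean comes out below the arithmetic mean here because recall is much lower than precision.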


Q. What is pickling and unpickling?

A. “Pickling” is the process whereby a Python object hierarchy is converted into a byte stream, and “unpickling” is the inverse operation, whereby a byte stream (from a binary file or bytes-like object) is converted back into an object hierarchy.
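
A minimal round trip with the standard pickle module; the record dictionary is a made-up example object:

```python
import pickle

# A small nested object hierarchy (made up for illustration).
record = {"name": "model-v1", "weights": [0.1, 0.2, 0.3], "meta": {"epochs": 5}}

payload = pickle.dumps(record)      # pickling: object -> bytes
restored = pickle.loads(payload)    # unpickling: bytes -> object

# The restored object is equal to the original but is a separate copy.
print(type(payload).__name__, restored == record, restored is record)
```

Only unpickle data from a source you trust, since loading a pickle can execute arbitrary code.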


Q. What is a lambda function?

A. A Python lambda function is an anonymous function, that is, a function defined without a name. Just as the def keyword defines a normal named function, the lambda keyword defines an anonymous one. A lambda is limited to a single expression, whose value it returns.
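
A short comparison of def and lambda; the names and the word list are made up for illustration:

```python
# A normal named function defined with def.
def square(x):
    return x * x

# The same logic as an anonymous function (bound to a name here only for comparison).
square_lambda = lambda x: x * x

# A typical use: a short throwaway key function.
words = ["banana", "fig", "cherry"]
by_length = sorted(words, key=lambda w: len(w))
print(square(4), square_lambda(4), by_length)
```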


Q. What is the trade-off between bias and variance?

A. Bias is the error introduced by the simplifying assumptions a model makes so that the target function is easier to approximate. Variance is the amount by which the estimate of the target function changes when different training data is used. The trade-off is the tension between the two: making a model more flexible usually lowers bias but raises variance, and making it simpler does the opposite.
Java Basic Programs.pdf
3.4 MB
Linux Helpful Commands List (2).pdf
458.1 KB
Date: 27-01-2024
Company name: Infosys
Role: ML Engineer
Topic: gradient descent, random forest, kmean vs knn, svm

1. Explain how Gradient Descent works in Linear Regression

Gradient Descent starts with random (or zero) values for each coefficient in the linear regression model.
The loss function, the sum of the squared errors over all pairs of input and output values, is then calculated for the current coefficients.
In each iteration, the coefficients are updated in the direction that reduces the error (the negative gradient of the loss), with the learning rate scaling the size of the step.
The iterations are repeated until the sum of squared errors reaches a minimum or no further improvement is possible.
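
The steps above can be sketched in plain Python. This is only an illustration: the data, the learning rate, and the iteration count are assumptions chosen for the example, and the coefficients start at zero rather than at random values so the run is reproducible.

```python
# Toy data generated from y = 2x + 1 (made up for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

def sse(w, b):
    """Sum of squared errors for slope w and intercept b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys))

def gradient_descent(learning_rate=0.01, iterations=5000):
    w, b = 0.0, 0.0  # fixed start instead of random, for reproducibility
    for _ in range(iterations):
        # Partial derivatives of the SSE with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys))
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys))
        # Step against the gradient; the learning rate scales the step size.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

w, b = gradient_descent()
print(round(w, 4), round(b, 4), sse(w, b))
```

A learning rate that is too large makes the updates overshoot and diverge, while one that is too small makes convergence slow.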


2. What does Random refer to in Random Forest?

Random forest is an extension of the bagging method: it uses both bagging and feature randomness to create a forest of largely uncorrelated decision trees. Hence, random forest is random in the following ways:
At each split, only a random subset of the features is considered, which keeps the correlation among the decision trees low.
Each tree is trained on a bootstrap sample drawn at random, with replacement, from the original dataset. Such a sample contains about 63% (roughly 2/3) of the distinct training points.
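
Both kinds of randomness can be simulated with the standard library. This is a sketch, not a forest implementation: the dataset size, the feature count, and the square-root subset size are assumptions for illustration.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

n_samples = 10_000
n_features = 16

# Row randomness: a bootstrap sample draws n_samples indices with replacement.
bootstrap = [random.randrange(n_samples) for _ in range(n_samples)]
unique_fraction = len(set(bootstrap)) / n_samples

# Feature randomness: only a random subset of features is considered at a split.
# sqrt(n_features) is a common choice for classification, assumed here.
subset_size = int(n_features ** 0.5)
feature_subset = random.sample(range(n_features), subset_size)

print(round(unique_fraction, 3), sorted(feature_subset))
```

The fraction of distinct rows lands near 1 - 1/e, which is where the "about two thirds" figure comes from.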

3. What is the main difference between k-Means and k-Nearest Neighbours?

k-Means is a clustering algorithm that tries to partition a set of points into k sets such that the points in each cluster tend to be near each other. It is unsupervised because the points have no external classification.

k-Nearest Neighbors is a classification (or regression) algorithm that, in order to determine the classification of a point, combines the classification of the k nearest points. It is supervised because it is trying to classify a point based on the known classification of other points.
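
A minimal k-Nearest Neighbours classifier, to show where the labels come in. The points and labels are made up; k-means run on the same points would ignore the labels entirely.

```python
from collections import Counter

# Labeled toy points (made up): kNN needs these labels, k-means would not use them.
training = [((1.0, 1.0), "a"), ((1.5, 2.0), "a"), ((2.0, 1.5), "a"),
            ((6.0, 6.0), "b"), ((6.5, 7.0), "b"), ((7.0, 6.5), "b")]

def knn_predict(point, data, k=3):
    """Classify `point` by majority label among its k nearest neighbours."""
    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    nearest = sorted(data, key=lambda item: sq_dist(point, item[0]))[:k]
    labels = Counter(label for _, label in nearest)
    return labels.most_common(1)[0][0]

print(knn_predict((2.0, 2.0), training), knn_predict((6.0, 7.0), training))
```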


4. What are Support Vectors in SVMs?

Support vectors are the data points nearest to the hyperplane, the points of a data set that, if removed, would alter the position of the dividing hyperplane.
Using these support vectors, we maximize the margin of the classifier.
For computing predictions, only the support vectors are used.

————————————————————-



Stay Safe & Happy Learning💙
Today's Interview QnAs
Date - 25/01/2024
Company - The Math Company
Role- Data Analyst


1. How to create filters in Power BI?

Filters are an integral part of Power BI reports. They are used to slice and dice the data as per the dimensions we want. Filters are created in a couple of ways.

Using Slicers: A slicer is a visual under Visualization Pane. This can be added to the design view to filter our reports. When a slicer is added to the design view, it requires a field to be added to it. For example- Slicer can be added for Country fields. Then the data can be filtered based on countries.
Using Filter Pane: Power BI reports include a filter pane, a single space where different fields can be added as filters. The fields can be added depending on whether you want to filter only one visual (visual-level filter), all the visuals on the report page (page-level filter), or all the pages of the report (report-level filter).


2. How to sort data in Power BI?

Sorting is available in several places. In the data view, a column can be sorted in ascending or descending (for text, alphabetical) order. The Sort by column option sorts one column based on the values of another column. Sorting is also available in visuals, where data can be sorted in ascending or descending order by any field or measure present in the visual.


3. How to convert PDF to Excel?

Open the PDF document you want to convert to XLSX format in Acrobat DC.
Go to the right pane and click on the “Export PDF” option.
Choose spreadsheet as the Export format.
Select “Microsoft Excel Workbook.”
Now click “Export.”
Download the converted file or share it.


4. How to enable macros in Excel?

Click the File tab and then click “Options.”
A dialog box will appear. In the “Excel Options” dialog box, click “Trust Center” and then “Trust Center Settings.”
Go to “Macro Settings” and select “Enable all macros.”
Click OK to apply the macro settings.
Javascript Notes.pdf
26.8 MB
🔺CLOUD COMPUTING☁️🔺 TUTORIAL SHORT NOTES.pdf
4 MB
Date: 06-02-2024
Company name: Elica
Role: Data Scientist
Topic: kmeans, KNN, LSTM, powerbi, SQL

1. Explain some cases where k-Means clustering fails to give good results

k-means has trouble clustering data where clusters are of various sizes and densities. Outliers can drag the centroids, or an outlier might get its own cluster instead of being ignored, so outliers should be clipped or removed before clustering. As the number of dimensions increases, a distance-based similarity measure converges to a nearly constant value between any given examples, so the dimensionality should be reduced before clustering.
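
The outlier problem can be seen in a tiny one-dimensional sketch. This is a simplified Lloyd's algorithm with k fixed at 2 and centroids initialised at the minimum and maximum value, an assumption made to keep the run deterministic; the data is made up.

```python
def kmeans_1d(points, iterations=20):
    """Simplified k-means for k=2 in one dimension."""
    centroids = [min(points), max(points)]  # deterministic start (an assumption)
    clusters = [[], []]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[], []]
        for p in points:
            nearest = min(range(2), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        new_centroids = [sum(c) / len(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

clean = [1, 2, 3, 10, 11, 12]
with_outlier = clean + [100]

clean_centroids, clean_clusters = kmeans_1d(clean)
outlier_centroids, outlier_clusters = kmeans_1d(with_outlier)
print(clean_centroids, clean_clusters)
print(outlier_centroids, outlier_clusters)
```

Without the outlier, the two natural groups are found. With it, the single value 100 takes a cluster for itself and the two real groups are merged into one.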

2. If your Time-Series Dataset is very long, what architecture would you use?

If the time-series dataset is very long, LSTMs are a common choice because they can process not only single data points but also entire sequences of data, and a time series is exactly such a sequence. For stronger representational capacity, the LSTMs can be stacked into multiple layers. Another method for long time-series datasets is to use CNNs to extract information.

3. How would you define Power BI as an effective solution ?

Power BI is a strong business analytics tool that creates useful insights and reports by collating data from unrelated sources. This data can be extracted from many kinds of sources, such as Microsoft Excel or hybrid data warehouses. Its interactive graphical interface and visualizations make the tool useful for exploring and presenting that data.

4. Why is the KNN Algorithm known as Lazy Learner?

When the KNN algorithm gets the training data, it does not learn a model; it just stores the data. Instead of deriving a discriminative function from the training data, it follows instance-based learning and uses the stored training data only when it actually needs to make a prediction on unseen data. Because KNN delays the learning until prediction time, it is referred to as a Lazy Learner.

5. Explain the difference between drop and truncate.

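In SQL, the DROP command removes an entire object, such as a database or a table, together with its data, indexes, and definition. The TRUNCATE command, by contrast, removes all the rows from a table but keeps the table structure, so the table can still be used afterwards.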
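
The difference can be tried out with Python's built-in sqlite3 module. One caveat: SQLite has no TRUNCATE statement, so the sketch uses an unqualified DELETE as a stand-in for it. The table and its rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO employees (name) VALUES (?)", [("Ann",), ("Bob",)])
conn.commit()

# Stand-in for TRUNCATE (SQLite lacks it): remove every row, keep the table.
conn.execute("DELETE FROM employees")
conn.commit()
rows_after_truncate = conn.execute("SELECT COUNT(*) FROM employees").fetchall()[0][0]

# DROP removes the table definition itself.
conn.execute("DROP TABLE employees")
try:
    conn.execute("SELECT COUNT(*) FROM employees")
    table_exists_after_drop = True
except sqlite3.OperationalError:
    table_exists_after_drop = False

conn.close()
print(rows_after_truncate, table_exists_after_drop)
```

After the stand-in truncate the table can still be queried and is empty. After the drop, the same query fails because the table no longer exists.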

————————————————————-


Stay Safe & Happy Learning💙
Power BI and Microsoft Fabric are two different products offered by Microsoft, each serving different purposes:

1. Power BI:
   - Power BI is a business analytics tool that enables users to visualize and analyze data from various sources.
   - It allows users to create interactive reports, dashboards, and data visualizations to gain insights and make data-driven decisions.
   - Power BI offers features for data preparation, data modeling, data visualization, and collaboration.
   - Users can connect to a wide range of data sources, such as Excel files, databases, online services, and cloud platforms.
   - Power BI is widely used for business intelligence, data analysis, and reporting purposes.

2. Microsoft Fabric:
   - Microsoft Fabric is an end-to-end analytics platform from Microsoft, delivered as a service. It should not be confused with Office UI Fabric, the former name of the Fluent UI design toolkit.
   - It brings data integration, data engineering, data warehousing, data science, real-time analytics, and business intelligence together in one product.
   - Fabric stores data in OneLake, a single data lake shared by all of its workloads.
   - Power BI is one of the workloads inside Fabric and serves as its reporting and visualization layer.
   - Microsoft Fabric is commonly used by data engineers, data scientists, and analysts who need to work on the same data from ingestion through to reporting.

In summary, Power BI is a business analytics tool for data visualization and analysis, while Microsoft Fabric is a broader analytics platform that covers the whole data pipeline and includes Power BI. They serve different purposes within the Microsoft data ecosystem.
👇🏻Data Engineering Interview QnA👇🏻

➡️Can you discuss the pros and cons of using Hadoop for real-time processing?

Hadoop was originally designed for batch processing of large datasets and may not be suitable for real-time processing due to its high latency and limited support for streaming data. However, newer technologies such as Apache Spark and Apache Flink provide faster processing and better support for streaming data. Hadoop's strengths lie in its scalability, fault tolerance, and ability to handle large datasets efficiently.


➡️How do you handle input data that is not in the expected format in a MapReduce job?

Handling input data that is not in the expected format can be challenging in a MapReduce job. One approach is to use a custom InputFormat that can parse the data into the desired format before processing it. Another approach is to use a custom mapper that can transform the input data into the expected format. If the input data cannot be transformed easily, it may be necessary to preprocess the data outside of Hadoop before ingesting it into the Hadoop cluster.
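
The custom-mapper approach might look roughly like the sketch below, written in the style of a Hadoop Streaming mapper. The 'user_id,amount' record format and the function names are hypothetical, chosen only for illustration.

```python
def parse_record(line):
    """Parse 'user_id,amount' (a hypothetical format); return None if malformed."""
    parts = line.strip().split(",")
    if len(parts) != 2:
        return None
    user_id, amount = parts
    try:
        return user_id, float(amount)
    except ValueError:
        return None

def mapper(lines):
    """Collect (key, value) pairs for good records and count the bad ones."""
    bad = 0
    pairs = []
    for line in lines:
        record = parse_record(line)
        if record is None:
            bad += 1  # a real job would typically report this through a counter
            continue
        pairs.append(record)
    return pairs, bad

raw = ["u1,10.5", "garbage line", "u2,abc", "u3,7"]
pairs, bad = mapper(raw)
print(pairs, bad)
```

Skipping and counting bad records keeps one malformed line from failing the whole job, while the count shows how much data was dropped.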


➡️Explain The Five Vs of Big Data.

The five Vs of Big Data are:
Volume – the amount of data, which can reach petabytes and exabytes
Variety – the range of formats, such as videos, audio sources, textual data, etc.
Velocity – the speed at which data grows every day, including conversations in forums, blogs, social media posts, etc.
Veracity – the degree of accuracy of the available data
Value – the insights derived from collected data to achieve business milestones
Date - 14-02-2024
Company name: Elica
Role: ML Engineer
Topic: PCA, SGD, decision tree, ARM

1. What is the difference between stochastic gradient descent (SGD) and gradient descent (GD)?

Gradient Descent and Stochastic Gradient Descent are the algorithms that find the set of parameters that will minimize a loss function.
The difference is that in Gradient Descent, all training samples are evaluated to compute the gradient for each update of the parameters. In Stochastic Gradient Descent, only one randomly chosen training sample is evaluated for each update, which makes each step cheaper but noisier.
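
A side-by-side sketch of the two update rules on a one-parameter model. The data, learning rate, and step count are assumptions chosen for illustration.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Toy data from y = 3x (made up), fitted with a single weight w.
data = [(x, 3.0 * x) for x in (1.0, 2.0, 3.0, 4.0)]

def batch_gd(steps=200, lr=0.01):
    w = 0.0
    for _ in range(steps):
        # Gradient of the mean squared error over ALL samples.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def sgd(steps=200, lr=0.01):
    w = 0.0
    for _ in range(steps):
        # Gradient of the squared error on ONE randomly chosen sample.
        x, y = random.choice(data)
        w -= lr * 2 * (w * x - y) * x
    return w

w_batch = batch_gd()
w_sgd = sgd()
print(round(w_batch, 4), round(w_sgd, 4))
```

Both reach the same weight on this noise-free data. On real, noisy data the SGD path fluctuates around the minimum, which is why the learning rate is usually decayed.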


2. Can you mention some advantages and disadvantages of decision trees?

The advantages of decision trees are that they are easy to interpret, are nonparametric and fairly robust to outliers, and have relatively few parameters to tune.
On the other hand, the main disadvantage is that they are prone to overfitting.


3. What do you mean by Association Rule Mining (ARM)?

Association Rule Mining is a technique for discovering patterns in data, such as features (dimensions) that occur together and features (dimensions) that are correlated. It is mostly used in Market Basket Analysis to find how frequently an itemset occurs in transactions. Association rules have to satisfy a minimum support and a minimum confidence at the same time.
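
Support and confidence can be computed directly for a small example. The transactions and the two thresholds below are made up for illustration.

```python
# Toy shopping baskets (made up for illustration).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]

def support(itemset):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """How often the consequent appears in transactions containing the antecedent."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)

# Rule under test: bread -> milk
rule_support = support({"bread", "milk"})
rule_confidence = confidence({"bread"}, {"milk"})

min_support, min_confidence = 0.5, 0.7  # assumed thresholds
rule_passes = rule_support >= min_support and rule_confidence >= min_confidence
print(rule_support, round(rule_confidence, 2), rule_passes)
```

The rule is kept only because it clears both thresholds. A rule with high confidence but very low support would be discarded.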


4. What is Principal Component Analysis?

The idea is to reduce the dimensionality of the data set by reducing the number of variables that are correlated with each other, while retaining as much of the variation as possible.
The variables are transformed into a new set of variables known as Principal Components (PCs). These PCs are the eigenvectors of the covariance matrix and are therefore orthogonal.
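
For two variables, the first principal component can be found with the standard library alone. This sketch uses power iteration to get the leading eigenvector of the covariance matrix, one of several ways to compute it; the data points are made up.

```python
import math

# Toy 2-D data lying roughly along the line y = x (made up for illustration).
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.0)]

def covariance_matrix(data):
    """Sample covariance matrix of 2-D data."""
    n = len(data)
    mean_x = sum(p[0] for p in data) / n
    mean_y = sum(p[1] for p in data) / n
    sxx = sum((p[0] - mean_x) ** 2 for p in data) / (n - 1)
    syy = sum((p[1] - mean_y) ** 2 for p in data) / (n - 1)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in data) / (n - 1)
    return [[sxx, sxy], [sxy, syy]]

def top_eigenvector(matrix, iterations=100):
    """Power iteration: converges to the eigenvector with the largest eigenvalue."""
    v = [1.0, 0.0]
    for _ in range(iterations):
        w = [matrix[0][0] * v[0] + matrix[0][1] * v[1],
             matrix[1][0] * v[0] + matrix[1][1] * v[1]]
        norm = math.hypot(w[0], w[1])
        v = [w[0] / norm, w[1] / norm]
    return v

cov = covariance_matrix(points)
pc1 = top_eigenvector(cov)
pc2 = [-pc1[1], pc1[0]]  # in 2-D the second component is the perpendicular direction
print([round(c, 3) for c in pc1], [round(c, 3) for c in pc2])
```

The first component points close to the diagonal, the direction in which the data varies most. Projecting onto it alone would reduce the data from two dimensions to one.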

Stay Safe & Happy Learning💙
DRY (Don't Repeat Yourself): This principle emphasizes that every piece of knowledge or logic should have a single, unambiguous representation within a system. Duplication in code leads to maintenance overhead, increases the risk of inconsistencies, and makes changes more difficult.
2nd principle of software design

KISS (Keep It Simple, Stupid): Simple solutions are easier to understand, maintain, and extend. This principle encourages avoiding unnecessary complexity and favoring straightforward, easy-to-understand designs.