Coding Interview ⛥
This channel contains free resources and solutions to coding problems commonly asked in interviews.
Today's Interview QnAs
Date - 25/01/2024
Company - The Math Company
Role- Data Analyst


1. How to create filters in Power BI?

Filters are an integral part of Power BI reports. They are used to slice and dice the data along the dimensions we want. Filters can be created in a couple of ways.

Using Slicers: A slicer is a visual in the Visualization pane. It can be added to the design view to filter a report. When a slicer is added to the design view, it requires a field. For example, a slicer can be added for a Country field, and the data can then be filtered by country.
Using the Filter Pane: The Power BI team added a filter pane to reports, a single place where different fields can be added as filters. These fields can be applied to only one visual (visual-level filters), to all visuals on the report page (page-level filters), or to all pages of the report (report-level filters).


2. How to sort data in Power BI?

Sorting is available in multiple places. The data view offers a common alphabetical sort. Apart from that, there is the Sort by Column option, where one column can be sorted based on another. Sorting is also available inside visuals, with ascending and descending options for the fields and measures present in the visual.


3. How to convert a PDF to Excel?

Open the PDF document you want to convert to XLSX format in Acrobat DC.
Go to the right pane and click the “Export PDF” option.
Choose “Spreadsheet” as the export format.
Select “Microsoft Excel Workbook.”
Now click “Export.”
Download the converted file or share it.


4. How to enable macros in Excel?

Click the File tab and then click “Options.”
In the “Excel Options” dialog box, click “Trust Center” and then “Trust Center Settings.”
Go to “Macro Settings” and select “Enable all macros.”
Click OK to apply the macro settings.
Javascript Notes.pdf (26.8 MB)
🔺CLOUD COMPUTING☁️🔺 TUTORIAL SHORT NOTES.pdf (4 MB)
Date: 06-02-2024
Company name: Elica
Role: Data Scientist
Topic: k-means, KNN, LSTM, Power BI, SQL

1. Explain some cases where k-Means clustering fails to give good results

k-means has trouble clustering data where clusters are of various sizes and densities. Outliers will drag the centroids, or may end up in their own cluster instead of being ignored, so outliers should be clipped or removed before clustering. As the number of dimensions increases, a distance-based similarity measure converges to a constant value between any pair of examples, so dimensionality should be reduced before clustering.
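
A minimal sketch of the unequal-size-and-density failure mode, assuming scikit-learn is available (the blob positions and spreads are illustrative):

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs
    from sklearn.metrics import adjusted_rand_score

    # One dense blob and two sparse ones: sizes and densities differ
    X, y_true = make_blobs(n_samples=[1000, 100, 100],
                           centers=[(0, 0), (5, 5), (9, 0)],
                           cluster_std=[0.4, 2.5, 2.5],
                           random_state=42)

    labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)
    # Agreement with the true labels typically lands below 1.0, reflecting
    # the assignment errors caused by the unequal sizes and densities
    print("ARI:", adjusted_rand_score(y_true, labels))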

2. If your Time-Series Dataset is very long, what architecture would you use?

If a time-series dataset is very long, LSTMs are ideal because they can process not only single data points but entire sequences of data, and a time series is exactly such a sequence. For even stronger representational capacity, the LSTM can be made multi-layered (stacked). Another option for long time-series datasets is to use CNNs to extract local features.
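
A minimal stacked-LSTM sketch, assuming TensorFlow/Keras is installed (the sequence length, layer sizes, and forecasting head are illustrative choices):

    import tensorflow as tf

    seq_len, n_features = 500, 1  # illustrative: a long univariate series
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(seq_len, n_features)),
        tf.keras.layers.LSTM(64, return_sequences=True),  # pass the full sequence upward
        tf.keras.layers.LSTM(32),                         # second layer: multi-layered capacity
        tf.keras.layers.Dense(1),                         # e.g. next-value forecast
    ])
    model.compile(optimizer="adam", loss="mse")
    model.summary()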

3. How would you define Power BI as an effective solution?

Power BI is a strong business analytics tool that creates useful insights and reports by collating data from unrelated sources. This data can be extracted from almost any source, such as Microsoft Excel or hybrid data warehouses. Power BI delivers a high level of utility and purpose through its interactive graphical interface and visualizations.

4. Why is the KNN Algorithm known as Lazy Learner?

When the KNN algorithm gets the training data, it does not learn and build a model; it just stores the data. Instead of finding a discriminative function from the training data, it follows instance-based learning: it uses the training data only when it actually needs to make a prediction on unseen data. As a result, KNN does not learn a model immediately but delays the learning, which is why it is referred to as a Lazy Learner.
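
This is easy to see with scikit-learn (a sketch, assuming the library is available): fit() is nearly instant because it only stores the data, while the distance computations happen at predict() time.

    from sklearn.datasets import load_iris
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_iris(return_X_y=True)
    knn = KNeighborsClassifier(n_neighbors=5)
    knn.fit(X, y)              # no model is built; the training data is stored
    print(knn.predict(X[:3]))  # neighbors are searched only now, at query time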

5. Explain the difference between drop and truncate.

In SQL, the DROP command removes an entire table (or database), including its structure, indexes, and data. The TRUNCATE command, by contrast, removes all the rows from a table but keeps the table itself.

————————————————————-


Stay Safe & Happy Learning💙
Power BI and Microsoft Fabric are two different products offered by Microsoft, each serving different purposes:

1. Power BI:
   - Power BI is a business analytics tool that enables users to visualize and analyze data from various sources.
   - It allows users to create interactive reports, dashboards, and data visualizations to gain insights and make data-driven decisions.
   - Power BI offers features for data preparation, data modeling, data visualization, and collaboration.
   - Users can connect to a wide range of data sources, such as Excel files, databases, online services, and cloud platforms.
   - Power BI is widely used for business intelligence, data analysis, and reporting purposes.

2. Microsoft Fabric:
   - Microsoft Fabric is an end-to-end analytics platform, delivered as SaaS, that unifies Microsoft's data and analytics workloads.
   - It brings together data integration, data engineering, data warehousing, data science, real-time analytics, and business intelligence in a single product.
   - All Fabric workloads share a unified data lake called OneLake, so data can be stored once and used across every experience.
   - Power BI is included in Fabric as its business intelligence experience, so the two products overlap rather than compete.
   - Microsoft Fabric is commonly used by data teams that want one integrated platform instead of stitching together separate services.

In summary, Power BI is a business analytics tool for data visualization and analysis, while Microsoft Fabric is a broader, unified analytics platform that includes Power BI alongside data engineering, warehousing, and data science workloads within the Microsoft ecosystem.
👇🏻Data Engineering Interview QnA👇🏻

➡️Can you discuss the pros and cons of using Hadoop for real-time processing?

Hadoop was originally designed for batch processing of large datasets and may not be suitable for real-time processing due to its high latency and limited support for streaming data. However, newer technologies such as Apache Spark and Apache Flink provide faster processing and better support for streaming data. Hadoop's strengths lie in its scalability, fault tolerance, and ability to handle large datasets efficiently.


➡️How do you handle input data that is not in the expected format in a MapReduce job?

Handling input data that is not in the expected format can be challenging in a MapReduce job. One approach is to use a custom InputFormat that can parse the data into the desired format before processing it. Another approach is to use a custom mapper that can transform the input data into the expected format. If the input data cannot be transformed easily, it may be necessary to preprocess the data outside of Hadoop before ingesting it into the Hadoop cluster.
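
A sketch of the custom-mapper approach for Hadoop Streaming in Python (the three-column CSV layout and counter names are hypothetical): malformed records are counted and skipped rather than crashing the job.

    #!/usr/bin/env python3
    import sys

    for line in sys.stdin:
        parts = line.rstrip("\n").split(",")
        if len(parts) != 3:
            # Hadoop Streaming counter protocol: reporter lines go to stderr
            sys.stderr.write("reporter:counter:quality,malformed,1\n")
            continue
        user_id, item, amount = parts
        try:
            amount = float(amount)
        except ValueError:
            sys.stderr.write("reporter:counter:quality,bad_amount,1\n")
            continue
        print(f"{item}\t{amount}")  # emit key<TAB>value for the reducer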


➡️Explain the five Vs of Big Data.

The five Vs of Big Data are:
Volume – the amount of data, on the order of petabytes and exabytes
Variety – the range of formats: video, audio, textual data, etc.
Velocity – the speed of everyday data growth, which includes conversations in forums, blogs, social media posts, etc.
Veracity – the degree of accuracy of the available data
Value – deriving insights from the collected data to achieve business milestones and new heights
Date - 14-02-2024
Company name: Elica
Role: ML Engineer
Topic: PCA, SGD, decision tree, ARM

1. What is the difference between stochastic gradient descent (SGD) and gradient descent (GD)?

Gradient Descent and Stochastic Gradient Descent are algorithms that find the set of parameters minimizing a loss function.
The difference is that in Gradient Descent, all training samples are evaluated for each parameter update, while in Stochastic Gradient Descent only one training sample is evaluated per update.
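
A NumPy-only sketch of the contrast (the learning rate and data are illustrative): GD makes one update from the full-batch gradient, while SGD makes one update per sample.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=100)
    y = 3 * X + rng.normal(scale=0.1, size=100)  # true slope is 3
    lr = 0.1

    # Gradient Descent: a single update uses ALL training samples
    w = 0.0
    grad = -2 * np.mean((y - w * X) * X)
    w -= lr * grad                      # one exact, full-batch step

    # Stochastic Gradient Descent: one update per individual sample
    w_sgd = 0.0
    for xi, yi in zip(X, y):
        grad_i = -2 * (yi - w_sgd * xi) * xi
        w_sgd -= lr * grad_i            # many small, noisy steps

    print(w, w_sgd)  # one GD step vs. one noisy SGD pass over the data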


2. Can you mention some advantages and disadvantages of decision trees?

The advantages of decision trees are that they are easier to interpret, are nonparametric and hence robust to outliers, and have relatively few parameters to tune.
On the other hand, the disadvantage is that they are prone to overfitting.
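
The overfitting tendency is easy to demonstrate with scikit-learn (a sketch; the dataset is synthetic): an unconstrained tree memorizes the training set, while limiting depth narrows the train/test gap.

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=500, n_features=20, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    deep = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
    shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)

    print(deep.score(X_tr, y_tr), deep.score(X_te, y_te))        # ~1.0 train, lower test
    print(shallow.score(X_tr, y_tr), shallow.score(X_te, y_te))  # smaller gap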


3. What do you mean by Association Rule Mining (ARM)?

Association Rule Mining is a technique for discovering patterns in data, such as features (dimensions) that occur together and features that are correlated. It is mostly used in Market Basket Analysis to find how frequently an itemset occurs in a transaction. Association rules have to satisfy a minimum support and a minimum confidence at the same time.
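
A sketch using the mlxtend library (an assumption; the transactions are made up) that mines rules satisfying both a minimum support and a minimum confidence, as stated above:

    import pandas as pd
    from mlxtend.preprocessing import TransactionEncoder
    from mlxtend.frequent_patterns import apriori, association_rules

    transactions = [["bread", "milk"], ["bread", "butter"],
                    ["milk", "butter"], ["bread", "milk", "butter"]]
    te = TransactionEncoder()
    df = pd.DataFrame(te.fit(transactions).transform(transactions),
                      columns=te.columns_)

    itemsets = apriori(df, min_support=0.5, use_colnames=True)  # minimum support
    rules = association_rules(itemsets, metric="confidence",
                              min_threshold=0.6)                # minimum confidence
    print(rules[["antecedents", "consequents", "support", "confidence"]])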


4. What is Principal Component Analysis?

The idea is to reduce the dimensionality of the dataset by reducing the number of variables that are correlated with each other, while retaining the variation to the maximum extent possible.
The variables are transformed into a new set of variables known as Principal Components. These PCs are the eigenvectors of a covariance matrix and are therefore orthogonal.
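
A NumPy-only sketch of exactly this construction (the data is synthetic): the PCs come out as orthogonal eigenvectors of the covariance matrix, sorted by explained variance.

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 3)) @ np.array([[2.0, 0.0, 0.0],
                                              [0.8, 1.0, 0.0],
                                              [0.0, 0.0, 0.1]])  # correlated data
    Xc = X - X.mean(axis=0)                  # center the data

    cov = np.cov(Xc, rowvar=False)           # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvectors (orthogonal)
    order = np.argsort(eigvals)[::-1]        # sort by variance explained
    components = eigvecs[:, order[:2]]       # keep the top-2 PCs

    X_reduced = Xc @ components              # project onto the PCs
    print(X_reduced.shape)                   # (200, 2)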

Stay Safe & Happy Learning💙
👍1
Two core principles of software design:

1. DRY (Don't Repeat Yourself): This principle emphasizes that every piece of knowledge or logic should have a single, unambiguous representation within a system. Duplication in code leads to maintenance overhead, increases the risk of inconsistencies, and makes changes more difficult.

2. KISS (Keep It Simple, Stupid): Simple solutions are easier to understand, maintain, and extend. This principle encourages avoiding unnecessary complexity and favoring straightforward, easy-to-understand designs.
Guys, kindly add this folder -> https://t.me/addlist/wcoDjKedDTBhNzFl
Date: 15-02-2024
Company name: Walmart
Role: ML Engineer
Topic: Cross validation, bagging, boosting, sampling

1. What is Cross-validation in Machine Learning?

Cross-validation is a resampling method that uses different portions of the data to test and train a model on different iterations. It is mainly used in settings where the goal is prediction and one wants to estimate how accurately a predictive model will perform in practice. The sampling process breaks the dataset into smaller parts with the same number of rows; a random part is selected as the test set and the remaining parts are kept as training sets. Cross-validation consists of the following techniques (a k-fold sketch follows the list):
• Holdout method
• K-fold cross-validation
• Stratified k-fold cross-validation
• Leave p-out cross-validation
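
A minimal k-fold sketch, assuming scikit-learn (the model and dataset are illustrative):

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    X, y = load_iris(return_X_y=True)
    scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
    print(scores, scores.mean())  # one accuracy per fold, then the overall estimate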


2. What is bagging and boosting in Machine Learning?

Bagging combines homogeneous weak learners that are trained independently of each other in parallel, and averages their outputs to determine the final model.

Boosting also combines homogeneous weak learners but works differently from bagging: learners are trained sequentially and adaptively, each one improving on the predictions of the previous ones.
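
A sketch with scikit-learn (an assumption; gradient boosting stands in for the boosting family here): bagged trees are trained independently, boosted trees sequentially.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=500, random_state=0)

    # Bagging: independent trees in parallel, predictions combined by voting
    bagging = BaggingClassifier(n_estimators=50, random_state=0)
    # Boosting: each tree fits the errors left by the previous ones
    boosting = GradientBoostingClassifier(n_estimators=50, random_state=0)

    print(cross_val_score(bagging, X, y, cv=5).mean())
    print(cross_val_score(boosting, X, y, cv=5).mean())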


3. What are systematic sampling and cluster sampling?

Systematic sampling is a type of probability sampling method. The sample members are selected from a larger population with a random starting point but a fixed periodic interval. This interval is known as the sampling interval. The sampling interval is calculated by dividing the population size by the desired sample size.

Cluster sampling involves dividing the sample population into separate groups, called clusters. Then, a simple random sample of clusters is selected from the population. Analysis is conducted on data from the sampled clusters.
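
A NumPy-only sketch of both schemes (the population size, sample size, and cluster layout are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    population = np.arange(1000)              # N = 1000

    # Systematic sampling: random start, then every k-th member
    n = 50
    k = len(population) // n                  # sampling interval = N / n = 20
    start = rng.integers(0, k)                # random starting point
    systematic_sample = population[start::k]

    # Cluster sampling: pick whole clusters, keep all of their members
    clusters = population.reshape(100, 10)    # 100 clusters of 10
    chosen = rng.choice(100, size=5, replace=False)
    cluster_sample = clusters[chosen].ravel()

    print(len(systematic_sample), len(cluster_sample))  # 50 and 50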

4. What is market basket analysis?

Market Basket Analysis is a modeling technique based upon the theory that if you buy a certain group of items, you are more (or less) likely to buy another group of items.
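
A plain-Python sketch of the underlying arithmetic for one hypothetical rule, {bread} -> {milk}:

    transactions = [{"bread", "milk"}, {"bread", "butter"},
                    {"milk", "butter"}, {"bread", "milk", "butter"}]

    n = len(transactions)
    support_bread = sum("bread" in t for t in transactions) / n           # 0.75
    support_both = sum({"bread", "milk"} <= t for t in transactions) / n  # 0.50
    confidence = support_both / support_bread                             # ~0.67

    # Half the baskets contain both items, and two of the three bread
    # buyers also bought milk: buying bread raises the chance of milk.
    print(f"support={support_both:.2f}, confidence={confidence:.2f}")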
Answers for this👀

🔺Basic SQL Concepts:
SQL vs NoSQL: SQL is relational, structured, and uses a predefined schema. NoSQL is non-relational, flexible, and schema-less.
Common Data Types: Examples include INT, VARCHAR, DATE, and BOOLEAN.

🔺Querying:
Retrieve all records from "Customers": SELECT * FROM Customers;
SELECT vs SELECT DISTINCT: SELECT returns every matching row, including duplicates, while SELECT DISTINCT returns only unique values.
WHERE clause: Filters data based on specified conditions.

🔺Joins:
Types of Joins: INNER JOIN, LEFT JOIN, RIGHT JOIN, FULL JOIN.
INNER JOIN example: SELECT * FROM Table1 INNER JOIN Table2 ON Table1.ID = Table2.ID;

🔺Aggregate Functions:
Aggregate Functions: Examples include COUNT, AVG, SUM.
Calculate average, sum, count: SELECT AVG(column), SUM(column), COUNT(column) FROM Table;

🔺Grouping and Filtering:
GROUP BY clause: Groups results based on specified columns.
HAVING clause: Filters grouped results.

🔺Subqueries:
Subquery: A query within another query. Example: SELECT column FROM Table WHERE column = (SELECT MAX(column) FROM Table);

🔺Indexes and Optimization:
Importance of Indexes: Improve query performance by speeding up data retrieval.
Optimize slow query: Add indexes, optimize queries, and consider database design.

🔺Normalization and Data Integrity:
Normalization: Organizing data to reduce redundancy and dependency.
Data Integrity: Enforce rules to maintain accuracy and consistency.

🔺Transactions:
SQL Transaction: A sequence of one or more SQL statements treated as a single unit.
ACID properties: Atomicity, Consistency, Isolation, Durability.

🔺Views and Stored Procedures:
Database View: Virtual table based on the result of a SELECT query.
Stored Procedure: Precompiled SQL code stored in the database for reuse.

🔺Advanced SQL:
Recursive SQL query: Used for hierarchical data.
Window Functions: Perform calculations across a set of rows related to the current row.

SQL Cheat Sheet for Data Analysts.pdf (6.8 MB)
Let's start with the Python Learning Series today 💪

Complete Python Topics for Data Analysis

Introduction to Python.

1. Variables, Data Types, and Basic Operations:
   - Variables: In Python, variables are containers for storing data values. For example:
   
     age = 25
     name = "John"
    

   - Data Types: Python supports various data types, including int, float, str, list, tuple, and more. Example:
   
     height = 1.75  # float
     colors = ['red', 'green', 'blue']  # list
    

   - Basic Operations: You can perform basic arithmetic operations:
   
     result = 10 + 5
    

2. Control Structures (If Statements, Loops):
   - If Statements: Conditional statements allow you to make decisions in your code.
   
     age = 18
     if age >= 18:
         print("You are an adult.")
     else:
         print("You are a minor.")
    

   - Loops (For and While): Loops are used for iterating over a sequence (string, list, tuple, dictionary, etc.).
   
     fruits = ['apple', 'banana', 'orange']
     for fruit in fruits:
         print(fruit)
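
   - While Loops: A while loop (mentioned above) repeats as long as its condition holds. A minimal sketch:

     count = 0
     while count < 3:
         print("count is", count)
         count += 1  # the loop stops once the condition becomes False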
    

3. Functions and Modules:
   - Functions: Functions are blocks of reusable code. Example:
   
     def greet(name):
         return f"Hello, {name}!"

     result = greet("Alice")
    

   - Modules: Modules allow you to organize code into separate files. Example:
   
     # mymodule.py
     def multiply(x, y):
         return x * y

     # main script
     import mymodule
     result = mymodule.multiply(3, 4)
    

Understanding these basics is crucial as they lay the foundation for more advanced topics.


Hope it helps :)