Coding Interview ⛥
This channel contains free resources and solutions to coding problems that are commonly asked in interviews.
Power BI and Microsoft Fabric are two different products offered by Microsoft, each serving different purposes:

1. Power BI:
   - Power BI is a business analytics tool that enables users to visualize and analyze data from various sources.
   - It allows users to create interactive reports, dashboards, and data visualizations to gain insights and make data-driven decisions.
   - Power BI offers features for data preparation, data modeling, data visualization, and collaboration.
   - Users can connect to a wide range of data sources, such as Excel files, databases, online services, and cloud platforms.
   - Power BI is widely used for business intelligence, data analysis, and reporting purposes.

2. Microsoft Fabric:
   - Microsoft Fabric is an end-to-end, unified analytics platform delivered as a SaaS offering.
   - It brings together data integration, data engineering, data warehousing, data science, real-time analytics, and business intelligence in a single product.
   - Fabric workloads include Data Factory, Synapse Data Engineering, Synapse Data Science, Synapse Data Warehouse, Real-Time Analytics, and Power BI itself.
   - All workloads are built on OneLake, a single unified data lake, so teams can work on the same copy of the data without moving it between tools.
   - Microsoft Fabric is aimed at covering the full analytics lifecycle, from data ingestion and transformation to reporting. (It should not be confused with the older "Office UI Fabric" design framework, which is now part of Fluent UI.)

In summary, Power BI is a business analytics tool focused on data visualization, analysis, and reporting, while Microsoft Fabric is a broader, unified analytics platform in which Power BI is one workload alongside data engineering, warehousing, and data science capabilities. They serve different scopes within Microsoft's data ecosystem.
👇🏻Data Engineering Interview QnA👇🏻

➡️Can you discuss the pros and cons of using Hadoop for real-time processing?

Hadoop was originally designed for batch processing of large datasets and may not be suitable for real-time processing due to its high latency and limited support for streaming data. However, newer technologies such as Apache Spark and Apache Flink provide faster processing and better support for streaming data. Hadoop's strengths lie in its scalability, fault tolerance, and ability to handle large datasets efficiently.


➡️How do you handle input data that is not in the expected format in a MapReduce job?

Handling input data that is not in the expected format can be challenging in a MapReduce job. One approach is to use a custom InputFormat that can parse the data into the desired format before processing it. Another approach is to use a custom mapper that can transform the input data into the expected format. If the input data cannot be transformed easily, it may be necessary to preprocess the data outside of Hadoop before ingesting it into the Hadoop cluster.
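As an illustration of the custom-mapper approach, here is a minimal Hadoop Streaming mapper sketch in Python; the record layout ("user_id,amount", comma-delimited) is a hypothetical format chosen for the example, and malformed records are counted and skipped instead of failing the job.

#!/usr/bin/env python3
# Mapper: validate each input line before emitting key<TAB>value pairs.
import sys

def report_bad_record():
    # Hadoop Streaming updates counters from lines written to stderr in this format.
    sys.stderr.write("reporter:counter:DataQuality,MalformedRecords,1\n")

for line in sys.stdin:
    fields = line.rstrip("\n").split(",")
    if len(fields) != 2:
        report_bad_record()
        continue
    user_id, amount = fields
    try:
        amount = float(amount)
    except ValueError:
        report_bad_record()
        continue
    print(f"{user_id}\t{amount}")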


➡️Explain The Five Vs of Big Data.

The five Vs of Big Data are –
Volume – the amount of data, often measured in petabytes and exabytes
Variety – the different formats of data, such as videos, audio sources, and textual data
Velocity – the speed at which data grows every day, e.g. conversations in forums, blogs, and social media posts
Veracity – the degree of accuracy and trustworthiness of the available data
Value – deriving insights from the collected data to achieve business milestones and new heights
Date - 14-02-2024
Company name: Elica
Role: ML Engineer
Topic: PCA, SGD, decision tree, ARM

1. What is the difference between stochastic gradient descent (SGD) and gradient descent (GD)?

Gradient Descent and Stochastic Gradient Descent are both algorithms that find the set of parameters that minimizes a loss function.
The difference is that in Gradient Descent, all training samples are used to compute the gradient for each parameter update, while in Stochastic Gradient Descent only one training sample is used per update, which makes each step cheaper but noisier.
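A rough numpy sketch of the difference on a toy linear-regression loss (the data and learning rates are made up for illustration): batch GD uses every sample per update, SGD uses one sample per update.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=100)

def batch_gd(X, y, lr=0.1, epochs=100):
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)   # gradient over ALL training samples
        w -= lr * grad
    return w

def sgd(X, y, lr=0.01, epochs=100):
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(y)):   # one randomly chosen sample per update
            grad = X[i] * (X[i] @ w - y[i])
            w -= lr * grad
    return w

print(batch_gd(X, y))   # both estimates should approach [2, -1, 0.5]
print(sgd(X, y))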


2. Can you mention some advantages and disadvantages of decision trees?

The advantages of decision trees are that they are easier to interpret, are nonparametric and hence robust to outliers, and have relatively few parameters to tune.
On the other hand, the disadvantage is that they are prone to overfitting.


3. What do you mean by Association Rule Mining (ARM)?

Association Rule Mining is a technique for discovering patterns in data, such as features (dimensions) that occur together and features that are correlated. It is mostly used in Market Basket Analysis to find how frequently an itemset occurs in a transaction. Association rules have to satisfy a minimum support and a minimum confidence threshold at the same time.
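A tiny sketch of how support and confidence are computed for a candidate rule such as {bread} -> {butter}; the transaction list is made up for illustration.

transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
]

antecedent, consequent = {"bread"}, {"butter"}
n = len(transactions)

# support = fraction of transactions containing the whole itemset
support = sum((antecedent | consequent) <= t for t in transactions) / n
# confidence = support of the rule / support of the antecedent
confidence = support / (sum(antecedent <= t for t in transactions) / n)

print(f"support={support:.2f}, confidence={confidence:.2f}")   # 0.50 and 0.67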


4. What is Principal Component Analysis?

The idea is to reduce the dimensionality of the data set by replacing a larger number of correlated variables with a smaller number of uncorrelated ones, while retaining as much of the variation in the data as possible.
The variables are transformed into a new set of variables known as Principal Components. These PCs are the eigenvectors of the covariance matrix and are therefore orthogonal.
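A minimal numpy sketch of that idea: center the data, take the eigenvectors of the covariance matrix, and project onto the top components (the random data is only for illustration).

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
X_centered = X - X.mean(axis=0)

cov = np.cov(X_centered, rowvar=False)     # 5x5 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)     # eigh: for symmetric matrices
order = np.argsort(eigvals)[::-1]          # sort components by explained variance
components = eigvecs[:, order[:2]]         # keep the top-2 principal components

X_reduced = X_centered @ components        # project the data onto the PCs
print(X_reduced.shape)                     # (200, 2)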

Stay Safe & Happy Learning💙
DRY (Don't Repeat Yourself): This principle emphasizes that every piece of knowledge or logic should have a single, unambiguous representation within a system. Duplication in code leads to maintenance overhead, increases the risk of inconsistencies, and makes changes more difficult.
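A small before/after sketch of the principle (the pricing example and names are hypothetical):

# Violates DRY: the same discount rule is duplicated in two places.
def checkout_price(price):
    return price - price * 0.10

def invoice_price(price):
    return price - price * 0.10

# DRY version: one unambiguous representation of the rule, reused everywhere.
DISCOUNT_RATE = 0.10

def discounted(price):
    return price * (1 - DISCOUNT_RATE)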
2nd principle of software design

KISS (Keep It Simple, Stupid): Simple solutions are easier to understand, maintain, and extend. This principle encourages avoiding unnecessary complexity and favoring straightforward, easy-to-understand designs.
Guys, kindly add this folder -> https://t.me/addlist/wcoDjKedDTBhNzFl
Date: 15-02-2024
Company name : Walmart
Role: ML Engineer
Topic: Cross validation, bagging, boosting, sampling

1. What is Cross-validation in Machine Learning?

Cross-validation is a resampling method that uses different portions of the data to train and test a model on different iterations. It is mainly used in settings where the goal is prediction and one wants to estimate how accurately a predictive model will perform in practice. The dataset is split into smaller parts with roughly the same number of rows; one part is held out as the test set while the remaining parts form the training set, and this is repeated so that each part gets a turn as the test set. Cross-validation consists of the following techniques (a minimal k-fold sketch follows the list):
• Holdout method
• K-fold cross-validation
• Stratified k-fold cross-validation
• Leave p-out cross-validation
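A minimal scikit-learn sketch of k-fold cross-validation, assuming a toy dataset (Iris) and a placeholder model; any estimator can be dropped in.

# 5-fold cross-validation: every row is used for testing exactly once.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = load_iris(return_X_y=True)
cv = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)
print(scores.mean(), scores.std())   # average accuracy across the 5 folds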


2. What is bagging and boosting in Machine Learning?

Bagging is an ensemble of homogeneous weak learners that are trained independently of each other, in parallel, and then combined (for example by averaging or voting) to produce the final prediction.

Boosting also uses homogeneous weak learners but works differently from bagging: the learners are trained sequentially and adaptively, with each new learner focusing on the mistakes of the previous ones, to improve the overall model's predictions.
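A short scikit-learn sketch of the two approaches, both using shallow decision trees as the weak learner (toy data; the estimator parameter name assumes scikit-learn 1.2 or newer):

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
weak = DecisionTreeClassifier(max_depth=2)

bagging = BaggingClassifier(estimator=weak, n_estimators=50, random_state=0)    # parallel, independent learners
boosting = AdaBoostClassifier(estimator=weak, n_estimators=50, random_state=0)  # sequential, adaptive learners

print(cross_val_score(bagging, X, y, cv=5).mean())
print(cross_val_score(boosting, X, y, cv=5).mean())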


3. What are systematic sampling and cluster sampling?

Systematic sampling is a type of probability sampling method. The sample members are selected from a larger population with a random starting point but a fixed periodic interval. This interval is known as the sampling interval. The sampling interval is calculated by dividing the population size by the desired sample size.

Cluster sampling involves dividing the sample population into separate groups, called clusters. Then, a simple random sample of clusters is selected from the population. Analysis is conducted on data from the sampled clusters.
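A quick sketch of systematic sampling, assuming a population of 1,000 records and a desired sample of 50:

import numpy as np

population = np.arange(1000)            # stand-in for 1,000 population records
sample_size = 50
k = len(population) // sample_size      # sampling interval = 1000 / 50 = 20

rng = np.random.default_rng(0)
start = rng.integers(0, k)              # random starting point within the first interval
sample = population[start::k]           # then every k-th record
print(len(sample), sample[:5])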

4. What is market basket analysis?

Market Basket Analysis is a modeling technique based upon the theory that if you buy a certain group of items, you are more (or less) likely to buy another group of items.
Answers for this👀

🔺Basic SQL Concepts:
SQL vs NoSQL: SQL is relational, structured, and uses a predefined schema. NoSQL is non-relational, flexible, and schema-less.
Common Data Types: Examples include INT, VARCHAR, DATE, and BOOLEAN.

🔺Querying:
Retrieve all records from "Customers": SELECT * FROM Customers;
SELECT vs SELECT DISTINCT: SELECT retrieves all rows, while SELECT DISTINCT returns only unique values.
WHERE clause: Filters data based on specified conditions.

🔺Joins:
Types of Joins: INNER JOIN, LEFT JOIN, RIGHT JOIN, FULL JOIN.
INNER JOIN example: SELECT * FROM Table1 INNER JOIN Table2 ON Table1.ID = Table2.ID;

🔺Aggregate Functions:
Aggregate Functions: Examples include COUNT, AVG, SUM.
Calculate average, sum, count: SELECT AVG(column), SUM(column), COUNT(column) FROM Table;

🔺Grouping and Filtering:
GROUP BY clause: Groups results based on specified columns.
HAVING clause: Filters grouped results.

🔺Subqueries:
Subquery: A query within another query. Example: SELECT column FROM Table WHERE column = (SELECT MAX(column) FROM Table);

🔺Indexes and Optimization:
Importance of Indexes: Improve query performance by speeding up data retrieval.
Optimize slow query: Add indexes, optimize queries, and consider database design.

🔺Normalization and Data Integrity:
Normalization: Organizing data to reduce redundancy and dependency.
Data Integrity: Enforce rules to maintain accuracy and consistency.

🔺Transactions:
SQL Transaction: A sequence of one or more SQL statements treated as a single unit.
ACID properties: Atomicity, Consistency, Isolation, Durability.

🔺Views and Stored Procedures:
Database View: Virtual table based on the result of a SELECT query.
Stored Procedure: Precompiled SQL code stored in the database for reuse.

🔺Advanced SQL:
Recursive SQL query: Used for hierarchical data.
Window Functions: Perform calculations across a set of rows related to the current row.
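A small sketch of both ideas run through Python's sqlite3 module so it is self-contained (the employee table is made up; window functions need SQLite 3.25+):

import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE employees (id INTEGER, name TEXT, manager_id INTEGER, salary INTEGER);
    INSERT INTO employees VALUES
        (1, 'Ava', NULL, 90000), (2, 'Bob', 1, 70000),
        (3, 'Cal', 1, 70000),    (4, 'Dee', 2, 50000);
""")

# Recursive CTE: walk the reporting hierarchy starting from the top manager.
hierarchy = con.execute("""
    WITH RECURSIVE chain(id, name, depth) AS (
        SELECT id, name, 0 FROM employees WHERE manager_id IS NULL
        UNION ALL
        SELECT e.id, e.name, c.depth + 1
        FROM employees e JOIN chain c ON e.manager_id = c.id
    )
    SELECT name, depth FROM chain ORDER BY depth;
""").fetchall()

# Window function: rank salaries across rows without collapsing them like GROUP BY would.
ranked = con.execute("""
    SELECT name, salary, RANK() OVER (ORDER BY salary DESC) AS salary_rank
    FROM employees;
""").fetchall()

print(hierarchy)
print(ranked)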

React❤️👉 to this if you like the post
SQL Cheat Sheet for Data Analysts.pdf (6.8 MB)
Let's start with Python Learning Series today 💪

Complete Python Topics for Data Analysis

Introduction to Python.

1. Variables, Data Types, and Basic Operations:
   - Variables: In Python, variables are containers for storing data values. For example:
   
     age = 25
     name = "John"
    

   - Data Types: Python supports various data types, including int, float, str, list, tuple, and more. Example:
   
     height = 1.75  # float
     colors = ['red', 'green', 'blue']  # list
    

   - Basic Operations: You can perform basic arithmetic operations:
   
     result = 10 + 5
    

2. Control Structures (If Statements, Loops):
   - If Statements: Conditional statements allow you to make decisions in your code.
   
     age = 18
     if age >= 18:
         print("You are an adult.")
     else:
         print("You are a minor.")
    

   - Loops (For and While): Loops are used for iterating over a sequence (string, list, tuple, dictionary, etc.).
   
     fruits = ['apple', 'banana', 'orange']
     for fruit in fruits:
         print(fruit)
    

3. Functions and Modules:
   - Functions: Functions are blocks of reusable code. Example:
   
     def greet(name):
         return f"Hello, {name}!"

     result = greet("Alice")
    

   - Modules: Modules allow you to organize code into separate files. Example:
   
     # mymodule.py
     def multiply(x, y):
         return x * y

     # main script
     import mymodule
     result = mymodule.multiply(3, 4)
    

Understanding these basics is crucial as they lay the foundation for more advanced topics.


Hope it helps :)
This is how ML works
Java interview questions.pdf (506.4 KB)
Date: 23-02-2024
Company name: Oracle
Role: Data Analyst
Topic: outlier, dax, filter, recursive stored procedure

1. What are the ways to detect outliers?

Outliers are detected using two methods:

Box Plot (IQR) Method: according to this method, a value is considered an outlier if it lies more than 1.5*IQR (interquartile range) above the upper quartile (Q3) or more than 1.5*IQR below the lower quartile (Q1).

Standard Deviation Method: according to this method, an outlier is defined as a value that lies more than three standard deviations above or below the mean (outside mean ± 3*standard deviation).
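Both rules in a few lines of numpy, on made-up data with one planted extreme value:

import numpy as np

rng = np.random.default_rng(0)
data = np.append(rng.normal(loc=14, scale=1.5, size=50), 120)   # 120 is the planted outlier

# Box plot (IQR) method: outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
iqr_outliers = data[(data < q1 - 1.5 * iqr) | (data > q3 + 1.5 * iqr)]

# Standard deviation method: more than 3 standard deviations from the mean
mean, std = data.mean(), data.std()
sigma_outliers = data[np.abs(data - mean) > 3 * std]

print(iqr_outliers, sigma_outliers)   # both methods flag 120 here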


2. What is a Recursive Stored Procedure?

A stored procedure that calls itself until a boundary condition is reached, is called a recursive stored procedure. This recursive function helps the programmers to deploy the same set of code several times as and when required.


3. What is the shortcut to add a filter to a table in EXCEL?

The filter mechanism is used when you want to display only specific data from the entire dataset. By doing so, there is no change being made to the data. The shortcut to add a filter to a table is Ctrl+Shift+L.

4. What is DAX in Power BI?

DAX stands for Data Analysis Expressions. It's a collection of functions, operators, and constants used in formulas to calculate and return values. In other words, it helps you create new info from data you already have.

————————————————————-


Stay Safe & Happy Learning💙
Date: 22-02-2024
Company name: IKEA
Role: Data Analyst
Topic: Statistics, SQL

1. How can we deal with problems that arise when the data flows in from a variety of sources?

There are many ways to go about dealing with multi-source problems. However, these are done primarily to solve the problems of:

1. Identifying similar or duplicate records and merging them into a single record
2. Restructuring the schemas to ensure good schema integration across sources


2.  Where is Time Series Analysis used?

Since time series analysis (TSA) has a wide scope of usage, it can be used in multiple domains. Here are some of the places where TSA plays an important role:

Statistics
Signal processing
Econometrics
Weather forecasting
Earthquake prediction
Astronomy
Applied science


3. What are the ideal situations in which t-test or z-test can be used?

It is standard practice to use a t-test when the sample size is less than 30 (and the population variance is unknown), and a z-test when the sample size exceeds 30 or the population variance is known.
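A minimal SciPy sketch of a one-sample t-test on a small (n < 30), made-up sample; for n > 30 a z-test would give a nearly identical result.

import numpy as np
from scipy import stats

sample = np.array([5.1, 4.8, 5.4, 5.0, 4.7, 5.2, 4.9, 5.3])   # n = 8
t_stat, p_value = stats.ttest_1samp(sample, popmean=5.0)       # H0: population mean = 5.0
print(t_stat, p_value)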


4. What is the usage of the NVL() function?

The NVL() function is used to replace a NULL value with another value. The function returns the value of the second parameter if the first parameter is NULL; if the first parameter is anything other than NULL, it is returned unchanged. NVL() is specific to Oracle and is not available in SQL Server or MySQL; instead, MySQL has the IFNULL() function and SQL Server has the ISNULL() function.
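The same NULL-replacement idea shown with SQLite's IFNULL(), driven from Python so it runs end to end (the table and values are made up):

import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER, discount REAL)")
con.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 0.1), (2, None)])

rows = con.execute("SELECT id, IFNULL(discount, 0.0) FROM orders").fetchall()
print(rows)   # [(1, 0.1), (2, 0.0)] -- the NULL is replaced by the second argument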


5. What is the difference between DROP and TRUNCATE commands?

If a table is dropped, all things associated with that table are dropped as well. This includes the relationships defined on the table with other tables, access privileges, and grants that the table has, as well as the integrity checks and constraints. 

However, if a table is truncated, there are no such problems as mentioned above. The table retains its original structure and the data is dropped.

————————————————————-

Stay Safe & Happy Learning💙
Date - 26-02-2024
Company name: IBM
Role: Data Analyst
Topic: rank vs dense rank, Join, OLTP, Joining vs blending, worksheet excel

1.  What is a Self-Join?

A self-join is a join in which a table is joined with itself, so it represents a unary relationship. In a self-join, each row of the table is combined with itself and with the other rows of the same table. As a result, a self-join is mostly used to combine and compare rows from the same database table.


2.  What is OLTP?

OLTP, or online transactional processing, allows huge groups of people to execute massive amounts of database transactions in real time, usually via the internet. A database transaction occurs when data in a database is changed, inserted, deleted, or queried.


3. What is the difference between joining and blending in Tableau?

The term joining is used when you combine data from the same source, for example worksheets in an Excel file or tables in an Oracle database, while blending is used when your report combines two completely separate data sources.

4. How do you prevent someone from copying a cell from your worksheet in Excel?

If you want to protect your worksheet from being copied, go to the menu bar > Review > Protect Sheet > set a password.

By setting a password, you can prevent the worksheet from being copied.

5. What is the difference between the RANK() and DENSE_RANK() functions?

The RANK() function defines the rank of each row within your ordered partition. Rows with the same value receive the same rank, and the next rank skips ahead by the number of duplicates: if we have three records at rank 4, for example, the next rank shown is 7. The DENSE_RANK() function also assigns the same rank to rows with the same column value within a partition, but with no gaps: if we have three records at rank 4, the next rank shown is 5.
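A compact sqlite3 sketch of the difference on a table with three tied values (requires SQLite 3.25+ for window functions):

import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
con.executemany("INSERT INTO scores VALUES (?, ?)",
                [("a", 90), ("b", 90), ("c", 90), ("d", 80)])

rows = con.execute("""
    SELECT name, score,
           RANK()       OVER (ORDER BY score DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rnk
    FROM scores;
""").fetchall()
print(rows)   # 'd' gets RANK 4 (gap after three ties) but DENSE_RANK 2 (no gap)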
Google announces VideoPrism

A Foundational Visual Encoder for Video Understanding
Date: 01-03-2024
Company Name - Nestle
Role: Data Analyst
Topic - SQL and Tableau

1. Explain character-manipulation functions and their different types in SQL.

Character-manipulation functions are used to change, extract, and edit character strings. The function performs its action on the input string(s) and returns the result when one or more characters or words are supplied to it.

The character manipulation functions in SQL are as follows:

A) CONCAT (joining two or more values): This function is used to join two or more values together. The second string is always appended to the end of the first string.

B) SUBSTR: This function returns a segment of a string from a given start point to a given endpoint.

C) LENGTH: This function returns the length of the string in numerical form, including blank spaces.

D) INSTR: This function calculates the precise numeric location of a character or word in a string.

E) LPAD: This function pads the character value on the left up to a specified length, producing a right-justified value.

F) RPAD: This function pads the character value on the right up to a specified length, producing a left-justified value.

G) TRIM: This function removes all specified characters from the beginning, the end, or both ends of a string. It also reduces wasted space.

H) REPLACE: This function replaces all instances of a word or a section of a string (substring) with the other string value specified.


2. How Do You Calculate the Daily Profit Measures Using LOD?

LOD expressions allow us to easily create bins on aggregated data such as profit per day.

Scenario: We want to measure our success by the total profit per business day.

Create a calculated field named LOD - Profit per day and enter the formula:

{ FIXED [Order Date] : SUM([Profit]) }

Create another calculated field named LOD - Daily Profit KPI and enter the formula:

IF [LOD - Profit per day] > 2000 then “Highly Profitable.”

ELSEIF [LOD - Profit per day] <= 0 then “Unprofitable”

ELSE “Profitable”

END

To calculate daily profit measure using LOD, follow these steps to draw the visualization:

Bring YEAR(Order Date) and MONTH(Order Date) to the Columns shelf
Drag Order Id field to Rows shelf. Right-click on it, select Measure and click on Count(Distinct)
Drag LOD - Daily Profit KPI to the Rows shelf
Bring LOD - Daily Profit KPI to marks card and change mark type from automatic to area.

3. What are super keys and candidate keys?

A super key is a single attribute or a combination of attributes that can identify a record in a table. A super key can contain attributes that are not strictly necessary to identify the records.

A candidate key is a minimal super key: a subset of a super key in which every attribute is needed to identify the records.

Note that every candidate key is a super key, but not every super key is a candidate key.


4. What is database cardinality?

Database cardinality denotes the uniqueness of the values in a table's column. It helps the optimizer choose better query plans and hence improves query performance. There are three types of database cardinality in SQL, as given below:

High cardinality
Normal cardinality
Low cardinality

————————————————————-