Data Analytics

Scenario based Interview Questions & Answers for Data Analyst

1. Scenario: You are working on a SQL database that stores customer information. The database has a table called "Orders" that contains order details. Your task is to write a SQL query to retrieve the total number of orders placed by each customer.
Question:
- Write a SQL query to find the total number of orders placed by each customer.
Expected Answer:
    SELECT CustomerID, COUNT(*) AS TotalOrders
    FROM Orders
    GROUP BY CustomerID;
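To try the query on sample data, here is a minimal sketch using Python's built-in sqlite3 module. Only the Orders table and the CustomerID column come from the scenario; the OrderID column and the sample rows are invented for illustration.

```python
import sqlite3

# In-memory database with a hypothetical Orders table (sample rows are invented).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER)")
conn.executemany(
    "INSERT INTO Orders (OrderID, CustomerID) VALUES (?, ?)",
    [(1, 101), (2, 102), (3, 101), (4, 103), (5, 101)],
)

# Same query as the expected answer, with ORDER BY added for a stable output order.
totals = conn.execute(
    """
    SELECT CustomerID, COUNT(*) AS TotalOrders
    FROM Orders
    GROUP BY CustomerID
    ORDER BY CustomerID
    """
).fetchall()

for customer_id, total_orders in totals:
    print(customer_id, total_orders)

conn.close()
```

Note that customers with no orders never appear in this result, because the query only sees rows that exist in Orders. Listing them with a count of zero would need a LEFT JOIN from a separate customers table.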

2. Scenario: You are working on a SQL database that stores employee information. The database has a table called "Employees" that contains employee details. Your task is to write a SQL query to retrieve the names of all employees who have been with the company for more than 5 years.
Question:
- Write a SQL query to find the names of employees who have been with the company for more than 5 years.
Expected Answer:
    SELECT Name
    FROM Employees
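    -- SQL Server syntax. DATEDIFF(year, ...) counts calendar-year boundaries, so "> 5" can miss
    -- employees with between 5 and 6 years of tenure; comparing against a cutoff date avoids that.
    WHERE HireDate < DATEADD(year, -5, GETDATE());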
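One thing worth checking in any tenure filter is how "years" are counted. In SQL Server, DATEDIFF(year, ...) counts calendar-year boundaries crossed, not full years elapsed. The sketch below mimics both ways of counting in plain Python; the function names and the dates are hypothetical and chosen only to show where the two disagree.

```python
from datetime import date


def years_crossed(hire_date: date, today: date) -> int:
    """Counts calendar-year boundaries only, like DATEDIFF(year, ...) in SQL Server."""
    return today.year - hire_date.year


def has_more_than_years(hire_date: date, today: date, years: int = 5) -> bool:
    """True if more than `years` full years have elapsed since hire_date."""
    try:
        cutoff = today.replace(year=today.year - years)
    except ValueError:
        # today is Feb 29 and the target year is not a leap year
        cutoff = today.replace(year=today.year - years, day=28)
    return hire_date < cutoff


# Hypothetical employee hired 5.5 years before a fixed "today".
hire = date(2019, 3, 1)
today = date(2024, 9, 1)

print(years_crossed(hire, today) > 5)    # boundary count is 5, so "> 5" is False
print(has_more_than_years(hire, today))  # cutoff-date comparison says True
```

For this employee the boundary count excludes someone who has clearly been with the company for more than 5 years, which is why a cutoff-date comparison is usually the safer filter.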

Power BI Scenario-Based Questions

1. Scenario: You have been given a dataset in Power BI that contains sales data for a company. Your task is to create a report that shows the total sales by product category and region.
Expected Answer:
    - Load the dataset into Power BI.
    - Create relationships between tables if the data spans more than one table.
    - In the "Fields" pane, select the fields you need (Product Category, Region, Sales).
    - Add a visualization (e.g., a matrix or clustered bar chart): put Product Category on the axis (or rows), Region in the legend (or columns), and Sales in the "Values" area so it is summed.
    - Use the "Filters" pane to filter data as needed.
    - Format the visualization to enhance clarity and readability.

2. Scenario: You have been asked to create a Power BI dashboard that displays real-time stock prices for a set of companies. The stock prices are available through an API.
Expected Answer:
    - Use Power BI Desktop to connect to the API.
    - Go to "Get Data" > "Web" and enter the API URL.
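    - Configure refresh according to how fresh the data must be. Scheduled refresh on imported data is periodic rather than real-time, and DirectQuery works only with sources that support it (the Web connector typically imports data), so a truly real-time dashboard usually needs a push/streaming dataset or a DirectQuery source with automatic page refresh.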
    - Create visualizations using the imported data.
    - Publish the report to the Power BI service and set up a data gateway if needed for continuous refresh.

3. Scenario: You have been given a Power BI report that contains multiple visualizations. The report is taking a long time to load and is impacting the performance of the application.
Expected Answer:
    - Analyze the current performance using Performance Analyzer.
    - Optimize the data model by reducing the number of columns and rows, and removing unnecessary calculations.
    - Use aggregated tables to pre-compute results.
    - Simplify DAX calculations.
    - Optimize visualizations by reducing the number of visuals per page and avoiding complex custom visuals.
    - Ensure proper indexing on the data source.

Free SQL Resources: https://whatsapp.com/channel/0029VanC5rODzgT6TiTGoa1v

Like if you need more similar content

Hope it helps :)


Real-world SQL Questions with Answers 🔥

Let's dive into some real-world SQL questions with a mini dataset.

📊 Dataset: employees

id  name    department  salary  manager_id
1   Aditi   HR          30000   5
2   Rahul   IT          50000   6
3   Neha    IT          60000   6
4   Aman    Sales       40000   7
5   Kiran   HR          70000   NULL
6   Mohit   IT          80000   NULL
7   Suresh  Sales       65000   NULL
8   Pooja   HR          30000   5

1. Find average salary per department

SELECT department, AVG(salary) AS avg_salary 
FROM employees 
GROUP BY department;

2. Find employees earning above department average

SELECT name, department, salary 
FROM employees e 
WHERE salary > ( 
    SELECT AVG(salary) 
    FROM employees 
    WHERE department = e.department 
);

3. Find highest salary in each department

SELECT department, MAX(salary) AS max_salary 
FROM employees 
GROUP BY department;

4. Find employees who earn more than their manager

SELECT e.name 
FROM employees e 
JOIN employees m ON e.manager_id = m.id 
WHERE e.salary > m.salary;

5. Count employees in each department

SELECT department, COUNT(*) AS total_employees 
FROM employees 
GROUP BY department;

6. Find departments with more than 2 employees

SELECT department, COUNT(*) AS total 
FROM employees 
GROUP BY department 
HAVING COUNT(*) > 2;

7. Find second highest salary

SELECT MAX(salary) 
FROM employees 
WHERE salary < (SELECT MAX(salary) FROM employees);

8. Find employees without managers

SELECT name 
FROM employees 
WHERE manager_id IS NULL;

9. Rank employees by salary

SELECT name, salary, RANK() OVER (ORDER BY salary DESC) AS salary_rank  -- RANK is a reserved word in MySQL 8+, so avoid it as an alias
FROM employees;
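To see how ranking treats the two tied 30000 salaries in this dataset, here is a small sketch that loads the employees table into an in-memory SQLite database and compares RANK() with ROW_NUMBER(). It assumes a Python build whose bundled SQLite supports window functions (3.25 or later); the aliases salary_rank and row_num and the id tie-breaker are my own choices.

```python
import sqlite3

# The employees dataset from this post.
ROWS = [
    (1, "Aditi", "HR", 30000, 5),
    (2, "Rahul", "IT", 50000, 6),
    (3, "Neha", "IT", 60000, 6),
    (4, "Aman", "Sales", 40000, 7),
    (5, "Kiran", "HR", 70000, None),
    (6, "Mohit", "IT", 80000, None),
    (7, "Suresh", "Sales", 65000, None),
    (8, "Pooja", "HR", 30000, 5),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees "
    "(id INTEGER, name TEXT, department TEXT, salary INTEGER, manager_id INTEGER)"
)
conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?, ?)", ROWS)

# RANK() gives tied salaries the same rank; ROW_NUMBER() always numbers rows 1..n.
ranked = conn.execute(
    """
    SELECT name, salary,
           RANK()       OVER (ORDER BY salary DESC)     AS salary_rank,
           ROW_NUMBER() OVER (ORDER BY salary DESC, id) AS row_num
    FROM employees
    ORDER BY salary DESC, id
    """
).fetchall()

for row in ranked:
    print(row)

conn.close()
```

Aditi and Pooja both earn 30000, so they share rank 7 under RANK() but get row numbers 7 and 8 under ROW_NUMBER().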

10. Find duplicate salaries

SELECT salary, COUNT(*) 
FROM employees 
GROUP BY salary 
HAVING COUNT(*) > 1;

11. Top 2 highest salaries

SELECT DISTINCT salary 
FROM employees 
ORDER BY salary DESC 
LIMIT 2;
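A quick way to check the answers above is to run them against the mini dataset. The sketch below does that for questions 2, 4, 6 and 7 with Python's built-in sqlite3 module; the ORDER BY clauses and the run() helper are additions of mine so the output order is predictable.

```python
import sqlite3

# The employees dataset from this post.
ROWS = [
    (1, "Aditi", "HR", 30000, 5),
    (2, "Rahul", "IT", 50000, 6),
    (3, "Neha", "IT", 60000, 6),
    (4, "Aman", "Sales", 40000, 7),
    (5, "Kiran", "HR", 70000, None),
    (6, "Mohit", "IT", 80000, None),
    (7, "Suresh", "Sales", 65000, None),
    (8, "Pooja", "HR", 30000, 5),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees "
    "(id INTEGER, name TEXT, department TEXT, salary INTEGER, manager_id INTEGER)"
)
conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?, ?)", ROWS)


def run(sql):
    """Hypothetical helper: execute a query and return all rows."""
    return conn.execute(sql).fetchall()


# Q2: employees earning above their department average
above_avg = run(
    "SELECT name FROM employees e "
    "WHERE salary > (SELECT AVG(salary) FROM employees WHERE department = e.department) "
    "ORDER BY name"
)

# Q4: employees who earn more than their manager
more_than_manager = run(
    "SELECT e.name FROM employees e "
    "JOIN employees m ON e.manager_id = m.id "
    "WHERE e.salary > m.salary"
)

# Q6: departments with more than 2 employees
big_departments = run(
    "SELECT department, COUNT(*) AS total FROM employees "
    "GROUP BY department HAVING COUNT(*) > 2 ORDER BY department"
)

# Q7: second highest salary
second_highest = run(
    "SELECT MAX(salary) FROM employees "
    "WHERE salary < (SELECT MAX(salary) FROM employees)"
)[0][0]

print(above_avg, more_than_manager, big_departments, second_highest)
conn.close()
```

On this dataset question 4 returns no rows, because every manager out-earns their reports. An empty result is a valid answer there, not a sign that the query is wrong.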

Double Tap ❤️ For More


Here's a concise cheat sheet to help you get started with Python for Data Analytics. This guide covers essential libraries and functions that you'll frequently use.

1. Python Basics
- Variables:
x = 10

y = "Hello"

- Data Types:
- Integers:

 x = 10

- Floats:

 y = 3.14

- Strings:

 name = "Alice"

- Lists:

 my_list = [1, 2, 3]

- Dictionaries:

 my_dict = {"key": "value"}

- Tuples:

 my_tuple = (1, 2, 3)

- Control Structures:
- if, elif, else statements
- For loop:

    for i in range(5):
        print(i)

- While loop:

    x = 0  # reset x; with x = 10 from above the loop body would never run
    while x < 5:
        print(x)
        x += 1

2. Importing Libraries

- NumPy:

  import numpy as np

- Pandas:

  import pandas as pd

- Matplotlib:

  import matplotlib.pyplot as plt

- Seaborn:

  import seaborn as sns

3. NumPy for Numerical Data

- Creating Arrays:

  arr = np.array([1, 2, 3, 4])

- Array Operations:

  arr.sum()
  arr.mean()

- Reshaping Arrays:

  arr.reshape((2, 2))

- Indexing and Slicing:

  arr[0:2]  # First two elements

4. Pandas for Data Manipulation

- Creating DataFrames:

  df = pd.DataFrame({
      'col1': [1, 2, 3],
      'col2': ['A', 'B', 'C']
  })

- Reading Data:

  df = pd.read_csv('file.csv')

- Basic Operations:

  df.head()          # First 5 rows
  df.describe()      # Summary statistics
  df.info()          # DataFrame info

- Selecting Columns:

  df['col1']
  df[['col1', 'col2']]

- Filtering Data:

  df[df['col1'] > 2]

- Handling Missing Data:

  df.dropna()        # Drop missing values
  df.fillna(0)       # Replace missing values

- GroupBy:

  df.groupby('col2').mean(numeric_only=True)  # skip non-numeric columns

5. Data Visualization

- Matplotlib:

  plt.plot(df['col1'], df['col2'])
  plt.xlabel('X-axis')
  plt.ylabel('Y-axis')
  plt.title('Title')
  plt.show()

- Seaborn:

  sns.histplot(df['col1'])
  sns.boxplot(x='col1', y='col2', data=df)

6. Common Data Operations

- Merging DataFrames:

  pd.merge(df1, df2, on='key')

- Pivot Table:

  df.pivot_table(index='col1', columns='col2', values='col3')
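The pivot-table one-liner refers to a col3 that the example DataFrame does not have, so here is a small self-contained sketch. The sales DataFrame and its column names are made up for illustration, and aggfunc="sum" is chosen explicitly because pivot_table averages by default.

```python
import pandas as pd

# Hypothetical sales data: one row per (region, product) pair.
sales = pd.DataFrame({
    "region": ["North", "North", "South", "South"],
    "product": ["A", "B", "A", "B"],
    "units": [10, 20, 30, 40],
})

# Rows become regions, columns become products, cells hold summed units.
pivot = sales.pivot_table(index="region", columns="product", values="units", aggfunc="sum")
print(pivot)
```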

- Applying Functions:

  df['col1'].apply(lambda x: x*2)

7. Basic Statistics

- Descriptive Stats:

  df['col1'].mean()
  df['col1'].median()
  df['col1'].std()

- Correlation:

  df.corr(numeric_only=True)  # pandas 2.x raises an error on text columns without this

This cheat sheet should give you a solid foundation in Python for data analytics. As you get more comfortable, you can delve deeper into each library's documentation for more advanced features.

I have curated the best resources to learn Python 👇👇
https://whatsapp.com/channel/0029VaiM08SDuMRaGKd9Wv0L

Hope you'll like it

Like this post if you need more resources like this 👍❤️


Real‑world Data Analytics Questions with Answers 🔥

Let’s practice end‑to‑end data thinking using this small dataset.

📊 Dataset: customer_orders
| order_id | customer_id | product  | category    | quantity | unit_price | order_date |
|----------|-------------|----------|-------------|----------|------------|------------|
| 1        | 1001        | Laptop   | Electronics | 2        | 75000      | 2025-01-10 |
| 2        | 1002        | Mouse    | Electronics | 10       | 1500       | 2025-01-12 |
| 3        | 1003        | Chair    | Furniture   | 5        | 8000       | 2025-01-15 |
| 4        | 1001        | Keyboard | Electronics | 8        | 2500       | 2025-01-11 |
| 5        | 1004        | Desk     | Furniture   | 3        | 15000      | 2025-01-18 |
| 6        | 1002        | Monitor  | Electronics | 4        | 25000      | 2025-01-20 |
| 7        | 1005        | Table    | Furniture   | 6        | 5000       | 2025-01-22 |
| 8        | 1003        | Webcam   | Electronics | 12       | 3000       | 2025-01-14 |

1. Define what “data analysis” means in this context
• Data analysis means transforming raw orders into insights: what sells most, who the best customers are, and how revenue changes over time.
• You’d use SQL to query, Excel/Python to clean, and Power BI to visualize.

2. What are the key metrics you’d track for this business?
• Revenue = quantity × unit_price
• Order count and average order value (AOV)
• Top‑selling categories and best customers by revenue

3. Write a SQL query for total revenue by category

SELECT 
    category,
    SUM(quantity * unit_price) AS total_revenue
FROM customer_orders
GROUP BY category;
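To confirm what this query returns for the dataset above, here is a minimal sketch with Python's built-in sqlite3 module. The ORDER BY clause is an addition of mine so the larger category comes first.

```python
import sqlite3

# (order_id, customer_id, product, category, quantity, unit_price, order_date)
ROWS = [
    (1, 1001, "Laptop", "Electronics", 2, 75000, "2025-01-10"),
    (2, 1002, "Mouse", "Electronics", 10, 1500, "2025-01-12"),
    (3, 1003, "Chair", "Furniture", 5, 8000, "2025-01-15"),
    (4, 1001, "Keyboard", "Electronics", 8, 2500, "2025-01-11"),
    (5, 1004, "Desk", "Furniture", 3, 15000, "2025-01-18"),
    (6, 1002, "Monitor", "Electronics", 4, 25000, "2025-01-20"),
    (7, 1005, "Table", "Furniture", 6, 5000, "2025-01-22"),
    (8, 1003, "Webcam", "Electronics", 12, 3000, "2025-01-14"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_orders (order_id INTEGER, customer_id INTEGER, product TEXT, "
    "category TEXT, quantity INTEGER, unit_price INTEGER, order_date TEXT)"
)
conn.executemany("INSERT INTO customer_orders VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)

revenue_by_category = conn.execute(
    """
    SELECT category, SUM(quantity * unit_price) AS total_revenue
    FROM customer_orders
    GROUP BY category
    ORDER BY total_revenue DESC
    """
).fetchall()

for category, total_revenue in revenue_by_category:
    print(category, total_revenue)

conn.close()
```

Electronics comes to 321000 and Furniture to 115000, so Electronics accounts for roughly three quarters of the 436000 total.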

4. How would you find repeat customers?

SELECT 
    customer_id,
    COUNT(order_id) AS order_count,
    SUM(quantity * unit_price) AS total_spent
FROM customer_orders
GROUP BY customer_id
HAVING COUNT(order_id) > 1;

• Customers with order_count > 1 are repeat buyers.

5. How would you detect “top customers”?
• Define “top” by total_spent or by average order value (AOV):
– AOV = SUM(quantity * unit_price) / COUNT(order_id)
• Use Power BI/Excel to sort descending and highlight the top 10%.
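As a rough illustration of both definitions, the sketch below computes total spend and average order value per customer in plain Python and sorts by total spend. It uses only the customer_id, quantity and unit_price columns from the dataset; the variable names are my own.

```python
from collections import defaultdict

# (customer_id, quantity, unit_price) taken from the customer_orders table above
orders = [
    (1001, 2, 75000), (1002, 10, 1500), (1003, 5, 8000), (1001, 8, 2500),
    (1004, 3, 15000), (1002, 4, 25000), (1005, 6, 5000), (1003, 12, 3000),
]

total_spent = defaultdict(int)
order_count = defaultdict(int)
for customer_id, quantity, unit_price in orders:
    total_spent[customer_id] += quantity * unit_price
    order_count[customer_id] += 1

# One row per customer: (customer_id, total_spent, average_order_value), biggest spender first
ranking = sorted(
    ((cid, total_spent[cid], total_spent[cid] / order_count[cid]) for cid in total_spent),
    key=lambda row: row[1],
    reverse=True,
)

for cid, spent, aov in ranking:
    print(cid, spent, aov)
```

With only five customers a "top 10%" cut is not meaningful, so the sketch simply ranks everyone. On a real customer base you would keep the first tenth of this list.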

6. What would an outlier analysis look like?
• Compute min, max, average, standard deviation of revenue per order.
• Flag orders where:
– revenue > average + 2 * standard_deviation
• Check if such orders are errors or real big deals (e.g., enterprise purchase).
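The rule above can be sketched with Python's statistics module. This version uses the sample standard deviation (statistics.stdev), which is an assumption on my part; statistics.pstdev would give the population version and a slightly lower threshold.

```python
from statistics import mean, stdev

# (order_id, quantity, unit_price) taken from the customer_orders table above
orders = [
    (1, 2, 75000), (2, 10, 1500), (3, 5, 8000), (4, 8, 2500),
    (5, 3, 15000), (6, 4, 25000), (7, 6, 5000), (8, 12, 3000),
]

# Revenue per order = quantity x unit_price
revenue = {order_id: quantity * unit_price for order_id, quantity, unit_price in orders}
values = list(revenue.values())

avg = mean(values)
sd = stdev(values)        # sample standard deviation
threshold = avg + 2 * sd  # flag anything above average + 2 standard deviations

outliers = [(order_id, rev) for order_id, rev in revenue.items() if rev > threshold]
print(avg, round(sd, 2), round(threshold, 2), outliers)
```

Only order 1 (the 150000 laptop order) crosses the threshold here. With just eight rows this is an illustration of the mechanics, not a reliable statistical test.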

7. How would you report month‑on‑month growth?
• In SQL/Power BI:
– Group by YEAR(order_date) and MONTH(order_date)
– Compute revenue per month
– Then calculate:
▪ MoM % = (CurrentMonthRevenue − PreviousMonthRevenue) / PreviousMonthRevenue × 100
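Here is a minimal sketch of the month-on-month calculation in Python. The sample dataset only covers January 2025, so the February and March totals below are hypothetical numbers added purely to have something to compare; the January figure is the actual total from the table.

```python
# Revenue per month. 2025-01 is the real total from customer_orders;
# 2025-02 and 2025-03 are hypothetical values for illustration only.
monthly_revenue = {
    "2025-01": 436000,
    "2025-02": 479600,
    "2025-03": 455620,
}

mom_growth = {}
previous = None
for month in sorted(monthly_revenue):
    current = monthly_revenue[month]
    # The first month has nothing to compare against; also avoid dividing by zero.
    if previous:
        mom_growth[month] = round((current - previous) / previous * 100, 2)
    previous = current

print(mom_growth)
```

In SQL the same idea is usually expressed with LAG() over the monthly totals, and in Power BI with a measure that compares against the previous month.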

8. How would you turn this into a dashboard?
• Page 1 – Overview: Cards for total revenue, total orders, AOV.
• Page 2 – Trends: Line chart for MoM revenue, bar chart for category split.
• Page 3 – Customers: Table for top 10 customers and repeat customers.

9. How would you handle dirty data (nulls, duplicates)?
• Pre‑check:
– COUNT(*) vs COUNT(customer_id) to spot missing customers.
• Clean:
– Drop or impute missing critical fields.
– Remove duplicate orders using DISTINCT or ROW_NUMBER().
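The checks above can be sketched in SQLite from Python. The dirty rows here are invented (one exact duplicate and one order with a missing customer_id), and the sketch assumes a SQLite version with window-function support (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_orders "
    "(order_id INTEGER, customer_id INTEGER, product TEXT, quantity INTEGER, unit_price INTEGER)"
)
# Hypothetical dirty data for illustration.
conn.executemany(
    "INSERT INTO customer_orders VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1001, "Laptop", 2, 75000),
        (1, 1001, "Laptop", 2, 75000),  # exact duplicate of order 1
        (2, None, "Mouse", 10, 1500),   # missing customer_id
        (3, 1003, "Chair", 5, 8000),
    ],
)

# Pre-check: COUNT(*) counts all rows, COUNT(customer_id) skips NULLs.
total_rows, rows_with_customer = conn.execute(
    "SELECT COUNT(*), COUNT(customer_id) FROM customer_orders"
).fetchone()

# Clean: keep the first row per order_id using ROW_NUMBER().
deduped = conn.execute(
    """
    SELECT order_id, customer_id, product, quantity, unit_price
    FROM (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY order_id) AS rn
        FROM customer_orders
    )
    WHERE rn = 1
    ORDER BY order_id
    """
).fetchall()

print(total_rows, rows_with_customer, deduped)
conn.close()
```

The gap between the two counts (4 versus 3) shows one order without a customer. Whether to drop or impute that row is a business decision the query cannot make for you.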

10. How would you explain your findings to a non‑tech manager?
• Use simple language + visuals:
– “Our top product category is Electronics, contributing X% of revenue.”
– “N top customers account for M% of total sales.”
• Avoid formulas; focus on business impact: retention, profitability, growth.

Double Tap ❤️ For More!


SQL Cheatsheet 📝

This SQL cheatsheet is designed to be your quick reference guide for SQL programming. Whether you’re a beginner learning how to query databases or an experienced developer looking for a handy resource, this cheatsheet covers essential SQL topics.

1. Database Basics
- CREATE DATABASE db_name;
- USE db_name;

2. Tables
- Create Table: CREATE TABLE table_name (col1 datatype, col2 datatype);
- Drop Table: DROP TABLE table_name;
- Alter Table: ALTER TABLE table_name ADD column_name datatype;

3. Insert Data
- INSERT INTO table_name (col1, col2) VALUES (val1, val2);

4. Select Queries
- Basic Select: SELECT * FROM table_name;
- Select Specific Columns: SELECT col1, col2 FROM table_name;
- Select with Condition: SELECT * FROM table_name WHERE condition;

5. Update Data
- UPDATE table_name SET col1 = value1 WHERE condition;

6. Delete Data
- DELETE FROM table_name WHERE condition;

7. Joins
- Inner Join: SELECT * FROM table1 INNER JOIN table2 ON table1.col = table2.col;
- Left Join: SELECT * FROM table1 LEFT JOIN table2 ON table1.col = table2.col;
- Right Join: SELECT * FROM table1 RIGHT JOIN table2 ON table1.col = table2.col;
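To make the difference between the join types concrete, here is a small sketch with two hypothetical tables, customers and orders, in an in-memory SQLite database. It shows INNER JOIN and LEFT JOIN only; a RIGHT JOIN gives the same rows as a LEFT JOIN with the two tables swapped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables and rows for illustration.
conn.execute("CREATE TABLE customers (customer_id INTEGER, name TEXT)")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Asha"), (2, "Ben"), (3, "Chen")])
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(10, 1), (11, 1), (12, 2)])

# INNER JOIN: only customers that have at least one matching order.
inner_rows = conn.execute(
    """
    SELECT c.name, o.order_id
    FROM customers c
    INNER JOIN orders o ON c.customer_id = o.customer_id
    ORDER BY o.order_id
    """
).fetchall()

# LEFT JOIN: every customer, with NULL (None in Python) where no order matches.
left_rows = conn.execute(
    """
    SELECT c.name, o.order_id
    FROM customers c
    LEFT JOIN orders o ON c.customer_id = o.customer_id
    ORDER BY c.customer_id, o.order_id
    """
).fetchall()

print(inner_rows)
print(left_rows)
conn.close()
```

Chen has no orders, so Chen is missing from the inner join but appears in the left join with an empty order_id.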

8. Aggregations
- Count: SELECT COUNT(*) FROM table_name;
- Sum: SELECT SUM(col) FROM table_name;
- Group By: SELECT col, COUNT(*) FROM table_name GROUP BY col;

9. Sorting & Limiting
- Order By: SELECT * FROM table_name ORDER BY col ASC|DESC;
- Limit Results: SELECT * FROM table_name LIMIT n; (MySQL, PostgreSQL, SQLite; SQL Server uses SELECT TOP n * FROM table_name;)

10. Indexes
- Create Index: CREATE INDEX idx_name ON table_name (col);
- Drop Index: DROP INDEX idx_name; (PostgreSQL, SQLite; MySQL and SQL Server use DROP INDEX idx_name ON table_name;)

11. Subqueries
- SELECT * FROM table_name WHERE col IN (SELECT col FROM other_table);

12. Views
- Create View: CREATE VIEW view_name AS SELECT * FROM table_name;
- Drop View: DROP VIEW view_name;


✅ Top Data Analyst Interview Questions

✅ SQL
1. What is a window function?
2. What is the difference between RANK() and ROW_NUMBER()?
3. How do you find the second highest salary?
4. What is a recursive CTE?
5. What is the difference between correlated and non-correlated subquery?
6. How do you remove duplicates without DISTINCT?
7. What is an INDEX and when do you use it?
8. Explain self-join with example.
9. What is the difference between DELETE, DROP, and TRUNCATE?
10. How do you pivot/unpivot data in SQL?
11. What is LAG() and LEAD()?
12. How do you handle NULL in aggregates?
13. What is the difference between VIEW and MATERIALIZED VIEW?
14. Explain ACID properties.
15. How do you optimize a slow query?
16. What is the difference between INNER JOIN and EXISTS?
17. What is a FULL OUTER JOIN?
18. How do you find duplicates across tables?
19. What are SQL constraints?
20. Explain GROUPING SETS.

✅ Python
1. How do you handle missing data in Pandas?
2. What is the difference between loc[] and iloc[]?
3. What are lambda functions in data analysis?
4. How do you remove duplicates from DataFrame?
5. Explain groupby() and agg().
6. How do you merge/join DataFrames?
7. What is vectorization?
8. How do you handle outliers using IQR method?
9. What is the difference between a list, a tuple, and a dict?
10. How do you pivot data with pivot_table()?
11. What libraries do you use for visualization (Matplotlib/Seaborn)?
12. Explain apply() vs map() vs applymap() (deprecated since pandas 2.1 in favor of DataFrame.map()).
13. How do you read CSV with chunks?
14. What is NumPy broadcasting?
15. How do you handle time-series with Pandas (resample, shift)?
16. How do you calculate correlation matrix?
17. How do you optimize Pandas performance (e.g., dtype)?
18. What is the difference between a Series and a DataFrame?
19. How do you filter DataFrame conditionally?
20. How do you rename columns efficiently?

✅ Power BI
1. What is DAX?
2. What is the difference between Power Query and Power Pivot?
3. What is the difference between a measure and a calculated column?
4. Explain CALCULATE() function.
5. What are relationships (1:M, M:M)?
6. How do you handle many-to-many?
7. What is row-level security (RLS)?
8. How do you set up incremental refresh?
9. What is the difference between filters and slicers?
10. What is a data model?
11. How do you publish and share reports?
12. What is the Performance Analyzer tool?
13. How do you calculate month-on-month growth in DAX?
14. How do you use custom visuals?
15. What is a gateway, and why is it needed for refresh?
16. What is a .pbix file?
17. What are some examples of quick measures?
18. What is data blending in Power BI?
19. How do you optimize large datasets?
20. What is Get Data?

✅ Excel
1. What is the INDEX-MATCH combination?
2. How do you use Power Query?
3. What are dynamic arrays (FILTER, UNIQUE)?
4. What is the difference between SUMIFS and SUMPRODUCT?
5. What is Power Pivot?
6. What are Goal Seek and Solver?
7. What are conditional formatting rules?
8. How do you remove duplicates using advanced methods?
9. How do you create sparklines?
10. How do you use INDIRECT function?
11. What are data validation lists?
12. What are array formulas (Ctrl+Shift+Enter)?
13. What are Macros/VBA basics?
14. What are pivot table slicers?
15. What is what-if analysis?
16. What are TEXTJOIN and CONCAT?
17. What are the advantages of XLOOKUP?
18. What is the Flash Fill feature?
19. What are the advantages of a Table over a Range?
20. How do you import from SQL/Web?

✅ Double Tap ❤️ For More
