Which statement is TRUE about CTE?
Anonymous Quiz
6%
A. It is stored permanently
82%
B. It exists only during query execution
7%
C. It replaces SELECT
5%
D. It cannot be used with JOIN
Now, letโs move to the next topic:
Views (Virtual Tables)
๐ง 1. What is a VIEW?
A VIEW is a virtual table based on a SQL query
๐ It does NOT store data
๐ It stores the query
Think like this ๐
๐ โSaved SQL query โ reuse anytimeโ
โก 2. Why Use Views?
- Simplify complex queries
- Reuse logic
- Hide sensitive data
- Improve readability
โก 3. Create a VIEW
CREATE VIEW high_salary_emp AS
SELECT name, salary
FROM employees
WHERE salary > 50000;
๐ 4. Use a VIEW
SELECT FROM high_salary_emp;
โ Works like a normal table
๐ 5. Update a VIEW
CREATE OR REPLACE VIEW high_salary_emp AS
SELECT name, salary, department
FROM employees
WHERE salary > 50000;
โ 6. Drop a VIEW
DROP VIEW high_salary_emp;
๐ฏ 7. Real Example
๐ Create view for department-wise average salary
CREATE VIEW dept_avg_salary AS
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department;
๐ Use it:
SELECT FROM dept_avg_salary;
โก 8. Important Points
- View does NOT store data
- Changes in table โ reflect in view
- Can be used like a table
๐ฏ 9. Practice Tasks
1. Create view for employees with salary > 40k
2. Create view for IT department employees
3. Create view for avg salary per department
4. Query data using created views
5. Drop a view
๐ฅ Here are the solutions for VIEW practice tasks
โ 1. Create view for employees with salary > 40k
CREATE VIEW high_salary_emp AS
SELECT
FROM employees
WHERE salary > 40000;
โ 2. Create view for IT department employees
CREATE VIEW it_employees AS
SELECT
FROM employees
WHERE department = 'IT';
โ 3. Create view for avg salary per department
CREATE VIEW dept_avg_salary AS
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department;
โ 4. Query data using created views
SELECT FROM high_salary_emp;
SELECT FROM it_employees;
SELECT FROM dept_avg_salary;
โ 5. Drop a view
DROP VIEW high_salary_emp;
โก Mini Challenge ๐ฅ
๐ Create a view to show top 3 highest salary employees
โก Mini Challenge Solution
CREATE VIEW top_3_salary AS
SELECT
FROM employees
ORDER BY salary DESC
LIMIT 3;
๐ Use the view:
SELECT FROM top_3_salary;
๐ฅ Pro Tip:
Views are heavily used in:
๐ Dashboards
๐ Reporting systems
๐ Data analytics projects
Because they simplify complex SQL ๐ฏ
๐ Table โ stores data
๐ View โ stores query
Double Tap โค๏ธ For More
Views (Virtual Tables)
๐ง 1. What is a VIEW?
A VIEW is a virtual table based on a SQL query
๐ It does NOT store data
๐ It stores the query
Think like this ๐
๐ โSaved SQL query โ reuse anytimeโ
โก 2. Why Use Views?
- Simplify complex queries
- Reuse logic
- Hide sensitive data
- Improve readability
โก 3. Create a VIEW
CREATE VIEW high_salary_emp AS
SELECT name, salary
FROM employees
WHERE salary > 50000;
๐ 4. Use a VIEW
SELECT FROM high_salary_emp;
โ Works like a normal table
๐ 5. Update a VIEW
CREATE OR REPLACE VIEW high_salary_emp AS
SELECT name, salary, department
FROM employees
WHERE salary > 50000;
โ 6. Drop a VIEW
DROP VIEW high_salary_emp;
๐ฏ 7. Real Example
๐ Create view for department-wise average salary
CREATE VIEW dept_avg_salary AS
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department;
๐ Use it:
SELECT FROM dept_avg_salary;
โก 8. Important Points
- View does NOT store data
- Changes in table โ reflect in view
- Can be used like a table
๐ฏ 9. Practice Tasks
1. Create view for employees with salary > 40k
2. Create view for IT department employees
3. Create view for avg salary per department
4. Query data using created views
5. Drop a view
๐ฅ Here are the solutions for VIEW practice tasks
โ 1. Create view for employees with salary > 40k
CREATE VIEW high_salary_emp AS
SELECT
FROM employees
WHERE salary > 40000;
โ 2. Create view for IT department employees
CREATE VIEW it_employees AS
SELECT
FROM employees
WHERE department = 'IT';
โ 3. Create view for avg salary per department
CREATE VIEW dept_avg_salary AS
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department;
โ 4. Query data using created views
SELECT FROM high_salary_emp;
SELECT FROM it_employees;
SELECT FROM dept_avg_salary;
โ 5. Drop a view
DROP VIEW high_salary_emp;
โก Mini Challenge ๐ฅ
๐ Create a view to show top 3 highest salary employees
โก Mini Challenge Solution
CREATE VIEW top_3_salary AS
SELECT
FROM employees
ORDER BY salary DESC
LIMIT 3;
๐ Use the view:
SELECT FROM top_3_salary;
๐ฅ Pro Tip:
Views are heavily used in:
๐ Dashboards
๐ Reporting systems
๐ Data analytics projects
Because they simplify complex SQL ๐ฏ
๐ Table โ stores data
๐ View โ stores query
Double Tap โค๏ธ For More
โค16
What is a VIEW in SQL?
Anonymous Quiz
4%
A. A physical table
84%
B. A virtual table based on a query
8%
C. A database
4%
D. A stored procedure
Does a VIEW store actual data?
Anonymous Quiz
24%
A. Yes
56%
B. No
6%
C. Sometimes
14%
D. Only in MySQL
Which command is used to create a VIEW?
Anonymous Quiz
6%
A. MAKE VIEW
86%
B. CREATE VIEW
4%
C. NEW VIEW
4%
D. ADD VIEW
What will happen if underlying table data changes?
Anonymous Quiz
22%
A. View remains unchanged
69%
B. View automatically reflects changes
5%
C. View gets deleted
5%
D. Error occurs
Which command removes a VIEW?
Anonymous Quiz
18%
A. DELETE VIEW
26%
B. REMOVE VIEW
49%
C. DROP VIEW
7%
D. CLEAR VIEW
โค4
Now, letโs move to the next topic:
Indexes ๐
๐ง 1. What is an INDEX?
An INDEX is used to make data retrieval faster
๐ Think like a book ๐
- Without index โ scan every page
- With index โ jump directly to topic
Same happens in databases ๐ฏ
โก 2. Why Use Indexes?
โ Faster SELECT queries
โ Faster searching
โ Better performance on large tables
โ But:
- Uses extra storage
- INSERT/UPDATE become slightly slower
๐ Visual Understanding
โก 3. Create an INDEX
CREATE INDEX idx_salary
ON employees(salary);
๐ Creates index on salary column
๐ 4. Query Using Indexed Column
SELECT FROM employees
WHERE salary > 50000;
โ Faster because of index
โ 5. Drop an INDEX
DROP INDEX idx_salary ON employees;
๐ฅ 6. Primary Key Automatically Creates Index
CREATE TABLE employees (
emp_id INT PRIMARY KEY,
name VARCHAR(50)
);
โ PRIMARY KEY โ automatically indexed
โก 7. Types of Indexes
- Primary Index: Created on primary key
- Unique Index: Prevent duplicate values
- Composite Index: Index on multiple columns
๐ฏ 8. Composite Index Example
CREATE INDEX idx_dept_salary
ON employees(department, salary);
โ Useful when filtering both columns together
๐ฏ 9. Practice Tasks
1. Create index on employee name
2. Create index on department column
3. Create composite index on department + salary
4. Query employees using indexed column
5. Drop created index
โก Mini Challenge ๐ฅ
๐ Create a unique index on email column
๐ฅ Indexes improve READ speed but may slow down INSERT / UPDATE
Double Tap โค๏ธ For More
Indexes ๐
๐ง 1. What is an INDEX?
An INDEX is used to make data retrieval faster
๐ Think like a book ๐
- Without index โ scan every page
- With index โ jump directly to topic
Same happens in databases ๐ฏ
โก 2. Why Use Indexes?
โ Faster SELECT queries
โ Faster searching
โ Better performance on large tables
โ But:
- Uses extra storage
- INSERT/UPDATE become slightly slower
๐ Visual Understanding
โก 3. Create an INDEX
CREATE INDEX idx_salary
ON employees(salary);
๐ Creates index on salary column
๐ 4. Query Using Indexed Column
SELECT FROM employees
WHERE salary > 50000;
โ Faster because of index
โ 5. Drop an INDEX
DROP INDEX idx_salary ON employees;
๐ฅ 6. Primary Key Automatically Creates Index
CREATE TABLE employees (
emp_id INT PRIMARY KEY,
name VARCHAR(50)
);
โ PRIMARY KEY โ automatically indexed
โก 7. Types of Indexes
- Primary Index: Created on primary key
- Unique Index: Prevent duplicate values
- Composite Index: Index on multiple columns
๐ฏ 8. Composite Index Example
CREATE INDEX idx_dept_salary
ON employees(department, salary);
โ Useful when filtering both columns together
๐ฏ 9. Practice Tasks
1. Create index on employee name
2. Create index on department column
3. Create composite index on department + salary
4. Query employees using indexed column
5. Drop created index
โก Mini Challenge ๐ฅ
๐ Create a unique index on email column
๐ฅ Indexes improve READ speed but may slow down INSERT / UPDATE
Double Tap โค๏ธ For More
โค17๐4
๐ Top 100 Data Analyst Interview Questions
๐ง Data Analyst Role Basics
1. What does a data analyst do in a company?
2. What is the difference between a data analyst, data scientist, and BI analyst?
3. What is the typical workflow of a data analyst (from requirement to insight)?
4. What are the main goals of data analysis (descriptive, diagnostic, predictive, prescriptive)?
5. What is KPI and why is it important?
6. What is the difference between metrics and KPIs?
7. What is a dashboard vs a report?
8. What is exploratory data analysis (EDA)?
9. What is the difference between raw data and processed data?
10. How do you prioritize which analysis to work on first?
๐ SQL Databases
11. What is SQL and why is it critical for data analysts?
12. How do SELECT, WHERE, ORDER BY, LIMIT work?
13. How do you join two tables (INNER, LEFT, RIGHT, FULL joins)?
14. How do GROUP BY and aggregate functions (SUM, AVG, COUNT, MAX, MIN) work?
15. How do you write subqueries and CTEs?
16. How do you calculate running totals or rolling averages with window functions?
17. How do you clean and filter data directly in SQL?
18. How do you handle duplicates and NULL values in SQL?
19. How do you optimize a slow query?
20. How do you design a simple schema for a business domain (e.g., orders, users)?
๐งฎ Excel Spreadsheets
21. How do you use Excel for quick data cleaning and analysis?
22. How do you use SUMIF, COUNTIF, VLOOKUP / XLOOKUP in Excel?
23. How do you remove duplicates and standardize text in Excel?
24. How do you use PivotTables for summarizing data?
25. How do you build simple dashboards in Excel (charts + slicers)?
26. How do you use conditional formatting for insights?
27. How do you export data to CSV or share formatted reports?
28. How do you handle large datasets in Excel vs a database?
29. How do you avoid common Excel pitfalls (e.g., hardโcoded numbers, no labels)?
30. How do you document your Excel analyses?
๐ Data Visualization BI Tools
31. What is the purpose of data visualization?
32. When do you use bar charts, line charts, pie charts, histograms?
33. What are best practices for labeling, colors, and readability?
34. How do you design a dashboard for a nonโtechnical stakeholder?
35. What is the difference between a report and a selfโservice dashboard?
36. How do you use Power BI / Tableau / Looker / Google Data Studio for dashboards?
37. How do you filter and slice data in a BI tool?
38. How do you handle measures and dimensions in BI tools?
39. How do you share dashboards and control access?
40. How do you tell a โdata storyโ using charts and annotations?
๐ Descriptive Statistics EDA
41. What are mean, median, and mode?
42. What is standard deviation and variance?
43. What are quartiles and IQR?
44. How do you detect outliers and what should you do with them?
45. What is a distribution and how do you inspect it (histograms, boxplots)?
46. What is skewness and kurtosis?
47. How do you calculate growth rate, percentage change, CAGR?
48. How do you compute cohortโstyle metrics (e.g., retention by signup month)?
49. How do you summarize categorical vs numerical data?
50. How do you structure an EDA notebook or report?
๐ ๏ธ Python (or R) for Data Analysis
51. Why do data analysts use Python instead of (or along with) Excel?
52. How do you load data from CSV or SQL into a pandas DataFrame?
53. How do you inspect the first/last rows, shape, data types, and missing values?
54. How do you clean missing values (dropna, fillna, interpolation)?
55. How do you filter, sort, and group data with pandas?
56. How do you calculate aggregates and pivots with groupby and pivot_table?
๐ง Data Analyst Role Basics
1. What does a data analyst do in a company?
2. What is the difference between a data analyst, data scientist, and BI analyst?
3. What is the typical workflow of a data analyst (from requirement to insight)?
4. What are the main goals of data analysis (descriptive, diagnostic, predictive, prescriptive)?
5. What is KPI and why is it important?
6. What is the difference between metrics and KPIs?
7. What is a dashboard vs a report?
8. What is exploratory data analysis (EDA)?
9. What is the difference between raw data and processed data?
10. How do you prioritize which analysis to work on first?
๐ SQL Databases
11. What is SQL and why is it critical for data analysts?
12. How do SELECT, WHERE, ORDER BY, LIMIT work?
13. How do you join two tables (INNER, LEFT, RIGHT, FULL joins)?
14. How do GROUP BY and aggregate functions (SUM, AVG, COUNT, MAX, MIN) work?
15. How do you write subqueries and CTEs?
16. How do you calculate running totals or rolling averages with window functions?
17. How do you clean and filter data directly in SQL?
18. How do you handle duplicates and NULL values in SQL?
19. How do you optimize a slow query?
20. How do you design a simple schema for a business domain (e.g., orders, users)?
๐งฎ Excel Spreadsheets
21. How do you use Excel for quick data cleaning and analysis?
22. How do you use SUMIF, COUNTIF, VLOOKUP / XLOOKUP in Excel?
23. How do you remove duplicates and standardize text in Excel?
24. How do you use PivotTables for summarizing data?
25. How do you build simple dashboards in Excel (charts + slicers)?
26. How do you use conditional formatting for insights?
27. How do you export data to CSV or share formatted reports?
28. How do you handle large datasets in Excel vs a database?
29. How do you avoid common Excel pitfalls (e.g., hardโcoded numbers, no labels)?
30. How do you document your Excel analyses?
๐ Data Visualization BI Tools
31. What is the purpose of data visualization?
32. When do you use bar charts, line charts, pie charts, histograms?
33. What are best practices for labeling, colors, and readability?
34. How do you design a dashboard for a nonโtechnical stakeholder?
35. What is the difference between a report and a selfโservice dashboard?
36. How do you use Power BI / Tableau / Looker / Google Data Studio for dashboards?
37. How do you filter and slice data in a BI tool?
38. How do you handle measures and dimensions in BI tools?
39. How do you share dashboards and control access?
40. How do you tell a โdata storyโ using charts and annotations?
๐ Descriptive Statistics EDA
41. What are mean, median, and mode?
42. What is standard deviation and variance?
43. What are quartiles and IQR?
44. How do you detect outliers and what should you do with them?
45. What is a distribution and how do you inspect it (histograms, boxplots)?
46. What is skewness and kurtosis?
47. How do you calculate growth rate, percentage change, CAGR?
48. How do you compute cohortโstyle metrics (e.g., retention by signup month)?
49. How do you summarize categorical vs numerical data?
50. How do you structure an EDA notebook or report?
๐ ๏ธ Python (or R) for Data Analysis
51. Why do data analysts use Python instead of (or along with) Excel?
52. How do you load data from CSV or SQL into a pandas DataFrame?
53. How do you inspect the first/last rows, shape, data types, and missing values?
54. How do you clean missing values (dropna, fillna, interpolation)?
55. How do you filter, sort, and group data with pandas?
56. How do you calculate aggregates and pivots with groupby and pivot_table?
โค12
60. How do you compute month-on-month or week-on-week growth?
61. How do you write a query to calculate retention / churn?
62. How do you calculate LTV (lifetime value) conceptually?
63. How do you write a funnel analysis query (e.g., sign-up โ activation โ purchase)?
64. How do you handle time-based aggregations (daily, weekly, monthly)?
65. How do you compare cohorts (e.g., users by month of acquisition)?
66. How do you calculate lead-time, cycle-time, or other business-process metrics?
67. How do you implement A/B test-style analysis in SQL?
68. How do you approximate segmentation (RFM-style) in SQL?
69. How do you document and version your SQL queries?
๐ง Behavioral Business-Sense Questions
70. Walk me through a real-world analysis you did end-to-end.
71. Tell me about a time you presented insights to a non-technical audience.
72. Tell me about a time your analysis changed a decision or strategy.
73. Tell me about a time you found a data quality issue and how you fixed it.
74. How do you translate a vague business question into a concrete analysis?
75. How do you handle conflicting priorities from stakeholders?
76. How do you collaborate with product, marketing, and engineering teams?
77. How do you validate your analysis before sharing it?
78. How do you explain statistical or technical concepts in simple language?
79. How do you stay updated with data-analysis trends and tools?
๐ Real-World Case-Study / Scenario-Style Questions
80. Design an analysis to track product usage or feature adoption.
81. Design an analysis to evaluate marketing campaign performance.
82. Design a churn / retention dashboard for a SaaS product.
83. Design a sales-performance report for a regional team.
84. Design a customer-segmentation analysis (e.g., high-value vs low-value).
85. How would you analyze a sudden drop in website traffic or orders?
86. How would you analyze a pricing change or discount test?
87. How would you analyze customer support ticket volume and trends?
88. How would you design a simple A/B test and its success metrics?
89. How would you explain results and next steps to a manager?
๐ง Tooling, Processes Best Practices
90. What tools do you use most often as a data analyst?
91. How do you version your code and SQL (e.g., Git, folder structure)?
92. How do you document queries, dashboards, and assumptions?
93. How do you handle data privacy and PII in your analyses?
94. How do you manage permissions and access to dashboards?
95. How do you automate repetitive reports (scheduled exports, SQL jobs, etc.)?
96. How do you handle ad-hoc vs recurring analyses?
97. How do you get feedback on your dashboards and improve them?
98. What are your top 5 productivity shortcuts / habits as a data analyst?
99. What skills do you want to improve most in the next 6โ12 months?
๐ Double Tap โฅ๏ธ For Detailed Answers
61. How do you write a query to calculate retention / churn?
62. How do you calculate LTV (lifetime value) conceptually?
63. How do you write a funnel analysis query (e.g., sign-up โ activation โ purchase)?
64. How do you handle time-based aggregations (daily, weekly, monthly)?
65. How do you compare cohorts (e.g., users by month of acquisition)?
66. How do you calculate lead-time, cycle-time, or other business-process metrics?
67. How do you implement A/B test-style analysis in SQL?
68. How do you approximate segmentation (RFM-style) in SQL?
69. How do you document and version your SQL queries?
๐ง Behavioral Business-Sense Questions
70. Walk me through a real-world analysis you did end-to-end.
71. Tell me about a time you presented insights to a non-technical audience.
72. Tell me about a time your analysis changed a decision or strategy.
73. Tell me about a time you found a data quality issue and how you fixed it.
74. How do you translate a vague business question into a concrete analysis?
75. How do you handle conflicting priorities from stakeholders?
76. How do you collaborate with product, marketing, and engineering teams?
77. How do you validate your analysis before sharing it?
78. How do you explain statistical or technical concepts in simple language?
79. How do you stay updated with data-analysis trends and tools?
๐ Real-World Case-Study / Scenario-Style Questions
80. Design an analysis to track product usage or feature adoption.
81. Design an analysis to evaluate marketing campaign performance.
82. Design a churn / retention dashboard for a SaaS product.
83. Design a sales-performance report for a regional team.
84. Design a customer-segmentation analysis (e.g., high-value vs low-value).
85. How would you analyze a sudden drop in website traffic or orders?
86. How would you analyze a pricing change or discount test?
87. How would you analyze customer support ticket volume and trends?
88. How would you design a simple A/B test and its success metrics?
89. How would you explain results and next steps to a manager?
๐ง Tooling, Processes Best Practices
90. What tools do you use most often as a data analyst?
91. How do you version your code and SQL (e.g., Git, folder structure)?
92. How do you document queries, dashboards, and assumptions?
93. How do you handle data privacy and PII in your analyses?
94. How do you manage permissions and access to dashboards?
95. How do you automate repetitive reports (scheduled exports, SQL jobs, etc.)?
96. How do you handle ad-hoc vs recurring analyses?
97. How do you get feedback on your dashboards and improve them?
98. What are your top 5 productivity shortcuts / habits as a data analyst?
99. What skills do you want to improve most in the next 6โ12 months?
๐ Double Tap โฅ๏ธ For Detailed Answers
โค18
What is the main purpose of an INDEX in SQL?
Anonymous Quiz
2%
A. Delete data
10%
B. Increase storage
78%
C. Speed up data retrieval
10%
D. Create tables
Which command is used to create an index?
Anonymous Quiz
11%
A. ADD INDEX
84%
B. CREATE INDEX
3%
C. MAKE INDEX
3%
D. NEW INDEX
What happens when a PRIMARY KEY is created?
Anonymous Quiz
3%
A. Table gets deleted
86%
B. Automatic index is created
8%
C. Data becomes duplicated
3%
D. Query becomes invalid
Which operation becomes faster with indexes?
Anonymous Quiz
62%
A. SELECT
21%
B. INSERT
13%
C. UPDATE
3%
D. DELETE
โค4
What is a Composite Index?
Anonymous Quiz
16%
A. Index on one column
75%
B. Index on multiple columns
6%
C. Index without table
2%
D. Duplicate index
๐ Data Analyst Interview Questions with Answers โ Part 1
๐ง Data Analyst Role & Basics
1. What does a data analyst do in a company?
A data analyst collects, cleans, analyzes, and interprets data to help businesses make better decisions. They create reports, dashboards, and insights that improve performance, reduce costs, and identify opportunities.
2. What is the difference between a data analyst, data scientist, and BI analyst?
โ Data Analyst โ Focuses on analyzing historical data, creating reports, dashboards, and business insights.
โ Data Scientist โ Works on advanced analytics, machine learning, predictive modeling, and AI solutions.
โ BI Analyst โ Primarily focuses on business intelligence tools like Power BI/Tableau to build dashboards and monitor KPIs.
3. What is the typical workflow of a data analyst?
A common workflow is:
1๏ธโฃ Understand business requirements
2๏ธโฃ Collect data from databases/files/APIs
3๏ธโฃ Clean and preprocess data
4๏ธโฃ Analyze data using SQL/Excel/Python
5๏ธโฃ Create dashboards or visualizations
6๏ธโฃ Present insights to stakeholders
7๏ธโฃ Monitor results and improve analysis
4. What are the main goals of data analysis?
๐ Descriptive Analysis โ What happened?
๐ Diagnostic Analysis โ Why did it happen?
๐ฎ Predictive Analysis โ What may happen next?
๐ฏ Prescriptive Analysis โ What action should be taken?
5. What is KPI and why is it important?
KPI (Key Performance Indicator) is a measurable metric used to track business performance.
Examples:
โ๏ธ Revenue Growth
โ๏ธ Customer Retention
โ๏ธ Conversion Rate
โ๏ธ Website Traffic
KPIs help companies measure progress toward goals and make data-driven decisions.
6. What is the difference between metrics and KPIs?
๐ Metrics = Any measurable value
Example: Number of website visitors
๐ KPIs = Critical metrics tied to business goals
Example: Monthly customer conversion rate
๐ All KPIs are metrics, but not all metrics are KPIs.
7. What is a dashboard vs a report?
๐ Dashboard
โข Interactive
โข Real-time or frequently updated
โข High-level overview of KPIs
๐ Report
โข Detailed and static
โข Often shared weekly/monthly
โข Used for deep analysis
8. What is exploratory data analysis (EDA)?
EDA is the process of exploring and understanding data before detailed analysis or modeling.
It includes:
โ๏ธ Finding missing values
โ๏ธ Detecting outliers
โ๏ธ Understanding distributions
โ๏ธ Identifying trends and patterns
Tools commonly used: SQL, Excel, Python, Power BI.
9. What is the difference between raw data and processed data?
๐ Raw Data โ Original uncleaned data directly from sources.
Example: Duplicate rows, missing values, inconsistent formats.
๐ Processed Data โ Cleaned and transformed data ready for analysis.
10. How do you prioritize which analysis to work on first?
A data analyst usually prioritizes tasks based on:
โ Business impact
โ Urgency
โ Stakeholder requirements
โ Revenue/customer impact
โ Time and resource availability
High-impact and time-sensitive analyses are handled first.
๐ Double Tap โค๏ธ For More
๐ง Data Analyst Role & Basics
1. What does a data analyst do in a company?
A data analyst collects, cleans, analyzes, and interprets data to help businesses make better decisions. They create reports, dashboards, and insights that improve performance, reduce costs, and identify opportunities.
2. What is the difference between a data analyst, data scientist, and BI analyst?
โ Data Analyst โ Focuses on analyzing historical data, creating reports, dashboards, and business insights.
โ Data Scientist โ Works on advanced analytics, machine learning, predictive modeling, and AI solutions.
โ BI Analyst โ Primarily focuses on business intelligence tools like Power BI/Tableau to build dashboards and monitor KPIs.
3. What is the typical workflow of a data analyst?
A common workflow is:
1๏ธโฃ Understand business requirements
2๏ธโฃ Collect data from databases/files/APIs
3๏ธโฃ Clean and preprocess data
4๏ธโฃ Analyze data using SQL/Excel/Python
5๏ธโฃ Create dashboards or visualizations
6๏ธโฃ Present insights to stakeholders
7๏ธโฃ Monitor results and improve analysis
4. What are the main goals of data analysis?
๐ Descriptive Analysis โ What happened?
๐ Diagnostic Analysis โ Why did it happen?
๐ฎ Predictive Analysis โ What may happen next?
๐ฏ Prescriptive Analysis โ What action should be taken?
5. What is KPI and why is it important?
KPI (Key Performance Indicator) is a measurable metric used to track business performance.
Examples:
โ๏ธ Revenue Growth
โ๏ธ Customer Retention
โ๏ธ Conversion Rate
โ๏ธ Website Traffic
KPIs help companies measure progress toward goals and make data-driven decisions.
6. What is the difference between metrics and KPIs?
๐ Metrics = Any measurable value
Example: Number of website visitors
๐ KPIs = Critical metrics tied to business goals
Example: Monthly customer conversion rate
๐ All KPIs are metrics, but not all metrics are KPIs.
7. What is a dashboard vs a report?
๐ Dashboard
โข Interactive
โข Real-time or frequently updated
โข High-level overview of KPIs
๐ Report
โข Detailed and static
โข Often shared weekly/monthly
โข Used for deep analysis
8. What is exploratory data analysis (EDA)?
EDA is the process of exploring and understanding data before detailed analysis or modeling.
It includes:
โ๏ธ Finding missing values
โ๏ธ Detecting outliers
โ๏ธ Understanding distributions
โ๏ธ Identifying trends and patterns
Tools commonly used: SQL, Excel, Python, Power BI.
9. What is the difference between raw data and processed data?
๐ Raw Data โ Original uncleaned data directly from sources.
Example: Duplicate rows, missing values, inconsistent formats.
๐ Processed Data โ Cleaned and transformed data ready for analysis.
10. How do you prioritize which analysis to work on first?
A data analyst usually prioritizes tasks based on:
โ Business impact
โ Urgency
โ Stakeholder requirements
โ Revenue/customer impact
โ Time and resource availability
High-impact and time-sensitive analyses are handled first.
๐ Double Tap โค๏ธ For More
โค15
๐ Data Analyst Interview Questions with Answers โ Part 2
๐ SQL & Databases
11. What is SQL and why is it critical for data analysts?
SQL (Structured Query Language) is used to communicate with databases. It helps analysts retrieve, filter, clean, and analyze data efficiently.
It is critical because most business data is stored in databases, and SQL allows analysts to extract insights directly from large datasets.
12. How do "SELECT", "WHERE", "ORDER BY", and "LIMIT" work?
โ "SELECT" โ Used to choose columns from a table
SELECT name, salary FROM employees;
โ "WHERE" โ Filters rows based on conditions
SELECT FROM employees
WHERE salary > 50000;
โ "ORDER BY" โ Sorts data ascending or descending
SELECT FROM employees
ORDER BY salary DESC;
โ "LIMIT" โ Restricts the number of rows returned
SELECT FROM employees
LIMIT 5;
13. How do you join two tables ("INNER", "LEFT", "RIGHT", "FULL" joins)?
๐ "INNER JOIN" โ Returns matching records from both tables
๐ "LEFT JOIN" โ Returns all records from the left table + matching rows from the right table
๐ "RIGHT JOIN" โ Returns all records from the right table + matching rows from the left table
๐ "FULL JOIN" โ Returns all matching and non-matching records from both tables
Example:
SELECT customers.name, orders.order_id
FROM customers
INNER JOIN orders
ON customers.id = orders.customer_id;
14. How do "GROUP BY" and aggregate functions work?
Aggregate functions summarize data.
Common functions:
โ๏ธ "SUM()"
โ๏ธ "AVG()"
โ๏ธ "COUNT()"
โ๏ธ "MAX()"
โ๏ธ "MIN()"
Example:
SELECT department, AVG(salary)
FROM employees
GROUP BY department;
This groups employees by department and calculates average salary.
15. How do you write subqueries and CTEs?
๐ Subquery โ Query inside another query
SELECT name
FROM employees
WHERE salary > (
SELECT AVG(salary)
FROM employees
);
๐ CTE (Common Table Expression) โ Temporary result set that improves readability
WITH high_salary AS (
SELECT
FROM employees
WHERE salary > 50000
)
SELECT FROM high_salary;
16. How do you calculate running totals or rolling averages with window functions?
Window functions perform calculations across rows without collapsing data.
Example โ Running Total:
SELECT order_date,
sales,
SUM(sales) OVER (ORDER BY order_date) AS running_total
FROM orders;
Example โ Rolling Average:
SELECT order_date,
AVG(sales) OVER (
ORDER BY order_date
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS rolling_avg
FROM orders;
17. How do you clean and filter data directly in SQL?
Data cleaning in SQL includes:
โ๏ธ Removing duplicates
โ๏ธ Handling NULL values
โ๏ธ Standardizing text
โ๏ธ Filtering invalid rows
Example:
SELECT TRIM(LOWER(name))
FROM customers
WHERE email IS NOT NULL;
18. How do you handle duplicates and NULL values in SQL?
โ Remove duplicates using "DISTINCT"
SELECT DISTINCT city
FROM customers;
โ Find NULL values
SELECT
FROM employees
WHERE salary IS NULL;
โ Replace NULL values
SELECT COALESCE(salary, 0)
FROM employees;
19. How do you optimize a slow query?
Common optimization techniques:
๐ Use indexes
๐ Avoid unnecessary columns in "SELECT *"
๐ Filter data early using "WHERE"
๐ Optimize joins
๐ Use proper aggregations
๐ Analyze execution plans
Efficient queries improve performance and reduce database load.
20. How do you design a simple schema for a business domain?
A schema organizes data into related tables.
Example for an e-commerce business:
๐ "Customers" table
๐ "Orders" table
๐ "Products" table
๐ "Payments" table
Relationships are created using primary keys and foreign keys to maintain data integrity.
๐ Double Tap โค๏ธ For Part-3
๐ SQL & Databases
11. What is SQL and why is it critical for data analysts?
SQL (Structured Query Language) is used to communicate with databases. It helps analysts retrieve, filter, clean, and analyze data efficiently.
It is critical because most business data is stored in databases, and SQL allows analysts to extract insights directly from large datasets.
12. How do "SELECT", "WHERE", "ORDER BY", and "LIMIT" work?
โ "SELECT" โ Used to choose columns from a table
SELECT name, salary FROM employees;
โ "WHERE" โ Filters rows based on conditions
SELECT FROM employees
WHERE salary > 50000;
โ "ORDER BY" โ Sorts data ascending or descending
SELECT FROM employees
ORDER BY salary DESC;
โ "LIMIT" โ Restricts the number of rows returned
SELECT FROM employees
LIMIT 5;
13. How do you join two tables ("INNER", "LEFT", "RIGHT", "FULL" joins)?
๐ "INNER JOIN" โ Returns matching records from both tables
๐ "LEFT JOIN" โ Returns all records from the left table + matching rows from the right table
๐ "RIGHT JOIN" โ Returns all records from the right table + matching rows from the left table
๐ "FULL JOIN" โ Returns all matching and non-matching records from both tables
Example:
SELECT customers.name, orders.order_id
FROM customers
INNER JOIN orders
ON customers.id = orders.customer_id;
14. How do "GROUP BY" and aggregate functions work?
Aggregate functions summarize data.
Common functions:
โ๏ธ "SUM()"
โ๏ธ "AVG()"
โ๏ธ "COUNT()"
โ๏ธ "MAX()"
โ๏ธ "MIN()"
Example:
SELECT department, AVG(salary)
FROM employees
GROUP BY department;
This groups employees by department and calculates average salary.
15. How do you write subqueries and CTEs?
๐ Subquery โ Query inside another query
SELECT name
FROM employees
WHERE salary > (
SELECT AVG(salary)
FROM employees
);
๐ CTE (Common Table Expression) โ Temporary result set that improves readability
WITH high_salary AS (
SELECT
FROM employees
WHERE salary > 50000
)
SELECT FROM high_salary;
16. How do you calculate running totals or rolling averages with window functions?
Window functions perform calculations across rows without collapsing data.
Example โ Running Total:
SELECT order_date,
sales,
SUM(sales) OVER (ORDER BY order_date) AS running_total
FROM orders;
Example โ Rolling Average:
SELECT order_date,
AVG(sales) OVER (
ORDER BY order_date
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS rolling_avg
FROM orders;
17. How do you clean and filter data directly in SQL?
Data cleaning in SQL includes:
โ๏ธ Removing duplicates
โ๏ธ Handling NULL values
โ๏ธ Standardizing text
โ๏ธ Filtering invalid rows
Example:
SELECT TRIM(LOWER(name))
FROM customers
WHERE email IS NOT NULL;
18. How do you handle duplicates and NULL values in SQL?
โ Remove duplicates using "DISTINCT"
SELECT DISTINCT city
FROM customers;
โ Find NULL values
SELECT
FROM employees
WHERE salary IS NULL;
โ Replace NULL values
SELECT COALESCE(salary, 0)
FROM employees;
19. How do you optimize a slow query?
Common optimization techniques:
๐ Use indexes
๐ Avoid unnecessary columns in "SELECT *"
๐ Filter data early using "WHERE"
๐ Optimize joins
๐ Use proper aggregations
๐ Analyze execution plans
Efficient queries improve performance and reduce database load.
20. How do you design a simple schema for a business domain?
A schema organizes data into related tables.
Example for an e-commerce business:
๐ "Customers" table
๐ "Orders" table
๐ "Products" table
๐ "Payments" table
Relationships are created using primary keys and foreign keys to maintain data integrity.
๐ Double Tap โค๏ธ For Part-3
โค22
๐ Data Analyst Interview Questions with Answers โ Part 3
๐งฎ Excel & Spreadsheets
21. How do you use Excel for quick data cleaning and analysis?
Excel is widely used for fast data cleaning and exploration.
Common tasks include:
- Removing duplicates
- Filtering and sorting data
- Using formulas
- Creating PivotTables
- Applying conditional formatting
- Cleaning text using functions like TRIM, UPPER, LOWER
It is useful for quick business analysis without writing code.
22. How do you use "SUMIF", "COUNTIF", "VLOOKUP", and "XLOOKUP" in Excel?
โ SUMIF โ Adds values based on a condition
=SUMIF(A:A,"Sales",B:B)
โ COUNTIF โ Counts cells matching a condition
=COUNTIF(C:C,">500")
โ VLOOKUP โ Searches vertically for a value
=VLOOKUP(101,A:D,2,FALSE)
โ XLOOKUP โ Modern replacement for VLOOKUP with more flexibility
=XLOOKUP(101,A:A,B:B)
23. How do you remove duplicates and standardize text in Excel?
๐ Remove duplicates using: Data โ Remove Duplicates
๐ Standardize text using functions:
=TRIM(A2)
=UPPER(A2)
=LOWER(A2)
=PROPER(A2)
These functions help clean inconsistent formatting.
24. How do you use PivotTables for summarizing data?
PivotTables quickly summarize large datasets without formulas.
They help with:
- Total sales by region
- Average revenue by product
- Monthly trends
- Category-wise counts
Steps:
1. Select dataset
2. Insert โ PivotTable
3. Drag fields into Rows, Columns, and Values
25. How do you build simple dashboards in Excel?
A basic Excel dashboard usually contains:
- Charts
- KPIs
- PivotTables
- Slicers
- Conditional formatting
Dashboards help stakeholders track important business metrics visually.
26. How do you use conditional formatting for insights?
Conditional formatting highlights patterns automatically.
Examples:
- Highlight top performers
- Show duplicate values
- Identify low sales
- Use color scales for trends
Example:
Home โ Conditional Formatting โ Highlight Cell Rules
27. How do you export data to CSV or share formatted reports?
โ Save files as .csv for database imports or system sharing
File โ Save As โ CSV
โ Share formatted reports using:
- Excel files
- PDFs
- Shared OneDrive/Google Drive links
Always ensure formatting and labels are clear before sharing.
28. How do you handle large datasets in Excel vs a database?
๐ Excel is good for: smaller datasets and quick analysis.
๐ Databases are better for:
- Millions of rows
- Faster querying
- Multi-user access
- Better performance and security
Analysts often use SQL databases for large-scale analysis.
29. How do you avoid common Excel pitfalls?
Common best practices:
- Avoid hard-coded numbers in formulas
- Avoid merged cells
- Donโt leave blank headers
- Avoid inconsistent formatting
Do instead:
- Use proper labels
- Keep raw data separate from analysis
- Document formulas clearly
30. How do you document your Excel analyses?
Good documentation includes:
- Sheet descriptions
- Formula explanations
- Data-source details
- Assumptions used
- KPI definitions
- Date/version tracking
Proper documentation improves collaboration and reduces confusion.
๐ Double Tap โค๏ธ For Part-4
๐งฎ Excel & Spreadsheets
21. How do you use Excel for quick data cleaning and analysis?
Excel is widely used for fast data cleaning and exploration.
Common tasks include:
- Removing duplicates
- Filtering and sorting data
- Using formulas
- Creating PivotTables
- Applying conditional formatting
- Cleaning text using functions like TRIM, UPPER, LOWER
It is useful for quick business analysis without writing code.
22. How do you use "SUMIF", "COUNTIF", "VLOOKUP", and "XLOOKUP" in Excel?
โ SUMIF โ Adds values based on a condition
=SUMIF(A:A,"Sales",B:B)
โ COUNTIF โ Counts cells matching a condition
=COUNTIF(C:C,">500")
โ VLOOKUP โ Searches vertically for a value
=VLOOKUP(101,A:D,2,FALSE)
โ XLOOKUP โ Modern replacement for VLOOKUP with more flexibility
=XLOOKUP(101,A:A,B:B)
23. How do you remove duplicates and standardize text in Excel?
๐ Remove duplicates using: Data โ Remove Duplicates
๐ Standardize text using functions:
=TRIM(A2)
=UPPER(A2)
=LOWER(A2)
=PROPER(A2)
These functions help clean inconsistent formatting.
24. How do you use PivotTables for summarizing data?
PivotTables quickly summarize large datasets without formulas.
They help with:
- Total sales by region
- Average revenue by product
- Monthly trends
- Category-wise counts
Steps:
1. Select dataset
2. Insert โ PivotTable
3. Drag fields into Rows, Columns, and Values
25. How do you build simple dashboards in Excel?
A basic Excel dashboard usually contains:
- Charts
- KPIs
- PivotTables
- Slicers
- Conditional formatting
Dashboards help stakeholders track important business metrics visually.
26. How do you use conditional formatting for insights?
Conditional formatting highlights patterns automatically.
Examples:
- Highlight top performers
- Show duplicate values
- Identify low sales
- Use color scales for trends
Example:
Home โ Conditional Formatting โ Highlight Cell Rules
27. How do you export data to CSV or share formatted reports?
โ Save files as .csv for database imports or system sharing
File โ Save As โ CSV
โ Share formatted reports using:
- Excel files
- PDFs
- Shared OneDrive/Google Drive links
Always ensure formatting and labels are clear before sharing.
28. How do you handle large datasets in Excel vs a database?
๐ Excel is good for: smaller datasets and quick analysis.
๐ Databases are better for:
- Millions of rows
- Faster querying
- Multi-user access
- Better performance and security
Analysts often use SQL databases for large-scale analysis.
29. How do you avoid common Excel pitfalls?
Common best practices:
- Avoid hard-coded numbers in formulas
- Avoid merged cells
- Donโt leave blank headers
- Avoid inconsistent formatting
Do instead:
- Use proper labels
- Keep raw data separate from analysis
- Document formulas clearly
30. How do you document your Excel analyses?
Good documentation includes:
- Sheet descriptions
- Formula explanations
- Data-source details
- Assumptions used
- KPI definitions
- Date/version tracking
Proper documentation improves collaboration and reduces confusion.
๐ Double Tap โค๏ธ For Part-4
โค14๐2
๐ Data Analyst Interview Questions with Answers โ Part 4
๐ Data Visualization & BI Tools
31. What is the purpose of data visualization?
Data visualization helps transform raw data into charts and visuals that are easier to understand.
It helps businesses:
โ๏ธ Identify trends
โ๏ธ Detect patterns
โ๏ธ Compare performance
โ๏ธ Make faster decisions
โ๏ธ Communicate insights clearly
Good visualizations simplify complex data.
32. When do you use bar charts, line charts, pie charts, and histograms?
๐ Bar Chart โ Compare categories
Example: Sales by region
๐ Line Chart โ Show trends over time
Example: Monthly revenue growth
๐ฅง Pie Chart โ Show proportions or percentages
Example: Market share distribution
๐ Histogram โ Show data distribution
Example: Customer age distribution
Choosing the correct chart improves readability and insight quality.
33. What are best practices for labeling, colors, and readability?
โ Use clear titles and labels
โ Keep charts simple and uncluttered
โ Use consistent colors
โ Highlight important insights
โ Avoid excessive colors or 3D effects
โ Ensure fonts are readable
โ Add legends only when necessary
The goal is to make insights easy to understand quickly.
34. How do you design a dashboard for a non-technical stakeholder?
A stakeholder-friendly dashboard should:
โ๏ธ Focus on business KPIs
โ๏ธ Use simple language
โ๏ธ Avoid technical jargon
โ๏ธ Include filters and slicers
โ๏ธ Show summary insights first
โ๏ธ Use intuitive charts and layouts
Dashboards should answer business questions immediately.
35. What is the difference between a report and a self-service dashboard?
๐ Report
โข Static and detailed
โข Usually scheduled weekly/monthly
โข Used for deep analysis
๐ Self-Service Dashboard
โข Interactive
โข Users can filter and explore data themselves
โข Real-time or frequently updated
Self-service dashboards improve decision-making speed.
36. How do you use Power BI, Tableau, Looker, or Google Data Studio for dashboards?
These BI tools help analysts:
โ๏ธ Connect multiple data sources
โ๏ธ Build interactive dashboards
โ๏ธ Create KPIs and measures
โ๏ธ Apply filters and drill-downs
โ๏ธ Share reports with teams
Popular tools include:
๐ Microsoft Power BI
๐ Tableau
๐ Looker
๐ Google Data Studio
37. How do you filter and slice data in a BI tool?
Filters and slicers allow users to interact with dashboards dynamically.
Examples:
โ๏ธ Filter by date range
โ๏ธ Select region or product category
โ๏ธ Drill down into specific KPIs
This helps users analyze data without modifying the original report.
38. How do you handle measures and dimensions in BI tools?
๐ Dimensions โ Qualitative fields used for categorization
Examples: Product, Region, Customer Name
๐ Measures โ Numerical fields used for calculations
Examples: Revenue, Profit, Quantity Sold
Dimensions segment the data, while measures calculate insights.
39. How do you share dashboards and control access?
Dashboards are usually shared through:
โ๏ธ Cloud workspaces
โ๏ธ Scheduled email reports
โ๏ธ Embedded links
โ๏ธ Organization portals
Access control is managed using:
๐ User permissions
๐ Row-level security
๐ Workspace roles
This ensures sensitive data is protected.
40. How do you tell a โdata storyโ using charts and annotations?
Data storytelling combines visuals with business context.
A good data story should:
๐ Start with the business problem
๐ Present key findings clearly
๐ Use charts to support insights
๐ Add annotations for important trends
๐ End with recommendations or actions
The goal is not just showing numbers, but explaining what they mean for the business.
๐ Double Tap โค๏ธ For Part-5
๐ Data Visualization & BI Tools
31. What is the purpose of data visualization?
Data visualization helps transform raw data into charts and visuals that are easier to understand.
It helps businesses:
โ๏ธ Identify trends
โ๏ธ Detect patterns
โ๏ธ Compare performance
โ๏ธ Make faster decisions
โ๏ธ Communicate insights clearly
Good visualizations simplify complex data.
32. When do you use bar charts, line charts, pie charts, and histograms?
๐ Bar Chart โ Compare categories
Example: Sales by region
๐ Line Chart โ Show trends over time
Example: Monthly revenue growth
๐ฅง Pie Chart โ Show proportions or percentages
Example: Market share distribution
๐ Histogram โ Show data distribution
Example: Customer age distribution
Choosing the correct chart improves readability and insight quality.
33. What are best practices for labeling, colors, and readability?
โ Use clear titles and labels
โ Keep charts simple and uncluttered
โ Use consistent colors
โ Highlight important insights
โ Avoid excessive colors or 3D effects
โ Ensure fonts are readable
โ Add legends only when necessary
The goal is to make insights easy to understand quickly.
34. How do you design a dashboard for a non-technical stakeholder?
A stakeholder-friendly dashboard should:
โ๏ธ Focus on business KPIs
โ๏ธ Use simple language
โ๏ธ Avoid technical jargon
โ๏ธ Include filters and slicers
โ๏ธ Show summary insights first
โ๏ธ Use intuitive charts and layouts
Dashboards should answer business questions immediately.
35. What is the difference between a report and a self-service dashboard?
๐ Report
โข Static and detailed
โข Usually scheduled weekly/monthly
โข Used for deep analysis
๐ Self-Service Dashboard
โข Interactive
โข Users can filter and explore data themselves
โข Real-time or frequently updated
Self-service dashboards improve decision-making speed.
36. How do you use Power BI, Tableau, Looker, or Google Data Studio for dashboards?
These BI tools help analysts:
โ๏ธ Connect multiple data sources
โ๏ธ Build interactive dashboards
โ๏ธ Create KPIs and measures
โ๏ธ Apply filters and drill-downs
โ๏ธ Share reports with teams
Popular tools include:
๐ Microsoft Power BI
๐ Tableau
๐ Looker
๐ Google Data Studio
37. How do you filter and slice data in a BI tool?
Filters and slicers allow users to interact with dashboards dynamically.
Examples:
โ๏ธ Filter by date range
โ๏ธ Select region or product category
โ๏ธ Drill down into specific KPIs
This helps users analyze data without modifying the original report.
38. How do you handle measures and dimensions in BI tools?
๐ Dimensions โ Qualitative fields used for categorization
Examples: Product, Region, Customer Name
๐ Measures โ Numerical fields used for calculations
Examples: Revenue, Profit, Quantity Sold
Dimensions segment the data, while measures calculate insights.
39. How do you share dashboards and control access?
Dashboards are usually shared through:
โ๏ธ Cloud workspaces
โ๏ธ Scheduled email reports
โ๏ธ Embedded links
โ๏ธ Organization portals
Access control is managed using:
๐ User permissions
๐ Row-level security
๐ Workspace roles
This ensures sensitive data is protected.
40. How do you tell a โdata storyโ using charts and annotations?
Data storytelling combines visuals with business context.
A good data story should:
๐ Start with the business problem
๐ Present key findings clearly
๐ Use charts to support insights
๐ Add annotations for important trends
๐ End with recommendations or actions
The goal is not just showing numbers, but explaining what they mean for the business.
๐ Double Tap โค๏ธ For Part-5
โค13
๐ Data Analyst Interview Questions with Answers โ Part 5
๐ Descriptive Statistics & EDA
41. What are mean, median, and mode?
๐ Mean โ Average value of data
Mean = Sum of all values / Number of values
๐ Median โ Middle value when data is sorted
๐ Mode โ Most frequently occurring value
These measures help summarize data quickly.
42. What is standard deviation and variance?
๐ Variance measures how far data points spread from the mean.
๐ Standard Deviation is the square root of variance and shows data variability in the same unit as the data.
Low standard deviation โ data points are close to the mean.
High standard deviation โ data points are more spread out.
43. What are quartiles and IQR?
๐ Quartiles divide data into four equal parts.
โข Q1 โ 25th percentile
โข Q2 โ Median (50th percentile)
โข Q3 โ 75th percentile
๐ IQR (Interquartile Range) measures the spread of the middle 50% of data.
IQR = Q3 - Q1
IQR is commonly used to detect outliers.
44. How do you detect outliers and what should you do with them?
Outliers are unusual data points that differ significantly from other observations.
Common detection methods:
โ๏ธ Boxplots
โ๏ธ Z-score
โ๏ธ IQR method
Possible actions:
๐ Remove incorrect data
๐ Investigate business reasons
๐ Transform data if needed
๐ Keep them if they are valid business cases
45. What is a distribution and how do you inspect it?
A distribution shows how data values are spread.
Common ways to inspect distributions:
๐ Histograms
๐ Boxplots
๐ Density plots
These help analysts understand patterns, skewness, and variability.
46. What is skewness and kurtosis?
๐ Skewness measures asymmetry in data distribution.
โข Positive skew โ Tail on the right
โข Negative skew โ Tail on the left
๐ Kurtosis measures how heavy or light the tails of a distribution are compared to normal distribution.
These metrics help understand data behavior.
47. How do you calculate growth rate, percentage change, and CAGR?
๐ Percentage Change Formula:
Percentage Change = (New Value - Old Value) / Old Value * 100
๐ CAGR (Compound Annual Growth Rate):
CAGR = (Ending Value / Beginning Value)^(1/n) - 1
Where n = number of years
These metrics are widely used in finance and business performance tracking.
48. How do you compute cohort-style metrics?
Cohort analysis groups users based on a shared characteristic such as signup month.
Example:
๐ Retention rate by signup month
๐ Revenue by customer acquisition month
It helps businesses analyze user behavior over time.
49. How do you summarize categorical vs numerical data?
๐ Categorical Data โ Summarized using counts, percentages, and frequency tables.
Examples:
โ๏ธ Gender
โ๏ธ Country
โ๏ธ Product Category
๐ Numerical Data โ Summarized using statistical measures.
Examples:
โ๏ธ Mean
โ๏ธ Median
โ๏ธ Standard deviation
โ๏ธ Minimum and maximum values
50. How do you structure an EDA notebook or report?
A good EDA structure usually includes:
1๏ธโฃ Business problem statement
2๏ธโฃ Data overview
3๏ธโฃ Data cleaning steps
4๏ธโฃ Missing-value analysis
5๏ธโฃ Outlier detection
6๏ธโฃ Univariate and bivariate analysis
7๏ธโฃ Visualizations
8๏ธโฃ Key insights and recommendations
Well-structured EDA improves clarity and collaboration.
๐ Double Tap โค๏ธ For Part-6
๐ Descriptive Statistics & EDA
41. What are mean, median, and mode?
๐ Mean โ Average value of data
Mean = Sum of all values / Number of values
๐ Median โ Middle value when data is sorted
๐ Mode โ Most frequently occurring value
These measures help summarize data quickly.
42. What is standard deviation and variance?
๐ Variance measures how far data points spread from the mean.
๐ Standard Deviation is the square root of variance and shows data variability in the same unit as the data.
Low standard deviation โ data points are close to the mean.
High standard deviation โ data points are more spread out.
43. What are quartiles and IQR?
๐ Quartiles divide data into four equal parts.
โข Q1 โ 25th percentile
โข Q2 โ Median (50th percentile)
โข Q3 โ 75th percentile
๐ IQR (Interquartile Range) measures the spread of the middle 50% of data.
IQR = Q3 - Q1
IQR is commonly used to detect outliers.
44. How do you detect outliers and what should you do with them?
Outliers are unusual data points that differ significantly from other observations.
Common detection methods:
โ๏ธ Boxplots
โ๏ธ Z-score
โ๏ธ IQR method
Possible actions:
๐ Remove incorrect data
๐ Investigate business reasons
๐ Transform data if needed
๐ Keep them if they are valid business cases
45. What is a distribution and how do you inspect it?
A distribution shows how data values are spread.
Common ways to inspect distributions:
๐ Histograms
๐ Boxplots
๐ Density plots
These help analysts understand patterns, skewness, and variability.
46. What is skewness and kurtosis?
๐ Skewness measures asymmetry in data distribution.
โข Positive skew โ Tail on the right
โข Negative skew โ Tail on the left
๐ Kurtosis measures how heavy or light the tails of a distribution are compared to normal distribution.
These metrics help understand data behavior.
47. How do you calculate growth rate, percentage change, and CAGR?
๐ Percentage Change Formula:
Percentage Change = (New Value - Old Value) / Old Value * 100
๐ CAGR (Compound Annual Growth Rate):
CAGR = (Ending Value / Beginning Value)^(1/n) - 1
Where n = number of years
These metrics are widely used in finance and business performance tracking.
48. How do you compute cohort-style metrics?
Cohort analysis groups users based on a shared characteristic such as signup month.
Example:
๐ Retention rate by signup month
๐ Revenue by customer acquisition month
It helps businesses analyze user behavior over time.
49. How do you summarize categorical vs numerical data?
๐ Categorical Data โ Summarized using counts, percentages, and frequency tables.
Examples:
โ๏ธ Gender
โ๏ธ Country
โ๏ธ Product Category
๐ Numerical Data โ Summarized using statistical measures.
Examples:
โ๏ธ Mean
โ๏ธ Median
โ๏ธ Standard deviation
โ๏ธ Minimum and maximum values
50. How do you structure an EDA notebook or report?
A good EDA structure usually includes:
1๏ธโฃ Business problem statement
2๏ธโฃ Data overview
3๏ธโฃ Data cleaning steps
4๏ธโฃ Missing-value analysis
5๏ธโฃ Outlier detection
6๏ธโฃ Univariate and bivariate analysis
7๏ธโฃ Visualizations
8๏ธโฃ Key insights and recommendations
Well-structured EDA improves clarity and collaboration.
๐ Double Tap โค๏ธ For Part-6
โค19๐2๐1