Machine Learning
39.4K subscribers
4.35K photos
40 videos
50 files
1.42K links
Real Machine Learning β€” simple, practical, and built on experience.
Learn step by step with clear explanations and working code.

Admin: @HusseinSheikho || @Hussein_Sheikho
Download Telegram
πŸ“Œ RF-DETR Under the Hood: The Insights of a Real-Time Transformer Detection

πŸ—‚ Category: DEEP LEARNING

πŸ•’ Date: 2025-10-31 | ⏱️ Read time: 6 min read

From rigid grids to adaptive attention, this is the evolutionary path that made detection transformers…
πŸ“Œ TDS Newsletter: October Must-Reads on Agents, Python, Context Engineering, and More

πŸ—‚ Category: THE VARIABLE

πŸ•’ Date: 2025-10-30 | ⏱️ Read time: 3 min read

A good month on TDS is one in which we get to share a wide…
πŸ“Œ Long Short Term Memory (LSTM)- Improving RNNs

πŸ—‚ Category: DEEP LEARNING

πŸ•’ Date: 2024-05-31 | ⏱️ Read time: 9 min read

How state of the art RNNs work
πŸ“Œ Orchestrating a Dynamic Time-series Pipeline in Azure

πŸ—‚ Category: DATA ENGINEERING

πŸ•’ Date: 2024-05-31 | ⏱️ Read time: 9 min read

Explore how to build, trigger, and parameterize a time-series data pipeline with ADF and Databricks,…
πŸ“Œ Data Science Portfolios, Speeding Up Python, KANs, and Other May Must-Reads

πŸ—‚ Category: DATA SCIENCE

πŸ•’ Date: 2024-05-30 | ⏱️ Read time: 4 min read

The stories that resonated the most with our community in the past month
Top 30 Pandas Functions & Methods
πŸ‘‡πŸ‘‡πŸ‘‡πŸ‘‡
Please open Telegram to view this post
VIEW IN TELEGRAM
#Pandas #DataAnalysis #Python #DataScience #Tutorial

Top 30 Pandas Functions & Methods

This lesson covers 30 essential Pandas functions for data manipulation and analysis, each with a standalone example and its output.

---

1. pd.DataFrame()
Creates a new DataFrame (a 2D labeled data structure) from various inputs like dictionaries or lists.

import pandas as pd
data = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data)
print(df)

col1  col2
0 1 3
1 2 4


---

2. pd.Series()
Creates a new Series (a 1D labeled array).

import pandas as pd
s = pd.Series([10, 20, 30, 40], name='MyNumbers')
print(s)

0    10
1 20
2 30
3 40
Name: MyNumbers, dtype: int64


---

3. pd.read_csv()
Reads data from a CSV file into a DataFrame. (Assuming a file data.csv exists).

# Create a dummy csv file first
with open('data.csv', 'w') as f:
f.write('Name,Age\nAlice,25\nBob,30')

df = pd.read_csv('data.csv')
print(df)

Name  Age
0 Alice 25
1 Bob 30


---

4. df.to_csv()
Writes a DataFrame to a CSV file.

import pandas as pd
df = pd.DataFrame({'Name': ['Charlie'], 'Age': [35]})
# index=False prevents writing the DataFrame index to the file
df.to_csv('output.csv', index=False)
# You can check that 'output.csv' has been created.
print("File 'output.csv' created.")

File 'output.csv' created.

#PandasIO #DataFrame #Series

---

5. df.head()
Returns the first n rows of the DataFrame (default is 5).

import pandas as pd
data = {'Name': ['A', 'B', 'C', 'D', 'E', 'F'], 'Value': [1, 2, 3, 4, 5, 6]}
df = pd.DataFrame(data)
print(df.head(3))

Name  Value
0 A 1
1 B 2
2 C 3


---

6. df.tail()
Returns the last n rows of the DataFrame (default is 5).

import pandas as pd
data = {'Name': ['A', 'B', 'C', 'D', 'E', 'F'], 'Value': [1, 2, 3, 4, 5, 6]}
df = pd.DataFrame(data)
print(df.tail(2))

Name  Value
4 E 5
5 F 6


---

7. df.info()
Provides a concise summary of the DataFrame, including data types and non-null values.

import pandas as pd
import numpy as np
data = {'col1': [1, 2, 3], 'col2': [4.0, 5.0, np.nan], 'col3': ['A', 'B', 'C']}
df = pd.DataFrame(data)
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 col1 3 non-null int64
1 col2 2 non-null float64
2 col3 3 non-null object
dtypes: float64(1), int64(1), object(1)
memory usage: 200.0+ bytes


---

8. df.shape
Returns a tuple representing the dimensionality (rows, columns) of the DataFrame.

import pandas as pd
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'C': [5, 6]})
print(df.shape)

(2, 3)

#DataInspection #PandasBasics

---

9. df.describe()
Generates descriptive statistics for numerical columns (count, mean, std, min, max, etc.).

import pandas as pd
df = pd.DataFrame({'Age': [22, 38, 26, 35, 29]})
print(df.describe())
❀4
Age
count 5.000000
mean 30.000000
std 6.363961
min 22.000000
25% 26.000000
50% 29.000000
75% 35.000000
max 38.000000


---

10. df.columns
Returns the column labels of the DataFrame.

import pandas as pd
df = pd.DataFrame({'Name': [], 'Age': [], 'City': []})
print(df.columns)

Index(['Name', 'Age', 'City'], dtype='object')


---

11. df.dtypes
Returns the data type of each column.

import pandas as pd
df = pd.DataFrame({'Name': ['Alice'], 'Age': [25], 'Salary': [75000.50]})
print(df.dtypes)

Name       object
Age int64
Salary float64
dtype: object


---

12. Selecting a Column
Select a single column, which returns a Pandas Series.

import pandas as pd
data = {'Name': ['Alice', 'Bob'], 'Age': [25, 30]}
df = pd.DataFrame(data)
ages = df['Age']
print(ages)

0    25
1 30
Name: Age, dtype: int64

#DataSelection #Indexing #Statistics

---

13. df.loc[]
Access a group of rows and columns by label(s) or a boolean array.

import pandas as pd
data = {'Age': [25, 30, 35], 'City': ['NY', 'LA', 'CH']}
df = pd.DataFrame(data, index=['Alice', 'Bob', 'Charlie'])
print(df.loc['Bob'])

Age     30
City LA
Name: Bob, dtype: object


---

14. df.iloc[]
Access a group of rows and columns by integer position(s).

import pandas as pd
data = {'Age': [25, 30, 35], 'City': ['NY', 'LA', 'CH']}
df = pd.DataFrame(data, index=['Alice', 'Bob', 'Charlie'])
print(df.iloc[1]) # Get the second row (index 1)

Age     30
City LA
Name: Bob, dtype: object


---

15. df.isnull()
Returns a DataFrame of the same shape with boolean values indicating if a value is missing (NaN).

import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [1, np.nan], 'B': [3, 4]})
print(df.isnull())

A      B
0 False False
1 True False


---

16. df.dropna()
Removes missing values.

import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [1, np.nan, 3], 'B': [4, 5, 6]})
cleaned_df = df.dropna()
print(cleaned_df)

A  B
0 1.0 4
2 3.0 6

#DataCleaning #MissingData

---

17. df.fillna()
Fills missing (NaN) values with a specified value or method.

import pandas as pd
import numpy as np
df = pd.DataFrame({'Score': [90, 85, np.nan, 92]})
filled_df = df.fillna(0)
print(filled_df)

Score
0 90.0
1 85.0
2 0.0
3 92.0


---

18. df.drop_duplicates()
Removes duplicate rows from the DataFrame.

import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Alice'], 'Age': [25, 30, 25]}
df = pd.DataFrame(data)
unique_df = df.drop_duplicates()
print(unique_df)

Name  Age
0 Alice 25
1 Bob 30


---

19. df.rename()
Alters axes labels (e.g., column names).

import pandas as pd
df = pd.DataFrame({'A': [1], 'B': [2]})
renamed_df = df.rename(columns={'A': 'Column_A', 'B': 'Column_B'})
print(renamed_df)

Column_A  Column_B
0 1 2


---

20. series.value_counts()
Returns a Series containing counts of unique values.
import pandas as pd
s = pd.Series(['A', 'B', 'A', 'C', 'A', 'B'])
print(s.value_counts())

A    3
B 2
C 1
dtype: int64

#DataManipulation #Transformation

---

21. series.unique()
Returns an array of unique values in a Series.

import pandas as pd
s = pd.Series(['A', 'B', 'A', 'C', 'A', 'B'])
print(s.unique())

['A' 'B' 'C']


---

22. df.sort_values()
Sorts a DataFrame by the values of one or more columns.

import pandas as pd
data = {'Name': ['Charlie', 'Alice', 'Bob'], 'Age': [35, 25, 30]}
df = pd.DataFrame(data)
sorted_df = df.sort_values(by='Age')
print(sorted_df)

Name  Age
1 Alice 25
2 Bob 30
0 Charlie 35


---

23. df.groupby()
Groups a DataFrame using a mapper or by a Series of columns for aggregation.

import pandas as pd
data = {'Dept': ['HR', 'IT', 'HR', 'IT'], 'Salary': [70, 85, 75, 90]}
df = pd.DataFrame(data)
grouped = df.groupby('Dept').mean()
print(grouped)

Salary
Dept
HR 72.5
IT 87.5


---

24. df.agg()
Applies one or more aggregations over the specified axis.

import pandas as pd
data = {'Dept': ['HR', 'IT', 'HR', 'IT'], 'Salary': [70, 85, 75, 90]}
df = pd.DataFrame(data)
agg_results = df.groupby('Dept')['Salary'].agg(['mean', 'sum'])
print(agg_results)

mean  sum
Dept
HR 72.5 145
IT 87.5 175

#Aggregation #Grouping #Sorting

---

25. df.apply()
Applies a function along an axis of the DataFrame.

import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})
# Apply a function to double each value in column 'A'
df['A_doubled'] = df['A'].apply(lambda x: x * 2)
print(df)

A   B  A_doubled
0 1 10 2
1 2 20 4
2 3 30 6


---

26. pd.merge()
Merges two DataFrames based on a common column or index, similar to a SQL join.

import pandas as pd
df1 = pd.DataFrame({'ID': [1, 2], 'Name': ['Alice', 'Bob']})
df2 = pd.DataFrame({'ID': [1, 2], 'Role': ['Engineer', 'Analyst']})
merged_df = pd.merge(df1, df2, on='ID')
print(merged_df)

ID   Name      Role
0 1 Alice Engineer
1 2 Bob Analyst


---

27. pd.concat()
Concatenates (stacks) pandas objects along a particular axis.

import pandas as pd
df1 = pd.DataFrame({'A': ['A0'], 'B': ['B0']})
df2 = pd.DataFrame({'A': ['A1'], 'B': ['B1']})
concatenated_df = pd.concat([df1, df2])
print(concatenated_df)

A   B
0 A0 B0
0 A1 B1


---

28. df.pivot_table()
Creates a spreadsheet-style pivot table as a DataFrame.
❀2
import pandas as pd
data = {'Date': ['2023-01-01', '2023-01-01', '2023-01-02'],
'City': ['NY', 'LA', 'NY'],
'Sales': [100, 150, 120]}
df = pd.DataFrame(data)
pivot = df.pivot_table(values='Sales', index='Date', columns='City')
print(pivot)

City             LA     NY
Date
2023-01-01 150.0 100.0
2023-01-02 NaN 120.0

#CombiningData #PivotTable

---

29. df.set_index()
Sets one or more existing columns as the DataFrame index.

import pandas as pd
data = {'ID': ['a1', 'a2'], 'Name': ['Alice', 'Bob']}
df = pd.DataFrame(data)
df_indexed = df.set_index('ID')
print(df_indexed)

Name
ID
a1 Alice
a2 Bob


---

30. df.reset_index()
Resets the index of the DataFrame, making the old index a new column.

import pandas as pd
data = {'Name': ['Alice', 'Bob']}
df = pd.DataFrame(data, index=['a1', 'a2'])
df_reset = df.reset_index()
print(df_reset)

index   Name
0 a1 Alice
1 a2 Bob

#Indexing #PandasTips

━━━━━━━━━━━━━━━
By: @DataScienceM ✨
❀8
πŸ“Œ The History of Bodybuilding Through Network Visualization

πŸ—‚ Category: DATA SCIENCE

πŸ•’ Date: 2024-05-30 | ⏱️ Read time: 5 min read

Constructing the Shared Podium Graph of Mr. Olympia Winners (1965-2023) using Python and Gephi.
Top 30 MATLAB Image Processing Functions

#MATLAB #ImageProcessing #Basics

πŸ‘‡πŸ‘‡πŸ‘‡πŸ‘‡πŸ‘‡
Please open Telegram to view this post
VIEW IN TELEGRAM
Top 30 MATLAB Image Processing Functions

#MATLAB #ImageProcessing #Basics

#1. imread()
Reads an image from a file into a matrix.

img = imread('peppers.png');
disp('Image "peppers.png" loaded into variable "img".');

Image "peppers.png" loaded into variable "img".


#2. imshow()
Displays an image in a figure window.

img = imread('peppers.png');
imshow(img);
title('Peppers Image');

Output: A new figure window opens, displaying the 'peppers.png' image with the title "Peppers Image".


#3. imwrite()
Writes an image matrix to a file.

img = imread('cameraman.tif');
imwrite(img, 'my_cameraman.jpg');
disp('Image saved as my_cameraman.jpg');

Image saved as my_cameraman.jpg


#4. size()
Returns the dimensions of the image matrix (rows, columns, color channels).

rgb_img = imread('peppers.png');
gray_img = imread('cameraman.tif');
size_rgb = size(rgb_img);
size_gray = size(gray_img);
disp(['Size of RGB image: ', num2str(size_rgb)]);
disp(['Size of grayscale image: ', num2str(size_gray)]);

Size of RGB image: 384   512     3
Size of grayscale image: 256 256


#5. rgb2gray()
Converts an RGB color image to a grayscale intensity image.

rgb_img = imread('peppers.png');
gray_img = rgb2gray(rgb_img);
imshow(gray_img);
title('Grayscale Peppers');

Output: A figure window displays the grayscale version of the peppers image.

---
#MATLAB #ImageProcessing #Conversion #Transformation

#6. im2double()
Converts an image to double-precision format, scaling data to the range [0, 1].

img_uint8 = imread('cameraman.tif');
img_double = im2double(img_uint8);
disp(['Max value of original image: ', num2str(max(img_uint8(:)))]);
disp(['Max value of double image: ', num2str(max(img_double(:)))]);

Max value of original image: 253
Max value of double image: 0.99216


#7. imresize()
Resizes an image to a specified size.

img = imread('cameraman.tif');
resized_img = imresize(img, 0.5); % Resize to 50% of original size
imshow(resized_img);
title('Resized Cameraman');

Output: A figure window displays the cameraman image at half its original size.


#8. imrotate()
Rotates an image by a specified angle.

img = imread('cameraman.tif');
rotated_img = imrotate(img, 30, 'bilinear', 'crop');
imshow(rotated_img);
title('Rotated 30 Degrees');

Output: A figure window displays the cameraman image rotated by 30 degrees, cropped to the original size.


#9. imcrop()
Crops an image to a specified rectangle.

img = imread('peppers.png');
% [xmin ymin width height]
cropped_img = imcrop(img, [100 80 250 200]);
imshow(cropped_img);
title('Cropped Image');

Output: A figure window displays only the rectangular section specified from the peppers image.


#10. rgb2hsv()
Converts an RGB image to the Hue-Saturation-Value (HSV) color space.

rgb_img = imread('peppers.png');
hsv_img = rgb2hsv(rgb_img);
hue_channel = hsv_img(:,:,1); % Extract the Hue channel
imshow(hue_channel);
title('Hue Channel of Peppers Image');

Output: A figure window displays the Hue channel of the peppers image as a grayscale image.

---
#MATLAB #ImageProcessing #Enhancement

#11. imhist()
Displays the histogram of an image, showing the distribution of pixel intensity values.
❀1
gray_img = imread('pout.tif');
imhist(gray_img);
title('Histogram of a Low-Contrast Image');

Output: A figure window with a bar chart showing the intensity distribution of the 'pout.tif' image.


#12. histeq()
Enhances contrast using histogram equalization.

low_contrast_img = imread('pout.tif');
high_contrast_img = histeq(low_contrast_img);
imshow(high_contrast_img);
title('Histogram Equalized Image');

Output: A figure window displays a higher contrast version of the 'pout.tif' image.


#13. imadjust()
Adjusts image intensity values or colormap by mapping intensity values to new values.

img = imread('cameraman.tif');
adjusted_img = imadjust(img, [0.3 0.7], []);
imshow(adjusted_img);
title('Intensity Adjusted Image');

Output: A figure window showing a high-contrast version of the cameraman image, where intensities between 0.3 and 0.7 are stretched to the full [0, 1] range.


#14. imtranslate()
Translates (shifts) an image horizontally and vertically.

img = imread('cameraman.tif');
translated_img = imtranslate(img, [25, 15]); % Shift 25 pixels right, 15 pixels down
imshow(translated_img);
title('Translated Image');

Output: A figure window shows the cameraman image shifted to the right and down.


#15. imsharpen()
Sharpens an image using the unsharp masking method.

img = imread('peppers.png');
sharpened_img = imsharpen(img);
imshow(sharpened_img);
title('Sharpened Image');

Output: A figure window displays a crisper, more detailed version of the peppers image.

---
#MATLAB #ImageProcessing #Filtering #Noise

#16. imnoise()
Adds a specified type of noise to an image.

img = imread('cameraman.tif');
noisy_img = imnoise(img, 'salt & pepper', 0.02);
imshow(noisy_img);
title('Image with Salt & Pepper Noise');

Output: A figure window displays the cameraman image with random white and black pixels (noise).


#17. fspecial()
Creates a predefined 2-D filter kernel (e.g., for averaging, Gaussian blur, Laplacian).

h = fspecial('motion', 20, 45); % Create a motion blur filter
disp('Generated a 2D motion filter kernel.');
disp(h);

Generated a 2D motion filter kernel.
(Output is a matrix representing the filter kernel)


#18. imfilter()
Filters a multidimensional image with a specified filter kernel.

img = imread('cameraman.tif');
h = fspecial('motion', 20, 45);
motion_blur_img = imfilter(img, h, 'replicate');
imshow(motion_blur_img);
title('Motion Blurred Image');

Output: A figure window shows the cameraman image with a motion blur effect applied at a 45-degree angle.


#19. medfilt2()
Performs 2-D median filtering, which is excellent for removing 'salt & pepper' noise.

noisy_img = imnoise(imread('cameraman.tif'), 'salt & pepper', 0.02);
denoised_img = medfilt2(noisy_img);
imshow(denoised_img);
title('Denoised with Median Filter');

Output: A figure window shows the noisy image significantly cleaned up, with most salt & pepper noise removed.


#20. edge()
Finds edges in an intensity image using various algorithms (e.g., Sobel, Canny).
img = imread('cameraman.tif');
edges = edge(img, 'Canny');
imshow(edges);
title('Edges found with Canny Detector');

Output: A figure window displays a binary image showing only the detected edges from the original image in white.

---
#MATLAB #ImageProcessing #Segmentation #Morphology

#21. graythresh()
Computes a global image threshold from a grayscale image using Otsu's method.

img = imread('coins.png');
level = graythresh(img);
disp(['Optimal threshold level (Otsu): ', num2str(level)]);

Optimal threshold level (Otsu): 0.49412


#22. imbinarize()
Converts a grayscale image to a binary image based on a threshold.

img = imread('coins.png');
level = graythresh(img); % Find optimal threshold
bw_img = imbinarize(img, level);
imshow(bw_img);
title('Binarized Image (Otsu Method)');

Output: A figure window displays a black and white image of the coins.


#23. strel()
Creates a morphological structuring element (SE), which is used to probe an image in morphological operations.

se = strel('disk', 5);
disp('Created a disk-shaped structuring element with radius 5.');
disp(se);

Created a disk-shaped structuring element with radius 5.
(Output describes the strel object and shows its matrix representation)


#24. imdilate()
Dilates a binary image, making objects larger and filling small holes.

img = imread('text.png');
se = strel('line', 3, 90); % A vertical line SE
dilated_img = imdilate(img, se);
imshow(dilated_img);
title('Dilated Text');

Output: A figure window shows the text characters appearing thicker, especially in the vertical direction.


#25. imerode()
Erodes a binary image, shrinking objects and removing small noise.

img = imread('text.png');
se = strel('line', 3, 0); % A horizontal line SE
eroded_img = imerode(img, se);
imshow(eroded_img);
title('Eroded Text');

Output: A figure window shows the text characters appearing thinner, with horizontal parts possibly disappearing.

---
#MATLAB #ImageProcessing #Analysis

#26. imopen()
Performs morphological opening (erosion followed by dilation). It smooths contours and removes small objects.

original = imread('circbw.tif');
se = strel('disk', 10);
opened_img = imopen(original, se);
imshow(opened_img);
title('Morphologically Opened Image');

Output: A figure window displays the image with small protrusions removed and gaps between objects widened.


#27. bwareaopen()
Removes all connected components (objects) from a binary image that have fewer than a specified number of pixels.

img = imread('text.png');
cleaned_img = bwareaopen(img, 50); % Remove objects with fewer than 50 pixels
imshow(cleaned_img);
title('Image after removing small objects');

Output: A figure window shows the text image with small noise specks or broken parts of characters removed.


#28. bwlabel()
Labels connected components in a binary image.

img = imread('text.png');
[L, num] = bwlabel(img);
disp(['Number of connected objects found: ', num2str(num)]);

Number of connected objects found: 114


#29. regionprops()
Measures a set of properties for each labeled region in an image.
img = imread('coins.png');
bw = imbinarize(img);
stats = regionprops('table', bw, 'Centroid', 'MajorAxisLength', 'MinorAxisLength');
disp('Properties of the first 3 coins:');
disp(stats(1:3,:));

Properties of the first 3 coins:
Centroid MajorAxisLength MinorAxisLength
________ _______________ _______________

52.715 26.839 48.24 47.369
134.5 27.067 48.745 47.534
215.01 29.805 47.854 47.502


#30. hough()
Performs the Hough transform, used to detect lines in an image.

img = imrotate(imread('circuit.tif'), 33, 'crop');
edges = edge(img, 'canny');
[H, T, R] = hough(edges);
imshow(imadjust(rescale(H)), 'XData', T, 'YData', R, ...
'InitialMagnification', 'fit');
xlabel('\theta'), ylabel('\rho');
title('Hough Transform of Circuit');

Output: A figure window displaying the Hough transform as an image, where bright spots correspond to potential lines in the original image.


━━━━━━━━━━━━━━━
By: @DataScienceM ✨
❀2
πŸ“Œ Constructive Heuristics in Discrete Optimization

πŸ—‚ Category:

πŸ•’ Date: 2024-05-30 | ⏱️ Read time: 13 min read

Obtain initial solutions for combinatorial optimization problems with Python examples
Top 100 Data Analyst Interview Questions & Answers

πŸ‘‡πŸ‘‡πŸ‘‡πŸ‘‡πŸ‘‡
Please open Telegram to view this post
VIEW IN TELEGRAM
Top 100 Data Analyst Interview Questions & Answers

#DataAnalysis #InterviewQuestions #SQL #Python #Statistics #CaseStudy #DataScience

Part 1: SQL Questions (Q1-30)

#1. What is the difference between DELETE, TRUNCATE, and DROP?
A:
β€’ DELETE is a DML command that removes rows from a table based on a WHERE clause. It is slower as it logs each row deletion and can be rolled back.
β€’ TRUNCATE is a DDL command that quickly removes all rows from a table. It is faster, cannot be rolled back, and resets table identity.
β€’ DROP is a DDL command that removes the entire table, including its structure, data, and indexes.

#2. Select all unique departments from the employees table.
A: Use the DISTINCT keyword.

SELECT DISTINCT department
FROM employees;


#3. Find the top 5 highest-paid employees.
A: Use ORDER BY and LIMIT.

SELECT name, salary
FROM employees
ORDER BY salary DESC
LIMIT 5;


#4. What is the difference between WHERE and HAVING?
A:
β€’ WHERE is used to filter records before any groupings are made (i.e., it operates on individual rows).
β€’ HAVING is used to filter groups after aggregations (GROUP BY) have been performed.

-- Find departments with more than 10 employees
SELECT department, COUNT(employee_id)
FROM employees
GROUP BY department
HAVING COUNT(employee_id) > 10;


#5. What are the different types of SQL joins?
A:
β€’ (INNER) JOIN: Returns records that have matching values in both tables.
β€’ LEFT (OUTER) JOIN: Returns all records from the left table, and the matched records from the right table.
β€’ RIGHT (OUTER) JOIN: Returns all records from the right table, and the matched records from the left table.
β€’ FULL (OUTER) JOIN: Returns all records when there is a match in either the left or right table.
β€’ SELF JOIN: A regular join, but the table is joined with itself.

#6. Write a query to find the second-highest salary.
A: Use OFFSET or a subquery.

-- Method 1: Using OFFSET
SELECT salary
FROM employees
ORDER BY salary DESC
LIMIT 1 OFFSET 1;

-- Method 2: Using a Subquery
SELECT MAX(salary)
FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees);


#7. Find duplicate emails in a customers table.
A: Group by the email column and use HAVING to find groups with a count greater than 1.

SELECT email, COUNT(email)
FROM customers
GROUP BY email
HAVING COUNT(email) > 1;


#8. What is a primary key vs. a foreign key?
A:
β€’ A Primary Key is a constraint that uniquely identifies each record in a table. It must contain unique values and cannot contain NULL values.
β€’ A Foreign Key is a key used to link two tables together. It is a field (or collection of fields) in one table that refers to the Primary Key in another table.

#9. Explain Window Functions. Give an example.
A: Window functions perform a calculation across a set of table rows that are somehow related to the current row. Unlike aggregate functions, they do not collapse rows.

-- Rank employees by salary within each department
SELECT
name,
department,
salary,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) as dept_rank
FROM employees;


#10. What is a CTE (Common Table Expression)?
A: A CTE is a temporary, named result set that you can reference within a SELECT, INSERT, UPDATE, or DELETE statement. It helps improve readability and break down complex queries.
WITH DepartmentSales AS (
SELECT department, SUM(sale_amount) as total_sales
FROM sales
GROUP BY department
)
SELECT department, total_sales
FROM DepartmentSales
WHERE total_sales > 100000;

---
#11. Difference between UNION and UNION ALL?
A:
β€’ UNION combines the result sets of two or more SELECT statements and removes duplicate rows.
β€’ UNION ALL also combines result sets but includes all rows, including duplicates. It is faster because it doesn't check for duplicates.

#12. How would you find the total number of employees in each department?
A: Use COUNT() with GROUP BY.

SELECT department, COUNT(employee_id) as number_of_employees
FROM employees
GROUP BY department;


#13. What is the difference between RANK() and DENSE_RANK()?
A:
β€’ RANK() assigns a rank to each row within a partition. If there are ties, it skips the next rank(s). (e.g., 1, 2, 2, 4)
β€’ DENSE_RANK() also assigns ranks, but it does not skip any ranks in case of ties. (e.g., 1, 2, 2, 3)

#14. Write a query to get the Nth highest salary.
A: Use DENSE_RANK() in a CTE.

WITH SalaryRanks AS (
SELECT
salary,
DENSE_RANK() OVER (ORDER BY salary DESC) as rnk
FROM employees
)
SELECT salary
FROM SalaryRanks
WHERE rnk = 5; -- For the 5th highest salary


#15. What is COALESCE() used for?
A: The COALESCE() function returns the first non-NULL value in a list of expressions. It's useful for providing default values for nulls.

SELECT name, COALESCE(commission, 0) as commission
FROM employees; -- Replaces NULL commissions with 0

---
#16. How would you select all employees whose name starts with 'A'?
A: Use the LIKE operator with a wildcard (%).

SELECT name
FROM employees
WHERE name LIKE 'A%';


#17. Get the current date and time.
A: This is function-dependent on the SQL dialect.
β€’ PostgreSQL/MySQL: NOW()
β€’ SQL Server: GETDATE()

SELECT NOW();


#18. How can you extract the month from a date?
A: Use the EXTRACT function or MONTH().

-- Standard SQL
SELECT EXTRACT(MONTH FROM '2023-10-27');
-- MySQL
SELECT MONTH('2023-10-27');


#19. What is a subquery? What are the types?
A: A subquery is a query nested inside another query.
β€’ Scalar Subquery: Returns a single value (one row, one column).
β€’ Multi-row Subquery: Returns multiple rows.
β€’ Correlated Subquery: An inner query that depends on the outer query for its values. It is evaluated once for each row processed by the outer query.

#20. Write a query to find all employees who work in the 'Sales' department.
A: Use a JOIN or a subquery.

-- Using JOIN (preferred)
SELECT e.name
FROM employees e
JOIN departments d ON e.department_id = d.id
WHERE d.name = 'Sales';

---
#21. How would you calculate the month-over-month growth rate of sales?
A: Use the LAG() window function to get the previous month's sales and then apply the growth formula.

WITH MonthlySales AS (
SELECT
DATE_TRUNC('month', order_date)::DATE as sales_month,
SUM(sale_amount) as total_sales
FROM sales
GROUP BY 1
)
SELECT
sales_month,
total_sales,
(total_sales - LAG(total_sales, 1) OVER (ORDER BY sales_month)) / LAG(total_sales, 1) OVER (ORDER BY sales_month) * 100 as growth_rate
FROM MonthlySales;