# Create a copy of the original image to draw on
output_img = img.copy()
# Draw a bounding box for each detected bottle
for box in bottle_boxes:
x1, y1, x2, y2 = map(int, box)
# Draw a green rectangle around each bottle
cv2.rectangle(output_img, (x1, y1), (x2, y2), (0, 255, 0), 2)
# Add the final count as text on the image
summary_text = f"Bottle Count: {bottle_count}"
cv2.putText(output_img, summary_text, (20, 50),
cv2.FONT_HERSHEY_SIMPLEX, 1.5, (0, 0, 255), 4)
# Save the resulting image
cv2.imwrite('factory_bottles_result.jpg', output_img)
print("Result image with detections has been saved as 'factory_bottles_result.jpg'")
---
Step 6: Discussion of Results and Limitations
#Discussion #Limitations #FineTuning
Result: The code successfully uses a pre-trained YOLOv8 model to identify and count standard plastic bottles in an image. The final output provides both a numerical count and a visual confirmation of the detections.
Limitations of Pre-trained Model:
1. Occlusion: If bottles are heavily clustered or hiding behind each other, the model might miss some, leading to an undercount.
2. Unusual Shapes: The model is trained on common bottles (from the COCO dataset). If your factory produces bottles of a very unique shape or color, the model's accuracy might decrease.
3. Environmental Factors: Poor lighting, motion blur (if from a fast conveyor belt), or reflections can all negatively impact detection performance.
How to Improve (Next Steps): For a real-world, high-accuracy industrial application, you should not rely on a generic pre-trained model. The best approach is Fine-Tuning. This involves:
1. Collecting Data: Take hundreds or thousands of pictures of your specific bottles in your actual factory environment*.
2. Annotating Data: Draw bounding boxes around every bottle in those images.
3. Training: Use this custom dataset to train (or "fine-tune") the YOLOv8 model. This teaches the model exactly what to look for in your specific use case, leading to much higher accuracy and reliability.
━━━━━━━━━━━━━━━
By: @DataScienceM ✨
❤1
Please open Telegram to view this post
VIEW IN TELEGRAM
#Pandas #DataAnalysis #Python #DataScience #Tutorial
Top 30 Pandas Functions & Methods
This lesson covers 30 essential Pandas functions for data manipulation and analysis, each with a standalone example and its output.
---
1.
Creates a new DataFrame (a 2D labeled data structure) from various inputs like dictionaries or lists.
---
2.
Creates a new Series (a 1D labeled array).
---
3.
Reads data from a CSV file into a DataFrame. (Assuming a file
---
4.
Writes a DataFrame to a CSV file.
#PandasIO #DataFrame #Series
---
5.
Returns the first
---
6.
Returns the last
---
7.
Provides a concise summary of the DataFrame, including data types and non-null values.
---
8.
Returns a tuple representing the dimensionality (rows, columns) of the DataFrame.
#DataInspection #PandasBasics
---
9.
Generates descriptive statistics for numerical columns (count, mean, std, min, max, etc.).
Top 30 Pandas Functions & Methods
This lesson covers 30 essential Pandas functions for data manipulation and analysis, each with a standalone example and its output.
---
1.
pd.DataFrame()Creates a new DataFrame (a 2D labeled data structure) from various inputs like dictionaries or lists.
import pandas as pd
data = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data)
print(df)
col1 col2
0 1 3
1 2 4
---
2.
pd.Series()Creates a new Series (a 1D labeled array).
import pandas as pd
s = pd.Series([10, 20, 30, 40], name='MyNumbers')
print(s)
0 10
1 20
2 30
3 40
Name: MyNumbers, dtype: int64
---
3.
pd.read_csv()Reads data from a CSV file into a DataFrame. (Assuming a file
data.csv exists).# Create a dummy csv file first
with open('data.csv', 'w') as f:
f.write('Name,Age\nAlice,25\nBob,30')
df = pd.read_csv('data.csv')
print(df)
Name Age
0 Alice 25
1 Bob 30
---
4.
df.to_csv()Writes a DataFrame to a CSV file.
import pandas as pd
df = pd.DataFrame({'Name': ['Charlie'], 'Age': [35]})
# index=False prevents writing the DataFrame index to the file
df.to_csv('output.csv', index=False)
# You can check that 'output.csv' has been created.
print("File 'output.csv' created.")
File 'output.csv' created.
#PandasIO #DataFrame #Series
---
5.
df.head()Returns the first
n rows of the DataFrame (default is 5).import pandas as pd
data = {'Name': ['A', 'B', 'C', 'D', 'E', 'F'], 'Value': [1, 2, 3, 4, 5, 6]}
df = pd.DataFrame(data)
print(df.head(3))
Name Value
0 A 1
1 B 2
2 C 3
---
6.
df.tail()Returns the last
n rows of the DataFrame (default is 5).import pandas as pd
data = {'Name': ['A', 'B', 'C', 'D', 'E', 'F'], 'Value': [1, 2, 3, 4, 5, 6]}
df = pd.DataFrame(data)
print(df.tail(2))
Name Value
4 E 5
5 F 6
---
7.
df.info()Provides a concise summary of the DataFrame, including data types and non-null values.
import pandas as pd
import numpy as np
data = {'col1': [1, 2, 3], 'col2': [4.0, 5.0, np.nan], 'col3': ['A', 'B', 'C']}
df = pd.DataFrame(data)
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 col1 3 non-null int64
1 col2 2 non-null float64
2 col3 3 non-null object
dtypes: float64(1), int64(1), object(1)
memory usage: 200.0+ bytes
---
8.
df.shapeReturns a tuple representing the dimensionality (rows, columns) of the DataFrame.
import pandas as pd
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'C': [5, 6]})
print(df.shape)
(2, 3)
#DataInspection #PandasBasics
---
9.
df.describe()Generates descriptive statistics for numerical columns (count, mean, std, min, max, etc.).
import pandas as pd
df = pd.DataFrame({'Age': [22, 38, 26, 35, 29]})
print(df.describe())
❤4
Age
count 5.000000
mean 30.000000
std 6.363961
min 22.000000
25% 26.000000
50% 29.000000
75% 35.000000
max 38.000000
---
10.
df.columnsReturns the column labels of the DataFrame.
import pandas as pd
df = pd.DataFrame({'Name': [], 'Age': [], 'City': []})
print(df.columns)
Index(['Name', 'Age', 'City'], dtype='object')
---
11.
df.dtypesReturns the data type of each column.
import pandas as pd
df = pd.DataFrame({'Name': ['Alice'], 'Age': [25], 'Salary': [75000.50]})
print(df.dtypes)
Name object
Age int64
Salary float64
dtype: object
---
12. Selecting a Column
Select a single column, which returns a Pandas Series.
import pandas as pd
data = {'Name': ['Alice', 'Bob'], 'Age': [25, 30]}
df = pd.DataFrame(data)
ages = df['Age']
print(ages)
0 25
1 30
Name: Age, dtype: int64
#DataSelection #Indexing #Statistics
---
13.
df.loc[]Access a group of rows and columns by label(s) or a boolean array.
import pandas as pd
data = {'Age': [25, 30, 35], 'City': ['NY', 'LA', 'CH']}
df = pd.DataFrame(data, index=['Alice', 'Bob', 'Charlie'])
print(df.loc['Bob'])
Age 30
City LA
Name: Bob, dtype: object
---
14.
df.iloc[]Access a group of rows and columns by integer position(s).
import pandas as pd
data = {'Age': [25, 30, 35], 'City': ['NY', 'LA', 'CH']}
df = pd.DataFrame(data, index=['Alice', 'Bob', 'Charlie'])
print(df.iloc[1]) # Get the second row (index 1)
Age 30
City LA
Name: Bob, dtype: object
---
15.
df.isnull()Returns a DataFrame of the same shape with boolean values indicating if a value is missing (NaN).
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [1, np.nan], 'B': [3, 4]})
print(df.isnull())
A B
0 False False
1 True False
---
16.
df.dropna()Removes missing values.
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [1, np.nan, 3], 'B': [4, 5, 6]})
cleaned_df = df.dropna()
print(cleaned_df)
A B
0 1.0 4
2 3.0 6
#DataCleaning #MissingData
---
17.
df.fillna()Fills missing (NaN) values with a specified value or method.
import pandas as pd
import numpy as np
df = pd.DataFrame({'Score': [90, 85, np.nan, 92]})
filled_df = df.fillna(0)
print(filled_df)
Score
0 90.0
1 85.0
2 0.0
3 92.0
---
18.
df.drop_duplicates()Removes duplicate rows from the DataFrame.
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Alice'], 'Age': [25, 30, 25]}
df = pd.DataFrame(data)
unique_df = df.drop_duplicates()
print(unique_df)
Name Age
0 Alice 25
1 Bob 30
---
19.
df.rename()Alters axes labels (e.g., column names).
import pandas as pd
df = pd.DataFrame({'A': [1], 'B': [2]})
renamed_df = df.rename(columns={'A': 'Column_A', 'B': 'Column_B'})
print(renamed_df)
Column_A Column_B
0 1 2
---
20.
series.value_counts()Returns a Series containing counts of unique values.
import pandas as pd
s = pd.Series(['A', 'B', 'A', 'C', 'A', 'B'])
print(s.value_counts())
A 3
B 2
C 1
dtype: int64
#DataManipulation #Transformation
---
21.
series.unique()Returns an array of unique values in a Series.
import pandas as pd
s = pd.Series(['A', 'B', 'A', 'C', 'A', 'B'])
print(s.unique())
['A' 'B' 'C']
---
22.
df.sort_values()Sorts a DataFrame by the values of one or more columns.
import pandas as pd
data = {'Name': ['Charlie', 'Alice', 'Bob'], 'Age': [35, 25, 30]}
df = pd.DataFrame(data)
sorted_df = df.sort_values(by='Age')
print(sorted_df)
Name Age
1 Alice 25
2 Bob 30
0 Charlie 35
---
23.
df.groupby()Groups a DataFrame using a mapper or by a Series of columns for aggregation.
import pandas as pd
data = {'Dept': ['HR', 'IT', 'HR', 'IT'], 'Salary': [70, 85, 75, 90]}
df = pd.DataFrame(data)
grouped = df.groupby('Dept').mean()
print(grouped)
Salary
Dept
HR 72.5
IT 87.5
---
24.
df.agg()Applies one or more aggregations over the specified axis.
import pandas as pd
data = {'Dept': ['HR', 'IT', 'HR', 'IT'], 'Salary': [70, 85, 75, 90]}
df = pd.DataFrame(data)
agg_results = df.groupby('Dept')['Salary'].agg(['mean', 'sum'])
print(agg_results)
mean sum
Dept
HR 72.5 145
IT 87.5 175
#Aggregation #Grouping #Sorting
---
25.
df.apply()Applies a function along an axis of the DataFrame.
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})
# Apply a function to double each value in column 'A'
df['A_doubled'] = df['A'].apply(lambda x: x * 2)
print(df)
A B A_doubled
0 1 10 2
1 2 20 4
2 3 30 6
---
26.
pd.merge()Merges two DataFrames based on a common column or index, similar to a SQL join.
import pandas as pd
df1 = pd.DataFrame({'ID': [1, 2], 'Name': ['Alice', 'Bob']})
df2 = pd.DataFrame({'ID': [1, 2], 'Role': ['Engineer', 'Analyst']})
merged_df = pd.merge(df1, df2, on='ID')
print(merged_df)
ID Name Role
0 1 Alice Engineer
1 2 Bob Analyst
---
27.
pd.concat()Concatenates (stacks) pandas objects along a particular axis.
import pandas as pd
df1 = pd.DataFrame({'A': ['A0'], 'B': ['B0']})
df2 = pd.DataFrame({'A': ['A1'], 'B': ['B1']})
concatenated_df = pd.concat([df1, df2])
print(concatenated_df)
A B
0 A0 B0
0 A1 B1
---
28.
df.pivot_table()Creates a spreadsheet-style pivot table as a DataFrame.
❤2
import pandas as pd
data = {'Date': ['2023-01-01', '2023-01-01', '2023-01-02'],
'City': ['NY', 'LA', 'NY'],
'Sales': [100, 150, 120]}
df = pd.DataFrame(data)
pivot = df.pivot_table(values='Sales', index='Date', columns='City')
print(pivot)
City LA NY
Date
2023-01-01 150.0 100.0
2023-01-02 NaN 120.0
#CombiningData #PivotTable
---
29.
df.set_index()Sets one or more existing columns as the DataFrame index.
import pandas as pd
data = {'ID': ['a1', 'a2'], 'Name': ['Alice', 'Bob']}
df = pd.DataFrame(data)
df_indexed = df.set_index('ID')
print(df_indexed)
Name
ID
a1 Alice
a2 Bob
---
30.
df.reset_index()Resets the index of the DataFrame, making the old index a new column.
import pandas as pd
data = {'Name': ['Alice', 'Bob']}
df = pd.DataFrame(data, index=['a1', 'a2'])
df_reset = df.reset_index()
print(df_reset)
index Name
0 a1 Alice
1 a2 Bob
#Indexing #PandasTips
━━━━━━━━━━━━━━━
By: @DataScienceM ✨
❤8
Please open Telegram to view this post
VIEW IN TELEGRAM
Top 30 MATLAB Image Processing Functions
#MATLAB #ImageProcessing #Basics
#1.
Reads an image from a file into a matrix.
#2.
Displays an image in a figure window.
#3.
Writes an image matrix to a file.
#4.
Returns the dimensions of the image matrix (rows, columns, color channels).
#5.
Converts an RGB color image to a grayscale intensity image.
---
#MATLAB #ImageProcessing #Conversion #Transformation
#6.
Converts an image to double-precision format, scaling data to the range [0, 1].
#7.
Resizes an image to a specified size.
#8.
Rotates an image by a specified angle.
#9.
Crops an image to a specified rectangle.
#10.
Converts an RGB image to the Hue-Saturation-Value (HSV) color space.
---
#MATLAB #ImageProcessing #Enhancement
#11.
Displays the histogram of an image, showing the distribution of pixel intensity values.
#MATLAB #ImageProcessing #Basics
#1.
imread()Reads an image from a file into a matrix.
img = imread('peppers.png');
disp('Image "peppers.png" loaded into variable "img".');Image "peppers.png" loaded into variable "img".
#2.
imshow()Displays an image in a figure window.
img = imread('peppers.png');
imshow(img);
title('Peppers Image');Output: A new figure window opens, displaying the 'peppers.png' image with the title "Peppers Image".
#3.
imwrite()Writes an image matrix to a file.
img = imread('cameraman.tif');
imwrite(img, 'my_cameraman.jpg');
disp('Image saved as my_cameraman.jpg');Image saved as my_cameraman.jpg
#4.
size()Returns the dimensions of the image matrix (rows, columns, color channels).
rgb_img = imread('peppers.png');
gray_img = imread('cameraman.tif');
size_rgb = size(rgb_img);
size_gray = size(gray_img);
disp(['Size of RGB image: ', num2str(size_rgb)]);
disp(['Size of grayscale image: ', num2str(size_gray)]);Size of RGB image: 384 512 3
Size of grayscale image: 256 256
#5.
rgb2gray()Converts an RGB color image to a grayscale intensity image.
rgb_img = imread('peppers.png');
gray_img = rgb2gray(rgb_img);
imshow(gray_img);
title('Grayscale Peppers');Output: A figure window displays the grayscale version of the peppers image.
---
#MATLAB #ImageProcessing #Conversion #Transformation
#6.
im2double()Converts an image to double-precision format, scaling data to the range [0, 1].
img_uint8 = imread('cameraman.tif');
img_double = im2double(img_uint8);
disp(['Max value of original image: ', num2str(max(img_uint8(:)))]);
disp(['Max value of double image: ', num2str(max(img_double(:)))]);Max value of original image: 253
Max value of double image: 0.99216
#7.
imresize()Resizes an image to a specified size.
img = imread('cameraman.tif');
resized_img = imresize(img, 0.5); % Resize to 50% of original size
imshow(resized_img);
title('Resized Cameraman');Output: A figure window displays the cameraman image at half its original size.
#8.
imrotate()Rotates an image by a specified angle.
img = imread('cameraman.tif');
rotated_img = imrotate(img, 30, 'bilinear', 'crop');
imshow(rotated_img);
title('Rotated 30 Degrees');Output: A figure window displays the cameraman image rotated by 30 degrees, cropped to the original size.
#9.
imcrop()Crops an image to a specified rectangle.
img = imread('peppers.png');
% [xmin ymin width height]
cropped_img = imcrop(img, [100 80 250 200]);
imshow(cropped_img);
title('Cropped Image');Output: A figure window displays only the rectangular section specified from the peppers image.
#10.
rgb2hsv()Converts an RGB image to the Hue-Saturation-Value (HSV) color space.
rgb_img = imread('peppers.png');
hsv_img = rgb2hsv(rgb_img);
hue_channel = hsv_img(:,:,1); % Extract the Hue channel
imshow(hue_channel);
title('Hue Channel of Peppers Image');Output: A figure window displays the Hue channel of the peppers image as a grayscale image.
---
#MATLAB #ImageProcessing #Enhancement
#11.
imhist()Displays the histogram of an image, showing the distribution of pixel intensity values.
❤1
gray_img = imread('pout.tif');
imhist(gray_img);
title('Histogram of a Low-Contrast Image');Output: A figure window with a bar chart showing the intensity distribution of the 'pout.tif' image.
#12.
histeq()Enhances contrast using histogram equalization.
low_contrast_img = imread('pout.tif');
high_contrast_img = histeq(low_contrast_img);
imshow(high_contrast_img);
title('Histogram Equalized Image');Output: A figure window displays a higher contrast version of the 'pout.tif' image.
#13.
imadjust()Adjusts image intensity values or colormap by mapping intensity values to new values.
img = imread('cameraman.tif');
adjusted_img = imadjust(img, [0.3 0.7], []);
imshow(adjusted_img);
title('Intensity Adjusted Image');Output: A figure window showing a high-contrast version of the cameraman image, where intensities between 0.3 and 0.7 are stretched to the full [0, 1] range.
#14.
imtranslate()Translates (shifts) an image horizontally and vertically.
img = imread('cameraman.tif');
translated_img = imtranslate(img, [25, 15]); % Shift 25 pixels right, 15 pixels down
imshow(translated_img);
title('Translated Image');Output: A figure window shows the cameraman image shifted to the right and down.
#15.
imsharpen()Sharpens an image using the unsharp masking method.
img = imread('peppers.png');
sharpened_img = imsharpen(img);
imshow(sharpened_img);
title('Sharpened Image');Output: A figure window displays a crisper, more detailed version of the peppers image.
---
#MATLAB #ImageProcessing #Filtering #Noise
#16.
imnoise()Adds a specified type of noise to an image.
img = imread('cameraman.tif');
noisy_img = imnoise(img, 'salt & pepper', 0.02);
imshow(noisy_img);
title('Image with Salt & Pepper Noise');Output: A figure window displays the cameraman image with random white and black pixels (noise).
#17.
fspecial()Creates a predefined 2-D filter kernel (e.g., for averaging, Gaussian blur, Laplacian).
h = fspecial('motion', 20, 45); % Create a motion blur filter
disp('Generated a 2D motion filter kernel.');
disp(h);Generated a 2D motion filter kernel.
(Output is a matrix representing the filter kernel)
#18.
imfilter()Filters a multidimensional image with a specified filter kernel.
img = imread('cameraman.tif');
h = fspecial('motion', 20, 45);
motion_blur_img = imfilter(img, h, 'replicate');
imshow(motion_blur_img);
title('Motion Blurred Image');Output: A figure window shows the cameraman image with a motion blur effect applied at a 45-degree angle.
#19.
medfilt2()Performs 2-D median filtering, which is excellent for removing 'salt & pepper' noise.
noisy_img = imnoise(imread('cameraman.tif'), 'salt & pepper', 0.02);
denoised_img = medfilt2(noisy_img);
imshow(denoised_img);
title('Denoised with Median Filter');Output: A figure window shows the noisy image significantly cleaned up, with most salt & pepper noise removed.
#20.
edge()Finds edges in an intensity image using various algorithms (e.g., Sobel, Canny).
img = imread('cameraman.tif');
edges = edge(img, 'Canny');
imshow(edges);
title('Edges found with Canny Detector');Output: A figure window displays a binary image showing only the detected edges from the original image in white.
---
#MATLAB #ImageProcessing #Segmentation #Morphology
#21.
graythresh()Computes a global image threshold from a grayscale image using Otsu's method.
img = imread('coins.png');
level = graythresh(img);
disp(['Optimal threshold level (Otsu): ', num2str(level)]);Optimal threshold level (Otsu): 0.49412
#22.
imbinarize()Converts a grayscale image to a binary image based on a threshold.
img = imread('coins.png');
level = graythresh(img); % Find optimal threshold
bw_img = imbinarize(img, level);
imshow(bw_img);
title('Binarized Image (Otsu Method)');Output: A figure window displays a black and white image of the coins.
#23.
strel()Creates a morphological structuring element (SE), which is used to probe an image in morphological operations.
se = strel('disk', 5);
disp('Created a disk-shaped structuring element with radius 5.');
disp(se);Created a disk-shaped structuring element with radius 5.
(Output describes the strel object and shows its matrix representation)
#24.
imdilate()Dilates a binary image, making objects larger and filling small holes.
img = imread('text.png');
se = strel('line', 3, 90); % A vertical line SE
dilated_img = imdilate(img, se);
imshow(dilated_img);
title('Dilated Text');Output: A figure window shows the text characters appearing thicker, especially in the vertical direction.
#25.
imerode()Erodes a binary image, shrinking objects and removing small noise.
img = imread('text.png');
se = strel('line', 3, 0); % A horizontal line SE
eroded_img = imerode(img, se);
imshow(eroded_img);
title('Eroded Text');Output: A figure window shows the text characters appearing thinner, with horizontal parts possibly disappearing.
---
#MATLAB #ImageProcessing #Analysis
#26.
imopen()Performs morphological opening (erosion followed by dilation). It smooths contours and removes small objects.
original = imread('circbw.tif');
se = strel('disk', 10);
opened_img = imopen(original, se);
imshow(opened_img);
title('Morphologically Opened Image');Output: A figure window displays the image with small protrusions removed and gaps between objects widened.
#27.
bwareaopen()Removes all connected components (objects) from a binary image that have fewer than a specified number of pixels.
img = imread('text.png');
cleaned_img = bwareaopen(img, 50); % Remove objects with fewer than 50 pixels
imshow(cleaned_img);
title('Image after removing small objects');Output: A figure window shows the text image with small noise specks or broken parts of characters removed.
#28.
bwlabel()Labels connected components in a binary image.
img = imread('text.png');
[L, num] = bwlabel(img);
disp(['Number of connected objects found: ', num2str(num)]);Number of connected objects found: 114
#29.
regionprops()Measures a set of properties for each labeled region in an image.
img = imread('coins.png');
bw = imbinarize(img);
stats = regionprops('table', bw, 'Centroid', 'MajorAxisLength', 'MinorAxisLength');
disp('Properties of the first 3 coins:');
disp(stats(1:3,:));Properties of the first 3 coins:
Centroid MajorAxisLength MinorAxisLength
________ _______________ _______________
52.715 26.839 48.24 47.369
134.5 27.067 48.745 47.534
215.01 29.805 47.854 47.502
#30.
hough()Performs the Hough transform, used to detect lines in an image.
img = imrotate(imread('circuit.tif'), 33, 'crop');
edges = edge(img, 'canny');
[H, T, R] = hough(edges);
imshow(imadjust(rescale(H)), 'XData', T, 'YData', R, ...
'InitialMagnification', 'fit');
xlabel('\theta'), ylabel('\rho');
title('Hough Transform of Circuit');Output: A figure window displaying the Hough transform as an image, where bright spots correspond to potential lines in the original image.
━━━━━━━━━━━━━━━
By: @DataScienceM ✨
❤2
Please open Telegram to view this post
VIEW IN TELEGRAM
Top 100 Data Analyst Interview Questions & Answers
#DataAnalysis #InterviewQuestions #SQL #Python #Statistics #CaseStudy #DataScience
Part 1: SQL Questions (Q1-30)
#1. What is the difference between
A:
•
•
•
#2. Select all unique departments from the
A: Use the
#3. Find the top 5 highest-paid employees.
A: Use
#4. What is the difference between
A:
•
•
#5. What are the different types of SQL joins?
A:
•
•
•
•
•
#6. Write a query to find the second-highest salary.
A: Use
#7. Find duplicate emails in a
A: Group by the email column and use
#8. What is a primary key vs. a foreign key?
A:
• A Primary Key is a constraint that uniquely identifies each record in a table. It must contain unique values and cannot contain NULL values.
• A Foreign Key is a key used to link two tables together. It is a field (or collection of fields) in one table that refers to the Primary Key in another table.
#9. Explain Window Functions. Give an example.
A: Window functions perform a calculation across a set of table rows that are somehow related to the current row. Unlike aggregate functions, they do not collapse rows.
#10. What is a CTE (Common Table Expression)?
A: A CTE is a temporary, named result set that you can reference within a
#DataAnalysis #InterviewQuestions #SQL #Python #Statistics #CaseStudy #DataScience
Part 1: SQL Questions (Q1-30)
#1. What is the difference between
DELETE, TRUNCATE, and DROP?A:
•
DELETE is a DML command that removes rows from a table based on a WHERE clause. It is slower as it logs each row deletion and can be rolled back.•
TRUNCATE is a DDL command that quickly removes all rows from a table. It is faster, cannot be rolled back, and resets table identity.•
DROP is a DDL command that removes the entire table, including its structure, data, and indexes.#2. Select all unique departments from the
employees table.A: Use the
DISTINCT keyword.SELECT DISTINCT department
FROM employees;
#3. Find the top 5 highest-paid employees.
A: Use
ORDER BY and LIMIT.SELECT name, salary
FROM employees
ORDER BY salary DESC
LIMIT 5;
#4. What is the difference between
WHERE and HAVING?A:
•
WHERE is used to filter records before any groupings are made (i.e., it operates on individual rows).•
HAVING is used to filter groups after aggregations (GROUP BY) have been performed.-- Find departments with more than 10 employees
SELECT department, COUNT(employee_id)
FROM employees
GROUP BY department
HAVING COUNT(employee_id) > 10;
#5. What are the different types of SQL joins?
A:
•
(INNER) JOIN: Returns records that have matching values in both tables.•
LEFT (OUTER) JOIN: Returns all records from the left table, and the matched records from the right table.•
RIGHT (OUTER) JOIN: Returns all records from the right table, and the matched records from the left table.•
FULL (OUTER) JOIN: Returns all records when there is a match in either the left or right table.•
SELF JOIN: A regular join, but the table is joined with itself.#6. Write a query to find the second-highest salary.
A: Use
OFFSET or a subquery.-- Method 1: Using OFFSET
SELECT salary
FROM employees
ORDER BY salary DESC
LIMIT 1 OFFSET 1;
-- Method 2: Using a Subquery
SELECT MAX(salary)
FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees);
#7. Find duplicate emails in a
customers table.A: Group by the email column and use
HAVING to find groups with a count greater than 1.SELECT email, COUNT(email)
FROM customers
GROUP BY email
HAVING COUNT(email) > 1;
#8. What is a primary key vs. a foreign key?
A:
• A Primary Key is a constraint that uniquely identifies each record in a table. It must contain unique values and cannot contain NULL values.
• A Foreign Key is a key used to link two tables together. It is a field (or collection of fields) in one table that refers to the Primary Key in another table.
#9. Explain Window Functions. Give an example.
A: Window functions perform a calculation across a set of table rows that are somehow related to the current row. Unlike aggregate functions, they do not collapse rows.
-- Rank employees by salary within each department
SELECT
name,
department,
salary,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) as dept_rank
FROM employees;
#10. What is a CTE (Common Table Expression)?
A: A CTE is a temporary, named result set that you can reference within a
SELECT, INSERT, UPDATE, or DELETE statement. It helps improve readability and break down complex queries.