Data Science Machine Learning Data Analysis

Top 100 Data Analyst Interview Questions & Answers

#DataAnalysis #InterviewQuestions #SQL #Python #Statistics #CaseStudy #DataScience

Part 1: SQL Questions (Q1-30)

#1. What is the difference between DELETE, TRUNCATE, and DROP?
A:
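• DELETE is a DML command that removes rows from a table, optionally filtered by a WHERE clause. It is slower because it logs each row deletion, and it can be rolled back.
• TRUNCATE is a DDL command that quickly removes all rows from a table. It is faster because it does not log individual rows, and in most databases it resets the identity counter (PostgreSQL only does so with RESTART IDENTITY). Whether it can be rolled back depends on the database: MySQL and Oracle commit it implicitly, while SQL Server and PostgreSQL allow a rollback inside a transaction.
• DROP is a DDL command that removes the entire table, including its structure, data, and indexes.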
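
To make the rollback behavior of DELETE concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and rows are made up for illustration. SQLite has no TRUNCATE statement, so only DELETE and DROP are shown, and other databases may behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO employees (name) VALUES (?)",
    [("Ann",), ("Bob",), ("Cy",)],  # hypothetical sample rows
)
conn.commit()

# DELETE removes only the matching rows and can be undone before commit
conn.execute("DELETE FROM employees WHERE name = 'Bob'")
after_delete = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
conn.rollback()
after_rollback = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]

# DROP removes the table definition itself, not just its rows
conn.execute("DROP TABLE employees")
conn.commit()
remaining_tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' AND name = 'employees'"
).fetchall()

print(after_delete, after_rollback, remaining_tables)
```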

#2. Select all unique departments from the employees table.
A: Use the DISTINCT keyword.

SELECT DISTINCT department
FROM employees;

#3. Find the top 5 highest-paid employees.
A: Use ORDER BY and LIMIT (supported in MySQL, PostgreSQL, and SQLite; SQL Server uses TOP instead).

SELECT name, salary
FROM employees
ORDER BY salary DESC
LIMIT 5;

#4. What is the difference between WHERE and HAVING?
A:
• WHERE is used to filter records before any groupings are made (i.e., it operates on individual rows).
• HAVING is used to filter groups after aggregations (GROUP BY) have been performed.

-- Find departments with more than 10 employees
SELECT department, COUNT(employee_id)
FROM employees
GROUP BY department
HAVING COUNT(employee_id) > 10;
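
The two clauses can also appear in the same query. The sketch below, which assumes a small made-up employees table in an in-memory SQLite database, filters rows with WHERE first and then filters the resulting groups with HAVING.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [  # hypothetical sample rows
        ("Ann", "Sales", 100), ("Bob", "Sales", 80), ("Cy", "Sales", 40),
        ("Di", "Ops", 90), ("Ed", "Ops", 30),
        ("Flo", "HR", 60), ("Gus", "HR", 70),
    ],
)

query = """
SELECT department, COUNT(*) AS well_paid
FROM employees
WHERE salary > 50          -- row-level filter, applied before grouping
GROUP BY department
HAVING COUNT(*) >= 2       -- group-level filter, applied after aggregation
ORDER BY department;
"""
result = conn.execute(query).fetchall()
print(result)
```

Ops drops out here because only one of its rows survives the WHERE filter, so its group fails the HAVING condition.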

#5. What are the different types of SQL joins?
A:
• (INNER) JOIN: Returns records that have matching values in both tables.
• LEFT (OUTER) JOIN: Returns all records from the left table, plus the matched records from the right table. Unmatched right-side columns are NULL.
• RIGHT (OUTER) JOIN: Returns all records from the right table, plus the matched records from the left table. Unmatched left-side columns are NULL.
• FULL (OUTER) JOIN: Returns all records from both tables, matching rows where possible and filling the missing side with NULL where there is no match.
• SELF JOIN: A regular join, but the table is joined with itself.
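
As a rough illustration of the difference between INNER and LEFT joins, the sketch below runs both against two made-up tables in an in-memory SQLite database. RIGHT and FULL joins are left out because SQLite only supports them from version 3.39 onward.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (name TEXT, dept_id INTEGER);
CREATE TABLE departments (dept_id INTEGER, dept_name TEXT);
-- hypothetical sample rows: Cy has no department, Legal has no employees
INSERT INTO employees VALUES ('Ann', 1), ('Bob', 2), ('Cy', NULL);
INSERT INTO departments VALUES (1, 'Sales'), (2, 'Ops'), (3, 'Legal');
""")

# INNER JOIN keeps only rows with a match on both sides
inner_rows = conn.execute("""
SELECT e.name, d.dept_name
FROM employees e
INNER JOIN departments d ON e.dept_id = d.dept_id
ORDER BY e.name
""").fetchall()

# LEFT JOIN keeps every employee and fills missing departments with NULL
left_rows = conn.execute("""
SELECT e.name, d.dept_name
FROM employees e
LEFT JOIN departments d ON e.dept_id = d.dept_id
ORDER BY e.name
""").fetchall()

print(inner_rows)
print(left_rows)
```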

#6. Write a query to find the second-highest salary.
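A: Use OFFSET or a subquery. With OFFSET, add DISTINCT so that a tie for the top salary does not return the highest value again.

-- Method 1: Using OFFSET
SELECT DISTINCT salary
FROM employees
ORDER BY salary DESC
LIMIT 1 OFFSET 1;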

-- Method 2: Using a Subquery
SELECT MAX(salary)
FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees);
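
One thing worth checking with this question is how each approach handles ties. The sketch below uses a made-up table where two employees share the top salary, and compares a plain OFFSET query (no DISTINCT), an OFFSET query with DISTINCT, and the subquery approach.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Ann", 100), ("Bob", 100), ("Cy", 90), ("Di", 80)],  # hypothetical rows
)

# Without DISTINCT, the second row is still the tied top salary
plain_offset = conn.execute(
    "SELECT salary FROM employees ORDER BY salary DESC LIMIT 1 OFFSET 1"
).fetchone()[0]

# DISTINCT collapses the tie before OFFSET skips the top value
distinct_offset = conn.execute(
    "SELECT DISTINCT salary FROM employees ORDER BY salary DESC LIMIT 1 OFFSET 1"
).fetchone()[0]

# The subquery compares values, so ties do not affect it
subquery = conn.execute(
    "SELECT MAX(salary) FROM employees "
    "WHERE salary < (SELECT MAX(salary) FROM employees)"
).fetchone()[0]

print(plain_offset, distinct_offset, subquery)
```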

#7. Find duplicate emails in a customers table.
A: Group by the email column and use HAVING to find groups with a count greater than 1.

SELECT email, COUNT(email)
FROM customers
GROUP BY email
HAVING COUNT(email) > 1;

#8. What is a primary key vs. a foreign key?
A:
• A Primary Key is a constraint that uniquely identifies each record in a table. It must contain unique values and cannot contain NULL values.
• A Foreign Key is a key used to link two tables together. It is a field (or collection of fields) in one table that refers to the Primary Key in another table.
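
A minimal sketch of both constraints in action, using SQLite through Python's sqlite3 module. The table and column names are assumptions for illustration, and SQLite only enforces foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.execute("CREATE TABLE departments (dept_id INTEGER PRIMARY KEY, dept_name TEXT)")
conn.execute("""
CREATE TABLE employees (
    employee_id INTEGER PRIMARY KEY,
    name TEXT,
    dept_id INTEGER REFERENCES departments(dept_id)
)
""")
conn.execute("INSERT INTO departments VALUES (1, 'Sales')")
conn.execute("INSERT INTO employees VALUES (1, 'Ann', 1)")

errors = []
bad_statements = [
    "INSERT INTO departments VALUES (1, 'Ops')",   # duplicate primary key
    "INSERT INTO employees VALUES (2, 'Bob', 99)",  # no department with id 99
]
for sql in bad_statements:
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError as exc:
        errors.append(str(exc))

for message in errors:
    print(message)
```

Both inserts are rejected: the first violates primary key uniqueness, the second points at a department that does not exist.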

#9. Explain Window Functions. Give an example.
A: Window functions perform a calculation across a set of rows related to the current row, as defined by the OVER clause. Unlike aggregate functions used with GROUP BY, they do not collapse rows: every input row stays in the output.

-- Rank employees by salary within each department
SELECT
    name,
    department,
    salary,
    RANK() OVER (PARTITION BY department ORDER BY salary DESC) as dept_rank
FROM employees;
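
To see the "does not collapse rows" point, the sketch below runs the same kind of ranking query next to a GROUP BY aggregate on a made-up table. It assumes the SQLite bundled with Python is version 3.25 or newer, which is when window functions were added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ann", "Sales", 100), ("Bob", "Sales", 80), ("Cy", "Ops", 90)],  # hypothetical rows
)

# Window function: one output row per employee, each with its rank
windowed = conn.execute("""
SELECT name, department,
       RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank
FROM employees
ORDER BY department, dept_rank
""").fetchall()

# Aggregate with GROUP BY: one output row per department
grouped = conn.execute("""
SELECT department, MAX(salary)
FROM employees
GROUP BY department
ORDER BY department
""").fetchall()

print(windowed)
print(grouped)
```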

#10. What is a CTE (Common Table Expression)?
A: A CTE is a temporary, named result set that you can reference within a SELECT, INSERT, UPDATE, or DELETE statement. It helps improve readability and break down complex queries.
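
A small sketch of a CTE, run through Python's sqlite3 module on a made-up employees table. The CTE name dept_avg and the sample rows are assumptions for illustration. It computes each department's average salary once, then the main query joins against it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [  # hypothetical sample rows
        ("Ann", "Sales", 100), ("Bob", "Sales", 80),
        ("Cy", "Ops", 90), ("Di", "Ops", 70),
    ],
)

query = """
WITH dept_avg AS (
    -- named intermediate result: average salary per department
    SELECT department, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY department
)
SELECT e.name, e.department, e.salary
FROM employees e
JOIN dept_avg d ON e.department = d.department
WHERE e.salary > d.avg_salary
ORDER BY e.name;
"""
above_average = conn.execute(query).fetchall()
print(above_average)
```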


💡 Applying Image Filters with Pillow

Pillow's ImageFilter module provides a set of pre-defined filters you can apply to your images with a single line of code. This example demonstrates how to apply a Gaussian blur effect, which is useful for softening images or creating depth-of-field effects.

from PIL import Image, ImageFilter

try:
    # Open an existing image
    with Image.open("your_image.jpg") as img:
        # Apply the Gaussian Blur filter
        # The radius parameter controls the blur intensity
        blurred_img = img.filter(ImageFilter.GaussianBlur(radius=5))

        # Display the blurred image
        blurred_img.show()

        # Save the new image
        blurred_img.save("blurred_image.png")

except FileNotFoundError:
    print("Error: 'your_image.jpg' not found. Please provide an image.")

Code explanation: The script opens an image file, applies a GaussianBlur filter from the ImageFilter module using the .filter() method, and then displays and saves the resulting blurred image. The blur intensity is controlled by the radius argument.

#Python #Pillow #ImageProcessing #ImageFilter #PIL

━━━━━━━━━━━━━━━
By: @DataScienceM ✨


• Get raw audio data as a NumPy array.

import numpy as np
samples = np.array(audio.get_array_of_samples())

• Create a Pydub segment from a NumPy array.

new_audio = AudioSegment(
    samples.tobytes(),
    frame_rate=audio.frame_rate,
    sample_width=audio.sample_width,
    channels=audio.channels
)

• Read a WAV file directly into a NumPy array.

from scipy.io.wavfile import read
rate, data = read("sound.wav")

• Write a NumPy array to a WAV file.

from scipy.io.wavfile import write
write("new_sound.wav", rate, data)

• Generate a sine wave.

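import numpy as np
sample_rate = 44100
frequency = 440 # A4 note
duration = 5
# endpoint=False keeps the sample spacing at exactly 1 / sample_rate
t = np.linspace(0., duration, int(sample_rate * duration), endpoint=False)
amplitude = np.iinfo(np.int16).max * 0.5
# Cast to int16 so the samples are stored as 16-bit PCM when written
data = (amplitude * np.sin(2. * np.pi * frequency * t)).astype(np.int16)
# This array can now be written to a file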
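
As a rough end-to-end check, the sketch below generates a shorter tone, converts it to 16-bit integers, and writes it with the standard-library wave module instead of SciPy. It writes to an in-memory buffer rather than a real file, and the one-second duration is an assumption chosen to keep the example small.

```python
import io
import wave

import numpy as np

sample_rate = 44100
frequency = 440  # A4 note
duration = 1     # seconds; shortened for the example

t = np.arange(int(sample_rate * duration)) / sample_rate
amplitude = np.iinfo(np.int16).max * 0.5
# Little-endian 16-bit integers, which is what PCM WAV data expects
samples = (amplitude * np.sin(2 * np.pi * frequency * t)).astype("<i2")

buffer = io.BytesIO()  # stands in for a file opened in binary mode
with wave.open(buffer, "wb") as wav_file:
    wav_file.setnchannels(1)   # mono
    wav_file.setsampwidth(2)   # 2 bytes per sample = 16 bit
    wav_file.setframerate(sample_rate)
    wav_file.writeframes(samples.tobytes())

# Read the header back to confirm what was written
buffer.seek(0)
with wave.open(buffer, "rb") as wav_file:
    n_frames = wav_file.getnframes()
    read_rate = wav_file.getframerate()

print(n_frames, read_rate)
```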

VIII. Audio Analysis with Librosa

• Load audio with Librosa.

import librosa
y, sr = librosa.load("sound.mp3")

• Estimate tempo (Beats Per Minute).

tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)

• Get beat event times in seconds.

beat_times = librosa.frames_to_time(beat_frames, sr=sr)

• Decompose into harmonic and percussive components.

y_harmonic, y_percussive = librosa.effects.hpss(y)

• Compute a spectrogram.

import numpy as np
D = librosa.stft(y)
S_db = librosa.amplitude_to_db(np.abs(D), ref=np.max)

• Compute Mel-Frequency Cepstral Coefficients (MFCCs).

mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

• Compute Chroma features (related to musical pitch).

chroma = librosa.feature.chroma_stft(y=y, sr=sr)

• Detect onset events (the start of notes).

onset_frames = librosa.onset.onset_detect(y=y, sr=sr)
onset_times = librosa.frames_to_time(onset_frames, sr=sr)

• Pitch shifting.

y_pitched = librosa.effects.pitch_shift(y, sr=sr, n_steps=4) # Shift up 4 semitones

• Time stretching (change speed without changing pitch).

y_fast = librosa.effects.time_stretch(y, rate=2.0) # Double speed

IX. More Utilities

• Detect leading silence.

from pydub.silence import detect_leading_silence
trim_ms = detect_leading_silence(audio)
trimmed_audio = audio[trim_ms:]

• Get the root mean square (RMS) energy.

rms = audio.rms

• Get the maximum possible amplitude for the audio format.

max_possible_amp = audio.max_possible_amplitude

• Strip silence and normalize what remains.

from pydub.effects import normalize
# strip_silence drops stretches of at least 1000 ms that are quieter than -32 dBFS
normalized_audio = normalize(audio.strip_silence(silence_len=1000, silence_thresh=-32))

• Change the frame rate (resample).

resampled = audio.set_frame_rate(16000)

• Create a simple band-pass filter.

from pydub.scipy_effects import band_pass_filter
filtered = band_pass_filter(audio, 400, 2000) # Pass between 400Hz and 2000Hz

• Convert file format in one line.

AudioSegment.from_file("music.ogg").export("music.mp3", format="mp3")

• Get the raw bytes of the audio data.

raw_data = audio.raw_data

• Get the maximum amplitude.

max_amp = audio.max

• Match the volume of two segments.

matched_audio2 = audio2.apply_gain(audio1.dBFS - audio2.dBFS)

#Python #AudioProcessing #Pydub #Librosa #SignalProcessing

━━━━━━━━━━━━━━━
By: @DataScienceM ✨


segment = sine_wave[0:51]
windowed_segment = segment * window

VI. Convolution & Correlation

• Perform linear convolution.

sig1 = np.repeat([0., 1., 0.], 100)
sig2 = np.repeat([0., 1., 1., 0.], 100)
convolved = signal.convolve(sig1, sig2, mode='same')

• Compute cross-correlation.

# Useful for finding delays between signals
correlation = signal.correlate(sig1, sig2, mode='full')

• Compute auto-correlation.

# Useful for finding periodicities in a signal
autocorr = signal.correlate(sine_wave, sine_wave, mode='full')
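
The "finding delays" use case can be sketched as follows. This example uses np.correlate rather than signal.correlate so that it depends only on NumPy, and the noise signal and the 7-sample delay are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
reference = rng.standard_normal(200)  # hypothetical reference signal

# Build a copy of the reference that starts 7 samples later
true_delay = 7
delayed = np.concatenate([np.zeros(true_delay), reference])[: len(reference)]

# Full cross-correlation covers every possible lag between the two signals
corr = np.correlate(delayed, reference, mode="full")
lags = np.arange(-(len(reference) - 1), len(delayed))

# The lag with the largest correlation is the estimated delay in samples
estimated_delay = int(lags[np.argmax(corr)])
print(estimated_delay)
```

Dividing the estimated lag by the sampling rate converts it to seconds.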

VII. Time-Frequency Analysis

• Compute and plot a spectrogram.

f, t_spec, Sxx = signal.spectrogram(chirp_signal, fs)
plt.pcolormesh(t_spec, f, Sxx, shading='gouraud')
plt.show()

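• Perform Continuous Wavelet Transform (CWT).

# Note: signal.cwt and signal.ricker were deprecated in SciPy 1.12 and removed in 1.15,
# so this snippet needs an older SciPy release
widths = np.arange(1, 31)
cwt_matrix = signal.cwt(chirp_signal, signal.ricker, widths)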

• Perform Hilbert transform to get the analytic signal.

analytic_signal = signal.hilbert(sine_wave)

• Calculate instantaneous frequency.

instant_phase = np.unwrap(np.angle(analytic_signal))
instant_freq = (np.diff(instant_phase) / (2.0*np.pi) * fs)

VIII. Feature Extraction

• Find peaks in a signal.

peaks, _ = signal.find_peaks(sine_wave, height=0.5)

• Find peaks with prominence criteria.

peaks_prom, _ = signal.find_peaks(noisy_signal, prominence=1)

• Differentiate a signal (e.g., to find velocity from position).

# np.diff gives the change per sample; multiply by fs to get the change per second
derivative = np.diff(sine_wave) * fs

• Integrate a signal.

from scipy.integrate import cumulative_trapezoid
integral = cumulative_trapezoid(sine_wave, t, initial=0)

• Detrend a signal to remove a linear trend.

trend = np.linspace(0, 1, fs)
trended_signal = sine_wave + trend
detrended = signal.detrend(trended_signal)

IX. System Analysis

• Define a system via a transfer function (numerator, denominator).

# Example: 2nd order low-pass filter
system = signal.TransferFunction([1], [1, 1, 1])

• Compute the step response of a system.

t_step, y_step = signal.step(system)

• Compute the impulse response of a system.

t_impulse, y_impulse = signal.impulse(system)

• Compute the Bode plot of a system's frequency response.

w, mag, phase = signal.bode(system)

X. Signal Generation from Data

• Generate a signal from a function.

t = np.linspace(0, 1, 500)
# Note: np.sinc is the normalized sinc, sin(pi*x) / (pi*x), so pi is also applied internally
custom_signal = np.sinc(2 * np.pi * 4 * t)

• Convert a list of values to a signal array.

my_data = [0, 1, 2, 3, 2, 1, 0, -1, -2, -1, 0]
data_signal = np.array(my_data)

• Read signal data from a WAV file.

from scipy.io import wavfile
samplerate, data = wavfile.read('audio.wav')

• Create a pulse train signal.

pulse_train = np.zeros(fs)
pulse_train[::100] = 1 # Impulse every 100 samples

#Python #SignalProcessing #SciPy #NumPy #DSP

━━━━━━━━━━━━━━━
By: @DataScienceM ✨


fig, ax = plt.subplots() # Single subplot
fig, axes = plt.subplots(2, 2) # 2x2 grid of subplots

• Plot on a specific subplot (Axes object).

axes[0, 0].plot(x, np.sin(x))

• Set the title for a specific subplot.

axes[0, 0].set_title('Subplot 1')

• Set labels for a specific subplot.

axes[0, 0].set_xlabel('X-axis')
axes[0, 0].set_ylabel('Y-axis')

• Add a legend to a specific subplot.

axes[0, 0].legend(['Sine'])

• Add a main title for the entire figure.

fig.suptitle('Main Figure Title')

• Automatically adjust subplot parameters for a tight layout.

plt.tight_layout()

• Share x or y axes between subplots.

fig, axes = plt.subplots(2, 1, sharex=True)

• Get the current Axes instance.

ax = plt.gca()

• Create a second y-axis that shares the x-axis.

ax2 = ax.twinx()

VI. Specialized Plots

• Create a contour plot.

X, Y = np.meshgrid(x, x)
Z = np.sin(X) * np.cos(Y)
plt.contour(X, Y, Z, levels=10)

• Create a filled contour plot.

plt.contourf(X, Y, Z)

• Create a stream plot for vector fields.

U, V = np.cos(X), np.sin(Y)
plt.streamplot(X, Y, U, V)

• Create a 3D surface plot.

from mpl_toolkits.mplot3d import Axes3D
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(X, Y, Z)

#Python #Matplotlib #DataVisualization #DataScience #Plotting

━━━━━━━━━━━━━━━
By: @DataScienceM ✨


• Group data by a column.

df.groupby('col1')

• Group by a column and get the sum.

df.groupby('col1').sum()

• Apply multiple aggregation functions at once.

df.groupby('col1').agg(['mean', 'count'])

• Get the size of each group.

df.groupby('col1').size()

• Get the frequency counts of unique values in a Series.

df['col1'].value_counts()

• Create a pivot table.

pd.pivot_table(df, values='D', index=['A', 'B'], columns=['C'])

VI. Merging, Joining & Concatenating

• Merge two DataFrames (like a SQL join).

pd.merge(left_df, right_df, on='key_column')

• Concatenate (stack) DataFrames along an axis.

pd.concat([df1, df2]) # Stacks rows

• Join DataFrames on their indexes.

left_df.join(right_df, how='outer')

VII. Input & Output

• Write a DataFrame to a CSV file.

df.to_csv('output.csv', index=False)

• Write a DataFrame to an Excel file.

df.to_excel('output.xlsx', sheet_name='Sheet1')

• Read data from an Excel file.

pd.read_excel('input.xlsx', sheet_name='Sheet1')

• Read from a SQL database.

pd.read_sql_query('SELECT * FROM my_table', connection_object)

VIII. Time Series & Special Operations

• Use the string accessor (.str) for Series operations.

s.str.lower()
s.str.contains('pattern')

• Use the datetime accessor (.dt) for Series operations.

s.dt.year
s.dt.day_name()

• Create a rolling window calculation.

df['col1'].rolling(window=3).mean()

• Create a basic plot from a Series or DataFrame.

df['col1'].plot(kind='hist')

#Python #Pandas #DataAnalysis #DataScience #Programming

━━━━━━━━━━━━━━━
By: @DataScienceM ✨


📌 NumPy for Absolute Beginners: A Project-Based Approach to Data Analysis

🗂 Category: DATA SCIENCE

🕒 Date: 2025-11-04 | ⏱️ Read time: 14 min read

Master NumPy for data analysis with this project-based guide for absolute beginners. Learn to build a high-performance sensor data pipeline from scratch and unlock the true speed of Python for data-intensive applications.

#NumPy #Python #DataAnalysis #DataScience


📌 Train a Humanoid Robot with AI and Python

🗂 Category: ROBOTICS

🕒 Date: 2025-11-04 | ⏱️ Read time: 9 min read

Explore how to train a humanoid robot using Python and AI. This guide covers the application of 3D simulations and Reinforcement Learning, leveraging powerful tools like the MuJoCo physics engine and the Gym toolkit to create and manage sophisticated learning environments for robotics.

#AI #Robotics #Python #ReinforcementLearning #MachineLearning


📌 Make Python Up to 150× Faster with C

🗂 Category: PROGRAMMING

🕒 Date: 2025-11-10 | ⏱️ Read time: 14 min read

Dramatically accelerate your Python applications—up to 150x faster—by strategically offloading performance-critical code to C. This practical guide shows how to seamlessly integrate C with your existing Python projects, supercharging your code's bottlenecks without abandoning the Python ecosystem. Achieve significant performance gains where they matter most.

#Python #CProgramming #PerformanceOptimization #Coding


📌 Does More Data Always Yield Better Performance?

🗂 Category: DATA SCIENCE

🕒 Date: 2025-11-10 | ⏱️ Read time: 9 min read

Exploring and challenging the conventional wisdom of “more data → better performance” by experimenting with…

#DataScience #AI #Python


📌 How to Build Agents with GPT-5

🗂 Category: AGENTIC AI

🕒 Date: 2025-11-11 | ⏱️ Read time: 8 min read

Learn how to use GPT-5 as a powerful AI Agent on your data.

#DataScience #AI #Python


📌 Feature Detection, Part 2: Laplace & Gaussian Operators

🗂 Category: COMPUTER VISION

🕒 Date: 2025-11-12 | ⏱️ Read time: 12 min read

Laplace meets Gaussian — the story of two operators in edge detection

#DataScience #AI #Python


📌 Robotics with Python: Q-Learning vs Actor-Critic vs Evolutionary Algorithms

🗂 Category: Uncategorized

🕒 Date: 2025-11-13 | ⏱️ Read time: 15 min read

Explore the intersection of Python and robotics in this deep dive into reinforcement learning algorithms. The article compares the trade-offs, strengths, and weaknesses of Q-Learning, Actor-Critic, and Evolutionary Algorithms for robotic control tasks. Learn how to apply these concepts by building a custom 3D environment to train and test your own RL-powered robot, providing a practical understanding of which technique to choose for your specific application.

#Python #Robotics #ReinforcementLearning #MachineLearning #AI


📌 Spearman Correlation Coefficient for When Pearson Isn’t Enough

🗂 Category: DATA SCIENCE

🕒 Date: 2025-11-13 | ⏱️ Read time: 7 min read

Not all relationships are linear, and that is where Spearman comes in.
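
A quick way to see the difference is a relationship that is perfectly monotonic but far from linear. The sketch below is not taken from the article. It computes Spearman as the Pearson correlation of the ranks, using only NumPy, on made-up data with no ties.

```python
import numpy as np

x = np.arange(1, 11, dtype=float)
y = np.exp(x)  # strictly increasing, but strongly non-linear


def ranks(values):
    """Rank values from 1..n (assumes there are no ties)."""
    order = np.argsort(values)
    result = np.empty(len(values))
    result[order] = np.arange(1, len(values) + 1)
    return result


# Pearson measures linear association on the raw values
pearson = np.corrcoef(x, y)[0, 1]
# Spearman is Pearson applied to the ranks, so it measures monotonic association
spearman = np.corrcoef(ranks(x), ranks(y))[0, 1]

print(round(pearson, 3), round(spearman, 3))
```

Spearman comes out at 1 because the ordering of x and y agrees exactly, while Pearson is noticeably lower because the points do not lie on a straight line.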

#DataScience #AI #Python


📌 Music, Lyrics, and Agentic AI: Building a Smart Song Explainer using Python and OpenAI

🗂 Category: LARGE LANGUAGE MODELS

🕒 Date: 2025-11-14 | ⏱️ Read time: 10 min read

This is how to build an AI-powered Song Explainer using Python and OpenAI

#DataScience #AI #Python
