Machine Learning
Real Machine Learning — simple, practical, and built on experience.
Learn step by step with clear explanations and working code.

Admin: @HusseinSheikho || @Hussein_Sheikho
• Get raw audio data as a NumPy array.
import numpy as np
samples = np.array(audio.get_array_of_samples())

• Create a Pydub segment from a NumPy array.
new_audio = AudioSegment(
    samples.tobytes(),
    frame_rate=audio.frame_rate,
    sample_width=audio.sample_width,
    channels=audio.channels
)

• Read a WAV file directly into a NumPy array.
from scipy.io.wavfile import read
rate, data = read("sound.wav")

• Write a NumPy array to a WAV file.
from scipy.io.wavfile import write
write("new_sound.wav", rate, data)

• Generate a sine wave.
import numpy as np
sample_rate = 44100
frequency = 440 # A4 note
duration = 5
t = np.linspace(0., duration, int(sample_rate * duration), endpoint=False)
amplitude = np.iinfo(np.int16).max * 0.5
data = (amplitude * np.sin(2. * np.pi * frequency * t)).astype(np.int16)
# This 16-bit array can now be written to a WAV file
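As a minimal sketch of the last step, the tone can be cast to 16-bit integers, written with scipy.io.wavfile, and read back to confirm the round trip. The file name tone.wav, the temporary directory, and the shortened one-second duration are illustrative assumptions, not part of the original post.

```python
import os
import tempfile

import numpy as np
from scipy.io import wavfile

sample_rate = 44100
frequency = 440  # A4 note
duration = 1     # seconds (shortened for this example)

# endpoint=False keeps the sample spacing at exactly 1 / sample_rate.
t = np.linspace(0.0, duration, int(sample_rate * duration), endpoint=False)
amplitude = np.iinfo(np.int16).max * 0.5
tone = (amplitude * np.sin(2.0 * np.pi * frequency * t)).astype(np.int16)

# Write to a temporary location (hypothetical file name) and read it back.
path = os.path.join(tempfile.mkdtemp(), "tone.wav")
wavfile.write(path, sample_rate, tone)
rate_back, data_back = wavfile.read(path)
```

Casting to int16 before writing matters because scipy.io.wavfile.write picks the WAV sample format from the array dtype.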


VIII. Audio Analysis with Librosa

• Load audio with Librosa.
import librosa
y, sr = librosa.load("sound.mp3")

• Estimate tempo (Beats Per Minute).
tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)

• Get beat event times in seconds.
beat_times = librosa.frames_to_time(beat_frames, sr=sr)

• Decompose into harmonic and percussive components.
y_harmonic, y_percussive = librosa.effects.hpss(y)

• Compute a spectrogram.
import numpy as np
D = librosa.stft(y)
S_db = librosa.amplitude_to_db(np.abs(D), ref=np.max)

• Compute Mel-Frequency Cepstral Coefficients (MFCCs).
mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

• Compute Chroma features (related to musical pitch).
chroma = librosa.feature.chroma_stft(y=y, sr=sr)

• Detect onset events (the start of notes).
onset_frames = librosa.onset.onset_detect(y=y, sr=sr)
onset_times = librosa.frames_to_time(onset_frames, sr=sr)

• Pitch shifting.
y_pitched = librosa.effects.pitch_shift(y, sr=sr, n_steps=4) # Shift up 4 semitones

• Time stretching (change speed without changing pitch).
y_fast = librosa.effects.time_stretch(y, rate=2.0) # Double speed
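To make the frame-to-time conversion used for beats and onsets less opaque, here is a NumPy-only sketch of the arithmetic. It assumes Librosa's default sampling rate (22050 Hz) and default hop length (512 samples); the frame indices are made up for illustration and stand in for beat_frames or onset_frames.

```python
import numpy as np

sr = 22050        # default sampling rate used by librosa.load
hop_length = 512  # default hop between analysis frames

# Hypothetical frame indices, standing in for beat_frames or onset_frames.
frames = np.array([0, 43, 86, 129])

# Frame index -> seconds: each frame advances by hop_length samples.
times = frames * hop_length / sr

# A rough tempo estimate from the median spacing between beats.
bpm = 60.0 / np.median(np.diff(times))
```

If a different hop_length is passed to the analysis functions, the same value has to be passed to the conversion, otherwise the times will be scaled incorrectly.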


IX. More Utilities

• Detect leading silence.
from pydub.silence import detect_leading_silence
trim_ms = detect_leading_silence(audio)
trimmed_audio = audio[trim_ms:]

• Get the root mean square (RMS) energy.
rms = audio.rms

• Get the maximum possible sample amplitude for the audio format.
max_possible_amp = audio.max_possible_amplitude

• Strip silent sections and normalize what remains.
from pydub.effects import normalize
cleaned = normalize(audio.strip_silence(silence_len=1000, silence_thresh=-32))

• Change the frame rate (resample).
resampled = audio.set_frame_rate(16000)

• Create a simple band-pass filter.
from pydub.scipy_effects import band_pass_filter
filtered = band_pass_filter(audio, 400, 2000) # Pass between 400Hz and 2000Hz

• Convert file format in one line.
AudioSegment.from_file("music.ogg").export("music.mp3", format="mp3")

• Get the raw bytes of the audio data.
raw_data = audio.raw_data

• Get the maximum amplitude.
max_amp = audio.max

• Match the volume of two segments.
matched_audio2 = audio2.apply_gain(audio1.dBFS - audio2.dBFS)
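The volume-matching line relies on dBFS, the RMS level relative to full scale in decibels. The sketch below reproduces that arithmetic in plain NumPy, assuming 16-bit audio (full scale 32768); the dbfs helper and the two test tones are hypothetical and only illustrate the idea.

```python
import numpy as np


def dbfs(samples, max_amplitude=32768):
    """Level relative to full scale, in dB, for integer-range samples."""
    rms = np.sqrt(np.mean(np.asarray(samples, dtype=np.float64) ** 2))
    return 20 * np.log10(rms / max_amplitude)


t = np.linspace(0, 1, 8000, endpoint=False)
loud = 16000 * np.sin(2 * np.pi * 220 * t)
quiet = 2000 * np.sin(2 * np.pi * 330 * t)

# Gain (in dB) needed to bring `quiet` up to the level of `loud`.
gain_db = dbfs(loud) - dbfs(quiet)

# Applying a gain in dB means multiplying samples by 10 ** (dB / 20).
matched = quiet * 10 ** (gain_db / 20)
```

Because the amplitudes differ by a factor of 8, the required gain is about 18 dB.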


#Python #AudioProcessing #Pydub #Librosa #SignalProcessing

━━━━━━━━━━━━━━━
By: @DataScienceM
📌 The Pearson Correlation Coefficient, Explained Simply

🗂 Category: STATISTICS

🕒 Date: 2025-11-01 | ⏱️ Read time: 7 min read

A simple explanation of the Pearson correlation coefficient with examples
📌 Graph RAG vs SQL RAG

🗂 Category: LARGE LANGUAGE MODELS

🕒 Date: 2025-11-01 | ⏱️ Read time: 7 min read

Evaluating RAGs on graph and SQL databases
📌 Understanding the Two Faces of Shiny for Python: Core and Express

🗂 Category: DATA SCIENCE

🕒 Date: 2024-05-29 | ⏱️ Read time: 7 min read

Exploring the Differences and Use Cases of Shiny Core and Shiny Express for Python
📌 Do You Need a Degree to Be a Data Scientist?

🗂 Category: DATA SCIENCE

🕒 Date: 2024-05-29 | ⏱️ Read time: 8 min read

No, but it certainly helps.
🤖🧠 HunyuanWorld-Mirror: Tencent’s Breakthrough in Universal 3D Reconstruction

🗓️ 03 Nov 2025
📚 AI News & Trends

The race toward achieving universal 3D understanding has reached a significant milestone with Tencent’s HunyuanWorld-Mirror, a cutting-edge open-source model designed to revolutionize 3D reconstruction. In an era dominated by visual intelligence and immersive digital experiences, this new model stands out by offering a feed-forward, geometry-aware framework that can predict multiple 3D outputs in a single ...

#HunyuanWorld #Tencent #3DReconstruction #UniversalAI #GeometryAware #OpenSourceAI
📌 Data Scientists Work in the Cloud. Here’s How to Practice This as a Student (Part 2: Python)

🗂 Category: DATA SCIENCE

🕒 Date: 2024-05-29 | ⏱️ Read time: 9 min read

Because data scientists don’t write production code in the Udemy code editor
💡 Top 50 Operations for Signal Processing in Python

Note: Most examples use numpy, scipy.signal, and matplotlib.pyplot. Assume they are imported as:
import numpy as np
from scipy import signal
import matplotlib.pyplot as plt

I. Signal Generation

• Create a time vector.
fs = 1000  # Sampling frequency
t = np.linspace(0, 1, fs, endpoint=False)

• Generate a sine wave.
freq = 50 # Hz
sine_wave = np.sin(2 * np.pi * freq * t)

• Generate a square wave.
square_wave = signal.square(2 * np.pi * freq * t)

• Generate a sawtooth wave.
sawtooth_wave = signal.sawtooth(2 * np.pi * freq * t)

• Generate Gaussian white noise.
noise = np.random.normal(0, 1, len(t))

• Generate a frequency-swept cosine (chirp).
chirp_signal = signal.chirp(t, f0=1, f1=100, t1=1, method='linear')

• Generate an impulse signal (unit impulse).
impulse = signal.unit_impulse(100, 'mid') # at index 50 of 100

• Generate a Gaussian pulse.
gaus_pulse = signal.gausspulse(t, fc=5, bw=0.5)


II. Signal Visualization & Properties

• Plot a signal.
plt.plot(t, sine_wave)
plt.xlabel("Time [s]")
plt.ylabel("Amplitude")
plt.show()

• Calculate the mean value.
mean_val = np.mean(sine_wave)

• Calculate the Root Mean Square (RMS).
rms_val = np.sqrt(np.mean(sine_wave**2))

• Calculate the standard deviation.
std_dev = np.std(sine_wave)

• Find the maximum value and its index.
max_val = np.max(sine_wave)
max_idx = np.argmax(sine_wave)


III. Frequency Domain Analysis (FFT)

• Compute the Fast Fourier Transform (FFT).
from scipy.fft import fft, fftfreq
yf = fft(sine_wave)

• Get the frequency bins for the FFT.
N = len(sine_wave)
xf = fftfreq(N, 1 / fs)[:N//2]

• Plot the magnitude spectrum.
plt.plot(xf, 2.0/N * np.abs(yf[0:N//2]))
plt.grid()
plt.show()

• Compute the Inverse FFT (IFFT).
from scipy.fft import ifft
original_signal = ifft(yf).real  # for a real input, the imaginary part is only rounding error

• Compute the Power Spectral Density (PSD) using Welch's method.
f, Pxx_den = signal.welch(sine_wave, fs, nperseg=256)  # keep nperseg <= len(sine_wave) (1000 samples)
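The FFT steps above can be combined into one short sketch that finds the dominant frequency of a signal. It uses rfft and rfftfreq, the real-input variants from scipy.fft, rather than slicing the full FFT; the two-tone test signal is an assumption made for illustration.

```python
import numpy as np
from scipy.fft import rfft, rfftfreq

fs = 1000
t = np.linspace(0, 1, fs, endpoint=False)
x = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)

# Single-sided amplitude spectrum of a real signal.
spectrum = np.abs(rfft(x)) * 2.0 / len(x)
freqs = rfftfreq(len(x), d=1 / fs)

dominant_freq = freqs[np.argmax(spectrum)]
dominant_amp = spectrum.max()
```

With one second of data the bins are 1 Hz apart, so both tones fall exactly on a bin and their amplitudes are recovered without leakage.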


IV. Digital Filtering

• Design a Butterworth low-pass filter.
b, a = signal.butter(4, 100, 'low', analog=False, fs=fs)

• Apply a filter to a signal (zero-phase filtering).
noisy_signal = sine_wave + noise
filtered_signal = signal.filtfilt(b, a, noisy_signal)

• Design a Chebyshev Type I high-pass filter.
b, a = signal.cheby1(4, 5, 100, 'high', fs=fs) # 5dB ripple

• Design a Bessel band-pass filter.
b, a = signal.bessel(4, [50, 150], 'band', fs=fs)

• Design an FIR filter using a window method.
numtaps = 101
fir_coeffs = signal.firwin(numtaps, cutoff=100, fs=fs)

• Plot the frequency response of a filter.
w, h = signal.freqz(b, a, fs=fs)
plt.plot(w, 20 * np.log10(abs(h)))

• Apply a median filter (good for salt-and-pepper noise).
median_filtered = signal.medfilt(noisy_signal, kernel_size=3)

• Apply a Wiener filter for noise reduction.
wiener_filtered = signal.wiener(noisy_signal)
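A small end-to-end sketch of the filtering workflow: add noise to a slow sine, low-pass it with a zero-phase Butterworth filter, and compare the error before and after. The 5 Hz tone, 20 Hz cutoff, noise level, and random seed are illustrative assumptions.

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)
fs = 1000
t = np.linspace(0, 1, fs, endpoint=False)
clean = np.sin(2 * np.pi * 5 * t)
noisy = clean + 0.5 * rng.standard_normal(len(t))

# 4th-order Butterworth low-pass at 20 Hz, applied forward and backward
# (filtfilt) so the output has no phase shift relative to the input.
b, a = signal.butter(4, 20, 'low', fs=fs)
filtered = signal.filtfilt(b, a, noisy)

rms_error_before = np.sqrt(np.mean((noisy - clean) ** 2))
rms_error_after = np.sqrt(np.mean((filtered - clean) ** 2))
```

The cutoff sits well above the 5 Hz tone, so the tone passes almost unchanged while most of the broadband noise is removed.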


V. Resampling & Windowing

• Resample a signal to a new length.
resampled = signal.resample(sine_wave, num=500) # Resample to 500 points

• Decimate a signal (downsample by a factor).
decimated = signal.decimate(sine_wave, q=4) # Downsample by 4

• Create a Hamming window.
window = signal.windows.hamming(51)

• Apply a window to a signal segment.
segment = sine_wave[0:51]
windowed_segment = segment * window


VI. Convolution & Correlation

• Perform linear convolution.
sig1 = np.repeat([0., 1., 0.], 100)
sig2 = np.repeat([0., 1., 1., 0.], 100)
convolved = signal.convolve(sig1, sig2, mode='same')

• Compute cross-correlation.
# Useful for finding delays between signals
correlation = signal.correlate(sig1, sig2, mode='full')

• Compute auto-correlation.
# Useful for finding periodicities in a signal
autocorr = signal.correlate(sine_wave, sine_wave, mode='full')
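To illustrate the "finding delays" use of cross-correlation, the sketch below delays a random signal by a known number of samples and recovers that delay with signal.correlate and signal.correlation_lags. The signal, seed, and 30-sample delay are assumptions chosen for the example.

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(1)
x = rng.standard_normal(500)
true_delay = 30

# y is x delayed by 30 samples (zero-padded at the start, same length).
y = np.concatenate([np.zeros(true_delay), x[:-true_delay]])

corr = signal.correlate(y, x, mode='full')
lags = signal.correlation_lags(len(y), len(x), mode='full')

# The lag with the largest correlation is the estimated delay of y vs. x.
estimated_delay = lags[np.argmax(corr)]
```

Swapping the argument order to correlate(x, y) flips the sign of the reported lag, so it helps to fix a convention and stick to it.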


VII. Time-Frequency Analysis

• Compute and plot a spectrogram.
f, t_spec, Sxx = signal.spectrogram(chirp_signal, fs)
plt.pcolormesh(t_spec, f, Sxx, shading='gouraud')
plt.show()

• Perform Continuous Wavelet Transform (CWT).
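import pywt  # PyWavelets; signal.cwt and signal.ricker were removed in SciPy 1.15
widths = np.arange(1, 31)
cwt_matrix, cwt_freqs = pywt.cwt(chirp_signal, widths, 'mexh')  # 'mexh' is the Ricker (Mexican hat) wavelet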

• Perform Hilbert transform to get the analytic signal.
analytic_signal = signal.hilbert(sine_wave)

• Calculate instantaneous frequency.
instant_phase = np.unwrap(np.angle(analytic_signal))
instant_freq = (np.diff(instant_phase) / (2.0*np.pi) * fs)
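A quick sanity check of the Hilbert-based recipe: for a pure 50 Hz sine, the envelope should be close to 1 and the instantaneous frequency close to 50 Hz. The test signal is an assumption; real signals with several components will give less clean results.

```python
import numpy as np
from scipy import signal

fs = 1000
t = np.linspace(0, 1, fs, endpoint=False)
x = np.sin(2 * np.pi * 50 * t)

analytic = signal.hilbert(x)
envelope = np.abs(analytic)                      # instantaneous amplitude
phase = np.unwrap(np.angle(analytic))            # continuous phase in radians
inst_freq = np.diff(phase) / (2.0 * np.pi) * fs  # Hz, one sample shorter than x

median_freq = np.median(inst_freq)
```

Using the median rather than the mean keeps the summary robust to edge effects when the signal does not contain a whole number of periods.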


VIII. Feature Extraction

• Find peaks in a signal.
peaks, _ = signal.find_peaks(sine_wave, height=0.5)

• Find peaks with prominence criteria.
peaks_prom, _ = signal.find_peaks(noisy_signal, prominence=1)

• Differentiate a signal (e.g., to find velocity from position).
derivative = np.gradient(sine_wave, t)  # d/dt in units per second; np.diff alone gives only sample-to-sample differences

• Integrate a signal.
from scipy.integrate import cumulative_trapezoid
integral = cumulative_trapezoid(sine_wave, t, initial=0)

• Detrend a signal to remove a linear trend.
trend = np.linspace(0, 1, fs)
trended_signal = sine_wave + trend
detrended = signal.detrend(trended_signal)
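Peak detection can double as a simple frequency estimate. The sketch below finds the peaks of a 5 Hz sine and derives the frequency from their spacing; the test signal and the height threshold are illustrative assumptions.

```python
import numpy as np
from scipy import signal

fs = 1000
t = np.linspace(0, 1, fs, endpoint=False)
x = np.sin(2 * np.pi * 5 * t)

# Indices of local maxima that rise above 0.5.
peaks, properties = signal.find_peaks(x, height=0.5)

# Average spacing between peaks, converted from samples to seconds.
period_s = np.mean(np.diff(peaks)) / fs
estimated_freq = 1.0 / period_s
```

On noisy data, adding prominence or distance criteria usually works better than a height threshold alone.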


IX. System Analysis

• Define a system via a transfer function (numerator, denominator).
# Example: 2nd order low-pass filter
system = signal.TransferFunction([1], [1, 1, 1])

• Compute the step response of a system.
t_step, y_step = signal.step(system)

• Compute the impulse response of a system.
t_impulse, y_impulse = signal.impulse(system)

• Compute the Bode plot of a system's frequency response.
w, mag, phase = signal.bode(system)
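As a sketch of how the step response can be read, the example below simulates the same second-order system on an explicit time grid and extracts the final value and the overshoot. The 30-second grid with 500 points is an assumption chosen so the response has time to settle.

```python
import numpy as np
from scipy import signal

# H(s) = 1 / (s^2 + s + 1): DC gain 1, damping ratio 0.5.
system = signal.TransferFunction([1], [1, 1, 1])

T = np.linspace(0, 30, 500)
t_step, y_step = signal.step(system, T=T)

final_value = y_step[-1]          # should settle near the DC gain
overshoot = y_step.max() - 1.0    # peak excursion above the final value
```

For a damping ratio of 0.5 the theoretical overshoot is about 16 percent, which the simulated response should reproduce closely.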


X. Signal Generation from Data

• Generate a signal from a function.
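t_custom = np.linspace(0, 1, 500)  # separate name so the earlier time vector t is not overwritten
custom_signal = np.sinc(8 * t_custom)  # np.sinc(x) is sin(pi*x)/(pi*x), so pi is already included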

• Convert a list of values to a signal array.
my_data = [0, 1, 2, 3, 2, 1, 0, -1, -2, -1, 0]
data_signal = np.array(my_data)

• Read signal data from a WAV file.
from scipy.io import wavfile
samplerate, data = wavfile.read('audio.wav')

• Create a pulse train signal.
pulse_train = np.zeros(fs)
pulse_train[::100] = 1 # Impulse every 100 samples


#Python #SignalProcessing #SciPy #NumPy #DSP

━━━━━━━━━━━━━━━
By: @DataScienceM
💡 Top 50 Matplotlib Commands in Python

Note: Examples assume the following imports:
import matplotlib.pyplot as plt
import numpy as np

I. Figure & Basic Plots

• Create a figure.
fig = plt.figure(figsize=(8, 6))

• Create a basic line plot.
x = np.linspace(0, 10, 100)
plt.plot(x, np.sin(x))

• Show/display the plot.
plt.show()

• Save a figure to a file.
plt.savefig("my_plot.png", dpi=300)

• Create a scatter plot.
plt.scatter(x, np.cos(x))

• Create a bar chart.
categories = ['A', 'B', 'C']
values = [3, 7, 2]
plt.bar(categories, values)

• Create a horizontal bar chart.
plt.barh(categories, values)

• Create a histogram.
data = np.random.randn(1000)
plt.hist(data, bins=30)

• Create a pie chart.
plt.pie(values, labels=categories, autopct='%1.1f%%')

• Create a box plot.
plt.boxplot([data, data*2])

• Display a 2D array or image.
matrix = np.random.rand(10, 10)
plt.imshow(matrix, cmap='viridis')

• Clear the current figure.
plt.clf()


II. Labels, Titles & Legends

• Add a title to the plot.
plt.title("Sine Wave")

• Add a label to the x-axis.
plt.xlabel("Time (s)")

• Add a label to the y-axis.
plt.ylabel("Amplitude")

• Add a legend.
plt.plot(x, np.sin(x), label='Sine')
plt.plot(x, np.cos(x), label='Cosine')
plt.legend()

• Add a grid.
plt.grid(True)

• Add text to the plot at specific coordinates.
plt.text(2, 0.5, 'An important point')

• Add an annotation with an arrow.
plt.annotate('Peak', xy=(np.pi/2, 1), xytext=(3, 1.5),
             arrowprops=dict(facecolor='black', shrink=0.05))


III. Axes & Ticks

• Set the x-axis limits.
plt.xlim(0, 5)

• Set the y-axis limits.
plt.ylim(-1.5, 1.5)

• Set the x-axis ticks and labels.
plt.xticks([0, np.pi, 2*np.pi], ['0', r'$\pi$', r'$2\pi$'])  # raw strings avoid invalid-escape warnings

• Set the y-axis ticks and labels.
plt.yticks([-1, 0, 1])

• Set a logarithmic scale on an axis.
plt.yscale('log')

• Set the aspect ratio of the plot.
plt.axis('equal') # Other options: 'tight', 'off'


IV. Plot Customization

• Set the color of a plot.
plt.plot(x, np.sin(x), color='red')

• Set the line style.
plt.plot(x, np.sin(x), linestyle='--')

• Set the line width.
plt.plot(x, np.sin(x), linewidth=3)

• Set the marker style for points.
plt.plot(x, np.sin(x), marker='o')

• Set the transparency (alpha).
plt.hist(data, alpha=0.5)

• Use a predefined style.
plt.style.use('ggplot')

• Fill the area between two curves.
plt.fill_between(x, np.sin(x), np.cos(x), alpha=0.2)

• Create an error bar plot.
y_err = 0.2 * np.ones_like(x)
plt.errorbar(x, np.sin(x), yerr=y_err)

• Add a horizontal line.
plt.axhline(y=0, color='k', linestyle='-')

• Add a vertical line.
plt.axvline(x=np.pi, color='k', linestyle='-')

• Add a colorbar for plots like imshow or scatter.
im = plt.imshow(matrix, cmap='viridis')  # colorbar needs a mappable such as an image
plt.colorbar(im, label='Magnitude')


V. Subplots (Object-Oriented Approach)

• Create a figure and a grid of subplots (preferred method).
fig, ax = plt.subplots() # Single subplot
fig, axes = plt.subplots(2, 2) # 2x2 grid of subplots

• Plot on a specific subplot (Axes object).
axes[0, 0].plot(x, np.sin(x))

• Set the title for a specific subplot.
axes[0, 0].set_title('Subplot 1')

• Set labels for a specific subplot.
axes[0, 0].set_xlabel('X-axis')
axes[0, 0].set_ylabel('Y-axis')

• Add a legend to a specific subplot.
axes[0, 0].legend(['Sine'])

• Add a main title for the entire figure.
fig.suptitle('Main Figure Title')

• Automatically adjust subplot parameters for a tight layout.
plt.tight_layout()

• Share x or y axes between subplots.
fig, axes = plt.subplots(2, 1, sharex=True)

• Get the current Axes instance.
ax = plt.gca()

• Create a second y-axis that shares the x-axis.
ax2 = ax.twinx()
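The object-oriented calls above can be put together into one small figure. This is a sketch under a few assumptions: the Agg backend is selected so it runs without a display, and the figure is saved to an in-memory buffer instead of a file.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

# Two stacked subplots sharing the x-axis.
fig, axes = plt.subplots(2, 1, sharex=True, figsize=(6, 5))

axes[0].plot(x, np.sin(x), label='Sine')
axes[0].set_title('Sine')
axes[0].set_ylabel('Amplitude')
axes[0].legend()

axes[1].plot(x, np.cos(x), color='red', linestyle='--', label='Cosine')
axes[1].set_xlabel('X-axis')
axes[1].set_ylabel('Amplitude')
axes[1].legend()

fig.suptitle('Main Figure Title')
fig.tight_layout()

# Render to an in-memory PNG instead of a file on disk.
buffer = io.BytesIO()
fig.savefig(buffer, format='png', dpi=100)
png_bytes = buffer.getvalue()
plt.close(fig)
```

Calling methods on fig and the Axes objects, rather than on plt, makes it explicit which subplot each command affects.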


VI. Specialized Plots

• Create a contour plot.
X, Y = np.meshgrid(x, x)
Z = np.sin(X) * np.cos(Y)
plt.contour(X, Y, Z, levels=10)

• Create a filled contour plot.
plt.contourf(X, Y, Z)

• Create a stream plot for vector fields.
U, V = np.cos(X), np.sin(Y)
plt.streamplot(X, Y, U, V)

• Create a 3D surface plot.
from mpl_toolkits.mplot3d import Axes3D  # only needed on Matplotlib older than 3.2
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(X, Y, Z)


#Python #Matplotlib #DataVisualization #DataScience #Plotting

━━━━━━━━━━━━━━━
By: @DataScienceM
📌 SQL Explained: Normal Forms

🗂 Category: DATA ENGINEERING

🕒 Date: 2024-05-29 | ⏱️ Read time: 9 min read

Applying 1st, 2nd and 3rd normal forms to a database
📌 Simple Ways to Speed Up Your PyTorch Model Training

🗂 Category: MACHINE LEARNING

🕒 Date: 2024-05-28 | ⏱️ Read time: 12 min read

If all machine learning engineers want one thing, it’s faster model training - maybe after good test…
📌 Fine-Tune Smaller Transformer Models: Text Classification

🗂 Category: MACHINE LEARNING

🕒 Date: 2024-05-28 | ⏱️ Read time: 22 min read

Using Microsoft’s Phi-3 to generate synthetic data
📌 How I Assess the Memory Consumption of My Python Code

🗂 Category: ARTIFICIAL INTELLIGENCE

🕒 Date: 2024-05-28 | ⏱️ Read time: 6 min read

Different approaches to measure the memory consumption of a variable or a function
📌 Scaling Monosemanticity: Anthropic’s One Step Towards Interpretable & Manipulable LLMs

🗂 Category:

🕒 Date: 2024-05-28 | ⏱️ Read time: 13 min read

From prompt engineering to activation engineering for more controllable and safer LLMs
🤖🧠 LongCat-Video: Meituan’s Groundbreaking Step Toward Efficient Long Video Generation with AI

🗓️ 04 Nov 2025
📚 AI News & Trends

In the rapidly advancing field of generative AI, the ability to create realistic, coherent, and high-quality videos from text or images has become one of the most sought-after goals. Meituan, one of the leading technology innovators in China, has made a remarkable stride in this domain with its latest open-source model — LongCat-Video. Designed as ...

#LongCatVideo #Meituan #GenerativeAI #VideoGeneration #AIInnovation #OpenSource
📌 Introduction to Domain Adaptation- Motivation, Options, Tradeoffs

🗂 Category:

🕒 Date: 2024-05-28 | ⏱️ Read time: 15 min read

Stepping out of the “comfort zone” – part 1/3 of a deep-dive into domain adaptation…
💡 Top 50 Pandas Operations in Python

Note: Examples assume the following imports:
import pandas as pd
import numpy as np

I. Series & DataFrame Creation

• Create a pandas Series from a list.
s = pd.Series([1, 3, 5, np.nan, 6, 8])

• Create a DataFrame from a dictionary of lists.
data = {'col1': [1, 2, 3], 'col2': [4, 5, 6]}
df = pd.DataFrame(data)

• Create a DataFrame from a list of dictionaries.
data = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data)

• Read data from a CSV file.
df = pd.read_csv('my_file.csv')

• Create a date range.
dates = pd.date_range('20230101', periods=6)


II. Data Inspection & Selection

• View the first 5 rows.
df.head()

• View the last 5 rows.
df.tail()

• Get a concise summary of the DataFrame.
df.info()

• Get descriptive statistics for numerical columns.
df.describe()

• Get the dimensions of the DataFrame (rows, columns).
df.shape

• Get the column labels.
df.columns

• Get the index (row labels).
df.index

• Select a single column.
df['col1'] # or df.col1

• Select multiple columns.
df[['col1', 'col2']]

• Select rows by label/index name using .loc.
df.loc[0:2, ['col1']] # Select rows 0,1,2 and column 'col1'

• Select rows by integer position using .iloc.
df.iloc[0:3, 0:1] # Select first 3 rows and first column

• Perform boolean/conditional selection.
df[df['col1'] > 2]

• Filter rows using .isin().
df[df['col1'].isin([1, 3])]
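One detail that is easy to miss in the selection examples is that .loc slices include the end label while .iloc slices exclude the end position. The sketch below shows the difference on a small, made-up DataFrame.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4], 'col2': [10, 20, 30, 40]})

by_label = df.loc[0:2, ['col1']]   # labels 0, 1, 2 (end label included)
by_position = df.iloc[0:2, 0:1]    # positions 0, 1 (end position excluded)

# Combine conditions with & or |, each wrapped in parentheses.
mask_rows = df[(df['col1'] > 1) & (df['col2'] < 40)]
```

With the default integer index the two look alike, which is exactly why the off-by-one difference tends to go unnoticed.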


III. Data Cleaning

• Check for missing/null values.
df.isnull().sum() # Returns a Series with counts of nulls per column

• Drop rows with any missing values.
df.dropna()

• Fill missing values with a specific value.
df.fillna(value=0)

• Check for duplicated rows.
df.duplicated()

• Drop duplicated rows.
df.drop_duplicates(inplace=True)


IV. Data Manipulation & Operations

• Drop specified labels (columns or rows).
df.drop('col1', axis=1) # Drop a column

• Rename columns.
df.rename(columns={'col1': 'new_col1_name'})

• Set a column as the index.
df.set_index('col1')

• Reset the index.
df.reset_index(drop=True)

• Apply a function along an axis (e.g., per column).
df.apply(np.cumsum)

• Apply a function element-wise to a Series.
df['col1'].map(lambda x: x*100)

• Sort by values in a column.
df.sort_values(by='col1', ascending=False)

• Sort by index.
df.sort_index(ascending=False)  # pass axis=1 to sort by column labels instead

• Change the data type of a column.
df['col1'].astype('float')

• Create a new column based on a calculation.
df['new_col'] = df['col1'] * 2


V. Grouping & Aggregation
• Group data by a column.
df.groupby('col1')

• Group by a column and get the sum.
df.groupby('col1').sum()

• Apply multiple aggregation functions at once.
df.groupby('col1').agg(['mean', 'count'])

• Get the size of each group.
df.groupby('col1').size()

• Get the frequency counts of unique values in a Series.
df['col1'].value_counts()

• Create a pivot table.
pd.pivot_table(df, values='D', index=['A', 'B'], columns=['C'])
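The grouping calls are easier to follow on concrete data. The sketch below uses a hypothetical sales table (the column names region, product, and amount are invented for illustration) to show a grouped sum, several aggregations at once, and a pivot table.

```python
import pandas as pd

# Hypothetical sales data; column names are illustrative only.
sales = pd.DataFrame({
    'region': ['north', 'north', 'south', 'south', 'south'],
    'product': ['a', 'b', 'a', 'a', 'b'],
    'amount': [10, 20, 30, 50, 40],
})

# Total amount per region.
totals = sales.groupby('region')['amount'].sum()

# Several aggregations at once, one column per function.
stats = sales.groupby('region')['amount'].agg(['mean', 'count'])

# Regions as rows, products as columns, summed amounts as values.
pivot = pd.pivot_table(sales, values='amount', index='region',
                       columns='product', aggfunc='sum')
```

Note that pivot_table averages by default, so aggfunc='sum' is set explicitly here to get totals.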


VI. Merging, Joining & Concatenating

• Merge two DataFrames (like a SQL join).
pd.merge(left_df, right_df, on='key_column')

• Concatenate (stack) DataFrames along an axis.
pd.concat([df1, df2]) # Stacks rows

• Join DataFrames on their indexes.
left_df.join(right_df, how='outer')
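A short sketch of how the join type changes the result of a merge. The two small frames and their value columns are assumptions made up for the example; key_column matches the name used above.

```python
import pandas as pd

left_df = pd.DataFrame({'key_column': [1, 2, 3], 'left_val': ['a', 'b', 'c']})
right_df = pd.DataFrame({'key_column': [2, 3, 4], 'right_val': ['x', 'y', 'z']})

# Default how='inner': only keys present in both frames.
inner = pd.merge(left_df, right_df, on='key_column')

# how='outer': keys present in either frame, with NaN where a side is missing.
outer = pd.merge(left_df, right_df, on='key_column', how='outer')

# concat appends rows; ignore_index=True rebuilds a clean 0..n-1 index.
stacked = pd.concat([left_df, left_df], ignore_index=True)
```

Checking the row count after a merge is a cheap way to catch unexpected duplicates or dropped keys.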


VII. Input & Output

• Write a DataFrame to a CSV file.
df.to_csv('output.csv', index=False)

• Write a DataFrame to an Excel file.
df.to_excel('output.xlsx', sheet_name='Sheet1')

• Read data from an Excel file.
pd.read_excel('input.xlsx', sheet_name='Sheet1')

• Read from a SQL database.
pd.read_sql_query('SELECT * FROM my_table', connection_object)


VIII. Time Series & Special Operations

• Use the string accessor (.str) for Series operations.
s.str.lower()  # s must hold strings; .str raises AttributeError on numeric data
s.str.contains('pattern')

• Use the datetime accessor (.dt) for Series operations.
s.dt.year  # s must hold datetime values, e.g. pd.Series(pd.date_range('20230101', periods=6))
s.dt.day_name()

• Create a rolling window calculation.
df['col1'].rolling(window=3).mean()

• Create a basic plot from a Series or DataFrame.
df['col1'].plot(kind='hist')
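The rolling-window and datetime-accessor calls can be tried on a tiny daily series. The values below are made up; the dates reuse the date_range call from the creation section.

```python
import pandas as pd

dates = pd.date_range('20230101', periods=6)
ts = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=dates)

# 3-point moving average; the first two entries are NaN because the
# window is not yet full.
rolling_mean = ts.rolling(window=3).mean()

# The .dt accessor only works on a Series of datetime values.
day_names = pd.Series(dates).dt.day_name()
```

Passing min_periods=1 to rolling() is one way to get values for the first rows instead of NaN, at the cost of averaging over fewer points.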


#Python #Pandas #DataAnalysis #DataScience #Programming

━━━━━━━━━━━━━━━
By: @DataScienceM