Machine Learning
Real Machine Learning: simple, practical, and built on experience.
Learn step by step with clear explanations and working code.

Admin: @HusseinSheikho || @Hussein_Sheikho
📌 The Pearson Correlation Coefficient, Explained Simply

🗂 Category: STATISTICS

🕒 Date: 2025-11-01 | ⏱️ Read time: 7 min read

A simple explanation of the Pearson correlation coefficient with examples
📌 Graph RAG vs SQL RAG

🗂 Category: LARGE LANGUAGE MODELS

🕒 Date: 2025-11-01 | ⏱️ Read time: 7 min read

Evaluating RAGs on graph and SQL databases
📌 Understanding the Two Faces of Shiny for Python: Core and Express

🗂 Category: DATA SCIENCE

🕒 Date: 2024-05-29 | ⏱️ Read time: 7 min read

Exploring the Differences and Use Cases of Shiny Core and Shiny Express for Python
📌 Do You Need a Degree to Be a Data Scientist?

🗂 Category: DATA SCIENCE

🕒 Date: 2024-05-29 | ⏱️ Read time: 8 min read

No, but it certainly helps.
🤖🧠 HunyuanWorld-Mirror: Tencent’s Breakthrough in Universal 3D Reconstruction

🗓️ 03 Nov 2025
📚 AI News & Trends

The race toward achieving universal 3D understanding has reached a significant milestone with Tencent’s HunyuanWorld-Mirror, a cutting-edge open-source model designed to revolutionize 3D reconstruction. In an era dominated by visual intelligence and immersive digital experiences, this new model stands out by offering a feed-forward, geometry-aware framework that can predict multiple 3D outputs in a single ...

#HunyuanWorld #Tencent #3DReconstruction #UniversalAI #GeometryAware #OpenSourceAI
📌 Data Scientists Work in the Cloud. Here’s How to Practice This as a Student (Part 2: Python)

🗂 Category: DATA SCIENCE

🕒 Date: 2024-05-29 | ⏱️ Read time: 9 min read

Because data scientists don’t write production code in the Udemy code editor
💡 Top 50 Operations for Signal Processing in Python

Note: Most examples use numpy, scipy.signal, and matplotlib.pyplot. Assume they are imported as:
import numpy as np
from scipy import signal
import matplotlib.pyplot as plt

I. Signal Generation

• Create a time vector.
fs = 1000  # Sampling frequency in Hz
t = np.linspace(0, 1, fs, endpoint=False)  # 1 second, fs samples

• Generate a sine wave.
freq = 50  # Hz
sine_wave = np.sin(2 * np.pi * freq * t)

• Generate a square wave.
square_wave = signal.square(2 * np.pi * freq * t)

• Generate a sawtooth wave.
sawtooth_wave = signal.sawtooth(2 * np.pi * freq * t)

• Generate Gaussian white noise.
noise = np.random.normal(0, 1, len(t))

• Generate a frequency-swept cosine (chirp).
chirp_signal = signal.chirp(t, f0=1, f1=100, t1=1, method='linear')

• Generate an impulse signal (unit impulse).
impulse = signal.unit_impulse(100, 'mid')  # 1 at index 50 of 100, 0 elsewhere

• Generate a Gaussian pulse.
# gausspulse is centered at t = 0, so shift the time axis to see the whole pulse
gaus_pulse = signal.gausspulse(t - 0.5, fc=5, bw=0.5)


II. Signal Visualization & Properties

• Plot a signal.
plt.plot(t, sine_wave)
plt.xlabel("Time [s]")
plt.ylabel("Amplitude")
plt.show()

• Calculate the mean value.
mean_val = np.mean(sine_wave)

• Calculate the Root Mean Square (RMS).
rms_val = np.sqrt(np.mean(sine_wave**2))

• Calculate the standard deviation.
std_dev = np.std(sine_wave)

• Find the maximum value and its index.
max_val = np.max(sine_wave)
max_idx = np.argmax(sine_wave)


III. Frequency Domain Analysis (FFT)

• Compute the Fast Fourier Transform (FFT).
from scipy.fft import fft, fftfreq
yf = fft(sine_wave)

• Get the frequency bins for the FFT.
N = len(sine_wave)
xf = fftfreq(N, 1 / fs)[:N//2]  # keep the non-negative frequencies

• Plot the magnitude spectrum.
plt.plot(xf, 2.0/N * np.abs(yf[0:N//2]))
plt.grid()
plt.show()

• Compute the Inverse FFT (IFFT).
from scipy.fft import ifft
original_signal = ifft(yf).real  # ifft returns complex values; the imaginary part is rounding error here

• Compute the Power Spectral Density (PSD) using Welch's method.
f, Pxx_den = signal.welch(sine_wave, fs, nperseg=256)  # keep nperseg at or below the signal length (1000 samples)
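
The FFT steps above can be tied together in one short, self-contained sketch. It swaps in rfft and rfftfreq (the real-input variants from scipy.fft) for fft and fftfreq, and the names spectrum, amplitude and peak_freq are illustrative only.

```python
import numpy as np
from scipy.fft import rfft, rfftfreq

fs = 1000                                  # sampling frequency in Hz
t = np.linspace(0, 1, fs, endpoint=False)  # one second of samples
x = np.sin(2 * np.pi * 50 * t)             # 50 Hz test tone

# rfft returns only the non-negative frequencies of a real-valued input
spectrum = rfft(x)
freqs = rfftfreq(len(x), d=1 / fs)

# Scale by 2/N so a unit-amplitude sine shows a peak close to 1
amplitude = 2.0 / len(x) * np.abs(spectrum)
peak_freq = freqs[np.argmax(amplitude)]
print(peak_freq)
```

Because the tone completes a whole number of cycles in the window, its energy lands in a single bin; a non-integer number of cycles would spread it over neighbouring bins.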


IV. Digital Filtering

• Design a Butterworth low-pass filter.
b, a = signal.butter(4, 100, 'low', analog=False, fs=fs)

• Apply a filter to a signal (zero-phase filtering).
noisy_signal = sine_wave + noise
filtered_signal = signal.filtfilt(b, a, noisy_signal)

• Design a Chebyshev Type I high-pass filter.
b, a = signal.cheby1(4, 5, 100, 'high', fs=fs)  # 5 dB passband ripple

• Design a Bessel band-pass filter.
b, a = signal.bessel(4, [50, 150], 'band', fs=fs)

• Design an FIR filter using a window method.
numtaps = 101
fir_coeffs = signal.firwin(numtaps, cutoff=100, fs=fs)

• Plot the frequency response of a filter.
w, h = signal.freqz(b, a, fs=fs)
plt.plot(w, 20 * np.log10(abs(h)))

• Apply a median filter (good for salt-and-pepper noise).
median_filtered = signal.medfilt(noisy_signal, kernel_size=3)

• Apply a Wiener filter for noise reduction.
wiener_filtered = signal.wiener(noisy_signal)
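
To see what the low-pass filter buys, here is a minimal sketch that compares the error before and after filtering. The fixed random seed and the names clean, noisy and rms_error_* are assumptions made for the illustration.

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)             # fixed seed, for a repeatable sketch
fs = 1000
t = np.linspace(0, 1, fs, endpoint=False)
clean = np.sin(2 * np.pi * 50 * t)
noisy = clean + rng.normal(0, 1, len(t))

# 4th-order Butterworth low-pass with a 100 Hz cutoff, as in the list above
b, a = signal.butter(4, 100, 'low', fs=fs)
filtered = signal.filtfilt(b, a, noisy)    # forward-backward pass: zero phase shift

# RMS distance to the clean tone, before and after filtering
rms_error_noisy = np.sqrt(np.mean((noisy - clean) ** 2))
rms_error_filtered = np.sqrt(np.mean((filtered - clean) ** 2))
print(rms_error_noisy, rms_error_filtered)
```

The 50 Hz tone sits inside the passband, so most of what the filter removes is noise above 100 Hz.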


V. Resampling & Windowing

• Resample a signal to a new length.
resampled = signal.resample(sine_wave, num=500)  # Resample to 500 points

• Decimate a signal (downsample by a factor).
decimated = signal.decimate(sine_wave, q=4)  # Downsample by 4

• Create a Hamming window.
window = signal.windows.hamming(51)

• Apply a window to a signal segment.
segment = sine_wave[0:51]
windowed_segment = segment * window
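
A quick sketch of what these calls return, assuming the same 1000-sample, 50 Hz sine as above (rebuilt here so the block runs on its own):

```python
import numpy as np
from scipy import signal

fs = 1000
t = np.linspace(0, 1, fs, endpoint=False)
x = np.sin(2 * np.pi * 50 * t)

resampled = signal.resample(x, num=500)      # FFT-based; the new rate is 500 Hz
decimated = signal.decimate(x, q=4)          # anti-alias filter, then keep every 4th sample

window = signal.windows.hamming(51)
windowed_segment = x[:51] * window           # tapers the segment edges toward 0.08

print(len(resampled), len(decimated), window[0], window[25])
```

Unlike plain slicing with x[::4], decimate low-pass filters first, which limits aliasing.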


VI. Convolution & Correlation

• Perform linear convolution.
sig1 = np.repeat([0., 1., 0.], 100)
sig2 = np.repeat([0., 1., 1., 0.], 100)
convolved = signal.convolve(sig1, sig2, mode='same')

• Compute cross-correlation.
# Useful for finding delays between signals
correlation = signal.correlate(sig1, sig2, mode='full')

• Compute auto-correlation.
# Useful for finding periodicities in a signal
autocorr = signal.correlate(sine_wave, sine_wave, mode='full')
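
As a hedged illustration of the "finding delays" use case, the sketch below delays a random signal by a known number of samples and recovers that delay from the cross-correlation peak. The delay of 37 samples and all variable names are made up for the example.

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(1)
reference = rng.normal(size=500)
true_delay = 37                               # hypothetical delay in samples
delayed = np.concatenate([np.zeros(true_delay), reference])[:500]

corr = signal.correlate(delayed, reference, mode='full')
# correlation_lags gives the lag (in samples) that belongs to each entry of corr
lags = signal.correlation_lags(len(delayed), len(reference), mode='full')
estimated_delay = lags[np.argmax(corr)]
print(estimated_delay)
```

With correlate(delayed, reference), a positive lag means the first argument trails the second; swapping the arguments flips the sign.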


VII. Time-Frequency Analysis

• Compute and plot a spectrogram.
f, t_spec, Sxx = signal.spectrogram(chirp_signal, fs)
plt.pcolormesh(t_spec, f, Sxx, shading='gouraud')
plt.show()

• Perform Continuous Wavelet Transform (CWT).
# signal.cwt and signal.ricker were removed in SciPy 1.15. On newer versions, PyWavelets
# offers an equivalent: pywt.cwt(chirp_signal, widths, 'mexh') returns (coefficients, frequencies)
widths = np.arange(1, 31)
cwt_matrix = signal.cwt(chirp_signal, signal.ricker, widths)

• Perform Hilbert transform to get the analytic signal.
analytic_signal = signal.hilbert(sine_wave)

• Calculate instantaneous frequency.
instant_phase = np.unwrap(np.angle(analytic_signal))
instant_freq = (np.diff(instant_phase) / (2.0*np.pi) * fs)
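
A small sanity check for the Hilbert-based steps above: for a pure 50 Hz sine, the instantaneous frequency should sit at about 50 Hz and the envelope at about 1. This is a sketch with illustrative names (envelope, median_freq), not a general-purpose estimator.

```python
import numpy as np
from scipy import signal

fs = 1000
t = np.linspace(0, 1, fs, endpoint=False)
x = np.sin(2 * np.pi * 50 * t)

analytic = signal.hilbert(x)                  # complex analytic signal; its real part is x
envelope = np.abs(analytic)                   # instantaneous amplitude
phase = np.unwrap(np.angle(analytic))         # unwrap removes the 2*pi jumps
inst_freq = np.diff(phase) / (2.0 * np.pi) * fs   # one sample shorter than x

median_freq = np.median(inst_freq)
print(median_freq, np.median(envelope))
```

On noisy or multi-component signals the estimate becomes erratic, so band-pass filtering first is a common precaution.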


VIII. Feature Extraction

• Find peaks in a signal.
peaks, _ = signal.find_peaks(sine_wave, height=0.5)

• Find peaks with prominence criteria.
peaks_prom, _ = signal.find_peaks(noisy_signal, prominence=1)

• Differentiate a signal (e.g., to find velocity from position).
derivative = np.gradient(sine_wave, t)  # scaled by the sample spacing; np.diff alone gives per-sample differences

• Integrate a signal.
from scipy.integrate import cumulative_trapezoid
integral = cumulative_trapezoid(sine_wave, t, initial=0)

• Detrend a signal to remove a linear trend.
trend = np.linspace(0, 1, fs)
trended_signal = sine_wave + trend
detrended = signal.detrend(trended_signal)
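
find_peaks returns sample indices, not times. The sketch below, which assumes a 5 Hz tone chosen so the peaks are easy to count, shows the conversion to seconds.

```python
import numpy as np
from scipy import signal

fs = 1000
t = np.linspace(0, 1, fs, endpoint=False)
x = np.sin(2 * np.pi * 5 * t)                 # 5 Hz tone: five crests in one second

# height=0.5 discards any local maximum below 0.5
peaks, props = signal.find_peaks(x, height=0.5)
peak_times = t[peaks]                          # convert sample indices to seconds
print(peaks, peak_times)
```

The second return value is a dict of peak properties; with height set, props['peak_heights'] holds the value at each peak.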


IX. System Analysis

• Define a system via a transfer function (numerator, denominator).
# Example: 2nd order low-pass filter
system = signal.TransferFunction([1], [1, 1, 1])

• Compute the step response of a system.
t_step, y_step = signal.step(system)

• Compute the impulse response of a system.
t_impulse, y_impulse = signal.impulse(system)

• Compute the Bode plot of a system's frequency response.
w, mag, phase = signal.bode(system)
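
For the example system H(s) = 1 / (s^2 + s + 1), the step response can be checked against two textbook properties: it settles at the DC gain H(0) = 1, and with a damping ratio of 0.5 it overshoots by roughly 16%. The explicit time grid t_grid below is an assumption; signal.step picks its own when T is omitted.

```python
import numpy as np
from scipy import signal

# H(s) = 1 / (s^2 + s + 1), the same second-order system as above
system = signal.TransferFunction([1], [1, 1, 1])

# Explicit time grid, long enough for the response to settle
t_grid = np.linspace(0, 30, 600)
t_step, y_step = signal.step(system, T=t_grid)

final_value = y_step[-1]                       # approaches the DC gain, H(0) = 1
overshoot = y_step.max() - 1.0                 # underdamped, so the response overshoots
print(final_value, overshoot)
```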


X. Signal Generation from Data

• Generate a signal from a function.
t_custom = np.linspace(0, 1, 500)  # separate name so the earlier t is not overwritten
custom_signal = np.sinc(8 * t_custom)  # np.sinc(x) = sin(pi*x)/(pi*x), so pi is already included

• Convert a list of values to a signal array.
my_data = [0, 1, 2, 3, 2, 1, 0, -1, -2, -1, 0]
data_signal = np.array(my_data)

• Read signal data from a WAV file.
from scipy.io import wavfile
samplerate, data = wavfile.read('audio.wav')  # audio.wav must exist in the working directory

• Create a pulse train signal.
pulse_train = np.zeros(fs)
pulse_train[::100] = 1  # Impulse every 100 samples
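
Since wavfile.read needs an existing file, here is a round-trip sketch that first writes a short tone with wavfile.write and then reads it back. The 8 kHz rate, the 440 Hz tone and the temporary file name are all assumptions for the example.

```python
import os
import tempfile

import numpy as np
from scipy.io import wavfile

fs = 8000                                      # hypothetical sample rate in Hz
t = np.arange(fs) / fs                         # one second of sample times
tone = (0.5 * np.sin(2 * np.pi * 440 * t) * 32767).astype(np.int16)

# Write to a temporary file so the sketch does not depend on an existing audio.wav
path = os.path.join(tempfile.mkdtemp(), 'tone.wav')
wavfile.write(path, fs, tone)

samplerate, data = wavfile.read(path)
duration_s = len(data) / samplerate
print(samplerate, data.dtype, duration_s)
```

The dtype of the array decides the sample format on disk; int16 gives ordinary 16-bit PCM.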


#Python #SignalProcessing #SciPy #NumPy #DSP

━━━━━━━━━━━━━━━
By: @DataScienceM ✨
💡 Top 50 Matplotlib Commands in Python

Note: Examples assume the following imports:
import matplotlib.pyplot as plt
import numpy as np

I. Figure & Basic Plots

• Create a figure.
fig = plt.figure(figsize=(8, 6))

• Create a basic line plot.
x = np.linspace(0, 10, 100)
plt.plot(x, np.sin(x))

• Show/display the plot.
plt.show()

• Save a figure to a file.
plt.savefig("my_plot.png", dpi=300)

• Create a scatter plot.
plt.scatter(x, np.cos(x))

• Create a bar chart.
categories = ['A', 'B', 'C']
values = [3, 7, 2]
plt.bar(categories, values)

• Create a horizontal bar chart.
plt.barh(categories, values)

• Create a histogram.
data = np.random.randn(1000)
plt.hist(data, bins=30)

• Create a pie chart.
plt.pie(values, labels=categories, autopct='%1.1f%%')

• Create a box plot.
plt.boxplot([data, data*2])

• Display a 2D array or image.
matrix = np.random.rand(10, 10)
plt.imshow(matrix, cmap='viridis')

• Clear the current figure.
plt.clf()


II. Labels, Titles & Legends

• Add a title to the plot.
plt.title("Sine Wave")

• Add a label to the x-axis.
plt.xlabel("Time (s)")

• Add a label to the y-axis.
plt.ylabel("Amplitude")

• Add a legend.
plt.plot(x, np.sin(x), label='Sine')
plt.plot(x, np.cos(x), label='Cosine')
plt.legend()

• Add a grid.
plt.grid(True)

• Add text to the plot at specific coordinates.
plt.text(2, 0.5, 'An important point')

• Add an annotation with an arrow.
plt.annotate('Peak', xy=(np.pi/2, 1), xytext=(3, 1.5),
             arrowprops=dict(facecolor='black', shrink=0.05))
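
Put together, the labeling commands above produce one annotated figure. The sketch below uses the Axes methods (set_title, set_xlabel and so on) instead of the plt.* calls, and selects the Agg backend on the assumption that no display is available.

```python
import matplotlib
matplotlib.use("Agg")                          # non-interactive backend (assumes no display)
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
fig, ax = plt.subplots(figsize=(8, 6))
ax.plot(x, np.sin(x), label='Sine')
ax.plot(x, np.cos(x), label='Cosine')
ax.set_title("Sine and Cosine")
ax.set_xlabel("Time (s)")
ax.set_ylabel("Amplitude")
ax.grid(True)
ax.annotate('Peak', xy=(np.pi / 2, 1), xytext=(3, 1.5),
            arrowprops=dict(facecolor='black', shrink=0.05))
ax.set_ylim(-1.5, 2)                           # leave room for the annotation text
legend = ax.legend()                           # picks up the label= values from plot()
```

Call fig.savefig(...) or, with an interactive backend, plt.show() to view the result.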


III. Axes & Ticks

• Set the x-axis limits.
plt.xlim(0, 5)

• Set the y-axis limits.
plt.ylim(-1.5, 1.5)

• Set the x-axis ticks and labels.
plt.xticks([0, np.pi, 2*np.pi], ['0', r'$\pi$', r'$2\pi$'])  # raw strings keep the backslashes intact

• Set the y-axis ticks and labels.
plt.yticks([-1, 0, 1])

• Set a logarithmic scale on an axis.
plt.yscale('log')

• Set the aspect ratio of the plot.
plt.axis('equal')  # Other options: 'tight', 'off'


IV. Plot Customization

• Set the color of a plot.
plt.plot(x, np.sin(x), color='red')

• Set the line style.
plt.plot(x, np.sin(x), linestyle='--')

• Set the line width.
plt.plot(x, np.sin(x), linewidth=3)

• Set the marker style for points.
plt.plot(x, np.sin(x), marker='o')

• Set the transparency (alpha).
plt.hist(data, alpha=0.5)

• Use a predefined style.
plt.style.use('ggplot')

• Fill the area between two curves.
plt.fill_between(x, np.sin(x), np.cos(x), alpha=0.2)

• Create an error bar plot.
y_err = 0.2 * np.ones_like(x)
plt.errorbar(x, np.sin(x), yerr=y_err)

• Add a horizontal line.
plt.axhline(y=0, color='k', linestyle='-')

• Add a vertical line.
plt.axvline(x=np.pi, color='k', linestyle='-')

• Add a colorbar for plots like imshow or scatter.
im = plt.imshow(matrix, cmap='viridis')  # colorbar needs a mappable such as this image
plt.colorbar(im, label='Magnitude')
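
A colorbar is always attached to a mappable, meaning the object returned by imshow, scatter, contourf and similar calls. A minimal sketch with a seeded random matrix (the seed and names are illustrative):

```python
import matplotlib
matplotlib.use("Agg")                          # non-interactive backend (assumes no display)
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
matrix = rng.random((10, 10))

fig, ax = plt.subplots()
im = ax.imshow(matrix, cmap='viridis')         # imshow returns the mappable
cbar = fig.colorbar(im, ax=ax, label='Magnitude')
```

The colorbar lives in its own Axes, so the figure ends up with two: the image and the bar.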


V. Subplots (Object-Oriented Approach)

• Create a figure and a grid of subplots (preferred method).
fig, ax = plt.subplots()  # Single subplot
fig, axes = plt.subplots(2, 2)  # 2x2 grid of subplots

• Plot on a specific subplot (Axes object).
axes[0, 0].plot(x, np.sin(x))

• Set the title for a specific subplot.
axes[0, 0].set_title('Subplot 1')

• Set labels for a specific subplot.
axes[0, 0].set_xlabel('X-axis')
axes[0, 0].set_ylabel('Y-axis')

• Add a legend to a specific subplot.
axes[0, 0].legend(['Sine'])

• Add a main title for the entire figure.
fig.suptitle('Main Figure Title')

• Automatically adjust subplot parameters for a tight layout.
plt.tight_layout()

• Share x or y axes between subplots.
fig, axes = plt.subplots(2, 1, sharex=True)

• Get the current Axes instance.
ax = plt.gca()

• Create a second y-axis that shares the x-axis.
ax2 = ax.twinx()
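
The object-oriented calls above combine naturally. This sketch stacks two subplots with a shared x-axis and adds a twin y-axis to the lower one; the plotted functions and labels are placeholders.

```python
import matplotlib
matplotlib.use("Agg")                          # non-interactive backend (assumes no display)
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

fig, axes = plt.subplots(2, 1, sharex=True)    # axes is a 1-D array of two Axes
axes[0].plot(x, np.sin(x), label='Sine')
axes[0].set_ylabel('sin(x)')
axes[0].legend()

axes[1].plot(x, np.exp(x / 5), color='tab:blue')
axes[1].set_ylabel('exp(x/5)')
axes[1].set_xlabel('x')

twin = axes[1].twinx()                         # second y-axis sharing the same x-axis
twin.plot(x, np.cos(x), color='tab:red')
twin.set_ylabel('cos(x)')

fig.suptitle('Shared x-axis with a twin y-axis')
fig.tight_layout()
```

Note that a 2x1 grid gives a 1-D axes array (axes[0]), while a 2x2 grid gives a 2-D one (axes[0, 0]).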


VI. Specialized Plots

• Create a contour plot.
X, Y = np.meshgrid(x, x)
Z = np.sin(X) * np.cos(Y)
plt.contour(X, Y, Z, levels=10)

• Create a filled contour plot.
plt.contourf(X, Y, Z)

• Create a stream plot for vector fields.
U, V = np.cos(X), np.sin(Y)
plt.streamplot(X, Y, U, V)

• Create a 3D surface plot.
from mpl_toolkits.mplot3d import Axes3D  # only needed on Matplotlib older than 3.2
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(X, Y, Z)


#Python #Matplotlib #DataVisualization #DataScience #Plotting

📌 SQL Explained: Normal Forms

🗂 Category: DATA ENGINEERING

🕒 Date: 2024-05-29 | ⏱️ Read time: 9 min read

Applying 1st, 2nd and 3rd normal forms to a database
📌 Simple Ways to Speed Up Your PyTorch Model Training

🗂 Category: MACHINE LEARNING

🕒 Date: 2024-05-28 | ⏱️ Read time: 12 min read

If all machine learning engineers want one thing, it’s faster model training - maybe after good test…
📌 Fine-Tune Smaller Transformer Models: Text Classification

🗂 Category: MACHINE LEARNING

🕒 Date: 2024-05-28 | ⏱️ Read time: 22 min read

Using Microsoft’s Phi-3 to generate synthetic data
📌 How I Assess the Memory Consumption of My Python Code

🗂 Category: ARTIFICIAL INTELLIGENCE

🕒 Date: 2024-05-28 | ⏱️ Read time: 6 min read

Different approaches to measure the memory consumption of a variable or a function
📌 Scaling Monosemanticity: Anthropic’s One Step Towards Interpretable & Manipulable LLMs

🗂 Category:

🕒 Date: 2024-05-28 | ⏱️ Read time: 13 min read

From prompt engineering to activation engineering for more controllable and safer LLMs
🤖🧠 LongCat-Video: Meituan’s Groundbreaking Step Toward Efficient Long Video Generation with AI

🗓️ 04 Nov 2025
📚 AI News & Trends

In the rapidly advancing field of generative AI, the ability to create realistic, coherent, and high-quality videos from text or images has become one of the most sought-after goals. Meituan, one of the leading technology innovators in China, has made a remarkable stride in this domain with its latest open-source model, LongCat-Video. Designed as ...

#LongCatVideo #Meituan #GenerativeAI #VideoGeneration #AIInnovation #OpenSource
📌 Introduction to Domain Adaptation- Motivation, Options, Tradeoffs

🗂 Category:

🕒 Date: 2024-05-28 | ⏱️ Read time: 15 min read

Stepping out of the “comfort zone” – part 1/3 of a deep-dive into domain adaptation…
💡 Top 50 Pandas Operations in Python

(Note: Examples assume the following imports: import pandas as pd, import numpy as np)

I. Series & DataFrame Creation

• Create a pandas Series from a list.
s = pd.Series([1, 3, 5, np.nan, 6, 8])

• Create a DataFrame from a dictionary of lists.
data = {'col1': [1, 2, 3], 'col2': [4, 5, 6]}
df = pd.DataFrame(data)

• Create a DataFrame from a list of dictionaries.
data = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data)  # keys missing from a row become NaN

• Read data from a CSV file.
df = pd.read_csv('my_file.csv')

• Create a date range.
dates = pd.date_range('20230101', periods=6)


II. Data Inspection & Selection

• View the first 5 rows.
df.head()

• View the last 5 rows.
df.tail()

• Get a concise summary of the DataFrame.
df.info()

• Get descriptive statistics for numerical columns.
df.describe()

• Get the dimensions of the DataFrame (rows, columns).
df.shape

• Get the column labels.
df.columns

• Get the index (row labels).
df.index

• Select a single column.
df['col1']  # or df.col1

• Select multiple columns.
df[['col1', 'col2']]

• Select rows by label/index name using .loc.
df.loc[0:2, ['col1']]  # Select rows 0, 1, 2 and column 'col1'

• Select rows by integer position using .iloc.
df.iloc[0:3, 0:1]  # Select first 3 rows and first column

• Perform boolean/conditional selection.
df[df['col1'] > 2]

• Filter rows using .isin().
df[df['col1'].isin([1, 3])]
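
One detail worth seeing in action: .loc slices include the end label, while .iloc slices exclude the end position. A small sketch on a made-up four-row DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4], 'col2': [5, 6, 7, 8]})

by_label = df.loc[0:2, ['col1']]       # label slice: end label 2 is included
by_position = df.iloc[0:2, 0:1]        # position slice: end position 2 is excluded

# Combine conditions with & (and) or | (or); each condition needs its own parentheses
filtered = df[(df['col1'] > 1) & (df['col2'] < 8)]
print(len(by_label), len(by_position), filtered['col1'].tolist())
```

With the default integer index the two look alike, which is why the difference is easy to miss.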


III. Data Cleaning

• Check for missing/null values.
df.isnull().sum()  # Returns a Series with counts of nulls per column

• Drop rows with any missing values.
df.dropna()

• Fill missing values with a specific value.
df.fillna(value=0)

• Check for duplicated rows.
df.duplicated()

• Drop duplicated rows.
df.drop_duplicates(inplace=True)
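
Most cleaning methods return a new DataFrame and leave the original untouched unless inplace=True is passed. The sketch below shows this on a small, made-up frame with one missing value and one duplicated row.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, np.nan, 3.0, 3.0], 'b': ['x', 'y', 'z', 'z']})

null_counts = df.isnull().sum()        # per-column count of missing values
dropped = df.dropna()                  # new DataFrame; df itself is unchanged
filled = df.fillna(value=0)            # NaN in column 'a' becomes 0
deduplicated = df.drop_duplicates()    # keeps the first of the two identical rows
print(null_counts['a'], len(dropped), len(deduplicated), len(df))
```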


IV. Data Manipulation & Operations

• Drop specified labels (columns or rows).
df.drop('col1', axis=1)  # Drop a column (returns a new DataFrame)

• Rename columns.
df.rename(columns={'col1': 'new_col1_name'})

• Set a column as the index.
df.set_index('col1')

• Reset the index.
df.reset_index(drop=True)

• Apply a function along an axis (e.g., per column).
df.apply(np.cumsum)

• Apply a function element-wise to a Series.
df['col1'].map(lambda x: x*100)

• Sort by values in a column.
df.sort_values(by='col1', ascending=False)

• Sort by index.
df.sort_index(axis=1, ascending=False)  # axis=1 sorts the column labels; omit it to sort rows by index

• Change the data type of a column.
df['col1'] = df['col1'].astype('float')  # astype returns a new Series, so assign it back

• Create a new column based on a calculation.
df['new_col'] = df['col1'] * 2


V. Grouping & Aggregation
• Group data by a column.
df.groupby('col1')

• Group by a column and get the sum.
df.groupby('col1').sum()

• Apply multiple aggregation functions at once.
df.groupby('col1').agg(['mean', 'count'])

• Get the size of each group.
df.groupby('col1').size()

• Get the frequency counts of unique values in a Series.
df['col1'].value_counts()

• Create a pivot table.
# Assumes df has columns 'A', 'B', 'C' and 'D'
pd.pivot_table(df, values='D', index=['A', 'B'], columns=['C'])
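
Grouping is easier to follow on concrete data. The sketch below uses a hypothetical sales table (the column names region, product and sales are invented for the example) to run the same groupby, agg and pivot_table calls.

```python
import pandas as pd

# Hypothetical sales data, used only for illustration
df = pd.DataFrame({
    'region': ['north', 'north', 'south', 'south'],
    'product': ['a', 'b', 'a', 'b'],
    'sales': [10, 20, 30, 40],
})

totals = df.groupby('region')['sales'].sum()
summary = df.groupby('region')['sales'].agg(['mean', 'count'])

# One row per region, one column per product, cells hold summed sales
pivot = pd.pivot_table(df, values='sales', index='region',
                       columns='product', aggfunc='sum')
print(totals, summary, pivot, sep='\n\n')
```

pivot_table aggregates with the mean unless aggfunc says otherwise, so it is worth setting aggfunc explicitly.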


VI. Merging, Joining & Concatenating

• Merge two DataFrames (like a SQL join).
# left_df and right_df are placeholder DataFrames that share 'key_column'
pd.merge(left_df, right_df, on='key_column')

• Concatenate (stack) DataFrames along an axis.
pd.concat([df1, df2])  # Stacks rows; df1 and df2 are placeholder DataFrames

• Join DataFrames on their indexes.
left_df.join(right_df, how='outer')
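
To make the join types concrete, here is a minimal sketch with two small, made-up DataFrames that overlap on two of their keys.

```python
import pandas as pd

left_df = pd.DataFrame({'key_column': [1, 2, 3], 'left_val': ['a', 'b', 'c']})
right_df = pd.DataFrame({'key_column': [2, 3, 4], 'right_val': ['x', 'y', 'z']})

inner = pd.merge(left_df, right_df, on='key_column')               # default how='inner': keys 2 and 3
outer = pd.merge(left_df, right_df, on='key_column', how='outer')  # keys 1 to 4, gaps filled with NaN
stacked = pd.concat([left_df, left_df], ignore_index=True)         # rows stacked, index renumbered
print(inner, outer, stacked, sep='\n\n')
```

Without ignore_index=True, concat keeps the original index labels, which leads to duplicates such as 0, 1, 2, 0, 1, 2.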


VII. Input & Output

• Write a DataFrame to a CSV file.
df.to_csv('output.csv', index=False)

• Write a DataFrame to an Excel file.
df.to_excel('output.xlsx', sheet_name='Sheet1')  # requires an Excel engine such as openpyxl

• Read data from an Excel file.
pd.read_excel('input.xlsx', sheet_name='Sheet1')

• Read from a SQL database.
pd.read_sql_query('SELECT * FROM my_table', connection_object)  # connection_object is a placeholder for an open DB connection


VIII. Time Series & Special Operations

• Use the string accessor (.str) for Series operations.
# s must hold strings here
s.str.lower()
s.str.contains('pattern')

• Use the datetime accessor (.dt) for Series operations.
# s must hold datetime values here
s.dt.year
s.dt.day_name()

• Create a rolling window calculation.
df['col1'].rolling(window=3).mean()

• Create a basic plot from a Series or DataFrame.
df['col1'].plot(kind='hist')
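
The accessors only work on a Series of the matching type, so this sketch builds a separate Series for each case. All names and values are invented for the example.

```python
import pandas as pd

dates = pd.Series(pd.date_range('2023-01-01', periods=6))   # daily timestamps
years = dates.dt.year
day_names = dates.dt.day_name()

values = pd.Series([1, 2, 3, 4, 5, 6])
rolling_mean = values.rolling(window=3).mean()   # first two entries are NaN

names = pd.Series(['Alpha', 'beta', 'Gamma'])
has_al = names.str.lower().str.contains('al')    # chained string operations
print(day_names.tolist(), rolling_mean.tolist(), has_al.tolist())
```

A rolling window of 3 needs three values before it can produce a result, which is why the first two entries are NaN; min_periods relaxes that.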


#Python #Pandas #DataAnalysis #DataScience #Programming
