Machine Learning

20x faster KMeans with Faiss!!

Standard #KMeans uses a slow, exhaustive search: in every iteration, each data point is compared against every centroid to find the nearest one.
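
To make "exhaustive search" concrete, here is a minimal pure-Python sketch of the assignment step. The function name `assign_exhaustive` and the toy data are made up for illustration; this is not scikit-learn's or Faiss's actual code.

```python
import math

def assign_exhaustive(points, centroids):
    """Return, for each point, the index of its nearest centroid.

    Every point is compared against every centroid, so one pass costs
    len(points) * len(centroids) distance computations.
    """
    labels = []
    for p in points:
        distances = [math.dist(p, c) for c in centroids]
        labels.append(distances.index(min(distances)))
    return labels

# Toy data (hypothetical): two obvious groups and two centroids.
points = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9)]
centroids = [(0.0, 0.0), (5.0, 5.0)]
labels = assign_exhaustive(points, centroids)
```

With many points and many centroids, this nested comparison is the part that dominates the runtime of each iteration.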

#Faiss uses an "inverted index", an optimized data structure that stores and indexes data points for approximate nearest-neighbor search.
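
The idea behind an inverted index can be sketched in a few lines of pure Python. This is only an illustration of the concept, not Faiss's implementation: the helper names (`nearest`, `build_inverted_index`, `search`), the `nprobe` parameter as used here, and the toy vectors are all assumptions. Vectors are bucketed by their nearest "coarse" centroid, and a query scans only the closest bucket(s) instead of every stored vector.

```python
import math
from collections import defaultdict

def nearest(vec, candidates, n=1):
    """Indices of the n candidates closest to vec (brute force)."""
    order = sorted(range(len(candidates)),
                   key=lambda i: math.dist(vec, candidates[i]))
    return order[:n]

def build_inverted_index(vectors, coarse_centroids):
    """Map each coarse cell id to the ids of the vectors that fall in it."""
    lists = defaultdict(list)
    for vec_id, vec in enumerate(vectors):
        cell = nearest(vec, coarse_centroids)[0]
        lists[cell].append(vec_id)
    return lists

def search(query, vectors, coarse_centroids, lists, nprobe=1):
    """Approximate nearest neighbor: scan only the nprobe closest cells."""
    candidate_ids = []
    for cell in nearest(query, coarse_centroids, nprobe):
        candidate_ids.extend(lists.get(cell, []))
    if not candidate_ids:
        return None
    return min(candidate_ids, key=lambda i: math.dist(query, vectors[i]))

# Toy data (hypothetical): five stored vectors, three coarse cells.
vectors = [(0.0, 0.0), (0.5, 0.2), (9.0, 9.0), (9.5, 8.8), (0.1, 9.0)]
coarse_centroids = [(0.0, 0.0), (9.0, 9.0), (0.0, 9.0)]
lists = build_inverted_index(vectors, coarse_centroids)
best = search((9.2, 9.1), vectors, coarse_centroids, lists, nprobe=1)
```

The search is approximate because the true nearest neighbor may sit in a cell that was not probed; raising `nprobe` trades speed for accuracy.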

#MachineLearning #DeepLearning #BigData #Datascience #ML #HealthTech #DataVisualization #ArtificialIntelligence #SoftwareEngineering #GenAI #ChatGPT #OpenAI #python #AI #keras

https://t.me/DataScienceM



💡 Python: Simple K-Means Clustering Project

K-Means is a popular unsupervised machine learning algorithm used to partition n observations into k clusters, where each observation belongs to the cluster with the nearest mean (centroid). This simple project demonstrates K-Means on the classic Iris dataset, using scikit-learn to group similar flowers based on their measurements.

import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# 1. Load the Iris dataset
iris = load_iris()
X = iris.data # Features (sepal length, sepal width, petal length, petal width)
y = iris.target # True labels (0, 1, 2 for different species) - not used by KMeans

# 2. (Optional but recommended) Scale the features
# K-Means is sensitive to the scale of features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# 3. Define and train the K-Means model
# We know there are 3 species in Iris, so we set n_clusters=3
kmeans = KMeans(n_clusters=3, random_state=42, n_init=10) # n_init=10: run 10 random initializations and keep the best one
kmeans.fit(X_scaled)

# 4. Get the cluster assignments for each data point
labels = kmeans.labels_

# 5. Get the coordinates of the cluster centroids
centroids = kmeans.cluster_centers_

# 6. Visualize the clusters (using first two features for simplicity)
plt.figure(figsize=(8, 6))

# Plot each cluster
colors = ['red', 'green', 'blue']
for i in range(3):
    plt.scatter(X_scaled[labels == i, 0], X_scaled[labels == i, 1],
                s=50, c=colors[i], label=f'Cluster {i+1}', alpha=0.7)

# Plot the centroids
plt.scatter(centroids[:, 0], centroids[:, 1],
            s=200, marker='X', c='black', label='Centroids', edgecolor='white')

plt.title('K-Means Clustering on Iris Dataset (Scaled Features)')
plt.xlabel('Scaled Sepal Length')
plt.ylabel('Scaled Sepal Width')
plt.legend()
plt.grid(True)
plt.show()

# You can also compare with true labels (for evaluation, not part of clustering process itself)
# print("True labels:", y)
# print("K-Means labels:", labels)

Code explanation: This script loads the Iris dataset, scales its features using StandardScaler, and then applies KMeans to group the data into 3 clusters. It visualizes the resulting clusters and their centroids using a scatter plot with the first two scaled features.
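
A note on the commented-out comparison at the end of the script: cluster IDs are arbitrary, so cluster 0 need not correspond to species 0, and comparing `y` and `labels` element by element can be misleading. One simple permutation-independent check is cluster purity, sketched below in pure Python. The helper `cluster_purity` and the toy label lists are made up for illustration; scikit-learn also provides ready-made metrics such as `sklearn.metrics.adjusted_rand_score`.

```python
from collections import Counter, defaultdict

def cluster_purity(true_labels, cluster_labels):
    """Fraction of points that belong to the majority true class of their cluster."""
    members = defaultdict(list)
    for t, c in zip(true_labels, cluster_labels):
        members[c].append(t)
    # For each cluster, count how many points share its most common true class.
    majority_total = sum(Counter(ts).most_common(1)[0][1]
                         for ts in members.values())
    return majority_total / len(true_labels)

# Toy labels (hypothetical): cluster IDs are permuted relative to the classes,
# and one point of class 1 ended up in the cluster dominated by class 2.
true_toy = [0, 0, 0, 1, 1, 1, 2, 2, 2]
cluster_toy = [2, 2, 2, 0, 0, 1, 1, 1, 1]
purity = cluster_purity(true_toy, cluster_toy)
```

In the script above, the same function could be called as `cluster_purity(y, labels)`.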

#Python #MachineLearning #KMeans #Clustering #DataScience

━━━━━━━━━━━━━━━
By: @DataScienceM ✨



📌 The Machine Learning “Advent Calendar” Day 4: k-Means in Excel

🗂 Category: MACHINE LEARNING

🕒 Date: 2025-12-04 | ⏱️ Read time: 7 min read

Discover how to implement the k-Means clustering algorithm, a fundamental machine learning technique, using only Microsoft Excel. This guide, part of a "Machine Learning Advent Calendar" series, walks through building a training algorithm from scratch in a familiar spreadsheet environment, demystifying what "real" ML looks like in practice.
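
For readers who prefer code to spreadsheets, the same "from scratch" training loop can be sketched in pure Python. This is not the article's Excel workbook, only a minimal illustration of the standard k-means iteration (assign each point to its nearest centroid, then move each centroid to the mean of its points). The function name `kmeans`, the hand-picked starting centroids, and the toy points are assumptions.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Plain k-means on a list of equal-length tuples."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(len(centroids)),
                      key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: each centroid moves to the mean of its assigned points.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(old)  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return centroids, labels

# Toy data (hypothetical): two well-separated groups of three points.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, labels = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
```

Each pass of the loop corresponds to one round of recomputing the assignment and centroid values.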

#MachineLearning #kMeans #Excel #DataScience #Tutorial

