Python | Machine Learning | Coding

Python | Machine Learning | Coding | R

Topic: Handling Datasets of All Types – Part 1 of 5: Introduction and Basic Concepts

---

1. What is a Dataset?

• A dataset is a collection of related data, often organized in rows and columns, that is used for analysis or for training machine learning models.

---

2. Types of Datasets

• Structured Data: Tables, spreadsheets with rows and columns (e.g., CSV, Excel).

• Unstructured Data: Images, text, audio, video.

• Semi-structured Data: JSON, XML files containing hierarchical data.

---

3. Common Dataset Formats

• CSV (Comma-Separated Values)

• Excel (.xls, .xlsx)

• JSON (JavaScript Object Notation)

• XML (eXtensible Markup Language)

• Images (JPEG, PNG, TIFF)

• Audio (WAV, MP3)
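As a rough illustration of how the formats above map onto the dataset types from section 2, here is a minimal sketch that guesses the category from a file extension. The `FORMAT_KINDS` table and the `dataset_kind` helper are hypothetical names made up for this example, and an extension is only a hint about what a file really contains.

```python
from pathlib import Path

# Hypothetical lookup table built from the formats listed above.
FORMAT_KINDS = {
    ".csv": "structured",
    ".xls": "structured",
    ".xlsx": "structured",
    ".json": "semi-structured",
    ".xml": "semi-structured",
    ".jpg": "unstructured",
    ".jpeg": "unstructured",
    ".png": "unstructured",
    ".tif": "unstructured",
    ".tiff": "unstructured",
    ".wav": "unstructured",
    ".mp3": "unstructured",
}


def dataset_kind(path):
    """Return the dataset category suggested by the file extension."""
    suffix = Path(path).suffix.lower()
    return FORMAT_KINDS.get(suffix, "unknown")


for name in ["sales.csv", "config.JSON", "scan.tiff", "notes.txt"]:
    print(name, "->", dataset_kind(name))
```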

---

4. Loading Datasets in Python

• Use libraries like pandas for structured data:

import pandas as pd
df = pd.read_csv('data.csv')

• Use the built-in json module for JSON files:

import json
with open('data.json') as f:
    data = json.load(f)
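JSON data is often nested, so it usually has to be flattened before it can be treated like a table. The sketch below shows one way to do that with the standard library only. The sample records and the `flatten` helper are made up for illustration; pandas offers `pd.json_normalize` for the same purpose.

```python
import json

# Made-up sample standing in for the contents of data.json.
raw = """
[
  {"id": 1, "name": "Ada", "address": {"city": "London", "zip": "N1"}},
  {"id": 2, "name": "Linus", "address": {"city": "Helsinki"}}
]
"""


def flatten(record, prefix=""):
    """Flatten nested dicts into one level, joining keys with dots."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


records = json.loads(raw)
rows = [flatten(r) for r in records]

# Collect every column seen in any row, keeping first-seen order.
columns = list(dict.fromkeys(key for row in rows for key in row))

print(columns)
for row in rows:
    # Keys missing from a record become None, like a missing table cell.
    print([row.get(col) for col in columns])
```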

---

5. Basic Dataset Exploration

• Check shape and size:

print(df.shape)

• Preview data:

print(df.head())

• Check for missing values:

print(df.isnull().sum())
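To try the exploration calls without a file on disk, the following sketch reads a small made-up CSV from memory. The column names and values are assumptions chosen only so that a couple of cells are missing.

```python
import io

import pandas as pd

# Made-up CSV text used in place of data.csv so the example is self-contained.
csv_text = """name,age,city
Ada,36,London
Linus,,Helsinki
Grace,45,
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)        # (rows, columns)
print(df.dtypes)       # column types inferred by pandas
missing = df.isnull().sum()
print(missing)         # missing values per column
print(df.describe())   # summary statistics for numeric columns
```

`df.dtypes` and `df.describe()` are useful next steps after `shape` and `head()`, because a column with missing numbers is often read with a different type than expected.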

---

6. Summary

• Understanding dataset types is crucial before processing.

• Loading and exploring datasets helps identify cleaning and preprocessing needs.

---

Exercise

• Load a CSV and JSON dataset in Python, print their shapes, and identify missing values.

---

#DataScience #Datasets #DataLoading #Python #DataExploration

The rest of the parts 👇
https://t.me/DataScienceM


Topic: Python Script to Convert a Shared ChatGPT Link to PDF – Step-by-Step Guide

---

### Objective

In this lesson, we’ll build a Python script that:

• Takes a ChatGPT share link (e.g., https://chat.openai.com/share/abc123)
• Downloads the HTML content of the chat
• Converts it to a PDF file using pdfkit and wkhtmltopdf

This is useful for archiving, sharing, or printing ChatGPT conversations in a clean format.

---

### 1. Prerequisites

Before starting, you need the following libraries and tools:

#### • Install pdfkit and requests

pip install pdfkit requests

#### • Install wkhtmltopdf

Download from:
https://wkhtmltopdf.org/downloads.html

Make sure to add the path of the installed binary to your system PATH.

---

### 2. Python Script: Convert Shared ChatGPT URL to PDF

import os
import sys

import pdfkit
import requests

# Define output filename and temporary HTML path
output_file = "chatgpt_conversation.pdf"
temp_html = "temp_chat.html"

# ChatGPT shared URL (user input)
chat_url = input("Enter the ChatGPT share URL: ").strip()

# Verify the URL format
if not chat_url.startswith("https://chat.openai.com/share/"):
    print("Invalid URL. Must start with https://chat.openai.com/share/")
    sys.exit(1)

try:
    # Download HTML content; the timeout keeps the script from hanging
    response = requests.get(chat_url, timeout=30)
    response.raise_for_status()  # Raises requests.HTTPError on 4xx/5xx

    # Save HTML to temporary file
    with open(temp_html, "w", encoding="utf-8") as f:
        f.write(response.text)

    # Convert HTML to PDF (requires wkhtmltopdf on the system PATH)
    pdfkit.from_file(temp_html, output_file)

    print(f"\n✅ PDF saved as: {output_file}")

except Exception as e:
    print(f"❌ Error: {e}")

finally:
    # Remove the temp file even if the download or conversion failed
    if os.path.exists(temp_html):
        os.remove(temp_html)

---

### 3. Notes

• This approach works only if the shared page is publicly accessible (which ChatGPT share links are).
• The PDF output will contain the web page version, including theme and layout.
• You can customize the PDF output using pdfkit options (like page size, margins, etc.).
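As a sketch of the customization mentioned in the last note, the options are passed to pdfkit as a plain dictionary whose keys mirror wkhtmltopdf's command-line flags. The `build_pdf_options` helper is a hypothetical name used here for illustration, and the values are only examples.

```python
def build_pdf_options(page_size="A4", margin="15mm"):
    """Build an options dict in the form pdfkit passes on to wkhtmltopdf."""
    return {
        "page-size": page_size,
        "margin-top": margin,
        "margin-right": margin,
        "margin-bottom": margin,
        "margin-left": margin,
        "encoding": "UTF-8",
    }


options = build_pdf_options(page_size="Letter", margin="20mm")
print(options)

# With pdfkit and wkhtmltopdf installed, the dict would be passed like this:
# pdfkit.from_file("temp_chat.html", "chatgpt_conversation.pdf", options=options)
```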

---

### 4. Optional Enhancements

• Add GUI with Tkinter
• Accept multiple URLs
• Add PDF metadata (title, author, etc.)
• Add support for offline rendering using BeautifulSoup to clean content
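For the "accept multiple URLs" enhancement, each link needs its own output file. One possible approach, sketched below, derives the PDF name from the last path segment of the share URL. The `share_url_to_pdf_name` helper and the `chatgpt_<id>.pdf` naming scheme are assumptions, not part of the script above.

```python
from urllib.parse import urlparse

SHARE_PREFIX = "https://chat.openai.com/share/"  # same prefix the script checks


def share_url_to_pdf_name(url):
    """Return a PDF filename for a share URL, or None if the URL is invalid."""
    url = url.strip()
    if not url.startswith(SHARE_PREFIX):
        return None
    # The last path segment is the share id, e.g. "abc123".
    share_id = urlparse(url).path.rstrip("/").split("/")[-1]
    if not share_id or share_id == "share":
        return None
    return f"chatgpt_{share_id}.pdf"


urls = [
    "https://chat.openai.com/share/abc123",
    "https://example.com/not-a-share-link",
]
for url in urls:
    name = share_url_to_pdf_name(url)
    print(url, "->", name if name else "skipped (invalid URL)")
```

Each valid URL can then go through the same download and convert steps as in the script, using the returned name as the output file.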

---

### Exercise

• Try converting multiple ChatGPT share links to PDF
• Customize the styling with your own CSS
• Add a timestamp or watermark to the PDF

---

#Python #ChatGPT #PDF #WebScraping #Automation #pdfkit #tkinter

https://t.me/CodeProgrammer


# 📚 Python Tutorial: Convert EPUB to PDF (Preserving Images)
#Python #EPUB #PDF #EbookConversion #Automation

This comprehensive guide will show you how to convert EPUB files (including those with images) to high-quality PDFs using Python.

---

## 🔹 Required Tools & Libraries
We'll use these Python packages:
- ebooklib - For EPUB parsing
- pdfkit (wrapper for wkhtmltopdf) - For PDF generation
- beautifulsoup4 - For HTML parsing (imported as bs4)
- Pillow - For image handling (optional)

pip install ebooklib pdfkit beautifulsoup4 pillow

Also install system dependencies:

# On Ubuntu/Debian
sudo apt-get install wkhtmltopdf

# On MacOS
brew install wkhtmltopdf

# On Windows (download from wkhtmltopdf.org)

---

## 🔹 Step 1: Extract EPUB Contents
First, we'll unpack the EPUB file to access its HTML and images.

import os

import ebooklib
from ebooklib import epub

def extract_epub(epub_path, output_dir):
    book = epub.read_epub(epub_path)

    # Create output directory
    os.makedirs(output_dir, exist_ok=True)

    # Item type constants live in the ebooklib package, not in ebooklib.epub
    wanted = (ebooklib.ITEM_IMAGE, ebooklib.ITEM_COVER,
              ebooklib.ITEM_STYLE, ebooklib.ITEM_DOCUMENT)
    html_files = []

    # Extract chapters, images, and stylesheets
    for item in book.get_items():
        if item.get_type() not in wanted:
            continue
        target = os.path.join(output_dir, item.get_name())
        # Item names can contain subfolders, so create them first
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, 'wb') as f:
            f.write(item.get_content())
        if item.get_type() == ebooklib.ITEM_DOCUMENT:
            html_files.append(item.get_name())

    return html_files
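An EPUB file is a ZIP archive, so a quick way to sanity-check what an extraction step should find is to list the archive entries with the standard library. The sketch below builds a tiny stand-in archive in memory. The entry names and the `list_epub_entries` helper are made up for illustration, and a real EPUB would be opened by path instead.

```python
import io
import zipfile

# Build a tiny stand-in archive in memory; the entry names are made up.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("mimetype", "application/epub+zip")
    zf.writestr("OEBPS/chapter1.xhtml", "<html><body><p>Hi</p></body></html>")
    zf.writestr("OEBPS/images/cover.png", b"not a real image")


def list_epub_entries(epub_file):
    """Split archive entries into HTML documents and images by extension."""
    html_exts = (".html", ".xhtml", ".htm")
    image_exts = (".png", ".jpg", ".jpeg", ".gif", ".svg")
    with zipfile.ZipFile(epub_file) as zf:
        names = zf.namelist()
    documents = [n for n in names if n.lower().endswith(html_exts)]
    images = [n for n in names if n.lower().endswith(image_exts)]
    return documents, images


buffer.seek(0)
documents, images = list_epub_entries(buffer)
print("Documents:", documents)
print("Images:", images)
```

Entries commonly sit in subfolders, which is worth keeping in mind when writing extracted files to disk.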

---

## 🔹 Step 2: Convert HTML to PDF
Now we'll convert the extracted HTML files to PDF while preserving images.

import os

import pdfkit
from bs4 import BeautifulSoup
from PIL import Image  # For image validation (optional)

def html_to_pdf(html_files, output_pdf, base_dir):
    options = {
        'encoding': "UTF-8",
        'quiet': '',
        'enable-local-file-access': '',  # Critical for local images
        'no-outline': None,
        'margin-top': '15mm',
        'margin-right': '15mm',
        'margin-bottom': '15mm',
        'margin-left': '15mm',
    }

    # Validate images (optional)
    for html_file in html_files:
        html_path = os.path.join(base_dir, html_file)
        with open(html_path, encoding='utf-8') as f:
            soup = BeautifulSoup(f, 'html.parser')
        changed = False
        for img in soup.find_all('img'):
            # src is relative to the HTML file, not to base_dir
            img_path = os.path.join(os.path.dirname(html_path), img.get('src', ''))
            try:
                with Image.open(img_path) as im:
                    im.verify()  # Validate image
            except Exception as e:
                print(f"Image error in {html_file}: {e}")
                img.decompose()  # Remove broken images
                changed = True
        # Write the cleaned HTML back so the removal reaches the PDF
        if changed:
            with open(html_path, 'w', encoding='utf-8') as f:
                f.write(str(soup))

    # Convert to PDF
    pdfkit.from_file(
        [os.path.join(base_dir, f) for f in html_files],
        output_pdf,
        options=options
    )

---

## 🔹 Step 3: Complete Conversion Function
Combine everything into a single workflow.

import shutil

def epub_to_pdf(epub_path, output_pdf, temp_dir="temp_epub"):
    try:
        print(f"Converting {epub_path} to PDF...")

        # Step 1: Extract EPUB
        print("Extracting EPUB contents...")
        html_files = extract_epub(epub_path, temp_dir)

        # Step 2: Convert to PDF
        print("Generating PDF...")
        html_to_pdf(html_files, output_pdf, temp_dir)

        print(f"Success! PDF saved to {output_pdf}")
        return True

    except Exception as e:
        print(f"Conversion failed: {e}")
        return False
    finally:
        # Clean up temporary files (no error if the folder was never created)
        shutil.rmtree(temp_dir, ignore_errors=True)

---

## 🔹 Advanced Options
### 1. Custom Styling
Add CSS to improve PDF appearance:

def html_to_pdf(html_files, output_pdf, base_dir):
    # Write the stylesheet next to the extracted files
    css_path = os.path.join(base_dir, 'styles.css')
    css = """
    body { font-family: "Times New Roman", serif; font-size: 12pt; }
    img { max-width: 100%; height: auto; }
    """
    with open(css_path, 'w', encoding='utf-8') as f:
        f.write(css)

    options = {
        # ... previous options ...
        'user-style-sheet': css_path,  # Custom CSS (the file written above)
    }

    pdfkit.from_file(
        [os.path.join(base_dir, f) for f in html_files],
        output_pdf,
        options=options
    )


📚 JaidedAI/EasyOCR — an open-source Python library for Optical Character Recognition (OCR) that's easy to use and supports over 80 languages out of the box.

### 🔍 Key Features:

🔸 Extracts text from images and scanned documents — including handwritten notes and unusual fonts
🔸 Supports a wide range of languages like English, Russian, Chinese, Arabic, and more
🔸 Built on PyTorch — uses modern deep learning models (not the old-school Tesseract)
🔸 Simple to integrate into your Python projects

### ✅ Example Usage:

import easyocr

reader = easyocr.Reader(['en', 'ru'])  # Choose supported languages
result = reader.readtext('image.png')
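By default `readtext` returns a list of detections, each holding a bounding box, the recognized text, and a confidence score. The sketch below shows one way to keep only the confident detections. The sample data is hand-written to match that shape and is not real OCR output, and `confident_text` is a hypothetical helper.

```python
# Hand-written sample in the shape readtext() returns by default:
# a list of (bounding_box, text, confidence) entries. Not real OCR output.
sample_result = [
    ([[10, 10], [120, 10], [120, 40], [10, 40]], "Invoice", 0.98),
    ([[10, 50], [200, 50], [200, 80], [10, 80]], "T0tal: 42", 0.41),
    ([[10, 90], [180, 90], [180, 120], [10, 120]], "Thank you", 0.87),
]


def confident_text(result, min_confidence=0.5):
    """Keep only the text of detections at or above the confidence threshold."""
    return [text for _bbox, text, conf in result if conf >= min_confidence]


lines = confident_text(sample_result)
print("\n".join(lines))
```

The 0.5 threshold is an arbitrary starting point, and a suitable value depends on the images being processed.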

### 📌 Ideal For:

✅ Text extraction from photos, scans, and documents
✅ Embedding OCR capabilities in apps (e.g. automated data entry)

🔗 GitHub: https://github.com/JaidedAI/EasyOCR

👉 Follow us for more: @DataScienceN

#Python #OCR #MachineLearning #ComputerVision #EasyOCR


🧹

ObjectClear — an AI-powered tool for removing objects from images effortlessly.

⚙️ What It Can Do:

🖼️ Upload any image
🎯 Select the object you want to remove
🌟 The model automatically erases the object and intelligently reconstructs the background

⚡️ Under the Hood:

— Uses Segment Anything (SAM) by Meta for object segmentation
— Leverages Inpaint-Anything for realistic background generation
— Works in your browser with an intuitive Gradio UI

✔️ Fully open-source and can be run locally.

📎 GitHub: https://github.com/zjx0101/ObjectClear

#AI #ImageEditing #ComputerVision #Gradio #OpenSource #Python

✉️ Our Telegram channels: https://t.me/addlist/0f6vfFbEMdAwODBk

📱 Our WhatsApp channel: https://whatsapp.com/channel/0029VaC7Weq29753hpcggW2A


🚀 Comprehensive Tutorial: Build a Folder Monitoring & Intruder Detection System in Python

In this step-by-step tutorial, you will learn how to build a real-time folder monitoring and intruder detection system using Python.

🔐 Your Goal:
Create a background program that:
- Monitors a specific folder on your computer.
- Instantly captures a photo using the webcam whenever someone opens that folder.
- Saves the photo with a timestamp in a secure folder.
- Runs automatically when Windows starts.
- Keeps running until you manually stop it (e.g., via Task Manager or a hotkey).
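As a small illustration of the "saves the photo with a timestamp" goal, here is a minimal sketch that builds a timestamped file path. It is not the code from the linked tutorial, and the `capture_path` helper, the `captures` folder, and the `intruder_` prefix are assumptions.

```python
from datetime import datetime
from pathlib import Path


def capture_path(save_dir, when=None):
    """Build a timestamped file path inside save_dir for a captured photo."""
    when = when or datetime.now()
    stamp = when.strftime("%Y%m%d_%H%M%S")
    return Path(save_dir) / f"intruder_{stamp}.jpg"


# A fixed datetime is passed here only to make the result predictable.
path = capture_path("captures", datetime(2025, 1, 1, 12, 0, 0))
print(path.name)
```

A sortable timestamp format like this keeps the saved photos in chronological order when the folder is listed by name.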

Read and get code: https://hackmd.io/@husseinsheikho/Build-a-Folder-Monitoring

#Python #Security #FolderMonitoring #IntruderDetection #OpenCV #FaceCapture #Automation #Windows #TaskScheduler #ComputerVision


🚀 Comprehensive Guide: How to Prepare for an Image Processing Job Interview – 500 Most Common Interview Questions

Let's start: https://hackmd.io/@husseinsheikho/IP

#ImageProcessing #ComputerVision #OpenCV #Python #InterviewPrep #DigitalImageProcessing #MachineLearning #AI #SignalProcessing #ComputerGraphics


🥇

This repo is like gold for every data scientist!

✅ Just open your browser: a large set of interactive exercises and hands-on examples awaits. Whatever your question about statistics, probability, Python, or machine learning, you can look up the answer right there, with code, charts, and even animations. That saves time and helps what you learn stick.

⬅️ Data science statistics and probability topics
⬅️ Clustering
⬅️ Principal Component Analysis (PCA)
⬅️ Bagging and Boosting techniques
⬅️ Linear regression
⬅️ Neural networks and more...

┌ 📂 Int Data Science Python Dash
└ 🐱 GitHub-Repos

👉

@codeprogrammer

#Python #OpenCV #Automation #ML #AI #DEEPLEARNING #MACHINELEARNING #ComputerVision


python-docx: Create and Modify Word Documents #python

python-docx is a Python library for reading, creating, and updating Microsoft Word 2007+ (.docx) files.

Installation

pip install python-docx

Example

>>> from docx import Document

>>> document = Document()
>>> document.add_paragraph("It was a dark and stormy night.")
<docx.text.paragraph.Paragraph object at 0x10f19e760>
>>> document.save("dark-and-stormy.docx")

>>> document = Document("dark-and-stormy.docx")
>>> document.paragraphs[0].text
'It was a dark and stormy night.'

https://t.me/DataScienceN 🚗


Forwarded from Python | Algorithms | Data Structures | Cyber Security | Networks

✨ Download a Free Python Cheat Sheet ✨

📖 Download a free Python 3 cheat sheet PDF put together by the Real Python team.

🏷️ #Python


🚀 2025 FREE Study Resources from SPOTO for y’all. Don’t Miss Out!
✅ 100% Free Downloads
✅ No signup / spam

📘 #Python, Cybersecurity & Excel: https://bit.ly/4lYeVYp
📊 #Cloud Computing: https://bit.ly/45Rj1gm
☁️ #AI Kits: https://bit.ly/4m4bHTc
🔐 #CCNA Courses: https://bit.ly/45TL7rm
🧠 Free Online Practice – Test Now: https://bit.ly/41Kurjr

From September 8th to 21st, SPOTO is offering its lowest prices ever on ALL products! 🔥
Amazing Discounts for 📌 CCNA 200-301 📌 CCNP 400-007 and more…
📲 Contact admin to grab them: https://wa.link/uxde01


Forwarded from Python | Algorithms | Data Structures | Cyber Security | Networks

✨ Python Cheat Sheet ✨

📖 Compact Python cheat sheet covering setup, syntax, data types, variables, strings, control flow, functions, classes, errors, and I/O.

🏷️ #Python


Real-Time Voice Cloning utility

It clones a voice from just a few seconds of recorded audio and can then reproduce any phrase with your intonation.

It runs on #Python, generates speech in real time, and works entirely locally, with no cloud service or usage restrictions. 🫠

🌟 GitHub: https://github.com/CorentinJ/Real-Time-Voice-Cloning

👉

https://t.me/CodeProgrammer
