Python | Machine Learning | Coding | R
63.7K subscribers
1.14K photos
70 videos
144 files
793 links
Help and ads: @hussein_sheikho

Discover powerful insights with Python, Machine Learning, Coding, and R—your essential toolkit for data-driven solutions, smart alg

List of our channels:
https://t.me/addlist/8_rRW2scgfRhOTc0

https://telega.io/?r=nikapsOH
Download Telegram
Python | Machine Learning | Coding | R
Photo
# 📚 Python Tutorial: Convert EPUB to PDF (Preserving Images)
#Python #EPUB #PDF #EbookConversion #Automation

This comprehensive guide will show you how to convert EPUB files (including those with images) to high-quality PDFs using Python.

---

## 🔹 Required Tools & Libraries
We'll use these Python packages:
- ebooklib - For EPUB parsing
- pdfkit (wrapper for wkhtmltopdf) - For PDF generation
- Pillow - For image handling (optional)

pip install ebooklib pdfkit pillow


Also install system dependencies:
# On Ubuntu/Debian
sudo apt-get install wkhtmltopdf

# On MacOS
brew install wkhtmltopdf

# On Windows (download from wkhtmltopdf.org)


---

## 🔹 Step 1: Extract EPUB Contents
First, we'll unpack the EPUB file to access its HTML and images.

from ebooklib import epub
from bs4 import BeautifulSoup
import os

def extract_epub(epub_path, output_dir):
book = epub.read_epub(epub_path)

# Create output directory
os.makedirs(output_dir, exist_ok=True)

# Extract all items (chapters, images, styles)
for item in book.get_items():
if item.get_type() == epub.ITEM_IMAGE:
# Save images
with open(os.path.join(output_dir, item.get_name()), 'wb') as f:
f.write(item.get_content())
elif item.get_type() == epub.ITEM_DOCUMENT:
# Save HTML chapters
with open(os.path.join(output_dir, item.get_name()), 'wb') as f:
f.write(item.get_content())

return [item.get_name() for item in book.get_items() if item.get_type() == epub.ITEM_DOCUMENT]


---

## 🔹 Step 2: Convert HTML to PDF
Now we'll convert the extracted HTML files to PDF while preserving images.

import pdfkit
from PIL import Image # For image validation (optional)

def html_to_pdf(html_files, output_pdf, base_dir):
options = {
'encoding': "UTF-8",
'quiet': '',
'enable-local-file-access': '', # Critical for local images
'no-outline': None,
'margin-top': '15mm',
'margin-right': '15mm',
'margin-bottom': '15mm',
'margin-left': '15mm',
}

# Validate images (optional)
for html_file in html_files:
soup = BeautifulSoup(open(os.path.join(base_dir, html_file)), 'html.parser')
for img in soup.find_all('img'):
img_path = os.path.join(base_dir, img['src'])
try:
Image.open(img_path) # Validate image
except Exception as e:
print(f"Image error in {html_file}: {e}")
img.decompose() # Remove broken images

# Convert to PDF
pdfkit.from_file(
[os.path.join(base_dir, f) for f in html_files],
output_pdf,
options=options
)


---

## 🔹 Step 3: Complete Conversion Function
Combine everything into a single workflow.

def epub_to_pdf(epub_path, output_pdf, temp_dir="temp_epub"):
try:
print(f"Converting {epub_path} to PDF...")

# Step 1: Extract EPUB
print("Extracting EPUB contents...")
html_files = extract_epub(epub_path, temp_dir)

# Step 2: Convert to PDF
print("Generating PDF...")
html_to_pdf(html_files, output_pdf, temp_dir)

print(f"Success! PDF saved to {output_pdf}")
return True

except Exception as e:
print(f"Conversion failed: {str(e)}")
return False
finally:
# Clean up temporary files
if os.path.exists(temp_dir):
import shutil
shutil.rmtree(temp_dir)


---

## 🔹 Advanced Options
### 1. Custom Styling
Add CSS to improve PDF appearance:

def html_to_pdf(html_files, output_pdf, base_dir):
options = {
# ... previous options ...
'user-style-sheet': 'styles.css', # Custom CSS
}

# Create CSS file if needed
css = """
body { font-family: "Times New Roman", serif; font-size: 12pt; }
img { max-width: 100%; height: auto; }
"""
with open(os.path.join(base_dir, 'styles.css'), 'w') as f:
f.write(css)

pdfkit.from_file(/* ... */)
4🔥2🎉1