#PDF #EPUB #TelegramBot #Python #SQLite #Project
Lesson: Building a PDF <> EPUB Telegram Converter Bot
This lesson walks you through creating a fully functional Telegram bot from scratch. The bot will accept PDF or EPUB files, convert them to the other format, and log each transaction in an SQLite database.
---
Part 1: Prerequisites & Setup
First, we need to install the necessary Python library for the Telegram Bot API. We will also rely on Calibre's command-line tools for conversion.
Important: You must install Calibre on the system where the bot will run and ensure its ebook-convert tool is in your system's PATH.
pip install python-telegram-bot==20.3
#Setup #Prerequisites
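Optional sanity check (not part of the original lesson code): before starting the bot, you can confirm that ebook-convert is actually reachable from Python. A minimal sketch, with a hypothetical script name:
# check_calibre.py (hypothetical helper)
import shutil

# shutil.which() returns the executable's full path if it is on PATH, otherwise None
calibre_path = shutil.which("ebook-convert")
if calibre_path:
    print(f"Found ebook-convert at: {calibre_path}")
else:
    print("ebook-convert not found. Install Calibre and make sure it is on your PATH.")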
---
Part 2: Database Initialization
We'll use SQLite to log every successful conversion. Create a file named database_setup.py and run it once to create the database file and the table.
# database_setup.py
import sqlite3
def setup_database():
    conn = sqlite3.connect('conversions.db')
    cursor = conn.cursor()
    # Create table to store conversion logs
    cursor.execute('''
        CREATE TABLE IF NOT EXISTS conversions (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            user_id INTEGER NOT NULL,
            original_filename TEXT NOT NULL,
            converted_filename TEXT NOT NULL,
            conversion_type TEXT NOT NULL,
            timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
        )
    ''')
    conn.commit()
    conn.close()
    print("Database setup complete. 'conversions.db' is ready.")

if __name__ == '__main__':
    setup_database()
#Database #SQLite #Initialization
---
Part 3: The Main Bot Script - Imports & Basic Commands
Now, let's create our main bot file, converter_bot.py. We'll start with imports and the initial /start and /help commands.
# converter_bot.py
import logging
import os
import sqlite3
import subprocess
from telegram import Update
from telegram.ext import Application, CommandHandler, MessageHandler, filters, ContextTypes
# Enable logging
logging.basicConfig(format='%(asctime)s - %(name)s - %(levelname)s - %(message)s', level=logging.INFO)
# --- Bot Token ---
TELEGRAM_TOKEN = "YOUR_TELEGRAM_BOT_TOKEN"
# --- Command Handlers ---
async def start(update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
    user = update.effective_user
    await update.message.reply_html(
        rf"Hi {user.mention_html()}! Send me a PDF or EPUB file to convert.",
    )

async def help_command(update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
    await update.message.reply_text("Simply send a .pdf file to get an .epub, or send an .epub file to get a .pdf. Note: Conversion quality depends on the source file's structure.")
#TelegramBot #Python #Boilerplate
---
Part 4: The Core Conversion Logic
This function will be the heart of our bot. It uses the ebook-convert command-line tool (from Calibre) to perform the conversion. It's crucial that Calibre is installed correctly for this to work.
# converter_bot.py (continued)
def run_conversion(input_path: str, output_path: str) -> bool:
    """Runs the ebook-convert command and returns True on success."""
    try:
        command = ['ebook-convert', input_path, output_path]
        result = subprocess.run(command, check=True, capture_output=True, text=True)
        logging.info(f"Calibre output: {result.stdout}")
        return True
    except FileNotFoundError:
        logging.error("CRITICAL: 'ebook-convert' command not found. Is Calibre installed and in the system's PATH?")
        return False
    except subprocess.CalledProcessError as e:
        logging.error(f"Conversion failed for {input_path}. Error: {e.stderr}")
        return False
#Conversion #Calibre #Subprocess
---
Part 5: Database Logging Function
This helper function will connect to our SQLite database and insert a new record for each successful conversion.
# converter_bot.py (continued)
def log_to_db(user_id: int, original_file: str, converted_file: str, conv_type: str):
    """Logs a successful conversion to the SQLite database."""
    try:
        conn = sqlite3.connect('conversions.db')
        cursor = conn.cursor()
        cursor.execute(
            "INSERT INTO conversions (user_id, original_filename, converted_filename, conversion_type) VALUES (?, ?, ?, ?)",
            (user_id, original_file, converted_file, conv_type)
        )
        conn.commit()
        conn.close()
    except sqlite3.Error as e:
        logging.error(f"Database error: {e}")
#Database #Logging #SQLite
---
Part 6: Handling Incoming Files
This is the main handler that will be triggered when a user sends a document. It downloads the file, determines the target format, calls the conversion function, sends the result back, logs it, and cleans up.
# converter_bot.py (continued)
async def handle_document(update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
    doc = update.message.document
    file_id = doc.file_id
    file_name = doc.file_name
    input_path = os.path.join("downloads", file_name)
    os.makedirs("downloads", exist_ok=True)  # Ensure download directory exists
    new_file = await context.bot.get_file(file_id)
    await new_file.download_to_drive(input_path)
    await update.message.reply_text(f"Received '{file_name}'. Starting conversion...")
    output_path = ""
    conversion_type = ""
    if file_name.lower().endswith('.pdf'):
        output_path = input_path.rsplit('.', 1)[0] + '.epub'
        conversion_type = "PDF -> EPUB"
    elif file_name.lower().endswith('.epub'):
        output_path = input_path.rsplit('.', 1)[0] + '.pdf'
        conversion_type = "EPUB -> PDF"
    else:
        await update.message.reply_text("Sorry, I only support PDF and EPUB files.")
        os.remove(input_path)
        return
    # Run the conversion
    success = run_conversion(input_path, output_path)
    if success and os.path.exists(output_path):
        await update.message.reply_text("Conversion successful! Uploading your file...")
        with open(output_path, 'rb') as converted_file:
            await context.bot.send_document(chat_id=update.effective_chat.id, document=converted_file)
        # Log to database
        log_to_db(update.effective_user.id, file_name, os.path.basename(output_path), conversion_type)
    else:
        await update.message.reply_text("An error occurred during conversion. Please check the file and try again. The file might be corrupted or protected.")
    # Cleanup
    if os.path.exists(input_path):
        os.remove(input_path)
    if os.path.exists(output_path):
        os.remove(output_path)
#FileHandler #BotLogic
---
Part 7: Main Execution Block
Finally, this block sets up the application, registers all our handlers, and starts the bot. This code goes at the end of converter_bot.py.
# converter_bot.py (continued)
def main() -> None:
    """Start the bot."""
    application = Application.builder().token(TELEGRAM_TOKEN).build()
    # Register handlers
    application.add_handler(CommandHandler("start", start))
    application.add_handler(CommandHandler("help", help_command))
    application.add_handler(MessageHandler(filters.Document.ALL, handle_document))
    # Run the bot until the user presses Ctrl-C
    print("Bot is running...")
    application.run_polling()

if __name__ == '__main__':
    main()
#Main #Execution #RunBot
---
Part 8: Results & Discussion
To Run:
• Run python database_setup.py once.
• Replace "YOUR_TELEGRAM_BOT_TOKEN" in converter_bot.py with your actual token from BotFather.
• Run python converter_bot.py.
• Send a PDF or EPUB file to your bot on Telegram.
Expected Results:
• The bot will acknowledge the file.
• After a short processing time, it will send back the converted file.
• A new entry will be added to the conversions.db file.
Viewing the Database:
You can inspect the conversions.db file using a tool like "DB Browser for SQLite" or the command line:
sqlite3 conversions.db "SELECT * FROM conversions;"
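If you prefer to stay in Python, here is a minimal sketch (using only the standard sqlite3 module and the schema created by database_setup.py above) that prints every logged conversion; the script name is just a suggestion:
# inspect_db.py (hypothetical helper)
import sqlite3

conn = sqlite3.connect('conversions.db')
cursor = conn.cursor()
# Iterate over all rows the bot has logged so far
for row in cursor.execute("SELECT * FROM conversions"):
    print(row)
conn.close()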
Discussion & Limitations:
• Dependency: The bot is entirely dependent on a local installation of Calibre. This makes it hard to deploy on simple hosting services. A Docker-based deployment would be a good solution.
• Conversion Quality: Converting from PDF, especially those with complex layouts, images, and columns, can result in poor EPUB formatting. This is a fundamental limitation of PDF-to-EPUB conversion, not just a flaw in the bot.
• Synchronous Processing: The bot handles one file at a time. If two users send files simultaneously, one has to wait. For a larger scale, a task queue system (like Celery with Redis) would be necessary to handle conversions asynchronously in the background. A lighter-weight first step is sketched after this list.
• Error Handling: The current error messaging is generic. Advanced versions could parse Calibre's error output to give users more specific feedback (e.g., "This PDF is password-protected").
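Following up on the Synchronous Processing point, a lighter-weight option than a full task queue is to offload the blocking subprocess call to a worker thread so the event loop keeps serving other users. A minimal sketch, assuming Python 3.9+ and the run_conversion function from Part 4 (not part of the original lesson code):
# converter_bot.py (optional sketch)
import asyncio

async def convert_in_background(input_path: str, output_path: str) -> bool:
    # asyncio.to_thread runs the blocking Calibre call in a separate thread,
    # so handle_document can await it without freezing the whole bot.
    return await asyncio.to_thread(run_conversion, input_path, output_path)

# Inside handle_document you would then write:
# success = await convert_in_background(input_path, output_path)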
#Results #Discussion #Limitations #Scalability
━━━━━━━━━━━━━━━
By: @CodeProgrammer ✨
Top 100 Data Analysis Commands & Functions
#DataAnalysis #Pandas #DataLoading #Inspection
Part 1: Pandas - Data Loading & Inspection
#1. pd.read_csv()
Reads a comma-separated values (csv) file into a Pandas DataFrame.
import pandas as pd
from io import StringIO
csv_data = "col1,col2,col3\n1,a,True\n2,b,False"
df = pd.read_csv(StringIO(csv_data))
print(df)
col1 col2 col3
0 1 a True
1 2 b False
#2. df.head()
Returns the first n rows of the DataFrame (default is 5).
import pandas as pd
df = pd.DataFrame({'A': range(10), 'B': list('abcdefghij')})
print(df.head(3))
A B
0 0 a
1 1 b
2 2 c
#3. df.tail()
Returns the last n rows of the DataFrame (default is 5).
import pandas as pd
df = pd.DataFrame({'A': range(10), 'B': list('abcdefghij')})
print(df.tail(3))
A B
7 7 h
8 8 i
9 9 j
#4. df.info()
Prints a concise summary of a DataFrame, including data types and non-null values.
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [1, 2, np.nan], 'B': ['x', 'y', 'z']})
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 A 2 non-null float64
1 B 3 non-null object
dtypes: float64(1), object(1)
memory usage: 176.0+ bytes
#5. df.describe()
Generates descriptive statistics for numerical columns.
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3, 4, 5]})
print(df.describe())
A
count 5.000000
mean 3.000000
std 1.581139
min 1.000000
25% 2.000000
50% 3.000000
75% 4.000000
max 5.000000
#6. df.shape
Returns a tuple representing the dimensionality (rows, columns) of the DataFrame.
import pandas as pd
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'C': [5, 6]})
print(df.shape)
(2, 3)
#7. df.columns
Returns the column labels of the DataFrame.
import pandas as pd
df = pd.DataFrame({'Name': ['Alice'], 'Age': [30]})
print(df.columns)
Index(['Name', 'Age'], dtype='object')
#8. df.dtypes
Returns the data types of each column.
import pandas as pd
df = pd.DataFrame({'A': [1, 2], 'B': [1.1, 2.2], 'C': ['x', 'y']})
print(df.dtypes)
A int64
B float64
C object
dtype: object
#9. df['col'].value_counts()
Returns a Series containing counts of unique values in a column.
import pandas as pd
df = pd.DataFrame({'Fruit': ['Apple', 'Banana', 'Apple', 'Orange', 'Banana', 'Apple']})
print(df['Fruit'].value_counts())
Apple 3
Banana 2
Orange 1
Name: Fruit, dtype: int64
#10. df['col'].unique()
Returns an array of the unique values in a column.
import pandas as pd
df = pd.DataFrame({'Fruit': ['Apple', 'Banana', 'Apple', 'Orange']})
print(df['Fruit'].unique())
['Apple' 'Banana' 'Orange']
#11. df['col'].nunique()
Returns the number of unique values in a column.
import pandas as pd
df = pd.DataFrame({'Fruit': ['Apple', 'Banana', 'Apple', 'Orange']})
print(df['Fruit'].nunique())
3
#12. df.isnull()
Returns a DataFrame of boolean values indicating missing values.
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [1, np.nan], 'B': [np.nan, 'x']})
print(df.isnull())
A B
0 False True
1 True False
#13. df.isnull().sum()
Returns the number of missing values in each column.
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [1, np.nan, 3, np.nan], 'B': [5, 6, 7, 8]})
print(df.isnull().sum())
A 2
B 0
dtype: int64
#14. df.to_csv()
Writes the DataFrame to a comma-separated values (csv) file.
import pandas as pd
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
csv_output = df.to_csv(index=False)
print(csv_output)
A,B
1,3
2,4
#15. df.copy()
Creates a deep copy of a DataFrame.
import pandas as pd
df1 = pd.DataFrame({'A': [1]})
df2 = df1.copy()
df2.loc[0, 'A'] = 99
print(f"Original df1:\n{df1}")
print(f"Copied df2:\n{df2}")
Original df1:
A
0 1
Copied df2:
A
0 99
---
#DataAnalysis #Pandas #Selection #Indexing
Part 2: Pandas - Data Selection & Indexing
#16. df['col']
Selects a single column as a Series.
import pandas as pd
df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [30, 25]})
print(df['Name'])
0 Alice
1 Bob
Name: Name, dtype: object
#17. df[['col1', 'col2']]
Selects multiple columns as a new DataFrame.
import pandas as pd
df = pd.DataFrame({'Name': ['Alice'], 'Age': [30], 'City': ['New York']})
print(df[['Name', 'City']])
Name City
0 Alice New York
#18. df.loc[]
Accesses a group of rows and columns by label(s) or a boolean array.
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3]}, index=['x', 'y', 'z'])
print(df.loc['y'])
A 2
Name: y, dtype: int64
#19. df.iloc[]
Accesses a group of rows and columns by integer position(s).
import pandas as pd
df = pd.DataFrame({'A': [10, 20, 30]})
print(df.iloc[1])
A 20
Name: 1, dtype: int64
#20. df[df['col'] > value]
Selects rows based on a boolean condition (boolean indexing).
import pandas as pd
df = pd.DataFrame({'Age': [22, 35, 18, 40]})
print(df[df['Age'] > 30])
Age
1 35
3 40
#21. df.set_index()
Sets the DataFrame index using existing columns.
import pandas as pd
df = pd.DataFrame({'Country': ['USA', 'UK'], 'Code': [1, 44]})
df_indexed = df.set_index('Country')
print(df_indexed)
Code
Country
USA 1
UK 44
#22. df.reset_index()
Resets the index of the DataFrame and uses the default integer index.
import pandas as pd
df = pd.DataFrame({'Code': [1, 44]}, index=['USA', 'UK'])
df_reset = df.reset_index()
print(df_reset)
index Code
0 USA 1
1 UK 44
#23. df.at[]
Accesses a single value by row/column label pair. Faster than .loc.
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3]}, index=['x', 'y', 'z'])
print(df.at['y', 'A'])
2
#24. df.iat[]
Accesses a single value by row/column integer position. Faster than .iloc.
import pandas as pd
df = pd.DataFrame({'A': [10, 20, 30]})
print(df.iat[1, 0])
20
#25. df.sample()
Returns a random sample of items from an axis of object.
import pandas as pd
df = pd.DataFrame({'A': range(10)})
print(df.sample(n=3))
A
8 8
2 2
5 5
(Note: Output rows will be random)
---
#DataAnalysis #Pandas #DataCleaning #Manipulation
Part 3: Pandas - Data Cleaning & Manipulation
#26. df.dropna()
Removes missing values.
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [1, np.nan, 3]})
print(df.dropna())
A
0 1.0
2 3.0
#27. df.fillna()
Fills missing (NA/NaN) values using a specified method.
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [1, np.nan, 3]})
print(df.fillna(0))
A
0 1.0
1 0.0
2 3.0
#28. df.astype()
Casts a pandas object to a specified dtype.
import pandas as pd
df = pd.DataFrame({'A': [1.1, 2.7, 3.5]})
df['A'] = df['A'].astype(int)
print(df)
A
0 1
1 2
2 3
#29. df.rename()
Alters axes labels.
import pandas as pd
df = pd.DataFrame({'a': [1], 'b': [2]})
df_renamed = df.rename(columns={'a': 'A', 'b': 'B'})
print(df_renamed)
A B
0 1 2
#30. df.drop()
Drops specified labels from rows or columns.
import pandas as pd
df = pd.DataFrame({'A': [1], 'B': [2], 'C': [3]})
df_dropped = df.drop(columns=['B'])
print(df_dropped)
A C
0 1 3
#31. pd.to_datetime()
Converts argument to datetime.
import pandas as pd
s = pd.Series(['2023-01-01', '2023-01-02'])
dt_s = pd.to_datetime(s)
print(dt_s)
0 2023-01-01
1 2023-01-02
dtype: datetime64[ns]
#32. df.apply()
Applies a function along an axis of the DataFrame.
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3]})
df['B'] = df['A'].apply(lambda x: x * 2)
print(df)
A B
0 1 2
1 2 4
2 3 6
#33. df['col'].map()
Maps values of a Series according to an input mapping or function.
import pandas as pd
df = pd.DataFrame({'Gender': ['M', 'F', 'M']})
df['Gender_Full'] = df['Gender'].map({'M': 'Male', 'F': 'Female'})
print(df)
Gender Gender_Full
0 M Male
1 F Female
2 M Male
#34. df.replace()
Replaces values given in to_replace with value.
import pandas as pd
df = pd.DataFrame({'Score': [10, -99, 15, -99]})
df_replaced = df.replace(-99, 0)
print(df_replaced)
Score
0 10
1 0
2 15
3 0
#35. df.duplicated()
Returns a boolean Series denoting duplicate rows.
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 1], 'B': ['a', 'b', 'a']})
print(df.duplicated())
0 False
1 False
2 True
dtype: bool
#36. df.drop_duplicates()
Returns a DataFrame with duplicate rows removed.
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 1], 'B': ['a', 'b', 'a']})
print(df.drop_duplicates())
A B
0 1 a
1 2 b
#37. df.sort_values()
Sorts by the values along either axis.
import pandas as pd
df = pd.DataFrame({'Age': [25, 22, 30]})
print(df.sort_values(by='Age'))
Age
1 22
0 25
2 30
#38. df.sort_index()
Sorts object by labels (along an axis).
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3]}, index=[10, 5, 8])
print(df.sort_index())
A
5 2
8 3
10 1
#39. pd.cut()
Bins values into discrete intervals.
import pandas as pd
ages = pd.Series([22, 35, 58, 8, 42])
age_bins = pd.cut(ages, bins=[0, 18, 35, 60], labels=['Child', 'Adult', 'Senior'])
print(age_bins)
0 Adult
1 Adult
2 Senior
3 Child
4 Senior
dtype: category
Categories (3, object): ['Child' < 'Adult' < 'Senior']
#40. pd.qcut()
Quantile-based discretization function (bins into equal-sized groups).
import pandas as pd
data = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
quartiles = pd.qcut(data, 4, labels=False)
print(quartiles)
0 0
1 0
2 0
3 1
4 1
5 2
6 2
7 3
8 3
9 3
dtype: int64
#41. s.str.contains()
Tests if a pattern or regex is contained within a string of a Series.
import pandas as pd
s = pd.Series(['apple', 'banana', 'apricot'])
print(s[s.str.contains('ap')])
0 apple
2 apricot
dtype: object
#42. s.str.split()
Splits strings around a given separator/delimiter.
import pandas as pd
s = pd.Series(['a_b', 'c_d'])
print(s.str.split('_', expand=True))
0 1
0 a b
1 c d
#43. s.str.lower()
Converts strings in the Series to lowercase.
import pandas as pd
s = pd.Series(['HELLO', 'World'])
print(s.str.lower())
0 hello
1 world
dtype: object
#44. s.str.strip()
Removes leading and trailing whitespace.
import pandas as pd
s = pd.Series([' hello ', ' world '])
print(s.str.strip())
0 hello
1 world
dtype: object
#45. s.dt.year
Extracts the year from a datetime Series.
import pandas as pd
s = pd.to_datetime(pd.Series(['2023-01-01', '2024-05-10']))
print(s.dt.year)
0 2023
1 2024
dtype: int64
---
#DataAnalysis #Pandas #Grouping #Aggregation
Part 4: Pandas - Grouping & Aggregation
#46. df.groupby()
Groups a DataFrame using a mapper or by a Series of columns.
import pandas as pd
df = pd.DataFrame({'Team': ['A', 'B', 'A', 'B'], 'Points': [10, 8, 12, 6]})
grouped = df.groupby('Team')
print(grouped)
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x...>
#47. groupby.agg()
Aggregates using one or more operations over the specified axis.
import pandas as pd
df = pd.DataFrame({'Team': ['A', 'B', 'A', 'B'], 'Points': [10, 8, 12, 6]})
agg_df = df.groupby('Team').agg(['mean', 'sum'])
print(agg_df)
Points
mean sum
Team
A 11 22
B 7 14
#48. groupby.size()
Computes group sizes.
import pandas as pd
df = pd.DataFrame({'Team': ['A', 'B', 'A', 'B', 'A']})
print(df.groupby('Team').size())
Team
A 3
B 2
dtype: int64
#49. groupby.count()
Computes the count of non-NA cells for each group.
import pandas as pd
import numpy as np
df = pd.DataFrame({'Team': ['A', 'B', 'A'], 'Score': [1, np.nan, 3]})
print(df.groupby('Team').count())
Score
Team
A 2
B 0
#50. groupby.mean()
Computes the mean of group values.
import pandas as pd
df = pd.DataFrame({'Team': ['A', 'B', 'A', 'B'], 'Points': [10, 8, 12, 6]})
print(df.groupby('Team').mean())
Points
Team
A 11
B 7
#51. groupby.sum()
Computes the sum of group values.
import pandas as pd
df = pd.DataFrame({'Team': ['A', 'B', 'A', 'B'], 'Points': [10, 8, 12, 6]})
print(df.groupby('Team').sum())
Points
Team
A 22
B 14
#52. groupby.min()
Computes the minimum of group values.
import pandas as pd
df = pd.DataFrame({'Team': ['A', 'B', 'A', 'B'], 'Points': [10, 8, 12, 6]})
print(df.groupby('Team').min())
Points
Team
A 10
B 6
#53. groupby.max()
Computes the maximum of group values.
import pandas as pd
df = pd.DataFrame({'Team': ['A', 'B', 'A', 'B'], 'Points': [10, 8, 12, 6]})
print(df.groupby('Team').max())
Points
Team
A 12
B 8
#54. df.pivot_table()
Creates a spreadsheet-style pivot table as a DataFrame.
import pandas as pd
df = pd.DataFrame({'A': ['foo', 'foo', 'bar'], 'B': ['one', 'two', 'one'], 'C': [1, 2, 3]})
pivot = df.pivot_table(values='C', index='A', columns='B')
print(pivot)
B one two
A
bar 3.0 NaN
foo 1.0 2.0
#55. pd.crosstab()
Computes a cross-tabulation of two (or more) factors.
import pandas as pd
df = pd.DataFrame({'A': ['foo', 'foo', 'bar'], 'B': ['one', 'two', 'one']})
crosstab = pd.crosstab(df.A, df.B)
print(crosstab)
B one two
A
bar 1 0
foo 1 1
---
#DataAnalysis #Pandas #Merging #Joining
Part 5: Pandas - Merging & Concatenating
#56. pd.merge()
Merges DataFrame or named Series objects with a database-style join.
import pandas as pd
df1 = pd.DataFrame({'key': ['A', 'B'], 'val1': [1, 2]})
df2 = pd.DataFrame({'key': ['A', 'B'], 'val2': [3, 4]})
merged = pd.merge(df1, df2, on='key')
print(merged)
key val1 val2
0 A 1 3
1 B 2 4
#57. pd.concat()
Concatenates pandas objects along a particular axis.
import pandas as pd
df1 = pd.DataFrame({'A': [1, 2]})
df2 = pd.DataFrame({'A': [3, 4]})
concatenated = pd.concat([df1, df2])
print(concatenated)
A
0 1
1 2
0 3
1 4
#58. df.join()
Joins columns with other DataFrame(s) on index or on a key column.
import pandas as pd
df1 = pd.DataFrame({'val1': [1, 2]}, index=['A', 'B'])
df2 = pd.DataFrame({'val2': [3, 4]}, index=['A', 'B'])
joined = df1.join(df2)
print(joined)
val1 val2
A 1 3
B 2 4
#59. pd.get_dummies()
Converts categorical variable into dummy/indicator variables (one-hot encoding).
import pandas as pd
s = pd.Series(list('abca'))
dummies = pd.get_dummies(s)
print(dummies)
a b c
0 1 0 0
1 0 1 0
2 0 0 1
3 1 0 0
#60. df.nlargest()
Returns the first n rows ordered by columns in descending order.
import pandas as pd
df = pd.DataFrame({'population': [100, 500, 200, 800]})
print(df.nlargest(2, 'population'))
population
3 800
1 500
---
#DataAnalysis #NumPy #Arrays
Part 6: NumPy - Array Creation & Manipulation
#61. np.array()
Creates a NumPy ndarray.
import numpy as np
arr = np.array([1, 2, 3])
print(arr)
[1 2 3]
#62. np.arange()
Returns an array with evenly spaced values within a given interval.
import numpy as np
arr = np.arange(0, 5)
print(arr)
[0 1 2 3 4]
#63. np.linspace()
Returns an array with evenly spaced numbers over a specified interval.
import numpy as np
arr = np.linspace(0, 10, 5)
print(arr)
[ 0. 2.5 5. 7.5 10. ]
#64. np.zeros()
Returns a new array of a given shape and type, filled with zeros.
import numpy as np
arr = np.zeros((2, 3))
print(arr)
[[0. 0. 0.]
[0. 0. 0.]]
#65. np.ones()
Returns a new array of a given shape and type, filled with ones.
import numpy as np
arr = np.ones((2, 3))
print(arr)
[[1. 1. 1.]
[1. 1. 1.]]
#66. np.random.rand()
Creates an array of the given shape and populates it with random samples from a uniform distribution over [0, 1).
import numpy as np
arr = np.random.rand(2, 2)
print(arr)
[[0.13949386 0.2921446 ]
[0.52273283 0.77122228]]
(Note: Output values will be random)
#67. arr.reshape()
Gives a new shape to an array without changing its data.
import numpy as np
arr = np.arange(6)
reshaped_arr = arr.reshape((2, 3))
print(reshaped_arr)
[[0 1 2]
[3 4 5]]
#68. np.concatenate()
Joins a sequence of arrays along an existing axis.
import numpy as np
a = np.array([[1, 2]])
b = np.array([[3, 4]])
print(np.concatenate((a, b), axis=0))
[[1 2]
[3 4]]
#69. np.vstack()
Stacks arrays in sequence vertically (row wise).
import numpy as np
a = np.array([1, 2])
b = np.array([3, 4])
print(np.vstack((a, b)))
[[1 2]
[3 4]]
#70. np.hstack()
Stacks arrays in sequence horizontally (column wise).
import numpy as np
a = np.array([1, 2])
b = np.array([3, 4])
print(np.hstack((a, b)))
[1 2 3 4]
---
#DataAnalysis #NumPy #Math #Statistics
Part 7: NumPy - Mathematical & Statistical Functions
#71. np.mean()
Computes the arithmetic mean along the specified axis.
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(np.mean(arr))
3.0
#72. np.median()
Computes the median along the specified axis.
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(np.median(arr))
3.0
#73. np.std()
Computes the standard deviation along the specified axis.
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(np.std(arr))
1.4142135623730951
#74. np.sum()
Sums array elements over a given axis.
import numpy as np
arr = np.array([[1, 2], [3, 4]])
print(np.sum(arr))
10
#75. np.min()
Returns the minimum of an array or minimum along an axis.
import numpy as np
arr = np.array([5, 2, 8, 1])
print(np.min(arr))
1
#76. np.max()
Returns the maximum of an array or maximum along an axis.
import numpy as np
arr = np.array([5, 2, 8, 1])
print(np.max(arr))
8
#77. np.sqrt()
Returns the non-negative square-root of an array, element-wise.
import numpy as np
arr = np.array([4, 9, 16])
print(np.sqrt(arr))
[2. 3. 4.]
#78. np.log()
Calculates the natural logarithm, element-wise.
import numpy as np
arr = np.array([1, np.e, np.e**2])
print(np.log(arr))
[0. 1. 2.]
#79. np.dot()
Calculates the dot product of two arrays.
import numpy as np
a = np.array([1, 2])
b = np.array([3, 4])
print(np.dot(a, b))
11
#80. np.where()
Returns elements chosen from x or y depending on a condition.
import numpy as np
arr = np.array([10, 5, 20, 15])
print(np.where(arr > 12, 'High', 'Low'))
['Low' 'Low' 'High' 'High']
---
#DataAnalysis #Matplotlib #Seaborn #Visualization
Part 8: Matplotlib & Seaborn - Data Visualization
#81. plt.plot()
Plots y versus x as lines and/or markers.
import matplotlib.pyplot as plt
plt.plot([1, 2, 3, 4], [1, 4, 9, 16])
# In a real script, you would call plt.show()
print("Output: A figure window opens displaying a line plot.")
Output: A figure window opens displaying a line plot.
#82. plt.scatter()
A scatter plot of y vs. x with varying marker size and/or color.
import matplotlib.pyplot as plt
plt.scatter([1, 2, 3, 4], [1, 4, 9, 16])
print("Output: A figure window opens displaying a scatter plot.")
Output: A figure window opens displaying a scatter plot.
#83. plt.hist()
Computes and draws the histogram of x.
import matplotlib.pyplot as plt
import numpy as np
data = np.random.randn(1000)
plt.hist(data, bins=30)
print("Output: A figure window opens displaying a histogram.")
Output: A figure window opens displaying a histogram.
#84. plt.bar()
Makes a bar plot.
import matplotlib.pyplot as plt
plt.bar(['A', 'B', 'C'], [10, 15, 7])
print("Output: A figure window opens displaying a bar chart.")
Output: A figure window opens displaying a bar chart.
#85. plt.boxplot()
Makes a box and whisker plot.
import matplotlib.pyplot as plt
import numpy as np
data = [np.random.normal(0, std, 100) for std in range(1, 4)]
plt.boxplot(data)
print("Output: A figure window opens displaying a box plot.")
Output: A figure window opens displaying a box plot.
#86. sns.heatmap()
Plots rectangular data as a color-encoded matrix.
import seaborn as sns
import numpy as np
data = np.random.rand(10, 12)
sns.heatmap(data)
print("Output: A figure window opens displaying a heatmap.")
Output: A figure window opens displaying a heatmap.
#87. sns.pairplot()
Plots pairwise relationships in a dataset.
import seaborn as sns
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(100, 4), columns=['A', 'B', 'C', 'D'])
# sns.pairplot(df) # This line would generate the plot
print("Output: A figure grid opens showing scatterplots for each pair of variables.")
Output: A figure grid opens showing scatterplots for each pair of variables.
#88. sns.countplot()
Shows the counts of observations in each categorical bin using bars.
import seaborn as sns
import pandas as pd
df = pd.DataFrame({'category': ['A', 'B', 'A', 'C', 'A', 'B']})
sns.countplot(x='category', data=df)
print("Output: A figure window opens showing a count plot.")
Output: A figure window opens showing a count plot.
#89. sns.jointplot()
Draws a plot of two variables with bivariate and univariate graphs.
import seaborn as sns
import pandas as pd
import numpy as np
df = pd.DataFrame({'x': range(50), 'y': range(50) + np.random.randn(50)})
# sns.jointplot(x='x', y='y', data=df) # This line would generate the plot
print("Output: A figure shows a scatter plot with histograms for each axis.")
Output: A figure shows a scatter plot with histograms for each axis.
#90. plt.show()
Displays all open figures.
import matplotlib.pyplot as plt
plt.plot([1, 2, 3])
# plt.show() # In a script, this is essential to see the plot.
print("Executes the command to render and display the plot.")
Executes the command to render and display the plot.
---
#DataAnalysis #ScikitLearn #Modeling #Preprocessing
Part 9: Scikit-learn - Modeling & Preprocessing
#91. train_test_split()
Splits arrays or matrices into random train and test subsets.
from sklearn.model_selection import train_test_split
import numpy as np
X, y = np.arange(10).reshape((5, 2)), range(5)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
print(f"X_train shape: {X_train.shape}")
print(f"X_test shape: {X_test.shape}")
X_train shape: (3, 2)
X_test shape: (2, 2)
#92. StandardScaler()
Standardizes features by removing the mean and scaling to unit variance.
from sklearn.preprocessing import StandardScaler
data = [[0, 0], [0, 0], [1, 1], [1, 1]]
scaler = StandardScaler()
print(scaler.fit_transform(data))
[[-1. -1.]
[-1. -1.]
[ 1. 1.]
[ 1. 1.]]
#93. MinMaxScaler()
Transforms features by scaling each feature to a given range, typically [0, 1].
from sklearn.preprocessing import MinMaxScaler
data = [[-1, 2], [-0.5, 6], [0, 10], [1, 18]]
scaler = MinMaxScaler()
print(scaler.fit_transform(data))
[[0. 0. ]
[0.25 0.25]
[0.5 0.5 ]
[1. 1. ]]
#94. LabelEncoder()
Encodes target labels with values between 0 and n_classes-1.
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
encoded = le.fit_transform(['paris', 'tokyo', 'paris'])
print(encoded)
[0 1 0]
#95. OneHotEncoder()
Encodes categorical features as a one-hot numeric array.
from sklearn.preprocessing import OneHotEncoder
enc = OneHotEncoder()
X = [['Male'], ['Female'], ['Female']]
print(enc.fit_transform(X).toarray())
[[0. 1.]
[1. 0.]
[1. 0.]]
#96. LinearRegression()
Ordinary least squares Linear Regression model.
from sklearn.linear_model import LinearRegression
X = [[0], [1], [2]]
y = [0, 1, 2]
reg = LinearRegression().fit(X, y)
print(f"Coefficient: {reg.coef_[0]}")
Coefficient: 1.0
#97. LogisticRegression()
Implements Logistic Regression for classification.
from sklearn.linear_model import LogisticRegression
X = [[-1], [0], [1], [2]]
y = [0, 0, 1, 1]
clf = LogisticRegression().fit(X, y)
print(f"Prediction for [[-2]]: {clf.predict([[-2]])}")
Prediction for [[-2]]: [0]
#98. KMeans()
K-Means clustering algorithm.
from sklearn.cluster import KMeans
X = [[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]]
kmeans = KMeans(n_clusters=2, n_init='auto').fit(X)
print(kmeans.labels_)
[0 0 0 1 1 1]
(Note: Cluster labels may be flipped, e.g., [1 1 1 0 0 0])
#99. accuracy_score()
Calculates the accuracy classification score.
from sklearn.metrics import accuracy_score
y_true = [0, 1, 1, 0]
y_pred = [0, 1, 0, 0]
print(accuracy_score(y_true, y_pred))
0.75
#100. confusion_matrix()
Computes a confusion matrix to evaluate the accuracy of a classification.
from sklearn.metrics import confusion_matrix
y_true = [0, 1, 0, 1]
y_pred = [1, 1, 0, 1]
print(confusion_matrix(y_true, y_pred))
[[1 1]
[0 2]]
━━━━━━━━━━━━━━━
By: @CodeProgrammer ✨
💡 Top 50 Pillow Operations for Image Processing
I. File & Basic Operations
• Open an image file.
from PIL import Image
img = Image.open("image.jpg")
• Save an image.
img.save("new_image.png")
• Display an image (opens in default viewer).
img.show()
• Create a new blank image.
new_img = Image.new("RGB", (200, 100), "blue")
• Get image format (e.g., 'JPEG').
print(img.format)
• Get image dimensions as a (width, height) tuple.
width, height = img.size
• Get pixel format (e.g., 'RGB', 'L' for grayscale).
print(img.mode)
• Convert image mode.
grayscale_img = img.convert("L")
• Get a pixel's color value at (x, y).
r, g, b = img.getpixel((10, 20))
• Set a pixel's color value at (x, y).
img.putpixel((10, 20), (255, 0, 0))
II. Cropping, Resizing & Pasting
• Crop a rectangular region.
box = (100, 100, 400, 400)
cropped_img = img.crop(box)
• Resize an image to an exact size.
resized_img = img.resize((200, 200))
• Create a thumbnail (maintains aspect ratio).
img.thumbnail((128, 128))
• Paste one image onto another.
img.paste(another_img, (50, 50))
III. Rotation & Transformation
• Rotate an image (counter-clockwise).
rotated_img = img.rotate(45, expand=True)
• Flip an image horizontally.
flipped_img = img.transpose(Image.FLIP_LEFT_RIGHT)
• Flip an image vertically.
flipped_img = img.transpose(Image.FLIP_TOP_BOTTOM)
• Rotate by 90, 180, or 270 degrees.
img_90 = img.transpose(Image.ROTATE_90)
• Apply an affine transformation.
transformed = img.transform(img.size, Image.AFFINE, (1, 0.5, 0, 0, 1, 0))
IV. ImageOps Module Helpers
• Invert image colors.
from PIL import ImageOps
inverted_img = ImageOps.invert(img)
• Flip an image horizontally (mirror).
mirrored_img = ImageOps.mirror(img)
• Flip an image vertically.
flipped_v_img = ImageOps.flip(img)
• Convert to grayscale.
grayscale = ImageOps.grayscale(img)
• Colorize a grayscale image.
colorized = ImageOps.colorize(grayscale, black="blue", white="yellow")
• Reduce the number of bits for each color channel.
posterized = ImageOps.posterize(img, 4)
• Auto-adjust image contrast.
adjusted_img = ImageOps.autocontrast(img)
• Equalize the image histogram.
equalized_img = ImageOps.equalize(img)
• Add a border to an image.
bordered = ImageOps.expand(img, border=10, fill='black')
V. Color & Pixel Operations
• Split image into individual bands (e.g., R, G, B).
r, g, b = img.split()
• Merge bands back into an image.
merged_img = Image.merge("RGB", (r, g, b))
• Apply a function to each pixel.
brighter_img = img.point(lambda i: i * 1.2)
• Get a list of colors used in the image.
colors = img.getcolors(maxcolors=256)
• Blend two images with alpha compositing.
# Both images must be in RGBA mode
blended = Image.alpha_composite(img1_rgba, img2_rgba)
VI. Filters (ImageFilter)
• Apply a simple blur filter.
from PIL import ImageFilter
blurred_img = img.filter(ImageFilter.BLUR)
• Apply a box blur with a given radius.
box_blur = img.filter(ImageFilter.BoxBlur(5))
• Apply a Gaussian blur.
gaussian_blur = img.filter(ImageFilter.GaussianBlur(radius=2))
• Sharpen the image.
sharpened = img.filter(ImageFilter.SHARPEN)
• Find edges.
edges = img.filter(ImageFilter.FIND_EDGES)
• Enhance edges.
edge_enhanced = img.filter(ImageFilter.EDGE_ENHANCE)
• Emboss the image.
embossed = img.filter(ImageFilter.EMBOSS)
• Find contours.
contours = img.filter(ImageFilter.CONTOUR)
VII. Image Enhancement (ImageEnhance)
• Adjust color saturation.
from PIL import ImageEnhance
enhancer = ImageEnhance.Color(img)
vibrant_img = enhancer.enhance(2.0)
• Adjust brightness.
enhancer = ImageEnhance.Brightness(img)
bright_img = enhancer.enhance(1.5)
• Adjust contrast.
enhancer = ImageEnhance.Contrast(img)
contrast_img = enhancer.enhance(1.5)
• Adjust sharpness.
enhancer = ImageEnhance.Sharpness(img)
sharp_img = enhancer.enhance(2.0)
VIII. Drawing (ImageDraw & ImageFont)
• Draw text on an image.
from PIL import ImageDraw, ImageFont
draw = ImageDraw.Draw(img)
font = ImageFont.truetype("arial.ttf", 36)
draw.text((10, 10), "Hello", font=font, fill="red")
• Draw a line.
draw.line((0, 0, 100, 200), fill="blue", width=3)
• Draw a rectangle (outline).
draw.rectangle([10, 10, 90, 60], outline="green", width=2)
• Draw a filled ellipse.
draw.ellipse([100, 100, 180, 150], fill="yellow")
• Draw a polygon.
draw.polygon([(10,10), (20,50), (60,10)], fill="purple")
#Python #Pillow #ImageProcessing #PIL #CheatSheet
━━━━━━━━━━━━━━━
By: @CodeProgrammer ✨
🤖🧠 LongCat-Video: Meituan’s Groundbreaking Step Toward Efficient Long Video Generation with AI
🗓️ 04 Nov 2025
📚 AI News & Trends
In the rapidly advancing field of generative AI, the ability to create realistic, coherent, and high-quality videos from text or images has become one of the most sought-after goals. Meituan, one of the leading technology innovators in China, has made a remarkable stride in this domain with its latest open-source model — LongCat-Video. Designed as ...
#LongCatVideo #Meituan #GenerativeAI #VideoGeneration #AIInnovation #OpenSource