#PDF #EPUB #TelegramBot #Python #SQLite #Project
Lesson: Building a PDF <> EPUB Telegram Converter Bot
This lesson walks you through creating a fully functional Telegram bot from scratch. The bot will accept PDF or EPUB files, convert them to the other format, and log each transaction in an SQLite database.
---
Part 1: Prerequisites & Setup
First, we need to install the necessary Python library for the Telegram Bot API. We will also rely on Calibre's command-line tools for conversion.
Important: You must install Calibre on the system where the bot will run and ensure its ebook-convert tool is in your system's PATH.
pip install python-telegram-bot==20.3
#Setup #Prerequisites
---
Part 2: Database Initialization
We'll use SQLite to log every successful conversion. Create a file named database_setup.py and run it once to create the database file and the table.
# database_setup.py
import sqlite3

def setup_database():
    conn = sqlite3.connect('conversions.db')
    cursor = conn.cursor()
    # Create table to store conversion logs
    cursor.execute('''
        CREATE TABLE IF NOT EXISTS conversions (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            user_id INTEGER NOT NULL,
            original_filename TEXT NOT NULL,
            converted_filename TEXT NOT NULL,
            conversion_type TEXT NOT NULL,
            timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
        )
    ''')
    conn.commit()
    conn.close()
    print("Database setup complete. 'conversions.db' is ready.")

if __name__ == '__main__':
    setup_database()
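To check the schema without touching conversions.db, the same CREATE TABLE statement can be run against a throwaway in-memory database. This is a minimal sketch; the helper name create_schema and the sample row values are hypothetical, chosen only for illustration.

```python
import sqlite3

# Same table definition as in database_setup.py
SCHEMA = """
CREATE TABLE IF NOT EXISTS conversions (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id INTEGER NOT NULL,
    original_filename TEXT NOT NULL,
    converted_filename TEXT NOT NULL,
    conversion_type TEXT NOT NULL,
    timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
)
"""

def create_schema(conn: sqlite3.Connection) -> None:
    """Hypothetical helper: creates the conversions table on any connection."""
    conn.execute(SCHEMA)
    conn.commit()

# ':memory:' gives a temporary database, so conversions.db is left untouched
conn = sqlite3.connect(':memory:')
create_schema(conn)
conn.execute(
    "INSERT INTO conversions (user_id, original_filename, converted_filename, conversion_type) "
    "VALUES (?, ?, ?, ?)",
    (42, 'book.pdf', 'book.epub', 'PDF -> EPUB'),
)
# id and timestamp are filled in by the AUTOINCREMENT and DEFAULT clauses
row = conn.execute("SELECT id, timestamp FROM conversions").fetchone()
print(row)
```

Because of IF NOT EXISTS, calling create_schema a second time is harmless, which is also why running database_setup.py more than once does not wipe the log.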
#Database #SQLite #Initialization
---
Part 3: The Main Bot Script - Imports & Basic Commands
Now, let's create our main bot file, converter_bot.py. We'll start with imports and the initial /start and /help commands.
# converter_bot.py
import logging
import os
import sqlite3
import subprocess

from telegram import Update
from telegram.ext import Application, CommandHandler, MessageHandler, filters, ContextTypes

# Enable logging
logging.basicConfig(format='%(asctime)s - %(name)s - %(levelname)s - %(message)s', level=logging.INFO)

# --- Bot Token ---
TELEGRAM_TOKEN = "YOUR_TELEGRAM_BOT_TOKEN"

# --- Command Handlers ---
async def start(update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
    user = update.effective_user
    await update.message.reply_html(
        rf"Hi {user.mention_html()}! Send me a PDF or EPUB file to convert.",
    )

async def help_command(update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
    await update.message.reply_text(
        "Simply send a .pdf file to get an .epub, or send an .epub file to get a .pdf. "
        "Note: Conversion quality depends on the source file's structure."
    )
#TelegramBot #Python #Boilerplate
---
Part 4: The Core Conversion Logic
This function will be the heart of our bot. It uses the ebook-convert command-line tool (from Calibre) to perform the conversion. It's crucial that Calibre is installed correctly for this to work.
# converter_bot.py (continued)
def run_conversion(input_path: str, output_path: str) -> bool:
    """Runs the ebook-convert command and returns True on success."""
    try:
        command = ['ebook-convert', input_path, output_path]
        result = subprocess.run(command, check=True, capture_output=True, text=True)
        logging.info(f"Calibre output: {result.stdout}")
        return True
    except FileNotFoundError:
        logging.error("CRITICAL: 'ebook-convert' command not found. Is Calibre installed and in the system's PATH?")
        return False
    except subprocess.CalledProcessError as e:
        logging.error(f"Conversion failed for {input_path}. Error: {e.stderr}")
        return False
#Conversion #Calibre #Subprocess
---
Part 5: Database Logging Function
This helper function will connect to our SQLite database and insert a new record for each successful conversion.
# converter_bot.py (continued)
def log_to_db(user_id: int, original_file: str, converted_file: str, conv_type: str):
    """Logs a successful conversion to the SQLite database."""
    conn = None
    try:
        conn = sqlite3.connect('conversions.db')
        cursor = conn.cursor()
        cursor.execute(
            "INSERT INTO conversions (user_id, original_filename, converted_filename, conversion_type) VALUES (?, ?, ?, ?)",
            (user_id, original_file, converted_file, conv_type)
        )
        conn.commit()
    except sqlite3.Error as e:
        logging.error(f"Database error: {e}")
    finally:
        # Close the connection even if the INSERT failed
        if conn is not None:
            conn.close()
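The log can also be summarised from Python instead of the sqlite3 command line. A minimal sketch, assuming the conversions table from Part 2; conversion_stats and its db_path parameter are hypothetical names, not part of the bot above.

```python
import sqlite3
from contextlib import closing

def conversion_stats(db_path: str = 'conversions.db') -> dict:
    """Hypothetical helper: number of logged conversions per conversion_type."""
    # closing() guarantees the connection is closed, even if the query raises
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT conversion_type, COUNT(*) FROM conversions GROUP BY conversion_type"
        ).fetchall()
    # Each row is a (conversion_type, count) pair, so dict() maps type -> count
    return dict(rows)
```

Note that a plain `with sqlite3.connect(...)` block only commits or rolls back the transaction; it does not close the connection, which is why closing() is used here.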
#Database #Logging #SQLite
---
Part 6: Handling Incoming Files
This is the main handler that will be triggered when a user sends a document. It checks the file extension to determine the target format, downloads the file, calls the conversion function, sends the result back, logs it, and cleans up.
# converter_bot.py (continued)
async def handle_document(update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
    doc = update.message.document
    # file_name may be missing; basename() strips any directory components
    file_name = os.path.basename(doc.file_name or "")
    lower_name = file_name.lower()
    # Decide the target format before downloading anything
    if lower_name.endswith('.pdf'):
        target_ext, conversion_type = '.epub', "PDF -> EPUB"
    elif lower_name.endswith('.epub'):
        target_ext, conversion_type = '.pdf', "EPUB -> PDF"
    else:
        await update.message.reply_text("Sorry, I only support PDF and EPUB files.")
        return

    os.makedirs("downloads", exist_ok=True)  # Ensure download directory exists
    input_path = os.path.join("downloads", file_name)
    output_path = os.path.splitext(input_path)[0] + target_ext

    new_file = await context.bot.get_file(doc.file_id)
    await new_file.download_to_drive(input_path)
    await update.message.reply_text(f"Received '{file_name}'. Starting conversion...")

    try:
        # Run the conversion
        success = run_conversion(input_path, output_path)
        if success and os.path.exists(output_path):
            await update.message.reply_text("Conversion successful! Uploading your file...")
            # Context manager closes the file handle before cleanup removes the file
            with open(output_path, 'rb') as converted:
                await context.bot.send_document(chat_id=update.effective_chat.id, document=converted)
            # Log to database
            log_to_db(update.effective_user.id, file_name, os.path.basename(output_path), conversion_type)
        else:
            await update.message.reply_text("An error occurred during conversion. Please check the file and try again. The file might be corrupted or protected.")
    finally:
        # Cleanup, even if sending or logging raised an exception
        for path in (input_path, output_path):
            if os.path.exists(path):
                os.remove(path)
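The format decision in the handler is easier to unit-test when it lives in a pure function. A sketch under that assumption; plan_conversion and TARGETS are hypothetical names, and like the bot it decides by file extension only, without inspecting the file contents.

```python
import os
from typing import Optional, Tuple

# Source extension -> (target extension, label stored in the database)
TARGETS = {
    '.pdf': ('.epub', "PDF -> EPUB"),
    '.epub': ('.pdf', "EPUB -> PDF"),
}

def plan_conversion(input_path: str) -> Optional[Tuple[str, str]]:
    """Hypothetical helper: returns (output_path, conversion_type), or None if unsupported."""
    # splitext only splits off the last extension, e.g. 'a.tar.pdf' -> ('a.tar', '.pdf')
    root, ext = os.path.splitext(input_path)
    target = TARGETS.get(ext.lower())
    if target is None:
        return None
    target_ext, conversion_type = target
    return root + target_ext, conversion_type
```

Supporting another format pair would then be a one-line change to TARGETS rather than a new elif branch.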
#FileHandler #BotLogic
---
Part 7: Main Execution Block
Finally, this block sets up the application, registers all our handlers, and starts the bot. This code goes at the end of converter_bot.py.
# converter_bot.py (continued)
def main() -> None:
    """Start the bot."""
    application = Application.builder().token(TELEGRAM_TOKEN).build()

    # Register handlers
    application.add_handler(CommandHandler("start", start))
    application.add_handler(CommandHandler("help", help_command))
    application.add_handler(MessageHandler(filters.Document.ALL, handle_document))

    # Run the bot until the user presses Ctrl-C
    print("Bot is running...")
    application.run_polling()

if __name__ == '__main__':
    main()
#Main #Execution #RunBot
---
Part 8: Results & Discussion
To Run:
• Run python database_setup.py once.
• Replace "YOUR_TELEGRAM_BOT_TOKEN" in converter_bot.py with your actual token from BotFather.
• Run python converter_bot.py.
• Send a PDF or EPUB file to your bot on Telegram.
Expected Results:
• The bot will acknowledge the file.
• After a short processing time, it will send back the converted file.
• A new entry will be added to the conversions.db file.
Viewing the Database:
You can inspect the conversions.db file using a tool like "DB Browser for SQLite" or the command line:
sqlite3 conversions.db "SELECT * FROM conversions;"
Discussion & Limitations:
• Dependency: The bot is entirely dependent on a local installation of Calibre. This makes it hard to deploy on simple hosting services; a Docker-based deployment would be a good solution.
• Conversion Quality: Converting from PDF, especially files with complex layouts, images, and columns, can result in poor EPUB formatting. This is a fundamental limitation of PDF-to-EPUB conversion, not just a flaw in the bot.
• Synchronous Processing: run_conversion calls subprocess.run, which blocks the event loop until Calibre finishes, so the bot handles one file at a time. If two users send files simultaneously, one has to wait. At a larger scale, a task queue system (like Celery with Redis) could run conversions in the background instead.
• Error Handling: The current error messaging is generic. Advanced versions could parse Calibre's error output to give users more specific feedback (e.g., "This PDF is password-protected").
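A lighter-weight step than a task queue is to launch ebook-convert through asyncio's subprocess API, so waiting for Calibre no longer blocks the event loop. This is a swapped-in technique, not the lesson's code: a minimal sketch in which run_conversion_async and its program parameter (added only so the command can be substituted in tests) are hypothetical.

```python
import asyncio
import logging

async def run_conversion_async(input_path: str, output_path: str,
                               program: str = 'ebook-convert') -> bool:
    """Hypothetical non-blocking variant of run_conversion()."""
    try:
        proc = await asyncio.create_subprocess_exec(
            program, input_path, output_path,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
        )
    except FileNotFoundError:
        logging.error("'%s' not found. Is Calibre installed and in the PATH?", program)
        return False
    # Awaiting yields to the event loop instead of blocking it while Calibre runs
    stdout, stderr = await proc.communicate()
    if proc.returncode != 0:
        logging.error("Conversion failed for %s: %s", input_path,
                      stderr.decode(errors='replace'))
        return False
    logging.info("Calibre output: %s", stdout.decode(errors='replace'))
    return True
```

In handle_document the call would become `success = await run_conversion_async(input_path, output_path)`. Assuming python-telegram-bot v20 behaves as documented, updates are still processed one after another unless concurrency is enabled on the builder (its concurrent_updates option), so this change alone may not let two users convert at the same time.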
#Results #Discussion #Limitations #Scalability
---
By: @CodeProgrammer
Top 100 Data Analysis Commands & Functions
#DataAnalysis #Pandas #DataLoading #Inspection
Part 1: Pandas - Data Loading & Inspection
#1.
pd.read_csv()
Reads a comma-separated values (csv) file into a Pandas DataFrame.
import pandas as pd
from io import StringIO
csv_data = "col1,col2,col3\n1,a,True\n2,b,False"
df = pd.read_csv(StringIO(csv_data))
print(df)
col1 col2 col3
0 1 a True
1 2 b False
#2.
df.head()
Returns the first n rows of the DataFrame (default is 5).
import pandas as pd
df = pd.DataFrame({'A': range(10), 'B': list('abcdefghij')})
print(df.head(3))
A B
0 0 a
1 1 b
2 2 c
#3.
df.tail()
Returns the last n rows of the DataFrame (default is 5).
import pandas as pd
df = pd.DataFrame({'A': range(10), 'B': list('abcdefghij')})
print(df.tail(3))
A B
7 7 h
8 8 i
9 9 j
#4.
df.info()
Prints a concise summary of a DataFrame, including data types and non-null values.
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [1, 2, np.nan], 'B': ['x', 'y', 'z']})
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 A 2 non-null float64
1 B 3 non-null object
dtypes: float64(1), object(1)
memory usage: 176.0+ bytes
#5.
df.describe()
Generates descriptive statistics for numerical columns.
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3, 4, 5]})
print(df.describe())
A
count 5.000000
mean 3.000000
std 1.581139
min 1.000000
25% 2.000000
50% 3.000000
75% 4.000000
max 5.000000
#6.
df.shape
Returns a tuple representing the dimensionality (rows, columns) of the DataFrame.
import pandas as pd
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'C': [5, 6]})
print(df.shape)
(2, 3)
#7.
df.columns
Returns the column labels of the DataFrame.
import pandas as pd
df = pd.DataFrame({'Name': ['Alice'], 'Age': [30]})
print(df.columns)
Index(['Name', 'Age'], dtype='object')
#8.
df.dtypes
Returns the data types of each column.
import pandas as pd
df = pd.DataFrame({'A': [1, 2], 'B': [1.1, 2.2], 'C': ['x', 'y']})
print(df.dtypes)
A int64
B float64
C object
dtype: object
#9.
df['col'].value_counts()
Returns a Series containing counts of unique values in a column.
import pandas as pd
df = pd.DataFrame({'Fruit': ['Apple', 'Banana', 'Apple', 'Orange', 'Banana', 'Apple']})
print(df['Fruit'].value_counts())
Apple 3
Banana 2
Orange 1
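Name: Fruit, dtype: int64
(Note: this is pandas 1.x output; in pandas 2.x the result Series is named 'count' and its index is labelled 'Fruit'.)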
#10.
df['col'].unique()
Returns an array of the unique values in a column.
import pandas as pd
df = pd.DataFrame({'Fruit': ['Apple', 'Banana', 'Apple', 'Orange']})
print(df['Fruit'].unique())
['Apple' 'Banana' 'Orange']
#11.
df['col'].nunique()
Returns the number of unique values in a column.
import pandas as pd
df = pd.DataFrame({'Fruit': ['Apple', 'Banana', 'Apple', 'Orange']})
print(df['Fruit'].nunique())
3
#12.
df.isnull()
Returns a DataFrame of boolean values indicating missing values.
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [1, np.nan], 'B': [np.nan, 'x']})
print(df.isnull())
A B
0 False True
1 True False
#13.
df.isnull().sum()
Returns the number of missing values in each column.
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [1, np.nan, 3, np.nan], 'B': [5, 6, 7, 8]})
print(df.isnull().sum())
A 2
B 0
dtype: int64
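Because True counts as 1, taking the mean of the mask instead of the sum gives the share of missing values per column, which is easier to compare across DataFrames of different sizes. A small sketch reusing the data above:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [1, np.nan, 3, np.nan], 'B': [5, 6, 7, 8]})
# Mean of the boolean mask = fraction of rows that are missing, per column
missing_share = df.isnull().mean()
print(missing_share)  # A: 0.5, B: 0.0
```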
#14.
df.to_csv()
Writes the DataFrame to a comma-separated values (csv) file; if no path is given, it returns the CSV text as a string, as in this example.
import pandas as pd
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
csv_output = df.to_csv(index=False)
print(csv_output)
A,B
1,3
2,4
#15.
df.copy()
Creates a deep copy of a DataFrame.
import pandas as pd
df1 = pd.DataFrame({'A': [1]})
df2 = df1.copy()
df2.loc[0, 'A'] = 99
print(f"Original df1:\n{df1}")
print(f"Copied df2:\n{df2}")
Original df1:
A
0 1
Copied df2:
A
0 99
---
#DataAnalysis #Pandas #Selection #Indexing
Part 2: Pandas - Data Selection & Indexing
#16.
df['col']
Selects a single column as a Series.
import pandas as pd
df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [30, 25]})
print(df['Name'])
0 Alice
1 Bob
Name: Name, dtype: object
#17.
df[['col1', 'col2']]
Selects multiple columns as a new DataFrame.
import pandas as pd
df = pd.DataFrame({'Name': ['Alice'], 'Age': [30], 'City': ['New York']})
print(df[['Name', 'City']])
Name City
0 Alice New York
#18.
df.loc[]
Accesses a group of rows and columns by label(s) or a boolean array.
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3]}, index=['x', 'y', 'z'])
print(df.loc['y'])
A 2
Name: y, dtype: int64
#19.
df.iloc[]
Accesses a group of rows and columns by integer position(s).
import pandas as pd
df = pd.DataFrame({'A': [10, 20, 30]})
print(df.iloc[1])
A 20
Name: 1, dtype: int64
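One practical difference between the two: a label slice with .loc includes the stop label, while a position slice with .iloc excludes the stop position, as with Python lists. A short sketch:

```python
import pandas as pd

df = pd.DataFrame({'A': [10, 20, 30]}, index=['x', 'y', 'z'])
# .loc slice includes the stop label 'y' -> rows x and y
by_label = df.loc['x':'y']
# .iloc slice stops before position 1 -> row x only
by_position = df.iloc[0:1]
print(by_label)
print(by_position)
```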
#20.
df[df['col'] > value]
Selects rows based on a boolean condition (boolean indexing).
import pandas as pd
df = pd.DataFrame({'Age': [22, 35, 18, 40]})
print(df[df['Age'] > 30])
Age
1 35
3 40
#21.
df.set_index()
Sets the DataFrame index using existing columns.
import pandas as pd
df = pd.DataFrame({'Country': ['USA', 'UK'], 'Code': [1, 44]})
df_indexed = df.set_index('Country')
print(df_indexed)
Code
Country
USA 1
UK 44
#22.
df.reset_index()
Resets the index of the DataFrame and uses the default integer index.
import pandas as pd
df = pd.DataFrame({'Code': [1, 44]}, index=['USA', 'UK'])
df_reset = df.reset_index()
print(df_reset)
index Code
0 USA 1
1 UK 44
#23.
df.at[]
Accesses a single value by row/column label pair. Faster than .loc.
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3]}, index=['x', 'y', 'z'])
print(df.at['y', 'A'])
2
#24.
df.iat[]
Accesses a single value by row/column integer position. Faster than .iloc.
import pandas as pd
df = pd.DataFrame({'A': [10, 20, 30]})
print(df.iat[1, 0])
20
#25.
df.sample()
Returns a random sample of items from an axis of the object.
import pandas as pd
df = pd.DataFrame({'A': range(10)})
print(df.sample(n=3))
A
8 8
2 2
5 5
(Note: Output rows will be random)
---
#DataAnalysis #Pandas #DataCleaning #Manipulation
Part 3: Pandas - Data Cleaning & Manipulation
#26.
df.dropna()
Removes missing values.
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [1, np.nan, 3]})
print(df.dropna())
A
0 1.0
2 3.0
#27.
df.fillna()
Fills missing (NA/NaN) values using a specified method.
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [1, np.nan, 3]})
print(df.fillna(0))
A
0 1.0
1 0.0
2 3.0
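fillna(0) replaces every gap with a constant. To carry the last valid observation forward instead, df.ffill() can be used (the older fillna(method='ffill') form is deprecated as of pandas 2.1). A short sketch:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [1, np.nan, 3]})
# Forward fill: each NaN takes the previous non-missing value in its column
filled = df.ffill()
print(filled)
```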
#28.
df.astype()
Casts a pandas object to a specified dtype.
import pandas as pd
df = pd.DataFrame({'A': [1.1, 2.7, 3.5]})
df['A'] = df['A'].astype(int)
print(df)
A
0 1
1 2
2 3
#29.
df.rename()
Alters axes labels.
import pandas as pd
df = pd.DataFrame({'a': [1], 'b': [2]})
df_renamed = df.rename(columns={'a': 'A', 'b': 'B'})
print(df_renamed)
A B
0 1 2
#30.
df.drop()
Drops specified labels from rows or columns.
import pandas as pd
df = pd.DataFrame({'A': [1], 'B': [2], 'C': [3]})
df_dropped = df.drop(columns=['B'])
print(df_dropped)
A C
0 1 3
#31.
pd.to_datetime()
Converts argument to datetime.
import pandas as pd
s = pd.Series(['2023-01-01', '2023-01-02'])
dt_s = pd.to_datetime(s)
print(dt_s)
0 2023-01-01
1 2023-01-02
dtype: datetime64[ns]
#32.
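df.apply()
Applies a function along an axis of the DataFrame (axis=1 passes each row to the function).
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3]})
# axis=1: the lambda receives one row at a time as a Series
df['B'] = df.apply(lambda row: row['A'] * 2, axis=1)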
print(df)
A B
0 1 2
1 2 4
2 3 6
#33.
df['col'].map()
Maps values of a Series according to an input mapping or function.
import pandas as pd
df = pd.DataFrame({'Gender': ['M', 'F', 'M']})
df['Gender_Full'] = df['Gender'].map({'M': 'Male', 'F': 'Female'})
print(df)
Gender Gender_Full
0 M Male
1 F Female
2 M Male
#34.
df.replace()
Replaces values given in to_replace with value.
import pandas as pd
df = pd.DataFrame({'Score': [10, -99, 15, -99]})
df_replaced = df.replace(-99, 0)
print(df_replaced)
Score
0 10
1 0
2 15
3 0
#35.
df.duplicated()
Returns a boolean Series denoting duplicate rows.
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 1], 'B': ['a', 'b', 'a']})
print(df.duplicated())
0 False
1 False
2 True
dtype: bool
#36.
df.drop_duplicates()
Returns a DataFrame with duplicate rows removed.
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 1], 'B': ['a', 'b', 'a']})
print(df.drop_duplicates())
A B
0 1 a
1 2 b
#37.
df.sort_values()
Sorts by the values along either axis.
import pandas as pd
df = pd.DataFrame({'Age': [25, 22, 30]})
print(df.sort_values(by='Age'))
Age
1 22
0 25
2 30
#38.
df.sort_index()
Sorts object by labels (along an axis).
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3]}, index=[10, 5, 8])
print(df.sort_index())
A
5 2
8 3
10 1
#39.
pd.cut()
Bins values into discrete intervals.
import pandas as pd
ages = pd.Series([22, 35, 58, 8, 42])
age_bins = pd.cut(ages, bins=[0, 18, 35, 60], labels=['Child', 'Adult', 'Senior'])
print(age_bins)
0 Adult
1 Adult
2 Senior
3 Child
4 Senior
dtype: category
Categories (3, object): ['Child' < 'Adult' < 'Senior']
#40.
pd.qcut()
Quantile-based discretization function (bins values into groups holding roughly equal numbers of observations).
import pandas as pd
data = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
quartiles = pd.qcut(data, 4, labels=False)
print(quartiles)
0 0
1 0
2 0
3 1
4 1
5 2
6 2
7 3
8 3
9 3
dtype: int64
#41.
s.str.contains()
Tests if a pattern or regex is contained within a string of a Series.
import pandas as pd
s = pd.Series(['apple', 'banana', 'apricot'])
print(s[s.str.contains('ap')])
0 apple
2 apricot
dtype: object
#42.
s.str.split()
Splits strings around a given separator/delimiter.
import pandas as pd
s = pd.Series(['a_b', 'c_d'])
print(s.str.split('_', expand=True))
0 1
0 a b
1 c d
#43.
s.str.lower()
Converts strings in the Series to lowercase.
import pandas as pd
s = pd.Series(['HELLO', 'World'])
print(s.str.lower())
0 hello
1 world
dtype: object
#44.
s.str.strip()
Removes leading and trailing whitespace.
import pandas as pd
s = pd.Series([' hello ', ' world '])
print(s.str.strip())
0 hello
1 world
dtype: object
#45.
s.dt.year
Extracts the year from a datetime Series.
import pandas as pd
s = pd.to_datetime(pd.Series(['2023-01-01', '2024-05-10']))
print(s.dt.year)
0 2023
1 2024
dtype: int64
---
#DataAnalysis #Pandas #Grouping #Aggregation
Part 4: Pandas - Grouping & Aggregation
#46.
df.groupby()
Groups a DataFrame using a mapper or by a Series of columns.
import pandas as pd
df = pd.DataFrame({'Team': ['A', 'B', 'A', 'B'], 'Points': [10, 8, 12, 6]})
grouped = df.groupby('Team')
print(grouped)
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x...>
#47.
groupby.agg()
Aggregates using one or more operations over the specified axis.
import pandas as pd
df = pd.DataFrame({'Team': ['A', 'B', 'A', 'B'], 'Points': [10, 8, 12, 6]})
agg_df = df.groupby('Team').agg(['mean', 'sum'])
print(agg_df)
Points
mean sum
Team
A 11.0 22
B 7.0 14
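When flat, descriptive column names are preferable to the two-level header above, agg also accepts named aggregations (available since pandas 0.25). A sketch on the same data:

```python
import pandas as pd

df = pd.DataFrame({'Team': ['A', 'B', 'A', 'B'], 'Points': [10, 8, 12, 6]})
# Each keyword becomes an output column: name=(input column, aggregation)
summary = df.groupby('Team').agg(total=('Points', 'sum'), average=('Points', 'mean'))
print(summary)
```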
#48.
groupby.size()
Computes group sizes.
import pandas as pd
df = pd.DataFrame({'Team': ['A', 'B', 'A', 'B', 'A']})
print(df.groupby('Team').size())
Team
A 3
B 2
dtype: int64
#49.
groupby.count()
Computes the count of non-NA cells for each group.
import pandas as pd
import numpy as np
df = pd.DataFrame({'Team': ['A', 'B', 'A'], 'Score': [1, np.nan, 3]})
print(df.groupby('Team').count())
Score
Team
A 2
B 0
#50.
groupby.mean()
Computes the mean of group values.
import pandas as pd
df = pd.DataFrame({'Team': ['A', 'B', 'A', 'B'], 'Points': [10, 8, 12, 6]})
print(df.groupby('Team').mean())
Points
Team
A 11.0
B 7.0
#51.
groupby.sum()
Computes the sum of group values.
import pandas as pd
df = pd.DataFrame({'Team': ['A', 'B', 'A', 'B'], 'Points': [10, 8, 12, 6]})
print(df.groupby('Team').sum())
Points
Team
A 22
B 14
#52.
groupby.min()
Computes the minimum of group values.
import pandas as pd
df = pd.DataFrame({'Team': ['A', 'B', 'A', 'B'], 'Points': [10, 8, 12, 6]})
print(df.groupby('Team').min())
Points
Team
A 10
B 6
#53.
groupby.max()
Computes the maximum of group values.
import pandas as pd
df = pd.DataFrame({'Team': ['A', 'B', 'A', 'B'], 'Points': [10, 8, 12, 6]})
print(df.groupby('Team').max())
Points
Team
A 12
B 8
#54.
df.pivot_table()
Creates a spreadsheet-style pivot table as a DataFrame.
import pandas as pd
df = pd.DataFrame({'A': ['foo', 'foo', 'bar'], 'B': ['one', 'two', 'one'], 'C': [1, 2, 3]})
pivot = df.pivot_table(values='C', index='A', columns='B')
print(pivot)
B one two
A
bar 3.0 NaN
foo 1.0 2.0
#55.
pd.crosstab()
Computes a cross-tabulation of two (or more) factors.
import pandas as pd
df = pd.DataFrame({'A': ['foo', 'foo', 'bar'], 'B': ['one', 'two', 'one']})
crosstab = pd.crosstab(df.A, df.B)
print(crosstab)
B one two
A
bar 1 0
foo 1 1
---
#DataAnalysis #Pandas #Merging #Joining
Part 5: Pandas - Merging & Concatenating
#56.
pd.merge()
Merges DataFrame or named Series objects with a database-style join.
import pandas as pd
df1 = pd.DataFrame({'key': ['A', 'B'], 'val1': [1, 2]})
df2 = pd.DataFrame({'key': ['A', 'B'], 'val2': [3, 4]})
merged = pd.merge(df1, df2, on='key')
print(merged)
key val1 val2
0 A 1 3
1 B 2 4
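pd.merge defaults to an inner join, so keys missing from either side are dropped. The how parameter selects the join type, and indicator=True adds a _merge column recording where each row came from. A sketch with one unmatched key on each side:

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['A', 'B'], 'val1': [1, 2]})
df2 = pd.DataFrame({'key': ['A', 'C'], 'val2': [3, 4]})
# Left join keeps every row of df1; 'B' has no partner in df2, so its val2 is NaN
left = pd.merge(df1, df2, on='key', how='left', indicator=True)
print(left)
```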
#57.
pd.concat()
Concatenates pandas objects along a particular axis.
import pandas as pd
df1 = pd.DataFrame({'A': [1, 2]})
df2 = pd.DataFrame({'A': [3, 4]})
concatenated = pd.concat([df1, df2])
print(concatenated)
A
0 1
1 2
0 3
1 4
#58.
df.join()
Joins columns with other DataFrame(s) on index or on a key column.
import pandas as pd
df1 = pd.DataFrame({'val1': [1, 2]}, index=['A', 'B'])
df2 = pd.DataFrame({'val2': [3, 4]}, index=['A', 'B'])
joined = df1.join(df2)
print(joined)
val1 val2
A 1 3
B 2 4
#59.
pd.get_dummies()
Converts categorical variable into dummy/indicator variables (one-hot encoding).
import pandas as pd
s = pd.Series(list('abca'))
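# dtype=int keeps the 0/1 output below; since pandas 2.0 the default dtype is bool (True/False)
dummies = pd.get_dummies(s, dtype=int)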
print(dummies)
a b c
0 1 0 0
1 0 1 0
2 0 0 1
3 1 0 0
#60.
df.nlargest()
Returns the first n rows ordered by columns in descending order.
import pandas as pd
df = pd.DataFrame({'population': [100, 500, 200, 800]})
print(df.nlargest(2, 'population'))
population
3 800
1 500
---
#DataAnalysis #NumPy #Arrays
Part 6: NumPy - Array Creation & Manipulation
#61.
np.array()
Creates a NumPy ndarray.
import numpy as np
arr = np.array([1, 2, 3])
print(arr)
[1 2 3]
#62.
np.arange()
Returns an array with evenly spaced values within a given interval.
import numpy as np
arr = np.arange(0, 5)
print(arr)
[0 1 2 3 4]
#63.
np.linspace()
Returns an array with evenly spaced numbers over a specified interval.
import numpy as np
arr = np.linspace(0, 10, 5)
print(arr)
[ 0. 2.5 5. 7.5 10. ]
#64.
np.zeros()
Returns a new array of a given shape and type, filled with zeros.
import numpy as np
arr = np.zeros((2, 3))
print(arr)
[[0. 0. 0.]
[0. 0. 0.]]
#65.
np.ones()
Returns a new array of a given shape and type, filled with ones.
import numpy as np
arr = np.ones((2, 3))
print(arr)
[[1. 1. 1.]
[1. 1. 1.]]
#66.
np.random.rand()
Creates an array of the given shape and populates it with random samples from a uniform distribution over [0, 1).
import numpy as np
arr = np.random.rand(2, 2)
print(arr)
[[0.13949386 0.2921446 ]
[0.52273283 0.77122228]]
(Note: Output values will be random)
#67.
arr.reshape()
Gives a new shape to an array without changing its data.
import numpy as np
arr = np.arange(6)
reshaped_arr = arr.reshape((2, 3))
print(reshaped_arr)
[[0 1 2]
[3 4 5]]
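One dimension can be given as -1, in which case NumPy infers it from the array's total size. A small sketch:

```python
import numpy as np

arr = np.arange(6)
# -1 asks NumPy to compute this dimension: 6 elements / 3 rows = 2 columns
inferred = arr.reshape(3, -1)
print(inferred.shape)  # (3, 2)
```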
#68.
np.concatenate()
Joins a sequence of arrays along an existing axis.
import numpy as np
a = np.array([[1, 2]])
b = np.array([[3, 4]])
print(np.concatenate((a, b), axis=0))
[[1 2]
[3 4]]
#69.
np.vstack()
Stacks arrays in sequence vertically (row wise).
import numpy as np
a = np.array([1, 2])
b = np.array([3, 4])
print(np.vstack((a, b)))
[[1 2]
[3 4]]
#70.
np.hstack()
Stacks arrays in sequence horizontally (column wise).
import numpy as np
a = np.array([1, 2])
b = np.array([3, 4])
print(np.hstack((a, b)))
[1 2 3 4]
---
#DataAnalysis #NumPy #Math #Statistics
Part 7: NumPy - Mathematical & Statistical Functions
#71.
np.mean()
Computes the arithmetic mean along the specified axis.
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(np.mean(arr))
3.0
#72.
np.median()
Computes the median along the specified axis.
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(np.median(arr))
3.0
#73.
np.std()
Computes the standard deviation along the specified axis (by default the population standard deviation, ddof=0).
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(np.std(arr))
1.4142135623730951
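np.std divides by n (ddof=0), whereas pandas' .std(), and therefore the std row of df.describe() in #5, divides by n - 1 (ddof=1). That is why the values 1 to 5 give 1.414... here but 1.581139 there. Passing ddof=1 reconciles the two:

```python
import numpy as np
import pandas as pd

arr = np.array([1, 2, 3, 4, 5])
population_std = np.std(arr)      # divides by n
sample_std = np.std(arr, ddof=1)  # divides by n - 1, like pandas
print(population_std, sample_std, pd.Series(arr).std())
```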
#74.
np.sum()
Sums array elements over a given axis.
import numpy as np
arr = np.array([[1, 2], [3, 4]])
print(np.sum(arr))
10
#75.
np.min()
Returns the minimum of an array or minimum along an axis.
import numpy as np
arr = np.array([5, 2, 8, 1])
print(np.min(arr))
1
#76.
np.max()
Returns the maximum of an array or maximum along an axis.
import numpy as np
arr = np.array([5, 2, 8, 1])
print(np.max(arr))
8
#77.
np.sqrt()
Returns the non-negative square-root of an array, element-wise.
import numpy as np
arr = np.array([4, 9, 16])
print(np.sqrt(arr))
[2. 3. 4.]
#78.
np.log()
Calculates the natural logarithm, element-wise.
import numpy as np
arr = np.array([1, np.e, np.e**2])
print(np.log(arr))
[0. 1. 2.]
#79.
np.dot()
Calculates the dot product of two arrays.
import numpy as np
a = np.array([1, 2])
b = np.array([3, 4])
print(np.dot(a, b))
11
#80.
np.where()
Returns elements chosen from x or y depending on a condition.
import numpy as np
arr = np.array([10, 5, 20, 15])
print(np.where(arr > 12, 'High', 'Low'))
['Low' 'Low' 'High' 'High']
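Called with only a condition, np.where returns the indices where the condition holds instead of choosing between two values:

```python
import numpy as np

arr = np.array([10, 5, 20, 15])
# Single-argument form returns a tuple of index arrays, one per dimension
(idx,) = np.where(arr > 12)
print(idx)  # [2 3]
```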
---
#DataAnalysis #Matplotlib #Seaborn #Visualization
Part 8: Matplotlib & Seaborn - Data Visualization
#81.
plt.plot()
Plots y versus x as lines and/or markers.
import matplotlib.pyplot as plt
plt.plot([1, 2, 3, 4], [1, 4, 9, 16])
# In a real script, you would call plt.show()
print("Output: A figure window opens displaying a line plot.")
Output: A figure window opens displaying a line plot.
#82.
plt.scatter()
A scatter plot of y vs. x with varying marker size and/or color.
import matplotlib.pyplot as plt
plt.scatter([1, 2, 3, 4], [1, 4, 9, 16])
print("Output: A figure window opens displaying a scatter plot.")
Output: A figure window opens displaying a scatter plot.
#83.
plt.hist()
Computes and draws the histogram of x.
import matplotlib.pyplot as plt
import numpy as np
data = np.random.randn(1000)
plt.hist(data, bins=30)
print("Output: A figure window opens displaying a histogram.")
Output: A figure window opens displaying a histogram.
#84.
plt.bar()
Makes a bar plot.
import matplotlib.pyplot as plt
plt.bar(['A', 'B', 'C'], [10, 15, 7])
print("Output: A figure window opens displaying a bar chart.")
Output: A figure window opens displaying a bar chart.
#85.
plt.boxplot()
Makes a box and whisker plot.
import matplotlib.pyplot as plt
import numpy as np
data = [np.random.normal(0, std, 100) for std in range(1, 4)]
plt.boxplot(data)
print("Output: A figure window opens displaying a box plot.")
Output: A figure window opens displaying a box plot.
#86.
sns.heatmap()
Plots rectangular data as a color-encoded matrix.
import seaborn as sns
import numpy as np
data = np.random.rand(10, 12)
sns.heatmap(data)
print("Output: A figure window opens displaying a heatmap.")
Output: A figure window opens displaying a heatmap.
#87.
sns.pairplot()
Plots pairwise relationships in a dataset.
import seaborn as sns
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(100, 4), columns=['A', 'B', 'C', 'D'])
# sns.pairplot(df) # This line would generate the plot
print("Output: A figure grid opens showing scatterplots for each pair of variables.")
Output: A figure grid opens showing scatterplots for each pair of variables.
#88.
sns.countplot()
Shows the counts of observations in each categorical bin using bars.
import seaborn as sns
import pandas as pd
df = pd.DataFrame({'category': ['A', 'B', 'A', 'C', 'A', 'B']})
sns.countplot(x='category', data=df)
print("Output: A figure window opens showing a count plot.")
Output: A figure window opens showing a count plot.
#89.
sns.jointplot()
Draws a plot of two variables with bivariate and univariate graphs.
import seaborn as sns
import pandas as pd
import numpy as np
df = pd.DataFrame({'x': np.arange(50), 'y': np.arange(50) + np.random.randn(50)})
# sns.jointplot(x='x', y='y', data=df) # This line would generate the plot
print("Output: A figure shows a scatter plot with histograms for each axis.")
Output: A figure shows a scatter plot with histograms for each axis.
#90.
plt.show()
Displays all open figures.
import matplotlib.pyplot as plt
plt.plot([1, 2, 3])
# plt.show() # In a script, this is essential to see the plot.
print("Executes the command to render and display the plot.")
Executes the command to render and display the plot.
---
#DataAnalysis #ScikitLearn #Modeling #Preprocessing
Part 9: Scikit-learn - Modeling & Preprocessing
#91.
train_test_split()
Splits arrays or matrices into random train and test subsets.
from sklearn.model_selection import train_test_split
import numpy as np
X, y = np.arange(10).reshape((5, 2)), range(5)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
print(f"X_train shape: {X_train.shape}")
print(f"X_test shape: {X_test.shape}")
X_train shape: (3, 2)
X_test shape: (2, 2)
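The shapes above follow from how the split size is rounded. Below is a minimal pure-Python sketch, not scikit-learn's code: `simple_split` is a hypothetical helper, and it assumes (as I understand scikit-learn's behavior) that a float `test_size` is rounded up, so 0.33 of 5 rows gives 2 test rows and 3 training rows.

```python
import math
import random

def simple_split(rows, test_size, seed=None):
    """Hypothetical helper (not part of scikit-learn): shuffle, then cut off a test set."""
    n = len(rows)
    n_test = math.ceil(test_size * n)  # assumed rounding: 0.33 * 5 = 1.65 -> 2
    n_train = n - n_test               # the remaining 3 rows are for training
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # a fixed seed gives a reproducible split
    train = [rows[i] for i in indices[:n_train]]
    test = [rows[i] for i in indices[n_train:]]
    return train, test

rows = [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]
train, test = simple_split(rows, test_size=0.33, seed=42)
print(len(train), len(test))  # 3 2
```

In scikit-learn itself, passing `random_state` to `train_test_split` plays the role of `seed` here.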
#92.
StandardScaler()
Standardizes features by removing the mean and scaling to unit variance.
from sklearn.preprocessing import StandardScaler
data = [[0, 0], [0, 0], [1, 1], [1, 1]]
scaler = StandardScaler()
print(scaler.fit_transform(data))
[[-1. -1.]
[-1. -1.]
[ 1. 1.]
[ 1. 1.]]
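StandardScaler applies z = (x - mean) / std to each column separately, and scikit-learn documents that it uses the population standard deviation (ddof=0). The sketch below reproduces the output above with the standard library only; `standardize_column` is a hypothetical helper, and mapping a zero standard deviation to 1 is my approximation of how scikit-learn treats constant features.

```python
import statistics

def standardize_column(values):
    """Hypothetical helper: z = (x - mean) / std for one feature column."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0  # population std; assumed fallback of 1 for constant columns
    return [(v - mean) / std for v in values]

data = [[0, 0], [0, 0], [1, 1], [1, 1]]
columns = list(zip(*data))                                 # rows -> feature columns
scaled_columns = [standardize_column(col) for col in columns]
scaled_rows = [list(row) for row in zip(*scaled_columns)]  # columns -> rows again
print(scaled_rows)  # [[-1.0, -1.0], [-1.0, -1.0], [1.0, 1.0], [1.0, 1.0]]
```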
#93.
MinMaxScaler()
Transforms features by scaling each feature to a given range, typically [0, 1].
from sklearn.preprocessing import MinMaxScaler
data = [[-1, 2], [-0.5, 6], [0, 10], [1, 18]]
scaler = MinMaxScaler()
print(scaler.fit_transform(data))
[[0. 0. ]
[0.25 0.25]
[0.5 0.5 ]
[1. 1. ]]
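MinMaxScaler maps each column with (x - min) / (max - min) and then stretches the result into `feature_range` (a real MinMaxScaler parameter, default (0, 1)). A stdlib-only sketch that reproduces the output above; `min_max_column` is a hypothetical helper and the guard for constant columns is my own choice.

```python
def min_max_column(values, feature_range=(0.0, 1.0)):
    """Hypothetical helper: rescale one column linearly into feature_range."""
    lo, hi = feature_range
    v_min, v_max = min(values), max(values)
    span = (v_max - v_min) or 1.0  # assumed guard: avoid dividing by zero on constant columns
    return [lo + (v - v_min) / span * (hi - lo) for v in values]

data = [[-1, 2], [-0.5, 6], [0, 10], [1, 18]]
columns = list(zip(*data))
scaled_rows = [list(row) for row in zip(*(min_max_column(c) for c in columns))]
print(scaled_rows)  # [[0.0, 0.0], [0.25, 0.25], [0.5, 0.5], [1.0, 1.0]]
```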
#94.
LabelEncoder()
Encodes target labels with values between 0 and n_classes-1.
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
encoded = le.fit_transform(['paris', 'tokyo', 'paris'])
print(encoded)
[0 1 0]
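LabelEncoder learns the sorted unique labels (exposed as `classes_`) and replaces each label with its position in that list, which is why 'paris' becomes 0 and 'tokyo' becomes 1. A minimal sketch of that mapping; `label_encode` is a hypothetical helper, not a scikit-learn function.

```python
def label_encode(labels):
    """Hypothetical helper mirroring LabelEncoder.fit_transform plus classes_."""
    classes = sorted(set(labels))  # sorted unique labels, like classes_
    index = {label: i for i, label in enumerate(classes)}
    return [index[label] for label in labels], classes

codes, classes = label_encode(['paris', 'tokyo', 'paris'])
print(codes, classes)  # [0, 1, 0] ['paris', 'tokyo']
decoded = [classes[c] for c in codes]  # the reverse lookup, like inverse_transform
```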
#95.
OneHotEncoder()
Encodes categorical features as a one-hot numeric array.
from sklearn.preprocessing import OneHotEncoder
enc = OneHotEncoder()
X = [['Male'], ['Female'], ['Female']]
print(enc.fit_transform(X).toarray())
[[0. 1.]
[1. 0.]
[1. 0.]]
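With its default settings, OneHotEncoder sorts the categories it finds in each column, so 'Female' gets the first column and 'Male' the second; `fit_transform` returns a sparse matrix by default, hence the `.toarray()` call above. The sketch below (hypothetical `one_hot` helper, single column only) reproduces the dense result.

```python
def one_hot(column):
    """Hypothetical helper: one-hot encode a single categorical column."""
    categories = sorted(set(column))  # sorted, so 'Female' comes before 'Male'
    rows = [[1.0 if value == cat else 0.0 for cat in categories] for value in column]
    return rows, categories

rows, categories = one_hot(['Male', 'Female', 'Female'])
print(categories)  # ['Female', 'Male']
print(rows)        # [[0.0, 1.0], [1.0, 0.0], [1.0, 0.0]]
```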
#96.
LinearRegression()
Ordinary least squares Linear Regression model.
from sklearn.linear_model import LinearRegression
X = [[0], [1], [2]]
y = [0, 1, 2]
reg = LinearRegression().fit(X, y)
print(f"Coefficient: {reg.coef_[0]}")
Coefficient: 1.0
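For a single feature, the ordinary least squares fit has a closed form: slope = cov(x, y) / var(x) and intercept = mean(y) - slope * mean(x). The sketch below applies it to the same data; `simple_ols` is a hypothetical helper, whereas scikit-learn handles the general multi-feature case with a least-squares solver. Its results correspond to `reg.coef_[0]` and `reg.intercept_`.

```python
def simple_ols(xs, ys):
    """Hypothetical helper: closed-form least squares for one feature."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = simple_ols([0, 1, 2], [0, 1, 2])
print(slope, intercept)  # 1.0 0.0
```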
#97.
LogisticRegression()
Implements Logistic Regression for classification.
from sklearn.linear_model import LogisticRegression
X = [[-1], [0], [1], [2]]
y = [0, 0, 1, 1]
clf = LogisticRegression().fit(X, y)
print(f"Prediction for [[-2]]: {clf.predict([[-2]])}")
Prediction for [[-2]]: [0]
#98.
KMeans()
K-Means clustering algorithm.
from sklearn.cluster import KMeans
X = [[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]]
kmeans = KMeans(n_clusters=2, n_init='auto').fit(X)
print(kmeans.labels_)
[0 0 0 1 1 1]
(Note: Cluster labels may be flipped, e.g., [1 1 1 0 0 0])
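KMeans alternates two steps until the assignments stop changing: assign every point to its nearest centroid, then move each centroid to the mean of its points (Lloyd's algorithm). The sketch below is a simplified version: the deterministic farthest-point initialization is my stand-in for scikit-learn's default k-means++ initialization with restarts, and `kmeans` is a hypothetical helper. Label numbering depends on the initialization, which is why labels may come out flipped.

```python
import math

def kmeans(points, k, max_iter=100):
    """Hypothetical helper: Lloyd's algorithm with farthest-point initialization."""
    # Assumed initialization: the first point, then repeatedly the point
    # farthest from all centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(farthest)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(coord) / len(members) for coord in zip(*members)]
    return labels, centroids

X = [[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]]
labels, centroids = kmeans(X, k=2)
print(labels)     # [0, 0, 0, 1, 1, 1]
print(centroids)  # [[1.0, 2.0], [10.0, 2.0]]
```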
#99.
accuracy_score()
Calculates the accuracy classification score.
from sklearn.metrics import accuracy_score
y_true = [0, 1, 1, 0]
y_pred = [0, 1, 0, 0]
print(accuracy_score(y_true, y_pred))
0.75
#100.
confusion_matrix()
Computes a confusion matrix to evaluate the accuracy of a classification.
from sklearn.metrics import confusion_matrix
y_true = [0, 1, 0, 1]
y_pred = [1, 1, 0, 1]
print(confusion_matrix(y_true, y_pred))
[[1 1]
[0 2]]
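In scikit-learn's confusion matrix, entry [i][j] counts samples whose true label is i and whose predicted label is j, so the first row [1 1] says one true 0 was predicted correctly and one was misclassified as 1. A small counting sketch (hypothetical `confusion` helper) that also recovers accuracy as the diagonal sum divided by the total:

```python
def confusion(y_true, y_pred, labels):
    """Hypothetical helper: rows are true labels, columns are predicted labels."""
    pos = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[pos[t]][pos[p]] += 1
    return matrix

m = confusion([0, 1, 0, 1], [1, 1, 0, 1], labels=[0, 1])
print(m)  # [[1, 1], [0, 2]]
accuracy = sum(m[i][i] for i in range(len(m))) / sum(map(sum, m))  # correct / total
print(accuracy)  # 0.75
```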
By: @CodeProgrammer ✨
Top 50 Pillow Operations for Image Processing
I. File & Basic Operations
• Open an image file.
from PIL import Image
img = Image.open("image.jpg")
• Save an image.
img.save("new_image.png")  # output format is inferred from the file extension
• Display an image (opens in default viewer).
img.show()
• Create a new blank image.
new_img = Image.new("RGB", (200, 100), "blue")
• Get image format (e.g., 'JPEG').
print(img.format)
• Get image dimensions as a (width, height) tuple.
width, height = img.size
• Get pixel format (e.g., 'RGB', 'L' for grayscale).
print(img.mode)
• Convert image mode.
grayscale_img = img.convert("L")
• Get a pixel's color value at (x, y).
r, g, b = img.getpixel((10, 20))  # unpacking into three values assumes an RGB image
• Set a pixel's color value at (x, y).
img.putpixel((10, 20), (255, 0, 0))
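Pillow's documentation states that `convert("L")` uses the ITU-R 601-2 luma transform, L = R * 299/1000 + G * 587/1000 + B * 114/1000. This per-pixel sketch (`luma_601` is a hypothetical helper) applies that formula; Pillow's own integer arithmetic may round slightly differently.

```python
def luma_601(r, g, b):
    """Hypothetical helper: ITU-R 601-2 luma for one RGB pixel, rounded to 0-255."""
    return round(r * 299 / 1000 + g * 587 / 1000 + b * 114 / 1000)

print(luma_601(255, 0, 0))      # 76: pure red is fairly dark in grayscale
print(luma_601(255, 255, 255))  # 255
```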
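II. Cropping, Resizing & Pasting
• Crop a rectangular region.
box = (100, 100, 400, 400)  # (left, upper, right, lower) in pixels
cropped_img = img.crop(box)
• Resize an image to an exact size.
resized_img = img.resize((200, 200))  # ignores the original aspect ratio
• Create a thumbnail (maintains aspect ratio).
img.thumbnail((128, 128))  # modifies img in place and returns None
• Paste one image onto another.
img.paste(another_img, (50, 50))  # another_img: a second Image loaded earlier; (50, 50) is its upper-left corner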
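Unlike `resize()`, `thumbnail()` keeps the aspect ratio and never enlarges the image: it scales by the smaller of the width and height ratios. A sketch of that size calculation; `fit_within` is a hypothetical helper, and Pillow's exact rounding may differ by a pixel.

```python
def fit_within(size, max_size):
    """Hypothetical helper: largest same-aspect size inside max_size, never upscaling."""
    w, h = size
    max_w, max_h = max_size
    scale = min(max_w / w, max_h / h, 1.0)  # the 1.0 cap mirrors thumbnail() not enlarging
    return max(1, round(w * scale)), max(1, round(h * scale))

print(fit_within((400, 300), (128, 128)))  # (128, 96)
print(fit_within((64, 32), (128, 128)))    # (64, 32): already small enough
```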
III. Rotation & Transformation
• Rotate an image (counter-clockwise).
rotated_img = img.rotate(45, expand=True)  # degrees; expand=True enlarges the canvas so corners are not cut off
• Flip an image horizontally.
flipped_img = img.transpose(Image.FLIP_LEFT_RIGHT)
• Flip an image vertically.
flipped_img = img.transpose(Image.FLIP_TOP_BOTTOM)
• Rotate by 90, 180, or 270 degrees.
img_90 = img.transpose(Image.ROTATE_90)  # also Image.ROTATE_180 and Image.ROTATE_270
• Apply an affine transformation.
transformed = img.transform(img.size, Image.AFFINE, (1, 0.5, 0, 0, 1, 0))
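For `Image.AFFINE`, Pillow documents the six values (a, b, c, d, e, f) as an inverse mapping: each output pixel (x, y) takes its value from input position (a*x + b*y + c, d*x + e*y + f). With (1, 0.5, 0, 0, 1, 0), output row y samples the input 0.5*y pixels further right, which shears the image. A sketch of just that coordinate mapping; `affine_source` is a hypothetical helper.

```python
def affine_source(x, y, coeffs):
    """Hypothetical helper: input position sampled for output pixel (x, y)."""
    a, b, c, d, e, f = coeffs
    return a * x + b * y + c, d * x + e * y + f

shear = (1, 0.5, 0, 0, 1, 0)
print(affine_source(0, 0, shear))    # top-left output pixel samples input (0, 0)
print(affine_source(10, 20, shear))  # output (10, 20) samples input x = 20, y = 20
```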
IV. ImageOps Module Helpers
• Invert image colors.
from PIL import ImageOps
inverted_img = ImageOps.invert(img)
• Flip an image horizontally (mirror).
mirrored_img = ImageOps.mirror(img)
• Flip an image vertically.
flipped_v_img = ImageOps.flip(img)
• Convert to grayscale.
grayscale = ImageOps.grayscale(img)
• Colorize a grayscale image.
colorized = ImageOps.colorize(grayscale, black="blue", white="yellow")
• Reduce the number of bits for each color channel.
posterized = ImageOps.posterize(img, 4)  # keep 4 of the 8 bits per channel
• Auto-adjust image contrast.
adjusted_img = ImageOps.autocontrast(img)
• Equalize the image histogram.
equalized_img = ImageOps.equalize(img)
• Add a border to an image.
bordered = ImageOps.expand(img, border=10, fill='black')
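`ImageOps.invert` and `ImageOps.posterize` both act on each channel value independently, like a 256-entry lookup table. As far as I know, posterize keeps the top `bits` bits of each 8-bit value by masking off the rest, so with 4 bits only 16 levels remain. A per-value sketch; `posterize_value` and `invert_value` are hypothetical helpers.

```python
def posterize_value(v, bits):
    """Hypothetical helper: keep only the top `bits` bits of an 8-bit value."""
    mask = ~(2 ** (8 - bits) - 1)  # bits=4 clears the low 4 bits
    return v & mask

def invert_value(v):
    """Hypothetical helper: invert one 8-bit channel value."""
    return 255 - v

posterize_lut = [posterize_value(i, 4) for i in range(256)]  # one entry per possible value
print(posterize_lut[200], invert_value(200))  # 192 55
```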
V. Color & Pixel Operations
• Split image into individual bands (e.g., R, G, B).
r, g, b = img.split()  # three bands for an RGB image
• Merge bands back into an image.
merged_img = Image.merge("RGB", (r, g, b))
• Apply a function to each pixel.
brighter_img = img.point(lambda i: i * 1.2)  # called once per possible channel value to build a lookup table
• Get a list of colors used in the image.
colors = img.getcolors(maxcolors=256)  # list of (count, color) pairs, or None if there are more than 256 colors
• Blend two images with alpha compositing.
# Both images must be in RGBA mode and the same size
blended = Image.alpha_composite(img1_rgba, img2_rgba)  # composites img2_rgba over img1_rgba
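`Image.alpha_composite(im1, im2)` places `im2` over `im1`. Per pixel this is the standard Porter-Duff "over" operation; the sketch below implements that formula for a single RGBA pixel. `over` is a hypothetical helper, and Pillow's integer rounding may differ by one level.

```python
def over(top, bottom):
    """Hypothetical helper: composite RGBA pixel `top` over `bottom` (channels 0-255)."""
    ta, ba = top[3] / 255, bottom[3] / 255
    out_a = ta + ba * (1 - ta)  # combined coverage
    if out_a == 0:
        return (0, 0, 0, 0)  # both pixels fully transparent
    rgb = tuple(
        round((tc * ta + bc * ba * (1 - ta)) / out_a)
        for tc, bc in zip(top[:3], bottom[:3])
    )
    return rgb + (round(out_a * 255),)

# Half-transparent red over opaque blue:
print(over((255, 0, 0, 128), (0, 0, 255, 255)))  # (128, 0, 127, 255)
```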
VI. Filters (ImageFilter)
• Apply a simple blur filter.
from PIL import ImageFilter
blurred_img = img.filter(ImageFilter.BLUR)
• Apply a box blur with a given radius.
box_blur = img.filter(ImageFilter.BoxBlur(5))
• Apply a Gaussian blur.
gaussian_blur = img.filter(ImageFilter.GaussianBlur(radius=2))
• Sharpen the image.
sharpened = img.filter(ImageFilter.SHARPEN)
• Find edges.
edges = img.filter(ImageFilter.FIND_EDGES)
• Enhance edges.
edge_enhanced = img.filter(ImageFilter.EDGE_ENHANCE)
• Emboss the image.
embossed = img.filter(ImageFilter.EMBOSS)
• Find contours.
contours = img.filter(ImageFilter.CONTOUR)
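`ImageFilter.BoxBlur(radius)` is documented as setting each pixel to the average of a square box extending `radius` pixels in each direction. This grayscale sketch shows that averaging on a nested list; `box_blur` is a hypothetical helper, and clamping coordinates at the border is my assumption about edge handling.

```python
def box_blur(pixels, radius=1):
    """Hypothetical helper: average each pixel over a (2*radius+1)^2 box."""
    h, w = len(pixels), len(pixels[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy = min(max(y + dy, 0), h - 1)  # assumed: clamp to the image border
                    xx = min(max(x + dx, 0), w - 1)
                    total += pixels[yy][xx]
            out[y][x] = total / (2 * radius + 1) ** 2
    return out

image = [[0, 0, 0], [0, 90, 0], [0, 0, 0]]  # one bright pixel on black
blurred = box_blur(image)
print(blurred[1][1])  # 10.0: the bright value is spread over its neighbourhood
```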
VII. Image Enhancement (ImageEnhance)
• Adjust color saturation.
from PIL import ImageEnhance
enhancer = ImageEnhance.Color(img)
vibrant_img = enhancer.enhance(2.0)
• Adjust brightness.
enhancer = ImageEnhance.Brightness(img)
bright_img = enhancer.enhance(1.5)
• Adjust contrast.
enhancer = ImageEnhance.Contrast(img)
contrast_img = enhancer.enhance(1.5)
• Adjust sharpness.
enhancer = ImageEnhance.Sharpness(img)
sharp_img = enhancer.enhance(2.0)
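Each ImageEnhance class is documented against a reference image: a factor of 0.0 gives a black image (Brightness), a solid gray image (Contrast), a black-and-white image (Color) or a blurred image (Sharpness), and 1.0 gives the original. My understanding is that `enhance(factor)` interpolates between that reference and the original, extrapolating when the factor is above 1. This per-value sketch illustrates the idea; `enhance_value` is a hypothetical helper and the gray level 128 is an illustrative value.

```python
def enhance_value(original, degenerate, factor):
    """Hypothetical helper: move from the degenerate value toward (or past) the original."""
    value = degenerate + factor * (original - degenerate)
    return max(0, min(255, round(value)))  # clip to the 8-bit range

# Brightness: the reference is black (0), so the factor simply scales the value.
print(enhance_value(100, 0, 1.5))    # 150
# Contrast: the reference is a flat gray (assumed 128 here); values move away from it.
print(enhance_value(200, 128, 1.5))  # 236
print(enhance_value(200, 128, 0.0))  # 128
```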
VIII. Drawing (ImageDraw & ImageFont)
• Draw text on an image.
from PIL import ImageDraw, ImageFont
draw = ImageDraw.Draw(img)  # draws directly onto img
font = ImageFont.truetype("arial.ttf", 36)  # raises OSError if the font file is not found; ImageFont.load_default() is a fallback
draw.text((10, 10), "Hello", font=font, fill="red")
• Draw a line.
draw.line((0, 0, 100, 200), fill="blue", width=3)
• Draw a rectangle (outline).
draw.rectangle([10, 10, 90, 60], outline="green", width=2)
• Draw a filled ellipse.
draw.ellipse([100, 100, 180, 150], fill="yellow")
• Draw a polygon.
draw.polygon([(10,10), (20,50), (60,10)], fill="purple")
#Python #Pillow #ImageProcessing #PIL #CheatSheet
By: @CodeProgrammer ✨
Core Python Cheatsheet.pdf
173.3 KB
Python is a high-level, interpreted programming language known for its simplicity, readability, and versatility. It was first released in 1991 by Guido van Rossum and has since become one of the most popular programming languages in the world.
Python's syntax emphasizes readability: code is written in a clear, concise style, and indentation defines blocks of code. It is an interpreted language: the standard implementation, CPython, compiles source code to bytecode and executes that bytecode in its interpreter, rather than compiling it ahead of time into native machine code. This makes it easy to write and test code quickly, without needing to worry about the details of low-level hardware.
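You can watch the compile-then-interpret step from Python itself: the built-in `compile()` turns source text into a code object containing bytecode, and the standard-library `dis` module lists its instructions. A small sketch; opcode names vary between Python versions, so the list is printed rather than reproduced here.

```python
import dis

source = "x = 1 + 2\nprint(x)"
code = compile(source, "<example>", "exec")  # source text -> code object holding bytecode
opnames = [ins.opname for ins in dis.get_instructions(code)]
print(opnames)  # bytecode instruction names; the exact list depends on the Python version
exec(code)      # the interpreter then runs the bytecode, printing 3
```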
Python is a general-purpose language, meaning that it can be used for a wide variety of applications, from web development to scientific computing to artificial intelligence and machine learning. Its simplicity and ease of use make it a popular choice for beginners, while its power and flexibility make it a favorite of experienced developers.
Python's standard library contains a wide range of modules and packages, covering everything from core data types and text processing to file handling, networking, and data formats such as JSON and CSV. Beyond that, countless third-party packages, including libraries for advanced data manipulation and visualization, are published on the Python Package Index (PyPI) and can be installed with pip, Python's package installer, allowing developers to easily extend Python's capabilities to suit their needs.
Overall, Python's combination of simplicity, power, and flexibility makes it an ideal language for a wide range of applications and skill levels.
https://t.me/CodeProgrammer