Python | Machine Learning | Coding

Building a PDF <> EPUB Telegram Converter Bot

👇

Please open Telegram to view this post

❤4

841 viewsedited 09:18

#PDF #EPUB #TelegramBot #Python #SQLite #Project

Lesson: Building a PDF <> EPUB Telegram Converter Bot

This lesson walks you through creating a fully functional Telegram bot from scratch. The bot will accept PDF or EPUB files, convert them to the other format, and log each transaction in an SQLite database.

---

Part 1: Prerequisites & Setup

First, we need to install the necessary Python library for the Telegram Bot API. We will also rely on Calibre's command-line tools for conversion.

Important: You must install Calibre on the system where the bot will run and ensure its ebook-convert tool is in your system's PATH.

pip install python-telegram-bot==20.3

#Setup #Prerequisites

---

Part 2: Database Initialization

We'll use SQLite to log every successful conversion. Create a file named database_setup.py and run it once to create the database file and the table.

# database_setup.py
import sqlite3

def setup_database():
    conn = sqlite3.connect('conversions.db')
    cursor = conn.cursor()
    
    # Create table to store conversion logs
    cursor.execute('''
        CREATE TABLE IF NOT EXISTS conversions (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            user_id INTEGER NOT NULL,
            original_filename TEXT NOT NULL,
            converted_filename TEXT NOT NULL,
            conversion_type TEXT NOT NULL,
            timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
        )
    ''')
    
    conn.commit()
    conn.close()
    print("Database setup complete. 'conversions.db' is ready.")

if __name__ == '__main__':
    setup_database()

#Database #SQLite #Initialization

---

Part 3: The Main Bot Script - Imports & Basic Commands

Now, let's create our main bot file, converter_bot.py. We'll start with imports and the initial /start and /help commands.

# converter_bot.py
import logging
import os
import sqlite3
import subprocess
from telegram import Update
from telegram.ext import Application, CommandHandler, MessageHandler, filters, ContextTypes

# Enable logging
logging.basicConfig(format='%(asctime)s - %(name)s - %(levelname)s - %(message)s', level=logging.INFO)

# --- Bot Token ---
TELEGRAM_TOKEN = "YOUR_TELEGRAM_BOT_TOKEN"

# --- Command Handlers ---
async def start(update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
    user = update.effective_user
    await update.message.reply_html(
        rf"Hi {user.mention_html()}! Send me a PDF or EPUB file to convert.",
    )

async def help_command(update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
    await update.message.reply_text("Simply send a .pdf file to get an .epub, or send an .epub file to get a .pdf. Note: Conversion quality depends on the source file's structure.")

#TelegramBot #Python #Boilerplate

---

Part 4: The Core Conversion Logic

This function will be the heart of our bot. It uses the ebook-convert command-line tool (from Calibre) to perform the conversion. It's crucial that Calibre is installed correctly for this to work.

❤1

807 views09:19

Python | Machine Learning | Coding | R

# converter_bot.py (continued)

def run_conversion(input_path: str, output_path: str) -> bool:
    """Runs the ebook-convert command and returns True on success."""
    try:
        command = ['ebook-convert', input_path, output_path]
        result = subprocess.run(command, check=True, capture_output=True, text=True)
        logging.info(f"Calibre output: {result.stdout}")
        return True
    except FileNotFoundError:
        logging.error("CRITICAL: 'ebook-convert' command not found. Is Calibre installed and in the system's PATH?")
        return False
    except subprocess.CalledProcessError as e:
        logging.error(f"Conversion failed for {input_path}. Error: {e.stderr}")
        return False

#Conversion #Calibre #Subprocess

---

Part 5: Database Logging Function

This helper function will connect to our SQLite database and insert a new record for each successful conversion.

# converter_bot.py (continued)

def log_to_db(user_id: int, original_file: str, converted_file: str, conv_type: str):
    """Logs a successful conversion to the SQLite database."""
    try:
        conn = sqlite3.connect('conversions.db')
        cursor = conn.cursor()
        cursor.execute(
            "INSERT INTO conversions (user_id, original_filename, converted_filename, conversion_type) VALUES (?, ?, ?, ?)",
            (user_id, original_file, converted_file, conv_type)
        )
        conn.commit()
        conn.close()
    except sqlite3.Error as e:
        logging.error(f"Database error: {e}")

#Database #Logging #SQLite

---

Part 6: Handling Incoming Files

This is the main handler that will be triggered when a user sends a document. It downloads the file, determines the target format, calls the conversion function, sends the result back, logs it, and cleans up.

# converter_bot.py (continued)

async def handle_document(update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
    doc = update.message.document
    file_id = doc.file_id
    file_name = doc.file_name
    
    input_path = os.path.join("downloads", file_name)
    os.makedirs("downloads", exist_ok=True) # Ensure download directory exists
    
    new_file = await context.bot.get_file(file_id)
    await new_file.download_to_drive(input_path)
    
    await update.message.reply_text(f"Received '{file_name}'. Starting conversion...")

    output_path = ""
    conversion_type = ""

    if file_name.lower().endswith('.pdf'):
        output_path = input_path.rsplit('.', 1)[0] + '.epub'
        conversion_type = "PDF -> EPUB"
    elif file_name.lower().endswith('.epub'):
        output_path = input_path.rsplit('.', 1)[0] + '.pdf'
        conversion_type = "EPUB -> PDF"
    else:
        await update.message.reply_text("Sorry, I only support PDF and EPUB files.")
        os.remove(input_path)
        return

    # Run the conversion
    success = run_conversion(input_path, output_path)
    
    if success and os.path.exists(output_path):
        await update.message.reply_text("Conversion successful! Uploading your file...")
        await context.bot.send_document(chat_id=update.effective_chat.id, document=open(output_path, 'rb'))
        
        # Log to database
        log_to_db(update.effective_user.id, file_name, os.path.basename(output_path), conversion_type)
    else:
        await update.message.reply_text("An error occurred during conversion. Please check the file and try again. The file might be corrupted or protected.")

    # Cleanup
    if os.path.exists(input_path):
        os.remove(input_path)
    if os.path.exists(output_path):
        os.remove(output_path)

#FileHandler #BotLogic

---

934 views09:19

Python | Machine Learning | Coding | R

Part 7: Main Execution Block

Finally, this block sets up the application, registers all our handlers, and starts the bot. This code goes at the end of converter_bot.py.

# converter_bot.py (continued)

def main() -> None:
    """Start the bot."""
    application = Application.builder().token(TELEGRAM_TOKEN).build()

    # Register handlers
    application.add_handler(CommandHandler("start", start))
    application.add_handler(CommandHandler("help", help_command))
    application.add_handler(MessageHandler(filters.Document.ALL, handle_document))

    # Run the bot until the user presses Ctrl-C
    print("Bot is running...")
    application.run_polling()

if __name__ == '__main__':
    main()

#Main #Execution #RunBot

---

Part 8: Results & Discussion

To Run:
• Run python database_setup.py once.
• Replace "YOUR_TELEGRAM_BOT_TOKEN" in converter_bot.py with your actual token from BotFather.
• Run python converter_bot.py.
• Send a PDF or EPUB file to your bot on Telegram.

Expected Results:
• The bot will acknowledge the file.
• After a short processing time, it will send back the converted file.
• A new entry will be added to the conversions.db file.

Viewing the Database:
You can inspect the conversions.db file using a tool like "DB Browser for SQLite" or the command line:
sqlite3 conversions.db "SELECT * FROM conversions;"

Discussion & Limitations:
• Dependency: The bot is entirely dependent on a local installation of Calibre. This makes it hard to deploy on simple hosting services. A Docker-based deployment would be a good solution.
• Conversion Quality: Converting from PDF, especially those with complex layouts, images, and columns, can result in poor EPUB formatting. This is a fundamental limitation of PDF-to-EPUB conversion, not just a flaw in the bot.
• Synchronous Processing: The bot handles one file at a time. If two users send files simultaneously, one has to wait. For a larger scale, a task queue system (like Celery with Redis) would be necessary to handle conversions asynchronously in the background.
• Error Handling: The current error messaging is generic. Advanced versions could parse Calibre's error output to give users more specific feedback (e.g., "This PDF is password-protected").

#Results #Discussion #Limitations #Scalability

━━━━━━━━━━━━━━━
By: @CodeProgrammer ✨

❤8👍1

1.35K views09:19

Python | Machine Learning | Coding | R

Top 100 Data Analysis Commands & Functions

👇

Please open Telegram to view this post

VIEW IN TELEGRAM

❤3

1K viewsedited 17:01

Python | Machine Learning | Coding | R

Top 100 Data Analysis Commands & Functions

#DataAnalysis #Pandas #DataLoading #Inspection

Part 1: Pandas - Data Loading & Inspection

#1. pd.read_csv()
Reads a comma-separated values (csv) file into a Pandas DataFrame.

import pandas as pd
from io import StringIO

csv_data = "col1,col2,col3\n1,a,True\n2,b,False"
df = pd.read_csv(StringIO(csv_data))
print(df)

col1 col2   col3
0     1    a   True
1     2    b  False

#2. df.head()
Returns the first n rows of the DataFrame (default is 5).

import pandas as pd
df = pd.DataFrame({'A': range(10), 'B': list('abcdefghij')})
print(df.head(3))

#3. df.tail()
Returns the last n rows of theDataFrame (default is 5).

import pandas as pd
df = pd.DataFrame({'A': range(10), 'B': list('abcdefghij')})
print(df.tail(3))

#4. df.info()
Prints a concise summary of a DataFrame, including data types and non-null values.

import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [1, 2, np.nan], 'B': ['x', 'y', 'z']})
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   A       2 non-null      float64
 1   B       3 non-null      object 
dtypes: float64(1), object(1)
memory usage: 176.0+ bytes

#5. df.describe()
Generates descriptive statistics for numerical columns.

import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3, 4, 5]})
print(df.describe())

A
count  5.000000
mean   3.000000
std    1.581139
min    1.000000
25%    2.000000
50%    3.000000
75%    4.000000
max    5.000000

#6. df.shape
Returns a tuple representing the dimensionality (rows, columns) of the DataFrame.

import pandas as pd
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'C': [5, 6]})
print(df.shape)

(2, 3)

#7. df.columns
Returns the column labels of the DataFrame.

import pandas as pd
df = pd.DataFrame({'Name': ['Alice'], 'Age': [30]})
print(df.columns)

Index(['Name', 'Age'], dtype='object')

#8. df.dtypes
Returns the data types of each column.

import pandas as pd
df = pd.DataFrame({'A': [1, 2], 'B': [1.1, 2.2], 'C': ['x', 'y']})
print(df.dtypes)

A      int64
B    float64
C     object
dtype: object

#9. df['col'].value_counts()
Returns a Series containing counts of unique values in a column.

import pandas as pd
df = pd.DataFrame({'Fruit': ['Apple', 'Banana', 'Apple', 'Orange', 'Banana', 'Apple']})
print(df['Fruit'].value_counts())

Apple     3
Banana    2
Orange    1
Name: Fruit, dtype: int64

#10. df['col'].unique()
Returns an array of the unique values in a column.

import pandas as pd
df = pd.DataFrame({'Fruit': ['Apple', 'Banana', 'Apple', 'Orange']})
print(df['Fruit'].unique())

['Apple' 'Banana' 'Orange']

#11. df['col'].nunique()
Returns the number of unique values in a column.

❤2

859 views17:02

Python | Machine Learning | Coding | R

import pandas as pd
df = pd.DataFrame({'Fruit': ['Apple', 'Banana', 'Apple', 'Orange']})
print(df['Fruit'].nunique())

#12. df.isnull()
Returns a DataFrame of boolean values indicating missing values.

import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [1, np.nan], 'B': [np.nan, 'x']})
print(df.isnull())

A      B
0  False   True
1   True  False

#13. df.isnull().sum()
Returns the number of missing values in each column.

import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [1, np.nan, 3, np.nan], 'B': [5, 6, 7, 8]})
print(df.isnull().sum())

A    2
B    0
dtype: int64

#14. df.to_csv()
Writes the DataFrame to a comma-separated values (csv) file.

import pandas as pd
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
csv_output = df.to_csv(index=False)
print(csv_output)

A,B
1,3
2,4

#15. df.copy()
Creates a deep copy of a DataFrame.

import pandas as pd
df1 = pd.DataFrame({'A': [1]})
df2 = df1.copy()
df2.loc[0, 'A'] = 99
print(f"Original df1:\n{df1}")
print(f"Copied df2:\n{df2}")

Original df1:
   A
0  1
Copied df2:
    A
0  99

---
#DataAnalysis #Pandas #Selection #Indexing

Part 2: Pandas - Data Selection & Indexing

#16. df['col']
Selects a single column as a Series.

import pandas as pd
df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [30, 25]})
print(df['Name'])

0    Alice
1      Bob
Name: Name, dtype: object

#17. df[['col1', 'col2']]
Selects multiple columns as a new DataFrame.

import pandas as pd
df = pd.DataFrame({'Name': ['Alice'], 'Age': [30], 'City': ['New York']})
print(df[['Name', 'City']])

Name       City
0  Alice   New York

#18. df.loc[]
Accesses a group of rows and columns by label(s) or a boolean array.

import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3]}, index=['x', 'y', 'z'])
print(df.loc['y'])

A    2
Name: y, dtype: int64

#19. df.iloc[]
Accesses a group of rows and columns by integer position(s).

import pandas as pd
df = pd.DataFrame({'A': [10, 20, 30]})
print(df.iloc[1])

A    20
Name: 1, dtype: int64

#20. df[df['col'] > value]
Selects rows based on a boolean condition (boolean indexing).

import pandas as pd
df = pd.DataFrame({'Age': [22, 35, 18, 40]})
print(df[df['Age'] > 30])

Age
1   35
3   40

#21. df.set_index()
Sets the DataFrame index using existing columns.

import pandas as pd
df = pd.DataFrame({'Country': ['USA', 'UK'], 'Code': [1, 44]})
df_indexed = df.set_index('Country')
print(df_indexed)

Code
Country      
USA         1
UK         44

#22. df.reset_index()
Resets the index of the DataFrame and uses the default integer index.

import pandas as pd
df = pd.DataFrame({'Code': [1, 44]}, index=['USA', 'UK'])
df_reset = df.reset_index()
print(df_reset)

index  Code
0   USA     1
1    UK    44

#23. df.at[]
Accesses a single value by row/column label pair. Faster than .loc.

❤1

625 views17:02

Python | Machine Learning | Coding | R

import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3]}, index=['x', 'y', 'z'])
print(df.at['y', 'A'])

#24. df.iat[]
Accesses a single value by row/column integer position. Faster than .iloc.

import pandas as pd
df = pd.DataFrame({'A': [10, 20, 30]})
print(df.iat[1, 0])

#25. df.sample()
Returns a random sample of items from an axis of object.

import pandas as pd
df = pd.DataFrame({'A': range(10)})
print(df.sample(n=3))

A
8  8
2  2
5  5
(Note: Output rows will be random)

---
#DataAnalysis #Pandas #DataCleaning #Manipulation

Part 3: Pandas - Data Cleaning & Manipulation

#26. df.dropna()
Removes missing values.

import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [1, np.nan, 3]})
print(df.dropna())

A
0  1.0
2  3.0

#27. df.fillna()
Fills missing (NA/NaN) values using a specified method.

import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [1, np.nan, 3]})
print(df.fillna(0))

#28. df.astype()
Casts a pandas object to a specified dtype.

import pandas as pd
df = pd.DataFrame({'A': [1.1, 2.7, 3.5]})
df['A'] = df['A'].astype(int)
print(df)

#29. df.rename()
Alters axes labels.

import pandas as pd
df = pd.DataFrame({'a': [1], 'b': [2]})
df_renamed = df.rename(columns={'a': 'A', 'b': 'B'})
print(df_renamed)

A  B
0  1  2

#30. df.drop()
Drops specified labels from rows or columns.

import pandas as pd
df = pd.DataFrame({'A': [1], 'B': [2], 'C': [3]})
df_dropped = df.drop(columns=['B'])
print(df_dropped)

A  C
0  1  3

#31. pd.to_datetime()
Converts argument to datetime.

import pandas as pd
s = pd.Series(['2023-01-01', '2023-01-02'])
dt_s = pd.to_datetime(s)
print(dt_s)

0   2023-01-01
1   2023-01-02
dtype: datetime64[ns]

#32. df.apply()
Applies a function along an axis of the DataFrame.

import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3]})
df['B'] = df['A'].apply(lambda x: x * 2)
print(df)

#33. df['col'].map()
Maps values of a Series according to an input mapping or function.

import pandas as pd
df = pd.DataFrame({'Gender': ['M', 'F', 'M']})
df['Gender_Full'] = df['Gender'].map({'M': 'Male', 'F': 'Female'})
print(df)

Gender Gender_Full
0      M        Male
1      F      Female
2      M        Male

#34. df.replace()
Replaces values given in to_replace with value.

import pandas as pd
df = pd.DataFrame({'Score': [10, -99, 15, -99]})
df_replaced = df.replace(-99, 0)
print(df_replaced)

#35. df.duplicated()
Returns a boolean Series denoting duplicate rows.

import pandas as pd
df = pd.DataFrame({'A': [1, 2, 1], 'B': ['a', 'b', 'a']})
print(df.duplicated())

0    False
1    False
2     True
dtype: bool

#36. df.drop_duplicates()
Returns a DataFrame with duplicate rows removed.

575 views17:02

Python | Machine Learning | Coding | R

import pandas as pd
df = pd.DataFrame({'A': [1, 2, 1], 'B': ['a', 'b', 'a']})
print(df.drop_duplicates())

A  B
0  1  a
1  2  b

#37. df.sort_values()
Sorts by the values along either axis.

import pandas as pd
df = pd.DataFrame({'Age': [25, 22, 30]})
print(df.sort_values(by='Age'))

#38. df.sort_index()
Sorts object by labels (along an axis).

import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3]}, index=[10, 5, 8])
print(df.sort_index())

#39. pd.cut()
Bins values into discrete intervals.

import pandas as pd
ages = pd.Series([22, 35, 58, 8, 42])
age_bins = pd.cut(ages, bins=[0, 18, 35, 60], labels=['Child', 'Adult', 'Senior'])
print(age_bins)

0     Adult
1     Adult
2    Senior
3     Child
4    Senior
dtype: category
Categories (3, object): ['Child' < 'Adult' < 'Senior']

#40. pd.qcut()
Quantile-based discretization function (bins into equal-sized groups).

import pandas as pd
data = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
quartiles = pd.qcut(data, 4, labels=False)
print(quartiles)

0    0
1    0
2    0
3    1
4    1
5    2
6    2
7    3
8    3
9    3
dtype: int64

#41. s.str.contains()
Tests if a pattern or regex is contained within a string of a Series.

import pandas as pd
s = pd.Series(['apple', 'banana', 'apricot'])
print(s[s.str.contains('ap')])

0      apple
2    apricot
dtype: object

#42. s.str.split()
Splits strings around a given separator/delimiter.

import pandas as pd
s = pd.Series(['a_b', 'c_d'])
print(s.str.split('_', expand=True))

0  1
0  a  b
1  c  d

#43. s.str.lower()
Converts strings in the Series to lowercase.

import pandas as pd
s = pd.Series(['HELLO', 'World'])
print(s.str.lower())

0    hello
1    world
dtype: object

#44. s.str.strip()
Removes leading and trailing whitespace.

import pandas as pd
s = pd.Series(['  hello  ', ' world '])
print(s.str.strip())

0    hello
1    world
dtype: object

#45. s.dt.year
Extracts the year from a datetime Series.

import pandas as pd
s = pd.to_datetime(pd.Series(['2023-01-01', '2024-05-10']))
print(s.dt.year)

0    2023
1    2024
dtype: int64

---
#DataAnalysis #Pandas #Grouping #Aggregation

Part 4: Pandas - Grouping & Aggregation

#46. df.groupby()
Groups a DataFrame using a mapper or by a Series of columns.

import pandas as pd
df = pd.DataFrame({'Team': ['A', 'B', 'A', 'B'], 'Points': [10, 8, 12, 6]})
grouped = df.groupby('Team')
print(grouped)

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x...>

#47. groupby.agg()
Aggregates using one or more operations over the specified axis.

import pandas as pd
df = pd.DataFrame({'Team': ['A', 'B', 'A', 'B'], 'Points': [10, 8, 12, 6]})
agg_df = df.groupby('Team').agg(['mean', 'sum'])
print(agg_df)

Points     
        mean  sum
Team             
A         11   22
B          7   14

#48. groupby.size()
Computes group sizes.

❤1

545 views17:02

Python | Machine Learning | Coding | R

import pandas as pd
df = pd.DataFrame({'Team': ['A', 'B', 'A', 'B', 'A']})
print(df.groupby('Team').size())

Team
A    3
B    2
dtype: int64

#49. groupby.count()
Computes the count of non-NA cells for each group.

import pandas as pd
import numpy as np
df = pd.DataFrame({'Team': ['A', 'B', 'A'], 'Score': [1, np.nan, 3]})
print(df.groupby('Team').count())

Score
Team       
A         2
B         0

#50. groupby.mean()
Computes the mean of group values.

import pandas as pd
df = pd.DataFrame({'Team': ['A', 'B', 'A', 'B'], 'Points': [10, 8, 12, 6]})
print(df.groupby('Team').mean())

Points
Team        
A         11
B          7

#51. groupby.sum()
Computes the sum of group values.

import pandas as pd
df = pd.DataFrame({'Team': ['A', 'B', 'A', 'B'], 'Points': [10, 8, 12, 6]})
print(df.groupby('Team').sum())

Points
Team        
A         22
B         14

#52. groupby.min()
Computes the minimum of group values.

import pandas as pd
df = pd.DataFrame({'Team': ['A', 'B', 'A', 'B'], 'Points': [10, 8, 12, 6]})
print(df.groupby('Team').min())

Points
Team        
A         10
B          6

#53. groupby.max()
Computes the maximum of group values.

import pandas as pd
df = pd.DataFrame({'Team': ['A', 'B', 'A', 'B'], 'Points': [10, 8, 12, 6]})
print(df.groupby('Team').max())

Points
Team        
A         12
B          8

#54. df.pivot_table()
Creates a spreadsheet-style pivot table as a DataFrame.

import pandas as pd
df = pd.DataFrame({'A': ['foo', 'foo', 'bar'], 'B': ['one', 'two', 'one'], 'C': [1, 2, 3]})
pivot = df.pivot_table(values='C', index='A', columns='B')
print(pivot)

B    one  two
A            
bar  3.0  NaN
foo  1.0  2.0

#55. pd.crosstab()
Computes a cross-tabulation of two (or more) factors.

import pandas as pd
df = pd.DataFrame({'A': ['foo', 'foo', 'bar'], 'B': ['one', 'two', 'one']})
crosstab = pd.crosstab(df.A, df.B)
print(crosstab)

B    one  two
A            
bar    1    0
foo    1    1

---
#DataAnalysis #Pandas #Merging #Joining

Part 5: Pandas - Merging & Concatenating

#56. pd.merge()
Merges DataFrame or named Series objects with a database-style join.

import pandas as pd
df1 = pd.DataFrame({'key': ['A', 'B'], 'val1': [1, 2]})
df2 = pd.DataFrame({'key': ['A', 'B'], 'val2': [3, 4]})
merged = pd.merge(df1, df2, on='key')
print(merged)

key  val1  val2
0   A     1     3
1   B     2     4

#57. pd.concat()
Concatenates pandas objects along a particular axis.

import pandas as pd
df1 = pd.DataFrame({'A': [1, 2]})
df2 = pd.DataFrame({'A': [3, 4]})
concatenated = pd.concat([df1, df2])
print(concatenated)

#58. df.join()
Joins columns with other DataFrame(s) on index or on a key column.

❤1

588 views17:02

Python | Machine Learning | Coding | R

import pandas as pd
df1 = pd.DataFrame({'val1': [1, 2]}, index=['A', 'B'])
df2 = pd.DataFrame({'val2': [3, 4]}, index=['A', 'B'])
joined = df1.join(df2)
print(joined)

val1  val2
A     1     3
B     2     4

#59. pd.get_dummies()
Converts categorical variable into dummy/indicator variables (one-hot encoding).

import pandas as pd
s = pd.Series(list('abca'))
dummies = pd.get_dummies(s)
print(dummies)

#60. df.nlargest()
Returns the first n rows ordered by columns in descending order.

import pandas as pd
df = pd.DataFrame({'population': [100, 500, 200, 800]})
print(df.nlargest(2, 'population'))

population
3         800
1         500

---
#DataAnalysis #NumPy #Arrays

Part 6: NumPy - Array Creation & Manipulation

#61. np.array()
Creates a NumPy ndarray.

import numpy as np
arr = np.array([1, 2, 3])
print(arr)

[1 2 3]

#62. np.arange()
Returns an array with evenly spaced values within a given interval.

import numpy as np
arr = np.arange(0, 5)
print(arr)

[0 1 2 3 4]

#63. np.linspace()
Returns an array with evenly spaced numbers over a specified interval.

import numpy as np
arr = np.linspace(0, 10, 5)
print(arr)

[ 0.   2.5  5.   7.5 10. ]

#64. np.zeros()
Returns a new array of a given shape and type, filled with zeros.

import numpy as np
arr = np.zeros((2, 3))
print(arr)

[[0. 0. 0.]
 [0. 0. 0.]]

#65. np.ones()
Returns a new array of a given shape and type, filled with ones.

import numpy as np
arr = np.ones((2, 3))
print(arr)

[[1. 1. 1.]
 [1. 1. 1.]]

#66. np.random.rand()
Creates an array of the given shape and populates it with random samples from a uniform distribution over [0, 1).

import numpy as np
arr = np.random.rand(2, 2)
print(arr)

[[0.13949386 0.2921446 ]
 [0.52273283 0.77122228]]
(Note: Output values will be random)

#67. arr.reshape()
Gives a new shape to an array without changing its data.

import numpy as np
arr = np.arange(6)
reshaped_arr = arr.reshape((2, 3))
print(reshaped_arr)

[[0 1 2]
 [3 4 5]]

#68. np.concatenate()
Joins a sequence of arrays along an existing axis.

import numpy as np
a = np.array([[1, 2]])
b = np.array([[3, 4]])
print(np.concatenate((a, b), axis=0))

[[1 2]
 [3 4]]

#69. np.vstack()
Stacks arrays in sequence vertically (row wise).

import numpy as np
a = np.array([1, 2])
b = np.array([3, 4])
print(np.vstack((a, b)))

[[1 2]
 [3 4]]

#70. np.hstack()
Stacks arrays in sequence horizontally (column wise).

import numpy as np
a = np.array([1, 2])
b = np.array([3, 4])
print(np.hstack((a, b)))

[1 2 3 4]

---
#DataAnalysis #NumPy #Math #Statistics

Part 7: NumPy - Mathematical & Statistical Functions

#71. np.mean()
Computes the arithmetic mean along the specified axis.

import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(np.mean(arr))

3.0

#72. np.median()
Computes the median along the specified axis.

import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(np.median(arr))

3.0

#73. np.std()
Computes the standard deviation along the specified axis.

497 views17:02

Python | Machine Learning | Coding | R

import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(np.std(arr))

1.4142135623730951

#74. np.sum()
Sums array elements over a given axis.

import numpy as np
arr = np.array([[1, 2], [3, 4]])
print(np.sum(arr))

#75. np.min()
Returns the minimum of an array or minimum along an axis.

import numpy as np
arr = np.array([5, 2, 8, 1])
print(np.min(arr))

#76. np.max()
Returns the maximum of an array or maximum along an axis.

import numpy as np
arr = np.array([5, 2, 8, 1])
print(np.max(arr))

#77. np.sqrt()
Returns the non-negative square-root of an array, element-wise.

import numpy as np
arr = np.array([4, 9, 16])
print(np.sqrt(arr))

[2. 3. 4.]

#78. np.log()
Calculates the natural logarithm, element-wise.

import numpy as np
arr = np.array([1, np.e, np.e**2])
print(np.log(arr))

[0. 1. 2.]

#79. np.dot()
Calculates the dot product of two arrays.

import numpy as np
a = np.array([1, 2])
b = np.array([3, 4])
print(np.dot(a, b))

#80. np.where()
Returns elements chosen from x or y depending on a condition.

import numpy as np
arr = np.array([10, 5, 20, 15])
print(np.where(arr > 12, 'High', 'Low'))

['Low' 'Low' 'High' 'High']

---
#DataAnalysis #Matplotlib #Seaborn #Visualization

Part 8: Matplotlib & Seaborn - Data Visualization

#81. plt.plot()
Plots y versus x as lines and/or markers.

import matplotlib.pyplot as plt
plt.plot([1, 2, 3, 4], [1, 4, 9, 16])
# In a real script, you would call plt.show()
print("Output: A figure window opens displaying a line plot.")

Output: A figure window opens displaying a line plot.

#82. plt.scatter()
A scatter plot of y vs. x with varying marker size and/or color.

import matplotlib.pyplot as plt
plt.scatter([1, 2, 3, 4], [1, 4, 9, 16])
print("Output: A figure window opens displaying a scatter plot.")

Output: A figure window opens displaying a scatter plot.

#83. plt.hist()
Computes and draws the histogram of x.

import matplotlib.pyplot as plt
import numpy as np
data = np.random.randn(1000)
plt.hist(data, bins=30)
print("Output: A figure window opens displaying a histogram.")

Output: A figure window opens displaying a histogram.

#84. plt.bar()
Makes a bar plot.

import matplotlib.pyplot as plt
plt.bar(['A', 'B', 'C'], [10, 15, 7])
print("Output: A figure window opens displaying a bar chart.")

Output: A figure window opens displaying a bar chart.

#85. plt.boxplot()
Makes a box and whisker plot.

import matplotlib.pyplot as plt
import numpy as np
data = [np.random.normal(0, std, 100) for std in range(1, 4)]
plt.boxplot(data)
print("Output: A figure window opens displaying a box plot.")

Output: A figure window opens displaying a box plot.

#86. sns.heatmap()
Plots rectangular data as a color-encoded matrix.

import seaborn as sns
import numpy as np
data = np.random.rand(10, 12)
sns.heatmap(data)
print("Output: A figure window opens displaying a heatmap.")

Output: A figure window opens displaying a heatmap.

#87. sns.pairplot()
Plots pairwise relationships in a dataset.

❤2

690 views17:02

Python | Machine Learning | Coding | R

import seaborn as sns
import pandas as pd
df = pd.DataFrame(np.random.randn(100, 4), columns=['A', 'B', 'C', 'D'])
# sns.pairplot(df) # This line would generate the plot
print("Output: A figure grid opens showing scatterplots for each pair of variables.")

Output: A figure grid opens showing scatterplots for each pair of variables.

#88. sns.countplot()
Shows the counts of observations in each categorical bin using bars.

import seaborn as sns
import pandas as pd
df = pd.DataFrame({'category': ['A', 'B', 'A', 'C', 'A', 'B']})
sns.countplot(x='category', data=df)
print("Output: A figure window opens showing a count plot.")

Output: A figure window opens showing a count plot.

#89. sns.jointplot()
Draws a plot of two variables with bivariate and univariate graphs.

import seaborn as sns
import pandas as pd
df = pd.DataFrame({'x': range(50), 'y': range(50) + np.random.randn(50)})
# sns.jointplot(x='x', y='y', data=df) # This line would generate the plot
print("Output: A figure shows a scatter plot with histograms for each axis.")

Output: A figure shows a scatter plot with histograms for each axis.

#90. plt.show()
Displays all open figures.

import matplotlib.pyplot as plt
plt.plot([1, 2, 3])
# plt.show() # In a script, this is essential to see the plot.
print("Executes the command to render and display the plot.")

Executes the command to render and display the plot.

---
#DataAnalysis #ScikitLearn #Modeling #Preprocessing

Part 9: Scikit-learn - Modeling & Preprocessing

#91. train_test_split()
Splits arrays or matrices into random train and test subsets.

from sklearn.model_selection import train_test_split
import numpy as np
X, y = np.arange(10).reshape((5, 2)), range(5)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
print(f"X_train shape: {X_train.shape}")
print(f"X_test shape: {X_test.shape}")

X_train shape: (3, 2)
X_test shape: (2, 2)

#92. StandardScaler()
Standardizes features by removing the mean and scaling to unit variance.

from sklearn.preprocessing import StandardScaler
data = [[0, 0], [0, 0], [1, 1], [1, 1]]
scaler = StandardScaler()
print(scaler.fit_transform(data))

[[-1. -1.]
 [-1. -1.]
 [ 1.  1.]
 [ 1.  1.]]

#93. MinMaxScaler()
Transforms features by scaling each feature to a given range, typically [0, 1].

from sklearn.preprocessing import MinMaxScaler
data = [[-1, 2], [-0.5, 6], [0, 10], [1, 18]]
scaler = MinMaxScaler()
print(scaler.fit_transform(data))

[[0.   0.  ]
 [0.25 0.25]
 [0.5  0.5 ]
 [1.   1.  ]]

#94. LabelEncoder()
Encodes target labels with values between 0 and n_classes-1.

from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
encoded = le.fit_transform(['paris', 'tokyo', 'paris'])
print(encoded)

[0 1 0]

#95. OneHotEncoder()
Encodes categorical features as a one-hot numeric array.

from sklearn.preprocessing import OneHotEncoder
enc = OneHotEncoder()
X = [['Male'], ['Female'], ['Female']]
print(enc.fit_transform(X).toarray())

[[0. 1.]
 [1. 0.]
 [1. 0.]]

#96. LinearRegression()
Ordinary least squares Linear Regression model.

from sklearn.linear_model import LinearRegression
X = [[0], [1], [2]]
y = [0, 1, 2]
reg = LinearRegression().fit(X, y)
print(f"Coefficient: {reg.coef_[0]}")

❤2

1.19K views17:02

Python | Machine Learning | Coding | R

Coefficient: 1.0

#97. LogisticRegression()
Implements Logistic Regression for classification.

from sklearn.linear_model import LogisticRegression
X = [[-1], [0], [1], [2]]
y = [0, 0, 1, 1]
clf = LogisticRegression().fit(X, y)
print(f"Prediction for [[-2]]: {clf.predict([[-2]])}")

Prediction for [[-2]]: [0]

#98. KMeans()
K-Means clustering algorithm.

from sklearn.cluster import KMeans
X = [[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]]
kmeans = KMeans(n_clusters=2, n_init='auto').fit(X)
print(kmeans.labels_)

[0 0 0 1 1 1]
(Note: Cluster labels may be flipped, e.g., [1 1 1 0 0 0])

#99. accuracy_score()
Calculates the accuracy classification score.

from sklearn.metrics import accuracy_score
y_true = [0, 1, 1, 0]
y_pred = [0, 1, 0, 0]
print(accuracy_score(y_true, y_pred))

0.75

#100. confusion_matrix()
Computes a confusion matrix to evaluate the accuracy of a classification.

from sklearn.metrics import confusion_matrix
y_true = [0, 1, 0, 1]
y_pred = [1, 1, 0, 1]
print(confusion_matrix(y_true, y_pred))

[[1 1]
 [0 2]]

━━━━━━━━━━━━━━━
By: @CodeProgrammer ✨

❤6🆒4👍2

1.71K views17:02

Python | Machine Learning | Coding | R

Forwarded from Kaggle DataSets for Data Science, Machine Learning and Data Analysis

Unlock premium learning without spending a dime! ⭐️ @DataScienceC is the first Telegram channel dishing out free Udemy coupons daily—grab courses on data science, coding, AI, and beyond. Join the revolution and boost your skills for free today! 📕

What topic are you itching to learn next? 😊
https://t.me/DataScienceC

🌟

Please open Telegram to view this post

VIEW IN TELEGRAM

Udemy Coupons

ads: @HusseinSheikho

The first channel in Telegram that offers free
Udemy coupons

❤2

869 views06:03

Python | Machine Learning | Coding | R

💡 Top 50 Pillow Operations for Image Processing

👇

Please open Telegram to view this post

VIEW IN TELEGRAM

❤1

1.1K viewsedited 10:51

Python | Machine Learning | Coding | R

💡 Top 50 Pillow Operations for Image Processing

I. File & Basic Operations

• Open an image file.

from PIL import Image
img = Image.open("image.jpg")

• Save an image.

img.save("new_image.png")

• Display an image (opens in default viewer).

img.show()

• Create a new blank image.

new_img = Image.new("RGB", (200, 100), "blue")

• Get image format (e.g., 'JPEG').

print(img.format)

• Get image dimensions as a (width, height) tuple.

width, height = img.size

• Get pixel format (e.g., 'RGB', 'L' for grayscale).

print(img.mode)

• Convert image mode.

grayscale_img = img.convert("L")

• Get a pixel's color value at (x, y).

r, g, b = img.getpixel((10, 20))

• Set a pixel's color value at (x, y).

img.putpixel((10, 20), (255, 0, 0))

II. Cropping, Resizing & Pasting

• Crop a rectangular region.

box = (100, 100, 400, 400)
cropped_img = img.crop(box)

• Resize an image to an exact size.

resized_img = img.resize((200, 200))

• Create a thumbnail (maintains aspect ratio).

img.thumbnail((128, 128))

• Paste one image onto another.

img.paste(another_img, (50, 50))

III. Rotation & Transformation

• Rotate an image (counter-clockwise).

rotated_img = img.rotate(45, expand=True)

• Flip an image horizontally.

flipped_img = img.transpose(Image.FLIP_LEFT_RIGHT)

• Flip an image vertically.

flipped_img = img.transpose(Image.FLIP_TOP_BOTTOM)

• Rotate by 90, 180, or 270 degrees.

img_90 = img.transpose(Image.ROTATE_90)

• Apply an affine transformation.

transformed = img.transform(img.size, Image.AFFINE, (1, 0.5, 0, 0, 1, 0))

IV. ImageOps Module Helpers

• Invert image colors.

from PIL import ImageOps
inverted_img = ImageOps.invert(img)

• Flip an image horizontally (mirror).

mirrored_img = ImageOps.mirror(img)

• Flip an image vertically.

flipped_v_img = ImageOps.flip(img)

• Convert to grayscale.

grayscale = ImageOps.grayscale(img)

• Colorize a grayscale image.

colorized = ImageOps.colorize(grayscale, black="blue", white="yellow")

• Reduce the number of bits for each color channel.

posterized = ImageOps.posterize(img, 4)

• Auto-adjust image contrast.

adjusted_img = ImageOps.autocontrast(img)

• Equalize the image histogram.

equalized_img = ImageOps.equalize(img)

• Add a border to an image.

bordered = ImageOps.expand(img, border=10, fill='black')

V. Color & Pixel Operations

• Split image into individual bands (e.g., R, G, B).

r, g, b = img.split()

• Merge bands back into an image.

merged_img = Image.merge("RGB", (r, g, b))

• Apply a function to each pixel.

brighter_img = img.point(lambda i: i * 1.2)

• Get a list of colors used in the image.

colors = img.getcolors(maxcolors=256)

• Blend two images with alpha compositing.

# Both images must be in RGBA mode
blended = Image.alpha_composite(img1_rgba, img2_rgba)

VI. Filters (ImageFilter)

❤2

1.49K views10:52

Python | Machine Learning | Coding | R

• Apply a simple blur filter.

from PIL import ImageFilter
blurred_img = img.filter(ImageFilter.BLUR)

• Apply a box blur with a given radius.

box_blur = img.filter(ImageFilter.BoxBlur(5))

• Apply a Gaussian blur.

gaussian_blur = img.filter(ImageFilter.GaussianBlur(radius=2))

• Sharpen the image.

sharpened = img.filter(ImageFilter.SHARPEN)

• Find edges.

edges = img.filter(ImageFilter.FIND_EDGES)

• Enhance edges.

edge_enhanced = img.filter(ImageFilter.EDGE_ENHANCE)

• Emboss the image.

embossed = img.filter(ImageFilter.EMBOSS)

• Find contours.

contours = img.filter(ImageFilter.CONTOUR)

VII. Image Enhancement (ImageEnhance)

• Adjust color saturation.

from PIL import ImageEnhance
enhancer = ImageEnhance.Color(img)
vibrant_img = enhancer.enhance(2.0)

• Adjust brightness.

enhancer = ImageEnhance.Brightness(img)
bright_img = enhancer.enhance(1.5)

• Adjust contrast.

enhancer = ImageEnhance.Contrast(img)
contrast_img = enhancer.enhance(1.5)

• Adjust sharpness.

enhancer = ImageEnhance.Sharpness(img)
sharp_img = enhancer.enhance(2.0)

VIII. Drawing (ImageDraw & ImageFont)

• Draw text on an image.

from PIL import ImageDraw, ImageFont
draw = ImageDraw.Draw(img)
font = ImageFont.truetype("arial.ttf", 36)
draw.text((10, 10), "Hello", font=font, fill="red")

• Draw a line.

draw.line((0, 0, 100, 200), fill="blue", width=3)

• Draw a rectangle (outline).

draw.rectangle([10, 10, 90, 60], outline="green", width=2)

• Draw a filled ellipse.

draw.ellipse([100, 100, 180, 150], fill="yellow")

• Draw a polygon.

draw.polygon([(10,10), (20,50), (60,10)], fill="purple")

#Python #Pillow #ImageProcessing #PIL #CheatSheet

━━━━━━━━━━━━━━━
By: @CodeProgrammer ✨

❤7🔥6👍2🎉2

1.99K views10:52

Python | Machine Learning | Coding | R

Core Python Cheatsheet.pdf

173.3 KB

Python is a high-level, interpreted programming language known for its simplicity, readability, and
versatility. It was first released in 1991 by Guido van Rossum and has since become one of the most
popular programming languages in the world.
Python’s syntax emphasizes readability, with code written in a clear and concise manner using whitespace and indentation to define blocks of code. It is an interpreted language, meaning that
code is executed line-by-line rather than compiled into machine code. This makes it easy to write and test code quickly, without needing to worry about the details of low-level hardware.
Python is a general-purpose language, meaning that it can be used for a wide variety of applications, from web development to scientific computing to artificial intelligence and machine learning. Its simplicity and ease of use make it a popular choice for beginners, while its power and flexibility make it a favorite of experienced developers.
Python’s standard library contains a wide range of modules and packages, providing support for
everything from basic data types and control structures to advanced data manipulation and visualization. Additionally, there are countless third-party packages available through Python’s package manager, pip, allowing developers to easily extend Python’s capabilities to suit their needs.
Overall, Python’s combination of simplicity, power, and flexibility makes it an ideal language for a wide range of applications and skill levels.

https://t.me/CodeProgrammer

⚡️

Please open Telegram to view this post

VIEW IN TELEGRAM

❤5👍1

740 viewsedited 16:14

About

Blog

Apps

Platform