#PDF #EPUB #TelegramBot #Python #SQLite #Project
Lesson: Building a PDF <> EPUB Telegram Converter Bot
This lesson walks you through creating a fully functional Telegram bot from scratch. The bot will accept PDF or EPUB files, convert them to the other format, and log each transaction in an SQLite database.
---
Part 1: Prerequisites & Setup
First, we need to install the necessary Python library for the Telegram Bot API. We will also rely on Calibre's command-line tools for conversion.
Important: You must install Calibre on the system where the bot will run and ensure its ebook-convert tool is in your system's PATH.
pip install python-telegram-bot==20.3
#Setup #Prerequisites
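Optional sanity check (not part of the original lesson code): before starting the bot, you can confirm that ebook-convert is actually reachable from Python. A minimal sketch, with a hypothetical script name:
# check_calibre.py (hypothetical helper)
import shutil

# shutil.which() returns the executable's full path if it is on PATH, otherwise None
calibre_path = shutil.which("ebook-convert")
if calibre_path:
    print(f"Found ebook-convert at: {calibre_path}")
else:
    print("ebook-convert not found. Install Calibre and make sure it is on your PATH.")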
---
Part 2: Database Initialization
We'll use SQLite to log every successful conversion. Create a file named database_setup.py and run it once to create the database file and the table.
# database_setup.py
import sqlite3
def setup_database():
    conn = sqlite3.connect('conversions.db')
    cursor = conn.cursor()
    # Create table to store conversion logs
    cursor.execute('''
        CREATE TABLE IF NOT EXISTS conversions (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            user_id INTEGER NOT NULL,
            original_filename TEXT NOT NULL,
            converted_filename TEXT NOT NULL,
            conversion_type TEXT NOT NULL,
            timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
        )
    ''')
    conn.commit()
    conn.close()
    print("Database setup complete. 'conversions.db' is ready.")

if __name__ == '__main__':
    setup_database()
#Database #SQLite #Initialization
---
Part 3: The Main Bot Script - Imports & Basic Commands
Now, let's create our main bot file, converter_bot.py. We'll start with imports and the initial /start and /help commands.
# converter_bot.py
import logging
import os
import sqlite3
import subprocess
from telegram import Update
from telegram.ext import Application, CommandHandler, MessageHandler, filters, ContextTypes
# Enable logging
logging.basicConfig(format='%(asctime)s - %(name)s - %(levelname)s - %(message)s', level=logging.INFO)
# --- Bot Token ---
TELEGRAM_TOKEN = "YOUR_TELEGRAM_BOT_TOKEN"
# --- Command Handlers ---
async def start(update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
    user = update.effective_user
    await update.message.reply_html(
        rf"Hi {user.mention_html()}! Send me a PDF or EPUB file to convert.",
    )

async def help_command(update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
    await update.message.reply_text("Simply send a .pdf file to get an .epub, or send an .epub file to get a .pdf. Note: Conversion quality depends on the source file's structure.")
#TelegramBot #Python #Boilerplate
---
Part 4: The Core Conversion Logic
This function will be the heart of our bot. It uses the ebook-convert command-line tool (from Calibre) to perform the conversion. It's crucial that Calibre is installed correctly for this to work.
# converter_bot.py (continued)
def run_conversion(input_path: str, output_path: str) -> bool:
    """Runs the ebook-convert command and returns True on success."""
    try:
        command = ['ebook-convert', input_path, output_path]
        result = subprocess.run(command, check=True, capture_output=True, text=True)
        logging.info(f"Calibre output: {result.stdout}")
        return True
    except FileNotFoundError:
        logging.error("CRITICAL: 'ebook-convert' command not found. Is Calibre installed and in the system's PATH?")
        return False
    except subprocess.CalledProcessError as e:
        logging.error(f"Conversion failed for {input_path}. Error: {e.stderr}")
        return False
#Conversion #Calibre #Subprocess
---
Part 5: Database Logging Function
This helper function will connect to our SQLite database and insert a new record for each successful conversion.
# converter_bot.py (continued)
def log_to_db(user_id: int, original_file: str, converted_file: str, conv_type: str):
    """Logs a successful conversion to the SQLite database."""
    try:
        conn = sqlite3.connect('conversions.db')
        cursor = conn.cursor()
        cursor.execute(
            "INSERT INTO conversions (user_id, original_filename, converted_filename, conversion_type) VALUES (?, ?, ?, ?)",
            (user_id, original_file, converted_file, conv_type)
        )
        conn.commit()
        conn.close()
    except sqlite3.Error as e:
        logging.error(f"Database error: {e}")
#Database #Logging #SQLite
---
Part 6: Handling Incoming Files
This is the main handler that will be triggered when a user sends a document. It downloads the file, determines the target format, calls the conversion function, sends the result back, logs it, and cleans up.
# converter_bot.py (continued)
async def handle_document(update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
    doc = update.message.document
    file_id = doc.file_id
    file_name = doc.file_name
    input_path = os.path.join("downloads", file_name)
    os.makedirs("downloads", exist_ok=True)  # Ensure download directory exists
    new_file = await context.bot.get_file(file_id)
    await new_file.download_to_drive(input_path)
    await update.message.reply_text(f"Received '{file_name}'. Starting conversion...")
    output_path = ""
    conversion_type = ""
    if file_name.lower().endswith('.pdf'):
        output_path = input_path.rsplit('.', 1)[0] + '.epub'
        conversion_type = "PDF -> EPUB"
    elif file_name.lower().endswith('.epub'):
        output_path = input_path.rsplit('.', 1)[0] + '.pdf'
        conversion_type = "EPUB -> PDF"
    else:
        await update.message.reply_text("Sorry, I only support PDF and EPUB files.")
        os.remove(input_path)
        return
    # Run the conversion
    success = run_conversion(input_path, output_path)
    if success and os.path.exists(output_path):
        await update.message.reply_text("Conversion successful! Uploading your file...")
        with open(output_path, 'rb') as converted_file:
            await context.bot.send_document(chat_id=update.effective_chat.id, document=converted_file)
        # Log to database
        log_to_db(update.effective_user.id, file_name, os.path.basename(output_path), conversion_type)
    else:
        await update.message.reply_text("An error occurred during conversion. Please check the file and try again. The file might be corrupted or protected.")
    # Cleanup
    if os.path.exists(input_path):
        os.remove(input_path)
    if os.path.exists(output_path):
        os.remove(output_path)
#FileHandler #BotLogic
---
Part 7: Main Execution Block
Finally, this block sets up the application, registers all our handlers, and starts the bot. This code goes at the end of converter_bot.py.
# converter_bot.py (continued)
def main() -> None:
    """Start the bot."""
    application = Application.builder().token(TELEGRAM_TOKEN).build()
    # Register handlers
    application.add_handler(CommandHandler("start", start))
    application.add_handler(CommandHandler("help", help_command))
    application.add_handler(MessageHandler(filters.Document.ALL, handle_document))
    # Run the bot until the user presses Ctrl-C
    print("Bot is running...")
    application.run_polling()

if __name__ == '__main__':
    main()
#Main #Execution #RunBot
---
Part 8: Results & Discussion
To Run:
• Run python database_setup.py once.
• Replace "YOUR_TELEGRAM_BOT_TOKEN" in converter_bot.py with your actual token from BotFather.
• Run python converter_bot.py.
• Send a PDF or EPUB file to your bot on Telegram.
Expected Results:
• The bot will acknowledge the file.
• After a short processing time, it will send back the converted file.
• A new entry will be added to the conversions.db file.
Viewing the Database:
You can inspect the conversions.db file using a tool like "DB Browser for SQLite" or the command line:
sqlite3 conversions.db "SELECT * FROM conversions;"
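If you prefer to stay in Python, here is a minimal sketch (using only the standard sqlite3 module and the schema created by database_setup.py above) that prints every logged conversion; the script name is just a suggestion:
# inspect_db.py (hypothetical helper)
import sqlite3

conn = sqlite3.connect('conversions.db')
cursor = conn.cursor()
# Iterate over all rows the bot has logged so far
for row in cursor.execute("SELECT * FROM conversions"):
    print(row)
conn.close()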
Discussion & Limitations:
• Dependency: The bot is entirely dependent on a local installation of Calibre. This makes it hard to deploy on simple hosting services. A Docker-based deployment would be a good solution.
• Conversion Quality: Converting from PDF, especially those with complex layouts, images, and columns, can result in poor EPUB formatting. This is a fundamental limitation of PDF-to-EPUB conversion, not just a flaw in the bot.
• Synchronous Processing: The bot handles one file at a time. If two users send files simultaneously, one has to wait. For a larger scale, a task queue system (like Celery with Redis) would be necessary to handle conversions asynchronously in the background. A lighter-weight first step is sketched after this list.
• Error Handling: The current error messaging is generic. Advanced versions could parse Calibre's error output to give users more specific feedback (e.g., "This PDF is password-protected").
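Following up on the Synchronous Processing point, a lighter-weight option than a full task queue is to offload the blocking subprocess call to a worker thread so the event loop keeps serving other users. A minimal sketch, assuming Python 3.9+ and the run_conversion function from Part 4 (not part of the original lesson code):
# converter_bot.py (optional sketch)
import asyncio

async def convert_in_background(input_path: str, output_path: str) -> bool:
    # asyncio.to_thread runs the blocking Calibre call in a separate thread,
    # so handle_document can await it without freezing the whole bot.
    return await asyncio.to_thread(run_conversion, input_path, output_path)

# Inside handle_document you would then write:
# success = await convert_in_background(input_path, output_path)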
#Results #Discussion #Limitations #Scalability
━━━━━━━━━━━━━━━
By: @CodeProgrammer ✨
Top 100 Data Analysis Commands & Functions
#DataAnalysis #Pandas #DataLoading #Inspection
Part 1: Pandas - Data Loading & Inspection
#1. pd.read_csv()
Reads a comma-separated values (csv) file into a Pandas DataFrame.
import pandas as pd
from io import StringIO
csv_data = "col1,col2,col3\n1,a,True\n2,b,False"
df = pd.read_csv(StringIO(csv_data))
print(df)
col1 col2 col3
0 1 a True
1 2 b False
#2. df.head()
Returns the first n rows of the DataFrame (default is 5).
import pandas as pd
df = pd.DataFrame({'A': range(10), 'B': list('abcdefghij')})
print(df.head(3))
A B
0 0 a
1 1 b
2 2 c
#3. df.tail()
Returns the last n rows of the DataFrame (default is 5).
import pandas as pd
df = pd.DataFrame({'A': range(10), 'B': list('abcdefghij')})
print(df.tail(3))
A B
7 7 h
8 8 i
9 9 j
#4. df.info()
Prints a concise summary of a DataFrame, including data types and non-null values.
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [1, 2, np.nan], 'B': ['x', 'y', 'z']})
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 A 2 non-null float64
1 B 3 non-null object
dtypes: float64(1), object(1)
memory usage: 176.0+ bytes
#5. df.describe()
Generates descriptive statistics for numerical columns.
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3, 4, 5]})
print(df.describe())
A
count 5.000000
mean 3.000000
std 1.581139
min 1.000000
25% 2.000000
50% 3.000000
75% 4.000000
max 5.000000
#6. df.shape
Returns a tuple representing the dimensionality (rows, columns) of the DataFrame.
import pandas as pd
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'C': [5, 6]})
print(df.shape)
(2, 3)
#7. df.columns
Returns the column labels of the DataFrame.
import pandas as pd
df = pd.DataFrame({'Name': ['Alice'], 'Age': [30]})
print(df.columns)
Index(['Name', 'Age'], dtype='object')
#8. df.dtypes
Returns the data types of each column.
import pandas as pd
df = pd.DataFrame({'A': [1, 2], 'B': [1.1, 2.2], 'C': ['x', 'y']})
print(df.dtypes)
A int64
B float64
C object
dtype: object
#9. df['col'].value_counts()
Returns a Series containing counts of unique values in a column.
import pandas as pd
df = pd.DataFrame({'Fruit': ['Apple', 'Banana', 'Apple', 'Orange', 'Banana', 'Apple']})
print(df['Fruit'].value_counts())
Apple 3
Banana 2
Orange 1
Name: Fruit, dtype: int64
#10. df['col'].unique()
Returns an array of the unique values in a column.
import pandas as pd
df = pd.DataFrame({'Fruit': ['Apple', 'Banana', 'Apple', 'Orange']})
print(df['Fruit'].unique())
['Apple' 'Banana' 'Orange']
#11. df['col'].nunique()
Returns the number of unique values in a column.
import pandas as pd
df = pd.DataFrame({'Fruit': ['Apple', 'Banana', 'Apple', 'Orange']})
print(df['Fruit'].nunique())
3
#12. df.isnull()
Returns a DataFrame of boolean values indicating missing values.
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [1, np.nan], 'B': [np.nan, 'x']})
print(df.isnull())
A B
0 False True
1 True False
#13. df.isnull().sum()
Returns the number of missing values in each column.
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [1, np.nan, 3, np.nan], 'B': [5, 6, 7, 8]})
print(df.isnull().sum())
A 2
B 0
dtype: int64
#14. df.to_csv()
Writes the DataFrame to a comma-separated values (csv) file.
import pandas as pd
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
csv_output = df.to_csv(index=False)
print(csv_output)
A,B
1,3
2,4
#15. df.copy()
Creates a deep copy of a DataFrame.
import pandas as pd
df1 = pd.DataFrame({'A': [1]})
df2 = df1.copy()
df2.loc[0, 'A'] = 99
print(f"Original df1:\n{df1}")
print(f"Copied df2:\n{df2}")
Original df1:
A
0 1
Copied df2:
A
0 99
---
#DataAnalysis #Pandas #Selection #Indexing
Part 2: Pandas - Data Selection & Indexing
#16. df['col']
Selects a single column as a Series.
import pandas as pd
df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [30, 25]})
print(df['Name'])
0 Alice
1 Bob
Name: Name, dtype: object
#17. df[['col1', 'col2']]
Selects multiple columns as a new DataFrame.
import pandas as pd
df = pd.DataFrame({'Name': ['Alice'], 'Age': [30], 'City': ['New York']})
print(df[['Name', 'City']])
Name City
0 Alice New York
#18. df.loc[]
Accesses a group of rows and columns by label(s) or a boolean array.
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3]}, index=['x', 'y', 'z'])
print(df.loc['y'])
A 2
Name: y, dtype: int64
#19. df.iloc[]
Accesses a group of rows and columns by integer position(s).
import pandas as pd
df = pd.DataFrame({'A': [10, 20, 30]})
print(df.iloc[1])
A 20
Name: 1, dtype: int64
#20. df[df['col'] > value]
Selects rows based on a boolean condition (boolean indexing).
import pandas as pd
df = pd.DataFrame({'Age': [22, 35, 18, 40]})
print(df[df['Age'] > 30])
Age
1 35
3 40
#21. df.set_index()
Sets the DataFrame index using existing columns.
import pandas as pd
df = pd.DataFrame({'Country': ['USA', 'UK'], 'Code': [1, 44]})
df_indexed = df.set_index('Country')
print(df_indexed)
Code
Country
USA 1
UK 44
#22. df.reset_index()
Resets the index of the DataFrame and uses the default integer index.
import pandas as pd
df = pd.DataFrame({'Code': [1, 44]}, index=['USA', 'UK'])
df_reset = df.reset_index()
print(df_reset)
index Code
0 USA 1
1 UK 44
#23. df.at[]
Accesses a single value by row/column label pair. Faster than .loc.
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3]}, index=['x', 'y', 'z'])
print(df.at['y', 'A'])
2
#24. df.iat[]
Accesses a single value by row/column integer position. Faster than .iloc.
import pandas as pd
df = pd.DataFrame({'A': [10, 20, 30]})
print(df.iat[1, 0])
20
#25. df.sample()
Returns a random sample of items from an axis of object.
import pandas as pd
df = pd.DataFrame({'A': range(10)})
print(df.sample(n=3))
A
8 8
2 2
5 5
(Note: Output rows will be random)
---
#DataAnalysis #Pandas #DataCleaning #Manipulation
Part 3: Pandas - Data Cleaning & Manipulation
#26. df.dropna()
Removes missing values.
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [1, np.nan, 3]})
print(df.dropna())
A
0 1.0
2 3.0
#27. df.fillna()
Fills missing (NA/NaN) values using a specified method.
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [1, np.nan, 3]})
print(df.fillna(0))
A
0 1.0
1 0.0
2 3.0
#28. df.astype()
Casts a pandas object to a specified dtype.
import pandas as pd
df = pd.DataFrame({'A': [1.1, 2.7, 3.5]})
df['A'] = df['A'].astype(int)
print(df)
A
0 1
1 2
2 3
#29. df.rename()
Alters axes labels.
import pandas as pd
df = pd.DataFrame({'a': [1], 'b': [2]})
df_renamed = df.rename(columns={'a': 'A', 'b': 'B'})
print(df_renamed)
A B
0 1 2
#30. df.drop()
Drops specified labels from rows or columns.
import pandas as pd
df = pd.DataFrame({'A': [1], 'B': [2], 'C': [3]})
df_dropped = df.drop(columns=['B'])
print(df_dropped)
A C
0 1 3
#31. pd.to_datetime()
Converts argument to datetime.
import pandas as pd
s = pd.Series(['2023-01-01', '2023-01-02'])
dt_s = pd.to_datetime(s)
print(dt_s)
0 2023-01-01
1 2023-01-02
dtype: datetime64[ns]
#32. df.apply()
Applies a function along an axis of the DataFrame.
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3]})
df['B'] = df['A'].apply(lambda x: x * 2)
print(df)
A B
0 1 2
1 2 4
2 3 6
#33. df['col'].map()
Maps values of a Series according to an input mapping or function.
import pandas as pd
df = pd.DataFrame({'Gender': ['M', 'F', 'M']})
df['Gender_Full'] = df['Gender'].map({'M': 'Male', 'F': 'Female'})
print(df)
Gender Gender_Full
0 M Male
1 F Female
2 M Male
#34. df.replace()
Replaces values given in to_replace with value.
import pandas as pd
df = pd.DataFrame({'Score': [10, -99, 15, -99]})
df_replaced = df.replace(-99, 0)
print(df_replaced)
Score
0 10
1 0
2 15
3 0
#35. df.duplicated()
Returns a boolean Series denoting duplicate rows.
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 1], 'B': ['a', 'b', 'a']})
print(df.duplicated())
0 False
1 False
2 True
dtype: bool
#36. df.drop_duplicates()
Returns a DataFrame with duplicate rows removed.
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 1], 'B': ['a', 'b', 'a']})
print(df.drop_duplicates())
A B
0 1 a
1 2 b
#37. df.sort_values()
Sorts by the values along either axis.
import pandas as pd
df = pd.DataFrame({'Age': [25, 22, 30]})
print(df.sort_values(by='Age'))
Age
1 22
0 25
2 30
#38. df.sort_index()
Sorts object by labels (along an axis).
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3]}, index=[10, 5, 8])
print(df.sort_index())
A
5 2
8 3
10 1
#39. pd.cut()
Bins values into discrete intervals.
import pandas as pd
ages = pd.Series([22, 35, 58, 8, 42])
age_bins = pd.cut(ages, bins=[0, 18, 35, 60], labels=['Child', 'Adult', 'Senior'])
print(age_bins)
0 Adult
1 Adult
2 Senior
3 Child
4 Senior
dtype: category
Categories (3, object): ['Child' < 'Adult' < 'Senior']
#40. pd.qcut()
Quantile-based discretization function (bins into equal-sized groups).
import pandas as pd
data = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
quartiles = pd.qcut(data, 4, labels=False)
print(quartiles)
0 0
1 0
2 0
3 1
4 1
5 2
6 2
7 3
8 3
9 3
dtype: int64
#41. s.str.contains()
Tests if a pattern or regex is contained within a string of a Series.
import pandas as pd
s = pd.Series(['apple', 'banana', 'apricot'])
print(s[s.str.contains('ap')])
0 apple
2 apricot
dtype: object
#42. s.str.split()
Splits strings around a given separator/delimiter.
import pandas as pd
s = pd.Series(['a_b', 'c_d'])
print(s.str.split('_', expand=True))
0 1
0 a b
1 c d
#43. s.str.lower()
Converts strings in the Series to lowercase.
import pandas as pd
s = pd.Series(['HELLO', 'World'])
print(s.str.lower())
0 hello
1 world
dtype: object
#44. s.str.strip()
Removes leading and trailing whitespace.
import pandas as pd
s = pd.Series([' hello ', ' world '])
print(s.str.strip())
0 hello
1 world
dtype: object
#45. s.dt.year
Extracts the year from a datetime Series.
import pandas as pd
s = pd.to_datetime(pd.Series(['2023-01-01', '2024-05-10']))
print(s.dt.year)
0 2023
1 2024
dtype: int64
---
#DataAnalysis #Pandas #Grouping #Aggregation
Part 4: Pandas - Grouping & Aggregation
#46. df.groupby()
Groups a DataFrame using a mapper or by a Series of columns.
import pandas as pd
df = pd.DataFrame({'Team': ['A', 'B', 'A', 'B'], 'Points': [10, 8, 12, 6]})
grouped = df.groupby('Team')
print(grouped)
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x...>
#47. groupby.agg()
Aggregates using one or more operations over the specified axis.
import pandas as pd
df = pd.DataFrame({'Team': ['A', 'B', 'A', 'B'], 'Points': [10, 8, 12, 6]})
agg_df = df.groupby('Team').agg(['mean', 'sum'])
print(agg_df)
Points
mean sum
Team
A 11 22
B 7 14
#48. groupby.size()
Computes group sizes.
import pandas as pd
df = pd.DataFrame({'Team': ['A', 'B', 'A', 'B', 'A']})
print(df.groupby('Team').size())
Team
A 3
B 2
dtype: int64
#49. groupby.count()
Computes the count of non-NA cells for each group.
import pandas as pd
import numpy as np
df = pd.DataFrame({'Team': ['A', 'B', 'A'], 'Score': [1, np.nan, 3]})
print(df.groupby('Team').count())
Score
Team
A 2
B 0
#50. groupby.mean()
Computes the mean of group values.
import pandas as pd
df = pd.DataFrame({'Team': ['A', 'B', 'A', 'B'], 'Points': [10, 8, 12, 6]})
print(df.groupby('Team').mean())
Points
Team
A 11
B 7
#51. groupby.sum()
Computes the sum of group values.
import pandas as pd
df = pd.DataFrame({'Team': ['A', 'B', 'A', 'B'], 'Points': [10, 8, 12, 6]})
print(df.groupby('Team').sum())
Points
Team
A 22
B 14
#52. groupby.min()
Computes the minimum of group values.
import pandas as pd
df = pd.DataFrame({'Team': ['A', 'B', 'A', 'B'], 'Points': [10, 8, 12, 6]})
print(df.groupby('Team').min())
Points
Team
A 10
B 6
#53. groupby.max()
Computes the maximum of group values.
import pandas as pd
df = pd.DataFrame({'Team': ['A', 'B', 'A', 'B'], 'Points': [10, 8, 12, 6]})
print(df.groupby('Team').max())
Points
Team
A 12
B 8
#54. df.pivot_table()
Creates a spreadsheet-style pivot table as a DataFrame.
import pandas as pd
df = pd.DataFrame({'A': ['foo', 'foo', 'bar'], 'B': ['one', 'two', 'one'], 'C': [1, 2, 3]})
pivot = df.pivot_table(values='C', index='A', columns='B')
print(pivot)
B one two
A
bar 3.0 NaN
foo 1.0 2.0
#55. pd.crosstab()
Computes a cross-tabulation of two (or more) factors.
import pandas as pd
df = pd.DataFrame({'A': ['foo', 'foo', 'bar'], 'B': ['one', 'two', 'one']})
crosstab = pd.crosstab(df.A, df.B)
print(crosstab)
B one two
A
bar 1 0
foo 1 1
---
#DataAnalysis #Pandas #Merging #Joining
Part 5: Pandas - Merging & Concatenating
#56. pd.merge()
Merges DataFrame or named Series objects with a database-style join.
import pandas as pd
df1 = pd.DataFrame({'key': ['A', 'B'], 'val1': [1, 2]})
df2 = pd.DataFrame({'key': ['A', 'B'], 'val2': [3, 4]})
merged = pd.merge(df1, df2, on='key')
print(merged)
key val1 val2
0 A 1 3
1 B 2 4
#57. pd.concat()
Concatenates pandas objects along a particular axis.
import pandas as pd
df1 = pd.DataFrame({'A': [1, 2]})
df2 = pd.DataFrame({'A': [3, 4]})
concatenated = pd.concat([df1, df2])
print(concatenated)
A
0 1
1 2
0 3
1 4
#58. df.join()
Joins columns with other DataFrame(s) on index or on a key column.
import pandas as pd
df1 = pd.DataFrame({'val1': [1, 2]}, index=['A', 'B'])
df2 = pd.DataFrame({'val2': [3, 4]}, index=['A', 'B'])
joined = df1.join(df2)
print(joined)
val1 val2
A 1 3
B 2 4
#59. pd.get_dummies()
Converts categorical variable into dummy/indicator variables (one-hot encoding).
import pandas as pd
s = pd.Series(list('abca'))
dummies = pd.get_dummies(s)
print(dummies)
a b c
0 1 0 0
1 0 1 0
2 0 0 1
3 1 0 0
#60. df.nlargest()
Returns the first n rows ordered by columns in descending order.
import pandas as pd
df = pd.DataFrame({'population': [100, 500, 200, 800]})
print(df.nlargest(2, 'population'))
population
3 800
1 500
---
#DataAnalysis #NumPy #Arrays
Part 6: NumPy - Array Creation & Manipulation
#61. np.array()
Creates a NumPy ndarray.
import numpy as np
arr = np.array([1, 2, 3])
print(arr)
[1 2 3]
#62. np.arange()
Returns an array with evenly spaced values within a given interval.
import numpy as np
arr = np.arange(0, 5)
print(arr)
[0 1 2 3 4]
#63. np.linspace()
Returns an array with evenly spaced numbers over a specified interval.
import numpy as np
arr = np.linspace(0, 10, 5)
print(arr)
[ 0. 2.5 5. 7.5 10. ]
#64. np.zeros()
Returns a new array of a given shape and type, filled with zeros.
import numpy as np
arr = np.zeros((2, 3))
print(arr)
[[0. 0. 0.]
[0. 0. 0.]]
#65. np.ones()
Returns a new array of a given shape and type, filled with ones.
import numpy as np
arr = np.ones((2, 3))
print(arr)
[[1. 1. 1.]
[1. 1. 1.]]
#66. np.random.rand()
Creates an array of the given shape and populates it with random samples from a uniform distribution over [0, 1).
import numpy as np
arr = np.random.rand(2, 2)
print(arr)
[[0.13949386 0.2921446 ]
[0.52273283 0.77122228]]
(Note: Output values will be random)
#67. arr.reshape()
Gives a new shape to an array without changing its data.
import numpy as np
arr = np.arange(6)
reshaped_arr = arr.reshape((2, 3))
print(reshaped_arr)
[[0 1 2]
[3 4 5]]
#68. np.concatenate()
Joins a sequence of arrays along an existing axis.
import numpy as np
a = np.array([[1, 2]])
b = np.array([[3, 4]])
print(np.concatenate((a, b), axis=0))
[[1 2]
[3 4]]
#69. np.vstack()
Stacks arrays in sequence vertically (row wise).
import numpy as np
a = np.array([1, 2])
b = np.array([3, 4])
print(np.vstack((a, b)))
[[1 2]
[3 4]]
#70. np.hstack()
Stacks arrays in sequence horizontally (column wise).
import numpy as np
a = np.array([1, 2])
b = np.array([3, 4])
print(np.hstack((a, b)))
[1 2 3 4]
---
#DataAnalysis #NumPy #Math #Statistics
Part 7: NumPy - Mathematical & Statistical Functions
#71. np.mean()
Computes the arithmetic mean along the specified axis.
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(np.mean(arr))
3.0
#72. np.median()
Computes the median along the specified axis.
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(np.median(arr))
3.0
#73. np.std()
Computes the standard deviation along the specified axis.
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(np.std(arr))
1.4142135623730951
#74. np.sum()
Sums array elements over a given axis.
import numpy as np
arr = np.array([[1, 2], [3, 4]])
print(np.sum(arr))
10
#75. np.min()
Returns the minimum of an array or minimum along an axis.
import numpy as np
arr = np.array([5, 2, 8, 1])
print(np.min(arr))
1
#76. np.max()
Returns the maximum of an array or maximum along an axis.
import numpy as np
arr = np.array([5, 2, 8, 1])
print(np.max(arr))
8
#77. np.sqrt()
Returns the non-negative square-root of an array, element-wise.
import numpy as np
arr = np.array([4, 9, 16])
print(np.sqrt(arr))
[2. 3. 4.]
#78. np.log()
Calculates the natural logarithm, element-wise.
import numpy as np
arr = np.array([1, np.e, np.e**2])
print(np.log(arr))
[0. 1. 2.]
#79. np.dot()
Calculates the dot product of two arrays.
import numpy as np
a = np.array([1, 2])
b = np.array([3, 4])
print(np.dot(a, b))
11
#80. np.where()
Returns elements chosen from x or y depending on a condition.
import numpy as np
arr = np.array([10, 5, 20, 15])
print(np.where(arr > 12, 'High', 'Low'))
['Low' 'Low' 'High' 'High']
---
#DataAnalysis #Matplotlib #Seaborn #Visualization
Part 8: Matplotlib & Seaborn - Data Visualization
#81. plt.plot()
Plots y versus x as lines and/or markers.
import matplotlib.pyplot as plt
plt.plot([1, 2, 3, 4], [1, 4, 9, 16])
# In a real script, you would call plt.show()
print("Output: A figure window opens displaying a line plot.")
Output: A figure window opens displaying a line plot.
#82. plt.scatter()
A scatter plot of y vs. x with varying marker size and/or color.
import matplotlib.pyplot as plt
plt.scatter([1, 2, 3, 4], [1, 4, 9, 16])
print("Output: A figure window opens displaying a scatter plot.")
Output: A figure window opens displaying a scatter plot.
#83. plt.hist()
Computes and draws the histogram of x.
import matplotlib.pyplot as plt
import numpy as np
data = np.random.randn(1000)
plt.hist(data, bins=30)
print("Output: A figure window opens displaying a histogram.")
Output: A figure window opens displaying a histogram.
#84. plt.bar()
Makes a bar plot.
import matplotlib.pyplot as plt
plt.bar(['A', 'B', 'C'], [10, 15, 7])
print("Output: A figure window opens displaying a bar chart.")
Output: A figure window opens displaying a bar chart.
#85. plt.boxplot()
Makes a box and whisker plot.
import matplotlib.pyplot as plt
import numpy as np
data = [np.random.normal(0, std, 100) for std in range(1, 4)]
plt.boxplot(data)
print("Output: A figure window opens displaying a box plot.")
Output: A figure window opens displaying a box plot.
#86. sns.heatmap()
Plots rectangular data as a color-encoded matrix.
import seaborn as sns
import numpy as np
data = np.random.rand(10, 12)
sns.heatmap(data)
print("Output: A figure window opens displaying a heatmap.")
Output: A figure window opens displaying a heatmap.
#87. sns.pairplot()
Plots pairwise relationships in a dataset.
import seaborn as sns
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(100, 4), columns=['A', 'B', 'C', 'D'])
# sns.pairplot(df) # This line would generate the plot
print("Output: A figure grid opens showing scatterplots for each pair of variables.")
Output: A figure grid opens showing scatterplots for each pair of variables.
#88. sns.countplot()
Shows the counts of observations in each categorical bin using bars.
import seaborn as sns
import pandas as pd
df = pd.DataFrame({'category': ['A', 'B', 'A', 'C', 'A', 'B']})
sns.countplot(x='category', data=df)
print("Output: A figure window opens showing a count plot.")
Output: A figure window opens showing a count plot.
#89. sns.jointplot()
Draws a plot of two variables with bivariate and univariate graphs.
import seaborn as sns
import pandas as pd
import numpy as np
df = pd.DataFrame({'x': range(50), 'y': range(50) + np.random.randn(50)})
# sns.jointplot(x='x', y='y', data=df) # This line would generate the plot
print("Output: A figure shows a scatter plot with histograms for each axis.")
Output: A figure shows a scatter plot with histograms for each axis.
#90. plt.show()
Displays all open figures.
import matplotlib.pyplot as plt
plt.plot([1, 2, 3])
# plt.show() # In a script, this is essential to see the plot.
print("Executes the command to render and display the plot.")
Executes the command to render and display the plot.
---
#DataAnalysis #ScikitLearn #Modeling #Preprocessing
Part 9: Scikit-learn - Modeling & Preprocessing
#91. train_test_split()
Splits arrays or matrices into random train and test subsets.
from sklearn.model_selection import train_test_split
import numpy as np
X, y = np.arange(10).reshape((5, 2)), range(5)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
print(f"X_train shape: {X_train.shape}")
print(f"X_test shape: {X_test.shape}")
X_train shape: (3, 2)
X_test shape: (2, 2)
#92. StandardScaler()
Standardizes features by removing the mean and scaling to unit variance.
from sklearn.preprocessing import StandardScaler
data = [[0, 0], [0, 0], [1, 1], [1, 1]]
scaler = StandardScaler()
print(scaler.fit_transform(data))
[[-1. -1.]
[-1. -1.]
[ 1. 1.]
[ 1. 1.]]
#93. MinMaxScaler()
Transforms features by scaling each feature to a given range, typically [0, 1].
from sklearn.preprocessing import MinMaxScaler
data = [[-1, 2], [-0.5, 6], [0, 10], [1, 18]]
scaler = MinMaxScaler()
print(scaler.fit_transform(data))
[[0. 0. ]
[0.25 0.25]
[0.5 0.5 ]
[1. 1. ]]
#94. LabelEncoder()
Encodes target labels with values between 0 and n_classes-1.
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
encoded = le.fit_transform(['paris', 'tokyo', 'paris'])
print(encoded)
[0 1 0]
#95. OneHotEncoder()
Encodes categorical features as a one-hot numeric array.
from sklearn.preprocessing import OneHotEncoder
enc = OneHotEncoder()
X = [['Male'], ['Female'], ['Female']]
print(enc.fit_transform(X).toarray())
[[0. 1.]
[1. 0.]
[1. 0.]]
#96. LinearRegression()
Ordinary least squares Linear Regression model.
from sklearn.linear_model import LinearRegression
X = [[0], [1], [2]]
y = [0, 1, 2]
reg = LinearRegression().fit(X, y)
print(f"Coefficient: {reg.coef_[0]}")
Coefficient: 1.0
#97. LogisticRegression()
Implements Logistic Regression for classification.
from sklearn.linear_model import LogisticRegression
X = [[-1], [0], [1], [2]]
y = [0, 0, 1, 1]
clf = LogisticRegression().fit(X, y)
print(f"Prediction for [[-2]]: {clf.predict([[-2]])}")
Prediction for [[-2]]: [0]
#98. KMeans()
K-Means clustering algorithm.
from sklearn.cluster import KMeans
X = [[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]]
kmeans = KMeans(n_clusters=2, n_init='auto').fit(X)
print(kmeans.labels_)
[0 0 0 1 1 1]
(Note: Cluster labels may be flipped, e.g., [1 1 1 0 0 0])
#99. accuracy_score()
Calculates the accuracy classification score.
from sklearn.metrics import accuracy_score
y_true = [0, 1, 1, 0]
y_pred = [0, 1, 0, 0]
print(accuracy_score(y_true, y_pred))
0.75
#100. confusion_matrix()
Computes a confusion matrix to evaluate the accuracy of a classification.
from sklearn.metrics import confusion_matrix
y_true = [0, 1, 0, 1]
y_pred = [1, 1, 0, 1]
print(confusion_matrix(y_true, y_pred))
[[1 1]
[0 2]]
━━━━━━━━━━━━━━━
By: @CodeProgrammer ✨
💡 Top 50 Pillow Operations for Image Processing
I. File & Basic Operations
• Open an image file.
from PIL import Image
img = Image.open("image.jpg")
• Save an image.
img.save("new_image.png")
• Display an image (opens in default viewer).
img.show()
• Create a new blank image.
new_img = Image.new("RGB", (200, 100), "blue")
• Get image format (e.g., 'JPEG').
print(img.format)
• Get image dimensions as a (width, height) tuple.
width, height = img.size
• Get pixel format (e.g., 'RGB', 'L' for grayscale).
print(img.mode)
• Convert image mode.
grayscale_img = img.convert("L")
• Get a pixel's color value at (x, y).
r, g, b = img.getpixel((10, 20))
• Set a pixel's color value at (x, y).
img.putpixel((10, 20), (255, 0, 0))
II. Cropping, Resizing & Pasting
• Crop a rectangular region.
box = (100, 100, 400, 400)
cropped_img = img.crop(box)
• Resize an image to an exact size.
resized_img = img.resize((200, 200))
• Create a thumbnail (maintains aspect ratio).
img.thumbnail((128, 128))
• Paste one image onto another.
img.paste(another_img, (50, 50))
III. Rotation & Transformation
• Rotate an image (counter-clockwise).
rotated_img = img.rotate(45, expand=True)
• Flip an image horizontally.
flipped_img = img.transpose(Image.FLIP_LEFT_RIGHT)
• Flip an image vertically.
flipped_img = img.transpose(Image.FLIP_TOP_BOTTOM)
• Rotate by 90, 180, or 270 degrees.
img_90 = img.transpose(Image.ROTATE_90)
• Apply an affine transformation.
transformed = img.transform(img.size, Image.AFFINE, (1, 0.5, 0, 0, 1, 0))
IV. ImageOps Module Helpers
• Invert image colors.
from PIL import ImageOps
inverted_img = ImageOps.invert(img)
• Flip an image horizontally (mirror).
mirrored_img = ImageOps.mirror(img)
• Flip an image vertically.
flipped_v_img = ImageOps.flip(img)
• Convert to grayscale.
grayscale = ImageOps.grayscale(img)
• Colorize a grayscale image.
colorized = ImageOps.colorize(grayscale, black="blue", white="yellow")
• Reduce the number of bits for each color channel.
posterized = ImageOps.posterize(img, 4)
• Auto-adjust image contrast.
adjusted_img = ImageOps.autocontrast(img)
• Equalize the image histogram.
equalized_img = ImageOps.equalize(img)
• Add a border to an image.
bordered = ImageOps.expand(img, border=10, fill='black')
V. Color & Pixel Operations
• Split image into individual bands (e.g., R, G, B).
r, g, b = img.split()
• Merge bands back into an image.
merged_img = Image.merge("RGB", (r, g, b))
• Apply a function to each pixel.
brighter_img = img.point(lambda i: i * 1.2)
• Get a list of colors used in the image.
colors = img.getcolors(maxcolors=256)
• Blend two images with alpha compositing.
# Both images must be in RGBA mode
blended = Image.alpha_composite(img1_rgba, img2_rgba)
VI. Filters (ImageFilter)
• Apply a simple blur filter.
from PIL import ImageFilter
blurred_img = img.filter(ImageFilter.BLUR)
• Apply a box blur with a given radius.
box_blur = img.filter(ImageFilter.BoxBlur(5))
• Apply a Gaussian blur.
gaussian_blur = img.filter(ImageFilter.GaussianBlur(radius=2))
• Sharpen the image.
sharpened = img.filter(ImageFilter.SHARPEN)
• Find edges.
edges = img.filter(ImageFilter.FIND_EDGES)
• Enhance edges.
edge_enhanced = img.filter(ImageFilter.EDGE_ENHANCE)
• Emboss the image.
embossed = img.filter(ImageFilter.EMBOSS)
• Find contours.
contours = img.filter(ImageFilter.CONTOUR)
VII. Image Enhancement (ImageEnhance)
• Adjust color saturation.
from PIL import ImageEnhance
enhancer = ImageEnhance.Color(img)
vibrant_img = enhancer.enhance(2.0)
• Adjust brightness.
enhancer = ImageEnhance.Brightness(img)
bright_img = enhancer.enhance(1.5)
• Adjust contrast.
enhancer = ImageEnhance.Contrast(img)
contrast_img = enhancer.enhance(1.5)
• Adjust sharpness.
enhancer = ImageEnhance.Sharpness(img)
sharp_img = enhancer.enhance(2.0)
VIII. Drawing (ImageDraw & ImageFont)
• Draw text on an image.
from PIL import ImageDraw, ImageFont
draw = ImageDraw.Draw(img)
font = ImageFont.truetype("arial.ttf", 36)
draw.text((10, 10), "Hello", font=font, fill="red")
• Draw a line.
draw.line((0, 0, 100, 200), fill="blue", width=3)
• Draw a rectangle (outline).
draw.rectangle([10, 10, 90, 60], outline="green", width=2)
• Draw a filled ellipse.
draw.ellipse([100, 100, 180, 150], fill="yellow")
• Draw a polygon.
draw.polygon([(10,10), (20,50), (60,10)], fill="purple")
#Python #Pillow #ImageProcessing #PIL #CheatSheet
━━━━━━━━━━━━━━━
By: @CodeProgrammer ✨
🤖🧠 LongCat-Video: Meituan’s Groundbreaking Step Toward Efficient Long Video Generation with AI
🗓️ 04 Nov 2025
📚 AI News & Trends
In the rapidly advancing field of generative AI, the ability to create realistic, coherent, and high-quality videos from text or images has become one of the most sought-after goals. Meituan, one of the leading technology innovators in China, has made a remarkable stride in this domain with its latest open-source model — LongCat-Video. Designed as ...
#LongCatVideo #Meituan #GenerativeAI #VideoGeneration #AIInnovation #OpenSource