from sklearn.feature_extraction.text import TfidfVectorizer

# Initialize the TF-IDF vectorizer
vectorizer = TfidfVectorizer()
# Fit the vectorizer on the training data and transform it
X_train_tfidf = vectorizer.fit_transform(X_train)
# Only transform the test data using the already-fitted vectorizer
X_test_tfidf = vectorizer.transform(X_test)
print("Shape of training data vectors:", X_train_tfidf.shape)
print("Shape of testing data vectors:", X_test_tfidf.shape)
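To make the fit/transform distinction concrete, here is a small self-contained sketch (toy sentences, not our dataset): the vocabulary is frozen when the vectorizer is fitted, so words that appear only in the test text are simply ignored rather than changing the feature space.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

toy_train = ["good movie", "bad movie"]
toy_test = ["good soundtrack"]  # "soundtrack" is not in the training vocabulary

vec = TfidfVectorizer()
train_matrix = vec.fit_transform(toy_train)  # learns the vocabulary, then transforms
test_matrix = vec.transform(toy_test)        # reuses the frozen vocabulary

print(sorted(vec.vocabulary_))                # ['bad', 'good', 'movie']
print(train_matrix.shape, test_matrix.shape)  # (2, 3) (1, 3)
```

Both matrices have 3 columns, one per training-vocabulary word; "soundtrack" contributes nothing.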
---
Step 5: Training the NLP Model
Now we can train a machine learning model. Multinomial Naive Bayes is a simple yet powerful algorithm that works very well for text classification tasks.
#ModelTraining #NaiveBayes
from sklearn.naive_bayes import MultinomialNB

# Initialize and train the Naive Bayes classifier
model = MultinomialNB()
model.fit(X_train_tfidf, y_train)
print("Model training complete.")
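As a quick illustration of why this pairing works, here is a toy mini-corpus (made up for this sketch, separate from our dataset): Multinomial Naive Bayes learns per-class word weights, so a new sentence built from "positive" words lands in the positive class.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ["great fun great", "awful boring", "fun movie", "boring awful plot"]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

vec = TfidfVectorizer()
X = vec.fit_transform(texts)

clf = MultinomialNB()
clf.fit(X, labels)

# "great" and "movie" only ever appeared in positive examples
print(clf.predict(vec.transform(["great movie"])))  # → [1]
```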
---
Step 6: Making Predictions and Evaluating the Model
With our model trained, let's use it to make predictions on our unseen test data and see how well it performs.
#Evaluation #ModelPerformance #Prediction
from sklearn.metrics import accuracy_score, classification_report

# Make predictions on the test set
y_pred = model.predict(X_test_tfidf)
# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy * 100:.2f}%\n")
# Display a detailed classification report
print("Classification Report:")
print(classification_report(y_test, y_pred, target_names=['Negative', 'Positive']))
---
Step 7: Discussion of Results
#Results #Discussion
Our model achieved 100% accuracy on this very small test set.
Accuracy: This is the percentage of correct predictions. 100% is perfect, but this is expected on such a tiny, clean dataset. In the real world, an accuracy of 85-95% is often considered very good.
Precision: Of all the times the model predicted "Positive", what percentage were actually positive?
Recall: Of all the actual "Positive" texts, what percentage did the model correctly identify?
F1-Score: The harmonic mean of Precision and Recall, which balances the two metrics.
Limitations: Our dataset is extremely small. A real model would need thousands of examples to be reliable and generalize well to new, unseen text.
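To make these definitions concrete, here is a small sketch (with made-up labels, not our actual test set) that computes the three metrics by hand for the positive class:

```python
# Hypothetical labels, purely for illustration
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1]

# Count true positives, false positives, and false negatives for class 1
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # 3
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # 1
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # 1

precision = tp / (tp + fp)                          # 3/4 = 0.75
recall = tp / (tp + fn)                             # 3/4 = 0.75
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean = 0.75

print(precision, recall, f1)  # → 0.75 0.75 0.75
```

classification_report prints these same quantities, computed per class.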
---
Step 8: Testing the Model on New Sentences
Let's see how our complete pipeline works on brand new text.
#RealWorldNLP #Inference
# Function to predict sentiment of a new sentence
def predict_sentiment(sentence):
    # 1. Preprocess the text
    processed_sentence = preprocess_text(sentence)
    # 2. Vectorize the text using the SAME vectorizer
    vectorized_sentence = vectorizer.transform([processed_sentence])
    # 3. Make a prediction
    prediction = model.predict(vectorized_sentence)
    # 4. Return the result
    return "Positive" if prediction[0] == 1 else "Negative"
# Test with new sentences
new_sentence_1 = "The movie was absolutely amazing!"
new_sentence_2 = "I was very bored and did not like it."
print(f"'{new_sentence_1}' -> Sentiment: {predict_sentiment(new_sentence_1)}")
print(f"'{new_sentence_2}' -> Sentiment: {predict_sentiment(new_sentence_2)}")
━━━━━━━━━━━━━━━
By: @CodeProgrammer ✨
Part 7: Main Execution Block
Finally, this block sets up the application, registers all our handlers, and starts the bot. This code goes at the end of converter_bot.py.
#Main #Execution #RunBot
# converter_bot.py (continued)
def main() -> None:
    """Start the bot."""
    application = Application.builder().token(TELEGRAM_TOKEN).build()

    # Register handlers
    application.add_handler(CommandHandler("start", start))
    application.add_handler(CommandHandler("help", help_command))
    application.add_handler(MessageHandler(filters.Document.ALL, handle_document))

    # Run the bot until the user presses Ctrl-C
    print("Bot is running...")
    application.run_polling()

if __name__ == '__main__':
    main()
---
Part 8: Results & Discussion
To Run:
• Run python database_setup.py once.
• Replace "YOUR_TELEGRAM_BOT_TOKEN" in converter_bot.py with your actual token from BotFather.
• Run python converter_bot.py.
• Send a PDF or EPUB file to your bot on Telegram.
Expected Results:
• The bot will acknowledge the file.
• After a short processing time, it will send back the converted file.
• A new entry will be added to the conversions.db file.
Viewing the Database:
You can inspect the conversions.db file using a tool like "DB Browser for SQLite" or the command line:
sqlite3 conversions.db "SELECT * FROM conversions;"
Discussion & Limitations:
• Dependency: The bot is entirely dependent on a local installation of Calibre. This makes it hard to deploy on simple hosting services. A Docker-based deployment would be a good solution.
• Conversion Quality: Converting from PDF, especially those with complex layouts, images, and columns, can result in poor EPUB formatting. This is a fundamental limitation of PDF-to-EPUB conversion, not just a flaw in the bot.
• Synchronous Processing: The bot handles one file at a time. If two users send files simultaneously, one has to wait. For a larger scale, a task queue system (like Celery with Redis) would be necessary to handle conversions asynchronously in the background.
• Error Handling: The current error messaging is generic. Advanced versions could parse Calibre's error output to give users more specific feedback (e.g., "This PDF is password-protected").
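As a sketch of the asynchronous direction (assumptions: python-telegram-bot's async handlers, and a hypothetical helper run_blocking standing in for the bot's actual Calibre call), the blocking conversion can at least be moved off the event loop with asyncio.to_thread, so the bot keeps answering other updates while a file converts; a real multi-user queue would still need something like Celery.

```python
import asyncio
import subprocess

def run_blocking(cmd: list[str]) -> int:
    # Blocking subprocess call, e.g. ["ebook-convert", src, dst]
    return subprocess.run(cmd).returncode

async def convert_in_background(src: str, dst: str) -> int:
    # Runs the blocking call in a worker thread; the event loop (and the
    # bot's other handlers) keep running in the meantime.
    return await asyncio.to_thread(run_blocking, ["ebook-convert", src, dst])
```

This only keeps the bot responsive; conversions still compete for the same machine's CPU, which is why a dedicated task queue is the better fix at scale.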
#Results #Discussion #Limitations #Scalability
━━━━━━━━━━━━━━━
By: @CodeProgrammer ✨