from sklearn.feature_extraction.text import TfidfVectorizer

# Initialize the TF-IDF vectorizer
vectorizer = TfidfVectorizer()
# Fit the vectorizer on the training data and transform it
X_train_tfidf = vectorizer.fit_transform(X_train)
# Only transform the test data using the already-fitted vectorizer
X_test_tfidf = vectorizer.transform(X_test)
print("Shape of training data vectors:", X_train_tfidf.shape)
print("Shape of testing data vectors:", X_test_tfidf.shape)
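To make the fit/transform distinction concrete, here is a small self-contained sketch (toy sentences, not our dataset): the vocabulary is frozen when the vectorizer is fitted, so words that appear only in the test text are simply ignored rather than changing the feature space.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

toy_train = ["good movie", "bad movie"]
toy_test = ["good soundtrack"]  # "soundtrack" is not in the training vocabulary

vec = TfidfVectorizer()
train_matrix = vec.fit_transform(toy_train)  # learns the vocabulary, then transforms
test_matrix = vec.transform(toy_test)        # reuses the frozen vocabulary

print(sorted(vec.vocabulary_))                # ['bad', 'good', 'movie']
print(train_matrix.shape, test_matrix.shape)  # (2, 3) (1, 3)
```

Both matrices have 3 columns, one per training-vocabulary word; "soundtrack" contributes nothing.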
---
Step 5: Training the NLP Model
Now we can train a machine learning model. Multinomial Naive Bayes is a simple yet powerful algorithm that works very well for text classification tasks.
#ModelTraining #NaiveBayes
from sklearn.naive_bayes import MultinomialNB

# Initialize and train the Naive Bayes classifier
model = MultinomialNB()
model.fit(X_train_tfidf, y_train)
print("Model training complete.")
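As a quick illustration of why this pairing works, here is a toy mini-corpus (made up for this sketch, separate from our dataset): Multinomial Naive Bayes learns per-class word weights, so a new sentence built from "positive" words lands in the positive class.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ["great fun great", "awful boring", "fun movie", "boring awful plot"]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

vec = TfidfVectorizer()
X = vec.fit_transform(texts)

clf = MultinomialNB()
clf.fit(X, labels)

# "great" and "movie" only ever appeared in positive examples
print(clf.predict(vec.transform(["great movie"])))  # → [1]
```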
---
Step 6: Making Predictions and Evaluating the Model
With our model trained, let's use it to make predictions on our unseen test data and see how well it performs.
#Evaluation #ModelPerformance #Prediction
from sklearn.metrics import accuracy_score, classification_report

# Make predictions on the test set
y_pred = model.predict(X_test_tfidf)
# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy * 100:.2f}%\n")
# Display a detailed classification report
print("Classification Report:")
print(classification_report(y_test, y_pred, target_names=['Negative', 'Positive']))
---
Step 7: Discussion of Results
#Results #Discussion
Our model achieved 100% accuracy on this very small test set.
Accuracy: This is the percentage of correct predictions. 100% is perfect, but this is expected on such a tiny, clean dataset. In the real world, an accuracy of 85-95% is often considered very good.
Precision: Of all the times the model predicted "Positive", what percentage were actually positive?
Recall: Of all the actual "Positive" texts, what percentage did the model correctly identify?
F1-Score: The harmonic mean of Precision and Recall, which balances the two metrics.
Limitations: Our dataset is extremely small. A real model would need thousands of examples to be reliable and generalize well to new, unseen text.
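To make these definitions concrete, here is a small sketch (with made-up labels, not our actual test set) that computes the three metrics by hand for the positive class:

```python
# Hypothetical labels, purely for illustration
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1]

# Count true positives, false positives, and false negatives for class 1
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # 3
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # 1
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # 1

precision = tp / (tp + fp)                          # 3/4 = 0.75
recall = tp / (tp + fn)                             # 3/4 = 0.75
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean = 0.75

print(precision, recall, f1)  # → 0.75 0.75 0.75
```

classification_report prints these same quantities, computed per class.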
---
Step 8: Testing the Model on New Sentences
Let's see how our complete pipeline works on brand new text.
#RealWorldNLP #Inference
# Function to predict sentiment of a new sentence
def predict_sentiment(sentence):
    # 1. Preprocess the text
    processed_sentence = preprocess_text(sentence)
    # 2. Vectorize the text using the SAME vectorizer
    vectorized_sentence = vectorizer.transform([processed_sentence])
    # 3. Make a prediction
    prediction = model.predict(vectorized_sentence)
    # 4. Return the result
    return "Positive" if prediction[0] == 1 else "Negative"
# Test with new sentences
new_sentence_1 = "The movie was absolutely amazing!"
new_sentence_2 = "I was very bored and did not like it."
print(f"'{new_sentence_1}' -> Sentiment: {predict_sentiment(new_sentence_1)}")
print(f"'{new_sentence_2}' -> Sentiment: {predict_sentiment(new_sentence_2)}")
━━━━━━━━━━━━━━━
By: @CodeProgrammer ✨
Part 7: Main Execution Block
Finally, this block sets up the application, registers all our handlers, and starts the bot. This code goes at the end of converter_bot.py.
#Main #Execution #RunBot
# converter_bot.py (continued)
def main() -> None:
    """Start the bot."""
    application = Application.builder().token(TELEGRAM_TOKEN).build()

    # Register handlers
    application.add_handler(CommandHandler("start", start))
    application.add_handler(CommandHandler("help", help_command))
    application.add_handler(MessageHandler(filters.Document.ALL, handle_document))

    # Run the bot until the user presses Ctrl-C
    print("Bot is running...")
    application.run_polling()

if __name__ == '__main__':
    main()
---
Part 8: Results & Discussion
To Run:
• Run python database_setup.py once.
• Replace "YOUR_TELEGRAM_BOT_TOKEN" in converter_bot.py with your actual token from BotFather.
• Run python converter_bot.py.
• Send a PDF or EPUB file to your bot on Telegram.
Expected Results:
• The bot will acknowledge the file.
• After a short processing time, it will send back the converted file.
• A new entry will be added to the conversions.db file.
Viewing the Database:
You can inspect the conversions.db file using a tool like "DB Browser for SQLite" or the command line:
sqlite3 conversions.db "SELECT * FROM conversions;"
Discussion & Limitations:
• Dependency: The bot is entirely dependent on a local installation of Calibre. This makes it hard to deploy on simple hosting services. A Docker-based deployment would be a good solution.
• Conversion Quality: Converting from PDF, especially those with complex layouts, images, and columns, can result in poor EPUB formatting. This is a fundamental limitation of PDF-to-EPUB conversion, not just a flaw in the bot.
• Synchronous Processing: The bot handles one file at a time. If two users send files simultaneously, one has to wait. For a larger scale, a task queue system (like Celery with Redis) would be necessary to handle conversions asynchronously in the background.
• Error Handling: The current error messaging is generic. Advanced versions could parse Calibre's error output to give users more specific feedback (e.g., "This PDF is password-protected").
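As a sketch of the asynchronous direction (assumptions: python-telegram-bot's async handlers, and a hypothetical helper run_blocking standing in for the bot's actual Calibre call), the blocking conversion can at least be moved off the event loop with asyncio.to_thread, so the bot keeps answering other updates while a file converts; a real multi-user queue would still need something like Celery.

```python
import asyncio
import subprocess

def run_blocking(cmd: list[str]) -> int:
    # Blocking subprocess call, e.g. ["ebook-convert", src, dst]
    return subprocess.run(cmd).returncode

async def convert_in_background(src: str, dst: str) -> int:
    # Runs the blocking call in a worker thread; the event loop (and the
    # bot's other handlers) keep running in the meantime.
    return await asyncio.to_thread(run_blocking, ["ebook-convert", src, dst])
```

This only keeps the bot responsive; conversions still compete for the same machine's CPU, which is why a dedicated task queue is the better fix at scale.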
#Results #Discussion #Limitations #Scalability
━━━━━━━━━━━━━━━
By: @CodeProgrammer ✨