# Initialize the TF-IDF Vectorizer
vectorizer = TfidfVectorizer()
# Fit the vectorizer on the training data and transform it
X_train_tfidf = vectorizer.fit_transform(X_train)
# Only transform the test data using the already-fitted vectorizer
X_test_tfidf = vectorizer.transform(X_test)
print("Shape of training data vectors:", X_train_tfidf.shape)
print("Shape of testing data vectors:", X_test_tfidf.shape)
---
Step 5: Training the NLP Model
Now we can train a machine learning model. Multinomial Naive Bayes is a simple yet powerful algorithm that works very well for text classification tasks.
#ModelTraining #NaiveBayes
# Initialize and train the Naive Bayes classifier
model = MultinomialNB()
model.fit(X_train_tfidf, y_train)
print("Model training complete.")
---
Step 6: Making Predictions and Evaluating the Model
With our model trained, let's use it to make predictions on our unseen test data and see how well it performs.
#Evaluation #ModelPerformance #Prediction
# Make predictions on the test set
y_pred = model.predict(X_test_tfidf)
# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy * 100:.2f}%\n")
# Display a detailed classification report
print("Classification Report:")
print(classification_report(y_test, y_pred, target_names=['Negative', 'Positive']))
---
Step 7: Discussion of Results
#Results #Discussion
Our model achieved 100% accuracy on this very small test set.
Accuracy: This is the percentage of correct predictions. 100% is perfect, but this is expected on such a tiny, clean dataset. In the real world, an accuracy of 85-95% is often considered very good.
Precision: Of all the times the model predicted "Positive", what percentage were actually positive?
Recall: Of all the actual "Positive" texts, what percentage did the model correctly identify?
F1-Score: A weighted average of Precision and Recall.
Limitations: Our dataset is extremely small. A real model would need thousands of examples to be reliable and generalize well to new, unseen text.
---
Step 8: Testing the Model on New Sentences
Let's see how our complete pipeline works on brand new text.
#RealWorldNLP #Inference
# Function to predict sentiment of a new sentence
def predict_sentiment(sentence):
# 1. Preprocess the text
processed_sentence = preprocess_text(sentence)
# 2. Vectorize the text using the SAME vectorizer
vectorized_sentence = vectorizer.transform([processed_sentence])
# 3. Make a prediction
prediction = model.predict(vectorized_sentence)
# 4. Return the result
return "Positive" if prediction[0] == 1 else "Negative"
# Test with new sentences
new_sentence_1 = "The movie was absolutely amazing!"
new_sentence_2 = "I was very bored and did not like it."
print(f"'{new_sentence_1}' -> Sentiment: {predict_sentiment(new_sentence_1)}")
print(f"'{new_sentence_2}' -> Sentiment: {predict_sentiment(new_sentence_2)}")
━━━━━━━━━━━━━━━
By: @CodeProgrammer ✨
❤5👍3