Python | Machine Learning | Coding


from sklearn.feature_extraction.text import TfidfVectorizer

# Initialize the TF-IDF Vectorizer
vectorizer = TfidfVectorizer()

# Fit the vectorizer on the training data and transform it
X_train_tfidf = vectorizer.fit_transform(X_train)

# Only transform the test data using the already-fitted vectorizer
X_test_tfidf = vectorizer.transform(X_test)

print("Shape of training data vectors:", X_train_tfidf.shape)
print("Shape of testing data vectors:", X_test_tfidf.shape)
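
To make the fit/transform split concrete, here is a minimal pure-Python sketch of the idea: the vocabulary and IDF weights are learned from the training texts only, and any word that never appeared in training is simply ignored at transform time. It uses the smoothed IDF formula and L2 normalization that scikit-learn documents as its defaults, but it is a simplification (plain whitespace tokenization), not the library's implementation. The sample texts and the helper names `fit_idf` and `transform` are hypothetical.

```python
import math
from collections import Counter

def fit_idf(train_texts):
    """Learn the vocabulary and smoothed IDF weights from training texts only."""
    n_docs = len(train_texts)
    doc_freq = Counter()
    for text in train_texts:
        # Count each term once per document
        doc_freq.update(set(text.lower().split()))
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1
    return {term: math.log((1 + n_docs) / (1 + df)) + 1
            for term, df in doc_freq.items()}

def transform(text, idf):
    """Map one text to L2-normalized TF-IDF weights over the learned vocabulary."""
    # Tokens that were never seen during fitting are dropped here
    counts = Counter(tok for tok in text.lower().split() if tok in idf)
    weights = {term: tf * idf[term] for term, tf in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {term: w / norm for term, w in weights.items()} if norm else {}

# Hypothetical toy data, not the tutorial's dataset
train_texts = ["good movie", "bad movie"]
idf = fit_idf(train_texts)

# "film" was never seen in training, so it gets no weight
test_vector = transform("good film", idf)
print(idf)
print(test_vector)
```

This is why the test set is only transformed: fitting on it would let test-only words and document frequencies leak into the features.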

---

Step 5: Training the NLP Model

Now we can train a machine learning model. Multinomial Naive Bayes is a simple, fast algorithm that learns how strongly each word is associated with each class, and it is a common, strong baseline for text classification tasks.

#ModelTraining #NaiveBayes

from sklearn.naive_bayes import MultinomialNB

# Initialize and train the Naive Bayes classifier
model = MultinomialNB()
model.fit(X_train_tfidf, y_train)

print("Model training complete.")
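
For intuition about what the classifier learns, here is a rough pure-Python sketch of Multinomial Naive Bayes on raw word counts with Laplace smoothing. It is a simplified illustration, not scikit-learn's implementation (which also accepts TF-IDF weights). The toy texts and the function names `train_multinomial_nb` and `predict_nb` are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_multinomial_nb(docs, labels, alpha=1.0):
    """Estimate log priors and Laplace-smoothed log likelihoods from token counts."""
    vocab = {tok for doc in docs for tok in doc.split()}
    class_docs = Counter(labels)
    token_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        token_counts[label].update(doc.split())

    # P(class): share of training documents in each class
    log_prior = {c: math.log(n / len(docs)) for c, n in class_docs.items()}

    # P(token | class) with add-alpha smoothing so unseen pairs never get zero
    log_likelihood = {}
    for c in class_docs:
        total = sum(token_counts[c].values()) + alpha * len(vocab)
        log_likelihood[c] = {
            tok: math.log((token_counts[c][tok] + alpha) / total) for tok in vocab
        }
    return log_prior, log_likelihood

def predict_nb(doc, log_prior, log_likelihood):
    """Return the class with the highest log posterior; unknown tokens are skipped."""
    scores = {
        c: log_prior[c]
        + sum(log_likelihood[c][tok] for tok in doc.split() if tok in log_likelihood[c])
        for c in log_prior
    }
    return max(scores, key=scores.get)

# Hypothetical toy data (1 = positive, 0 = negative)
docs = ["great fun film", "great acting", "dull boring film", "boring plot"]
labels = [1, 1, 0, 0]

log_prior, log_likelihood = train_multinomial_nb(docs, labels)
prediction = predict_nb("great film", log_prior, log_likelihood)
print(prediction)
```

Working in log space avoids multiplying many tiny probabilities together, which would underflow on longer texts.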

---

Step 6: Making Predictions and Evaluating the Model

With our model trained, let's use it to make predictions on our unseen test data and see how well it performs.

#Evaluation #ModelPerformance #Prediction

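from sklearn.metrics import accuracy_score, classification_report

# Make predictions on the test set
y_pred = model.predict(X_test_tfidf)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy * 100:.2f}%\n")

# Display a detailed classification report
# labels=[0, 1] keeps the report aligned with target_names even if the tiny
# test set happens to contain only one class (assumes 0 = Negative, 1 = Positive)
print("Classification Report:")
print(classification_report(y_test, y_pred, labels=[0, 1], target_names=['Negative', 'Positive']))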

---

Step 7: Discussion of Results

#Results #Discussion

Our model achieved 100% accuracy on this very small test set.

Accuracy: The percentage of predictions that are correct. 100% is a perfect score, but it is expected on such a tiny, clean dataset and says little about real-world performance. On real data, what counts as good depends on the task and the class balance; 85-95% is often considered very good.
Precision: Of all the times the model predicted "Positive", what percentage were actually positive?
Recall: Of all the actual "Positive" texts, what percentage did the model correctly identify?
F1-Score: The harmonic mean of Precision and Recall. It is high only when both are high.
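
The definitions above can be worked through by hand. This is a small sketch with made-up labels (not the tutorial's results) that counts true positives, false positives, and false negatives, then derives the three metrics. The function name `precision_recall_f1` is hypothetical.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the positive class from raw counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical labels: 4 actual positives, 2 actual negatives
y_true_demo = [1, 1, 1, 1, 0, 0]
y_pred_demo = [1, 1, 0, 0, 1, 0]

precision, recall, f1 = precision_recall_f1(y_true_demo, y_pred_demo)
print(f"Precision: {precision:.3f}, Recall: {recall:.3f}, F1: {f1:.3f}")
```

Here the model makes 3 positive predictions and 2 are right (precision 2/3), and it finds 2 of the 4 actual positives (recall 1/2). The F1-score is 4/7, slightly below the simple average of the two, because the harmonic mean is pulled toward the lower value.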

Limitations: Our dataset is extremely small. A real model would need thousands of examples to be reliable and generalize well to new, unseen text.

---

Step 8: Testing the Model on New Sentences

Let's see how our complete pipeline works on brand new text.

#RealWorldNLP #Inference

# Function to predict sentiment of a new sentence
def predict_sentiment(sentence):
    # 1. Preprocess the text
    processed_sentence = preprocess_text(sentence)
    
    # 2. Vectorize the text using the SAME vectorizer
    vectorized_sentence = vectorizer.transform([processed_sentence])
    
    # 3. Make a prediction
    prediction = model.predict(vectorized_sentence)
    
    # 4. Return the result
    return "Positive" if prediction[0] == 1 else "Negative"

# Test with new sentences
new_sentence_1 = "The movie was absolutely amazing!"
new_sentence_2 = "I was very bored and did not like it."

print(f"'{new_sentence_1}' -> Sentiment: {predict_sentiment(new_sentence_1)}")
print(f"'{new_sentence_2}' -> Sentiment: {predict_sentiment(new_sentence_2)}")
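
One way to guarantee that the same fitted vectorizer is always reused at prediction time is to bundle it with the classifier. Here is a minimal sketch, assuming scikit-learn's `make_pipeline`. The toy texts and the `sentiment_pipeline` name are made up for illustration, and the preprocessing step is left out to keep it short.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy data, not the tutorial's dataset (1 = positive, 0 = negative)
texts = [
    "great amazing movie",
    "loved it wonderful",
    "boring terrible movie",
    "hated it awful",
]
labels = [1, 1, 0, 0]

# The pipeline fits the vectorizer and the classifier together,
# and applies the fitted vectorizer automatically inside predict()
sentiment_pipeline = make_pipeline(TfidfVectorizer(), MultinomialNB())
sentiment_pipeline.fit(texts, labels)

# Raw strings go straight in; no separate transform call is needed
predictions = sentiment_pipeline.predict(["amazing wonderful", "awful boring"])
print(predictions)
```

With this design there is a single object to save and load, so a mismatched vectorizer can't be paired with the model by accident.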

━━━━━━━━━━━━━━━
By: @CodeProgrammer ✨
