Python | Machine Learning | Coding | R
67.3K subscribers
1.25K photos
89 videos
153 files
906 links
Help and ads: @hussein_sheikho

Discover powerful insights with Python, Machine Learning, Coding, and R—your essential toolkit for data-driven solutions, smart alg

List of our channels:
https://t.me/addlist/8_rRW2scgfRhOTc0

https://telega.io/?r=nikapsOH
Download Telegram
# Initialize the TF-IDF Vectorizer
vectorizer = TfidfVectorizer()

# Fit the vectorizer on the training data and transform it
X_train_tfidf = vectorizer.fit_transform(X_train)

# Only transform the test data using the already-fitted vectorizer
X_test_tfidf = vectorizer.transform(X_test)

print("Shape of training data vectors:", X_train_tfidf.shape)
print("Shape of testing data vectors:", X_test_tfidf.shape)


---

Step 5: Training the NLP Model

Now we can train a machine learning model. Multinomial Naive Bayes is a simple yet powerful algorithm that works very well for text classification tasks.

#ModelTraining #NaiveBayes

# Initialize and train the Naive Bayes classifier
model = MultinomialNB()
model.fit(X_train_tfidf, y_train)

print("Model training complete.")


---

Step 6: Making Predictions and Evaluating the Model

With our model trained, let's use it to make predictions on our unseen test data and see how well it performs.

#Evaluation #ModelPerformance #Prediction

# Make predictions on the test set
y_pred = model.predict(X_test_tfidf)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy * 100:.2f}%\n")

# Display a detailed classification report
print("Classification Report:")
print(classification_report(y_test, y_pred, target_names=['Negative', 'Positive']))


---

Step 7: Discussion of Results

#Results #Discussion

Our model achieved 100% accuracy on this very small test set.

Accuracy: This is the percentage of correct predictions. 100% is perfect, but this is expected on such a tiny, clean dataset. In the real world, an accuracy of 85-95% is often considered very good.
Precision: Of all the times the model predicted "Positive", what percentage were actually positive?
Recall: Of all the actual "Positive" texts, what percentage did the model correctly identify?
F1-Score: A weighted average of Precision and Recall.

Limitations: Our dataset is extremely small. A real model would need thousands of examples to be reliable and generalize well to new, unseen text.

---

Step 8: Testing the Model on New Sentences

Let's see how our complete pipeline works on brand new text.

#RealWorldNLP #Inference

# Function to predict sentiment of a new sentence
def predict_sentiment(sentence):
# 1. Preprocess the text
processed_sentence = preprocess_text(sentence)

# 2. Vectorize the text using the SAME vectorizer
vectorized_sentence = vectorizer.transform([processed_sentence])

# 3. Make a prediction
prediction = model.predict(vectorized_sentence)

# 4. Return the result
return "Positive" if prediction[0] == 1 else "Negative"

# Test with new sentences
new_sentence_1 = "The movie was absolutely amazing!"
new_sentence_2 = "I was very bored and did not like it."

print(f"'{new_sentence_1}' -> Sentiment: {predict_sentiment(new_sentence_1)}")
print(f"'{new_sentence_2}' -> Sentiment: {predict_sentiment(new_sentence_2)}")


━━━━━━━━━━━━━━━
By: @CodeProgrammer
5👍3