Learning Goals
3 min
- Use VADER for instant rule-based sentiment scores.
- Train a TF-IDF + classifier sentiment model on labelled reviews.
- Compare lexicon vs trained approaches.
- Spot where sentiment models fail (sarcasm, negation, context).
Warm-Up · Two Routes
5 min
LEXICON (VADER): a dictionary of word→sentiment scores. No training. Great for social media / short text.
TRAINED MODEL: TF-IDF + classifier on YOUR labelled data. Learns your domain's language. Needs labels.

If you have no labels and need something now, use a lexicon tool. If you have labelled data in your domain (movie reviews, product feedback), a trained model usually wins. Both are sentiment analysis.
New Concept · VADER & a Trained Model
14 min
Route 1: VADER (no training)
    pip install nltk
    python -c "import nltk; nltk.download('vader_lexicon')"

    from nltk.sentiment import SentimentIntensityAnalyzer

    sia = SentimentIntensityAnalyzer()
    for text in ["I love this!", "This is terrible.", "It's okay I guess"]:
        s = sia.polarity_scores(text)
        print(f"{s['compound']:+.2f} {text}")

    +0.69 I love this!
    -0.48 This is terrible.
    +0.20 It's okay I guess
compound ranges -1 (very negative) to +1 (very positive). VADER even understands "!", ALL-CAPS, and emoji intensity. Rule of thumb: ≥ 0.05 positive, ≤ -0.05 negative, else neutral.
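The rule of thumb above can be written as a small helper. This is a minimal sketch: the name `label_from_compound` and its `threshold` parameter are illustrative choices, not part of NLTK, and the scores in the loop are typed in by hand rather than computed by VADER.

```python
def label_from_compound(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER compound score to "pos", "neg", or "neu"."""
    if compound >= threshold:
        return "pos"
    if compound <= -threshold:
        return "neg"
    return "neu"


# Hand-typed scores for illustration only.
for score in (0.69, -0.48, 0.02):
    print(f"{score:+.2f} -> {label_from_compound(score)}")
```

Keeping the threshold as a parameter makes it easy to widen the neutral band if your texts turn out to be noisier than expected.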
Route 2: train your own
    from sklearn.pipeline import make_pipeline
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    clf = make_pipeline(
        # stop_words is left unset on purpose: scikit-learn's built-in "english"
        # list contains "not" and "too", which would be dropped before bigrams form
        TfidfVectorizer(ngram_range=(1, 2)),
        LogisticRegression(max_iter=1000),
    )
    # train_reviews / train_labels: your own lists of texts and "pos"/"neg" labels
    clf.fit(train_reviews, train_labels)
    print(clf.predict(["the plot was boring but the acting saved it"]))
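The ngram_range=(1, 2) setting is doing real work here: it adds two-word phrases such as "highly recommend" as features alongside single words. One caveat: scikit-learn's built-in "english" stop list includes "not" and "too", and stop words are removed before n-grams are built, so phrases like "not good" and "too slow" only survive if you leave stop_words unset or pass a custom list that keeps them.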
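To make the effect of ngram_range concrete, here is a pure-Python sketch of word n-gram extraction. It is a simplification, not scikit-learn's actual tokenizer: the function `word_ngrams` and the tiny stop-word set passed to it are made up for illustration.

```python
def word_ngrams(text, ngram_range=(1, 2), stop_words=frozenset()):
    """Return word n-grams, dropping stop words before the n-grams are built."""
    tokens = [t for t in text.lower().split() if t not in stop_words]
    lo, hi = ngram_range
    grams = []
    for n in range(lo, hi + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


# Without stop words the bigram "not good" is available as a feature.
print(word_ngrams("not good at all"))
# With "not" treated as a stop word, only "good" is left.
print(word_ngrams("not good at all", stop_words={"not", "at", "all"}))
```

The second call shows how a review that says "not good" can end up looking like one that says "good" once negators are filtered out.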
Where both fail
- Sarcasm: "Oh GREAT, another delay." (positive words, negative meaning)
- Negation: "not bad at all" (a lexicon may miss the flip)
- Domain: "this phone is sick" (slang positive, lexicon says negative)
- Context: "small" is good for a phone, bad for a hotel room
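A toy lexicon scorer makes these failure modes easy to reproduce. Everything here is hypothetical: the word scores in `TOY_LEXICON` are invented for illustration and are not taken from VADER, and the negation rule (flip the word right after a negator) is far cruder than what real tools do.

```python
# Hypothetical word scores for illustration only; not the VADER lexicon.
TOY_LEXICON = {"great": 3.0, "love": 3.0, "bad": -2.5, "sick": -2.0, "terrible": -3.0}
NEGATORS = {"not", "never", "no"}


def toy_score(text: str) -> float:
    """Sum word scores, flipping the sign of a word that directly follows a negator."""
    tokens = [t.strip(".,!?\"'").lower() for t in text.split()]
    total = 0.0
    for i, tok in enumerate(tokens):
        score = TOY_LEXICON.get(tok, 0.0)
        if i > 0 and tokens[i - 1] in NEGATORS:
            score = -score
        total += score
    return total


for sentence in ["Oh GREAT, another delay.", "not bad at all", "this phone is sick"]:
    print(f"{toy_score(sentence):+.1f}  {sentence}")
```

The sarcastic sentence scores positive and the slang sentence scores negative, both wrong. The negation case comes out right only because of the explicit flip rule; remove it and "not bad at all" turns negative.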
Worked Example · Lexicon vs Trained
12 min
    # sentiment.py: compare VADER to a trained model
    from nltk.sentiment import SentimentIntensityAnalyzer
    from sklearn.pipeline import make_pipeline
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    # tiny labelled training set (use a real dataset for production)
    train = [
        ("absolutely loved it, best ever", "pos"),
        ("amazing quality, highly recommend", "pos"),
        ("so good, will buy again", "pos"),
        ("terrible, broke immediately", "neg"),
        ("worst purchase, total waste", "neg"),
        ("hated it, do not buy", "neg"),
    ]
    texts, labels = zip(*train)

    clf = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2)),
        LogisticRegression(max_iter=1000),
    ).fit(texts, labels)

    sia = SentimentIntensityAnalyzer()
    tests = ["this is fantastic", "not worth the money", "it does the job"]

    print(f"{'text':<24}{'VADER':>10}{'trained':>10}")
    for t in tests:
        v = sia.polarity_scores(t)["compound"]
        v_label = "pos" if v >= 0.05 else "neg" if v <= -0.05 else "neu"
        print(f"{t:<24}{v_label:>10}{clf.predict([t])[0]:>10}")
Sample output
    text                         VADER   trained
    this is fantastic              pos       pos
    not worth the money            neg       neg
    it does the job                neu       pos
Read the diff
VADER and the trained model agree on clear cases. They diverge on "it does the job" — neutral to VADER, but our tiny training set has no neutral class so the model is forced to pick pos/neg. The lesson: a trained model is only as good as its labels and classes. With real, balanced data the trained model usually wins on your specific domain.
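One possible workaround for the forced pos/neg choice, short of collecting neutral examples, is to report "neu" when the model is not confident. The sketch below is an assumption rather than part of the lesson's sentiment.py: `label_with_neutral` and the 0.65 cut-off are arbitrary illustrative choices, and the probabilities are typed in by hand.

```python
def label_with_neutral(proba_by_class: dict, min_confidence: float = 0.65) -> str:
    """Return the most probable class, or "neu" when the model is not confident enough."""
    best_label = max(proba_by_class, key=proba_by_class.get)
    if proba_by_class[best_label] < min_confidence:
        return "neu"
    return best_label


# Hand-typed probabilities for illustration. With scikit-learn they could come
# from pairing clf.classes_ with one row of clf.predict_proba([...]).
print(label_with_neutral({"neg": 0.08, "pos": 0.92}))
print(label_with_neutral({"neg": 0.46, "pos": 0.54}))
```

This only hides low-confidence guesses. It does not teach the model what neutral text looks like, so real neutral labels remain the better fix.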
Try It Yourself
13 min
Run VADER on 10 of your own sentences. Does it agree with how you'd label them?
Find three sentences where VADER gets it wrong (sarcasm, slang, negation). Explain why.
Use a real review dataset (IMDB, or a CSV you have). Train TF-IDF + LogisticRegression and report CV accuracy. Compare to VADER's accuracy on the same test set.
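For the comparison in the last task, both systems need to be scored the same way on the same test set. A minimal sketch of that scoring step follows; the `accuracy` helper and all labels are hand-written for illustration and are not real results.

```python
def accuracy(gold, predicted):
    """Fraction of predictions that exactly match the gold labels."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("need at least one example")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


# Hand-written labels for illustration; these are not real results.
gold = ["pos", "neg", "neg", "pos"]
vader_labels = ["pos", "neu", "neg", "pos"]    # "neu" counts as a miss here
trained_labels = ["pos", "neg", "pos", "pos"]

print(f"VADER:   {accuracy(gold, vader_labels):.2f}")
print(f"trained: {accuracy(gold, trained_labels):.2f}")
```

Note the design choice: when the gold labels are only pos/neg, a "neu" prediction from VADER is counted as wrong. If that seems unfair for your data, say so in your report and state how you handled it.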
Mini-Challenge · Sentiment Over Time
8 min
Given dated reviews, compute average sentiment per day/week and plot the trend. Did sentiment improve or decline? This is exactly how brands monitor reputation.
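One possible solution:
    import matplotlib.pyplot as plt
    import pandas as pd
    from nltk.sentiment import SentimentIntensityAnalyzer

    sia = SentimentIntensityAnalyzer()
    # reviews.csv is expected to have two columns: date, text
    df = pd.read_csv("reviews.csv", parse_dates=["date"])
    df["sentiment"] = df["text"].apply(lambda t: sia.polarity_scores(t)["compound"])
    # resample("W") groups rows by week; mean() averages the scores in each week
    weekly = df.set_index("date")["sentiment"].resample("W").mean()
    weekly.plot(title="Average sentiment per week")
    plt.show()  # needed when running as a script rather than in a notebook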
Non-negotiables: a sentiment score per row, resampled over time, a trend plot. This combines L4 time-series with L5 NLP.
Recap
3 min
Two routes: VADER (lexicon, zero training, great for short social text) and a trained TF-IDF + classifier (learns your domain, needs labels). Both struggle with sarcasm, slang, negation and context. Use VADER for a quick start; train when you have labelled domain data. Next: a rule-based chatbot, no API.
Vocabulary Card
- sentiment analysis: Classifying the emotional tone of text (positive/negative/neutral).
- lexicon (VADER): A dictionary of word→sentiment scores; no training needed.
- compound score: VADER's overall -1..+1 sentiment for a piece of text.
- domain adaptation: Training on your specific text so the model learns its language.
Homework
4 min
Collect 20 short reviews (real or written). Score them with VADER AND a small trained model. Where do they disagree? Pick the 3 most interesting disagreements and explain which one you trust and why.
Combine sentiment.py with your own reviews. The interesting disagreements usually involve sarcasm, slang, or domain-specific words.