Out of the Box Sentiment Analysis options with Python using VADER Sentiment and TextBlob




What's going on everyone and welcome to a quick tutorial on doing sentiment analysis with Python. Today, I am going to be looking into two of the more popular "out of the box" sentiment analysis solutions for Python. The first is TextBlob, and the second is going to be VADER Sentiment. This tutorial will focus on checking out these two libraries and using them, and the subsequent tutorials in this series are going to be about making a sentiment analysis application with Twitter.

TextBlob is more of a general natural language processing library, but it ships with a rule-based sentiment analysis tool that we can use. VADER Sentiment, on the other hand, is purpose-built for this task. Taken from its readme: "VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media."

I am sure there are others, but I would like to compare these two for now. To do this, I am going to use a "short movie reviews" dataset. I've had this dataset for years, but have no idea of its original source. If anyone knows, I would be happy to cite it. Anyway, here they are: positive.txt and negative.txt. In total, that's a bit over 10,000 examples (5,332 of each) for us to test against.

I've always liked using this reviews dataset because many of the reviews are hard for even me to peg as positive or negative. For this reason, we can also utilize any sort of "confidence" metric a classifier might have, to see if we can tweak things to get better accuracy, even if it means throwing some samples out. I am planning to use this sentiment analysis algorithm on streaming Twitter data, on high-volume subjects, so I am evaluating these libraries on both accuracy and speed.

To begin our journey, let's check out TextBlob's offering. With TextBlob, we get two metrics: polarity and subjectivity. The polarity is the sentiment itself, ranging from -1 (negative) to +1 (positive). The subjectivity is a measure of how objective or subjective the text is, ranging from 0 (fully objective) to 1 (fully subjective). An objective reading should be more trustworthy than a subjective one, so a lower subjectivity score should likely denote a more likely-to-be-accurate polarity. We'll see.

To use it, you will need to install it, so do a pip install textblob. Some of the extras, like noun phrase extraction and part-of-speech tagging, also need the NLTK corpora, which you can grab with python -m textblob.download_corpora. Now, let's see a quick example:

from textblob import TextBlob

analysis = TextBlob("TextBlob sure looks like it has some interesting features!")

To use something from TextBlob, we first want to convert our text to a TextBlob object, which is what we do with our analysis variable. From here, we can do quite a bit. You can read the docs, or just do:

print(dir(analysis))

You should see quite a bit:

['__add__', '__class__', '__contains__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_cmpkey', '_compare', '_create_sentence_objects', '_strkey', 'analyzer', 'classifier', 'classify', 'correct', 'detect_language', 'ends_with', 'endswith', 'find', 'format', 'index', 'join', 'json', 'lower', 'ngrams', 'noun_phrases', 'np_counts', 'np_extractor', 'parse', 'parser', 'polarity', 'pos_tagger', 'pos_tags', 'raw', 'raw_sentences', 'replace', 'rfind', 'rindex', 'sentences', 'sentiment', 'sentiment_assessments', 'serialized', 'split', 'starts_with', 'startswith', 'string', 'strip', 'stripped', 'subjectivity', 'tags', 'title', 'to_json', 'tokenize', 'tokenizer', 'tokens', 'translate', 'translator', 'upper', 'word_counts', 'words']

We can immediately see quite a few useful ones. We can do things like detect_language, capture noun_phrases, label parts of speech with tags, we can even translate to other languages, tokenize, and more. Very interesting! (Heads up: translate and detect_language lean on the Google Translate API and have been deprecated in newer TextBlob releases, so they may not work for you.) I am mainly here for the sentiment, but these things are nifty. Let's check a few:

print(analysis.translate(to='es'))
¡TextBlob seguramente parece tener algunas características interesantes!
print(analysis.tags)
[('TextBlob', 'NNP'), ('sure', 'JJ'), ('looks', 'VBZ'), ('like', 'IN'), ('it', 'PRP'), ('has', 'VBZ'), ('some', 'DT'), ('interesting', 'JJ'), ('features', 'NNS')]
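
A couple of the other attributes are worth a quick poke too. Here is a small sketch, continuing with the same analysis object from above (noun_phrases needs the NLTK corpora mentioned earlier):

print(analysis.noun_phrases)  # WordList of detected noun phrases
print(analysis.words)  # tokenized words, punctuation stripped
print(analysis.ngrams(n=2))  # overlapping two-word sequences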

Back to analysis.tags: these are part-of-speech tags. Since TextBlob is built on top of NLTK, the tags are the same Penn Treebank set that NLTK uses. Here are the definitions:

POS tag list:
CC    coordinating conjunction
CD    cardinal digit
DT    determiner
EX    existential there ("there is" ... think of it like "there exists")
FW    foreign word
IN    preposition/subordinating conjunction
JJ    adjective                        'big'
JJR   adjective, comparative           'bigger'
JJS   adjective, superlative           'biggest'
LS    list marker                      1)
MD    modal                            could, will
NN    noun, singular                   'desk'
NNS   noun, plural                     'desks'
NNP   proper noun, singular            'Harrison'
NNPS  proper noun, plural              'Americans'
PDT   predeterminer                    'all the kids'
POS   possessive ending                parent's
PRP   personal pronoun                 I, he, she
PRP$  possessive pronoun               my, his, hers
RB    adverb                           very, silently
RBR   adverb, comparative              better
RBS   adverb, superlative              best
RP    particle                         give up
TO    to                               go 'to' the store
UH    interjection                     errrrrrrrm
VB    verb, base form                  take
VBD   verb, past tense                 took
VBG   verb, gerund/present participle  taking
VBN   verb, past participle            taken
VBP   verb, sing. present, non-3rd person  take
VBZ   verb, 3rd person sing. present   takes
WDT   wh-determiner                    which
WP    wh-pronoun                       who, what
WP$   possessive wh-pronoun            whose
WRB   wh-adverb                        where, when
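
By the way, if you have NLTK installed, you do not need to memorize this table; you can query the Penn Treebank tagset directly. A quick sketch (you may need a one-time nltk.download('tagsets') first):

import nltk
# nltk.download('tagsets')  # one-time download of the tagset documentation
nltk.help.upenn_tagset('JJ')  # describe a single tag
nltk.help.upenn_tagset('NN.*')  # regular expression: all of the noun tags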

Now let's check out the sentiment:

print(analysis.sentiment)
Sentiment(polarity=0.5625, subjectivity=0.6944444444444444)

So this is fairly positive, but also highly subjective. Now, let's test this on a bit more data using the positive.txt and negative.txt datasets.

from textblob import TextBlob

pos_count = 0
pos_correct = 0

with open("positive.txt","r") as f:
    for line in f.read().split('\n'):
        analysis = TextBlob(line)
        # a positive review is "correct" if polarity comes back above 0
        if analysis.sentiment.polarity > 0:
            pos_correct += 1
        pos_count += 1


neg_count = 0
neg_correct = 0

with open("negative.txt","r") as f:
    for line in f.read().split('\n'):
        analysis = TextBlob(line)
        # a negative review is "correct" if polarity is 0 or below
        if analysis.sentiment.polarity <= 0:
            neg_correct += 1
        neg_count += 1

print("Positive accuracy = {}% via {} samples".format(pos_correct/pos_count*100.0, pos_count))
print("Negative accuracy = {}% via {} samples".format(neg_correct/neg_count*100.0, neg_count))
Positive accuracy = 71.11777944486121% via 5332 samples
Negative accuracy = 55.8702175543886% via 5332 samples
[Finished in 6.5s]

It looks like our positive accuracy is decent, but the negative sentiment accuracy isn't all that good. It could be that this classifier is biased positive across the board, meaning our "zero" point should really be shifted a bit, say to 0.2, so we change:

        if analysis.sentiment.polarity > 0.2:
            pos_correct += 1

and

        if analysis.sentiment.polarity <= 0.2:
            neg_correct += 1
Positive accuracy = 46.19279819954989% via 5332 samples
Negative accuracy = 80.1012753188297% via 5332 samples
[Finished in 6.5s]

Nope, that's not it!

How about 0.1?

Positive accuracy = 60.5026256564141% via 5332 samples
Negative accuracy = 68.37959489872468% via 5332 samples
[Finished in 6.5s]

Hmm, well, that's better than random, I guess, but not something we want to see. What if we play with the subjectivity, though? Maybe we can look only at reviews that we feel are more objective?

from textblob import TextBlob

pos_count = 0
pos_correct = 0

with open("positive.txt","r") as f:
    for line in f.read().split('\n'):
        analysis = TextBlob(line)
        # only score reviews that read as fairly objective
        if analysis.sentiment.subjectivity < 0.3:
            if analysis.sentiment.polarity > 0:
                pos_correct += 1
            pos_count += 1


neg_count = 0
neg_correct = 0

with open("negative.txt","r") as f:
    for line in f.read().split('\n'):
        analysis = TextBlob(line)
        if analysis.sentiment.subjectivity < 0.3:
            if analysis.sentiment.polarity <= 0:
                neg_correct += 1
            neg_count += 1

print("Positive accuracy = {}% via {} samples".format(pos_correct/pos_count*100.0, pos_count))
print("Negative accuracy = {}% via {} samples".format(neg_correct/neg_count*100.0, neg_count))
Positive accuracy = 29.116465863453815% via 996 samples
Negative accuracy = 76.03369065849922% via 1306 samples
[Finished in 6.5s]

Wow, we're throwing away a lot of samples, and not getting much in return for it! What if we flip things around and require a high degree of subjectivity?

from textblob import TextBlob

pos_count = 0
pos_correct = 0

with open("positive.txt","r") as f:
    for line in f.read().split('\n'):
        analysis = TextBlob(line)
        # this time, require highly subjective reviews
        if analysis.sentiment.subjectivity > 0.9:
            if analysis.sentiment.polarity > 0:
                pos_correct += 1
            pos_count += 1


neg_count = 0
neg_correct = 0

with open("negative.txt","r") as f:
    for line in f.read().split('\n'):
        analysis = TextBlob(line)
        if analysis.sentiment.subjectivity > 0.9:
            if analysis.sentiment.polarity <= 0:
                neg_correct += 1
            neg_count += 1

print("Positive accuracy = {}% via {} samples".format(pos_correct/pos_count*100.0, pos_count))
print("Negative accuracy = {}% via {} samples".format(neg_correct/neg_count*100.0, neg_count))
Positive accuracy = 76.08695652173914% via 414 samples
Negative accuracy = 68.6046511627907% via 344 samples
[Finished in 6.5s]

Interesting; I must not understand subjectivity as well as I thought. If we instead require subjectivity to be BELOW 0.1, I get:

Positive accuracy = 2.914389799635701% via 549 samples
Negative accuracy = 98.1159420289855% via 690 samples
[Finished in 6.5s]

Okay, so I am not too impressed here. Can VADER Sentiment save us? Let's find out!

Do a pip install vaderSentiment, and then let's check it out:

from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
vs = analyzer.polarity_scores("VADER Sentiment looks interesting, I have high hopes!")
print(vs)

Giving us:

{'neg': 0.0, 'neu': 0.463, 'pos': 0.537, 'compound': 0.6996}
[Finished in 0.3s]

The neg, neu, and pos values are the proportions of the text that register as negative, neutral, and positive, and the compound is "computed by summing the valence scores of each word in the lexicon, adjusted according to the rules, then normalized... [it] is the most useful metric if you want a single unidimensional measure of sentiment" (from the docs). The docs also suggest:

positive sentiment: compound score >= 0.05
neutral sentiment: (compound score > -0.05) and (compound score < 0.05)
negative sentiment: compound score <= -0.05
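
If you wanted to wrap that suggestion up into a reusable helper, it might look something like the following sketch. The label_sentiment function and its default are my own, just codifying the cutoffs above:

from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

def label_sentiment(text, threshold=0.05):
    # map VADER's compound score onto the docs' suggested labels
    compound = analyzer.polarity_scores(text)['compound']
    if compound >= threshold:
        return 'positive'
    elif compound <= -threshold:
        return 'negative'
    return 'neutral'

print(label_sentiment("VADER Sentiment looks interesting, I have high hopes!"))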

Alright, let's see what we find. First though, to properly compare, we should just start with 0.

from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

pos_count = 0
pos_correct = 0

with open("positive.txt","r") as f:
    for line in f.read().split('\n'):
        vs = analyzer.polarity_scores(line)
        # a positive review is "correct" if the compound score is above 0
        if vs['compound'] > 0:
            pos_correct += 1
        pos_count += 1


neg_count = 0
neg_correct = 0

with open("negative.txt","r") as f:
    for line in f.read().split('\n'):
        vs = analyzer.polarity_scores(line)
        # a negative review is "correct" if the compound score is 0 or below
        if vs['compound'] <= 0:
            neg_correct += 1
        neg_count += 1

print("Positive accuracy = {}% via {} samples".format(pos_correct/pos_count*100.0, pos_count))
print("Negative accuracy = {}% via {} samples".format(neg_correct/neg_count*100.0, neg_count))
Positive accuracy = 69.4298574643661% via 5332 samples
Negative accuracy = 57.764441110277566% via 5332 samples
[Finished in 3.1s]

Hmm, not much better. The documentation's neutral band is pretty narrow, though; let's try throwing out a much wider "neutral" zone, anything between -0.5 and 0.5:

from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

pos_count = 0
pos_correct = 0

threshold = 0.5

with open("positive.txt","r") as f:
    for line in f.read().split('\n'):
        vs = analyzer.polarity_scores(line)
        # only score samples outside the "neutral" band
        if vs['compound'] >= threshold or vs['compound'] <= -threshold:
            if vs['compound'] > 0:
                pos_correct += 1
            pos_count += 1


neg_count = 0
neg_correct = 0

with open("negative.txt","r") as f:
    for line in f.read().split('\n'):
        vs = analyzer.polarity_scores(line)
        if vs['compound'] >= threshold or vs['compound'] <= -threshold:
            if vs['compound'] <= 0:
                neg_correct += 1
            neg_count += 1

print("Positive accuracy = {}% via {} samples".format(pos_correct/pos_count*100.0, pos_count))
print("Negative accuracy = {}% via {} samples".format(neg_correct/neg_count*100.0, neg_count))
Positive accuracy = 87.22179585571757% via 2606 samples
Negative accuracy = 50.0% via 1818 samples
[Finished in 3.1s]

We threw out a lot of samples here, and we aren't doing much better than TextBlob was. Should we give up? Maybe, but what if we instead look for readings with no conflicting signal, where the opposite score is low or non-existent? For example, to classify something as positive here, why not require the neg score to be no more than 0.1, something like:

        if not vs['neg'] > 0.1:
            if vs['pos']-vs['neg'] > 0:
                pos_correct += 1
            pos_count += 1

Full script:

from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

pos_count = 0
pos_correct = 0

with open("positive.txt","r") as f:
    for line in f.read().split('\n'):
        vs = analyzer.polarity_scores(line)
        # skip anything with a meaningful negative signal
        if not vs['neg'] > 0.1:
            if vs['pos']-vs['neg'] > 0:
                pos_correct += 1
            pos_count += 1


neg_count = 0
neg_correct = 0

with open("negative.txt","r") as f:
    for line in f.read().split('\n'):
        vs = analyzer.polarity_scores(line)
        # skip anything with a meaningful positive signal
        if not vs['pos'] > 0.1:
            if vs['pos']-vs['neg'] <= 0:
                neg_correct += 1
            neg_count += 1

print("Positive accuracy = {}% via {} samples".format(pos_correct/pos_count*100.0, pos_count))
print("Negative accuracy = {}% via {} samples".format(neg_correct/neg_count*100.0, neg_count))
Positive accuracy = 80.69370058658507% via 3921 samples
Negative accuracy = 91.73643975245722% via 2747 samples
[Finished in 3.1s]

Alright! I can work with that. Recall how we treated everything between -0.5 and 0.5 as "neutral" with VADER? What if we tried that with TextBlob?

from textblob import TextBlob

pos_count = 0
pos_correct = 0

with open("positive.txt","r") as f:
    for line in f.read().split('\n'):
        analysis = TextBlob(line)
        # only count samples at or beyond the threshold at all
        if analysis.sentiment.polarity >= 0.5:
            if analysis.sentiment.polarity > 0:
                pos_correct += 1
            pos_count += 1


neg_count = 0
neg_correct = 0

with open("negative.txt","r") as f:
    for line in f.read().split('\n'):
        analysis = TextBlob(line)
        if analysis.sentiment.polarity <= -0.5:
            if analysis.sentiment.polarity <= 0:
                neg_correct += 1
            neg_count += 1

print("Positive accuracy = {}% via {} samples".format(pos_correct/pos_count*100.0, pos_count))
print("Negative accuracy = {}% via {} samples".format(neg_correct/neg_count*100.0, neg_count))
Positive accuracy = 100.0% via 766 samples
Negative accuracy = 100.0% via 282 samples
[Finished in 6.4s]

Oh.

Hmm, okay, so we lost a lot of samples but got perfect accuracy. To be fair, the perfect accuracy here is guaranteed by construction: once we require polarity >= 0.5 before counting a positive sample at all, the inner polarity > 0 check can never fail (and likewise on the negative side), so the number to watch is really how many samples survive the filter. What if we change this threshold just a bit? Let's go with 0.1 and -0.1 instead.

Positive accuracy = 100.0% via 3310 samples
Negative accuracy = 100.0% via 1499 samples
[Finished in 6.5s]

...interesting. 0.001?

Positive accuracy = 100.0% via 3790 samples
Negative accuracy = 100.0% via 2072 samples
[Finished in 6.5s]

0.00000000001?!

Same result. Okay, so this tells me that we have a lot of "zeros," which was hurting our accuracy.
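
We can sanity-check that by counting how many lines TextBlob scores at exactly zero polarity. A quick sketch (note that splitting the file on '\n' also picks up any blank lines, which score 0.0 too):

from textblob import TextBlob

for fname in ("positive.txt", "negative.txt"):
    with open(fname, "r") as f:
        lines = f.read().split('\n')
    # count reviews that TextBlob scores at exactly 0.0 polarity
    zeros = sum(1 for line in lines if TextBlob(line).sentiment.polarity == 0)
    print("{}: {} of {} lines scored 0.0".format(fname, zeros, len(lines)))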

Speed-wise, the VADER analysis takes ~3.1-3.3 seconds to run, while TextBlob takes ~6.4-6.5 seconds, about twice as long.
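
Those timings are just the [Finished in ...] readouts from my editor; if you want to measure on your own machine, here is a minimal sketch with the time module:

import time

from textblob import TextBlob
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

with open("positive.txt", "r") as f:
    lines = f.read().split('\n')

# time TextBlob's polarity over every line
start = time.time()
for line in lines:
    TextBlob(line).sentiment.polarity
print("TextBlob: {:.2f}s".format(time.time() - start))

# time VADER's polarity_scores over the same lines
analyzer = SentimentIntensityAnalyzer()
start = time.time()
for line in lines:
    analyzer.polarity_scores(line)
print("VADER: {:.2f}s".format(time.time() - start))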

Now, if sentiment is absolutely the *only* thing you plan to do with this text, and you need it processed as fast as possible, then VADER Sentiment is likely the better choice, going with that 0.05 threshold, which gave:

Positive accuracy = 99.85114617445669% via 3359 samples
Negative accuracy = 99.44954128440368% via 2180 samples
[Finished in 3.1s]

For me, though, I am fairly interested in TextBlob for its part-of-speech tagging. I am guessing a lot of the speed difference comes from TextBlob converting each sentence into a full TextBlob object, which can do many things, whereas VADER Sentiment converts your strings to objects that really just do sentiment, nothing more.

Which one you choose to go with is really up to you, and will depend more on your specific needs.

For me, I am going to put them to use on some Twitter data.

The next tutorial: Streaming Tweets and Sentiment from Twitter in Python - Sentiment Analysis GUI with Dash and Python p.2