What's going on everyone and welcome to a quick tutorial on doing sentiment analysis with Python. Today, I am going to be looking into two of the more popular "out of the box" sentiment analysis solutions for Python. The first is TextBlob, and the second is going to be VADER Sentiment. This tutorial will focus on checking out these two libraries and using them, and the subsequent tutorials in this series are going to be about making a sentiment analysis application with Twitter.
TextBlob is more of a general natural language processing library, but it comes with a rule-based sentiment analysis component that we can use. VADER, on the other hand, is built specifically for this job. Taken from its readme: "VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media."
I am sure there are others, but I would like to compare these two for now. To do this, I am going to use a "short movie reviews" dataset. I've had this dataset for years, but have no idea its original source. If anyone knows, I would be happy to cite it. Anyway, here they are: positive.txt and negative.txt. In total, a bit over 10,000 examples for us to test against.
I've always liked using this reviews dataset because many of the reviews are hard for even me to peg as positive or negative. For this reason, we can also utilize any sort of "confidence" metric a classifier might expose to see if we can tweak things to get better accuracy, even if it means throwing some samples out. I am planning to use this sentiment analysis algorithm on high-volume Twitter streaming data, so I am evaluating these libraries on both accuracy and speed.
To begin our journey, let's check out TextBlob's offering. With TextBlob, we get a polarity and a subjectivity metric. The polarity is the sentiment itself, ranging from -1 to +1. The subjectivity is a measure of how objective or subjective the text is, ranging from 0 (fully objective) to 1 (fully subjective). We'd rather base decisions on objective text than subjective text, so a lower subjectivity score should likely denote a more likely-to-be-accurate reading. We'll see.
To use it, you will need to install it, so do a pip install textblob. Now, let's see a quick example:
from textblob import TextBlob

analysis = TextBlob("TextBlob sure looks like it has some interesting features!")
To use something from TextBlob, we first want to convert our text to a TextBlob object, which is what we've done with our analysis variable. From here, we can do quite a bit. You can read the docs, or just do:
print(dir(analysis))
You should see quite a bit:
['__add__', '__class__', '__contains__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_cmpkey', '_compare', '_create_sentence_objects', '_strkey', 'analyzer', 'classifier', 'classify', 'correct', 'detect_language', 'ends_with', 'endswith', 'find', 'format', 'index', 'join', 'json', 'lower', 'ngrams', 'noun_phrases', 'np_counts', 'np_extractor', 'parse', 'parser', 'polarity', 'pos_tagger', 'pos_tags', 'raw', 'raw_sentences', 'replace', 'rfind', 'rindex', 'sentences', 'sentiment', 'sentiment_assessments', 'serialized', 'split', 'starts_with', 'startswith', 'string', 'strip', 'stripped', 'subjectivity', 'tags', 'title', 'to_json', 'tokenize', 'tokenizer', 'tokens', 'translate', 'translator', 'upper', 'word_counts', 'words']
We can immediately see quite a few useful ones. We can do things like detect_language, capture noun_phrases, label parts of speech with tags, translate to other languages, tokenize, and more. (Heads up: newer releases of TextBlob have deprecated translate and detect_language, so those two may not work depending on your version.) Very interesting! I am mainly here for the sentiment, but these things are nifty. Let's check a few:
print(analysis.translate(to='es'))
¡TextBlob seguramente parece tener algunas características interesantes!
print(analysis.tags)
[('TextBlob', 'NNP'), ('sure', 'JJ'), ('looks', 'VBZ'), ('like', 'IN'), ('it', 'PRP'), ('has', 'VBZ'), ('some', 'DT'), ('interesting', 'JJ'), ('features', 'NNS')]
These are part-of-speech tags. Since TextBlob is built on top of NLTK, the part-of-speech tags are the same. Here are the definitions:
POS tag list:

CC - coordinating conjunction
CD - cardinal digit
DT - determiner
EX - existential there (like: "there is" ... think of it like "there exists")
FW - foreign word
IN - preposition/subordinating conjunction
JJ - adjective: 'big'
JJR - adjective, comparative: 'bigger'
JJS - adjective, superlative: 'biggest'
LS - list marker: 1)
MD - modal: could, will
NN - noun, singular: 'desk'
NNS - noun, plural: 'desks'
NNP - proper noun, singular: 'Harrison'
NNPS - proper noun, plural: 'Americans'
PDT - predeterminer: 'all the kids'
POS - possessive ending: parent's
PRP - personal pronoun: I, he, she
PRP$ - possessive pronoun: my, his, hers
RB - adverb: very, silently
RBR - adverb, comparative: better
RBS - adverb, superlative: best
RP - particle: give up
TO - to: go 'to' the store
UH - interjection: errrrrrrrm
VB - verb, base form: take
VBD - verb, past tense: took
VBG - verb, gerund/present participle: taking
VBN - verb, past participle: taken
VBP - verb, sing. present, non-3rd person: take
VBZ - verb, 3rd person sing. present: takes
WDT - wh-determiner: which
WP - wh-pronoun: who, what
WP$ - possessive wh-pronoun: whose
WRB - wh-adverb: where, when
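Incidentally, since these come straight from NLTK, you can also pull the definitions up programmatically. A minimal sketch, assuming you have NLTK installed and its 'tagsets' data downloaded:

import nltk

# One-time download of the tag documentation (safe to re-run):
nltk.download('tagsets')

# Print the definition and examples for a single Penn Treebank tag:
nltk.help.upenn_tagset('JJ')

# A regular expression dumps a whole family, e.g. all the verb tags:
nltk.help.upenn_tagset('VB.*')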
Now let's check out the sentiment:
print(analysis.sentiment)
Sentiment(polarity=0.5625, subjectivity=0.6944444444444444)
So this is fairly positive, but also highly subjective.
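If you're curious where that number comes from, the dir() listing above also showed a sentiment_assessments attribute, which breaks the score down by the words and phrases the analyzer actually scored. A quick peek (the comment describes the output shape as I understand it; check your own output):

print(analysis.sentiment_assessments)
# Prints the same polarity and subjectivity, plus an "assessments" list of the
# scored words/phrases with their individual polarity and subjectivity values.

Now, let's test this on a bit more data using the positive.txt and negative.txt datasets.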
from textblob import TextBlob

pos_count = 0
pos_correct = 0

with open("positive.txt","r") as f:
    for line in f.read().split('\n'):
        analysis = TextBlob(line)
        if analysis.sentiment.polarity > 0:
            pos_correct += 1
        pos_count += 1

neg_count = 0
neg_correct = 0

with open("negative.txt","r") as f:
    for line in f.read().split('\n'):
        analysis = TextBlob(line)
        if analysis.sentiment.polarity <= 0:
            neg_correct += 1
        neg_count += 1

print("Positive accuracy = {}% via {} samples".format(pos_correct/pos_count*100.0, pos_count))
print("Negative accuracy = {}% via {} samples".format(neg_correct/neg_count*100.0, neg_count))
Positive accuracy = 71.11777944486121% via 5332 samples
Negative accuracy = 55.8702175543886% via 5332 samples
[Finished in 6.5s]
It looks like our positive accuracy is decent, but the negative sentiment accuracy isn't all that good. It could be that this classifier is biased across the board, meaning our "zero" point really sits somewhere above 0, say at 0.2, so we change:
if analysis.sentiment.polarity > 0.2:
    pos_correct += 1
and
if analysis.sentiment.polarity <= 0.2:
    neg_correct += 1
Positive accuracy = 46.19279819954989% via 5332 samples
Negative accuracy = 80.1012753188297% via 5332 samples
[Finished in 6.5s]
Nope, that's not it!
How about 0.1?
Positive accuracy = 60.5026256564141% via 5332 samples
Negative accuracy = 68.37959489872468% via 5332 samples
[Finished in 6.5s]
Hmm, well that's better than random I guess, but not something we want to see. What if we play with the subjectivity, though? Maybe we can only look at reviews that we deem more objective?
from textblob import TextBlob

pos_count = 0
pos_correct = 0

with open("positive.txt","r") as f:
    for line in f.read().split('\n'):
        analysis = TextBlob(line)
        if analysis.sentiment.subjectivity < 0.3:
            if analysis.sentiment.polarity > 0:
                pos_correct += 1
            pos_count += 1

neg_count = 0
neg_correct = 0

with open("negative.txt","r") as f:
    for line in f.read().split('\n'):
        analysis = TextBlob(line)
        if analysis.sentiment.subjectivity < 0.3:
            if analysis.sentiment.polarity <= 0:
                neg_correct += 1
            neg_count += 1

print("Positive accuracy = {}% via {} samples".format(pos_correct/pos_count*100.0, pos_count))
print("Negative accuracy = {}% via {} samples".format(neg_correct/neg_count*100.0, neg_count))
Positive accuracy = 29.116465863453815% via 996 samples
Negative accuracy = 76.03369065849922% via 1306 samples
[Finished in 6.5s]
Wow, we're throwing away a lot of samples, and not getting much in return for it! What if we flip things around and require a high degree of subjectivity?
from textblob import TextBlob

pos_count = 0
pos_correct = 0

with open("positive.txt","r") as f:
    for line in f.read().split('\n'):
        analysis = TextBlob(line)
        if analysis.sentiment.subjectivity > 0.9:
            if analysis.sentiment.polarity > 0:
                pos_correct += 1
            pos_count += 1

neg_count = 0
neg_correct = 0

with open("negative.txt","r") as f:
    for line in f.read().split('\n'):
        analysis = TextBlob(line)
        if analysis.sentiment.subjectivity > 0.9:
            if analysis.sentiment.polarity <= 0:
                neg_correct += 1
            neg_count += 1

print("Positive accuracy = {}% via {} samples".format(pos_correct/pos_count*100.0, pos_count))
print("Negative accuracy = {}% via {} samples".format(neg_correct/neg_count*100.0, neg_count))
Positive accuracy = 76.08695652173914% via 414 samples
Negative accuracy = 68.6046511627907% via 344 samples
[Finished in 6.5s]
Interesting; I must be misunderstanding subjectivity here. Stranger still, if we instead require subjectivity to be BELOW 0.1, I get:
Positive accuracy = 2.914389799635701% via 549 samples
Negative accuracy = 98.1159420289855% via 690 samples
[Finished in 6.5s]
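Before moving on: rather than hand-editing these cutoffs and re-running each time, we could sweep them in a loop. A minimal sketch of that idea (the accuracy_at helper is my own, not part of TextBlob):

from textblob import TextBlob

def accuracy_at(threshold):
    # Returns (positive accuracy, negative accuracy) for a given polarity cutoff.
    accuracies = []
    for fname, want_positive in (("positive.txt", True), ("negative.txt", False)):
        count = 0
        correct = 0
        with open(fname, "r") as f:
            for line in f.read().split('\n'):
                polarity = TextBlob(line).sentiment.polarity
                # A prediction of "positive" means polarity above the cutoff.
                if (polarity > threshold) == want_positive:
                    correct += 1
                count += 1
        accuracies.append(correct / count * 100.0)
    return accuracies

for cutoff in (0.0, 0.1, 0.2):
    pos_acc, neg_acc = accuracy_at(cutoff)
    print("cutoff {}: positive {:.1f}%, negative {:.1f}%".format(cutoff, pos_acc, neg_acc))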
Okay, so I am not too impressed here. Can VADER Sentiment save us? Let's find out! Do a pip install vaderSentiment, and let's check it out:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

vs = analyzer.polarity_scores("VADER Sentiment looks interesting, I have high hopes!")
print(vs)
Giving us:
{'neg': 0.0, 'neu': 0.463, 'pos': 0.537, 'compound': 0.6996}
[Finished in 0.3s]
The neg is the portion of the text scored as negative, neu is what was found to be neutral, pos is positive, and the compound is "computed by summing the valence scores of each word in the lexicon, adjusted according to the rules, then normalized... [it] is the most useful metric if you want a single unidimensional measure of sentiment" (from the docs). The docs also suggest:
positive sentiment: compound score >= 0.5
neutral sentiment: (compound score > -0.5) and (compound score < 0.5)
negative sentiment: compound score <= -0.5
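Just to make those cutoffs concrete, here's a tiny helper that applies them (the function name is mine, not part of VADER):

def label_compound(compound, threshold=0.5):
    # Map VADER's compound score to a label using the suggested cutoffs.
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

print(label_compound(0.6996))  # -> positive, matching our example sentence above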
Alright, let's see what we find. First though, to properly compare, we should just start with 0.
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

pos_count = 0
pos_correct = 0

with open("positive.txt","r") as f:
    for line in f.read().split('\n'):
        vs = analyzer.polarity_scores(line)
        if vs['compound'] > 0:
            pos_correct += 1
        pos_count += 1

neg_count = 0
neg_correct = 0

with open("negative.txt","r") as f:
    for line in f.read().split('\n'):
        vs = analyzer.polarity_scores(line)
        if vs['compound'] <= 0:
            neg_correct += 1
        neg_count += 1

print("Positive accuracy = {}% via {} samples".format(pos_correct/pos_count*100.0, pos_count))
print("Negative accuracy = {}% via {} samples".format(neg_correct/neg_count*100.0, neg_count))
Positive accuracy = 69.4298574643661% via 5332 samples
Negative accuracy = 57.764441110277566% via 5332 samples
[Finished in 3.1s]
Hmm, not much better. Okay, now let's go with the 0.5 and -0.5 as suggested by the documentation:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

pos_count = 0
pos_correct = 0

threshold = 0.5

with open("positive.txt","r") as f:
    for line in f.read().split('\n'):
        vs = analyzer.polarity_scores(line)
        if vs['compound'] >= threshold or vs['compound'] <= -threshold:
            if vs['compound'] > 0:
                pos_correct += 1
            pos_count += 1

neg_count = 0
neg_correct = 0

with open("negative.txt","r") as f:
    for line in f.read().split('\n'):
        vs = analyzer.polarity_scores(line)
        if vs['compound'] >= threshold or vs['compound'] <= -threshold:
            if vs['compound'] <= 0:
                neg_correct += 1
            neg_count += 1

print("Positive accuracy = {}% via {} samples".format(pos_correct/pos_count*100.0, pos_count))
print("Negative accuracy = {}% via {} samples".format(neg_correct/neg_count*100.0, neg_count))
Positive accuracy = 87.22179585571757% via 2606 samples
Negative accuracy = 50.0% via 1818 samples
[Finished in 3.1s]
We threw out a lot of samples here, and we aren't doing much different than TextBlob. Should we give up? Maybe, but what if we instead look for no conflict? That is, what if we only accept signals where the opposite side is weak or non-existent? For example, to classify something as positive here, why not require the neg score to be no more than 0.1, something like:
if not vs['neg'] > 0.1:
    if vs['pos'] - vs['neg'] > 0:
        pos_correct += 1
    pos_count += 1
Full script:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

pos_count = 0
pos_correct = 0

with open("positive.txt","r") as f:
    for line in f.read().split('\n'):
        vs = analyzer.polarity_scores(line)
        if not vs['neg'] > 0.1:
            if vs['pos'] - vs['neg'] > 0:
                pos_correct += 1
            pos_count += 1

neg_count = 0
neg_correct = 0

with open("negative.txt","r") as f:
    for line in f.read().split('\n'):
        vs = analyzer.polarity_scores(line)
        if not vs['pos'] > 0.1:
            if vs['pos'] - vs['neg'] <= 0:
                neg_correct += 1
            neg_count += 1

print("Positive accuracy = {}% via {} samples".format(pos_correct/pos_count*100.0, pos_count))
print("Negative accuracy = {}% via {} samples".format(neg_correct/neg_count*100.0, neg_count))
Positive accuracy = 80.69370058658507% via 3921 samples
Negative accuracy = 91.73643975245722% via 2747 samples
[Finished in 3.1s]
Alright! I can work with that. Recall the suggestion about -0.5 to 0.5 being "neutral" with VADER? What if we tried the same thresholds with TextBlob?
from textblob import TextBlob

pos_count = 0
pos_correct = 0

with open("positive.txt","r") as f:
    for line in f.read().split('\n'):
        analysis = TextBlob(line)
        if analysis.sentiment.polarity >= 0.5:
            if analysis.sentiment.polarity > 0:
                pos_correct += 1
            pos_count += 1

neg_count = 0
neg_correct = 0

with open("negative.txt","r") as f:
    for line in f.read().split('\n'):
        analysis = TextBlob(line)
        if analysis.sentiment.polarity <= -0.5:
            if analysis.sentiment.polarity <= 0:
                neg_correct += 1
            neg_count += 1

print("Positive accuracy = {}% via {} samples".format(pos_correct/pos_count*100.0, pos_count))
print("Negative accuracy = {}% via {} samples".format(neg_correct/neg_count*100.0, neg_count))
Positive accuracy = 100.0% via 766 samples
Negative accuracy = 100.0% via 282 samples
[Finished in 6.4s]
Oh.
Hmm, okay, so we lost a lot of samples, but got perfect accuracy. What if we change this threshold just a bit, let's go with 0.1 and -0.1 instead?
Positive accuracy = 100.0% via 3310 samples
Negative accuracy = 100.0% via 1499 samples
[Finished in 6.5s]
...interesting. 0.001?
Positive accuracy = 100.0% via 3790 samples
Negative accuracy = 100.0% via 2072 samples
[Finished in 6.5s]
0.00000000001?!
Same result. (Notice, by the way, that with this script's structure the accuracy is 100% by construction: any sample passing the outer threshold check automatically satisfies the inner one, so the informative number is really how many samples survive the cut.) Okay, so this tells me that we have a lot of "zeros": lines where TextBlob's polarity is exactly 0.0. Under the original > 0 / <= 0 split, every one of those counted as a negative prediction, which is what was dragging our positive accuracy down.
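We can verify that directly by counting how many lines score exactly 0.0; a quick sketch:

from textblob import TextBlob

for fname in ("positive.txt", "negative.txt"):
    with open(fname, "r") as f:
        lines = f.read().split('\n')
    # Count the samples where TextBlob expresses no sentiment at all.
    zeros = sum(1 for line in lines if TextBlob(line).sentiment.polarity == 0.0)
    print("{}: {} of {} lines have a polarity of exactly 0.0".format(fname, zeros, len(lines)))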
VADER Sentiment takes roughly 3.1-3.3 seconds to run over the full dataset, while TextBlob takes roughly 6.4-6.5 seconds, so about twice as long.
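Those numbers are just Sublime's build times for the full scripts; if you want to time the classification step by itself, something like this works:

import time

from textblob import TextBlob
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

with open("positive.txt", "r") as f:
    lines = f.read().split('\n')

# Time TextBlob's polarity over every line:
start = time.time()
for line in lines:
    TextBlob(line).sentiment.polarity
print("TextBlob: {:.2f} seconds".format(time.time() - start))

# Time VADER over the same lines:
analyzer = SentimentIntensityAnalyzer()
start = time.time()
for line in lines:
    analyzer.polarity_scores(line)
print("VADER: {:.2f} seconds".format(time.time() - start))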
Now, if sentiment is absolutely the *only* thing you plan to do with this text, and you need it processed as fast as possible, then VADER Sentiment is likely the better choice, going with that 0.05 threshold, which gave:
Positive accuracy = 99.85114617445669% via 3359 samples
Negative accuracy = 99.44954128440368% via 2180 samples
[Finished in 3.1s]
For me, though, I am fairly interested in TextBlob for its part-of-speech tagging. I am guessing a lot of the speed difference comes from TextBlob converting each sentence into a full TextBlob object, which can do many things, whereas VADER Sentiment turns your strings into objects that really just do sentiment, nothing more.
Which one you choose to go with is really up to you, and will depend more on your specific needs. For me, I am going to put them to use on some Twitter data.
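If you want a head start on that, here is roughly the shape of what I'll be reaching for: the VADER "no conflict" rule from above, wrapped into a single function (the function name and the None-for-unclear convention are my own choices, not anything from the library):

from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

def tweet_sentiment(text):
    # Classify text as "positive"/"negative", or None when the signals conflict.
    vs = analyzer.polarity_scores(text)
    if vs['neg'] <= 0.1 and vs['pos'] - vs['neg'] > 0:
        return "positive"
    if vs['pos'] <= 0.1 and vs['pos'] - vs['neg'] <= 0:
        return "negative"
    return None  # mixed signals; safest to skip this sample

print(tweet_sentiment("VADER Sentiment looks interesting, I have high hopes!"))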