Python Programming Tutorials

NLTK Part of Speech Tagging Tutorial

Once you have NLTK installed, you are ready to begin using it. One of the more powerful aspects of NLTK for Python is its built-in part of speech tagger. This tagger is not perfect, but it is pretty darn good. If you need something better, you can purchase a commercial tagger, or even modify the existing NLTK code.

The idea of part of speech tagging is to let your program understand sentence structure, so that it can begin to follow the meaning of a sentence based on each word used, its part of speech, and the string it forms with its neighbors.

To accompany the video, here is the sample code for NLTK part of speech tagging with lots of comments and info as well:

POS tag list:

CC coordinating conjunction
CD cardinal number
DT determiner
EX existential there (like: "there is" ... think of it like "there exists")
FW foreign word
IN preposition/subordinating conjunction
JJ adjective 'big'
JJR adjective, comparative 'bigger'
JJS adjective, superlative 'biggest'
LS list marker 1)
MD modal could, will
NN noun, singular 'desk'
NNS noun plural 'desks'
NNP proper noun, singular 'Harrison'
NNPS proper noun, plural 'Americans'
PDT predeterminer 'all the kids'
POS possessive ending parent's
PRP personal pronoun I, he, she
PRP$ possessive pronoun my, his, her
RB adverb very, silently
RBR adverb, comparative better
RBS adverb, superlative best
RP particle give up
TO to, as in go 'to' the store
UH interjection errrrrrrrm
VB verb, base form take
VBD verb, past tense took
VBG verb, gerund/present participle taking
VBN verb, past participle taken
VBP verb, sing. present, non-3rd person take
VBZ verb, 3rd person sing. present takes
WDT wh-determiner which
WP wh-pronoun who, what
WP$ possessive wh-pronoun whose
WRB wh-adverb where, when
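
As a rough illustration of how these tags get used: the tagger returns a list of (word, tag) tuples, and the tag prefixes in the list above make it easy to group words by broad category. The sketch below is not part of the original sample code. The tagged sentence is typed in by hand rather than produced by the tagger, and the names TAG_FAMILIES and group_by_family are made up for this example.

```python
from collections import defaultdict

# Broad word classes keyed by tag prefix, taken from the tag list above.
# TAG_FAMILIES is a hypothetical name used only in this sketch.
TAG_FAMILIES = [
    ("NN", "noun"),
    ("VB", "verb"),
    ("JJ", "adjective"),
    ("RB", "adverb"),
]

def group_by_family(tagged):
    """Group words from a list of (word, tag) tuples by broad word class."""
    groups = defaultdict(list)
    for word, tag in tagged:
        for prefix, family in TAG_FAMILIES:
            if tag.startswith(prefix):
                groups[family].append(word)
                break
        else:
            # No prefix matched (punctuation, determiners, wh-words, ...)
            groups["other"].append(word)
    return dict(groups)

# Hand-tagged for illustration; real tagger output may differ.
hand_tagged = [
    ("Starbucks", "NNP"), ("is", "VBZ"), ("doing", "VBG"),
    ("very", "RB"), ("well", "RB"), ("lately", "RB"), (".", "."),
]

print(group_by_family(hand_tagged))
```

Matching on prefixes means NN, NNS, NNP and NNPS all land under "noun", while tags such as WRB or PRP fall through to "other" because they do not start with any of the listed prefixes.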

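import urllib.request
from http.cookiejar import CookieJar

import nltk

# word_tokenize and pos_tag need NLTK's tokenizer and tagger data;
# run nltk.download() once if they are missing.

# Cookie-aware opener carried over from the earlier scraping tutorials; not used below.
cj = CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
opener.addheaders = [('User-agent', 'Mozilla/5.0')]


# Sample sentences to experiment with (processContent below reads exampleArray instead)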
contentArray =['Starbucks is doing very well lately.',
               'Overall, while it may seem there is already a Starbucks on every corner, Starbucks still has a lot of room to grow.',
               'They just began expansion into food products, which has been going quite well so far for them.',
               'I can attest that my own expenditure when going to Starbucks has increased, in lieu of these food products.',
               'Starbucks is also indeed expanding their number of stores as well.',
               'Starbucks still sees strong sales growth here in the united states, and intends to actually continue increasing this.',
               'Starbucks also has one of the more successful loyalty programs, which accounts for 30%  of all transactions being loyalty-program-based.',
               'As if news could not get any more positive for the company, Brazilian weather has become ideal for producing coffee beans.',
               'Brazil is the world\'s #1 coffee producer, the source of about 1/3rd of the entire world\'s supply!',
               'Given the dry weather, coffee farmers have amped up production, to take as much of an advantage as possible with the dry weather.',
               'Increase in supply... well you know the rules...',]


exampleArray = ['The incredibly intimidating NLP scares people away who are sissies']
               

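def processContent():
    try:
        for item in exampleArray:
            # Split the sentence into word tokens, then tag each token with its part of speech
            tokenized = nltk.word_tokenize(item)
            tagged = nltk.pos_tag(tokenized)

            # Chunk pattern: any number of adverbs, then any number of verbs,
            # ending in a singular proper noun (NNP)
            chunkGram = r"""Chunk: {<RB.?>*<VB.?>*<NNP>}"""
            chunkParser = nltk.RegexpParser(chunkGram)

            chunked = chunkParser.parse(tagged)
            print(chunked)
            # Opens a window showing the parse tree (needs a graphical display)
            chunked.draw()

    except Exception as e:
        print(str(e))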
    


processContent()
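
To see what the chunk grammar actually captures without opening the drawing window, you can walk the resulting tree and pull out just the Chunk subtrees. This is a minimal sketch, not part of the original sample. The tokens are tagged by hand (so no tagger data is needed, and real pos_tag output may differ), and extract_chunks is a name made up here.

```python
import nltk

# Same grammar as above: optional adverbs, optional verbs, then a proper noun.
chunkGram = r"""Chunk: {<RB.?>*<VB.?>*<NNP>}"""
chunkParser = nltk.RegexpParser(chunkGram)

def extract_chunks(tagged):
    """Return each matched chunk as a list of its words (hypothetical helper)."""
    tree = chunkParser.parse(tagged)
    return [
        [word for word, tag in subtree.leaves()]
        for subtree in tree.subtrees(filter=lambda t: t.label() == "Chunk")
    ]

# Hand-tagged for illustration; nltk.pos_tag may tag these differently.
hand_tagged = [
    ("Harrison", "NNP"), ("quickly", "RB"), ("called", "VBD"),
    ("Starbucks", "NNP"), ("about", "IN"), ("the", "DT"), ("desks", "NNS"),
]

print(extract_chunks(hand_tagged))
```

With this input, the lone proper noun at the start forms one chunk and the adverb, verb, proper noun run forms a second. The remaining words stay outside any chunk because the pattern must end in an NNP.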
