NLTK Part of Speech Tagging Tutorial




Once you have NLTK installed, you are ready to begin using it. One of the more powerful features of NLTK for Python is its built-in part of speech tagger. This tagger is not perfect, but it is pretty darn good. If you need something better, you can purchase a commercial tagger, or even modify the existing NLTK code.

The idea of part of speech tagging is to let your program understand sentence structure, so that it can begin to follow the meaning of a sentence based on the words used, their parts of speech, and the phrases they form.
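
As a small illustration of why the part of speech matters, here is a sketch in which the same word carries a different tag depending on its role in the sentence. The sentence is tagged by hand in "word/TAG" form, so these tags are an assumption for illustration and not tagger output. The helper `tags_for` is a hypothetical name; `str2tuple` comes from `nltk.tag`.

```python
from nltk.tag import str2tuple

# Hand-tagged sentence in "word/TAG" form (illustrative, not tagger output).
hand_tagged = (
    "They/PRP refuse/VBP to/TO permit/VB us/PRP "
    "to/TO obtain/VB the/DT refuse/NN permit/NN"
)

# Turn each "word/TAG" token into a (word, tag) pair.
tagged = [str2tuple(token) for token in hand_tagged.split()]


def tags_for(word, tagged_tokens):
    """Collect every tag that a given word received, in sentence order."""
    return [tag for w, tag in tagged_tokens if w == word]


refuse_tags = tags_for("refuse", tagged)
permit_tags = tags_for("permit", tagged)

print(refuse_tags)  # ['VBP', 'NN']
print(permit_tags)  # ['VB', 'NN']
```

The word "refuse" shows up once as a verb and once as a noun, which is exactly the kind of distinction a program needs in order to follow what a sentence is saying.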

To accompany the video, here is a reference list of POS tags and the sample code for NLTK part of speech tagging:

POS tag list:

  • CC coordinating conjunction
  • CD cardinal number
  • DT determiner
  • EX existential there (like: "there is" ... think of it like "there exists")
  • FW foreign word
  • IN preposition/subordinating conjunction
  • JJ adjective 'big'
  • JJR adjective, comparative 'bigger'
  • JJS adjective, superlative 'biggest'
  • LS list marker 1)
  • MD modal could, will
  • NN noun, singular 'desk'
  • NNS noun plural 'desks'
  • NNP proper noun, singular 'Harrison'
  • NNPS proper noun, plural 'Americans'
  • PDT predeterminer 'all the kids'
  • POS possessive ending parent's
  • PRP personal pronoun I, he, she
  • PRP$ possessive pronoun my, his, her
  • RB adverb very, silently
  • RBR adverb, comparative better
  • RBS adverb, superlative best
  • RP particle give up
  • TO the word 'to', as in go 'to' the store
  • UH interjection errrrrrrrm
  • VB verb, base form take
  • VBD verb, past tense took
  • VBG verb, gerund/present participle taking
  • VBN verb, past participle taken
  • VBP verb, non-3rd person singular present take
  • VBZ verb, 3rd person sing. present takes
  • WDT wh-determiner which
  • WP wh-pronoun who, what
  • WP$ possessive wh-pronoun whose
  • WRB wh-adverb where, when
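
Many of the tags above share a prefix: the noun tags start with NN, the verb tags with VB, the adjective tags with JJ and the adverb tags with RB. A minimal sketch of using that to group words into coarse classes follows. The `COARSE_PREFIXES` mapping and the `coarse_category` helper are hypothetical names made up for this example, and the sentence is tagged by hand rather than by the tagger.

```python
# Coarse word classes keyed by tag prefix. This grouping is an illustrative
# assumption, not something NLTK defines.
COARSE_PREFIXES = {
    "NN": "noun",
    "VB": "verb",
    "JJ": "adjective",
    "RB": "adverb",
}


def coarse_category(tag):
    """Return a coarse word class for a tag from the list above."""
    for prefix, name in COARSE_PREFIXES.items():
        if tag.startswith(prefix):
            return name
    return "other"


# Hand-tagged sentence (not tagger output).
tagged = [
    ("Starbucks", "NNP"), ("is", "VBZ"), ("doing", "VBG"),
    ("very", "RB"), ("well", "RB"), ("lately", "RB"), (".", "."),
]

# Group the words by their coarse class, keeping sentence order.
grouped = {}
for word, tag in tagged:
    grouped.setdefault(coarse_category(tag), []).append(word)

print(grouped)
```

Note that WRB and RP fall into "other" here, because the check looks at the start of the tag only.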
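
import re
import urllib.request
from http.cookiejar import CookieJar

import nltk

# Cookie-aware URL opener with a browser-like User-agent (not used in this example).
cj = CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
opener.addheaders = [('User-agent', 'Mozilla/5.0')]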


# Sample sentences (not used by processContent below, which reads exampleArray)
contentArray =['Starbucks is doing very well lately.',
               'Overall, while it may seem there is already a Starbucks on every corner, Starbucks still has a lot of room to grow.',
               'They just began expansion into food products, which has been going quite well so far for them.',
               'I can attest that my own expenditure when going to Starbucks has increased, in lieu of these food products.',
               'Starbucks is also indeed expanding their number of stores as well.',
               'Starbucks still sees strong sales growth here in the united states, and intends to actually continue increasing this.',
               'Starbucks also has one of the more successful loyalty programs, which accounts for 30%  of all transactions being loyalty-program-based.',
               'As if news could not get any more positive for the company, Brazilian weather has become ideal for producing coffee beans.',
               'Brazil is the world\'s #1 coffee producer, the source of about 1/3rd of the entire world\'s supply!',
               'Given the dry weather, coffee farmers have amped up production, to take as much of an advantage as possible with the dry weather.',
               'Increase in supply... well you know the rules...',]


exampleArray = ['The incredibly intimidating NLP scares people away who are sissies']
               

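def processContent():
    try:
        for item in exampleArray:
            # Split the sentence into word tokens, then tag each token with its part of speech.
            # Both steps need NLTK's tokenizer and tagger data (fetched with nltk.download()).
            tokenized = nltk.word_tokenize(item)
            tagged = nltk.pos_tag(tokenized)

            # Chunk rule: any adverbs, then any verbs, followed by one singular proper noun.
            chunkGram = r"""Chunk: {<RB.?>*<VB.?>*<NNP>}"""
            chunkParser = nltk.RegexpParser(chunkGram)

            chunked = chunkParser.parse(tagged)
            print(chunked)
            # Opens a window showing the parse tree (needs a graphical display).
            chunked.draw()

    except Exception as e:
        print(str(e))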
    


processContent()
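
To see what the chunk rule in the code above picks out, here is a rough sketch that runs the same rule over a sentence tagged by hand. Because the (word, tag) pairs are written out manually, they are an assumption for illustration and not tagger output, and the example needs no tokenizer or tagger data. The variable names are made up for this example.

```python
import nltk

# Hand-tagged (word, tag) pairs, written out by hand for illustration.
tagged = [
    ("Harrison", "NNP"),
    ("quietly", "RB"),
    ("visited", "VBD"),
    ("Brazil", "NNP"),
    ("yesterday", "NN"),
]

# Same rule as above: any adverbs, then any verbs, then one NNP.
chunk_parser = nltk.RegexpParser(r"Chunk: {<RB.?>*<VB.?>*<NNP>}")
tree = chunk_parser.parse(tagged)

# Collect the words of every subtree labelled "Chunk", left to right.
chunks = [
    [word for word, tag in subtree.leaves()]
    for subtree in tree.subtrees(filter=lambda t: t.label() == "Chunk")
]

print(chunks)  # [['Harrison'], ['quietly', 'visited', 'Brazil']]
```

Since the adverb and verb parts of the rule are both optional, a proper noun on its own still forms a chunk, while "yesterday" (NN) is left outside any chunk.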
		
