A very similar operation to stemming is called lemmatizing. The major difference between the two is, as you saw earlier, that stemming can often create non-existent words, whereas lemmas are actual words.
So a root stem, meaning the word you end up with after stemming, is not necessarily something you can look up in a dictionary, but you can always look up a lemma.
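If you want to see that difference first-hand, here's a quick side-by-side sketch that puts a PorterStemmer (like the one from the earlier stemming tutorial) next to the lemmatizer. The words are just illustrative, and the exact stems you get depend on the stemmer, but you should notice the stems are often not dictionary words while the lemmas are:

from nltk.stem import PorterStemmer, WordNetLemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

for word in ["geese", "wolves", "cacti"]:
    # The stem is whatever string is left after chopping endings;
    # the lemma is an actual dictionary word.
    print(word, "->", stemmer.stem(word), "vs", lemmatizer.lemmatize(word))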
Sometimes you will wind up with a very similar word, but sometimes you will wind up with a completely different word. Let's see some examples.
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()

# With no pos argument, lemmatize() treats the word as a noun:
print(lemmatizer.lemmatize("cats"))
print(lemmatizer.lemmatize("cacti"))
print(lemmatizer.lemmatize("geese"))
print(lemmatizer.lemmatize("rocks"))
print(lemmatizer.lemmatize("python"))

# Passing pos="a" lemmatizes the word as an adjective:
print(lemmatizer.lemmatize("better", pos="a"))
print(lemmatizer.lemmatize("best", pos="a"))

# "run" as a noun (the default) vs. as a verb:
print(lemmatizer.lemmatize("run"))
print(lemmatizer.lemmatize("run", 'v'))
Here, we've got a bunch of example lemmas for the words we passed in. The only major thing to note is that lemmatize() takes a part-of-speech parameter, pos. If it is not supplied, the default is "n" for noun, which means the lemmatizer will attempt to find the closest noun, and that can create trouble for you. Keep this in mind if you use lemmatizing!
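To see why that default matters, here's one more small sketch (the word is just an example) showing how the same word can lemmatize differently depending on the pos you pass:

from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()

# Treated as a noun (the default), "running" is a valid noun and is left alone...
print(lemmatizer.lemmatize("running"))
# ...but treated as a verb, it reduces to its base form "run":
print(lemmatizer.lemmatize("running", pos="v"))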
In the next tutorial, we're going to dive into the NLTK corpus that came with the module, looking at all of the awesome documents it has waiting for us there.