Welcome to part 4 of the Google Cloud tutorial series. In this part, we're going to explore the Natural Language API. We'll focus on entity recognition and sentiment analysis, but you can also do syntactic analysis with this API.
As usual, you will need to enable this API and have your API credentials set up, as we did in Part 2.
From here, things should begin to look familiar. As with the other APIs, we start with client = language.Client(), and then we get a variety of methods we can call on some input, which in this case is text. For example:
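```python
# Written against an early release of google-cloud-language; later releases
# replaced language.Client() with a different client class.
from google.cloud import language


def language_analysis(text):
    client = language.Client()
    # Wrap the raw text in a Document so we can run analyses on it.
    document = client.document_from_text(text)

    # Sentiment: an overall score and magnitude for the whole document.
    sent_analysis = document.analyze_sentiment()
    sentiment = sent_analysis.sentiment

    # Entities: the people, places, organizations, etc. found in the text.
    ent_analysis = document.analyze_entities()
    entities = ent_analysis.entities
    return sentiment, entities


example_text = 'Python is such a great programming language'
sentiment, entities = language_analysis(example_text)
print(sentiment.score, sentiment.magnitude)
for e in entities:
    print(e.name, e.entity_type, e.metadata, e.salience)
```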
Running this:
```
(0.8, 0.8)
(u'Python', u'ORGANIZATION', {u'mid': u'/m/05z1_', u'wikipedia_url': u'http://en.wikipedia.org/wiki/Python_(programming_language)'}, 0.84221077)
(u'programming language', u'OTHER', {}, 0.1577892)
```
We've done quite a bit here, so let's break it down.
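First, we got the sentiment score and the sentiment magnitude. The score ranges from -1.0 (negative) to +1.0 (positive), while the magnitude is unbounded, running from 0 upward. The score tells you the overall emotion of the text, and the magnitude tells you how strong that emotion is. Because every expression of emotion adds to the magnitude, longer texts tend to have larger magnitudes.
So, in the case above, we got a score of 0.8 and a magnitude of 0.8: a clearly positive sentiment. The magnitude is never negative, since it only measures strength; the direction comes entirely from the score.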
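Since the score carries the direction and the magnitude only the strength, one way to use them together is to trust the score when it is clearly positive or negative, and fall back on the magnitude when the score is near zero. Google's documentation describes a near-zero score with a high magnitude as a sign of mixed emotions rather than no emotion. Below is a minimal sketch of that idea; describe_sentiment and both cutoff values are hypothetical, not part of the API.

```python
# A minimal sketch, not part of the API: turn a (score, magnitude) pair
# into a rough label. Both cutoffs are hypothetical, chosen only for
# illustration; tune them against your own data.
SCORE_CUTOFF = 0.25    # hypothetical: |score| below this counts as near zero
MIXED_MAGNITUDE = 1.0  # hypothetical: magnitude at or above this counts as "a lot of emotion"


def describe_sentiment(score, magnitude):
    """Return 'positive', 'negative', 'mixed', or 'neutral'."""
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must be between -1 and +1")
    if magnitude < 0:
        raise ValueError("magnitude cannot be negative")
    if score >= SCORE_CUTOFF:
        return "positive"
    if score <= -SCORE_CUTOFF:
        return "negative"
    # Near-zero score: a large magnitude suggests positive and negative
    # statements cancelling out; a small one suggests little emotion at all.
    if magnitude >= MIXED_MAGNITUDE:
        return "mixed"
    return "neutral"


print(describe_sentiment(0.8, 0.8))  # positive (the values from the run above)
```

Where you draw the cutoffs is a judgment call that depends on your text; very short inputs like the one-sentence example will rarely reach a large magnitude.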
Next, we loop through the entities. Two were found, "Python" and "programming language." For each entity, we print its name, its entity type, any metadata that was found, and its salience.
The entity types shown so far are ORGANIZATION and OTHER, but there are others, such as LOCATION and PERSON. Next comes the metadata, which contains any extra information known about the entity. For the Python entity, for example, we get the Wikipedia page for the Python programming language. Finally, we get the salience, a measure of the entity's importance within the document as a whole. As we can see, "Python" is far more important as a subject here than "programming language" is.
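Since each entity exposes name, entity_type, metadata, and salience, ordinary Python is enough to pull out what you need. The sketch below finds the main subject (the most salient entity) and collects Wikipedia links from the metadata. To keep it runnable without credentials, it uses a hypothetical Entity namedtuple holding the values printed above in place of real API objects; the helper names are hypothetical too.

```python
from collections import namedtuple

# Hypothetical stand-in for the API's entity objects; it only mirrors
# the four attributes the tutorial prints.
Entity = namedtuple("Entity", ["name", "entity_type", "metadata", "salience"])

# Values copied from the output of the first run.
entities = [
    Entity("Python", "ORGANIZATION",
           {"mid": "/m/05z1_",
            "wikipedia_url": "http://en.wikipedia.org/wiki/Python_(programming_language)"},
           0.84221077),
    Entity("programming language", "OTHER", {}, 0.1577892),
]


def main_subject(entities):
    """Return the entity with the highest salience."""
    return max(entities, key=lambda e: e.salience)


def wikipedia_links(entities):
    """Map entity names to Wikipedia URLs, skipping entities without one."""
    return {e.name: e.metadata["wikipedia_url"]
            for e in entities if "wikipedia_url" in e.metadata}


top = main_subject(entities)
print(top.name, top.salience)  # Python 0.84221077
print(wikipedia_links(entities))
```

Using max with a key keeps this correct even if the list is not already sorted by salience.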
Now let's take the first few paragraphs of the Wikipedia article on the Python programming language and see what the results are.
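```python
# Written against an early release of google-cloud-language; later releases
# replaced language.Client() with a different client class.
from google.cloud import language


def language_analysis(text):
    client = language.Client()
    # Wrap the raw text in a Document so we can run analyses on it.
    document = client.document_from_text(text)

    # Sentiment: an overall score and magnitude for the whole document.
    sent_analysis = document.analyze_sentiment()
    sentiment = sent_analysis.sentiment

    # Entities: the people, places, organizations, etc. found in the text.
    ent_analysis = document.analyze_entities()
    entities = ent_analysis.entities
    return sentiment, entities


# The paragraphs are cut off in the original listing; paste the full
# text from the Wikipedia article in their place.
example_text = '''Python is a widely used high-level programming language for general-...
Python features a dynamic type system and automatic memory management and supports mul...
Python interpreters are available for many operating systems, allowing Python code to ...
'''
sentiment, entities = language_analysis(example_text)
print(sentiment.score, sentiment.magnitude)
for e in entities:
    print(e.name, e.entity_type, e.metadata, e.salience)
```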
```
(0.3, 2.6)
(u'Python', u'ORGANIZATION', {u'mid': u'/m/05z1_', u'wikipedia_url': u'http://en.wikipedia.org/wiki/Python_(programming_language)'}, 0.56117535)
(u'programming', u'OTHER', {}, 0.07085185)
(u'programming language', u'OTHER', {}, 0.06715261)
(u'CPython', u'CONSUMER_GOOD', {u'mid': u'/m/06bxxb', u'wikipedia_url': u'http://en.wikipedia.org/wiki/CPython'}, 0.0427983)
(u'Guido van Rossum', u'PERSON', {u'mid': u'/m/01h05c', u'wikipedia_url': u'http://en.wikipedia.org/wiki/Guido_van_Rossum'}, 0.028725443)
(u'syntax', u'OTHER', {}, 0.020441024)
(u'language', u'OTHER', {}, 0.019001)
(u'whitespace indentation', u'OTHER', {}, 0.01484696)
(u'programs', u'OTHER', {}, 0.010850109)
(u'language', u'OTHER', {}, 0.010392293)
(u'code readability', u'OTHER', {}, 0.009924238)
(u'code blocks', u'OTHER', {}, 0.009924238)
(u'braces', u'CONSUMER_GOOD', {}, 0.009924238)
(u'code', u'OTHER', {}, 0.009277556)
(u'languages', u'OTHER', {}, 0.009277556)
(u'design philosophy', u'OTHER', {}, 0.008008728)
(u'type system', u'OTHER', {}, 0.007276746)
(u'lines', u'OTHER', {}, 0.006710995)
(u'C++', u'OTHER', {}, 0.006710995)
(u'concepts', u'OTHER', {}, 0.006710995)
(u'keywords', u'OTHER', {}, 0.006710995)
(u'programmers', u'PERSON', {}, 0.006071727)
(u'constructs', u'OTHER', {}, 0.005694318)
(u'memory management', u'OTHER', {}, 0.005694318)
(u'programming paradigms', u'OTHER', {}, 0.005694318)
(u'systems', u'OTHER', {}, 0.0054415334)
(u'variety', u'OTHER', {}, 0.004914569)
(u'Java.', u'LOCATION', {u'mid': u'/m/07sbkfb', u'wikipedia_url': u'http://en.wikipedia.org/wiki/Java_(programming_language)'}, 0.004136519)
(u'functional programming', u'OTHER', {}, 0.003754759)
(u'variant implementations', u'OTHER', {}, 0.0035818678)
(u'code', u'OTHER', {}, 0.0032084463)
(u'all', u'OTHER', {}, 0.0031023393)
(u'operating systems', u'OTHER', {}, 0.0029644007)
(u'development model', u'OTHER', {}, 0.002501432)
(u'Python interpreters', u'PERSON', {}, 0.0023183578)
(u'styles', u'OTHER', {}, 0.0023183578)
(u'Python Software Foundation', u'ORGANIZATION', {u'mid': u'/m/033l1p', u'wikipedia_url': u'http://en.wikipedia.org/wiki/Python_Software_Foundation'}, 0.0019105236)
```
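We've got quite a few more entities this time. A handful of them have metadata, all of them have a salience, and the entities are listed in order of salience, from highest to lowest. The entity types are not always right, though: "CPython" and "braces" are labeled CONSUMER_GOOD, "Java." is labeled LOCATION, and "Python interpreters" is labeled PERSON. The sentiment changed as well: the score dropped to 0.3, while the magnitude rose to 2.6, likely because there is much more text for emotional statements to add up over.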
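Notice that "language" and "code" each appear twice in the list. If you want one score per name, a simple heuristic is to add up the salience of entries that share a (case-insensitive) name. This is our own assumption, not something the API does, and it treats separate entries as the same thing, which may not always be right. The sketch uses (name, type, salience) triples copied from part of the output above; merge_by_name is a hypothetical helper.

```python
from collections import defaultdict

# (name, entity_type, salience) triples copied from part of the output above;
# metadata is left out because it isn't needed here.
results = [
    ("Python", "ORGANIZATION", 0.56117535),
    ("programming", "OTHER", 0.07085185),
    ("programming language", "OTHER", 0.06715261),
    ("language", "OTHER", 0.019001),
    ("language", "OTHER", 0.010392293),
    ("code", "OTHER", 0.009277556),
    ("code", "OTHER", 0.0032084463),
]


def merge_by_name(results):
    """Sum salience for entries whose names match case-insensitively."""
    totals = defaultdict(float)
    for name, _entity_type, salience in results:
        totals[name.lower()] += salience
    # Highest combined salience first, matching the API's ordering.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


for name, salience in merge_by_name(results):
    print(f"{name}: {salience:.4f}")
# python: 0.5612
# programming: 0.0709
# programming language: 0.0672
# language: 0.0294
# code: 0.0125
```

Lowercasing the names means "Python" and "python" would also be merged, which is usually what you want when counting topics.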
In the next tutorial, we're going to cover the Translation API.