In this tutorial miniseries, we're going to be covering the Python Reddit API Wrapper, PRAW. Reddit is a place for just about everything, separated by "subreddits." I find it to be a decent source for news, a great source to learn more about specific topics, and certainly always interesting.
While it fluctuates a bit, at the time of my writing this, Reddit is one of the top 10 websites in the world, and the sheer amount of contextual data that you can find there is staggering. I am currently most interested in pulling conversational data, but there's a lot more you could do, so my goal here is to show you how quickly you can get your own bot up and running to do whatever it is you want to do.
At the time of writing, the Reddit API has a rate limit of 30 requests per minute, but you can still do quite a bit with this. PRAW also handles the rate limiting for you dynamically and attempts to be efficient, so you won't have to worry about crossing the line. It does explain, though, why things might not go as quickly as you expect.
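To get a feel for what that limit means in practice, here is a rough back-of-the-envelope sketch. It assumes the 30 requests per minute quoted above and, as a further assumption that is not from this tutorial, that a single request returns at most 100 items. The function name and parameters are made up for illustration and are not part of PRAW.

```python
import math


def estimate_fetch_seconds(total_items, items_per_request=100, requests_per_minute=30):
    """Rough estimate of how long pulling `total_items` takes under a rate limit."""
    # Assumption: each request returns at most `items_per_request` items.
    requests_needed = math.ceil(total_items / items_per_request)
    # Spread evenly, one request is allowed every 60 / requests_per_minute seconds.
    seconds_per_request = 60 / requests_per_minute
    return requests_needed * seconds_per_request


if __name__ == "__main__":
    # 1000 items -> 10 requests at 2 seconds each
    print(estimate_fetch_seconds(1000))
```

Under those assumptions, pulling 1000 items takes about 20 seconds, so a script that looks slow is often just staying inside the limit.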
To install PRAW, simply run:
pip install praw
import praw
reddit = praw.Reddit(client_id='clientid',
client_secret='secret', password='password',
user_agent='PrawTut', username='username')
Above, we've created a Reddit instance. You will need to replace each of the placeholder values above with your own details. You need a Reddit account, so you can either use an existing account or create a new one. Once done, go to https://www.reddit.com/, and then click on "preferences" next to your name.
From here, choose the "apps" tab, and then "create an app" or "create another app..."
In here, fill out the name, description, about URL and redirect URI as you see fit.
Once done, you should see your secret, and the client ID sits under your app's name in the top left. It'll look something like "l9QcKj32sa_x".
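If you would rather not paste your secret and password straight into a script, one option is to read them from environment variables. The sketch below is only one way to do it: the environment variable names and the load_credentials helper are hypothetical, while the dictionary keys match the keyword arguments passed to praw.Reddit earlier.

```python
import os

# Hypothetical environment variable names, mapped to the keyword
# arguments that praw.Reddit is called with in this tutorial.
ENV_TO_KWARG = {
    "REDDIT_CLIENT_ID": "client_id",
    "REDDIT_CLIENT_SECRET": "client_secret",
    "REDDIT_USERNAME": "username",
    "REDDIT_PASSWORD": "password",
    "REDDIT_USER_AGENT": "user_agent",
}


def load_credentials(env=None):
    """Build the keyword arguments for praw.Reddit from a mapping of settings."""
    env = os.environ if env is None else env
    # Fail early, and name every variable that is missing or empty.
    missing = [name for name in ENV_TO_KWARG if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(sorted(missing)))
    return {kwarg: env[name] for name, kwarg in ENV_TO_KWARG.items()}


if __name__ == "__main__":
    # Demo with made-up values instead of the real environment.
    demo_env = {
        "REDDIT_CLIENT_ID": "clientid",
        "REDDIT_CLIENT_SECRET": "secret",
        "REDDIT_USERNAME": "username",
        "REDDIT_PASSWORD": "password",
        "REDDIT_USER_AGENT": "PrawTut",
    }
    print(sorted(load_credentials(demo_env)))
```

With the variables set in your shell, the result can be handed over as praw.Reddit(**load_credentials()).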
Once you have your Reddit instance, the world of Reddit has just opened up to you! Let's see what you can do. At least for me, I am using a brand new account, so I am not a member of any subreddits. SAD. So let's check out a subreddit. I think the Python subreddit would be fitting.
import praw
reddit = praw.Reddit(client_id='clientid',
                     client_secret='secret', password='password',
                     user_agent='PrawTut', username='username')
subreddit = reddit.subreddit('python')
From here, maybe we want to actually see what's going on in the subreddit. I usually filter using the "hot" sorting, but feel free to use rising or controversial if you like.
hot_python = subreddit.hot()
What this will give you is a bunch of 'submission objects' in the form of a Python generator object, which we can iterate through. To keep things simple, let's limit how many submission objects we get (submission threads in this case). We'll limit to just one:
hot_python = subreddit.hot(limit=1)
Now we can iterate through this with something like:
for submission in hot_python:
    print(submission)
As you can see, what gets printed is the submission object's id. It's not just an id, though; it's an object that we can do things with. For example:
hot_python = subreddit.hot(limit=1)
for submission in hot_python:
    print(submission.title)
Note that I re-defined hot_python. That's because it's a generator object, so it's not saved after it's been iterated through. You could convert it to a list if you wanted to keep it stored in memory, but, in most cases, you won't be doing that, so I am not doing that here.
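To see that behavior without touching Reddit at all, here is a small sketch that uses a plain Python generator as a stand-in for the listing. The fake_listing function and its made-up ids are purely illustrative and are not part of PRAW.

```python
def fake_listing(limit):
    """Stand-in for a listing: yields made-up submission ids one at a time."""
    for number in range(limit):
        yield "id{}".format(number)


listing = fake_listing(3)
first_pass = [item for item in listing]
# The generator is now used up, so a second loop gets nothing.
second_pass = [item for item in listing]

# Converting to a list keeps the items around for as many passes as needed.
stored = list(fake_listing(3))

print(first_pass)   # ['id0', 'id1', 'id2']
print(second_pass)  # []
print(stored)       # ['id0', 'id1', 'id2']
```

The trade-off is memory: a list holds everything at once, which is fine for a handful of submissions but less so for a very large pull.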
Another issue is, as you can see, we did get a title, but if you actually go to the Python subreddit, you will see this is a sticky, posted 3 months ago. We don't really want stickies. Of course, if you are familiar with the subreddit, you could know to just skip the stickies, or we can just check for them:
hot_python = subreddit.hot(limit=3)
for submission in hot_python:
    if not submission.stickied:
        print(submission.title)
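One side effect of checking inside the loop is that the limit counts stickies too, so asking for three submissions may print fewer than three titles. A possible way around that, sketched here with a hypothetical helper and made-up stand-in objects rather than real submissions, is to filter first and then take as many as you want:

```python
from itertools import islice
from types import SimpleNamespace


def first_non_stickied(submissions, count):
    """Return up to `count` submissions, skipping any that are stickied."""
    unstickied = (s for s in submissions if not s.stickied)
    # islice stops as soon as `count` items are collected, so the source
    # is not consumed any further than necessary.
    return list(islice(unstickied, count))


# Made-up stand-ins that only carry the two attributes used here.
sample = [
    SimpleNamespace(title="Weekly sticky thread", stickied=True),
    SimpleNamespace(title="First real post", stickied=False),
    SimpleNamespace(title="Another sticky", stickied=True),
    SimpleNamespace(title="Second real post", stickied=False),
    SimpleNamespace(title="Third real post", stickied=False),
]

titles = [s.title for s in first_non_stickied(sample, 2)]
print(titles)  # ['First real post', 'Second real post']
```

With a real subreddit, you would pass in something like subreddit.hot(limit=10), with the limit set a little higher than the number of titles you actually want.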
We can also gather all sorts of information on this submission:
hot_python = subreddit.hot(limit=3)
for submission in hot_python:
    if not submission.stickied:
        print('Title: {}, ups: {}, downs: {}, Have we visited?: {}'.format(
            submission.title,
            submission.ups,
            submission.downs,
            submission.visited))
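If you plan to keep this information rather than just print it, it may be handier to collect it into a dictionary per submission. This is a minimal sketch under that assumption: the summarize helper is hypothetical, and the stand-in object below only mimics the four attributes used above.

```python
from types import SimpleNamespace

# The same four attributes that were printed above.
FIELDS = ("title", "ups", "downs", "visited")


def summarize(submission):
    """Copy the attributes of interest into a plain dictionary."""
    return {field: getattr(submission, field) for field in FIELDS}


# Made-up stand-in for a submission object.
example = SimpleNamespace(title="Example post", ups=120, downs=0, visited=False)
record = summarize(example)
print(record)
```

A list of such dictionaries is easy to write out to JSON or CSV later on.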
Not only can we get information, but we can also take actions on these objects. Here are a few options at our disposal:
Submissions (threads/comments):
.upvote()
.clear_vote()
.downvote()
.reply()
Subreddits:
.subscribe()
.unsubscribe()
Let's take an obvious action:
subreddit.subscribe()
This will subscribe us to the Python subreddit. You might be seeing the upvote and downvote and thinking to yourself "jackpot!" Not so fast: upvotes and downvotes are still meant to be cast only by humans, and you can wind up banned for breaking this rule. More information: https://www.reddit.com/dev/api#POST_api_vote
The reason upvote/downvote exists in the API is in case you wanted to build some sort of Reddit application that other people would use.
In the next tutorial, we're going to be covering how to navigate the Reddit API by trying to parse and organize comments.