When OpenAI's Universe came out, and various articles suggested that even games like Grand Theft Auto V were ready to go, I was very excited to check it out. Then, somewhat mysteriously, GTA V was completely removed from Universe with no explanation whatsoever.
I gave up and forgot about it for a while, but the idea still seemed exciting. Finally, I decided to put some more mental energy into it and questioned whether I even needed OpenAI at all for a task like this. Sure, it's nice for simpler games that can be run en masse, so you can train through thousands of iterations quickly, but with something like GTA V, that's really not going to be an option anyway.
Just in case it's not totally obvious, why GTA V? At least for me, Grand Theft Auto V is a great environment to practice in for a variety of reasons. It's an open world with endless things you can do, but let's consider even just a simple one: self-driving cars. With GTA V, we can control the time of day, weather, traffic, speeds, what happens when we crash... all kinds of things (mainly using mods, though these aren't absolutely required). It's a completely customizable environment.
Some of my tutorials are planned out fully, others only sort of, and some not at all. This one is not planned at all; it's going to be me working through this problem. I realize not everyone has Grand Theft Auto V, but my expectation is that you have SOME similar game to use for the tasks we're going to be working on, and that this method can be applied to a variety of games. Because you may have to translate some things and tweak to get everything working on your end, this is probably not going to be a beginner-friendly series.
My initial goal is just to create a sort of self-driving car. Any game with lanes and cars should be fine for you to follow along. The method I will use to access the game should be doable on almost any game, and a simpler game will likely make for a simpler task too. Things like sun glare in GTA V will only make the computer vision more challenging, but also more realistic.
I may also try other games with this method, since I also think we can teach an AI to play games simply by showing it how to play for a bit, training a convolutional neural network on that information, and then letting the AI poke around.
Here are my initial thoughts. Despite not having a pre-packaged solution with Python, we can already capture the screen and work with keyboard and mouse input. This is already enough for more rudimentary tasks, but what about something like deep learning? Really, the only extra thing we might want is a way to log various events from the game world. That said, since most games are played almost completely visually, we can handle that already, and we can also track mouse position and key presses, allowing us to engage in deep learning.
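For example, on Windows we can poll the mouse position and key states with the pywin32 package. Here's a minimal sketch of what I mean; the watched keys and the helper name are just illustrative choices of mine, nothing settled yet:

# A minimal sketch of polling input state on Windows with pywin32
# (pip install pywin32). The watched keys are an illustrative choice
# for a driving game.
import win32api

# virtual-key codes for W, A, S, D
keys_to_watch = {0x57: 'W', 0x41: 'A', 0x53: 'S', 0x44: 'D'}

def key_check():
    # the high bit of GetAsyncKeyState is set while a key is held down
    return [name for vk, name in keys_to_watch.items()
            if win32api.GetAsyncKeyState(vk) & 0x8000]

print(win32api.GetCursorPos())  # current (x, y) mouse position
print(key_check())              # e.g. ['W'] while driving forward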
I doubt this will be all sunshine and rainbows, but I think it's at least possible, and it will make for a great, or at least interesting, project. My main concern is processing everything fast enough, but I think we can do it; it's at least worth a shot.
This is quite a large project; if we don't break it down and take some baby steps, we're going to be overwhelmed. The way I see it, we need to do the bare minimum first. Thus, the initial goals are simply to read the screen from Python fast enough to be useful, do some basic processing on those frames, and eventually send input back to the game.
Alright, so step 1: how should we actually access our screen? All I'm certain of is that it's been done; I don't really know how. For this, I take to Google! I find quite a few examples, most of which don't actually loop, but this one does: http://stackoverflow.com/questions/24129253/screen-capture-with-opencv-and-python-2-7. It just appears to have a typo on the import: ImageGrab is part of PIL.
import numpy as np
import ImageGrab
import cv2

while(True):
    printscreen_pil = ImageGrab.grab()
    printscreen_numpy = np.array(printscreen_pil.getdata(),dtype=uint8)\
    .reshape((printscreen_pil.size[1],printscreen_pil.size[0],3))
    cv2.imshow('window',printscreen_numpy)
    if cv2.waitKey(25) & 0xFF == ord('q'):
        cv2.destroyAllWindows()
        break
Odd, okay, ImageGrab is part of PIL from what I understand, so we fix that import:
import numpy as np
from PIL import ImageGrab
import cv2

while(True):
    printscreen_pil = ImageGrab.grab()
    printscreen_numpy = np.array(printscreen_pil.getdata(),dtype=uint8)\
    .reshape((printscreen_pil.size[1],printscreen_pil.size[0],3))
    cv2.imshow('window',printscreen_numpy)
    if cv2.waitKey(25) & 0xFF == ord('q'):
        cv2.destroyAllWindows()
        break
More fighting. The dtype should be a string, not what appears to be an undefined variable name. Did this person actually run the code?
import numpy as np
from PIL import ImageGrab
import cv2

def screen_record():
    while True:
        printscreen_pil = ImageGrab.grab()
        printscreen_numpy = np.array(printscreen_pil.getdata(),dtype='uint8')\
        .reshape((printscreen_pil.size[1],printscreen_pil.size[0],3))
        cv2.imshow('window',printscreen_numpy)
        if cv2.waitKey(25) & 0xFF == ord('q'):
            cv2.destroyAllWindows()
            break

screen_record()
Great, this one actually works, to some degree. It's a bit large, though. And slow. Let's tackle the size first.
import numpy as np
from PIL import ImageGrab
import cv2

def screen_record():
    while True:
        # 800x600 windowed mode
        printscreen_pil = ImageGrab.grab(bbox=(0,40,800,640))
        printscreen_numpy = np.array(printscreen_pil.getdata(),dtype='uint8')\
        .reshape((printscreen_pil.size[1],printscreen_pil.size[0],3))
        # swap the channel order, since PIL gives RGB and OpenCV expects BGR
        cv2.imshow('window',cv2.cvtColor(printscreen_numpy, cv2.COLOR_BGR2RGB))
        if cv2.waitKey(25) & 0xFF == ord('q'):
            cv2.destroyAllWindows()
            break

screen_record()
Okay great, this works for the size... but it's still very slow. I'm currently getting about 2-3 frames per second. Let's find out why.
import numpy as np
from PIL import ImageGrab
import cv2
import time

def screen_record():
    last_time = time.time()
    while True:
        # 800x600 windowed mode
        printscreen_pil = ImageGrab.grab(bbox=(0,40,800,640))
        printscreen_numpy = np.array(printscreen_pil.getdata(),dtype='uint8')\
        .reshape((printscreen_pil.size[1],printscreen_pil.size[0],3))
        print('loop took {} seconds'.format(time.time()-last_time))
        last_time = time.time()
##        cv2.imshow('window',cv2.cvtColor(printscreen_numpy, cv2.COLOR_BGR2RGB))
##        if cv2.waitKey(25) & 0xFF == ord('q'):
##            cv2.destroyAllWindows()
##            break

screen_record()
This is still ~2-3 FPS, so the imshow is not the culprit.
import numpy as np
from PIL import ImageGrab
import cv2
import time

def screen_record():
    last_time = time.time()
    while True:
        # 800x600 windowed mode
        printscreen_pil = ImageGrab.grab(bbox=(0,40,800,640))
##        printscreen_numpy = np.array(printscreen_pil.getdata(),dtype='uint8')\
##        .reshape((printscreen_pil.size[1],printscreen_pil.size[0],3))
        print('loop took {} seconds'.format(time.time()-last_time))
        last_time = time.time()
##        cv2.imshow('window',cv2.cvtColor(printscreen_numpy, cv2.COLOR_BGR2RGB))
##        if cv2.waitKey(25) & 0xFF == ord('q'):
##            cv2.destroyAllWindows()
##            break

screen_record()
Oooh, we're on to something now:

loop took 0.05849909782409668 seconds
loop took 0.044053077697753906 seconds
loop took 0.04760456085205078 seconds
loop took 0.04805493354797363 seconds
loop took 0.05989837646484375 seconds

So the grab alone takes roughly 0.05 seconds, which works out to around 20 FPS by itself; it must be the getdata/reshape step that's eating most of our time.
Now, for OpenCV's imshow, we really need a numpy array. What if, rather than doing the whole .getdata() and reshape dance, we just convert ImageGrab.grab(bbox=(0,40,800,640)) directly to a numpy array? Why the reshape at all? The image is already the size we need, and maybe .getdata(), despite being a method, won't be required.
import numpy as np
from PIL import ImageGrab
import cv2
import time

def screen_record():
    last_time = time.time()
    while True:
        # 800x600 windowed mode
        printscreen = np.array(ImageGrab.grab(bbox=(0,40,800,640)))
        print('loop took {} seconds'.format(time.time()-last_time))
        last_time = time.time()
        # swap the channel order, since PIL gives RGB and OpenCV expects BGR
        cv2.imshow('window',cv2.cvtColor(printscreen, cv2.COLOR_BGR2RGB))
        if cv2.waitKey(25) & 0xFF == ord('q'):
            cv2.destroyAllWindows()
            break

screen_record()
Great, this gives me ~12-13 FPS. That's certainly not amazing, but we can work with that.
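If you want to verify the capture rate on your own machine rather than eyeballing the printouts, a quick benchmark like this averages over a batch of frames (the count of 50 is an arbitrary choice of mine):

import time
import numpy as np
from PIL import ImageGrab

# rough FPS check: grab a batch of frames and report the average rate
frames = 50
start = time.time()
for _ in range(frames):
    np.array(ImageGrab.grab(bbox=(0,40,800,640)))
print('average FPS: {}'.format(frames / (time.time() - start)))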
I chose the bbox dimensions to match an 800x600 resolution of GTA V in windowed mode; the extra 40 pixels at the top of the box account for the window's title bar.
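Finally, if ~12-13 FPS ever becomes a bottleneck, the third-party mss library tends to capture the screen faster than PIL's ImageGrab. Here's a minimal sketch of the same loop using it; mss isn't something we're using in this series, so treat it as an optional alternative:

# A sketch of the same capture loop using mss (pip install mss),
# which is usually faster than PIL's ImageGrab for repeated grabs.
import time
import numpy as np
import cv2
from mss import mss

def screen_record_mss():
    # same 800x600 window region, expressed as mss's monitor dict
    monitor = {'top': 40, 'left': 0, 'width': 800, 'height': 600}
    with mss() as sct:
        last_time = time.time()
        while True:
            # sct.grab returns raw BGRA pixels; np.array gives (600, 800, 4)
            frame = np.array(sct.grab(monitor))
            print('loop took {} seconds'.format(time.time()-last_time))
            last_time = time.time()
            # drop the alpha channel so imshow gets a plain BGR image
            cv2.imshow('window', cv2.cvtColor(frame, cv2.COLOR_BGRA2BGR))
            if cv2.waitKey(25) & 0xFF == ord('q'):
                cv2.destroyAllWindows()
                break

screen_record_mss()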