Welcome to part 3 of the TensorFlow Object Detection API tutorial series. In this part and the subsequent few, we're going to cover how we can track and detect our own custom objects with this API. If you watch the video, I am making use of Paperspace. If you need a high-end GPU, you can use their cloud-desktop solution with that referral link for a $10 credit, which is more than enough to finish this miniseries (train time is ~1 hour on their $0.40/hr GPU)
Going from using the pre-built models to adding custom objects is a decent jump from my findings, and I could not locate any full step-by-step guides, so hopefully I can save you all from the struggle. Once solved, the ability to train for any custom object you can think of (and create data for) is an awesome skill to have.
Alright, so a brief overview of the steps needed to do this:
So, for this tutorial, I needed an object. I wanted something useful, but that wasn't already done. Obviously, everyone needs to know where the macaroni and cheese is, so let's track that!
I used Google Images, Bing, and ImageNet to collect some images of Mac n Cheese. In general, pictures around the size of 800x600, not too large and not too small.
For this tutorial, you can track *anything* you want, you just need 100+ images. Once you have images, you need to annotate them. For this, I am going to use LabelImg, you can grab it with
git clone https://github.com/tzutalin/labelImg, or just download and extract the zip.
Installation instructions are on the labelimg github, but for Python3 on Ubuntu:
sudo apt-get install pyqt5-dev-tools
sudo pip3 install lxml
When running this, you should get a GUI window. From here, choose to
open dir and pick the directory that you saved all of your images to. Now, you can begin to annotate with the
create rectbox button. Draw your box, add the name in, and hit ok. Save, hit next image, and repeat! You can press the
w key to draw the box and do
ctrl+s to save faster. Not sure if there's a shortcut for the next image.
Once you have over 100 images labeled, we're going to separate them into training and testing groups. To do this, just copy about 10% of your images and their annotation XML files to a new dir called
test and then copy the remaining ones to a new dir called
Once you've done all of this, you're ready to go to the next tutorial, where we're going to cover how we can create the required TFRecord files from this data.
Alternatively, if you would like to just use my pre-made files, you can download my labeled macaroni and cheese.