Saving our Data For Training and Testing

Next up, we're going to start using our example numbers data. We want to create image arrays out of our numbers data and save them, so that we can reference them later for pattern recognition.

For this, we're going to create a "createExamples" function:

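from PIL import Image
import numpy as np

def createExamples():
    # open in append mode, so every run adds lines to the end of the file
    numberArrayExamples = open('numArEx.txt','a')
    numbersWeHave = range(1,10)
    for eachNum in numbersWeHave:
        for furtherNum in numbersWeHave:
            # you could also literally add it *.1 and have it create
            # an actual float, but, since in the end we are going
            # to use it as a string, this way will work.
            imgFilePath = 'images/numbers/'+str(eachNum)+'.'+str(furtherNum)+'.png'
            # load the example image (assuming PIL's Image, as used earlier in the series)
            ei = Image.open(imgFilePath)
            eiar = np.array(ei)
            eiarl = str(eiar.tolist())

            # one example per line: the number, a '::' separator, then the array
            lineToWrite = str(eachNum)+'::'+eiarl+'\n'
            numberArrayExamples.write(lineToWrite)
    numberArrayExamples.close()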

I left a few comments in there, but you may also want to watch the video if you're finding yourself confused on this function. The purpose of this function is to literally just append the image's array to the file so we can reference it later.
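To make "reference it later" a bit more concrete, here is a minimal sketch of reading those lines back in. The helper names parseExampleLine and loadExamples are made up for this illustration, and the sample line is a tiny invented 2x2 image rather than real data. It assumes each line looks like number::array, which is what lineToWrite builds above.

```python
import ast

def parseExampleLine(line):
    """Split one 'number::array' line into (label, nested list of pixels)."""
    label, arrayText = line.rstrip('\n').split('::', 1)
    # literal_eval safely turns the string form of a list back into a list
    return label, ast.literal_eval(arrayText)

def loadExamples(path='numArEx.txt'):
    """Read every stored example from the flat file (hypothetical helper)."""
    with open(path) as f:
        return [parseExampleLine(line) for line in f if line.strip()]

# an invented 2x2 RGBA "image", in the same format createExamples writes
sample = '3::[[[255, 255, 255, 255], [0, 0, 0, 255]], [[0, 0, 0, 255], [255, 255, 255, 255]]]\n'
label, pixels = parseExampleLine(sample)
print(label, pixels[0][1])
```

Using ast.literal_eval rather than eval is a deliberate choice here: it only accepts plain Python literals, so a stray line in the file cannot run arbitrary code.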

In this, we're just using a flat file as our database. This is fine for smaller data sets, but you may want to look into working with databases, either SQLite or MySQL, in the future.

SQLite is a lightweight SQL database engine. It also stores everything in a single file, but it is going to be a bit more efficient than using something like a .txt file.

SQLite tutorial here
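If you're curious what the SQLite route might look like, here is a rough sketch using Python's built-in sqlite3 module. The table name examples, its columns, and the saveExample helper are all assumptions made up for this illustration, and the pixel data is invented. It stores the array as a string, just like the .txt version does.

```python
import sqlite3

def saveExample(conn, label, pixels):
    """Store one example; the array is kept as its string form (hypothetical helper)."""
    conn.execute('INSERT INTO examples (label, pixels) VALUES (?, ?)',
                 (str(label), str(pixels)))

# ':memory:' keeps this demo off the disk; pass a filename instead to persist it
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE IF NOT EXISTS examples (label TEXT, pixels TEXT)')

# two invented one-row "images"
saveExample(conn, 3, [[[255, 255, 255, 255], [0, 0, 0, 255]]])
saveExample(conn, 7, [[[0, 0, 0, 255], [0, 0, 0, 255]]])
conn.commit()

# pull back only the examples for one number
rows = conn.execute('SELECT label, pixels FROM examples WHERE label = ?', ('3',)).fetchall()
print(rows)
```

The main gain over the flat file is the query at the end: you can ask for just the examples of one number instead of reading and splitting every line.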

MySQL is probably the most popular SQL database system. Unlike SQLite, it runs as a separate server that your program connects to.

MySQL tutorial here

Running the function createExamples() should now create the numArEx.txt file and populate it with number arrays. With these, we can then take new numbers, threshold if necessary, then compare the current number array with our known number patterns, making an educated guess on what the number we're looking at is.
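As a rough idea of what "compare and make an educated guess" could mean, here is one possible sketch: give each known example a vote for every pixel that matches the unknown image, then pick the number with the most votes. This is only an illustration under simplifying assumptions, not necessarily how the rest of this series does it. The guessNumber function is hypothetical, and the "images" are invented flat lists of 0s and 1s standing in for thresholded pixels.

```python
from collections import Counter

def guessNumber(unknownPixels, examples):
    """Return the label whose examples share the most pixels with the unknown image.

    examples is a list of (label, pixels) pairs, where pixels is a flat list.
    Returns None if nothing matches at all.
    """
    votes = Counter()
    for label, knownPixels in examples:
        for known, unknown in zip(knownPixels, unknownPixels):
            if known == unknown:
                votes[label] += 1
    return votes.most_common(1)[0][0] if votes else None

# invented 6-pixel patterns, purely for demonstration
examples = [('1', [0, 1, 0, 0, 1, 0]),
            ('7', [1, 1, 1, 0, 0, 1])]
unknown = [1, 1, 1, 0, 1, 1]

guess = guessNumber(unknown, examples)
print(guess)
```

With several examples stored per number, the votes for each number simply add up, which is why having more examples per number tends to help.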
