Hello and welcome to part 11 of the Unconventional Neural Networks series. Here, we're going to run through the results from the neural network that does addition, and then we'll go over making this even more advanced.
In the previous tutorial, we checked out some of the results, but let's go ahead and show a bit more. As the model trained, I just manually appended the step number to the end of the output_dev file name. So, for the output_dev on step 15,000, I would call that file output_dev15k. From here, I created a script that would automatically check the output_dev output, comparing the real answer to whatever the model answered. Here's a basic version of that:
```python
with open("data/tst2012.from", "r") as f:
    model_in = f.read().split('\n')

with open("model/output_dev15k", "r") as f:
    model_out = f.read().split('\n')

with open("data/tst2012.to", "r") as f:
    actual_out = f.read().split('\n')

correct = 0
total = 0

for i, _ in enumerate(model_out[:-1]):
    print("Input: {}. Desired Out: {} || Model out: {}".format(model_in[i].replace(" ", ""),
                                                               actual_out[i].replace(" ", ""),
                                                               model_out[i].replace(" ", "")))
    if actual_out[i] == model_out[i]:
        correct += 1
        print("YAAAAAAAAAYY!!!!")
    total += 1

print(correct/total)
```
At 15K steps, the model is already 1.2% accurate, which isn't all that bad, considering the type of model this is and the complexity of the challenge (at least for a neural network). Regardless, I continued training overnight until the model was complete, saving the outputs along the way as best I could, but the last ~150K steps occurred while I slept, so I didn't get all of those.
As time went on though, I wanted another metric besides loss and total accuracy to see how training was going. With something like math, the final accuracy is certainly a decent metric, but another useful one is "how wrong" an answer is, kind of like how loss works, only a bit more applicable to this exact problem. I wondered about tracking the absolute difference between the right answer and the predicted answer, to see if that metric also improved over time. Even if a model never got a single answer exactly right, we'd still likely agree that it was at least learning if its answers were getting closer and closer to the right ones.
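To make the idea concrete, here's a minimal sketch of the metric on its own, using made-up predicted/actual values rather than the real output_dev files:

```python
# Made-up model outputs and ground truths, purely to illustrate the metric.
predicted = [101, 57, 8]
actual = [100, 55, 8]

# Exact-match accuracy only counts perfect answers...
accuracy = sum(p == a for p, a in zip(predicted, actual)) / len(actual)

# ...while the sum of absolute differences rewards getting *closer*,
# even when no new answers are exactly right.
total_difference = sum(abs(a - p) for a, p in zip(actual, predicted))

print(accuracy)          # 0.3333333333333333
print(total_difference)  # 3
```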
So then I made some modifications to the code:
```python
with open("data/tst2012.from", "r") as f:
    model_in = f.read().split('\n')

with open("model/output_dev25k", "r") as f:
    model_out = f.read().split('\n')

with open("data/tst2012.to", "r") as f:
    actual_out = f.read().split('\n')

correct = 0
total = 0
total_difference = 0

for i, _ in enumerate(model_out[:-1]):
    print("Input: {}. Desired Out: {} || Model out: {}".format(model_in[i].replace(" ", ""),
                                                               actual_out[i].replace(" ", ""),
                                                               model_out[i].replace(" ", "")))
    total_difference += abs(int(actual_out[i].replace(" ", "")) - int(model_out[i].replace(" ", "")))
    if actual_out[i] == model_out[i]:
        correct += 1
        print("YAAAAAAAAAYY!!!!")
    total += 1

print(correct/total)
print(total_difference)
```
This time it just prints the total difference too.
At 15K steps, the total accuracy was 1.2% and the total difference was 46,761. At 25K steps, the accuracy was actually 0.4%, so less than at 15K, but the total difference had decreased to 21,513. Even though the accuracy went down, the overall "closeness" to the right answers improved. By 50K steps, accuracy was 9.8%, and the total difference was only 2,044. Big improvements!
Finally, I came up with a 3rd version that tracked these changes over time:
```python
with open("data/tst2012.from", "r") as f:
    model_in = f.read().split('\n')

with open("data/tst2012.to", "r") as f:
    actual_out = f.read().split('\n')

versions = [5, 10, 15, 20, 25, 30, 33, 39, 44, 50, 55, 62, 85, 90, 98, 103, 257]

for v in versions:
    with open("model/output_dev{}k".format(v), "r") as f:
        model_out = f.read().split('\n')

    correct = 0
    total = 0
    total_difference = 0

    for i, _ in enumerate(model_out[:-1]):
        total_difference += abs(int(actual_out[i].replace(" ", "")) - int(model_out[i].replace(" ", "")))
        if actual_out[i] == model_out[i]:
            correct += 1
        total += 1

    print("{}K Correct rate: {}. Sum of differences: {}".format(v, correct/total, total_difference))
```
The versions list is just all of the step numbers (in thousands) that I had saved outputs for. Then, for each version, we print out the accuracy and the total difference:
```
5K Correct rate: 0.0. Sum of differences: 394562
10K Correct rate: 0.002. Sum of differences: 47611
15K Correct rate: 0.012. Sum of differences: 46761
20K Correct rate: 0.002. Sum of differences: 18832
25K Correct rate: 0.004. Sum of differences: 21513
30K Correct rate: 0.026. Sum of differences: 8705
33K Correct rate: 0.036. Sum of differences: 6113
39K Correct rate: 0.038. Sum of differences: 13964
44K Correct rate: 0.068. Sum of differences: 12757
50K Correct rate: 0.098. Sum of differences: 2044
55K Correct rate: 0.088. Sum of differences: 1799
62K Correct rate: 0.168. Sum of differences: 2169
85K Correct rate: 0.434. Sum of differences: 434
90K Correct rate: 0.476. Sum of differences: 405
98K Correct rate: 0.524. Sum of differences: 356
103K Correct rate: 0.526. Sum of differences: 354
257K Correct rate: 1.0. Sum of differences: 0
```
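If you want to eyeball both trends at once, a quick sketch (assuming matplotlib is available) plots the correct rate and the sum of differences from the run above on shared step axes:

```python
import matplotlib
matplotlib.use("Agg")  # render to a file, no display needed
import matplotlib.pyplot as plt

# Step counts (in thousands) and the two metrics from the run above.
steps = [5, 10, 15, 20, 25, 30, 33, 39, 44, 50, 55, 62, 85, 90, 98, 103, 257]
correct_rates = [0.0, 0.002, 0.012, 0.002, 0.004, 0.026, 0.036, 0.038,
                 0.068, 0.098, 0.088, 0.168, 0.434, 0.476, 0.524, 0.526, 1.0]
differences = [394562, 47611, 46761, 18832, 21513, 8705, 6113, 13964,
               12757, 2044, 1799, 2169, 434, 405, 356, 354, 0]

fig, ax1 = plt.subplots()
ax1.plot(steps, correct_rates, "b-", label="correct rate")
ax1.set_xlabel("Steps (thousands)")
ax1.set_ylabel("Correct rate")

# The differences span several orders of magnitude and end at 0,
# so use symlog rather than a plain log scale.
ax2 = ax1.twinx()
ax2.plot(steps, differences, "r-", label="sum of differences")
ax2.set_yscale("symlog")
ax2.set_ylabel("Sum of differences")

fig.savefig("training_metrics.png")
```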
By the end of training, this model learned to do addition with 100% accuracy. If you watch the video, you'll see I was somewhat surprised that inference didn't have perfect accuracy. The inference in this case was custom-made for a chatbot, with after-the-fact scoring mechanisms in place. One of the main things is that the scoring prefers longer responses, which is actually why the smaller numbers seemingly weren't doing as well. A custom inference script could be made for the true, raw output, using the same code as what generates the output_dev files, for example. Anyway, pretty cool! We learned addition!
Want the model? I've uploaded it here: NMT-Addition Model 257K steps, which also contains the settings.
So if we can do addition, what about all of the math operators? Division, subtraction, addition, and multiplication, all at once?
Here's a data generation script for doing all 4 types of operations:
```python
import random

hm_samples = 10000000
max_val = 100000
operators = ['add', 'sub', 'mul', 'div']


def generate_pair(action):
    x = random.randrange(1, max_val)
    y = random.randrange(1, max_val)
    if action == 'add':
        result = x+y
        symbol = "+"
    elif action == 'sub':
        result = x-y
        symbol = "-"
    elif action == 'mul':
        result = x*y
        symbol = "*"
    elif action == 'div':
        result = round(x/y, 7)
        symbol = "/"
    str_in = "{}{}{}\n".format(x, symbol, y)
    str_out = "{}\n".format(result)
    return str_in, str_out


def test_gen_pair(method='sub'):
    str_in, str_out = generate_pair(method)
    print(str_in)
    print(str_out)


if __name__ == "__main__":
    #test_gen_pair()
    with open("train.from", "a") as fin:
        with open("train.to", "a") as fout:
            for i in range(hm_samples):
                str_in, str_out = generate_pair(random.choice(operators))
                fin.write(str_in)
                fout.write(str_out)

    with open("tst2012.from", "a") as fin1:
        with open("tst2013.from", "a") as fin2:
            with open("tst2012.to", "a") as fout1:
                with open("tst2013.to", "a") as fout2:
                    for i in range(500):
                        str_in, str_out = generate_pair(random.choice(operators))
                        fin1.write(str_in)
                        fin2.write(str_in)
                        fout1.write(str_out)
                        fout2.write(str_out)
```
The above will produce equations like:
```
78049-1609
82342-60624
83188*70507
4988+18198
21562/25607
24494/2506
2305-7721
45157*60121
50226+31208
62895+94793
94956+18861
59858+53243
70692+20065
8614/47356
```
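Note that the checker scripts earlier stripped spaces out of the output_dev lines, which suggests the seq2seq pipeline treats each character as a token (space-separated). If your preprocessing expects that format, these raw equation strings would need a character-level split first; here's a minimal sketch with a hypothetical helper (not part of the project's actual preprocessing):

```python
def char_tokenize(line):
    # Split an equation string into space-separated characters,
    # e.g. "78049-1609" -> "7 8 0 4 9 - 1 6 0 9"
    return " ".join(line)

print(char_tokenize("78049-1609"))  # 7 8 0 4 9 - 1 6 0 9
```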
I am curious about these results, but I am *also* curious about even MORE complex math. Since the model was able to learn addition, I am confident that the multi-operator version will probably work decently well. I still want to run and test the above, but...what about making waaaaaaaay more complex operations? Here's the complex-math code:
```python
import random
from collections import defaultdict

hm_test = 500
hm_samples = 10000000 + hm_test
max_val = 100000
max_number_of_nums = 10
operators = ["+", "-", "*", "/"]

equations = {}

while len(equations) < hm_samples:
    nums = [random.randrange(1, max_val) for _ in range(random.randrange(2, max_number_of_nums))]
    number_of_parenthesis = random.randrange(0, min(4, len(nums)-2)) if len(nums) > 2 else 0
    opening_parenthesis = defaultdict(lambda: 0)
    closing_parenthesis = defaultdict(lambda: 0)

    for _ in range(number_of_parenthesis):
        opening_parenthesis_position = random.randrange(0, len(nums)-1)
        if opening_parenthesis[opening_parenthesis_position] > 0 and opening_parenthesis[opening_parenthesis_position] + 1 in closing_parenthesis.values():
            continue
        opening_parenthesis[opening_parenthesis_position] += 1
        closing_parenthesis_position = random.randrange(opening_parenthesis_position + 1, len(nums))
        if closing_parenthesis[closing_parenthesis_position] > 0 and closing_parenthesis[closing_parenthesis_position] + 1 in opening_parenthesis.values():
            opening_parenthesis[opening_parenthesis_position] -= 1
            continue
        closing_parenthesis[closing_parenthesis_position] += 1

    init_str = ''

    while opening_parenthesis[0] > 0 and closing_parenthesis[len(nums)-1] > 0:
        opening_parenthesis[0] -= 1
        closing_parenthesis[len(nums)-1] -= 1

    for index, num in enumerate(nums):
        while opening_parenthesis[index] > 0 and closing_parenthesis[index] > 0:
            opening_parenthesis[index] -= 1
            closing_parenthesis[index] -= 1
        operator = random.choice(operators) if init_str != '' else ''
        init_str += "{}{}{}{}".format(operator, '('*opening_parenthesis[index], str(num), ')'*closing_parenthesis[index])

    try:
        equations[init_str] = eval(init_str)
    except:
        pass

#print('\n'.join([k + ' = ' + str(v) for k, v in equations.items()]))

with open("train.from", "a") as fin:
    with open("train.to", "a") as fout:
        for k, v in list(equations.items())[:-hm_test]:
            fin.write(k)
            fin.write('\n')
            fout.write(str(v))
            fout.write('\n')

with open("tst2012.from", "a") as fin1:
    with open("tst2013.from", "a") as fin2:
        with open("tst2012.to", "a") as fout1:
            with open("tst2013.to", "a") as fout2:
                for k, v in list(equations.items())[-hm_test:]:
                    fin1.write(k)
                    fin1.write('\n')
                    fin2.write(k)
                    fin2.write('\n')
                    fout1.write(str(v))
                    fout1.write('\n')
                    fout2.write(str(v))
                    fout2.write('\n')
```
This makes equations like:
```
63166/21707+25193-26327+14443*20117*67066/91296
22564/(15291+65142*83720*6457+91001/70325+9577)
77861+((43454-88314*(78299/77643)+40734)/61134/46151)
90584+26054+54674
91680/(49369+(99777-91774-(1089-58896/99825*83470)/42034))
23831*58422+51593+55339
51065+96120*50507
82385*54087/45899
52283-(37808*86291+25851)*62242
58635*72485
80418/87375*(71408-38976)
52734*35731*80873-5370+89899/64551
15100-9067/51953/49726
34087+89287
25126*90947*43776
52241/78092-54404
84155*24269+61062
34993-29484/13714/98436
39590*33244+(48665/11603*45145*44756)-(17328+35983)
```
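One thing to keep in mind: since the labels come from Python's eval, the targets follow Python's normal operator precedence (* and / before + and -) and true division. A quick check on a couple of the generated strings:

```python
# Addition-only example: evaluates left to right.
print(eval("90584+26054+54674"))  # 171312

# / is true division, so many targets end up as long floats
# rather than clean integers.
print(eval("52241/78092-54404"))
```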
Solving equations like that, or even getting close, is a pretty cool challenge. Let's see how we do!