Welcome to part 7 of the chatbot with Python and TensorFlow tutorial series. Here, we're going to discuss our model. There are endless models that you could come up with and use, or find online and adapt to your needs. My main interest was in sequence to sequence models, since a sequence to sequence model can be used for a chatbot, sure, but can also be used for a whole host of other things. Basically everything in life can be reduced to sequences being mapped to sequences, so we could train quite a few things this way. For now, though: I want a chatbot.
At the time when I began my quest for a chatbot, I stumbled on the original TensorFlow translation seq2seq tutorial, which focused on translating English to French and did a decent job of it. Unfortunately, that model is now deprecated due to changes in seq2seq. There is a legacy seq2seq that you can bring in with up-to-date TensorFlow, but I've never gotten it to work. Instead, if you want to use that model, you'll probably need to downgrade TensorFlow (pip install tensorflow-gpu==1.0.0). Alternatively, you can look into the newer Neural Machine Translation (NMT) models, which use the latest and greatest seq2seq from TensorFlow. The latest NMT tutorial and code from TensorFlow can be found here: Neural Machine Translation (seq2seq) Tutorial.
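If you do go the downgrade route, I'd suggest doing it inside a separate virtual environment so you don't clobber your main TensorFlow install. A quick sketch (the environment name is just an example):

$ python3 -m venv tf-legacy
$ source tf-legacy/bin/activate
$ pip install tensorflow-gpu==1.0.0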
We're going to be working with a project that I have been working on with my friend, Daniel.
The project's location is: NMT Chatbot, which is a set of utilities built on top of TensorFlow's NMT code.
The project is subject to change, so you should check the readme, which, at the time of my writing this, says:
$ git clone --recursive https://github.com/daniel-kukiela/nmt-chatbot
$ cd nmt-chatbot
$ pip install -r requirements.txt
$ cd setup
(optional) Edit settings.py to your liking. These are a decent starting point for ~4gb of VRAM; you should first start by trying to raise vocab if you can.
(optional) Edit the text files containing rules in the setup directory.
Place training data inside the "new_data" folder (train.(from|to), tst2012.(from|to), tst2013.(from|to)). We have provided some sample data for those who just want to do a quick test drive.
$ python prepare_data.py (run setup/prepare_data.py - a new folder called "data" will be created with the prepared training data)
$ cd ../
$ python train.py (begin training)
So let's do that! We will first set this up, get it running, and then I will explain the major concepts that you should understand.
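At the time of my writing this, the initial setup boils down to the clone and install steps straight from the readme above:

$ git clone --recursive https://github.com/daniel-kukiela/nmt-chatbot
$ cd nmt-chatbot
$ pip install -r requirements.txt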
If you need more processing power, check out Paperspace with this $10 credit, which should get you enough time to at least get something decent. I've been using them for a while now, and really love how quickly I can just boot up their "ML-in-a-Box" options and immediately train a model.
Make sure you download the package recursively, or manually get the nmt package, either from the fork in our repo or from the official TensorFlow source. Our fork just has one change to the version checking, which, at least at the time, required a very specific version (1.4.0) that wasn't actually necessary. That might be fixed by the time you're going through this, but we may also make further changes to the core NMT code.
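If you already cloned without --recursive, you can pull the nmt submodule in after the fact with standard git commands:

$ cd nmt-chatbot
$ git submodule update --init --recursive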
Once downloaded, edit setup/settings.py. If you don't really know what you're doing, that's okay; you don't need to modify anything. The preset settings will require ~4GB of VRAM, but should still produce an at least somewhat coherent model. Charles v2 was trained with the following settings, plus 'vocab_size': 100000 (set earlier in the script):
hparams = {
    'attention': 'scaled_luong',
    'src': 'from',
    'tgt': 'to',
    'vocab_prefix': os.path.join(train_dir, "vocab"),
    'train_prefix': os.path.join(train_dir, "train"),
    'dev_prefix': os.path.join(train_dir, "tst2012"),
    'test_prefix': os.path.join(train_dir, "tst2013"),
    'out_dir': out_dir,
    'num_train_steps': 500000,
    'num_layers': 2,
    'num_units': 512,
    'override_loaded_hparams': True,
    'learning_rate': 0.001,
#    'decay_factor': 0.99998,
    'decay_steps': 1,
#    'residual': True,
    'start_decay_step': 1,
    'beam_width': 10,
    'length_penalty_weight': 1.0,
    'optimizer': 'adam',
    'encoder_type': 'bi',
    'num_translations_per_input': 30,
}
I manually decayed the learning rate, since Adam really doesn't need a gradual decay (the "Ada" in Adam stands for adaptive, the "m" for moment, so Adam = adaptive moment estimation). I started at 0.001, then halved to 0.0005, then 0.00025, then 0.0001. Depending on how much data you have, you don't necessarily want to decay per set number of steps. When using Adam, I would suggest decaying once every 1-2 epochs. The default batch size is 128, so you can calculate the exact number of steps per epoch if you still want to set it up to decay automatically. If you use an SGD optimizer, then the commented-out decay factor is good, and you'd want to start the learning rate at 1.
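To give a concrete sense of where those drops land, here's a quick back-of-the-envelope sketch. The dataset size here is purely hypothetical; swap in however many training pairs you actually have:

# Rough sketch: mapping the manual learning-rate drops to step counts.
# num_training_pairs is hypothetical - use the line count of your train.from file.
num_training_pairs = 3000000
batch_size = 128  # nmt-chatbot's default

steps_per_epoch = num_training_pairs // batch_size  # 23437 steps in this example

# The manual schedule from above, dropped once per epoch in this sketch
# (anywhere in the 1-2 epoch range is reasonable):
lr_schedule = [0.001, 0.0005, 0.00025, 0.0001]
for epoch, lr in enumerate(lr_schedule):
    print("around step", epoch * steps_per_epoch, "-> set learning_rate to", lr)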
Once you've got your settings all set, inside the main dir (with the utils, tests, and setup directories), throw your train.to and train.from files, along with the matching tst2012 and tst2013 files, into the new_data directory.
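At that point, new_data should look something like this (the names are what matter; each .from file pairs with its .to file):

$ ls new_data
train.from  train.to  tst2012.from  tst2012.to  tst2013.from  tst2013.to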
Now cd setup and run the prepare_data.py file:
$ python3 prepare_data.py
Finally, cd ../ and then:
$ python3 train.py
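Training will take a while. If you want to keep an eye on progress, TensorFlow's NMT code writes TensorBoard summaries under the output directory, so you can point TensorBoard at it; a sketch, assuming your out_dir is a folder called model (adjust the path to match your settings):

$ tensorboard --logdir=model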
In the next tutorial, we're going to discuss in more depth how the model works, the parameters, and the metrics involved in training.