Friday, January 11, 2019

Generating Text Character by Character


One of the mysteries of machine learning is that one can take a very simple LSTM model, train it on a lot of text character by character (not word by word), then feed it a seed of real text and have it generate new text. And this does not fail horribly; it actually generates reasonable text.
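To make "character by character" concrete, here is a minimal sketch of how the training examples for such a model are commonly built: slide a fixed-length window over the text and ask the model to predict the character that follows. The function name `make_char_dataset`, the window length, and the sample string are all assumptions for illustration, not the setup used for this post, and the LSTM itself is left out.

```python
def make_char_dataset(text, seq_len=10, step=1):
    """Turn raw text into (window of character ids, next character id) pairs."""
    # The vocabulary is simply every distinct character in the text.
    chars = sorted(set(text))
    char_to_idx = {c: i for i, c in enumerate(chars)}

    inputs, targets = [], []
    # Stop early enough that every window still has a character after it.
    for start in range(0, len(text) - seq_len, step):
        window = text[start:start + seq_len]
        next_char = text[start + seq_len]
        inputs.append([char_to_idx[c] for c in window])
        targets.append(char_to_idx[next_char])
    return chars, inputs, targets


# Hypothetical sample text; a real run would use a much larger corpus.
sample = "hello world, hello again"
chars, inputs, targets = make_char_dataset(sample, seq_len=5)

# Decode the first training pair back into readable characters.
first_window = "".join(chars[i] for i in inputs[0])
first_target = chars[targets[0]]
print(repr(first_window), "->", repr(first_target))
```

Generation then runs the same idea in reverse: start from a seed window, predict one character, append it, drop the oldest character, and repeat.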

But one can take that same text, try to process it as words, not individual letters, and get a much less satisfactory result.
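The only change between the two setups is how the text is split into tokens. The sketch below splits one string both ways and counts tokens, distinct tokens, and tokens that occur exactly once. The helper `token_stats`, the whitespace split, and the sample sentence are assumptions for illustration; it shows the difference in vocabulary, not a diagnosis of why the word-level results are worse.

```python
from collections import Counter


def token_stats(tokens):
    """Count total tokens, distinct tokens, and tokens that occur exactly once."""
    counts = Counter(tokens)
    seen_once = sum(1 for n in counts.values() if n == 1)
    return {"tokens": len(tokens), "vocab": len(counts), "seen_once": seen_once}


# Hypothetical sample text; a real corpus would be far larger.
sample = "the cat sat on the mat and the dog sat on the log"

char_stats = token_stats(list(sample))    # one token per character
word_stats = token_stats(sample.split())  # one token per whitespace-separated word

print("characters:", char_stats)
print("words:     ", word_stats)
```

Even on a toy sentence, the same text gives the word-level model fewer tokens to learn from and a larger share of tokens it sees only once.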

Maybe I am doing something wrong? Oh, no doubt.

