Sunday, September 3, 2023

Fundamentals for LLMs and Image Diffusion

 
I am enjoying Midjourney very much. I presume I would be as impressed with any LLM (large language model) driven image diffusion system, but this is the only one I have played with. I feel completely awed and blindsided by the range of capabilities I have seen here. Where did this come from? How does it work? To save you the effort of finding out, I am endeavoring to collect a list of topics you will need at least a passing familiarity with in order to understand this modern miracle. You probably won't need to be a genius at any of them, but you may need to know something about all of them if you want to know what is going on beneath the hood and maybe predict where this kind of technology is going.
 
To give you an idea of why I find this so compelling and disturbing, here is the result of "/imagine vintage photograph of king kong climbing the Chrysler building".
 
 

 
A partial list of the technologies involved includes:
 
Latent Variable Models
Variational Autoencoders (VAEs)
Generative Adversarial Networks (GANs)
Semantic Image Synthesis
 
and more to come.