DLCV - Generative Models (I) - AE, VAE & GAN
Generative Models
Discriminative vs. Generative Models
| Discriminative Model | Generative Model | Conditional Generative Model |
| --- | --- | --- |
| Learn a probability distribution p(y\|x) | Learn a probability distribution p(x) | Learn p(x\|y) |
- Discriminative Models
- Model posteriors p(y|x) from likelihoods p(x|y), where x is the input data and y indicates the class of interest; the likelihood p(x|y) corresponds to conditional generation
- Example (posterior): p(y = cat | x), i.e., how likely a given image x is a cat
- In unconditional generation, samples are not drawn from a specific class set (e.g., the set of cat images), so a generated image may not look like a cat
- The possible labels for each input "compete" for probability mass, but there is no competition between images
- No way for the model to handle unreasonable inputs; it must output a label distribution for every image
- Goal: Learn a (posterior) probability distribution p(y|x)
- Task: Assign labels to each instance x (e.g., classification, regression, etc.)
- Supervised learning
- Generative Model
- All possible imgs compete with each other for probability mass
- Model can “reject” unreasonable inputs by assigning them small values
- Goal: Learn a probability distribution p(x)
- Task: Data representation, generation, detect outliers, etc.
- (Mostly) unsupervised learning
- Conditional Generative Model
- Each possible label induces a competition among all imgs
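Not stated explicitly above, but the standard relation connecting the three distributions is Bayes' rule: a conditional generative model can be assembled from a discriminative model p(y|x), an unconditional generative model p(x), and a prior over labels p(y):

$$p(x \mid y) = \frac{p(y \mid x)\, p(x)}{p(y)}$$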
Autoencoder (AE)
- unsupervised learning
- Autoencoding = encoding itself with recovery purposes
- In other words, encode/decode data with reconstruction guarantees
- Latent variables/features as deep representations
- Example objective/loss function at output:
- L2 norm between input and output, i.e., $\|x - \hat{x}\|_2^2$
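A minimal PyTorch sketch of such an autoencoder, assuming flattened 28x28 inputs and a hypothetical 64-dim latent space (layer sizes and names are illustrative only):

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, in_dim=784, latent_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),          # latent representation z
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, in_dim),              # reconstruction x_hat
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

model = AutoEncoder()
x = torch.randn(32, 784)                         # dummy batch
x_hat, z = model(x)
loss = nn.functional.mse_loss(x_hat, x)          # L2 reconstruction loss
loss.backward()
```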
AE for Learning Latent Variables/Representations
- Why/when AE may be favorable?
- i.e., unsupervised learning for latent representation…
- Any application that comes to mind? -> anomaly detection (e.g., detecting damaged chips, as in TSMC's chip-defect inspection)
- Train autoencoder (AE) for downstream tasks
- Train AE with reconstruction guarantees
- Keep encoder (and the derived features) for downstream tasks (e.g., classification)
- Thus, a trained encoder can be applied to initialize a supervised model
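A sketch of this recipe, reusing the AutoEncoder class from the sketch above; the checkpoint path, the 10-class head, and the frozen-vs.-fine-tuned choice are all assumptions:

```python
import torch
import torch.nn as nn

ae = AutoEncoder()
# ae.load_state_dict(torch.load("ae_pretrained.pt"))   # hypothetical pretrained checkpoint

classifier = nn.Sequential(
    ae.encoder,                  # reuse the trained encoder as a feature extractor
    nn.Linear(64, 10),           # new classification head (10 classes assumed)
)

logits = classifier(torch.randn(32, 784))
labels = torch.randint(0, 10, (32,))
ce_loss = nn.functional.cross_entropy(logits, labels)   # supervised fine-tuning loss
```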
Variational Autoencoder
Probabilistic Spin on AE
- Learn latent features z from raw input data x
- Sample from the latent space (via the prior p(z)) to generate data
- For simplicity, assume a simple prior p(z) (e.g., Gaussian)
- Learn p(x|z) via a NN as a (probabilistic) decoder
Remarks
- Training objective: maximize the likelihood of the data, $p_\theta(x)$
- Note that we can't possibly observe all latents z & need to marginalize them out: $p_\theta(x) = \int p_\theta(x \mid z)\, p(z)\, dz$
- We can compute $p_\theta(x \mid z)$ with the decoder module, and we assume a Gaussian prior for $p(z)$
- Still, we can't integrate over all possible z!
- What else can we do? Recall that we have Bayes' rule: $p_\theta(x) = \dfrac{p_\theta(x \mid z)\, p(z)}{p_\theta(z \mid x)}$
- We still need $p_\theta(z \mid x)$, which is not explicitly known. Instead, we train the encoder module to learn an approximation $q_\phi(z \mid x) \approx p_\theta(z \mid x)$
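Putting these pieces together (the standard step in the VAE derivation, sketched here for completeness): taking the expectation over $q_\phi(z \mid x)$ gives the evidence lower bound (ELBO) that is actually maximized during training:

$$\log p_\theta(x) \;\ge\; \mathbb{E}_{z \sim q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] \;-\; D_{\mathrm{KL}}\big(q_\phi(z \mid x)\,\|\,p(z)\big)$$

The first term encourages faithful reconstruction; the second keeps the encoder's distribution close to the prior.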
Reparameterization Trick in VAE
- Remarks
- Given x, sample z from latent distribution (described by output parameters of encoder)
- However, this creates a bottleneck since backpropagation (BP) cannot flow through the sampling operation
- Alternatively, we apply $z = \mu + \sigma \odot \epsilon$, where $\epsilon$ is simply sampled from a standard Normal distribution
- This enables BP gradients in the encoder through μ and σ, while maintaining stochasticity via ε (for generative synthesis purposes)
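A minimal sketch of the trick in PyTorch, assuming the encoder outputs mu and log_var for each input (names are illustrative):

```python
import torch

def reparameterize(mu, log_var):
    std = torch.exp(0.5 * log_var)      # sigma
    eps = torch.randn_like(std)         # eps ~ N(0, I), sampled outside the graph
    return mu + std * eps               # z = mu + sigma * eps, differentiable w.r.t. mu and sigma
```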
Implementation of VAE Training
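A minimal sketch of one VAE training step, assuming flattened 784-dim inputs in [0, 1], a 64-dim latent space, and a Bernoulli-style reconstruction loss (BCE); all layer sizes and names are illustrative assumptions, not the exact implementation from the lecture:

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, in_dim=784, latent_dim=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU())
        self.fc_mu = nn.Linear(256, latent_dim)
        self.fc_logvar = nn.Linear(256, latent_dim)
        self.dec = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, in_dim), nn.Sigmoid(),
        )

    def forward(self, x):
        h = self.enc(x)
        mu, log_var = self.fc_mu(h), self.fc_logvar(h)
        z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)  # reparameterization (see sketch above)
        return self.dec(z), mu, log_var

model = VAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.rand(32, 784)                                   # dummy batch in [0, 1]
x_hat, mu, log_var = model(x)
recon = nn.functional.binary_cross_entropy(x_hat, x, reduction="sum")
kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())  # KL(q(z|x) || N(0, I))
loss = recon + kl                                         # negative ELBO
opt.zero_grad()
loss.backward()
opt.step()
```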
Diffusion Models
Denoising Diffusion Probabilistic Model (DDPM)
Denoising Diffusion Models
- Emerging as powerful visual generative models
- Unconditional image synthesis
- Conditional image synthesis
Denoising Diffusion Probabilistic Models (DDPM)
Learning to generate by denoising
- 2 processes required for training:
- Forward diffusion process
- gradually add noise to input
- Reverse diffusion process
- learns to generate/restore data by denoising
- typically implemented via a conditional U-net
A diffusion model has no encoder and no decoder; it only has a denoiser.
The denoiser is the only component of a diffusion model that needs to be trained.
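For reference (standard DDPM notation, Ho et al. 2020; added here for completeness), the forward process is fixed rather than learned: at each step a small amount of Gaussian noise is added according to a variance schedule $\beta_t$:

$$q(x_t \mid x_{t-1}) = \mathcal{N}\big(x_t;\; \sqrt{1-\beta_t}\, x_{t-1},\; \beta_t I\big)$$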
VAE vs. DDPM
Learning of Diffusion Models
The three main objectives (loss terms) of a diffusion model:
- Prior matching
- Denoising matching
- Reconstruction
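These three terms come from the standard variational lower bound for diffusion models (sketched here for completeness; notation follows the usual DDPM derivation):

$$\log p(x_0) \;\ge\; \underbrace{\mathbb{E}_{q(x_1 \mid x_0)}\big[\log p_\theta(x_0 \mid x_1)\big]}_{\text{reconstruction}} \;-\; \underbrace{D_{\mathrm{KL}}\big(q(x_T \mid x_0)\,\|\,p(x_T)\big)}_{\text{prior matching}} \;-\; \underbrace{\sum_{t=2}^{T} \mathbb{E}_{q(x_t \mid x_0)}\Big[D_{\mathrm{KL}}\big(q(x_{t-1} \mid x_t, x_0)\,\|\,p_\theta(x_{t-1} \mid x_t)\big)\Big]}_{\text{denoising matching}}$$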
Let’s Take a Look at the Encoding Part…
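A useful property of the forward (encoding) process, stated here for reference with $\alpha_t = 1-\beta_t$ and $\bar{\alpha}_t = \prod_{s=1}^{t}\alpha_s$: $x_t$ can be sampled directly from $x_0$ in closed form, without simulating every intermediate step:

$$q(x_t \mid x_0) = \mathcal{N}\big(x_t;\; \sqrt{\bar{\alpha}_t}\, x_0,\; (1-\bar{\alpha}_t) I\big) \quad\Longleftrightarrow\quad x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon, \;\; \epsilon \sim \mathcal{N}(0, I)$$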
Now Let’s Focus on the Training Objective
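In practice the denoising matching terms reduce to a simple noise-prediction loss (the "simplified" objective from the DDPM paper), which is what the denoiser $\epsilon_\theta$ is trained with:

$$L_{\text{simple}} = \mathbb{E}_{t,\, x_0,\, \epsilon}\Big[\big\|\epsilon - \epsilon_\theta\big(\sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon,\; t\big)\big\|^2\Big]$$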
Training vs. Inference
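To contrast the two: training only needs a single random timestep per sample and the loss above, whereas inference (sampling) runs the learned reverse chain step by step; for reference, the standard DDPM sampling update is

$$x_{t-1} = \frac{1}{\sqrt{\alpha_t}}\Big(x_t - \frac{1-\alpha_t}{\sqrt{1-\bar{\alpha}_t}}\,\epsilon_\theta(x_t, t)\Big) + \sigma_t z, \qquad z \sim \mathcal{N}(0, I),\;\; t = T, \ldots, 1$$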
Observation #1
Latent Diffusion Model (LDM)
Denoising Diffusion Implicit Model (DDIM)
- DDPM
- Pros
- High-quality image generation without adversarial training.
- Cons
- Requires simulating a Markov chain for many steps in order to produce a sample (i.e., $x_t$ relies on $x_{t-1}$).
DDIM
- A non-Markovian forward process
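For reference (the deterministic case, $\sigma_t = 0$, of the DDIM update rule): one sampling step first predicts $x_0$ from the current $x_t$ and then moves to $x_{t-1}$; because the update no longer requires every intermediate step, sampling can skip timesteps and run much faster than DDPM:

$$x_{t-1} = \sqrt{\bar{\alpha}_{t-1}}\,\underbrace{\left(\frac{x_t - \sqrt{1-\bar{\alpha}_t}\,\epsilon_\theta(x_t, t)}{\sqrt{\bar{\alpha}_t}}\right)}_{\text{predicted } x_0} + \sqrt{1-\bar{\alpha}_{t-1}}\,\epsilon_\theta(x_t, t)$$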