DLCV - Generative Models (I) - AE, VAE & GAN
Generative Models
Discriminative vs. Generative Models
| Discriminative Model | Generative Model | Conditional Generative Model |
| --- | --- | --- |
| Learn a probability distribution p(y\|x) | Learn a probability distribution p(x) | Learn p(x\|y) |
- Discriminative Models
- Model posteriors p(y|x) from likelihoods p(x|y), where x is the input data and y indicates the class of interest; the likelihood p(x|y) corresponds to conditional generation
- Example (posterior): p(y = cat | x), i.e., how likely a given image x is a cat
- In unconditional generation, samples are not drawn from a specific class set (e.g., the set of cat images), so a generated image may not look like a cat
- The possible labels for each input "compete" for probability mass, but there is no competition between images
- No way for the model to handle unreasonable inputs; it must output a label distribution for every image
- Goal: Learn a (posterior) probability distribution p(y|x)
- Task: Assign labels to each instance x (e.g., classification, regression, etc.)
- Supervised learning
- Generative Model
- All possible imgs compete with each other for probability mass
- Model can “reject” unreasonable inputs by assigning them small values
- Goal: Learn a probability distribution p(x)
- Task: Data representation, generation, detect outliers, etc.
- (Mostly) unsupervised learning
- Conditional Generative Model
- Each possible label induces a competition among all imgs
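Not stated explicitly above, but the standard relation connecting the three distributions is Bayes' rule: a conditional generative model can be assembled from a discriminative model p(y|x), an unconditional generative model p(x), and a prior over labels p(y):

$$p(x \mid y) = \frac{p(y \mid x)\, p(x)}{p(y)}$$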
Autoencoder (AE)
- unsupervised learning
- Autoencoding = encoding itself with recovery purposes
- In other words, encode/decode data with reconstruction guarantees
- Latent variables/features as deep representations
- Example objective/loss function at output:
- L2 norm between input and output, i.e., $\|x - \hat{x}\|_2^2$
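A minimal PyTorch sketch of such an autoencoder, assuming flattened 28x28 inputs and a hypothetical 64-dim latent space (layer sizes and names are illustrative only):

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, in_dim=784, latent_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),          # latent representation z
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, in_dim),              # reconstruction x_hat
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

model = AutoEncoder()
x = torch.randn(32, 784)                         # dummy batch
x_hat, z = model(x)
loss = nn.functional.mse_loss(x_hat, x)          # L2 reconstruction loss
loss.backward()
```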
AE for Learning Latent Variables/Representations
- Why/when AE may be favorable?
- i.e., unsupervised learning for latent representation…
- Any application that comes to mind? -> anomaly detection (e.g., detecting damaged chips, as in TSMC's chip-defect inspection)
- Train autoencoder (AE) for downstream tasks
- Train AE with reconstruction guarantees
- Keep encoder (and the derived features) for downstream tasks (e.g., classification)
- Thus, a trained encoder can be applied to initialize a supervised model
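A sketch of this recipe, reusing the AutoEncoder class from the sketch above; the checkpoint path, the 10-class head, and the frozen-vs.-fine-tuned choice are all assumptions:

```python
import torch
import torch.nn as nn

ae = AutoEncoder()
# ae.load_state_dict(torch.load("ae_pretrained.pt"))   # hypothetical pretrained checkpoint

classifier = nn.Sequential(
    ae.encoder,                  # reuse the trained encoder as a feature extractor
    nn.Linear(64, 10),           # new classification head (10 classes assumed)
)

logits = classifier(torch.randn(32, 784))
labels = torch.randint(0, 10, (32,))
ce_loss = nn.functional.cross_entropy(logits, labels)   # supervised fine-tuning loss
```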
Variational Autoencoder
Probabilistic Spin on AE
- Learn latent features z from raw input data x
- Sample from the latent space (via the prior p(z)) to generate data
- For simplicity, assume a simple prior p(z) (e.g., Gaussian)
- Learn p(x|z) via a NN as a (probabilistic) decoder
Remarks
- Training objective: maximize the likelihood of the data, $p_\theta(x)$
- Note that we can't possibly observe all latents z & need to marginalize them out: $p_\theta(x) = \int p_\theta(x \mid z)\, p(z)\, dz$
- We can compute $p_\theta(x \mid z)$ with the decoder module, and we assume a Gaussian prior for $p(z)$
- Still, we can't integrate over all possible z!
- What else can we do? Recall that we have Bayes' rule: $p_\theta(x) = \dfrac{p_\theta(x \mid z)\, p(z)}{p_\theta(z \mid x)}$
- We still need $p_\theta(z \mid x)$, which is not explicitly known. Instead, we train the encoder module to learn an approximation $q_\phi(z \mid x) \approx p_\theta(z \mid x)$
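Putting these pieces together (the standard step in the VAE derivation, sketched here for completeness): taking the expectation over $q_\phi(z \mid x)$ gives the evidence lower bound (ELBO) that is actually maximized during training:

$$\log p_\theta(x) \;\ge\; \mathbb{E}_{z \sim q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] \;-\; D_{\mathrm{KL}}\big(q_\phi(z \mid x)\,\|\,p(z)\big)$$

The first term encourages faithful reconstruction; the second keeps the encoder's distribution close to the prior.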
Reparameterization Trick in VAE
- Remarks
- Given x, sample z from latent distribution (described by output parameters of encoder)
- However, this creates a bottleneck since backpropagation (BP) cannot flow through the sampling operation
- Alternatively, we apply $z = \mu + \sigma \odot \epsilon$, where $\epsilon$ is simply sampled from a standard Normal distribution
- This enables BP gradients in the encoder through μ and σ, while maintaining stochasticity via ε (for generative synthesis purposes)
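A minimal sketch of the trick in PyTorch, assuming the encoder outputs mu and log_var for each input (names are illustrative):

```python
import torch

def reparameterize(mu, log_var):
    std = torch.exp(0.5 * log_var)      # sigma
    eps = torch.randn_like(std)         # eps ~ N(0, I), sampled outside the graph
    return mu + std * eps               # z = mu + sigma * eps, differentiable w.r.t. mu and sigma
```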
Implementation of VAE Training
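A minimal sketch of one VAE training step, assuming flattened 784-dim inputs in [0, 1], a 64-dim latent space, and a Bernoulli-style reconstruction loss (BCE); all layer sizes and names are illustrative assumptions, not the exact implementation from the lecture:

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, in_dim=784, latent_dim=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU())
        self.fc_mu = nn.Linear(256, latent_dim)
        self.fc_logvar = nn.Linear(256, latent_dim)
        self.dec = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, in_dim), nn.Sigmoid(),
        )

    def forward(self, x):
        h = self.enc(x)
        mu, log_var = self.fc_mu(h), self.fc_logvar(h)
        z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)  # reparameterization (see sketch above)
        return self.dec(z), mu, log_var

model = VAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.rand(32, 784)                                   # dummy batch in [0, 1]
x_hat, mu, log_var = model(x)
recon = nn.functional.binary_cross_entropy(x_hat, x, reduction="sum")
kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())  # KL(q(z|x) || N(0, I))
loss = recon + kl                                         # negative ELBO
opt.zero_grad()
loss.backward()
opt.step()
```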
Diffusion Models
Denoising Diffusion Probabilistic Model (DDPM)
Denoising Diffusion Models
- Emerging as powerful visual generative models
- Unconditional image synthesis
- Conditional image synthesis
Denoising Diffusion Probabilistic Models (DDPM)
Learning to generate by denoising
- 2 processes required for training:
- Forward diffusion process
- gradually add noise to input
- Reverse diffusion process
- learns to generate/restore data by denoising
- typically implemented via a conditional U-net
A diffusion model has no encoder and no decoder; it only has a denoiser.
The denoiser is the only component of a diffusion model that needs to be trained.
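For reference (standard DDPM notation, Ho et al. 2020; added here for completeness), the forward process is fixed rather than learned: at each step a small amount of Gaussian noise is added according to a variance schedule $\beta_t$:

$$q(x_t \mid x_{t-1}) = \mathcal{N}\big(x_t;\; \sqrt{1-\beta_t}\, x_{t-1},\; \beta_t I\big)$$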
VAE vs. DDPM
Learning of Diffusion Models
The three main objectives (loss terms) of a diffusion model:
- Prior matching
- Denoising matching
- Reconstruction
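These three terms come from the standard variational lower bound for diffusion models (sketched here for completeness; notation follows the usual DDPM derivation):

$$\log p(x_0) \;\ge\; \underbrace{\mathbb{E}_{q(x_1 \mid x_0)}\big[\log p_\theta(x_0 \mid x_1)\big]}_{\text{reconstruction}} \;-\; \underbrace{D_{\mathrm{KL}}\big(q(x_T \mid x_0)\,\|\,p(x_T)\big)}_{\text{prior matching}} \;-\; \underbrace{\sum_{t=2}^{T} \mathbb{E}_{q(x_t \mid x_0)}\Big[D_{\mathrm{KL}}\big(q(x_{t-1} \mid x_t, x_0)\,\|\,p_\theta(x_{t-1} \mid x_t)\big)\Big]}_{\text{denoising matching}}$$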
Let’s Take a Look at the Encoding Part…
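A useful property of the forward (encoding) process, stated here for reference with $\alpha_t = 1-\beta_t$ and $\bar{\alpha}_t = \prod_{s=1}^{t}\alpha_s$: $x_t$ can be sampled directly from $x_0$ in closed form, without simulating every intermediate step:

$$q(x_t \mid x_0) = \mathcal{N}\big(x_t;\; \sqrt{\bar{\alpha}_t}\, x_0,\; (1-\bar{\alpha}_t) I\big) \quad\Longleftrightarrow\quad x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon, \;\; \epsilon \sim \mathcal{N}(0, I)$$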
Now Let’s Focus on the Training Objective
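In practice the denoising matching terms reduce to a simple noise-prediction loss (the "simplified" objective from the DDPM paper), which is what the denoiser $\epsilon_\theta$ is trained with:

$$L_{\text{simple}} = \mathbb{E}_{t,\, x_0,\, \epsilon}\Big[\big\|\epsilon - \epsilon_\theta\big(\sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon,\; t\big)\big\|^2\Big]$$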
Training vs. Inference
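To contrast the two: training only needs a single random timestep per sample and the loss above, whereas inference (sampling) runs the learned reverse chain step by step; for reference, the standard DDPM sampling update is

$$x_{t-1} = \frac{1}{\sqrt{\alpha_t}}\Big(x_t - \frac{1-\alpha_t}{\sqrt{1-\bar{\alpha}_t}}\,\epsilon_\theta(x_t, t)\Big) + \sigma_t z, \qquad z \sim \mathcal{N}(0, I),\;\; t = T, \ldots, 1$$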
Observation #1
Latent Diffusion Model (LDM)
Denoising Diffusion Implicit Model (DDIM)
- DDPM
- Pros
- High-quality image generation without adversarial training.
- Cons
- Requires simulating a Markov chain for many steps in order to produce a sample (i.e., $x_t$ relies on $x_{t-1}$).
DDIM
- A non-Markovian forward process
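For reference (the deterministic case, $\sigma_t = 0$, of the DDIM update rule): one sampling step first predicts $x_0$ from the current $x_t$ and then moves to $x_{t-1}$; because the update no longer requires every intermediate step, sampling can skip timesteps and run much faster than DDPM:

$$x_{t-1} = \sqrt{\bar{\alpha}_{t-1}}\,\underbrace{\left(\frac{x_t - \sqrt{1-\bar{\alpha}_t}\,\epsilon_\theta(x_t, t)}{\sqrt{\bar{\alpha}_t}}\right)}_{\text{predicted } x_0} + \sqrt{1-\bar{\alpha}_{t-1}}\,\epsilon_\theta(x_t, t)$$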