DLCV - Generative Models (I) - AE, VAE & GAN

LAVI

Generative Models

Discriminative vs. Generative Models

  • Discriminative Model: learn a probability distribution p(y|x)
  • Generative Model: learn a probability distribution p(x)
  • Conditional Generative Model: learn a probability distribution p(x|y)
  1. Discriminative Models
  • Model posteriors p(y|x) from likelihoods p(x|y), where x is the input data and y indicates the class of interest; this amounts to a form of conditional generation (see the Bayes relation after this list)
  • Example: the posterior p(y|x), e.g., the probability that an input image x depicts a cat

    unconditional generation: if the generated image is not drawn from the corresponding set (e.g., the set of cat images), it may not look like a cat

  • The possible labels for each input “compete” for probability mass, but there is no competition between images
  • There is no way for the model to handle unreasonable inputs; it must output a label distribution for every image
  • Goal: Learn a (posterior) probability distribution p(y|x)
  • Task: Assign labels to each instance x (e.g., classification, regression, etc.)
  • Supervised learning
  2. Generative Models
  • All possible images compete with each other for probability mass
    • Model can “reject” unreasonable inputs by assigning them small values
  • Goal: Learn a probability distribution p(x)
  • Task: Data representation, generation, detect outliers, etc.
  • (Mostly) unsupervised learning
  3. Conditional Generative Models
  • Each possible label induces a competition among all images
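
As referenced in the discriminative-model item above, the three settings are tied together by Bayes’ rule, which expresses the posterior in terms of the likelihood, the label prior, and the data distribution:

    p(y|x) = p(x|y) p(y) / p(x)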

Autoencoder (AE)

  • unsupervised learning
  • Autoencoding = encoding the input itself, with the purpose of recovering it
  • In other words, encode/decode data with reconstruction guarantees
  • Latent variables/features as deep representations
  • Example objective/loss function at the output:
    • L2 norm between the input x and the reconstruction x̂, i.e., L = ||x − x̂||², as in the sketch below
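
A minimal sketch of such an autoencoder in PyTorch; the layer sizes and module names are illustrative assumptions, not taken from the slides:

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, input_dim=784, latent_dim=32):
        super().__init__()
        # Encoder: raw input x -> latent variables z
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )
        # Decoder: latent variables z -> reconstruction x̂
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, input_dim),
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z)

model = AutoEncoder()
x = torch.randn(16, 784)              # dummy batch of flattened images
x_hat = model(x)
loss = ((x_hat - x) ** 2).mean()      # L2 reconstruction loss
loss.backward()
```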

AE for Learning Latent Variables/Representations

  • Why/when might an AE be favorable?
    • i.e., unsupervised learning of latent representations…
    • Any application that comes to mind? → anomaly detection, e.g., TSMC-style detection of damaged chips
  • Train autoencoder (AE) for downstream tasks
  • Train AE with reconstruction guarantees
  • Keep encoder (and the derived features) for downstream tasks (e.g., classification)
  • Thus, a trained encoder can be applied to initialize a supervised model
> x → Encoder → z (latent space) → Classifier (e.g., an MLP); see the sketch below
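
A minimal sketch of reusing a trained encoder for a downstream classifier, assuming `model` is the autoencoder from the sketch above; the MLP head and class count are illustrative assumptions:

```python
import torch.nn as nn

encoder = model.encoder               # keep the trained encoder
for p in encoder.parameters():
    p.requires_grad = False           # optionally freeze the learned representation

# Classifier head (an MLP) on top of the latent features z.
classifier = nn.Sequential(
    encoder,
    nn.Linear(32, 64), nn.ReLU(),     # 32 = latent_dim of the encoder above
    nn.Linear(64, 10),                # e.g., 10 classes
)
```

Alternatively, the encoder can be left unfrozen and fine-tuned, i.e., used only to initialize the supervised model as noted above.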

Variational Autoencoder

  • Probabilistic Spin on AE

    • Learn latent features z from raw input data x
    • Sample z from the latent space (via the prior p(z)) to generate data
    • For simplicity, assume a simple prior p(z) (e.g., Gaussian)
    • Learn p(x|z) via a NN as a (probabilistic) decoder
  • Remarks

    • Training objective: maximize the likelihood of the data, p(x)
    • Note that we can’t possibly observe all latents z and need to marginalize over them:

      p(x) = ∫ p(x|z) p(z) dz

    • We can compute p(x|z) with the decoder module, and we assume a Gaussian prior for p(z)
    • Still, we can’t integrate over all possible z!
    • What else can we do? Recall that we have Bayes’ rule:

      p(x) = p(x|z) p(z) / p(z|x)

    • We still need p(z|x), which is not explicitly known (it is intractable)
    • Instead, we train the encoder module to learn an approximation q(z|x) ≈ p(z|x)
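
Putting these pieces together yields the standard variational lower bound (ELBO) that a VAE optimizes in place of the intractable log p(x):

    log p(x) ≥ E_{q(z|x)}[ log p(x|z) ] − KL( q(z|x) || p(z) )

The first term rewards faithful reconstruction through the decoder, and the second keeps the approximate posterior close to the prior.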

Reparameterization Trick in VAE

  • Remarks
    • Given x, sample z from the latent distribution (described by the output parameters μ and σ of the encoder)
    • However, this creates a bottleneck, since backpropagation (BP) cannot flow through the random sampling operation
    • Alternatively, we apply z = μ + σ ⊙ ε (with ε simply sampled from a standard Normal distribution)
    • This enables BP gradients to reach the encoder through μ and σ,
      while maintaining stochasticity via ε (for generative synthesis purposes)
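
A minimal sketch of the trick in PyTorch, assuming the encoder outputs the mean `mu` and the log-variance `log_var` of the latent distribution:

```python
import torch

def reparameterize(mu: torch.Tensor, log_var: torch.Tensor) -> torch.Tensor:
    """Sample z = mu + sigma * eps with eps ~ N(0, I).

    Gradients flow back to the encoder through mu and log_var, while all
    randomness is isolated in eps, which needs no gradient.
    """
    std = torch.exp(0.5 * log_var)   # sigma = exp(0.5 * log(sigma^2))
    eps = torch.randn_like(std)      # eps ~ N(0, I)
    return mu + std * eps
```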

Implementation of VAE Training
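
A minimal sketch of the VAE training objective (the negative ELBO) under the usual assumptions: a Gaussian encoder q(z|x), a standard Normal prior, and an L2 reconstruction term; `encoder`, `decoder`, and `optimizer` are placeholder names, and `reparameterize` is the function from the sketch above:

```python
import torch
import torch.nn.functional as F

def vae_loss(x, x_recon, mu, log_var):
    # Reconstruction term (here an L2/MSE loss, matching the AE objective).
    recon = F.mse_loss(x_recon, x, reduction="sum")
    # Closed-form KL divergence between N(mu, sigma^2) and the prior N(0, I).
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
    return recon + kl  # negative ELBO

# One training step:
# mu, log_var = encoder(x)
# z = reparameterize(mu, log_var)
# x_recon = decoder(z)
# loss = vae_loss(x, x_recon, mu, log_var)
# loss.backward(); optimizer.step()
```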

Diffusion Models

Denoising Diffusion Probabilistic Model (DDPM)

Denoising Diffusion Models

  • Emerging as powerful visual generative models
    • Unconditional image synthesis
    • Conditional image synthesis

Denoising Diffusion Probabilistic Models (DDPM)

Learning to generate by denoising

  • Two processes are required for training:
    • Forward diffusion process
      • Gradually adds noise to the input
    • Reverse diffusion process
      • Learns to generate/restore data by denoising
      • Typically implemented via a conditional U-Net

A diffusion model has no encoder or decoder, only a denoiser.
The denoiser is the only component of a diffusion model that needs to be trained (see the sketch below).
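
A minimal sketch of one DDPM training step in PyTorch, assuming a noise-prediction network `denoiser(x_t, t)` (e.g., a conditional U-Net) and a precomputed tensor `alpha_bar` of cumulative noise-schedule products; all names are illustrative:

```python
import torch
import torch.nn.functional as F

def ddpm_training_step(denoiser, x0, alpha_bar, T=1000):
    """Forward-diffuse a clean batch x0, then train the denoiser to predict the noise."""
    b = x0.shape[0]
    t = torch.randint(0, T, (b,), device=x0.device)        # random timestep per sample
    eps = torch.randn_like(x0)                              # noise to be added (and predicted)
    a_bar = alpha_bar[t].view(b, *([1] * (x0.dim() - 1)))   # broadcast to image shape
    # Forward diffusion in closed form: x_t = sqrt(a_bar) * x0 + sqrt(1 - a_bar) * eps
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * eps
    # The reverse process is learned by the denoiser: predict the added noise.
    eps_pred = denoiser(x_t, t)
    return F.mse_loss(eps_pred, eps)                        # simplified DDPM objective
```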

VAE vs. DDPM

Learning of Diffusion Models

Three main objectives of a diffusion model:

  1. Prior matching
  2. Denoising matching
  3. Reconstruction
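
These three terms correspond to the standard variational-bound decomposition of the diffusion training objective; a sketch in the usual DDPM notation, where q is the forward process and p_θ is the learned reverse process:

      KL( q(x_T | x_0) || p(x_T) )                                        ← prior matching
    + Σ_{t>1} E_q[ KL( q(x_{t−1} | x_t, x_0) || p_θ(x_{t−1} | x_t) ) ]    ← denoising matching
    − E_q[ log p_θ(x_0 | x_1) ]                                           ← reconstruction

This quantity upper-bounds −log p_θ(x_0), so minimizing it maximizes a lower bound on the data likelihood.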

Let’s Take a Look at the Encoding Part…

Now Let’s Focus on the Training Objective

Training vs. Inference

Observation #1

Latent Diffusion Model (LDM)

Denoising Diffusion Implicit Model (DDIM)

  • DDPM
    • Pros
      • High-quality image generation without adversarial training
    • Cons
      • Requires simulating a Markov chain for many steps in order to produce a sample

(i.e., x_t relies on x_{t-1})

  • DDIM

    • Defines a non-Markovian forward process
    • This admits a deterministic sampling procedure that can skip timesteps, enabling much faster generation than DDPM (see the sketch below)
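
A minimal sketch of one deterministic DDIM sampling step (η = 0), assuming a trained noise predictor whose output `eps_pred = denoiser(x_t, t)` and the cumulative schedule values `alpha_bar_t`, `alpha_bar_prev` are passed in as tensors; because the update depends only on x_t and the predicted noise, many timesteps can be skipped:

```python
import torch

def ddim_step(x_t, eps_pred, alpha_bar_t, alpha_bar_prev):
    """Deterministic DDIM update (eta = 0) from timestep t to an earlier timestep."""
    # Predict the clean sample x_0 from x_t and the predicted noise.
    x0_pred = (x_t - (1 - alpha_bar_t).sqrt() * eps_pred) / alpha_bar_t.sqrt()
    # Jump to the earlier timestep along the non-Markovian, deterministic path.
    return alpha_bar_prev.sqrt() * x0_pred + (1 - alpha_bar_prev).sqrt() * eps_pred
```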