DLCV - Recurrent Neural Networks & Transformer


Mode Collapse

  • Remarks
  • The generator only outputs a limited number of image variants
    regardless of the inputs.

MSGAN

  • Addresses the mode collapse issue observed in conditional GANs

  • Mode Seeking Generative Adversarial Networks
    for Diverse Image Synthesis

  • With the goal of producing diverse image outputs

  • Motivation (for unconditional GAN)

  • Proposed Regularization (for conditional GAN)
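As a sketch of the proposed regularization (following the MSGAN formulation; here d_I and d_z denote distance metrics in the image and latent spaces, c is the conditional context, and z_1, z_2 are two latent codes):

    \mathcal{L}_{ms} = \max_G \left( \frac{ d_I\big( G(c, z_1),\, G(c, z_2) \big) }{ d_z(z_1, z_2) } \right),
    \qquad
    \mathcal{L}_{new} = \mathcal{L}_{ori} + \lambda_{ms}\, \mathcal{L}_{ms}

Maximizing this ratio penalizes the generator for mapping distinct latent codes to nearly identical images, which directly encourages diverse outputs.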

Adversarial Learning for Transfer Learning

  • Left: source domain, where both data and labels are available
  • Right: target domain, where data are available but labels are not

Domain Adaptation in Transfer Learning

  • What’s DA?
    • Leveraging information from the source domain to the target domain, so that the same learning task across domains (particularly in the target domain) can be addressed.
    • Typically all the source-domain data are labeled.
  • Settings
    • Semi-supervised DA: only a few target-domain data are labeled.
    • Unsupervised DA: no label information is available in the target domain.
      (shall we address supervised DA?)
    • Imbalanced DA: fewer classes of interest in the target domain
    • Homogeneous vs. heterogeneous DA

Unsupervised Domain Adaptation

Deep Domain Confusion (DDC)

  • Deep Domain Confusion: Maximizing for Domain Invariance

The goal is to mix the red and blue data together so that they can be classified jointly.

Training computes a "distance" to pull the red and blue data of the same class closer together, but this approach is quite "naive".

Both steps can be trained simultaneously.
In this figure, the blue "Labeled Images" at the bottom left are actually the Source data (the data inside the red circle),
and the red "Unlabeled Images" at the bottom right are actually the Target data (the data inside the blue circle).
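A minimal sketch of this distance-based idea in PyTorch, assuming the "distance" is a linear-kernel Maximum Mean Discrepancy (MMD) computed on features from a shared adaptation layer, as in DDC; the names mmd_linear, ddc_loss, and lam are illustrative:

    import torch.nn.functional as F

    def mmd_linear(feat_src, feat_tgt):
        # Linear-kernel MMD: squared distance between the mean source and
        # mean target feature vectors, a simple measure of domain discrepancy.
        delta = feat_src.mean(dim=0) - feat_tgt.mean(dim=0)
        return (delta * delta).sum()

    def ddc_loss(logits_src, labels_src, feat_src, feat_tgt, lam=0.25):
        # Classification loss on the labeled source images plus a weighted
        # domain-confusion term that pulls the two feature distributions together.
        cls = F.cross_entropy(logits_src, labels_src)
        return cls + lam * mmd_linear(feat_src, feat_tgt)

Because both terms are differentiable, the classifier and the domain-confusion objective can indeed be trained simultaneously, as noted above.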

Domain Confusion by Domain-Adversarial Training

  • Domain-Adversarial Training of Neural Networks (DANN)
    • Y. Ganin et al., ICML 2015
    • Maximize domain confusion = maximize domain classification loss
    • Minimize source-domain data classification loss
    • The derived feature f can be viewed as a disentangled & domain-invariant feature

Both branches can be trained simultaneously.
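The "maximize domain confusion" part is usually implemented with a gradient reversal layer; a minimal PyTorch sketch (GradReverse and lam are illustrative names):

    import torch

    class GradReverse(torch.autograd.Function):
        # Identity in the forward pass; flips and scales the gradient in the
        # backward pass, so the shared feature extractor learns to MAXIMIZE
        # the domain classification loss while the domain classifier minimizes it.
        @staticmethod
        def forward(ctx, x, lam):
            ctx.lam = lam
            return x.view_as(x)

        @staticmethod
        def backward(ctx, grad_output):
            return -ctx.lam * grad_output, None

    def grad_reverse(x, lam=1.0):
        return GradReverse.apply(x, lam)

    # f = feature_extractor(x)
    # class_logits  = label_classifier(f)                  # minimized on source labels
    # domain_logits = domain_classifier(grad_reverse(f))   # adversarial branch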

Beyond Domain Confusion

  • Domain Separation Network (DSN)
    • Bousmalis et al., NIPS 2016
    • Separate encoders for domain-invariant and domain-specific features
    • Private/common features are disentangled from each other.

The orange region extracts the foreground and the green region extracts the background; the images generated from the orange and green regions should be as dissimilar as possible.
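One way to keep the private and common features dissimilar is the soft orthogonality penalty used in DSN; a rough PyTorch sketch (difference_loss is an illustrative name, and mean-centering is one common preprocessing choice):

    def difference_loss(shared_feat, private_feat):
        # Penalize the squared Frobenius norm of the cross-correlation between
        # the shared (domain-invariant) and private (domain-specific) feature
        # batches, pushing the two encoders to capture different information.
        s = shared_feat - shared_feat.mean(dim=0, keepdim=True)
        p = private_feat - private_feat.mean(dim=0, keepdim=True)
        corr = s.t() @ p                  # (d_shared, d_private)
        return (corr ** 2).sum()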

Recurrent Neural Networks

  • Parameter sharing + unrolling
    • Keeps the number of parameters fixed
    • Allows sequential data with varying lengths
  • Memory ability
    • Capture and preserve information which has been extracted/processed

h: hidden state
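A minimal PyTorch sketch of these two properties (VanillaRNN is an illustrative name): the same weights are applied at every time step, and the hidden state h carries the accumulated information forward.

    import torch
    import torch.nn as nn

    class VanillaRNN(nn.Module):
        def __init__(self, input_dim, hidden_dim):
            super().__init__()
            # One set of weights, reused at every time step (parameter sharing).
            self.W_xh = nn.Linear(input_dim, hidden_dim, bias=False)
            self.W_hh = nn.Linear(hidden_dim, hidden_dim)

        def forward(self, x):                        # x: (batch, time, input_dim)
            h = x.new_zeros(x.size(0), self.W_hh.out_features)
            states = []
            for t in range(x.size(1)):               # unrolling over time
                h = torch.tanh(self.W_xh(x[:, t]) + self.W_hh(h))   # memory update
                states.append(h)
            return torch.stack(states, dim=1)        # hidden states h_1 ... h_T

Because the loop reuses the same layers, sequences of varying lengths can be processed without changing the parameter count.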

Training RNNs: Back Propagation Through Time

  • Let's focus on one training instance.

  • The divergence to be computed is between the sequence of outputs by the network and the desired output sequence.
  • In general, this is not just the sum of the divergences at individual time steps.
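A sketch of one training step under these assumptions (the readout layer and the summed per-step cross-entropy are just one common special case of the sequence-level divergence):

    import torch.nn.functional as F

    def bptt_step(rnn, readout, x_seq, y_seq, optimizer):
        # Forward through the unrolled network, accumulate the divergence over
        # the whole output sequence, then backpropagate through every time step.
        states = rnn(x_seq)                     # (batch, time, hidden)
        logits = readout(states)                # (batch, time, n_classes)
        loss = sum(F.cross_entropy(logits[:, t], y_seq[:, t])
                   for t in range(y_seq.size(1)))
        optimizer.zero_grad()
        loss.backward()                         # gradients flow back through time
        optimizer.step()
        return loss.item()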

Variants of RNN

  • Long Short-term Memory (LSTM) [Hochreiter et al., 1997]
    • Additional memory cell
    • Input/Forget/Output Gates
    • Handle gradient vanishing
    • Learn long-term dependencies
  • Gated Recurrent Unit (GRU) [Cho et al., EMNLP 2014]
    • Similar to LSTM
      • handle gradient vanishing & learn long-term dependencies
    • No additional memory cell
    • Reset / Update Gates
    • Fewer parameters than LSTM
    • Comparable performance to LSTM [Chung et al., NIPS Workshop 2014]
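The parameter difference is easy to check in PyTorch (the sizes 128/256 below are arbitrary): the LSTM uses four gate/candidate transforms per step while the GRU uses three, so the GRU has roughly 3/4 of the LSTM's recurrent parameters.

    import torch.nn as nn

    def n_params(module):
        return sum(p.numel() for p in module.parameters())

    lstm = nn.LSTM(input_size=128, hidden_size=256, batch_first=True)
    gru  = nn.GRU(input_size=128, hidden_size=256, batch_first=True)

    print(n_params(lstm), n_params(gru))   # the GRU count is about 3/4 of the LSTM count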

Sequence-to-Sequence Modeling

Unsupervised Learning of Video Representations using LSTMs

Multi-task learning

The white region on the left is the Encoder.
The blue region is Decoder #1, which performs data reconstruction (recovery).
The orange region is Decoder #2, which performs data prediction (of future frames).
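A rough PyTorch sketch of this composite architecture, assuming per-frame feature vectors and decoders driven only by the encoder state (CompositeLSTM and the zero-input decoding are illustrative simplifications):

    import torch
    import torch.nn as nn

    class CompositeLSTM(nn.Module):
        # Shared encoder; decoder #1 reconstructs the input frames,
        # decoder #2 predicts future frames (multi-task training).
        def __init__(self, feat_dim, hidden_dim):
            super().__init__()
            self.encoder   = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
            self.dec_recon = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
            self.dec_pred  = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
            self.readout   = nn.Linear(hidden_dim, feat_dim)

        def forward(self, past, future_len):
            _, state = self.encoder(past)                    # summarize the input clip
            zeros_r = torch.zeros_like(past)                 # unconditioned decoder inputs
            zeros_p = past.new_zeros(past.size(0), future_len, past.size(2))
            recon, _ = self.dec_recon(zeros_r, state)
            pred,  _ = self.dec_pred(zeros_p, state)
            return self.readout(recon), self.readout(pred)

    # training sketch: loss = mse(recon, past) + mse(pred, future)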

Learning to generate long-term future via hierarchical prediction

A pair counts as true only when the data and the posture match up (the first test case); if the time steps do not align, it is also false (the third test case).

What’s the Potential Problem in RNN?

  • Each hidden state vector extracts/carries information across time steps (some might be diluted downstream).
  • Information of the entire input sequence is embedded into a single hidden state vector.
  • Outputs at different time steps have particular meanings.
  • However, synchrony between the input and output sequences is not required.

Solution #1: Attention Model

  • What should the attention model be?
    • A neural network whose inputs are z and h and whose output is a scalar indicating the similarity between z and h.
  • Most attention models are jointly learned with the other parts of the network (e.g., the recognition module).
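A minimal sketch of such an attention model, using additive scoring as one common choice (AdditiveAttention, att_dim, and the tensor shapes are assumptions for illustration):

    import torch
    import torch.nn as nn

    class AdditiveAttention(nn.Module):
        def __init__(self, z_dim, h_dim, att_dim=128):
            super().__init__()
            # A small NN mapping (z, h_i) to a scalar similarity score.
            self.proj  = nn.Linear(z_dim + h_dim, att_dim)
            self.score = nn.Linear(att_dim, 1)

        def forward(self, z, H):            # z: (batch, z_dim), H: (batch, T, h_dim)
            zT = z.unsqueeze(1).expand(-1, H.size(1), -1)
            e = self.score(torch.tanh(self.proj(torch.cat([zT, H], dim=-1)))).squeeze(-1)
            alpha = torch.softmax(e, dim=1)                 # attention weights over T steps
            context = (alpha.unsqueeze(-1) * H).sum(dim=1)  # weighted sum of hidden states
            return context, alpha

Since the module is just another differentiable block, it can be trained jointly with the rest of the network, as noted above.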

Image Captioning with Attention

  • Attention helps image recognition… What else?
    • localization / explainable AI

Transformer

Using the surrounding context to decide whether "apple" refers to the fruit you eat or an Apple phone is what "attention" does; this can also be called self-attention.
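A minimal sketch of scaled dot-product self-attention (the projection matrices Wq, Wk, Wv are learned; the names and shapes are illustrative): each token, such as "apple", attends to every token in the same sequence, so the surrounding context determines its representation.

    import torch

    def self_attention(X, Wq, Wk, Wv):
        # X: (T, d_model) token embeddings of one sequence.
        Q, K, V = X @ Wq, X @ Wk, X @ Wv                # queries, keys, values
        scores = Q @ K.t() / K.size(-1) ** 0.5          # pairwise token similarities
        weights = torch.softmax(scores, dim=-1)         # each token's attention over all tokens
        return weights @ V                              # context-aware representations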