DLCV - Recurrent Neural Networks & Transformer
Mode Collapse
- Remarks
- The generator only outputs a limited number of image variants regardless of the inputs.
MSGAN
To address the mode collapse issue in conditional GANs
Mode Seeking Generative Adversarial Networks
for Diverse Image Synthesis
With the goal of producing diverse image outputs
Motivation (for unconditional GAN)
Proposed Regularization (for conditional GAN)
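A sketch of the mode seeking regularization term, following the MSGAN paper's formulation; here $d_I$ and $d_z$ are distance metrics in image and latent space, and $\lambda_{ms}$ is assumed to be the weighting hyperparameter:

```latex
% Mode seeking regularization: two latent codes z_1, z_2 under the same condition c
% should map to dissimilar images, penalizing collapse onto a few modes.
\mathcal{L}_{ms} = \max_G \left( \frac{d_I\big(G(c, z_1),\, G(c, z_2)\big)}{d_z(z_1, z_2)} \right)

% Full objective: the original conditional GAN loss plus the weighted regularizer.
\mathcal{L} = \mathcal{L}_{cGAN} + \lambda_{ms}\, \mathcal{L}_{ms}
```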
Adversarial Learning for Transfer Learning
- Left (source domain): labeled; both data and labels are available.
- Right (target domain): data are available, but without labels.
Domain Adaptation in Transfer Learning
- What’s DA?
- Leveraging information from the source to the target domain, so that the same learning task across domains (or particularly in the target domain) can be addressed.
- Typically all the source-domain data are labeled.
- Settings
- Semi-supervised DA: few target-domain data are with labels.
- Unsupervised DA: no label info available in the target-domain.
(shall we address supervised DA?)
- Imbalanced DA: fewer classes of interest in the target domain
- Homogeneous vs. heterogeneous DA
Unsupervised Domain Adaptation
Deep Domain Confusion (DDC)
- Deep Domain Confusion: Maximizing for Domain Invariance
The idea is to compute a "distance" during training that pulls same-class red and blue samples closer; the approach is quite "simple": it mixes the red and blue data together so they are classified jointly.
Both steps can be trained simultaneously.
In the figure, the blue "Labeled Images" at the bottom left are actually the source data (the samples inside the red circle), and the red "Unlabeled Images" at the bottom right are actually the target data (the samples inside the blue circle).
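A minimal sketch of this domain confusion objective, assuming PyTorch and the simplest (linear) form of MMD, i.e. the squared distance between mean source and target features; the function names and the `lambda_mmd` weight are illustrative:

```python
import torch
import torch.nn.functional as F

def linear_mmd(source_feat: torch.Tensor, target_feat: torch.Tensor) -> torch.Tensor:
    """Squared distance between the mean embeddings of source and target batches.

    source_feat, target_feat: (batch, feature_dim) activations from a shared CNN layer.
    """
    return (source_feat.mean(dim=0) - target_feat.mean(dim=0)).pow(2).sum()

def ddc_loss(logits_src, labels_src, feat_src, feat_tgt, lambda_mmd=0.25):
    # Classification loss on labeled source data + domain confusion (MMD) term.
    cls = F.cross_entropy(logits_src, labels_src)
    mmd = linear_mmd(feat_src, feat_tgt)
    return cls + lambda_mmd * mmd
```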
Domain Confusion by Domain-Adversarial Training
- Domain-Adversarial Training of Neural Networks (DANN)
- Y. Ganin et al., ICML 2015
- Maximize domain confusion = maximize domain classification loss
- Minimize source-domain data classification loss
- The derived feature f can be viewed as a disentangled & domain-invariant feature
Both branches can be trained simultaneously.
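A minimal sketch of the gradient reversal layer that makes this simultaneous training possible, assuming PyTorch; `lambd` is the reversal-strength hyperparameter:

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; multiplies the gradient by -lambd in backward.

    Placed between the feature extractor and the domain classifier: the domain
    classifier minimizes its loss, while the feature extractor (receiving the
    reversed gradient) maximizes it, i.e. maximizes domain confusion.
    """
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)

# Usage sketch: domain_logits = domain_classifier(grad_reverse(features, lambd))
```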
Beyond Domain Confusion
- Domain Separation Network (DSN)
- Bousmalis et al., NIPS 2016
- Separate encoders for domain-invariant and domain-specific features
- Private/common features are disentangled from each other.
The orange region captures the foreground and the green region captures the background; the outputs produced by the orange and green regions should be as dissimilar as possible.
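The "as dissimilar as possible" constraint corresponds to the difference (soft orthogonality) loss in DSN, where $\mathbf{H}_c$ and $\mathbf{H}_p$ stack the common (shared) and private feature vectors row-wise:

```latex
% Encourage private and shared encodings to be orthogonal, i.e. disentangled.
\mathcal{L}_{diff} = \left\| \mathbf{H}_c^{\top} \mathbf{H}_p \right\|_F^2
```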
Recurrent Neural Networks
- Parameter sharing + unrolling
- Keeps the number of parameters fixed
- Allows sequential data with varying lengths
- Memory ability
- Capture and preserve information which has been extracted/processed
h: hidden state
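For reference, the standard vanilla RNN recurrence; the same shared weights are applied at every time step (notation is the usual one, not taken from the slides):

```latex
% The weights W_{xh}, W_{hh}, W_{hy} are reused (shared) at every time step t.
h_t = \tanh\!\left(W_{xh} x_t + W_{hh} h_{t-1} + b_h\right), \qquad
y_t = W_{hy} h_t + b_y
```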
Training RNNs: Back Propagation Through Time
- Let’s focus on one training instance.
- The divergence to be computed is between the sequence of outputs by the network and the desired output sequence.
- Generally, this is not just the sum of the divergences at individual times
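A minimal BPTT sketch for one training instance, assuming PyTorch and, for simplicity, a per-step divergence summed over time (the slide notes the general case need not be a plain sum); layer sizes are illustrative:

```python
import torch
import torch.nn as nn

rnn_cell = nn.RNNCell(input_size=8, hidden_size=16)
readout = nn.Linear(16, 4)
criterion = nn.CrossEntropyLoss()

def bptt_step(inputs, targets):
    """inputs: (T, 8) sequence; targets: (T,) long-typed class labels for one instance."""
    h = torch.zeros(1, 16)
    loss = 0.0
    for x_t, y_t in zip(inputs, targets):
        h = rnn_cell(x_t.unsqueeze(0), h)                 # unrolled recurrence, shared weights
        loss = loss + criterion(readout(h), y_t.unsqueeze(0))
    loss.backward()                                       # gradients flow back through time
    return loss
```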
Variants of RNN
- Long Short-term Memory (LSTM) [Hochreiter et al., 1997]
- Additional memory cell
- Input/Forget/Output Gates (see the gate equations after this list)
- Handle gradient vanishing
- Learn long-term dependencies
- Gated Recurrent Unit (GRU) [Cho et al., EMNLP 2014]
- Similar to LSTM
- Handle gradient vanishing & learn long-term dependencies
- No additional memory cell
- Reset / Update Gates
- Fewer parameters than LSTM
- Comparable performance to LSTM [Chung et al., NIPS Workshop 2014]
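For reference, the standard LSTM and GRU update equations in the usual notation ($\odot$ is element-wise multiplication):

```latex
% LSTM: input, forget, output gates plus the additional memory cell c_t.
i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i), \quad
f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f), \quad
o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)
c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_c x_t + U_c h_{t-1} + b_c), \qquad
h_t = o_t \odot \tanh(c_t)

% GRU: reset / update gates, no separate memory cell.
r_t = \sigma(W_r x_t + U_r h_{t-1}), \quad
z_t = \sigma(W_z x_t + U_z h_{t-1})
h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tanh\!\big(W_h x_t + U_h (r_t \odot h_{t-1})\big)
```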
Sequence-to-Sequence Modeling
Unsupervised Learning of Video Representations using LSTMs
Multi-task learning:
- The white region on the left is the Encoder.
- The blue region is Decoder #1, which performs data reconstruction (recovery).
- The orange region is Decoder #2, which performs data prediction.
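A minimal sketch of this composite encoder/two-decoder model, assuming PyTorch; module names and dimensions are illustrative rather than taken from the paper's code:

```python
import torch
import torch.nn as nn

class CompositeLSTM(nn.Module):
    """One LSTM encoder shared by a reconstruction decoder and a future-prediction decoder."""
    def __init__(self, dim=128, hidden=256):
        super().__init__()
        self.encoder = nn.LSTM(dim, hidden, batch_first=True)
        self.dec_recon = nn.LSTM(dim, hidden, batch_first=True)  # Decoder #1: reconstruct the input
        self.dec_pred = nn.LSTM(dim, hidden, batch_first=True)   # Decoder #2: predict future frames
        self.out = nn.Linear(hidden, dim)

    def forward(self, past, future_len):
        _, state = self.encoder(past)                  # summarize the input sequence into one state
        B, T, D = past.shape
        recon, _ = self.dec_recon(torch.zeros(B, T, D), state)
        pred, _ = self.dec_pred(torch.zeros(B, future_len, D), state)
        return self.out(recon), self.out(pred)
```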
Learning to generate long-term future via hierarchical prediction
A pair counts as true only when the data and the posture match (first test case); a temporal mismatch also counts as false (third test case).
What’s the Potential Problem in RNN?
- Each hidden state vector extracts/carries information across time steps (some might be diluted downstream).
- Information of the entire input sequence is embedded into a single hidden state vector.
- Outputs at different time steps have particular meanings.
- However, synchrony between input and output sequences is not required.
Solution #1: Attention Model
- What should the attention model be?
- A neural network whose inputs are z and h and whose output is a scalar indicating the similarity between z and h (see the sketch after this list).
- Most attention models are jointly learned with other parts of network (e.g., recognition, etc.)
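A minimal sketch of such an attention scorer, assuming PyTorch; a small MLP maps each pair (z, h_t) to a scalar score, and the scores are normalized over time with a softmax (layer sizes are illustrative):

```python
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    """Scores each hidden state h_t against a query z and returns attention weights."""
    def __init__(self, z_dim, h_dim, hidden=64):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(z_dim + h_dim, hidden),
            nn.Tanh(),
            nn.Linear(hidden, 1),          # scalar similarity between z and h_t
        )

    def forward(self, z, hs):
        # z: (B, z_dim) query; hs: (B, T, h_dim) hidden states across time steps.
        B, T, _ = hs.shape
        z_rep = z.unsqueeze(1).expand(B, T, -1)
        scores = self.score(torch.cat([z_rep, hs], dim=-1)).squeeze(-1)   # (B, T)
        weights = scores.softmax(dim=-1)                                  # attention over time
        context = torch.bmm(weights.unsqueeze(1), hs).squeeze(1)          # weighted sum, (B, h_dim)
        return context, weights
```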
Image Captioning with Attention
- Attention helps image recognition… What else?
- localization / explainable AI
Transformer
Using the surrounding context to decide whether "apple" refers to the fruit you eat or an Apple phone: this mechanism is "attention", also called self-attention.
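A minimal sketch of (single-head, unmasked) scaled dot-product self-attention, assuming PyTorch; this is the core operation the Transformer builds on:

```python
import math
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    """Each token attends to every other token, so context decides its representation."""
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, x):
        # x: (B, T, dim) token embeddings, e.g. the sentence containing "apple".
        Q, K, V = self.q(x), self.k(x), self.v(x)
        attn = (Q @ K.transpose(-2, -1)) / math.sqrt(x.size(-1))   # (B, T, T) pairwise similarity
        attn = attn.softmax(dim=-1)
        return attn @ V                                            # context-mixed representations
```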