
Fast AutoAugment
Arxiv
Official PyTorch implementation
One of the authors (Kaggle Competitions Master)
1. Abstract
1-1. Fast AutoAugment is an algorithm to automatically search for augmentation policies.
1-2. Fast AutoAugment has a small GPU workload, and its search strategy is based on density matching.
2. Introduction and Related Work
2-1. Baseline Augmentation, Mixup, Cutout, and CutMix
- Designed manually based on domain knowledge.
2-2. Smart Augmentation
- Learns to generate augmented data by merging two or more samples in the same class.
2-3. Simulated GAN
2-4. AutoAugment
- Uses reinforcement learning (RL).
- Requires thousands of GPU hours.
2-5. Population Based Augmentation (PBA)
- Based on the population-based training method for hyperparameter optimization.
2-6. Bayesian Data Augmentation (BDA)
- New annotated training points are treated as missing variables and generated based on the distribution learned from the training set.
- Uses a generalized Monte Carlo EM algorithm, formulated as an extension of the Generative Adversarial Network (GAN).
2-7. Fast AutoAugment
- Unlike BDA, recovers the missing data points through exploitation and exploration via Bayesian optimization in the policy search phase.
- Directly searches for augmentation policies that maximize the match between the distribution of an augmented split and the distribution of another, un-augmented split, using a single model (density matching).
- As Table 1 shows, Fast AutoAugment searches augmentation policies significantly faster than AutoAugment while retaining comparable performance on diverse image datasets and networks.
- Uses the Tree-structured Parzen Estimator (TPE) algorithm for the practical implementation.
3. Fast AutoAugment
3-1. Search Space
- The search space S consists of sub-policies (tau_1, tau_2, ...); each operation in a sub-policy has two parameters, p (the probability of applying the operation) and lambda (the magnitude of the operation).
- A sub-policy can consist of any number of consecutive operations; in the case of Figure 1, two operations are applied (see the sketch after this list).
- The final policy is a collection of sub-policies.
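A minimal sketch of this structure in Python, assuming PIL-based image operations; the two operations, their magnitude scalings, and the apply_sub_policy helper are illustrative placeholders, not the repository's actual implementation:

```python
import random
from PIL import ImageOps

# Two illustrative operations; the real search space uses the PIL-based
# operations from AutoAugment (ShearX, Rotate, Solarize, ...).
def rotate(img, magnitude):
    return img.rotate(magnitude * 30)                           # lambda in [0, 1] scaled to degrees

def solarize(img, magnitude):
    return ImageOps.solarize(img, int(256 * (1 - magnitude)))   # lambda scaled to a threshold

# A sub-policy tau is a sequence of (operation, p, lambda) triples.
sub_policy = [(rotate, 0.7, 0.4), (solarize, 0.5, 0.8)]

def apply_sub_policy(img, sub_policy):
    """Apply each operation with probability p and magnitude lambda."""
    for op, p, magnitude in sub_policy:
        if random.random() < p:
            img = op(img, magnitude)
    return img

# The final policy is a collection of sub-policies; at train time one
# sub-policy is typically sampled per image.
policy = [sub_policy]
```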
3-2. Search Strategy
- Split the training data into D_M (for Model) and D_A (for Augmentation), used for learning the model parameters (theta) and exploring the augmentation policy (Tau), respectively.
- The augmentation policy is derived by optimizing equation (2), where R is the accuracy of the model (with parameters theta^* trained on D_M) on the augmented D_A, i.e., Tau_* = argmax_Tau R(theta^* | Tau(D_A)).
- Figure 2 shows how the optimization in equation (2) is performed.
- The procedure is as follows:
<Step 1>
Perform K-fold stratified shuffling to split the training dataset into D^(1)_train, ..., D^(K)_train, where each D^(k)_train consists of two splits, D^(k)_M and D^(k)_A, using the StratifiedShuffleSplit method in sklearn.
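A minimal sketch of this split with sklearn; the value of K, the 50/50 split ratio, and the dummy data and labels are placeholders:

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

K = 5                                            # number of folds (placeholder)
X = np.arange(50000).reshape(-1, 1)              # dummy sample indices
y = np.random.randint(0, 10, 50000)              # dummy class labels

# Each split yields one fold D^(k)_train = (D^(k)_M, D^(k)_A); stratification
# preserves the class distribution in both parts.
sss = StratifiedShuffleSplit(n_splits=K, test_size=0.5, random_state=0)
folds = []
for m_idx, a_idx in sss.split(X, y):
    D_M = (X[m_idx], y[m_idx])                   # for training theta from scratch
    D_A = (X[a_idx], y[a_idx])                   # for evaluating candidate policies
    folds.append((D_M, D_A))
```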
<Step 2>
Train the model parameter theta on D^(k)_M from scratch, without data augmentation.
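A compressed PyTorch sketch of this step; the toy data, the stand-in model, and the optimizer settings are placeholders, and the point is that no augmentation policy is applied during this training:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Placeholder fold data; in the paper this is D^(k)_M with only the baseline
# preprocessing, i.e. no learned augmentation policy.
X_M = torch.randn(1024, 3, 32, 32)
y_M = torch.randint(0, 10, (1024,))
loader = DataLoader(TensorDataset(X_M, y_M), batch_size=128, shuffle=True)

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))   # stand-in for the real network
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
criterion = nn.CrossEntropyLoss()

for epoch in range(10):                          # train theta from scratch on D^(k)_M
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()
```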
<Step 3>
For each step (t = 0, 1, ..., T-1), explore B candidate policies (tau_1, tau_2, ..., tau_B) via the Bayesian optimization method, then select the top-N policies over the B candidates (= tau_t in line 8 of Algorithm 1 above).
<Step 4>
Merge every tau_t into the final policy set (= tau_* in line 9).
- Policy exploration is performed with HyperOpt (via Ray), using TPE to estimate the Expected Improvement (see the sketch below).
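A minimal sketch of Steps 3-4 using hyperopt's TPE directly (the actual implementation runs this through Ray for parallel search); the operation list, the evaluate_policy stub standing in for R(theta^* | tau(D_A)), and the values of T, B, N are placeholders:

```python
import random
from hyperopt import fmin, tpe, hp, Trials

OPS = ["ShearX", "Rotate", "Solarize", "AutoContrast", "Invert"]   # subset, for illustration

# A candidate sub-policy: two operations, each with probability p and magnitude lambda.
space = [
    {"op": hp.choice(f"op{i}", OPS),
     "p": hp.uniform(f"p{i}", 0.0, 1.0),
     "lam": hp.uniform(f"lam{i}", 0.0, 1.0)}
    for i in range(2)
]

def evaluate_policy(sub_policy):
    """Stub for R(theta^* | tau(D_A)): accuracy of the pre-trained model
    on D_A augmented by this candidate sub-policy."""
    return random.random()

T, B, N = 2, 20, 10                              # search rounds, candidates per round, top-N kept
final_policy = []                                # tau_* in line 9 of Algorithm 1
for t in range(T):
    trials = Trials()
    # TPE proposes B candidates; hyperopt minimizes, so the accuracy is negated.
    fmin(lambda tau: -evaluate_policy(tau), space=space,
         algo=tpe.suggest, max_evals=B, trials=trials, show_progressbar=False)
    ranked = sorted(zip(trials.losses(), trials.trials), key=lambda r: r[0])
    final_policy.extend(tr["misc"]["vals"] for _, tr in ranked[:N])   # merge top-N into tau_*
```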