# AutoML for Data Augmentation

DeepAugment is an AutoML tool focusing on data augmentation. It utilizes Bayesian optimization for discovering data augmentation strategies tailored to your image dataset. The main benefits and features of DeepAugment are:

- **Reduces the error rate of CNN models** (showed a 60% decrease in error for CIFAR-10 on WRN-28-10)
- **Saves time** by automating the process
- **50 times faster** than Google’s previous solution, AutoAugment

The finished package is on PyPI. You can install it from your terminal by running:

$ pip install deepaugment

You can also visit the project’s README or run the Google Colab notebook tutorial. To learn more about how I built this, read on!

### Introduction

Data is the most critical piece of AI applications. Not having enough labeled data often leads to overfitting, which means the model will not be able to generalize to unseen examples. This can be mitigated by data augmentation, which effectively increases the amount and diversity of data seen by the network. It works by artificially producing new data through transformations of the original dataset, such as rotation, cropping, and occlusion. However, determining which augmentations will work best for the dataset at hand is no trivial task. To address this problem, Google published AutoAugment last year, which discovers optimized augmentations for a given dataset using reinforcement learning.

Using Google’s AutoAugment requires powerful computational resources due to the reinforcement learning module. Since obtaining the necessary computational power can be costly, I developed a novel approach, DeepAugment, which employs Bayesian optimization instead of reinforcement learning.

### Ways to get better data

Efforts to improve the quality of data often have a higher return on investment than efforts to enhance models. There are three main ways to improve data: **collecting** more data, **synthesizing** new data, or **augmenting** existing data. Collecting additional data is not always possible and can be expensive. Data synthesis, done by GANs, is promising but complicated, and might diverge from realistic examples.

Data augmentation, on the other hand, is simple and has high impact. It is applicable to most datasets and is done with simple image transformations. The problem, however, is determining which augmentation technique is best for the dataset at hand. Discovering the proper method requires time-consuming experimentation. Even after many experiments, a machine learning (ML) engineer may still not discover the best option.

Effective augmentation strategies differ for each image dataset, and some augmentation techniques may even be detrimental to the model. For example, applying rotations would make your model worse on the MNIST digits dataset, because a 180-degree rotation of a “6” makes it look like a “9” while it is still labeled as a 6. On the other hand, applying rotation to satellite images can improve results significantly, since a car seen from the air is still a car no matter how much it is rotated.

### DeepAugment: lightning-fast AutoML

DeepAugment is designed as a fast and flexible AutoML data augmentation solution. More specifically, it is designed as a faster and more flexible alternative to AutoAugment (Cubuk et al., 2018, blog). AutoAugment was one of the most exciting publications in 2018, and the first method using reinforcement learning for this particular problem. At the time of this article, the open source version of AutoAugment did not provide the controller module, which prevented users from running it on their own datasets. Moreover, it takes **15,000** iterations to learn augmentation policies, requiring huge computational resources. Most people could not benefit from it even if its source code were fully available.

DeepAugment addresses these problems with the following design goals:

- **Minimize the computational complexity** of the optimization of data augmentation while maintaining the quality of results.
- Be **modular and user-friendly.**

In order to achieve the first goal, DeepAugment was designed with the following differences, as compared to AutoAugment:

- Utilizes Bayesian optimization instead of reinforcement learning (requires fewer iterations) (~100x speed-up)
- Minimizes size of child model (decreases computational complexity of each training) (~20x speed-up)
- Less stochastic augmentation search space design (decreases number of iterations needed)

To achieve the second goal, making DeepAugment modular and user-friendly, the user interface gives the user a broad range of configuration options and model selections (e.g. selecting the child model or supplying a self-designed child model; see configuration options).

#### Designing augmentation policies

DeepAugment aims to find the best augmentation policy for a given image dataset. An augmentation policy is defined as a set of five sub-policies. Each sub-policy consists of two augmentation techniques and two real values in [0, 1] that determine how strongly each technique is applied. I implemented the augmentation techniques using the imgaug package, which is known for its large collection of augmentation techniques (see below).

Augmentations are most effective when they are diverse and randomly applied. For instance, instead of rotating every image, it is better to rotate some portion of the images, shear another portion, and apply a color inversion to another. Based on this observation, DeepAugment randomly applies one of five sub-policies (each consisting of two augmentations) to the images. During the optimization process, each image has a 16% chance of being augmented by each of the five sub-policies (80% in total) and a 20% chance of not being augmented at all.
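The selection rule above can be sketched in a few lines of standard-library Python. This is only an illustration of the 16% / 20% split, not DeepAugment's actual code: the technique names in `POLICY` and the helper names are hypothetical.

```python
import random
from collections import Counter

# Hypothetical policy: five sub-policies, each made of two (technique, magnitude)
# steps. The technique names are placeholders, not DeepAugment's identifiers.
POLICY = [
    [("rotate", 0.3), ("shear", 0.7)],
    [("invert", 0.5), ("sharpen", 0.2)],
    [("crop", 0.4), ("gaussian-blur", 0.6)],
    [("brighten", 0.9), ("rotate", 0.1)],
    [("dropout", 0.2), ("shear", 0.5)],
]


def pick_sub_policy_index(policy, rng):
    """Return the index of one sub-policy (16% each) or None (20%, no augmentation)."""
    options = list(range(len(policy))) + [None]
    weights = [0.16] * len(policy) + [0.20]
    return rng.choices(options, weights=weights, k=1)[0]


def estimate_frequencies(policy, n_images=100_000, seed=0):
    """Simulate the choice for many images and report how often each option occurs."""
    rng = random.Random(seed)
    counts = Counter(pick_sub_policy_index(policy, rng) for _ in range(n_images))
    return {option: count / n_images for option, count in counts.items()}


frequencies = estimate_frequencies(POLICY)
```

Over many images, `frequencies` should come out close to 0.16 for each sub-policy index and close to 0.20 for `None`.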

While I was inspired by AutoAugment for this policy design, there is one main difference: I do not use any parameters for the probability of applying sub-policies in order to make policies less stochastic and allow optimization in fewer iterations.

This policy design creates a 20-dimensional search space for the Bayesian optimizer (5 sub-policies × 2 augmentations × a type and a magnitude each), where 10 dimensions are categorical (type of augmentation technique) and the other 10 are real-valued (magnitudes). Since categorical values are involved, I configured the Bayesian optimizer to use a random forest estimator.
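To make the dimension count concrete, here is a small sketch that lists the 20 dimensions and decodes a flat point back into sub-policies. The technique names, the dimension names, and the interleaved (type, magnitude) ordering are assumptions made for illustration; the real package may lay the space out differently.

```python
# Hypothetical technique names; DeepAugment's real choices come from imgaug.
TECHNIQUES = ["rotate", "shear", "crop", "invert", "sharpen", "gaussian-blur"]

N_SUB_POLICIES = 5
STEPS_PER_SUB_POLICY = 2


def build_search_space():
    """List the dimensions as (name, kind, domain) tuples: a type and a magnitude per step."""
    space = []
    for s in range(N_SUB_POLICIES):
        for a in range(STEPS_PER_SUB_POLICY):
            space.append((f"sub{s}_aug{a}_type", "categorical", TECHNIQUES))
            space.append((f"sub{s}_aug{a}_magnitude", "real", (0.0, 1.0)))
    return space


def decode(point):
    """Turn a flat 20-value point into five sub-policies of two (type, magnitude) steps."""
    steps = [(point[i], point[i + 1]) for i in range(0, len(point), 2)]
    return [
        steps[i:i + STEPS_PER_SUB_POLICY]
        for i in range(0, len(steps), STEPS_PER_SUB_POLICY)
    ]


space = build_search_space()
example_point = ["rotate", 0.3, "shear", 0.7] * N_SUB_POLICIES
example_policy = decode(example_point)
```

Here `space` has 20 entries (10 categorical, 10 real), and `example_policy` is a list of five sub-policies with two steps each.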

#### How DeepAugment finds the best policies

The three major components of DeepAugment are the **controller** (Bayesian optimizer), the **augmenter**, and the **child model**. The overall workflow is as follows: the controller samples new augmentation policies, the augmenter transforms images according to the new policy, and the child model is trained from scratch on the augmented images.

A reward is calculated from the child model’s training history. The reward is returned to the controller, which updates its surrogate model with this reward and the associated augmentation policy (see the section “*How Bayesian optimization works*” below). The controller then samples new policies and the same steps repeat. This process cycles until the user-determined maximum number of iterations is reached.

The controller (Bayesian optimizer) is implemented using the scikit-optimize library’s ask-and-tell method. It is configured to use a **random forest estimator** as its base estimator and **expected improvement** as its acquisition function.
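The controller, augmenter, and child model interact through a simple ask-and-tell loop. The sketch below shows only the shape of that loop in plain Python. Everything in it is a stand-in: `RandomController` samples at random instead of running a Bayesian optimizer, `train_child` fakes a training history instead of augmenting images and training a CNN, and the reward (mean validation accuracy over the last few epochs) is an assumption about how a reward might be derived from the history.

```python
import random


class RandomController:
    """Stand-in controller with an ask/tell interface.

    It samples policies at random; DeepAugment's controller is a Bayesian optimizer.
    """

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.history = []  # (policy, reward) pairs seen so far

    def ask(self):
        # Toy policy: four magnitudes in [0, 1) instead of the full 20 dimensions.
        return [self.rng.random() for _ in range(4)]

    def tell(self, policy, reward):
        self.history.append((policy, reward))

    def best(self):
        return max(self.history, key=lambda pair: pair[1])


def train_child(policy, epochs=10):
    """Fake child-model training that returns a validation-accuracy history."""
    quality = 1.0 - sum((v - 0.5) ** 2 for v in policy) / len(policy)
    return [quality * (1.0 - 0.5 ** (epoch + 1)) for epoch in range(epochs)]


def reward_from_history(val_accuracy, last_n=3):
    """Assumed reward: mean validation accuracy over the last few epochs."""
    return sum(val_accuracy[-last_n:]) / last_n


def optimize(controller, iterations):
    """Ask for a policy, train the child, tell the reward; repeat."""
    for _ in range(iterations):
        policy = controller.ask()
        history = train_child(policy)
        controller.tell(policy, reward_from_history(history))
    return controller.best()


controller = RandomController(seed=0)
best_policy, best_reward = optimize(controller, iterations=50)
```

Swapping the random controller for a real optimizer only changes what `ask` and `tell` do internally; the loop itself stays the same.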

#### How Bayesian optimization works

The aim of Bayesian optimization is to find a set of parameters that maximize the value of the objective function. A working cycle of Bayesian optimization can be summarized as:

1. Build a surrogate model of the objective function
2. Find parameters that perform best on the surrogate
3. Execute the objective function with these parameters
4. Update the surrogate model with these parameters and the score of the objective function
5. Repeat steps 2–4 until the maximum number of iterations is reached
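In practice, "performing best on the surrogate" is usually decided by an acquisition function rather than by the surrogate's raw prediction. The sketch below implements the standard closed-form expected improvement for a maximization problem, assuming the surrogate's prediction at a candidate point is Gaussian with mean `mu` and standard deviation `sigma`. The candidate names and numbers are made up for illustration.

```python
import math


def expected_improvement(mu, sigma, best_so_far):
    """Expected improvement over best_so_far for a Gaussian prediction N(mu, sigma^2)."""
    if sigma <= 0.0:
        # No uncertainty: the improvement is known exactly.
        return max(mu - best_so_far, 0.0)
    z = (mu - best_so_far) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal CDF
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal PDF
    return (mu - best_so_far) * cdf + sigma * pdf


# Made-up surrogate predictions (mean, standard deviation) for three candidates.
candidates = {
    "safe": (0.80, 0.01),
    "risky": (0.78, 0.10),
    "poor": (0.60, 0.02),
}
best_so_far = 0.79

scores = {
    name: expected_improvement(mu, sigma, best_so_far)
    for name, (mu, sigma) in candidates.items()
}
next_candidate = max(scores, key=scores.get)
```

With these numbers the "risky" candidate scores highest even though its predicted mean is slightly below the best value so far, because its larger uncertainty leaves more room for improvement. This is how the acquisition function balances exploration against exploitation.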

For more information about Bayesian optimization, read this blog explaining it at a high-level, or take a glance at this review paper.

#### Trade-offs of Bayesian optimization

Currently, the standard approaches to hyper-parameter optimization are random search, grid search, Bayesian optimization, evolutionary algorithms, and reinforcement learning, listed in order of increasing complexity. Bayesian optimization is a better choice than grid search and random search in terms of accuracy, cost, and computation time for hyper-parameter tuning (see an empirical comparison here). This is because Bayesian optimization learns from runs with previous parameters, whereas grid search and random search do not.

When compared against reinforcement learning and evolutionary algorithms, Bayesian optimization provides competitive accuracy while requiring far fewer iterations. Google’s AutoAugment, for example, iterates 15,000 times in order to learn good policies (which means training the child CNN model 15,000 times). Bayesian optimization, on the other hand, learns good policies in 100–300 iterations. A rule of thumb for Bayesian optimization is to run about 10 times as many iterations as there are optimized parameters; for DeepAugment’s 20 parameters, that is roughly 200 iterations.

### Challenges and solutions

**Challenge 1:** Optimizing for augmentation requires a lot of computational resources, since the child model must be trained from scratch over and over. This dramatically slowed down the development process of my tool. Even though using Bayesian optimization made it faster, the optimization process was still not fast enough to make development feasible.

**Solutions:** I developed two solutions. First, I optimized the child CNN model (see below), which is the computational bottleneck of the process. Second, I designed augmentation policies in a more deterministic way, making the Bayesian optimizer require fewer iterations.

**Challenge 2:** I encountered an interesting problem during the development of DeepAugment. As the augmentations were optimized by training the child model over and over, the policies started to overfit to the validation set. I discovered that my best-found policies performed poorly when I changed the validation set. This is an interesting case because it differs from overfitting in the general sense, where model weights fit the noise in the data.

**Solution:** Instead of using the same validation set every time, I reserved the data remaining after the training set was taken as the "seed validation set", and sampled a new validation set of 1000 images from it at each training of the child CNN model (see data pipeline below). This solved the augmentation overfitting problem.
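A minimal sketch of this resampling idea, working on example indices only. It assumes the seed validation set is everything outside the training subset; the dataset size (50,000) and training-set size (2,000) are example values, and the function names are hypothetical rather than taken from the package.

```python
import random


def split_train_and_seed(n_examples, train_set_size, rng):
    """Shuffle the indices, keep a small training set, and reserve the rest as the seed set."""
    indices = list(range(n_examples))
    rng.shuffle(indices)
    return indices[:train_set_size], indices[train_set_size:]


def sample_validation(seed_indices, rng, size=1000):
    """Draw a fresh validation set (without replacement) for one child-model training."""
    return rng.sample(seed_indices, size)


rng = random.Random(0)
train_idx, seed_idx = split_train_and_seed(50_000, 2_000, rng)

# Each child-model training gets its own validation sample.
val_first_run = sample_validation(seed_idx, rng)
val_second_run = sample_validation(seed_idx, rng)
```

Because every run is scored on a different 1000-image sample, a policy can only earn a consistently high reward by generalizing across the seed set, not by fitting one fixed validation split.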

### How to integrate into your ML pipeline

DeepAugment is published on PyPI. You can install it from your terminal by running:

$ pip install deepaugment

And usage is easy:

```python
from deepaugment.deepaugment import DeepAugment

deepaug = DeepAugment(my_images, my_labels)

best_policies = deepaug.optimize()
```

A more advanced usage, by configuring DeepAugment:

```python
from keras.datasets import cifar10
from deepaugment.deepaugment import DeepAugment

# my configuration
my_config = {
    "model": "basiccnn",
    "method": "bayesian_optimization",
    "train_set_size": 2000,
    "opt_samples": 3,
    "opt_last_n_epochs": 3,
    "opt_initial_points": 10,
    "child_epochs": 50,
    "child_first_train_epochs": 0,
    "child_batch_size": 64
}

(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# x_train.shape -> (N, M, M, 3)
# y_train.shape -> (N)

deepaug = DeepAugment(x_train, y_train, config=my_config)

# run the optimizer for 300 iterations
best_policies = deepaug.optimize(300)
```

For more detailed installation/usage information, visit the project's README or run the Google Colab notebook tutorial.

### Conclusion

To my knowledge, DeepAugment is the first method utilizing Bayesian optimization to find the best data augmentations. Optimization of data augmentation is a recent research area, and AutoAugment was one of the first methods tackling this problem.

The main contribution of DeepAugment to the open-source community is that it makes the process scalable, enabling users to optimize augmentation policies without needing huge computational resources*. It is very modular and more than 50 times faster than the previous solution, AutoAugment.

DeepAugment is shown to **reduce error by 60%** for a WideResNet-28-10 model using the CIFAR-10 small image dataset when compared to the same model and dataset without augmentation.

DeepAugment currently only optimizes augmentations for the image classification task. It could be expanded to optimize for object detection or segmentation tasks, and I welcome your contributions if you would like to do so. However, I would expect that the best augmentation policies are very dependent on the type of dataset, and less so on the task (such as classification or object detection). This means DeepAugment should find similar strategies regardless of the task, but it would be very interesting if these strategies end up being very different!

While DeepAugment currently works for image datasets, it would be very interesting to extend it for text, audio or video datasets. The same concept is applicable to other types of datasets as well.

*DeepAugment takes 4.2 hours (500 iterations) on the CIFAR-10 dataset, which costs around $13 on an AWS p3.2xlarge instance.

#### Acknowledgements

I did this project over a three-week period during my time at the Insight Artificial Intelligence Fellows program. I thank program directors Matt Rubashkin and Amber Roberts for their very helpful guidance, and my technical advisor Melissa Runfeldt for helping me problem-solve along the way. I also thank Amber Roberts, Emmanuel Ameisen, Holly Szafarek, and Andrew Forrester for their suggestions and editing work on this blog post.


### Resources

**GitHub:** github.com/barisozmen/deepaugment

**Demo slide deck:** bit.ly/deepaugmentslides

**Colab tutorial:** bit.ly/deepaugmentusage