Mar 22 2019
An emerging area of artificial intelligence (AI) involves using algorithms to automatically design machine-learning systems known as neural networks. Networks designed this way are widely considered to be more accurate and efficient than those designed by human engineers.
However, this so-called neural architecture search (NAS) method is computationally expensive. A state-of-the-art NAS algorithm that Google recently ran on a cluster of graphics processing units (GPUs) took about 48,000 GPU hours to produce a single convolutional neural network (CNN), the type of network used for image classification and detection tasks. Google can run hundreds of GPUs and other specialized hardware in parallel, but that scale is out of reach for most other companies.
Against this backdrop, MIT researchers will present a paper at the International Conference on Learning Representations in May describing a NAS algorithm that can directly learn specialized CNNs for target hardware platforms, when run on a massive image dataset, in only about 200 GPU hours. Such an approach could enable much broader use of these types of algorithms.
According to the researchers, the time- and cost-saving algorithm could benefit researchers and companies with limited computing resources. The overarching goal is “to democratize AI,” said co-author Song Han, an assistant professor of electrical engineering and computer science and a researcher in MIT’s Microsystems Technology Laboratories. “We want to enable both AI experts and nonexperts to efficiently design neural network architectures with a push-button solution that runs fast on a specific hardware,” he said. Such NAS algorithms will never replace human engineers, Han added.
The aim is to offload the repetitive and tedious work that comes with designing and refining neural network architectures.
Song Han, Study Co-Author and Assistant Professor, Department of Electrical Engineering and Computer Science, MIT.
“Path-level” binarization and pruning
In their study, the researchers devised ways to delete unnecessary neural network design components, cut computing times, and use only a fraction of hardware memory to run a NAS algorithm. An additional innovation ensures that each outputted CNN runs more efficiently on a specific hardware platform (CPUs, GPUs, or mobile devices) than one designed by conventional approaches. In tests, the researchers’ CNNs were 1.8 times faster when measured on a mobile phone than traditional gold-standard models with comparable accuracy.
A CNN’s architecture consists of layers of computation with adjustable parameters, called “filters,” and the possible connections between those filters. Filters process image pixels in grids of squares, such as 3x3, 5x5, or 7x7, with each filter covering one square. The filters essentially move across the image and combine all the colors of their covered grid of pixels into a single pixel. Different layers may have different-sized filters, and they connect to share information in different ways. The output is a condensed image, built from the combined information of all the filters, that a computer can analyze more easily.
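As a rough illustration of the filters described above, here is a minimal sketch in PyTorch; the image size, channel counts, and stride are arbitrary choices for this example, not details from the paper.

```python
import torch
import torch.nn as nn

# A stand-in RGB image: batch of 1, 3 color channels, 224x224 pixels.
image = torch.randn(1, 3, 224, 224)

# Three candidate filter sizes, as mentioned in the article. Each convolution
# slides its grid of weights across the image and combines the pixels it
# covers into a single output value; a stride of 2 halves the grid, which is
# what produces the condensed image passed to later layers.
for size in (3, 5, 7):
    conv = nn.Conv2d(in_channels=3, out_channels=16,
                     kernel_size=size, stride=2, padding=size // 2)
    print(size, conv(image).shape)  # torch.Size([1, 16, 112, 112]) for each size
```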
Because the number of possible architectures to choose from, known as the “search space,” is so large, applying NAS to create a neural network on massive image datasets is computationally prohibitive. Engineers typically run NAS on smaller proxy datasets and then transfer the learned CNN architectures to the target task. This generalization approach, however, reduces the model’s accuracy on the target task. Moreover, the same outputted architecture is applied to all hardware platforms, which leads to efficiency problems.
The team trained and tested their new NAS algorithm on an image classification task directly on the ImageNet dataset, which contains millions of images in a thousand classes. They first created a search space containing all possible candidate CNN “paths,” meaning the ways the layers and filters can connect to process the data. This gives the NAS algorithm free rein to find an optimal architecture.
Normally, that would mean all possible paths must be stored in memory, which would exceed GPU memory limits. To address this, the researchers used a technique called “path-level binarization,” which stores only one sampled path at a time and saves an order of magnitude in memory consumption. They combined this binarization with “path-level pruning,” a technique that traditionally learns which “neurons” in a neural network can be deleted without affecting the output. Instead of discarding neurons, however, the team’s NAS algorithm prunes entire paths, which completely changes the neural network’s architecture.
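A minimal sketch of the path-level binarization idea, written in PyTorch with made-up layer sizes (an illustration of the concept, not the paper’s code): each layer keeps several candidate operations, or “paths,” plus a learnable score per path, but only one sampled path is executed and held in memory on any given forward pass.

```python
import torch
import torch.nn as nn

class MixedLayer(nn.Module):
    """One searchable layer holding several candidate 'paths'."""
    def __init__(self, channels):
        super().__init__()
        # Candidate operations the search can choose between.
        self.paths = nn.ModuleList([
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.Conv2d(channels, channels, kernel_size=5, padding=2),
            nn.Conv2d(channels, channels, kernel_size=7, padding=3),
        ])
        # One learnable architecture parameter per path; a softmax over
        # these gives each path's selection probability.
        self.alpha = nn.Parameter(torch.zeros(len(self.paths)))
        self.last_index = 0

    def forward(self, x):
        probs = torch.softmax(self.alpha, dim=0)
        # Path-level binarization: sample a single path and run only it,
        # so just one candidate's activations occupy GPU memory at a time.
        self.last_index = torch.multinomial(probs, num_samples=1).item()
        return self.paths[self.last_index](x)
```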
In training, all paths are initially given equal probabilities of selection. The algorithm then traces the paths, storing only one at a time, to note the accuracy and the loss (a numerical penalty assigned for incorrect predictions) of their outputs. It then adjusts the paths’ probabilities to optimize both accuracy and efficiency. In the end, the algorithm prunes away all the low-probability paths and keeps only the path with the highest probability, which becomes the final CNN architecture.
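Continuing the sketch above, the loop below mimics the procedure the article describes: sample one path, record its loss, nudge that path’s score up or down, and finally keep only the highest-probability path. The simple reward-style update and the random stand-in data are placeholders for illustration, not the paper’s actual gradient estimator.

```python
# Hypothetical training loop for one MixedLayer (continuing the sketch above).
layer = MixedLayer(channels=16)
optimizer = torch.optim.SGD(layer.parameters(), lr=0.01)

for step in range(1000):
    x = torch.randn(8, 16, 32, 32)        # stand-in mini-batch
    target = torch.randn(8, 16, 32, 32)   # stand-in target
    out = layer(x)                        # runs only the one sampled path
    loss = nn.functional.mse_loss(out, target)
    optimizer.zero_grad()
    loss.backward()                       # updates the sampled path's weights
    optimizer.step()
    # Placeholder architecture update: raise the sampled path's score when
    # its loss is low, lower it when its loss is high.
    with torch.no_grad():
        layer.alpha[layer.last_index] -= 0.01 * (loss.item() - 1.0)

# "Path-level pruning": keep only the highest-probability path as the
# final architecture for this layer.
best = torch.argmax(layer.alpha).item()
final_op = layer.paths[best]
```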
Hardware-aware
Another key innovation, Han said, was making the NAS algorithm “hardware-aware,” meaning it uses the latency on each hardware platform as a feedback signal to optimize the architecture. To measure this latency on mobile devices, for instance, big companies such as Google will employ a “farm” of mobile devices, which is very expensive. The researchers instead built a model that predicts the latency using only a single mobile phone.
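One plausible reading of such a latency predictor, sketched in Python with invented numbers (these are not the paper’s measurements): benchmark each candidate operation once on a single phone, then estimate a whole network’s latency by summing the per-operation entries.

```python
# Hypothetical per-operation latency table (milliseconds), measured once
# on a single phone; real values would come from on-device benchmarks.
MEASURED_LATENCY_MS = {
    "conv3x3": 1.2,
    "conv5x5": 2.1,
    "conv7x7": 3.4,
    "skip":    0.1,
}

def estimate_latency(architecture):
    """Estimate a candidate network's latency by summing the measured
    latency of each chosen operation, layer by layer."""
    return sum(MEASURED_LATENCY_MS[op] for op in architecture)

print(estimate_latency(["conv3x3", "conv5x5", "skip", "conv7x7"]))  # 6.8 ms
```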
For each chosen layer of the network, the algorithm samples the architecture on that latency-prediction model. It then uses that information to design an architecture that runs as quickly as possible while achieving high accuracy. In experiments, the researchers’ CNN ran nearly twice as fast as a gold-standard model on mobile devices.
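One hedged way such a latency estimate could feed back into the search is as a penalty added to the task loss, so that slow architectures are discouraged; the budget and weighting below are arbitrary values chosen for this example, not the paper’s settings.

```python
# Hypothetical hardware-aware objective: penalize candidate architectures
# whose estimated latency exceeds a target budget on the chosen device.
LATENCY_BUDGET_MS = 5.0
LATENCY_WEIGHT = 0.1   # arbitrary trade-off between accuracy and speed

def search_objective(task_loss, architecture):
    latency = estimate_latency(architecture)  # from the sketch above
    penalty = max(0.0, latency - LATENCY_BUDGET_MS)
    return task_loss + LATENCY_WEIGHT * penalty
```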
According to Han, one interesting result was that the NAS algorithm designed CNN architectures that had long been dismissed as too inefficient, but which, in the researchers’ tests, turned out to be optimized for certain hardware. Engineers, for example, have essentially stopped using 7x7 filters, because they are computationally more costly than multiple, smaller filters. Yet the new NAS algorithm found architectures in which some layers of 7x7 filters ran optimally on GPUs. That is because GPUs have high parallelization, meaning they compute many calculations simultaneously, so they can process one large filter at once more efficiently than many small filters one at a time.
This goes against previous human thinking. The larger the search space, the more unknown things you can find. You don’t know if something will be better than the past human experience. Let the AI figure it out.
Song Han, Study Co-Author and Assistant Professor, Department of Electrical Engineering and Computer Science, MIT.
The study was partly supported by the MIT Quest for Intelligence, the MIT-IBM Watson AI Lab, Xilinx, and SenseTime.