The remarkable achievements of machine learning over the past decade are largely the product of groundbreaking research. Machine learning now underpins solutions to a wide range of real-world problems, including facial recognition, fraud detection, machine translation, and, in medicine, disease detection and protein structure prediction. However, progress in these areas has demanded tedious manual labor in task-specific network design and training, consuming human and computational resources that most practitioners cannot access. In contrast to this task-specific approach, general-purpose models such as DeepMind’s Perceiver IO and Gato and Google’s Pathways were designed to solve multiple tasks simultaneously. Yet practitioners cannot even determine whether fine-tuning one of these models would be effective for their target task, as the pre-trained weights are not publicly available. Nor can a general-purpose model be built independently from scratch, given the massive computational power and training data required.
The domain of automated machine learning (AutoML), which strives to create high-quality models for diverse tasks with minimal human labor and computing resources, offers a more accessible option. Neural Architecture Search (NAS) can automate neural network design for a given learning task. Although NAS has brought AutoML to several well-researched areas, its use for applications outside of computer vision remains poorly understood. On this front, a team of researchers from Carnegie Mellon University developed a method to strike a good balance between expressivity and efficiency in NAS, as AutoML is believed to have its greatest impact in less studied areas.
The team presented DASH, a NAS technique that produces task-specific convolutional neural networks (CNNs) with excellent prediction accuracy. The efficiency of convolutions as feature extractors is well known, and recent work demonstrates the effectiveness of modern CNNs on a variety of tasks. Building on this foundation, the team sought to extend the generalization capabilities of CNNs, exemplified by the state-of-the-art performance of the ConvNeXt model, which incorporates many techniques popularized by Transformers. The central premise of their research is that equipping a basic CNN topology with the right kernel sizes and dilation rates can produce models competitive with those designed by experts.
Most NAS approaches to building task-specific models have two main components: a search space that defines all possible networks and a search algorithm that explores that space until a final model is selected. A model will only prove effective if the search space is expressive enough for the task; at the same time, the search algorithm must be able to examine the candidate architectures in that space within a reasonable time budget. This tension between the expressivity of the search space and the efficiency of the search algorithm has been prominent in NAS research. Existing methods either consider highly expressive search spaces paired with intractably slow search algorithms, or are designed to quickly examine a large number of architectures in small search spaces.
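To see why expressivity and efficiency pull against each other, consider a back-of-the-envelope count of a kernel-configuration search space. This is a hypothetical illustration (the function name and the specific numbers are not from the paper): if each of L convolutional layers independently picks one of K kernel sizes and D dilation rates, the number of distinct architectures grows exponentially in depth.

```python
# Hypothetical illustration of search-space growth: K kernel sizes and
# D dilation rates chosen independently at each of L layers yield
# (K * D) ** L distinct architectures.
def search_space_size(num_kernel_sizes: int, num_dilations: int, num_layers: int) -> int:
    choices_per_layer = num_kernel_sizes * num_dilations
    return choices_per_layer ** num_layers

# e.g. 4 kernel sizes x 4 dilations over a 10-layer backbone:
print(search_space_size(4, 4, 10))  # 16**10, over a trillion configurations
```

Even this modest setup is far too large to enumerate, which is why a naive exhaustive search is a non-starter.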
DASH fills this gap by using a CNN as a base and searching for the best kernel configurations. The assumption is that modern convolutional models like ConvNeXt and ConvMixer can compete with attention-based architectures, and that choosing the right kernel sizes and dilations can further improve feature extraction for different tasks. For example, large kernels are often more effective for modeling long-range dependencies in sequence tasks, whereas vision tasks typically require small filters to detect low-level features such as edges and corners.
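The trade-off between kernel size and dilation can be made concrete with the standard receptive-field formula for a single convolution, (k − 1) · d + 1. The sketch below is illustrative only (the function name is ours, not the paper's), but it shows why dilation lets a small kernel span long ranges in sequence data, while dilation 1 with a small kernel suits local visual features.

```python
# Receptive field of one convolution with kernel size k and dilation d,
# using the standard formula (k - 1) * d + 1.
def receptive_field(kernel_size: int, dilation: int) -> int:
    return (kernel_size - 1) * dilation + 1

print(receptive_field(3, 1))   # 3  -> small local filter, typical for vision
print(receptive_field(3, 16))  # 33 -> wide span from a 3-tap kernel, useful for sequences
```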
The researchers had to consider many kernels with different sizes and dilation rates, and the resulting combinatorial explosion of possible configurations was one of their biggest hurdles. To address this problem, they presented three techniques that exploit fast matrix multiplication on GPUs and the mathematical properties of convolution. DASH was evaluated on ten different tasks spanning multiple input dimensions (1D and 2D), prediction types (point and dense), and domains including music, genomics, ECG, vision, and audio.
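One common way to make such a discrete choice searchable by gradient descent is a DARTS-style continuous relaxation: blend all candidate convolutions with softmax-normalized architecture weights, so the kernel configuration becomes differentiable. The sketch below illustrates that general idea in 1D with NumPy; it is not the authors' implementation (the function names `dilate` and `mixed_conv1d` and the specific candidates are our own), and DASH additionally uses efficient aggregation tricks to avoid paying for every candidate separately.

```python
import numpy as np

def dilate(kernel: np.ndarray, d: int) -> np.ndarray:
    """Insert d-1 zeros between kernel taps to emulate a dilated convolution."""
    k = len(kernel)
    out = np.zeros((k - 1) * d + 1)
    out[::d] = kernel
    return out

def mixed_conv1d(x: np.ndarray, kernels: list, dilations: list, alpha: np.ndarray) -> np.ndarray:
    """Softmax-weighted sum over candidate (kernel, dilation) convolutions of x."""
    w = np.exp(alpha - alpha.max())
    w /= w.sum()
    return sum(wi * np.convolve(x, dilate(k, d), mode="same")
               for wi, k, d in zip(w, kernels, dilations))

x = np.random.randn(64)
kernels = [np.ones(3) / 3, np.ones(5) / 5]   # two candidate smoothing kernels
out = mixed_conv1d(x, kernels, [1, 2], np.array([0.0, 0.0]))
print(out.shape)  # (64,)
```

As the architecture weights `alpha` become peaked, the mixed operation collapses toward a single branch, which is how the final discrete kernel configuration can be read off after search.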
The researchers ran extensive experiments to confirm that DASH strikes a balance between expressivity and efficiency. Using a Wide ResNet backbone, its performance was evaluated on the ten NAS-Bench-360 tasks. Among other notable findings, DASH outperforms DARTS on 7 of the 10 tasks, ranks first among all NAS baselines, and searches up to 10x faster than existing NAS techniques. In addition, it outperforms classic non-DL methods like Auto-Sklearn and general-purpose models like Perceiver IO.
DASH also outperforms hand-designed expert models on 7 of the 10 tasks. The sophistication of expert networks varies from task to task, but DASH’s performance on tasks like Darcy Flow shows that it can hold its own against highly specialized networks. This suggests that a promising strategy for developing models in new areas is to equip backbone networks with task-specific kernels. DASH consistently outperforms DARTS in terms of speed, and its search often takes only a fraction of the time needed to train the backbone itself. In summary, DASH is faster and more effective than most NAS techniques.
CMU developed DASH with the vision of extending AutoML to find effective models for a variety of real-world problems. By balancing expressivity and efficiency in NAS, DASH produces high-quality models. However, the researchers believe this is still only a first step in the enormous challenge of AutoML for diverse tasks, and they welcome contributions from other researchers working to create more automated and useful approaches.
This article is written as a research summary by Marktechpost Staff based on the research paper 'Efficient Architecture Search for Diverse Tasks'. All credit for this research goes to the researchers on this project. Check out the paper and GitHub link.
Khushboo Gupta is a consulting intern at MarktechPost. She is currently pursuing her B.Tech from Indian Institute of Technology (IIT), Goa. She is passionate about machine learning, natural language processing and web development. She enjoys learning more about the technical field by participating in multiple challenges.