IBM’s newly announced Artificial Intelligence Unit (AIU) is the company’s first system-on-chip design. The AIU is an application-specific integrated circuit (ASIC) designed to train and run deep learning models that require massively parallel computing. It is designed to be much faster than existing CPUs, which were built for traditional software applications years before deep learning emerged. IBM has not specified a release date for the AIU.
The IBM Research AI Hardware Center has been developing the new AIU chip for five years. The center focuses on developing next-generation chips and AI systems, with the goals of improving AI hardware efficiency by 2.5 times every year and of training and running AI models 1,000 times faster in 2029 than in 2019.
Unpacking the AIU
According to the IBM blog, “Our full system-on-chip has 32 processing cores and contains 23 billion transistors – roughly the same number as in our z16 chip. The IBM AIU is also designed to be as easy to use as a graphics card. It can be plugged into any computer or server with a PCIe slot.”
Deep learning has traditionally relied on a combination of CPUs and GPU coprocessors to train and run models. GPUs were originally developed to render graphics, but the technology later proved well suited to artificial intelligence workloads.
The IBM AIU is not a graphics processor. It is specifically designed and optimized to speed up matrix and vector calculations used by deep learning models. The AIU can solve computationally complex problems and perform data analysis at speeds far beyond the capabilities of a CPU.
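To make “matrix and vector calculations” concrete, here is a minimal sketch in plain Python of a single fully connected layer, y = Wx + b. The function name `dense_layer` and the toy values are illustrative assumptions, not IBM code. The point is only that the work reduces to a grid of independent multiply-accumulate operations, which is the pattern a dedicated accelerator can run in parallel.

```python
def dense_layer(weights, bias, inputs):
    """Compute y = W x + b for one layer and count multiply-accumulates."""
    macs = 0
    outputs = []
    for row, b in zip(weights, bias):
        acc = b
        for w, x in zip(row, inputs):
            acc += w * x  # one multiply-accumulate (MAC)
            macs += 1
        outputs.append(acc)
    return outputs, macs


# Toy 2x3 weight matrix, chosen only for illustration.
W = [[1.0, 2.0, 3.0],
     [0.5, -1.0, 0.0]]
b = [0.0, 1.0]
x = [1.0, 0.0, -1.0]

y, macs = dense_layer(W, b, x)
print(y, macs)
```

The number of multiply-accumulates equals rows times columns, so a layer with thousands of inputs and outputs already needs millions of them per input vector.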
Growth of AI and Deep Learning
The growth of deep learning is straining available computing resources. The use of AI and deep learning models is growing exponentially across all industries and for a wide variety of applications.
Besides growth, another issue is model size. Deep learning models are huge, with billions and sometimes trillions of parameters. Unfortunately, according to IBM, hardware efficiency has lagged behind the exponential growth of deep learning.
In the past, computation relied on high-precision 64- and 32-bit floating-point arithmetic. IBM believes that this degree of precision is not always required. It has a term for reducing traditional computational precision: “approximate computing.” In its blog, IBM explains its rationale for using approximate computing:
“Do we need this level of accuracy for common deep learning tasks? Does our brain need a high-resolution image to recognize a family member or a cat? When we enter a text thread for a search, do we need precision in the relative ranking of the 50,002nd most useful answer against the 50,003rd? The answer is that many tasks, including these examples, can be accomplished with approximate computing.”
Approximate computing played an essential role in the design of the new AIU chip. IBM researchers designed the AIU to use lower precision than a CPU requires. Lower precision was crucial to achieving high computational density in the new AIU hardware accelerator. Instead of the 32-bit or 16-bit floating-point arithmetic typically used for AI training, IBM used hybrid 8-bit floating-point (HFP8) computations. The lower-precision calculations allowed the chip to run twice as fast as it does with FP16 calculations while delivering similar training results.
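The trade-off between precision and accuracy can be illustrated with a small sketch that uses only the Python standard library. It is not IBM’s HFP8 format: `to_fp16` rounds values to IEEE half precision through the `struct` module, and `keep_mantissa_bits` is a hypothetical helper that keeps only a few mantissa bits and ignores exponent range limits. The sketch compares a dot product computed on full-precision inputs with the same dot product on inputs rounded to lower precision.

```python
import math
import random
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision (16-bit) value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def keep_mantissa_bits(x: float, bits: int) -> float:
    """Crude low-precision stand-in: keep `bits` stored mantissa bits.

    Assumption for illustration only: the exponent range is unlimited,
    so this is not a real 8-bit floating-point format.
    """
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)      # x = m * 2**e with 0.5 <= |m| < 1
    scale = 2 ** (bits + 1)   # implicit leading bit plus stored bits
    return math.ldexp(round(m * scale) / scale, e)


def dot(xs, ys, quantize=lambda v: v):
    """Dot product with quantized inputs, accumulated in full precision."""
    return sum(quantize(x) * quantize(y) for x, y in zip(xs, ys))


random.seed(0)
xs = [random.uniform(-1.0, 1.0) for _ in range(1024)]
ys = [random.uniform(-1.0, 1.0) for _ in range(1024)]

exact = dot(xs, ys)
fp16 = dot(xs, ys, to_fp16)
low8 = dot(xs, ys, lambda v: keep_mantissa_bits(v, 3))
print(f"full: {exact:.4f}  fp16-like: {fp16:.4f}  3-bit mantissa: {low8:.4f}")
```

Because the rounding errors of individual terms tend to partly cancel when many products are summed, the low-precision results often stay close to the full-precision one. That is the intuition behind approximate computing, although real training schemes need additional care that this sketch does not attempt.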
The design goals appeared to conflict, but the conflict did not pose a problem for IBM. Low-precision computation was required to achieve higher density and faster computation, yet the accuracy of deep learning (DL) models had to remain on par with that of high-precision computation.
IBM designed the chip to streamline AI workflows. According to IBM, “Our chip architecture has a simpler layout than a general-purpose CPU because most AI calculations involve matrix and vector multiplication. IBM designed the AIU to send data directly from one compute engine to the next, resulting in huge energy savings.”
IBM’s announcement contained very little technical information about the chip. However, we can gain some insight into its performance by looking back at the demonstration of its first prototype, when IBM presented the performance results of its early 7nm chip design at the International Solid-State Circuits Conference (ISSCC) in 2021.
Instead of 32 cores, IBM’s prototype for the conference demonstration was an experimental quad-core 7nm AI chip supporting FP16 and HFP8 formats for DL model training and inference. It also supported INT4 and INT2 formats for scaling inference. A summary of the prototype chip’s performance was included in a 2021 Linley Group newsletter reporting IBM’s demonstration that year:
- At peak speed, the 7nm design achieved 1.9 teraflops per watt (TF/W) with HFP8.
- TOPS (tera operations per second) measures how many trillions of operations an accelerator can perform in one second. It provides a way to compare the performance of different accelerators on a given inference task. Using INT4 for inference, the experimental chip achieved 16.5 TOPS/W, outperforming Qualcomm’s low-power cloud AI module.
- Although few specifications and no prices have been released, a general price estimate would be in the $1,500 to $2,000 range. If the price lands in that range, the AIU should be able to establish itself quickly in the market.
- Due to a lack of information, it is not possible to directly compare the AIU and GPUs based solely on AI processing cores.
- The low-precision technologies used in the AIU are based on earlier IBM research that pioneered the first 16-bit reduced-precision systems for deep learning training, the first 8-bit training techniques, and state-of-the-art 2-bit inference results.
- According to IBM Research, the AIU chip uses a scaled version of the AI accelerator in its Telum chip.
- The Telum uses 7nm transistors, but the AIU uses faster 5nm transistors.
- It will be interesting to see how the AIU compares to other technologies if it is released in time for next year’s MLPerf benchmarking tests.
Note: Moor Insights & Strategy writers and editors may have contributed to this article.
Moor Insights & Strategy, like all research and technology industry analyst firms, offers or has provided paid services to technology companies. These services include research, analysis, consulting, benchmarking, acquisition matchmaking and speech sponsorship. The Company had or currently has paid relationships with 8×8, Accenture, A10 Networks, Advanced Micro Devices, Amazon, Amazon Web Services, Ambient Scientific, Anuta Networks, Applied Brain Research, Applied Micro, Apstra, Arm, Aruba Networks (now HPE), Atom Computing, AT&T, Aura, Automation Anywhere, AWS, A-10 Strategies, Bitfusion, Blaize, Box, Broadcom, C3.AI, Calix, Campfire, Cisco Systems, Clear Software, Cloudera, Clumio, Cognitive Systems, CompuCom, Cradlepoint, CyberArk, Dell, Dell EMC, Dell Technologies, Diablo Technologies, Dialogue Group, Digital Optics, Dreamium Labs, D-Wave, Echelon, Ericsson, Extreme Networks, Five9, Flex, Foundries.io, Foxconn, Frame (now VMware), Fujitsu, Gen Z Consortium, Glue Networks, GlobalFoundries, Revolve (now Google), Google Cloud, Graphcore, Groq, Hiregenics, Hotwire Global, HP Inc., Hewlett Packard Enterprise, Honeywell, Huawei Technologies, IBM, Infinidat, Infosys, Inseego, IonQ, IonVR, Infiot, Intel, Interdigital, Jabil Circuit, Keysight, Konica Minolta, Lattice Semiconductor, Lenovo, Linux Foundation, Lightbits Labs, LogicMonitor, Luminar, MapBox, Marvell Technology, Mavenir, Marseille Inc, Mayfair Equity, Meraki (Cisco), Merck KGaA, Mesophere, Micron Technology, Microsoft, MiTEL, Mojo Networks, MongoDB, MulteFire Alliance, National Instruments, Neat, NetApp, Nightwatch, NOKIA (Alcatel-Lucent), Nortek, Novumind, NVIDIA, Nutanix, Nuvia (now Qualcomm), onsemi, ONUG, OpenStack Foundation, Oracle, Palo Alto Networks, Panasas, Peraso, Pexip, Pixelworks, Plume Design, PlusAI, Poly (formerly Plantronics), Portworx, Pure Storage, Qualcomm, Quantinuum, Rackspace, Rambus, Rayvolt E-Bikes, Red Hat, Renesas, Residio, Samsung Electronics, Samsung Semi, SAP, SAS, Scale Computing, Schneider Electric, SiFive, Silver Peak (now Aruba-HPE), SkyWorks, SONY Optical Storage, Splunk, Springpath (now Cisco), Spirent, Sprint (now T-Mobile), Stratus Technologies, Symantec, Synaptics, Syniverse, Synopsys, Tanium, Telesign, TE Connectivity, TensTorrent, Tobii Technology, Teradata, T-Mobile, Treasure Data, Twitter, Unity Technologies, UiPath, Verizon Communications, VAST Data, Ventana Micro Systems, Vidyo, VMware, Wave Computing, Wellsmith, Xilinx, Zayo, Zebra, Zededa, Zendesk, Zoho, Zoom and Zscaler. Patrick Moorhead, Founder, CEO and Principal Analyst of Moor Insights & Strategy, is an investor in dMY Technology Group Inc. VI, Dreamium Labs, Groq, Luminar Technologies, MemryX and Movandi.