As machine learning models grow larger and more complex, they require faster and more energy-efficient hardware to perform calculations. Traditional digital computers are struggling to keep up.
An analog optical neural network could perform the same tasks as a digital one, such as image classification or speech recognition, but because computations are performed with light instead of electrical signals, optical neural networks can run many times faster while consuming less energy.
However, these analog devices are prone to hardware errors that can make calculations less precise. Microscopic flaws in hardware components are one cause of these errors. In an optical neural network with many connected components, errors can quickly accumulate.
Even with error correction techniques, some level of error is inevitable due to fundamental properties of the devices that make up an optical neural network. A network large enough to be implemented in the real world would be far too imprecise to be effective.
MIT researchers overcame this hurdle and found a way to effectively scale an optical neural network. By adding a tiny hardware component to the optical switches that make up the network architecture, they can reduce even the uncorrectable errors that would otherwise accumulate in the device.
Their work could enable a super-fast, low-power, analog neural network that can operate with the same accuracy as a digital one. With this technique, the larger an optical circuit becomes, the lower the amount of error in its computations.
“This is remarkable because it runs counter to the intuition about analog systems, where larger circuits are expected to have higher errors, so that errors limit scalability. This present paper allows us to address the question of the scalability of these systems with an unequivocal ‘yes’,” says lead author Ryan Hamerly, visiting scientist at the MIT Research Laboratory of Electronics (RLE) and Quantum Photonics Laboratory and principal scientist at NTT Research.
Hamerly’s co-authors are graduate student Saumil Bandyopadhyay and senior author Dirk Englund, associate professor in the MIT Department of Electrical Engineering and Computer Science (EECS), director of the Quantum Photonics Laboratory, and a member of RLE. The research is published in Nature Communications.
Multiplying with light
An optical neural network consists of many connected components that function like reprogrammable, tunable mirrors. These tunable mirrors are called Mach-Zehnder interferometers (MZIs). Neural network data is encoded in light, which a laser fires into the optical neural network.
A typical MZI contains two mirrors and two beam splitters. Light enters an MZI at the top, where it is split into two parts that interfere with each other before being recombined by the second beamsplitter and then reflected down to the next MZI in the array. Researchers can use the interference of these optical signals to perform complex linear algebra operations known as matrix multiplication, which neural networks use to process data.
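To make the link between interference and matrix multiplication concrete, here is a minimal sketch of a single ideal MZI written as a 2x2 matrix acting on the light amplitudes at its two ports. The symmetric beamsplitter matrix, the placement of the phase shifters on the top arm, and the function names are illustrative assumptions chosen for this example; they are not taken from the researchers' paper.

```python
import cmath
import math

def beamsplitter(eta=math.pi / 4):
    """Lossless beamsplitter matrix; eta = pi/4 is an ideal 50:50 split (assumed model)."""
    c, s = math.cos(eta), math.sin(eta)
    return [[c, 1j * s], [1j * s, c]]

def phase_shifter(angle):
    """Phase shift applied to the top arm only (assumed placement)."""
    return [[cmath.exp(1j * angle), 0], [0, 1]]

def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mzi(theta, phi):
    """Ideal MZI: external phase phi, splitter, internal phase theta, splitter."""
    t = phase_shifter(phi)
    for stage in (beamsplitter(), phase_shifter(theta), beamsplitter()):
        t = matmul(stage, t)  # each later stage multiplies from the left
    return t

def propagate(t, x):
    """Apply the 2x2 matrix t to input amplitudes x = [top, bottom]."""
    return [t[0][0] * x[0] + t[0][1] * x[1],
            t[1][0] * x[0] + t[1][1] * x[1]]

def power(v):
    """Total optical power carried by the two amplitudes."""
    return sum(abs(a) ** 2 for a in v)

# Light enters the top port only.
out = propagate(mzi(theta=0.0, phi=0.0), [1, 0])
print([round(abs(a) ** 2, 6) for a in out], round(power(out), 6))
```

In this convention, setting theta to 0 routes all of the light from the top port to the bottom port, and setting it to pi keeps the light on top. Intermediate values split the light between the two ports while conserving total power, and chaining many such 2x2 blocks is what lets an array of MZIs carry out a larger matrix multiplication.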
But errors that can occur in any MZI quickly add up as light travels from one device to the next. Some errors can be avoided by identifying them in advance and tuning the MZIs so that earlier errors are canceled out by later devices in the array.
“It’s a very simple algorithm once you know what the errors are. But these errors are notoriously difficult to detect because you only have access to the chip’s inputs and outputs,” says Hamerly. “This motivated us to look at whether it is possible to create calibration-free error correction.”
Hamerly and his collaborators previously demonstrated a mathematical technique that went one step further. They were able to successfully deduce the errors and tune the MZIs accordingly, but even this did not eliminate all errors.
Due to the fundamental nature of an MZI, there are cases where it is impossible to tune a device so that all light flows out of the bottom port to the next MZI. If the device loses a fraction of light at each step and the array is very large, only a tiny bit of power is left at the end.
“Even with error correction, there is a fundamental limit to how good a chip can be. MZIs are physically incapable of realizing certain settings that they need to be configured for,” he says.
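The limit described here can be illustrated with a small numerical sketch. It assumes a simple textbook model in which each of the MZI's two beamsplitters deviates from a perfect 50:50 split by a small angle (the hypothetical parameters alpha and beta below), then sweeps the internal phase theta to see how much of the light can be steered out of the top port. The model and the names are assumptions made for illustration, not the researchers' own formulation.

```python
import cmath
import math

def bar_power(theta, alpha=0.0, beta=0.0):
    """Fraction of top-port input power that still exits the top port.

    alpha and beta are hypothetical splitter-angle errors (in radians) away
    from the ideal 50:50 angle pi/4; theta is the internal phase shift.
    """
    eta1 = math.pi / 4 + alpha
    eta2 = math.pi / 4 + beta
    # Top-left element of splitter(eta2) * phase(theta) * splitter(eta1).
    t11 = (cmath.exp(1j * theta) * math.cos(eta1) * math.cos(eta2)
           - math.sin(eta1) * math.sin(eta2))
    return abs(t11) ** 2

def reachable_range(alpha, beta, steps=2001):
    """Sweep theta over [0, 2*pi] and return (min, max) top-port power."""
    powers = [bar_power(2 * math.pi * k / (steps - 1), alpha, beta)
              for k in range(steps)]
    return min(powers), max(powers)

print(reachable_range(0.0, 0.0))    # perfect splitters
print(reachable_range(0.05, 0.03))  # slightly imperfect splitters
```

With perfect splitters the sweep covers essentially the whole range from 0 to 1. With the small errors in the second call, the minimum no longer reaches zero: under this model it equals sin²(alpha + beta), so roughly 0.6 percent of the light stays in the top port no matter how theta is set. No amount of tuning sends all of the light through the bottom port, which is the kind of setting a standard MZI cannot realize.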
So the team developed a new type of MZI. The researchers added an additional beamsplitter to the end of the device and called it a 3-MZI because it has three beamsplitters instead of two. Because of the way this additional beamsplitter mixes the light, it becomes much easier for an MZI to reach the setting it needs to send all the light through its bottom port.
Importantly, the additional beamsplitter is only a few microns in size and is a passive component, so it requires no extra wiring. Adding these beamsplitters does not significantly change the size of the chip.
Bigger chip, fewer errors
When the researchers ran simulations to test their architecture, they found that it can eliminate much of the uncorrectable error that affects accuracy. And as the optical neural network gets larger, the amount of error in the device actually goes down, the opposite of what happens in a device with standard MZIs.
With 3-MZIs, they could potentially create a device large enough for commercial use with errors reduced by a factor of 20, Hamerly says.
The researchers also developed a variant of the MZI design specifically for correlated errors. These arise from manufacturing imperfections: if a chip’s thickness is slightly off, the MZIs may all be off by about the same amount, so the errors are all roughly the same. They found a way to change the configuration of an MZI to make it robust to this type of error. This technique also increased the bandwidth of the optical neural network, allowing it to run three times faster.
Now that they have demonstrated these techniques using simulations, Hamerly and his collaborators plan to test these approaches on physical hardware and continue working toward an optical neural network that they can use effectively in the real world.
This research is funded, in part, by a National Science Foundation graduate research fellowship and the U.S. Air Force Office of Scientific Research.