Meet MultiRay: Meta AI’s New Platform For Efficiently Running Large-Scale Artificial Intelligence (AI) Models

Today’s state-of-the-art AI systems for text, images, and other modalities achieve their best performance by first training a huge model on a huge amount of data, and then fine-tuning that model to specialize in a single task (e.g., detecting malicious language). The result is a high-quality but high-priced specialized tool. When there are many problems to solve, the cost of maintaining so many massive models quickly escalates out of control. As a result, huge state-of-the-art models are rarely used in production; much smaller and simpler models are typically used instead.

Meta AI researchers have created MultiRay, a new platform for running cutting-edge AI models at scale, to make AI systems more efficient. MultiRay allows multiple models to share the same input, so each model consumes only a fraction of the processing time and resources, minimizing the overall cost of these AI-based operations. By centralizing computation on a single model, MultiRay also makes it easier to allocate resources such as AI accelerators and data storage and to trade off strategically between them. The universal models in MultiRay have been fine-tuned to excel across a variety of applications.

Teams across Meta can use MultiRay to develop and refine machine learning (ML) models for features such as subject tagging of posts and hate speech detection. This approach is more time- and labor-efficient than having multiple teams independently build huge end-to-end models.

MultiRay makes Meta’s large core models broadly accessible by offloading computation to specialized hardware such as graphics processing units (GPUs), and it minimizes the time and energy spent on recomputation by keeping frequently used results in memory (a cache). MultiRay currently powers over 125 use cases at Meta, supporting up to 20 million queries per second (QPS) and 800 billion queries per day.

MultiRay uses huge foundational models that represent an input as a point in a high-dimensional vector space. This embedding is a representation of the input that is more amenable to machine learning. Rather than processing raw inputs (e.g., text and images), task-specific models consume the embeddings MultiRay provides, which simplifies those models considerably. MultiRay’s core models are trained to perform well on various tasks, including similarity and classification. Because they must carry this additional, task-agnostic information, the embeddings are large (several kilobytes in size).
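To make the idea concrete, here is a minimal sketch of how one expensive universal encoder can feed several cheap task-specific heads. All names (`universal_encode`, `topic_head`, `hate_speech_head`) and the embedding dimension are hypothetical stand-ins for illustration, not the actual MultiRay API:

```python
# Sketch of embedding reuse: one large encoder runs once per input; small
# task heads consume the embedding instead of the raw input. (Hypothetical
# names and shapes; not MultiRay's real interface.)

import numpy as np

EMBED_DIM = 1024  # assumed dimensionality; 1024 float32s is ~4 KB


def universal_encode(text: str) -> np.ndarray:
    """Stand-in for the large foundational model (the expensive step)."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(EMBED_DIM).astype(np.float32)


def topic_head(embedding: np.ndarray) -> str:
    """Cheap task-specific head: subject tagging."""
    return "sports" if embedding.mean() > 0 else "news"


def hate_speech_head(embedding: np.ndarray) -> float:
    """Cheap task-specific head: hate-speech score in [0, 1]."""
    return float(1 / (1 + np.exp(-embedding[:8].sum())))


post = "example post text"
emb = universal_encode(post)  # computed once, several KB
print(topic_head(emb), hate_speech_head(emb))  # reused by many tasks
```

The expensive encoder runs once per input; every additional task reuses the same several-kilobyte vector, which is where the amortization across teams comes from.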

Centralized, massive models offer the following advantages:

  1. Costs amortized across multiple teams
  2. Reduced production and operational complexity
  3. Faster path from research to production: the rate of change is localized to the central model

The external MultiRay API accepts one request at a time. To process the high volume of requests arriving from many clients simultaneously, MultiRay uses an internal batching mechanism that aggregates requests across clients. The batching logic only needs to be written once and can be tuned to produce optimally sized batches for the model and hardware. Even significant performance changes, such as using a larger batch size when migrating to the latest generation of GPU accelerator hardware, are completely transparent to the clients making the requests.
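The following is a minimal sketch of what such cross-client batching could look like, using Python’s asyncio. The names, batch size, and timeout are assumptions for illustration, not MultiRay’s actual internals:

```python
# Sketch of transparent request batching: clients await single requests;
# a background worker drains the queue into batches sized for the hardware.

import asyncio

MAX_BATCH = 16     # assumed: tuned per model and accelerator generation
MAX_WAIT_MS = 5    # assumed: how long to wait for a batch to fill


async def run_model_batch(texts):
    """Stand-in for one batched forward pass on the accelerator."""
    await asyncio.sleep(0.001)  # simulated GPU work
    return [f"embedding({t})" for t in texts]


async def batch_worker(queue):
    """Groups single requests into batches; callers never see the batching."""
    loop = asyncio.get_running_loop()
    while True:
        batch = [await queue.get()]  # block until the first request arrives
        deadline = loop.time() + MAX_WAIT_MS / 1000
        while len(batch) < MAX_BATCH and (left := deadline - loop.time()) > 0:
            try:
                batch.append(await asyncio.wait_for(queue.get(), left))
            except asyncio.TimeoutError:
                break
        results = await run_model_batch([text for text, _ in batch])
        for (_, fut), result in zip(batch, results):
            fut.set_result(result)  # each caller gets exactly one reply


async def embed(queue, text):
    """The per-request external API: one input in, one embedding out."""
    fut = asyncio.get_running_loop().create_future()
    await queue.put((text, fut))
    return await fut


async def main():
    queue = asyncio.Queue()
    worker = asyncio.create_task(batch_worker(queue))
    results = await asyncio.gather(*(embed(queue, f"post {i}") for i in range(40)))
    print(len(results), results[0])
    worker.cancel()

asyncio.run(main())
```

Because the batching lives behind the per-request API, changing `MAX_BATCH` for new hardware requires no client changes at all, which is the point the paragraph above makes.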

To minimize the time and energy required for recomputation, MultiRay uses a cache. It is a multi-level cache designed to save money and energy, with each level trading slower access times for greater capacity and higher hit rates. Each MultiRay server has its own fast but limited RAM-based local cache. These local caches are backed by a slower but much larger flash-based, globally distributed cache.
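A minimal sketch of this two-level lookup, assuming a simple LRU for the local RAM tier and a plain dictionary standing in for the distributed flash tier (the real system is, of course, far more sophisticated):

```python
# Sketch of a two-level cache: small fast per-server LRU in RAM, backed by
# a larger, slower shared tier. The large model runs only on a full miss.
# (Hypothetical structure; not MultiRay's actual code.)

from collections import OrderedDict


class TwoLevelCache:
    def __init__(self, local_capacity, global_store, compute):
        self.local = OrderedDict()        # fast, limited RAM-based cache
        self.local_capacity = local_capacity
        self.global_store = global_store  # stand-in for the distributed flash tier
        self.compute = compute            # expensive model forward pass

    def get(self, key):
        if key in self.local:             # L1 hit: fastest path
            self.local.move_to_end(key)
            return self.local[key]
        if key in self.global_store:      # L2 hit: slower, but skips the GPU
            value = self.global_store[key]
        else:                             # full miss: run the large model
            value = self.compute(key)
            self.global_store[key] = value
        self.local[key] = value           # promote into the local LRU
        if len(self.local) > self.local_capacity:
            self.local.popitem(last=False)  # evict least recently used
        return value


cache = TwoLevelCache(local_capacity=2, global_store={},
                      compute=lambda k: f"emb({k})")
print(cache.get("post-1"), cache.get("post-1"))  # miss, then local hit
```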


Check out the reference article. All credit for this research goes to the researchers on this project. Also, don’t forget to join our Reddit page and Discord channel, where we share the latest AI research news, cool AI projects, and more.


Tanushree Shenwai is a Consulting Intern at MarktechPost. She is currently pursuing her B.Tech from Indian Institute of Technology (IIT), Bhubaneswar. She is a data science enthusiast and is very interested in the application areas of artificial intelligence in various fields. She is passionate about exploring new technological advances and their application in real life.