Image: Schematic diagram of PanGu training and its application to a variety of AI drug screening tasks. The top shows PanGu’s conditional variational autoencoder structure, pretrained on the chemical structures of 1.7 billion small molecules. Below is the AI-assisted drug screening flowchart, which consists of the AI tasks of compound-protein interaction prediction, molecular property prediction, drug-likeness optimization, and library creation, and shows the screening procedure from an initial molecular database to a hit candidate library, a lead-like library and a final screening parent library. Arrows indicate the tasks handled by PanGu Fingerprint and PanGu Decoder.
Photo credit: ©Science China Press
This study was led by Dr. Nan Qiao (Laboratory of Health Intelligence, Huawei Cloud Computing Technologies), Dr. Hualiang Jiang (Shanghai Institute of Materia Medica, Chinese Academy of Sciences) and Dr. Mingyue Zheng (Shanghai Institute of Materia Medica, Chinese Academy of Sciences). “Over the past year, the parameter counts of language models have continued to grow, exceeding the 175 billion of GPT-3. Recently, ChatGPT, a new-generation language model, has been interacting with users in more realistic ways, such as answering questions, admitting mistakes, challenging incorrect premises, or rejecting inappropriate requests, and is even thought to be a disruptor of search engines,” said Dr. Qiao.
Beyond language models, fields such as image, video and multimodal learning have likewise been reshaped by transformer architectures. These large models typically use self-supervised learning, which can significantly reduce labeling workload and achieve better performance on long-tail tasks. In the field of AI for drug research, however, there has not yet been a comparably large model to accelerate drug research and development and improve efficiency.
Xinyuan Lin and Zhaoping Xiong, together with laboratory director Nan Qiao, set out to build a large-scale drug discovery model that can be applied to tasks such as molecular property prediction, molecule generation and molecule optimization. The team proposes a novel asymmetric graph-to-sequence (graph2seq) architecture that differs from classical sequence-to-sequence (seq2seq) and graph-to-graph (graph2graph) variational autoencoding schemes. The model is pre-trained on 1.7 billion drug-like molecules (currently the largest such corpus); the input is the two-dimensional undirected graph of a drug-like molecule, and the output is the corresponding chemical formula, or SMILES string. Just as humans read pictures of chemical structures and write down the corresponding formula text, PanGu learns the relationship between chemical structures and formula strings over billions of repetitions, mimicking this human cognitive transformation (Figure 1).
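The graph-to-sequence idea — reading a 2D molecular graph and writing out a linear SMILES string — can be illustrated with a toy, non-neural sketch: a depth-first traversal of an adjacency list that emits atom symbols, with side chains in parentheses. This is purely illustrative (the function name and graph encoding are assumptions for this example) and is not PanGu’s actual decoder, which is a learned model; rings and bond orders are also omitted here.

```python
# Toy illustration of graph2seq: a molecule stored as a 2D undirected
# graph (atom symbols + bond pairs) is linearized into a SMILES-like
# string by depth-first traversal. Acyclic, single-bond molecules only.

def graph_to_smiles(atoms, bonds, start=0):
    """atoms: list of element symbols; bonds: list of (i, j) index pairs."""
    adj = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        adj[i].append(j)
        adj[j].append(i)

    visited = set()

    def dfs(node, parent):
        visited.add(node)
        out = atoms[node]
        children = [n for n in adj[node] if n != parent and n not in visited]
        # Branches go in parentheses; the last neighbour continues the chain.
        for k, child in enumerate(children):
            sub = dfs(child, node)
            out += f"({sub})" if k < len(children) - 1 else sub
        return out

    return dfs(start, None)

# Ethanol (C-C-O) linearizes to "CCO"
print(graph_to_smiles(["C", "C", "O"], [(0, 1), (1, 2)]))
# Isobutane, traversed from the central carbon: "C(C)(C)C"
print(graph_to_smiles(["C", "C", "C", "C"], [(0, 1), (0, 2), (0, 3)]))
```

The point of the sketch is only the direction of the mapping: the model’s input is the graph, and its output is the string, which is why the architecture is asymmetric rather than seq2seq or graph2graph.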
After pre-training on 1.7 billion drug-like small molecules, the model achieved state-of-the-art results in 20 drug discovery tasks, including molecular property prediction (ADMET properties, compound-protein interactions, drug-drug interactions and chemical reaction yields), molecular generation and molecular optimization. The PanGu molecular generator has also created a new drug screening library of 100 million drug-like small molecules with 99.68% novelty, and can generate new compounds whose physicochemical properties follow a target distribution. This library can be used to supplement existing compound databases. In addition, the PanGu molecular optimizer can refine the chemical structure of a starting molecule to improve properties of interest. A multi-objective automatic optimization web application built on the PanGu drug model is available at http://www.pangu-drug.com/.
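The 99.68% novelty figure for the generated library follows the usual definition of that metric: the fraction of unique generated molecules that do not appear in the reference (training) set. A minimal sketch of that computation, assuming molecules are compared as canonical SMILES strings (the function name and exact protocol are assumptions, not the paper’s code):

```python
def novelty(generated, reference):
    """Fraction of unique generated molecules absent from the reference set.

    generated, reference: iterables of canonical SMILES strings.
    """
    unique = set(generated)
    if not unique:
        return 0.0
    ref = set(reference)
    return sum(1 for smi in unique if smi not in ref) / len(unique)

# Two of the three unique generated molecules are new to the reference set.
print(novelty(["CCO", "CCN", "c1ccccc1"], ["CCO"]))  # 0.666...
```

In practice both sets would first be canonicalized with a cheminformatics toolkit so that different SMILES spellings of the same molecule compare equal.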
See the article:
PanGu Drug Model: Learn a molecule like a human
https://doi.org/10.1007/s11427-022-2239-y
Journal
Science China Life Sciences
Disclaimer: AAAS and EurekAlert! are not responsible for the accuracy of press releases posted to EurekAlert! by contributing institutions or for the use of any information through the EurekAlert system.