By Prof. Smitha Rao, School of Computational and Data Science, Vidyashilp University
Since all things have causes, the knowledge of anything is not acquired or complete unless it is known through its causes. – Ibn Sina
Data science is the science of extracting information and value from data to enable data-driven decision-making. Essentially, it is about using data to solve real-world problems. As an interdisciplinary science, data science borrows its theories and practices from statistics, mathematics, and computer science, as well as from application areas such as economics, the social sciences, psychology, healthcare, business administration, and finance. The technologies driving its advancement include artificial intelligence, machine learning, deep learning, reinforcement learning, and natural language processing, to name a few. Data scientists aim to make reliable predictions and draw sound conclusions through in-depth analysis of large amounts of current and historical multivariate data.
Recent advances in data availability, new technologies, greater computing power, and the maturity of AI algorithms have made data science applicable to nearly every field. Aside from unlocking value in business applications, it is now considered an indispensable and integral part of fields such as the social sciences, drug discovery, the life sciences, and molecular biology. Artificial intelligence, especially machine learning, has made notable contributions to a range of scientific research and development initiatives. This article examines two use cases where data science is being used to improve life and society: data science in development economics, and data science in proteomics and drug discovery.
Data Science in Development Economics
Conducting a national census is a very expensive and laborious task. Many developing and underdeveloped countries have conducted few poverty surveys, leaving policymakers and researchers without the reliable data they need to develop robust solutions. These obstacles have paved the way for using machine learning algorithms to predict such parameters from alternative, correlated data. For example, poverty levels can be predicted from cellphone records and from high-resolution satellite imagery, in which features such as metal roofs, paved roads, and nighttime light intensity help identify higher-income areas and predict wealth accurately. A variety of AI algorithms are also used to detect, manage, and predict other socio-economic and agricultural outcomes, such as crop yields, weed infestation, and literacy levels.
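The idea of predicting wealth from correlated proxy features can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic, the three feature names (metal-roof share, paved-road density, night-light intensity) and their weights are invented, and ordinary least squares stands in for whatever model a real study would use. It fits on "surveyed" regions and checks the fit on held-out ones.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares with an intercept term; returns the weights."""
    X1 = np.column_stack([np.ones(len(X)), X])
    w, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return w

def predict(w, X):
    """Apply fitted weights (intercept first) to a feature matrix."""
    X1 = np.column_stack([np.ones(len(X)), X])
    return X1 @ w

def r_squared(y_true, y_pred):
    """Share of variance in y_true explained by y_pred."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic stand-in data: one row per region, three hypothetical proxies
# (metal-roof share, paved-road density, night-light intensity), each in [0, 1].
rng = np.random.default_rng(0)
n_regions = 500
X = rng.uniform(0.0, 1.0, size=(n_regions, 3))

# Invented relationship between the proxies and a wealth index, plus noise.
assumed_weights = np.array([0.5, 0.3, 0.2])
wealth = X @ assumed_weights + rng.normal(0.0, 0.05, size=n_regions)

# Pretend only the first 400 regions were covered by a ground survey.
surveyed, unsurveyed = slice(0, 400), slice(400, n_regions)
w = fit_linear(X[surveyed], wealth[surveyed])
r2 = r_squared(wealth[unsurveyed], predict(w, X[unsurveyed]))
print(f"held-out R^2: {r2:.2f}")
```

In practice the proxies would first have to be extracted from imagery or call records, and the relationship is unlikely to be this clean; the sketch only shows the train-on-surveyed, predict-on-unsurveyed pattern.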
Data Science in Proteomics and Drug Discovery
Proteomics is the large-scale study of proteins and their structure. Proteins, present in every cell in the human body, are the basic building blocks of life. The folded three-dimensional structure of a protein determines its function. Predicting a protein's 3D structure from its sequence is a long-standing and intensively researched problem, owing to the large number of proteins and the computationally intensive nature of the task. Data in this area is massive and highly complex. Today, deep learning systems such as AlphaFold have made major progress on this difficult problem by predicting protein structures from their sequences with relatively high accuracy. This has paved the way for relevant drug discovery and for ongoing research into diseases caused by aberrant protein structures, such as Alzheimer's and Parkinson's. Machine learning also contributes to clinical research by increasing the effectiveness of the pre-trial phase, data analysis, and participant selection and management.
Data science generally relies on credible and unbiased data, which is not always readily available, and AI predictions are never 100% accurate. Given this uncertainty, reliable systems are defined as those that support active human-machine interaction. This mechanism is called a “Human-in-the-Loop” (HITL) system. HITL lets the system adapt through constant feedback from humans, which makes it more reliable and improves the learning process. We believe that the expanding power of AI will enable us to find concrete solutions to most of the problems plaguing the world today.
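One common way to realize a human-in-the-loop arrangement is to let the model decide only when it is confident and to route uncertain cases to a person, keeping the human's answers as feedback for later retraining. The sketch below assumes that design; the class name, the 0.8 threshold, the toy model, and the toy reviewer are all hypothetical and are not taken from any particular system.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

@dataclass
class HumanInTheLoop:
    """Routes low-confidence predictions to a human and records the feedback."""
    model: Callable[[float], Tuple[str, float]]   # returns (label, confidence)
    ask_human: Callable[[float], str]             # returns the human's label
    threshold: float = 0.8                        # assumed confidence cut-off
    feedback: List[Tuple[float, str]] = field(default_factory=list)

    def decide(self, x: float) -> Tuple[str, str]:
        label, confidence = self.model(x)
        if confidence >= self.threshold:
            return label, "machine"
        # Uncertain case: defer to the human and keep the answer for retraining.
        human_label = self.ask_human(x)
        self.feedback.append((x, human_label))
        return human_label, "human"

def toy_model(x: float) -> Tuple[str, float]:
    """Hypothetical classifier: confidence grows with distance from 0.5."""
    label = "high" if x >= 0.5 else "low"
    confidence = min(1.0, 0.5 + abs(x - 0.5))
    return label, confidence

def toy_reviewer(x: float) -> str:
    """Hypothetical human reviewer who applies a stricter cut-off."""
    return "high" if x >= 0.6 else "low"

hitl = HumanInTheLoop(model=toy_model, ask_human=toy_reviewer)
results = [hitl.decide(x) for x in (0.9, 0.55)]
print(results)
print(hitl.feedback)
```

The collected `feedback` pairs are what a real system would feed back into training; the retraining step itself is omitted here.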