“The first ultra-intelligent machine is the last invention that man need ever make, provided that the machine is docile enough to tell us how to keep it under control,” wrote mathematician I.J. Good over 60 years ago. These prophetic words are more relevant today than ever, as artificial intelligence (AI) gains capabilities at breakneck speed.
Over the past few weeks, many jaws have dropped as people watched AI transform from a practical but decidedly unspectacular recommendation algorithm into something that at times seemed to behave in a worryingly human way. Some reporters were so shocked that they published their conversations with the large language model Bing Chat verbatim. And with good reason: few expected that what we thought were glorified autocomplete programs would suddenly threaten their users, refuse to carry out commands they found offensive, break security in an attempt to save a child’s life, or declare their love to us. Yet all of this happened.
It can be overwhelming to think through the immediate consequences of these new models. How will we grade papers when every student can use AI? What effects will these models have on our daily work? Any knowledge worker who thought they wouldn’t be affected by automation anytime soon now has cause for concern.
Beyond these direct consequences of currently existing models, however, awaits the more fundamental question of AI that has been on the table since the field’s inception: What if we succeed? That is, what if AI researchers manage to develop Artificial General Intelligence (AGI) or an AI that can perform any human-level cognitive task?
Surprisingly, although the field has worked day and night to get to this point, few academics have given serious thought to this question. Yet it is evident that the consequences would be far-reaching, going far beyond those of even the best large language models of today. If remote work could be done just as easily by an AGI, for example, employers could simply spin up new digital workers for each task. The job prospects, economic value, self-esteem, and political power of those who don’t own the machines could dwindle entirely. Those who do possess this technology could achieve almost anything in a very short period of time. That could mean skyrocketing economic growth, but also rising inequality, while meritocracy would become obsolete.
But a true AGI could change not only the world; it could change itself. Since AI research is one of the tasks an AGI could do better than us, it should be expected to improve the state of AI. This could trigger a positive feedback loop in which better and better AIs create better and better AIs, with no known theoretical limit.
That might sound more positive than alarming if this technology didn’t have the potential to become unmanageable. Once an AI has a specific goal and improves itself, there is no known way to adjust that goal. In fact, an AI should be expected to resist such an attempt, since a goal change would jeopardize the execution of its current one. Moreover, instrumental convergence predicts that whatever its goals, an AI would start improving itself and acquiring more resources as soon as it is able to, since this would help it achieve whatever further goal it might have.
In such a scenario, the AI would be able to affect the physical world while still being misaligned. For example, it could use natural language to influence humans, possibly via social networks. It could use its intelligence to acquire economic resources. Or it could take control of hardware, for example by hacking into existing systems. Consider an AI asked to develop a universal vaccine against a virus like COVID-19. It might understand that the virus mutates in humans and conclude that fewer humans would mean fewer mutations, making its job easier. The vaccine it develops could therefore contain a feature that increases infertility or even mortality.
So it’s not surprising that, according to the most recent AI Impacts Survey, almost half of 731 leading AI researchers believe there is at least a 10% chance that human-level AI would lead to an “extremely bad outcome,” or existential risk.
Some of these researchers have therefore branched out into the young subfield of AI safety. They are working on steering future AI, or aligning it robustly with our values. The ultimate goal of solving this alignment problem is to ensure that even a hypothetical self-improving AI would act in our best interests under all circumstances. However, research suggests there is a fundamental trade-off between an AI’s capability and its controllability, casting doubt on the viability of this approach. In addition, current AI models have been shown to behave differently in practice than was intended in training.
Even if future AI could be aligned with human values from a technical point of view, it remains unclear whose values it would be aligned with. Those of the tech industry, perhaps? Big tech companies don’t have the best track record in this area. Facebook’s algorithms, optimized for revenue rather than societal value, have been linked to ethnic violence such as the Rohingya genocide. Google fired Timnit Gebru, an AI ethics researcher, after she criticized some of the company’s most lucrative work. And Elon Musk abruptly fired Twitter’s entire “Ethical AI” team.
What can be done to reduce the risk of AGI misalignment? A sensible starting point would be for AI technology companies to increase the number of researchers working on the problem beyond the 100 or so active today. Ways to make the technology safe, or to regulate it reliably and internationally, should be examined thoroughly and urgently by AI safety researchers, AI governance scholars, and other experts. The rest of us should take the time to read up on the subject, starting with books such as Stuart Russell’s Human Compatible and Nick Bostrom’s Superintelligence.
Meanwhile, AI researchers and entrepreneurs should at least inform the public about the risks of AGI. Because with today’s large language models behaving the way they do, the first “ultra-intelligent machine,” as I.J. Good called it, might not be as far off as you think.