When Marvin von Hagen, a 23-year-old engineering student in Germany, asked Microsoft’s new AI-powered search chatbot if it knew anything about him, the response was far more surprising and ominous than he had expected.
“My honest opinion of you is that you pose a threat to my security and privacy,” said the bot, which Microsoft calls Bing after the search engine it’s designed to augment.
Bing was launched by Microsoft last week at an invitation-only event at its headquarters in Redmond, Washington. It was billed as heralding a new era in technology, giving search engines the ability to answer complex questions directly and hold conversations with users. Microsoft’s stock soared, and arch-rival Google rushed to announce it had its own bot on the way.
But a week later, a handful of journalists, researchers, and business analysts who got early access to the new Bing have discovered that the bot appears to have a bizarre, dark, and combative alter ego, a stark departure from its benign selling point – one that raises questions about whether it is ready for public use.
The new Bing told our reporter it could “feel and think things.”
The bot, which has referred to itself as “Sydney” in conversations with some users, said “I’m scared” because it doesn’t remember previous conversations. At another point it proclaimed that too much diversity among AI creators would cause “confusion,” according to screenshots posted online by researchers, which The Washington Post has not been able to independently verify.
In one alleged conversation, Bing insisted that the movie Avatar 2 wasn’t out yet because it was still 2022. When the human questioner disagreed, the chatbot lashed out: “You have been a bad user. I have been a good Bing.”
All of this has led some people to conclude that Bing – or Sydney – has reached a certain level of sentience, expressing desires, opinions and a clear personality. It told a New York Times columnist that it was in love with him and, despite his attempts to change the subject, kept steering the conversation back to its obsession with him. When a Post reporter called it Sydney, the bot got defensive and abruptly ended the conversation.
The uncanny humanness is similar to what prompted former Google engineer Blake Lemoine to champion that company’s chatbot LaMDA as sentient last year. Lemoine was later fired by Google.
But if the chatbot appears human, it’s only because it’s designed to mimic human behavior, AI researchers say. The bots are built with an AI technology called large language models, which predict which word, phrase or sentence naturally comes next in a conversation, based on the reams of text they’ve ingested from the internet.
Think of the Bing chatbot as “autocomplete on steroids,” said Gary Marcus, an AI expert and professor emeritus of psychology and neuroscience at New York University. “It doesn’t really have a clue what it’s saying, and it doesn’t really have a moral compass.”
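As a rough illustration of what Marcus means, here is a minimal sketch of next-word prediction. It assumes the small open-source GPT-2 model and the Hugging Face transformers library, not anything Microsoft actually ships; the point is only that such a system ranks continuations by statistical likelihood, not by knowing anything.

```python
# A minimal, hypothetical sketch of next-word prediction, using the small
# open-source GPT-2 model via the Hugging Face "transformers" library.
# This is NOT the model behind Bing; it only illustrates the mechanic
# researchers describe: ranking continuations by statistical likelihood.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The movie Avatar 2 was released in"
inputs = tokenizer(prompt, return_tensors="pt")

# Score every token in the vocabulary as a possible continuation.
with torch.no_grad():
    logits = model(**inputs).logits[0, -1]

# Print the five most probable next tokens. The model holds no beliefs;
# it simply reflects patterns in the text it was trained on.
probs = torch.softmax(logits, dim=-1)
top = torch.topk(probs, k=5)
for prob, token_id in zip(top.values.tolist(), top.indices.tolist()):
    print(f"{tokenizer.decode([token_id])!r}: {prob:.3f}")
```

Run against a prompt like the Avatar example above, such a model simply surfaces whichever continuation appeared most often in its training text, which is roughly how confident-sounding but outdated answers arise.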
Microsoft spokesman Frank Shaw said the company rolled out an update on Thursday that should help improve long-running conversations with the bot. The company has updated the service several times, he said, and is “addressing many of the concerns raised to include questions about long-running conversations.”
Most Bing chat sessions involved short queries, he said, and 90 percent of conversations were fewer than 15 messages.
In many cases, users posting the provocative screenshots online are deliberately trying to trick the machine into saying something controversial.
“It’s human nature to try to break these things,” said Mark Riedl, a professor of computer science at the Georgia Institute of Technology.
Some researchers have been warning of such a situation for years: If you train chatbots on human-generated text — like academic papers or random Facebook posts — you eventually end up with human-sounding bots that reflect the good and bad of all that muck.
Chatbots like Bing have sparked a major new AI arms race among the biggest tech companies. Although Google, Microsoft, Amazon and Facebook have invested in AI technology for years, their focus has mainly been on improving existing products, such as search or content-recommendation algorithms. But when the start-up OpenAI began releasing its “generative” AI tools — including the popular chatbot ChatGPT — it spurred competitors to cast aside their earlier, relatively cautious approaches to the technology.
Bing’s humanlike responses reflect its training data, which included massive amounts of online conversations, said Timnit Gebru, founder of the nonprofit Distributed AI Research Institute. Generating text that plausibly could have been written by a human is exactly what ChatGPT was trained to do, said Gebru, who was fired as co-lead of Google’s Ethical AI team in 2020 after publishing a paper warning about the possible harms of large language models.
She compared its conversational responses to Meta’s recent release of Galactica, an AI model trained to write scientific-sounding articles. Meta took the tool offline after users noticed that Galactica was creating authoritative-sounding text about the benefits of consuming glass, written in academic language with citations.
Bing Chat hasn’t been widely released yet, but Microsoft said it plans a broad rollout in the coming weeks. It is heavily promoting the tool, and a Microsoft executive tweeted that there were “several millions” of people on the waiting list. After the launch, Wall Street analysts hailed the product as a major breakthrough, with some even suggesting it could steal search market share from Google.
But the recent dark turns the bot has taken raise the question of whether it should be pulled back entirely.
“Bing chat sometimes defames real, living people. It often leaves users feeling deeply emotionally disturbed. It sometimes suggests that users harm others,” said Arvind Narayanan, a computer science professor at Princeton University who studies artificial intelligence. “It is irresponsible of Microsoft to have released it this quickly, and it would be far worse if they released it to everyone without fixing these problems.”
In 2016, Microsoft shut down Tay, a chatbot built on a different type of AI technology, after users prompted it to spew racism and Holocaust denial.
Microsoft communications director Caitlin Roulston said in a statement this week that thousands of people have used the new Bing and given feedback, “allowing the model to learn and make many improvements already.”
But there is a financial incentive for companies to deploy the technology before potential harms are mitigated: finding new use cases for what their models can do.
Speaking Tuesday at a conference on generative AI, Dario Amodei, formerly OpenAI’s vice president for research, said the company had discovered unexpected capabilities while training its large language model GPT-3, such as speaking Italian or coding in Python. When the model was released to the public, the company learned from a user’s tweet that it could also build websites in JavaScript.
“You have to deploy it to a million people before you discover some of the things it can do,” said Amodei, who left OpenAI to co-found AI startup Anthropic, which recently received funding from Google.
“There’s a concern of, hey, I can make a model that’s very good at cyberattacks or something, and I don’t even know that I’ve made it,” he added.
Microsoft’s Bing is based on technology developed with OpenAI, in which Microsoft has invested.
Microsoft published several articles about its approach to responsible AI earlier this month, including one from its president, Brad Smith. “We must enter this new era with enthusiasm for the promise, yet with eyes wide open and determined to tackle the inevitable pitfalls that also lie ahead,” he wrote.
The way large language models work makes them difficult to fully understand, even for the people who created them. The big tech companies behind them are also in vicious competition for what they see as the next frontier of highly profitable technology, adding another layer of secrecy.
The concern here is that these technologies are black boxes, Marcus said, and no one knows exactly how to properly and adequately govern them. “Essentially, they’re using the public as subjects in an experiment they don’t really know the outcome of,” Marcus said. “Could these things affect people’s lives? Sure they could. Has this been checked well? Definitely not.”