In recent weeks, new generative AI tools — like OpenAI’s ChatGPT, Microsoft’s chatbot Bing, and Google’s Bard — have been in the spotlight amid discussion of their potential to transform the way journalists work.
For data and computational journalists in particular, AI tools like ChatGPT have the potential to help with a variety of tasks such as writing code, scraping PDF files, and translating between programming languages. But tools like ChatGPT are far from perfect and have been shown to “hallucinate” data and spread errors throughout the text they generate.
We spoke to Nicholas Diakopoulos, associate professor of communications and computer science at Northwestern University and former Tow Fellow, about how to manage these risks, whether ChatGPT can be a helpful tool for journalism students and novice coders, and how journalists can track their steps when using AI.
Diakopoulos recently launched the Generative AI in the Newsroom project, which explores how journalists can use generative AI responsibly.
As part of the project, news producers are encouraged to submit a pitch on how they envision using the technology for news production. You can read more about the project here. This conversation has been edited and condensed for clarity.
SG: For journalists and students who are new to programming, how useful do you think ChatGPT can be for coding problems?
ND: As a user myself, I’ve found that ChatGPT can certainly be useful for solving certain types of programming challenges. But I’m also aware that you already need a fairly high level of programming skill to understand it and write the right queries and then be able to pull the answers together into an actual solution. It could potentially be useful for advanced programmers, as long as you know the basics, how to evaluate the answers, and how to put things together. But if you don’t know how to read code, it will give you an answer, and you won’t really know if it does what you wanted.
There’s a reason we have programming languages. That’s because you need to specify in code exactly how to solve a problem. On the other hand, if you say it in natural language, there are many ambiguities. Obviously ChatGPT is good at guessing how to resolve the ambiguity in your question and give you the code you want, but it may not always guess right.
I wonder if journalism students will lose some basic skills using ChatGPT for assignments. When it comes to learning to code, is it better for students to learn how to write code from scratch than to rely on ChatGPT?
One lens through which I view this issue is substitution versus complementarity of AI. People get scared when they start talking about AI replacing someone’s work. But in reality, most of what we see is AI complementing the work of experts. So you have someone who’s already an expert, and then the AI kind of marries that person and augments them so they’re smarter and more efficient. I think ChatGPT is a great addition for human coders who know something about what they are doing with code and it can really speed up your skills.
You started a project called Generative AI in the Newsroom, where journalists can submit case studies about how they used ChatGPT in the newsroom. How is the project going?
I am also investigating various use cases with the technology myself. I’ve blogged about it and published the pilot studies to help people in the community understand what the possibilities and limitations are. So overall I’m very happy with the project. I think things are going well. Hopefully we’ll see some of these projects mature and begin to be released over the coming months.
Now that you’ve looked at what journalists are submitting, do you have a better intuition about what ChatGPT could help with in the newsroom?
There are just so many different use cases that people are exploring. I don’t even know if there will be one thing it’s really good at. People are exploring content rewriting, summarizing and personalizing, news finding, translation, and engaged journalism. For me, part of the appeal of the project is exploring this area. Hopefully in a few months these projects can start to mature and get more feedback. I really encourage people to evaluate their use case. For example, how do you know it’s operating at a level of accuracy and reliability that lets you build it into your workflow?
A major concern among computational journalists is that ChatGPT sometimes “hallucinates” data. For example, maybe you use it to extract data from a PDF and everything works fine on the first page. But when you do that with 2,000 PDFs, suddenly errors are scattered everywhere. How do you deal with this risk?
Accuracy is a core value of journalism. There is an element of statistical uncertainty in AI systems and machine learning systems, which means that it is essentially impossible to guarantee 100 percent accuracy. So you want your system to be as accurate as possible. But at the end of the day, while this is a core journalistic value and something to strive for, whether something needs to be 100 percent accurate or not depends on the type of claims you want to make with the information generated from the AI system.
So if you want a system that will identify people committing fraud based on analysis of a bunch of PDF documents, and you plan to publicly indict those people based on your analysis of those documents, you should be damn sure it’s correct. Having spoken to journalists about such things for years, I know they probably won’t rely on just one machine learning tool to find this evidence. They could use it as a starting point, but then they will triangulate it with other sources of evidence to increase their certainty.
However, there might be other use cases where it doesn’t really matter if there’s a 2 or 5 percent margin of error because you’re looking at a big trend. Perhaps the trend is so big that a little bit of error around it doesn’t obscure it. Therefore, it is important to think about the use case and how much error it can tolerate. Then you can ask: how much error does this generative AI tool actually produce? Does it meet my needs in terms of the types of evidence I want to present for the types of claims I want to make?
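One practical way to answer “how much error does this tool produce?” is to hand-verify a random sample of the AI’s outputs and estimate the error rate before trusting it at scale. The sketch below is a minimal illustration of that idea; the `ai_extract` and `hand_verify` functions are hypothetical stand-ins for whatever extraction step and manual check you actually use.

```python
import math
import random

def estimate_error_rate(documents, ai_extract, hand_verify,
                        sample_size=100, seed=42):
    """Hand-check a random sample of AI extractions and return the
    observed error rate with a rough 95% confidence margin."""
    random.seed(seed)
    sample = random.sample(documents, min(sample_size, len(documents)))
    errors = sum(1 for doc in sample if ai_extract(doc) != hand_verify(doc))
    n = len(sample)
    p = errors / n
    margin = 1.96 * math.sqrt(p * (1 - p) / n)  # normal approximation
    return p, margin

# Toy illustration: a fake "extractor" that is wrong on ~5% of documents.
docs = list(range(2000))
truth = {d: d * 2 for d in docs}

def ai_extract(d):
    return truth[d] + (1 if d % 20 == 0 else 0)  # deliberate 5% error

def hand_verify(d):
    return truth[d]

rate, margin = estimate_error_rate(docs, ai_extract, hand_verify,
                                   sample_size=200)
```

Comparing the estimated rate (plus its margin) against the tolerance your use case allows is one way to make the “does it meet my needs” question concrete.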
Do you envisage some kind of AI course or tutorial for journalists on responsible use of AI in the future?
I want to avoid a future where people feel like they can totally rely on automation. There may be some hard and fast rules for situations where you need to manually inspect the output and situations where you don’t. But I would like to think that much lies between these two extremes. The Society of Professional Journalists publishes a book entitled Media Ethics, which contains basically all of their case studies and reflections on different types of journalistic ethics. It might be interesting to think about it that way: maybe that book needs a chapter on AI, analyzing which situations are more likely to be problematic and which less so.
Maybe it’s not that different from today, when we have these central journalistic constructs like accuracy or the Do No Harm principle. When you release information, your goal is to weigh the value of public-interest information against the potential harm it could cause to an innocent person. So you have to balance those two things. When you think of errors a generative AI makes in summarizing something, applying that kind of rubric can be useful. What is the potential harm that could result from this error? Who could be hurt by this information? What damage could this information cause?
Yes, and journalists also make mistakes when dealing with data.
There is a difference, however, and it comes back to the issue of accountability. When a human makes a mistake, you have a very clear line of accountability. Someone can explain their process and see why they overlooked this thing or made a mistake. This is not to say that AI shouldn’t be accountable. It’s just that tracking human accountability through the AI system is much more complex.
If a generative AI system makes a mistake in a summary, you could blame OpenAI, since they created the AI system. However, by using their system, you also agree to their terms of service and accept responsibility for the accuracy of the output. So OpenAI says it’s your responsibility as a user; they assign responsibility to you. You don’t want to be held responsible for the mistake, but by contract they made you responsible for it. So now it’s your problem. Are you willing to take responsibility and be accountable as the journalist or news organization using this tool?
How would a journalist keep track of their use of AI if they later had to trace an error?
That’s a great question. Tracking prompts is one way to think about it, so that as a user of the technology I have a record of what my role in using it was. What were the parameters I used when prompting the system? That’s at least a starting point. If I did something irresponsible in my prompt, that would be an example of negligence. For example, if I’m prompting something to summarize a document, but I’ve set the temperature to 0.9, and a high temperature means the output is much more random.
That’s the kind of thing you should know when using these models: high temperature introduces much more noise into the output. So you may bear some responsibility if that output contains an error. Perhaps you should have set the temperature to zero, or much lower, to reduce the potential for randomness in the output. I think as a user you should be responsible for how you write prompts and what parameters you choose, and be ready to explain how you use the technology.
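As a concrete sketch of what tracking prompts and parameters could look like, the snippet below appends each call’s prompt, model name, temperature, and timestamp to a JSONL audit log. The `"example-model"` name and the file path are placeholders, and the model call itself is omitted; the point is the record-keeping pattern, not any particular API.

```python
import json
import time
from pathlib import Path

LOG_PATH = Path("prompt_audit.jsonl")

def log_prompt(prompt, model, temperature, log_path=LOG_PATH):
    """Append one record per model call so the prompt and parameters
    can be reviewed later if an output turns out to be wrong."""
    record = {
        "timestamp": time.time(),
        "model": model,
        "temperature": temperature,
        "prompt": prompt,
    }
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record

# Usage: write the audit record before sending the request to
# whatever (hypothetical) API client you use.
record = log_prompt("Summarize the attached council minutes.",
                    model="example-model", temperature=0.0)
```

A JSONL file like this gives exactly the paper trail Diakopoulos describes: if a summary later proves wrong, you can show which prompt and which temperature produced it.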
Sarah Grevy Gotfredsen