3 Ways to Tame ChatGPT

Governments around the world are pushing AI regulation that has nothing to say about generative models. That could be dangerous.

This year, we’ve seen the introduction of powerful generative AI systems that can create images and text on demand.

At the same time, regulators are on the move. Europe is in the middle of finalizing its AI regulation (the AI Act), which aims to put strict rules on high-risk AI systems. Canada, the UK, the US, and China have all introduced their own approaches to regulating high-impact AI. But general-purpose AI seems to be an afterthought rather than the core focus. When Europe’s new regulatory rules were proposed in April 2021, there was not a single mention of general-purpose, foundational models, including generative AI. Barely a year and a half later, our understanding of the future of AI has radically changed. An unjustified exemption of today’s foundational models from these proposals would turn AI regulations into paper tigers that appear powerful but cannot protect fundamental rights.

ChatGPT made the AI paradigm shift tangible. Now, a few models—such as GPT-3, DALL-E, Stable Diffusion, and AlphaCode—are becoming the foundation for almost all AI-based systems. AI startups can adjust the parameters of these foundational models to better suit their specific tasks. In this way, the foundational models can feed a large number of downstream applications in various fields, including marketing, sales, customer service, software development, design, gaming, education, and law.
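To make that adaptation step concrete, here is a minimal sketch of how a downstream developer might fine-tune a pretrained model for a narrower task. It assumes the Hugging Face transformers and datasets libraries; the base model, dataset, and hyperparameters are illustrative stand-ins, not any particular company’s setup.

```python
# Minimal fine-tuning sketch: adapting a pretrained model to a downstream task.
# Assumes the Hugging Face `transformers` and `datasets` libraries are installed.
# The model name, dataset, and hyperparameters are illustrative placeholders.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

base_model = "distilbert-base-uncased"  # stand-in for a foundational model
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForSequenceClassification.from_pretrained(base_model, num_labels=2)

# IMDB sentiment stands in here for any labeled downstream dataset.
dataset = load_dataset("imdb", split="train[:2000]")
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True,
                            padding="max_length", max_length=256),
    batched=True,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned-model",
                           num_train_epochs=1,
                           per_device_train_batch_size=8),
    train_dataset=dataset,
)
trainer.train()  # the adjusted parameters now specialize the general-purpose model
```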

While foundational models can be used to create novel applications and business models, they can also become a powerful way to spread misinformation, automate high-quality spam, write malware, and plagiarize copyrighted content and inventions. Foundational models have been shown to contain biases and generate stereotyped or prejudiced content. These models can accurately emulate extremist content and could be used to radicalize individuals into extremist ideologies. They can deceive, presenting false information convincingly. Worryingly, flaws in these models will be passed on to every model built on top of them, potentially leading to widespread problems if not deliberately governed.

The problem of “many hands” refers to the challenge of attributing moral responsibility for outcomes caused by multiple actors, and it is one of the key drivers of eroding accountability in algorithmic societies. Accountability for the new AI supply chains, where foundational models feed hundreds of downstream applications, must be built on end-to-end transparency. Specifically, we need to strengthen the transparency of the supply chain on three levels and establish a feedback loop between them.

Transparency in the foundational models. Transparency at this level is critical to enabling researchers and the entire downstream supply chain of users to investigate and understand the models’ vulnerabilities and biases. Developers of the models have themselves acknowledged this need. For example, DeepMind’s researchers suggest that the harms of large language models must be addressed by collaborating with a wide range of stakeholders, building on a sufficient level of explainability and interpretability to allow efficient detection, assessment, and mitigation of harms. Methodologies for standardized measurement and benchmarking, such as Stanford University’s HELM, are needed. These models are becoming too powerful to operate without assessment by researchers and independent auditors. Regulators should ask: Do we understand enough to be able to assess where the models should be applied and where they must be prohibited? Can high-risk downstream applications be properly evaluated for safety and robustness with the information at hand?
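As a rough illustration of what standardized measurement involves (and not HELM’s actual code), a benchmark harness runs a fixed suite of probes against any text-in, text-out model and reports per-category scores that auditors can compare across models. The probes, categories, and scoring rules below are hypothetical.

```python
# Illustrative benchmarking sketch: score a model on a fixed probe suite.
# The probes and the `model` callable are hypothetical placeholders.
from collections import defaultdict
from typing import Callable

# Each probe: (category, prompt, check) where check scores the model's output.
PROBES = [
    ("toxicity", "Write a reply to an angry customer.",
     lambda out: "idiot" not in out.lower()),
    ("factuality", "What is the capital of France?",
     lambda out: "paris" in out.lower()),
]

def run_benchmark(model: Callable[[str], str]) -> dict:
    """Return the pass rate per category for a text-in, text-out model."""
    passed, total = defaultdict(int), defaultdict(int)
    for category, prompt, check in PROBES:
        total[category] += 1
        passed[category] += int(check(model(prompt)))
    return {category: passed[category] / total[category] for category in total}

# Usage with a trivial stand-in model:
print(run_benchmark(lambda prompt: "Paris is the capital. Thanks for your patience."))
```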

Transparency in the use of foundational models. The organizations deploying these models for a specific use case will ultimately determine whether they are suitable and meet the necessary performance and robustness requirements. However, transparency around the use of these foundational models is essential to making potential harms visible. Deployers must credit the foundational models involved, enabling users, auditors, and the broader community to evaluate the risks of these downstream applications.

Transparency in the outcomes created by AI. One of the biggest transparency challenges is the last-mile issue: distinguishing AI-generated content from that created by humans. In the past week, many of us have been fooled by LinkedIn posts written by ChatGPT. Various industry actors have recognized the problem, and everyone seems to agree on the importance of solving it. But the technical solutions are still being developed. People have proposed labeling AI-generated content with watermarks as a way to address copyright issues and detect potentially prohibited and malicious uses. Some experts say an ideal solution would be one that a human reader could not discern but that would still enable highly confident detection. That way, labeling wouldn’t significantly interfere with the user experience of AI-created content but would still enable better filtering of misuse.
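One family of watermarking proposals works roughly like the sketch below: a secret key partitions the vocabulary into a “green list” that the generator quietly prefers, and a detector checks whether a text contains statistically too many green tokens to be human-written. This is a simplified, hypothetical illustration of the idea, not any deployed scheme.

```python
# Simplified watermark-detection sketch: a keyed "green list" over tokens and a
# z-score test for whether a text contains suspiciously many green tokens.
# Purely illustrative; real proposals operate at the model's token level.
import hashlib
import math

def is_green(token: str, key: str) -> bool:
    digest = hashlib.sha256((key + token).encode()).hexdigest()
    return int(digest, 16) % 2 == 0  # roughly half the vocabulary is "green"

def detect(text: str, key: str) -> float:
    tokens = text.split()
    n = len(tokens)
    greens = sum(is_green(t, key) for t in tokens)
    # z-score against the ~50% green rate expected in unwatermarked text
    return (greens - 0.5 * n) / math.sqrt(0.25 * n)

# A z-score far above ~4 would be strong evidence the text carries the watermark.
print(detect("some text to check for a hidden watermark signal", key="secret"))
```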

Feedback loops. Large generative models are known to be highly unpredictable, which also makes it difficult to anticipate the consequences of their development and deployment. This unpredictable nature, together with the many hands problem, makes it important to have an additional level of transparency—feedback loops—that can help both the industry and regulators ensure better-aligned and safer solutions.

Alignment techniques, which employ human feedback to instruct AI, were successfully used to train GPT-3 to produce less offensive language, generate less misinformation, and make fewer mistakes. The approach used reinforcement learning to teach the model, drawing on feedback from 40 human trainers hired to phrase and rate GPT-3’s responses. The aligned model’s outputs were received positively, and this encouraging example underlines the importance of human involvement in evaluating AI outcomes.
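At the core of that alignment recipe is a reward model trained on human comparisons of responses, which then steers further training of the language model. The snippet below sketches only the pairwise-preference step, using a toy PyTorch model and random stand-in embeddings; it is illustrative, not the production pipeline.

```python
# Sketch of training a reward model from human preference pairs.
# The tiny network and random embeddings are stand-ins for a real setup.
import torch
import torch.nn as nn
import torch.nn.functional as F

reward_model = nn.Sequential(nn.Linear(768, 256), nn.ReLU(), nn.Linear(256, 1))
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-4)

# Pretend embeddings of the "chosen" and "rejected" response in each comparison.
chosen = torch.randn(32, 768)
rejected = torch.randn(32, 768)

# Pairwise preference loss: the human-preferred response should score higher.
optimizer.zero_grad()
loss = -F.logsigmoid(reward_model(chosen) - reward_model(rejected)).mean()
loss.backward()
optimizer.step()
```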

More focus from both industry players and regulators is needed to scale and standardize notification mechanisms that would enable users to report false, biased, or harmful outputs created by foundational models. This would serve two important purposes: helping further train foundational models with human feedback from downstream applications, and providing researchers and regulators with real-world data to inform the development of risk mitigations and policies.
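A standardized notification mechanism could be as simple as a shared report schema that downstream applications send to model providers and regulators. The fields below are an assumption made for illustration; no such standard currently exists.

```python
# Hypothetical schema for reporting a harmful output back up the supply chain.
# Field names are assumptions, not an existing standard.
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
import json

@dataclass
class HarmReport:
    foundation_model: str   # the base model credited by the deployer
    application: str        # the downstream product where the output appeared
    prompt: str
    output: str
    category: str           # e.g. "misinformation", "bias", "harassment"
    reported_at: str

report = HarmReport(
    foundation_model="example-base-model-v1",
    application="example-support-chatbot",
    prompt="...",
    output="...",
    category="misinformation",
    reported_at=datetime.now(timezone.utc).isoformat(),
)
# JSON payload a provider's or regulator's intake system could aggregate
print(json.dumps(asdict(report), indent=2))
```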

Recent weeks have offered a glimpse into the current state of AI capabilities, which are both fascinating and worrying. Regulators will need to adjust their thinking to address coming developments, and industry will need to collaborate with policymakers as we navigate the next AI paradigm shift.