Democracy and the AI Value Alignment Problem

Policy Making with Human Aligned AI

Let's look at the alignment problem in more detail, because without a more specific (that is, technical) understanding, our mission may seem vague. This understanding has two parts, and key to both is the human-in-the-loop paradigm: to produce AI systems that are aligned with human values, humans must take part in training those systems.

The first part is that, with humans in the loop, AI can design economic policy mechanisms that a wider majority of humans (not just those in the training loop) prefer. Several articles have shown this; see, for example, the first article below.

The second part is that models like ChatGPT achieved their stunning success only after their earlier unsupervised versions were fine-tuned with human feedback. How is this done? First, after the base model has been trained in an unsupervised way on a huge corpus of text, human annotators refine it by showing it what the responses to given prompts should be. Second, a separate reward model is trained: humans rate the refined model's responses to a selected group of prompts, and the reward model learns to predict those ratings. Finally, the reward model is used repeatedly, in a reinforcement learning loop, to score responses given by the refined model, and the refined model is updated to produce responses that score higher. This three-step procedure is known as Reinforcement Learning from Human Feedback (RLHF). RLHF-trained models tend to give answers that are better aligned with human values and more detailed, and they are better at recognizing questions that are inappropriate, or that may be appropriate but that they were not trained to answer.
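
The three steps above can be illustrated with a deliberately tiny sketch. Everything in it is an assumption made for illustration: the candidate responses, the annotator comparisons, the starting policy, and the function names are hypothetical, the reward model is a simple Bradley-Terry fit with one score per response, and the reinforcement step is a plain policy-gradient update on a softmax policy rather than the algorithm used for any production model.

```python
import math

# Hypothetical candidate responses to a single prompt (illustrative only).
RESPONSES = ["helpful answer", "vague answer", "harmful answer"]

# Step 1 stand-in: logits of an already fine-tuned policy (assumed values).
# Here the starting policy slightly prefers the vague answer.
INITIAL_LOGITS = [0.0, 0.5, 0.0]

# Hypothetical annotator comparisons: (index preferred, index rejected).
COMPARISONS = [(0, 1), (0, 2), (1, 2), (0, 1), (0, 2)]


def softmax(logits):
    """Turn logits into a probability distribution."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fit_reward_model(comparisons, n_items, lr=0.5, steps=200):
    """Step 2: learn one scalar reward per response from pairwise preferences."""
    rewards = [0.0] * n_items
    for _ in range(steps):
        grads = [0.0] * n_items
        for winner, loser in comparisons:
            # Probability the current rewards assign to the human's choice.
            p_win = 1.0 / (1.0 + math.exp(rewards[loser] - rewards[winner]))
            grads[winner] += 1.0 - p_win
            grads[loser] -= 1.0 - p_win
        rewards = [r + lr * g / len(comparisons) for r, g in zip(rewards, grads)]
    return rewards


def reinforce_policy(initial_logits, rewards, lr=1.0, steps=100):
    """Step 3: repeatedly shift the policy toward responses the reward model scores highly."""
    logits = list(initial_logits)
    for _ in range(steps):
        probs = softmax(logits)
        expected = sum(p * r for p, r in zip(probs, rewards))
        # Exact gradient of the expected reward with respect to each logit.
        logits = [l + lr * p * (r - expected)
                  for l, p, r in zip(logits, probs, rewards)]
    return softmax(logits)


rewards = fit_reward_model(COMPARISONS, len(RESPONSES))
final_policy = reinforce_policy(INITIAL_LOGITS, rewards)
best_response = RESPONSES[final_policy.index(max(final_policy))]
```

In this toy setting the policy ends up concentrating its probability on the response the annotators preferred most often. Real systems replace the per-response scores with a neural reward model and the three-way choice with free-form text generation.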

Now let's combine these two parts into a potential AI system that would design national-level policy (on public healthcare, social security, gun control, immigration, etc.) based on democratic feedback. There are two ways to proceed. The first is a direct democracy process, i.e. allowing any citizen to rate the responses given by the AI system. To a large extent this is what happens now when you rate the responses ChatGPT gives you. The problem with setting policy this way is, broadly, the one that Socrates and the US Founding Fathers saw in direct democracy: decisions follow the momentary preferences of the majority, whether or not that majority is well informed. The alternative is to build a system in which only elected representatives (for example, members of Congress) rate the system's responses. That is one of our goals in SD-AI, although it will require much time, effort, and governmental buy-in.
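
To make the difference between the two feedback routes concrete, here is a minimal sketch. The ballots, district names, and function names are hypothetical, and the sketch assumes that each representative simply adopts the majority view of their district, which real representatives need not do. It only shows that the same citizen preferences can yield different feedback signals depending on who does the rating.

```python
from collections import Counter, defaultdict

# Hypothetical ballots: (district, preferred policy draft). Illustrative data only.
BALLOTS = [
    ("north", "A"), ("north", "A"), ("north", "A"),
    ("south", "B"), ("south", "B"), ("south", "A"),
    ("east", "B"), ("east", "B"), ("east", "A"),
]


def majority(choices):
    """Return the most common choice; ties go to the choice seen first."""
    return Counter(choices).most_common(1)[0][0]


def direct_feedback(ballots):
    """Direct route: every citizen's rating counts equally."""
    return majority(choice for _, choice in ballots)


def representative_feedback(ballots):
    """Representative route: one rating per district.

    Assumption: each representative adopts the district majority.
    """
    by_district = defaultdict(list)
    for district, choice in ballots:
        by_district[district].append(choice)
    delegate_votes = [majority(votes) for votes in by_district.values()]
    return majority(delegate_votes)


direct_choice = direct_feedback(BALLOTS)
representative_choice = representative_feedback(BALLOTS)
```

With these made-up ballots, draft A wins the direct count five to four, while draft B wins two districts out of three, so the signal that would be fed back to the AI system depends on the route chosen.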

Human-centred mechanism design with Democratic AI

By Nature/ July 4, 2022

"In AI research, there is a growing realization that to build human-compatible systems, we need new research methods in which humans and agents interact, and an increased effort to learn values directly from humans to build value-aligned AI. Capitalizing on this idea, here we combined modern deep RL with an age-old technology for arbitrating among conflicting views—majoritarian democracy among human voters—to develop a human-centred research pipeline for value-aligned AI research. Instead of imbuing our agents with purportedly human values a priori, and thus potentially biasing systems towards the preferences of AI researchers, we train them to maximize a democratic objective: to design policies that humans prefer and thus will vote to implement in a majoritarian election. We call our approach, which extends recent related participatory approaches, Democratic AI"

How Artificial Intelligence Can Aid Democracy

By BRUCE SCHNEIER, HENRY FARRELL, AND NATHAN E. SANDERS/ April 21, 2023

"An A.I. built for public benefit could be tailor-made for those use cases where technology can best help democracy. It could plausibly educate citizens, help them deliberate together, summarize what they think, and find possible common ground. Politicians might use large language models, or LLMs, like GPT4 to better understand what their citizens want. Today, state-of-the-art A.I. systems are controlled by multibillion-dollar tech companies: Google, Meta, and OpenAI in connection with Microsoft. These companies get to decide how we engage with their A.I.s and what sort of access we have. They can steer and shape those A.I.s to conform to their corporate interests. That isn’t the world we want. Instead, we want A.I. options that are both public goods and directed toward public good."

AI Alignment and Future Threats

By Brent Skorup/ April 14, 2023

"AI technologies must be analyzed in light of America’s loss of esteem for social pluralism. The U.S. governing factions are stuck in a dangerous arms race to dominate each other. My colleague Martin Gurri points to a related problem in many nations, that the governing classes have lost their control of media narratives and the trust of the public. In the U.S., rather than acknowledging the new reality and attempting reform, the governing classes have turned on each other. It is a sad development that political violence, threats against family members, spying and investigations by intelligence agencies, secret surveillance warrants relying on false statements, threats of imprisonment and actual imprisonment are today real possibilities for U.S. political leaders, their staff members and judges—not to mention lesser forms of intimidation and sabotage."

Symbiosis, not alignment, as the goal for liberal democracies in the transition to artificial general intelligence

By simonfriederich/ March 17, 2023

"A transition to a world with artificial general intelligence (AGI) may occur within the next few decades. This transition may give rise to catastrophic risks from misaligned AGI, which have received a significant amount of attention, deservedly. Here I argue that AGI systems that are intent-aligned – they always try to do what their operators want them to do – would also create catastrophic risks, mainly due to the power that they concentrate on their operators. With time, that power would almost certainly be catastrophically exploited, potentially resulting in human extinction or permanent dystopia. I suggest that liberal democracies, if they decide to allow the development of AGI, may react to this threat by letting AGI take shape as an intergenerational social project, resulting in an arrangement where AGI is not intent-aligned but symbiotic with humans. I provide some tentative ideas on what the resulting arrangement may look like and consider what speaks for and what against aiming for intent-aligned AGI as an intermediate step."

Challenges of Implementing AI With “Democratic Values”: Lessons From Algorithmic Transparency

By Matt O'Shaughnessy/ April 26, 2023

"Like other norms and principles for AI governance, efforts to make the inner workings of algorithms more transparent provide utility to policymakers and researchers alike. More detailed information about how AI systems are used can enable better evidence-based policy for algorithms, help users understand when algorithmic systems are reliable, and expose developers’ thorny design trade-offs to meaningful debate. Calling for transparency is an easy and noncontroversial step for policymakers—and one that does not require deep engagement with the technical details of AI systems. But it also avoids the more difficult and value-laden questions of what algorithms should do and how complex trade-offs should be made in their design."

The political economy of AI: Towards democratic control of the means of prediction

By Maximilian Kasy/ April 14, 2023

"This chapter discusses the regulation of artificial intelligence (AI) from the vantage point of political economy. By “political economy” I mean a perspective which emphasizes that there are different people and actors in society who have divergent interests and unequal access to resources and power. By “artificial intelligence” I mean the construction of autonomous systems that maximize some notion of reward. The construction of such systems typically draws on the tools of machine learning and optimization. AI and machine learning are used in an ever wider array of socially consequential settings. This includes labor markets, education, criminal justice, health, banking, housing, as well as the curation of information by search engines, social networks, and recommender systems. There is a need for public debates about desirable directions of technical innovation, the use of technologies, and constraints to be imposed on technologies. In this chapter, I review some frameworks to help structure such debates."