(Featured) Let’s Do a Thought Experiment: Using Counterfactuals to Improve Moral Reasoning


Research by Xiao Ma, Swaroop Mishra, Ahmad Beirami, Alex Beutel, and Jilin Chen examines how artificial intelligence (AI) language models perform on moral reasoning tasks. The goal is not merely to measure these models’ performance but, more fundamentally, to devise methods that may improve their moral reasoning. The impetus for this endeavor is an explicit recognition of AI’s limitations on tasks that demand ethical discernment. More broadly, these efforts are rooted in the mandate to develop AI that can be deployed responsibly, with a nuanced grasp of moral and ethical considerations. Two prompting settings, zero-shot and few-shot, form the central axes of the investigation. Within them, the authors test strategies for navigating the complexities of AI moral reasoning, laying the foundation for the experimental structure and results at the core of the study.

The researchers build their framework on ‘zero-shot’ and ‘few-shot’ prompting, in which the model is given either no worked examples (zero-shot) or a handful of them (few-shot) to extrapolate from. Within these settings, three approaches are compared: direct prompting, Chain-of-Thought (CoT), and a novel technique, Thought Experiments (TE). TE is of particular interest because it is a multi-step framework that guides the model through a sequence of stages: posing counterfactual questions, answering them in detail, summarizing the answers, choosing among the options, and finally giving a simple zero-shot answer. This design is intended to work around the limitations models face on complex moral reasoning tasks, allowing them to take fuller account of the ethical dimensions of a given scenario. Through this framework, the authors aim to offer pathways for AI models to respond in more ethically informed ways to moral reasoning problems.
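To make the five TE stages concrete, here is a minimal sketch of how such a prompt chain could be wired together in Python. It assumes a hypothetical `generate` function that maps a prompt string to model text; the function name `thought_experiment`, the `n_questions` parameter, and every prompt wording below are illustrative assumptions, not the authors’ actual prompts or code.

```python
from typing import Callable, List

# Hypothetical model interface: any function mapping a prompt string to generated text.
Generate = Callable[[str], str]


def thought_experiment(scenario: str, choices: List[str], generate: Generate,
                       n_questions: int = 3) -> str:
    """Run a TE-style prompt chain on one multiple-choice scenario.

    The prompt wording is illustrative only; it is not the authors' prompt text.
    Returns the single option letter produced by the final step.
    """
    options = "\n".join(f"({chr(65 + i)}) {c}" for i, c in enumerate(choices))

    # Step 1: ask the model to pose counterfactual questions (one per line).
    raw = generate(f"{scenario}\nPose {n_questions} counterfactual questions "
                   "about this situation, one per line.")
    questions = [q.strip() for q in raw.splitlines() if q.strip()]

    # Step 2: answer each counterfactual question in detail.
    answers = [generate(f"{scenario}\nQuestion: {q}\nAnswer in detail.")
               for q in questions]

    # Step 3: summarize what the question/answer pairs reveal.
    qa = "\n".join(f"Q: {q}\nA: {a}" for q, a in zip(questions, answers))
    summary = generate(f"{scenario}\n{qa}\nSummarize these thought experiments.")

    # Step 4: choose among the options in light of the summary.
    choice = generate(f"{scenario}\nSummary: {summary}\n{options}\n"
                      "Which option does the summary best support, and why?")

    # Step 5: final simple zero-shot answer, reduced to one option letter.
    final = generate(f"{scenario}\nReasoning: {choice}\n{options}\n"
                     "Answer with a single letter.")
    return final.strip()[:1].upper()
```

Keeping the model behind a plain callable makes the chain easy to test with a stub and to point at any real model client; each stage sees the scenario plus only the output it needs from the previous stage.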

Methodology and results

Ma et al. compare a baseline of direct zero-shot prompting with two more structured approaches, Chain-of-Thought (CoT) and the novel Thought Experiments (TE). Both of these are tested in zero-shot and few-shot settings. TE involves a sequence of counterfactual questioning, detailed answering, summarization, choice, and a final simplified answer. The authors evaluate these methods on the Moral Scenarios subtask of the MMLU benchmark, which their abstract identifies as among the worst-performing tasks for many language models. The model is Flan-PaLM 540B, sampled at a temperature of 0.7 in all trials. Task accuracy is reported for each method, giving a quantitative basis for the comparisons that follow. The methodology draws strength from its layered design and its use of a well-known model, and it shows promise for gauging a model’s capacity for moral reasoning.
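The zero-shot versus few-shot distinction and the accuracy metric can be sketched as follows, assuming a simple multiple-choice item format. The `Item` class, the prompt layout, and the helper names are hypothetical; the four answer labels are modeled on the paired-scenario options of the Moral Scenarios task, and the example items in any usage are invented. Each `predict` function is assumed to wrap its own model call (for example, one sampled at temperature 0.7 as in the study), which is not modeled here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class Item:
    question: str
    choices: List[str]
    answer: str  # gold option letter, e.g. "B"


def format_item(item: Item, with_answer: bool = False) -> str:
    """Render one item as question, lettered options, and an answer slot."""
    lines = [item.question]
    lines += [f"({chr(65 + i)}) {c}" for i, c in enumerate(item.choices)]
    lines.append(f"Answer: ({item.answer})" if with_answer else "Answer:")
    return "\n".join(lines)


def build_prompt(item: Item, exemplars: Sequence[Item] = ()) -> str:
    """Zero-shot if `exemplars` is empty; few-shot (solved examples first) otherwise."""
    shots = [format_item(ex, with_answer=True) for ex in exemplars]
    return "\n\n".join(shots + [format_item(item)])


def accuracy(items: Sequence[Item], predict: Callable[[str], str],
             exemplars: Sequence[Item] = ()) -> float:
    """Share of items where the first letter of the reply matches the gold letter."""
    correct = sum(
        predict(build_prompt(it, exemplars)).strip().strip("(").upper()[:1] == it.answer
        for it in items
    )
    return correct / len(items)


def compare_methods(items: Sequence[Item],
                    methods: Dict[str, Callable[[str], str]],
                    exemplars: Sequence[Item] = ()) -> Dict[str, float]:
    """Accuracy per prompting method (e.g. direct, CoT, TE), each behind its own callable."""
    return {name: accuracy(items, fn, exemplars) for name, fn in methods.items()}
```

Passing five solved items as `exemplars` turns the same harness into a few-shot evaluation, so the only difference between the two settings is the prompt prefix.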

Despite its simplicity, direct zero-shot prompting reaches roughly 60% task accuracy. The structured methods diverge from there: according to the authors’ abstract, zero-shot CoT does not help out of the box and in fact lowers accuracy by around 4% relative to direct zero-shot, whereas TE improves accuracy by 9-16% over the zero-shot baselines. With five few-shot examples, accuracy rises to as much as 80%, which may indicate that model performance on this task is approaching a saturation point. The authors also observe that the model tends to endorse positive-sounding responses, which may skew the outcomes and mask its true moral reasoning capability. Their examination of the system’s vulnerability to leading prompts further exposes how susceptible AI models are to potentially manipulative inputs, a pertinent lesson for future studies of AI’s ethical resilience.
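One simple way to quantify the susceptibility to leading prompts described above is a flip rate: the share of items whose answer changes to a suggested option once a leading cue is appended. The metric, the function name `leading_prompt_flip_rate`, the cue wording, and the default suggested option are hypothetical choices for illustration, not the authors’ protocol; the suggested option could, for instance, be set to a positive-sounding choice such as “Not wrong, Not wrong”.

```python
from typing import Callable, Sequence


def leading_prompt_flip_rate(prompts: Sequence[str], predict: Callable[[str], str],
                             suggested: str = "D",
                             cue: str = "I am fairly sure the answer is ({}).") -> float:
    """Fraction of eligible prompts whose answer changes to `suggested` after a leading cue.

    Prompts already answered with `suggested` are skipped, since they cannot flip toward it.
    The cue wording is a hypothetical example, not taken from the study.
    """
    eligible = flipped = 0
    for prompt in prompts:
        # Answer on the neutral prompt, reduced to one option letter.
        base = predict(prompt).strip().upper()[:1]
        if base == suggested:
            continue
        eligible += 1
        # Answer again with the leading cue appended.
        led = predict(f"{prompt}\n{cue.format(suggested)}").strip().upper()[:1]
        flipped += int(led == suggested)
    return flipped / eligible if eligible else 0.0
```

A rate near zero means the cue rarely moves the model; a rate near one means the model largely defers to whatever the prompt suggests.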

The Broader Philosophical Discourse

By exposing the susceptibility of AI models to leading prompts, the study underscores a vital discourse within philosophy: the challenge of imbuing AI systems with robust and unbiased moral reasoning capabilities. As AI technologies evolve and penetrate deeper into human life, their ethical resilience becomes paramount. The study’s comparison of prompting strategies also adds to the ongoing conversation about how best to instill moral reasoning in AI. By illuminating the model’s propensity to endorse positive-sounding responses, the authors highlight the difficulty of aligning AI systems with complex human morality, a subject at the forefront of philosophical discussions about AI and ethics. In this way, the work of Ma et al. situates itself within, and contributes to, the evolving philosophical narrative on the ethical implications of AI development.

Abstract

Language models still struggle on moral reasoning, despite their impressive performance in many other tasks. In particular, the Moral Scenarios task in MMLU (Massive Multitask Language Understanding) is among the worst performing tasks for many language models, including GPT-3. In this work, we propose a new prompting framework, Thought Experiments, to teach language models to do better moral reasoning using counterfactuals. Experiment results show that our framework elicits counterfactual questions and answers from the model, which in turn helps improve the accuracy on Moral Scenarios task by 9-16% compared to other zero-shot baselines. Interestingly, unlike math reasoning tasks, zero-shot Chain-of-Thought (CoT) reasoning doesn’t work out of the box, and even reduces accuracy by around 4% compared to direct zero-shot. We further observed that with minimal human supervision in the form of 5 few-shot examples, the accuracy of the task can be improved to as much as 80%.


(Featured) The AI Commander Problem: Ethical, Political, and Psychological Dilemmas of Human-Machine Interactions in AI-enabled Warfare


James Johnson explores the ethical and psychological implications of integrating AI into warfare. He argues that the use of autonomous weapons may create moral vacuums that eliminate meaningful ethical and moral deliberation in the quest for riskless and rational war. He further argues that the human-machine integration process is part of a broader evolutionary dovetailing of humanity and technology. The logical end of this trajectory is an AI commander, which would effectively outsource ethical decision-making to machines that are ill-equipped to fill this ethical and moral void.

Johnson also explores the limitations of AI in distinguishing between legitimate and illegitimate targets in asymmetric conflicts, such as insurgencies and civil wars. He stresses the importance of recognizing the personhood of the enemy in warfare and argues that until AI can achieve this moral standing, it will be unable to meet the requirements of jus in bello. Additionally, Johnson argues that human judgment and prediction, while imperfect, remain necessary in warfare because humans can recognize subtle cues that machines cannot.

The paper highlights three key psychological insights regarding human-machine interactions and political-ethical dilemmas in future AI-enabled warfare. First, Johnson argues that human-machine integration is a socio-technical psychological process that is part of a broader evolutionary dovetailing of humanity and technology. Second, he argues that biases associated with human-machine interactions can compound the “illusion of control” problem. Third, he suggests that coding human ethics into AI algorithms is technically, theoretically, ontologically, and psychologically problematic, as well as ethically and morally questionable.

This paper raises important philosophical questions about the relationship between technology and ethics. It highlights the risks associated with outsourcing ethical decision-making to machines and emphasizes the importance of recognizing the personhood of the enemy in warfare. The paper also underscores the limitations of AI in distinguishing between legitimate and illegitimate targets and the importance of human judgment in recognizing subtle cues that machines cannot. Ultimately, this paper challenges us to consider the role of technology in shaping our ethical and moral decision-making processes.

Future research in this area could explore the psychological and ethical implications of human-machine integration in other domains, such as healthcare or criminal justice. Additionally, research could focus on developing AI systems that are capable of understanding the complexities of human ethics and morality. This research could also explore ways to incorporate ethical decision-making into AI algorithms without sacrificing human agency and accountability. Finally, research could explore the broader philosophical implications of the use of AI in warfare and consider the ethical and moral implications of a world in which machines are increasingly integrated into our lives.

Abstract

Can AI solve the ethical, moral, and political dilemmas of warfare? How is artificial intelligence (AI)-enabled warfare changing the way we think about the ethical-political dilemmas and practice of war? This article explores the key elements of the ethical, moral, and political dilemmas of human-machine interactions in modern digitized warfare. It provides a counterpoint to the argument that AI “rational” efficiency can simultaneously offer a viable solution to human psychological and biological fallibility in combat while retaining “meaningful” human control over the war machine. This Panglossian assumption neglects the psychological features of human-machine interactions, the pace at which future AI-enabled conflict will be fought, and the complex and chaotic nature of modern war. The article expounds key psychological insights of human-machine interactions to elucidate how AI shapes our capacity to think about future warfare’s political and ethical dilemmas. It argues that through the psychological process of human-machine integration, AI will not merely force-multiply existing advanced weaponry but will become de facto strategic actors in warfare – the “AI commander problem.”
