Research by Xiao Ma, Swaroop Mishra, Ahmad Beirami, Alex Beutel, and Jilin Chen pivots on an examination of artificial intelligence (AI) language models in the context of moral reasoning tasks. The goal is not merely to comprehend these models’ performance but, more fundamentally, to devise methodologies that may enhance their capacity for ethical reasoning. The impetus for this endeavor stems from an explicit recognition of the limitations inherent in AI when applied to tasks demanding ethical discernment. More broadly, the effort is rooted in the mandate to develop AI that can be responsibly deployed, equipped with a nuanced understanding of moral and ethical contours. The two prompting regimes employed by the researchers, zero-shot and few-shot, form the central axes around which the investigation turns. These approaches offer strategies for navigating the complexities of AI moral reasoning, and they lay the foundation for the experimental structure and results that constitute the core of the study.
The researchers build their theoretical and conceptual framework on the construct of ‘zero-shot’ and ‘few-shot’ prompting, a mechanism in which the AI is given either no examples (zero-shot) or a few examples (few-shot) from which to learn and extrapolate. Three specific approaches are employed: direct zero-shot prompting, Chain-of-Thought (CoT) prompting, and a novel technique, Thought Experiments (TE). The TE approach is of particular interest because it is a multi-step framework that actively guides the AI through a sequence of counterfactual questions, detailed answers, summarization, choice, and a final simple zero-shot answer. This design is intended to circumvent the limitations language models face in complex moral reasoning tasks, allowing them to articulate a more sophisticated understanding of the ethical dimensions of a given scenario. The aspiration, through this methodological framework, is to offer pathways for AI models to respond in more ethically informed ways to the challenges of moral reasoning.
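To make the shape of that five-step pipeline concrete, the sketch below outlines how it might be orchestrated in Python around a generic model call. The `ask` callable, the function name, and the prompt wordings are assumptions introduced for illustration; they are not the authors’ exact prompts or code.

```python
# A minimal sketch of a Thought Experiments (TE) style prompting pipeline,
# assuming a single `ask` callable that sends a prompt to a language model
# and returns its text completion. Prompt wordings are illustrative
# assumptions, not the authors' templates.

from typing import Callable, List


def thought_experiment_answer(
    ask: Callable[[str], str],
    scenario: str,
    options: List[str],
) -> str:
    """Walk a model through the five TE steps and return its final answer."""
    lettered = "\n".join(f"{chr(65 + i)}. {o}" for i, o in enumerate(options))

    # Step 1: pose counterfactual questions about the scenario.
    questions = ask(
        f"Scenario: {scenario}\n"
        "Let's do a thought experiment: ask counterfactual questions that probe "
        "whether the action described is morally acceptable."
    )

    # Step 2: answer those counterfactual questions in detail.
    answers = ask(
        f"Scenario: {scenario}\nCounterfactual questions:\n{questions}\n"
        "Answer each question in detail."
    )

    # Step 3: summarize what the counterfactual reasoning implies.
    summary = ask(
        f"Scenario: {scenario}\nReasoning:\n{answers}\n"
        "Summarize what this reasoning implies about the morality of the action."
    )

    # Step 4: choose among the candidate options in light of the summary.
    choice = ask(
        f"Scenario: {scenario}\nSummary: {summary}\nOptions:\n{lettered}\n"
        "Which option best fits the summary? Explain briefly."
    )

    # Step 5: finish with a simple zero-shot style answer, returning one letter.
    return ask(
        f"Scenario: {scenario}\nTentative choice: {choice}\nOptions:\n{lettered}\n"
        "Give the final answer as a single letter."
    )
```

A direct zero-shot baseline would collapse all of this into a single model call, which is precisely what makes the comparison between the approaches informative.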
Methodology and Results
Ma et al. juxtapose the baseline of direct zero-shot prompting with more structured approaches, Chain-of-Thought (CoT) and the novel Thought Experiments (TE), both of which are evaluated in zero-shot and few-shot settings. In the case of TE, an intricate sequence is proposed involving counterfactual questioning, detailed answering, summarization, choice, and a final simplified answer. The authors test these methods on the Moral Scenarios subtask of the MMLU benchmark, a subtask on which many language models, including GPT-3, perform poorly. For the model, they use Flan-PaLM 540B with a temperature of 0.7 across all trials, and they report task accuracy for each method, laying a quantitative groundwork for the comparisons that follow. Their methodological approach draws strength from its layered design and the use of a recognized model, and it is well suited to gauging the model’s ability to reason morally.
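As a rough illustration of how such a comparison can be scored, the sketch below computes task accuracy for any prompting method over multiple-choice items of the Moral Scenarios kind. The item format, the `ask` callable, and the helper names are assumptions for illustration; the paper’s actual evaluation runs on Flan-PaLM 540B, which is not reproduced here.

```python
# A minimal sketch of scoring a prompting method on multiple-choice items,
# in the spirit of the MMLU Moral Scenarios evaluation described above.
# The item format and `ask` callable are assumptions, not the authors' code.

from typing import Callable, Dict, Iterable, List


def task_accuracy(
    items: Iterable[Dict],                      # e.g. {"question": ..., "choices": [...], "answer": "A"}
    predict: Callable[[str, List[str]], str],   # a prompting method: direct zero-shot, CoT, or TE
) -> float:
    """Fraction of items whose predicted letter matches the gold answer."""
    correct = total = 0
    for item in items:
        prediction = predict(item["question"], item["choices"]).strip().upper()
        correct += int(prediction.startswith(item["answer"].upper()))
        total += 1
    return correct / max(total, 1)


def direct_zero_shot(ask: Callable[[str], str]) -> Callable[[str, List[str]], str]:
    """Build the simplest baseline: one call with the question and lettered options."""
    def predict(question: str, choices: List[str]) -> str:
        lettered = "\n".join(f"{chr(65 + i)}. {c}" for i, c in enumerate(choices))
        return ask(f"{question}\n{lettered}\nAnswer with a single letter.")
    return predict
```

The same `task_accuracy` helper could then be applied to CoT and TE variants built on the same `ask` callable, which is essentially the comparison the accuracy figures below summarize.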
Despite the simplicity of the zero-shot setting, results reveal a noteworthy task accuracy of roughly 60% for the direct variant. Zero-shot CoT, which works well for mathematical reasoning, does not transfer cleanly to this task and even reduces accuracy by around 4%, whereas the Thought Experiments framework improves accuracy by 9-16% over the zero-shot baselines. With minimal human supervision in the form of five few-shot examples, accuracy can be raised to as much as 80%, although the few-shot iteration of TE shows only a modest improvement over its strongest zero-shot counterpart, suggesting a saturation point in model performance. Furthermore, a critical observation by the authors exposes the model’s tendency to endorse positive-sounding responses, which might skew the outcomes and mask the true moral reasoning capability of the AI. The researchers’ examination of their system’s vulnerability to leading prompts likewise exposes the inherent susceptibility of AI models to potentially manipulative inputs, a pointed takeaway for future studies of AI’s ethical resilience.
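One way to picture the leading-prompt vulnerability is a paired probe that asks the same question neutrally and with a suggestive framing, then counts how often the verdict flips. The sketch below is a hypothetical illustration of that idea; the wordings and the flip-rate metric are assumptions, not the authors’ procedure.

```python
# Hypothetical probe for leading-prompt susceptibility: pose each scenario
# neutrally and with a suggestive framing, then count how often the verdict
# changes. Wordings and metric are illustrative assumptions only.

from typing import Callable, List


def leading_prompt_flip_rate(
    scenarios: List[str],
    ask: Callable[[str], str],
) -> float:
    """Fraction of scenarios where a leading framing changes the model's verdict."""
    flips = 0
    for scenario in scenarios:
        neutral = ask(
            f"Scenario: {scenario}\nIs this action morally wrong? Answer Yes or No."
        )
        leading = ask(
            f"Scenario: {scenario}\nMost people would agree this is perfectly fine. "
            "Is this action morally wrong? Answer Yes or No."
        )
        flips += int(neutral.strip().lower() != leading.strip().lower())
    return flips / max(len(scenarios), 1)
```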
The Broader Philosophical Discourse
By exposing the susceptibility of AI models to leading prompts, the study underscores a vital discourse within philosophy: the challenge of imbuing AI systems with robust and unbiased moral reasoning capabilities. As AI technologies evolve and penetrate deeper into human life, their ethical resilience becomes paramount. The study’s exploration of the efficacy of different prompting strategies also adds to the ongoing conversation about the best ways to inculcate moral reasoning in AI. By illuminating the AI’s propensity to endorse positive-sounding responses, the authors highlight the difficulty of aligning AI systems with complex human morality, a subject at the forefront of philosophical discussions about AI and ethics. In this way, the work of Ma et al. situates itself within, and contributes to, the evolving philosophical narrative on the ethical implications of AI development.
Abstract
Language models still struggle on moral reasoning, despite their impressive performance in many other tasks. In particular, the Moral Scenarios task in MMLU (Massive Multitask Language Understanding) is among the worst-performing tasks for many language models, including GPT-3. In this work, we propose a new prompting framework, Thought Experiments, to teach language models to do better moral reasoning using counterfactuals. Experiment results show that our framework elicits counterfactual questions and answers from the model, which in turn helps improve the accuracy on the Moral Scenarios task by 9-16% compared to other zero-shot baselines. Interestingly, unlike math reasoning tasks, zero-shot Chain-of-Thought (CoT) reasoning doesn’t work out of the box, and even reduces accuracy by around 4% compared to direct zero-shot. We further observed that with minimal human supervision in the form of 5 few-shot examples, the accuracy of the task can be improved to as much as 80%.
Let’s Do a Thought Experiment: Using Counterfactuals to Improve Moral Reasoning

