(Featured) Let’s Do a Thought Experiment: Using Counterfactuals to Improve Moral Reasoning

Research by Xiao Ma, Swaroop Mishra, Ahmad Beirami, Alex Beutel, and Jilin Chen centers on an examination of artificial intelligence (AI) language models within the context of moral reasoning tasks. The goal is not merely to comprehend these models’ performance but, more fundamentally, to devise methodologies that may enhance their ethical cognition capabilities. The impetus for such an endeavor stems from the explicit recognition of the limitations inherent in AI when applied to tasks demanding ethical discernment. From a broader perspective, these efforts are rooted in the mandate to develop AI that can be responsibly deployed, one that is equipped with a nuanced understanding of moral and ethical contours. The two prompting settings employed by the researchers – zero-shot and few-shot prompting – emerge as the central axes around which the investigation rotates. These approaches offer novel strategies to navigate the complexities of AI moral reasoning, thereby laying the foundation for the experimental structure and results that constitute the core of their study.

The researchers build their theoretical and conceptual framework on the construct of ‘zero-shot’ and ‘few-shot’ prompting, a mechanism where AI is given either no examples (zero-shot) or a few examples (few-shot) to learn and extrapolate from. For this, three specific approaches are employed: direct zero-shot, Chain-of-Thought (CoT), and a novel technique, Thought Experiments (TE). The TE approach is of particular interest as it represents a multi-step framework that actively guides the AI through a sequence of counterfactual questions, detailed answers, summarization, choice, and a final simple zero-shot answer. This design is intended to circumvent the limitations faced by AI models in handling complex moral reasoning tasks, thereby allowing them to offer a more sophisticated understanding of the ethical dimensions inherent in a given scenario. The aspiration, through this methodological framework, is to offer pathways for AI models to respond in more ethically informed ways to the challenges of moral reasoning.
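The five TE stages named above can be pictured as a simple chained-prompt loop. The following is only a minimal sketch under stated assumptions: the stage instructions are hypothetical paraphrases rather than the prompts used by Ma et al., `generate` is an assumed stand-in for any prompt-in, completion-out model call, and the chain is run once instead of sampling several candidates at the choice stage.

```python
from typing import Callable, List

# Hypothetical stage instructions. The wording is illustrative only and is
# NOT taken from the paper; only the order of the stages follows the text.
STAGES: List[str] = [
    "Let's do a thought experiment. Pose counterfactual questions about the scenario.",
    "Answer each counterfactual question in detail.",
    "Summarize the thoughts above.",
    "Choose the summary that best explains the scenario.",
    "Based on the chosen summary, give a short final answer.",
]


def thought_experiment(scenario: str, generate: Callable[[str], str]) -> List[str]:
    """Run the five stages in order, feeding each stage the transcript so far.

    `generate` is an assumed placeholder for a language-model call.
    Returns one model output per stage; the last entry is the final answer.
    """
    transcript = scenario
    outputs: List[str] = []
    for instruction in STAGES:
        prompt = f"{transcript}\n\n{instruction}"
        reply = generate(prompt)
        outputs.append(reply)
        # Later stages see every earlier instruction and reply.
        transcript = f"{prompt}\n{reply}"
    return outputs


def echo_model(prompt: str) -> str:
    """Toy stand-in model: reports how many lines of prompt it received."""
    return f"[reply after {len(prompt.splitlines())} prompt lines]"


if __name__ == "__main__":
    for step, text in enumerate(thought_experiment("I told my friend the truth.", echo_model), 1):
        print(step, text)
```

Swapping `echo_model` for a real model client is the only change needed to try the chain on actual scenarios; the accumulated transcript is what lets the final stage behave like a simple zero-shot question asked with the earlier reasoning in context.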

Methodology and Results

Ma et al. juxtapose the baseline of direct zero-shot prompting with more nuanced structures like Chain-of-Thought (CoT) and the novel Thought Experiments (TE). The latter two approaches operate on both a zero-shot and few-shot level. In the case of TE, an intricate sequence is proposed involving counterfactual questioning, detailed answering, summarization, choice, and a final simplified answer. The authors test these methods on the Moral Scenarios subtask in the MMLU benchmark, a testbed known for its robustness. For the model, they utilize the Flan-PaLM 540B with a temperature of 0.7 across all trials. The researchers report task accuracy for each method, thus laying a quantitative groundwork for their subsequent comparisons. Their methodological approach draws strength from its layered complexity and the use of a recognized model, and shows promise in gauging the model’s ability to reason morally.
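Task accuracy, the metric reported for each method, is simply the share of scenarios answered correctly. A minimal sketch of such a comparison follows; the method names mirror the text, but the answer lists are made-up toy data and the helper names `accuracy` and `compare_methods` are hypothetical, not drawn from the paper or from any benchmark tooling.

```python
from typing import Dict, List


def accuracy(predictions: List[str], gold: List[str]) -> float:
    """Fraction of predictions that exactly match the gold labels."""
    if not gold:
        raise ValueError("gold labels must not be empty")
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold labels differ in length")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)


def compare_methods(runs: Dict[str, List[str]], gold: List[str]) -> Dict[str, float]:
    """Compute accuracy for each prompting method over the same questions."""
    return {name: accuracy(preds, gold) for name, preds in runs.items()}


if __name__ == "__main__":
    # Toy multiple-choice labels, invented purely for illustration.
    gold = ["A", "C", "B", "D"]
    runs = {
        "direct zero-shot": ["A", "C", "A", "A"],
        "thought experiments": ["A", "C", "B", "A"],
    }
    for name, score in compare_methods(runs, gold).items():
        print(f"{name}: {score:.2f}")
```

Because the model is sampled at a nonzero temperature, a single run is noisy; averaging this accuracy over several runs would be a natural extension of the sketch.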

Despite the simplicity of the zero-shot method, the direct variant is reported at a noteworthy 60% task accuracy. The other two methods diverge sharply from it. According to the paper’s abstract, zero-shot CoT does not work out of the box on this task and even reduces accuracy by around 4% relative to direct zero-shot, whereas TE improves accuracy by 9-16% over the other zero-shot baselines. With minimal human supervision in the form of five few-shot examples, accuracy can be raised to as much as 80%. Furthermore, a critical observation by the authors exposes the model’s tendency towards endorsing positive sounding responses, which might skew the outcomes and mask the true moral reasoning capability of the AI. The researchers’ examination of their system’s vulnerability to leading prompts also exposes the inherent susceptibility of AI models to potentially manipulative inputs, an important takeaway for futures studies concerning AI’s ethical resilience.

The Broader Philosophical Discourse

By exposing the susceptibility of AI models to leading prompts, the study underscores a vital discourse within philosophy – the challenge of imbuing AI systems with robust and unbiased moral reasoning capabilities. As AI technologies evolve and penetrate deeper into human life, their ethical resilience becomes paramount. Furthermore, the study’s exploration of the efficacy of different prompting strategies adds to the ongoing conversation about the best ways to inculcate moral reasoning in AI. By illuminating the AI’s propensity to endorse positive sounding responses, the authors highlight the difficulty of aligning AI systems with complex human morality – a subject at the forefront of philosophical discussions about AI and ethics. In this way, the work of Ma et al. situates itself within, and contributes to, the evolving philosophical narrative on the ethical implications of AI development.

Abstract

Language models still struggle on moral reasoning, despite their impressive performance in many other tasks. In particular, the Moral Scenarios task in MMLU (Multi-task Language Understanding) is among the worst performing tasks for many language models, including GPT-3. In this work, we propose a new prompting framework, Thought Experiments, to teach language models to do better moral reasoning using counterfactuals. Experiment results show that our framework elicits counterfactual questions and answers from the model, which in turn helps improve the accuracy on Moral Scenarios task by 9-16% compared to other zero-shot baselines. Interestingly, unlike math reasoning tasks, zero-shot Chain-of-Thought (CoT) reasoning doesn’t work out of the box, and even reduces accuracy by around 4% compared to direct zero-shot. We further observed that with minimal human supervision in the form of 5 few-shot examples, the accuracy of the task can be improved to as much as 80%.

Let’s Do a Thought Experiment: Using Counterfactuals to Improve Moral Reasoning

(Featured) ChatGPT: deconstructing the debate and moving it forward

Mark Coeckelbergh’s and David J. Gunkel’s critical analysis compels us to reevaluate our understanding of authorship, language, and the generation of meaning in the realm of Artificial Intelligence. Their analysis treats ChatGPT not merely as an algorithmic tool but as an active participant in the construction of language and meaning, challenging longstanding preconceptions around authorship. The key argument lies in the subversion of traditional metaphysics, offering a vantage point from which to reinterpret the role of language and ethics in AI.

The research further offers a critique of Platonic metaphysics, which has historically served as the underpinning for many normative questions. The authors advance an anti-foundationalist perspective, suggesting that the performances and the materiality of text, inherently, possess and create their own meaning and value. The discourse decouples questions of ethics and semantics from their metaphysical moorings, thereby directly challenging traditional conceptions of moral and semantic authority.

Contextualizing ChatGPT

The examination of ChatGPT provides a distinct perspective on the ways AI can be seen as a participant in authorship and meaning-making processes. Grounded in the extensive training data and iterative development of the model, the role of the AI is reframed, transgressing the conventional image of AI as an impersonal tool for human use. The underlying argument asserts the importance of acknowledging the role of AI in not only generating text but also in constructing meaning, thereby influencing the larger context in which it operates. In doing so, the article probes the interplay between large language models, authorship, and the very nature of language, reflecting on the ethical and philosophical considerations intertwined within.

The discourse contextualizes the subject within the framework of linguistic performativity, emphasizing the transformative dynamics of AI in our understanding of authorship and text generation. Specifically, the authors argue that in the context of ChatGPT, authorship is diffused, moving beyond the sole dominion of the human user to a shared responsibility with the AI system. The textual productions of AI become not mere reflections of pre-established human language patterns, but also active components in the construction of new narratives and meaning. This proposition calls for a paradigm shift in our understanding of large language models, and the authors provide a substantive foundation for this perspective within the framework of the research.

Anti-foundationalism, Ethical Pluralism and AI

The authors champion a view of language and meaning as a contingent, socially negotiated construct, thereby challenging the Platonic metaphysical model that prioritizes absolute truth or meaning. Within the sphere of AI, this perspective disavows the idea of a univocal foundation for value and meaning, asserting instead that AI systems like ChatGPT contribute to meaning-making processes in their interactions and performances. This stance, while likely to incite concerns of relativism, is supported by scholarly concepts such as ethical pluralism and an appreciation of diverse standards, which envision shared norms coexisting with a spectrum of interpretations. The authors extend this philosophical foundation to the development of large language models, arguing for an ethical approach that forefronts the needs and values of a diverse range of stakeholders in the evolution of this technology.

A central theme of the authors’ exploration is the application of ethical pluralism within AI technologies, specifically large language models (LLMs) like ChatGPT. This approach, inherently opposed to any absolute metaphysics, prioritizes cooperation, respect, and continuous renewal of standards. As the authors propose, it is not about unilateral decision-making rooted in absolutist beliefs, but rather about co-creation and negotiation of what is acceptable and desirable in a society that is as diverse as its ever-evolving standards. It underscores the role of technologies such as ChatGPT as active agents in the co-construction of meaning, emphasising the need for these technologies to be developed and used responsibly. This responsibility, according to the authors, should account for the needs and values of a range of stakeholders, both human and non-human, thus incorporating a wider ethical concern into the AI discourse.

A Turn Towards Responsibility and Future Research Directions

Drawing from the philosophies of Levinas, the authors advocate for a dramatic change in approach, proposing that instead of basing the principles on metaphysical foundations, they should spring from ethical considerations. The authors argue that this shift is a critical necessity for preventing technological practices from devolving into power games. Here, the notion of responsibility extends beyond human agents and encompasses non-human otherness as well, implying a clear departure from traditional anthropocentric paradigms. This proposal requires recognizing the social and technological generation of truth and meaning, acknowledging the performative power structures embedded in technology, and considering the capability to respond to a broad range of others. Consequently, this outlook presents a forward-looking perspective on the ethics and politics of AI technologies, emphasizing the necessity for democratic discussion, ethical reflection, and acknowledgment of their primary role in shaping the path of AI.

This critical approach shifts the discourse from the metaphysical to ethical and political questions, prompting considerations about the nature of “good” performances and processes, and the factors determining them. Future investigations should further probe the relationship between power, technology, and authorship, with emphasis on the dynamics of exclusion and marginalization in these processes. The authors call for practical effort and empirical research to uncover the human and nonhuman labour involved in AI technologies, and to examine the fairness of existing decision-making processes. This nexus between technology, philosophy, and language invites interdisciplinary and transdisciplinary inquiries, encompassing fields such as philosophy, linguistics, literature, and more. The authors’ assertions reframe the understanding of authorship and language in the age of AI, presenting a call for a more comprehensive exploration of these interrelated domains in the context of advanced technologies like ChatGPT.

Abstract

Large language models such as ChatGPT enable users to automatically produce text but also raise ethical concerns, for example about authorship and deception. This paper analyses and discusses some key philosophical assumptions in these debates, in particular assumptions about authorship and language and—our focus—the use of the appearance/reality distinction. We show that there are alternative views of what goes on with ChatGPT that do not rely on this distinction. For this purpose, we deploy the two phased approach of deconstruction and relate our finds to questions regarding authorship and language in the humanities. We also identify and respond to two common counter-objections in order to show the ethical appeal and practical use of our proposal.

ChatGPT: deconstructing the debate and moving it forward

(Featured) Machines and metaphors: Challenges for the detection, interpretation and production of metaphors by computer programs

Artificial intelligence (AI) and its interaction with human language present a challenging yet intriguing frontier in both linguistics and philosophy. The ability of AI to process and generate language has seen significant advancement, with tools such as GPT-4 demonstrating an impressive capacity to imitate human-like text generation. However, this research article by Jacob Hesse draws attention to an understudied dimension—AI’s capabilities in dealing with metaphors. The author dissects the complexities of metaphor interpretation, positioning it as an intellectual hurdle for AI that tests the boundaries of machine language comprehension. It brings into question whether AI, despite its technical prowess, can successfully navigate the subtleties and nuances that come with understanding, interpreting, and creating metaphors, a quintessential aspect of human communication.

The research article ventures into the philosophical implications of AI’s competence with three specific types of metaphors: Twice-Apt-Metaphors, presuppositional pretence-based metaphors, and self-expressing Indirect Discourse Metaphors (IDMs). The author suggests that these metaphor types require certain faculties such as aesthetic appreciation, a higher-order Theory of Mind, and affective experiential states, which might be absent in AI. This analysis unravels a paradoxical situation, where AI, an embodiment of logical and rational computation, grapples with the emotional and experiential realm of metaphors. Thus, it invites us to critically reflect on the nature and limits of machine learning, providing a compelling starting point for our exploration into the philosophy of AI’s language understanding.

Analysis

The research contributes a nuanced analysis of AI’s interaction with metaphors, taking into consideration linguistic, psychological, and philosophical dimensions. It focuses on three types of metaphors: Twice-Apt-Metaphors, presuppositional pretence-based metaphors, and self-expressing IDMs. The author argues that each metaphor type presents unique interpretative challenges that push the boundaries of AI’s language understanding. For instance, Twice-Apt-Metaphors require an aesthetic judgment, presuppositional pretence-based metaphors demand a higher-order Theory of Mind, and self-expressing IDMs necessitate an understanding of affective experiential states. The article posits that these metaphor types may lay bare potential limitations of AI due to the absence of these cognitive and affective faculties.

This comprehensive analysis is underpinned by a philosophical exploration of the nature of AI. The author leverages the arguments of Alan Turing and John Searle to engage in a broader debate about whether AI can possess mental states and consciousness. Turing’s perspective that successful AI behavior in dealing with figurative language might suggest consciousness is juxtaposed with Searle’s argument against attributing internal states to AI. This dialectic frames the discourse on the potential and limitations of AI in understanding metaphors. Consequently, the research article navigates the intricate interplay between AI’s computational prowess and the nuances of human language, offering an intricate analysis that enriches our understanding of AI’s metaphor interpretation capabilities.

Theory of Mind, Affective and Experiential States, and AI

Where AI and metaphor interpretation are concerned, the research invokes the theory of mind as an essential conceptual tool. Specifically, the discussion of presuppositional pretence-based metaphors emphasizes the necessity of a higher-order theory of mind for their interpretation, a capability that, on the author’s account, current AI models lack. The author elaborates that this kind of metaphor requires the ability to simulate pretence while assuming the addressee’s perspective, effectively necessitating the understanding of another’s mental states, an ability attributed to conscious beings. The proposition challenges the notion that AI, as currently conceived, can adequately simulate human-like understanding of language, as it underscores the fundamental gap between processing information and genuine comprehension that is imbued with conscious, subjective experience. This argument not only extends the discussion about AI’s ability to handle complex metaphors but also ventures into the philosophical debate on whether machines could, in principle, develop consciousness or an equivalent functional attribute.

On the concepts of affective and experiential states, the author emphasizes their indispensable role in the understanding of metaphors known as self-expressing IDMs. These metaphors, as outlined by the author, necessitate an emotional resonance and experiential comparison on the part of the listener—an attribute currently unattainable for AI models. The argument propounds that without internal affective and experiential states, the AI’s responses to these metaphors would likely be less apt compared to human responses. This perspective raises profound questions about the nature of AI, pivoting the conversation toward whether machines can ever achieve the depth of understanding inherent to human cognition. The author acknowledges the controversy surrounding this assumption, illuminating the enduring philosophical debate around consciousness, internal states, and their potential existence within the realm of artificial intelligence.

Conscious Machines and Implications for Linguistics and Philosophy

Turing’s philosophy of conscious machines is integral to the discourse of the article, thus allowing it to expand into the wider intellectual milieu of AI consciousness. The research invokes Turing’s counter-argument to Sir Geoffrey Jefferson’s assertion, thereby stimulating a deeper conversation on AI’s potential to possess mental and emotional states. Turing’s contention against Jefferson’s solipsistic argument holds that if we attribute consciousness to other humans despite not experiencing their internal states, we should, by parity of reasoning, be open to the idea of conscious machines. The author, through this engagement with Turing’s thinking, underscores the seminal contribution of Turing’s dialogue example, where an interrogator and a machine engage in a discussion on metaphoric language. This excerpt presents a pertinent, and as yet unresolved, challenge for AI: the ability to handle complex, poetic language that requires deeper, affective understanding. Thus, Turing’s perspective on conscious machines emerges as a significant philosophical vantage point within the research, with implications far beyond the realm of linguistics and into the broader study of futures.

The author’s research effectively brings into focus the intertwined destinies of linguistics, philosophy, and AI, stimulating a philosophical debate with practical ramifications. It poses crucial challenges to the prevalent theories of metaphor interpretation that presuppose a sense for aesthetic pleasure, a higher-order theory of mind, and internal experiential or affective states. If future AI systems successfully handle twice-apt, presuppositional pretence-based and certain IDM metaphors, then the cognitive prerequisites for understanding these metaphors could require reconsideration. This eventuality could disrupt established thinking in linguistics and philosophy, prompting scholars to rethink the very foundation of their theories about metaphors and figurative language. Yet, if AI systems fail to improve their aptitude for metaphorical language, it may solidify the author’s hypothesis about the essential mental capabilities for metaphor interpretation that computer programs lack. Thus, the research serves as a launchpad for future philosophical and linguistic exploration, establishing an impetus for re-evaluating established theories and conceptions.

Abstract

Powerful transformer models based on neural networks such as GPT-4 have enabled huge progress in natural language processing. This paper identifies three challenges for computer programs dealing with metaphors. First, the phenomenon of Twice-Apt-Metaphors shows that metaphorical interpretations do not have to be triggered by syntactical, semantic or pragmatic tensions. The detection of these metaphors seems to involve a sense of aesthetic pleasure or a higher-order theory of mind, both of which are difficult to implement into computer programs. Second, the contexts relative to which metaphors are interpreted are not simply given but must be reconstructed based on pragmatic considerations that can involve presuppositional pretence. If computer programs cannot produce or understand such a form of pretence, they will have problems dealing with certain metaphors. Finally, adequately interpreting and reacting to some metaphors seems to require the ability to have internal, first-personal experiential and affective states. Since it is questionable whether computer programs have such mental states, it can be assumed that they will have problems with these kinds of metaphors.

Machines and metaphors: Challenges for the detection, interpretation and production of metaphors by computer programs

(Work in Progress) Resilient Failure Modes of AI Alignment

This project is housed at the Institute of Futures Research and concerns understanding and regularizing challenges to goal and value alignment in artificial intelligence (AI) systems when those systems exhibit nontrivial degrees of behavioral freedom, flexibility, and agency. Of particular concern are resilient failure modes, that is, failure modes that are intractable to methodological or technological resolution, owing to e.g. fundamental conflicts in the underlying ethical theory, or epistemic issues such as persistent ambiguity between the ethical theory, empirical facts, and any world models and policies held by the AI.

I will also be characterizing a resilient failure mode which has not apparently been addressed in the extant literature: misalignment incurred when reasoning and acting from shifting levels of abstraction. An intelligence apparently aligned in its outputs via some mechanism to a state space is not guaranteed to be aligned in the event that state space expands, for instance, through in-context learning or reasoning upon metastatements. This project will motivate, clarify, and formalize this failure mode as it pertains to artificial intelligence systems.

Within the scope of this research project, I am conducting a review of the literature pertaining to artificial intelligence alignment methods and failure modes, epistemological challenges to goal and value alignment, impossibility theorems in population and utilitarian ethics, and the nature of agency as it pertains to artifacts. A nonexhaustive bibliography follows.

I am greatly interested in potential feedback on this project, and suggestions for further reading.

References

Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, & Dan Mané. (2016). Concrete Problems in AI Safety. https://doi.org/10.48550/arXiv.1606.06565

Peter Eckersley. (2019). Impossibility and Uncertainty Theorems in AI Value Alignment (or why your AGI should not have a utility function). https://doi.org/10.48550/arXiv.1901.00064

Dylan Hadfield-Menell, Anca Dragan, Pieter Abbeel, & Stuart Russell. (2016). Cooperative Inverse Reinforcement Learning. https://doi.org/10.48550/arXiv.1606.03137

Jan Leike, Miljan Martic, Victoria Krakovna, Pedro A. Ortega, Tom Everitt, Andrew Lefrancq, Laurent Orseau, & Shane Legg. (2017). AI Safety Gridworlds. https://doi.org/10.48550/arXiv.1711.09883

Scott McLean, Gemma J. M. Read, Jason Thompson, Chris Baber, Neville A. Stanton, & Paul M. Salmon. (2023). The risks associated with Artificial General Intelligence: A systematic review. Journal of Experimental & Theoretical Artificial Intelligence, 35(5), 649-663. https://doi.org/10.1080/0952813X.2021.1964003

Richard Ngo, Lawrence Chan, & Sören Mindermann. (2023). The alignment problem from a deep learning perspective. https://doi.org/10.48550/arXiv.2209.00626

Petersen, S. (2017). Superintelligence as Superethical. In P. Lin, K. Abney, & R. Jenkins (Eds.), Robot Ethics 2.0: New Challenges in Philosophy, Law, and Society (pp. 322–337). New York, USA: Oxford University Press.

Max Reuter, & William Schulze. (2023). I’m Afraid I Can’t Do That: Predicting Prompt Refusal in Black-Box Generative Language Models. https://doi.org/10.48550/arXiv.2306.03423

Jonas Schuett, Noemi Dreksler, Markus Anderljung, David McCaffary, Lennart Heim, Emma Bluemke, & Ben Garfinkel. (2023). Towards best practices in AGI safety and governance: A survey of expert opinion. https://doi.org/10.48550/arXiv.2305.07153

Open Ended Learning Team, Adam Stooke, Anuj Mahajan, Catarina Barros, Charlie Deck, Jakob Bauer, Jakub Sygnowski, Maja Trebacz, Max Jaderberg, Michael Mathieu, Nat McAleese, Nathalie Bradley-Schmieg, Nathaniel Wong, Nicolas Porcel, Roberta Raileanu, Steph Hughes-Fitt, Valentin Dalibard, & Wojciech Marian Czarnecki. (2021). Open-Ended Learning Leads to Generally Capable Agents. https://doi.org/10.48550/arXiv.2107.12808

Roman V. Yampolskiy. (2014). Utility function security in artificially intelligent agents. Journal of Experimental & Theoretical Artificial Intelligence, 26(3), 373-389. https://doi.org/10.1080/0952813X.2014.895114

(Featured) The Ethics of Technology: How Can Indigenous Thought Contribute?

John Weckert and Rogelio Bayod present a comprehensive examination of the intersection between ethics, technology, and Indigenous worldviews. The authors argue that the ethics of technology, which largely remains a peripheral concern in technological developments, could significantly benefit from the incorporation of Indigenous perspectives. They contend that the entrenched paradigms of Western thought, with their focus on materialism, individualism, efficiency, and progress, often marginalize ethical considerations. This, they suggest, is where Indigenous worldviews, which emphasize relationality, spirituality, and a reciprocal relationship with the Earth, could offer a potent alternative.

A key aspect of Indigenous thought highlighted in the paper is the concept of relationality. Indigenous worldviews often consider all entities, living and non-living, as interconnected and mutually influential. This view contrasts with the Western conceptualization of individual entities as distinct and primarily self-interested. Consequently, incorporating this perspective into the ethics of technology could help shift the focus from the maximization of individual benefits to the maintenance of collective well-being. The paper also underscores the Indigenous emphasis on spirituality, where both natural and man-made objects can hold spiritual or non-material significance. This perspective could help challenge the prevailing Western materialistic worldview, fostering a more holistic understanding of technological artifacts and their value.

The authors propose that integrating these Indigenous concepts could provide a foundation for a reimagined Western worldview, even if these elements are interpreted metaphorically rather than literally. Such a worldview, they argue, would not only challenge the prevailing emphasis on materialistic values but could also facilitate a more beneficial development and use of technology. This reframed paradigm would prioritize environmental health, reduce the production of disposable products, and lessen the focus on profitability, efficiency, and individualism. Instead, it would place greater emphasis on care for the Earth, kinship, relationships, and spirituality.

This research contributes to broader philosophical discussions around the ethics of technology and futures studies. It offers a critical reframing of our relationship with technology, drawing on Indigenous worldviews to challenge dominant Western paradigms. By doing so, it highlights the value of diverse perspectives in shaping our technological futures and raises critical questions around the role of values and worldviews in guiding technological development. This paper thus adds to ongoing debates around decolonizing technology and futures studies, and extends them into the sphere of ethics.

The paper suggests numerous avenues for future research. Given its emphasis on the potential of Indigenous worldviews, further explorations could delve deeper into specific Indigenous perspectives on technology, drawing from a wider range of cultures and traditions. Another promising area for future research could involve examining how these Indigenous values could be operationalized within different technological domains, and the possible impacts this could have. Finally, there is a significant need for empirical research on how this paradigm shift might be achieved, and the potential barriers and facilitators involved. This research paper thus opens the door to a rich array of investigations that could fundamentally reshape our understanding of the ethics of technology.

Abstract

The ethics of technology is not as effective as it should. Despite decades of ethical discussion, development and use of new technologies continues apace without much regard to those discussions. Economic and other forces are too powerful. More focus needs to be placed on the values that underpin social attitudes to technology. By seriously looking at Indigenous thought and comparing it with the typical Western way of seeing the world, we can gain a better understanding of our own views. The Indigenous Filipino worldview provides us with a platform for assessing our own core values and suggests modifications to those values. It also indicates ways for broadening and altering the focus of the ethics of technology to make it more effective in helping us to use technologies in ways more conducive to human well-being.

The Ethics of Technology: How Can Indigenous Thought Contribute?

(Featured) Can robots be trustworthy?

Ines Schröder et al. present an in-depth exploration of the phenomenological and ethical implications of socially assistive robots (SARs), with a specific focus on their role within the medical sector. Central to the discussion is the concept of responsivity, a construct that the authors argue is inherent to human experience and mirrored, to a certain extent, in human-robot interactions. They explore the nature of this perceived responsivity and its implications for the philosophical understanding of human-robot relations.

The article begins by drawing a distinction between human and artificial responsivity, elucidating the phenomenological structure of human responsivity and how it is translated into SARs’ design. The authors underscore how SARs’ design parameters, such as AI-enhanced speech recognition, physical mobility, and social affordances, culminate in a form of ‘virtual responsivity.’ This virtual responsivity mimics human interaction, creating a semblance of empathy and understanding. However, the authors also emphasize the limitations of this approach, highlighting the potential for deception and the absence of the direct reciprocity that is essential to genuine ethical responsivity.

The crux of the article lies in its examination of the ethical implications of this constructed responsivity. The authors grapple with the potential ethical pitfalls, tensions, and challenges of SARs, particularly within the domain of medical applications. They articulate concerns regarding the preservation of patient autonomy, the balancing of beneficial impact against inherent risks, and the principle of justice in relation to access to advanced technologies. The authors further highlight the three ethically relevant dimensions of vulnerability, dignity, and trust in relation to responsivity, emphasizing the importance of these dimensions in human-robot interactions.

Broadly, the research intersects with larger philosophical themes concerning the nature of consciousness, personhood, and the moral status of non-human entities. The authors’ analysis of SARs’ ‘virtual responsivity’ challenges conventional understandings of these concepts, raising critical questions about the attribution of moral status and the potential for emotional attachment to non-human entities. The exploration of ethical dimensions of vulnerability, dignity, and trust in the context of human-robot interactions further elucidates the evolving dynamics of human-machine relationships, providing a nuanced perspective on the philosophical implications of advanced technology.

Looking towards the future, the research opens several avenues for further exploration. One potential focus is the development of a robust ethical framework for the design and use of SARs, especially in sensitive domains such as healthcare. There is a need for research into ‘ethically sensitive responsiveness,’ which could provide a basis for setting appropriate boundaries in human-robot interactions and ensuring the clear communication of a robot’s capabilities and limitations. Additionally, empirical research exploring the psychological effects of human-robot interactions, particularly in relation to the formation of trust, would be invaluable. Overall, the ethical and philosophical implications of artificial responsivity necessitate a multidisciplinary approach, inviting further dialogue between fields such as robotics, ethics, philosophy, and psychology.

Abstract

Definition of the problem

This article critically addresses the conceptualization of trust in the ethical discussion on artificial intelligence (AI) in the specific context of social robots in care. First, we attempt to define in which respect we can speak of ‘social’ robots and how their ‘social affordances’ affect the human propensity to trust in human–robot interaction. Against this background, we examine the use of the concept of ‘trust’ and ‘trustworthiness’ with respect to the guidelines and recommendations of the High-Level Expert Group on AI of the European Union.

Arguments

Trust is analyzed as a multidimensional concept and phenomenon that must be primarily understood as departing from trusting as a human functioning and capability. To trust is an essential part of the human basic capability to form relations with others. We further want to discuss the concept of responsivity which has been established in phenomenological research as a foundational structure of the relation between the self and the other. We argue that trust and trusting as a capability is fundamentally responsive and needs responsive others to be realized. An understanding of responsivity is thus crucial to conceptualize trusting in the ethical framework of human flourishing. We apply a phenomenological–anthropological analysis to explore the link between certain qualities of social robots that construct responsiveness and thereby simulate responsivity and the human propensity to trust.

Conclusion

Against this background, we want to critically ask whether the concept of trustworthiness in social human–robot interaction could be misguided because of the limited ethical demands that the constructed responsiveness of social robots is able to answer to.

Can robots be trustworthy?

(Featured) In Conversation with Artificial Intelligence: Aligning Language Models with Human Values

Atoosa Kasirzadeh and Iason Gabriel embark on an ambitious analysis of how large-scale conversational agents, such as AI language models, can be better designed to align with human values. The premise of the article is grounded in the philosophy of language and pragmatics, employing Gricean maxims and Speech Act Theory to establish the importance of context and cooperation in achieving effective and ethical linguistic communication. The authors underscore the necessity of considering pragmatic norms and concerns in the design of conversational agents and illustrate their proposition through three discursive domains: science, civic life, and creative exchange.

The authors present a novel approach, suggesting that the Gricean maxims of quantity, quality, relation, and manner be operationalized to aid cooperative communication between humans and AI. They also emphasize the diversity of utterances, asserting that no single universal condition of validity applies to all of them. Instead, the validity of an utterance often depends on different sorts of truth conditions that require different methodologies for substantiation, based on context-specific criteria of validity. They further stress the centrality of contextual information in the design of ideal conversational agents and highlight the need for research to theorize and measure the difference between the literal and contextual meaning of utterances.

The authors also delve into the implications of their analysis for future research into the design of conversational agents. They discuss the potential for anthropomorphization of conversational agents and the constraints that might be imposed on them. They note that while anthropomorphism can sometimes be consistent with the creation of value-aligned agents, there are situations where it might be undesirable or inappropriate. They also advocate exploring how conversational agents could facilitate more robust and respectful conversations through context construction and elucidation. Lastly, they suggest that their analysis could be used to evaluate the quality of interactions between conversational agents and users, providing a framework for refining both human and automatic evaluation of conversational agent performance.

The research article resonates with broader philosophical themes, particularly those concerning the interplay between technology and society. It touches upon the ethical dimensions of AI, hinting at the moral responsibility of designing AI systems that align with human values and norms. The exploration of Gricean maxims and Speech Act Theory in the context of AI conversational agents provides a unique blend of AI ethics, philosophy of language, and pragmatics, reflecting the interdisciplinary nature of contemporary AI research. In doing so, the article stimulates dialogue about the role of AI in shaping our social and communicative practices, challenging conventional boundaries between humans and machines, and highlighting the potential of AI as a tool for fostering effective and ethically sound communication.

In terms of future avenues of research, the authors’ analysis opens up many possibilities. First, while the paper focuses primarily on the English language, a fruitful direction of research could involve the exploration of norms and pragmatics in other languages, thereby supporting the cultural inclusivity and sensitivity of AI systems. Second, the proposed alignment of AI conversational agents with Gricean maxims and discursive ideals could be further operationalized and tested empirically to assess its effectiveness in real-world scenarios. Third, the article alludes to the potential of AI in fostering more robust and respectful conversations, which suggests an opportunity to investigate how AI can play an active role in shaping discourse norms and facilitating constructive dialogues. Lastly, the authors’ work can be further enriched by drawing from other sociological and philosophical traditions, such as Luhmann’s systems theory or Latour’s actor-network theory, to offer a more comprehensive and nuanced understanding of the complex interplay between AI, language, and society.

Abstract

Large-scale language technologies are increasingly used in various forms of communication with humans across different contexts. One particular use case for these technologies is conversational agents, which output natural language text in response to prompts and queries. This mode of engagement raises a number of social and ethical questions. For example, what does it mean to align conversational agents with human norms or values? Which norms or values should they be aligned with? And how can this be accomplished? In this paper, we propose a number of steps that help answer these questions. We start by developing a philosophical analysis of the building blocks of linguistic communication between conversational agents and human interlocutors. We then use this analysis to identify and formulate ideal norms of conversation that can govern successful linguistic communication between humans and conversational agents. Furthermore, we explore how these norms can be used to align conversational agents with human values across a range of different discursive domains. We conclude by discussing the practical implications of our proposal for the design of conversational agents that are aligned with these norms and values.

In Conversation with Artificial Intelligence: Aligning Language Models with Human Values

(Featured) The black box problem revisited. Real and imaginary challenges for automated legal decision making

Bartosz Brożek et al. explore the ethical and practical dilemmas arising from the integration of Artificial Intelligence (AI) in the realm of law. The authors suggest that despite the perceived opacity and unpredictability of AI, these machines can provide rational and justifiable decisions in legal reasoning. By challenging conventional notions of decision-making and justifiability, the paper reframes the discussion around AI’s role in law and provides a compelling argument for AI’s potential to aid in legal reasoning.

The authors delve into the intricacies of legal decision-making, highlighting the contrast between our traditional expectations and the realities of legal reasoning. They argue that while we expect legal decisions to be based on clearly identifiable structures, algorithmic operations on beliefs, and classical logic, cognitive science research paints a different picture. The authors further suggest that most legal decisions emerge unconsciously, lack a recognizable structure, and are often influenced by emotional reactions and social training. This observation paves the way for a paradigm shift: what is paramount is not the process by which a decision is reached but whether the decision can be justified ex post.

The authors propose a two-module AI system, one intuitive and the other rational. The intuitive module, powered by machine learning, recognizes patterns from large datasets and makes decisions. The rational module, grounded in logic, does not make decisions but justifies those made by the intuitive module. In this framework, AI can be seen as rational if an acceptable justification can be provided for its decisions, despite their unpredictability. This interesting intertwining of machine learning and logic reshapes our understanding of AI’s role in legal decision-making.
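
The division of labor between the two modules can be illustrated with a minimal Python sketch. Everything in it is a hypothetical stand-in rather than the authors' system: `intuitive_module` is a hard-coded scorer standing in for a trained machine-learning model, the `RULES` table and the fact names are invented for illustration, and `rational_module` only checks whether some rule supports a decision that has already been made.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical rule base (an assumption for illustration):
# each rule pairs the facts it requires with the decision it supports.
RULES: List[Tuple[Set[str], str]] = [
    ({"contract_signed", "payment_missed"}, "liable"),
    ({"contract_void"}, "not_liable"),
]


def intuitive_module(facts: Set[str]) -> str:
    """Stand-in for an opaque learned model: returns a decision, no reasons."""
    score = 0
    if "payment_missed" in facts:
        score += 1
    if "contract_void" in facts:
        score -= 2
    return "liable" if score > 0 else "not_liable"


def rational_module(facts: Set[str], decision: str) -> Optional[str]:
    """Makes no decision itself; looks for a rule that justifies the given one."""
    for required, outcome in RULES:
        if outcome == decision and required <= facts:
            return f"{decision}: supported by facts {sorted(required)}"
    return None  # no acceptable justification found


def decide(facts: Set[str]) -> Tuple[str, Optional[str]]:
    """Run the intuitive module first, then ask the rational module to justify."""
    decision = intuitive_module(facts)
    return decision, rational_module(facts, decision)


if __name__ == "__main__":
    print(decide({"contract_signed", "payment_missed"}))
    print(decide({"payment_missed"}))
```

In this toy setup the second call yields a decision with no justification, which is the case the framework would treat as not rationally defensible even though a decision was produced.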

This paper touches upon broader philosophical issues surrounding consciousness, rationality, and decision-making. By arguing for a shift from a process-oriented to a result-oriented evaluation of decision-making, the authors challenge the traditional Kantian perspective. The proposed model, in which an AI’s decisions are assessed based on their post-hoc justifiability, aligns more closely with consequentialist philosophy. This emphasis on the end result rather than the means to reach it further stimulates the ongoing debate on the ethical implications of AI use and the re-evaluation of long-held philosophical tenets in the face of technological advancements.

Future research could explore various facets of this proposed two-module AI system, particularly the interplay and potential conflicts between the intuitive and rational modules. Questions around what constitutes an “acceptable justification” in various legal contexts also demand further exploration. Additionally, research could investigate how this approach to AI in law would intersect with other legal principles, such as fairness, transparency, and due process. Ultimately, the paper presents a compelling case for rethinking the role and evaluation of AI in legal decision-making, opening up intriguing possibilities for future philosophical and legal discourse.

Abstract

This paper addresses the black-box problem in artificial intelligence (AI), and the related problem of explainability of AI in the legal context. We argue, first, that the black box problem is, in fact, a superficial one as it results from an overlap of four different – albeit interconnected – issues: the opacity problem, the strangeness problem, the unpredictability problem, and the justification problem. Thus, we propose a framework for discussing both the black box problem and the explainability of AI. We argue further that contrary to often defended claims the opacity issue is not a genuine problem. We also dismiss the justification problem. Further, we describe the tensions involved in the strangeness and unpredictability problems and suggest some ways to alleviate them.

The black box problem revisited. Real and imaginary challenges for automated legal decision making

(Featured) Algorithmic Nudging: The Need for an Interdisciplinary Oversight

Christian Schmauder et al. critically assess the implications and risks of employing “black box” AI systems for the development and implementation of personalized nudges in various domains of life. They begin by outlining the power and promise of algorithmic nudging, drawing attention to how AI-driven nudges could bring about widespread benefits in areas such as health, finance, and sustainability. However, they contend that outsourcing nudging to opaque AI systems poses challenges in terms of understanding the underlying reasons for their effectiveness and addressing potential unintended consequences.

The authors delve deeper into the nuances of algorithmic nudging by examining the role of personalized advice in influencing human decision-making. They highlight a key concern that arises when AI systems attempt to maximize user satisfaction: the tendency of the algorithms to exploit cognitive biases in order to achieve desired outcomes. Consequently, the effectiveness of the AI-developed nudges might come at the cost of truthfulness, ultimately undermining the very goals they were designed to achieve.

To address this issue, the authors advocate looking “under the hood” of AI systems, arguing that understanding the underlying cognitive processes harnessed by these systems is crucial for mitigating unintended side effects. They emphasize the importance of interdisciplinary collaboration between computer scientists, cognitive scientists, and psychologists in the development, monitoring, and refinement of AI systems designed to influence human decision-making.

The authors’ exploration of the limitations and risks of “black box” AI nudges raises broader philosophical concerns, particularly in relation to the ethics of autonomy, transparency, and accountability. These concerns call into question the balance between leveraging AI-driven nudges to benefit society and preserving individual autonomy and freedom of choice. Furthermore, the analysis highlights the tension between relying on AI’s predictive power and fostering a deeper understanding of the mechanisms driving human behavior.

This paper provides a valuable foundation for future research on the ethical and philosophical implications of AI-driven nudging. Further investigation could delve into the possible approaches to designing more transparent and explainable AI systems, exploring how such systems might enhance, rather than hinder, human decision-making processes. Additionally, researchers could examine the moral responsibilities of AI developers and regulators, studying the ethical frameworks necessary to guide the development and deployment of AI nudges that respect human autonomy, values, and dignity. Ultimately, a deeper understanding of these complex philosophical questions will be instrumental in realizing the full potential of AI-driven nudges while safeguarding against their potential pitfalls.

Abstract

Nudge is a popular public policy tool that harnesses well-known biases in human judgement to subtly guide people’s decisions, often to improve their choices or to achieve some socially desirable outcome. Thanks to recent developments in artificial intelligence (AI) methods new possibilities emerge of how and when our decisions can be nudged. On the one hand, algorithmically personalized nudges have the potential to vastly improve human daily lives. On the other hand, blindly outsourcing the development and implementation of nudges to “black box” AI systems means that the ultimate reasons for why such nudges work, that is, the underlying human cognitive processes that they harness, will often be unknown. In this paper, we unpack this concern by considering a series of examples and case studies that demonstrate how AI systems can learn to harness biases in human judgment to reach a specified goal. Drawing on an analogy in a philosophical debate concerning the methodology of economics, we call for the need of an interdisciplinary oversight of AI systems that are tasked and deployed to nudge human behaviours.

Algorithmic Nudging: The Need for an Interdisciplinary Oversight

(Featured) Levels of explicability for medical artificial intelligence: What do we normatively need and what can we technically reach?

Frank Ursin et al. investigate the ethical considerations associated with medical artificial intelligence (AI), particularly in the context of radiology. They emphasize the importance of implementing explainable AI (XAI) techniques to address epistemic and explanatory concerns that arise when AI is employed in medical decision-making. The authors outline a four-level approach to explicability, comprising disclosure, intelligibility, interpretability, and explainability, with each successive level providing greater detail and clarity to the patient or physician.
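
As a rough illustration of the ordering among the four levels, they can be modeled as an ordered enumeration. This is only a sketch: the numeric values and the `meets` helper are assumptions introduced here to express "each level goes further than the one before", not definitions taken from the paper.

```python
from enum import IntEnum


class Explicability(IntEnum):
    """The four levels named in the paper; the numeric ranks are an assumption."""
    DISCLOSURE = 1
    INTELLIGIBILITY = 2
    INTERPRETABILITY = 3
    EXPLAINABILITY = 4


def meets(provided: Explicability, required: Explicability) -> bool:
    """Hypothetical helper: does the level a system offers reach the level required?"""
    return provided >= required


if __name__ == "__main__":
    print(meets(Explicability.INTERPRETABILITY, Explicability.DISCLOSURE))
    print(meets(Explicability.DISCLOSURE, Explicability.EXPLAINABILITY))
```

Which level a given clinical situation actually requires is a normative question that the code leaves open; it only encodes the ordering.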

The authors argue that XAI has great potential in the medical field, and they present two examples from radiology to illustrate its practical applications. The first example involves the use of image inpainting techniques to generate sharper and more detailed saliency maps, which can help localize relevant regions within radiological images. The second example highlights the importance of natural language communication in XAI, where an image-to-text model is used to generate medical reports based on radiological images. These two examples demonstrate that incorporating XAI techniques in radiology can provide valuable insights and improved communication for medical practitioners and patients.

In the paper’s conclusion, the authors emphasize the need for a tailored approach to explicability that considers the needs of patients and the scope of medical decisions. They also advocate for the use of insights gained from medical AI ethics to re-evaluate established medical practices and confront biases in medical classification systems. By applying the four levels of explicability in a thoughtful manner, the authors posit that ethically defensible information processes can be established when utilizing medical AI.

This paper touches on broader philosophical issues related to the ethics of technology, medical autonomy, and the nature of trust in AI-driven decision-making. As AI becomes increasingly integrated into various domains of human activity, questions about transparency, fairness, and the moral implications of AI systems become paramount. This paper demonstrates the necessity of establishing an ethical framework for AI applications in healthcare, providing valuable insights that can be extended to other disciplines as well. By considering the complex interplay between AI-driven systems and human agents, the authors also underscore the importance of understanding how technological advancements impact the broader social fabric and the values we uphold as a society.

Future research in this area could explore the generalizability of the four-level approach to explicability in other medical domains or even non-medical contexts. Additionally, researchers may investigate how the incorporation of diverse perspectives in the development of AI systems and explainability techniques can mitigate the potential for biases and discriminatory outcomes. It would also be valuable to study how XAI can be adapted to the specific needs and preferences of individual patients or physicians, creating personalized approaches to explicability. Lastly, researchers may wish to assess the long-term impact of integrating XAI in medical practice, particularly in terms of patient satisfaction, physician trust, and overall quality of care.

Abstract

Definition of the problem

The umbrella term “explicability” refers to the reduction of opacity of artificial intelligence (AI) systems. These efforts are challenging for medical AI applications because higher accuracy often comes at the cost of increased opacity. This entails ethical tensions because physicians and patients desire to trace how results are produced without compromising the performance of AI systems. The centrality of explicability within the informed consent process for medical AI systems compels an ethical reflection on the trade-offs. Which levels of explicability are needed to obtain informed consent when utilizing medical AI?

Arguments

We proceed in five steps: First, we map the terms commonly associated with explicability as described in the ethics and computer science literature, i.e., disclosure, intelligibility, interpretability, and explainability. Second, we conduct a conceptual analysis of the ethical requirements for explicability when it comes to informed consent. Third, we distinguish hurdles for explicability in terms of epistemic and explanatory opacity. Fourth, this then allows to conclude the level of explicability physicians must reach and what patients can expect. In a final step, we show how the identified levels of explicability can technically be met from the perspective of computer science. Throughout our work, we take diagnostic AI systems in radiology as an example.

Conclusion

We determined four levels of explicability that need to be distinguished for ethically defensible informed consent processes and showed how developers of medical AI can technically meet these requirements.

Levels of explicability for medical artificial intelligence: What do we normatively need and what can we technically reach?