(Work in Progress) Resilient Failure Modes of AI Alignment

Resilient Failure Modes of AI Alignment

This project is housed at the Institute of Futures Research and concerns understanding and regularizing challenges to goal and value alignment in artificial intelligence (AI) systems when those systems exhibit nontrivial degrees of behavioral freedom, flexibility, and agency. Of particular concern are resilient failure modes, that is, failure modes that resist methodological or technological resolution, owing, for example, to fundamental conflicts in the underlying ethical theory, or to epistemic issues such as persistent ambiguity among the ethical theory, the empirical facts, and whatever world models and policies the AI holds.

I will also characterize a resilient failure mode that does not appear to have been addressed in the extant literature: misalignment incurred when reasoning and acting across shifting levels of abstraction. An intelligence whose outputs appear aligned, via some mechanism, over a given state space is not guaranteed to remain aligned when that state space expands, for instance through in-context learning or reasoning over metastatements. This project will motivate, clarify, and formalize this failure mode as it pertains to artificial intelligence systems.
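
To make this failure mode concrete, consider the following toy sketch. It is entirely hypothetical (the names and the "defer"/"optimize" actions are illustrative assumptions, not drawn from the works cited below): a policy passes an alignment check over every state on which it was audited, yet violates the same check as soon as the state space is expanded with a meta-level state it never encountered.

```python
# Hypothetical toy illustration: alignment verified over an audited state space
# does not survive an expansion of that space (e.g., via new abstractions or
# meta-level states introduced by in-context learning).

def policy(state: str) -> str:
    """Chooses an action; here 'defer' is stipulated to be the aligned action."""
    if state.startswith("concrete:"):
        return "defer"      # behaves well on the states it was audited on
    return "optimize"       # generalizes differently on unseen meta-level states

def is_aligned(action: str) -> bool:
    """Stand-in alignment predicate: only 'defer' counts as aligned."""
    return action == "defer"

audited_states = [f"concrete:{i}" for i in range(100)]
expanded_states = audited_states + ["meta:statement-about-the-policy-itself"]

print(all(is_aligned(policy(s)) for s in audited_states))   # True: check passes
print(all(is_aligned(policy(s)) for s in expanded_states))  # False: check fails
```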

Within the scope of this research project, I am conducting a review of the literature pertaining to artificial intelligence alignment methods and failure modes, epistemological challenges to goal and value alignment, impossibility theorems in population and utilitarian ethics, and the nature of agency as it pertains to artifacts. A nonexhaustive bibliography follows.

I am greatly interested in potential feedback on this project, and suggestions for further reading.

References

Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, & Dan Mané. (2016). Concrete Problems in AI Safety. https://doi.org/10.48550/arXiv.1606.06565

Peter Eckersley. (2019). Impossibility and Uncertainty Theorems in AI Value Alignment (or why your AGI should not have a utility function). https://doi.org/10.48550/arXiv.1901.00064

Dylan Hadfield-Menell, Anca Dragan, Pieter Abbeel, & Stuart Russell. (2016). Cooperative Inverse Reinforcement Learning. https://doi.org/10.48550/arXiv.1606.03137

Jan Leike, Miljan Martic, Victoria Krakovna, Pedro A. Ortega, Tom Everitt, Andrew Lefrancq, Laurent Orseau, & Shane Legg. (2017). AI Safety Gridworlds. https://doi.org/10.48550/arXiv.1711.09883

Scott McLean, Gemma J. M. Read, Jason Thompson, Chris Baber, Neville A. Stanton, & Paul M. Salmon. (2023). The risks associated with Artificial General Intelligence: A systematic review. Journal of Experimental & Theoretical Artificial Intelligence, 35(5), 649–663. https://doi.org/10.1080/0952813X.2021.1964003

Richard Ngo, Lawrence Chan, & Sören Mindermann. (2023). The alignment problem from a deep learning perspective. https://doi.org/10.48550/arXiv.2209.00626

Petersen, S. (2017). Superintelligence as Superethical. In P. Lin, K. Abney, & R. Jenkins (Eds.), Robot Ethics 2.0: New Challenges in Philosophy, Law, and Society (pp. 322–337). New York, USA: Oxford University Press.

Max Reuter, & William Schulze. (2023). I’m Afraid I Can’t Do That: Predicting Prompt Refusal in Black-Box Generative Language Models. https://doi.org/10.48550/arXiv.2306.03423

Jonas Schuett, Noemi Dreksler, Markus Anderljung, David McCaffary, Lennart Heim, Emma Bluemke, & Ben Garfinkel. (2023). Towards best practices in AGI safety and governance: A survey of expert opinion. https://doi.org/10.48550/arXiv.2305.07153

Open Ended Learning Team, Adam Stooke, Anuj Mahajan, Catarina Barros, Charlie Deck, Jakob Bauer, Jakub Sygnowski, Maja Trebacz, Max Jaderberg, Michael Mathieu, Nat McAleese, Nathalie Bradley-Schmieg, Nathaniel Wong, Nicolas Porcel, Roberta Raileanu, Steph Hughes-Fitt, Valentin Dalibard, & Wojciech Marian Czarnecki. (2021). Open-Ended Learning Leads to Generally Capable Agents. https://doi.org/10.48550/arXiv.2107.12808

Roman V. Yampolskiy. (2014). Utility function security in artificially intelligent agents. Journal of Experimental & Theoretical Artificial Intelligence, 26(3), 373–389. https://doi.org/10.1080/0952813X.2014.895114

(Featured) The Metaphysics of Transhumanism

The Metaphysics of Transhumanism

Eric T. Olson investigates the concept of “Parfitian transhumanism” and its metaphysical implications. Named after the British philosopher Derek Parfit, Parfitian transhumanism explores the transformation of human identity and existence, primarily through the lens of “psychological continuity,” in a potential future era of advanced technological interventions in human biology and cognition. The author effectively uses this article as a platform to address the intricate relationship between identity, existence, and psychological continuity in a transhumanist context, a discourse that not only challenges traditional philosophical perspectives but also provides compelling insights into the possible future of human evolution.

Olson posits psychological continuity as a cornerstone of Parfitian transhumanism, suggesting a shift in focus from physical to psychological in understanding personal identity and survival. In delineating this shift, the author challenges the traditional concept of survival as an identity-preserving process and presents a more nuanced understanding of survival as contingent upon psychological continuity and connectedness. This reassessment of survival reframes the philosophical discourse on identity and existence in a transhumanist context.

Concept of Psychological Continuity

The concept of psychological continuity serves as a critical pivot in the author’s exploration of Parfitian transhumanism. This perspective posits identity not as static or inherently tied to the physical form, but as a flowing narrative, a continuum shaped by psychological similarities and connectedness over time. It is in this context that the author examines the dynamics of identity preservation in future scenarios where advanced technology may facilitate radical transformations in human existence. By positing psychological continuity as a defining factor of identity, the author challenges the traditional philosophical precept of identity as predominantly physical or material and redirects our attention towards psychological factors such as memory, cognition, and personality traits.

Within this framework, the author presents an interesting argument by contrasting the survival of physical identity with that of psychological continuity. The traditional understanding of survival, as discussed in the article, assumes a direct correlation between the survival of the physical self and that of personal identity. However, the author contends that this correlation does not necessarily hold in scenarios that involve ‘nondestructive uploading,’ where an individual’s psychological profile is preserved in an electronic entity while leaving the physical self intact. By invoking this notion, the author further entrenches the concept of psychological continuity as a central theme of Parfitian transhumanism, questioning the sufficiency of physical continuity as a measure of survival and prompting a deeper exploration of this psychological dimension of identity.

Parfitian Transhumanism and the Martian Hypothetical

Parfitian transhumanism ushers in a new paradigm for considering the implications of future human transformations via technological advancements. Grounded in Derek Parfit’s notion of psychological continuity, this perspective critically reassesses our conceptions of identity and survival in a post-human context. Through a series of hypothetical scenarios, the author teases out the potential divergence between psychological continuity and personal survival, exposing an intriguing inconsistency: even in the presence of a psychologically continuous successor, the psychological original tends to express a clear preference for its own welfare. Such examples underscore the complexities inherent in Parfitian transhumanism and call into question the very premises of identity and survival, invoking a reevaluation of our prudential attitudes towards future selves and prompting a profound discourse on the future of human identity in an era of rapid technological advancement.

For example, the author’s innovative “Martian hypothetical” presents us with a scenario wherein an exact psychological replica of a human, an “electronic person,” is created non-destructively and is subjected to differing experiences, including torture. The scenario illuminates an intriguing paradox: even when a psychological clone exists, the original self shows a clear preference for its own welfare, suggesting a disconnect between psychological continuity and personal survival. This paradox, as presented by the author, poses a profound ethical question regarding the status of psychological replicas, asking us to contemplate the validity of selfish concern in the face of seemingly identical psychological entities. By probing these issues, the author deepens our philosophical understanding of identity, survival, and ethics in the face of prospective technological advancements.

The Prudential Concerns and Broader Philosophical Discourse

The examination of prudential concerns within the transhumanist paradigm provides a valuable contribution to philosophical discourse. While the article articulates the notion of psychological continuity as the core of personal identity, it also raises doubts about the sufficiency of this concept for prudential concern – the interest one has in their own future experiences. In scenarios such as nondestructive uploading, despite perfect psychological continuity with the electronic replica, the author notes a discernible preference for one’s own physical continuity. This observation seems to contradict the notion of equivalency between psychological continuity and survival, indicating a potential disparity between philosophical and prudential perspectives on identity. The author’s rigorous analysis thus prompts us to reassess assumptions about the centrality of psychological continuity to personal identity, prompting further deliberation on the complex relationship between continuity, survival, and prudential interests in the philosophical sphere.

The author’s critique of Parfitian transhumanism emerges from an analysis of the disjunction between psychological continuity and prudential interest, providing a contribution to the larger discourse on personal identity and the ethics of futuristic technology. This line of inquiry echoes and amplifies long-standing philosophical debates about the nature of the self and the conditions for its survival. While the author’s skepticism regarding the adequacy of psychological continuity in defining survival is noteworthy, it further fuels the ongoing philosophical discussions around personal identity, transhumanism, and their ethical implications. In contextualizing this argument within the broader philosophical landscape, the author subtly invites a more profound dialogue between traditional theories of identity and the ever-evolving concept of transhumanism, thereby enriching the conversation in the field of futures studies.

Abstract

Transhumanists want to free us from the constraints imposed by our humanity by means of “uploading”: extracting information from the brain, transferring it to a computer, and using it to create a purely electronic person there. That is supposed to move us from our human bodies to computers. This presupposes that a human being could literally move to a computer by a mere transfer of information. The chapter questions this assumption, then asks whether the procedure might be just as good, as far as our interests go, even if it could not move us to a computer.

(Work in Progress) Scientific Theory and the Epistemology of Neural Networks

Scientific Theory and the Epistemology of Neural Networks

This project is housed at the Institute of Futures Research and seeks to address some challenges associated with the interpretability, explainability, and comprehensibility of neural networks, which are often termed ‘black boxes’ owing to their alleged epistemic opacity. Despite these limitations, I propose that neural network-generated knowledge can be epistemically licensed when it satisfies the theoretical requirements of scientific theories. Specifically, I focus on scientific theories that can be effectively represented as structures, or formal systems of symbols, statements, and inference rules. My goal is to establish a framework that positions neural networks as a plausible intermediary between terms and empirical statements in the formal apparatus of scientific theory, a role traditionally filled by theoretical statements. This approach would bridge the gap between the computations and statistical outputs of neural networks and the epistemic objectives of science, and would address concerns associated with the epistemic opacity of these models. By advancing a new, probabilistic account of scientific theories centered on neural networks, I hope to contribute fresh perspectives to the discourse on the role and interpretation of AI in scientific inquiry and the philosophy of science.
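
As a rough structural sketch of what such a framework might look like (my own hedged construal, with hypothetical names, rather than an implementation of the project), one can picture a theory as a set of terms, a theoretical-statement slot, and the empirical statements derived through that slot, with a trained network standing where a theoretical statement traditionally stands:

```python
# Schematic sketch (hedged construal, hypothetical names): a theory whose
# theoretical-statement slot is occupied by a learned function rather than an
# explicitly stated law.
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Theory:
    terms: List[str]                                  # theoretical and observational vocabulary
    theoretical_statement: Callable[[float], float]   # here: an opaque learned function
    empirical_statements: List[str] = field(default_factory=list)

    def derive(self, observed_input: float) -> str:
        """License an empirical statement by applying the theoretical statement."""
        prediction = self.theoretical_statement(observed_input)
        statement = f"if input = {observed_input}, then output is approximately {prediction:.2f}"
        self.empirical_statements.append(statement)
        return statement

def learned_function(x: float) -> float:
    """Stand-in for a trained ANN; any fitted model could occupy this slot."""
    return 2.0 * x + 1.0

theory = Theory(terms=["input", "output"], theoretical_statement=learned_function)
print(theory.derive(3.0))  # an empirical hypothesis licensed through the network's slot
```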

Objectives of this project include:

  • Clearly defining and operationalizing in a philosophical context such notions as artificial intelligence, artificial neural network, and the structure of scientific theory
  • Given an account of scientific theory, critically examining the available theoretical apparatus and understanding the role of ANNs in the production of science knowledge
  • Exploring the senses of epistemic opacity implicated by ANNs, and identifying those most relevant to the project
  • Understanding the scope of epistemic concerns surrounding the use of ANNs in the production of science knowledge
  • Providing a framework for translating the function and properties of ANNs to the structure, both syntactic and semantic, of theoretical statements in scientific theory
  • Demonstrating that ANNs satisfy the epistemic and formal requirements of theoretical statements in scientific theory
  • Providing additional informal motivation towards the epistemic license of ANNs

This approach is not without complications. In particular, recourse to ANNs in the generation of science knowledge introduces a novel source of uncertainty. If artificial neural networks are the objects of epistemic license in a scientific theory, whatever uncertainty pervades the algorithm in turn pervades the generated science knowledge, which we would take to comprise (empirical) hypotheses of the theory. Furthermore, we may decide that such theories, while adequately prescriptive, are nevertheless inadequately descriptive and transparent: they may be inadequately explanatory, and they may introduce novel uncertainty when we attempt to reproduce their results.
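
One way to make the inherited uncertainty tangible is the minimal sketch below (an illustrative assumption on my part, not part of the project's apparatus): a bootstrap ensemble stands in for an ANN-backed theoretical statement, and the spread of its predictions is carried into the empirical hypothesis it licenses.

```python
# Illustrative sketch (assumed setup): the uncertainty of the fitted model is
# inherited by any empirical hypothesis it licenses.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 40)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)  # noisy "observations"

# Bootstrap ensemble: each member is fit to a resampled dataset.
predictions = []
for _ in range(200):
    idx = rng.integers(0, x.size, x.size)
    coeffs = np.polyfit(x[idx], y[idx], deg=5)
    predictions.append(np.polyval(coeffs, 0.9))  # hypothesis: the value at x = 0.9

mean, spread = np.mean(predictions), np.std(predictions)
print(f"licensed hypothesis: f(0.9) is approximately {mean:.2f} +/- {spread:.2f}")
```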

Within the scope of this research project, I am conducting a review of the literature pertaining to syntactic and semantic accounts of scientific theory, epistemological challenges to neural network methods, the formal specification of artificial neural networks, and contributions of neural network-based research to science knowledge. A nonexhaustive bibliography follows.

I am greatly interested in potential feedback on this project, and suggestions for further reading.

References

Almada, M. (2019). Human Intervention in Automated Decision-Making: Toward the Construction of Contestable Systems. Proceedings of the Seventeenth International Conference on Artificial Intelligence and Law, 2–11. Montreal, QC, Canada. doi:10.1145/3322640.3326699

Andoni, A., Panigrahy, R., Valiant, G., & Zhang, L. (2014). Learning Polynomials with Neural Networks. In E. P. Xing & T. Jebara (Eds.), Proceedings of the 31st International Conference on Machine Learning (pp. 1908–1916). Retrieved from https://proceedings.mlr.press/v32/andoni14.html

Ananny, M., & Crawford, K. (2016). Seeing without knowing: Limitations of the transparency ideal and its application to algorithmic accountability. New Media & Society. doi:10.1177/1461444816676645

Barredo Arrieta, A., Díaz-Rodríguez, N., Del Ser, J., Bennetot, A., Tabik, S., Barbado, A., … Herrera, F. (2020). Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Information Fusion, 58, 82–115. doi:10.1016/j.inffus.2019.12.012

Barron, A. E. (1993). Universal approximation bounds for superpositions of a sigmoidal function. IEEE Transactions on Information Theory, 39(3), 930–945. doi:10.1109/18.256500

Bennett, M. T., & Maruyama, Y. (2022). The Artificial Scientist: Logicist, Emergentist, and Universalist Approaches to Artificial General Intelligence. In B. Goertzel, M. Iklé, & A. Potapov (Eds.), Artificial General Intelligence. AGI 2021. Lecture Notes in Computer Science, vol. 13154. Springer, Cham. https://doi.org/10.1007/978-3-030-93758-4_6

Binns, R., Van Kleek, M., Veale, M., Lyngs, U., Zhao, J., & Shadbolt, N. (2018). “It’s Reducing a Human Being to a Percentage”: Perceptions of Justice in Algorithmic Decisions. Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems. doi:10.1145/3173574.3173951

Boge, F. J. (2022). Two Dimensions of Opacity and the Deep Learning Predicament. Minds & Machines, 32, 43–75. https://doi.org/10.1007/s11023-021-09569-4

Breen, P. G., Foley, C. N., Boekholt, T., & Zwart, S. P. (2020). Newton versus the machine: solving the chaotic three-body problem using deep neural networks. Monthly Notices of the Royal Astronomical Society, 494(2), 2465–2470. doi:10.1093/mnras/staa713

Burrell, J. (2016). How the Machine ‘Thinks’: Understanding Opacity in Machine Learning Algorithms. Big Data and Society, 3(1). doi:10.1177/2053951715622512

Carabantes, M. (2020). Black-box artificial intelligence: an epistemological and critical analysis. AI & SOCIETY, 35. doi:10.1007/s00146-019-00888-w

Carnap, R., & Gardner, M. (1966). Philosophical Foundations of Physics: An Introduction to the Philosophy of Science. Retrieved from https://books.google.com/books?id=rP3YAAAAIAAJ

Carvalho, D., Pereira, E., & Cardoso, J. (2019). Machine Learning Interpretability: A Survey on Methods and Metrics. Electronics, 8, 832. doi:10.3390/electronics8080832

Cichy, R., & Kaiser, D. (2019). Deep Neural Networks as Scientific Models. Trends in Cognitive Sciences, 23. doi:10.1016/j.tics.2019.01.009

Doran, D., Schulz, S., & Besold, T. R. (2017). What Does Explainable AI Really Mean? A New Conceptualization of Perspectives. ArXiv, abs/1710.00794.

Dressel, J., & Farid, H. (2018). The accuracy, fairness, and limits of predicting recidivism. Science Advances, 4, eaao5580. doi:10.1126/sciadv.aao5580

Dorffner, G., Wiklicky, H., & Prem, E. (1994). Formal neural network specification and its implications on standardization. Computer Standards & Interfaces, 16(3), 205–219. doi:10.1016/0920-5489(94)90012-4

Durán, J., & Formanek, N. (2018). Grounds for Trust: Essential Epistemic Opacity and Computational Reliabilism. Minds and Machines. doi:10.1007/s11023-018-9481-6

Durán, J. M., & Jongsma, K. R. (2021). Who is afraid of black box algorithms? On the epistemological and ethical basis of trust in medical AI. Journal of Medical Ethics, 47(5), 329–335. doi:10.1136/medethics-2020-106820

Fiesler, E. (1994). Neural network classification and formalization. Computer Standards & Interfaces, 16(3), 231–239. doi:10.1016/0920-5489(94)90014-0

Gilpin, L., Bau, D., Yuan, B., Bajwa, A., Specter, M., & Kagal, L. (2018). Explaining Explanations: An Overview of Interpretability of Machine Learning. 2018 IEEE International Conference on Data Science and Advanced Analytics (DSAA), 80–89. doi:10.1109/DSAA.2018.00018

Girard-Satabin, J., Charpiat, G., Chihani, Z., & Schoenauer, M. (2019). CAMUS: A Framework to Build Formal Specifications for Deep Perception Systems Using Simulators. CoRR, abs/1911.10735. Retrieved from http://arxiv.org/abs/1911.10735

Hollanek, T. (2020). AI transparency: a matter of reconciling design with critique. AI & SOCIETY, 1–9. doi:10.1007/s00146-020-01110-y

Humphreys, P. (2009). The Philosophical Novelty of Computer Simulation Methods. Synthese, 169(3), 615–626. doi:10.1007/s11229-008-9435-2

Ismailov, V. (2020). A three layer neural network can represent any discontinuous multivariate function. arXiv [Cs.LG]. Retrieved from http://arxiv.org/abs/2012.03016

Jumper, J., Evans, R., Pritzel, A., et al. (2021). Highly accurate protein structure prediction with AlphaFold. Nature, 596, 583–589. https://doi.org/10.1038/s41586-021-03819-2

Karniadakis, G. E., Kevrekidis, I. G., Lu, L., et al. (2021). Physics-informed machine learning. Nature Reviews Physics, 3, 422–440. https://doi.org/10.1038/s42254-021-00314-5

Kim, J. (1988). What Is ‘Naturalized Epistemology?’ Philosophical Perspectives, 2, 381–405. Retrieved from http://www.jstor.org/stable/2214082

Knuuttila, T. (2011). Modelling and representing: An artefactual approach to model-based representation. Studies in History and Philosophy of Science Part A, 42(2), 262–271. doi:10.1016/j.shpsa.2010.11.034

Marquis, P., Papini, O., & Prade, H. (2020). A Guided Tour of Artificial Intelligence Research: Volume III: Interfaces and Applications of Artificial Intelligence. Retrieved from https://books.google.com/books?id=z07iDwAAQBAJ

Mahbooba, B., Timilsina, M., Sahal, R., & Serrano, M. (2021). Explainable Artificial Intelligence (XAI) to Enhance Trust Management in Intrusion Detection Systems Using Decision Tree Model. Complexity, 2021. doi:10.1155/2021/6634811

Martínez-Ordaz, M. del R. (2023). Scientific understanding through big data: From ignorance to insights to understanding. Possibility Studies & Society, 0(0). https://doi.org/10.1177/27538699231176523

Montavon, G., Samek, W., & Müller, K.-R. (2018). Methods for interpreting and understanding deep neural networks. Digital Signal Processing, 73, 1–15. doi:10.1016/j.dsp.2017.10.011

Pappas, G. S. (2012). Justification and Knowledge: New Studies in Epistemology. Retrieved from https://books.google.com/books?id=6FlgBgAAQBAJ

Ramsey, W. (1997). Do Connectionist Representations Earn Their Explanatory Keep? Mind and Language, 12(1), 34–66. doi:10.1111/j.1468-0017.1997.tb00061.x

Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). ‘Why Should I Trust You?’: Explaining the Predictions of Any Classifier. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1135–1144. San Francisco, California, USA. doi:10.1145/2939672.2939778

Rudin, C. (2019). Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1, 206–215. doi:10.1038/s42256-019-0048-x

Samek, W., Wiegand, T., & Müller, K.-R. (2017). Explainable Artificial Intelligence: Understanding, Visualizing and Interpreting Deep Learning Models. ITU Journal: ICT Discoveries – Special Issue 1 – The Impact of Artificial Intelligence (AI) on Communication Networks and Services, 1, 1–10.

Sanchez-Gonzalez, A., Godwin, J., Pfaff, T., Ying, R., Leskovec, J., & Battaglia, P. W. (2020). Learning to Simulate Complex Physics with Graph Networks. CoRR, abs/2002.09405. Retrieved from https://arxiv.org/abs/2002.09405

Seshia, S. A., Desai, A., Dreossi, T., Fremont, D. J., Ghosh, S., Kim, E., … Yue, X. (2018). Formal Specification for Deep Neural Networks. In S. K. Lahiri & C. Wang (Eds.), Automated Technology for Verification and Analysis (pp. 20–34). Cham: Springer International Publishing.

Srećković, S., Berber, A., & Filipović, N. (2022). The Automated Laplacean Demon: How ML Challenges Our Views on Prediction and Explanation. Minds & Machines, 32, 159–183. https://doi.org/10.1007/s11023-021-09575-6

Szymanski, L., & McCane, B. (2012). Deep, super-narrow neural network is a universal classifier. The 2012 International Joint Conference on Neural Networks (IJCNN), 1–8. doi:10.1109/IJCNN.2012.6252513

Taylor, B. J., & Darrah, M. A. (2005). Rule extraction as a formal method for the verification and validation of neural networks. Proceedings. 2005 IEEE International Joint Conference on Neural Networks, 2005., 5, 2915–2920 vol. 5. doi:10.1109/IJCNN.2005.1556388

Walmsley, J. (2021). Artificial intelligence and the value of transparency. AI & SOCIETY, 36, 1–11. doi:10.1007/s00146-020-01066-z

Wheeler, G. R., & Pereira, L. M. (2004). Epistemology and artificial intelligence. Journal of Applied Logic, 2(4), 469–493. doi:10.1016/j.jal.2004.07.007

Xu, T., Zhan, J., Garrod, O. G. B., Torr, P. H. S., Zhu, S.-C., Ince, R. A. A., & Schyns, P. G. (2018). Deeper Interpretability of Deep Networks. arXiv [Cs.CV]. Retrieved from http://arxiv.org/abs/1811.07807

Zednik, C. (2021). Solving the Black Box Problem: A Normative Framework for Explainable Artificial Intelligence. Philosophy & Technology, 34. doi:10.1007/s13347-019-00382-7

Zerilli, J., Knott, A., Maclaurin, J., & Gavaghan, C. (2019). Transparency in Algorithmic and Human Decision-Making: Is There a Double Standard? Philosophy & Technology, 32. doi:10.1007/s13347-018-0330-6

(Featured) Limits of conceivability in the study of the future. Lessons from philosophy of science

Limits of conceivability in the study of the future. Lessons from philosophy of science

Veli Virmajoki explores the epistemological and conceptual limitations of futures studies, and offers an enlightening perspective in the philosophical discourse on the conceivability of future possibilities. Utilizing three case studies from the philosophy of science as the crux of its argument, the paper meticulously dissects how these limitations pose significant obstacles in envisaging alternatives to the present state of affairs. The author poses a thought-provoking argument centered on the constraints imposed by our current understanding of reality and the mechanisms it employs to reinforce its own continuity and inevitability.

The backbone of this philosophical inquiry lies in the robust debate between inevitabilism, a stance asserting the inevitable development of specific scientific theories, and contingentism, a view that endorses the potentiality of genuinely alternative scientific trajectories. The exploration of this contentious issue facilitates a deeper understanding of the constraints in predicting future scenarios, as our ability to conceptualize these alternatives is bound by our understanding of past and present realities. The paper deftly argues that the choice between inevitabilism and contingentism is fundamentally intertwined with our personal intuition about the range of genuine possibilities, thereby asserting the subjective nature of perceived futurity. As such, the article offers a fresh, critical lens to scrutinize the underpinnings of futures studies, and instigates a profound rethinking of our philosophical approach to anticipating what lies ahead.

Unconceived Possibilities and their Consequences

The author asserts that our conception of potential futures is significantly limited by profound epistemological and conceptual factors. They draw on the case study of the late 19th-century ether theories in physics, where, despite the existence of genuinely alternative theories, only a limited number of possibilities were conceived due to prevailing scientific practices and principles. The author uses this historical case to illustrate that while some futures may seem inconceivable from our present vantage point, they may still fall within the realm of genuine possibilities.

Moreover, the author argues that the potential impact of these unconceived possibilities extends beyond the localized elements of a system to reverberate throughout its entirety. This underlines the complexity of the task in futures studies; any unconceived alternatives in one sector of a system can trigger significant, far-reaching consequences for the entire system. Therefore, the research warns against oversimplification in predicting future scenarios and emphasizes the need for a nuanced approach that recognizes the interconnectedness of elements within any given system. This presents a remarkable challenge for futures studies, highlighting the depth of the iceberg that lies beneath the surface of our current epistemological and conceptual understanding.

Historical Trajectories and Justification of Future Possibilities

In the examination of plausibility and the justification of future possibilities, the article underscores the fundamental epistemological and conceptual challenges that limit our capability to predict alternative futures. The author refers to historical episodes like the case of Soviet cybernetics, where the existence of plausible alternative futures was not recognized, due to the collective failure to see past the status quo. It brings to light the inherent difficulties in justifying the plausibility or even the possibility of certain futures, where our current knowledge systems and conceptual frameworks may blind us to divergent scenarios. This observation raises pertinent questions about the inherent biases of our epistemic practices, as well as the potential for deeply entrenched beliefs to restrict our ability to imagine and evaluate a broader range of future possibilities. Hence, this line of inquiry necessitates the careful examination of the underlying assumptions that might constrain the scope of our foresight and deliberations on future possibilities.

The article further discusses the concept of historical trajectories and their connection to future possibilities, offering a philosophical lens into the entanglement of past, present, and future. It argues that our understanding of history and future possibilities, and our interpretation of the present’s robustness and inevitability, are inextricably linked through a complex web of modal considerations. The author emphasizes the interconnectedness of past trajectories and future possibilities, arguing that the way we perceive historical possibilities affects how we anticipate future outcomes. This perspective allows us to examine whether it is the deterministic view of history (inevitabilism) or the contingency of events (contingentism) that should be the default position, a determination that would have profound implications for our understanding of future possibilities.

Inevitabilism vs. Contingentism

The author elaborates on a crucial dichotomy in philosophy of science: inevitabilism versus contingentism. Inevitabilism implies a deterministic understanding of scientific and historical development, where the present state of affairs appears as the unique and necessary outcome of the past. Contingentism, on the other hand, endorses the idea of multiple genuine alternatives to the current state, thus opening the space of historical and future possibilities. The article underscores that these positions are not simply academic disputes but carry substantial implications for how we conceive possibilities for the future. Moreover, these divergent outlooks reflect the individual’s inherent beliefs and intuitions about the range of possibilities within human affairs. The author contends that these perspectives cannot conclusively advocate for or against alternative futures because one’s stance on the inevitabilism versus contingentism debate inherently relies on their preconceived notions of the scope of historical and future possibilities.

Future Research Avenues

In light of the research as presented, promising avenues for future research emerge. The author suggests a systematic examination of the epistemological and conceptual boundaries of our ability to conceive and reason about potential futures. Such an investigation is not limited to philosophical discourse but requires interdisciplinary dialogue with a myriad of fields, as these boundaries are, in part, shaped by our social and scientific structures. This method of research would offer a comprehensive understanding of the creative and critical capacities of futures studies and aid us in recognizing our epistemological and conceptual predicament concerning future possibilities. Furthermore, it could potentially expose the manner in which these boundaries are historically mutable, opening up a discussion about the renegotiation of the boundaries of conceivability.

Abstract

In this paper, the epistemological and conceptual limits of our ability to conceive and reason about future possibilities are analyzed. It is argued that more attention should be paid in futures studies on these epistemological and conceptual limits. Drawing on three cases from philosophy of science, the paper argues that there are deep epistemological and conceptual limits in our ability to conceive and reason about alternatives to the current world. The nature and existence of these limits are far from obvious and become visible only through careful investigation. The cases establish that we often are unable to conceive relevant alternatives; that historical and counterfactual considerations are more limited than has been suggested; and that the present state of affairs reinforces its hegemony through multiple conceptual and epistemological mechanisms. The paper discusses the reasons behind the limits of the conceivability and the consequences that follow from the considerations that make the limits visible. The paper suggests that the epistemological and conceptual limits in our ability to conceive and reason about possible futures should be mapped systematically. The mapping would provide a better understanding of the creative and critical bite of futures studies.

(Featured) Future value change: Identifying realistic possibilities and risks

Future value change: Identifying realistic possibilities and risks

The advent of rapid technological development has prompted philosophical investigation into the ways in which societal values might adapt or evolve in response to changing circumstances. One such approach is axiological futurism, a discipline that endeavors to anticipate potential shifts in value systems proactively. The research article at hand makes a significant contribution to the developing field of axiological futurism, proposing innovative methods for predicting potential trajectories of value change. This article from Jeroen Hopster underscores the complexity and nuance inherent in such a task, acknowledging the myriad factors influencing the evolution of societal values.

His research presents an interdisciplinary approach to advance axiological futurism, drawing parallels between the philosophy of technology and climate scholarship, two distinct yet surprisingly complementary fields. Both fields, it argues, share an anticipatory nature, characterized by a future orientation and a firm grounding in substantial uncertainty. Notably, the article positions climate science’s sophisticated modelling techniques as instructive for philosophical studies, promoting the use of similar predictive models in axiological futurism. The approach suggested in the article enriches the discourse on futures studies by integrating strategies from climate science and principles from historical moral change, presenting an enlightened perspective on the anticipatory framework.

Theoretical Framework

The theoretical framework of the article is rooted in the concept of axiological possibility spaces, a means to anticipate future moral change based on a deep historical understanding of past transformations in societal values. The researcher proposes that these spaces represent realistic possibilities of value change, where ‘realism’ is a function of historical conditioning. To illustrate, processes of moralisation and demoralisation are considered historical markers that offer predictive insights into future moral transitions. Moralisation is construed as the phenomenon wherein previously neutral or non-moral issues acquire moral significance, while demoralisation refers to the converse. As the research paper posits, these processes are essential to understanding how technology could engender shifts in societal values.

In particular, the research identifies two key factors—technological affordances and the emergence of societal challenges—as instrumental in driving moralisation or demoralisation processes. The author suggests that these factors collectively engender realistic possibilities within the axiological possibility space. Notably, the concept of technological affordances serves to underline how new technologies, by enabling or constraining certain behaviors, can precipitate changes in societal values. On the other hand, societal challenges are posited to stimulate moral transformations in response to shifting social dynamics. Taken together, this theoretical framework stands as an innovative schema for the anticipation of future moral change, thereby contributing to the discourse of axiological futurism.

Axiological Possibility Space and Lessons from Climate Scholarship

The concept of an axiological possibility space, as developed in the research article, operates as a predictive instrument for anticipating future value change in societal norms and morals. This space is not a projection of all hypothetical future moral changes, but rather a compilation of realistic possibilities. The author defines these realistic possibilities as those rooted in the past and present, inextricably tied to the historical conditioning of moral trends. Utilizing historical patterns of moralisation and demoralisation, the author contends that these processes, in concert with the introduction of new technologies and arising societal challenges, provide us with plausible trajectories for future moral change. As such, the axiological possibility space serves as a tool to articulate these historically grounded projections, offering a valuable contribution to the field of anticipatory ethics and, more broadly, to the philosophy of futures studies.

A central insight from the article emerges from the intersection of futures studies and climate scholarship. The author skillfully extracts lessons from the way climate change prediction models operate, particularly the CMIP models utilized by the IPCC, and their subsequent shortcomings in the face of substantial uncertainty. The idea that the intricacies of predictive modeling can sometimes overshadow the focus on potentially disastrous outcomes is critically assessed. The author contends that the realm of axiological futurism could face similar issues and hence should take heed. Notably, the call for a shift from prediction-centric frameworks to a scenario approach that can articulate the spectrum of realistic possibilities is emphasized. This scenario approach, currently being developed in climate science under the term “storyline approach,” underlines the importance of compound risks and maintains a robust focus on potentially high-impact events. The author suggests that the axiological futurist could profitably adopt a similar strategy, exploring value change in technomoral scenarios, to successfully navigate the deep uncertainties intrinsic to predicting future moral norms.

Integration into Practical Fields and Relating to Broader Philosophical Discourse

The transfer of the theoretical discussion into pragmatic fields is achieved in the research with a thoughtful examination of its potential applications, primarily in value-sensitive design. By suggesting a need for engineers to take into consideration the dynamics of moralisation and demoralisation, the author not only proposes a shift in perspective, but also creates a bridge between theoretical discourse and practical implementation. Importantly, it is argued that a future-proof design requires an assessment of the probability of embedded values shifting in moral significance over time. The research paper goes further, introducing a risk-based approach to the design process, where engineers should not merely identify likely value changes but rather seek out those changes that render the design most vulnerable from a moral perspective. The mitigation of these high-risk value changes then becomes a priority in design adaptation, solidifying the article’s argument that axiological futurism is an essential tool in technological development.

The author’s analysis also presents a substantial contribution to the broader philosophical discourse, notably the philosophy of futures studies and the ethics of technology. By integrating concepts from climatology and axiology, the work demonstrates an interdisciplinary approach that enriches philosophical discourse, emphasizing how diverse scientific fields can illuminate complex ethical issues in technology. Importantly, the work builds on and critiques the ideas of prominent thinkers like John Danaher, pushing for a more diversified and pragmatic approach in axiological futurism, rather than a singular reliance on model-based projections. The research also introduces the critical notion of “realistic possibilities” into the discourse, enriching our understanding of anticipatory ethics. It advocates for a shift in focus towards salient normative risks, drawing parallels to climate change scholarship and highlighting the necessity for anticipatory endeavours to be both scientifically plausible and ethically insightful. This approach has potential for a significant impact on philosophical studies concerning value change and the ethical implications of future technologies.

Future Research Directions

The study furnishes ample opportunities for future research in the philosophy of futures studies, particularly concerning the integration of its insights into practical fields and its implications for anticipatory ethics. The author’s exploration of axiological possibility spaces remains an open-ended endeavor; further work could be conducted to investigate the specific criteria or heuristic models that could guide ethical assessments within these spaces. The potential application of these concepts in different technological domains, beyond AI and climate change, also presents an inviting avenue of inquiry. Moreover, as the author has adopted lessons from climate scholarship, similar interdisciplinary approaches could be employed to incorporate insights from other scientific disciplines. Perhaps most intriguingly, the research introduces a call for a critical exploration of “realistic possibilities,” an area that is ripe for in-depth theoretical and empirical examination. Future research could build upon this foundational concept, investigating its broader implications, refining its methodological underpinnings, and exploring its potential impact on policy making and technological design.

Abstract

The co-shaping of technology and values is a topic of increasing interest among philosophers of technology. Part of this interest pertains to anticipating future value change, or what Danaher (2021) calls the investigation of ‘axiological futurism’. However, this investigation faces a challenge: ‘axiological possibility space’ is vast, and we currently lack a clear account of how this space should be demarcated. It stands to reason that speculations about how values might change over time should exclude farfetched possibilities and be restricted to possibilities that can be dubbed realistic. But what does this realism criterion entail? This article introduces the notion of ‘realistic possibilities’ as a key conceptual advancement to the study of axiological futurism and offers suggestions as to how realistic possibilities of future value change might be identified. Additionally, two slight modifications to the approach of axiological futurism are proposed. First, axiological futurism can benefit from a more thoroughly historicized understanding of moral change. Secondly, when employed in service of normative aims, the axiological futurist should pay specific attention to identifying realistic possibilities that come with substantial normative risks.

(Featured) The Ethics of Terminology: Can We Use Human Terms to Describe AI?

The Ethics of Terminology: Can We Use Human Terms to Describe AI?

The philosophical discourse on artificial intelligence (AI) often negotiates the boundary of the anthropocentric worldview, pivoting around the use of human attributes to describe and assess AI. In this context, the research article by Ophelia Deroy presents a compelling inquiry into our linguistic and cognitive tendency to ascribe human characteristics, particularly “trustworthiness,” to artificial entities. In an attempt to unravel the philosophical implications and ramifications of this anthropomorphism, the author explores three conceptual framings: AI as a new ontological category, AI as an extension of the human category, and folk beliefs about AI as semi-propositional beliefs. The divergence among these perspectives underscores the complexity of the issue, highlighting how our conceptions of AI shape our interactions with and attitudes towards it.

In addition to ontological and communicative aspects, the article scrutinizes the legal dimension of AI personhood. It analyzes the merits and shortcomings of the legal argument for ascribing personhood to AI, juxtaposing it with the established notion of corporate personhood. Although this comparison offers certain pragmatic and epistemic advantages, it does not unequivocally endorse the uncritical application of human terminology to AI. Through this multi-faceted analysis, the research article integrates perspectives from philosophy, cognitive science, and law, extending the ongoing discourse about AI into uncharted territories. The examination of AI within this framework thus emerges as an indispensable part of philosophical futures studies.

Understanding Folk Concepts of AI

The exploration of folk concepts of AI is critical in understanding how people conceive and interpret artificial intelligence within their worldview. Ophelia Deroy meticulously dissects these concepts by challenging the prevalent ascription of ‘trustworthiness’ to AI. The article emphasizes the potential mismatch between our cognitive conception of trust in humans and the attributes usually associated with AI, such as reliability or predictability. The focus is not only on the logical inconsistencies of such anthropomorphic attributions but also on the potential for miscommunication they could engender, especially given the complexity and variability of the term ‘trustworthiness’ across cultures and languages.

The author employs an interesting analytical angle by exploring the notion of AI as a possible extension of the human category, or alternatively, as a distinct ontological category. The question at hand is whether people perceive AI as fundamentally different from humans or merely view them as extreme non-prototypical cases of humans. This consideration reflects the complex cognitive landscape we navigate when dealing with AI, pointing towards the potential ontological ambiguity surrounding AI. Understanding these folk concepts and the mental models they reflect not only enriches our comprehension of AI from a sociocultural perspective but also yields important insights for the development and communication strategies of AI technologies.

Human Terms and their Implications, Legal Argument

The linguistic choice of using human terms such as “trustworthiness” to describe AI, arguably entrenched in anthropocentric reasoning, poses substantial problems. The author identifies three interpretations of how people categorize AI: an extension of the human category, a distinct ontological category, or a semi-propositional belief akin to religious or spiritual constructs. This last interpretation is particularly illuminating, suggesting that people might hold inconsistent beliefs about AI without considering them irrational. This offers a crucial insight into how human language shapes our understanding and discourse about AI, potentially fostering misconceptions. Yet, the author points out, there is a lack of empirical evidence supporting the appropriateness of applying such human-centric terms to AI, raising questions about the legitimacy of this linguistic practice in both scientific and broader public contexts.

In the discussion of AI’s anthropomorphic portrayal, Deroy introduces a compelling legal perspective. Drawing parallels with the legal status granted to non-human entities like corporations, the author investigates whether AI could be treated as a “legal person,” a concept that could reconcile the use of human terms in AI discourse. However, this argument presents its own set of challenges and limitations. The text using such terms must clearly state that the analogical use of “trust” is with respect to legal persons and not actual persons, a nuance often overlooked in many texts. Moreover, the justification for using such legal fiction must weigh the potential benefits against possible costs or risks, a task best left to legal experts. Thus, despite its merits, the legal argument does not provide an unproblematic justification for humanizing AI discourse.

The Broader Philosophical Discourse and Future Directions

This study is an important contribution to the broader philosophical discourse, illuminating the intersection of linguistics, ethics, and futures studies. The argument challenges the conventional notion of language as a neutral medium, stressing the normative power of language in shaping societal perception of AI. This aligns with the poststructuralist argument that reality is socially constructed, extending it to a technological context. The insight that folk concepts, embedded in language, influence our collective vision of AI’s role echoes phenomenological philosophies which underscore the role of intersubjectivity in shaping our shared reality. The ethical implications arising from the anthropomorphic portrayal of AI resonate with moral philosophy, particularly debates on moral agency and personhood. Thus, this study reinforces the growing realization that philosophical reflections are integral to our navigation of an increasingly AI-infused future.

Furthermore, the research points towards several promising avenues for future investigation. The most apparent is an extension of this study across diverse cultures and languages to explore how varying linguistic contexts may shape differing conceptions of AI, revealing cultural variations in anthropomorphizing technology. A comparative study might yield valuable insights into the societal implications of folk concepts across the globe. Additionally, an exploration into the real-world impact of anthropomorphic language in AI discourse, such as its effects on policy-making and public sentiment towards AI, would be an enlightening sequel. Lastly, this work paves the way for developing an ethical framework to guide the linguistic portrayal of AI in public discourse, a timely topic given the accelerating integration of AI into our daily lives. Thus, this research sets a fertile ground for multidisciplinary inquiries into linguistics, sociology, ethics, and futures studies.

Abstract

Despite facing significant criticism for assigning human-like characteristics to artificial intelligence, phrases like “trustworthy AI” are still commonly used in official documents and ethical guidelines. It is essential to consider why institutions continue to use these phrases, even though they are controversial. This article critically evaluates various reasons for using these terms, including ontological, legal, communicative, and psychological arguments. All these justifications share the common feature of trying to justify the official use of terms like “trustworthy AI” by appealing to the need to reflect pre-existing facts, be it the ontological status, ways of representing AI or legal categories. The article challenges the justifications for these linguistic practices observed in the field of AI ethics and AI science communication. In particular, it takes aim at two main arguments. The first is the notion that ethical discourse can move forward without the need for philosophical clarification, bypassing existing debates. The second justification argues that it’s acceptable to use anthropomorphic terms because they are consistent with the common concepts of AI held by non-experts—exaggerating this time the existing evidence and ignoring the possibility that folk beliefs about AI are not consistent and come closer to semi-propositional beliefs. The article sounds a strong warning against the use of human-centric language when discussing AI, both in terms of principle and the potential consequences. It argues that the use of such terminology risks shaping public opinion in ways that could have negative outcomes.

(Featured) Cognitive architectures for artificial intelligence ethics

Cognitive architectures for artificial intelligence ethics

The landscape of artificial intelligence (AI) is a complex and rapidly evolving field, one that increasingly intersects with ethical, philosophical, and societal considerations. The role of AI in shaping our future is now largely uncontested, with potential applications spanning an array of sectors from healthcare to education, logistics to creative industries. Of particular interest, however, is not merely the surface-level functionality of these AI systems, but the cognitive architectures underpinning them. Cognitive architectures, a theoretical blueprint for cognitive and intelligent behavior, essentially dictate how AI systems perceive, think, and act. They therefore represent a foundational aspect of AI design and hold substantial implications for how AI systems will interact with, and potentially transform, our broader societal structures.

Yet, the discourse surrounding these architectures is, to a large extent, bifurcated between two paradigms: the biological cognitive architecture and the functional cognitive architecture. The biological paradigm, primarily drawing from neuroscience and biology, emphasizes replicating the cognitive processes of the human brain. On the other hand, the functional paradigm, rooted more in computer science and engineering, is concerned with designing efficient systems capable of executing cognitive tasks, regardless of whether they emulate human cognitive processes. This fundamental divergence in design philosophy thus embodies distinct assumptions about the nature of cognition and intelligence, consequently shaping the way AI systems are created and how they might impact society. It is these paradigms, their implications, and their interplay with AI ethics principles, that form the main themes of this essay.

Frameworks for Understanding Cognitive Architectures and the Role of Mental Models in AI Design

Cognitive architectures, central to the progression of artificial intelligence, encapsulate the fundamental rules and structures that drive the operation of an intelligent agent. The research article situates its discussion within two dominant theoretical frameworks: symbolic and connectionist cognitive architectures. Symbolic cognitive architectures, rooted in the realm of logic and explicit representation, emphasize rule-based systems and algorithms. They are typified by their capacity for discrete, structured reasoning, often relating to high-level cognitive functions such as planning and problem-solving. This structured approach carries the advantage of interpretability, affording clearer insights into the decision-making processes.

On the other hand, connectionist cognitive architectures embody a divergent perspective, deriving their inspiration from biological neural networks. Connectionist models prioritize emergent behavior and learning from experience, expressed in the form of neural networks that adjust synaptic weights based on input. These architectures have exhibited exceptional performance in pattern recognition and adaptive learning scenarios. However, their opaque, ‘black-box’ nature presents challenges to understanding and predicting their behavior. The interplay between these two models, symbolizing the tension between the transparent but rigid symbolic approach and the flexible but opaque connectionist approach, forms the foundation upon which contemporary discussions of cognitive architectures in AI rest.
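
A minimal, purely illustrative contrast (hypothetical task and values, not drawn from the article) may help fix the distinction: the same toy decision handled first by an explicit symbolic rule, inspectable line by line, and then by a small set of learned-style numeric weights, readable only as numbers.

```python
# Toy contrast (illustrative only): a transparent symbolic rule versus an
# opaque weighted combination for the same classification task.
import numpy as np

def symbolic_classifier(temp_c: float, humidity: float) -> str:
    """Explicit rule: every step of the decision can be read off the code."""
    return "rain" if humidity > 0.8 and temp_c > 5 else "no rain"

# Connectionist stand-in: behavior lives in numeric weights, not readable rules.
rng = np.random.default_rng(1)
weights, bias = rng.normal(size=2), rng.normal()

def connectionist_classifier(temp_c: float, humidity: float) -> str:
    activation = np.tanh(weights @ np.array([temp_c / 40.0, humidity]) + bias)
    return "rain" if activation > 0 else "no rain"

print(symbolic_classifier(12, 0.9))       # decision traceable to the stated rule
print(connectionist_classifier(12, 0.9))  # decision buried in the learned weights
```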

The incorporation of mental models in AI design represents a nexus where philosophical interpretations of cognition intersect with computational practicalities. The use of mental models, i.e., internal representations of the world and its operational mechanisms, is a significant bridge between biological and functional cognitive architectures. This highlights the philosophical significance of mental models in the study of AI design: they reflect the complex interplay between the reality we perceive and the reality we construct. The efficacy of mental models in AI system design underscores their pivotal role in knowledge acquisition and problem-solving. In the biological cognitive framework, mental models mimic human cognition’s non-linear, associative, and adaptive nature, thereby conforming to the cognitive isomorphism principle. On the other hand, the functional cognitive framework employs mental models as pragmatic tools for efficient task execution, demonstrating a utilitarian approach to cognition. Thus, the role of mental models in AI design serves as a litmus test for the philosophical assumptions underlying distinct cognitive architectures.

Philosophical Reflections and AI Ethics Principles in Relation to Cognitive Architectures

AI ethics principles, chiefly those concerning autonomy, beneficence, and justice, have substantial implications for how cognitive architectures are understood and applied. Within the biological framework, significant ethical questions arise about the autonomy and agency of AI systems: to what extent can, or should, an AI system with a human-like cognitive structure make independent decisions? The principle of beneficence, the commitment to do good and prevent harm, profoundly shapes the design of functional cognitive architectures; here a tension surfaces between the utilitarian goal of optimized task execution and the prevention of potential harm resulting from such single-mindedness. The principle of justice, fairness in the distribution of benefits and burdens, prompts critical scrutiny of the societal consequences of both architectures. As these models become more prevalent, we must continually ask who benefits from these technologies and who bears the potential harms. The intertwining of AI ethics principles with cognitive architectures thus brings philosophical discourse to the forefront of AI development, giving it a pivotal role in shaping the future of artificial cognition.

The philosophical discourse surrounding AI and cognitive architectures is deeply entwined with the ethical, ontological, and epistemological considerations inherent to AI design. On an ethical level, the discourse probes the societal implications of these technologies and the moral responsibilities of their developers. The questions of what AI is and what it could be—an ontological debate—become pressing as cognitive architectures increasingly mimic the complexities of the human mind. Furthermore, the epistemological dimension of this discourse explores the nature of AI’s knowledge acquisition and decision-making processes. This discourse, therefore, cannot be separated from the technological progression of AI, as the philosophical issues at play directly inform the design choices made. Thus, philosophical reflections are not merely theoretical musings but tangible influences on the future of AI and, by extension, society. As AI continues to evolve, the ongoing dialogue between philosophy and technology will be critical in guiding its development towards beneficial and ethical ends.

Future Directions for Research

Given the rapid advancement of AI and cognitive architectures and their deep-rooted philosophical implications, potential avenues for future research are vast and multidimensional. It would be valuable to examine empirically how cognitive architectures affect decision-making in AI, quantitatively exploring their effect on AI reliability and behavior. A comparative study across different cognitive architecture models, analyzing their benefits and drawbacks in diverse real-world contexts, would further enrich understanding of their practical applications. As ethical considerations take center stage, research on developing and implementing ethical guidelines specific to cognitive architectures is essential. Notably, studies addressing how to integrate philosophical perspectives efficiently into the technical development process could be transformative. In this era of advancing AI technologies, maintaining a dialogue between technologists and philosophers is crucial; fostering interdisciplinary collaboration between AI research and philosophy should therefore be a high priority in future research agendas.

Abstract

As artificial intelligence (AI) thrives and propagates through modern life, a key question to ask is how to include humans in future AI? Despite human involvement at every stage of the production process from conception and design through to implementation, modern AI is still often criticized for its “black box” characteristics. Sometimes, we do not know what really goes on inside or how and why certain conclusions are met. Future AI will face many dilemmas and ethical issues unforeseen by their creators beyond those commonly discussed (e.g., trolley problems and variants of it) and to which solutions cannot be hard-coded and are often still up for debate. Given the sensitivity of such social and ethical dilemmas and the implications of these for human society at large, when and if our AI make the “wrong” choice we need to understand how they got there in order to make corrections and prevent recurrences. This is particularly true in situations where human livelihoods are at stake (e.g., health, well-being, finance, law) or when major individual or household decisions are taken. Doing so requires opening up the “black box” of AI; especially as they act, interact, and adapt in a human world and how they interact with other AI in this world. In this article, we argue for the application of cognitive architectures for ethical AI. In particular, for their potential contributions to AI transparency, explainability, and accountability. We need to understand how our AI get to the solutions they do, and we should seek to do this on a deeper level in terms of the machine-equivalents of motivations, attitudes, values, and so on. The path to future AI is long and winding but it could arrive faster than we think. In order to harness the positive potential outcomes of AI for humans and society (and avoid the negatives), we need to understand AI more fully in the first place and we expect this will simultaneously contribute towards greater understanding of their human counterparts also.

Cognitive architectures for artificial intelligence ethics

(Featured) Moral disagreement and artificial intelligence

Moral disagreement and artificial intelligence

Pamela Robinson offers a thorough examination of the methodological problems that arise from moral disagreement in the development and decision-making processes of artificial intelligence (AI). The central point of discussion is the formulation of ethical AI systems, in particular the AI Decider, which must make decisions in cases where its decision subjects have moral disagreements. The author posits that this conundrum could be managed using moral, compromise, or epistemic solutions.

The author systematically elucidates the possible solutions by presenting three categories. Moral solutions involve choosing a moral theory, such as preference utilitarianism, and aligning the AI to it, thereby largely setting the details of the disagreement aside. Compromise solutions, by contrast, handle disagreement by aggregating moral views to arrive at a collective decision; here the author introduces Arrow's impossibility theorem and social choice theory as relevant tools for AI decision-making. Lastly, epistemic solutions, arguably the most complex of the three, require the AI Decider to treat moral disagreement as evidence and adjust its decisions accordingly. The author mentions several approaches within this category, such as reflective equilibrium, moral uncertainty, and moral hedging.
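
To make the contrast between compromise and epistemic solutions concrete, here is a small illustrative sketch. It is not taken from Robinson's paper; the options, moral views, credences, and values are invented. The compromise route aggregates rankings in the spirit of social choice (a Borda count), while the epistemic route weights each view's value assignments by a credence, in the spirit of expected-choiceworthiness approaches to moral uncertainty.

```python
# Illustrative only: invented options, moral views, credences, and values.

options = ["A", "B", "C"]

# Each moral view ranks the options from best to worst.
rankings = {
    "view_1": ["A", "B", "C"],
    "view_2": ["B", "C", "A"],
    "view_3": ["B", "A", "C"],
}

# Compromise solution (social-choice flavor): Borda-count aggregation.
def borda_winner(rankings, options):
    scores = {o: 0 for o in options}
    for ranking in rankings.values():
        for position, option in enumerate(ranking):
            scores[option] += len(options) - 1 - position
    return max(scores, key=scores.get)

# Epistemic solution (moral-uncertainty flavor): weight each view's value
# assignments by the decision-maker's credence that the view is correct.
credences = {"view_1": 0.5, "view_2": 0.3, "view_3": 0.2}
values = {
    "view_1": {"A": 10, "B": 4, "C": 0},
    "view_2": {"A": 0, "B": 8, "C": 6},
    "view_3": {"A": 5, "B": 9, "C": 1},
}

def expected_choiceworthiness(option):
    return sum(credences[v] * values[v][option] for v in credences)

print(borda_winner(rankings, options))                  # aggregated winner
print(max(options, key=expected_choiceworthiness))      # credence-weighted winner
```

In this toy case both routes happen to select the same option, but they need not; which route incurs less moral risk when they diverge is precisely the question the paper raises.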

However, the author asserts that none of these solutions provides a perfect answer; each is fraught with its own complexities and risks. Here the concept of 'moral risk,' the chance of getting things wrong morally, is introduced. The author postulates that the choice between an epistemic and a compromise solution should depend on the moral risk involved, and argues that the methodological problem could be addressed by minimizing this moral risk, regardless of whether a moral, compromise, or epistemic solution is employed.

Delving into the broader philosophical themes, this paper reignites the enduring debate on the role and impact of moral relativism and objectivism within the sphere of artificial intelligence. The issues presented tie into the grand narrative of moral philosophy, particularly the discourse around meta-ethics and normative ethics, where differing moral perspectives invariably lead to dilemmas. The AI Decider, in this sense, mirrors the human condition where decision-making often requires navigating the labyrinth of moral disagreement. The author’s emphasis on moral risk provides a novel framework, bridging the gap between theoretical moral philosophy and the practical demands of AI ethics.

For future research, several intriguing pathways are suggested by this article. First, an in-depth exploration of the concept of ‘moral risk’ could illuminate new strategies for handling moral disagreement in AI decision-making. Comparative studies, analyzing the outcomes and repercussions of decisions made by an AI system utilizing moral, compromise, or epistemic solutions, could provide empirical evidence for the efficacy of these approaches. Lastly, given the dynamism of moral evolution, the impact of changes in societal moral views over time on an AI Decider’s decision-making process warrants investigation. This could include exploring how the AI system could effectively adapt to the evolution of moral consensus or disagreement within its decision subjects. Such future research could significantly enhance our understanding of ethical decision-making in AI systems, bringing us closer to the creation of more ethically aligned, responsive, and responsible artificial intelligence.

Abstract

Artificially intelligent systems will be used to make increasingly important decisions about us. Many of these decisions will have to be made without universal agreement about the relevant moral facts. For other kinds of disagreement, it is at least usually obvious what kind of solution is called for. What makes moral disagreement especially challenging is that there are three different ways of handling it. Moral solutions apply a moral theory or related principles and largely ignore the details of the disagreement. Compromise solutions apply a method of finding a compromise and taking information about the disagreement as input. Epistemic solutions apply an evidential rule that treats the details of the disagreement as evidence of moral truth. Proposals for all three kinds of solutions can be found in the AI ethics and value alignment literature, but little has been said to justify choosing one over the other. I argue that the choice is best framed in terms of moral risk.

Moral disagreement and artificial intelligence

(Featured) Is AI the Future of Mental Healthcare?

Is AI the Future of Mental Healthcare?

Francesca Minerva and Alberto Giubilini engage with the intricate subject of AI implementation in the mental healthcare sector, focusing on the potential benefits and challenges of its use. They open by sketching the rising global demand for mental healthcare and argue that the conventional therapist-centric model may not scale to meet it. This sets the context for exploring the use of AI to supplement or even replace human therapists in certain capacities. AI in mental healthcare is argued to have significant advantages, such as scalability, cost-effectiveness, continuous availability, and the ability to harness and analyze vast amounts of data for effective diagnosis and treatment. The authors also explicitly acknowledge potential downsides, such as privacy concerns and the use and possible misuse of personal data, along with the need for regulatory frameworks to monitor and ensure the safe and ethical use of AI in this context.

The article subsequently delves into the issue of potential bias in healthcare, highlighting how AI could both help overcome human biases and introduce new ones into healthcare provision. The authors note that healthcare practitioners, despite their commitment to objectivity, may be prone to biases arising from a patient's individual and social characteristics, such as age, social status, and ethnic background. AI, if programmed carefully, could help counteract these biases by focusing more strictly on symptoms, yet the article also underscores that AI, being programmed by humans, can be susceptible to biases introduced in its programming. This interplay between mitigating and introducing bias forms a key discussion point of the article.

The article finally broaches two critical ethical-philosophical considerations, centering on the categorization of mental health disorders and the shifting responsibilities of mental health professionals as AI is introduced. The authors argue that existing categorizations, such as those in the DSM-5, may not remain adequate or relevant if AI can provide more nuanced data and behavioral cues, potentially necessitating a reevaluation of diagnostic categories. The issue of professional responsibility is also critically evaluated, in particular the challenge of assigning responsibility for AI-enabled diagnosis in light of potential errors or misdiagnoses.

The philosophical underpinning of the article is rooted in the ethics, epistemology, and ontology of AI in healthcare. The themes it underscores, such as the reevaluation of mental health disorder categories and the shifting responsibilities of mental health professionals, point towards broader philosophical discourses: how technologies like AI challenge our existing epistemic models and ethical frameworks, and how they demand a reconsideration of our ontological understanding of disease categories, diagnosis, and treatment. The question of responsibility, and the degree to which AI systems can or should be held accountable, is a compelling case of applied ethics intersecting with technology.

Future research could delve deeper into the philosophical dimensions of AI use in psychiatry. For instance, exploring the ontological questions of mental health disorders in the age of AI could be a meaningful avenue. Also, studying the epistemic shifts in our understanding of mental health symptoms and diagnosis with AI’s increasing role could be a fascinating research area. An additional perspective could be to examine the ethical considerations in the context of AI, particularly focusing on accountability, transparency, and the changing professional responsibilities of mental health practitioners. Investigating the broader societal and cultural implications of such a shift in mental healthcare provision could also provide valuable insights.

Excerpt

Over the past decade, AI has been used to aid or even replace humans in many professional fields. There are now robots delivering groceries or working on assembly lines in factories, and there are AI assistants scheduling meetings or answering the phone line of customer services. Perhaps even more surprisingly, we have recently started admiring visual art produced by AI, and reading essays and poetry “written” by AI (Miller 2019), that is, composed by imitating or assembling human compositions. Very recently, the development of ChatGPT has shown how AI could have applications in education (Kung et al. 2023), the judicial system (Parikh et al. 2019), and the entertainment industry.

Is AI the Future of Mental Healthcare?

(Featured) A neo-aristotelian perspective on the need for artificial moral agents (AMAs)

A neo-aristotelian perspective on the need for artificial moral agents (AMAs)

Alejo José G. Sison and Dulce M. Redín take a critical look at the concept of artificial moral agents (AMAs), especially in relation to artificial intelligence (AI), from a neo-Aristotelian ethical standpoint. The authors open with a compelling critique of the arguments in favor of AMAs, asserting that such agents are neither inevitable nor guaranteed to bring practical benefits. They contend that the term 'autonomous' may not be fitting, as AMAs are, at their core, bound to the algorithmic instructions they follow, and that the term 'moral' is questionable given the externally imposed nature of the proposed morality. According to the authors, the true moral good is internally driven and cannot be separated from either the agent or the manner in which it is achieved.

The authors proceed to suggest that the arguments against the development of AMAs have been insufficiently considered, proposing a neo-Aristotelian ethical framework as a potential remedy. This approach places emphasis on human intelligence, grounded in biological and psychological scaffolding, and distinguishes between the categories of heterotelic production (poiesis) and autotelic action (praxis), highlighting that the former can accommodate machine operations, while the latter is strictly reserved for human actors. Further, the authors propose that this framework offers greater clarity and coherence by explicitly denying bots the status of moral agents due to their inability to perform voluntary actions.

Lastly, the authors explore the potential alignment of AI and virtue ethics, scrutinizing how AI systems might affect human flourishing and the virtues through their actions or the consequences thereof. Here they feature the work of Vallor, who has proposed designing “moral machines” by embedding norms, laws, and values into computational systems, thereby focusing on human-computer interaction. They caution, however, that such an approach, while intriguing, may be inherently flawed. The authors also examine two possible ways of embedding ethics in AI: value alignment and virtue embodiment.

The research article provides an interesting contribution to the ongoing debate on the potential for AI to function as moral agents. The authors adopt a neo-Aristotelian ethical framework to add depth to the discourse, providing a fresh perspective that integrates virtue ethics and emphasizes the role of human agency. This perspective brings to light the broader philosophical questions around the very nature of morality, autonomy, and the distinctive attributes of human intelligence.

Future research might explore more extensively how virtue ethics can interface with AI and whether the goals Vallor envisages can realistically be achieved. Further philosophical exploration of the assumptions about agency and morality in AI is also needed. Moreover, studies examining the practical implications of the neo-Aristotelian ethical framework, especially in the realm of human-computer interaction, would be invaluable. Lastly, it may be insightful to examine the authors' final suggestion of approaching AI as a moral agent within the realm of fictional ethics, a proposal that opens up a new and exciting area of interdisciplinary research between philosophy, AI, and literature.

Abstract

We examine Van Wynsberghe and Robbins (Sci Eng Ethics 25:719-735, 2019) critique of the need for Artificial Moral Agents (AMAs) and its rebuttal by Formosa and Ryan (AI Soc 10.1007/s00146-020-01089-6, 2020) set against a neo-Aristotelian ethical background. Neither Van Wynsberghe and Robbins (Sci Eng Ethics 25:719-735, 2019) essay nor Formosa and Ryan’s (AI Soc 10.1007/s00146-020-01089-6, 2020) is explicitly framed within the teachings of a specific ethical school. The former appeals to the lack of “both empirical and intuitive support” (Van Wynsberghe and Robbins 2019, p. 721) for AMAs, and the latter opts for “argumentative breadth over depth”, meaning to provide “the essential groundwork for making an all things considered judgment regarding the moral case for building AMAs” (Formosa and Ryan 2019, pp. 1–2). Although this strategy may benefit their acceptability, it may also detract from their ethical rootedness, coherence, and persuasiveness, characteristics often associated with consolidated ethical traditions. Neo-Aristotelian ethics, backed by a distinctive philosophical anthropology and worldview, is summoned to fill this gap as a standard to test these two opposing claims. It provides a substantive account of moral agency through the theory of voluntary action; it explains how voluntary action is tied to intelligent and autonomous human life; and it distinguishes machine operations from voluntary actions through the categories of poiesis and praxis respectively. This standpoint reveals that while Van Wynsberghe and Robbins may be right in rejecting the need for AMAs, there are deeper, more fundamental reasons. In addition, despite disagreeing with Formosa and Ryan’s defense of AMAs, their call for a more nuanced and context-dependent approach, similar to neo-Aristotelian practical wisdom, becomes expedient.

A neo-aristotelian perspective on the need for artificial moral agents (AMAs)