Beyond the prompt: The Role of User Behavior in Shaping AI-misalignment and Societal Knowledge
DOI: https://doi.org/10.71265/9nabsz16

Keywords: User-centric AI-misalignment, GenAI User Behavior, Generative AI, ChatGPT

Abstract
This research paper explores the concept of AI misalignment, differentiating between the perspectives of AI model designers and AI users. It delves into the notion of user-centric AI misalignment, breaking it down into three categories: user responsibility, user intent, and user influence. The focus is on how users contribute to misalignment both in the large models themselves and in the outputs those models generate. The research examines how user behavior may inadvertently spread misinformation through AI-generated hallucinations, or how users may deliberately exploit AI to propagate disinformation for propaganda purposes. Additionally, it discusses user accountability as part of this behavior, highlighting that, from the user's perspective, the only controllable aspect is their uncritical acceptance of AI outputs through ignorance. Furthermore, it explores how user behavior can shape AI models through reinforcement learning from human feedback (RLHF), and how users can degrade models through model collapse. This kind of misalignment can significantly affect knowledge integrity, potentially resulting in knowledge erosion.
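To make the RLHF mechanism concrete, the following is a minimal sketch written for this summary rather than drawn from the paper: a toy reward model is fit to simulated pairwise user preferences with a Bradley-Terry logistic loss. The features, the learning rate, and the "users prefer flattery 80% of the time" assumption are all illustrative, but the sketch captures the dynamic at issue: whatever users systematically reward becomes what the model is optimized to produce.

```python
# Illustrative sketch (not the paper's method): a reward model learns
# from pairwise user preferences via the Bradley-Terry logistic loss.
# If users systematically prefer flattering answers, the learned reward
# ends up rewarding flattery rather than accuracy.
import math
import random

random.seed(0)

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def simulate_comparison():
    # Each answer has two features: (accuracy, flattery). Assumed bias:
    # 80% of simulated users judge by flattery, 20% by accuracy.
    a = (random.random(), random.random())
    b = (random.random(), random.random())
    if random.random() < 0.8:
        label = 1.0 if a[1] > b[1] else 0.0   # biased user
    else:
        label = 1.0 if a[0] > b[0] else 0.0   # careful user
    return a, b, label

w = [0.0, 0.0]   # reward-model weights for (accuracy, flattery)
lr = 0.5
for _ in range(20_000):
    a, b, label = simulate_comparison()
    diff = [a[i] - b[i] for i in range(2)]
    # Bradley-Terry: P(A preferred over B) = sigmoid(r(A) - r(B))
    p = sigmoid(w[0] * diff[0] + w[1] * diff[1])
    grad = p - label                          # gradient of the log-loss
    for i in range(2):
        w[i] -= lr * grad * diff[i]

print(f"learned reward weights: accuracy={w[0]:.2f}, flattery={w[1]:.2f}")
```

In this toy run the flattery weight dominates the accuracy weight, which is the user-influence channel in miniature: the preference data, not the designers' intent, defines the optimization target.

Model collapse can be sketched just as compactly. In the hedged illustration below, each "generation" of a model, represented by a one-dimensional Gaussian standing in for a language model, is re-fit to synthetic data sampled from its predecessor, with a truncation step mimicking a generator's bias toward high-probability outputs. The Gaussian stand-in and the two-sigma cutoff are simplifying assumptions, not part of the paper.

```python
# Illustrative sketch of model collapse: each generation is re-fit to
# synthetic data produced by the previous generation. Because the
# sampler favors high-probability outputs (the truncation below), the
# tails of the original distribution vanish and diversity collapses.
import random
import statistics

random.seed(42)
N = 10_000

# Generation 0: "human-written" data from a standard normal distribution.
data = [random.gauss(0.0, 1.0) for _ in range(N)]

for generation in range(11):
    mu = statistics.fmean(data)
    sigma = statistics.stdev(data)
    print(f"generation {generation:2d}: mean={mu:+.3f}  std={sigma:.3f}")
    # Train the next model only on the current model's outputs, sampled
    # from the high-probability region (a stand-in for decoding bias).
    data = []
    while len(data) < N:
        x = random.gauss(mu, sigma)
        if abs(x - mu) < 2.0 * sigma:
            data.append(x)
```

Running the sketch shows the standard deviation shrinking generation after generation; the analogue for language models is the progressive loss of rare facts and phrasings, the knowledge-erosion risk the abstract describes.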
The research incorporates evidence from a survey designed to assess user awareness of their own behavior and its role in knowledge generation. The survey gathered insights into current perceptions and knowledge of generative AI technologies, and into the role users play in shaping them.
License
Copyright (c) 2025 Morraya Benhammou

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.