Beyond the prompt: The Role of User Behavior in Shaping AI-misalignment and Societal Knowledge
DOI: https://doi.org/10.71265/9nabsz16

Keywords: User-centric AI-misalignment, GenAI User Behavior, Generative AI, ChatGPT

Abstract
This research paper explores the concept of AI misalignment, differentiating between the perspectives of AI model designers and AI users. It delves into the notion of user-centric AI misalignment, breaking it down into three categories: user responsibility, user intent, and user influence. The focus is on how users contribute to misalignment both in the large models themselves and in the outputs those models generate. The research examines how user behavior may inadvertently spread misinformation through AI-generated hallucinations, or how users may deliberately exploit AI to propagate disinformation for propaganda purposes. Additionally, it discusses user accountability as part of this behavior, highlighting that, from the user's perspective, the only controllable aspect is their uncritical acceptance of AI outputs through ignorance. Furthermore, it explores how user behavior can shape AI models through reinforcement learning from human feedback (RLHF), and how users can degrade models through model collapse. This kind of misalignment can significantly affect knowledge integrity, potentially resulting in knowledge erosion.
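To make the RLHF mechanism concrete, the following is a minimal sketch written for this summary rather than drawn from the paper: a toy reward model is fit to simulated pairwise user preferences with a Bradley-Terry logistic loss. The features, the learning rate, and the "users prefer flattery 80% of the time" assumption are all illustrative, but the sketch captures the dynamic at issue: whatever users systematically reward becomes what the model is optimized to produce.

```python
# Illustrative sketch (not the paper's method): a reward model learns
# from pairwise user preferences via the Bradley-Terry logistic loss.
# If users systematically prefer flattering answers, the learned reward
# ends up rewarding flattery rather than accuracy.
import math
import random

random.seed(0)

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def simulate_comparison():
    # Each answer has two features: (accuracy, flattery). Assumed bias:
    # 80% of simulated users judge by flattery, 20% by accuracy.
    a = (random.random(), random.random())
    b = (random.random(), random.random())
    if random.random() < 0.8:
        label = 1.0 if a[1] > b[1] else 0.0   # biased user
    else:
        label = 1.0 if a[0] > b[0] else 0.0   # careful user
    return a, b, label

w = [0.0, 0.0]   # reward-model weights for (accuracy, flattery)
lr = 0.5
for _ in range(20_000):
    a, b, label = simulate_comparison()
    diff = [a[i] - b[i] for i in range(2)]
    # Bradley-Terry: P(A preferred over B) = sigmoid(r(A) - r(B))
    p = sigmoid(w[0] * diff[0] + w[1] * diff[1])
    grad = p - label                          # gradient of the log-loss
    for i in range(2):
        w[i] -= lr * grad * diff[i]

print(f"learned reward weights: accuracy={w[0]:.2f}, flattery={w[1]:.2f}")
```

In this toy run the flattery weight dominates the accuracy weight, which is the user-influence channel in miniature: the preference data, not the designers' intent, defines the optimization target.

Model collapse can be sketched just as compactly. In the hedged illustration below, each "generation" of a model, represented by a one-dimensional Gaussian standing in for a language model, is re-fit to synthetic data sampled from its predecessor, with a truncation step mimicking a generator's bias toward high-probability outputs. The Gaussian stand-in and the two-sigma cutoff are simplifying assumptions, not part of the paper.

```python
# Illustrative sketch of model collapse: each generation is re-fit to
# synthetic data produced by the previous generation. Because the
# sampler favors high-probability outputs (the truncation below), the
# tails of the original distribution vanish and diversity collapses.
import random
import statistics

random.seed(42)
N = 10_000

# Generation 0: "human-written" data from a standard normal distribution.
data = [random.gauss(0.0, 1.0) for _ in range(N)]

for generation in range(11):
    mu = statistics.fmean(data)
    sigma = statistics.stdev(data)
    print(f"generation {generation:2d}: mean={mu:+.3f}  std={sigma:.3f}")
    # Train the next model only on the current model's outputs, sampled
    # from the high-probability region (a stand-in for decoding bias).
    data = []
    while len(data) < N:
        x = random.gauss(mu, sigma)
        if abs(x - mu) < 2.0 * sigma:
            data.append(x)
```

Running the sketch shows the standard deviation shrinking generation after generation; the analogue for language models is the progressive loss of rare facts and phrasings, the knowledge-erosion risk the abstract describes.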
The research incorporates evidence from a survey designed to assess user awareness of their own behavior and its role in knowledge generation. The survey gathered insights into current perceptions and knowledge of generative AI technologies, and into the role users play in shaping them.
License
Copyright (c) 2025 Morraya Benhammou

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.