To evaluate AI results, we need to consider several factors. In this section we will walk through the mechanics of current AI algorithms (as far as we can know them) to understand how AI tools find and use sources to generate results. We will consider these results through the familiar lenses of the ACRL Framework and source evaluation tools such as CRAAP to see how AI-generated material might be judged on its credibility and accuracy.
Due to issues explained below, it is possible that some current AI tools may not pass credible-source evaluation tests such as CRAAP. However, by establishing an understanding of how these AI tools work, and by learning how to adapt existing evaluation tools to AI results, you can continue to follow and evaluate these tools as AI grows and evolves.
AI Chatbots and AI Technology:
The current batch of AI chatbots are LLMs (large language models): “a machine-learning system that autonomously learns from data and can produce sophisticated and seemingly intelligent writing after training on a massive data set of text” (van Dis et al., 2023). The exact training data differ from product to product; the sources are considered proprietary information and are mostly kept secret. However, OpenAI (ChatGPT’s parent company) has released a preprint paper on its LLM methods, which explained that the dataset sources included Wikipedia, publicly available books, some academic articles, general websites, and publicly readable social media networks such as Reddit (Radford et al., 2018). The current iteration of the generator also notes that it does not have access to academic databases. Additionally, the language set used for the previous version of ChatGPT was only current up to 2021 (Radford et al., 2018). Each individual AI product is expected to update and expand on its own timeline, according to its parent company. AI products pull their responses from their language training sets and libraries, not from the entire existing internet. This means that the current set of AI apps is restricted in its output to whatever subset of sources was available to it as input.
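To make that restriction concrete, here is a deliberately tiny sketch in Python. It is only an illustration (the training text is invented, and production LLMs use neural networks over billions of tokens rather than a lookup table), but it shows the core constraint: a language model can only recombine patterns that appear in the data it was trained on.

```python
import random
from collections import defaultdict

# Toy next-word model. Real LLMs are far more sophisticated, but the
# constraint is the same: output can only recombine training patterns.
training_text = "the cat sat on the mat the cat ate the fish"

# Build a bigram table: which words follow which in the training text.
follows = defaultdict(list)
words = training_text.split()
for current_word, next_word in zip(words, words[1:]):
    follows[current_word].append(next_word)

def generate(start: str, length: int = 6) -> str:
    """Sample a continuation word by word from the bigram table."""
    out = [start]
    for _ in range(length):
        candidates = follows.get(out[-1])
        if not candidates:  # nothing in the training data follows this word
            break
        out.append(random.choice(candidates))
    return " ".join(out)

print(generate("the"))  # e.g. "the cat ate the fish"
print(generate("dog"))  # just "dog": the word never appears in training
```

No matter how the prompt is phrased, this toy model can say nothing about a “dog,” because no such pattern exists in its training data; the same logic, at vastly greater scale, is why an AI product’s answers are bounded by its training sources.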
Natural Language Generation (NLG):
It’s important to note that ChatGPT and many other LLM AI models are trained as much on producing natural, human-sounding language as they are on retrieving information. In terms of developing an AI algorithm, this can lead to gaps in what is called alignment, the degree to which an AI performs tasks as humans expect or need it to (Strickland, 2023). While ChatGPT has well-developed NLG capabilities, it still does not perform search tasks in alignment with what humans expect from a search function: it misquotes sources and hallucinates (makes things up). In fact, the NLG performs so much better than people expect that they attribute much higher levels of accuracy to the responses than verification studies have measured.
Because ChatGPT and other LLMs are newer technology, there has not been time to build a substantive body of research literature on accuracy or hallucination rates. However, a few recent studies offer examples: Bhattacharyya et al. (2023) found high rates of fabricated and inaccurate references in ChatGPT-generated medical content; Buchanan et al. (2023) documented hallucinated, non-existent citations in economics; and Walters and Wilder (2023) measured fabrication and errors in the bibliographic citations ChatGPT generates.
Studies on accuracy and hallucination rates will evolve along with AI products. Currently, the accuracy of cited information remains a major issue in AI-generated text.
Black Box Issues:
As mentioned above, all current AI LLMs are proprietary software owned by private companies. This means that their generative algorithms are trade secrets. Neither OpenAI nor any of the other major AI LLM developers have open-sourced their specific algorithms. All information on the datasets used for LLMs has been voluntarily released by the corporate owners, and only to the degree they choose. For any particular AI tool, the actual sources being used to train it or provide its datasets may be partially or wholly unknown.
Additionally, current AI LLMs rely on specific prompts to generate information. If the user does not prompt the AI to provide a specific source or citation, generally none will be provided (Walters & Wilder, 2023). These two barriers represent a significant obstacle for the user in identifying and finding the source of an idea, claim, or statement in an AI-generated text.
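As a rough illustration of how much the prompt matters, here is a minimal sketch using OpenAI’s Python SDK (the model name is a placeholder and the question is invented for the example; any chat model behaves similarly). The same question, asked with and without an explicit request for citations, typically returns very different answers:

```python
# Minimal sketch using the OpenAI Python SDK (pip install openai).
# Assumes OPENAI_API_KEY is set in the environment; the model name
# below is a placeholder, not a recommendation.
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

question = "What are the effects of caffeine on short-term memory?"

# Without an explicit request, the answer typically arrives uncited.
print(ask(question))

# Only an explicit prompt reliably produces citations, and those
# citations may themselves be fabricated, so each must still be verified.
print(ask(question + " Cite your sources with full bibliographic references."))
```

Even when the second prompt does return citations, the findings above mean each one still has to be checked: the request changes the format of the answer, not the reliability of its sources.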
Current AI chatbot and text generator models do not pull materials from the entire internet or from most paid-subscription academic databases. Therefore, the information available to these AI products is a far smaller, and less academic, set of sources than most academic libraries provide access to. Some AI chatbots may incorporate academic database sources in the future; however, there is no way to know whether the AI parent companies will disclose the extent or the specific sources.
Some AI products currently scrape or pull publicly available sources for academic materials. These sources could be anything: an article on an individual’s professional website, an open-access academic repository, openly published articles, predatory publishers, or non-academic sources publishing articles, papers, and opinion pieces. Without access to the AI’s datasets, it is not possible to know which generated text may come from which source. Complicating this lack of transparency is the possibility that the AI might attach a hallucinated citation to a non-academic text, excerpt, or source.
Most of the common credible-source evaluation methods used by academic libraries, such as CRAAP, TRAAP, PROVEN, or the 5 W’s, focus on the specific source of a text or claim. Using these models requires the researcher to analyze the source by its characteristic details (date of publication, academic affiliation, author credentials, etc.). Without a specific and accurate citation to a source, these models have no way of establishing that a source is credible.
Even where citations are provided, the accuracy rate of a particular AI product may not be high enough to establish credibility without the secondary step of verifying, and then evaluating, each source individually.
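Part of that verification step can be mechanized. The sketch below (which assumes network access and the third-party requests library; the fake DOI is invented for the example) asks the public Crossref registry whether a cited DOI actually resolves, which catches many wholesale fabrications:

```python
# Check whether a citation's DOI exists in the public Crossref registry
# (https://api.crossref.org). Requires the requests library.
import requests

def doi_exists(doi: str) -> bool:
    """Return True if Crossref has a record for this DOI."""
    response = requests.get(f"https://api.crossref.org/works/{doi}", timeout=10)
    return response.status_code == 200

# A real DOI, taken from this chapter's reference list:
print(doi_exists("10.7759/cureus.39238"))        # True
# A plausible-looking but fabricated DOI:
print(doi_exists("10.1234/fake.citation.2023"))  # False
```

A resolving DOI only proves that the record exists; the researcher still has to confirm that the source says what the AI claims it says, and then evaluate the source itself with a tool such as CRAAP.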
The LibrAIry has created the ROBOT test, a checklist of factors to consider when using AI technology:
Reliability
Objective
Bias
Ownership
Type
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
References
Baker, R. S., & Hawn, A. (2022). Algorithmic bias in education. International Journal of Artificial Intelligence in Education, 32, 1052–1092. https://doi.org/10.1007/s40593-021-00285-9
Bali, M. (2023, April 1). What I mean when I say critical AI literacy. Reflecting Allowed. https://blog.mahabali.me/educational-technology-2/what-i-mean-when-i-say-critical-ai-literacy/#:~:text=How%20to%20Use%20AI%20in,potentially%20lose%20when%20using%20it.
Bhattacharyya, M., Miller, V. M., Bhattacharyya, D., & Miller, L. E. (2023). High rates of fabricated and inaccurate references in ChatGPT-generated medical content. Cureus, 15(5), e39238. https://doi.org/10.7759/cureus.39238
Boehme, G., Hilles, S., Justus, R., & Gibson, K. (2023). Harnessing Pandora’s box: At the intersection of information literacy and AI.
Buchanan, J., Hill, S., & Shapoval, O. (2023). ChatGPT hallucinates non-existent citations: Evidence from economics. The American Economist, 0(0). https://doi.org/10.1177/05694345231218454
Del Castillo, M. (2023). AI LibGuides by librarians. FIU Library.
Hall, B., & McKee, J. (2024). An early or somewhat late ChatGPT guide for librarians. Journal of Business & Finance Librarianship, 29(1), 58–69. https://doi.org/10.1080/08963568.2024.2303944
James, A. B., & Filgo, E. H. (2023). Where does ChatGPT fit into the Framework for Information Literacy? The possibilities and problems of AI in library instruction. College & Research Libraries News, 84(9), 334.
Noble, S. U. (2018). Algorithms of oppression. NYU Press.
Radford, A., Narasimhan, K., Salimans, T., & Sutskever, I. (2018). Improving language understanding by generative pre-training. https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf
Stokel-Walker, C. (2023, November 22). ChatGPT replicates gender bias in recommendation letters. Scientific American. https://www.scientificamerican.com/article/chatgpt-replicates-gender-bias-in-recommendation-letters/
Strickland, E. (2023, August 31). OpenAI’s moonshot: Solving the AI alignment problem. IEEE Spectrum. https://spectrum.ieee.org/the-alignment-problem-openai
van Dis, E. A. M., Bollen, J., Zuidema, W., van Rooij, R., & Bockting, C. L. (2023). ChatGPT: Five priorities for research. Nature, 614(7947), 224–226. https://doi.org/10.1038/d41586-023-00288-7
Walters, W. H., & Wilder, E. I. (2023). Fabrication and errors in the bibliographic citations generated by ChatGPT. Scientific Reports, 13, 14045.