When ChatGPT was introduced in late 2022, the future seemed to be upon us. Here was a tool capable of acing tests, programming, and summarizing information in an essay-style format using straightforward natural language. As first dates go, it was perfect. But as our relationship with the first large language model (LLM) to achieve worldwide use went on, problems arose. Our new partner “hallucinated,” making up facts. It had biases and could be unreliable. It was not, in a word, trustworthy.
But researchers did not give up on this important new partner. Rather, they have been hard at work figuring out how to improve it. Among them is UC Santa Barbara computer science assistant professor Shiyu Chang, who recently received a prestigious five-year, nearly $600,000 National Science Foundation (NSF) Faculty Early Career Development Program (CAREER) Award to conduct research aimed at advancing algorithms that could enable a collaborative, open-access approach to correcting errors and adding or editing knowledge in LLMs.
“It feels fantastic being a junior faculty member and having this award, especially with NSF grants in the trending area of AI being very competitive,” said Chang, who had applied unsuccessfully once before. “I feel I was lucky to get the award this time. I'm truly grateful, and I appreciate everyone who helped me while I was preparing the draft: my mentors, my colleagues in the Computer Science Department, the Office of Research, my students, and my friends. So many things came together to lead to this outcome.”
Despite the impressive performance of LLMs in harnessing knowledge, Chang, who came to UCSB in July 2021, writes in the award abstract that three research challenges remain in order to improve them: understanding how knowledge interacts with the behavior of LLMs; understanding how to correct outdated and incorrect knowledge in LLMs more surgically and precisely, without affecting other knowledge; and understanding how to efficiently impart new knowledge into LLMs to adapt them to different tasks and domains.
“When ChatGPT came along a few years ago, everyone was amazed, but then we started to see that it hallucinates, and it's not very creative, and it has all kinds of security and privacy problems,” Chang said. “As a community of computer scientists, we are collectively working to enable LLMs to be more robust, transparent, and modular so that they do what we expect them to do. There is still a long way to go.”
Having said that, he notes that LLMs have proven useful in many daily tasks. “It can correct grammatical errors and help to paraphrase things,” he said. “It can help me reply to emails from students. For a lot of simple things like that, it’s good enough, but many questions need to be addressed before it can be applied in higher-stakes applications.
"We aim to create a groundbreaking AI system, which we refer to as the ‘wiki’ model," he continues. "Currently, most large language models are developed by major tech companies, and while their performance continues to improve, their vast scale and training processes make them challenging to debug or enhance collaboratively and in a modular way. This current data-driven AI paradigm restricts easy community-driven refinement. To address the problem, we are investigating how collaborative platforms like Wikipedia and GitHub facilitate crowd-sourced knowledge, envisioning a similar framework for AI models. Just as Wikipedia users collectively edit and refine content, so, we aim to develop solutions that enable collaborative updates and corrections to AI model knowledge."
Presently, no open-source system exists that allows for precise, collaborative edits within AI models. “Neither the learning paradigm nor the current methods of model training and maintenance support this level of collaborative engagement,” Chang notes. “We hope that in the future, such a system will be possible.”
The first step, he explains, “is to thoroughly understand the inner workings of these large models — identifying where knowledge is stored, how the model accesses it, and how it uses this knowledge to produce outputs. This understanding will allow us to track and pinpoint specific knowledge within the model, determining whether it is accurate, outdated, or needs correction. We also need to know how to edit this knowledge without unintentionally impacting other information within the model. This is what we refer to as ‘surgical’ editing. If we understand how knowledge is stored and organized within the model, we can also efficiently adapt it for any specific downstream application or scenario. Our goal is to enable a future where we can edit and modify AI models as intuitively as editing a Wikipedia page.”