HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models
Abstract
In order to thrive in hostile and ever-changing natural environments, mammalian brains evolved to store large amounts of knowledge about the world and continually integrate new information while avoiding catastrophic forgetting. Despite their impressive accomplishments, large language models (LLMs), even with retrieval-augmented generation (RAG), still struggle to efficiently and effectively integrate large amounts of new experiences after pre-training. In this work, we introduce HippoRAG, a novel retrieval framework inspired by the hippocampal indexing theory of human long-term memory to enable deeper and more efficient knowledge integration over new experiences. HippoRAG synergistically orchestrates LLMs, knowledge graphs, and the Personalized PageRank algorithm to mimic the different roles of the neocortex and hippocampus in human memory. We compare HippoRAG with existing RAG methods on multi-hop question answering and show that our method outperforms state-of-the-art methods by up to 20%. Single-step retrieval with HippoRAG achieves comparable or better performance than iterative retrieval methods like IRCoT while being 10-30 times cheaper and 6-13 times faster, and integrating HippoRAG into IRCoT brings further substantial gains. Finally, we show that our method can tackle new types of scenarios that are out of reach of existing methods. Code and data are available at https://github.com/OSU-NLP-Group/HippoRAG.
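To make the pipeline concrete, here is a minimal sketch of a HippoRAG-style retrieval step: an entity-level knowledge graph built from LLM-extracted triples, a Personalized PageRank run seeded on the entities found in the query, and passages ranked by the scores of the entities they mention. The example triples, the passage-to-entity map, and the use of `networkx` are illustrative assumptions, not the authors' implementation (see the linked repository for that).

```python
# Illustrative sketch of a HippoRAG-style retrieval step (not the authors' code).
# Assumes an entity-level knowledge graph built offline from LLM-extracted triples.
import networkx as nx
from collections import defaultdict

# Hypothetical output of offline indexing: triples and the entities in each passage.
triples = [
    ("Stanford", "employs", "Thomas Sudhof"),
    ("Thomas Sudhof", "works on", "Alzheimer's"),
    ("UCSF", "employs", "Karen Miller"),
]
passage_entities = {
    "passage_1": {"Stanford", "Thomas Sudhof"},
    "passage_2": {"Thomas Sudhof", "Alzheimer's"},
    "passage_3": {"UCSF", "Karen Miller"},
}

# Build the knowledge graph: nodes are entities, edges come from triples.
kg = nx.Graph()
for subj, _, obj in triples:
    kg.add_edge(subj, obj)

def retrieve(query_entities, top_k=2):
    """Rank passages by Personalized PageRank scores of their entities."""
    # Seed the PPR run only on entities extracted from the query.
    personalization = {node: (1.0 if node in query_entities else 0.0) for node in kg}
    if sum(personalization.values()) == 0:
        return []
    scores = nx.pagerank(kg, alpha=0.85, personalization=personalization)

    # Aggregate node scores into passage scores.
    passage_scores = defaultdict(float)
    for pid, entities in passage_entities.items():
        for ent in entities:
            passage_scores[pid] += scores.get(ent, 0.0)
    return sorted(passage_scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

# Example: a multi-hop style query whose answer requires linking two passages.
print(retrieve({"Stanford", "Alzheimer's"}))
```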
Community
Congrats on the awesome work! I think there's a lot of potential at the intersection of knowledge graphs x LLM systems.
One question I have, that isn't really addressed in the paper, is related to how well this type of system can scale to large corpora...
- IIUC, we don't actually need to encode and index embeddings for document chunks themselves
- Instead, we need to encode + index embeddings for each named entity through the entire corpus (along with maintaining the graph of relationships)
- The paper's experiments show that there are ~8x the number of unique nodes (i.e. named entities) relative to the number of passages (across the three datasets used)
If we were to scale up to large corpora (say, 10 billion document chunks), that would imply 80B embeddings to store in a vector DB index.
Is my understanding correct here? Wouldn't this also pose a major issue to scaling (along with the need for 2 LLM calls per document chunk to create the triplets)?
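For concreteness, here's the back-of-envelope version of that estimate (the ~8x ratio comes from the paper's three datasets; the embedding dimension and precision below are just assumed for illustration):

```python
# Back-of-envelope estimate of vector-store size for the scenario above.
# The ~8x entity-to-passage ratio is from the paper's datasets; the embedding
# dimension and byte width are assumed values for illustration only.
num_chunks = 10_000_000_000        # 10B document chunks (hypothetical corpus)
entities_per_chunk = 8             # ~8x unique nodes vs. passages (from the paper)
embedding_dim = 768                # assumed embedding dimension
bytes_per_value = 2                # assumed fp16 storage

entity_embeddings = num_chunks * entities_per_chunk          # 80,000,000,000 vectors
entity_store_tb = entity_embeddings * embedding_dim * bytes_per_value / 1e12
chunk_store_tb = num_chunks * embedding_dim * bytes_per_value / 1e12

print(f"entity embeddings: {entity_embeddings:,}")
print(f"entity vector store: ~{entity_store_tb:,.0f} TB")    # ~123 TB
print(f"chunk-only vector store: ~{chunk_store_tb:,.0f} TB") # ~15 TB
```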
Thanks in advance for your thoughts on this!
Hi @andrewrreed! I'm sorry I completely missed this post.
You are definitely correct about the scaling problem; however, there are a few things that make it more manageable:
1) The ratio between entities and passages would likely go down as the graph grows, since only a limited number of unique entities exist.
2) The difference in memory requirements for the vector store would likely be less than one order of magnitude, which seems worthwhile in order to consolidate new knowledge into your RAG framework without any training.
As for the 2 LLM calls per document chunk during indexing, I think this is the most significant scaling problem. However, I'd argue that you always need to run either inference or training over new documents at least once, whether to perform standard RAG or to integrate new knowledge into the LLM's parameters. Also, it seems like smaller models (like Llama 3-8B) are able to perform well on the indexing step.
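As a rough illustration of that per-chunk cost structure (the prompts are simplified placeholders rather than the paper's actual prompts, and the OpenAI-compatible client pointed at a local Llama-3-8B server is an assumption):

```python
# Sketch of the two-LLM-calls-per-chunk indexing cost discussed above.
# Prompts are simplified placeholders, not the paper's prompts; the local
# OpenAI-compatible endpoint and model name are assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # hypothetical local server
MODEL = "llama-3-8b-instruct"  # assumed model name

def index_chunk(chunk: str):
    # Call 1: named entity recognition over the passage.
    ner = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user",
                   "content": f"List the named entities in this passage as a JSON array:\n{chunk}"}],
    ).choices[0].message.content

    # Call 2: open information extraction into (subject, relation, object) triples.
    triples = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user",
                   "content": f"Entities: {ner}\nExtract (subject, relation, object) triples "
                              f"from this passage as a JSON list:\n{chunk}"}],
    ).choices[0].message.content

    return ner, triples  # parsed downstream into graph nodes and edges
```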
Thanks for your questions, hope this helps!