NaiveRAG is fast but dumb.
GraphRAG is smart but slow.
This open-source solution fixes both.
RAG systems have a fundamental problem: they treat documents as isolated chunks. No connections. No context. No understanding of how things relate.
Graph RAG addresses this, but traditional graph databases become painfully slow for real-time applications.
What if you could combine the speed of vector search with the intelligence of knowledge graphs?
That’s exactly what we’ve built and shared in the video above.
A real-time AI avatar that uses a knowledge graph as its memory. You talk to it directly, and everything happens in real time.
Watch the video for the full demo and code walkthrough. We’ve open-sourced everything.
To power this, we used Zep’s open-source knowledge retrieval system.
What makes it fast:
↳ Smart retrieval algorithms that avoid full graph search.
↳ Fine-tuned Qwen3 and Gemma models with <10ms embedding and <50ms reranking.
↳ S3 + hot caching for dense vector and BM25 search instead of traditional vector databases (a rough sketch of that hybrid idea is below).
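To make that last point concrete: the snippet below is not Zep’s internal code, just an illustration of hybrid retrieval, where each document is scored with BM25 and with dense-embedding similarity, and the two rankings are fused with reciprocal rank fusion. The embed function is a hash-based placeholder standing in for a real embedding model.

```python
# Illustrative hybrid retrieval (pip install rank-bm25 numpy).
# NOT Zep's internal code: just the idea of scoring with BM25 and with
# dense embeddings, then fusing the rankings via reciprocal rank fusion.
import hashlib

import numpy as np
from rank_bm25 import BM25Okapi

docs = [
    "Zep stores facts extracted from conversations in a knowledge graph.",
    "Graphiti is Zep's open-source temporal knowledge graph framework.",
    "BM25 is a classic lexical ranking function for keyword search.",
]

def embed(text: str) -> np.ndarray:
    # Placeholder embedding: a random vector seeded by the text.
    # Swap in your real (e.g. fine-tuned) embedding model here.
    seed = int(hashlib.md5(text.encode()).hexdigest(), 16) % (2**32)
    v = np.random.default_rng(seed).standard_normal(64)
    return v / np.linalg.norm(v)

def rrf(rankings: list[dict[int, int]], k: int = 60) -> dict[int, float]:
    # Reciprocal rank fusion: a document's fused score is the sum of
    # 1 / (k + rank) over every ranking it appears in.
    fused: dict[int, float] = {}
    for ranking in rankings:
        for doc_id, rank in ranking.items():
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return fused

query = "open-source knowledge graph memory"

# Lexical side: BM25 over whitespace-tokenized documents.
bm25 = BM25Okapi([d.lower().split() for d in docs])
bm25_scores = bm25.get_scores(query.lower().split())
bm25_rank = {int(i): r for r, i in enumerate(np.argsort(-bm25_scores), 1)}

# Dense side: cosine similarity between query and document embeddings.
doc_vecs = np.stack([embed(d) for d in docs])
dense_scores = doc_vecs @ embed(query)
dense_rank = {int(i): r for r, i in enumerate(np.argsort(-dense_scores), 1)}

# Fuse both rankings and print documents best-first.
for doc_id, score in sorted(rrf([bm25_rank, dense_rank]).items(),
                            key=lambda kv: -kv[1]):
    print(f"{score:.4f}  {docs[doc_id]}")
```

In production, the dense vectors and BM25 index live in object storage like S3 with a hot cache in front, rather than in memory like this toy example.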
Zep’s open-source framework Graphiti is available under Apache 2.0, so you can easily self-host it.
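If you’d rather try it than read about it, here’s a minimal sketch of using Graphiti as conversational memory, adapted from its quick-start. It assumes a locally running Neo4j instance at bolt://localhost:7687 and OpenAI credentials in your environment for extraction and embeddings; method names can shift between releases, so treat the repo’s README as the source of truth.

```python
# Minimal Graphiti sketch (pip install graphiti-core), assuming a local
# Neo4j at bolt://localhost:7687 and OpenAI credentials in the
# environment. Check the repo's README for the exact API of the
# version you install.
import asyncio
from datetime import datetime, timezone

from graphiti_core import Graphiti
from graphiti_core.nodes import EpisodeType

async def main() -> None:
    graphiti = Graphiti("bolt://localhost:7687", "neo4j", "password")
    try:
        # One-time setup of graph indices and constraints.
        await graphiti.build_indices_and_constraints()

        # Ingest a conversation turn as an episode; Graphiti extracts
        # entities and relationships into the knowledge graph.
        await graphiti.add_episode(
            name="chat-turn-1",
            episode_body="User: I moved to Berlin last month and I love hiking.",
            source=EpisodeType.message,
            source_description="avatar conversation",
            reference_time=datetime.now(timezone.utc),
        )

        # Retrieve relevant facts (graph edges) to ground the next reply.
        results = await graphiti.search("Where does the user live?")
        for edge in results:
            print(edge.fact)
    finally:
        await graphiti.close()

asyncio.run(main())
```

Each turn goes in as an episode, Graphiti extracts entities and relationships from it, and the avatar grounds its next reply in the facts returned by search.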
You can find Zep’s GitHub repo here →
Thanks for reading!