AgenticRAG: Zero-Hallucination Document Question Answering
A vectorless, multi-agent retrieval pipeline that eliminates context fragmentation, semantic drift, and unchecked hallucination in AI-powered document analysis. Open-source, MIT-licensed.
Arham Mirkar
DataLayer - Enterprise Data Infrastructure
Abstract
Retrieval-Augmented Generation (RAG) has become the dominant paradigm for grounding large language models in factual data. However, production RAG pipelines suffer from three systemic failures: context fragmentation caused by arbitrary document chunking, semantic drift from embedding-based retrieval, and unchecked hallucination in synthesized answers. We present AgenticRAG, an open-source Python library that eliminates all three failure modes by replacing the vector-embedding-chunk pipeline with a multi-agent reasoning architecture over structurally aware document trees. In a head-to-head benchmark on a 3M Company 2018 SEC 10-K filing, AgenticRAG received a 5/5 quality score from an LLM judge versus 1/5 for vector RAG, correctly extracting a complete 15-person executive roster that vector RAG failed to retrieve at all.
TL;DR
AgenticRAG is an open-source Python library that replaces traditional vector-based RAG with a multi-agent pipeline over document tree structures. It achieves 5/5 quality on financial document extraction (vs 1/5 for vector RAG), reduces LLM token usage by 98% through hybrid pre-filtering, and eliminates hallucination via a dedicated Critic agent that cross-references every claim against source text. Install with pip install agentic-rag-core.
Why Traditional RAG Fails
The chunk-embed-retrieve paradigm was designed for semantic search at scale. It was never designed for precision extraction from structured documents.
The Specific Problem of Numerical Hallucination
LLMs are particularly unreliable with numerical data. When asked to extract revenue figures, asset values, or executive compensation from financial documents, they frequently:
- Confabulate adjacent numbers - combine a revenue figure from one table with a percentage from another to produce a fabricated statistic
- Round or approximate - change "$4,831,234" to "$4.8 million," introducing inaccuracy where exact figures matter
- Hallucinate from training data - substitute memorized statistics from pre-training over the actually retrieved context
- Lose table structure - when table cells are serialized into flat text by chunking, the LLM cannot associate row labels with column values
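The first three failure modes can be caught mechanically by checking that every figure in a generated answer appears verbatim in the retrieved source text. The following is a minimal sketch of that idea, not part of the AgenticRAG API; the helper names `extract_figures` and `unsupported_figures` are hypothetical, and the sample sentences are illustrative rather than taken from a real filing.

```python
import re

# Matches figures such as "$4,831,234", "4.8", "12%", or "2018".
_FIGURE = re.compile(r"\$?\d[\d,]*(?:\.\d+)?%?")


def extract_figures(text: str) -> set[str]:
    """Return every numeric token in `text`, minus any trailing comma."""
    return {m.group(0).rstrip(",") for m in _FIGURE.finditer(text)}


def unsupported_figures(answer: str, source: str) -> set[str]:
    """Figures that appear in the answer but nowhere verbatim in the source."""
    return extract_figures(answer) - extract_figures(source)


# Illustrative text only (assumption): not drawn from an actual document.
source = "Net sales were $4,831,234 in 2018, up 3% from 2017."
exact_answer = "Net sales were $4,831,234 in 2018."
rounded_answer = "Net sales were about $4.8 million in 2018."

print(unsupported_figures(exact_answer, source))    # nothing flagged
print(unsupported_figures(rounded_answer, source))  # the rounded figure is flagged
```

A verbatim check is deliberately strict: a rounded or recombined number fails it even when it is numerically close, which is the desired behavior where exact figures matter.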
Multi-Agent Architecture
Six specialized LLM agents collaborate in a constrained state machine, mirroring the workflow of a human researcher.
Planner
Query Router: Extracts search terms, queries the DocumentGraph (SQLite + FTS5), and selects the top-N most relevant documents from the indexed collection.
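To make the document-selection step concrete, here is a minimal sketch of top-N retrieval with SQLite's FTS5 extension. The table name `docs`, its columns, and the functions `build_index` and `top_documents` are assumptions for illustration; the actual DocumentGraph schema is not described here. It also assumes the local SQLite build includes FTS5, which is common but not guaranteed.

```python
import sqlite3


def build_index(docs: list[tuple[str, str]]) -> sqlite3.Connection:
    """Create an in-memory FTS5 index over (name, body) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE docs USING fts5(name, body)")
    conn.executemany("INSERT INTO docs (name, body) VALUES (?, ?)", docs)
    return conn


def top_documents(conn: sqlite3.Connection, terms: list[str], limit: int = 3) -> list[str]:
    """Return up to `limit` document names, best BM25 match first."""
    if not terms:
        return []
    # Quote each term so FTS5 treats it as a literal string, then OR them together.
    query = " OR ".join('"' + t.replace('"', '""') + '"' for t in terms)
    rows = conn.execute(
        "SELECT name FROM docs WHERE docs MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()
    return [name for (name,) in rows]


# Placeholder collection (assumption): bodies are illustrative, not real filings.
conn = build_index([
    ("annual_report.pdf", "Executive officers of the registrant: names, ages, titles"),
    ("press_release.pdf", "Quarterly dividend announcement"),
    ("proxy_statement.pdf", "Compensation of executive officers and directors"),
])
print(top_documents(conn, ["executive", "officers"], limit=2))
```

In FTS5, `ORDER BY rank` sorts by BM25 relevance by default, so no embedding model or vector store is involved in this step.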
Keyword Agent
Search Expander: Generates domain-specific keyphrases, keywords, and synonyms tailored to the question and document vocabulary. One call, shared across all hunters.
Hunter
Tree Navigator: Runs a parallel tree search across candidate documents. Uses hybrid pre-filtering (98% token reduction) to find contextually complete sections, not fragments.
Synthesizer
Evidence Combiner: Combines evidence chunks from all parallel hunters into a unified answer with strict citation rules: every claim references [Source: Doc, Pages X-Y].
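The citation rule is easy to enforce mechanically. The sketch below attaches a citation in the [Source: Doc, Pages X-Y] form and flags any answer line that lacks one. The functions `cite` and `uncited_lines` are hypothetical helpers, and treating each line as one claim is a simplifying assumption.

```python
import re

# Accepts "[Source: name, Pages 7-8]"; a single-page form is an assumption.
_CITATION = re.compile(r"\[Source: [^,\]]+, Pages? \d+(?:-\d+)?\]")


def cite(claim: str, doc: str, first_page: int, last_page: int) -> str:
    """Append a citation in the document's [Source: Doc, Pages X-Y] format."""
    return f"{claim} [Source: {doc}, Pages {first_page}-{last_page}]"


def uncited_lines(answer: str) -> list[str]:
    """Return non-empty answer lines that carry no citation."""
    return [
        line for line in answer.splitlines()
        if line.strip() and not _CITATION.search(line)
    ]


draft = "\n".join([
    cite("The filing lists 15 executive officers.", "3M_2018_10K.pdf", 7, 8),
    "The company was founded long ago.",  # no citation, so it is flagged
])
print(uncited_lines(draft))
```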
Evaluator
Self-Correction: Assesses evidence sufficiency. If gaps exist, it refines the query and triggers another retrieval round. Prevents hallucinated query refinements via overlap validation.
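One plausible reading of overlap validation is that a refined query is accepted only if enough of its terms already occur in the original question or the retrieved evidence. The sketch below implements that reading; the function name `refinement_is_grounded` and the 0.5 threshold are assumptions, and the library's actual check may differ.

```python
import re


def _terms(text: str) -> set[str]:
    """Lowercased alphanumeric tokens of `text`."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def refinement_is_grounded(
    refined_query: str,
    question: str,
    evidence: str,
    min_overlap: float = 0.5,  # assumed threshold, not a documented default
) -> bool:
    """Accept a refined query only if enough of its terms are already known."""
    refined = _terms(refined_query)
    if not refined:
        return False
    known = _terms(question) | _terms(evidence)
    return len(refined & known) / len(refined) >= min_overlap


question = "Who are the executive officers?"
evidence = "Item 1 lists the executive officers of the registrant in a table."
print(refinement_is_grounded("executive officers table registrant", question, evidence))
print(refinement_is_grounded("battery recall lawsuit", question, evidence))
```

A refinement that introduces mostly unseen vocabulary is rejected, which keeps the next retrieval round anchored to the question and the documents rather than to the model's prior knowledge.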
Critic
Zero-Hallucination Enforcer: Cross-references every claim in the draft against raw source text. Unsupported claims are detected and removed before the answer reaches the user.
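The control flow connecting the agents in this section can be sketched as a small loop. This is an illustrative outline under stated assumptions, not the library's implementation: the `State` class and `run_pipeline` function are hypothetical, the Planner, Keyword Agent, and Hunter are collapsed into a single `retrieve` callable, and the cap of three rounds is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class State:
    question: str
    query: str
    evidence: list[str] = field(default_factory=list)
    draft: str = ""
    rounds: int = 0


def run_pipeline(
    question: str,
    retrieve: Callable[[str], list[str]],                  # Planner + Keyword Agent + Hunter
    synthesize: Callable[[str, list[str]], str],           # Synthesizer
    evaluate: Callable[[str, list[str]], Optional[str]],   # Evaluator: refined query or None
    criticize: Callable[[str, list[str]], str],            # Critic
    max_rounds: int = 3,
) -> str:
    """Retrieve, draft, and re-retrieve until evidence suffices, then verify."""
    state = State(question=question, query=question)
    while state.rounds < max_rounds:
        state.rounds += 1
        state.evidence += retrieve(state.query)
        state.draft = synthesize(state.question, state.evidence)
        refined = evaluate(state.question, state.evidence)
        if refined is None:  # evidence judged sufficient
            break
        state.query = refined  # otherwise retry with the refined query
    # The Critic always runs last, so no draft reaches the user unverified.
    return criticize(state.draft, state.evidence)
```

Bounding the loop keeps latency and token spend predictable, and placing the Critic outside the loop means it checks only the final draft against all accumulated evidence.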
Hybrid Sub-Tree Pre-Filtering
Large documents (160+ pages) produce tree indices with hundreds of nodes. Passing the entire tree to the LLM is both expensive and counterproductive. Our hybrid pipeline prunes the tree before the LLM sees it:
| Metric | Without Pre-Filter | With Pre-Filter | Reduction |
|---|---|---|---|
| Nodes passed to LLM | ~800 | ~15 | 98.1% |
| Tokens per call | ~20,000 | ~400 | 98.0% |
| Total tokens (3 iters) | ~60,000 | ~1,200 | 98.0% |
| Rate-limit risk | High | Negligible | - |
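The pruning step and the reduction figures in the table can be illustrated with a minimal sketch. The keyword-count scoring below is an assumption chosen for simplicity, since the actual hybrid scoring is not specified here; `Node`, `prefilter`, and `reduction_pct` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    node_id: str
    title: str
    summary: str


def prefilter(nodes: list[Node], keywords: list[str], top_k: int = 15) -> list[Node]:
    """Keep the `top_k` nodes whose title and summary mention the keywords most."""
    lowered = [k.lower() for k in keywords]
    scored = []
    for node in nodes:
        text = f"{node.title} {node.summary}".lower()
        score = sum(text.count(k) for k in lowered)
        if score > 0:  # nodes with no keyword hit never reach the LLM
            scored.append((score, node))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [node for _, node in scored[:top_k]]


def reduction_pct(before: int, after: int) -> float:
    """Percentage reduction from `before` to `after`."""
    return 100.0 * (before - after) / before


print(round(reduction_pct(800, 15), 1))   # nodes passed to the LLM
print(reduction_pct(20_000, 400))         # tokens per call
```

Because the filter runs before any model call, its cost is independent of LLM pricing and rate limits; only the surviving nodes are serialized into the prompt.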
Benchmark Results
Head-to-head comparison on a 3M Company 2018 10-K filing (160 pages, ~800 tree nodes). LLM judge: llama-3.3-70b-versatile.
| Metric | AgenticRAG | Vector RAG |
|---|---|---|
| Quality Score (LLM Judge) | 5 / 5 | 1 / 5 |
| Answer Completeness | 15 executives, full details | Failed - "context not found" |
| Latency | 48.02s | 2.57s |
| Factual Accuracy | Every name, age, title verified | N/A (no answer) |
Vector RAG Output
"The provided context does not contain the complete list of executives of 3M. It mentions that the list of executive officers is presented in a table, but the table itself is not included in the provided context."
Chunking destroyed the table structure. Embeddings found nearby text but not the actual data.
AgenticRAG Output
Correctly identified and returned all 15 executives with full names, ages, titles, tenure history, and prior roles. Every fact was cited as [Source: 3M_2018_10K.pdf, Pages 7-8].
Tree index preserved the executive table as a single, complete node. Critic verified every claim.
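The difference between the two outcomes can be reproduced on a toy table. The sketch below contrasts fixed-size chunking with a tree node that stores a section whole. The table rows are placeholders rather than data from the filing, and `SectionNode`, `fixed_size_chunks`, and `placeholder_table` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class SectionNode:
    title: str
    pages: tuple[int, int]
    text: str  # the full section, including any table, kept intact


def fixed_size_chunks(text: str, size: int) -> list[str]:
    """Split text every `size` characters, ignoring row and table boundaries."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def placeholder_table(rows: int = 15) -> str:
    """Build a stand-in table; names, ages, and titles are placeholders."""
    header = "Name | Age | Title"
    body = [f"Officer {i} | {40 + i} | Title {i}" for i in range(1, rows + 1)]
    return "\n".join([header, *body])


table = placeholder_table()
chunks = fixed_size_chunks(table, 120)
node = SectionNode("Executive Officers", (7, 8), table)

# Only one chunk carries the header row; the rest hold rows with no column labels.
print(sum("Name | Age | Title" in c for c in chunks), "of", len(chunks))
# The tree node still contains every row alongside its header.
print(node.text.count("Officer "))
```

A retriever that returns any chunk other than the first hands the model rows with no header, and one that returns only the surrounding prose hands it no rows at all, which matches the vector RAG output quoted above.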
How It Compares
Open Source, MIT Licensed
AgenticRAG is free to use, modify, and deploy. Three lines of code to get started. No vector database, no embeddings, no infrastructure.
$ pip install agentic-rag-core
from agenticrag import Forest
forest = Forest(verbose=True)
forest.add("report.pdf")
result = forest.ask("What was the net income?")
print(result.text)
print(result.confidence)  # 0.0-1.0
© 2026 Arham Mirkar, DataLayer. Published under Creative Commons Attribution 4.0 International (CC BY 4.0). The AgenticRAG software is released under the MIT License.