Research PaperJune 2026

AgenticRAG: Zero-Hallucination Document Question Answering

A vectorless, multi-agent retrieval pipeline that eliminates context fragmentation, semantic drift, and unchecked hallucination in AI-powered document analysis. Open-source, MIT-licensed.

Arham Mirkar

DataLayer - Enterprise Data Infrastructure

Abstract

Retrieval-Augmented Generation (RAG) has become the dominant paradigm for grounding large language models in factual data. However, production RAG pipelines suffer from three systemic failures: context fragmentation caused by arbitrary document chunking, semantic drift from embedding-based retrieval, and unchecked hallucination in synthesized answers. We present AgenticRAG, an open-source Python library that eliminates all three failure modes by replacing the vector-embedding-chunk pipeline with a multi-agent reasoning architecture over structurally-aware document trees. In head-to-head benchmarks on SEC 10-K financial filings, AgenticRAG achieved a 5/5 quality score versus 1/5 for vector RAG, correctly extracting a complete 15-person executive roster that vector RAG failed to retrieve entirely.

TL;DR

AgenticRAG is an open-source Python library that replaces traditional vector-based RAG with a multi-agent pipeline over document tree structures. It achieves 5/5 quality on financial document extraction (vs 1/5 for vector RAG), reduces LLM token usage by 98% through hybrid pre-filtering, and eliminates hallucination via a dedicated Critic agent that cross-references every claim against source text. Install with pip install agentic-rag-core.

Section 1

Why Traditional RAG Fails

The chunk-embed-retrieve paradigm was designed for semantic search at scale. It was never designed for precision extraction from structured documents.

Context Fragmentation

Fixed-length chunking arbitrarily splits documents at token boundaries, severing tables from headers and separating financial figures from their row labels.

Semantic Drift

Embedding-based retrieval measures cosine similarity, ranking general commentary higher than the specific table row containing the exact answer.

Unchecked Hallucination

Standard RAG performs no verification of generated answers. The LLM may interpolate between chunks, invent figures, or misattribute data from one section to another.

The Specific Problem of Numerical Hallucination

LLMs are particularly unreliable with numerical data. When asked to extract revenue figures, asset values, or executive compensation from financial documents, they frequently:

Confabulate adjacent numbers - combine a revenue figure from one table with a percentage from another to produce a fabricated statistic
Round or approximate- change "$4,831,234" to "$4.8 million," introducing inaccuracy where exact figures matter
Hallucinate from training data - substitute memorized statistics from pre-training over the actually retrieved context
Lose table structure - when table cells are serialized into flat text by chunking, the LLM cannot associate row labels with column values

Section 2

Multi-Agent Architecture

Five specialized LLM agents collaborate in a constrained state machine, mirroring the workflow of a human researcher.

Planner

Query Router

Extracts search terms, queries the DocumentGraph (SQLite + FTS5), and selects the top-N most relevant documents from the indexed collection.

Keyword Agent

Search Expander

Generates domain-specific keyphrases, keywords, and synonyms tailored to the question and document vocabulary. One call, shared across all hunters.

Hunter

Tree Navigator

Parallel tree search across candidate documents. Uses hybrid pre-filtering (98% token reduction) to find contextually complete sections, not fragments.

Synthesizer

Evidence Combiner

Combines evidence chunks from all parallel hunters into a unified answer with strict citation rules: every claim references [Source: Doc, Pages X-Y].

Evaluator

Self-Correction

Assesses evidence sufficiency. If gaps exist, it refines the query and triggers another retrieval round. Prevents hallucinated query refinements via overlap validation.

Critic

Zero-Hallucination Enforcer

Cross-references every claim in the draft against raw source text. Unsupported claims are detected and removed before the answer reaches the user.

Hybrid Sub-Tree Pre-Filtering

Large documents (160+ pages) produce tree indices with hundreds of nodes. Passing the entire tree to the LLM is both expensive and counterproductive. Our hybrid pipeline prunes the tree before the LLM sees it:

Metric	Without Pre-Filter	With Pre-Filter	Reduction
Nodes passed to LLM	~800	~15	98.1%
Tokens per call	~20,000	~400	98.0%
Total tokens (3 iters)	~60,000	~1,200	98.0%
Rate-limit risk	High	Negligible	-

Section 3

Benchmark Results

Head-to-head comparison on a 3M Company 2018 10-K filing (160 pages, ~800 tree nodes). LLM Judge: llama-3.3-70b-versatile

Metric	AgenticRAG	Vector RAG
Quality Score (LLM Judge)	5 / 5	1 / 5
Answer Completeness	15 executives, full details	Failed - "context not found"
Latency	48.02s	2.57s
Factual Accuracy	Every name, age, title verified	N/A (no answer)

Vector RAG Output

"The provided context does not contain the complete list of executives of 3M. It mentions that the list of executive officers is presented in a table, but the table itself is not included in the provided context."

Chunking destroyed the table structure. Embeddings found nearby text but not the actual data.

AgenticRAG Output

Correctly identified and returned all 15 executives with full names, ages, titles, tenure history, and prior roles. Every fact cited as [Source: 3M_2018_10K.pdf, Pages 7-8]

Tree index preserved the executive table as a single, complete node. Critic verified every claim.

Section 4

How It Compares

vs. Standard Vector RAG

LangChain, LlamaIndex

Vector RAG: chunk-embed-retrieve with cosine similarity. Fast, cheap, scales to millions.
AgenticRAG: tree-navigate-reason with LLM agents. Slower, but contextually complete.
Vector RAG is better for massive datasets. AgenticRAG is better when a wrong number is a compliance violation.

vs. Microsoft GraphRAG

Entity extraction + knowledge graphs

GraphRAG extracts entities and relationships to build a knowledge graph. Massive compute during ingestion.
AgenticRAG uses document structure (headings), not extracted entities. Orders of magnitude cheaper to ingest.
GraphRAG excels at global themes. AgenticRAG excels at targeted, verifiable facts.

vs. ReAct / FLARE Agents

Free-form agentic retrieval loops

ReAct uses generic tool-calling loops where the agent decides what to search.
AgenticRAG hardcodes the research workflow (Plan > Hunt > Synthesize > Evaluate > Critic).
Constrained pipeline is more predictable, debuggable, and resistant to infinite loops.

Open Source, MIT Licensed

AgenticRAG is free to use, modify, and deploy. Three lines of code to get started. No vector database, no embeddings, no infrastructure.

$ pip install agentic-rag-core

from agenticrag import Forest

forest = Forest(verbose=True)
forest.add("report.pdf")

result = forest.ask("What was the net income?")
print(result.text)
print(result.confidence)  # 0.0-1.0

References

Lewis, P., et al. (2020). "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks." NeurIPS 2020.
Edge, D., et al. (2024). "From Local to Global: A Graph RAG Approach to Query-Focused Summarization." Microsoft Research.
Yao, S., et al. (2023). "ReAct: Synergizing Reasoning and Acting in Language Models." ICLR 2023.
Jiang, Z., et al. (2023). "Active Retrieval Augmented Generation." EMNLP 2023.
Gao, Y., et al. (2024). "Retrieval-Augmented Generation for Large Language Models: A Survey." arXiv:2312.10997.