Multiple LLMs voting together on content validation catch each other’s mistakes to achieve 95.6% accuracy.

arxiv.org

Multiple LLMs voting together on content validation catch each other’s mistakes to achieve 95.6% accuracy.

arxiv.org

Lugh@futurology.todayM to

Futurology@futurology.todayEnglish · 7 months ago

Probabilistic Consensus through Ensemble Validation: A Framework for LLM Reliability

arxiv.org

Large Language Models (LLMs) have shown significant advances in text generation but often lack the reliability needed for autonomous deployment in high-stakes domains like healthcare, law, and finance. Existing approaches rely on external knowledge or human oversight, limiting scalability. We introduce a novel framework that repurposes ensemble methods for content validation through model consensus. In tests across 78 complex cases requiring factual accuracy and causal consistency, our framework improved precision from 73.1% to 93.9% with two models (95% CI: 83.5%-97.9%) and to 95.6% with three models (95% CI: 85.2%-98.8%). Statistical analysis indicates strong inter-model agreement ($κ$ > 0.76) while preserving sufficient independence to catch errors through disagreement. We outline a clear pathway to further enhance precision with additional validators and refinements. Although the current approach is constrained by multiple-choice format requirements and processing latency, it offers immediate value for enabling reliable autonomous AI systems in critical applications.

You must log in or register to comment.

Chat