Financial Document RAG Q&A System

A RAG (Retrieval-Augmented Generation) Q&A system built for Chinese financial regulatory documents (PBOC / SAFE regulations), enabling natural-language compliance queries with precise answers and source citations.

Overview

Designed for fintech compliance teams, this system turns regulatory documents from the People's Bank of China (PBOC), the State Administration of Foreign Exchange (SAFE), and the National People's Congress (NPC) into a searchable knowledge base. Users ask questions in natural language (e.g. 'What are the reserve fund requirements for non-bank payment institutions?'), and the system retrieves the relevant regulatory passages and synthesizes a grounded answer with original-text citations.

Key Features

01  Hybrid retrieval: BM25 sparse search combined with dense vector retrieval (dual-path recall) to improve relevance on complex regulatory queries

02  Semantic-aware chunking: documents are split at clause boundaries to preserve legal meaning and avoid cross-clause truncation

03  Grounded LLM synthesis: retrieved passages are fed to an LLM, which generates a coherent answer with per-claim source attribution

04  Citation tracing: every answer includes the regulation name, article number, and an original-text excerpt for fast verification

05  Multi-document retrieval: simultaneous search across multiple regulations with automatic cross-reference handling
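As a rough illustration of features 02 and 04, the sketch below splits a regulation into one chunk per article and keeps the regulation name and article label on each chunk so a citation can be built later. It is not the project's actual splitter: the `Chunk` type, the `split_by_article` helper, and the assumption that every article starts a line with a marker of the form "第…条" are all hypothetical, and the sample text is placeholder content rather than real regulatory wording.

```python
import re
from dataclasses import dataclass

# Assumption: each article begins a line with a marker such as "第十二条".
ARTICLE_RE = re.compile(r"^(第[一二三四五六七八九十百千零〇\d]+条)", re.MULTILINE)


@dataclass
class Chunk:
    regulation: str  # regulation name, carried along for citations
    article: str     # article label, e.g. "第十二条"
    text: str        # full article text, marker included


def split_by_article(regulation: str, body: str) -> list[Chunk]:
    """Return one chunk per article; text before the first marker is dropped."""
    matches = list(ARTICLE_RE.finditer(body))
    chunks = []
    for i, m in enumerate(matches):
        # An article runs until the next marker, or to the end of the document.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(body)
        chunks.append(Chunk(regulation, m.group(1), body[m.start():end].strip()))
    return chunks


# Placeholder text, not taken from any real regulation.
sample = "第一条 示例文本甲。\n第二条 示例文本乙。\n"
chunks = split_by_article("示例条例", sample)
for c in chunks:
    print(c.regulation, c.article, len(c.text))
```

Splitting on article markers alone ignores any size limit; an article longer than the configured chunk size would still need a secondary split, which this sketch leaves out.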

Methodology

The system is built with LangChain and ChromaDB. Documents are chunked at regulatory clause boundaries (chunk_size=512, overlap=64). Retrieval blends BM25 scores with dense vector scores computed from text-embedding-3-small embeddings, weighted 0.3 (BM25) to 0.7 (dense). Generation uses GPT-4o with structured JSON output containing three fields (answer, sources, confidence), and the frontend renders inline citation cards from this schema.
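The 0.3:0.7 blend can be sketched in plain Python as follows. The weights come from the description above; everything else is an assumption made for illustration, including the `min_max` and `blend` helper names, the choice of min-max normalization to put BM25 and dense scores on a common scale, and the toy scores. The project may combine the two retrievers differently (for example through a LangChain retriever rather than hand-written fusion).

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1]. If all scores are equal, map them to 1.0
    (an assumed convention for the degenerate case)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def blend(bm25: dict[str, float], dense: dict[str, float],
          w_bm25: float = 0.3, w_dense: float = 0.7) -> list[tuple[str, float]]:
    """Weighted sum of normalized scores; a document missing from one
    retriever's results contributes 0 for that retriever."""
    b, d = min_max(bm25), min_max(dense)
    fused = {
        doc_id: w_bm25 * b.get(doc_id, 0.0) + w_dense * d.get(doc_id, 0.0)
        for doc_id in set(b) | set(d)
    }
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


# Toy scores keyed by hypothetical chunk ids.
bm25_scores = {"a": 12.0, "b": 4.0, "c": 8.0}
dense_scores = {"b": 0.9, "c": 0.5, "d": 0.7}
ranked = blend(bm25_scores, dense_scores)
print([doc_id for doc_id, _ in ranked])
```

With these toy inputs, chunk "b" ranks first even though its BM25 score is the lowest, because the dense path carries the larger weight.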

Tech Stack

RAG
LLM
Python
Data Analytics
Risk Analytics

Project Info

Read time: 9 min
Live demo: Available
Featured: Yes
Tags: 5
Interactive Demo

RAG Q&A Live Demo

The live demo runs the full hybrid retrieval pipeline, searching 11 real regulatory documents and generating cited answers.
