# Librarius

Status: active · Stack: Python, PostgreSQL, pgvector, Ollama, Kubernetes
An end-to-end Retrieval Augmented Generation (RAG) system for Warhammer 40K rulebooks. Fully self-hosted on my homelab K3s cluster — no cloud APIs, no subscriptions, just local models and local infrastructure.
## Architecture

The system is a three-phase pipeline, each phase named after a Librarian specialization from the lore:
### Phase 1: Lexicanium — Data Ingestion
- Extracts PDFs from archives that follow a strict filename schema
- Partitions documents using semantic chunking that preserves structure (sections, tables, stat blocks)
- Handles both native and scanned PDFs, with OCR via Tesseract for the latter
- Stores chunks in PostgreSQL with rich metadata (faction, edition, category, page numbers)
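The structure-preserving chunking can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the `Chunk` dataclass, the all-caps heading heuristic, and the character budget are all assumptions made for the example.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Chunk:
    """One retrievable unit of rulebook text plus its metadata."""
    text: str
    section: str
    page: int
    metadata: dict = field(default_factory=dict)

# Illustrative heuristic: rulebook section headings are often all-caps lines.
HEADING = re.compile(r"^[A-Z][A-Z0-9 ,'\-]{3,}$")

def chunk_pages(pages, max_chars=1200):
    """Split page text into chunks, breaking at headings so sections and
    stat blocks stay together instead of being cut at arbitrary offsets."""
    chunks, buf = [], []
    section, page_no = "PREAMBLE", 0

    def flush():
        if buf:
            chunks.append(Chunk(text="\n".join(buf), section=section, page=page_no))
            buf.clear()

    for page_no, text in enumerate(pages, start=1):
        for line in text.splitlines():
            if HEADING.match(line.strip()):
                flush()                      # close the previous section's chunk
                section = line.strip()
            buf.append(line)
            if sum(len(l) for l in buf) > max_chars:
                flush()                      # oversized section: split anyway
    flush()
    return chunks
```

A real pipeline would also detect tables and stat blocks explicitly; the point here is only that chunk boundaries follow the document's structure rather than a fixed window.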
### Phase 2: Epistolary — Embedding
- Converts text chunks to 1024-dimensional vector embeddings using `intfloat/multilingual-e5-large-instruct`
- GPU-accelerated batch processing
- Stores embeddings in PostgreSQL via the pgvector extension
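In outline, this phase batches chunks through the model and writes the resulting vectors into a pgvector column. A minimal sketch, assuming a `chunks` table with a `vector(1024)` column named `embedding` (table and helper names are illustrative):

```python
def batched(items, size):
    """Yield fixed-size batches so the GPU processes many chunks per forward pass."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def to_pgvector(embedding):
    """Format a float sequence as a pgvector literal, e.g. '[0.1,0.2,...]'."""
    return "[" + ",".join(f"{x:.6f}" for x in embedding) + "]"

INSERT_SQL = "INSERT INTO chunks (id, embedding) VALUES (%s, %s::vector)"

# The embedding call itself, via sentence-transformers (requires the model
# download, so it is shown here as a comment):
#   from sentence_transformers import SentenceTransformer
#   model = SentenceTransformer("intfloat/multilingual-e5-large-instruct", device="cuda")
#   for batch in batched(texts, 64):
#       vectors = model.encode(batch, normalize_embeddings=True)
# Note: e5-instruct embeds passages as-is; only queries get an
# "Instruct: ...\nQuery: ..." prefix, per the model card.
```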
### Phase 3: Codicier — Retrieval & Generation
- Embeds user queries and performs k-NN vector search
- Provides retrieved context to a local LLM via Ollama
- Supports interactive chat mode with source citations
- Filterable by game system (40K, 30K, Kill Team, etc.)
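Retrieval boils down to a single pgvector k-NN query plus prompt assembly for the local model. A sketch under assumed table and column names (`chunks`, `embedding`, `game_system`), using pgvector's cosine-distance operator `<=>`:

```python
KNN_SQL = """
SELECT text, faction, edition, page
FROM chunks
WHERE game_system = %(system)s                  -- filter: 40K, 30K, Kill Team, ...
ORDER BY embedding <=> %(query_vec)s::vector    -- pgvector cosine distance
LIMIT %(k)s
"""

def build_prompt(question, hits):
    """Assemble the grounded prompt: retrieved chunks first, each tagged
    with a citation the model is told to reuse in its answer."""
    context = "\n\n".join(
        f"[{h['faction']}, {h['edition']} ed., p.{h['page']}]\n{h['text']}"
        for h in hits
    )
    return (
        "Answer using ONLY the rulebook excerpts below, citing sources.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )

# The prompt then goes to the local model via Ollama's HTTP API, e.g.:
#   requests.post("http://localhost:11434/api/generate",
#                 json={"model": "llama3", "prompt": prompt, "stream": False})
```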
## Why
LLMs hallucinate game rules constantly. By grounding responses in actual rulebook content with semantic search, the system provides accurate, sourced answers about game mechanics.