# Librarius

Status: active · Stack: Python, PostgreSQL, pgvector, Ollama, Kubernetes
An end-to-end Retrieval Augmented Generation (RAG) system for Warhammer 40K rulebooks. Fully self-hosted on my homelab K3s cluster — no cloud APIs, no subscriptions, just local models and local infrastructure.
## Architecture

The system is a three-phase pipeline, each phase named after a Librarian specialization from the lore:
### Phase 1: Lexicanium — Data Ingestion
- Extracts PDFs from archives that follow a strict filename schema
- Partitions documents using semantic chunking that preserves structure (sections, tables, stat blocks)
- Handles both native and scanned PDFs, with OCR via Tesseract for the latter
- Stores chunks in PostgreSQL with rich metadata (faction, edition, category, page numbers)
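The structure-preserving chunking can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the `Chunk` dataclass, the all-caps heading heuristic, and the character budget are all assumptions made for the example.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Chunk:
    """One retrievable unit of rulebook text plus its metadata."""
    text: str
    section: str
    page: int
    metadata: dict = field(default_factory=dict)

# Illustrative heuristic: rulebook section headings are often all-caps lines.
HEADING = re.compile(r"^[A-Z][A-Z0-9 ,'\-]{3,}$")

def chunk_pages(pages, max_chars=1200):
    """Split page text into chunks, breaking at headings so sections and
    stat blocks stay together instead of being cut at arbitrary offsets."""
    chunks, buf = [], []
    section, page_no = "PREAMBLE", 0

    def flush():
        if buf:
            chunks.append(Chunk(text="\n".join(buf), section=section, page=page_no))
            buf.clear()

    for page_no, text in enumerate(pages, start=1):
        for line in text.splitlines():
            if HEADING.match(line.strip()):
                flush()                      # close the previous section's chunk
                section = line.strip()
            buf.append(line)
            if sum(len(l) for l in buf) > max_chars:
                flush()                      # oversized section: split anyway
    flush()
    return chunks
```

A real pipeline would also detect tables and stat blocks explicitly; the point here is only that chunk boundaries follow the document's structure rather than a fixed window.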
### Phase 2: Epistolary — Embedding
- Converts text chunks to 1024-dimensional vector embeddings using `intfloat/multilingual-e5-large-instruct`
- GPU-accelerated batch processing
- Stores embeddings in PostgreSQL via the pgvector extension
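In outline, this phase batches chunks through the model and writes the resulting vectors into a pgvector column. A minimal sketch, assuming a `chunks` table with a `vector(1024)` column named `embedding` (table and helper names are illustrative):

```python
def batched(items, size):
    """Yield fixed-size batches so the GPU processes many chunks per forward pass."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def to_pgvector(embedding):
    """Format a float sequence as a pgvector literal, e.g. '[0.1,0.2,...]'."""
    return "[" + ",".join(f"{x:.6f}" for x in embedding) + "]"

INSERT_SQL = "INSERT INTO chunks (id, embedding) VALUES (%s, %s::vector)"

# The embedding call itself, via sentence-transformers (requires the model
# download, so it is shown here as a comment):
#   from sentence_transformers import SentenceTransformer
#   model = SentenceTransformer("intfloat/multilingual-e5-large-instruct", device="cuda")
#   for batch in batched(texts, 64):
#       vectors = model.encode(batch, normalize_embeddings=True)
# Note: e5-instruct embeds passages as-is; only queries get an
# "Instruct: ...\nQuery: ..." prefix, per the model card.
```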
### Phase 3: Codicier — Retrieval & Generation
- Embeds user queries and performs k-NN vector search
- Provides retrieved context to a local LLM via Ollama
- Supports interactive chat mode with source citations
- Filterable by game system (40K, 30K, Kill Team, etc.)
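Retrieval boils down to a single pgvector k-NN query plus prompt assembly for the local model. A sketch under assumed table and column names (`chunks`, `embedding`, `game_system`), using pgvector's cosine-distance operator `<=>`:

```python
KNN_SQL = """
SELECT text, faction, edition, page
FROM chunks
WHERE game_system = %(system)s                  -- filter: 40K, 30K, Kill Team, ...
ORDER BY embedding <=> %(query_vec)s::vector    -- pgvector cosine distance
LIMIT %(k)s
"""

def build_prompt(question, hits):
    """Assemble the grounded prompt: retrieved chunks first, each tagged
    with a citation the model is told to reuse in its answer."""
    context = "\n\n".join(
        f"[{h['faction']}, {h['edition']} ed., p.{h['page']}]\n{h['text']}"
        for h in hits
    )
    return (
        "Answer using ONLY the rulebook excerpts below, citing sources.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )

# The prompt then goes to the local model via Ollama's HTTP API, e.g.:
#   requests.post("http://localhost:11434/api/generate",
#                 json={"model": "llama3", "prompt": prompt, "stream": False})
```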
## Why
LLMs hallucinate game rules constantly. By grounding responses in actual rulebook content with semantic search, the system provides accurate, sourced answers about game mechanics.