Case 10 · Vertical AI Applications / AI & Automation

Qdrant Self-Hosted RAG for Regulated Enterprises

Applied-AI & Retrieval Infrastructure Engineers · qdrant.tech · AI & Automation · Cloud & DevOps · Data & Integration

Qdrant is a high-performance vector database written in Rust and licensed under Apache 2.0, built for RAG and semantic search. We self-hosted it on the client's Kubernetes cluster and built a multi-tenant "chat with your docs" retrieval layer for customers who can't send data to hosted AI services.

The challenge

A B2B knowledge-management vendor wanted to ship a self-hosted "chat with your docs" feature for enterprise customers in regulated industries who cannot send data to OpenAI or Pinecone. They needed strict tenant isolation and high-quality retrieval, fully inside the customer's environment.

  • Running a production vector database inside the customer's own Kubernetes with no external AI dependencies.
  • Enforcing hard multi-tenant isolation so one tenant's query could never surface another's documents.
  • Getting retrieval quality high enough for trustworthy answers over dense enterprise PDFs.

Our solution

We deployed Qdrant via Helm on the client's Kubernetes, created collections sized for the chosen embedding model, chunked PDFs with LangChain, upserted points with tenant metadata, and ran hybrid dense+sparse search with a hard tenant filter and ColBERT late-interaction reranking.

  • A Helm-deployed Qdrant cluster on the client's Kubernetes, with collections configured for Distance.Cosine and a vector size that matches the embedding model's output dimension (1536 for OpenAI embeddings, 768 for a local BGE model).
  • An ingestion pipeline that chunks PDFs via LangChain and upserts points with {source, page, tenant_id} payloads.
  • Query-time hybrid dense+sparse search with a mandatory tenant_id condition in the filter's "must" clause, followed by ColBERT late-interaction reranking before the top-K context is streamed to the LLM.
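
The collection settings and tenant filter described above can be sketched as the JSON bodies Qdrant's REST API accepts, built here as plain Python dicts. This is a minimal sketch under stated assumptions, not the code shipped in the engagement: the keys in VECTOR_SIZES and both helper names are illustrative, and only the vectors/size/distance and must/key/match shapes follow Qdrant's documented request format.

```python
import json

# Embedding dimension per model, as stated in the text above.
# The dictionary keys are illustrative labels, not official model identifiers.
VECTOR_SIZES = {"openai": 1536, "bge-local": 768}


def collection_config(model: str) -> dict:
    """Body for creating a collection: one dense vector space, cosine distance."""
    return {"vectors": {"size": VECTOR_SIZES[model], "distance": "Cosine"}}


def tenant_filter(tenant_id: str) -> dict:
    """Qdrant filter whose `must` clause pins results to a single tenant."""
    if not tenant_id:
        # Fail closed: a query without a tenant should never reach the database.
        raise ValueError("tenant_id is required for every query")
    return {"must": [{"key": "tenant_id", "match": {"value": tenant_id}}]}


if __name__ == "__main__":
    print(json.dumps(collection_config("bge-local")))
    print(json.dumps(tenant_filter("tenant-a")))
```

Building the filter in one helper that raises on a missing tenant, rather than at each call site, is one way to make the isolation rule hard to forget.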

A customized view of the system we shipped for this engagement: the components and how requests and data flow between them.

[Architecture diagram] Components: Chat-with-Docs UI, LangChain Ingestion, Retrieval Service, Qdrant (HNSW), tenant_id Filter, ColBERT Rerank, Local LLM. Flow labels: upsert, hybrid, isolate, top-K.
Rust (Qdrant) · Python · LangChain · Helm / Kubernetes · HNSW Indexing · OpenAI / BGE Embeddings · ColBERT Reranking · Docker
Shipped self-hosted RAG entirely inside the customer's Kubernetes — no data sent to hosted AI APIs.
Enforced hard tenant isolation via a mandatory tenant_id filter on every query.
Raised answer quality with hybrid dense+sparse retrieval and ColBERT reranking.
Direct value added: Lets the vendor sell an AI document-chat feature into regulated accounts that would otherwise be off-limits, because nothing ever leaves the customer's environment.
Why it matters: RAG quality and data residency decide whether AI ships in regulated industries. A self-hosted Rust vector DB with hybrid search and reranking delivers both without a third-party dependency.

Before — manual bottleneck flow

1. Hosted-AI Rejection (Bottleneck)
Customer Security · Blocks deal

Regulated buyers refuse any feature that ships their documents to a third-party AI service.

2. Manual Doc Search (Bottleneck)
Knowledge Worker · Hours

Staff hunt across shared drives and PDFs by keyword, missing relevant context.

3. Tenant-Bleed Risk (Bottleneck)
Platform Team · Ongoing

Without enforced isolation, a naive vector search could surface another tenant's data.

After — automated optimized flow

1. In-Cluster Ingestion
LangChain Pipeline · Batch

PDFs are chunked and upserted into Qdrant with tenant metadata, inside the customer's K8s.
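
As a rough illustration of this ingestion step, the sketch below splits page text into overlapping chunks and wraps each one in a point carrying the {source, page, tenant_id} payload. It rests on several assumptions: chunk_text is a plain-Python stand-in for a LangChain splitter, the chunk size and overlap are arbitrary, the extra "text" payload field and the ID scheme are illustrative, and the embedding call that would attach a vector to each point is omitted.

```python
import uuid


def chunk_text(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Fixed-size character windows with overlap (stand-in for a real splitter)."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reached the end of the text
        start += size - overlap
    return chunks


def build_points(pages: list[str], source: str, tenant_id: str) -> list[dict]:
    """Turn page texts (index 0 is page 1) into point dicts without vectors."""
    points = []
    for page_no, page_text in enumerate(pages, start=1):
        for idx, chunk in enumerate(chunk_text(page_text)):
            # Deterministic UUID: re-ingesting the same file yields the same IDs,
            # so an upsert overwrites existing points instead of duplicating them.
            key = f"{tenant_id}/{source}/{page_no}/{idx}"
            points.append({
                "id": str(uuid.uuid5(uuid.NAMESPACE_URL, key)),
                "payload": {
                    "source": source,
                    "page": page_no,
                    "tenant_id": tenant_id,
                    "text": chunk,  # assumption: chunk text is kept in the payload
                },
            })
    return points
```

Because tenant_id is written at ingestion time, the query-time filter has a payload field to match against on every point.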

2. Isolated Hybrid Search
Qdrant Engine · < 50 ms

Hybrid dense+sparse retrieval runs with a hard tenant_id filter, then ColBERT rerank.
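
To make the two ranking stages concrete, here is a small pure-Python sketch of fusing a dense and a sparse result list and then reranking by late interaction. It is an illustration under assumptions rather than the production path: reciprocal rank fusion with k = 60 is one common fusion method and is not confirmed as the one used here, the candidate lists are assumed to be already restricted to one tenant, and the token vectors are toy values instead of real ColBERT embeddings.

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: score(d) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


def maxsim(query_vecs: list[list[float]], doc_vecs: list[list[float]]) -> float:
    """Late interaction: each query token takes its best doc-token dot product."""
    return sum(
        max(sum(q * d for q, d in zip(qv, dv)) for dv in doc_vecs)
        for qv in query_vecs
    )


def rerank(query_vecs, candidates: dict[str, list[list[float]]], top_k: int) -> list[str]:
    """Order candidate IDs by MaxSim score and keep the top_k."""
    ordered = sorted(candidates, key=lambda c: (-maxsim(query_vecs, candidates[c]), c))
    return ordered[:top_k]
```

A document that appears in both lists accumulates score from each, which is why fusion tends to favor chunks that both the dense and the sparse retriever consider relevant.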

3. Grounded Answer
Local LLM · Seconds

Top-K context streams to the model and a sourced answer returns to the user.
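
One way to picture how the top-K context becomes a sourced answer is a prompt builder that numbers each chunk and labels it with its file and page. This is a hedged sketch: the function name, the instruction wording, and the "text" payload field are assumptions for illustration, and the call to the local LLM is left out.

```python
def build_prompt(question: str, hits: list[dict], top_k: int = 3) -> str:
    """Format the top-K chunks as numbered, cited context followed by the question."""
    lines = ["Answer using only the context below. Cite sources as [n].", ""]
    for n, hit in enumerate(hits[:top_k], start=1):
        payload = hit["payload"]
        # Each context line carries its source file and page so the answer can cite it.
        lines.append(f"[{n}] ({payload['source']}, p. {payload['page']}) {payload['text']}")
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)
```

Keeping source and page next to each chunk lets the model's [n] citations be mapped back to a document location in the UI.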

Our enterprise buyers in finance and healthcare wouldn't let documents leave their network, full stop. Running retrieval in their own cluster with tenant isolation enforced at query time is what finally unblocked those deals for us.
Priya Natarajan at Zendesk
