Qdrant is a high-performance Rust vector database for RAG and semantic search (Apache 2.0). We self-hosted it on the client's Kubernetes and built a multi-tenant "chat with your docs" retrieval layer for customers who can't send data to hosted AI services.
A B2B knowledge-management vendor wanted to ship a self-hosted "chat with your docs" feature for enterprise customers in regulated industries who cannot send data to OpenAI or Pinecone. They needed strict tenant isolation and high-quality retrieval, fully inside the customer's environment.
- Running a production vector database inside the customer's own Kubernetes with no external AI dependencies.
- Enforcing hard multi-tenant isolation so one tenant's query could never surface another's documents.
- Getting retrieval quality high enough for trustworthy answers over dense enterprise PDFs.
We deployed Qdrant via Helm on the client's Kubernetes and created collections sized for the chosen embedding model. PDFs were chunked with LangChain and upserted as points carrying tenant metadata. At query time, hybrid dense+sparse search runs under a hard tenant filter, followed by ColBERT late-interaction reranking.
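The write-up does not say how the dense and sparse result lists were combined, so the following is only a minimal sketch of one common choice, reciprocal rank fusion (RRF). The function name `rrf_fuse`, the constant `k=60`, and the chunk IDs are illustrative assumptions, not details from the engagement. Qdrant can also perform this kind of fusion server-side through its Query API.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60, top_n=10):
    """Fuse several ranked ID lists with reciprocal rank fusion.

    Each list contributes 1 / (k + rank) per ID, so items ranked well by
    both the dense and the sparse retriever rise to the top.
    `k=60` is a conventional default, assumed here for illustration.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so output is deterministic.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))[:top_n]


# Hypothetical chunk IDs, already restricted to a single tenant.
dense_hits = ["c3", "c1", "c7"]   # ranked by dense vector similarity
sparse_hits = ["c1", "c9", "c3"]  # ranked by sparse (keyword-like) score
fused = rrf_fuse([dense_hits, sparse_hits])
print(fused)
```

RRF only needs ranks, not raw scores, which is why it is a convenient way to merge dense and sparse results whose score scales are not comparable.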
- A Helm-deployed Qdrant cluster on the client's Kubernetes, with collections using Distance.Cosine and a vector size matched to the embedding model (for example, 1536 for OpenAI embeddings such as text-embedding-3-small, or 768 for a local BGE base model).
- An ingestion pipeline that chunks PDFs via LangChain and upserts points with {source, page, tenant_id} payloads.
- Query-time hybrid dense+sparse search with a mandatory tenant_id condition in the filter's must clause, reranked with ColBERT late-interaction scoring before the top-K context is streamed to the LLM.
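To make the three deliverables concrete, here is a hedged sketch of the JSON bodies involved, written as plain Python dictionaries in the shape of Qdrant's REST API. The helper names (`collection_config`, `make_point`, `tenant_search_body`) and the sample values are hypothetical. The sketch covers only a single dense vector and omits the sparse-vector configuration that hybrid search would need.

```python
import json


def collection_config(vector_size):
    """Body for creating a collection: one dense vector, cosine distance."""
    return {"vectors": {"size": vector_size, "distance": "Cosine"}}


def make_point(point_id, vector, source, page, tenant_id):
    """One chunk as a Qdrant point with the {source, page, tenant_id} payload."""
    return {
        "id": point_id,  # Qdrant point IDs are unsigned integers or UUIDs
        "vector": vector,
        "payload": {"source": source, "page": page, "tenant_id": tenant_id},
    }


def tenant_search_body(query_vector, tenant_id, limit=5):
    """Search body that always carries a `must` condition on tenant_id.

    Refusing to build a query without a tenant is an illustrative guard
    (an assumption of this sketch), so the filter cannot be skipped by accident.
    """
    if not tenant_id:
        raise ValueError("tenant_id is required for every query")
    return {
        "vector": query_vector,
        "limit": limit,
        "with_payload": True,
        "filter": {
            "must": [{"key": "tenant_id", "match": {"value": tenant_id}}]
        },
    }


# Hypothetical values for illustration only.
point = make_point(1, [0.1, 0.2, 0.3], "handbook.pdf", 12, "tenant-a")
body = tenant_search_body([0.1, 0.2, 0.3], "tenant-a")
print(json.dumps(body, indent=2))
```

Building the filter in one place, rather than at each call site, is one way to keep the tenant condition mandatory rather than optional.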
A customized view of the system we shipped for this engagement: the components and how requests and data flow between them.
Before: manual, bottlenecked flow
Regulated buyers refuse any feature that ships their documents to a third-party AI service.
Staff hunt across shared drives and PDFs by keyword, missing relevant context.
Without enforced isolation, a naive vector search could surface another tenant's data.
After: automated, optimized flow
PDFs are chunked and upserted into Qdrant with tenant metadata, inside the customer's K8s.
Hybrid dense+sparse retrieval runs with a hard tenant_id filter, then ColBERT rerank.
Top-K context streams to the model and a sourced answer returns to the user.
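The rerank step in this flow can be illustrated with the MaxSim scoring that ColBERT-style late interaction uses: each query token vector is matched to its most similar document token vector, and those maxima are summed. This is a toy sketch with hand-written 2-dimensional vectors. The function names and data are hypothetical, and a real deployment would get token embeddings from a ColBERT model rather than literals.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_tokens, doc_tokens):
    """Late-interaction score: sum over query tokens of the best doc-token match.

    Assumes doc_tokens is non-empty.
    """
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


def rerank(query_tokens, candidates, top_k=3):
    """Order candidate chunk IDs by MaxSim score, highest first."""
    scored = [(maxsim(query_tokens, toks), cid) for cid, toks in candidates.items()]
    scored.sort(key=lambda s: (-s[0], s[1]))  # ties broken by chunk ID
    return [cid for _, cid in scored[:top_k]]


# Toy token embeddings (assumed, for illustration only).
query = [[1.0, 0.0], [0.0, 1.0]]
candidates = {
    "chunk-a": [[1.0, 0.0], [0.0, 1.0]],  # matches both query tokens
    "chunk-b": [[1.0, 0.0], [1.0, 0.0]],  # matches only the first query token
}
order = rerank(query, candidates)
print(order)
```

Because scoring happens per token rather than per chunk, a candidate that covers every part of the query outranks one that matches a single term strongly, which is the property that makes late interaction useful as a second-stage reranker.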
“Our enterprise buyers in finance and healthcare wouldn't let documents leave their network, full stop. Running retrieval in their own cluster with tenant isolation enforced at query time is what finally unblocked those deals for us.”

