
// AIINFRA 301 · Semester 3

Production RAG & LLMOps — Observability and Evaluation

Building, Evaluating, and Operating Retrieval-Augmented LLM Systems in Production

This course prepares learners to design, evaluate, and operate retrieval-augmented generation systems using modern vector databases, hybrid search, and reranking techniques. Students build RAG evaluation pipelines with RAGAS and DeepEval as automated quality gates, then instrument production systems with tracing, prompt versioning, and drift monitoring using industry-standard LLMOps tooling. Emphasis is placed on hands-on labs that mirror real workplace tasks, from indexing strategy selection through cost-aware, observable deployment. Learners exit ready to support or own the retrieval and evaluation layer of an enterprise AI application.

Contact hours: 54 hrs
Credit equivalent: 3-unit
Prerequisite: AIINFRA 300
Length: 16 weeks
01 / outcomes

Outcomes

Course objectives

  1. Select and configure a vector database (Qdrant, Milvus, Weaviate, or pgvector) based on scale, filtering, and indexing tradeoffs
  2. Design chunking and hybrid retrieval pipelines combining dense embeddings, BM25, and cross-encoder reranking
  3. Build automated RAG evaluation suites with RAGAS and DeepEval to gate retrieval and generation quality in CI/CD
  4. Instrument LLM applications with tracing and observability platforms to diagnose latency, cost, and quality regressions
  5. Implement production monitoring, prompt versioning, and feedback loops to detect and respond to model and data drift
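Objective 2's hybrid retrieval needs a rule for merging the dense and BM25 result lists. One common choice, assumed here rather than prescribed by the course, is reciprocal rank fusion (RRF): each document scores the sum of 1 / (k + rank) over every list it appears in, with k = 60 as the conventional default. A minimal sketch with made-up document IDs; `reciprocal_rank_fusion` is a hypothetical helper, not a library API:

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of document IDs with reciprocal rank fusion.

    Each document scores sum(1 / (k + rank)) over every list containing it,
    where rank is 1-based. Higher fused scores rank first.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending), breaking ties by ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists from a dense retriever and a BM25 retriever.
dense_hits = ["doc_a", "doc_b", "doc_c"]
bm25_hits = ["doc_c", "doc_a", "doc_d"]

for doc_id, score in reciprocal_rank_fusion([dense_hits, bm25_hits]):
    print(f"{doc_id}: {score:.4f}")
```

Because RRF uses only ranks, it sidesteps the problem that cosine similarities and BM25 scores live on incomparable scales; documents found by both retrievers (here `doc_a` and `doc_c`) rise to the top.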

02 / schedule

16-week schedule

Wk 01
Introduction to Production RAG Architecture and the LLMOps Lifecycle
Introduces the motivation for production RAG systems and surveys the LLMOps lifecycle that governs how they are built and operated.
Wk 02
Embeddings and Vector Representations for Retrieval
Covers embeddings and vector representations, including the indexing pipeline that turns documents and queries into comparable vectors.
Wk 03
Vector Database Selection: Qdrant, Milvus, Weaviate, and pgvector
Compares Qdrant, Milvus, Weaviate, and pgvector to select a vector database based on scale and filtering needs.
Wk 04
Indexing Deep Dive: HNSW vs IVF and Metadata Filtering
Covers HNSW versus IVF indexing algorithms and metadata filtering strategies for vector search.
Wk 05
Chunking Strategies for Real-World Documents
Covers fixed-size, recursive, and semantic chunking strategies for splitting real-world documents.
Wk 06
Hybrid Search: Combining Dense Retrieval with BM25
Covers hybrid search techniques that combine dense embedding retrieval with sparse BM25 keyword search.
Wk 07
Cross-Encoder Rerankers and Query Rewriting/Expansion
Covers cross-encoder reranking and query rewriting/expansion techniques such as HyDE to improve retrieval quality.
Wk 08
Advanced Retrieval Patterns: GraphRAG and Agentic RAG
Midterm week: covers advanced retrieval patterns including GraphRAG and agentic RAG alongside the course midterm assessment.
Midterm · covers Wks 1–7
Wk 09
RAG Evaluation Foundations with RAGAS
Introduces RAG evaluation foundations using the RAGAS framework to score retrieval and generation quality.
Wk 10
Deep Evaluation and Failure Analysis with DeepEval
Covers deep evaluation and failure analysis of LLM outputs using DeepEval's pytest-style testing approach.
Wk 11
Wiring Evaluation into CI/CD Quality Gates
Covers wiring automated RAG evaluation suites into CI/CD pipelines as quality gates.
Wk 12
Tracing and Observability: Langfuse and Arize Phoenix
Covers instrumenting LLM applications with tracing and observability using Langfuse and Arize Phoenix.
Wk 13
Observability at Scale: LangSmith, OpenLLMetry, and MLflow
Covers scaling observability practices using LangSmith, OpenLLMetry, and MLflow.
Wk 14
Prompt Management, Versioning, and Release Workflows
Covers prompt registries, versioning, immutability, and environment-based release workflows for prompts.
Wk 15
Production Monitoring: Drift Detection, Feedback Loops, Cost, and Latency
Covers production monitoring for model and data drift, feedback loops, cost, and latency.
Wk 16
Capstone Project & Course Review
Final capstone week: students design, build, and present a production-grade RAG/LLMOps system and review the course.
Capstone
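Weeks 2 and 4 contrast exact and approximate vector search. HNSW and IVF exist to avoid the brute-force scan below, which compares the query against every stored vector; the sketch is only the exact baseline that approximate-search recall is measured against, and it applies the metadata filter before scoring (pre-filtering). The record layout and the `cosine` and `filtered_search` helpers are hypothetical, not any vector database's API:

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filtered_search(query, records, where, top_k=2):
    """Exact nearest-neighbour search over records matching every `where` field."""
    # Pre-filter: keep only records whose metadata matches all conditions.
    candidates = [
        r for r in records
        if all(r["meta"].get(key) == value for key, value in where.items())
    ]
    # Score every surviving record; an ANN index would skip most of this work.
    scored = [(cosine(query, r["vector"]), r["id"]) for r in candidates]
    scored.sort(reverse=True)
    return [(doc_id, score) for score, doc_id in scored[:top_k]]


# Hypothetical 2-D "embeddings" with a department tag as metadata.
records = [
    {"id": "hr-1", "vector": [1.0, 0.0], "meta": {"dept": "hr"}},
    {"id": "hr-2", "vector": [0.6, 0.8], "meta": {"dept": "hr"}},
    {"id": "eng-1", "vector": [1.0, 0.1], "meta": {"dept": "eng"}},
]

print(filtered_search([1.0, 0.0], records, where={"dept": "hr"}))
```

Note that `eng-1` is nearly identical to the query yet never appears, because the filter removes it before scoring; how a database combines filtering with its index is one of the tradeoffs Week 4 examines.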
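Week 5's fixed-size strategy is the simplest chunking baseline. The sketch below splits on characters, an assumption made for brevity; production splitters usually count tokens and prefer sentence or paragraph boundaries, which is what recursive and semantic chunking add. `chunk_fixed` and its parameters are hypothetical names:

```python
def chunk_fixed(text, chunk_size=200, overlap=50):
    """Split text into fixed-size character windows that overlap.

    Consecutive chunks share `overlap` characters, so text cut at one
    boundary still appears intact in the neighbouring chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


print(chunk_fixed("abcdefghij", chunk_size=4, overlap=1))  # → ['abcd', 'defg', 'ghij']
```

Larger overlap improves the odds that a fact spanning a boundary is retrievable from one chunk, at the cost of more vectors to store and more duplicated context at generation time.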
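Week 11's quality gate reduces to comparing aggregate scores against thresholds and failing the build when any metric falls short. The sketch assumes per-sample scores have already been exported from an evaluation run as plain dictionaries; the metric names mirror RAGAS metric names, but the thresholds, the sample scores, and the `evaluate_gate` helper are hypothetical, and no RAGAS or DeepEval API is called:

```python
import statistics

# Hypothetical minimum mean scores; real thresholds are a team decision.
THRESHOLDS = {"faithfulness": 0.85, "answer_relevancy": 0.80}


def evaluate_gate(per_sample_scores, thresholds):
    """Return failure messages; an empty list means the gate passes."""
    failures = []
    for metric, minimum in thresholds.items():
        values = [row[metric] for row in per_sample_scores if metric in row]
        if not values:
            failures.append(f"{metric}: no scores reported")
            continue
        mean = statistics.fmean(values)
        if mean < minimum:
            failures.append(f"{metric}: mean {mean:.3f} < threshold {minimum:.2f}")
    return failures


# Stand-in for scores exported by an evaluation run (made-up numbers).
scores = [
    {"faithfulness": 0.92, "answer_relevancy": 0.88},
    {"faithfulness": 0.71, "answer_relevancy": 0.90},
]

failures = evaluate_gate(scores, THRESHOLDS)
for message in failures:
    print("FAIL", message)
exit_code = 1 if failures else 0  # a CI step would pass this to sys.exit()
```

Gating on the mean is one design choice; teams also gate on a minimum per-sample score or on regression relative to the main branch, so one bad answer cannot hide behind many good ones.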
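Week 14 pairs immutable prompt versions with environment-based release. A hypothetical in-memory registry (class and method names are illustrative, not the API of Langfuse, LangSmith, or MLflow) shows the core idea: registering a prompt always creates a new, frozen version, and environments such as `production` and `staging` are labels pointing at a version number, so promotion and rollback move a label instead of editing text:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptVersion:
    """One immutable prompt version; frozen=True blocks attribute reassignment."""
    name: str
    version: int
    template: str


class PromptRegistry:
    """Hypothetical registry: versions are append-only, labels are movable pointers."""

    def __init__(self):
        self._versions = {}  # name -> list of PromptVersion (index = version - 1)
        self._labels = {}    # (name, label) -> version number

    def register(self, name, template):
        history = self._versions.setdefault(name, [])
        pv = PromptVersion(name, len(history) + 1, template)
        history.append(pv)
        return pv

    def promote(self, name, version, label):
        if not 1 <= version <= len(self._versions.get(name, [])):
            raise KeyError(f"{name} has no version {version}")
        self._labels[(name, label)] = version

    def get(self, name, label):
        version = self._labels[(name, label)]
        return self._versions[name][version - 1]


registry = PromptRegistry()
registry.register("qa", "Answer using only the context:\n{context}\n\nQ: {question}")
registry.register("qa", "Cite a source for every claim.\n{context}\n\nQ: {question}")
registry.promote("qa", 1, "production")
registry.promote("qa", 2, "staging")
print(registry.get("qa", "production").version)  # → 1
```

Rolling back production is then a single `promote` call to an older version number, and every trace can record exactly which prompt version produced an answer.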
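Week 15's drift detection needs a number that says how far live traffic has moved from a baseline. The Population Stability Index (PSI) is one common choice, assumed here rather than taken from the course: it bins both samples and sums (a - e) · ln(a / e) over the bins, where e and a are the baseline and live proportions. The sketch uses equal-width bins over the baseline's range and made-up retrieval-similarity scores; the `psi` helper is hypothetical, and a frequently quoted rule of thumb treats PSI above 0.25 as a significant shift, though alert thresholds should be tuned per signal:

```python
import math


def psi(expected, actual, bins=5, eps=1e-6):
    """Population Stability Index between a baseline sample and a live sample.

    Bin edges come from the baseline's min/max; eps keeps empty bins from
    producing log(0) or division by zero.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # avoid zero width if the baseline is constant

    def proportions(sample):
        counts = [0] * bins
        for x in sample:
            idx = int((x - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp out-of-range values into edge bins
            counts[idx] += 1
        return [max(c / len(sample), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Made-up top-1 similarity scores: the live sample has shifted downward.
baseline_scores = [0.80, 0.82, 0.85, 0.86, 0.88, 0.90, 0.91, 0.93, 0.95, 0.97]
live_scores = [0.60, 0.65, 0.70, 0.72, 0.75, 0.80, 0.82, 0.85, 0.86, 0.88]

print(f"PSI = {psi(baseline_scores, live_scores):.2f}")
```

A falling similarity distribution like this one often means users are asking about content the index does not cover, which is a data-drift signal rather than a model regression.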
03 / tools

Tools & frameworks

Vector Databases
Qdrant · Milvus · Weaviate · pgvector
Embedding & Retrieval Libraries
Sentence Transformers · OpenAI Embeddings API · rank_bm25 · LangChain retrievers
Rerankers & Query Enhancement
Cohere Rerank · cross-encoder models (Hugging Face) · HyDE query expansion
RAG Evaluation Frameworks
RAGAS · DeepEval
LLM Observability & Tracing
Langfuse · Arize Phoenix · LangSmith · OpenLLMetry
MLOps & Experiment Tracking
MLflow · Weights & Biases
Low-Code Prototyping On-Ramps
n8n · Langflow · Flowise
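The tracing platforms listed above record each request as a tree of timed spans. A stdlib-only sketch of that idea follows; it is not any platform's SDK. A hypothetical `traced` decorator times each call and records its parent span through a `ContextVar`, producing the parent/child structure these tools visualize when you look for the slow step in a RAG pipeline. The pipeline functions are stand-ins:

```python
import functools
import time
import uuid
from contextvars import ContextVar

_current_span = ContextVar("current_span", default=None)
SPANS = []  # collected span records; a real exporter would send these to a backend


def traced(name):
    """Hypothetical decorator: record a timed span with a link to its parent."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            span_id = uuid.uuid4().hex[:8]
            parent = _current_span.get()
            token = _current_span.set(span_id)  # nested calls see this span as parent
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                elapsed_ms = (time.perf_counter() - start) * 1000
                SPANS.append({"name": name, "span_id": span_id,
                              "parent": parent, "ms": elapsed_ms})
                _current_span.reset(token)
        return wrapper
    return decorator


@traced("retrieve")
def retrieve(query):
    return ["chunk-1", "chunk-2"]  # stand-in for a vector search


@traced("generate")
def generate(query, chunks):
    return f"answer to {query!r} from {len(chunks)} chunks"  # stand-in for an LLM call


@traced("rag_pipeline")
def answer(query):
    return generate(query, retrieve(query))


answer("What is HNSW?")
for span in SPANS:
    print(span["name"], "parent:", span["parent"], f"{span['ms']:.3f} ms")
```

Spans are appended when they finish, so children appear before the root; platforms reassemble the tree from the parent IDs, which is why propagating context correctly matters more than the timing code itself.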

What this course trains you for

Data Scientists: $145,554 median
Computer Occupations, All Other: $138,203 median

CA median wages, 2024–34 projections (EDD/OEWS). See the full labor-market dashboard on the program overview.