// AIINFRA 202 · Semester 2

Model Adaptation — Fine-Tuning & Quantization

Customize and compress open-weight LLMs on consumer and cloud GPUs

This hands-on course teaches adult learners to adapt open-weight language models to specific tasks and domains without enterprise-scale hardware. Students learn to decide when to fine-tune versus prompt-engineer or use RAG, build and format instruction datasets, and run supervised and parameter-efficient fine-tuning (LoRA and QLoRA via the PEFT library) plus preference tuning with DPO, using the Hugging Face stack, Unsloth, and Axolotl. The course also covers quantization formats (GGUF, AWQ, GPTQ, FP8), KV-cache quantization, and knowledge distillation to shrink models for deployment. Learners evaluate adapted models, guard against overfitting and catastrophic forgetting, and address the licensing and ethics of model weights and training data.

Contact hours: 54 hrs
Credit equivalent: 3-unit
Prerequisite: AIINFRA 201
Length: 16 weeks
01 / outcomes

Outcomes

Course objectives

  1. Evaluate a task and choose correctly among prompt engineering, retrieval-augmented generation, and fine-tuning, justifying the decision by cost, data availability, and hardware limits.
  2. Build, clean, and format instruction and chat datasets and run supervised and parameter-efficient fine-tuning (LoRA and QLoRA via the PEFT library) on a single consumer or cloud GPU.
  3. Apply preference tuning with DPO and track experiments to improve a model while avoiding overfitting and catastrophic forgetting.
  4. Quantize models into GGUF, AWQ, GPTQ, and FP8 formats and analyze the resulting precision, quality, and memory tradeoffs.
  5. Assess licensing and ethical constraints on model weights and training data and package an adapted, quantized model for serving.

Student learning outcomes

  • Choose correctly among prompt engineering, RAG, and fine-tuning for a given task, cost, and hardware budget.
  • Build instruction/chat datasets and run SFT and parameter-efficient fine-tuning (LoRA/QLoRA via PEFT) on a single GPU.
  • Apply DPO preference tuning and track experiments while avoiding overfitting and catastrophic forgetting.
  • Quantize models to GGUF, AWQ, GPTQ, and FP8 and analyze precision, quality, and memory tradeoffs.
  • Evaluate model-weight and training-data licensing/ethics and package an adapted, quantized model for serving.
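
As a rough illustration of why the parameter-efficient methods named in the outcomes above fit on a single GPU, the sketch below counts the trainable parameters a LoRA adapter adds to one linear layer. It assumes the standard low-rank formulation (a frozen weight plus a learned update B @ A); the function names and the 4096-wide layer are hypothetical values chosen for illustration, not figures from the course.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in linear layer.

    Assumes the usual formulation: A has shape (rank, d_in) and
    B has shape (d_out, rank); the original weight stays frozen.
    """
    return rank * (d_in + d_out)


def full_param_count(d_in: int, d_out: int) -> int:
    """Parameters updated when the same layer is fully fine-tuned (no bias)."""
    return d_in * d_out


if __name__ == "__main__":
    # Hypothetical layer size and rank, for illustration only.
    d_in, d_out, rank = 4096, 4096, 16
    lora = lora_param_count(d_in, d_out, rank)
    full = full_param_count(d_in, d_out)
    print(f"LoRA trainable params: {lora:,}")
    print(f"Full fine-tune params: {full:,}")
    print(f"Fraction trained:      {lora / full:.4%}")
```

The count covers the adapter weights only. Optimizer state, activations, and the frozen base weights still occupy memory, which is where QLoRA's quantized base model helps.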
02 / schedule

16-week schedule

Wk 01
Adapt, Prompt, or Retrieve: The Model Customization Decision
Introduces the canonical decision sequence — prompt engineering, then RAG, then fine-tuning — and when each is justified by cost, data, and hardware.
Wk 02
Open Models, Licenses, and the Fine-Tuning Toolchain Setup
Surveys open-weight model families and licenses and sets up the fine-tuning toolchain used throughout the course.
Wk 03
Dataset Design and Data Preparation for Fine-Tuning
Covers designing, cleaning, and preparing datasets suitable for supervised fine-tuning.
Wk 04
Instruction and Chat Formatting, Tokenization, and Templates
Covers instruction and chat formatting conventions, tokenization, and chat templates used to prepare training data.
Wk 05
Supervised Fine-Tuning Fundamentals with Hugging Face TRL
Introduces supervised fine-tuning fundamentals using the Hugging Face TRL library.
Wk 06
Parameter-Efficient Fine-Tuning: LoRA and PEFT
Covers parameter-efficient fine-tuning methods, focusing on LoRA and the Hugging Face PEFT library.
Wk 07
QLoRA and Training on Consumer GPUs with Unsloth
Covers QLoRA and using Unsloth to train models efficiently on consumer-grade GPUs.
Wk 08
Axolotl, Config-Driven Training, and Experiment Tracking
Midterm week: covers config-driven training with Axolotl and experiment tracking practices.
Midterm · covers Wks 1–7
Wk 09
Preference Tuning: DPO in Practice (with a Look at GRPO)
Covers Direct Preference Optimization in practice and contrasts it with GRPO's group-relative reward approach.
Wk 10
Evaluating Fine-Tuned Models and Detecting Overfitting
Covers evaluation methods for fine-tuned models and techniques for detecting overfitting.
Wk 11
Catastrophic Forgetting, Adapter Merging, and Cloud Training
Covers catastrophic forgetting, merging trained adapters into base models, and moving training workloads to the cloud.
Wk 12
Quantization Foundations: Precision, GGUF, and llama.cpp
Introduces quantization foundations, numeric precision tradeoffs, the GGUF format, and llama.cpp.
Wk 13
Advanced Quantization: AWQ, GPTQ, FP8, and KV-Cache Quantization
Covers advanced quantization methods including AWQ, GPTQ, FP8, and KV-cache quantization.
Wk 14
Knowledge Distillation and Quality-Size Tradeoff Analysis
Covers knowledge distillation techniques and analyzing quality-versus-size tradeoffs for compressed models.
Wk 15
Serving Adapted and Quantized Models — Handoff to AIINFRA 201
Covers serving fine-tuned, quantized models and preparing a handoff document for production serving in AIINFRA 201.
Wk 16
Capstone Project & Course Review
Final capstone week: students package and present an adapted, quantized model ready for deployment.
Capstone
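
The precision and memory tradeoffs studied in Wks 12 and 13 come down to simple arithmetic on bits per weight. The sketch below is a back-of-the-envelope estimate under stated assumptions: it counts weight storage only, uses decimal gigabytes, and treats every weight as exactly the nominal bit width. The function name and the 7-billion-parameter model size are hypothetical.

```python
def weight_memory_gb(num_params: int, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal GB (1 GB = 1e9 bytes).

    Assumption: ignores quantization scales and zero-points, the KV cache,
    activations, and runtime overhead, so real footprints are higher.
    """
    return num_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    # Hypothetical 7B-parameter model, for illustration only.
    num_params = 7_000_000_000
    for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
        gb = weight_memory_gb(num_params, bits)
        print(f"{label:>6}: {gb:.1f} GB of weights")
```

Real quantized formats store per-block scale metadata alongside the weights, so their effective bits per weight is somewhat above the nominal figure. Treat these numbers as a lower bound when sizing a GPU.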
03 / tools

Tools & frameworks

Fine-tuning libraries
Hugging Face Transformers, PEFT, TRL
Accelerated training
Unsloth, Axolotl, bitsandbytes
Quantization
llama.cpp/GGUF, AutoAWQ, GPTQModel, llm-compressor (FP8)
Experiment tracking
Weights & Biases, TensorBoard, Hugging Face Hub
Compute platforms
Google Colab, Kaggle Notebooks, RunPod, Lambda Cloud
Datasets and models
Hugging Face Datasets, Hub model cards, meta-llama/Qwen/Mistral/Gemma open weights
Evaluation
lm-evaluation-harness, LLM-as-judge, custom held-out test sets
Serving handoff
Ollama, vLLM, LM Studio, text-generation-inference

What this course trains you for

Data Scientists: $145,554 median
Computer Occupations, All Other: $138,203 median

CA median wages, 2024–34 projections (EDD/OEWS). See the full labor-market dashboard on the program overview.