// AIINFRA 202 · Semester 2
Model Adaptation — Fine-Tuning & Quantization
Customize and compress open-weight LLMs on consumer and cloud GPUs
This hands-on course teaches adult learners to adapt open-weight language models to specific tasks and domains without enterprise-scale hardware. Students learn to decide when to fine-tune rather than rely on prompt engineering or retrieval-augmented generation (RAG), build and format instruction datasets, and run supervised fine-tuning, parameter-efficient fine-tuning (PEFT) with LoRA and QLoRA, and preference tuning with DPO, using Hugging Face, Unsloth, and Axolotl. The course also covers quantization methods and formats (GGUF, AWQ, GPTQ, FP8), KV-cache quantization, and knowledge distillation to shrink models for deployment. Learners evaluate adapted models, guard against overfitting and catastrophic forgetting, and address the licensing and ethics of model weights and training data.
Outcomes
Course objectives
- Evaluate a task and choose correctly among prompt engineering, retrieval-augmented generation, and fine-tuning, justifying the decision by cost, data availability, and hardware limits.
- Build, clean, and format instruction and chat datasets, then run supervised fine-tuning and parameter-efficient fine-tuning (PEFT) with LoRA and QLoRA on a single consumer or cloud GPU.
- Apply preference tuning with DPO and track experiments to improve a model while avoiding overfitting and catastrophic forgetting.
- Quantize models into GGUF, AWQ, GPTQ, and FP8 formats and analyze the resulting precision, quality, and memory tradeoffs.
- Assess licensing and ethical constraints on model weights and training data and package an adapted, quantized model for serving.
Student learning outcomes
- Choose correctly among prompt engineering, RAG, and fine-tuning for a given task, cost, and hardware budget.
- Build instruction/chat datasets and run SFT and parameter-efficient fine-tuning (PEFT) with LoRA/QLoRA on a single GPU.
- Apply DPO preference tuning and track experiments while avoiding overfitting and catastrophic forgetting.
- Quantize models to GGUF, AWQ, GPTQ, and FP8 and analyze precision, quality, and memory tradeoffs.
- Evaluate model-weight and training-data licensing/ethics and package an adapted, quantized model for serving.