// AIINFRA 202 · Semester 2

Model Adaptation — Fine-Tuning & Quantization

Customize and compress open-weight LLMs on consumer and cloud GPUs

This hands-on course teaches adult learners to adapt open-weight language models to specific tasks and domains without enterprise-scale hardware. Students learn to decide when to fine-tune versus prompt-engineer or use RAG, build and format instruction datasets, and run supervised and parameter-efficient fine-tuning (LoRA, QLoRA, PEFT) plus preference tuning with DPO using Hugging Face, Unsloth, and Axolotl. The course also covers quantization formats (GGUF, AWQ, GPTQ, FP8), KV-cache quantization, and knowledge distillation to shrink models for deployment. Learners evaluate adapted models, guard against overfitting and catastrophic forgetting, and address the licensing and ethics of model weights and training data.

Contact hours: 54 hrs

Credit equivalent: 3-unit

Prerequisite: AIINFRA 201

Length: 16 weeks

01 / outcomes

Outcomes

Course objectives

Evaluate a task and choose correctly among prompt engineering, retrieval-augmented generation, and fine-tuning, justifying the decision by cost, data availability, and hardware limits.
Build, clean, and format instruction and chat datasets and run supervised and parameter-efficient fine-tuning (LoRA, QLoRA, PEFT) on a single consumer or cloud GPU.
Apply preference tuning with DPO and track experiments to improve a model while avoiding overfitting and catastrophic forgetting.
Quantize models into GGUF, AWQ, GPTQ, and FP8 formats and analyze the resulting precision, quality, and memory tradeoffs.
Assess licensing and ethical constraints on model weights and training data and package an adapted, quantized model for serving.

Student learning outcomes

Choose correctly among prompt engineering, RAG, and fine-tuning for a given task, cost, and hardware budget.
Build instruction/chat datasets and run SFT and parameter-efficient fine-tuning (LoRA/QLoRA/PEFT) on a single GPU.
Apply DPO preference tuning and track experiments while avoiding overfitting and catastrophic forgetting.
Quantize models to GGUF, AWQ, GPTQ, and FP8 and analyze precision, quality, and memory tradeoffs.
Evaluate model-weight and training-data licensing/ethics and package an adapted, quantized model for serving.

02 / schedule

16-week schedule

Wk 01

Adapt, Prompt, or Retrieve: The Model Customization Decision

Introduces the canonical decision sequence — prompt engineering, then RAG, then fine-tuning — and when each is justified by cost, data, and hardware.

Wk 02

Open Models, Licenses, and the Fine-Tuning Toolchain Setup

Surveys open-weight model families and licenses and sets up the fine-tuning toolchain used throughout the course.

Wk 03

Dataset Design and Data Preparation for Fine-Tuning

Covers designing, cleaning, and preparing datasets suitable for supervised fine-tuning.

Wk 04

Instruction and Chat Formatting, Tokenization, and Templates

Covers instruction and chat formatting conventions, tokenization, and chat templates used to prepare training data.

Wk 05

Supervised Fine-Tuning Fundamentals with Hugging Face TRL

Introduces supervised fine-tuning fundamentals using the Hugging Face TRL library.

Wk 06

Parameter-Efficient Fine-Tuning: LoRA and PEFT

Covers parameter-efficient fine-tuning methods, focusing on LoRA and the Hugging Face PEFT library.
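
As a rough illustration of why LoRA is called parameter-efficient, the sketch below counts trainable parameters for a single weight matrix. It assumes the standard low-rank formulation, where a frozen matrix of shape d_out x d_in is paired with two trainable factors of rank r. The 4096 x 4096 layer size and rank 8 are illustrative values, not figures from the course.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters in one LoRA adapter.

    A has shape (rank x d_in) and B has shape (d_out x rank),
    so the adapter adds rank * (d_in + d_out) parameters.
    """
    return rank * (d_in + d_out)


def full_params(d_in: int, d_out: int) -> int:
    """Parameters in the frozen dense weight matrix."""
    return d_in * d_out


# Illustrative (assumed) sizes: one square projection matrix, rank-8 adapter.
d_in = d_out = 4096
rank = 8

lora = lora_trainable_params(d_in, d_out, rank)
full = full_params(d_in, d_out)
fraction = lora / full

print(f"LoRA params: {lora:,} of {full:,} ({fraction:.2%})")
```

Under these assumptions the adapter trains well under one percent of the weights in that matrix, which is what makes single-GPU fine-tuning practical.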

Wk 07

QLoRA and Training on Consumer GPUs with Unsloth

Covers QLoRA and using Unsloth to train models efficiently on consumer-grade GPUs.

Wk 08

Axolotl, Config-Driven Training, and Experiment Tracking

Midterm week: covers config-driven training with Axolotl and experiment tracking practices.

Midterm · covers Wks 1–7

Wk 09

Preference Tuning: DPO in Practice (with a Look at GRPO)

Covers Direct Preference Optimization in practice and contrasts it with GRPO's group-relative reward approach.
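
For orientation, the DPO objective for a single preference pair can be sketched in a few lines of plain Python. This is a minimal illustration of the published loss, not the TRL implementation. The function name, argument names, and the default beta of 0.1 are assumptions chosen for the example, and the inputs are summed log-probabilities of each response under the policy and the frozen reference model.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp: float,
             policy_rejected_logp: float,
             ref_chosen_logp: float,
             ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one (chosen, rejected) pair.

    loss = -log sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r)))
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(z)) equals softplus(-z).
    return softplus(-logits)


# Policy identical to the reference: the loss is log(2).
baseline = dpo_loss(-10.0, -12.0, -10.0, -12.0)

# Policy raises the chosen response and lowers the rejected one: loss drops.
improved = dpo_loss(-8.0, -14.0, -10.0, -12.0)

print(baseline, improved)
```

The loss falls only when the policy widens the gap between chosen and rejected responses relative to the reference, which is how DPO avoids training a separate reward model.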

Wk 10

Evaluating Fine-Tuned Models and Detecting Overfitting

Covers evaluation methods for fine-tuned models and techniques for detecting overfitting.

Wk 11

Catastrophic Forgetting, Adapter Merging, and Cloud Training

Covers catastrophic forgetting, merging trained adapters into base models, and moving training workloads to the cloud.

Wk 12

Quantization Foundations: Precision, GGUF, and llama.cpp

Introduces quantization foundations, numeric precision tradeoffs, the GGUF format, and llama.cpp.
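
The precision tradeoff can be seen in a toy example. The sketch below applies symmetric absmax quantization to 8-bit integers with one scale for the whole list. This is a generic textbook scheme used here for illustration only. GGUF's actual formats quantize in blocks with their own layouts, and the weight values are made up.

```python
def quantize_int8(values):
    """Symmetric absmax quantization to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    # Guard against an all-zero input, where any scale works.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map integers back to approximate floating-point values."""
    return [q * scale for q in quantized]


# Made-up weights for illustration.
weights = [0.12, -0.5, 0.33, 0.0, -0.07]

q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print(q)
print(f"scale={scale:.6f}  max round-trip error={max_error:.6f}")
```

The round-trip error is bounded by half the scale, so a single large outlier weight inflates the error for every other value sharing that scale. That is one motivation for block-wise formats.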

Wk 13

Advanced Quantization: AWQ, GPTQ, FP8, and KV-Cache Quantization

Covers advanced quantization methods including AWQ, GPTQ, FP8, and KV-cache quantization.
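
A back-of-envelope estimate helps when comparing formats: weight memory is roughly parameter count times bits per weight, divided by eight. The sketch below assumes a hypothetical 7-billion-parameter model. It ignores per-group scales and zero points, activations, and the KV cache, so real footprints will be somewhat larger.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Hypothetical model size, chosen only for illustration.
num_params = 7e9

estimates = {
    bits: weight_memory_gb(num_params, bits)
    for bits in (16, 8, 4)
}

for bits, gb in estimates.items():
    print(f"{bits:>2}-bit weights: ~{gb:.1f} GB")
```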

Wk 14

Knowledge Distillation and Quality-Size Tradeoff Analysis

Covers knowledge distillation techniques and analyzing quality-versus-size tradeoffs for compressed models.
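
One common form of knowledge distillation trains the student to match the teacher's temperature-softened output distribution. The sketch below computes that soft-target term for a single token position in plain Python. The logits and the temperature of 2.0 are made-up values, and the specific methods taught in the course may differ.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax over logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_kl(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return temperature ** 2 * kl


# Made-up logits over a three-token vocabulary.
teacher = [4.0, 1.0, 0.5]
student = [2.5, 1.5, 0.2]

matched = distillation_kl(teacher, teacher)
mismatched = distillation_kl(teacher, student)

print(matched, mismatched)
```

The term is zero when the student reproduces the teacher's distribution and grows as the two diverge. In practice it is usually combined with the ordinary cross-entropy loss on ground-truth labels.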

Wk 15

Serving Adapted and Quantized Models — Handoff to AIINFRA 201

Covers serving fine-tuned, quantized models and preparing a handoff document for production serving in AIINFRA 201.

Wk 16

Capstone Project & Course Review

Final capstone week: students package and present an adapted, quantized model ready for deployment.

Capstone

03 / tools

Tools & frameworks

Fine-tuning libraries

Hugging Face Transformers, PEFT, TRL

Accelerated training

Unsloth, Axolotl, bitsandbytes

Quantization

llama.cpp/GGUF, AutoAWQ, GPTQModel, llm-compressor (FP8)

Experiment tracking

Weights & Biases, TensorBoard, Hugging Face Hub

Compute platforms

Google Colab, Kaggle Notebooks, RunPod, Lambda Cloud

Datasets and models

Hugging Face Datasets, Hub model cards, meta-llama/Qwen/Mistral/Gemma open weights

Evaluation

lm-evaluation-harness, LLM-as-judge, custom held-out test sets

Serving handoff

Ollama, vLLM, LM Studio, text-generation-inference
