
// AIINFRA 102 · Semester 1

Containerization & GPU-Aware Orchestration

Package, deploy, and scale AI workloads with Docker and Kubernetes

This course teaches students to package, run, and orchestrate AI and Python applications using industry-standard containerization and orchestration tools. Students progress from building Docker images and GPU-enabled containers to deploying, scaling, and monitoring multi-container AI workloads on Kubernetes, including GPU-aware scheduling with the NVIDIA GPU Operator. By the end of the course, students can package a containerized AI application with Helm, provision its infrastructure with Terraform, and monitor it in production with Prometheus and Grafana.

Contact hours: 54 hrs
Credit equivalent: 3-unit
Prerequisite: AIINFRA 101
Length: 16 weeks
01 / outcomes

Learning outcomes

  1. Explain containerization and build optimized Docker images for AI/Python applications.
  2. Compose and run multi-container AI stacks locally, including GPU-enabled containers via the NVIDIA Container Toolkit.
  3. Deploy and manage AI workloads on Kubernetes using pods, deployments, services, config, and persistent storage.
  4. Configure resource limits, autoscaling, and GPU-aware scheduling with the NVIDIA GPU Operator.
  5. Package, provision, monitor, and troubleshoot containerized AI systems using Helm, Terraform, and Prometheus/Grafana.
02 / schedule

16-week schedule

Wk 01
Why Containers for AI: VMs, Images, and the Docker Model
Introduces containerization concepts, contrasts virtual machines with containers and images, and presents the Docker model for packaging AI apps.
Wk 02
Docker Fundamentals: Running and Managing Containers
Covers the container lifecycle: running, inspecting, and managing Docker containers, executing commands inside them, and applying resource and port controls.
Wk 03
Building AI Container Images with Dockerfiles
Teaches Dockerfile fundamentals, layer caching, multi-stage builds, and security practices for containerizing AI inference apps.
Wk 04
Docker Compose and Multi-Container AI Applications
Uses Docker Compose to define and run multi-container AI application stacks locally.
Wk 05
GPU Containers and the NVIDIA Container Toolkit
Covers running GPU-enabled containers using the NVIDIA Container Toolkit for AI workloads.
Wk 06
Container Registries, Tagging, and Image Security
Covers pushing images to registries, applying a tag strategy based on semantic versioning, and scanning images for vulnerabilities with Trivy.
Wk 07
Introduction to Kubernetes: Architecture and kubectl
Introduces Kubernetes architecture and the kubectl command-line tool for interacting with a cluster.
Wk 08
Kubernetes Workloads: Deployments, Services, and Config
Covers Kubernetes Deployments, Services, and configuration objects. This week includes the course midterm.
Midterm · covers Wks 1–7
Wk 09
Deploying an AI Inference App to Kubernetes
Applies Kubernetes fundamentals to deploy a real AI inference application onto a cluster.
Wk 10
Scaling and Resource Management with the Horizontal Pod Autoscaler
Covers resource requests/limits and autoscaling AI workloads with the Kubernetes Horizontal Pod Autoscaler.
Wk 11
GPU-Aware Kubernetes with the NVIDIA GPU Operator
Covers GPU-aware scheduling in Kubernetes using the NVIDIA GPU Operator and device plugin.
Wk 12
Storage, Volumes, and Persistence for AI Models and Data
Covers Kubernetes storage, volumes, and persistence strategies for AI models and datasets.
Wk 13
Packaging Kubernetes Apps with Helm
Teaches packaging and deploying Kubernetes applications as reusable charts with Helm.
Wk 14
Infrastructure as Code with Terraform
Introduces Terraform's declarative model — providers, resources, and state — to provision a local Kubernetes cluster and app as code.
Wk 15
Monitoring, Logging, and Troubleshooting Containerized AI
Covers installing the kube-prometheus-stack, reading GPU metrics via DCGM Exporter, and troubleshooting failing pods with kubectl.
Wk 16
Capstone Project & Course Review
Students design, build, and present a final capstone project demonstrating mastery of the course's containerization and orchestration outcomes.
Capstone
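
As a taste of the Wk 10 material, the scaling rule documented for the Kubernetes Horizontal Pod Autoscaler can be sketched in a few lines of Python. This is an illustrative sketch, not course code: the function name `desired_replicas` and its parameters are hypothetical, and the 0.1 tolerance and the replica bounds are assumed defaults chosen for the example.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Sketch of the HPA rule: ceil(currentReplicas * currentMetric / targetMetric).

    All names and default values here are assumptions for illustration.
    """
    ratio = current_metric / target_metric
    # Skip scaling when the metric is already close enough to the target.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp the result to the configured replica bounds.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 4 pods averaging 90% CPU against a 50% target.
    print(desired_replicas(4, 90, 50))
```

With 4 replicas averaging 90% CPU against a 50% target, the ratio is 1.8, so the sketch asks for 8 replicas. A reading of 52% falls inside the tolerance band and leaves the count unchanged.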
03 / tools

Tools & frameworks

Containers
Docker, Docker Compose, NVIDIA Container Toolkit
Orchestration
kubectl, minikube/kind, Deployments/Services/Ingress, Horizontal Pod Autoscaler
GPU Scheduling
NVIDIA GPU Operator, Device plugin, GPU resource requests, DCGM Exporter
Packaging & Infrastructure as Code
Helm, Terraform
Registries & Security
Docker Hub, GitHub Container Registry, Trivy image scanning
Monitoring
Prometheus, Grafana, Liveness/readiness probes
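
To illustrate the tagging topic from Wk 06 with the registries listed above, here is one common convention sketched in Python: a single release is published under its full version plus floating minor and major tags. The course page does not specify a scheme, so this layout is an assumption, and the helper name `image_tags` and the repository `ghcr.io/example/inference-api` are hypothetical.

```python
def image_tags(version, include_latest=True):
    """Return the tags one release might be pushed under.

    Assumes a plain MAJOR.MINOR.PATCH version, optionally prefixed with "v".
    This is one common convention, not a rule from the course or any registry.
    """
    if version.startswith("v"):
        version = version[1:]
    parts = version.split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {version!r}")
    major, minor, patch = parts
    # Most specific tag first, then the floating minor and major tags.
    tags = [f"{major}.{minor}.{patch}", f"{major}.{minor}", major]
    if include_latest:
        tags.append("latest")
    return tags


if __name__ == "__main__":
    repo = "ghcr.io/example/inference-api"  # hypothetical repository name
    for tag in image_tags("v1.4.2"):
        print(f"{repo}:{tag}")
```

Pinning deployments to the full `1.4.2` tag keeps rollouts reproducible, while the floating tags suit local experimentation.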

What this course trains you for

Computer Network Architects: $163,317 median
Network & Computer Systems Administrators: $109,420 median
Software Developers: $179,292 median

California median wages, 2024–34 projections (EDD/OEWS). See the full labor-market dashboard on the program overview.