// AIINFRA 200 · Semester 2

Cloud Platforms for AI & CI/CD for ML

Deploy and automate AI workloads on AWS, Azure, and Google Cloud

This course prepares students to deploy, automate, and operate AI/ML workloads on major cloud platforms. Students will provision cloud infrastructure across AWS, Azure, and Google Cloud, then build CI/CD pipelines with GitHub Actions to automatically test, build, and deploy containerized model services. The course concludes with MLOps foundations — experiment tracking, model registries, and infrastructure-as-code — culminating in an end-to-end, automated, cost-aware deployment pipeline for a machine learning application.

Contact hours: 54 hrs
Credit equivalent: 3 units
Prerequisite: AIINFRA 102
Length: 16 weeks
01 / outcomes

Learning outcomes

  1. Provision and secure cloud resources (accounts, IAM, networking, cost controls) for AI workloads across AWS, Azure, and Google Cloud.
  2. Deploy AI workloads using cloud compute, GPU instances, object storage, and managed AI/ML services.
  3. Build CI/CD pipelines with GitHub Actions that test, build, and deploy containerized model services automatically.
  4. Apply MLOps foundations — experiment tracking, model registry, and reproducible pipelines — with MLflow and Terraform.
  5. Monitor, right-size, and cost-optimize cloud AI deployments, choosing appropriate services for a given use case.
02 / schedule

16-week schedule

Wk 01
Cloud Computing for AI: Providers, Regions, and Free Tiers
Introduces AWS, Azure, and Google Cloud free-tier structures, and the distinction between cloud regions and Availability Zones.
Wk 02
Cloud Accounts, IAM, and Cost Guardrails
Covers least-privilege IAM, multi-account guardrails, and configuring AWS Budgets to control cloud spending.
Wk 03
Cloud Compute for AI: VMs, GPU Instances, and Spot Pricing
Compares On-Demand, Spot, and GPU instance pricing models and teaches checkpointing strategies for interruptible training.
Wk 04
Cloud Storage and Data for AI: Object Storage and Buckets
Covers S3 buckets, storage classes, lifecycle rules, versioning, and security defaults for AI data storage.
Wk 05
Managed AI/ML Services: SageMaker, Vertex AI, and Azure ML
Compares the architecture and philosophy of SageMaker AI, Vertex AI (Gemini Enterprise Agent Platform), and Azure Machine Learning.
Wk 06
Serverless and Managed Kubernetes in the Cloud
Covers AWS Lambda serverless compute, managed Kubernetes options (EKS, GKE Autopilot), and KServe model serving.
Wk 07
Infrastructure as Code for Cloud AI with Terraform
Teaches the Terraform workflow, state management, and provisioning free-tier cloud AI infrastructure as code.
Wk 08
Cloud Security, Networking, and Secrets
Covers security groups vs. network ACLs, secrets management, and GitHub Actions OIDC authentication; this week includes the course midterm.
Midterm · covers Wks 1–7
Wk 09
Introduction to CI/CD and GitHub Actions
Introduces Continuous Integration, Delivery, and Deployment concepts and building a first GitHub Actions workflow.
Wk 10
Continuous Integration for ML: Testing and Image Builds
Extends CI with data-quality and model-evaluation gates, and builds reproducible Docker image pipelines.
Wk 11
Continuous Delivery for ML: Automated Model Deployment
Covers blue-green vs. canary deployment strategies and building tested rollback plans for model releases.
Wk 12
MLOps Foundations: Experiment Tracking and Model Registry with MLflow
Introduces MLflow Tracking and Model Registry for logging experiments and governing model versions.
Wk 13
End-to-End Automated Model Deployment Pipelines
Chains CI/CD and MLOps tools into a full automated pipeline from code commit to monitored production deployment.
Wk 14
Monitoring and Cost Optimization for Cloud AI
Covers CloudWatch metrics, logs, and alarms alongside cost optimization levers for cloud AI workloads.
Wk 15
Multi-Cloud, Portability, and GitOps
Covers GitOps principles, ArgoCD vs. Flux, and the real costs and limits of multi-cloud portability.
Wk 16
Capstone Project & Course Review
Students design and partially build an end-to-end cloud AI system as a final capstone project and reflect on the course.
Capstone
03 / tools

Tools & frameworks

Cloud Platforms
AWS · Microsoft Azure · Google Cloud
Compute & Storage
EC2 / Compute Engine / Azure VMs · GPU instances (G5, P5, Trainium3) · S3 / GCS / Blob Storage
Managed AI/ML Services
AWS SageMaker · Google Vertex AI (Gemini Enterprise Agent Platform) · Azure Machine Learning
CI/CD
GitHub Actions · act (local runner) · Docker container image build/push · GitHub Container Registry (GHCR)
MLOps
MLflow (tracking + model registry) · Great Expectations · Docker
IaC, Orchestration & Monitoring
Terraform · LocalStack · Kubernetes (EKS, GKE Autopilot, Minikube, kind) · ArgoCD / Flux · Prometheus/Grafana · CloudWatch

What this course trains you for

Software Developers: $179,292 median
Computer Occupations, All Other: $138,203 median
Computer & Information Systems Managers: $221,952 median

CA median wages, 2024–34 projections (EDD/OEWS). See the full labor-market dashboard on the program overview.