// AIINFRA 200 · Semester 2
Cloud Platforms for AI & CI/CD for ML
Deploy and automate AI workloads on AWS, Azure, and Google Cloud
This course prepares students to deploy, automate, and operate AI/ML workloads on major cloud platforms. Students provision cloud infrastructure across AWS, Azure, and Google Cloud, then build CI/CD pipelines with GitHub Actions that automatically test, build, and deploy containerized model services. The course closes with MLOps foundations (experiment tracking, model registries, and infrastructure as code) and culminates in an end-to-end, automated, cost-aware deployment pipeline for a machine learning application.
01 / outcomes
Learning outcomes
- Provision and secure cloud resources (accounts, IAM, networking, cost controls) for AI workloads across AWS, Azure, and Google Cloud.
- Deploy AI workloads using cloud compute, GPU instances, object storage, and managed AI/ML services.
- Build CI/CD pipelines with GitHub Actions that test, build, and deploy containerized model services automatically.
- Apply MLOps foundations — experiment tracking, model registry, and reproducible pipelines — with MLflow and Terraform.
- Monitor, right-size, and cost-optimize cloud AI deployments, choosing appropriate services for a given use case.
02 / schedule
16-week schedule
Wk 01
Cloud Computing for AI: Providers, Regions, and Free Tiers
Introduces AWS, Azure, and Google Cloud free-tier structures, and the distinction between cloud regions and Availability Zones.
Wk 02
Cloud Accounts, IAM, and Cost Guardrails
Covers least-privilege IAM, multi-account guardrails, and configuring AWS Budgets to control cloud spending.
Wk 03
Cloud Compute for AI: VMs, GPU Instances, and Spot Pricing
Compares On-Demand and Spot pricing for VM and GPU instances and teaches checkpointing strategies for interruptible training.
Wk 04
Cloud Storage and Data for AI: Object Storage and Buckets
Covers S3 buckets, storage classes, lifecycle rules, versioning, and security defaults for AI data storage.
Wk 05
Managed AI/ML Services: SageMaker, Vertex AI, and Azure ML
Compares the architecture and philosophy of SageMaker AI, Vertex AI (Gemini Enterprise Agent Platform), and Azure Machine Learning.
Wk 06
Serverless and Managed Kubernetes in the Cloud
Covers AWS Lambda serverless compute, managed Kubernetes options (EKS, GKE Autopilot), and KServe model serving.
Wk 07
Infrastructure as Code for Cloud AI with Terraform
Teaches the Terraform workflow, state management, and provisioning free-tier cloud AI infrastructure as code.
Wk 08
Cloud Security, Networking, and Secrets
Covers security groups vs. network ACLs, secrets management, and GitHub Actions OIDC authentication; this week includes the course midterm.
Midterm · covers Wks 1–7
Wk 09
Introduction to CI/CD and GitHub Actions
Introduces Continuous Integration, Delivery, and Deployment concepts and building a first GitHub Actions workflow.
Wk 10
Continuous Integration for ML: Testing and Image Builds
Extends CI with data-quality and model-evaluation gates, and builds reproducible Docker image pipelines.
Wk 11
Continuous Delivery for ML: Automated Model Deployment
Covers blue-green vs. canary deployment strategies and building tested rollback plans for model releases.
Wk 12
MLOps Foundations: Experiment Tracking and Model Registry with MLflow
Introduces MLflow Tracking and Model Registry for logging experiments and governing model versions.
Wk 13
End-to-End Automated Model Deployment Pipelines
Chains CI/CD and MLOps tools into a full automated pipeline from code commit to monitored production deployment.
Wk 14
Monitoring and Cost Optimization for Cloud AI
Covers CloudWatch metrics, logs, and alarms alongside cost optimization levers for cloud AI workloads.
Wk 15
Multi-Cloud, Portability, and GitOps
Covers GitOps principles, ArgoCD vs. Flux, and the real costs and limits of multi-cloud portability.
Wk 16
Capstone Project & Course Review
Students design and partially build an end-to-end cloud AI system as a final capstone project and reflect on the course.
Capstone
03 / tools
Tools & frameworks
Cloud Platforms
Compute & Storage
Managed AI/ML Services
CI/CD
MLOps
IaC, Orchestration & Monitoring