Skip to content

AWS SageMaker — Machine Learning Platform

AWS SageMaker — Machine Learning Platform

Amazon SageMaker is AWS’s fully managed end-to-end machine learning platform. It covers every step of the ML lifecycle — from data preparation and model training to deployment and monitoring.

In Azure terms: SageMaker ≈ Azure Machine Learning (Azure ML)

SageMaker Components

Component	Description
Studio	Web-based IDE for ML (notebooks, pipelines, experiments)
Notebooks	Managed Jupyter notebooks with pre-built ML environments
Training Jobs	Run model training on managed compute (CPU/GPU clusters)
Endpoints	Deploy models as real-time HTTP endpoints
Batch Transform	Run inference on large datasets (no live endpoint)
Pipelines	MLOps workflow orchestration
Feature Store	Centralized storage and reuse of ML features
Model Registry	Version, track, and approve models before deployment
Data Wrangler	Visual data preparation and transformation
Clarify	Bias detection and model explainability
Experiments	Track model training runs and hyperparameter comparisons
Canvas	No-code ML for business analysts
Bedrock	Separate service — managed access to foundation models (LLMs)

SageMaker ML Workflow

Data Prep (Data Wrangler / S3)
    ↓
Feature Engineering (Feature Store)
    ↓
Model Training (Training Job — EC2 GPU/CPU)
    ↓
Model Evaluation (Clarify, Experiments)
    ↓
Model Registry (versioning + approval)
    ↓
Deployment (Real-time Endpoint / Batch Transform)
    ↓
Monitoring (Model Monitor — drift detection)

Training Jobs

SageMaker provisions compute, trains your model, and terminates the instance — you pay only for training time:

import sagemaker
from sagemaker.estimator import Estimator

estimator = Estimator(
    image_uri='763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-training:2.1.0-gpu-py310',
    role='arn:aws:iam::123456789:role/SageMakerRole',
    instance_type='ml.g4dn.xlarge',  # GPU instance
    instance_count=1,
    hyperparameters={
        'epochs': 10,
        'learning-rate': 0.001,
    },
    output_path='s3://my-bucket/model-output/'
)

estimator.fit({'train': 's3://my-bucket/training-data/'})

Real-Time Endpoints

Deploy trained models as REST APIs:

predictor = estimator.deploy(
    initial_instance_count=1,
    instance_type='ml.m5.large'
)

# Invoke the endpoint
result = predictor.predict({'input': [1.0, 2.0, 3.0]})

# Invoke endpoint via CLI
aws sagemaker-runtime invoke-endpoint \
  --endpoint-name my-endpoint \
  --content-type application/json \
  --body '{"input": [1.0, 2.0, 3.0]}' \
  output.json

Built-in Algorithms

SageMaker includes many optimized built-in algorithms (no container needed):

Algorithm	Type	Use Case
XGBoost	Gradient boosting	Classification, regression
Linear Learner	Linear models	Binary/multiclass classification
K-Means	Clustering	Customer segmentation
BlazingText	NLP	Text classification, word2vec
Object Detection	Computer vision	Detect objects in images
Semantic Segmentation	Computer vision	Pixel-level image labeling
DeepAR	Time series	Demand forecasting
Random Cut Forest	Anomaly detection	Fraud detection

SageMaker vs Azure Machine Learning

Feature	SageMaker	Azure ML
IDE	SageMaker Studio	Azure ML Studio
Notebooks	Managed Jupyter	Managed Jupyter + VS Code
Training	Training Jobs (managed clusters)	Compute Clusters / Compute Instances
Model deployment	Endpoints (real-time/batch)	Online / Batch Endpoints
MLOps pipelines	SageMaker Pipelines	Azure ML Pipelines
Model registry	SageMaker Model Registry	Azure ML Model Registry
Feature store	SageMaker Feature Store	Azure ML Feature Store
No-code ML	SageMaker Canvas	Azure ML Automated ML
LLM access	Amazon Bedrock	Azure OpenAI Service
Experiment tracking	SageMaker Experiments	Azure ML Experiments / MLflow

Amazon Bedrock (Generative AI)

For Large Language Models (LLMs) and foundation models, AWS uses Amazon Bedrock (not SageMaker directly):

Access models from Anthropic (Claude), Meta (Llama), Amazon (Titan), Mistral, Cohere
Serverless — no infrastructure to manage
Supports RAG (Retrieval-Augmented Generation) via Knowledge Bases
Fine-tuning and agents support

In Azure terms: Amazon Bedrock ≈ Azure OpenAI Service

Useful Links