
Terraform Amazon SageMaker Endpoint Module

This module includes the resources required to deploy Amazon SageMaker endpoints: it creates a SageMaker model, a SageMaker endpoint configuration, and the SageMaker endpoint itself.

With Amazon SageMaker, you can start getting predictions, or inferences, from your trained machine learning models. SageMaker provides a broad selection of ML infrastructure and model deployment options to help meet all your ML inference needs. With SageMaker Inference, you can scale your model deployment, manage models more effectively in production, and reduce operational burden.

This module supports the following ways to deploy a model, depending on your use case:

  • For persistent, real-time endpoints that make one prediction at a time, use SageMaker real-time hosting services. Real-time inference is ideal for inference workloads where you have real-time, interactive, low latency requirements. You can deploy your model to SageMaker hosting services and get an endpoint that can be used for inference. These endpoints are fully managed and support autoscaling.
  • For requests with large payload sizes (up to 1 GB), long processing times (up to one hour), and near real-time latency requirements, use Amazon SageMaker Asynchronous Inference. Asynchronous Inference queues incoming requests and processes them asynchronously. It also lets you save on costs by autoscaling the instance count to zero when there are no requests to process, so you only pay while your endpoint is processing requests.

Model configuration

Single container

If a single container is sufficient for your inference use case, you can define a single-container model by supplying one entry in var.containers, as in the sketch below.
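A minimal sketch, assuming the module is consumed from the Terraform registry under the aws-ia namespace (the source string and image URI below are illustrative, not prescriptive):

module "sagemaker_endpoint" {
  # Assumed registry source for this repository; pin a version in practice
  source = "aws-ia/sagemaker-endpoint/aws"

  endpoint_name = "my-endpoint"

  containers = [
    {
      # Hypothetical inference image in your own ECR registry
      image_uri = "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-inference:latest"
    }
  ]
}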

Inference pipeline

An inference pipeline is an Amazon SageMaker model composed of a linear sequence of multiple containers that process requests for inferences on data. See the AWS documentation to learn more about SageMaker inference pipelines. To define an inference pipeline, provide additional entries in var.containers, as in the sketch below.
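A sketch of a two-container pipeline; the hostnames and image URIs are hypothetical:

module "sagemaker_inference_pipeline" {
  source = "aws-ia/sagemaker-endpoint/aws" # assumed registry source, as above

  # Containers form a linear sequence in the order listed
  containers = [
    {
      container_hostname = "preprocess"
      image_uri          = "123456789012.dkr.ecr.us-east-1.amazonaws.com/preprocess:latest"
    },
    {
      container_hostname = "inference"
      image_uri          = "123456789012.dkr.ecr.us-east-1.amazonaws.com/inference:latest"
    }
  ]
}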

Network isolation

If you enable network isolation, the containers can't make any outbound network calls, even to other AWS services such as Amazon Simple Storage Service (S3). Additionally, no AWS credentials are made available to the container runtime environment.

To enable network isolation, set the enable_network_isolation variable to true.
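For example, added to the module call from the earlier sketch (source string and image URI remain illustrative):

module "sagemaker_endpoint" {
  source = "aws-ia/sagemaker-endpoint/aws" # assumed registry source

  # Block all inbound and outbound network calls for the model container
  enable_network_isolation = true

  containers = [
    {
      image_uri = "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-inference:latest" # hypothetical
    }
  ]
}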

Container images

ECR Image

Reference an image available within Amazon Elastic Container Registry (Amazon ECR).
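A sketch with an illustrative ECR image URI, set inside the module block:

containers = [
  {
    # Any image in an ECR repository the endpoint's execution role can pull (illustrative URI)
    image_uri = "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-model:1.0"
  }
]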

DLC Image

Reference an AWS Deep Learning Container image.
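A sketch referencing a DLC image inside the module block. DLC images are published from AWS-owned registries (account 763104351884 hosts many framework images in us-east-1); the repository and tag below are illustrative, so verify them against the official DLC image list:

containers = [
  {
    # AWS-managed PyTorch inference DLC (illustrative tag; check the DLC list)
    image_uri = "763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference:2.1.0-cpu-py310-ubuntu20.04-sagemaker"
  }
]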

Model artifacts

If you choose to decouple your model artifacts from your inference code (natural, since inference code and model artifacts change at different rates), specify the artifacts via the model_data_source attribute of the var.containers entries. By default, no model artifacts are associated with a model. For instance, you can point the source at s3://{bucket_name}/{key_name}/model.tar.gz, as in the sketch below.
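A sketch using the model_data_source object from the containers type, inside the module block (the S3 URI placeholders are kept from the text above; S3Object is one of the SageMaker S3 data types):

containers = [
  {
    image_uri = "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-inference:latest" # hypothetical

    # Model artifacts stored separately from the inference image
    model_data_source = {
      s3_data_type  = "S3Object"
      s3_uri        = "s3://{bucket_name}/{key_name}/model.tar.gz"
      is_compressed = true # the artifact is a gzipped tarball
    }
  }
]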

Model hosting

Amazon SageMaker provides model hosting services for model deployment: an HTTPS endpoint where your machine learning model is available to serve inferences.

Endpoint configuration

Endpoint

When this module creates an endpoint, Amazon SageMaker launches the ML compute instances and deploys the model as specified in the configuration. To get inferences from the model, client applications send requests to the Amazon SageMaker Runtime HTTPS endpoint.

Real-time inference endpoints

Real-time inference is ideal for inference workloads where you have real-time, interactive, low latency requirements. You can deploy your model to SageMaker hosting services and get an endpoint that can be used for inference. These endpoints are fully managed and support autoscaling. The instance type and count behind the endpoint are controlled by the production_variant variable, as in the sketch below.
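A sketch overriding the default production variant inside the module block (the instance type is illustrative; defaults are listed in the Inputs section):

production_variant = {
  instance_type          = "ml.m5.large" # illustrative; the default is ml.t2.medium
  initial_instance_count = 2
  variant_name           = "AllTraffic"
  volume_size_in_gb      = 30
}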

Asynchronous inference endpoints

Coming soon

AutoScaling

To enable autoscaling on the production variant, use the autoscaling_config variable. For load-testing guidance on determining the maximum requests per second per instance, see the Amazon SageMaker documentation.
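A sketch inside the module block, assuming the scaling policy tracks a per-instance invocation metric; choose target_value from your load-testing results:

autoscaling_config = {
  min_capacity       = 1
  max_capacity       = 4
  target_value       = 100 # e.g. invocations per instance, determined by load testing
  scale_in_cooldown  = 300 # seconds to wait before scaling in
  scale_out_cooldown = 60  # seconds to wait before scaling out
}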

Requirements

Name       Version
terraform  >= 1.0.7
aws        ~> 5.0
random     >= 3.6.0

Providers

Name    Version
aws     ~> 5.0
random  >= 3.6.0

Modules

No modules.

Resources

Name Type
aws_appautoscaling_policy.sagemaker_policy resource
aws_appautoscaling_target.sagemaker_target resource
aws_iam_role.sg_endpoint_role resource
aws_iam_role_policy_attachment.sg_policy_attachment resource
aws_sagemaker_endpoint.sagemaker_endpoint resource
aws_sagemaker_endpoint_configuration.sagemaker_endpoint_config resource
aws_sagemaker_model.sagemaker_model resource
random_string.solution_suffix resource
aws_iam_policy.sg_full_access data source
aws_iam_policy_document.sg_trust data source
aws_partition.current data source

Inputs

autoscaling_config
  Description: Enable autoscaling for the SageMaker endpoint production variant.
  Type:
    object({
      min_capacity       = optional(number, 1)
      max_capacity       = number
      target_value       = number
      scale_in_cooldown  = optional(number)
      scale_out_cooldown = optional(number)
    })
  Default: null
  Required: no

containers
  Description: Specifies the container definitions for this SageMaker model, consisting of either a single primary container or an inference pipeline of multiple containers.
  Type:
    list(object({
      image_uri          = optional(string)
      model_package_name = optional(string)
      model_data_url     = optional(string)
      mode               = optional(string, "SingleModel")
      environment        = optional(map(string))
      container_hostname = optional(string)
      image_config = optional(object({
        repository_access_mode = string
        repository_auth_config = optional(object({
          repository_credentials_provider_arn = string
        }))
      }))
      inference_specification_name = optional(string)
      model_data_source = optional(object({
        s3_data_type  = string
        s3_uri        = string
        is_compressed = optional(bool)
        accept_eula   = optional(bool)
      }))
      multi_model_config = optional(object({
        model_cache_setting = optional(string)
      }))
    }))
  Default: []
  Required: no

enable_network_isolation
  Description: Isolates the model container. No inbound or outbound network calls can be made to or from the model container.
  Type: bool
  Default: false
  Required: no

endpoint_name
  Description: The name of the Amazon SageMaker endpoint.
  Type: string
  Default: "SGendpoint"
  Required: no

kms_key_arn
  Description: Amazon Resource Name (ARN) of an AWS Key Management Service key that Amazon SageMaker uses to encrypt data on the storage volume attached to the ML compute instance that hosts the endpoint.
  Type: string
  Default: null
  Required: no

name_prefix
  Description: This value is prepended to resource names.
  Type: string
  Default: "SGTFendpoint"
  Required: no

production_variant
  Description: Configuration for the production variant of the SageMaker endpoint.
  Type:
    object({
      accelerator_type = optional(string)
      container_startup_health_check_timeout_in_seconds = optional(number)
      core_dump_config = optional(object({
        destination_s3_uri = string
        kms_key_id         = optional(string)
      }))
      enable_ssm_access                      = optional(bool)
      inference_ami_version                  = optional(string)
      initial_instance_count                 = optional(number)
      instance_type                          = optional(string)
      model_data_download_timeout_in_seconds = optional(number)
      variant_name                           = optional(string, "AllTraffic")
      volume_size_in_gb                      = optional(number)
    })
  Default:
    {
      "initial_instance_count": 1,
      "instance_type": "ml.t2.medium",
      "model_data_download_timeout_in_seconds": 900,
      "variant_name": "AllTraffic",
      "volume_size_in_gb": 30
    }
  Required: no

sg_role_arn
  Description: The ARN of the IAM role with permission to access model artifacts and Docker images for deployment.
  Type: string
  Default: null
  Required: no

tags
  Description: Tags to assign to the Amazon SageMaker endpoint resources.
  Type: map(string)
  Default: null
  Required: no

Outputs

Name Description
sagemaker_endpoint_config_name The name of the SageMaker endpoint configuration
sagemaker_endpoint_name The name of the SageMaker endpoint
sagemaker_role_arn The ARN of the IAM role for the SageMaker endpoint
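For reference, a sketch of wiring the module's outputs into your own configuration (the module label matches the earlier sketches):

output "endpoint_name" {
  description = "Name of the deployed SageMaker endpoint"
  value       = module.sagemaker_endpoint.sagemaker_endpoint_name
}

output "endpoint_role_arn" {
  description = "ARN of the IAM role used by the endpoint"
  value       = module.sagemaker_endpoint.sagemaker_role_arn
}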