
Terraform Amazon SageMaker Endpoint Module

This module includes the resources required to deploy Amazon SageMaker endpoints: it creates a SageMaker model, a SageMaker endpoint configuration, and the SageMaker endpoint itself.

With Amazon SageMaker, you can start getting predictions, or inferences, from your trained machine learning models. SageMaker provides a broad selection of ML infrastructure and model deployment options to help meet all your ML inference needs. With SageMaker Inference, you can scale your model deployment, manage models more effectively in production, and reduce operational burden.

This module supports the following ways to deploy a model, depending on your use case:

  • For persistent, real-time endpoints that make one prediction at a time, use SageMaker real-time hosting services. Real-time inference is ideal for inference workloads where you have real-time, interactive, low latency requirements. You can deploy your model to SageMaker hosting services and get an endpoint that can be used for inference. These endpoints are fully managed and support autoscaling.
  • For requests with large payload sizes (up to 1 GB), long processing times (up to one hour), and near real-time latency requirements, use Amazon SageMaker Asynchronous Inference. Asynchronous Inference queues incoming requests and processes them asynchronously. It also lets you save on costs by autoscaling the instance count to zero when there are no requests to process, so you only pay while your endpoint is processing requests.

Model configuration

Single container

If a single container is sufficient for your inference use case, you can define a single-container model by supplying one entry in var.containers, as in the sketch below.
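A minimal sketch, assuming the module is consumed from the Terraform registry under the aws-ia namespace (the source string and image URI below are illustrative, not prescriptive):

module "sagemaker_endpoint" {
  # Assumed registry source for this repository; pin a version in practice
  source = "aws-ia/sagemaker-endpoint/aws"

  endpoint_name = "my-endpoint"

  containers = [
    {
      # Hypothetical inference image in your own ECR registry
      image_uri = "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-inference:latest"
    }
  ]
}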

Inference pipeline

An inference pipeline is an Amazon SageMaker model composed of a linear sequence of multiple containers that process requests for inferences on data. See the AWS documentation to learn more about SageMaker inference pipelines. To define an inference pipeline, provide additional entries in var.containers, as in the sketch below.
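A sketch of a two-container pipeline; the hostnames and image URIs are hypothetical:

module "sagemaker_inference_pipeline" {
  source = "aws-ia/sagemaker-endpoint/aws" # assumed registry source, as above

  # Containers form a linear sequence in the order listed
  containers = [
    {
      container_hostname = "preprocess"
      image_uri          = "123456789012.dkr.ecr.us-east-1.amazonaws.com/preprocess:latest"
    },
    {
      container_hostname = "inference"
      image_uri          = "123456789012.dkr.ecr.us-east-1.amazonaws.com/inference:latest"
    }
  ]
}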

Network isolation

If you enable network isolation, the containers can't make any outbound network calls, even to other AWS services such as Amazon Simple Storage Service (S3). Additionally, no AWS credentials are made available to the container runtime environment.

To enable network isolation, set the enable_network_isolation variable to true.
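For example, added to the module call from the earlier sketch (source string and image URI remain illustrative):

module "sagemaker_endpoint" {
  source = "aws-ia/sagemaker-endpoint/aws" # assumed registry source

  # Block all inbound and outbound network calls for the model container
  enable_network_isolation = true

  containers = [
    {
      image_uri = "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-inference:latest" # hypothetical
    }
  ]
}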

Container images

ECR Image

Reference an image available within Amazon Elastic Container Registry (Amazon ECR).
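A sketch with an illustrative ECR image URI, set inside the module block:

containers = [
  {
    # Any image in an ECR repository the endpoint's execution role can pull (illustrative URI)
    image_uri = "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-model:1.0"
  }
]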

DLC Image

Reference an AWS Deep Learning Container image.
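A sketch referencing a DLC image inside the module block. DLC images are published from AWS-owned registries (account 763104351884 hosts many framework images in us-east-1); the repository and tag below are illustrative, so verify them against the official DLC image list:

containers = [
  {
    # AWS-managed PyTorch inference DLC (illustrative tag; check the DLC list)
    image_uri = "763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference:2.1.0-cpu-py310-ubuntu20.04-sagemaker"
  }
]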

Model artifacts

If you choose to decouple your model artifacts from your inference code (natural, since inference code and model artifacts change at different rates), specify the artifacts via the model_data_source attribute of the var.containers entries. By default, no model artifacts are associated with a model. For instance, you can point the source at s3://{bucket_name}/{key_name}/model.tar.gz, as in the sketch below.
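A sketch using the model_data_source object from the containers type, inside the module block (the S3 URI placeholders are kept from the text above; S3Object is one of the SageMaker S3 data types):

containers = [
  {
    image_uri = "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-inference:latest" # hypothetical

    # Model artifacts stored separately from the inference image
    model_data_source = {
      s3_data_type  = "S3Object"
      s3_uri        = "s3://{bucket_name}/{key_name}/model.tar.gz"
      is_compressed = true # the artifact is a gzipped tarball
    }
  }
]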

Model hosting

Amazon SageMaker provides model hosting services for model deployment: an HTTPS endpoint where your machine learning model is available to serve inferences.

Endpoint configuration

Endpoint

When this module creates an endpoint, Amazon SageMaker launches the ML compute instances and deploys the model as specified in the configuration. To get inferences from the model, client applications send requests to the Amazon SageMaker Runtime HTTPS endpoint.

Real-time inference endpoints

Real-time inference is ideal for inference workloads where you have real-time, interactive, low latency requirements. You can deploy your model to SageMaker hosting services and get an endpoint that can be used for inference. These endpoints are fully managed and support autoscaling. The instance type and count behind the endpoint are controlled by the production_variant variable, as in the sketch below.
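A sketch overriding the default production variant inside the module block (the instance type is illustrative; defaults are listed in the Inputs section):

production_variant = {
  instance_type          = "ml.m5.large" # illustrative; the default is ml.t2.medium
  initial_instance_count = 2
  variant_name           = "AllTraffic"
  volume_size_in_gb      = 30
}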

Asynchronous inference endpoints

Coming soon

AutoScaling

To enable autoscaling on the production variant, use the autoscaling_config variable. For load-testing guidance on determining the maximum requests per second per instance, see the Amazon SageMaker documentation.
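A sketch inside the module block, assuming the scaling policy tracks a per-instance invocation metric; choose target_value from your load-testing results:

autoscaling_config = {
  min_capacity       = 1
  max_capacity       = 4
  target_value       = 100 # e.g. invocations per instance, determined by load testing
  scale_in_cooldown  = 300 # seconds to wait before scaling in
  scale_out_cooldown = 60  # seconds to wait before scaling out
}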

Requirements

Name       Version
terraform  >= 1.0.7
aws        ~> 5.0
random     >= 3.6.0

Providers

Name    Version
aws     ~> 5.0
random  >= 3.6.0

Modules

No modules.

Resources

Name Type
aws_appautoscaling_policy.sagemaker_policy resource
aws_appautoscaling_target.sagemaker_target resource
aws_iam_role.sg_endpoint_role resource
aws_iam_role_policy_attachment.sg_policy_attachment resource
aws_sagemaker_endpoint.sagemaker_endpoint resource
aws_sagemaker_endpoint_configuration.sagemaker_endpoint_config resource
aws_sagemaker_model.sagemaker_model resource
random_string.solution_suffix resource
aws_iam_policy.sg_full_access data source
aws_iam_policy_document.sg_trust data source
aws_partition.current data source

Inputs

autoscaling_config
  Description: Enable autoscaling for the SageMaker endpoint production variant.
  Type:
    object({
      min_capacity       = optional(number, 1)
      max_capacity       = number
      target_value       = number
      scale_in_cooldown  = optional(number)
      scale_out_cooldown = optional(number)
    })
  Default: null
  Required: no

containers
  Description: Specifies the container definitions for this SageMaker model, consisting of either a single primary container or an inference pipeline of multiple containers.
  Type:
    list(object({
      image_uri          = optional(string)
      model_package_name = optional(string)
      model_data_url     = optional(string)
      mode               = optional(string, "SingleModel")
      environment        = optional(map(string))
      container_hostname = optional(string)
      image_config = optional(object({
        repository_access_mode = string
        repository_auth_config = optional(object({
          repository_credentials_provider_arn = string
        }))
      }))
      inference_specification_name = optional(string)
      model_data_source = optional(object({
        s3_data_type  = string
        s3_uri        = string
        is_compressed = optional(bool)
        accept_eula   = optional(bool)
      }))
      multi_model_config = optional(object({
        model_cache_setting = optional(string)
      }))
    }))
  Default: []
  Required: no

enable_network_isolation
  Description: Isolates the model container. No inbound or outbound network calls can be made to or from the model container.
  Type: bool
  Default: false
  Required: no

endpoint_name
  Description: The name of the Amazon SageMaker endpoint.
  Type: string
  Default: "SGendpoint"
  Required: no

kms_key_arn
  Description: Amazon Resource Name (ARN) of an AWS Key Management Service key that Amazon SageMaker uses to encrypt data on the storage volume attached to the ML compute instance that hosts the endpoint.
  Type: string
  Default: null
  Required: no

name_prefix
  Description: This value is prepended to resource names.
  Type: string
  Default: "SGTFendpoint"
  Required: no

production_variant
  Description: Configuration for the production variant of the SageMaker endpoint.
  Type:
    object({
      accelerator_type = optional(string)
      container_startup_health_check_timeout_in_seconds = optional(number)
      core_dump_config = optional(object({
        destination_s3_uri = string
        kms_key_id         = optional(string)
      }))
      enable_ssm_access                      = optional(bool)
      inference_ami_version                  = optional(string)
      initial_instance_count                 = optional(number)
      instance_type                          = optional(string)
      model_data_download_timeout_in_seconds = optional(number)
      variant_name                           = optional(string, "AllTraffic")
      volume_size_in_gb                      = optional(number)
    })
  Default:
    {
      "initial_instance_count": 1,
      "instance_type": "ml.t2.medium",
      "model_data_download_timeout_in_seconds": 900,
      "variant_name": "AllTraffic",
      "volume_size_in_gb": 30
    }
  Required: no

sg_role_arn
  Description: The ARN of the IAM role with permission to access model artifacts and Docker images for deployment.
  Type: string
  Default: null
  Required: no

tags
  Description: Tags to assign to the Amazon SageMaker endpoint resources.
  Type: map(string)
  Default: null
  Required: no

Outputs

Name Description
sagemaker_endpoint_config_name The name of the SageMaker endpoint configuration
sagemaker_endpoint_name The name of the SageMaker endpoint
sagemaker_role_arn The ARN of the IAM role for the SageMaker endpoint
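For reference, a sketch of wiring the module's outputs into your own configuration (the module label matches the earlier sketches):

output "endpoint_name" {
  description = "Name of the deployed SageMaker endpoint"
  value       = module.sagemaker_endpoint.sagemaker_endpoint_name
}

output "endpoint_role_arn" {
  description = "ARN of the IAM role used by the endpoint"
  value       = module.sagemaker_endpoint.sagemaker_role_arn
}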