This module includes resources to deploy Amazon SageMaker endpoints: it creates the SageMaker model, the SageMaker endpoint configuration, and the SageMaker endpoint itself.
With Amazon SageMaker, you can start getting predictions, or inferences, from your trained machine learning models. SageMaker provides a broad selection of ML infrastructure and model deployment options to help meet all your ML inference needs. With SageMaker Inference, you can scale your model deployment, manage models more effectively in production, and reduce operational burden.
This module supports the following ways to deploy a model, depending on your use case:
- For persistent, real-time endpoints that make one prediction at a time, use SageMaker real-time hosting services. Real-time inference is ideal for interactive workloads with low-latency requirements. You can deploy your model to SageMaker hosting services and get an endpoint that can be used for inference. These endpoints are fully managed and support autoscaling.
- For requests with large payload sizes (up to 1 GB), long processing times (up to one hour), and near-real-time latency requirements, use Amazon SageMaker Asynchronous Inference, which queues incoming requests and processes them asynchronously. Asynchronous Inference can also reduce costs by autoscaling the instance count to zero when there are no requests to process, so you pay only while your endpoint is handling requests.
If a single container is sufficient for your inference use case, you can define a single-container model.
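The following is a minimal sketch of a single-container deployment. The module source and image URI are placeholders, and the `image` attribute name follows the SageMaker container definition schema; it is an assumption about this module's `containers` object, so check the inputs table for the actual fields.

```hcl
module "single_container_endpoint" {
  source = "path/to/this/module" # placeholder: local path or registry source of this module

  endpoint_name = "my-endpoint"

  containers = [
    {
      # Placeholder ECR image URI; the `image` attribute name is an assumption
      image = "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-inference-image:latest"
    }
  ]
}
```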
An inference pipeline is an Amazon SageMaker model that is composed of a linear sequence of multiple containers that process requests for inferences on data. See the AWS documentation to learn more about SageMaker inference pipelines. To define an inference pipeline, you can provide additional containers for your model.
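As a sketch, an inference pipeline is expressed by listing the containers in the order requests should flow through them (module source, image URIs, and the `image` attribute name are placeholders and assumptions, as above):

```hcl
module "pipeline_endpoint" {
  source = "path/to/this/module" # placeholder source

  containers = [
    {
      # First container in the pipeline, e.g. a preprocessing step (placeholder image)
      image = "123456789012.dkr.ecr.us-east-1.amazonaws.com/preprocessor:latest"
    },
    {
      # Second container, e.g. the model that produces the prediction (placeholder image)
      image = "123456789012.dkr.ecr.us-east-1.amazonaws.com/predictor:latest"
    }
  ]
}
```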
If you enable network isolation, the containers can't make any outbound network calls, even to other AWS services such as Amazon Simple Storage Service (S3). Additionally, no AWS credentials are made available to the container runtime environment.
To enable network isolation, set the `enable_network_isolation` variable to `true`.
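For example (placeholder source and image URI, with the `image` attribute name assumed as above):

```hcl
module "isolated_endpoint" {
  source = "path/to/this/module" # placeholder source

  # With network isolation on, containers cannot make outbound calls
  # and receive no AWS credentials at runtime.
  enable_network_isolation = true

  containers = [
    {
      image = "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-inference-image:latest" # placeholder
    }
  ]
}
```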
A container image can be referenced in either of two ways (both forms are shown in the sketch below):
- Reference an image available within Amazon ECR.
- Reference an AWS Deep Learning Containers image.
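The following locals sketch the two forms. The account ID, region, repository names, and tags are placeholders; `763104351884` is the AWS-owned registry that hosts Deep Learning Containers in us-east-1, but consult the AWS DLC documentation for the exact image URI matching your framework, version, and region.

```hcl
locals {
  # An image pushed to your own Amazon ECR repository (placeholder account/region/repo)
  ecr_image = "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-inference-image:latest"

  # An AWS Deep Learning Containers image (illustrative repository and tag;
  # look up the exact URI for your framework and region)
  dlc_image = "763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference:2.1.0-cpu-py310-ubuntu20.04-sagemaker"
}
```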
If you choose to decouple your model artifacts from your inference code (a natural choice, since inference code and model artifacts usually change at different rates), specify the artifacts via the `model_data_source` property of `var.containers`. By default, no model artifacts are associated with a model. For instance: `model_data_source = "s3://{bucket_name}/{key_name}/model.tar.gz"`.
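As a sketch, combining an inference image with decoupled model artifacts (module source, bucket, key, and image URI are placeholders):

```hcl
module "endpoint_with_artifacts" {
  source = "path/to/this/module" # placeholder source

  containers = [
    {
      image             = "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-inference-image:latest" # placeholder
      # Model artifacts kept separate from the inference image
      model_data_source = "s3://my-bucket/my-model/model.tar.gz" # placeholder artifact location
    }
  ]
}
```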
Amazon SageMaker provides model hosting services for model deployment: an HTTPS endpoint where your machine learning model is available to serve inferences.
When this module creates an endpoint, Amazon SageMaker launches the ML compute instances and deploys the model as specified in the configuration. To get inferences from the model, client applications send requests to the Amazon SageMaker Runtime HTTPS endpoint.
Real-time inference is ideal for workloads with real-time, interactive, low-latency requirements. You can deploy your model to SageMaker hosting services and get a fully managed endpoint that supports autoscaling.
Coming soon
To enable autoscaling on the production variant, use the `autoscaling_config` variable. For guidance on load testing to determine the maximum requests per second an instance can handle, see the Amazon SageMaker load testing documentation.
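A sketch of an autoscaled endpoint follows. The field names inside `autoscaling_config` are hypothetical (consult the inputs table and the module's variables for the real schema); they mirror the Application Auto Scaling target and policy this module creates.

```hcl
module "autoscaled_endpoint" {
  source = "path/to/this/module" # placeholder source

  containers = [
    {
      image = "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-inference-image:latest" # placeholder
    }
  ]

  # Field names below are hypothetical; they mirror the settings of the
  # aws_appautoscaling_target and aws_appautoscaling_policy resources
  # this module manages.
  autoscaling_config = {
    min_capacity = 1   # minimum instance count for the production variant
    max_capacity = 4   # maximum instance count
    target_value = 100 # e.g. target invocations per instance, derived from load testing
  }
}
```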
Requirements:

| Name | Version |
|------|---------|
| terraform | >= 1.0.7 |
| aws | ~> 5.0 |
| random | >= 3.6.0 |

Providers:

| Name | Version |
|------|---------|
| aws | ~> 5.0 |
| random | >= 3.6.0 |

No modules.
Resources:

| Name | Type |
|------|------|
| aws_appautoscaling_policy.sagemaker_policy | resource |
| aws_appautoscaling_target.sagemaker_target | resource |
| aws_iam_role.sg_endpoint_role | resource |
| aws_iam_role_policy_attachment.sg_policy_attachment | resource |
| aws_sagemaker_endpoint.sagemaker_endpoint | resource |
| aws_sagemaker_endpoint_configuration.sagemaker_endpoint_config | resource |
| aws_sagemaker_model.sagemaker_model | resource |
| random_string.solution_suffix | resource |
| aws_iam_policy.sg_full_access | data source |
| aws_iam_policy_document.sg_trust | data source |
| aws_partition.current | data source |
Inputs:

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|----------|
| autoscaling_config | Enable autoscaling for the SageMaker Endpoint production variant. | `object({…})` | `null` | no |
| containers | Specifies the container definitions for this SageMaker model: either a single primary container or an inference pipeline of multiple containers. | `list(object({…}))` | `[]` | no |
| enable_network_isolation | Isolates the model container. No inbound or outbound network calls can be made to or from the model container. | `bool` | `false` | no |
| endpoint_name | The name of the Amazon SageMaker Endpoint. | `string` | `"SGendpoint"` | no |
| kms_key_arn | Amazon Resource Name (ARN) of an AWS Key Management Service key that Amazon SageMaker uses to encrypt data on the storage volume attached to the ML compute instance that hosts the endpoint. | `string` | `null` | no |
| name_prefix | Prefix prepended to resource names. | `string` | `"SGTFendpoint"` | no |
| production_variant | Configuration for the production variant of the SageMaker endpoint. | `object({…})` | `{…}` | no |
| sg_role_arn | The ARN of the IAM role with permission to access model artifacts and Docker images for deployment. | `string` | `null` | no |
| tags | Tags for the Amazon SageMaker Endpoint resource. | `map(string)` | `null` | no |
Outputs:

| Name | Description |
|------|-------------|
| sagemaker_endpoint_config_name | The name of the SageMaker endpoint configuration |
| sagemaker_endpoint_name | The name of the SageMaker endpoint |
| sagemaker_role_arn | The ARN of the IAM role for the SageMaker endpoint |
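For example, a parent configuration can re-export these outputs (the module label `sagemaker_endpoint` is a placeholder for whatever you name your module block):

```hcl
output "endpoint_name" {
  description = "Name of the deployed SageMaker endpoint"
  value       = module.sagemaker_endpoint.sagemaker_endpoint_name
}

output "endpoint_role_arn" {
  description = "ARN of the IAM role used by the endpoint"
  value       = module.sagemaker_endpoint.sagemaker_role_arn
}
```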