run s3 jobs via step functions #136

Open · wants to merge 17 commits into main
Conversation

@coilysiren (Contributor) commented Sep 13, 2024

Ticket

Resolves navapbc/template-infra#744

Changes

This PR changes the EventBridge file upload job from triggering an ECS task directly to triggering a Step Function, which in turn triggers the ECS task. We are doing this because Step Functions adds a basic observability layer (it surfaces failures and successes) to the triggering of ECS tasks.
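The new wiring can be sketched roughly as follows (a minimal sketch; the resource names such as aws_cloudwatch_event_rule.file_upload_jobs and aws_sfn_state_machine.file_upload_jobs are assumptions based on the naming in this PR, not the exact code):

```hcl
# Sketch: the EventBridge rule now targets a Step Functions state machine,
# which runs the ECS task, instead of targeting the ECS task directly.
# Resource names here are illustrative assumptions.
resource "aws_cloudwatch_event_target" "file_upload_jobs" {
  for_each = var.file_upload_jobs

  rule = aws_cloudwatch_event_rule.file_upload_jobs[each.key].name
  arn  = aws_sfn_state_machine.file_upload_jobs[each.key].arn

  # The role EventBridge assumes to call states:StartExecution
  role_arn = aws_iam_role.workflow_orchestrator.arn
}
```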

Context for reviewers

This PR looks a lot like the scheduled jobs PR. It even shares a role with the scheduled jobs (infra/modules/service/workflow_orchestrator_role.tf).

Testing

(The GIF runs through the whole process end to end and is 70 seconds long.)

Screen.Recording.2024-09-13.at.2.mp4

Preview environment

@@ -30,27 +30,35 @@ resource "aws_iam_policy" "run_task" {

data "aws_iam_policy_document" "run_task" {
@coilysiren (Contributor, Author) commented:

This file looks exactly like the one for the scheduler role:

https://github.com/navapbc/platform-test/blob/main/infra/modules/service/scheduler_role.tf

But it didn't seem right to share them, given that they act on different ARNs and are assumed by different services.

for_each = var.file_upload_jobs

name = "${var.service_name}-${each.key}"
role_arn = aws_iam_role.workflow_orchestrator.arn
@coilysiren (Contributor, Author) commented:

This is the same role that the cron scheduler uses 😎

@coilysiren marked this pull request as ready for review September 13, 2024 21:31
@lorenyu (Collaborator) left a comment:

I left some comments about simplifying a few things, but otherwise this looks amazing.

Comment on lines 86 to 94
input_template = replace(replace(jsonencode({
containerOverrides = [
{
name = local.container_name,
command = each.value.task_command
}
]
}), "\\u003c", "<"), "\\u003e", ">")
}
}
@lorenyu (Collaborator) commented:

I think we can simplify this:

input_template = replace(replace(jsonencode({
  task_command = each.value.task_command
}), "\\u003c", "<"), "\\u003e", ">")

then below where we set the container overrides we can do

"Command.$" : "$.task_command"
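Putting the two pieces together, the ECS RunTask state in the state machine definition could then read the command straight off its input with JSONPath (a sketch, assuming the definition is built with jsonencode as elsewhere in this PR; surrounding fields omitted):

```hcl
# Sketch of the RunTask state's override block (structure assumed,
# not copied from the PR).
"Overrides" : {
  "ContainerOverrides" : [
    {
      "Name" : local.container_name,
      # The ".$" suffix tells Step Functions to resolve the value via
      # JSONPath against the state input, i.e. the simplified input_template.
      "Command.$" : "$.task_command"
    }
  ]
}
```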

#
# The syntax for parsing the input data comes from JSONPath.
"Name" : local.container_name,
"Command.$" : "$.containerOverrides[0].command[*]"
@lorenyu (Collaborator) commented:

See my previous comment but I think we can simplify the input transformer and this line.

resource "aws_cloudwatch_log_group" "file_upload_jobs" {
for_each = var.file_upload_jobs

name_prefix = "/aws/vendedlogs/states/${var.service_name}-${each.key}"
@lorenyu (Collaborator) commented:

I just realized that if someone names one of their file_upload_jobs the same as one of their scheduled_jobs, the log groups would conflict. Could we update this to:

Suggested change
name_prefix = "/aws/vendedlogs/states/${var.service_name}-${each.key}"
name_prefix = "/aws/vendedlogs/states/file-upload-jobs/${var.service_name}-${each.key}"

and also make a similar change in scheduled_jobs.tf to add /scheduled-jobs/ into the path
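For reference, the corresponding change in scheduled_jobs.tf might look like this (a sketch; the resource and variable names are assumptions based on the naming in this PR):

```hcl
# Sketch: namespace the scheduled-jobs log groups the same way, so a
# scheduled job and a file upload job with the same name can't collide.
resource "aws_cloudwatch_log_group" "scheduled_jobs" {
  for_each = var.scheduled_jobs

  name_prefix = "/aws/vendedlogs/states/scheduled-jobs/${var.service_name}-${each.key}"
}
```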

Comment on lines +42 to +49
dynamic "statement" {
for_each = aws_sfn_state_machine.file_upload_jobs

content {
actions = [
"states:StartExecution",
]
resources = [statement.value.arn]
@lorenyu (Collaborator) commented:

I didn't realize this previously when we worked on the scheduled jobs, but wouldn't this be much simpler as a non-dynamic statement with an array for the resources list:

Suggested change
dynamic "statement" {
for_each = aws_sfn_state_machine.file_upload_jobs
content {
actions = [
"states:StartExecution",
]
resources = [statement.value.arn]
statement {
actions = [
"states:StartExecution",
]
resources = [for job in aws_sfn_state_machine.file_upload_jobs : job.arn]

Can we try this? If it works, can we make the same change to scheduler_role.tf?
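The same collapse would apply in scheduler_role.tf, assuming it contains an analogous dynamic statement (a sketch; the aws_sfn_state_machine.scheduled_jobs resource name is an assumption):

```hcl
# Sketch: replace the dynamic "statement" block with a single statement
# whose resources list is built by a for expression over the state machines.
statement {
  actions = [
    "states:StartExecution",
  ]
  resources = [for job in aws_sfn_state_machine.scheduled_jobs : job.arn]
}
```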

Comment on lines +53 to +61
dynamic "statement" {
for_each = aws_sfn_state_machine.file_upload_jobs

content {
actions = [
"states:DescribeExecution",
"states:StopExecution",
]
resources = ["${statement.value.arn}:*"]
@lorenyu (Collaborator) commented:

Similar to my previous comment, would this be simpler as:

Suggested change
dynamic "statement" {
for_each = aws_sfn_state_machine.file_upload_jobs
content {
actions = [
"states:DescribeExecution",
"states:StopExecution",
]
resources = ["${statement.value.arn}:*"]
statement {
actions = [
"states:DescribeExecution",
"states:StopExecution",
]
resources = [for job in aws_sfn_state_machine.file_upload_jobs : "${job.arn}:*"]

Successfully merging this pull request may close: Use step functions to run event jobs.

2 participants