Expected Behavior
When working with built-in envelopes, I should be able to work with the inner body and either provide a schema to parse it, or get it back decoded or as-is.
A few categories:
Requests
This category includes:
API Gateway REST (v1) - ApiGatewayEnvelope
API Gateway HTTP (v2) - ApiGatewayV2Envelope
Lambda Function URL - LambdaFunctionUrlEnvelope
VPC Lattice (v1) - VpcLatticeEnvelope
VPC Lattice (v2) - VpcLatticeV2Envelope
All the events in this category have payloads that boil down to this:
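The original payload snippet isn't reproduced here; as a stand-in, a minimal sketch of the shared shape, loosely modelled on an API Gateway HTTP API / Lambda Function URL event (VPC Lattice events use slightly different field names):

```ts
// Illustrative only: a trimmed-down request event. Real events carry many more
// metadata fields (requestContext, routeKey, cookies, etc.).
const event = {
  version: '2.0',
  headers: {
    'content-type': 'text/plain',
  },
  body: 'Hello from client', // always a string; the content varies
  isBase64Encoded: false, // true when the body is base64 encoded
};
```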
For all of them, the body is always a string; however, the content could be a number of things: a plain text string (like in the example), a JSON-encoded string, a base64-encoded string, a base64-encoded binary, or a base64-encoded something else.
When using the envelope, I should be able to either: 1/ just get the body as-is so that I can handle the transform in my code, or 2/ specify a Zod schema that will be used to parse the body. In the latter case, the Parser envelope should try to pre-transform the body based on heuristic logic before applying the parsing.
Specifically, the envelopes for these events should be able to handle the following cases:
JSON stringified body - this should work whenever the isBase64Encoded field is set to false AND the content type header is set to "application/json" or omitted.
base64-encoded body - this should work whenever the isBase64Encoded field is set to true AND the content type header is set to "application/base64".
For all other cases we return the body as-is; see the diagrams in the sections below.
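As an illustration of the developer experience described above, a sketch of the intended usage (the envelope name and import path come from the Parser utility; the schema, the handler, and the no-schema call are hypothetical and reflect this proposal, not the current API):

```ts
import { z } from 'zod';
import { ApiGatewayV2Envelope } from '@aws-lambda-powertools/parser/envelopes';

// Hypothetical inner-body schema supplied by the customer.
const orderSchema = z.object({
  id: z.number(),
  description: z.string(),
});

export const handler = async (event: unknown) => {
  // 2/ the envelope pre-transforms the body (base64 decode and/or JSON.parse)
  //    based on the heuristics above, then applies the customer schema.
  const order = ApiGatewayV2Envelope.parse(event, orderSchema);

  // 1/ (proposed) with no inner schema, the body would come back decoded but
  //    otherwise as-is, leaving any further transform to the handler code.
  // const rawBody = ApiGatewayV2Envelope.parse(event);

  return order;
};
```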
base64 encoded streams
Kinesis Data Stream - KinesisEnvelope
Kinesis Firehose - KinesisFirehoseEnvelope
Kafka (MSK + Self managed) - KafkaEnvelope
These types of envelopes deal with payloads that always have their body partially or fully base64 encoded. Similar to what happens for the previous category (requests), customers might want to either: 1/ just get the body within the records, or 2/ get the body and parse it according to a Zod schema they specify.
For the first case, we should simply base64-decode the encoded fields and return the body. For the latter, we instead attempt a further JSON parse if the customer passes a complex schema (i.e. z.object({ ... })) before actually piping the body of the records through it.
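A sketch of this heuristic as a small pure helper (the function name and signature are illustrative, not part of the current API):

```ts
import { Buffer } from 'node:buffer';
import { z, type ZodSchema } from 'zod';

// Base64-decode a record body and, only when the customer schema is a complex
// one (z.object), attempt a JSON.parse before piping the result through it.
const parseRecordBody = (encodedBody: string, schema?: ZodSchema): unknown => {
  const decoded = Buffer.from(encodedBody, 'base64').toString('utf-8');
  if (schema === undefined) return decoded; // 1/ just the decoded body
  const candidate = schema instanceof z.ZodObject ? JSON.parse(decoded) : decoded;
  return schema.parse(candidate); // 2/ parse with the customer schema
};
```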
Queues and topics
SNS - SnsEnvelope
SQS - SqsEnvelope
These are relatively simple and follow a similar logic to the previous two categories, except we don't have any way of telling whether the bodies of the messages are encoded in any way. The heuristic proposed below should work, though.
No pre-parsing needed
EventBridge - EventBridgeEnvelope
This is self-explanatory: it's the only category of envelopes that kind of works today, since we don't need to do any pre-parsing and the inner body is always already an object.
Others
DynamoDB Streams - DynamoDBStreamEnvelope
CloudWatch Logs - CloudWatchEnvelope
These envelopes are the odd ones out of the bunch. They require special handling because one requires unmarshalling from DynamoDB Attribute types to native JS objects, and the other requires a decompression step before the parsing.
For the purpose of this issue, we could move the logic within their respective envelopes and tackle them separately. Specifically the DynamoDB one requires further attention and considerations due to both missing event types (#3193) and requiring an extra dependency (#3194).
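For context, a sketch of the kind of special handling the CloudWatch Logs envelope needs, i.e. the decompression step (the helper name is illustrative):

```ts
import { Buffer } from 'node:buffer';
import { gunzipSync } from 'node:zlib';

// CloudWatch Logs subscription payloads arrive as base64-encoded, gzipped JSON
// (under awslogs.data); the envelope has to reverse both steps before parsing.
const decodeCloudWatchLogs = (data: string): unknown => {
  const compressed = Buffer.from(data, 'base64');
  const decompressed = gunzipSync(compressed).toString('utf-8');
  return JSON.parse(decompressed);
};
```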
Current Behavior
Currently the envelopes don't support base64 encoded body fields at all, and in all cases assume the body is a JSON string. This is why I am marking this issue as a bug and not a feature request / improvement.
There are also a few other issues that are related and that will likely be resolved by addressing this one:
For now I'll leave them open mainly to make sure we address each and every one.
Code snippet
N/A
Steps to Reproduce
N/A
Possible Solution
There are three proposed approaches for the parsing:
Option 1
flowchart TB
A[Parse using outer schema - i.e. VpcLatticeV2Schema & extend schema to enforce body: z.string] -->|Ensure body, headers, and isBase64Encoded fields are present| B{isBase64Encoded === true}
B -->|YES| C{decodeBase64}
C -->|successful| D
C -->|fail| E
B -->|NO| D{Did customer pass a Zod schema for the body/payload?}
D -->|NO| E[Return payload as is, aka a string]
D -->|YES| F[JSON.parse]
F -->|successful| G[Apply customer Zod schema]
F -->|fail| H[Throw ParseError]
In this case, we throw only when the customer explicitly passes a Zod schema for the inner body. This can happen either as a result of a failed JSON parse, or as a result of a failed Zod parse.
We don't throw if the customer doesn't pass any schema, but we still attempt to base64-decode the body if the rest of the payload indicates that it's base64 encoded. However, we don't throw if this fails; we just return the original body as a string.
This option would cover most of the use cases with no changes to the API, while still being lax enough to allow customers to apply their own parsing (escape hatch) if our heuristics fail.
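A sketch of this flow as a pure helper, assuming an inner-body schema may or may not be passed (names and signature are illustrative; the failed-base64-decode branch is omitted for brevity; ParseError stands in for the Parser utility's own error type):

```ts
import { Buffer } from 'node:buffer';
import { type ZodSchema } from 'zod';

// Stand-in for the Parser utility's ParseError referenced in the diagram.
class ParseError extends Error {}

// Option 1 heuristic: decode when the event says the body is base64 encoded,
// then JSON.parse and Zod-parse only if the customer provided an inner schema.
const resolveBody = (
  body: string,
  isBase64Encoded: boolean,
  schema?: ZodSchema
): unknown => {
  const candidate = isBase64Encoded
    ? Buffer.from(body, 'base64').toString('utf-8')
    : body;
  if (schema === undefined) {
    // No inner schema: return the (possibly decoded) body as-is.
    return candidate;
  }
  try {
    return schema.parse(JSON.parse(candidate));
  } catch (error) {
    // We only throw when the customer explicitly opted into inner parsing.
    throw new ParseError('Failed to parse inner body', { cause: error });
  }
};
```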
Option 2
The alternative approach would be to be slightly stricter and throw even when the base64 decoding operation fails:
flowchart TB
A[Parse using outer schema - i.e. VpcLatticeV2Schema & extend schema to enforce body: z.string] -->|Ensure body, headers, and isBase64Encoded fields are present| B{isBase64Encoded === true}
B -->|YES| C{decodeBase64}
C -->|successful| D
C -->|fail| H
B -->|NO| D{Did customer pass a Zod schema for the body/payload?}
D -->|NO| E[Return payload as is, aka a string]
D -->|YES| F[JSON.parse]
F -->|successful| G[Apply customer Zod schema]
F -->|fail| H[Throw ParseError]
This would admittedly be less magic, but it would make it impossible for customers to apply their own transform whenever ours fails.
Option 3
This approach is a further refined version of the first one, in which customers can specify what I'm gonna refer to as primitive schemas for the payload. These are things like z.string(), z.number(), z.boolean(), but importantly not z.object().
This makes the diagram slightly uglier, but what it really means is that we avoid running JSON.parse() whenever the customer tells us (via the inner schema) that the body is not supposed to be JSON:
flowchart TB
A[Parse using outer schema - i.e. VpcLatticeV2Schema & extend schema to enforce body: z.string] -->|Ensure body, headers, and isBase64Encoded fields are present| B{isBase64Encoded === true}
B -->|YES| C{decodeBase64}
C -->|successful| D
C -->|fail| H
B -->|NO| D{Did customer pass a Zod schema for the body/payload?}
D -->|NO| E[Return payload as is, aka a string]
D -->|YES| F{Is a primitive schema like z.string, z.boolean, etc}
F -->|YES| H
F -->|NO| G[JSON.parse]
G -->|successful| H[Apply customer Zod schema]
G -->|fail| I[Throw ParseError]
Note that I am not sure this is even possible; I don't know Zod well enough to know if it allows this kind of introspection, but in my opinion this has all the benefits of option 1, with the added benefit of handling even more cases.
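For what it's worth, Zod does allow a basic check of this kind through instanceof against its schema classes; a minimal sketch (the helper name is illustrative):

```ts
import { z, type ZodSchema } from 'zod';

// True when the customer schema targets a primitive value, meaning we should
// skip JSON.parse and feed the decoded string straight into Zod.
// Note: wrapped schemas (e.g. z.string().optional()) would need unwrapping.
const isPrimitiveSchema = (schema: ZodSchema): boolean =>
  schema instanceof z.ZodString ||
  schema instanceof z.ZodNumber ||
  schema instanceof z.ZodBoolean;

isPrimitiveSchema(z.string()); // true
isPrimitiveSchema(z.object({ id: z.number() })); // false
```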
Personally I am inclined to go for the third option if it's feasible, otherwise fall back on the first option.
Provided we explain the logic properly in our documentation, I believe this option provides the most versatility while still allowing for extensibility. By not providing a schema for the inner body, we still attempt to decode the body whenever the request tells us so, but customers are responsible for their own further decoding and parsing.
Finally, as discussed during our brief, I think we'd be better served by having some pure helper functions and keeping the orchestration of how they're called in each envelope type/category, for maximum extensibility.
Powertools for AWS Lambda (TypeScript) version
latest
AWS Lambda function runtime
20.x
Packaging format used
npm
Execution logs
No response
I am going through the implementation and I made a few observations that contradict our assumptions.
Schema is not optional for envelopes
The concept of an envelope was created with the idea of extracting and parsing the inner payload in one step. There is no point in using an envelope without a schema; the use case for that is to use APIGatewayProxyEventSchema and then transform your payload manually. Reimplementing this in envelopes introduces redundancy and makes it challenging to explain when to use one over the other.
Fall back to string returns
One of the key points outlined is to fall back and return the original event if something goes wrong: base64 decoding, JSON parsing, or anything else. Here, we also have a separate use case for that, safeParse, which returns the error and the original event. Having the same functionality in parse would introduce redundancy, because the user would still need to guess "what did parse with an envelope return?" and probe the answer with custom logic. I think parse should be strict and straightforward: you give me the schema and envelope, and I will return an object (any transformation included), but I will throw an error if something goes wrong. For anything else, use safeParse.
This results in a much lighter implementation, mostly adding the base64 use case. Using envelopes and a custom schema also means that the payload is a JSON object (except for EventBridge), whether it was base64 encoded, compressed, or stringified. We certainly need to highlight this point in our docs, so it is clear when to use envelopes and when to fall back to a custom schema (i.e. extending a built-in schema).
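To make the proposed contract concrete, a sketch of the two call styles (the result field names are illustrative):

```ts
import { z } from 'zod';
import { SqsEnvelope } from '@aws-lambda-powertools/parser/envelopes';

const orderSchema = z.object({ id: z.number() });

export const handler = async (event: unknown) => {
  // Strict: returns the transformed payload or throws on any failure
  // (base64 decoding, JSON parsing, schema validation, ...).
  const orders = SqsEnvelope.parse(event, orderSchema);

  // Safe: never throws; returns the error together with the original event.
  const result = SqsEnvelope.safeParse(event, orderSchema);
  if (!result.success) {
    console.error(result.error);
    // fall back to custom handling of result.originalEvent
  }

  return orders;
};
```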