AWS Lambda to validate a Bagit bag stored in S3.
- To preview a list of available Makefile commands:
make help
- To install with dev dependencies:
make install
- To update dependencies:
make update
- To run unit tests:
make test
- To lint the repo:
make lint
https://docs.aws.amazon.com/lambda/latest/dg/images-test.html
- Build the container:
docker build -t validator:latest .
- Run the default handler for the container:
docker run --env-file .env -p 9000:8080 validator:latest
- Post to the container:
curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" -d '{}'
- Observe output:
"You have successfully called this lambda!"
This application includes a CLI that is designed to invoke the deployed AWS Lambda. This supports running AIP validation from a command line context, while still utilizing all the wiring and permissions of the deployed lambda.
To run CLI commands ensure the following environment variables are set:
WORKSPACE=### Environment "dev", "stage", or "prod"
AIP_VALIDATOR_ENDPOINT=### Deployed AWS Lambda endpoint URL.
CHALLENGE_SECRET=### Secret string that is passed as part of lambda invocation payload and checked before running.
Then run one of the following CLI commands:
Usage: -c [OPTIONS] COMMAND [ARGS]...
S3 BagIt Validator CLI.
Options:
-v, --verbose Flag for verbose output.
--help Show this message and exit.
Commands:
bulk-validate Bulk validate AIPs stored in S3 via the AIP UUID or S3 URI.
ping Ping deployed AWS lambda, ensure connection and...
validate Validate a single AIP stored in S3 via the AIP UUID or...
Usage: -c ping [OPTIONS]
Ping deployed AWS lambda, ensure connection and authorization.
Options:
--help Show this message and exit.
Example:
pipenv run cli ping
Usage: -c validate [OPTIONS]
Validate a single AIP stored in S3 via the AIP UUID or S3 URI.
The result is either 'OK' or the full validation response if the '--details'
is set.
Note: the timeout for the lambda HTTP request is quite long to accommodate
AIPs that take substantial time to validate. If there are connection issues
it is recommended to use the 'ping' CLI command to ensure firewall access
and authorization.
Options:
-a, --aip-uuid TEXT AIP UUID from Archivematica.
-u, --s3-uri TEXT Full S3 URI of AIP stored in S3.
-d, --details Return full AIP validation details as JSON to stdout
instead of 'OK'.
--help Show this message and exit.
Usage: -c bulk-validate [OPTIONS]
Bulk validate AIPs stored in S3 via the AIP UUID or S3 URI.
Options:
-i, --input-csv-filepath TEXT Filepath of CSV with AIP UUIDs or S3 URIs to
be validated. [required]
-o, --output-csv-filepath TEXT Filepath of CSV for validation results.
-d, --details Return full AIP validation details as JSON
to stdout instead of 'OK'.
-w, --max-workers INTEGER Maximum number of concurrent validation
workers. This should not exceed the maximum
concurrency for the deployed AWS Lambda
function.
--help Show this message and exit.
Examples:
# providing the AIP UUID
pipenv run cli --verbose validate --aip-uuid="c73d10a7-7cd2-406f-95b6-b12e8f2da646"
# providing the explicit S3 URI of the AIP
pipenv run cli --verbose validate --s3-uri="s3://my-bucket/c73d/10a7/7cd2/406f/95bf/b12e/8f2d/a646/my-amazing-aip-c73d10a7-7cd2-406f-95b6-b12e8f2da646"
# bulk validate against a list of AIPs in a CSV
pipenv run cli --verbose bulk-validate --input-csv-filepath="output/bulk-uuids.csv"
SENTRY_DSN=### If set to a valid Sentry DSN, enables Sentry exception monitoring. This is not needed for local development.
WORKSPACE=### Set to `dev` for local development, this will be set to `stage` and `prod` in those environments by Terraform.
CHALLENGE_SECRET=### Secret string that is passed as part of lambda invocation payload and checked before running
S3_INVENTORY_LOCATIONS=### Comma seperated list of S3 URIs that have S3 Inventory data for a particular bucket
WARNING_ONLY_LOGGERS=### Optionally set "WARNING" logging level for comma seperated list of libraries; e.g. asyncio,botocore,urllib3,s3transfer,boto3
INTEGRATION_TEST_BUCKET=### [Integration tests] Bucket to use for integration testing
INTEGRATION_TEST_PREFIX=### [Integration tests] Prefix for any uploaded AIPs as part of integration testing
CHECKSUM_NUM_WORKERS=### Number of parallel threads to use for checksum generation / retrieval; default 256
AIP_VALIDATOR_ENDPOINT=### Deployed AWS Lambda endpoint URL; required for CLI commands
LAMBDA_MAX_CONCURRENCY=### Maximum number of parallel workers for CLI bulk validation. This should not exceed the maximum concurrency of the deployed AWS Lambda.
- Infrastructure: TODO
mindmap
root((s3-bagit-validator))
- Team: DataEng
- Last Maintenance: 2025-03