[BUG] Error while checking bucket ownership with connector athena-federation-jdbc to mysql #288

Closed
existeundelta opened this issue Nov 7, 2020 · 18 comments

Comments

@existeundelta

GENERIC_USER_ERROR: Encountered an exception[java.lang.RuntimeException] from your LambdaFunction[arn:function:athena-federation-jdbc] executed in context[retrieving meta-data] with message[Error while checking bucket ownership for spill_bucket]

The query is executed in one account/VPC and runs against a MySQL database in another account/VPC. The VPCs are peered and have the correct security groups, and everything related to schema metadata works properly. The problem is that when the query itself is executed, it returns this error. I tried two buckets, one in each account, and I also tried a bucket in the account where the Athena query and the Lambda run, granting permissions to the other account.

The Lambda function's role has rights to access this bucket (spill_bucket) and the one for the Athena results.

mysql
java8

@existeundelta existeundelta added the bug label Nov 7, 2020
@avirtuos
Contributor

avirtuos commented Nov 7, 2020

Can you check the CloudWatch logs for the Lambda function? They should have a more detailed error with a stack trace. I suspect that your Lambda is unable to contact S3, possibly due to the lack of a route to S3 or no internet gateway. A quick way to test this theory is to add an S3 VPC endpoint to the subnet(s) your Lambda function is running in.

@existeundelta
Author

existeundelta commented Nov 7, 2020

it says:
INFO SpillLocationVerifier:67 - Spill bucket has been changed from null to [spill_bucket_name]
WARN CompositeHandler:104 - handleRequest: Completed with an exception.
java.lang.RuntimeException: Error while checking bucket ownership for [spill_bucket_name]

Caused by: java.net.SocketTimeoutException: connect timed out
then
Exception in thread "main" java.lang.Error: java.lang.OutOfMemoryError: Metaspace

So I think you are right, but I don't really understand, because the Lambda's role has policy permissions to access the S3 buckets (Athena results and spill_bucket). I'll try the quick way you propose to test that. If I understand correctly, the problem is not the permissions; it is that the Lambda can't "go" to S3?
Answered: https://stackoverflow.com/questions/60714724/how-lambda-connects-to-s3-inside-vpc

@avirtuos
Contributor

avirtuos commented Nov 7, 2020

Yes, the Lambda is running in a network (subnet) that has no route (path) to S3, either because there is no internet gateway or because of security groups, etc. Trying a VPC endpoint is the 'safest' way to test a fix, since there is virtually no risk that you'd open up your VPC to unwanted traffic from the internet. The VPC endpoint would only allow traffic to/from S3.
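
The "connect timed out" in the log above is a network-level failure, so it can be reproduced without any S3 permissions at all. A minimal sketch of such a probe, assuming it is run from inside the same subnet as the Lambda (for example from a temporary function or instance); the function name `can_connect` and the 3-second timeout are illustrative choices, not part of the connector:

```python
import socket


def can_connect(host: str, port: int = 443, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        # create_connection resolves the host and attempts the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers connect timeouts, refused connections, and DNS failures.
        return False


# Example call (the region is an assumption taken from later in the thread):
#   can_connect("s3.us-east-1.amazonaws.com")
```

If this returns False from the Lambda's subnet, the problem is routing rather than IAM, which matches the diagnosis above.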

@existeundelta
Author

thanks!

@avirtuos
Contributor

avirtuos commented Nov 7, 2020

Did it work? Can we close the issue?

@existeundelta
Author

existeundelta commented Nov 7, 2020

Still no access, but the error has changed to:
Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied;

the changes are:

  • Created an endpoint in the VPC to the S3 service
  • Added an endpoint to the S3 bucket with a policy of this type:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": "[s3-endpoint-arn]"
    }
  ]
}
I tried different ones but got a lot of ErrorPolicy errors on the resource.

  • Also added a policy for the S3 bucket itself
  • The spill_bucket is defined in the form: [test-name].s3-us-east-1.amazonaws.com
    I tried a non-existent bucket and got the same result, so it seems this "URI" may be what is wrong

Because the Athena console freezes, it is difficult to test this through the console.
So I think the VPC can now connect to S3, but the endpoint in S3 doesn't have the correct policy. Am I doing it right by creating two endpoints, one in the VPC and the other for the S3 bucket?
thanks!

@avirtuos
Contributor

avirtuos commented Nov 8, 2020

The console freezing only affects the left nav (it's a bug we are working on). You can still run queries.

@avirtuos
Contributor

avirtuos commented Nov 8, 2020

As for your permission error, you should not be using policies based on the endpoint ARN. That won't work, especially since when Athena goes to read that data it won't have access to that endpoint.
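
One reading of this advice: in an S3 VPC endpoint policy, the Resource element names S3 buckets and objects, not the endpoint itself. A hedged sketch of building such a policy document; the helper name `s3_endpoint_policy` and the `arn:aws` partition are assumptions, and whether you want `s3:*` at all is your call:

```python
import json


def s3_endpoint_policy(bucket_names):
    """Build an endpoint policy whose resources are bucket and object ARNs."""
    resources = []
    for name in bucket_names:
        resources.append(f"arn:aws:s3:::{name}")    # the bucket itself
        resources.append(f"arn:aws:s3:::{name}/*")  # every object in it
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": resources,
            }
        ],
    }


# Example: print(json.dumps(s3_endpoint_policy(["my-spill-bucket"]), indent=2))
```

Note that a policy scoped to specific buckets like this would not cover an account-wide bucket listing, which later comments in this thread say the connector performed at the time.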

@existeundelta
Author

existeundelta commented Nov 8, 2020

Yes, thanks. I have now deleted the S3 endpoint (and so its policy too), because I get the error whatever bucket name I use: I get the same Access Denied even with a bucket name that doesn't exist. I'll try to reproduce the access through an EC2 instance so I can understand it better.

"The console freezing only affects the left nav (it's a bug we are working on). You can still run queries."
OK, thanks, but it really affects the whole tab (the entire tab freezes).

@avirtuos
Contributor

avirtuos commented Nov 8, 2020

It's causing the entire console to freeze for you?

@existeundelta
Author

existeundelta commented Nov 8, 2020

Yes, the entire tab (Chrome). If I wait a few minutes after the freeze it recovers, and the trick is to keep it open; if the page is refreshed, it happens again.

@existeundelta
Author

existeundelta commented Nov 9, 2020

What form should the bucket name in spill_bucket take to access it through an endpoint in a VPC?
bucketname
or [bucketname].s3.[region].amazonaws.com
or maybe something else?
We ask because we think the problem is that the Lambda is not checking bucket ownership on the right bucket.
We followed s3-private-connection-no-authentication,
but it doesn't say what form the "bucketname" should take.

These are the ways to access a bucket:
UsingBucket

the problem is in

https://github.com/awslabs/aws-athena-query-federation/blob/master/athena-federation-sdk/src/main/java/com/amazonaws/athena/connector/lambda/handlers/MetadataHandler.java
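
On the bucket-name question: S3 API calls take the bare bucket name, and a VPC endpoint does not change that, so an endpoint hostname in spill_bucket is unlikely to be what the Lambda expects. A rough heuristic check, assuming spill_bucket is meant to hold only the name; `is_bare_bucket_name` is a hypothetical helper and it deliberately does not enforce the full S3 naming rules:

```python
def is_bare_bucket_name(value: str) -> bool:
    """Heuristically reject URI and endpoint-hostname forms of a bucket reference."""
    v = value.strip()
    if not v:
        return False
    # Reject URI forms such as s3://bucket or https://host/bucket.
    if "://" in v or "/" in v:
        return False
    # Reject endpoint hostnames such as bucket.s3.us-east-1.amazonaws.com.
    if v.endswith(".amazonaws.com"):
        return False
    return True
```

For the two candidates in the comment above, the first form passes this check and the hostname form does not.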

@kpkab

kpkab commented Jun 3, 2021

hi

Is this issue still open? I'm trying to use Athena Federated Query to access data in an RS cluster and I'm getting the same error.

GENERIC_USER_ERROR: Encountered an exception[java.lang.RuntimeException] from your LambdaFunction[arn:aws:lambda:us-east-1:11111111111:function:lambdafunctionname] executed in context[retrieving meta-data] with message[Error while checking bucket ownership for my_spill_bucket_name]

@ianbrumby

Hello,

I was struggling with this error for a long time. It turns out that the Lambda Role needs permissions to List all the S3 Buckets in the account, not just the spill bucket.
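
A minimal sketch of a role policy along these lines. `s3:ListAllMyBuckets` is the IAM action behind listing every bucket in the account and must be granted on resource `*`; the object-level and bucket-level actions shown for the spill bucket are assumptions about what a spill workflow needs, and `spill_role_policy` is a hypothetical helper name:

```python
import json


def spill_role_policy(spill_bucket: str, allow_list_all_buckets: bool = True) -> dict:
    """Build an IAM policy document for a Lambda role that uses a spill bucket."""
    bucket_arn = f"arn:aws:s3:::{spill_bucket}"
    statements = [
        {
            # Assumed object-level access for reading and writing spilled data.
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
            "Resource": f"{bucket_arn}/*",
        },
        {
            # Bucket-level access scoped to the spill bucket only.
            "Effect": "Allow",
            "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
            "Resource": bucket_arn,
        },
    ]
    if allow_list_all_buckets:
        # Account-wide listing, as described in the comment above.
        statements.append(
            {"Effect": "Allow", "Action": "s3:ListAllMyBuckets", "Resource": "*"}
        )
    return {"Version": "2012-10-17", "Statement": statements}


# Example: print(json.dumps(spill_role_policy("my-spill-bucket"), indent=2))
```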

@akuzin1 akuzin1 self-assigned this Apr 3, 2023
@akuzin1 akuzin1 closed this as completed Oct 23, 2023
@akuzin1
Contributor

akuzin1 commented Oct 23, 2023

@ianbrumby is correct. However, this should be accounted for by the default Lambda execution role. If the issue persists, please open a new issue for better traceability. Thanks!

@evbo

evbo commented Jan 15, 2024

@akuzin1 the problem is that the Lambda calls listBuckets. This is a pretty big caveat, because if you're in a private VPC there's no simple way to interact with buckets in other regions.

For instance, my connector might be set up in us-west-2 while my account also has buckets in us-west-1. Calling listBuckets will fail since my VPC endpoint for S3 must be in the same region as the Lambda. Perhaps VPC peering could be set up, but that is a lot of complexity just to satisfy a listBuckets call.

As you suggest, for traceability I've opened a new issue: #1702

@akuzin1
Contributor

akuzin1 commented Jan 24, 2024

Thanks for reopening the issue; we will prioritize addressing the ticket this week.

@aimethed
Contributor

Just to fully resolve this: I believe we updated the permissions check to use headBucket instead of listBuckets, which requires fewer permissions and allowed us to remove the need for the listBuckets S3 permission.
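
To illustrate the difference between the two approaches, here is a sketch in Python rather than the connector's actual Java code. It assumes a boto3-style S3 client is passed in as `s3`; `list_buckets` and `head_bucket` (with its `ExpectedBucketOwner` parameter) are real boto3 calls, but the two function names and the use of an expected owner ID are illustrative assumptions:

```python
def owns_bucket_via_list(s3, bucket: str) -> bool:
    """Account-wide check: lists every bucket, so it needs s3:ListAllMyBuckets."""
    names = {b["Name"] for b in s3.list_buckets().get("Buckets", [])}
    return bucket in names


def check_bucket_via_head(s3, bucket: str, expected_owner: str) -> None:
    """Per-bucket check: touches only the named bucket and needs s3:ListBucket on it.

    The client raises an error if the bucket is missing, access is denied,
    or the bucket belongs to a different account than expected_owner.
    """
    s3.head_bucket(Bucket=bucket, ExpectedBucketOwner=expected_owner)


# Example wiring (requires boto3 and credentials; not run here):
#   import boto3
#   check_bucket_via_head(boto3.client("s3"), "my-spill-bucket", "111111111111")
```

The per-bucket call only ever addresses the spill bucket, which is why it avoids the account-wide listing described in the earlier comments.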
