Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

presto requires namespaces when using function registration #24363

Open
soumiiow opened this issue Jan 14, 2025 · 3 comments
Open

presto requires namespaces when using function registration #24363

soumiiow opened this issue Jan 14, 2025 · 3 comments
Assignees
Labels

Comments

@soumiiow
Copy link

soumiiow commented Jan 14, 2025

Your Environment

  • Presto version used:
  • Storage (HDFS/S3/GCS..):
  • Data source and connector used:
  • Deployment (Cloud or On-prem):
  • Pastebin link to the complete debug logs:

Expected Behavior

Issue: While using registerFunctions to register a function in Presto, it requires a format following catalog.schema.function_name. While this makes sense, it begs the following question when implementing a dynamically loading library in Presto Cpp:
Should we gracefully create a user experience that if a user were to not follow the guidelines in this instance, it does not crash the worker?

Current Behavior

Background: When loading functions (types, or anything else for that matter) dynamically, the DynamicLibraryLoader loads the symbols without knowing what is being loaded in the first place. While this is known behavior for the dlopen library, it can lead to abrupt terminations of the worker. While the burden of responsibility does fall on the user to register functions properly, it can be confusing especially when there seems to be no straightforward way for me to declare a default namespace for these registrations. Moreover, another source of confusion can arise from Velox's function registrations which does not require namespaces at this time.

PR for dynamically loading library in Presto CPP [https://github.com//pull/24330]

as an example, if I attempt to register this function dynamically, my worker crashes and leads to the following error:

facebook::velox::registerFunction<
      facebook::velox::common::dynamicRegistry::Dynamic123Function,
      int64_t>({"dynamic_3"});
}
terminate called after throwing an instance of 'facebook::velox::VeloxUserError'
  what():  Exception: VeloxUserError
Error Source: USER
Error Code: INVALID_ARGUMENT
Reason: Prefix missing for function dynamic_3
Retriable: False
Expression: parts.size() == 3
Function: getFunctionNameParts

Possible Solution

after discussing with @aditi-pandit and @czentgr, there seem to be four potential ways to handle this. Which way, if any would be the best way to move forward?

  1. Create a wrapper function registry in Presto following roughly this format:

facebook::presto::registerFunction<...>(std::string functionName, std::string functionNamespace)

This function can perform some checks and then call the velox function registry with the proper Presto function naming.

  1. Create a registerNamespace<> function where each user will be required to register a namespace for every dynamic library they wish to load one time. After that, all function registries will follow the namespace name that was registered for this file.

for the above solution, there is a caveat that the user will have to separate any functions they want to add to a different namespace onto a separate file.

  1. Create a config for a namespace value. this may be unflexible and would lead to having the user restart the server each time they want to make a change to the namespace or use multiple configs somehow.

  2. REST API calls to dynamically load namespaces. This was we can deploy without having to restart the presto server. Not sure how we will be able to connect which namespace correspond to which dynamic function registrations.

Steps to Reproduce

  1. cherry-pick the changes in Dynamically Linked Library in Presto CPP #24330
  2. cherry pick the velox changes in Dynamically Linked Library in CPP facebookincubator/velox#11439
  3. Run presto server with sidecar worker, the sidecar worker will crash
  4. to get rid of the error, add "default.namespace." to the function names, they will register successfully.
@majetideepak
Copy link
Collaborator

majetideepak commented Jan 14, 2025

This error seems to be coming from a check inside FunctionMetadata.cpp::getFunctionNameParts which gets invoked inside FunctionMetadata.cpp::getFunctionsMetadata(). This function is invoked when a rest call of /v1/functions is made likely by the coordinator.
The catalog.schema prefix is a Presto artifact. How are the Java UDFs registered?

@tdcmeehan
Copy link
Contributor

Why does Velox require a prefix for the function? I thought functions in Velox had no concept of namespaces.

Java UDFs may be registered under the default namespace, or with a custom namespace. In C++, I would expect we need the ability to declare a function under a particular namespace, with this namespace reflecting upward to the execution engine through the sidecar process.

@soumiiow
Copy link
Author

@tdcmeehan apologies for the disconnect, i agree that velox does not reqire a prefix to the function name!! I was just trying to mention that as a way to mention the different requirements for velox and presto which IMO means we deal with it differently between the two libraries. this requirement comes from the presto side from FunctionMetadata.cpp

Moreover, another source of confusion can arise from Velox's function registrations which does not require namespaces at this time.

the prefix is checked in FunctionMetadata.cpp in the function getFunctionNameParts

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: 🆕 Unprioritized
Development

No branches or pull requests

5 participants