-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for groovy static analysis for groovy scripts #14844
base: master
Are you sure you want to change the base?
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #14844 +/- ##
============================================
+ Coverage 61.75% 63.46% +1.71%
- Complexity 207 1483 +1276
============================================
Files 2436 2749 +313
Lines 133233 154575 +21342
Branches 20636 23823 +3187
============================================
+ Hits 82274 98097 +15823
- Misses 44911 49080 +4169
- Partials 6048 7398 +1350
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
I think the checks are disabled by default. Can you create a doc on various configurations supported by Pinot? |
d7077e8
to
93dc362
Compare
Github issue link: #14995 |
@soumitra-st I have created a github issue and linked the document. Please review it. Thank you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Please add the new config in the PR description, and call out a) where does it need to be set (server/minion/controller conf or cluster config) b) is restart required when it's changed |
Nice! Is this another attempt of #14197? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is static analysis, we should apply the analysis in 2 places:
- When table config gets created/updated, we want to validate the groovy transforms configured
- When broker gets a query, we want to validate the groovy transform within it
Take a look at where we block groovy functions right now. We can integrate this logic at the same place.
The analysis applied at both the suggested places.
|
93dc362
to
9812e00
Compare
...-local/src/main/java/org/apache/pinot/segment/local/function/GroovyConfigChangeListener.java
Outdated
Show resolved
Hide resolved
9812e00
to
5f19543
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not able to find the related change in controller and broker code. Can you point me to the class changed?
pinot-common/src/main/java/org/apache/pinot/common/metrics/ControllerMeter.java
Outdated
Show resolved
Hide resolved
@@ -38,7 +38,8 @@ public enum MinionMeter implements AbstractMetrics.Meter { | |||
SEGMENT_BYTES_UPLOADED("bytes", false), | |||
RECORDS_PROCESSED_COUNT("rows", false), | |||
RECORDS_PURGED_COUNT("rows", false), | |||
COMPACTED_RECORDS_COUNT("rows", false); | |||
COMPACTED_RECORDS_COUNT("rows", false), | |||
GROOVY_SECURITY_VIOLATIONS("exceptions", true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's clean up the unrelated changes. The metric should be emitted from controller for table config, and broker for query
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have removed the metric part.
...ent-local/src/main/java/org/apache/pinot/segment/local/function/GroovyFunctionEvaluator.java
Outdated
Show resolved
Hide resolved
...ent-local/src/main/java/org/apache/pinot/segment/local/function/GroovyFunctionEvaluator.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently we apply groovy check in 2 places:
- In
TableConfigUtils
(see the usage of_disableGroovy
field) - In
BaseSingleStageBrokerRequestHandler
(see the usage of_disableGroovy
field)
Can we apply the static analysis at the same place? Applying it during ingestion/query execution might already be too late.
...ent-local/src/main/java/org/apache/pinot/segment/local/function/GroovyFunctionEvaluator.java
Outdated
Show resolved
Hide resolved
* @param groovyConfig GroovyStaticAnalyzerConfig instance to be used for static syntax analysis. | ||
* @return GroovyShell instance with static syntax analysis. | ||
*/ | ||
private GroovyShell createSafeShell(Binding binding, GroovyStaticAnalyzerConfig groovyConfig) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you anticipate performance overhead for the safe shell?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. There should not be any performance impact. The CompilerConfiguration
is created only once and reused. It is updated everytime static analyzer config is updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see the change in org.apache.pinot.segment.local.utils.TableConfigUtils
. That is needed for the ingestion groovy validation
if (handlerContext._disableGroovy) { | ||
rejectGroovyQuery(serverPinotQuery); | ||
} | ||
rejectGroovyQuery(serverPinotQuery, handlerContext._disableGroovy); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest renaming it to validateGroovy
@@ -29,6 +29,8 @@ | |||
|
|||
|
|||
public class CommonConstants { | |||
public static final String GROOVY_STATIC_ANALYZER_CONFIG = "pinot.server.groovy.static.analyzer"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This shouldn't be added as top level config. We need this config for both controller and broker, and they can potentially be different (one is for ingestion, and one is for query). Take a look at the subclass Controller
and Broker
within this class.
We can also consider adding a common one to be used when there is no controller/broker specific config
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OKay. As a single configuration, it was added at the top level. I will create separate configuration for controller (ingestion) and broker (query).
For ingestion, under Controller subclass: pinot.ingestion.groovy.static.analyzer
For query, under Broker subclass: pinot.query.groovy.static.analyzer
Common configuration, as top level config in the CommonConstants class: pinot.groovy.static.analyzer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will use service name only and not use case specific name like pinot.broker.*
and 'pinot.controller.*` to continue the existing pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With this, I think, we can avoid the specific REST APIs too for the groovy config and use POST /cluster/configs
API to configure the groovy static analyzer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. Since we are not keeping the instance type prefix, we can put them in the top level, or add a subclass Groovy
. Let's not put it in the beginning of the file, and also add some documentation for them.
Regarding the REST API, the ser/de of this config is not easy, so I'd suggest keeping the REST API, but add a type param which can be ingestion
, query
or all
. Same for the GET request, where we perform the same fallback logic based on the type asked
The docs mention that security is required in Minions as well. However I dont see the security apparatus setup in Minions. Is that required ? |
@vrajat Minion should be fine because it pulls the table config which is already validated on the controller |
...-controller/src/main/java/org/apache/pinot/controller/api/resources/PinotClusterConfigs.java
Outdated
Show resolved
Hide resolved
@@ -29,6 +29,8 @@ | |||
|
|||
|
|||
public class CommonConstants { | |||
public static final String GROOVY_STATIC_ANALYZER_CONFIG = "pinot.server.groovy.static.analyzer"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. Since we are not keeping the instance type prefix, we can put them in the top level, or add a subclass Groovy
. Let's not put it in the beginning of the file, and also add some documentation for them.
Regarding the REST API, the ser/de of this config is not easy, so I'd suggest keeping the REST API, but add a type param which can be ingestion
, query
or all
. Same for the GET request, where we perform the same fallback logic based on the type asked
@Jackie-Jiang one thing to note is, the controller config would be applied on the table update and creation. It would not apply on any existing ingestion config. For that it need to be init the config in minion and other external jobs (hadoop, spark etc). |
Github Issue: #14995
Google Doc: https://docs.google.com/document/d/10-j1hevpwOWzaU8q0ndqv3toRBWTiPfOWy6xHojoiL4
It adds the support for static analysis for the groovy functions. Users can configure the allowed receivers, allowed imports, allowed static imports, disallowed method names, toggle method definition allowed in the groovy scripts.
A groovy listener to update the configuration on the server nodes.
Testing