[STORM-3683] Check if JVM options used for launching worker are valid. #3319
base: master
@@ -1907,4 +1909,201 @@ private void readArchive(ZipFile zipFile) throws IOException {
}
}
}

/**
* Return path to the Java command "x", prefixing with $}{JAVA_HOME}/bin/ if JAVA_HOME system property is defined.
extra } bracket in $}{JAVA_HOME}
fixed
/**
* Return path to the Java command "x", prefixing with $}{JAVA_HOME}/bin/ if JAVA_HOME system property is defined.
* Otherwie return the supplied Java command unmodified.
Otherwise spelling
fixed
* @param cmd Java command, e.g. "java", "jar" etc.
* @return command string to use.
*/
public static String getJavaCmd(String cmd) {
Thanks for adding this.
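For readers following the thread, here is a minimal sketch of the behavior the Javadoc describes: prefix the command with the JAVA_HOME bin directory when JAVA_HOME is defined, otherwise return the command unmodified. The class name `JavaCmdSketch` and the two-argument overload are hypothetical and exist only to make the logic easy to exercise; this is not the code from the PR.

```java
import java.io.File;

public class JavaCmdSketch {
    /**
     * Prefix cmd with javaHome/bin/ when javaHome is set;
     * otherwise return cmd unchanged. (Hypothetical helper for illustration.)
     */
    public static String getJavaCmd(String cmd, String javaHome) {
        if (javaHome == null || javaHome.isEmpty()) {
            return cmd;
        }
        return javaHome + File.separator + "bin" + File.separator + cmd;
    }

    /**
     * Assumption: JAVA_HOME is looked up as a system property,
     * as the Javadoc under review states.
     */
    public static String getJavaCmd(String cmd) {
        return getJavaCmd(cmd, System.getProperty("JAVA_HOME"));
    }

    public static void main(String[] args) {
        // Prints "java" unless the JVM was started with -DJAVA_HOME=...
        System.out.println(getJavaCmd("java"));
    }
}
```

Passing the home directory as a parameter keeps the path logic testable without touching global state.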
@@ -568,14 +570,7 @@ private int getMemOffHeap(WorkerResources resources) {
}

protected String javaCmd(String cmd) {
Is there any need to keep this method now that it is available in Utils?
When I took this out, a bunch of test classes started failing. They were using a Mock class to override this class method to return just "java". And in the test, they check the returned array. In order to avoid changing a whole bunch of other classes, I left this signature unchanged. But this definitely looks ugly and the tests should be fixed.
Sounds good. I'm fine keeping this for now.
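The trade-off discussed here can be sketched as follows: a thin protected instance method delegates to the shared static helper, and a test double overrides it so that assertions on the launch command do not depend on the local JAVA_HOME. All class and method names below (`JavaCmdSeamSketch`, `Container`, `MockContainer`, `staticJavaCmd`, `launchCommand`) are hypothetical stand-ins, not the actual Storm classes.

```java
import java.util.ArrayList;
import java.util.List;

public class JavaCmdSeamSketch {
    /** Stand-in for the shared static helper (simplified, hypothetical). */
    public static String staticJavaCmd(String cmd) {
        String home = System.getProperty("JAVA_HOME");
        return home == null ? cmd : home + "/bin/" + cmd;
    }

    /** Stand-in for the class that builds the worker launch command. */
    public static class Container {
        // Thin instance wrapper, kept only so that tests can override it.
        protected String javaCmd(String cmd) {
            return staticJavaCmd(cmd);
        }

        public List<String> launchCommand() {
            List<String> command = new ArrayList<>();
            command.add(javaCmd("java"));
            command.add("-server");
            return command;
        }
    }

    /** Test double that pins the command to the bare name. */
    public static class MockContainer extends Container {
        @Override
        protected String javaCmd(String cmd) {
            return cmd;
        }
    }
}
```

Removing the instance method would take away this override point, which is consistent with the test failures described above; the cleaner long-term fix is for the tests to stop depending on it.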
public static boolean validateWorkerLaunchOptions(Map<String, Object> supervisorConf, Map<String, Object> topoConf,
Map<String, Object> substitutions, boolean throwExceptionOnFailure)
throws InvalidTopologyException {
// from storm-server/.../BasicContainer.mkLaunchCommand
Can we commonize this, or grab them from a BasicContainer method? Hard to maintain this.
BasicContainer is in storm-server package which depends on the storm-client package for compile.
Should be commonized somehow - otherwise this is a hidden forward dependency. Will look into this.
Note that storm-server:BasicContainer.java is now using storm-client:Utils.WellKnownRuntimeSubstitutionVars. Is this sufficient?
I'd like others to chime in here. I like the intent of this change, but I'm not happy with how specifically we're replicating the code here. I understand the dependency issue causes problems.
I don't like the replication either. It adds a lot of maintenance overhead. The suggestion I can think of is to move the whole check to the server side at Nimbus.submitTopologyWithOpts, which is better in a few ways. But it has its own problem to be taken care of (see my comments above).
Some thoughts on this:
This feature (check GC option conflicts) is nice to have. But it is not hard to find out that workers fail because of GC option conflicts. So I think we don't have to implement this feature if there is no good/clean way to implement it.
close/reopen for rebuild
@@ -528,6 +528,7 @@ private static void validateConfs(Map<String, Object> topoConf, StormTopology to
InvalidTopologyException, AuthorizationException {
ConfigValidation.validateTopoConf(topoConf);
Utils.validateTopologyBlobStoreMap(topoConf);
Utils.validateWorkerLaunchOptions(null, topoConf, null, true);
A validation here is not likely to prevent the issue, since it doesn't use the supervisorConf. BasicContainer, however, does use supervisorConf, which might have some default values.
We can add the check inside Nimbus.submitTopologyWithOpts (server side) and, if the check fails, throw InvalidTopologyException. This avoids code duplication and avoids exposing too many methods on the client side.
But the problem remains that it will use nimbusConf instead of supervisorConf, unless we can find a clean way to use the same source of truth.
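To make the source-of-truth concern concrete, here is a rough sketch of how the options a worker actually receives can differ from what a client-side check sees. It assumes the cluster-level `worker.childopts` and the topology-level `topology.worker.childopts` are simply concatenated; the real assembly in BasicContainer involves more keys and substitutions, and the class and method names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class ConfSourceSketch {
    // Storm config keys; how they are combined below is a simplification.
    static final String WORKER_CHILDOPTS = "worker.childopts";
    static final String TOPOLOGY_WORKER_CHILDOPTS = "topology.worker.childopts";

    /** Append the whitespace-separated options stored under key, if any. */
    private static void addOpts(List<String> out, Map<String, Object> conf, String key) {
        if (conf == null) {
            return; // e.g. the client passes null because it has no supervisorConf
        }
        Object value = conf.get(key);
        if (value instanceof String && !((String) value).trim().isEmpty()) {
            out.addAll(Arrays.asList(((String) value).trim().split("\\s+")));
        }
    }

    /** Options seen by a check that is given these two configuration maps. */
    public static List<String> effectiveChildOpts(Map<String, Object> clusterConf,
                                                  Map<String, Object> topoConf) {
        List<String> opts = new ArrayList<>();
        addOpts(opts, clusterConf, WORKER_CHILDOPTS);
        addOpts(opts, topoConf, TOPOLOGY_WORKER_CHILDOPTS);
        return opts;
    }
}
```

With a null cluster-level map, the check only sees the topology's own options, so a conflict between a supervisor default and a topology option would go unnoticed. Running the same check with nimbusConf narrows the gap but still assumes Nimbus and the supervisors share the same defaults.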
I think a good reason for catching this early (i.e. at submission time) is that the cycle time for error detection is otherwise longer. A case in point is the JVM option changes and the debug cycle yesterday, just to discover the problem with the profiler options.
It is not hard to find out that the JVM didn't launch because of a JVM option issue (GC option, profiler option, or other options), since they are in the logs. But if this feature is really desired, I would suggest moving it to Nimbus code. Doing it in Another advantage of doing it at But after moving it to Nimbus code, we need to deal with the nimbusConf vs supervisorConf issue (although it is a less serious problem).
What is the purpose of the change
An error in the JVM options can cause the worker launch to fail. This change detects erroneous options early, when the topology is submitted.
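One plausible way to perform such a check, shown here only as a sketch and not necessarily how this PR implements it, is to start a throwaway JVM with the candidate options plus `-version` and inspect the exit code. The class and method names are hypothetical, and the sketch assumes the running JVM's own `java.home` points at a usable `java` binary.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public class JvmOptionCheckSketch {
    /**
     * Returns true if a JVM started with the given options exits with status 0.
     * Assumption: the java binary under the current java.home is representative
     * of the JVM that would launch the worker.
     */
    public static boolean jvmAccepts(List<String> jvmOptions) {
        String javaBin = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
        List<String> command = new ArrayList<>();
        command.add(javaBin);
        command.addAll(jvmOptions);
        command.add("-version"); // exit right after option parsing and VM start-up

        try {
            Process process = new ProcessBuilder(command)
                    .redirectErrorStream(true) // merge stderr into stdout
                    .start();
            // Drain the output so the child cannot block on a full pipe.
            try (InputStream out = process.getInputStream()) {
                byte[] buffer = new byte[4096];
                while (out.read(buffer) != -1) {
                    // output is discarded in this sketch
                }
            }
            return process.waitFor() == 0;
        } catch (IOException e) {
            throw new IllegalStateException("Could not run " + javaBin, e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while checking JVM options", e);
        }
    }
}
```

A check like this only proves the options are accepted by the JVM on the machine running it, which is the nimbusConf vs supervisorConf concern raised in the review in another form.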
How was the change tested
With new unit tests