Running BriskStream benchmarks #4

Open
mencagli opened this issue Jun 11, 2019 · 20 comments

@mencagli

Dear Tony,

I am trying to run the BriskStream benchmarks on my multi-core machine (stable branch). The compilation works fine, but I am not able to run any program. Since your scripts are quite complex, I am trying to run the WordCount benchmark by hand using my local dataset (I have changed the path in the configuration file). Unfortunately, the application hangs forever without processing anything. I run it using a command like:

java -cp target/BriskBenchmarks-1.2.0-jar-with-dependencies.jar applications.BriskRunner

which should use the default configuration and the default application (WordCount). By inspecting the code (and adding some debug prints), I discovered that it hangs in ExecutionGraph.java, in the Loading function (inside the two nested loops). I gather there is some problem in loading the configuration: the topology never starts and does not process any data.

Can you please help me in understanding how to run BriskStream?

Thanks!

Gabriele

@ShuhaoZhangTony
Collaborator

Hi Gabriele,

I guess the system hangs in the "load()" function (see the following code). The reason is that it tries to load a statistics file required by the cost model, but you haven't prepared it yet.

    while (!tmpfile.exists()) {
        target = executionNode.getOP() + Math.max(1, (numTasks--)) + srcNode.getOP();
        tmpfile = new File(dir + OsUtils.OS_wrapper(target + ".txt"));
    }

If you want to test the raw performance of BriskStream without invoking the cost model and the optimization process, you can append a "--native" argument, e.g., "java -cp target/BriskBenchmarks-1.2.0-jar-with-dependencies.jar applications.BriskRunner --native".

If you want to test the optimization, you first need to profile the application by specifying a "--profile" argument. An example argument list is as follows (you can find it in the "nus" script):
-st 1 -sit 1 -tt 40 --num_socket 8 --num_cpu 18 --THz 500000 --runtime 5000 --loop 100000 --size_tuple 10 --repeat 1 -bt 100 --percentile 50

After profiling is done, you should see files like the following:
../briskstream/STAT/wc/50/10/wordCount1splitSentence.txt
Then, you can test briskstream with optimization enabled.
An example argument is as follows.
--THz 500000 --runtime 30 --loop 1000 --num_socket 8 --num_cpu 10 --size_tuple 10

You can find the full list of arguments in the "abstractRunner.Java" class.
Hope it helps. :)

I'll pin this issue in case someone else encounters the same problem.
Thanks!
Tony.

@ShuhaoZhangTony ShuhaoZhangTony pinned this issue Jun 11, 2019
@mencagli
Author

Thanks for the quick reply!!

I tried "java -cp target/BriskBenchmarks-1.2.0-jar-with-dependencies.jar applications.BriskRunner --native" and that problem is now solved. Unfortunately, a NullPointerException is still raised:

2019-06-11 14:10:20 INFO Optimizer:54 - number of CPUs:8
2019-06-11 14:10:20 INFO Optimizer:55 - number of num_socket:1
2019-06-11 14:10:20 INFO Optimizer:56 - GC factor:1.0
2019-06-11 14:10:20 INFO TopologySubmitter:64 - DB initialize starts @2019-06-11T14:10:20.173+02:00
Exception in thread "main" java.lang.NullPointerException
at brisk.topology.TopologySubmitter.submitTopology(TopologySubmitter.java:69)
at applications.BriskRunner.runTopologyLocally(BriskRunner.java:100)
at applications.BriskRunner.run(BriskRunner.java:415)
at applications.BriskRunner.main(BriskRunner.java:91)

Probably there are still some configuration issues.

Thanks a lot for your help and guidance with this.

@ShuhaoZhangTony
Collaborator

ShuhaoZhangTony commented Jun 11, 2019

No problem. :) In fact, I should apologize for not having time to make it more user-friendly.

For the NullPointerException, please ensure that enable_shared_state (line 62) is set to false.
You can find this flag in common/src/main/java/applications/CONTROL.Java.
This flag is experimental and should be disabled when testing BriskStream.

Another suggestion: when you test the performance of BriskStream, remember to use the "-bt" argument to configure the jumbo tuple size. A good starting point is "-bt 10", meaning that 10 tuples are merged into a single jumbo tuple. This reduces instruction cache misses and improves cross-operator communication efficiency.
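To make the jumbo-tuple idea concrete, here is a minimal Java sketch, assuming a simple producer that hands whole batches to a downstream queue. The class JumboBatcher and its methods are hypothetical and not BriskStream's implementation; the point is only that with a batch size of 10 the downstream queue sees one hand-off per 10 tuples instead of one per tuple.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of jumbo-tuple batching; NOT BriskStream's implementation.
// batchSize plays the role of the "-bt" argument.
class JumboBatcher<T> {
    private final int batchSize;
    private final BlockingQueue<List<T>> channel; // queue to the downstream operator
    private List<T> pending;

    JumboBatcher(int batchSize, BlockingQueue<List<T>> channel) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
        this.channel = channel;
        this.pending = new ArrayList<>(batchSize);
    }

    // Buffers one tuple; a full batch is handed over with a single queue operation.
    void emit(T tuple) {
        pending.add(tuple);
        if (pending.size() == batchSize) {
            flush();
        }
    }

    // Pushes a partially filled batch, e.g. at the end of the stream.
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        try {
            channel.put(pending);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while emitting a batch", e);
        }
        pending = new ArrayList<>(batchSize);
    }

    int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        BlockingQueue<List<Integer>> queue = new LinkedBlockingQueue<>();
        JumboBatcher<Integer> batcher = new JumboBatcher<>(10, queue); // like "-bt 10"
        for (int i = 0; i < 25; i++) {
            batcher.emit(i);
        }
        batcher.flush();
        System.out.println(queue.size() + " batches for 25 tuples");
    }
}
```

Running main prints "3 batches for 25 tuples": two full batches plus one flushed partial batch.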

More details can be found in the paper. A personal version of the paper is here https://www.comp.nus.edu.sg/~shuhao-z/docs/briskstream.pdf

Thanks!
Tony

@mencagli
Copy link
Author

Thanks a lot!
Everything seems to start now. Unfortunately, there are still some problems:

Exception in thread "Operator:splitSentence Executor ID:2" java.lang.NullPointerException
at java.lang.String.<init>(String.java:166)
at applications.bolts.wc.SplitSentenceBolt.execute(SplitSentenceBolt.java:102)
at brisk.components.operators.executor.BasicBoltBatchExecutor.execute(BasicBoltBatchExecutor.java:38)
at brisk.execution.runtime.boltThread._execute_noControl(boltThread.java:183)
at brisk.execution.runtime.boltThread._execute(boltThread.java:193)
at brisk.execution.runtime.executorThread.routing(executorThread.java:359)
at brisk.execution.runtime.boltThread.run(boltThread.java:267)

It seems that SplitSentenceBolt raised an exception. But it is good that the topology is running now. Today I will try to understand what happened, and if needed I will post some updates tomorrow, hoping to get some help from you!

Thanks!

Gabriele

@ShuhaoZhangTony
Collaborator

ShuhaoZhangTony commented Jun 11, 2019

Ha, I guess that's related to the input dataset.
I assumed "," as the delimiter.
The input data looks like "eating, an, apple, is, good".
You can change the delimiter (for example, to a space) at line 100 of SplitSentenceBolt:
String[] split = new String(value).split(" ");
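A slightly more defensive variant of that line, as a hypothetical sketch: it assumes the payload arrives as a UTF-8 byte[] (the real field type in SplitSentenceBolt may differ), makes the delimiter a configurable regular expression, trims the spaces that follow each comma in the sample input, and skips null payloads.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, NOT the actual SplitSentenceBolt code.
// Assumes the payload is a byte[] holding one UTF-8 encoded line.
class SentenceSplitter {
    private final Pattern delimiter;

    SentenceSplitter(String delimiterRegex) {
        this.delimiter = Pattern.compile(delimiterRegex);
    }

    // Returns the trimmed, non-empty words of one line; a null payload
    // yields an empty list instead of a NullPointerException.
    List<String> split(byte[] value) {
        List<String> words = new ArrayList<>();
        if (value == null) {
            return words;
        }
        String line = new String(value, StandardCharsets.UTF_8);
        for (String token : delimiter.split(line)) {
            String word = token.trim();
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        SentenceSplitter comma = new SentenceSplitter(",");    // the format assumed above
        SentenceSplitter space = new SentenceSplitter("\\s+"); // whitespace-separated text
        System.out.println(comma.split("eating, an, apple, is, good".getBytes(StandardCharsets.UTF_8)));
        System.out.println(space.split("eating an apple is good".getBytes(StandardCharsets.UTF_8)));
    }
}
```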
Thanks!
Tony

@mencagli
Author

Dear Tony,

WordCount is running well. I am collecting performance results now. Thanks.

I would also like to run FraudDetection. I have the two dataset files (credit-card.dat and model.txt). I am trying to run the application using the following command:

java -cp target/BriskBenchmarks-1.2.0-jar-with-dependencies.jar applications.BriskRunner --native -bt 10 -a FraudDetection

and I immediately receive the following exception:

Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.ArrayList.rangeCheck(ArrayList.java:657)
at java.util.ArrayList.get(ArrayList.java:433)
at brisk.controller.output.PartitionController.<init>(PartitionController.java:97)
at brisk.controller.output.partition.FieldsPartitionController.<init>(FieldsPartitionController.java:31)
at brisk.execution.ExecutionGraph.partitionController_create(ExecutionGraph.java:370)
at brisk.execution.ExecutionGraph.init_pc(ExecutionGraph.java:338)
at brisk.execution.ExecutionGraph.build_streamController(ExecutionGraph.java:436)
at brisk.execution.ExecutionGraph.setup(ExecutionGraph.java:165)
at brisk.execution.ExecutionGraph.Configuration(ExecutionGraph.java:129)
at brisk.execution.ExecutionGraph.<init>(ExecutionGraph.java:55)
at brisk.topology.TopologyComiler.generateEG(TopologyComiler.java:16)
at brisk.topology.TopologySubmitter.submitTopology(TopologySubmitter.java:45)
at applications.BriskRunner.runTopologyLocally(BriskRunner.java:100)
at applications.BriskRunner.run(BriskRunner.java:415)
at applications.BriskRunner.main(BriskRunner.java:91)

Maybe another configuration issue? Thanks for your great help.

Best,

Gabriele

@ShuhaoZhangTony
Collaborator

Hi Gabriele,

You need to configure at least "-tt 2" (the default is 1) to run the FraudDetection application; otherwise, the parallelism of some operators is zero.
I have updated the program to handle this corner case. Thanks!
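Why a zero-parallelism operator surfaces as an IndexOutOfBoundsException can be illustrated with a hypothetical fields-grouping router (not BriskStream's PartitionController): routing hashes a key onto the list of downstream task ids, and with an empty list there is no valid index. The atLeastOne guard is one possible way to handle the corner case; it is an assumption, not necessarily how the updated program does it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical fields-grouping router, NOT BriskStream's FieldsPartitionController.
class FieldsRouter {
    private final List<Integer> downstreamTaskIds;

    FieldsRouter(List<Integer> downstreamTaskIds) {
        this.downstreamTaskIds = downstreamTaskIds;
    }

    // Maps a key onto one downstream task by hashing.
    int route(Object key) {
        int n = downstreamTaskIds.size();
        if (n == 0) {
            // Without this check, get(...) on the empty list would throw
            // IndexOutOfBoundsException, much like the FraudDetection trace.
            throw new IllegalStateException("downstream operator has parallelism 0");
        }
        return downstreamTaskIds.get(Math.floorMod(key.hashCode(), n));
    }

    // Guard for derived parallelism: never let an operator get zero tasks.
    static int atLeastOne(int derived) {
        return Math.max(1, derived);
    }

    public static void main(String[] args) {
        System.out.println(new FieldsRouter(Arrays.asList(5, 6, 7)).route("card-42"));
        try {
            new FieldsRouter(new ArrayList<>()).route("card-42");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```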

Tony

@mencagli
Author

Dear Tony,

thank you for your help in the past. Over the past weeks, I have run some applications in BriskStream on my multicore machine. All the experiments were performed using the --native option. Now I would like to test your interesting optimizations.

I tried the following command to generate the profile statistics, as explained in your previous reply to my issue:

java -jar target/BriskBenchmarks-1.2.0-jar-with-dependencies.jar --profile -st 1 -sit 1 -tt 40 --num_socket 2 --num_cpu 16 --THz 500000 --runtime 5000 --loop 100000 --size_tuple 10 --repeat 1 -bt 100 --percentile 50

Please note that my machine is quite small (only two sockets), but I am going to test BriskStream on a larger shared-memory system soon.

The problem with the previous command is that the system hangs forever without doing anything. Running "htop", I see only one thread continuously consuming 100% of a CPU core. I am wondering whether this is a problem similar to the one in my previous post (hanging in load() or similar). Looking into the code, it seems that the program hangs on this call:

ExecutionGraph g = new TopologyComiler().generateEG(topology, conf);

in TopologySubmitter.java

I hope that you can give me some insights into how to proceed. Many thanks for your attention,

Gabriele

@ShuhaoZhangTony
Collaborator

Hi Gabriele,
I just checked the code and can confirm it's a logic bug; it happens when the user doesn't have any statistics files.
Thanks for pointing it out; I will commit an updated version of the repo. Alternatively, you can work around it by updating the following code in STAT.java in your local copy:

    while (!tmpfile.exists()) {
        target = executionNode.getOP() + Math.max(1, (numTasks--)) + srcNode.getOP();
        tmpfile = new File(dir + OsUtils.OS_wrapper(target + ".txt"));
        if (numTasks <= 0) // please add these two lines.
            break;
    }
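The same search can also be written as a bounded loop that returns null when no statistics file exists, instead of relying on a break. This is a hypothetical sketch, not the code in STAT.java; it assumes OsUtils.OS_wrapper only adapts path separators, and it reproduces the file-name pattern op + numTasks + srcOp + ".txt" seen in names such as wordCount1splitSentence.txt.

```java
import java.io.File;

// Hypothetical bounded version of the statistics-file search, NOT the code in STAT.java.
// Assumes OsUtils.OS_wrapper only adapts path separators, so File(dir, name) is used instead.
class StatFileLookup {
    // Tries "<op><n><srcOp>.txt" for n = numTasks, numTasks - 1, ..., 1 and returns
    // the first file that exists, or null once every candidate has been tried.
    static File find(File dir, String op, int numTasks, String srcOp) {
        for (int n = Math.max(1, numTasks); n >= 1; n--) {
            File candidate = new File(dir, op + n + srcOp + ".txt");
            if (candidate.exists()) {
                return candidate;
            }
        }
        return null; // no statistics: run with --profile first, or use --native
    }

    public static void main(String[] args) {
        // e.g. ../briskstream/STAT/wc/50/10/wordCount1splitSentence.txt
        File dir = new File("../briskstream/STAT/wc/50/10");
        File stat = find(dir, "wordCount", 1, "splitSentence");
        System.out.println(stat == null ? "no statistics file found" : stat.getPath());
    }
}
```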

Furthermore, please remember that in profiling mode, "-tt" only specifies the number of threads of the "stateful" operator (e.g., Counter in the WordCount application); the remaining operators (which are stateless) are automatically configured with a parallelism of one.

One more thing: since you are experimenting with the optimizations, please remember to describe your own machine's specifications in a new class file, such as briskstream\common\src\main\java\applications\HP_Machine.java.

I have used two machines, HP and HUAWEI, so there are two corresponding class files.

Besides, you need to configure the
public static ArrayList[] getNodes(int machine)
method in Platform.java to describe the NUMA mapping of your machine.
After that, you need to specify --machine to select the machine you are using.

As of now, I haven't prepared code to generate all this static information automatically; it is prepared offline by hand and hard-coded into the system. Sorry for the inconvenience! To gather the necessary information about your machine (e.g., latency, memory bandwidth), you can use the Intel MLC tool (https://software.intel.com/en-us/articles/intelr-memory-latency-checker). To obtain the NUMA mapping, you can use numactl.
Feel free to ask me if you are unsure how to get this information. Thanks!
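For the NUMA mapping, numactl --hardware prints one line per node of the form "node 0 cpus: 0 1 2 ...". The hypothetical parser below turns that output into the ArrayList<Integer>[] shape returned by getNodes(); it assumes node ids are contiguous from 0 and is only a sketch, not part of BriskStream.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper, NOT part of BriskStream.
class NumactlParser {
    private static final Pattern CPUS_LINE = Pattern.compile("node\\s+(\\d+)\\s+cpus:(.*)");

    // Returns one CPU-id list per NUMA node, ordered by node id
    // (index i == node i assumes node ids are contiguous from 0).
    @SuppressWarnings("unchecked")
    static ArrayList<Integer>[] parse(List<String> lines) {
        TreeMap<Integer, ArrayList<Integer>> byNode = new TreeMap<>();
        for (String line : lines) {
            Matcher m = CPUS_LINE.matcher(line.trim());
            if (!m.matches()) {
                continue; // skips "size:", "free:", distance-table lines, etc.
            }
            ArrayList<Integer> cpus = new ArrayList<>();
            String rest = m.group(2).trim();
            if (!rest.isEmpty()) {
                for (String token : rest.split("\\s+")) {
                    cpus.add(Integer.parseInt(token));
                }
            }
            byNode.put(Integer.parseInt(m.group(1)), cpus);
        }
        return byNode.values().toArray(new ArrayList[0]);
    }

    public static void main(String[] args) {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in, StandardCharsets.UTF_8));
        List<String> lines = in.lines().collect(Collectors.toList());
        ArrayList<Integer>[] nodes = parse(lines);
        for (int i = 0; i < nodes.length; i++) {
            System.out.println("node " + i + ": " + nodes[i]);
        }
    }
}
```

After compiling, the output can be piped in: numactl --hardware | java NumactlParser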

Tony.

@mencagli
Author

Dear Tony,

with your help, I am very close to being able to run the profiling. I followed all the steps in your answer above. However, I still have a problem. I prepared a file like your HP_Machine.java with the latency profiling information obtained with the Intel MLC tool. Great! I have also modified the getNodes() method as follows:

    else if (machine == 4) { // pianosa
        Integer[] no_0 = {0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23};
        node_0 = new ArrayList<>(Arrays.asList(no_0));
        Integer[] no_1 = {8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31};
        ArrayList<Integer> node_1 = new ArrayList<>(Arrays.asList(no_1));
        return new ArrayList[]{
                node_0,
                node_1
        };
    }

My machine is named "Pianosa": two sockets with 8 cores each (16 hardware threads per socket with hyper-threading), i.e., 16 cores (32 hardware contexts) in total. Everything seems ready. When I run:

java -jar target/BriskBenchmarks-1.2.0-jar-with-dependencies.jar --profile -a FraudDetection -st 1 -sit 1 -tt 29 --THz 500000 --runtime 5000 --loop 100000 --size_tuple 10 --repeat 1 -bt 100 --percentile 50 --machine 4

everything starts, but the program fails when assigning a core to one of the threads. Here is a fragment of the output:

...
2019-07-16 12:19:53 INFO boltThread:234 - Successfully create boltExecutors 13 for bolts:predictorBolt on node: 0binding:000000000000000000000000000000000000000000000000000000000000
2019-07-16 12:19:53 INFO executorThread:127 - predictorBolt(14)
2019-07-16 12:19:53 INFO executorThread:132 - binding to node:0 cpu:23
2019-07-16 12:19:53 INFO boltThread:234 - Successfully create boltExecutors 14 for bolts:predictorBolt on node: 0binding:000000000000000000000000000000000000000000000000000000000000
2019-07-16 12:19:53 INFO AffinityController:141 - No available CPUs

The problem seems to be in the requirePerCore() method of AffinityController.java. I think the exception is raised here:

        try {
            cpus[0] = (this.mapping_node[node].get(cnt));
        } catch (Exception e) {
            LOG.info("No available CPUs");
            System.out.println("Node: " + node + " cnt " + cnt);
            System.exit(-1);
        }

After debugging, I see that node is 0 and cnt is 16. In fact, each NUMA node in my machine has 16 hardware threads (SMT contexts), so the last valid cnt should be 15, not 16. Maybe there is still a problem here (an IndexOutOfBoundsException). Do you have any idea how to fix this?

Many thanks!

Gabriele

@mencagli
Author

I don't know if this helps, but I also receive a message from AffinityLock saying:

2019-07-16 12:33:03 INFO AffinityLock:126 - No isolated CPUs found, so assuming CPUs 1 to 31 available.

@ShuhaoZhangTony
Collaborator

Hi Gabriele,

The problem is that cpu0 must be reserved for the operating system.
Hence, I have a line cpu_pnt.put(0, 1); at line 42 of AffinityController.java to explicitly leave out cpu0.
In your case, to make your profiling work, you need to make sure that -tt varies from 1 to 28 (i.e., at most you use 1+1+28+1 = 31 cores of your machine).
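The 1+1+28+1 budget can be spelled out as simple arithmetic. In this hypothetical helper, the three fixed operators are assumed to be the spout, parser, and sink of FraudDetection (each kept at parallelism one in profiling mode), with cpu0 reserved for the OS on a 32-context machine.

```java
// Hypothetical arithmetic helper for profiling runs; NOT part of BriskStream.
class CoreBudget {
    // hwContexts:     logical CPUs on the machine (32 on Pianosa)
    // reservedForOs:  CPUs left out for the OS (cpu0, so 1)
    // fixedOperators: operators kept at parallelism 1 in profiling mode
    //                 (assumed: spout, parser, sink for FraudDetection, so 3)
    static int maxTt(int hwContexts, int reservedForOs, int fixedOperators) {
        return Math.max(0, hwContexts - reservedForOs - fixedOperators);
    }

    public static void main(String[] args) {
        int tt = maxTt(32, 1, 3);
        System.out.println("max -tt = " + tt + ", cores used = " + (tt + 3));
    }
}
```

With these inputs, main prints "max -tt = 28, cores used = 31".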

By the way, I explicitly avoided using hyper-threading (HT) cores in my experiments, so I'm not sure how it will behave when HT is enabled. At the very least, it will most likely make the cost model less accurate. It would be an interesting direction to explore.

Thanks!

Tony.

@ShuhaoZhangTony
Collaborator

AffinityLock's message says that you haven't explicitly isolated any CPU cores (isolcpus). This means the OS will still schedule its work (e.g., other processes running on the same machine) on the CPU cores you have explicitly pinned for the streaming operators. In my experience, isolating CPU cores makes system performance even more stable, but it is not a strict requirement. You can simply ignore that warning message.
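To check whether any cores are isolated, recent Linux kernels expose the isolated set (configured with the isolcpus boot parameter) in /sys/devices/system/cpu/isolated, using the same range-list format that lscpu prints for NUMA nodes (e.g., "0-7,16-23"). The sketch below is hypothetical and independent of how AffinityLock itself detects isolated CPUs.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, NOT part of BriskStream or AffinityLock.
class IsolatedCpus {
    // Parses a kernel CPU list such as "0-7,16-23" or "3"; blank input gives an empty list.
    static List<Integer> parseCpuList(String s) {
        List<Integer> cpus = new ArrayList<>();
        String trimmed = s.trim();
        if (trimmed.isEmpty()) {
            return cpus;
        }
        for (String part : trimmed.split(",")) {
            String[] bounds = part.trim().split("-");
            int lo = Integer.parseInt(bounds[0].trim());
            int hi = bounds.length > 1 ? Integer.parseInt(bounds[1].trim()) : lo;
            for (int cpu = lo; cpu <= hi; cpu++) {
                cpus.add(cpu);
            }
        }
        return cpus;
    }

    // Reads the isolated set; an empty list means no isolcpus configuration.
    static List<Integer> readIsolated() throws IOException {
        Path path = Paths.get("/sys/devices/system/cpu/isolated");
        if (!Files.exists(path)) {
            return new ArrayList<>();
        }
        String content = new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
        return parseCpuList(content);
    }

    public static void main(String[] args) throws IOException {
        List<Integer> isolated = readIsolated();
        System.out.println(isolated.isEmpty() ? "no isolated CPUs" : "isolated CPUs: " + isolated);
    }
}
```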

@mencagli
Author

Dear Tony,

I understand that core0 is reserved for the OS. Unfortunately, the exception still happens as soon as the -tt value is greater than 12. It seems that it looks for the core with index cnt = 16, which does not exist (node 0 has 16 cores, so valid indices are 0 to 15):

cpus[0] = (this.mapping_node[node].get(cnt));

@mencagli
Author

NUMA node0 CPU(s): 0-7,16-23
NUMA node1 CPU(s): 8-15,24-31

The profiler correctly assigns threads to the cores of the first NUMA node, but as soon as it needs cores from the second NUMA node it fails (it still tries to look up the core in the first NUMA node):

java -jar target/BriskBenchmarks-1.2.0-jar-with-dependencies.jar --profile -a FraudDetection -st 1 -sit 1 -tt 13 --THz 500000 --runtime 5000 --loop 100000 --size_tuple 10 --repeat 1 -bt 100 --percentile 50 --machine 4
[MyPrint] Running application: FraudDetection
[MyPrint] Starting topology...
2019-07-16 13:35:02 INFO AffinityLock:126 - No isolated CPUs found, so assuming CPUs 1 to 31 available.
Mapping node length 2
2019-07-16 13:35:02 INFO OptimizationManager:185 - Start profiling
2019-07-16 13:35:02 INFO executorThread:127 - spout(0)
2019-07-16 13:35:02 INFO executorThread:132 - binding to node:0 cpu:1
2019-07-16 13:35:02 INFO spoutThread:127 - Successfully create spoutExecutors 0 on node: 0binding:000000000000000000000000000000000000000000000000000000000000
2019-07-16 13:35:02 INFO executorThread:127 - parser(1)
2019-07-16 13:35:02 INFO executorThread:132 - binding to node:0 cpu:2
2019-07-16 13:35:02 INFO boltThread:234 - Successfully create boltExecutors 1 for bolts:parser on node: 0binding:000000000000000000000000000000000000000000000000000000000000
2019-07-16 13:35:02 INFO executorThread:127 - predictorBolt(2)
2019-07-16 13:35:02 INFO executorThread:132 - binding to node:0 cpu:3
2019-07-16 13:35:02 INFO boltThread:234 - Successfully create boltExecutors 2 for bolts:predictorBolt on node: 0binding:000000000000000000000000000000000000000000000000000000000000
2019-07-16 13:35:03 INFO executorThread:127 - predictorBolt(3)
2019-07-16 13:35:03 INFO executorThread:132 - binding to node:0 cpu:4
2019-07-16 13:35:03 INFO boltThread:234 - Successfully create boltExecutors 3 for bolts:predictorBolt on node: 0binding:000000000000000000000000000000000000000000000000000000000000
2019-07-16 13:35:03 INFO executorThread:127 - predictorBolt(4)
2019-07-16 13:35:03 INFO executorThread:132 - binding to node:0 cpu:5
2019-07-16 13:35:03 INFO boltThread:234 - Successfully create boltExecutors 4 for bolts:predictorBolt on node: 0binding:000000000000000000000000000000000000000000000000000000000000
2019-07-16 13:35:03 INFO executorThread:127 - predictorBolt(5)
2019-07-16 13:35:03 INFO executorThread:132 - binding to node:0 cpu:6
2019-07-16 13:35:03 INFO boltThread:234 - Successfully create boltExecutors 5 for bolts:predictorBolt on node: 0binding:000000000000000000000000000000000000000000000000000000000000
2019-07-16 13:35:03 INFO executorThread:127 - predictorBolt(6)
2019-07-16 13:35:03 INFO executorThread:132 - binding to node:0 cpu:7
2019-07-16 13:35:03 INFO boltThread:234 - Successfully create boltExecutors 6 for bolts:predictorBolt on node: 0binding:000000000000000000000000000000000000000000000000000000000000
2019-07-16 13:35:03 INFO executorThread:127 - predictorBolt(7)
2019-07-16 13:35:03 INFO executorThread:132 - binding to node:0 cpu:16
2019-07-16 13:35:03 INFO boltThread:234 - Successfully create boltExecutors 7 for bolts:predictorBolt on node: 0binding:000000000000000000000000000000000000000000000000000000000000
2019-07-16 13:35:03 INFO executorThread:127 - predictorBolt(8)
2019-07-16 13:35:03 INFO executorThread:132 - binding to node:0 cpu:17
2019-07-16 13:35:03 INFO boltThread:234 - Successfully create boltExecutors 8 for bolts:predictorBolt on node: 0binding:000000000000000000000000000000000000000000000000000000000000
2019-07-16 13:35:03 INFO executorThread:127 - predictorBolt(9)
2019-07-16 13:35:03 INFO executorThread:132 - binding to node:0 cpu:18
2019-07-16 13:35:03 INFO boltThread:234 - Successfully create boltExecutors 9 for bolts:predictorBolt on node: 0binding:000000000000000000000000000000000000000000000000000000000000
2019-07-16 13:35:03 INFO executorThread:127 - predictorBolt(10)
2019-07-16 13:35:03 INFO executorThread:132 - binding to node:0 cpu:19
2019-07-16 13:35:03 INFO boltThread:234 - Successfully create boltExecutors 10 for bolts:predictorBolt on node: 0binding:000000000000000000000000000000000000000000000000000000000000
2019-07-16 13:35:03 INFO executorThread:127 - predictorBolt(11)
2019-07-16 13:35:03 INFO executorThread:132 - binding to node:0 cpu:20
2019-07-16 13:35:03 INFO boltThread:234 - Successfully create boltExecutors 11 for bolts:predictorBolt on node: 0binding:000000000000000000000000000000000000000000000000000000000000
2019-07-16 13:35:03 INFO executorThread:127 - predictorBolt(12)
2019-07-16 13:35:03 INFO executorThread:132 - binding to node:0 cpu:21
2019-07-16 13:35:03 INFO boltThread:234 - Successfully create boltExecutors 12 for bolts:predictorBolt on node: 0binding:000000000000000000000000000000000000000000000000000000000000
2019-07-16 13:35:04 INFO executorThread:127 - predictorBolt(13)
2019-07-16 13:35:04 INFO executorThread:132 - binding to node:0 cpu:22
2019-07-16 13:35:04 INFO boltThread:234 - Successfully create boltExecutors 13 for bolts:predictorBolt on node: 0binding:000000000000000000000000000000000000000000000000000000000000
2019-07-16 13:35:04 INFO executorThread:127 - predictorBolt(14)
2019-07-16 13:35:04 INFO executorThread:132 - binding to node:0 cpu:23
2019-07-16 13:35:04 INFO boltThread:234 - Successfully create boltExecutors 14 for bolts:predictorBolt on node: 0binding:000000000000000000000000000000000000000000000000000000000000
2019-07-16 13:35:04 INFO AffinityController:141 - No available CPUs

@ShuhaoZhangTony
Collaborator

Ah! I see.

At line 151 of AffinityController.java:

    if ((conf.getBoolean("profile", false) || conf.getBoolean("NAV", false)) && cnt == 17) {

Unfortunately, I hard-coded the turn-around point as 17 (because my machine has 18 cores per socket).

Please change it to (#cores per socket - 1), i.e., the number of CPUs listed per node in getNodes() minus one, or to whichever index at which you want the program to start taking cores from the next socket.

Sorry for that!
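A sketch of a sequential allocator that derives the turn-around point from the size of each node's CPU list instead of a hard-coded constant. The class is hypothetical, not AffinityController; it only assumes the getNodes()-style mapping and that cpu0 is reserved. With the Pianosa mapping, the 16th allocation moves to node 1 (cpu 8) instead of failing at cnt = 16.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, NOT BriskStream's AffinityController.
class SequentialCpuAllocator {
    private final List<List<Integer>> nodes; // NUMA node -> CPU ids, as in getNodes()
    private final Set<Integer> reserved;     // CPUs never handed out, e.g. {0} for the OS
    private int node = 0;                    // current NUMA node
    private int cnt = 0;                     // next index into nodes.get(node)

    SequentialCpuAllocator(List<List<Integer>> nodes, Set<Integer> reserved) {
        this.nodes = nodes;
        this.reserved = reserved;
    }

    // Returns the next free CPU id. The turn-around to the next socket happens
    // when cnt reaches the size of the current node's list, so no constant such
    // as 17 has to match the machine.
    int next() {
        while (node < nodes.size()) {
            List<Integer> cpus = nodes.get(node);
            if (cnt >= cpus.size()) {
                node++;
                cnt = 0;
                continue;
            }
            int cpu = cpus.get(cnt++);
            if (!reserved.contains(cpu)) {
                return cpu;
            }
        }
        throw new IllegalStateException("No available CPUs");
    }

    public static void main(String[] args) {
        List<List<Integer>> pianosa = Arrays.asList(
                Arrays.asList(0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23),
                Arrays.asList(8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31));
        SequentialCpuAllocator alloc =
                new SequentialCpuAllocator(pianosa, new HashSet<>(Arrays.asList(0)));
        for (int i = 1; i <= 31; i++) {
            System.out.println("thread " + i + " -> cpu " + alloc.next());
        }
    }
}
```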

@mencagli
Author

Dear Tony,

thanks for your precious help! Now the profiling is going well. To shorten the execution for debugging purposes, I reduced --runtime to 50. At the end of the profiling I receive another exception:

Exception in thread "Operator:parser Executor ID:1" java.lang.IllegalStateException: Agent not initted
at com.javamex.classmexer.Agent.getInstrumentation(Agent.java:33)
at com.javamex.classmexer.MemoryUtil.deepMemoryUsageOf(MemoryUtil.java:104)
at brisk.execution.runtime.boltThread._profile(boltThread.java:109)
at brisk.execution.runtime.executorThread.profile_routing(executorThread.java:342)
at brisk.execution.runtime.boltThread.run(boltThread.java:263)

I guess we are very close to seeing everything work correctly ;)

BTW, your work is very important to us. Maybe in the future, whenever you like, we could have a Skype call, because I can see us using your tool for several applications here in my group.

@ShuhaoZhangTony
Collaborator

It would be definitely interesting to discuss or even research collaboration!

The error occurs because you haven't passed in the classmexer agent.
To do that, add -javaagent:$HOME/briskstream/common/lib/classmexer.jar to your JVM options (before -jar or the main class).
Make sure the path is correct in your case and that the classmexer jar is correctly compiled.
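Note that -javaagent is a JVM option: it must appear before -jar (or before the main class when using -cp), because anything after that is passed to the application's main method. A hypothetical startup check using the standard RuntimeMXBean.getInputArguments() API can report a missing agent early, instead of failing with "Agent not initted" at the end of profiling.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical startup check, NOT part of BriskStream or classmexer.
class AgentCheck {
    // True if some JVM argument looks like "-javaagent:<path ending in jarName>[=options]".
    static boolean hasAgent(List<String> jvmArgs, String jarName) {
        String prefix = "-javaagent:";
        for (String arg : jvmArgs) {
            if (!arg.startsWith(prefix)) {
                continue;
            }
            String spec = arg.substring(prefix.length());
            int eq = spec.indexOf('=');
            String path = eq >= 0 ? spec.substring(0, eq) : spec;
            if (path.endsWith(jarName)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // JVM options only; arguments passed to main() are not included.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        if (hasAgent(jvmArgs, "classmexer.jar")) {
            System.out.println("classmexer agent present");
        } else {
            System.out.println("classmexer agent missing: add -javaagent:<path>/classmexer.jar before -jar or the main class");
        }
    }
}
```

For the profiling run, the option order would then be: java -javaagent:$HOME/briskstream/common/lib/classmexer.jar -jar target/BriskBenchmarks-1.2.0-jar-with-dependencies.jar --profile followed by the other arguments.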

@mencagli
Author

mencagli commented Jul 16, 2019

Now the profiling is done!!!! Everything seems to work. I have tried FraudDetection with 1 thread per operator and I get a small improvement over the native execution. However, if I increase the number of threads, for example 1 each for the spout, parser, and sink, and 10 threads for the predictor, I run into a problem:

2019-07-16 14:49:15 INFO ExecutionGraph:91 - Creates the compressed graph with compressRatio:1
2019-07-16 14:49:15 INFO BranchAndBound:145 - original graph size:14
2019-07-16 14:49:15 INFO BranchAndBound:146 - compressed graph size:14
2019-07-16 14:49:15 INFO randomSearch_hardConstraints:69 - failed to find any better plan
2019-07-16 14:49:30 INFO BranchAndBound:208 - It takes 15.12 seconds to finish initialScheduler searching
2019-07-16 14:49:30 INFO BranchAndBound:220 - Failed to find any feasible plan
2019-07-16 14:49:30 INFO BranchAndBound:269 - ======Bound output rate:======= 611.545988258317
2019-07-16 14:49:30 INFO BranchAndBound:306 - BnB takes too long: 14.0 ms, now force to exist. current nodes's output rate:611.545988258317 stack size:22#validOperators:0
2019-07-16 14:49:30 INFO BranchAndBound:349 - All efforts are failed, figure out the bottleneck operators and try to scale it up.
2019-07-16 14:49:30 INFO BranchAndBound:232 - It takes 0.02 seconds to finish branch and bound searching
2019-07-16 14:49:30 INFO BranchAndBound:234 - =============BnB failed===============
2019-07-16 14:49:30 INFO BranchAndBound:236 - CPU Relax:1.0 Memory Relax:1.0 QPI Relax:1.0 Cores Relax:0
2019-07-16 14:49:30 INFO Optimizer:118 - failed to find valid new plan, use original plan instead.
Exception in thread "main" java.lang.NullPointerException
at brisk.optimization.impl.SchedulingPlan.<init>(SchedulingPlan.java:150)
at brisk.optimization.Optimizer.optimize_plan(Optimizer.java:119)
at brisk.optimization.OptimizationManager.lanuch(OptimizationManager.java:200)
at brisk.topology.TopologySubmitter.submitTopology(TopologySubmitter.java:76)
at applications.BriskRunner.runTopologyLocally(BriskRunner.java:101)
at applications.BriskRunner.run(BriskRunner.java:428)
at applications.BriskRunner.main(BriskRunner.java:92)

I am very excited to see the optimizations working on my machine!

@ShuhaoZhangTony
Collaborator

That's a version-control issue: the clock should have been removed from BriskStream. You can simply remove that line of code (SchedulingPlan.java:150 in your stack trace).

The next step is simply to run the program without the "--native" or "--profile" arguments.

An example argument is as follows.
--gc_factor $gc_factor --backPressure --compressRatio -1 --parallelism_tune -st $st -sit 1 -tt $6 -input $iteration -bt $bt --relax 1 -a $app -mp $outputPath/opt/$percentile

Take special note of the following arguments:

--parallelism_tune: the system takes the application, assumes each operator has a parallelism of one, and then conducts an iterative search for the optimal parallelism and placement automatically. If you don't use this flag, you need to configure the parallelism of each operator manually, and it may run into errors; for example, it may fail to find any valid placement plan (the error you have just seen).

--compressRatio -1: this is related to one of the heuristics I applied in the optimisation algorithm. If it is set to -1, the system will automatically tune the compression ratio. You can leave it as -1 for now.

--gc_factor: this configuration parameter may or may not be useful. It tells the cost model how much additional overhead is caused by JVM GC. Currently, I haven't found a good way to determine the effect of GC, so I set it manually by trial and error. You can leave it as 0 for now.
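The "scale up the bottleneck" idea behind --parallelism_tune (and the "figure out the bottleneck operators and try to scale it up" log message earlier in this thread) can be illustrated with a greedy loop. This is a simplified, hypothetical sketch: it assumes a linear pipeline whose throughput is the minimum over operators of parallelism times per-task rate, and it ignores placement, NUMA costs, and the branch-and-bound search that BriskStream performs.

```java
import java.util.Arrays;

// Hypothetical greedy sketch, NOT BriskStream's optimizer.
class BottleneckScaler {
    // rates[i]: tuples/s that one task of operator i sustains (assumed to come from profiling)
    // coreBudget: total number of tasks (cores) available
    static int[] tune(double[] rates, int coreBudget) {
        int n = rates.length;
        if (coreBudget < n) {
            throw new IllegalArgumentException("need at least one core per operator");
        }
        int[] p = new int[n];
        Arrays.fill(p, 1);              // start with parallelism one everywhere
        for (int used = n; used < coreBudget; used++) {
            int bottleneck = 0;         // operator with the lowest capacity p * rate
            for (int i = 1; i < n; i++) {
                if (p[i] * rates[i] < p[bottleneck] * rates[bottleneck]) {
                    bottleneck = i;
                }
            }
            p[bottleneck]++;            // scale the bottleneck up by one task
        }
        return p;
    }

    // Pipeline throughput under the same simple model: the slowest stage wins.
    static double throughput(double[] rates, int[] p) {
        double t = Double.POSITIVE_INFINITY;
        for (int i = 0; i < rates.length; i++) {
            t = Math.min(t, p[i] * rates[i]);
        }
        return t;
    }

    public static void main(String[] args) {
        double[] rates = {1000, 1000, 100, 1000}; // illustrative numbers only
        int[] p = tune(rates, 13);
        System.out.println(Arrays.toString(p) + " -> " + throughput(rates, p) + " tuples/s");
    }
}
```

With per-task rates {1000, 1000, 100, 1000} and 13 cores, all extra cores go to the slow third operator, giving parallelism [1, 1, 10, 1].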
