Running BriskStream benchmarks #4
Comments
Hi Gabriele, I guess the system hangs at the "load()" function (see the following code). The reason is that it tries to load a statistics file required by the cost model, which you haven't prepared yet. If you want to test the raw performance of BriskStream without invoking the cost model and optimization process, you can append a "--native" argument, like "java -cp target/BriskBenchmarks-1.2.0-jar-with-dependencies.jar applications.BriskRunner --native". If you want to test the optimization, you first need to profile the application by specifying a "--profile" argument. An example argument is as follows (you can find it in the "nus" script). After profiling is done, you should be able to see some files like the following. You can find the full list of arguments in the "abstractRunner.Java" class. I'll pin this issue in case someone else encounters the same problem.
Thanks for the quick reply! I tried "java -cp target/BriskBenchmarks-1.2.0-jar-with-dependencies.jar applications.BriskRunner --native" and the hang is now solved. Unfortunately, a NullPointerException is still raised at: 2019-06-11 14:10:20 INFO Optimizer:54 - number of CPUs:8 Probably there are still some configuration issues. Thanks a lot for your help and guidance with this.
No problem. :) In fact, I should apologize for not having had time to make it more user-friendly. For the NullPointerException, please ensure Another suggestion: when you test the performance of BriskStream, remember to use the "-bt" argument to configure different jumbo tuple sizes. A good starting point is "-bt 10", which means 10 tuples are merged into a single jumbo tuple. This reduces instruction cache misses and improves cross-operator communication efficiency. More details can be found in the paper. A personal version of the paper is here: https://www.comp.nus.edu.sg/~shuhao-z/docs/briskstream.pdf Thanks!
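To make the "-bt" idea concrete, here is a minimal sketch of tuple batching, assuming a simple buffer that forwards one list per batch. The class `JumboTupleBatcher` and its methods are hypothetical names used only for illustration; they are not BriskStream's actual classes or API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical illustration only; NOT taken from the BriskStream code base.
final class JumboTupleBatcher<T> {
    private final int batchSize;                 // plays the role of the "-bt" value
    private final Consumer<List<T>> downstream;  // receives one jumbo tuple per call
    private List<T> buffer;

    JumboTupleBatcher(int batchSize, Consumer<List<T>> downstream) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
        this.downstream = downstream;
        this.buffer = new ArrayList<>(batchSize);
    }

    // Buffer one tuple; forward the whole batch once it is full.
    void emit(T tuple) {
        buffer.add(tuple);
        if (buffer.size() == batchSize) {
            flush();
        }
    }

    // Forward whatever is buffered (e.g. at end of input), if anything.
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        List<T> jumbo = buffer;
        buffer = new ArrayList<>(batchSize);
        downstream.accept(jumbo);
    }
}
```

With a batch size of 10, the downstream operator is invoked once per 10 input tuples instead of once per tuple, which is the kind of amortization the "-bt" option is described as providing.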
Thanks a lot! Exception in thread "Operator:splitSentence Executor ID:2" java.lang.NullPointerException It seems that the SplitSentenceBolt threw an exception. But it is good that the topology is running now. Today I will try to understand what happened, and if needed I will post an update tomorrow, hoping for some help from you! Thanks! Gabriele
Ha, I guess that's related to the input data sets.
Dear Tony, WordCount is running well. I am collecting performance results now. Thanks. I would also like to run FraudDetection. I have the two dataset files (credit-card.dat and model.txt). I am trying to run the application using the following command: java -cp target/BriskBenchmarks-1.2.0-jar-with-dependencies.jar applications.BriskRunner --native -bt 10 -a FraudDetection and I immediately receive the following exception: Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 0, Size: 0 Maybe another configuration issue? Thanks for your great help. Best, Gabriele
Hi Gabriele, You need to configure at least "-tt 2" (the default is 1) in order to run the FraudDetection application. Otherwise, the parallelism is zero for some operators. Tony
Dear Tony, thank you for your help in the past. In recent weeks, I have executed some applications in BriskStream on my multicore machine. All the experiments have been performed using the --native option. Now, I would like to test your interesting optimizations. I have tried the following command to generate the profile statistics, as explained in your earlier answer to my issue: java -jar target/BriskBenchmarks-1.2.0-jar-with-dependencies.jar --profile -st 1 -sit 1 -tt 40 --num_socket 2 --num_cpu 16 --THz 500000 --runtime 5000 --loop 100000 --size_tuple 10 --repeat 1 -bt 100 --percentile 50 Please note that my machine is quite small, two sockets only, but I am going to test BriskStream on a larger shared-memory system soon. The problem with the previous command is that the system hangs forever without doing anything. By running "htop", I see that there is only one thread, continuously consuming 100% of one CPU core. I am wondering if this is a similar problem to the one in my previous post (hanging on load() or similar). Looking into the code, it seems that the program hangs on this call: ExecutionGraph g = new TopologyComiler().generateEG(topology, conf); in TopologySubmitter.java I hope that you can give me some insight into how to proceed. Many thanks for your attention, Gabriele
Hi Gabriele,
Furthermore, please remember that under profiling mode, "-tt" only specifies the number of threads of the "stateful" operator (e.g., Counter in the WordCount application); the remaining operators (as they are stateless) will be automatically configured with a parallelism of one. One more thing: since you are playing with the optimizations, please remember to configure your own machine's specifications as a new class file such as I have used two machines, HP and HUAWEI. Hence, there are two corresponding class files. Besides, you need to configure As of now, I haven't prepared the code to generate all this static information automatically; it is prepared offline manually and hard-coded into the system. Sorry for the inconvenience! To generate all the necessary information about your machine (e.g., latency, memory bandwidth, etc.), you can use the Intel MLC tool (https://software.intel.com/en-us/articles/intelr-memory-latency-checker). To generate the NUMA mapping, you can use NUMACTL. Tony.
Dear Tony, with your help, I am very close to being able to do the profiling. I followed all the steps in your answer above. However, I still have a problem. I prepared a file like your HP_Machine.java with the latency profiling information obtained with the Intel MLC tool. Great! I have also modified the getNodes() method as follows:
My machine is named "Pianosa": two sockets with 8 cores each (16 hardware contexts per socket with hyperthreading), for a total of 16 cores (32 contexts). Everything seems ready. When I run: java -jar target/BriskBenchmarks-1.2.0-jar-with-dependencies.jar --profile -a FraudDetection -st 1 -sit 1 -tt 29 --THz 500000 --runtime 5000 --loop 100000 --size_tuple 10 --repeat 1 -bt 100 --percentile 50 --machine 4 everything starts, but the program fails when assigning a core to one thread. In particular, this is a fragment of the output: ... The problem should be in AffinityController.java, in the requirePerCore() method. It raises an exception, maybe here:
After debugging, I see that node is 0 and cnt is 16. In fact, each NUMA node in my machine has 16 SMT cores, so the last valid cnt should be 15 and not 16. Maybe there are still some problems (ArrayIndexOutOfBoundsException). Do you have any idea how to fix this? Many thanks! Gabriele
I don't know if this might help. I also receive a message from the AffinityLock saying: 2019-07-16 12:33:03 INFO AffinityLock:126 - No isolated CPUs found, so assuming CPUs 1 to 31 available.
Hi Gabriele, The problem is that cpu0 must be reserved for the operating system. By the way, I explicitly avoided using hyper-threading (HT) cores in my experiments, so I'm not sure how it will behave if HT is enabled. At the least, it will most likely make the cost model less accurate. It should be an interesting study to explore. Thanks! Tony.
AffinityLock's message is saying that you didn't explicitly isolate any CPU cores.
Dear Tony, I understand that core0 is reserved for the OS. Unfortunately, the exception still happens as soon as the -tt value is greater than 12. It seems that it is looking for the core with index cnt = 16, which does not exist (on node 0 I have 16 cores, so the indices run from 0 to 15): cpus[0] = (this.mapping_node[node].get(cnt));
NUMA node0 CPU(s): 0-7,16-23 The profiling correctly assigns the threads to the cores in the first NUMA node, but as soon as it tries to assign cores on the second NUMA node it does not work (it still tries to look for the core in the first NUMA node): java -jar target/BriskBenchmarks-1.2.0-jar-with-dependencies.jar --profile -a FraudDetection -st 1 -sit 1 -tt 13 --THz 500000 --runtime 5000 --loop 100000 --size_tuple 10 --repeat 1 -bt 100 --percentile 50 --machine 4
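As a sanity check on the index range discussed above, the sketch below expands a CPU list string such as "0-7,16-23" into explicit CPU ids. The class `CpuListParser` is a hypothetical helper written for this illustration and is not part of BriskStream; it only assumes the comma-and-dash list format shown in the line above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; NOT part of BriskStream.
final class CpuListParser {
    // Expands a list such as "0-7,16-23" into [0, 1, ..., 7, 16, ..., 23].
    static List<Integer> parse(String cpuList) {
        List<Integer> cpus = new ArrayList<>();
        for (String part : cpuList.trim().split(",")) {
            String range = part.trim();
            if (range.isEmpty()) {
                continue; // tolerate stray commas
            }
            int dash = range.indexOf('-');
            if (dash < 0) {
                cpus.add(Integer.parseInt(range)); // single CPU id
            } else {
                int first = Integer.parseInt(range.substring(0, dash));
                int last = Integer.parseInt(range.substring(dash + 1));
                for (int cpu = first; cpu <= last; cpu++) {
                    cpus.add(cpu);
                }
            }
        }
        return cpus;
    }
}
```

For "0-7,16-23" the resulting list has 16 entries, so the valid positions are 0 to 15 and a lookup at position 16 is out of range, which matches the failure described in this thread.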
Ah! I see.
Unfortunately, I hard-coded the turn-around case as 17 (because my machine has 18 cores per socket). Please change it to (#cores/socket - 1), or to whatever end point at which you want the program to start looking for cores on the next socket. Sorry for that!
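One way to avoid a hard-coded turn-around value is to derive it from the size of each node's core list. The following is a minimal sketch under that assumption; `CoreCursor` and its field names are hypothetical and only loosely mirror the `mapping_node` and `cnt` variables mentioned in this thread. It is not the actual AffinityController code.

```java
import java.util.List;

// Hypothetical sketch; NOT the actual BriskStream AffinityController.
final class CoreCursor {
    private final List<List<Integer>> coresPerNode; // stand-in for mapping_node
    private int node = 0; // current NUMA node
    private int cnt = 0;  // next position within the current node's core list

    CoreCursor(List<List<Integer>> coresPerNode) {
        this.coresPerNode = coresPerNode;
    }

    // Returns the next core id, moving to the next node when the current
    // node's list is exhausted, instead of comparing cnt to a constant.
    int next() {
        while (node < coresPerNode.size() && cnt >= coresPerNode.get(node).size()) {
            node++;
            cnt = 0;
        }
        if (node >= coresPerNode.size()) {
            throw new IllegalStateException("no free core left on any node");
        }
        return coresPerNode.get(node).get(cnt++);
    }
}
```

Because the limit comes from each list's own size, the same code works for 18 cores per socket, 8 cores per socket, or lists from which cpu0 or hyper-threading siblings have been removed.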
Dear Tony, thanks for your valuable help! Now the profiling is going well. To get a shorter execution, just for debugging, I reduced --runtime to 50. At the end of the profiling I receive another exception: Exception in thread "Operator:parser Executor ID:1" java.lang.IllegalStateException: Agent not initted We are very close to seeing everything work correctly, I guess ;) BTW, your work is very important for us. Maybe in the future, when you want, we can have a Skype call, because I see that we can use your tool for several applications here in my group.
It would definitely be interesting to have a discussion, or even a research collaboration! The error you see is because you haven't passed in the classmexer agent.
Now the profiling is done! Everything seems to work. I have tried FraudDetection with 1 thread per operator and I see a small improvement over the native execution. However, if I increase the number of threads, for example 1 each for the spout, parser and sink, and 10 threads for the predictor, I have a problem: 2019-07-16 14:49:15 INFO ExecutionGraph:91 - Creates the compressed graph with compressRatio:1 I am very excited to see the optimizations working on my machine!
That's a version control issue; the clock should have been removed in BriskStream. You can simply remove that line of code. The next step is to simply run the program without An example argument is as follows. Take special note of the following arguments. --parallelism_tune: the system will take the application and assume each operator has a parallelism of one, then it will conduct an iterative search for the optimal parallelism and placement automatically. If you don't use this, you need to manually configure the parallelism of each operator, and it may run into errors; for example, it may not find any valid placement plan (the error you have just seen). --compressRatio -1: this is related to one of the heuristics I applied in the optimisation algorithm. If it is set to -1, the system will automatically tune the compression ratio. You can leave it as -1 for now. --gc_factor: this configuration parameter may or may not be useful. It tells the cost model how much additional overhead there is due to JVM GC. Currently, I haven't found a nice way to determine the effect of GC, so I set it manually through a trial-and-error process. You can leave it as 0 for now.
Dear Tony,
I am trying to run the BriskStream benchmarks on my multi-core machine (stable branch). The compilation works fine but I am not able to run any program. Since your scripts are very complex, I am trying to run the WordCount benchmark by hand using my local dataset (I have changed the path in the configuration file). Unfortunately, the application hangs forever without processing anything. I run it using a command like:
java -cp target/BriskBenchmarks-1.2.0-jar-with-dependencies.jar applications.BriskRunner
which should use the default configuration and the default application (WordCount). By inspecting the code (adding some prints), I discovered that the code hangs in the file ExecutionGraph.java, in the Loading function (in the two double loops inside it). I understand that there is some problem in loading the configuration, and nothing happens: the topology is not working and not processing any data.
Can you please help me understand how to run BriskStream?
Thanks!
Gabriele