|
1 | 1 | # [Benchmarks](@id benchmarks)
|
2 | 2 |
|
3 | 3 | Here we benchmark model performance on two `Architecture`s, CPU and GPU.
|
4 |
| -The number of individuals used in the benchmark are `(2^5, 2^10, 2^15, 2^17)`. |
| 4 | +The numbers of individuals used in the benchmarks are `(2^10, 2^15, 2^17, 2^20)`. |
5 | 5 | We also use different grid resolutions in the 2-Dimensional and 3-Dimensional model setups.
|
6 | 6 |
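As a quick cross-check, the powers of two listed above can be expanded to the values that appear in the `N` column of the tables. This is only an illustrative sketch; the variable names are not part of PlanktonIndividuals.

```julia
# Exponents taken from the updated list (2^10, 2^15, 2^17, 2^20).
exponents = (10, 15, 17, 20)

# Expand each power of two into the number of individuals N.
counts = [2^e for e in exponents]

println(counts)  # [1024, 32768, 131072, 1048576]
```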
|
7 | 7 | ## 0-Dimensional model
|
8 | 8 |
|
9 | 9 | This is a benchmark of a simple 0-Dimensional model setup without advection of Eulerian tracers. However, the advection of individuals still takes the same amount of time whether or not a velocity field is provided.
|
10 | 10 |
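The tables on this page report `min`, `median`, `mean`, and `max` run times. The source does not say which tool produced them, so the following is only a minimal sketch of how such summary statistics could be gathered with the Julia standard library. The `work` function is a hypothetical stand-in for one model time step, not a PlanktonIndividuals function.

```julia
using Statistics

# Hypothetical stand-in for one model time step (an assumption for illustration).
work(n) = sum(sqrt, 1:n)

# Time `f(n)` repeatedly and return the elapsed seconds of each sample.
function sample_times(f, n; samples = 20)
    f(n)  # warm-up call so that compilation time is not measured
    return [@elapsed(f(n)) for _ in 1:samples]
end

times = sample_times(work, 2^10)

# Summary statistics matching the timing columns of the tables.
stats = (min = minimum(times), median = median(times),
         mean = mean(times), max = maximum(times))
println(stats)
```

A dedicated benchmarking package would typically also report memory use and allocation counts, which this sketch leaves out.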
|
11 | 11 | ```julia
|
12 |
| -PlanktonIndividuals v0.4.2 |
13 |
| -Julia Version 1.7.0-rc1 |
14 |
| -Commit 9eade6195e (2021-09-12 06:45 UTC) |
| 12 | +PlanktonIndividuals v0.6.1 |
| 13 | +Julia Version 1.8.0 |
| 14 | +Commit 5544a0fab76 (2022-08-17 13:38 UTC) |
15 | 15 | Platform Info:
|
16 | 16 | OS: Linux (x86_64-pc-linux-gnu)
|
17 | 17 | CPU: Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz
|
18 | 18 | WORD_SIZE: 64
|
19 | 19 | LIBM: libopenlibm
|
20 |
| - LLVM: libLLVM-12.0.1 (ORCJIT, broadwell) |
| 20 | + LLVM: libLLVM-13.0.1 (ORCJIT, broadwell) |
21 | 21 | GPU: Tesla P100-PCIE-12GB
|
| 22 | + CUDA runtime 11.8, artifact installation |
| 23 | + CUDA driver 11.2 |
| 24 | + NVIDIA driver 460.84.0 |
22 | 25 | ```
|
23 | 26 |
|
24 |
| -| Arch | N | min | median | mean | max | memory | allocs | |
25 |
| -|------|--------|------------|------------|------------|------------|------------|--------| |
26 |
| -| CPU | 32 | 978.736 μs | 1.062 ms | 1.114 ms | 1.745 ms | 639.39 KiB | 3377 | |
27 |
| -| CPU | 1024 | 3.217 ms | 3.319 ms | 3.357 ms | 4.003 ms | 639.39 KiB | 3377 | |
28 |
| -| CPU | 32768 | 73.551 ms | 73.612 ms | 73.955 ms | 77.018 ms | 638.91 KiB | 3346 | |
29 |
| -| CPU | 131072 | 297.726 ms | 298.756 ms | 300.489 ms | 316.688 ms | 638.91 KiB | 3346 | |
30 |
| -| GPU | 32 | 7.498 ms | 7.566 ms | 7.636 ms | 8.331 ms | 2.27 MiB | 16453 | |
31 |
| -| GPU | 1024 | 7.599 ms | 7.691 ms | 7.755 ms | 8.487 ms | 2.26 MiB | 16443 | |
32 |
| -| GPU | 32768 | 8.171 ms | 8.362 ms | 8.470 ms | 9.745 ms | 2.26 MiB | 16443 | |
33 |
| -| GPU | 131072 | 9.698 ms | 10.456 ms | 10.637 ms | 12.999 ms | 2.26 MiB | 16438 | |
| 27 | +| Arch | N | min | median | mean | max | memory | allocs | |
| 28 | +|------|---------|------------|------------|------------|------------|------------|--------| |
| 29 | +| CPU | 1024 | 2.945 ms | 3.016 ms | 3.167 ms | 4.328 ms | 478.67 KiB | 2992 | |
| 30 | +| CPU | 32768 | 69.741 ms | 69.812 ms | 71.594 ms | 80.231 ms | 477.72 KiB | 2931 | |
| 31 | +| CPU | 131072 | 276.553 ms | 276.966 ms | 280.569 ms | 300.907 ms | 477.72 KiB | 2931 | |
| 32 | +| CPU | 1048576 | 2.582 s | 2.590 s | 2.590 s | 2.598 s | 477.72 KiB | 2931 | |
| 33 | +| GPU | 1024 | 7.085 ms | 7.158 ms | 7.364 ms | 9.323 ms | 1.92 MiB | 21327 | |
| 34 | +| GPU | 32768 | 7.435 ms | 7.520 ms | 7.925 ms | 10.173 ms | 1.92 MiB | 21327 | |
| 35 | +| GPU | 131072 | 7.053 ms | 9.161 ms | 9.851 ms | 19.812 ms | 1.92 MiB | 21294 | |
| 36 | +| GPU | 1048576 | 8.005 ms | 46.217 ms | 47.484 ms | 122.516 ms | 1.92 MiB | 21294 | |
34 | 37 |
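To make the CPU and GPU rows easier to compare, the sketch below divides the CPU median by the GPU median for each `N`, using the v0.6.1 medians copied from the 0-Dimensional table above. The dictionary names are illustrative only.

```julia
# Median run times from the 0-Dimensional table, converted to milliseconds.
cpu_median_ms = Dict(1024 => 3.016, 32768 => 69.812, 131072 => 276.966, 1048576 => 2590.0)
gpu_median_ms = Dict(1024 => 7.158, 32768 => 7.520, 131072 => 9.161, 1048576 => 46.217)

# Ratio above 1 means the GPU median is faster than the CPU median.
ratio = Dict(n => cpu_median_ms[n] / gpu_median_ms[n] for n in keys(cpu_median_ms))

for n in sort(collect(keys(ratio)))
    println("N = ", n, ": CPU/GPU median ratio = ", round(ratio[n], digits = 2))
end
```

With these numbers the ratio is below 1 at `N = 1024`, where the GPU median is slower, and roughly 56 at `N = 1048576`.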
|
35 | 38 | ## 2-Dimensional model
|
36 | 39 |
|
37 | 40 | This is the benchmark of a 2-Dimensional model setup with `(Ns, 1, Ns)` grid cells. Here `Ns = [32, 64, 128]`.
|
38 | 41 |
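For reference, the total number of grid cells implied by each `Ns` in the `(Ns, 1, Ns)` layout can be computed as follows. This is a small illustrative sketch; `Ns_values` and `cells` are names chosen here.

```julia
# Grid resolutions used in the 2-Dimensional benchmark.
Ns_values = [32, 64, 128]

# Total cell count for a (Ns, 1, Ns) grid.
cells = [prod((Ns, 1, Ns)) for Ns in Ns_values]

for (Ns, c) in zip(Ns_values, cells)
    println((Ns, 1, Ns), " -> ", c, " grid cells")
end
```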
|
39 | 42 | ```julia
|
40 |
| -PlanktonIndividuals v0.4.2 |
41 |
| -Julia Version 1.7.0-rc1 |
42 |
| -Commit 9eade6195e (2021-09-12 06:45 UTC) |
| 43 | +PlanktonIndividuals v0.6.1 |
| 44 | +Julia Version 1.8.0 |
| 45 | +Commit 5544a0fab76 (2022-08-17 13:38 UTC) |
43 | 46 | Platform Info:
|
44 | 47 | OS: Linux (x86_64-pc-linux-gnu)
|
45 | 48 | CPU: Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz
|
46 | 49 | WORD_SIZE: 64
|
47 | 50 | LIBM: libopenlibm
|
48 |
| - LLVM: libLLVM-12.0.1 (ORCJIT, broadwell) |
| 51 | + LLVM: libLLVM-13.0.1 (ORCJIT, broadwell) |
49 | 52 | GPU: Tesla P100-PCIE-12GB
|
| 53 | + CUDA runtime 11.8, artifact installation |
| 54 | + CUDA driver 11.2 |
| 55 | + NVIDIA driver 460.84.0 |
50 | 56 | ```
|
51 | 57 |
|
52 |
| -| Arch | N | Ns | min | median | mean | max | memory | allocs | |
53 |
| -|------|--------|-----|------------|------------|------------|------------|-----------|--------| |
54 |
| -| CPU | 32 | 32 | 4.183 ms | 5.037 ms | 4.932 ms | 5.125 ms | 2.86 MiB | 3413 | |
55 |
| -| CPU | 32 | 64 | 12.474 ms | 12.583 ms | 12.697 ms | 13.670 ms | 8.84 MiB | 3386 | |
56 |
| -| CPU | 32 | 128 | 46.953 ms | 57.432 ms | 53.965 ms | 61.537 ms | 31.87 MiB | 3386 | |
57 |
| -| CPU | 1024 | 32 | 6.800 ms | 7.802 ms | 7.589 ms | 7.937 ms | 2.86 MiB | 3413 | |
58 |
| -| CPU | 1024 | 64 | 15.106 ms | 15.227 ms | 15.361 ms | 16.476 ms | 8.84 MiB | 3386 | |
59 |
| -| CPU | 1024 | 128 | 51.023 ms | 61.336 ms | 57.659 ms | 62.330 ms | 31.87 MiB | 3386 | |
60 |
| -| CPU | 32768 | 32 | 91.757 ms | 91.996 ms | 92.255 ms | 93.695 ms | 2.86 MiB | 3382 | |
61 |
| -| CPU | 32768 | 64 | 105.509 ms | 105.603 ms | 106.028 ms | 108.820 ms | 8.84 MiB | 3386 | |
62 |
| -| CPU | 32768 | 128 | 154.187 ms | 155.702 ms | 156.419 ms | 163.824 ms | 31.87 MiB | 3386 | |
63 |
| -| CPU | 131072 | 32 | 362.675 ms | 363.038 ms | 363.071 ms | 363.607 ms | 2.86 MiB | 3382 | |
64 |
| -| CPU | 131072 | 64 | 392.255 ms | 392.962 ms | 395.636 ms | 405.071 ms | 8.84 MiB | 3386 | |
65 |
| -| CPU | 131072 | 128 | 447.502 ms | 458.867 ms | 461.654 ms | 488.007 ms | 31.87 MiB | 3386 | |
66 |
| -| GPU | 32 | 32 | 8.094 ms | 8.161 ms | 8.285 ms | 9.522 ms | 2.29 MiB | 16137 | |
67 |
| -| GPU | 32 | 64 | 7.603 ms | 7.783 ms | 7.833 ms | 8.644 ms | 2.39 MiB | 16141 | |
68 |
| -| GPU | 32 | 128 | 7.728 ms | 7.783 ms | 7.966 ms | 9.569 ms | 2.76 MiB | 16221 | |
69 |
| -| GPU | 1024 | 32 | 8.248 ms | 8.310 ms | 8.432 ms | 9.660 ms | 2.29 MiB | 16127 | |
70 |
| -| GPU | 1024 | 64 | 7.253 ms | 7.329 ms | 7.428 ms | 8.332 ms | 2.38 MiB | 16131 | |
71 |
| -| GPU | 1024 | 128 | 7.957 ms | 7.991 ms | 8.173 ms | 9.711 ms | 2.76 MiB | 16211 | |
72 |
| -| GPU | 32768 | 32 | 8.173 ms | 8.251 ms | 8.372 ms | 9.494 ms | 2.29 MiB | 16127 | |
73 |
| -| GPU | 32768 | 64 | 7.237 ms | 7.291 ms | 7.435 ms | 8.777 ms | 2.38 MiB | 16131 | |
74 |
| -| GPU | 32768 | 128 | 7.681 ms | 7.816 ms | 8.036 ms | 10.264 ms | 2.76 MiB | 16211 | |
75 |
| -| GPU | 131072 | 32 | 8.970 ms | 9.371 ms | 9.390 ms | 9.851 ms | 2.29 MiB | 16153 | |
76 |
| -| GPU | 131072 | 64 | 9.451 ms | 10.731 ms | 10.602 ms | 10.960 ms | 2.38 MiB | 16126 | |
77 |
| -| GPU | 131072 | 128 | 9.267 ms | 12.095 ms | 11.808 ms | 12.248 ms | 2.76 MiB | 16206 | |
| 58 | +| Arch | N | Ns | min | median | mean | max | memory | allocs | |
| 59 | +|------|---------|-----|------------|------------|------------|------------|-----------|--------| |
| 60 | +| CPU | 1024 | 32 | 8.096 ms | 8.132 ms | 8.211 ms | 8.688 ms | 2.70 MiB | 3109 | |
| 61 | +| CPU | 1024 | 64 | 19.889 ms | 19.940 ms | 20.064 ms | 20.952 ms | 8.68 MiB | 3052 | |
| 62 | +| CPU | 1024 | 128 | 68.735 ms | 69.030 ms | 69.672 ms | 75.046 ms | 31.72 MiB | 3052 | |
| 63 | +| CPU | 32768 | 32 | 74.115 ms | 74.154 ms | 76.313 ms | 85.288 ms | 2.70 MiB | 3048 | |
| 64 | +| CPU | 32768 | 64 | 89.999 ms | 90.163 ms | 92.340 ms | 101.475 ms | 8.68 MiB | 3052 | |
| 65 | +| CPU | 32768 | 128 | 162.286 ms | 162.618 ms | 168.129 ms | 190.011 ms | 31.72 MiB | 3052 | |
| 66 | +| CPU | 131072 | 32 | 282.810 ms | 282.913 ms | 286.631 ms | 307.620 ms | 2.70 MiB | 3048 | |
| 67 | +| CPU | 131072 | 64 | 328.584 ms | 328.962 ms | 332.448 ms | 357.787 ms | 8.68 MiB | 3052 | |
| 68 | +| CPU | 131072 | 128 | 447.271 ms | 453.263 ms | 470.108 ms | 509.040 ms | 31.72 MiB | 3052 | |
| 69 | +| CPU | 1048576 | 32 | 2.476 s | 2.476 s | 2.501 s | 2.552 s | 2.70 MiB | 3048 | |
| 70 | +| CPU | 1048576 | 64 | 2.910 s | 2.911 s | 2.911 s | 2.911 s | 8.68 MiB | 3052 | |
| 71 | +| CPU | 1048576 | 128 | 2.905 s | 2.909 s | 2.909 s | 2.914 s | 31.72 MiB | 3052 | |
| 72 | +| GPU | 1024 | 32 | 6.902 ms | 6.920 ms | 7.101 ms | 8.719 ms | 1.98 MiB | 21513 | |
| 73 | +| GPU | 1024 | 64 | 7.417 ms | 7.622 ms | 7.755 ms | 8.430 ms | 2.07 MiB | 21632 | |
| 74 | +| GPU | 1024 | 128 | 7.734 ms | 8.071 ms | 8.141 ms | 8.854 ms | 2.45 MiB | 21713 | |
| 75 | +| GPU | 32768 | 32 | 7.011 ms | 7.092 ms | 7.392 ms | 10.142 ms | 1.98 MiB | 21513 | |
| 76 | +| GPU | 32768 | 64 | 6.769 ms | 6.837 ms | 7.152 ms | 10.035 ms | 2.07 MiB | 21632 | |
| 77 | +| GPU | 32768 | 128 | 7.027 ms | 8.381 ms | 8.561 ms | 11.845 ms | 2.45 MiB | 21713 | |
| 78 | +| GPU | 131072 | 32 | 6.580 ms | 8.054 ms | 8.560 ms | 15.323 ms | 1.98 MiB | 21541 | |
| 79 | +| GPU | 131072 | 64 | 7.491 ms | 9.106 ms | 9.664 ms | 16.128 ms | 2.07 MiB | 21599 | |
| 80 | +| GPU | 131072 | 128 | 7.918 ms | 12.640 ms | 12.791 ms | 23.534 ms | 2.45 MiB | 21680 | |
| 81 | +| GPU | 1048576 | 32 | 9.781 ms | 35.539 ms | 36.437 ms | 59.171 ms | 1.98 MiB | 21528 | |
| 82 | +| GPU | 1048576 | 64 | 10.682 ms | 37.958 ms | 39.055 ms | 65.476 ms | 2.08 MiB | 21647 | |
| 83 | +| GPU | 1048576 | 128 | 7.994 ms | 50.094 ms | 50.772 ms | 126.537 ms | 2.45 MiB | 21680 | |
78 | 84 |
|
79 | 85 | ## 3-Dimensional model
|
80 | 86 |
|
81 | 87 | This is the benchmark of a 3-Dimensional model setup with `(Ns, Ns, Ns)` grid cells. Here `Ns = [32, 64]`.
|
82 | 88 |
|
83 | 89 | ```julia
|
84 |
| -PlanktonIndividuals v0.4.2 |
85 |
| -Julia Version 1.7.0-rc1 |
86 |
| -Commit 9eade6195e (2021-09-12 06:45 UTC) |
| 90 | +PlanktonIndividuals v0.6.1 |
| 91 | +Julia Version 1.8.0 |
| 92 | +Commit 5544a0fab76 (2022-08-17 13:38 UTC) |
87 | 93 | Platform Info:
|
88 | 94 | OS: Linux (x86_64-pc-linux-gnu)
|
89 | 95 | CPU: Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz
|
90 | 96 | WORD_SIZE: 64
|
91 | 97 | LIBM: libopenlibm
|
92 |
| - LLVM: libLLVM-12.0.1 (ORCJIT, broadwell) |
| 98 | + LLVM: libLLVM-13.0.1 (ORCJIT, broadwell) |
93 | 99 | GPU: Tesla P100-PCIE-12GB
|
| 100 | + CUDA runtime 11.8, artifact installation |
| 101 | + CUDA driver 11.2 |
| 102 | + NVIDIA driver 460.84.0 |
94 | 103 | ```
|
95 | 104 |
|
96 |
| -| Arch | N | Ns | min | median | mean | max | memory | allocs | |
97 |
| -|------|--------|-----|------------|------------|------------|------------|-----------|--------| |
98 |
| -| CPU | 32 | 32 | 38.263 ms | 38.316 ms | 39.038 ms | 41.863 ms | 1.54 MiB | 3154 | |
99 |
| -| CPU | 32 | 64 | 332.699 ms | 333.257 ms | 333.191 ms | 333.711 ms | 8.59 MiB | 3155 | |
100 |
| -| CPU | 1024 | 32 | 41.214 ms | 41.334 ms | 41.623 ms | 44.407 ms | 1.54 MiB | 3154 | |
101 |
| -| CPU | 1024 | 64 | 337.645 ms | 341.374 ms | 350.123 ms | 375.033 ms | 8.59 MiB | 3155 | |
102 |
| -| CPU | 32768 | 32 | 135.441 ms | 135.510 ms | 135.875 ms | 137.648 ms | 1.54 MiB | 3154 | |
103 |
| -| CPU | 32768 | 64 | 447.552 ms | 448.844 ms | 458.740 ms | 499.685 ms | 8.59 MiB | 3155 | |
104 |
| -| CPU | 131072 | 32 | 433.618 ms | 433.704 ms | 433.846 ms | 434.720 ms | 1.54 MiB | 3154 | |
105 |
| -| CPU | 131072 | 64 | 763.314 ms | 763.408 ms | 777.291 ms | 848.858 ms | 8.59 MiB | 3155 | |
106 |
| -| GPU | 32 | 32 | 7.094 ms | 7.159 ms | 7.348 ms | 9.046 ms | 3.26 MiB | 15561 | |
107 |
| -| GPU | 32 | 64 | 10.841 ms | 11.494 ms | 11.443 ms | 11.617 ms | 10.31 MiB | 15611 | |
108 |
| -| GPU | 1024 | 32 | 6.679 ms | 6.790 ms | 6.897 ms | 8.001 ms | 3.25 MiB | 15551 | |
109 |
| -| GPU | 1024 | 64 | 10.791 ms | 11.485 ms | 11.427 ms | 11.617 ms | 10.30 MiB | 15601 | |
110 |
| -| GPU | 32768 | 32 | 6.686 ms | 6.762 ms | 6.936 ms | 8.584 ms | 3.25 MiB | 15551 | |
111 |
| -| GPU | 32768 | 64 | 11.470 ms | 11.857 ms | 11.821 ms | 12.028 ms | 10.30 MiB | 15601 | |
112 |
| -| GPU | 131072 | 32 | 8.724 ms | 10.342 ms | 10.180 ms | 10.585 ms | 3.25 MiB | 15546 | |
113 |
| -| GPU | 131072 | 64 | 12.760 ms | 15.537 ms | 15.228 ms | 15.779 ms | 10.30 MiB | 15627 | |
| 105 | +| Arch | N | Ns | min | median | mean | max | memory | allocs | |
| 106 | +|------|---------|-----|------------|------------|------------|------------|----------|--------| |
| 107 | +| CPU | 1024 | 32 | 50.081 ms | 50.249 ms | 50.421 ms | 51.994 ms | 1.38 MiB | 2820 | |
| 108 | +| CPU | 1024 | 64 | 410.840 ms | 459.105 ms | 451.043 ms | 459.516 ms | 8.43 MiB | 2821 | |
| 109 | +| CPU | 32768 | 32 | 124.176 ms | 124.312 ms | 126.438 ms | 138.224 ms | 1.38 MiB | 2820 | |
| 110 | +| CPU | 32768 | 64 | 498.713 ms | 534.237 ms | 534.148 ms | 554.501 ms | 8.43 MiB | 2821 | |
| 111 | +| CPU | 131072 | 32 | 351.282 ms | 351.674 ms | 355.733 ms | 387.071 ms | 1.38 MiB | 2820 | |
| 112 | +| CPU | 131072 | 64 | 790.994 ms | 808.337 ms | 816.691 ms | 848.149 ms | 8.43 MiB | 2821 | |
| 113 | +| CPU | 1048576 | 32 | 3.019 s | 3.072 s | 3.072 s | 3.125 s | 1.38 MiB | 2820 | |
| 114 | +| CPU | 1048576 | 64 | 3.258 s | 3.258 s | 3.258 s | 3.258 s | 8.43 MiB | 2821 | |
| 115 | +| GPU | 1024 | 32 | 6.229 ms | 6.286 ms | 6.466 ms | 7.329 ms | 2.94 MiB | 21053 | |
| 116 | +| GPU | 1024 | 64 | 9.194 ms | 11.891 ms | 11.689 ms | 12.604 ms | 9.99 MiB | 21077 | |
| 117 | +| GPU | 32768 | 32 | 6.570 ms | 6.638 ms | 6.966 ms | 8.974 ms | 2.94 MiB | 21053 | |
| 118 | +| GPU | 32768 | 64 | 9.143 ms | 12.882 ms | 12.712 ms | 15.781 ms | 9.99 MiB | 21077 | |
| 119 | +| GPU | 131072 | 32 | 6.481 ms | 9.150 ms | 9.469 ms | 16.907 ms | 2.94 MiB | 21081 | |
| 120 | +| GPU | 131072 | 64 | 9.212 ms | 16.623 ms | 16.438 ms | 25.557 ms | 9.99 MiB | 21105 | |
| 121 | +| GPU | 1048576 | 32 | 7.257 ms | 39.894 ms | 40.268 ms | 96.189 ms | 2.94 MiB | 21020 | |
| 122 | +| GPU | 1048576 | 64 | 9.586 ms | 54.934 ms | 53.741 ms | 118.675 ms | 9.99 MiB | 21105 | |
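
Going from `Ns = 32` to `Ns = 64` in the `(Ns, Ns, Ns)` layout multiplies the number of grid cells by 8. The sketch below compares that factor with the change in CPU median time, using the v0.6.1 medians copied from the 3-Dimensional table above. It is a rough reading of the table, not a statement about how the model scales in general.

```julia
# Median CPU run times from the 3-Dimensional table, in milliseconds, keyed by (N, Ns).
cpu_median_ms = Dict(
    (1024, 32) => 50.249,
    (1024, 64) => 459.105,
    (1048576, 32) => 3072.0,
    (1048576, 64) => 3258.0,
)

# Ratio of grid cells between the two resolutions.
cell_ratio = 64^3 / 32^3

# Ratio of median run times between the two resolutions, for each N.
time_ratio = Dict(N => cpu_median_ms[(N, 64)] / cpu_median_ms[(N, 32)] for N in (1024, 1048576))

for N in (1024, 1048576)
    println("N = ", N, ": cells x", cell_ratio, ", median time x", round(time_ratio[N], digits = 2))
end
```

In these numbers the CPU median grows by roughly 9 times at `N = 1024`, but only by about 6% at `N = 1048576`, where the cost of the individuals appears to dominate.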