Commit 86b3d73

update benchmarks
1 parent f9fba0d commit 86b3d73

1 file changed

+76
-67
lines changed

docs/src/benchmarks.md

+76-67
@@ -1,113 +1,122 @@
 # [Benchmarks](@id benchmarks)
 
 Here we benchmark the model performance in two `Architecture`s.
-The number of individuals used in the benchmark are `(2^5, 2^10, 2^15, 2^17)`.
+The number of individuals used in the benchmark are `(2^10, 2^15, 2^17, 2^20)`.
 And we also use different grid resolutions in 2-Dimensional and 3-Dimensional model setup.
 
 ## 0-Dimensional model
 
 This is a benchmark of a simple 0-Dimensional model setup without advection of Eulerian tracers. However, the advection of individuals still take the same amount of time whether the velocity field is provided or not.
 
 ```julia
-PlanktonIndividuals v0.4.2
-Julia Version 1.7.0-rc1
-Commit 9eade6195e (2021-09-12 06:45 UTC)
+PlanktonIndividuals v0.6.1
+Julia Version 1.8.0
+Commit 5544a0fab76 (2022-08-17 13:38 UTC)
 Platform Info:
 OS: Linux (x86_64-pc-linux-gnu)
 CPU: Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz
 WORD_SIZE: 64
 LIBM: libopenlibm
-LLVM: libLLVM-12.0.1 (ORCJIT, broadwell)
+LLVM: libLLVM-13.0.1 (ORCJIT, broadwell)
 GPU: Tesla P100-PCIE-12GB
+CUDA runtime 11.8, artifact installation
+CUDA driver 11.2
+NVIDIA driver 460.84.0
 ```
 
-| Arch | N | min | median | mean | max | memory | allocs |
-|------|--------|------------|------------|------------|------------|------------|--------|
-| CPU | 32 | 978.736 μs | 1.062 ms | 1.114 ms | 1.745 ms | 639.39 KiB | 3377 |
-| CPU | 1024 | 3.217 ms | 3.319 ms | 3.357 ms | 4.003 ms | 639.39 KiB | 3377 |
-| CPU | 32768 | 73.551 ms | 73.612 ms | 73.955 ms | 77.018 ms | 638.91 KiB | 3346 |
-| CPU | 131072 | 297.726 ms | 298.756 ms | 300.489 ms | 316.688 ms | 638.91 KiB | 3346 |
-| GPU | 32 | 7.498 ms | 7.566 ms | 7.636 ms | 8.331 ms | 2.27 MiB | 16453 |
-| GPU | 1024 | 7.599 ms | 7.691 ms | 7.755 ms | 8.487 ms | 2.26 MiB | 16443 |
-| GPU | 32768 | 8.171 ms | 8.362 ms | 8.470 ms | 9.745 ms | 2.26 MiB | 16443 |
-| GPU | 131072 | 9.698 ms | 10.456 ms | 10.637 ms | 12.999 ms | 2.26 MiB | 16438 |
+| Arch | N | min | median | mean | max | memory | allocs |
+|------|---------|------------|------------|------------|------------|------------|--------|
+| CPU | 1024 | 2.945 ms | 3.016 ms | 3.167 ms | 4.328 ms | 478.67 KiB | 2992 |
+| CPU | 32768 | 69.741 ms | 69.812 ms | 71.594 ms | 80.231 ms | 477.72 KiB | 2931 |
+| CPU | 131072 | 276.553 ms | 276.966 ms | 280.569 ms | 300.907 ms | 477.72 KiB | 2931 |
+| CPU | 1048576 | 2.582 s | 2.590 s | 2.590 s | 2.598 s | 477.72 KiB | 2931 |
+| GPU | 1024 | 7.085 ms | 7.158 ms | 7.364 ms | 9.323 ms | 1.92 MiB | 21327 |
+| GPU | 32768 | 7.435 ms | 7.520 ms | 7.925 ms | 10.173 ms | 1.92 MiB | 21327 |
+| GPU | 131072 | 7.053 ms | 9.161 ms | 9.851 ms | 19.812 ms | 1.92 MiB | 21294 |
+| GPU | 1048576 | 8.005 ms | 46.217 ms | 47.484 ms | 122.516 ms | 1.92 MiB | 21294 |
 
 ## 2-Dimensional model
 
 This is the benchmark of a 2-Dimensional model setup with `(Ns, 1, Ns)` grid cells. Here `Ns = [32, 64, 128]`.
 
 ```julia
-PlanktonIndividuals v0.4.2
-Julia Version 1.7.0-rc1
-Commit 9eade6195e (2021-09-12 06:45 UTC)
+PlanktonIndividuals v0.6.1
+Julia Version 1.8.0
+Commit 5544a0fab76 (2022-08-17 13:38 UTC)
 Platform Info:
 OS: Linux (x86_64-pc-linux-gnu)
 CPU: Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz
 WORD_SIZE: 64
 LIBM: libopenlibm
-LLVM: libLLVM-12.0.1 (ORCJIT, broadwell)
+LLVM: libLLVM-13.0.1 (ORCJIT, broadwell)
 GPU: Tesla P100-PCIE-12GB
+CUDA runtime 11.8, artifact installation
+CUDA driver 11.2
+NVIDIA driver 460.84.0
 ```
 
-| Arch | N | Ns | min | median | mean | max | memory | allocs |
-|------|--------|-----|------------|------------|------------|------------|-----------|--------|
-| CPU | 32 | 32 | 4.183 ms | 5.037 ms | 4.932 ms | 5.125 ms | 2.86 MiB | 3413 |
-| CPU | 32 | 64 | 12.474 ms | 12.583 ms | 12.697 ms | 13.670 ms | 8.84 MiB | 3386 |
-| CPU | 32 | 128 | 46.953 ms | 57.432 ms | 53.965 ms | 61.537 ms | 31.87 MiB | 3386 |
-| CPU | 1024 | 32 | 6.800 ms | 7.802 ms | 7.589 ms | 7.937 ms | 2.86 MiB | 3413 |
-| CPU | 1024 | 64 | 15.106 ms | 15.227 ms | 15.361 ms | 16.476 ms | 8.84 MiB | 3386 |
-| CPU | 1024 | 128 | 51.023 ms | 61.336 ms | 57.659 ms | 62.330 ms | 31.87 MiB | 3386 |
-| CPU | 32768 | 32 | 91.757 ms | 91.996 ms | 92.255 ms | 93.695 ms | 2.86 MiB | 3382 |
-| CPU | 32768 | 64 | 105.509 ms | 105.603 ms | 106.028 ms | 108.820 ms | 8.84 MiB | 3386 |
-| CPU | 32768 | 128 | 154.187 ms | 155.702 ms | 156.419 ms | 163.824 ms | 31.87 MiB | 3386 |
-| CPU | 131072 | 32 | 362.675 ms | 363.038 ms | 363.071 ms | 363.607 ms | 2.86 MiB | 3382 |
-| CPU | 131072 | 64 | 392.255 ms | 392.962 ms | 395.636 ms | 405.071 ms | 8.84 MiB | 3386 |
-| CPU | 131072 | 128 | 447.502 ms | 458.867 ms | 461.654 ms | 488.007 ms | 31.87 MiB | 3386 |
-| GPU | 32 | 32 | 8.094 ms | 8.161 ms | 8.285 ms | 9.522 ms | 2.29 MiB | 16137 |
-| GPU | 32 | 64 | 7.603 ms | 7.783 ms | 7.833 ms | 8.644 ms | 2.39 MiB | 16141 |
-| GPU | 32 | 128 | 7.728 ms | 7.783 ms | 7.966 ms | 9.569 ms | 2.76 MiB | 16221 |
-| GPU | 1024 | 32 | 8.248 ms | 8.310 ms | 8.432 ms | 9.660 ms | 2.29 MiB | 16127 |
-| GPU | 1024 | 64 | 7.253 ms | 7.329 ms | 7.428 ms | 8.332 ms | 2.38 MiB | 16131 |
-| GPU | 1024 | 128 | 7.957 ms | 7.991 ms | 8.173 ms | 9.711 ms | 2.76 MiB | 16211 |
-| GPU | 32768 | 32 | 8.173 ms | 8.251 ms | 8.372 ms | 9.494 ms | 2.29 MiB | 16127 |
-| GPU | 32768 | 64 | 7.237 ms | 7.291 ms | 7.435 ms | 8.777 ms | 2.38 MiB | 16131 |
-| GPU | 32768 | 128 | 7.681 ms | 7.816 ms | 8.036 ms | 10.264 ms | 2.76 MiB | 16211 |
-| GPU | 131072 | 32 | 8.970 ms | 9.371 ms | 9.390 ms | 9.851 ms | 2.29 MiB | 16153 |
-| GPU | 131072 | 64 | 9.451 ms | 10.731 ms | 10.602 ms | 10.960 ms | 2.38 MiB | 16126 |
-| GPU | 131072 | 128 | 9.267 ms | 12.095 ms | 11.808 ms | 12.248 ms | 2.76 MiB | 16206 |
+| Arch | N | Ns | min | median | mean | max | memory | allocs |
+|------|---------|-----|------------|------------|------------|------------|-----------|--------|
+| CPU | 1024 | 32 | 8.096 ms | 8.132 ms | 8.211 ms | 8.688 ms | 2.70 MiB | 3109 |
+| CPU | 1024 | 64 | 19.889 ms | 19.940 ms | 20.064 ms | 20.952 ms | 8.68 MiB | 3052 |
+| CPU | 1024 | 128 | 68.735 ms | 69.030 ms | 69.672 ms | 75.046 ms | 31.72 MiB | 3052 |
+| CPU | 32768 | 32 | 74.115 ms | 74.154 ms | 76.313 ms | 85.288 ms | 2.70 MiB | 3048 |
+| CPU | 32768 | 64 | 89.999 ms | 90.163 ms | 92.340 ms | 101.475 ms | 8.68 MiB | 3052 |
+| CPU | 32768 | 128 | 162.286 ms | 162.618 ms | 168.129 ms | 190.011 ms | 31.72 MiB | 3052 |
+| CPU | 131072 | 32 | 282.810 ms | 282.913 ms | 286.631 ms | 307.620 ms | 2.70 MiB | 3048 |
+| CPU | 131072 | 64 | 328.584 ms | 328.962 ms | 332.448 ms | 357.787 ms | 8.68 MiB | 3052 |
+| CPU | 131072 | 128 | 447.271 ms | 453.263 ms | 470.108 ms | 509.040 ms | 31.72 MiB | 3052 |
+| CPU | 1048576 | 32 | 2.476 s | 2.476 s | 2.501 s | 2.552 s | 2.70 MiB | 3048 |
+| CPU | 1048576 | 64 | 2.910 s | 2.911 s | 2.911 s | 2.911 s | 8.68 MiB | 3052 |
+| CPU | 1048576 | 128 | 2.905 s | 2.909 s | 2.909 s | 2.914 s | 31.72 MiB | 3052 |
+| GPU | 1024 | 32 | 6.902 ms | 6.920 ms | 7.101 ms | 8.719 ms | 1.98 MiB | 21513 |
+| GPU | 1024 | 64 | 7.417 ms | 7.622 ms | 7.755 ms | 8.430 ms | 2.07 MiB | 21632 |
+| GPU | 1024 | 128 | 7.734 ms | 8.071 ms | 8.141 ms | 8.854 ms | 2.45 MiB | 21713 |
+| GPU | 32768 | 32 | 7.011 ms | 7.092 ms | 7.392 ms | 10.142 ms | 1.98 MiB | 21513 |
+| GPU | 32768 | 64 | 6.769 ms | 6.837 ms | 7.152 ms | 10.035 ms | 2.07 MiB | 21632 |
+| GPU | 32768 | 128 | 7.027 ms | 8.381 ms | 8.561 ms | 11.845 ms | 2.45 MiB | 21713 |
+| GPU | 131072 | 32 | 6.580 ms | 8.054 ms | 8.560 ms | 15.323 ms | 1.98 MiB | 21541 |
+| GPU | 131072 | 64 | 7.491 ms | 9.106 ms | 9.664 ms | 16.128 ms | 2.07 MiB | 21599 |
+| GPU | 131072 | 128 | 7.918 ms | 12.640 ms | 12.791 ms | 23.534 ms | 2.45 MiB | 21680 |
+| GPU | 1048576 | 32 | 9.781 ms | 35.539 ms | 36.437 ms | 59.171 ms | 1.98 MiB | 21528 |
+| GPU | 1048576 | 64 | 10.682 ms | 37.958 ms | 39.055 ms | 65.476 ms | 2.08 MiB | 21647 |
+| GPU | 1048576 | 128 | 7.994 ms | 50.094 ms | 50.772 ms | 126.537 ms | 2.45 MiB | 21680 |
 
 ## 3-Dimensional model
 
 This is the benchmark of a 3-Dimensional model setup with `(Ns, Ns, Ns)` grid cells. Here `Ns = [32, 64]`.
 
 ```julia
-PlanktonIndividuals v0.4.2
-Julia Version 1.7.0-rc1
-Commit 9eade6195e (2021-09-12 06:45 UTC)
+PlanktonIndividuals v0.6.1
+Julia Version 1.8.0
+Commit 5544a0fab76 (2022-08-17 13:38 UTC)
 Platform Info:
 OS: Linux (x86_64-pc-linux-gnu)
 CPU: Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz
 WORD_SIZE: 64
 LIBM: libopenlibm
-LLVM: libLLVM-12.0.1 (ORCJIT, broadwell)
+LLVM: libLLVM-13.0.1 (ORCJIT, broadwell)
 GPU: Tesla P100-PCIE-12GB
+CUDA runtime 11.8, artifact installation
+CUDA driver 11.2
+NVIDIA driver 460.84.0
 ```
 
-| Arch | N | Ns | min | median | mean | max | memory | allocs |
-|------|--------|-----|------------|------------|------------|------------|-----------|--------|
-| CPU | 32 | 32 | 38.263 ms | 38.316 ms | 39.038 ms | 41.863 ms | 1.54 MiB | 3154 |
-| CPU | 32 | 64 | 332.699 ms | 333.257 ms | 333.191 ms | 333.711 ms | 8.59 MiB | 3155 |
-| CPU | 1024 | 32 | 41.214 ms | 41.334 ms | 41.623 ms | 44.407 ms | 1.54 MiB | 3154 |
-| CPU | 1024 | 64 | 337.645 ms | 341.374 ms | 350.123 ms | 375.033 ms | 8.59 MiB | 3155 |
-| CPU | 32768 | 32 | 135.441 ms | 135.510 ms | 135.875 ms | 137.648 ms | 1.54 MiB | 3154 |
-| CPU | 32768 | 64 | 447.552 ms | 448.844 ms | 458.740 ms | 499.685 ms | 8.59 MiB | 3155 |
-| CPU | 131072 | 32 | 433.618 ms | 433.704 ms | 433.846 ms | 434.720 ms | 1.54 MiB | 3154 |
-| CPU | 131072 | 64 | 763.314 ms | 763.408 ms | 777.291 ms | 848.858 ms | 8.59 MiB | 3155 |
-| GPU | 32 | 32 | 7.094 ms | 7.159 ms | 7.348 ms | 9.046 ms | 3.26 MiB | 15561 |
-| GPU | 32 | 64 | 10.841 ms | 11.494 ms | 11.443 ms | 11.617 ms | 10.31 MiB | 15611 |
-| GPU | 1024 | 32 | 6.679 ms | 6.790 ms | 6.897 ms | 8.001 ms | 3.25 MiB | 15551 |
-| GPU | 1024 | 64 | 10.791 ms | 11.485 ms | 11.427 ms | 11.617 ms | 10.30 MiB | 15601 |
-| GPU | 32768 | 32 | 6.686 ms | 6.762 ms | 6.936 ms | 8.584 ms | 3.25 MiB | 15551 |
-| GPU | 32768 | 64 | 11.470 ms | 11.857 ms | 11.821 ms | 12.028 ms | 10.30 MiB | 15601 |
-| GPU | 131072 | 32 | 8.724 ms | 10.342 ms | 10.180 ms | 10.585 ms | 3.25 MiB | 15546 |
-| GPU | 131072 | 64 | 12.760 ms | 15.537 ms | 15.228 ms | 15.779 ms | 10.30 MiB | 15627 |
+| Arch | N | Ns | min | median | mean | max | memory | allocs |
+|------|---------|-----|------------|------------|------------|------------|----------|--------|
+| CPU | 1024 | 32 | 50.081 ms | 50.249 ms | 50.421 ms | 51.994 ms | 1.38 MiB | 2820 |
+| CPU | 1024 | 64 | 410.840 ms | 459.105 ms | 451.043 ms | 459.516 ms | 8.43 MiB | 2821 |
+| CPU | 32768 | 32 | 124.176 ms | 124.312 ms | 126.438 ms | 138.224 ms | 1.38 MiB | 2820 |
+| CPU | 32768 | 64 | 498.713 ms | 534.237 ms | 534.148 ms | 554.501 ms | 8.43 MiB | 2821 |
+| CPU | 131072 | 32 | 351.282 ms | 351.674 ms | 355.733 ms | 387.071 ms | 1.38 MiB | 2820 |
+| CPU | 131072 | 64 | 790.994 ms | 808.337 ms | 816.691 ms | 848.149 ms | 8.43 MiB | 2821 |
+| CPU | 1048576 | 32 | 3.019 s | 3.072 s | 3.072 s | 3.125 s | 1.38 MiB | 2820 |
+| CPU | 1048576 | 64 | 3.258 s | 3.258 s | 3.258 s | 3.258 s | 8.43 MiB | 2821 |
+| GPU | 1024 | 32 | 6.229 ms | 6.286 ms | 6.466 ms | 7.329 ms | 2.94 MiB | 21053 |
+| GPU | 1024 | 64 | 9.194 ms | 11.891 ms | 11.689 ms | 12.604 ms | 9.99 MiB | 21077 |
+| GPU | 32768 | 32 | 6.570 ms | 6.638 ms | 6.966 ms | 8.974 ms | 2.94 MiB | 21053 |
+| GPU | 32768 | 64 | 9.143 ms | 12.882 ms | 12.712 ms | 15.781 ms | 9.99 MiB | 21077 |
+| GPU | 131072 | 32 | 6.481 ms | 9.150 ms | 9.469 ms | 16.907 ms | 2.94 MiB | 21081 |
+| GPU | 131072 | 64 | 9.212 ms | 16.623 ms | 16.438 ms | 25.557 ms | 9.99 MiB | 21105 |
+| GPU | 1048576 | 32 | 7.257 ms | 39.894 ms | 40.268 ms | 96.189 ms | 2.94 MiB | 21020 |
+| GPU | 1048576 | 64 | 9.586 ms | 54.934 ms | 53.741 ms | 118.675 ms | 9.99 MiB | 21105 |
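
One way to read the updated 0-Dimensional table is to compute the CPU-to-GPU ratio of the median times. The sketch below hard-codes the medians from the added rows of that table, converted to milliseconds; the variable names are illustrative and not part of PlanktonIndividuals.

```julia
# Median step times (ms) copied from the added rows of the 0-Dimensional table.
# All names here are illustrative only, not part of PlanktonIndividuals.
N          = [1024, 32768, 131072, 1048576]
cpu_median = [3.016, 69.812, 276.966, 2590.0]   # 2.590 s = 2590 ms
gpu_median = [7.158, 7.520, 9.161, 46.217]

# A ratio above 1 means the GPU run is faster than the CPU run.
speedup = cpu_median ./ gpu_median

for (n, s) in zip(N, speedup)
    println("N = ", n, ": CPU/GPU median ratio ≈ ", round(s; digits = 2))
end
```

With these inputs the ratio is below 1 at `N = 1024` (the GPU run is slower there) and roughly 56 at `N = 1048576`.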

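The column names in the tables (min, median, mean, max, memory, allocs) match the summary statistics reported by BenchmarkTools.jl, although the commit does not say which tool produced them. Assuming BenchmarkTools.jl, a row could be collected roughly as follows; `work` is a hypothetical stand-in for one model time step, not a PlanktonIndividuals function.

```julia
using BenchmarkTools   # assumed tool; the commit does not confirm it
using Statistics       # provides median and mean

# Hypothetical stand-in workload; a real run would time one model time step.
work(n) = sum(abs2, rand(n))

trial = @benchmark work(1024) samples = 50 evals = 1

# Times are in nanoseconds; memory is in bytes.
row = (
    min    = time(minimum(trial)),
    median = time(median(trial)),
    mean   = time(mean(trial)),
    max    = time(maximum(trial)),
    memory = BenchmarkTools.memory(trial),
    allocs = BenchmarkTools.allocs(trial),
)
println(row)
```

On a GPU the timed expression would also need to wait for the device to finish before returning, otherwise only the kernel launch is measured.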