
STL optimization: Bounding volume hierarchy #4140

Open
wants to merge 9 commits into
base: development
from


@WeiqunZhang (Member) commented Sep 8, 2024

Speed up EB geometry generation from STL files by using the bounding volume hierarchy (BVH) method. The BVH tree is stored in a contiguous chunk of memory, which makes it easier to use on GPUs. Tree traversal uses a fixed-size stack, so no recursion is needed.

X-Ref: https://rmrsk.github.io/EBGeometry/Concepts.html#bounding-volume-hierarchies
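The two ideas in the description (a tree stored as one contiguous array, and traversal with a fixed-size stack instead of recursion) can be sketched in C++ as follows. This is a minimal illustration under stated assumptions, not the code in this PR: the names `AABB`, `BVHNode`, `count_candidates`, `kMaxChildren`, and `kMaxStack` are hypothetical, the stack capacity of 64 is an arbitrary choice, and a simple point-in-box query stands in for whatever geometric queries the real EB generation performs.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Hypothetical axis-aligned bounding box (not an AMReX type).
struct AABB {
    std::array<double, 3> lo;
    std::array<double, 3> hi;
    bool contains(const std::array<double, 3>& p) const {
        for (int d = 0; d < 3; ++d) {
            if (p[d] < lo[d] || p[d] > hi[d]) { return false; }
        }
        return true;
    }
};

constexpr int kMaxChildren = 4;  // branching factor (4, as chosen in this PR)
constexpr int kMaxStack = 64;    // fixed traversal stack capacity (assumed)

// All nodes live in one std::vector, i.e. one contiguous allocation.
// Children are referenced by index rather than by pointer, so the array
// stays valid if copied as a whole to another memory space.
struct BVHNode {
    AABB box;
    std::array<int, kMaxChildren> children;  // node indices, -1 if unused
    int first_tri = 0;  // leaves only: first triangle of the leaf's range
    int num_tris = 0;   // > 0 for leaves, 0 for interior nodes
};

// Counts the triangles held by leaves whose boxes contain p, i.e. the
// candidates a point query would still have to test exactly.
// Traversal uses an explicit fixed-size stack; there is no recursion.
int count_candidates(const std::vector<BVHNode>& nodes,
                     const std::array<double, 3>& p)
{
    int stack[kMaxStack];
    int top = 0;
    stack[top++] = 0;  // node 0 is the root
    int count = 0;
    while (top > 0) {
        const BVHNode& node = nodes[stack[--top]];
        if (!node.box.contains(p)) { continue; }  // prune the whole subtree
        if (node.num_tris > 0) {                  // leaf: collect candidates
            count += node.num_tris;
            continue;
        }
        for (int child : node.children) {
            if (child >= 0) {
                assert(top < kMaxStack);  // capacity must cover the worst case
                stack[top++] = child;
            }
        }
    }
    return count;
}
```

Because per-query state is just a small fixed-size array of integers, each GPU thread can carry its own stack without dynamic allocation; the cost is that the capacity must be chosen large enough for the deepest traversal the tree can produce.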

@WeiqunZhang marked this pull request as draft on September 8, 2024, 18:02
@rmrsk commented Sep 8, 2024

It is also possible to build the BVH from the bottom up using space-filling curves (SFCs). This yields a lower-quality tree, but construction is much faster and needs no recursion. Would you like some timing results to help decide whether to support both construction methods?

@WeiqunZhang (Member, Author)

Yes, it would be great if you could share your timing results.

For the maximum number of children per node, 2 would be too small, because the traversal stack (needed by every GPU thread) would have to be quite big. So I use 4. What's your experience with how that choice affects performance in CPU code?

@rmrsk commented Sep 8, 2024

OK, I've tested on the Adirondack STL (https://github.com/rmrsk/EBGeometry/tree/main/Examples/Resources), which has 1.2 million triangles. Timing results on a single CPU showed that bottom-up construction took about 60% of the time of top-down construction. Bottom-up construction does a single sort using an SFC index as the comparator, whereas top-down construction sorts the triangles each time a leaf node is split. The performance difference decreases as the mesh size decreases (probably because sorting the triangles becomes gradually cheaper). We have yet to hit a case where bottom-up construction is better, but we don't do moving geometries, so there's that...
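The single SFC sort described above can be illustrated with a hedged C++ sketch. The comment does not say which curve EBGeometry uses, so the choice of a Morton (Z-order) curve, the 10-bit-per-axis quantization, and all function names (`expand_bits`, `quantize`, `morton3d`, `sfc_order`, `level_sizes`) are assumptions for illustration. Triangle centroids are mapped to Morton codes and sorted once; after that, consecutive runs of triangles can become leaves, and consecutive nodes can be grouped into parents level by level without recursion. `level_sizes` only counts the nodes per level for that grouping; it does not compute bounding boxes.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Inserts two zero bits between each of the lower 10 bits of v
// (the standard bit-interleaving trick for 30-bit 3D Morton codes).
std::uint32_t expand_bits(std::uint32_t v) {
    v = (v * 0x00010001u) & 0xFF0000FFu;
    v = (v * 0x00000101u) & 0x0F00F00Fu;
    v = (v * 0x00000011u) & 0xC30C30C3u;
    v = (v * 0x00000005u) & 0x49249249u;
    return v;
}

// Maps t in [0,1] to an integer cell index in [0,1023].
std::uint32_t quantize(double t) {
    double s = t * 1024.0;
    if (s < 0.0) { s = 0.0; }
    if (s > 1023.0) { s = 1023.0; }
    return static_cast<std::uint32_t>(s);
}

// 30-bit Morton code of a point in the unit cube (x bits most significant).
std::uint32_t morton3d(const std::array<double, 3>& u) {
    return expand_bits(quantize(u[0])) * 4u
         + expand_bits(quantize(u[1])) * 2u
         + expand_bits(quantize(u[2]));
}

// Returns triangle indices ordered along the Morton curve: one global sort,
// with the SFC index as the comparison key.
std::vector<int> sfc_order(const std::vector<std::array<double, 3>>& centroids,
                           const std::array<double, 3>& lo,
                           const std::array<double, 3>& hi)
{
    const std::size_t n = centroids.size();
    std::vector<std::uint32_t> code(n);
    for (std::size_t i = 0; i < n; ++i) {
        std::array<double, 3> u{};
        for (int d = 0; d < 3; ++d) {
            const double extent = hi[d] - lo[d];
            u[d] = extent > 0.0 ? (centroids[i][d] - lo[d]) / extent : 0.0;
        }
        code[i] = morton3d(u);
    }
    std::vector<int> order(n);
    std::iota(order.begin(), order.end(), 0);
    std::stable_sort(order.begin(), order.end(),
                     [&](int a, int b) { return code[a] < code[b]; });
    return order;
}

// Bottom-up grouping: leaves hold up to leaf_size consecutive triangles,
// and each level groups up to `branching` consecutive nodes into a parent,
// until one root remains. Returns node counts per level, leaves first.
std::vector<int> level_sizes(int num_tris, int leaf_size, int branching) {
    assert(leaf_size > 0 && branching > 1);
    std::vector<int> sizes;
    int n = (num_tris + leaf_size - 1) / leaf_size;
    sizes.push_back(n);
    while (n > 1) {
        n = (n + branching - 1) / branching;
        sizes.push_back(n);
    }
    return sizes;
}
```

For example, with a hypothetical leaf size of 4 and a branching factor of 4, `level_sizes(1193660, 4, 4)` yields 11 levels, from 298415 leaves up to a single root.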

We generally use a branching factor of 4 for the trees; a branching factor of 2 is only slightly slower.

@WeiqunZhang (Member, Author)

The performance seems to be very good. For big STL files, the new version is about 100x faster on CPU and about 10x faster on GPU. The GPU kernel for the BVH suffers from thread divergence, which is probably why the performance gap between CPU and GPU has shrunk with the new version.

STL: https://github.com/rmrsk/EBGeometry/blob/main/Examples/Resources/adirondack.stl
Number of triangles: 1,193,660
GPU: A100 40 GB
CPU: AMD EPYC 7763, one core
Time: amrex::EB2::Build time

| Device | BVH | Time (s) |
|--------+-----+----------|
| CPU    | Yes |     27.2 |
| GPU    | Yes |      3.4 |
|--------+-----+----------|
| CPU    | No  |   > 3000 |
| GPU    | No  |     29.7 |

The CPU job without BVH ran out of time, so its entry is only a lower bound.
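The ratios behind the "100x" and "10x" figures, and behind the shrinking CPU/GPU gap, can be read off the A100/EPYC table directly (the CPU run without BVH is a lower bound only, so the ratios involving it are lower bounds too):

```math
\begin{aligned}
\text{BVH speedup, CPU} &> \frac{3000}{27.2} \approx 110 \\
\text{BVH speedup, GPU} &= \frac{29.7}{3.4} \approx 8.7 \\
\text{GPU over CPU, without BVH} &> \frac{3000}{29.7} \approx 101 \\
\text{GPU over CPU, with BVH} &= \frac{27.2}{3.4} = 8.0
\end{aligned}
```

So the GPU's advantage over a single CPU core drops from more than 100x to about 8x once the BVH is used.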

@WeiqunZhang (Member, Author)

I repeated the test on my desktop with a less powerful GPU. Here are the results.

GPU: GeForce GTX 1060
CPU: Intel Xeon E3-1275 v5 @ 3.60GHz, one core

| Device | BVH | Time (s) |
|--------+-----+----------|
| CPU    | Yes |       35 |
| GPU    | Yes |       13 |
|--------+-----+----------|
| CPU    | No  |     7538 |
| GPU    | No  |      515 |
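On the desktop machine the ratios are unit-free, so they do not depend on how the times were measured:

```math
\begin{aligned}
\text{BVH speedup, CPU} &= \frac{7538}{35} \approx 215 \\
\text{BVH speedup, GPU} &= \frac{515}{13} \approx 40 \\
\text{GPU over CPU, without BVH} &= \frac{7538}{515} \approx 14.6 \\
\text{GPU over CPU, with BVH} &= \frac{35}{13} \approx 2.7
\end{aligned}
```

Here the GPU's advantage over one CPU core shrinks from about 15x to under 3x with the BVH.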

@WeiqunZhang marked this pull request as ready for review on September 13, 2024, 02:07
@WeiqunZhang enabled auto-merge (squash) on September 13, 2024, 02:07
@rmrsk commented Sep 13, 2024

@WeiqunZhang Do you have the BVH build times?

@WeiqunZhang (Member, Author)

build_bvh seems to be very fast: in the profile below it takes 0.88 s, about 2.6% of the total run time. Here is the test: https://github.com/WeiqunZhang/amrex-devtests/tree/main/eb_stl2

-------------------------------------------------------------------------------------------
Name                                        NCalls  Incl. Min  Incl. Avg  Incl. Max   Max %
-------------------------------------------------------------------------------------------
main                                             1      33.71      33.71      33.71 100.00%
EB2:Build                                        1      33.46      33.46      33.46  99.27%
EB2::STLLevel()-fine                             1      31.73      31.73      31.73  94.12%
EB2::GShopLevel()-fine                           1      31.73      31.73      31.73  94.12%
STLtools::getBoxType                             8      14.44      14.44      14.44  42.83%
STLtools::prepare                                1     0.9077     0.9077     0.9077   2.69%
STLtools::build_bvh                              1     0.8783     0.8783     0.8783   2.61%
Other                                           91      0.285      0.285      0.285   0.85%
-------------------------------------------------------------------------------------------
