Results with the original source
Results after optimization
Optimization details
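1. First, I immediately tried changing AOS to SOA
I replaced vector with array, and the effect was dramatic: the run time dropped straight from about 1300 ms to just over 130 ms.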
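To make the layout change concrete, here is a minimal sketch of the two layouts. The struct names, the `advance` function and the body count `N` are illustrative assumptions, not the actual code in this PR.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t N = 48;  // hypothetical body count

// AOS: one struct per body; all fields of a single body are adjacent in memory.
struct StarAOS {
    float px, py, pz;
    float vx, vy, vz;
    float mass;
};

// SOA: one fixed-size array per field; the same field of neighbouring
// bodies is adjacent in memory, which suits contiguous SIMD loads.
struct StarsSOA {
    std::array<float, N> px, py, pz;
    std::array<float, N> vx, vy, vz;
    std::array<float, N> mass;
};

// Position update over the SOA layout: every access is unit-stride in i.
void advance(StarsSOA &s, float dt) {
    for (std::size_t i = 0; i < N; ++i) {
        s.px[i] += s.vx[i] * dt;
        s.py[i] += s.vy[i] * dt;
        s.pz[i] += s.vz[i] * dt;
    }
}
```

With a size known at compile time, the compiler also knows every trip count, which may help it vectorize and unroll; whether it does depends on the compiler and flags.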
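2. Went after initialization
Reading the assembly, I found that assigning px[i] py[i] pz[i] vx[i] vy[i] vz[i] mass[i] in turn inside one loop was not vectorized,
so I deliberately rewrote it to assign all of px first, then all of py, then all of pz, and so on. Because mass is initialized differently from the other fields, I also wrote a template to handle that.
This did vectorize successfully, and I also found that the template function was inlined, with none of the jump or call instructions I had expected.
The optimization is real, but it is just as truly useless.
Initialization is O(n), which is negligible next to the O(n^2) of step and calc.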
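A rough sketch of the field-by-field initialization described above. The names `frand`, `fill_field`, `init`, `Stars` and `N`, and the `+ 1` offset for mass, are assumptions for illustration; the real helper in this PR may look different, and whether the loop vectorizes depends on the random generator and the compiler.

```cpp
#include <array>
#include <cstddef>
#include <cstdlib>

constexpr std::size_t N = 48;  // hypothetical body count

struct Stars {
    std::array<float, N> px, py, pz;
    std::array<float, N> vx, vy, vz;
    std::array<float, N> mass;
};

// Hypothetical helper: pseudo-random float in [-1, 1].
inline float frand() {
    return static_cast<float>(std::rand()) / RAND_MAX * 2.0f - 1.0f;
}

// Fill one whole field before moving on to the next one. The transform is a
// template parameter so that mass can use a different mapping from the rest;
// a small lambda passed this way is typically inlined by the optimizer.
template <class Transform>
void fill_field(std::array<float, N> &field, Transform transform) {
    for (std::size_t i = 0; i < N; ++i) {
        field[i] = transform(frand());
    }
}

void init(Stars &s) {
    auto identity = [](float x) { return x; };
    fill_field(s.px, identity);
    fill_field(s.py, identity);
    fill_field(s.pz, identity);
    fill_field(s.vx, identity);
    fill_field(s.vy, identity);
    fill_field(s.vz, identity);
    // Assumed special case: shift mass so that it is never negative.
    fill_field(s.mass, [](float x) { return x + 1.0f; });
}
```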
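3. Tried polishing various details
(1) In the inner loop, I hoisted the multiplications and divisions by constants out to the outer loop, so they are computed only once.
(2) I split loops, mainly in the energy calculation: the part that adds to energy and the part that subtracts from it are completely independent, so I split them into two loops, because the simpler a loop is, the easier it is for the compiler to optimize.
But polishing these details seemed to have little effect; apparently changing AOS to SOA alone had already been enough to get vectorization.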
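I also sprinkled #pragma unroll everywhere, and it did nothing at all. Searching online, I read that the pragma stops having any effect once -O1, -O2 or -O3 is enabled, apparently because the optimizer already attempts unrolling on its own.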
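The hoisting in (1) and the loop split in (2) can be sketched roughly as follows. The constants `G` and `eps`, the `Stars` struct and the exact energy formula are assumptions for illustration rather than the code in this PR. Note that reordering the arithmetic this way can change floating-point rounding slightly.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

constexpr std::size_t N = 48;   // hypothetical body count
constexpr float G = 0.001f;     // assumed gravitational constant
constexpr float eps = 0.001f;   // assumed softening length

struct Stars {
    std::array<float, N> px, py, pz;
    std::array<float, N> vx, vy, vz;
    std::array<float, N> mass;
};

float calc(const Stars &s) {
    // Loop 1: only the kinetic term, i.e. the part that adds to the energy.
    float kinetic = 0.0f;
    for (std::size_t i = 0; i < N; ++i) {
        float v2 = s.vx[i] * s.vx[i] + s.vy[i] * s.vy[i] + s.vz[i] * s.vz[i];
        kinetic += s.mass[i] * v2;
    }
    kinetic *= 0.5f;  // the division by 2 is applied once, outside the loop

    // Loop 2: only the potential term, i.e. the part that subtracts.
    const float eps2 = eps * eps;  // loop-invariant, computed once
    float potential = 0.0f;
    for (std::size_t i = 0; i < N; ++i) {
        // Everything that does not depend on j is hoisted out of the inner loop.
        const float half_g_mi = 0.5f * G * s.mass[i];
        float acc = 0.0f;
        for (std::size_t j = 0; j < N; ++j) {
            float dx = s.px[j] - s.px[i];
            float dy = s.py[j] - s.py[i];
            float dz = s.pz[j] - s.pz[i];
            float d2 = dx * dx + dy * dy + dz * dz + eps2;
            acc += s.mass[j] / std::sqrt(d2);
        }
        potential += half_g_mi * acc;
    }
    return kinetic - potential;
}
```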
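In summary, for all that effort, nothing besides the SOA change reduced the run time by much: it only went from just over 130 ms to just under 130 ms, which could arguably be nothing more than measurement noise.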