
GPU Version #21

Open
expd opened this issue Nov 3, 2015 · 9 comments

Comments

@expd

expd commented Nov 3, 2015

Are there any plans for a GPU-accelerated version of the algorithm?

@pablofdezalc
Owner

That is something I would love to do, but due to time constraints I do not think it will happen soon unless someone else contributes the GPU implementation. However, by the end of the year there will be some speed improvements to the CPU version.

@jhhsia

jhhsia commented Nov 3, 2015

Hi Pablo,

Which areas of AKAZE do you think a GPU would contribute most to?

Cheers

@pablofdezalc
Owner

I think a GPU will help a lot to speed up the nonlinear scale space computation, as well as keypoint selection and descriptor computation.

@Celebrandil

I might attempt a GPU implementation together with Alessandro Pieropan, but that depends on whether we'll find time for it. Most things are relatively easy to parallelize, even if there are some exceptions. Preferably, we would like an implementation that results in exactly the same features as the original implementation. It's relatively easy to port the code to a GPU, but exploiting the full power of a GPU, without shortcuts, is much harder.

@pablofdezalc
Owner

A GPU implementation would be highly appreciated and really interesting for the community. My main obstacle these days is simply lack of time, plus some CPU optimizations that still need to be integrated. However, if someone is interested in taking the lead on the GPU implementation, I can help test that its performance matches the CPU version.

@Celebrandil

To speed up the CPU version, you should focus on the memory footprint and access patterns. For example, instead of first creating the whole pyramid and then detecting features, it is better to interleave the two. If you have something in CPU cache, you should complete as many operations on it as possible before the data is eventually written back to main memory. Also try to reduce the number of memory buffers; you can use different names to keep the code readable while still sharing the same physical memory.
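To make the interleaving idea concrete, here is a minimal sketch of the loop structure being suggested. All names (`EvolutionLevel`, `diffuse`, `detect`) and the per-pixel operations are hypothetical placeholders, not the actual AKAZE code; the point is only the shape of the loop, in which each level is diffused and consumed for detection back to back while it is still hot in cache:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-level storage; in a real implementation the scratch
// buffers could be shared (aliased) across levels instead of per-level.
struct EvolutionLevel {
    std::vector<float> Lt;    // diffused image for this level
    std::vector<float> Ldet;  // detector response for this level
};

// Placeholder for a diffusion step from the previous level.
void diffuse(const std::vector<float>& prev, std::vector<float>& out) {
    out = prev;
    for (float& v : out) v *= 0.5f;  // stand-in for real smoothing
}

// Placeholder for a detector-response computation.
void detect(const std::vector<float>& Lt, std::vector<float>& Ldet) {
    Ldet.resize(Lt.size());
    for (std::size_t i = 0; i < Lt.size(); ++i)
        Ldet[i] = Lt[i] * Lt[i];  // stand-in for a real response
}

// Interleaved loop: level i is diffused and its response computed
// immediately, rather than building the whole pyramid in a first pass
// and detecting features in a second pass over cold memory.
void build_and_detect(const std::vector<float>& image,
                      std::vector<EvolutionLevel>& levels) {
    const std::vector<float>* prev = &image;
    for (auto& lvl : levels) {
        diffuse(*prev, lvl.Lt);
        detect(lvl.Lt, lvl.Ldet);  // consume Lt while still in cache
        prev = &lvl.Lt;
    }
}
```

The same two-pass-to-one-pass restructuring applies whatever the real diffusion and detection steps are; only the data-flow order changes, not the results.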

@jhhsia

jhhsia commented Nov 25, 2015

Hi Pablo,

While working through AKAZE, there seems to be a simple optimization in the Compute_Multiscale_Derivatives area. The original code has the following:

evolution_[i].Lx = evolution_[i].Lx*((sigma_size_));
evolution_[i].Ly = evolution_[i].Ly*((sigma_size_));
evolution_[i].Lxx = evolution_[i].Lxx*((sigma_size_)*(sigma_size_));
evolution_[i].Lxy = evolution_[i].Lxy*((sigma_size_)*(sigma_size_));
evolution_[i].Lyy = evolution_[i].Lyy*((sigma_size_)*(sigma_size_));

First off, it seems Lx and Ly will not be used again after this, so we can just take them out. Also, we can fold the sigma_size_ scaling into the determinant computation instead:

float sigma_size_quad = sigma_size_ * sigma_size_ * sigma_size_ * sigma_size_;  // sigma_size_^4

for (int ix = 0; ix < evo[i].Ldet.rows; ix++) {
  const float* lxx = evo[i].Lxx.ptr<float>(ix);
  const float* lxy = evo[i].Lxy.ptr<float>(ix);
  const float* lyy = evo[i].Lyy.ptr<float>(ix);
  float* ldet = evo[i].Ldet.ptr<float>(ix);
  for (int jx = 0; jx < evo[i].Ldet.cols; jx++)
    ldet[jx] = (lxx[jx]*lyy[jx] - lxy[jx]*lxy[jx]) * sigma_size_quad;
}

So we can pretty much remove all five scaling operations. I have only run this against a few test images, and the results seem to be identical. Does this seem correct?

Thanks!
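The algebra behind the optimization above can be checked with a small self-contained sketch (the function names and the sample derivative values are illustrative, not taken from AKAZE): pre-multiplying Lxx, Lxy, and Lyy by sigma_size_^2 each and then taking the Hessian determinant gives the same result as computing the determinant of the unscaled derivatives and multiplying once by sigma_size_^4.

```cpp
#include <cmath>

// Original scheme: each second-order derivative is pre-scaled by
// sigma_size^2, then the Hessian determinant is taken.
float det_scale_first(float lxx, float lxy, float lyy, int sigma_size) {
    const float s2 = static_cast<float>(sigma_size * sigma_size);
    lxx *= s2; lxy *= s2; lyy *= s2;
    return lxx * lyy - lxy * lxy;
}

// Proposed scheme: one multiply by sigma_size^4 at determinant time.
// Note the explicit products: in C++ '^' is XOR, not exponentiation.
float det_scale_last(float lxx, float lxy, float lyy, int sigma_size) {
    const float s2 = static_cast<float>(sigma_size * sigma_size);
    const float sigma_size_quad = s2 * s2;  // sigma_size^4
    return (lxx * lyy - lxy * lxy) * sigma_size_quad;
}
```

Since det(s²·H) = s⁴·det(H) for a 2×2 matrix, the two agree up to floating-point rounding, which is consistent with the identical results reported on the test images.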

@pablofdezalc
Owner

Hi Jay Hsia,

Yes, the modifications you are suggesting work fine. I will incorporate those changes, since they will speed things up a bit.

Regards,
Pablo

@Celebrandil

On a Tesla K40c I can compute the scale space in about 6.7 ms, excluding GPU-to-CPU transfers. The sequential nature of the computations makes it hard to limit the number of kernel calls, which leads to considerable overhead at the finer scales. Preferably, one would like to work on all scales of each octave in parallel, since the resolution of the smaller images is too low for spatial parallelism alone to be enough.
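One way to batch the scales of an octave, sketched below with entirely hypothetical names (`OctaveBatch`, `scale_space_response`): store every scale slice of an octave in one contiguous buffer, so a single pass covers all of them. On a GPU this would map to a single kernel launch over a 3-D grid (the scale index becoming, e.g., blockIdx.z), replacing one launch per scale and amortizing the launch overhead that dominates at low resolutions.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical batched layout: all scales of one octave share a buffer.
struct OctaveBatch {
    int width = 0, height = 0, num_scales = 0;
    std::vector<float> data;  // num_scales slices of width*height floats

    OctaveBatch(int w, int h, int s)
        : width(w), height(h), num_scales(s),
          data(static_cast<std::size_t>(w) * h * s, 0.0f) {}

    // Pointer to the start of scale slice s.
    float* scale(int s) {
        return data.data() +
               static_cast<std::size_t>(s) * width * height;
    }
};

// One pass over the whole octave. On a GPU, the outer loop disappears:
// the scale index comes from the third grid dimension, so all scales
// are processed by one kernel launch instead of num_scales launches.
void scale_space_response(OctaveBatch& batch) {
    for (int s = 0; s < batch.num_scales; ++s) {
        float* slice = batch.scale(s);
        for (int i = 0; i < batch.width * batch.height; ++i)
            slice[i] = slice[i] * slice[i];  // placeholder per-pixel op
    }
}
```

The placeholder per-pixel operation stands in for whatever response is computed per scale; the layout, not the operation, is the point of the sketch.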
