
Transpose on padded arrays - unclear #44

Open
robertmaxton42 opened this issue Aug 10, 2018 · 6 comments

Comments

@robertmaxton42

Related to the last issue - it's not entirely clear how to use Transpose on a padded array. After fixing my silly mistake last time, my output reads:

```python
out[:,:,0]
array([[  0,   0,   0,   0,   0,   0,   0],
       [  0,   0,   0,   0,   0,   0,   0],
       [  6,  12,  18,  24,  30,  36,  42],
       [  8,  16,  24,  32,  40,  48,  56],
       [  6,  12,  18,  24,  30,  36,  42],
       [  0,   0,   0,   0,   0,   0,   0],
       [ 32,  64,  96, 128, 160, 192, 224],
       [ 30,  60,  90, 120, 150, 180, 210]], dtype=uint8)
```

For comparison, the correct result ought to be

```python
arrgpu[:,0,:]
array([[ 1,  2,  3,  4,  5,  6,  7],
       [ 2,  4,  6,  8, 10, 12, 14],
       [ 3,  6,  9, 12, 15, 18, 21],
       [ 4,  8, 12, 16, 20, 24, 28],
       [ 5, 10, 15, 20, 25, 30, 35],
       [ 6, 12, 18, 24, 30, 36, 42],
       [ 7, 14, 21, 28, 35, 42, 49],
       [ 8, 16, 24, 32, 40, 48, 56]], dtype=uint8)
```

plus or minus some padding zeroes.

Now, I might just be using Transpose's new padding feature wrong - but, uh, in that case I'm not entirely sure how to use it right, so a documentation update might be in order.

Thanks!

@fjarri
Owner

fjarri commented Aug 10, 2018

I think it is working correctly; the problem arises when you copy it to the CPU. So far I've been relying on PyCUDA/PyOpenCL to do that, but they currently have problems with non-standard offsets and strides.
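To illustrate on the CPU side (a numpy analogy, not Reikna's actual internals): a padded array is a view with a byte offset and non-trivial strides into a larger base allocation, and any copy routine has to honor that metadata rather than assume a dense buffer.

```python
import numpy as np

base = np.zeros((8, 8), dtype=np.uint8)  # the padded base allocation
view = base[1:6, 2:7]                    # 5x5 payload: nonzero offset, parent strides

print(view.strides)                      # (8, 1): the row stride comes from base
print(view.flags['C_CONTIGUOUS'])        # False

# Byte offset of the view's first element into the base buffer:
offset = (view.__array_interface__['data'][0]
          - base.__array_interface__['data'][0])
print(offset)                            # 10 == 1*8 + 2

# A copy that reads view.size consecutive bytes from the base pointer
# picks up the wrong elements -- hence scrambled output like the above.
```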

@fjarri
Owner

fjarri commented Aug 10, 2018

Also, I'm not even sure it's possible to create an array with an offset in numpy.
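For the record, the low-level np.ndarray constructor can wrap an existing buffer at a nonzero byte offset with explicit strides, though it is rarely used directly; a minimal sketch:

```python
import numpy as np

buf = np.arange(64, dtype=np.uint8)  # stand-in for a padded base buffer

# A 5x5 view starting 10 bytes into buf, with an 8-byte row stride:
view = np.ndarray(shape=(5, 5), dtype=np.uint8,
                  buffer=buf, offset=10, strides=(8, 1))
print(view[0])  # [10 11 12 13 14]
```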

@robertmaxton42
Author

numpy can make padded arrays, but as far as I know it doesn't keep track of them as we'd like.

Is there any way to write __host__ code in either PyCUDA or Reikna, do you happen to know? If there is, I could try experimenting with managed memory or otherwise doing my own copying. (I could write it in pure CUDA C++, compile it, and call it with ctypes, but that would involve, yes, exploring the wonderful world of Python/C++ calling, which I have no familiarity with whatsoever as of yet... >.>)
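For reference, the ctypes route is less involved than full Python/C++ bindings, as long as the CUDA side is exposed through extern "C" functions. A minimal sketch, assuming a hypothetical libpadcopy.so that exports copy_padded(dst, src, nbytes) and returns a CUDA error code:

```python
import ctypes

# Hypothetical library, built with e.g.: nvcc -shared -Xcompiler -fPIC ...
lib = ctypes.CDLL("./libpadcopy.so")
lib.copy_padded.argtypes = [ctypes.c_void_p, ctypes.c_void_p, ctypes.c_size_t]
lib.copy_padded.restype = ctypes.c_int

def copy_padded(dst_ptr, src_ptr, nbytes):
    """Call the hypothetical CUDA copy; raise on a nonzero error code."""
    err = lib.copy_padded(dst_ptr, src_ptr, nbytes)
    if err != 0:
        raise RuntimeError("copy_padded failed with CUDA error %d" % err)
```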

Barring that, I suppose it'd work if I pass arrbase as a parameter, and then I can transpose the whole base array and make my own views internally. Less than elegant/intuitive from a library-user perspective, but I don't actually seriously expect anyone else to use this code, I suppose.

@robertmaxton42
Author

robertmaxton42 commented Aug 10, 2018

... Actually, I can't do that, because there's no way to plan the creation of an array that takes a base or base_data parameter. Hm.

@fjarri
Owner

fjarri commented Aug 12, 2018

> numpy can make padded arrays, but as far as I know it doesn't keep track of them as we'd like.

Could you point me to the relevant place in the docs?

> Is there any way to write __host__ code in either PyCUDA or Reikna, do you happen to know?

I don't think PyCUDA supports it, and by extension, neither does Reikna.

> If there is, I could try experimenting with managed memory or otherwise doing my own copying. (I could write it in pure CUDA C++, compile it, and call it with ctypes

Or, perhaps, cffi would be a better option.
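A sketch of the same idea through cffi's ABI mode, again with the hypothetical libpadcopy.so from above; the cdef line is pasted straight from the C header, so there's no argtypes bookkeeping:

```python
from cffi import FFI

ffi = FFI()
# Declaration copied from the (hypothetical) C header:
ffi.cdef("int copy_padded(void *dst, const void *src, size_t nbytes);")
lib = ffi.dlopen("./libpadcopy.so")

def copy_padded(dst_addr, src_addr, nbytes):
    """dst_addr/src_addr are raw device/host addresses as Python ints."""
    err = lib.copy_padded(ffi.cast("void *", dst_addr),
                          ffi.cast("void *", src_addr), nbytes)
    if err != 0:
        raise RuntimeError("copy_padded failed with CUDA error %d" % err)
```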

> Barring that, I suppose it'd work if I pass arrbase as a parameter, and then I can transpose the whole base array and make my own views internally. Less than elegant/intuitive from a library-user perspective, but I don't actually seriously expect anyone else to use this code, I suppose.

As long as the padded array stays on the GPU, it is processed correctly; the problems arise only when you copy it back. The question is which variant to prefer: remove the padding on copy and return a contiguous array on the CPU, or preserve the structure and return a padded array? (Technically, both can be available, but one has to be the default.) Do you need the latter in your code?
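In numpy terms the two variants could look roughly like this, given the raw padded buffer copied off the GPU plus the view's metadata (names are illustrative, not an existing Reikna API):

```python
import numpy as np

def to_cpu_stripped(base_buf, shape, strides, offset, dtype):
    """Variant 1: drop the padding; return an ordinary contiguous array."""
    view = np.ndarray(shape, dtype=dtype, buffer=base_buf,
                      offset=offset, strides=strides)
    return np.ascontiguousarray(view)  # copies just the payload

def to_cpu_padded(base_buf, shape, strides, offset, dtype):
    """Variant 2: keep the structure; return a strided view into the buffer."""
    return np.ndarray(shape, dtype=dtype, buffer=base_buf,
                      offset=offset, strides=strides)
```

(If base_buf is an immutable bytes object, the padded view comes out read-only; a bytearray avoids that.)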

@robertmaxton42
Author

> Could you point me to the relevant place in the docs?

Well, for example, we have np.pad, which pads an array in a variety of helpful ways but just returns a normal numpy array at the end of it.
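To make that concrete: np.pad allocates a brand-new dense array, so the border is baked into the data rather than being tracked as offset/stride metadata:

```python
import numpy as np

a = np.arange(6, dtype=np.uint8).reshape(2, 3)
b = np.pad(a, pad_width=1, mode='constant')  # one layer of zeros on each side

print(b.shape)                  # (4, 5)
print(b.flags['C_CONTIGUOUS'])  # True: an ordinary dense array
print(b.strides)                # (5, 1): nothing marks the border as padding
```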

> I don't think PyCUDA supports it, and by extension, neither does Reikna.

Unfortunate.

> Or, perhaps, cffi would be a better option.

Ooh. Yes, that does look promising. Thanks.

> As long as the padded array stays on the GPU, it is processed correctly; the problems arise only when you copy it back. The question is which variant to prefer: remove the padding on copy and return a contiguous array on the CPU, or preserve the structure and return a padded array? (Technically, both can be available, but one has to be the default.) Do you need the latter in your code?

Not on return, no. I need the padding while I'm processing, but once I transfer it back, at least for this code in particular, I'm basically done with processing and only care about pretty-printing of one form or another.
