
Transpose on padded arrays - unclear #44

Open
robertmaxton42 opened this issue Aug 10, 2018 · 6 comments

Comments

@robertmaxton42

Related to the last issue - it's not entirely clear how to use Transpose on a padded array. After fixing my silly mistake last time, my output reads:

```python
out[:,:,0]
array([[  0,   0,   0,   0,   0,   0,   0],
       [  0,   0,   0,   0,   0,   0,   0],
       [  6,  12,  18,  24,  30,  36,  42],
       [  8,  16,  24,  32,  40,  48,  56],
       [  6,  12,  18,  24,  30,  36,  42],
       [  0,   0,   0,   0,   0,   0,   0],
       [ 32,  64,  96, 128, 160, 192, 224],
       [ 30,  60,  90, 120, 150, 180, 210]], dtype=uint8)
```

For comparison, the correct result ought to be

```python
arrgpu[:,0,:]
array([[ 1,  2,  3,  4,  5,  6,  7],
       [ 2,  4,  6,  8, 10, 12, 14],
       [ 3,  6,  9, 12, 15, 18, 21],
       [ 4,  8, 12, 16, 20, 24, 28],
       [ 5, 10, 15, 20, 25, 30, 35],
       [ 6, 12, 18, 24, 30, 36, 42],
       [ 7, 14, 21, 28, 35, 42, 49],
       [ 8, 16, 24, 32, 40, 48, 56]], dtype=uint8)
```

plus or minus some padding zeroes.

Now, I might just be using Transpose's new padding feature wrong - but, uh, in that case I'm not entirely sure how to use it right, so a documentation update might be in order.

Thanks!

@fjarri
Owner

fjarri commented Aug 10, 2018

I think it is working correctly; the problem arises when you copy it to the CPU. So far I've been relying on PyCUDA/PyOpenCL to do that, but they currently have problems with non-standard offsets and strides.
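To illustrate on the CPU side (a numpy analogy, not Reikna's actual internals): a padded array is a view with a byte offset and non-trivial strides into a larger base allocation, and any copy routine has to honor that metadata rather than assume a dense buffer.

```python
import numpy as np

base = np.zeros((8, 8), dtype=np.uint8)  # the padded base allocation
view = base[1:6, 2:7]                    # 5x5 payload: nonzero offset, parent strides

print(view.strides)                      # (8, 1): the row stride comes from base
print(view.flags['C_CONTIGUOUS'])        # False

# Byte offset of the view's first element into the base buffer:
offset = (view.__array_interface__['data'][0]
          - base.__array_interface__['data'][0])
print(offset)                            # 10 == 1*8 + 2

# A copy that reads view.size consecutive bytes from the base pointer
# picks up the wrong elements -- hence scrambled output like the above.
```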

@fjarri
Owner

fjarri commented Aug 10, 2018

Also, I'm not even sure it's possible to create an array with an offset in numpy.
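For the record, the low-level np.ndarray constructor can wrap an existing buffer at a nonzero byte offset with explicit strides, though it is rarely used directly; a minimal sketch:

```python
import numpy as np

buf = np.arange(64, dtype=np.uint8)  # stand-in for a padded base buffer

# A 5x5 view starting 10 bytes into buf, with an 8-byte row stride:
view = np.ndarray(shape=(5, 5), dtype=np.uint8,
                  buffer=buf, offset=10, strides=(8, 1))
print(view[0])  # [10 11 12 13 14]
```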

@robertmaxton42
Author

numpy can make padded arrays, but as far as I know it doesn't keep track of them as we'd like.

Is there any way to write __host__ code in either PyCUDA or Reikna, do you happen to know? If there is, I could try experimenting with managed memory or otherwise doing my own copying. (I could write it in pure CUDA C++, compile it, and call it with ctypes, but that would involve, yes, exploring the wonderful world of Python/C++ calling, which I have no familiarity with whatsoever as of yet... >.>)
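For reference, the ctypes route is less involved than full Python/C++ bindings, as long as the CUDA side is exposed through extern "C" functions. A minimal sketch, assuming a hypothetical libpadcopy.so that exports copy_padded(dst, src, nbytes) and returns a CUDA error code:

```python
import ctypes

# Hypothetical library, built with e.g.: nvcc -shared -Xcompiler -fPIC ...
lib = ctypes.CDLL("./libpadcopy.so")
lib.copy_padded.argtypes = [ctypes.c_void_p, ctypes.c_void_p, ctypes.c_size_t]
lib.copy_padded.restype = ctypes.c_int

def copy_padded(dst_ptr, src_ptr, nbytes):
    """Call the hypothetical CUDA copy; raise on a nonzero error code."""
    err = lib.copy_padded(dst_ptr, src_ptr, nbytes)
    if err != 0:
        raise RuntimeError("copy_padded failed with CUDA error %d" % err)
```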

Barring that, I suppose it'd work if I pass arrbase as a parameter, and then I can transpose the whole base array and make my own views internally. Less than elegant/intuitive from a library-user perspective, but I don't actually seriously expect anyone else to use this code, I suppose.

@robertmaxton42
Author

robertmaxton42 commented Aug 10, 2018

... Actually, I can't do that, because there's no way to plan the creation of an array that takes a base or base_data parameter. Hm.

@fjarri
Owner

fjarri commented Aug 12, 2018

> numpy can make padded arrays, but as far as I know it doesn't keep track of them as we'd like.

Could you point me to the relevant place in the docs?

> Is there any way to write __host__ code in either PyCUDA or Reikna, do you happen to know?

I don't think PyCUDA supports it, and by extension, neither does Reikna.

> If there is, I could try experimenting with managed memory or otherwise doing my own copying. (I could write it in pure CUDA C++, compile it, and call it with ctypes

Or, perhaps, cffi would be a better option.
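A sketch of the same idea through cffi's ABI mode, again with the hypothetical libpadcopy.so from above; the cdef line is pasted straight from the C header, so there's no argtypes bookkeeping:

```python
from cffi import FFI

ffi = FFI()
# Declaration copied from the (hypothetical) C header:
ffi.cdef("int copy_padded(void *dst, const void *src, size_t nbytes);")
lib = ffi.dlopen("./libpadcopy.so")

def copy_padded(dst_addr, src_addr, nbytes):
    """dst_addr/src_addr are raw device/host addresses as Python ints."""
    err = lib.copy_padded(ffi.cast("void *", dst_addr),
                          ffi.cast("void *", src_addr), nbytes)
    if err != 0:
        raise RuntimeError("copy_padded failed with CUDA error %d" % err)
```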

> Barring that, I suppose it'd work if I pass arrbase as a parameter, and then I can transpose the whole base array and make my own views internally. Less than elegant/intuitive from a library-user perspective, but I don't actually seriously expect anyone else to use this code, I suppose.

As long as the padded array stays on the GPU, it is processed correctly; the problems arise only when you copy it back. The question is which variant to prefer: remove the padding on copy and return a contiguous array on the CPU, or preserve the structure and return a padded array? (Technically, both can be available, but one has to be the default.) Do you need the latter in your code?
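In numpy terms the two variants could look roughly like this, given the raw padded buffer copied off the GPU plus the view's metadata (names are illustrative, not an existing Reikna API):

```python
import numpy as np

def to_cpu_stripped(base_buf, shape, strides, offset, dtype):
    """Variant 1: drop the padding; return an ordinary contiguous array."""
    view = np.ndarray(shape, dtype=dtype, buffer=base_buf,
                      offset=offset, strides=strides)
    return np.ascontiguousarray(view)  # copies just the payload

def to_cpu_padded(base_buf, shape, strides, offset, dtype):
    """Variant 2: keep the structure; return a strided view into the buffer."""
    return np.ndarray(shape, dtype=dtype, buffer=base_buf,
                      offset=offset, strides=strides)
```

(If base_buf is an immutable bytes object, the padded view comes out read-only; a bytearray avoids that.)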

@robertmaxton42
Author

> Could you point me to the relevant place in the docs?

Well, for example, we have np.pad, which pads an array in a variety of helpful ways but just returns a normal numpy array at the end of it.
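To make that concrete: np.pad allocates a brand-new dense array, so the border is baked into the data rather than being tracked as offset/stride metadata:

```python
import numpy as np

a = np.arange(6, dtype=np.uint8).reshape(2, 3)
b = np.pad(a, pad_width=1, mode='constant')  # one layer of zeros on each side

print(b.shape)                  # (4, 5)
print(b.flags['C_CONTIGUOUS'])  # True: an ordinary dense array
print(b.strides)                # (5, 1): nothing marks the border as padding
```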

> I don't think PyCUDA supports it, and by extension, neither does Reikna.

Unfortunate.

> Or, perhaps, cffi would be a better option.

Ooh. Yes, that does look promising. Thanks.

> As long as the padded array stays on the GPU, it is processed correctly; the problems arise only when you copy it back. The question is which variant to prefer: remove the padding on copy and return a contiguous array on the CPU, or preserve the structure and return a padded array? (Technically, both can be available, but one has to be the default.) Do you need the latter in your code?

Not on return, no. I need the padding while I'm processing, but once I transfer it back, at least for this code in particular, I'm basically done with processing and only care about pretty-printing of one form or another.
