More data types supported in cv::cuda::transpose() #3371
base: 4.x
Conversation
proposal for #22782
CV_Assert( elemSize == 1 || elemSize == 4 || elemSize == 8 );
//CV_Assert( elemSize == 1 || elemSize == 4 || elemSize == 8 );
What if elemSize == 5?
What kind of error message would we show to the user?
It is handled by
...
else
CV_Error(Error::StsUnsupportedFormat, "");
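For illustration only (the wording is just a sketch, not what the PR currently contains), the empty string could carry an explicit message:
else
    CV_Error(Error::StsUnsupportedFormat, "cv::cuda::transpose: unsupported element size / channel combination");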
if (src.empty())
    dst.release();
Empty input should be an error as nobody wants to process "nothing" in real use cases: opencv/opencv#8300
OK, I thought it was better since "Allow empty matrices in most functions" (personally I prefer a no-op rather than an exception)
I will also add a fast path for size == (1, 1), where transpose is just a copy
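Roughly like this (just a sketch using the surrounding function's variable names; the final code may differ):
// a 1x1 matrix is its own transpose, so a plain device copy is enough
if (src.rows == 1 && src.cols == 1)
{
    src.copyTo(_dst, stream);
    return;
}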
(srcType == CV_32SC1) || (srcType == CV_32SC3) || (srcType == CV_32SC4) ||
(srcType == CV_32FC1) || (srcType == CV_32FC3) || (srcType == CV_32FC4);
const bool isElemSizeSupportedByNppi =
    ((elemSize != 0) && !(elemSize%1) && ((elemSize/1)<=4)) ||
Apologies if I have completely misunderstood the logic, but:
Isn't elemSize%1 == 0 always?
When can elemSize == 0?
Isn't this already taken care of by
(srcType == CV_8UC1) || (srcType == CV_8UC3) || (srcType == CV_8UC4) || (srcType == CV_16UC1)
You are right, but it was on purpose for code clarity/readability.
elemSize%1 and elemSize/1 will be optimized out by the compiler; this is just to balance with the 2, 4, 8 cases.
OK to get rid of the (elemSize != 0) check.
For clarity I would use elemSize1 and channels?
For what CV data type will isNppiNativelySupported == false and isElemSizeSupportedByNppi == true or isElemSizeSupportedByGridTranspose == true?
There is overlap between isNppiNativelySupported, isElemSizeSupportedByNppi and isElemSizeSupportedByGridTranspose
The idea is that if anything goes wrong, they can be disabled independently.
isNppiNativelySupported means that the data type is directly mapped to an nppiTranspose call.
isElemSizeSupportedByNppi means that we can cheat on the real data type to use another data type of the same size (assuming that NPP is only using memory copies, otherwise it would not work).
isElemSizeSupportedByGridTranspose is the fallback if we don't rely on the NPP implementation.
For instance, 16UC2 is not supported by NPP but can be mapped to 32SC1 (isNppiNativelySupported == false and isElemSizeSupportedByNppi == true)
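Schematically, the dispatch is intended to look like this (simplified sketch, not the exact PR code):
if (isNppiNativelySupported)
{
    // srcType maps 1:1 to an nppiTranspose_* variant, e.g. CV_32FC1 -> nppiTranspose_32f_C1R
}
else if (isElemSizeSupportedByNppi)
{
    // reinterpret the data as another type of the same element size and let NPP move the bytes,
    // e.g. CV_16UC2 (4 bytes per element) dispatched as nppiTranspose_8u_C4R
}
else if (isElemSizeSupportedByGridTranspose)
{
    // pure CUDA fallback: gridTranspose() templated on a type of matching size
}
else
    CV_Error(Error::StsUnsupportedFormat, "");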
Is there a complete overlap, i.e. are isElemSizeSupportedByNppi and isElemSizeSupportedByGridTranspose redundant unless isNppiNativelySupported is manually set to false?
There are cases where (isNppiNativelySupported == false) and (isElemSizeSupportedByNppi == true) (e.g. 16UC2).
But there are currently no cases where (isElemSizeSupportedByNppi == false) and (isElemSizeSupportedByGridTranspose == true). This is a fallback implementation if:
- a (future) OpenCV global option disables NPP
- performance tests reveal that some NPP calls are slower than gridTranspose(). Specific cases could then be excluded from isElemSizeSupportedByNppi.
Is 16UC2 handled by the block below?
else if (!(elemSize%2) && ((elemSize/2)==2))
nppSafeCall( nppiTranspose_8u_C4R(src.ptr<Npp8u>(), static_cast<int>(src.step),
dst.ptr<Npp8u>(), static_cast<int>(dst.step), sz) );
If so, wouldn't it be better to have this under
else if (srcType == CV_16UC2)
Exactly, this is not a native nppi call (isNppiNativelySupported == false), but it can be handled by cheating on the data type (isElemSizeSupportedByNppi == true).
That's why the tests inside if (isElemSizeSupportedByNppi) explicitly focus on elemSize rather than srcType.
I'm not sure if this makes it easier or harder for the next person who looks at it to see what's going on, what do you think @alalek?
Either way I think having overlap between both makes it difficult to understand; maybe if the "cheat" only included the cases where it was applicable it would be more obvious?
Would it be possible to incorporate the newer stream API for newer versions of CUDA as well?
Fixed a typo in gridTranspose() usage. Added a fast path for single cell/row/col matrix. Throw an error for an empty matrix instead of a no-op. Code style.
I am currently based on the 4.x branch; usage of _Ctx seems to be in master.
@@ -201,6 +199,9 @@ void cv::cuda::transpose(InputArray _src, OutputArray _dst, Stream& stream)
else if (!(elemSize%8) && ((elemSize/8)==2))
    nppSafeCall( nppiTranspose_32f_C4R(src.ptr<Npp32f>(), static_cast<int>(src.step),
                                       dst.ptr<Npp32f>(), static_cast<int>(dst.step), sz) );

if (!stream)
Should this be applied to gridTranspose() as well? I can't understand why it was previously missing; surely the result could have been in flight on return from this function when the default stream is passed?
I have the opposite question: why is it ever needed? I left them here, but I don't understand the purpose of this extra synchronization when the default stream (synchronous by default) is used.
Kernel launches are asynchronous with respect to the host. The default stream syncs with other streams by default in legacy mode.
My interpretation is that the OpenCV API works on the assumption that if a stream isn't passed, the user wants synchronization.
Do you mean that, now that both cudaStreamPerThread and cudaStreamLegacy exist, people using OpenCV always expect the cudaStreamLegacy behaviour with the default stream, thus requiring the cudaDeviceSynchronize()?
If so, I agree.
No, users of the API will expect functions to which they don't pass a stream to be synchronous with respect to the host when they return. This would not be the case with either cudaStreamPerThread or cudaStreamLegacy, as the kernel launch is asynchronous in both cases.
In the example from the docs
1) k_1<<<1, 1, 0, s>>>();
2) k_2<<<1, 1>>>();
3) k_3<<<1, 1, 0, s>>>();
4) ...
k_2 waits on k_1 because k_2 is in the legacy default stream, then k_3 waits on the legacy stream. Because of the specific way this has been set up, k_1 and k_2 have finished executing before k_3 begins executing; however the result from k_3 may still be in flight after control has returned to the host when you reach line 4.
Now I haven't used per-thread default streams (I always use explicit streams), but my understanding is that if the CUDA_API_PER_THREAD_DEFAULT_STREAM macro was used to enable per-thread default streams, k_1 would run before k_3 but both would be asynchronous with respect to k_2. Either way, when control returns to the host on line 4 they may all still be in flight.
On the other hand, if we have the following, which is our case if no stream is passed:
1) k_1<<<1, 1>>>();
2) k_2<<<1, 1>>>();
3) k_3<<<1, 1>>>();
4) ...
in either case (cudaStreamPerThread or cudaStreamLegacy), all three kernels may still be in flight when control returns to the host on line 4 if we don't explicitly synchronize.
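To make the point concrete (illustrative sketch only; transposeKernel and the launch parameters are placeholders, not code from the PR):
// a launch on the default stream returns control to the host immediately,
// so the result may still be in flight unless we synchronize explicitly
transposeKernel<<<grid, block>>>(srcPtr, dstPtr); // asynchronous, for both legacy and per-thread default streams
cudaDeviceSynchronize();                          // blocks the host until the kernel has completed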
OK, anyway, I just checked and gridTranspose() does ultimately call cudaDeviceSynchronize() for a null stream (see https://github.com/opencv/opencv_contrib/blob/4.x/modules/cudev/include/opencv2/cudev/grid/detail/transpose.hpp)
@cudawarped Do you have open issues with the PR? I want to merge if you do not mind.
@asmorkalov I am not 100% convinced that the isNppiNativelySupported and isElemSizeSupportedByNppi flags are needed. I can understand that @chacha21 wants a fallback option in case something goes wrong and because of that wants to separate out the two logic paths, but I am not sure if the redundant calls under isElemSizeSupportedByNppi make the function harder to maintain, what do you think?
Additionally, elemSize1 is unused, the isElemSizeSupportedByNppi logic path won't be fully tested, and given the names of the nppi functions it may make more sense to just compare the element size directly instead of deriving it with modulo/division arithmetic on elemSize, i.e.
else if (!(elemSize%2) && ((elemSize/2)==2))
nppSafeCall( nppiTranspose_8u_C4R(src.ptr<Npp8u>(), static_cast<int>(src.step),
dst.ptr<Npp8u>(), static_cast<int>(dst.step), sz) );
could be
else if (elemSize==4)
nppSafeCall( nppiTranspose_8u_C4R(src.ptr<Npp8u>(), static_cast<int>(src.step),
dst.ptr<Npp8u>(), static_cast<int>(dst.step), sz) );
Whatever the decision, I will certainly agree: I explained my initial code structure, but I am OK to adapt to a more "OpenCV style".
- Conditions like if (!(elemSize%2) && ((elemSize/2)==2)) look very cryptic. I understand the logic behind it, but if (elemSize==4) is definitely more readable.
- The only reason for the fallback I see is the 2-channel matrix. All other cases are handled with regular types. I propose to merge "native" support and just "support" into a single case and use the relevant Npp call. No duplicated/dead branches, more obvious testing.
if (rows == 1), the stride has no importance for the reshape()
@cudawarped Do you have open issues with the PR? I want to merge if you do not mind.
I could not test performance with GPUs other than mine (an old GTX 750). I am pretty confident, but it would be good if someone could have a look with more modern ones.
Transpose perf test results for GF 1080 with CUDA 10.2, Ubuntu 18.04:
The execution time is very small. Not sure if it's a real speedup or some system side effect. In any case, the PR does not introduce regressions.
More obvious elemSize test logic to dispatch to the right nppiTranspose variant. Added modern NPP _Ctx stream support.
else if (elemSize == 1)
    nppSafeCall( nppiTranspose_8u_C1R(src.ptr<Npp8u>(), static_cast<int>(src.step),
                                      dst.ptr<Npp8u>(), static_cast<int>(dst.step), sz) );
else if (elemSize == 2)
    nppSafeCall( nppiTranspose_16u_C1R(src.ptr<Npp16u>(), static_cast<int>(src.step),
                                       dst.ptr<Npp16u>(), static_cast<int>(dst.step), sz) );
else if (elemSize == 3)
    nppSafeCall( nppiTranspose_8u_C3R(src.ptr<Npp8u>(), static_cast<int>(src.step),
                                      dst.ptr<Npp8u>(), static_cast<int>(dst.step), sz) );
else if (elemSize == 4)
    nppSafeCall( nppiTranspose_32s_C1R(src.ptr<Npp32s>(), static_cast<int>(src.step),
                                       dst.ptr<Npp32s>(), static_cast<int>(dst.step), sz) );
else if (elemSize == 6)
    nppSafeCall( nppiTranspose_16u_C3R(src.ptr<Npp16u>(), static_cast<int>(src.step),
                                       dst.ptr<Npp16u>(), static_cast<int>(dst.step), sz) );
else if (elemSize == 8)
    nppSafeCall( nppiTranspose_16u_C4R(src.ptr<Npp16u>(), static_cast<int>(src.step),
                                       dst.ptr<Npp16u>(), static_cast<int>(dst.step), sz) );
else if (elemSize == 12)
    nppSafeCall( nppiTranspose_32s_C3R(src.ptr<Npp32s>(), static_cast<int>(src.step),
                                       dst.ptr<Npp32s>(), static_cast<int>(dst.step), sz) );
else if (elemSize == 16)
    nppSafeCall( nppiTranspose_32s_C4R(src.ptr<Npp32s>(), static_cast<int>(src.step),
                                       dst.ptr<Npp32s>(), static_cast<int>(dst.step), sz) );
The interpretation block does the transpose a second time if elemSize duplicates one of the srcType cases.
Well... no? This is still an "else if".
else if (elemSize == 1)
    nppSafeCall( nppiTranspose_8u_C1R_Ctx(src.ptr<Npp8u>(), static_cast<int>(src.step),
                                          dst.ptr<Npp8u>(), static_cast<int>(dst.step), sz, ctx) );
else if (elemSize == 2)
    nppSafeCall( nppiTranspose_16u_C1R_Ctx(src.ptr<Npp16u>(), static_cast<int>(src.step),
                                           dst.ptr<Npp16u>(), static_cast<int>(dst.step), sz, ctx) );
else if (elemSize == 3)
    nppSafeCall( nppiTranspose_8u_C3R_Ctx(src.ptr<Npp8u>(), static_cast<int>(src.step),
                                          dst.ptr<Npp8u>(), static_cast<int>(dst.step), sz, ctx) );
else if (elemSize == 4)
    nppSafeCall( nppiTranspose_32s_C1R_Ctx(src.ptr<Npp32s>(), static_cast<int>(src.step),
                                           dst.ptr<Npp32s>(), static_cast<int>(dst.step), sz, ctx) );
else if (elemSize == 6)
    nppSafeCall( nppiTranspose_16u_C3R_Ctx(src.ptr<Npp16u>(), static_cast<int>(src.step),
                                           dst.ptr<Npp16u>(), static_cast<int>(dst.step), sz, ctx) );
else if (elemSize == 8)
    nppSafeCall( nppiTranspose_16u_C4R_Ctx(src.ptr<Npp16u>(), static_cast<int>(src.step),
                                           dst.ptr<Npp16u>(), static_cast<int>(dst.step), sz, ctx) );
else if (elemSize == 12)
    nppSafeCall( nppiTranspose_32s_C3R_Ctx(src.ptr<Npp32s>(), static_cast<int>(src.step),
                                           dst.ptr<Npp32s>(), static_cast<int>(dst.step), sz, ctx) );
else if (elemSize == 16)
    nppSafeCall( nppiTranspose_32s_C4R_Ctx(src.ptr<Npp32s>(), static_cast<int>(src.step),
                                           dst.ptr<Npp32s>(), static_cast<int>(dst.step), sz, ctx) );
The same here.
if (!stream)
    CV_CUDEV_SAFE_CALL( cudaDeviceSynchronize() );
}
CV_CUDEV_SAFE_CALL( cudaDeviceSynchronize() );
A synchronous NPP call is used if !stream. No need to sync again, right?
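In other words (sketch of the suggested cleanup, not the PR code):
if (!stream)
    CV_CUDEV_SAFE_CALL( cudaDeviceSynchronize() );
// no second, unconditional cudaDeviceSynchronize() after the block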
if (elemSize == 1)
    gridTranspose(globPtr<unsigned char>(src), globPtr<unsigned char>(dst), stream);
else if (elemSize == 2)
    gridTranspose(globPtr<unsigned short>(src), globPtr<unsigned short>(dst), stream);
else if (elemSize == 4)
{
    gridTranspose(globPtr<int>(src), globPtr<int>(dst), stream);
}
else // if (elemSize == 8)
{
    gridTranspose(globPtr<double>(src), globPtr<double>(dst), stream);
}
gridTranspose(globPtr<signed int>(src), globPtr<signed int>(dst), stream);
else if (elemSize == 8)
    gridTranspose(globPtr<double>(src), globPtr<double>(dst), stream);
*/
Please remove dead code.
#endif
cudaStream_t _stream = StreamAccessor::getStream(stream);

if (!_stream || !useNppStreamCtx)
Would it be better to always use the old API when CUDA SDK < 10.1 and the newer one otherwise, regardless of whether there is a stream or not? That way it should be easier to carve out the older code when it is deprecated in newer CUDA versions.
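For example (sketch of the idea; USE_NPP_STREAM_CONTEXT and ctx are the names used in this PR, the rest is illustrative):
#if USE_NPP_STREAM_CONTEXT
    // CUDA/NPP >= 10.1: always use the _Ctx variants and pass the stream (possibly the null stream) via the context
    nppSafeCall( nppiTranspose_8u_C1R_Ctx(src.ptr<Npp8u>(), static_cast<int>(src.step),
                                          dst.ptr<Npp8u>(), static_cast<int>(dst.step), sz, ctx) );
#else
    // older toolkits: legacy API bound to the global NPP stream
    nppSafeCall( nppiTranspose_8u_C1R(src.ptr<Npp8u>(), static_cast<int>(src.step),
                                      dst.ptr<Npp8u>(), static_cast<int>(dst.step), sz) );
#endif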
void cv::cuda::transpose(InputArray _src, OutputArray _dst, Stream& stream)
{
    GpuMat src = getInputMat(_src, stream);
#define USE_NPP_STREAM_CONTEXT (NPP_VERSION >= (10 * 1000 + 1 * 100 + 0))
I think it would be better to define this globally somewhere, although I'm not sure where, maybe modules/cudev/include/opencv2/cudev/common.hpp.
@chacha21 I like the changes apart from my comments. For reference I was thinking the modification could look like this. I would wait for @asmorkalov to take a look before making any more updates as he will ultimately decide on the required format.
syncOutput(dst, _dst, stream);
const int srcType = src.type();
Is this redundant?
(elemSize == 1) || (elemSize == 2) || (elemSize == 3) || (elemSize == 4) ||
(elemSize == 6) || (elemSize == 8) || (elemSize == 12) || (elemSize == 16);

if (!isSupported)
Would CV_Assert() be better here?
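i.e. (sketch of the alternative):
CV_Assert(isSupported); // reports the failed condition in the exception message
// versus the current explicit check:
// if (!isSupported)
//     CV_Error(Error::StsUnsupportedFormat, "");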
Fixes opencv/opencv#22782
proposal for opencv/opencv#22782
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.