FastCV extension 3rd Post #3891
base: 4.x
Conversation
/**
 * @brief Matrix multiplication of two float type matrices.
 * R = a*A*B + b*C, where A, B, C, R are matrices and a, b are constants.
 * It is optimized for Qualcomm's processors.
 * @param src1 First source matrix of type CV_32F
 * @param src2 Second source matrix of type CV_32F, with as many rows as src1 has columns
 * @param dst Resulting matrix of type CV_32F
 * @param alpha Multiplying factor for the product src1*src2
 * @param src3 Optional third matrix of type CV_32F to be added to the matrix product
 * @param beta Multiplying factor for src3
 */
CV_EXPORTS_W void gemm(InputArray src1, InputArray src2, OutputArray dst, float alpha = 1.0,
                       InputArray src3 = noArray(), float beta = 0.0);
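For reference, a minimal usage sketch of the proposed API. It assumes the extension lives in the cv::fastcv namespace and is exposed through an opencv2/fastcv.hpp umbrella header; only the gemm signature itself comes from the declaration above.

#include <opencv2/core.hpp>
#include <opencv2/fastcv.hpp>   // assumed umbrella header of the extension module

int main()
{
    cv::Mat A(64, 32, CV_32F), B(32, 48, CV_32F), C(64, 48, CV_32F);
    cv::randu(A, 0.0, 1.0);
    cv::randu(B, 0.0, 1.0);
    cv::randu(C, 0.0, 1.0);

    // R = alpha*A*B + beta*C, i.e. R = a*A*B + b*C from the docstring
    cv::Mat R;
    cv::fastcv::gemm(A, B, R, 0.5f, C, 2.0f);
    return 0;
}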
OpenCV HAL has GEMM options:
/**
The function performs generalized matrix multiplication similar to the gemm functions in BLAS level 3:
\f$D = \alpha*AB+\beta*C\f$
@param src1 pointer to input \f$M\times N\f$ matrix \f$A\f$ or \f$A^T\f$ stored in row major order.
@param src1_step number of bytes between two consequent rows of matrix \f$A\f$ or \f$A^T\f$.
@param src2 pointer to input \f$N\times K\f$ matrix \f$B\f$ or \f$B^T\f$ stored in row major order.
@param src2_step number of bytes between two consequent rows of matrix \f$B\f$ or \f$B^T\f$.
@param alpha \f$\alpha\f$ multiplier before \f$AB\f$
@param src3 pointer to input \f$M\times K\f$ matrix \f$C\f$ or \f$C^T\f$ stored in row major order.
@param src3_step number of bytes between two consequent rows of matrix \f$C\f$ or \f$C^T\f$.
@param beta \f$\beta\f$ multiplier before \f$C\f$
@param dst pointer to input \f$M\times K\f$ matrix \f$D\f$ stored in row major order.
@param dst_step number of bytes between two consequent rows of matrix \f$D\f$.
@param m number of rows in matrix \f$A\f$ or \f$A^T\f$, equals to number of rows in matrix \f$D\f$
@param n number of columns in matrix \f$A\f$ or \f$A^T\f$
@param k number of columns in matrix \f$B\f$ or \f$B^T\f$, equals to number of columns in matrix \f$D\f$
@param flags algorithm options (combination of CV_HAL_GEMM_1_T, ...).
*/
//! @addtogroup core_hal_interface_matrix_multiplication Matrix multiplication
//! @{
inline int hal_ni_gemm32f(const float* src1, size_t src1_step, const float* src2, size_t src2_step,
float alpha, const float* src3, size_t src3_step, float beta, float* dst, size_t dst_step,
int m, int n, int k, int flags) { return CV_HAL_ERROR_NOT_IMPLEMENTED; }
inline int hal_ni_gemm64f(const double* src1, size_t src1_step, const double* src2, size_t src2_step,
double alpha, const double* src3, size_t src3_step, double beta, double* dst, size_t dst_step,
int m, int n, int k, int flags) { return CV_HAL_ERROR_NOT_IMPLEMENTED; }
inline int hal_ni_gemm32fc(const float* src1, size_t src1_step, const float* src2, size_t src2_step,
float alpha, const float* src3, size_t src3_step, float beta, float* dst, size_t dst_step,
int m, int n, int k, int flags) { return CV_HAL_ERROR_NOT_IMPLEMENTED; }
inline int hal_ni_gemm64fc(const double* src1, size_t src1_step, const double* src2, size_t src2_step,
double alpha, const double* src3, size_t src3_step, double beta, double* dst, size_t dst_step,
int m, int n, int k, int flags) { return CV_HAL_ERROR_NOT_IMPLEMENTED; }
//! @}
I propose to implement the HAL interface rather than adding an extension.
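For comparison, a rough sketch of how a FastCV-backed kernel could be plugged in at the HAL level instead. The signature and error codes come from the HAL header quoted above; the function body, the fastcv_gemm32f_impl wrapper, and the macro override are hypothetical and only illustrate the usual custom-HAL wiring.

#include <opencv2/core/hal/interface.h>

int fastcv_hal_gemm32f(const float* src1, size_t src1_step,
                       const float* src2, size_t src2_step,
                       float alpha, const float* src3, size_t src3_step,
                       float beta, float* dst, size_t dst_step,
                       int m, int n, int k, int flags)
{
    // Decline flag combinations (e.g. transposed inputs) that the accelerated
    // kernel does not support; OpenCV then falls back to the generic path.
    if (flags != 0)
        return CV_HAL_ERROR_NOT_IMPLEMENTED;

    // Call the FastCV kernel here (hypothetical wrapper):
    // return fastcv_gemm32f_impl(src1, src1_step, src2, src2_step, alpha,
    //                            src3, src3_step, beta, dst, dst_step, m, n, k);
    return CV_HAL_ERROR_NOT_IMPLEMENTED;
}

// In the custom HAL header the default no-op is then replaced, e.g.:
// #undef  cv_hal_gemm32f
// #define cv_hal_gemm32f fastcv_hal_gemm32f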
/**
 * @brief Creates one multi-channel mat out of several single-channel CV_8U mats.
 * Optimized for Qualcomm's processors.
 * @param mv Input vector of matrices to be merged; all matrices in mv must be CV_8UC1 and have the same size.
 * Note: the number of mats can be 2, 3 or 4.
 * @param dst Output array of depth CV_8U and the same size as mv[0]; the number of channels
 * equals the number of matrices in the input vector.
 */
CV_EXPORTS_W void merge(InputArrayOfArrays mv, OutputArray dst);
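A short usage sketch of the proposed merge, under the same assumptions as above (cv::fastcv namespace, opencv2/fastcv.hpp header):

#include <opencv2/core.hpp>
#include <opencv2/fastcv.hpp>   // assumed umbrella header of the extension module
#include <vector>

int main()
{
    // Three CV_8UC1 planes of the same size (2, 3 or 4 planes are allowed).
    std::vector<cv::Mat> planes(3, cv::Mat(480, 640, CV_8UC1, cv::Scalar(0)));
    cv::Mat merged;
    cv::fastcv::merge(planes, merged);   // merged is CV_8UC3, same size as the planes
    return 0;
}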
/**
 * @brief Splits a CV_8U multi-channel mat into several CV_8UC1 mats.
 * Optimized for Qualcomm's processors.
 * @param src Input 2, 3 or 4 channel mat of depth CV_8U
 * @param mv Output vector of src.channels() CV_8UC1 mats
 */
CV_EXPORTS_W void split(InputArray src, OutputArrayOfArrays mv);
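And the corresponding split usage sketch, again assuming the cv::fastcv namespace and opencv2/fastcv.hpp header:

#include <opencv2/core.hpp>
#include <opencv2/fastcv.hpp>   // assumed umbrella header of the extension module
#include <vector>

int main()
{
    cv::Mat bgr(480, 640, CV_8UC3, cv::Scalar(10, 20, 30));
    std::vector<cv::Mat> channels;
    cv::fastcv::split(bgr, channels);    // yields 3 CV_8UC1 mats, one per channel
    return 0;
}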
The same question on HAL.
We are still tuning these optimizations to achieve consistent performance across various targets, so for now we have added them as an extension.
This PR adds FastCV extensions for the merge, split and gemm APIs, and for the arithmetic APIs add and subtract.
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.