
FastCV extension 3rd Post #3891

Open · wants to merge 5 commits into base: 4.x
Conversation

adsha-quic (Contributor)

Adds FastCV extensions for the merge, split, and gemm APIs, and for the arithmetic APIs add and subtract.

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under the Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on code under the GPL or another license that is incompatible with OpenCV.
  • The PR is proposed to the proper branch.
  • There is a reference to the original bug report and related work.
  • There are accuracy tests, performance tests, and test data in the opencv_extra repository, if applicable.
    The patch to opencv_extra has the same branch name.
  • The feature is well documented and the sample code can be built with the project CMake.

Comment on lines +52 to +64
/**
* @brief Matrix multiplication of two float-type matrices:
* R = alpha*A*B + beta*C, where A, B, C, R are matrices and alpha, beta are constants.
* It is optimized for Qualcomm's processors.
* @param src1 First source matrix of type CV_32F
* @param src2 Second source matrix of type CV_32F; its number of rows must equal the number of columns of src1
* @param dst Resulting matrix of type CV_32F
* @param alpha Multiplier for the product src1*src2
* @param src3 Optional third matrix of type CV_32F, added to the matrix product
* @param beta Multiplier for src3
*/
CV_EXPORTS_W void gemm(InputArray src1, InputArray src2, OutputArray dst, float alpha = 1.0,
                       InputArray src3 = noArray(), float beta = 0.0);
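The contract declared above (R = alpha*A*B + beta*C) can be sketched in plain Python. `gemm_ref` below is an illustrative reference written for this review, not the FastCV-accelerated implementation:

```python
def gemm_ref(src1, src2, alpha=1.0, src3=None, beta=0.0):
    """Reference for R = alpha*src1*src2 + beta*src3.

    Matrices are lists of lists of floats; illustrates the documented
    contract of the proposed fastcv::gemm, not the FastCV code itself.
    """
    m, n = len(src1), len(src1[0])
    # src2 must have as many rows as src1 has columns
    assert len(src2) == n
    k = len(src2[0])
    dst = [[0.0] * k for _ in range(m)]
    for i in range(m):
        for j in range(k):
            acc = sum(src1[i][p] * src2[p][j] for p in range(n))
            dst[i][j] = alpha * acc
            if src3 is not None:  # optional C term, scaled by beta
                dst[i][j] += beta * src3[i][j]
    return dst

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
C = [[1.0, 1.0], [1.0, 1.0]]
R = gemm_ref(A, B, alpha=2.0, src3=C, beta=0.5)
# R == [[38.5, 44.5], [86.5, 100.5]]
```

When `src3` is `noArray()` (here `None`) and `beta` is 0, the call reduces to a plain scaled matrix product.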
Contributor
OpenCV HAL has GEMM options:

/**
The function performs generalized matrix multiplication similar to the gemm functions in BLAS level 3:
\f$D = \alpha*AB+\beta*C\f$

@param src1 pointer to input \f$M\times N\f$ matrix \f$A\f$ or \f$A^T\f$ stored in row major order.
@param src1_step number of bytes between two consequent rows of matrix \f$A\f$ or \f$A^T\f$.
@param src2 pointer to input \f$N\times K\f$ matrix \f$B\f$ or \f$B^T\f$ stored in row major order.
@param src2_step number of bytes between two consequent rows of matrix \f$B\f$ or \f$B^T\f$.
@param alpha \f$\alpha\f$ multiplier before \f$AB\f$
@param src3 pointer to input \f$M\times K\f$ matrix \f$C\f$ or \f$C^T\f$ stored in row major order.
@param src3_step number of bytes between two consequent rows of matrix \f$C\f$ or \f$C^T\f$.
@param beta \f$\beta\f$ multiplier before \f$C\f$
@param dst pointer to input \f$M\times K\f$ matrix \f$D\f$ stored in row major order.
@param dst_step number of bytes between two consequent rows of matrix \f$D\f$.
@param m number of rows in matrix \f$A\f$ or \f$A^T\f$, equals to number of rows in matrix \f$D\f$
@param n number of columns in matrix \f$A\f$ or \f$A^T\f$
@param k number of columns in matrix \f$B\f$ or \f$B^T\f$, equals to number of columns in matrix \f$D\f$
@param flags algorithm options (combination of CV_HAL_GEMM_1_T, ...).
 */

//! @addtogroup core_hal_interface_matrix_multiplication Matrix multiplication
//! @{
inline int hal_ni_gemm32f(const float* src1, size_t src1_step, const float* src2, size_t src2_step,
                          float alpha, const float* src3, size_t src3_step, float beta, float* dst, size_t dst_step,
                          int m, int n, int k, int flags) { return CV_HAL_ERROR_NOT_IMPLEMENTED; }
inline int hal_ni_gemm64f(const double* src1, size_t src1_step, const double* src2, size_t src2_step,
                          double alpha, const double* src3, size_t src3_step, double beta, double* dst, size_t dst_step,
                          int m, int n, int k, int flags) { return CV_HAL_ERROR_NOT_IMPLEMENTED; }
inline int hal_ni_gemm32fc(const float* src1, size_t src1_step, const float* src2, size_t src2_step,
                          float alpha, const float* src3, size_t src3_step, float beta, float* dst, size_t dst_step,
                          int m, int n, int k, int flags) { return CV_HAL_ERROR_NOT_IMPLEMENTED; }
inline int hal_ni_gemm64fc(const double* src1, size_t src1_step, const double* src2, size_t src2_step,
                          double alpha, const double* src3, size_t src3_step, double beta, double* dst, size_t dst_step,
                          int m, int n, int k, int flags) { return CV_HAL_ERROR_NOT_IMPLEMENTED; }
//! @}

I propose to implement these HAL entries rather than add an extension.
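The HAL entries quoted above operate on flat, row-major buffers with explicit row steps. A minimal Python sketch of that contract (steps given in elements for readability; the real `cv_hal_gemm32f` steps are in bytes, and the transpose `flags` are omitted here):

```python
def hal_gemm32f_ref(src1, src1_step, src2, src2_step, alpha,
                    src3, src3_step, beta, dst, dst_step, m, n, k):
    """Reference for the HAL contract D = alpha*A*B + beta*C on flat,
    row-major buffers. Steps are in elements here, not bytes, and the
    CV_HAL_GEMM_*_T transpose flags are not modeled."""
    for i in range(m):           # rows of A and D
        for j in range(k):       # columns of B and D
            acc = 0.0
            for p in range(n):   # columns of A == rows of B
                acc += src1[i * src1_step + p] * src2[p * src2_step + j]
            d = alpha * acc
            if src3 is not None:
                d += beta * src3[i * src3_step + j]
            dst[i * dst_step + j] = d

A = [1.0, 2.0, 3.0, 4.0]   # 2x2 matrix A, row major
B = [5.0, 6.0, 7.0, 8.0]   # 2x2 matrix B, row major
D = [0.0] * 4
hal_gemm32f_ref(A, 2, B, 2, 1.0, None, 0, 0.0, D, 2, 2, 2, 2)
# D == [19.0, 22.0, 43.0, 50.0], i.e. A*B
```

A FastCV backend would fill in `hal_ni_gemm32f` with this semantics and return `CV_HAL_ERROR_OK`, letting `cv::gemm` dispatch to it transparently.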

Comment on lines +17 to +38
/**
* @brief Creates one multi-channel mat out of several single-channel CV_8U mats.
* Optimized for Qualcomm's processors
* @param mv Input vector of matrices to be merged; all matrices in mv must be of type CV_8UC1 and have the same size.
* Note: the number of mats can be 2, 3, or 4.
* @param dst Output array of depth CV_8U with the same size as mv[0]; the number of channels
* equals the number of matrices in the input vector.
*/
CV_EXPORTS_W void merge(InputArrayOfArrays mv, OutputArray dst);

//! @}

//! @addtogroup fastcv
//! @{

/**
* @brief Splits a CV_8U multi-channel mat into several CV_8UC1 mats
* Optimized for Qualcomm's processors
* @param src Input mat of depth CV_8U with 2, 3, or 4 channels
* @param mv Output vector of src.channels() CV_8UC1 mats
*/
CV_EXPORTS_W void split(InputArray src, OutputArrayOfArrays mv);
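The two declarations above are inverses: merge interleaves per-channel planes into one multi-channel image, and split de-interleaves it back. A small Python sketch of that semantics (illustrative only; `merge_ref`/`split_ref` are names invented for this example, not FastCV functions):

```python
def merge_ref(mv):
    """Interleave 2-4 single-channel planes (lists of lists) into one
    multi-channel image: dst[y][x] = (mv[0][y][x], mv[1][y][x], ...).
    Illustrates the fastcv::merge contract."""
    assert 2 <= len(mv) <= 4  # FastCV merge supports 2, 3, or 4 mats
    rows, cols = len(mv[0]), len(mv[0][0])
    return [[tuple(ch[y][x] for ch in mv) for x in range(cols)]
            for y in range(rows)]

def split_ref(src):
    """Inverse of merge_ref: de-interleave a multi-channel image into
    one single-channel plane per channel."""
    rows, cols = len(src), len(src[0])
    nch = len(src[0][0])
    return [[[src[y][x][c] for x in range(cols)] for y in range(rows)]
            for c in range(nch)]

planes = [[[10, 20]], [[30, 40]], [[50, 60]]]  # three 1x2 planes
img = merge_ref(planes)                        # 1x2 image, 3 channels
# img == [[(10, 30, 50), (20, 40, 60)]]
# split_ref(img) recovers the original planes
```

As with gemm, OpenCV's HAL already has `cv_hal_merge8u`/`cv_hal_split8u` hooks with this semantics, which is the basis of the reviewer's question below.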
Contributor
The same question about the HAL applies here.

Contributor (Author)

We are still tuning these optimizations to achieve consistent performance across various targets, so for now we have added them as an extension.

@asmorkalov self-assigned this Mar 5, 2025