FastCV extension 3rd Post #3891
base: 4.x
Conversation
/**
 * @brief Matrix multiplication of two float type matrices.
 * R = a*A*B + b*C, where A, B, C, R are matrices and a, b are constants.
 * It is optimized for Qualcomm's processors.
 * @param src1 First source matrix of type CV_32F
 * @param src2 Second source matrix of type CV_32F, with as many rows as src1 has columns
 * @param dst Resulting matrix of type CV_32F
 * @param alpha Multiplying factor for the product src1*src2
 * @param src3 Optional third matrix of type CV_32F to be added to the matrix product
 * @param beta Multiplying factor for src3
 */
CV_EXPORTS_W void gemm(InputArray src1, InputArray src2, OutputArray dst, float alpha = 1.0,
                       InputArray src3 = noArray(), float beta = 0.0);
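For reference, a minimal usage sketch of the proposed API. It assumes the extension lives in the cv::fastcv namespace and is exposed through an opencv2/fastcv.hpp umbrella header; only the gemm signature itself comes from the declaration above.

#include <opencv2/core.hpp>
#include <opencv2/fastcv.hpp>   // assumed umbrella header of the extension module

int main()
{
    cv::Mat A(64, 32, CV_32F), B(32, 48, CV_32F), C(64, 48, CV_32F);
    cv::randu(A, 0.0, 1.0);
    cv::randu(B, 0.0, 1.0);
    cv::randu(C, 0.0, 1.0);

    // R = alpha*A*B + beta*C, i.e. R = a*A*B + b*C from the docstring
    cv::Mat R;
    cv::fastcv::gemm(A, B, R, 0.5f, C, 2.0f);
    return 0;
}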
OpenCV HAL has GEMM options:
/**
The function performs generalized matrix multiplication similar to the gemm functions in BLAS level 3:
\f$D = \alpha*AB+\beta*C\f$
@param src1 pointer to input \f$M\times N\f$ matrix \f$A\f$ or \f$A^T\f$ stored in row major order.
@param src1_step number of bytes between two consequent rows of matrix \f$A\f$ or \f$A^T\f$.
@param src2 pointer to input \f$N\times K\f$ matrix \f$B\f$ or \f$B^T\f$ stored in row major order.
@param src2_step number of bytes between two consequent rows of matrix \f$B\f$ or \f$B^T\f$.
@param alpha \f$\alpha\f$ multiplier before \f$AB\f$
@param src3 pointer to input \f$M\times K\f$ matrix \f$C\f$ or \f$C^T\f$ stored in row major order.
@param src3_step number of bytes between two consequent rows of matrix \f$C\f$ or \f$C^T\f$.
@param beta \f$\beta\f$ multiplier before \f$C\f$
@param dst pointer to input \f$M\times K\f$ matrix \f$D\f$ stored in row major order.
@param dst_step number of bytes between two consequent rows of matrix \f$D\f$.
@param m number of rows in matrix \f$A\f$ or \f$A^T\f$, equals to number of rows in matrix \f$D\f$
@param n number of columns in matrix \f$A\f$ or \f$A^T\f$
@param k number of columns in matrix \f$B\f$ or \f$B^T\f$, equals to number of columns in matrix \f$D\f$
@param flags algorithm options (combination of CV_HAL_GEMM_1_T, ...).
*/
//! @addtogroup core_hal_interface_matrix_multiplication Matrix multiplication
//! @{
inline int hal_ni_gemm32f(const float* src1, size_t src1_step, const float* src2, size_t src2_step,
float alpha, const float* src3, size_t src3_step, float beta, float* dst, size_t dst_step,
int m, int n, int k, int flags) { return CV_HAL_ERROR_NOT_IMPLEMENTED; }
inline int hal_ni_gemm64f(const double* src1, size_t src1_step, const double* src2, size_t src2_step,
double alpha, const double* src3, size_t src3_step, double beta, double* dst, size_t dst_step,
int m, int n, int k, int flags) { return CV_HAL_ERROR_NOT_IMPLEMENTED; }
inline int hal_ni_gemm32fc(const float* src1, size_t src1_step, const float* src2, size_t src2_step,
float alpha, const float* src3, size_t src3_step, float beta, float* dst, size_t dst_step,
int m, int n, int k, int flags) { return CV_HAL_ERROR_NOT_IMPLEMENTED; }
inline int hal_ni_gemm64fc(const double* src1, size_t src1_step, const double* src2, size_t src2_step,
double alpha, const double* src3, size_t src3_step, double beta, double* dst, size_t dst_step,
int m, int n, int k, int flags) { return CV_HAL_ERROR_NOT_IMPLEMENTED; }
//! @}
I propose to implement the HAL interface rather than adding an extension.
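For comparison, a rough sketch of how a FastCV-backed kernel could be plugged in at the HAL level instead. The signature and error codes come from the HAL header quoted above; the function body, the fastcv_gemm32f_impl wrapper, and the macro override are hypothetical and only illustrate the usual custom-HAL wiring.

#include <opencv2/core/hal/interface.h>

int fastcv_hal_gemm32f(const float* src1, size_t src1_step,
                       const float* src2, size_t src2_step,
                       float alpha, const float* src3, size_t src3_step,
                       float beta, float* dst, size_t dst_step,
                       int m, int n, int k, int flags)
{
    // Decline flag combinations (e.g. transposed inputs) that the accelerated
    // kernel does not support; OpenCV then falls back to the generic path.
    if (flags != 0)
        return CV_HAL_ERROR_NOT_IMPLEMENTED;

    // Call the FastCV kernel here (hypothetical wrapper):
    // return fastcv_gemm32f_impl(src1, src1_step, src2, src2_step, alpha,
    //                            src3, src3_step, beta, dst, dst_step, m, n, k);
    return CV_HAL_ERROR_NOT_IMPLEMENTED;
}

// In the custom HAL header the default no-op is then replaced, e.g.:
// #undef  cv_hal_gemm32f
// #define cv_hal_gemm32f fastcv_hal_gemm32f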
/**
 * @brief Creates one multi-channel mat out of several single-channel CV_8U mats.
 * Optimized for Qualcomm's processors.
 * @param mv Input vector of matrices to be merged; all matrices in mv must be CV_8UC1 and have the same size.
 * Note: the number of mats can be 2, 3 or 4.
 * @param dst Output array of depth CV_8U and the same size as mv[0]; the number of channels
 * equals the number of matrices in the input vector.
 */
CV_EXPORTS_W void merge(InputArrayOfArrays mv, OutputArray dst);
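A short usage sketch of the proposed merge, under the same assumptions as above (cv::fastcv namespace, opencv2/fastcv.hpp header):

#include <opencv2/core.hpp>
#include <opencv2/fastcv.hpp>   // assumed umbrella header of the extension module
#include <vector>

int main()
{
    // Three CV_8UC1 planes of the same size (2, 3 or 4 planes are allowed).
    std::vector<cv::Mat> planes(3, cv::Mat(480, 640, CV_8UC1, cv::Scalar(0)));
    cv::Mat merged;
    cv::fastcv::merge(planes, merged);   // merged is CV_8UC3, same size as the planes
    return 0;
}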
/**
 * @brief Splits a CV_8U multi-channel mat into several CV_8UC1 mats.
 * Optimized for Qualcomm's processors.
 * @param src Input 2, 3 or 4 channel mat of depth CV_8U
 * @param mv Output vector of src.channels() CV_8UC1 mats
 */
CV_EXPORTS_W void split(InputArray src, OutputArrayOfArrays mv);
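And the corresponding split usage sketch, again assuming the cv::fastcv namespace and opencv2/fastcv.hpp header:

#include <opencv2/core.hpp>
#include <opencv2/fastcv.hpp>   // assumed umbrella header of the extension module
#include <vector>

int main()
{
    cv::Mat bgr(480, 640, CV_8UC3, cv::Scalar(10, 20, 30));
    std::vector<cv::Mat> channels;
    cv::fastcv::split(bgr, channels);    // yields 3 CV_8UC1 mats, one per channel
    return 0;
}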
The same question on HAL.
We are still tuning these optimizations to achieve consistent performance across various targets, so for now we have added them as an extension.
This PR adds FastCV extensions for the merge, split and gemm APIs, and for the arithmetic APIs add and subtract.
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.