WanliZhong
  • OpenCV China
  • SUSTech
  • 03:46 - 8h ahead

Organizations

@opencv @SUSTown

📖 A curated list of Awesome LLM/VLM Inference Papers with code: WINT8/4, Flash-Attention, Paged-Attention, Parallelism, etc. 🎉🎉

3,504 240 Updated Feb 24, 2025

📚200+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).

Cuda 2,457 255 Updated Feb 24, 2025

Analyze the inference of Large Language Models (LLMs). Analyze aspects like computation, storage, transmission, and hardware roofline model in a user-friendly interface.

Python 397 47 Updated Sep 11, 2024

Private Cloud Compute (PCC)

Swift 783 68 Updated Oct 24, 2024

Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.

Cuda 980 61 Updated Feb 15, 2025

Development repository for the Triton language and compiler

MLIR 14,603 1,815 Updated Feb 25, 2025

This repository contains integer operators on GPUs for PyTorch.

Python 190 50 Updated Sep 29, 2023

[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models

Python 1,346 161 Updated Jul 12, 2024

Fast and memory-efficient exact attention

Python 15,889 1,496 Updated Feb 25, 2025

✨ Light and Fast AI Assistant. Support: Web | iOS | MacOS | Android | Linux | Windows

TypeScript 81,299 61,036 Updated Feb 24, 2025

MMdnn is a set of tools to help users inter-operate among different deep learning frameworks. E.g. model conversion and visualization. Convert models between Caffe, Keras, MXNet, Tensorflow, CNTK, …

Python 5,803 966 Updated May 29, 2024

The collection of pre-trained, state-of-the-art AI models for ailia SDK

Python 2,134 339 Updated Feb 23, 2025

System for AI Education Resource.

Python 3,857 480 Updated Oct 25, 2024

An attempt to reproduce MediaPipe with OpenCV_lite and MNN.

C++ 3 Updated Jun 16, 2024

Convert Caffe models to ONNX.

Python 59 18 Updated Nov 8, 2021

Implementation of paper - YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information

Python 9,163 1,487 Updated Aug 9, 2024

Step-by-step GEMM optimization tutorial on OpenCL GPU platforms

C 5 Updated Apr 25, 2024

MLX: An array framework for Apple silicon

C++ 19,280 1,096 Updated Feb 25, 2025

LLM inference in C/C++

C++ 75,275 10,879 Updated Feb 25, 2025

The programming language Ficus

C 72 9 Updated Nov 8, 2024

A fast reverse proxy to help you expose a local server behind a NAT or firewall to the internet.

Go 90,820 13,743 Updated Feb 12, 2025

MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba. Full multimodal LLM Android App:[MNN-LLM-Android](./apps/Android/MnnLlmChat/READ…

C++ 9,776 1,755 Updated Feb 25, 2025

Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and…

Python 46,745 8,027 Updated Feb 25, 2025

Official PyPI project for libfacedetection

Python 4 Updated Jun 4, 2023

The interactive graphing library for Python ✨

Python 16,775 2,600 Updated Feb 24, 2025

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production

Rust 9,411 847 Updated Feb 16, 2025

CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image

Jupyter Notebook 27,537 3,456 Updated Jul 23, 2024

Hackable and optimized Transformers building blocks, supporting a composable construction.

Python 9,099 648 Updated Feb 20, 2025