-
awesome-cuda-triton-hpc Public
🔥🔥🔥 A collection of some awesome public CUDA, cuBLAS, cuDNN, CUTLASS, TensorRT, TensorRT-LLM, Triton, TVM, MLIR and High Performance Computing (HPC) projects.
-
awesome-llm-and-aigc Public
🚀🚀🚀 A collection of some awesome public projects about Large Language Models (LLM), Vision Language Models (VLM), Vision Language Action (VLA), AI Generated Content (AIGC), the related Datasets and Applica…
-
DeepGEMM Public
Forked from deepseek-ai/DeepGEMM
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
-
FlashMLA Public
Forked from deepseek-ai/FlashMLA
FlashMLA: Efficient MLA Decoding Kernel for Hopper GPUs
-
awesome-deepseek-integration Public
Forked from deepseek-ai/awesome-deepseek-integration
-
🚀🚀🚀 A collection of some awesome public YOLO object detection series projects and the related object detection datasets.
-
VLM-R1 Public
Forked from om-ai-lab/VLM-R1
Solve Visual Understanding with Reinforced VLMs
-
X-AnyLabeling Public
Forked from CVHub520/X-AnyLabeling
Effortless data labeling with AI support from Segment Anything and other awesome models.
-
edgeyolo Public
Forked from LSH9832/edgeyolo
An anchor-free, real-time object detector for edge devices with decent performance
-
TensorRT-Model-Optimizer Public
Forked from NVIDIA/TensorRT-Model-Optimizer
TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques such as quantization, pruning, distillation, etc. It compresses deep learning models for downstream d…
-
NuMojo Public
Forked from Mojo-Numerics-and-Algorithms-group/NuMojo
NuMojo is a library for numerical computing in Mojo 🔥 similar to NumPy in Python.
-
OpenSeek Public
Forked from FlagAI-Open/OpenSeek
OpenSeek aims to unite the global open source community to drive collaborative innovation in algorithms, data and systems to develop next-generation models that surpass DeepSeek.
Updated Feb 14, 2025
-
ktransformers Public
Forked from kvcache-ai/ktransformers
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
-
minimind Public
Forked from jingyaogong/minimind
🚀🚀 [LLM] Train a small 26M-parameter GPT completely from scratch in just 2 hours! 🌏
-
deepscaler Public
Forked from agentica-project/deepscaler
Democratizing Reinforcement Learning for LLMs
-
minimind-v Public
Forked from jingyaogong/minimind-v
🚀 [LLM] Train a 27M-parameter visual multimodal VLM from scratch in just 3 hours! 🌏
-
unsloth Public
Forked from unslothai/unsloth
Finetune Llama 3.3, DeepSeek-R1 & Reasoning LLMs 2x faster with 70% less memory
-
TensorRT-YOLO Public
Forked from laugh12321/TensorRT-YOLO
🚀 Easier & Faster YOLO Deployment Toolkit for NVIDIA 🛠️
-
CUDA-Learn-Notes Public
Forked from DefTruth/CUDA-Learn-Notes
📚200+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).
-
maestro Public
Forked from roboflow/maestro
Streamline the fine-tuning process for multimodal models: PaliGemma, Florence-2, and Qwen2-VL
-
ultralyticsPro Public
Forked from iscyy/ultralyticsPro
🔥🔥🔥 Focused on improved models for YOLO11, YOLOv8, YOLOv10, RT-DETR, YOLOv7, and YOLOv5; supports improving the backbone, neck, head, loss, IoU, NMS, and other modules 🚀
-
LLaMA-Factory Public
Forked from hiyouga/LLaMA-Factory
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
-
R1-V Public
Forked from Deep-Agent/R1-V
Witness the aha moment of VLM with less than $3.
-
a-hamdi-cuda Public
Forked from a-hamdi/GPU
100 days of building CUDA kernels!
-
TinyZero Public
Forked from Jiayi-Pan/TinyZero
Clean, accessible reproduction of DeepSeek R1-Zero
-
llama-cpp-python Public
Forked from abetlen/llama-cpp-python
Python bindings for llama.cpp
-
open-r1 Public
Forked from huggingface/open-r1
Fully open reproduction of DeepSeek-R1
-
tilelang Public
Forked from tile-ai/tilelang
Domain-specific language designed to streamline the development of high-performance GPU/CPU kernels