This repo aims to provide a set of best practices for building AI infrastructure on consumer-grade hardware. The target audience is AI engineers who want to build their own AI infrastructure.
In this repo, we use Ubuntu 24.04 as the base system. The hardware is Amazon AWS G5 instances powered by NVIDIA A10G GPUs.
Specifically, we choose the AWS g5.2xlarge instance type. The NVIDIA A10G is built on the same Ampere silicon family as consumer GeForce RTX 30-series cards and ships with 24 GB of memory, which makes it a reasonable stand-in for consumer-grade AI hardware on Amazon AWS.
- Prepare the CUDA environment on AWS G5 instances under Ubuntu 24.04
- Build llama.cpp with CUDA
- Set up Ollama
- LLM chatbot demo - hard-chat
- Build your own Copilot
- Speech-to-text and translation with the Whisper model
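To make the first two steps concrete, here is a minimal sketch of preparing CUDA and building llama.cpp with the CUDA backend on a fresh G5 instance. It assumes Ubuntu's packaged CUDA toolkit is sufficient and that the build tree uses the current `GGML_CUDA` CMake option (older llama.cpp trees used `LLAMA_CUBLAS`); adjust for your driver and toolkit versions.

```shell
# Confirm the A10G is visible to the NVIDIA driver.
nvidia-smi

# Install build tools and the CUDA toolkit from Ubuntu's repositories.
# (For newer CUDA releases, use NVIDIA's official apt repository instead.)
sudo apt-get update
sudo apt-get install -y build-essential cmake git nvidia-cuda-toolkit

# Build llama.cpp with CUDA enabled.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j"$(nproc)"
```

The resulting binaries land under `build/bin/`; if `nvidia-smi` fails, install the driver first and reboot before building.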
| AI/ML Library | Complexity | Description |
|---|---|---|
| PyTorch | High | General-purpose AI/ML framework for building and deploying complex models across many domains. |
| tinygrad | Low | Lightweight AI/ML framework with a deliberately minimal core. |
| llm.c | Low | Pure C/CUDA implementation of LLM training. |
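To make the complexity contrast in the table concrete, the core primitive that all three libraries automate is gradient-based optimization. Below is a toy illustration in plain Python (no framework, gradient written out by hand) fitting `w` in `y = w * x` by gradient descent; the function and data are hypothetical, for illustration only.

```python
# Fit w in y = w * x by minimizing squared error with hand-derived gradients.
# Frameworks like PyTorch and tinygrad compute this gradient automatically
# (autograd); llm.c writes out gradients by hand in C/CUDA, as done here.

def fit_scalar(xs, ys, lr=0.01, steps=200):
    w = 0.0
    for _ in range(steps):
        # d/dw of sum((w*x - y)^2) = sum(2 * (w*x - y) * x)
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys))
        w -= lr * grad
    return w

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # ground truth: y = 2x
print(round(fit_scalar(xs, ys), 3))  # converges to 2.0
```

The trade-off the table captures: PyTorch hides this loop behind a large general-purpose API, tinygrad exposes it through a small one, and llm.c spells it out explicitly in C/CUDA.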