This repository provides a Triton Inference Server setup for face detection and face recognition using ONNX models with dynamic batching.
- Face Detection: RetinaFace-MobileNetV2 (Dynamic Batch, ONNX)
- Face Recognition: MobileNetV2 (Dynamic Batch, ONNX)
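Once the server is running, the models can be exercised directly with the Triton Python client. The sketch below is a minimal example, not part of this repository: the model name, tensor names, and 640×640 NCHW input shape are assumptions and should be adjusted to match the actual `config.pbtxt` files in the model repository.

```python
# Minimal Triton HTTP client sketch (pip install tritonclient[http] numpy).
# Model name ("face_detection"), tensor names ("input", "boxes", "scores"),
# and input shape are placeholders; check config.pbtxt for the real values.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Fake batch of two 640x640 RGB images in NCHW float32 (assumed preprocessing).
batch = np.random.rand(2, 3, 640, 640).astype(np.float32)

infer_input = httpclient.InferInput("input", list(batch.shape), "FP32")
infer_input.set_data_from_numpy(batch)

result = client.infer(model_name="face_detection", inputs=[infer_input])
print(result.as_numpy("boxes").shape, result.as_numpy("scores").shape)
```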
Build the Triton server image and run it with GPU support, exposing the HTTP (8000), gRPC (8001), and metrics (8002) ports:

```bash
docker build -t tritonserver .
docker run --gpus all --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 tritonserver:latest
```

Alternatively, start and stop the stack with Docker Compose:

```bash
docker compose up --build -d
docker compose down
```
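Before wiring anything to the server, it can help to confirm that it is live and the models have loaded. Below is a small sketch using the official Triton Python client; the model names are assumptions and should match the directory names in the model repository.

```python
# Readiness check against the Triton HTTP endpoint (pip install tritonclient[http]).
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")
print("server ready:", client.is_server_ready())

# Model names below are placeholders; use the folder names from the model repository.
for model in ("face_detection", "face_recognition"):
    print(model, "ready:", client.is_model_ready(model))
```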
The repository also includes a FastAPI application to visualize and test the API endpoints. Start it with:

```bash
uvicorn api.face_api:app --host 0.0.0.0 --port 8000 --reload
```

Open your browser and navigate to http://localhost:8000/docs to access the FastAPI Swagger UI, where you can test the available endpoints for face detection and recognition.
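The endpoints can also be called from a script instead of the Swagger UI. The route (`/detect`) and form field (`file`) below are placeholders, not the repository's actual API; use the routes listed at /docs.

```python
# Hypothetical call to a face-detection endpoint with an image upload.
# The route "/detect" and field "file" are assumptions; see /docs for the real ones.
import requests

with open("sample.jpg", "rb") as f:
    response = requests.post(
        "http://localhost:8000/detect",
        files={"file": ("sample.jpg", f, "image/jpeg")},
        timeout=30,
    )

response.raise_for_status()
print(response.json())
```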
To improve inference performance with TensorRT, enable GPU acceleration by adding the following `optimization` block to the model's `config.pbtxt`:

```protobuf
optimization {
  execution_accelerators {
    gpu_execution_accelerator : [
      {
        name : "tensorrt"
        parameters { key: "precision_mode" value: "FP16" }  # Run inference in FP16 for better performance
      }
    ]
  }
}
```
✅ Upcoming Enhancements:
- Integrate TensorRT optimizations
- Improve model inference speed
Feel free to open an issue or submit a PR if you have improvements or suggestions! 🚀
This project is open-source and available under the MIT License.