ViTI - Vision Transformer Inference Server

ViTI is a high-performance, production-grade inference server for Vision Transformer (ViT) models and other state-of-the-art architectures from the timm library. Built with FastAPI, ViTI supports dynamic model loading, concurrent image processing, and an enterprise-level cost calculation system based on model complexity, image count, and processing time.

Key Features

  • Dynamic Model Inference: Load and run any vision model from the timm library.
  • Concurrent Image Processing: Process multiple images simultaneously, taking full advantage of your CPU cores.
  • Enterprise-Grade Cost Calculation: Scalable pricing based on model size, number of images, and processing time.
  • FastAPI: A modern, fast (high-performance) web framework for building APIs with Python 3.7+.
  • Loguru Logging: For robust, customizable, and readable logging throughout the inference process.
  • Pydantic: Ensures fast and accurate data validation, guaranteeing reliability in production environments.

Table of Contents

  • Quick Start
  • Installation
  • API Endpoints
  • Cost Calculation
  • Docker Support
  • License
  • Contributing

Quick Start

Follow the steps below to get started with ViTI.

Prerequisites

  • Python 3.7+
  • pip for managing Python packages
  • git to clone the repository

Clone the Repository

git clone https://github.com/The-Swarm-Corporation/ViTI.git
cd ViTI

Install Dependencies

pip install -r requirements.txt

Run the API Server

uvicorn main:app --reload --host 0.0.0.0 --port 8000

The server will now be running at http://localhost:8000.

Example cURL Request

Single image inference using resnet50:

curl -X POST "http://localhost:8000/v1/vision/inference" \
    -H "Content-Type: application/json" \
    -d '{
          "image_base64": ["iVBORw0KGgoAAAANSUhEUgAAAGQAAABkCAIAAAD/gAIDAAABBklEQVR4nO3cwQnDMBQFQcukuaSX+Jge7GOKSX1uwQsCE5gp4CGW
"],
          "model_name": "resnet50"
        }'

Multiple image inference:

curl -X POST "http://localhost:8000/v1/vision/inference/multiple" \
    -H "Content-Type: application/json" \
    -d '{
          "image_base64": ["<your_base64_encoded_image_1>", "<your_base64_encoded_image_2>"],
          "model_name": "efficientnet_b0"
        }'
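
The same requests can be issued from Python. Below is a minimal client sketch using the requests library; the endpoint path, payload fields, and model name follow the cURL examples above, while the image path example.jpg is a placeholder.

import base64

import requests

# Read a local image and base64-encode it, as the API expects.
with open("example.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

# Single-image inference against the endpoint shown above.
response = requests.post(
    "http://localhost:8000/v1/vision/inference",
    json={"image_base64": [image_b64], "model_name": "resnet50"},
)
response.raise_for_status()
result = response.json()
print(result["top_5_classes"], result["cost"])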

Installation

You can install ViTI via pip using the provided requirements.txt or manually by installing the following:

  • FastAPI: The core web framework.
  • Uvicorn: ASGI server for high-performance API deployment.
  • Torch: For deep learning model inference.
  • Timm: Access to hundreds of pre-trained vision models.
  • Loguru: For logging.

pip3 install -r requirements.txt

API Endpoints

Health Check

  • Endpoint: /v1/health
  • Method: GET
  • Description: Check the health of the API server.
  • Response:
    {
      "status": "ok",
      "time": "2024-09-27 10:30:00"
    }

List Available Models

  • Endpoint: /v1/models
  • Method: GET
  • Description: Get a list of available models from the timm library.
  • Response:
    [
      "resnet50",
      "efficientnet_b0",
      "mobilenetv2_100"
    ]
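
The model names correspond to identifiers in the timm library. As a rough sketch, such a list can be produced with timm.list_models(); the filter pattern shown here is purely illustrative and not necessarily what the server applies.

import timm

# All model names known to timm; an optional glob pattern narrows the results.
all_models = timm.list_models()
resnets = timm.list_models("resnet*")
print(len(all_models), resnets[:5])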

Inference on a Single Image

  • Endpoint: /v1/vision/inference
  • Method: POST
  • Description: Run inference on a single image.
  • Request Body:
    {
      "image_base64": ["<base64_encoded_image>"],
      "model_name": "resnet50"
    }
  • Response:
    {
      "logits": [[0.45, 0.35, 0.1, 0.05, 0.05]],
      "top_5_classes": [["class_23", "class_34", "class_5", "class_101", "class_56"]],
      "cost": 0.02
    }

Inference on Multiple Images

  • Endpoint: /v1/vision/inference/multiple
  • Method: POST
  • Description: Run inference on multiple images concurrently.
  • Request Body:
    {
      "image_base64": ["<base64_encoded_image_1>", "<base64_encoded_image_2>"],
      "model_name": "efficientnet_b0"
    }
  • Response:
    {
      "logits": [[0.45, 0.35, 0.1], [0.55, 0.15, 0.12]],
      "top_5_classes": [["class_23", "class_34", "class_5"], ["class_12", "class_56", "class_78"]],
      "cost": 0.03
    }
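
For reference, the sketch below shows roughly how an endpoint of this kind can be built with FastAPI, Pydantic, timm, and Torch. It is illustrative only, not the server's actual implementation; names such as InferenceRequest are assumptions, and only the first image is processed for brevity.

import base64
import io
from typing import List

import timm
import torch
from fastapi import FastAPI
from PIL import Image
from pydantic import BaseModel

app = FastAPI()

class InferenceRequest(BaseModel):
    image_base64: List[str]
    model_name: str

@app.post("/v1/vision/inference")
def run_inference(request: InferenceRequest):
    # Load the requested timm model with pretrained weights.
    model = timm.create_model(request.model_name, pretrained=True)
    model.eval()

    # Build the model's default preprocessing pipeline.
    config = timm.data.resolve_data_config({}, model=model)
    transform = timm.data.create_transform(**config)

    # Decode the first base64 image and run a forward pass.
    image_bytes = base64.b64decode(request.image_base64[0])
    image = Image.open(io.BytesIO(image_bytes)).convert("RGB")
    with torch.no_grad():
        logits = model(transform(image).unsqueeze(0))

    top5 = torch.topk(logits, k=5, dim=1).indices[0].tolist()
    return {
        "logits": logits.tolist(),
        "top_5_classes": [[f"class_{i}" for i in top5]],
    }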

Cost Calculation

ViTI implements a dynamic pricing model based on:

  • Number of Images: Each additional image increases the cost.
  • Model Size: Models with more parameters incur higher costs.
  • Processing Time: Longer inference times increase the cost.

Cost Formula:

cost = base_cost + (image_count * image_cost) + (processing_time * time_cost) + (log(model_size) * model_size_factor)

Where:

  • base_cost: A fixed base cost ($0.01).
  • image_cost: Cost per image ($0.01).
  • time_cost: $0.01 per second of inference time.
  • model_size_factor: A scaling factor ($0.0001) applied to log(model size).
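
A small Python sketch of this formula using the constants above (the function and parameter names are illustrative, and a natural logarithm is assumed):

import math

def calculate_cost(image_count, processing_time, model_size,
                   base_cost=0.01, image_cost=0.01, time_cost=0.01,
                   model_size_factor=0.0001):
    # base + per-image cost + per-second cost + log-scaled model-size cost
    return (base_cost
            + image_count * image_cost
            + processing_time * time_cost
            + math.log(model_size) * model_size_factor)

# Matches the worked example below: 2 images, ~5.3M parameters, 2 seconds.
print(round(calculate_cost(2, 2, 5288548), 4))  # ≈ 0.0515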

Example Cost Calculation

For 2 images using efficientnet_b0:

  • Number of images: 2
  • Model size: 5.3M parameters
  • Processing time: 2 seconds

cost = 0.01 + (2 * 0.01) + (2 * 0.01) + (log(5288548) * 0.0001)
cost ≈ 0.05

Docker Support

You can run ViTI using Docker to easily deploy the inference server in a containerized environment.

Dockerfile

# Base image
FROM python:3.9-slim

# Set environment variables
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1

# Install dependencies
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY . .

# Expose port and run the server
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "4"]

Build and Run the Docker Image

docker build -t viti-inference .
docker run -d -p 8000:8000 viti-inference
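
Once the container is up, a quick way to confirm the server is reachable is to call the health endpoint (shown here with Python's requests; the port matches the docker run mapping above):

import requests

# Query the health-check endpoint exposed by the container.
print(requests.get("http://localhost:8000/v1/health").json())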

License

ViTI is released under the MIT License.

Contributing

We welcome contributions from the community! Please read our contributing guide for more information.

