
maestro




Hello

maestro is a tool designed to streamline and accelerate the fine-tuning process for multimodal models. It provides ready-to-use recipes for fine-tuning popular vision-language models (VLMs) such as Florence-2, PaliGemma 2, and Qwen2.5-VL on downstream vision-language tasks.


Quickstart

Install

To get started with maestro, you’ll need to install the dependencies specific to the model you wish to fine-tune.

pip install "maestro[qwen_2_5_vl]"

Note: Some models have clashing dependencies. We recommend creating a separate Python environment for each model to avoid version conflicts.
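
For example, an isolated environment for the Qwen2.5-VL recipe might be set up like this (the environment name is arbitrary):

python -m venv .venv-qwen-2-5-vl
source .venv-qwen-2-5-vl/bin/activate
pip install "maestro[qwen_2_5_vl]"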

CLI

maestro qwen_2_5_vl train \
  --dataset "dataset/location" \
  --epochs 10 \
  --batch-size 4 \
  --optimization_strategy "qlora" \
  --metrics "edit_distance"
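
The flags above mirror the keys of the Python configuration shown in the next section. To check which options your installed version supports, the CLI help output should list them:

maestro qwen_2_5_vl train --help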

Python

from maestro.trainer.models.qwen_2_5_vl.core import train

config = {
    "dataset": "dataset/location",
    "epochs": 10,
    "batch_size": 4,
    "optimization_strategy": "qlora",
    "metrics": ["edit_distance"]
}

train(config)
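
Once training finishes, you may want to run the fine-tuned model. The sketch below uses plain Hugging Face transformers rather than maestro's own API, and assumes the run produced a checkpoint that transformers can load directly; the checkpoint path, image file, and prompt are placeholders — point them at your actual training output and data.

from PIL import Image
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

# Hypothetical path — use the directory where your training run saved its weights.
CHECKPOINT = "training/qwen_2_5_vl/checkpoint"

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(CHECKPOINT, device_map="auto")
processor = AutoProcessor.from_pretrained(CHECKPOINT)

image = Image.open("example.jpg")
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Read the total amount from this receipt."},
    ],
}]

# Build the chat prompt, run generation, and decode only the newly generated tokens.
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128)
answer = processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(answer)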

Contribution

We would love your help in making this repository even better! We are especially looking for contributors with experience in fine-tuning vision-language models (VLMs). If you notice any bugs or have suggestions for improvement, feel free to open an issue or submit a pull request.