Releases · mindspore-lab/mindone

We are excited to announce the official release of MindONE, a state-of-the-art repository dedicated to multi-modal understanding and content generation. Built on MindSpore 2.3.1 and optimized for Ascend NPUs, MindONE provides a comprehensive suite of algorithms and models designed to facilitate advanced content generation across various modalities, including images, audio, videos, and even 3D objects.

Key Features

diffusers support on MindSpore

mindone.diffusers aims to keep the same interface and usage as huggingface/diffusers.
Only the necessary changes were made to huggingface/diffusers, so the switch is seamless for users coming from torch.

+ import mindspore
- from diffusers import DiffusionPipeline
+ from mindone.diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
-    torch_dtype=torch.float16,
+    mindspore_dtype=mindspore.float16,
    use_safetensors=True
)

prompt = "An astronaut riding a green horse"

images = pipe(prompt=prompt)[0][0]

Important

Because huggingface/diffusers is still under active development,
many features are not yet well supported.
Currently, most functions of huggingface/diffusers v0.29.x are supported.
For details, see MindOne Diffusers.
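
With the diff in this section applied, the MindSpore-only version of the script might look like the sketch below. It assumes that mindspore and mindone are installed and that the model weights can be downloaded. The wrapper function `generate_image` and its parameter names are illustrative and not part of MindONE; the imports are placed inside the function so the file can still be loaded on a machine without MindSpore.

```python
def generate_image(prompt, model_id="stabilityai/stable-diffusion-xl-base-1.0"):
    """Generate one image with mindone.diffusers (illustrative wrapper, not a MindONE API)."""
    # Lazy imports: only needed when the function is actually called.
    import mindspore
    from mindone.diffusers import DiffusionPipeline

    pipe = DiffusionPipeline.from_pretrained(
        model_id,
        mindspore_dtype=mindspore.float16,  # replaces torch_dtype=torch.float16
        use_safetensors=True,
    )
    # Same indexing as the snippet in this section: the first element of the
    # pipeline output holds the generated images, and [0] picks the first one.
    return pipe(prompt=prompt)[0][0]
```

Calling `generate_image("An astronaut riding a green horse")` on an Ascend machine should then return the first generated image, under the assumptions stated above.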

MindSpore patch for transformers

This MindSpore patch for huggingface/Transformers lets researchers and developers
working on text-to-image (t2i) and text-to-video (t2v) generation use pretrained text and image models
from huggingface/Transformers on MindSpore.
Only the Ascend-related modules are modified; all other modules are reused from huggingface/Transformers.

The following example shows how to download and use a pretrained model. Note that the model comes from mindone.transformers, while everything else comes from huggingface/Transformers.

from mindspore import Tensor
# use tokenizer from huggingface/Transformers
from transformers import AutoTokenizer
# use model from mindone.transformers
-from transformers import CLIPTextModel
+from mindone.transformers import CLIPTextModel

model = CLIPTextModel.from_pretrained("openai/clip-vit-base-patch32")
tokenizer = AutoTokenizer.from_pretrained("openai/clip-vit-base-patch32")

inputs = tokenizer(
    ["a photo of a cat", "a photo of a dog"],
    padding=True,
-    return_tensors="pt",
+    return_tensors="np"
)
-outputs = model(**inputs)
+outputs = model(Tensor(inputs.input_ids))

For details, see MindOne Transformers.
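
Put together without the diff markers, the example in this section could be wrapped as shown below. This is a minimal sketch, assuming mindspore, mindone and transformers are installed and the checkpoint can be downloaded. The helper name `encode_prompts` is hypothetical and is used only for illustration.

```python
def encode_prompts(prompts, model_id="openai/clip-vit-base-patch32"):
    """Encode text prompts with a CLIP text model on MindSpore (illustrative helper)."""
    # Lazy imports so the module loads even where MindSpore is not installed.
    from mindspore import Tensor
    from transformers import AutoTokenizer          # tokenizer from huggingface/Transformers
    from mindone.transformers import CLIPTextModel  # model from mindone.transformers

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = CLIPTextModel.from_pretrained(model_id)

    # Ask the tokenizer for NumPy arrays instead of torch tensors ...
    inputs = tokenizer(prompts, padding=True, return_tensors="np")
    # ... and wrap the token ids in a MindSpore Tensor before calling the model.
    return model(Tensor(inputs.input_ids))
```

For instance, `encode_prompts(["a photo of a cat", "a photo of a dog"])` would run the same two prompts as the example in this section.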

State-of-the-Art generative models

MindONE showcases various state-of-the-art generative models as examples, ensuring efficient training performance on Ascend NPUs, including:

model	features
hpcai open sora	support v1.0/1.1/1.2 large scale training with dp/sp/zero
open sora plan	support v1.0/1.1/1.2 large scale training with dp/sp/zero
stable diffusion	support sd 1.5/2.0/2.1, vanilla fine tune, lora, dreambooth, textual inversion
stable diffusion xl	support sai style (Stability AI) vanilla fine tune, lora, dreambooth
dit	support text to image fine tune
hunyuan_dit	support text to image fine tune
pixart_sigma	support text to image fine tune at different aspect ratios
latte	support unconditional text to image fine tune
animate diff	support motion module and lora training
dynamicrafter	support image to video generation

MindONE 0.2.0
