
Add florence 2 #560

Closed
wants to merge 32 commits into from

Conversation

MadeWithStone

Description

This PR adds support for Florence-2 in Workflows. The block supports a wide variety of computer vision tasks via prompts and fine-tuned LoRAs, and returns both the raw outputs from the model and parsed supervision detections.

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

How has this change been tested? Please provide a test case or example of how you tested the change.

  • Hosted the inference server on a GPU virtual machine (GCP Compute Engine) and connected to the server using roboflow.com as the client.

@CLAassistant

CLAassistant commented Aug 1, 2024

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ MadeWithStone
❌ roboflowmaxwell
You have signed the CLA already but the status is still pending? Let us recheck it.

@@ -41,8 +41,10 @@ def serialise_sv_detections(detections: sv.Detections) -> dict:
detection_dict[X_KEY] = x1 + detection_dict[WIDTH_KEY] / 2
Collaborator

The nature of this change makes me worry about the integrity of the whole thing after the change.
If you don't have confidence in the output, then it is not really an sv.Detections; the same goes for the class name being an empty string. Those problems must be addressed at the block level.

Collaborator

Ok, just realised that there is this from_lmm(...) constructor for sv.Detections, but still - objects produced by the block will be incompatible with other blocks due to the optional properties.

Author

The issue is that different vision tasks carry different information. I don't really see how this is different from an sv detection for bounding boxes having a box, versus a detection for classification or a detection for segmentation. We already support optional properties in supervision, and multimodal models are bound to produce these incomplete detections, so I don't really see a way around it. I think all blocks should assume incompleteness, and we may consider a separate method that validates supervision detections for the necessary info on the receiving end, rather than on the generating end.
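The "validate on the receiving end" idea above could be sketched like this. This is only an illustration: the helper name `require_detection_fields` and the stand-in `FakeDetections` class are hypothetical, not part of the inference codebase; a real implementation would operate on actual sv.Detections attributes.

```python
# Hypothetical sketch: a consuming block declares which sv.Detections-style
# fields it needs and checks incoming detections for them, instead of the
# producing block guaranteeing completeness.

def require_detection_fields(detections, required):
    """Return the list of required fields missing from a detections-like object.

    An attribute counts as missing when it is absent or None, which matches
    how optional sv.Detections properties (mask, class_id, confidence, ...)
    behave for tasks that do not produce them.
    """
    return [f for f in required if getattr(detections, f, None) is None]


class FakeDetections:
    """Minimal stand-in for an sv.Detections produced by an LMM block."""

    def __init__(self, xyxy=None, class_id=None, confidence=None):
        self.xyxy = xyxy
        self.class_id = class_id
        self.confidence = confidence


# A grounding-style Florence-2 result: boxes and classes, but no confidences.
dets = FakeDetections(xyxy=[[10, 20, 30, 40]], class_id=[0])
missing = require_detection_fields(dets, ["xyxy", "class_id", "confidence"])
print(missing)  # a consuming block could raise or skip based on this list
```

A receiving block could raise a clear validation error when `missing` is non-empty, which keeps incomplete detections usable by blocks that don't need the absent fields.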

]


class BlockManifest(WorkflowBlockManifest):
Collaborator

This must be adjusted to align with main - we got rid of asyncio and introduced versioning patterns. Please apply them (take a look at other blocks; if in doubt, I can help).

Collaborator

Q: What happens when the LMM output does not match expectations - would any error handling be possible, would an error be raised, or would empty output be yielded?

Author

Not sure what you mean by "doesn't match expectations". Microsoft has already handled output validation and error handling, so we can assume their processor returns either valid output data or empty data.
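The "valid or empty" contract described above can be made explicit with a small guard. This is a hedged sketch: the helper name `safe_task_output` is hypothetical, though the task token (`"<OD>"`) and the per-task dict layout (`bboxes`/`labels`) mirror Florence-2's published post-processing output format.

```python
# Illustrative fallback: if the post-processed output for the requested task
# is missing or malformed, yield an empty result instead of raising.

EMPTY_RESULT = {"bboxes": [], "labels": []}


def safe_task_output(processed, task_token):
    """Return the per-task dict from the processor output, or an empty result."""
    value = processed.get(task_token) if isinstance(processed, dict) else None
    if not isinstance(value, dict) or "bboxes" not in value or "labels" not in value:
        return dict(EMPTY_RESULT)
    return value


print(safe_task_output({"<OD>": {"bboxes": [[0, 0, 5, 5]], "labels": ["cat"]}}, "<OD>"))
print(safe_task_output("not a dict", "<OD>"))  # falls back to the empty result
```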

return await self.run_locally(
images=images, vision_task=vision_task, prompt=prompt, model_id=model_id
)
elif self._step_execution_mode is StepExecutionMode.REMOTE:
Collaborator

I am not up to date with the new models that are to be supported on the platform. In private, please provide me with info about the models to be hosted on the platform and how we want to host them - in particular: are we going to let this model run only locally, and do we let people train the model, etc.?

Author

Like PaliGemma, Florence-2 requires intensive compute to generate results in a reasonable amount of time, so it's only supported on local/dedicated deployments; we don't have a dedicated endpoint at this time. We currently support training LoRAs outside of Roboflow and uploading them to the platform (and loading them through this block).

return [
OutputDefinition(name="parent_id", kind=[BATCH_OF_PARENT_ID_KIND]),
OutputDefinition(name="root_parent_id", kind=[BATCH_OF_PARENT_ID_KIND]),
OutputDefinition(name="image", kind=[BATCH_OF_IMAGE_METADATA_KIND]),
Collaborator

This output is no longer needed, given that sv.Detections are already there under predictions.

Author

Raw output is the raw text from the model, structured output is the formatted dict generated by the Microsoft-provided output text processor, and predictions is the sv.Detections produced by our from_lmm method in supervision.
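The raw → structured → predictions pipeline described above can be sketched as follows. The `"<OD>"` dict layout follows Florence-2's documented post-processing output; in the real block the last step is done by supervision's from_lmm constructor, so the conversion function here (`structured_to_predictions`, a hypothetical name) is only a conceptual stand-in for it.

```python
# Conceptual sketch: flatten the structured (per-task) dict into the parallel
# box/label lists that a detections object is ultimately built from.

def structured_to_predictions(structured, task="<OD>"):
    """Extract xyxy boxes and labels for one task from the processor output."""
    task_result = structured.get(task, {})
    return {
        "xyxy": task_result.get("bboxes", []),
        "labels": task_result.get("labels", []),
    }


# Example structured output, as produced by the Microsoft text processor:
structured = {"<OD>": {"bboxes": [[12.5, 30.0, 200.0, 180.0]], "labels": ["dog"]}}
preds = structured_to_predictions(structured)
print(preds["labels"])  # ['dog']
```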

@@ -313,6 +313,24 @@ def __hash__(self) -> int:
docs=DETECTION_KIND_DOCS,
)

DETECTION_KIND_DOCS = """
Collaborator

could you explain why this additional kind is needed?

Author

Each sv detection won't necessarily fall exactly into one of the current kinds (i.e. it may be missing class, class_id, etc.), so I created the batch-of-Detection kind as a catch-all.

Collaborator

@PawelPeczek-Roboflow left a comment

For the PR to be approved:

  • integration tests need to be created - as a must-have, I would like to see how the output is practically used by other blocks
  • there is a high chance that UQL operations on detections are incompatible with the sv.Detections produced as output of this step
  • deployment on the hosted platform needs to be clarified

@PawelPeczek-Roboflow
Collaborator

Hi there,
we need to push this forward, and I need orientation to make some decisions:

  • @MadeWithStone - are you willing to continue the task?
  • @probicheaux - what is the status of the Florence 2 model on the platform, in particular:
    • what kinds of pre-trained models do we support?
    • where could Florence 2 run - my brief analysis indicated only on GPU server builds; if so, the block itself is not practically useful, and even potentially harmful once random users get hyped, enter Workflows, and see something broken without proper info on how to run it
    • do we have a plan for good docs about this VLM?
    • do we have a plan to onboard the model on the inference platform? what about CPU builds?

@PawelPeczek-Roboflow
Collaborator

Shall we close in favour of #661?


6 participants