Add vision capability for bots #413
base: main
Conversation
I also made a demo video about this feature.
Why change the default port to 56069? And why comment out the init message?
Oh, I hadn’t noticed that settings.js was changed. Thanks for pointing it out!
Currently, the This makes me wonder if it might be better to separate these responsibilities by creating a new class, such as I’ll think more about whether this approach would be better. I’d appreciate any feedback or thoughts!
Can I test this with Mistral’s models, such as "pixtral-large-latest"?
Changelog
Initially, I only considered OpenAI models, but I’ve updated the implementation to support Mistral vision requests as well, as their format is slightly different. (https://docs.mistral.ai/capabilities/vision/)
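To illustrate the format difference mentioned above: OpenAI expects `image_url` to be an object with a `url` field, while Mistral’s vision API accepts a plain string. The helper below is a hypothetical sketch (the function name and structure are mine, not from this PR):

```javascript
// Hypothetical helper showing the OpenAI vs. Mistral vision payload shapes.
// `provider`, `text`, and `imageBase64` are illustrative parameter names.
function buildVisionMessage(provider, text, imageBase64) {
  const dataUrl = `data:image/jpeg;base64,${imageBase64}`;
  if (provider === 'mistral') {
    // Mistral accepts image_url as a plain string.
    return {
      role: 'user',
      content: [
        { type: 'text', text },
        { type: 'image_url', image_url: dataUrl },
      ],
    };
  }
  // OpenAI expects image_url as an object with a `url` field.
  return {
    role: 'user',
    content: [
      { type: 'text', text },
      { type: 'image_url', image_url: { url: dataUrl } },
    ],
  };
}
```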
You should consider adding the vision models from Groq as well. That way free users can also test out the vision capability.
Also Gemini. At this point it might be better to add a "vision_model" in profile.json |
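A `vision_model` key as suggested here might look something like the sketch below (key names other than `vision_model` follow the usual bot profile layout; all values are illustrative, not from this PR):

```json
{
  "name": "andy",
  "model": "gpt-4o-mini",
  "vision_model": "gpt-4o"
}
```

This would let the chat model and the vision model be chosen independently, so a free Groq or Gemini vision model could back the image requests while a different model handles conversation.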
@Vineethm0410 @uukelele-scratch [screenshots omitted]
This looks very promising. Is it near completion?
Yes, I think the basics are all done. Note: If
I noticed you added
Changelog
@gmuffiness Is this ready for review? |
@MaxRobinsonTheGreat Yes!
I haven't tested yet; I just have a small request, and it needs to merge with main.
src/agent/library/skills.js
@@ -1351,3 +1353,77 @@ export async function activateNearestBlock(bot, type) {
    log(bot, `Activated ${type} at x:${block.position.x.toFixed(1)}, y:${block.position.y.toFixed(1)}, z:${block.position.z.toFixed(1)}.`);
    return true;
}

// export async function lookAtPlayer(agent, bot, player_name, direction) {
remove these comments
I’ve removed the comments! Also, I made a small change: sweaterdog had an issue during the installation process, so I updated the way importing
I added support for image input using GPT-4V and GPT-4o, enabling effective image interpretation.
This is an initial implementation, so I would greatly appreciate any feedback or suggestions for improvement. Thank you!
Changelog
• lookAtPlayer: Allows the bot to focus on the player’s direction or viewpoint for better understanding
• lookAtPosition: Enables the bot to focus on specific coordinates for targeted image interpretation
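As a rough illustration of what a `lookAtPosition`-style skill involves, here is a minimal sketch. It is not the PR’s actual implementation: the signature and return message are assumptions, and the real skill would pass a `Vec3` from the `vec3` package to mineflayer’s `bot.lookAt` and likely capture a screenshot afterwards.

```javascript
// Hypothetical sketch of a lookAtPosition-style skill (not the PR's code).
// `bot.lookAt` is mineflayer's API for pointing the bot's head at a world
// coordinate; the real skills.js would export this and use a Vec3 instance.
async function lookAtPosition(bot, x, y, z) {
  // Turn the bot's head toward the target coordinates.
  await bot.lookAt({ x, y, z });
  // Skills in this codebase typically report back what they did.
  return `Looking at position (${x}, ${y}, ${z}).`;
}
```

With the head oriented, a vision request can then attach a screenshot of what the bot currently sees.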
Known Limitations