Show Lab

82 repositories

DiffSim
Public
Official repository of DiffSim: Taming Diffusion Models for Evaluating Visual Similarity
Python • Forks 1 • Stars 10 • Issues 0 • Pull requests 0 • Updated Feb 21, 2025
Awesome-Video-Diffusion
Public
A curated list of recent diffusion models for video generation, editing, restoration, understanding, etc.
awesome video-editing video-understanding video-generation diffusion-models text-to-video video-restoration text-to-motion
Forks 230 • Stars 4k • Issues 1 • Pull requests 0 • Updated Feb 21, 2025
WorldGUI
Public
WorldGUI: Dynamic Testing for Comprehensive Desktop GUI Automation
HTML • Forks 3 • Stars 17 • Issues 1 • Pull requests 0 • Updated Feb 19, 2025
LOVA3
Public
(NeurIPS 2024) Learning to Visual Question Answering, Asking and Assessment
benchmark visual-question-answering multimodal-deep-learning visual-question-generation multimodal-large-language-models data-asse
Python • Forks 2 • Stars 73 • Issues 0 • Pull requests 0 • Updated Feb 19, 2025
Awesome-Robotics-Diffusion
Public
(In progress) A curated list of recent robot learning papers incorporating diffusion models for robotics tasks.
Forks 2 • Stars 48 • Issues 0 • Pull requests 0 • Updated Feb 19, 2025
whisperV
Public
video speech-recognition face-detection speech-to-text whisper asr
Jupyter Notebook • Forks 0 • Stars 2 • Issues 0 • Pull requests 0 • Updated Feb 18, 2025
MakeAnything
Public
Official code of "MakeAnything: Harnessing Diffusion Transformers for Multi-Domain Procedural Sequence Generation"
Python • MIT License • Forks 5 • Stars 144 • Issues 0 • Pull requests 0 • Updated Feb 16, 2025
Impossible-Videos
Public
JavaScript • Forks 0 • Stars 1 • Issues 1 • Pull requests 0 • Updated Feb 15, 2025
ShowUI
Public
Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.
agent vision-language-model vision-language-action computer-use gui-agent
Jupyter Notebook • Apache License 2.0 • Forks 57 • Stars 1k • Issues 5 • Pull requests 0 • Updated Feb 13, 2025
computer_use_ootb
Public
Out-of-the-box (OOTB) GUI Agent for Windows and macOS
Python • Apache License 2.0 • Forks 123 • Stars 1.3k • Issues 37 • Pull requests 6 • Updated Feb 12, 2025
UniMoD
Public
The code repository of UniMoD
Forks 0 • Stars 7 • Issues 1 • Pull requests 0 • Updated Feb 10, 2025
Show-o
Public
[ICLR 2025] Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
multimodal diffusion-models large-language-models
Python • Apache License 2.0 • Forks 51 • Stars 1.2k • Issues 37 • Pull requests 1 • Updated Feb 10, 2025
LayerTracer
Public
Official code of "LayerTracer: Cognitive-Aligned Layered SVG Synthesis via Diffusion Transformer"
Python • MIT License • Forks 2 • Stars 31 • Issues 1 • Pull requests 0 • Updated Feb 8, 2025
Awesome-GUI-Agent
Public
💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.
awesome graphical-user-interface ai-assistant llm-agent gui-agents
Forks 30 • Stars 500 • Issues 0 • Pull requests 0 • Updated Jan 28, 2025
Awesome-Unified-Multimodal-Models
Public
📖 A repository for organizing papers, code, and other resources related to unified multimodal models.
Forks 16 • Stars 375 • Issues 0 • Pull requests 0 • Updated Jan 18, 2025
MovieSeq
Public
[ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences
Jupyter Notebook • Forks 1 • Stars 35 • Issues 0 • Pull requests 0 • Updated Jan 18, 2025
FQGAN
Public
FQGAN: Factorized Visual Tokenization and Generation
Python • Other license • Forks 0 • Stars 42 • Issues 0 • Pull requests 0 • Updated Jan 5, 2025
Tune-An-Ellipse
Public
[CVPR 2024] Tune-An-Ellipse: CLIP Has Potential to Find What You Want
Python • Forks 1 • Stars 9 • Issues 2 • Pull requests 0 • Updated Jan 5, 2025
VideoLISA
Public
[NeurIPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos
Python • Apache License 2.0 • Forks 3 • Stars 105 • Issues 8 • Pull requests 0 • Updated Dec 26, 2024
MovieBench
Public
Python • Forks 1 • Stars 38 • Issues 0 • Pull requests 0 • Updated Dec 24, 2024
Awesome-MLLM-Hallucination
Public
📖 A curated list of resources dedicated to hallucination of multimodal large language models (MLLM).
Forks 20 • Stars 586 • Issues 1 • Pull requests 0 • Updated Dec 23, 2024
IDProtector
Public
Code implementation of IDProtector: An Adversarial Noise Encoder to Protect Against ID-Preserving Image Generation.
Forks 0 • Stars 7 • Issues 0 • Pull requests 0 • Updated Dec 16, 2024
ROICtrl
Public
Code for ROICtrl: Boosting Instance Control for Visual Generation
Python • Forks 0 • Stars 101 • Issues 1 • Pull requests 0 • Updated Dec 10, 2024
videogui
Public
[NeurIPS 2024] VideoGUI: A Benchmark for GUI Automation from Instructional Videos
gui video-language llm-agent
JavaScript • Forks 1 • Stars 29 • Issues 0 • Pull requests 0 • Updated Dec 10, 2024
VideoSwap
Public
Code for [CVPR 2024] VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence
Python • Forks 15 • Stars 380 • Issues 5 • Pull requests 0 • Updated Dec 6, 2024
Show-1
Public
[IJCV] Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation
Python • Other license • Forks 61 • Stars 1k • Issues 9 • Pull requests 7 • Updated Nov 15, 2024
BoxDiff
Public
[ICCV 2023] BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion
text-to-image-synthesis diffusion-models
Python • Forks 18 • Stars 258 • Issues 7 • Pull requests 0 • Updated Nov 12, 2024
sparseformer
Public
(ICLR 2024, CVPR 2024) SparseFormer
computer-vision transformer efficient-neural-networks vision-transformer sparseformer
Python • MIT License • Forks 2 • Stars 71 • Issues 1 • Pull requests 0 • Updated Nov 10, 2024
VisInContext
Public
Official implementation of Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning
efficient in-context-learning llm mllm
Python • Forks 2 • Stars 14 • Issues 1 • Pull requests 0 • Updated Oct 30, 2024
Exo2Ego-V
Public
Forks 0 • Stars 11 • Issues 1 • Pull requests 0 • Updated Oct 29, 2024