CircleRadon/Osprey
The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"
Language: Python
#mllm #pixel_understanding #sam #visual_instruction_tuning
Stars: 200 Issues: 1 Forks: 6
https://github.com/CircleRadon/Osprey
GitHub - CircleRadon/Osprey: [CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"
X-PLUG/MobileAgent
Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception
Language: Python
#agent #gpt4v #mllm #mobile_agents #multimodal #multimodal_large_language_models
Stars: 246 Issues: 3 Forks: 21
https://github.com/X-PLUG/MobileAgent
GitHub - X-PLUG/MobileAgent: Mobile-Agent: The Powerful Mobile Device Operation Assistant Family
magic-quill/MagicQuill
Official Implementations for Paper - MagicQuill: An Intelligent Interactive Image Editing System
Language: Python
#aigc #image_editing #mllm
Stars: 531 Issues: 7 Forks: 32
https://github.com/magic-quill/MagicQuill
[CVPR'25] Official Implementations for Paper - MagicQuill: An Intelligent Interactive Image Editing System - ant-research/MagicQuill
SkyworkAI/Skywork-R1V
Pioneering Multimodal Reasoning with CoT
Language: Python
#deepseek_r1 #llm #mllm
Stars: 387 Issues: 5 Forks: 19
https://github.com/SkyworkAI/Skywork-R1V
Skywork-R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning (Best open-source multimodal reasoning model) - SkyworkAI/Skywork-R1V
manycore-research/SpatialLM
SpatialLM: Large Language Model for Spatial Understanding
Language: Python
#mllm #point_clouds #scene_understanding #spatial_intelligence
Stars: 643 Issues: 2 Forks: 33
https://github.com/manycore-research/SpatialLM