NVlabs/prismer
The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".
Language: Python
#image_captioning #language_model #multi_modal_learning #multi_task_learning #vision_and_language #vision_language_model #vqa
Stars: 479 Issues: 6 Forks: 21
https://github.com/NVlabs/prismer
OFA-Sys/ONE-PEACE
A general representation model across vision, audio, and language modalities.
Language: Python
#audio_language #foundation_models #multimodal #representation_learning #vision_language
Stars: 185 Issues: 2 Forks: 5
https://github.com/OFA-Sys/ONE-PEACE
roboflow/multimodal-maestro
Effective prompting for Large Multimodal Models like GPT-4 Vision or LLaVA. 🔥
Language: Python
#cross_modal #gpt_4 #gpt_4_vision #instance_segmentation #llava #lmm #multimodality #object_detection #prompt_engineering #segment_anything #vision_language_model #visual_prompting
Stars: 367 Issues: 1 Forks: 23
https://github.com/roboflow/multimodal-maestro
mbzuai-oryx/LLaVA-pp
🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3)
Language: Python
#conversation #llama_3_llava #llama_3_vision #llama3 #llama3_llava #llama3_vision #llava #llava_llama3 #llava_phi3 #llm #lmms #phi_3_llava #phi_3_vision #phi3 #phi3_llava #phi3_vision #vision_language
Stars: 297 Issues: 2 Forks: 13
https://github.com/mbzuai-oryx/LLaVA-pp