🔗 GitHub_Link
❇️ Fine-tuning Florence-2 - Microsoft's Cutting-edge Vision Language Models 🚀🚀🚀
#VLMs
Join my channel:
👇👇👇👇👇👇
https://t.me/Artificial_Intelligence_Updates
🔗 GitHub_Link
❇️ OMG-LLaVA and OMG-Seg codebase 🔥🔥🔥
#VLMs
🔗 GitHub_Link
❇️ Set-of-Mark Visual Prompting for GPT-4V 🔥🔥🔥
#LLMs #VLMs
🔗 GitHub_Link
❇️ Robotic Transformer 2 (RT-2): The Vision-Language-Action Model
#VLMs
🔗 GitHub_Link
❇️ ChartGemma: Visual Instruction-tuning for Chart Reasoning in the Wild
It's really cool to see the open-source community creating small and cheap customized vision-language models (VLMs) that outperform the much larger closed-source APIs.
ChartGemma, created by Megh Thakkar and team, is a fine-tuned version of PaliGemma that excels at answering questions about charts and plots. The idea is pretty simple: first use a closed-source API like Gemini 1.5 Flash to collect training data, then fine-tune the open PaliGemma model on it. You end up with a model that is much smaller and cheaper to run for this specific niche task, and it outperforms the closed-source APIs! 🔥
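The distill-then-fine-tune recipe above can be sketched roughly as follows. This is a minimal sketch, not the project's actual code: the record layout (`prefix`/`suffix` fields, borrowed from common PaliGemma fine-tuning examples), the file names, and the teacher setup are all assumptions.

```python
# Sketch of the ChartGemma-style recipe: collect (chart image, question,
# answer) triples from a strong teacher API, then fine-tune PaliGemma on
# them. Field names and file names below are illustrative assumptions.

def make_record(image_path: str, question: str, teacher_answer: str) -> dict:
    """Pack one distilled example into a (prompt, target) pair, following
    the prefix/suffix convention seen in PaliGemma fine-tuning examples."""
    return {
        "image": image_path,                # path to the chart image
        "prefix": question.strip(),         # fed to the model as the prompt
        "suffix": teacher_answer.strip(),   # supervised generation target
    }

if __name__ == "__main__":
    # Step 1 (assumed teacher): query e.g. Gemini 1.5 Flash for each chart
    # image and store its answer -- omitted here, as it needs an API key.
    record = make_record(
        "chart_001.png",
        "Which year had the highest revenue?",
        "2021.",
    )
    print(record)
    # Step 2: fine-tune PaliGemma on the collected records with the usual
    # transformers Trainer / PEFT stack (hyperparameters not reproduced).
```

The pure `make_record` helper is the only part shown in full; the teacher calls and the training loop depend on credentials and hardware, so they are left as comments.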
#VLMs
🔗 GitHub_Link
❇️ TF-ID: Table/Figure IDentifier for academic papers.
Seeing the open-source community develop small, cost-effective customized vision-language models (VLMs) that outperform the much larger closed-source APIs is really impressive.
One of them is the TF-ID model by Yifei Huang, a fine-tuned version of Florence-2, Microsoft's small but very powerful VLM. TF-ID (Table/Figure IDentifier) is a family of object detection models fine-tuned to extract tables and figures from academic papers. Interestingly, the author labeled 4,600 images by hand, ensuring high data quality!
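A minimal inference sketch for a Florence-2-based detector like TF-ID might look like the following. The `trust_remote_code=True` loading, the `"<OD>"` task token, and `post_process_generation` follow Florence-2's published usage, but the exact TF-ID checkpoint id and image file name here are assumptions.

```python
# Sketch: running a Florence-2-style object detector such as TF-ID on a
# paper page. Checkpoint id and file name are illustrative assumptions.

def extract_regions(parsed: dict) -> list:
    """Flatten Florence-2's post-processed detection output
    ({"<OD>": {"bboxes": [...], "labels": [...]}}) into (label, box) pairs."""
    det = parsed.get("<OD>", parsed)
    return list(zip(det["labels"], det["bboxes"]))

if __name__ == "__main__":
    from PIL import Image
    from transformers import AutoModelForCausalLM, AutoProcessor

    model_id = "yifeihu/TF-ID-base"  # assumed Hugging Face checkpoint id
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

    image = Image.open("paper_page.png")
    inputs = processor(text="<OD>", images=image, return_tensors="pt")
    ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=1024,
    )
    raw = processor.batch_decode(ids, skip_special_tokens=False)[0]
    parsed = processor.post_process_generation(
        raw, task="<OD>", image_size=(image.width, image.height))
    for label, box in extract_regions(parsed):
        print(label, box)  # e.g. "table" followed by [x1, y1, x2, y2]
```

Only `extract_regions` is pure; the rest needs the model weights downloaded, so it sits behind the `__main__` guard.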
#VLMs
🔗 GitHub_Link
❇️ VisionLLM Series 🔥🔥🔥
#VLMs
🔗 GitHub_Link
❇️ VideoLLM-online: Online Video Large Language Model for Streaming Video
#VLMs
🔗 GitHub_Link
❇️ MyVLM: Personalizing VLMs for User-Specific Queries
#VLMs
🔗 GitHub_Link
❇️ FastVLM: Efficient Vision Encoding for Vision Language Models
#VLMs
🔗 GitHub_Link
❇️ VoRA: Integrating Visual Capabilities into LLMs
#LLMs #VLMs