Artificial Intelligence l l AI Updates
News about AI & DL & ML!!!

Admin: @Gayrat_Anvarovich
🔗 GitHub_Link

❇️ Fine-tuning Florence-2 - Microsoft's Cutting-edge Vision Language Models 🚀🚀🚀

#VLMs

Join my channel:
👇👇👇👇👇👇
https://t.me/Artificial_Intelligence_Updates
🔗 GitHub_Link

❇️ OMG-LLaVA and OMG-Seg codebase 🔥🔥🔥

#VLMs

🔗 GitHub_Link

❇️ Set-of-Mark Visual Prompting for GPT-4V 🔥🔥🔥

#LLMs #VLMs

🔗 GitHub_Link

❇️ Robotic Transformer 2 (RT-2): The Vision-Language-Action Model

#VLMs

🔗 GitHub_Link

❇️ ChartGemma: Visual Instruction-tuning for Chart Reasoning in the Wild

It's really cool to see the open-source community creating small and cheap customized vision-language models (VLMs) that outperform the much larger closed-source APIs.

ChartGemma, created by Megh Thakkar and team, is a fine-tuned version of PaliGemma that excels at answering questions about charts and plots. The idea is pretty simple: first use a closed-source API like Gemini 1.5 Flash to generate training data, then fine-tune the open PaliGemma model on it. You end up with a model that is much smaller and cheaper to run for this specific niche task, and it outperforms the closed-source APIs on it! 🔥
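
To make the "collect data with a big model, then fine-tune a small one" idea concrete, here is a minimal sketch of the data-collection step only. It is not the ChartGemma authors' pipeline: `build_instruction_records`, `to_jsonl`, the record fields, and `stub_teacher` are all hypothetical names, and the stub stands in for a real call to a closed-source API.

```python
import json
from typing import Callable, Dict, List


def build_instruction_records(
    chart_paths: List[str],
    questions: List[str],
    teacher: Callable[[str, str], str],
) -> List[Dict[str, str]]:
    """Ask a teacher model every question about every chart and keep non-empty answers."""
    records = []
    for path in chart_paths:
        for question in questions:
            answer = teacher(path, question).strip()
            if not answer:
                # Skip empty answers so they do not pollute the training set.
                continue
            records.append({"image": path, "prompt": question, "response": answer})
    return records


def to_jsonl(records: List[Dict[str, str]]) -> str:
    """Serialize records as JSON Lines, one training example per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


def stub_teacher(image_path: str, question: str) -> str:
    """Hypothetical placeholder for a closed-source API call; returns a canned answer."""
    return f"stub answer for {image_path}"


if __name__ == "__main__":
    recs = build_instruction_records(
        ["chart_001.png", "chart_002.png"],
        ["What is the overall trend?"],
        stub_teacher,
    )
    print(to_jsonl(recs))
```

The resulting JSON Lines file would then be the input to whatever fine-tuning script you use for the small open model; that step is left out here because it depends on the training framework.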

#VLMs

🔗 GitHub_Link

❇️ TF-ID: Table/Figure IDentifier for academic papers.

Seeing the open-source community develop small, cost-effective customized vision-language models (VLMs) that outperform the much larger closed-source APIs is really impressive.

One of them is the TF-ID model by Yifei Huang. It's a fine-tuned version of Florence-2, the small but very powerful VLM by Microsoft. TF-ID (Table/Figure IDentifier) is a family of object detection models fine-tuned to extract tables and figures from academic papers. Interestingly, the author labeled 4600 images by hand, ensuring high data quality!
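
As a rough illustration of what "extracting tables and figures" can look like downstream of a detector, here is a small sketch that filters detections and turns them into integer crop boxes. The input layout (a dict with `bboxes` as `[x1, y1, x2, y2]` lists and a parallel `labels` list) and the label names `table` and `figure` are assumptions for this example, not confirmed TF-ID output; `select_regions` is a hypothetical helper.

```python
from typing import Dict, List, Sequence, Tuple

Box = Tuple[int, int, int, int]


def select_regions(
    detections: Dict[str, list],
    wanted: Sequence[str] = ("table", "figure"),
) -> List[Tuple[str, Box]]:
    """Keep only detections whose label is in `wanted`, with pixel-rounded boxes."""
    regions = []
    for box, label in zip(detections["bboxes"], detections["labels"]):
        name = label.lower()
        if name not in wanted:
            continue
        # Round float coordinates to integer pixels for cropping.
        x1, y1, x2, y2 = (int(round(v)) for v in box)
        regions.append((name, (x1, y1, x2, y2)))
    return regions


if __name__ == "__main__":
    # Assumed detector output for one page image (values are made up).
    sample = {
        "bboxes": [[10.0, 20.0, 110.0, 220.0], [0.0, 0.0, 5.0, 5.0]],
        "labels": ["Table", "text"],
    }
    for name, box in select_regions(sample):
        print(name, box)
```

Each returned box could then be passed to an image library's crop function to save the table or figure as its own file.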

#VLMs

🔗 GitHub_Link

❇️ VisionLLM Series 🔥🔥🔥

#VLMs

🔗 GitHub_Link

❇️ VideoLLM-online: Online Video Large Language Model for Streaming Video

#VLMs

🔗 GitHub_Link

❇️ MyVLM: Personalizing VLMs for User-Specific Queries

#VLMs

🔗 GitHub_Link

❇️ FastVLM: Efficient Vision Encoding for Vision Language Models

#VLMs

🔗 GitHub_Link

❇️ VoRA: Integrating Visual Capabilities into LLMs

#LLMs #VLMs
