🔗 GitHub_Link
❇️ Fine-tuning Florence-2 - Microsoft's Cutting-edge Vision Language Models 🚀🚀🚀
#VLMs
Join my channel:
👇👇👇👇👇👇
https://t.me/Artificial_Intelligence_Updates
🔗 GitHub_Link
❇️ OMG-LLaVA and OMG-Seg codebase 🔥🔥🔥
#VLMs
🔗 GitHub_Link
❇️ Set-of-Mark Visual Prompting for GPT-4V 🔥🔥🔥
#LLMs #VLMs
🔗 GitHub_Link
❇️ Robotic Transformer 2 (RT-2): The Vision-Language-Action Model
#VLMs
🔗 GitHub_Link
❇️ ChartGemma: Visual Instruction-tuning for Chart Reasoning in the Wild
It's really cool to see the open-source community creating small and cheap customized vision-language models (VLMs) that outperform the much larger closed-source APIs.
ChartGemma, created by Megh Thakkar and team, is a fine-tuned version of PaliGemma that excels at answering questions about charts and plots. The idea is pretty simple: first use a closed-source API like Gemini 1.5 Flash to collect training data, then fine-tune the open PaliGemma model on it. You end up with a model that is much smaller and cheaper to run for this specific niche task, and it outperforms the closed-source APIs! 🔥
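The distill-then-fine-tune recipe above can be sketched roughly as follows. This is a minimal sketch, not the project's actual code: the record layout (`prefix`/`suffix` fields, borrowed from common PaliGemma fine-tuning examples), the file names, and the teacher setup are all assumptions.

```python
# Sketch of the ChartGemma-style recipe: collect (chart image, question,
# answer) triples from a strong teacher API, then fine-tune PaliGemma on
# them. Field names and file names below are illustrative assumptions.

def make_record(image_path: str, question: str, teacher_answer: str) -> dict:
    """Pack one distilled example into a (prompt, target) pair, following
    the prefix/suffix convention seen in PaliGemma fine-tuning examples."""
    return {
        "image": image_path,                # path to the chart image
        "prefix": question.strip(),         # fed to the model as the prompt
        "suffix": teacher_answer.strip(),   # supervised generation target
    }

if __name__ == "__main__":
    # Step 1 (assumed teacher): query e.g. Gemini 1.5 Flash for each chart
    # image and store its answer -- omitted here, as it needs an API key.
    record = make_record(
        "chart_001.png",
        "Which year had the highest revenue?",
        "2021.",
    )
    print(record)
    # Step 2: fine-tune PaliGemma on the collected records with the usual
    # transformers Trainer / PEFT stack (hyperparameters not reproduced).
```

The pure `make_record` helper is the only part shown in full; the teacher calls and the training loop depend on credentials and hardware, so they are left as comments.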
#VLMs
🔗 GitHub_Link
❇️ TF-ID: Table/Figure IDentifier for academic papers.
Seeing the open-source community develop small, cost-effective customized vision-language models (VLMs) that outperform the much larger closed-source APIs is really impressive.
One of them is the TF-ID model by Yifei Huang, a fine-tuned version of Florence-2, Microsoft's small but very powerful VLM. TF-ID (Table/Figure IDentifier) is a family of object detection models fine-tuned to extract tables and figures from academic papers. Interestingly, the author labeled 4,600 images by hand, ensuring high data quality!
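A minimal inference sketch for a Florence-2-based detector like TF-ID might look like the following. The `trust_remote_code=True` loading, the `"<OD>"` task token, and `post_process_generation` follow Florence-2's published usage, but the exact TF-ID checkpoint id and image file name here are assumptions.

```python
# Sketch: running a Florence-2-style object detector such as TF-ID on a
# paper page. Checkpoint id and file name are illustrative assumptions.

def extract_regions(parsed: dict) -> list:
    """Flatten Florence-2's post-processed detection output
    ({"<OD>": {"bboxes": [...], "labels": [...]}}) into (label, box) pairs."""
    det = parsed.get("<OD>", parsed)
    return list(zip(det["labels"], det["bboxes"]))

if __name__ == "__main__":
    from PIL import Image
    from transformers import AutoModelForCausalLM, AutoProcessor

    model_id = "yifeihu/TF-ID-base"  # assumed Hugging Face checkpoint id
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

    image = Image.open("paper_page.png")
    inputs = processor(text="<OD>", images=image, return_tensors="pt")
    ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=1024,
    )
    raw = processor.batch_decode(ids, skip_special_tokens=False)[0]
    parsed = processor.post_process_generation(
        raw, task="<OD>", image_size=(image.width, image.height))
    for label, box in extract_regions(parsed):
        print(label, box)  # e.g. "table" followed by [x1, y1, x2, y2]
```

Only `extract_regions` is pure; the rest needs the model weights downloaded, so it sits behind the `__main__` guard.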
#VLMs
🔗 GitHub_Link
❇️ VisionLLM Series 🔥🔥🔥
#VLMs
🔗 GitHub_Link
❇️ VideoLLM-online: Online Video Large Language Model for Streaming Video
#VLMs
🔗 GitHub_Link
❇️ MyVLM: Personalizing VLMs for User-Specific Queries
#VLMs
🔗 GitHub_Link
❇️ FastVLM: Efficient Vision Encoding for Vision Language Models
#VLMs
🔗 GitHub_Link
❇️ VoRA: Integrating Visual Capabilities into LLMs
#LLMs #VLMs