OFA-Sys/Chinese-CLIP
Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
Language: Python
#chinese #computer_vision #multi_modal_learning #nlp #pytorch #vision_and_language_pre_training
Stars: 80 Issues: 0 Forks: 7
https://github.com/OFA-Sys/Chinese-CLIP
Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
Language: Python
#chinese #computer_vision #multi_modal_learning #nlp #pytorch #vision_and_language_pre_training
Stars: 80 Issues: 0 Forks: 7
https://github.com/OFA-Sys/Chinese-CLIP
GitHub
GitHub - OFA-Sys/Chinese-CLIP: Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation. - OFA-Sys/Chinese-CLIP
NVlabs/prismer
The implementation of "Prismer: A Vision-Language Model with An Ensemble of Experts".
Language: Python
#image_captioning #language_model #multi_modal_learning #multi_task_learning #vision_and_language #vision_language_model #vqa
Stars: 479 Issues: 6 Forks: 21
https://github.com/NVlabs/prismer
The implementation of "Prismer: A Vision-Language Model with An Ensemble of Experts".
Language: Python
#image_captioning #language_model #multi_modal_learning #multi_task_learning #vision_and_language #vision_language_model #vqa
Stars: 479 Issues: 6 Forks: 21
https://github.com/NVlabs/prismer
GitHub
GitHub - NVlabs/prismer: The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".
The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts". - NVlabs/prismer
HKUDS/VideoRAG
"VideoRAG: Retrieval-Augmented Generation with Extreme Long-Context Videos"
Language: Python
#large_language_models #llms #long_video_understanding #multi_modal_llms #rag #retrieval_augmented_generation
Stars: 201 Issues: 1 Forks: 14
https://github.com/HKUDS/VideoRAG
"VideoRAG: Retrieval-Augmented Generation with Extreme Long-Context Videos"
Language: Python
#large_language_models #llms #long_video_understanding #multi_modal_llms #rag #retrieval_augmented_generation
Stars: 201 Issues: 1 Forks: 14
https://github.com/HKUDS/VideoRAG
GitHub
GitHub - HKUDS/VideoRAG: "VideoRAG: Retrieval-Augmented Generation with Extreme Long-Context Videos"
"VideoRAG: Retrieval-Augmented Generation with Extreme Long-Context Videos" - HKUDS/VideoRAG
ses4255/Versatile-OCR-Program
Multi-modal OCR pipeline optimized for ML training (text, figure, math, tables, diagrams)
Language: Python
#doclayout #educational_data #exam_ocr #machine_learning #ml_datasets #multi_modal #ocr #openai #paper_ocr #table_parsing
Stars: 250 Issues: 0 Forks: 11
https://github.com/ses4255/Versatile-OCR-Program
Multi-modal OCR pipeline optimized for ML training (text, figure, math, tables, diagrams)
Language: Python
#doclayout #educational_data #exam_ocr #machine_learning #ml_datasets #multi_modal #ocr #openai #paper_ocr #table_parsing
Stars: 250 Issues: 0 Forks: 11
https://github.com/ses4255/Versatile-OCR-Program
GitHub
GitHub - ses4255/Versatile-OCR-Program: Multi-modal OCR pipeline optimized for ML training (text, figure, math, tables, diagrams)
Multi-modal OCR pipeline optimized for ML training (text, figure, math, tables, diagrams) - ses4255/Versatile-OCR-Program