Video Understanding with Large Language Models: A Survey
📝https://github.com/yunlong10/awesome-llms-for-video-understanding
GitHub - yunlong10/Awesome-LLMs-for-Video-Understanding: 🔥🔥🔥Latest Papers, Codes and Datasets on Vid-LLMs.
WikiChat: Stopping the Hallucination of Large Language Model Chatbots by Few-Shot Grounding on Wikipedia
📝https://github.com/stanford-oval/wikichat
GitHub - stanford-oval/WikiChat: WikiChat is an improved RAG. It stops the hallucination of large language models by retrieving data from a corpus.
From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations
📝https://github.com/facebookresearch/audio2photoreal
GitHub - facebookresearch/audio2photoreal: Code and dataset for photorealistic Codec Avatars driven from audio
InstantID: Zero-shot Identity-Preserving Generation in Seconds
📝https://github.com/instantid/instantid
GitHub - instantX-research/InstantID: InstantID: Zero-shot Identity-Preserving Generation in Seconds 🔥
AesBench: An Expert Benchmark for Multimodal Large Language Models on Image Aesthetics Perception
📝https://github.com/yipoh/aesbench
GitHub - yipoh/AesBench: An expert benchmark aiming to comprehensively evaluate the aesthetic perception capacities of MLLMs.
Forging Vision Foundation Models for Autonomous Driving: Challenges, Methodologies, and Opportunities
📝https://github.com/zhanghm1995/forge_vfm4ad
GitHub - zhanghm1995/Forge_VFM4AD: A comprehensive survey of forging vision foundation models for autonomous driving, including challenges, methodologies, and opportunities.
World Model on Million-Length Video And Language With RingAttention
📝https://github.com/LargeWorldModel/LWM
GitHub - LargeWorldModel/LWM: Large World Model -- Modeling Text and Video with Millions Context
Revisiting Feature Prediction for Learning Visual Representations from Video
📝https://github.com/facebookresearch/jepa
GitHub - facebookresearch/jepa: PyTorch code and models for V-JEPA self-supervised learning from video.
GaussianObject: Just Taking Four Images to Get A High-Quality 3D Object with Gaussian Splatting
📝https://github.com/GaussianObject/GaussianObject
GitHub - chensjtu/GaussianObject: GaussianObject: High-Quality 3D Object Reconstruction from Four Views with Gaussian Splatting (SIGGRAPH Asia 2024, TOG)
SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers
📝https://github.com/willisma/sit
GitHub - willisma/SiT: Official PyTorch Implementation of "SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers"
YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information
📝https://github.com/wongkinyiu/yolov9
GitHub - WongKinYiu/yolov9: Implementation of paper - YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information
Transparent Image Layer Diffusion using Latent Transparency
📝https://github.com/layerdiffusion/layerdiffusion
GitHub - layerdiffusion/LayerDiffuse: Transparent Image Layer Diffusion using Latent Transparency
RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval
📝https://github.com/parthsarthi03/RAPTOR
GitHub - parthsarthi03/raptor: The official implementation of RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval
TripoSR: Fast 3D Object Reconstruction from a Single Image
📝https://github.com/vast-ai-research/triposr
GitHub - VAST-AI-Research/TripoSR
V3D: Video Diffusion Models are Effective 3D Generators
📝https://github.com/heheyas/v3d
GitHub - heheyas/V3D: V3D: Video Diffusion Models are Effective 3D Generators
Extreme Compression of Large Language Models via Additive Quantization
📝https://github.com/vahe1994/aqlm
GitHub - Vahe1994/AQLM: Official Pytorch repository for Extreme Compression of Large Language Models via Additive Quantization https://arxiv.org/pdf/2401.06118.pdf