xlang-ai/OSWorld
OSWorld: A real computer environment for multimodal agents to evaluate open-ended computer tasks
Language:Python
Total stars: 446
Stars trend:
28 Apr 2024
3pm ▎ +2
4pm ██▎ +18
5pm █▉ +15
6pm █ +8
7pm █▍ +11
8pm █▎ +10
9pm █▎ +10
10pm ▉ +7
#python
#agent, #artificialintelligence, #benchmark, #codegeneration, #languagemodel, #multimodal, #reinforcementlearning, #rpa
OpenGVLab/InternVL
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4V. The commercially usable open-source model with performance closest to GPT-4V.
Language:Python
Total stars: 1073
Stars trend:
28 Apr 2024
7pm ▎ +2
8pm +0
9pm +0
10pm +0
11pm ▏ +1
29 Apr 2024
12am ▍ +3
1am ██▏ +17
2am ██▋ +21
3am █▌ +12
4am ▌ +4
5am █▏ +9
6am █▎ +10
#python
#imageclassification, #imagetextretrieval, #llm, #mme, #multimodal, #semanticsegmentation, #videoclassification, #visionlanguagemodel, #vit22b, #vit6b
louis030195/screen-pipe
Record your screen & mic 24/7 and connect it to LLMs. Inspired by adept.ai, rewind.ai, Apple Shortcuts. Written in Rust. Free. You own your data.
Language:Rust
Total stars: 234
Stars trend:
5 Jul 2024
1pm ▉ +7
2pm ▋ +5
3pm ▌ +4
4pm █ +8
5pm ▉ +7
6pm █ +8
7pm █▏ +9
8pm ▉ +7
9pm █ +8
10pm ▊ +6
11pm ▋ +5
6 Jul 2024
12am ▍ +3
#rust
#ai, #computervision, #llm, #machinelearning, #ml, #multimodal, #vision
iterative/datachain
DataChain 🔗 Process and curate unstructured data using local ML models and LLM calls
Language:Python
Total stars: 145
Stars trend:
23 Jul 2024
12pm ▏ +1
1pm ██▏ +17
2pm ██▎ +18
3pm █▉ +15
4pm █████████▌ +76
#python
#ai, #cv, #dataanalytics, #datawrangling, #embeddings, #llm, #llmeval, #mlops, #multimodal
mediar-ai/screenpipe
24/7 local AI screen & mic recording. Build AI apps that have the full context. Works with Ollama. Alternative to Rewind.ai. Open. Secure. You own your data. Rust.
Language:Rust
Total stars: 1772
Stars trend:
28 Sep 2024
3am ▍ +3
4am ▍ +3
5am ▏ +1
6am ▍ +3
7am +0
8am ▍ +3
9am ▍ +3
10am ▊ +6
11am ██▌ +20
12pm ██▍ +19
1pm ██▍ +19
#rust
#ai, #computervision, #llm, #machinelearning, #ml, #multimodal, #vision
livekit/agents
Build real-time multimodal AI applications 🤖🎙️📹
Language:Python
Total stars: 1404
Stars trend:
4 Oct 2024
6pm ▍ +3
7pm ▎ +2
8pm ▌ +4
9pm ███ +24
10pm ██▊ +22
11pm ██ +16
5 Oct 2024
12am ██▊ +22
1am ███▎ +26
2am ██▋ +21
#python
#agents, #ai, #multimodal, #realtime, #video, #voice, #voiceassistant
rhymes-ai/Aria
Codebase for Aria - an Open Multimodal Native MoE
Language:Jupyter Notebook
Total stars: 88
Stars trend:
10 Oct 2024
2am ▎ +2
3am ▎ +2
4am ▍ +3
5am ▏ +1
6am ▏ +1
7am ▉ +7
8am ▊ +6
9am █▎ +10
10am ▊ +6
11am ██ +16
12pm █▎ +10
1pm █▋ +13
#jupyternotebook
#mixtureofexperts, #multimodal, #visionandlanguage
kyegomez/swarms
The Enterprise-Grade Production-Ready Multi-Agent Orchestration Framework. Join our Community: https://discord.com/servers/agora-999382051935506503
Language:Python
Total stars: 1313
Stars trend:
12 Oct 2024
2pm ▎ +2
3pm █▏ +9
4pm ▋ +5
5pm ▉ +7
6pm ▊ +6
7pm █▎ +10
8pm █▎ +10
9pm █ +8
10pm ▋ +5
11pm ▋ +5
13 Oct 2024
12am ▌ +4
1am ▋ +5
#python
#agents, #ai, #artificialintelligence, #attentionmechanism, #chatgpt, #gpt4, #gpt4all, #huggingface, #langchain, #langchainpython, #machinelearning, #multimodalimaging, #multimodality, #multimodal, #promptengineering, #prompttoolkit, #prompting, #swarms, #transformermodels, #treeofthoughts
TEN-framework/TEN-Agent
TEN Agent is the world’s first real-time multimodal agent integrated with the OpenAI Realtime API and RTC, featuring weather checks, web search, vision, and RAG capabilities.
Language:Python
Total stars: 1252
Stars trend:
25 Oct 2024
10pm ▏ +1
11pm ▏ +1
26 Oct 2024
12am ▎ +2
1am ▊ +6
2am ██ +16
3am ██ +16
4am █▍ +11
5am ▊ +6
6am █▍ +11
7am ▏ +1
8am █▌ +12
#python
#agent, #ai, #asr, #cpp, #gemini, #golang, #gpt4, #gpt4o, #llm, #lowlatency, #multimodal, #nextjs14, #openai, #python, #rag, #realtime, #realtime, #tts, #vision, #voiceassistant
OpenBMB/MiniCPM-o
MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone
Language:Python
Total stars: 13278
Stars trend:
14 Jan 2025
2pm █▉ +15
3pm ██ +16
4pm █▊ +14
5pm ▋ +5
6pm ▋ +5
7pm ▊ +6
8pm ▎ +2
9pm ▎ +2
10pm ▋ +5
11pm ▊ +6
15 Jan 2025
12am ▉ +7
1am ██▍ +19
#python
#minicpm, #minicpmv, #multimodal
om-ai-lab/OmAgent
Build multimodal language agents for very fast prototyping and production
Language:Python
Total stars: 1170
Stars trend:
16 Jan 2025
5am █▏ +9
6am █▉ +15
7am █ +8
8am ▋ +5
9am ▋ +5
10am ▎ +2
11am █▎ +10
12pm ▌ +4
1pm ▍ +3
2pm ▋ +5
3pm ▊ +6
4pm ▍ +3
#python
#agent, #chatbot, #gemini, #gpt, #gpt4, #gradio, #languageagent, #largelanguagemodels, #llama, #llava, #llm, #multimodal, #multimodalagent, #openai, #python, #rag, #smarthardware, #visionandlanguage, #vlm, #workflow
TEN-framework/TEN-Agent
TEN Agent is a conversational AI powered by the TEN framework, integrating Gemini 2.0 Live, OpenAI Realtime, RTC, and more. It delivers real-time capabilities to see, hear, and speak, while being fully compatible with popular workflow platforms like Dify and Coze.
Language:Python
Total stars: 4121
Stars trend:
28 Jan 2025
10am ▎ +2
11am ▉ +7
12pm ▉ +7
1pm ▉ +7
2pm █▏ +9
3pm █▌ +12
4pm █▍ +11
5pm ▋ +5
6pm █▌ +12
7pm ▌ +4
#python
#agent, #ai, #asr, #cpp, #gemini, #golang, #gpt4, #gpt4o, #llm, #lowlatency, #multimodal, #nextjs14, #openai, #python, #rag, #realtime, #realtime, #tts, #vision, #voiceassistant
Mintplex-Labs/anything-llm
The all-in-one Desktop & Docker AI application with built-in RAG, AI agents, and more.
Language:JavaScript
Total stars: 31528
Stars trend:
31 Jan 2025
9am ▌ +4
10am █ +8
11am ▌ +4
12pm ▍ +3
1pm █▍ +11
2pm █▏ +9
3pm ▉ +7
4pm ▊ +6
5pm ▊ +6
6pm ▌ +4
7pm █▏ +9
8pm ▌ +4
#javascript
#agentframeworkjavascript, #aiagents, #crewai, #customaiagents, #desktopapp, #llama3, #llm, #llmapplication, #llmwebui, #lmstudio, #localllm, #localai, #multimodal, #nodejs, #ollama, #rag, #vectordatabase, #webui
turningpoint-ai/VisualThinker-R1-Zero
Explore the Multimodal “Aha Moment” on a 2B Model
Language:Python
Total stars: 208
Stars trend:
5 Mar 2025
1am ▍ +3
2am +0
3am ▏ +1
4am █ +8
5am ██▉ +23
6am ██▉ +23
7am ██▌ +20
8am ▉ +7
9am █▏ +9
10am ▉ +7
11am ▉ +7
12pm █▌ +12
#python
#deepseek, #deepseekr1, #deepseekr1zero, #grpo, #multimodal, #multimodaljourney, #multimodalr1, #posttraining, #r1, #r1zero, #reasoning, #reinforcementlearning
morphik-org/morphik-core
Open source multi-modal RAG for building AI apps over private knowledge.
Language:Python
Total stars: 885
Stars trend:
11 Apr 2025
3pm ▎ +2
4pm █▎ +10
5pm █ +8
6pm ▋ +5
7pm ▊ +6
8pm ▉ +7
9pm ▊ +6
10pm ▍ +3
11pm ▋ +5
12 Apr 2025
12am █▋ +13
1am █▉ +15
#python
#artificialintelligence, #cacheaugmentedgeneration, #colpali, #database, #litellm, #multimodal, #opensource, #rag, #rulesbasedingestion
mediar-ai/screenpipe
AI app store powered by 24/7 desktop history. open source | 100% local | dev friendly | 24/7 screen, mic recording
Language:TypeScript
Total stars: 13961
Stars trend:
29 Apr 2025
4am █▏ +9
5am ▌ +4
6am ▏ +1
7am ▊ +6
8am ▌ +4
9am ▎ +2
10am ▌ +4
11am ▍ +3
12pm ██▊ +22
1pm ██▏ +17
2pm █▋ +13
3pm █▊ +14
#typescript
#agents, #agi, #ai, #computervision, #llm, #machinelearning, #ml, #multimodal, #vision
Blaizzy/mlx-audio
A text-to-speech (TTS), speech-to-text (STT) and speech-to-speech (STS) library built on Apple's MLX framework, providing efficient speech analysis on Apple Silicon.
Language:Python
Total stars: 1009
Stars trend:
8 May 2025
5am ▍ +3
6am ▍ +3
7am ▏ +1
8am ▍ +3
9am +0
10am ▋ +5
11am █▍ +11
12pm █▍ +11
1pm ▍ +3
2pm █▊ +14
3pm █▊ +14
4pm █ +8
#python
#applesilicon, #audioprocessing, #mlx, #multimodal, #speechrecognition, #speechsynthesis, #speechtotext, #texttospeech, #transformers
NVIDIA/NeMo
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Language:Python
Total stars: 13977
Stars trend:
8 May 2025
11am ▉ +7
12pm █▉ +15
1pm ▉ +7
2pm █▏ +9
3pm ▉ +7
4pm ▉ +7
5pm ▊ +6
6pm ▋ +5
7pm ▍ +3
8pm ▌ +4
9pm ▍ +3
10pm ▉ +7
#python
#asr, #deeplearning, #generativeai, #largelanguagemodels, #machinetranslation, #multimodal, #neuralnetworks, #speakerdiariazation, #speakerrecognition, #speechsynthesis, #speechtranslation, #tts
Capsize-Games/airunner
Offline inference engine for art, real-time voice conversations, LLM-powered chatbots and automated workflows
Language:Python
Total stars: 948
Stars trend:
17 May 2025
8am ▎ +2
9am ▎ +2
10am ▍ +3
11am █▏ +9
12pm █▏ +9
1pm █▏ +9
2pm █▎ +10
3pm █▏ +9
4pm ▋ +5
5pm ▋ +5
6pm █ +8
7pm ▊ +6
#python
#ai, #aiart, #art, #assetgenerator, #chatbot, #deeplearning, #desktopapp, #imagegeneration, #mistral, #multimodal, #privacy, #pygame, #pyside6, #python, #selfhosted, #speechtotext, #stablediffusion, #texttoimage, #texttospeech, #texttospeechapp
FareedKhan-dev/all-rag-techniques
Implementation of all RAG techniques in a simpler way
Language:Jupyter Notebook
Total stars: 2178
Stars trend:
9 Jun 2025
11pm █▎ +10
10 Jun 2025
12am ▊ +6
1am █ +8
2am █▏ +9
3am ▉ +7
4am ▊ +6
5am ▉ +7
6am ▊ +6
7am ▍ +3
8am ▋ +5
9am █ +8
10am ▊ +6
#jupyternotebook
#ai, #llm, #llms, #multimodal, #openai, #python, #rag