Code Stars

k2-fsa/sherpa-onnx
Speech-to-text, text-to-speech, and speaker recognition using next-gen Kaldi with onnxruntime, without an Internet connection. Supports embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, and Swift.
Language:C++
Total stars: 1120
Stars trend:

6 Jun 2024
 4pm ▏ +1
 5pm ▏ +1
 6pm  +0
 7pm  +0
 8pm  +0
 9pm  +0
10pm  +0
11pm  +0
7 Jun 2024
12am ▍ +3
 1am ████▌ +36
 2am ███▎ +26
 3am ██ +16

#cplusplus
#aarch64, #android, #arm32, #asr, #cpp, #csharp, #dotnet, #ios, #linux, #macos, #mfc, #onnx, #openkylin, #raspberrypi, #riscv, #speechtotext, #texttospeech, #vits, #windows

ictnlp/StreamSpeech
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
Language:Python
Total stars: 389
Stars trend:

17 Jun 2024
 9pm ▏ +1
10pm ▏ +1
11pm ▎ +2
18 Jun 2024
12am  +0
 1am ▋ +5
 2am ▍ +3
 3am █▍ +11
 4am ███ +24
 5am █▋ +13
 6am █ +8
 7am ▉ +7

#python
#allinone, #asr, #audioprocessing, #machinetranslation, #nonautoregressive, #seamless, #simultaneoustranslation, #speech, #speechenhancement, #speechprocessing, #speechrecognition, #speechsynthesis, #speechtotext, #speechtranslation, #streamingaudio, #texttoaudio, #texttospeech, #translation, #tts, #voice

PeterH0323/Streamer-Sales
Streamer-Sales Top Sales: a sales-anchor LLM 🛒🎁 that, given a product's characteristics, explains the product in a way that stimulates users' purchase intent. 🚀⭐️ Includes a detailed data-generation process ❗️ 📦 It also integrates LMDeploy-accelerated inference 🚀, RAG (retrieval-augmented generation) 📚, TTS text-to-speech 🔊, digital-human generation, an agent that queries real-time information from the web 🌐, and ASR speech-to-text 🎙
Language:Python
Total stars: 470
Stars trend:

24 Jun 2024
11pm ▏ +1
25 Jun 2024
12am ▏ +1
 1am ██ +16
 2am ███▋ +29
 3am ██▍ +19
 4am █▏ +9

#python
#asr, #chat, #chatapplication, #chatbot, #chatgpt, #digitalhuman, #gpt, #internlmchat7b, #internlm2, #llm, #metahuman, #rag, #textgeneration, #tts

harry0703/AudioNotes
Quickly extracts the content of audio and video files and organizes it into structured Markdown notes.
Language:Python
Total stars: 194
Stars trend:

22 Jul 2024
12am ▌ +4
 1am ▎ +2
 2am ▍ +3
 3am ▎ +2
 4am █ +8
 5am ██ +16
 6am █▉ +15
 7am ██ +16
 8am █▎ +10

#python
#ai, #asr, #funasr, #ollama, #python, #qwen2, #whisper

yeyupiaoling/MASR
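A streaming and non-streaming automatic speech recognition framework implemented in PyTorch, compatible with both online and offline recognition. It currently supports the Conformer, Squeezeformer, and DeepSpeech2 models, along with a variety of data augmentation methods.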
Language:Python
Total stars: 580
Stars trend:

3 Aug 2024
 2pm ███████████▎ +90
 3pm  +0
 4pm  +0
 5pm  +0
 6pm  +0
 7pm  +0
 8pm  +0
 9pm  +0
10pm  +0
11pm  +0
4 Aug 2024
12am  +0
 1am ▏ +1

#python
#asr, #conformer, #deeplearning, #deepspeech, #pytorch, #speech, #speechrecognition, #speechtotext, #squeezeformer

m-bain/whisperX
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
Language:Python
Total stars: 10840
Stars trend:

3 Sep 2024
 9am ▏ +1
10am  +0
11am  +0
12pm ▏ +1
 1pm ▏ +1
 2pm ▌ +4
 3pm ▋ +5
 4pm ▌ +4
 5pm █▎ +10
 6pm ██▏ +17
 7pm ██▎ +18
 8pm ███▏ +25

#python
#asr, #speech, #speechrecognition, #speechtotext, #whisper

NexaAI/nexa-sdk
Nexa SDK is a comprehensive toolkit supporting ONNX and GGML models. It provides text generation, image generation, vision-language models (VLM), automatic speech recognition (ASR), and text-to-speech (TTS) capabilities.
Language:Python
Total stars: 1085
Stars trend:

22 Sep 2024
10pm █ +8
11pm ▊ +6
23 Sep 2024
12am ▍ +3
 1am ▊ +6
 2am █▎ +10
 3am ▋ +5
 4am █ +8
 5am ▌ +4
 6am █▏ +9
 7am ▌ +4
 8am ▌ +4
 9am █▏ +9

#python
#asr, #edgecomputing, #languagemodel, #llm, #ondeviceai, #ondeviceml, #sdk, #sdkpython, #stablediffusion, #transformers, #tts, #vlm, #whisper

TEN-framework/TEN-Agent
TEN Agent is the world’s first real-time multimodal agent integrated with the OpenAI Realtime API and RTC, featuring weather checks, web search, vision, and RAG capabilities.
Language:Python
Total stars: 1252
Stars trend:

25 Oct 2024
10pm ▏ +1
11pm ▏ +1
26 Oct 2024
12am ▎ +2
 1am ▊ +6
 2am ██ +16
 3am ██ +16
 4am █▍ +11
 5am ▊ +6
 6am █▍ +11
 7am ▏ +1
 8am █▌ +12

#python
#agent, #ai, #asr, #cpp, #gemini, #golang, #gpt4, #gpt4o, #llm, #lowlatency, #multimodal, #nextjs14, #openai, #python, #rag, #realtime, #tts, #vision, #voiceassistant

abus-aikorea/voice-pro
Gradio WebUI for whisper, faster-whisper, whisper-timestamped. Supports YouTube Downloader, Vocal Remover, Transcription, Text-to-Speech, and Translation.
Language:Python
Total stars: 385
Stars trend:

9 Nov 2024
10pm ▏ +1
11pm ▌ +4
10 Nov 2024
12am ▎ +2
 1am █▏ +9
 2am ██▏ +17
 3am █▎ +10
 4am ▉ +7
 5am ▊ +6
 6am ▍ +3
 7am ▌ +4
 8am ▌ +4
 9am █ +8

#python
#asr, #demucs, #fasterwhisper, #gradio, #speechrecognition, #speechsynthesis, #speechtotext, #stt, #subtitles, #texttospeech, #transcription, #translate, #translation, #translator, #tts, #uvr5, #webui, #whisper, #ytdlp

TEN-framework/TEN-Agent
TEN Agent is a conversational AI powered by TEN, integrating Gemini 2.0 Live, OpenAI Realtime, RTC, and more. It delivers real-time capabilities to see, hear, and speak, and is fully compatible with popular workflow platforms such as Dify and Coze.
Language:Python
Total stars: 4121
Stars trend:

28 Jan 2025
10am ▎ +2
11am ▉ +7
12pm ▉ +7
 1pm ▉ +7
 2pm █▏ +9
 3pm █▌ +12
 4pm █▍ +11
 5pm ▋ +5
 6pm █▌ +12
 7pm ▌ +4

#python
#agent, #ai, #asr, #cpp, #gemini, #golang, #gpt4, #gpt4o, #llm, #lowlatency, #multimodal, #nextjs14, #openai, #python, #rag, #realtime, #tts, #vision, #voiceassistant

umlx5h/LLPlayer
The media player for language learning, with dual subtitles, AI-generated subtitles, real-time translation, and more!
Language:C#
Total stars: 838
Stars trend:

12 Apr 2025
 3am ▎ +2
 4am ▎ +2
 5am ▍ +3
 6am ▏ +1
 7am ▌ +4
 8am ▏ +1
 9am ▏ +1
10am ▍ +3
11am █▎ +10
12pm ████▏ +33
 1pm ▍ +3
 2pm █▋ +13

#csharp
#asr, #csharp, #fasterwhisper, #flyleaf, #languagelearning, #llm, #mediaplayer, #ocr, #ollama, #player, #video, #videoplayer, #whisper, #wpf, #ytdlp

m-bain/whisperX
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
Language:Python
Total stars: 15449
Stars trend:

6 May 2025
 4am ▊ +6
 5am █▏ +9
 6am ▊ +6
 7am ▌ +4
 8am █▏ +9
 9am ▊ +6
10am ▋ +5
11am █▏ +9
12pm ▍ +3
 1pm ▍ +3
 2pm █▎ +10
 3pm ▉ +7

#python
#asr, #speech, #speechrecognition, #speechtotext, #whisper

NVIDIA/NeMo
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Language:Python
Total stars: 13977
Stars trend:

8 May 2025
11am ▉ +7
12pm █▉ +15
 1pm ▉ +7
 2pm █▏ +9
 3pm ▉ +7
 4pm ▉ +7
 5pm ▊ +6
 6pm ▋ +5
 7pm ▍ +3
 8pm ▌ +4
 9pm ▍ +3
10pm ▉ +7

#python
#asr, #deeplearning, #generativeai, #largelanguagemodels, #machinetranslation, #multimodal, #neuralnetworks, #speakerdiariazation, #speakerrecognition, #speechsynthesis, #speechtranslation, #tts

alphacep/vosk-api
Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Language:Jupyter Notebook
Total stars: 10057
Stars trend:

7 Jun 2025
 7pm ▍ +3
 8pm ▋ +5
 9pm ▎ +2
10pm ▊ +6
11pm ▉ +7
8 Jun 2025
12am ▉ +7
 1am ▉ +7
 2am ▉ +7
 3am █ +8
 4am █▎ +10
 5am ▋ +5
 6am █▏ +9

#jupyternotebook
#android, #asr, #deeplearning, #deepneuralnetworks, #deepspeech, #googlespeechtotext, #ios, #kaldi, #offline, #privacy, #python, #raspberrypi, #speakeridentification, #speakerverification, #speechrecognition, #speechtotext, #speechtotextandroid, #stt, #voicerecognition, #vosk

jdepoix/youtube-transcript-api
This is a Python API that lets you retrieve the transcript/subtitles for a given YouTube video. It also works for automatically generated subtitles, and it requires neither an API key nor a headless browser, unlike other Selenium-based solutions!
Language:Python
Total stars: 4231
Stars trend:

11 Jun 2025
 3pm ▉ +7
 4pm ▉ +7
 5pm ▌ +4
 6pm ▋ +5
 7pm █▎ +10
 8pm ▍ +3
 9pm ▎ +2
10pm ▉ +7
11pm ▍ +3
12 Jun 2025
12am ▊ +6
 1am ██ +16
 2am █▋ +13

#python
#asr, #captions, #cli, #python, #subtitle, #subtitles, #transcript, #transcripts, #translatingtranscripts, #youtube, #youtubeapi, #youtubeasr, #youtubecaptions, #youtubesubtitles, #youtubetranscript, #youtubetranscripts, #youtubevideo
