Code Stars

innovatorved/whisper.api
This project provides an API with user-level access support for transcribing speech to text using a fine-tuned and processed Whisper ASR model.
Language: Python
Total stars: 228
Stars trend:

22 Aug 2023
 5pm ▎ +2
 6pm ▏ +1
 7pm ██▊ +22
 8pm ███████ +56
 9pm █████▎ +42
10pm ████▏ +33
11pm ███▋ +29
23 Aug 2023
12am ██ +16

#python
#asr, #innovatorved, #transcribe, #whisper

speechbrain/speechbrain
A PyTorch-based Speech Toolkit
Language: Python
Total stars: 7392
Stars trend:

28 Feb 2024
 2pm ▏ +1
 3pm ██ +16
 4pm █▌ +12
 5pm █▋ +13
 6pm █▌ +12
 7pm ▉ +7

#python
#asr, #audio, #audioprocessing, #deeplearning, #huggingface, #languagemodel, #pytorch, #speakerdiarization, #speakerrecognition, #speakerverification, #speechenhancement, #speechprocessing, #speechrecognition, #speechseparation, #speechtotext, #speechtoolkit, #spokenlanguageunderstanding, #transformers, #voicerecognition

k2-fsa/sherpa-onnx
Speech-to-text, text-to-speech, and speaker recognition using next-gen Kaldi with onnxruntime, without an Internet connection. Supports embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, and Swift.
Language: C++
Total stars: 1120
Stars trend:

6 Jun 2024
 4pm ▏ +1
 5pm ▏ +1
 6pm  +0
 7pm  +0
 8pm  +0
 9pm  +0
10pm  +0
11pm  +0
7 Jun 2024
12am ▍ +3
 1am ████▌ +36
 2am ███▎ +26
 3am ██ +16

#cplusplus
#aarch64, #android, #arm32, #asr, #cpp, #csharp, #dotnet, #ios, #linux, #macos, #mfc, #onnx, #openkylin, #raspberrypi, #riscv, #speechtotext, #texttospeech, #vits, #windows

ictnlp/StreamSpeech
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
Language: Python
Total stars: 389
Stars trend:

17 Jun 2024
 9pm ▏ +1
10pm ▏ +1
11pm ▎ +2
18 Jun 2024
12am  +0
 1am ▋ +5
 2am ▍ +3
 3am █▍ +11
 4am ███ +24
 5am █▋ +13
 6am █ +8
 7am ▉ +7

#python
#allinone, #asr, #audioprocessing, #machinetranslation, #nonautoregressive, #seamless, #simultaneoustranslation, #speech, #speechenhancement, #speechprocessing, #speechrecognition, #speechsynthesis, #speechtotext, #speechtranslation, #streamingaudio, #texttoaudio, #texttospeech, #translation, #tts, #voice

PeterH0323/Streamer-Sales
Streamer-Sales (Top Sales): a sales-livestreamer LLM 🛒🎁 that, given a product's characteristics, explains the product in a way designed to stimulate users' purchase intention. 🚀⭐️ Includes a detailed data-generation process❗️ 📦 It also integrates LMDeploy-accelerated inference 🚀, retrieval-augmented generation (RAG) 📚, TTS text-to-speech 🔊, digital-human generation, an agent that queries real-time information from the web 🌐, and ASR speech-to-text 🎙.
Language: Python
Total stars: 470
Stars trend:

24 Jun 2024
11pm ▏ +1
25 Jun 2024
12am ▏ +1
 1am ██ +16
 2am ███▋ +29
 3am ██▍ +19
 4am █▏ +9

#python
#asr, #chat, #chatapplication, #chatbot, #chatgpt, #digitalhuman, #gpt, #internlmchat7b, #internlm2, #llm, #metahuman, #rag, #textgeneration, #tts

harry0703/AudioNotes
Quickly extracts the content of audio and video files and organizes it into a structured Markdown note.
Language: Python
Total stars: 194
Stars trend:

22 Jul 2024
12am ▌ +4
 1am ▎ +2
 2am ▍ +3
 3am ▎ +2
 4am █ +8
 5am ██ +16
 6am █▉ +15
 7am ██ +16
 8am █▎ +10

#python
#ai, #asr, #funasr, #ollama, #python, #qwen2, #whisper

yeyupiaoling/MASR
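A streaming and non-streaming automatic speech recognition framework implemented in PyTorch, compatible with both online and offline recognition. It currently supports the Conformer, Squeezeformer, and DeepSpeech2 models, as well as a variety of data augmentation methods.
Language: Python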
Total stars: 580
Stars trend:

3 Aug 2024
 2pm ███████████▎ +90
 3pm  +0
 4pm  +0
 5pm  +0
 6pm  +0
 7pm  +0
 8pm  +0
 9pm  +0
10pm  +0
11pm  +0
4 Aug 2024
12am  +0
 1am ▏ +1

#python
#asr, #conformer, #deeplearning, #deepspeech, #pytorch, #speech, #speechrecognition, #speechtotext, #squeezeformer

m-bain/whisperX
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
Language: Python
Total stars: 10840
Stars trend:

3 Sep 2024
 9am ▏ +1
10am  +0
11am  +0
12pm ▏ +1
 1pm ▏ +1
 2pm ▌ +4
 3pm ▋ +5
 4pm ▌ +4
 5pm █▎ +10
 6pm ██▏ +17
 7pm ██▎ +18
 8pm ███▏ +25

#python
#asr, #speech, #speechrecognition, #speechtotext, #whisper

NexaAI/nexa-sdk
Nexa SDK is a comprehensive toolkit for supporting ONNX and GGML models. It supports text generation, image generation, vision-language models (VLM), automatic speech recognition (ASR), and text-to-speech (TTS).
Language: Python
Total stars: 1085
Stars trend:

22 Sep 2024
10pm █ +8
11pm ▊ +6
23 Sep 2024
12am ▍ +3
 1am ▊ +6
 2am █▎ +10
 3am ▋ +5
 4am █ +8
 5am ▌ +4
 6am █▏ +9
 7am ▌ +4
 8am ▌ +4
 9am █▏ +9

#python
#asr, #edgecomputing, #languagemodel, #llm, #ondeviceai, #ondeviceml, #sdk, #sdkpython, #stablediffusion, #transformers, #tts, #vlm, #whisper

TEN-framework/TEN-Agent
TEN Agent is the world’s first real-time multimodal agent integrated with the OpenAI Realtime API and RTC, featuring weather checks, web search, vision, and RAG capabilities.
Language: Python
Total stars: 1252
Stars trend:

25 Oct 2024
10pm ▏ +1
11pm ▏ +1
26 Oct 2024
12am ▎ +2
 1am ▊ +6
 2am ██ +16
 3am ██ +16
 4am █▍ +11
 5am ▊ +6
 6am █▍ +11
 7am ▏ +1
 8am █▌ +12

#python
#agent, #ai, #asr, #cpp, #gemini, #golang, #gpt4, #gpt4o, #llm, #lowlatency, #multimodal, #nextjs14, #openai, #python, #rag, #realtime, #tts, #vision, #voiceassistant

abus-aikorea/voice-pro
Gradio WebUI for whisper, faster-whisper, whisper-timestamped. Supports YouTube Downloader, Vocal Remover, Transcription, Text-to-Speech, and Translation.
Language: Python
Total stars: 385
Stars trend:

9 Nov 2024
10pm ▏ +1
11pm ▌ +4
10 Nov 2024
12am ▎ +2
 1am █▏ +9
 2am ██▏ +17
 3am █▎ +10
 4am ▉ +7
 5am ▊ +6
 6am ▍ +3
 7am ▌ +4
 8am ▌ +4
 9am █ +8

#python
#asr, #demucs, #fasterwhisper, #gradio, #speechrecognition, #speechsynthesis, #speechtotext, #stt, #subtitles, #texttospeech, #transcription, #translate, #translation, #translator, #tts, #uvr5, #webui, #whisper, #ytdlp

TEN-framework/TEN-Agent
TEN Agent is a conversational AI powered by TEN, integrating Gemini 2.0 Live, OpenAI Realtime, RTC, and more. It delivers real-time capabilities to see, hear, and speak, and is fully compatible with popular workflow platforms such as Dify and Coze.
Language: Python
Total stars: 4121
Stars trend:

28 Jan 2025
10am ▎ +2
11am ▉ +7
12pm ▉ +7
 1pm ▉ +7
 2pm █▏ +9
 3pm █▌ +12
 4pm █▍ +11
 5pm ▋ +5
 6pm █▌ +12
 7pm ▌ +4

#python
#agent, #ai, #asr, #cpp, #gemini, #golang, #gpt4, #gpt4o, #llm, #lowlatency, #multimodal, #nextjs14, #openai, #python, #rag, #realtime, #tts, #vision, #voiceassistant

umlx5h/LLPlayer
The media player for language learning, with dual subtitles, AI-generated subtitles, real-time translation, and more!
Language: C#
Total stars: 838
Stars trend:

12 Apr 2025
 3am ▎ +2
 4am ▎ +2
 5am ▍ +3
 6am ▏ +1
 7am ▌ +4
 8am ▏ +1
 9am ▏ +1
10am ▍ +3
11am █▎ +10
12pm ████▏ +33
 1pm ▍ +3
 2pm █▋ +13

#csharp
#asr, #csharp, #fasterwhisper, #flyleaf, #languagelearning, #llm, #mediaplayer, #ocr, #ollama, #player, #video, #videoplayer, #whisper, #wpf, #ytdlp
