innovatorved/whisper.api
This project provides an API with user-level access support for transcribing speech to text using a fine-tuned and processed Whisper ASR model.
Language: Python
Total stars: 228
Stars trend:
22 Aug 2023
5pm ▎ +2
6pm ▏ +1
7pm ██▊ +22
8pm ███████ +56
9pm █████▎ +42
10pm ████▏ +33
11pm ███▋ +29
23 Aug 2023
12am ██ +16
#python
#asr, #innovatorved, #transcribe, #whisper
speechbrain/speechbrain
A PyTorch-based Speech Toolkit
Language: Python
Total stars: 7392
Stars trend:
28 Feb 2024
2pm ▏ +1
3pm ██ +16
4pm █▌ +12
5pm █▋ +13
6pm █▌ +12
7pm ▉ +7
#python
#asr, #audio, #audioprocessing, #deeplearning, #huggingface, #languagemodel, #pytorch, #speakerdiarization, #speakerrecognition, #speakerverification, #speechenhancement, #speechprocessing, #speechrecognition, #speechseparation, #speechtotext, #speechtoolkit, #spokenlanguageunderstanding, #transformers, #voicerecognition
k2-fsa/sherpa-onnx
Speech-to-text, text-to-speech, and speaker recognition using next-gen Kaldi with onnxruntime, without an Internet connection. Supports embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift
Language: C++
Total stars: 1120
Stars trend:
6 Jun 2024
4pm ▏ +1
5pm ▏ +1
6pm +0
7pm +0
8pm +0
9pm +0
10pm +0
11pm +0
7 Jun 2024
12am ▍ +3
1am ████▌ +36
2am ███▎ +26
3am ██ +16
#cplusplus
#aarch64, #android, #arm32, #asr, #cpp, #csharp, #dotnet, #ios, #linux, #macos, #mfc, #onnx, #openkylin, #raspberrypi, #riscv, #speechtotext, #texttospeech, #vits, #windows
ictnlp/StreamSpeech
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
Language: Python
Total stars: 389
Stars trend:
17 Jun 2024
9pm ▏ +1
10pm ▏ +1
11pm ▎ +2
18 Jun 2024
12am +0
1am ▋ +5
2am ▍ +3
3am █▍ +11
4am ███ +24
5am █▋ +13
6am █ +8
7am ▉ +7
#python
#allinone, #asr, #audioprocessing, #machinetranslation, #nonautoregressive, #seamless, #simultaneoustranslation, #speech, #speechenhancement, #speechprocessing, #speechrecognition, #speechsynthesis, #speechtotext, #speechtranslation, #streamingaudio, #texttoaudio, #texttospeech, #translation, #tts, #voice
PeterH0323/Streamer-Sales
Streamer-Sales (Top Sales): a sales-anchor LLM 🛒🎁 that presents products in a way meant to stimulate users' purchase intent, based on the given product characteristics. 🚀⭐️ Includes a detailed description of the data generation process❗️ 📦 It also integrates LMDeploy-accelerated inference 🚀, retrieval-augmented generation (RAG) 📚, text-to-speech (TTS) 🔊, digital human generation, an agent that queries real-time information over the network 🌐, and ASR speech-to-text 🎙
Language: Python
Total stars: 470
Stars trend:
24 Jun 2024
11pm ▏ +1
25 Jun 2024
12am ▏ +1
1am ██ +16
2am ███▋ +29
3am ██▍ +19
4am █▏ +9
#python
#asr, #chat, #chatapplication, #chatbot, #chatgpt, #digitalhuman, #gpt, #internlmchat7b, #internlm2, #llm, #metahuman, #rag, #textgeneration, #tts
yeyupiaoling/MASR
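A PyTorch implementation of a streaming and non-streaming automatic speech recognition framework, compatible with both online and offline recognition. It currently supports the Conformer, Squeezeformer, and DeepSpeech2 models, along with multiple data augmentation methods.
Language: Python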
Total stars: 580
Stars trend:
3 Aug 2024
2pm ███████████▎ +90
3pm +0
4pm +0
5pm +0
6pm +0
7pm +0
8pm +0
9pm +0
10pm +0
11pm +0
4 Aug 2024
12am +0
1am ▏ +1
#python
#asr, #conformer, #deeplearning, #deepspeech, #pytorch, #speech, #speechrecognition, #speechtotext, #squeezeformer
m-bain/whisperX
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
Language: Python
Total stars: 10840
Stars trend:
3 Sep 2024
9am ▏ +1
10am +0
11am +0
12pm ▏ +1
1pm ▏ +1
2pm ▌ +4
3pm ▋ +5
4pm ▌ +4
5pm █▎ +10
6pm ██▏ +17
7pm ██▎ +18
8pm ███▏ +25
#python
#asr, #speech, #speechrecognition, #speechtotext, #whisper
NexaAI/nexa-sdk
Nexa SDK is a comprehensive toolkit for running ONNX and GGML models. It supports text generation, image generation, vision-language models (VLM), automatic speech recognition (ASR), and text-to-speech (TTS).
Language: Python
Total stars: 1085
Stars trend:
22 Sep 2024
10pm █ +8
11pm ▊ +6
23 Sep 2024
12am ▍ +3
1am ▊ +6
2am █▎ +10
3am ▋ +5
4am █ +8
5am ▌ +4
6am █▏ +9
7am ▌ +4
8am ▌ +4
9am █▏ +9
#python
#asr, #edgecomputing, #languagemodel, #llm, #ondeviceai, #ondeviceml, #sdk, #sdkpython, #stablediffusion, #transformers, #tts, #vlm, #whisper
TEN-framework/TEN-Agent
TEN Agent is the world’s first real-time multimodal agent integrated with the OpenAI Realtime API, RTC, and features weather checks, web search, vision, and RAG capabilities.
Language: Python
Total stars: 1252
Stars trend:
25 Oct 2024
10pm ▏ +1
11pm ▏ +1
26 Oct 2024
12am ▎ +2
1am ▊ +6
2am ██ +16
3am ██ +16
4am █▍ +11
5am ▊ +6
6am █▍ +11
7am ▏ +1
8am █▌ +12
#python
#agent, #ai, #asr, #cpp, #gemini, #golang, #gpt4, #gpt4o, #llm, #lowlatency, #multimodal, #nextjs14, #openai, #python, #rag, #realtime, #tts, #vision, #voiceassistant
abus-aikorea/voice-pro
Gradio WebUI for whisper, faster-whisper, whisper-timestamped. Supports YouTube Downloader, Vocal Remover, Transcription, Text-to-Speech, and Translation.
Language: Python
Total stars: 385
Stars trend:
9 Nov 2024
10pm ▏ +1
11pm ▌ +4
10 Nov 2024
12am ▎ +2
1am █▏ +9
2am ██▏ +17
3am █▎ +10
4am ▉ +7
5am ▊ +6
6am ▍ +3
7am ▌ +4
8am ▌ +4
9am █ +8
#python
#asr, #demucs, #fasterwhisper, #gradio, #speechrecognition, #speechsynthesis, #speechtotext, #stt, #subtitles, #texttospeech, #transcription, #translate, #translation, #translator, #tts, #uvr5, #webui, #whisper, #ytdlp
TEN-framework/TEN-Agent
TEN Agent is a conversational AI powered by TEN, integrating Gemini 2.0 Live, OpenAI Realtime, RTC, and more. It delivers real-time capabilities to see, hear, and speak, while being fully compatible with popular workflow platforms like Dify and Coze.
Language: Python
Total stars: 4121
Stars trend:
28 Jan 2025
10am ▎ +2
11am ▉ +7
12pm ▉ +7
1pm ▉ +7
2pm █▏ +9
3pm █▌ +12
4pm █▍ +11
5pm ▋ +5
6pm █▌ +12
7pm ▌ +4
#python
#agent, #ai, #asr, #cpp, #gemini, #golang, #gpt4, #gpt4o, #llm, #lowlatency, #multimodal, #nextjs14, #openai, #python, #rag, #realtime, #tts, #vision, #voiceassistant
umlx5h/LLPlayer
The media player for language learning, with dual subtitles, AI-generated subtitles, real-time translation, and more!
Language: C#
Total stars: 838
Stars trend:
12 Apr 2025
3am ▎ +2
4am ▎ +2
5am ▍ +3
6am ▏ +1
7am ▌ +4
8am ▏ +1
9am ▏ +1
10am ▍ +3
11am █▎ +10
12pm ████▏ +33
1pm ▍ +3
2pm █▋ +13
#csharp
#asr, #csharp, #fasterwhisper, #flyleaf, #languagelearning, #llm, #mediaplayer, #ocr, #ollama, #player, #video, #videoplayer, #whisper, #wpf, #ytdlp