Code Stars
1.88K subscribers
8.5K photos
8.79K links
Code Stars provides notifications about GitHub repositories that are gaining a significant number of stars in a short period of time. Be the first to find out about trending repositories that everybody will be talking about soon.
#AI #chatGPT #python
CatchTheTornado/pdf-extract-api
Document (PDF) extraction and parsing API using state-of-the-art OCRs and Ollama-supported models. Anonymize documents. Remove PII. Convert any document or picture to structured JSON or Markdown
Language:Python
Total stars: 250
Stars trend:
3 Nov 2024
2pm ▏ +1
3pm █▊ +14
4pm █▉ +15
5pm ▋ +5
6pm ▍ +3
7pm ▌ +4
8pm ▍ +3
9pm ▌ +4
10pm ▍ +3
11pm ▉ +7
4 Nov 2024
12am ▊ +6
1am █▉ +15

#python
#anonymization, #api, #extract, #json, #llm, #ocr, #ocrpython, #pdf, #pii
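The bars in each "Stars trend" chart seem to use a fixed scale. Each eighth-block character stands for one star and a full block (█) stands for eight. So +14 renders as █▊, and +0 renders with no bar at all. The Python sketch below reproduces that rendering. It assumes the 8-stars-per-block scale inferred from the charts in this feed, since the channel's own code is not shown. `star_bar` and `trend_line` are hypothetical names.

```python
# Partial-block characters for 0/8 .. 7/8 of a cell (assumed scale: 1 star per eighth).
EIGHTHS = ["", "▏", "▎", "▍", "▌", "▋", "▊", "▉"]
FULL = "█"


def star_bar(delta: int, stars_per_block: int = 8) -> str:
    """Render an hourly star delta as a bar of Unicode block characters.

    Assumption: one full block per `stars_per_block` stars (8 by default),
    so with the default each eighth-block character represents one star.
    """
    if delta < 0:
        raise ValueError("delta must be non-negative")
    eighths = delta * 8 // stars_per_block  # total eighth-cells to draw
    full, rest = divmod(eighths, 8)         # whole blocks plus remainder
    return FULL * full + EIGHTHS[rest]


def trend_line(hour: str, delta: int) -> str:
    """Format one chart row such as '3pm █▊ +14' (no bar when delta is 0)."""
    bar = star_bar(delta)
    return f"{hour} {bar} +{delta}" if bar else f"{hour} +{delta}"


print(trend_line("3pm", 14))  # 3pm █▊ +14
print(trend_line("5am", 0))   # 5am +0
```

Under this assumed scale, the largest spike in the feed (+86 in one hour) comes out as ten full blocks followed by ▊.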
getomni-ai/zerox
PDF to Markdown with vision models
Language:Python
Total stars: 8139
Stars trend:
16 Jan 2025
3am ▍ +3
4am ▊ +6
5am +0
6am ▌ +4
7am ▉ +7
8am ▌ +4
9am ▌ +4
10am ▌ +4
11am █▍ +11
12pm █▏ +9
1pm █▎ +10
2pm ██▉ +23

#python
#ocr, #pdf
siyuan-note/siyuan
A privacy-first, self-hosted, fully open-source personal knowledge management software, written in TypeScript and Go.
Language:TypeScript
Total stars: 26661
Stars trend:
17 Jan 2025
7pm ▏ +1
8pm +0
9pm +0
10pm +0
11pm ▏ +1
18 Jan 2025
12am ███ +24
1am ███▍ +27
2am ███▏ +25
3am ███▍ +27
4am ██▉ +23

#typescript
#anki, #chatgpt, #electron, #evernote, #knowledgebase, #localfirst, #markdown, #notetaking, #notebook, #notesapp, #notion, #obsidian, #ocr, #openai, #pdf, #pkm, #s3, #selfhosted, #webdav
codexu/note-gen
A cross-platform AI note-taking app focused on recording and writing.
Language:TypeScript
Total stars: 265
Stars trend:
19 Jan 2025
9am ▎ +2
10am █▌ +12
11am ▎ +2
12pm █ +8
1pm █▉ +15
2pm █▊ +14
3pm █▋ +13
4pm ▋ +5
5pm ▍ +3
6pm ▏ +1

#typescript
#ai, #app, #chatgpt, #markdown, #nextjs, #notes, #ocr, #openai, #rust, #shadcnui, #tailwindcss, #tauri
yobix-ai/extractous
Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.
Language:Rust
Total stars: 777
Stars trend:
29 Jan 2025
10pm █▏ +9
11pm ▌ +4
30 Jan 2025
12am █▎ +10
1am ▋ +5
2am █▏ +9
3am ▊ +6
4am ▉ +7
5am █ +8
6am ▉ +7
7am █ +8
8am ▋ +5
9am █ +8

#rust
#datapipelines, #docx, #etl, #etlpipelines, #extraction, #llm, #machinelearning, #naturallanguageprocessing, #nlp, #ocr, #pdf, #pdfparser, #rag, #rust, #tika, #unstructured, #unstructureddata
paperless-ngx/paperless-ngx
A community-supported supercharged version of paperless: scan, index and archive all your physical documents
Language:Python
Total stars: 24407
Stars trend:
31 Jan 2025
6am ▎ +2
7am ▎ +2
8am ▎ +2
9am ▎ +2
10am +0
11am ▉ +7
12pm █▏ +9
1pm █▉ +15
2pm █▋ +13
3pm █▉ +15
4pm █▌ +12
5pm █▋ +13

#python
#angular, #archiving, #django, #dms, #documentmanagement, #documentmanagementsystem, #machinelearning, #ocr, #opticalcharacterrecognition, #pdf
ocrmypdf/OCRmyPDF
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
Language:Python
Total stars: 14952
Stars trend:
2 Feb 2025
6am ▏ +1
7am +0
8am ▏ +1
9am ▏ +1
10am ▍ +3
11am ▎ +2
12pm ▍ +3
1pm █▎ +10
2pm ███▎ +26
3pm █▌ +12
4pm █▋ +13
5pm █▉ +15

#python
#imageprocessing, #ocr, #pdf, #python, #tesseract
Goldziher/kreuzberg
A text extraction library supporting PDFs, images, office documents and more
Language:Python
Total stars: 304
Stars trend:
15 Feb 2025
12am █ +8
1am ▋ +5
2am █ +8
3am ▊ +6
4am ▉ +7
5am ▉ +7
6am ▊ +6
7am ▎ +2
8am █ +8
9am █ +8
10am █▋ +13

#python
#asyncio, #docx, #ocr, #pdf, #textextraction
CatchTheTornado/text-extract-api
Document (PDF, Word, PPTX ...) extraction and parsing API using state-of-the-art OCRs and Ollama-supported models. Anonymize documents. Remove PII. Convert any document or picture to structured JSON or Markdown
Language:Python
Total stars: 2248
Stars trend:
15 Feb 2025
6am ▉ +7
7am █▎ +10
8am ▌ +4
9am ▉ +7
10am ▉ +7
11am ▍ +3
12pm ▊ +6
1pm ▋ +5
2pm █ +8
3pm █▎ +10
4pm █ +8
5pm ▍ +3

#python
#anonymization, #api, #extract, #json, #llm, #ocr, #ocrpython, #pdf, #pii
opendatalab/MinerU
A one-stop, open-source, high-quality data extraction tool for converting PDF to Markdown and JSON.
Language:Python
Total stars: 27107
Stars trend:
3 Mar 2025
3am █▋ +13
4am ▋ +5
5am ▉ +7
6am █▍ +11
7am █▏ +9
8am ▊ +6
9am ▉ +7
10am █ +8
11am ▊ +6
12pm ▊ +6
1pm █ +8
2pm ▉ +7

#python
#ai4science, #documentanalysis, #extractdata, #layoutanalysis, #ocr, #parser, #pdf, #pdfconverter, #pdfextractorllm, #pdfextractorpretrain, #pdfextractorrag, #pdfparser, #python
oomol-lab/pdf-craft
PDF craft can convert PDF files into various other formats. This project will focus on processing PDF files of scanned books. The project has just started.
Language:Python
Total stars: 1537
Stars trend:
10 Apr 2025
4pm ▏ +1
5pm +0
6pm +0
7pm +0
8pm +0
9pm +0
10pm ▏ +1
11pm ▏ +1
11 Apr 2025
12am ████▊ +38
1am ██████████▊ +86
2am ████████ +64

#python
#ai, #document, #ocr, #pdf
umlx5h/LLPlayer
The media player for language learning, with dual subtitles, AI-generated subtitles, real-time translation, and more!
Language:C#
Total stars: 838
Stars trend:
12 Apr 2025
3am ▎ +2
4am ▎ +2
5am ▍ +3
6am ▏ +1
7am ▌ +4
8am ▏ +1
9am ▏ +1
10am ▍ +3
11am █▎ +10
12pm ████▏ +33
1pm ▍ +3
2pm █▋ +13

#csharp
#asr, #csharp, #fasterwhisper, #flyleaf, #languagelearning, #llm, #mediaplayer, #ocr, #ollama, #player, #video, #videoplayer, #whisper, #wpf, #ytdlp
kotaro-kinoshita/yomitoku
Yomitoku is an AI-powered document image analysis package designed specifically for the Japanese language.
Language:Python
Total stars: 697
Stars trend:
20 Apr 2025
10am ▊ +6
11am █▎ +10
12pm █▍ +11
1pm █▉ +15
2pm █▊ +14
3pm ▌ +4
4pm █ +8
5pm ▌ +4
6pm +0
7pm +0
8pm ▍ +3
9pm ▌ +4

#python
#deeplearning, #layoutanalysis, #ocr, #python, #pytorch
hiroi-sora/Umi-OCR
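Free, open-source, offline OCR software. Supports screenshot OCR, batch image import, PDF document recognition, exclusion of watermarks, headers and footers, and QR code scanning and generation. Includes built-in libraries for multiple languages.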
Language:Python
Total stars: 33184
Stars trend:
7 May 2025
4am ▎ +2
5am ▌ +4
6am ▎ +2
7am ▍ +3
8am ▉ +7
9am █▎ +10
10am ▊ +6
11am █▎ +10
12pm ▊ +6
1pm ▉ +7
2pm █▏ +9
3pm █▏ +9

#python
#ocr, #ocrpython, #paddleocr, #qml, #qt, #screenshot, #umiocr
clawsoftware/clawPDF
Open Source Virtual (Network) Printer for Windows that allows you to create PDFs, OCR text, and print images, with advanced features usually available only in enterprise solutions.
Language:C#
Total stars: 1043
Stars trend:
19 May 2025
12pm ▍ +3
1pm █████▌ +44
2pm ███████▎ +58
3pm ██████▌ +52
4pm ██▋ +21

#csharp
#imageprocessing, #merge, #networkprinter, #ocr, #pdf, #pdfmerger, #pdfprinter, #print, #printer, #terminalserver, #windows