GitHub Trends

#cplusplus #artificial_intelligence #computer_vision #document #document_analysis #document_intelligence #document_recognition #document_understanding #documentai #end_to_end_ocr #multimodal #multimodal_deep_learning #ocr #scene_text_detection #scene_text_detection_recognition #scene_text_recognition #text_detection #text_recognition #vision_language #vision_language_model #vision_language_transformer

https://github.com/AlibabaResearch/AdvancedLiterateMachinery

GitHub - AlibabaResearch/AdvancedLiterateMachinery: A collection of original, innovative ideas and algorithms towards Advanced…

A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group. ...

1.59K views · 11:56

GitHub Trends

#python #ocr #ocr_python #paddleocr #qml #qt #screenshot #umi_ocr

Umi-OCR is free, open-source, offline OCR (Optical Character Recognition) software. Here are the key points:
- **Free**: The software is completely free to use, and all of its code is openly available.
- **Convenient**: It comes with efficient OCR engines and supports multiple languages.
- **Flexible**: It includes screenshot OCR, batch OCR, PDF recognition, QR code scanning and generation, and formula recognition.

This software is easy to use, supports various file formats, and lets you mark regions of an image to ignore, so unwanted text such as watermarks, headers, or footers is left out of the results. It also supports multiple languages and themes, making it highly customizable. Overall, Umi-OCR is a powerful tool for anyone who needs to extract text from images or documents efficiently.
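Umi-OCR's exact rule for ignore regions isn't spelled out in this post. As a minimal sketch, assuming a recognized text box is dropped when it lies entirely inside a user-drawn ignore rectangle, the filtering step could look like this (`Box` and `drop_ignored` are hypothetical names, not Umi-OCR code):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in pixel coordinates, (x0, y0) to (x1, y1)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, other: "Box") -> bool:
        """True if `other` lies entirely inside this box."""
        return (self.x0 <= other.x0 and self.y0 <= other.y0
                and other.x1 <= self.x1 and other.y1 <= self.y1)


def drop_ignored(results, ignore_regions):
    """Keep only (box, text) OCR results not fully inside any ignore region."""
    return [(box, text) for box, text in results
            if not any(region.contains(box) for region in ignore_regions)]


if __name__ == "__main__":
    page = [
        (Box(10, 5, 300, 25), "ACME Corp - Confidential"),  # running header
        (Box(10, 100, 500, 120), "Invoice total: 42.00"),
        (Box(10, 780, 200, 795), "Page 1 of 3"),            # footer
    ]
    # Ignore a band across the top and one across the bottom of an 800 px page.
    ignore = [Box(0, 0, 600, 40), Box(0, 760, 600, 800)]
    for _, text in drop_ignored(page, ignore):
        print(text)  # prints: Invoice total: 42.00
```

Full containment is the simplest assumption; a real tool might instead drop boxes that overlap a region by some fraction.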

https://github.com/hiroi-sora/Umi-OCR

GitHub - hiroi-sora/Umi-OCR: Open-source, free, offline OCR software. Supports screenshots and batch image import, PDF document recognition, excluding watermarks/headers/footers, and scanning/generating QR codes. Built-in multilingual language libraries.

408 views · 11:30

GitHub Trends

#python #chineseocr #crnn #db #ocr #ocrlite

PaddleOCR is a powerful Optical Character Recognition (OCR) toolkit that helps developers build and use advanced models. It provides cutting-edge algorithms and models for tasks such as text recognition, table recognition, and formula recognition. It offers low-code development through simple Python APIs and graphical interfaces, so developers can quickly integrate and customize models for different scenarios, including office automation, financial risk control, healthcare, education, and more. It also supports deployment on a range of hardware, such as NVIDIA GPUs, Kunlun chips, and others, which makes it efficient and versatile.

https://github.com/PaddlePaddle/PaddleOCR

GitHub - PaddlePaddle/PaddleOCR: Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit…

Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages. - PaddlePaddle/Paddl...

391 views · 16:00

GitHub Trends

#typescript #clipboard #color_picker #cross_platform #electron #image_editing #image_editor #live_text #ocr #paddleocr #screen_capture #screen_recorder #screenshot #search #search_photos

eSearch is a powerful tool that helps you capture, edit, and search content on your screen. It works on Windows, Linux, and macOS. With eSearch, you can take screenshots, recognize text using OCR (even offline), translate text, and search images. You can also record your screen, add annotations, and use various editing tools like cropping, blurring, and more.

The benefit to you is that eSearch makes it easy to manage and interact with the content on your screen in multiple ways, saving you time and effort. It's especially useful for tasks like capturing and translating text from images or videos, which can be very handy for work or study.

https://github.com/xushengfeng/eSearch

GitHub - xushengfeng/eSearch: Screenshot, offline OCR, search, translation, search by image, pin images to the screen, screen recording, omnidirectional scrolling screenshots, screen translation

430 views · 15:30

GitHub Trends

#python #ocr #pdf

Zerox OCR is a simple tool that uses AI to convert documents into Markdown. It works like this: you pass in a file, and Zerox OCR returns its content in Markdown, which you can easily read and reuse.

This tool saves time and effort by automating the process of extracting text from complex documents, making it easier to work with the content digitally.
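Since the project describes itself as document extraction "using vision models", a plausible shape for the pipeline is page-by-page: render each page to an image, ask a vision model for that page's Markdown, and join the answers. The sketch below shows only that control flow, assuming pages are already rendered; `document_to_markdown`, `describe_page`, and the prompt are hypothetical and are not the Zerox API.

```python
from typing import Callable, Iterable


def document_to_markdown(
    page_images: Iterable[bytes],
    describe_page: Callable[[bytes, str], str],
) -> str:
    """Convert pre-rendered page images into one Markdown string.

    `describe_page` stands in for a vision-model call: it receives one page
    image plus a prompt and returns that page's content as Markdown.
    """
    prompt = "Convert this page to Markdown. Return only the Markdown."
    pages = [describe_page(image, prompt).strip() for image in page_images]
    # Skip empty pages and separate the rest with a blank line so that
    # Markdown blocks from adjacent pages don't run together.
    return "\n\n".join(page for page in pages if page)


if __name__ == "__main__":
    # Fake "images" and a fake model, so the sketch runs without any API key.
    fake_pages = [b"page-1", b"page-2"]

    def fake_model(image: bytes, prompt: str) -> str:
        return f"# Section from {image.decode()}\n"

    print(document_to_markdown(fake_pages, fake_model))
```

Passing the model in as a callable keeps the sketch independent of any particular provider SDK.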

https://github.com/getomni-ai/zerox

GitHub - getomni-ai/zerox: OCR & Document Extraction using vision models

499 views · 22:30

GitHub Trends

#python #chineseocr #crnn #dbnet #easyocr #ocr #onnxocr #onnxruntime #openvino #paddleocr #rapidocr

RapidOCR is a free, open-source tool that quickly recognizes text from images. It is very fast, supports multiple languages like Chinese and English, and works on various platforms including Linux, Windows, and Mac. You can use it offline, which is convenient. The tool is easy to install and use, and it even allows you to customize it for specific needs. This makes it beneficial for users who need quick and accurate text recognition without relying on internet connectivity.

https://github.com/RapidAI/RapidOCR

GitHub - RapidAI/RapidOCR: 📄 Awesome OCR multiple programming languages toolkits based on ONNXRuntime, OpenVINO, PaddlePaddle and PyTorch.

360 views · 13:00

GitHub Trends

#python #ai4science #document_analysis #extract_data #layout_analysis #ocr #parser #pdf #pdf_converter #pdf_extractor_llm #pdf_extractor_pretrain #pdf_extractor_rag #pdf_parser

MinerU is a tool that converts PDFs into machine-readable formats such as Markdown and JSON. Here are the key benefits and features:
- **Clean Output**: MinerU removes headers, footers, and other unnecessary elements so the text stays semantically coherent, and it outputs text in human-readable order, even for complex layouts.
- **Structure Preservation**: It extracts images, image descriptions, tables, and table titles.
- **Formula and Table Conversion**: It recognizes formulas and converts them to LaTeX, and recognizes tables and converts them to HTML.
- **OCR Support**: It can apply OCR to scanned PDFs.
- **Flexible Output**: It supports multiple output formats and various visualization results.
- **GPU and CPU Compatibility**: It works in both CPU and GPU environments and is compatible with Windows, Linux, and Mac.

You can try MinerU through an online demo, a quick CPU demo, or by using a GPU for faster processing. For detailed usage, refer to the command line options, API integration, and deployment guides provided.
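MinerU's own header/footer removal is not described in this post and is certainly more sophisticated than a fixed threshold. Purely to illustrate what "dropping headers and footers" and "reading order" mean, here is a naive position-based sketch; `Block`, `strip_margins`, and the 6% margin band are hypothetical choices, and sorting by vertical position assumes a single-column page.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    text: str
    top: float     # distance of the block's top edge from the top of the page
    bottom: float  # distance of the block's bottom edge from the top of the page


def strip_margins(blocks: List[Block], page_height: float,
                  margin_ratio: float = 0.06) -> List[Block]:
    """Drop blocks lying wholly in the top or bottom margin band, then sort.

    A naive position threshold standing in for header/footer detection;
    sorting by `top` assumes a single-column page.
    """
    top_band = page_height * margin_ratio
    bottom_band = page_height * (1 - margin_ratio)
    kept = [b for b in blocks
            if not (b.bottom <= top_band or b.top >= bottom_band)]
    return sorted(kept, key=lambda b: b.top)


if __name__ == "__main__":
    page = [
        Block("Results", 300, 320),
        Block("Journal of Examples, Vol. 3", 20, 35),  # running header
        Block("Introduction", 100, 120),
        Block("7", 770, 785),                          # page number
    ]
    print([b.text for b in strip_margins(page, page_height=800)])
    # prints ['Introduction', 'Results']
```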

https://github.com/opendatalab/MinerU

GitHub - opendatalab/MinerU: Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agentic workflows.

458 views · 11:30

GitHub Trends

#javascript #deep_learning #ocr #tesseract #webassembly

Tesseract.js is a JavaScript library that extracts text from images in almost any language. It runs both in browsers and on servers with Node.js, and you can install it with a script tag, webpack, or npm. It lets you convert images into text quickly and accurately, with support for many languages and formats, which is useful for tasks like scanning documents or recognizing text in videos. The library has also become more efficient, with smaller files and lower memory usage, so it loads and runs faster.

https://github.com/naptha/tesseract.js

GitHub - naptha/tesseract.js: Pure Javascript OCR for more than 100 Languages 📖🎉🖥

552 views · 15:00

GitHub Trends

#python #image_processing #ocr #pdf #tesseract

OCRmyPDF is a tool that makes scanned PDF files searchable. It adds an OCR text layer to the PDF, so you can search for words or copy and paste text from the document. It supports many languages, fixes misrotated or skewed pages, and optimizes the file size. It works on Linux, Windows, and macOS, and it uses multiple CPU cores to speed up processing. This makes scanned documents much easier to work with and keeps your files searchable.
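For command-line use, a sketch like the following builds an OCRmyPDF invocation from Python and runs it only when the `ocrmypdf` executable is on the PATH. The flags shown (`-l`, `--rotate-pages`, `--deskew`, `--jobs`) are standard OCRmyPDF options matching the features above; `ocrmypdf_command` and the file names are hypothetical, so check `ocrmypdf --help` for your installed version.

```python
import os
import shlex
import shutil
import subprocess
from typing import List, Optional


def ocrmypdf_command(src: str, dst: str, language: str = "eng",
                     jobs: Optional[int] = None) -> List[str]:
    """Build an ocrmypdf argument list that adds a text layer to `src`."""
    cmd = ["ocrmypdf",
           "-l", language,    # Tesseract language code(s)
           "--rotate-pages",  # fix pages scanned at the wrong orientation
           "--deskew"]        # straighten crooked scans
    if jobs is not None:
        cmd += ["--jobs", str(jobs)]  # limit how many CPU cores are used
    return cmd + [src, dst]


if __name__ == "__main__":
    cmd = ocrmypdf_command("scan.pdf", "scan_searchable.pdf", jobs=4)
    print(shlex.join(cmd))
    # Run only when the CLI is installed and the hypothetical input exists.
    if shutil.which("ocrmypdf") and os.path.exists("scan.pdf"):
        subprocess.run(cmd, check=True)
```

Several languages can be combined with `+`, for example `language="eng+deu"`, provided the matching Tesseract language packs are installed.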

https://github.com/ocrmypdf/OCRmyPDF

GitHub - ocrmypdf/OCRmyPDF: OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched

523 views · 11:30

GitHub Trends

#kotlin #aes_256 #android #background_removal #clean_architecture #crop #djvu #edit_photo #exif #f_droid #filter_image #image_manipulation #jetpack_compose #jxl #material_you #ocr_recognition #pdf #psd #qrcode_scanner #watermark

Image Toolbox is a powerful and versatile image editing tool that lets you do many things with your photos. You can crop, apply over 230 different filters, edit EXIF data, remove backgrounds, and even convert images to PDFs. It also allows you to add stickers and text, extract text from images in over 120 languages, and encrypt files with AES-256 encryption. You can resize images using various scaling algorithms, convert between multiple image formats, and create collages. The app also supports GIF, WEBP, APNG, and JXL conversions, document scanning, QR code scanning and creation, and more. It has a simple interface but offers many advanced features, making it useful for both photographers and developers.

https://github.com/T8RIN/ImageToolbox

GitHub - T8RIN/ImageToolbox: 🖼️ Image Toolbox is a powerful app for advanced image manipulation. It offers dozens of features,…

🖼️ Image Toolbox is a powerful app for advanced image manipulation. It offers dozens of features, from basic tools like crop and draw to filters, OCR, and a wide range of image processing options -...

458 views · 11:30

GitHub Trends

#typescript #anki #chatgpt #deepseek #electron #evernote #knowledge_base #local_first #markdown #note_taking #notes_app #notion #obsidian #ocr #ollama #openai #pdf #s3 #self_hosted #webdav

SiYuan is a privacy-first personal knowledge management tool. It allows you to organize your thoughts and notes in a secure way, even offline. You can use features like block-level references, Markdown editing, and mathematical formulas. It also supports AI tools and has apps for Android, iOS, and HarmonyOS. SiYuan is open source and free for most features, making it a great choice for managing your personal knowledge securely.

https://github.com/siyuan-note/siyuan

GitHub - siyuan-note/siyuan: A privacy-first, self-hosted, fully open source personal knowledge management software, written in typescript and golang.

482 views · 00:00

GitHub Trends

#javascript #linux #macos #ocr #pot #pot_app #recognize #tauri #translate #translation #tts #windows

Pot is a cross-platform translation tool that lets you quickly translate text by selecting it and using a shortcut, typing text to translate, or using OCR to translate text from screenshots. It supports many translation engines like OpenAI, Google, DeepL, and more, plus offline options. You can also add plugins to extend its features and use it on Windows, macOS, and Linux. Pot offers an API for integration with other software and works well even on Wayland systems. This makes translating easier, faster, and more flexible, helping you understand and work with multiple languages efficiently.
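To illustrate the integration API, here is a minimal sketch of calling a locally running Pot instance over HTTP with only the standard library. The port (60828) and the `/translate` endpoint that takes the text as a POST body are assumptions to verify against Pot's documentation; `pot_translate_request` is a hypothetical helper name.

```python
import urllib.request

# Assumption: Pot's local HTTP server listens on port 60828 by default and
# accepts the text to translate as the body of a POST to /translate.
# Verify both against Pot's documentation and your own settings.
POT_BASE_URL = "http://127.0.0.1:60828"


def pot_translate_request(text: str,
                          endpoint: str = "/translate") -> urllib.request.Request:
    """Build (but do not send) a POST request carrying `text` to a local Pot."""
    return urllib.request.Request(
        POT_BASE_URL + endpoint,
        data=text.encode("utf-8"),
        method="POST",
    )


if __name__ == "__main__":
    request = pot_translate_request("Hello, world")
    try:
        # Only the HTTP status is printed; how Pot presents the result is up to the app.
        with urllib.request.urlopen(request, timeout=2) as response:
            print("Pot responded with HTTP", response.status)
    except OSError as exc:  # URLError and timeouts are both OSError subclasses
        print("Pot is not reachable:", exc)
```

Building the request separately from sending it keeps the endpoint assumption in one place and makes it easy to adjust.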

https://github.com/pot-app/pot-desktop

GitHub - pot-app/pot-desktop: 🌈 A cross-platform software for text translation and recognition.

327 views · 12:00

GitHub Trends

#python #document_analysis #layout_analysis #ocr #parser #pdf #pdf_converter #pdf_parser #vlm_ocr

Dolphin is an AI model that analyzes and parses complex document images, such as pages that mix text, tables, formulas, and figures. It works in two stages: first, it analyzes the page layout and lists the elements in natural reading order; then, it parses the elements in parallel, pairing each element with a type-specific prompt. This makes it fast and accurate at turning document images into structured data such as JSON or Markdown. Pre-trained models and example code let you process single pages, whole PDFs, or individual elements, which saves time when extracting information from complicated documents.
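The two-stage flow described above can be sketched as plain control flow. This is not Dolphin's code: the model calls are passed in as hypothetical `analyze` and `parse` callables, the prompts are made-up placeholders, and the thread pool merely stands in for the project's parallel element parsing.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Element:
    kind: str                                # "text", "table" or "formula"
    bbox: Tuple[float, float, float, float]  # (x0, y0, x1, y1) on the page
    order: int                               # position in reading order


# Hypothetical type-specific prompts (placeholders, not Dolphin's actual strings).
PROMPTS = {
    "text": "Transcribe this text region.",
    "table": "Convert this table to HTML.",
    "formula": "Convert this formula to LaTeX.",
}


def parse_page(
    page: bytes,
    analyze: Callable[[bytes], List[Element]],
    parse: Callable[[bytes, Element, str], str],
) -> List[dict]:
    """Stage 1: layout and reading order. Stage 2: parse each element."""
    elements = sorted(analyze(page), key=lambda e: e.order)
    # Elements are independent once located, so they can be parsed concurrently;
    # pool.map returns results in input order, i.e. reading order.
    with ThreadPoolExecutor() as pool:
        contents = list(pool.map(lambda e: parse(page, e, PROMPTS[e.kind]), elements))
    return [{"type": e.kind, "bbox": e.bbox, "content": c}
            for e, c in zip(elements, contents)]


if __name__ == "__main__":
    def fake_analyze(page: bytes) -> List[Element]:
        return [Element("table", (0, 300, 600, 500), 1),
                Element("text", (0, 0, 600, 280), 0)]

    def fake_parse(page: bytes, element: Element, prompt: str) -> str:
        return f"<{element.kind} parsed with: {prompt}>"

    for item in parse_page(b"page", fake_analyze, fake_parse):
        print(item["type"], item["content"])
```

Separating layout analysis from element parsing is what allows the second stage to run in parallel: each element only needs its own crop and prompt.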

https://github.com/bytedance/Dolphin

GitHub - bytedance/Dolphin: The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.

443 views · 19:30