GitHub Trends
10.1K subscribers
15.3K links
See what the GitHub community is most excited about today.

A bot automatically fetches new repositories from https://github.com/trending and sends them to the channel.

Author and maintainer: https://github.com/katursis
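The posts in this channel follow a visible pattern: a language hashtag, the repository topics as hashtags, a summary, and the repository URL. The sketch below shows one way such a bot could pick out unseen repositories and format a post. It is an illustration only, not the maintainer's code: the function names, the alias table, and the choice to sort topics are all assumptions inferred from how the posts look.

```python
import re

# Assumed aliases for language names that would not survive hashtag cleanup.
LANGUAGE_ALIASES = {"c#": "csharp", "c++": "cpp"}


def to_hashtag(name: str) -> str:
    """Lowercase a language or topic name and make it hashtag-safe."""
    key = name.strip().lower()
    key = LANGUAGE_ALIASES.get(key, key)
    return "#" + re.sub(r"[^0-9a-z]+", "_", key).strip("_")


def format_post(language: str, topics: list[str], summary: str, url: str) -> str:
    """Build a post: hashtags, blank line, summary, blank line, URL."""
    tags = [to_hashtag(language)] + [to_hashtag(t) for t in sorted(topics)]
    return " ".join(tags) + "\n\n" + summary + "\n\n" + url


def select_new(trending: list[str], seen: set[str]) -> list[str]:
    """Return repositories that were not posted before and remember them."""
    fresh = [repo for repo in trending if repo not in seen]
    seen.update(fresh)
    return fresh
```

Because the language is always emitted first and the topics follow, a repository that also lists its language as a topic gets the same hashtag twice, which matches what the posts below show.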
#python #ai4science #document_analysis #extract_data #layout_analysis #ocr #parser #pdf #pdf_converter #pdf_extractor_llm #pdf_extractor_pretrain #pdf_extractor_rag #pdf_parser #python

MinerU is a tool that converts PDFs into machine-readable formats like Markdown or JSON. Here are the key benefits and features:
- **Clean Text**: Removes headers, footers, and other unnecessary elements so the text is semantically coherent and in human-readable order, even for complex layouts.
- **Structure Preservation**: Extracts images, image descriptions, tables, and table titles.
- **Formula and Table Conversion**: Recognizes formulas and tables and converts them to LaTeX or HTML format.
- **OCR and Output**: Applies OCR to scanned PDFs, and supports multiple output formats and various visualization results.
- **GPU and CPU Compatibility**: Works in both CPU and GPU environments, and is compatible with Windows, Linux, and Mac.

You can try MinerU through an online demo, a quick CPU demo, or by using a GPU for faster processing. For detailed usage, refer to the command line options, API integration, and deployment guides provided.

https://github.com/opendatalab/MinerU
#ruby #daisyui #document_signing #documents #e_signature #github_catalyst #hotwired_turbo #legaltech #open_source #pdf #pdf_sign #pdf_signature #ruby_on_rails #self_hosted #tailwindcss #vuejs #webpack

DocuSeal is a free and open-source platform that helps you fill and sign documents online easily. You can create PDF forms with various field types like signatures, dates, and checkboxes, and these forms can be filled and signed on any device. It offers features like automated emails, multiple language support, and integration with cloud storage services. The platform is mobile-optimized and has tools for user management and API integrations. This makes it convenient for businesses to integrate document signing into their apps, reducing costs and ensuring security and compliance. You can try it out with a live demo or deploy it quickly using various hosting options.

https://github.com/docusealco/docuseal
#python #docx #llm #parser #pdf #powerpoint

MegaParse is a powerful tool that helps you parse different types of documents like text, PDFs, PowerPoint presentations, and Word documents without losing any information. It is fast, efficient, and supports many file formats. You can use it for free since it is open source. To use MegaParse, you just need to install it with a simple command and set up some additional tools depending on your needs. This tool benefits you by making it easy to extract data from various documents quickly and accurately, saving you time and effort.

https://github.com/QuivrHQ/MegaParse
#lua #cbz #djvu #djvu_reflow #ebook #ebook_reader #eink #epub #ereader #fb2 #kindle #kobo #luajit #opds #pdf #pdf_reflow #pocketbook #reader #reflow #remarkable_tablet #ubuntu_touch

KOReader is a powerful document viewer designed for e-ink readers and other devices. It supports many file formats like PDF, EPUB, and more, and allows you to customize the reading experience with adjustable margins, line spacing, and fonts. It's fast, even on older devices, and integrates with tools like calibre and Google Translate. KOReader is also optimized for e-ink devices with features like easy zoom and no animations. This makes reading comfortable and efficient, giving you a better experience overall.

https://github.com/koreader/koreader
#javascript #book #cb7 #cbr #cbt #cbz #comic #docx #ebook #epub #fb2 #html #markdown #mobi #pdf #reader #rtf #txt #xml

Koodo Reader is a powerful ebook reader that works on many platforms like Windows, macOS, Linux, and even the web. It supports many file formats such as EPUB, PDF, MOBI, and more. You can customize how your books look by changing font size, color, and background. It also has features like text-to-speech, translation, and night mode. You can save your books to cloud services like OneDrive, Google Drive, and Dropbox, making it easy to access your books on different devices. This makes reading convenient and enjoyable anywhere you go.

https://github.com/koodo-reader/koodo-reader
#python #chinese #english #japanese #korean #latex #openai #pdf #pdf2zh #russian #translation

PDFMathTranslate helps you translate scientific PDF papers while keeping formulas, charts, and other important parts intact. You can use it in several ways: from the command line, through an interactive user interface, or with Docker. It supports multiple languages and various translation services like Google, DeepL, and more. You can even try it online without installing anything. This makes it easier to understand and work with documents in different languages, saving you time and effort.

https://github.com/Byaidu/PDFMathTranslate
#typescript #digital_signature #document_signing #docusign_alternative #e_signature #esign #esignature #next_auth #nextjs #open_source #pades_standard #pdf #pdf_sign #pdf_signature #postgresql #prisma #self_hosted #signing #typescript

Documenso is an open-source alternative to DocuSign, allowing you to sign documents digitally in a secure and transparent way. You can self-host it, which means you have full control over how it works and can review the code. This builds trust because you aren't relying on a third-party provider. Joining the community helps in creating a more open and trustworthy signing tool. You can test it locally, provide feedback, and even contribute to its development. This gives you flexibility and control over your document signing process.

https://github.com/documenso/documenso
#python #image_processing #ocr #pdf #python #tesseract

OCRmyPDF is a tool that makes scanned PDF files searchable. It adds a text layer to the PDF, so you can search for words or copy and paste text from the document. It supports many languages, fixes misrotated or crooked pages, and optimizes the file size. The tool works on various operating systems like Linux, Windows, and macOS, and it uses multiple CPU cores to speed up the process. This makes it easier to work with scanned documents and keeps your files organized and searchable.

https://github.com/ocrmypdf/OCRmyPDF
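As a minimal sketch of the features described above, the helper below assembles an OCRmyPDF command line in Python. The flags (`--language`, `--rotate-pages`, `--deskew`, `--optimize`, `--jobs`) are OCRmyPDF command-line options; the helper name `build_ocrmypdf_command` and the file names are hypothetical, and actually running the command assumes OCRmyPDF and Tesseract are installed.

```python
import subprocess


def build_ocrmypdf_command(src, dst, languages=("eng",), jobs=None):
    """Assemble an ocrmypdf invocation as an argument list."""
    cmd = [
        "ocrmypdf",
        "--language", "+".join(languages),  # e.g. "eng+deu" for two languages
        "--rotate-pages",                   # fix misrotated pages
        "--deskew",                         # straighten crooked pages
        "--optimize", "1",                  # lossless file size optimization
    ]
    if jobs is not None:
        cmd += ["--jobs", str(jobs)]        # number of CPU cores to use
    cmd += [src, dst]
    return cmd


def run_ocr(src, dst, **options):
    """Run the command; raises CalledProcessError if ocrmypdf fails."""
    subprocess.run(build_ocrmypdf_command(src, dst, **options), check=True)
```

A call such as `run_ocr("scan.pdf", "searchable.pdf", languages=("eng", "deu"), jobs=4)` would write a searchable copy next to the original scan.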
#kotlin #aes_256 #android #background_removal #clean_architecture #crop #djvu #edit_photo #exif #f_droid #filter_image #image_manipulation #jetpack_compose #jxl #kotlin #material_you #ocr_recognition #pdf #psd #qrcode_scanner #watermark

Image Toolbox is a powerful and versatile image editing tool that lets you do many things with your photos. You can crop, apply over 230 different filters, edit EXIF data, remove backgrounds, and even convert images to PDFs. It also allows you to add stickers and text, extract text from images in over 120 languages, and encrypt files with AES-256 encryption. You can resize images using various scaling algorithms, convert between multiple image formats, and create collages. The app also supports GIF, WEBP, APNG, and JXL conversions, document scanning, QR code scanning and creation, and more. It has a simple interface but offers many advanced features, making it useful for both photographers and developers.

https://github.com/T8RIN/ImageToolbox
#rust #drawing #gtk #gtk_rs #gtk4 #gtk4_rs #hacktoberfest #handwriting #infinite_canvas #notes #notes_app #pdf #rust #wacom_tablet

Rnote is a free, open-source app for sketching and taking handwritten notes. It's great for students, teachers, and anyone with a drawing tablet. You can import and export PDFs, pictures, and various image formats. The app has an adaptive interface that works well on both big and small screens. It offers features like pressure-sensitive stylus input, customizable backgrounds, and the ability to work on multiple documents at once. You can also save your work in a native format and export it in several formats. Rnote is available for Linux, macOS, and Windows, making it a versatile tool for anyone who likes to draw or take notes digitally.

https://github.com/flxzt/rnote
#html #autogen #autogen_extension #langchain #markdown #microsoft_office #openai #pdf

MarkItDown is a tool that helps you convert many types of files into Markdown format. It supports converting files like PDF, PowerPoint, Word, Excel, images, audio, HTML, and more. You can install it using `pip install markitdown` or from the source code. The tool has a simple command-line interface and also works with Python scripts. It even supports plugins and integration with Azure Document Intelligence for advanced conversions. This makes it easy to analyze and index different types of files, which is very useful for organizing and working with various document formats.

https://github.com/microsoft/markitdown
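The Python usage mentioned above might look like the following minimal sketch. It assumes the `markitdown` package is installed and relies on its `MarkItDown().convert()` call, whose result exposes the Markdown as `text_content`. The helper names and the idea of writing the result next to the source file are illustrative assumptions.

```python
from pathlib import Path


def to_markdown(path: str) -> str:
    """Convert one file to Markdown text using MarkItDown."""
    # Imported lazily so this module loads even without the package installed.
    from markitdown import MarkItDown  # pip install markitdown

    converter = MarkItDown()
    result = converter.convert(path)
    return result.text_content


def markdown_path(path: str) -> Path:
    """Hypothetical helper: place the .md file next to the source file."""
    return Path(path).with_suffix(".md")


def convert_file(path: str) -> Path:
    """Convert a file and save the Markdown alongside it."""
    target = markdown_path(path)
    target.write_text(to_markdown(path), encoding="utf-8")
    return target
```

Calling `convert_file("report.docx")` would then produce `report.md` in the same folder, which is convenient when indexing a directory of mixed documents.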
#java #docx #fileview #fileviewer #java #kkfileview #office #office_view #pdf #word

kkFileView is a tool that helps you preview many types of files online, like documents (docx, pptx), images, videos, and more. It uses Spring Boot, making it easy to set up and use. This project supports a wide range of file formats and allows you to extend its capabilities by adding new file types. The benefit is that you can easily view files without needing to download them first, which saves time and space on your device.

https://github.com/kekingcn/kkFileView
#tex #awesome #coverletter #cv #latex #latex_template #overleaf #pdf #resume #sharelatex #tex

Awesome CV is a LaTeX template designed for creating professional CVs, résumés, and cover letters with clean, customizable formatting, offering an easy way to produce polished job application documents that stand out. It includes pre-designed layouts, supports PDF generation, and works seamlessly with tools like Overleaf or Docker, saving time while ensuring a visually consistent and industry-standard presentation.

https://github.com/posquit0/Awesome-CV
#kotlin #compiler #markdown #markdown_parser #markup_language #paper #pdf #presentations #programming_language #scripting_language #slides #typesetting #typesetting_system

Quarkdown is a powerful tool that helps you write and format documents using Markdown. It allows you to create complex content with functions and variables, making it more versatile than regular Markdown. You can easily compile your work into print-ready books or interactive presentations. Quarkdown supports exporting to HTML and PDF, and it includes features like live preview, which helps you see changes as you make them. This makes it easier to ensure your document looks exactly how you want it to.

https://github.com/iamgio/quarkdown
#typescript #alternative #converter #data_manipulation #developer_tools #devtools #frontend #good_first_issue #image_manipulation #image_processing #javascript #pdf_manipulation #productivity #react #self_hosted #swissarmyknife #tools #typescript #video_manipulation #webapp #website

OmniTools is a self-hosted web app that helps with many tasks like image and video editing, number crunching, and more. It offers tools for resizing images, converting videos, calculating dates, and generating prime numbers. You can run it on your own computer using Docker, which means your data stays local. This app is open-source and free, allowing you to contribute new features or tools easily. Using OmniTools simplifies many everyday tasks and keeps your data private.

https://github.com/iib0011/omni-tools
#typescript #anki #chatgpt #deepseek #electron #evernote #knowledge_base #local_first #markdown #note_taking #notes_app #notion #obsidian #ocr #ollama #openai #pdf #s3 #self_hosted #webdav

SiYuan is a privacy-first personal knowledge management tool. It allows you to organize your thoughts and notes in a secure way, even offline. You can use features like block-level references, Markdown editing, and mathematical formulas. It also supports AI tools and has apps for Android, iOS, and HarmonyOS. SiYuan is open source and free for most features, making it a great choice for managing your personal knowledge securely.

https://github.com/siyuan-note/siyuan
#typescript #desktop #docx #electron #html #languages #libreoffice #linux #macos #markdown #nodejs #office #offline #pandoc #pdf #productivity #windows #zettlr

Zettlr is a free, open-source app that helps you write, organize, and publish your notes and documents using simple Markdown files. It works on Windows, macOS, and Linux, and lets you manage your notes with features like workspaces, tags, and powerful search, so you can quickly find what you need. Zettlr supports easy citations with reference managers like Zotero, and offers code highlighting, dark mode, and flexible export options to PDF, Word, or LaTeX. This makes it ideal for students, researchers, and writers who want a privacy-focused, distraction-free way to work with their ideas and publish their work. The benefit is that you can focus on your content, not formatting, and easily turn your notes into professional documents.

https://github.com/Zettlr/Zettlr
#typescript #bun #conversion #convert #converter #document_conversion #elysia #file_conversion #file_converter #hacktoberfest #pdf_converter #self_hosted #tailwindcss #typescript

ConvertX is a self-hosted online file converter that supports over a thousand file formats, including images, videos, documents, e-books, and 3D assets. It lets you convert multiple files at once, offers password protection, and supports multiple user accounts for privacy. You can run it easily using Docker, making it simple to set up on your own server. This means your files stay private since conversions happen locally without sending data to external servers. It uses powerful open-source tools like FFmpeg and ImageMagick, giving you a versatile and secure way to handle all your file conversion needs in one place.

https://github.com/C4illin/ConvertX
#python #document_analysis #layout_analysis #ocr #parser #pdf #pdf_converter #pdf_parser #python #vlm_ocr

Dolphin is an AI tool that can analyze and understand complex document images, like pages with text, tables, formulas, and pictures. It works in two steps: first, it figures out the layout and reading order of the page; then, it parses each element using prompts specific to that element type. This makes it fast and accurate for turning document images into structured data like JSON or Markdown. You can use pre-trained models and easy code to process single pages, PDFs, or specific elements. This saves you time and effort when extracting information from complicated documents.

https://github.com/bytedance/Dolphin
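The two-step flow described above can be sketched in plain Python. This is not Dolphin's API: the `Element` type, the prompt strings, and the two callables `analyze_layout` and `parse_element` are hypothetical stand-ins for the model calls, used only to show how a layout pass followed by per-element parsing fits together.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable


@dataclass
class Element:
    kind: str    # e.g. "text", "table", "formula", "figure"
    order: int   # position in the reading order
    region: Any  # crop or bounding box, opaque in this sketch


# Hypothetical prompts, one per element type that gets parsed.
PROMPTS = {
    "text": "Read the text in this region.",
    "table": "Parse this table.",
    "formula": "Transcribe this formula as LaTeX.",
}


def parse_page(
    page: Any,
    analyze_layout: Callable[[Any], Iterable[Element]],
    parse_element: Callable[[Any, str], str],
) -> list[dict]:
    """Stage 1: find elements and reading order. Stage 2: parse each one."""
    elements = sorted(analyze_layout(page), key=lambda e: e.order)
    results = []
    for element in elements:
        prompt = PROMPTS.get(element.kind)
        # Elements without a prompt (e.g. figures) are kept but not parsed.
        content = parse_element(element.region, prompt) if prompt else None
        results.append(
            {"type": element.kind, "order": element.order, "content": content}
        )
    return results
```

The returned list of dictionaries can be passed to `json.dumps` to get structured output in reading order.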
#csharp #pdf #pdf_converter #pdf_document_processor #pdf_generation

PDFPatcher is a free and open-source tool that helps you manage PDF files. It allows you to edit PDF metadata, bookmarks, and page layouts. You can also merge, split, and rotate PDF pages. Additionally, it supports converting PDF pages to images and extracting specific pages. The software is free to use and does not have ads or privacy concerns. It encourages users to do a good deed after using it, which is part of its unique "良心授权" (conscience license) agreement. This tool is beneficial for users who need to manipulate PDFs without spending money on expensive software.

https://github.com/wmjordan/PDFPatcher