ML Research Hub
Advancing research in Machine Learning – practical insights, tools, and techniques for researchers.

Admin: @HusseinSheikho || @Hussein_Sheikho
MinerU: An Open-Source Solution for Precise Document Content Extraction

📝 Summary:
MinerU is an open-source tool for high-precision document content extraction. It combines fine-tuned models with pre- and postprocessing rules to deliver consistent results across diverse document types.
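
To make the "models plus postprocessing rules" idea in the summary concrete, here is a minimal Python sketch. It is not MinerU's API: `Block`, `postprocess`, and `to_markdown` are hypothetical names, the model stage is replaced by hand-written detections, and the two rules shown (dropping nested boxes, sorting into reading order) are assumed examples of the kind of rule such a pipeline might apply.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """One detected region on a page (hypothetical structure, not MinerU's)."""
    kind: str   # e.g. "title" or "text"
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def contains(outer: Block, inner: Block) -> bool:
    """True if inner's bounding box lies fully inside outer's."""
    return (outer.x0 <= inner.x0 and outer.y0 <= inner.y0
            and outer.x1 >= inner.x1 and outer.y1 >= inner.y1)


def postprocess(blocks: list[Block]) -> list[Block]:
    """Apply two illustrative rules to raw model detections."""
    kept = []
    for i, b in enumerate(blocks):
        # Rule 1: drop a block nested inside another one. For identical
        # boxes, only the first occurrence is kept.
        nested = any(
            j != i and contains(o, b) and (not contains(b, o) or j < i)
            for j, o in enumerate(blocks)
        )
        if not nested:
            kept.append(b)
    # Rule 2: naive single-column reading order, top to bottom then left to right.
    return sorted(kept, key=lambda b: (b.y0, b.x0))


def to_markdown(blocks: list[Block]) -> str:
    """Render ordered blocks as Markdown; titles become headings."""
    parts = [f"# {b.text}" if b.kind == "title" else b.text for b in blocks]
    return "\n\n".join(parts)


# Stand-in for model output: out of order, with one nested duplicate region.
detected = [
    Block("text", "Body paragraph.", 50, 120, 500, 300),
    Block("title", "MinerU", 50, 40, 500, 80),
    Block("text", "paragraph", 60, 130, 200, 150),
]

print(to_markdown(postprocess(detected)))
```

The nested fragment is discarded and the title is moved ahead of the body text. A real pipeline would need more than this, for example multi-column layouts, tables, and formulas.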

🔹 Publication Date: Sep 27, 2024

🔹 Paper Links:
• arXiv (PDF): https://arxiv.org/pdf/2409.18839
• PDF reader Space: https://huggingface.co/spaces/Echo9k/PDF_reader
• GitHub: https://github.com/opendatalab/MinerU

Spaces citing this paper:
https://huggingface.co/spaces/opendatalab/MinerU
https://huggingface.co/spaces/xiaoye-winters/MinerU-API
https://huggingface.co/spaces/ApeAITW/MinerU_2.5_Test

#DocumentExtraction #OpenSource #DataScience #NLP #AI