✨DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI
📝 Summary:
DataFlow is an LLM-driven framework for unified, high-quality data preparation. It generates data pipelines automatically from natural-language descriptions, and the authors report that the resulting data improves LLM performance on math, code, and text tasks. DataFlow is designed to make data preparation reproducible and to provide a scalable foundation for data-centric AI.
🔹 Publication Date: Dec 18
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.16676
• PDF: https://arxiv.org/pdf/2512.16676
• Project Page: https://github.com/OpenDCAI/DataFlow
• GitHub: https://github.com/OpenDCAI/DataFlow
✨ Datasets citing this paper:
• https://huggingface.co/datasets/OpenDCAI/dataflow-demo-code
• https://huggingface.co/datasets/OpenDCAI/dataflow-demo-Text2SQL
• https://huggingface.co/datasets/OpenDCAI/dataflow-instruct-10k
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#LLM #DataPreparation #DataCentricAI #WorkflowAutomation #AIResearch