PhotoDoodle: Learning Artistic Image Editing from Few-Shot Pairwise Data
20 Feb 2025 · Shijie Huang, Yiren Song, Yuxuan Zhang, Hailong Guo, Xueyin Wang, Mike Zheng Shou, Jiaming Liu ·
Paper: https://arxiv.org/pdf/2502.14397v1.pdf
Code: https://github.com/showlab/PhotoDoodle
20 Feb 2025 · Shijie Huang, Yiren Song, Yuxuan Zhang, Hailong Guo, Xueyin Wang, Mike Zheng Shou, Jiaming Liu ·
We introduce PhotoDoodle, a novel image editing framework designed to facilitate photo doodling by enabling artists to overlay decorative elements onto photographs. Photo doodling is challenging because the inserted elements must appear seamlessly integrated with the background, requiring realistic blending, perspective alignment, and contextual coherence. Additionally, the background must be preserved without distortion, and the artist's unique style must be captured efficiently from limited training data. These requirements are not addressed by previous methods that primarily focus on global style transfer or regional inpainting. The proposed method, PhotoDoodle, employs a two-stage training strategy. Initially, we train a general-purpose image editing model, OmniEditor, using large-scale data. Subsequently, we fine-tune this model with EditLoRA using a small, artist-curated dataset of before-and-after image pairs to capture distinct editing styles and techniques. To enhance consistency in the generated results, we introduce a positional encoding reuse mechanism. Additionally, we release a PhotoDoodle dataset featuring six high-quality styles. Extensive experiments demonstrate the advanced performance and robustness of our method in customized image editing, opening new possibilities for artistic creation.
Paper: https://arxiv.org/pdf/2502.14397v1.pdf
Code: https://github.com/showlab/PhotoDoodle
#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek #RAG #Agents
https://t.me/DataScienceT
👍5
Distill Any Depth: Distillation Creates a Stronger Monocular Depth Estimator
26 Feb 2025 · Xiankang He, Dongyan Guo, Hongji Li, Ruibo Li, Ying Cui, Chi Zhang ·
Paper: https://arxiv.org/pdf/2502.19204v1.pdf
Code: https://github.com/Westlake-AGI-Lab/Distill-Any-Depth
Datasets: ScanNet - NYUv2 - ETH3D
Note: Ranked #1 on Depth Estimation on ScanNetV2
26 Feb 2025 · Xiankang He, Dongyan Guo, Hongji Li, Ruibo Li, Ying Cui, Chi Zhang ·
Monocular depth estimation (#MDE) aims to predict scene depth from a single RGB image and plays a crucial role in 3D scene understanding. Recent advances in zero-shot MDE leverage normalized depth representations and distillation-based learning to improve generalization across diverse scenes. However, current depth normalization methods for distillation, relying on global normalization, can amplify noisy pseudo-labels, reducing distillation effectiveness. In this paper, we systematically analyze the impact of different depth normalization strategies on pseudo-label distillation. Based on our findings, we propose Cross-Context Distillation, which integrates global and local depth cues to enhance pseudo-label quality. Additionally, we introduce a multi-teacher distillation framework that leverages complementary strengths of different depth estimation models, leading to more robust and accurate depth predictions. Extensive experiments on benchmark datasets demonstrate that our approach significantly outperforms state-of-the-art methods, both quantitatively and qualitatively.
Paper: https://arxiv.org/pdf/2502.19204v1.pdf
Code: https://github.com/Westlake-AGI-Lab/Distill-Any-Depth
Datasets: ScanNet - NYUv2 - ETH3D
Note: Ranked #1 on Depth Estimation on ScanNetV2
#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek #RAG #Agents
https://t.me/DataScienceT
👍4
Escaping The Big Data Paradigm in Self-Supervised Representation Learning
25 Feb 2025 · Carlos Vélez García, Miguel Cazorla, Jorge Pomares ·
Paper: https://arxiv.org/pdf/2502.18056v1.pdf
Code: https://github.com/inescopresearch/scott
Datasets: Oxford 102 Flower - Oxford-IIIT Pets - Imagenet100
25 Feb 2025 · Carlos Vélez García, Miguel Cazorla, Jorge Pomares ·
The reliance on large-scale datasets and extensive computational resources has become a major barrier to advancing representation learning in vision, especially in data-scarce domains. In this paper, we address the critical question: Can we escape the big data paradigm in self-supervised representation learning from images? We introduce #SCOTT (Sparse Convolutional Tokenizer for Transformers), a shallow tokenization architecture that is compatible with Masked Image Modeling (MIM) tasks. SCOTT injects convolutional inductive biases into Vision Transformers (ViTs), enhancing their efficacy in small-scale data regimes. Alongside, we propose to train on a Joint-Embedding Predictive Architecture within a MIM framework (MIM-JEPA), operating in latent representation space to capture more semantic features. Our approach enables ViTs to be trained from scratch on datasets orders of magnitude smaller than traditionally required --without relying on massive external datasets for pretraining. We validate our method on three small-size, standard-resoultion, fine-grained datasets: Oxford Flowers-102, Oxford IIIT Pets-37, and ImageNet-100. Despite the challenges of limited data and high intra-class similarity, frozen SCOTT models pretrained with MIM-JEPA significantly outperform fully supervised methods and achieve competitive results with SOTA approaches that rely on large-scale pretraining, complex image augmentations and bigger model sizes. By demonstrating that robust off-the-shelf representations can be learned with limited data, compute, and model sizes, our work paves the way for computer applications in resource constrained environments such as medical imaging or robotics. Our findings challenge the prevailing notion that vast amounts of data are indispensable for effective representation learning in vision, offering a new pathway toward more accessible and inclusive advancements in the field.
Paper: https://arxiv.org/pdf/2502.18056v1.pdf
Code: https://github.com/inescopresearch/scott
Datasets: Oxford 102 Flower - Oxford-IIIT Pets - Imagenet100
#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek #RAG #Agents #GPT4
https://t.me/DataScienceT
👍3
A-MEM: Agentic Memory for LLM Agents
17 Feb 2025 · Wujiang Xu, Zujie Liang, Kai Mei, Hang Gao, Juntao Tan, Yongfeng Zhang ·
Paper: https://arxiv.org/pdf/2502.12110v3.pdf
Code: https://github.com/wujiangxu/agenticmemory
17 Feb 2025 · Wujiang Xu, Zujie Liang, Kai Mei, Hang Gao, Juntao Tan, Yongfeng Zhang ·
While large language model (LLM) agents can effectively use external tools for complex real-world tasks, they require memory systems to leverage historical experiences. Current memory systems enable basic storage and retrieval but lack sophisticated memory organization, despite recent attempts to incorporate graph databases. Moreover, these systems' fixed operations and structures limit their adaptability across diverse tasks. To address this limitation, this paper proposes a novel agentic memory system for LLM agents that can dynamically organize memories in an agentic way. Following the basic principles of the Zettelkasten method, we designed our memory system to create interconnected knowledge networks through dynamic indexing and linking. When a new memory is added, we generate a comprehensive note containing multiple structured attributes, including contextual descriptions, keywords, and tags. The system then analyzes historical memories to identify relevant connections, establishing links where meaningful similarities exist. Additionally, this process enables memory evolution - as new memories are integrated, they can trigger updates to the contextual representations and attributes of existing historical memories, allowing the memory network to continuously refine its understanding. Our approach combines the structured organization principles of Zettelkasten with the flexibility of agent-driven decision making, allowing for more adaptive and context-aware memory management. Empirical experiments on six foundation models show superior improvement against existing SOTA baselines. The source code for evaluating performance is available at https://github.com/WujiangXu/AgenticMemory, while the source code of agentic memory system is available at https://github.com/agiresearch/A-mem.
Paper: https://arxiv.org/pdf/2502.12110v3.pdf
Code: https://github.com/wujiangxu/agenticmemory
#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #RAG #Agents #GPT4
https://t.me/DataScienceT
👍3
Know You First and Be You Better: Modeling Human-Like User Simulators via Implicit Profiles
26 Feb 2025 · Kuang Wang, Xianfei Li, Shenghao Yang, Li Zhou, Feng Jiang, Haizhou Li ·
.
Paper: https://arxiv.org/pdf/2502.18968v1.pdf
Code: https://github.com/wangkevin02/USP
Dataset: LMSYS-USP
26 Feb 2025 · Kuang Wang, Xianfei Li, Shenghao Yang, Li Zhou, Feng Jiang, Haizhou Li ·
User simulators are crucial for replicating human interactions with dialogue systems, supporting both collaborative training and automatic evaluation, especially for large language models (LLMs). However, existing simulators often rely solely on text utterances, missing implicit user traits such as personality, speaking style, and goals. In contrast, persona-based methods lack generalizability, as they depend on predefined profiles of famous individuals or archetypes. To address these challenges, we propose User Simulator with implicit Profiles (#USP), a framework that infers implicit user profiles from human-machine conversations and uses them to generate more personalized and realistic dialogues. We first develop an LLM-driven extractor with a comprehensive profile schema. Then, we refine the simulation through conditional supervised fine-tuning and reinforcement learning with cycle consistency, optimizing it at both the utterance and conversation levels. Finally, we adopt a diverse profile sampler to capture the distribution of real-world user profiles. Experimental results demonstrate that USP outperforms strong baselines in terms of authenticity and diversity while achieving comparable performance in consistency. Furthermore, dynamic multi-turn evaluations based on USP strongly align with mainstream benchmarks, demonstrating its effectiveness in real-world applications
.
Paper: https://arxiv.org/pdf/2502.18968v1.pdf
Code: https://github.com/wangkevin02/USP
Dataset: LMSYS-USP
#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek #RAG #Agents #GPT4
https://t.me/DataScienceT
👍3
Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens
Paper: https://arxiv.org/pdf/2503.01710v1.pdf
Code: https://github.com/sparkaudio/spark-tts
3 Mar 2025 · Xinsheng Wang, Mingqi Jiang, Ziyang Ma, Ziyu Zhang, Songxiang Liu, Linqin Li, Zheng Liang, Qixi Zheng, Rui Wang, Xiaoqin Feng, Weizhen Bian, Zhen Ye, Sitong Cheng, Ruibin Yuan, Zhixian Zhao, Xinfa Zhu, Jiahao Pan, Liumeng Xue, Pengcheng Zhu, Yunlin Chen, Zhifei Li, Xie Chen, Lei Xie, Yike Guo, Wei Xue ·
Recent advancements in large language models (LLMs) have driven significant progress in zero-shot text-to-speech (TTS) synthesis. However, existing foundation models rely on multi-stage processing or complex architectures for predicting multiple codebooks, limiting efficiency and integration flexibility. To overcome these challenges, we introduce Spark-TTS, a novel system powered by BiCodec, a single-stream speech codec that decomposes speech into two complementary token types: low-bitrate semantic tokens for linguistic content and fixed-length global tokens for speaker attributes. This disentangled representation, combined with the Qwen2.5 LLM and a chain-of-thought (CoT) generation approach, enables both coarse-grained control (e.g., gender, speaking style) and fine-grained adjustments (e.g., precise pitch values, speaking rate). To facilitate research in controllable TTS, we introduce VoxBox, a meticulously curated 100,000-hour dataset with comprehensive attribute annotations. Extensive experiments demonstrate that Spark-TTS not only achieves state-of-the-art zero-shot voice cloning but also generates highly customizable voices that surpass the limitations of reference-based synthesis. Source code, pre-trained models, and audio samples are available at https://github.com/SparkAudio/Spark-TTS.
Paper: https://arxiv.org/pdf/2503.01710v1.pdf
Code: https://github.com/sparkaudio/spark-tts
#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek #RAG #Agents #GPT4
https://t.me/DataScienceT
👍6
Open Deep Search: Democratizing Search with Open-source Reasoning Agents
26 Mar 2025 · Salaheddin Alzubi, Creston Brooks, Purva Chiniya, Edoardo Contente, Chiara von Gerlach, Lucas Irwin, Yihan Jiang, Arda Kaz, Windsor Nguyen, Sewoong Oh, Himanshu Tyagi, Pramod Viswanath ·
Paper: https://arxiv.org/pdf/2503.20201v1.pdf
Code: https://github.com/sentient-agi/opendeepsearch
26 Mar 2025 · Salaheddin Alzubi, Creston Brooks, Purva Chiniya, Edoardo Contente, Chiara von Gerlach, Lucas Irwin, Yihan Jiang, Arda Kaz, Windsor Nguyen, Sewoong Oh, Himanshu Tyagi, Pramod Viswanath ·
We introduce Open Deep Search (ODS) to close the increasing gap between the proprietary search AI solutions, such as Perplexity's Sonar Reasoning Pro and OpenAI's GPT-4o Search Preview, and their open-source counterparts. The main innovation introduced in ODS is to augment the reasoning capabilities of the latest open-source LLMs with reasoning agents that can judiciously use web search tools to answer queries. Concretely, ODS consists of two components that work with a base LLM chosen by the user: Open Search Tool and Open Reasoning Agent. Open Reasoning Agent interprets the given task and completes it by orchestrating a sequence of actions that includes calling tools, one of which is the Open Search Tool. Open Search Tool is a novel web search tool that outperforms proprietary counterparts. Together with powerful open-source reasoning LLMs, such as DeepSeek-R1, ODS nearly matches and sometimes surpasses the existing state-of-the-art baselines on two benchmarks: SimpleQA and FRAMES. For example, on the FRAMES evaluation benchmark, ODS improves the best existing baseline of the recently released GPT-4o Search Preview by 9.7% in accuracy. ODS is a general framework for seamlessly augmenting any LLMs -- for example, DeepSeek-R1 that achieves 82.4% on SimpleQA and 30.1% on FRAMES -- with search and reasoning capabilities to achieve state-of-the-art performance: 88.3% on SimpleQA and 75.3% on FRAMES.
Paper: https://arxiv.org/pdf/2503.20201v1.pdf
Code: https://github.com/sentient-agi/opendeepsearch
#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek #RAG #Agents #GPT4
https://t.me/DataScienceT
👍4
TripoSG: High-Fidelity 3D Shape Synthesis using Large-Scale Rectified Flow Models
10 Feb 2025 · Yangguang Li, Zi-Xin Zou, Zexiang Liu, Dehu Wang, Yuan Liang, Zhipeng Yu, Xingchao Liu, Yuan-Chen Guo, Ding Liang, Wanli Ouyang, Yan-Pei Cao ·
Paper: https://arxiv.org/pdf/2502.06608v3.pdf
Codes:
https://github.com/VAST-AI-Research/TripoSG
https://github.com/tencent/flashvdm
Dataset: 100poisonMpts
10 Feb 2025 · Yangguang Li, Zi-Xin Zou, Zexiang Liu, Dehu Wang, Yuan Liang, Zhipeng Yu, Xingchao Liu, Yuan-Chen Guo, Ding Liang, Wanli Ouyang, Yan-Pei Cao ·
Recent advancements in diffusion techniques have propelled image and video generation to unprecedented levels of quality, significantly accelerating the deployment and application of generative AI. However, 3D shape generation technology has so far lagged behind, constrained by limitations in 3D data scale, complexity of 3D data processing, and insufficient exploration of advanced techniques in the 3D domain. Current approaches to 3D shape generation face substantial challenges in terms of output quality, generalization capability, and alignment with input conditions. We present TripoSG, a new streamlined shape diffusion paradigm capable of generating high-fidelity 3D meshes with precise correspondence to input images. Specifically, we propose: 1) A large-scale rectified flow transformer for 3D shape generation, achieving state-of-the-art fidelity through training on extensive, high-quality data. 2) A hybrid supervised training strategy combining SDF, normal, and eikonal losses for 3D VAE, achieving high-quality 3D reconstruction performance. 3) A data processing pipeline to generate 2 million high-quality 3D samples, highlighting the crucial rules for data quality and quantity in training 3D generative models. Through comprehensive experiments, we have validated the effectiveness of each component in our new framework. The seamless integration of these parts has enabled TripoSG to achieve state-of-the-art performance in 3D shape generation. The resulting 3D shapes exhibit enhanced detail due to high-resolution capabilities and demonstrate exceptional fidelity to input images. Moreover, TripoSG demonstrates improved versatility in generating 3D models from diverse image styles and contents, showcasing strong generalization capabilities. To foster progress and innovation in the field of 3D generation, we will make our model publicly available.
Paper: https://arxiv.org/pdf/2502.06608v3.pdf
Codes:
https://github.com/VAST-AI-Research/TripoSG
https://github.com/tencent/flashvdm
Dataset: 100poisonMpts
#DataScience #ArtificialIntelligence #MachineLearning #PythonProgramming #DeepLearning #LLM #AIResearch #BigData #NeuralNetworks #DataAnalytics #NLP #AutoML #DataVisualization #ScikitLearn #Pandas #NumPy #TensorFlow #AIethics #PredictiveModeling #GPUComputing #OpenSourceAI #DeepSeek #RAG #Agents #GPT4
https://t.me/DataScienceT
👍3
🤖🧠 Ling-1T by inclusionAI: The Future of Smarter, Faster and More Efficient AI Models
🗓️ 09 Oct 2025
📚 AI News & Trends
Artificial Intelligence is evolving at lightning speed and inclusionAI’s Ling-1T is one of the most exciting innovations leading the charge. Built on the advanced Ling 2.0 architecture, Ling-1T is a trillion-parameter model designed to combine incredible reasoning power, speed and scalability in one open-source system. Image Source : Hugging Face Unlike many AI models that ...
#Ling1T #inclusionAI #ArtificialIntelligence #OpenSourceAI #LargeLanguageModels #AIArchitecture
🗓️ 09 Oct 2025
📚 AI News & Trends
Artificial Intelligence is evolving at lightning speed and inclusionAI’s Ling-1T is one of the most exciting innovations leading the charge. Built on the advanced Ling 2.0 architecture, Ling-1T is a trillion-parameter model designed to combine incredible reasoning power, speed and scalability in one open-source system. Image Source : Hugging Face Unlike many AI models that ...
#Ling1T #inclusionAI #ArtificialIntelligence #OpenSourceAI #LargeLanguageModels #AIArchitecture
❤1
🤖🧠 Quivr AI: Building Your Second Brain with Open-Source Generative Intelligence
🗓️ 12 Oct 2025
📚 AI News & Trends
In the rapidly evolving landscape of artificial intelligence, developers and businesses are seeking solutions that merge flexibility, power, and simplicity. Enter Quivr — an open-source framework designed to help you build your own “second brain” powered by Generative AI. Whether you’re an indie developer, startup founder or enterprise engineer, it makes it possible to integrate ...
#QuivrAI #SecondBrain #GenerativeAI #OpenSourceAI #AIFramework #AIProductivity
🗓️ 12 Oct 2025
📚 AI News & Trends
In the rapidly evolving landscape of artificial intelligence, developers and businesses are seeking solutions that merge flexibility, power, and simplicity. Enter Quivr — an open-source framework designed to help you build your own “second brain” powered by Generative AI. Whether you’re an indie developer, startup founder or enterprise engineer, it makes it possible to integrate ...
#QuivrAI #SecondBrain #GenerativeAI #OpenSourceAI #AIFramework #AIProductivity
🤖🧠 HunyuanWorld-Mirror: Tencent’s Breakthrough in Universal 3D Reconstruction
🗓️ 03 Nov 2025
📚 AI News & Trends
The race toward achieving universal 3D understanding has reached a significant milestone with Tencent’s HunyuanWorld-Mirror, a cutting-edge open-source model designed to revolutionize 3D reconstruction. In an era dominated by visual intelligence and immersive digital experiences, this new model stands out by offering a feed-forward, geometry-aware framework that can predict multiple 3D outputs in a single ...
#HunyuanWorld #Tencent #3DReconstruction #UniversalAI #GeometryAware #OpenSourceAI
🗓️ 03 Nov 2025
📚 AI News & Trends
The race toward achieving universal 3D understanding has reached a significant milestone with Tencent’s HunyuanWorld-Mirror, a cutting-edge open-source model designed to revolutionize 3D reconstruction. In an era dominated by visual intelligence and immersive digital experiences, this new model stands out by offering a feed-forward, geometry-aware framework that can predict multiple 3D outputs in a single ...
#HunyuanWorld #Tencent #3DReconstruction #UniversalAI #GeometryAware #OpenSourceAI
❤1
✨WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement Learning
📝 Summary:
WebSailor is a post-training method that teaches open-source AI models to systematically reduce uncertainty in complex information-seeking tasks. Using synthetic high-uncertainty tasks and an RL algorithm, it enables open-source agents to match the performance of proprietary systems.
🔹 Publication Date: Published on Sep 16
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2509.13305
• PDF: https://arxiv.org/pdf/2509.13305
• Project Page: https://tongyi-agent.github.io/blog/
• Github: https://tongyi-agent.github.io/blog/
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#AI #ReinforcementLearning #OpenSourceAI #AIAgents #MachineLearning
📝 Summary:
WebSailor is a post-training method that teaches open-source AI models to systematically reduce uncertainty in complex information-seeking tasks. Using synthetic high-uncertainty tasks and an RL algorithm, it enables open-source agents to match the performance of proprietary systems.
🔹 Publication Date: Published on Sep 16
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2509.13305
• PDF: https://arxiv.org/pdf/2509.13305
• Project Page: https://tongyi-agent.github.io/blog/
• Github: https://tongyi-agent.github.io/blog/
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#AI #ReinforcementLearning #OpenSourceAI #AIAgents #MachineLearning
✨InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
📝 Summary:
InternVL3 introduces a native multimodal pre-training paradigm, jointly learning from multimodal and text data to overcome conventional alignment challenges. This unified approach, combined with advanced techniques, achieves state-of-the-art performance on multimodal tasks, rivaling proprietary m...
🔹 Publication Date: Published on Apr 14
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2504.10479
• PDF: https://arxiv.org/pdf/2504.10479
• Project Page: https://internvl.github.io/blog/2025-04-11-InternVL-3.0/
🔹 Models citing this paper:
• https://huggingface.co/OpenGVLab/InternVL3-78B
• https://huggingface.co/OpenGVLab/InternVL3_5-241B-A28B
• https://huggingface.co/OpenGVLab/InternVL3-8B
✨ Datasets citing this paper:
• https://huggingface.co/datasets/OpenGVLab/MMPR-v1.2-prompts
✨ Spaces citing this paper:
• https://huggingface.co/spaces/AntResearchNLP/ViLaBench
• https://huggingface.co/spaces/TIGER-Lab/MEGA-Bench
• https://huggingface.co/spaces/prithivMLmods/Tiny-VLMs-Lab
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#MultimodalAI #DeepLearning #AIResearch #OpenSourceAI #GenerativeAI
📝 Summary:
InternVL3 introduces a native multimodal pre-training paradigm, jointly learning from multimodal and text data to overcome conventional alignment challenges. This unified approach, combined with advanced techniques, achieves state-of-the-art performance on multimodal tasks, rivaling proprietary m...
🔹 Publication Date: Published on Apr 14
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2504.10479
• PDF: https://arxiv.org/pdf/2504.10479
• Project Page: https://internvl.github.io/blog/2025-04-11-InternVL-3.0/
🔹 Models citing this paper:
• https://huggingface.co/OpenGVLab/InternVL3-78B
• https://huggingface.co/OpenGVLab/InternVL3_5-241B-A28B
• https://huggingface.co/OpenGVLab/InternVL3-8B
✨ Datasets citing this paper:
• https://huggingface.co/datasets/OpenGVLab/MMPR-v1.2-prompts
✨ Spaces citing this paper:
• https://huggingface.co/spaces/AntResearchNLP/ViLaBench
• https://huggingface.co/spaces/TIGER-Lab/MEGA-Bench
• https://huggingface.co/spaces/prithivMLmods/Tiny-VLMs-Lab
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#MultimodalAI #DeepLearning #AIResearch #OpenSourceAI #GenerativeAI
arXiv.org
InternVL3: Exploring Advanced Training and Test-Time Recipes for...
We introduce InternVL3, a significant advancement in the InternVL series featuring a native multimodal pre-training paradigm. Rather than adapting a text-only large language model (LLM) into a...
🤖🧠 vLLM Semantic Router: The Next Frontier in Intelligent Model Routing for LLMs
🗓️ 11 Nov 2025
📚 AI News & Trends
As large language models (LLMs) continue to evolve, organizations face new challenges in optimizing performance, accuracy and cost across various AI workloads. Running multiple models efficiently – each specialized for specific tasks has become essential for scalable AI deployment. Enter vLLM Semantic Router, an open-source innovation that introduces a new layer of intelligence to the ...
#vLLMSemanticRouter #LargeLanguageModels #AIScaling #ModelRouting #OpenSourceAI #LLMOptimization
🗓️ 11 Nov 2025
📚 AI News & Trends
As large language models (LLMs) continue to evolve, organizations face new challenges in optimizing performance, accuracy and cost across various AI workloads. Running multiple models efficiently – each specialized for specific tasks has become essential for scalable AI deployment. Enter vLLM Semantic Router, an open-source innovation that introduces a new layer of intelligence to the ...
#vLLMSemanticRouter #LargeLanguageModels #AIScaling #ModelRouting #OpenSourceAI #LLMOptimization
🤖🧠 Plandex AI: The Future of Autonomous Coding Agents for Large-Scale Development
🗓️ 11 Nov 2025
📚 AI News & Trends
As software development becomes increasingly complex, developers are turning to AI tools that can manage, understand and automate large portions of the coding workflow. Among the most promising innovations in this space is Plandex AI, an open-source terminal-based coding agent designed for real-world, large-scale projects. Unlike simple AI coding assistants that handle small snippets, Plandex ...
#PlandexAI #AutonomousCoding #LargeScaleDevelopment #AICoding #OpenSourceAI #CodeAutomation
🗓️ 11 Nov 2025
📚 AI News & Trends
As software development becomes increasingly complex, developers are turning to AI tools that can manage, understand and automate large portions of the coding workflow. Among the most promising innovations in this space is Plandex AI, an open-source terminal-based coding agent designed for real-world, large-scale projects. Unlike simple AI coding assistants that handle small snippets, Plandex ...
#PlandexAI #AutonomousCoding #LargeScaleDevelopment #AICoding #OpenSourceAI #CodeAutomation
🤖🧠 Bytebot: The Future of AI Desktop Automation
🗓️ 12 Nov 2025
📚 AI News & Trends
In the era of rapid digital transformation, automation is the driving force behind business efficiency and innovation. While most AI agents are limited to browsers or APIs, a groundbreaking open-source project called Bytebot has redefined what AI can achieve. Bytebot introduces a self-hosted AI desktop agent — a virtual computer that performs complex, multi-step tasks ...
#Bytebot #AIDesktopAutomation #SelfHostedAI #OpenSourceAI #AIAgents #TaskAutomation
🗓️ 12 Nov 2025
📚 AI News & Trends
In the era of rapid digital transformation, automation is the driving force behind business efficiency and innovation. While most AI agents are limited to browsers or APIs, a groundbreaking open-source project called Bytebot has redefined what AI can achieve. Bytebot introduces a self-hosted AI desktop agent — a virtual computer that performs complex, multi-step tasks ...
#Bytebot #AIDesktopAutomation #SelfHostedAI #OpenSourceAI #AIAgents #TaskAutomation
❤1
✨MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling
📝 Summary:
MiroThinker v1.0 is an open-source research agent introducing 'interactive scaling.' It trains models with reinforcement learning for deeper agent-environment interactions, performing up to 600 tool calls per task. This achieves state-of-the-art performance and establishes interaction depth as a ...
🔹 Publication Date: Published on Nov 14
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.11793
• PDF: https://arxiv.org/pdf/2511.11793
• Project Page: https://dr.miromind.ai/
• Github: https://github.com/MiroMindAI/MiroThinker
🔹 Models citing this paper:
• https://huggingface.co/miromind-ai/MiroThinker-v1.0-72B
• https://huggingface.co/miromind-ai/MiroThinker-v1.0-8B
• https://huggingface.co/miromind-ai/MiroThinker-v1.0-30B
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#MiroThinker #ResearchAgents #ReinforcementLearning #OpenSourceAI #LLM
📝 Summary:
MiroThinker v1.0 is an open-source research agent introducing 'interactive scaling.' It trains models with reinforcement learning for deeper agent-environment interactions, performing up to 600 tool calls per task. This achieves state-of-the-art performance and establishes interaction depth as a ...
🔹 Publication Date: Published on Nov 14
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2511.11793
• PDF: https://arxiv.org/pdf/2511.11793
• Project Page: https://dr.miromind.ai/
• Github: https://github.com/MiroMindAI/MiroThinker
🔹 Models citing this paper:
• https://huggingface.co/miromind-ai/MiroThinker-v1.0-72B
• https://huggingface.co/miromind-ai/MiroThinker-v1.0-8B
• https://huggingface.co/miromind-ai/MiroThinker-v1.0-30B
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#MiroThinker #ResearchAgents #ReinforcementLearning #OpenSourceAI #LLM
arXiv.org
MiroThinker: Pushing the Performance Boundaries of Open-Source...
We present MiroThinker v1.0, an open-source research agent designed to advance tool-augmented reasoning and information-seeking capabilities. Unlike previous agents that only scale up model size...
❤1
✨Mobile-Agent-v3: Foundamental Agents for GUI Automation
📝 Summary:
GUI-Owl and Mobile-Agent-v3 are open-source GUI agent models achieving state-of-the-art performance on GUI benchmarks. GUI-Owl introduces large-scale environment infrastructure, diverse agent capabilities, and scalable reinforcement learning, with Mobile-Agent-v3 further improving these results.
🔹 Publication Date: Published on Aug 21
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.15144
• PDF: https://arxiv.org/pdf/2508.15144
• Project Page: https://github.com/X-PLUG/MobileAgent
• Github: https://github.com/X-PLUG/MobileAgent
🔹 Models citing this paper:
• https://huggingface.co/mPLUG/GUI-Owl-7B
• https://huggingface.co/mPLUG/GUI-Owl-32B
• https://huggingface.co/mPLUG/GUI-Owl-7B-Desktop-RL
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#GUIAgent #Automation #ReinforcementLearning #AIResearch #OpenSourceAI
📝 Summary:
GUI-Owl and Mobile-Agent-v3 are open-source GUI agent models achieving state-of-the-art performance on GUI benchmarks. GUI-Owl introduces large-scale environment infrastructure, diverse agent capabilities, and scalable reinforcement learning, with Mobile-Agent-v3 further improving these results.
🔹 Publication Date: Published on Aug 21
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2508.15144
• PDF: https://arxiv.org/pdf/2508.15144
• Project Page: https://github.com/X-PLUG/MobileAgent
• Github: https://github.com/X-PLUG/MobileAgent
🔹 Models citing this paper:
• https://huggingface.co/mPLUG/GUI-Owl-7B
• https://huggingface.co/mPLUG/GUI-Owl-32B
• https://huggingface.co/mPLUG/GUI-Owl-7B-Desktop-RL
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#GUIAgent #Automation #ReinforcementLearning #AIResearch #OpenSourceAI
✨Scaling Open-Ended Reasoning to Predict the Future
📝 Summary:
This work trains language models for open-ended future prediction using a new dataset synthesized from news. Their OpenForecaster 8B model matches larger proprietary models in accuracy, calibration, and consistency. All resources are open-sourced.
🔹 Publication Date: Published on Dec 31, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.25070
• PDF: https://arxiv.org/pdf/2512.25070
• Project Page: https://www.openforecaster.github.io
• Github: https://github.com/OpenForecaster/scaling-forecasting-training
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#LLMs #FuturePrediction #AI #OpenSourceAI #MachineLearning
📝 Summary:
This work trains language models for open-ended future prediction using a new dataset synthesized from news. Their OpenForecaster 8B model matches larger proprietary models in accuracy, calibration, and consistency. All resources are open-sourced.
🔹 Publication Date: Published on Dec 31, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.25070
• PDF: https://arxiv.org/pdf/2512.25070
• Project Page: https://www.openforecaster.github.io
• Github: https://github.com/OpenForecaster/scaling-forecasting-training
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#LLMs #FuturePrediction #AI #OpenSourceAI #MachineLearning
✨BitNet b1.58 2B4T Technical Report
📝 Summary:
BitNet b1.58 2B4T is the first open-source 1-bit Large Language Model with 2 billion parameters. It matches full-precision LLM performance while offering significant improvements in computational efficiency like reduced memory and energy. The model weights are openly released for research.
🔹 Publication Date: Published on Apr 16, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2504.12285
• PDF: https://arxiv.org/pdf/2504.12285
• Github: https://github.com/microsoft/bitnet
🔹 Models citing this paper:
• https://huggingface.co/microsoft/bitnet-b1.58-2B-4T
• https://huggingface.co/microsoft/bitnet-b1.58-2B-4T-gguf
• https://huggingface.co/microsoft/bitnet-b1.58-2B-4T-bf16
✨ Spaces citing this paper:
• https://huggingface.co/spaces/suayptalha/Chat-with-Bitnet-b1.58-2B-4T
• https://huggingface.co/spaces/aizip-dev/SLM-RAG-Arena
• https://huggingface.co/spaces/Tonic/Native_1-bit_LLM
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#LLM #AI #Quantization #OpenSourceAI #DeepLearning
📝 Summary:
BitNet b1.58 2B4T is the first open-source 1-bit Large Language Model with 2 billion parameters. It matches full-precision LLM performance while offering significant improvements in computational efficiency like reduced memory and energy. The model weights are openly released for research.
🔹 Publication Date: Published on Apr 16, 2025
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2504.12285
• PDF: https://arxiv.org/pdf/2504.12285
• Github: https://github.com/microsoft/bitnet
🔹 Models citing this paper:
• https://huggingface.co/microsoft/bitnet-b1.58-2B-4T
• https://huggingface.co/microsoft/bitnet-b1.58-2B-4T-gguf
• https://huggingface.co/microsoft/bitnet-b1.58-2B-4T-bf16
✨ Spaces citing this paper:
• https://huggingface.co/spaces/suayptalha/Chat-with-Bitnet-b1.58-2B-4T
• https://huggingface.co/spaces/aizip-dev/SLM-RAG-Arena
• https://huggingface.co/spaces/Tonic/Native_1-bit_LLM
==================================
For more data science resources:
✓ https://t.me/DataScienceT
#LLM #AI #Quantization #OpenSourceAI #DeepLearning
arXiv.org
BitNet b1.58 2B4T Technical Report
We introduce BitNet b1.58 2B4T, the first open-source, native 1-bit Large Language Model (LLM) at the 2-billion parameter scale. Trained on a corpus of 4 trillion tokens, the model has been...