ByteScale: Efficient Scaling of LLM Training with a 2048K Context Length on More Than 12,000 GPUs
📚 Read
@Machine_learn
Forwarded from Github LLMs
From System 1 to System 2: A Survey of Reasoning Large Language Models
24 Feb 2025 · Zhong-Zhi Li, Duzhen Zhang, Ming-Liang Zhang, Jiaxin Zhang, Zengyan Liu, Yuxuan Yao, Haotian Xu, Junhao Zheng, Pei-Jie Wang, Xiuyi Chen, Yingying Zhang, Fei Yin, Jiahua Dong, Zhijiang Guo, Le Song, Cheng-Lin Liu
Achieving human-level intelligence requires refining the transition from the fast, intuitive System 1 to the slower, more deliberate System 2 reasoning. While System 1 excels in quick, heuristic decisions, System 2 relies on logical reasoning for more accurate judgments and reduced biases. Foundational Large Language Models (LLMs) excel at fast decision-making but lack the depth for complex reasoning, as they have not yet fully embraced the step-by-step analysis characteristic of true System 2 thinking. Recently, reasoning LLMs like OpenAI's o1/o3 and DeepSeek's R1 have demonstrated expert-level performance in fields such as mathematics and coding, closely mimicking the deliberate reasoning of System 2 and showcasing human-like cognitive abilities. This survey begins with a brief overview of the progress in foundational LLMs and the early development of System 2 technologies, exploring how their combination has paved the way for reasoning LLMs. Next, we discuss how to construct reasoning #LLMs, analyzing their features, the core methods enabling advanced reasoning, and the evolution of various reasoning LLMs. Additionally, we provide an overview of reasoning benchmarks, offering an in-depth comparison of the performance of representative reasoning LLMs. Finally, we explore promising directions for advancing reasoning LLMs and maintain a real-time GitHub repository (https://github.com/zzli2022/Awesome-Slow-Reason-System) to track the latest developments. We hope this survey will serve as a valuable resource to inspire innovation and drive progress in this rapidly evolving field.
Paper: https://arxiv.org/pdf/2502.17419v1.pdf
Code: https://github.com/zzli2022/awesome-slow-reason-system
Datasets: GSM8K - MedQA - MathVista - GPQA - MMLU-Pro - PGPS9K
💠 https://t.me/deep_learning_proj
  ❤1👍1
  Forwarded from Papers
Hello. For one of our time-series projects, which uses a Weighted Deep Neural Network and Wavelet, we need a corresponding author.
They will be the fourth author on the paper. The fee for this participation is 250 dollars; anyone interested can message my ID for more details.
@Raminmousa
  Hawk: Learning to Understand Open-World Video Anomalies
27 May 2024 · Jiaqi Tang, Hao Lu, Ruizheng Wu, Xiaogang Xu, Ke Ma, Cheng Fang, Bin Guo, Jiangbo Lu, Qifeng Chen, Ying-Cong Chen
Video Anomaly Detection (#VAD) systems can autonomously monitor and identify disturbances, reducing the need for manual labor and associated costs. However, current VAD systems are often limited by their superficial semantic understanding of scenes and minimal user interaction. Additionally, the prevalent data scarcity in existing datasets restricts their applicability in open-world scenarios. In this paper, we introduce Hawk, a novel framework that leverages interactive large Visual Language Models (#VLM) to interpret video anomalies precisely. Recognizing the difference in motion information between abnormal and normal videos, Hawk explicitly integrates motion modality to enhance anomaly identification. To reinforce motion attention, we construct an auxiliary consistency loss within the motion and video space, guiding the video branch to focus on the motion modality. Moreover, to improve the interpretation of motion-to-language, we establish a clear supervisory relationship between motion and its linguistic representation. Furthermore, we have annotated over 8,000 anomaly videos with language descriptions, enabling effective training across diverse open-world scenarios, and also created 8,000 question-answering pairs for users' open-world questions. The final results demonstrate that #Hawk achieves SOTA performance, surpassing existing baselines in both video description generation and question-answering. Our codes/dataset/demo will be released at https://github.com/jqtangust/hawk.
Paper: https://arxiv.org/pdf/2405.16886v1.pdf
Code: https://github.com/jqtangust/hawk
Dataset: Hawk Annotation Dataset
@Machine_learn
Forwarded from Papers
Hello. We need a third author for the paper below.
Title: Wavelet transform and deep average model for price and illiquidity prediction cryptocurrencies using high-dimensional features
🔸 🔸 🔸 🔸 🔸 🔸 🔸 🔸
Abstract: Cryptocurrencies are alternative payment methods that are created using encrypted algorithms. Encryption technologies mean that cryptocurrencies act as both a currency and a virtual accounting system. The global crypto market value is $2.9 trillion. Hence, investment requirements are high. One of the challenging issues in cryptocurrencies is illiquidity. Due to behavioural chaos in the market, some currencies have severe dumps and pumps, which cause concerns for investors. This paper deals with price prediction and illiquidity prediction (converting one asset to another while maintaining its value). The proposed Wavelet Deep average model uses a combination of Wavelet transform and average deep learning models for the final prediction. This model uses the hash rate information of currencies as the main inputs. Then, it selects a subset of features using a Random Forest (RF). The selected features are designed by a Wavelet and are considered as the input to the deep network. Four currencies, BTC, Dogecoin, Ethereum (ETH), and Bitcoin Cash (BCH), were considered for model evaluation. In Bitcoin prediction, the lowest MAE for price prediction and illiquidity was achieved, which was 1.19 and 1.49, respectively. Also, the proposed model achieved MAE of 1.99, 3.69, and 2.99 for the illiquidity of three currencies Dogecoin, ETH, and BCH. The implementation codes are available at https://github.com/Ramin1Mousa/.
Journal:
Neural Computing and Applications (Springer)
@Raminmousa
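A rough illustration of two pieces the abstract above relies on: wavelet decomposition of a series and MAE scoring of predictions. This is a minimal sketch, not the authors' implementation. The abstract does not name the wavelet family, so the Haar wavelet is an assumption here, and the function names `haar_dwt_level` and `mean_absolute_error` and all sample numbers are hypothetical.

```python
import math


def haar_dwt_level(signal):
    """One level of the Haar discrete wavelet transform (assumed wavelet).

    Returns (approximation, detail) coefficient lists, each half the
    length of the input. The input length must be even.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    scale = math.sqrt(2.0)
    # Pairwise scaled sums capture the smooth trend of the series.
    approx = [(signal[i] + signal[i + 1]) / scale for i in range(0, len(signal), 2)]
    # Pairwise scaled differences capture the local fluctuation.
    detail = [(signal[i] - signal[i + 1]) / scale for i in range(0, len(signal), 2)]
    return approx, detail


def mean_absolute_error(y_true, y_pred):
    """Average of |true - predicted| over all samples."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical toy series, not data from the paper.
    approx, detail = haar_dwt_level([4.0, 2.0, 6.0, 6.0])
    print("approximation:", approx)
    print("detail:", detail)
    print("MAE:", mean_absolute_error([1.0, 2.0, 3.0], [2.0, 2.0, 5.0]))
```

In a pipeline like the one described, the approximation and detail coefficients would be fed to the network as inputs, and MAE would be computed between predicted and observed values on a held-out set.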
👍3
Hello. The submission process for this paper takes place tonight; anyone who wants to participate can join.
@Raminmousa
📄 The role and application of bioinformatics techniques and tools in drug discovery
📎 Study the paper
@Machine_learn
ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents
25 Feb 2025 · Qiuchen Wang, Ruixue Ding, Zehui Chen, Weiqi Wu, Shihang Wang, Pengjun Xie, Feng Zhao
Understanding information from visually rich documents remains a significant challenge for traditional Retrieval-Augmented Generation (RAG) methods. Existing benchmarks predominantly focus on image-based question answering (QA), overlooking the fundamental challenges of efficient retrieval, comprehension, and reasoning within dense visual documents. To bridge this gap, we introduce ViDoSeek, a novel dataset designed to evaluate RAG performance on visually rich documents requiring complex reasoning. Based on it, we identify key limitations in current RAG approaches: (i) purely visual retrieval methods struggle to effectively integrate both textual and visual features, and (ii) previous approaches often allocate insufficient reasoning tokens, limiting their effectiveness. To address these challenges, we propose #ViDoRAG, a novel multi-agent RAG framework tailored for complex reasoning across visual documents. ViDoRAG employs a Gaussian Mixture Model (GMM)-based hybrid strategy to effectively handle multi-modal retrieval. To further elicit the model's reasoning capabilities, we introduce an iterative agent workflow incorporating exploration, summarization, and reflection, providing a framework for investigating test-time scaling in RAG domains. Extensive experiments on ViDoSeek validate the effectiveness and generalization of our approach. Notably, ViDoRAG outperforms existing methods by over 10% on the competitive #ViDoSeek benchmark.
Paper: https://arxiv.org/pdf/2502.18017v1.pdf
Code: https://github.com/Alibaba-NLP/ViDoRAG
@Machine_learn
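The abstract above mentions a GMM-based hybrid retrieval strategy without spelling out the details. One plausible reading, sketched below purely as an assumption, is that a two-component Gaussian mixture is fitted to the query-to-page similarity scores so that the number of retrieved pages adapts to the score distribution instead of using a fixed top-k. The function names (`fit_two_component_gmm`, `select_relevant`), the initialisation, and the sample scores are all hypothetical and are not taken from the ViDoRAG code.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D normal distribution."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_two_component_gmm(scores, iterations=50, min_var=1e-6):
    """Fit a two-component 1-D Gaussian mixture with plain EM.

    Returns the component means and the per-score responsibilities
    from the final E-step.
    """
    lo, hi = min(scores), max(scores)
    means = [lo, hi]  # start one component at each extreme
    spread = max(((hi - lo) / 4.0) ** 2, min_var)
    variances = [spread, spread]
    weights = [0.5, 0.5]
    resp = []
    for _ in range(iterations):
        # E-step: posterior probability of each component for each score.
        resp = []
        for x in scores:
            p = [weights[k] * gaussian_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(p)
            resp.append([pk / total for pk in p] if total > 0 else [0.5, 0.5])
        # M-step: re-estimate weights, means and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # leave an empty component unchanged
            weights[k] = nk / len(scores)
            means[k] = sum(r[k] * x for r, x in zip(resp, scores)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, scores)) / nk
            variances[k] = max(var, min_var)  # floor avoids a collapsed component
    return means, resp


def select_relevant(scores):
    """Indices of scores assigned to the higher-mean component."""
    means, resp = fit_two_component_gmm(scores)
    high = 0 if means[0] > means[1] else 1
    return [i for i, r in enumerate(resp) if r[high] > 0.5]


if __name__ == "__main__":
    # Hypothetical similarity scores for seven candidate pages.
    print(select_relevant([0.10, 0.12, 0.08, 0.11, 0.09, 0.85, 0.90]))
```

With this kind of cutoff, a query that matches many pages keeps many candidates and a query that matches few keeps few, which is the practical motivation for replacing a fixed top-k.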
❤5👍1
  Forwarded from Papers
Hello. We need a third author for the paper above.
Suggested journals for submission:
🔺 🔺 🔺 🔸 🔸 🔸 🔺 🔺 🔺
- Soft Computing
- Computational Economics
- Multimedia Tools and Applications
To sign up, contact my ID.
@Raminmousa
@Machine_learn
@paper4money
  🔥1
  Kiss3DGen: Repurposing Image Diffusion Models for 3D Asset Generation
🖥  Github: https://github.com/EnVision-Research/Kiss3DGen
📕  Paper: https://arxiv.org/abs/2503.01370v1
🌟 Dataset: https://paperswithcode.com/dataset/nerf
@Machine_learn
  ❤2
  Know You First and Be You Better: Modeling Human-Like User Simulators via Implicit Profiles
26 Feb 2025 · Kuang Wang, Xianfei Li, Shenghao Yang, Li Zhou, Feng Jiang, Haizhou Li
User simulators are crucial for replicating human interactions with dialogue systems, supporting both collaborative training and automatic evaluation, especially for large language models (LLMs). However, existing simulators often rely solely on text utterances, missing implicit user traits such as personality, speaking style, and goals. In contrast, persona-based methods lack generalizability, as they depend on predefined profiles of famous individuals or archetypes. To address these challenges, we propose User Simulator with implicit Profiles (#USP), a framework that infers implicit user profiles from human-machine conversations and uses them to generate more personalized and realistic dialogues. We first develop an LLM-driven extractor with a comprehensive profile schema. Then, we refine the simulation through conditional supervised fine-tuning and reinforcement learning with cycle consistency, optimizing it at both the utterance and conversation levels. Finally, we adopt a diverse profile sampler to capture the distribution of real-world user profiles. Experimental results demonstrate that USP outperforms strong baselines in terms of authenticity and diversity while achieving comparable performance in consistency. Furthermore, dynamic multi-turn evaluations based on USP strongly align with mainstream benchmarks, demonstrating its effectiveness in real-world applications.
Paper: https://arxiv.org/pdf/2502.18968v1.pdf
Code: https://github.com/wangkevin02/USP
Dataset: LMSYS-USP
@Machine_learn
👍1
  