Hello. For the paper below, we are looking for someone to share the server costs with us.
Multi-modal wound classification using wound image and location by vit-wavelet and transformer
🔸 🔸 🔸 🔸 🔸 🔸 🔸 
Journal: Scientific Reports (Nature)
The contribution fee for the 5th author is $300.
🔻 @Raminmousa
  Forwarded from Papers
Hello. For the paper above, we need a third author.
Suggested journal for submission:
https://www.springerprofessional.de/financial-innovation/50101254
IF: 6.5
The fee for the third author is 15 million.
🔺 🔺 🔺 🔸 🔸 🔸 🔺 🔺 🔺 
To register, contact me at my ID:
@Raminmousa
@Machine_learn
@paper4money
  MonSter: Marry Monodepth to Stereo Unleashes Power
15 Jan 2025 · Junda Cheng, Longliang Liu, Gangwei Xu, Xianqi Wang, Zhaoxing Zhang, Yong Deng, Jinliang Zang, Yurui Chen, Zhipeng Cai, Xin Yang ·
Stereo matching recovers depth from image correspondences. Existing methods struggle to handle ill-posed regions with limited matching cues, such as occlusions and textureless areas. To address this, we propose MonSter, a novel method that leverages the complementary strengths of monocular depth estimation and stereo matching. MonSter integrates monocular depth and stereo matching into a dual-branch architecture so that the two iteratively improve each other. Confidence-based guidance adaptively selects reliable stereo cues for monodepth scale-shift recovery. The refined monodepth in turn guides stereo effectively in ill-posed regions. Such iterative mutual enhancement enables MonSter to evolve monodepth priors from coarse object-level structures to pixel-level geometry, fully unlocking the potential of stereo matching. As shown in Fig. 1, MonSter ranks 1st across the five most commonly used leaderboards (SceneFlow, KITTI 2012, KITTI 2015, Middlebury, and ETH3D), achieving up to a 49.5% improvement (Bad 1.0 on ETH3D) over the previous best method. Comprehensive analysis verifies the effectiveness of MonSter in ill-posed regions. In terms of zero-shot generalization, MonSter significantly and consistently outperforms the state of the art across the board. The code is publicly available at: https://github.com/Junda24/MonSter.
Paper: https://arxiv.org/pdf/2501.08643v1.pdf
Code: https://github.com/junda24/monster
Datasets: KITTI - TartanAir
@Machine_learn
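The MonSter abstract above mentions using confident stereo cues for monodepth scale-shift recovery. One common way to recover a scale and shift is a confidence-weighted least-squares fit, sketched below. This is an illustration of that general idea and not the authors' implementation; the function name `fit_scale_shift`, the sample values, and the use of confidence as a per-pixel weight are assumptions made for the example.

```python
def fit_scale_shift(mono, stereo, conf):
    """Fit s, t minimizing sum(w * (s * m + t - d) ** 2) over all pixels.

    mono   : relative monocular predictions (one value per pixel)
    stereo : stereo estimates for the same pixels
    conf   : non-negative confidence weights (hypothetical; 0 ignores a pixel)
    """
    sw = sum(conf)
    swm = sum(w * m for w, m in zip(conf, mono))
    swd = sum(w * d for w, d in zip(conf, stereo))
    swmm = sum(w * m * m for w, m in zip(conf, mono))
    swmd = sum(w * m * d for w, m, d in zip(conf, mono, stereo))

    # Solve the 2x2 normal equations in closed form.
    det = sw * swmm - swm * swm
    if abs(det) < 1e-12:
        raise ValueError("not enough confident, distinct pixels to fit scale and shift")
    scale = (sw * swmd - swm * swd) / det
    shift = (swmm * swd - swm * swmd) / det
    return scale, shift


# Toy data: the first three pixels follow stereo = 50 * mono + 3 exactly;
# the last pixel is an outlier (e.g. an occlusion) with zero confidence.
mono = [0.1, 0.2, 0.4, 0.8]
stereo = [8.0, 13.0, 23.0, 90.0]
conf = [1.0, 1.0, 1.0, 0.0]

scale, shift = fit_scale_shift(mono, stereo, conf)
aligned = [scale * m + shift for m in mono]  # monodepth mapped onto the stereo scale
print(scale, shift)  # close to 50 and 3, since the outlier carries no weight
```

Because the outlier has zero weight, the fit is driven only by the pixels marked reliable, which is the intuition behind selecting stereo cues by confidence.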
  Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement
🖥  Github: https://github.com/yunncheng/MMRL
📕  Paper: https://arxiv.org/abs/2503.08497v1
🌟 Dataset: https://paperswithcode.com/dataset/imagenet-s
@Machine_learn
  VideoPainter: Any-length Video Inpainting and Editing with Plug-and-Play Context Control
7 Mar 2025 · Yuxuan Bian, Zhaoyang Zhang, Xuan Ju, Mingdeng Cao, Liangbin Xie, Ying Shan, Qiang Xu ·
Video inpainting, which aims to restore corrupted video content, has experienced substantial progress. Despite these advances, existing methods, whether propagating unmasked region pixels through optical flow and receptive field priors, or extending image-inpainting models temporally, face challenges in generating fully masked objects or balancing the competing objectives of background context preservation and foreground generation in one model, respectively. To address these limitations, we propose a novel dual-stream paradigm VideoPainter that incorporates an efficient context encoder (comprising only 6% of the backbone parameters) to process masked videos and inject backbone-aware background contextual cues to any pre-trained video DiT, producing semantically consistent content in a plug-and-play manner. This architectural separation significantly reduces the model's learning complexity while enabling nuanced integration of crucial background context. We also introduce a novel target region ID resampling technique that enables any-length video inpainting, greatly enhancing our practical applicability. Additionally, we establish a scalable dataset pipeline leveraging current vision understanding models, contributing VPData and VPBench to facilitate segmentation-based inpainting training and assessment, the largest video inpainting dataset and benchmark to date with over 390K diverse clips. Using inpainting as a pipeline basis, we also explore downstream applications including video editing and video editing pair data generation, demonstrating competitive performance and significant practical potential. Extensive experiments demonstrate VideoPainter's superior performance in both any-length video inpainting and editing, across eight key metrics, including video quality, mask region preservation, and textual coherence.
Paper: https://arxiv.org/pdf/2503.05639v2.pdf
Code: https://github.com/TencentARC/VideoPainter
Datasets: VPData - VPBench
@Machine_learn
  Forwarded from Papers
Hello. For the paper above, we need a co-author (first author).
Suggested journal for submission:
https://www.springerprofessional.de/financial-innovation/50101254
IF: 6.5
🔺 🔺 🔺 🔸 🔸 🔸 🔺 🔺 🔺 
To register, contact me at my ID:
@Raminmousa
@Machine_learn
@paper4money
  Forwarded from ENG. Hussein Sheikho
This channel is for programmers, coders, and software engineers.
0️⃣  Python 
1️⃣  Data Science
2️⃣  Machine Learning
3️⃣  Data Visualization 
4️⃣   Artificial Intelligence
5️⃣  Data Analysis
6️⃣  Statistics
7️⃣  Deep Learning
8️⃣  Programming Languages
✅  https://t.me/addlist/8_rRW2scgfRhOTc0
✅  https://t.me/codeprogrammer
  LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL
10 Mar 2025 · Yingzhe Peng, Gongrui Zhang, Miaosen Zhang, Zhiyuan You, Jie Liu, Qipeng Zhu, Kai Yang, Xingzhong Xu, Xin Geng, Xu Yang
Enhancing reasoning in Large Multimodal Models (#LMMs) faces unique challenges from the complex interplay between visual perception and logical reasoning, particularly in compact 3B-parameter architectures where architectural constraints limit reasoning capacity and modality alignment. While rule-based reinforcement learning (RL) excels in text-only domains, its multimodal extension confronts two critical barriers: (1) data limitations due to ambiguous answers and scarce complex reasoning examples, and (2) degraded foundational reasoning induced by multimodal pretraining. To address these challenges, we propose LMM-R1, a two-stage framework adapting rule-based RL for multimodal reasoning through Foundational Reasoning Enhancement (FRE) followed by Multimodal Generalization Training (MGT). The FRE stage first strengthens reasoning abilities using text-only data with rule-based RL, then the MGT stage generalizes these reasoning capabilities to multimodal domains. Experiments on Qwen2.5-VL-Instruct-3B demonstrate that LMM-R1 achieves 4.83% and 4.5% average improvements over baselines in multimodal and text-only benchmarks, respectively, with a 3.63% gain in complex Football Game tasks. These results validate that text-based reasoning enhancement enables effective multimodal generalization, offering a data-efficient paradigm that bypasses costly high-quality multimodal training data.
Paper: https://arxiv.org/pdf/2503.07536v1.pdf
Code: https://github.com/tidedra/lmm-r1
@Machine_learn
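The LMM-R1 abstract above relies on rule-based RL, where the reward comes from deterministic checks rather than a learned reward model. A minimal sketch of such a reward follows. The `<answer>` tag convention, the reward weights, and the name `rule_based_reward` are all assumptions chosen for illustration; the post does not specify the rules the authors use.

```python
import re

# Hypothetical output convention: the final answer is wrapped in <answer> tags.
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)

FORMAT_REWARD = 0.5    # assumed bonus for a parseable answer
ACCURACY_REWARD = 1.0  # assumed bonus for matching the reference


def rule_based_reward(response: str, reference: str) -> float:
    """Score a model response with fixed rules: format check, then exact match."""
    match = ANSWER_RE.search(response)
    if match is None:
        return 0.0  # nothing parseable, so no reward at all
    predicted = match.group(1).strip().lower()
    correct = predicted == reference.strip().lower()
    return FORMAT_REWARD + (ACCURACY_REWARD if correct else 0.0)


print(rule_based_reward("Reasoning... <answer> 42 </answer>", "42"))  # 1.5
print(rule_based_reward("Reasoning... <answer>41</answer>", "42"))    # 0.5
print(rule_based_reward("The answer is 42", "42"))                    # 0.0
```

Exact-match rules like this only work when answers are unambiguous, which relates to the data limitation the abstract names for multimodal tasks.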
You don't need to buy a GPU for machine learning work!
There are other alternatives. Here are some:
1. Google Colab
2. Kaggle
3. Deepnote
4. AWS SageMaker
5. GCP Notebooks
6. Azure Notebooks
7. CoCalc
8. Binder
9. Saturn Cloud
10. Datalore
11. IBM Notebooks
12. Ola Krutrim
@Machine_learn
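When trying one of the hosted notebooks listed above, it can help to confirm that the runtime actually exposes a GPU. The sketch below is one way to check for NVIDIA hardware from Python using the `nvidia-smi` command-line tool; the helper name `nvidia_gpu_names` is hypothetical, and the check assumes an NVIDIA GPU (other accelerators such as TPUs will not be detected).

```python
import shutil
import subprocess


def nvidia_gpu_names() -> list[str]:
    """Return the GPU names reported by nvidia-smi, or [] if none is visible."""
    if shutil.which("nvidia-smi") is None:
        return []  # tool not installed, so no NVIDIA driver is exposed
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=10,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


gpus = nvidia_gpu_names()
print(gpus if gpus else "No NVIDIA GPU visible in this runtime")
```

On many of these services the accelerator has to be enabled in the notebook settings first, so an empty list does not necessarily mean the platform lacks GPUs.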
  
  📃 Biological Multi-Layer and Single Cell Network-Based Multiomics Models - a Review
📎 Study the paper
@Machine_learn
Our multi-modal wound classification paper, which we published in one of the best Elsevier journals.
Multi-modal wound classification using wound image and location by Swin Transformer and Transformer
✅ Accepted ✅ 
Authors: Ramin Mousa, Behnaz Rezaei, Laya Mahmoudi, Jafar Abdollahi
IF: 7.5
Journal: https://www.sciencedirect.com/journal/expert-systems-with-applications
Paper: Link
@Machine_learn
  Jointly announcing EAGLE-3 with SGLang: Setting a new record in LLM inference acceleration!
- 5x🚀than vanilla (on HF)
- 1.4x🚀than EAGLE-2 (on HF)
- A record of ~400 TPS on Llama 3.1 8B with a single H100 (on SGLang)
- 1.65x🚀in latency even for large bs=64 (on SGLang)
- A new scaling law: more training data, better speedup
- Apache 2.0
Paper: https://arxiv.org/abs/2503.01840
Code: https://github.com/SafeAILab/EAGLE
SGLang version: https://github.com/sgl-project/sglang/pull/4247
@Machine_learn
  Executable Code Actions Elicit Better LLM Agents
1 Feb 2024 · Xingyao Wang, Yangyi Chen, Lifan Yuan, Yizhe Zhang, Yunzhu Li, Hao Peng, Heng Ji
Large Language Model (LLM) agents, capable of performing a broad range of actions, such as invoking tools and controlling robots, show great potential in tackling real-world challenges. LLM agents are typically prompted to produce actions by generating #JSON or text in a pre-defined format, which is usually limited by constrained action space (e.g., the scope of pre-defined tools) and restricted flexibility (e.g., inability to compose multiple tools). This work proposes to use executable Python code to consolidate LLM agents' actions into a unified action space (CodeAct). Integrated with a Python interpreter, CodeAct can execute code actions and dynamically revise prior actions or emit new actions upon new observations through multi-turn interactions. Our extensive analysis of 17 LLMs on API-Bank and a newly curated benchmark shows that CodeAct outperforms widely used alternatives (up to 20% higher success rate). The encouraging performance of CodeAct motivates us to build an open-source #LLM agent that interacts with environments by executing interpretable code and collaborates with users using natural language. To this end, we collect an instruction-tuning dataset CodeActInstruct that consists of 7k multi-turn interactions using CodeAct. We show that it can be used with existing data to improve models in agent-oriented tasks without compromising their general capability. CodeActAgent, finetuned from Llama2 and Mistral, is integrated with #Python interpreter and uniquely tailored to perform sophisticated tasks (e.g., model training) using existing libraries and autonomously self-debug.
Paper: https://arxiv.org/pdf/2402.01030v4.pdf
Codes:
https://github.com/epfllm/megatron-llm
https://github.com/xingyaoww/code-act
Datasets: MMLU - GSM8K - HumanEval - MATH
@Machine_learn
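The CodeAct abstract above describes an agent that emits Python code as its action, runs it in an interpreter, and revises the action after seeing the resulting observation. The loop can be sketched as below. This is a toy illustration and not the authors' implementation: `scripted_policy` is a hypothetical stand-in for the LLM, and `run_code_action` and `run_episode` are names invented for the example.

```python
import contextlib
import io


def run_code_action(code: str, namespace: dict) -> str:
    """Execute one code action and return the observation (stdout or error text)."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, namespace)  # state persists in namespace across turns
    except Exception as exc:
        buffer.write(f"{type(exc).__name__}: {exc}\n")
    return buffer.getvalue()


def scripted_policy(observations: list):
    """Stand-in for the LLM: a buggy first attempt, then a fix after the error."""
    if not observations:
        return "print(sum(values) / count)"  # 'count' is undefined on purpose
    if "NameError" in observations[-1]:
        return "count = len(values)\nprint(sum(values) / count)"
    return None  # task done, stop acting


def run_episode(policy, namespace: dict, max_turns: int = 5) -> list:
    """Alternate between code actions and observations until the policy stops."""
    observations = []
    for _ in range(max_turns):
        action = policy(observations)
        if action is None:
            break
        observations.append(run_code_action(action, namespace))
    return observations


observations = run_episode(scripted_policy, {"values": [2, 4, 6]})
for turn, obs in enumerate(observations, start=1):
    print(f"turn {turn}: {obs.strip()}")
```

The first turn fails with a `NameError`, and the second turn uses that observation to emit a corrected action. Note that `exec` runs arbitrary code, so a real agent would need a sandboxed interpreter.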
  PiEEG kit - bioscience Lab in home for your Brain and Body
🖥  Github: https://github.com/pieeg-club/PiEEG_Kit
📕  Paper: https://arxiv.org/abs/2503.13482
🌟 Methods: https://paperswithcode.com/task/eeg-1
@Machine_learn
  Introduction to Graph Neural Networks: A Starting Point for Machine Learning Engineers
📓 Paper
@Machine_learn
  