Chem ML/AI/Datasets

Mapping Boryl Radical Properties and Reactivity Using Machine Learning: The B-Rad and React-B-Rad Maps

https://doi.org/10.1002/anie.202511509

Boryl radicals have become indispensable in organic synthesis, yet translating their complex steric and electronic properties into actionable reactivity insights remains challenging. Herein, we present a comprehensive classification of boryl radicals, including a publicly accessible database of 141 neutral 7e-4c boryl radicals, each parametrized by a set of electronic and steric features derived from DFT calculations.

Unsupervised machine learning (k-means clustering) and dimensionality reduction (PCA/UMAP) condense this high dimensional descriptor space into the “B-rad map”, capturing trends in sterics and electronics among the resulting five clusters. Global electrophilicity (ω) and nucleophilicity (N) indices are overlaid to create a polarity‑annotated guide, while DFT‑computed activation free energies for six benchmark reactions (HAT, radical addition, and XAT for two different substrates) yield the React‑B‑rad maps that directly link intrinsic properties to specific reaction performance. To demonstrate predictive power, supervised machine learning models (random forest) are trained on the descriptors and successfully predict radical reactivity regimes across all reaction types.
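To make the clustering step concrete, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. It is not the authors' workflow: the two-column descriptor values, the choice of two clusters, and the starting centroids are all made up for illustration, whereas the paper clusters 141 radicals into five groups on a much richer DFT-derived descriptor set.

```python
import math

def kmeans(points, centroids, n_iter=20):
    """Plain Lloyd's algorithm: assign points to the nearest centroid, then recompute centroids."""
    labels = []
    for _ in range(n_iter):
        # Assignment step: index of the closest centroid for every point.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its assigned points.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[k])  # an empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids

# Hypothetical (steric, electronic) descriptor pairs, assumed already standardized.
descriptors = [(0.1, 0.2), (0.2, 0.1), (0.0, 0.0), (2.0, 2.1), (2.2, 1.9), (1.9, 2.0)]
labels, centers = kmeans(descriptors, centroids=[descriptors[0], descriptors[3]])
print(labels)
```

In practice one would more likely reach for a library implementation with multiple random restarts; the hand-written loop is only meant to show what "clustering the descriptor space" does mechanically.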

Overall, this integrated, machine-learning-driven platform can serve as both a practical guide for experimental decision-making and a foundation for data-driven discovery, paving the way towards rational design and virtual screening of boryl-radical reagents for diverse synthetic applications.

📕Angewandte Chemie (IF=16.9)
#method


Chem ML/AI/Datasets

ECloudGen: leveraging electron clouds as a latent variable to scale up structure-based molecular design

https://doi.org/10.1038/s43588-025-00886-7

Here we propose a latent variable approach that bridges the gap between ligand-only data and protein–ligand complexes, enabling target-aware generative models to explore a broader chemical space, thereby enhancing the quality of molecular generation. Inspired by quantum molecular simulations, we introduce ECloudGen, a generative model that leverages electron clouds as meaningful latent variables.

ECloudGen incorporates techniques such as latent diffusion models, Llama architectures and a contrastive learning task, which organizes the chemical space into a structured and highly interpretable latent representation.

Benchmark studies demonstrate that ECloudGen outperforms state-of-the-art methods by generating more potent binders with superior physicochemical properties and by covering a broader chemical space.

🖥 https://github.com/HaotianZhangAI4Science/ECloudGen

📕 Nature Computational Science (IF=18.3)
#method


Chem ML/AI/Datasets

Nanostructured Material Design via a Retrieval-Augmented Generation (RAG) Approach: Bridging Laboratory Practice and Scientific Literature

https://doi.org/10.1021/acs.jcim.5c01897

The increasing complexity in designing nanostructured materials for electronics, biomedicine, and energy applications requires advanced computational methods to enhance research efficiency and minimize experimental costs. This study proposes an innovative agent-based retrieval-augmented generation (RAG) system integrated with large language models (LLMs) to automate the extraction and analysis of scientific information from extensive literature databases, specifically targeting nanostructured materials developed via two-photon polymerization (2PP). In addition to extracting and analyzing scientific data, our approach emphasizes understanding how these nanostructured materials interact with cells, which is crucial for controlling their application in biomedicine.

The developed platform demonstrates robust semantic accuracy (cosine similarity: 0.82) and high overall task precision (0.81), significantly reducing the likelihood of misinformation by incorporating dynamic query refinement mechanisms. The intuitive, user-friendly interface facilitates quick access to relevant scientific data, thereby improving researchers’ productivity and enabling more accurate experimental planning. Although the system exhibits certain limitations regarding domain-specific terminology coverage, further fine-tuning and specialized training are anticipated to enhance its performance and reliability for advanced scientific applications.
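The semantic-accuracy figure above is a cosine similarity, so a short sketch of that metric may help. The vectors below are toy stand-ins for text embeddings (hypothetical values); how the study actually embeds answers and references is not stated in this post.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

# Toy embeddings (hypothetical) for a generated answer and two references.
answer_vec = [1.0, 2.0, 0.0]
reference_vec = [2.0, 4.0, 0.0]     # same direction as the answer
orthogonal_vec = [0.0, 0.0, 3.0]    # shares no component with the answer

same_direction = cosine_similarity(answer_vec, reference_vec)
unrelated = cosine_similarity(answer_vec, orthogonal_vec)
print(same_direction, unrelated)
```

A score near 1 means the two vectors point the same way regardless of length, and a score near 0 means they share almost nothing, which is why a reported mean of 0.82 reads as "largely aligned but not identical".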

📕Journal of Chemical Information and Modeling (IF=5.3)
#article


Chem ML/AI/Datasets

Data-Driven Discovery of Polar Organic Cocrystals: Integration of Machine Learning and Automated Screening

https://doi.org/10.1021/jacs.5c16276

Polar organic cocrystals hold significant promise for various advanced technological applications. However, their relatively low occurrence emphasizes the difficulties in achieving the desired polar packing arrangements, making their discovery complex and challenging.

Here, we introduce a data-driven method that combines machine learning (ML) with high-throughput (HT) automation to speed up the discovery of polar organic cocrystals. Using ML techniques, we identified key factors that influence polar cocrystal formation, allowing for targeted selection of molecular candidates. We examined 13 cocrystal combinations with chloranilic acid (CA), screening 20 solvent systems for each, which enabled a highly efficient search across a broad chemical space. HT automation further enhanced the synthesis and characterization by enabling rapid screening and precise structural validation, while thoroughly exploring the chemical landscape. Experimental results confirmed 13 pairs of CA cocrystals, with 6 crystallizing in polar space groups, resulting in a polar discovery rate of 46%, nearly three times higher than the average in the Cambridge Structural Database (CSD) (∼13.2%). This integrated approach offers a new strategy in polar organic cocrystal research. The findings demonstrate the potential of this method to advance functional molecular materials and pave the way for next-generation applications using polar organic cocrystals.
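The headline numbers can be checked with a few lines of arithmetic. All inputs below are taken from the abstract; only the variable names are ours.

```python
polar_hits = 6           # cocrystals that crystallized in polar space groups
confirmed_pairs = 13     # confirmed CA cocrystal pairs
solvents_per_pair = 20   # solvent systems screened per combination
csd_baseline = 0.132     # approximate average polar rate quoted for the CSD

discovery_rate = polar_hits / confirmed_pairs
enrichment = discovery_rate / csd_baseline
screening_conditions = confirmed_pairs * solvents_per_pair

print(f"polar discovery rate: {discovery_rate:.1%}")
print(f"ratio to CSD baseline: {enrichment:.2f}x")
print(f"screening conditions: {screening_conditions}")
```

Taken as a plain ratio, 6/13 against 13.2% works out to roughly 3.5 times the baseline rate, drawn from 260 pair-solvent conditions.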

📕Journal of the American Chemical Society (IF=15.6)
#method


Chem ML/AI/Datasets

MOF-ChemUnity: Literature-Informed Large Language Models for Metal–Organic Framework Research

https://doi.org/10.1021/jacs.5c11789

Artificial intelligence (AI) is transforming research in metal–organic frameworks (MOFs), where models trained on structured computational data routinely predict new materials and optimize their properties. This raises a central question: What if we could leverage the full breadth of MOF knowledge, not just structured data sets, but also the scientific literature? For researchers, the literature remains the primary source of knowledge, yet much of its content, including experimental data and expert insight, remains underutilized by AI systems.

We introduce MOF-ChemUnity, a structured, extensible, and scalable knowledge graph that unifies MOF data by linking literature-derived insights to crystal structures and computational data sets. By disambiguating MOF names in the literature and connecting them to crystal structures in the Cambridge Structural Database, MOF-ChemUnity unifies experimental and computational sources and enables cross-document knowledge extraction and linking. We showcase how this enables multiproperty machine learning across simulated and experimental data, compilation of complete synthesis records for individual compounds by aggregating information across multiple publications, and expert-guided materials recommendations via structure-based machine learning descriptors for pore geometry and chemistry. When used as a knowledge source to augment large language models (LLMs), MOF-ChemUnity enables a literature-informed AI assistant that operates over the full scope of MOF knowledge. Expert evaluations show improved accuracy, interpretability, and trustworthiness across tasks such as retrieval, inference of structure–property relationships, and materials recommendation, outperforming standard LLMs. This work lays the foundation for literature-informed materials discovery, enabling both scientists and AI systems to reason over the full existing knowledge in a new way.

📕Journal of the American Chemical Society (IF=15.6)
#method


Chem ML/AI/Datasets

PersADE: a database of personalized adverse drug events and their underlying molecular mechanisms

https://doi.org/10.1093/nar/gkaf1095

As a major burden on global healthcare systems, adverse drug events (ADEs) result in significant morbidity, mortality, and healthcare resource consumption. With the rapid advances in precision medicine, personalized ADEs and their molecular mechanisms are important components of drug repurposing and drug safety improvement. Thus, extensive studies have been conducted to collect valuable information on personalized ADEs, but no database has yet been available to provide such data.

In this work, PersADE, a database aiming to provide personalized adverse drug events and their molecular mechanisms, was constructed. It integrates 4 061 772 personalized drug-ADE associations, 31 756 protein-ADE associations, and 108 677 drug-protein interactions, with a particular emphasis on off-target effects.

The uniqueness of these data lies in (a) providing demographic characteristics, disease context and drug administration parameters associated with ADEs, enabling stratification of drug-ADE associations; and (b) systematically integrating interactions among drugs, human proteins and ADEs, providing mechanistic insights. Given the growing global focus on precision medicine, PersADE is expected to significantly impact studies on personalized ADEs and mechanistic explorations by providing researchers and clinicians with evidence-based tools. It is now freely accessible at: https://idrblab.org/PersADE

📕Nucleic Acids Research (IF=13.1)
#dataset


Chem ML/AI/Datasets

Domain-Trained Language Model for Inverse Design and Synthesis of High-Performance Hydrogen Storage MOFs

https://doi.org/10.1002/anie.202513366

A domain-specific large language model, MOFs-LLM, is developed to accelerate the inverse design and synthesis of metal–organic frameworks (MOFs) for hydrogen storage. Trained on 210 million tokens derived from over 6 000 MOF-related publications and 15 000 crystal structures, the model integrates chemical knowledge with structural features to improve structure–property reasoning. Compared to baseline methods, MOFs-LLM achieves a 46.7% enhancement in capturing structure–property relationships. It enables the inverse design of 60 candidate frameworks optimized for both hydrogen storage performance and synthetic accessibility.

Guided by the model, a novel MOF (Cu-LLMs-1) was synthesized in three experimental iterations, exhibiting a hydrogen uptake of 1.33 wt% at room temperature, ranking among the top five pure MOFs under comparable conditions. These findings highlight the potential of domain-trained language models to bridge virtual screening and experimental realization in materials discovery.

📕Angewandte Chemie (IF=16.9)
#method


Chem ML/AI/Datasets

MGDB: a curated database for molecular glues🔥

https://doi.org/10.1093/nar/gkaf1131

We developed MGDB, a specialized open-access repository integrating rigorously curated multidimensional data for molecular glues (MGs). MGDB contains 7396 curated MGs sourced from 162 peer-reviewed publications and 156 patents. It consolidates structural data, 9728 experimental bioactivity data points (covering degradation efficiency, binding affinity, cellular/animal activity) across 201 targets and 108 effectors, 115 296 computed physicochemical properties, and 270 785 ADMET profiles.

The database supports text-based and chemical structure-based queries and interoperability with external resources (e.g. PubChem, ChEMBL, DrugBank, UniProt, and WIPO) via hyperlinks.

By centralizing and standardizing specialized MG information, MGDB empowers researchers to rapidly explore MG research landscapes and provides high-quality datasets for artificial intelligence-driven rational therapeutic design. MGDB is freely available at http://mgdb.idruglab.cn/.

📕Nucleic Acids Research (IF=13.1)
#dataset


Chem ML/AI/Datasets

molSimplify 2.0: Improved Structure Generation for Automating Discovery in Inorganic Molecular and Reticular Chemistry

🔥

https://doi.org/10.26434/chemrxiv-2025-h8gff-v2

We provide an overview of core molSimplify functionality and recent updates that enhance its capabilities for automated molecular and materials modeling. We describe the mol3D and atom3D classes, which store atomic and bonding information for a wide range of functions, including reading, modifying, and characterizing molecular geometries from common file formats. Enhancements to decoration and substructure addition functions enable systematic derivatization of template molecules.

We introduce a new mol2D class that enables graph-based uniqueness checks and substructure identification. Most importantly, we introduce improvements to transition metal complex (TMC) generation that eliminate steric clashes and enable structure building with ligands of higher denticity. Integration with machine learning models that predict coordinating atom identities enables truly high-throughput, de novo TMC generation.

We describe applications of molSimplify outside of isolated TMCs, including extensions to periodic systems (i.e., particularly metal–organic frameworks) and to metalloenzymes through the protein3D class. We demonstrate our improved combined structure prediction and generation workflow by generating structures of a database of experimentally characterized Ir complexes from only the SMILES strings of their respective ligands.

We envision that recent enhancements will make the code easily extendible to other periodic materials such as covalent organic frameworks and zeolites or to multimetallic transition metal complexes.

https://molsimplify.mit.edu/

ChemRxiv
#method


Chem ML/AI/Datasets

Advancing Structure Elucidation with a Flexible Multi-Spectral AI Model

https://doi.org/10.1002/anie.202517611

Validating chemical synthesis success requires confirming the desired product using various analytical techniques. While spectroscopic data collection is increasingly automated, interpreting results remains a major bottleneck, often requiring expert input. With advances in laboratory automation and high-throughput synthesis, this challenge is expected to intensify.

We introduce the MultiModalSpectralTransformer (MMST), a machine learning method that predicts chemical structures directly from diverse spectral data (NMR, IR, and MS). Trained on 4 million simulated compounds, MMST achieves 72% and 80% as top-1 and top-3 accuracy, respectively. To address out-of-distribution challenges, we implemented an active learning improvement cycle that generates molecules in similar chemical spaces, enabling the model to adapt to chemical structures beyond its original training data. We demonstrate MMST's capabilities through comprehensive benchmarking across diverse molecular weight ranges and chemical spaces. Notably, despite training solely on simulated data, MMST demonstrates good performance with experimental spectra. This research represents a significant advancement in automated structure elucidation, offering a powerful and adaptable tool that bridges the gap between simulated and real-world data.
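For readers unfamiliar with top-k accuracy, the sketch below shows how such a number is computed from ranked candidate lists. The SMILES strings are toy examples, and exact string comparison is a simplifying assumption: a real evaluation would compare canonicalized structures, and the post does not say how MMST's matches were scored.

```python
def top_k_accuracy(ranked_predictions, targets, k):
    """Fraction of targets found among the first k ranked candidates."""
    hits = sum(target in ranked[:k] for ranked, target in zip(ranked_predictions, targets))
    return hits / len(targets)

# Hypothetical ranked candidates for three spectra (best guess first).
ranked_predictions = [
    ["CCO", "COC", "CC=O"],           # correct structure ranked first
    ["c1ccccc1", "C1CCCCC1", "CC"],   # correct structure ranked second
    ["CC(=O)O", "OCC=O", "COC=O"],    # correct structure missing
]
targets = ["CCO", "C1CCCCC1", "CCC"]

top1 = top_k_accuracy(ranked_predictions, targets, k=1)
top3 = top_k_accuracy(ranked_predictions, targets, k=3)
print(top1, top3)
```

Top-3 can never be lower than top-1, since every top-1 hit is also a top-3 hit; the gap between the two shows how often the right structure is proposed but not ranked first.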

📕Angewandte Chemie (IF=16.9)
#method


Chem ML/AI/Datasets

Our colleagues at OdanChem have launched their own channel: https://t.me/odanchem

OdanChem is a Russian chemical information search service with the world's largest database of NMR spectra.

One of its key features is the ability to solve the inverse spectroscopic problem: upload a spectrum and find the structure that matches it.

Their database contains:
>17 million NMR spectra covering 37 nucleus types
>20 million molecules
>2 million IR spectra
>500k HPLC and GC records

The channel has useful material for chemists, for example "How to submit a sample for NMR?"

Our colleagues have also added a search over 10 million chemical reactions, which already partly replaces Reaxys and SciFinder.

👉🏻 https://t.me/odanchem

OdanChem research

About the OdanChem platform:
https://odanchem.org/
Lab notebook:
https://lab.odanchem.org/
For all questions: @ole_afan


Chem ML/AI/Datasets

SynTwins: a retrosynthesis-guided framework for synthesizable molecular analog generation

🔥

https://doi.org/10.1039/D5SC05225D

The disconnect between AI-generated molecules with desirable properties and their synthetic feasibility remains a critical bottleneck in computational discovery of drugs and materials. While generative AI has accelerated the proposal of candidate molecules, many of these structures prove challenging or impossible to synthesize using established chemical reactions.

Here, we introduce SynTwins, a novel retrosynthesis-guided molecule design framework that finds synthetically accessible molecular analogs by emulating expert chemists' strategies in three steps: retrosynthesis, searching similar building blocks, and virtual synthesis. Using a search algorithm instead of a stochastic data-driven generator, SynTwins outperforms state-of-the-art machine learning models at exploring synthetically accessible analogs while maintaining high structural similarity to original target molecules. Furthermore, when integrated into existing molecular property-optimization frameworks, our hybrid approach produces synthetically feasible analogs with minimal loss in property scores.
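The second of the three steps, searching for similar building blocks, can be illustrated with a nearest-neighbour lookup by Tanimoto similarity. This is only a sketch under loud assumptions: the fingerprints are hand-written sets of bit indices, the catalog names are hypothetical, and the post does not specify which similarity measure or fingerprint SynTwins actually uses.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    if not a and not b:
        return 1.0  # two empty fingerprints are treated as identical
    return len(a & b) / len(a | b)

def nearest_building_block(query_bits, catalog):
    """Return (name, similarity) of the catalog entry most similar to the query."""
    return max(
        ((name, tanimoto(query_bits, bits)) for name, bits in catalog.items()),
        key=lambda item: item[1],
    )

# Hypothetical fingerprints: each building block is a set of "on" bit indices.
catalog = {
    "block_A": {1, 2, 3, 4},
    "block_B": {2, 3, 5},
    "block_C": {7, 8, 9},
}
query = {1, 2, 3, 5}  # fragment obtained from a (not shown) retrosynthesis step

best_name, best_score = nearest_building_block(query, catalog)
print(best_name, best_score)
```

Because the lookup is a deterministic search over a purchasable catalog rather than a sample from a generative model, every proposed analog is assembled from known starting materials, which is the design choice the abstract contrasts with stochastic generators.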

Our comprehensive benchmarking across diverse molecular datasets demonstrates that SynTwins effectively bridges the gap between computational design and experimental synthesis, providing a practical solution for accelerating the discovery of synthesizable molecules with desired properties for a wide range of applications.

📕Chemical Science (IF=7.5)
#method


Chem ML/AI/Datasets

Forwarded from Химия в России и за рубежом (Chemistry in Russia and Abroad, the IGIC RAS channel)

The Chemical Information Agency (ХИА) is starting its work!

ХИА is a professional community building a shared information space for everyone connected with chemistry. The agency's goal is to help professionals keep up with key events and to show everyone interested in chemistry its fundamental role in the modern world.

The main areas ХИА will cover:
• Chemical science: new discoveries, publications in leading scientific journals, and reviews of promising research directions.
• Chemical education: university news, announcements of student conferences and olympiads, and useful materials for students and teachers.
• Chemical industry: innovative technologies, environmental solutions, market analysis, and interviews with industry representatives.
• Conferences and seminars: announcements and reviews of materials from international and Russian forums, industry congresses, and educational schools.
• History of chemistry: popular articles on the development of the science, biographies of outstanding chemists, archival materials, and little-known facts.
• Official: documents, regulations, grants, and competitions in chemistry and related sciences.
• People: congratulations to scientists, managers, and leading specialists on awards, prizes, honorary titles, and anniversaries.
• Chemistry at school: accessible materials for teachers and pupils, including experiments, teaching resources, and preparation for the Unified State Exam and olympiads.
• Incidents: information about accidents, incidents, and emergencies in the chemical industry worldwide, with analysis of their causes and consequences.
ХИА positions itself as a community where specific people and their achievements stand behind every news item. The agency is open to collaboration and invites you to send news, press releases, and announcements to hia@igic.ras.ru.

Where to read ХИА:
• website "Химическое информационное агентство" (https://cheminform.ru/)
• Telegram channel "Первый химический" (https://t.me/firstchemical)
• VKontakte group "Первый химический" (https://vk.com/firstchemical)

Let's fill the information space with the brightest and most significant events from the world of chemistry!

#российскаянаука


Chem ML/AI/Datasets

tmQMg* Data Set: Excited State Properties of 74k Transition Metal Complexes

🔥

https://pubs.acs.org/doi/10.1021/acs.jcim.5c01958

The application of machine learning approaches to meaningful problems in chemistry and materials science is still challenged by the limited availability of data. In order to close this gap, we report the tmQMg* data set, which provides excited state properties for 74k mononuclear transition metal complexes extracted from the Cambridge Structural Database. All properties were computed at the TD-DFT ωB97xd/def2SVP level of theory. The strongest electron excitations in the ultraviolet, visible, and near-infrared ranges are included, together with the wavelengths and intensities of the first 30 excited states.

Further, natural transition orbitals were computed for the strongest excitations in the visible range to determine the nature of the associated charge transfers. By computing the TD-DFT spectra in both gas phase and acetone, we quantified solvatochromic effects, which are also provided with the data set, in terms of both wavelength shifts and intensity changes.
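A solvatochromic shift is simply the difference between the same excitation computed in solvent and in the gas phase. The sketch below assumes a sign convention (solvent minus gas, so a positive wavelength shift is a red shift) and uses made-up wavelengths; the actual convention and column names in tmQMg* are not given in this post.

```python
HC_EV_NM = 1239.841984  # Planck constant times speed of light, in eV·nm

def excitation_energy_ev(wavelength_nm):
    """Convert an absorption wavelength in nm to an excitation energy in eV."""
    return HC_EV_NM / wavelength_nm

def solvatochromic_shift_nm(gas_nm, solvent_nm):
    """Wavelength shift on solvation; positive means a red (bathochromic) shift."""
    return solvent_nm - gas_nm

# Hypothetical wavelengths for one excitation of one complex.
gas_nm, acetone_nm = 450.0, 465.0

shift_nm = solvatochromic_shift_nm(gas_nm, acetone_nm)
shift_ev = excitation_energy_ev(acetone_nm) - excitation_energy_ev(gas_nm)
print(f"{shift_nm:+.1f} nm, {shift_ev:+.3f} eV")
```

Note that the two shifts carry opposite signs: a longer wavelength in acetone corresponds to a lower excitation energy.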

The tmQMg* data set will enable the development of discriminative and generative artificial intelligence models with respect to absorption spectra, charge transfer character, and solvatochromism, enabling novel advances in the field of transition metal photochemistry.

📕Journal of Chemical Information and Modeling (IF=5.3)
#dataset


Chem ML/AI/Datasets

SpecML: web tool for predicting the spectral properties of BODIPYs🏛

https://doi.org/10.1016/j.saa.2025.127091

In this paper, we present the results of training machine learning (ML) models for accurate prediction of several key photophysical characteristics (absorption maximum wavelength, molar absorption coefficient, emission maximum wavelength, fluorescence quantum yield and lifetime, singlet oxygen generation quantum yield) for BODIPYs. ML models were trained using experimental data comprising more than 35,000 records for the predicted parameters. Particular emphasis was placed on model interpretability and on accounting for the effect of the solvent on the predicted photophysical parameters. To ensure open data access and a user-friendly interface, all developed models were integrated into the web tool SpecML (http://specml.isc-ras.ru/).

SpecML allows prediction of photophysical parameters for individual BODIPY molecules and the screening of entire series of BODIPYs. We believe that the SpecML web tool will become an effective resource for accelerating the rational design of BODIPYs with desired photophysical properties and will be useful for a wide range of researchers in the fields of photonics, organic electronics, and molecular design.

SpecML: http://specml.isc-ras.ru/

📕Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy (IF=4.6)

#method


Chem ML/AI/Datasets

Machine Learning-Assisted Crystal Structure Prediction of Solid-State Electrolytes Reveals Superior Ionic Conductivity in Metastable Edge-Sharing Phases

https://doi.org/10.1021/jacs.5c15665

Significant attention has been devoted to developing novel solid-state electrolytes (SSEs) with high ionic conductivity for all-solid-state batteries (ASSBs). However, most studies have primarily focused on compositional substitutions, often overlooking the fundamental role of inherent crystal structures on ion transport.

To address this, we introduce a theoretical crystal structure prediction (CSP) approach based on the machine-learning moment tensor potential (MTP). The proposed approach successfully identifies novel SSE structures and reproduces 12 experimental crystal structures. Using a phase-diagram-guided strategy, CSP is applied to four promising SSE candidates, Li2SiS3, Li2GeS3, Li4SiGeS6, and Li4SiSnS6, to assess their polyhedral connectivity, relative stability, and Li-ion transport properties.

The results reveal that metastable edge-sharing phases exhibit superior Li-ion mobility compared with their stable corner-sharing counterparts. This superior conductivity is attributed to the Li-ion accessible volume, quantified by the packing ratio (fraction of the unit cell volume occupied by nonconductive volume) and by the dynamic distortion of the Li–S4 sublattice, which represents the local environment encountered by migrating Li-ions. The metastable phases feature higher packing efficiency, larger Li–S4 sublattice volume, and greater distortion, all of which contribute to improved Li-ion transport. This study highlights the potential of CSP to design novel SSEs and high-performance ASSBs.

📕Journal of the American Chemical Society (IF=15.6)
#method


Chem ML/AI/Datasets

🎊 We are excited to share our new and very important release

MixtureSolDB, dataset of solubility values for organic compounds in binary mixtures of solvents at various temperatures

https://doi.org/10.26434/chemrxiv-2025-m51v8

Many of you remember our solubility dataset BigSolDB 2.0, which we published in July in Scientific Data📕.

BigSolDB had one fundamental limitation: it did not cover cases where a compound dissolves in a binary solvent mixture, and the existing datasets (BaoDB, MixSolDB) were, in our view, too small for ML.

So we decided to assemble the world's largest dataset on binary solvent mixtures. That is how MixtureSolDB came about.

It includes:
• 175626 experimentally measured solubility values
• 813 unique compounds
• 750 unique binary solvent mixtures
• 3023 unique solute / binary solvent mixture systems
• data from 1119 peer-reviewed articles

The dataset is suitable both for training and benchmarking various ML models and for direct analysis of the experimental data.

For convenient visualization we also built an interactive web interface with 3D solubility plots and search by trivial names (Aspirin, Paracetamol, etc.):
https://mixturesoldb.streamlit.app/

As always, MixtureSolDB can be downloaded from Zenodo:
https://zenodo.org/records/17846307
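The summary counts above (unique compounds, unique solvent mixtures, unique solute/mixture systems) are the kind of statistics one can recompute from the raw table. The sketch below shows the grouping logic on four made-up rows; the field names `solute`, `solvent_1`, `solvent_2` and `temperature_K` are hypothetical and may not match the real MixtureSolDB columns.

```python
from collections import Counter

# Hypothetical rows; real MixtureSolDB column names and values may differ.
records = [
    {"solute": "aspirin", "solvent_1": "ethanol", "solvent_2": "water", "temperature_K": 298.15},
    {"solute": "aspirin", "solvent_1": "water", "solvent_2": "ethanol", "temperature_K": 308.15},
    {"solute": "aspirin", "solvent_1": "acetone", "solvent_2": "water", "temperature_K": 298.15},
    {"solute": "paracetamol", "solvent_1": "ethanol", "solvent_2": "water", "temperature_K": 298.15},
]

def system_key(row):
    """Identify a solute/binary-mixture system; the order of the two solvents is ignored."""
    return (row["solute"], frozenset((row["solvent_1"], row["solvent_2"])))

points_per_system = Counter(system_key(row) for row in records)

n_systems = len(points_per_system)
n_mixtures = len({mixture for _, mixture in points_per_system})
n_solutes = len({solute for solute, _ in points_per_system})
print(n_systems, n_mixtures, n_solutes)
```

Using a `frozenset` for the solvent pair is a deliberate choice here: it treats ethanol/water and water/ethanol as the same mixture, which matters when different source papers list the solvents in different orders.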


Chem ML/AI/Datasets

Identifying Dynamic Metal–Ligand Coordination Modes with Ensemble Learning

https://pubs.acs.org/doi/10.1021/jacs.5c17169

In this work, we curate data sets of hemilabile and nonhemilabile ligands from experimentally characterized structures in the Cambridge Structural Database, analyze trends in observed coordination modes, and introduce four exhaustive and mutually exclusive types of hemilability.

Using these labeled data sets, we train graph neural networks to carry out classification of hemilabile ligands with high accuracy, precision, and recall and develop an ensemble algorithm that predicts primary and alternative chemically plausible coordination modes from SMILES strings in an end-to-end fashion. We demonstrate the utility of our algorithm by generating novel TMCs in predicted coordination modes and calculating the corresponding energy difference due to changes in coordination (i.e., ΔEc) with density functional theory.

Comparing our novel TMCs in multiple poses against an energetic criterion from experimentally observed TMCs confirms the plausibility of our alternative poses. We anticipate that our open-source workflows will accelerate organometallic discovery in experimental and virtual screening campaigns by proposing realistic metal–ligand coordination.

👉🏻Web interface enabling no-code prediction of ligand coordination modes: https://molsimplify.mit.edu/pydentate.html

📕Journal of the American Chemical Society (IF=15.6)
#method

ACS Publications

Knowledge of how a ligand coordinates a metal is essential for mechanistic and data-driven studies of transition metal complexes (TMCs), but most analyses assume a single binding interaction for a given metal–ligand pair. In catalysis, many ligands engage…

632 views · 07:05

Chem ML/AI/Datasets

2501.09223v2.pdf

2.6 MB

Foundations of Large Language Models

https://arxiv.org/abs/2501.09223

Today we would like to step away from chemistry and share a recent 250+ page book on LLMs:

This is a book about large language models. As indicated by the title, it primarily focuses on foundational concepts rather than comprehensive coverage of all cutting-edge technologies. The book is structured into five main chapters, each exploring a key area: pre-training, generative models, prompting, alignment, and inference. It is intended for college students, professionals, and practitioners in natural language processing and related fields, and can serve as a reference for anyone interested in large language models.

680 views · edited 14:05

Chem ML/AI/Datasets

oMeBench: Towards Robust Benchmarking of LLMs in Organic Mechanism Elucidation and Reasoning

🔥

https://arxiv.org/abs/2510.07731

We address this by introducing oMeBench, the first large-scale, expert-curated benchmark for organic mechanism reasoning in organic chemistry. It comprises over 10,000 annotated mechanistic steps with intermediates, type labels, and difficulty ratings.

Furthermore, to evaluate LLM capability more precisely and enable fine-grained scoring, we propose oMeS, a dynamic evaluation framework that combines step-level logic and chemical similarity.
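This post does not give the actual oMeS definition, so the snippet below is only a toy illustration of the general idea of mixing a step-level label check with a similarity term. Everything in it is an assumption made for the example: the function names, the equal default weighting, the use of Tanimoto similarity over plain feature sets, and the step-by-step positional alignment.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def toy_mechanism_score(predicted, reference, weight=0.5):
    """Average per-step score over the reference steps.

    Each step is a dict with a 'type' label and a 'features' set.
    Steps are compared position by position; reference steps with no
    predicted counterpart contribute a score of 0.
    """
    if not reference:
        return 0.0
    total = 0.0
    for i, ref in enumerate(reference):
        if i >= len(predicted):
            continue
        pred = predicted[i]
        label_match = 1.0 if pred["type"] == ref["type"] else 0.0
        similarity = tanimoto(pred["features"], ref["features"])
        total += weight * label_match + (1 - weight) * similarity
    return total / len(reference)


# Hypothetical two-step mechanism: the first predicted step is exact,
# the second has the wrong type label and only partly matching features.
reference = [
    {"type": "proton_transfer", "features": {"O-H", "N-H"}},
    {"type": "nucleophilic_addition", "features": {"C=O", "O-H"}},
]
predicted = [
    {"type": "proton_transfer", "features": {"O-H", "N-H"}},
    {"type": "elimination", "features": {"C=O", "C-N"}},
]

score = toy_mechanism_score(predicted, reference)
```

Scoring each step separately, rather than only checking the final product, is what allows partial credit for a mechanism that starts correctly and then goes wrong.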

We analyze the performance of state-of-the-art LLMs, and our results show that although current models display promising chemical intuition, they struggle with correct and consistent multi-step reasoning. Notably, we find that using a prompting strategy and fine-tuning a specialist model on our proposed dataset increases performance by 50% over the leading closed-source model.

#benchmark

712 views · edited 07:23

Chem ML/AI/Datasets

Computer vision for high-throughput materials synthesis: a tutorial for experimentalists🔥

https://doi.org/10.1039/D5DD00384A

Here, we aim to fill that identified gap and present a structured tutorial for experimentalists to integrate computer vision into high-throughput materials research, providing a detailed roadmap from data collection to model validation.

Specifically, we describe the hardware and software stack required for deploying CV in materials characterization, including image acquisition, annotation strategies, model training, and performance evaluation.

As a case study, we demonstrate the implementation of a CV workflow within a high-throughput materials synthesis and characterization platform to investigate the crystallization of metal–organic frameworks (MOFs). By outlining key challenges and best practices, this tutorial aims to equip chemists and materials scientists with the necessary tools to harness CV for accelerating materials discovery.

📕Digital Discovery (IF=6.2)
#method

pubs.rsc.org

Advances in high-throughput instrumentation and laboratory automation are revolutionizing materials synthesis by enabling the rapid generation of large libraries of novel materials. However, efficient characterization of these synthetic libraries remains…

697 views · 10:23
