GraphML News (March 23rd) - Neo-1 and Lila Sciences round
🧬 VantAI announced Neo-1, a foundation model for structure prediction and de novo generation that handles a bunch of protein design tasks (folding, co-folding, docking, all-atom molecule design, fragment linking, and more) in a single model instead of separate modules. While we wait for the tech report, we can guesstimate that Neo-1 is an all-atom latent generative model (perhaps a Diffusion Transformer like its competitors, as it’s powered by a hefty cluster of H100s) with some advanced sampling techniques beyond standard guidance - the blog post talks about optimizing for non-differentiable properties with reward-like models, which sounds quite similar to the ICLR 2025 paper on posterior prediction.
As impressive as the modeling advances are, true aficionados know that data diversity and distribution are even more important at scale - on that front, VantAI introduces NeoLink, a massive data generation flywheel based on cross-linking mass spectrometry (XLMS). Reported experiments suggest it brings massive improvements in quality, so it’s likely the key innovation and the focus of further scaling. The graphics in the blog post are amazing and the graphic designer should get a raise 📈.
💸 Lila Sciences came out of stealth with $200M in seed funding. Lila will focus on materials discovery and automated self-driving labs while alluding to Superscience, an AI 4 Science equivalent of the Superintelligence you often hear about from LLM folks, which would massively speed up exploration pipelines. Lila is part of the Flagship Pioneering ecosystem (you might know Generate Biomedicines, whose Chroma generative model made some noise last year) and attracted funding from General Catalyst, March Capital, ARK, and other famous VCs (even the Abu Dhabi Investment Authority). Knowing that OpenAI’s VP of post-training William Fedus left to start his own AI 4 Science company, the area is likely to attract even more VC funding in the near future.
Weekend reading:
Towards Quantifying Long-Range Interactions in Graph Machine Learning: a Large Graph Dataset and a Measurement by Huidong Liang and Oxford folks - introduces new long-range graph datasets extracted from road networks in OpenStreetMap. Good news: graphs are quite large and sparse (100k nodes with 100+ diameter). Less good news: GraphSAGE is still SOTA 🫠
No Metric to Rule Them All: Toward Principled Evaluations of Graph-Learning Datasets by Corinna Coupette, Jeremy Wayland, et al - studies the quality of 11 graph classification datasets; only NCI1, MolHIV, and the LRGB datasets are ok, the rest should be thrown in the garbage.
A Materials Foundation Model via Hybrid Invariant-Equivariant Architectures by Keqiang Yan and a large Texas A&M collab - introduces HIENet, an ML potential rivaling MACE-MP0, Equiformer, and ORB on energy, force, and stress prediction.
Survey on Generalization Theory for Graph Neural Networks by Antonis Vasileiou, Stefanie Jegelka, Ron Levie, and Christopher Morris - everything you wanted to know about GNNs linked to VC dimension, Rademacher complexity, PAC-Bayes, and learning theory. MATH ALERT
GraphML News (April 5th) - Isomorphic Round, Graph Transformers at Kumo, new blogs
Got some news!
💸 Isomorphic Labs raised a generous $600M from Thrive Capital, GV, and Alphabet in the first external round. The attached press release also mentions collaborations with pharma giants Eli Lilly and Novartis - seems like whatever comes next after AlphaFold 3 looks quite appealing to the industry. We’ll keep you posted in our Geometric Wall Street Bulletin.
🏵️ Looking at LLM guts from the graph learning perspective is becoming popular: Anthropic posted a massive study on AI biology - two papers and lots of visual material with strong graph vibes - showing that LLMs perform multi-hop reasoning with concept graphs in mind, and that you can actually identify circuits (DAGs) of activations doing certain kinds of computation.
🚚 Kumo published a nice blog post on using graph transformers at scale in relational DL tasks. Perhaps the most insightful part is about positional encodings - as graphs are large (10M+ nodes in RelBench), global PEs don’t really scale up so they have to resort to more local options like hop encoding or relative PEs. Besides, there is a need for time encoding as the e-commerce graphs are always temporal. Experiments on RelBench bring some noticeable improvements.
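To make the positional encoding part concrete, here is a rough sketch (my own toy code, not Kumo’s implementation) of a local hop-based positional encoding over a sampled ego-graph and a sinusoidal time encoding for event timestamps:

```python
# Toy sketch of hop-based PEs and time encodings; names and shapes are illustrative.
from collections import deque
import torch


def hop_encoding(edges: list[tuple[int, int]], num_nodes: int, root: int = 0,
                 max_hops: int = 8) -> torch.Tensor:
    """One-hot BFS distance of each node from the seed node of the ego-graph."""
    adj = [[] for _ in range(num_nodes)]
    for u, v in edges:                  # undirected ego-graph
        adj[u].append(v)
        adj[v].append(u)
    hops = [max_hops] * num_nodes       # max_hops doubles as the "far away" bucket
    hops[root] = 0
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if hops[v] == max_hops and hops[u] + 1 < max_hops:
                hops[v] = hops[u] + 1
                queue.append(v)
    return torch.nn.functional.one_hot(torch.tensor(hops), max_hops + 1).float()


def time_encoding(timestamps: torch.Tensor, dim: int = 16) -> torch.Tensor:
    """Sinusoidal features of event times relative to the prediction time."""
    freqs = 10.0 ** torch.linspace(0, 5, dim // 2)
    angles = timestamps.unsqueeze(-1) / freqs
    return torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)


print(hop_encoding([(0, 1), (0, 2), (2, 3)], num_nodes=4).shape)   # (4, 9)
print(time_encoding(torch.tensor([0.0, 3600.0, 86400.0])).shape)   # (3, 16)
```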
⌛ Bryan Perozzi from Google Research (the OG of DeepWalk) wrote a review post looking at the milestones of graph learning from pre-historic times (pre-2013) to the most recent applications (those we often highlight in this channel).
Weekend reading:
Why do LLMs attend to the first token? by Federico Barbero, Alvaro Arroyo, and all the familiar folks from Transformers Need Glasses - here the authors study attention sinks and draw parallels to over-squashing and representational collapse.
On that note, don’t miss the talk by Petar Veličković on LLMs as Graph Neural Networks at the recent GLOW reading group - it adds much more context to the research area and hints at the above paper.
Affordable AI Assistants with Knowledge Graph of Thoughts by Maciej Besta and 13 (👀) co-authors. Maciej is the author of the famous Graph of Thoughts, here it’s extended to KGs and agentic environments.
GraphML News (April 13th) - Orb V3, RF Diffusion 2, Breakthrough Prizes, ICML 2025 Workshops
🔮 Orbital Materials released Orb V3, the next version of the universal ML potential. Some improvements include training a wider but shallower model (5-layer MPNN with 1024d MLP instead of 15-layer with 512d in v2), having both versions where forces are predicted directly (non-conservative) or as a gradient of energy (conservative force field), and a good bunch of training tricks listed on github. ORB v3 shows top results on MatBench Discovery and now has a confidence prediction head akin to pLDDT in AlphaFold. The accompanying paper and model checkpoint are available, plug them in right away.
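The conservative vs non-conservative distinction is easy to see in code - a minimal sketch below (a toy energy model standing in for the MPNN, not Orb’s architecture) computes forces as the negative gradient of a predicted energy via autograd:

```python
# Minimal sketch: conservative forces as -dE/dx from a toy energy model.
import torch
import torch.nn as nn


class ToyEnergyModel(nn.Module):
    """Stand-in for an MPNN potential: maps atom positions to a scalar energy."""
    def __init__(self, hidden: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, hidden), nn.SiLU(), nn.Linear(hidden, 1))

    def forward(self, pos: torch.Tensor) -> torch.Tensor:
        return self.mlp(pos).sum()   # total energy of the system


pos = torch.randn(10, 3, requires_grad=True)   # 10 atoms in 3D
model = ToyEnergyModel()

energy = model(pos)
# Conservative force field: F = -dE/dx, curl-free by construction
forces = -torch.autograd.grad(energy, pos, create_graph=True)[0]
print(energy.shape, forces.shape)   # torch.Size([]) torch.Size([10, 3])

# A non-conservative variant would instead predict forces with a separate
# 3-dimensional output head, trading physical guarantees for speed.
```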
🧬 The Baker Lab released RF Diffusion 2 (as a pre-print for now) focusing on de novo enzyme design. RFD2 is a Riemannian flow matching model in both frame and coordinate space (the frame part is very much like FrameFlow) and was trained for 17 days on 24 A100s (rather average compute these days). RFD2 outperforms the original RFDiffusion on older and newly constructed benchmarks; the code is coming soon.
🏆 The Breakthrough Prize (founded by Sergey Brin, Mark Zuckerberg, Yuri Milner, and other famous tech folks) announced the 2025 laureates: the scientists behind GLP-1 drugs (aka Ozempic) got the Life Sciences award, CERN and the Large Hadron Collider got the physics award, and Dennis Gaitsgory got the mathematics prize for proving the geometric Langlands conjecture (from which a proof of Fermat’s Last Theorem stems naturally). Check out also the New Frontiers and New Horizons categories dedicated to younger scientists.
Finally, ICML 2025 announced the list of accepted workshops, and all the usual suspects are there: generative models, comp bio, AI 4 Science, and a handful of LLM and world-model workshops - should be a nice selection for those attending in Vancouver this summer.
Weekend reading:
Orb-v3: atomistic simulation at scale by Benjamin Rhodes, Sander Vandenhaute, and folks from Orbital Materials
Atom level enzyme active site scaffolding using RFdiffusion2 by Woody Ahern, Jason Yim, Doug Tischer, UW and Baker Lab (including the man himself)
Graph Learning Will Lose Relevance Due To Poor Benchmarks
by Maya Bechler-Speicher, Ben Finkelshtein, Fabrizio Frasca, Luis Müller, Jan Tönshoff, Antoine Siraudin, Viktor Zaverkin, Michael M. Bronstein, Mathias Niepert, Bryan Perozzi, Mikhail Galkin, Christopher Morris
📜 arxiv
📣 Our new spicy ICML 2025 position paper. Graph learning is less trendy in the ML world than it was in 2020-2022. We believe the problem is in poor benchmarks that hold the field back - and suggest ways to fix it!
We identified three problems:
#️⃣ P1: No transformative real-world applications - while LLMs and geometric generative models become more powerful and solve complex tasks every generation (from reasoning to protein folding), how transformative could a GNN on Cora or OGB be?
P1 Remedies: The community is overlooking many significant and transformative applications, including chip design and broader ML for systems, combinatorial optimization, and relational data (as highlighted by RelBench). Each of them offers $billions in potential outcomes.
#️⃣ P2: While everything can be modeled as a graph, often it should not be. We ran a simple experiment probing a vanilla DeepSet without edges and a GNN on Cayley graphs (fixed edges for a given number of nodes) on molecular datasets - the performance is quite competitive.
#️⃣ P3: Bad benchmarking culture (this one hits hard) - it’s a mess 🙂
Small datasets (don’t use Cora and MUTAG in 2025), no standard splits, and in many cases recent models are clearly worse than GCN / Sage from 2020. It gets worse when evaluating generative models.
Remedies for P3: We need more holistic benchmarks which are harder to game and saturate - while it’s a common problem for all ML fields, standard graph learning benchmarks are egregiously old and rather irrelevant for the scale of problems doable in 2025.
💡 As a result, it’s hard to build a true foundation model for graphs. Instead of training each model on each dataset, we suggest using GNNs / GTs as processors in the “encoder-processor-decoder” blueprint, train them at scale, and only tune graph-specific encoders/decoders.
For example, we pre-trained several models on PCQM4M-v2, COCO-SP, and MalNet Tiny, and fine-tuned them on PascalVOC, Peptides-struct, and Stargazers to find that graph transformers benefit from pre-training.
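For the curious, a minimal sketch of that encoder-processor-decoder blueprint (illustrative class names, not the paper’s code): a shared processor pre-trained at scale, with thin dataset-specific encoders and decoders around it.

```python
# Sketch of the "encoder-processor-decoder" blueprint; class names are illustrative.
import torch
import torch.nn as nn
from torch_geometric.nn import GCNConv


class SharedProcessor(nn.Module):
    """Pre-trained at scale and reused across datasets (frozen or fine-tuned)."""
    def __init__(self, dim: int = 128, layers: int = 4):
        super().__init__()
        self.convs = nn.ModuleList(GCNConv(dim, dim) for _ in range(layers))

    def forward(self, x, edge_index):
        for conv in self.convs:
            x = torch.relu(conv(x, edge_index))
        return x


class GraphModel(nn.Module):
    def __init__(self, processor, in_dim: int, out_dim: int, dim: int = 128):
        super().__init__()
        self.encoder = nn.Linear(in_dim, dim)    # dataset-specific, always trained
        self.processor = processor               # shared across datasets
        self.decoder = nn.Linear(dim, out_dim)   # dataset-specific, always trained

    def forward(self, x, edge_index):
        return self.decoder(self.processor(self.encoder(x), edge_index))


processor = SharedProcessor()
mol_model = GraphModel(processor, in_dim=9, out_dim=1)        # e.g. molecular regression
vision_model = GraphModel(processor, in_dim=14, out_dim=21)   # e.g. superpixel segmentation
```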
The project started around NeurIPS 2024 when Christopher Morris gathered us to discuss the pain points of graph learning and how to continue doing impactful research in this area. I believe the outcomes are promising, and we can re-imagine graph learning in 2025 and beyond!
GraphML News (May 10th) - PageRank and New Pope, Scientific Agents, more blogs
🤌 🇻🇦 Researchers from Bocconi University in Milan rolled out the best use of network science of 2025: using centrality measures to predict the outcome of the conclave (which elects the next Pope). They mined a graph of Vatican cardinals based on their job duties, informal relationships, and “spiritual genealogies”, and computed a bunch of centrality measures - eigenvector centrality (probably a PageRank variant), betweenness centrality (affordable for small networks), and some clustering metrics. One of them did rank the actually elected Pope at the top (although the others didn’t have him in their top 5), which is a cool result. Good ole PageRank still makes headlines in 2025!
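If you want to play the same game at home, a toy reconstruction of the idea fits in a few lines of networkx (the edges below are hypothetical, not the Bocconi data):

```python
# Toy version: build a cardinal relationship graph, rank nodes by centrality.
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("Cardinal A", "Cardinal B"), ("Cardinal A", "Cardinal C"),
    ("Cardinal B", "Cardinal D"), ("Cardinal C", "Cardinal D"),
    ("Cardinal D", "Cardinal E"),   # hypothetical ties from duties / mentorship
])

pagerank = nx.pagerank(G)
eigenvector = nx.eigenvector_centrality(G)
betweenness = nx.betweenness_centrality(G)   # affordable for small networks

top = sorted(pagerank, key=pagerank.get, reverse=True)[:3]
print("Most papabile by PageRank:", top)
```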
🦅 FutureHouse announced the Platform for scientific discovery tasks. Practically, the Platform combines 4 distinct multimodal agents (avian beings): Crow for search, Falcon for deep search, Owl for questions a-la “has anyone done X before”, and Phoenix as the next-gen ChemCrow for molecular design. The agents accept whatever text and image inputs you have at hand and will search a huge collection of scientific documents. Having worked with PaperQA before, I have had good experiences with FH tools - there might be more announcements coming soon about new scientific results achieved with those agents.
✍️ More blogposts! Kumo is on a writing spree: a massive post on relational graph transformers for RelBench that improve over GNNs (but I bet GNNs are much faster and more scalable), and a more technical writeup on enabling torch.compile for GNNs, which results in 30% training speedups. ProTip: GNNs in JAX are JIT’table from the very beginning 😉
🧬 AITHYRA announced the AI4Science symposium to take place in Vienna on September 8-10 with top speakers from AI and Life Sciences areas.
Weekend reading:
System of Agentic AI for the Discovery of Metal-Organic Frameworks by Theo Jaffrelot Inizan, Sherry Yang, Aaron Kaplan, Yen-hsu Lin, and a team of UC Berkeley and DeepMind researchers - another take on the multi-agent discovery pipeline combining LLMs, diffusion models, and ML potentials for creating new metal-organic frameworks (MOFs); it has already helped synthesize 5 new structures.
Plexus: Taming Billion-edge Graphs with 3D Parallel GNN Training by Aditya Ranjan and U of Maryland folks - a new platform for scaling GNNs to supercomputers, tried on up to 2048 GPUs on Frontier and Perlmutter with graphs up to OGB Papers100M.
GraphML News (May 14th) - KumoRFM, Open Molecules 2025, TxPert
Lots of news over the past two weeks other than new Gemini and Claude models!
🏆 KumoAI presented KumoRFM - the first graph foundation model for relational databases, capable of zero-shotting node regression, node classification, and link prediction. Given any set of relational tables with arbitrary categorical or numerical features, transformed into a graph, you can now zero-shot typical tasks like regression or classification. Perhaps the biggest difference of KumoRFM compared to other inductive models is its use of in-context learning: for each prediction task, it mines not only an ego-graph around the target entity but also ego-graphs around relevant nodes with similar labels. The backbone for encoding ego-graphs in node-level tasks is the Relational Graph Transformer (another new pre-print), and the resulting graph vectors are aggregated with attention pooling. Besides, KumoRFM has a built-in GNN Explainer to give some transparency to its decisions. As is typical for AI labs, Kumo doesn’t disclose which data KumoRFM was trained on, but they claim to zero-shot the whole of RelBench, which is a great achievement (albeit the results are slightly worse than their supervised ContextGNN).
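Since the technical details are sparse, here is a purely speculative sketch of how such in-context prediction over ego-graphs could look, based only on the description above (toy code, not KumoRFM’s actual pipeline; the encoder is a crude stand-in for the Relational Graph Transformer):

```python
# Speculative sketch of ego-graph in-context learning; all names are hypothetical.
import torch
import torch.nn as nn


class EgoGraphEncoder(nn.Module):
    """Crude stand-in for a graph transformer: ego-graph features -> one vector."""
    def __init__(self, in_dim: int, dim: int = 128):
        super().__init__()
        self.proj = nn.Linear(in_dim, dim)

    def forward(self, node_feats: torch.Tensor) -> torch.Tensor:
        return self.proj(node_feats).mean(dim=0)   # mean pooling for the sketch


def in_context_predict(encoder, target_graph, context_graphs, context_labels):
    """Attend from the target ego-graph over labeled context ego-graphs."""
    q = encoder(target_graph)                                   # (dim,)
    keys = torch.stack([encoder(g) for g in context_graphs])    # (k, dim)
    attn = torch.softmax(keys @ q / q.shape[0] ** 0.5, dim=0)   # (k,)
    return (attn * context_labels).sum()                        # label-weighted readout


encoder = EgoGraphEncoder(in_dim=16)
target = torch.randn(12, 16)                        # ego-graph of the target entity
context = [torch.randn(10, 16) for _ in range(32)]  # labeled context ego-graphs
labels = torch.rand(32)                             # e.g. customer lifetime values
print(in_context_predict(encoder, target, context, labels))
```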
⚛️ FAIR Chemistry, CMU, National Labs, Genentech, and a large scientific collab presented Open Molecules 2025 and the Universal Model for Atoms - the largest dataset of simulations: 100M molecules, biomolecules, complexes, and MOFs with a plethora of properties to predict, covering structures of up to 350 atoms (10x larger than any other dataset). It took 6 billion CPU hours to complete the simulations 👀 OMol25 will probably be the main dataset for training the next gen of ML potentials - you can’t find a larger open-source dataset anywhere else. As a baseline, FAIR prepared the Universal Model for Atoms based on an equivariant GNN (eSEN) with a mixture of experts (hello from the LLM world).
🦠 Advancing Drug Discovery Outcomes with Virtual Cells by Valence Labs (Recursion) - introduces the data pipeline and computational platform for building a “virtual cell”. As a proof of concept, Valence trained TxPert, a model for predicting transcriptional responses to combinatorial genetic perturbations, that outperforms GEARS and scLAMDA. Besides, Valence put out a nice white paper on virtual cells with cool illustrations.
Weekend reading (before we get into all new fancy NeurIPS submissions) - theory alert:
Covered Forest: Fine-grained generalization analysis of graph neural networks by Antonis Vasileiou et al - on the generalization power of MPNNs.
Graph Representational Learning: When Does More Expressivity Hurt Generalization? by Sohir Maskey et al - another relevant work messaging that fancy expressive GNN architectures might actually be pretty bad at OOD generalization. Some day GNN theory folks will discover that Attention is All You Need 🙂
Addressing the Scarcity of Benchmarks for Graph XAI by Michele Fontanesi et al - proposes a new method to automate the generation of explainable-AI benchmarks for graph classification, where at least one of the classes is explained by a specific subgraph motif. It also bundles 15 new benchmarking tasks. Thanks to Domenico Tortorella for the pointer.
GraphML News (June 14th) - Boltz-2, OpenBind, Musings on equivariance
Back to the normal schedule!
🧬 The biggest announcement of the week - MIT and Recursion released Boltz-2, perhaps the most successful open-source reproduction of AlphaFold 3. v2 brings binding affinity prediction (orders of magnitude faster than physics simulations), model improvements, and inference speedups. The preprint also reports an experiment combining Boltz with SynflowNet to generate binders for the TYK2 protein. Code and model weights are already available.
🇬🇧 The UK announced the OpenBind initiative, aiming to collect data for 500k protein-ligand complexes using X-ray crystallography and synchrotron facilities at the Diamond Light Source. The academic side includes all the big names you’d expect - Charlotte Deane, Frank von Delft, David Baker - while the industrial side includes Isomorphic Labs, Roche, Boltz, and others. Let’s hope it will be the next PDB for protein design.
🌀 The need for equivariance continues to be a hot discussion topic - first, Chaitanya K. Joshi published a post reviewing two sides of the spectrum: low-data regimes where equivariance might help (by restricting model capacity) and high-data regimes (like generative modeling) where symmetries can be learned from data. Later on, Mark Neumann (Orbital Materials) published his take on the need for rotational equivariance and conservation of energy and how those can be achieved without strictly equivariant models (tricks like Equigrad, for instance). The post also features a handful of fresh papers on the topic - check them out too.
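For those who want to poke at this themselves, a generic sketch of measuring (or softly penalizing) the rotation-equivariance gap of an unconstrained force model - a plain consistency check, not the specific Equigrad trick from the post:

```python
# Generic rotation-equivariance check for any (N, 3) -> (N, 3) force predictor.
import torch


def random_rotation() -> torch.Tensor:
    """Random 3x3 rotation via QR; flip a column if needed so det = +1."""
    q, _ = torch.linalg.qr(torch.randn(3, 3))
    if torch.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q


def equivariance_gap(force_model, pos: torch.Tensor) -> torch.Tensor:
    """|| f(R x) - R f(x) ||: zero for an exactly equivariant model; can be used
    as a diagnostic or added to the training loss as a soft penalty."""
    R = random_rotation()
    f = force_model(pos)             # (N, 3) predicted forces
    f_rot = force_model(pos @ R.T)   # forces on the rotated structure
    return (f_rot - f @ R.T).norm()


force_model = torch.nn.Linear(3, 3)   # stand-in for a non-equivariant predictor
pos = torch.randn(20, 3)
print(equivariance_gap(force_model, pos))
```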
Weekend reading (or github repos, heh):
Automated Non-Hermitian Spectral Graph Construction and GnLTransformer - a cool application that takes in the characteristic polynomial of Hamiltonians of 1D crystals and returns the spectral graph
Anomaly Detection with Graph Neural Networks (GNNs) - a PyG library and datasets for anomaly detection; thanks to Federico Bello for the pointer
On Measuring Long-Range Interactions in Graph Neural Networks - proposes a metric to compute effective range of MPNNs and GTs and studies LRGB tasks as requiring long-range connections or not
GraphML News (June 21st) - Skala, Temporal RDL, Future of Graph Learning, Erwin
⚛️ MSR AI 4 Science announced Skala - a machine-learned exchange-correlation (XC) functional for estimating chemical properties of molecules (energies and force fields). Skala represents molecules via density features obtained from the meta-generalized-gradient approximation (meta-GGA), evaluated on what is practically an irregular integration grid. The main model employs radial functions and spherical harmonics to capture non-local interactions and runs the integration over space. Skala was trained on a new dataset of 150k data points and reaches SOTA MAE on the W4-17 dataset. The preprint and data are available (lots of fancy equations in the appendix).
🕸️ Kumo published an interesting piece on temporal dependencies in relational DL where features change over time - note that in RelBench edges have timestamps but features are static. They tried predictive forecasting (training a regression head over the graph and the history of features) vs generative forecasting (training a diffusion model instead), which experimentally give pretty similar results. Transformers are used all the way through (both for graph encoding and for sequence modeling).
🌟 The Graph Learning on Wednesdays (GLOW) reading group summarized a series of recent discussions on the future of graph learning in a new blog post with opinions from many renowned researchers as to why graph learning is experiencing a certain identity crisis and lack of glamour compared to LLMs, agents, and mainstream AI research. Partly, it revolves around missing “killer” applications which would attract new researchers - nobody needs another variation of GCN / GAT / GT, or some esoteric positional encodings, or yet another self-supervised loss to train on Cora when you can run true scientific discovery with new generations of LLMs and agents. Our take on the problem is in the ICML’25 position paper - find us in Vancouver to chat and share your opinions in the comments.
🎱 Maksim Zhdanov (UvA) published a nice visual introduction to the ball tree attention used in the recent Erwin Transformer and to how one can expand the receptive field to large structures in subquadratic time. Erwin is quite strong on MD and PDE modeling tasks - check the post to find out about the smart tricks for sparser attention.
🎉 Finally, the GDL book now includes a new Chapter 5 on Graphs - in addition to standard architectures, the chapter talks about asynchronous and topological message passing, as well as looking at the transformer layer (self-attention + MLP) through the lens of message passing. The illustrations are cool - props to the renowned TikZ magician Petar Veličković.
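The message-passing view of self-attention fits in a dozen lines - a simplified single-head, unmasked version (my sketch, not the book’s code):

```python
# Single-head self-attention as attentional message passing on a complete graph.
import torch
import torch.nn as nn

n, d = 6, 32                       # 6 tokens (= nodes), width 32
x = torch.randn(n, d)
Wq, Wk, Wv = (nn.Linear(d, d, bias=False) for _ in range(3))

scores = (Wq(x) @ Wk(x).T) / d ** 0.5    # edge scores for every sender-receiver pair
alpha = torch.softmax(scores, dim=-1)    # normalize over senders per receiver
messages = Wv(x)                         # per-node message (value) vectors
out = alpha @ messages                   # aggregate: sum_j alpha_ij * v_j

# the same thing written as explicit message passing for node 0
out_0 = sum(alpha[0, j] * messages[j] for j in range(n))
print(torch.allclose(out[0], out_0, atol=1e-6))   # True
```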
Weekend reading:
Don’t procrastinate with new papers - go finish those NeurIPS reviews, or the ACs will shame you and send news to your co-authors about how slow you are 🙂
GraphML News (July 4th 🦅) - Chai-2, SAIR dataset, UMA 1.1, Why flow matching generalizes
Some quick news before BBQ time and beating the aliens over NYC.
🧬 Chai Discovery announced Chai-2, which excels at antibody design, generating novel binders for 50+ protein targets and achieving a 16% binding rate in wet-lab tests (that’s quite a lot). The tech report says the backbone is a modified Chai-1, but probably with a lot more new training data (which is a good sign - it’s 2025, models don’t matter as much as data does). Chai-2 was announced just 2 weeks after Boltz-2 - both started as AlphaFold3 reproductions but are now moving in slightly different directions, e.g., Chai-2 is not open-source anymore. We’ll be keeping an eye on their successes.
🧬 🧬 SandboxAQ released the new SAIR dataset (Structurally Augmented IC50 Repository) comprising 5M structures over 1M+ unique protein-ligand systems (folded with Boltz-1x). It’s just 2.5 TB, so you have no excuse not to train the next protein-ligand generative model on SAIR 😉
⚛️ FAIR Chemistry updated their Universal Model for Atoms (UMA) to 1.1 (preprint) and significantly improved the performance on catalysis and molecules tasks - scaling MoE transformers shows benefits and makes adepts of equivariance unhappy 🙂
Weekend reading:
On the Closed-Form of Flow Matching: Generalization Does Not Arise from Target Stochasticity by Quentin Bertrand et al - a nice study of why flow matching generalizes and can generate data outside the training distribution. Turns out it happens thanks to neural nets failing to learn the velocity field exactly.
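For reference, the standard linear-path flow matching setup the paper builds on, written in my notation (the paper’s exact conventions may differ):

```latex
% Interpolant between noise x_0 ~ p_0 and data x_1 ~ p_1:
%   x_t = (1 - t) x_0 + t x_1,  with conditional target velocity u_t = x_1 - x_0.
\mathcal{L}(\theta) = \mathbb{E}_{t,\,x_0,\,x_1}
  \bigl\| v_\theta(x_t, t) - (x_1 - x_0) \bigr\|^2,
\qquad
v^\star(x, t) = \mathbb{E}\bigl[\, x_1 - x_0 \mid x_t = x \,\bigr].
% v* is the exact minimizer and has a closed form when p_1 is the empirical
% training distribution; following it exactly only reproduces training samples,
% so any generalization has to come from v_theta deviating from v*.
```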
GraphML News (Aug 3rd) - Graph Foundation Models from Google, PyG ecosystem expanding
It’s been a while since the last post, let’s catch up with the news!
🔮 ICML brought a handful of announcements - e.g., our team at Google published a blog post on the in-house Graph Foundation Model, which particularly excels on relational data and brings nice (3-40x) gains compared to SOTA tabular models. It’s quite astounding that the tabular ML world has been overlooking graph modeling for this kind of data for years, leaving lots of performance on the table. Well, as we said last year, GFMs are already here and will continue to improve across all axes, from systems and infra to modeling and better generalization.
🌟 Besides that, ICML published the list of outstanding paper awards, and a handful of them use graph learning in one way or another - an excellent reminder that beating old benchmarks by 1% is not that important (looking at you, ZINC aficionados), while smart application of this tool in appropriate cases (and actually designing those cases) is very promising, encouraged by the community, and can bring insights even in the LLM & agentic era.
🔥 The PyG world is expanding - PyG maintainers released an overview paper on PyG 2.0 and its latest features including first-class support for explainability, heterogeneous graphs, and scalability improvements. RelBench and relational data seem to be the main blockbuster use-cases of those features, and it’s great to see PyG is keeping the bar high ⛳ Another fresh addition to the ecosystem is the Torch Geometric Pool library that expands the variety of pooling functions.
⌛ Temporal Graph Modeling (TGM) is another new PyG-based library, designed for temporal and dynamic graphs. It already bundles several standard baselines such as TGN, TGAT, GraphMixer, and EdgeBank, as well as datasets like the Temporal Graph Benchmark. Have a look at the accompanying preprint.
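EdgeBank in particular is a great reality check for temporal link prediction - the whole method is essentially edge memorization. A sketch of the idea (not TGM’s actual API):

```python
# The idea behind EdgeBank (unlimited-memory variant): remember observed edges
# and predict "positive" for any previously seen source-destination pair.
class EdgeBank:
    def __init__(self):
        self.seen: set[tuple[int, int]] = set()

    def update(self, src: int, dst: int, t: float) -> None:
        self.seen.add((src, dst))

    def predict(self, src: int, dst: int) -> float:
        return 1.0 if (src, dst) in self.seen else 0.0


bank = EdgeBank()
for src, dst, t in [(0, 1, 10.0), (1, 2, 11.5), (0, 1, 12.0)]:
    bank.update(src, dst, t)

print(bank.predict(0, 1), bank.predict(2, 0))   # 1.0 0.0
```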
Postdoctoral Researcher Position in Geometric Deep Learning & AI for Science at AITHYRA
A joint position between AITHYRA and the Technical University of Vienna
Michael Bronstein, AITHYRA Scientific Director for AI and Honorary Professor at the Technical University of Vienna, in collaboration with Ismail Ilkan Ceylan, an expert in graph machine learning, invites outstanding candidates to apply for a postdoctoral research position in Geometric Deep Learning, with a strong emphasis on applications to biology and scientific discovery. This unique research collaboration between AITHYRA and the Technical University of Vienna offers an exceptional opportunity to engage in both foundational machine learning research and high-impact interdisciplinary applications in the natural sciences. The position offers access to top-tier academic and industry research ecosystems and is ideally suited for researchers seeking to push the boundaries of geometric and graph-based learning in real-world scientific contexts. The research program is flexible and interdisciplinary.
The application deadline is 31.8.2025. Link
---
Highly recommend applying - working with Michael and Ismail is a great experience 🙂
GraphML News (Aug 9th) - AITHYRA Call for PhD students, Chai Discovery Round, Graph Learning Meets Theoretical CS
While everyone is busy with GPT-5, Opus 4.1, and GPT-OSS, let’s sneak in some graph news!
🎓 A few days ago you could’ve seen AITHYRA’s call for postdocs - but fear not if you are still deciding about starting your scientific career: AITHYRA has a call for PhD students too! The plan includes 15-20 fully funded scholarships at the intersection of AI/ML, molecular technologies, and systems medicine (with a degree from either the Medical University or the Technical University of Vienna). The application deadline is September 10, 2025. Glad to see Vienna becoming a new scientific hub in Europe.
💸 Chai Discovery raised a $70M Series A from Menlo Ventures & Anthology Fund (Anthropic), Thrive Capital, OpenAI, and others (following a $30M seed). The startup is known for its Chai-2 generative model and focuses on antibody design. Congrats to Chai!
📚 The Simons Institute organizes the Graph Learning Meets Theoretical CS workshop (to be held physically at UC Berkeley), inviting renowned professors from both areas (and me, a simple man from industry). The program is packed with a bunch of cool topics, from practical things like graph foundation models up to graphons, invariances, combinatorial optimization, and much more. The talks will be streamed on YouTube, and participation is actually free, so come by if you’re on the UC Berkeley campus.
GraphML News (Aug 30th) - OpenAI enters bio, AtomWorks, OrbMol, NeurIPS workshops
📈 The church of scale enters comp bio: OpenAI published first results on protein design of Yamanaka factors (linked to cell aging) together with Retro Biosciences (where sama happens to be one of the investors). The backbone is gpt-4b micro, initialized from an existing 4o checkpoint, enriched with “tokenized 3D structure data” (remember ESM-3?), and fine-tuned on a specialized dataset. Experimental results are claimed to be quite solid: hit rates of 30-50% (typically it’s less than 10%) along with a bunch of other biochemistry markers. The argument between scalable non-equivariant models and bespoke geometric models just got a new data point: will the raw compute of OpenAI + vanilla transformers conquer the biotech world too? We’ll keep you posted.
🧬 The Baker Lab released RoseTTAFold 3 and AtomWorks, the data processing framework used to train it. While you’ll certainly see general remarks comparing it with AF3 and Boltz, I’d highlight that comp bio folks are starting to recognize the value of data as much as the model itself (something frontier labs recognized quite some time ago). The real engineering will start when they need to serve those protein design models to a few billion clients 😉
⚛️ Orbital Materials released OrbMol, a version of Orb-v3 for molecules (the others are for crystals), trained on Open Molecules 2025. Orb is still an MPNN, which makes it quite fast and useful for MD computations.
By the way, also check out the NeurIPS 2025 workshops - finally more diverse than just LLMs and reasoning - featuring a handful of graph learning venues.
Weekend reading:
Turning Tabular Foundation Models into Graph Foundation Models from Yandex Research - another interesting approach to GFMs via TabPFNv2 over original node features + mined structural features
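The general recipe is straightforward to prototype - a sketch below (not the paper’s code), with a scikit-learn classifier standing in for TabPFNv2:

```python
# Node classification as a tabular problem: raw features + mined structural features.
import numpy as np
import networkx as nx
from sklearn.linear_model import LogisticRegression

G = nx.karate_club_graph()
X_raw = np.random.randn(G.number_of_nodes(), 8)   # placeholder node features
y = np.array([G.nodes[i]["club"] == "Mr. Hi" for i in G.nodes], dtype=int)

# structural features per node: degree, PageRank, clustering coefficient
pagerank = nx.pagerank(G)
clustering = nx.clustering(G)
X_struct = np.array([[G.degree[i], pagerank[i], clustering[i]] for i in G.nodes])

X = np.hstack([X_raw, X_struct])
train = np.arange(len(y)) % 2 == 0                # toy split

clf = LogisticRegression(max_iter=1000).fit(X[train], y[train])
print("holdout accuracy:", clf.score(X[~train], y[~train]))
```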
Openai
Accelerating life sciences research
OpenAI and Retro Biosciences achieve 50x increase in expressing stem cell reprogramming markers.
👍17❤10👏6
GraphML News (September 2025) - Stanford Graph Learning WS, MoML, RFDiffusion 3
While the community is processing NeurIPS rejections due to “limited physical space” and rushing toward the ICLR deadline, it’s about time to plan which future events to attend!
🌲 Stanford is organizing its annual Graph Learning Workshop on Oct 14th. The main topics are Relational Foundation Models (get ready to hear a lot about those, hehe), Agents (Biomni has been quite successful), and fast LLM inference. I have attended the event for the last three years and it was quite fun.
🧬 About a week later (Oct 22nd) and on the East Coast, MIT is organizing the Molecular ML (MoML) conference, going full Geometric DL mode: expect news about Boltz and new drug discovery methods; most of big pharma is among the sponsors.
🧬🧬 The Baker Lab released a pre-print of RFDiffusion 3 (its data pipeline, AtomWorks, was pre-printed a bit earlier). Compared to AF3, it has far fewer Pairformer layers (only 2 vs 48) and drops the triangular attention complexity entirely; most of the params and compute went into the diffusion module (and good data pipelines, hehe). RFD3 is substantially faster than previous versions on longer residue structures and much more accurate than RFaa. Code is not available yet.
🎅 FAIR Chemistry opened an Open Molecules 2025 leaderboard and, to our utter amusement, the 4-year-old GemNet-OC tops the benchmark in several tasks. The granddad of ML potentials still rocks if you give it better data and more compute. That’s a good lesson in designing models that can stand the test of time and new data.
Finally, for some weekend reading, check out Random graphs as perfect expanders in Quanta Magazine. Obtaining good expanders is a non-trivial task (one that will very quickly drag you into group theory), but it turns out you should never underestimate good ole Erdős–Rényi graphs: they make perfectly decent expanders.
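To make this concrete, here is a tiny numerical sanity check (the parameters are illustrative, not taken from the article): sample a sparse Erdős–Rényi graph and inspect the spectral gap of its normalized Laplacian; a second-smallest eigenvalue bounded away from zero is the signature of decent expansion.

```python
# Hedged sketch: empirically check how well a sparse ER graph expands.
import networkx as nx
import numpy as np

n, avg_deg = 1500, 6                      # illustrative sizes
G = nx.gnp_random_graph(n, avg_deg / n, seed=0)
G = G.subgraph(max(nx.connected_components(G), key=len)).copy()  # keep the giant component

L = nx.normalized_laplacian_matrix(G).toarray()
lam = np.sort(np.linalg.eigvalsh(L))
print(f"spectral gap (lambda_2 of normalized Laplacian): {lam[1]:.3f}")
# A value comfortably above 0 means random walks mix fast, i.e. a decent expander.
```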
How can we create general-purpose graph foundation models?
(by Dmitry Eremeev)
For a long time, we believed that general-purpose graph foundation models were impossible to create. Indeed, graphs are used to represent data across many different domains, and thus graph machine learning must handle tasks on extremely diverse datasets, such as social, information, transportation, and co-purchasing networks, or models of various physical, biological, or engineering systems. Given the vast differences in structure, features, and labels among these datasets, it seemed unlikely that a single model could achieve robust cross-domain generalization and perform well on all of them.
However, we noticed that tabular machine learning faces a similar challenge of working with diverse datasets containing different features and labels. And yet this field has recently seen the emergence of the first successful foundation models, such as TabPFNv2, built on the prior-data fitted networks (PFNs) paradigm. So we decided to try to bring their success to the graph domain.
Our first attempt, G2T-FM, was relatively straightforward. We manually injected graph information into node features by computing structural and positional encodings, along with neighborhood-aggregated features, and then applied tabular foundation models (TabPFNv2 and LimiX) to these enriched features. Even this simple approach delivered impressive results: G2T-FM not only strongly outperforms previous graph foundation models on the GraphLand benchmark and classic datasets, but also often beats architecturally improved and carefully tuned GNNs trained from scratch.
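For illustration, a minimal sketch of this feature-augmentation recipe is given below. It is not the exact G2T-FM pipeline: the particular encodings (degree, PageRank, 1-hop mean aggregation) and the sklearn stand-in for the tabular foundation model are simplifying assumptions.

```python
# Hedged sketch of the "graph -> table" step: augment node features with structural
# encodings and neighborhood aggregates, then hand the table to any fit/predict model.
import networkx as nx
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier  # stand-in for TabPFNv2 / LimiX

def graph_to_table(G: nx.Graph, X: np.ndarray) -> np.ndarray:
    """Assumes nodes are labeled 0..n-1 and aligned with the rows of X."""
    n = X.shape[0]
    deg = np.array([G.degree(v) for v in range(n)], dtype=float)[:, None]
    pr = nx.pagerank(G)
    pagerank = np.array([pr[v] for v in range(n)])[:, None]
    neigh_mean = np.stack([                               # mean of 1-hop neighbor features
        X[list(G.neighbors(v))].mean(axis=0) if G.degree(v) > 0 else np.zeros(X.shape[1])
        for v in range(n)
    ])
    return np.hstack([X, deg, pagerank, neigh_mean])

# Usage: a real tabular foundation model would replace the sklearn classifier here.
# feats = graph_to_table(G, X)
# clf = GradientBoostingClassifier().fit(feats[train_idx], y[train_idx])
# preds = clf.predict(feats[test_idx])
```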
Building on this, our next step was to create GraphPFN, the first graph foundation model in the PFN framework. Moving beyond the manual feature engineering of the previous approach, we first integrated message-passing modules into the LimiX model so that it could learn graph-based dependencies directly, and then continually pretrained it on 4,000,000 synthetic graph datasets sampled from our specially designed attributed graph prior. The resulting model can perform node property prediction on a graph dataset in a single forward pass via in-context learning and produces strong results, substantially outperforming both G2T-FM and classic GNNs on several datasets.
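To give a feel for this interface (message passing plus in-context labels, with predictions for all unlabeled nodes in one forward pass), here is a toy sketch assuming PyTorch and PyTorch Geometric. It is a minimal stand-in for illustration only, not the actual GraphPFN or LimiX architecture.

```python
# Hedged sketch of a PFN-style node predictor: context nodes carry their labels as
# embeddings, query nodes carry an "unknown" token, and predictions come out in one pass.
import torch
import torch.nn as nn
from torch_geometric.nn import SAGEConv

class ToyGraphPFN(nn.Module):
    def __init__(self, in_dim: int, num_classes: int, hidden: int = 128):
        super().__init__()
        self.feat_proj = nn.Linear(in_dim, hidden)
        self.label_emb = nn.Embedding(num_classes + 1, hidden)   # last index = "unknown"
        self.mp = SAGEConv(hidden, hidden)                       # graph-based dependencies
        layer = nn.TransformerEncoderLayer(hidden, nhead=4, batch_first=True)
        self.attn = nn.TransformerEncoder(layer, num_layers=2)   # in-context mixing over nodes
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, x, edge_index, context_labels):
        # context_labels[i] = class id for labeled context nodes, num_classes for query nodes
        h = self.feat_proj(x) + self.label_emb(context_labels)
        h = torch.relu(self.mp(h, edge_index))
        h = self.attn(h.unsqueeze(0)).squeeze(0)                 # every node attends to the labeled context
        return self.head(h)                                      # read off predictions at query nodes
```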
There remains much work to be done, including scaling to larger graphs, improving model architectures and designing better graph priors for synthetic dataset generation. However, we are now convinced that building general-purpose graph foundation models is indeed possible, and a prior-data fitted network approach is a promising path towards this goal.
For more details, check out our papers:
Turning Tabular Foundation Models into Graph Foundation Models
GraphPFN: A Prior-Data Fitted Graph Foundation Model
Tired of evaluating your graph ML models on Cora, CiteSeer, and PubMed? We have a better benchmark for you!
(by Oleg Platonov)
Paper: link (NeurIPS 2025 D&B track)
Datasets: Zenodo and PyG (in PyG, all the necessary feature preprocessing can be done automatically)
Code: GitHub
Recently, there has been a lot of criticism of popular graph ML benchmark datasets: lack of practical relevance, low structural diversity that leaves most of the possible graph-structure space unrepresented, low application-domain diversity, graph structure that is not actually beneficial for the considered tasks, and potential bugs in the data collection process. Some of these criticisms previously appeared on this channel.
To provide the community with better benchmarks, we present GraphLand: a collection of 14 graph datasets for node property prediction coming from diverse real-world industrial applications of graph ML. What makes this benchmark stand out?
Diverse application domains: social networks, web graphs, road networks, and more. Importantly, half of the datasets feature node-level regression tasks that are currently underrepresented in graph ML benchmarks, but are often encountered in real-world applications.
Range of sizes: from thousands to millions of nodes, providing opportunities for researchers with different computational resources.
Rich node attributes that contain numerical and categorical features, which are more typical of industrial applications than the textual descriptions standard in current benchmarks.
Different learning scenarios. For all datasets, we provide two random data splits with low and high label rates. Further, many of our networks evolve over time, and for those we additionally provide more challenging temporal data splits and the opportunity to evaluate models in the inductive setting, where only an early snapshot of the evolving network is available at train time (a rough sketch of such a split is shown right after this list).
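Below is a hedged sketch of what such a temporal, inductive split could look like in PyG terms. The data.node_time attribute and the helper name are illustrative assumptions, not the official GraphLand loading code (which ships with PyG and Zenodo as noted above).

```python
# Hedged sketch: at train time only the early snapshot (subgraph on "old" nodes) is visible.
import torch
from torch_geometric.utils import subgraph

def temporal_inductive_split(data, t_train: float):
    train_mask = data.node_time <= t_train                 # nodes already present at train time
    test_mask = ~train_mask                                # nodes that appear later
    # inductive setting: training edges are restricted to the early snapshot
    train_edge_index, _ = subgraph(train_mask, data.edge_index, relabel_nodes=False)
    return train_mask, test_mask, train_edge_index
```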
We evaluated a range of models on our datasets and found that, while GNNs achieve strong performance on industrial datasets, they can sometimes be rivaled by gradient-boosted decision trees (popular in industry) when the latter are provided with additional graph-based input features.
Further, we evaluated several graph foundation models (GFMs). Despite all the attention GFMs have received recently, we found that only a few of them can handle arbitrary node features (which is required for true generalization across different graphs), and that those GFMs produce very weak results on our benchmark. So the problem of developing general-purpose graph foundation models seemed far from solved, which motivated our research in this direction (see the previous post).