This was part of Materials Informatics: Tutorials and Hands-On

A mathematical analysis of GNoME and other materials databases

Vitaliy Kurlin, University of Liverpool

Thursday, March 14, 2024



Abstract:

A solid crystalline material was traditionally represented by a Crystallographic Information File (CIF) with a unit cell containing a motif of atoms, ions, or molecules, which are periodically repeated in three independent directions. This cell-based representation was highly ambiguous because a unit cell (even if primitive with a minimal volume) can be chosen in infinitely many ways. Crystallography tried to avoid this ambiguity by using a reduced cell, the best-known example being Niggli's cell. Unfortunately, all reduced cells are discontinuous under almost any noise, which can break the symmetry and arbitrarily scale up a minimal cell.

This ambiguity was recently resolved by continuous invariants that provably distinguish all periodic crystals in general position (NeurIPS 2022) under isometry, which is a composition of translations, rotations, and reflections. All 660+ thousand real materials (with no disorder) in the Cambridge Structural Database were distinguished through 200+ billion pairwise comparisons within two days on a modest desktop. The unexpected pairs of geometric duplicates, where one atom was replaced with a different one without changing atomic coordinates, are being investigated by five journals for data integrity.

In November 2023, Nature published two papers reporting Google's GNoME database of 384+ thousand predicted materials (https://deepmind.google/discover/blog/millions-of-new-materials-discovered-with-deep-learning), of which 41 were claimed to be synthesized in the Berkeley lab. The question "whether an AI-controlled lab assistant actually made any novel substances" is discussed at https://www.nature.com/articles/d41586-023-03956-w. This talk will report pairwise comparisons across several databases including GNoME, which turned out to contain thousands of identical CIFs. The relevant papers are linked at http://kurlin.org/research-papers.php#Geometric-Data-Science.
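
The claim that a primitive unit cell can be chosen in infinitely many ways can be checked on a toy example. The sketch below is an illustration only, not code from the talk or the linked papers: it uses a two-dimensional square lattice instead of a three-dimensional crystal, and the helper names `lattice_points`, `cell_area`, and `sheared_basis` are hypothetical. Replacing a basis vector a by a + k*b for any integer k gives a different cell of the same area that generates exactly the same lattice, so the cell alone cannot identify the material.

```python
from itertools import product


def lattice_points(basis, radius, span=30):
    """Return the set of lattice points within `radius` of the origin.

    `span` bounds the integer coefficients tried; it is an assumption
    that must be large enough for the chosen basis and radius.
    """
    (ax, ay), (bx, by) = basis
    points = set()
    for i, j in product(range(-span, span + 1), repeat=2):
        x, y = i * ax + j * bx, i * ay + j * by
        if x * x + y * y <= radius * radius:
            # Round to absorb floating-point noise before comparing sets.
            points.add((round(x, 9), round(y, 9)))
    return points


def cell_area(basis):
    """Area of the parallelogram spanned by the two basis vectors."""
    (ax, ay), (bx, by) = basis
    return abs(ax * by - ay * bx)


def sheared_basis(basis, k):
    """Replace the first vector a by a + k*b (an area-preserving change of basis)."""
    (ax, ay), (bx, by) = basis
    return ((ax + k * bx, ay + k * by), (bx, by))


square = ((1.0, 0.0), (0.0, 1.0))
for k in range(4):
    other = sheared_basis(square, k)
    same_lattice = lattice_points(other, 3.0) == lattice_points(square, 3.0)
    print(k, other, cell_area(other), same_lattice)
```

Each value of k yields a longer, more skewed cell, yet the area and the set of lattice points stay the same. This shows only the ambiguity of cell-based representations; the continuous invariants mentioned in the abstract are a separate construction that this sketch does not attempt to reproduce.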