compute slcsa/atom command

Syntax

compute ID group-ID slcsa/atom twojmax nclasses db_mean_descriptor_file lda_file lr_decision_file lr_bias_file maha_file value

ID, group-ID are documented in compute command
slcsa/atom = style name of this compute command
twojmax = band limit for bispectrum components (non-negative integer)
nclasses = number of crystal structures in the database used to train the SL-CSA classifier
db_mean_descriptor_file = file name of file containing the database mean descriptor
lda_file = file name of file containing the linear discriminant analysis matrix for dimension reduction
lr_decision_file = file name of file containing the scaling matrix for logistic regression classification
lr_bias_file = file name of file containing the bias vector for logistic regression classification
maha_file = file name of file containing for each crystal structure: the Mahalanobis distance threshold for sanity check purposes, the average reduced descriptor and the inverse of the corresponding covariance matrix
value = c_ID[1] = compute ID and output data column of a previously defined compute sna/atom command

Examples

compute b1 all sna/atom 9.0 0.99363 8 0.5 1.0 rmin0 0.0 nnn 24 wmode 1 delta 0.3
compute b2 all slcsa/atom 8 4 mean_descriptors.dat lda_scalings.dat lr_decision.dat lr_bias.dat maha_thresholds.dat c_b1[1]

Description

Added in version 7Feb2024.

Define a computation that performs the Supervised Learning Crystal Structure Analysis (SL-CSA) from (Lafourcade) for each atom in the group. The SL-CSA tool takes as input a per-atom descriptor (bispectrum) computed by the compute sna/atom command, then performs a dimension reduction step followed by a logistic regression in order to assign a probable crystal structure to each atom in the group. The SL-CSA tool is pre-trained on a database containing $C$ distinct crystal structures, from which a crystal structure classifier is derived. A tutorial for building such a classifier is available at SL-CSA.

The first step of the SL-CSA tool consists of a dimension reduction of the per-atom descriptor $B^{i} \in R^{D}$ by Linear Discriminant Analysis (LDA), which yields a projected descriptor $x^{i} = P_{LDA}(B^{i})$, where $P_{LDA} : R^{D} \to R^{d}$ and $d = C - 1$:

x^{i} = C_{LDA}^{T} \cdot (B^{i} - μ_{db}^{B})

where $C_{LDA} \in R^{D \times d}$ is the matrix of reduction coefficients of the LDA model read from the file lda_file (so that $C_{LDA}^{T} \in R^{d \times D}$), $B^{i} \in R^{D}$ is the bispectrum of atom $i$, and $μ_{db}^{B} \in R^{D}$ is the average descriptor of the entire database. The latter is obtained from the average descriptors of each crystal structure and is read from the file db_mean_descriptor_file.
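The projection step can be sketched in a few lines of plain Python. This is a minimal illustration, not the LAMMPS implementation: the function name lda_project and its arguments are hypothetical, the LDA coefficients are assumed to be stored as D rows of d entries each (so the product with the centered descriptor has length d), and the toy numbers are made up for illustration.

```python
def lda_project(bispectrum, db_mean, c_lda):
    """Hypothetical helper: x^i = C_LDA^T (B^i - mu_db^B).

    bispectrum : list of D floats (B^i)
    db_mean    : list of D floats (mu_db^B)
    c_lda      : D rows of d floats (assumed storage of C_LDA)
    """
    if not (len(bispectrum) == len(db_mean) == len(c_lda)):
        raise ValueError("descriptor length must match the LDA matrix")
    # Center the descriptor on the database mean.
    centered = [b - m for b, m in zip(bispectrum, db_mean)]
    d = len(c_lda[0])
    # Component k of x^i is column k of C_LDA dotted with the centered descriptor.
    return [sum(c_lda[l][k] * centered[l] for l in range(len(centered)))
            for k in range(d)]


# Toy example (made-up numbers): D = 3 components, C = 3 classes, so d = 2.
c_lda = [[1.0, 0.0],
         [0.0, 1.0],
         [1.0, -1.0]]
x = lda_project([2.0, 3.0, 4.0], [1.0, 1.0, 1.0], c_lda)
print(x)  # [4.0, -1.0]
```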

The projected descriptor, of dimension $d = C - 1$, separates the fingerprints of the different crystal structures well in the latent space.
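To make the dimensions concrete, consider the command in the Examples section (twojmax = 8, nclasses = 4). For twojmax = 8, compute sna/atom produces D = 55 bispectrum components (assuming its quadratic option is not used), and the remaining shapes follow from the equations in this section:

```latex
% Shapes for the Examples command: twojmax = 8, nclasses = C = 4
D = 55, \qquad d = C - 1 = 3
B^{i}, \; μ_{db}^{B} \in R^{55}, \qquad C_{LDA}^{T} \in R^{3 \times 55}, \qquad x^{i} = C_{LDA}^{T} \cdot (B^{i} - μ_{db}^{B}) \in R^{3}
D_{LR} \in R^{4 \times 3}, \qquad b_{LR}, \; s^{i}, \; p^{i} \in R^{4}, \qquad μ_{CS}^{x} \in R^{3}, \quad Σ_{CS} \in R^{3 \times 3}
```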

Once the dimension reduction has been performed by LDA, the projected descriptor $x^{i} \in R^{d}$ is used as the input of a multinomial logistic regression (LR), which provides a score vector $s^{i} = P_{LR}(x^{i})$, where $P_{LR} : R^{d} \to R^{C}$, defined as:

s^{i} = b_{LR} + D_{LR} \cdot x^{i}

with $b_{LR} \in R^{C}$ and $D_{LR} \in R^{C \times d}$ the bias vector and decision matrix of the trained LR model, read from the files lr_bias_file and lr_decision_file, respectively.
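The score computation is an affine map from the d-dimensional latent space to C class scores. The following is a minimal sketch under the assumption that the decision matrix is stored as C rows of d entries; lr_scores is a hypothetical name and the numbers are illustrative only.

```python
def lr_scores(x, decision, bias):
    """Hypothetical helper: s^i = b_LR + D_LR x^i.

    x        : list of d floats (projected descriptor x^i)
    decision : C rows of d floats (D_LR)
    bias     : list of C floats (b_LR)
    """
    if len(decision) != len(bias):
        raise ValueError("decision matrix and bias vector must both have C rows")
    for row in decision:
        if len(row) != len(x):
            raise ValueError("each decision-matrix row must have d = len(x) entries")
    # One score per crystal structure: bias plus the row of D_LR dotted with x^i.
    return [b + sum(dkl * xl for dkl, xl in zip(row, x))
            for row, b in zip(decision, bias)]


# Toy example (made-up numbers): d = 2, C = 3.
decision = [[1.0, 0.0],
            [0.0, 1.0],
            [-1.0, -1.0]]
bias = [0.5, 0.0, 0.0]
print(lr_scores([4.0, -1.0], decision, bias))  # [4.5, -1.0, -3.0]
```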

Finally, a probability vector $p^{i} \in R^{C}$ is obtained by applying the softmax function to the scores:

p_{k}^{i} = \frac{\exp (s_{k}^{i})}{\sum_{j=1}^{C} \exp (s_{j}^{i})}, \quad k = 1, \dots, C

from which the crystal structure assigned to each atom with descriptor $B^{i}$ and projected descriptor $x^{i}$ is computed as the argmax of the probability vector $p^{i}$. Since the logistic regression step always assigns a crystal structure to every atom, even one whose local environment matches none of the structures in the database, a sanity check is needed to detect misclassifications. To this end, a per-atom Mahalanobis distance to each crystal structure CS present in the database is computed:

d_{Mahalanobis}^{i \to CS} = \sqrt{(x^{i} - μ_{CS}^{x})^{T} \cdot Σ_{CS}^{- 1} \cdot (x^{i} - μ_{CS}^{x})}

where $μ_{CS}^{x} \in R^{d}$ is the average projected descriptor of crystal structure CS in the database and $Σ_{CS} \in R^{d \times d}$ is the corresponding covariance matrix. If the Mahalanobis distance of atom $i$ to crystal structure CS is greater than the pre-determined threshold for CS, no crystal structure is assigned to atom $i$. The Mahalanobis distance thresholds, the average projected descriptors $μ_{CS}^{x}$, and the inverse covariance matrices $Σ_{CS}^{-1}$ are all read from the file maha_file.
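The last stage (softmax, argmax, and Mahalanobis sanity check) can be sketched for a single atom as follows. This is a hedged illustration, not the LAMMPS source: the helper names are hypothetical, the inverse covariance matrices are passed directly (as they are stored in maha_file), the check is assumed to be applied to the structure selected by the argmax, and the label -1 is a hypothetical marker for "no crystal structure assigned".

```python
import math


def softmax(scores):
    """p_k = exp(s_k) / sum_j exp(s_j), shifted by max(s) for numerical stability."""
    s_max = max(scores)
    exps = [math.exp(s - s_max) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mahalanobis(x, mean, inv_cov):
    """sqrt((x - mu)^T Sigma^{-1} (x - mu)), with Sigma^{-1} given directly."""
    diff = [xi - mi for xi, mi in zip(x, mean)]
    n = len(diff)
    quad = sum(diff[a] * inv_cov[a][b] * diff[b] for a in range(n) for b in range(n))
    return math.sqrt(quad)


def classify(x, scores, means, inv_covs, thresholds):
    """Hypothetical helper returning (label, probabilities, distances) for one atom.

    label is the argmax of the probabilities, or -1 (hypothetical 'unassigned'
    marker) if the distance to that structure exceeds its threshold.
    """
    probs = softmax(scores)
    label = max(range(len(probs)), key=probs.__getitem__)
    dists = [mahalanobis(x, mu, ic) for mu, ic in zip(means, inv_covs)]
    if dists[label] > thresholds[label]:
        label = -1
    return label, probs, dists


# Toy database (made-up numbers): C = 3 structures in a d = 2 latent space.
identity = [[1.0, 0.0], [0.0, 1.0]]
means = [[4.0, 0.0], [0.0, 4.0], [-4.0, -4.0]]
inv_covs = [identity, identity, identity]
thresholds = [2.0, 2.0, 2.0]

label, probs, dists = classify([4.0, -1.0], [4.5, -1.0, -3.0], means, inv_covs, thresholds)
print(label, dists[0])  # 0 1.0
```

Because the softmax is monotonic, the argmax of $p^{i}$ coincides with the argmax of $s^{i}$; computing the probabilities is still useful when they are needed as output.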

The SL-CSA framework automatically computes the matrices and thresholds required for classification and writes all the files needed by the compute slcsa/atom command.

The compute slcsa/atom command requires a compute sna/atom command to be defined beforehand, since it takes the resulting per-atom bispectrum as input. In addition, twojmax must be set to the same value as the twojmax used in the compute sna/atom command, and nclasses must be set to the number of crystal structures used in the database to train the SL-CSA tool.
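Because twojmax fixes the number of bispectrum components D, a mismatch between the two commands would feed the classifier descriptors whose length does not match the matrices in the input files. The helper below is a hypothetical, text-level sketch for catching such mistakes in an input script before running it. It assumes literal values (no LAMMPS variables) and the argument order shown in the Syntax section; the nclasses >= 2 check is this sketch's own sanity check (so that d = nclasses - 1 >= 1), not a documented LAMMPS rule.

```python
def check_slcsa_commands(sna_cmd, slcsa_cmd):
    """Hypothetical helper: return a list of consistency problems (empty if none).

    Expected layouts (whitespace-separated, literal values only):
      compute ID group-ID sna/atom rcutfac rfac0 twojmax ...
      compute ID group-ID slcsa/atom twojmax nclasses <5 file names> c_ID[1]
    """
    sna = sna_cmd.split()
    slcsa = slcsa_cmd.split()
    if len(sna) < 7 or sna[3] != "sna/atom":
        return ["first command is not a complete compute sna/atom command"]
    if len(slcsa) < 6 or slcsa[3] != "slcsa/atom":
        return ["second command is not a complete compute slcsa/atom command"]

    problems = []
    # 4 leading tokens + twojmax + nclasses + 5 file names + value = 12 tokens.
    if len(slcsa) != 12:
        problems.append(f"slcsa/atom expects 8 arguments, got {len(slcsa) - 4}")
    if int(sna[6]) != int(slcsa[4]):
        problems.append(f"twojmax mismatch: sna/atom uses {sna[6]}, slcsa/atom uses {slcsa[4]}")
    if int(slcsa[5]) < 2:
        problems.append("nclasses must be at least 2 so that d = nclasses - 1 >= 1")
    # The value argument must point at the sna/atom compute defined above.
    if not slcsa[-1].startswith(f"c_{sna[1]}["):
        problems.append(f"value {slcsa[-1]} does not reference compute {sna[1]}")
    return problems


sna_line = "compute b1 all sna/atom 9.0 0.99363 8 0.5 1.0 rmin0 0.0 nnn 24 wmode 1 delta 0.3"
slcsa_line = ("compute b2 all slcsa/atom 8 4 mean_descriptors.dat lda_scalings.dat "
              "lr_decision.dat lr_bias.dat maha_thresholds.dat c_b1[1]")
print(check_slcsa_commands(sna_line, slcsa_line))  # []
```

Applied to the two commands from the Examples section, the sketch reports no problems; changing either twojmax alone would produce a mismatch message.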

Output info

By default, this compute computes the Mahalanobis distances to the different crystal structures present in the database in addition to assigning a crystal structure for each atom as a per-atom vector, which can be accessed by any command that uses per-atom values from a compute as input. See the Howto output page for an overview of LAMMPS output options.

Restrictions

This compute is part of the EXTRA-COMPUTE package. It is only enabled if LAMMPS was built with that package. See the Build package page for more info.

Default

none

(Lafourcade) Lafourcade, Maillet, Denoual, Duval, Allera, Goryaeva, and Marinica, Comp. Mat. Science, 230, 112534 (2023)