# compute slcsa/atom command

## Syntax

```
compute ID group-ID slcsa/atom twojmax nclasses db_mean_descriptor_file lda_file lr_decision_file lr_bias_file maha_file value
```

ID, group-ID are documented in the compute command

slcsa/atom = style name of this compute command

twojmax = band limit for bispectrum components (non-negative integer)

nclasses = number of crystal structures used in the database for the classifier SL-CSA

db_mean_descriptor_file = name of the file containing the database mean descriptor

lda_file = name of the file containing the linear discriminant analysis matrix for dimension reduction

lr_decision_file = name of the file containing the scaling matrix for logistic regression classification

lr_bias_file = name of the file containing the bias vector for logistic regression classification

maha_file = name of the file containing, for each crystal structure, the Mahalanobis distance threshold used for the sanity check, the average reduced descriptor, and the inverse of the corresponding covariance matrix

value = c_ID[*], where c_ID is the compute ID of a previously defined *compute sna/atom* command

## Examples

```
compute b1 all sna/atom 9.0 0.99363 8 0.5 1.0 rmin0 0.0 nnn 24 wmode 1 delta 0.3
compute b2 all slcsa/atom 8 4 mean_descriptors.dat lda_scalings.dat lr_decision.dat lr_bias.dat maha_thresholds.dat c_b1[*]
```

## Description

Added in version 7Feb2024.

Define a computation that performs the Supervised Learning Crystal
Structure Analysis (SL-CSA) from (Lafourcade)
for each atom in the group. The SL-CSA tool takes as an input a per-atom
descriptor (bispectrum) that is computed through the *compute sna/atom*
command and then proceeds to a dimension reduction step followed by a
logistic regression in order to assign a probable crystal structure to
each atom in the group. The SL-CSA tool is pre-trained on a database
containing \(C\) distinct crystal structures, from which a crystal
structure classifier is derived. A tutorial on building such a tool is
available at SL-CSA.

The first step of the SL-CSA tool is a dimension reduction of the per-atom descriptor \(\mathbf{B}^i \in \mathbb{R}^{D}\) through the Linear Discriminant Analysis (LDA) method, leading to a new projected descriptor \(\mathbf{x}^i=\mathrm{P}_\mathrm{LDA}(\mathbf{B}^i):\mathbb{R}^D \rightarrow \mathbb{R}^{d=C-1}\):

\[
\mathbf{x}^i = (\mathbf{B}^i - \mu^\mathbf{B}_\mathrm{db}) \cdot \mathbf{C}^T_\mathrm{LDA}
\]

where \(\mathbf{C}^T_\mathrm{LDA} \in \mathbb{R}^{D \times d}\) is
the reduction coefficients matrix of the LDA model read from file
*lda_file*, \(\mathbf{B}^i \in \mathbb{R}^{D}\) is the bispectrum of
atom \(i\) and \(\mu^\mathbf{B}_\mathrm{db} \in \mathbb{R}^{D}\)
is the average descriptor of the entire database. The latter is computed
from the average descriptors of each crystal structure read from the
file *db_mean_descriptor_file*.
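The projection step described above can be sketched in a few lines of plain Python. This is only an illustration, not the LAMMPS implementation: the function name `lda_project` and the toy numbers are hypothetical, and the sketch assumes the descriptor is treated as a row vector multiplied by a matrix stored as \(D\) rows of \(d\) coefficients.

```python
def lda_project(descriptor, db_mean, lda_matrix):
    """Center a D-component descriptor and project it onto d LDA components.

    lda_matrix is assumed to hold D rows of d coefficients (the role played
    by C_LDA^T in the text); this storage layout is an assumption.
    """
    centered = [b - m for b, m in zip(descriptor, db_mean)]
    n_reduced = len(lda_matrix[0])
    return [
        sum(centered[k] * lda_matrix[k][j] for k in range(len(centered)))
        for j in range(n_reduced)
    ]


# Toy data (hypothetical): D = 3 descriptor components, C = 3 classes, d = 2
descriptor = [1.0, 2.0, 3.0]
db_mean = [0.5, 1.0, 1.5]
lda_matrix = [
    [1.0, 0.0],
    [0.0, 1.0],
    [2.0, -1.0],
]

projected = lda_project(descriptor, db_mean, lda_matrix)
print(projected)
```

In a real workflow the descriptor would have the length produced by *compute sna/atom* for the chosen *twojmax*, and the matrix and mean would come from the files written by the SL-CSA training tool.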

The new projected descriptor with dimension \(d=C-1\) allows for a good separation of the fingerprints of different crystal structures in the latent space.

Once the dimension reduction step has been performed by means of LDA, the new descriptor \(\mathbf{x}^i \in \mathbb{R}^{d=C-1}\) is taken as input to a multinomial logistic regression (LR), which provides a score vector \(\mathbf{s}^i=\mathrm{P}_\mathrm{LR}(\mathbf{x}^i):\mathbb{R}^d \rightarrow \mathbb{R}^C\) defined as:

\[
\mathbf{s}^i = \mathbf{b}_\mathrm{LR} + \mathbf{D}_\mathrm{LR} \cdot {\mathbf{x}^i}^T
\]

with \(\mathbf{b}_\mathrm{LR} \in \mathbb{R}^C\) and
\(\mathbf{D}_\mathrm{LR} \in \mathbb{R}^{C \times d}\) the bias
vector and decision matrix of the trained LR model, read from
files *lr_bias_file* and *lr_decision_file*, respectively.
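As a rough illustration of the scoring step, the sketch below evaluates one score per crystal structure as the bias plus the dot product of the corresponding decision-matrix row with the projected descriptor. The function name `lr_scores` and all numerical values are hypothetical and serve only to show the shapes involved (\(C = 3\), \(d = 2\)).

```python
def lr_scores(projected, decision_matrix, bias):
    """Return one logistic-regression score per class.

    decision_matrix is assumed to hold C rows of d coefficients and bias
    holds C entries, matching the dimensions given in the text.
    """
    return [
        b + sum(w * x for w, x in zip(row, projected))
        for row, b in zip(decision_matrix, bias)
    ]


# Toy data (hypothetical): C = 3 classes, d = 2 reduced components
projected = [3.5, -0.5]
decision_matrix = [
    [1.0, 0.0],
    [0.0, 2.0],
    [-1.0, 1.0],
]
bias = [0.5, 0.0, 1.0]

scores = lr_scores(projected, decision_matrix, bias)
print(scores)
```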

Finally, a probability vector \(\mathbf{p}^i \in \mathbb{R}^C\) is obtained from the score vector \(\mathbf{s}^i\) through the softmax function:

\[
\mathbf{p}^i = \frac{\exp(\mathbf{s}^i)}{\sum\limits_{j} \exp(s^i_j)}
\]

from which the crystal structure assigned to each atom with descriptor
\(\mathbf{B}^i\) and projected descriptor \(\mathbf{x}^i\) is
computed as the *argmax* of the probability vector
\(\mathbf{p}^i\). Since the logistic regression step systematically
attributes a crystal structure to each atom, a sanity check is needed to
avoid misclassification. To this end, a per-atom Mahalanobis distance to
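each crystal structure *CS* present in the database is computed:

\[
d_\mathrm{Mahalanobis}^{i \rightarrow \mathrm{CS}} = \sqrt{(\mathbf{x}^i - \mathbf{\mu}^\mathbf{x}_\mathrm{CS})^\mathrm{T} \cdot \mathbf{\Sigma}^{-1}_\mathrm{CS} \cdot (\mathbf{x}^i - \mathbf{\mu}^\mathbf{x}_\mathrm{CS})}
\]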

where \(\mathbf{\mu}^\mathbf{x}_\mathrm{CS} \in \mathbb{R}^{d}\) is
the average projected descriptor of crystal structure *CS* in the
database and where \(\mathbf{\Sigma}_\mathrm{CS} \in \mathbb{R}^{d
\times d}\) is the corresponding covariance matrix. Finally, if the
Mahalanobis distance to crystal structure *CS* for atom *i* is greater
than the pre-determined threshold, no crystal structure is assigned to
atom *i*. The Mahalanobis distance thresholds, the average projected
descriptors and the inverse covariance matrices are all read from file
*maha_file*.
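The classification and sanity check can be sketched as follows. This is a minimal illustration under stated assumptions, not the LAMMPS source: the helper names (`softmax`, `mahalanobis`, `classify`) are hypothetical, the distance check is assumed to be applied to the structure selected by the argmax, and `None` is used here as an arbitrary marker for "no structure assigned".

```python
import math


def softmax(scores):
    """Convert scores to probabilities (shifted by the maximum for stability)."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mahalanobis(x, mean, inv_cov):
    """Mahalanobis distance of x to a class with given mean and inverse covariance."""
    diff = [a - b for a, b in zip(x, mean)]
    n = len(diff)
    quad = sum(diff[r] * inv_cov[r][c] * diff[c] for r in range(n) for c in range(n))
    return math.sqrt(quad)


def classify(x, scores, means, inv_covs, thresholds):
    """Pick the most probable class, then reject it if x lies too far from it."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    distance = mahalanobis(x, means[best], inv_covs[best])
    # Assumption: only the distance to the selected class is compared to its threshold.
    return best if distance <= thresholds[best] else None


# Toy data (hypothetical): two classes in a 2-component reduced space
identity = [[1.0, 0.0], [0.0, 1.0]]
x = [3.0, 4.0]
scores = [2.0, 0.0]
means = [[0.0, 0.0], [3.0, 4.0]]
inv_covs = [identity, identity]

accepted = classify(x, scores, means, inv_covs, thresholds=[6.0, 1.0])
rejected = classify(x, scores, means, inv_covs, thresholds=[4.0, 1.0])
print(accepted, rejected)
```

With these toy values the atom lies at distance 5 from the most probable class, so it is kept when the threshold is 6 and left unassigned when the threshold is 4.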

The SL-CSA framework automatically computes the matrices and thresholds
required for a proper classification and writes all the files needed
to call the *compute slcsa/atom* command.

The *compute slcsa/atom* command requires that the *compute
sna/atom* command is defined beforehand, as it takes the
resulting per-atom bispectrum as input. In addition, it is crucial
that *twojmax* is set to the same value as the *twojmax*
used in the *compute sna/atom* command, and that
*nclasses* is set to the number of crystal structures used in the
database to train the SL-CSA tool.
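One practical way to check the *twojmax* consistency is to compare the descriptor length \(D\) expected for a given *twojmax* with the number of rows in the trained LDA matrix. The sketch below counts bispectrum components with the triple loop documented for *compute sna/atom*; it assumes the default settings of that compute (in particular no *bzeroflag*-style or *chem* options that change the count), and the function name is hypothetical.

```python
def n_bispectrum_components(twojmax):
    """Count bispectrum components for a given twojmax.

    Assumption: follows the (j1, j2, j) enumeration described for
    compute sna/atom with default keyword settings.
    """
    count = 0
    for j1 in range(twojmax + 1):
        for j2 in range(j1 + 1):
            for j in range(j1 - j2, min(twojmax, j1 + j2) + 1, 2):
                if j >= j1:
                    count += 1
    return count


# Descriptor length expected for the twojmax used in the Examples section
expected_length = n_bispectrum_components(8)
print(expected_length)
```

If the LDA matrix produced during training does not have this many rows, the two commands were most likely set up with different *twojmax* values.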

## Output info

By default, this compute calculates the Mahalanobis distances to the different crystal structures present in the database and assigns a crystal structure to each atom. The results are stored as a per-atom vector, which can be accessed by any command that uses per-atom values from a compute as input. See the Howto output page for an overview of LAMMPS output options.

## Restrictions

This compute is part of the EXTRA-COMPUTE package. It is only enabled if LAMMPS was built with that package. See the Build package page for more info.

## Default

none

**(Lafourcade)** Lafourcade, Maillet, Denoual, Duval, Allera, Goryaeva, and Marinica,
Comp. Mat. Science, 230, 112534 (2023)