fitpod command
Syntax
fitpod Ta_param.pod Ta_data.pod Ta_coefficients.pod
fitpod = style name of this command
Ta_param.pod = an input file that describes proper orthogonal descriptors (PODs)
Ta_data.pod = an input file that specifies DFT data used to fit a POD potential
Ta_coefficients.pod (optional) = an input file that specifies trainable coefficients of a POD potential
Examples
fitpod Ta_param.pod Ta_data.pod
fitpod Ta_param.pod Ta_data.pod Ta_coefficients.pod
Description
Added in version 22Dec2022.
Fit a machine-learning interatomic potential (ML-IAP) based on proper orthogonal descriptors (POD); please see (Nguyen and Rohskopf), (Nguyen2023), (Nguyen2024), and (Nguyen and Sema) for details. The fitted POD potential can be used to run MD simulations via pair_style pod.
Two input files are required for this command. The first input file describes the POD potential parameter settings, while the second input file specifies the DFT data used for the fitting procedure. All keywords except species have default values. If a keyword is not set in the input file, its default value is used. The table below gives one-line descriptions of all the keywords that can be used in the first input file (i.e. Ta_param.pod in the example above):
Keyword | Default | Type | Description
------- | ------- | ---- | -----------
species | (none) | STRING | chemical symbols of all elements in the system; these must match the XYZ training files
pbc | 1 1 1 | INT | three integers that specify the boundary conditions
rin | 0.5 | REAL | inner cutoff radius
rcut | 5.0 | REAL | outer cutoff radius
bessel_polynomial_degree | 4 | INT | maximum degree of the Bessel polynomials
inverse_polynomial_degree | 8 | INT | maximum degree of the inverse radial basis functions
number_of_environment_clusters | 1 | INT | number of clusters for environment-adaptive potentials
number_of_principal_components | 2 | INT | number of principal components for dimensionality reduction
onebody | 1 | BOOL | turns on/off the one-body potential
twobody_number_radial_basis_functions | 8 | INT | number of radial basis functions for the two-body potential
threebody_number_radial_basis_functions | 6 | INT | number of radial basis functions for the three-body potential
threebody_angular_degree | 5 | INT | angular degree for the three-body potential
fourbody_number_radial_basis_functions | 4 | INT | number of radial basis functions for the four-body potential
fourbody_angular_degree | 3 | INT | angular degree for the four-body potential
fivebody_number_radial_basis_functions | 0 | INT | number of radial basis functions for the five-body potential
fivebody_angular_degree | 0 | INT | angular degree for the five-body potential
sixbody_number_radial_basis_functions | 0 | INT | number of radial basis functions for the six-body potential
sixbody_angular_degree | 0 | INT | angular degree for the six-body potential
sevenbody_number_radial_basis_functions | 0 | INT | number of radial basis functions for the seven-body potential
sevenbody_angular_degree | 0 | INT | angular degree for the seven-body potential
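The rule "unset keywords take their default value" can be sketched in Python. This is a hypothetical reader, not the parser used by fitpod. It makes the following assumptions:

- The file has one keyword per line, followed by its value(s).
- A `#` starts a comment.
- The body-order constraint stated below is read as "non-increasing", because the default five- to seven-body sizes are all zero.

The names DEFAULTS, parse_param_text and check_body_order are illustrative only.

```python
import copy

# Documented defaults for the first input file (species has no default).
DEFAULTS = {
    "pbc": [1, 1, 1],
    "rin": 0.5,
    "rcut": 5.0,
    "bessel_polynomial_degree": 4,
    "inverse_polynomial_degree": 8,
    "number_of_environment_clusters": 1,
    "number_of_principal_components": 2,
    "onebody": 1,
    "twobody_number_radial_basis_functions": 8,
    "threebody_number_radial_basis_functions": 6,
    "threebody_angular_degree": 5,
    "fourbody_number_radial_basis_functions": 4,
    "fourbody_angular_degree": 3,
    "fivebody_number_radial_basis_functions": 0,
    "fivebody_angular_degree": 0,
    "sixbody_number_radial_basis_functions": 0,
    "sixbody_angular_degree": 0,
    "sevenbody_number_radial_basis_functions": 0,
    "sevenbody_angular_degree": 0,
}
ORDERS = ["two", "three", "four", "five", "six", "seven"]
LIST_KEYWORDS = {"species", "pbc"}  # keywords that take several values


def _convert(token):
    """Return token as int or float when possible, otherwise as a string."""
    for cast in (int, float):
        try:
            return cast(token)
        except ValueError:
            pass
    return token


def check_body_order(params):
    """Reject settings whose sizes grow with body order (non-increasing check)."""
    radial = [params[f"{o}body_number_radial_basis_functions"] for o in ORDERS]
    angular = [params[f"{o}body_angular_degree"] for o in ORDERS[1:]]
    for label, seq in (("radial basis functions", radial), ("angular degree", angular)):
        if any(later > earlier for earlier, later in zip(seq, seq[1:])):
            raise ValueError(f"{label} must not increase with body order: {seq}")


def parse_param_text(text):
    """Parse 'keyword value ...' lines, falling back to DEFAULTS for unset keywords."""
    params = copy.deepcopy(DEFAULTS)
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # assumption: '#' starts a comment
        if not line:
            continue
        keyword, *values = line.split()
        converted = [_convert(v) for v in values]
        if keyword in LIST_KEYWORDS:
            params[keyword] = converted
        else:
            params[keyword] = converted[0] if len(converted) == 1 else converted
    if "species" not in params:
        raise ValueError("species has no default value and must be set")
    check_body_order(params)
    return params
```

For instance, parsing "species Ta" and "rcut 5.5" keeps rcut at 5.5 and fills every other keyword from the defaults, such as rin = 0.5.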
Note that both the number of radial basis functions and the angular degree must decrease as the body order increases. The next table describes all keywords that can be used in the second input file (i.e. Ta_data.pod in the example above):
Keyword | Default | Type | Description
------- | ------- | ---- | -----------
file_format | extxyz | STRING | format of the data files; only the extended XYZ format (extxyz) is currently supported
file_extension | xyz | STRING | file extension of the data files
path_to_training_data_set | (none) | STRING | path to the training data files, in double quotes
path_to_test_data_set | "" | STRING | path to the test data files, in double quotes
path_to_environment_configuration_set | "" | STRING | path to the environment configuration files, in double quotes
fraction_training_data_set | 1.0 | REAL | fraction (<= 1.0) of the training set used to fit the POD potential
randomize_training_data_set | 0 | BOOL | turns on/off randomization of the training set
fraction_test_data_set | 1.0 | REAL | fraction (<= 1.0) of the test set used to validate the POD potential
randomize_test_data_set | 0 | BOOL | turns on/off randomization of the test set
fitting_weight_energy | 100.0 | REAL | weight for energies in the least-squares fit
fitting_weight_force | 1.0 | REAL | weight for forces in the least-squares fit
fitting_regularization_parameter | 1.0e-10 | REAL | regularization parameter in the least-squares fit
error_analysis_for_training_data_set | 0 | BOOL | turns on/off error analysis for the training data set
error_analysis_for_test_data_set | 0 | BOOL | turns on/off error analysis for the test data set
basename_for_output_files | pod | STRING | basename used as the prefix of the output file names
precision_for_pod_coefficients | 8 | INT | number of digits after the decimal point for numbers in the coefficient file
group_weights | global | STRING | global applies fitting_weight_energy and fitting_weight_force to all configurations; table reads per-group weights (see Loss Function Group Weights below)
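One plausible reading of the fraction and randomization keywords is sketched below. The following are assumptions for illustration; fitpod may round or shuffle differently:

- the function name select_configurations;
- the truncation rule int(fraction * J);
- the seeded shuffle.

```python
import random


def select_configurations(files, fraction=1.0, randomize=0, seed=None):
    """Hypothetical handling of fraction_*_data_set and randomize_*_data_set.

    Keeps the first int(fraction * len(files)) entries, after an optional
    shuffle. The rounding rule and the shuffling scheme are assumptions.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must lie in (0, 1]")
    chosen = list(files)  # copy so the caller's list is not reordered
    if randomize:
        random.Random(seed).shuffle(chosen)  # seeded for reproducibility
    count = max(1, int(fraction * len(chosen)))
    return chosen[:count]
```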
All keywords except path_to_training_data_set have default values. If a keyword is not set in the input file, its default value is used. After successful training, a number of output files are produced, if enabled:
- <basename>_training_errors.pod: reports the errors in energy and forces for the training data set
- <basename>_training_analysis.pod: reports detailed errors for all training configurations
- <basename>_test_errors.pod: reports errors for the test data set
- <basename>_test_analysis.pod: reports detailed errors for all test configurations
- <basename>_coefficients.pod: contains the coefficients of the POD potential
After training the POD potential, Ta_param.pod and <basename>_coefficients.pod are the two files needed to use the POD potential in LAMMPS. See pair_style pod for how to use the POD potential. Examples of training and using POD potentials can be found in the directory lammps/examples/PACKAGES/pod and in the GitHub repository https://github.com/cesmix-mit/pod-examples.
Loss Function Group Weights
The group_weights keyword in the second input file (e.g. Ta_data.pod) weights certain groups of configurations in the loss function. For example:
group_weights table
Displaced_A15 100.0 1.0
Displaced_BCC 100.0 1.0
Displaced_FCC 100.0 1.0
Elastic_BCC 100.0 1.0
Elastic_FCC 100.0 1.0
GSF_110 100.0 1.0
GSF_112 100.0 1.0
Liquid 100.0 1.0
Surface 100.0 1.0
Volume_A15 100.0 1.0
Volume_BCC 100.0 1.0
Volume_FCC 100.0 1.0
This applies an energy weight of 100.0 and a force weight of 1.0 to all groups in the Ta example. Each group is named after its data file. If certain groups are left out of this table, the globally defined weights from the fitting_weight_energy and fitting_weight_force keywords are used for them.
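The lookup-with-fallback behavior described above can be sketched as follows. This sketch assumes:

- a group's name is its file name with the directory and extension removed;
- each table row holds a name, an energy weight and a force weight.

The helper names group_name, parse_group_weights and weights_for are hypothetical.

```python
import os


def group_name(path):
    """Group name = file name without directory or extension (assumed rule)."""
    return os.path.splitext(os.path.basename(path))[0]


def parse_group_weights(lines):
    """Read 'name energy_weight force_weight' rows into a dict."""
    table = {}
    for line in lines:
        fields = line.split()
        if len(fields) == 3:
            name, w_energy, w_force = fields
            table[name] = (float(w_energy), float(w_force))
    return table


def weights_for(path, table, fitting_weight_energy=100.0, fitting_weight_force=1.0):
    """Per-group weights if the group is listed, otherwise the global fitting weights."""
    return table.get(group_name(path), (fitting_weight_energy, fitting_weight_force))
```

With a table listing only Liquid and Surface, a file such as GSF_110.xyz falls back to the global weights (100.0, 1.0 by default).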
POD Potential
We consider a multi-element system of N atoms with \(N_{\rm e}\) unique elements. We denote by \(\boldsymbol r_n\) and \(Z_n\) the position vector and type of atom n in the system, respectively. Note that \(Z_n \in \{1, \ldots, N_{\rm e} \}\), \(\boldsymbol R = (\boldsymbol r_1, \boldsymbol r_2, \ldots, \boldsymbol r_N) \in \mathbb{R}^{3N}\), and \(\boldsymbol Z = (Z_1, Z_2, \ldots, Z_N) \in \mathbb{N}^{N}\). The total energy of the POD potential is expressed as \(E(\boldsymbol R, \boldsymbol Z) = \sum_{i=1}^N E_i(\boldsymbol R_i, \boldsymbol Z_i)\). Here \(\boldsymbol R_i\) and \(\boldsymbol Z_i\) collect the positions and types of the atoms surrounding atom i within the cutoff radius, and
\[E_i(\boldsymbol R_i, \boldsymbol Z_i) = \sum_{m=1}^{M} c_m \mathcal{D}_{im}(\boldsymbol R_i, \boldsymbol Z_i).\]
Here \(c_m\) are trainable coefficients and \(\mathcal{D}_{im}(\boldsymbol R_i, \boldsymbol Z_i)\) are per-atom POD descriptors. Summing the per-atom descriptors over \(i\) yields the global descriptors \(d_m(\boldsymbol R, \boldsymbol Z) = \sum_{i=1}^N \mathcal{D}_{im}(\boldsymbol R_i, \boldsymbol Z_i)\). It thus follows that \(E(\boldsymbol R, \boldsymbol Z) = \sum_{m=1}^M c_m d_m(\boldsymbol R, \boldsymbol Z)\).
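Exchanging the order of the sums over atoms and descriptors gives the identity above. The toy Python check below uses made-up descriptor values, not real POD descriptors. It confirms that summing per-atom energies equals contracting the coefficients with the global descriptors.

```python
def atom_energies(c, D):
    """E_i = sum_m c_m * D[i][m] for every atom i."""
    return [sum(cm * dim for cm, dim in zip(c, row)) for row in D]


def global_descriptors(D):
    """d_m = sum_i D[i][m]: column sums of the per-atom descriptor table."""
    return [sum(col) for col in zip(*D)]


# Made-up numbers: 2 atoms, M = 3 descriptors.
c = [0.5, -1.0, 2.0]
D = [[1.0, 2.0, 0.5],
     [0.0, 1.0, 3.0]]

total_from_atoms = sum(atom_energies(c, D))
total_from_globals = sum(cm * dm for cm, dm in zip(c, global_descriptors(D)))
print(total_from_atoms, total_from_globals)  # 4.5 4.5
```

This linearity in \(c_m\) is what makes the fit a linear least-squares problem.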
The per-atom POD descriptors include one-, two-, three-, four-, five-, six-, and seven-body descriptors, which can be specified in the first input file. The per-atom descriptors also depend on the number of environment clusters specified in the first input file. Please see (Nguyen2024) and (Nguyen and Sema) for a detailed description of the per-atom POD descriptors.
Training
A POD potential is trained using least-squares regression against density functional theory (DFT) data. Let \(J\) be the number of training configurations, with \(N_j\) being the number of atoms in the j-th configuration. The training configurations are extracted from the extended XYZ files located in the directory given by path_to_training_data_set in the second input file. Let \(\{E^{\star}_j\}_{j=1}^{J}\) and \(\{\boldsymbol F^{\star}_j\}_{j=1}^{J}\) be the DFT energies and forces for the \(J\) configurations. Next, we calculate the global descriptors and their derivatives with respect to the atomic positions for all training configurations. Let \(d_{jm}, 1 \le m \le M\), be the global descriptors associated with the j-th configuration, where \(M\) is the number of global descriptors. We then form a matrix \(\boldsymbol A \in \mathbb{R}^{J \times M}\) with entries \(A_{jm} = d_{jm}/ N_j\) for \(j=1,\ldots,J\) and \(m=1,\ldots,M\). Moreover, we form a matrix \(\boldsymbol B \in \mathbb{R}^{\mathcal{N} \times M}\) by stacking the derivatives of the global descriptors for all training configurations from top to bottom, where \(\mathcal{N} = 3\sum_{j=1}^{J} N_j\).
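The assembly of \(\boldsymbol A\), \(\bar{\boldsymbol E}^{\star}\), \(\boldsymbol B\) and the stacked forces can be sketched in Python. The per-configuration record and its dictionary keys are a hypothetical layout chosen for illustration; fitpod stores this data internally in its own way. Descriptors are assumed to be already computed.

```python
def assemble_system(configs):
    """Build A, E_bar, B and F* from per-configuration data.

    Each config is a dict with (hypothetical) keys:
      natoms : N_j
      energy : DFT energy E*_j
      d      : global descriptors d_jm, length M
      dd     : 3*N_j rows; row k holds the derivatives of d_j1..d_jM
               with respect to the k-th atomic coordinate
      forces : DFT forces flattened to length 3*N_j
    """
    A, E_bar, B, F = [], [], [], []
    for cfg in configs:
        n = cfg["natoms"]
        if len(cfg["dd"]) != 3 * n or len(cfg["forces"]) != 3 * n:
            raise ValueError("derivative rows and forces must have 3*N_j entries")
        A.append([d / n for d in cfg["d"]])  # A_jm = d_jm / N_j
        E_bar.append(cfg["energy"] / n)      # per-atom DFT energy
        B.extend(cfg["dd"])                  # stack derivative blocks top to bottom
        F.extend(cfg["forces"])              # stack forces in the same order
    return A, E_bar, B, F
```

Dividing both the descriptors and the energies by \(N_j\) puts the energy residuals on a per-atom scale. Large and small cells then contribute comparably to the fit.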
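The coefficient vector \(\boldsymbol c\) of the POD potential is found by solving the following least-squares problem
\[\min_{\boldsymbol c \in \mathbb{R}^{M}} \; w_E \|\boldsymbol A \boldsymbol c - \bar{\boldsymbol E}^{\star}\|^2 + w_F \|\boldsymbol B \boldsymbol c + \boldsymbol F^{\star}\|^2 + w_R \|\boldsymbol c\|^2,\]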
where \(w_E\) and \(w_F\) are the weights for energies (fitting_weight_energy) and forces (fitting_weight_force), respectively, and \(w_R\) is the regularization parameter (fitting_regularization_parameter). Here \(\bar{\boldsymbol E}^{\star} \in \mathbb{R}^{J}\) is the vector with entries \(\bar{E}^{\star}_j = E^{\star}_j/N_j\), and \(\boldsymbol F^{\star} \in \mathbb{R}^{\mathcal{N}}\) is obtained by stacking \(\{\boldsymbol F^{\star}_j\}_{j=1}^{J}\) from top to bottom.
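A minimal pure-Python sketch of the solve follows. It assumes the objective has the regularized form \(w_E \|\boldsymbol A \boldsymbol c - \bar{\boldsymbol E}^{\star}\|^2 + w_F \|\boldsymbol B \boldsymbol c + \boldsymbol F^{\star}\|^2 + w_R \|\boldsymbol c\|^2\). The plus sign in the force term appears because the predicted forces are \(-\boldsymbol B \boldsymbol c\). Setting the gradient to zero gives the normal equations \((w_E \boldsymbol A^T \boldsymbol A + w_F \boldsymbol B^T \boldsymbol B + w_R \boldsymbol I)\, \boldsymbol c = w_E \boldsymbol A^T \bar{\boldsymbol E}^{\star} - w_F \boldsymbol B^T \boldsymbol F^{\star}\). This is an illustration under that assumption, not the solver used inside fitpod.

```python
def solve(K, b):
    """Solve K x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(K)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_coefficients(A, E_bar, B, F, w_E=100.0, w_F=1.0, w_R=1.0e-10):
    """Minimize w_E|A c - E_bar|^2 + w_F|B c + F|^2 + w_R|c|^2 via normal equations."""
    M = len(A[0])
    K = [[0.0] * M for _ in range(M)]
    rhs = [0.0] * M
    # Energy rows contribute w_E A^T A to K and w_E A^T E_bar to rhs.
    for row, e in zip(A, E_bar):
        for p in range(M):
            rhs[p] += w_E * row[p] * e
            for q in range(M):
                K[p][q] += w_E * row[p] * row[q]
    # Force rows contribute w_F B^T B to K and -w_F B^T F to rhs.
    for row, f in zip(B, F):
        for p in range(M):
            rhs[p] -= w_F * row[p] * f
            for q in range(M):
                K[p][q] += w_F * row[p] * row[q]
    for p in range(M):
        K[p][p] += w_R  # ridge (Tikhonov) regularization
    return solve(K, rhs)
```

Forming the normal equations squares the condition number of the problem. For large or ill-conditioned descriptor matrices, a QR- or SVD-based least-squares solver is the more robust choice, and a small positive \(w_R\) also helps keep the system solvable.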
Validation
A POD potential can be validated on a test data set located in the directory specified by path_to_test_data_set in the second input file. Validation can also be performed after training is complete by passing the coefficient file as the third argument to fitpod, for example:
fitpod Ta_param.pod Ta_data.pod Ta_coefficients.pod
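Validation compares predicted and reference energies and forces. The sketch below predicts both from the global descriptors and their derivatives, then summarizes the differences. It assumes mean absolute error (MAE) and root-mean-square error (RMSE) as the metrics; the exact error definitions written to the <basename>_test_errors.pod file may differ. The functions predict and error_metrics are hypothetical helpers.

```python
import math


def predict(c, d, dd):
    """Energy and forces of one configuration from its global descriptors.

    E = sum_m c_m d_m, and F_k = -sum_m c_m dd[k][m]
    (forces are minus the gradient of the energy).
    """
    energy = sum(cm * dm for cm, dm in zip(c, d))
    forces = [-sum(cm * g for cm, g in zip(c, row)) for row in dd]
    return energy, forces


def error_metrics(predicted, reference):
    """Mean absolute error and root-mean-square error of two equal-length lists."""
    diffs = [p - r for p, r in zip(predicted, reference)]
    mae = sum(abs(x) for x in diffs) / len(diffs)
    rmse = math.sqrt(sum(x * x for x in diffs) / len(diffs))
    return mae, rmse
```

Energy errors would typically be compared per atom (dividing by \(N_j\)), matching the per-atom scaling used in training.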
Restrictions
This command is part of the ML-POD package. It is only enabled if LAMMPS was built with that package. See the Build package page for more info.
Default
The keyword defaults are also given in the description of the input files.
(Nguyen and Rohskopf) Nguyen and Rohskopf, Journal of Computational Physics, 480, 112030, (2023).
(Nguyen2023) Nguyen, Physical Review B, 107(14), 144103, (2023).
(Nguyen2024) Nguyen, Journal of Computational Physics, 113102, (2024).
(Nguyen and Sema) Nguyen and Sema, https://arxiv.org/abs/2405.00306, (2024).