Calculating Features
Atom and Atoms class
Calculating features is done through the Atoms class, which contains one molecular geometry. Each atom inside an Atoms instance is an Atom instance. These two classes are central to calculations relating to molecular geometries. Files containing a molecular geometry should have the .atoms attribute.
[51]:
from ichor.core.atoms import Atoms, Atom
# make some random geometry
atoms_instance = Atoms([Atom("O", 0.0, 0.0, 0.0),
Atom("H", 2.1, 2.3, 0.0),
Atom("H", 5.0, 0.0, 4.3)])
print(atoms_instance)
O1 0.00000000 0.00000000 0.00000000
H2 2.10000000 2.30000000 0.00000000
H3 5.00000000 0.00000000 4.30000000
Calculators
Calculator functions take in an Atom or Atoms instance and calculate something with it. An example below calculated the atomic local frame (ALF) features.
Features are calculated using feature calculator functions, which take in an Atom instance and return the calculated features. The calculators might need extra positional or keyword arguments. Currently, only an atomic local frame calculator is implemented directly in ichor, but there is nothing stopping a user from implementing their own calculators as needed. Below are examples of using feature calculator functions and writing custom ones.
[52]:
from ichor.core.calculators import default_alf_calculator, default_feature_calculator
from pprint import pprint
# for the alf features, we need to define an ALF first
alf_dict = atoms_instance.alf_dict(default_alf_calculator)
pprint(alf_dict)
alf_list = atoms_instance.alf_list(default_alf_calculator)
pprint(alf_list)
{'H2': ALF(origin_idx=1, x_axis_idx=0, xy_plane_idx=2),
'H3': ALF(origin_idx=2, x_axis_idx=0, xy_plane_idx=1),
'O1': ALF(origin_idx=0, x_axis_idx=1, xy_plane_idx=2)}
[ALF(origin_idx=0, x_axis_idx=1, xy_plane_idx=2),
ALF(origin_idx=1, x_axis_idx=0, xy_plane_idx=2),
ALF(origin_idx=2, x_axis_idx=0, xy_plane_idx=1)]
As seen above, we first calculate the ALF for every atom. There is the alf_list and alf_dict methods, which return either a list of a dictionary respectively. In both cases, the ALF class is used to contain information about the atom’s ALF. This ALF class is simply a named tuple containing the origin index, x axis index, and xy plane index. Note that these are 0-indexed as Python is 0 indexed. However, do note that the actual atom names (eg. O1, H2, etc.) do start at 1. There is
no atom H0 for example.
[53]:
# calculating features with given alf
features = atoms_instance.features(default_feature_calculator, alf_dict)
pprint(features)
features = atoms_instance.features(default_feature_calculator, alf_list)
pprint(features)
array([[ 5.88551857, 12.46216712, 1.0341914 ],
[ 5.88551857, 10.72159395, 1.61608526],
[12.46216712, 10.72159395, 0.49131599]])
array([[ 5.88551857, 12.46216712, 1.0341914 ],
[ 5.88551857, 10.72159395, 1.61608526],
[12.46216712, 10.72159395, 0.49131599]])
The ALF features for the given Atoms instance are calculated as a numpy array of shape natoms x nfeatures. Since these features are per-atom, each atom has its own feature set. Now, we can also calculate the features for an individual atom like so
[54]:
one_atom_features = atoms_instance["O1"].features(default_feature_calculator, alf_dict)
pprint(one_atom_features)
array([ 5.88551857, 12.46216712, 1.0341914 ])
This is done by directly calculating the ALF features only for the O1 atom.
Defining a custom feature calculator
Let’s say that you want to implement a custom calculator. All you need to do is implement a function that takes in an Atom or Atoms instance and does something with it. If an Atom is passed in, then the function should compute separate features for each atom. If an Atoms instance is passed in, then the function should compute one set of features for the whole system. Below are two examples of defining custom feature calculator functions.
Computing separate feature vectors for each atom
[58]:
import numpy as np
def x_coordinate_features(atom_instance: Atom):
"""A calculator function that grabs only the x-coordinate if the atom is an oxygen
or grabs the y-coordinate for any other atom type.
:param atom_instance: an Atom instance to work with
:return: An array containing the relevant coordinate for each atom
"""
if atom_instance.type == "O":
return np.array([a.coordinates[0] for a in atom_instance.parent])
return np.array([a.coordinates[1] for a in atom_instance.parent])
new_set_of_features_o1 = atoms_instance["O1"].features(x_coordinate_features)
new_set_of_features_h2 = atoms_instance["H2"].features(x_coordinate_features)
print(new_set_of_features_o1)
print(new_set_of_features_h2)
[0. 2.1 5. ]
[0. 2.3 0. ]
To get the features for all atoms at once, the Atoms class is used instead, which just gets the features for all Atom instances held inside the Atoms class. Of course, much more complicated feature calculators can be built.
[56]:
new_set_of_features_all_atoms = atoms_instance.features(x_coordinate_features)
print(new_set_of_features_all_atoms)
[[0. 2.1 5. ]
[0. 2.3 0. ]
[0. 2.3 0. ]]
Coulomb Matrix Example - One feature vector for whole system
Below is an example of how to use ichor to compute the coulomb matrix representation of molecules. Note that the is_atomic=False is used to indicate that the calculator is going to be obtaining a feature vector used to describe the whole system. In this case, each atom does not have its own set of features.
[57]:
# define calculator function which is going to compute a coulomb matrix for a given geometry
from ichor.core.common.constants import type2nuclear_charge
from ichor.core.atoms import Atom, Atoms
def coulomb_matrix_representation(atoms: Atoms):
natoms = len(atoms)
res = np.empty((natoms, natoms))
for i, atm1 in enumerate(atoms):
for j, atm2 in enumerate(atoms):
if i == j:
res[i, j] = 0.5 * type2nuclear_charge[atm1.type]**2.4
else:
dist = np.linalg.norm(atm2.coordinates - atm1.coordinates)
res[i, j] = (type2nuclear_charge[atm1.type] * type2nuclear_charge[atm2.type]) / dist
return res
atoms = Atoms([Atom("O", 0.0, 0.0, 0.0), Atom("H", 0.82, 0.0, 0.0), Atom("H", -0.54, 0.83, 0.0)])
coulomb_matrix_descriptors = atoms.features(coulomb_matrix_representation, is_atomic=False)
coulomb_matrix_descriptors
[57]:
array([[73.51669472, 9.75609756, 8.07915961],
[ 9.75609756, 0.5 , 0.62764116],
[ 8.07915961, 0.62764116, 0.5 ]])