ichor.core.files package

Subpackages

Submodules

ichor.core.files.directory module

class AnnotatedDirectory(path: Path | str)

Bases: Directory, ABC

Abstract class for adding a parser for a Directory that has annotated files (such as GJF, Int, WFN). For an example, see the PointDirectory class.

Note

If multiple files with the same extension are found, they are stored in a list, so accessing an attribute might return a list instead of a single object.

contents = None
property directories: List[Directory]

Return all objects which are contained in the AnnotatedDirectory instance and that subclass from Directory class.

dirtypes

Returns a dictionary of key-value pairs where the keys are the attributes and the values are the type of class these attributes are going to be set to. These classes all subclass the Directory class. For example {‘ints’: INTs}.

property files: List[File]

Return all objects which are contained in the AnnotatedDirectory instance and that subclass from File class.

filetypes

Returns a dictionary of key-value pairs where the keys are the attributes and the values are the type of class these attributes are going to be set to. These classes all subclass the File class. For example {‘gjf’: GJF, ‘wfn’: WFN}.

path: Path | str
property path_objects: List[PathObject]

Returns a list of PathObjects corresponding to files and directories that are in the instance of AnnotatedDirectory.

pathtypes
property type_to_contents: dict

Returns a dictionary containing the class as keys and the attributes as values. Reverses the self.contents attribute
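The note on duplicate extensions and the reversed mapping returned by type_to_contents can be pictured with the rough sketch below. It does not use ICHOR itself: group_by_suffix, reverse_mapping, and the stand-in classes GJF and WFN are hypothetical names made up for this example, and the real parsing logic may differ.

```python
from pathlib import Path


class GJF:  # hypothetical stand-in, not ICHOR's GJF class
    pass


class WFN:  # hypothetical stand-in, not ICHOR's WFN class
    pass


def group_by_suffix(paths):
    """Map each suffix (without the dot) to one path, or to a list if several share it."""
    grouped = {}
    for p in map(Path, paths):
        key = p.suffix.lstrip(".")
        if key not in grouped:
            grouped[key] = p
        elif isinstance(grouped[key], list):
            grouped[key].append(p)
        else:
            # Second file with this suffix: switch from a single object to a list.
            grouped[key] = [grouped[key], p]
    return grouped


def reverse_mapping(contents):
    """Turn {attribute: class} into {class: attribute}."""
    return {cls: attr for attr, cls in contents.items()}


found = group_by_suffix(["a.gjf", "a.wfn", "b.wfn"])
by_type = reverse_mapping({"gjf": GJF, "wfn": WFN})
```

In this sketch found["gjf"] is a single path while found["wfn"] is a list, which is why code that accesses such attributes should be prepared for either case.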

class Directory(path: Path | str)

Bases: PathObject, ABC

A class that implements helper methods for working with directories (which are stored on a hard drive).

Parameters:

path – The path to a directory

classmethod check_path(path: Path) bool

Implement if the path of the directory needs to be checked if it contains something specific

iterdir()

alias to __iter__ in case child object overrides __iter__

mkdir()

Make an empty directory at the location of the path attribute.

move(dst: Path)

Move a directory object to a new location (a new path). This modifies the path attribute and moves the contents on disk.

Parameters:

dst – The new path of the directory

property name
property name_without_suffix
path: Path | str

ichor.core.files.file module

class File(path: Path | str)

Bases: PathObject, ABC

Abstract Base Class for any type of file that is used by ICHOR.

block()

Blocks a file from being read. Contents of the file cannot be read.

classmethod check_path(path: str | Path) bool

Checks the suffix of the given path matches the filetype associated with class that subclasses from File

Parameters:

path – A Path object to check

Returns:

True if the Path object has the same suffix as the class filetype, False otherwise

classmethod get_filetype() str

Returns a filetype for the particular kind of file

Returns:

A string containing the suffix of the file (the filetype)
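The suffix check described for check_path and get_filetype can be pictured as follows. This is a minimal sketch, assuming the filetype is stored as a suffix string that includes the leading dot; SuffixChecked and XYZLike are hypothetical classes written for this example and are not part of ICHOR.

```python
from pathlib import Path


class SuffixChecked:
    """Hypothetical base class pairing a file type with a suffix check."""

    _filetype = ""  # assumed convention: suffix with the leading dot, e.g. ".xyz"

    @classmethod
    def get_filetype(cls) -> str:
        return cls._filetype

    @classmethod
    def check_path(cls, path) -> bool:
        # Compare only the final suffix of the path with the class file type.
        return Path(path).suffix == cls.get_filetype()


class XYZLike(SuffixChecked):
    _filetype = ".xyz"


matches = XYZLike.check_path("water.xyz")
rejected = XYZLike.check_path("water.gjf")
```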

move(dst)

Move the file to a new destination.

Parameters:

dst – The new path to the file. If a directory, the file is moved inside the directory.

path: Path | str
unblock()

Unblocks a blocked file.

class FileContentsType

Bases: NoStr

A class whose instance is used for class attributes that are read in from a file. If a class attribute is of FileContents type, then we read the file and store the read-in value. This class allows for lazily reading files, i.e. files are not read when an instance of File (or one of its subclasses) is made, but only when attributes of that instance (which are FileContents) are accessed.
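The lazy-reading idea can be sketched with a sentinel object and a property. This is an illustration only, under the assumption that a placeholder marks attributes that have not been read yet; LazyTextFile, State, and _UNREAD are hypothetical names, and ICHOR's actual mechanism may work differently.

```python
from enum import Enum
from pathlib import Path
import tempfile


class State(Enum):  # hypothetical, loosely mirrors Unread/Reading/Read
    Unread = 1
    Reading = 2
    Read = 3


_UNREAD = object()  # sentinel standing in for a "file contents" placeholder


class LazyTextFile:
    """Hypothetical file wrapper that reads from disk on first attribute access."""

    def __init__(self, path):
        self.path = Path(path)
        self.state = State.Unread
        self._text = _UNREAD

    @property
    def text(self):
        if self._text is _UNREAD:
            self.state = State.Reading
            self._text = self.path.read_text()
            self.state = State.Read
        return self._text


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "example.txt"
    target.write_text("hello")
    lazy = LazyTextFile(target)
    state_before = lazy.state  # nothing has been read yet
    contents = lazy.text       # first access triggers the read
    state_after = lazy.state
```

Creating the wrapper is cheap because no I/O happens in __init__; the cost of reading is paid only by code that actually touches the attribute.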

exception FileReadError

Bases: Exception

class FileState(value)

Bases: Enum

An enum that is used to make it easier to check the current file state. Blocked is actually not used currently.

Blocked = -1
Read = 3
Reading = 2
Unread = 1
exception FileWriteError

Bases: Exception

class ReadFile(path: Path | str)

Bases: File, ABC

path: Path | str
read(*args, **kwargs)

Read the contents of the file. Depending on the type of file, different parts will be read in.

Note

Only files which exist on disk can be read from. Otherwise, nothing will be read in.

class WriteFile(path: Path | str)

Bases: File, ABC

path: Path | str
write(path: str | Path | None = None, *args, **kwargs)

This write method should only be called if no other write method exists. A write method is implemented for files that we typically write out (such as .xyz or .gjf files). But other files (which are outputs of a program, such as .wfn, and .int), we only need to read and do not have to write out ourselves.

ichor.core.files.file_data module

class HasAtoms

Bases: ABC

Abstract base class for classes which either have a property or attribute of atoms that gives back an Atoms instance.

C_matrix_dict(system_alf: List[ALF]) Dict[str, ndarray]

Returns a dictionary of key (atom name), value (C matrix np array) for every atom

C_matrix_list(system_alf: List[ALF]) List[ndarray]

Returns a list of C matrix np arrays, one for every atom

alf(alf_calculator: Callable[[...], ALF], *args, **kwargs) List[ALF]

Returns the Atomic Local Frame (ALF) for all Atom instances that are held in Atoms e.g. [[0,1,2],[1,0,2], [2,0,1]]

alf_dict(alf_calculator: Callable[[...], ALF], *args, **kwargs) Dict[str, ALF]

Returns a dictionary of key (atom name), value (ALF instance) for every atom.

alf_list(alf_calculator: Callable[[...], ALF], *args, **kwargs) List[List[int]]

Returns a list of lists with the atomic local frame indices for every atom (0-indexed).

property atom_names: List[str]
center_geometry_on_atom_and_write_xyz(central_atom_alf: ALF, central_atom_name: str, fname: str | Path | None = None)

Centers all geometries (from a Trajectory or PointsDirectory instance) onto a central atom and then writes out a new xyz file with all geometries centered on that atom. This is essentially what the ALFVisualizer application (ALFi) does. The features for the central atom are calculated, after which they are converted back into xyz coordinates (thus all geometries are now centered on the given central atom).

Parameters:
  • feature_calculator – Function which calculates features

  • central_atom_name – the name of the central atom to center all geometries on. Eg. O1

  • fname – Optional file name in which to save the rotated geometries.

  • kwargs – Key word arguments to pass to calculator function

connectivity(connectivity_calculator: Callable[[...], ndarray]) ndarray

Return the connectivity matrix (n_atoms x n_atoms) for the given Atoms instance.

Returns:

type: np.ndarray of shape n_atoms x n_atoms

property coordinates: ndarray
features(feature_calculator: Callable, *args, is_atomic=True, **kwargs) ndarray
features_dict(feature_calculator: Callable[[...], ndarray], *args, **kwargs) dict

Returns the features in a dictionary for this Atoms instance, corresponding to the features of each Atom instance held in this Atoms instance. Features are calculated in the Atom class and concatenated to a 2d array here.

e.g. {“C1”: np.array, “H2”: np.array}

property natoms: int
property types_extended: List[str]
class HasData

Bases: ABC

Class used to describe a file containing properties/data for a particular geometry

property data_names: List[str]

Returns a list of strings corresponding to data names that the object should have. These names can be used as keys in raw_data or processed_data to obtain values. Note that values might be other dictionaries.

processed_data(processing_func, *args, **kwargs) dict

Processes data in some way, given any arguments and keyword arguments, and returns a dictionary of the processed data.

abstract property raw_data: dict

Returns the raw data associated with the current object.

Returns:

The raw data associated with the current object

Return type:

dict

ichor.core.files.mol2 module

class AtomType(value)

Bases: Enum

An enumeration.

Al = 'Al'
Br = 'Br'
C1 = 'C.1'
C2 = 'C.2'
C3 = 'C.3'
CAr = 'C.ar'
CCat = 'C.cat'
Ca = 'Ca'
Cl = 'Cl'
CoOH = 'Co.oh'
Dummy = 'Du'
F = 'F'
H = 'H'
HSPC = 'H.spc'
HT3P = 'H.t3p'
I = 'I'
K = 'K'
Li = 'Li'
LonePair = 'LP'
N1 = 'N.1'
N2 = 'N.2'
N3 = 'N.3'
N4 = 'N.4'
NAm = 'N.am'
NAr = 'N.ar'
NP13 = 'N.p13'
Na = 'Na'
O2 = 'O.2'
O3 = 'O.3'
OCO2 = 'O.co2'
OSPC = 'O.spc'
OT3P = 'O.t3p'
P3 = 'P.3'
RuOH = 'Ru.oh'
S2 = 'S.2'
S3 = 'S.3'
SO = 'S.o'
SO2 = 'S.o2'
Si = 'Si'
class BondType(value)

Bases: Enum

An enumeration.

Amide = 'am'
Aromatic = 'ar'
Double = '2'
Single = '1'
Triple = '3'
Unspecified = 'un'
class ChargeType(value)

Bases: Enum

An enumeration.

Ampac = 'AMPAC_CHARGES'
DelRe = 'DEL_RE'
Dict = 'DICT_CHARGES'
GastHuck = 'GAST_HUCK'
Gasteiger = 'GASTEIGER'
Gaussian = 'GAUSS80_CHARGES'
Huckel = 'HUCKEL'
MMFF94 = 'MMFF94_CHARGES'
Mulliken = 'MULLIKEN_CHARGES'
NoCharges = 'NO_CHARGES'
Pullman = 'PULLMAN'
User = 'USER_CHARGES'
class Mol2(path: Path | str, system_name: str, atoms: Atoms)

Bases: HasAtoms, WriteFile

format()
path: Path | str
class Mol2Atom(ty: str, x: float, y: float, z: float, index: int | None = None, parent: Atoms | None = None, units: AtomicDistance = AtomicDistance.Angstroms, atom_type: AtomType | None = None)

Bases: Atom

property atom_type
property unpaired_electrons
property valence

Returns the valence of the Atom instance

Returns:

the valence of the atom (as defined by the atom type)

Return type:

int

class MoleculeType(value)

Bases: Enum

An enumeration.

BioPolymer = 'BIOPOLYMER'
NucleicAcid = 'NUCLEIC_ACID'
Protein = 'PROTEIN'
Saccharide = 'SACCHARIDE'
Small = 'SMALL'
class SybylStatus(value)

Bases: Enum

An enumeration.

Altered = 'altered'
Analyzed = 'analyzed'
InvalidCharges = 'invalid_charges'
NONE = '****'
RefAngle = 'ref_angle'
Substituted = 'substituted'
System = 'system'
acyclic_bond(atom1: Atom, atom2: Atom) bool
bond_index_to_atom(bond: Tuple[int, int], parent: List[Atom]) Tuple[Atom, Atom]
bonds_of_type(atom, parent, bond_type)
charge(atom: Atom) float
gasteigger_charge(atom: Atom) float
get_atom_bonds(atom: Atom) List[Tuple[int, int]]
get_atom_type(atom: Atom, parent: Atoms) AtomType
get_bond_type(atom1: Atom, atom2: Atom) BondType
get_bond_types(atom, parent) List[BondType]
get_bonded_atoms(atom: Atom) List[Atom]

Return the atoms bonded to ‘atom’ from ‘parent’

get_nbonds(atom)
get_ring(atom)
n_bonds_of_type(atom, parent, bond_type)
nonmet(atom) List[Atom]
other_atom(atom: Atom, atom1: Atom, atom2: Atom) Atom

Return the other atom, i.e. not ‘atom’, given 2 atoms: ‘atom1’ and ‘atom2’

other_atom_bonds(atom, atom1, atom2) List[Tuple[int, int]]
other_bonded_atoms(atom, atom1, atom2) List[Atom]

ichor.core.files.optional_content module

class OptionalContentType

Bases: object

exists()

ichor.core.files.path_object module

class PathObject(path: Path | str)

Bases: ABC, object

An abstract base class that is used for anything that has a path (i.e. files or directories)

classmethod check_path(path: Path) bool
delete()

Delete the Path object from disk.

exists() bool

Determines if the path points to an existing directory or file on the storage drive.

abstract move(dst) None

An abstract method that subclasses need to implement. This is used to move files around.

path: Path | str
remove()

Alias for delete

property stem

Returns the stem of the file (without suffix, if one is present)

ichor.core.files.point_directory module

class PointDirectory(path: Path | str)

Bases: AnnotatedDirectory, HasAtoms, HasData

A helper class that wraps around ONE directory which contains ONE point (one molecular geometry).

Parameters:

path – Path to a directory which contains ONE point.

property atoms: Atoms

Returns the Atoms instance which the PointDirectory encapsulates.

atoms_from_file(file_with_atoms: HasAtoms) Atoms

Given a class (which is in the contents of the directory), obtain the Atoms instance from that specific file which is wrapped by the class.

Parameters:

file_with_atoms – file class which subclasses from HasAtoms and has a .atoms attribute

Raises:

ichor.core.atoms.AtomsNotFoundError – If file class does not contain atoms

Returns:

The Atoms instance obtained from the given file

Return type:

ichor.core.atoms.Atoms

classmethod check_path(path: Path) bool

Makes sure that path is PointDirectory-like

contents = {'aim': <class 'ichor.core.files.aimall.aim.Aim'>, 'gaussian_output': <class 'ichor.core.files.gaussian.gaussian_output.GaussianOutput'>, 'gjf': <class 'ichor.core.files.gaussian.gjf.GJF'>, 'ints': <class 'ichor.core.files.aimall.ints.IntDirectory'>, 'orca_input': <class 'ichor.core.files.orca.orca_input.OrcaInput'>, 'orca_output': <class 'ichor.core.files.orca.orca_output.OrcaOutput'>, 'wfn': <class 'ichor.core.files.gaussian.wfn.WFN'>, 'xyz': <class 'ichor.core.files.xyz.xyz.XYZ'>}
features(feature_calculator: Callable, *args, is_atomic=True, **kwargs)

Returns the features for this Atoms instance, corresponding to the features of each Atom instance held in this Atoms instance. Features are calculated in the Atom class and concatenated to a 2d array here.

The array shape is n_atoms x n_features (3*n_atoms - 6)

Parameters:
  • is_atomic – whether the feature calculator calculates features for individual atoms or for the whole geometry.

  • args – positional arguments to pass to feature calculator

  • kwargs – key word arguments to pass to feature calculator

Returns:

type: np.ndarray of shape n_atoms x n_features (3N-6)

Return the feature matrix of this Atoms instance

path: Path | str
property raw_data: dict

Returns the raw data associated with the current object.

Returns:

The raw data associated with the current object

Return type:

dict

ichor.core.files.points_directory module

class PointsDirectory(path: Path | str, needs_parsing=True, *args, **kwargs)

Bases: ListOfAtoms, Directory, HasData

A helper class that wraps around a directory which contains points (molecules with various geometries). Calling Directory.__init__(self, path) will call the parse method of PointsDirectory instead of Directory (because Python looks for the method in this class first before looking at parent class methods). A typical ICHOR directory that contains points will have a structure like this:

-TRAINING_SET
    -- SYSTEM_NAME000
    -- SYSTEM_NAME001
    -- SYSTEM_NAME002
    ........

Each of the subdirectories contains Gaussian files (such as .gjf), as well as an atomic_files directory, which then contains the AIMALL files. A PointsDirectory will wrap around the whole TRAINING_SET directory (which contains multiple points), while a PointDirectory will wrap around a SYSTEM_NAME00… folder (which only contains information about 1 point).

Parameters:
  • path – Path to a directory which contains points. This path is typically the path to the training set, sample pool, etc.

  • needs_parsing – By default, every PointsDirectory is parsed when the instance is created to create PointDirectory instances of each inner directory (but the contents of the files are not read). If, however, a slice of an already created PointsDirectory is made, the contents of the directories do not need to be parsed again, so needs_parsing would be False
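The directory layout described above can be explored with plain pathlib. The sketch below builds a throwaway TRAINING_SET tree and lists its point subdirectories; list_point_directories is a hypothetical helper for illustration, not an ICHOR function, and it does not parse any file contents.

```python
from pathlib import Path
import tempfile


def list_point_directories(root):
    """Return the immediate subdirectories of root, sorted by name."""
    return sorted(p for p in Path(root).iterdir() if p.is_dir())


with tempfile.TemporaryDirectory() as tmp:
    training_set = Path(tmp) / "TRAINING_SET"
    for i in range(3):
        # Mimic the SYSTEM_NAME000, SYSTEM_NAME001, ... naming shown above.
        (training_set / f"SYSTEM_NAME{i:03d}").mkdir(parents=True)
    (training_set / "notes.txt").write_text("not a point")  # plain files are skipped
    point_names = [p.name for p in list_point_directories(training_set)]
```

Each listed subdirectory corresponds to what a single PointDirectory would wrap, while the parent corresponds to the PointsDirectory.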

alf(alf_calculator: Callable[[...], ALF], *args, **kwargs) List[ALF]

Returns the Atomic Local Frame (ALF) for all Atom instances that are held in Atoms e.g. [[0,1,2],[1,0,2], [2,0,1]].

Parameters:
  • args – positional arguments to pass to alf calculator

  • kwargs – key word arguments to pass to alf calculator

alf_dict(alf_calculator: Callable[[...], ALF], *args, **kwargs) Dict[str, ALF]

Returns a dictionary of key: atom_name, value: ALF instance (containing central atom index, x-axis idx, xy-plane idx), e.g. {"O1":ALF(0,1,2),"H2":ALF(1,0,2), "H3":ALF(2,0,1)}.

Parameters:
  • args – positional arguments to pass to alf calculator

  • kwargs – key word arguments to pass to alf calculator

property atom_names: List[str]

Return the atom names from the first timestep. Assumes that all timesteps have the same number of atoms/atom names.

classmethod check_path(path: Path) bool

Makes sure that path is PointsDirectory-like

connectivity(connectivity_calculator: Callable[[...], ndarray]) ndarray

Return the connectivity matrix (n_atoms x n_atoms) for the given Atoms instance.

Returns:

type: np.ndarray of shape n_atoms x n_atoms

property coordinates: ndarray

Returns the xyz coordinates of all atoms for all timesteps. Shape n_timesteps x n_atoms x 3

coordinates_to_xyz(fname: str | Path | None = PosixPath('system_to_xyz.xyz'), step: int | None = 1)

Write a new .xyz file that contains the timestep i, as well as the coordinates of the atoms for that timestep.

Parameters:
  • fname – The file name to which to write the timesteps/coordinates

  • step – Write coordinates for every n^th step. Default is 1, so writes coordinates for every step
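To make the effect of step concrete, here is a small sketch that formats every step-th frame in the usual XYZ layout (atom count, comment line, then one line per atom). It works on plain tuples rather than ICHOR objects; frames_to_xyz and the "i = ..." comment format are assumptions made for this example.

```python
def frames_to_xyz(frames, step=1):
    """Format every step-th frame as an XYZ block and return the joined text.

    frames is a list of timesteps; each timestep is a list of
    (element, x, y, z) tuples.
    """
    lines = []
    for i, frame in enumerate(frames):
        if i % step != 0:
            continue
        lines.append(str(len(frame)))  # first line: number of atoms
        lines.append(f"i = {i}")       # comment line: timestep index (format is an assumption)
        for element, x, y, z in frame:
            lines.append(f"{element} {x:.8f} {y:.8f} {z:.8f}")
    return "\n".join(lines) + "\n"


water = [("O", 0.0, 0.0, 0.0), ("H", 0.96, 0.0, 0.0), ("H", -0.24, 0.93, 0.0)]
text = frames_to_xyz([water, water, water, water], step=2)
```

With four frames and step=2, only timesteps 0 and 2 end up in the output.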

coordinates_to_xyz_with_errors(models_path: str | Path, fname: str | Path | None = PosixPath('xyz_with_properties_error.xyz'), step: int | None = 1)

Write a new .xyz file that contains the timestep i, as well as the coordinates of the atoms for that timestep. The comment lines in the xyz file contain absolute prediction errors. These can then be plotted in ALFVisualizer as a cmap to see where poor predictions happen.

Parameters:
  • models_path – The model path to one atom.

  • property – The property for which to predict for and get errors (iqa or any multipole moment)

  • fname – The file name to which to write the timesteps/coordinates

  • step – Write coordinates for every n^th step. Default is 1, so writes coordinates for every step

features_with_properties_to_csv(system_alf: Dict[str, ALF], str_to_append_to_fname: str = '_features_with_properties.csv', atom_names: List[str] | None = None, property_types: List[str] | None = None, **kwargs)

Calculates ALF features and properties (with multipole moments rotated).

Parameters:
  • str_to_append_to_fname – a string that is appended to the default file name (which is name_of_atom.csv), defaults to “_features_with_properties.csv”

  • atom_names – A list of atom names for which to write out csv files with properties. If None, then writes out files for all atoms in the system, defaults to None

  • property_types – A list of property names (iqa, multipole names) for which to write columns. If None, then writes out columns for all properties, defaults to None

  • args – positional arguments to pass to calculator function

  • kwargs – key word arguments to be passed to the feature calculator function

Raises:

TypeError – This method only works for PointsDirectory instances because it needs access to AIMALL information. Does not work for Trajectory instances.

features_with_wfn_energy_and_dE_df_to_csv(alf_list: List[ALF], central_atom_idx: int, str_to_append_to_fname: str = '_features_with_dE_df.csv', **kwargs)

Writes out a csv file containing wfn energy and FORCEs calculated for every feature. Note that the forces (dE/df_i) are the negative of the PES gradient, so for machine learning, the negative of these forces needs to be taken to add gradient information into GP models.

Parameters:
  • alf_list – A list of ALF instances containing alf info

  • central_atom_idx (int) – The central atom which to center the alf on and for which dE/df will be calculated

  • str_to_append_to_fname (str, optional) – A string that is appended to the file name, defaults to “_features_with_dE_df.csv”

classmethod from_trajectory(trajectory_path: str | Path, system_name: str | None = None, every=1, center=True) PointsDirectory

Generate a PointsDirectory-type structure directory from a trajectory (.xyz) file

Parameters:
  • trajectory_path – A str or Path to a .xyz file containing geometries

  • system_name – The name of the chemical system. This is going to be the name of the directory which will be created.

  • center – Whether to center the geometries on the centroid of the system. This is useful to prevent the molecule from translating in 3D space (and prevents issues with WFN files, where a very large x,y,z value (over 100) for the coordinates leads to **** being written in the .wfn file…)
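Centering on the centroid simply subtracts the mean position from every atom. The sketch below shows this on plain coordinate tuples; center_on_centroid is a hypothetical helper for illustration and is not the routine used by from_trajectory.

```python
def center_on_centroid(coords):
    """Shift a list of (x, y, z) tuples so that their mean position is the origin."""
    n = len(coords)
    centroid = tuple(sum(c[k] for c in coords) / n for k in range(3))
    return [tuple(c[k] - centroid[k] for k in range(3)) for c in coords]


# A geometry that has drifted far from the origin (coordinates above 100).
far_away = [(150.0, 150.0, 150.0), (151.0, 150.0, 150.0), (150.0, 151.0, 150.0)]
centered = center_on_centroid(far_away)
```

After centering, all coordinates are small again, which avoids the overly wide values that the parameter description warns about for .wfn files.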

property natoms: int

Returns the number of atoms in the first timestep. Each timestep should have the same number of atoms.

path: Path | str
processed_data(processing_func, *args, **kwargs) dict

Processes data in some way, given any arguments and keyword arguments, and returns a dictionary of the processed data. Note that the processing function can be any callable, i.e. closure, class, etc.

Note

The processing function must act on one PointDirectory.

Parameters:
  • processing_func – Callable which is going to process ONE PointDirectory

  • args – Positional arguments to pass to processing func

Returns:

A dictionary of processed data. Keys of the dictionary are the stem of each PointDirectory contained inside this PointsDirectory instance.
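The pattern described here, one callable applied to each point with results keyed by the directory stem, can be sketched as below. FakePoint, process_all, and scaled_energy are hypothetical names for this illustration; they only mimic the shape of the interface and do not read any ICHOR files.

```python
from pathlib import Path


class FakePoint:
    """Hypothetical stand-in for a single point directory."""

    def __init__(self, path, energy):
        self.path = Path(path)
        self.energy = energy


def process_all(points, processing_func, *args, **kwargs):
    """Apply processing_func to each point and key the results by the path stem."""
    return {p.path.stem: processing_func(p, *args, **kwargs) for p in points}


def scaled_energy(point, factor=1.0):
    # Example processing function acting on ONE point.
    return point.energy * factor


points = [FakePoint("SET/WATER000", -76.0), FakePoint("SET/WATER001", -76.5)]
result = process_all(points, scaled_energy, factor=2.0)
```

Because the processing function only ever sees one point, the same callable can be reused unchanged at the level of a whole set of points.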

properties(system_alf: List[ALF] | None = None, specific_property: str | None = None)

Get properties contained in the PointDirectory. If no system alf is passed in, an automatic process to get C matrices is started.

Parameters:
  • system_alf – Optional list of ALF instances that can be passed in to use a specific alf instead of automatically trying to compute it.

  • specific_property – return only a specific key from the returned dictionary

property raw_data: dict

Returns all raw data associated with the PointsDirectory instance. The key is the point name (of a PointDirectory instance) and value is the raw data associated with the one point.

Returns:

A dictionary of raw data. Keys of the dictionary are the stem of each PointDirectory contained inside this PointsDirectory instance.

property total_energy

Returns np array of wfn energies of all points

Returns:

np array of total energy (in Hartree) for all points

property types: List[str]

Returns the atom elements for atoms, assumes each timesteps has the same atoms. Removes duplicates.

property types_extended: List[str]

Returns the atom elements for atoms, assumes each timesteps has the same atoms. Removes duplicates.

property wfn_energy: ndarray

Returns np array of wfn energies of all points

Returns:

np array of total energy (in Hartree) for all points

write_to_json_database(root_path: str | Path | None = None, datafunction: Callable = <function get_data_for_point>, npoints_per_json=500, print_missing_data=True, indent: int = 2, separators=(', ', ':')) Path

Write out important information from a PointsDirectory instance to a json file.

Parameters:
  • root_path – Name of the directory which will contain the json database. This is a directory, which contains multiple directories inside. Each directory inside is one PointsDirectory. It is implemented this way so that, when used for multiple PointsDirectory-ies at once, data for each PointDirectory is written in a separate folder

  • datafunction – A function used to get all data for a single point. This data is going to get written to the json file.

  • npoints_per_json – Maximum number of geometries to write to one json file. This is done so that the individual files do not become very large.

  • print_missing_data – Whether to print out any missing data from each PointDirectory contained in self, defaults to True

  • indent – integer representing number of spaces to indent, defaults to 2

  • separators – Separators used for each entry, default (“,”, “:”)

Returns:

The path to the written json file
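The role of npoints_per_json, indent, and separators can be illustrated with the standard json module. The sketch below splits a list of records into files of bounded size; write_in_chunks, the chunk_N.json naming, and the record layout are assumptions for this example and do not reflect the actual database layout written by ICHOR.

```python
import json
import tempfile
from pathlib import Path


def write_in_chunks(records, out_dir, npoints_per_json=500, indent=2, separators=(",", ":")):
    """Write records to numbered json files holding at most npoints_per_json entries each."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for start in range(0, len(records), npoints_per_json):
        chunk = records[start:start + npoints_per_json]
        # File naming here is an assumption made for this example.
        path = out_dir / f"chunk_{start // npoints_per_json}.json"
        path.write_text(json.dumps(chunk, indent=indent, separators=separators))
        written.append(path)
    return written


with tempfile.TemporaryDirectory() as tmp:
    records = [{"name": f"POINT{i:03d}", "energy": -76.0 - i} for i in range(5)]
    paths = write_in_chunks(records, tmp, npoints_per_json=2)
    sizes = [len(json.loads(p.read_text())) for p in paths]
```

Five records with a limit of two per file produce three files, the last one partially filled.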

write_to_sqlite3_database(db_path: str | Path | None = None, echo=False, print_missing_data=True) Path

Write out important information from a PointsDirectory instance to an SQLite3 database.

Parameters:
  • db_path – database to write to

  • echo – Whether to print out SQL queries from SQL Alchemy, defaults to False

  • print_missing_data – Whether to print out any missing data from each PointDirectory contained in self, defaults to True

Returns:

The path to the written SQL database

ichor.core.files.points_directory_parent module

class PointsDirectoryParent(path: Path | str)

Bases: list, Directory

Should wrap around multiple PointsDirectory-ies.

path: Path | str
processed_data(processing_func, *args, **kwargs) dict

Processes data in some way, given any arguments and keyword arguments, and returns a dictionary of the processed data. Note that the processing function can be any callable, i.e. closure, class, etc.

Note

The processing function must act on one PointDirectory.

Parameters:
  • processing_func – Callable which is going to process ONE PointDirectory

  • args – Positional arguments to pass to processing func

Returns:

A dictionary of processed data. Keys of the dictionary are the stem of each PointsDirectory contained inside this PointsDirectoryParent instance.

property raw_data: dict

Returns all raw data associated with the PointsDirectoryParent instance. The key is the points directory name (of a PointsDirectory instance) and value is another dictionary.

Returns:

A dictionary of raw data. Keys of the dictionary are the stem of each PointsDirectory contained inside this PointsDirectoryParent instance.

write_to_json_database(root_name: str | None = None, npoints_per_json=500, print_missing_data=True, indent: int = 2, separators=(',', ':')) List[Path]

Makes a database from multiple PointsDirectory-like directories which are contained in this PointsDirectoryParent

Parameters:
  • root_name – The name of the database. If not selected, uses the name of the current PointsDirectoryParent, defaults to None

  • npoints_per_json – Number of json files in each sub-directory, defaults to 500

  • print_missing_data – Whether or not to print missing data, defaults to True

  • indent – json file indent, defaults to 2

  • separators – json file separators, defaults to (“,”, “:”)

write_to_sqlite3_database(db_path: str | Path | None = None, echo=False, print_missing_data=True) Path

Write out important information from a PointsDirectory instance to an SQLite3 database. All PointsDirectory-like directories contained inside will be written to the same database.

Parameters:
  • db_path – database to write to

  • echo – Whether to print out SQL queries from SQL Alchemy, defaults to False

  • print_missing_data – Whether to print out any missing data from each PointDirectory contained in self, defaults to True

Returns:

The path to the written SQL database

ichor.core.files.qcp module

class QuantumChemistryProgramInput(path: Path | str, method: str | None = FileContents, basis_set: str | None = FileContents, atoms: Atoms = FileContents)

Bases: HasAtoms, ABC

Abstract class to interface with quantum chemistry programs

Module contents

class AbInt(path: str | Path)

Bases: HasData, ReadFile

classmethod check_path(path: Path) bool

Checks the suffix of the given path matches the filetype associated with class that subclasses from File

Parameters:

path – A Path object to check

Returns:

True if the Path object has the same suffix as the class filetype, False otherwise

property e_inter
path: Path | str
property raw_data: dict

Returns the raw data associated with the current object.

Returns:

The raw data associated with the current object

Return type:

dict

class Aim(path: Path)

Bases: ReadFile, dict

Class which wraps around an AIMAll output file, where settings and timings are written out to. The .int files are parsed separately in the INT/INTs classes.

path: Path | str
class DlPolyConfig(system_name: str, trajectory: Trajectory, path: Path | str = PosixPath('CONFIG'), cell_size: float = 50.0, comment_line='Frame :         1\n')

Bases: WriteFile

Write out a DLPoly CONFIG file. The name of the file needs to be CONFIG, so DL POLY knows to use it.

Parameters:
  • system_name – the name of the chemical system

  • trajectory – a Trajectory instance containing the geometries that are going to be written to the CONFIG file. Each timestep in the trajectory is an Atoms instance.

  • path – The path to the CONFIG file, defaults to Path(‘CONFIG’)

  • cell_size – The size of the box, float

  • comment_line – The very first line in the CONFIG file. Must be below 72 characters

    Note

    ALL of the timesteps in the Trajectory will be written to one CONFIG file. Each timestep groups geometries which should be represented by a GP model. For example, if each timestep is only one molecule, then it means it is a monomer model and the labels of the atoms in the CONFIG will show that. If each timestep is two molecules, it means it is a dimer model, so then the labels in the CONFIG file will make sure that two molecules which should be represented by one GP model have the correct atom labeling in the CONFIG file.

path: Path | str
class DlPolyControl(system_name: str, path: Path = PosixPath('CONTROL'), ensemble: str = 'nvt', thermostat: str = 'hoover', thermostat_settings: list = [0.04], temperature: int = 1, timestep=0.001, steps=500, scale=100, cutoff=8.0, rvwd=8.0, dump=1000, trajectory_i=0, trajectory_j=1, trajectory_k=0, print_every=1, stats_every=1, job_time=10000000, close_time=20000)

Bases: WriteFile

Write out a DLPoly CONTROL file. The name of the file needs to be CONTROL, so DL POLY knows to use it. The default Control file is made to be used for geometry optimizations at very low temperatures. Settings must be changed to write out a file for water box simulations for example.

path: Path | str
class DlPolyFFLUX(path: Path | str = PosixPath('FFLUX'))

Bases: ReadFile

READS the FFLUX file from FFLUX.

Parameters:

path – Path to FFLUX file

Variables:
  • df – A pandas dataframe storing all the data in the FFLUX file.

  • sum_iqa_energy – The total energies array of shape ntimesteps

  • vdw_energy – The Van der Waals energies of each timestep. Only computed if there are multiple molecules. Otherwise they will be 0.0

  • electrostatic_energy – The electrostatic energies of each timestep. Only computed if there are multiple molecules. Otherwise they will be 0.0

classmethod check_path(path: Path) bool

Checks the suffix of the given path matches the filetype associated with class that subclasses from File

Parameters:

path – A Path object to check

Returns:

True if the Path object has the same suffix as the class filetype, False otherwise

property delta_between_timesteps: List[float]

Calculates the delta energy (in kJ mol-1) between each pairs of timesteps. Useful for checking convergence of energy when doing optimizations.

Returns:

List containing the first index (timestep) where the threshold is met as well as the list of differences for all timesteps

property delta_between_timesteps_kj_mol: List[float]

Calculates the delta energy (in kJ mol-1) between each pairs of timesteps. Useful for checking convergence of energy when doing optimizations.

Returns:

List containing the first index (timestep) where the threshold is met as well as the list of differences for all timesteps

property electrostatic_energy
first_index_where_delta_less_than(delta=0.0001) int

Returns first index where the energy between timesteps is below delta (in kJ mol-1)

Parameters:

delta – The threshold when geometry is converged, defaults to 1e-4 kJ mol-1
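A convergence check of this kind can be sketched on a plain list of energies. The example below assumes that absolute differences between consecutive timesteps are compared against delta and that None is returned when the threshold is never met; deltas_between and first_index_below are hypothetical helpers, and the real method may handle these cases differently.

```python
def deltas_between(energies):
    """Absolute differences between each pair of consecutive energies."""
    return [abs(b - a) for a, b in zip(energies, energies[1:])]


def first_index_below(energies, delta=1e-4):
    """Index of the first consecutive difference below delta, or None if there is none."""
    for i, d in enumerate(deltas_between(energies)):
        if d < delta:
            return i
    return None  # returning None here is a choice made for this sketch


energies = [-10.0, -10.5, -10.6, -10.60005, -10.60006]
idx = first_index_below(energies, delta=1e-4)
```

Here the differences are roughly 0.5, 0.1, 5e-5, and 1e-5, so the third pair (index 2) is the first to fall below the threshold.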

property kinetic_energy
property ntimesteps
path: Path | str
plot_abs_differences(until_converged_energy=True)
property sum_iqa_energy
property total_energy
property total_energy_kj_mol
property vdw_energy
class DlPolyField(system_name: str, atoms: Atoms, path: Path | str = PosixPath('FIELD'), nummols=1)

Bases: WriteFile

path: Path | str
class DlPolyIQAEnergies(path: Path | str = PosixPath('IQA_ENERGIES'))

Bases: ReadFile

READS the IQA_ENERGIES file from FFLUX.

Parameters:

path – Path to IQA_ENERGIES file

Variables:
  • natoms – Number of atoms in system

  • energies – Array of shape ntimesteps x natoms for read energies

classmethod check_path(path: Path) bool

Checks the suffix of the given path matches the filetype associated with class that subclasses from File

Parameters:

path – A Path object to check

Returns:

True if the Path object has the same suffix as the class filetype, False otherwise

path: Path | str
class DlPolyIQAForces(path: Path | str = PosixPath('IQA_FORCES'))

Bases: ReadFile

READS the IQA_FORCES file from FFLUX.

Parameters:

path – Path to IQA_FORCES file

Variables:
  • forces – The forces array of shape ntimesteps x natoms x 3. Initialized as FileContents prior to file reading.

  • natoms – Number of atoms in each timestep

check_forces_less_than_value(value=0.001) ndarray

Checks which timesteps have all forces less than value. The GP models revert to the prior mean when far away from training data, so the forces on atoms will then be 0.

We can check for that because, if the forces are consistently less than the value, then either the simulation has crashed or a minimum has been reached.

Parameters:

value – Value for which all forces need to be less than

Returns:

np.ndarray containing the timestep indices for which the condition is true. If len(array) is 0, then the condition is not met for any timestep. Could be useful to check if a geometry is optimized or the simulation has crashed.
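A minimal NumPy sketch of this kind of check, assuming a forces array of shape ntimesteps x natoms x 3 as described above. The helper name `timesteps_with_small_forces` is hypothetical, and comparing the absolute value of every force component (rather than, say, force norms) is an assumption of the sketch.

```python
import numpy as np


def timesteps_with_small_forces(forces, value=0.001):
    """Return indices of timesteps where every force component is below value.

    Hypothetical helper: compares absolute components, which is an assumption.
    """
    forces = np.asarray(forces, dtype=float)
    # Reduce over the atom axis (1) and the x/y/z axis (2) for each timestep.
    all_small = np.all(np.abs(forces) < value, axis=(1, 2))
    return np.flatnonzero(all_small)


# Three timesteps, two atoms: only the first timestep has a large force.
forces = np.zeros((3, 2, 3))
forces[0, 0, 0] = 0.5
forces[2, 1, 2] = -0.0005
small_steps = timesteps_with_small_forces(forces)
```

An empty result means no timestep satisfies the condition, mirroring the return value described above.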

classmethod check_path(path: Path) bool

Checks that the suffix of the given path matches the filetype associated with the class that subclasses from File.

Parameters:

path – A Path object to check

Returns:

True if the Path object has the same suffix as the class filetype, False otherwise

path: Path | str
class DlpolyHistory(path: Path | None = PosixPath('HISTORY'))

Bases: Trajectory

DLPOLY HISTORY File

Inherits from Trajectory, as it is a list of Atoms. Builds on the Trajectory class by adding the DLPOLY information provided by the HISTORY file.

classmethod check_path(path: Path) bool

Checks that the suffix of the given path matches the filetype associated with the class that subclasses from File.

Parameters:

path – A Path object to check

Returns:

True if the Path object has the same suffix as the class filetype, False otherwise

path: Path | str
write_final_geometry_to_xyz(xyz_path: Path)
write_to_trajectory(path: str = 'TRAJECTORY.xyz')

Writes a trajectory .xyz file from the DL POLY HISTORY file.

class FFLUXDirectory(path: Path | str)

Bases: AnnotatedDirectory

READS a FFLUX Directory containing the FFLUX, IQA_ENERGIES, IQA_FORCES and HISTORY files

Parameters:

path – Path to FFLUX Directory

contents = {'fflux_file': <class 'ichor.core.files.dl_poly.dl_poly_fflux.DlPolyFFLUX'>, 'history_file': <class 'ichor.core.files.dl_poly.dl_poly_history.DlpolyHistory'>, 'iqa_energies_file': <class 'ichor.core.files.dl_poly.dl_poly_iqa_energies.DlPolyIQAEnergies'>, 'iqa_forces_file': <class 'ichor.core.files.dl_poly.dl_poly_iqa_forces.DlPolyIQAForces'>}
property coordinates: ndarray

Returns coordinates as array of shape ntimesteps x natoms x 3

property iqa_energies: ndarray

Returns the individual atom IQA energy array of shape ntimesteps x natoms

property iqa_forces: ndarray

Returns iqa forces array of shape ntimesteps x natoms x 3

property natoms: int

Returns number of atoms

path: Path | str
property total_iqa_energies: ndarray

Returns total energy array of shape ntimesteps

class GJF(path: Path | str, link0: List[str] | None = None, print_level: PrintLevel | None = None, method: str | None = None, basis_set: str | None = None, keywords: List[str] | None = None, title: str | None = None, charge: int | None = None, spin_multiplicity: int | None = None, atoms: Atoms | None = None, output_chk: bool = False)

Bases: ReadFile, WriteFile, HasAtoms

Wraps around a .gjf file that is used as input to Gaussian. See https://gaussian.com/input/ for details. Below is the usual gjf file structure:

%nproc
%mem
# <job_type> <method>/<basis-set> <keywords>

Title

0 1
<atom-name> <todo: add -1 for freeze> <x> <y> <z>
...

extra_details_str (containing basis sets for individual atoms, what to freeze, etc.)

<wfn-name>
blank line
blank line
blank line
...
Parameters:
  • path – A string or Path to the .gjf file. If a path is not given, then there is no file to be read, so the user has to write the file contents. If no contents/options are written by the user, they are written as the default values in the write method.

  • title – A string to be written between the link0 options and the keywords. It can contain any information.

  • job_type – The job type, an energy, optimization, or frequency

  • keywords – A list of keywords to be added to the Gaussian keywords line

  • method – The method to be used by Gaussian (e.g. B3LYP)

  • basis_set – The basis set to be used by Gaussian (e.g. 6-31+g(d,p))

  • charge – The charge to be used by Gaussian for the system

  • spin_multiplicity – The spin multiplicity to be used by Gaussian for the system.

  • atoms – An Atoms instance containing a geometry to be written in the .gjf file. This is either read in (if an existing gjf path is given) or an error is thrown when attempting to write the gjf file (because no gjf file or Atoms instance was given)

  • extra_calculation_details – A list of strings to be added to the bottom of the gjf file (after atoms section containing atom names and coordinates). This is done in order to handle different basis sets for individual atoms, modredundant settings, and other settings that Gaussian handles.

Note

It is up to the user to handle writing the extra_calculation_details settings. ICHOR does NOT check whether these additional settings are going to be read in correctly by Gaussian.
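The file layout shown above can be illustrated with a small string-building sketch. It does not use the GJF class and is not how ICHOR writes the file: `build_gjf_text` and all of its default values are hypothetical, chosen only to mirror the link0 / route / title / charge and multiplicity / atoms / wfn-name ordering.

```python
def build_gjf_text(atoms, method="B3LYP", basis_set="6-31+g(d,p)",
                   keywords=("output=wfn",), title="Title",
                   charge=0, spin_multiplicity=1,
                   nproc=2, mem="1GB", wfn_name="point.wfn"):
    """Assemble text following the layout above (hypothetical helper)."""
    lines = [
        f"%nproc={nproc}",                                   # link0 options
        f"%mem={mem}",
        f"# {method}/{basis_set} " + " ".join(keywords),     # route line
        "",
        title,
        "",
        f"{charge} {spin_multiplicity}",
    ]
    # One line per atom: element symbol followed by x, y, z.
    for element, x, y, z in atoms:
        lines.append(f"{element} {x:16.8f} {y:16.8f} {z:16.8f}")
    # Blank line, wfn file name, then trailing blank lines.
    lines += ["", wfn_name, "", ""]
    return "\n".join(lines)


# Illustrative geometry only; the coordinates are placeholders.
water = [("O", 0.0, 0.0, 0.0), ("H", 0.96, 0.0, 0.0), ("H", -0.24, 0.93, 0.0)]
gjf_text = build_gjf_text(water)
```

Keeping the pieces in a list and joining at the end makes it easy to append extra sections (such as the extra calculation details) before the wfn name.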

add_keyword(keyword: str)

Add a keyword to the Gaussian input keywords

Parameters:

keyword – A string to add as a keyword

Note

The keyword is not checked internally.

add_keywords(keywords: List[str])

Add a list of keywords to the Gaussian input keywords

Parameters:

keywords – A list of keywords

Note

The keywords are not checked internally.

output_wfn()

Helper method to add ‘output=wfn’ to the GJF keyword list

classmethod parse_route_card(route_card: str) RouteCard
path: Path | str
set_mem(mem: str)

Sets memory for Gaussian job

Parameters:

mem – string to set as memory

Note

This is not checked internally.

set_nproc(nproc: int)

Sets the number of processor cores for Gaussian

Parameters:

nproc – An integer which is the number of cores.

Note

No checks are done for CPU core count.

class GaussianOutput(path: Path | str)

Bases: ReadFile, HasAtoms, HasData

Wraps around a .gaussianoutput file that is the output of Gaussian. This file contains coordinates (in Angstroms), forces, as well as molecular multipole moments.

Parameters:

path – Path object or string to the .gaussianoutput file, which is a Gaussian output file

path: Path | str
property raw_data: dict

Returns the raw data associated with the current object.

Return type:

dict

rotated_forces(rotation_matrix: ndarray) dict

Rotates the forces given a rotation_matrix, which could be the C matrix used to rotate onto an ALF axis system with a central atom, x-axis atom, and xy-plane atom.

Parameters:

rotation_matrix – A 3x3 rotation matrix
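As a rough illustration of rotating per-atom force vectors with a 3x3 matrix, the NumPy sketch below applies the matrix to each row of an natoms x 3 array. The helper `rotate_forces` is hypothetical, it returns an array rather than the dict returned by the method above, and the convention used (C applied on the left of each force vector) is an assumption.

```python
import numpy as np


def rotate_forces(forces, rotation_matrix):
    """Apply a 3x3 rotation matrix to every row of an (natoms, 3) array.

    Hypothetical helper: computes C @ f_i for each atom i.
    """
    forces = np.asarray(forces, dtype=float)
    C = np.asarray(rotation_matrix, dtype=float)
    # (forces @ C.T)[i] equals C @ forces[i].
    return forces @ C.T


# Rotation by 90 degrees about the z axis.
C = np.array([[0.0, -1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])
forces = np.array([[1.0, 0.0, 0.0],
                   [0.0, 2.0, 0.0]])
rotated = rotate_forces(forces, C)
```

A proper rotation leaves the length of each force vector unchanged, which is a convenient sanity check.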

class Int(path: Path | str)

Bases: ReadFile, HasData

Wraps around one .int file which is generated by AIMALL for every atom in the system.

Parameters:
  • path – The Path object corresponding to an .int file

  • parent – An Atoms instance which holds the coordinate information for all atoms in the system. This information is needed to form the C matrix when rotating multipoles from the global to the local frame. Note that the Atoms instance must contain the same atom name (i.e. atom type + atom index), so that rotating of the multipoles can happen.

property atom_num: int

Returns the atom index in the system. (atom indices in atom names start at 1)

property bond_critical_points: List[CriticalPoint]

Returns list of bond critical points

property cage_critical_points: List[CriticalPoint]

Returns list of cage critical points

classmethod check_path(path: Path) bool

Checks that the path corresponds to an .int file. An _ in the .int file name indicates AB interactions, which have to be read in differently because the file is structured differently.

property dipole_mag: float

Returns the magnitude of the dipole moment of the topological atom. The magnitude of the vector is not affected by the rotation of multipoles.
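The rotation invariance mentioned above can be checked numerically. The sketch assumes the three rank-1 spherical components are available under the names `q10`, `q11c` and `q11s` (hypothetical names here), which correspond to the z, x and y Cartesian dipole components respectively.

```python
import numpy as np


def dipole_magnitude(q10, q11c, q11s):
    """Euclidean norm of the three rank-1 multipole components."""
    return float(np.sqrt(q10 ** 2 + q11c ** 2 + q11s ** 2))


# Placeholder values, ordered as Cartesian (x, y, z) = (q11c, q11s, q10).
dipole_xyz = np.array([3.0, 4.0, 0.0])

# Rotation by 90 degrees about the x axis.
R = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]])
rotated_xyz = R @ dipole_xyz

mag_before = dipole_magnitude(q10=dipole_xyz[2], q11c=dipole_xyz[0], q11s=dipole_xyz[1])
mag_after = dipole_magnitude(q10=rotated_xyz[2], q11c=rotated_xyz[0], q11s=rotated_xyz[1])
```

The individual components change under the rotation, but the two magnitudes agree.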

property e_intra: float
property global_multipole_moments: dict

Returns the spherical multipole moments calculated by AIMAll.

Note

These are in the global (Cartesian) frame, i.e. they have NOT been rotated using ALF. Rotation is done by converting to Cartesian, rotating, and then converting back to spherical.

property i: int

Returns the atom index in the system. (atom indices in atom names start at 1)

property integration_error: float

The integration error can tell you if a point has been decomposed into topological atoms correctly. A large integration error signals that the point might not be suitable for training as the AIMALL IQA/multipole moments might be inaccurate.

property iqa: float

Returns the IQA energy of the topological atom described by this file (one .int file is written for each topological atom).

local_spherical_multipoles(C: ndarray) Dict[str, float]

Rotates global spherical multipoles into local spherical multipoles. Optionally a rotation matrix can be passed in. Otherwise, the wfn file associated with this int file (as read in from the int file) will be used (if it exists).

Parameters:

C – Rotation matrix to be used to rotate multipoles.

Raises:

FileNotFoundError – If no C_matrix is passed in and the wfn file associated with the int file does not exist. Then we cannot calculate multipoles.

path: Path | str
property q: float

Returns the point charge (monopole moment) of the topological atom.

property q00: float

Returns the point charge (monopole moment) of the topological atom.

property raw_data

Get properties which we are interested in machine learning from the INT file. Rotate multipoles using a given C matrix.

property ring_critical_points: List[CriticalPoint]

Returns list of ring critical points

class IntDirectory(path: Path | str)

Bases: HasData, AnnotatedDirectory

Wraps around a directory which contains all .int files for the system.

Parameters:
  • path – The Path corresponding to a directory holding .int files

  • parent – An Atoms instance that holds coordinate information for all the atoms in the system. Things like XYZ and GJF hold geometry.

classmethod check_path(path: Path) bool

Checks if the given Path instance has _atomicfiles in its name.

contents = {'interaction_ints': <class 'ichor.core.files.aimall.ab_int.AbInt'>, 'ints': <class 'ichor.core.files.aimall.int.Int'>}
get(pattern: str, default=None)

Does the same thing as the get method of a dictionary, returning a default if a KeyError is raised

path: Path | str
properties(C_dict: Dict[str, ndarray]) Dict[str, Dict[str, float]]

Returns a dictionary of dictionaries containing atom names as keys and a dictionary as value. The value dictionary contains the properties we are interested in machine learning as keys and the values of these properties as floats. A dictionary of C matrices needs to be passed in because we must rotate the multipoles.

Parameters:

C_dict – A dictionary of rotation matrices (C matrices), one for each of the atoms

Raises:

FileNotFoundError – If no C_matrix is passed in and the wfn file associated with the int file does not exist. Then we cannot calculate multipoles.

property raw_data: dict

Returns data associated with each atom. If interaction ints are present, also adds these to the dictionary.

class Mol2(path: Path | str, system_name: str, atoms: Atoms)

Bases: HasAtoms, WriteFile

format()
path: Path | str
class MorfiDirectory(path: Path | str)

Bases: AnnotatedDirectory

classmethod check_path(path: Path) bool

Implement if the path of the directory needs to be checked if it contains something specific

contents = {'mout': <class 'ichor.core.files.pandora.mout.MOUT'>}
dirname = 'morfi-2pdm'
path: Path | str
class OrcaEngrad(path: Path | str)

Bases: ReadFile, HasAtoms, HasData

property gradient: ndarray
path: Path | str
property raw_data: dict

Returns the raw data associated with the current object.

Return type:

dict

class OrcaInput(path: Path | str, method: str | None = None, basis_set: str | None = None, main_input: List[str] | None = None, charge: int | None = None, spin_multiplicity: int | None = None, atoms: Atoms | None = None, input_blocks: Dict[str, List[tuple]] | None = None)

Bases: ReadFile, WriteFile, File, HasAtoms

Wraps around an ORCA input file that is used as input to ORCA.

Parameters:
  • path – A string or Path to the ORCA input file. If a path is not given, then there is no file to be read, so the user has to write the file contents. If no contents/options are written by the user, they are written as the default values in the write method.

  • method – The method to use for calculation, defaults to b3lyp/g if not given

  • basis_set – The basis set for the calculation, defaults to “6-31+g(d,p)”

  • main_input – A list of strings which are commands beginning with !

  • charge – The charge of the system

  • spin_multiplicity – The spin multiplicity of the system

  • atoms – An Atoms instance that contains the molecular structure

  • input_blocks – A dictionary consisting of keys: The option, and values: A list containing even number of elements. The option is going to be written out with a %, followed by the specifications that the user gives for the option

Note

There is no checking of what the inputs are, so it is up to the user to make sure that the inputs are correct.

Note

Gaussian uses a different b3lyp version (https://sites.google.com/site/orcainputlibrary/dft-calculations) so use b3lyp/g (this is the Gaussian implementation) instead of b3lyp

References

  • https://sites.google.com/site/orcainputlibrary/home

  • https://www.cup.uni-muenchen.de/oc/zipse/teaching/computational-chemistry-2/topics/a-typical-orca-output-file/

  • https://www.orcasoftware.de/tutorials_orca/first_steps/input_output.html

  • https://www.afs.enea.it/software/orca/orca_manual_4_2_1.pdf (note this is for version 4, not 5)

  • Version 5 manual (needs login), available at https://orcaforum.kofo.mpg.de/app.php/dlext/?view=detail&df_id=186

  • https://orcaforum.kofo.mpg.de/viewtopic.php?f=8&t=7470&p=32102&hilit=atomic+force#p32102

path: Path | str
class OrcaOutput(path: Path | str)

Bases: HasAtoms, HasData, ReadFile

Wraps around an output file produced by ORCA. This file contains coordinates (in Angstroms), forces, as well as molecular multipole moments.

Parameters:

path – Path object or string to the ORCA output file

path: Path | str
property raw_data: dict

Returns the raw data associated with the current object.

Return type:

dict

class PandoraDirectory(path: Path | str)

Bases: HasAtoms, AnnotatedDirectory

classmethod check_path(path: Path) bool

Implement if the path of the directory needs to be checked if it contains something specific

contents = {'input': <class 'ichor.core.files.pandora.pandora_input.PandoraInput'>, 'morfi': <class 'ichor.core.files.pandora.morfi_output.MorfiDirectory'>, 'pyscf': <class 'ichor.core.files.pandora.pyscf_output.PySCFDirectory'>}
dirname = 'pandora'
path: Path | str
write()
class PandoraInput(path: Path, atoms: Atoms | None = None, ccsdmod: PandoraCCSDmod = FileContents, morfi_grid_radial: float = FileContents, morfi_grid_angular: int = FileContents, morfi_grid_radial_h: float = FileContents, morfi_grid_angular_h: int = FileContents, method: str = FileContents, basis_set: str = FileContents)

Bases: HasAtoms, ReadFile, WriteFile

path: Path | str
class PointDirectory(path: Path | str)

Bases: AnnotatedDirectory, HasAtoms, HasData

A helper class that wraps around ONE directory which contains ONE point (one molecular geometry).

Parameters:

path – Path to a directory which contains ONE point.

property atoms: Atoms

Returns the Atoms instance which the PointDirectory encapsulates.

atoms_from_file(file_with_atoms: HasAtoms) Atoms

Given a class (which is in the contents of the directory), obtain the Atoms instance from that specific file which is wrapped by the class.

Parameters:

file_with_atoms – file class which subclasses from HasAtoms and has a .atoms attribute

Raises:

ichor.core.atoms.AtomsNotFoundError – If file class does not contain atoms

Return type:

ichor.core.atoms.Atoms

classmethod check_path(path: Path) bool

Makes sure that path is PointDirectory-like

contents = {'aim': <class 'ichor.core.files.aimall.aim.Aim'>, 'gaussian_output': <class 'ichor.core.files.gaussian.gaussian_output.GaussianOutput'>, 'gjf': <class 'ichor.core.files.gaussian.gjf.GJF'>, 'ints': <class 'ichor.core.files.aimall.ints.IntDirectory'>, 'orca_input': <class 'ichor.core.files.orca.orca_input.OrcaInput'>, 'orca_output': <class 'ichor.core.files.orca.orca_output.OrcaOutput'>, 'wfn': <class 'ichor.core.files.gaussian.wfn.WFN'>, 'xyz': <class 'ichor.core.files.xyz.xyz.XYZ'>}
features(feature_calculator: Callable, *args, is_atomic=True, **kwargs)

Returns the features for this Atoms instance, corresponding to the features of each Atom instance held in this Atoms instance. Features are calculated in the Atom class and concatenated into a 2D array here.

The array shape is n_atoms x n_features (3*n_atoms - 6)

Parameters:
  • is_atomic – whether the feature calculator calculates features for individual atoms or for the whole geometry.

  • args – positional arguments to pass to feature calculator

  • kwargs – key word arguments to pass to feature calculator

Returns:

The feature matrix of this Atoms instance, as an np.ndarray of shape n_atoms x n_features (3N-6)

path: Path | str
property raw_data: dict

Returns the raw data associated with the current object.

Return type:

dict

class PointsDirectory(path: Path | str, needs_parsing=True, *args, **kwargs)

Bases: ListOfAtoms, Directory, HasData

A helper class that wraps around a directory which contains points (molecules with various geometries). Calling Directory.__init__(self, path) will call the parse method of PointsDirectory instead of Directory (because Python looks for the method in this class first before looking at parent class methods). A typical ICHOR directory that contains points will have a structure like this:

-TRAINING_SET
    -- SYSTEM_NAME000
    -- SYSTEM_NAME001
    -- SYSTEM_NAME002
    ........

Each of the subdirectories contains Gaussian files (such as .gjf), as well as an atomic_files directory, which then contains the AIMALL files. A PointsDirectory will wrap around the whole TRAINING_SET directory (which contains multiple points), while a PointDirectory will wrap around a SYSTEM_NAME00… folder (which only contains information about 1 point).

Parameters:
  • path – Path to a directory which contains points. This path is typically the path to the training set, sample pool, etc.

  • needs_parsing – By default, every PointsDirectory is parsed when the instance is created, in order to create PointDirectory instances of each inner directory (but the contents of the files are not read). If, however, a slice of an already created PointsDirectory is made, the contents of the directories do not need to be parsed again, so needs_parsing would be False

alf(alf_calculator: Callable[[...], ALF], *args, **kwargs) List[ALF]

Returns the Atomic Local Frame (ALF) for all Atom instances that are held in Atoms e.g. [[0,1,2],[1,0,2], [2,0,1]].

Parameters:
  • args – positional arguments to pass to alf calculator

  • kwargs – key word arguments to pass to alf calculator

alf_dict(alf_calculator: Callable[[...], ALF], *args, **kwargs) Dict[str, ALF]

Returns a dictionary of key: atom_name, value: ALF instance (containing central atom index, x-axis idx, xy-plane idx), e.g. {"O1":ALF(0,1,2),"H2":ALF(1,0,2), "H3":ALF(2,0,1)}.

Parameters:
  • args – positional arguments to pass to alf calculator

  • kwargs – key word arguments to pass to alf calculator

property atom_names: List[str]

Return the atom names from the first timestep. Assumes that all timesteps have the same number of atoms/atom names.

classmethod check_path(path: Path) bool

Makes sure that path is PointsDirectory-like

connectivity(connectivity_calculator: Callable[[...], ndarray]) ndarray

Return the connectivity matrix (n_atoms x n_atoms) for the given Atoms instance.

Returns:

type: np.ndarray of shape n_atoms x n_atoms

property coordinates: ndarray

Returns the xyz coordinates of all atoms for all timesteps, as an array of shape n_timesteps x n_atoms x 3

coordinates_to_xyz(fname: str | Path | None = PosixPath('system_to_xyz.xyz'), step: int | None = 1)

Writes a new .xyz file that contains the timestep i, as well as the coordinates of the atoms for that timestep.

Parameters:
  • fname – The file name to which to write the timesteps/coordinates

  • step – Write coordinates for every n^th step. Default is 1, so writes coordinates for every step
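To make the effect of step concrete, here is a small sketch that formats every n-th timestep in the common multi-frame XYZ layout (atom count, comment line, then one line per atom). The helper `trajectory_to_xyz_text` and the exact comment-line text are assumptions for illustration and do not reproduce ICHOR's output format.

```python
import numpy as np


def trajectory_to_xyz_text(atom_types, coordinates, step=1):
    """Format every `step`-th timestep as XYZ text (hypothetical helper)."""
    coordinates = np.asarray(coordinates, dtype=float)
    blocks = []
    for i in range(0, len(coordinates), step):
        # Header: number of atoms, then a comment line holding the timestep.
        lines = [str(len(atom_types)), f"i = {i}"]
        for element, (x, y, z) in zip(atom_types, coordinates[i]):
            lines.append(f"{element} {x:16.8f} {y:16.8f} {z:16.8f}")
        blocks.append("\n".join(lines))
    return "\n".join(blocks) + "\n"


# Five timesteps of a two-atom system; with step=2 timesteps 0, 2, 4 are kept.
coords = np.zeros((5, 2, 3))
xyz_text = trajectory_to_xyz_text(["O", "H"], coords, step=2)
```

With five timesteps and step=2, three frames of four lines each are produced.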

coordinates_to_xyz_with_errors(models_path: str | Path, fname: str | Path | None = PosixPath('xyz_with_properties_error.xyz'), step: int | None = 1)

Writes a new .xyz file that contains the timestep i, as well as the coordinates of the atoms for that timestep. The comment lines in the xyz file contain absolute prediction errors. These can then be plotted in ALFVisualizer as a cmap to see where poor predictions happen.

Parameters:
  • models_path – The model path to one atom.

  • property – The property for which to predict for and get errors (iqa or any multipole moment)

  • fname – The file name to which to write the timesteps/coordinates

  • step – Write coordinates for every n^th step. Default is 1, so writes coordinates for every step

features_with_properties_to_csv(system_alf: Dict[str, ALF], str_to_append_to_fname: str = '_features_with_properties.csv', atom_names: List[str] | None = None, property_types: List[str] | None = None, **kwargs)

Calculates ALF features and properties (with multipole moments rotated).

Parameters:
  • str_to_append_to_fname – A string that is appended to the default file name (which is name_of_atom.csv), defaults to “_features_with_properties.csv”

  • atom_names – A list of atom names for which to write out csv files with properties. If None, then writes out files for all atoms in the system, defaults to None

  • property_types – A list of property names (iqa, multipole names) for which to write columns. If None, then writes out columns for all properties, defaults to None

  • args – positional arguments to pass to calculator function

  • kwargs – key word arguments to be passed to the feature calculator function

Raises:

TypeError – This method only works for PointsDirectory instances because it needs access to AIMALL information. Does not work for Trajectory instances.

features_with_wfn_energy_and_dE_df_to_csv(alf_list: List[ALF], central_atom_idx: int, str_to_append_to_fname: str = '_features_with_dE_df.csv', **kwargs)

Writes out a csv file containing wfn energy and FORCEs calculated for every feature. Note that the forces (dE/df_i) are the negative of the PES gradient, so for machine learning, the negative of these forces needs to be taken to add gradient information into GP models.

Parameters:
  • alf_list – A list of ALF instances containing alf info

  • central_atom_idx (int) – The central atom on which to center the alf and for which dE/df will be calculated

  • str_to_append_to_fname (str, optional) – A string to append to the file name, defaults to “_features_with_dE_df.csv”

classmethod from_trajectory(trajectory_path: str | Path, system_name: str | None = None, every=1, center=True) PointsDirectory

Generate a PointsDirectory-type structure directory from a trajectory (.xyz) file

Parameters:
  • trajectory_path – A str or Path to a .xyz file containing geometries

  • system_name – The name of the chemical system. This is going to be the name of the directory which will be created.

  • center – Whether to center the geometries on the centroid of the system. This is useful to prevent the molecule from translating in 3D space (and prevents issues with WFN files, where a very large x,y,z value (over 100) for the coordinates leads to **** being written in the .wfn file…)
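Centering on the centroid amounts to subtracting the mean atomic position from every atom. The NumPy sketch below illustrates this for a single geometry; `center_on_centroid` is a hypothetical helper, not ICHOR's implementation, and it uses the unweighted mean of the coordinates (not a mass-weighted centre).

```python
import numpy as np


def center_on_centroid(coordinates):
    """Shift an (natoms, 3) geometry so that its centroid is at the origin."""
    coordinates = np.asarray(coordinates, dtype=float)
    # The centroid is the unweighted mean position of all atoms.
    return coordinates - coordinates.mean(axis=0)


# A geometry that has drifted far from the origin.
geometry = np.array([[100.0, 200.0, 300.0],
                     [102.0, 200.0, 300.0]])
centered = center_on_centroid(geometry)
```

Interatomic distances are unchanged by the shift, while the large absolute coordinate values disappear.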

property natoms: int

Returns the number of atoms in the first timestep. Each timestep should have the same number of atoms.

path: Path | str
processed_data(processing_func, *args, **kwargs) dict

Processes data in some way, given any positional and keyword arguments, and returns a dictionary of the processed data. Note that the processing function can be any callable, i.e. closure, class, etc.

Note

The processing function must act on one PointDirectory.

Parameters:
  • processing_func – Callable which is going to process ONE PointDirectory

  • args – Positional arguments to pass to processing func

Returns:

A dictionary of processed data. Keys of the dictionary are the stem of each PointDirectory contained inside this PointsDirectory instance.
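The pattern described above (apply one callable to every point and key the results by directory stem) can be sketched with plain paths standing in for PointDirectory instances. The function below is a stand-alone illustration, and `name_length` is a made-up processing function.

```python
from pathlib import Path


def processed_data(point_paths, processing_func, *args, **kwargs):
    """Apply processing_func to each point; keys are the path stems.

    Stand-alone sketch: plain Path objects stand in for PointDirectory
    instances.
    """
    return {
        Path(p).stem: processing_func(p, *args, **kwargs)
        for p in point_paths
    }


def name_length(point_path, offset=0):
    """Made-up processing function acting on ONE point."""
    return len(Path(point_path).name) + offset


points = [Path("TRAINING_SET/WATER000"), Path("TRAINING_SET/WATER001")]
result = processed_data(points, name_length, offset=1)
```

Extra positional and keyword arguments are forwarded unchanged to the processing function, so any callable with a point as its first argument fits.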

properties(system_alf: List[ALF] | None = None, specific_property: str | None = None)

Get properties contained in the PointDirectory. If no system alf is passed in, an automatic process to get C matrices is started.

Parameters:
  • system_alf – Optional list of ALF instances that can be passed in to use a specific alf instead of automatically trying to compute it.

  • specific_property – return only a specific key from the returned dictionary

property raw_data: dict

Returns all raw data associated with the PointsDirectory instance. The key is the point name (of a PointDirectory instance) and value is the raw data associated with the one point.

Returns:

A dictionary of raw data. Keys of the dictionary are the stem of each PointDirectory contained inside this PointsDirectory instance.

property total_energy

Returns np array of wfn energies of all points

Returns:

np array of total energy (in Hartree) for all points

property types: List[str]

Returns the atom elements for atoms, assuming each timestep has the same atoms. Removes duplicates.

property types_extended: List[str]

Returns the atom elements for atoms, assuming each timestep has the same atoms. Removes duplicates.

property wfn_energy: ndarray

Returns np array of wfn energies of all points

Returns:

np array of total energy (in Hartree) for all points

write_to_json_database(root_path: str | Path | None = None, datafunction: Callable = <function get_data_for_point>, npoints_per_json=500, print_missing_data=True, indent: int = 2, separators=(', ', ':')) Path

Write out important information from a PointsDirectory instance to a json file.

Parameters:
  • root_path – Name of the directory which will contain the json database. This is a directory which contains multiple directories inside, one for each PointsDirectory. It is implemented this way so that, when used for multiple PointsDirectory-ies at once, the data for each PointsDirectory is written in a separate folder

  • datafunction – A function used to get all data for a single point. This data is going to get written to the json file.

  • npoints_per_json – Maximum number of geometries to write to one json file. This is done so that the individual files do not become very large.

  • print_missing_data – Whether to print out any missing data from each PointDirectory contained in self, defaults to True

  • indent – integer representing number of spaces to indent, defaults to 2

  • separators – Separators used for each entry, default (“,”, “:”)

Returns:

The path to the written json file
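The role of npoints_per_json, indent and separators can be illustrated with the standard json module. The sketch splits a list of per-point records into chunks and serializes each chunk separately. The helper names and the record layout are hypothetical and say nothing about the schema ICHOR actually writes.

```python
import json


def chunk_points(points, npoints_per_json=500):
    """Split a list of per-point records into chunks of bounded size."""
    return [
        points[i:i + npoints_per_json]
        for i in range(0, len(points), npoints_per_json)
    ]


def chunks_to_json_strings(points, npoints_per_json=500,
                           indent=2, separators=(",", ":")):
    """Serialize each chunk to its own JSON string (hypothetical helper)."""
    return [
        json.dumps(chunk, indent=indent, separators=separators)
        for chunk in chunk_points(points, npoints_per_json)
    ]


# Placeholder records; a real database would hold the data for each point.
points = [{"name": f"WATER{i:04d}", "index": i} for i in range(1200)]
json_strings = chunks_to_json_strings(points)
```

With 1200 points and the default of 500 points per file, three JSON documents are produced, the last holding the remaining 200 points.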

write_to_sqlite3_database(db_path: str | Path | None = None, echo=False, print_missing_data=True) Path

Write out important information from a PointsDirectory instance to an SQLite3 database.

Parameters:
  • db_path – database to write to

  • echo – Whether to print out SQL queries from SQL Alchemy, defaults to False

  • print_missing_data – Whether to print out any missing data from each PointDirectory contained in self, defaults to True

Returns:

The path to the written SQL database

class PointsDirectoryParent(path: Path | str)

Bases: list, Directory

Should wrap around multiple PointsDirectory-ies.

path: Path | str
processed_data(processing_func, *args, **kwargs) dict

Processes data in some way, given any positional and keyword arguments, and returns a dictionary of the processed data. Note that the processing function can be any callable, i.e. closure, class, etc.

Note

The processing function must act on one PointDirectory.

Parameters:
  • processing_func – Callable which is going to process ONE PointDirectory

  • args – Positional arguments to pass to processing func

Returns:

A dictionary of processed data. Keys of the dictionary are the stem of each PointsDirectory contained inside this PointsDirectoryParent instance.

property raw_data: dict

Returns all raw data associated with the PointsDirectoryParent instance. The key is the points directory name (of a PointsDirectory instance) and value is another dictionary.

Returns:

A dictionary of raw data. Keys of the dictionary are the stem of each PointsDirectory contained inside this PointsDirectoryParent instance.

write_to_json_database(root_name: str | None = None, npoints_per_json=500, print_missing_data=True, indent: int = 2, separators=(',', ':')) List[Path]

Makes a database from multiple PointsDirectory-like directories which are contained in this PointsDirectoryParent

Parameters:
  • root_name – The name of the database. If not selected, uses the name of the current PointsDirectoryParent, defaults to None

  • npoints_per_json – Maximum number of points to write to each json file, defaults to 500

  • print_missing_data – Whether or not to print missing data, defaults to True

  • indent – json file indent, defaults to 2

  • separators – json file separators, defaults to (“,”, “:”)

write_to_sqlite3_database(db_path: str | Path | None = None, echo=False, print_missing_data=True) Path

Write out important information from a PointsDirectory instance to an SQLite3 database. All PointsDirectory-like directories contained inside will be written to the same database.

Parameters:
  • db_path – database to write to

  • echo – Whether to print out SQL queries from SQL Alchemy, defaults to False

  • print_missing_data – Whether to print out any missing data from each PointDirectory contained in self, defaults to True

Returns:

The path to the written SQL database

class PySCFDirectory(path: Path | str)

Bases: AnnotatedDirectory

classmethod check_path(path: Path) bool

Implement if the path of the directory needs to be checked if it contains something specific

contents = {'aimall_wfn': <class 'ichor.core.files.gaussian.wfn.WFN'>, 'morfi_wfn': <class 'ichor.core.files.pandora.pyscf_output.MorfiWFN'>}
dirname = 'pyscf'
path: Path | str
class Trajectory(path: Path | str, *args, **kwargs)

Bases: ReadFile, WriteFile, ListOfAtoms

Handles .xyz files that have multiple timesteps, with each timestep giving the x y z coordinates of the atoms. A user can also initialize an empty trajectory and append Atoms instances to it without reading in a .xyz file. This allows the user to build custom trajectories containing any sort of geometries.

Parameters:

path – The path to a .xyz file that contains timesteps. Set to None by default as the user can initialize an empty trajectory and build it up themselves

add(atoms)

Add a list of Atoms (corresponding to one timestep) to the end of the trajectory list

alf(alf_calculator: Callable[[...], ALF], *args, **kwargs) List[ALF]

Returns the Atomic Local Frame (ALF) for all Atom instances that are held in Atoms. e.g. [[0,1,2],[1,0,2], [2,0,1]]

Parameters:
  • args – positional arguments to pass to alf calculator

  • kwargs – key word arguments to pass to alf calculator

alf_dict(alf_calculator: Callable[[...], ALF], *args, **kwargs) Dict[str, ALF]

Returns a dictionary with the atomic local frame indices for every atom (0-indexed).
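
As a rough illustration of how the list form and the dictionary form relate, the sketch below pairs the example ALF list given for alf with a set of atom names. The atom names are hypothetical, and the assumption that each triple starts with the index of the atom it belongs to is read off the example rather than confirmed by the source.

```python
from typing import Dict, List

# Hypothetical atom names for a three-atom system (not taken from ichor output)
atom_names: List[str] = ["O1", "H2", "H3"]

# Example ALF list from the alf documentation: one 0-indexed triple per atom
alf: List[List[int]] = [[0, 1, 2], [1, 0, 2], [2, 0, 1]]

# Pair every atom name with its triple to obtain a dictionary-style view
alf_by_name: Dict[str, List[int]] = dict(zip(atom_names, alf))
```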

property atom_names

Return the atom names from the first timestep. Assumes that all timesteps have the same number of atoms/atom names.

change_atom_ordering(new_traj_name: Path, new_atom_ordering: List[int])

Changes the atom ordering of the trajectory, given a list of how indices should be permuted and writes out a new trajectory file in the specified location.

Parameters:
  • new_traj_name – Name of new trajectory file

  • new_atom_ordering – A list of indices telling how to permute the current trajectory
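
A minimal sketch of what permuting one timestep could look like is given below. It assumes that position i of the new ordering holds the index of the atom that should end up in position i; the source does not state the convention, so check it against a small test system before relying on it. The helper name reorder is hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def reorder(frame: Sequence[T], new_atom_ordering: List[int]) -> List[T]:
    """Return the atoms of one timestep in a new order.

    Assumes new_atom_ordering[i] is the current index of the atom
    that should be placed at position i (an assumption, see lead-in).
    """
    if sorted(new_atom_ordering) != list(range(len(frame))):
        raise ValueError("new_atom_ordering must be a permutation of all atom indices")
    return [frame[i] for i in new_atom_ordering]
```

Validating that the list is a true permutation guards against silently dropping or duplicating atoms.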

connectivity(connectivity_calculator: Callable[[...], ndarray]) ndarray

Return the connectivity matrix n_atoms x n_atoms for the given Atoms instance.

Return type:

np.ndarray of shape n_atoms x n_atoms
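
The exact arguments passed to connectivity_calculator are not documented here. As a sketch only, the function below builds an n_atoms x n_atoms matrix from a single distance cutoff, taking coordinates as input and returning nested lists instead of an np.ndarray. The function name, the input type, and the 1.2 cutoff are all illustrative assumptions.

```python
import math
from typing import List, Sequence


def cutoff_connectivity(
    coords: Sequence[Sequence[float]], cutoff: float = 1.2
) -> List[List[int]]:
    """Return a symmetric 0/1 matrix; 1 marks atom pairs closer than the cutoff.

    Illustrative only: a single cutoff ignores element-specific bond lengths.
    """
    n = len(coords)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) < cutoff:
                matrix[i][j] = matrix[j][i] = 1
    return matrix
```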

property coordinates: ndarray

Returns:

The xyz coordinates of all atoms for all timesteps. Shape n_timesteps x n_atoms x 3

Return type:

np.ndarray

coordinates_to_xyz(fname: Path = PosixPath('system_to_xyz.xyz'), step: int = 1)

Writes a new .xyz file that contains the timestep index i, as well as the coordinates of the atoms, for each written timestep.

Parameters:
  • fname – The file name to which to write the timesteps/coordinates

  • step – Write coordinates for every n^th step. Default is 1, so writes coordinates for every step
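
The effect of step can be sketched as follows. The function returns the text of a .xyz file for every step-th frame and records the timestep index on the comment line. The comment-line format "i = ..." and the eight-decimal formatting are assumptions; the format ichor actually writes may differ.

```python
from typing import List, Sequence, Tuple

Atom = Tuple[str, float, float, float]  # (element, x, y, z)


def frames_to_xyz_text(frames: Sequence[Sequence[Atom]], step: int = 1) -> str:
    """Build .xyz text containing every step-th frame of a trajectory."""
    out: List[str] = []
    for i in range(0, len(frames), step):
        frame = frames[i]
        out.append(str(len(frame)))  # atom-count line
        out.append(f"i = {i}")  # comment line carrying the timestep index (assumed format)
        for element, x, y, z in frame:
            out.append(f"{element} {x:.8f} {y:.8f} {z:.8f}")
    return "\n".join(out) + "\n"
```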

classmethod features_file_to_trajectory(f: Path, trajectory_path: Path, atom_types: List[str], header=0, index_col=0, sheet_name=0) Trajectory

Takes in a csv or excel file containing features and converts it to a Trajectory object. It assumes that the features start from the very first column (column after the index column, if one exists). Feature files that are written out by ichor are in Bohr instead of Angstroms for now.

After converting to cartesian coordinates, we have to convert Bohr to Angstroms because .xyz files are written out in Angstroms (and programs like Avogadro, VMD, etc. expect distances in angstroms). Failing to do that will result in xyz files that are in Bohr, so if features are calculated from them again, the features will be wrong.

Parameters:
  • f – Path to the file (either .csv or .xlsx) containing the features. We only need the features for one atom to reconstruct the geometries, thus we only need 1 csv file or 1 sheet of an excel file. By default, the 0th sheet of the excel file is read in.

  • atom_types – A list of strings corresponding to the atom elements (C, O, H, etc.). This has to be ordered the same way as atoms corresponding to the features. Note that the central atom (for which features are given in the file) also needs to be present in this list as the very first atom.

  • header – The row index (0-indexed) of the line in the csv file which contains the names of the columns. Default is set to 0 to use the 0th row.

  • index_col – Whether a column should be used as the index column. Default is set to 0 to use 0th column. If no index column is present, set to False.

  • sheet_name – The excel sheet to be used to convert to xyz. Default is 0. This is only needed for excel files, not csv files.

Note

Ensure that the list of atom names is correct, i.e. that it contains the central atom as the very first atom, and the following atoms are in the ordering that is in the file containing the features.
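
The unit conversion described above can be sketched as below. The constant is the CODATA 2018 value of the Bohr radius in Angstroms; the value ichor uses internally is not shown in this reference and may differ in the last digits. The helper name bohr_to_angstrom is hypothetical.

```python
from typing import List, Sequence

# CODATA 2018 Bohr radius in Angstroms (assumed here; ichor's own constant may differ)
BOHR_TO_ANGSTROM = 0.529177210903


def bohr_to_angstrom(coords: Sequence[Sequence[float]]) -> List[List[float]]:
    """Convert Cartesian coordinates from Bohr to Angstroms."""
    return [[c * BOHR_TO_ANGSTROM for c in atom] for atom in coords]
```

Skipping this step would leave coordinates roughly 1.89 times too large when a program reads the .xyz file as Angstroms.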

property natoms

Returns the number of atoms in the first timestep. Each timestep should have the same number of atoms.

classmethod np_array_to_trajectory(arr: ndarray, trajectory_path: str | Path, atom_types: List[str])

Creates a Trajectory instance from a np.ndarray object

Parameters:
  • arr – np.ndarray containing features. This should be a 2D array of shape n_timesteps x n_features

  • trajectory_path – The path associated with the trajectory instance which is made

  • atom_types – A list of atom types (elements) that correspond to the features in the given array. It is important that they are the same order as in the np.ndarray.

Returns:

Trajectory instance containing xyz geometries converted from features

path: Path | str
rmsd(ref=None)
split_packmol_trajectory(atoms_per_molecule: int, trajectory_name='packmol_traj_split.xyz')

Used to create packmol inputs

split_traj(root_dir: Path = PosixPath('split_trajectory'), split_size: int = 1000)

Splits the trajectory into sub-trajectories and writes them to a folder. E.g. an original trajectory of 10,000 geometries can be split into 10 sub-trajectories containing 1,000 geometries each (given a split size of 1,000).

Parameters:
  • root_dir – The folder to write sub-trajectories to. Must be a Path object and this directory will be created internally.

  • split_size – The split size by which to split original trajectory.
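
The splitting arithmetic can be sketched with plain list slicing. In this sketch a trailing remainder becomes a final, shorter chunk; how split_traj itself treats a remainder is not stated here, so that behaviour is an assumption. The helper name split_into_chunks is hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_chunks(items: Sequence[T], split_size: int = 1000) -> List[Sequence[T]]:
    """Split a sequence of geometries into consecutive chunks of split_size."""
    if split_size < 1:
        raise ValueError("split_size must be a positive integer")
    return [items[i : i + split_size] for i in range(0, len(items), split_size)]
```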

to_dir(system_name: str, every: int = 1, center: bool = False, parent_dir: Path | None = None) Path

Writes out every nth timestep to a separate .xyz file to a given directory

Parameters:
  • system_name – The name of the system. This will be the name of the given directory, with a suffix added. Default suffix is PointsDirectory._suffix

  • every – An integer value that indicates the nth step at which an xyz file should be written. Default is 1. If a value such as 5 is given, then a .xyz file is only written for every 5th timestep.

  • center – Whether or not to subtract mean of coordinates from atomic coordinates, defaults to False

  • parent_dir – A path to a parent directory where the inner directory will be created.

Returns:

The Path object to the made directory
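
What center does to a single geometry can be sketched as follows: the unweighted mean of the atomic coordinates is subtracted from every atom. This is a plain-Python illustration of that description, not ichor's implementation, and the helper name center_coordinates is hypothetical.

```python
from typing import List, Sequence


def center_coordinates(coords: Sequence[Sequence[float]]) -> List[List[float]]:
    """Subtract the mean x, y and z of all atoms from each atom's coordinates."""
    n = len(coords)
    centroid = [sum(atom[k] for atom in coords) / n for k in range(3)]
    return [[atom[k] - centroid[k] for k in range(3)] for atom in coords]
```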

to_dirs(system_name: str, split_size: int = 1000, every: int = 1, center=False) Path

Writes out every nth timestep to a separate .xyz file. This method differs from to_dir because it has a structure system_name_root / points_directory / xyz file. I.e. there is an additional root directory which encapsulates all the PointsDirectory-like directories.

Parameters:
  • system_name – The name of the system. This will be used in the names of the files and directories as well

  • split_size – How many .xyz files are going to be in each of the inner PointsDirectory-like directories

  • every – An integer value that indicates the nth step at which an xyz file should be written. Default is 1. If a value such as 5 is given, then a .xyz file is only written for every 5th timestep.

Returns:

The Path object to the made parent directory

to_multiple_parent_dirs(system_name: str, split_size: int = 1000, nsplits_in_root: int = 5, every: int = 1, center=False)

Splits a trajectory into multiple parent directories, each of which can contain multiple PointsDirectory-like directories.

Parameters:
  • system_name – name of system. This name will be used in the names of the files and directories which are made

  • split_size – The number of .xyz files that inner PointsDirectory-like directory will contain, default 1000

  • nsplits_in_root – The number of splits that are going to be in one root directory, default 5. With the default split_size, this means that there are 5 x 1000 geometries in that root directory.

  • every – An integer value that indicates the nth step at which an xyz file should be written, defaults to 1

  • center – whether or not to subtract centroid of geometry before writing out xyz. Useful if geometries are far away from the origin which can result in Gaussian failing to write outputs properly, defaults to False
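
The two-level grouping implied by split_size and nsplits_in_root can be sketched with timestep indices alone. The sketch assumes that every is applied first and that remainders form smaller trailing groups; neither detail is confirmed by this reference. The helper name group_for_parent_dirs is hypothetical.

```python
from typing import List


def group_for_parent_dirs(
    n_geometries: int,
    split_size: int = 1000,
    nsplits_in_root: int = 5,
    every: int = 1,
) -> List[List[List[int]]]:
    """Group timestep indices as roots -> splits -> geometry indices."""
    # Keep every nth timestep (assumed to happen before splitting)
    indices = list(range(0, n_geometries, every))
    # Inner level: one list per PointsDirectory-like directory
    splits = [indices[i : i + split_size] for i in range(0, len(indices), split_size)]
    # Outer level: one list of splits per parent (root) directory
    return [splits[i : i + nsplits_in_root] for i in range(0, len(splits), nsplits_in_root)]
```

With the defaults, 12,000 geometries would give three parent directories in this sketch: two holding 5 x 1000 geometries and one holding 2 x 1000.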

property types

Returns the atom elements for atoms, assumes each timestep has the same atoms. Removes duplicates.

property types_extended

Returns the atom elements for atoms, assumes each timestep has the same atoms. Removes duplicates.

class WFN(path: Path | str, method: str | None = None)

Bases: HasAtoms, HasData, ReadFile, WriteFile

Wraps around a .wfn file that is the output of Gaussian. The .wfn file is an output file, but must also implement a write method because AIMAll needs to know the method used in the WFN calculation, otherwise AIMAll can give the wrong results.

Parameters:
  • path – Path object or string to the .wfn file

  • atoms – an Atoms instance which is read in from the top of the .wfn file. Note that the units of the .wfn file are in Bohr.

  • method – The method (eg. B3LYP) which was used in the Gaussian calculation that created the .wfn file. The method is not initially written to the .wfn file by Gaussian, but it is necessary to add it to the .wfn file because AIMAll does not automatically determine the method itself, so it can lead to wrong IQA/multipole moments results. To make sure AIMAll results are correct, the method is a required argument.

Variables:
  • mol_orbitals – The number of molecular orbitals to be read in from the .wfn file.

  • primitives – The number of primitives to be read in from the .wfn file.

  • nuclei – The number of nuclei in the system to be read in from the .wfn file.

  • energy – The molecular energy read in from the bottom of the .wfn file

  • virial – The virial read in from the bottom of the .wfn file

Note

Since the .wfn file is written out by Gaussian, it does not need to be modified when writing out, except that the method used must be added so that AIMAll can use the correct method. Otherwise AIMAll assumes Hartree-Fock was used, which might be wrong.

path: Path | str
property raw_data: Dict[str, float]

Returns the raw data associated with the current object.

Returns:

A dictionary of the raw data, with string keys and float values

Return type:

dict

class WFX(path: Path | str, method: str | None = None)

Bases: HasAtoms, HasData, ReadFile

Wraps around a .wfx file that is the output of Gaussian. The .wfx file is an output file, so it does not have a write method.

Parameters:
  • path – Path object or string to the .wfx file

  • atoms – an Atoms instance which is read in from the top of the .wfn file. Note that the units of the .wfn file are in Bohr.

  • method – The method (eg. B3LYP) which was used in the Gaussian calculation that created the .wfn file. The method is not initially written to the .wfn file by Gaussian, but it is necessary to add it to the .wfn file because AIMAll does not automatically determine the method itself, so it can lead to wrong IQA/multipole moments results. To make sure AIMAll results are correct, the method is a required argument.

Variables:
  • mol_orbitals – The number of molecular orbitals to be read in from the .wfn file.

  • primitives – The number of primitives to be read in from the .wfn file.

  • nuclei – The number of nuclei in the system to be read in from the .wfn file.

  • energy – The molecular energy read in from the bottom of the .wfn file

  • virial – The virial read in from the bottom of the .wfn file

Note

Since the wfn file is written out by Gaussian, we do not really have to modify it when writing out except we need to add the method used, so that AIMALL can use the correct method. Otherwise AIMAll assumes Hartree-Fock was used, which might be wrong.

path: Path | str
property properties: Dict[str, float]
class XYZ(path: Path | str, atoms: Atoms | None = None)

Bases: HasAtoms, ReadFile, WriteFile, File

A class which wraps around a .xyz file that is contained in each PointDirectory. This .xyz file should always be there and it is used to write out .gjf files. Each instance of XYZ only has one geometry. If there is a need to read a .xyz file that contains multiple geometries (i.e. a trajectory file), then use the Trajectory class.

Parameters:
  • path – The path to an .xyz file

  • atoms – Optional list of Atoms which can be used to construct a .xyz file. If a list of atoms is passed, then a new xyz file with the given Atoms will be written to the given Path.

path: Path | str