ichor.core.files package
Subpackages
Submodules
ichor.core.files.directory module
- class AnnotatedDirectory(path: Path | str)
Bases:
Directory, ABC
Abstract class for adding a parser to a Directory that has annotated files (such as GJF, Int, WFN). For an example, see the PointDirectory class.
Note
If multiple files with the same extension are found, they are stored in a list, so accessing an attribute may return a list rather than a single object.
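The note above can be illustrated with a small standalone sketch. It does not use ICHOR itself: `group_by_suffix` is a hypothetical helper that mimics the described behaviour, keeping a single path when only one file has a given extension and switching to a list when several do.

```python
from pathlib import Path
from typing import Dict, List, Union


def group_by_suffix(paths: List[Path]) -> Dict[str, Union[Path, List[Path]]]:
    """Hypothetical helper: map each suffix to one path, or to a list if repeated."""
    grouped: Dict[str, Union[Path, List[Path]]] = {}
    for p in paths:
        key = p.suffix.lstrip(".")
        if key not in grouped:
            grouped[key] = p                      # first file with this suffix
        elif isinstance(grouped[key], list):
            grouped[key].append(p)                # third or later file
        else:
            grouped[key] = [grouped[key], p]      # second file: switch to a list
    return grouped


found = group_by_suffix([Path("a.gjf"), Path("a.wfn"), Path("b.wfn")])
```

Code that consumes such attributes therefore needs to handle both the single-object and the list case.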
- contents = None
- property directories: List[Directory]
Return all objects which are contained in the AnnotatedDirectory instance and that subclass from Directory class.
- dirtypes
Returns a dictionary of key: value pairs where the keys are the attributes and the values are the classes these attributes are going to be set to. These classes all subclass the Directory class. For example {'ints': INTs}.
- property files: List[File]
Return all objects which are contained in the AnnotatedDirectory instance and that subclass from File class.
- filetypes
Returns a dictionary of key: value pairs where the keys are the attributes and the values are the classes these attributes are going to be set to. These classes all subclass the File class. For example {'gjf': GJF, 'wfn': WFN}.
- path: Path | str
- property path_objects: List[PathObject]
Returns a list of PathObjects corresponding to files and directories that are in the instance of AnnotatedDirectory.
- pathtypes
- property type_to_contents: dict
Returns a dictionary containing the classes as keys and the attributes as values. This reverses the self.contents attribute.
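As a minimal sketch of what reversing the contents mapping means, assuming contents maps attribute names to classes as in the filetypes example: the `GJF` and `WFN` classes below are empty stand-ins, not the ICHOR classes.

```python
class GJF:  # stand-in class, not the ICHOR GJF
    pass


class WFN:  # stand-in class, not the ICHOR WFN
    pass


# attribute name -> class, in the style described for `contents`
contents = {"gjf": GJF, "wfn": WFN}

# class -> attribute name, i.e. the reversed mapping
type_to_contents = {cls: attr for attr, cls in contents.items()}
```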
- class Directory(path: Path | str)
Bases:
PathObject, ABC
A class that implements helper methods for working with directories (which are stored on a hard drive).
- Parameters:
path – The path to a directory
- classmethod check_path(path: Path) bool
Implement if the path of the directory needs to be checked if it contains something specific
- iterdir()
alias to __iter__ in case child object overrides __iter__
- mkdir()
Make an empty directory at the location of the path attribute.
- move(dst: Path)
Move a directory object to a new location (a new path). Modifies the path attribute and moves the contents on disk.
- Parameters:
dst – The new path of the directory
- property name
- property name_without_suffix
- path: Path | str
ichor.core.files.file module
- class File(path: Path | str)
Bases:
PathObject, ABC
Abstract Base Class for any type of file that is used by ICHOR.
- block()
Blocks a file from being read. Contents of the file cannot be read.
- classmethod check_path(path: str | Path) bool
Checks that the suffix of the given path matches the filetype associated with the class that subclasses from File.
- Parameters:
path – A Path object to check
- Returns:
True if the Path object has the same suffix as the class filetype, False otherwise
- classmethod get_filetype() str
Returns a filetype for the particular kind of file
- Returns:
A string containing the suffix of the file (the filetype)
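The relationship between check_path and get_filetype can be sketched as follows. This is an illustration under assumptions, not the ICHOR implementation: `SuffixCheckedFile` and its `_filetype` attribute are hypothetical names, and only the final suffix of the path is compared.

```python
from pathlib import Path
from typing import Union


class SuffixCheckedFile:
    """Stand-in for a File subclass; `_filetype` is an assumed attribute name."""

    _filetype = ".xyz"

    @classmethod
    def get_filetype(cls) -> str:
        # The suffix (including the leading dot) that this class handles.
        return cls._filetype

    @classmethod
    def check_path(cls, path: Union[str, Path]) -> bool:
        # Compare only the final suffix of the path with the class filetype.
        return Path(path).suffix == cls.get_filetype()


is_xyz = SuffixCheckedFile.check_path("water.xyz")
is_gjf = SuffixCheckedFile.check_path(Path("water.gjf"))
```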
- move(dst)
Move the file to a new destination.
- Parameters:
dst – The new path to the file. If a directory, the file is moved inside the directory.
- path: Path | str
- unblock()
Unblocks a blocked file.
- class FileContentsType
Bases:
NoStr
A class whose instance is used for class attributes that are read in from a file. If a class attribute is of FileContents type, then we read the file and store the read-in value. This class allows for lazily reading files, i.e. files are not read when an instance of a File (or its subclasses) is made, but only when attributes of that instance (which are FileContents) are accessed.
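The lazy-reading idea described above can be sketched with a sentinel object and a property. Everything here is a stand-in written for illustration (`UNREAD`, `LazyTextFile`, `read_count` are hypothetical names); ICHOR's own mechanism may differ in detail.

```python
import tempfile
from pathlib import Path

UNREAD = object()  # sentinel playing the role described for FileContents


class LazyTextFile:
    """Stand-in class: reads the file only when `text` is first accessed."""

    def __init__(self, path):
        self.path = Path(path)
        self._text = UNREAD
        self.read_count = 0

    @property
    def text(self) -> str:
        if self._text is UNREAD:          # first access triggers the read
            self._text = self.path.read_text()
            self.read_count += 1
        return self._text


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "example.txt"
    target.write_text("hello")
    lazy = LazyTextFile(target)           # nothing is read yet
    reads_before = lazy.read_count
    first = lazy.text                     # the file is read here
    second = lazy.text                    # cached value, no second read
    reads_after = lazy.read_count
```

Creating many such objects is cheap because the disk is only touched for the attributes that are actually used.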
- exception FileReadError
Bases:
Exception
- class FileState(value)
Bases:
Enum
An enum that is used to make it easier to check the current file state. Blocked is actually not used currently.
- Blocked = -1
- Read = 3
- Reading = 2
- Unread = 1
- exception FileWriteError
Bases:
Exception
- class ReadFile(path: Path | str)
Bases:
File, ABC
- path: Path | str
- read(*args, **kwargs)
Read the contents of the file. Depending on the type of file, different parts will be read in.
Note
Only files which exist on disk can be read from. Otherwise, nothing will be read in.
- class WriteFile(path: Path | str)
Bases:
File, ABC
- path: Path | str
- write(path: str | Path | None = None, *args, **kwargs)
This write method should only be called if no other write method exists. A write method is implemented for files that we typically write out (such as .xyz or .gjf files). But other files (which are outputs of a program, such as .wfn, and .int), we only need to read and do not have to write out ourselves.
ichor.core.files.file_data module
- class HasAtoms
Bases:
ABC
Abstract base class for classes which either have a property or attribute of atoms that gives back an Atoms instance.
- C_matrix_dict(system_alf: List[ALF]) Dict[str, ndarray]
Returns a dictionary of key (atom name), value (C matrix np array) for every atom
- alf(alf_calculator: Callable[[...], ALF], *args, **kwargs) List[ALF]
Returns the Atomic Local Frame (ALF) for all Atom instances that are held in Atoms e.g. [[0,1,2],[1,0,2], [2,0,1]]
- alf_dict(alf_calculator: Callable[[...], ALF], *args, **kwargs) Dict[str, ALF]
Returns a dictionary mapping each atom name to its ALF instance (atomic local frame indices, 0-indexed).
- alf_list(alf_calculator: Callable[[...], ALF], *args, **kwargs) List[List[int]]
Returns a list of lists with the atomic local frame indices for every atom (0-indexed).
- property atom_names: List[str]
- center_geometry_on_atom_and_write_xyz(central_atom_alf: ALF, central_atom_name: str, fname: str | Path | None = None)
Centers all geometries (from a Trajectory or PointsDirectory instance) onto a central atom and then writes out a new xyz file with all geometries centered on that atom. This is essentially what the ALFVisualizer application (ALFi) does. The features for the central atom are calculated, after which they are converted back into xyz coordinates (thus all geometries are now centered on the given central atom).
- Parameters:
feature_calculator – Function which calculates features
central_atom_name – the name of the central atom to center all geometries on. Eg. O1
fname – Optional file name in which to save the rotated geometries.
kwargs – Key word arguments to pass to calculator function
- connectivity(connectivity_calculator: Callable[[...], ndarray]) ndarray
Return the connectivity matrix (n_atoms x n_atoms) for the given Atoms instance.
- Returns:
type: np.ndarray of shape n_atoms x n_atoms
- property coordinates: ndarray
- features(feature_calculator: Callable, *args, is_atomic=True, **kwargs) ndarray
- features_dict(feature_calculator: Callable[[...], ndarray], *args, **kwargs) dict
Returns the features in a dictionary for this Atoms instance, corresponding to the features of each Atom instance held in this Atoms instance. Features are calculated in the Atom class and concatenated to a 2d array here.
e.g. {“C1”: np.array, “H2”: np.array}
- property natoms: int
- property types_extended: List[str]
- class HasData
Bases:
ABC
Class used to describe a file containing properties/data for a particular geometry
- property data_names: List[str]
Returns a list of strings corresponding to data names that the object should have. These names can be used as keys in raw_data or processed_data to obtain values. Note that values might be other dictionaries.
- processed_data(processing_func, *args, **kwargs) dict
Processes the data in some way, given any positional and keyword arguments, and returns a dictionary of the processed data.
- abstract property raw_data: dict
Returns the raw data associated with the current object.
- Returns:
_description_
- Return type:
dict
ichor.core.files.mol2 module
- class AtomType(value)
Bases:
Enum
An enumeration.
- Al = 'Al'
- Br = 'Br'
- C1 = 'C.1'
- C2 = 'C.2'
- C3 = 'C.3'
- CAr = 'C.ar'
- CCat = 'C.cat'
- Ca = 'Ca'
- Cl = 'Cl'
- CoOH = 'Co.oh'
- Dummy = 'Du'
- F = 'F'
- H = 'H'
- HSPC = 'H.spc'
- HT3P = 'H.t3p'
- I = 'I'
- K = 'K'
- Li = 'Li'
- LonePair = 'LP'
- N1 = 'N.1'
- N2 = 'N.2'
- N3 = 'N.3'
- N4 = 'N.4'
- NAm = 'N.am'
- NAr = 'N.ar'
- NP13 = 'N.p13'
- Na = 'Na'
- O2 = 'O.2'
- O3 = 'O.3'
- OCO2 = 'O.co2'
- OSPC = 'O.spc'
- OT3P = 'O.t3p'
- P3 = 'P.3'
- RuOH = 'Ru.oh'
- S2 = 'S.2'
- S3 = 'S.3'
- SO = 'S.o'
- SO2 = 'S.o2'
- Si = 'Si'
- class BondType(value)
Bases:
Enum
An enumeration.
- Amide = 'am'
- Aromatic = 'ar'
- Double = '2'
- Single = '1'
- Triple = '3'
- Unspecified = 'un'
- class ChargeType(value)
Bases:
Enum
An enumeration.
- Ampac = 'AMPAC_CHARGES'
- DelRe = 'DEL_RE'
- Dict = 'DICT_CHARGES'
- GastHuck = 'GAST_HUCK'
- Gasteiger = 'GASTEIGER'
- Gaussian = 'GAUSS80_CHARGES'
- Huckel = 'HUCKEL'
- MMFF94 = 'MMFF94_CHARGES'
- Mulliken = 'MULLIKEN_CHARGES'
- NoCharges = 'NO_CHARGES'
- Pullman = 'PULLMAN'
- User = 'USER_CHARGES'
- class Mol2Atom(ty: str, x: float, y: float, z: float, index: int | None = None, parent: Atoms | None = None, units: AtomicDistance = AtomicDistance.Angstroms, atom_type: AtomType | None = None)
Bases:
Atom
- property atom_type
- property unpaired_electrons
- property valence
Returns the valence of the Atom instance
- Returns:
the valence of the atom (as defined by the atom type)
- Return type:
int
- class MoleculeType(value)
Bases:
Enum
An enumeration.
- BioPolymer = 'BIOPOLYMER'
- NucleicAcid = 'NUCLEIC_ACID'
- Protein = 'PROTEIN'
- Saccharide = 'SACCHARIDE'
- Small = 'SMALL'
- class SybylStatus(value)
Bases:
Enum
An enumeration.
- Altered = 'altered'
- Analyzed = 'analyzed'
- InvalidCharges = 'invalid_charges'
- NONE = '****'
- RefAngle = 'ref_angle'
- Substituted = 'substituted'
- System = 'system'
- bonds_of_type(atom, parent, bond_type)
- get_nbonds(atom)
- get_ring(atom)
- n_bonds_of_type(atom, parent, bond_type)
- other_atom(atom: Atom, atom1: Atom, atom2: Atom) Atom
Return the other atom, i.e. the one that is not 'atom', given two atoms: 'atom1' and 'atom2'.
- other_atom_bonds(atom, atom1, atom2) List[Tuple[int, int]]
ichor.core.files.optional_content module
ichor.core.files.path_object module
- class PathObject(path: Path | str)
Bases:
ABC, object
An abstract base class that is used for anything that has a path (i.e. files or directories)
- classmethod check_path(path: Path) bool
- delete()
Delete the Path object from disk.
- exists() bool
Determines if the path points to an existing directory or file on the storage drive.
- abstract move(dst) None
An abstract method that subclasses need to implement. This is used to move files around.
- path: Path | str
- remove()
Alias for delete
- property stem
Returns the stem of the file (without suffix, if one is present)
ichor.core.files.point_directory module
- class PointDirectory(path: Path | str)
Bases:
AnnotatedDirectory, HasAtoms, HasData
A helper class that wraps around ONE directory which contains ONE point (one molecular geometry).
- Parameters:
path – Path to a directory which contains ONE point.
- atoms_from_file(file_with_atoms: HasAtoms) Atoms
Given a class (which is in the contents of the directory), obtain the Atoms instance from that specific file which is wrapped by the class.
- Parameters:
file_with_atoms – file class which subclasses from HasAtoms and has a .atoms attribute
- Raises:
ichor.core.atoms.AtomsNotFoundError – If file class does not contain atoms
- Returns:
_description_
- Return type:
Atoms
- classmethod check_path(path: Path) bool
Makes sure that path is PointDirectory-like
- contents = {'aim': <class 'ichor.core.files.aimall.aim.Aim'>, 'gaussian_output': <class 'ichor.core.files.gaussian.gaussian_output.GaussianOutput'>, 'gjf': <class 'ichor.core.files.gaussian.gjf.GJF'>, 'ints': <class 'ichor.core.files.aimall.ints.IntDirectory'>, 'orca_input': <class 'ichor.core.files.orca.orca_input.OrcaInput'>, 'orca_output': <class 'ichor.core.files.orca.orca_output.OrcaOutput'>, 'wfn': <class 'ichor.core.files.gaussian.wfn.WFN'>, 'xtb': <class 'ichor.core.files.ase.opt.XTB'>, 'xyz': <class 'ichor.core.files.xyz.xyz.XYZ'>}
- features(feature_calculator: Callable, *args, is_atomic=True, **kwargs)
Returns the features for this Atoms instance, corresponding to the features of each Atom instance held in this Atoms instance. Features are calculated in the Atom class and concatenated to a 2d array here.
The array shape is n_atoms x n_features (3*n_atoms - 6)
- Parameters:
is_atomic – whether the feature calculator calculates features for individual atoms or for the whole geometry.
args – positional arguments to pass to feature calculator
kwargs – key word arguments to pass to feature calculator
- Returns:
The feature matrix of this Atoms instance
- Return type:
np.ndarray of shape n_atoms x n_features (3N-6)
- path: Path | str
- property raw_data: dict
Returns the raw data associated with the current object.
- Returns:
_description_
- Return type:
dict
ichor.core.files.points_directory module
- class PointsDirectory(path: Path | str, needs_parsing=True, *args, **kwargs)
Bases:
ListOfAtoms, Directory, HasData
A helper class that wraps around a directory which contains points (molecules with various geometries). Calling Directory.__init__(self, path) will call the parse method of PointsDirectory instead of Directory (because Python looks for the method in this class first before looking at parent class methods). A typical ICHOR directory that contains points will have a structure like this:
-TRAINING_SET
    -- SYSTEM_NAME000
    -- SYSTEM_NAME001
    -- SYSTEM_NAME002
    ........
Each of the subdirectories contains Gaussian files (such as .gjf), as well as an atomic_files directory, which then contains the AIMALL files. A PointsDirectory will wrap around the whole TRAINING_SET directory (which contains multiple points), while a PointDirectory will wrap around a SYSTEM_NAME00… folder (which only contains information about 1 point).
- Parameters:
path – Path to a directory which contains points. This path is typically the path to the training set, sample pool, etc.
needs_parsing – By default, every PointsDirectory is parsed when the instance is created, in order to create PointDirectory instances of each inner directory (but the contents of the files are not read). If, however, a slice of an already created PointsDirectory is made, the contents of the directories do not need to be parsed again, so needs_parsing would be False
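The directory layout described for this class can be reproduced and walked with the standard library alone. The sketch below builds a throwaway TRAINING_SET tree and lists one entry per point directory; it only illustrates the layout and does not show how PointsDirectory parses it. The file names inside each point directory are assumptions.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    training_set = Path(tmp) / "TRAINING_SET"
    for i in range(3):
        point = training_set / f"SYSTEM_NAME{i:03d}"
        point.mkdir(parents=True)
        # Assumed file name: one .gjf per point, named after the directory.
        (point / f"SYSTEM_NAME{i:03d}.gjf").touch()

    # One entry per point directory, in a stable (sorted) order.
    point_dirs = sorted(p for p in training_set.iterdir() if p.is_dir())
    point_names = [p.name for p in point_dirs]
    gjf_per_point = [len(list(p.glob("*.gjf"))) for p in point_dirs]
```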
- alf(alf_calculator: Callable[[...], ALF], *args, **kwargs) List[ALF]
Returns the Atomic Local Frame (ALF) for all Atom instances that are held in Atoms, e.g. [[0,1,2],[1,0,2], [2,0,1]].
- Parameters:
args – positional arguments to pass to alf calculator
kwargs – key word arguments to pass to alf calculator
- alf_dict(alf_calculator: Callable[[...], ALF], *args, **kwargs) Dict[str, ALF]
Returns a dictionary of key: atom_name, value: ALF instance (containing central atom index, x-axis idx, xy-plane idx), e.g. {"O1":ALF(0,1,2),"H2":ALF(1,0,2), "H3":ALF(2,0,1)}.
- Parameters:
args – positional arguments to pass to alf calculator
kwargs – key word arguments to pass to alf calculator
- property atom_names: List[str]
Return the atom names from the first timestep. Assumes that all timesteps have the same number of atoms/atom names.
- classmethod check_path(path: Path) bool
Makes sure that path is PointsDirectory-like
- connectivity(connectivity_calculator: Callable[[...], ndarray]) ndarray
Return the connectivity matrix (n_atoms x n_atoms) for the given Atoms instance.
- Returns:
type: np.ndarray of shape n_atoms x n_atoms
- property coordinates: ndarray
Returns the xyz coordinates of all atoms for all timesteps. Shape n_timesteps x n_atoms x 3
- coordinates_to_xyz(fname: str | Path | None = PosixPath('system_to_xyz.xyz'), step: int | None = 1)
Write a new .xyz file that contains the timestep i, as well as the coordinates of the atoms for that timestep.
- Parameters:
fname – The file name to which to write the timesteps/coordinates
step – Write coordinates for every n^th step. Default is 1, so writes coordinates for every step
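A minimal sketch of writing every n-th geometry in XYZ format is given below. It is not the ICHOR implementation: `frames_to_xyz` is a hypothetical function, and the exact text of the comment line (`i = ...`) and the number formatting are assumptions. The general XYZ layout (atom count, comment line, one line per atom) is standard.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[Tuple[str, float, float, float]]  # (element, x, y, z) per atom


def frames_to_xyz(frames: List[Frame], step: int = 1) -> str:
    """Return XYZ-format text for every `step`-th frame, with the index in the comment line."""
    lines = []
    for i in range(0, len(frames), step):
        frame = frames[i]
        lines.append(str(len(frame)))          # first line: number of atoms
        lines.append(f"i = {i}")               # second line: free-form comment
        for element, x, y, z in frame:
            lines.append(f"{element} {x:.8f} {y:.8f} {z:.8f}")
    return "\n".join(lines) + "\n"


# Four toy frames of a two-atom system; step=2 keeps frames 0 and 2.
frames = [[("O", 0.0, 0.0, float(t)), ("H", 0.0, 0.0, t + 1.0)] for t in range(4)]
xyz_text = frames_to_xyz(frames, step=2)
```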
- coordinates_to_xyz_with_errors(models_path: str | Path, fname: str | Path | None = PosixPath('xyz_with_properties_error.xyz'), step: int | None = 1)
Write a new .xyz file that contains the timestep i, as well as the coordinates of the atoms for that timestep. The comment lines in the xyz file contain the absolute prediction errors. These can then be plotted in ALFVisualizer as a cmap to see where poor predictions happen.
- Parameters:
models_path – The model path to one atom.
property – The property for which to predict for and get errors (iqa or any multipole moment)
fname – The file name to which to write the timesteps/coordinates
step – Write coordinates for every n^th step. Default is 1, so writes coordinates for every step
- features_with_properties_to_csv(system_alf: Dict[str, ALF], str_to_append_to_fname: str = '_features_with_properties.csv', atom_names: List[str] | None = None, property_types: List[str] | None = None, **kwargs)
Calculates ALF features and properties (with multipole moments rotated).
- Parameters:
str_to_append_to_fname – a string that is appended to the default file name (which is name_of_atom.csv), defaults to '_features_with_properties.csv'
atom_names – A list of atom names for which to write out csv files with properties. If None, then writes out files for all atoms in the system, defaults to None
property_types – A list of property names (iqa, multipole names) for which to write columns. If None, then writes out columns for all properties, defaults to None
args – positional arguments to pass to calculator function
kwargs – key word arguments to be passed to the feature calculator function
- Raises:
TypeError – This method only works for PointsDirectory instances because it needs access to AIMALL information. Does not work for Trajectory instances.
- features_with_wfn_energy_and_dE_df_to_csv(alf_list: List[ALF], central_atom_idx: int, str_to_append_to_fname: str = '_features_with_dE_df.csv', **kwargs)
Writes out a csv file containing wfn energy and FORCEs calculated for every feature. Note that the forces (dE/df_i) are the negative of the PES gradient, so for machine learning, the negative of these forces needs to be taken to add gradient information into GP models.
- Parameters:
alf_list – A list of ALF instances containing alf info
central_atom_idx (int) – The central atom which to center the alf on and for which dE/df will be calculated
str_to_append_to_fname (str, optional) – A string that is appended to the output file name, defaults to '_features_with_dE_df.csv'
- classmethod from_trajectory(trajectory_path: str | Path, system_name: str | None = None, every=1, center=True) PointsDirectory
Generate a PointsDirectory-type structure directory from a trajectory (.xyz) file
- Parameters:
trajectory_path – A str or Path to a .xyz file containing geometries
system_name – The name of the chemical system. This is going to be the name of the directory which will be created.
center – Whether to center the geometries on the centroid of the system. This is useful to prevent the molecule from translating in 3D space (and prevents issues with WFN files, where a very large x,y,z value (over 100) for the coordinates leads to **** being written in the .wfn file…)
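The centering step mentioned for the center parameter can be sketched in plain Python. This assumes the centroid is the unweighted mean of the atomic positions; `center_on_centroid` is a hypothetical helper, not an ICHOR function.

```python
from typing import List, Tuple

Coord = Tuple[float, float, float]


def center_on_centroid(coords: List[Coord]) -> List[Coord]:
    """Translate coordinates so that their unweighted centroid sits at the origin."""
    n = len(coords)
    cx = sum(c[0] for c in coords) / n
    cy = sum(c[1] for c in coords) / n
    cz = sum(c[2] for c in coords) / n
    return [(x - cx, y - cy, z - cz) for x, y, z in coords]


# Two atoms far from the origin end up symmetric around it.
shifted = center_on_centroid([(100.0, 0.0, 0.0), (102.0, 0.0, 0.0)])
```

Keeping coordinates near the origin in this way avoids the large absolute values that the description above associates with overflowing fields in .wfn files.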
- property natoms: int
Returns the number of atoms in the first timestep. Each timestep should have the same number of atoms.
- path: Path | str
- processed_data(processing_func, *args, **kwargs) dict
Processes the data in some way, given any positional and keyword arguments, and returns a dictionary of the processed data. Note that the processing function can be any callable, i.e. closure, class, etc.
Note
The processing function must act on one PointDirectory.
- Parameters:
processing_func – Callable which is going to process ONE PointDirectory
args – Positional arguments to pass to processing func
- Returns:
A dictionary of processed data. Keys of the dictionary are the stem of each PointDirectory contained inside this PointsDirectory instance.
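The pattern described here (apply one callable to each point and key the results by the point's stem) can be sketched as below. Plain Path objects stand in for PointDirectory instances, and `name_length` is a toy processing function invented for the example.

```python
from pathlib import Path
from typing import Any, Callable, Dict, List


def processed_data(points: List[Path], processing_func: Callable[..., Any],
                   *args, **kwargs) -> Dict[str, Any]:
    """Apply `processing_func` to each point and key the results by the point's stem."""
    return {p.stem: processing_func(p, *args, **kwargs) for p in points}


def name_length(point: Path, offset: int = 0) -> int:
    # Toy processing function acting on ONE point.
    return len(point.name) + offset


points = [Path("TRAINING_SET/WATER0000"), Path("TRAINING_SET/WATER0001")]
result = processed_data(points, name_length, offset=1)
```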
- properties(system_alf: List[ALF] | None = None, specific_property: str | None = None)
Get properties contained in the PointDirectory. If no system alf is passed in, an automatic process to get C matrices is started.
- Parameters:
system_alf – Optional list of ALF instances that can be passed in to use a specific alf instead of automatically trying to compute it.
specific_property – return only a specific key from the returned dictionary
- property raw_data: dict
Returns all raw data associated with the PointsDirectory instance. The key is the point name (of a PointDirectory instance) and value is the raw data associated with the one point.
- Returns:
A dictionary of raw data. Keys of the dictionary are the stem of each PointDirectory contained inside this PointsDirectory instance.
- property total_energy
Returns np array of wfn energies of all points
- Returns:
np array of total energy (in Hartree) for all points
- property types: List[str]
Returns the atom elements for atoms, assumes each timesteps has the same atoms. Removes duplicates.
- property types_extended: List[str]
Returns the atom elements for atoms, assumes each timesteps has the same atoms. Removes duplicates.
- property wfn_energy: ndarray
Returns np array of wfn energies of all points
- Returns:
np array of total energy (in Hartree) for all points
- write_to_json_database(root_path: str | Path | None = None, datafunction: Callable = <function get_data_for_point>, npoints_per_json=500, print_missing_data=True, indent: int = 2, separators=(', ', ':')) Path
Write out important information from a PointsDirectory instance to a json file.
- Parameters:
root_path – Name of the directory which will contain the json database. This is a directory which contains multiple directories inside. Each directory inside is one PointsDirectory. It is implemented like this so that, when used for multiple PointsDirectory instances at once, the data for each PointsDirectory is written in a separate folder
datafunction – A function used to get all data for a single point. This data is going to get written to the json file.
npoints_per_json – Maximum number of geometries to write to one json file. This is done so that the individual files do not become very large.
print_missing_data – Whether to print out any missing data from each PointDirectory contained in self, defaults to True
indent – integer representing number of spaces to indent, defaults to 2
separators – Separators used for each entry, default (“,”, “:”)
- Returns:
The path to the written json file
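The role of npoints_per_json, indent and separators can be illustrated with the standard json module. This is a sketch under assumptions: `chunk_points` is a hypothetical helper, the per-point dictionaries are toy data, and nothing is written to disk.

```python
import json
from typing import Any, Dict, List


def chunk_points(points: List[Dict[str, Any]], npoints_per_json: int = 500) -> List[str]:
    """Split `points` into chunks and serialize each chunk as one JSON document."""
    documents = []
    for start in range(0, len(points), npoints_per_json):
        chunk = points[start:start + npoints_per_json]
        # Same indent and separators as the defaults in the signature above.
        documents.append(json.dumps(chunk, indent=2, separators=(", ", ":")))
    return documents


toy_points = [{"name": f"POINT{i:04d}", "energy": -76.0 - i} for i in range(5)]
docs = chunk_points(toy_points, npoints_per_json=2)
chunk_sizes = [len(json.loads(d)) for d in docs]
```

Five points with at most two per document give three documents, the last one holding the remainder.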
- write_to_sqlite3_database(db_path: str | Path | None = None, echo=False, print_missing_data=True) Path
Write out important information from a PointsDirectory instance to an SQLite3 database.
- Parameters:
db_path – database to write to
echo – Whether to print out SQL queries from SQL Alchemy, defaults to False
print_missing_data – Whether to print out any missing data from each PointDirectory contained in self, defaults to True
- Returns:
The path to the written SQL database
ichor.core.files.points_directory_parent module
- class PointsDirectoryParent(path: Path | str)
Bases:
list, Directory
Should wrap around multiple PointsDirectory instances.
- path: Path | str
- processed_data(processing_func, *args, **kwargs) dict
Processes the data in some way, given any positional and keyword arguments, and returns a dictionary of the processed data. Note that the processing function can be any callable, i.e. closure, class, etc.
Note
The processing function must act on one PointDirectory.
- Parameters:
processing_func – Callable which is going to process ONE PointDirectory
args – Positional arguments to pass to processing func
- Returns:
A dictionary of processed data. Keys of the dictionary are the stem of each PointsDirectory contained inside this PointsDirectoryParent instance.
- property raw_data: dict
Returns all raw data associated with the PointsDirectoryParent instance. The key is the points directory name (of a PointsDirectory instance) and value is another dictionary.
- Returns:
A dictionary of raw data. Keys of the dictionary are the stem of each PointsDirectory contained inside this PointsDirectoryParent instance.
- write_to_json_database(root_name: str | None = None, npoints_per_json=500, print_missing_data=True, indent: int = 2, separators=(',', ':')) List[Path]
Makes a database from multiple PointsDirectory-like directories which are contained in this PointsDirectoryParent
- Parameters:
root_name – The name of the database. If not selected, uses the name of the current PointsDirectoryParent, defaults to None
npoints_per_json – Maximum number of points written to each json file, defaults to 500
print_missing_data – Whether or not to print missing data, defaults to True
indent – json file indent, defaults to 2
separators – json file separators, defaults to (“,”, “:”)
- write_to_sqlite3_database(db_path: str | Path | None = None, echo=False, print_missing_data=True) Path
Write out important information from a PointsDirectory instance to an SQLite3 database. All PointsDirectory-like directories contained inside will be written to the same database.
- Parameters:
db_path – database to write to
echo – Whether to print out SQL queries from SQL Alchemy, defaults to False
print_missing_data – Whether to print out any missing data from each PointDirectory contained in self, defaults to True
- Returns:
The path to the written SQL database
ichor.core.files.qcp module
Module contents
- class AbInt(path: str | Path)
- classmethod check_path(path: Path) bool
Checks that the suffix of the given path matches the filetype associated with the class that subclasses from File.
- Parameters:
path – A Path object to check
- Returns:
True if the Path object has the same suffix as the class filetype, False otherwise
- property e_inter
- path: Path | str
- property raw_data: dict
Returns the raw data associated with the current object.
- Returns:
_description_
- Return type:
dict
- class Aim(path: Path)
Bases:
ReadFile, dict
Class which wraps around an AIMAll output file, where settings and timings are written out to. The .int files are parsed separately in the INT/INTs classes.
- path: Path | str
- class DlPolyConfig(system_name: str, trajectory: Trajectory, path: Path | str = PosixPath('CONFIG'), cell_size: float = 50.0, comment_line='Frame : 1\n')
Bases:
WriteFile
Write out a DLPoly CONFIG file. The name of the file needs to be CONFIG, so DL POLY knows to use it.
- Parameters:
system_name – the name of the chemical system
trajectory – a Trajectory instance containing the geometries that are going to be written to the CONFIG file. Each timestep in the trajectory is an Atoms instance.
path – The path to the CONFIG file, defaults to Path(‘CONFIG’)
cell_size – The size of the box, float
comment_line – The very first line in the CONFIG file. Must be below 72 characters
Note
ALL of the timesteps in the Trajectory will be written to one CONFIG file. Each timestep groups geometries which should be represented by a GP model. For example, if each timestep is only one molecule, then it means it is a monomer model and the labels of the atoms in the CONFIG will show that. If each timestep is two molecules, it means it is a dimer model, so then the labels in the CONFIG file will make sure that two molecules which should be represented by one GP model have the correct atom labeling in the CONFIG file.
- path: Path | str
- class DlPolyControl(system_name: str, path: Path = PosixPath('CONTROL'), ensemble: str = 'nvt', thermostat: str = 'hoover', thermostat_settings: list = [0.04], temperature: int = 1, timestep=0.001, steps=500, scale=100, cutoff=8.0, rvwd=8.0, dump=1000, trajectory_i=0, trajectory_j=1, trajectory_k=0, print_every=1, stats_every=1, job_time=10000000, close_time=20000)
Bases:
WriteFile
Write out a DLPoly CONTROL file. The name of the file needs to be CONTROL, so DL POLY knows to use it. The default CONTROL file is made to be used for geometry optimizations at very low temperatures. Settings must be changed to write out a file for water box simulations, for example.
- path: Path | str
- class DlPolyFFLUX(path: Path | str = PosixPath('FFLUX'))
Bases:
ReadFile
Reads the FFLUX file from FFLUX.
- Parameters:
path – Path to FFLUX file
- Variables:
df – A pandas dataframe storing all the data in the FFLUX file.
sum_iqa_energy – The total energies array of shape ntimesteps
vdw_energy – The Van der Waals energies of each timestep. Only computed if there are multiple molecules. Otherwise they will be 0.0
electrostatic_energy – The electrostatic energies of each timestep. Only computed if there are multiple molecules. Otherwise they will be 0.0
- classmethod check_path(path: Path) bool
Checks that the suffix of the given path matches the filetype associated with the class that subclasses from File.
- Parameters:
path – A Path object to check
- Returns:
True if the Path object has the same suffix as the class filetype, False otherwise
- property delta_between_timesteps: List[float]
Calculates the delta energy (in kJ mol-1) between each pair of timesteps. Useful for checking convergence of energy when doing optimizations.
- Returns:
List containing the first index (timestep) where the threshold is met as well as the list of differences for all timesteps
- property delta_between_timesteps_kj_mol: List[float]
Calculates the delta energy (in kJ mol-1) between each pair of timesteps. Useful for checking convergence of energy when doing optimizations.
- Returns:
List containing the first index (timestep) where the threshold is met as well as the list of differences for all timesteps
- property electrostatic_energy
- first_index_where_delta_less_than(delta=0.0001) int
Returns first index where the energy between timesteps is below delta (in kJ mol-1)
- Parameters:
delta – The threshold when geometry is converged, defaults to 1e-4 kJ mol-1
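The convergence check described above can be sketched with plain lists. This is an illustration, not the ICHOR code: `abs_deltas` and `first_index_below` are hypothetical helpers, and both the choice to return the index of the difference (rather than of a timestep) and the None result when the threshold is never met are assumptions.

```python
from typing import List, Optional


def abs_deltas(energies: List[float]) -> List[float]:
    """Absolute energy difference between each pair of consecutive timesteps."""
    return [abs(b - a) for a, b in zip(energies, energies[1:])]


def first_index_below(energies: List[float], delta: float = 1e-4) -> Optional[int]:
    """Index of the first consecutive-pair difference below `delta`, or None."""
    for i, d in enumerate(abs_deltas(energies)):
        if d < delta:
            return i
    return None


# Toy energies (units are whatever the caller supplies, e.g. kJ mol-1).
energies = [10.0, 5.0, 4.5, 4.5, 4.5]
deltas = abs_deltas(energies)
idx = first_index_below(energies, delta=0.1)
never = first_index_below([1.0, 2.0], delta=0.1)
```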
- property kinetic_energy
- property ntimesteps
- path: Path | str
- plot_abs_differences(until_converged_energy=True)
- property sum_iqa_energy
- property total_energy
- property total_energy_kj_mol
- property vdw_energy
- class DlPolyField(system_name: str, atoms: Atoms, path: Path | str = PosixPath('FIELD'), nummols=1)
Bases:
WriteFile
- path: Path | str
- class DlPolyHistory(path: Path | None = PosixPath('HISTORY'))
Bases:
Trajectory
DLPOLY HISTORY File
Inherits from Trajectory, as it is a list of Atoms. Builds on the Trajectory class by adding DLPOLY information provided by the HISTORY file.
Warning
Indexing the history as a Python list, i.e. history[1000], is not guaranteed to give the 1000th timestep (0-indexed). This is because sometimes there is binary written to the HISTORY file which messes up some geometries. These geometries are excluded from the read-in data, so indexing as a list might return a different timestep.
To make sure that the exact timestep is returned (useful when you also want to get data from the FFLUX or IQA_ENERGIES/IQA_FORCES file), check the ntimestep attribute of a timestep. This will be correct, even if some geometries are missing.
To get a list of missing timesteps, use the self.removed_timesteps attribute of the DlPolyHistory class.
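The difference between list position and recorded timestep number can be made concrete with a toy example. The `Frame` dataclass and `find_timestep` helper are stand-ins invented for this sketch; only the attribute name ntimestep is taken from the warning above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Frame:
    ntimestep: int  # timestep number recorded in the file


def find_timestep(frames: List[Frame], wanted: int) -> Optional[Frame]:
    """Return the frame whose recorded timestep equals `wanted`, or None if it was dropped."""
    for frame in frames:
        if frame.ntimestep == wanted:
            return frame
    return None


# Timestep 2 was dropped while reading, so list index 2 holds timestep 3.
history = [Frame(0), Frame(1), Frame(3), Frame(4)]
by_index = history[2]
by_number = find_timestep(history, 3)
missing = find_timestep(history, 2)
```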
- classmethod check_path(path: Path) bool
Checks that the suffix of the given path matches the filetype associated with the class that subclasses from File.
- Parameters:
path – A Path object to check
- Returns:
True if the Path object has the same suffix as the class filetype, False otherwise
- path: Path | str
- write_final_geometry_to_xyz(xyz_path: Path)
- write_to_trajectory(path: str = 'TRAJECTORY.xyz')
Writes a trajectory .xyz file from the DL POLY HISTORY file.
- class DlPolyIQAEnergies(path: Path | str = PosixPath('IQA_ENERGIES'))
Bases:
ReadFile
Reads the IQA_ENERGIES file from FFLUX.
- Parameters:
path – Path to IQA_ENERGIES file
- Variables:
natoms – Number of atoms in system
energies – Array of shape ntimesteps x natoms for read energies
- classmethod check_path(path: Path) bool
Checks that the suffix of the given path matches the filetype associated with the class that subclasses from File.
- Parameters:
path – A Path object to check
- Returns:
True if the Path object has the same suffix as the class filetype, False otherwise
- path: Path | str
- class DlPolyIQAForces(path: Path | str = PosixPath('IQA_FORCES'))
Bases:
ReadFile
Reads the IQA_FORCES file from FFLUX.
- Parameters:
path – Path to IQA_FORCES file
- Variables:
forces – The forces array of shape ntimesteps x natoms x 3. Initialized as FileContents prior to file reading.
natoms – Number of atoms in each timestep
- check_forces_less_than_value(value=0.001) ndarray
Checks which timesteps have all forces less than value. The GP models revert to the prior mean when far away from the training data, so the forces on atoms will then be 0.
We can check for that because if the forces are consistently less than the value, then either the simulation has crashed or a minimum has been reached.
- Parameters:
value – Threshold that all forces need to be below
- Returns:
np.ndarray containing the timestep indices for which the condition is true. If len(array) is 0, then the condition is not met for any timestep. Could be useful to check if a geometry is optimized or the simulation crashed.
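A minimal sketch of this kind of check, written with plain lists so that it runs without dependencies. It is not the ichor implementation: the function name is hypothetical, and comparing the absolute value of every force component against the threshold is an assumption.

```python
from typing import List, Sequence

Vector = Sequence[float]


def timesteps_with_small_forces(
    forces: Sequence[Sequence[Vector]], value: float = 0.001
) -> List[int]:
    """Return indices of timesteps where every force component is below `value`.

    `forces` is nested as ntimesteps x natoms x 3. Using the absolute value
    of each component is an assumption made for this sketch.
    """
    indices = []
    for step, atoms in enumerate(forces):
        if all(abs(component) < value for atom in atoms for component in atom):
            indices.append(step)
    return indices


# Toy data: 3 timesteps, 2 atoms, made-up numbers.
forces = [
    [[0.10, -0.20, 0.05], [0.00, 0.30, -0.10]],    # ordinary forces
    [[0.0002, 0.0, -0.0001], [0.0, 0.0005, 0.0]],  # everything tiny
    [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],            # all zero
]
small = timesteps_with_small_forces(forces)
```

An empty result means no timestep satisfies the condition; a long run of consecutive indices at the end of a simulation is the pattern that suggests a crash or a reached minimum.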
- classmethod check_path(path: Path) bool
Checks that the suffix of the given path matches the filetype associated with the class that subclasses from File.
- Parameters:
path – A Path object to check
- Returns:
True if the Path object has the same suffix as the class filetype, False otherwise
- path: Path | str
- class FFLUXDirectory(path: Path | str)
Bases:
AnnotatedDirectory
Reads a FFLUX directory containing the FFLUX, IQA_ENERGIES, IQA_FORCES and HISTORY files.
- Parameters:
path – Path to FFLUX Directory
- contents = {'fflux_file': <class 'ichor.core.files.dl_poly.dl_poly_fflux.DlPolyFFLUX'>, 'history_file': <class 'ichor.core.files.dl_poly.dl_poly_history.DlPolyHistory'>, 'iqa_energies_file': <class 'ichor.core.files.dl_poly.dl_poly_iqa_energies.DlPolyIQAEnergies'>, 'iqa_forces_file': <class 'ichor.core.files.dl_poly.dl_poly_iqa_forces.DlPolyIQAForces'>}
- property coordinates: ndarray
Returns coordinates as array of shape ntimesteps x natoms x 3
- property iqa_energies: ndarray
Returns the individual atom IQA energy array of shape ntimesteps x natoms
- property iqa_forces: ndarray
Returns iqa forces array of shape ntimesteps x natoms x 3
- property natoms: int
Returns number of atoms
- path: Path | str
- property total_iqa_energies: ndarray
Returns total energy array of shape ntimesteps
- class GJF(path: Path | str, link0: List[str] | None = None, print_level: PrintLevel | None = None, method: str | None = None, basis_set: str | None = None, keywords: List[str] | None = None, title: str | None = None, charge: int | None = None, spin_multiplicity: int | None = None, atoms: Atoms | None = None, output_chk: bool = False)
Bases:
ReadFile,WriteFile,HasAtoms
Wraps around a .gjf file that is used as input to Gaussian. See https://gaussian.com/input/ for details. Below is the usual gjf file structure:
    %nproc
    %mem
    # <job_type> <method>/<basis-set> <keywords>
    Title
    0 1
    <atom-name> <todo: add -1 for freeze> <x> <y> <z>
    ...
    extra_details_str (containing basis sets for individual atoms, what to freeze, etc.)
    <wfn-name>
    blank line
    blank line
    blank line
    ...
- Parameters:
path – A string or Path to the .gjf file. If a path is not given, then there is no file to be read, so the user has to write the file contents. If no contents/options are written by the user, they are written as the default values in the write method.
title – A string to be written between the link0 options and the keywords. It can contain any information.
job_type – The job type: an energy, optimization, or frequency calculation
keywords – A list of keywords to be added to the Gaussian keywords line
method – The method to be used by Gaussian (e.g. B3LYP)
basis_set – The basis set to be used by Gaussian (e.g. 6-31+g(d,p))
charge – The charge to be used by Gaussian for the system
multiplicity – The multiplicity to be used by Gaussian for the system.
atoms – An Atoms instance containing a geometry to be written in the .gjf file. This is either read in (if an existing gjf path is given) or an error is thrown when attempting to write the gjf file (because no gjf file or Atoms instance was given)
extra_calculation_details – A list of strings to be added to the bottom of the gjf file (after atoms section containing atom names and coordinates). This is done in order to handle different basis sets for individual atoms, modredundant settings, and other settings that Gaussian handles.
Note
It is up to the user to write the extra_calculation_details settings correctly. ICHOR does NOT check whether these additional settings are going to be read in correctly by Gaussian.
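The file layout described above can be sketched as a small function that assembles the text of a Gaussian input. This is an illustration of the file format only, not the GJF.write implementation: `build_gjf_text`, its defaults (`#p`, `nosymm`, the memory and core counts) and the water geometry are all hypothetical choices made for the example.

```python
from typing import List, Tuple

Atom = Tuple[str, float, float, float]  # element symbol and x, y, z in Angstroms


def build_gjf_text(
    atoms: List[Atom],
    method: str = "B3LYP",
    basis_set: str = "6-31+g(d,p)",
    keywords: Tuple[str, ...] = ("nosymm", "output=wfn"),
    title: str = "water",
    charge: int = 0,
    spin_multiplicity: int = 1,
    nproc: int = 1,
    mem: str = "1GB",
    wfn_name: str = "water.wfn",
) -> str:
    """Assemble link0 lines, route line, title, charge/multiplicity and atoms."""
    lines = [f"%nproc={nproc}", f"%mem={mem}"]
    lines.append(f"#p {method}/{basis_set} " + " ".join(keywords))
    # Blank lines separate the route, title and molecule sections.
    lines += ["", title, "", f"{charge} {spin_multiplicity}"]
    lines += [f"{el} {x:16.8f} {y:16.8f} {z:16.8f}" for el, x, y, z in atoms]
    # With output=wfn, the .wfn file name follows the geometry after a blank line.
    lines += ["", wfn_name, ""]
    return "\n".join(lines) + "\n"


# Illustrative geometry only.
water = [
    ("O", 0.0, 0.0, 0.0),
    ("H", 0.0, 0.757, 0.587),
    ("H", 0.0, -0.757, 0.587),
]
gjf_text = build_gjf_text(water)
```

As with the class itself, nothing here validates the keywords; the text is written exactly as given.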
- add_keyword(keyword: str)
Add a keyword to the Gaussian input keywords
- Parameters:
keyword – A string to add as a keyword
Note
The keyword is not checked internally.
- add_keywords(keywords: List[str])
Add a list of keywords to the Gaussian input keywords
- Parameters:
keywords – A list of keywords
Note
The keywords are not checked internally.
- output_wfn()
Helper method to add ‘output=wfn’ to the GJF keyword list
- path: Path | str
- set_mem(mem: str)
Sets memory for Gaussian job
- Parameters:
mem – string to set as memory
Note
This is not checked internally.
- set_nproc(nproc: int)
Sets the number of processor cores for Gaussian
- Parameters:
nproc – An integer which is the number of cores.
Note
No checks are done for CPU core count.
- class GaussianOutput(path: Path | str)
Bases:
ReadFile,HasAtoms,HasData
Wraps around a .gaussianoutput file that is the output of Gaussian. This file contains coordinates (in Angstroms), forces, as well as molecular multipole moments.
- Parameters:
path – Path object or string to the .gaussianoutput file (a Gaussian output file)
- path: Path | str
- property raw_data: dict
Returns the raw data associated with the current object.
- Returns:
The raw data associated with the current object
- Return type:
dict
- rotated_forces(rotation_matrix: ndarray) dict
Rotates forces given a rotation_matrix, which could be the C matrix used to rotate onto an ALF axis system defined by a central atom, x-axis atom, and xy-plane atom.
- Parameters:
rotation_matrix – A 3x3 rotation matrix
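As a rough illustration of what rotating forces with a 3x3 matrix means, the sketch below applies a rotation matrix to one force vector per atom. The helper names are hypothetical, the dictionary keyed by atom name is an assumption based on the dict return type, and whether ichor applies the matrix or its transpose is not stated in the source.

```python
import math
from typing import Dict, List, Sequence

Matrix = Sequence[Sequence[float]]


def rotate_vector(rotation_matrix: Matrix, vector: Sequence[float]) -> List[float]:
    """Return the matrix-vector product for a 3x3 matrix and a length-3 vector."""
    return [sum(row[k] * vector[k] for k in range(3)) for row in rotation_matrix]


def rotate_forces(
    rotation_matrix: Matrix, forces: Dict[str, Sequence[float]]
) -> Dict[str, List[float]]:
    """Rotate every atomic force vector with the same rotation matrix."""
    return {name: rotate_vector(rotation_matrix, f) for name, f in forces.items()}


# A 90 degree rotation about the z axis maps the x direction onto y.
theta = math.pi / 2
Rz = [
    [math.cos(theta), -math.sin(theta), 0.0],
    [math.sin(theta), math.cos(theta), 0.0],
    [0.0, 0.0, 1.0],
]
global_forces = {"O1": [1.0, 0.0, 0.0], "H2": [0.0, 0.0, 2.0]}  # made-up values
local_forces = rotate_forces(Rz, global_forces)
```

A proper rotation leaves the length of each force vector unchanged, which is a convenient sanity check on any C matrix.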
- class Int(path: Path | str)
Wraps around one .int file which is generated by AIMALL for every atom in the system.
- Parameters:
path – The Path object corresponding to an .int file
parent – An Atoms instance which holds the coordinate information for all atoms in the system. This information is needed to form the C matrix when rotating multipoles from the global to the local frame. Note that the Atoms instance must contain the same atom name (i.e. atom type + atom index), so that rotating of the multipoles can happen.
- property atom_num: int
Returns the atom index in the system. (atom indices in atom names start at 1)
- property bond_critical_points: List[CriticalPoint]
Returns list of bond critical points
- property cage_critical_points: List[CriticalPoint]
Returns list of cage critical points
- classmethod check_path(path: Path) bool
Checks that the path is the same as for an .int file. The _ in the .int file name indicates AB interactions, which have to be read in differently because the file is structured differently.
- property dipole_mag: float
Returns the magnitude of the dipole moment of the topological atom. The magnitude of the vector is not affected by the rotation of multipoles.
- property e_intra: float
- property global_multipole_moments: dict
Returns the spherical multipole moments calculated by AIMAll.
Note
These are in the global (Cartesian) frame, i.e. they have NOT been rotated using ALF. Rotation is done by converting to Cartesian, rotating, and then converting back to spherical.
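For the dipole moment, the convert, rotate, convert back procedure is short enough to sketch in full. The function names are hypothetical and the numbers are made up; the mapping q11c to x, q11s to y and q10 to z is the usual real spherical harmonic convention and is assumed here. Higher-rank multipoles need more involved conversions that are not shown.

```python
import math
from typing import Dict, List, Sequence

Matrix = Sequence[Sequence[float]]


def dipole_spherical_to_cartesian(q: Dict[str, float]) -> List[float]:
    """Assumed convention: q11c -> x, q11s -> y, q10 -> z."""
    return [q["q11c"], q["q11s"], q["q10"]]


def dipole_cartesian_to_spherical(mu: Sequence[float]) -> Dict[str, float]:
    return {"q10": mu[2], "q11c": mu[0], "q11s": mu[1]}


def rotate_dipole(q: Dict[str, float], C: Matrix) -> Dict[str, float]:
    """Convert to Cartesian, rotate with C, convert back to spherical."""
    mu = dipole_spherical_to_cartesian(q)
    rotated = [sum(C[i][k] * mu[k] for k in range(3)) for i in range(3)]
    return dipole_cartesian_to_spherical(rotated)


def dipole_magnitude(q: Dict[str, float]) -> float:
    return math.sqrt(q["q10"] ** 2 + q["q11c"] ** 2 + q["q11s"] ** 2)


# Rotation by 30 degrees about the x axis, standing in for a C matrix.
angle = math.radians(30)
C = [
    [1.0, 0.0, 0.0],
    [0.0, math.cos(angle), -math.sin(angle)],
    [0.0, math.sin(angle), math.cos(angle)],
]
global_dipole = {"q10": 0.3, "q11c": -0.1, "q11s": 0.2}
local_dipole = rotate_dipole(global_dipole, C)
```

The components change under rotation but the magnitude does not, which is why a dipole magnitude can be reported without reference to any frame.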
- property i: int
Returns the atom index in the system. (atom indices in atom names start at 1)
- property integration_error: float
The integration error can tell you if a point has been decomposed into topological atoms correctly. A large integration error signals that the point might not be suitable for training as the AIMALL IQA/multipole moments might be inaccurate.
- property iqa: float
Returns the IQA energy of the topological atom that was calculated for this topological atom (since 1 .int file is written for each topological atom).
- local_spherical_multipoles(C: ndarray) Dict[str, float]
Rotates global spherical multipoles into local spherical multipoles. Optionally a rotation matrix can be passed in. Otherwise, the wfn file associated with this int file (as read in from the int file) will be used (if it exists).
- Parameters:
C – Rotation matrix to be used to rotate multipoles.
- Raises:
FileNotFoundError – If no C_matrix is passed in and the wfn file associated with the int file does not exist. Then we cannot calculate multipoles.
- path: Path | str
- property q: float
Returns the point charge (monopole moment) of the topological atom.
- property q00: float
Returns the point charge (monopole moment) of the topological atom.
- property raw_data
Get properties which we are interested in machine learning from the INT file. Rotate multipoles using a given C matrix.
- property ring_critical_points: List[CriticalPoint]
Returns list of ring critical points
- class IntDirectory(path: Path | str)
Bases:
HasData,AnnotatedDirectory
Wraps around a directory which contains all .int files for the system.
- Parameters:
path – The Path corresponding to a directory holding .int files
parent – An Atoms instance that holds coordinate information for all the atoms in the system. Things like XYZ and GJF hold geometry.
- classmethod check_path(path: Path) bool
Checks if the given Path instance has _atomicfiles in its name.
- contents = {'interaction_ints': <class 'ichor.core.files.aimall.ab_int.AbInt'>, 'ints': <class 'ichor.core.files.aimall.int.Int'>}
- get(pattern: str, default=None)
Does the same thing as the get method of a dictionary, returning a default if a KeyError is raised
- path: Path | str
- properties(C_dict: Dict[str, ndarray]) Dict[str, Dict[str, float]]
Returns a dictionary of dictionaries containing atom names as keys and a dictionary as value. The value dictionary contains the properties we are interested in machine learning as keys and the values of these properties as floats. A dictionary of C matrices needs to be passed in because we must rotate the multipoles.
- Parameters:
C_dict – A dictionary of rotation matrices, one for each of the atoms
- Raises:
FileNotFoundError – If no C_matrix is passed in and the wfn file associated with the int file does not exist. Then we cannot calculate multipoles.
- property raw_data: dict
Returns data associated with each atom. If interaction ints are present, also adds these to the dictionary.
- class MorfiDirectory(path: Path | str)
Bases:
AnnotatedDirectory
- classmethod check_path(path: Path) bool
Implement if the path of the directory needs to be checked if it contains something specific
- contents = {'mout': <class 'ichor.core.files.pandora.mout.MOUT'>}
- dirname = 'morfi-2pdm'
- path: Path | str
- class OrcaEngrad(path: Path | str)
Bases:
ReadFile,HasAtoms,HasData
- property gradient: ndarray
- path: Path | str
- property raw_data: dict
Returns the raw data associated with the current object.
- Returns:
The raw data associated with the current object
- Return type:
dict
- class OrcaInput(path: Path | str, method: str | None = None, basis_set: str | None = None, main_input: List[str] | None = None, charge: int | None = None, spin_multiplicity: int | None = None, atoms: Atoms | None = None, input_blocks: Dict[str, List[tuple]] | None = None)
Bases:
ReadFile,WriteFile,File,HasAtoms
Wraps around an ORCA input file that is used as input to ORCA.
- Parameters:
path – A string or Path to the ORCA input file. If a path is not given, then there is no file to be read, so the user has to write the file contents. If no contents/options are written by the user, they are written as the default values in the write method.
method – The method to use for the calculation, defaults to b3lyp/g if not given
basis_set – The basis set for the calculation, defaults to “6-31+g(d,p)”
main_input – A list of strings which are commands beginning with !
charge – The charge of the system
spin_multiplicity – The spin multiplicity of the system
atoms – An Atoms instance that contains the molecular structure
input_blocks – A dictionary whose keys are the options and whose values are lists containing an even number of elements. Each option is written out with a %, followed by the specifications that the user gives for the option
Note
There is no checking of what the inputs are, so it is up to the user to make sure that the inputs are correct.
Note
Gaussian uses a different b3lyp version (https://sites.google.com/site/orcainputlibrary/dft-calculations) so use b3lyp/g (this is the Gaussian implementation) instead of b3lyp
References
https://sites.google.com/site/orcainputlibrary/home
https://www.cup.uni-muenchen.de/oc/zipse/teaching/computational-chemistry-2/topics/a-typical-orca-output-file/
https://www.orcasoftware.de/tutorials_orca/first_steps/input_output.html
https://www.afs.enea.it/software/orca/orca_manual_4_2_1.pdf (note this is for version 4, not 5)
version 5 manual, needs login: available in https://orcaforum.kofo.mpg.de/app.php/dlext/?view=detail&df_id=186
https://orcaforum.kofo.mpg.de/viewtopic.php?f=8&t=7470&p=32102&hilit=atomic+force#p32102
- path: Path | str
- class OrcaOutput(path: Path | str)
Bases:
HasAtoms,HasData,ReadFile
Wraps around an .orcaoutput file that is the output of ORCA. Contains information such as coordinates (in Angstroms) and molecular dipoles and quadrupoles.
- Parameters:
path – Path object or string to the .orcaoutput file (an ORCA output file)
- path: Path | str
- property raw_data: dict
Returns the raw data associated with the current object.
- Returns:
The raw data associated with the current object
- Return type:
dict
- class PandoraDirectory(path: Path | str)
Bases:
HasAtoms,AnnotatedDirectory
- classmethod check_path(path: Path) bool
Implement if the path of the directory needs to be checked if it contains something specific
- contents = {'input': <class 'ichor.core.files.pandora.pandora_input.PandoraInput'>, 'morfi': <class 'ichor.core.files.pandora.morfi_output.MorfiDirectory'>, 'pyscf': <class 'ichor.core.files.pandora.pyscf_output.PySCFDirectory'>}
- dirname = 'pandora'
- path: Path | str
- write()
- class PandoraInput(path: Path, atoms: Atoms | None = None, ccsdmod: PandoraCCSDmod = FileContents, morfi_grid_radial: float = FileContents, morfi_grid_angular: int = FileContents, morfi_grid_radial_h: float = FileContents, morfi_grid_angular_h: int = FileContents, method: str = FileContents, basis_set: str = FileContents)
Bases:
HasAtoms,ReadFile,WriteFile
- path: Path | str
- class PointDirectory(path: Path | str)
Bases:
AnnotatedDirectory,HasAtoms,HasData
A helper class that wraps around ONE directory which contains ONE point (one molecular geometry).
- Parameters:
path – Path to a directory which contains ONE point.
- atoms_from_file(file_with_atoms: HasAtoms) Atoms
Given a class (which is in the contents of the directory), obtain the Atoms instance from that specific file which is wrapped by the class.
- Parameters:
file_with_atoms – file class which subclasses from HasAtoms and has a .atoms attribute
- Raises:
ichor.core.atoms.AtomsNotFoundError – If file class does not contain atoms
- Returns:
The Atoms instance obtained from the given file
- Return type:
Atoms
- classmethod check_path(path: Path) bool
Makes sure that path is PointDirectory-like
- contents = {'aim': <class 'ichor.core.files.aimall.aim.Aim'>, 'gaussian_output': <class 'ichor.core.files.gaussian.gaussian_output.GaussianOutput'>, 'gjf': <class 'ichor.core.files.gaussian.gjf.GJF'>, 'ints': <class 'ichor.core.files.aimall.ints.IntDirectory'>, 'orca_input': <class 'ichor.core.files.orca.orca_input.OrcaInput'>, 'orca_output': <class 'ichor.core.files.orca.orca_output.OrcaOutput'>, 'wfn': <class 'ichor.core.files.gaussian.wfn.WFN'>, 'xtb': <class 'ichor.core.files.ase.opt.XTB'>, 'xyz': <class 'ichor.core.files.xyz.xyz.XYZ'>}
- features(feature_calculator: Callable, *args, is_atomic=True, **kwargs)
Returns the features for this Atoms instance, corresponding to the features of each Atom instance held in this Atoms instance. Features are calculated in the Atom class and concatenated to a 2D array here.
The array shape is n_atoms x n_features (3*n_atoms - 6)
- Parameters:
is_atomic – whether the feature calculator calculates features for individual atoms or for the whole geometry.
args – positional arguments to pass to feature calculator
kwargs – key word arguments to pass to feature calculator
- Returns:
The feature matrix of this Atoms instance
- Return type:
np.ndarray of shape n_atoms x n_features (3N-6)
- path: Path | str
- property raw_data: dict
Returns the raw data associated with the current object.
- Returns:
The raw data associated with the current object
- Return type:
dict
- class PointsDirectory(path: Path | str, needs_parsing=True, *args, **kwargs)
Bases:
ListOfAtoms,Directory,HasData
A helper class that wraps around a directory which contains points (molecules with various geometries). Calling Directory.__init__(self, path) will call the parse method of PointsDirectory instead of Directory (because Python looks for the method in this class first before looking at parent class methods). A typical ICHOR directory that contains points will have a structure like this:
-TRAINING_SET
-- SYSTEM_NAME000
-- SYSTEM_NAME001
-- SYSTEM_NAME002
........
Each of the subdirectories contains Gaussian files (such as .gjf), as well as an atomic_files directory, which then contains the AIMALL files. A PointsDirectory will wrap around the whole TRAINING_SET directory (which contains multiple points), while a PointDirectory will wrap around a SYSTEM_NAME00… folder (which only contains information about 1 point).
- Parameters:
path – Path to a directory which contains points. This path is typically the path to the training set, sample pool, etc.
needs_parsing – By default, every PointsDirectory is parsed when the instance is created to create PointDirectory instances of each inner directory (but the contents of the files are not read). If, however, a slice of an already created PointsDirectory is made, the contents of the directories do not need to be parsed again, so needs_parsing would be False
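The directory layout described for this class can be reproduced in a temporary directory to see what a parse step has to discover. This sketch only uses pathlib; `list_point_directories` is a hypothetical helper, not the PointsDirectory parser, and the file names inside each point are made up.

```python
import tempfile
from pathlib import Path
from typing import List


def list_point_directories(points_dir: Path) -> List[Path]:
    """Return the subdirectories of points_dir, sorted by name."""
    return sorted(p for p in points_dir.iterdir() if p.is_dir())


with tempfile.TemporaryDirectory() as tmp:
    training_set = Path(tmp) / "TRAINING_SET"
    for i in range(3):
        point = training_set / f"SYSTEM_NAME{i:03d}"
        point.mkdir(parents=True)
        # One empty input file per point, purely for illustration.
        (point / f"SYSTEM_NAME{i:03d}.gjf").touch()
    # A stray file at the top level is not a point and is skipped.
    (training_set / "README.txt").touch()

    point_names = [p.name for p in list_point_directories(training_set)]
```

Only the directory names are collected here; as noted for needs_parsing, discovering the points does not require reading the contents of the files.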
- alf(alf_calculator: Callable[[...], ALF], *args, **kwargs) List[ALF]
Returns the Atomic Local Frame (ALF) for all Atom instances that are held in Atoms, e.g. [[0,1,2],[1,0,2], [2,0,1]].
- Parameters:
args – positional arguments to pass to alf calculator
kwargs – key word arguments to pass to alf calculator
- alf_dict(alf_calculator: Callable[[...], ALF], *args, **kwargs) Dict[str, ALF]
Returns a dictionary of key: atom_name, value: ALF instance (containing central atom index, x-axis idx, xy-plane idx), e.g. {"O1":ALF(0,1,2),"H2":ALF(1,0,2), "H3":ALF(2,0,1)}.
- Parameters:
args – positional arguments to pass to alf calculator
kwargs – key word arguments to pass to alf calculator
- property atom_names: List[str]
Return the atom names from the first timestep. Assumes that all timesteps have the same number of atoms/atom names.
- classmethod check_path(path: Path) bool
Makes sure that path is PointsDirectory-like
- connectivity(connectivity_calculator: Callable[[...], ndarray]) ndarray
Return the connectivity matrix (n_atoms x n_atoms) for the given Atoms instance.
- Returns:
type: np.ndarray of shape n_atoms x n_atoms
- property coordinates: ndarray
Returns the xyz coordinates of all atoms for all timesteps. Shape n_timesteps x n_atoms x 3
- coordinates_to_xyz(fname: str | Path | None = PosixPath('system_to_xyz.xyz'), step: int | None = 1)
Writes a new .xyz file that contains the timestep i, as well as the coordinates of the atoms for that timestep.
- Parameters:
fname – The file name to which to write the timesteps/coordinates
step – Write coordinates for every n^th step. Default is 1, so writes coordinates for every step
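The effect of the step parameter can be shown with a small standalone writer for the multi-frame .xyz layout (atom count, comment line, then one line per atom). This is a sketch rather than the ichor method: `frames_to_xyz_text` is hypothetical, and the `i = <index>` comment format is an assumption.

```python
from typing import List, Sequence, Tuple

Atom = Tuple[str, float, float, float]  # element symbol and x, y, z in Angstroms


def frames_to_xyz_text(frames: Sequence[Sequence[Atom]], step: int = 1) -> str:
    """Return multi-frame xyz text containing every `step`-th frame."""
    lines: List[str] = []
    for i in range(0, len(frames), step):
        frame = frames[i]
        lines.append(str(len(frame)))  # first line of a frame: number of atoms
        lines.append(f"i = {i}")       # comment line: the timestep index
        lines.extend(f"{el} {x:.8f} {y:.8f} {z:.8f}" for el, x, y, z in frame)
    return "\n".join(lines) + "\n"


# Four made-up frames of a two-atom system.
frames = [
    [("O", 0.0, 0.0, 0.1 * t), ("H", 0.0, 0.0, 1.0 + 0.1 * t)]
    for t in range(4)
]
xyz_text = frames_to_xyz_text(frames, step=2)  # keeps frames 0 and 2
```

Keeping the original index in the comment line means a thinned-out file can still be matched back to the full set of timesteps.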
- coordinates_to_xyz_with_errors(models_path: str | Path, fname: str | Path | None = PosixPath('xyz_with_properties_error.xyz'), step: int | None = 1)
Writes a new .xyz file that contains the timestep i, as well as the coordinates of the atoms for that timestep. The comment lines in the xyz file contain the absolute prediction errors. These can then be plotted in ALFVisualizer as a cmap to see where poor predictions happen.
- Parameters:
models_path – The model path to one atom.
property – The property for which to predict for and get errors (iqa or any multipole moment)
fname – The file name to which to write the timesteps/coordinates
step – Write coordinates for every n^th step. Default is 1, so writes coordinates for every step
- features_with_properties_to_csv(system_alf: Dict[str, ALF], str_to_append_to_fname: str = '_features_with_properties.csv', atom_names: List[str] | None = None, property_types: List[str] | None = None, **kwargs)
Calculates ALF features and properties (with multipole moments rotated).
- Parameters:
str_to_append_to_fname – a string that is appended to the default file name (which is name_of_atom.csv), defaults to “_features_with_properties.csv”
atom_names – A list of atom names for which to write out csv files with properties. If None, then writes out files for all atoms in the system, defaults to None
property_types – A list of property names (iqa, multipole names) for which to write columns. If None, then writes out columns for all properties, defaults to None
args – positional arguments to pass to calculator function
kwargs – key word arguments to be passed to the feature calculator function
- Raises:
TypeError – This method only works for PointsDirectory instances because it needs access to AIMALL information. Does not work for Trajectory instances.
- features_with_wfn_energy_and_dE_df_to_csv(alf_list: List[ALF], central_atom_idx: int, str_to_append_to_fname: str = '_features_with_dE_df.csv', **kwargs)
Writes out a csv file containing the wfn energy and the forces calculated for every feature. Note that the forces (dE/df_i) are the negative of the PES gradient, so for machine learning, the negative of these forces needs to be taken to add gradient information into GP models.
- Parameters:
alf_list – A list of ALF instances containing alf info
central_atom_idx (int) – The central atom on which to center the alf and for which dE/df will be calculated
str_to_append_to_fname (str, optional) – a string that is appended to the default file name, defaults to “_features_with_dE_df.csv”
- classmethod from_trajectory(trajectory_path: str | Path, system_name: str | None = None, every=1, center=True) PointsDirectory
Generate a PointsDirectory-type structure directory from a trajectory (.xyz) file
- Parameters:
trajectory_path – A str or Path to a .xyz file containing geometries
system_name – The name of the chemical system. This is going to be the name of the directory which will be created.
center – Whether to center the geometries on the centroid of the system. This is useful to prevent the molecule from translating in 3D space (and prevents issues with WFN files, where a very large x,y,z value (over 100) for the coordinates leads to **** being written in the .wfn file…)
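Centering on the centroid amounts to subtracting the mean position of the atoms from every atom. A minimal sketch, using plain lists and a hypothetical `center_on_centroid` helper (the coordinates are made up, and this is not the code used by from_trajectory):

```python
from typing import List, Sequence


def center_on_centroid(coords: Sequence[Sequence[float]]) -> List[List[float]]:
    """Shift coordinates so that their unweighted mean position is the origin."""
    n = len(coords)
    centroid = [sum(atom[k] for atom in coords) / n for k in range(3)]
    return [[atom[k] - centroid[k] for k in range(3)] for atom in coords]


# A molecule that has drifted far from the origin during a simulation.
drifted = [
    [150.0, 0.0, 0.0],
    [152.0, 0.0, 0.0],
    [151.0, 1.5, 0.0],
]
centered = center_on_centroid(drifted)
```

The shift is a pure translation, so interatomic distances are untouched while the absolute coordinate values become small enough to avoid overflowing fixed-width output fields.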
- property natoms: int
Returns the number of atoms in the first timestep. Each timestep should have the same number of atoms.
- path: Path | str
- processed_data(processing_func, *args, **kwargs) dict
Processes data in some way, given any arguments and keyword arguments, and returns a dictionary of the processed data. Note that the processing function can be any callable, e.g. a closure or a class.
Note
The processing function must act on one PointDirectory.
- Parameters:
processing_func – Callable which is going to process ONE PointDirectory
args – Positional arguments to pass to processing func
- Returns:
A dictionary of processed data. Keys of the dictionary are the stem of each PointDirectory contained inside this PointsDirectory instance.
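The pattern described here (apply one callable to every point, collect the results in a dictionary keyed by the directory stem) can be sketched with plain paths. In ichor the callable receives a PointDirectory; in this standalone sketch it receives a `pathlib.Path`, and both `process_points` and `count_files_with_suffix` are hypothetical names.

```python
import tempfile
from pathlib import Path
from typing import Any, Callable, Dict


def process_points(
    points_dir: Path, processing_func: Callable[..., Any], *args, **kwargs
) -> Dict[str, Any]:
    """Apply processing_func to every subdirectory and key results by its stem."""
    return {
        point.stem: processing_func(point, *args, **kwargs)
        for point in sorted(points_dir.iterdir())
        if point.is_dir()
    }


def count_files_with_suffix(point: Path, suffix: str) -> int:
    """Example processing function: count files with a given suffix in one point."""
    return sum(1 for f in point.iterdir() if f.is_file() and f.suffix == suffix)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "TRAINING_SET"
    layout = {"WATER0000": ["a.gjf", "a.wfn"], "WATER0001": ["b.gjf"]}
    for name, files in layout.items():
        (root / name).mkdir(parents=True)
        for fname in files:
            (root / name / fname).touch()

    # Extra positional arguments are forwarded to the processing function.
    processed = process_points(root, count_files_with_suffix, ".wfn")
```

Because the function only ever sees one point at a time, the same callable works for a single point or for a whole set of points.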
- properties(system_alf: List[ALF] | None = None, specific_property: str | None = None)
Get properties contained in the PointDirectory. If no system alf is passed in, an automatic process to get C matrices is started.
- Parameters:
system_alf – Optional list of ALF instances that can be passed in to use a specific alf instead of automatically trying to compute it.
specific_property – return only a specific key from the returned dictionary
- property raw_data: dict
Returns all raw data associated with the PointsDirectory instance. The key is the point name (of a PointDirectory instance) and value is the raw data associated with the one point.
- Returns:
A dictionary of raw data. Keys of the dictionary are the stem of each PointDirectory contained inside this PointsDirectory instance.
- property total_energy
Returns np array of wfn energies of all points
- Returns:
np array of total energy (in Hartree) for all points
- property types: List[str]
Returns the atom elements for atoms, assuming each timestep has the same atoms. Removes duplicates.
- property types_extended: List[str]
Returns the atom elements for atoms, assuming each timestep has the same atoms. Removes duplicates.
- property wfn_energy: ndarray
Returns np array of wfn energies of all points
- Returns:
np array of total energy (in Hartree) for all points
- write_to_json_database(root_path: str | Path | None = None, datafunction: Callable = <function get_data_for_point>, npoints_per_json=500, print_missing_data=True, indent: int = 2, separators=(', ', ':')) Path
Write out important information from a PointsDirectory instance to a json file.
- Parameters:
root_path – Name of the directory which will contain the json database. This is a directory which contains multiple directories inside. Each directory inside is one PointsDirectory. The reason for implementing it like this is to allow use with multiple PointsDirectory-ies at once, so that data for each PointDirectory is written in a separate folder
datafunction – A function used to get all data for a single point. This data is going to get written to the json file.
npoints_per_json – Maximum number of geometries to write to one json file. This is done so that the individual files do not become very large.
print_missing_data – Whether to print out any missing data from each PointDirectory contained in self, defaults to True
indent – integer representing number of spaces to indent, defaults to 2
separators – Separators used for each entry, default (“,”, “:”)
- Returns:
The path to the written json file
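The role of npoints_per_json can be illustrated by splitting a dictionary of per-point data into several json files with a maximum number of entries each. This is a sketch of the chunking idea only: `write_json_chunks`, the `chunk_<n>.json` file naming and the data values are hypothetical and do not reflect the layout of the actual database.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List, Tuple


def write_json_chunks(
    data: Dict[str, Any],
    out_dir: Path,
    npoints_per_json: int = 500,
    indent: int = 2,
    separators: Tuple[str, str] = (",", ":"),
) -> List[Path]:
    """Write `data` to several json files holding at most npoints_per_json entries."""
    out_dir.mkdir(parents=True, exist_ok=True)
    names = list(data)
    written = []
    for n, start in enumerate(range(0, len(names), npoints_per_json)):
        chunk = {name: data[name] for name in names[start:start + npoints_per_json]}
        path = out_dir / f"chunk_{n}.json"  # hypothetical naming scheme
        with open(path, "w") as f:
            json.dump(chunk, f, indent=indent, separators=separators)
        written.append(path)
    return written


with tempfile.TemporaryDirectory() as tmp:
    # Five points with made-up energies, at most two points per file.
    points = {f"WATER{i:04d}": {"wfn_energy": -76.0 - 0.001 * i} for i in range(5)}
    paths = write_json_chunks(points, Path(tmp) / "db", npoints_per_json=2)
    chunk_sizes = [len(json.loads(p.read_text())) for p in paths]
```

Capping the number of points per file keeps each json file small enough to load and inspect on its own.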
- write_to_sqlite3_database(db_path: str | Path | None = None, echo=False, print_missing_data=True) Path
Write out important information from a PointsDirectory instance to an SQLite3 database.
- Parameters:
db_path – database to write to
echo – Whether to print out SQL queries from SQL Alchemy, defaults to False
print_missing_data – Whether to print out any missing data from each PointDirectory contained in self, defaults to True
- Returns:
The path to the written SQL database
- class PointsDirectoryParent(path: Path | str)
Bases:
list,Directory
Should wrap around multiple PointsDirectory-ies.
- path: Path | str
- processed_data(processing_func, *args, **kwargs) dict
Processes data in some way, given any arguments and keyword arguments, and returns a dictionary of the processed data. Note that the processing function can be any callable, e.g. a closure or a class.
Note
The processing function must act on one PointDirectory.
- Parameters:
processing_func – Callable which is going to process ONE PointDirectory
args – Positional arguments to pass to processing func
- Returns:
A dictionary of processed data. Keys of the dictionary are the stem of each PointsDirectory contained inside this PointsDirectoryParent instance.
- property raw_data: dict
Returns all raw data associated with the PointsDirectoryParent instance. The key is the points directory name (of a PointsDirectory instance) and value is another dictionary.
- Returns:
A dictionary of raw data. Keys of the dictionary are the stem of each PointsDirectory contained inside this PointsDirectoryParent instance.
- write_to_json_database(root_name: str | None = None, npoints_per_json=500, print_missing_data=True, indent: int = 2, separators=(',', ':')) List[Path]
Makes a database from multiple PointsDirectory-like directories which are contained in this PointsDirectoryParent
- Parameters:
root_name – The name of the database. If not selected, uses the name of the current PointsDirectoryParent, defaults to None
npoints_per_json – Maximum number of points to write to each json file, defaults to 500
print_missing_data – Whether or not to print missing data, defaults to True
indent – json file indent, defaults to 2
separators – json file separators, defaults to (“,”, “:”)
- write_to_sqlite3_database(db_path: str | Path | None = None, echo=False, print_missing_data=True) Path
Write out important information from a PointsDirectory instance to an SQLite3 database. All PointsDirectory-like directories contained inside will be written to the same database.
- Parameters:
db_path – database to write to
echo – Whether to print out SQL queries from SQL Alchemy, defaults to False
print_missing_data – Whether to print out any missing data from each PointDirectory contained in self, defaults to True
- Returns:
The path to the written SQL database
- class PySCFDirectory(path: Path | str)
Bases:
AnnotatedDirectory
- classmethod check_path(path: Path) bool
Implement if the path of the directory needs to be checked if it contains something specific
- contents = {'aimall_wfn': <class 'ichor.core.files.gaussian.wfn.WFN'>, 'morfi_wfn': <class 'ichor.core.files.pandora.pyscf_output.MorfiWFN'>}
- dirname = 'pyscf'
- path: Path | str
- class Trajectory(path: Path | str, *args, read_geometries=True, **kwargs)
Bases:
ReadFile,WriteFile,ListOfAtoms
Handles .xyz files that have multiple timesteps, with each timestep giving the x y z coordinates of the atoms. A user can also initialize an empty trajectory and append Atoms instances to it without reading in a .xyz file. This allows the user to build custom trajectories containing any sort of geometries.
- Parameters:
path – The path to a .xyz file that contains timesteps. Set to None by default as the user can initialize an empty trajectory and build it up themselves
read_geometries – If the trajectory file already exists on disk, but we do not want to keep the geometries in it (i.e. we want to overwrite the original trajectory), then set to False. If kept as True (the default), calling the write() method twice will cause a second set of the geometries to be added to the original trajectory file.
- add(atoms)
Add a list of Atoms (corresponding to one timestep) to the end of the trajectory list
- alf(alf_calculator: Callable[[...], ALF], *args, **kwargs) List[ALF]
Returns the Atomic Local Frame (ALF) for all Atom instances that are held in Atoms, e.g. [[0,1,2],[1,0,2], [2,0,1]].
- Parameters:
args – positional arguments to pass to alf calculator
kwargs – key word arguments to pass to alf calculator
- alf_dict(alf_calculator: Callable[[...], ALF], *args, **kwargs) Dict[str, ALF]
Returns a dictionary with the atomic local frame indices for every atom (0-indexed).
- property atom_names
Return the atom names from the first timestep. Assumes that all timesteps have the same number of atoms/atom names.
- change_atom_ordering(new_traj_name: Path, new_atom_ordering: List[int])
Changes the atom ordering of the trajectory, given a list of how indices should be permuted and writes out a new trajectory file in the specified location.
- Parameters:
new_traj_name – Name of new trajectory file
new_atom_ordering – A list of indices telling how to permute the current trajectory. The list is 0-indexed. The order of the list is the new order of the atoms, e.g. rearranging atoms C1, O2, H3 to H3, C1, O2 would be [2, 0, 1]
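The indexing convention in the example can be checked with a few lines of plain Python. The `reorder_atoms` helper is hypothetical and works on a list of atom names for one timestep; the real method applies the permutation to every timestep and writes a new file.

```python
from typing import List, Sequence


def reorder_atoms(frame: Sequence[str], new_atom_ordering: Sequence[int]) -> List[str]:
    """Return the atoms of one timestep in the order given by new_atom_ordering.

    Entry j of new_atom_ordering is the current (0-indexed) position of the
    atom that ends up at position j.
    """
    if sorted(new_atom_ordering) != list(range(len(frame))):
        raise ValueError("new_atom_ordering must be a permutation of all atom indices")
    return [frame[i] for i in new_atom_ordering]


frame = ["C1", "O2", "H3"]
reordered = reorder_atoms(frame, [2, 0, 1])
```

Checking that the list is a true permutation guards against silently dropping or duplicating an atom.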
- connectivity(connectivity_calculator: Callable[[...], ndarray]) ndarray
Return the connectivity matrix (n_atoms x n_atoms) for the given Atoms instance.
- Return type:
np.ndarray of shape n_atoms x n_atoms
- property coordinates: ndarray
Returns the xyz coordinates of all atoms for all timesteps.
- Return type:
np.ndarray of shape n_timesteps x n_atoms x 3
- coordinates_to_xyz(fname: Path = PosixPath('system_to_xyz.xyz'), step: int = 1)
Writes a new .xyz file that contains the timestep index i, as well as the coordinates of the atoms for that timestep.
- Parameters:
fname – The file name to which to write the timesteps/coordinates
step – Write coordinates for every nth step. Default is 1, so coordinates are written for every step.
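The effect of step can be sketched with the standard multi-frame .xyz layout (atom count, comment line, then one line per atom). This is a self-contained illustration rather than ichor's writer: the exact comment-line text ("i = ...") and the column widths are assumptions, and frames_to_xyz_text is a hypothetical name.

```python
from typing import List, Sequence, Tuple

Atom = Tuple[str, float, float, float]  # (element, x, y, z) in Angstroms


def frames_to_xyz_text(frames: Sequence[Sequence[Atom]], step: int = 1) -> str:
    """Build multi-frame .xyz text from every `step`-th frame.

    Each frame is written as: atom count, a comment line holding the
    timestep index i, then one "element x y z" line per atom.
    """
    lines: List[str] = []
    for i in range(0, len(frames), step):
        frame = frames[i]
        lines.append(str(len(frame)))
        lines.append(f"i = {i}")
        for element, x, y, z in frame:
            lines.append(f"{element:<2} {x:16.8f} {y:16.8f} {z:16.8f}")
    return "\n".join(lines) + "\n"


# Five made-up two-atom frames; with step=2 only frames 0, 2 and 4 are kept.
frames = [[("O", 0.0, 0.0, 0.1 * t), ("H", 0.0, 0.76, -0.5)] for t in range(5)]
xyz_text = frames_to_xyz_text(frames, step=2)
```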
- classmethod features_file_to_trajectory(f: Path, trajectory_path: Path, atom_types: List[str], header=0, index_col=0, sheet_name=0) Trajectory
Takes in a csv or excel file containing features and converts it to a Trajectory object. It assumes that the features start from the very first column (the column after the index column, if one exists). Feature files that are written out by ichor are in Bohr instead of Angstroms for now.
After converting to Cartesian coordinates, we have to convert Bohr to Angstroms because .xyz files are written out in Angstroms (and programs like Avogadro, VMD, etc. expect distances in Angstroms). Failing to do that will result in xyz files that are in Bohr, so if features are calculated from them again, the features will be wrong.
- Parameters:
f – Path to the file (either .csv or .xlsx) containing the features. We only need the features for one atom to reconstruct the geometries, thus we only need 1 csv file or 1 sheet of an excel file. By default, the 0th sheet of the excel file is read in.
atom_types – A list of strings corresponding to the atom elements (C, O, H, etc.). This has to be ordered the same way as atoms corresponding to the features. Note that the central atom (for which features are given in the file) also needs to be present in this list as the very first atom.
header – The row index (0-indexed) of the line in the csv file which contains the names of the columns. Default is set to 0 to use the 0th row.
index_col – Whether a column should be used as the index column. Default is set to 0 to use 0th column. If no index column is present, set to False.
sheet_name – The excel sheet to be used to convert to xyz. Default is 0. This is only needed for excel files, not csv files.
Note
Ensure that the list of atom names is correct, i.e. that it contains the central atom as the very first atom, and the following atoms are in the ordering that is in the file containing the features.
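The unit conversion mentioned above is a single multiplication per coordinate. The sketch below shows it in isolation, assuming the Cartesian coordinates have already been reconstructed from the features; the function name bohr_to_angstrom and the sample coordinates are illustrative and not taken from ichor.

```python
from typing import List, Sequence

# CODATA 2018 Bohr radius expressed in Angstroms.
BOHR_TO_ANGSTROM = 0.529177210903


def bohr_to_angstrom(coords_bohr: Sequence[Sequence[float]]) -> List[List[float]]:
    """Convert an n_atoms x 3 list of coordinates from Bohr to Angstroms."""
    return [[component * BOHR_TO_ANGSTROM for component in atom] for atom in coords_bohr]


# Made-up geometry: central atom at the origin, second atom 1.8 Bohr along x.
coords_bohr = [[0.0, 0.0, 0.0], [1.8, 0.0, 0.0]]
coords_angstrom = bohr_to_angstrom(coords_bohr)
```

Skipping this step would leave every distance in the written .xyz file too large by a factor of about 1.89, which is why recomputed features would come out wrong.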
- property natoms
Returns the number of atoms in the first timestep. Each timestep should have the same number of atoms.
- classmethod np_array_to_trajectory(arr: ndarray, trajectory_path: str | Path, atom_types: List[str])
Creates a Trajectory instance from a np.ndarray object
- Parameters:
arr – np.ndarray containing features. This should be a 2D array of shape n_timesteps x n_features
trajectory_path – The path associated with the trajectory instance which is made
atom_types – A list of atom types (elements) that correspond to the features in the given array. It is important that they are in the same order as in the np.ndarray.
- Returns:
Trajectory instance containing xyz geometries converted from features
- path: Path | str
- rmsd(ref=None)
- split_packmol_trajectory(atoms_per_molecule: int, trajectory_name='packmol_traj_split.xyz')
Used to create packmol inputs
- split_traj(root_dir: Path = PosixPath('split_trajectory'), split_size: int = 1000)
Splits the trajectory into sub-trajectories and writes them to a folder. E.g. an original trajectory of 10,000 geometries can be split into 10 sub-trajectories containing 1,000 geometries each (given a split size of 1,000).
- Parameters:
root_dir – The folder to write sub-trajectories to. Must be a Path object and this directory will be created internally.
split_size – The split size by which to split original trajectory.
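The splitting arithmetic can be sketched independently of any file handling. The helper below is hypothetical (it is not ichor's split_traj) and assumes that a trailing remainder smaller than split_size simply becomes a shorter final chunk.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_chunks(timesteps: Sequence[T], split_size: int = 1000) -> List[Sequence[T]]:
    """Split a sequence of timesteps into consecutive chunks of at most split_size."""
    if split_size < 1:
        raise ValueError("split_size must be a positive integer")
    return [timesteps[i:i + split_size] for i in range(0, len(timesteps), split_size)]


# Integers stand in for geometries: 10,000 timesteps give 10 chunks of 1,000.
chunks = split_into_chunks(list(range(10_000)), split_size=1000)
# A length that is not a multiple of split_size leaves a shorter last chunk.
uneven_chunks = split_into_chunks(list(range(2_500)), split_size=1000)
```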
- to_dir(system_name: str, every: int = 1, center: bool = False, parent_dir: Path | None = None) Path
Writes out every nth timestep to a separate .xyz file to a given directory
- Parameters:
system_name – The name of the system. This will be the name of the given directory, with a suffix added. Default suffix is PointsDirectory._suffix
every – An integer value that indicates the nth step at which an xyz file should be written. Default is 1. If a value such as 5 is given, then a .xyz file is only written for every 5th timestep.
center – Whether or not to subtract mean of coordinates from atomic coordinates, defaults to False
parent_dir – A path to a parent directory where the inner directory will be created.
- Returns:
The Path object to the made directory
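The every and center options can be illustrated without touching the file system. This is a rough sketch under two assumptions that the text above does not confirm: that selection starts from the first timestep, and that centering subtracts the unweighted mean of the coordinates. Both helper names are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def select_every(timesteps: Sequence[T], every: int = 1) -> List[T]:
    """Keep every `every`-th timestep, starting from the first one (assumed)."""
    return list(timesteps[::every])


def center_geometry(coords: Sequence[Sequence[float]]) -> List[List[float]]:
    """Subtract the mean x, y and z from each atom's coordinates."""
    n_atoms = len(coords)
    centroid = [sum(atom[axis] for atom in coords) / n_atoms for axis in range(3)]
    return [[atom[axis] - centroid[axis] for axis in range(3)] for atom in coords]


kept = select_every(list(range(10)), every=5)  # indices of the timesteps written
centered = center_geometry([[1.0, 1.0, 1.0], [3.0, 1.0, 1.0]])
```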
- to_dirs(system_name: str, split_size: int = 1000, every: int = 1, center=False, parent_dir: Path | None = None) Path
Writes out every nth timestep to a separate .xyz file. This method differs from to_dir because it has a structure system_name_root / points_directory / xyz file. I.e. there is an additional root directory which encapsulates all the PointsDirectory-like directories.
- Parameters:
system_name – The name of the system. This will be used in the names of the files and directories as well
split_size – How many .xyz files are going to be in each of the inner PointsDirectory-like directories
every – An integer value that indicates the nth step at which an xyz file should be written. Default is 1. If a value such as 5 is given, then a .xyz file is only written for every 5th timestep.
- Returns:
The Path object to the made parent directory
- to_multiple_parent_dirs(system_name: str, split_size: int = 1000, nsplits_in_root: int = 5, every: int = 1, center=False)
Splits a trajectory into multiple parent directories, each of which can contain multiple PointsDirectory-like directories.
- Parameters:
system_name – name of system. This name will be used in the names of the files and directories which are made
split_size – The number of .xyz files that inner PointsDirectory-like directory will contain, default 1000
nsplits_in_root – The number of splits that are going to be in one root directory, default 5. This would mean that there are 5 x 1000 geometries in that root directory.
every – An integer value that indicates the nth step at which an xyz file should be written, defaults to 1
center – whether or not to subtract centroid of geometry before writing out xyz. Useful if geometries are far away from the origin which can result in Gaussian failing to write outputs properly, defaults to False
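How split_size and nsplits_in_root interact is easiest to see as a two-level grouping of timestep ranges. The sketch below computes only that layout; it does not create directories, the function name nested_split is hypothetical, and the handling of a partially filled last root is an assumption.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # (start, stop) timestep indices, stop exclusive


def nested_split(n_geometries: int, split_size: int = 1000,
                 nsplits_in_root: int = 5) -> List[List[Range]]:
    """Group timestep ranges into roots holding up to nsplits_in_root splits each."""
    splits = [
        (start, min(start + split_size, n_geometries))
        for start in range(0, n_geometries, split_size)
    ]
    return [splits[i:i + nsplits_in_root] for i in range(0, len(splits), nsplits_in_root)]


# 12,000 geometries: 12 splits of 1,000, grouped into roots of 5, 5 and 2 splits.
layout = nested_split(12_000)
splits_per_root = [len(root) for root in layout]
```

With the defaults, a full root covers 5 x 1000 = 5000 geometries, matching the description of nsplits_in_root above.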
- property types
Returns the atom elements for atoms, assumes each timestep has the same atoms. Removes duplicates.
- property types_extended
Returns the atom elements for atoms, assumes each timestep has the same atoms. Removes duplicates.
- class WFN(path: Path | str, method: str | None = None)
Bases:
HasAtoms, HasData, ReadFile, WriteFile
Wraps around a .wfn file that is the output of Gaussian. The .wfn file is an output file, but must also implement a write method because AIMAll needs to know the method used in the WFN calculation, otherwise AIMAll can give the wrong results.
- Parameters:
path – Path object or string to the .wfn file
atoms – an Atoms instance which is read in from the top of the .wfn file. Note that the units of the .wfn file are in Bohr.
method – The method (eg. B3LYP) which was used in the Gaussian calculation that created the .wfn file. The method is not initially written to the .wfn file by Gaussian, but it is necessary to add it to the .wfn file because AIMAll does not automatically determine the method itself, so it can lead to wrong IQA/multipole moments results. To make sure AIMAll results are correct, the method is a required argument.
- Variables:
mol_orbitals – The number of molecular orbitals to be read in from the .wfn file.
primitives – The number of primitives to be read in from the .wfn file.
nuclei – The number of nuclei in the system to be read in from the .wfn file.
energy – The molecular energy read in from the bottom of the .wfn file
virial – The virial read in from the bottom of the .wfn file
Note
Since the .wfn file is written out by Gaussian, we do not really have to modify it when writing it out, except that we need to add the method used so that AIMAll can use the correct method. Otherwise AIMAll assumes Hartree-Fock was used, which might be wrong.
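As a rough illustration of where the variables listed above come from, the sketch below pulls the counts and the energy/virial pair out of .wfn-style text with regular expressions. It is not ichor's parser: the helper name is hypothetical, the sample text is made up for illustration (it is not real Gaussian output), and the assumed line layouts are the "MOL ORBITALS ... PRIMITIVES ... NUCLEI" header line and the "TOTAL ENERGY = ... THE VIRIAL(-V/T)= ..." final line.

```python
import re
from typing import Dict, Union

HEADER_RE = re.compile(r"(\d+)\s+MOL ORBITALS\s+(\d+)\s+PRIMITIVES\s+(\d+)\s+NUCLEI")
ENERGY_RE = re.compile(
    r"TOTAL ENERGY\s*=\s*(-?\d+\.\d+)\s+THE\s+VIRIAL\(-V/T\)\s*=\s*(-?\d+\.\d+)"
)


def summarise_wfn_text(text: str) -> Dict[str, Union[int, float]]:
    """Extract counts, energy and virial from .wfn-style text (illustrative)."""
    header = HEADER_RE.search(text)
    energy = ENERGY_RE.search(text)
    if header is None or energy is None:
        raise ValueError("text does not look like a .wfn file")
    return {
        "mol_orbitals": int(header.group(1)),
        "primitives": int(header.group(2)),
        "nuclei": int(header.group(3)),
        "energy": float(energy.group(1)),
        "virial": float(energy.group(2)),
    }


# Made-up sample; the records between header and footer are omitted.
sample_wfn = """\
Illustrative title line
GAUSSIAN              5 MOL ORBITALS    126 PRIMITIVES        3 NUCLEI
(atom, centre and orbital records omitted)
 TOTAL ENERGY =      -76.026760737428 THE VIRIAL(-V/T)=   2.00172514
"""
summary = summarise_wfn_text(sample_wfn)
```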
- path: Path | str
- property raw_data: Dict[str, float]
Returns the raw data associated with the current object.
- Returns:
A dictionary containing the raw data associated with the current object
- Return type:
dict
- class WFX(path: Path | str, method: str | None = None)
Bases:
HasAtoms, HasData, ReadFile
Wraps around a .wfx file that is the output of Gaussian. The .wfx file is an output file, so it does not have a write method.
- Parameters:
path – Path object or string to the .wfx file
atoms – an Atoms instance which is read in from the top of the .wfn file. Note that the units of the .wfn file are in Bohr.
method – The method (eg. B3LYP) which was used in the Gaussian calculation that created the .wfn file. The method is not initially written to the .wfn file by Gaussian, but it is necessary to add it to the .wfn file because AIMAll does not automatically determine the method itself, so it can lead to wrong IQA/multipole moments results. To make sure AIMAll results are correct, the method is a required argument.
- Variables:
mol_orbitals – The number of molecular orbitals to be read in from the .wfn file.
primitives – The number of primitives to be read in from the .wfn file.
nuclei – The number of nuclei in the system to be read in from the .wfn file.
energy – The molecular energy read in from the bottom of the .wfn file
virial – The virial read in from the bottom of the .wfn file
Note
Since the wfn file is written out by Gaussian, we do not really have to modify it when writing out except we need to add the method used, so that AIMAll can use the correct method. Otherwise AIMAll assumes Hartree-Fock was used, which might be wrong.
- path: Path | str
- property properties: Dict[str, float]
- class XTB(path: Path | str, input_xyz_path: Path | str, output_xyz_path: Path | str, traj_path: Path | str, log_path: Path | str, input_xtb_path: Path | str, method: str | None = None, solvent: str | None = None, electronic_temperature: int | None = None, max_iterations: int | None = None, fmax: float | None = None)
- path: Path | str
- set_write_defaults_if_needed()
- class XYZ(path: Path | str, atoms: Atoms | None = None)
Bases:
HasAtoms, ReadFile, WriteFile, File
A class which wraps around a .xyz file that is contained in each PointDirectory. This .xyz file should always be there and it is used to write out .gjf files. Each instance of XYZ only has one geometry. If there is a need to read a .xyz file that contains multiple geometries (i.e. a trajectory file), then use the Trajectory class.
- Parameters:
path – The path to an .xyz file
atoms – Optional list of Atoms which can be used to construct a .xyz file. If a list of atoms is passed, then a new xyz file with the given Atoms will be written to the given Path.
- path: Path | str
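For reference, a single-geometry .xyz file consists of an atom count, a comment line, and one "element x y z" line per atom. The sketch below parses that layout from a string; it is a standalone illustration with a hypothetical function name and made-up coordinates, and it does not reproduce how the XYZ class reads files.

```python
from typing import List, Tuple

Atom = Tuple[str, float, float, float]  # (element, x, y, z) in Angstroms


def parse_single_xyz(text: str) -> List[Atom]:
    """Parse the first (and only expected) geometry from .xyz-formatted text."""
    lines = text.strip().splitlines()
    natoms = int(lines[0])
    atom_lines = lines[2:2 + natoms]  # lines[1] is the comment line
    if len(atom_lines) != natoms:
        raise ValueError("fewer atom lines than the atom count declares")
    atoms: List[Atom] = []
    for line in atom_lines:
        element, x, y, z = line.split()[:4]
        atoms.append((element, float(x), float(y), float(z)))
    return atoms


# Made-up water-like geometry used only to exercise the parser.
sample_xyz = "3\nwater\nO 0.0 0.0 0.0\nH 0.0 0.757 0.586\nH 0.0 -0.757 0.586\n"
atoms = parse_single_xyz(sample_xyz)
```

A file holding several such blocks back to back is a trajectory, which is the case the Trajectory class is meant for.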