ichor.core.files.xyz package

Submodules

ichor.core.files.xyz.trajectory module

class Trajectory(path: Path | str, *args, **kwargs)

Bases: ReadFile, WriteFile, ListOfAtoms

Handles .xyz files that have multiple timesteps, with each timestep giving the x y z coordinates of the atoms. A user can also initialize an empty trajectory and append Atoms instances to it without reading in a .xyz file. This allows the user to build custom trajectories containing any sort of geometries.

Parameters: path – The path to a .xyz file that contains timesteps. Set to None by default, as the user can initialize an empty trajectory and build it up themselves.
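To make the "multiple timesteps" layout concrete, the following is a minimal sketch of reading a multi-frame .xyz file in plain Python. It assumes the common .xyz convention (an atom-count line, a comment line, then one line per atom) and is not ichor's actual parser; the function name parse_multi_frame_xyz and the tuple representation of an atom are hypothetical.

```python
from typing import List, Tuple

Atom = Tuple[str, float, float, float]  # (element, x, y, z), hypothetical representation


def parse_multi_frame_xyz(text: str) -> List[List[Atom]]:
    """Parse the text of an .xyz file holding one or more timesteps."""
    lines = text.splitlines()
    frames: List[List[Atom]] = []
    i = 0
    while i < len(lines):
        if not lines[i].strip():
            i += 1  # skip blank lines between or after frames
            continue
        natoms = int(lines[i])  # first line of a frame: number of atoms
        # lines[i + 1] is the comment line; it is ignored in this sketch
        frame: List[Atom] = []
        for line in lines[i + 2 : i + 2 + natoms]:
            element, x, y, z = line.split()[:4]
            frame.append((element, float(x), float(y), float(z)))
        frames.append(frame)
        i += 2 + natoms  # jump to the start of the next frame
    return frames


example = """3
timestep 0
O 0.000 0.000 0.000
H 0.757 0.586 0.000
H -0.757 0.586 0.000
3
timestep 1
O 0.000 0.000 0.010
H 0.760 0.590 0.000
H -0.760 0.590 0.000
"""
frames = parse_multi_frame_xyz(example)
print(len(frames), "timesteps with", len(frames[0]), "atoms each")
```

Each inner list plays the role of one timestep; in ichor that role is filled by an Atoms instance held in the Trajectory.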

add(atoms): Add a list of Atoms (corresponding to one timestep) to the end of the trajectory list

alf(alf_calculator: Callable[[...], ALF], *args, **kwargs) → List[ALF]

Returns the Atomic Local Frame (ALF) for all Atom instances that are held in Atoms. e.g. [[0,1,2],[1,0,2], [2,0,1]]

Parameters:

args – positional arguments to pass to alf calculator
kwargs – keyword arguments to pass to alf calculator

alf_dict(alf_calculator: Callable[[...], ALF], *args, **kwargs) → Dict[str, ALF]: Returns a dictionary with the atomic local frame indices for every atom (0-indexed).

property atom_names: Return the atom names from the first timestep. Assumes that all timesteps have the same number of atoms/atom names.

change_atom_ordering(new_traj_name: Path, new_atom_ordering: List[int])

Changes the atom ordering of the trajectory, given a list of how indices should be permuted and writes out a new trajectory file in the specified location.

Parameters:

new_traj_name – Name of new trajectory file
new_atom_ordering – A list of indices telling how to permute the current trajectory
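As an illustration of what permuting a timestep by a list of indices can look like, here is a small sketch. It assumes 0-indexed positions and the convention that position k of the new ordering takes the atom that was at index new_atom_ordering[k]; the source does not state which convention ichor uses, and the helper reorder_atoms is hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def reorder_atoms(timestep: Sequence[T], new_atom_ordering: List[int]) -> List[T]:
    """Return the atoms of one timestep in the order given by new_atom_ordering."""
    # Guard against lists that drop or repeat atoms (assumed requirement).
    if sorted(new_atom_ordering) != list(range(len(timestep))):
        raise ValueError("new_atom_ordering must be a permutation of the atom indices")
    return [timestep[i] for i in new_atom_ordering]


water = ["O1", "H2", "H3"]
print(reorder_atoms(water, [1, 0, 2]))
```

Applying the same index list to every timestep keeps the atom ordering consistent across the whole trajectory.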

connectivity(connectivity_calculator: Callable[[...], ndarray]) → ndarray

Return the connectivity matrix n_atoms x n_atoms for the given Atoms instance.

Return type: np.ndarray of shape n_atoms x n_atoms

property coordinates: ndarray: Returns the xyz coordinates of all atoms for all timesteps as an np.ndarray of shape n_timesteps x n_atoms x 3.

coordinates_to_xyz(fname: Path = PosixPath('system_to_xyz.xyz'), step: int = 1)

Writes a new .xyz file that contains, for each written timestep, the timestep index i and the coordinates of the atoms at that timestep.

Parameters:

fname – The file name to which to write the timesteps/coordinates
step – Write coordinates for every nth step. Default is 1, which writes coordinates for every step.
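The effect of the step parameter can be sketched as follows. This is an illustration only: it assumes counting starts at timestep 0, that the timestep index goes on the comment line as "i = <index>", and an arbitrary column format. None of these details are confirmed by the source, and frames_to_xyz_text is a hypothetical helper.

```python
from typing import List, Tuple

Atom = Tuple[str, float, float, float]  # (element, x, y, z), hypothetical representation


def frames_to_xyz_text(frames: List[List[Atom]], step: int = 1) -> str:
    """Format every step-th frame as .xyz text, recording the timestep index."""
    out: List[str] = []
    for i, frame in enumerate(frames):
        if i % step != 0:
            continue  # keep only timesteps 0, step, 2*step, ...
        out.append(str(len(frame)))  # atom-count line
        out.append(f"i = {i}")  # comment line carrying the timestep index
        for element, x, y, z in frame:
            out.append(f"{element} {x:16.8f} {y:16.8f} {z:16.8f}")
    return "\n".join(out) + "\n"


demo_frames = [[("H", 0.0, 0.0, float(i))] for i in range(5)]
print(frames_to_xyz_text(demo_frames, step=2))
```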

classmethod features_file_to_trajectory(f: Path, trajectory_path: Path, atom_types: List[str], header=0, index_col=0, sheet_name=0) → Trajectory

Takes in a csv or excel file containing features and converts it to a Trajectory object. It assumes that the features start from the very first column (the column after the index column, if one exists). Feature files that are written out by ichor are in Bohr instead of Angstroms for now.

After converting to Cartesian coordinates, we have to convert Bohr to Angstroms because .xyz files are written out in Angstroms (and programs like Avogadro, VMD, etc. expect distances in Angstroms). Failing to do that will result in xyz files that are in Bohr, so if features are calculated from them again, the features will be wrong.

Parameters:

f – Path to the file (either .csv or .xlsx) containing the features. We only need the features for one atom to reconstruct the geometries, thus we only need 1 csv file or 1 sheet of an excel file. By default, the 0th sheet of the excel file is read in.
atom_types – A list of strings corresponding to the atom elements (C, O, H, etc.). This has to be ordered the same way as atoms corresponding to the features. Note that the central atom (for which features are given in the file) also needs to be present in this list as the very first atom.
header – The row index (0-indexed) of the line in the csv file which contains the names of the columns. Default is set to 0 to use the 0th row.
index_col – Whether a column should be used as the index column. Default is set to 0 to use 0th column. If no index column is present, set to False.
sheet_name – The excel sheet to be used to convert to xyz. Default is 0. This is only needed for excel files, not csv files.

Note

Ensure that the list of atom names is correct, i.e. that it contains the central atom as the very first atom, and the following atoms are in the ordering that is in the file containing the features.
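The Bohr-to-Angstrom step described for this method amounts to a single scaling of every Cartesian coordinate. A minimal sketch, assuming the CODATA 2018 value of the Bohr radius (ichor may use its own constant) and a hypothetical helper name:

```python
from typing import List

# CODATA 2018 Bohr radius expressed in Angstroms (assumed; not taken from ichor).
BOHR_TO_ANGSTROM = 0.529177210903


def bohr_to_angstrom(coords: List[List[float]]) -> List[List[float]]:
    """Scale an n_atoms x 3 list of coordinates from Bohr to Angstroms."""
    return [[c * BOHR_TO_ANGSTROM for c in xyz] for xyz in coords]


coords_bohr = [[0.0, 0.0, 0.0], [1.8, 0.0, 0.0]]
print(bohr_to_angstrom(coords_bohr))
```

Skipping this scaling leaves the .xyz file in Bohr, so any features recomputed from it would be too large by a factor of roughly 1.89.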

property natoms: Returns the number of atoms in the first timestep. Each timestep should have the same number of atoms.

classmethod np_array_to_trajectory(arr: ndarray, trajectory_path: str | Path, atom_types: List[str])

Creates a Trajectory instance from a np.ndarray object

Parameters:

arr – np.ndarray containing features. This should be a 2D array of shape n_timesteps x n_features
trajectory_path – The path associated with the trajectory instance which is made
atom_types – A list of atom types (elements) that correspond to the features in the given array. It is important that they are the same order as in the np.ndarray.

Returns:

Trajectory instance containing xyz geometries converted from features

path: Path | str

rmsd(ref=None)

split_packmol_trajectory(atoms_per_molecule: int, trajectory_name='packmol_traj_split.xyz'): Used to create packmol inputs

split_traj(root_dir: Path = PosixPath('split_trajectory'), split_size: int = 1000)

Splits the trajectory into sub-trajectories and writes them to a folder. E.g. an original trajectory of 10,000 geometries can be split into 10 sub-trajectories containing 1,000 geometries each (given a split size of 1,000).

Parameters:

root_dir – The folder to write sub-trajectories to. Must be a Path object and this directory will be created internally.
split_size – The split size by which to split original trajectory.
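The splitting arithmetic can be sketched with plain list slicing. This is an illustration, not ichor's code: split_into_chunks is a hypothetical helper, and keeping a shorter final chunk when the length is not a multiple of split_size is an assumption.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_chunks(timesteps: Sequence[T], split_size: int = 1000) -> List[Sequence[T]]:
    """Cut a sequence of timesteps into consecutive chunks of split_size."""
    if split_size < 1:
        raise ValueError("split_size must be a positive integer")
    return [timesteps[i : i + split_size] for i in range(0, len(timesteps), split_size)]


chunks = split_into_chunks(list(range(10_000)), split_size=1000)
print(len(chunks), "sub-trajectories of", len(chunks[0]), "geometries")
```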

to_dir(system_name: str, every: int = 1, center: bool = False, parent_dir: Path | None = None) → Path

Writes out every nth timestep to a separate .xyz file to a given directory

Parameters:

system_name – The name of the system. This will be the name of the given directory, with a suffix added. Default suffix is PointsDirectory._suffix
every – An integer value that indicates the nth step at which an xyz file should be written. Default is 1. If a value such as 5 is given, a .xyz file is written only for every 5th timestep.
center – Whether or not to subtract mean of coordinates from atomic coordinates, defaults to False
parent_dir – A path to a parent directory where the inner directory will be created.

Returns:

The Path object to the made directory
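The center option subtracts the mean of the coordinates from every atom. A minimal sketch of that operation, assuming an unweighted mean over atoms (not mass-weighted) and a hypothetical helper name:

```python
from typing import List


def center_geometry(coords: List[List[float]]) -> List[List[float]]:
    """Shift an n_atoms x 3 geometry so that its unweighted centroid is at the origin."""
    n = len(coords)
    # Mean of each Cartesian component over all atoms.
    centroid = [sum(xyz[k] for xyz in coords) / n for k in range(3)]
    return [[xyz[k] - centroid[k] for k in range(3)] for xyz in coords]


geometry = [[1.0, 1.0, 1.0], [3.0, 3.0, 3.0]]
print(center_geometry(geometry))
```

Centering changes only the absolute position of the geometry; interatomic distances are unaffected.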

to_dirs(system_name: str, split_size: int = 1000, every: int = 1, center=False) → Path

Writes out every nth timestep to a separate .xyz file. This method differs from to_dir because it has a structure system_name_root / points_directory / xyz file. I.e. there is an additional root directory which encapsulates all the PointsDirectory-like directories.

Parameters:

system_name – The name of the system. This will be used in the names of the files and directories as well
split_size – How many .xyz files are going to be in each of the inner PointsDirectory-like directories
every – An integer value that indicates the nth step at which an xyz file should be written. Default is 1. If a value such as 5 is given, a .xyz file is written only for every 5th timestep.

Returns:

The Path object to the made parent directory

to_multiple_parent_dirs(system_name: str, split_size: int = 1000, nsplits_in_root: int = 5, every: int = 1, center=False)

Splits a trajectory into multiple parent directories, each of which can contain multiple PointsDirectory-like directories.

Parameters:

system_name – name of system. This name will be used in the names of the files and directories which are made
split_size – The number of .xyz files that each inner PointsDirectory-like directory will contain, default 1000
nsplits_in_root – The number of splits that are going to be in one root directory, default 5. With the defaults, each root directory holds 5 x 1000 geometries.
every – An integer value that indicates the nth step at which an xyz file should be written, defaults to 1
center – whether or not to subtract centroid of geometry before writing out xyz. Useful if geometries are far away from the origin which can result in Gaussian failing to write outputs properly, defaults to False
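The two-level grouping controlled by split_size and nsplits_in_root can be sketched as nested chunking of timestep indices. This is only an illustration: the helper group_for_parent_dirs is hypothetical, and applying every before splitting, as well as allowing shorter final groups, are assumptions not confirmed by the source.

```python
from typing import List


def group_for_parent_dirs(
    n_geometries: int,
    split_size: int = 1000,
    nsplits_in_root: int = 5,
    every: int = 1,
) -> List[List[List[int]]]:
    """Group timestep indices as roots -> splits -> indices of individual .xyz files."""
    # Assumed order of operations: thin out the trajectory first, then split it.
    indices = list(range(0, n_geometries, every))
    splits = [indices[i : i + split_size] for i in range(0, len(indices), split_size)]
    return [splits[i : i + nsplits_in_root] for i in range(0, len(splits), nsplits_in_root)]


roots = group_for_parent_dirs(12_000)
print([len(root) for root in roots])
```

With the default values, a trajectory of 12,000 geometries gives 12 splits, which fill two root directories completely and leave two splits for a third.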

property types: Returns the atom elements for atoms, assumes each timesteps has the same atoms. Removes duplicates.

property types_extended: Returns the atom elements for atoms, assumes each timesteps has the same atoms. Removes duplicates.

ichor.core.files.xyz.xyz module

class XYZ(path: Path | str, atoms: Atoms | None = None)

Bases: HasAtoms, ReadFile, WriteFile, File

A class which wraps around a .xyz file that is contained in each PointDirectory. This .xyz file should always be there and it is used to write out .gjf files. Each instance of XYZ only has one geometry. If there is a need to read a .xyz file that contains multiple geometries (i.e. a trajectory file), then use the Trajectory class.

Parameters:

path – The path to an .xyz file
atoms – Optional list of Atoms which can be used to construct a .xyz file. If a list of atoms is passed, then a new xyz file with the given Atoms will be written to the given Path.

path: Path | str

Module contents

Trajectory and XYZ are also exposed at the package level. Their documentation is identical to the entries under ichor.core.files.xyz.trajectory and ichor.core.files.xyz.xyz above.