ichor.core.files.xyz package

Submodules

ichor.core.files.xyz.trajectory module

class Trajectory(path: Path | str, *args, **kwargs)

Bases: ReadFile, WriteFile, ListOfAtoms

Handles .xyz files that have multiple timesteps, with each timestep giving the x y z coordinates of the atoms. A user can also initialize an empty trajectory and append Atoms instances to it without reading in a .xyz file. This allows the user to build custom trajectories containing any sort of geometries.

Parameters:

path – The path to a .xyz file that contains timesteps. Set to None by default, as the user can initialize an empty trajectory and build it up themselves.
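As an illustration of the multi-timestep layout described above, the following is a minimal sketch of splitting such a file into frames. It assumes the conventional .xyz layout (an atom-count line, a comment line, then one line per atom) and does not reproduce ichor's own parser; the name parse_xyz_frames and the example geometry are hypothetical.

```python
from typing import List, Tuple

# One frame is a list of (element, x, y, z) tuples.
Frame = List[Tuple[str, float, float, float]]


def parse_xyz_frames(text: str) -> List[Frame]:
    """Split the text of a multi-timestep .xyz file into frames (hypothetical helper)."""
    lines = text.splitlines()
    frames: List[Frame] = []
    i = 0
    while i < len(lines):
        if not lines[i].strip():
            i += 1  # tolerate blank lines between frames
            continue
        natoms = int(lines[i].split()[0])  # first line of a frame: atom count
        # lines[i + 1] is the free-form comment line; it is skipped here
        frame: Frame = []
        for line in lines[i + 2 : i + 2 + natoms]:
            element, x, y, z = line.split()[:4]
            frame.append((element, float(x), float(y), float(z)))
        frames.append(frame)
        i += 2 + natoms
    return frames


example = """3
timestep 0
O 0.000 0.000 0.000
H 0.757 0.586 0.000
H -0.757 0.586 0.000
3
timestep 1
O 0.000 0.000 0.010
H 0.760 0.580 0.000
H -0.760 0.580 0.000
"""
frames = parse_xyz_frames(example)
```

Each entry of frames then corresponds to one timestep, which is the role an Atoms instance plays inside a Trajectory.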

add(atoms)

Add a list of Atoms (corresponding to one timestep) to the end of the trajectory list

alf(alf_calculator: Callable[[...], ALF], *args, **kwargs) List[ALF]

Returns the Atomic Local Frame (ALF) for all Atom instances that are held in Atoms. e.g. [[0,1,2],[1,0,2], [2,0,1]]

Parameters:
  • args – positional arguments to pass to alf calculator

  • kwargs – keyword arguments to pass to alf calculator

alf_dict(alf_calculator: Callable[[...], ALF], *args, **kwargs) Dict[str, ALF]

Returns a dictionary with the atomic local frame indices for every atom (0-indexed).

property atom_names

Return the atom names from the first timestep. Assumes that all timesteps have the same number of atoms/atom names.

change_atom_ordering(new_traj_name: Path, new_atom_ordering: List[int])

Changes the atom ordering of the trajectory, given a list of how indices should be permuted and writes out a new trajectory file in the specified location.

Parameters:
  • new_traj_name – Name of new trajectory file

  • new_atom_ordering – A list of indices telling how to permute the current trajectory
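The permutation can be pictured with a small sketch. It assumes that entry i of the ordering list gives the 0-based index of the atom that should end up at position i; the source does not state which convention change_atom_ordering uses, so check this against ichor before relying on it. The helper reorder and the atom labels are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def reorder(items: Sequence[T], new_ordering: Sequence[int]) -> List[T]:
    """Return items permuted so that position i holds items[new_ordering[i]]."""
    # Assumption: new_ordering is a 0-indexed permutation of all atom indices.
    if sorted(new_ordering) != list(range(len(items))):
        raise ValueError("new_ordering must be a permutation of 0..n-1")
    return [items[i] for i in new_ordering]


timestep = ["C1", "H2", "H3", "O4"]  # one timestep, atoms in their current order
reordered = reorder(timestep, [3, 0, 1, 2])  # move the last atom to the front
```

The same permutation would be applied to every timestep, since all timesteps are assumed to share one atom ordering.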

connectivity(connectivity_calculator: Callable[[...], ndarray]) ndarray

Return the connectivity matrix n_atoms x n_atoms for the given Atoms instance.

Return type:

np.ndarray of shape n_atoms x n_atoms

property coordinates: ndarray

Returns the xyz coordinates of all atoms for all timesteps as an np.ndarray of shape n_timesteps x n_atoms x 3.

coordinates_to_xyz(fname: Path = PosixPath('system_to_xyz.xyz'), step: int = 1)

Writes a new .xyz file that contains the index i of each timestep, as well as the coordinates of the atoms for that timestep.

Parameters:
  • fname – The file name to which to write the timesteps/coordinates

  • step – Write coordinates for every nth step. Default is 1, so coordinates are written for every step.
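The effect of step can be sketched as follows. This is an illustration only: it assumes that counting starts at timestep 0 and that the timestep index goes on the comment line in the form "i = <index>". Both are assumptions, and frames_to_xyz_text is a hypothetical helper, not part of ichor.

```python
from typing import List, Tuple

Atom = Tuple[str, float, float, float]  # (element, x, y, z)


def frames_to_xyz_text(frames: List[List[Atom]], step: int = 1) -> str:
    """Build .xyz text containing every step-th frame, starting from frame 0."""
    blocks = []
    for i in range(0, len(frames), step):
        frame = frames[i]
        # Atom count, then a comment line carrying the timestep index (assumed format)
        lines = [str(len(frame)), f"i = {i}"]
        lines += [f"{el} {x:.8f} {y:.8f} {z:.8f}" for el, x, y, z in frame]
        blocks.append("\n".join(lines))
    return "\n".join(blocks) + "\n"


# Five single-atom frames; with step=2 only frames 0, 2 and 4 are kept.
frames = [[("H", 0.0, 0.0, float(t))] for t in range(5)]
text = frames_to_xyz_text(frames, step=2)
```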

classmethod features_file_to_trajectory(f: Path, trajectory_path: Path, atom_types: List[str], header=0, index_col=0, sheet_name=0) Trajectory

Takes in a csv or excel file containing features and converts it to a Trajectory object. It assumes that the features start from the very first column (the column after the index column, if one exists). Feature files that are written out by ichor are in Bohr instead of Angstroms for now.

After converting to cartesian coordinates, we have to convert Bohr to Angstroms because .xyz files are written out in Angstroms (and programs like Avogadro, VMD, etc. expect distances in angstroms). Failing to do that will result in xyz files that are in Bohr, so if features are calculated from them again, the features will be wrong.

Parameters:
  • f – Path to the file (either .csv or .xlsx) containing the features. We only need the features for one atom to reconstruct the geometries, thus we only need 1 csv file or 1 sheet of an excel file. By default, the 0th sheet of the excel file is read in.

  • atom_types – A list of strings corresponding to the atom elements (C, O, H, etc.). This has to be ordered the same way as atoms corresponding to the features. Note that the central atom (for which features are given in the file) also needs to be present in this list as the very first atom.

  • header – The row index (0-indexed) of the line in the csv file which contains the names of the columns. Default is set to 0 to use the 0th row.

  • index_col – Whether a column should be used as the index column. Default is set to 0 to use 0th column. If no index column is present, set to False.

  • sheet_name – The excel sheet to be used to convert to xyz. Default is 0. This is only needed for excel files, not csv files.

Note

Ensure that the list of atom names is correct, i.e. that it contains the central atom as the very first atom, and the following atoms are in the ordering that is in the file containing the features.
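The Bohr to Angstrom conversion mentioned above amounts to a single multiplication. The sketch below uses the CODATA 2018 Bohr radius; the constant and function names are hypothetical and are not taken from ichor.

```python
from typing import List

# Bohr radius in angstroms (CODATA 2018); name chosen for this sketch only
BOHR_TO_ANGSTROM = 0.529177210903


def bohr_to_angstrom(coords: List[List[float]]) -> List[List[float]]:
    """Convert a list of [x, y, z] coordinates from Bohr to Angstroms."""
    return [[c * BOHR_TO_ANGSTROM for c in xyz] for xyz in coords]


coords_bohr = [[0.0, 0.0, 0.0], [1.8, 0.0, 0.0]]
coords_angstrom = bohr_to_angstrom(coords_bohr)
```

Skipping this step leaves every distance too large by a factor of roughly 1.89 when the file is later read as Angstroms, which is why features recomputed from such a file would be wrong.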

property natoms

Returns the number of atoms in the first timestep. Each timestep should have the same number of atoms.

classmethod np_array_to_trajectory(arr: ndarray, trajectory_path: str | Path, atom_types: List[str])

Creates a Trajectory instance from a np.ndarray object

Parameters:
  • arr – np.ndarray containing features. This should be a 2D array of shape n_timesteps x n_features

  • trajectory_path – The path associated with the trajectory instance which is made

  • atom_types – A list of atom types (elements) that correspond to the features in the given array. It is important that they are in the same order as in the np.ndarray.

Returns:

Trajectory instance containing xyz geometries converted from features

path: Path | str

rmsd(ref=None)

split_packmol_trajectory(atoms_per_molecule: int, trajectory_name='packmol_traj_split.xyz')

Used to create packmol inputs

split_traj(root_dir: Path = PosixPath('split_trajectory'), split_size: int = 1000)

Splits the trajectory into sub-trajectories and writes them to a folder. E.g. an original trajectory of 10,000 geometries can be split into 10 sub-trajectories containing 1,000 geometries each (given a split size of 1,000).

Parameters:
  • root_dir – The folder to write sub-trajectories to. Must be a Path object and this directory will be created internally.

  • split_size – The split size by which to split original trajectory.
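The splitting arithmetic from the example above can be sketched with plain list slicing. How ichor treats a final, incomplete chunk is not stated here; in this sketch the last chunk is simply shorter. The helper split_into_chunks is hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_chunks(items: Sequence[T], split_size: int) -> List[Sequence[T]]:
    """Split items into consecutive chunks of at most split_size elements."""
    if split_size < 1:
        raise ValueError("split_size must be a positive integer")
    return [items[i : i + split_size] for i in range(0, len(items), split_size)]


geometries = list(range(10_000))  # stand-ins for 10,000 geometries
chunks = split_into_chunks(geometries, 1000)  # 10 chunks of 1,000 each
uneven = split_into_chunks(list(range(2_500)), 1000)  # last chunk holds 500
```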

to_dir(system_name: str, every: int = 1, center: bool = False, parent_dir: Path | None = None) Path

Writes out every nth timestep to a separate .xyz file to a given directory

Parameters:
  • system_name – The name of the system. This will be the name of the given directory, with a suffix added. Default suffix is PointsDirectory._suffix

  • every – An integer value that indicates the nth step at which an xyz file should be written. Default is 1. If a value of e.g. 5 is given, then a .xyz file is only written out for every 5th timestep.

  • center – Whether or not to subtract mean of coordinates from atomic coordinates, defaults to False

  • parent_dir – A path to a parent directory where the inner directory will be created.

Returns:

The Path object to the made directory
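What center=True does to a single geometry can be sketched as below. It assumes an unweighted mean over atoms (no mass weighting), as the parameter description suggests; center_coordinates is a hypothetical helper.

```python
from typing import List


def center_coordinates(coords: List[List[float]]) -> List[List[float]]:
    """Subtract the unweighted mean position from every atom's coordinates."""
    n = len(coords)
    # Mean of each Cartesian component over all atoms
    centroid = [sum(xyz[k] for xyz in coords) / n for k in range(3)]
    return [[xyz[k] - centroid[k] for k in range(3)] for xyz in coords]


coords = [[1.0, 2.0, 3.0], [3.0, 2.0, 5.0]]
centered = center_coordinates(coords)  # centroid is [2.0, 2.0, 4.0]
```

Interatomic distances are unchanged by this shift; only the position of the geometry relative to the origin moves.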

to_dirs(system_name: str, split_size: int = 1000, every: int = 1, center=False) Path

Writes out every nth timestep to a separate .xyz file. This method differs from to_dir because it has a structure system_name_root / points_directory / xyz file. I.e. there is an additional root directory which encapsulates all the PointsDirectory-like directories.

Parameters:
  • system_name – The name of the system. This will be used in the names of the files and directories as well

  • split_size – How many .xyz files are going to be in each of the inner PointsDirectory-like directories

  • every – An integer value that indicates the nth step at which an xyz file should be written. Default is 1. If a value of e.g. 5 is given, then a .xyz file is only written out for every 5th timestep.

Returns:

The Path object to the made parent directory

to_multiple_parent_dirs(system_name: str, split_size: int = 1000, nsplits_in_root: int = 5, every: int = 1, center=False)

Splits a trajectory into multiple parent directories, each of which can contain multiple PointsDirectory-like directories.

Parameters:
  • system_name – name of system. This name will be used in the names of the files and directories which are made

  • split_size – The number of .xyz files that inner PointsDirectory-like directory will contain, default 1000

  • nsplits_in_root – The number of splits that are going to be in one root directory, default 5. With the default split_size, this means that there are 5 x 1000 geometries in that root directory.

  • every – An integer value that indicates the nth step at which an xyz file should be written, defaults to 1

  • center – whether or not to subtract centroid of geometry before writing out xyz. Useful if geometries are far away from the origin which can result in Gaussian failing to write outputs properly, defaults to False
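To see how split_size, nsplits_in_root and every interact, the resulting directory counts can be estimated with a short calculation. This sketch assumes that writing starts at timestep 0 and that incomplete groups still get their own directory; neither is confirmed by the text above, and plan_layout is a hypothetical helper.

```python
import math
from typing import Tuple


def plan_layout(
    n_timesteps: int,
    split_size: int = 1000,
    nsplits_in_root: int = 5,
    every: int = 1,
) -> Tuple[int, int, int]:
    """Return (xyz files written, inner directories, root directories)."""
    n_written = len(range(0, n_timesteps, every))  # every nth timestep from 0
    n_inner = math.ceil(n_written / split_size)  # PointsDirectory-like dirs
    n_roots = math.ceil(n_inner / nsplits_in_root)  # parent (root) dirs
    return n_written, n_inner, n_roots


layout_all = plan_layout(12_000)  # defaults: 1000 per inner dir, 5 per root
layout_half = plan_layout(12_000, every=2)  # only every 2nd timestep written
```

With the defaults, 12,000 timesteps would fill two root directories completely (5 x 1000 geometries each) and leave 2,000 geometries for a third.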

property types

Returns the atom elements for atoms. Assumes each timestep has the same atoms. Removes duplicates.

property types_extended

Returns the atom elements for atoms. Assumes each timestep has the same atoms. Removes duplicates.

ichor.core.files.xyz.xyz module

class XYZ(path: Path | str, atoms: Atoms | None = None)

Bases: HasAtoms, ReadFile, WriteFile, File

A class which wraps around a .xyz file that is contained in each PointDirectory. This .xyz file should always be there and it is used to write out .gjf files. Each instance of XYZ only has one geometry. If there is a need to read a .xyz file that contains multiple geometries (i.e. a trajectory file), then use the Trajectory class.

Parameters:
  • path – The path to an .xyz file

  • atoms – Optional list of Atoms which can be used to construct a .xyz file. If a list of atoms is passed, then a new xyz file with the given Atoms will be written to the given Path.

path: Path | str

Module contents

class Trajectory(path: Path | str, *args, **kwargs)

Bases: ReadFile, WriteFile, ListOfAtoms

Handles .xyz files that have multiple timesteps, with each timestep giving the x y z coordinates of the atoms. A user can also initialize an empty trajectory and append Atoms instances to it without reading in a .xyz file. This allows the user to build custom trajectories containing any sort of geometries.

Parameters:

path – The path to a .xyz file that contains timesteps. Set to None by default, as the user can initialize an empty trajectory and build it up themselves.

add(atoms)

Add a list of Atoms (corresponding to one timestep) to the end of the trajectory list

alf(alf_calculator: Callable[[...], ALF], *args, **kwargs) List[ALF]

Returns the Atomic Local Frame (ALF) for all Atom instances that are held in Atoms. e.g. [[0,1,2],[1,0,2], [2,0,1]]

Parameters:
  • args – positional arguments to pass to alf calculator

  • kwargs – keyword arguments to pass to alf calculator

alf_dict(alf_calculator: Callable[[...], ALF], *args, **kwargs) Dict[str, ALF]

Returns a dictionary with the atomic local frame indices for every atom (0-indexed).

property atom_names

Return the atom names from the first timestep. Assumes that all timesteps have the same number of atoms/atom names.

change_atom_ordering(new_traj_name: Path, new_atom_ordering: List[int])

Changes the atom ordering of the trajectory, given a list of how indices should be permuted and writes out a new trajectory file in the specified location.

Parameters:
  • new_traj_name – Name of new trajectory file

  • new_atom_ordering – A list of indices telling how to permute the current trajectory

connectivity(connectivity_calculator: Callable[[...], ndarray]) ndarray

Return the connectivity matrix n_atoms x n_atoms for the given Atoms instance.

Return type:

np.ndarray of shape n_atoms x n_atoms

property coordinates: ndarray

Returns the xyz coordinates of all atoms for all timesteps as an np.ndarray of shape n_timesteps x n_atoms x 3.

coordinates_to_xyz(fname: Path = PosixPath('system_to_xyz.xyz'), step: int = 1)

Writes a new .xyz file that contains the index i of each timestep, as well as the coordinates of the atoms for that timestep.

Parameters:
  • fname – The file name to which to write the timesteps/coordinates

  • step – Write coordinates for every nth step. Default is 1, so coordinates are written for every step.

classmethod features_file_to_trajectory(f: Path, trajectory_path: Path, atom_types: List[str], header=0, index_col=0, sheet_name=0) Trajectory

Takes in a csv or excel file containing features and converts it to a Trajectory object. It assumes that the features start from the very first column (the column after the index column, if one exists). Feature files that are written out by ichor are in Bohr instead of Angstroms for now.

After converting to cartesian coordinates, we have to convert Bohr to Angstroms because .xyz files are written out in Angstroms (and programs like Avogadro, VMD, etc. expect distances in angstroms). Failing to do that will result in xyz files that are in Bohr, so if features are calculated from them again, the features will be wrong.

Parameters:
  • f – Path to the file (either .csv or .xlsx) containing the features. We only need the features for one atom to reconstruct the geometries, thus we only need 1 csv file or 1 sheet of an excel file. By default, the 0th sheet of the excel file is read in.

  • atom_types – A list of strings corresponding to the atom elements (C, O, H, etc.). This has to be ordered the same way as atoms corresponding to the features. Note that the central atom (for which features are given in the file) also needs to be present in this list as the very first atom.

  • header – The row index (0-indexed) of the line in the csv file which contains the names of the columns. Default is set to 0 to use the 0th row.

  • index_col – Whether a column should be used as the index column. Default is set to 0 to use 0th column. If no index column is present, set to False.

  • sheet_name – The excel sheet to be used to convert to xyz. Default is 0. This is only needed for excel files, not csv files.

Note

Ensure that the list of atom names is correct, i.e. that it contains the central atom as the very first atom, and the following atoms are in the ordering that is in the file containing the features.

property natoms

Returns the number of atoms in the first timestep. Each timestep should have the same number of atoms.

classmethod np_array_to_trajectory(arr: ndarray, trajectory_path: str | Path, atom_types: List[str])

Creates a Trajectory instance from a np.ndarray object

Parameters:
  • arr – np.ndarray containing features. This should be a 2D array of shape n_timesteps x n_features

  • trajectory_path – The path associated with the trajectory instance which is made

  • atom_types – A list of atom types (elements) that correspond to the features in the given array. It is important that they are in the same order as in the np.ndarray.

Returns:

Trajectory instance containing xyz geometries converted from features

path: Path | str

rmsd(ref=None)

split_packmol_trajectory(atoms_per_molecule: int, trajectory_name='packmol_traj_split.xyz')

Used to create packmol inputs

split_traj(root_dir: Path = PosixPath('split_trajectory'), split_size: int = 1000)

Splits the trajectory into sub-trajectories and writes them to a folder. E.g. an original trajectory of 10,000 geometries can be split into 10 sub-trajectories containing 1,000 geometries each (given a split size of 1,000).

Parameters:
  • root_dir – The folder to write sub-trajectories to. Must be a Path object and this directory will be created internally.

  • split_size – The split size by which to split original trajectory.

to_dir(system_name: str, every: int = 1, center: bool = False, parent_dir: Path | None = None) Path

Writes out every nth timestep to a separate .xyz file to a given directory

Parameters:
  • system_name – The name of the system. This will be the name of the given directory, with a suffix added. Default suffix is PointsDirectory._suffix

  • every – An integer value that indicates the nth step at which an xyz file should be written. Default is 1. If a value of e.g. 5 is given, then a .xyz file is only written out for every 5th timestep.

  • center – Whether or not to subtract mean of coordinates from atomic coordinates, defaults to False

  • parent_dir – A path to a parent directory where the inner directory will be created.

Returns:

The Path object to the made directory

to_dirs(system_name: str, split_size: int = 1000, every: int = 1, center=False) Path

Writes out every nth timestep to a separate .xyz file. This method differs from to_dir because it has a structure system_name_root / points_directory / xyz file. I.e. there is an additional root directory which encapsulates all the PointsDirectory-like directories.

Parameters:
  • system_name – The name of the system. This will be used in the names of the files and directories as well

  • split_size – How many .xyz files are going to be in each of the inner PointsDirectory-like directories

  • every – An integer value that indicates the nth step at which an xyz file should be written. Default is 1. If a value of e.g. 5 is given, then a .xyz file is only written out for every 5th timestep.

Returns:

The Path object to the made parent directory

to_multiple_parent_dirs(system_name: str, split_size: int = 1000, nsplits_in_root: int = 5, every: int = 1, center=False)

Splits a trajectory into multiple parent directories, each of which can contain multiple PointsDirectory-like directories.

Parameters:
  • system_name – name of system. This name will be used in the names of the files and directories which are made

  • split_size – The number of .xyz files that inner PointsDirectory-like directory will contain, default 1000

  • nsplits_in_root – The number of splits that are going to be in one root directory, default 5. With the default split_size, this means that there are 5 x 1000 geometries in that root directory.

  • every – An integer value that indicates the nth step at which an xyz file should be written, defaults to 1

  • center – whether or not to subtract centroid of geometry before writing out xyz. Useful if geometries are far away from the origin which can result in Gaussian failing to write outputs properly, defaults to False

property types

Returns the atom elements for atoms. Assumes each timestep has the same atoms. Removes duplicates.

property types_extended

Returns the atom elements for atoms. Assumes each timestep has the same atoms. Removes duplicates.

class XYZ(path: Path | str, atoms: Atoms | None = None)

Bases: HasAtoms, ReadFile, WriteFile, File

A class which wraps around a .xyz file that is contained in each PointDirectory. This .xyz file should always be there and it is used to write out .gjf files. Each instance of XYZ only has one geometry. If there is a need to read a .xyz file that contains multiple geometries (i.e. a trajectory file), then use the Trajectory class.

Parameters:
  • path – The path to an .xyz file

  • atoms – Optional list of Atoms which can be used to construct a .xyz file. If a list of atoms is passed, then a new xyz file with the given Atoms will be written to the given Path.

path: Path | str