ichor.hpc.main package

Submodules

ichor.hpc.main.aimall module

add_method_and_get_wfn_paths(points: PointsDirectory, method: str) List[Path]

Adds the method to each .wfn file and returns the paths of the .wfn files. AIMALL reads the method from the .wfn file, so if the method is not written there, AIMALL assumes the wrong method and gives wrong results.

submit_points_directory_to_aimall(points_directory: PointsDirectory | Path, method='B3LYP', ncores: int = 2, naat: int = 1, aimall_atoms: List[str] | None = None, force_calculate_ints=False, hold: JobID | None = None, script_name: str = PosixPath('.DATA/SCRIPTS/AIMALL.sh'), outputs_dir_path=PosixPath('.DATA/SCRIPTS/OUTPUTS'), errors_dir_path=PosixPath('.DATA/SCRIPTS/ERRORS'), **kwargs) JobID | None

Submits .wfn files which will be partitioned into .int files by AIMALL. Each topological atom in the system has its own .int file.

Parameters:
  • points_directory – A path to a PointsDirectory-structured directory or a PointsDirectory instance

  • method – Functional to be written to the .wfn file because AIMAll needs to know it to function correctly. Note that only HF, B3LYP, M062X, PBE are supported.

  • ncores – Number of cores to run AIMAll with, defaults to 2

  • naat – Number of atoms at a time, defaults to 1

  • aimall_atoms – A list of atom names (e.g. [C1, H2, etc.]) to integrate over, defaults to None. If left as None, AIMAll will do calculations for all atoms.

  • hold – Hold for a specific JobID, defaults to None

Raises:

ValueError – if the provided method is not one of the supported functionals.

Returns:

The job id of the submitted job

Return type:

Optional[ichor.hpc.batch_system.jobs.JobID]
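The ValueError behaviour described above can be illustrated with a minimal sketch. It is not ichor's implementation: the names SUPPORTED_METHODS and check_method are hypothetical, and the case-sensitive comparison is an assumption. Only the list of functionals comes from the parameter description.

```python
# Functionals listed as supported in the parameter description above.
SUPPORTED_METHODS = ("HF", "B3LYP", "M062X", "PBE")


def check_method(method: str) -> str:
    """Return the method unchanged if it is supported, otherwise raise ValueError.

    Hypothetical helper: the comparison is case-sensitive by assumption.
    """
    if method not in SUPPORTED_METHODS:
        raise ValueError(
            f"Method {method!r} is not supported; choose one of {SUPPORTED_METHODS}."
        )
    return method
```

Validating the method before anything is written to the .wfn files means an unsupported functional fails early, rather than after a job has been queued.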

submit_wfns(wfns: List[Path], aimall_atoms: List[str] | None = None, script_name: str = PosixPath('.DATA/SCRIPTS/AIMALL.sh'), ncores=2, naat=1, force_calculate_ints=False, hold: JobID | None = None, outputs_dir_path=PosixPath('.DATA/SCRIPTS/OUTPUTS'), errors_dir_path=PosixPath('.DATA/SCRIPTS/ERRORS'), **kwargs) JobID | None

Write out submission script and submit wavefunctions to AIMALL on a cluster.

Parameters:
  • wfns – a list of wavefunction file paths to write to the submission script

  • aimall_atoms – a list of strings corresponding to atom names. These will be the atoms for which AIMALL computes properties.

  • force_calculate_ints – Whether or not to force the AIMALL calculation for these wfns. If True, AIMALL will be run again.

  • hold – An optional JobID to hold for. The AIMALL job will not run until that other job is finished.
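To make the hold parameter concrete, here is a rough sketch of how a job dependency is commonly expressed on an SGE-style scheduler, where qsub accepts -hold_jid. The function build_qsub_command is hypothetical, and whether ichor builds its command this way is an assumption.

```python
from typing import List, Optional


def build_qsub_command(script: str, hold: Optional[str] = None) -> List[str]:
    """Build an SGE-style submission command, optionally held on another job.

    Hypothetical helper for illustration only.
    """
    cmd = ["qsub"]
    if hold is not None:
        # The new job stays queued until the job with this id has finished.
        cmd += ["-hold_jid", str(hold)]
    cmd.append(script)
    return cmd
```

Chaining jobs this way lets an AIMALL job wait for the Gaussian job that produces its .wfn files.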

ichor.hpc.main.check_for_missing_files module

submit_check_points_directory_for_missing_files(points_dir_path: str | Path)

Submits a job that checks a points directory for missing files. The results are written to the file which contains the standard output from the job.

Parameters:

points_dir_path – Path to a PointsDirectory or PointsDirectoryParent-like directory

ichor.hpc.main.database module

submit_make_csvs_from_database(db_path: Path, db_type: str, ncores: int, alf: List[ALF] | None = None, float_difference_iqa_wfn: float = 4.184, float_integration_error: float = 0.001, rotate_multipole_moments: bool = True, calculate_feature_forces: bool = False)

Submits a job that makes csv files from a database to a compute node. Note that the csv-making code is parallelized per atom, meaning that each atomic csv is made using 1 core. Using the same number of cores as the number of atoms in the system is the optimal choice.

Parameters:
  • db_path – pathlib.Path object that holds path to database

  • db_type – The type of database, sqlite or json

  • ncores – Number of cores to run job with

  • float_difference_iqa_wfn – Absolute tolerance for difference of energy between WFN and sum of IQA energies.

  • submit_on_compute – Whether to submit on compute or not

  • float_integration_error – Absolute tolerance for integration error.

  • alf – A list of ALF for the whole system. If not given, it will be calculated automatically.

  • rotate_multipole_moments – Whether or not to rotate multipole moments, defaults to True

  • calculate_feature_forces – Whether or not to calculate ALF forces, defaults to False
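The two tolerance parameters can be read as filters on each point. The sketch below shows one plausible reading and is not ichor's code: the function passes_filters is hypothetical, the use of <= and of a per-atom check on integration errors are assumptions, and the energies are assumed to be in whatever units the tolerance is expressed in.

```python
from typing import Sequence


def passes_filters(
    wfn_energy: float,
    iqa_energies: Sequence[float],
    integration_errors: Sequence[float],
    float_difference_iqa_wfn: float = 4.184,
    float_integration_error: float = 0.001,
) -> bool:
    """Return True if a point satisfies both absolute tolerances.

    Hypothetical helper: defaults mirror the signature documented above.
    """
    # Absolute difference between the WFN energy and the sum of atomic IQA energies.
    energy_ok = abs(wfn_energy - sum(iqa_energies)) <= float_difference_iqa_wfn
    # Every atom's integration error must be within tolerance (assumed per-atom check).
    integration_ok = all(abs(err) <= float_integration_error for err in integration_errors)
    return energy_ok and integration_ok
```

For example, passes_filters(-100.0, [-60.0, -39.0], [0.0001, 0.0002]) accepts the point because the energy difference of 1.0 is inside the default tolerance.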

submit_make_database(points_dir_path: Path, database_format: str = 'sqlite', ncores=1)

Submits a job that makes a PointsDirectory, or a parent directory of several PointsDirectory instances, into a database. Whether the path is a PointsDirectory or a PointsDirectoryParent is inferred from the suffix of the directory.

Parameters:
  • points_dir_path – Path to a PointsDirectory or to a parent directory containing several PointsDirectory-structured directories

  • database_format – the format, currently sqlite and json are supported

  • ncores – number of cores to use on compute node

ichor.hpc.main.gaussian module

submit_gjfs(gjfs: List[Path], force_calculate_wfn: bool = False, script_name: str | Path | None = PosixPath('.DATA/SCRIPTS/GAUSSIAN.sh'), hold: JobID | None = None, ncores=2, outputs_dir_path=PosixPath('.DATA/SCRIPTS/OUTPUTS'), errors_dir_path=PosixPath('.DATA/SCRIPTS/ERRORS')) JobID

Writes out a submission script containing an array of Gaussian jobs to be run on compute nodes. When called from a login node, it writes the submission script and a datafile (a file containing the names of all the .gjf files that need to be run through Gaussian), then submits the script so that Gaussian runs on compute nodes. When called from a compute node (which happens when ichor is run in auto-run mode), it only writes the datafile and does not submit any new jobs, because jobs cannot be submitted from compute nodes on CSF3.

Parameters:
  • gjfs – A list of Path objects pointing to .gjf files

  • force_calculate_wfn – Run Gaussian calculations on given .gjf files, even if .wfn files already exist. Defaults to False.

  • hold – An optional JobID for this job to hold on. This is used in auto-run to make this job wait for the previous job to finish, defaults to None

  • script_name – Path to write the submission script out to, defaults to ichor.hpc.global_variables.SCRIPT_NAMES[“gaussian”]

Returns:

The JobID of this job given by the submission system.
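The login-node versus compute-node behaviour described for submit_gjfs can be sketched roughly as follows. This is an illustration, not ichor's code: write_datafile and prepare_array_job are hypothetical names, the submit callable stands in for writing and submitting the script, and returning None on a compute node is an assumption.

```python
from pathlib import Path
from typing import Callable, List, Optional


def write_datafile(gjfs: List[Path], datafile: Path) -> Path:
    """Write one input-file path per line, for the array job to index into."""
    datafile.parent.mkdir(parents=True, exist_ok=True)
    datafile.write_text("\n".join(str(p) for p in gjfs) + "\n")
    return datafile


def prepare_array_job(
    gjfs: List[Path],
    datafile: Path,
    on_compute_node: bool,
    submit: Callable[[Path], str],
) -> Optional[str]:
    """Always write the datafile; submit only when not on a compute node."""
    write_datafile(gjfs, datafile)
    if on_compute_node:
        # Jobs cannot be submitted from here, so only the datafile is refreshed.
        return None
    return submit(datafile)
```

Keeping the datafile write separate from submission is what allows an already-queued job to pick up a fresh list of inputs in auto-run mode.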

submit_points_directory_to_gaussian(points_directory: Path | PointsDirectory, overwrite_existing=False, force_calculate_wfn: bool = False, ncores=2, hold: JobID | None = None, script_name: str = PosixPath('.DATA/SCRIPTS/GAUSSIAN.sh'), outputs_dir_path=PosixPath('.DATA/SCRIPTS/OUTPUTS'), errors_dir_path=PosixPath('.DATA/SCRIPTS/ERRORS'), **kwargs) JobID | None

Function that writes out .gjf files from .xyz files that are in each directory and calls submit_gjfs which submits all .gjf files in a directory to Gaussian. Gaussian outputs .wfn files.

Parameters:
  • points_directory – A Path object or PointsDirectory instance for the directory (commonly training set path, sample pool path, etc.).

  • force_calculate_wfn – Run Gaussian calculations on given .gjf files, even if .wfn files already exist. Defaults to False.

  • kwargs – Keyword arguments to pass to the GJF class. These are things like number of cores, basis set, level of theory, spin multiplicity, charge, etc. These will be used in the newly written gjf files (overwriting settings from previously existing gjf files).

write_gjfs(points_directory: PointsDirectory, overwrite_existing: bool, **kwargs) List[Path]

Writes out .gjf files in every PointDirectory which is contained in a PointsDirectory. Each PointDirectory should always have a .xyz file in it, which contains only one molecular geometry. This .xyz file can be used to write out the .gjf file in the PointDirectory (if it does not exist already).

Parameters:

points_directory – A PointsDirectory instance which wraps around a whole directory containing points (such as TRAINING_SET).

Returns:

A list of Path objects which point to .gjf files in each PointDirectory that is contained in the PointsDirectory.
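As a rough illustration of the behaviour described for write_gjfs, the sketch below walks the subdirectories of a points directory and writes a Gaussian input next to each .xyz file. It uses plain pathlib rather than ichor's PointsDirectory and GJF classes. The names xyz_to_gjf_text and write_gjfs_sketch, and the default method and basis set, are assumptions made for the example.

```python
from pathlib import Path
from typing import List


def xyz_to_gjf_text(
    xyz_text: str,
    title: str,
    wfn_name: str,
    method: str = "B3LYP",
    basis_set: str = "6-31+g(d,p)",
    charge: int = 0,
    multiplicity: int = 1,
) -> str:
    """Build Gaussian input text from the text of a single-geometry .xyz file."""
    lines = xyz_text.strip().splitlines()
    natoms = int(lines[0])
    atom_lines = lines[2 : 2 + natoms]  # skip the atom count and the comment line
    parts = [
        f"#p {method}/{basis_set} output=wfn",  # route section requesting a .wfn file
        "",
        title,
        "",
        f"{charge} {multiplicity}",
        *atom_lines,
        "",
        wfn_name,  # output=wfn expects the .wfn file name after the geometry
        "",
    ]
    return "\n".join(parts) + "\n"


def write_gjfs_sketch(points_dir: Path, overwrite_existing: bool) -> List[Path]:
    """Write a .gjf next to the .xyz file in every subdirectory of points_dir."""
    gjf_paths = []
    for point_dir in sorted(p for p in points_dir.iterdir() if p.is_dir()):
        xyz_files = sorted(point_dir.glob("*.xyz"))
        if not xyz_files:
            continue  # nothing to convert in this directory
        xyz = xyz_files[0]
        gjf = xyz.with_suffix(".gjf")
        if overwrite_existing or not gjf.exists():
            text = xyz_to_gjf_text(
                xyz.read_text(), title=xyz.stem, wfn_name=xyz.with_suffix(".wfn").name
            )
            gjf.write_text(text)
        gjf_paths.append(gjf)
    return gjf_paths
```

An existing .gjf is left untouched unless overwrite_existing is True, which matches the "if it does not exist already" wording above.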

ichor.hpc.main.orca module

submit_orca_to_compute(orca_inputs: List[Path], force_calculate_wfn: bool = False, script_name: str | Path | None = PosixPath('.DATA/SCRIPTS/ORCA.sh'), hold: JobID | None = None, ncores=2, outputs_dir_path=PosixPath('.DATA/SCRIPTS/OUTPUTS'), errors_dir_path=PosixPath('.DATA/SCRIPTS/ERRORS')) JobID

Writes out a submission script containing an array of ORCA jobs to be run on compute nodes. When called from a login node, it writes the submission script and a datafile (a file containing the names of all the .orcainput files that need to be run through ORCA), then submits the script so that ORCA runs on compute nodes. When called from a compute node (which happens when ichor is run in auto-run mode), it only writes the datafile and does not submit any new jobs, because jobs cannot be submitted from compute nodes on CSF3.

Parameters:
  • orca_inputs – A list of Path objects pointing to ORCA .inp files

  • force_calculate_wfn – Run ORCA calculation on the given files, even if .wfn files already exist. Defaults to False.

  • hold – An optional JobID for this job to hold on. This is used in auto-run to make this job wait for the previous job to finish, defaults to None

  • script_name – Path to write the submission script out to, defaults to ichor.hpc.global_variables.SCRIPT_NAMES[“orca”]

Returns:

The JobID of this job given by the submission system.

submit_points_directory_to_orca(points_directory: Path | PointsDirectory, overwrite_existing=False, force_calculate_wfn: bool = False, ncores=2, hold: JobID | None = None, script_name: str = PosixPath('.DATA/SCRIPTS/ORCA.sh'), outputs_dir_path=PosixPath('.DATA/SCRIPTS/OUTPUTS'), errors_dir_path=PosixPath('.DATA/SCRIPTS/ERRORS'), **kwargs) JobID | None

Function that writes out .inp files from .xyz files that are in each directory and submits ORCA jobs to compute nodes.

Parameters:
  • points_directory – A Path object or PointsDirectory instance for the directory (commonly training set path, sample pool path, etc.).

  • force_calculate_wfn – Run ORCA calculations on given input files, even if .wfn files already exist. Defaults to False.

  • kwargs – Keyword arguments to pass to the ORCA input file class. These are things like number of cores, basis set, level of theory, spin multiplicity, charge, etc. These will be used in the newly written input files (overwriting settings from previously existing input files).

write_orca_inputs(points_directory: PointsDirectory, overwrite_existing: bool, **kwargs) List[Path]

Writes out .inp files in every PointDirectory which is contained in a PointsDirectory. Each PointDirectory should always have a .xyz file in it, which contains only one molecular geometry. This .xyz file can be used to write out the orca input file in the PointDirectory (if it does not exist already).

Parameters:

points_directory – A PointsDirectory instance which wraps around a whole directory containing points (such as TRAINING_SET).

Returns:

A list of Path objects which point to orca input files in each PointDirectory that is contained in the PointsDirectory.
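For orientation, a minimal ORCA input that reads its geometry from an .xyz file might be generated as below. This is only a sketch: orca_input_text is a hypothetical helper, the default method and basis set are assumptions, and any keywords needed to make ORCA write a .wfn file are deliberately left out.

```python
def orca_input_text(
    xyz_name: str,
    method: str = "B3LYP",
    basis_set: str = "def2-SVP",
    charge: int = 0,
    multiplicity: int = 1,
    ncores: int = 2,
) -> str:
    """Return the text of a minimal ORCA input that reads geometry from xyz_name."""
    return (
        f"! {method} {basis_set}\n"  # simple keyword line: level of theory and basis set
        f"%pal nprocs {ncores} end\n"  # number of parallel processes
        f"* xyzfile {charge} {multiplicity} {xyz_name}\n"  # geometry from an external file
    )
```

Settings such as charge, multiplicity and number of cores correspond to the kind of keyword arguments that the kwargs parameter is described as forwarding.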

ichor.hpc.main.trajectory module

submit_center_trajectory_on_atom(trajectory_path: str | Path, central_atom_name: str, alf_dict: Dict[str, ALF], xyz_output_path='ALF_centered_trajectory.xyz', ncores=1)

Submits a job that centers a trajectory on a given atom to a compute node.

Parameters:
  • trajectory_path – Path of trajectory file to center

  • central_atom_name – Name of the central atom to center on, e.g. C1

  • alf_dict – dictionary of atom_name: ALF, which must contain the central atom as key

  • xyz_output_path – xyz file to write centered geometries to, defaults to “ALF_centered_trajectory.xyz”

  • ncores – number of cores for job, defaults to 1
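The geometric idea behind centering a geometry on an atom's ALF (atomic local frame) can be sketched in pure Python. The convention here is an assumption and may differ from ichor's: the ALF is taken as three atom indices (origin, x-axis atom, xy-plane atom), and center_on_alf and its helpers are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _scale(a: Vec, s: float) -> Vec:
    return (a[0] * s, a[1] * s, a[2] * s)


def _normalize(a: Vec) -> Vec:
    # Fails with ZeroDivisionError for a zero vector, e.g. collinear ALF atoms.
    return _scale(a, 1.0 / math.sqrt(_dot(a, a)))


def _cross(a: Vec, b: Vec) -> Vec:
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def center_on_alf(coords: Sequence[Vec], alf: Tuple[int, int, int]) -> List[Vec]:
    """Express all coordinates in the local frame defined by three atom indices."""
    origin_idx, x_idx, xy_idx = alf
    origin = coords[origin_idx]
    # x-axis points from the central atom to the x-axis atom.
    e1 = _normalize(_sub(coords[x_idx], origin))
    # y-axis: component of the xy-plane atom's vector orthogonal to e1 (Gram-Schmidt).
    v = _sub(coords[xy_idx], origin)
    e2 = _normalize(_sub(v, _scale(e1, _dot(v, e1))))
    # z-axis completes the right-handed frame.
    e3 = _cross(e1, e2)
    shifted = (_sub(c, origin) for c in coords)
    return [(_dot(r, e1), _dot(r, e2), _dot(r, e3)) for r in shifted]
```

After the transformation the central atom sits at the origin, the x-axis atom lies on the positive x-axis, and the xy-plane atom has zero z-coordinate. Applying this to every frame of a trajectory removes overall translation and rotation relative to the central atom.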

Module contents

The package namespace re-exports the following functions, each documented in its submodule section above:

  • submit_check_points_directory_for_missing_files

  • submit_gjfs

  • submit_make_csvs_from_database

  • submit_points_directory_to_aimall

  • submit_points_directory_to_gaussian

  • submit_points_directory_to_orca