ichor.hpc.submission_script package
Submodules
ichor.hpc.submission_script.command_group module
- class CommandGroup(iterable=(), /)
Bases: SubmissionCommand, list
Wraps jobs that are of the same type, e.g. Gaussian jobs, AIMAll jobs, FEREBUS jobs or ICHOR jobs. Since every job in the group uses the same settings, the settings are read from the job at index 0.
- property arguments: List[str]
Returns the arguments (if any) that need to be passed to the program that the job is going to execute.
- property command: str
Returns a string containing the command that is going to be run (e.g. g09 for Gaussian on CSF3).
- property data: Tuple[str]
Returns the data that a job needs. This is usually a set of files which are the input files and the output files to be written by the job.
- property modules: list
Returns a list of any modules that need to be loaded in order for a program to run.
- property ntypes: int
Returns the number of types of jobs that are in the command group.
- property options: List[str]
Returns the options to write at the top of the submission script
- repr(variables: List[str] | None = None) → str
Returns a string representing the line in the job script that runs the program (Gaussian, AIMAll, etc.) with its given inputs. These inputs typically come from a job array, so they look like ${arr1[$SGE_TASK_ID-1]}.
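The "use the 0th index" idea behind CommandGroup can be illustrated with a minimal sketch. This is not ICHOR's implementation: JobSketch and CommandGroupSketch are hypothetical stand-ins, and only the command and modules properties are mimicked.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class JobSketch:
    """Hypothetical stand-in for a single job command (not an ICHOR class)."""
    command: str
    modules: List[str] = field(default_factory=list)


class CommandGroupSketch(list):
    """A list of same-type jobs whose shared settings come from element 0."""

    @property
    def command(self) -> str:
        # Every job in the group runs the same program, so ask the first one.
        return self[0].command

    @property
    def modules(self) -> List[str]:
        # The modules to load are also shared by all jobs in the group.
        return self[0].modules


group = CommandGroupSketch([
    JobSketch("g09", ["apps/binapps/gaussian/g09d01_em64t"]),
    JobSketch("g09", ["apps/binapps/gaussian/g09d01_em64t"]),
])
print(group.command)  # g09
```

Subclassing list keeps the group iterable and indexable, while the properties present the whole group as if it were a single command.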
ichor.hpc.submission_script.script_names module
- class ScriptNames(script_names, parent: FileStructure, modify: str = '', **kwargs)
Bases: dict
A helper class that returns the full path of a particular script used to submit jobs for programs like Gaussian and AIMAll. All script files are stored in the directory ichor.hpc.global_variables.FILE_STRUCTURE["scripts"]. These scripts are submitted to compute nodes on CSF3/FFLUXLAB, which starts a job.
- property file_structure
ichor.hpc.submission_script.submission_script module
- class SubmissionScript(submission_script_name: str | Path, ncores: int, cwd: Path | None = None, include_nodes: List[str] | None = None, exclude_nodes: List[str] | None = None, max_running_tasks: int = -1, outputs_dir_path: Path | None = None, errors_dir_path: Path | None = None, datafile_path: Path | None = None)
Bases: object
A class that can be used to construct submission scripts for various programs such as Gaussian and AIMALL.
- Parameters:
path – A path to a submission script (such as GAUSSIAN.sh and AIMALL.sh). These .sh files are submitted as jobs to CSF3/FFLUXLAB. The contents of a job script depend on the number of cores selected, the number of tasks (if running an array job), etc., so the scripts need to be written out dynamically depending on what is going to be run.
submission_script_name – The name of the submission script
ncores – Number of cores to run the job with
cwd – The current working directory. If not set, defaults to Path.cwd()
include_nodes – A list of node names to run the job on, defaults to None
exclude_nodes – A list of node names to exclude running the job on, defaults to None
max_running_tasks – Maximum number of tasks (of array job) that can run at once, defaults to -1
outputs_dir_path – Path to the outputs directory. If not set, it will use the default global_variables one
errors_dir_path – Path to the errors directory. If not set, it will use the default global_variables one
datafile_path – Path to datafile containing information needed for job to run, defaults to None
Example Gaussian submission script for SGE (array job):
    #!/bin/bash -l
    #$ -o /net/scratch2/mbdxwym4/ammonia_with_derivatives/.DATA/SCRIPTS/OUTPUTS
    #$ -pe smp.pe 2
    #$ -wd /net/scratch2/mbdxwym4/ammonia_with_derivatives
    #$ -e /net/scratch2/mbdxwym4/ammonia_with_derivatives/.DATA/SCRIPTS/ERRORS
    #$ -t 1-10000
    echo "Loading Modules | $(date)"
    module load apps/binapps/gaussian/g09d01_em64t
    export OMP_NUM_THREADS=2
    echo "Starting Job | $(date)"
    ICHOR_DATFILE=/net/scratch2/mbdxwym4/ammonia_with_derivatives/.DATA/JOBS/DATAFILES/dd644974-c430-449a-a8e1-d3540980bcb4
    arr1=()
    arr2=()
    while IFS=, read -r var1 var2
    do
        arr1+=($var1)
        arr2+=($var2)
    done < $ICHOR_DATFILE
    if [ -n ${arr1[$SGE_TASK_ID-1]} ] && [ -n ${arr2[$SGE_TASK_ID-1]} ]
    then
        export GAUSS_SCRDIR=$(dirname ${arr1[$SGE_TASK_ID-1]})
        $g09root/g09/g09 ${arr1[$SGE_TASK_ID-1]} ${arr2[$SGE_TASK_ID-1]}
    fi
    echo "Finished Job | $(date)"
Example Gaussian submission script for SLURM (array job):
    #!/bin/bash -l
    #SBATCH -p multicore
    #SBATCH -n 2
    #SBATCH -e /gpfs01/scratch/mbdxwym4/glycine_paper_geometries_gaussian_with_forces/.DATA/SCRIPTS/ERRORS/%x.e%A.%a
    #SBATCH -o /gpfs01/scratch/mbdxwym4/glycine_paper_geometries_gaussian_with_forces/.DATA/SCRIPTS/OUTPUTS/%x.o%A.%a
    #SBATCH -D /gpfs01/scratch/mbdxwym4/glycine_paper_geometries_gaussian_with_forces
    #SBATCH -a 1-3965
    echo "Loading Modules | $(date)"
    module load gaussian/g16c01_em64t_detectcpu
    export OMP_NUM_THREADS=2
    echo "Starting Job | $(date)"
    ICHOR_DATFILE=/gpfs01/scratch/mbdxwym4/glycine_paper_geometries_gaussian_with_forces/.DATA/JOBS/DATAFILES/78a3e81e-617e-475d-809b-01501f67bfc9
    arr1=()
    arr2=()
    while IFS=, read -r var1 var2
    do
        arr1+=($var1)
        arr2+=($var2)
    done < $ICHOR_DATFILE
    if [ -n ${arr1[$SLURM_ARRAY_TASK_ID-1]} ] && [ -n ${arr2[$SLURM_ARRAY_TASK_ID-1]} ]
    then
        export GAUSS_SCRDIR=$(dirname ${arr1[$SLURM_ARRAY_TASK_ID-1]})
        $g16root/g16/g16 ${arr1[$SLURM_ARRAY_TASK_ID-1]} ${arr2[$SLURM_ARRAY_TASK_ID-1]}
    fi
    echo "Finished Job | $(date)"
- DATAFILE = 'ICHOR_DATFILE'
- SEPARATOR = ','
- add_command(command)
Add a command to the list of commands.
- add_option(command)
Add an option to the list of options.
- arr(n)
Returns the keyword which is used to select array jobs. Since array jobs start at 1 instead of 0, 1 needs to be added to the array size.
- array_index(n)
Returns the keywords used to run an array job through a program such as Gaussian.
Note
For example, the line that runs Gaussian in GAUSSIAN.sh (the submission script for Gaussian) is: g09 ${arr1[$SGE_TASK_ID-1]} ${arr2[$SGE_TASK_ID-1]}. Here g09 is the program (this depends on which system you are on). The rest is what is returned by this method.
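A minimal sketch of how such array-index keywords could be assembled is given here. The function names and the task_id_var parameter are hypothetical and are not part of ICHOR; the output format is copied from the example submission scripts.

```python
def array_entry(n: int, task_id_var: str = "SGE_TASK_ID") -> str:
    """Bash expression selecting the current task's entry from array number n."""
    # Task IDs start at 1 but bash arrays are 0-indexed, hence the "-1".
    return f"${{arr{n}[${task_id_var}-1]}}"


def array_index_sketch(n_columns: int, task_id_var: str = "SGE_TASK_ID") -> str:
    """Join the entries for arrays 1..n_columns into one argument string."""
    return " ".join(array_entry(i, task_id_var) for i in range(1, n_columns + 1))


print("g09 " + array_index_sketch(2))
# g09 ${arr1[$SGE_TASK_ID-1]} ${arr2[$SGE_TASK_ID-1]}
```

Passing task_id_var="SLURM_ARRAY_TASK_ID" gives the form seen in the SLURM example script.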
- property bash_date: str
- property default_options: List[str]
Returns a list of default options to use in a submission script for a job. The list contains the current working directory, the outputs directory (where .o files are written) and the errors directory (where .e files are written). If the number of cores is more than 1, the keyword needed to request more than one core is also added to the options list. This keyword depends on the system on which the job is run, as well as on the number of cores that the job needs.
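As an illustration only, the default options for an SGE system could be built as in this sketch. The function name is hypothetical, and the flags (-wd, -o, -e, -pe smp.pe) are taken from the SGE example script rather than from ICHOR's source; other systems use different keywords.

```python
from pathlib import PurePosixPath
from typing import List


def default_sge_options(
    cwd: PurePosixPath,
    outputs_dir: PurePosixPath,
    errors_dir: PurePosixPath,
    ncores: int,
) -> List[str]:
    """Build SGE option strings; the scheduler prefix is added at write time."""
    options = [f"-wd {cwd}", f"-o {outputs_dir}", f"-e {errors_dir}"]
    if ncores > 1:
        # "smp.pe" is the parallel environment seen in the SGE example script;
        # it is an assumption that other core counts use the same environment.
        options.append(f"-pe smp.pe {ncores}")
    return options


cwd = PurePosixPath("/some/working/dir")  # hypothetical path
opts = default_sge_options(cwd, cwd / "OUTPUTS", cwd / "ERRORS", ncores=2)
print("\n".join(opts))
```

With ncores=1 the parallel environment line is omitted, which matches the description that the keyword is only written when more than one core is requested.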
- filetype = '.sh'
- generate_str_for_reading_datafile(datafile: Path, data: List[List[str]]) → str
Forms the strings for array jobs that are written to the submission script, such as those that specify the number of tasks in the array job. This is easiest to follow with GAUSSIAN.sh or another submission script open.
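The datafile-reading part of a submission script can be generated along the lines of this sketch. It reproduces the bash loop from the example scripts; the function name and its parameters are hypothetical, and the datafile path is a placeholder.

```python
from typing import List


def datafile_reader_lines(
    datafile: str,
    n_columns: int,
    separator: str = ",",
    datafile_var: str = "ICHOR_DATFILE",
) -> List[str]:
    """Return bash lines that read each datafile column into its own array."""
    cols = range(1, n_columns + 1)
    lines = [f"{datafile_var}={datafile}"]
    # One empty bash array per column: arr1=(), arr2=(), ...
    lines += [f"arr{i}=()" for i in cols]
    read_vars = " ".join(f"var{i}" for i in cols)
    lines.append(f"while IFS={separator} read -r {read_vars}")
    lines.append("do")
    # Append each field of the current row to the matching array.
    lines += [f"    arr{i}+=($var{i})" for i in cols]
    lines.append(f"done < ${datafile_var}")
    return lines


print("\n".join(datafile_reader_lines("/path/to/datafile", 2)))
```

Each row of the datafile then corresponds to one task of the array job, and the task ID selects the matching entry from every array.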
- property grouped_commands: List[CommandGroup]
Groups commands if they need to be submitted as an array job. The commands in a group will all be of the same type, i.e. they will all be Gaussian jobs, or all AIMALL jobs. Sometimes ICHOR needs to be run after the job is complete (for example, ICHOR needs to be run after a FEREBUS job in order to do adaptive sampling).
For example, given the list of commands [Gaussian, Gaussian, Ichor, AIMAll, AIMAll], the commands will be grouped by type: [[Gaussian, Gaussian], [Ichor], [AIMAll, AIMAll]]
The groupings are then used to allocate a task array to the batch system.
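The grouping in the example above can be reproduced with itertools.groupby, which collects consecutive items that share a key. This is a sketch under the assumption that only adjacent commands of the same type are merged; it uses plain strings as stand-ins for command objects and is not ICHOR's implementation.

```python
from itertools import groupby
from typing import Callable, Hashable, List, TypeVar

T = TypeVar("T")


def group_consecutive(commands: List[T], key: Callable[[T], Hashable]) -> List[List[T]]:
    """Group adjacent commands that share the same key into sub-lists."""
    return [list(group) for _, group in groupby(commands, key=key)]


# Strings stand in for command objects; for real objects, key=type would
# group by class instead.
commands = ["Gaussian", "Gaussian", "Ichor", "AIMAll", "AIMAll"]
print(group_consecutive(commands, key=lambda c: c))
# [['Gaussian', 'Gaussian'], ['Ichor'], ['AIMAll', 'AIMAll']]
```

Because groupby only merges neighbours, the order in which the commands were added is preserved, so a step that must run after another stays after it.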
- property modules: List[str]
Returns a list of modules that need to be loaded before a job can be run.
- property options: List[str]
Return the complete list of options (default options + other options that are specific to the job).
- setup_datafile(datafile: Path, data: List[List[str]]) → Tuple[List[str], str]
Calls write_datafile, which writes the datafile to disk (if it is not locked). Then it reads
- Parameters:
datafile – Path object that points to a datafile location (which is going to be written now by write_datafile)
data – A list of lists. Each inner list contains strings which are the names of the inputs and output files.
- test_array_not_null(n)
Returns a string which is used in bash to test if an array entry is not null or empty. We need to test this because there could be cases where there are 2000 SGE tasks for example, but there are only 1990 jobs (because points have been scrubbed in the previous step).
Note
[ is the test command in bash; its -n option checks that the array entry is not an empty string.
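A sketch of how such a test string could be produced follows. The helper name not_null_test and its parameters are hypothetical; the output mirrors the if line of the example submission scripts, where one test per array is joined with &&.

```python
def not_null_test(n: int, task_id_var: str = "SGE_TASK_ID") -> str:
    """Bash test that the current task's entry in array number n is non-empty."""
    return f"[ -n ${{arr{n}[${task_id_var}-1]}} ]"


# One test per datafile column, combined so the job only runs when the
# task has both an input and an output entry.
condition = " && ".join(not_null_test(i) for i in (1, 2))
print(f"if {condition}")
# if [ -n ${arr1[$SGE_TASK_ID-1]} ] && [ -n ${arr2[$SGE_TASK_ID-1]} ]
```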
- var(n)
- write()
Writes the submission script that is passed to the queuing system. The options for the job (such as directory, number of jobs, core count, etc.) are written at the top of the file. The commands to run (such as Gaussian, AIMALL, etc.) are written below the options.
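The layout described here, options first and commands below, can be sketched as follows. render_script and its arguments are hypothetical; the "#$" directive prefix and the module load lines are taken from the SGE example script (SLURM uses "#SBATCH" instead).

```python
from typing import List


def render_script(
    options: List[str],
    modules: List[str],
    commands: List[str],
    directive: str = "#$",
) -> str:
    """Assemble a job script: shebang, scheduler options, modules, commands."""
    lines = ["#!/bin/bash -l"]
    # Scheduler options go at the top, each behind the directive prefix.
    lines += [f"{directive} {option}" for option in options]
    # Modules are loaded before any program is started.
    lines += [f"module load {module}" for module in modules]
    # The commands that actually run the programs come last.
    lines += commands
    return "\n".join(lines) + "\n"


script = render_script(
    options=["-wd /some/dir", "-t 1-10"],  # hypothetical values
    modules=["apps/binapps/gaussian/g09d01_em64t"],
    commands=['echo "Starting Job | $(date)"'],
)
print(script, end="")
```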
- write_datafile(datafile: Path, data: List[List[str]]) → None
Writes the datafile to disk. All datafiles are stored in self.datafile_path. Each line of the datafile contains the input and output file names for one job. These are separated by self.separator, which is a comma.
Note
For example, a datafile, which has a random name (set by self.uid), contains lines of the form:
    WATER0001.gjf,WATER0001.gaussian_output
    WATER0002.gjf,WATER0002.gaussian_output
    …
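Writing a datafile in this format takes only a few lines, as the sketch below shows. write_datafile_sketch is a hypothetical standalone function, not the ICHOR method; it ignores file locking and writes to a temporary directory.

```python
import tempfile
from pathlib import Path
from typing import List


def write_datafile_sketch(datafile: Path, data: List[List[str]], separator: str = ",") -> None:
    """Write one line per job, joining that job's file names with the separator."""
    datafile.parent.mkdir(parents=True, exist_ok=True)
    with open(datafile, "w") as f:
        for row in data:
            f.write(separator.join(row) + "\n")


rows = [
    ["WATER0001.gjf", "WATER0001.gaussian_output"],
    ["WATER0002.gjf", "WATER0002.gaussian_output"],
]
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "DATAFILES" / "example"  # hypothetical file name
    write_datafile_sketch(path, rows)
    written = path.read_text()
print(written, end="")
```

Line k of the file then holds the inputs and outputs for task k of the array job.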
Module contents
- class SubmissionScript(submission_script_name: str | Path, ncores: int, cwd: Path | None = None, include_nodes: List[str] | None = None, exclude_nodes: List[str] | None = None, max_running_tasks: int = -1, outputs_dir_path: Path | None = None, errors_dir_path: Path | None = None, datafile_path: Path | None = None)
Bases: object
The same class as documented under ichor.hpc.submission_script.submission_script module above, exposed at the package level.