ichor.hpc.batch_system package
Submodules
ichor.hpc.batch_system.batch_system module
- class BatchSystem
Bases:
ABC
An abstract base class for batch systems, which are the systems used to submit jobs to compute nodes (for example, Sun Grid Engine).
- Host = None
- JobID = None
- NumProcs = None
- OptionCmd = None
- TaskID = None
- TaskLast = None
- abstract classmethod array_job(njobs: int) str
Returns the flag to set the number of tasks for a job
- abstract classmethod change_working_directory(path: Path) str
Changes the working directory.
- abstract static current_node() NodeType
Return the type of the node ichor is currently running on, e.g. NodeType.ComputeNode
- delete_job_command = None
- abstract classmethod error_directory(path: Path, task_array: bool = False) str
Changes the error directory, where the error files (.e files) are written.
- abstract classmethod hold_job(job: JobID | List[JobID])
Hold a job so that it can be run at another time or after another job has finished running.
- abstract static is_present() bool
- abstract classmethod max_running_tasks(max_running_tasks: int) str
Returns the flag to set the maximum number of running tasks for a job
- abstract classmethod node_options(include_nodes: List[str], exclude_nodes: List[str]) str
- abstract classmethod output_directory(path: Path, task_array: bool = False) str
Changes the output directory, where the output files (.o files) are written.
- abstract classmethod parallel_environment(ncores: int) str | None
Returns the flag to set the parallel environment for the job
- abstract classmethod parse_job_id(stdout: str) str
- abstract static status() str
Returns the status of running jobs.
- classmethod submit_script(job_script: Path, hold: JobID | List[JobID] | None = None) JobID
Submit a job script to the batch system in order to queue/run jobs.
- submit_script_command = None
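The abstract-classmethod pattern described for BatchSystem can be sketched as follows. This is only an illustration under stated assumptions: SketchBatchSystem, EchoBatchSystem and submit_command are hypothetical names invented for this example, and the bodies are not ICHOR's actual implementation.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import List


class SketchBatchSystem(ABC):
    """Hypothetical stand-in for BatchSystem; not ICHOR's actual code."""

    # Concrete subclasses override these class attributes.
    submit_script_command: List[str] = None
    OptionCmd: str = None

    @classmethod
    @abstractmethod
    def change_working_directory(cls, path: Path) -> str:
        """Return the option that sets the working directory."""

    @classmethod
    @abstractmethod
    def parse_job_id(cls, stdout: str) -> str:
        """Extract the job id from the submit command's output."""

    @classmethod
    def submit_command(cls, job_script: Path) -> List[str]:
        # Shared logic lives in the base class; only the command differs.
        return cls.submit_script_command + [str(job_script)]


class EchoBatchSystem(SketchBatchSystem):
    """Hypothetical subclass that fills in every abstract member."""

    submit_script_command = ["echo"]
    OptionCmd = "#"

    @classmethod
    def change_working_directory(cls, path: Path) -> str:
        return f"cd {path}"

    @classmethod
    def parse_job_id(cls, stdout: str) -> str:
        return stdout.strip()
```

Because every abstract member must be overridden, a subclass that forgets one cannot be instantiated, which catches an incomplete batch system early.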
ichor.hpc.batch_system.jobs module
- exception CannotParseJobID
Bases:
Exception
- class Job(id: str, priority: float, name: str, user: str, state: str, start: datetime, queue: str, slots: int, task_id: str | None = None)
Bases:
VarReprMixin
- class JobID(script: str | Path, id: str)
Bases:
object
Class used to keep track of jobs submitted to compute nodes.
- Parameters:
script – A path to a script file such as GAUSSIAN.sh that will be submitted to a compute node.
id – The job id given to the job when the job was submitted to a compute node.
instance – The unique identifier (UUID) that is used for the job’s datafile (containing the names of all the files needed for the job).
- write(path: str | Path)
ichor.hpc.batch_system.local module
- class LocalBatchSystem
Bases:
object
LocalBatchSystem is only to be used for debugging purposes (unless one wants to implement a batch system to run on a local machine, which would be a nice addition).
- property OptionCmd: str
- classmethod array_job(njobs: int, max_running_tasks: int | None = None) str
Returns the line in the job script that specifies that this job is an array job. The tasks of an array job run in parallel because they do not depend on one another. For example, 50 Gaussian or AIMALL jobs can be run at the same time by submitting one array job instead of 50 separate jobs.
- classmethod change_working_directory(dir)
- delete_job_command = ['echo']
- classmethod error_directory(d, task_array=False)
- classmethod output_directory(d, task_array=False)
- classmethod parallel_environment(cores)
- classmethod parse_job_id(stdout) List[str]
- static status() List[str]
- submit_script_command = ['echo']
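Since the submit and delete commands of LocalBatchSystem are both ['echo'], "submitting" a script only prints its arguments. A minimal sketch of that dry-run behaviour, assuming an `echo` executable is available on the machine; SUBMIT_SCRIPT_COMMAND and dry_run_submit are hypothetical names, not part of ICHOR:

```python
import subprocess
from pathlib import Path
from typing import List

# Assumption: mirrors the documented LocalBatchSystem.submit_script_command.
SUBMIT_SCRIPT_COMMAND: List[str] = ["echo"]


def dry_run_submit(job_script: Path) -> str:
    """Run the stand-in submit command and return what it printed."""
    result = subprocess.run(
        SUBMIT_SCRIPT_COMMAND + [str(job_script)],
        capture_output=True,  # collect stdout instead of printing it
        text=True,            # decode bytes to str
        check=True,           # raise if the command fails
    )
    return result.stdout.strip()
```

Nothing is queued in this sketch; the returned string is simply the script path echoed back, which is enough to exercise the surrounding submission logic while debugging.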
ichor.hpc.batch_system.node module
ichor.hpc.batch_system.parallel_environment module
ichor.hpc.batch_system.sge module
- class JobStatus(value)
Bases:
EnumStrList
An enumeration.
- Deleting = ['dr', 'dt', 'dRr', 'dRt', 'ds', 'dS', 'dT', 'dRs', 'dRS', 'dRT']
- Error = ['Eqw', 'Ehqw', 'EhRqw']
- Holding = ['hqw', 'hRqw']
- Pending = ['qw']
- Resubmit = ['Rr', 'Rt']
- Running = ['r']
- Suspended = ['s']
- Transferring = ['t']
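Each member above groups the raw state codes that map to one status. A lookup from a raw code to its status can be sketched as below. This is a hedged illustration only: SketchJobStatus and status_from_code are hypothetical, a plain Enum is used instead of EnumStrList, tuples replace the lists, and only a subset of the codes is included.

```python
from enum import Enum
from typing import Optional


class SketchJobStatus(Enum):
    """Hypothetical subset of the SGE state codes listed above."""

    Pending = ("qw",)
    Holding = ("hqw", "hRqw")
    Running = ("r",)
    Error = ("Eqw", "Ehqw", "EhRqw")


def status_from_code(code: str) -> Optional[SketchJobStatus]:
    """Return the status whose code group contains `code`, if any."""
    for status in SketchJobStatus:
        if code in status.value:
            return status
    return None
```

Grouping codes this way lets calling code ask "is this job held?" without knowing every scheduler-specific spelling of the held state.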
- class SunGridEngine
Bases:
BatchSystem
A class that implements methods ICHOR uses to submit jobs to the Sun Grid Engine (SGE) batch system. These methods/properties are used to construct job scripts for any program we want to run on SGE.
- Host = 'SGE_O_HOST'
- JobID = 'JOB_ID'
- NumProcs = 'NSLOTS'
- OptionCmd = '$'
- TaskID = 'SGE_TASK_ID'
- TaskLast = 'SGE_TASK_LAST'
- classmethod array_job(njobs: int) str
Returns the line in the job script that specifies that this job is an array job. The tasks of an array job run in parallel because they do not depend on one another. For example, 50 Gaussian or AIMALL jobs can be run at the same time by submitting one array job instead of 50 separate jobs.
- classmethod change_working_directory(path: Path) str
Return the line in the job script defining the working directory from which the job is going to run.
- static current_node() NodeType
Return the type of node ichor is currently running on. SGE defines SGE_O_HOST when running on a compute node.
- delete_job_command = ['qdel']
- classmethod error_directory(path: Path, task_array: bool = False) str
Return the line in the job script defining the error directory where any errors from the job should be written to. These files end in .e{job_id}.
- classmethod hold_job(job_id: JobID | List[JobID]) List[str]
Return a list containing the hold_jid keyword and the job id, which is used to hold a job so that it is run at a later time.
- static is_present() bool
Check if SGE is present on the current machine ICHOR is running on.
- classmethod max_running_tasks(max_running_tasks: int) str
Returns the flag to set the maximum number of running tasks for a job
- classmethod node_options(include_nodes: List[str], exclude_nodes: List[str]) str
- classmethod output_directory(path: Path, task_array: bool = False) str
Return the line in the job script defining the output directory where the output of the job should be written to. These files end in .o{job_id}.
- classmethod parallel_environment(ncores: int) str | None
Returns the line in the job script defining the number of cores to be used for the job.
- classmethod parse_job_id(stdout) str
Example script submission using SGE:
$ qsub test.sh
> Your job 518753 ("test.sh") has been submitted
           ^^^^^^
The job id is given by the number, which is parsed by finding the number in the return string.
- static status() List[str]
Return a list containing the command used to check the status of jobs on the SGE batch system.
- submit_script_command = ['qsub']
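The job id parsing described for parse_job_id ("finding the number in the return string") might look like the sketch below. parse_sge_job_id is a hypothetical helper, not ICHOR's code, and it raises ValueError where the real package may use its own CannotParseJobID exception.

```python
import re


def parse_sge_job_id(stdout: str) -> str:
    """Return the first run of digits found in the qsub output."""
    match = re.search(r"\d+", stdout)
    if match is None:
        # Assumption: the real code may raise a dedicated exception instead.
        raise ValueError(f"cannot parse job id from: {stdout!r}")
    return match.group(0)
```

For the sample output shown above, `parse_sge_job_id('Your job 518753 ("test.sh") has been submitted')` returns the string `"518753"`.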
ichor.hpc.batch_system.slurm module
- class JobStatus(value)
Bases:
EnumStrList
An enumeration.
- Deleting = ['RD', 'CG']
- Error = ['F', 'ST', 'TO', 'OOM', 'NF', 'BF', 'CA']
- Holding = ['RH']
- Pending = ['PD']
- Resubmit = []
- Running = ['R']
- Suspended = ['S']
- Transferring = []
- class SLURM
Bases:
BatchSystem
A class that implements methods ICHOR uses to submit jobs to the SLURM batch system. These methods/properties are used to construct job scripts for any program we want to run on SLURM.
- Host = 'SLURM_SUBMIT_HOST'
- JobID = 'SLURM_JOBID'
- NumProcs = 'SLURM_NPROCS'
- OptionCmd = 'SBATCH'
- TaskID = 'SLURM_ARRAY_TASK_ID'
- TaskLast = 'SLURM_ARRAY_TASK_COUNT'
- classmethod array_job(njobs: int, max_running_tasks: int | None = None) str
Returns the line in the job script that specifies that this job is an array job. The tasks of an array job run in parallel because they do not depend on one another. For example, 50 Gaussian or AIMALL jobs can be run at the same time by submitting one array job instead of 50 separate jobs.
- classmethod change_working_directory(path: Path) str
Return the line in the job script defining the working directory from which the job is going to run.
- static current_node() NodeType
Return the type of node ichor is currently running on. SLURM defines SLURM_SUBMIT_HOST when running on a compute node.
- delete_job_command = ['scancel']
- classmethod error_directory(path: Path, task_array: bool = False) str
Return the line in the job script defining the error directory where any errors from the job should be written to. These files end in .e{job_id}.
- classmethod hold_job(job_id: JobID | List[JobID]) List[str]
Return a list containing the hold_jid keyword and the job id, which is used to hold a job so that it is run at a later time. See https://hpc.nih.gov/docs/job_dependencies.html
- static is_present() bool
Check if SLURM is present on the current machine ICHOR is running on.
- classmethod max_running_tasks(max_running_tasks: int) str
Returns the flag to set the maximum number of running tasks for a job
- classmethod node_options(include_nodes: List[str], exclude_nodes: List[str]) str
- classmethod output_directory(path: Path, task_array: bool = False) str
Return the line in the job script defining the output directory where the output of the job should be written to. These files end in .o{job_id}.
- classmethod parallel_environment(ncores: int) str | None
Returns the line in the job script defining the number of cores to be used for the job.
- classmethod parse_job_id(stdout) str
Example script submission using SLURM:
$ sbatch test.sh
> Submitted batch job 345234
                      ^^^^^^
Our job id is the final number in the stdout.
- static status() List[str]
Return a list containing the command used to check the status of jobs on the SLURM batch system.
- submit_script_command = ['sbatch']
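The SLURM array_job signature takes an optional max_running_tasks. In SLURM's own syntax, an array range is written as `--array=1-N`, and a `%M` suffix caps the number of tasks running at once. The sketch below builds such a directive; slurm_array_directive is a hypothetical helper and the exact string ICHOR returns is an assumption, not confirmed by this page.

```python
from typing import Optional


def slurm_array_directive(njobs: int, max_running_tasks: Optional[int] = None) -> str:
    """Build an '#SBATCH --array=...' line for njobs tasks numbered from 1."""
    spec = f"1-{njobs}"
    if max_running_tasks is not None:
        # '%M' limits how many array tasks run simultaneously.
        spec += f"%{max_running_tasks}"
    return f"#SBATCH --array={spec}"
```

For example, `slurm_array_directive(50, 10)` gives `#SBATCH --array=1-50%10`, which requests 50 tasks with at most 10 running at any time.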
ichor.hpc.batch_system.utils module
- delete_jobs()
Delete all jobs that were queued up to run. This function reads the ichor.hpc.global_variables.FILE_STRUCTURE["jid"] file, which contains the names of all submitted jobs.
- display_status_of_running_jobs()
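The behaviour described for delete_jobs (read the recorded jobs, then issue the batch system's delete command for each) can be sketched as below. This is a loose illustration: build_delete_commands is a hypothetical helper, the one-id-per-line file layout is an assumption about the "jid" file, and the commands are only built, not executed.

```python
from pathlib import Path
from typing import List


def build_delete_commands(jid_file: Path, delete_job_command: List[str]) -> List[List[str]]:
    """Return one delete command per job id recorded in jid_file."""
    commands = []
    for line in jid_file.read_text().splitlines():
        job_id = line.strip()
        if job_id:  # skip blank lines
            commands.append(delete_job_command + [job_id])
    return commands
```

Passing `["qdel"]` or `["scancel"]` as delete_job_command yields commands for SGE or SLURM respectively, mirroring the delete_job_command attributes documented above.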
Module contents
- class Job(id: str, priority: float, name: str, user: str, state: str, start: datetime, queue: str, slots: int, task_id: str | None = None)
Bases:
VarReprMixin
- class JobID(script: str | Path, id: str)
Bases:
object
Class used to keep track of jobs submitted to compute nodes.
- Parameters:
script – A path to a script file such as GAUSSIAN.sh that will be submitted to a compute node.
id – The job id given to the job when the job was submitted to a compute node.
instance – The unique identifier (UUID) that is used for the job’s datafile (containing the names of all the files needed for the job).
- write(path: str | Path)
- class LocalBatchSystem
Bases:
object
LocalBatchSystem is only to be used for debugging purposes (unless one wants to implement a batch system to run on a local machine, which would be a nice addition).
- property OptionCmd: str
- classmethod array_job(njobs: int, max_running_tasks: int | None = None) str
Returns the line in the job script that specifies that this job is an array job. The tasks of an array job run in parallel because they do not depend on one another. For example, 50 Gaussian or AIMALL jobs can be run at the same time by submitting one array job instead of 50 separate jobs.
- classmethod change_working_directory(dir)
- delete_job_command = ['echo']
- classmethod error_directory(d, task_array=False)
- classmethod output_directory(d, task_array=False)
- classmethod parallel_environment(cores)
- classmethod parse_job_id(stdout) List[str]
- static status() List[str]
- submit_script_command = ['echo']
- class ParallelEnvironment
Bases:
RangeDict
A dictionary containing key:value pairs in which the key is a keyword used by the submission system to specify the number of cores and the value is a tuple containing a lower and upper bound for the number of cores. Once
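The keyword-to-range mapping described for ParallelEnvironment can be illustrated with a plain dictionary. This sketch makes several assumptions: environment_for is a hypothetical helper, the keywords "smp" and "mpi" are made-up examples, the bounds are treated as inclusive, and the real RangeDict may behave differently.

```python
from typing import Dict, Optional, Tuple


def environment_for(
    ncores: int, environments: Dict[str, Tuple[int, int]]
) -> Optional[str]:
    """Return the first keyword whose (lower, upper) range contains ncores."""
    for keyword, (lower, upper) in environments.items():
        if lower <= ncores <= upper:  # assumption: both bounds inclusive
            return keyword
    return None


# Hypothetical environments; real keywords depend on the cluster.
EXAMPLE_ENVIRONMENTS = {"smp": (2, 32), "mpi": (48, 120)}
```

With these example ranges, a request for 8 cores selects "smp", while a single-core job matches nothing and returns None, which would correspond to emitting no parallel environment line at all.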
- class SLURM
Bases:
BatchSystem
A class that implements methods ICHOR uses to submit jobs to the SLURM batch system. These methods/properties are used to construct job scripts for any program we want to run on SLURM.
- Host = 'SLURM_SUBMIT_HOST'
- JobID = 'SLURM_JOBID'
- NumProcs = 'SLURM_NPROCS'
- OptionCmd = 'SBATCH'
- TaskID = 'SLURM_ARRAY_TASK_ID'
- TaskLast = 'SLURM_ARRAY_TASK_COUNT'
- classmethod array_job(njobs: int, max_running_tasks: int | None = None) str
Returns the line in the job script that specifies that this job is an array job. The tasks of an array job run in parallel because they do not depend on one another. For example, 50 Gaussian or AIMALL jobs can be run at the same time by submitting one array job instead of 50 separate jobs.
- classmethod change_working_directory(path: Path) str
Return the line in the job script defining the working directory from which the job is going to run.
- static current_node() NodeType
Return the type of node ichor is currently running on. SLURM defines SLURM_SUBMIT_HOST when running on a compute node.
- delete_job_command = ['scancel']
- classmethod error_directory(path: Path, task_array: bool = False) str
Return the line in the job script defining the error directory where any errors from the job should be written to. These files end in .e{job_id}.
- classmethod hold_job(job_id: JobID | List[JobID]) List[str]
Return a list containing the hold_jid keyword and the job id, which is used to hold a job so that it is run at a later time. See https://hpc.nih.gov/docs/job_dependencies.html
- static is_present() bool
Check if SLURM is present on the current machine ICHOR is running on.
- classmethod max_running_tasks(max_running_tasks: int) str
Returns the flag to set the maximum number of running tasks for a job
- classmethod node_options(include_nodes: List[str], exclude_nodes: List[str]) str
- classmethod output_directory(path: Path, task_array: bool = False) str
Return the line in the job script defining the output directory where the output of the job should be written to. These files end in .o{job_id}.
- classmethod parallel_environment(ncores: int) str | None
Returns the line in the job script defining the number of cores to be used for the job.
- classmethod parse_job_id(stdout) str
Example script submission using SLURM:
$ sbatch test.sh
> Submitted batch job 345234
                      ^^^^^^
Our job id is the final number in the stdout.
- static status() List[str]
Return a list containing the command used to check the status of jobs on the SLURM batch system.
- submit_script_command = ['sbatch']
- class SunGridEngine
Bases:
BatchSystem
A class that implements methods ICHOR uses to submit jobs to the Sun Grid Engine (SGE) batch system. These methods/properties are used to construct job scripts for any program we want to run on SGE.
- Host = 'SGE_O_HOST'
- JobID = 'JOB_ID'
- NumProcs = 'NSLOTS'
- OptionCmd = '$'
- TaskID = 'SGE_TASK_ID'
- TaskLast = 'SGE_TASK_LAST'
- classmethod array_job(njobs: int) str
Returns the line in the job script that specifies that this job is an array job. The tasks of an array job run in parallel because they do not depend on one another. For example, 50 Gaussian or AIMALL jobs can be run at the same time by submitting one array job instead of 50 separate jobs.
- classmethod change_working_directory(path: Path) str
Return the line in the job script defining the working directory from which the job is going to run.
- static current_node() NodeType
Return the type of node ichor is currently running on. SGE defines SGE_O_HOST when running on a compute node.
- delete_job_command = ['qdel']
- classmethod error_directory(path: Path, task_array: bool = False) str
Return the line in the job script defining the error directory where any errors from the job should be written to. These files end in .e{job_id}.
- classmethod hold_job(job_id: JobID | List[JobID]) List[str]
Return a list containing the hold_jid keyword and the job id, which is used to hold a job so that it is run at a later time.
- static is_present() bool
Check if SGE is present on the current machine ICHOR is running on.
- classmethod max_running_tasks(max_running_tasks: int) str
Returns the flag to set the maximum number of running tasks for a job
- classmethod node_options(include_nodes: List[str], exclude_nodes: List[str]) str
- classmethod output_directory(path: Path, task_array: bool = False) str
Return the line in the job script defining the output directory where the output of the job should be written to. These files end in .o{job_id}.
- classmethod parallel_environment(ncores: int) str | None
Returns the line in the job script defining the number of cores to be used for the job.
- classmethod parse_job_id(stdout) str
Example script submission using SGE:
$ qsub test.sh
> Your job 518753 ("test.sh") has been submitted
           ^^^^^^
The job id is given by the number, which is parsed by finding the number in the return string.
- static status() List[str]
Return a list containing the command used to check the status of jobs on the SGE batch system.
- submit_script_command = ['qsub']
- init_batch_system()