ichor.hpc.batch_system package

Submodules

ichor.hpc.batch_system.batch_system module

class BatchSystem

Bases: ABC

An abstract base class for batch systems, which are the systems used to submit jobs to compute nodes (for example, Sun Grid Engine).

Host = None
JobID = None
NumProcs = None
OptionCmd = None
TaskID = None
TaskLast = None
abstract classmethod array_job(njobs: int) str

Returns the flag to set the number of tasks for a job

abstract classmethod change_working_directory(path: Path) str

Changes the working directory.

abstract static current_node() NodeType

Return the type of node ichor is currently running on, e.g. NodeType.ComputeNode.

classmethod delete(job: JobID)

Delete submitted jobs on the batch system.

classmethod delete_job(job_id: JobID) str

Delete a job submitted to a compute node.

delete_job_command = None
abstract classmethod error_directory(path: Path, task_array: bool = False) str

Changes the error directory, where error files (.e files) are written.

abstract classmethod get_queued_jobs() List[Job]
abstract classmethod hold_job(job: JobID | List[JobID])

Hold a job so that it is run at a later time or after another job has finished running.

abstract static is_present() bool
abstract classmethod max_running_tasks(max_running_tasks: int) str

Returns the flag to set the maximum number of running tasks for a job.

abstract classmethod node_options(include_nodes: List[str], exclude_nodes: List[str]) str
abstract classmethod output_directory(path: Path, task_array: bool = False) str

Changes the output directory, where output files (.o files) are written.

abstract classmethod parallel_environment(ncores: int) str | None

Returns the flag to set the parallel environment for the job

abstract classmethod parse_job_id(stdout: str) str
abstract static status() str

Returns the status of running jobs.

classmethod submit_script(job_script: Path, hold: JobID | List[JobID] | None = None) JobID

Submit a job script to the batch system in order to queue/run jobs.

submit_script_command = None
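
The pattern this class describes can be illustrated with a minimal sketch. It is not ichor's implementation: SketchBatchSystem, SketchSGE and option_line are hypothetical names, and joining "#" with OptionCmd to form a directive line is an assumption based on the OptionCmd values documented for the concrete classes ('$' and 'SBATCH').

```python
from abc import ABC, abstractmethod
from pathlib import PurePosixPath


class SketchBatchSystem(ABC):
    """Hypothetical, trimmed-down stand-in for BatchSystem."""

    OptionCmd = None  # marker that follows '#' on a scheduler directive line

    @classmethod
    @abstractmethod
    def change_working_directory(cls, path: PurePosixPath) -> str:
        """Return the scheduler flag that sets the working directory."""

    @classmethod
    def option_line(cls, flag: str) -> str:
        # Hypothetical helper (not part of ichor): build a directive line.
        return f"#{cls.OptionCmd} {flag}"


class SketchSGE(SketchBatchSystem):
    OptionCmd = "$"

    @classmethod
    def change_working_directory(cls, path: PurePosixPath) -> str:
        return f"-wd {path}"  # qsub's working-directory flag


line = SketchSGE.option_line(
    SketchSGE.change_working_directory(PurePosixPath("/scratch/run1"))
)
print(line)
```

Because the abstract method is left unimplemented on the base class, only concrete subclasses such as SketchSGE can be instantiated.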

ichor.hpc.batch_system.jobs module

exception CannotParseJobID

Bases: Exception

class Job(id: str, priority: float, name: str, user: str, state: str, start: datetime, queue: str, slots: int, task_id: str | None = None)

Bases: VarReprMixin

class JobID(script: str | Path, id: str)

Bases: object

Class used to keep track of jobs submitted to compute nodes.

Parameters:
  • script – A path to a script file, such as GAUSSIAN.sh, that will be submitted to a compute node.

  • id – The job id given to the job when the job was submitted to a compute node.

  • instance – The unique identifier (UUID) that is used for the job’s datafile (containing the names of all the files needed for the job).

write(path: str | Path)

ichor.hpc.batch_system.local module

class LocalBatchSystem

Bases: object

LocalBatchSystem is to be used only for debugging purposes (unless one wants to implement a batch system that runs on a local machine, which would be a nice addition).

property OptionCmd: str
classmethod array_job(njobs: int, max_running_tasks: int | None = None) str

Returns the line in the job script that specifies that this job is an array job. The tasks of an array job do not depend on one another, so they can run in parallel. For example, 50 Gaussian or AIMALL jobs can be run at the same time by submitting 1 array job instead of 50 separate jobs.

classmethod change_working_directory(dir)
static current_node() NodeType
delete_job_command = ['echo']
classmethod error_directory(d, task_array=False)
classmethod output_directory(d, task_array=False)
classmethod parallel_environment(cores)
classmethod parse_job_id(stdout) List[str]
static status() List[str]
submit_script_command = ['echo']

ichor.hpc.batch_system.node module

class NodeType(value)

Bases: Enum

An enumeration.

ComputeNode = 'compute'
LoginNode = 'login'

ichor.hpc.batch_system.parallel_environment module

class ParallelEnvironment

Bases: RangeDict

A dictionary containing key:value pairs in which the key is a keyword used by the submission system to specify the number of cores and the value is a tuple containing a lower and upper bound for the number of cores. Once
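
A range-keyed lookup of this kind might look like the sketch below. SketchRangeDict and keyword_for are hypothetical names, the bounds are assumed to be inclusive, and the environment names and core counts are made up for illustration; the real RangeDict may behave differently.

```python
class SketchRangeDict(dict):
    """Hypothetical stand-in for RangeDict: maps a keyword to (lower, upper)."""

    def keyword_for(self, ncores: int):
        # Return the first keyword whose inclusive range contains ncores.
        for keyword, (lower, upper) in self.items():
            if lower <= ncores <= upper:
                return keyword
        return None  # no parallel environment covers this core count


# Made-up environment names and bounds, for illustration only.
environments = SketchRangeDict({"smp.pe": (2, 32), "mpi.pe": (33, 128)})

print(environments.keyword_for(8))
print(environments.keyword_for(64))
print(environments.keyword_for(1))
```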

ichor.hpc.batch_system.sge module

class JobStatus(value)

Bases: EnumStrList

An enumeration.

Deleting = ['dr', 'dt', 'dRr', 'dRt', 'ds', 'dS', 'dT', 'dRs', 'dRS', 'dRT']
Error = ['Eqw', 'Ehqw', 'EhRqw']
Holding = ['hqw', 'hRqw']
Pending = ['qw']
Resubmit = ['Rr', 'Rt']
Running = ['r']
Suspended = ['s']
Transferring = ['t']
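
As an illustration of how the state codes above could be mapped back to a status, here is a minimal sketch using a plain Enum. SketchJobStatus and from_code are hypothetical; the real class derives from EnumStrList, whose interface is not shown here.

```python
from enum import Enum


class SketchJobStatus(Enum):
    """Subset of the SGE states listed above; values are lists of codes."""

    Pending = ["qw"]
    Running = ["r"]
    Holding = ["hqw", "hRqw"]
    Error = ["Eqw", "Ehqw", "EhRqw"]

    @classmethod
    def from_code(cls, code: str) -> "SketchJobStatus":
        # Hypothetical helper: find the member whose list contains the code.
        for status in cls:
            if code in status.value:
                return status
        raise ValueError(f"unknown state code: {code!r}")


print(SketchJobStatus.from_code("hRqw").name)
```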
class SunGridEngine

Bases: BatchSystem

A class that implements methods ICHOR uses to submit jobs to the Sun Grid Engine (SGE) batch system. These methods/properties are used to construct job scripts for any program we want to run on SGE.

Host = 'SGE_O_HOST'
JobID = 'JOB_ID'
NumProcs = 'NSLOTS'
OptionCmd = '$'
TaskID = 'SGE_TASK_ID'
TaskLast = 'SGE_TASK_LAST'
classmethod array_job(njobs: int) str

Returns the line in the job script that specifies that this job is an array job. The tasks of an array job do not depend on one another, so they can run in parallel. For example, 50 Gaussian or AIMALL jobs can be run at the same time by submitting 1 array job instead of 50 separate jobs.
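
For orientation, an SGE array job is requested with the -t flag, and the number of concurrently running tasks can be capped with -tc. The sketch below builds such directive lines; sge_array_directives is a hypothetical helper, and the exact strings ichor emits may differ.

```python
from typing import List, Optional


def sge_array_directives(njobs: int, max_running_tasks: Optional[int] = None) -> List[str]:
    """Hypothetical helper: build '#$' lines for an SGE array job."""
    lines = [f"#$ -t 1-{njobs}"]  # tasks are numbered 1..njobs
    if max_running_tasks is not None:
        lines.append(f"#$ -tc {max_running_tasks}")  # cap on concurrent tasks
    return lines


for directive in sge_array_directives(50, max_running_tasks=10):
    print(directive)
```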

classmethod change_working_directory(path: Path) str

Return the line in the job script defining the working directory from which the job is going to run.

static current_node() NodeType

Return the type of node ichor is currently running on. SGE defines SGE_O_HOST when running on a compute node.
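
A check of this kind could be sketched as follows. It assumes that the presence of the host variable in the environment is the only signal used, which may not match ichor's actual logic; the environ parameter is added here only so the function can be exercised without touching the real environment.

```python
import os
from enum import Enum


class NodeType(Enum):
    ComputeNode = "compute"
    LoginNode = "login"


def current_node(environ=None, host_variable: str = "SGE_O_HOST") -> NodeType:
    """Sketch: treat the node as a compute node if the host variable is set."""
    environ = os.environ if environ is None else environ
    if host_variable in environ:
        return NodeType.ComputeNode
    return NodeType.LoginNode


print(current_node({"SGE_O_HOST": "login1"}))
print(current_node({}))
```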

delete_job_command = ['qdel']
classmethod error_directory(path: Path, task_array: bool = False) str

Return the line in the job script defining the error directory where any errors from the job should be written to. These files end in .e{job_id}.

classmethod get_queued_jobs() List[Job]
classmethod hold_job(job_id: JobID | List[JobID]) List[str]

Return a list containing the hold_jid keyword and the job id, which is used to hold a particular job so that it is run at a later time.
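
A minimal sketch of such a list, using plain strings in place of JobID objects. sge_hold_arguments is a hypothetical helper; joining several ids with commas follows qsub's -hold_jid syntax, but whether ichor formats them this way is an assumption.

```python
from typing import List, Union


def sge_hold_arguments(job_ids: Union[str, List[str]]) -> List[str]:
    """Hypothetical helper: build the qsub arguments that hold on other jobs."""
    if isinstance(job_ids, str):
        job_ids = [job_ids]  # accept a single id as well as a list
    return ["-hold_jid", ",".join(job_ids)]


print(sge_hold_arguments("518753"))
print(sge_hold_arguments(["518753", "518754"]))
```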

static is_present() bool

Check if SGE is present on the current machine ICHOR is running on.

classmethod max_running_tasks(max_running_tasks: int) str

Returns the flag to set the maximum number of running tasks for a job.

classmethod node_options(include_nodes: List[str], exclude_nodes: List[str]) str
classmethod output_directory(path: Path, task_array: bool = False) str

Return the line in the job script defining the output directory where the output of the job should be written to. These files end in .o{job_id}.
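
The output and error directory lines could be sketched as below, using SGE's -o and -e flags. sge_log_directives is a hypothetical helper and the paths are made up; the effect of the task_array argument is not covered here.

```python
from pathlib import PurePosixPath
from typing import List


def sge_log_directives(output_dir: PurePosixPath, error_dir: PurePosixPath) -> List[str]:
    """Hypothetical helper: build '#$' lines for stdout and stderr locations."""
    return [
        f"#$ -o {output_dir}",  # where .o{job_id} files are written
        f"#$ -e {error_dir}",   # where .e{job_id} files are written
    ]


for directive in sge_log_directives(PurePosixPath("logs/out"), PurePosixPath("logs/err")):
    print(directive)
```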

classmethod parallel_environment(ncores: int) str | None

Returns the line in the job script defining the number of cores to be used for the job.

classmethod parse_job_id(stdout) str

Example script submission using SGE:

$ qsub test.sh
> Your job 518753 ("test.sh") has been submitted
           ^^^^^^

The job id is the number in the returned string; it is parsed by finding that number.
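
One way to do this parsing is with a regular expression, as in the sketch below. parse_sge_job_id is a hypothetical function, and raising CannotParseJobID when no number is found is an assumption about how the exception from the jobs module is used.

```python
import re


class CannotParseJobID(Exception):
    """Stand-in for the exception of the same name in the jobs module."""


def parse_sge_job_id(stdout: str) -> str:
    # Take the first run of digits in the qsub output as the job id.
    match = re.search(r"\d+", stdout)
    if match is None:
        raise CannotParseJobID(f"no job id found in: {stdout!r}")
    return match.group()


print(parse_sge_job_id('Your job 518753 ("test.sh") has been submitted'))
```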

static status() List[str]

Return a list containing the command used to check the status of jobs on the SGE batch system.

submit_script_command = ['qsub']

ichor.hpc.batch_system.slurm module

class JobStatus(value)

Bases: EnumStrList

An enumeration.

Deleting = ['RD', 'CG']
Error = ['F', 'ST', 'TO', 'OOM', 'NF', 'BF', 'CA']
Holding = ['RH']
Pending = ['PD']
Resubmit = []
Running = ['R']
Suspended = ['S']
Transferring = []
class SLURM

Bases: BatchSystem

A class that implements methods ICHOR uses to submit jobs to the SLURM batch system. These methods/properties are used to construct job scripts for any program we want to run on SLURM.

Host = 'SLURM_SUBMIT_HOST'
JobID = 'SLURM_JOBID'
NumProcs = 'SLURM_NPROCS'
OptionCmd = 'SBATCH'
TaskID = 'SLURM_ARRAY_TASK_ID'
TaskLast = 'SLURM_ARRAY_TASK_COUNT'
classmethod array_job(njobs: int, max_running_tasks: int | None = None) str

Returns the line in the job script that specifies that this job is an array job. The tasks of an array job do not depend on one another, so they can run in parallel. For example, 50 Gaussian or AIMALL jobs can be run at the same time by submitting 1 array job instead of 50 separate jobs.
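
In SLURM, an array job is requested with --array, and an optional %N suffix limits how many tasks run at once. The sketch below builds such a directive line; slurm_array_directive is a hypothetical helper, and the exact string ichor emits may differ.

```python
from typing import Optional


def slurm_array_directive(njobs: int, max_running_tasks: Optional[int] = None) -> str:
    """Hypothetical helper: build the '#SBATCH' line for a SLURM array job."""
    spec = f"1-{njobs}"  # tasks are numbered 1..njobs
    if max_running_tasks is not None:
        spec += f"%{max_running_tasks}"  # cap on concurrent tasks
    return f"#SBATCH --array={spec}"


print(slurm_array_directive(50))
print(slurm_array_directive(50, max_running_tasks=10))
```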

classmethod change_working_directory(path: Path) str

Return the line in the job script defining the working directory from which the job is going to run.

static current_node() NodeType

Return the type of node ichor is currently running on. SLURM defines SLURM_SUBMIT_HOST when running on a compute node.

delete_job_command = ['scancel']
classmethod error_directory(path: Path, task_array: bool = False) str

Return the line in the job script defining the error directory where any errors from the job should be written to. These files end in .e{job_id}.

classmethod get_queued_jobs() List[Job]
classmethod hold_job(job_id: JobID | List[JobID]) List[str]

Return a list containing the hold_jid keyword and the job id, which is used to hold a particular job so that it is run at a later time. See https://hpc.nih.gov/docs/job_dependencies.html

static is_present() bool

Check if SLURM is present on the current machine ICHOR is running on.

classmethod max_running_tasks(max_running_tasks: int) str

Returns the flag to set the maximum number of running tasks for a job.

classmethod node_options(include_nodes: List[str], exclude_nodes: List[str]) str
classmethod output_directory(path: Path, task_array: bool = False) str

Return the line in the job script defining the output directory where the output of the job should be written to. These files end in .o{job_id}.

classmethod parallel_environment(ncores: int) str | None

Returns the line in the job script defining the number of cores to be used for the job.

classmethod parse_job_id(stdout) str

Example script submission using SLURM:

$ sbatch test.sh
> Submitted batch job 345234
                      ^^^^^^

The job id is the final number in stdout.
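
Taking the final number could be sketched as follows. parse_slurm_job_id is a hypothetical function, and raising CannotParseJobID when no number is found is an assumption about how the exception from the jobs module is used.

```python
import re


class CannotParseJobID(Exception):
    """Stand-in for the exception of the same name in the jobs module."""


def parse_slurm_job_id(stdout: str) -> str:
    # Collect every run of digits and keep the last one.
    numbers = re.findall(r"\d+", stdout)
    if not numbers:
        raise CannotParseJobID(f"no job id found in: {stdout!r}")
    return numbers[-1]


print(parse_slurm_job_id("Submitted batch job 345234"))
```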

static status() List[str]

Return a list containing the command used to check the status of jobs on the SLURM batch system.

submit_script_command = ['sbatch']

ichor.hpc.batch_system.utils module

delete_jobs()

Delete all jobs that were queued up to run. This function reads the ichor.hpc.global_variables.FILE_STRUCTURE["jid"] file, which contains the names of all submitted jobs.

display_status_of_running_jobs()
get_current_jobs() List[Job]
read_jid(jid_file: Path | None = None) List[JobID]

Module contents

class Job(id: str, priority: float, name: str, user: str, state: str, start: datetime, queue: str, slots: int, task_id: str | None = None)

Bases: VarReprMixin

class JobID(script: str | Path, id: str)

Bases: object

Class used to keep track of jobs submitted to compute nodes.

Parameters:
  • script – A path to a script file, such as GAUSSIAN.sh, that will be submitted to a compute node.

  • id – The job id given to the job when the job was submitted to a compute node.

  • instance – The unique identifier (UUID) that is used for the job’s datafile (containing the names of all the files needed for the job).

write(path: str | Path)
class LocalBatchSystem

Bases: object

LocalBatchSystem is to be used only for debugging purposes (unless one wants to implement a batch system that runs on a local machine, which would be a nice addition).

property OptionCmd: str
classmethod array_job(njobs: int, max_running_tasks: int | None = None) str

Returns the line in the job script that specifies that this job is an array job. The tasks of an array job do not depend on one another, so they can run in parallel. For example, 50 Gaussian or AIMALL jobs can be run at the same time by submitting 1 array job instead of 50 separate jobs.

classmethod change_working_directory(dir)
static current_node() NodeType
delete_job_command = ['echo']
classmethod error_directory(d, task_array=False)
classmethod output_directory(d, task_array=False)
classmethod parallel_environment(cores)
classmethod parse_job_id(stdout) List[str]
static status() List[str]
submit_script_command = ['echo']
class NodeType(value)

Bases: Enum

An enumeration.

ComputeNode = 'compute'
LoginNode = 'login'
class ParallelEnvironment

Bases: RangeDict

A dictionary containing key:value pairs in which the key is a keyword used by the submission system to specify the number of cores and the value is a tuple containing a lower and upper bound for the number of cores. Once

class SLURM

Bases: BatchSystem

A class that implements methods ICHOR uses to submit jobs to the SLURM batch system. These methods/properties are used to construct job scripts for any program we want to run on SLURM.

Host = 'SLURM_SUBMIT_HOST'
JobID = 'SLURM_JOBID'
NumProcs = 'SLURM_NPROCS'
OptionCmd = 'SBATCH'
TaskID = 'SLURM_ARRAY_TASK_ID'
TaskLast = 'SLURM_ARRAY_TASK_COUNT'
classmethod array_job(njobs: int, max_running_tasks: int | None = None) str

Returns the line in the job script that specifies that this job is an array job. The tasks of an array job do not depend on one another, so they can run in parallel. For example, 50 Gaussian or AIMALL jobs can be run at the same time by submitting 1 array job instead of 50 separate jobs.

classmethod change_working_directory(path: Path) str

Return the line in the job script defining the working directory from which the job is going to run.

static current_node() NodeType

Return the type of node ichor is currently running on. SLURM defines SLURM_SUBMIT_HOST when running on a compute node.

delete_job_command = ['scancel']
classmethod error_directory(path: Path, task_array: bool = False) str

Return the line in the job script defining the error directory where any errors from the job should be written to. These files end in .e{job_id}.

classmethod get_queued_jobs() List[Job]
classmethod hold_job(job_id: JobID | List[JobID]) List[str]

Return a list containing the hold_jid keyword and the job id, which is used to hold a particular job so that it is run at a later time. See https://hpc.nih.gov/docs/job_dependencies.html

static is_present() bool

Check if SLURM is present on the current machine ICHOR is running on.

classmethod max_running_tasks(max_running_tasks: int) str

Returns the flag to set the maximum number of running tasks for a job.

classmethod node_options(include_nodes: List[str], exclude_nodes: List[str]) str
classmethod output_directory(path: Path, task_array: bool = False) str

Return the line in the job script defining the output directory where the output of the job should be written to. These files end in .o{job_id}.

classmethod parallel_environment(ncores: int) str | None

Returns the line in the job script defining the number of cores to be used for the job.

classmethod parse_job_id(stdout) str

Example script submission using SLURM:

$ sbatch test.sh
> Submitted batch job 345234
                      ^^^^^^

The job id is the final number in stdout.

static status() List[str]

Return a list containing the command used to check the status of jobs on the SLURM batch system.

submit_script_command = ['sbatch']
class SunGridEngine

Bases: BatchSystem

A class that implements methods ICHOR uses to submit jobs to the Sun Grid Engine (SGE) batch system. These methods/properties are used to construct job scripts for any program we want to run on SGE.

Host = 'SGE_O_HOST'
JobID = 'JOB_ID'
NumProcs = 'NSLOTS'
OptionCmd = '$'
TaskID = 'SGE_TASK_ID'
TaskLast = 'SGE_TASK_LAST'
classmethod array_job(njobs: int) str

Returns the line in the job script that specifies that this job is an array job. The tasks of an array job do not depend on one another, so they can run in parallel. For example, 50 Gaussian or AIMALL jobs can be run at the same time by submitting 1 array job instead of 50 separate jobs.

classmethod change_working_directory(path: Path) str

Return the line in the job script defining the working directory from which the job is going to run.

static current_node() NodeType

Return the type of node ichor is currently running on. SGE defines SGE_O_HOST when running on a compute node.

delete_job_command = ['qdel']
classmethod error_directory(path: Path, task_array: bool = False) str

Return the line in the job script defining the error directory where any errors from the job should be written to. These files end in .e{job_id}.

classmethod get_queued_jobs() List[Job]
classmethod hold_job(job_id: JobID | List[JobID]) List[str]

Return a list containing the hold_jid keyword and the job id, which is used to hold a particular job so that it is run at a later time.

static is_present() bool

Check if SGE is present on the current machine ICHOR is running on.

classmethod max_running_tasks(max_running_tasks: int) str

Returns the flag to set the maximum number of running tasks for a job.

classmethod node_options(include_nodes: List[str], exclude_nodes: List[str]) str
classmethod output_directory(path: Path, task_array: bool = False) str

Return the line in the job script defining the output directory where the output of the job should be written to. These files end in .o{job_id}.

classmethod parallel_environment(ncores: int) str | None

Returns the line in the job script defining the number of cores to be used for the job.

classmethod parse_job_id(stdout) str

Example script submission using SGE:

$ qsub test.sh
> Your job 518753 ("test.sh") has been submitted
           ^^^^^^

The job id is the number in the returned string; it is parsed by finding that number.

static status() List[str]

Return a list containing the command used to check the status of jobs on the SGE batch system.

submit_script_command = ['qsub']
init_batch_system()