
gqueue

The gqueue utility is used to simplify the task of submitting computational jobs to the Sun GridEngine (SGE) queue environment. SGE runs parallel jobs in a special context called a parallel environment. A parallel environment controls how a multi-node job is started and stopped. Serial jobs need not run inside a parallel environment. However, in all cases a job must be submitted to SGE as a queue script: an executable shell script that contains additional commands that influence how SGE will handle the job in addition to the actual commands associated with the computational task. gqueue removes the burden of writing an SGE submission script each time a user wishes to run a calculation: it writes the script for the user.

gqueue is written in straightforward C and uses a single external library, libxml.

Configuration

By default, gqueue looks for its configuration in /etc/gqueue.d; this location can be changed during the build process. That directory should contain a file named pe.conf, which holds the XML definition of the various job environments:

<?xml version="1.0"?>
<pe_lists>
  <pe_list id="g03c02" script-base="/etc/gqueue.d/scripts/g03c02" implicit-single="yes">
    <pe id="gaussian_dual" cpu-count="2" round-up="no"/>
    <pe id="gaussian_quad" cpu-count="4" round-up="yes"/>
  </pe_list>
</pe_lists>

Each parallel environment list ("pe_list") can have zero or more specific parallel environments associated with it. Each parallel environment is distinguished by a GridEngine parallel environment name ("id") and by the high end of the range of requested processor counts that should trigger the use of that parallel environment ("cpu-count"). Each range starts just above the previous highest count. In the example XML above, the "g03c02" parallel environment list defines two parallel environments, which are used as follows:

  • If a single processor was requested, then the job is submitted without a parallel environment selected ("implicit-single")
  • If two processors were requested, the "gaussian_dual" parallel environment is selected
  • If three or four processors were requested, the count is rounded up to four ("round-up") and the "gaussian_quad" parallel environment is selected
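
The selection rules above can be sketched in a few lines of Python. This is only an illustration of the rules as described here, not the actual C implementation in gqueue. The function name select_pe is hypothetical, and two behaviours are assumptions because the text does not cover them: a request that falls below a "cpu-count" whose "round-up" is "no" keeps the requested count, and a request above the highest "cpu-count" is rejected.

```python
import xml.etree.ElementTree as ET

# The example pe.conf from above, embedded so the sketch is self-contained.
PE_CONF = """<?xml version="1.0"?>
<pe_lists>
  <pe_list id="g03c02" script-base="/etc/gqueue.d/scripts/g03c02" implicit-single="yes">
    <pe id="gaussian_dual" cpu-count="2" round-up="no"/>
    <pe id="gaussian_quad" cpu-count="4" round-up="yes"/>
  </pe_list>
</pe_lists>
"""


def select_pe(conf_xml, list_id, nproc):
    """Return (parallel environment id or None, processor count to request)."""
    if nproc < 1:
        raise ValueError("processor count must be positive")
    root = ET.fromstring(conf_xml)
    for pe_list in root.findall("pe_list"):
        if pe_list.get("id") == list_id:
            break
    else:
        raise KeyError(list_id)

    # A single processor needs no parallel environment when implicit-single is set.
    if nproc == 1 and pe_list.get("implicit-single") == "yes":
        return None, 1

    # Walk the entries in document order; each cpu-count is the top of a range.
    for pe in pe_list.findall("pe"):
        top = int(pe.get("cpu-count"))
        if nproc <= top:
            if pe.get("round-up") == "yes":
                return pe.get("id"), top
            # Assumption: without round-up the requested count is kept as-is.
            return pe.get("id"), nproc

    # Assumption: requests beyond the highest cpu-count are rejected.
    raise ValueError("no parallel environment covers %d processors" % nproc)
```

With the example configuration, select_pe(PE_CONF, "g03c02", 3) selects "gaussian_quad" with a count of four, matching the third bullet above.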

The "script-base" attribute in a "pe_list" defines where gqueue should look for the following supplementary programs (executables or scripts):

Exe/Script Name            Purpose
[script-base]/pre          Run before gqueue attempts to process the input file
[script-base]/validate-pe  Last-minute substitution for the parallel environment that was chosen for the job
[script-base]/merge        Add the job-specific region to the generated queue script
[script-base]/post         Run after gqueue has constructed the submission script and is ready to submit it

For example, the "g03c02" "pe_list" uses a "pre" script to extract a "%nproc=#" line from the Gaussian input file passed to gqueue. The "merge" script sets up the Gaussian environment for revision C02 of Gaussian '03 and inserts the commands that actually run the Gaussian job.
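
As a rough illustration of what such a "pre" script has to do, the Python sketch below pulls the processor count out of a "%nproc=#" line. It is not the script shipped with gqueue: the function name extract_nproc is hypothetical, the match is assumed to be case-insensitive, and the way a real "pre" script hands the value back to gqueue is not described here, so the sketch simply returns it.

```python
import re

# Matches a line such as "%nproc=4"; case-insensitivity is an assumption.
_NPROC_RE = re.compile(r"^\s*%nproc\s*=\s*([0-9]+)\s*$", re.IGNORECASE | re.MULTILINE)


def extract_nproc(input_text):
    """Return the processor count from the first %nproc line, or None if absent."""
    match = _NPROC_RE.search(input_text)
    if match is None:
        return None
    return int(match.group(1))
```

For an input file whose header contains the line "%nproc=4", extract_nproc returns the integer 4; for a file with no such line it returns None.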

Environment Variables

Each of the SGE job processing options that gqueue handles has an associated environment variable. If the environment variable is defined, then when gqueue is run it automatically starts with the parameter values provided in the shell environment. This can be helpful if, for example, the user wishes to have an email delivered after completion of every job without explicitly entering an email address on the command line, or if every job on a cluster of homogeneous dual-processor nodes should be run with two processors requested.

Environment Variable  Valid Values               Default    Description
GQUEUE_VERBOSE        0 or 1                     0          Should the gqueue utility display additional information as it writes the job script?
GQUEUE_SAVEOUTPUT     0 or 1                     0          Should the job script not include a trailing line to remove the GridEngine output file for this job?
GQUEUE_SAVESCRIPT     0 or 1                     0          Should the job script not be deleted after the job completes?
GQUEUE_EMAILADDR      user@domain.com            not set    GridEngine should deliver job status change notification emails to this address.
GQUEUE_QUEUEDIR       a file path                ~/.gqueue  gqueue should store the generated job script in this directory.
GQUEUE_SHELLPATH      a file path                /bin/sh    The shell that should be invoked to run the generated queue script; note that this really shouldn't be changed since each specific script generator may use a particular shell for the script (e.g. python for Dacapo jobs).
GQUEUE_NPROC          an integer > 0             not set    Without providing an explicit processor count in this variable or on the command line, each script generator may set the processor count based on the contents of your input file, e.g.
GQUEUE_PRIORITY       an integer                 -50        The GridEngine priority at which this job should be submitted.
GQUEUE_MEMSIZE        a memory size (see below)  not set    Providing an explicit value (in bytes) for this property will attach a hard memory resource requirement to the job script.
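
The way these variables seed the starting parameter values can be sketched as follows. This is a minimal Python illustration, not gqueue's own C code: the names DEFAULTS and initial_settings are hypothetical, values are kept as raw strings without validation, and None stands in for "not set".

```python
import os

# Defaults taken from the table above; None means "not set".
DEFAULTS = {
    "GQUEUE_VERBOSE": "0",
    "GQUEUE_SAVEOUTPUT": "0",
    "GQUEUE_SAVESCRIPT": "0",
    "GQUEUE_EMAILADDR": None,
    "GQUEUE_QUEUEDIR": "~/.gqueue",
    "GQUEUE_SHELLPATH": "/bin/sh",
    "GQUEUE_NPROC": None,
    "GQUEUE_PRIORITY": "-50",
    "GQUEUE_MEMSIZE": None,
}


def initial_settings(environ=None):
    """Start from the defaults, replacing any value found in the environment."""
    env = os.environ if environ is None else environ
    return {name: env.get(name, default) for name, default in DEFAULTS.items()}
```

A user on a cluster of dual-processor nodes who exports GQUEUE_NPROC=2 in a shell startup file would, under this sketch, start every run with a processor count of two without typing it on the command line.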

Memory Size Specification

Passing memory sizes to gqueue can be done with the following formats (formats are shown as regular expressions):

Format               Meaning
[0-9]+               A byte count
[0-9]+[kKmMgG]?[bB]  A byte count with an optional unit prefix
[0-9]+[kKmMgG]?[wW]  A word count with an optional unit prefix

A word is defined by gqueue to be 8 bytes in size. The unit prefixes translate to:

Prefix Character  Multiplier
k                 1000
K                 1024
m                 1000000
M                 1048576
g                 1000000000
G                 1073741824

gqueue always rounds to the nearest MB (since this is what GridEngine expects).
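
The formats and multipliers above can be combined into a small parser. The Python sketch below is an illustration of the rules as documented, not gqueue's actual C code. The function names parse_memsize and round_to_mb are hypothetical, and two details are assumptions: "MB" is taken to mean 1048576 bytes, and exact halves round up.

```python
import re

# One pattern covering the three documented formats.
_SIZE_RE = re.compile(r"([0-9]+)(?:([kKmMgG]?)([bBwW]))?")

MULTIPLIER = {
    "": 1,
    "k": 1000, "K": 1024,
    "m": 1000000, "M": 1048576,
    "g": 1000000000, "G": 1073741824,
}
WORD_BYTES = 8   # gqueue defines a word as 8 bytes
MB = 1048576     # assumption: "MB" means 2**20 bytes


def parse_memsize(text):
    """Convert a memory size string into a byte count."""
    match = _SIZE_RE.fullmatch(text)
    if match is None:
        raise ValueError("not a valid memory size: %r" % text)
    count = int(match.group(1))
    prefix = match.group(2) or ""
    unit = (match.group(3) or "b").lower()
    nbytes = count * MULTIPLIER[prefix]
    if unit == "w":
        nbytes *= WORD_BYTES
    return nbytes


def round_to_mb(nbytes):
    """Round a byte count to the nearest whole MB (assumption: halves round up)."""
    return (nbytes + MB // 2) // MB
```

Under these assumptions "2Gb" and "256Mw" both come to 2147483648 bytes, or 2048 MB. A string such as "512M" is rejected because the documented formats require a trailing b or w whenever a prefix is used.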

Generated Content Storage Locales

gqueue creates a special directory in the user's home directory to hold its generated queue scripts while they execute. By default the directory is named ".gqueue"; the default name may be changed when the utility is compiled, and the user may choose an alternate directory at runtime via a command-line option. Whatever directory is used, it should not be moved, deleted, or renamed while it contains an active queue script.

Queue Script Naming Conventions

The queue scripts are always named using the basename of the input file the user submits. SGE will not accept queue scripts whose filenames begin with a numeric character; in that case, the basename is given a "gqs_" prefix and a unique eight-character suffix. Otherwise, the script is named by appending ".gqs_" and a unique eight-character suffix to the basename. Queue scripts automatically delete themselves at the end of a successful job unless the user requests otherwise via a command-line option. SGE output files (with a ".o#" suffix) are also automatically removed unless the user requests otherwise.
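
The naming convention can be sketched in Python as below. This is an illustration only: the function name queue_script_name is hypothetical, the suffix is assumed to be eight random letters and digits, and the "_" that joins the basename to the suffix in the leading-digit case is a guess, since the text does not say how the two are joined there.

```python
import os
import secrets
import string

_ALPHABET = string.ascii_letters + string.digits


def queue_script_name(input_path, suffix=None):
    """Build a queue script name from the basename of the input file."""
    base = os.path.basename(input_path)
    if not base:
        raise ValueError("input path has no basename: %r" % input_path)
    if suffix is None:
        # Assumption: the unique suffix is eight random alphanumeric characters.
        suffix = "".join(secrets.choice(_ALPHABET) for _ in range(8))
    if base[0] in "0123456789":
        # SGE rejects names that start with a digit, so add the "gqs_" prefix.
        # Assumption: "_" separates the basename from the suffix in this case.
        return "gqs_" + base + "_" + suffix
    return base + ".gqs_" + suffix
```

For an input file named water.com this produces a name of the form water.com.gqs_XXXXXXXX, while an input named 2butene.com yields a name that starts with "gqs_" so that SGE will accept it.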

Running in Immediate Mode

If gqueue is run without an input file provided on the command line, it attempts to read the computational body of the queue script from standard input: your terminal will hang waiting for you to type the body of the script followed by an EOF (control-D) character. You can press control-C to kill gqueue if you mistakenly end up in this mode.


Written by Jeff Frey on Monday February 29, 2016