Handling Fluent: CCAs, JSVs, Starter Methods and Re-queueing
1. JSV Notes
-- jsv(1) man page: http://arc.liv.ac.uk/SGE/htmlman/htmlman1/jsv.html
-- submit parameters (e.g., pe_name, pe_min, pe_max, ...)
http://wikis.sun.com/display/gridengine62u5/Submit+Parameters
-- can we use a JSV to put all the "-l" resource requests into the job environment?
-- yes, use jsv_add_env; e.g., in the JSV script:
jsv_add_env "JSV_EXAMPLE" "jsv_value_33"
then, in the qsub script:
echo "//$JSV_EXAMPLE//"
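Resource names such as fluent-par contain a hyphen, so they cannot be referenced as shell variables in the job script without some renaming first. A minimal sketch, assuming the hard resource list arrives as a comma-separated name=value string; the helper name lhard_to_env_pairs and the JSV_L_ prefix are hypothetical choices, not part of SGE:

```shell
#!/bin/bash
# HYPOTHETICAL helper for a JSV: turn a hard resource list such as
# "fluentall=0.25,fluent-par=0.75" into "NAME VALUE" lines for jsv_add_env.
# Names are upper-cased, '-' becomes '_', and a JSV_L_ prefix is added
# (the prefix is our own choice) so they are valid shell variable names.
lhard_to_env_pairs() {
    local list="$1" item name value
    local IFS=','                      # split the list on commas only
    for item in $list; do
        name="${item%%=*}"             # text before the first '='
        value="${item#*=}"             # text after the first '='
        name="JSV_L_$(printf '%s' "$name" | tr 'a-z-' 'A-Z_')"
        printf '%s %s\n' "$name" "$value"
    done
}

# In a real JSV (which sources jsv_include.sh) this might be used as below;
# untested, and jsv_get_param/l_hard should be checked against jsv(1):
#   while read -r n v; do
#       jsv_add_env "$n" "$v"
#   done < <(lhard_to_env_pairs "$(jsv_get_param l_hard)")
```

For example, lhard_to_env_pairs 'fluentall=0.25,fluent-par=0.75' prints "JSV_L_FLUENTALL 0.25" and "JSV_L_FLUENT_PAR 0.75", which the job script could then read as $JSV_L_FLUENT_PAR.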
From the man page (above), JSVs can be defined in the following places, listed in the order in which they are executed:
1) qsub -jsv ...
2) $cwd/.sge_request
3) $HOME/.sge_request
4) $SGE_ROOT/$SGE_CELL/common/sge_request
5) Global configuration
The Client JSVs (1-3) can be defined by Grid Engine end users, whereas
the client JSV defined in the global sge_request file (4) and the
server JSV (5) can only be defined by the Grid Engine administrators.
Due to the fact that (4) and (5) are defined and configured by Grid
Engine administrators and executed as the last JSV instances in the
sequence of JSV scripts, an administrator has a way to enforce certain
policies for a cluster. However, note that (4) may be over-ridden
trivially with qsub -clear.
2. The Issues
- PE Integration
- We need SGE and Fluent to talk to each other to ensure that Fluent starts the SGE-determined number of processes and starts them on the right compute nodes. We also want to ensure processes are tidied up correctly at the end.
- Licensing
- Fluent uses FlexLM with floating network licences provided by the Uni licence servers. We don't want jobs to be scheduled, then fail to run because of a lack of licences. The School of MACE own all the licences.
3. PE Integration
Fluent is SGE-aware: suitable PE scripts are supplied and qsub is not required. For example:
fluent 3d -g -t64 -pnet -ssh -sge -sgepe fluent-32.pe 64 -i input.jou
where the PE fluent-32.pe is defined as follows; its stop_proc_args runs a Fluent-supplied kill script:
pe_name fluent-32.pe
slots 999
user_lists NONE
xuser_lists NONE
start_proc_args /bin/true
stop_proc_args /opt/gridware/apps/binapps/fluent/6.3.26/Fluent.Inc/addons/sge1.0/kill-fluent
allocation_rule 32
control_slaves FALSE
job_is_first_task FALSE
urgency_slots min
accounting_summary FALSE
The Fluent-supplied kill script:
#!/bin/sh
FileName=kill-fluent-sge-$JOB_ID
OldFileName=kill-fluent$JOB_ID ## For backward compatibility with Fluent6.1
if [ -w $PWD ] ; then
FileName=$PWD/$FileName
OldFileName=$PWD/$OldFileName
else
FileName=$HOME/$FileName
OldFileName=/tmp/$OldFileName
fi
if [ -f $FileName ]; then
/bin/sh $FileName
elif [ -f $OldFileName ]; then
/bin/sh $OldFileName
fi
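Above, Fluent's own SGE support is relied on to start processes on the granted hosts. To check by hand which hosts (and how many slots on each) SGE actually granted, for instance when debugging a job whose processes seem to land on the wrong nodes, the allocation can be read from $PE_HOSTFILE, whose lines have the form hostname, slot count, queue, processor range. A minimal sketch; expand_pe_hostfile is a hypothetical helper name:

```shell
#!/bin/bash
# HYPOTHETICAL debugging helper: print one line per granted slot, using
# the hostfile SGE writes for parallel jobs. Each line of that file is:
#   hostname  slots  queue  processor-range
# Defaults to $PE_HOSTFILE when no file argument is given.
expand_pe_hostfile() {
    local file="${1:-$PE_HOSTFILE}" host slots rest i
    while read -r host slots rest; do
        [ -n "$host" ] || continue          # skip blank lines
        for ((i = 0; i < slots; i++)); do
            printf '%s\n' "$host"
        done
    done < "$file"
}
```

Inside a job, expand_pe_hostfile | sort | uniq -c gives a per-host slot count that can be compared with where Fluent's processes are running.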
4. Background: Fluent's Licensing Model
Using lmstat, e.g.,
lmstat -c fluent_licence.dat -a
or
lmstat -c fluent_licence.dat -f fluent-par
shows how many licence tickets have been issued in total and how many are in use
(and who has checked them out and on which host).
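Both example scripts later on this page pick the counts out of lmstat's "Users of" summary line as whitespace-separated fields 6 (issued) and 11 (in use). A minimal sketch of that parsing as reusable functions, assuming the usual line format "Users of fluentall:  (Total of 10 licenses issued;  Total of 3 licenses in use)"; the function names are hypothetical:

```shell
#!/bin/bash
# HYPOTHETICAL helpers: read lmstat output on stdin.
# lic_counts prints "ISSUED USED" from the first "Users of" line
# (fields 6 and 11, as in the example scripts on this page).
lic_counts() {
    awk '/Users of/ { print $6, $11; exit }'
}

# lic_free prints the number of free tickets (issued minus in use).
lic_free() {
    lic_counts | awk '{ print $1 - $2 }'
}
```

Usage would be along the lines of lmstat -c fluent_licence.dat -f fluent-par | lic_free. If lmstat prints no "Users of" line, both functions print nothing, which callers should treat as "unknown" rather than zero.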
MACE have some fluent tickets (which seem to be the same as fluentall) and also some fluent-par tickets. A query to Ansys on 22 September 2011 gave some details of how it all works:
I am trying to determine how the fluent licence model works
for parallel jobs. Is this correct: a 16-core process needs
16 licences/tickets; by preference it gets one fluent/fluentall
ticket and 15 fluent-par tickets; if insufficient fluent-par
tickets are available it tops up with fluent/fluentall tickets to
get to 16 if possible. If only, say, 12 total are available the
job will start with 12 processes only. Is this correct?
The answer was:
Your understanding is absolutely correct.
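The model Ansys confirmed can be turned into a small calculation of how many processes an N-process job would actually start with. A sketch, assuming (this is not stated in the Ansys reply) that a job cannot start at all without at least one free fluent/fluentall ticket; fluent_procs is a hypothetical name:

```shell
#!/bin/bash
# HYPOTHETICAL: processes an N-process Fluent job would start with.
# Model (as confirmed by Ansys): one fluentall ticket plus N-1 fluent-par
# tickets by preference; if fluent-par runs short, top up with fluentall;
# if the total still falls short, start with fewer processes.
# ASSUMPTION: no free fluentall ticket at all means the job cannot start.
# Usage: fluent_procs N AVAIL_FLUENTALL AVAIL_FLUENT_PAR
fluent_procs() {
    local n=$1 avail_all=$2 avail_par=$3 par all
    if (( n < 1 || avail_all < 1 )); then
        echo 0
        return
    fi
    par=$(( avail_par < n - 1 ? avail_par : n - 1 ))   # fluent-par first
    all=$(( n - par ))                                  # rest from fluentall
    (( all > avail_all )) && all=$avail_all             # capped by free fluentall
    echo $(( par + all ))
}
```

For example, fluent_procs 16 2 10 prints 12: ten fluent-par tickets, topped up with the two free fluentall tickets, matching the "only 12 available" case in the query above.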
5. Possible Strategies
Dan T summarises this nicely on his blog:
- Consumable Complex Resource Attributes (CCRAs)
Use SGE's consumable complex resource attributes (which are
user-requested via the hard resource list, i.e., "-l thing").
Simply define CCRAs which match the FlexLM tickets and require users to
ask for the resources they need, for example:
#!/bin/bash
#$ -pe fluent-smp.pe 4
#$ -l fluentall=0.25
#$ -l fluent-par=0.75
# ...numbers are multiplied (by SGE) by NSLOTS...
Use a JSV to ensure users request the tickets they need.
Drawbacks:
- But what about the tickets being used by off-cluster hosts?
- There is a race condition:
- Custom Job Starter and Job Rescheduling
Use a qsub script to start Fluent: within that script, check whether sufficient
licences are available; if not, exit with status 99 so that the job is
re-scheduled (see SGE Admin and User Guide: Consequences of Different
Error or Exit Codes); otherwise, start the job.
Drawbacks:
- Do we really want a whole load of Fluent jobs sitting in the queue re-scheduling themselves?
- Race condition: e.g., an off-cluster Fluent instance could grab the tickets.
- FlexNet (FlexLM) Connector
An API for checking out and returning licences. Perfect(?): a batch system
can check out licences as part of the scheduling.
Drawbacks:
- Not widely adopted; not adopted by SGE.
- Olesen Integration (qlicserver aka flex-grid)
A daemon runs which frequently determines the number of tickets available
and adjusts the matching CCRAs accordingly (e.g., by using qconf). This is
detailed by Olesen. (Olesen's implementation uses SGE's load sensor framework
to run the daemon.)
Drawbacks:
- Race condition: the CCRA values will sometimes be out of date as off-cluster instances of Fluent start.
- Not clear how we could handle different numbers of tickets being available for different users?
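One polling step of such a daemon might look like this. A sketch only, not Olesen's code: the function name is hypothetical, it prints the qconf command instead of running it, and it assumes the daemon can tell how many tickets the cluster's own jobs hold. That last input matters because SGE already subtracts running jobs' bookings from a consumable's capacity, so the capacity to set is the issued total minus the tickets used off-cluster, not simply issued minus used:

```shell
#!/bin/bash
# HYPOTHETICAL dry-run step of an Olesen-style licence daemon.
# Args: FEATURE ISSUED USED_TOTAL USED_BY_CLUSTER (the last is assumed known).
# Capacity = issued - off-cluster use, since SGE itself accounts for what
# running cluster jobs have booked against the consumable.
qconf_update_cmd() {
    local feature=$1 issued=$2 used_total=$3 used_cluster=$4 off_cluster cap
    off_cluster=$(( used_total - used_cluster ))
    (( off_cluster < 0 )) && off_cluster=0
    cap=$(( issued - off_cluster ))
    (( cap < 0 )) && cap=0
    # -mattr modifies one entry in the global host's complex_values list.
    printf 'qconf -mattr exechost complex_values %s=%d global\n' "$feature" "$cap"
}
```

For example, with 32 fluent-par tickets issued, 20 in use and 12 of those held by cluster jobs, qconf_update_cmd fluent-par 32 20 12 prints a command setting fluent-par=24: the 8 off-cluster tickets are removed from the pool.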
6. Our Approach
6.1. Complications
- Not all available Fluent licence tickets are necessarily available to each user (e.g., there may be 32 fluent-par tickets, but a limit of 20 such tickets per user).
- Not all tickets are necessarily available to each compute host. Moreover, FlexLM sees only the unqualified (short) hostname.
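Where FlexLM imposes a per-user cap on top of the shared pool, the number of tickets a particular user can actually obtain is the smaller of the free pool and the room left under that user's cap. A sketch of that arithmetic, assuming the three inputs have already been obtained (for example from lmstat output and the licence server's options file); user_avail is a hypothetical name:

```shell
#!/bin/bash
# HYPOTHETICAL: tickets a user can still obtain.
# Args: POOL_FREE USER_CAP USER_USED
# Result: min(pool_free, max(0, user_cap - user_used)).
user_avail() {
    local pool_free=$1 user_cap=$2 user_used=$3 room
    room=$(( user_cap - user_used ))
    (( room < 0 )) && room=0
    echo $(( room < pool_free ? room : pool_free ))
}
```

With the figures from the first bullet (a cap of 20 fluent-par tickets per user), a user already holding 5 can get at most 15 more even if 30 are free in the pool: user_avail 30 20 5 prints 15.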
6.2. Steps
- Use an approach based on the re-scheduling script below. It is hoped to move most of the script complexity into an SGE queue starter method (which shares its environment with the job script).
-- Look at whether user-specific and cluster-specific ticket limits can be added to the flex-grid approach, together with using "-l fluent=32" or similar and, after moving the "-l" values into the job environment via a JSV, doing a second check in the starter_method, re-scheduling if it fails. N.B. Using "-l fluent=32" is pointless unless we use qconf-adjusted CCRAs, as otherwise we can simply use NSLOTS (as in the script below).
-- Ensure, via a JSV, that all Fluent jobs are submitted to the right PE and queue (the latter so that they get the starter_method).
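A minimal sketch of the starter_method idea, assuming (as the steps above hope, but untested here) that exit status 99 from a starter_method has the same re-queue effect as from the job script. SGE passes the job's command line to the starter_method as arguments, so on success it hands over with exec. licence_check is a hypothetical stand-in for the lmstat arithmetic, here simply comparing LIC_AVAIL (an assumed variable) with NSLOTS:

```shell
#!/bin/bash
# HYPOTHETICAL stand-in for the real lmstat-based check:
# succeed if LIC_AVAIL (assumed to be set) covers NSLOTS.
licence_check() {
    (( ${LIC_AVAIL:-0} >= ${NSLOTS:-1} ))
}

# Gate the job: status 99 (intended as "re-queue") if too few licences,
# otherwise replace this process with the job's own command line.
starter_main() {
    if ! licence_check; then
        echo "insufficient licences: exit with 99" >&2
        return 99
    fi
    exec "$@"
}

# As an actual starter_method script, the final line would be:
#   starter_main "$@"; exit $?
```

Using exec means the job's exit status passes straight back to SGE, with no wrapper process left in between.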
7. Example, Tested Script
The following qsub script uses lmstat to ask the licence servers how many Fluent licence tickets (fluentall and fluent-par) are available, then determines whether it is worth attempting to run the Fluent job. If insufficient licences are available, the script exits with status 99, which instructs SGE to re-queue the job. (The script has been tested, successfully, in a production environment.)
#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -q mixed.q
#$ -pe fluent.pe 4
## -- ** The complications here to be moved to a (prolog script or) starter_method?
## #$ -l fluentall=1
## #$ -l fluent-par=$NSLOTS-1
# See also: http://blogs.oracle.com/templedf/entry/license_management_with_grid_engine
#
# -- suggests, since we don't want many jobs being re-queued, preventing this mostly by
# monitoring the number of available licences (via a daemon and lmstat) and using this
# to update a consumable complex value corresponding to the number of licences, which
# stops jobs being scheduled in the first place...
LIC_REQ_ALL=1
LIC_WANT_PAR=$(($NSLOTS-1))
#
# Ansys say this is correct:
#
# a 16-core process needs
# 16 licences/tickets; by preference it gets one fluent/fluentall
# ticket and 15 fluent-par tickets; if insufficient fluent-par
# tickets are available it tops up with fluent/fluentall tickets to
# get to 16 if possible. If only, say, 12 total are available the
# job will start with 12 processes only.
#
LIC_ISSUED_ALL=`./lmstat -c fluent.dat -f fluentall | grep "Users of" | awk '{print $6}'`
LIC_USED_ALL=`./lmstat -c fluent.dat -f fluentall | grep "Users of" | awk '{print $11}'`
LIC_ISSUED_PAR=`./lmstat -c fluent.dat -f fluent-par | grep "Users of" | awk '{print $6}'`
LIC_USED_PAR=`./lmstat -c fluent.dat -f fluent-par | grep "Users of" | awk '{print $11}'`
LIC_AVAIL_ALL=$(($LIC_ISSUED_ALL-$LIC_USED_ALL))
LIC_AVAIL_PAR=$(($LIC_ISSUED_PAR-$LIC_USED_PAR))
echo ""
echo "licences-par issued : $LIC_ISSUED_PAR"
echo "licences-par used : $LIC_USED_PAR"
echo ""
echo "licences(all) issued : $LIC_ISSUED_ALL"
echo "licences(all) used : $LIC_USED_ALL"
echo ""
LIC_AVAIL=$(($LIC_AVAIL_PAR+$LIC_AVAIL_ALL))
LIC_WANT=$(($LIC_REQ_ALL+$LIC_WANT_PAR))
echo "licences total avail: $LIC_AVAIL";
echo "licences total want : $LIC_WANT";
if [ "$LIC_AVAIL" -lt "$LIC_WANT" ] ; then
echo "exit with 99"
exit 99
# SGE Admin and User Guide, "Consequences of Different Error or Exit Codes":
# Job script or prolog/epilog: 99 --> Requeue
fi
#/software/Fluent.Inc/bin/fluent 3d -g -t$NSLOTS -ssh -i input.jou
echo "exit with 0"
exit 0
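The commented-out directives near the top of this script would request one fluentall and NSLOTS-1 fluent-par tickets per job. NSLOTS is only set when the job runs, so it cannot be used in #$ directives that qsub reads at submission; and because SGE multiplies a consumable request by the number of slots for parallel jobs, the equivalent per-slot requests are 1/N and (N-1)/N (as in the fluentall=0.25, fluent-par=0.75 example for four slots). A sketch that computes them, e.g. for a JSV to insert; per_slot_requests is a hypothetical name and DOUBLE-typed complexes are assumed:

```shell
#!/bin/bash
# HYPOTHETICAL: per-slot CCRA requests for an N-slot job, so that after
# SGE multiplies by N the job books 1 fluentall and N-1 fluent-par tickets.
# Output format "fluentall=X,fluent-par=Y" is our own choice.
per_slot_requests() {
    awk -v n="$1" 'BEGIN { printf "fluentall=%g,fluent-par=%g\n", 1 / n, (n - 1) / n }'
}
```

per_slot_requests 4 prints fluentall=0.25,fluent-par=0.75. For slot counts such as 3, the rounded 0.333333 books slightly less than one whole fluentall ticket, which is worth bearing in mind.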
8. Another Example, Tested Script: Research Licences Only
#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -pe fluent-smp.pe 12
LIC_REQ=$NSLOTS
#
# http://www.ace-net.ca/wiki/FLUENT:
#
# -- a "fluentall" token is always required (or should this be "fluent"? what is the difference?)
# -- Fluent requires a "fluent-par" token for each parallel slot over and above the initial slot, i.e., N-1
#
#
# See also: http://blogs.oracle.com/templedf/entry/license_management_with_grid_engine
#
# -- suggests, since we don't want many jobs being re-queued, preventing this mostly by
# monitoring the number of available licences (via a daemon and lmstat) and using this
# to update a consumable complex value corresponding to the number of licences, which
# stops jobs being scheduled in the first place...
LMSTAT=/opt/gridware/apps/binapps/fluent/13.0/v130/fluent/license/lnamd64/lmstat
LICENSEDAT=/opt/gridware/apps/binapps/fluent/13.0/v130/fluent/license/license.dat
LIC_ISSUED=`$LMSTAT -c $LICENSEDAT -f aa_r_cfd | grep "Users of" | awk '{print $6}'`
LIC_USED=`$LMSTAT -c $LICENSEDAT -f aa_r_cfd | grep "Users of" | awk '{print $11}'`
LIC=$(($LIC_ISSUED-$LIC_USED-$LIC_REQ))
echo "licences issued : $LIC_ISSUED"
echo "licences used : $LIC_USED"
echo "licences reqd. : $NSLOTS"
echo "licences spare : $LIC"
#### /bin/sleep 300
if [ "$LIC" -lt 0 ]; then
echo "exit with 99"
exit 99
# SGE Admin and User Guide, "Consequences of Different Error or Exit Codes":
# Job script or prolog/epilog: 99 --> Requeue
fi
/opt/gridware/apps/binapps/fluent/13.0/v130/fluent/bin/fluent 3d -g -t$NSLOTS -ssh -i input.jou
exit 0
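If the licence server cannot be reached, lmstat prints no "Users of" line, so LIC_ISSUED and LIC_USED come back empty and the $((...)) arithmetic in the scripts above no longer reflects the real licence position (depending on the values, bash may report an arithmetic error or compute a meaningless number). One possible hardening, sketched under the assumption that an unreadable licence count should be treated as "no licences"; the helper names are hypothetical:

```shell
#!/bin/bash
# HYPOTHETICAL: succeed only for a non-empty string of digits.
is_uint() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Usage: spare_or_requeue ISSUED USED REQUIRED
# Prints the spare count; returns 0 if enough, 99 if not enough or if the
# lmstat figures could not be parsed (treated as "no licences").
spare_or_requeue() {
    local issued=$1 used=$2 req=$3 spare
    if ! is_uint "$issued" || ! is_uint "$used"; then
        echo "cannot parse lmstat output: exit with 99" >&2
        return 99
    fi
    spare=$(( issued - used - req ))
    echo "$spare"
    (( spare >= 0 )) || return 99
}
```

In the research-licence script this might replace the LIC calculation and test with spare_or_requeue "$LIC_ISSUED" "$LIC_USED" "$LIC_REQ" || exit 99.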