Handling Fluent: CCAs, JSVs, Starter Methods and Re-queueing
1. |
JSV Notes |
-- http://arc.liv.ac.uk/SGE/htmlman/htmlman1/jsv.html
-- submit parameters (e.g., pe_name, pe_min, pe_max, ...): http://wikis.sun.com/display/gridengine62u5/Submit+Parameters
-- can we use a JSV to put all the "-l" stuff into job ENV?
   -- yes, use jsv_add_env, e.g.
      in the jsv script:
          jsv_add_env "JSV_EXAMPLE" "jsv_value_33"
      then in the qsub script:
          echo "//$JSV_EXAMPLE//"
From the man page (above) :
    1) qsub -jsv ...
    2) $cwd/.sge_request
    3) $HOME/.sge_request
    4) $SGE_ROOT/$SGE_CELL/common/sge_request
    5) Global configuration
The Client JSVs (1-3) can be defined by Grid Engine end users, whereas the client JSV defined in the global sge_request file (4) and the server JSV (5) can only be defined by the Grid Engine administrators. Due to the fact that (4) and (5) are defined and configured by Grid Engine administrators and executed as the last JSV instances in the sequence of JSV scripts, an administrator has a way to enforce certain policies for a cluster.
However, note that (4) may be over-ridden trivially with qsub -clear.
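The jsv_add_env idea noted above can be sketched as a small shell JSV. This is an illustration under assumptions, not a tested script: the complex names (fluentall, fluent-par) are the ones used elsewhere on this page, the JSV_L_* variable names are invented for the example, and the helper functions are those described in the jsv_script_interface man page, loaded from the jsv_include.sh file shipped with SGE.

```shell
#!/bin/sh
# Sketch only: copy selected hard resource requests ("-l name=value")
# into the job environment so that the job script can read them.

# Complex names to export; taken from this page, adjust as needed.
JSV_EXPORT_COMPLEXES="fluentall fluent-par"

jsv_on_start() {
    return
}

jsv_on_verify() {
    modified=false
    for name in $JSV_EXPORT_COMPLEXES; do
        if [ "$(jsv_sub_is_param l_hard "$name")" = "true" ]; then
            value=$(jsv_sub_get_param l_hard "$name")
            # Environment variable names cannot contain "-".
            var="JSV_L_$(echo "$name" | sed 's/-/_/g')"
            jsv_add_env "$var" "$value"
            modified=true
        fi
    done
    if [ "$modified" = "true" ]; then
        jsv_correct "Job was modified: resource requests copied to environment"
    else
        jsv_accept "Job is accepted"
    fi
    return
}

# Enter the JSV protocol loop only when the SGE helper library is present.
if [ -n "$SGE_ROOT" ] && [ -r "$SGE_ROOT/util/resources/jsv/jsv_include.sh" ]; then
    . "$SGE_ROOT/util/resources/jsv/jsv_include.sh"
    jsv_main
fi
```

With this in place a job submitted with "-l fluent-par=3" should see JSV_L_fluent_par=3 in its environment, provided the JSV is configured at one of the five levels listed above.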
2. |
The Issues |
- PE Integration
- We need SGE and Fluent to talk to each other to ensure that Fluent starts the SGE-determined number of processes and starts them on the right compute nodes. We also want to ensure processes are tidied up correctly at the end.
- Licensing
- Fluent uses FlexLM with floating network licences provided by the Uni licence servers. We don't want jobs to be scheduled, then fail to run because of a lack of licences. The School of MACE own all the licences.
3. |
PE Integration |
Fluent is SGE-aware: suitable PE scripts are supplied and qsub is not required. For example:
    fluent 3d -g -t64 -pnet -ssh -sge -sgepe fluent-32.pe 64 -i input.jou
where fluent-32.pe includes a Fluent-supplied kill script:
    pe_name            fluent-32.pe
    slots              999
    user_lists         NONE
    xuser_lists        NONE
    start_proc_args    /bin/true
    stop_proc_args     /opt/gridware/apps/binapps/fluent/6.3.26/Fluent.Inc/addons/sge1.0/kill-fluent
    allocation_rule    32
    control_slaves     FALSE
    job_is_first_task  FALSE
    urgency_slots      min
    accounting_summary FALSE
The Fluent-supplied kill script:
    #!/bin/sh
    FileName=kill-fluent-sge-$JOB_ID
    OldFileName=kill-fluent$JOB_ID   ## For backward compatibility with Fluent6.1
    if [ -w $PWD ] ; then
        FileName=$PWD/$FileName
        OldFileName=$PWD/$OldFileName
    else
        FileName=$HOME/$FileName
        OldFileName=/tmp/$OldFileName
    fi
    if [ -f $FileName ]; then
        /bin/sh $FileName
    elif [ -f $OldFileName ]; then
        /bin/sh $OldFileName
    fi
4. |
Background: Fluent's Licencing Model |
Using lmstat, e.g.,
    lmstat -c fluent_licence.dat -a
or
    lmstat -c fluent_licence.dat -f fluent-par
shows how many licence tickets are available and how many have been issued (and who has checked them out and on which host).
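The per-feature summary line that lmstat prints can be reduced to two numbers with a short filter. A minimal sketch, assuming the summary line has the "Users of <feature>: (Total of N licenses issued; Total of M licenses in use)" layout that the awk field numbers in the scripts later on this page rely on; the function name lic_counts and the sample line are made up for the example.

```shell
# Print "<issued> <used>" for the first "Users of" line read from stdin.
lic_counts() {
    awk '/Users of/ { print $6, $11; exit }'
}

# Example with a canned line instead of a live licence server:
sample='Users of fluent-par:  (Total of 32 licenses issued;  Total of 20 licenses in use)'
set -- $(printf '%s\n' "$sample" | lic_counts)
echo "issued=$1 used=$2 available=$(( $1 - $2 ))"
```

Against a live server the input would come from a command such as lmstat -c fluent_licence.dat -f fluent-par.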
MACE have some fluent tickets — which seem to be the same as fluentall; also some fluent-par tickets. A query to Ansys on 2011 Sept 22 gave some details on how it all works:
I am trying to determine how the fluent licence model works for parallel jobs. Is this correct: a 16-core process needs 16 licences/tickets; by preference it gets one fluent/fluentall ticket and 15 fluent-par tickets; if insufficient fluent-par tickets are available it tops up with fluent/fluentall tickets to get to 16 if possible. If only, say, 12 total are available the job will start with 12 processes only. Is this correct?
The answer was:
Your understanding is absolutely correct.
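The confirmed model reduces to a little arithmetic. The sketch below is an illustration only: the function name is invented, and it assumes that at least one fluent/fluentall ticket must be free for the job to start at all, which the exchange above implies but does not state outright.

```shell
# Usage: fluent_procs <cores wanted> <free fluentall> <free fluent-par>
# Prints the number of processes the job would start with.
fluent_procs() {
    want=$1
    free_all=$2
    free_par=$3
    # Assumption: the first ticket must be a fluent/fluentall ticket.
    if [ "$free_all" -lt 1 ]; then
        echo 0
        return
    fi
    total=$(( free_all + free_par ))
    if [ "$total" -lt "$want" ]; then
        echo "$total"    # start with fewer processes than requested
    else
        echo "$want"
    fi
}

fluent_procs 16 1 15    # exactly enough: 1 fluentall + 15 fluent-par
fluent_procs 16 4 8     # only 12 tickets in total
```

The two calls correspond to the cases in the query: a fully licensed 16-core job, and one that can only start with 12 processes.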
5. |
Possible Strategies |
Dan T summarises this nicely on his blog:
- Consumable Complex Resource Attributes
-
Use SGE's consumable complex resource attributes (which are
user-requested via the hard resource list — "-l thing").
Simply define CCRAs which match the FlexLM tickets and require users to
ask for the resources they need, for example
    #!/bin/bash
    #$ -pe fluent-smp.pe 4
    #$ -l fluentall=0.25
    #$ -l fluent-par=0.75
    # ...numbers are multiplied (by SGE) by NSLOTS...
Use a JSV to ensure users request the tickets they need.
Drawbacks:
- But what about the tickets being used by off-cluster hosts?
- There is a race condition:
- Custom Job Starter and Job Rescheduling
-
Use a qsub script to start Fluent: within that script, check if sufficient
licences are available; if not, return 99, so that the job is
re-scheduled (see SGE Admin and User Guide: Consequences of Different
Error or Exit Codes); otherwise, start the job.
Drawbacks:
- Do we really want a whole load of Fluent jobs sitting in the queue re-scheduling themselves?
- Race condition: e.g., an off-cluster Fluent instance could grab the tickets.
- FlexNet (FlexLM) Connector
-
An API for checking out and returning licenses. Perfect(?): a batch system
can check out licences as part of the scheduling.
Drawbacks:
- Not widely adopted; not adopted by SGE.
- Olesen Integration (qlicserver aka flex-grid)
-
A daemon runs which frequently determines the number of tickets available
and adjusts the matching CCRAs to match (e.g., by using qconf). This is
detailed
by Olesen. (Olesen's implementation uses SGE's load sensor framework
to run the daemon.)
Drawbacks:
- Race condition: the CCRA values will sometimes be out-of-date as off-cluster instances of Fluent start.
- Not clear how we could handle different numbers of tickets being available for different users?
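The core of the daemon-based approach is a periodic "measure, then adjust" step. The following is a rough sketch of that step, not Olesen's implementation: the function name is invented, the ticket counts are passed in rather than read from lmstat, and the command is only printed, not run. It assumes the consumable is attached to the global host, and that tickets held by the cluster's own jobs are already accounted for by SGE's consumable bookkeeping, so only use outside the cluster is subtracted.

```shell
# Usage: complex_update_cmd <complex name> <tickets issued> <tickets in use off-cluster>
# Prints the qconf command that would set the consumable's capacity.
complex_update_cmd() {
    name=$1
    issued=$2
    used_elsewhere=$3
    capacity=$(( issued - used_elsewhere ))
    # Never publish a negative value.
    if [ "$capacity" -lt 0 ]; then
        capacity=0
    fi
    echo "qconf -mattr exechost complex_values ${name}=${capacity} global"
}

complex_update_cmd fluent-par 32 20
```

Because the value is only refreshed periodically, the race condition listed above remains: an off-cluster Fluent instance can start between two updates.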
6. |
Our Approach |
6.1. |
Complications |
- Not all available Fluent licence tickets are necessarily available to each user (e.g., there may be 32 fluent-par tickets, but a limit of 20 such tickets per user).
- Not all tickets are necessarily available to each compute host. Moreover, FlexLM sees only the UNqualified hostname.
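Both complications can be folded into the availability check. A minimal sketch with invented function names; the per-user limit and the number of tickets the user already holds are passed in as plain numbers, since how to obtain them is site-specific, and the host name is only an example.

```shell
# FlexLM sees only the unqualified host name, so strip any domain part.
short_host() {
    echo "${1%%.*}"
}

# Usage: user_avail <free tickets> <per-user limit> <tickets the user holds>
# Prints how many further tickets this user could actually check out.
user_avail() {
    free=$1
    limit=$2
    held=$3
    headroom=$(( limit - held ))
    if [ "$headroom" -lt 0 ]; then
        headroom=0
    fi
    if [ "$headroom" -lt "$free" ]; then
        echo "$headroom"
    else
        echo "$free"
    fi
}

short_host node042.cluster.example.org
user_avail 32 20 8
```

In the second call 32 tickets are free, but a user limited to 20 who already holds 8 can only obtain 12 more.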
6.2. |
Steps |
- Use an approach based on the re-scheduling script below. It is hoped to move most of the script complexity into an SGE queue starter method (which shares its environment with the job script).
-- look at whether user-specific and cluster-specific ticket limits can be added to the flex-grid approach, together with...
   ...use "-l fluent=32" or similar...
   ...and, moving the "-l" values into the ENV via a JSV, do a second starter_method check with re-schedule
   N.B. Using "-l fluent=32" is pointless unless using qconf-adjusted CCRAs, as we can simply use NSLOTS (as in the script below)
-- ensure all fluent jobs are submitted to the right PE and queue (the latter to get the starter_method) via a JSV
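A queue starter_method is handed the job script and its arguments and is expected to run them, so the licence check can sit in front of the exec. A hedged sketch: free_tickets is a placeholder for the lmstat-based counting, FREE_TICKETS_FOR_DEMO is an invented variable used only to make the example self-contained, and it is an assumption that an exit status of 99 from the starter method is treated like 99 from the job script.

```shell
#!/bin/sh
# Return 99 (requeue) if fewer tickets are free than the job wants.
licence_gate() {
    free=$1
    want=$2
    if [ "$free" -lt "$want" ]; then
        return 99
    fi
    return 0
}

# Placeholder: a real version would query lmstat.
free_tickets() {
    echo "${FREE_TICKETS_FOR_DEMO:-0}"
}

# Check licences, then hand over to the job script given as arguments.
starter_main() {
    licence_gate "$(free_tickets)" "${NSLOTS:-1}" || exit 99
    exec "$@"
}
```

In an installed starter method the file would end with a call to starter_main "$@", so that the job script only runs once the gate has passed.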
7. |
Example, Tested Script |
The following qsub script uses lmstat to ask the licence servers how many Fluent licence tickets are available (fluentall and fluent-par), then determines whether it is worth attempting to run the Fluent job. If insufficient licences are available, the script returns 99, which instructs SGE to re-schedule the job. (The script has been tested successfully in a production environment.)
    #!/bin/bash

    #$ -S /bin/bash
    #$ -cwd
    #$ -q mixed.q
    #$ -pe fluent.pe 4

    ## -- ** The complications here to be moved to a (prolog script or) starter_method?
    ## #$ -l fluentall=1
    ## #$ -l fluent-par=$NSLOTS-1

    # See also: http://blogs.oracle.com/templedf/entry/license_management_with_grid_engine
    #
    #   -- suggests, since we don't want many jobs being re-queued, to prevent this mostly by
    #      monitoring the number of available licences (via a daemon and lmstat) and using this
    #      to update a consumable complex value corresponding to number of licences which stops
    #      job getting scheduled in the first place...

    LIC_REQ_ALL=1
    LIC_WANT_PAR=$(($NSLOTS-1))

    #
    # Ansys say this is correct:
    #
    #     a 16-core process needs
    #     16 licences/tickets; by preference it gets one fluent/fluentall
    #     ticket and 15 fluent-par tickets; if insufficient fluent-par
    #     tickets are available it tops up with fluent/fluentall tickets to
    #     get to 16 if possible. If only, say, 12 total are available the
    #     job will start with 12 processes only.
    #

    LIC_ISSUED_ALL=`./lmstat -c fluent.dat -f fluentall | grep "Users of" | awk '{print $6}'`
    LIC_USED_ALL=`./lmstat -c fluent.dat -f fluentall | grep "Users of" | awk '{print $11}'`

    LIC_ISSUED_PAR=`./lmstat -c fluent.dat -f fluent-par | grep "Users of" | awk '{print $6}'`
    LIC_USED_PAR=`./lmstat -c fluent.dat -f fluent-par | grep "Users of" | awk '{print $11}'`

    LIC_AVAIL_ALL=$(($LIC_ISSUED_ALL-$LIC_USED_ALL))
    LIC_AVAIL_PAR=$(($LIC_ISSUED_PAR-$LIC_USED_PAR))

    echo ""
    echo "licences-par issued : $LIC_ISSUED_PAR"
    echo "licences-par used : $LIC_USED_PAR"
    echo ""
    echo "licences(all) issued : $LIC_ISSUED_ALL"
    echo "licences(all) used : $LIC_USED_ALL"
    echo ""

    LIC_AVAIL=$(($LIC_AVAIL_PAR+$LIC_AVAIL_ALL))
    LIC_WANT=$(($LIC_REQ_ALL+$LIC_WANT_PAR))

    echo "licences total avail: $LIC_AVAIL";
    echo "licences total want : $LIC_WANT";

    if [ "$LIC_AVAIL" -lt "$LIC_WANT" ] ; then
        echo "exit with 99"
        exit 99
            # SGE Admin and User Guide, "Consequences of Different Error or Exit Codes":
            #     Job script or prolog/epilog: 99 --> Requeue
    fi

    #/software/Fluent.Inc/bin/fluent 3d -g -t$NSLOTS -ssh -i input.jou

    echo "exit with 0"
    exit 0
8. |
Example, Tested Script Too: Research Licences Only |
    #!/bin/bash

    #$ -S /bin/bash
    #$ -cwd
    #$ -pe fluent-smp.pe 12

    LIC_REQ=$NSLOTS

    #
    # http://www.ace-net.ca/wiki/FLUENT:
    #
    #   -- a "fluentall" token always required (or should this be "fluent" --- what is the difference?)
    #   -- fluent requires a "fluent-par" token for each parallel slot over and above the initial slot, e.g., N-1
    #
    #
    # See also: http://blogs.oracle.com/templedf/entry/license_management_with_grid_engine
    #
    #   -- suggests, since we don't want many jobs being re-queued, to prevent this mostly by
    #      monitoring the number of available licences (via a daemon and lmstat) and using this
    #      to update a consumable complex value corresponding to number of licences which stops
    #      job getting scheduled in the first place...

    LMSTAT=/opt/gridware/apps/binapps/fluent/13.0/v130/fluent/license/lnamd64/lmstat
    LICENSEDAT=/opt/gridware/apps/binapps/fluent/13.0/v130/fluent/license/license.dat

    LIC_ISSUED=`$LMSTAT -c $LICENSEDAT -f aa_r_cfd | grep "Users of" | awk '{print $6}'`
    LIC_USED=`$LMSTAT -c $LICENSEDAT -f aa_r_cfd | grep "Users of" | awk '{print $11}'`

    LIC=$(($LIC_ISSUED-$LIC_USED-$LIC_REQ))

    echo "licences issued : $LIC_ISSUED"
    echo "licences used : $LIC_USED"
    echo "licences reqd. : $NSLOTS"
    echo "licences spare : $LIC"

    #### /bin/sleep 300

    if [ "$LIC" -lt 0 ]; then
        echo "exit with 99"
        exit 99
            # SGE Admin and User Guide, "Consequences of Different Error or Exit Codes":
            #     Job script or prolog/epilog: 99 --> Requeue
    fi

    /opt/gridware/apps/binapps/fluent/13.0/v130/fluent/bin/fluent 3d -g -t$NSLOTS -ssh -i input.jou

    exit 0
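Both scripts feed the lmstat output straight into shell arithmetic, so an unreachable licence server (empty output) would produce an arithmetic error rather than a clean decision. A small guard, sketched here with invented function names, checks the values first; treating a failed query as "requeue" is an assumption about the desired policy, not something the scripts above do.

```shell
# Succeeds only if the argument is a non-empty string of digits.
is_uint() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Usage: spare_tickets <issued> <used> <required>
# Prints issued - used - required; returns 99 if any value is unusable.
spare_tickets() {
    for v in "$1" "$2" "$3"; do
        is_uint "$v" || return 99
    done
    echo $(( $1 - $2 - $3 ))
}

spare_tickets 25 10 12
spare_tickets "" 10 12 || echo "query failed, would exit $?"
```

In the job script the result would replace the bare LIC=$((...)) line, with the script exiting 99 whenever spare_tickets fails or prints a negative number.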