Handling Fluent: CCAs, JSVs, Starter Methods and Re-queueing

1. 

JSV Notes

 -- http://arc.liv.ac.uk/SGE/htmlman/htmlman1/jsv.html

 -- submit parameters (e.g., pe_name, pe_min, pe_max, ...)
        http://wikis.sun.com/display/gridengine62u5/Submit+Parameters


     -- can we use a JSV to put all the "-l" stuff into job ENV?
         -- yes, use jsv_add_env, e.g. in jsv script
                jsv_add_env "JSV_EXAMPLE" "jsv_value_33"
            then in qsub script
                echo "//$JSV_EXAMPLE//"

From the man page (above) :

          1) qsub -jsv ...
          2) $cwd/.sge_request
          3) $HOME/.sge_request
          4) $SGE_ROOT/$SGE_CELL/common/sge_request
          5) Global configuration

       The Client JSVs (1-3) can be defined by Grid Engine end users, whereas
       the client JSV defined in the global sge_request file (4) and the
       server JSV (5) can only be defined by the Grid Engine administrators.

       Due to the fact that (4) and (5) are defined and configured by Grid
       Engine administrators and executed as the last JSV instances in the
       sequence of JSV scripts, an administrator has a way to enforce certain
       policies for a cluster.  However, note that (4) may be overridden
       trivially with qsub -clear.
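
       For example (the job script name here is just illustrative), the following
       discards everything picked up from the sge_request files, including any
       -jsv defined in (4):

           qsub -clear -pe fluent.pe 4 fluent-job.sh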

2. 

The Issues

PE Integration
We need SGE and Fluent to talk to each other to ensure that Fluent starts the SGE-determined number of processes and starts them on the right compute nodes. We also want to ensure processes are tidied up correctly at the end.
Licensing
Fluent uses FlexLM with floating network licences provided by the Uni licence servers. We don't want jobs to be scheduled, then fail to run because of a lack of licences. The School of MACE own all the licences.

3. 

PE Integration

Fluent is SGE-aware: suitable PE helper scripts are supplied, and the fluent command can submit itself to SGE, so an explicit qsub wrapper is not required. For example:

    fluent 3d -g -t64 -pnet -ssh -sge -sgepe fluent-32.pe 64 -i input.jou
where fluent-32.pe uses a Fluent-supplied kill script as its stop_proc_args:
    pe_name            fluent-32.pe
    slots              999
    user_lists         NONE
    xuser_lists        NONE
    start_proc_args    /bin/true
    stop_proc_args     /opt/gridware/apps/binapps/fluent/6.3.26/Fluent.Inc/addons/sge1.0/kill-fluent
    allocation_rule    32
    control_slaves     FALSE
    job_is_first_task  FALSE
    urgency_slots      min 
    accounting_summary FALSE

The Fluent-supplied kill script:

    #!/bin/sh

    FileName=kill-fluent-sge-$JOB_ID
    OldFileName=kill-fluent$JOB_ID  ## For backward compatibility with Fluent6.1

    if [ -w $PWD ] ; then
        FileName=$PWD/$FileName
        OldFileName=$PWD/$OldFileName
    else
        FileName=$HOME/$FileName
        OldFileName=/tmp/$OldFileName
    fi

    if [ -f $FileName ]; then
        /bin/sh  $FileName
    elif [ -f $OldFileName ]; then
        /bin/sh  $OldFileName
    fi
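
For reference, the PE definition above is registered with SGE in the usual way; the file name is illustrative, and mixed.q is the queue used in the example script later on this page:

    # write the PE definition to a file and add it
    qconf -Ap fluent-32.pe.txt

    # or create/inspect it interactively
    qconf -ap fluent-32.pe
    qconf -sp fluent-32.pe

    # then add fluent-32.pe to the pe_list of the relevant queue
    qconf -mq mixed.q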

4. 

Background: Fluent's Licensing Model

Using lmstat, e.g.,

    lmstat -c fluent_licence.dat -a 
or
    lmstat -c fluent_licence.dat -f fluent-par 
shows how many licence tickets are available and how many have been issued (and who has checked them out and on which host).
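
The counts come from the "Users of ..." line in the lmstat output, which looks something like this (the numbers here are made up):

    Users of fluent-par:  (Total of 64 licenses issued;  Total of 12 licenses in use)

Splitting that line on whitespace, field 6 is the issued count and field 11 the in-use count, which is what the awk extraction in the scripts below relies on.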

MACE have some fluent tickets — which seem to be the same as fluentall; also some fluent-par tickets. A query to Ansys on 2011 Sept 22 gave some details on how it all works:

    I am trying to determine how the fluent licence model works
    for parallel jobs.  Is this correct:  a 16-core process needs
    16 licences/tickets;  by preference it gets one fluent/fluentall
    ticket and 15 fluent-par tickets;  if insufficient fluent-par
    tickets are available it tops up with fluent/fluentall tickets to
    get to 16 if possible.  If only, say, 12 total are available the
    job will start with 12 processes only.  Is this correct?
The answer was:
    Your understanding is absolutely correct.

5. 

Possible Strategies

Dan T summarises this nicely on his blog:

Consumable Complex Resource Attributes
Use SGE's consumable complex resource attributes (CCRAs), which users request via the hard resource list ("-l thing"). Simply define CCRAs which match the FlexLM tickets and require users to ask for the resources they need, for example
  #!/bin/bash
  
  #$ -pe fluent-smp.pe 4

  #$ -l fluentall=0.25
  #$ -l fluent-par=0.75
             # ...the numbers are multiplied (by SGE) by NSLOTS...
Use a JSV to ensure users request the tickets they need. (A sketch of the underlying complex definitions appears after this list.)

Drawbacks:
  • But what about the tickets being used by off-cluster hosts? The CCRA counts know nothing about them.
  • There is a race condition: an off-cluster Fluent instance can grab tickets that SGE still counts as available.

Custom Job Starter and Job Rescheduling
Use a qsub script to start Fluent: within that script, check whether sufficient licences are available; if not, exit with 99 so that the job is re-scheduled (see SGE Admin and User Guide: Consequences of Different Error or Exit Codes); otherwise, start the job.

Drawbacks:
  • Do we really want a whole load of Fluent jobs sitting in the queue re-scheduling themselves?
  • Race condition: e.g., an off-cluster Fluent instance could grab the tickets.

FlexNet (FlexLM) Connector
An API for checking out and returning licences. Perfect(?): a batch system can check out licences as part of the scheduling.

Drawbacks:
  • Not widely adopted; not adopted by SGE.

Olesen Integration (qlicserver aka flex-grid)
A daemon runs frequently, determines the number of tickets available and adjusts the matching CCRAs to match (e.g., by using qconf). This is detailed by Olesen. (Olesen's implementation uses SGE's load sensor framework to run the daemon.)

Drawbacks:
  • Race condition: the CCRA values will sometimes be out of date as off-cluster instances of Fluent start.
  • It is not clear how we could handle different numbers of tickets being available to different users.
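
For the CCRA-based strategies (the first and the flex-grid one), the consumables themselves would be defined roughly as follows; the names match the FlexLM features above, but the capacities are made-up examples, not our live configuration:

    # qconf -mc  -- add a consumable complex per FlexLM feature:
    #name        shortcut     type   relop  requestable  consumable  default  urgency
    fluentall    fluentall    INT    <=     YES          YES         0        0
    fluent-par   fluent-par   INT    <=     YES          YES         0        0

    # qconf -me global  -- set the cluster-wide capacities (the values a flex-grid
    # style daemon would keep adjusting to track lmstat):
    complex_values    fluentall=25,fluent-par=64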

6. 

Our Approach

6.1. 

Complications

  1. Not all available Fluent licence tickets are necessarily available to each user (e.g., there may be 32 fluent-par tickets, but a limit of 20 such tickets per user).
  2. Not all tickets are necessarily available to each compute host. Moreover, FlexLM sees only the UNqualified hostname.

6.2. 

Steps

  1. Use an approach based on the re-scheduling script below. It is hoped to move most of the script complexity into an SGE queue starter_method (which shares its environment with the job script).
  2.  -- look at whether user-specific and cluster-specific ticket limits can be
        added to the flex-grid approach, together with...

     ...use of "-l fluent=32" or similar...

     ...and, moving the "-l" values into the ENV via a JSV, do a second
        starter_method check with re-schedule.

     N.B. Using "-l fluent=32" is pointless unless using qconf-adjusted CCRAs,
          as we can simply use NSLOTS (as in the script below).
  3.  -- ensure all fluent jobs are submitted to the right PE and queue (the latter
        to get the starter_method) via a JSV; a sketch of such a JSV follows this list.
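
A sketch of the JSV for step 3, using the PE and queue names from the examples elsewhere on this page (fluent-32.pe, mixed.q); whether we key on the requested PE name, the job name or something else is still to be decided:

    #!/bin/sh
    # Server JSV sketch: route any job that asked for a fluent* PE into the agreed
    # PE and into the queue that carries the starter_method.

    jsv_on_start()
    {
        return
    }

    jsv_on_verify()
    {
        pe=`jsv_get_param pe_name`
        case "$pe" in
            fluent*)
                jsv_set_param pe_name "fluent-32.pe"
                jsv_set_param q_hard "mixed.q"
                jsv_correct "Fluent job routed to fluent-32.pe / mixed.q"
                ;;
            *)
                jsv_accept "not a Fluent job"
                ;;
        esac
        return
    }

    . ${SGE_ROOT}/util/resources/jsv/jsv_include.sh
    jsv_main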

7. 

Example, Tested Script

The following qsub script uses lmstat to enquire of the licence servers how many Fluent licence tickets are available (fluentall and fluent-par), then determines whether it is worth attempting to run the Fluent job. If insufficient licences are available, the script exits with 99, which instructs SGE to re-schedule the job. (The script has been successfully tested in a production environment.)

#!/bin/bash

#$ -S /bin/bash
#$ -cwd
#$ -q mixed.q
#$ -pe fluent.pe 4

    ## -- ** The complications here to be moved to a (prolog script or) starter_method?

##  #$ -l fluentall=1
##  #$ -l fluent-par=$NSLOTS-1

    # See also:  http://blogs.oracle.com/templedf/entry/license_management_with_grid_engine
    #
    #  -- suggests, since we don't want many jobs being re-queued, preventing this mostly by
    #     monitoring the number of available licences (via a daemon and lmstat) and using this
    #     to update a consumable complex value corresponding to the number of licences, which stops
    #     jobs getting scheduled in the first place...


LIC_REQ_ALL=1
LIC_WANT_PAR=$(($NSLOTS-1))
    #
    # Ansys say this is correct:
    #
    # a 16-core process needs
    # 16 licences/tickets;  by preference it gets one fluent/fluentall
    # ticket and 15 fluent-par tickets;  if insufficient fluent-par
    # tickets are available it tops up with fluent/fluentall tickets to
    # get to 16 if possible.  If only, say, 12 total are available the
    # job will start with 12 processes only.  
    #

LIC_ISSUED_ALL=`./lmstat -c fluent.dat -f fluentall | grep "Users of" | awk '{print $6}'`
LIC_USED_ALL=`./lmstat -c fluent.dat -f fluentall | grep "Users of" | awk '{print $11}'`

LIC_ISSUED_PAR=`./lmstat -c fluent.dat -f fluent-par | grep "Users of" | awk '{print $6}'`
LIC_USED_PAR=`./lmstat -c fluent.dat -f fluent-par | grep "Users of" | awk '{print $11}'`

LIC_AVAIL_ALL=$(($LIC_ISSUED_ALL-$LIC_USED_ALL))
LIC_AVAIL_PAR=$(($LIC_ISSUED_PAR-$LIC_USED_PAR))

echo ""
echo "licences-par issued : $LIC_ISSUED_PAR"
echo "licences-par used   : $LIC_USED_PAR"
echo ""
echo "licences(all) issued : $LIC_ISSUED_ALL"
echo "licences(all) used   : $LIC_USED_ALL"
echo ""

LIC_AVAIL=$(($LIC_AVAIL_PAR+$LIC_AVAIL_ALL))
LIC_WANT=$(($LIC_REQ_ALL+$LIC_WANT_PAR))

echo "licences total avail:  $LIC_AVAIL";
echo "licences total want :  $LIC_WANT";


if [ "$LIC_AVAIL" -lt "$LIC_WANT" ] ; then
    echo "exit with 99"
    exit 99
        # SGE Admin and User Guide, "Consequences of Different Error or Exit Codes": 
        #     Job script or prolog/epilog: 99 --> Requeue
fi

#/software/Fluent.Inc/bin/fluent 3d -g -t$NSLOTS -ssh -i input.jou

echo "exit with 0"
exit 0

8. 

Example, Tested Script Too: Research Licences Only

#!/bin/bash

#$ -S /bin/bash
#$ -cwd
#$ -pe fluent-smp.pe 12

LIC_REQ=$NSLOTS
    #
    # http://www.ace-net.ca/wiki/FLUENT:
    #     
    #  -- a "fluentall" token always required (or should this be "fluent" --- what is the difference?)
    #  -- fluent requires a "fluent-par" token for each parallel slot over and above the initial slot, e.g., N-1
    #     

    #
    # See also:  http://blogs.oracle.com/templedf/entry/license_management_with_grid_engine
    #
    #  -- suggests, since we don't want many jobs being re-queued, preventing this mostly by
    #     monitoring the number of available licences (via a daemon and lmstat) and using this
    #     to update a consumable complex value corresponding to the number of licences, which stops
    #     jobs getting scheduled in the first place...

LMSTAT=/opt/gridware/apps/binapps/fluent/13.0/v130/fluent/license/lnamd64/lmstat
LICENSEDAT=/opt/gridware/apps/binapps/fluent/13.0/v130/fluent/license/license.dat

LIC_ISSUED=`$LMSTAT -c $LICENSEDAT -f aa_r_cfd | grep "Users of" | awk '{print $6}'`
LIC_USED=`$LMSTAT -c $LICENSEDAT -f aa_r_cfd | grep "Users of" | awk '{print $11}'`

LIC=$(($LIC_ISSUED-$LIC_USED-$LIC_REQ))

echo "licences issued : $LIC_ISSUED"
echo "licences used   : $LIC_USED"
echo "licences reqd.  : $NSLOTS"
echo "licences spare  : $LIC" 

#### /bin/sleep 300

if [ "$LIC" -lt 0 ]; then
   echo "exit with 99"
   exit 99
       # SGE Admin and User Guide, "Consequences of Different Error or Exit Codes": 
       #     Job script or prolog/epilog: 99 --> Requeue
fi

/opt/gridware/apps/binapps/fluent/13.0/v130/fluent/bin/fluent 3d -g -t$NSLOTS -ssh -i input.jou

exit 0