SGE Notes
 -- PE settings: job_is_first_task, control_slaves...
 -- loose integration
 -- tight integration
     -- openmpi
     -- hp-mpi
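
A minimal sketch of how the two PE settings above differ between loose and tight integration. It only renders the text of a PE definition (the kind of file one would load with qconf -Ap); the field names follow sge_pe(5), while the PE name "openmpi_tight", the slot count and the $fill_up allocation rule are assumptions picked for illustration, not values from any real cluster.

```python
# Sketch: render a parallel environment (PE) definition as text.
# Field names follow the sge_pe(5) format. The PE name, slot count and
# allocation rule used below are hypothetical illustration values.

def render_pe(name, slots, tight=True):
    fields = [
        ("pe_name", name),
        ("slots", str(slots)),
        ("user_lists", "NONE"),
        ("xuser_lists", "NONE"),
        ("start_proc_args", "/bin/true"),
        ("stop_proc_args", "/bin/true"),
        ("allocation_rule", "$fill_up"),
        # TRUE: SGE starts and accounts for the slave tasks itself
        # (tight integration); FALSE leaves that to the MPI launcher.
        ("control_slaves", "TRUE" if tight else "FALSE"),
        # FALSE: the job script itself is not counted as one of the
        # parallel tasks (the usual choice for an mpirun wrapper script).
        ("job_is_first_task", "FALSE" if tight else "TRUE"),
        ("urgency_slots", "min"),
    ]
    width = max(len(key) for key, _ in fields)
    return "\n".join(f"{key.ljust(width)}  {value}" for key, value in fields)


if __name__ == "__main__":
    print(render_pe("openmpi_tight", 64))
```

Calling render_pe("mpi_loose", 64, tight=False) gives the loose-integration variant, with control_slaves FALSE and job_is_first_task TRUE.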
Fair shares:
1. http://ait.web.psi.ch/services/linux/hpc/merlin3/sge/admin/
2. http://wikis.sun.com/display/gridengine62u3/How+to+Create+Project-Based+Share-Tree+Scheduling+With+QMON
   http://wikis.sun.com/display/gridengine62u3/Configuring+the+Share-Based+Policy#ConfiguringtheShare-BasedPolicy-ConfiguringtheShareTreePolicyWithQMON
-----------------
http://gridengine.sunsource.net/news/SGE62u5-announce.html
 -- includes topology-aware scheduling
-----------------
http://wiki.gridengine.info/wiki/index.php/Main_Page
-----------------
Grid Engine Portal
 -- http://gridengine.sunsource.net/gep/GEP_Intro.html
Users authenticate to a portal interface from anywhere on the internet via a browser and can then:
    * Securely access and execute applications via a transparent interface to Grid Engine
    * Monitor the status of jobs running in Grid Engine
    * Securely upload input files to the Portal Server with the click of a button
    * Securely download output files to a local workstation with the click of a button
    * View X-windows based applications using VNC
Administrators can also remotely access the portal and perform administrative functions such as:
    * Registering applications for use with the GEP in a matter of minutes
    * Quickly building HTML interfaces to applications using templates that prompt users for input
    * Monitoring Grid Engine usage and statistics
----------------
 -- The battle: Globus vs LSF. Is there not a third way via SGE's SDM?
 -- how is the multiclustering gonna work?
     -- requires common filesystems 
     -- requires standard s/w stack
         -- not gonna work...
 -- SGE open source code is licensed under SISSL (Sun Industry Standards Source License)
 -- howtos http://gridengine.sunsource.net/howto/howto.html
 -- DRMAA API
 -- ARCo accounting and reporting (MySQL or Oracle)
---------------
SDM
http://blogs.sun.com/templedf/entry/service_domain_manager
supports
 -- cloud bursting
 -- powers down idle and underutilized machines
 -- not a metascheduler: moves compute nodes from one cluster to another
http://wikis.sun.com/display/GridEngine/Using+SDM+With+the+Sun+Grid+Engine+Adapter
----------------
SGE on Campus
 -- redqueen
 -- mace01
 -- man2
 -- usto oran (MACE)
 -- pacemaker (MHS)
 -- templar (FLS)
 -- agent (FLS)  
 -- epsilon (EPS)
 -- Brian Blower's cluster (MHS)
 -- terra (Duncan Irving, Earth Sciences)
 
-----------------
Topology Aware Scheduling
http://blogs.sun.com/templedf/entry/topology_aware_scheduling
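
A rough sketch of what requesting core binding at submit time could look like, assuming the qsub -binding option introduced with 6.2u5 and its linear:<amount> and striding:<amount>:<step> strategies. The helper name qsub_binding_args and the script name job.sh are made up for illustration; the function only builds the argument list and does not call qsub.

```python
# Sketch: build a qsub argument list that requests core binding.
# Assumes the 6.2u5 "-binding" option with the "linear" and "striding"
# strategies. The helper name and "job.sh" are hypothetical.

def qsub_binding_args(script, cores, step=None):
    if cores < 1:
        raise ValueError("cores must be at least 1")
    if step is None:
        # linear: bind to <cores> consecutive cores
        strategy = f"linear:{cores}"
    else:
        # striding: bind to <cores> cores, taking every <step>-th core
        strategy = f"striding:{cores}:{step}"
    return ["qsub", "-binding", strategy, script]


if __name__ == "__main__":
    print(" ".join(qsub_binding_args("job.sh", 4)))
    print(" ".join(qsub_binding_args("job.sh", 2, step=2)))
```

The resulting list can be handed to subprocess.run on a submit host; whether the binding is honoured depends on the execution host reporting its topology.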
------------------
Checkpointing
http://gridengine.sunsource.net/howto/checkpointing.html
 -- integrates with Condor checkpointing libraries (relink with condor_compile)
https://upc-bugs.lbl.gov//blcr/doc/html/FAQ.html
 -- BLCR (Berkeley Lab Checkpoint/Restart)