SGE Notes
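-- parallel jobs: the job script runs as the first (master) task and controls the slave tasks
-- loose integration: slave tasks are started outside SGE (e.g. via rsh/ssh), so SGE cannot account for them or clean them up
-- tight integration: slave tasks are started through SGE (qrsh -inherit), so their usage is accounted and they are killed with the job
-- Open MPI
-- HP-MPI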
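Under loose integration the job script (the master task) has to pass the granted hosts to the MPI launcher itself. A minimal sketch, assuming the usual $PE_HOSTFILE layout of `<host> <slots> <queue> <processor-range>` per line, converts that file into a one-line-per-slot machinefile. Tightly integrated MPIs typically read the allocation themselves, so this is only needed in the loose case. The function names are hypothetical.

```python
"""Convert an SGE $PE_HOSTFILE into an MPI-style machinefile (loose integration).

Assumption: each $PE_HOSTFILE line looks like
    <hostname> <slots> <queue@host> <processor-range>
e.g. "node01 4 all.q@node01 UNDEFINED". Only the first two fields are used.
"""
import os
import sys


def parse_pe_hostfile(text):
    """Return a list of (hostname, slots) pairs from $PE_HOSTFILE contents."""
    hosts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        hosts.append((fields[0], int(fields[1])))
    return hosts


def to_machinefile(hosts):
    """One line per granted slot, a layout many MPI launchers accept."""
    return "".join(f"{host}\n" * slots for host, slots in hosts)


if __name__ == "__main__":
    # SGE sets PE_HOSTFILE inside a parallel job; outside one, just say so.
    path = os.environ.get("PE_HOSTFILE")
    if path:
        with open(path) as f:
            sys.stdout.write(to_machinefile(parse_pe_hostfile(f.read())))
    else:
        print("PE_HOSTFILE not set; run this inside an SGE parallel job", file=sys.stderr)
```

In a job script the output would be redirected to a file and handed to the launcher's machinefile option.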
Fair shares:
1. http://ait.web.psi.ch/services/linux/hpc/merlin3/sge/admin/
2. http://wikis.sun.com/display/gridengine62u3/How+to+Create+Project-Based+Share-Tree+Scheduling+With+QMON
3. http://wikis.sun.com/display/gridengine62u3/Configuring+the+Share-Based+Policy#ConfiguringtheShare-BasedPolicy-ConfiguringtheShareTreePolicyWithQMON
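The core idea of a project-based share tree is that each node's slice is its shares relative to its siblings, multiplied down the tree. A simplified sketch of that static entitlement, assuming this proportional model and ignoring the decayed historical usage that the real SGE share-tree policy also weighs in; the `Node` type and tree below are hypothetical.

```python
"""Static share-tree entitlements: a simplified model that ignores past usage.

Assumption: a node's slice of its parent's entitlement is its shares divided
by the total shares of its siblings. The real SGE share-tree policy also
compares these targets with decayed historical usage, which is left out here.
"""
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    shares: int
    children: list = field(default_factory=list)  # list of Node


def entitlements(node, fraction=1.0, path=""):
    """Map each leaf's path to its fraction of the whole cluster."""
    label = f"{path}/{node.name}" if path else node.name
    if not node.children:
        return {label: fraction}
    total = sum(child.shares for child in node.children)
    result = {}
    for child in node.children:
        # A child receives its proportion of the parent's fraction;
        # if all siblings have zero shares, each receives nothing.
        share = child.shares / total if total else 0.0
        result.update(entitlements(child, fraction * share, label))
    return result


if __name__ == "__main__":
    # Hypothetical project-based tree: two projects with users underneath.
    tree = Node("root", 1, [
        Node("projA", 70, [Node("alice", 1), Node("bob", 1)]),
        Node("projB", 30, [Node("carol", 1)]),
    ])
    for leaf, frac in entitlements(tree).items():
        print(f"{leaf:<20} {frac:.2f}")
```

With 70/30 project shares, alice and bob each end up with 0.35 of the cluster and carol with 0.30.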
-----------------
http://gridengine.sunsource.net/news/SGE62u5-announce.html
-- adds topology-aware scheduling (core binding)
-----------------
http://wiki.gridengine.info/wiki/index.php/Main_Page
-----------------
Grid Engine Portal
-- http://gridengine.sunsource.net/gep/GEP_Intro.html
Users authenticate to a portal interface from anywhere on the internet via a browser and can then:
* Securely access and execute applications via a transparent interface to Grid Engine
* Monitor the status of jobs running in Grid Engine
* Securely upload input files to the Portal Server with the click of a button
* Securely download output files to a local workstation with the click of a button
* View X Window System (X11) applications using VNC
Administrators can also remotely access the portal and perform administrative functions such as:
* Registering applications for use with the GEP in a matter of minutes
* Quickly building HTML interfaces to applications using templates that prompt users for input
* Monitoring Grid Engine usage and statistics
----------------
-- The battle: Globus vs LSF. Is there not a third way via SGE's SDM?
-- how is multi-clustering going to work?
-- requires a common filesystem
-- requires a standard software stack
-- not going to work...
-- open-source SGE is licensed under the SISSL (Sun Industry Standards Source License), not the GPL
-- howtos http://gridengine.sunsource.net/howto/howto.html
-- DRMAA API (Distributed Resource Management Application API)
-- ARCo (Accounting and Reporting Console), backed by MySQL or Oracle
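ARCo loads SGE's reporting data into a database for querying. For a quick ad-hoc total without ARCo, a sketch like the following sums wall-clock hours per user straight from the flat accounting file. It assumes the colon-separated layout described in accounting(5), with the owner in field 4 and ru_wallclock (seconds) in field 14; check that field order against your SGE version. The function name is hypothetical.

```python
"""Total wall-clock hours per user from an SGE accounting file.

Assumption: the classic colon-separated accounting(5) layout, where
field 4 (index 3) is the job owner and field 14 (index 13) is
ru_wallclock in seconds. Verify the field order for your SGE version.
"""
from collections import defaultdict

OWNER = 3          # assumed index of the owner field
RU_WALLCLOCK = 13  # assumed index of ru_wallclock (seconds)


def wallclock_hours_by_owner(lines):
    """Sum ru_wallclock per owner over an iterable of accounting lines."""
    totals = defaultdict(float)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip header comments and blank lines
        fields = line.rstrip("\n").split(":")
        if len(fields) <= RU_WALLCLOCK:
            continue  # malformed or truncated record
        totals[fields[OWNER]] += float(fields[RU_WALLCLOCK]) / 3600.0
    return dict(totals)
```

For the default cell the file usually lives at `$SGE_ROOT/default/common/accounting`; an open file object can be passed directly, since it iterates line by line.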
---------------
SDM (Service Domain Manager)
http://blogs.sun.com/templedf/entry/service_domain_manager
Supports:
-- cloud bursting
-- powers down idle and underutilized machines
-- not a metascheduler; instead it moves compute nodes from one cluster to another
http://wikis.sun.com/display/GridEngine/Using+SDM+With+the+Sun+Grid+Engine+Adapter
----------------
SGE on Campus
-- redqueen
-- mace01
-- man2
-- usto oran (MACE)
-- pacemaker (MHS)
-- templar (FLS)
-- agent (FLS)
-- epsilon (EPS)
-- Brian Blower's cluster (MHS)
-- terra (Duncan Irving, Earth Sciences)
-----------------
Topology Aware Scheduling
http://blogs.sun.com/templedf/entry/topology_aware_scheduling
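Core binding in 6.2u5 offers strategies such as `linear` (consecutive cores) and `striding` (every n-th core). A simplified sketch of which cores each would pick, assuming an m_topology-style host string like "SCCCCSCCCC" (S = socket, C = core, T = hardware thread); unlike SGE, it does not skip cores already bound by other jobs. The function names are hypothetical.

```python
"""Which cores would 'linear' vs 'striding' core binding pick?

A simplified model, assuming an m_topology-style string such as
"SCCCCSCCCC" (S = socket, C = core, T = hardware thread, ignored here).
Real SGE also skips cores already bound by other jobs; this does not.
"""


def cores(topology):
    """List (socket, core) pairs in the order they appear in the string."""
    result, socket, core = [], -1, -1
    for ch in topology.upper():
        if ch == "S":
            socket, core = socket + 1, -1  # new socket, restart core count
        elif ch == "C":
            core += 1
            result.append((socket, core))
    return result


def linear(topology, amount):
    """Consecutive cores, filling one socket before moving to the next."""
    available = cores(topology)
    if amount > len(available):
        raise ValueError("not enough cores on this host")
    return available[:amount]


def striding(topology, amount, step):
    """Every step-th core, starting from the first one."""
    chosen = cores(topology)[::step][:amount]
    if len(chosen) < amount:
        raise ValueError("not enough cores for this stride")
    return chosen
```

On a two-socket, four-core host, `linear` with two cores stays on socket 0, while `striding` with a step of 4 spreads the job across both sockets. The corresponding submit request would look like `qsub -binding striding:2:4 job.sh`.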
------------------
Checkpointing
http://gridengine.sunsource.net/howto/checkpointing.html
-- integrates with the Condor checkpointing libraries (applications are relinked with condor_compile)
https://upc-bugs.lbl.gov//blcr/doc/html/FAQ.html
-- BLCR (Berkeley Lab Checkpoint/Restart)
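BLCR and the Condor libraries checkpoint the process image, while SGE's "application-level" checkpointing interface leaves saving and restoring state to the job itself. A minimal sketch of that application-level pattern, assuming the job can write a small JSON state file (the file path, state layout and function names are hypothetical): state is saved atomically every N steps and reloaded on start, so a restarted or migrated job resumes instead of starting over. qsub(1) documents a RESTARTED environment variable set for restarted checkpointing jobs; this sketch simply resumes whenever the file exists.

```python
"""Application-level checkpoint/restart (a sketch, not SGE's own mechanism).

Assumptions: the job periodically writes its own state to a file, ideally
on shared storage, and on (re)start resumes from that file if present.
The path and the JSON state layout are hypothetical.
"""
import json
import os


def load_state(path):
    """Return the saved state, or a fresh one if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "total": 0}


def save_state(state, path):
    """Write to a temp file, then rename, so a crash never leaves half a checkpoint."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)  # atomic rename within one POSIX filesystem


def run(path, last_step, every=100):
    """Toy workload: sum 0..last_step-1, checkpointing every `every` steps."""
    state = load_state(path)
    while state["step"] < last_step:
        state["total"] += state["step"]
        state["step"] += 1
        if state["step"] % every == 0:
            save_state(state, path)
    save_state(state, path)  # final checkpoint
    return state["total"]
```

Calling `run` a second time with a larger `last_step` continues from the saved step rather than from zero, which is what a restarted job would do.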