Name and Address Resolution and Troubleshooting
1. Messages
Look in:
- $SGE_ROOT/default/spool/qmaster/messages;
- /tmp/sge_messages on the qmaster host.
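Both locations can be inspected in one go. The following is a minimal sketch rather than anything shipped with SGE: the function name show_sge_messages and the SGE_MSG_LINES variable are made up for illustration, and the 20-line default is arbitrary.

```shell
#!/bin/sh
# show_sge_messages: print the tail of each message file given as an
# argument, and say so when a file is missing or unreadable.
# (Hypothetical helper; SGE_MSG_LINES is an assumed, optional override.)
show_sge_messages() {
    for f in "$@"; do
        if [ -r "$f" ]; then
            echo "== $f =="
            tail -n "${SGE_MSG_LINES:-20}" "$f"
        else
            echo "== $f: missing or not readable =="
        fi
    done
}

# Example call on the qmaster host, using the two locations listed above:
#   show_sge_messages "$SGE_ROOT/default/spool/qmaster/messages" /tmp/sge_messages
```

A file reported as missing is not necessarily a problem; it may simply mean that nothing has been written to that location yet.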
2. Debug Mode
Run the daemons and utilities in debug mode to get extra messages (cf. ssh -v). Example:

    source $SGE_ROOT/util/dl.sh
    dl <level>                      # ...choose debug level...
    /etc/init.d/sge_qmaster start   # ...starts, does not background, spits out messages...

where the debug level is a number from 1 to 10, as described on DanT's blog. In a second terminal:

    source $SGE_ROOT/util/dl.sh
    dl <level>
    qstat                           # ...spits out messages in addition to the usual info (or not!)...
3. FAQ
- /etc/init.d/sge_qmaster simply locks up; qstat likewise
/etc/init.d/sge_qmaster simply locks up and refuses to complete. ps
shows that an instance of sge_qmaster is apparently running, but
qping is stuck. qstat locks up too and returns nowt.

Check that traffic from all qmaster host network interfaces can get through the local loopback interface, lo/127.0.0.1. Bizarrely, SGE requires that packets with each of the host's own source addresses (in this example 10.99.203.190, 10.2.2.250, 10.3.3.250 and 10.2.49.100) all traverse lo. Do you have a pinhole firewall in operation?

- commlib error (client IP resolved to host ""
Ensure that:

-- /etc/hosts on the qmaster is correct for ALL interfaces on the qmaster host;
-- $SGE_ROOT/default/common/host_aliases has all required entries;
-- given the above, hostname, hostname -f and, for ALL network interface names (long and short) and IPs, the appropriate one of

    $SGE_ROOT/utilbin/<arch>/gethostbyname -aname <qmastername>
    $SGE_ROOT/utilbin/<arch>/gethostbyaddr -aname <qmasterip>

   all return the same name.

Example:

In /etc/hosts:

    127.0.0.1       localhost.localdomain localhost
    10.99.203.190   test.manchester.ac.uk test
    #
    10.2.49.100     login-stg.test.manchester.ac.uk login-stg
    #
    10.2.2.250      login.test.manchester.ac.uk login
    10.3.3.250      login-3.test.manchester.ac.uk login-3

In host_aliases:

    login.test.manchester.ac.uk login login-3.test.manchester.ac.uk login-3 \
        login-stg.test.manchester.ac.uk login-stg test.manchester.ac.uk test

Some tests:

    hostname
    login.test.manchester.ac.uk
    hostname -f
    login.test.manchester.ac.uk
    ./gethostbyname -aname login-3
    login.test.manchester.ac.uk
    ./gethostbyname -aname login-3.test.manchester.ac.uk
    login.test.manchester.ac.uk
    ./gethostbyname -aname login.test.manchester.ac.uk
    login.test.manchester.ac.uk
    . . . .
    ./gethostbyaddr -aname 10.99.203.190
    login.test.manchester.ac.uk
    ./gethostbyaddr -aname 10.2.2.250
    login.test.manchester.ac.uk
    . . . .
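The "all return the same" comparison is tedious to do by eye when there are several interfaces, so it can be scripted. What follows is a minimal sketch under stated assumptions, not an SGE tool: the function names resolve_all and all_same are made up for illustration, and it assumes each -aname lookup prints exactly one name, as in the tests shown.

```shell
#!/bin/sh
# resolve_all: print what every lookup returns, one result per line.
#   $1 = directory containing gethostbyname and gethostbyaddr
#   $2 = space-separated host names (long and short)
#   $3 = space-separated IP addresses
# A lookup that fails prints a FAILED:<what> marker, so that it shows up
# as a mismatch instead of silently vanishing from the output.
resolve_all() {
    bindir=$1
    names=$2
    ips=$3
    hostname || echo "FAILED:hostname"
    hostname -f || echo "FAILED:hostname -f"
    for n in $names; do
        "$bindir/gethostbyname" -aname "$n" || echo "FAILED:$n"
    done
    for a in $ips; do
        "$bindir/gethostbyaddr" -aname "$a" || echo "FAILED:$a"
    done
}

# all_same: read one result per line on stdin; succeed only if there is
# at least one line and every line is identical to the first.
all_same() {
    awk 'NR == 1 { first = $0 }
         $0 != first { bad = 1 }
         END { exit (NR == 0 || bad) ? 1 : 0 }'
}

# Example call with the names and addresses used in this FAQ entry:
#   resolve_all "$SGE_ROOT/utilbin/<arch>" \
#       "login login-3 login-stg test" \
#       "10.99.203.190 10.2.2.250 10.3.3.250 10.2.49.100" | all_same \
#       && echo "consistent" || echo "MISMATCH"
```

When the check reports a mismatch, running resolve_all on its own without the pipe shows which lookup returned the odd name.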