Materials Studio SGE Bug
Materials Studio refuses to accept the existence of some queues — but not others — in the SGE configuration of our HPC cluster. Why? Why?
Background
- Materials Studio has a nice native MS Windows GUI which talks to Apache and some Perl glue, which in turn talk to SGE (the batch system) on our Linux-based HPC cluster. All good.
- It has been working fine for six months, talking to our 7-day, 2-day and short/debug queues.
- Then we added some more (different) hardware, and so extra queues. We reconfigured Materials Studio to talk to the new queues as well and... it did not work! The MS Windows client could not see the new queues.
Diagnostics
Why can the MS Windows client not see the new queues?
- Client talks to Apache; Apache logs show it is calling /dsd/commands/dsd_getqueues.pl.
- Chasing subroutine call after subroutine call through the less-than-ideally-structured code, we find GetAvailableQueues within .../Gateway/root_default/dsd/commands/queues/SGE/dsd_sge.pm. It pokes through .../SGE/sge.cfg for queues listed under allowed_queues and compares them to the output of qconf -sql. Valid queues (those found in both) are returned to the MS Windows client. All good.
- Except that, for no good reason at all, GetAvailableQueues only looks at the first 10 queues and ignores the rest, so any queue beyond the tenth never reaches the client.
- Fixed.
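
To make the failure mode concrete, here is a minimal Python sketch of the logic described above. It is not the Materials Studio Perl code: the function names, the queue names and the limit parameter are all hypothetical, and we assume purely for illustration that the cap of 10 is applied to the allowed_queues list before it is compared with the qconf -sql output.

```python
def parse_qconf_sql(output):
    """Split text in the style of `qconf -sql` (one queue name per line) into a list."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def get_available_queues(allowed, sge_queues, limit=None):
    """Return the allowed queues that SGE also knows about, in config order.

    `limit` is a hypothetical knob that mimics the bug: when set, only the
    first `limit` entries of `allowed` are examined and the rest are ignored.
    """
    known = set(sge_queues)
    candidates = allowed if limit is None else allowed[:limit]
    return [q for q in candidates if q in known]


# Hypothetical queue names: ten original queues plus two for the new hardware.
allowed = ["old%d.q" % i for i in range(1, 11)] + ["new1.q", "new2.q"]

# Stand-in for the real `qconf -sql` output; every configured queue exists in SGE.
qconf_output = "\n".join(allowed) + "\n"
sge_queues = parse_qconf_sql(qconf_output)

buggy = get_available_queues(allowed, sge_queues, limit=10)  # capped at 10
fixed = get_available_queues(allowed, sge_queues)            # no cap

print(len(buggy), "new1.q" in buggy)
print(len(fixed), "new1.q" in fixed)
```

With the cap in place, the eleventh and twelfth queues are silently dropped even though SGE knows about them, which matches what the MS Windows client showed us. Removing the cap returns every queue present in both lists.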