Running comet on a cluster

From CometWiki
Revision as of 10:56, 13 April 2009 by Markdayel (Talk | contribs)


We highly recommend running 'comet' on a cluster of machines rather than on a single one. The program takes some time to run (an hour or two for early events such as symmetry breaking; overnight if you want to look at the later motility, pulsatile motion, etc.). Running the program concurrently on at least 5 machines (I used 9) lets you test the effect of varying one parameter through a range of values: you can set it going, come back the following day, and have the whole thing laid out for you. I've found this very useful, and have included a set of scripts to automate the process.


I recommend putting a default cometparams.ini into a main directory for the data, e.g. ~/runs. In that directory, run the varyset script (included with the source):

varyset <parameter> <startval> <endval> <number of steps>

This will create a subdirectory within runs that contains subdirectories numbered 1, 2, 3, etc., each containing a version of the cometparams.ini file with the <parameter> value varying in linear steps between <startval> and <endval>. It will also append the information needed to run the individual comet jobs to ~/joblist.
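The varyset script itself ships with the source, but its behavior can be sketched roughly as follows. This is a hypothetical reimplementation, assuming cometparams.ini holds one "name value" pair per line; the set-directory naming scheme and the JOBLIST override are my inventions:

```shell
#!/bin/sh
# Sketch of what varyset does (hypothetical reimplementation, not the
# bundled script). Assumes cometparams.ini holds one "name value" pair
# per line; the set-directory naming scheme here is an assumption.
varyset() {
    param=$1; start=$2; end=$3; steps=$4
    setdir="$param-$start-$end"
    joblist=${JOBLIST:-$HOME/joblist}
    i=1
    while [ "$i" -le "$steps" ]; do
        # linear interpolation: start + (i-1) * (end-start) / (steps-1)
        val=$(awk -v a="$start" -v b="$end" -v i="$i" -v n="$steps" \
              'BEGIN { printf "%g", (n == 1) ? a : a + (i-1)*(b-a)/(n-1) }')
        mkdir -p "$setdir/$i"
        # copy the template, substituting the swept parameter's value
        sed "s/^$param .*/$param $val/" cometparams.ini \
            > "$setdir/$i/cometparams.ini"
        # queue the job directory for the cluster scripts
        echo "$PWD/$setdir/$i" >> "$joblist"
        i=$((i + 1))
    done
}
```

For example, `varyset FORCE_SCALE 1 5 9` would produce directories 1 through 9 with the parameter stepping from 1 to 5 in increments of 0.5 (FORCE_SCALE is just a placeholder name here, not necessarily a real comet parameter).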

If you have access to a cluster with a working job control system, you might want to use that. I had trouble with the job control system on the cluster I was using, and ended up writing my own:

On the head node, I have the

startnewjobs

script running as a cron job. It checks whether the worker nodes are idle (5-minute load average below a certain threshold) and, if they are, starts the next job from ~/joblist.
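The logic might look something like the sketch below. This is hypothetical, not the bundled script: the worker hostnames, the threshold, the comet invocation over ssh, and reading the 5-minute load average from /proc/loadavg are all assumptions.

```shell
#!/bin/sh
# Sketch of the startnewjobs cron job (hypothetical; hostnames,
# threshold, and the comet invocation are assumptions).

# True if the node's 5-minute load average is below the threshold.
# /proc/loadavg fields: 1-min, 5-min, 15-min averages, ...
node_is_idle() {
    load=$(ssh "$1" cat /proc/loadavg | awk '{ print $2 }')
    awk -v l="$load" -v t="$2" 'BEGIN { exit !(l < t) }'
}

# Pop the first pending job directory off the joblist and print it.
next_job() {
    jobdir=$(head -n 1 "$1")
    [ -n "$jobdir" ] || return 1
    tail -n +2 "$1" > "$1.tmp" && mv "$1.tmp" "$1"
    printf '%s\n' "$jobdir"
}

# One pass over the workers; cron re-runs this every few minutes.
for node in ${WORKERS:-}; do
    if node_is_idle "$node" "${THRESHOLD:-0.5}"; then
        jobdir=$(next_job "${JOBLIST:-$HOME/joblist}") || break
        ssh -f "$node" "cd '$jobdir' && nohup ./comet > comet.out 2>&1 &"
    fi
done
```

Running it from cron rather than as a daemon means a crashed pass costs nothing: the next cron invocation simply re-checks the load averages and the joblist.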


If you don't have access to a cluster, there is a single-computer version of this,

startjobsloop

which will check ~/joblist for new jobs and run them sequentially.
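A minimal sketch of that sequential loop is below (again hypothetical; the comet binary name, the per-job output file, and the polling interval are assumptions):

```shell
#!/bin/sh
# Sketch of startjobsloop (hypothetical): drain ~/joblist, running each
# queued job to completion in its own directory, one at a time.
run_queued_jobs() {
    joblist=$1
    while [ -s "$joblist" ]; do
        jobdir=$(head -n 1 "$joblist")
        tail -n +2 "$joblist" > "$joblist.tmp" && mv "$joblist.tmp" "$joblist"
        # run this job to completion before starting the next
        ( cd "$jobdir" && ./comet > comet.out 2>&1 )
    done
}

# Poll forever so newly queued jobs get picked up, e.g.:
# while true; do run_queued_jobs "$HOME/joblist"; sleep 60; done
```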