We will use the CHTC to do distributed high-throughput computing via HTCondor software to run thousands of parallel jobs.
To work on the CHTC, log in via "ssh NetID@learn.chtc.wisc.edu" (using your NetID). Here are some of the most important commands:
condor_submit <script.sub> submits the job(s) in script.sub.
condor_q lists my jobs.
  condor_q <NetID> --hold lists reasons for my held (broken) jobs.
  condor_q -better-analyze <JobID> indicates why a job isn't starting.
  condor_q -hold: gives reason for held job.
  condor_release: releases held jobs back to idle, which can help for transient HTCondor problems.
  See the condor_q help for more.
condor_rm <NetID> cancels jobs belonging to <NetID>.
condor_submit -i <script.sub> runs an interactive job to get a command line on a computing node.
condor_submit_dag <script.dag> runs a computation described by a directed acyclic graph (DAG) as in the sd example, below.
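As a rough illustration of how these commands fit together, here is a minimal sketch that writes a tiny job script and a submit file, then submits it if HTCondor is available. The file names hello.sh and hello.sub, the resource requests, and the job count of 3 are assumptions made up for this example; they are not taken from the lecture examples.

```shell
#!/bin/bash
set -eu

# Hypothetical job script: prints the job number passed as its first argument.
cat > hello.sh <<'EOF'
#!/bin/bash
echo "Hello from job $1"
EOF
chmod +x hello.sh

# Hypothetical submit file. The quoted 'EOF' keeps the shell from expanding
# $(Process), which HTCondor replaces with 0, 1, 2 for the three queued jobs.
cat > hello.sub <<'EOF'
universe = vanilla
executable = hello.sh
arguments = $(Process)
output = hello_$(Process).out
error = hello_$(Process).err
log = hello.log
request_cpus = 1
request_memory = 1GB
request_disk = 1GB
queue 3
EOF

# Submit and check the queue only where HTCondor is installed.
if command -v condor_submit >/dev/null 2>&1; then
    condor_submit hello.sub
    condor_q
else
    echo "condor_submit not found; run this on the submit server"
fi
```

If a job goes on hold, condor_q -hold should show the reason, and condor_release or condor_rm can then be used as described above.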
When jobs finish, HTCondor copies their output files back to the directory on learn from which you ran condor_submit to launch the jobs. (New directories are not copied back.)
Here are the examples from lecture:
wget http://www.stat.wisc.edu/~jgillett/DSCP/CHTC/tinyExamples.tar
wget http://www.stat.wisc.edu/~jgillett/DSCP/CHTC/sd.tar
wget http://www.stat.wisc.edu/~jgillett/DSCP/CHTC/calling_R_or_python.tar
If you need more disk space, you may ask CHTC for a quota increase, or they or I can help you with using their /staging folder for large files.
Here are links to more information:
Back up your work by making a ".tar" file of your code and other human-written files (omitting most data files and most output files) and then copying that single file to your own computer.
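One possible way to build such an archive is sketched below. The project directory, its file names, and the data and output subdirectories are hypothetical and are created here only so the example runs on its own; adjust the --exclude patterns to match wherever your data and output files actually live.

```shell
#!/bin/bash
set -eu

# Hypothetical project layout, created only so this sketch is self-contained.
mkdir -p project/data project/output
echo 'print("hi")' > project/analyze.py
echo 'queue' > project/analyze.sub
echo '1,2,3' > project/data/big.csv
echo 'result' > project/output/job_0.out

# Archive the code and other human-written files, skipping data and output.
tar -cf project.tar --exclude='project/data' --exclude='project/output' project

# List the archive contents to confirm what was saved.
tar -tf project.tar
```

The resulting single file could then be copied by running something like "scp NetID@learn.chtc.wisc.edu:project.tar ." on your own computer, assuming the archive sits in your home directory on learn.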