The OSG Match Maker does not actually do any match making itself. What it does is feed Condor information about OSG resources, and Condor then does the real match making. This means that the jobs you submit are just Condor jobs, but with an OSG / grid flavor to the required attributes.
The advantage of relying on Condor to do the match making is that we get to use its very powerful ClassAd-based match maker. The ClassAd mechanism is described in detail in the Condor manual.
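Because the resources are advertised as ordinary ClassAds, you can also inspect them directly with the standard Condor tools. For example, a command along these lines (the exact set of attributes you see depends on your OSGMM and Condor versions) will dump the advertised ads for resources that publish a GlueCEInfoContactString:

condor_status -any -long -constraint 'GlueCEInfoContactString =!= UNDEFINED' | less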
You can use the regular Condor command line tools to monitor resources and jobs. Included with the OSG Match Maker is a wrapper around those Condor tools whose output is a higher level view of the resources / jobs. The tool is called condor_grid_overview. Below is some example output:
Example 1. Output from condor_grid_overview
$ condor_grid_overview

     ID Owner      Command              Resource             Status        Time Sta
======= ========== ==================== ==================== ============= ========
 112300 rynge      1.sh                 NYSGRID_CORNELL_NYS1 Running       10:27:14
 112303 rynge      1.sh                 isuhep               Running       13:30:02
 112307 rynge      1.sh                 UCLA_Saxon_Tier3     Running       13:25:02
 112318 rynge      1.sh                 CIT_CMS_T2           Running       10:45:01
 112326 rynge      1.sh                 NYSGRID_CORNELL_NYS1 Running        9:50:01
 112345 rynge      1.sh                 UCSDT2-B             Running       13:05:02
 112383 rynge      1.sh                 UCLA_Saxon_Tier3     Running        6:50:00
 112384 rynge      1.sh                 NYSGRID_CORNELL_NYS1 Running        9:38:14
 112396 rynge      1.sh                 TTU-ANTAEUS          Running       12:15:02
 112511 rynge      1.sh                 FNAL_FERMIGRID       Running        8:15:00
 112515 rynge      1.sh                 UFlorida-HPC         Running        7:01:06
 112520 rynge      1.sh                 UCSDT2               Running       12:55:02
 112540 rynge      1.sh                 CIT_CMS_T2           Running        7:35:00
 112545 rynge      1.sh                 RENCI-Engagement     Running       13:35:02
 112546 rynge      1.sh                 NYSGRID_CORNELL_NYS1 Running       10:09:12
 112561 rynge      1.sh                 FNAL_GPGRID_1        Running       15:50:03
 112562 rynge      1.sh                 Purdue-RCAC          Running        9:45:01
 112604 rynge      1.sh                 Purdue-RCAC          Running        5:19:58
 112637 rynge      1.sh                 FNAL_GPGRID_1        Running        8:20:01
 112641 rynge      1.sh                 Purdue-RCAC          Running        5:19:58
 112644 rynge      1.sh                 UCSDT2-B             Running       11:00:01
 112653 rynge      1.sh                 isuhep               Running        7:25:00
 112688 rynge      1.sh                 UCSDT2-B             Running        9:10:01

Site                            Jobs  Subm  Pend   Run Stage  Fail Unkno    Rank
============================== ===== ===== ===== ===== ===== ===== ===== =======
BNL_ATLAS_1                        0     0     0     0     0     0     0     953 100%
BNL_ATLAS_2                        0     0     0     0     0     0     0     946 100%
Clemson-Birdnest                   0     0     0     0     0     0     0       1 100%
Clemson-ciTeam                     0     0     0     0     0     0     0       1 100%
CLEMSON-IT                         0     0     0     0     0     0     0       1 100%
FNAL_FERMIGRID                     5     0     0     5     0     0     0     926 100%
FNAL_GPGRID_1                      2     0     0     2     0     0     0     938 100%
GLOW                               0     0     0     0     0     0     0       1 100%
isuhep                             0     0     0     0     0     0     0     945 100%
LIGO_UWM_NEMO                      0     0     0     0     0     0     0     945 100%
MIT_CMS                            2     0     0     2     0     0     0     939 100%
NWICG_NotreDame                    5     0     5     0     0     0     0     200 100%
NYSGRID-CCR-U2                     0     0     0     0     0     0     0       1 100%
NYSGRID-RIT                        0     0     0     0     0     0     0       1 100%
NYSGRID_CORNELL_NYS1               0     0     0     0     0     0     0     955 100%
OCI-NSF                            0     0     0     0     0     0     0     954 100%
Purdue-RCAC                       10     0     0    10     0     0     0      91  10%
Purdue-Steele                      9     1     5     3     0     0     0     945 100%
RENCI-Engagement                   3     0     0     3     0     0     0     932 100%
SBGrid-Harvard-East                0     0     0     0     0     0     0     950 100%
SWT2_CPB                           0     0     0     0     0     0     0     948 100%
TTU-ANTAEUS                        0     0     0     0     0     0     0     945 100%
UCR-HEP                            1     0     0     1     0     0     0     946 100%
UCSDT2                             0     0     0     0     0     0     0     952 100%
UCSDT2-B                           1     0     0     1     0     0     0     941 100%
Vanderbilt                         0     0     0     0     0     0     0     946 100%

165 jobs; 66 idle, 99 running, 0 held
The first job we will look at is a simple job. The reason we call it simple is that the job does not have any problem recovery attributes, so if the job fails, it will not restart or move to another resource.
The Condor submit file looks like this:
Example 2. test.condor - a simple OSGMM job
universe = grid
grid_type = gt2
globusscheduler = $$(GlueCEInfoContactString)
globusrsl = (maxWallTime=10)

requirements = ( (TARGET.GlueCEInfoContactString =!= UNDEFINED) && (TARGET.Rank > 300) )

executable = /bin/hostname

stream_output = False
stream_error = False
WhenToTransferOutput = ON_EXIT
TransferExecutable = false

output = job.out
error = job.err
log = job.log

notification = NEVER

queue
Note the globusscheduler = $$(GlueCEInfoContactString) line. The double dollar signs mean that the Condor match maker will fill in that value after the match making has taken place. You can try this example by saving the listing above to a file (for example, test.condor) and then running:
condor_submit test.condor
You can monitor the job with condor_grid_overview, and once the job is done, you can check the job.* files for standard output, standard error and the log.
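For example, while the test job is in the queue and after it finishes, you might run something like the following (the commands are illustrative; any of the standard Condor tools will work):

condor_q                  # standard Condor view of your jobs
condor_grid_overview      # higher level per-resource view
cat job.out job.err       # stdout / stderr once the job has completed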
Another way to submit a job using a simple submission file is with the condor_grid_submit submission method. Using this method can simplify the creation of jobs by abstracting away the input/output handling, as well as by creating a sandbox for your job to run in.
The first step is creating the submission file.
Example 3. condor_grid_submit submission file
transfer_input_files = inputfile.txt
transfer_output_files = output.tar.gz=srm://srm.unl.edu:8446/srm/v2/server?SFN=/mnt/hadoop/user/OSGMMTEST/output.tar.gz

executable = myprogram.sh

output = condor_out/output.$(Cluster).$(Process)
error = condor_out/error.$(Cluster).$(Process)
log = results.log

queue 1
In this example, we transfer the input file inputfile.txt to the grid execution node. After running the program myprogram.sh, the output is transferred, using SRM, to the Storage Element at the University of Nebraska - Lincoln. Both inputs and outputs can be given as SRM URLs; multiple files should be delimited by commas.
The SRM targets must be in the form <srmsource>=<filename> for inputs and <filename>=<srmtarget> for outputs. Inputs and outputs not using the SRM protocol are transferred using Condor-G's regular file transfer mechanism.
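For illustration, a hypothetical job that stages one file in from SRM, one file from the submit host, and stages its result back out over SRM could use lines like these (the SRM endpoint and file names are made up):

transfer_input_files = srm://srm.example.edu:8446/srm/v2/server?SFN=/store/user/me/bigmodel.dat=bigmodel.dat, params.txt
transfer_output_files = results.tar.gz=srm://srm.example.edu:8446/srm/v2/server?SFN=/store/user/me/results.tar.gz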
To submit using condor_grid_submit, run the command:
condor_grid_submit <submitfile>
The simple job is great for testing, but for real workloads we want to be able to specify a large number of jobs and have them manage themselves when it comes to error recovery and retries. Below is an example of how to create a Condor DAGMan workflow. To use this example, start with an empty directory.
The files below, together with sample inputs and a sample model, can be downloaded as a tar file: advanced_job_example.tar.gz
Note: this example assumes that there is a GridFTP server running on the host you are submitting from, and that your grid identity is mapped to your local account in /etc/grid-security/grid-mapfile.
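One quick way to sanity check both assumptions, if the Globus client tools are available and you have a valid proxy (grid-proxy-init), is to look up your identity in the grid-mapfile and try a small copy through your own GridFTP server. The commands below are only an illustrative sketch; adjust the test file name to something readable on your host:

grep -F "`grid-proxy-info -identity`" /etc/grid-security/grid-mapfile
globus-url-copy -v gsiftp://`hostname -f`/etc/hostname file:///tmp/gridftp-test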
The first thing we need is a submit script. It will walk across our input data files and create a Condor job for each input. Around those jobs we will create a Condor DAGMan workflow to handle pre/post scripts for each job and job retries in case of failures. The submit script can be named submit and should have the following contents:
Example 4. submit - used to create a new run
#!/bin/bash

set -e

#############################################################################
#
# settings
#

# max run time (minutes)
MAX_WALL_TIME=60

# memory requirements (in megabytes)
MEMORY_REQUIREMENT=400

#############################################################################

# top dir
TOP_DIR=`pwd`

# runid - just a timestamp
RUN_ID=`/bin/date +'%F_%H%M%S'`
echo "Run id is $RUN_ID"

# run dir
RUN_DIR=$TOP_DIR/runs/$RUN_ID
mkdir -p $RUN_DIR/logs
touch $RUN_DIR/alljobs.log
chmod 644 $RUN_DIR/alljobs.log

# gridftp base urls
BASE_URL=gsiftp://`hostname -f`"$TOP_DIR"

JOB_ID=0
for INPUT_FILE in `(cd inputs && ls | sort)`; do

    JOB_ID=$(($JOB_ID + 1))

    echo "Generating job $JOB_ID for input $INPUT_FILE"

    mkdir -p $RUN_DIR/logs/$JOB_ID

    # condor submit file
    cd $RUN_DIR
    cat >$JOB_ID.condor <<EOF
universe = grid
grid_type = gt2
globusscheduler = \$\$(GlueCEInfoContactString)
globusrsl = (maxWallTime=$MAX_WALL_TIME)(min_memory=$MEMORY_REQUIREMENT)(max_memory=$MEMORY_REQUIREMENT)

requirements = ( (TARGET.GlueCEInfoContactString =!= UNDEFINED) \\
                 && (TARGET.Rank > 300) \\
                 && (TARGET.OSGMM_MemPerCPU >= ($MEMORY_REQUIREMENT * 1000)) \\
                 && (TARGET.OSGMM_CENetworkOutbound == TRUE) \\
                 && (TARGET.OSGMM_SoftwareGlobusUrlCopy == TRUE) \\
                 && ( isUndefined(TARGET.OSGMM_Success_Rate_$USER) \\
                      || (TARGET.OSGMM_Success_Rate_$USER > 75) ) \\
               )

# when retrying, remember the last 4 resources tried
match_list_length = 4
Rank = (TARGET.Rank) - \\
       ((TARGET.Name =?= LastMatchName0) * 1000) - \\
       ((TARGET.Name =?= LastMatchName1) * 1000) - \\
       ((TARGET.Name =?= LastMatchName2) * 1000) - \\
       ((TARGET.Name =?= LastMatchName3) * 1000)

# make sure the job is being retried and rematched
periodic_release = (NumGlobusSubmits < 5)
globusresubmit = (NumSystemHolds >= NumJobMatches)
rematch = True
globus_rematch = True

# only allow for the job to be queued for a while, then try to move it
# GlobusStatus==16 is suspended
# JobStatus==1 is pending
# JobStatus==2 is running
periodic_hold = ( (GlobusStatus==16) || \\
                  ((JobStatus==1) && ((CurrentTime - EnteredCurrentStatus) > (20*60))) || \\
                  ((JobStatus==2) && ((CurrentTime - EnteredCurrentStatus) > ($MAX_WALL_TIME*60))) )

# stay in queue on failures
on_exit_remove = (ExitBySignal == False) && (ExitCode == 0)

executable = ../../remote-job-wrapper
arguments = $RUN_ID $JOB_ID $BASE_URL $INPUT_FILE

stream_output = False
stream_error = False
WhenToTransferOutput = ON_EXIT
TransferExecutable = true

output = logs/$JOB_ID/job.out
error = logs/$JOB_ID/job.err
log = alljobs.log

notification = NEVER

queue
EOF

    # update dag
    echo "" >>master.dag
    echo "JOB job_$JOB_ID $JOB_ID.condor" >>master.dag
    echo "SCRIPT PRE job_$JOB_ID $TOP_DIR/local-pre-job $RUN_DIR $RUN_ID $JOB_ID" >>master.dag
    echo "SCRIPT POST job_$JOB_ID $TOP_DIR/local-post-job $RUN_DIR $RUN_ID $JOB_ID" >>master.dag
    echo "RETRY job_$JOB_ID 7" >>master.dag

done

condor_submit_dag -notification NEVER master.dag
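For illustration, if the inputs/ directory contained two files, the generated master.dag would end up with entries along these lines (the run id and the absolute paths are of course specific to your submit host; the ones below are made up):

JOB job_1 1.condor
SCRIPT PRE job_1 /home/user/example/local-pre-job /home/user/example/runs/2009-06-01_101500 2009-06-01_101500 1
SCRIPT POST job_1 /home/user/example/local-post-job /home/user/example/runs/2009-06-01_101500 2009-06-01_101500 1
RETRY job_1 7

JOB job_2 2.condor
SCRIPT PRE job_2 /home/user/example/local-pre-job /home/user/example/runs/2009-06-01_101500 2009-06-01_101500 2
SCRIPT POST job_2 /home/user/example/local-post-job /home/user/example/runs/2009-06-01_101500 2009-06-01_101500 2
RETRY job_2 7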
We also need the local-pre-job and local-post-job scripts. These are invoked locally before and after each job is run. In this example the pre script is used to maintain the permissions on the log file, while the post script checks that the output of the job is what we expect it to be. This is an important step, as some sites do not return the correct exit code for your job; checking the output after the job is done is extra insurance that the job really did complete successfully. If a post script fails (returns a non-zero exit code), DAGMan considers the job failed and will re-run it.
Example 5. local-pre-job
#!/bin/bash

set -e

RUN_DIR=$1
RUN_ID=$2
JOB_ID=$3

# make sure the log file is readable by the match maker
touch $RUN_DIR/alljobs.log
chmod 644 $RUN_DIR/alljobs.log
Example 6. local-post-job
#!/bin/bash

set -e

RUN_DIR=$1
RUN_ID=$2
JOB_ID=$3

TIMESTAMP=`/bin/date +'%y%m%d_%H:%M'`

# make sure the output has the successful marker - this is done
# because we can not always trust the exit codes from the grid sites
if grep "=== RUN SUCCESSFUL ===" $RUN_DIR/logs/$JOB_ID/job.out; then
    exit 0
else
    # keep copies of the output for failed jobs
    cd $RUN_DIR/logs/$JOB_ID
    cp job.out job.out.checked.$TIMESTAMP
    cp job.err job.err.checked.$TIMESTAMP
    exit 1
fi
The last thing we need is the remote-job-wrapper, whose purpose is to provide a nice environment for your model to run in. The script below will create a temporary directory on the local disk (that is, under $OSG_WN_TMP) and clean up after the job.
Note that if you need to stage out more outputs, it is common to tar the outputs up into one file and stage just that file. The reason this is simpler is that you don't have to know ahead of time what your outputs will be named. This job example can easily be extended to do more complex data staging, either by using Condor or by using GridFTP / SRM.
Example 7. remote-job-wrapper
#!/bin/bash

# provide some information about the host we are running on
function host_info()
{
    echo
    echo "Running on" `hostname -f` "($OSG_SITE_NAME)"
    echo
    echo "uname -a"
    uname -a
    echo
    echo -n "OS: "
    if [ -e /etc/redhat-release ]; then
        echo "RedHat (maybe derivative)"
        cat /etc/redhat-release
    else
        if [ -e /etc/debian_version ]; then
            echo "Debian"
            cat /etc/debian_version
        else
            echo "Unknown"
        fi
    fi
    echo
    echo "ulimit -a"
    ulimit -a
    echo
    echo "/usr/bin/env"
    /usr/bin/env
    echo
    echo "cat /proc/cpuinfo"
    cat /proc/cpuinfo
    echo
    echo "cat /proc/meminfo"
    cat /proc/meminfo
    echo
    echo "---------------------------------------------------"
    echo
}

# create a work directory in a place the site asks us to use
function create_work_dir()
{
    unset TMPDIR
    TARGETS="$OSG_WN_TMP $OSG_DATA/engage/tmp"
    for DER in $TARGETS; do
        WORK_DIR=`/bin/mktemp -d -p $DER job.XXXXXXXXXX`
        if [ $? = 0 ]; then
            echo "Created workdir in $DER"
            export WORK_DIR
            return 0
        fi
        echo "Failed to create workdir in $DER"
    done
    return 1
}

# clean up the temporary work directory
function cleanup()
{
    cd $START_DIR
    rm -rf $WORK_DIR || /bin/true
}

# use gridftp to stage in model and inputs
function stage_in()
{
    cd $WORK_DIR

    # get the application
    globus-url-copy -v -notpt -nodcau \
        $BASE_URL/application/wordfreq \
        file://$WORK_DIR/wordfreq \
        || return 1
    chmod 755 wordfreq

    # get the inputs
    globus-url-copy -v -notpt -nodcau \
        $BASE_URL/inputs/$INPUT_FILE \
        file://$WORK_DIR/$INPUT_FILE \
        || return 1
    chmod 755 wordfreq

    return 0
}

# use gridftp to stage out results
function stage_out()
{
    cd $WORK_DIR

    globus-url-copy -v -create-dest -notpt -nodcau \
        file://$WORK_DIR/app.stdouterr \
        $BASE_URL/runs/$RUN_ID/outputs/$JOB_ID.app.stdouterr \
        || return 1

    return 0
}

# execute the model
function run_model()
{
    cd $WORK_DIR

    # input identifier in the output file
    (echo "$INPUT_FILE"; echo) >app.stdouterr

    # run the real model
    cat $INPUT_FILE | ./wordfreq >>app.stdouterr 2>&1
    EXIT_CODE=$?

    # if failure, put the last lines on stdout - useful for debugging
    if [ "x$EXIT_CODE" != "x0" ]; then
        tail -n 500 app.stdouterr
    fi

    return $EXIT_CODE
}

# run id is the first argument
RUN_ID=$1

# job id is the second argument
JOB_ID=$2

# gridftp base url
BASE_URL=$3

# input file name
INPUT_FILE=$4

# keep the exit code to the end
EXIT_CODE=1

# remember start dir
START_DIR=`pwd`

# first, collect some information about the environment
host_info

# grid environment set up
if [ "x$PATH" = "x" ]; then
    export PATH="/usr/bin:/bin"
fi
. $OSG_GRID/setup.sh || {
    echo "Unable to source \$OSG_GRID/setup.sh"
    exit 1
}

# we need a local temp directory to do the actual work in
# it is important to try to use local filesystems as much as
# possible during jobs, instead of using the shared $OSG_DATA
create_work_dir
if [ $? != 0 ]; then
    exit 1
fi

# it is also very important to do the cleanup in case of failure
trap cleanup 1 2 3 6

(stage_in && run_model $RUN_ID $JOB_ID && stage_out)
EXIT_CODE=$?

# cleanup
cleanup

# signal the success/failure of the job
if [ "x$EXIT_CODE" = "x0" ]; then
    # give the all good signal to the job-success-check script
    echo "=== RUN SUCCESSFUL ==="
else
    echo "Job failed with exit code $EXIT_CODE"
fi

exit $EXIT_CODE
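If your model produces several output files, one way to follow the tar-it-up approach mentioned above is to replace the stage_out function with something along these lines. This is only a sketch; the file names added to the archive are assumptions about what your model writes and should be adjusted:

# sketch: bundle all outputs into a single archive and stage out just that file
function stage_out()
{
    cd $WORK_DIR

    # collect everything the model produced into one tarball
    # (adjust the list of files to match your model's outputs)
    tar czf outputs.tar.gz app.stdouterr

    globus-url-copy -v -create-dest -notpt -nodcau \
        file://$WORK_DIR/outputs.tar.gz \
        $BASE_URL/runs/$RUN_ID/outputs/$JOB_ID.outputs.tar.gz \
        || return 1

    return 0
}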
To run this example, save all the files, make sure the executable bit is set on all the scripts (chmod 755), and then run:
./submit
You will find a timestamped directory under runs/ containing logs and outputs.
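While the run is in progress, and after it completes, you can check on it with the usual tools; for example (the run id below is illustrative):

condor_q -dag                                 # overall DAGMan progress
condor_grid_overview                          # per-resource view of the jobs
ls runs/2009-06-01_101500/outputs/            # staged-out results
tail runs/2009-06-01_101500/logs/1/job.out    # stdout of job 1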