The OSG Match Maker does not actually do any match making itself. What it does is feed Condor information about OSG resources, and Condor then does the real match making. This means that the jobs you will be submitting are just Condor jobs, but with an OSG / grid flavor to the required attributes.
The advantage of relying on Condor to do the match making is that we get its very powerful, ClassAd-based match maker, which is described in detail in the Condor manual.
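Because both the resources and the jobs are just ClassAds, you can inspect the advertised resources directly with the regular Condor tools. The sketch below assumes that the OSGMM resource ads are visible to condor_status on your submit host and that they carry the GlueCEInfoContactString attribute used in the submit files later in this document:
$ condor_status -any -constraint 'GlueCEInfoContactString =!= UNDEFINED' \
      -format '%-30s ' Name -format '%s\n' GlueCEInfoContactString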
You can use the regular Condor command line tools to monitor resources and jobs. Included with the OSG Match Maker is a wrapper around those Condor tools, called condor_grid_overview, whose output is a higher level view of the resources / jobs. Below is some example output:
Example 1. Output from condor_grid_overview
$ condor_grid_overview
ID Owner Command Resource Status Time Sta
======= ========== ==================== ==================== ============= ========
112300 rynge 1.sh NYSGRID_CORNELL_NYS1 Running 10:27:14
112303 rynge 1.sh isuhep Running 13:30:02
112307 rynge 1.sh UCLA_Saxon_Tier3 Running 13:25:02
112318 rynge 1.sh CIT_CMS_T2 Running 10:45:01
112326 rynge 1.sh NYSGRID_CORNELL_NYS1 Running 9:50:01
112345 rynge 1.sh UCSDT2-B Running 13:05:02
112383 rynge 1.sh UCLA_Saxon_Tier3 Running 6:50:00
112384 rynge 1.sh NYSGRID_CORNELL_NYS1 Running 9:38:14
112396 rynge 1.sh TTU-ANTAEUS Running 12:15:02
112511 rynge 1.sh FNAL_FERMIGRID Running 8:15:00
112515 rynge 1.sh UFlorida-HPC Running 7:01:06
112520 rynge 1.sh UCSDT2 Running 12:55:02
112540 rynge 1.sh CIT_CMS_T2 Running 7:35:00
112545 rynge 1.sh RENCI-Engagement Running 13:35:02
112546 rynge 1.sh NYSGRID_CORNELL_NYS1 Running 10:09:12
112561 rynge 1.sh FNAL_GPGRID_1 Running 15:50:03
112562 rynge 1.sh Purdue-RCAC Running 9:45:01
112604 rynge 1.sh Purdue-RCAC Running 5:19:58
112637 rynge 1.sh FNAL_GPGRID_1 Running 8:20:01
112641 rynge 1.sh Purdue-RCAC Running 5:19:58
112644 rynge 1.sh UCSDT2-B Running 11:00:01
112653 rynge 1.sh isuhep Running 7:25:00
112688 rynge 1.sh UCSDT2-B Running 9:10:01
Site Jobs Subm Pend Run Stage Fail Unkno Rank
============================== ===== ===== ===== ===== ===== ===== ===== =======
BNL_ATLAS_1 0 0 0 0 0 0 0 953 100%
BNL_ATLAS_2 0 0 0 0 0 0 0 946 100%
Clemson-Birdnest 0 0 0 0 0 0 0 1 100%
Clemson-ciTeam 0 0 0 0 0 0 0 1 100%
CLEMSON-IT 0 0 0 0 0 0 0 1 100%
FNAL_FERMIGRID 5 0 0 5 0 0 0 926 100%
FNAL_GPGRID_1 2 0 0 2 0 0 0 938 100%
GLOW 0 0 0 0 0 0 0 1 100%
isuhep 0 0 0 0 0 0 0 945 100%
LIGO_UWM_NEMO 0 0 0 0 0 0 0 945 100%
MIT_CMS 2 0 0 2 0 0 0 939 100%
NWICG_NotreDame 5 0 5 0 0 0 0 200 100%
NYSGRID-CCR-U2 0 0 0 0 0 0 0 1 100%
NYSGRID-RIT 0 0 0 0 0 0 0 1 100%
NYSGRID_CORNELL_NYS1 0 0 0 0 0 0 0 955 100%
OCI-NSF 0 0 0 0 0 0 0 954 100%
Purdue-RCAC 10 0 0 10 0 0 0 91 10%
Purdue-Steele 9 1 5 3 0 0 0 945 100%
RENCI-Engagement 3 0 0 3 0 0 0 932 100%
SBGrid-Harvard-East 0 0 0 0 0 0 0 950 100%
SWT2_CPB 0 0 0 0 0 0 0 948 100%
TTU-ANTAEUS 0 0 0 0 0 0 0 945 100%
UCR-HEP 1 0 0 1 0 0 0 946 100%
UCSDT2 0 0 0 0 0 0 0 952 100%
UCSDT2-B 1 0 0 1 0 0 0 941 100%
Vanderbilt 0 0 0 0 0 0 0 946 100%
165 jobs; 66 idle, 99 running, 0 held
The first job we will look at is a simple job. We call it simple because it does not have any problem recovery attributes; if the job fails, it will not be restarted or moved to another resource.
The Condor submit file looks like this:
Example 2. test.condor - a simple OSGMM job
universe = grid
grid_type = gt2
globusscheduler = $$(GlueCEInfoContactString)
globusrsl = (maxWallTime=10)
requirements = ( (TARGET.GlueCEInfoContactString =!= UNDEFINED) && (TARGET.Rank > 300) )
executable = /bin/hostname
stream_output = False
stream_error = False
WhenToTransferOutput = ON_EXIT
TransferExecutable = false
output = job.out
error = job.err
log = job.log
notification = NEVER
queue
Note the globusscheduler = $$(GlueCEInfoContactString) line. The double dollar signs mean that the Condor match maker will fill in that value after the match making has taken place. You can try this example by saving the listing above to a file (for example, test.condor) and then running:
condor_submit test.condor
You can monitor the job with condor_grid_overview, and once the job is done you can check the job.* files for standard out, standard error and the log.
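If you are curious which contact string the match maker substituted for the $$() expression, or why an idle job has not matched yet, the regular Condor tools can help. The MATCH_EXP_ prefix below is how Condor normally records $$() substitutions in the job ClassAd; treat the exact attribute name as an assumption and fall back to reading the full condor_q -long output if it differs on your installation:
$ condor_q -long <jobid> | grep -i GlueCEInfoContactString
$ condor_q -better-analyze <jobid>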
Another way to submit a job with a simple submission file is the condor_grid_submit submission method. Using this method can simplify the creation of jobs by abstracting away the input/output handling and by creating a sandbox for your job to run in.
The first step is creating the submission file.
Example 3. condor_grid_submit submission file
transfer_input_files = inputfile.txt
transfer_output_files = output.tar.gz=srm://srm.unl.edu:8446/srm/v2/server?SFN=/mnt/hadoop/user/OSGMMTEST/output.tar.gz
executable = myprogram.sh
output=condor_out/output.$(Cluster).$(Process)
error=condor_out/error.$(Cluster).$(Process)
log=results.log
queue 1
In this example, we transfer the input file inputfile.txt to the grid execution node. After running the program myprogram.sh, the output is transferred, using SRM, to the Storage Element at the University of Nebraska - Lincoln. Both inputs and outputs can be SRM URLs; multiple files should be delimited by commas.
The SRM entries must be of the form <srmsource>=<filename> for inputs and <filename>=<srmtarget> for outputs. Inputs and outputs not using the SRM protocol will be transferred using Condor-G's regular file transfer mechanism.
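As an illustration of those forms, the hypothetical submission file below stages in one SRM file and one local file, and stages out one SRM file plus one file via the regular Condor-G transfer; the host names and paths are made up:
transfer_input_files = srm://srm.example.edu:8446/srm/v2/server?SFN=/data/ref.db=ref.db,params.txt
transfer_output_files = result.tar.gz=srm://srm.example.edu:8446/srm/v2/server?SFN=/data/out/result.tar.gz,summary.txt
executable = myprogram.sh
output = condor_out/output.$(Cluster).$(Process)
error = condor_out/error.$(Cluster).$(Process)
log = results.log
queue 1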
To submit using condor_grid_submit, run the command:
condor_grid_submit <submitfile>
The simple job is great for testing, but when it comes to real workloads we want to be able to specify a large number of jobs and have the jobs manage themselves when it comes to error recovery and retries. Below is an example of how to create a Condor DAGMan workflow. To use this example, start with an empty directory.
The files below, together with sample inputs and a sample model, can be downloaded as a tar file: advanced_job_example.tar.gz
Note: this example assumes that there is a GridFTP server running on the same host you are submitting from, and that your user is mapped to a local account in /etc/grid-security/grid-mapfile.
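A quick way to check both assumptions, assuming the standard Globus client tools are installed, is to look up your proxy identity in the grid-mapfile and to copy a small test file to yourself through the local GridFTP server:
$ grep "`grid-proxy-info -identity`" /etc/grid-security/grid-mapfile
$ echo test > /tmp/gridftp-test-in
$ globus-url-copy file:///tmp/gridftp-test-in gsiftp://`hostname -f`/tmp/gridftp-test-out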
The first thing we need is a submit script. This will walk over our input data files and create a Condor job for each input. Around those jobs we will create a Condor DAGMan workflow to handle pre/post scripts for each job and job retries in case of failures. The submit script can be named submit and should have the following contents:
Example 4. submit - used to create a new run
#!/bin/bash
set -e
#############################################################################
#
# settings
#
# max run time (minutes)
MAX_WALL_TIME=60
# memory requirements (in megabytes)
MEMORY_REQUIREMENT=400
#############################################################################
# top dir
TOP_DIR=`pwd`
# runid - just a timestamp
RUN_ID=`/bin/date +'%F_%H%M%S'`
echo "Run id is $RUN_ID"
# run dir
RUN_DIR=$TOP_DIR/runs/$RUN_ID
mkdir -p $RUN_DIR/logs
touch $RUN_DIR/alljobs.log
chmod 644 $RUN_DIR/alljobs.log
# gridftp base urls
BASE_URL=gsiftp://`hostname -f`"$TOP_DIR"
JOB_ID=0
for INPUT_FILE in `(cd inputs && ls | sort)`; do
JOB_ID=$(($JOB_ID + 1))
echo "Generating job $JOB_ID for input $INPUT_FILE"
mkdir -p $RUN_DIR/logs/$JOB_ID
# condor submit file
cd $RUN_DIR
cat >$JOB_ID.condor <<EOF
universe = grid
grid_type = gt2
globusscheduler = \$\$(GlueCEInfoContactString)
globusrsl = (maxWallTime=$MAX_WALL_TIME)(min_memory=$MEMORY_REQUIREMENT)(max_memory=$MEMORY_REQUIREMENT)
requirements = ( (TARGET.GlueCEInfoContactString =!= UNDEFINED) \\
&& (TARGET.Rank > 300) \\
&& (TARGET.OSGMM_MemPerCPU >= ($MEMORY_REQUIREMENT * 1000)) \\
&& (TARGET.OSGMM_CENetworkOutbound == TRUE) \\
&& (TARGET.OSGMM_SoftwareGlobusUrlCopy == TRUE) \\
&& ( isUndefined(TARGET.OSGMM_Success_Rate_$USER) \\
|| (TARGET.OSGMM_Success_Rate_$USER > 75) ) \\
)
# when retrying, remember the last 4 resources tried
match_list_length = 4
Rank = (TARGET.Rank) - \\
((TARGET.Name =?= LastMatchName0) * 1000) - \\
((TARGET.Name =?= LastMatchName1) * 1000) - \\
((TARGET.Name =?= LastMatchName2) * 1000) - \\
((TARGET.Name =?= LastMatchName3) * 1000)
# make sure the job is being retried and rematched
periodic_release = (NumGlobusSubmits < 5)
globusresubmit = (NumSystemHolds >= NumJobMatches)
rematch = True
globus_rematch = True
# only allow for the job to be queued for a while, then try to move it
# GlobusStatus==16 is suspended
# JobStatus==1 is pending
# JobStatus==2 is running
periodic_hold = ( (GlobusStatus==16) || \\
((JobStatus==1) && ((CurrentTime - EnteredCurrentStatus) > (20*60))) || \\
((JobStatus==2) && ((CurrentTime - EnteredCurrentStatus) > ($MAX_WALL_TIME*60))) )
# stay in queue on failures
on_exit_remove = (ExitBySignal == False) && (ExitCode == 0)
executable = ../../remote-job-wrapper
arguments = $RUN_ID $JOB_ID $BASE_URL $INPUT_FILE
stream_output = False
stream_error = False
WhenToTransferOutput = ON_EXIT
TransferExecutable = true
output = logs/$JOB_ID/job.out
error = logs/$JOB_ID/job.err
log = alljobs.log
notification = NEVER
queue
EOF
# update dag
echo "" >>master.dag
echo "JOB job_$JOB_ID $JOB_ID.condor" >>master.dag
echo "SCRIPT PRE job_$JOB_ID $TOP_DIR/local-pre-job $RUN_DIR $RUN_ID $JOB_ID" >>master.dag
echo "SCRIPT POST job_$JOB_ID $TOP_DIR/local-post-job $RUN_DIR $RUN_ID $JOB_ID" >>master.dag
echo "RETRY job_$JOB_ID 7" >>master.dag
done
condor_submit_dag -notification NEVER master.dag
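For reference, with two input files the generated master.dag would contain entries along these lines (the run id and paths below are made up; yours will reflect your own directory layout):
JOB job_1 1.condor
SCRIPT PRE job_1 /home/user/example/local-pre-job /home/user/example/runs/2010-06-01_120000 2010-06-01_120000 1
SCRIPT POST job_1 /home/user/example/local-post-job /home/user/example/runs/2010-06-01_120000 2010-06-01_120000 1
RETRY job_1 7

JOB job_2 2.condor
...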
We also need the local-pre-job and local-post-job scripts. These are invoked locally before and after each job is run. In this example the pre script is used to maintain the permissions on the log file, while the post script checks that the output of the job is what we expect it to be. This is an important part, as some sites do not return the correct exit code of your job. Checking the output after the job is done is extra insurance that a job reported as successful really did complete successfully. If a post script fails (returns a non-zero exit code), DAGMan considers the job failed and will re-run it, up to the RETRY limit.
Example 5. local-pre-job
#!/bin/bash
set -e
RUN_DIR=$1
RUN_ID=$2
JOB_ID=$3
# make sure the log file is readable by the match maker
touch $RUN_DIR/alljobs.log
chmod 644 $RUN_DIR/alljobs.log
Example 6. local-post-job
#!/bin/bash
set -e
RUN_DIR=$1
RUN_ID=$2
JOB_ID=$3
TIMESTAMP=`/bin/date +'%y%m%d_%H:%M'`
# make sure the output has the successful marker - this is done
# because we can not always trust the exit codes from the grid sites
if grep "=== RUN SUCCESSFUL ===" $RUN_DIR/logs/$JOB_ID/job.out; then
exit 0
else
# keep copies of the output for failed jobs
cd $RUN_DIR/logs/$JOB_ID
cp job.out job.out.checked.$TIMESTAMP
cp job.err job.err.checked.$TIMESTAMP
exit 1
fi
The last thing we need is a remote-job-wrapper, whose purpose is to provide a nice environment for your model to run in. The script below will create a temporary directory on the local disk (that is, under $OSG_WN_TMP) and clean up after the job.
Note that if you need to stage out more outputs, it is common to tar the outputs up into one file and stage out just that file. The reason this is simpler is that you don't have to know ahead of time what your outputs will be named. This job example can easily be extended to do more complex data staging, either by using Condor or by using GridFTP / SRM; a sketch of a tar-based stage_out is shown after the wrapper listing below.
Example 7. remote-job-wrapper
#!/bin/bash
# provide some information about the host we are running on
function host_info()
{
echo
echo "Running on" `hostname -f` "($OSG_SITE_NAME)"
echo
echo "uname -a"
uname -a
echo
echo -n "OS: "
if [ -e /etc/redhat-release ]; then
echo "RedHat (maybe derivative)"
cat /etc/redhat-release
else
if [ -e /etc/debian_version ]; then
echo "Debian"
cat /etc/debian_version
else
echo "Unknown"
fi
fi
echo
echo "ulimit -a"
ulimit -a
echo
echo "/usr/bin/env"
/usr/bin/env
echo
echo "cat /proc/cpuinfo"
cat /proc/cpuinfo
echo
echo "cat /proc/meminfo"
cat /proc/meminfo
echo
echo "---------------------------------------------------"
echo
}
# create a work directory in a place the site asks us to use
function create_work_dir()
{
unset TMPDIR
TARGETS="$OSG_WN_TMP $OSG_DATA/engage/tmp"
for DER in $TARGETS; do
WORK_DIR=`/bin/mktemp -d -p $DER job.XXXXXXXXXX`
if [ $? = 0 ]; then
echo "Created workdir in $DER"
export WORK_DIR
return 0
fi
echo "Failed to create workdir in $DER"
done
return 1
}
# clean up the temporary work directory
function cleanup()
{
cd $START_DIR
rm -rf $WORK_DIR || /bin/true
}
# use gridftp to stage in model and inputs
function stage_in()
{
cd $WORK_DIR
# get the application
globus-url-copy -v -notpt -nodcau \
$BASE_URL/application/wordfreq \
file://$WORK_DIR/wordfreq \
|| return 1
chmod 755 wordfreq
# get the inputs
globus-url-copy -v -notpt -nodcau \
$BASE_URL/inputs/$INPUT_FILE \
file://$WORK_DIR/$INPUT_FILE \
|| return 1
return 0
}
# use gridftp to stage out results
function stage_out()
{
cd $WORK_DIR
globus-url-copy -v -create-dest -notpt -nodcau \
file://$WORK_DIR/app.stdouterr \
$BASE_URL/runs/$RUN_ID/outputs/$JOB_ID.app.stdouterr \
|| return 1
return 0
}
# execute the model
function run_model()
{
cd $WORK_DIR
# input identifier in the output file
(echo "$INPUT_FILE"; echo) >app.stdouterr
# run the real model
cat $INPUT_FILE | ./wordfreq >>app.stdouterr 2>&1
EXIT_CODE=$?
# if failure, put the last lines on stdout - useful for debugging
if [ "x$EXIT_CODE" != "x0" ]; then
tail -n 500 app.stdouterr
fi
return $EXIT_CODE
}
# run id is the first argument
RUN_ID=$1
# job id is the second argument
JOB_ID=$2
# gridftp base url
BASE_URL=$3
# input file name
INPUT_FILE=$4
# keep the exit code to the end
EXIT_CODE=1
# remember start dir
START_DIR=`pwd`
# first, collect some information about the environment
host_info
# grid environment set up
if [ "x$PATH" = "x" ]; then
export PATH="/usr/bin:/bin"
fi
. $OSG_GRID/setup.sh || {
echo "Unable to source \$OSG_GRID/setup.sh"
exit 1
}
# we need a local temp directory to do the actual work in
# it is important to try to use local filesystems as much as
# possible during jobs, instead of using the shared $OSG_DATA
create_work_dir
if [ $? != 0 ]; then
exit 1
fi
# it is also very important to do the cleanup in case of failure
trap cleanup 1 2 3 6
(stage_in && run_model $RUN_ID $JOB_ID && stage_out)
EXIT_CODE=$?
# cleanup
cleanup
# signal the success/failure of the job
if [ "x$EXIT_CODE" = "x0" ]; then
# give the all good signal checked for by the local-post-job script
echo "=== RUN SUCCESSFUL ==="
else
echo "Job failed with exit code $EXIT_CODE"
fi
exit $EXIT_CODE
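As mentioned earlier, a common extension is to tar up everything the model produced and stage out a single tarball, so you do not need to know the output file names ahead of time. A minimal sketch of such a stage_out replacement, assuming the model writes its results into a subdirectory named outputs/, could look like this:
# stage out all outputs as a single tarball
function stage_out()
{
cd $WORK_DIR
# collect everything the model wrote into one file
tar czf outputs.tar.gz outputs/ || return 1
globus-url-copy -v -create-dest -notpt -nodcau \
file://$WORK_DIR/outputs.tar.gz \
$BASE_URL/runs/$RUN_ID/outputs/$JOB_ID.outputs.tar.gz \
|| return 1
return 0
}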
To run this example, save all the files, make sure the executable bit is set on all the scripts (chmod 755) and then run:
./submit
You will find a timestamped directory under runs/ containing logs and outputs.
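While the run is in progress, condor_grid_overview shows the individual jobs, and DAGMan writes its own progress log next to the DAG file. The log file name below follows DAGMan's usual convention of appending .dagman.out to the DAG file name:
$ condor_grid_overview
$ tail -f runs/<run_id>/master.dag.dagman.out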