The OSG Match Maker does not actually do any match making itself. What it does is feed Condor information about OSG resources, and Condor then does the real match making. This means that the jobs you submit are just Condor jobs, but with an OSG / grid flavor to the required attributes.
The advantage of relying on Condor to do the match making is that we get to use its very powerful ClassAd-based match maker. The ClassAd mechanism is described in detail in the Condor manual.
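Because the resources are advertised as ordinary ClassAds, you can also inspect them directly with the standard Condor tools. For example, a command along these lines (the exact set of attributes you see depends on your OSGMM and Condor versions) will dump the advertised ads for resources that publish a GlueCEInfoContactString:

condor_status -any -long -constraint 'GlueCEInfoContactString =!= UNDEFINED' | less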
You can use the regular Condor command line tools to monitor resources and jobs. Included with the OSG Match Maker is a wrapper around those Condor tools whose output is a higher level view of the resources / jobs. The tool is called condor_grid_overview. Below is some example output:
Example 1. Output from condor_grid_overview
$ condor_grid_overview

     ID Owner      Command              Resource             Status        Time Sta
======= ========== ==================== ==================== ============= ========
 112300 rynge      1.sh                 NYSGRID_CORNELL_NYS1 Running       10:27:14
 112303 rynge      1.sh                 isuhep               Running       13:30:02
 112307 rynge      1.sh                 UCLA_Saxon_Tier3     Running       13:25:02
 112318 rynge      1.sh                 CIT_CMS_T2           Running       10:45:01
 112326 rynge      1.sh                 NYSGRID_CORNELL_NYS1 Running        9:50:01
 112345 rynge      1.sh                 UCSDT2-B             Running       13:05:02
 112383 rynge      1.sh                 UCLA_Saxon_Tier3     Running        6:50:00
 112384 rynge      1.sh                 NYSGRID_CORNELL_NYS1 Running        9:38:14
 112396 rynge      1.sh                 TTU-ANTAEUS          Running       12:15:02
 112511 rynge      1.sh                 FNAL_FERMIGRID       Running        8:15:00
 112515 rynge      1.sh                 UFlorida-HPC         Running        7:01:06
 112520 rynge      1.sh                 UCSDT2               Running       12:55:02
 112540 rynge      1.sh                 CIT_CMS_T2           Running        7:35:00
 112545 rynge      1.sh                 RENCI-Engagement     Running       13:35:02
 112546 rynge      1.sh                 NYSGRID_CORNELL_NYS1 Running       10:09:12
 112561 rynge      1.sh                 FNAL_GPGRID_1        Running       15:50:03
 112562 rynge      1.sh                 Purdue-RCAC          Running        9:45:01
 112604 rynge      1.sh                 Purdue-RCAC          Running        5:19:58
 112637 rynge      1.sh                 FNAL_GPGRID_1        Running        8:20:01
 112641 rynge      1.sh                 Purdue-RCAC          Running        5:19:58
 112644 rynge      1.sh                 UCSDT2-B             Running       11:00:01
 112653 rynge      1.sh                 isuhep               Running        7:25:00
 112688 rynge      1.sh                 UCSDT2-B             Running        9:10:01

Site                            Jobs  Subm  Pend   Run Stage  Fail Unkno    Rank
============================== ===== ===== ===== ===== ===== ===== ===== =======
BNL_ATLAS_1                        0     0     0     0     0     0     0     953 100%
BNL_ATLAS_2                        0     0     0     0     0     0     0     946 100%
Clemson-Birdnest                   0     0     0     0     0     0     0       1 100%
Clemson-ciTeam                     0     0     0     0     0     0     0       1 100%
CLEMSON-IT                         0     0     0     0     0     0     0       1 100%
FNAL_FERMIGRID                     5     0     0     5     0     0     0     926 100%
FNAL_GPGRID_1                      2     0     0     2     0     0     0     938 100%
GLOW                               0     0     0     0     0     0     0       1 100%
isuhep                             0     0     0     0     0     0     0     945 100%
LIGO_UWM_NEMO                      0     0     0     0     0     0     0     945 100%
MIT_CMS                            2     0     0     2     0     0     0     939 100%
NWICG_NotreDame                    5     0     5     0     0     0     0     200 100%
NYSGRID-CCR-U2                     0     0     0     0     0     0     0       1 100%
NYSGRID-RIT                        0     0     0     0     0     0     0       1 100%
NYSGRID_CORNELL_NYS1               0     0     0     0     0     0     0     955 100%
OCI-NSF                            0     0     0     0     0     0     0     954 100%
Purdue-RCAC                       10     0     0    10     0     0     0      91  10%
Purdue-Steele                      9     1     5     3     0     0     0     945 100%
RENCI-Engagement                   3     0     0     3     0     0     0     932 100%
SBGrid-Harvard-East                0     0     0     0     0     0     0     950 100%
SWT2_CPB                           0     0     0     0     0     0     0     948 100%
TTU-ANTAEUS                        0     0     0     0     0     0     0     945 100%
UCR-HEP                            1     0     0     1     0     0     0     946 100%
UCSDT2                             0     0     0     0     0     0     0     952 100%
UCSDT2-B                           1     0     0     1     0     0     0     941 100%
Vanderbilt                         0     0     0     0     0     0     0     946 100%

165 jobs; 66 idle, 99 running, 0 held
The first job we will look at is a simple job. The reason we call it simple is that the job does not have any problem recovery attributes, so if the job fails, it will not restart or move to another resource.
The Condor submit file looks like this:
Example 2. test.condor - a simple OSGMM job
universe = grid
grid_type = gt2
globusscheduler = $$(GlueCEInfoContactString)
globusrsl = (maxWallTime=10)

requirements = ( (TARGET.GlueCEInfoContactString =!= UNDEFINED) && (TARGET.Rank > 300) )

executable = /bin/hostname

stream_output = False
stream_error = False
WhenToTransferOutput = ON_EXIT
TransferExecutable = false

output = job.out
error = job.err
log = job.log

notification = NEVER

queue
Note the globusscheduler = $$(GlueCEInfoContactString) line. The double dollar signs mean that the Condor match maker will fill in that value after the match making has taken place. You can try this example by saving the listing above to a file (for example, test.condor) and then running:
condor_submit test.condor
You can monitor the job with condor_grid_overview, and once the job is done, you can check the job.* files for standard output, standard error and the log.
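For example, while the test job is in the queue and after it finishes, you might run something like the following (the commands are illustrative; any of the standard Condor tools will work):

condor_q                  # standard Condor view of your jobs
condor_grid_overview      # higher level per-resource view
cat job.out job.err       # stdout / stderr once the job has completed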
Another way to submit a job using a simple submission file is with the condor_grid_submit submission method. Using this method can simplify the creation of jobs by abstracting away the input/output handling, as well as by creating a sandbox for your job to run in.
The first step is creating the submission file.
Example 3. condor_grid_submit submission file
transfer_input_files = inputfile.txt
transfer_output_files = output.tar.gz=srm://srm.unl.edu:8446/srm/v2/server?SFN=/mnt/hadoop/user/OSGMMTEST/output.tar.gz

executable = myprogram.sh

output = condor_out/output.$(Cluster).$(Process)
error = condor_out/error.$(Cluster).$(Process)
log = results.log

queue 1
In this example, we transfer the input file inputfile.txt to the grid execution node. After running the program myprogram.sh, the output is transferred, using SRM, to the Storage Element at the University of Nebraska - Lincoln. Both inputs and outputs can be given as SRM URLs; multiple files should be delimited by commas.
The SRM targets must be in the form <srmsource>=<filename> for inputs and <filename>=<srmtarget> for outputs. Inputs and outputs not using the SRM protocol are transferred using Condor-G's regular file transfer mechanism.
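For illustration, a hypothetical job that stages one file in from SRM, one file from the submit host, and stages its result back out over SRM could use lines like these (the SRM endpoint and file names are made up):

transfer_input_files = srm://srm.example.edu:8446/srm/v2/server?SFN=/store/user/me/bigmodel.dat=bigmodel.dat, params.txt
transfer_output_files = results.tar.gz=srm://srm.example.edu:8446/srm/v2/server?SFN=/store/user/me/results.tar.gz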
To submit using condor_grid_submit, run the command:
condor_grid_submit <submitfile>
The simple job is great for testing, but for real workloads we want to be able to specify a large number of jobs and have them manage themselves when it comes to error recovery and retries. Below is an example of how to create a Condor DAGMan workflow. To use this example, start with an empty directory.
The files below, together with sample inputs and a sample model, can be downloaded as a tar file: advanced_job_example.tar.gz
Note: this example assumes that there is a GridFTP server running on the host you are submitting from, and that your grid identity is mapped to your local account in /etc/grid-security/grid-mapfile.
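One quick way to sanity check both assumptions, if the Globus client tools are available and you have a valid proxy (grid-proxy-init), is to look up your identity in the grid-mapfile and try a small copy through your own GridFTP server. The commands below are only an illustrative sketch; adjust the test file name to something readable on your host:

grep -F "`grid-proxy-info -identity`" /etc/grid-security/grid-mapfile
globus-url-copy -v gsiftp://`hostname -f`/etc/hostname file:///tmp/gridftp-test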
The first thing we need is a submit script. It will walk across our input data files and create a Condor job for each input. Around those jobs we will create a Condor DAGMan workflow to handle pre/post scripts for each job and job retries in case of failures. The submit script can be named submit and should have the following contents:
Example 4. submit - used to create a new run
#!/bin/bash

set -e

#############################################################################
#
# settings
#

# max run time (minutes)
MAX_WALL_TIME=60

# memory requirements (in megabytes)
MEMORY_REQUIREMENT=400

#############################################################################

# top dir
TOP_DIR=`pwd`

# runid - just a timestamp
RUN_ID=`/bin/date +'%F_%H%M%S'`
echo "Run id is $RUN_ID"

# run dir
RUN_DIR=$TOP_DIR/runs/$RUN_ID
mkdir -p $RUN_DIR/logs
touch $RUN_DIR/alljobs.log
chmod 644 $RUN_DIR/alljobs.log

# gridftp base urls
BASE_URL=gsiftp://`hostname -f`"$TOP_DIR"

JOB_ID=0
for INPUT_FILE in `(cd inputs && ls | sort)`; do

    JOB_ID=$(($JOB_ID + 1))

    echo "Generating job $JOB_ID for input $INPUT_FILE"

    mkdir -p $RUN_DIR/logs/$JOB_ID

    # condor submit file
    cd $RUN_DIR
    cat >$JOB_ID.condor <<EOF
universe = grid
grid_type = gt2
globusscheduler = \$\$(GlueCEInfoContactString)
globusrsl = (maxWallTime=$MAX_WALL_TIME)(min_memory=$MEMORY_REQUIREMENT)(max_memory=$MEMORY_REQUIREMENT)

requirements = ( (TARGET.GlueCEInfoContactString =!= UNDEFINED) \\
                 && (TARGET.Rank > 300) \\
                 && (TARGET.OSGMM_MemPerCPU >= ($MEMORY_REQUIREMENT * 1000)) \\
                 && (TARGET.OSGMM_CENetworkOutbound == TRUE) \\
                 && (TARGET.OSGMM_SoftwareGlobusUrlCopy == TRUE) \\
                 && ( isUndefined(TARGET.OSGMM_Success_Rate_$USER) \\
                      || (TARGET.OSGMM_Success_Rate_$USER > 75) ) \\
               )

# when retrying, remember the last 4 resources tried
match_list_length = 4
Rank = (TARGET.Rank) - \\
       ((TARGET.Name =?= LastMatchName0) * 1000) - \\
       ((TARGET.Name =?= LastMatchName1) * 1000) - \\
       ((TARGET.Name =?= LastMatchName2) * 1000) - \\
       ((TARGET.Name =?= LastMatchName3) * 1000)

# make sure the job is being retried and rematched
periodic_release = (NumGlobusSubmits < 5)
globusresubmit = (NumSystemHolds >= NumJobMatches)
rematch = True
globus_rematch = True

# only allow for the job to be queued for a while, then try to move it
# GlobusStatus==16 is suspended
# JobStatus==1 is pending
# JobStatus==2 is running
periodic_hold = ( (GlobusStatus==16) || \\
                  ((JobStatus==1) && ((CurrentTime - EnteredCurrentStatus) > (20*60))) || \\
                  ((JobStatus==2) && ((CurrentTime - EnteredCurrentStatus) > ($MAX_WALL_TIME*60))) )

# stay in queue on failures
on_exit_remove = (ExitBySignal == False) && (ExitCode == 0)

executable = ../../remote-job-wrapper
arguments = $RUN_ID $JOB_ID $BASE_URL $INPUT_FILE

stream_output = False
stream_error = False
WhenToTransferOutput = ON_EXIT
TransferExecutable = true

output = logs/$JOB_ID/job.out
error = logs/$JOB_ID/job.err
log = alljobs.log

notification = NEVER

queue
EOF

    # update dag
    echo "" >>master.dag
    echo "JOB job_$JOB_ID $JOB_ID.condor" >>master.dag
    echo "SCRIPT PRE job_$JOB_ID $TOP_DIR/local-pre-job $RUN_DIR $RUN_ID $JOB_ID" >>master.dag
    echo "SCRIPT POST job_$JOB_ID $TOP_DIR/local-post-job $RUN_DIR $RUN_ID $JOB_ID" >>master.dag
    echo "RETRY job_$JOB_ID 7" >>master.dag

done

condor_submit_dag -notification NEVER master.dag
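For illustration, if the inputs/ directory contained two files, the generated master.dag would end up with entries along these lines (the run id and the absolute paths are of course specific to your submit host; the ones below are made up):

JOB job_1 1.condor
SCRIPT PRE job_1 /home/user/example/local-pre-job /home/user/example/runs/2009-06-01_101500 2009-06-01_101500 1
SCRIPT POST job_1 /home/user/example/local-post-job /home/user/example/runs/2009-06-01_101500 2009-06-01_101500 1
RETRY job_1 7

JOB job_2 2.condor
SCRIPT PRE job_2 /home/user/example/local-pre-job /home/user/example/runs/2009-06-01_101500 2009-06-01_101500 2
SCRIPT POST job_2 /home/user/example/local-post-job /home/user/example/runs/2009-06-01_101500 2009-06-01_101500 2
RETRY job_2 7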
We also need the local-pre-job and local-post-job scripts. These are invoked locally before and after each job is run. In this example the pre script is used to maintain the permissions on the log file, while the post script checks that the output of the job is what we expect it to be. This is an important step, as some sites do not return the correct exit code for your job; checking the output after the job is done is extra insurance that the job really did complete successfully. If a post script fails (returns a non-zero exit code), DAGMan considers the job failed and will re-run it.
Example 5. local-pre-job
#!/bin/bash

set -e

RUN_DIR=$1
RUN_ID=$2
JOB_ID=$3

# make sure the log file is readable by the match maker
touch $RUN_DIR/alljobs.log
chmod 644 $RUN_DIR/alljobs.log
Example 6. local-post-job
#!/bin/bash

set -e

RUN_DIR=$1
RUN_ID=$2
JOB_ID=$3

TIMESTAMP=`/bin/date +'%y%m%d_%H:%M'`

# make sure the output has the successful marker - this is done
# because we can not always trust the exit codes from the grid sites
if grep "=== RUN SUCCESSFUL ===" $RUN_DIR/logs/$JOB_ID/job.out; then
    exit 0
else
    # keep copies of the output for failed jobs
    cd $RUN_DIR/logs/$JOB_ID
    cp job.out job.out.checked.$TIMESTAMP
    cp job.err job.err.checked.$TIMESTAMP
    exit 1
fi
The last thing we need is the remote-job-wrapper, whose purpose is to provide a nice environment for your model to run in. The script below will create a temporary directory on the local disk (that is, under $OSG_WN_TMP) and clean up after the job.
Note that if you need to stage out more outputs, it is common to tar the outputs up into one file and stage just that file. The reason this is simpler is that you don't have to know ahead of time what your outputs will be named. This job example can easily be extended to do more complex data staging, either by using Condor or by using GridFTP / SRM.
Example 7. remote-job-wrapper
#!/bin/bash

# provide some information about the host we are running on
function host_info()
{
    echo
    echo "Running on" `hostname -f` "($OSG_SITE_NAME)"
    echo
    echo "uname -a"
    uname -a
    echo
    echo -n "OS: "
    if [ -e /etc/redhat-release ]; then
        echo "RedHat (maybe derivative)"
        cat /etc/redhat-release
    else
        if [ -e /etc/debian_version ]; then
            echo "Debian"
            cat /etc/debian_version
        else
            echo "Unknown"
        fi
    fi
    echo
    echo "ulimit -a"
    ulimit -a
    echo
    echo "/usr/bin/env"
    /usr/bin/env
    echo
    echo "cat /proc/cpuinfo"
    cat /proc/cpuinfo
    echo
    echo "cat /proc/meminfo"
    cat /proc/meminfo
    echo
    echo "---------------------------------------------------"
    echo
}

# create a work directory in a place the site asks us to use
function create_work_dir()
{
    unset TMPDIR
    TARGETS="$OSG_WN_TMP $OSG_DATA/engage/tmp"
    for DER in $TARGETS; do
        WORK_DIR=`/bin/mktemp -d -p $DER job.XXXXXXXXXX`
        if [ $? = 0 ]; then
            echo "Created workdir in $DER"
            export WORK_DIR
            return 0
        fi
        echo "Failed to create workdir in $DER"
    done
    return 1
}

# clean up the temporary work directory
function cleanup()
{
    cd $START_DIR
    rm -rf $WORK_DIR || /bin/true
}

# use gridftp to stage in model and inputs
function stage_in()
{
    cd $WORK_DIR

    # get the application
    globus-url-copy -v -notpt -nodcau \
        $BASE_URL/application/wordfreq \
        file://$WORK_DIR/wordfreq \
        || return 1
    chmod 755 wordfreq

    # get the inputs
    globus-url-copy -v -notpt -nodcau \
        $BASE_URL/inputs/$INPUT_FILE \
        file://$WORK_DIR/$INPUT_FILE \
        || return 1
    chmod 755 wordfreq

    return 0
}

# use gridftp to stage out results
function stage_out()
{
    cd $WORK_DIR

    globus-url-copy -v -create-dest -notpt -nodcau \
        file://$WORK_DIR/app.stdouterr \
        $BASE_URL/runs/$RUN_ID/outputs/$JOB_ID.app.stdouterr \
        || return 1

    return 0
}

# execute the model
function run_model()
{
    cd $WORK_DIR

    # input identifier in the output file
    (echo "$INPUT_FILE"; echo) >app.stdouterr

    # run the real model
    cat $INPUT_FILE | ./wordfreq >>app.stdouterr 2>&1
    EXIT_CODE=$?

    # if failure, put the last lines on stdout - useful for debugging
    if [ "x$EXIT_CODE" != "x0" ]; then
        tail -n 500 app.stdouterr
    fi

    return $EXIT_CODE
}

# run id is the first argument
RUN_ID=$1

# job id is the second argument
JOB_ID=$2

# gridftp base url
BASE_URL=$3

# input file name
INPUT_FILE=$4

# keep the exit code to the end
EXIT_CODE=1

# remember start dir
START_DIR=`pwd`

# first, collect some information about the environment
host_info

# grid environment set up
if [ "x$PATH" = "x" ]; then
    export PATH="/usr/bin:/bin"
fi
. $OSG_GRID/setup.sh || {
    echo "Unable to source \$OSG_GRID/setup.sh"
    exit 1
}

# we need a local temp directory to do the actual work in
# it is important to try to use local filesystems as much as
# possible during jobs, instead of using the shared $OSG_DATA
create_work_dir
if [ $? != 0 ]; then
    exit 1
fi

# it is also very important to do the cleanup in case of failure
trap cleanup 1 2 3 6

(stage_in && run_model $RUN_ID $JOB_ID && stage_out)
EXIT_CODE=$?

# cleanup
cleanup

# signal the success/failure of the job
if [ "x$EXIT_CODE" = "x0" ]; then
    # give the all good signal to the job-success-check script
    echo "=== RUN SUCCESSFUL ==="
else
    echo "Job failed with exit code $EXIT_CODE"
fi

exit $EXIT_CODE
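If your model produces several output files, one way to follow the tar-it-up approach mentioned above is to replace the stage_out function with something along these lines. This is only a sketch; the file names added to the archive are assumptions about what your model writes and should be adjusted:

# sketch: bundle all outputs into a single archive and stage out just that file
function stage_out()
{
    cd $WORK_DIR

    # collect everything the model produced into one tarball
    # (adjust the list of files to match your model's outputs)
    tar czf outputs.tar.gz app.stdouterr

    globus-url-copy -v -create-dest -notpt -nodcau \
        file://$WORK_DIR/outputs.tar.gz \
        $BASE_URL/runs/$RUN_ID/outputs/$JOB_ID.outputs.tar.gz \
        || return 1

    return 0
}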
To run this example, save all the files, make sure the executable bit is set on all the scripts (chmod 755), and then run:
./submit
You will find a timestamped directory under runs/ containing logs and outputs.
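While the run is in progress, and after it completes, you can check on it with the usual tools; for example (the run id below is illustrative):

condor_q -dag                                 # overall DAGMan progress
condor_grid_overview                          # per-resource view of the jobs
ls runs/2009-06-01_101500/outputs/            # staged-out results
tail runs/2009-06-01_101500/logs/1/job.out    # stdout of job 1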