Administration Guide

Deployment Scenarios

OSGMM runs continuously on your submit machine. It pulls updated ClassAds from the next-level-up collector, monitors the queue to update ranks, and monitors your jobs to keep track of job success rates.

OSGMM can be run in one of two modes: VO master or single install. Smaller VOs just need a single install. Bigger VOs that need multiple submit nodes can run a VO master instance of OSG MM, which does the site verifications, and then have simpler installs on the submit nodes. In this case, you can think of OSGMM as having three levels of Condor Collectors. The top level is the ReSS Condor Collector at FNAL. The second level is the VO central server (a VO should have one, and only one), and the third is the lite version, which can be used on any submit host within the VO. At each level, some massaging of the ReSS ClassAds takes place.

Installing OSG MatchMaker

The only software prerequisite is the OSG Client software stack.

  1. Create a new user. It is recommended that the OSG MM daemon is run under a dedicated user account. The rest of this document will call this user osgmm but the username you choose does not matter. The user account should be a regular user account which you can log in to.

  2. Make sure that the OSG setup.sh/setup.csh scripts get sourced for the new user. You should be able to run condor_q and condor_status. Also set up $JAVA_HOME and $PATH so that java can be used on the command line.
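
A minimal sketch of what step 2 could look like in the osgmm user's ~/.bashrc. The setup script location and the helper name check_osgmm_env are assumptions for illustration, not part of OSG MM; adjust them to your installation.

```shell
#!/bin/bash
# Hypothetical location of the OSG Client setup script; adjust to your install.
OSG_SETUP="${OSG_SETUP:-/opt/osg-client/setup.sh}"

# Source the OSG environment only if the script is actually there.
if [ -r "$OSG_SETUP" ]; then
    . "$OSG_SETUP"
fi

# JAVA_HOME is site-specific; if it is set, put its bin directory on the PATH.
if [ -n "${JAVA_HOME:-}" ]; then
    PATH="$JAVA_HOME/bin:$PATH"
    export PATH
fi

# Hypothetical helper: print one "missing: NAME" line for each tool that
# OSGMM needs but that cannot be found on the PATH.
# Returns 0 if everything was found, 1 otherwise.
check_osgmm_env() {
    local missing=0 tool
    for tool in condor_q condor_status java; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}
```

After logging in as the new user, running check_osgmm_env should print nothing if the environment is set up as described above.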

  3. You can use the home directory of the new user for the installation.

  4. Download and untar the latest version of OSG MM from the SourceForge download page: OSG MM downloads.

  5. Edit the config file at etc/osgmm.conf. Change the vo_name parameter to the VO you belong to.

Configuring Condor

The OSG Match Maker will be run by condor_master just like any other Condor daemon.

To provide the correct environment (condor_q and condor_status need to be in the path and working), use a wrapper script. Put this in sbin/osgmm-wrapper. For a VDT-based install:

#!/bin/bash
# A simple wrapper for /opt/vdt/osg-match-maker/sbin/osgmm, to ensure that
# the VDT environment variables are present.
. "/opt/osg-client/setup.sh"
exec "/opt/vdt/osg-match-maker/sbin/osgmm" "$@"
                

Find your local Condor config file (usually condor/local.*/condor_config.local) and add:

# Add the OSG Match Maker to the daemons managed by Condor.
DAEMON_LIST = $(DAEMON_LIST), OSGMM
OSGMM = /path/to/osg-match-maker/sbin/osgmm-wrapper

# To run the OSGMM as a different user account, modify this line to replace
# osgmm with the user in question.  The account must have a usable home
# directory.
OSGMM_USERID = osgmm

# If you are running a more complex security configuration with the OSGMM
# running as a different user, the following settings should allow the OSG Match
# Maker the access it requires.
ALLOW_WRITE = $(ALLOW_WRITE),$(OSGMM_USERID)@*/$(FULL_HOSTNAME)
SEC_DAEMON_AUTHENTICATION_METHODS = FS, $(SEC_DAEMON_AUTHENTICATION_METHODS)

Note that you will have to change the path to the wrapper script and possibly the user ID.

It is also a good idea to change the scheduling intervals, tweak the GridManager settings, and disable the local virtual machines:

NEGOTIATOR_INTERVAL = 25
SCHEDD_INTERVAL = 60
GRIDMANAGER_MAX_SUBMITTED_JOBS_PER_RESOURCE = 400
GRIDMANAGER_MAX_PENDING_SUBMITS_PER_RESOURCE = 3

Once the changes have been made, restart Condor.
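
As a rough sketch, the restart and a quick sanity check of the configuration could look like the following. condor_restart and condor_config_val are standard Condor tools; the helper daemon_list_has_osgmm is a hypothetical name introduced here for illustration only.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if the given DAEMON_LIST value contains
# OSGMM as a complete list entry.
daemon_list_has_osgmm() {
    local list
    # Turn commas and tabs into spaces, squeeze repeated spaces, and pad
    # both ends so that OSGMM only matches as a whole item.
    list=" $(printf '%s' "$1" | tr ',\t' '  ' | tr -s ' ') "
    case "$list" in
        *" OSGMM "*) return 0 ;;
        *)           return 1 ;;
    esac
}

# Example usage on the submit host (left commented out so that nothing is
# restarted by accident):
#   condor_restart
#   daemon_list_has_osgmm "$(condor_config_val DAEMON_LIST)" \
#       && echo "OSGMM is in DAEMON_LIST"
```

If the check fails, the DAEMON_LIST line was probably added to a config file that Condor does not read.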

Testing the Install

Check var/log/osgmm.log for errors. If everything looks good there, run condor_grid_overview and check that it has a list of available resources. It is a good idea to copy condor_grid_overview to a common place, such as condor/bin/ in the OSG Client installation, so that users will have the tool in their path automatically.
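
The log check can be scripted. The sketch below assumes that problems show up in the log as lines containing "error", "fatal" or "exception"; the actual log format may use different keywords, and scan_osgmm_log is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: print suspicious lines (with line numbers) from an
# OSGMM log file.
# Returns 0 if nothing suspicious was found, 1 if matches were printed,
# and 2 if the log file cannot be read.
scan_osgmm_log() {
    local logfile="${1:-var/log/osgmm.log}"
    if [ ! -r "$logfile" ]; then
        echo "cannot read $logfile" >&2
        return 2
    fi
    # Assumed markers; adjust the pattern to what your log really contains.
    if grep -n -i -E 'error|fatal|exception' "$logfile"; then
        return 1
    fi
    return 0
}
```

Run from the installation directory, scan_osgmm_log with no argument reads var/log/osgmm.log. A clean result is a reasonable point at which to move on to condor_grid_overview.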

Enabling Site Verification/Maintenance

Site verification/maintenance can be enabled in the osgmm.conf file. This is an optional feature, but enabling this will increase your job success rates.

To enable the verification/maintenance jobs, configure the MyProxy settings and the verification/maintenance job interval in etc/osgmm.conf. To store a credential in MyProxy, use a command like:

myproxy-init --pshost my.proxy.host.org --voms Engage --cred_lifetime 4320 --username osgmm
            

The results of the verification jobs will be reflected in the site ranks. When the system is started before any sites have been verified, you will see ranks of 1. Job submission is triggered by checking the timestamp of the last run. If you want to force a rerun for a particular site, just remove the site's directory in var/verification-runs/ or var/maintenance-runs/.
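
Forcing a rerun can be wrapped in a small function. This is only a sketch: force_site_rerun is a hypothetical helper, and it removes the site's directory from both run directories, which forces both kinds of job to run again.

```shell
#!/bin/bash
# Hypothetical helper: remove the recorded runs for one site so that the
# verification and maintenance jobs are scheduled again.
# $1: OSGMM installation directory, $2: site id (for example FNAL_GPFARM)
force_site_rerun() {
    local osgmm_home="$1" site="$2" kind
    # Refuse empty or path-like site ids so that rm -rf cannot leave the
    # run directories.
    case "$site" in
        ""|.|..|*/*) echo "invalid site id: '$site'" >&2; return 1 ;;
    esac
    if [ ! -d "$osgmm_home/var" ]; then
        echo "no var/ directory under '$osgmm_home'" >&2
        return 1
    fi
    for kind in verification-runs maintenance-runs; do
        rm -rf -- "$osgmm_home/var/$kind/$site"
    done
}
```

For example, force_site_rerun "$HOME/osg-match-maker" FNAL_GPFARM, where the installation path is a placeholder for wherever you untarred OSG MM.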

You can also add your own maintenance/verification steps by creating these files:

etc/extra.maintenance-script.fork
etc/extra.maintenance-script.jm
etc/extra.verification-script.fork
etc/extra.verification-script.jm

The content of those files will be appended to the system scripts before sending them to the remote side. The system scripts can be found under libexec/. If you need to signal test failure in the verification scripts just echo "TEST FAILED" on stdout. This flag will be picked up by OSGMM.
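
As an illustration, an etc/extra.verification-script.fork fragment might look like the sketch below. It assumes the system scripts are shell scripts, so that the appended text runs as shell code on the remote side; the require_tool helper and the choice of tar as a required tool are hypothetical.

```shell
# Hypothetical check: mark the verification as failed if a tool that the
# VO's jobs depend on is not available at the site.
require_tool() {
    if ! command -v "$1" >/dev/null 2>&1; then
        # OSGMM picks up this marker from stdout.
        echo "TEST FAILED"
        # Extra detail goes to stderr to keep stdout limited to the marker.
        echo "required tool not found: $1" >&2
    fi
}

require_tool tar
```

Keeping each check in its own small function makes it easy to add further requirements without touching the system scripts under libexec/.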

Overriding ClassAd Attributes or Full ClassAds

You can add your own sites by creating ClassAds and storing them in etc/additional-ads/. The ads should be in the same format as the ads appearing in the ReSS feed (i.e., in the Glue format).

You can override ReSS ClassAd attributes by placing files with the attributes in etc/attribute-overrides/. The files should be named with the site id (for example FNAL_GPFARM). One example would be to limit the number of jobs sent to a site. In etc/attribute-overrides/FNAL_GPFARM, put:

MaxMatches = 25

To override an attribute for all sites, put the attribute in etc/attribute-override-all-ads.
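
Creating a per-site override file can be scripted as follows. This sketch assumes one attribute per line in files under etc/attribute-overrides/; set_site_overrides is a hypothetical helper and not part of OSG MM.

```shell
#!/bin/bash
# Hypothetical helper: write the attribute overrides for one site.
# $1: OSGMM installation directory, $2: site id,
# remaining arguments: one "Attribute = value" line each.
set_site_overrides() {
    local osgmm_home="$1" site="$2"
    shift 2
    local dir="$osgmm_home/etc/attribute-overrides"
    mkdir -p "$dir" || return 1
    # One attribute per line, replacing any earlier override file for the site.
    printf '%s\n' "$@" > "$dir/$site"
}
```

For example, set_site_overrides "$HOME/osg-match-maker" FNAL_GPFARM "MaxMatches = 25" writes the override shown above, with the installation path again being a placeholder.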

Job Statistic Graphs

OSGMM will track and create graphs of jobs in the system. These graphs can be found in var/stats and they are updated every 5 minutes.