OSGMM will run continuously on your submit machine. It will bring down updated ClassAds from the next-level-up collector, monitor the queue to update ranks, and monitor your jobs to keep track of job success rates.
OSGMM can be run in one of two modes: VO master or single install. Smaller VOs just need a single install. Bigger VOs, which need multiple submit nodes, can run a VO master instance of OSG MM to do the site verifications, and then have simpler installs on the submit nodes. In this case, you can think of OSGMM as having three levels of Condor Collectors. The top level is the ReSS Condor Collector at FNAL. The second level is the VO central server (a VO should have one, and only one), and the third is the lite version, which can be used on any submit host within the VO. At each level, some massaging of the ReSS ClassAds takes place.
The only software prerequisite is the OSG Client software stack.
Create a new user. It is recommended that the OSG MM daemon be run under a dedicated user account. The rest of this document calls this user osgmm, but the username you choose does not matter. The account should be a regular user account that you can log in to.
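For example, on a typical Linux system the account can be created as root with something like the following (a sketch; follow your site's usual account-creation procedure):

    # create a regular account with a home directory for the OSG MM daemon
    useradd -m -c "OSG Match Maker" osgmm
    passwd osgmm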
Make sure that the OSG setup.sh/setup.csh scripts get sourced for the new user; you should be able to run condor_q and condor_status. Also set up $JAVA_HOME and $PATH so that java can be used on the command line.
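For a bash user, this can go in the account's shell profile. A minimal sketch, assuming the OSG Client is installed under /opt/osg-client (as in the wrapper script below) and Java under /usr/java/default; adjust both paths to your installation:

    # source the OSG Client environment (provides condor_q, condor_status, ...)
    . /opt/osg-client/setup.sh
    # make java available on the command line
    export JAVA_HOME=/usr/java/default
    export PATH=$JAVA_HOME/bin:$PATH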
You can use the home directory of the new user for the installation.
Download and untar the latest version of OSG MM from the SourceForge download page: OSG MM downloads.
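For example (the file name is a placeholder; substitute the version you actually downloaded):

    cd ~
    tar xzf osg-match-maker-X.Y.tar.gz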
Edit the config file at etc/osgmm.conf. Change the vo_name parameter to the VO you belong to.
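For example, a member of the Engage VO (the VO used in the MyProxy example later in this document) would set, assuming the usual key = value syntax of the config file:

    vo_name = Engage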
The OSG Match Maker will be run by condor_master just like any other Condor daemon. To provide the correct environment (condor_q and condor_status need to be in the path and working), use a wrapper script. Put this in sbin/osgmm-wrapper. For a VDT based install:
    #!/bin/bash
    # A simple wrapper for /opt/vdt/osg-match-maker/sbin/osgmm, to ensure that
    # the VDT environment variables are present.
    . "/opt/osg-client/setup.sh"
    exec "/opt/vdt/osg-client/sbin/osgmm" "$@"
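Remember to make the wrapper executable so that condor_master can run it:

    chmod 755 sbin/osgmm-wrapper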
Find your local Condor config file (usually condor/local.*/condor_config.local) and add:
    # Add the OSG Match Maker to the daemons managed by Condor.
    DAEMON_LIST = $(DAEMON_LIST), OSGMM
    OSGMM = /path/to/osg-match-maker/sbin/osgmm-wrapper

    # To run the OSGMM as a different user account, uncomment this line and modify
    # it to replace osgmm with the user in question. The account must have a
    # usable home directory.
    OSGMM_USERID = osgmm

    # If you are running a more complex security configuration with the OSGMM
    # running as a different user, the following settings should allow the OSG Match
    # Maker the access it requires.
    ALLOW_WRITE = $(ALLOW_WRITE),$(OSGMM_USERID)@*/$(FULL_HOSTNAME)
    SEC_DAEMON_AUTHENTICATION_METHODS = FS, $(SEC_DAEMON_AUTHENTICATION_METHODS)
Note that you will have to change the path, and possibly the user id.
It is also a good idea to change the scheduling intervals, tweak the GridManager settings, and disable the local virtual machines:
    NEGOTIATOR_INTERVAL = 25
    SCHEDD_INTERVAL = 60
    GRIDMANAGER_MAX_SUBMITTED_JOBS_PER_RESOURCE = 400
    GRIDMANAGER_MAX_PENDING_SUBMITS_PER_RESOURCE = 3
Once the changes have been made, restart Condor.
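Note that a full restart is needed here; condor_reconfig alone will not start daemons newly added to DAEMON_LIST. For example:

    condor_restart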
Check var/log/osgmm.log for errors. If everything looks good there, run condor_grid_overview and check that it has a list of available resources. It is a good idea to copy condor_grid_overview to a common place, such as condor/bin/ in the OSG Client installation, so that users will have the tool in their path automatically.
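For example (the bin/ location of condor_grid_overview inside the OSG MM install and the OSG Client path are assumptions; adjust to your layout):

    tail var/log/osgmm.log
    condor_grid_overview
    cp bin/condor_grid_overview /opt/osg-client/condor/bin/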
Site verification/maintenance can be enabled in the osgmm.conf file. This is an optional feature, but enabling it will increase your job success rates.
To enable the verification/maintenance jobs, configure the MyProxy settings and the verification/maintenance job interval in etc/osgmm.conf. To store a credential in MyProxy, use a command like:
    myproxy-init --pshost my.proxy.host.org --voms Engage --cred_lifetime 4320 --username osgmm
The results of the verification jobs will be reflected in the site ranks. When the system is started before any sites have been verified, you will see ranks of 1. When the jobs are submitted is determined by checking the timestamps of the last run. If you want to force a rerun for a particular site, just remove the site's directory in var/verification-runs/ or var/maintenance-runs/.
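For example, to force the verification job for the FNAL_GPFARM site to rerun (assuming the run directories are named after the site id):

    rm -rf var/verification-runs/FNAL_GPFARM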
You can also add your own maintenance/verification steps by creating these files:
    etc/extra.maintenance-script.fork
    etc/extra.maintenance-script.jm
    etc/extra.verification-script.fork
    etc/extra.verification-script.jm
The content of those files will be appended to the system scripts before they are sent to the remote side. The system scripts can be found under libexec/. If you need to signal a test failure in the verification scripts, just echo "TEST FAILED" on stdout. This flag will be picked up by OSGMM.
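As a minimal sketch of an extra verification step (the scratch-space check is illustrative; the script is assumed to run under a Bourne-style shell, like the system scripts it is appended to):

    # etc/extra.verification-script.fork
    # fail the site verification if /tmp has less than ~1 GB free
    free_kb=`df -Pk /tmp | awk 'NR==2 {print $4}'`
    if [ "$free_kb" -lt 1048576 ]; then
        echo "TEST FAILED"
    fi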
You can add your own sites by creating ClassAds and storing them in etc/additional-ads/. The ads should be in the same format as the ads appearing in the ReSS feed (i.e., in the Glue format).
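A minimal sketch of such an ad (the values are hypothetical, and a real ReSS ad carries many more Glue attributes than shown here; the easiest approach is to copy an existing ad from the ReSS feed and edit it):

    GlueSiteName = "MY_SITE"
    GlueCEInfoHostName = "gatekeeper.example.edu"
    GlueCEInfoContactString = "gatekeeper.example.edu/jobmanager-condor"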
You can override ReSS ClassAd attributes by placing files with the attributes in etc/attribute-overrides/. The files should be named after the site id (for example, FNAL_GPFARM). One example would be to limit the number of jobs to send to a site. In etc/attribute-overrides/FNAL_GPFARM, put:
MaxMatches = 25
To override an attribute for all sites, put the attribute in etc/attribute-override-all-ads.