
compose-htc-wn

Docker Compose setup for a functioning HTCondor worker node (WN) that attaches to a remote scheduler

What will be installed?

                                                          ▲
                                                          │
┌─────────────────────────────────────────┐               │
│                                         │               │ To remote pool
│         HTCondor STARTD                 │               │
│                                         ├───────────────┘
│              Partitionable              │   shared secret
├──────────────────────┬─────────┬────────┤
│                      │         │        │
│     SLOT 1           │  SLOT 2 │ SLOT 3 │
│                      │         │        │
├──────────────────────┴─────────┴────────┤
│                                         │
│                CVMFS                    │
│                                         │
└─────────────────────────────────────────┘

Requirements

Preparation

First of all, clone the repository and create all the needed directories and files:

# clone this repository
git clone https://github.com/comp-dev-cms-ita/compose-htc-wn
cd compose-htc-wn
# Here the condor logs will be stored
mkdir ./logs
sudo chown 64:64 -R ./logs
# Here the cvmfs repos will be mounted and cached
mkdir -p ./cvmfs/cache
sudo chown 102:102 -R ./cvmfs/cache

# CVMFS cache limit to 4GB (adapt it to your space availability)
echo "CVMFS_QUOTA_LIMIT=4000" >> ./default-local/images.dodas.infn.it.conf

# Put in this file the shared secret to authenticate with the remote schedd
echo -n "HTC SHARED SECRET HERE" > ./shared-secret/pool_password
sudo chown root ./shared-secret/pool_password
sudo chmod 600 ./shared-secret/pool_password
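Before deploying, it can help to verify that the ownership and permissions above were actually applied. The helpers below are a minimal bash sketch. expect_stat and check_prep are hypothetical names and are not part of the repository. The sketch assumes GNU coreutils stat (Linux) and the same UID:GID values used in the commands above. Run check_prep from the repository root.

```shell
# Hypothetical helpers (not part of the repository): check the ownership and
# permissions set in the Preparation step. Assumes bash and GNU coreutils stat.

# expect_stat PATH FORMAT EXPECTED: compare "stat -c FORMAT PATH" with EXPECTED
expect_stat() {
  local path="$1" fmt="$2" want="$3" got
  if [ ! -e "$path" ]; then
    echo "ERROR: $path does not exist" >&2
    return 1
  fi
  got="$(stat -c "$fmt" "$path")"
  if [ "$got" != "$want" ]; then
    echo "ERROR: $path has $fmt=$got, expected $want" >&2
    return 1
  fi
}

# check_prep [BASE_DIR]: run every check, report all problems, fail if any
check_prep() {
  local base="${1:-.}" rc=0
  expect_stat "$base/logs" '%u:%g' 64:64 || rc=1
  expect_stat "$base/cvmfs/cache" '%u:%g' 102:102 || rc=1
  # pool_password: owned by root (UID 0), readable/writable by its owner only
  expect_stat "$base/shared-secret/pool_password" '%u' 0 || rc=1
  expect_stat "$base/shared-secret/pool_password" '%a' 600 || rc=1
  if [ "$rc" -eq 0 ]; then
    echo "Preparation looks OK"
  fi
  return "$rc"
}
```

Because check_prep keeps going after the first failure, a single run lists every path that still needs fixing.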

Finally, you need a few site-specific settings. Put the following entries in the .env file:

_condor_SiteName = "\"YOUR SITE NAME HERE\""
NUM_CPUS = 8
MEMORY = 16000

N.B. Be careful to keep the escaped quotes (\") exactly where they are in the example, so that the site name reaches HTCondor as a quoted string.
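If you prefer to script this step, the hypothetical set_env helper below can do it. It is not part of the repository. It writes one line per setting in the same KEY = VALUE form as the example and drops any previous definition of that key, so re-running it does not create duplicates. Other entries, such as XRD_PLUGINCONFDIR, are left untouched. This is a bash sketch. Wrapping the site name in single quotes keeps the \" sequences literal.

```shell
# Hypothetical helper: set "KEY = VALUE" in an env file, replacing any previous
# definition of KEY so that re-running it stays idempotent.
set_env() {
  local key="$1" value="$2" file="${3:-.env}"
  touch "$file"
  # keep every line except an existing definition of KEY (spaces around '=' allowed);
  # grep exits 1 when it selects no lines, which is fine here
  grep -v -E "^${key}[[:space:]]*=" "$file" > "$file.tmp" || true
  mv "$file.tmp" "$file"
  # printf %s does not interpret backslashes, so \" is written verbatim
  printf '%s = %s\n' "$key" "$value" >> "$file"
}

# Usage with the example values of this README:
#   set_env _condor_SiteName '"\"YOUR SITE NAME HERE\""'
#   set_env NUM_CPUS 8
#   set_env MEMORY 16000
```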

Deploy

❗ N.B. The following steps apply only to setups with an external volume for the WN spool directory.

Mount the external volume on ./wn-spool and set the correct ownership:

sudo mkdir ./wn-spool
# replace /dev/vdb1 with the device of your external volume
sudo mount /dev/vdb1 ./wn-spool
sudo chown 64:64 -R ./wn-spool
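If the mount fails, the spool silently stays on the root disk. You may therefore want to confirm that ./wn-spool is really a mount point, with the ownership set above, before deploying. The following is a minimal bash sketch. It assumes util-linux mountpoint and GNU coreutils stat are available. check_spool is a hypothetical name.

```shell
# Hypothetical helper: confirm the spool directory is a real mount point and
# carries the ownership set above (64:64).
# Assumes util-linux "mountpoint" and GNU coreutils "stat".
check_spool() {
  local dir="${1:-./wn-spool}"
  if ! mountpoint -q "$dir"; then
    echo "ERROR: $dir is not a mount point; is the external volume mounted?" >&2
    return 1
  fi
  if [ "$(stat -c '%u:%g' "$dir")" != "64:64" ]; then
    echo "ERROR: $dir must be owned by 64:64" >&2
    return 1
  fi
  echo "OK: $dir is mounted and owned by 64:64"
}
```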

Then, in docker-compose.yaml, uncomment the following lines (under both the WN and the telegraf services):

    - type: bind
      source: ./wn-spool
      target: /var/lib/condor/execute
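To double-check the edit, you can count the bind entries that are no longer commented out. count_binds is a hypothetical bash helper that greps the compose file for active "source:" lines. It is a plain-text heuristic rather than a YAML parser, so docker-compose config remains the authoritative way to inspect the resolved configuration.

```shell
# Hypothetical helper: print how many uncommented "source: <SOURCE>" lines
# exist in the compose file. A commented line such as "#  source: ..." does
# not match, because only whitespace may precede "source:".
# Note: '.' in <SOURCE> acts as a regex wildcard, which is harmless here.
count_binds() {
  local source="$1" file="${2:-docker-compose.yaml}" n
  n="$(grep -c -E "^[[:space:]]*source:[[:space:]]*${source}[[:space:]]*$" "$file" 2>/dev/null)" || true
  echo "${n:-0}"
}

# The spool bind should be active under both WN and telegraf, e.g.:
#   [ "$(count_binds ./wn-spool)" -eq 2 ] || echo "wn-spool bind not enabled everywhere"
```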

Now everything should be ready to go. Bring up the system with:

docker-compose up -d

and monitor the status with a simple docker ps command. Once every container is reported as healthy (this can take several minutes), you should find the WN logs in the ./logs folder.
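Instead of re-running docker ps by hand, a small polling loop can wait for the health checks. This bash sketch assumes that every running container on the host belongs to this deployment and defines a health check. It treats any status without "(healthy)", including "(health: starting)", as not ready yet. all_healthy and wait_healthy are hypothetical names.

```shell
# Hypothetical helper: read one container status per line on stdin and succeed
# only if there is at least one line and every line reports "(healthy)".
all_healthy() {
  local line n=0
  while IFS= read -r line; do
    n=$((n + 1))
    case "$line" in
      *"(healthy)"*) ;;      # e.g. "Up 5 minutes (healthy)"
      *) return 1 ;;         # "(health: starting)", "(unhealthy)" or no health check
    esac
  done
  [ "$n" -gt 0 ]             # no running containers at all is not "healthy"
}

# Poll docker ps every INTERVAL seconds (default 30) until all containers are healthy.
wait_healthy() {
  local interval="${1:-30}"
  until docker ps --format '{{.Status}}' | all_healthy; do
    echo "waiting for containers to become healthy..."
    sleep "$interval"
  done
  echo "all containers healthy; WN logs should now be in ./logs"
}
```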

Reconfigure

If you change a configuration parameter, you need to restart the containers for the change to take effect:

docker-compose down
sudo umount ./shared-home
docker-compose up -d
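On a repeated run, nothing may be mounted at ./shared-home, and umount then exits with a "not mounted" error. A hypothetical wrapper can skip that step when the path is not a mount point. reconfigure is not part of the repository. The sketch assumes bash and util-linux mountpoint.

```shell
# Hypothetical wrapper around the three Reconfigure commands above.
reconfigure() {
  docker-compose down || return
  # unmount only when something is actually mounted at ./shared-home
  if mountpoint -q ./shared-home 2>/dev/null; then
    sudo umount ./shared-home
  fi
  docker-compose up -d
}
```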

Set the automatic cache selection for the xrootd client

❗ Please check that your clone is up to date with the latest commits. In particular, check that the .env file contains an entry like: XRD_PLUGINCONFDIR=/etc/xrootd/client.plugins.d/
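The check on the .env file can be scripted. check_plugin_env is a hypothetical bash helper that only looks for the XRD_PLUGINCONFDIR key, with or without spaces around "=". It does not validate the value.

```shell
# Hypothetical helper: succeed if the env file defines XRD_PLUGINCONFDIR.
check_plugin_env() {
  local file="${1:-.env}"
  if grep -q -E '^XRD_PLUGINCONFDIR[[:space:]]*=' "$file" 2>/dev/null; then
    echo "OK: XRD_PLUGINCONFDIR is set in $file"
  else
    echo "XRD_PLUGINCONFDIR missing from $file: update your clone (e.g. git pull)" >&2
    return 1
  fi
}
```

To see whether your clone is behind the remote, git fetch followed by git status reports how many commits you are missing.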

First of all, edit client.plugins.d/xcache.conf according to your needs:

url = xrootd-cms.infn.it
lib = /cvmfs/cms.dodas.infn.it/miniconda3/envs/cms-dodas/lib/libXrdClProxyPlugin-5.so 
enable = true

xroot_proxy = root://PUT HERE host:port of your cache, or leave the default if your site doesn't have one yet//
xroot_proxy_excl_domains = file:/*,localhost,eospublic.cern.ch,"PUT HERE THE HOSTNAME FOR WHICH YOU DON'T WANT TO PASS THROUGH THE CACHE (e.g. your local storage endpoint)"
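A leftover template string is easy to miss in this file. check_xcache_conf is a hypothetical bash helper that fails if the literal text "PUT HERE" is still present and prints the offending lines. It does not check whether the host names themselves are correct.

```shell
# Hypothetical helper: fail if the xcache plugin config still contains the
# "PUT HERE" placeholders from the template.
check_xcache_conf() {
  local file="${1:-client.plugins.d/xcache.conf}"
  if [ ! -f "$file" ]; then
    echo "ERROR: $file not found" >&2
    return 1
  fi
  # grep -n prints matching lines (with line numbers) to stderr for the user
  if grep -n 'PUT HERE' "$file" >&2; then
    echo "ERROR: replace the PUT HERE placeholders in $file" >&2
    return 1
  fi
  echo "OK: $file has no placeholders left"
}
```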

Then, in docker-compose.yaml, uncomment the following lines:

    #- type: bind
    #  source: ./client.plugins.d
    #  target: /etc/xrootd/client.plugins.d

You are now ready to deploy or reconfigure as explained above.

Reset the environment and redeploy

If you hit an unrecoverable error, you can start again from a clean Docker environment with the following commands:

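docker-compose down
# kill any leftover process still holding TCP port 9618 (HTCondor)
lsof -t -i TCP:9618 | xargs -r kill -9
# WARNING: the next two commands kill and remove ALL containers on this host,
# not only the ones of this project (-q prints IDs only, skipping the header)
docker ps -q | xargs -r docker kill
docker ps -aq | xargs -r docker rm
sudo umount ./shared-home
docker-compose up -d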