Getting access to the LHC Computing Grid. Brief introductory material can be found on RMKI's getting started page, which explains how to get access to the LCG. Simple examples are also given there.

Some more practical information on running a typical job.

  • Get authenticated on the LCG:

> grid-proxy-init

Here, you will be prompted for your grid password. Or:

> grid-proxy-init -valid 4:00

This is the same, but the authentication will only be valid for 4 hours.

Now you can perform various operations on the grid, e.g. submit jobs. You can get information on your authentication (e.g. expiration time) with grid-proxy-info, and destroy your authentication proxy with grid-proxy-destroy.
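For example, a typical interactive session could look like the following minimal sketch (the 12-hour lifetime is only an example):

grid-proxy-init -valid 12:00   # create a proxy valid for 12 hours
grid-proxy-info                # check subject, strength and remaining lifetime
# ... submit jobs, manage files, etc. ...
grid-proxy-destroy             # drop the proxy when you are done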

  • Get your jobs authenticated on the LCG:

> myproxy-init

Here, you will be prompted for your grid password, and to specify a password attached to the so-called job proxy to be created. Or:

> myproxy-init -n

Here, you won't be asked to specify a password for your proxy.

Running myproxy-init is necessary when you run long-term jobs: it ensures that your jobs still have valid authentication even after your interactive proxy (obtained by grid-proxy-init) has expired. You can get information on your job proxy with myproxy-info, and destroy it with myproxy-destroy. Note: if you don't use this, you may not be able to retrieve the outputs of long-term jobs!
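A sketch of the corresponding workflow for a long job (it assumes that the MYPROXY_SERVER environment variable, or your site's default, points at the MyProxy server used by your resource broker):

myproxy-init -n        # store a long-lived credential on the MyProxy server
myproxy-info           # check that the credential is there and see its lifetime
# ... submit the long-running job ...
myproxy-destroy        # remove the stored credential when it is no longer needed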

  • Running a job:

A "Hello World!" example can be found at RMKI's getting started page. Instead of a "Hello World!" example, we present here a framework, with which one can send jobs in mass to the LCG: see the attached tarball called submit.tar.gz. You can adjust these shell script wrappers to your needs. This assumes that your jobs are placed in a directory scheme like in the attached example skeleton.tar.gz. This latter is a framework for simple jobs, which are ran on a simple computers (contains an automated Makefile and an automated starter shell script).

  • Specifying system requirements:

It is a common task to require some software environment (e.g. AFS) on the worker nodes where your jobs will be executed. You can specify requirements by placing a line like the following into your .jdl file:

Requirements = (Member("AFS", other.GlueHostApplicationSoftwareRunTimeEnvironment));

One can combine such conditions with logical operators, like:

Requirements = (Member("AFS", other.GlueHostApplicationSoftwareRunTimeEnvironment) && Member("VO-cms-ORCA_8_13_1", other.GlueHostApplicationSoftwareRunTimeEnvironment));

  • Using the storage elements of the LCG:

Files are stored on Storage Elements of the LCG. These usually have different physical file administration schemes, so a logical layer was built on top of them, which makes file access uniform from the user's point of view. To use this facility, set the environment variable LCG_CATALOG_TYPE:

export LCG_CATALOG_TYPE=lfc
(for bash), or
setenv LCG_CATALOG_TYPE lfc
(for tcsh).

The Logical File Name (lfn) is resolved by the Logical File Catalogue (lfc) server. You have to specify its host:

export LFC_HOST=lfc-cms.cern.ch
(for bash), or
setenv LFC_HOST lfc-cms.cern.ch
(for tcsh).

Now you can refer to files with their logical file names, which are site-independent. The logical file names have to be unique. A very convenient way to generate them is to imitate the UNIX-like file naming scheme. With the command lfc-mkdir one can make logical directories, like

> lfc-mkdir /grid/cms/user_name/some_directory

(All logical file names begin with /grid/virtual_organization. If you are a new user, the directory /grid/virtual_organization/user_name will not exist initially: you have to create it first if you want a directory of your own.)

There are various commands to manage logical file names, all beginning with lfc-, like lfc-chmod, lfc-getacl, lfc-setacl, lfc-chown, lfc-ln, lfc-rename, lfc-ls, lfc-rm, lfc-mkdir etc. Refer to the man pages of these commands; their usage should be mostly self-explanatory.
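A few typical catalogue operations, for illustration (user_name, old_name and new_name are placeholders):

lfc-mkdir /grid/cms/user_name/some_directory
lfc-ls -l /grid/cms/user_name                 # long listing, like 'ls -l'
lfc-rename /grid/cms/user_name/old_name /grid/cms/user_name/new_name
lfc-chmod 775 /grid/cms/user_name/some_directory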

Once the logical directory is ready, one can copy files into it. If you want to store a file in the LCG storage infrastructure permanently, copying it is not enough: you also have to register it in the logical file catalogue. This is done like:

> lcg-cr --vo cms -d grid100.kfki.hu -l lfn:/grid/cms/alaszlo/destination/testfile.txt \
                                                    file:/afs/kfki.hu/home/alaszlo/source/testfile.txt

(Here, the -d destination_Storage_Element is optional, and is used to force the destination Storage Element to be destination_Storage_Element; otherwise the file may be copied to a Storage Element anywhere in the world.) This command copies your file onto the LCG storage system and registers it. Once your file is there, you can get a copy of it with lcg-cp, e.g. as an input for a job:

> lcg-cp --vo cms lfn:/grid/cms/alaszlo/some_directory/test_file.txt file:$PWD/test_file.txt

(This command stages the file in question out onto a local (or AFS) disk area pointed at by $PWD, i.e. into your current directory.)

There are various commands to manage the stored files (e.g. unregister and delete them, or create replicas), all beginning with lcg-, like lcg-rep, lcg-aa, lcg-rf, lcg-cp, lcg-cr, lcg-la, lcg-uf, lcg-del, lcg-lg, lcg-lr, lcg-gt, lcg-ra etc. Refer to the man pages of these commands; most of them are self-explanatory.
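For instance, replica handling typically looks like this (a sketch; some_other_se.example.org is a placeholder Storage Element name):

# List the replicas behind a logical file name.
lcg-lr --vo cms lfn:/grid/cms/alaszlo/some_directory/test_file.txt
# Create an additional replica on another Storage Element.
lcg-rep --vo cms -d some_other_se.example.org lfn:/grid/cms/alaszlo/some_directory/test_file.txt
# Delete all replicas and remove the entry from the catalogue.
lcg-del --vo cms -a lfn:/grid/cms/alaszlo/some_directory/test_file.txt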

Unfortunately, one cannot copy and register complete directory trees. Therefore, I wrote a shell script wrapper to do this:

> lcgcr.sh source_directory destination_directory_lfn

(You can find this here: lcgcr.sh.)
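In essence, such a wrapper walks the source tree, recreates the directories in the catalogue, and runs lcg-cr on every file. The attached lcgcr.sh is the real implementation; the following is only a rough bash sketch of the idea (VO and Storage Element are hard-wired to the values used above):

#!/bin/bash
# Rough sketch only; see the attached lcgcr.sh for the real script.
# Usage: ./sketch.sh source_directory destination_directory_lfn
SRC="$1" ; DST="$2"
cd "$SRC" || exit 1
lfc-mkdir "$DST"
# Recreate the directory tree in the catalogue, then copy and register each file.
find . -mindepth 1 -type d | while read d ; do lfc-mkdir "$DST/${d#./}" ; done
find . -type f | while read f ; do
    lcg-cr --vo cms -d grid100.kfki.hu -l "lfn:$DST/${f#./}" "file:$PWD/${f#./}"
done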

Note 1: unfortunately, I have found that the lcg-cr action often fails (this should be remedied in the future). Therefore, it is convenient to put an lcg-cr command into a retry loop:

#!/bin/bash
# Retry lcg-cr up to 5 times, waiting a minute between failed attempts.
RESULT=1
TRIES=0
while (( $RESULT != 0 )) ; do
    lcg-cr --vo cms -d grid100.kfki.hu \
        -l lfn:/grid/cms/alaszlo/destination/test_file.txt \
            file:/afs/kfki.hu/home/alaszlo/source/test_file.txt
    RESULT=$?
    TRIES=$(($TRIES+1))
    if (( $RESULT != 0 )) ; then
        if (( $TRIES >= 5 )) ; then echo "lcg-cr failed. Giving up after 5 tries." ; exit 1 ; fi
        sleep 1m
    fi
done

This precaution may also be recommended for lcg-cp or other lcg- commands; a small wrapper function, sketched below, can guard any of them.
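A minimal bash sketch of such a wrapper function (the function name retry and the guarded command line are of course just examples):

#!/bin/bash
# Retry any command up to 5 times, waiting a minute between failed attempts.
retry() {
    local tries=0
    until "$@" ; do
        tries=$((tries+1))
        if (( tries >= 5 )) ; then echo "$1 failed. Giving up after 5 tries." ; return 1 ; fi
        sleep 1m
    done
}

retry lcg-cp --vo cms lfn:/grid/cms/alaszlo/some_directory/test_file.txt file:$PWD/test_file.txt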

Note 2: NEVER RESTRICT THE PERMISSIONS OF A FILE OR DIRECTORY IN SUCH A WAY THAT THE GROUP DOES NOT HAVE WRITE PERMISSION. There is a very simple reason for this. It is not really you who does the actual copying onto a storage element: some program copies it, running under a more or less random userid which is mapped to your grid identity only TEMPORARILY. This is a rather questionable way of managing userids, but that is how it is implemented in the current version, and it has rather dangerous implications. If you restrict the permissions such that the group does not have write permission, then on another occasion (when your identity is mapped to a different actual userid) you will not be able to write your own files. Meanwhile, another user (who happens to be given that particular userid) may still have write permission to your file, despite your precaution. Funny, isn't it? As a consequence, no one should restrict permissions in such a way that the group loses write permission. But this also has a consequence: practically anyone has write permission to your data, and you have write permission to the Nobel prize winner CMS Higgs DST files... Let's hope this conceptual mistake will be corrected soon.

  • Read/write C++ streams to storage elements of the LCG:

As one generally does not want to stage the data files out to a local disk by hand and then process them, it is convenient to have read/write streams. Unfortunately, such official streams are not available yet. Therefore I wrote grid storage I/O stream C++ classes (gstream, igstream, ogstream, analogous to the usual C++ STL fstream, ifstream, ofstream file input/output stream classes; the letter 'g' stands for 'grid'). They simply treat the file as a normal file, unless its name begins with the string /grid/; in that case, the data file in question is staged out onto a local (or AFS) area, and the local copy is then treated as a normal file. One commonly faces the problem that the file not only has to be processed, but also has to be passed through a filter program. Therefore I also wrote pipe streams for grid storage (igpstream, ogpstream), which are based on the ipstream and opstream classes of the library at http://pstreams.sourceforge.net (note the LGPL license!). Some practical examples:

#include "gstream.h"

int main(int argc, char *argv[])
{
    // Open the datafile for reading.
    igstream igfile("/grid/cms/alaszlo/some_datafile.dat");
        // Extract data from your datafile with 'igstream::operator>>' or with 'igstream::read(char*, int)'.
    // Close the datafile.
    igfile.close();

    // Open the datafile for writing.
    ogstream ogfile("/grid/cms/alaszlo/some_datafile.dat");
        // Write data to your datafile with 'ogstream::operator<<' or with 'ogstream::write(char*, int)'.
    // Close the datafile.
    ogfile.close();

    // Open the datafile for reading, through a filter program.
    igpstream igpfile("/grid/cms/alaszlo/some_datafile.dat.gz", "gunzip --stdout %f");
        // Extract data from your datafile with 'igpstream::operator>>' or with 'igpstream::read(char*, int)'.
    // Close the datafile.
    igpfile.close();

    // Open the datafile for writing, through a filter program.
    ogpstream ogpfile("/grid/cms/alaszlo/some_datafile.dat.gz", "gzip - > %f");
        // Write data to your datafile with 'ogpstream::operator<<' or with 'ogpstream::write(char*, int)'.
    // Close the datafile.
    ogpfile.close();

    return 0;
}

You can get the library files here: pstream.h, gstream.h, gstream.cc. Before you can use them, you have to set the following environment variables:

export VO=your_vo
(for bash), or
setenv VO your_vo
(for tcsh), and

export DEST=your_favourite_storage_element
(for bash), or
setenv DEST your_favourite_storage_element
(for tcsh).

Setting the environment variable TMPDIR is optional. It specifies the local (or AFS) directory where the data files are staged out (therefore, it must have plenty of disk space!). E.g.:

export TMPDIR=/tmp
(for bash), or
setenv TMPDIR /tmp
(for tcsh).

If not specified, the current working directory ($PWD) is used; this is the recommended setting for grid jobs (the worker nodes have large local disks).
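Putting it together, a job might set up its environment and build against the library like this (a sketch: the VO and Storage Element values are the ones used above, analysis.cc is a placeholder for your own program, and it is assumed that gstream.cc compiles standalone with g++):

#!/bin/bash
export VO=cms                  # your virtual organisation
export DEST=grid100.kfki.hu    # your favourite Storage Element
export TMPDIR=$PWD             # stage data files into the job's working directory
g++ -o analysis analysis.cc gstream.cc
./analysis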

-- AndrasLaszlo - 22 Mar 2006

Topic attachments
  • gstream.cc (11.1 K, 2006-06-19, AndrasLaszlo): Grid i/o stream library (source code)
  • gstream.h (3.4 K, 2006-03-17, AndrasLaszlo): Grid i/o stream library (header file)
  • lcgcr.sh (4.6 K, 2006-06-19, AndrasLaszlo): Copy and register complete directory trees into the Logical File Catalogue
  • pstream.h (60.8 K, 2006-03-17, AndrasLaszlo): Pipe stream library (consists of a single header file)
  • skeleton.tar.gz (18.4 K, 2006-06-19, AndrasLaszlo): Framework for a simple job
  • submit.tar.gz (22.2 K, 2006-06-19, AndrasLaszlo): A simple LCG mass submitter framework