National Partnership for Advanced Computational Infrastructure: Archives

These pages are a copy of the original www.npaci.edu website, and should be used for historical reference only.


NPACI Grid: FAQ


NPACI Archive Page

The NPACI program ended on September 30, 2004. This site is presented for archival purposes only. For current resources at each of the partner sites, please refer to the appropriate institution site.

 

FAQ

  1. Why should I use the NPACI Grid and/or NPACKage?
  2. How do I access the NPACI Grid?
  3. How do I obtain a certificate for the NPACI Grid?
  4. Can I use my NPACI certificate in a non-Globus environment?
  5. How do I start a new grid session?
  6. The grid-proxy-init command gives "command not found"
  7. Why do I have a bunch of gram_job_mgr_*.log files in my home directory?
  8. grid-proxy-init fails on an AIX system
  9. Why do I get authentication errors when I try to connect to horizon.sdsc.edu under Globus?
  10. I want to use mpich/mpich-g2/vendormpi, but they are in the wrong order in my path

Answers

  1. Why should I use the NPACI Grid and/or NPACKage?

    Simplified Job Submission: Because NPACKage is installed on the NPACI Grid, a single job description language, the Resource Specification Language (RSL), can be used to submit jobs to any site. In addition, with Condor-G you can submit and monitor all of your jobs from a single site.

    Single Sign-on: Grid certificates enable single sign-on capabilities on the NPACI Grid. 

    Extend Local Resources: By installing NPACKage on your local resources, you extend your own grid with supercomputing resources. You can then run smaller jobs locally and larger jobs on the NPACI Grid.

    Portals: Grid portals can be developed to provide a single point of access to data and tools. Portals can also hide the complexity of running complicated programs and workflows.


  2. How do I access the NPACI Grid?

    For information on obtaining an account, setting up your environment, and obtaining certificates, refer to Getting Started.  For information on some of the NPACI Grid programs and features, see the Tutorial.  Refer to the Grid Services Matrix for the services and parameters required to run on the grid.

  3. How do I obtain a certificate for the NPACI Grid?

    See Certificates for instructions.

  4. Can I use my NPACI certificate in a non-Globus environment?

    No.

  5. How do I start a new grid session?

    Once you have obtained a certificate and initialized your NPACKage environment, run grid-proxy-init.  This will create a proxy certificate for you so you won't have to enter a passphrase each time you access a new site.  Proxies are generally valid for one day.

    b80n03 ~ 2% grid-proxy-init
    Your identity: /C=US/O=NPACI/OU=SDSC/CN=J Doe/USERID=jdoe
    Enter GRID pass phrase for this identity:
    Creating proxy ............................................. Done
    Your proxy is valid until: Wed Jul 16 02:59:27 2003
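
    Because proxies expire, you may prefer to create a new one only when the current one is missing or about to run out. The sketch below assumes a POSIX shell and the Globus grid-proxy-info command, whose -timeleft option prints the seconds remaining on the current proxy. The function name ensure_proxy and its one-hour default threshold are illustrative assumptions, not part of NPACKage.

```shell
#!/bin/sh
# Hypothetical helper: run grid-proxy-init only if the current proxy is
# missing or will expire within MIN_SECONDS (default: one hour).
ensure_proxy() {
    min_seconds=${1:-3600}
    # grid-proxy-info fails when there is no proxy; treat that as 0 seconds left.
    left=$(grid-proxy-info -timeleft 2>/dev/null) || left=0
    # Empty, negative, or otherwise non-numeric output is also treated as 0.
    case $left in
        '' | *[!0-9]*) left=0 ;;
    esac
    if [ "$left" -lt "$min_seconds" ]; then
        echo "Proxy missing or expiring soon; running grid-proxy-init"
        grid-proxy-init
    else
        echo "Proxy still valid for $left seconds"
    fi
}
```

    For example, ensure_proxy 7200 asks for your pass phrase only if less than two hours remain on the current proxy.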

  6. The grid-proxy-init command gives "command not found"

    You need to initialize your NPACKage environment. To use the NPACI Grid and the NPACKage software stack, you must add a few commands to your shell initialization files and then log in again so that the changes take effect. Read more about configuration.
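
    A quick way to confirm whether the setup took effect is to ask the shell where it finds grid-proxy-init. The sketch below uses only the POSIX command -v builtin. The function name check_grid_env is hypothetical, and it does not perform the NPACKage setup itself, which is covered in the configuration instructions.

```shell
#!/bin/sh
# Hypothetical helper: report whether grid-proxy-init is reachable via PATH.
check_grid_env() {
    if location=$(command -v grid-proxy-init); then
        echo "grid-proxy-init found at $location"
    else
        echo "grid-proxy-init not found: initialize your NPACKage environment and log in again" >&2
        return 1
    fi
}
```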

  7. Why do I have a bunch of gram_job_mgr_*.log files in my home directory?

    These GRAM job manager log files are created automatically in your home directory when you run Globus jobs, and each one is removed when its job completes successfully. If a job fails, its log is kept because it may help you debug the problem.
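
    Because only the logs of failed jobs remain, it is worth reviewing them before deleting them. The sketch below is a hypothetical helper (list_gram_logs is not part of Globus) that prints any matching logs in a directory, defaulting to your home directory.

```shell
#!/bin/sh
# Hypothetical helper: print the paths of leftover GRAM job manager logs.
list_gram_logs() {
    dir=${1:-$HOME}
    for f in "$dir"/gram_job_mgr_*.log; do
        # When nothing matches, the shell leaves the pattern unexpanded.
        [ -e "$f" ] || continue
        echo "$f"
    done
}
```

    After inspecting the files it lists, you can remove them with rm ~/gram_job_mgr_*.log.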


     
  8. grid-proxy-init fails on an AIX system

    If grid-proxy-init fails and running it with the -debug flag produces output like the following:

    [uxxx@longhorn uxxx]$ grid-proxy-init -debug

    Output File: /tmp/x509up_uxx
    Your identity: /C=US/O=NPACI/USERID=uxxx
    Enter GRID pass phrase for this identity:
    Creating proxy

    ERROR: Couldn't create proxy certificate

    grid_proxy_init.c:869:
    globus_gsi_proxy.c:763: globus_gsi_proxy_create_signed: Error with the proxy handle
    globus_gsi_proxy.c:234: globus_gsi_proxy_create_req: Error with private key: Couldn't generate RSA key pair for proxy handle
    OpenSSL Error: rsa_gen.c:182: in library: rsa routines, function RSA_generate_key: BN lib
    OpenSSL Error: md_rand.c:501: in library: random number generator, function SSLEAY_RAND_BYTES: PRNG not seeded
    OpenSSL Error: pem_lib.c:666: in library: PEM routines, function PEM_read_bio: no start line

    This can be caused by one of two problems:
    • A transient internal error. The 'SSLEAY_RAND_BYTES: PRNG not seeded' message means that the daemon called "entropy" did not have enough random numbers available to create your proxy certificate. The daemon will collect more; wait several seconds and try again.
    • On AIX machines, /dev/random does not supply the random numbers OpenSSL needs. There are two workarounds:

         1) Create a .rnd file in your home directory containing 200 bytes of random data (it does not matter what data the file contains).
         2) Set this environment variable every time you log in by placing it in your shell startup file (e.g., .cshrc):
              setenv EGD_PATH /etc/entropy
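
    Both workarounds can be scripted. In the sketch below, make_rnd_file writes 200 printable bytes generated by awk's rand(), since the contents of .rnd do not matter and /dev/random is the source that is failing. add_egd_path is the sh/ksh/bash form of the csh setenv line, appended to a startup file only once. Both function names and the choice of ~/.profile are assumptions for illustration.

```shell
#!/bin/sh
# Workaround 1 (hypothetical helper): write 200 bytes of filler to ~/.rnd.
# awk's rand() is used instead of /dev/random, which is what is failing here.
make_rnd_file() {
    target=${1:-$HOME/.rnd}
    # Each iteration prints one printable ASCII character (codes 33-126).
    awk 'BEGIN { srand(); for (i = 0; i < 200; i++) printf "%c", 33 + int(rand() * 94) }' > "$target"
    chmod 600 "$target"
}

# Workaround 2 (hypothetical helper): Bourne-shell form of
# "setenv EGD_PATH /etc/entropy", added to a startup file only once.
add_egd_path() {
    profile=${1:-$HOME/.profile}
    line='EGD_PATH=/etc/entropy; export EGD_PATH'
    # -x: match the whole line; -F: treat the pattern as a fixed string.
    if ! grep -qxF "$line" "$profile" 2>/dev/null; then
        printf '%s\n' "$line" >> "$profile"
    fi
}
```

    Login shells that read ~/.profile then pick up EGD_PATH automatically; csh and tcsh users should keep the setenv line in .cshrc instead.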

  9. Why do I get authentication errors when I try to connect to horizon.sdsc.edu under Globus?

    When attempting to run batch jobs on horizon.sdsc.edu remotely using Globus, mutual authentication errors occur that may look like this:

    GRAM Authentication test failure: authentication failed:
    GSS Major Status: Unexpected Gatekeeper or Service Name
    GSS Minor Status Error Chain:

    init.c:499: globus_gss_assist_init_sec_context_async: Error during context initialization
    init_sec_context.c:286: gss_init_sec_context: Mutual authentication failed: The target name (/C=US/O=NPACI/OU=SDSC/CN=tf005i.sdsc.edu) in the context, and the target name (/CN=host/tf005ig.sdsc.edu) passed to the function do not match

    The problem occurs because horizon.sdsc.edu is not a single machine: connections to "horizon" are distributed round-robin between tf004i.sdsc.edu and tf005i.sdsc.edu. Neither machine's host certificate contains the name "horizon", so the name in the certificate of the host you reach does not match the name you asked to contact, and mutual authentication fails.

    Solution:

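    You can usually avoid this problem by not using CNAMEs or round-robin DNS names; contact a specific host by its own name instead. Take similar care when authenticating against hosts with multiple network interfaces, where the name you connect to may differ from the name in the host certificate.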

    • Instead of horizon.sdsc.edu, use tf004i.sdsc.edu or tf005i.sdsc.edu
    • Instead of b80login.sdsc.edu, use b80n01.sdsc.edu
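
    One way to apply these substitutions consistently in scripts is to translate the alias before handing it to a Globus client. The sketch below hard-codes the two substitutions listed above; real_gatekeeper is a hypothetical name, and always choosing tf004i.sdsc.edu over tf005i.sdsc.edu is an arbitrary choice.

```shell
#!/bin/sh
# Hypothetical helper: replace round-robin/alias names with a concrete host
# whose host certificate carries that same name.
real_gatekeeper() {
    case $1 in
        horizon | horizon.sdsc.edu)   echo tf004i.sdsc.edu ;;  # or tf005i.sdsc.edu
        b80login | b80login.sdsc.edu) echo b80n01.sdsc.edu ;;
        *)                            echo "$1" ;;  # already a concrete name
    esac
}
```

    For example: globus-job-run "$(real_gatekeeper horizon.sdsc.edu)" /bin/hostname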

  10. I want to use mpich/mpich-g2/vendormpi, but they are in the wrong order in my path

    Because multiple MPI implementations provide tools with the same names (mpicc, mpirun, ...), whichever implementation appears first in your PATH takes precedence, and that is not necessarily the one you want to use. The fix is to change your PATH so that the implementation you want comes first:

    export PATH=/usr/npaci-grid-1.1/grid/bin:$PATH

    The example above puts mpich-g2 first in the path.
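
    Prepending by hand can leave the same directory in PATH twice. The sketch below is a hypothetical prepend_path function for sh-compatible shells that moves a directory to the front of PATH and drops its earlier occurrence; running command -v mpicc afterwards shows which implementation now wins.

```shell
#!/bin/sh
# Hypothetical helper: put DIR first in PATH, removing any existing copy of it.
prepend_path() {
    dir=$1
    new=$dir
    old_ifs=$IFS
    IFS=:
    set -f            # disable globbing while splitting PATH on ':'
    for entry in $PATH; do
        [ "$entry" = "$dir" ] || new=$new:$entry
    done
    set +f
    IFS=$old_ifs
    PATH=$new
    export PATH
}
```

    For example, prepend_path /usr/npaci-grid-1.1/grid/bin followed by command -v mpicc. In csh or tcsh, use setenv PATH /usr/npaci-grid-1.1/grid/bin:${PATH} and then run rehash, because those shells cache command locations.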