Using ROCKS Tools on the Valkyrie Cluster

Cluster Architecture

There are two types of machines in the Valkyrie cluster:

  • Front end nodes
  • Compute nodes

Front end nodes are used for compiling and launching jobs. There is one front end node, valkyrie.ucsd.edu.

Compute nodes are the machines on which jobs run. There are sixteen (16) compute nodes, compute-0-0 through compute-0-15.
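
The node naming scheme above can be enumerated with a short loop, for instance when building a host list. This is a minimal sketch: the function name list_compute_nodes is a hypothetical helper and not a Rocks tool, and the default count of 16 is taken from the paragraph above.

```shell
# Print the first N compute node names, one per line.
# list_compute_nodes is a hypothetical helper, not part of Rocks.
# The default of 16 matches the node count stated in this article.
list_compute_nodes() {
    count=${1:-16}
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "compute-0-$i"
        i=$((i + 1))
    done
}

# Example: print the first two node names.
list_compute_nodes 2
```

Called without an argument, the function prints all sixteen names, compute-0-0 through compute-0-15.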

All of the Valkyrie nodes are linked by a high-speed Myrinet network. They can also communicate over a standard 100 Mbit Ethernet, which is slower.

An overview of software provided on Valkyrie:

MPI

MPI is a library specification for message-passing, proposed as a standard by a broadly based committee of vendors, implementors, and users.

MPICH

MPICH is a freely available, portable implementation of MPI, the Standard for message-passing libraries. MPICH is maintained by Argonne National Laboratory (ANL) and Mississippi State University (MSU).

Myrinet GM

GM is a message-passing system for Myrinet networks. The GM system includes a driver, Myrinet-interface control program, a network mapping program, and the GM API, library, and header files.

GM features include:

  • Concurrent, protected, user-level access to the Myrinet interface.
  • Reliable, ordered delivery of messages.
  • Automatic mapping and route computation.
  • Automatic recovery from transient network problems.
  • Scalability to thousands of nodes.
  • Very low host-CPU utilization.
  • Extensible software to allow simultaneous direct support of the GM API, IP (TCP/UDP), MPI, and other APIs.

MPICH over GM (MPICH-GM)

Myricom has implemented MPICH over the Myrinet GM message-passing library. This version of MPI is installed under "/opt/mpich".

Compiling source for the cluster

Compiling for the C implementation of MPICH-GM

Set the following environment variable:

  • If you are using bash:
    export PATH=/opt/mpich/myrinet/gnu/bin:$PATH
  • If you are using csh:
    set path = ( /opt/mpich/myrinet/gnu/bin $path )

The compiler executable is named mpicc.
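
Since both MPICH trees provide a compiler named mpicc, it can help to confirm which one the shell will pick up before compiling. The following is a minimal sketch for bash or sh: tool_in_tree is a hypothetical helper introduced here for illustration, and the directory is the one from the PATH setting above.

```shell
# Hypothetical helper: succeed only if TOOL resolves inside directory TREE.
tool_in_tree() {
    tool=$1
    tree=$2
    # command -v prints the path the shell would execute for this name.
    resolved=$(command -v "$tool" 2>/dev/null) || {
        echo "$tool: not found on PATH"
        return 1
    }
    case "$resolved" in
        "$tree"/*) echo "$tool: $resolved" ;;
        *) echo "$tool: found at $resolved, outside $tree"; return 1 ;;
    esac
}

# Example: check that mpicc comes from the Myrinet tree.
tool_in_tree mpicc /opt/mpich/myrinet/gnu/bin || echo "check your PATH setting"
```

On a machine without MPICH installed, the example reports that mpicc was not found.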

To compile the sample cpi.c program, which can be copied from /opt/mpich/myrinet/gnu/examples/cpi.c:

valkyrie.ucsd.edu% mpicc cpi.c -o cpi

Compiling programs to use standard 100 Mbit Ethernet (/opt/mpich)

If you wish to use the standard 100 Mbit Ethernet instead of the high-speed Myrinet for interprocess communication, compile your programs using the /opt/mpich tree.

Set the following environment variable:

  • If you are using bash:
    export PATH=/opt/mpich/gnu/bin:$PATH
  • If you are using csh:
    set path = ( /opt/mpich/gnu/bin $path )

Compile your source files using mpicc as described above.
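
Because the Myrinet and Ethernet builds differ only in which directory comes first on PATH, switching between them can be wrapped in a small shell function. This is a sketch for bash or sh users only: use_mpich is a hypothetical name, and the two directories are the ones given in this article.

```shell
# Hypothetical helper: put one MPICH tree at the front of PATH.
# The two directories are the ones named in this article.
use_mpich() {
    case "$1" in
        myrinet)  mpich_bin=/opt/mpich/myrinet/gnu/bin ;;
        ethernet) mpich_bin=/opt/mpich/gnu/bin ;;
        *) echo "usage: use_mpich myrinet|ethernet" >&2; return 1 ;;
    esac
    PATH="$mpich_bin:$PATH"
    export PATH
}

# Example: select the Ethernet build, then show the first PATH entry.
use_mpich ethernet
echo "${PATH%%:*}"   # prints /opt/mpich/gnu/bin
```

The shell searches PATH from left to right, so the most recently selected tree takes precedence. The function must be defined in the current shell (for example in ~/.bashrc) to have any effect; csh users would keep using the set path form shown above.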

Launching Interactive Jobs

Using mpirun

On Rocks clusters, mpirun is used to launch jobs that are linked with the Ethernet device for MPICH.

In this section we will assume that you have compiled the sample "cpi" program.

For example, to interactively launch the "cpi" program on two processors:

  1. Create a file in your home directory named "machines", and put two entries in it like:
    compute-0-0
    compute-0-1
  2. Set the PATH like:
    • If you are using bash:
      export PATH=/opt/mpich/gnu/bin:$PATH
    • For csh:
      set path = ( /opt/mpich/gnu/bin $path )
  3. Now launch the job from Valkyrie:
    mpirun -nolocal -np 2 -machinefile machines <path-to-program>

    For example, for user cs260c:

    mpirun -nolocal -np 2 -machinefile machines /home/cs260c/cpi

Using mpirun.ch_gm

On Rocks clusters, mpirun.ch_gm is used to launch jobs that are linked with the Myrinet device for MPICH.

For example, to interactively launch the "cpi" program on two processes:

  1. Create a file in your home directory named "machines", and put two entries in it like:
    compute-0-0
    compute-0-1
  2. Set the PATH:

    • If you are using bash:
      export PATH=/opt/mpich/myrinet/gnu/bin:$PATH
    • For csh:
      set path = ( /opt/mpich/myrinet/gnu/bin $path )
  3. Now launch the job from Valkyrie:
    mpirun.ch_gm -np 2 -machinefile machines <path-to-program>

    For example, for user cs260c:
    mpirun.ch_gm -np 2 -machinefile machines /home/cs260c/cpi
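
Both launch procedures above pair a process count (-np) with a machines file. If you want one process per listed node, a quick sanity check before launching might look like the sketch below. The helper name enough_hosts is hypothetical, and the check assumes one host name per line as in the examples above. How the launchers behave when -np exceeds the number of listed hosts is not covered in this article, so this is only a convenience check.

```shell
# Hypothetical helper: succeed if FILE lists at least NP hosts.
# Assumes one host name per line; blank lines are ignored.
enough_hosts() {
    np=$1
    file=$2
    # grep -c counts lines that contain at least one non-space character.
    hosts=$(grep -c '[^[:space:]]' "$file" 2>/dev/null || true)
    [ "${hosts:-0}" -ge "$np" ]
}

# Example: build a two-entry machines file in a scratch directory and check it.
scratch=$(mktemp -d)
printf 'compute-0-0\ncompute-0-1\n' > "$scratch/machines"
if enough_hosts 2 "$scratch/machines"; then
    echo "machines file covers -np 2"
fi
rm -rf "$scratch"
```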

Checking/Killing processes on nodes

The cluster-ps and cluster-kill commands are available on Valkyrie. Use cluster-ps to check the processes running on the nodes; the owner of a process can use cluster-kill to kill his or her job.

[valkyrie ~]$ cluster-ps
[valkyrie ~]$ cluster-kill (username)
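
To pick out your own processes from a longer listing, the output of cluster-ps could be passed through a simple filter. This is only a sketch: the output format of cluster-ps is not shown in this article, so the filter assumes the owner's login name appears as a separate word on each process line. The helper only_user and the sample lines are made up for illustration.

```shell
# Hypothetical filter: keep only lines that mention USERNAME as a whole word.
# Assumption: the owner's login name appears on each process line.
only_user() {
    grep -w -- "$1"
}

# Intended use on the front end (not run here):
#   cluster-ps | only_user "$USER"

# Demonstration on made-up sample text:
printf 'cs260c 1234 cpi\nother 5678 a.out\n' | only_user cs260c
```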