Notes & Examples

Log in to TRITON:

 ssh -l your_user_name triton-login.sdsc.edu 

Accounting on TRITON:

   gbalance -u username  

File transfer to/from TRITON:

To copy the file "pi.c" from an ENGR machine to TRITON:
 
          scp pi.c your_SDSC_username@triton-login.sdsc.edu:pi.c 

To copy the file pi.c from TRITON to the ENGR domain machines: 

          scp pi.c your_ENGR_username@linux.engr.ucsb.edu:pi.c

Compiling

In general, the login node should be used only to edit files, compile software, and submit jobs to the scheduler.
NEVER RUN A JOB ON THE LOGIN NODE. Jobs should be run only on Triton's compute nodes.

Serial program
 Compile your programs with pgcc, pgf77, and pgf90 (Portland Group 
Compilers), or icc, ifort (Intel Compilers) or gcc, g77, gfortran 
(GNU Compilers). 

      pgcc [options] file.c 	C
      pgf90 [options] file.f 	Fortran 90

Example:
      % pgcc -o serial serial.c

MPI program
  MPI source codes should be recompiled for the Triton system with the
following compiler commands:

      mpicc [options] file.c	C & C++ [myrinet/mx switch & Portland Compiler] 
      mpif77 [options] file.f	Fortran 77 [myrinet/mx switch & Portland Compiler]
      mpif90 [options] file.f90	Fortran 90 [myrinet/mx switch & Portland Compiler]

Modules

Here are some common module commands and their descriptions:

    module list - List the modules that are currently loaded
    module avail - List the modules that are available
    module display "module name" - Show the environment variables used by 
                   "module name" and how they are affected
    module unload "module name" - Remove "module name" from the environment
    module load "module name" - Load "module name" into the environment
    module switch "module 1 name" "module 2 name" - Replace "module 1 name" 
                 with "module 2 name" in the environment

Running

When you have a job running, you are allocated the nodes
requested. At that time, a PBS prologue script runs that
allows you direct ssh access to your nodes.
At the conclusion of your job, that privilege is removed.

Interactive
You can use "qsub -I" to get exclusive access to an 8-core node, 
where you can perform interactive analyses. 
If you need one processor:  
  qsub -I -q small -l walltime=00:10:00

Examples:

To run an interactive job with a wall clock limit of 30 minutes, 
using two nodes and two processors per node:

$ qsub -I -l walltime=00:30:00 -l nodes=2:ppn=2
qsub: waiting for job 75.triton-42.sdsc.edu to start
qsub: job 75.triton-42.sdsc.edu ready

$ echo $PBS_NODEFILE
/opt/torque/aux/75.triton-42.sdsc.edu

$ more /opt/torque/aux/75.triton-42.sdsc.edu
tcc-2-31
tcc-2-31
tcc-2-25
tcc-2-25

To run a job:
$ mpirun -machinefile $PBS_NODEFILE -np 4 ./execfile
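
The -np value above is typed by hand and has to agree with the nodes/ppn request. As a minimal sketch, it can instead be derived from the node file, which (as the listing above shows) holds one line per allocated processor slot. The helper names count_slots and count_nodes are hypothetical; they are not part of Torque or of the Triton environment.

```shell
#!/bin/sh
# count_slots FILE: number of processor slots (one non-empty line each).
count_slots() {
    grep -c . "$1"
}

# count_nodes FILE: number of distinct hosts listed in the node file.
count_nodes() {
    sort -u "$1" | grep -c .
}

# Inside a job PBS sets PBS_NODEFILE; outside a job this block does nothing.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    NP=$(count_slots "$PBS_NODEFILE")
    echo "Allocated $(count_nodes "$PBS_NODEFILE") node(s), $NP slot(s)"
    # Then launch as before, for example:
    #   mpirun -machinefile $PBS_NODEFILE -np $NP ./execfile
fi
```

For the two-node, two-processors-per-node request shown above, the node file has four lines and two distinct host names.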

Batch

See: http://tritonresource.sdsc.edu/jobs.php

Script file for the EXPRESS queue:

#!/bin/csh
#PBS -q express
#PBS -N hello
#PBS -l nodes=1:ppn=4
#PBS -l walltime=0:05:00
#PBS -o hello-out
#PBS -e hello-err
#PBS -V
cd /oasis/stefan-ucsb
mpirun -v -machinefile $PBS_NODEFILE -np 4 ./mpi_hello > h-out
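
In the script above the process count appears twice: once in the nodes/ppn request and once in -np. The following sketch generates the same script from a node count and a per-node processor count so that the two always agree. The function name write_express_script and the file name hello.pbs are hypothetical, and whether the express queue accepts a given request size is not checked here.

```shell
#!/bin/sh
# write_express_script FILE NODES PPN
# Write a PBS script like the one above, computing -np as NODES * PPN.
write_express_script() {
    file=$1
    nodes=$2
    ppn=$3
    np=$((nodes * ppn))
    cat > "$file" <<EOF
#!/bin/csh
#PBS -q express
#PBS -N hello
#PBS -l nodes=${nodes}:ppn=${ppn}
#PBS -l walltime=0:05:00
#PBS -o hello-out
#PBS -e hello-err
#PBS -V
cd /oasis/stefan-ucsb
mpirun -v -machinefile \$PBS_NODEFILE -np ${np} ./mpi_hello > h-out
EOF
}

# Example use on the login node (not executed here):
#   write_express_script hello.pbs 1 4
#   qsub hello.pbs
```

The backslash before $PBS_NODEFILE keeps the variable unexpanded in the generated file, so it is resolved by the job itself at run time.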

Numerical Libraries & Performance Tools


NUMERICAL LIBRARIES


The Portland Group compilers come with the Optimized ACML library (LAPACK/BLAS/FFT).

The ACML user guide is in the following location:
/opt/pgi/linux86-64/8.0-6/doc/acml.pdf

Example BLAS, LAPACK, and FFT codes are in:
/home/diag/examples/ACML

Compile and link as follows:
pgf90 dzfft_example.f -L/opt/pgi/linux86-64/8.0-6/lib -lacml
pgcc -L/opt/pgi/linux86-64/8.0-6/lib lapack_dgesdd.c -lacml -lm -lpgftnrtl -lrt
pgcc -L/opt/pgi/linux86-64/8.0-6/lib blas_cdotu.c -lacml -lm -lpgftnrtl -lrt


Intel

Intel has developed the Math Kernel Library (MKL), which contains many linear algebra, FFT, and other useful numerical routines:

    * Basic linear algebra subprograms (BLAS) with additional sparse routines
    * Fast Fourier Transforms (FFT) in 1 and 2 dimensions, complex and real
    * The linear algebra package, LAPACK
    * A C interface to BLAS
    * Vector Math Library (VML)
    * Vector Statistical Library (VSL)
    * Multi-dimensional Discrete Fourier Transforms (DFTs)

MKL is installed on Triton as part of the Intel compiler directory.
It covers the BLAS, LAPACK, FFT, BLACS, and ScaLAPACK libraries.

Most useful link: the Intel link line advisor!
http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/

Examples are in the following directory:
/home/diag/examples/MKL

LAPACK example using MKL

Compile as follows:
export MKLPATH=/opt/intel/Compiler/11.1/072/mkl
ifort dgebrdx.f -I$MKLPATH/include \
  $MKLPATH/lib/em64t/libmkl_solver_lp64_sequential.a \
  -Wl,--start-group \
  $MKLPATH/lib/em64t/libmkl_intel_lp64.a \
  $MKLPATH/lib/em64t/libmkl_sequential.a \
  $MKLPATH/lib/em64t/libmkl_core.a \
  -Wl,--end-group libaux_em64t_intel.a -lpthread

Output (run with the supplied data file):
./a.out < dgebrdx.d

ScaLAPACK example using MKL

A sample test case (from the MKL examples) is in:
/home/diag/examples/scalapack

The make file is set up to compile all the tests.

Procedure:
module purge
module load intel
module load openmpi_mx
make libem64t compiler=intel mpi=openmpi LIBdir=/opt/intel/Compiler/11.1/072/mkl/lib/em64t

Sample link line (to illustrate how to link for ScaLAPACK):
/opt/openmpi/bin/mpicc -o mm_pblas mm_pblas.c \
  -L/opt/intel/Compiler/11.1/072/mkl/lib/em64t \
  /opt/intel/Compiler/11.1/072/mkl/lib/em64t/libmkl_scalapack_lp64.a \
  /opt/intel/Compiler/11.1/072/mkl/lib/em64t/libmkl_blacs_openmpi_lp64.a \
  -L/opt/intel/Compiler/11.1/072/mkl/lib/em64t \
  /opt/intel/Compiler/11.1/072/mkl/lib/em64t/libmkl_intel_lp64.a \
  -Wl,--start-group \
  /opt/intel/Compiler/11.1/072/mkl/lib/em64t/libmkl_sequential.a \
  /opt/intel/Compiler/11.1/072/mkl/lib/em64t/libmkl_core.a \
  -Wl,--end-group -lpthread

GPROF


GPROF is the GNU Project PROFiler.   

Requires recompilation of the code.

Compiler options and libraries provide wrappers for each routine call and periodic sampling of the program. 

A default gmon.out file is produced with the function call information.

GPROF links the symbol list in the executable with the data in gmon.out. 

Types of Profiles
Flat Profile
CPU time spent in each function (self and cumulative) 
Number of times a function is called
Useful to identify most expensive routines

Call Graph
Number of times a function was called by other functions
Number of times a function called other functions
Useful to identify function relations
Suggests places where function calls could be eliminated

Use the -pg flag (-p with the Intel compiler) during compilation:
% gcc  -g -pg ./srcFile.c
% icc  -g -p  ./srcFile.c
% pgcc -g -pg ./srcFile.c

Run the executable. An output file gmon.out will be generated with the profiling information.

Execute gprof and redirect the output to a file:
% gprof    ./exeFile gmon.out > profile.txt
% gprof -l ./exeFile gmon.out > profile_line.txt
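
The three steps above (instrumented build, run, report) can be tried end to end on a toy program. This is only a sketch under stated assumptions: it uses gcc, the file names demo.c and demo and the function profile_demo are made up for illustration, and it should be run on a compute node rather than the login node.

```shell
#!/bin/sh
# profile_demo DIR: build a small C program with -pg in DIR, run it,
# and write the gprof report to DIR/profile.txt.
# The body runs in a subshell so the caller's working directory is unchanged.
profile_demo() (
    cd "$1" || exit 1
    cat > demo.c <<'EOF'
#include <stdio.h>

/* Deliberately slow so that it shows up in the flat profile. */
double work(long n) {
    double s = 0.0;
    long i;
    for (i = 1; i <= n; i++) s += 1.0 / (double)i;
    return s;
}

int main(void) {
    printf("%f\n", work(50000000L));
    return 0;
}
EOF
    gcc -g -pg -o demo demo.c || exit 1   # instrumented build
    ./demo > /dev/null || exit 1          # running it writes gmon.out here
    gprof ./demo gmon.out > profile.txt   # flat profile and call graph
)

# Example use:
#   mkdir -p gprof-demo && profile_demo gprof-demo
```

The report in profile.txt starts with the flat profile, followed by the call graph.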


FPMPI


FPMPI is a simple MPI profiling library. It is intended as a first step towards
understanding the nature of the communication patterns and potential bottlenecks
in existing applications.

Applications linked with FPMPI generate an output file, fpmpi_profile.txt, when they run.
This file contains:

    * description: A brief description of fpmpi_profile.txt format.
    * synchronization data: A listing of the synchronizing routines used and some related 
      profile data.
    * asynchronous communication data: A listing of the asynchronous communication routines 
      used and some related profile data.
    * topology data: A brief output of the communication topology.

On TRITON the library is located in:
   /home/beta/fpmpi/fpmpi-2/lib

FPMPI requires the PGI and MPICH MX modules:
   module purge
   module load pgi
   module load mpich_mx

Just relink with the library. For example:
   /opt/pgi/mpichmx_pgi/bin/mpicc -o trap-fpmpi trap.c -L/home/beta/fpmpi/fpmpi-2/lib -lfpmpi

Request an interactive session on a compute node:
   qsub -I -l walltime=00:20:00 -l nodes=1:ppn=4

Run the program:
   mpirun -machinefile $PBS_NODEFILE -np 4 trap-fpmpi

The profile is written to fpmpi_profile.txt.


TAU/PDT


The TAU Performance System is a portable profiling and tracing toolkit for performance analysis 
of parallel programs written in Fortran, C, C++, Java, and Python. 
TAU's profile visualization tool, paraprof, provides graphical displays of all the performance 
analysis results, in aggregate and single node/context/thread forms.

TRITON TAU location: /home/beta/tau/2.19-pgi [Using pgi compilers and openmpi_mx]

Load the TAU environment:

export  PATH=/home/beta/tau/2.19-pgi/x86_64/bin:$PATH
export LD_LIBRARY_PATH=/home/beta/tau/2.19-pgi/x86_64/lib:$LD_LIBRARY_PATH

Select the appropriate TAU MAKEFILE based on your choices. For example:
/home/beta/tau/2.19-pgi/x86_64/lib/Makefile.tau-mpi-pdt-pgi

Set the TAU_MAKEFILE environment variable accordingly:
% export TAU_MAKEFILE=/home/beta/tau/2.19-pgi/x86_64/lib/Makefile.tau-mpi-pdt-pgi

Then compile using the wrapper script provided by TAU:
% tau_cc.sh trap.c
or, for Makefiles, edit the Makefile and replace mpif90/mpicc with tau_f90.sh/tau_cc.sh.

Run the job through the queue normally. We obtain the following profile files [on 4 processors]:
     profile.0.0.0, profile.1.0.0, profile.2.0.0 & profile.3.0.0  
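
A quick sanity check after the run is to confirm that one profile file exists per MPI process. The sketch below assumes the profile.<rank>.0.0 naming shown above; count_tau_profiles is a hypothetical helper, not a TAU tool.

```shell
#!/bin/sh
# count_tau_profiles [DIR]: count files named profile.*.0.0 in DIR
# (default: current directory).
count_tau_profiles() {
    dir=${1:-.}
    n=0
    for f in "$dir"/profile.*.0.0; do
        # The -e test skips the unexpanded pattern when nothing matches.
        [ -e "$f" ] && n=$((n + 1))
    done
    echo "$n"
}

# Example use after a 4-process run:
#   [ "$(count_tau_profiles .)" -eq 4 ] || echo "some profile files are missing"
```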

Analyze performance data:

 pprof - for text-based display

 paraprof - for GUI

GUI environment:
a. On PC systems [PuTTY]: select X11 forwarding.
   On Linux & Mac OS: ssh -X ...

b. On TRITON, 
  -  Connect to the compute nodes, with X forwarding:
       qsub -I -X -l walltime=00:20:00 -l nodes=1:ppn=4

  -  Go to the directory where the "profile.0.0.0, etc." are stored.

  -  Set the X11 libraries path:
       export LD_LIBRARY_PATH=/home/beta/X11/lib:$LD_LIBRARY_PATH

  - Set the TAU path:
       export  PATH=/home/beta/tau/2.19-pgi/x86_64/bin:$PATH
       export LD_LIBRARY_PATH=/home/beta/tau/2.19-pgi/x86_64/lib:$LD_LIBRARY_PATH

  - Check to see if X11 is working, by calling 'xclock':
       /home/beta/X11/bin/xclock

Use 'paraprof', to analyze performance data:
        paraprof

Debugging on Triton


DDT on Triton may be run as follows [see also the SDSC page Debugging on Triton with DDT]:

 Alert: DDT does not currently work on Triton for codes compiled with the 
MPICH-MX libraries. Please compile with the Open MPI MX libraries for 
debugging with DDT. 

  1. Login to Triton with X11 forwarding turned on (-X option to ssh command)

  2. Connect to the compute nodes:
         qsub -I -X -l walltime=00:20:00 -l nodes=1:ppn=4
     
  3. Run this command to set up your environment:
           module load ddt 

  4. Set the X11 libraries path:
           export LD_LIBRARY_PATH=/home/beta/X11/lib:$LD_LIBRARY_PATH

  5. Check to see if X11 is working, by calling 'xclock':
            /home/beta/X11/bin/xclock

  6. Run this command to start the DDT client:
           /home/beta/ddt/bin/ddt

  7. Make sure your code is compiled with optimization turned off by compiling 
with -O0 (that is capital letter "O" followed by number zero), and symbol 
table information enabled by compiling with the -g option.


Screenshot of New Session->Run menu selection


Sample Programs

Serial MPI