Login to TRITON:

  ssh -l your_user_name triton-login.sdsc.edu

Accounting on TRITON:

  gbalance -u username

File transfer to/from TRITON:

To copy the file "pi.c" from an ENGR machine to TRITON:

  scp pi.c your_SDSC_username@triton-login.sdsc.edu:pi.c

To copy the file "pi.c" from TRITON to the ENGR domain machines:

  scp pi.c your_ENGR_username@linux.engr.ucsb.edu:pi.c

Compiling
In general, the login node should be used only to edit and compile software and to submit jobs to the scheduler.
NEVER RUN A JOB ON THE LOGIN NODE. Jobs should be run only on Triton's compute nodes.
Serial program

Compile your programs with pgcc, pgf77, and pgf90 (Portland Group compilers); icc, ifort (Intel compilers); or gcc, g77, gfortran (GNU compilers).

  pgcc  [options] file.c    C
  pgf90 [options] file.f    Fortran 90

Example:

  % pgcc -o serial serial.c

MPI program

MPI source codes should be recompiled for the Triton system with the following compiler commands:

  mpicc  [options] file.c      C & C++      [myrinet/mx switch & Portland compiler]
  mpif77 [options] file.f      Fortran 77   [myrinet/mx switch & Portland compiler]
  mpif90 [options] file.f90    Fortran 90   [myrinet/mx switch & Portland compiler]

Modules
Here are some common module commands and their descriptions:

  module list                      - List the modules that are currently loaded
  module avail                     - List the modules that are available
  module display "module name"     - Show the environment variables used by "module name" and how they are affected
  module unload "module name"      - Remove "module name" from the environment
  module load "module name"        - Load "module name" into the environment
  module switch "module 1 name" "module 2 name"
                                   - Replace "module 1 name" with "module 2 name" in the environment

Running
When you have a job running, you are allocated the nodes requested. At that time, a PBS prologue script runs that grants you direct ssh access to your nodes. At the conclusion of your job, that privilege is removed.

Interactive

You can use "qsub -I" to get exclusive access to an 8-core node, where you can perform interactive analyses. If you need one processor:

  qsub -I -q small -l walltime=00:10:00

Example: to run an interactive job with a wall clock limit of 30 minutes, using two nodes and two processors per node:

  $ qsub -I -l walltime=00:30:00 -l nodes=2:ppn=2
  qsub: waiting for job 75.triton-42.sdsc.edu to start
  qsub: job 75.triton-42.sdsc.edu ready

  $ echo $PBS_NODEFILE
  /opt/torque/aux/75.triton-42.sdsc.edu
  $ more /opt/torque/aux/75.triton-42.sdsc.edu
  tcc-2-31
  tcc-2-31
  tcc-2-25
  tcc-2-25

To run a job:

  $ mpirun -machinefile $PBS_NODEFILE -np 4 ./execfile

Batch

See: http://tritonresource.sdsc.edu/jobs.php

Script file for the EXPRESS queue:

  #!/bin/csh
  #PBS -q express
  #PBS -N hello
  #PBS -l nodes=1:ppn=4
  #PBS -l walltime=0:05:00
  #PBS -o hello-out
  #PBS -e hello-err
  #PBS -V
  cd /oasis/stefan-ucsb
  mpirun -v -machinefile $PBS_NODEFILE -np 4 ./mpi_hello > h-out

Numerical Libraries & Performance Tools
NUMERICAL LIBRARIES
Portland Group

The Portland Group compilers come with the optimized ACML library (LAPACK/BLAS/FFT). The ACML user guide is in the following location:

  /opt/pgi/linux86-64/8.0-6/doc/acml.pdf

Example BLAS, LAPACK, and FFT codes are in:

  /home/diag/examples/ACML

Compile and link as follows:

  pgf90 dzfft_example.f -L/opt/pgi/linux86-64/8.0-6/lib -lacml
  pgcc -L/opt/pgi/linux86-64/8.0-6/lib lapack_dgesdd.c -lacml -lm -lpgftnrtl -lrt
  pgcc -L/opt/pgi/linux86-64/8.0-6/lib blas_cdotu.c -lacml -lm -lpgftnrtl -lrt
Intel

Intel has developed the Math Kernel Library (MKL), which contains many linear algebra, FFT, and other useful numerical routines:

  * Basic Linear Algebra Subprograms (BLAS), with additional sparse routines
  * Fast Fourier Transforms (FFT) in 1 and 2 dimensions, complex and real
  * The linear algebra package, LAPACK
  * A C interface to BLAS
  * Vector Math Library (VML)
  * Vector Statistical Library (VSL)
  * Multi-dimensional Discrete Fourier Transforms (DFTs)

MKL is installed on Triton as part of the Intel compiler directory and covers the BLAS, LAPACK, FFT, BLACS, and ScaLAPACK libraries.

Most useful link: the Intel link advisor!

  http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/

Examples are in the following directory:

  /home/diag/examples/MKL

LAPACK example using MKL

Compile as follows:

  export MKLPATH=/opt/intel/Compiler/11.1/072/mkl
  ifort dgebrdx.f -I$MKLPATH/include \
    $MKLPATH/lib/em64t/libmkl_solver_lp64_sequential.a \
    -Wl,--start-group $MKLPATH/lib/em64t/libmkl_intel_lp64.a \
    $MKLPATH/lib/em64t/libmkl_sequential.a $MKLPATH/lib/em64t/libmkl_core.a \
    -Wl,--end-group libaux_em64t_intel.a -lpthread

Run:

  ./a.out < dgebrdx.d

ScaLAPACK example using MKL

A sample test case (from the MKL examples) is in:

  /home/diag/examples/scalapack

The make file is set up to compile all the tests.
Procedure:

  module purge
  module load intel
  module load openmpi_mx
  make libem64t compiler=intel mpi=openmpi LIBdir=/opt/intel/Compiler/11.1/072/mkl/lib/em64t

Sample link line (to illustrate how to link for ScaLAPACK):

  /opt/openmpi/bin/mpicc -o mm_pblas mm_pblas.c \
    -L/opt/intel/Compiler/11.1/072/mkl/lib/em64t \
    /opt/intel/Compiler/11.1/072/mkl/lib/em64t/libmkl_scalapack_lp64.a \
    /opt/intel/Compiler/11.1/072/mkl/lib/em64t/libmkl_blacs_openmpi_lp64.a \
    /opt/intel/Compiler/11.1/072/mkl/lib/em64t/libmkl_intel_lp64.a \
    -Wl,--start-group \
    /opt/intel/Compiler/11.1/072/mkl/lib/em64t/libmkl_sequential.a \
    /opt/intel/Compiler/11.1/072/mkl/lib/em64t/libmkl_core.a \
    -Wl,--end-group -lpthread
GPROF
GPROF is the GNU project profiler. It requires recompilation of the code: compiler options and libraries provide wrappers for each routine call and periodic sampling of the program. A default gmon.out file is produced with the function call information, and GPROF links the symbol list in the executable with the data in gmon.out.

Types of profiles:

Flat profile
  - CPU time spent in each function (self and cumulative)
  - Number of times a function is called
  - Useful to identify the most expensive routines

Call graph
  - Number of times a function was called by other functions
  - Number of times a function called other functions
  - Useful to identify function relations
  - Suggests places where function calls could be eliminated

Use the -pg flag during compilation:

  % gcc -g -pg ./srcFile.c
  % icc -g -p ./srcFile.c
  % pgcc -g -pg ./srcFile.c

Run the executable. An output file gmon.out will be generated with the profiling information.

Execute gprof and redirect the output to a file:

  % gprof ./exeFile gmon.out > profile.txt
  % gprof -l ./exeFile gmon.out > profile_line.txt
FPMPI
FPMPI is a simple MPI profiling library. It is intended as a first step toward understanding the nature of the communication patterns and potential bottlenecks in existing applications. Applications linked to FPMPI will generate an output file, fpmpi_profile.txt. This file contains:

  * description: a brief description of the fpmpi_profile.txt format
  * synchronization data: a listing of the synchronizing routines used and some related profile data
  * asynchronous communication data: a listing of the asynchronous communication routines used and some related profile data
  * topology data: a brief output of the communication topology

On TRITON the library is located in:

  /home/beta/fpmpi/fpmpi-2/lib

FPMPI needs PGI and MPICH MX:

  module purge
  module load pgi
  module load mpich_mx

Just relink with the library. For example:

  /opt/pgi/mpichmx_pgi/bin/mpicc -o trap-fpmpi trap.c -L/home/beta/fpmpi/fpmpi-2/lib -lfpmpi

Then run, e.g. interactively:

  qsub -I -l walltime=00:20:00 -l nodes=1:ppn=4
  mpirun -machinefile $PBS_NODEFILE -np 4 trap-fpmpi

The profile is written to fpmpi_profile.txt.
TAU/PDT
The TAU Performance System is a portable profiling and tracing toolkit for performance analysis of parallel programs written in Fortran, C, C++, Java, and Python. TAU's profile visualization tool, paraprof, provides graphical displays of all the performance analysis results, in aggregate and single node/context/thread forms.

TRITON TAU location [using the PGI compilers and openmpi_mx]:

  /home/beta/tau/2.19-pgi

Load the TAU environment:

  export PATH=/home/beta/tau/2.19-pgi/x86_64/bin:$PATH
  export LD_LIBRARY_PATH=/home/beta/tau/2.19-pgi/x86_64/lib:$LD_LIBRARY_PATH

Select the appropriate TAU makefile based on your choices. For example:

  /home/beta/tau/2.19-pgi/x86_64/lib/Makefile.tau-mpi-pdt-pgi

So, we set it up:

  % export TAU_MAKEFILE=/home/beta/tau/2.19-pgi/x86_64/lib/Makefile.tau-mpi-pdt-pgi

And we compile using the wrapper provided by TAU:

  % tau_cc.sh trap.c

or, for makefiles, edit the makefile and change mpif90/mpicc to tau_f90.sh/tau_cc.sh.

Run the job through the queue normally. We obtain the following profile files [on 4 processors]:

  profile.0.0.0, profile.1.0.0, profile.2.0.0 & profile.3.0.0

Analyze the performance data with:

  pprof    - for a text-based display
  paraprof - for a GUI

GUI environment:

a. On PC systems [PuTTY]: select X11 forwarding. On Linux & Mac OS: ssh -X ...
b. On TRITON:
   - Connect to the compute nodes, with X forwarding:
       qsub -I -X -l walltime=00:20:00 -l nodes=1:ppn=4
   - Go to the directory where the profile files ("profile.0.0.0", etc.) are stored.
   - Set the X11 libraries path:
       export LD_LIBRARY_PATH=/home/beta/X11/lib:$LD_LIBRARY_PATH
   - Set the TAU path:
       export PATH=/home/beta/tau/2.19-pgi/x86_64/bin:$PATH
       export LD_LIBRARY_PATH=/home/beta/tau/2.19-pgi/x86_64/lib:$LD_LIBRARY_PATH
   - Check that X11 is working by calling 'xclock':
       /home/beta/X11/bin/xclock
   - Use 'paraprof' to analyze the performance data:
       paraprof
Debugging on Triton
DDT on Triton may be run as follows [see also the SDSC page Debugging on Triton with DDT]:

Alert: DDT does not currently work on Triton for codes compiled with the MPICH-MX libraries. Please compile with the Open MPI MX libraries for debugging with DDT.

1. Login to Triton with X11 forwarding turned on (-X option to the ssh command).
2. Connect to the compute nodes:
     qsub -I -X -l walltime=00:20:00 -l nodes=1:ppn=4
3. Run this command to set up your environment:
     module load ddt
4. Set the X11 libraries path:
     export LD_LIBRARY_PATH=/home/beta/X11/lib:$LD_LIBRARY_PATH
5. Check that X11 is working by calling 'xclock':
     /home/beta/X11/bin/xclock
6. Run this command to start the DDT client:
     /home/beta/ddt/bin/ddt
7. Make sure your code is compiled with optimization turned off, using -O0 (that is, capital letter "O" followed by the number zero), and with symbol table information enabled, using the -g option.
Sample Programs
  Serial
  MPI