From hillson@ait.nrl.navy.mil Tue Dec 28 12:19:06 1999
Date: Tue, 15 Jun 1999 14:11:01 -0400 (EDT)
From: Roger Hillson
To: tapulika@npac.syr.edu
Cc: hillson@ait.nrl.navy.mil, smith@ait.nrl.navy.mil, tran@metsci.com
Subject: SPEEDES/MPI/Janeway

Tom,

The first item is a message from Tuan. He summarizes the environment
and path variables needed to run MPI applications on Janeway, so I am
not going to repeat that information.

---------------------------------------------------------------------

Roger,

Here are the instructions for compiling and running ParNSS on Janeway.
Could you please add any instructions you might have, along with your
README.NEW, and forward everything to Wojtek Furmanski and his people.

Tuan

----------------------------------------------------------------------------

Unpack the tar/gzip file:

> gunzip speedes061199.tar.gz
> tar xvf speedes061199.tar.gz

This will create a directory: speedes061199

Make two copies of the directory speedes061199, one for shared-memory
mode and the other for message-passing mode:

> cp -R speedes061199 speedes061199shm
> cp -R speedes061199 speedes061199mpi

Set the environment variables required for compiling and running
SPEEDES with MPI:

setenv MPI_MSGS_PER_PROC 20000
setenv MPI_TYPE_MAX 20000
setenv MPI_REQUEST_MAX 20000
setenv MPI_MSGS_PER_HOST 4096
setenv MPI_BUFS_PER_PROC 256
setenv MPI_THRESHOLD 1
setenv MPI_ARCH IRIX64
setenv MPI_INCLUDE_DIR "/usr/include"
setenv MPI_PATH "/usr/include"
setenv MPI_LIB_DIR "/usr/lib32/mips3"
setenv MPI_EXTRA_LIBS "-lmpi -lm"

To compile a version of SPEEDES with message passing:

> cd speedes061199mpi
> Admin/spmake -j 16 COMM=MPI LIBEXT=a

To execute a SPEEDES application with message passing:

> mpirun -np 64 ParNSS 64

This command executes ParNSS (with MPI) on 64 nodes.

Example setup files for submitting a message-passing job to miser can
be found on Janeway in the directory:

/scratch/ttran/ParNSS-Beta100kA/mpi64

-----------------------------------------------------------------------

Appended last is README.NEW, which you should read. Again, the
compilation instructions in README.NEW are obsolete.

Roger

------------------------------------------------------------------------

Revised 2 June 1999

Read this README before attempting to use the MPI communication
libraries.

(1) The MPI libraries were revised to support SPEEDES version 0.6.
Additional changes are documented in the header of the file
SpeedesCommMPI.C.

(2) The MPI libraries were tested using version 1.1.2 of MPICH. The
compilation of these libraries was done on a Sun SPARCstation 5
running Solaris 2.5.1. (Solaris 2.5.1 is an alternate name for SunOS
5.5.1; querying your system with uname ('uname -a') will return the
level of the SunOS operating system.) The libraries were also tested
on Janeway, NRL's Origin 2000, and on Dax, NRL's Exemplar.

The MPICH source code can be obtained from ftp.mcs.anl.gov/pub/mpi as
mpich.tar.gz. Installation instructions are provided with the
distribution.

(3) MPICH supports two different libraries: a socket-based
communication library [device ch_p4] and a shared-memory-based
communication library [device ch_shmem]. The MPI interface is the
same in either case. The socket-based library can be used to build
MPI applications that will run on a network of Suns and/or SGIs
(assuming that you also build the SGI libraries -- see the MPICH
installation instructions). Socket-based MPI applications will also
run on a multi-CPU workstation, but very inefficiently relative to a
shared-memory-based application. A shared-memory-based application
will run efficiently on a multiprocessor, but cannot be run on a
network of workstations.

To configure the socket-based library on a Sun, use the switch
-device=ch_p4. For example:

configure -mpe -device=ch_p4 -nof77 -noc++ -cc=gcc
or
configure -mpe -device=ch_p4 -nof77 -noc++ -cc=CC

depending on the desired compiler.

To configure the shared-memory-based library on a Sun, use the switch
-device=ch_shmem. For example:

configure -mpe -device=ch_shmem -nof77 -noc++ -cc=gcc
or
configure -mpe -device=ch_shmem -nof77 -noc++ -cc=CC

See the MPICH installation instructions for additional details.
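As a point of reference, the overall MPICH build is just configure
followed by make; the sketch below assumes the socket device, the gcc
compiler, and an example source directory (chosen to match the
MPICHHOME path used in section (7) below), so adjust it for your site
and consult the MPICH installation instructions.

# Minimal sketch of building MPICH 1.1.2 with the socket device.
# The source directory is an assumption; use your own unpack location.
cd /proj/pgmt/mpi/mpich/sockets
configure -mpe -device=ch_p4 -nof77 -noc++ -cc=gcc
make
# The mpirun and mpiCC scripts produced by the build are the ones used
# in sections (8) and (9) below to compile and launch the demos.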
(4) The same compiler, linker, and OS level should be used for
building both the MPICH libraries AND the MPI applications.
Applications compiled and linked under Solaris 2.5.1 will also execute
under Solaris 2.6, but do not compile an application under Solaris 2.6
and link it against the 2.5.1 MPI libraries.

(5) Note that the link switch for MPICH 1.1.2 is now -lmpich, rather
than -lmpi. If necessary, the -lmpi switch should be removed from
/speedes/speedes0.6.cc/Admin/include_only_once.mak, i.e.

hillson.spruance% diff include_only_once.mak include_only_once.mak.~1~
167c167
< extra_libs = -L$(MPI_LIB_DIR) $(MPI_EXTRA_LIBS)
---
> extra_libs = -L$(MPI_LIB_DIR) -lmpi $(MPI_EXTRA_LIBS)

(6) ===============================================================
THE FOLLOWING SECTION IS PARTICULARLY IMPORTANT FOR UNIT TESTING OF
THE COMMUNICATION FUNCTIONS UNDER EITHER MPI OR THE SHARED MEMORY
INTERFACES.

There are a number of bugs in the test program TestSpComm.C, and I
have tried to document them here. A modified version of TestSpComm.C
is provided in the /MPI directory. The Bandwidth test is commented
out, but could be put back in. MPI_Finalize() is called properly, to
avoid leaving shared memory segments around. You can diff my version
of TestSpComm.C with the version provided with SPEEDES if you are
interested in the changes. I recommend running this version on
Janeway.
===================================================================

I. The Bandwidth test bug

TestSpComm.C is a test program for exercising the communication
interfaces. In SPEEDES 0.6, several tests will fail if the Bandwidth
test is run FIRST. This problem is not specific to MPI. If necessary,
the Bandwidth tests should be commented out in TestSpComm.C, i.e.

hillson.spruance% diff TestSpComm.C TestSpComm.a.C
263c263
< //BandwidthTest(Nits, MsgLen);
---
> BandwidthTest(Nits, MsgLen);
1168d1167
<
1470c1469
< //BandwidthTest(Nits, absMsgLen);
---
> BandwidthTest(Nits, absMsgLen);
=================================================================

II. The cleanup bug

A second problem with TestSpComm is that it does not call
SpComm_CleanupOther() at exit. SpComm_CleanupOther() indirectly calls
MPI_Finalize(); if MPI_Finalize() is not called, the shared memory
segments may not be cleaned up correctly. [The 'qnet' demonstration
does call SpComm_CleanupOther() correctly at exit.] The MPI version of
TestSpComm still seems to run correctly on networks of Sun
workstations (ch_p4 libraries) and on multiprocessors (both ch_p4 and
ch_shmem libraries).
==================================================================

III. The shared memory segment bug

If TestSpComm is terminated prematurely (with a ctrl-C), the
shared-memory segments will not be deleted. This is true for both the
shared memory interfaces and the MPI interfaces. I got tired of
deleting these segments by hand, and wrote a utility called zap.c. To
build zap:

cc -o zap zap.c

on any system. Directions for using zap are contained in the source
code. You should compile and run zap now and then just to see if
there are any shared memory segments around -- remember, these are
persistent.
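If you prefer to inspect or clean up the segments by hand (or want to
see what zap would remove), the standard System V IPC tools can be
used instead; this is a manual alternative, not part of the SPEEDES
distribution, and <shmid> below is a placeholder for an id reported by
ipcs.

# List the shared memory segments currently allocated on the machine:
ipcs -m
# Remove a stale segment by its id; repeat for each leftover segment:
ipcrm -m <shmid>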
===================================================================

IV. The message size bug

TestSpComm fails if a negative number is used as an input for the
message size. (A negative input should cause messages of random
length to be generated, up to the absolute value of the negative
number.) This problem is not MPI-specific.
===================================================================

V. The SPEEDES deep-clean bug

There is also a bug in the SPEEDES 0.6 deep clean.
"/speedes/speedes0.6/gnumake CLEAN=" does not remove the object and
dependency files in /speedes/speedes0.6/src/comm/MPI/ArchitectureDirs/SunOS
[e.g. SpeedesCommMPI.o and MPI_Globals.o]. Metron is aware of this
problem, and will fix it.
====================================================================

VI. Cout menu bug

When TestSpComm.C is run under mpirun on Janeway, the interactive menu
does not work correctly. To fix this bug, 'endl' was added to the end
of two cout statements.

END of TESTSPCOMM BUG SECTION
====================================================================

(7) Before building a SPEEDES/MPI application, the environment
variable MPI_PATH should be set to reflect your system configuration.

(A) Under MPICH:

# SPEEDES environment variables for linking MPI under MPICH
setenv MPICHHOME "/proj/pgmt/mpi/mpich/sockets"
setenv MPI_PATH "$MPICHHOME"

(B) On Janeway: before compiling and linking, first source the file
JANEWAY.MPI.SET. (Set SPEEDES_RT to the proper path first.)

(C) On Dax: before compiling and linking, first source the file
DAX.MPI.SET. (Set SPEEDES_RT to the proper path first.)

(8) After setting the above environment variables, TestSpComm and qnet
can be built as follows. If you are using MPICH, the script file mpiCC
must be used. A complete Janeway build sequence is sketched after
these lists.

In /speedes0.6/cc/src/comm/shmem/tcpip:

(a) Under MPICH, on a network of Suns or SGIs:
    gnumake TestSpComm COMM=MPI CXX=mpiCC

(b) On Janeway, using Janeway's native version of MPI:
    gnumake TestSpComm COMM=MPI
    # Janeway does not have an mpiCC script file

(c) On Dax:
    gnumake TestSpComm COMM=MPI CXX=mpiCC
    # Dax does have a script file for compiling MPI programs

In /speedes0.6/cc/demos/qnet:

(a) Under MPICH:
    gnumake qnet COMM=MPI CXX=mpiCC

(b) On Janeway, using Janeway's native version of MPI:
    gnumake qnet COMM=MPI
    # Janeway does not have an mpiCC script file

(c) On Dax:
    gnumake qnet COMM=MPI CXX=mpiCC
    # Dax does have a script file for compiling MPI programs
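Putting steps (7)(B) and (8)(b) together, a TestSpComm build on
Janeway might look like the following sketch; the SPEEDES_RT value,
the location of JANEWAY.MPI.SET, and the leading part of the source
path are placeholders, so substitute the paths for your installation.

# Sketch only: build TestSpComm on Janeway with the native MPI.
setenv SPEEDES_RT /path/to/speedes0.6
source /path/to/JANEWAY.MPI.SET
cd /path/to/speedes0.6/cc/src/comm/shmem/tcpip
gnumake TestSpComm COMM=MPI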
(9) Running the demos:

If the ch_p4 (sockets) MPI library is used, program output from cout
or printf statements will not necessarily be printed to stdout in the
order in which they occur.
=====================================================================

I. Running the demos under MPICH on a network of Suns and/or SGIs

To run TestSpComm:

(a) If you are running the MPICH sockets library, define a host file
which lists the domain names of the slave machines. The first node
will execute on the machine on which mpirun is executed. MPICH will
create a default host file for the machine on which MPICH is compiled,
but it is good practice to explicitly list your host machines. For
example, for a network of Suns or SGIs:

cat host.net
machine1.x.y.z
machine2.x.y.z
machine3.x.y.z
machine4.x.y.z

The host file is ignored if the MPI shared memory libraries are used.

(b) In /speedes/speedes0.6/exe/ArchitectureDirs/SunOS:

mpirun -np 3 -machinefile host.net TestSpComm -lnodes 3 -group 0
[or]
mpirun -np 3 TestSpComm -lnodes 3 -group 0

Obviously, the number of processes [3] can be changed. The
socket-based MPI library [ch_p4] is required to run on the network,
but either the ch_p4 or ch_shmem MPI libraries can be used on a
multiprocessor. The ch_shmem libraries are more efficient on
architectures which support shared memory.

(c) To run the qnet demo, symbolically link speedes.par and qnet.par
from /speedes0.6/cc/demos/qnet to
/speedes/speedes0.6/exe/ArchitectureDirs/SunOS, then:

mpirun -np 3 -machinefile host.net qnet 3
[or]
mpirun -np 3 qnet 3
======================================================================

II. Running the demos on Janeway

(a) To run the tests without miser:

(1) To run the MPI tests:

mpirun -np #procs TestSpComm -lnodes #procs -group 0
mpirun -np #procs qnet -lnodes #procs -group 0

where #procs is the number of nodes requested. For some parameter
settings, it may be necessary to increase the predefined limits for
certain environment variables (see my paper). Sourcing
JANEWAY.MPI.ENV sets the environment variables to 'reasonable' values.

(2) To run the shared-memory tests:

TestSpComm -lnodes #procs -group 0
qnet -lnodes #procs -group 0

(b) Running with miser:

(1) To run the shared-memory version of TestSpComm, the files 'seg'
and 'job' must be provided. Sample files are included with the
distribution. To submit the job to miser:

miser_submit -q warp -f seg job

See the Janeway help pages and 'man miser' for details.

(2) To run the MPI tests with miser, only the seg file is required:

miser_submit -q warp -f seg mpirun -miser -np #procs TestSpComm -lnodes #procs -group 0
======================================================================

III. Running the demos on Dax

mpirun -np #procs TestSpComm -lnodes #procs -group 0
mpirun -np #procs qnet -lnodes #procs -group 0

where #procs is the number of nodes requested. Dax does not have the
equivalent of a miser utility.
=========================================================================

Finally, be aware of the phenomenon I noted in my paper: if some of
the TestSpComm tests are run repeatedly, the timings drop until
reaching an apparent asymptote.

End README File.

Roger Hillson
Code 5583
Naval Research Laboratory
hillson@ait.nrl.navy.mil