Next: Global dimensioning parameters
Up: Installation of
Previous: User configuration
Performance and special considerations
The script siteconfig_lapw is provided for general
configuration and compilation of the WIEN97 package. When you call
this script for the first time and follow the suggested answers, WIEN97
should run on your system (see 11.2.2).
The codes in the individual subdirectories /SRC_program are
compiled with make. Each Makefile is generated during installation
from Makefile.orig, which serves as a template.
Some directories contain source files *.frc, which hold both the
real and the complex version of the code (the complex version is needed
for systems without inversion symmetry), together with the parameter
files param.inc_r and param.inc_c. You create the real and the complex
versions with make and make complex, respectively; the *.frc files
are then preprocessed automatically.
For timing purposes, lapw0, lapw1 and lapw2 call a subroutine CPUTIM.
Specific versions of it are available for IBM-AIX, HP-UX, DEC-OSF1,
Fujitsu-S100, SGI-IRIX, NEC and Cray; on other systems
cputim_generic.c should work.
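On a platform without a dedicated routine, a generic CPU timer can be very small. The following is a minimal sketch of what such a routine might look like; it is not the actual contents of cputim_generic.c. It assumes that CPUTIM takes a single double precision argument that receives the CPU seconds used so far, and that your Fortran compiler follows the common Unix convention of a lowercase name with a trailing underscore and arguments passed by reference. Check both assumptions against your compiler and the real source.

```c
#include <time.h>

/* Hypothetical generic CPU timer (a sketch, not the WIEN97 source).
 * Assumed Fortran call:  CALL CPUTIM(T)  with T double precision.
 * Assumed naming convention: lowercase name plus trailing underscore,
 * argument passed by reference. Stores the processor time used by
 * the process so far, in seconds, into *dsec. */
void cputim_(double *dsec)
{
    clock_t c = clock();          /* processor time used by this program */
    if (c == (clock_t)-1) {       /* clock() signals "unavailable" this way */
        *dsec = 0.0;
        return;
    }
    /* Note: where clock_t is only 32 bits wide, this value can wrap
     * around during long runs. */
    *dsec = (double)c / CLOCKS_PER_SEC;
}
```

Called once before and once after a section of code, the difference between the two values gives the CPU seconds spent in that section.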
On some HP systems you may encounter errors such as ``stack growth
failure''. In that case you can recompile with -K, reconfigure your Unix
kernel with an increased stack size, or move the large arrays of the
affected program into COMMON blocks.
Most of the CPU time will be spent in lapw1 and, to a smaller
extent, in lapw2 and lapw0. Therefore we recommend optimizing
the performance of these three programs:
Find out which compiler options (man f77) make these programs
run faster. You could specify a higher optimization level (-O3), a
particular processor architecture (-qarch=pp2 or -R10000, ...) or a
preprocessor (like kapp or vast).
In addition, some Fortran routines are provided in different versions
that may run faster on your hardware, and several parameters can be
tuned:
- SRC_lapw1/hns.frc: Two versions of hns.frc are provided.
On most machines (except SGI), hns_generic.frc will probably be
faster.
- SRC_lapw1/seclr4it.frc: Two versions of seclr4it.frc are
provided. On vector computers seclr4it_nec.frc may be
faster. This concerns only the ``iterative diagonalization''.
- SRC_lapw1/vectf.f: On IBM systems, remove this file from your
Makefile, provided IBM's masslib.a is installed and added to the
Makefile.
- SRC_lapw1/param.inc_(r|c): Optimize the HB parameter.
As a starting point, set HB=32 on workstations and HB=254 on vector
machines (with vector length 255). The optimal value depends on the
cache size and on ``high performance'' BLAS routines. A well-chosen
HB can speed up the diagonalization by a factor of 2 on SGIs. HB=1
selects different routines (those from WIEN95).
- SRC_lapw2/l2main.frc: Two versions of l2main are
provided. Especially on vector computers l2main_nec.frc may be
faster.
- SRC_lapw2/essl.frc: On IBM machines with the essl library, use
essl_aix.frc.
- SRC_lapw2/param.inc_(r|c): Optimize the IBLCK and
IBLOCK parameters. Their best values depend on the cache size; use
large values on vector computers.
- SRC_lapw1 and SRC_lapw2/Makefile: Good performance
depends on highly optimized BLAS and LAPACK libraries. Whenever
possible, replace the supplied libraries SRC_lib/blas_lapw and
SRC_lib/lapack_lapw with routines from your vendor (essl on IBM,
complib.sgimath on SGI, dxml on DEC; also try blas, nag, or imsl
libraries). If no such libraries are available, optimize the supplied
ones as well as possible with compiler options and preprocessing
(especially the blas_lapw library).
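Block-size parameters such as HB, IBLCK and IBLOCK depend on cache size for the same reason that cache blocking (tiling) works in general: a tile of data is reused many times while it is still in cache. The C sketch below is a generic illustration of that technique, a tiled matrix product, and is not code taken from lapw1 or lapw2; the function name matmul_blocked and its block-size argument hb are hypothetical. A tile that is too small wastes loop overhead, while one that is too large no longer fits in cache.

```c
#include <assert.h>
#include <stddef.h>

/* Generic illustration of cache blocking (NOT WIEN97 code).
 * Computes C += A * B for n x n row-major matrices, working on
 * hb x hb tiles so that the tiles of A, B and C in use can stay in
 * cache while they are reused. hb is a hypothetical block size. */
static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

void matmul_blocked(size_t n, size_t hb,
                    const double *A, const double *B, double *C)
{
    assert(hb > 0);                       /* a block size of 0 is invalid */
    for (size_t ii = 0; ii < n; ii += hb)
        for (size_t kk = 0; kk < n; kk += hb)
            for (size_t jj = 0; jj < n; jj += hb) {
                /* the last tile in each direction may be partial */
                size_t imax = min_sz(ii + hb, n);
                size_t kmax = min_sz(kk + hb, n);
                size_t jmax = min_sz(jj + hb, n);
                for (size_t i = ii; i < imax; ++i)
                    for (size_t k = kk; k < kmax; ++k) {
                        double a = A[i * n + k];  /* reused across the j loop */
                        for (size_t j = jj; j < jmax; ++j)
                            C[i * n + j] += a * B[k * n + j];
                    }
            }
}
```

Timing such a kernel on the target machine for a few values of hb (for example 16, 32 and 64) is one way to build intuition about cache effects before editing param.inc_r or param.inc_c; the numbers will not transfer directly to the WIEN97 routines.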
2000-04-11