HPC and wgrib2api
Wgrib2api was designed to work in High Performance Computing (HPC) applications.
The features useful for HPC are,
- Applications can be written using the Intel Fortran compiler, ifort.
- Slow file reads and writes can be replaced by fast memory buffer reads and writes.
- wgrib2api can be compiled without use of configure scripts, which may cause problems
when cross compiling. However, you lose some functionality (handling jpeg2000 and png
packing).
- wgrib2api can be compiled with or without OpenMP.
Some slowness can come from,
- A high-level interface will always be slower than a low-level interface. However,
as grid size increases, the overhead will become a smaller fraction of workload.
- wgrib2api is not reentrant. For large speedups, you have to run multiple copies of wgrib2api
under MPI. You are limited to one copy of wgrib2api per MPI process, and different copies must
not write to the same disk file.
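The one-copy-per-process rule can be illustrated with a minimal sketch. It assumes each MPI process knows its rank and the total number of ranks, and it makes no wgrib2api calls. The function names, the round-robin split, and the file naming pattern are hypothetical choices for illustration, not part of wgrib2api.

```python
def fields_for_rank(n_fields, rank, n_ranks):
    """Round-robin share of field indices handled by one rank (hypothetical helper)."""
    return list(range(rank, n_fields, n_ranks))


def output_path_for_rank(prefix, rank):
    """Per-rank output name, so that no two ranks ever write to the same disk file.

    The naming pattern is an assumption made for this sketch.
    """
    return f"{prefix}.rank{rank:04d}.grb2"


# Illustrative sizes only: 10 fields shared among 4 ranks.
N_FIELDS, N_RANKS = 10, 4
plan = {
    rank: (output_path_for_rank("forecast", rank), fields_for_rank(N_FIELDS, rank, N_RANKS))
    for rank in range(N_RANKS)
}
```

In a real job, the rank and the number of ranks would come from the MPI library, and each rank would pass only its own path to its own copy of wgrib2api. The per-rank files can be combined in a later step.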
Sure, HPC applications read grib files. However, reading grib is a minor part of most
NWP-HPC jobs. Models and data assimilation systems tend to read much of their input
data files in non-grib formats. Most of the computation is in producing and
outputting the forecasts or analyses.
Much effort has gone into speeding up the writing of grib files. For example, the
GFS (Global Forecast System, NCEP) needs to write grib files for every 3 or 6 hours of
the forecast. Each grib file has more than 700 fields. You can encode each field
independently, but parallelization of the encoding of an individual field is limited.
The wgrib2api limits may be typical, and they are
- simple packing: highly parallelized by OpenMP, poor compression
- AEC packing: single thread, good compression
- complex packing: some parallelization by OpenMP, good compression
- jpeg2000: single thread using Jasper library, good compression
Simple packing could be parallelized using MPI, but simple packing
is not competitive because of its poor compression. So the best parallelization
will come from encoding individual fields using separate MPI processes. The output
file can either be created using a parallel filesystem or by copying the grib
messages to an I/O task that writes them to the file system. Wgrib2api does not
support multiple programs writing to the same file, so the output can be
written to a memory file and then written to the larger disk file in another step.
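The pattern of encoding fields independently and leaving all file output to a single I/O task can be sketched as follows. This is only an illustration under stated assumptions: a thread pool stands in for separate MPI processes, zlib stands in for a single-threaded grib packer, and an in-memory buffer stands in for the memory file. None of the names below are wgrib2api calls, and the length-prefixed message layout is not the grib format.

```python
import io
import zlib
from concurrent.futures import ThreadPoolExecutor


def encode_field(values: bytes) -> bytes:
    """Stand-in for a single-threaded packer (zlib here, NOT a real grib encoder)."""
    payload = zlib.compress(values)
    # 4-byte length prefix so the messages can be separated again later.
    return len(payload).to_bytes(4, "big") + payload


def encode_all(fields, n_workers=4):
    """Encode every field independently; map() returns results in field order."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(encode_field, fields))


def write_messages(messages, out):
    """Single I/O task: the only code that writes to the output."""
    for msg in messages:
        out.write(msg)


def split_messages(data: bytes):
    """Inverse of the layout above, used only to check the round trip."""
    pos, fields = 0, []
    while pos < len(data):
        n = int.from_bytes(data[pos:pos + 4], "big")
        fields.append(zlib.decompress(data[pos + 4:pos + 4 + n]))
        pos += 4 + n
    return fields


# Eight dummy "fields" of 1000 bytes each (illustrative data only).
fields = [bytes([i]) * 1000 for i in range(8)]
buffer = io.BytesIO()  # plays the role of the memory file
write_messages(encode_all(fields), buffer)
```

Because only `write_messages` touches the output, the workers never compete for the same file. In an MPI job the encoded messages would be sent to the I/O task instead of being returned from a thread pool.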