Climate Prediction Center
 
 

 
 

wgrib2mv - wgrib2 m(multiple streams) v(vector data)

wgrib2ms - wgrib2 m(multiple streams) s(scalar data)

Introduction

Wgrib2 was designed to be parallelized by what may be called dataflow programming. Data flows into a black box and data flows out. One way to parallelize is to divide the data flow into N streams, process each stream separately, and then recombine the streams at the end of the processing. A common operation is to extract a subregion of a grid.

   Real-world usage: subregion option in grib-filter, nomads.ncep.noaa.gov

   Example: wgrib2 IN.grb -set_grib_type c2 -ijsmall_grib "1:20" "20:40" OUT.grb

      reads grib file, IN.grb
      extract data(1:20, 20:40)
      write a new grib file, OUT.grb, using c2 compression
For each field, wgrib2 has to decode the field, extract the subarray a(1:20,20:40), and encode a new grib message. The processing of each grib message is independent, so the work can be parallelized by splitting the messages among several copies of wgrib2.
   Assume IN.grb has 400 grib messages and there are 4 free cores; the sh parallel code is

   wgrib2 IN.grb -for   1:100 -set_grib_type c2 -ijsmall_grib "1:20" "20:40" OUT.grb.1 &
   wgrib2 IN.grb -for 101:200 -set_grib_type c2 -ijsmall_grib "1:20" "20:40" OUT.grb.2 &
   wgrib2 IN.grb -for 201:300 -set_grib_type c2 -ijsmall_grib "1:20" "20:40" OUT.grb.3 &
   wgrib2 IN.grb -for 301:400 -set_grib_type c2 -ijsmall_grib "1:20" "20:40" OUT.grb.4 &
   wait
   cat OUT.grb.1 OUT.grb.2 OUT.grb.3 OUT.grb.4 > OUT.grb

   4 copies of wgrib2 are run at the same time for up to a 4x speed improvement. The
   final step is to combine the results of the 4 jobs.
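The hand-written split above hard-codes the -for ranges for 400 messages. A minimal sh sketch of a helper that computes them for any message count M and stream count N follows; for_ranges is a hypothetical name, not part of wgrib2, and the sketch assumes M >= N.

```sh
# Hypothetical helper, not part of wgrib2: print the -for ranges that split
# M grib messages into N contiguous blocks of nearly equal size (assumes M >= N).
# usage: for_ranges M N
for_ranges() (
  m=$1 n=$2 i=0
  while [ "$i" -lt "$n" ]; do
    # block i covers messages floor(i*M/N)+1 .. floor((i+1)*M/N)
    start=$(( i * m / n + 1 ))
    end=$(( (i + 1) * m / n ))
    echo "$start:$end"
    i=$((i + 1))
  done
)
```

For example, for_ranges 400 4 prints 1:100, 101:200, 201:300 and 301:400, the four -for arguments used above; each line can be handed to its own background wgrib2 job.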
Wgrib2ms and wgrib2mv are perl scripts that allow you to parallelize your wgrib2 commands in a similar fashion.

Simple Usage

run in 1 stream:   wgrib2     FILE (options)
run in N streams:  wgrib2ms N FILE (options)            N > 1
run in N streams:  wgrib2mv N FILE (options)            N > 1

 * there are restrictions on the options that work with wgrib2ms and wgrib2mv
 * wgrib2mv requires vector fields to be in the same grib message, ex. (UGRD,VGRD)
 * wgrib2mv puts vector fields in the same grib message
Wgrib2ms parallelizes wgrib2 in a fashion similar to the example above:
 wgrib2ms 4 IN.grb -set_grib_type c2 -ijsmall_grib "1:20" "20:40" OUT.grb
          |
          \-- run in 4 streams

An easy-to-read version of the shell code generated by wgrib2ms:

 1: mkfifo pipe1 pipe2 pipe3 pipe4
 2: wgrib2 -for_n 1::4 IN.grb -set_grib_type c2 -ijsmall_grib 1:20 20:40 pipe1 &
 3: wgrib2 -for_n 2::4 IN.grb -set_grib_type c2 -ijsmall_grib 1:20 20:40 pipe2 &
 4: wgrib2 -for_n 3::4 IN.grb -set_grib_type c2 -ijsmall_grib 1:20 20:40 pipe3 &
 5: wgrib2 -for_n 4::4 IN.grb -set_grib_type c2 -ijsmall_grib 1:20 20:40 pipe4 &
 6: gmerge OUT.grb pipe1 pipe2 pipe3 pipe4
 7: rm pipe1 pipe2 pipe3 pipe4

line 1: make 4 pipes
lines 2-5: run 4 copies of wgrib2 in the background, with grib output to the pipes;
    each copy of wgrib2 processes every 4th field
line 6: gmerge reads grib messages from the 4 pipes in round-robin fashion and
     writes them to OUT.grb
line 7: remove the pipes

The advantage of this code is that there are no temporary disk files and the merging
of the results of the 4 streams is done at the same time as the wgrib2 processing.
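To see why the named pipes need no temporary files, the sketch below imitates the generated script with plain text lines standing in for grib messages and a shell loop standing in for gmerge. It is an illustration under those assumptions, not how gmerge is written; round_robin_demo is a hypothetical name, and because pipe i is opened on the fixed file descriptor i+2 it handles at most 7 streams.

```sh
# Illustration only: text lines stand in for grib messages and the read loop
# stands in for gmerge. Pipe i is opened on file descriptor i+2, so N <= 7.
# usage: round_robin_demo N NUMBER_OF_RECORDS
round_robin_demo() (
  n=$1 nrec=$2
  dir=$(mktemp -d) || exit 1
  # line 1 of the script: one named pipe per stream
  i=1
  while [ "$i" -le "$n" ]; do mkfifo "$dir/pipe$i"; i=$((i + 1)); done
  # lines 2-5: stream i writes records i, i+N, i+2N, ... (like -for_n i::N)
  i=1
  while [ "$i" -le "$n" ]; do
    (
      r=$i
      while [ "$r" -le "$nrec" ]; do echo "record $r"; r=$((r + n)); done
    ) > "$dir/pipe$i" &
    i=$((i + 1))
  done
  # line 6: open every pipe, then take one record from each pipe in turn;
  # with the i::N split, the first end-of-file means all streams are done
  i=1
  while [ "$i" -le "$n" ]; do
    eval "exec $((i + 2))< \"\$dir/pipe$i\""
    i=$((i + 1))
  done
  while :; do
    i=1
    while [ "$i" -le "$n" ]; do
      if ! eval "IFS= read -r line <&$((i + 2))"; then break 2; fi
      printf '%s\n' "$line"
      i=$((i + 1))
    done
  done
  wait
  # line 7: remove the pipes
  rm -r "$dir"
)
```

round_robin_demo 4 12 prints record 1 through record 12 in their original order, even though each record passed through only one of the four pipes and nothing was written to a regular file.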

For wgrib2mv, each grib message contains either a vector or a scalar field. Examples of vector fields are the wind (UGRD, VGRD) and the storm motion (USTM, VSTM). Examples of scalars are the temperature (TMP) and the relative humidity (RH). For wgrib2ms (s is for scalar), every field is treated as a scalar. The output from wgrib2mv has the vector components packed together in the same grib message, while the output from wgrib2ms has the vector components in separate grib messages.

  1. use wgrib2mv if you want to use -new_grid (new_grid does scalar and vector interpolation)
  2. use wgrib2mv if you want to keep the vector fields in the same grib message
  3. otherwise use wgrib2ms
In theory, wgrib2ms should be faster than wgrib2mv because the pipelines are shorter and each grib message is more uniform in size. However, if you need to do a -new_grid, you have no choice but to use wgrib2mv.
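The three rules above can be written as a small sh function. This is a hypothetical helper (pick_wgrib2m is not part of wgrib2); it only looks for a literal -new_grid option and takes the keep-vectors preference as its first argument.

```sh
# Hypothetical helper encoding the three rules above (not part of wgrib2).
# usage: pick_wgrib2m KEEP_VECTORS [wgrib2 options...]
#   KEEP_VECTORS: "yes" if vector pairs must stay in the same grib message
pick_wgrib2m() (
  keep_vectors=$1
  shift
  for opt in "$@"; do
    # rule 1: -new_grid does vector interpolation, so only wgrib2mv will do
    if [ "$opt" = "-new_grid" ]; then echo wgrib2mv; exit 0; fi
  done
  # rule 2: keep vector pairs together; rule 3: otherwise use the scalar version
  if [ "$keep_vectors" = yes ]; then echo wgrib2mv; else echo wgrib2ms; fi
)
```

For example, pick_wgrib2m no IN.grb -new_grid ncep grid 221 OUT.grb prints wgrib2mv, while the -ijsmall_grib command from the introduction gives wgrib2ms.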

Wgrib2m parallelizes a wgrib2 command by dividing the data flow into N streams which are processed independently. Only a limited number of output options are supported. Note that the inventory from wgrib2m is in a different order than the inventory from a wgrib2 command.

For -new_grid to work, you have to use wgrib2mv. In addition, the input file must have UGRD and VGRD (the default vector pair) in the same grib message. Both wgrib2 and wgrib2mv support arbitrary vector pairs using the -new_grid_vectors option (v2.0.2). Note that wgrib2mv follows the copygb convention: by default, only UGRD and VGRD are interpolated using vector interpolation.

If you want to process the grib data as scalars and convert the output file so that (UGRD, VGRD) are in the same grib message, you can do something like this:

wgrib2ms 4 IN.grb -set_grib_type s -inv /dev/null -grib_out - | wgrib2 - -ncep_uv OUT.grb

wgrib2 output options supported by wgrib2m

  1. -grib
  2. -grib_out
  3. -ijsmall_grib
  4. -new_grid (wgrib2mv only)
  5. -small_grib
  6. all other output options should not be used

wgrib2m restrictions on the output options

  1. Each output option must write to a different file
  2. Each output option must write to the output file for every record processed.
  3. You can use the -match option because -match selects the record prior to processing
  4. You cannot use -if to select the record to be output (see restriction 2)
  5. Output options can only write grib (ex. -netcdf, -csv are not allowed)
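A rough pre-flight check can catch the clearest violations before the streams are started. The sketch below is hypothetical (check_wgrib2m_opts is not part of wgrib2ms or wgrib2mv) and only knows the options named on this page; it does not verify restrictions 1 and 2. -if is reported as a warning rather than an error because it is still usable for purposes other than choosing the output records, e.g. selecting an interpolation method as in the -new_grid example further down.

```sh
# Hypothetical pre-flight check, not part of wgrib2ms/wgrib2mv. It knows only
# the options named on this page; every other option passes silently.
# Exit status is 1 if a non-grib output option is present, else 0.
check_wgrib2m_opts() (
  status=0
  for opt in "$@"; do
    case $opt in
      -netcdf|-csv)
        echo "not allowed: $opt (output must be grib)"; status=1 ;;
      -if)
        echo "warning: $opt must not select which records are written" ;;
      -import*|-rpn)
        echo "warning: $opt sees only the records in its own stream" ;;
    esac
  done
  exit "$status"
)
```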

wgrib2 reading options supported by wgrib2m

  1. processing a regular grib file (not a pipe)
  2. -i (reading inventory from stdin) added v1.1
  3. -import will cause problems

wgrib2 options that work differently in wgrib2m

Some options still work but may behave differently in wgrib2m. Since the processing is split into N streams, each copy of wgrib2 will not see all the records. For example, you may want to calculate the 1000-500 mb thickness. If one copy of wgrib2 gets the 1000 mb Z and another gets the 500 mb Z, then neither can calculate the thickness. This affects

  1. -rpn
  2. -import
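Whether two records meet in the same copy of wgrib2 depends only on their record numbers. A minimal sketch, assuming stream i runs with -for_n i::N as in the generated script shown earlier; stream_of and same_stream are hypothetical helpers, not part of wgrib2.

```sh
# Hypothetical helpers, assuming stream i runs with -for_n i::N:
# record R goes to stream ((R-1) mod N) + 1.
# usage: stream_of R N
stream_of() {
  echo $(( ($1 - 1) % $2 + 1 ))
}

# Succeeds only if records R1 and R2 land in the same one of N streams, i.e.
# a single copy of wgrib2 would see both (as an -rpn thickness would need).
# usage: same_stream R1 R2 N
same_stream() {
  [ "$(stream_of "$1" "$3")" = "$(stream_of "$2" "$3")" ]
}
```

If, hypothetically, the 1000 mb and 500 mb HGT fields were records 5 and 6, stream_of puts them in streams 1 and 2 of 4, so same_stream 5 6 4 fails and no single copy of wgrib2 could compute the thickness with -rpn.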

Usage

wgrib2ms N (wgrib2 subset options)
  for N > 1, execute wgrib2 (wgrib2 subset options) in N streams
  for N < -1, produce a script that runs -N streams
wgrib2mv N (wgrib2 subset options)
  for N > 1, execute wgrib2 (wgrib2 subset options) in N streams
  for N < -1, produce a script that runs -N streams
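The N < -1 form produces a script instead of running the streams. The sketch below only illustrates that idea: gen_streams_script is a hypothetical function that prints a script with the same outline as the easy-to-read version shown earlier, not the code wgrib2ms actually generates. It assumes exactly one output option, given last and without its file name, since each stream supplies its own pipe as that file name.

```sh
# Sketch only: prints a script with the same outline as the easy-to-read
# wgrib2ms example on this page. It is not the code wgrib2ms emits.
# Assumes one output option, given last WITHOUT its file name; each stream
# appends its own pipe. Options are printed unquoted.
# usage: gen_streams_script N IN_FILE OUT_FILE wgrib2-options...
gen_streams_script() (
  n=$1 infile=$2 outfile=$3
  shift 3
  pipes=""
  i=1
  while [ "$i" -le "$n" ]; do pipes="$pipes pipe$i"; i=$((i + 1)); done
  echo "mkfifo$pipes"
  i=1
  while [ "$i" -le "$n" ]; do
    # stream i handles fields i, i+N, i+2N, ... and writes to pipe i
    echo "wgrib2 -for_n $i::$n $infile $* pipe$i &"
    i=$((i + 1))
  done
  echo "gmerge $outfile$pipes"
  echo "rm$pipes"
)
```

Piping its output into sh would run the streams, assuming wgrib2 and gmerge are on the PATH.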

v1.1+
  grep ":HGT:" nam.idx | wgrib2ms 3 -i nam.grb2 -set_grib_type c3 -grib_out HGT.c3
v1.2: wgrib2m was renamed wgrib2mv, and wgrib2ms was added
  can write to stdout using the "-" filename. Note: only one output option can write to stdout

Example: wgrib2mv

A major use of wgrib2mv is to regrid a file. Suppose we want vector interpolation only for (UGRD,VGRD), and that (UGRD,VGRD), and only (UGRD,VGRD), are already stored in the same grib message. Suppose also that we want to interpolate to NCEP grid 221. Then the 1-stream version is
    wgrib2 IN.grb -new_grid_winds grid -new_grid ncep grid 221 OUT221.grb

    This will put UGRD and VGRD in separate grib messages. If we want them in
    the same grib message, then we can do

    wgrib2 IN.grb -inv /dev/null -new_grid_winds grid -new_grid ncep grid 221 - | \
      wgrib2 - -ncep_uv OUT221.grb

To parallelize the above with 8 streams, you can do

    wgrib2mv 8 IN.grb -inv /dev/null -new_grid_winds grid -new_grid ncep grid 221 OUT221.grb

wgrib2mv run M(ultiple streams) using V(ector in own grib message).

Now suppose we want to treat only (UGRD,VGRD) and (USTM,VSTM) as vectors. In addition,
we want to bilinearly interpolate all the fields except SOTYP and VGTYP, which are
to use nearest-neighbor values. For one stream,

   wgrib2 IN.grb -new_grid_winds earth -new_grid_interpolation bilinear \
      -new_grid_vectors "UGRD:VGRD:USTM:VSTM" \
      -if ":(VGTYP|SOTYP):" -new_grid_interpolation neighbor -fi \
      -new_grid ncep grid 221 - -inv /dev/null | wgrib2 - -ncep_uv OUT.grb

The 4-stream version is

   wgrib2 IN.grb -submsg_uv tmpfile
   wgrib2mv 4 tmpfile -new_grid_winds earth -new_grid_interpolation bilinear \
      -new_grid_vectors "UGRD:VGRD:USTM:VSTM" \
      -if ":(VGTYP|SOTYP):" -new_grid_interpolation neighbor -fi \
      -new_grid ncep grid 221 - -inv /dev/null | wgrib2 - -ncep_uv OUT.grb

Observations

Using CentOS 6.4 on an AMD FX-8320 (8 cores), there was little speedup with N > 4 when using 1 MB grib messages. With grib messages smaller than 64 KB (the pipe buffer size), the processing scaled better with the number of streams.

Code location: http://www.ftp.cpc.ncep.noaa.gov/wd51we/wgrib2_aux_progs/


NOAA/ National Weather Service
National Centers for Environmental Prediction
Climate Prediction Center
5830 University Research Court
College Park, Maryland 20740
Climate Prediction Center Web Team
Page last modified: Feb, 2015