Parsing Data and Header Files

Parse Lookup File

Most WXP programs find their data files by means of a name convention tag. For parsing, there is an additional method that simplifies the process. When the parse program searches for a product by its WMO header, it consults the parse lookup file "parse.lup", which cross-references WMO headers to name convention tags. This eliminates the need to specify the name convention tag with the in_file resource. The syntax of each line in the file is:

   pattern      tag

The pattern can be a regular expression. The lookup returns the tag of the first entry in the file whose pattern matches the header, and that name convention is then used to set up the file names to search for the desired product. Here is a sample parse lookup file:

   FO      mos_dat
   FT      term_dat
   W       sev_dat
   F       for_dat
   C       cli_dat
   S       sfc_dat
   U       upa_dat
   AC      sev_dat
   A       sum_dat

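In this case, all products whose WMO header starts with "W" will be searched for in the files with the tag "sev_dat". Because the first matching entry wins, more specific patterns such as "FO", "FT" and "AC" must appear before the more general "F" and "A" entries.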
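The first-match rule can be illustrated with a short sketch. It assumes that each line of the lookup file is "pattern tag" separated by whitespace and that a pattern is matched against the start of the WMO header (as Python's re.match does); the function names load_lookup and lookup_tag are hypothetical and not part of WXP.

```python
import re

# Sample table copied from the parse.lup example above.
SAMPLE_LUP = """\
FO      mos_dat
FT      term_dat
W       sev_dat
F       for_dat
C       cli_dat
S       sfc_dat
U       upa_dat
AC      sev_dat
A       sum_dat
"""

def load_lookup(text):
    """Parse lookup-file text into an ordered list of (compiled regex, tag)."""
    table = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        table.append((re.compile(fields[0]), fields[1]))
    return table

def lookup_tag(table, header):
    """Return the tag of the first pattern matching the start of header, or None."""
    for pattern, tag in table:
        if pattern.match(header):  # re.match anchors at the start of the string
            return tag
    return None

table = load_lookup(SAMPLE_LUP)
print(lookup_tag(table, "FOUS14"))  # mos_dat ("FO" precedes "F")
print(lookup_tag(table, "FPUS51"))  # for_dat
print(lookup_tag(table, "ACUS11"))  # sev_dat ("AC" precedes "A")
```

Moving the "F" line above "FO" in this table would send MOS products to for_dat, which is why entry order matters.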

Text Data

When parsing, most data is processed one line at a time, so searching a large file for a particular product can take a great deal of time. To speed this up, WXP can use header files: separate files that list each product header along with its byte offset into the ingested data file. The header file then serves as a lookup table for specific products. Using header files speeds up product searches by nearly an order of magnitude and is recommended for all data files.
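The benefit of byte offsets can be sketched as follows. This is only an illustration of the idea, not WXP's header file format: it assumes a hypothetical in-memory index (a Python dict mapping each header to an offset and length) that is filled in as products are written, and then seeks straight to one product instead of reading the data file line by line.

```python
import io

def write_with_index(products):
    """Write (header, text) products into one in-memory data file and return
    the file plus an index mapping each header to (byte offset, length).
    A later product with the same header replaces the earlier index entry."""
    data = io.BytesIO()
    index = {}
    for header, text in products:
        body = (header + "\n" + text + "\n").encode("ascii")
        index[header] = (data.tell(), len(body))
        data.write(body)
    return data, index

def read_product(data, index, header):
    """Jump directly to one product using its recorded offset and length."""
    offset, length = index[header]
    data.seek(offset)
    return data.read(length).decode("ascii")

products = [("FPUS51 KBOX", "ZONE FORECAST..."),
            ("WWUS30 KWNS", "SEVERE WEATHER WATCH...")]
data, index = write_with_index(products)
print(index["WWUS30 KWNS"][0])  # 29
print(read_product(data, index, "WWUS30 KWNS"), end="")
```

Only the requested product's bytes are read, which is also why fewer bytes cross the network when the data file lives on a remote disk.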

The LDM, like other non-WXP ingestors, does not create header files. The hdrparse program is used to post-process the ingested data and create the header files. Before running hdrparse, the name convention file must include a header file name convention for each file type:

   for_dat       %D/%y%m%d%h_for.wmo   %D/%y%m%d%h_for.hdr
   sev_dat       %D/%y%m%d%h_sev.wmo   %D/%y%m%d%h_sev.hdr
   sum_dat       %D/%y%m%d%h_sum.wmo   %D/%y%m%d%h_sum.hdr
   cli_dat       %D/%y%m%d%h_cli.wmo   %D/%y%m%d%h_cli.hdr

Each entry gives the tag, the name convention for the ingested data file and a second name convention for the corresponding header file. If the second name convention is omitted, WXP assumes that no header files exist for that file type. Once the name convention file is set up to use header files, run hdrparse on the desired file type:
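The optional third column can be illustrated with a small parser. This is a sketch under stated assumptions: entries are whitespace-separated, and the "upa_dat" line below is a hypothetical entry added only to show the case where the header file convention is omitted. The function names are also hypothetical, and the % codes are left unexpanded.

```python
# Two entries from the name convention file above, plus one hypothetical
# "upa_dat" entry that omits the header file convention.
SAMPLE = """\
for_dat       %D/%y%m%d%h_for.wmo   %D/%y%m%d%h_for.hdr
sev_dat       %D/%y%m%d%h_sev.wmo   %D/%y%m%d%h_sev.hdr
upa_dat       %D/%y%m%d%h_upa.wmo
"""

def parse_name_conventions(text):
    """Map each tag to (data file convention, header file convention or None)."""
    conventions = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or incomplete lines
        header_conv = fields[2] if len(fields) > 2 else None
        conventions[fields[0]] = (fields[1], header_conv)
    return conventions

def has_header_files(conventions, tag):
    """An omitted second convention means no header files for that type."""
    return conventions[tag][1] is not None

conv = parse_name_conventions(SAMPLE)
print(has_header_files(conv, "for_dat"))  # True
print(has_header_files(conv, "upa_dat"))  # False
```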

   hdrparse -cu=la -if=for_dat

This generates the header file, and the header listing is also printed to the screen. The num_hour resource can be used to create header files covering several hours.

Once the header files exist, programs such as parse, forecast and fouswx run faster. Network traffic is also reduced when the data files are accessed over a network, since seeking directly into a large file transfers far less data than a line-by-line search through it.

GRIB Data

As with text data, parsing large GRIB files can be made easier with header files. Again, header file name conventions are specified in the name convention file:

   grib_eta      %D/%y%m%d%12h_eta.grb    %D/%y%m%d%12h_eta.hdr
   grib_ngm      %D/%y%m%d%12h_ngm.grb    %D/%y%m%d%12h_ngm.hdr
   grib_ruc      %D/%y%m%d%3h_ruc.grb     %D/%y%m%d%3h_ruc.hdr

These header files are produced automatically by the WXP ingestor. Again, the LDM does not generate them, so the griblook program must be run to create them:

   griblook -cu=la -mo=eta -ou=hdrfile -pf=app

This will update the current header file for the ETA model grids.
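The layout of griblook's header file is not described here, so the following is only a sketch of the underlying idea: locate each GRIB message in a file and record its byte offset and length. It relies on the standard GRIB edition 1 framing (the four bytes "GRIB", a 3-byte big-endian total message length in octets 5-7, the edition number in octet 8, and a trailing "7777"). The function names and the (offset, length) index are hypothetical, and the test messages carry only realistic framing, not decodable grids.

```python
def index_grib1(data):
    """Return (offset, length) for each GRIB edition 1 message in a byte buffer."""
    index = []
    pos = data.find(b"GRIB")
    while pos != -1 and pos + 8 <= len(data):
        length = int.from_bytes(data[pos + 4:pos + 7], "big")  # octets 5-7
        edition = data[pos + 7]                                 # octet 8
        end = pos + length
        if edition != 1 or length < 12 or end > len(data) or data[end - 4:end] != b"7777":
            # Not a complete GRIB1 message here; resume the search one byte later.
            pos = data.find(b"GRIB", pos + 1)
            continue
        index.append((pos, length))
        pos = data.find(b"GRIB", end)
    return index

def fake_grib1(payload):
    """Build a stand-in message: Section 0, payload, then the "7777" end marker."""
    total = 8 + len(payload) + 4
    return b"GRIB" + total.to_bytes(3, "big") + bytes([1]) + payload + b"7777"

buf = b"junk" + fake_grib1(b"a" * 10) + fake_grib1(b"b" * 20)
print(index_grib1(buf))  # [(4, 22), (26, 32)]
```

With such an index, a program needing one grid can seek to its offset and read only that many bytes.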

Automating Header File Generation

For header files to be useful, they must be regenerated continually as the data arrives. In practice, this means running a script from cron once a minute. Here is a sample script:

#! /bin/csh -f
#  hdrcreate: Creates a set of header files for various non-WXP ingested files
setenv wxpdefault /home/wxp/etc

# Update the GRIB header files for each model
foreach model ( eta ngm ruc \
   avn_n0e avn_n1e avn_n0w avn_n1w \
   avn_s0e avn_s1e avn_s0w avn_s1w \
   mrf_nh mrf_us mrf_ak mrf_hi mrf_pr \
   mrf_nhem mrf_shem )
   /home/wxp/bin/griblook -cu=la -mo=$model -ou=hdrfile -pf=app -me=none
end

# Update the text header files for each data type
foreach type ( for_dat sev_dat cli_dat sum_dat mos_dat )
   /home/wxp/bin/hdrparse -cu=la -if=$type -pf=app -me=none
end

Both programs are run in append mode (-pf=app) so that they only parse data that has arrived since their previous run. Each program reads the existing header file to find the location of the last product it indexed, resumes parsing from that point, and appends entries for the new products to the end of the header file. This reduces the script's execution time considerably.
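The append-mode strategy can be sketched as follows. This is a simplified illustration, not the hdrparse implementation: it assumes a hypothetical data layout in which each product starts with a line made of an SOH (Ctrl-A) byte followed by the WMO header, and it keeps the index as a Python list of (header, offset) pairs. The function names are hypothetical.

```python
import io

SOH = b"\x01"  # assumption: each product starts with a line "SOH + header"

def scan_products(f, start):
    """Scan f from byte offset start and return (header, offset) pairs."""
    f.seek(start)
    found = []
    offset = start
    for line in f:
        if line.startswith(SOH):
            found.append((line[1:].strip().decode("ascii"), offset))
        offset += len(line)
    return found

def update_index(f, index):
    """Append mode: resume at the last indexed product rather than at byte 0."""
    if not index:
        new = scan_products(f, 0)
    else:
        # The first product found is the last one already indexed; drop it.
        new = scan_products(f, index[-1][1])[1:]
    index.extend(new)
    return new

data = io.BytesIO()
data.write(b"\x01FPUS51 KBOX\nzone forecast\n")
index = []
update_index(data, index)                      # first run indexes one product
data.seek(0, io.SEEK_END)
data.write(b"\x01WWUS30 KWNS\nwatch text\n")   # new data arrives
print(update_index(data, index))  # [('WWUS30 KWNS', 27)]
print(index)  # [('FPUS51 KBOX', 0), ('WWUS30 KWNS', 27)]
```

Each run rereads at most the last known product plus whatever has arrived since, so the cost per run stays roughly constant as the data file grows.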

Last updated July 21, 1998