Data Ingest: Header Files

To aid in parsing products from the various feeds, the ingest program can create a header file. The header file lists the header of each product in the data file along with its byte offset into that file. Since most parsing is based on the header, it is far easier to search the smaller header file than to parse through the much larger product file. Using header files speeds up data parsing by nearly an order of magnitude, especially if the file does not reside on a local hard drive.

For GRIB data, header files provide a simple means to pull specific grids from a large GRIB file. These files can be larger than 10MB, so fully parsing one to find a single grid can be time consuming. With a header file, a program can seek directly to the grid location and read 2-5KB of data rather than the entire file. This not only speeds up data processing but also reduces network traffic.

Header File Syntax

The syntax of the file is as follows:

offset header / extra
offset header / extra
....

where:

offset -- is the byte offset into the file,
header -- is the product header in its entirety,
extra -- is extra information about the product, normally the AWIPS header.
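
As an illustration of this syntax, here is a minimal Python sketch that splits one header file line into its three fields. It is not part of WXP; the names HeaderEntry and parse_header_line are hypothetical, and the sketch assumes the product header itself never contains a slash.

```python
from typing import NamedTuple, Optional


class HeaderEntry(NamedTuple):
    offset: int   # byte offset of the product in the data file
    header: str   # product header, e.g. "FPUS86 KPQR 282359"
    extra: str    # text after the slash; empty if there is none


def parse_header_line(line: str) -> Optional[HeaderEntry]:
    """Split one 'offset header / extra' line.

    Returns None for blank lines or lines that do not start with a number.
    Assumption: the header contains no '/', so the first slash is the separator.
    """
    parts = line.strip().split(None, 1)
    if len(parts) < 2 or not parts[0].isdigit():
        return None
    header, _, extra = parts[1].partition("/")
    return HeaderEntry(int(parts[0]), header.strip(), extra.strip())
```

For example, parse_header_line("   3264 FPUS85 KGGW 290001 / OPUGGW") yields offset 3264, header "FPUS85 KGGW 290001" and extra "OPUGGW".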

A sample from a forecast data header file:

      0 FPUS86 KPQR 282359 / OPUPDX
   3264 FPUS85 KGGW 290001 / OPUGGW
   3548 FPAK11 PAYA 282207 / &ZCZC JNULFPYAK
   4190 FPUS73 KFGF 282359 / NOWFAR
   6865 FPAK57 PAJK 290001 CCA / &ZCZC JNUZFPAK
   9613 FPUS73 KLBF 290002 / NOWLBF
  10092 FPUS73 KGRB 290001 / NOWGRB
  10588 FPAK11 PAFA 290005&ZCZC FAILFPFAI / AKZ007-290530-
  11366 FCCN51 CWAO 290001 AAA / TAF AMD CYUY 290001Z
  11592 FPAK57 PAJK 290001 CCA / &ZCZC JNUZFPAK
  14342 FPAK11 PAFA 290005 / &ZCZC FAILFPFAI
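
The following Python sketch shows how a program might use such a file to pull a single product out of the data file without scanning it. It is only an illustration, not WXP code: the function name read_product is hypothetical, and it assumes that each product ends where the next listed product begins.

```python
def read_product(data_path, hdr_path, wanted):
    """Return the bytes of the first product whose header starts with `wanted`.

    Returns None if no header matches.
    """
    offsets = []   # every offset listed in the header file
    start = None   # offset of the first matching product
    with open(hdr_path, encoding="ascii", errors="replace") as hdr:
        for line in hdr:
            fields = line.split(None, 1)
            if len(fields) < 2 or not fields[0].isdigit():
                continue   # skip blank or malformed lines
            offset = int(fields[0])
            offsets.append(offset)
            if start is None and fields[1].startswith(wanted):
                start = offset
    if start is None:
        return None
    # Assumption: the product ends where the next listed product begins.
    later = [o for o in offsets if o > start]
    with open(data_path, "rb") as data:
        data.seek(start)   # jump straight to the product
        return data.read(min(later) - start) if later else data.read()
```

With the sample above, a call such as read_product("for.wmo", "for.hdr", "FPUS73 KGRB") (file names hypothetical) would seek to byte 10092 and read 496 bytes, up to the next entry at 10588.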

If the header file is from a GRIB product, the decoded GRIB information is listed after the slash: model number, grid number, forecast time, level type, level number and variable/parameter number. To decode these numbers, see the Appendix (WXP Product Descriptions). Since GRIB headers are not unique, this information is needed to uniquely describe the contents of the product.

3087563 YORB10 KWBE 131500 /  85 212       6 100     100  39
3102432 ZORE10 KWBE 131500 /  85 212       9 100     100  39
3117301 YCUA99 KWBE 131500 PAA /  85 215       0 100    1000  41
3230510 YCUA85 KWBE 131500 PAA /  85 215       0 100     850  41
3336000 YCUA70 KWBE 131500 PAA /  85 215       0 100     700  41
3352511 YCUA50 KWBE 131500 PAA /  85 215       0 100     500  41
3368533 YCUA25 KWBE 131500 PAA /  85 215       0 100     250  41
3474022 YTUA98 KWBE 131500 PAA /  85 215       0 105       2  11
3487852 YUUA98 KWBE 131500 PAA /  85 215       0 105      10  33
3583433 YVUA98 KWBE 131500 PAA /  85 215       0 105      10  34
3599668 YRUA98 KWBE 131500 PAA /  85 215       0 105       2  52
3635495 YPUA98 KWBE 131500 PAA /  85 215       0   1       0   1
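
As a rough illustration, the six decoded numbers can be split out and used to select grids. The Python sketch below is not part of WXP; GribInfo, parse_grib_extra and find_grids are hypothetical names, and the field order is taken from the description above.

```python
from typing import NamedTuple


class GribInfo(NamedTuple):
    model: int
    grid: int
    forecast_time: int
    level_type: int
    level: int
    parameter: int


def parse_grib_extra(line):
    """Decode the six numbers that follow the slash in a GRIB header line."""
    _, _, extra = line.partition("/")
    values = extra.split()
    if len(values) != 6:
        raise ValueError("expected 6 GRIB fields, got %d" % len(values))
    return GribInfo(*map(int, values))


def find_grids(lines, **wanted):
    """Yield (offset, GribInfo) for entries whose fields all match `wanted`."""
    for line in lines:
        if "/" not in line:
            continue   # not a header entry with decoded GRIB information
        info = parse_grib_extra(line)
        if all(getattr(info, name) == value for name, value in wanted.items()):
            yield int(line.split()[0]), info
```

Applied to the sample above, find_grids(lines, level=500, parameter=41) yields the single entry at offset 3352511.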

Creation of Header Files

Header files are created automatically by the WXP ingestor if a header file name convention is added to the entry in the bulletin file. For example, for forecast data:

F[^O]             >>    %D/%y%m%d%6h_for.wmo  %D/%y%m%d%6h_for.hdr

The first name convention listed, "%D/%y%m%d%6h_for.wmo", is the filename where the actual product is saved. The second name convention, "%D/%y%m%d%6h_for.hdr", is where the header file information is saved. It is recommended that header files be created for any product that requires parsing. This includes:

Any text files such as forecasts, climatic data, MOS data, advisories, warnings and summaries.
Any GRIB data such as any files from the HRS feed.

The sample ingest bulletin file also lists those products that need header files.

Header Files and the LDM

The LDM cannot generate header files, so WXP provides two programs that generate header files for LDM data:

hdrparse - for text based header files
griblook - for GRIB based header files.

For header files to work, they must be continually generated as the data arrives. This means that a script should be run in cron once a minute to generate these files. Here is a sample script:

#! /bin/csh -f

setenv wxpdefault /usr/wxp/etc

foreach model ( eta ngm ruc \
   avn_n0e avn_n1e avn_n0w avn_n1w \
   avn_s0e avn_s1e avn_s0w avn_s1w \
   mrf_nh mrf_us mrf_ak mrf_hi mrf_pr \
   mrf_nhem mrf_shem )
   /usr/wxp/bin/griblook -cu=la -mo=$model -ou=hdrfile -pf=app -me=none
end

foreach type ( for_dat sev_dat cli_dat sum_dat mos_dat )
   /usr/wxp/bin/hdrparse -cu=la -if=$type -pf=app -me=none
end

Both programs are run in append mode so that they only parse what has come in since the last run. Each program uses the existing header file to find the location of the last product seen on its previous run. It then starts parsing from that point and appends the new products to the end of the existing header file. This reduces the execution time of the script considerably.
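
The append-mode idea can be sketched in Python as follows. This is a simplified illustration of the approach described above, not the actual logic of hdrparse or griblook. The names last_indexed_offset and append_new_entries are hypothetical, and scan stands for a caller-supplied routine that knows how to find product headers in the data file.

```python
import os


def last_indexed_offset(hdr_path):
    """Return the offset on the last valid line of the header file, or None."""
    last = None
    if os.path.exists(hdr_path):
        with open(hdr_path, encoding="ascii", errors="replace") as hdr:
            for line in hdr:
                fields = line.split(None, 1)
                if fields and fields[0].isdigit():
                    last = int(fields[0])
    return last


def append_new_entries(data_path, hdr_path, scan):
    """Index only the products that arrived after the last indexed one.

    `scan(data_file, start)` is a hypothetical caller-supplied function that
    yields (offset, header_text) for each product found from byte `start` on.
    Returns the number of entries appended.
    """
    last = last_indexed_offset(hdr_path)
    start = 0 if last is None else last   # resume at the last known product
    added = 0
    with open(data_path, "rb") as data, open(hdr_path, "a") as hdr:
        for offset, header in scan(data, start):
            if last is not None and offset <= last:
                continue   # already listed by a previous run
            # Offsets are right-aligned in 7 columns, as in the samples above.
            hdr.write(f"{offset:7d} {header}\n")
            added += 1
    return added
```

Because the scan resumes at the last listed offset, the cost of each run depends on how much data has arrived since the previous run rather than on the total size of the file.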

For further information about WXP, email devo@ks.unisys.com
Last updated by Dan Vietor on July 21, 1998
