Data Ingest: Header Files
To aid in the parsing of products from the various feeds, the ingest program can create a header file. This file lists the header of each product in the product file along with its byte offset into that file. Since most parsing is based on the header, it is far easier to search the smaller header file than to parse through the much larger product file. The use of header files improves the speed of data parsing by nearly an order of magnitude, especially if the file does not reside on a local hard drive. For GRIB data, header files provide a simple means to pull specific grids from a large GRIB file. These files can be larger than 10 MB, so a full parse for a single grid can be time consuming. With a header file, a program can seek directly to the grid location and read 2-5 KB of data rather than the entire file. This not only speeds up data processing but also reduces network traffic.
Header File Syntax
The syntax of the file is as follows:
offset header / extra
offset header / extra
....
where:
- offset -- is the byte offset of the product into the file,
- header -- is the product header in its entirety, listed after the offset,
- extra -- is extra information about the product, normally the AWIPS header.
A sample from a forecast data header file:
0 FPUS86 KPQR 282359 / OPUPDX
3264 FPUS85 KGGW 290001 / OPUGGW
3548 FPAK11 PAYA 282207 / &ZCZC JNULFPYAK
4190 FPUS73 KFGF 282359 / NOWFAR
6865 FPAK57 PAJK 290001 CCA / &ZCZC JNUZFPAK
9613 FPUS73 KLBF 290002 / NOWLBF
10092 FPUS73 KGRB 290001 / NOWGRB
10588 FPAK11 PAFA 290005&ZCZC FAILFPFAI / AKZ007-290530-
11366 FCCN51 CWAO 290001 AAA / TAF AMD CYUY 290001Z
11592 FPAK57 PAJK 290001 CCA / &ZCZC JNUZFPAK
14342 FPAK11 PAFA 290005 / &ZCZC FAILFPFAI
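As an illustration of why the header file is cheap to search, the following is a minimal Python sketch of a reader for the syntax above. It assumes one entry per line and a " / " separator between the header and the extra field; the names HeaderEntry and parse_header_line are hypothetical and are not part of WXP.

```python
from typing import NamedTuple, Optional


class HeaderEntry(NamedTuple):
    """One header file entry (hypothetical helper type, not a WXP structure)."""
    offset: int   # byte offset of the product in the product file
    header: str   # product header in its entirety
    extra: str    # extra information, normally the AWIPS header


def parse_header_line(line: str) -> Optional[HeaderEntry]:
    """Parse an 'offset header / extra' line; return None if it does not fit."""
    line = line.strip()
    offset_text, _, rest = line.partition(" ")
    if not offset_text.isdigit():
        return None
    # Assumption: the first " / " separates the header from the extra field.
    header, _, extra = rest.partition(" / ")
    return HeaderEntry(int(offset_text), header.strip(), extra.strip())


# Entries copied from the forecast sample above.
sample = """\
0 FPUS86 KPQR 282359 / OPUPDX
3264 FPUS85 KGGW 290001 / OPUGGW
6865 FPAK57 PAJK 290001 CCA / &ZCZC JNUZFPAK
"""

entries = [e for e in map(parse_header_line, sample.splitlines()) if e]

# Search the small header list instead of the product file itself.
matches = [e for e in entries if e.header.startswith("FPUS85")]
```

A program would then open the product file and seek to matches[0].offset to read the product.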
If the header file is from a GRIB product, the extra field lists the decoded GRIB information: model number, grid number, forecast time, level type, level number and variable/parameter number. To decode these numbers, see the Appendix (WXP Product Descriptions). Since GRIB headers are not unique, this information is needed to uniquely describe the contents of the product.
3087563 YORB10 KWBE 131500 / 85 212 6 100 100 39
3102432 ZORE10 KWBE 131500 / 85 212 9 100 100 39
3117301 YCUA99 KWBE 131500 PAA / 85 215 0 100 1000 41
3230510 YCUA85 KWBE 131500 PAA / 85 215 0 100 850 41
3336000 YCUA70 KWBE 131500 PAA / 85 215 0 100 700 41
3352511 YCUA50 KWBE 131500 PAA / 85 215 0 100 500 41
3368533 YCUA25 KWBE 131500 PAA / 85 215 0 100 250 41
3474022 YTUA98 KWBE 131500 PAA / 85 215 0 105 2 11
3487852 YUUA98 KWBE 131500 PAA / 85 215 0 105 10 33
3583433 YVUA98 KWBE 131500 PAA / 85 215 0 105 10 34
3599668 YRUA98 KWBE 131500 PAA / 85 215 0 105 2 52
3635495 YPUA98 KWBE 131500 PAA / 85 215 0 1 0 1
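The lookup described above, finding one grid by its decoded numbers and seeking straight to it, might be sketched in Python as follows. The sketch assumes one entry per line, six integer fields after the " / " in the order given above, and that a grid occupies the bytes up to the next entry's offset (the header file itself records no length). GribEntry, parse_grib_line, find_grid and read_grid are hypothetical names, not WXP functions.

```python
from typing import BinaryIO, NamedTuple, Optional, Sequence


class GribEntry(NamedTuple):
    """One GRIB header file entry (hypothetical helper type)."""
    offset: int
    header: str
    model: int
    grid: int
    forecast_time: int
    level_type: int
    level: int
    variable: int


def parse_grib_line(line: str) -> Optional[GribEntry]:
    """Parse 'offset header / model grid ftime ltype level var'."""
    left, sep, extra = line.strip().partition(" / ")
    offset_text, _, header = left.partition(" ")
    fields = extra.split()
    if not sep or not offset_text.isdigit() or len(fields) != 6:
        return None
    try:
        numbers = [int(f) for f in fields]
    except ValueError:
        return None
    return GribEntry(int(offset_text), header, *numbers)


def find_grid(entries: Sequence[GribEntry], **criteria: int) -> Optional[int]:
    """Return the index of the first entry matching all given field values."""
    for index, entry in enumerate(entries):
        if all(getattr(entry, name) == value for name, value in criteria.items()):
            return index
    return None


def read_grid(product_file: BinaryIO, entries: Sequence[GribEntry], index: int) -> bytes:
    """Seek to the grid and read it without scanning the rest of the file.

    Assumption: the grid ends where the next indexed product begins; the
    last entry is read through to the end of the file.
    """
    start = entries[index].offset
    product_file.seek(start)
    if index + 1 < len(entries):
        return product_file.read(entries[index + 1].offset - start)
    return product_file.read()
```

For example, find_grid(entries, grid=215, level_type=100, level=500, variable=41) would select the YCUA50 entry in the sample above, and read_grid would then read only that grid's bytes from the file.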
Creation of Header Files
Header files are created automatically by the WXP ingestor if a header file name convention is added to the product's entry in the bulletin file. For example, for forecast data:
F[^O] >> %D/%y%m%d%6h_for.wmo %D/%y%m%d%6h_for.hdr
The first name convention listed, "%D/%y%m%d%6h_for.wmo", is the filename where the actual product is saved. The second name convention, "%D/%y%m%d%6h_for.hdr", is where the header file information is saved. It is recommended that header files be created for any product that requires parsing. Here is a list:
- Any text files such as forecasts, climatic data, MOS data, advisories, warnings and summaries.
- Any GRIB data such as any files from the HRS feed.
The sample ingest bulletin file also lists those products that need header files.
Header Files and the LDM
The LDM cannot generate header files, so WXP provides two programs that generate header files for LDM data:
- hdrparse - for text based header files
- griblook - for GRIB based header files.
For header files to work, they must be continually generated as the data arrives. This means that a script should be run in cron once a minute to generate these files. Here is a sample script:
#! /bin/csh -f
setenv wxpdefault /usr/wxp/etc
foreach model ( eta ngm ruc \
                avn_n0e avn_n1e avn_n0w avn_n1w \
                avn_s0e avn_s1e avn_s0w avn_s1w \
                mrf_nh mrf_us mrf_ak mrf_hi mrf_pr \
                mrf_nhem mrf_shem )
   /usr/wxp/bin/griblook -cu=la -mo=$model -ou=hdrfile -pf=app -me=none
end
foreach type ( for_dat sev_dat cli_dat sum_dat mos_dat )
   /usr/wxp/bin/hdrparse -cu=la -if=$type -pf=app -me=none
end
Both programs are run in append mode so that they parse only what has arrived since the last run. Each program uses the existing header file to find the location of the last product seen during the previous run. It then starts parsing from that point and appends the new products to the end of the existing header file. This reduces the execution time of the script considerably.
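The append strategy can be illustrated with a short Python sketch. This is not how hdrparse or griblook are implemented; it only shows the idea of resuming from the last recorded offset. It assumes text products whose headers look like WMO abbreviated headings (for example "FPUS86 KPQR 282359", optionally followed by a three-letter group such as "CCA"), and for brevity it omits the extra field. The names WMO_HEADING, last_indexed_offset and append_new_headers are hypothetical.

```python
import re
from pathlib import Path

# Assumption: a header is a line of the form "TTAAii CCCC DDHHMM [BBB]".
WMO_HEADING = re.compile(
    rb"^[A-Z]{4}[0-9]{2} [A-Z]{4} [0-9]{6}(?: [A-Z]{3})?[ \t\r]*$",
    re.MULTILINE,
)


def last_indexed_offset(header_path: Path) -> int:
    """Return the offset on the last valid header file line, or -1 if none."""
    last = -1
    if header_path.exists():
        with header_path.open("r") as header_file:
            for line in header_file:
                parts = line.split()
                if parts and parts[0].isdigit():
                    last = int(parts[0])
    return last


def append_new_headers(product_path: Path, header_path: Path) -> int:
    """Index only products that arrived after the last one already recorded."""
    last = last_indexed_offset(header_path)
    start = max(last, 0)
    with product_path.open("rb") as product_file:
        product_file.seek(start)          # skip everything already parsed
        chunk = product_file.read()
    added = 0
    with header_path.open("a") as header_file:
        for match in WMO_HEADING.finditer(chunk):
            offset = start + match.start()
            if offset <= last:            # the product we resumed from
                continue
            heading = match.group(0).decode("ascii").strip()
            header_file.write(f"{offset} {heading}\n")
            added += 1
    return added
```

Run repeatedly, for instance once a minute, each call reads only the tail of the product file, so its cost grows with the amount of new data rather than with the total file size.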
For further information about WXP, email devo@ks.unisys.com
Last updated by Dan Vietor on July 21, 1998