WXP version 5
User's Guide

Data Ingest: Ingest Program

These data feeds need to be read in by the computer and saved in a fashion compatible with the analysis package.  This process is called "data ingest".  WXP can process data from two ingest programs.  The first is the Local Data Manager (LDM) provided to universities by Unidata.  The second is the WXP ingest program.

LDM (Local Data Manager)

For Unidata/university sites, the software package called the LDM is available to read in, select, and process information from the various data feeds.  With the proper setup of the "pqact.conf" file, the LDM can be configured so that WXP can process its output.  The LDM is discussed in the Installation Guide.

WXP Ingestor

The WXP ingestor program, ingest, is set up to read in and process each of the Family of Services (FOS) or NOAAPORT feeds.  Because several megabytes of data are broadcast on each of these feeds each day, the ingest program must offer a means to select products (or discard unneeded ones) and file them in a way that makes it easier for programs to search for appropriate data.

The ingest program can receive data from four sources:

  1. File -- this is a file of raw ingested data from FOS or NOAAPORT.  This can be fed through the ingest program for additional product selection and management.  To specify a file, list the filename on the command line to the ingest program.
  2. Serial Port -- this is a standard RS-232 type serial port that is configured for baud rate and parity.  WXP has several presets for various FOS feeds such as domestic data and public products.  Otherwise, the port parameters are set with the in_file resource.  To specify a serial port, list the device driver (/dev/ttya) or port (COM1 for Windows).
  3. Named Pipe (FIFO) -- this is a named pipe (Unix only).  This is a file on disk that acts as a queue where one process can write data to the pipe and the ingest program can read that data from the pipe.  This is handy for interfacing the WXP ingestor with non-WXP ingestors.  To specify a named pipe, list the filename of the named pipe.  WXP will determine if it is a named pipe or a file.
  4. Socket -- a socket is a network connection that acts like a queue.  One program feeds data to a socket while the WXP ingestor reads data from the socket and processes it.  WXP uses a TCP/STREAM socket to preserve data integrity.  The WXP ingestor acts as the socket server and binds itself to the socket.  To specify a socket, use the keyword "sock:port" with the port address.  A port address in the range of 5000 is recommended, to avoid conflicts with other TCP/IP applications.  The other application, which acts as a client, must know the IP address of the machine the WXP ingestor is running on and the port number to which it is bound.
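
The client side of the socket arrangement described in item 4 might look like the following Python sketch.  It is only an illustration, not part of WXP: the function name feed_ingestor, the timeout and the chunking are assumptions, and it presumes an ingestor is already listening on the given port.

```python
import socket


def feed_ingestor(host, port, chunks):
    """Connect as a TCP client and stream raw feed data to a listening server.

    host/port identify the machine and port the ingestor is bound to.
    chunks is any iterable of bytes objects. Returns the number of bytes sent.
    (Hypothetical helper: the name and signature are not part of WXP.)
    """
    sent = 0
    # create_connection opens a TCP (stream) socket, matching the
    # TCP/STREAM socket the ingestor is described as using.
    with socket.create_connection((host, port), timeout=10) as conn:
        for chunk in chunks:
            conn.sendall(chunk)  # sendall retries until the whole chunk is written
            sent += len(chunk)
    return sent
```

A non-WXP ingestor could call feed_ingestor with the ingestor machine's IP address, the port given after "sock:", and the raw data it has read from its own feed.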

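The ingest program uses a pattern matching scheme to select products.  Each pattern has an associated action to be performed on the matched product.  These actions include appending or writing the product to a file, piping it to a command, and running a command once the product is complete.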

Bulletin File

The ingest program uses a setup file that lets the user specify which products to process.  This setup file is called a bulletin file.  The bulletin filename is specified with the bull_file resource.  The bulletin file contains a list of headers, actions and commands to be performed:

header [action] [command/filename...] [header file]
header [action] [command/filename...] [header file]
...

Header

The header can specify the exact header or a pattern to which headers can be matched. The headers listed in the file can use the following wildcard characters:

. or ?            match a single character
- or *            match any sequence of characters
[letters]         match a character from the set
[^letters]        match any character except those from the set
(str1|str2...)    match one of the listed strings
_                 underscore matches a space
/data             match extra information

Some example header strings are:

AB                Anything that starts with AB
S[AP]             SA or SP
(W|AC|RG)         Starts with W or AC or RG
F[^O]             Anything that starts with F, second character NOT O
FQUS1_KIND        Full header specification with spaces as underscores
*_KIND            Wildcard match on any product that ends with KIND
/SFP              Matches the AWIP header SFP
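
The wildcard rules above can be approximated with regular expressions.  The Python sketch below is an illustration only, not the ingestor's actual matcher.  It assumes that a pattern is matched against the start of the header, that "-" and "*" both match any run of characters, and it ignores the "/data" extra information field.  The function names are hypothetical.

```python
import re


def header_pattern_to_regex(pattern):
    """Translate a bulletin-file header pattern into a compiled regex.

    Assumptions (not confirmed by the guide): prefix matching, and
    '-' or '*' matching any run of characters.
    """
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch in ".?":
            out.append(".")            # any single character
        elif ch in "-*":
            out.append(".*")           # any run of characters
        elif ch == "_":
            out.append(" ")            # underscore stands for a space
        elif ch == "[":
            end = pattern.index("]", i)
            out.append(pattern[i:end + 1])  # sets use the same syntax in regex
            i = end
        elif ch in "(|)":
            out.append(ch)             # alternation passes through unchanged
        else:
            out.append(re.escape(ch))  # literal character
        i += 1
    return re.compile("".join(out))


def header_matches(pattern, header):
    """True if the header begins with text matching the pattern."""
    return header_pattern_to_regex(pattern).match(header) is not None
```

For instance, header_matches("F[^O]", "FPUS5 KIND 281512") accepts the forecast product, while the same pattern rejects a header beginning with FO.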

When the product is GRIB, the header is parsed for specific product parameters. This information can then be used to select the product. The syntax for this selection is:

/[Xvvv][Xvvv][Xvvv]...

Where X is a letter identifying the parameter (for example, M for the model) and vvv is its value.

The values for each parameter are listed in the WXP Product Description Appendix. Using the internal GRIB parameters is more reliable than selecting by the WMO header because more than one product may have the same header:

HVAC98 KWBC 070000 from Sea Wave model
HVAC99 KWBC 070000 from Aviation model

To separate the two products, use the model specifications: /M77 for the Aviation model and /M10 for the Sea Wave model.
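
As a rough illustration of how such a selector could be taken apart, the Python sketch below splits a pattern like "Y/M39G211" into its header part and its parameter values.  It assumes each parameter is a single letter followed by a decimal number.  The function names, and the dictionary used to stand for a product's decoded GRIB parameters, are hypothetical.

```python
import re


def parse_grib_selector(pattern):
    """Split 'Y/M39G211' into ('Y', {'M': 39, 'G': 211}).

    Assumes every parameter is one letter followed by digits.
    """
    header, _, selector = pattern.partition("/")
    params = {key: int(value)
              for key, value in re.findall(r"([A-Za-z])(\d+)", selector)}
    return header, params


def grib_matches(params, product):
    """True if every selected parameter equals the product's decoded value.

    product is a hypothetical dict of decoded GRIB parameters, e.g. {'M': 77}.
    """
    return all(product.get(key) == value for key, value in params.items())
```

With this sketch, the selector /M77 would accept only a product whose decoded model value is 77, regardless of its WMO header.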

Actions

The actions are:

>>        append to file with header
append    same as above
>         write to file with header, previous contents overwritten
write     same as above
#         write to file without header, previous contents overwritten
file      same as above
|         pipe product to listed command
pipe      same as above
@         run command when product complete
run       same as above

Also, the action can be prepended by a set of flags:

Command or Filename

This field is generally the file in which to save the output, or the command to run for the pipe and run actions.  It can contain several escape sequences (wildcards):

Examples based on system time 1455Z Jan 12, 1997,
product header FPUS5 KIND 281512

Wildcard  Explanation                            Example
@tag      Name convention tag
%Y        current system year                    1997
%y        current system year (last 2 digits)    97
%b        current system month (3 letters)       jan
%m        current system month                   01
%d        current system day                     12
%j        current system Julian day              12
%h        current system hour                    14
%n        current system minute                  55
%pd       product day                            28
%ph       product hour                           15
%pn       product minute                         12
%T        product type                           FPUS5
%t        product type (lower case)              fpus5
%L        product locale                         KIND
%l        product locale (lower case)            kind
%D        data_path resource
%C        con_path resource
%R        raw_path resource
%G        grid_path resource
%W        watch_path resource
%I        image_path resource
%F        file_path resource

Some of the above wildcards can be preceded with a number.  For dates, the number is a modifier that rounds down to the nearest value that is a multiple of that number.  For example, "%6h" would round down to the nearest 6 hour boundary.  For the previous example, it results in the value 12.  

For the product type and locale, this number is used in a substring operation.  The first digit of the number is the offset into the string and the second digit refers to the number of characters to use.  For example, "%12T" results in "FP".  To get "IND", use "%23L".
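
The expansion rules can be made concrete with a small Python sketch.  It is an approximation written for this guide, not the ingestor's code.  It handles only the date, product time, type and locale wildcards, leaves anything else (such as %D) untouched, and the function name expand_name is hypothetical.

```python
import re
from datetime import datetime

MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]

# Optional numeric modifier, then a code. The product codes (pd, ph, pn)
# come first in the alternation so "%ph" is read as one wildcard.
TOKEN = re.compile(r"%(\d*)(p[dhn]|[YybmdhnTtLl])")


def expand_name(convention, now, header):
    """Expand a subset of the name convention wildcards.

    now is the system time; header is 'TYPE LOCALE DDHHMM'.
    """
    ptype, locale, stamp = header.split()
    date_values = {
        "m": now.month, "d": now.day, "h": now.hour, "n": now.minute,
        "pd": int(stamp[0:2]), "ph": int(stamp[2:4]), "pn": int(stamp[4:6]),
    }
    text_values = {"T": ptype, "t": ptype.lower(),
                   "L": locale, "l": locale.lower()}

    def replace(match):
        modifier, code = match.group(1), match.group(2)
        if code == "Y":
            return f"{now.year:04d}"
        if code == "y":
            return f"{now.year % 100:02d}"
        if code == "b":
            return MONTHS[now.month - 1]
        if code in date_values:
            value = date_values[code]
            if modifier and int(modifier) > 0:
                value -= value % int(modifier)  # round down to a multiple
            return f"{value:02d}"
        text = text_values[code]
        if len(modifier) == 2:
            start = int(modifier[0]) - 1        # first digit: 1-based offset
            text = text[start:start + int(modifier[1])]  # second digit: length
        return text

    return TOKEN.sub(replace, convention)
```

Using the example time and header from the table, expand_name("%y%m%d%6h_for.wmo", datetime(1997, 1, 12, 14, 55), "FPUS5 KIND 281512") gives "97011212_for.wmo".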

Header Files

To aid in the parsing of products from the various feeds, a header file can be created by the ingest program.  This essentially lists the header of each product in the file along with its byte offset into the file.  Since most parsing is based on header, it is far easier to search the smaller header file than to parse through the much larger product file.

To have the ingestor produce these files automatically, add the header file name convention to the end of the line in the bulletin file:

F[^O]             >>    %D/%y%m%d%6h_for.wmo  %D/%y%m%d%6h_for.hdr

The first name convention listed, "%D/%y%m%d%6h_for.wmo", is the filename where the actual product is saved.  The second name convention, "%D/%y%m%d%6h_for.hdr", is where the header file information is saved.  The syntax of the header file is as follows:

offset header / extra
offset header / extra
....

where:

A sample from a forecast data header file:

      0 FPUS86 KPQR 282359 / OPUPDX
   3264 FPUS85 KGGW 290001 / OPUGGW
   3548 FPAK11 PAYA 282207 / &ZCZC JNULFPYAK
   4190 FPUS73 KFGF 282359 / NOWFAR

For more information on header files, see the section on header files.
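
To show why the header file speeds up searching, the following Python sketch reads lines in the "offset header / extra" layout and returns the byte offsets of products from a given locale.  It is an illustration only.  The function names are hypothetical, and it assumes the offset and header are separated from the extra field by " / " as in the sample above.

```python
def parse_header_file(text):
    """Return a list of (offset, header, extra) tuples from header file text."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue                      # skip blank lines
        left, _, extra = line.partition(" / ")
        offset, header = left.split(None, 1)
        entries.append((int(offset), header.strip(), extra.strip()))
    return entries


def find_offsets(entries, locale):
    """Byte offsets of every product whose header names the given locale."""
    return [offset for offset, header, _ in entries
            if locale in header.split()[1:2]]
```

A program could then seek directly to each returned offset in the product file instead of scanning the whole file.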

Sample Bulletin File

A sample bulletin file:

# Pattern        Action Filename               Header Filename
#
S[AP]             >>-15 %D/%y%m%d%h_sao.wmo
S[IMNS]           >>-05 %D/%y%m%d%h_syn.wmo
SD                >>+07 %D/%y%m%d%h_rad.wmo
U[^AB]            >>-65 %D/%y%m%d%12h_upa.wmo 
ASUS1_            >>    %D/%y%m%d%3h_frt.wmo
WWUS40            >>    %D/%y%m%d%6h_wws.wmo
FO                >>    %D/%y%m%d%12h_mod.wmo %D/%y%m%d%12h_mod.hdr
A                 >>    %D/%y%m%d%6h_sum.wmo  %D/%y%m%d%6h_sum.hdr
C                 >>    %D/%y%m%d%6h_cli.wmo  %D/%y%m%d%6h_cli.hdr
W                 >>    %D/%y%m%d%6h_sev.wmo  %D/%y%m%d%6h_sev.hdr
F[^O]             >>    %D/%y%m%d%6h_for.wmo  %D/%y%m%d%6h_for.hdr
#
# Specific forecast products
#
FXUS01            >     %D/fore/48hr
FXUS02            >     %D/fore/3-5d_Hem
FPUS5_KIND        |     /usr/local/bin/parse - -ph=FPUS5_KIND -id=%%INZ029 -pa=dollar -of=%D/fore/laf_zone -me=none
*_KIND            >>    %D/Indy/%m%d.dat
#
# HDS products
#
Y/M89             >>    %D/%y%m%d%12h_eta.grb %D/%y%m%d%12h_eta.hdr
Y/M39G211         >>    %D/%y%m%d%12h_ngm.grb %D/%y%m%d%12h_ngm.hdr
Y/M64G211         >>    %D/%y%m%d%12h_ngm.grb %D/%y%m%d%12h_ngm.hdr
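
The layout of these lines can be illustrated with a short Python sketch that splits a bulletin line into its fields.  It is an assumption-laden illustration rather than the ingestor's parser.  Text attached to the action (such as the "-15" in ">>-15") is kept verbatim and not interpreted, comment lines are assumed to begin with "#", and the function name is hypothetical.

```python
import re

ACTIONS = {">>": ">>", ">": ">", "#": "#", "|": "|", "@": "@"}
NAMES = {">>": "append", ">": "write", "#": "file", "|": "pipe", "@": "run"}
# ">>" must be tried before ">" so the longer symbol wins.
ACTION_RE = re.compile(r"^(>>|>|#|\||@|append|write|file|pipe|run)(.*)$")


def parse_bulletin_line(line):
    """Return a dict of fields for one bulletin line, or None for comments."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    header, *rest = stripped.split(None, 2)
    action = modifier = None
    target = ""
    if rest:
        match = ACTION_RE.match(rest[0])
        if match:
            action = NAMES.get(match.group(1), match.group(1))
            modifier = match.group(2) or None   # kept as-is, uninterpreted
            target = rest[1] if len(rest) > 1 else ""
        else:
            target = " ".join(rest)
    if action in ("pipe", "run"):
        # The whole remainder of the line is the command to execute.
        return {"header": header, "action": action, "modifier": modifier,
                "target": target, "header_file": None}
    names = target.split()
    return {"header": header, "action": action, "modifier": modifier,
            "target": names[0] if names else None,
            "header_file": names[1] if len(names) > 1 else None}
```

Applied to the "FO" line of the sample, this yields the action "append", the product file convention as the target, and the ".hdr" convention as the header file.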

Program Output

The default output of the ingest program is to reformat the products, removing the control character sequence and formatting the header and product as follows:

** header ***
product
** header ***
....

This allows the ingestor to reparse data ingested by the WXP ingestor to increase granularity of data files. For example, you may want to take the forecast files from the initial ingest and parse for products out of KIND.

When the ingest program is running, it will display a list of the products being broadcast on the data feed.  The selected product's header will be preceded by "**" and the discarded products will be preceded by "--".  The action and the output file will also be displayed.

**SAAK70 KAWN 080800 RTD** 97 JAN 8  08:38:29Z
Append to: /home/wxp/data/97010808.sao
**SACN85 CWAO 080834    ** 97 JAN 8  08:38:29Z
Append to: /home/wxp/data/97010808.sao
**SPUS70 KWBC 080837    ** 97 JAN 8  08:38:29Z
Append to: /home/wxp/data/97010808.sao
**SPUS80 KWBC 080837    ** 97 JAN 8  08:38:30Z
Append to: /home/wxp/data/97010808.sao
**SPCN46 CWAO 080835    ** 97 JAN 8  08:38:30Z
Append to: /home/wxp/data/97010808.sao
**SACN85 CWAO 080834    ** 97 JAN 8  08:38:30Z
Append to: /home/wxp/data/97010808.sao
**SXUS91 KNKA 080837    ** 97 JAN 8  08:38:30Z
Append to: /home/wxp/data/97010808.sfc
**SPCN42 CWAO 080836    ** 97 JAN 8  08:38:30Z
Append to: /home/wxp/data/97010808.sao
**FPUS3 KBUF 080836     ** 97 JAN 8  08:38:30Z
Append to: /home/wxp/data/97010806.for
**FPUS4 KBUF 080837     ** 97 JAN 8  08:38:30Z
Append to: /home/wxp/data/97010806.for

If the product contains GRIB data, the GRIB header is decoded to give further information about the product:

**HVKA99 KWBC 061200    ** 97 JAN 6  18:58:42Z
AVN analysis - 1000 mb V wind component (m/s)
Append to: /home/wxp/data/97010612_avn1w.grb
**HVLA99 KWBC 061200    ** 97 JAN 6  18:58:44Z
AVN analysis - 1000 mb V wind component (m/s)
Append to: /home/wxp/data/97010612_avn0w.grb
**HVMA99 KWBC 061200    ** 97 JAN 6  18:58:47Z
AVN analysis - 1000 mb V wind component (m/s)
Append to: /home/wxp/data/97010612_avs0e.grb
**HVNA99 KWBC 061200    ** 97 JAN 6  18:58:49Z
AVN analysis - 1000 mb V wind component (m/s)
Append to: /home/wxp/data/97010612_avs1e.grb
**HVOA99 KWBC 061200    ** 97 JAN 6  18:58:51Z
AVN analysis - 1000 mb V wind component (m/s)
Append to: /home/wxp/data/97010612_avs1w.grb
**HVPA99 KWBC 061200    ** 97 JAN 6  18:58:53Z
AVN analysis - 1000 mb V wind component (m/s)
Append to: /home/wxp/data/97010612_avs0w.grb
**HPIA98 KWBC 061200    ** 97 JAN 6  18:58:55Z
AVN analysis - Surface Pressure (Pa)
Append to: /home/wxp/data/97010612_avn0e.grb

Output Files

The ingest program reformats the products when it saves them to file.  It strips the bulk of the control characters out of the file so that text editors and word processors can read in and process the data.  In place of the control characters, the ingest program delimits headers with asterisks "**".

** header ***
product
** header ***
....

A sample of a DD+ output file is:

** FPUS73 KFGF 282359 ***
NOWFAR

SHORT TERM FORECAST
NATIONAL WEATHER SERVICE EASTERN ND/GRAND FORKS ND
656 PM CDT THU MAY 28 1998

NDZ006>008-014>016-290600-
BENSON-CAVALIER-PEMBINA-RAMSEY-TOWNER-WALSH-
INCLUDING THE CITIES OF -CAVALIER-DEVILS LAKE-GRAFTON-LANGDON-
656 PM CDT THU MAY 28 1998

.NOW...
SCATTERED SHOWERS AND AN ISOLATED THUNDERSTORM CAN BE EXPECTED NORTH OF
A LINE FROM CANDO TO GRAFTON THROUGH SUNSET. THE HEAVIER SHOWERS MAY
PRODUCE UP TO ONE HALF AN INCH OF RAIN. WEST WINDS GUSTING TO 25 MPH
WILL DECREASE AFTER SUNSET. BY MIDNIGHT TEMPERATURES WILL RANGE FROM 55
IN CANDO AND PEMBINA TO 63 IN DEVILS LAKE AND GRAFTON.

$$

** FPUS73 KDMX 290003 ***
NOWDSM

SHORT TERM FORECAST
NATIONAL WEATHER SERVICE DES MOINES IA
703 PM CDT THU MAY 28 1998

IAZ004>007-015>017-023>028-033>039-290603-
ALGONA-ESTHERVILLE-FORT DODGE-IOWA FALLS-MASON CITY-WATERLOO-
703 PM CDT THU MAY 28 1998

.NOW...
...A TORNADO WATCH REMAINS IN EFFECT UNTIL 900 M...
EXPECT LTLE CHANGE IN THE WEATHER EARLY THIS EVENING WITH
PERIODIC SHOWERS AND THUNDERSTORMS.  SOME STORMS WILL BE SEVERE WITH
DAMAGING WINDS...LARGE HAIL AND POSSIBLY A TORNADO.  BE PREPARED TO
SEEK SAFE SHELTER ON SHORT NOTICE.  TEMPERATURES SHOULD MAINLY BE IN
THE 70S WITH COULD BE A BIT COOLER NEAR STORMS.
$$
   
** FPUS74 KFWD 290004 ***
NOWFTW                                  
... 
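
Because each product begins with a "** header ***" line, an output file can be split back into products with a few lines of code.  The Python sketch below is an illustration only.  It assumes header lines have exactly the form shown above, and the function name split_products is hypothetical.

```python
import re

# A header line: two asterisks, a space, the header, a space, three asterisks.
HEADER_LINE = re.compile(r"^\*\* (.+?) \*\*\*\s*$")


def split_products(text):
    """Return a list of (header, body) pairs from ingest output file text."""
    products = []
    header, body = None, []
    for line in text.splitlines():
        match = HEADER_LINE.match(line)
        if match:
            if header is not None:
                products.append((header, "\n".join(body).strip()))
            header, body = match.group(1), []
        elif header is not None:
            body.append(line)
    if header is not None:
        products.append((header, "\n".join(body).strip()))
    return products
```

Selecting only the pairs whose header contains a given station, such as KFGF, is one way to extract a single office's products from a larger forecast file.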

Log Files

The ingest program logs appropriate information in a log file.  By default, this file is named "ingest.log" and is put in the file_path directory.  The program logs when ingest starts and stops, lists all unselected products and notes any corrupted products from HRS. Each entry is timestamped:

98 MAY 15 15:11:51Z : Unselected product: GPNG98 KWBC 151200 / GRID 07092 10101
98 MAY 15 15:11:51Z : Unselected product: GPNI98 KWBC 151200 / GRID 07092 10101
98 MAY 15 15:13:18Z : Unselected product: NWUS43 KFSD VERIFY / WVMFSD
98 MAY 15 15:13:20Z : Unselected product: NWUS43 KFSD VERIFY / WVMFSD
98 MAY 15 15:13:20Z : Unselected product: NWUS43 KFSD VERIFY / WVMFSD 

The log file name can contain name convention wildcards.  For example, in "/home/wxp/logs/noaa-%m%d.log" the %m and %d are replaced with the month and day, so that a separate log file is generated for each day the ingestor is running.
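
The log entries shown above are regular enough to summarize with a short script, for example to see which product types are being discarded most often.  The Python sketch below is an illustration only.  It assumes entries have the "timestamp : Unselected product: header" form of the sample, and the function name count_unselected is hypothetical.

```python
import re
from collections import Counter

# Timestamp, " : ", the fixed message text, then the product header.
UNSELECTED = re.compile(
    r"^\d\d [A-Z]{3} +\d+ +\d\d:\d\d:\d\dZ : Unselected product: (.+?)\s*$"
)


def count_unselected(lines):
    """Count unselected products by product type (first word of the header)."""
    counts = Counter()
    for line in lines:
        match = UNSELECTED.match(line)
        if match:
            counts[match.group(1).split()[0]] += 1
    return counts
```

A product type that shows up frequently in the result may be a candidate for a new pattern in the bulletin file.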

Terminating Ingest

Ingest may be stopped in two ways.  First, if the ingest program is running in the foreground, the break or interrupt key may be hit, and the message "Break: do you want to quit (k/y/n): " appears.  This allows the user to quit, or to return to ingest if the break key was hit by accident.  If y is specified, the ingest program ends once the current product is complete.  If k is specified, the ingest program ends immediately.  Second, if the ingest program is running as a background task (Unix only), the user may issue the kill command from the operating system, specifying the process identifier of the ingest program.

OPERATIONS NOTE: The ingest program may be listed in "/etc/rc" (Unix startup script) or "autoexec.bat" (for MS-Windows) so that ingest is started whenever the system is booted up or powered on.  Since no environment variables are set upon system initialization, program resources must be specified either by giving the resource file with "-df=/home/wxp/etc" or by specifying the data_path and file_path parameters.


Last updated July 21, 1998