WXP version 5
Program Reference

INGEST


NAME

ingest - The data ingest and selection program

SYNOPSIS

ingest [parameters...] filename

PARAMETERS

Command Line          Resource    Default            Description
-h                    help        No                 Lists basic help information.
-df=filename          default     .wxpdef            Sets the name of the resource file.
-na=name              name        ingest             Specifies the name used in resource file parsing.
-ba                   batch       No                 Runs the program in batch mode.
-me=level             message     out2               Specifies the level of messages to be displayed:
  • product headers - out1
  • product descriptions - out2a
  • output filenames - out2c
  • product contents - out3
-fp=filepath          file_path   current directory  Specifies the location of database files.
-dp=datapath          data_path   current directory  Specifies the location of ingested data files. This is assumed to be the output of the ingest process.
-cp=conpath           con_path    current directory
-if=input             in_file     none               Specifies the type of input to the program:
  • dds - the domestic data feed
  • pps - the public products data feed
  • ddp - the domestic plus data feed
  • ids - the international data feed
  • 604 - the FAA 604 data feed
  • hds - the high res data feed
  • wxp - WXP ingested data files
  • kav - Kavouras data files

You can also specify the baud rate and parity, if needed, by adding them after the input type, separated by commas, as in:
   ddp,9600,even

-bf=bull_file         bull_file   ingest.bul         Specifies the bulletin file. This file contains a list of which products are to be saved, the action to be performed and the file naming convention to use.
-lf=log_file          log_file    ingest.log         Specifies the log file for the ingest process. This log file will contain critical information about the status of the ingest program. Messages in the log file are all date/time stamped.
-pa=param[,param...]  parameter   none               Extra parameters:
  • cntrl - print control characters as [XX] rather than stripping them for output to standard output.
  • log_unk - log unknown or unselected products.
filename              filename    Standard Input     This is the input filename. This can be one of:
  • file - a standard file which needs to be reparsed
  • fifo - a named pipe created with the mknod command
  • device - a character device, such as a serial port, used to ingest from NWS data feeds
  • socket - a network connection using a TCP/STREAM connection. Specify "sock:port" with the port address.
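As a rough illustration of how the four input kinds can be told apart from the argument alone, here is a Python sketch. `classify_input` is a hypothetical helper, not part of WXP; it assumes Unix file-type checks through `os.stat`, so Windows COM ports are not handled.

```python
import os
import stat

def classify_input(name=None):
    """Hypothetical helper: classify an ingest input argument.

    Returns (kind, value) where kind is 'stdin', 'socket', 'fifo',
    'device' or 'file'. Relies on Unix file-type bits from os.stat.
    """
    if name is None:
        return "stdin", None                       # no filename: standard input
    if name.startswith("sock:"):
        return "socket", int(name[len("sock:"):])  # "sock:port"
    mode = os.stat(name).st_mode
    if stat.S_ISFIFO(mode):
        return "fifo", name                        # named pipe (mknod/mkfifo)
    if stat.S_ISCHR(mode):
        return "device", name                      # character device, e.g. serial port
    return "file", name                            # regular file to be reparsed
```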

DESCRIPTION

The ingest program is set up to read and process each of the Family of Services (FOS) and NOAAPORT feeds. Considering that several megabytes of data are broadcast on each of these feeds each day, the ingest program must offer a means to select products (or discard unneeded ones) and file them in a fashion that makes it easier for programs to search for appropriate data.

The ingest program can receive data from four sources:

  1. File -- this is a file of raw ingested data from FOS or NOAAPORT. This can be fed through the ingest program for additional product selection and management. To specify a file, list the filename on the command line to the ingest program.
  2. Serial Port -- this is a standard RS/232 type serial port which is configured for baud rate and parity.  WXP has several presets for various FOS feeds such as domestic data and public products.  Otherwise, the port parameters are set with the in_file resource.  To specify a serial port, list the device driver (/dev/ttya) or port (COM1 for Windows).
  3. Named Pipe (FIFO) -- this is a named pipe (Unix only).  This is a file on disk that acts as a queue where one process can write data to the pipe and the ingest program can read that data from the pipe.  This is handy for interfacing the WXP ingestor with non-WXP ingestors.  To specify a named pipe, list the filename of the named pipe.  WXP will determine if it is a named pipe or a file.
  4. Socket -- a socket is a network connection that acts like a queue. One program feeds data to a socket while the WXP ingestor reads data from the socket and processes it. WXP uses a TCP/STREAM socket to preserve data integrity. The WXP ingestor acts as the socket server and binds itself to the socket. To specify a socket, use the keyword "sock:port" with the port address. A port number around 5000 is recommended to avoid conflicts with other TCP/IP applications. The other application, which acts as the client, must know the IP address of the machine the WXP ingestor is running on and the port number it is bound to.
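The client side of the socket case might look like the following Python sketch, which copies raw feed bytes from any binary stream to an ingestor started with a "sock:port" input. The function name `feed_ingestor` and the chunk size are assumptions for illustration; the text above only specifies that the ingestor is a TCP/STREAM server.

```python
import socket

def feed_ingestor(host, port, stream, chunk_size=4096):
    """Copy raw feed bytes from a binary stream to an ingestor on sock:port.

    Hypothetical client: it opens one TCP connection, sends everything,
    and closes the connection when the stream is exhausted.
    """
    with socket.create_connection((host, port)) as conn:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break
            conn.sendall(chunk)  # sendall keeps sending until every byte is out
```

For example, with the ingestor started as "ingest sock:5000", a client could call `feed_ingestor("wxphost", 5000, open("raw.wmo", "rb"))`, where "wxphost" and "raw.wmo" are placeholder names.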

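The ingest program uses a pattern matching scheme to select products. Each pattern has an associated action that is to be performed on the matched product. These actions include appending or writing the product to a file, piping it to a command, and running a command when the product is complete (see Actions below).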

Bulletin File

The ingest program uses a bulletin file to set up which products are to be selected from the data feed and which actions to perform on them. The bulletin filename is specified with the bull_file resource. The bulletin file contains a list of headers, actions and commands to be performed:

header [action] [command/filename...] [header file]
header [action] [command/filename...] [header file]
...

The header can specify the exact header or a pattern to which headers can be matched. The headers listed in the file can use the following wildcard characters:

. or ?           match a single character
- or *           match any sequence of characters
[letters]        match a character from the set
[^letters]       match any character except those from the set
(str1|str2...)   match any one of the listed strings
_                underscore matches a space
/data            match extra information

Some example header strings are:

AB               Anything that starts with AB
S[AP]            SA or SP
(W|AC|RG)        Starts with W or AC or RG
F[^O]            Anything that starts with F, second character NOT O
FQUS1_KIND       Full header specification with spaces as underscores
*_KIND           Wildcard match on any product that ends with KIND
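To make the wildcard rules concrete, the following Python sketch translates a bulletin pattern into a regular expression. It assumes, based on the examples above, that patterns match from the start of the header (so "AB" matches any header beginning with AB), and it ignores any /data GRIB selector. `pattern_to_regex` and `header_matches` are hypothetical names, not WXP functions.

```python
import re

def pattern_to_regex(pattern):
    """Translate a WXP header pattern into a compiled Python regex (sketch)."""
    pattern = pattern.split("/", 1)[0]     # drop any /data (GRIB) selector
    out = []
    i = 0
    while i < len(pattern):
        c = pattern[i]
        if c in ".?":
            out.append(".")                # any single character
        elif c in "-*":
            out.append(".*")               # any sequence of characters
        elif c == "_":
            out.append(" ")                # underscore stands for a space
        elif c == "[":
            j = pattern.index("]", i)
            out.append(pattern[i:j + 1])   # [AP] and [^O] are valid regex sets
            i = j
        elif c in "(|)":
            out.append(c)                  # (str1|str2) alternation
        else:
            out.append(re.escape(c))
        i += 1
    return re.compile("".join(out))

def header_matches(pattern, header):
    """Prefix match: re.match anchors at the start of the header only."""
    return pattern_to_regex(pattern).match(header) is not None
```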

When the product is GRIB, the header is parsed for specific product parameters. This information can then be used to select the product. The syntax for this selection is:

/[Xvvv][Xvvv][Xvvv]...

Where X is:

The values for each parameter are listed in the WXP Product Description Appendix. Using the internal GRIB parameters is more reliable than selecting by the WMO header because more than one product may have the same header:

HVAC98 KWBC 070000 from Sea Wave model
HVAC99 KWBC 070000 from Aviation model

To separate the two products, use the model specifications: /M77 for the Aviation model and /M10 for the Sea Wave model.
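The /[Xvvv]... selector can be sketched the same way. This hypothetical Python parser splits a selector into letter/value pairs and checks them against a product's decoded GRIB parameters. Only M (model) is explained in this section; treating G as the grid number (as in /M39G211) is an assumption, and the `grib_params` dictionary stands in for whatever the GRIB decoder provides.

```python
import re

def parse_grib_selector(selector):
    """Split a selector such as '/M39G211' into {'M': 39, 'G': 211}."""
    if not selector.startswith("/"):
        raise ValueError(f"selector must start with '/': {selector!r}")
    body = selector[1:]
    pairs = re.findall(r"([A-Za-z])(\d+)", body)
    # Reject anything that is not a clean run of letter+digits pairs.
    if "".join(letter + value for letter, value in pairs) != body:
        raise ValueError(f"malformed selector: {selector!r}")
    return {letter: int(value) for letter, value in pairs}

def selector_matches(selector, grib_params):
    """True if every parameter named in the selector equals the product's value."""
    wanted = parse_grib_selector(selector)
    return all(grib_params.get(letter) == value for letter, value in wanted.items())
```

For a bulletin pattern such as "Y/M89", the part after the slash is the selector: `"Y/M89".partition("/")` yields it as the third element.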

Actions

The actions are:

>>       append to file with header
append   same as above
>        write to file with header, previous contents overwritten
write    same as above
#        write to file without header, previous contents overwritten
file     same as above
|        pipe product to listed command
pipe     same as above
@        run command when product complete
run      same as above

Also, the action can be prepended by a set of flags:

Command or Filename

The command field is generally either the file in which to place the output or the command to run with the pipe or run actions. It can contain several escape sequences:

The examples assume a system time of 1455Z Jan 12, 1997 and the product header FPUS5 KIND 281512.

Wildcard  Explanation                              Example
@tag      Name convention tag
%Y        current system year                      1997
%y        current system year (last 2 digits)      97
%m        current system month                     01
%d        current system day                       12
%j        current system Julian day                12
%h        current system hour                      14
%n        current system minute                    55
%pd       product day                              28
%ph       product hour                             15
%pn       product minute                           12
%T        product type                             FPUS5
%t        product type (lower case)                fpus5
%L        product locale                           KIND
%l        product locale (lower case)              kind
%D        data_path resource
%C        con_path resource
%R        raw_path resource
%G        grid_path resource
%W        watch_path resource
%I        image_path resource
%F        file_path resource

Some of the above wildcards can be preceded with a number.  For dates, the number is a modifier which rounds down to the nearest value which is a multiple of that number.  For example, "%6h" would round down to the nearest 6 hour boundary.  For the previous example, it results in the value 12.  

For the product type and locale, this number is used in a substring operation. The first digit is the starting character position in the string (counting from 1) and the second digit is the number of characters to use. For example, "%12T" results in "FP". To get "IND", use "%23L".
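The rules above (plain escapes, rounding for dates, offset/length substrings for type and locale) can be illustrated with a small Python expander. This is a sketch covering only a subset of the escapes (it omits @tag and %j), and `expand` is a hypothetical name. Treating "%%" as a literal percent sign is an assumption suggested by the "-id=%%INZ029" argument in the sample bulletin file.

```python
import re
from datetime import datetime

# %% or %[digits][p]letter, e.g. %y, %6h, %ph, %12T
TOKEN = re.compile(r"%(%|(\d*)(p?[A-Za-z]))")

def expand(template, now, header, paths):
    """Expand a subset of the name-convention escapes (hypothetical helper).

    now    -- system time as a datetime (UTC)
    header -- product header such as "FPUS5 KIND 281512"
    paths  -- maps path escapes to directories, e.g. {"D": "/usr/wxp/data"}
    """
    ptype, locale, stamp = header.split()[:3]
    dates = {"Y": now.year, "y": now.year % 100, "m": now.month,
             "d": now.day, "h": now.hour, "n": now.minute,
             "pd": int(stamp[0:2]), "ph": int(stamp[2:4]), "pn": int(stamp[4:6])}
    strings = {"T": ptype, "t": ptype.lower(), "L": locale, "l": locale.lower()}

    def replace(match):
        if match.group(1) == "%":
            return "%"                          # assumption: %% is a literal %
        num, code = match.group(2), match.group(3)
        if code in dates:
            value = dates[code]
            if num and int(num) > 0:
                value -= value % int(num)       # round down to a multiple of num
            return str(value) if code == "Y" else f"{value:02d}"
        if code in strings:
            text = strings[code]
            if len(num) == 2:                   # start position (from 1), then length
                start = int(num[0]) - 1
                text = text[start:start + int(num[1])]
            return text
        return paths.get(code, match.group(0))  # unknown escapes are left as-is

    return TOKEN.sub(replace, template)

when = datetime(1997, 1, 12, 14, 55)
print(expand("%D/%y%m%d%6h_for.wmo", when, "FPUS5 KIND 281512", {"D": "/usr/wxp/data"}))
# prints /usr/wxp/data/97011212_for.wmo
```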

Header Files

To aid in the parsing of products from the various feeds, a header file can be created by the ingest program. This essentially lists the header of each product in the file along with its byte offset into the file. Since most parsing is based on the header, it is far easier to search the smaller header file than to parse through the much larger product file.

To have the ingestor produce these files automatically, add the header file name convention to the end of the line in the bulletin file:

F[^O]             >>    %D/%y%m%d%6h_for.wmo  %D/%y%m%d%6h_for.hdr

The first name convention listed "%D/%y%m%d%6h_for.wmo" is the filename where the actual product is saved. The second name convention "%D/%y%m%d%6h_for.hdr" is where the header file information is saved. The syntax of the file is as follows:

offset header / extra
offset header / extra
....

where:

A sample from a forecast data header file:

      0 FPUS86 KPQR 282359 / OPUPDX
   3264 FPUS85 KGGW 290001 / OPUGGW
   3548 FPAK11 PAYA 282207 / &ZCZC JNULFPYAK
   4190 FPUS73 KFGF 282359 / NOWFAR

For more information on header files, see the section on header files.
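Because each entry records a byte offset, a program can locate products without scanning the product file. The Python sketch below parses a header file and computes each matching product's byte range. It assumes entries are listed in file order and that a product runs until the next entry's offset (or the end of the file); `read_index` and `find_products` are hypothetical helpers.

```python
def read_index(lines):
    """Parse 'offset header / extra' lines into (offset, header, extra) tuples."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        offset, rest = line.split(None, 1)
        header, _, extra = rest.partition(" / ")
        entries.append((int(offset), header.strip(), extra.strip()))
    return entries

def find_products(entries, prefix, file_size):
    """Yield (header, start, length) for entries whose header starts with prefix.

    Assumes a product ends where the next entry begins (or at file_size).
    """
    for i, (offset, header, _) in enumerate(entries):
        if header.startswith(prefix):
            end = entries[i + 1][0] if i + 1 < len(entries) else file_size
            yield header, offset, end - offset
```

To read a product, open the .wmo file in binary mode, then seek to the start offset and read the given length.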

Sample Bulletin File

A sample bulletin file

# Pattern        Action Filename               Header Filename
#
S[AP]             >>-15 %D/%y%m%d%h_sao.wmo
S[IMNS]           >>-05 %D/%y%m%d%h_syn.wmo
SD                >>+07 %D/%y%m%d%h_rad.wmo
U[^AB]            >>-65 %D/%y%m%d%12h_upa.wmo 
ASUS1_            >>    %D/%y%m%d%3h_frt.wmo
WWUS40            >>    %D/%y%m%d%6h_wws.wmo
FO                >>    %D/%y%m%d%12h_mod.wmo %D/%y%m%d%12h_mod.hdr
A                 >>    %D/%y%m%d%6h_sum.wmo  %D/%y%m%d%6h_sum.hdr
C                 >>    %D/%y%m%d%6h_cli.wmo  %D/%y%m%d%6h_cli.hdr
W                 >>    %D/%y%m%d%6h_sev.wmo  %D/%y%m%d%6h_sev.hdr
#
# Specific forecast products
#
FXUS01            >     %D/fore/48hr
FXUS02            >     %D/fore/3-5d_Hem
FPUS53_KIND       |     /usr/local/bin/parse - -ph=FPUS53_KIND -id=%%INZ029 -pa=dollar -of=%D/fore/laf_zone -me=none
*_KIND            >>    %D/Indy/%m%d.dat
#
# HDS products
#
Y/M89             >>    %D/%y%m%d%12h_eta.grb %D/%y%m%d%12h_eta.hdr
Y/M39G211         >>    %D/%y%m%d%12h_ngm.grb %D/%y%m%d%12h_ngm.hdr
Y/M64G211         >>    %D/%y%m%d%12h_ngm.grb %D/%y%m%d%12h_ngm.hdr
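A bulletin line can be split into its fields roughly as follows. This hypothetical Python parser treats everything after a pipe or run action as the command, and everything after a file action as a filename plus an optional header filename. Characters before the action symbol (such as P or P035) are returned as flags and characters after it (such as -15 or +07) as a suffix; the meaning of that suffix is not described in this section, so the parser does not interpret it.

```python
import re

# Optional flags, then an action symbol or word, then an optional suffix.
ACTION_RE = re.compile(r"^([A-Za-z0-9]*?)(>>|>|#|\||@|append|write|file|pipe|run)(.*)$")
FILE_ACTIONS = {">>": "append", ">": "write", "#": "file",
                "append": "append", "write": "write", "file": "file"}
CMD_ACTIONS = {"|": "pipe", "@": "run", "pipe": "pipe", "run": "run"}

def parse_bulletin_line(line):
    """Split one bulletin line into a dict; None for blank, comment or @ lines."""
    line = line.strip()
    if not line or line.startswith("#") or line.startswith("@"):
        return None
    pattern, action_token, rest = (line.split(None, 2) + ["", ""])[:3]
    m = ACTION_RE.match(action_token)
    if m is None:
        raise ValueError(f"unknown action: {action_token!r}")
    flags, symbol, suffix = m.groups()
    entry = {"pattern": pattern, "flags": flags, "suffix": suffix}
    if symbol in FILE_ACTIONS:
        names = rest.split()
        entry["action"] = FILE_ACTIONS[symbol]
        entry["filename"] = names[0] if names else None
        entry["header_file"] = names[1] if len(names) > 1 else None
    else:
        entry["action"] = CMD_ACTIONS[symbol]
        entry["command"] = rest          # the whole remainder is the command
    return entry
```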

Program Output

The default output of the ingest program reformats the products, removing the control character sequences and formatting the header and product as follows:

** header ***
product
** header ***
....

This allows the ingestor to reparse data it has already ingested in order to produce finer-grained data files. For example, you may want to take the forecast files from the initial ingest and extract only the products issued by KIND.
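Reparsing works because the "** header ***" delimiters make product boundaries easy to find. The following Python sketch splits such a file into (header, product) pairs, after which products can be selected by locale. `split_products` is a hypothetical helper and assumes each delimiter line has exactly the form shown above.

```python
import re

# A delimiter line looks like: ** FPUS73 KFGF 282359 ***
HEADER = re.compile(r"^\*\* (.+?) \*\*\*\s*$")

def split_products(text):
    """Return a list of (header, product_text) pairs from reformatted output."""
    products = []
    header, body = None, []
    for line in text.splitlines():
        m = HEADER.match(line)
        if m:
            if header is not None:
                products.append((header, "\n".join(body)))
            header, body = m.group(1).strip(), []
        elif header is not None:
            body.append(line)          # text before the first header is ignored
    if header is not None:
        products.append((header, "\n".join(body)))
    return products
```

For example, `[p for p in split_products(text) if p[0].split()[1] == "KIND"]` keeps only the products whose locale is KIND.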

When the ingest program is running, it will display a list of the products being broadcast on the data feed.  The selected product's header will be preceded by "**" and the discarded products will be preceded by "--".  The action and the output file will also be displayed.

**SAAK70 KAWN 080800 RTD** 97 JAN 8  08:38:29Z
Append to: /usr/wxp/data/97010808.sao
**SACN85 CWAO 080834    ** 97 JAN 8  08:38:29Z
Append to: /usr/wxp/data/97010808.sao
**SPUS70 KWBC 080837    ** 97 JAN 8  08:38:29Z
Append to: /usr/wxp/data/97010808.sao
**SPUS80 KWBC 080837    ** 97 JAN 8  08:38:30Z
Append to: /usr/wxp/data/97010808.sao
**SPCN46 CWAO 080835    ** 97 JAN 8  08:38:30Z
Append to: /usr/wxp/data/97010808.sao
**SACN85 CWAO 080834    ** 97 JAN 8  08:38:30Z
Append to: /usr/wxp/data/97010808.sao
**SXUS91 KNKA 080837    ** 97 JAN 8  08:38:30Z
Append to: /usr/wxp/data/97010808.sfc
**SPCN42 CWAO 080836    ** 97 JAN 8  08:38:30Z
Append to: /usr/wxp/data/97010808.sao
**FPUS3 KBUF 080836     ** 97 JAN 8  08:38:30Z
Append to: /usr/wxp/data/97010806.for
**FPUS4 KBUF 080837     ** 97 JAN 8  08:38:30Z
Append to: /usr/wxp/data/97010806.for

If the product contains GRIB data, the GRIB header is decoded to give further information about the product:

**HVKA99 KWBC 061200    ** 97 JAN 6  18:58:42Z
AVN analysis - 1000 mb V wind component (m/s)
Append to: /usr/wxp/data/97010612_avn1w.grb
**HVLA99 KWBC 061200    ** 97 JAN 6  18:58:44Z
AVN analysis - 1000 mb V wind component (m/s)
Append to: /usr/wxp/data/97010612_avn0w.grb
**HVMA99 KWBC 061200    ** 97 JAN 6  18:58:47Z
AVN analysis - 1000 mb V wind component (m/s)
Append to: /usr/wxp/data/97010612_avs0e.grb
**HVNA99 KWBC 061200    ** 97 JAN 6  18:58:49Z
AVN analysis - 1000 mb V wind component (m/s)
Append to: /usr/wxp/data/97010612_avs1e.grb
**HVOA99 KWBC 061200    ** 97 JAN 6  18:58:51Z
AVN analysis - 1000 mb V wind component (m/s)
Append to: /usr/wxp/data/97010612_avs1w.grb
**HVPA99 KWBC 061200    ** 97 JAN 6  18:58:53Z
AVN analysis - 1000 mb V wind component (m/s)
Append to: /usr/wxp/data/97010612_avs0w.grb
**HPIA98 KWBC 061200    ** 97 JAN 6  18:58:55Z
AVN analysis - Surface Pressure (Pa)
Append to: /usr/wxp/data/97010612_avn0e.grb

Output Files

The ingest program reformats the products when it saves them to file.  First it strips the bulk of the control characters out of the file.  This is to allow text editors and word processors to be able to read in and process the data.  In replacing the control characters, the ingest program delimits headers with asterisks "**".

** header ***
product
** header ***
....

A sample of a DD+ output file is:

** FPUS73 KFGF 282359 ***
NOWFAR

SHORT TERM FORECAST
NATIONAL WEATHER SERVICE EASTERN ND/GRAND FORKS ND
656 PM CDT THU MAY 28 1998

NDZ006>008-014>016-290600-
BENSON-CAVALIER-PEMBINA-RAMSEY-TOWNER-WALSH-
INCLUDING THE CITIES OF -CAVALIER-DEVILS LAKE-GRAFTON-LANGDON-
656 PM CDT THU MAY 28 1998

.NOW...
SCATTERED SHOWERS AND AN ISOLATED THUNDERSTORM CAN BE EXPECTED NORTH OF
A LINE FROM CANDO TO GRAFTON THROUGH SUNSET. THE HEAVIER SHOWERS MAY
PRODUCE UP TO ONE HALF AN INCH OF RAIN. WEST WINDS GUSTING TO 25 MPH
WILL DECREASE AFTER SUNSET. BY MIDNIGHT TEMPERATURES WILL RANGE FROM 55
IN CANDO AND PEMBINA TO 63 IN DEVILS LAKE AND GRAFTON.

$$

** FPUS73 KDMX 290003 ***
NOWDSM

SHORT TERM FORECAST
NATIONAL WEATHER SERVICE DES MOINES IA
703 PM CDT THU MAY 28 1998

IAZ004>007-015>017-023>028-033>039-290603-
ALGONA-ESTHERVILLE-FORT DODGE-IOWA FALLS-MASON CITY-WATERLOO-
703 PM CDT THU MAY 28 1998

.NOW...
...A TORNADO WATCH REMAINS IN EFFECT UNTIL 900 M...
EXPECT LTLE CHANGE IN THE WEATHER EARLY THIS EVENING WITH
PERIODIC SHOWERS AND THUNDERSTORMS.  SOME STORMS WILL BE SEVERE WITH
DAMAGING WINDS...LARGE HAIL AND POSSIBLY A TORNADO.  BE PREPARED TO
SEEK SAFE SHELTER ON SHORT NOTICE.  TEMPERATURES SHOULD MAINLY BE IN
THE 70S WITH COULD BE A BIT COOLER NEAR STORMS.
$$
   
** FPUS74 KFWD 290004 ***
NOWFTW                                  
... 

PAN (Product Arrival Notices) Messages

Product arrival notices are sent at the completion of a product to a specified PAN receiving program. The PAN receiver will use this message to trigger an action based on the arrival of that product. For example, a PAN receiver might be interested in the arrival of severe thunderstorm warning messages so it can warn the user. The PAN message is broadcast over a socket using a UDP transmission. This is a connectionless process where the PAN is sent to a specific address and port, and it is up to the PAN receiver to be active and waiting for the message using a receive-from (recvfrom) call.
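A PAN receiver only needs a bound UDP socket and a blocking receive. The Python sketch below shows this; the default port 5566 is taken from the @PAN example later in this section, while the helper names, buffer size and ASCII decoding are assumptions.

```python
import socket

def open_pan_socket(host="", port=5566):
    """Bind a UDP socket on which PAN messages can be received."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock

def receive_pan(sock, bufsize=4096):
    """Block until one PAN datagram arrives; return (line, sender address)."""
    data, addr = sock.recvfrom(bufsize)   # the "receive from" call
    return data.decode("ascii", errors="replace").strip(), addr
```

A receiver would typically call `open_pan_socket()` once and then call `receive_pan` in a loop, acting on each line as it arrives.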

The PAN message is sent as a single line of information for each product received by the ingestor.  The information in the PAN message is broken up into fields delimited by a bar "|":

ID|Server|###|YYYYMMDDhhmmss|WMO/Extra|Filename|Offset|Size

Fields:

Examples:

901|45|909|19980428152512|SDXX99 KWBC 281522 / RCMFWS|/home/wxp/data/98042815_rad.wmo|75975|2410| 
  1. 901 - identifies NOAAPORT PAN message
  2. 45 - identifies local NOAAPORT server
  3. 909 - is the sequence number
  4. 19980428152512 - Date product arrived on server and PAN message sent (depends on server time). It arrived at 15:25:12Z on 28 APR 1998
  5. SDXX99 KWBC 281522 - WMO header
    RCMFWS - AWIPS header
  6. /home/wxp/data/98042815_rad.wmo - server filename where product is located. Each file can contain more than one product
  7. 75975 - byte offset in file
  8. 2410 - product size in bytes
901|45|907|19980428152510|YSRG98 KWBD 281200 PAA / 89 212 4030036 1 0 66|/home/wxp/model/98042812_eta2.grb|3412593|32943|
  1. 901 - identifies NOAAPORT PAN message
  2. 45 - identifies local NOAAPORT server
  3. 907 - is the sequence number
  4. 19980428152510 - Date product arrived on server and PAN message sent
  5. YSRG98 KWBD 281200 PAA - WMO header
    89 212 4030036 1 0 66 - Extra GRIB info, model 89 is Eta model, grid 212 is AWIPS grid 212, time 4030036 is a 6 hour accumulation from forecast hour 30 to 36, level type 1 is surface, level 0 is ignored for surface and parameter 66 is snow depth.
  6. /home/wxp/model/98042812_eta2.grb - server filename where product is located
  7. 3412593 - byte offset in file
  8. 32943 - product size in bytes
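The field layout above can be decoded with a short Python function. `parse_pan` is a hypothetical helper; it assumes the trailing bar shown in the examples and splits the WMO/Extra field on " / ".

```python
from datetime import datetime

def parse_pan(line):
    """Decode one PAN line into a dict of typed fields (sketch)."""
    fields = [f.strip() for f in line.strip().split("|")]
    if fields and fields[-1] == "":
        fields.pop()                     # messages end with a trailing bar
    if len(fields) != 8:
        raise ValueError(f"expected 8 fields, got {len(fields)}")
    msg_id, server, seq, stamp, wmo_extra, filename, offset, size = fields
    wmo, _, extra = wmo_extra.partition(" / ")
    return {
        "id": int(msg_id),
        "server": int(server),
        "sequence": int(seq),
        "time": datetime.strptime(stamp, "%Y%m%d%H%M%S"),
        "wmo": wmo.strip(),
        "extra": extra.strip(),          # AWIPS header or extra GRIB info
        "filename": filename,
        "offset": int(offset),
        "size": int(size),
    }
```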

PAN Message Setup

To set up the WXP ingestor for PAN messages, the following information must be added to the "ingest.bul" file. A PAN configuration line must be added somewhere in the file.

# PAN Setup
@PAN id=45 sock:steve:5566 sock:dev5:5000 pan.log

The "@PAN" is a keyword in the bulletin file for the PAN configuration line. The "id=45" specifies the NOAAPORT unique server ID, which is broadcast as field 2 in the PAN message. The rest of the line lists destinations. The "sock" keyword specifies that the PAN is sent over a UDP socket. The string "steve:5566" is the network name of the destination computer and the port number. If the sock keyword is omitted, the PAN is saved to the listed filename, such as "pan.log". Up to 10 destinations can be listed. Each destination is numbered starting with 0 and going to 9 in the order listed on the PAN line.

By default, no PAN messages are sent even if the PAN line is added to the bulletin file. To enable PAN messages, the "P" flag must be added to the action for each product being saved on the server. For example, a product line might look like:

# Pattern Action Filename Header Filename

FT       >> %D/%y%m%d%h_term.wmo %D/%y%m%d%h_term.hdr

To enable this product type for PAN messages, add the "P" flag to the action.

FT      P>> %D/%y%m%d%h_term.wmo %D/%y%m%d%h_term.hdr

This will send a PAN message to all listed destinations whenever this product is received. If you don't want to send a PAN to all destinations, the destination IDs can be listed:

FT   P035>> %D/%y%m%d%h_term.wmo %D/%y%m%d%h_term.hdr

In this case, PAN messages will only be sent to destinations 0, 3 and 5.
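The destination numbering and P flag rules can be summarized in Python. Both helpers are hypothetical; `pan_targets` assumes the destination digits immediately follow the P and silently ignores digits beyond the configured destinations.

```python
import re

def parse_pan_config(line):
    """Parse '@PAN id=45 sock:steve:5566 ... pan.log' into (id, destinations)."""
    words = line.split()
    if not words or words[0] != "@PAN":
        raise ValueError("not a PAN configuration line")
    server_id, destinations = None, []
    for word in words[1:]:
        if word.startswith("id="):
            server_id = int(word[3:])
        elif word.startswith("sock:"):
            host, port = word[5:].rsplit(":", 1)
            destinations.append(("udp", host, int(port)))
        else:
            destinations.append(("file", word))   # no sock keyword: save to file
    if len(destinations) > 10:
        raise ValueError("at most 10 PAN destinations are allowed")
    return server_id, destinations

def pan_targets(flags, n_destinations):
    """Destination indices selected by action flags such as 'P' or 'P035'."""
    m = re.search(r"P(\d*)", flags)
    if m is None:
        return []                                  # no P flag: no PAN sent
    if not m.group(1):
        return list(range(n_destinations))         # bare P: every destination
    return sorted({int(d) for d in m.group(1) if int(d) < n_destinations})
```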

Log Files

The ingest program logs appropriate information in a log file.  By default, this file is named "ingest.log" and is put in the file_path directory.  The program logs when ingest starts and stops, lists all unselected products and notes any corrupted products from HRS. Each entry is timestamped:

98 MAY 15 15:11:51Z : Unselected product: GPNG98 KWBC 151200 / GRID 07092 10101
98 MAY 15 15:11:51Z : Unselected product: GPNI98 KWBC 151200 / GRID 07092 10101
98 MAY 15 15:13:18Z : Unselected product: NWUS43 KFSD VERIFY / WVMFSD
98 MAY 15 15:13:20Z : Unselected product: NWUS43 KFSD VERIFY / WVMFSD
98 MAY 15 15:13:20Z : Unselected product: NWUS43 KFSD VERIFY / WVMFSD 

The log file name can contain name convention wildcard characters such as "/usr/wxp/logs/noaa-%m%d.log" where the %m and %d are replaced with the month and day so that log files are generated for each day the ingestor is running.

Terminating Ingest

Ingest may be stopped in two ways. First, if the ingest program is running in the foreground, the break or interrupt key may be pressed, and the message "Break: do you want to quit (k/y/n): " appears. This allows the user to quit, or to return to ingest if the break key was pressed by accident. If y is specified, the ingest program ends after the current product is complete. If k is specified, the ingest program ends immediately. Second, if the ingest program is running as a background task (UNIX only), the user may issue the kill command from the operating system, specifying the process identifier of the ingest program.

OPERATIONS NOTE: The ingest program may be listed in "/etc/rc" (Unix startup script) or "autoexec.bat" (for MS-Windows) so that ingest is started whenever the system is booted up or powered on. Since no environment variables are set upon system initialization, program resources must be specified either by naming the resource file with "-df=/usr/wxp/etc" or by setting the data_path and file_path resources with the -dp and -fp options.

FILES

SEE ALSO


Last updated May 30, 1998