Data Ingest: Ingest Program
These data feeds need to be read in by the computer and saved in a fashion compatible with the analysis package. This process is called "data ingest". WXP can process data from two ingest programs. The first is the Local Data Manager (LDM) provided to universities by Unidata. The second is the WXP ingest program.
LDM (Local Data Manager)
For Unidata/university sites, the LDM software package is available to read in, select, and process information from the various data feeds. With the proper setup of the "pqact.conf" file, the LDM can be configured so that WXP can process its output. The LDM is discussed in the Installation Guide.
WXP Ingestor
The WXP ingestor program, ingest, is set up to read in and process each of the Family of Services (FOS) or NOAAPORT feeds. Considering that several megabytes of data are broadcast on each of these feeds each day, the ingest program must offer a means to select products (or discard unneeded ones) and file them in a fashion that makes it easier for programs to search for appropriate data.
The ingest program can receive data from four sources:
- File -- this is a file of raw ingested data from FOS or NOAAPORT. This can be fed through the ingest program for additional product selection and management. To specify a file, list the filename on the command line to the ingest program.
- Serial Port -- this is a standard RS-232 type serial port that is configured for baud rate and parity. WXP has several presets for various FOS feeds such as domestic data and public products. Otherwise, the port parameters are set with the in_file resource. To specify a serial port, list the device driver (/dev/ttya) or port (COM1 for Windows).
- Named Pipe (FIFO) -- this is a named pipe (Unix only). This is a file on disk that acts as a queue where one process can write data to the pipe and the ingest program can read that data from the pipe. This is handy for interfacing the WXP ingestor with non-WXP ingestors. To specify a named pipe, list the filename of the named pipe. WXP will determine if it is a named pipe or a file.
- Socket -- a socket is a network connection that acts like a queue. One program feeds data to a socket while the WXP ingestor reads data from the socket and processes it. WXP uses a TCP/STREAM socket to preserve data integrity. The WXP ingestor acts as the socket server and binds itself to the socket. To specify a socket, use the keyword "sock:port" with the port address. A port address in the range of 5000 is recommended, to avoid conflicts with other TCP/IP applications. The other application, which acts as a client, must know the IP address of the machine the WXP ingestor is running on and the port number it is bound to.
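To illustrate the client side of the socket source, here is a minimal Python sketch of a program that connects to a listening ingestor and streams raw bytes to it. The function names send_chunks and feed_ingestor, the host name, and the port are illustrative assumptions. The byte framing the ingestor expects is not described here, so the payload is passed through untouched.

```python
import socket


def send_chunks(conn: socket.socket, chunks) -> int:
    """Send each chunk of raw bytes over an already connected socket.

    Returns the total number of bytes handed to the socket.
    """
    sent = 0
    for chunk in chunks:
        conn.sendall(chunk)  # TCP stream: ordered, reliable delivery
        sent += len(chunk)
    return sent


def feed_ingestor(host: str, port: int, chunks) -> int:
    """Connect as a client to the machine and port the ingestor listens on.

    host and port are assumptions; use whatever the ingestor was started
    with (for example a port given as "sock:5000").
    """
    with socket.create_connection((host, port), timeout=10) as conn:
        return send_chunks(conn, chunks)
```

A feeding program would call something like feed_ingestor("wxhost.example.edu", 5000, chunks), where chunks is any iterable of bytes objects read from the upstream source.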
The ingest program uses a pattern matching scheme to select products. Each pattern has an associated action that is to be performed on the matched product. These actions include:
- write - write the product to a file. If the ingestor matches a new product, the new product will overwrite the contents of the file.
- append - append each new matched product to the end of the file. This is the most common action as it is easier to manage a single file with a few hundred products than it is to manage hundreds of small files.
- pipe - pipe the contents of the product to the standard input of a specified program. With the pipe action, further processing of the data can be done before writing the output to file. Also, this can be used to mail products to other users.
- run - run a specified program once the matched product is received. This can be used to flag a user when a severe weather statement is received.
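The four actions can be pictured with a short Python sketch. This is not the ingestor's own code: the function name apply_action and the use of a shell to launch commands are assumptions made for illustration only.

```python
import subprocess
from pathlib import Path


def apply_action(action: str, target: str, product: bytes) -> None:
    """Carry out one action on a matched product.

    target is a filename for write/append and a command line for pipe/run.
    """
    if action == "write":
        # Each new product replaces the previous contents of the file.
        Path(target).write_bytes(product)
    elif action == "append":
        # Each new product is added to the end of the file.
        with open(target, "ab") as out:
            out.write(product)
    elif action == "pipe":
        # The product becomes the standard input of the command.
        subprocess.run(target, shell=True, input=product, check=False)
    elif action == "run":
        # The command is started after the product is complete; it gets no input.
        subprocess.run(target, shell=True, check=False)
    else:
        raise ValueError(f"unknown action: {action}")
```

With this sketch, two consecutive "write" calls on the same file leave only the second product, while two "append" calls leave both.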
Bulletin File
The ingest program uses a setup file that lets the user choose which products to process. This setup file is called a bulletin file. The bulletin filename is specified with the bull_file resource. The bulletin file contains a list of headers, actions and commands to be performed:
header [action] [command/filename...] [header file]
header [action] [command/filename...] [header file]
...
Header
The header can specify the exact header or a pattern to which headers can be matched. The headers listed in the file can use the following wildcard characters:
. or ? | match a single character |
- or * | match any number of characters |
[letters] | match a character from the set. |
[^letters] | match any character except those from the set |
(str1|str2...) | match strings |
_ | underscore matches a space |
/data | match extra information |
Some example header strings are:
AB | Anything that starts with AB |
S[AP] | SA or SP |
(W|AC|RG) | Starts with W or AC or RG |
F[^O] | Anything that starts with F, second character NOT O |
FQUS1_KIND | Full header specification with spaces as underscores |
*_KIND | Wildcard match on any product that ends with KIND |
/SFP | Matches the AWIPS header SFP |
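One way to read the wildcard rules is as a translation into regular expressions. The Python sketch below is an interpretation, not the ingestor's algorithm: it assumes patterns are matched against the start of the header (as in "Anything that starts with AB"), that "-" and "*" match any run of characters, and that the part after "/" is matched against the extra information in the same way. GRIB selectors such as /M77 are not handled. The function names are hypothetical.

```python
import re


def wxp_pattern_to_regex(pattern: str) -> str:
    """Translate one header pattern (without the /data part) to a regex."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch in ".?":
            out.append(".")            # any single character
        elif ch in "-*":
            out.append(".*")           # any run of characters (assumption)
        elif ch == "_":
            out.append(" ")            # underscore stands for a space
        elif ch == "[":
            end = pattern.index("]", i)
            out.append(pattern[i:end + 1])  # [letters] and [^letters] pass through
            i = end
        elif ch in "(|)":
            out.append(ch)             # (str1|str2) alternation passes through
        else:
            out.append(re.escape(ch))  # everything else is literal
        i += 1
    return "".join(out)


def header_matches(pattern: str, header: str, extra: str = "") -> bool:
    """Prefix-match a header, and optionally its extra information."""
    head_pat, _, extra_pat = pattern.partition("/")
    if head_pat and not re.match(wxp_pattern_to_regex(head_pat), header):
        return False
    if extra_pat and not re.match(wxp_pattern_to_regex(extra_pat), extra):
        return False
    return True
```

For instance, header_matches("F[^O]", "FPUS5 KIND 281512") is true under these assumptions, while the same pattern rejects a header beginning with "FO".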
When the product is GRIB, the header is parsed for specific product parameters. This
information can then be used to select the product. The syntax for this selection is:
/[Xvvv][Xvvv][Xvvv]...
Where X is:
M -- model number
G -- grid number
L -- level type
H -- level value
T -- forecast time
V -- variable number
vvv -- the value of the parameter
The values for each parameter are listed in the WXP Product Description Appendix. Using
the internal GRIB parameters is more reliable than selecting by the WMO header
because more than one product may have the same header:
HVAC98 KWBC 070000 from Sea Wave model
HVAC99 KWBC 070000 from Aviation model
To separate the two products, use the model specifications: /M77 for the Aviation
model and /M10 for the Sea Wave model.
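The selector syntax can be decoded with a few lines of Python. This sketch assumes each value is a non-negative integer and that a product's decoded GRIB parameters are available as a dictionary. The names parse_grib_selector, grib_matches and the dictionary keys are hypothetical.

```python
import re

# Letter codes from the selector syntax, mapped to illustrative key names.
GRIB_KEYS = {
    "M": "model",
    "G": "grid",
    "L": "level_type",
    "H": "level_value",
    "T": "forecast_time",
    "V": "variable",
}


def parse_grib_selector(selector: str) -> dict:
    """Turn a selector such as "/M39G211" into {"model": 39, "grid": 211}."""
    body = selector.lstrip("/")
    pairs = re.findall(r"([MGLHTV])(\d+)", body)
    # Reject anything that is not a clean sequence of letter+number pairs.
    if "".join(key + value for key, value in pairs) != body:
        raise ValueError(f"cannot parse GRIB selector: {selector!r}")
    return {GRIB_KEYS[key]: int(value) for key, value in pairs}


def grib_matches(selector: str, params: dict) -> bool:
    """True when every parameter named in the selector has the wanted value."""
    wanted = parse_grib_selector(selector)
    return all(params.get(name) == value for name, value in wanted.items())
```

With the two products above, a decoded parameter set containing model 77 would satisfy "/M77" and fail "/M10".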
Actions
The actions are:
>> | append to file with header |
append | same as above |
> | write to file with header, previous content overwritten |
write | same as above |
# | write to file without header, previous contents overwritten |
file | same as above |
| | pipe product to listed command |
pipe | same as above |
@ | run command when product complete |
run | same as above |
Also, the action can be prepended by a set of flags:
- R -- specifies to save the file as a raw file and not strip control characters.
- B -- specifies a product to be a binary product and not strip unprintable characters
- P -- specifies to send a PAN message at the completion of a product
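The symbol and keyword forms of an action, together with the optional flags, can be normalized as in the following Python sketch. It is an illustration only: the function name parse_action is hypothetical, and any text left after the action (such as the "-15" in ">>-15" from the sample bulletin file, whose meaning is not described in this section) is returned uninterpreted.

```python
import re

# Symbol forms and the keyword each one is equivalent to.
ACTIONS = {">>": "append", ">": "write", "#": "file", "|": "pipe", "@": "run"}

# Optional R/B/P flags, then a symbol or keyword, then any trailing text.
ACTION_TOKEN = re.compile(
    r"^([RBP]*)(>>|>|#|\||@|append|write|file|pipe|run)(.*)$"
)


def parse_action(token: str):
    """Split an action token into (flags, action name, trailing text)."""
    match = ACTION_TOKEN.match(token)
    if match is None:
        raise ValueError(f"not an action: {token!r}")
    flags, action, rest = match.groups()
    return set(flags), ACTIONS.get(action, action), rest
```

For example, parse_action("B>>") yields the flag set {"B"} with the action "append", and parse_action("#") yields "file".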
Command or Filename
This field is either the file in which to save the output or, for the pipe and run actions, the command to run. It can contain several escape sequences, listed below.
The examples in the table are based on a system time of 1455Z Jan 12, 1997 and the product header FPUS5 KIND 281512.
Wildcard | Explanation | Example |
@tag | Name convention tag | |
%Y | current system year | 1997 |
%y | current system year (last 2 digits) | 97 |
%b | current system month (3 letters) | jan |
%m | current system month | 01 |
%d | current system day | 12 |
%j | current system Julian day | 12 |
%h | current system hour | 14 |
%n | current system minute | 55 |
%pd | product day | 28 |
%ph | product hour | 15 |
%pn | product minute | 12 |
%T | product type | FPUS5 |
%t | product type (lower case) | fpus5 |
%L | product locale | KIND |
%l | product locale (lower case) | kind |
%D | data_path resource | |
%C | con_path resource | |
%R | raw_path resource | |
%G | grid_path resource | |
%W | watch_path resource | |
%I | image_path resource | |
%F | file_path resource | |
Some of the above wildcards can be preceded with a number. For dates, the number is a modifier that rounds down to the nearest value that is a multiple of that number. For example, "%6h" would round down to the nearest 6-hour boundary. With the example system hour of 14, it results in the value 12.
For the product type and locale, this number is used in a substring operation. The first digit of the number is the starting position in the string (counting from 1) and the second digit is the number of characters to use. For example, "%12T" results in "FP". To get "IND", use "%23L".
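The expansion rules can be checked against the table with a small Python sketch. It covers only a subset of the wildcards (not @tag or %j), applies the rounding modifier to the hour and minute fields only, and treats the substring offset as counting from 1, which is what the "%12T" and "%23L" examples imply. The function name expand_name and the paths dictionary are assumptions.

```python
import re
from datetime import datetime, timezone

MONTHS = ("jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec")
WILDCARD = re.compile(r"%(\d*)(pd|ph|pn|[YybmdhnTtLlDCRGWIF])")


def expand_name(template: str, now: datetime, header: str, paths: dict) -> str:
    """Expand name convention wildcards for one product header."""
    ptype, locale, stamp = header.split()[:3]

    def number(value: int, num: str = "") -> str:
        step = int(num) if num else 0
        if step:
            value -= value % step      # round down to a multiple of step
        return f"{value:02d}"

    def text(value: str, num: str) -> str:
        if len(num) == 2:              # first digit: offset, second: length
            start, count = int(num[0]) - 1, int(num[1])
            return value[start:start + count]
        return value

    def replace(match):
        num, key = match.groups()
        fields = {
            "Y": lambda: f"{now.year:04d}",
            "y": lambda: f"{now.year % 100:02d}",
            "b": lambda: MONTHS[now.month - 1],
            "m": lambda: number(now.month),
            "d": lambda: number(now.day),
            "h": lambda: number(now.hour, num),
            "n": lambda: number(now.minute, num),
            "pd": lambda: number(int(stamp[0:2])),
            "ph": lambda: number(int(stamp[2:4]), num),
            "pn": lambda: number(int(stamp[4:6]), num),
            "T": lambda: text(ptype, num),
            "t": lambda: text(ptype, num).lower(),
            "L": lambda: text(locale, num),
            "l": lambda: text(locale, num).lower(),
        }
        if key in fields:
            return fields[key]()
        return paths.get(key, "")      # %D, %C, %R, %G, %W, %I, %F

    return WILDCARD.sub(replace, template)


now = datetime(1997, 1, 12, 14, 55, tzinfo=timezone.utc)
print(expand_name("%D/%y%m%d%6h_for.wmo", now, "FPUS5 KIND 281512",
                  {"D": "/home/wxp/data"}))
# -> /home/wxp/data/97011212_for.wmo
```

The data_path value "/home/wxp/data" is taken from the sample output paths elsewhere in this section and is used here only as an example.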
Header Files
To aid in the parsing of products from the various feeds, a header file can be created by the ingest program. This essentially lists the header of each product in the file along with its byte offset into the file. Since most parsing is based on header, it is far easier to search the smaller header file than to parse through the much larger product file.
To have the ingestor produce these files automatically, add the header file name convention to the end of the line in the bulletin file:
F[^O] >> %D/%y%m%d%6h_for.wmo %D/%y%m%d%6h_for.hdr
The first name convention listed, "%D/%y%m%d%6h_for.wmo", is the filename where the actual product is saved. The second name convention, "%D/%y%m%d%6h_for.hdr", is where the header file information is saved. The syntax of the file is as follows:
offset header / extra
offset header / extra
....
where:
- offset -- is the byte offset into the file,
- header -- is the product header in its entirety, listed after the offset,
- extra -- extra information about the product which is normally the AWIPS header
A sample from a forecast data header file:
0 FPUS86 KPQR 282359 / OPUPDX
3264 FPUS85 KGGW 290001 / OPUGGW
3548 FPAK11 PAYA 282207 / &ZCZC JNULFPYAK
4190 FPUS73 KFGF 282359 / NOWFAR
For more information on header files, see the section on header files.
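A program that uses a header file might read it as in the Python sketch below. It assumes one entry per line in the form "offset header / extra". The function names parse_header_line and find_offsets are hypothetical.

```python
def parse_header_line(line: str):
    """Split one header file entry into (offset, header, extra)."""
    offset_text, _, rest = line.strip().partition(" ")
    header, _, extra = rest.partition(" / ")
    return int(offset_text), header.strip(), extra.strip()


def find_offsets(lines, prefix: str):
    """Byte offsets of every product whose header starts with prefix."""
    offsets = []
    for line in lines:
        if not line.strip():
            continue                   # tolerate blank lines
        offset, header, _ = parse_header_line(line)
        if header.startswith(prefix):
            offsets.append(offset)
    return offsets
```

Each returned offset can then be passed to seek() on the product file, opened in binary mode, to jump straight to the start of that product.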
Sample Bulletin File
# Pattern Action Filename Header Filename
#
S[AP] >>-15 %D/%y%m%d%h_sao.wmo
S[IMNS] >>-05 %D/%y%m%d%h_syn.wmo
SD >>+07 %D/%y%m%d%h_rad.wmo
U[^AB] >>-65 %D/%y%m%d%12h_upa.wmo
ASUS1_ >> %D/%y%m%d%3h_frt.wmo
WWUS40 >> %D/%y%m%d%6h_wws.wmo
FO >> %D/%y%m%d%12h_mod.wmo %D/%y%m%d%12h_mod.hdr
A >> %D/%y%m%d%6h_sum.wmo %D/%y%m%d%6h_sum.hdr
C >> %D/%y%m%d%6h_cli.wmo %D/%y%m%d%6h_cli.hdr
W >> %D/%y%m%d%6h_sev.wmo %D/%y%m%d%6h_sev.hdr
F[^O] >> %D/%y%m%d%6h_for.wmo %D/%y%m%d%6h_for.hdr
#
# Specific forecast products
#
FXUS01 > %D/fore/48hr
FXUS02 > %D/fore/3-5d_Hem
FPUS5_KIND | /usr/local/bin/parse - -ph=FPUS5_KIND -id=%%INZ029 -pa=dollar -of=%D/fore/laf_zone -me=none
*_KIND >> %D/Indy/%m%d.dat
#
# HDS products
#
Y/M89 >> %D/%y%m%d%12h_eta.grb %D/%y%m%d%12h_eta.hdr
Y/M39G211 >> %D/%y%m%d%12h_ngm.grb %D/%y%m%d%12h_ngm.hdr
Y/M64G211 >> %D/%y%m%d%12h_ngm.grb %D/%y%m%d%12h_ngm.hdr
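To show how the fields of a bulletin file line fit together, the following Python sketch splits a bulletin file into entries. It assumes one entry per line, whitespace-separated fields, and that a line whose first field begins with "#" is a comment, as in the sample. The Entry type and the function name parse_bulletin are hypothetical, and the action is kept as written rather than interpreted.

```python
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    pattern: str                 # header or header pattern
    action: str                  # action exactly as written, e.g. ">>" or "|"
    target: str                  # filename, or full command for pipe/run
    header_file: Optional[str]   # header file name convention, if given


COMMAND_ACTIONS = ("|", "pipe", "@", "run")


def parse_bulletin(text: str):
    """Return the entries of a bulletin file, skipping comments and blanks."""
    entries = []
    for raw in text.splitlines():
        fields = raw.split()
        if not fields or fields[0].startswith("#"):
            continue
        pattern = fields[0]
        action = fields[1] if len(fields) > 1 else ""
        rest = fields[2:]
        if action.lstrip("RBP") in COMMAND_ACTIONS:
            # pipe/run: everything after the action is the command line.
            entries.append(Entry(pattern, action, " ".join(rest), None))
        else:
            target = rest[0] if rest else ""
            header_file = rest[1] if len(rest) > 1 else None
            entries.append(Entry(pattern, action, target, header_file))
    return entries
```

Applied to the sample above, the F[^O] line would produce an entry whose target is the .wmo name convention and whose header_file is the .hdr name convention.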
Program Output
The default output of the ingest program is to reformat the products, removing the control character sequence and formatting the header and product as follows:
** header ***
product
** header ***
....
This allows the ingestor to reparse data ingested by the WXP ingestor to increase granularity of data files. For example, you may want to take the forecast files from the initial ingest and parse for products out of KIND.
When the ingest program is running, it will display a list of the products being broadcast on the data feed. The selected product's header will be preceded by "**" and the discarded products will be preceded by "--". The action and the output file will also be displayed.
**SAAK70 KAWN 080800 RTD** 97 JAN 8 08:38:29Z
   Append to: /home/wxp/data/97010808.sao
**SACN85 CWAO 080834 ** 97 JAN 8 08:38:29Z
   Append to: /home/wxp/data/97010808.sao
**SPUS70 KWBC 080837 ** 97 JAN 8 08:38:29Z
   Append to: /home/wxp/data/97010808.sao
**SPUS80 KWBC 080837 ** 97 JAN 8 08:38:30Z
   Append to: /home/wxp/data/97010808.sao
**SPCN46 CWAO 080835 ** 97 JAN 8 08:38:30Z
   Append to: /home/wxp/data/97010808.sao
**SACN85 CWAO 080834 ** 97 JAN 8 08:38:30Z
   Append to: /home/wxp/data/97010808.sao
**SXUS91 KNKA 080837 ** 97 JAN 8 08:38:30Z
   Append to: /home/wxp/data/97010808.sfc
**SPCN42 CWAO 080836 ** 97 JAN 8 08:38:30Z
   Append to: /home/wxp/data/97010808.sao
**FPUS3 KBUF 080836 ** 97 JAN 8 08:38:30Z
   Append to: /home/wxp/data/97010806.for
**FPUS4 KBUF 080837 ** 97 JAN 8 08:38:30Z
   Append to: /home/wxp/data/97010806.for
If the product contains GRIB data, the GRIB header is decoded to give further information about the product:
**HVKA99 KWBC 061200 ** 97 JAN 6 18:58:42Z
   AVN analysis - 1000 mb V wind component (m/s)
   Append to: /home/wxp/data/97010612_avn1w.grb
**HVLA99 KWBC 061200 ** 97 JAN 6 18:58:44Z
   AVN analysis - 1000 mb V wind component (m/s)
   Append to: /home/wxp/data/97010612_avn0w.grb
**HVMA99 KWBC 061200 ** 97 JAN 6 18:58:47Z
   AVN analysis - 1000 mb V wind component (m/s)
   Append to: /home/wxp/data/97010612_avs0e.grb
**HVNA99 KWBC 061200 ** 97 JAN 6 18:58:49Z
   AVN analysis - 1000 mb V wind component (m/s)
   Append to: /home/wxp/data/97010612_avs1e.grb
**HVOA99 KWBC 061200 ** 97 JAN 6 18:58:51Z
   AVN analysis - 1000 mb V wind component (m/s)
   Append to: /home/wxp/data/97010612_avs1w.grb
**HVPA99 KWBC 061200 ** 97 JAN 6 18:58:53Z
   AVN analysis - 1000 mb V wind component (m/s)
   Append to: /home/wxp/data/97010612_avs0w.grb
**HPIA98 KWBC 061200 ** 97 JAN 6 18:58:55Z
   AVN analysis - Surface Pressure (Pa)
   Append to: /home/wxp/data/97010612_avn0e.grb
Output Files
The ingest program reformats the products when it saves them to file. First it strips the bulk of the control characters out of the file. This is to allow text editors and word processors to be able to read in and process the data. In replacing the control characters, the ingest program delimits headers with asterisks "**".
** header ***
product
** header ***
....
A sample of a DD+ output file is:
** FPUS73 KFGF 282359 ***
NOWFAR
SHORT TERM FORECAST
NATIONAL WEATHER SERVICE EASTERN ND/GRAND FORKS ND
656 PM CDT THU MAY 28 1998
NDZ006>008-014>016-290600-
BENSON-CAVALIER-PEMBINA-RAMSEY-TOWNER-WALSH-
INCLUDING THE CITIES OF -CAVALIER-DEVILS LAKE-GRAFTON-LANGDON-
656 PM CDT THU MAY 28 1998
.NOW...
SCATTERED SHOWERS AND AN ISOLATED THUNDERSTORM CAN BE EXPECTED NORTH OF A LINE FROM CANDO TO GRAFTON THROUGH SUNSET. THE HEAVIER SHOWERS MAY PRODUCE UP TO ONE HALF AN INCH OF RAIN. WEST WINDS GUSTING TO 25 MPH WILL DECREASE AFTER SUNSET. BY MIDNIGHT TEMPERATURES WILL RANGE FROM 55 IN CANDO AND PEMBINA TO 63 IN DEVILS LAKE AND GRAFTON.
$$
** FPUS73 KDMX 290003 ***
NOWDSM
SHORT TERM FORECAST
NATIONAL WEATHER SERVICE DES MOINES IA
703 PM CDT THU MAY 28 1998
IAZ004>007-015>017-023>028-033>039-290603-
ALGONA-ESTHERVILLE-FORT DODGE-IOWA FALLS-MASON CITY-WATERLOO-
703 PM CDT THU MAY 28 1998
.NOW...
...A TORNADO WATCH REMAINS IN EFFECT UNTIL 900 M...
EXPECT LTLE CHANGE IN THE WEATHER EARLY THIS EVENING WITH PERIODIC SHOWERS AND THUNDERSTORMS. SOME STORMS WILL BE SEVERE WITH DAMAGING WINDS...LARGE HAIL AND POSSIBLY A TORNADO. BE PREPARED TO SEEK SAFE SHELTER ON SHORT NOTICE. TEMPERATURES SHOULD MAINLY BE IN THE 70S WITH COULD BE A BIT COOLER NEAR STORMS.
$$
** FPUS74 KFWD 290004 ***
NOWFTW
...
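Because headers are delimited with asterisks, an output file can be split back into individual products with very little code. The Python sketch below assumes each "** header ***" delimiter sits on a line of its own. The function name split_products is hypothetical.

```python
import re

# A delimiter line looks like: ** FPUS73 KFGF 282359 ***
HEADER_LINE = re.compile(r"^\*\* (.+?) \*\*\*\s*$")


def split_products(text: str):
    """Return a list of (header, body) pairs from an ingest output file."""
    products = []
    header, body = None, []
    for line in text.splitlines():
        match = HEADER_LINE.match(line)
        if match:
            if header is not None:
                products.append((header, "\n".join(body)))
            header, body = match.group(1), []
        elif header is not None:
            body.append(line)          # lines before the first header are ignored
    if header is not None:
        products.append((header, "\n".join(body)))
    return products
```

Selecting only the products from one station then becomes a one-line filter on the second word of each header, for example keeping pairs where header.split()[1] == "KIND".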
Log Files
The ingest program logs appropriate information in a log file. By default, this file is named "ingest.log" and is put in the file_path directory. The program logs when ingest starts and stops, lists all unselected products and notes any corrupted products from HRS. Each entry is timestamped:
98 MAY 15 15:11:51Z : Unselected product: GPNG98 KWBC 151200 / GRID 07092 10101
98 MAY 15 15:11:51Z : Unselected product: GPNI98 KWBC 151200 / GRID 07092 10101
98 MAY 15 15:13:18Z : Unselected product: NWUS43 KFSD VERIFY / WVMFSD
98 MAY 15 15:13:20Z : Unselected product: NWUS43 KFSD VERIFY / WVMFSD
98 MAY 15 15:13:20Z : Unselected product: NWUS43 KFSD VERIFY / WVMFSD
The log file name can contain name convention wildcard characters such as "/home/wxp/logs/noaa-%m%d.log" where the %m and %d are replaced with the month and day so that log files are generated for each day the ingestor is running.
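The timestamped entries lend themselves to simple post-processing, for example to see which unselected products arrive most often and might deserve a bulletin file entry. The Python sketch below assumes one entry per line in the format shown above and maps two-digit years of 70 or more to the 1900s. That pivot, and the function names, are assumptions.

```python
import re
from collections import Counter
from datetime import datetime, timezone

MONTHS = {name: number for number, name in enumerate(
    "JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC".split(), start=1)}

# Example entry: 98 MAY 15 15:11:51Z : Unselected product: GPNG98 KWBC 151200 / ...
ENTRY = re.compile(
    r"^(\d{2}) ([A-Z]{3}) +(\d{1,2}) (\d{2}):(\d{2}):(\d{2})Z : (.*)$")


def parse_log_line(line: str):
    """Return (timestamp, message) for a log entry, or None if it does not parse."""
    match = ENTRY.match(line.strip())
    if match is None:
        return None
    yy, mon, day, hh, mm, ss, message = match.groups()
    year = (1900 if int(yy) >= 70 else 2000) + int(yy)  # assumed pivot year
    when = datetime(year, MONTHS[mon], int(day), int(hh), int(mm), int(ss),
                    tzinfo=timezone.utc)
    return when, message


def count_unselected(lines):
    """Count unselected products by product type (first word of the header)."""
    prefix = "Unselected product: "
    counts = Counter()
    for line in lines:
        parsed = parse_log_line(line)
        if parsed and parsed[1].startswith(prefix):
            header = parsed[1][len(prefix):].split(" / ")[0]
            counts[header.split()[0]] += 1
    return counts
```

Run over the five sample entries above, count_unselected would report NWUS43 three times and GPNG98 and GPNI98 once each.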
Terminating Ingest
Ingest may be stopped in two ways. First, if the ingest program is running in the foreground, the break or interrupt key may be hit and the message "Break: do you want to quit (k/y/n): " appears. This allows the user to quit or return to ingest if the break key was hit by accident. If y is specified, the ingest program ends following the end of the current product. If k is specified, the ingest program ends immediately. If the ingest program is running as a background task (UNIX only), the user may also issue the kill command from the operating system specifying the process identifier of the ingest program.
OPERATIONS NOTE: The ingest program may be listed in the "/etc/rc" (Unix startup script) or "autoexec.bat" (for MS-Windows) so ingest will be started whenever the system is first booted up or powered on. Since no environment variables are set upon system initialization, program resources must be given explicitly, either by specifying the resource file with "-df=/home/wxp/etc" or by specifying the data_path and file_path parameters.
For further information about WXP, email devo@ks.unisys.com
Last updated by Dan Vietor on July 21, 1998