wmoingest
NAME
wmoingest - The data ingest and selection program
SYNOPSIS
wmoingest [parameters...] filename
PARAMETERS
Command Line | Resource | Default | Description |
---|---|---|---|
-h | help | No | Lists basic help information. |
-df=filename | default | wxp.cfg | Sets the name of the resource file. |
-na=name | name | the program name | Specifies the name used in resource file parsing. |
-fp=path | file_path | the current directory | Specifies location of database files. |
-dp=path | data_path | the current directory | Specifies the location of the input raw data files. |
-cp=path | con_path | the current directory | Specifies location of decoded/converted data files. |
-gp=path | grid_path | the current directory | Specifies location of gridded output files from WXP. |
-rp=path | raw_path | the current directory | Specifies location of output raw files generated by WXP. |
-ip=path | image_path | the current directory | Specifies location of output image files (.gif/.png) from WXP. |
-tp=path | text_path | the current directory | Specifies location of raw ingest text data. |
-mp=path | model_path | the current directory | Specifies location of raw model GRIB data. |
-sp=path | sat_path | the current directory | Specifies location of satellite images. |
-nc=name_conv | name_conv | name.cnv or name_conv file | This sets which name convention file to use. |
-if=in_file | in_file | program specific | Specifies the input file name tag. |
-of=out_file | out_file | program specific | Specifies the output file name tag. |
-pf=filename | prod_file | ingest.prd | Specifies the name of the product file. |
-lf=filename | log_file | program dependent | Specifies the name of the log file. |
-pa=param[,param...] | parameter | none | Specifies additional parameters to the program. |
filename (positional) | filename | none | Specifies the input filename. |
DESCRIPTION
The wmoingest program reads data formatted according to the WMO communications standard. Each data product consists of a header, data and a trailer. The format of the header and associated product is as follows:
[SOH][CR][CR][LF] seq[CR][CR][LF] header[CR][CR][LF] data.......... [CR][CR][LF][ETX]
Where:
- seq -- A sequence number that is incremented by one for each successive bulletin.
- header -- The WMO header information describing the type, origin and observation time of the data.
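The framing above can be illustrated with a short Python sketch that splits a raw byte stream into (sequence, header, data) tuples. The function name split_bulletins is hypothetical and is not part of WXP. It only scans for SOH and ETX bytes, so it assumes text products; a binary product such as GRIB could contain an ETX byte inside its data and would need length-aware parsing.

```python
SOH = b"\x01"          # start of heading
ETX = b"\x03"          # end of text
CRCRLF = b"\r\r\n"     # line terminator used inside a WMO bulletin


def split_bulletins(stream: bytes):
    """Yield (seq, header, data) for each complete SOH...ETX bulletin.

    Hypothetical helper: assumes text products, where ETX never appears
    inside the data.
    """
    pos = 0
    while True:
        start = stream.find(SOH, pos)
        if start < 0:
            return
        end = stream.find(ETX, start)
        if end < 0:
            return  # incomplete bulletin at the end of the buffer
        # Trim the CR/LF bytes that follow SOH and precede ETX.
        body = stream[start + 1:end].strip(b"\r\n")
        parts = body.split(CRCRLF, 2)  # seq, header, rest of the data
        seq = parts[0].decode("ascii", "replace").strip()
        header = parts[1].decode("ascii", "replace").strip() if len(parts) > 1 else ""
        data = parts[2] if len(parts) > 2 else b""
        yield seq, header, data
        pos = end + 1
```

An incomplete bulletin at the end of the buffer is left unconsumed, which mirrors the way a feed reader would wait for more data before handling it.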
Considering that several gigabytes of data are broadcast on these feeds each day, the wmoingest program must offer a means to select products (or discard unneeded ones) and file them in a fashion that makes it easier for programs to search for appropriate data.
The wmoingest program can receive data from four sources:
- file - a file of raw (unprocessed) ingested data from a source like NOAAPORT.
- sock:port - a TCP socket that allows connection from remote clients. The WXP ingestor acts as the socket server and binds itself to the socket. The client must connect and send data in the WMO format over the socket.
The wmoingest program uses a pattern matching scheme to select products. Each pattern has an associated action that is to be performed on the matched product. These actions include:
- write - write the product to a file. If the ingestor matches a new product, the new product will overwrite the contents of the file.
- append - append each new matched product to the end of the file. This is the most common action, as it is easier to manage a single file with a few hundred products than hundreds of small files.
- pipe - pipe the contents of the product to the standard input of a specified program. With the pipe action, further processing of the data can be done before writing the output to file.
- run - run a specified program once the matched product is received. This can be used to flag a user when a severe weather statement is received.
Product File
The wmoingest program uses a product file to specify which products are selected from the data feed and which actions are performed on them. This is the same as the bulletin file in WXP 5. The product filename is specified with the resrc:prod_file resource. The product file contains a list of headers, actions and commands to be performed:
header [action] [command/filename...] [header file]
header [action] [command/filename...] [header file]
...
The header can specify the exact header or a regular expression. Regular expression characters are:
| Character | Meaning |
|---|---|
| . or ? | match a single character |
| - or * | match any sequence of characters |
| [letters] | match a character from the set |
| [^letters] | match any character except those from the set |
| (str1\|str2...) | match any of the listed strings |
| _ | underscore matches a space |
| /data | match secondary information (e.g. the AWIPS header) |
Some example header patterns are:
Pattern | Selects |
---|---|
AB | Anything that starts with AB |
S[AP] | Anything that starts with SA or SP |
(W\|AC\|RG) | Anything that starts with W, AC or RG |
F[^O] | Anything that starts with F, second character NOT O |
FQUS1_KIND | Full header specification with spaces as underscores |
*_KIND | Wildcard match on any product that ends with KIND |
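To make the pattern rules concrete, here is a minimal Python sketch that translates a WXP header pattern into a regular expression. The helper names are hypothetical, and the translation is inferred from the tables above: patterns are treated as prefix matches (so "AB" selects anything starting with AB), and the /data secondary-information suffix is not handled.

```python
import re


def wxp_pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a WXP header pattern into an equivalent Python regex.

    Hypothetical helper: covers the pattern characters listed above except
    the /data secondary-information suffix.
    """
    out = []
    i = 0
    while i < len(pattern):
        c = pattern[i]
        if c in ".?":
            out.append(".")            # any single character
        elif c in "-*":
            out.append(".*")           # any run of characters
        elif c == "_":
            out.append(" ")            # underscore stands for a space
        elif c == "[":
            j = pattern.index("]", i)  # copy [set] or [^set] verbatim
            out.append(pattern[i:j + 1])
            i = j
        elif c in "(|)":
            out.append(c)              # alternation such as (W|AC|RG)
        else:
            out.append(re.escape(c))   # letters and digits match literally
        i += 1
    return re.compile("".join(out))


def header_matches(pattern: str, header: str) -> bool:
    # re.match anchors only at the start, giving the prefix behavior of "AB".
    return wxp_pattern_to_regex(pattern).match(header) is not None
```

For example, header_matches("F[^O]", "FPUS53 KIND 121432") is true, while the same pattern rejects an FOUS header.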
When the product is GRIB, the header is parsed for specific product parameters. This information can then be used to select the product. The syntax for this selection is:
/[Xvvv][Xvvv][Xvvv]...
Where X is one of the following parameter codes and vvv is the value of the parameter:
- S - grid source
- M - model number
- G - grid number
- T - forecast time
- L - level type
- H - level value
- V - variable number
The values for each parameter are listed in the WXP Product Description Appendix.
Using the internal GRIB parameters is more reliable than selecting by the WMO header because more than one product may have the same header:
HVAC99 KWBC 070000 from Sea Wave model
HVAC99 KWBC 070000 from Aviation model
To separate the two products, use the model specifications: /M77 for the Aviation model and /M10 for the Sea Wave model.
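A GRIB selector is simply a list of required parameter values. The Python sketch below illustrates how such a selector could be checked against decoded GRIB parameters. The helper names are hypothetical; the info-string format ("GRIB1 S74 M45 G39 ...") is taken from the ingest screen output shown under Program Output, and all values are assumed to be plain integers.

```python
import re

GRIB_KEYS = "SMGTLHV"  # source, model, grid, time, level type, level value, variable


def parse_grib_info(info: str) -> dict:
    """Parse decoded GRIB info such as 'GRIB1 S74 M45 G39 T60 L100 H200 V11'."""
    fields = {}
    for token in info.split()[1:]:          # skip the GRIB1/GRIB2 edition tag
        key, value = token[:1], token[1:]
        if key in GRIB_KEYS and value.isdigit():
            fields[key] = int(value)
    return fields


def grib_selector_matches(selector: str, fields: dict) -> bool:
    """True when every Xvvv term in a selector like '/M77G211' matches."""
    terms = re.findall(r"([SMGTLHV])(\d+)", selector)
    return bool(terms) and all(fields.get(k) == int(v) for k, v in terms)
```

With this approach, the two HVAC99 KWBC products above would be separated by their M values, even though their WMO headers are identical.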
Actions
The actions are:
| Action | Description |
|---|---|
| >> | append to file with header |
| append | same as above |
| > | write to file with header, previous contents overwritten |
| write | same as above |
| # | write to file without header, previous contents overwritten |
| file | same as above |
| \| | pipe product to listed command |
| pipe | same as above |
| @ | run command when the product is complete |
| run | same as above |
The action can also be prefixed with one or more flags:
- R - save the product as a raw file without stripping control characters. The full WMO CCB header remains intact, including control characters.
- B - treat the product as binary and do not strip unprintable characters. A short header is used.
- P - send a PAN message when the product is complete.
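The following Python sketch shows one way an action token from a product-file line could be decomposed into its flags and action. It is an illustration under stated assumptions, not WXP's parser: the helper name is hypothetical, digits after the P flag are treated as PAN destination IDs (as described under PAN Message Setup), and any trailing text such as the "-15" in ">>-15" from the sample product file is passed through uninterpreted, since this section does not define it.

```python
import re

ACTIONS = {">>": "append", "append": "append",
           ">": "write", "write": "write",
           "#": "file", "file": "file",
           "|": "pipe", "pipe": "pipe",
           "@": "run", "run": "run"}

# Optional upper-case flags (and PAN destination digits), then the action
# symbol or word (">>" is tried before ">"), then any trailing text.
TOKEN_RE = re.compile(r"^([RBP\d]*)(>>|>|#|\||@|append|write|file|pipe|run)(.*)$")


def parse_action(token: str) -> dict:
    """Split an action token such as 'P035>>' into its parts (hypothetical)."""
    m = TOKEN_RE.match(token)
    if m is None:
        raise ValueError(f"unrecognized action: {token!r}")
    flags, symbol, suffix = m.groups()
    return {
        "action": ACTIONS[symbol],
        "raw": "R" in flags,       # keep control characters and full CCB header
        "binary": "B" in flags,    # keep unprintable characters, short header
        "pan": "P" in flags,       # send a PAN when the product completes
        "pan_ids": [int(d) for d in flags if d.isdigit()],  # empty = all
        "suffix": suffix,          # e.g. "-15"; not interpreted here
    }
```

Because the flags are upper case and the action words are lower case, a token such as "run" is never confused with the R flag.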
Command or Filename
The command is generally the name of the file in which to save the product, or the command to run with the pipe or run actions. The command can contain several wildcard sequences. The examples below are based on a system time of 1455Z, Aug 12, 2011 and the product header FPUS5 KIND 121432:
Wildcard | Explanation | Example |
---|---|---|
%Y | current system year | 2011 |
%y | current system year (last 2 digits) | 11 |
%m | current system month | 08 |
%d | current system day | 12 |
%j | current system Julian day | 224 |
%h | current system hour | 14 |
%n | current system minute | 55 |
%pd | product day | 12 |
%ph | product hour | 14 |
%pn | product minute | 32 |
%T | product type | FPUS5 |
%t | product type (lower case) | fpus5 |
%L | product locale | KIND |
%l | product locale (lower case) | kind |
%D | data_path resource | |
%C | con_path resource | |
%R | raw_path resource | |
%G | grid_path resource | |
%T | text_path resource | |
%M | model_path resource | |
%S | sat_path resource | |
Some of the above wildcards can be preceded by a number. For dates and times, the number rounds the value down to the nearest multiple of that number. For example, "%6h" rounds the hour down to the nearest 6-hour boundary; for the example above, it results in the value 12.
For the product type and locale, the number specifies a substring. The first digit is the starting position in the string (counting from 1) and the second digit is the number of characters to use. For example, "%12T" results in "FP". To get "IND", use "%23L".
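The expansion and the two kinds of numeric modifier can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name and the paths dictionary are hypothetical, the system time is passed in explicitly, and because the table lists %T for both the product type and text_path, the sketch treats %T as the product type only.

```python
import re
from datetime import datetime

# %[digits]key, where key is one of the wildcards in the table above.
TOKEN = re.compile(r"%(\d*)(p[dhn]|[YymdjhnTtLlDCRGMS])")


def expand_name(template: str, now: datetime, header: str, paths: dict) -> str:
    """Expand WXP-style name-convention wildcards (hypothetical helper).

    now    -- system time (UTC)
    header -- WMO header such as "FPUS5 KIND 121432"
    paths  -- resource directories keyed by letter, e.g. {"D": data_path}
    """
    ptype, locale, ptime = header.split()[:3]
    numbers = {  # key -> (value, zero-padded width)
        "Y": (now.year, 4), "y": (now.year % 100, 2), "m": (now.month, 2),
        "d": (now.day, 2), "j": (now.timetuple().tm_yday, 3),
        "h": (now.hour, 2), "n": (now.minute, 2),
        "pd": (int(ptime[0:2]), 2), "ph": (int(ptime[2:4]), 2),
        "pn": (int(ptime[4:6]), 2),
    }
    strings = {"T": ptype, "t": ptype.lower(), "L": locale, "l": locale.lower()}

    def replace(m: "re.Match[str]") -> str:
        num, key = m.group(1), m.group(2)
        if key in numbers:
            value, width = numbers[key]
            if num and int(num) > 0:
                value -= value % int(num)   # round down to a multiple of num
            return str(value).zfill(width)
        if key in strings:
            text = strings[key]
            if len(num) == 2:               # 1-based start, then length
                start, count = int(num[0]), int(num[1])
                text = text[start - 1:start - 1 + count]
            return text
        return paths.get(key, m.group(0))  # leave unknown paths untouched

    return TOKEN.sub(replace, template)
```

With the example time and header, "%D/%y%m%d%6h_for.wmo" expands to a name ending in "11081212_for.wmo", and "%12T" and "%23L" give "FP" and "IND".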
Header Files
To aid in parsing products from a file containing multiple products, the ingest program can create a header file. This file lists the WMO header of each product in the product file along with its byte offset into the file. Since most parsing is based on the header, it is far easier to search the smaller header file than to parse through the much larger product file.
To have the ingestor produce these files automatically, add a second file name convention to the end of the product's line in the bulletin file:
F[^O] >> %D/%y%m%d%6h_for.wmo %D/%y%m%d%6h_for.hdr
The first name convention listed, "%D/%y%m%d%6h_for.wmo", is the filename where the actual product is saved. The second name convention, "%D/%y%m%d%6h_for.hdr", is where the header file information is saved. The syntax of the file is as follows:
offset header / extra
offset header / extra
....
where:
- offset - the byte offset of the product in the file
- header - the complete product header, listed after the offset
- extra - extra information about the product, normally the AWIPS header
A sample from a forecast data header file:
4297 WGUS83 KLSX 091700 / FLSLSX
5636 WUUS54 KLZK 091701 / SVRLZK
7084 WWUS86 KSEW 091702 / SABSAS
9365 WWCN02 CYTR 091701 / WEATHER WARNING NUMBER 018 UPDATED FOR PETAWAWA BY THE MSC WEATHERI
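Using a header file typically amounts to finding the offsets of the wanted headers and seeking directly to them. The Python sketch below shows this under one assumption that is not stated above: entries appear in file order, so each product ends where the next entry begins. The last product is read to the end of the file. All helper names are hypothetical.

```python
def parse_header_file(text: str):
    """Parse header-file lines of the form 'offset header / extra'."""
    entries = []
    for line in text.splitlines():
        offset, _, rest = line.strip().partition(" ")
        if offset.isdigit():
            header, _, extra = rest.partition(" / ")
            entries.append((int(offset), header.strip(), extra.strip()))
    return entries


def product_spans(entries, prefix: str):
    """Return (header, extra, start, end) for headers beginning with prefix.

    Assumes entries are in file order, so each product ends where the next
    one begins; end is None for the last product (read to end of file).
    """
    spans = []
    for i, (offset, header, extra) in enumerate(entries):
        if header.startswith(prefix):
            end = entries[i + 1][0] if i + 1 < len(entries) else None
            spans.append((header, extra, offset, end))
    return spans


def read_span(f, start: int, end):
    """Seek into an open binary product file and read one product."""
    f.seek(start)
    return f.read() if end is None else f.read(end - start)
```

In use, the .hdr file is read as text and parsed, and the matching spans are then read from the .wmo file opened in binary mode. Only the selected products are read from the large file.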
For more information on header files, see the section on header files.
An example product file:
# Pattern Action Filename Header Filename
#
S[AP] >>-15 %D/%y%m%d%h_sao.wmo
S[IMNS] >>-05 %D/%y%m%d%h_syn.wmo
SD >>+07 %D/%y%m%d%h_rad.wmo
U[^AB] >>-65 %D/%y%m%d%12h_upa.wmo
ASUS1_ >> %D/%y%m%d%3h_frt.wmo
WWUS40 >> %D/%y%m%d%6h_wws.wmo
FO >> %D/%y%m%d%12h_mod.wmo %D/%y%m%d%12h_mod.hdr
A >> %T/%y%m%d%6h_sum.wmo %T/%y%m%d%6h_sum.hdr
C >> %D/%y%m%d%6h_cli.wmo %D/%y%m%d%6h_cli.hdr
W >> %T/%y%m%d%6h_sev.wmo %T/%y%m%d%6h_sev.hdr
#
# Specific forecast products
#
FXUS01 > %T/fore/48hr
FXUS02 > %T/fore/3-5d_Hem
FPUS53_KIND | wmoparse - -ph=FPUS53_KIND -id=%%INZ029 -pa=dollar -of=%T/fore/laf_zone -me=none
*_KIND >> %T/Indy/%m%d.dat
#
# Model products
#
Y/M84 >> %M/%y%m%d%6h_nam.grb %M/%y%m%d%6h_nam.hdr
Y/M77G211 >> %M/%y%m%d%6h_gfus.grb %M/%y%m%d%6h_gfus.hdr
Y/M81G211 >> %M/%y%m%d%6h_gfus.grb %M/%y%m%d%6h_gfus.hdr
Program Output
By default, the wmoingest program reformats the products, removing the control character sequences and formatting the header and product as follows:
** header ***
product
** header ***
....
This allows the ingestor to reparse data previously ingested by the WXP ingestor in order to increase the granularity of data files. For example, you may want to take the forecast files from the initial ingest and extract only the products from KIND.
When the ingest program is running, it will display a list of the products being ingested from the input files/data feed. The selected product's header will be preceded by "**" and the discarded products will be preceded by "--". The action and the output file will also be displayed:
** 144 SRUS35 KWOH 091703 / RRSBTV *** 07:41:34 Append to: /home/wxp/testdata/wmo/data/06030917_rvr.wmo
** 145 SXUS26 KWOH 091703 / RRSLMK *** 07:41:34 Append to: /home/wxp/testdata/wmo/text/06030917_sfc.wmo
** 146 FTHO31 MHTG 091643Z / TAF MHTG 091625Z 091818 07012KT 9999 FEW *** 07:41:34 Append to: /home/wxp/testdata/wmo/data/06030917_term.wmo
** 147 SXUS55 KWOH 091703 / RRSBOU *** 07:41:34 Append to: /home/wxp/testdata/wmo/text/06030917_sfc.wmo
** 148 HTNJ20 EGRR 091200 / GRIB1 S74 M45 G42 T60 L100 H200 V11 *** 07:41:34 Append to: /home/wxp/testdata/wmo/model/06030912_egr.grb
** 149 HTJJ25 EGRR 091200 / GRIB1 S74 M45 G38 T60 L100 H250 V11 *** 07:41:34 Append to: /home/wxp/testdata/wmo/model/06030912_egr.grb
** 150 HTOJ20 EGRR 091200 / GRIB1 S74 M45 G43 T60 L100 H200 V11 *** 07:41:34 Append to: /home/wxp/testdata/wmo/model/06030912_egr.grb
** 151 HTKJ20 EGRR 091200 / GRIB1 S74 M45 G39 T60 L100 H200 V11 *** 07:41:34 Append to: /home/wxp/testdata/wmo/model/06030912_egr.grb
** 152 HTPJ20 EGRR 091200 / GRIB1 S74 M45 G44 T60 L100 H200 V11 *** 07:41:34 Append to: /home/wxp/testdata/wmo/model/06030912_egr.grb
** 153 HTIJ20 EGRR 091200 / GRIB1 S74 M45 G37 T60 L100 H200 V11 *** 07:41:34 Append to: /home/wxp/testdata/wmo/model/06030912_egr.grb
** 154 HTJJ20 EGRR 091200 / GRIB1 S74 M45 G38 T60 L100 H200 V11 *** 07:41:34 Append to: /home/wxp/testdata/wmo/model/06030912_egr.grb
** 155 HTLJ20 EGRR 091200 / GRIB1 S74 M45 G40 T60 L100 H200 V11 *** 07:41:34 Append to: /home/wxp/testdata/wmo/model/06030912_egr.grb
** 156 SDUS34 KLCH 091659 / NVWPOE *** 07:41:34 Write to: /home/wxp/testdata/wmo/nids/POE/0603091659_nvw.nid
** 157 SDUS35 KSLC 091700 / N3SMTX *** 07:41:34 Write to: /home/wxp/testdata/wmo/nids/MTX/0603091700_n3s.nid
** 158 SDUS26 KSTO 091702 / N1RDAX *** 07:41:34 Write to: /home/wxp/testdata/wmo/nids/DAX/0603091702_n1r.nid
If the product contains GRIB data, the GRIB header is decoded to give further information about the product. As the example above shows, the output lists the type of GRIB product (GRIB1 or GRIB2) followed by grid-specific data decoded from the product.
Output Files
The ingest program reformats the products when it saves them to file. First, it strips most of the control characters from the product so that text editors and word processors can read and process the data. In place of the control characters, the ingest program delimits headers with asterisks "**":
** header ***
product
** header ***
....
A sample output file is:
** WWUS54 KJAN 091659 ***
SVSJAN
SEVERE WEATHER STATEMENT
NATIONAL WEATHER SERVICE JACKSON MS
1059 AM CST THU MAR 9 2006
ARC003-LAC067-091745-
/O.CON.KJAN.SV.W.0046.000000T0000Z-060309T1745Z/
ASHLEY AR-MOREHOUSE LA-
1059 AM CST THU MAR 9 2006
...SEVERE THUNDERSTORM WARNING CONTINUES UNTIL 1145 AM CST FOR EASTERN ASHLEY COUNTY...AND MOREHOUSE PARISH...
AT 1054 AM CST...NATIONAL WEATHER SERVICE DOPPLER RADAR CONTINUED TO INDICATE A LINE OF SEVERE THUNDERSTORMS CAPABLE OF PRODUCING PENNY SIZE HAIL...AND DESTRUCTIVE WINDS IN EXCESS OF 70 MPH. THESE STORMS WERE LOCATED ALONG A LINE EXTENDING FROM 7 MILES SOUTH OF HAMBURG TO 6 MILES WEST OF BONITA...MOVING NORTHEAST AT 60 MPH.
...
** WWUS54 KSHV 091700 ***
SVSSHV
SEVERE WEATHER STATEMENT
NATIONAL WEATHER SERVICE SHREVEPORT LA
1059 AM CST THU MAR 9 2006
LAC073-091715-
/O.CON.KSHV.SV.W.0043.000000T0000Z-060309T1715Z/
...
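Because each header sits on its own "** header ***" line, a reformatted file is easy to split back into products, which is what makes the reparsing described under Program Output practical. The following minimal Python sketch assumes that layout. The function name is hypothetical.

```python
import re

# A header line looks like "** WWUS54 KJAN 091659 ***".
HEADER_LINE = re.compile(r"^\*\* (.+?) \*\*\*[ \t]*$", re.MULTILINE)


def split_output_file(text: str):
    """Split a reformatted ingest file into (header, body) pairs."""
    matches = list(HEADER_LINE.finditer(text))
    products = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        products.append((m.group(1), text[m.end():end].strip("\n")))
    return products
```

Selecting products from a single office is then a filter on the second word of each header, for example keeping only headers whose locale is KJAN.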
PAN (Product Arrival Notice) Messages
Product arrival notices are sent at the completion of a product to a specified PAN receiving program. The PAN receiver uses this message to trigger an action based on the arrival of that product. For example, a PAN receiver might watch for severe thunderstorm warning messages so that it can warn the user. The PAN message is sent over a socket using UDP. This is a connectionless process: the PAN is sent to a specific address and port, and it is up to the PAN receiver to be active and waiting for the message with a recvfrom() call.
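On the receiving side, this amounts to binding a UDP socket and waiting in recvfrom(). Below is a minimal Python sketch using the standard socket module. The helper names are hypothetical, and the port is whatever is listed for that host on the @PAN line (5566 for datasvr in the setup example).

```python
import socket


def open_pan_socket(port: int, host: str = "0.0.0.0") -> socket.socket:
    """Bind a UDP socket on which PAN messages will arrive."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def read_pan(sock: socket.socket, timeout: float = 5.0) -> str:
    """Wait (up to timeout seconds) for one PAN datagram and return its text."""
    sock.settimeout(timeout)
    data, _sender = sock.recvfrom(8192)   # one datagram carries one PAN line
    return data.decode("ascii", "replace").strip()
```

Since UDP does not queue messages for a receiver that is not running, any PAN sent while the receiver is down is simply lost. This is the reason the receiver must already be active and waiting.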
The PAN message is sent as a single line of information for each product received by the ingestor. The information in the PAN message is broken up into fields delimited by a bar "|":
ID|Server|###|YYYYMMDDhhmmss|WMO/Extra|Filename|Offset|Size
Fields:
- ID - Message Type ID (901 for NOAAPORT)
- Server - NOAAPORT server number which uniquely identifies server (0-99)
- ### - Sequence number from server (0-999). Increments by one for each PAN sent from that server. It cycles through numbers 0 to 999 and back to 0.
- YYYYMMDDhhmmss - Timestamp of when product is sent from WXP ingestor.
- WMO/Extra - WMO Header plus additional information. For text products, this is the first 20 bytes of the product (newlines and unprintables changed to spaces). This will often contain the AWIPS header. For GRIB products, this is the decoded header information from the GRIB Product Definition Block (see data listing above for syntax).
- Filename - Filename including full path. This is the filename that the WXP ingestor saved the product to. NOTE: This filename and path may be different from the filename and path you need to access the data. If the data is mounted on an NFS drive, the appropriate NFS path will need to be substituted for the path listed here.
- Offset - Byte offset of product header in file. This is the exact location (first byte in file is 0) of the start of the product header. An fseek using this number is all that is needed to locate the product.
- Size - Size in bytes of the product from the header to the end of the product, including any leading or trailing blank lines.
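The field layout above maps directly onto a small record type. The Python sketch below parses and formats PAN lines and advances the sequence number. The class and function names are hypothetical. It assumes that no field, including the WMO/Extra text, contains a bar character, and it accepts the trailing bar that appears in the examples below.

```python
from dataclasses import dataclass
from datetime import datetime

STAMP = "%Y%m%d%H%M%S"


@dataclass
class Pan:
    msg_id: int      # 901 for NOAAPORT
    server: int      # server number, 0-99
    seq: int         # sequence number, 0-999
    sent: datetime   # time the PAN was sent
    header: str      # WMO header plus extra information
    filename: str    # full path of the file holding the product
    offset: int      # byte offset of the product header
    size: int        # product size in bytes


def parse_pan(line: str) -> Pan:
    fields = line.strip().split("|")
    if fields[-1] == "":
        fields.pop()                 # the sample messages end with a bar
    if len(fields) != 8:
        raise ValueError(f"expected 8 fields, got {len(fields)}")
    msg_id, server, seq, stamp, header, filename, offset, size = fields
    return Pan(int(msg_id), int(server), int(seq),
               datetime.strptime(stamp, STAMP), header.strip(),
               filename, int(offset), int(size))


def format_pan(p: Pan) -> str:
    return "|".join([str(p.msg_id), str(p.server), str(p.seq),
                     p.sent.strftime(STAMP), p.header, p.filename,
                     str(p.offset), str(p.size)])


def next_seq(seq: int) -> int:
    return (seq + 1) % 1000          # cycles 0..999 and back to 0
```

A receiver can pass offset and size straight to a seek and a read on the named file, after substituting its own NFS path if necessary.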
Examples:
901|46|240|20110825081954|FTHO31 MHTG 091643Z / TAF MHTG ... |/home/wxp/data/wmo/data/06030917_term.wmo|18151|493|
- 901 - identifies WXP PAN message
- 46 - identifies the local server
- 240 - is the sequence number
- 20110825081954 - Date product arrived on server and PAN message sent (depends on server time). It arrived at 08:19:54Z on 25 AUG 2011.
- FTHO31 MHTG 091643Z - WMO header
- TAF ... - AWIPS header or first line of data
- /home/wxp/data/wmo/data/06030917_term.wmo - server filename where the product is located. Each file can contain more than one product.
- 18151 - byte offset in file
- 493 - product size in bytes
901|46|245|20110825081954|HTKJ20 EGRR 091200 / GRIB1 S74 M45 G39 T60 L100 H200 V11|/home/wxp/data/wmo/model/06030912_egr.grb|2868632|3709|
- 901 - identifies WXP PAN message
- 46 - identifies the local server
- 245 - is the sequence number
- 20110825081954 - Date product arrived on server and PAN message sent
- HTKJ20 EGRR 091200 / GRIB1 S74 M45 G39 T60 L100 H200 V11 - WMO header plus GRIB information. The GRIB information shows the product is a UKMET model grid on grid type 39: a 60-hour forecast of 200 mb temperature.
- /home/wxp/data/wmo/model/06030912_egr.grb - filename
- 2868632 - byte offset in file
- 3709 - product size in bytes
PAN Message Setup
To set up the WXP ingestor for PAN messages, a PAN configuration line must be added somewhere in the "ingest.prd" file:
# PAN Setup
@PAN id=45 sock:datasvr:5566 sock:devsvr:5000 pan.log
The "@PAN" keyword marks the PAN configuration line in the bulletin file. The "id=45" setting specifies the unique NOAAPORT server ID, which is sent as field 2 of the PAN message. The rest of the line lists destinations. The "sock" keyword specifies that the PAN is sent over a UDP socket, and the string "datasvr:5566" gives the hostname of the destination computer and the port number. If the sock keyword is omitted, the PAN is saved to the listed filename, such as "pan.log". Up to 40 destinations can be listed. The first 10 destinations are addressed 0 through 9 in the order they are listed on the PAN line. Additional PAN destinations can be specified on following lines.
By default, no PAN messages are sent, even if the PAN line is added to the bulletin file. To enable PAN messages, the "P" flag must be added to the action of each product saved on the server. For example, a product line might look like:
# Pattern Action Filename Header Filename
FT >> %D/%y%m%d%h_term.wmo %D/%y%m%d%h_term.hdr
To enable this product type for PAN messages, add the "P" flag to the action.
FT P>> %D/%y%m%d%h_term.wmo %D/%y%m%d%h_term.hdr
This sends a PAN message to all listed destinations whenever this product is received. If you don't want to send a PAN to all destinations, list the destination IDs after the flag:
FT P035>> %D/%y%m%d%h_term.wmo %D/%y%m%d%h_term.hdr
In this case, PAN messages are sent only to destinations 0, 3 and 5.
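Together, the @PAN line and the destination digits determine where each PAN goes. The Python sketch below illustrates this under stated assumptions: the helper names are hypothetical, only a single @PAN line is handled (continuation lines are ignored), and the IDs are the digits that follow the P flag, so "P035>>" yields [0, 3, 5], while a bare "P" yields an empty list, meaning all destinations.

```python
def parse_pan_line(line: str):
    """Split '@PAN id=45 sock:host:port ... file' into (server_id, destinations).

    Hypothetical helper: destinations keep their order, so list index i is
    PAN destination ID i. Continuation lines are not handled here.
    """
    tokens = line.split()
    if not tokens or tokens[0] != "@PAN":
        raise ValueError("not a PAN configuration line")
    server_id, dests = None, []
    for tok in tokens[1:]:
        if tok.startswith("id="):
            server_id = int(tok[3:])
        elif tok.startswith("sock:"):
            host, _, port = tok[len("sock:"):].rpartition(":")
            dests.append(("udp", host, int(port)))
        else:
            dests.append(("file", tok, None))   # no sock keyword: save to file
    return server_id, dests


def select_destinations(dests, ids):
    """A bare P flag (no IDs) means every destination; otherwise pick by ID."""
    if not ids:
        return list(dests)
    return [dests[i] for i in ids if i < len(dests)]
```

IDs that refer to destinations not listed on the PAN line are skipped rather than treated as errors; a stricter implementation might report them instead.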
Log Files
The wmoingest program logs relevant information to a log file. By default, this file is named "ingest.log" and is placed in the file_path directory. The program logs when ingest starts and stops, and lists all unselected products along with any other errors and warnings.
11 AUG 25 07:36:53 : Starting ingest (ver: 6.65-LINUX-X11)
11 AUG 25 07:36:53 : Reading products file: /home/wxp/etc/ingest.prd
11 AUG 25 07:36:53 : Ingesting file: ../../testdata/wmo/0603091700.dmp
11 AUG 25 07:36:55 : Terminating ingest
The log file name can contain name convention wildcard characters such as "/home/wxp/logs/noaa-%m%d.log" where the %m and %d are replaced with the month and day so that log files are generated for each day the ingestor is running.
Terminating Ingest
Ingest may be stopped either by pressing Control-C on the keyboard or by sending the process a kill command. When this occurs, the ingestor waits until the current product has been fully received before terminating.
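The deferred shutdown can be sketched in Python by recording the signal and checking for it only between products. This is an illustration, not WXP's implementation. The class and function names are hypothetical. Control-C delivers SIGINT and a plain kill delivers SIGTERM (SIGKILL cannot be caught). Python only allows signal handlers to be installed from the main thread.

```python
import signal


class StopRequest:
    """Record SIGINT (Control-C) and SIGTERM instead of exiting at once."""

    def __init__(self):
        self.requested = False
        signal.signal(signal.SIGINT, self.handle)
        signal.signal(signal.SIGTERM, self.handle)

    def handle(self, signum, frame):
        self.requested = True


def run_ingest(products, process, stop: StopRequest) -> int:
    """Process products one at a time, checking for a stop only between them."""
    done = 0
    for product in products:
        process(product)      # a product, once started, is always finished
        done += 1
        if stop.requested:
            break
    return done
```

Checking the flag only after a product is complete guarantees that no partial product is written to an output file.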
EXAMPLES
FILES
- ingest.prd - Product file which specifies which products are to be saved. This can be specified with the resrc:prod_file resource.
- ingest.log - Log file records important information from the ingest process.
SEE ALSO
Last updated October 2013