PARSE
NAME
parse - Text data parsing program
SYNOPSIS
parse [parameters...] filename
PARAMETERS
Command Line | Resource | Default | Description |
-h | help | No | Lists basic help information. |
-df=filename | default | .wxpdef | Sets the name of the resource file. |
-na=name | name | parse | Specifies the name used in resource file parsing. |
-ba | batch | No | Runs the program in batch mode. |
-me=level | message | out2 | Specifies the level of messages to be displayed. |
-fp=filepath | file_path | current directory | Specifies location of database files. |
-dp=datapath | data_path | current directory | Specifies the location (path) of the input raw data files. This may be modified in the name convention file. |
-nc=name_conv | name_conv | name_conv | The name convention file specifies how files are named in WXP. This sets which name convention file to use. |
-if=in_file | in_file | raw_dat | Specifies the input file name tag. The default tag is raw_dat but will need to be modified for most applications. This can be determined from the product header if possible. Otherwise, it must be explicitly specified. |
-cu=[hour|la] | current | None | Specifies that current data files be used. The current file name is based on the name convention. An optional hour can be specified to select older data. If la is specified, the program searches back to find the most recent available file. |
-ho=hour | hour | None | This resource specifies the exact hour that a data file is valid for. This locks in the start hour for a multi-file sequence. |
-nh=num_hour | num_hour | 0 | Specifies the number of hours to use. If this is not specified, a single hour is parsed; otherwise, a set of hours is parsed. |
-ph=product | product | User Prompt | This specifies the product header to search for. |
-id=identifier | identifier | None | Specifies the station to parse for. Printing starts at the line of the product that matches this identifier. If this is not specified, the whole product is displayed. |
-pa=param | parameter | None | Specifies additional output parameters. See the parameter resource for more details. Some possibilities are: blank, 3blank, dollar, equal, line[=lines], first, last, and cont. |
filename[#seq] | filename | User prompt (Batch: current=la) | The name of the input data file to be used. An optional sequence number can be added to designate the time for non-WXP files. |
DESCRIPTION
This program parses text data for a specific product and identifier. The input to the program is a raw ingested data file. The type of data file can be determined either from the product header or the in_file resource. When a product is specified, it is cross-referenced against the parse.lup file to determine a file name tag to use. A sample of this lookup file is:
W sev_dat
F for_dat
C cli_dat
...
If a product does not exactly match what is in the lookup file, a tag can be specified with the in_file resource.
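The cross-reference step can be pictured with a short Python sketch. It assumes that parse.lup holds whitespace-separated pairs of a header prefix and a file name tag, as in the sample above, and that the longest matching prefix wins. Both the layout and the matching rule are assumptions, and read_lookup and tag_for_product are hypothetical names, not part of WXP.

```python
def read_lookup(text):
    """Parse 'prefix tag' pairs from lookup-file text (assumed layout)."""
    tokens = text.split()
    # Pair tokens two at a time: prefix, tag, prefix, tag, ...
    return list(zip(tokens[0::2], tokens[1::2]))


def tag_for_product(product, table):
    """Return the tag whose prefix is the longest match for `product`, else None."""
    best_prefix, best_tag = "", None
    for prefix, tag in table:
        # Assumed rule: prefer the longest prefix that the product header starts with.
        if product.startswith(prefix) and len(prefix) > len(best_prefix):
            best_prefix, best_tag = prefix, tag
    return best_tag


table = read_lookup("W sev_dat\nF for_dat\nC cli_dat\n")
print(tag_for_product("FPUS1_KIND", table))  # for_dat
print(tag_for_product("/SFPIN", table))      # None
```

Under these assumptions, a header such as /SFPIN finds no entry, which is consistent with the AFOS PIL example in EXAMPLES passing -if=for_dat explicitly.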
The program starts by prompting the user for the input data file name. Alternatively, the input file can be specified on the command line or through the current resource. Which file is used depends on the file type, which is set either by the in_file resource or by the product header.
Next, the user enters a product header. The header can have wildcard characters to parse for multiple product types:
. or ? | match a single character |
- or * | match any number of characters |
[letters] | match a single character from the set. |
[^letters] | match a single character not in the set. |
(str1[|str2...]) | match any one of the listed strings |
_ | underscore matches a space. |
/secondline | match against the second line of the product |
Second line parsing is also possible. For many products, the second line of the product is the AWIPS header:
** FPUS1 KIND 022030 ***
SFPIN
which in this case is "SFPIN". To parse for this product, specify either "FPUS1_KIND" or "/SFPIN".
If "all" is specified, all bulletins are searched.
Once the product header has been specified, the file will be opened and all products matching the given header will be displayed in their entirety.
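One way to picture the wildcard rules is to translate a header pattern into a regular expression. The Python sketch below does that under stated assumptions: a pattern matches from the start of the header line, characters outside the listed wildcards match literally, parenthesized alternatives are not nested, and a leading / switches matching to the second line. wildcard_to_regex and header_matches are hypothetical helpers, not the WXP implementation.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a header pattern into a Python regular expression (sketch)."""
    out = []
    i = 0
    while i < len(pattern):
        c = pattern[i]
        if c in ".?":
            out.append(".")                      # exactly one character
        elif c in "-*":
            out.append(".*")                     # any run of characters
        elif c == "_":
            out.append(" ")                      # underscore stands for a space
        elif c == "[" and "]" in pattern[i + 1:]:
            j = pattern.index("]", i + 1)
            body = pattern[i + 1:j]
            negate = body.startswith("^")
            chars = body[1:] if negate else body
            if chars:
                out.append("[" + ("^" if negate else "") + re.escape(chars) + "]")
            else:
                out.append(re.escape(pattern[i:j + 1]))  # "[]" or "[^]": literal
            i = j
        elif c == "(" and ")" in pattern[i + 1:]:
            j = pattern.index(")", i + 1)
            alternatives = pattern[i + 1:j].split("|")
            # Each alternative may itself contain wildcards (no nesting assumed).
            out.append("(?:" + "|".join(wildcard_to_regex(a) for a in alternatives) + ")")
            i = j
        else:
            out.append(re.escape(c))             # everything else is literal
        i += 1
    return "".join(out)


def header_matches(pattern, first_line, second_line=""):
    """Check a product against a header pattern; 'all' matches everything."""
    if pattern == "all":
        return True
    if pattern.startswith("/"):                  # /secondline: test the second line
        pattern, target = pattern[1:], second_line
    else:
        target = first_line
    # re.match anchors only at the start, so "FPUS1_KIND" matches "FPUS1 KIND 022030".
    return re.match(wildcard_to_regex(pattern), target) is not None


print(header_matches("FPUS1_KIND", "FPUS1 KIND 022030"))          # True
print(header_matches("/SFPIN", "FPUS1 KIND 022030", "SFPIN"))     # True
```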
Selective Output
At times, only part of a product is wanted. By combining the identifier resource with various output parameters, specific subsets of products can be displayed. When a station identifier is specified, printing starts on a line that contains the identifier and continues to the end of the product unless otherwise specified. The identifier can be:
- string -- matches a string at the beginning of the line only
- +string -- matches if the string is contained anywhere within the line
- %zone -- matches if the zone is listed on a standard zone line (e.g. INZ029)
- zn:zone -- matches the zone
- ua:id -- matches an upper air ID
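The first three identifier forms can be sketched in Python as follows. The %zone check assumes that a standard zone line is an NWS UGC line such as the one in the zone forecast example, where a state and zone prefix is followed by zone numbers or > ranges, and the list ends with a six-digit expiration time. The zn: and ua: forms are not described in enough detail to sketch, so the hypothetical matches_identifier helper rejects them.

```python
import re

# One UGC token: optional state+Z/C prefix, a three-digit number, optional ">" range end.
UGC_TOKEN = re.compile(r"([A-Z]{2}[ZC])?(\d{3})(?:>(\d{3}))?")


def expand_ugc(line):
    """Return the set of zones named by a UGC zone line, or an empty set if it is not one."""
    zones = set()
    prefix = None
    for token in line.strip().split("-"):
        if not token:
            continue
        m = UGC_TOKEN.fullmatch(token)
        if m is None:
            if re.fullmatch(r"\d{6}", token):
                break                      # DDHHMM expiration time ends the zone list
            return set()                   # ordinary text, not a zone line
        new_prefix, first, last = m.groups()
        if new_prefix:
            prefix = new_prefix
        elif prefix is None:
            return set()                   # bare numbers with no state prefix
        for n in range(int(first), int(last or first) + 1):
            zones.add(f"{prefix}{n:03d}")
    return zones


def matches_identifier(ident, line):
    """Sketch of the string, +string, and %zone identifier forms."""
    if ident.startswith(("zn:", "ua:")):
        raise NotImplementedError("zn: and ua: forms are not covered by this sketch")
    if ident.startswith("+"):
        return ident[1:] in line              # anywhere in the line
    if ident.startswith("%"):
        return ident[1:] in expand_ugc(line)  # zone listed on a UGC line
    return line.startswith(ident)             # plain string: start of line only


zone_line = "INZ020>023-028>030-035-036-043-044-051-052-060-067-030930-"
print(matches_identifier("%INZ029", zone_line))  # True
print(matches_identifier("%INZ031", zone_line))  # False
```

With the zone forecast shown in EXAMPLES, %INZ029 matches because 029 falls in the range 028>030.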
Printing normally continues to the end of product. To terminate it earlier, use one of the parameters in the parameter resource:
- blank -- stop parsing at a blank line
- 3blank -- stop parsing after 3 blank lines
- dollar -- stop parsing at a dollar sign
- equal -- stop parsing on a trailing equals sign
- line[=lines] -- stop parsing after the given number of lines (default 1)
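A minimal sketch of how these stop rules might be applied to the lines of a product, starting at the line where the identifier matched. Several details are assumptions: the blank or dollar line that triggers the stop is not printed, 3blank counts consecutive blank lines, dollar means a line beginning with $ (as in the NWS $$ segment terminator), and a line ending in = is printed before stopping. select_lines is a hypothetical name.

```python
def select_lines(lines, params):
    """Collect lines from the identifier match onward until a stop rule fires (sketch)."""
    max_lines = None
    for p in params:
        if p == "line":
            max_lines = 1                      # documented default
        elif p.startswith("line="):
            max_lines = int(p.split("=", 1)[1])
    out = []
    blank_run = 0                              # consecutive blank lines seen so far
    for line in lines:
        text = line.rstrip()
        blank_run = blank_run + 1 if not text else 0
        if "blank" in params and not text:
            break                              # first blank line ends output
        if "3blank" in params and blank_run == 3:
            break                              # third consecutive blank line ends output
        if "dollar" in params and text.startswith("$"):
            break                              # assumed: "$$" segment terminator
        out.append(line)
        if "equal" in params and text.endswith("="):
            break                              # keep the "=" line, then stop
        if max_lines is not None and len(out) >= max_lines:
            break
    return out


product = ["INZ020>023-028>030-030930-", "330 PM EST THU OCT 2 1997", "",
           ".TONIGHT...PARTLY CLOUDY AND WARMER.", "$$", "INZ031-030930-"]
print(select_lines(product, "dollar,last".split(",")))
```

The sketch assumes a -pa value such as dollar,last is split on commas; last is ignored here because it selects among products rather than lines.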
Because more than one matching product can appear in a file, it may be desirable to use only the first or last occurrence. And because products are continually appended to data files, it may be desirable to keep parsing after the program reaches the end of the file, so that the latest products are printed as they are ingested. Additional parameters are available for these cases:
- first -- print only the first occurrence
- last -- print only the last occurrence
- cont -- keep file open to search for new products as they arrive
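The first and last parameters amount to keeping one element of the list of matching products, and cont resembles following a growing file the way tail -f does. The Python sketch below illustrates both. pick_occurrences, follow, and the idle_limit argument (added only so the loop can end) are hypothetical, and WXP may poll or reopen the file differently.

```python
import time


def pick_occurrences(products, params):
    """Apply the first/last parameters to a list of matching products (sketch)."""
    if "first" in params:
        return products[:1]        # empty list stays empty
    if "last" in params:
        return products[-1:]
    return products


def follow(path, interval=1.0, idle_limit=None):
    """Yield lines as they are appended to `path`, roughly like `tail -f` (sketch of cont)."""
    idle = 0
    with open(path, "r", errors="replace") as f:
        while True:
            line = f.readline()    # may return a partial line if a write is in progress
            if line:
                idle = 0
                yield line
                continue
            idle += 1              # reached the current end of file
            if idle_limit is not None and idle >= idle_limit:
                return
            time.sleep(interval)   # wait for new data to be ingested


print(pick_occurrences(["first product", "second product"], ["last"]))
```

Without idle_limit, a loop such as `for line in follow(path): print(line, end="")` keeps waiting for new products indefinitely, which is the intent of cont.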
Header Files
A header file can considerably speed up access to data files. Rather than scanning the entire data file, which at times is larger than 1 MB, the program can read the product headers directly from a header file. Header files are much smaller and parse very quickly. Each header entry includes a byte offset into the large data file.
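The header file idea can be sketched as a one-time scan that records the byte offset of each product header, followed by a seek straight to the product of interest. The layout used here (one offset, a tab, and the header text per line) and the WMO-style heading pattern are assumptions made for illustration; WXP's actual header file format is not described here, and all function names are hypothetical.

```python
import re

# Assumed WMO-style heading, e.g. "FPUS1 KIND 022030" or "FPUS53 KIND 022040 COR".
WMO_HEADING = re.compile(rb"^[A-Z]{4}\d{1,2} [A-Z]{4} \d{6}")


def build_header_index(data_path):
    """Scan the raw data file once, recording (byte offset, header) pairs."""
    index = []
    offset = 0
    with open(data_path, "rb") as f:
        for raw in iter(f.readline, b""):
            if WMO_HEADING.match(raw):
                index.append((offset, raw.decode("ascii", "replace").strip()))
            offset += len(raw)     # track byte offsets ourselves
    return index


def write_header_file(index, header_path):
    """Store the index as 'offset<TAB>header' lines (hypothetical layout)."""
    with open(header_path, "w") as f:
        for offset, header in index:
            f.write(f"{offset}\t{header}\n")


def read_header_file(header_path):
    """Load the small header file instead of rescanning the large data file."""
    with open(header_path) as f:
        return [(int(o), h) for o, h in (line.rstrip("\n").split("\t", 1) for line in f)]


def read_product(data_path, index, i):
    """Seek directly to product i and read up to the start of the next product."""
    start = index[i][0]
    end = index[i + 1][0] if i + 1 < len(index) else None
    with open(data_path, "rb") as f:
        f.seek(start)
        data = f.read() if end is None else f.read(end - start)
    return data.decode("ascii", "replace")
```

The design choice is the usual one for indexes: pay for one full scan when the header file is built, then let later searches read only the small header file and a single slice of the data file.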
EXAMPLES
To parse for the latest state forecast from KIND:
parse -cu -nh=-12 -ph=FPUS1_KIND -pa=last
** FPUS1 KIND 022030 ***
SFPIN
INZ002>089-031000-
STATE FORECAST FOR INDIANA
NATIONAL WEATHER SERVICE INDIANAPOLIS IN
330 PM EST THU OCT 2 1997
.TONIGHT...FAIR AND WARMER. LOWS 50 TO 55.
.FRIDAY...MOSTLY SUNNY...BREEZY AND WARMER. HIGHS 80 TO 85.
.FRIDAY NIGHT...BECOMING MOSTLY CLOUDY. A CHANCE OF THUNDERSTORMS. LOWS IN THE LOWER 60S.
.SATURDAY...MOSTLY CLOUDY...BREEZY AND A CHANCE OF THUNDERSTORMS. WARM. HIGHS MIDDLE 70S TO AROUND 80.
.EXTENDED FORECAST...
.SUNDAY AND MONDAY...MOSTLY CLEAR AND WARM. LOWS MIDDLE 50S TO AROUND 60. HIGHS UPPER 70S TO LOWER 80S.
.TUESDAY...PARTLY CLOUDY AND MILD. LOWS AROUND 50 TO MIDDLE 50S. HIGHS IN THE 70S.
DS
To parse for the latest state forecast using the AFOS PIL (note that in_file is specified because this product header does not appear in the parse.lup file):
parse -cu -nh=-12 -if=for_dat -ph=/SFPIN -pa=last
To parse for the latest zone forecast for zone INZ029:
parse -cu -nh=-12 -ph=FPUS53_KIND -id=%INZ029 -pa=dollar,last
** FPUS53 KIND 022040 COR ***
INZ020>023-028>030-035-036-043-044-051-052-060-067-030930-
CARROLL-CASS-CLAY-CLINTON-FOUNTAIN-KNOX-MIAMI-MONTGOMERY-PARKE-
SULLIVAN-TIPPECANOE-VERMILLION-VIGO-WARREN-WHITE-
INCLUDING THE CITIES OF...CRAWFORDSVILLE...FRANKFORT...LAFAYETTE...
LOGANSPORT...TERRE HAUTE...VINCENNES
330 PM EST THU OCT 2 1997
.TONIGHT...PARTLY CLOUDY AND WARMER. LOW IN THE MIDDLE 50S. SOUTHWEST WIND 5 TO 10 MPH.
.FRIDAY...MOSTLY SUNNY AND WARMER. HIGH 80 TO 85. BREEZY SOUTHWEST WIND 15 TO 20 MPH.
.FRIDAY NIGHT...BECOMING MOSTLY CLOUDY. A 40 PERCENT CHANCE OF THUNDERSTORMS. MILD. LOW IN THE LOWER 60S.
.SATURDAY...MOSTLY CLOUDY...BREEZY AND A 40 PERCENT CHANCE OF THUNDERSTORMS. MILD. HIGH IN THE UPPER 70S.
FILES
- parse.lup - the lookup file that maps product headers to file name tags
SEE ALSO
- forecast - the forecast parsing program
Last updated Oct 2, 1997