NAME
    ghcn_fetch.pl - Fetch station and weather data from the NOAA GHCN
    repository

VERSION
    version v0.22.258

SYNOPSIS
        ghcn_fetch.pl [-gui] [-optfile <filespec>]

        ghcn_fetch.pl [<report_type>]
                [-country <str>] [-state <str>] [-location <str>] [-gsn]
                [-gps "<lat> <long>" [-radius <n>] ]
                [-range <str>] [-active <str> [-partial]] [-quality <pct>]
                [-fmonth <str>] [-fday <str>]
                [-anomalies] [-baseline <str>] [-precip] [-tavg] [-nogaps]
                [-kml <filespec> [-color <str>] ]
                [-dataonly] [-nonetwork <int>] [-performance] [-verbose]
                [-outclip]
                [-report <report_type>]

            <report_type> ::= id | daily | monthly | yearly | ""

        ghcn_fetch.pl -readme

        ghcn_fetch.pl -help

        ghcn_fetch.pl -usage | -?

DESCRIPTION
    Fetch data from the NOAA GHCN database and output as tab-separated
    lines. Various options are provided to allow filtering of the NOAA
    stations by country, state, location name, year range, station active
    year range, etc. When no report type is provided, or -report is an empty
    string, the output is simply a list of the selected stations.

    If report type 'daily', 'monthly' or 'yearly' is given, then the pages
    for the selected stations are scanned and the data from them aggregated
    and output as one row per designated period. This is followed by the
    station list.

    If report type 'id' is given, then the daily data for each selected
    station id is reported, followed by the station list.

    The report type can be abbreviated; e.g. d or da for daily. The report
    type can be provided as the first argument, or it can be provided via
    the -report option anywhere within the argument list.
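
    The abbreviation rule can be pictured as unique-prefix matching. The
    Python sketch below is illustrative only: resolve_report_type is a
    hypothetical name, the list of types is taken from the Report Types
    section, and the script itself may resolve abbreviations differently.

```python
# Report types as listed under "Report Types" (assumed complete).
REPORT_TYPES = ["station", "daily", "monthly", "yearly", "id"]

def resolve_report_type(text):
    """Return the report type that `text` abbreviates.

    An empty string is passed through; it selects the plain station list.
    """
    if text == "":
        return ""
    matches = [t for t in REPORT_TYPES if t.startswith(text.lower())]
    if len(matches) != 1:
        raise ValueError(f"unknown or ambiguous report type: {text!r}")
    return matches[0]
```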

    In general it's best to narrow your filter criteria as much as possible;
    otherwise it will take a very long time to load and process the station
    pages. A good strategy is to omit the -report option so you can see how
    many stations will be queried before asking for any detailed data. Then
    you can adjust the number of stations using other filters.

    If no options are given, and stdin isn't receiving from a pipe or a
    file, then -gui is assumed. This launches a dialog to provide a
    user-friendly way to set options, and to save and reload them (if
    -optfile is provided).

PARAMETERS
    Getopt::Long is used, so either - or -- may be used. Parameter names
    may be abbreviated, so long as they remain unambiguous.

  Report Types
    Data obtained from the GHCN database can be reported at various levels
    of aggregation using the -report option. The string value given to
    -report specifies the type and level of aggregation. Abbreviations are
    permitted.

    -report station
        Generate a list of the stations which match the criteria provided
        (location, geo coordinates, ranges etc.) This is the default when no
        report type is requested. No actual weather data is accessed; only
        station data.

    -report daily
        Scan the NOAA station pages that meet all the selection criteria and
        aggregate the data from them by year, month and day. Output the
        results as a tab-separated table suitable for import into Excel for
        analysis.

        TMAX (temperature maximum) is aggregated by maximum; TMIN by
        minimum; TAVG values are averaged. Note that while most stations
        track TMAX and TMIN, a lot fewer track TAVG. When TAVG is missing, a
        proxy is calculated by averaging TMAX and TMIN.
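
        As a rough illustration of these aggregation rules, the Python
        sketch below combines one day's readings from several stations.
        The function name and the input layout are hypothetical, and it
        assumes the TAVG proxy is computed per station before averaging;
        the script may order these steps differently.

```python
from statistics import mean

def aggregate_day(readings):
    """Combine one day's readings from several stations.

    `readings` is a list of dicts with keys "TMAX" and "TMIN", and
    optionally "TAVG" (hypothetical layout).
    """
    tmax = max(r["TMAX"] for r in readings)   # TMAX aggregated by maximum
    tmin = min(r["TMIN"] for r in readings)   # TMIN aggregated by minimum
    # Use TAVG when the station reports it; otherwise the (TMAX + TMIN) / 2 proxy.
    tavgs = [
        r["TAVG"] if r.get("TAVG") is not None else (r["TMAX"] + r["TMIN"]) / 2
        for r in readings
    ]
    return {"TMAX": tmax, "TMIN": tmin, "TAVG": mean(tavgs)}
```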

    -report monthly
        Same as -report daily except the output is summarized to the month
        level. Note that with this option, TAVG is averaged across the days
        of the month and may be of limited usefulness. Avg is calculated as
        the average of the max and min for the month, which is what is
        typically used as the measure of monthly average temperature.

    -report yearly
        Same as -report daily except the output is summarized to the year
        level. See the explanation of TAVG vs Avg under -report monthly.

    -report id
        Break the selected aggregation level down by station id and include
        the station id in the output. This is like -report daily, but with a
        separate set of rows for each station id.

  Station Filter
    A list of station id's can be provided via stdin, and will be used in
    lieu of other filtering criteria. Each line of input will be searched
    for one or more station id's.
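
    A minimal Python sketch of scanning input lines for station id's
    follows. It assumes the 11-character GHCN-Daily id layout (two
    letters followed by nine letters or digits, as in CA006105978); the
    pattern and function name are illustrative, not taken from the
    script, and the pattern would also accept an 11-letter uppercase
    word.

```python
import re

# Assumed id shape: 2-letter country code + 9 alphanumeric characters.
STATION_ID = re.compile(r"\b[A-Z]{2}[A-Z0-9]{9}\b")

def station_ids(lines):
    """Collect every station id found on each line, in input order."""
    ids = []
    for line in lines:
        ids.extend(STATION_ID.findall(line))
    return ids
```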

  Geographic Filters
    -country <str>
        Filter the station list to include only those from a specific
        country. The string can be a 2-character GEC (formerly FIPS) country
        code, a 3-character UN country code, or a 3-character internet
        country code (including the dot). Longer strings are treated as a
        pattern and matched (unanchored) against country names.

        NOAA uses GEC codes in their database. For a full list of country
        codes and names see
        <https://www1.ncdc.noaa.gov/pub/data/ghcn/daily/ghcnd-countries.txt>
        and
        <https://www.cia.gov/library/publications/the-world-factbook/appendix/appendix-d.html>
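
        The way a -country value is interpreted can be sketched as
        follows. This Python fragment only mirrors the length rules
        described above; the function name and return labels are
        hypothetical, and treating strings shorter than two characters
        as patterns is an assumption.

```python
def classify_country(value):
    """Classify a -country argument by its shape."""
    if len(value) == 2:
        return "gec"          # 2-character GEC (formerly FIPS) code
    if len(value) == 3 and value.startswith("."):
        return "internet"     # 3 characters including the dot, e.g. ".ca"
    if len(value) == 3:
        return "un"           # 3-character UN country code
    return "pattern"          # anything else: unanchored match on country names
```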

    -state <str> (or -province)
        Filter the station list to include only those within the specified
        2-character US state or Canadian province code.

    -location <str>
        Filter the station list to include only those whose name matches the
        specified pattern. For a starts-with match, prefix the pattern with
        ^ (or \A). For an ends-with match, suffix the pattern with $ (or
        \Z).

        You can also specify a station id (e.g. CA006105978) or a
        comma-delimited list of station id's (e.g. CA006105978,USC00336346).

        As a handy shortcut, mappings between user-defined names and a
        station id or id list can be defined in the aliases section of
        .ghcn_fetch.yaml.

    -gsn
        Select only GCOS Surface Network (GSN) stations. GSN is a baseline
        network comprising a subset of about 1000 stations chosen mainly to
        give a fairly uniform spatial coverage from places where there is a
        good length and quality of data record. See
        <https://www.ncdc.noaa.gov/gosic/global-climate-observing-system-gcos/gcos-surface-network-gsn-program-overview>

    -gps <latitude>,<longitude>
        Filter the station list to include only those stations that are
        within -radius kilometers (default 25) of the specified decimal
        latitude and longitude values; e.g. "45.3822 -75.7167". The two
        values can be delimited by spaces or by any punctuation character
        (e.g. a comma). If a space is used, the string must be enclosed in
        quotes.

    -radius <int>
        Specify the radius, in kilometers, to be used for the -gps option.
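
        The -gps and -radius filter can be pictured with the Python
        sketch below. It assumes a great-circle (haversine) distance on
        a sphere of radius 6371 km; the documentation does not say which
        distance formula the script uses, and all function names here
        are hypothetical.

```python
import math
import re

def parse_gps(text):
    """Extract latitude and longitude from a string like "45.3822 -75.7167"."""
    lat, lon = (float(n) for n in re.findall(r"[-+]?\d+(?:\.\d+)?", text))
    return lat, lon

def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance by the haversine formula (mean Earth radius 6371 km)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def within_radius(station, center, radius_km=25):
    """True if the (lat, lon) pair `station` lies within radius_km of `center`."""
    return distance_km(*center, *station) <= radius_km
```

        For example, within_radius((45.5, -75.7167), parse_gps("45.3822,
        -75.7167")) is true, since the two points are roughly 13 km
        apart.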

  Date Filters
    -range <str>
        Only include data from the specified range of years. The range is
        given as a string such as 1990-2018. Any punctuation character can
        be used to separate the two years. A single year may also be given.
        Alternatively, two discontiguous years can be given by separating
        the years with a comma (e.g. -range 1919,2019), although this
        feature cannot be combined with -active or -anomalies.

        Note that if -active is specified, then -range must be a subset of
        -active since there's no point in asking for data that lies outside
        the active range of data collection for a station.
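
        One possible reading of the -range syntax, as a Python sketch:
        a single year, two years joined by punctuation (a span), or two
        years joined by a comma (two separate years). The function name
        is hypothetical and the script's own parser may differ in edge
        cases.

```python
import re

def parse_year_range(text):
    """Return the list of years selected by a -range style string."""
    years = [int(y) for y in re.findall(r"\d{4}", text)]
    if len(years) == 1:
        return years                      # a single year
    first, last = years
    if "," in text:
        return [first, last]              # two discontiguous years
    return list(range(first, last + 1))   # an inclusive span of years
```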

    -active <str>
        Only include data from stations which have been fully active within
        the specified range. The range is given as a string such as
        1990-2018. Any punctuation character can be used to separate the two
        years. A single year may also be given.

        Instead of a year range, you can use an empty string to set the
        active range to match the range specified by -range.

    -partial
        The -partial option can be used in conjunction with -active to
        include stations that were only active during part of the active
        range.

    -quality <int>
        Only include stations with unflagged data on at least <int>% of days
        within -range. If -anomalies is given, the number of days within the
        -baseline range is also checked against <int>%. The default value
        for -quality is 90, meaning that 90% of the days found within -range
        (and -baseline) must be present and unflagged in order for the
        station's data to be included in the output.
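
        The quality test amounts to a percentage comparison. The Python
        sketch below assumes the counts of good (present and unflagged)
        days and expected days are already known for each period; the
        function names and argument layout are hypothetical.

```python
def unflagged_pct(good_days, expected_days):
    """Percentage of expected days that have present, unflagged data."""
    return 100.0 * good_days / expected_days if expected_days else 0.0

def station_kept(range_days, baseline_days=None, quality=90):
    """Each *_days argument is a (good_days, expected_days) pair.

    baseline_days is only supplied when anomalies are requested.
    """
    periods = [range_days] + ([baseline_days] if baseline_days else [])
    return all(unflagged_pct(good, expected) >= quality for good, expected in periods)
```

        With the default of 90, a station with 329 good days out of 365
        (about 90.1%) is kept, while one with 328 (about 89.9%) is
        rejected.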

    -fday <str>
        Filter the data so that it includes only the days of the month which
        match the specified range list; e.g. 5-10,20.

    -fmonth <str>
        Filter the data so that it includes only the months of the year
        which match the specified range list; e.g. 1-3,7-9 would select
        Jan-Mar and Jul-Sep.
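
        A range list such as the ones accepted by -fday and -fmonth can
        be expanded as in this Python sketch. The function name is
        hypothetical, and the script may validate its input more
        strictly.

```python
def parse_range_list(text):
    """Expand a range list like "1-3,7-9" into a sorted list of integers."""
    selected = set()
    for part in text.split(","):
        lo, _, hi = part.partition("-")
        # A lone number such as "20" has no upper bound, so reuse the lower one.
        selected.update(range(int(lo), int(hi or lo) + 1))
    return sorted(selected)
```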

  Analysis Options
    -anomalies
        Calculate the mean temperature anomalies for each day at each
        station relative to a baseline year range (see -baseline). Include
        these in the output.

    -baseline <str>
        Use the date range <str> to compute anomalies. Default 1971-2000.
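
        As a sketch of what an anomaly relative to a baseline means, the
        Python fragment below subtracts, for each calendar day, the mean
        over the baseline years from each observed value. Computing the
        baseline per calendar day is an assumption, and the function
        name and input layout are hypothetical; the script's exact
        method is not described here.

```python
from statistics import mean

def daily_anomalies(temps, baseline=(1971, 2000)):
    """`temps` maps (year, month, day) to one station's mean temperature."""
    first, last = baseline
    by_day = {}
    for (year, month, day), value in temps.items():
        if first <= year <= last:
            by_day.setdefault((month, day), []).append(value)
    # Baseline mean for each calendar day that has baseline data.
    normals = {md: mean(vals) for md, vals in by_day.items()}
    return {
        (y, m, d): value - normals[(m, d)]
        for (y, m, d), value in temps.items()
        if (m, d) in normals
    }
```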

    -precip
        Include precipitation measures in the output, specifically SNOW,
        SNWD (snow depth), and PRCP (all precipitation). Values are in cm.
        Like TMAX, SNWD is the maximum depth recorded across stations and
        across time. The others are averaged across stations and then summed
        across time. In other words, if -report yearly is used you get the
        maximum snow depth for the year, and the total accumulation of snow
        and precipitation for the year.

    -tavg
        Include TAVG (average daily temperature) in the output. TAVG will be
        averaged across stations and also across months or years if -report
        monthly or -report yearly is given.

    -nogaps
        For report 'id', generate rows for those months and days where data
        is missing. This enables charting with a complete time x-axis.
        Without it, large gaps result in horizontal compression of the chart
        and a distorted picture across time.
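
        The effect of -nogaps can be illustrated with a Python sketch
        that inserts an empty row for every missing date between the
        first and last dates present. The function name and the use of
        None for an empty row are assumptions made for illustration.

```python
from datetime import date, timedelta

def fill_gaps(rows):
    """`rows` maps datetime.date to a data row; missing dates are added as None."""
    if not rows:
        return {}
    day, last = min(rows), max(rows)
    filled = {}
    while day <= last:
        filled[day] = rows.get(day)   # None marks a day with no data
        day += timedelta(days=1)
    return filled
```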

  KML Options
    -kml <filespec>
        Output the coordinates of the selected stations as a KML file, for
        import into Google Earth as placemarks. The active range of each
        station will be included as timespans so that you can view the
        placemarks across time.

    -color <color> (or -colour)
        Color of the KML placemark pushpins. Acceptable values are red,
        green, blue, azure, purple, yellow and white. May be abbreviated
        down to one letter. Default is red.

  Output Options
    -outclip or -o
        Send output to the Windows clipboard. (Windows only)

  Misc Options
    -dataonly
        Print only the data table. Other information, including notes, lists
        of stations kept and rejected, and statistics are suppressed.

    -nonetwork <int>
        Set the NoNetwork option used in URI::Fetch in order to alter the
        behaviour of caching.

        By default, -nonetwork is set to -1, which sets the NoNetwork option
        of URI::Fetch to the number of seconds in the current year at the
        time the script is run. This means that the HTTP server is not
        contacted if the page is in cache and the cached page was inserted
        sometime within the present year. If the cached copy is older than
        this year, then a normal HTTP request (full or cache check) is done.

        If -nonetwork is set to 0 and the requested page is found in the
        cache, the HTTP server is checked for a fresher copy.

        If -nonetwork is set to 1, the HTTP server is never contacted,
        regardless of the page being in cache or not. If the page is missing
        from cache, the fetch method will return undef and the script will
        die. If the page is in cache, that page will be returned, no matter
        how old it is. This is useful for situations where the NOAA HTTP
        server is slow or offline and the desired data is available in the
        cache.
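
        The default of -1 can be pictured as follows. This Python sketch
        reads "the number of seconds in the current year at the time the
        script is run" as the seconds elapsed since 1 January of the
        current year, which matches the caching behaviour described
        above; that reading, and the function names, are assumptions.

```python
from datetime import datetime

def seconds_into_year(now):
    """Seconds elapsed between 1 January of now's year and `now`."""
    start = datetime(now.year, 1, 1)
    return int((now - start).total_seconds())

def nonetwork_value(option, now):
    """Translate the -nonetwork option into the value handed to the fetcher."""
    return seconds_into_year(now) if option == -1 else option
```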

    -performance
        Include performance statistics in the output. This includes some
        extra timing information (labelled "(internal)" in the Time
        Statistics list because it is internal to the other timing
        metrics) as well as statistics for the memory consumption of the
        Data hash table. Memory statistics are also added to some of the
        Timing subjects.

    -verbose
        When given, warning messages about missing data are displayed to
        stderr.

  Command-Line Only Options
    -gui
        Launch a graphical user interface that can be used to set options. Not
        available unless modules Tk and Tk::Getopt are installed.

    -optfile <filespec>
        Designate a file to be used to save or load options.

    -readme
        Launch the default web browser and display the NOAA Daily Readme.txt
        file, providing a description of the Daily data files and station
        data.

    -h | -help
        Display this documentation.

    -usage | -?
        Display the Synopsis section of this documentation.

CONFIGURATION FILE
    At startup, ghcn_fetch will look for the file .ghcn_fetch.yaml in either
    %UserProfile% (Windows) or $HOME (Unix/Linux) in order to pick up some
    additional options. The file should contain something like this:

        ---
        cache:
            root: C:/ghcn_cache
            namespace: ghcn

        aliases:
            yow: CA006106000,CA006106001    # Ottawa airport
            cda: CA006105976,CA006105978    # Ottawa CDA and CDA RCS
            center: USC00326365             # geographic center of North America

    Supported options are:

    cache:
        This section defines the cache_root and namespace options for
        URI::Fetch. If present, then any pages which are fetched from the
        NOAA GHCN repository are cached in the folder and subfolder
        designated by root: and namespace:. This vastly improves the
        performance of subsequent invocations of ghcn_fetch, especially when
        using the same station filtering criteria.

    aliases:
        This section provides a list of shortcut names that are mapped to
        station id's or id-lists and which can be used in the -location
        option. If a -location value matches a key in this section, the
        station id or id-list is substituted. Note that keys must consist of
        lowercase letters only, with or without a leading underscore.
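
        Alias substitution can be sketched in Python as a dictionary
        lookup guarded by the key rule. The alias table below copies the
        example configuration; the function name and the regular
        expression are illustrative, not taken from the script.

```python
import re

# Entries copied from the example .ghcn_fetch.yaml above.
ALIASES = {
    "yow": "CA006106000,CA006106001",
    "cda": "CA006105976,CA006105978",
    "center": "USC00326365",
}

# Lowercase letters only, with an optional leading underscore.
ALIAS_KEY = re.compile(r"_?[a-z]+")

def resolve_location(value, aliases=ALIASES):
    """Return the id list for an alias, or the value unchanged otherwise."""
    if ALIAS_KEY.fullmatch(value) and value in aliases:
        return aliases[value]
    return value
```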

RELATED SCRIPTS
    Additional scripts are provided for data analysis. These scripts are
    designed to take the output of ghcn_fetch as their input.

    For Windows users, the -outclip option directs the tab-separated output to
    the Windows clipboard, so it can be pasted into Excel for analysis using
    PivotTable and PivotChart. Alternatively you can use the usual '>'
    method to direct the output to a file.

    ghcn_extremes.pl
        Report patterns of temperature extremes (heatwaves or coldwaves) by
        analyzing daily temperature records and looking for consecutive days
        of extreme temperatures; e.g.

            ghcn_fetch -country CA -report daily | ghcn_extremes > extremes.tsv

    ghcn_station_counts.pl
        Report the station counts per year for a list of stations generated
        by this script using -report station (which is the default report
        type); e.g.

            ghcn_fetch -country CA | ghcn_station_counts > stn_counts.tsv

AUTHOR
    Gary Puckering (jgpuckering@rogers.com)

LICENSE AND COPYRIGHT
    Copyright 2022, Gary Puckering

