Add waterdata.get_ratings for USGS stage-discharge ratings via STAC by thodson-usgs · Pull Request #269 · DOI-USGS/dataretrieval-python

thodson-usgs · 2026-05-06T15:11:59Z

Adds a Python wrapper around the new USGS Water Data STAC catalog (announced at https://waterdata.usgs.gov/blog/wdfn-rating-curves/) for stage-discharge rating curves. Mirrors R's read_waterdata_ratings.

Why a new module

The rating-curve catalog is exposed via STAC (api.waterdata.usgs.gov/stac/v0/search) rather than the OGC collections that back every other waterdata getter. Two transport-layer differences from OGC:

Filter syntax — STAC takes a CQL2 string in a filter query param. The composer mirrors R's logic: only pin file_type server-side when a single value is requested; multi-type requests fetch all matches and filter URLs client-side.
Download path — STAC features point at a per-rating .rdb asset on stac-files/. The function downloads each matching asset to a user-supplied file_path (default: tempfile.mkdtemp()) and parses it with the existing nwis._read_rdb helper.

Because of these differences, the implementation lives in dataretrieval/waterdata/ratings.py rather than api.py.

Live API examples (verified)

from dataretrieval.waterdata import get_ratings

# Default: exsa ratings for two sites
ratings = get_ratings(
    monitoring_location_id=["USGS-01104475", "USGS-01104460"],
    file_type="exsa",
)
# {'USGS-01104475.exsa.rdb': DataFrame(2001, 4), 'USGS-01104460.exsa.rdb': DataFrame(...)}
ratings["USGS-01104475.exsa.rdb"].head()
#    INDEP  SHIFT  DEP  STOR
# 0   0.00   0.00  0.0   NaN
# 1   0.01  -0.01  0.0   NaN
# 2   0.02  -0.02  0.0   NaN

# Both exsa and corr for the same two sites — 4 tables returned
ratings = get_ratings(
    monitoring_location_id=["USGS-01104475", "USGS-01104460"],
    file_type=["exsa", "corr"],
)

# Bounding-box "what's available?" without downloading
features = get_ratings(
    bbox=[-95.0, 40.0, -92.0, 42.0],
    download_and_parse=False,
)
# 59 STAC feature dicts in this region

Error paths surfaced upfront

get_ratings(monitoring_location_id="USGS-01104475", file_type="bogus")
# ValueError: Invalid file_type ['bogus']. Valid options: ['exsa', 'base', 'corr'].

get_ratings(monitoring_location_id="USGS-01104475", datetime="P7D")
# ValueError: ISO 8601 durations (e.g. 'P7D') are not supported in `datetime`
#             for the rating-curve service. Provide a date or interval instead.

The duration check matches a constraint of the upstream rating-curve service that R also enforces client-side.

Test plan

9 unit tests in tests/waterdata_ratings_test.py:
- Filter composition (single site, multi site, no site, empty).
- Two error paths (invalid file_type, ISO 8601 duration in datetime).
- End-to-end search + download via requests_mock (single-site happy path).
- download_and_parse=False returns the raw feature list.
- Multi-file_type correctly omits file_type from the server filter and filters URLs locally.
Live verification against api.waterdata.usgs.gov/stac/v0/search for all three demos shown above.

Two non-functional follow-ups suggested during review of DOI-USGS#269: (1) Document the cross-module reach into nwis._read_rdb. Rating files use the same USGS RDB shape as NWIS responses, so the parser is already reusable as-is — no refactor of the legacy nwis module is needed. Added a comment at the import site explaining why the private import is intentional and what to watch for if _read_rdb ever moves. (2) Surface the RDB #-prefixed header block. Each parsed rating frame now carries provenance in df.attrs: - df.attrs["comment"]: the list of "#"-prefixed header lines (rating id, parameter, expansion type, last-shifted timestamp, warnings, etc.). - df.attrs["url"]: the asset URL it was fetched from. R's read_waterdata_ratings exposes the comment block via comment(df); pandas's standard `attrs` dict is the Python equivalent. Done in ratings.py only — does not touch nwis. A live spot-check against api.waterdata.usgs.gov on USGS-01104475 exsa shows the 31-line USGS header survives intact (gauge name, parameter code, rating expansion, etc.). One new unit test pins the behavior. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Six fixes from a multi-agent review pass on PR DOI-USGS#269: 1. Narrow the per-feature except handler from a bare Exception to (RequestException, ValueError, OSError). The previous catch-all would swallow programming bugs (KeyError on a malformed feature, AttributeError, ...) and silently drop rows. 2. Escape single quotes in CQL string literals via the standard doubling rule. Most monitoring-location IDs cannot contain a quote, but the function takes arbitrary strings — defending against malformed filters and potential injection regardless. New _quote_cql_str helper, applied to monitoring_location_id and file_type. New unit test pins the behaviour. 3. Promote file_type to a Literal["exsa", "base", "corr"] and derive _VALID_FILE_TYPES from it via typing.get_args, so the runtime guard and the type hint can never drift apart. 4. Rename the `datetime` parameter to `time` to match the convention used by every sibling waterdata getter (and to stop shadowing the stdlib `datetime` module). The parameter is still passed through as the STAC `datetime` query string under the hood; that's now documented explicitly. 5. Switch the multi-type local filter from a substring check on the asset URL to feature["properties"]["file_type"]. Substring matching on URLs would false-match if a host or path ever contained one of the literal type names; STAC features carry the typed property already. 6. Skip the on-disk RDB write when file_path is None. tempfile.mkdtemp leaks (no automatic cleanup), and df.attrs["url"] already records the source — so by default we now return only the parsed frame. Users who want a local copy can pass file_path=...; the contract for that path is unchanged. While here, dropped a "discrete analogue to the OGC waterdata getters" sentence from the module docstring (it was internal-architecture WHAT-narrating, not user-relevant guidance). All 11 ratings tests pass (one new test for the CQL quote-escaping). Live verification on USGS-01104475 confirms file_path=None still returns the parsed frame with df.attrs populated, multi-type via property filter returns the expected 4 tables, and the renamed `time=` parameter still drives the STAC datetime filter. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Wraps the new Water Data STAC catalog endpoint (api.waterdata.usgs.gov/stac/v0/search) for stage-discharge rating curves. Lives in its own module (waterdata/ratings.py) because the transport layer — STAC search + per-feature RDB download — differs from the OGC collections used by the rest of the package. The function: - Composes a CQL filter from monitoring_location_id and (single-value) file_type, mirroring R's logic. Multi-type requests fetch all matches and filter URLs client-side. - Optionally downloads each matching .rdb asset to a user-supplied file_path (default: a fresh tempfile.mkdtemp), and parses with the existing nwis._read_rdb helper. - Returns a dict[id -> DataFrame] when download_and_parse=True, or the raw list of STAC features when False (cheap "what's available?" inspection). Surfaces a clear ValueError for invalid file_type values and for ISO 8601 durations in `datetime` (the rating-curve service rejects them). Mirrors R's read_waterdata_ratings in DOI-USGS/dataRetrieval; example sites and idioms come straight from the R doc. Tests: 9 unit tests covering filter composition, the two error paths, and end-to-end search-and-download via requests_mock (single-site, download_and_parse=False, and multi-type URL filtering). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Two non-functional follow-ups suggested during review of DOI-USGS#269: (1) Document the cross-module reach into nwis._read_rdb. Rating files use the same USGS RDB shape as NWIS responses, so the parser is already reusable as-is — no refactor of the legacy nwis module is needed. Added a comment at the import site explaining why the private import is intentional and what to watch for if _read_rdb ever moves. (2) Surface the RDB #-prefixed header block. Each parsed rating frame now carries provenance in df.attrs: - df.attrs["comment"]: the list of "#"-prefixed header lines (rating id, parameter, expansion type, last-shifted timestamp, warnings, etc.). - df.attrs["url"]: the asset URL it was fetched from. R's read_waterdata_ratings exposes the comment block via comment(df); pandas's standard `attrs` dict is the Python equivalent. Done in ratings.py only — does not touch nwis. A live spot-check against api.waterdata.usgs.gov on USGS-01104475 exsa shows the 31-line USGS header survives intact (gauge name, parameter code, rating expansion, etc.). One new unit test pins the behavior. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Six fixes from a multi-agent review pass on PR DOI-USGS#269: 1. Narrow the per-feature except handler from a bare Exception to (RequestException, ValueError, OSError). The previous catch-all would swallow programming bugs (KeyError on a malformed feature, AttributeError, ...) and silently drop rows. 2. Escape single quotes in CQL string literals via the standard doubling rule. Most monitoring-location IDs cannot contain a quote, but the function takes arbitrary strings — defending against malformed filters and potential injection regardless. New _quote_cql_str helper, applied to monitoring_location_id and file_type. New unit test pins the behaviour. 3. Promote file_type to a Literal["exsa", "base", "corr"] and derive _VALID_FILE_TYPES from it via typing.get_args, so the runtime guard and the type hint can never drift apart. 4. Rename the `datetime` parameter to `time` to match the convention used by every sibling waterdata getter (and to stop shadowing the stdlib `datetime` module). The parameter is still passed through as the STAC `datetime` query string under the hood; that's now documented explicitly. 5. Switch the multi-type local filter from a substring check on the asset URL to feature["properties"]["file_type"]. Substring matching on URLs would false-match if a host or path ever contained one of the literal type names; STAC features carry the typed property already. 6. Skip the on-disk RDB write when file_path is None. tempfile.mkdtemp leaks (no automatic cleanup), and df.attrs["url"] already records the source — so by default we now return only the parsed frame. Users who want a local copy can pass file_path=...; the contract for that path is unchanged. While here, dropped a "discrete analogue to the OGC waterdata getters" sentence from the module docstring (it was internal-architecture WHAT-narrating, not user-relevant guidance). All 11 ratings tests pass (one new test for the CQL quote-escaping). Live verification on USGS-01104475 confirms file_path=None still returns the parsed frame with df.attrs populated, multi-type via property filter returns the expected 4 tables, and the renamed `time=` parameter still drives the STAC datetime filter. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Eight small changes; net -16 lines. No behavior change other than the duration check now using the project's anchored regex instead of an ad-hoc substring search (small correctness improvement). 1. Reorder so the public `get_ratings` is at the top and helpers follow — matches the convention in `nearest.py`. 2. Reuse `_DURATION_RE` from `waterdata.utils` instead of the ad-hoc `"P" in str(v).upper()` check. The regex is anchored at start and case-correct; the substring check was both too loose (matched any string containing 'p'/'P') and case-wrong (ISO 8601 P/PT are case-sensitive — uppercasing the input would have masked lowercase-p false positives). 3. Extract `_as_list(x)` for the `[x] if isinstance(x, str) else list(x)` pattern, which appeared three times (file_type normalization, monitoring_location_id in `_build_filter`, time values in the duration check). 4. Drop redundant `date=False` on the `_format_api_dates` call (it's the default). 5. Collapse the `if time is not None: ... else: time_str = None` four-line block into a single ternary. 6. Pre-filter `features` by file_type before the download loop, so the loop body is just download-and-handle-errors. The "skip features whose file type wasn't requested" comment goes away because the list-comprehension above is self-documenting. 7. Drop two narrating comments whose WHY is already captured by the helpers' docstrings (`_build_filter` and `_extract_rdb_comment`). 8. Rename `test_get_ratings_multi_type_filters_urls_locally` → `..._filters_via_property` — the test name still claimed URL filtering after the property-based switch in the previous /simplify pass. All 11 ratings tests pass; live verification on api.waterdata.usgs.gov confirms file_path=None still attaches df.attrs, multi-type filtering returns the expected four tables, and the duration check still rejects "P7D" while accepting "2026-04-29". Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Two test-only fixes from a broader-ruff (--select ALL) pass. Both already-passed-CI changes are advisory under the project's narrower rule set (F, E, W, I, UP, B, Q, SIM, TID, C90, E501) but worth applying as cheap correctness/cleanliness wins. 1. RUF043: prefix the pytest.raises(match=...) pattern with `r` so the `.*` metacharacters are explicit. Functionally equivalent here (pytest re.searches the message), but the raw string makes the regex intent unambiguous. 2. PLC0415: remove a redundant inline `from dataretrieval.waterdata.ratings import _build_filter` inside test_build_filter_escapes_quotes — `_build_filter` is already imported at module top (left over from before the top-level import landed in an earlier commit on this branch). Other broader-ruff findings reviewed and intentionally skipped as either project-wide patterns or non-actionable: - S113 (`requests.get` without `timeout=`): no other call in the codebase uses one — applying here only would be inconsistent. - PLW1514 (`open(..., "w")` without `encoding=`): same project- wide situation. RDB content is ASCII. - PLR0913 / PLR0917 (8 args on get_ratings): mirrors R's read_waterdata_ratings signature; can't reduce without breaking parity. - FBT001 / FBT002 (boolean positional args download_and_parse / ssl_check): match the rest of waterdata's public surface; reordering would be a breaking API change. - FURB103 / PTH103 / PTH118 / PTH123 (pathlib over os.path): the rest of the codebase uses os.path; consistency wins. - EM101 / EM102 / TRY003 (raise ValueError(msg) with msg pre-assigned): every other `raise ValueError` in waterdata uses the inline f-string; consistency wins. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The USGS RDB tab-separated format is used both by NWIS web services and by the new Water Data STAC catalog's rating-curve assets. Until now, only nwis.py had a parser, and waterdata/ratings.py reached across to import the private nwis._read_rdb. That smell is gone: - New top-level module dataretrieval.rdb with two public helpers: - read_rdb(text, dtypes=None): pure parser. Strips the comment block, tab-parses the header row, skips the format-spec row, and returns a DataFrame. Forwards optional caller-supplied dtype hints to pandas.read_csv (unknown column names are silently ignored, so dicts are safe to pass on any RDB). Raises ValueError on HTML responses. - extract_rdb_comment(text): returns the #-prefixed header block as a list of lines (the equivalent of R's comment(df)). - nwis._read_rdb is now a thin wrapper around rdb.read_rdb that applies the NWIS-specific dtype hints (site_no, dec_long_va, ...) and runs format_response() for datetime indexing. The private symbol's contract is unchanged; existing 30 nwis tests pass. Dropped the now-unused StringIO import. - waterdata.ratings switches its import from nwis._read_rdb to rdb.read_rdb / extract_rdb_comment, dropping the local _extract_rdb_comment helper. The cross-module-private-import comment block goes away — the dependency is now a clean same-package import. 7 new unit tests pin the shared parser's behavior: basic shape parsing, format-spec-row skipping, dtype passthrough, all-comments empty-result, HTML rejection, and the comment extractor. Net stat: +177 / -72. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Aggregated quality-review fixes; no behavior change. Tests: 7 rdb + 11 ratings + 28 nwis pass (was 30 nwis; -2 are pure-duplicate coverage now lived in rdb_test.py). Source: - dataretrieval/rdb.py: rename `rdb` parameter -> `text` on both read_rdb and extract_rdb_comment ("rdb" is the format, not the thing; reads better as `read_rdb(text=...)`). Drop "USGS-internal" from the module docstring (the format is publicly served). Folded the two-pass `[f.replace ... for f in fields if f.strip()]` into one comprehension + a single empty-string filter. Replaced the defensive `>=` on the all-comments guard with `==` (it's exact after `next(..., len(lines))`). Spelled out in extract_rdb_comment's docstring that the returned lines retain their leading `#` and whitespace, matching R. - dataretrieval/nwis.py: drop the `read_rdb as _read_rdb_text` import alias — the local name is too close to the wrapper `_read_rdb` and confused readers. Move `_NWIS_RDB_DTYPES` from near the bottom of the file to the top alongside `_CRS`, where module-level constants live. Drop the narrating "pandas silently ignores unknown names..." comment (already in the read_rdb docstring). Drop the `if df.empty: return df` guard in the wrapper — `format_response()` already short-circuits cleanly on an empty frame (no `dec_lat_va` -> skip GeoDataFrame branch; no `datetime` -> early return). Tests: - tests/nwis_test.py: drop the two pure-duplicate cases (`test_valid_rdb_returns_dataframe` and `test_all_comments_returns_empty_dataframe`) whose coverage now lives in tests/rdb_test.py. Keep and refocus the remaining `test_no_sites_flows_through_format_response` to exercise the wrapper-specific contract (now load-bearing after dropping the empty-frame guard): an empty parser result must still flow through format_response without raising. Issue-DOI-USGS#171 regression intent preserved in the docstring. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

thodson-usgs · 2026-05-06T19:01:24Z

@ldecicco-USGS, I think this PR is straightforward, but I did refactor the rdb utilities out of the nwis module. Maybe overkill, but low risk. I'm fine to undo that if you prefer.

…oval The module-level "use waterdata instead" warning has been firing on import for a while; this PR makes the migration guidance actionable by emitting a per-function DeprecationWarning that names the specific waterdata replacement the user should switch to. Once peaks (DOI-USGS#267) and ratings (DOI-USGS#269) land, every active nwis function has a waterdata replacement, so all nine of them are deprecated here: nwis.get_dv -> waterdata.get_daily() nwis.get_iv -> waterdata.get_continuous() nwis.get_info -> waterdata.get_monitoring_locations() nwis.what_sites -> waterdata.get_monitoring_locations() nwis.get_stats -> waterdata.get_stats_por() / waterdata.get_stats_date_range() nwis.get_discharge_peaks -> waterdata.get_peaks() nwis.get_ratings -> waterdata.get_ratings() nwis.get_record -> the appropriate waterdata.get_*() nwis.query_waterdata -> a high-level waterdata.get_*() helper nwis.query_waterservices -> a high-level waterdata.get_*() helper (get_qwdata, get_discharge_measurements, get_gwlevels, get_pmcodes, and get_water_use are already defunct and raise NameError.) Implementation follows the nadp deprecation template (DOI-USGS#243): a small _REPLACEMENTS dict + a _warn_deprecated(func_name) helper called as the first line of each public function. stacklevel=3 makes the warning point at the caller's code, not the helper's frame. 11 new parametrized tests pin the warning text — that the function name appears, the replacement helper appears, and the removal date appears — plus one end-to-end test that get_iv() actually fires its warning when called. Removal date is set to 2027-05-06, one full year out (vs. the six months used for nadp), since nwis is much more widely used and most users will need migration time. Maintainer can adjust if desired. This depends on DOI-USGS#267 (waterdata.get_peaks) and DOI-USGS#269 (waterdata.get_ratings) being merged: until then the deprecation messages for get_discharge_peaks and get_ratings point at functions users can't yet call. Hold this PR draft until those land. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

thodson-usgs mentioned this pull request May 6, 2026

Release staging: PRs targeted for v1.1.4 #270

Closed

thodson-usgs requested a review from Copilot May 6, 2026 15:42

Copilot started reviewing on behalf of thodson-usgs May 6, 2026 15:42 View session

thodson-usgs and others added 3 commits May 6, 2026 11:18

thodson-usgs force-pushed the add-ratings branch from 0a7d4f8 to 2650f50 Compare May 6, 2026 16:19

thodson-usgs review requested due to automatic review settings May 6, 2026 17:05

thodson-usgs and others added 4 commits May 6, 2026 12:56

thodson-usgs requested a review from ldecicco-USGS May 6, 2026 19:01

ldecicco-USGS approved these changes May 6, 2026

View reviewed changes

Merge branch 'main' into add-ratings

29045dc

thodson-usgs marked this pull request as ready for review May 6, 2026 23:42

thodson-usgs mentioned this pull request May 6, 2026

Add waterdata.get_peaks for the annual peak-streamflow OGC collection #267

Merged

3 tasks

thodson-usgs merged commit ba11d6f into DOI-USGS:main May 6, 2026
8 checks passed

thodson-usgs mentioned this pull request May 6, 2026

Deprecate remaining active nwis functions ahead of 2027-05-06 removal #271

Merged

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add waterdata.get_ratings for USGS stage-discharge ratings via STAC#269

Add waterdata.get_ratings for USGS stage-discharge ratings via STAC#269
thodson-usgs merged 8 commits intoDOI-USGS:mainfrom
thodson-usgs:add-ratings

thodson-usgs commented May 6, 2026

Uh oh!

thodson-usgs commented May 6, 2026

Uh oh!

thodson-usgs commented May 6, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

thodson-usgs commented May 6, 2026

Why a new module

Live API examples (verified)

Error paths surfaced upfront

Test plan

Related

Uh oh!

thodson-usgs commented May 6, 2026

Uh oh!

thodson-usgs commented May 6, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants