Fix _format_api_dates: parse ISO 8601 datetimes#247
Fix _format_api_dates: parse ISO 8601 datetimes#247thodson-usgs merged 5 commits intoDOI-USGS:mainfrom
Conversation
A single ISO 8601 datetime such as "2018-02-12T23:20:50Z" — the format the docstrings of every getter cite as a valid `time` example — was silently dropped to None. The function only tried "%Y-%m-%d %H:%M:%S" and "%Y-%m-%d", neither of which matches a string with a `T` separator and trailing `Z`. `_construct_api_requests` then set `params["time"] = None`, requests dropped the param, and the user's time filter was silently discarded — the API fell back to "most recent year". Replace the cascading try/except with a single helper that walks the supported formats in order: ISO 8601 with offset and/or fractional seconds, plain ISO 8601, the legacy space-separated form, and date-only. Tz-aware inputs convert directly to UTC; naive inputs continue to be interpreted in the runner's local zone for backwards compatibility. Also tighten the duration / interval pass-through. The previous `re.search(r"P", ..., re.IGNORECASE)` matched any string containing a `p` or `P` anywhere — words like "Apr" or "sample" would skip parsing entirely. Anchor the check to `^[Pp]\d`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR fixes dataretrieval.waterdata.utils._format_api_dates so ISO 8601/RFC 3339 datetimes like 2018-02-12T23:20:50Z are parsed and preserved (rather than silently dropped), and tightens the duration/interval passthrough logic. It also adds regression tests covering the new/expected parsing behavior.
Changes:
- Add
_parse_datetimeplus a supported-format list to parse ISO 8601 variants (incl.Z, fractional seconds, numeric offsets) and legacy formats. - Refactor
_format_api_datesto use the helper, converting tz-aware inputs to UTC and keeping naive inputs localized for backward compatibility. - Add regression tests for ISO 8601 parsing and for avoiding false “duration” matches.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
dataretrieval/waterdata/utils.py |
Refactors datetime parsing/formatting to correctly handle ISO 8601 inputs and adjusts duration/interval passthrough logic. |
tests/waterdata_utils_test.py |
Adds regression tests validating ISO 8601 parsing behavior and passthrough handling. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # Pass through duration ("P7D") and pre-formatted interval ("a/b") strings | ||
| # untouched. Anchor the duration check so the bare letter ``P`` / ``p`` | ||
| # appearing inside a normal word doesn't accidentally bypass parsing. | ||
| if len(datetime_input) == 1: | ||
| single = datetime_input[0] | ||
| if re.match(r"^[Pp]\d", single) or "/" in single: |
There was a problem hiding this comment.
Resolved as of 74625a0 — the regex was already widened to ^[Pp]T?\d (allowing the optional T for PT-prefixed durations) and is now hoisted to a module-level _DURATION_RE. PT36H round-trips unchanged. See test_format_api_dates_passthrough_time_only_duration in tests/waterdata_utils_test.py.
| if re.match(r"^[Pp]\d", single) or "/" in single: | ||
| return single | ||
|
|
||
| parsed_dates = [_parse_datetime(dt) for dt in datetime_input] | ||
| if any(dt is None for dt in parsed_dates): | ||
| return None | ||
|
|
||
| if date: | ||
| return "/".join(dt.strftime("%Y-%m-%d") for dt in parsed_dates) | ||
|
|
||
| # Localize naive datetimes to the runner's local zone before converting | ||
| # to UTC; tz-aware datetimes are converted directly. | ||
| utc_dates = [ | ||
| (dt if dt.tzinfo is not None else dt.replace(tzinfo=local_timezone)).astimezone( | ||
| ZoneInfo("UTC") | ||
| ) | ||
| for dt in parsed_dates | ||
| ] | ||
| return "/".join(dt.strftime("%Y-%m-%dT%H:%M:%SZ") for dt in utc_dates) |
There was a problem hiding this comment.
This was the ‘falsy-bug’ class I was specifically guarding against. _format_one checks pd.isna(dt) or dt == "" or dt is None before calling _parse_datetime, so non-string NA endpoints render as ".." rather than reaching .endswith and raising. Covered by test_format_api_dates_open_ended_range_with_none (passes ["2024-01-01", None] and the reverse).
| `date` is False. Inputs with an explicit offset (``Z`` or ``+HH:MM``) are | ||
| converted from that offset to UTC; naive inputs are interpreted in the | ||
| local time zone for backwards compatibility. | ||
| - For date ranges, replaces "nan" with ".." in the output. |
There was a problem hiding this comment.
Fair catch — the wording was stale. Fixed in 3316f7a: dropped the literal-"nan"-replacement bullet and replaced it with one that describes the per-element NA contract (single value → None; range endpoint → ".."; range is None only when every element is NA or any non-NA element fails to parse).
| def test_format_api_dates_passthrough_duration(): | ||
| assert _format_api_dates("P7D") == "P7D" | ||
|
|
||
|
|
There was a problem hiding this comment.
Already covered — see test_format_api_dates_passthrough_time_only_duration at tests/waterdata_utils_test.py:148, which asserts _format_api_dates("PT36H") == "PT36H".
Per copilot review on PR DOI-USGS#247: - Broaden the duration passthrough regex from `^[Pp]\\d` to `^[Pp]T?\\d` so time-only durations like `PT36H` (documented in get_continuous and get_daily as valid `time`/`last_modified` values) are preserved instead of being parsed as datetimes and silently dropped. - Per-element NA handling: a `None` / `NaN` / empty endpoint in a 2-value range now becomes `..` in the output, matching the docstring contract and supporting half-bounded intervals like `[date, None]` and `[None, date]`. Previously, a non-string element would raise from `.endswith` inside `_parse_datetime`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…iso8601 # Conflicts: # tests/waterdata_utils_test.py
Per /simplify review on PR DOI-USGS#247: - Lift `_format_one` from inside `_format_api_dates` to module scope so it sits next to `_parse_datetime` like the file's other helpers, and pass `date` / `local_tz` explicitly instead of capturing them via closure. - Split the parenthesized ternary `(parsed if … else parsed.replace(...)).astimezone(...)` into a named `aware` local plus an explicit `.astimezone(...)` call. - Hoist the duration regex into a module-level `_DURATION_RE = re.compile(r"^[Pp]T?\d")` with the rationale comment attached to the constant rather than buried inside the call site. - Trim the now-redundant comment block before the `_format_one` loop. Validated against the live waterdata API: ISO 8601 with `Z`, plain dates, half-bounded ranges, `P7D` duration passthrough, and numeric- offset-to-UTC conversion all produce correct URLs and matching row counts. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
dataretrieval/waterdata/utils.py:180
_format_api_datesis annotated as acceptingstr | list[str], but the function (and new tests) support NA endpoints likeNone/NaNin 1–2 element ranges. Consider widening the annotation to include optional/NA elements so static type checkers don’t flag valid usage.
def _format_api_dates(
datetime_input: str | list[str], date: bool = False
) -> str | None:
"""
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| @@ -168,7 +214,9 @@ def _format_api_dates( | |||
| - Supports relative period strings (e.g., "P7D") and passes them through | |||
| unchanged. | |||
There was a problem hiding this comment.
Agreed — fixed in 3316f7a. The Notes bullet now distinguishes single-value inputs (NA → None) from range endpoints (NA → ".."), and clarifies that a range only returns None when every element is NA or any non-NA element fails to parse. Also widened the datetime_input annotation to str | list[str | None] | None to match the actual contract.
…docstring - Widen the `datetime_input` annotation to `str | list[str | None] | None` to match the actual contract — range endpoints may be `None`/`NaN`/empty for half-bounded ranges. - Rewrite the Notes bullets to spell out the per-element NA behavior (single value -> None; range endpoint -> ".."; range is None only when every element is NA or a non-NA element fails to parse). - Drop the stale "replaces 'nan' with '..' in the output" bullet — the implementation no longer string-replaces, it handles NA per-element. - Document `PT...` time-only durations alongside `P7D` in the parameter description. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
A single ISO 8601 datetime such as
\"2018-02-12T23:20:50Z\"— the format the docstrings of every getter cite as a validtimeexample — was silently dropped toNone. The function only tried\"%Y-%m-%d %H:%M:%S\"and\"%Y-%m-%d\", neither of which matches a string with aTseparator and trailingZ._construct_api_requeststhen setparams[\"time\"] = None,requestsdropped the param, and the user's time filter was silently discarded — the API fell back to "most recent year".This PR replaces the cascading try/except with a single helper that walks the supported formats in order: ISO 8601 with offset and/or fractional seconds, plain ISO 8601, the legacy space-separated form, and date-only. Tz-aware inputs convert directly to UTC; naive inputs continue to be interpreted in the runner's local zone for backwards compatibility.
It also tightens the duration / interval pass-through. The previous
re.search(r\"P\", ..., re.IGNORECASE)matched any string containing aporPanywhere — words like\"Apr\"or\"sample\"would skip parsing entirely. The check is now anchored to^[Pp]\\d.Test plan
tests/waterdata_utils_test.pycover: ISO 8601 withZ; with fractional seconds; with numeric offset; pair of ISO 8601 datetimes; passthrough of intervals / durations; words containingpare no longer mis-parsed; legacy space-separated form still works; date-only output.Related PRs
Other open PRs in this bug-review series that touch
dataretrieval/waterdata/utils.py(different functions, no functional conflicts):_format_api_datesaccept ISO 8601._arrange_colsstop mutating caller list._handle_stats_nestingtolerate missing drop columns.