Skip to content

Fix _format_api_dates: parse ISO 8601 datetimes#247

Merged
thodson-usgs merged 5 commits intoDOI-USGS:mainfrom
thodson-usgs:fix/format-api-dates-iso8601
May 4, 2026
Merged

Fix _format_api_dates: parse ISO 8601 datetimes#247
thodson-usgs merged 5 commits intoDOI-USGS:mainfrom
thodson-usgs:fix/format-api-dates-iso8601

Conversation

@thodson-usgs
Copy link
Copy Markdown
Collaborator

@thodson-usgs thodson-usgs commented May 3, 2026

Summary

A single ISO 8601 datetime such as \"2018-02-12T23:20:50Z\" — the format the docstrings of every getter cite as a valid time example — was silently dropped to None. The function only tried \"%Y-%m-%d %H:%M:%S\" and \"%Y-%m-%d\", neither of which matches a string with a T separator and trailing Z. _construct_api_requests then set params[\"time\"] = None, requests dropped the param, and the user's time filter was silently discarded — the API fell back to "most recent year".

This PR replaces the cascading try/except with a single helper that walks the supported formats in order: ISO 8601 with offset and/or fractional seconds, plain ISO 8601, the legacy space-separated form, and date-only. Tz-aware inputs convert directly to UTC; naive inputs continue to be interpreted in the runner's local zone for backwards compatibility.

It also tightens the duration / interval pass-through. The previous re.search(r\"P\", ..., re.IGNORECASE) matched any string containing a p or P anywhere — words like \"Apr\" or \"sample\" would skip parsing entirely. The check is now anchored to ^[Pp]\\d.

Test plan

  • Ten new regression tests in tests/waterdata_utils_test.py cover: ISO 8601 with Z; with fractional seconds; with numeric offset; pair of ISO 8601 datetimes; passthrough of intervals / durations; words containing p are no longer mis-parsed; legacy space-separated form still works; date-only output.
  • Full local test suite passes (236 tests).

Related PRs

Other open PRs in this bug-review series that touch dataretrieval/waterdata/utils.py (different functions, no functional conflicts):

A single ISO 8601 datetime such as "2018-02-12T23:20:50Z" — the format
the docstrings of every getter cite as a valid `time` example — was
silently dropped to None. The function only tried "%Y-%m-%d %H:%M:%S"
and "%Y-%m-%d", neither of which matches a string with a `T` separator
and trailing `Z`. `_construct_api_requests` then set `params["time"] =
None`, requests dropped the param, and the user's time filter was
silently discarded — the API fell back to "most recent year".

Replace the cascading try/except with a single helper that walks the
supported formats in order: ISO 8601 with offset and/or fractional
seconds, plain ISO 8601, the legacy space-separated form, and date-only.
Tz-aware inputs convert directly to UTC; naive inputs continue to be
interpreted in the runner's local zone for backwards compatibility.

Also tighten the duration / interval pass-through. The previous
`re.search(r"P", ..., re.IGNORECASE)` matched any string containing a
`p` or `P` anywhere — words like "Apr" or "sample" would skip parsing
entirely. Anchor the check to `^[Pp]\d`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes dataretrieval.waterdata.utils._format_api_dates so ISO 8601/RFC 3339 datetimes like 2018-02-12T23:20:50Z are parsed and preserved (rather than silently dropped), and tightens the duration/interval passthrough logic. It also adds regression tests covering the new/expected parsing behavior.

Changes:

  • Add _parse_datetime plus a supported-format list to parse ISO 8601 variants (incl. Z, fractional seconds, numeric offsets) and legacy formats.
  • Refactor _format_api_dates to use the helper, converting tz-aware inputs to UTC and keeping naive inputs localized for backward compatibility.
  • Add regression tests for ISO 8601 parsing and for avoiding false “duration” matches.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
dataretrieval/waterdata/utils.py Refactors datetime parsing/formatting to correctly handle ISO 8601 inputs and adjusts duration/interval passthrough logic.
tests/waterdata_utils_test.py Adds regression tests validating ISO 8601 parsing behavior and passthrough handling.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread dataretrieval/waterdata/utils.py Outdated
Comment on lines +218 to +223
# Pass through duration ("P7D") and pre-formatted interval ("a/b") strings
# untouched. Anchor the duration check so the bare letter ``P`` / ``p``
# appearing inside a normal word doesn't accidentally bypass parsing.
if len(datetime_input) == 1:
single = datetime_input[0]
if re.match(r"^[Pp]\d", single) or "/" in single:
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved as of 74625a0 — the regex was already widened to ^[Pp]T?\d (allowing the optional T for PT-prefixed durations) and is now hoisted to a module-level _DURATION_RE. PT36H round-trips unchanged. See test_format_api_dates_passthrough_time_only_duration in tests/waterdata_utils_test.py.

Comment thread dataretrieval/waterdata/utils.py Outdated
Comment on lines +223 to +241
if re.match(r"^[Pp]\d", single) or "/" in single:
return single

parsed_dates = [_parse_datetime(dt) for dt in datetime_input]
if any(dt is None for dt in parsed_dates):
return None

if date:
return "/".join(dt.strftime("%Y-%m-%d") for dt in parsed_dates)

# Localize naive datetimes to the runner's local zone before converting
# to UTC; tz-aware datetimes are converted directly.
utc_dates = [
(dt if dt.tzinfo is not None else dt.replace(tzinfo=local_timezone)).astimezone(
ZoneInfo("UTC")
)
for dt in parsed_dates
]
return "/".join(dt.strftime("%Y-%m-%dT%H:%M:%SZ") for dt in utc_dates)
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was the ‘falsy-bug’ class I was specifically guarding against. _format_one checks pd.isna(dt) or dt == "" or dt is None before calling _parse_datetime, so non-string NA endpoints render as ".." rather than reaching .endswith and raising. Covered by test_format_api_dates_open_ended_range_with_none (passes ["2024-01-01", None] and the reverse).

Comment thread dataretrieval/waterdata/utils.py Outdated
`date` is False. Inputs with an explicit offset (``Z`` or ``+HH:MM``) are
converted from that offset to UTC; naive inputs are interpreted in the
local time zone for backwards compatibility.
- For date ranges, replaces "nan" with ".." in the output.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair catch — the wording was stale. Fixed in 3316f7a: dropped the literal-"nan"-replacement bullet and replaced it with one that describes the per-element NA contract (single value → None; range endpoint → ".."; range is None only when every element is NA or any non-NA element fails to parse).

def test_format_api_dates_passthrough_duration():
assert _format_api_dates("P7D") == "P7D"


Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already covered — see test_format_api_dates_passthrough_time_only_duration at tests/waterdata_utils_test.py:148, which asserts _format_api_dates("PT36H") == "PT36H".

thodson-usgs and others added 3 commits May 4, 2026 10:06
Per copilot review on PR DOI-USGS#247:

- Broaden the duration passthrough regex from `^[Pp]\\d` to `^[Pp]T?\\d`
  so time-only durations like `PT36H` (documented in get_continuous and
  get_daily as valid `time`/`last_modified` values) are preserved
  instead of being parsed as datetimes and silently dropped.
- Per-element NA handling: a `None` / `NaN` / empty endpoint in a 2-value
  range now becomes `..` in the output, matching the docstring contract
  and supporting half-bounded intervals like `[date, None]` and
  `[None, date]`. Previously, a non-string element would raise from
  `.endswith` inside `_parse_datetime`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…iso8601

# Conflicts:
#	tests/waterdata_utils_test.py
Per /simplify review on PR DOI-USGS#247:
- Lift `_format_one` from inside `_format_api_dates` to module scope so
  it sits next to `_parse_datetime` like the file's other helpers, and
  pass `date` / `local_tz` explicitly instead of capturing them via
  closure.
- Split the parenthesized ternary
  `(parsed if … else parsed.replace(...)).astimezone(...)` into a
  named `aware` local plus an explicit `.astimezone(...)` call.
- Hoist the duration regex into a module-level
  `_DURATION_RE = re.compile(r"^[Pp]T?\d")` with the rationale comment
  attached to the constant rather than buried inside the call site.
- Trim the now-redundant comment block before the `_format_one` loop.

Validated against the live waterdata API: ISO 8601 with `Z`, plain
dates, half-bounded ranges, `P7D` duration passthrough, and numeric-
offset-to-UTC conversion all produce correct URLs and matching row
counts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

dataretrieval/waterdata/utils.py:180

  • _format_api_dates is annotated as accepting str | list[str], but the function (and new tests) support NA endpoints like None/NaN in 1–2 element ranges. Consider widening the annotation to include optional/NA elements so static type checkers don’t flag valid usage.
def _format_api_dates(
    datetime_input: str | list[str], date: bool = False
) -> str | None:
    """

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread dataretrieval/waterdata/utils.py Outdated
Comment on lines 213 to 215
@@ -168,7 +214,9 @@ def _format_api_dates(
- Supports relative period strings (e.g., "P7D") and passes them through
unchanged.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed — fixed in 3316f7a. The Notes bullet now distinguishes single-value inputs (NA → None) from range endpoints (NA → ".."), and clarifies that a range only returns None when every element is NA or any non-NA element fails to parse. Also widened the datetime_input annotation to str | list[str | None] | None to match the actual contract.

…docstring

- Widen the `datetime_input` annotation to `str | list[str | None] | None`
  to match the actual contract — range endpoints may be `None`/`NaN`/empty
  for half-bounded ranges.
- Rewrite the Notes bullets to spell out the per-element NA behavior
  (single value -> None; range endpoint -> ".."; range is None only when
  every element is NA or a non-NA element fails to parse).
- Drop the stale "replaces 'nan' with '..' in the output" bullet — the
  implementation no longer string-replaces, it handles NA per-element.
- Document `PT...` time-only durations alongside `P7D` in the parameter
  description.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@thodson-usgs thodson-usgs marked this pull request as ready for review May 4, 2026 20:22
@thodson-usgs thodson-usgs merged commit 2a43a50 into DOI-USGS:main May 4, 2026
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants