Skip to content

Fix get_nearest_continuous: accept scalar targets and missing time column#251

Draft
thodson-usgs wants to merge 4 commits intoDOI-USGS:mainfrom
thodson-usgs:fix/nearest-continuous-crashes
Draft

Fix get_nearest_continuous: accept scalar targets and missing time column#251
thodson-usgs wants to merge 4 commits intoDOI-USGS:mainfrom
thodson-usgs:fix/nearest-continuous-crashes

Conversation

@thodson-usgs
Copy link
Copy Markdown
Collaborator

@thodson-usgs thodson-usgs commented May 3, 2026

Summary

The docstring says targets accepts "anything pandas.to_datetime consumes", which includes a bare string or pd.Timestamp. But pd.to_datetime("2024-01-01T00:00:00Z", utc=True) returns a scalar Timestamp, and pd.DatetimeIndex(scalar) raises TypeError — so single-value cases crashed despite the documented contract.

This PR:

  • Adds a small _coerce_targets helper that wraps a scalar result in a one-element DatetimeIndex so any pandas.to_datetime-consumable input works.
  • Detects when the response is missing a time column (e.g. the user passed properties that excluded it) up front and raises ValueError with a pointer at the likely cause, instead of crashing with KeyError deep inside df.assign.

Minimal reproducible example

Pre-fix bug (scalar target):

import pandas as pd

# This is the relevant pre-fix construct from get_nearest_continuous:
pd.DatetimeIndex(pd.to_datetime("2024-01-01T00:00:00Z", utc=True))
# TypeError: DatetimeIndex(...) must be called with a collection of some kind,
#            Timestamp('2024-01-01 00:00:00+0000', tz='UTC') was passed

pd.DatetimeIndex(pd.to_datetime(pd.Timestamp("2024-01-01"), utc=True))
# Same TypeError.

# Only a list/tuple worked:
pd.DatetimeIndex(pd.to_datetime(["2024-01-01"], utc=True))  # OK

Post-fix, live API:

from dataretrieval.waterdata import get_nearest_continuous

df, md = get_nearest_continuous(
    monitoring_location_id="USGS-01646500",
    parameter_code="00060",
    targets="2024-01-01T12:00:00Z",   # bare string — formerly crashed
)
print(df[["time", "value"]])
#                        time  value
# 0 2024-01-01 12:00:00+00:00   9260

Test plan

  • Three new regression tests in tests/waterdata_nearest_test.py: single string target round-trips; single pd.Timestamp target round-trips; response missing the time column raises a helpful ValueError.
  • Full tests/waterdata_nearest_test.py suite (23 tests) passes.
  • Live verification against the USGS waterdata API: scalar targets="2024-01-01T12:00:00Z" returns the expected single-row DataFrame.

…lumn

The docstring says ``targets`` accepts "anything ``pandas.to_datetime``
consumes", which includes a bare string or ``pd.Timestamp``. But
``pd.to_datetime("2024-01-01T00:00:00Z", utc=True)`` returns a scalar
``Timestamp``, and ``pd.DatetimeIndex(scalar)`` raises ``TypeError`` —
so single-value cases crashed despite the documented contract.

Wrap a scalar result in a one-element ``DatetimeIndex`` so any
``pandas.to_datetime``-consumable input works.

Also: when the user passes ``properties`` that excludes ``time``, the
helper used to crash with ``KeyError`` deep inside ``df.assign``. Detect
the missing column up front and raise a ``ValueError`` pointing at the
likely cause.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates waterdata.get_nearest_continuous to better match its documented input contract by accepting scalar targets inputs (e.g., a single string or pd.Timestamp) and by failing fast with a clearer error when the get_continuous response lacks a required time column.

Changes:

  • Added _coerce_targets to normalize scalar targets into a one-element DatetimeIndex.
  • Added an explicit time column presence check with a helpful ValueError when missing (commonly due to properties excluding it).
  • Added regression tests for scalar targets and the missing-time error path.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
dataretrieval/waterdata/nearest.py Adds _coerce_targets and validates presence of time in the returned DataFrame before processing.
tests/waterdata_nearest_test.py Adds regression coverage for scalar targets inputs and for a missing time column in responses.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread dataretrieval/waterdata/nearest.py Outdated
parsed = pd.to_datetime(targets, utc=True)
if isinstance(parsed, pd.DatetimeIndex):
return parsed
return pd.DatetimeIndex([parsed])
Comment thread tests/waterdata_nearest_test.py Outdated
Comment on lines +285 to +287
assert result["target_time"].iloc[0] == pd.Timestamp(
"2023-06-15T10:30:31Z", tz="UTC"
)
Comment thread tests/waterdata_nearest_test.py Outdated
Comment on lines +296 to +297
target = pd.Timestamp("2023-06-15T10:30:31Z", tz="UTC")
result, _ = get_nearest_continuous(target, monitoring_location_id="USGS-02238500")
thodson-usgs and others added 3 commits May 4, 2026 10:09
Per copilot review on PR DOI-USGS#251:

- _coerce_targets: detect non-DatetimeIndex iterables (Series, ndarray)
  via pd.api.types.is_scalar so the elements are preserved instead of
  being wrapped in a single-element list. Add a regression test passing
  a pd.Series of two timestamps and assert both are processed.
- Tests: drop the redundant tz='UTC' on pd.Timestamp inputs that already
  carry a Z suffix; pandas 2.x raises on double timezone specification.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants