Fix get_nearest_continuous: accept scalar targets and missing time column#251
Draft
thodson-usgs wants to merge 4 commits intoDOI-USGS:mainfrom
Draft
Fix get_nearest_continuous: accept scalar targets and missing time column#251thodson-usgs wants to merge 4 commits intoDOI-USGS:mainfrom
thodson-usgs wants to merge 4 commits intoDOI-USGS:mainfrom
Conversation
…lumn
The docstring says ``targets`` accepts "anything ``pandas.to_datetime``
consumes", which includes a bare string or ``pd.Timestamp``. But
``pd.to_datetime("2024-01-01T00:00:00Z", utc=True)`` returns a scalar
``Timestamp``, and ``pd.DatetimeIndex(scalar)`` raises ``TypeError`` —
so single-value cases crashed despite the documented contract.
Wrap a scalar result in a one-element ``DatetimeIndex`` so any
``pandas.to_datetime``-consumable input works.
Also: when the user passes ``properties`` that excludes ``time``, the
helper used to crash with ``KeyError`` deep inside ``df.assign``. Detect
the missing column up front and raise a ``ValueError`` pointing at the
likely cause.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor
There was a problem hiding this comment.
Pull request overview
This PR updates waterdata.get_nearest_continuous to better match its documented input contract by accepting scalar targets inputs (e.g., a single string or pd.Timestamp) and by failing fast with a clearer error when the get_continuous response lacks a required time column.
Changes:
- Added
_coerce_targetsto normalize scalartargetsinto a one-elementDatetimeIndex. - Added an explicit
timecolumn presence check with a helpfulValueErrorwhen missing (commonly due topropertiesexcluding it). - Added regression tests for scalar
targetsand the missing-timeerror path.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
dataretrieval/waterdata/nearest.py |
Adds _coerce_targets and validates presence of time in the returned DataFrame before processing. |
tests/waterdata_nearest_test.py |
Adds regression coverage for scalar targets inputs and for a missing time column in responses. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| parsed = pd.to_datetime(targets, utc=True) | ||
| if isinstance(parsed, pd.DatetimeIndex): | ||
| return parsed | ||
| return pd.DatetimeIndex([parsed]) |
Comment on lines
+285
to
+287
| assert result["target_time"].iloc[0] == pd.Timestamp( | ||
| "2023-06-15T10:30:31Z", tz="UTC" | ||
| ) |
Comment on lines
+296
to
+297
| target = pd.Timestamp("2023-06-15T10:30:31Z", tz="UTC") | ||
| result, _ = get_nearest_continuous(target, monitoring_location_id="USGS-02238500") |
Per copilot review on PR DOI-USGS#251: - _coerce_targets: detect non-DatetimeIndex iterables (Series, ndarray) via pd.api.types.is_scalar so the elements are preserved instead of being wrapped in a single-element list. Add a regression test passing a pd.Series of two timestamps and assert both are processed. - Tests: drop the redundant tz='UTC' on pd.Timestamp inputs that already carry a Z suffix; pandas 2.x raises on double timezone specification. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
The docstring says
targetsaccepts "anythingpandas.to_datetimeconsumes", which includes a bare string orpd.Timestamp. Butpd.to_datetime("2024-01-01T00:00:00Z", utc=True)returns a scalarTimestamp, andpd.DatetimeIndex(scalar)raisesTypeError— so single-value cases crashed despite the documented contract.This PR:
_coerce_targetshelper that wraps a scalar result in a one-elementDatetimeIndexso anypandas.to_datetime-consumable input works.timecolumn (e.g. the user passedpropertiesthat excluded it) up front and raisesValueErrorwith a pointer at the likely cause, instead of crashing withKeyErrordeep insidedf.assign.Minimal reproducible example
Pre-fix bug (scalar target):
Post-fix, live API:
Test plan
tests/waterdata_nearest_test.py: single string target round-trips; singlepd.Timestamptarget round-trips; response missing thetimecolumn raises a helpfulValueError.tests/waterdata_nearest_test.pysuite (23 tests) passes.targets="2024-01-01T12:00:00Z"returns the expected single-row DataFrame.