Skip to content

Fix _arrange_cols: stop mutating the caller's properties list#254

Merged
thodson-usgs merged 6 commits intoDOI-USGS:mainfrom
thodson-usgs:fix/arrange-cols-mutation
May 5, 2026
Merged

Fix _arrange_cols: stop mutating the caller's properties list#254
thodson-usgs merged 6 commits intoDOI-USGS:mainfrom
thodson-usgs:fix/arrange-cols-mutation

Conversation

@thodson-usgs
Copy link
Copy Markdown
Collaborator

@thodson-usgs thodson-usgs commented May 3, 2026

Summary

_arrange_cols did properties.append("geometry") and properties[properties.index("id")] = output_id in place. Calling any getter twice with the same properties=[...] list therefore caused that list to grow unboundedly and have id rewritten across calls — a real action-at-a-distance defect for any user code that re-uses a properties list across requests.

This PR takes a local copy of the list and mutates that. The caller's list is untouched.

Minimal reproducible example

A common pattern: define a properties list once and reuse it for two different getters.

from dataretrieval.waterdata import get_daily, get_continuous

cols = ["id", "monitoring_location_id", "value", "time"]

# Call 1 - get_daily mutates `cols` in place
get_daily(
    monitoring_location_id="USGS-05427718",
    parameter_code="00060",
    time="2025-01-01/2025-01-02",
    properties=cols,
)
# cols is now ['daily_id', 'monitoring_location_id', 'value', 'time', 'geometry']

# Call 2 - same list, different getter, hits the live API with the
# already-mutated names
get_continuous(
    monitoring_location_id="USGS-06904500",
    parameter_code="00065",
    time="2025-01-01T00:00:00Z/2025-01-01T01:00:00Z",
    properties=cols,
)
# RuntimeError: 400: InvalidParameterValue. unknown properties specified.

The 'id' -> 'daily_id' rewrite from call 1 leaks into the wire request for get_continuous, which doesn't recognize daily_id as a valid property and rejects the entire request. With this PR, the same code succeeds — cols is byte-for-byte unchanged after each call. The same defect surfaces in three more everyday patterns (loop over sites with shared list, get_continuous -> get_daily, get_daily -> get_monitoring_locations); all four fail with 400 against the live API on main and all four succeed on this branch.

Test plan

  • Three new regression tests in tests/waterdata_utils_test.py: caller list is unchanged after two consecutive calls; 'id' still resolves to the output_id column in the result; geometry is still tacked on when present in the response.
  • Existing tests in the file still pass.
  • Full local test suite passes (with API_USGS_PAT set).
  • Verified live against api.waterdata.usgs.gov that the four cross-getter / loop scenarios above fail with 400 on main and succeed on this branch.

Related PRs

Other open PRs in this bug-review series that touch dataretrieval/waterdata/utils.py (different functions, no functional conflicts):

thodson-usgs and others added 2 commits May 3, 2026 15:20
`_arrange_cols` did `properties.append("geometry")` and
`properties[properties.index("id")] = output_id` in place. Calling any
getter twice with the same `properties=[...]` therefore caused that
list to grow unboundedly and have `id` rewritten across calls — a
real action-at-a-distance defect for any code that re-uses a
`properties` list across requests.

Take a local copy of the list and mutate that. Caller's list is
untouched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drop the regression-narrative phrasing for the local-copy comment and
trim the multi-line "id is technically a valid column..." block to a
two-line WHY explaining the rename intent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes an action-at-a-distance bug in dataretrieval.waterdata.utils._arrange_cols where the function mutated the caller-provided properties list in-place, causing surprising behavior when the same list instance is reused across multiple requests.

Changes:

  • Stop mutating the caller’s properties list by copying it to a local list before applying id/geometry adjustments.
  • Add regression/behavior tests covering non-mutation, idoutput_id behavior, and retention of geometry when present.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
dataretrieval/waterdata/utils.py Copies properties before mutation to prevent side effects across calls.
tests/waterdata_utils_test.py Adds regression + behavior tests validating the fixed _arrange_cols semantics.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread dataretrieval/waterdata/utils.py Outdated
@@ -687,15 +687,15 @@ def _arrange_cols(
# If properties are provided, filter to only those columns
# plus geometry if skip_geometry is False
thodson-usgs and others added 2 commits May 4, 2026 10:11
Per copilot review on PR DOI-USGS#254. _arrange_cols doesn't take skip_geometry
- it conditionally keeps the geometry column based on whether one is
present in the DataFrame.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tion

# Conflicts:
#	tests/waterdata_utils_test.py
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

thodson-usgs and others added 2 commits May 5, 2026 10:19
The "If properties are provided, filter to only those columns,
appending the geometry column when present" block restated what the
code below is already self-evidently doing. The two retained comments
in the body are the load-bearing WHY-comments (defensive copy
rationale, id-rename rationale).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tation

# Conflicts:
#	tests/waterdata_utils_test.py
@thodson-usgs thodson-usgs marked this pull request as ready for review May 5, 2026 15:25
@thodson-usgs thodson-usgs merged commit dd70cfa into DOI-USGS:main May 5, 2026
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants