839: Add support for reference geometry for geo questions by lindsay-stevens · Pull Request #847 · XLSForm/pyxform

lindsay-stevens · 2026-06-17T14:35:04Z

Closes #839

Why is this the best possible solution? Were any other approaches considered?

Initial pass for feedback
Includes refactoring to clarify existing nodeset generation

What are the regression risks?

The refactoring may have missed some usage pattern that is not currently in the test suite.

Does this change require updates to documentation? If so, please file an issue here and include the link below.

Spec updates TBC getodk/xforms-spec#343 but outline in linked forum post.

Before submitting this PR, please make sure you have:

included test cases for core behavior and edge cases in tests
run python -m unittest and verified all tests pass
run ruff format pyxform tests and ruff check pyxform tests to lint code
verified that any code or assets from external sources are properly credited in comments

- range.py: - pv.validate doesn't change anything so remove assignment. - question.py already processes all attributes in control dict so put the parameters for that location in there. Still useful to have the parsed parameters available separately though. - parameters.py - change signature since validator doesn't change input. - question.py - refactor MultipleChoiceQuestion.build_xml into steps that correspond to the supported options, in a way that can be selectively reused depending on what the itemset use case supports. - simplify RangeQuestion to: - not re-apply parameter attributes already set via control dict. - use the itemset helper to try only internal choices since that seemed to be the intent of the previous code. - remove bind type check since this is set internally, or alternatively if it is user-modifiable then it should be checked earlier and/or have tests added.

- builder.py: - use the presence of just "itemset" as a way to trigger pulling in the choices info, rather than needing to attach choices as well which are ignored in that case anyway. - range.py: - use the above change to simplify the feature signal for the itemset. - question.py - use the pattern in _build_itemset_instance a fall-through default rather than trying to proactively select whether it's needed. - fold the itemset node functionality into a question subclass since that seems a bit neater than using a helper class, and also allows sharing the common __init__ code.

- for geopoint, geoshape, and geotrace, add a parameter 'reference-geo' which accepts a choices list_name, an entities list_name, a reference variable containing the name of a repeat, or the name of an attached CSV file. The parameter value is used to build an itemset node that references source data instance, optionally with a choice_filter.

lognaturel

Thanks so much! Folks have been trying out the beta implementation in Collect and are loving this functionality.

I know this isn't your favorite spec but I like how you've introduced a layer in the question hierarchy for itemset-capable question types, it makes a lot of sense and led to some useful cleanup.

Details below, the main things are:

parameter name should be reference-geometry
any external secondary instance should be supported, regardless of the filename extension
let's drop support for the special itemsets.csv if it's practical

lognaturel · 2026-06-17T20:16:43Z

    trigger_references: list[tuple[str, int]] = []
-    repeat_names = set()
+    geo_references: list[tuple[str, int]] = []
+    csv_sources: set[str] = set()


It could also be XML or geojson sources -- any named secondary instance -- whether it's internal or external or what the file format is shouldn't matter.

I thought pyxform would have a way to list these but now that I think about it we don't validate the parameter to instance calls or anything like that, do we? Can we rely on the code that outputs all the instance declarations by any chance?

Updated validation and test cases to allow all external instance file types in 7b1799b. Since it became a bit lengthy I put some follow-up questions on scope in this comment.

lognaturel · 2026-06-17T20:17:04Z

+            "[row : {row}] On the 'survey' sheet, the 'parameters' value is invalid. "
+            "For geo questions, the 'reference-geo' parameter must be either: "
+            "a choices list_name, an entities list_name, a reference variable containing "
+            "the name of a repeat, or the name of an attached CSV file."


"an attached CSV file" -> "an attached file" (could be XML or geojson)

Updated the error message along these lines in 7b1799b

lognaturel · 2026-06-17T20:35:08Z

+        | survey |
+        | | type                         | name | label | parameters           |
+        | | select_one_from_file s1.csv  | q1   | Q1    |                      |
+        | | {type}                       | q2   | Q2    | reference-geo=s1.csv |


I think it would be more consistent with other usage to use the instance name without the extension. It feels kind of similar to writing an instance('s1') expression where we don't use the extension.

It's true that there's an extension for select_one_from_file s1.csv but that feels to me like it's because we conflated attaching and querying the file. For reference-geometry the file must already be attached.

Updated the usage pattern and tests accordingly in 7b1799b

lognaturel · 2026-06-17T20:37:23Z

+                    ],
+                )
+
+    def test_with_reference_geo__select_one_external__ok(self):


I would prefer not to support this case. I think it's very unlikely someone would want it and in general we can treat the DB-backed CSV querying features as legacy.

In 7b1799b I removed the special case for this from validate_parameter_reference_geometry, although the tests show that the select_one_external / external_choices approach doesn't work on it's own anyway - like with entities it needs a csv-external to trigger the auto-generated secondary instance, which allows the nodeset to reference a file named itemsets.csv.

Unless a regression has snuck in somewhere, the effect of select_one_external / external_choices seems to be just to emit an input control with an instance() expression in a query attribute - and then presumably Collect does some magic to auto-wire that to itemsets.csv without a secondary instance.

I'm not sure it's OK to block the select_one_external / external_choices usage pattern entirely because a user might have a file called itemsets.csv that they created independently. In other words we could say in an error "you cannot use the secondary instance name itemsets with reference-geometry" or "you cannot (do that) if there is an external_choices sheet", which seems potentially noisy?

Collect does some magic to auto-wire that to itemsets.csv without a secondary instance

Exactly.

a user might have a file called itemsets.csv that they created independently

That makes sense. And in that case it would be the regular CSV file code path, right? I guess what I'm saying is let's not include itemsets as a special case for select_one_external, let's let it get considered only if there's e.g. an explicit select_one_from_file itemsets.csv.

- per updated requirements

- previously just allowed csv but the updated requirements are to support any named secondary instance, and to do so using the instance name rather than the file name. - there are also the possibilities of instances generated for pulldata calls and ${last-saved#q} references, but that parsing code and logic is in survey.py which would require a larger refactor so it is excluded from this commit. - test_geo.py: - tightened "extra_q_assertions" to check for the parameter name as an attribute, since that is really the intent of that assertion, and having a count there might conflict with future xform changes.

- moved out mainly to support importing from other modules that xls2json depends on, but it turned out that it was not needed. In particular, as noted in the function body, it supports various legacy scenarios which are not relevant to newer features such as itemset referencing in range or geo parameters.

lognaturel · 2026-06-19T04:31:31Z

Thanks, the functionality is looking complete to me, now!

lognaturel

Looks great! ✨

The least satisfying part for me is the validation of secondary instances but any alternative I can think of feels pretty invasive and not worth it. The best I can come up with is some kind of secondary instance registry that every secondary instance related construct could register into.

- geo.py: - add docstrings, use clearer param names, use constants - avoid use of dict().update({}) and update dict directly - convert existing errors to errors.py module with row refs, and apply corresponding changes to: - errors.py - test_allow_mock_accuracy.py - range.py: - add docstring, use constants - xls2json.py: - don't add to secondary_instances tracking for select_one_external, which allows test_not_supported__select_one_external__error to pass - test_geo.py: - add requirements docstring for reference-geometry - add tests for unsupported cases related to secondary instances

- test_geo.py: - add assertions that the target nodeset exists in the output

lindsay-stevens added 6 commits June 16, 2026 16:39

chg: organise choices and range code into validators module

c9fc5c8

chg: organise range and geo code into validators modules

f80d7c4

chg: combine range xpath assertions, add translation test case

aa7ab86

lindsay-stevens force-pushed the pyxform-839 branch from a913442 to 6ed0215 Compare June 17, 2026 14:55

lognaturel reviewed Jun 17, 2026

View reviewed changes

lindsay-stevens added 3 commits June 18, 2026 14:01

chg: rename reference-geo parameter to reference-geometry

b31e271

- per updated requirements

This was referenced Jun 18, 2026

Create a framework for adding parameters #592

Open

Add support for reference geometry for geo questions #839

Open

lognaturel approved these changes Jun 22, 2026

View reviewed changes

lindsay-stevens added 2 commits June 23, 2026 02:55

chg: tidying implementation of reference-geometry

dd67a6c

- test_geo.py: - add assertions that the target nodeset exists in the output

Conversation

lindsay-stevens commented Jun 17, 2026

Why is this the best possible solution? Were any other approaches considered?

What are the regression risks?

Does this change require updates to documentation? If so, please file an issue here and include the link below.

Before submitting this PR, please make sure you have:

Uh oh!

lognaturel left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

lognaturel commented Jun 19, 2026

Uh oh!

lognaturel left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

lognaturel left a comment •

edited

Loading