839: Add support for reference geometry for geo questions#847
839: Add support for reference geometry for geo questions#847lindsay-stevens wants to merge 11 commits into
Conversation
- range.py:
- pv.validate doesn't change anything so remove assignment.
- question.py already processes all attributes in control dict so
put the parameters for that location in there. Still useful to have
the parsed parameters available separately though.
- parameters.py
- change signature since validator doesn't change input.
- question.py
- refactor MultipleChoiceQuestion.build_xml into steps that correspond
to the supported options, in a way that can be selectively reused
depending on what the itemset use case supports.
- simplify RangeQuestion to:
- not re-apply parameter attributes already set via control dict.
- use the itemset helper to try only internal choices since that
seemed to be the intent of the previous code.
- remove bind type check since this is set internally, or
alternatively if it is user-modifiable then it should be checked
earlier and/or have tests added.
- builder.py:
- use the presence of just "itemset" as a way to trigger pulling in
the choices info, rather than needing to attach choices as well
which are ignored in that case anyway.
- range.py:
- use the above change to simplify the feature signal for the itemset.
- question.py
- use the pattern in _build_itemset_instance a fall-through default
rather than trying to proactively select whether it's needed.
- fold the itemset node functionality into a question subclass since
that seems a bit neater than using a helper class, and also allows
sharing the common __init__ code.
- for geopoint, geoshape, and geotrace, add a parameter 'reference-geo' which accepts a choices list_name, an entities list_name, a reference variable containing the name of a repeat, or the name of an attached CSV file. The parameter value is used to build an itemset node that references source data instance, optionally with a choice_filter.
a913442 to
6ed0215
Compare
There was a problem hiding this comment.
Thanks so much! Folks have been trying out the beta implementation in Collect and are loving this functionality.
I know this isn't your favorite spec but I like how you've introduced a layer in the question hierarchy for itemset-capable question types, it makes a lot of sense and led to some useful cleanup.
Details below, the main things are:
- parameter name should be
reference-geometry - any external secondary instance should be supported, regardless of the filename extension
- let's drop support for the special itemsets.csv if it's practical
| trigger_references: list[tuple[str, int]] = [] | ||
| repeat_names = set() | ||
| geo_references: list[tuple[str, int]] = [] | ||
| csv_sources: set[str] = set() |
There was a problem hiding this comment.
It could also be XML or geojson sources -- any named secondary instance -- whether it's internal or external or what the file format is shouldn't matter.
I thought pyxform would have a way to list these but now that I think about it we don't validate the parameter to instance calls or anything like that, do we? Can we rely on the code that outputs all the instance declarations by any chance?
There was a problem hiding this comment.
Updated validation and test cases to allow all external instance file types in 7b1799b. Since it became a bit lengthy I put some follow-up questions on scope in this comment.
| "[row : {row}] On the 'survey' sheet, the 'parameters' value is invalid. " | ||
| "For geo questions, the 'reference-geo' parameter must be either: " | ||
| "a choices list_name, an entities list_name, a reference variable containing " | ||
| "the name of a repeat, or the name of an attached CSV file." |
There was a problem hiding this comment.
"an attached CSV file" -> "an attached file" (could be XML or geojson)
There was a problem hiding this comment.
Updated the error message along these lines in 7b1799b
| | survey | | ||
| | | type | name | label | parameters | | ||
| | | select_one_from_file s1.csv | q1 | Q1 | | | ||
| | | {type} | q2 | Q2 | reference-geo=s1.csv | |
There was a problem hiding this comment.
I think it would be more consistent with other usage to use the instance name without the extension. It feels kind of similar to writing an instance('s1') expression where we don't use the extension.
It's true that there's an extension for select_one_from_file s1.csv but that feels to me like it's because we conflated attaching and querying the file. For reference-geometry the file must already be attached.
There was a problem hiding this comment.
Updated the usage pattern and tests accordingly in 7b1799b
| ], | ||
| ) | ||
|
|
||
| def test_with_reference_geo__select_one_external__ok(self): |
There was a problem hiding this comment.
I would prefer not to support this case. I think it's very unlikely someone would want it and in general we can treat the DB-backed CSV querying features as legacy.
There was a problem hiding this comment.
In 7b1799b I removed the special case for this from validate_parameter_reference_geometry, although the tests show that the select_one_external / external_choices approach doesn't work on it's own anyway - like with entities it needs a csv-external to trigger the auto-generated secondary instance, which allows the nodeset to reference a file named itemsets.csv.
Unless a regression has snuck in somewhere, the effect of select_one_external / external_choices seems to be just to emit an input control with an instance() expression in a query attribute - and then presumably Collect does some magic to auto-wire that to itemsets.csv without a secondary instance.
I'm not sure it's OK to block the select_one_external / external_choices usage pattern entirely because a user might have a file called itemsets.csv that they created independently. In other words we could say in an error "you cannot use the secondary instance name itemsets with reference-geometry" or "you cannot (do that) if there is an external_choices sheet", which seems potentially noisy?
There was a problem hiding this comment.
Collect does some magic to auto-wire that to itemsets.csv without a secondary instance
Exactly.
a user might have a file called itemsets.csv that they created independently
That makes sense. And in that case it would be the regular CSV file code path, right? I guess what I'm saying is let's not include itemsets as a special case for select_one_external, let's let it get considered only if there's e.g. an explicit select_one_from_file itemsets.csv.
- per updated requirements
- previously just allowed csv but the updated requirements are to
support any named secondary instance, and to do so using the instance
name rather than the file name.
- there are also the possibilities of instances generated for
pulldata calls and ${last-saved#q} references, but that parsing code
and logic is in survey.py which would require a larger refactor so
it is excluded from this commit.
- test_geo.py:
- tightened "extra_q_assertions" to check for the parameter name as an
attribute, since that is really the intent of that assertion, and
having a count there might conflict with future xform changes.
- moved out mainly to support importing from other modules that xls2json depends on, but it turned out that it was not needed. In particular, as noted in the function body, it supports various legacy scenarios which are not relevant to newer features such as itemset referencing in range or geo parameters.
|
Thanks, the functionality is looking complete to me, now! |
lognaturel
left a comment
There was a problem hiding this comment.
Looks great! ✨
The least satisfying part for me is the validation of secondary instances but any alternative I can think of feels pretty invasive and not worth it. The best I can come up with is some kind of secondary instance registry that every secondary instance related construct could register into.
- geo.py:
- add docstrings, use clearer param names, use constants
- avoid use of dict().update({}) and update dict directly
- convert existing errors to errors.py module with row refs, and
apply corresponding changes to:
- errors.py
- test_allow_mock_accuracy.py
- range.py:
- add docstring, use constants
- xls2json.py:
- don't add to secondary_instances tracking for select_one_external,
which allows test_not_supported__select_one_external__error to pass
- test_geo.py:
- add requirements docstring for reference-geometry
- add tests for unsupported cases related to secondary instances
- test_geo.py: - add assertions that the target nodeset exists in the output
Closes #839
Why is this the best possible solution? Were any other approaches considered?
What are the regression risks?
The refactoring may have missed some usage pattern that is not currently in the test suite.
Does this change require updates to documentation? If so, please file an issue here and include the link below.
Spec updates TBC getodk/xforms-spec#343 but outline in linked forum post.
Before submitting this PR, please make sure you have:
testspython -m unittestand verified all tests passruff format pyxform testsandruff check pyxform teststo lint code