Releases: The-Strategy-Unit/nhp_data
v4.4.0
What's Changed
Added new population catchments #227
We now use the same logic for creating catchment populations for both the inputs rates data and the model demographics module. The logic has been slightly refined to be closer to OHID's earlier work to create hospital catchments.
Abstract inputs geography type #230
Inputs data is now available at both provider level and local authority (LAD23CD) level.
We also now include all providers in the inputs data, not just acute providers (#233)
Removes mental health providers from tables #231
By including all providers in the inputs data, we introduced a new issue where mean lengths of stay for inpatients were being skewed by Mental Health providers, where patients can have significantly greater lengths of stay than in acute providers. We remove these providers from all of our data extracts.
We do this slightly later for ECDS data though, as some providers which are classed as Mental Health do run walk-in centres/minor injury units (#234).
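To illustrate why the mean length of stay was skewed, here is a minimal sketch using made-up spells. The provider types, values and the `mean_los` helper are all assumptions for illustration and are not taken from the nhp_data code.

```python
from statistics import mean

# Hypothetical spells: (provider_type, length_of_stay_in_days).
spells = [
    ("acute", 1),
    ("acute", 2),
    ("acute", 3),
    ("acute", 2),
    ("mental_health", 92),
]


def mean_los(rows, exclude_types=()):
    """Mean length of stay, optionally excluding some provider types."""
    stays = [los for provider_type, los in rows if provider_type not in exclude_types]
    return mean(stays)


# One long mental health stay dominates the mean across all providers.
mean_all = mean_los(spells)
# Excluding mental health providers gives a mean typical of acute stays.
mean_acute_only = mean_los(spells, exclude_types=("mental_health",))
```

With these made-up numbers, a single long stay moves the mean from 2 days to 20 days.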
Other PRs
- fix typing issues by @tomjemmett in #228
- removes deprecated function by @tomjemmett in #229
- fix typing issues for ty 0.0.12 by @tomjemmett in #232
Full Changelog: v4.3.0...v4.4.0
v4.3.0
What's Changed
- Abstract table names by @tomjemmett in #225
- adds filter to remove long los (data quality) by @tomjemmett in #226
Full Changelog: v4.2.1...v4.3.0
v4.2.1
What's Changed
- adds script to generate synthetic data by @tomjemmett in #217
- uses correct specialty code by @tomjemmett in #219
Full Changelog: v4.2.0...v4.2.1
v4.2.0
What's Changed
Refactored to be a Python package
The code is now a Python package that can be built and deployed to Databricks.
Switch to asset bundle deployments
All of the workflows are now deployed via GitHub Actions on push to main.
Implemented inequalities changes
Inequalities is now calculated by both provider and ICB.
Data changes
- we were previously dropping some inpatients rows when they had a missing hsagrp. We now introduce an "unknown" group, so a very small number of additional rows are included
- we move the filtering of speldur to earlier in the pipeline. This only has a small impact on some of the mitigators and inputs data
- we have updated the day procedures code list to use the same filtering on procedure codes as the has_procedures column in inpatients, and it is now based on 2023/24 rather than 2019/20.
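The hsagrp change above can be sketched as follows. This is only an illustration in plain Python: the rows, the group names and both helper functions are hypothetical, and the real pipeline will look different.

```python
from collections import Counter

# Hypothetical inpatient rows; the hsagrp values here are made up.
rows = [
    {"spell_id": 1, "hsagrp": "group_a"},
    {"spell_id": 2, "hsagrp": None},
    {"spell_id": 3, "hsagrp": "group_b"},
    {"spell_id": 4, "hsagrp": "group_a"},
]


def drop_missing(rows):
    """Old behaviour: rows with a missing hsagrp are dropped."""
    return [row for row in rows if row["hsagrp"] is not None]


def label_missing(rows, placeholder="unknown"):
    """New behaviour: rows with a missing hsagrp are kept in an 'unknown' group."""
    return [
        {**row, "hsagrp": placeholder if row["hsagrp"] is None else row["hsagrp"]}
        for row in rows
    ]


old_counts = Counter(row["hsagrp"] for row in drop_missing(rows))
new_counts = Counter(row["hsagrp"] for row in label_missing(rows))
```

The totals differ only by the rows that previously had no group.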
PRs
- Update asset bundle deployment by @tomjemmett in #185
- add script to generate day procedures code list by @tomjemmett in #172
- updates readme by @tomjemmett in #192
- fix typing on evidence based interventions by @tomjemmett in #197
- fix typing is not null by @tomjemmett in #196
- fixes return types of functions by @tomjemmett in #198
- fix typing issues in ods trusts by @tomjemmett in #194
- fix typing issues by @tomjemmett in #199
- adds ty: ignore for monkey-patched methods by @tomjemmett in #201
- switches to databricks 16.4 runtime by @tomjemmett in #200
- add icb to inequalities by @yiwen-h in #195
- adds lint/typing actions by @tomjemmett in #203
- Add ability to support ICB model runs by @tomjemmett in #202
- fixes issue introduced in previous commit with null checks by @tomjemmett in #206
- fix wrong param in pop_by_imd_decile by @yiwen-h in #207
- add inequalities to pyproject.toml entrypoints by @yiwen-h in #208
- removes waiting list/covid adjustment inputs steps by @tomjemmett in #209
- correct fyear format and save path for inequalities data extract by @yiwen-h in #210
- fix day procedure mitigator by @tomjemmett in #212
- adds script to clean up spark files by @tomjemmett in #213
- adds can manage permissions to the service principal by @tomjemmett in #214
- fix issue with logic (prior bad regex replace) by @tomjemmett in #215
- add missing providers to inequalities extract by @yiwen-h in #216
Full Changelog: v4.1.0...v4.2.0
v4.1.0
What's Changed
- ensures all datasets have age filter by @tomjemmett in #167
- move steps from nhp_model into nhp_data by @tomjemmett in #166
- fixes typo in function call by @tomjemmett in #168
- ensures age column is created before using it by @tomjemmett in #169
- fixes typo by @tomjemmett in #170
- adds icb to ecds/op aggregated data by @tomjemmett in #171
- removes maternity 'mitigators' from inputs data by @tomjemmett in #174
- adds script to get the rtt incompletes data by @tomjemmett in #175
- adds script to generate the trust types parquet file by @tomjemmett in #176
- add ICB to population by imd decile table by @yiwen-h in #179
- fix duplicate age_group col in pop_by_imd_decile script by @yiwen-h in #180
Full Changelog: v4.0.0...v4.1.0
v4.0.0
What's Changed
Add 2022 pop projections #146
2022-based Subnational Population Projections have been released by ONS. We now use these for the demographic adjustment in the model. This PR introduced the new data into the code base. The following PRs are fixes to ensure that this actually worked.
- update provider catchment methodology by @tomjemmett in #151
- changes which schema the catchments table lives in by @tomjemmett in #153
- fix demographic extract by @tomjemmett in #154
- remove 2019 and 2022 from gams scripts, change principal_proj to migration_category by @yiwen-h in #155
- fixes birth extract logic by @tomjemmett in #158
- switches the file we use for the migration_category variant by @tomjemmett in #157
- adds local authority successor mappings by @tomjemmett in #156
- adds missing var_proj_zero_net_migration to snpp step by @tomjemmett in #159
- fix error with partitioning in custom birth factors by @yiwen-h in #160
- ensures we union in the base birth adjustment data with custom projections by @tomjemmett in #163
Alter tretspef cols #141
Previously, we had a bit of a bodge with the tretspef (treatment specialty function) column. When we extracted the data, we grouped these specialties to only keep the specialties used in RTT reporting. We later added in the ungrouped tretspef column, but in order not to break other code we added this as tretspef_raw.
We now handle this a bit more cleanly by having the actual values in tretspef, introducing a new column called tretspef_grouped. This has reduced the need for the same bit of code to be present in multiple locations, as we now handle the grouping early in the processing.
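As a rough sketch of the new column layout, the grouping could look something like the code below. The set of specialties kept as-is, the "Other" label and the `add_tretspef_grouped` helper are assumptions for illustration; they are not the actual RTT list or the actual nhp_data code.

```python
# Hypothetical set of specialties that are kept as-is; not the real RTT list.
RTT_SPECIALTIES = {"100", "110", "120"}


def add_tretspef_grouped(row, keep=RTT_SPECIALTIES, other="Other"):
    """Keep the raw value in tretspef and add a grouped copy alongside it."""
    code = row["tretspef"]
    return {**row, "tretspef_grouped": code if code in keep else other}


records = [{"tretspef": "100"}, {"tretspef": "999"}]
processed = [add_tretspef_grouped(record) for record in records]
```

Doing this once, early in the processing, means later steps can pick whichever column they need rather than repeating the grouping.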
General tidying up
- adds scripts to recreate the default layer views by @tomjemmett in #143
- renames files to match the folder structure by @tomjemmett in #142
- fixes the extract workflow by @tomjemmett in #144
- alter workflows by @tomjemmett in #164
Full Changelog: v3.6.0...v4.0.0
v3.6.0
What's Changed
Custom Population Projections
- Add custom_projection for R0A to extracted demographic factors data by @yiwen-h in #97
- add custom RD8 projections to model data extract by @yiwen-h in #121
- fix custom birth factors by @tomjemmett in #127
- fixes issue with birth factors by @tomjemmett in #133
Data Changes
- switch to using both ccg of residence and responsibility by @tomjemmett in #108
- uses correct columns for ecds ccgs by @tomjemmett in #117
- switch EBI mitigators to age field by @tomjemmett in #110
- adds the ethnic category (ethnos) to the raw_data schema tables by @tomjemmett in #115
- adds the fyear/provider partition columns to mitigators table by @tomjemmett in #116
- force maternity spells to be Other (Medical) treatment function by @tomjemmett in #107
Refactors
- refactor virtual wards by @tomjemmett in #106
- change to SparkSession by @tomjemmett in #120
- importing from incorrect module by @tomjemmett in #124
- Split model data extract into separate files by @yiwen-h in #125
- updates data extract workflow for new file structure by @yiwen-h in #129
- update codeowners by @tomjemmett in #130
- fixes issue with search/replace by @tomjemmett in #131
- fix inputs pipeline by @tomjemmett in #132
Full Changelog: v3.5.0...v3.6.0
v3.5.0
What's Changed
Population Data
Previously, we had been using an older code base (nhp_demogr_module_inputs).
This code base was not part of any automated process and had not been re-run in a long time.
This process has now been moved across to run in Databricks with the rest of our data pipelines.
The population data is also now ready for switching to newer population projections when ONS releases them.
- add population projections by @tomjemmett in #98
- fixes issue with variant projections by @tomjemmett in #105
Health Status Adjustment GAMs
A bug was discovered in the way we generate the activity rates which are used by the GAMs for Health Status Adjustment.
When we work out the amount of activity by provider, we join to the catchment population for that provider.
However, the join was incorrect: it did not include the provider, so each provider's activity was being divided by the catchment populations of every provider.
- fix gams by @tomjemmett in #103
- Fix GAMs part deux by @tomjemmett in #104
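The join bug described above can be reproduced in miniature. This is a sketch under stated assumptions: the tables, column names and the `activity_rates` helper are hypothetical, and the way the buggy version combines the matched populations (summing them) is a guess at the effect rather than the actual code.

```python
# Hypothetical activity counts and catchment populations.
activity = [
    {"provider": "A", "age_group": "65+", "n": 50},
    {"provider": "B", "age_group": "65+", "n": 30},
]
catchments = [
    {"provider": "A", "age_group": "65+", "pop": 1000},
    {"provider": "B", "age_group": "65+", "pop": 3000},
]


def activity_rates(activity, catchments, keys):
    """Join activity to catchments on `keys` and divide by the matched population."""
    rates = {}
    for a in activity:
        matched = [c["pop"] for c in catchments if all(a[k] == c[k] for k in keys)]
        rates[a["provider"]] = a["n"] / sum(matched)
    return rates


# Without provider in the join keys, every provider's population is matched.
buggy = activity_rates(activity, catchments, keys=["age_group"])
# With provider included, each row matches only its own catchment.
fixed = activity_rates(activity, catchments, keys=["provider", "age_group"])
```

In this toy data the buggy join divides both providers' activity by 4000, so the rates are understated and lose the difference between catchments.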
Full Changelog: v3.4.0...v3.5.0
v3.4.0
What's Changed
Replaced AEC mitigators with SDEC mitigators
These new mitigators (mostly) target the same activity as before, but instead of reducing the length of stay to 0, the activity is removed from inpatients and added to the type-05 activity in A&E/ECDS.
- replaces aec with sdec mitigators by @tomjemmett in #89
- Update sdec documentation by @tomjemmett in #90
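The difference between the old AEC behaviour and the new SDEC behaviour can be sketched like this. The data structures and both functions are hypothetical, and for simplicity every flagged spell is affected; this is not how the mitigators are implemented in practice.

```python
# Hypothetical inpatient spells and A&E/ECDS attendance counts by type.
spells = [
    {"id": 1, "speldur": 2},
    {"id": 2, "speldur": 1},
    {"id": 3, "speldur": 4},
]
ecds_counts = {"01": 100, "05": 10}
flagged_ids = {1, 2}  # spells targeted by the mitigator (made up)


def apply_aec(spells, flagged_ids):
    """Old approach: flagged spells stay in inpatients with length of stay 0."""
    return [{**s, "speldur": 0} if s["id"] in flagged_ids else s for s in spells]


def apply_sdec(spells, ecds_counts, flagged_ids):
    """New approach: flagged spells leave inpatients and become type-05 activity."""
    remaining = [s for s in spells if s["id"] not in flagged_ids]
    moved = len(spells) - len(remaining)
    new_counts = {**ecds_counts, "05": ecds_counts.get("05", 0) + moved}
    return remaining, new_counts


aec_spells = apply_aec(spells, flagged_ids)
sdec_spells, sdec_counts = apply_sdec(spells, ecds_counts, flagged_ids)
```

Under the old approach the number of inpatient spells is unchanged; under the new one the total activity is preserved but shifts between datasets.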
Other minor changes
- add outpatients mitigators table by @tomjemmett in #85
- fixes path to demographic files by @tomjemmett in #91
- updates inputs workflow to be linear by @tomjemmett in #92
Full Changelog: v3.3.0...v3.4.0
v3.3.0
What's Changed
- filters age/sex data for inputs app to ensure n > 5 by @tomjemmett in #62
- Add imd fields by @tomjemmett in #63
- adds script for processing ODS xml data by @tomjemmett in #64
- updates to use new nhp catalog by @tomjemmett in #65
- renames main icb file by @tomjemmett in #67
- fix issue of circular dependency for main icb by @tomjemmett in #68
- removes tqdm from inpatient mitigators by @tomjemmett in #69
- adds script to generate population by imd decile table by @tomjemmett in #70
- Add inequalities to inputs data pipeline by @yiwen-h in #74
- uses correct logic for population by imd decile by @tomjemmett in #73
- Move notebooks from nhp_model by @tomjemmett in #76
- ensure arguments are correct types by @tomjemmett in #77
- fixes extract workflow by @tomjemmett in #78
- add extra columns to raw layer by @tomjemmett in #80
- Collapses rows in extracted model OP data if not identified as activity impacted by inequalities by @yiwen-h in #82
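As an illustration of the n > 5 filter added in #62, a minimal sketch is shown below. The rows, column names and the `filter_small_counts` helper are assumptions and are not taken from the inputs app code.

```python
# Hypothetical age/sex counts for the inputs app.
rows = [
    {"age_group": "0-4", "sex": 1, "n": 3},
    {"age_group": "5-14", "sex": 1, "n": 6},
    {"age_group": "15-24", "sex": 2, "n": 5},
]


def filter_small_counts(rows, threshold=5):
    """Keep only rows where the count is strictly greater than the threshold."""
    return [row for row in rows if row["n"] > threshold]


kept = filter_small_counts(rows)
```

Note that the comparison is strict, so a row with exactly 5 is removed.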
Full Changelog: v3.2.1...v3.3.0