Improve synth data #246

Draft
tomjemmett wants to merge 3 commits into main from improve_synth_data

Conversation

@tomjemmett
Member

  • ensures we only ever sample from ip/op/a&e rows where there are at least 10 bits of activity
  • samples outpatients/a&e at a rate chosen to generate a target number of rows
  • randomly samples columns in inpatients so that the rows are synthetic and do not correspond to an actual admission
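
As a rough illustration of the first two points, here is a minimal sketch in plain Python. It is not the code in this PR: the `activity` field, the `MIN_ACTIVITY` constant, and the `filter_and_sample` function are hypothetical names, and it assumes each row carries a count of activity that can be compared against the threshold of 10.

```python
import random

# Threshold from the PR description; the constant name is hypothetical.
MIN_ACTIVITY = 10


def filter_and_sample(rows, target_rows, seed=None):
    """Keep rows with enough activity, then sample them at a rate that
    aims for roughly `target_rows` rows in the output.

    `rows` is a list of dicts with an "activity" key (an assumed layout).
    """
    rng = random.Random(seed)

    # Only rows with at least MIN_ACTIVITY activity are eligible.
    eligible = [r for r in rows if r["activity"] >= MIN_ACTIVITY]
    if not eligible:
        return []

    # Per-row sampling rate, capped at 1 when the target exceeds the supply.
    rate = min(1.0, target_rows / len(eligible))

    # Independent draw per row, so the output size is approximately
    # (not exactly) target_rows.
    return [r for r in eligible if rng.random() < rate]
```

Because each row is kept or dropped independently, the result only approximates the target; a fixed-size draw such as `random.sample` would be the alternative if an exact row count were required.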

tomjemmett force-pushed the improve_synth_data branch 3 times, most recently from 37529ab to 1cbf7f3 on May 5, 2026 12:29
tomjemmett added 3 commits May 6, 2026 09:25
- adds some additional static variables for controlling how many rows to generate for each dataset
- adds a filter to ensure that we only select rows that have 10 bits of activity to begin with
- ensures we sample from rows that appear enough times to be anonymous
- randomly samples the strategies so the data no longer relates to individuals
- resamples the length of stay to preserve anonymity
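
One way to picture the resampling described in the last two commit messages is to shuffle each sensitive column independently of the others. The sketch below assumes that approach; the `shuffle_columns` function and the column names in the usage note are hypothetical, and the actual commits may instead sample with replacement or from a fitted distribution.

```python
import random


def shuffle_columns(rows, columns, seed=None):
    """Return a copy of `rows` in which each named column has been
    permuted independently, so values no longer line up with the
    record they came from.
    """
    rng = random.Random(seed)

    # Copy each row so the caller's data is left untouched.
    out = [dict(r) for r in rows]

    for col in columns:
        # Take the column's values, permute them, and write them back.
        values = [r[col] for r in rows]
        rng.shuffle(values)
        for row, value in zip(out, values):
            row[col] = value

    return out
```

For example, `shuffle_columns(rows, ["strategy", "los"], seed=0)` would keep the overall distribution of each column while breaking the link between a given strategy, a given length of stay, and the remaining fields of the row.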
tomjemmett force-pushed the improve_synth_data branch from 1cbf7f3 to bcf54fa on May 6, 2026 08:25