Synthetic Dataset Pilot
A new way of protecting privacy for local data
This project was a collaboration with the Urban Institute, Allegheny County's Department of Human Services (DHS), and CountyStat to pilot techniques for creating synthetic data from datasets that contain sensitive and protected information in a local government context. Synthetic data generation replaces actual records with representative records generated from statistical models. This preserves the key properties of the data that allow insights to be drawn from it while protecting the privacy of the people it describes.
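To make the idea above concrete, here is a minimal sketch in Python of generating records from a statistical model rather than releasing the originals. It is an illustration only and is not the method used in this pilot: the toy records, the column meanings (age group, service type), and the function names `fit`, `draw`, and `synthesize` are all hypothetical. The model simply estimates how often each age group appears and how often each service appears within an age group, then samples new rows from those estimates. A model this simple carries no formal privacy guarantee.

```python
import random
from collections import Counter, defaultdict

# Toy stand-in for confidential records (hypothetical values, not real data).
# Each record is (age_group, service_type).
confidential = [
    ("18-34", "housing"),
    ("18-34", "housing"),
    ("18-34", "behavioral health"),
    ("35-64", "behavioral health"),
    ("35-64", "behavioral health"),
    ("35-64", "family support"),
    ("65+", "aging services"),
    ("65+", "aging services"),
]


def fit(records):
    """Count age groups, and count services within each age group."""
    marginal = Counter(age for age, _ in records)
    conditional = defaultdict(Counter)
    for age, service in records:
        conditional[age][service] += 1
    return marginal, conditional


def draw(counts, rng):
    """Sample one value with probability proportional to its count."""
    values = list(counts.keys())
    weights = list(counts.values())
    return rng.choices(values, weights=weights, k=1)[0]


def synthesize(model, n, rng):
    """Generate n synthetic rows: age group first, then service given age."""
    marginal, conditional = model
    rows = []
    for _ in range(n):
        age = draw(marginal, rng)
        service = draw(conditional[age], rng)
        rows.append((age, service))
    return rows


rng = random.Random(0)  # fixed seed so the sketch is repeatable
model = fit(confidential)
synthetic = synthesize(model, 1000, rng)

# Compare how often each combination appears in the synthetic rows.
print(Counter(synthetic).most_common())
```

The synthetic rows follow roughly the same proportions as the toy originals, but no row is a copy of a particular person's record; each one is a fresh draw from the fitted counts. Methods used in practice model many more variables and include checks on both usefulness and disclosure risk.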
The Understanding Synthetic Data white paper gives a concise introduction to synthetic data. Further documentation and the synthetic dataset itself, which stands in for individual-level records of people in Allegheny County and the human services they received in a single year, can be found in this dataset on our data portal.