Data Literacy for Data Stewards Missing Data Workshop

by Bob Gradeck

October 26, 2022

Thanks to Adena Bowden for co-authoring this post.

 

Last Friday (October 21, 2022), we continued our twelve-week virtual data literacy series, “Data Literacy for Data Stewards,” with a discussion on data divides and missing data. Our objectives for the workshop included: understanding why data is missing and the power inherent in decisions on what is captured in data and what isn’t; discussing practices such as counter-data collection to fill data divides and shift power; and co-creating a list of practices we can adopt to minimize data divides. 

 

Missing datasets are “the blank spots that exist in spaces that are otherwise data-saturated” (Mimi Ọnụọha). These blank spots “reveal our hidden social biases and indifferences.” Mimi Ọnụọha’s project, The Library of Missing Datasets, calls attention to this issue, highlighting datasets that one might expect to exist already, but that have never been created. Examples include “undocumented immigrants currently incarcerated and/or underpaid,” “employment statistics that include those in federal prisons,” and “trans people killed or injured in instances of hate crime.”  

 

When people and communities are missing from data, data-driven services and decisions fail to serve everyone equally. This data divide – the gap between individuals and communities that are represented in data and those who are not – can exacerbate social and economic inequalities.  

 

Data may be missing for many different reasons, many of which are related to complications inherent in data collection. Drawing on work by Mimi Ọnụọha, the Center for Data Innovation, and Data Feminism (linked below), we compiled the following reasons why data may be missing: 

 

  • Data threatens people with power 
  • Access to data collection technologies 
  • Data is not representative 
  • Biases/inequities  
  • Indigenous/local knowledge not seen as important 
  • People seek to be excluded from data for protection 
  • Data is tough to quantify (e.g., people’s emotions) 
  • Lack of participation in data collection  
  • Privacy risks 
  • Fear of misuse 
  • Laws/policies that restrict collection/sharing 
  • Financial capacity 
  • Technological capacity 
  • Inadequate data practices 
  • Access to data collection devices 
  • Cost to acquire/purchase data 
  • Costs outweigh benefits of collection 
 
We then asked participants to work together in breakout groups to create a list of reasons why data can be missing and to cross-reference their lists with ours. Here are some additional reasons they captured: 

 

  • Failure to consider a need for the data 
  • Data may be spread across various sources 
  • Language and cultural barriers 
  • Mistrust of research 
  • Vendor contracts restrict data access 
  • Jurisdiction restrictions and political fragmentation 
  • Misinformed, misaligned, or malicious intentions 
  • Topics that are not new or trending may be disregarded 
  • Lack of understanding regarding the value of data-sharing 
  • Ignorance is bliss 
  • Lack of platforms to share data 

 

Next, we discussed strategies for addressing data divides, which included counter-data collection, uncovering hidden data, and advocacy for better data. We asked participants to think about this in the context of a single missing dataset: Why might this data not exist? Are there any risks posed by making the data available? What are some strategies to fill these data gaps? How can communities use data to shift power? 

To close the workshop, we asked participants to develop a list of practices that can be used to minimize data divides and missing data. We asked them to think about people working at varying levels of power over data. Ideas included: 

 

  • Ensuring those affected by the data are participating in the data collection and dissemination process 
  • Providing funding to communities to collect data that is in line with their priorities 
  • Emphasizing the value of qualitative data 
  • Acknowledging limitations of data collection  
  • Sharing information in various formats to expand accessibility 
  • Teaching data literacy in elementary and secondary education 
  • Being the squeaky wheel: advocating for better data stewardship and asking for the data 
  • Sharing data literacy resources 
 
In our next workshop, we will discuss territorial stigmatization in data. Participants will learn how residents of frequently stigmatized communities are harmed by the perceptions people have of their neighborhoods. By attending this workshop, people who work with data will develop shared practices that foreground the structural causes of community deficits and improve the ways they represent communities in data. 
 
If you are interested in participating in the next cohort of our Data Literacy for Data Stewards peer learning series starting in the first quarter of 2023, email us at wprdc@pitt.edu and we will let you know when registration is open. 
 
Resources: 
The Library of Missing Datasets — MIMI ỌNỤỌHA (mimionuoha.com) 
GitHub – MimiOnuoha/missing-datasets: An overview and exploration of the concept of missing datasets. 
How Can the United States Address the Data Divide? – Center for Data Innovation 
Closing the Data Divide for a More Equitable US Digital Economy | ITIF 
Data Feminism (mit.edu)