Data Literacy for Data Stewards Classification Workshop

by Bob Gradeck

November 23, 2022

Friday, November 18 was week five of twelve of our virtual data literacy series, “Data Literacy for Data Stewards.” During our workshop, we explored the different classification systems we use for data and the values, assumptions, and power hierarchies embedded within them. We worked together to explore how to maximize value and minimize harm from the use of classification systems in data, and we co-created a list of practices we can adopt to develop and use classification systems that advance equity and justice.

 

To kick off the workshop, participants listed systems that sort, categorize, classify, or group things: 

  • Library books (Library of Congress or Dewey Decimal System)
  • Dublin Core Metadata Schema 
  • Employee hierarchy and payroll 
  • Material Safety Data Sheets – categorizes different chemicals by how hazardous they are 
  • Type of appointment scheduled at a medical center 
  • Relationship types – friend, partner, platonic partner, best friend, acquaintance, etc.
  • Dishwasher racks 
  • Grocery store aisles 
  • Cancer staging system 
  • International Classification of Diseases (ICD) 
  • Organizational Maturity 
  • Types of Kosher Foods 
  • Body Mass Index (BMI) 
  • Alcohol Use Disorders Identification Test (AUDIT) scores 
  • Taxonomy 
  • Logic Models (inputs, activities, outputs, outcomes, impacts) 
  • Grammy nomination categories 
  • Ballot types 
  • Military Rank 
  • National Incident-Based Reporting System (NIBRS) for crime data
  • Geographical Boundaries (neighborhood, council district, fire zones, census tract, ZIP codes) 
  • Race and ethnicity classifications 
  • Billing codes  
  • The Food Pyramid 
  • Maslow’s Hierarchy of Needs 

 

After this list-building activity, we had a full-group discussion where people shared examples of how classification systems can be helpful, harmful, or reinforce existing hierarchies and power structures. Some thoughts from the group: 

  • If you don’t measure something, you can’t appropriately address it. Since World War II, France has not collected data on race, ethnicity, or religion, fearing that the atrocities of the Nazi occupation could be repeated. As a result, it is impossible to accurately measure the disadvantages experienced by groups of people defined by race, ethnicity, or national origin.
  • Some of our classification systems are rooted in dominant hierarchies. The Dewey Decimal System for classifying library materials reflects the white, heterosexual, cisgender, patriarchal perspective of the Global North. The system perpetuates colonial mindsets in how we organize information and make it accessible, and it devalues content provided by non-dominant groups. Here’s an example of how a group of people from Northeastern University is working to confront the colonial nature of archives through their attempt to Decolonize the Archive.
  • Geographic boundaries like ZIP codes, municipalities, and voting districts may be useful administrative boundaries, but they may not reflect how residents define their communities.
  • Classification systems may benefit some people while limiting others. Movie rating systems protect young audiences from being exposed to age-inappropriate material, but they may also limit freedom of speech.

 

When we attempt to bring order to data by cleaning or classifying it, we lose nuance and context. We asked participants to share thoughts and examples of what is lost when we try to bring order and structure to data. Examples include:  

  • People who do not identify with existing racial, ethnic, or gender categories, such as people of a non-binary gender, those who identify as a member of more than one race, or people whose racial identity is excluded as a response option, such as Latinx people and those of Middle Eastern or North African descent.
  • Classification systems that result in missing or inaccurate data, such as the Uniform Crime Reporting (UCR) classification system, which captures only the highest-ranking crime in its hierarchy when an incident involves more than one crime. Police departments, including Pittsburgh’s, are now transitioning to the National Incident-Based Reporting System (NIBRS), which captures each separate crime tied to an incident.
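
To make the crime reporting example concrete, here is a minimal Python sketch of how a hierarchy-style summary and an incident-based summary can count the same incidents differently. The incident IDs, offense names, and severity ranks below are made up for illustration and are not the official UCR hierarchy or the NIBRS data format.

```python
from collections import Counter

# Illustrative severity ranks (lower number = more serious).
# These are placeholders for the sketch, not the official UCR hierarchy.
HYPOTHETICAL_RANK = {
    "robbery": 1,
    "aggravated assault": 2,
    "burglary": 3,
    "motor vehicle theft": 4,
}

# Made-up incidents, each with one or more reported offenses.
incidents = {
    "A-1": ["burglary", "aggravated assault"],
    "A-2": ["motor vehicle theft"],
    "A-3": ["robbery", "burglary", "motor vehicle theft"],
}


def summarize_hierarchy(incidents):
    """Hierarchy-style summary: keep only the highest-ranking offense per incident."""
    return {
        incident_id: [min(offenses, key=lambda o: HYPOTHETICAL_RANK[o])]
        for incident_id, offenses in incidents.items()
    }


def summarize_incident_based(incidents):
    """Incident-based summary: keep every offense tied to each incident."""
    return {incident_id: list(offenses) for incident_id, offenses in incidents.items()}


def count_offenses(summary):
    """Count how many times each offense appears across all incidents."""
    return Counter(offense for offenses in summary.values() for offense in offenses)


hierarchy_counts = count_offenses(summarize_hierarchy(incidents))
incident_counts = count_offenses(summarize_incident_based(incidents))

for offense in HYPOTHETICAL_RANK:
    print(f"{offense}: hierarchy={hierarchy_counts[offense]}, incident-based={incident_counts[offense]}")
```

In this made-up data, both burglaries disappear from the hierarchy-style counts because each happened alongside a higher-ranked offense. That is the kind of information that can be lost when a classification system keeps only one category per record.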

 

Next, participants worked together in breakout groups to co-create lists of questions about classification systems that we can use to challenge dominant power structures and hierarchies. Their questions include: 

  • Who designed the classification system? What power do they hold? 
  • If the system classifies people, what power do those being classified hold? 
  • Why was this classification system created?  
  • What was the process behind the creation of the classification system?  
  • Did people captured in the data have the power to design the classification system? 
  • Is this system public? Is it easy to understand? Is it accessible (language, location)?
  • Is the system culturally appropriate? 
  • How will the system be used? How might it be used by others? 
  • What relationships are described or implied by the schema? 
  • Are there other ways to classify or represent the data?  
  • Who benefits from this classification system? Who might be harmed? 
  • Who or what is included in the system? Who/what might be left out? 
  • What biases may exist in the system? What unintended consequences might result from using this system? 
  • How does this classification system affect the perception of the subject matter in question? 
  • Can self-reported values be captured (e.g., How would you describe yourself? Which do you think is best for describing you?)  

 

Finally, as a full group, we discussed practices that we can adopt to design and use classification systems to advance equity and justice. Attendees would like to adopt these “better” practices: 

  • Adopt more inclusive policies, involving individuals from each subgroup so as not to cause unintended harm; 
  • Provide opportunities for feedback when classifying data; 
  • Pair quantitative with qualitative data, allowing room to provide context and capture nuance;
  • Investigate and acknowledge potential limitations and bias when using well-established or popular classification systems; 
  • Consider whether classification systems are necessary. We now have complex and powerful tools and technology that may allow for more nuanced organization of information; 
  • Consider historical and other contexts;  
  • Ask: How have we seen similar data classified before? Who was at the table then? Who wasn’t? What policies are in place that may have led to that? What are the assumptions behind groupings? How do they interface with power dynamics?
  • Think about natural groupings versus non-natural groupings. When collecting data, see if there are natural patterns that emerge that can be used to organize the information; 
  • Decide whether classification systems will be viewed internally or made public. If we’re putting these schemas out into the public, we must be intentional about quality and ethics; 
  • Work to shift the balance of power from those who have historically been in power to those who haven’t by emphasizing equity when classifying information. Can historically marginalized groups gain power from categorized data?
  • Note the limitations of classification systems that we’re utilizing. For example, record when data is gathered on a male-female binary. 
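
One way to act on the last practice is to keep known limitations next to each field in a data dictionary. The Python sketch below is only an illustration under our own assumptions: the `FieldNote` structure, the field names, and the helper function are hypothetical and do not come from any particular metadata standard.

```python
from dataclasses import dataclass, field


@dataclass
class FieldNote:
    """A hypothetical data dictionary entry for one classified field."""
    name: str
    allowed_values: list
    self_reported: bool
    limitations: list = field(default_factory=list)


# Made-up entries for illustration.
data_dictionary = [
    FieldNote(
        name="gender",
        allowed_values=["female", "male"],
        self_reported=False,
        limitations=[
            "Collected on a male-female binary; non-binary people are not represented."
        ],
    ),
    FieldNote(
        name="neighborhood",
        allowed_values=["(official neighborhood names)"],
        self_reported=False,
    ),
]


def fields_missing_limitations(entries):
    """Return the names of fields that have no limitations recorded yet."""
    return [entry.name for entry in entries if not entry.limitations]


print(fields_missing_limitations(data_dictionary))
```

A check like this does not tell us what the limitations are. It only flags the fields where nobody has written them down yet, which can be a useful prompt for the questions listed earlier.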

 

For more information about creating and using classification systems in an ethical and accountable manner, check out the following resources: 

 

Next time (Friday, December 2, 2022), we will discuss data dashboards. We will work to uncover and understand the power and privilege people have in defining how communities are framed and represented through dashboards and data analyses.  Participants will develop practices that they can use to ensure that dashboards and data analyses accurately reflect and contextualize the range of experiences of people and communities represented in data. 

 

If you are interested in participating in the next cohort of our Data Literacy for Data Stewards peer learning series starting in the first quarter of 2023, email us at wprdc@pitt.edu and we will let you know when registration is open.