Measuring reinfections in COVID-19 data

by Bob Gradeck

December 17, 2021

We’re now into the 22nd month of the pandemic, and it is becoming more common for people to have experienced multiple COVID-19 infections. The Pennsylvania Department of Health (DOH) recently changed the data feed it provides to the Allegheny County Health Department, making it possible to identify characteristics of people experiencing a COVID-19 infection.
After some data wrangling by Allegheny County, these improvements to the daily open data feed are now available through the Regional Data Center’s open data portal. Currently, data is being received only on initial reinfections. A reinfection is counted only if it occurs at least 90 days after the initial infection. It is unclear whether the DOH will share data on third infections in the future.
According to data published on December 15, 1,664 people in Allegheny County have experienced reinfection, and 157,325 people have reported one infection. Excluding people who experienced their first infection outside of Allegheny County, the median length of time between initial infection and reinfection was 233 days, and the mean was 239 days. The chart below shows the length of time between cases for those experiencing multiple infections, based on the date the specimen was collected.


Figure 1: Number of days between initial COVID-19 infection and reinfection in Allegheny County among people with more than one reported infection in the data
Histogram showing the number of people experiencing reinfection by the number of days between initial infection and reinfection.
Data source: Allegheny County Health Department, analysis by Western Pennsylvania Regional Data Center
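
For readers who want to reproduce this kind of calculation, here is a minimal sketch of how the gap between infections could be measured and the 90-day rule applied. The dates below are made up for illustration, and the function names are our own; this is not the code used by Allegheny County or the Regional Data Center.

```python
from datetime import date
from statistics import mean, median

# The post states that a reinfection is counted only if it occurs
# at least 90 days after the initial infection.
MIN_GAP_DAYS = 90


def days_between(first: date, second: date) -> int:
    """Number of days from the first specimen date to the second."""
    return (second - first).days


def is_reinfection(first: date, second: date, min_gap_days: int = MIN_GAP_DAYS) -> bool:
    """True when the second positive specimen is far enough from the first."""
    return days_between(first, second) >= min_gap_days


# Hypothetical pairs of specimen collection dates (illustrative only).
pairs = [
    (date(2020, 4, 1), date(2020, 12, 1)),
    (date(2020, 11, 15), date(2021, 8, 20)),
    (date(2021, 1, 10), date(2021, 3, 1)),  # under 90 days, so not counted
]

# Keep only the gaps that meet the 90-day threshold.
gaps = [days_between(a, b) for a, b in pairs if is_reinfection(a, b)]

print("Gaps in days:", gaps)
print("Median:", median(gaps), "Mean:", mean(gaps))
```

The same two summary statistics, median and mean, are the ones reported above for the county data.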


Here’s how some of our primary COVID datasets have changed:

Until now, the COVID Tests and Cases dataset included only one record per person with a COVID-19 infection. The improved dataset makes it possible to look at the number of reinfections and the length of time between infections. Each person in the dataset has a unique identifier (the “indiv_id” field), and the new “case_number” field lets data users distinguish between initial infections and reinfections. This dataset also includes information about race, age, ethnicity, and sex, along with flags for when people are hospitalized, admitted to the ICU, or placed on a ventilator.
The two community-level tables now include a field tallying the number of reinfections by municipality and Pittsburgh neighborhood. One of the tables provides a cumulative count of the number of people tested, number of tests, positive results, infections, reinfections, hospitalizations, and deaths, by community. The table providing monthly counts includes the same information as the cumulative table, and allows for an analysis of the pandemic over time in all municipalities and Pittsburgh neighborhoods.
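
As a rough illustration of how the “indiv_id” and “case_number” fields might be used together, the sketch below tallies initial infections and reinfections from a few made-up records. It assumes that “case_number” is 1 for an initial infection and 2 for a reinfection; that encoding is our assumption for the example, so check the data dictionary on the open data portal before relying on it.

```python
import csv
import io
from collections import Counter

# Made-up sample records. Only the two field names come from the post;
# the values and the 1/2 encoding of case_number are assumptions.
SAMPLE_CSV = """indiv_id,case_number
A1,1
A1,2
B2,1
C3,1
C3,2
"""


def count_cases(rows):
    """Count records by case_number (assumed: 1 = initial, 2 = reinfection)."""
    counts = Counter()
    for row in rows:
        counts[int(row["case_number"])] += 1
    return counts


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
counts = count_cases(rows)

initial = counts[1]
reinfections = counts[2]

# People with a reinfection record, identified by their indiv_id.
reinfected_ids = sorted({r["indiv_id"] for r in rows if int(r["case_number"]) == 2})

print("Initial infections:", initial)
print("Reinfections:", reinfections)
print("People reinfected:", reinfected_ids)
```

To run this against the real dataset, the sample text could be swapped for a CSV file downloaded from the portal and opened with `open()`.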


Datasets that have not changed include:

The individual test result dataset now includes over 2.6 million test records and provides information on the type of test, dates the specimens were collected and reported, test results, and the age, race, sex, and ethnicity of the person tested.
Data describing deaths in Allegheny County is available in two tables. The deaths by date table captures the number of COVID deaths by date of death, and the deaths by demographic groups table provides a cumulative count of deaths by age, sex, race, and ethnicity. 


Thank yous are in order

Our data pipeline was temporarily broken as a result of the changes the Pennsylvania Department of Health made to the testing dates and case confirmation reporting processes. We want to acknowledge all of the work the data team at Allegheny County put into getting everything back up and running. They had no lead time to prepare for these changes before they went live.