Lessons Learned

by Bob Gradeck

May 27, 2015

For over ten years, the University Center for Social and Urban Research has served the Pittsburgh area as an information intermediary. Information intermediaries help people find and use information to improve the communities they call home. The Western Pennsylvania Regional Data Center is one in a long line of initiatives of the University Center in this intermediary role. From 2005 to 2014, the Center operated the Pittsburgh Neighborhood and Community Information System (PNCIS). The Center is also home to the Pittsburgh Today regional indicators initiative, and recently launched the Southwestern Pennsylvania Community Profiles community indicators project. We’ve also been fortunate to be part of the National Neighborhood Indicators Partnership (NNIP) since 2008. NNIP is a network of over 30 data intermediaries from across the country, and is organized by the Urban Institute.

In our work with the PNCIS, we would collect data from government agencies, and primarily use it with government and nonprofit partners to support community development efforts in the City of Pittsburgh and Allegheny County. Much of the data we needed would be provided through a handshake agreement, and sent to us via email attachment or through the mail on a CD. The data would often require extensive cleanup, and none of our data processing and geocoding processes were automated. With our limited capacity to clean data, some datasets were updated only once or twice per year. While we would ask data users to sign a user agreement in which they accepted liability, we lacked formal data sharing agreements with most of our data providers, and none of the data came with a license.

We would then load data to a GIS server. Users who signed our legal agreement would receive a password to access the online GIS website. Users could use the online map interface to make maps; however, the site lacked the capability to produce graphs or other non-map visualizations. If users wanted to download data, it would have to be done 2,000 records at a time so as not to overwhelm our server. As our largest dataset had over 500,000 records, we would ask users looking for a copy of a full dataset to stop by the office to pick up a CD, or to bring a thumb drive to receive a copy of the data.

Despite all of these challenges, the PNCIS was able to support over 900 users in their work, and we’ve developed so many great relationships along the way. Some of our success stories were outlined in a 2014 guest blog post written for the Sunlight Foundation. These challenges are not unique to Pittsburgh. Most of our NNIP colleagues in other cities have faced similar difficulties in their work.

After reflecting on all of our work for the past decade, we’ve developed several observations related to data sharing. This experience has informed our plans for the Data Center.

  • Demand for data is growing, and more and more data is becoming easily accessible each day. Most users will need help determining which data to use in their work and how to use it appropriately. The role of a data intermediary is more important than ever, and we’re the first open data project we know of to explicitly tie the work of a data intermediary to a government’s official open data program.
  • People want to be able to easily download an entire dataset, or use a variety of tools to visualize data on their own terms. Thankfully, just about every open data portal allows for easy data downloads. As part of our data portal technology selection process, we focused on making sure the Data Center’s open data portal could connect with many popular tools such as Tableau and CartoDB, and provided a robust API allowing data to be easily ingested by other tools.
  • Data owners aren’t usually great publishers, and very little public data comes with proper documentation. For this reason, the Data Center worked with Digital Scholarship Services at the University Library System to identify a descriptive metadata schema and training materials for data publishers. We’re also hoping publishers and the data user community can contribute to shared documentation through our data user guides.
  • To become truly useful, data needs to be updated as often as possible. Project staff and government partners working with the Data Center are investing time automating the data publishing process for key datasets by establishing several Extract-Transform-Load (ETL) processes. Code for America’s Dave Guarino has greatly influenced our thinking regarding ETL. At launch, many datasets are already refreshed on the open data portal on a daily or weekly basis through automated processes, and we’ve already tested at least four different ETL approaches.
  • People don’t usually talk to each other about how they’re using data. So many innovative and effective uses of data are never shared with other users. For this reason, we hope data users participate in Data Center data user groups and codify tacit knowledge through data user guides.
  • Infrastructure is often an afterthought, and the open data portal is really only one piece of what’s needed to foster effective data sharing and use. Matt Burton of the University of Pittsburgh’s School of Information Sciences talks about infrastructure as being all about people, and he’s right. University, City, and County legal counsel have created a scalable legal infrastructure allowing for multiple organizations to share data. The proper documentation and automated data publication mentioned earlier are also key pieces of our infrastructure, and don’t happen without the efforts of talented, hardworking people.
  • Problems cross borders, but data often doesn’t. Data can play a role in addressing many of our community’s greatest challenges. In addition to providing a data sharing infrastructure to numerous organizations across Western Pennsylvania that lack capacity, the Data Center will also play a role in developing and identifying data standards, and establishing cross-community conversations about data.

We’ve learned a tremendous amount in our role as an information intermediary over the past ten years. We’re glad to be able to put a lot of the lessons we learned into building a better infrastructure for sharing information in the region, and hope this blog post can help explain why we’ve structured the initiative in the way that we have. We hope you can be involved in the work of the Data Center, and look forward to working with you.