Inside Our Data Portal Selection Process

by Bob Gradeck

October 26, 2015

There are many different models communities use to run their open data portals. Some opt for a software-as-a-service solution, relying on a vendor to host data and manage the technology. Several vendors are in this space, some offering a proprietary solution, and others built on open source software. Some communities choose to host their open data portal internally using open source software, or even (like Albuquerque, NM) post data to an FTP server. Our choice could have dramatic implications for the operations of the Western Pennsylvania Regional Data Center. With so many different options available to us, we thought it best to develop a structured process to determine what technology and hosting model was the best fit for the Regional Data Center.

Several years ago, when looking to update online mapping tools used for the Pittsburgh Neighborhood and Community Information System, we spent some time learning about how to structure a software selection process. We started by reading a number of online resources and captured this information in a brief summary. This framework was still relevant to us a half-decade later.

Like most large institutions, the University of Pittsburgh has a process for buying products and services. The challenge for us was that we weren’t sure we even wanted to buy something. Developing the data portal in house was an option we wanted to explore alongside all the others. At the outset we talked to our University purchasing managers to determine the best way to align our discovery process with the University’s purchasing process. We weren’t buying a commodity, so we didn’t want to be tied to the lowest bid if we did decide to purchase a tool. The advice of our purchasing managers was to start by looking at all of our options without soliciting bids. By doing so, we’d be in a much better position to move through the purchasing process in a way that was structured around the needs of the Regional Data Center and our users, and not the lowest price.

We didn’t want to make this decision in isolation, and wanted to include our partners (both publishers and users) in the learning process. We’re grateful to have had a number of people from the University of Pittsburgh, Allegheny County, City of Pittsburgh, Carnegie Mellon University, Open Pittsburgh (our Code for America Brigade), and U.S. Open Data involved in our selection process. They not only helped us learn about the available products, they also helped us learn more about our needs. The conversations around software selection also influenced our thinking about how to structure the Data Center’s staffing, programs, and activities.

To structure the software selection process, we issued a Request for Information (RFI) on December 15, 2014. In this open request, we asked respondents to tell us why we should consider their product for use with the Data Center. To minimize the burden on both reviewers and firms, we asked respondents to limit their reply to our questions to eight pages (with no limit on appendices). We didn’t ask about price so that our process could focus on what best meets the needs of our partners and users and remain compliant with the University’s purchasing process. We also openly shared questions and answers from potential respondents to maintain a fair and transparent process. The RFI was sent to many of the firms we knew of that are involved with open data, and was also shared on our website and redistributed through social media with the help of some of our national partners.

We developed a set of criteria that we asked our review committee to use in their evaluation. A full description of each is included in our guide for reviewers. These criteria included:

  1. Impact on Sustainability
  2. Functions and Features
  3. Performance and Reliability
  4. Usability
  5. Vendor Experience and Capacity
  6. Other Criteria

We received 12 replies to the RFI, and were very impressed with the quality of the responses. We shared these responses with our review committee and asked them to test-drive the products, where in use elsewhere, before we had our initial conversation. In this first meeting, the committee discussed each response against the selection criteria and developed a short list of seven firms to receive an interview. The committee also developed a list of questions that served as the basis for conversation with each vendor.

Phone interviews with each of the firms making the second round of consideration were conducted by the Regional Data Center project manager. Each of the interviews lasted approximately 45-60 minutes. Notes from each conversation were shared with the committee prior to the final selection meeting, where a decision was made. To ease the burden of scheduling interviews, we let respondents know that we were holding several blocks of time for phone conversations as soon as their initial submission was received. By planning ahead, we were able to complete all seven interviews and share notes with our review committee within two weeks of our initial selection meeting.

The timeline from start to finish took approximately 2 ½ months from release of the RFI to the point at which the technology selection was made. Key milestones include:

  • December 15, 2014 Release of the Request for Information
  • January 13, 2015 Close of Q&A submissions/responses
  • January 20, 2015 Submissions due
  • February 16, 2015 1st selection committee conversation (select short list)
  • February 16-27, 2015 Short-list interviews
  • March 2, 2015 2nd selection committee conversation (final selection made)

We wound up selecting CKAN for the Regional Data Center’s open data portal. Using open-source technology allowed us to leverage the server infrastructure at the University of Pittsburgh’s Network Operations Center, and we felt this choice would put the project on a path to long-term sustainability. The product met the needs of both our publishers and users, scaled at minimal cost, fit the structure of a regional initiative, allowed for customization (including integrated metadata), and has a sizable development community. We began configuring the site using internal staff, and later hired the team of Accela and Ontodia to help with the initial configuration, train our staff, test ETL processes, and provide coverage through staffing transitions.
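Part of what makes CKAN useful to both publishers and users is its HTTP Action API, which exposes every dataset on a portal programmatically. As a rough sketch (the portal URL below is a placeholder, not necessarily the Data Center's own, and every Action API response is a JSON envelope with "success" and "result" keys), listing a portal's datasets looks something like this:

```python
import json
from urllib.request import urlopen

def list_datasets(portal_url):
    """Call CKAN's package_list action and return dataset names.

    CKAN Action API responses wrap their payload in an envelope
    with "success" (bool) and "result" keys.
    """
    with urlopen(f"{portal_url}/api/3/action/package_list") as resp:
        envelope = json.load(resp)
    if not envelope.get("success"):
        raise RuntimeError("CKAN API call failed")
    return envelope["result"]

# The envelope parsing can be shown without a live portal;
# the dataset names here are made up for illustration:
sample = '{"success": true, "result": ["capital-projects", "311-requests"]}'
envelope = json.loads(sample)
datasets = envelope["result"]
print(datasets)  # ['capital-projects', '311-requests']
```

The same envelope convention applies to the other actions (package_show, resource_search, and so on), which is what makes scripted ETL against a CKAN portal straightforward.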

There are many great solutions for open data out there. CKAN may or may not be the right solution for other cities, but all communities can benefit from a selection process that involves users and partners. Our process allowed us to learn a great deal about our needs, and we’ve been happy with the way our choice has turned out.