Commonly-cited principles hold that open data is a complete set of primary data made easily and permanently available in a timely fashion using electronic, machine readable, open file formats. Cost should not pose a barrier to accessing information, and no unreasonable restrictions should limit accessibility, sharing and re-use.
We have a tutorial that shows you exactly how to download and open a dataset in your favorite spreadsheet software. If you get stuck, we encourage you to reach out by email or phone, or to attend one of our upcoming training or office-hour events.
There also may be a tool that enables you to use data without having to download it directly from our site. A full list of tools can be found on our web site.
We realize that there is a lot of data that isn’t available through the Regional Data Center. We’re always working to add more to these web sites, but want to help you get the data you need.
There are three primary ways you can request data:
There is also a FOIA web site for making FOIA requests of the federal government.
We encourage our publishers to prepare data dictionaries when publishing tabular datasets. Data dictionaries include field names and definitions of the fields, and sometimes include information on data types, field length, and other details about the records. These dictionaries generally can be found below the table view, and look like this:
One benefit of this form of data dictionary is that, when you look at the table view of the data on our web site, the value from the "Label" field will replace the field name (the "Column" value). Also, if there is a "Description" value (which should contain a definition of the field and any additional notes), there will be a blue "Information Symbol" (circle with an "i" in it) in the table header, next to the field label. If you hover your cursor over that Information Symbol, you will get a pop-up box containing the field's description:
In some cases, data dictionaries can also be found as separate downloadable files on the dataset page.
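As a rough illustration of how a data dictionary of this form can be put to use after downloading a table, the sketch below replaces raw column names with their labels, much as the table view does. Only the "Column", "Label", and "Description" keys come from the description above; the field names, labels, and sample record are invented for the example, and this is not how the portal itself is implemented.

```python
# Hypothetical data dictionary rows. "Column", "Label", and "Description"
# mirror the fields described above; the values are invented.
data_dictionary = [
    {"Column": "muni", "Label": "Municipality", "Description": "Name of the municipality."},
    {"Column": "yr_built", "Label": "Year Built", "Description": ""},
]


def label_lookup(dictionary):
    """Map each raw column name to its label (falling back to the column name)."""
    return {row["Column"]: row["Label"] or row["Column"] for row in dictionary}


def relabel(record, dictionary):
    """Return a copy of one record with column names replaced by labels."""
    labels = label_lookup(dictionary)
    # Columns that are missing from the dictionary keep their original name.
    return {labels.get(name, name): value for name, value in record.items()}


record = {"muni": "Borough A", "yr_built": "1925", "parcel_id": "0001"}
relabeled = relabel(record, data_dictionary)
print(relabeled)
```

Fields that have no dictionary entry, such as `parcel_id` here, are left unchanged rather than dropped.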
The Regional Data Center’s open data portal can accommodate many different types of data formats. Data tables, text documents, geographic files, HTML/hyperlinks, archives, images, and even sound files may be found on the open data portal.
One of our preferred tabular data formats is comma-separated values, or CSV. We like CSVs because they are an open file format that is compatible with many software programs. Other common tabular formats found on the site include JSON (JavaScript Object Notation) and XLS (Excel).
Common geospatial data formats include ESRI shapefiles (all components in a .ZIP archive), GeoJSON, KML, and even links to ESRI REST endpoints on other servers.
Other image or document files on the site include PDF files, GIF, and JPEG formats. A full list of file types available on the Regional Data Center can be found on the open data portal’s side panel (desktop version), or under the “Filter Results” menu (mobile version).
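To make the tabular formats mentioned above a little more concrete, here is a minimal sketch that reads a small CSV with Python's standard library and writes the same records out as JSON. The column names and values are invented for illustration and do not come from any dataset on the portal.

```python
import csv
import io
import json

# A tiny, made-up CSV: the first line holds the field names.
csv_text = "MUNICIPALITY,PARCELS\nBorough A,120\nBorough B,85\n"

# csv.DictReader uses the header row as keys; every value is read as a string.
rows = list(csv.DictReader(io.StringIO(csv_text)))

# The same records serialized as JSON.
json_text = json.dumps(rows, indent=2)
print(json_text)
```

Note that CSV carries no type information, so numbers arrive as text and need to be converted by whoever uses the data.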
We believe context is critical to effectively use open data. We have been writing data guides for some of our most-used or complex datasets. Where data guides haven’t yet been produced, we encourage you to submit a question through the open data portal, or contact us or the data steward listed in the dataset’s metadata.
If a dataset has a tabular view, you can filter the Data Table view down to a more manageable number of records by picking a category that contains the subset you want (e.g., use the MUNICIPALITY field to pick the records for one or two municipalities out of the entire county).
Note that you can add more filters. Adding more filter values for a given field combines the results of the individual filters (that is, the values are ORed together, yielding more records). Adding filters on additional fields further narrows the results (the fields are ANDed together, yielding fewer records).
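The way filters combine can be sketched in a few lines of Python. This is only a model of the behavior described above, not the portal's own code; the `matches` helper, the YEAR field, and the sample records are hypothetical.

```python
def matches(record, filters):
    """Return True if the record passes every filter.

    `filters` maps a field name to the set of accepted values.
    Values within one field are ORed; different fields are ANDed.
    """
    return all(record.get(field) in accepted for field, accepted in filters.items())


records = [
    {"MUNICIPALITY": "Borough A", "YEAR": "2016"},
    {"MUNICIPALITY": "Borough B", "YEAR": "2016"},
    {"MUNICIPALITY": "Borough B", "YEAR": "2017"},
    {"MUNICIPALITY": "Borough C", "YEAR": "2017"},
]

# One value on one field.
one_value = [r for r in records if matches(r, {"MUNICIPALITY": {"Borough A"}})]

# A second value on the same field (OR): more records.
two_values = [r for r in records if matches(r, {"MUNICIPALITY": {"Borough A", "Borough B"}})]

# A filter on a second field (AND): fewer records.
two_fields = [
    r for r in records
    if matches(r, {"MUNICIPALITY": {"Borough A", "Borough B"}, "YEAR": {"2017"}})
]

print(len(one_value), len(two_values), len(two_fields))  # 1 3 1
```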
Feedback from users is always welcome, especially when it can result in improvements to data quality. The easiest way to let us know about an issue is to post a comment on the dataset page on our web site. We monitor comments and will forward any relevant details to the data steward or other appropriate contact. You are also welcome to share details with us in an email, or reach out directly to the data steward listed in the dataset’s metadata.
We take privacy seriously here at the Regional Data Center, and weigh the benefit of sharing the data against the harm caused to an individual if data is shared. We are reluctant to publish personally-identifiable information, and also will not publish data protected by legislation such as HIPAA and FERPA.
We include materials on privacy in our new publisher trainings and encourage organizations to develop internal privacy review processes before sharing data. We will work with our publishers to aggregate some sensitive data to enable publication, and are also available for consultations, or can arrange a conversation with outside experts. We also provide a final privacy review for each dataset at initial publication.
The Berkman Klein Center for Internet and Society at Harvard University published a helpful Open Data Privacy Playbook in February 2017. It is an excellent resource for learning more about privacy in the context of open data.
You can also review our privacy policy on our web site to see how we handle things such as email and IP addresses in our work.
There are many places to turn when looking to build your data skills. Our Data 101 classes, developed in collaboration with the Carnegie Library of Pittsburgh, provide a foundation for beginning data users and teach basic data and statistical literacy concepts using paper and group activities. Trainings are offered through the Regional Data Center and through our partners, including the Carnegie Library and many other organizations. We post a wide range of training opportunities in our events calendar, and also encourage data users to view the tutorials found on our web site.
Just let us know. We reply to phone calls and emails. You’re also welcome to stop by one of our office hours or other events, or schedule a consultation. Librarians are also a great place to turn for help in your own neighborhood, and our partners at the Carnegie Library of Pittsburgh and the Allegheny County Library Association have also encouraged their librarians to provide data services.
Let us know by email or phone, or attend one of our data user group meetings or office hours to share your idea in person. We can help you frame your idea and give you suggestions on how to take the next step. We also highly encourage you to join a community of other data users at a civic organization such as Code for Pittsburgh, or to participate in a hackathon, data dive, or other type of event.
Yes – we regularly are asked to come to events to talk more about open data and share the story of the Regional Data Center. Please let us know if you'd like us to speak at one of your events, join you for a meeting, deliver a guest lecture in your class, or hold office hours in your community.
We are always happy to share more about our project with people from other cities, and welcome the opportunity to talk with people interested in our model. We are happy to schedule a call to talk more about our work, and suggest you first read through the information we’ve assembled about the project in order to make the most out of your conversation with us.
Any nonprofit organization, public-sector agency, public authority, or academic institution is welcome to share data through the Regional Data Center’s Open Data Portal.
Organizations share open data for many different reasons. In some cases, the organization is obligated to share data as the result of a law, ordinance, mandate, or directive. In other voluntary situations, organizations are looking to increase trust and transparency, enhance public participation, improve collaboration, or inform key community issues. One often overlooked benefit includes enhanced efficiency – many of our publishers share data through the open data portal to minimize the number of external data requests that staff must manage.
Before a new publisher can share information through the Regional Data Center, the following steps must be completed.
We do not want cost to be a barrier to sharing data. For that reason, we do not charge fees to our data publishers. We do encourage our partners to become stewards of the Regional Data Center, and hope they will support us in our fundraising efforts.
The Regional Data Center can accept information in a number of different file formats. We are biased toward open file formats, such as CSVs for tabular data.
Not all data can be open data. Our data deposit agreement contains a detailed listing of the types of information that we will not publish, including:
We take privacy seriously. Even if information does not meet the criteria listed above, we will weigh the value to users against the harm that may be caused by sharing the information, and make a decision about whether or not to publish.
Even if data is too sensitive to publish as-is, we are happy to work with our publishers to aggregate or de-identify it so that it can be released as open data.
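One common way to aggregate record-level data is to publish counts per group and withhold counts that are very small, since small counts can point back to individuals. The sketch below shows that general idea only; the threshold of 5, the helper name, and the sample records are assumptions for illustration and do not describe the Regional Data Center's actual review rules.

```python
from collections import Counter

MIN_CELL_SIZE = 5  # hypothetical threshold, not a stated Regional Data Center rule


def aggregate_counts(records, group_field, min_cell_size=MIN_CELL_SIZE):
    """Count records per group, replacing counts below the threshold with None."""
    counts = Counter(record[group_field] for record in records)
    return {
        group: (count if count >= min_cell_size else None)
        for group, count in sorted(counts.items())
    }


# Invented record-level data: 12 people in one municipality, 2 in another.
records = (
    [{"name": f"person {i}", "MUNICIPALITY": "Borough A"} for i in range(12)]
    + [{"name": f"person {i}", "MUNICIPALITY": "Borough B"} for i in range(2)]
)

summary = aggregate_counts(records, "MUNICIPALITY")
print(summary)  # {'Borough A': 12, 'Borough B': None}
```

The published summary contains no names, and the group with only two records is suppressed rather than reported.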
Some organizations have had a tough time deciding what type of data to share. Looking at how others prioritized their data releases can help you in your decision. Here are a few ideas to get you started:
A great place to start is to look at a list of the data requests your organization has recently received. These may consist of both formal “Right to Know Law” requests and informal requests from external partners and residents. Proactively sharing data can lead to enhanced productivity and greater transparency. Staff will spend less time posting frequently requested data to the data portal once than repeatedly sharing it by request.
It’s very likely that people in your organization are already sharing data with each other, often through highly inefficient methods (such as e-mail attachments). One of the benefits of an open data portal is that it allows for efficient sharing of information. If this information can be shared publicly, loading it to the open data portal will break down information “silos,” allowing for more efficient data sharing and access by all members of the organization.
Data can support internal priorities of your organization. If a dataset can support a key business process, facilitate collaboration, or can be used to inform important decisions, it is probably a good candidate for release as open data. Also, if your organization measures its performance through a series of indicators, you may also want to publish this information for others to see.
Your data may also have value to others in your community. Information from your organization can be essential to improving the lives of your neighbors and informing the work of your community partners. For example, property information shared by local governments often helps community development organizations understand market conditions and target programs. If your data can support the work of others in your community, consider sharing it with them. As part of this process, it can be helpful to ask others for recommendations on data they’d like your organization to share.
Open data programs have been in place around the U.S. for nearly a decade. Organizations in other communities, or counterparts in Western Pennsylvania, can provide you with inspiration. If there’s an organization similar to yours that is effectively using information in its work, see if there’s anything you can emulate. Imitation is the sincerest form of flattery.
Some organizations are required to share information with the public. If this is the case for your organization, an open data portal provides an easy way to share that information in a form others can readily use.
Update frequency is an important consideration after deciding what data to share. Some of the factors that often go into this decision include the difficulty or time required to prepare the data update and the degree to which the data informs important organizational or community initiatives and critical business processes. Publishers often take an incremental strategy in updating data, where they assess how the data is being used before determining how often to refresh the information on the data portal. However your organization decides to proceed, we encourage you to edit the “update frequency” in the metadata to let users know when to expect an update.
Organizations that are experienced in open data often develop a publishing calendar. If you’d like to stay on top of your open data publishing, a calendar can help you organize your publishing efforts. Sharing your open data calendar can also be a way to let your data users know when to expect a data update.
If feasible, the Regional Data Center will work with publishing partners to automate data updates through an “Extract, Transform, Load” (ETL) process. ETL processes are in place for several datasets on the portal, and allow for near real-time data updates. If you think one of your organization’s datasets is a good ETL candidate, please let us know.
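For readers unfamiliar with the term, the three ETL stages can be sketched as three small functions. This is a generic, self-contained illustration and not the Regional Data Center's actual pipeline: the CSV text stands in for a publisher's export, an in-memory SQLite table stands in for the destination, and the column names and values are invented.

```python
import csv
import io
import sqlite3


def extract(source_text):
    """Extract: parse rows from CSV text (a stand-in for a publisher's export)."""
    return list(csv.DictReader(io.StringIO(source_text)))


def transform(rows):
    """Transform: trim stray whitespace and convert the count to an integer."""
    return [(row["MUNICIPALITY"].strip(), int(row["PERMITS"])) for row in rows]


def load(conn, rows):
    """Load: replace the destination table's contents with the latest rows."""
    conn.execute("CREATE TABLE IF NOT EXISTS permits (municipality TEXT, permits INTEGER)")
    conn.execute("DELETE FROM permits")
    conn.executemany("INSERT INTO permits VALUES (?, ?)", rows)
    conn.commit()


# Invented source data with some untidy whitespace.
source = "MUNICIPALITY,PERMITS\n Borough A ,14\nBorough B,9\n"

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(source)))

loaded = conn.execute(
    "SELECT municipality, permits FROM permits ORDER BY municipality"
).fetchall()
print(loaded)  # [('Borough A', 14), ('Borough B', 9)]
```

Running the three steps on a schedule is what turns a one-time upload into an automated update; a dataset is a better candidate when its source export is stable and machine readable.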
Don’t worry if the quality of your organization’s data is not perfect. Users often provide feedback that will result in improvements to the quality of your data. We will work with you to make sure your data doesn’t contain sensitive information before it’s published. We’ll also help you document the condition of your data so that users know exactly what they’re working with, warts and all.
Yes. We are very interested and looking for partners willing to help us understand what capacity and partnerships are needed to achieve positive outcomes in communities far from our offices on the University of Pittsburgh’s campus. If you want to work with us, we’d love to talk with you.
Metadata is a structured framework for documenting data. Some people like to say it’s data about data. It’s essential if anyone hopes to find and use your data. Metadata appears with every dataset on the open data portal. We invite you to learn more about our metadata standard and how it was developed.
Open data policies are a great way for communities to institutionalize open data through a legislative framework. Our friends at the Sunlight Foundation have developed expertise in helping local communities develop open data policies. We encourage you to check out all of the resources available on their web site, which include sample policies, guidelines, and tools to help you craft and get feedback on policies of your own. If you’d like to speak with someone from Sunlight, feel free to reach out directly, or we would be happy to make an introduction.