What if I notice a problem or error in a dataset?
Feedback from users is always welcome, especially when it can result in improvements to data quality. The easiest way to let us know about an issue is to post a comment on the dataset page on our Website. We monitor comments and will forward any relevant details to the data steward or other appropriate contact. You are also welcome to share details with us in an email, or reach out directly to the data steward listed in the dataset’s metadata.
What if I need more information about a particular dataset?
We believe context is critical to using open data effectively. We have been writing data guides for some of our most-used or most complex datasets. Where data guides haven’t yet been produced, we encourage you to submit a question through the open data portal, or to contact us or the data steward listed in the dataset’s metadata.
What am I legally allowed to do with the data on the open data portal?
Data licenses let data users clearly understand the conditions under which data can be re-used. The Data Center asks each data provider either to assign a license to each database they share or to explicitly dedicate the database to the public domain. Data providers are also asked to specify a default license, which the Data Center assigns if no license is specified at the time of publication.
If you’re interested in learning more about the individual licenses you’re most likely to encounter on the Data Center’s open data portal, please refer to our data license resources.
Some licenses require data users to provide attribution. If you’re using a database whose license requires attribution, please see our guide to attribution and citation, or the separate section of this FAQ on that topic, to learn how to provide proper credit for data from the Data Center’s open data repository.
What’s the difference between attribution and citation?
- Attribution is often a legal condition of use required by copyright or license. Some licenses (e.g., Creative Commons Attribution, or CC BY) include attribution requirements. The Data Center requires publishers to specify a license for each database. It’s good practice to view the data license (found in the metadata) before using data from the data repository.
- Citation is not a legal requirement, but it is considered good practice. Even if a data license does not require attribution, the Regional Data Center encourages the use of citations.
How do I create a proper citation or attribution for data?
Proper attribution or citation:
- credits the publisher
- uniquely identifies data and its provenance
- promotes discoverability by others
- honors any licenses associated with the data
- adheres to a specific citation format
In general, the following content should suffice when referencing data from an open data portal:
- database title
- author and/or publisher
- publication date
- provenance information (e.g., access date and URL)
- data license
Consider the following example, formatted in APA Style:
- Allegheny County Office of Property Assessments, Department of Administrative Services. Property assessment parcel data (as of July 14, 2015) [Data set]. Licensed under CC BY. Retrieved from the Western Pennsylvania Regional Data Center on August 12, 2015, http://data.wprdc.org/dataset/property-assessments.
In this example, the specific data set is cited, the publisher is named, publication year is listed, license is provided, and its provenance is clear from the date of access and a direct link to the source.
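If you need to cite many datasets, the same elements can be assembled programmatically. The following Python sketch reproduces the pattern of the APA-style example in this section; the function name and its parameters are hypothetical, and it is not an official Data Center tool. It only formats strings, so checking the values against each dataset’s metadata is still up to you.

```python
from datetime import date


def format_citation(publisher: str, title: str, as_of: date,
                    license_name: str, portal: str,
                    accessed: date, url: str) -> str:
    """Hypothetical helper: build a data citation following the APA-style example."""

    def fmt(d: date) -> str:
        # Renders e.g. "July 14, 2015": full month name, day without a leading zero, year.
        return f"{d.strftime('%B')} {d.day}, {d.year}"

    return (
        f"{publisher}. {title} (as of {fmt(as_of)}) [Data set]. "
        f"Licensed under {license_name}. "
        f"Retrieved from the {portal} on {fmt(accessed)}, {url}."
    )


if __name__ == "__main__":
    print(format_citation(
        publisher=("Allegheny County Office of Property Assessments, "
                   "Department of Administrative Services"),
        title="Property assessment parcel data",
        as_of=date(2015, 7, 14),
        license_name="CC BY",
        portal="Western Pennsylvania Regional Data Center",
        accessed=date(2015, 8, 12),
        url="http://data.wprdc.org/dataset/property-assessments",
    ))
```

Keeping the publication ("as of") date and the access date as separate parameters mirrors the distinction the example draws between when the data was published and when you retrieved it.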
If incorporating open data into an application, it’s also good practice to provide a link to the original source data. Some people have even designed open-source icons that can be used to direct users of a digital tool to the original source of the data.
Creative Commons has provided a series of attribution examples for different types of media on their Website.
Which datasets are scheduled to automatically update?
We learned early on about the importance of automating the data publishing process. People get busy, go on vacations, and switch jobs, so taking the human element out of the process can result in a more predictable and efficient publishing workflow. We work closely with our publishers to automate their publishing processes, and we have developed an automated publishing toolkit to streamline our own work.
The framework for automating a publishing process is commonly referred to as “Extract – Transform – Load” (ETL). We have added “_etl” as a tag to each of the datasets we have automated through our own internal processes. Searching for “_etl” on the open data portal will return a current list of datasets that are published through an automated process. Each dataset’s publishing frequency is listed in its metadata record.
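Because “_etl” is an ordinary tag, the same list can also be retrieved through the CKAN API. The sketch below assumes the portal’s CKAN instance is served at https://data.wprdc.org (the host used in the citation example in this FAQ) and uses CKAN’s standard `package_search` action with a Solr filter query on tags; the helper function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: base URL of the portal's CKAN instance.
CKAN_BASE = "https://data.wprdc.org"


def etl_search_url(base: str, start: int = 0, rows: int = 100) -> str:
    """Build a package_search URL restricted to datasets tagged "_etl"."""
    params = urllib.parse.urlencode({"fq": "tags:_etl", "start": start, "rows": rows})
    return f"{base}/api/3/action/package_search?{params}"


def titles_from_response(payload: dict) -> list[str]:
    """Extract dataset titles from a package_search response body."""
    if not payload.get("success"):
        raise RuntimeError(f"CKAN call failed: {payload.get('error')}")
    return [pkg["title"] for pkg in payload["result"]["results"]]


def list_etl_datasets(base: str = CKAN_BASE, rows: int = 100) -> list[str]:
    """Page through every "_etl"-tagged dataset (requires network access)."""
    titles: list[str] = []
    start = 0
    while True:
        with urllib.request.urlopen(etl_search_url(base, start, rows)) as resp:
            payload = json.load(resp)
        page = titles_from_response(payload)
        titles.extend(page)
        start += rows
        # Stop when a page comes back empty or all `count` matches have been read.
        if not page or start >= payload["result"]["count"]:
            return titles


# Example (needs network access):
# for title in list_etl_datasets():
#     print(title)
```

Paging with `start` and `rows` keeps each request small; the publishing frequency itself still has to be read from each dataset’s metadata record.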
We also harvest the geographic data catalogs from Allegheny County’s and the City of Pittsburgh’s ArcGIS Open Data Portals on a weekly basis. Rather than impose a separate publishing process on each of these organizations, we set up a workflow in which they publish to their own repositories and we harvest links to their datasets each week.
What technology do you use?
We are fans of open source technology and use many different products in our work. To manage the open data portal, we chose CKAN as our software following a selection process in 2015. We use Carto in many of our mapping tools and post our code to our GitHub repository. We also use many other software products, including Python, PostgreSQL, Django, Leaflet, and many more.
Where do I find more information on the Application Programming Interface (API)?
Documentation for the CKAN API can be found online. The documentation includes both tutorials and sample code.
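As a minimal illustration of how the CKAN action API is called, the sketch below sends any action as a POST request with a JSON body and unwraps CKAN’s standard `{"success": ..., "result": ...}` response envelope. The helper names and the placeholder resource ID are hypothetical; check the CKAN documentation for each action’s actual parameters.

```python
import json
import urllib.error
import urllib.request


def build_action_request(base: str, action: str, params: dict) -> urllib.request.Request:
    """Build a POST request for a CKAN action, with parameters as a JSON body."""
    return urllib.request.Request(
        f"{base}/api/3/action/{action}",
        data=json.dumps(params).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def unwrap(payload: dict):
    """Return the "result" of a CKAN response, or raise if "success" is false."""
    if not payload.get("success"):
        raise RuntimeError(payload.get("error"))
    return payload["result"]


def call_action(base: str, action: str, params: dict):
    """Send the request and unwrap the response (requires network access)."""
    req = build_action_request(base, action, params)
    try:
        with urllib.request.urlopen(req) as resp:
            payload = json.load(resp)
    except urllib.error.HTTPError as err:
        # CKAN also reports errors (e.g. 404, 409) with a JSON body.
        payload = json.load(err)
    return unwrap(payload)


# Example (needs network access; "YOUR-RESOURCE-ID" is a placeholder):
# result = call_action("https://data.wprdc.org", "datastore_search",
#                      {"resource_id": "YOUR-RESOURCE-ID", "limit": 5})
# print(result["records"])
```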
- What if I need help using the Regional Data Center’s API?
How do I access your integrated property data API?
We developed a property data API that combines many datasets having a common parcel ID. The API is used to power several tools, including the parcel downloader, property dashboard, and Burgh’s Eye View parcels. If you’d like to integrate this API with your data or systems, please let us know, and we’ll help to make it happen.
I built something. Can you let the world know more about it?
We would love to share your data stories and tools. These stories are routinely featured in our newsletter, our data showcase, and on Twitter. User stories also inspire other users and are invaluable in our fundraising efforts. If you have a story, send us an email or, better yet, give us a call.
Also, feel free to add your contributions to the list of relevant tools and analyses that we maintain on GitHub.
This dataset is too big for me to work with. How can I download only part of it?
If the dataset has a tabular view, you can filter the Data Table down to a more manageable number of records by picking a field value that selects the subset you want (e.g., use the MUNICIPALITY field to pick the records for one or two municipalities in the county).
- Above the Data Table, click the “Add Filter” link. (Data Tables on a dataset’s landing page may not have this link. In that case, find the corresponding data resource under “Data and Resources” and click through to the filterable Data Table.)
- A “Select a field” dropdown appears above the “Add Filter” link. Select the field you want to filter on from this dropdown.
- A new field-values dropdown appears below the selected field. Select a value from the dropdown. For very large tables (millions of records), there may be some delay before the field values are gathered and presented in the dropdown.
- The Data Table will update, with text to describe the filtered view (e.g., “Showing 1 to 10 of 22,692 entries (filtered from 4,302,279 total entries)”).
- You can now sort and browse the filtered view.
- More importantly, you can now download the filtered view by clicking the Download button and then selecting your desired file format.
Note that you can add more filters. Adding more filter values for a given field combines the results of the individual filters (that is, the filters are ORed together, yielding more records). Adding filters on new fields further narrows the results (the filters are ANDed together, yielding fewer records).
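The OR-within-a-field, AND-across-fields behavior described above can be illustrated with a small local sketch. This is not the portal’s code, and the records are made up; only the MUNICIPALITY field name comes from this FAQ.

```python
Record = dict[str, str]


def apply_filters(records: list[Record], filters: dict[str, list[str]]) -> list[Record]:
    """Keep records that match ANY chosen value within each field (OR)
    and satisfy EVERY filtered field (AND)."""
    return [
        rec for rec in records
        if all(rec.get(field) in values for field, values in filters.items())
    ]


# Hypothetical rows for illustration only.
SAMPLE_RECORDS: list[Record] = [
    {"id": "1", "MUNICIPALITY": "Wilkinsburg", "CLASS": "RESIDENTIAL"},
    {"id": "2", "MUNICIPALITY": "Millvale", "CLASS": "COMMERCIAL"},
    {"id": "3", "MUNICIPALITY": "Wilkinsburg", "CLASS": "COMMERCIAL"},
    {"id": "4", "MUNICIPALITY": "Sewickley", "CLASS": "RESIDENTIAL"},
]

if __name__ == "__main__":
    # Two values for one field are ORed together: ids 1, 2, 3.
    one_field = apply_filters(SAMPLE_RECORDS, {"MUNICIPALITY": ["Wilkinsburg", "Millvale"]})
    print([r["id"] for r in one_field])
    # Adding a second field is ANDed with the first: ids 2, 3.
    two_fields = apply_filters(
        SAMPLE_RECORDS,
        {"MUNICIPALITY": ["Wilkinsburg", "Millvale"], "CLASS": ["COMMERCIAL"]},
    )
    print([r["id"] for r in two_fields])
```

With no filters at all, every record passes, which matches the unfiltered Data Table.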