So You Want to Use Open Data


by Albert Lin

April 11, 2016

This is the first in a series of posts for new users of the Western Pennsylvania Regional Data Center. Later posts will cover basic free tools for telling stories with data, along with ways to share what you create. Have something you’re interested in? Let us know; we want to hear from you!

Introduction: What is Open Data?

So we have this snazzy new tool that you may have heard buzz about, but you might be wondering to yourself: what is open data? Commonly used principles hold that open data is a complete set of primary data made easily and permanently available in a timely fashion using electronic, machine-readable, open file formats. Cost should not pose a barrier to accessing information, and no unreasonable restrictions should limit accessibility, sharing, and re-use.

What’s the Western Pennsylvania Regional Data Center?

Many government agencies have decided to go it alone and create and maintain their own open data portal, an online repository where people can discover, download, and get information about available data. Here in western Pennsylvania, we’ve decided to create a shared, collaborative platform that will provide open data capacity to public agencies and nonprofit entities throughout the region. We acknowledge that many of the problems we face as a region cross borders, and solving them requires information from many different organizations. To that end, we launched the Western Pennsylvania Regional Data Center on October 15, 2015.

The Regional Data Center is housed at the University of Pittsburgh and maintained by the University Center for Social and Urban Research. It is closely supported by Allegheny County and the City of Pittsburgh, and is funded through support from the University, the Richard King Mellon Foundation, and The Heinz Endowments.

What’s Available?

We currently have about 130 datasets published to the open data portal by Allegheny County, the City of Pittsburgh, and the University of Pittsburgh. These datasets cover 16 broad categories, including Demographics, Housing and Properties, and Public Safety. To view the complete list of available datasets, proceed to the data center.

How Do I Access the Data?

There are three major ways to discover data on the Regional Data Center’s open data portal.

  1. The first method of finding data on the open data portal is searching for it by keyword using the search box on the front page of the Regional Data Center’s Website, or on the main dataset page on the data portal.
  2. The second method of discovering data involves browsing the offerings by one of the 16 groups or topics on the open data portal. Each dataset appears in at least one group or topic, and it’s possible to perform a keyword search within each group or topic.
  3. The third main way to find data is to view a list of all datasets by publishing organization. After selecting a publisher from our main publisher page, you’re then able to view and search their offerings on the open data portal.

Please note: the first time you access the site, you’ll be presented with the Terms of Use. Scroll through, and after agreeing to the terms, you can begin browsing the datasets.

Using the three discovery methods described above, it’s possible to find an individual dataset such as the City’s Department of Permits, Licenses, and Inspections (PLI) Violation Report.

Under the keyword search method (#1), entering the word “violation” in the search box on the front page returns a search result containing the violations data. The data is also included under “Housing and Properties” on the groups/topics page (search method #2), and it appears in the list of data published by the City of Pittsburgh (search method #3).

You’ll notice that under the description of the dataset in the search results, the available data formats are listed. In this case, we see that the PLI data is available as a CSV file, and there’s also an HTML link that will take you to the violations search page maintained on the City’s Website. There are many types of data formats available on the data portal. In the case of tabular data, you will generally see both CSV and XLS files, both compatible with Excel. You can also use the CSV file in many different programming environments, like R and Python. In the case of geospatial data, you will see files available in Esri REST, GeoJSON, CSV, and KML formats.
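
As a rough illustration of working with a CSV file in Python, here is a minimal sketch using only the standard library. The sample text and its column names (`street`, `violation`, `status`) are made up for this example and are not the actual layout of the PLI data.

```python
import csv
import io

# Hypothetical sample standing in for a downloaded CSV file. The column
# names are invented for illustration, not the portal's actual schema.
SAMPLE_CSV = """street,violation,status
100 Main St,Overgrown weeds,Open
200 Oak Ave,Broken window,Closed
"""


def read_rows(csv_text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


rows = read_rows(SAMPLE_CSV)
for row in rows:
    print(row["street"], "-", row["violation"])
```

For a file saved on your computer, the same `csv.DictReader` can be given a file opened with `open(path, newline="")` instead of the in-memory sample.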

When you click on the data set, the main page for the data set shows the different file types available, as well as the metadata information, which describes the data set. The metadata record contains information about the data set including which department maintains the data, when the data was first published, when it was last updated, and how often it is updated. More information about the Regional Data Center’s metadata format is available on the Website.

At the bottom of the page, you will also see a comment thread, where users of the data portal can ask questions about the data, share details on how they’re using it, report any errors they find, and suggest ways to improve its quality and usability.

To download the data, click the “Download” button next to the file format you want to use. The download will begin automatically. If you’ve downloaded a CSV file or an Excel file, your computer may be configured to open it directly in Excel or another program. After the download, you can also begin using the data in statistical analysis packages like R and Stata.
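
If you would rather take a first look at a downloaded CSV file in Python than in R or Stata, a simple tally is often a useful starting point. The sketch below counts how many rows share each value in one column. The function name `count_by_column` is just a name chosen for this example, and the file path and column name are whatever applies to the file you downloaded.

```python
import csv
from collections import Counter


def count_by_column(path, column):
    """Count how many rows of a CSV file share each value of `column`."""
    with open(path, newline="", encoding="utf-8") as f:
        return Counter(row[column] for row in csv.DictReader(f))
```

For example, `count_by_column("violations.csv", "status").most_common(5)` would list the five most frequent values, assuming the file has a column named `status`.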

If you’re interested in connecting to the data using an API or the OData connector, click “Explore” and then select “Preview.” Once on the Preview page, you can click the “Data API” button at the upper right, which provides all the information needed to connect.
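
The “Data API” button is the authoritative source for the exact endpoint and resource ID. As a rough sketch of what such a connection can look like, the example below assumes the portal exposes a CKAN-style `datastore_search` action. That is an assumption, not something confirmed here, and the base URL and resource ID are placeholders. The code only builds the query URL and unpacks a response, so it runs without a network connection.

```python
import json
from urllib.parse import urlencode

# Placeholder base URL: substitute the address shown by the "Data API" button.
BASE_URL = "https://data.example.org"


def build_search_url(resource_id, limit=5, query=None):
    """Build a query URL, assuming a CKAN-style datastore_search endpoint."""
    params = {"resource_id": resource_id, "limit": limit}
    if query:
        params["q"] = query  # full-text search term
    return f"{BASE_URL}/api/3/action/datastore_search?{urlencode(params)}"


def extract_records(response_text):
    """Return the list of records from a CKAN-style JSON response body."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise ValueError("The API reported that the request failed")
    return payload["result"]["records"]
```

The URL returned by `build_search_url` can be opened in a browser or fetched with `urllib.request.urlopen`, and the response text passed to `extract_records`.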

Additional Features

If you don’t see a dataset that you want, let us know! We’re constantly working with our partners to add more data to the Data Center, and your input helps us set priorities for what to add next. Find more information about data requests here, and begin your own request here. To make a data request, you’ll first need to register for an account on the Regional Data Center’s open data portal and be logged in on the Website.

Also, for more context beyond the metadata for the datasets, we’re working to create data user guides to give future users a better sense of how a particular data set is used, how the data was collected, suggested applications, and other information that might help future users. You’ll find data user guides under the files at the top of a dataset’s page. Learn more about data user guides, and let us know if you would like to contribute.

Finally, check out our Showcase for projects that have been created by our partners using open data!

Video Guide