Open Data – Everyone’s (Housing) Property (Values)

by Claire Suh

July 23, 2018

If you’ve made it here, I don’t have to tell you about the great resources the Western Pennsylvania Regional Data Center (WPRDC) has made available (with excellent documentation and tutorials to boot!). I was charged with using the Market Value Analysis (MVA) data from the Urban Redevelopment Authority (URA) to analyze sale structures and market strength. This dataset is a useful tool for understanding real estate markets in municipalities across Pittsburgh, and is particularly helpful in identifying the types of distress or, conversely, value that characterize an area. It also provides insight into market strength and weakness at the Census block group level. These insights can be applied when considering investments or interventions. Below, I outline the steps I took to do some initial analysis, followed by my findings.

I downloaded the MVA data from WPRDC’s website and opened it up in R. Like any success-seeking data analyst, I began by inspecting the data. What am I looking at? What’s being measured? What is the unit of measure? You can choose your own adventure here; I chose to look at summary statistics, basic plots to help me visualize the data, and some quick cross-tabs to see if there were any trends.
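For readers who want to try a similar first look, here is a minimal sketch of that inspection step. It is written in Python using only the standard library rather than the R I worked in, and it is not the code I actually ran. The column names (`block_group`, `cluster`, `median_sale_price`) and the rows are made-up placeholders, so check them against the real MVA file before reusing anything.

```python
import csv
import io
import statistics
from collections import Counter

# Placeholder rows for illustration only. These are NOT real MVA values,
# and the column names are assumptions, not the dataset's actual headers.
SAMPLE_CSV = """block_group,cluster,median_sale_price
bg01,A,300000
bg02,A,280000
bg03,B,150000
bg04,B,170000
bg05,C,60000
bg06,C,40000
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, converting the price to float."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        row["median_sale_price"] = float(row["median_sale_price"])
    return rows


def summarize(values):
    """Return basic summary statistics for a list of numbers."""
    return {
        "n": len(values),
        "min": min(values),
        "median": statistics.median(values),
        "mean": statistics.mean(values),
        "max": max(values),
    }


rows = load_rows(SAMPLE_CSV)
price_summary = summarize([r["median_sale_price"] for r in rows])

# A one-way tabulation: how many block groups fall in each cluster.
cluster_counts = Counter(r["cluster"] for r in rows)

print(price_summary)
print(cluster_counts)
```

With a real download, you would replace `SAMPLE_CSV` with the contents of the file and adjust the column names to match its headers.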

This first-order analysis yielded nothing too complex. I simply wanted to visualize the median residential sale price for each 2016 MVA cluster. You can see a few things from this graph, but most clearly, clusters labeled with letters earlier in the alphabet have higher median sale prices than those that follow.
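The grouping behind a chart like this can be sketched as follows. Again, this is a stdlib-only Python illustration rather than my original R code; the helper name `median_price_by_cluster` is hypothetical, and the (cluster, price) pairs are placeholders, not real MVA figures.

```python
import statistics
from collections import defaultdict


def median_price_by_cluster(records):
    """Group (cluster, price) pairs by cluster and return each group's median.

    The result is ordered alphabetically by cluster label.
    """
    groups = defaultdict(list)
    for cluster, price in records:
        groups[cluster].append(price)
    return {c: statistics.median(groups[c]) for c in sorted(groups)}


# Placeholder data for illustration only, NOT real MVA values.
records = [
    ("A", 300000), ("A", 280000),
    ("B", 150000), ("B", 170000),
    ("C", 60000), ("C", 40000),
]

medians = median_price_by_cluster(records)

for cluster, price in medians.items():
    print(f"{cluster}: {price:,.0f}")
```

The resulting dictionary can be handed to whatever plotting tool you prefer to draw one bar per cluster.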

Next, I was curious whether this initial analysis would turn up any trends, correlations, variations, or outliers. I wanted to look at the clusters visually and see whether housing prices followed a pattern. This graph supports the earlier finding that clusters earlier in the alphabet have higher median values. It also shows that variance in median sale price is fairly similar across clusters, with somewhat higher variance among the clusters with lower median sale values.
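One way to put numbers on the spread within each cluster is to compute a standard deviation per group. The sketch below also reports the coefficient of variation (standard deviation divided by the mean), which is my own addition here as a way to compare spread between clusters with very different price levels; it is not necessarily the measure behind the graph. The helper name `spread_by_cluster` is hypothetical and the numbers are placeholders, not real MVA values.

```python
import statistics
from collections import defaultdict


def spread_by_cluster(records):
    """Return the sample standard deviation and coefficient of variation
    of prices for each cluster, ordered alphabetically by cluster label.

    Each cluster needs at least two prices for statistics.stdev to work.
    """
    groups = defaultdict(list)
    for cluster, price in records:
        groups[cluster].append(price)

    result = {}
    for cluster in sorted(groups):
        prices = groups[cluster]
        mean = statistics.mean(prices)
        sd = statistics.stdev(prices)
        result[cluster] = {"stdev": sd, "cv": sd / mean}
    return result


# Placeholder data for illustration only, NOT real MVA values.
records = [
    ("A", 300000), ("A", 280000),
    ("B", 150000), ("B", 170000),
    ("C", 60000), ("C", 40000),
]

spread = spread_by_cluster(records)

for cluster, stats in spread.items():
    print(f"{cluster}: stdev={stats['stdev']:,.0f}, cv={stats['cv']:.3f}")
```

In this toy data the absolute spread is identical in every cluster, but it is larger relative to the typical price in the lower-priced clusters, which is the kind of pattern the coefficient of variation is meant to surface.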

At this point, I felt that this dataset was a good starting point on its own, but could benefit from additional information. That led me to wonder what else could supplement the MVA data to build a more robust picture of Pittsburgh housing. In the next post, I will get into some other publicly available sources of data and how to use them.