The Three Constituents of Open Data
Socrata has spent the majority of the last three years focused on understanding the consumption side of the data publishing equation. We’re passionate about making data accessible and comprehensible to the widest audiences possible. Our work in this area has led us to classify the kinds of consumers of data: a taxonomy of data consumption, if you will.
There are three major constituent groups of people who consume data:
The Non-Technically Trained But Nonetheless Interested. In a retail analogy, this is the 7-Eleven shopper. This is the ad hoc class of consumers of data. They are convenience driven. These people are not programmers or DBAs with extensive training in data analysis. They are mainstream people, including students, who perhaps most regularly use Facebook, Excel, Word, PowerPoint and Gmail. Their interest in data is often momentary and tied to a specific question. They want to look up how much ARRA money is being spent in their neighborhood. They want to know which year was the coldest on record. Or perhaps how many wolves live in Yellowstone National Park. They want to know how their senator voted on the latest bill. Their mental picture of data varies from person to person and dataset to dataset. When asked “what does data look like?” one might say a table; another might say a graph or chart; another might say it looks like a map; another would say it looks like the search results on Yelp or LinkedIn; yet another might say it looks like the closing stock prices in the Wall Street Journal. In order to comprehend data, they want at least to absorb and digest it, and preferably to sort, filter and search through it. The key to a positive data consumption experience for this group is that it be interactive and visual. Because their needs are so diverse, this is the hardest group to satisfy well.
Programmers. This is the Radio Shack shopper. They want to build things with data. Technically speaking, they’d rather not consume data itself; they prefer to consume an API (an application programming interface) that “points to” data. Providing bulk data in download format is actually a burden to this group. Raw data forces them to find a place to store it, such as a relational database. Bulk data also forces them to devise some method for keeping the data current. They are writing a program or mashup they hope will endure for quite some time. Write once, run forever. This group is interested in a consistent API from one dataset to another. Bulk data forces them to create their own API for accessing the data once they’ve stored it and figured out how to keep it up to date. What they really want is access to data through an open, standards-based REST API designed for consuming data programmatically. API-enabling data isn’t particularly hard, but it does require some deliberate design, effort and execution. And of course, if thousands of data publishers expend the energy and effort to offer homegrown APIs not based on open standards, the result will be an entirely different frustration for programmers: dealing with thousands of different API variants. That ultimately sets the bar too high for most programmers to bother writing programs that make interesting use of public data.
Analysts, Researchers, Scientists and the Media. This is the Costco shopper. They want data in bulk, in machine-readable formats like XML, CSV, XLS and JSON, or maybe even RSS or RDF. Often they want multiple datasets from multiple sources so they can pour them into their own analysis system. They want to mine the data, looking for undiscovered meaning and hidden, as yet untold truths. This is the domain of investigative journalists. This is the easiest group to satisfy, as the easiest way to share data is to make a CSV or Microsoft Access file available.
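To make the programmer’s preference a little more concrete, here is a minimal sketch in Python of what “consuming an API rather than the data” might look like. Everything specific in it is an assumption made for illustration: the base URL, the `rows.json` path, the `limit` parameter and the `wolf-counts` dataset are hypothetical and do not describe SODA or any real publisher’s interface. The point is only that the program carries a query, not a copy of the data.

```python
from urllib.parse import urlencode

# Hypothetical base URL, for illustration only (not a real endpoint).
BASE_URL = "https://data.example.gov/api/datasets"


def build_query_url(dataset_id, filters=None, limit=100):
    """Return a URL asking the (hypothetical) API for filtered rows."""
    params = dict(filters or {})
    params["limit"] = limit
    # Sort parameters by name so the same query always yields the same URL.
    query = urlencode(sorted(params.items()))
    return f"{BASE_URL}/{dataset_id}/rows.json?{query}"


# The same function works for any dataset, which is the appeal of a
# consistent API: only the identifier and the filters change.
url = build_query_url("wolf-counts", {"park": "Yellowstone"}, limit=10)
print(url)
```

Because the program stores only the query, the publisher remains responsible for hosting the data and keeping it current, which is exactly the burden that a bulk download shifts onto the programmer.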
The open data movement is good for us all. It will take time, but eventually it means that government will run more transparently and better. Maybe someday businesses will too. It means that new insights will be drawn from a plethora of public data sources. But the bar for sharing data has been raised. It’s simply no longer acceptable to publish a circa-1996 five-page web page full of caveats, disclaimers and instructions for decoding encoded data, with a link at the very bottom to download a 17MB Microsoft Access file. The new bar for sharing data is to publish it in the way that is most accessible and most comprehensible to the widest array of audiences, by ensuring that all three core data consumption constituent groups are adequately served.
So what’s your role in open data? It’s simply to raise your voice for your constituent group. Are you civic-minded but not technically trained? Demand that public data be shared in interactive ways that allow you to sift through it in real time, without requiring a download. Are you a programmer? Push for API access to data. Tell data publishers about SODA, the Socrata Open Data API. Don’t accept a download. Are you a scientist, researcher, analyst or part of the media? Ask for bulk, machine-readable access to data in the format that’s easiest for you to consume. Data publishers need to hear from you.
Where To Find Socrata in the Community