What is open data, again?

Posted: 27 February 2016

Many of my blog posts relate to open data in one way or another, and for the sake of brevity I tend to assume a basic knowledge of what “open data” means. But I still see a lot of confusion and misunderstanding, particularly in organisations that are still getting up to speed with public transparency and the digital economy.

Last week, for example, a Gartner analyst wrote this:

Many tend to equate open data with public data, given its original definition. However, data can be defined as open when it is machine-readable and is accessible through an API. This can apply to potentially any data that needs to be processed: whether it is public, discoverable through Freedom of Information Act requests or restricted (for example, covered by privacy laws).

Which is wrong on a number of points.

So what is “open data”, again?

As with any technical term there are legitimate variations in the definition of open data, but there is now a virtual consensus among practitioners that open data must conform to the requirements of the Open Definition.

The 5-star deployment scheme for open data proposed by Tim Berners-Lee and the Open Data Institute’s (ODI) Open Data Certificates process are also useful for thinking about whether a dataset is open.

Boiling the Open Definition down to essentials, open data must be:

accessible
legally re-usable (with minimal restrictions)
machine-readable
available in an open format
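To make the point that all four criteria must hold at once, here is a minimal checklist sketch in Python. The `DatasetProfile` class and its field names are hypothetical, chosen for illustration; they are not taken from the Open Definition or from any existing tool, and the sketch assumes each criterion has already been judged true or false.

```python
from dataclasses import dataclass


@dataclass
class DatasetProfile:
    """Hypothetical yes/no summary of a dataset (illustrative only)."""
    accessible: bool        # can anyone obtain it, e.g. by download?
    legally_reusable: bool  # open licence, or nobody asserts rights over it
    machine_readable: bool  # structured, e.g. a CSV table rather than a scanned PDF
    open_format: bool       # non-proprietary, e.g. CSV rather than XLS


def unmet_criteria(profile: DatasetProfile) -> list[str]:
    """Return the names of the essential criteria the dataset fails, in field order."""
    return [name for name, met in vars(profile).items() if not met]


def is_open(profile: DatasetProfile) -> bool:
    """A dataset is open only if it meets every criterion, not just some of them."""
    return not unmet_criteria(profile)
```

Applied to the kind of data the Gartner quote describes (machine-readable, served through an API, but restricted), the checklist still fails: `unmet_criteria(DatasetProfile(accessible=False, legally_reusable=False, machine_readable=True, open_format=True))` returns `['accessible', 'legally_reusable']`. Being machine-readable and reachable through an API is not enough on its own.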

Open data is accessible

Open data is legally re-usable

An open dataset is almost always released under an open licence, such as the Open Government Licence (OGL), the Creative Commons Attribution License (CC BY) or the Open Database License (ODbL).

As two of these examples (the OGL and CC BY) show, an open licence need not be written specifically for datasets.

An open licence must permit use of the data for any purpose and must not impose a charge. See Section 2 of the Open Definition for a full list of required permissions as well as some acceptable conditions, such as attribution.

Sometimes open data is available without a licence because nobody has the legal standing to assert copyright or database right over the dataset. This is quite common in the US, where most works published by the federal government are not protected by (domestic) copyright law.

There are also a few examples of non-licensed open data in the UK, such as the Free Company Data Product available from Companies House. This dataset is open because neither Companies House nor any other party claims it as their intellectual property, and it otherwise meets the requirements of the Open Definition.
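The rule for legal re-usability described in this section can be summarised as a small decision function. This is a rough Python sketch under stated assumptions: the licence labels are hypothetical short names for the three licences mentioned above (not official identifiers), the set is not a complete list of licences that conform to the Open Definition, and `rights_asserted` stands in for the legal question of whether anyone can claim copyright or database right.

```python
from typing import Optional

# Hypothetical short labels for the licences named in this post; illustrative,
# not an exhaustive list of Open Definition-conformant licences.
OPEN_LICENCES = {"OGL", "CC-BY", "ODbL"}


def is_legally_reusable(licence: Optional[str], rights_asserted: bool) -> bool:
    """Sketch of the legal test only; the other Open Definition criteria still apply."""
    if licence is not None:
        # A licensed dataset is re-usable only if the licence itself is open.
        return licence in OPEN_LICENCES
    # No licence: re-usable only if no party can assert copyright or database right.
    return not rights_asserted
```

Under this sketch, an OGL dataset passes even though the publisher holds the rights, and an unlicensed dataset that nobody claims as intellectual property (the situation described for the Free Company Data Product) also passes. An unlicensed dataset whose owner does assert rights fails.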

Image credit: open data (scrabble) by Justin Grimes (CC BY-SA 2.0)