Does open data need a licence?

Posted: 17 February 2015

According to the Open Definition, open data “must” be available under an open licence.*

But what about datasets that are also effectively free to access and reuse because the producer does not assert any intellectual property rights over the data? Are those datasets “open data”, and if not, why not?

An example: Companies House’s Free Company Data Product, which contains basic data about all live companies on the public register. This dataset is released monthly, but without licensing information.

Last year a Companies House representative provided this clarification (by email to Robert Whittaker):

I can confirm there are no restrictions on the use of the data provided within the Free Company Data Product … As public information, you are free to use it however you wish.

And in a Freedom of Information response (to Stuart Harrison):

CH does not make public information, including the Free Data Product, available under the Open Government Licence (OGL)….
Most of the material on the companies register, with the exception of a small category of material which is exempt from statutory disclosure requirements, is “public information” that the Registrar of Companies is under a statutory obligation to make available to the public. Where information on the register is supplied by a company or someone acting on its behalf, any copyright in that information belongs to that company (or its agents). It does not belong to the Registrar. The Registrar supplies this third party copyright information to customers under authorities given to him under s47 and s50 of the Copyright, Designs and Patents Act 1988 and Schedule 1 of the Database Regulations (SI 1997/3032).
CH places no restriction on how the information is used, but advises all customers to take their own legal advice regarding possible breach of third party copyright.

A second example: data on business costs and expenses claimed by MPs, available as downloads from the IPSA website. Availability for reuse is covered by a single sentence in a FAQ document:

The information is released publicly and without restriction.

In both examples there is no specified licence. But does there need to be one?

The public domain

The Open Data Institute’s guide to open data goes somewhat further than the Open Definition, by saying that “without a licence, the data can’t be reused.”

Technically that statement is wrong. There is an alternative case: the data can be in the public domain, because IP rights do not apply or have expired or been forfeited.

In principle, if a dataset is in the public domain then reuse is unfettered. Even the few restrictions normally imposed by an open licence, such as the requirement to attribute the data to its source, do not apply.

But ‘public domain’ is a tricky concept. In the United States, where works of the federal government are excluded from copyright protection, this term has a recognised status in copyright law. However, in the UK it is mainly a colloquialism. There is no statutory process for waiving IP rights and placing a work into the commons.

In practice, in order to reuse a dataset openly we either have to establish permission or establish that no permission is required. The usual approach is for the publisher to apply an open licence to the dataset. But in order to make data available under licence the publisher must actually believe they have IP rights in that data. Otherwise they have no standing to give anyone permission to use the data (or to restrict its use).

A licence provides clarity

The approach taken by Companies House and IPSA is unsatisfactory because it is ambiguous. On the face of it, blanket statements that the data may be used “without restriction” have the virtue of simplicity. But does that mean Companies House and IPSA have no IP claims on the data, or do those statements actually operate as ad hoc licences? It’s not clear.

Although the records of individual companies are third-party information, Companies House could assert database rights over the register of companies as a whole, and possibly also copyright over the system of uniform resource identifiers (URIs) assigned to each company. That would give CH the necessary IP ownership to apply the Open Government Licence to the Free Company Data Product.

The OGL does not override any other copyright or database right exceptions, so (as far as I can see) this approach would not conflict with the statutory status of the register of companies as a public register.

Similarly, IPSA could regularise its data downloads by applying the Open Parliament Licence.

Another option is Creative Commons Zero (CC0), a waiver with a fallback licence, which is designed specifically for IP owners who want to dedicate copyrighted works as nearly as possible to the public domain.

This change of approach would provide greater clarity to reusers by bringing the CH and IPSA datasets into a recognised open data framework.

Where producers genuinely believe they cannot assert IP rights over the data they make public, but want to encourage open reuse, it would be good practice to provide as clear a statement to that effect as possible. This would give reusers an assurance that the data is okay to use (subject to due diligence on third party rights and other issues) and that the producer commits not to impose restrictions on reuse retroactively.

* Update 28 February 2016: the latest version of the Open Definition clarifies that open data may also be in the public domain, i.e. not subject to a licence.

Image: Open data by Descrier (CC BY 2.0)