This is the second in a series of blog posts about open data in biodiversity. I have a professional interest but all views in this post are personal.
In my previous post I highlighted the complexity of data flows within biodiversity, and the challenge of establishing ownership of data years after it has been collected.
The National Biodiversity Network Trust's advice to data managers on clarifying permission to share and use existing wildlife data provides an excellent summary of the problem:
The vast majority of wildlife data resources that have been created in the UK to date have had no formal transfer of authority permitting the data to be passed on and used. In most cases casual submission of wildlife records has been based upon the presumption that the data would be made available to others. In such circumstances data holders usually feel they have sufficient authority to use and make wildlife data available to others. Indeed this is the premise upon which the majority of traditional data exchange has taken place in the UK to date.
These ambiguities are a barrier to open licensing of biodiversity data. Many users will be wary of data that does not have clear provenance and permissions, and this reduces the potential to maximise social and economic benefits from re-use of biodiversity data.
NBN guidance on intellectual property rights
Where the NBN Trust falls down is in its analysis of intellectual property rights (IPR) as they pertain to biological data records.
In particular the following view is difficult to reconcile with UK law:
Copyright is sometimes used to refer to IP rights more generally. However, in legal terms copyright does not apply to individual data records per se. It may apply to a document if the "manner of expression" is original and if sufficient "art, skill or labour" has gone into its creation. Even in this context, the copyright would only apply to the document and not the underlying information. The fact that there may be considerable skill and labour involved in finding and identifying species does not invoke copyright of the data record. Copyright will only apply if the compilation and/or organisation of the data includes a significant level of ingenuity or skill on the part of the database creator. Given the nature of the data supplied by data recorders, legal advice obtained by the NBN Trust suggests that data recorders would not hold any material copyright of the data.
The NBN Trust's guidance rests mainly on the Copyright and Rights in Databases Regulations 1997 (CRDR) and ignores the more central Copyright, Designs and Patents Act 1988 (CDPA), which sets out the application of UK copyright law to data records as well as databases.
Facts are not subject to copyright, even if the facts are new and discovering them requires skill or labour. However an original collection or arrangement of facts in fixed form may be protected by copyright. It's somewhat facile to say that copyright only applies to the document and not the underlying information, if the information is supplied in recorded form and not available from any other source.
The biological record as a "table or compilation"
The CDPA recognises "a table or compilation" as a type of literary work that is eligible for copyright. A dataset, or even an individual data record, is likely to be protected by copyright as a table or compilation if its creation is original and involves a deliberate arrangement or selection of data, i.e. some skill or effort on the part of the author.
So it's not so simple to say copyright doesn't apply to individual data records. A mechanically recorded observation may not be subject to copyright – particularly if the record is in a schema that prevents the recorder from making any decisions about how the observation will be expressed. However a data record is more likely to be subject to copyright if the recorder chooses the attributes that make up the record or decides how the observation is expressed, for example as free-format text.
The threshold of originality used to assess whether a work can be copyrighted is quite low, and there is no requirement that creation of the work should involve a high level of skill or labour.
Individual data records may not always be protected by copyright, but it will rarely be possible to make that judgement without considering the content and structure of the record and how it was created. For purposes of onward dissemination and use it is safer to assume that copyright subsists from the point of creation of individual records.
Of course data recorders will often make a series of observations and submit those records as a collection, which increases the likelihood that they will conform to the meaning of "compilation" and be subject to copyright.
Databases and database right
Databases are a further type of literary work that may be protected by copyright. In CDPA a database is a collection of independent works, data or other material arranged in a systematic or methodical way, and is original if the selection or arrangement of the contents constitutes the author's own intellectual creation.
CRDR 1997 also implements a "sui generis" database right that enables the maker of a database to restrict extraction or re-use of substantial parts of the contents of a database, provided there has been substantial investment in obtaining, verifying or presenting the contents of the database.
Database right is the form of intellectual property that the NBN Trust has focused on in its guidance note. However the assumption that recorders will usually have database right is also questionable. Some recorders do build up their own collections of records, but to obtain database right they must make a substantial investment in the organisation and arrangement of a collection of works – independent of any investment in the creation of the data itself.
Database right is separate from copyright and does not require the maker of the database to own copyright in the works collected in the database. However it's unlikely that a data recorder will have database right in a collection of data that they created themselves without also having copyright in the data records (and in the database).
Risk-based publication of historical records
It's easy to see why the NBN Trust might prefer data recorders to have database right but not copyright. Database right normally expires 15 years from completion or publication of a database, whereas copyright lasts for 70 years from the death of the author. CRDR 1997 also permits data users to make reasonable assumptions about when database right has expired, if it is not possible by reasonable inquiry to ascertain the identity of the database maker. If records don't have copyright, that increases the pool of historical data that can be published via the NBN Atlas.
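To make the difference in duration concrete, here is a rough sketch of the arithmetic in Python. It is a simplification based on my own reading, not legal advice: it assumes both terms run from the end of the relevant calendar year, and it ignores complications such as substantial later changes to a database or works with unknown or multiple authors. The function names are my own invention for illustration.

```python
from typing import Optional

COPYRIGHT_TERM_YEARS = 70       # counted from the year of the author's death
DATABASE_RIGHT_TERM_YEARS = 15  # counted from completion or first publication


def copyright_last_year(author_death_year: int) -> int:
    """Last calendar year in which copyright subsists (simplified)."""
    return author_death_year + COPYRIGHT_TERM_YEARS


def database_right_last_year(completed_year: int,
                             published_year: Optional[int] = None) -> int:
    """Last calendar year in which database right subsists (simplified).

    Assumption: publication restarts the 15-year clock only if it happens
    within the first 15 years after the database was completed.
    """
    start = completed_year
    if (published_year is not None
            and completed_year <= published_year
            <= completed_year + DATABASE_RIGHT_TERM_YEARS):
        start = published_year
    return start + DATABASE_RIGHT_TERM_YEARS


# Hypothetical recorder: finished an unpublished collection in 1990, died in 2000.
print(database_right_last_year(1990))  # 2005
print(copyright_last_year(2000))       # 2070
```

On those made-up dates, database right would have lapsed decades before copyright does, which is the gap that makes the "database right only" reading so attractive for historical records.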
Unfortunately the NBN Trust's advice is a misinterpretation of the law. Record centres and recording schemes cannot generally assume that historical records are unencumbered by copyright.
This may prove an intractable problem for any record centre or scheme that holds reserves of historical data, as it will often be uneconomic to contact and obtain permission from recorders retroactively.
(SxBRC's lessons learned report on sharing of Sussex beetle records is a useful case study on the difficulties of publishing historical biological records as open data.)
Record centres may decide to publish their historical data anyway, relying on a risk-based estimation that recorders are unlikely to object to dissemination of their records via the NBN Atlas under whatever licence the record centre wants to use. The estimation of risk will depend on the quantity and sources of the data. The economic value of copyright in individual data records, and therefore the potential for liability for infringement of intellectual property rights, may be perceived as low. On the other hand a large volume of collated data with uncertain provenance will present a higher risk.
Any concerns about IPR should be made transparent. Where record centres or the NBN Atlas publish data that contains third party rights they may not have authority to license, they are (in my view) ethically obliged to warn users so that users can make their own judgement about the suitability of the data for their purposes. Such a warning may also reduce the potential legal liability for record centres and the NBN.
Creation of new biological records
There are a number of things we can do to ensure new biological records are collected with clear permissions that provide a basis for wide sharing and use.
The first is to make use of emerging technologies for biological recording, such as smartphone apps and standardised digital forms. Properly implemented, these technologies can reduce the potential for recorders to acquire IPR in the records they submit and/or make it easy to capture and record permissions to re-use those records.
This technological approach can empower citizen science schemes and the crowd-sourcing of observation data. However it may sometimes be too restrictive for types of biological recording that require more judgement and expertise, whether from experienced volunteers or professional surveyors.
Whether records are submitted digitally or in paper form, the record centre or scheme should ensure that the recorder grants (or confirms they have authorisation to grant) a worldwide, royalty-free, perpetual, non-exclusive licence to use any IPR in the record, including a right to sub-license. This is the minimum permission necessary to ensure that the record can be published as open data, either individually or collated with other records.
Even if the record centre or scheme has no settled intention to publish the record as open data, obtaining permission on this basis will provide the maximum scope for dissemination and use for their own purposes.
If there is a plan or policy to publish the data under a specific open licence, such as CC BY or the OGL, it's a good idea to signpost that licence to the recorder at the point of submission. This will help them understand the ways in which their data may be used.
One variation of this approach is to ask recorders to waive any IPR they may have in the records, perhaps with reference to the CC0 public domain dedication. CC0 is even more flexible than an open licence but will be a viable alternative only if the recorder is submitting records they have created themselves.
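For anyone building a digital submission form, the permission options described above might be captured alongside each record along these lines. This is only a sketch under my own assumptions: the class, field and function names are hypothetical and do not come from the NBN Trust or any existing recording app.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Permission(Enum):
    # Worldwide, royalty-free, perpetual, non-exclusive licence
    # including a right to sub-license.
    LICENCE_GRANT = "licence_grant"
    # Waiver of any IPR in the record, e.g. by reference to CC0.
    CC0_WAIVER = "cc0_waiver"


@dataclass(frozen=True)
class RecordSubmission:
    recorder_name: str
    permission: Permission
    recorder_is_creator: bool    # did the recorder create the record themselves?
    authorised_to_grant: bool    # if not, do they confirm authority to grant?
    signposted_licence: Optional[str] = None  # e.g. "CC BY" or "OGL", shown at submission


def can_publish_as_open_data(submission: RecordSubmission) -> bool:
    """Check whether the captured permission supports open publication."""
    if submission.permission is Permission.CC0_WAIVER:
        # A waiver is only viable for records the recorder created themselves.
        return submission.recorder_is_creator
    # A licence grant works if the recorder owns the rights or is authorised.
    return submission.recorder_is_creator or submission.authorised_to_grant


own_records = RecordSubmission("A. Recorder", Permission.CC0_WAIVER, True, False)
third_party = RecordSubmission("B. Collator", Permission.CC0_WAIVER, False, True)
print(can_publish_as_open_data(own_records))   # True
print(can_publish_as_open_data(third_party))   # False
```

Storing the permission with the record, rather than in a separate policy document, means the basis for sharing travels with the data when it is collated or passed on.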
The NBN Trust has provided some model wordings for permission statements. However I recommend avoiding phrases like "environmental decision-making, education, research and other public benefit" as they are subject to interpretation and will restrict the scope for downstream use of the data.
There is a balance to be struck between the need to obtain permissions that are legally robust and the desire to avoid jargon that may discourage recorders from submitting data. Development of better processes for sharing data will depend on raising awareness of intellectual property rights and understanding how they contribute to securing the evidence base for biodiversity science.