Designing a workflow for publication of open data can be quite involved. There are many ongoing debates about data standards, formats, accessibility, discoverability, and resource models. At the organisational level a successful open data programme depends on building expertise across functions and technical disciplines.
But explaining open data at that level of complexity isn't always necessary or helpful. If we expect open data to become "business as usual" we need to give data specialists a manageable set of considerations they can apply to release of individual datasets.
The following checklist sets out the key points I encourage data specialists to consider when preparing data for publication.
1. Have you identified a dataset that you want to release as open data?
Confirm you have the dataset stored in an internal location and are sure it's the version you want to publish. Complete any quality assurance process you have in place. Open data doesn't need to meet any particular standard of data quality, but you should be able to communicate known issues to users.
2. Do you have legal rights to release the dataset as open data?
You must either own the dataset or have permission both to use any third party intellectual property it contains and to publish that material as open data. (Alternatively the dataset could be free of intellectual property rights, but that's rare in the UK.)
Ideally you should understand the lineage of the dataset: what went into it and how it was produced. If the lineage is unclear, as may be the case with historical data, publication will require a risk-based judgement.
3. Have you risk-assessed the data for publication?
You should have internal standards for risk assessment, with a sign-off process. The risk criteria will depend on the nature of the dataset; however, risk assessment will usually consider data protection, confidentiality, public safety, and other factors.
4. Is the dataset in an open format?
The data must be manipulable and in a file format that is supported by available free software. So no PDFs or MS Access databases, unless you also intend to publish the data in an alternative open format.
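As a rough illustration, a first-pass format check could be sketched in Python. The extension lists below are my own illustrative assumptions, not an authoritative register of open formats; adjust them to your own standards.

```python
from pathlib import Path

# Assumption: illustrative examples only, not a definitive list of open formats.
OPEN_EXTENSIONS = {".csv", ".json", ".geojson", ".xml", ".txt", ".ods"}
# PDFs and MS Access databases (.mdb, .accdb) are the examples named above.
UNSUITABLE_EXTENSIONS = {".pdf", ".mdb", ".accdb"}


def classify_format(filename):
    """Classify a file by its extension as a first-pass check."""
    suffix = Path(filename).suffix.lower()
    if suffix in OPEN_EXTENSIONS:
        return "open"
    if suffix in UNSUITABLE_EXTENSIONS:
        return "needs an open alternative"
    return "review manually"


def has_open_alternative(filenames):
    """Return True if at least one file in the release is in an open format."""
    return any(classify_format(name) == "open" for name in filenames)


if __name__ == "__main__":
    release = ["survey_report.pdf", "survey_results.csv"]
    for name in release:
        print(name, "->", classify_format(name))
    print("Open alternative included:", has_open_alternative(release))
```

An extension check only tells you what a file claims to be, so treat the result as a prompt for review rather than a sign-off.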
5. Have you chosen a licence?
There are a number of standard open licences available. Some are more permissive than others.
The Creative Commons Attribution License is a popular choice. If you are a UK public body you should use the Open Government Licence where possible. If your dataset contains third party data, make sure the licence you choose is compatible with the licence under which you have permission to include that data.
Don't try to write your own licence. Your dataset is more likely to be used if the licensing makes it interoperable with other datasets.
6. Have you written an attribution statement?
Most open licences enable the publisher to specify how they want to be credited for use of their data.
Usually this is a short statement of rights, e.g. "Contains XYZ Ltd data © copyright and database rights 2018". Make sure you include any third party attribution requirements as well.
7. Is the dataset adequately documented?
Metadata, an additional data file that sets out the essential characteristics of the dataset in a structured format, is the most common type of documentation for a dataset. Metadata is good practice, particularly if the dataset will be catalogued or archived. But metadata alone is often not enough to make the dataset understandable to the user.
Write a free-format summary that explains the content and significance of the dataset. Adapt any internal documentation you have, and provide links to (or copies of) any reports or background material that will help the user understand the dataset. Make sure column headings and attributes in the dataset are either defined or self-explanatory.
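One way to check that column headings are defined is to compare the header row of the file against a data dictionary. The following is a minimal Python sketch; the function name, the sample columns, and the idea of holding definitions in a simple dictionary are all assumptions made for illustration.

```python
import csv
import io


def undefined_columns(csv_text, data_dictionary):
    """Return header names in the CSV that have no definition in the dictionary."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    # A column counts as defined only if it has a non-empty description.
    return [name for name in header if not data_dictionary.get(name, "").strip()]


if __name__ == "__main__":
    # Hypothetical sample data and definitions.
    sample = "site_id,obs_date,cnt\n1,2018-01-31,4\n"
    definitions = {
        "site_id": "Identifier of the survey site",
        "obs_date": "Date of observation (YYYY-MM-DD)",
    }
    print("Columns still to define:", undefined_columns(sample, definitions))
```

Any column the check reports either needs a definition in your documentation or a more self-explanatory heading.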
Documentation may include an "information warning" that draws the attention of the user to concerns related to data quality, third party rights, data protection, or the unsuitability of the data for specific purposes.
At minimum the dataset should be accompanied by a "Read Me" file with a brief description of the data and its source, the date of publication, and information on licensing and attribution.
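The minimum "Read Me" content described above, along with a matching metadata file, could be generated from a single record. This is only a sketch: the field names are my own choices rather than those of any metadata standard, and the sample values are hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from datetime import date


@dataclass
class DatasetRecord:
    """Assumption: a minimal set of fields, not taken from a metadata standard."""
    title: str
    description: str
    source: str
    published: date
    licence: str
    attribution: str


def readme_text(record):
    """Return the text of a minimal Read Me file for the dataset."""
    lines = [
        record.title,
        "=" * len(record.title),
        "",
        record.description,
        "",
        f"Source: {record.source}",
        f"Date of publication: {record.published.isoformat()}",
        f"Licence: {record.licence}",
        f"Attribution: {record.attribution}",
    ]
    return "\n".join(lines) + "\n"


def metadata_json(record):
    """Return the same facts as structured metadata in JSON."""
    fields = asdict(record)
    # Dates are not JSON-serialisable, so store the ISO 8601 string instead.
    fields["published"] = record.published.isoformat()
    return json.dumps(fields, indent=2, ensure_ascii=False)


if __name__ == "__main__":
    record = DatasetRecord(
        title="Example survey data",
        description="Counts of observations by survey site.",
        source="XYZ Ltd",
        published=date(2018, 1, 31),
        licence="Open Government Licence",
        attribution="Contains XYZ Ltd data © copyright and database rights 2018",
    )
    print(readme_text(record))
    print(metadata_json(record))
```

Generating both files from one record keeps the human-readable and machine-readable documentation consistent with each other.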
8. Have you packaged the dataset?
Open data may be distributed far beyond its publication source, so you should think about packaging the data and documentation in a zip file or similar container so that it all travels together. Don't rely too much on the catalogue or landing page where the data will be published originally; that context may be lost.
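Packaging can be as simple as a few lines of Python using the standard `zipfile` module. In this sketch the function name and the rule that the package must contain a file whose name begins with "readme" are my own assumptions, added so the documentation cannot be left behind by accident.

```python
import zipfile
from pathlib import Path


def package_dataset(files, archive_path):
    """Write the data and documentation files into a single zip archive."""
    paths = [Path(f) for f in files]
    # Assumption: the Read Me file is recognised by its name alone.
    has_readme = any(
        p.name.lower().replace(" ", "").startswith("readme") for p in paths
    )
    if not has_readme:
        raise ValueError("Package must include a Read Me file")
    archive_path = Path(archive_path)
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in paths:
            # Store each file under its bare name so the archive unpacks flat.
            archive.write(path, arcname=path.name)
    return archive_path
```

For example, `package_dataset(["survey_results.csv", "README.txt", "metadata.json"], "survey_results.zip")` would bundle the three (hypothetical) files into one archive that can be shared without losing its documentation.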
9. Do you have a place to publish the dataset?
Open data is almost always available for download for free from the public web. Data portals are increasingly affordable, but unless your dataset is very large or in demand you should be able to publish it as an attachment on your normal website.
You may also want to submit your dataset to catalogues or portals run by reputable outside organisations, particularly if you operate in a sector that cooperates to publish data. Users may share your dataset or combine it with data from other sources, but you should maintain a canonical version of your dataset in an online location that you control.
10. Do you have a plan to support the dataset?
Promote the availability of your open data through social media and other channels. You may also want to produce an infographic, visualisation or interactive map to make the data more relatable. If possible use the release to highlight the work of your organisation, or tie it in to a business objective or policy initiative.
Open data is often published without a firm idea of how or how much it will be re-used, so don't go overboard. Publishing your data in a discoverable location may be enough.
However, if you think the dataset has broad potential or there is clear demand, there may be additional measures you can take to maximise re-use of your data. Making the data accessible through an open API (in addition to bulk downloads), or using identifiers and open standards to turn the dataset into linked data, are two ways of developing your open data further.
Further reading
The above checklist is intended to encourage publication of data that complies with the Open Definition.
Many of the points are covered in more detail in an Open Data Policy I wrote recently for JNCC.