Shakespeare Review into Public Sector Information: Draft Recommendations

In June 2012 the UK Government announced that Stephan Shakespeare, Chair of the Data Strategy Board, would lead an independent review of public sector information. Terms of reference and additional information are available on the Gov.uk website.

Market research firm YouGov has recently conducted a survey regarding open data, in support of the PSI Review. (Stephan Shakespeare is CEO of YouGov.) An analysis of the high-level results of the survey was released earlier this week.

Today I had the opportunity to participate in a follow-up survey and comment on a draft of ten basic recommendations that Shakespeare plans to put forward in his PSI Review. The Review is scheduled for publication in May.

[ Update: You can now take the follow-up survey too! ]

Below, presented verbatim and without comment, are the ten draft basic recommendations:

Recommendation 1

Every department of government should make an immediate commitment to publish their single most important dataset quickly, to a high standard agreed to maximise linkability, ease of use and free access. They should also commit to maintaining that dataset and keeping it regularly updated.

Most important dataset to be defined as the one that is used most often by the department itself to carry out its work (or the one most requested by outside users, if that demand can be demonstrated and is significantly different); these datasets taken together will be the ‘National Core Reference Data’.

The situation currently is that most departments have done a reasonable job at making some datasets available; my proposal says it must be their most important datasets, the ones that define their core work; and currently they are not necessarily published and maintained to a consistent format and standard, and that should now happen.

Recommendation 2

Alongside this high-quality core data, departments should commit to publishing all their datasets (in anonymised form) as quickly as possible without concerns about quality - that is, if there is a clash between data quality and speed to publication, they should follow the ‘publish early and ugly’ principle because data scientists are well accustomed to getting value out of imperfect data.

Currently many datasets are held back because it is felt they are not ready because they are not of sufficiently high quality, and that resources prevent their speedy improvement. But data users say that lower quality is not as much of a problem as is non-publishing.

Recommendation 3

Recommendations 1 and 2 taken together define a twin-track policy for a simultaneous ‘high quality core’ AND a ‘publish early and ugly’ policy. This twin-track policy will maximise the benefit within practical constraints (with the further recommendation that departments take pride in adding as many datasets as possible and as quickly as possible from track 1 and track 2).

This approach reduces the excuses for poor or slow delivery; it says ‘get it all out and then improve’.

Recommendation 4

Building on existing activities, there should be an immediate programme of investment in basic data science through our academic institutions, covering both genuinely unfettered ‘basic research’ and research of ‘practical immediate value’ to the national data strategy. We cannot rely only on markets and government departments to maximise the potential of this relatively new and fast-developing field in which we are positioned to be a world leader.

At the moment, America invests massively more than us and continuously reaps the benefits in world-leading business applications of science and technology; yet Britain is capable of being first in this field, given our strength in data science and the fact we have large, coherent datasets. For example, nowhere in the world has such good health data, due to the scale of the NHS as a single provider. There is huge potential here for building social and economic value if we are willing to invest smartly.

Recommendation 5

We should have a clear pragmatic policy on privacy and confidentiality that increases protections for citizens while also increasing the availability of data; we can do this by putting in place guidelines for publication that, if correctly followed, push responsibility for (mis)use on the end (mis)user, strengthening application of punitive consequences. Especially sensitive datasets should be accessible only to those who can demonstrate sufficient expertise in the area and whose activity with the data is traceable.

We currently have an unrealistic degree of expectation of any data holder to perfectly protect all our data, which has led to a situation where data scientists are presumed ‘guilty unless proven innocent’ - an attitude that inhibits innovation. Following ‘Best practice’ guidelines should be enough, so long as we are willing to prosecute those who misuse personal data. Otherwise we will miss out on the enormous benefits of Public Sector Information, including open government data.

Recommendation 6

We should have a mechanism for driving the implementation of the national data strategy throughout the public sector, and its oversight. This should include clarity about what data is/can be available, with a feedback loop for its improvement; it will be continuously accessible to citizen and business-user influence. The idea is to be an exemplar of the democratic crowd-sourcing of decisions.

We have several committees, boards, overseers and champions of data; but no easily understood, easily accessed, easily influenced mechanism for making things happen. This is ironic given it’s all about ‘information’. We should create a single channel for driving Public Sector Information, including open government data, through the system.

Recommendation 7

We should develop a model of a ‘mixed economy’ of public data so that everyone can benefit from some forms of two-way sharing between the public and the commercial sectors. Data that is derived from the activity of citizens must be seen as being at least co-owned by them and returning value to them, though the investment of business in collecting and processing the data should also be respected.

There are government initiatives such as Midata, a government-led project that works with businesses to give consumers better access to the electronic personal data that companies hold about them. The project recognises that data about citizens belongs to them and that they should have a way of claiming and using their ownership.

Furthermore, government should be able to make a collective claim on some data if one can make a strong case for public value which is not by other means returning to the public.

Recommendation 8

We should challenge the current quasi-commercial Trading Fund model (for Companies House, Land Registry, the Met Office and Ordnance Survey) in favour of a basic information utility or scientific institution model, in which Trading Funds should be responsible for transparency of data production (that is, collecting and publishing data in a way that can be seen to be reliable and authoritative) and only provide ‘added value’ services where the market is likely to fail.

Currently the Trading Funds do a reasonably good job of collecting, using and sharing data. But many think it would be even better if they could focus on transparent collection and distribution, and where appropriate scientific processing, rather than holding on to it for quasi profit-making purposes.

Recommendation 9

We should expect systematic and transparent use of data in the formulation, implementation and monitoring of government policy, and formally embed this in the democratic process.

Although government does publish some data as evidence for policy, for example in impact assessments, practice varies, and the wider consultation process is not generally considered to be effective. We should deepen and broaden the role of data in policy making.

Recommendation 10

We should continue to provide evidence for the economic and social value of Public Sector Information, including open government data, to underpin a bold strategy of investment in an infrastructure of data to make the UK the world leader in this field, thereby gaining the greatest advantage in this new wave of the digital revolution.

Currently we can measure the costs of producing and publishing data but have no model for evaluating the economic or social benefits ‘downstream’, and so we may be undervaluing these activities, leading to underinvestment of resources.