How to fix Data.gov.uk (or at least make it suck less for users)

Posted: 14 September 2016

The Government Digital Service (GDS) is currently undertaking user research for Data.gov.uk, as “previous research suggests that it’s not easy to find and use data on the site”.

There’s a blog post, and a short survey in which GDS asks three questions.

I have reservations. If GDS’s objective is to improve Data.gov.uk (DGU) for data users, the survey questions may be too open to produce much actionable feedback.

I’m also somewhat doubtful that GDS has the right skills and mindset to fix DGU. The transactional approach GDS has taken to development of GOV.UK and other digital services is not likely to deliver what the data community and the public need from DGU.

Neither the blog post nor the survey is readily discoverable on Data.gov.uk itself, which also makes me wonder how serious GDS is about user research. The future development of DGU is (or should be) sufficiently central to the Government’s digital policy to merit a proper public consultation via GOV.UK.

As a long-time user and close observer of DGU I’ve been critical of how the site has evolved over the past several years. The metadata catalogue is a mess and the learning curve for data discovery is too steep for the average user.

However, this post (clickbait title aside) is an attempt to offer some constructive analysis and ideas for what I would like to see GDS do with Data.gov.uk. DGU is supported by a small team of developers and none of these points is meant as specific criticism of them or their work.

What is Data.gov.uk for?

Start with the fundamentals. What is DGU for? What is the site meant to deliver?

The metadata catalogue must be central and is the only essential functionality.

There is a tendency to think of DGU as an “open data portal” like other such sites. But there is a big difference between building a repository from which knowledgeable users can download datasets, and a catalogue that a broad range of users (including the general public) can use to discover and locate datasets.

DGU should focus tightly on the metadata catalogue, possibly to the exclusion of all else if resources are limited. There are a small number of datasets that are only available via DGU but, generally speaking, it is not crucial for DGU to provide direct, one-click access to data. Accessing data out of context is rarely useful.

As a default approach DGU catalogue records should introduce the data and show users where to find it, with a link to a landing page on the publisher’s own site for the dataset itself. (Of course this need not prevent development of specialist repositories or minisites on the Data.gov.uk domain, distinct from the catalogue itself.)
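As a rough illustration of that default approach, here is a minimal Python sketch of a catalogue record that introduces a dataset and points to the publisher's landing page rather than to a file. The class name, field names and example values are all hypothetical; this is not DGU's actual schema.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class CatalogueRecord:
    """Hypothetical minimal catalogue entry: describe the data, then signpost it."""

    title: str
    summary: str
    publisher: str
    landing_page: str  # a page on the publisher's own site, not a direct download

    def __post_init__(self) -> None:
        # Only accept absolute web addresses, so every record leads somewhere.
        parts = urlparse(self.landing_page)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(
                f"landing_page must be an absolute web URL: {self.landing_page!r}"
            )

    def signpost(self) -> str:
        """Plain-text presentation: what the data is, who publishes it, where it lives."""
        return (
            f"{self.title} ({self.publisher})\n"
            f"{self.summary}\n"
            f"Find the data at: {self.landing_page}"
        )


# Placeholder values for illustration only.
record = CatalogueRecord(
    title="Example dataset",
    summary="One or two sentences introducing what the data covers.",
    publisher="Example Department",
    landing_page="https://example.org/datasets/example",
)
print(record.signpost())
```

The point of the sketch is the shape of the record: a short introduction plus one link, with the context for the data left on the publisher's own page.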

Data.gov.uk needs an editor

DGU’s key strategic failure is the lack of active curation.

Currently the catalogue is compiled through semi-automated harvesting and submission processes from information provided by publishers. Some publishers are more engaged with this task than others.

Metadata quality on DGU is highly variable. There are few standards of practice for the presentation of datasets. DGU users cannot easily track or group similar datasets. And while the “importance” of an individual dataset is always to some extent subjective, there are few indicators to help DGU users distinguish high-value datasets from those that will be of only niche interest.

DGU should employ an editor with skills in data curation and information management to redesign the pro forma and the common presentation of metadata in the catalogue. New dataset records submitted by data publishers should be scrutinised and either accepted, rejected or amended – as a manual intervention. DGU should also fill gaps in the catalogue by creating metadata records for significant public datasets that publishers have not listed on the site.
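To make the accept, reject or amend step concrete, here is a small Python sketch of how an editor's decision on a submitted record might be modelled. Everything in it (the `Submission` fields, the `Decision` values, the `apply_decision` function) is an assumption for illustration, not a description of how DGU works today.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Optional


class Decision(Enum):
    ACCEPT = "accept"
    REJECT = "reject"
    AMEND = "amend"


@dataclass(frozen=True)
class Submission:
    """Hypothetical record as submitted by a data publisher."""

    title: str
    description: str


def apply_decision(
    submission: Submission,
    decision: Decision,
    amended_description: Optional[str] = None,
) -> Optional[Submission]:
    """Return the record to publish, or None if the editor rejects it."""
    if decision is Decision.REJECT:
        return None
    if decision is Decision.AMEND:
        if not amended_description:
            raise ValueError("an amendment needs the editor's revised description")
        # Keep the publisher's record, but with the editor's wording.
        return replace(submission, description=amended_description)
    return submission


# Placeholder values for illustration only.
incoming = Submission(title="Spend data", description="csv")
published = apply_decision(
    incoming, Decision.AMEND, "Monthly spending records, published as CSV."
)
print(published)
```

Nothing reaches the catalogue in this sketch without passing through a human decision, which is the manual intervention described above.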

This would require a fundamental shift of focus: Data.gov.uk is the catalogue, not the platform that supports the catalogue. Don’t treat the metadata as simply a consumable. Understand and manage the information so that it makes sense to the user.

More add-ons won’t fix Data.gov.uk

There have been some improvements over the past year, such as more flexible search parameters and the introduction of themes on the Data.gov.uk homepage. Generally, though, the usability of DGU has been undermined by a tendency to tinker and bolt on new functionality for which there is little evidence of user demand.

The problems with DGU are mostly foundational and conceptual. More development on top of the existing structure is unlikely to make the site more usable. GDS should evaluate each of the features on the catalogue pages with an eye to reducing clutter.

Stop pretending that Data.gov.uk is the only gov.uk data site

Cabinet Office’s original vision of DGU as the single portal for all open public data has not been realised. That should not be a source of any great regret. DGU never had the necessary resource or political support to achieve that objective, but it was undesirable in any case.

Many areas of government do need centralised support and guidance on publication of open data. However, departments and agencies that hold substantial numbers of datasets have the expertise to operate their own programmes. The data community has been well served by the development of specialised hubs and open data repositories such as:

https://data.police.uk/

http://digital.nhs.uk/

http://geoportal.statistics.gov.uk/

https://www.ordnancesurvey.co.uk/opendatadownload/ 

and many others, as well as a multitude of local government data sites.

DGU should embrace this ecology and start to drive traffic to some of these sites from its homepage. If users are looking for subject-specific data DGU should help them find the most relevant site quickly, rather than expecting them to navigate there through a specific dataset record in the DGU catalogue.
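One simple way to picture that kind of signposting is a lookup from subject to specialist site. The Python sketch below uses the four sites listed above; the subject labels and the crude keyword matching are my own assumptions, and a real implementation would need something more considered.

```python
from typing import Optional

# Subject labels are illustrative assumptions; the URLs are the sites listed above.
SPECIALIST_HUBS = {
    "policing": "https://data.police.uk/",
    "health": "http://digital.nhs.uk/",
    "statistical geography": "http://geoportal.statistics.gov.uk/",
    "mapping": "https://www.ordnancesurvey.co.uk/opendatadownload/",
}


def suggest_hub(query: str) -> Optional[str]:
    """Return a specialist site if the query mentions a known subject, else None."""
    lowered = query.lower()
    for subject, url in SPECIALIST_HUBS.items():
        if subject in lowered:
            return url
    return None


print(suggest_hub("Health data for my area"))
print(suggest_hub("bus timetables"))
```

A user who gets no match would fall back to the catalogue search; a user who does get one is sent straight to the relevant site.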

Which platform?

It’s probably clear from the above that I favour a complete reboot for Data.gov.uk. I’ve no idea if that’s on the cards. But if GDS were to build a new Data.gov.uk site, is there an obvious software platform it should choose?

I’ve deliberately left this question to the end of my post. Too often the process of software procurement seems to start with the solutions available rather than a clear definition of the business requirements.

In my view it should be possible to build a better DGU on one of the software platforms available for development of open data portals. But that shouldn’t be an assumption. If DGU continues on the current basis, i.e. development-led with automated harvesting rather than active curation of the catalogue, an out-of-the-box solution may not be sufficiently adaptable.

The current DGU site is a CKAN implementation but customised to such an extent that it may not be representative of how that platform would perform with a fresh approach. GDS should also look at alternatives such as Socrata and DKAN. But, no, I don’t think there is an obvious choice. (For what it’s worth, the most cleanly presented open data site I’ve come across recently belongs to Northumberland County Council, and seems to be bespoke.)

I refer you to my previous comments …

A few of my other posts from the past year include comments on various aspects of DGU that I think are still relevant as feedback to GDS’s user research: 

Fix the DGU data request process (or replace it with something better)

Don’t operate the DGU catalogue as a ministerial tote board

Use DGU to improve “signposting” of open data