Last week the Web Foundation published a "Leaders Edition" of the Open Data Barometer.

The ODB is a global measure of government performance on publication and use of open data. Whereas the four previous ODB releases endeavoured to measure open data throughout the world, the Leaders Edition focuses on 30 governments: those that have adopted the Open Data Charter and those that have signed up to the G20 Anti-Corruption Open Data Principles.

The United Kingdom, which had scored highest in all previous releases of the ODB, has tied with Canada as the highest scoring government in the Leaders Edition.

The implementation component of the ODB global measure is derived from what the Web Foundation describes as a "detailed dataset survey completed for 15 kinds of data in each government". This survey touches on "issues of data availability, format, licensing, timeliness, and discoverability."

In this post I want to take a close look at the supporting evidence provided for the ODB's survey of UK datasets. I've collected this evidence into a spreadsheet, drawing on various files from the ODB data pages and notes in the Research Handbook.

(A declaration of interest: I had no involvement in the ODB dataset survey but have previously submitted findings to the Global Open Data Index, an alternative measure of national performance on open data.)


Overview

The ODB dataset survey looks at government data in 15 categories:

  • Map Data
  • Land ownership data
  • Detailed census data
  • Detailed government budget
  • Detailed data on government spend
  • Company register
  • Legislation
  • Public transport timetables
  • International trade data
  • Health sector performance
  • Primary or secondary education performance data
  • Crime statistics
  • National environment statistics
  • National election results
  • Public Contracts

ODB researchers provide Yes/No responses to ten questions, supported with evidence, for each category of data:

  1. Does the data exist?
  2. Is it available online from government in any form?
  3. Is the dataset provided in machine-readable and reusable formats?
  4. Is the machine-readable and reusable data available as a whole?
  5. Is the dataset available free of charge?
  6. Is the data openly licensed?
  7. Is the dataset up to date?
  8. Is the dataset being kept regularly updated?
  9. Was it easy to find information about this dataset?
  10. Are data identifiers provided for key elements in the dataset?

For some questions the researcher is required to choose a representative dataset within the category as the basis for their responses.

Responses to the ten questions are weighted into a score between 0 and 100 for each category. You can see the UK scores on the ODB's Country Detail page.

The survey also makes a judgement on whether the data in each category is "open", based on whether the researcher has answered Yes to all of questions 3 to 6.
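To make these two mechanisms concrete, here is a minimal Python sketch. It assumes equal weights of 10 points per question, which is purely a placeholder: the actual weighting is set out in the ODB Research Handbook and is not reproduced here. The names `category_score`, `is_open` and `HYPOTHETICAL_WEIGHTS` are illustrative, not part of any ODB tool. The sketch shows how a single No on the licensing question can leave a category with a high score while it is still judged not open.

```python
from typing import Dict

# Hypothetical weights: the real ODB weighting is defined in the Research
# Handbook and is NOT reproduced here. Equal weights summing to 100 are a placeholder.
HYPOTHETICAL_WEIGHTS: Dict[int, float] = {q: 10.0 for q in range(1, 11)}

# Questions 3 to 6: machine-readable, available as a whole,
# free of charge, openly licensed.
OPENNESS_QUESTIONS = (3, 4, 5, 6)


def category_score(answers: Dict[int, bool],
                   weights: Dict[int, float] = HYPOTHETICAL_WEIGHTS) -> float:
    """Sum the weights of every question answered Yes (0 to 100)."""
    missing = set(weights) - set(answers)
    if missing:
        raise ValueError(f"no answer for question(s) {sorted(missing)}")
    return sum(w for q, w in weights.items() if answers[q])


def is_open(answers: Dict[int, bool]) -> bool:
    """A category is judged 'open' only if questions 3 to 6 are all Yes."""
    return all(answers[q] for q in OPENNESS_QUESTIONS)


# Illustrative answers only (not the actual UK survey responses):
# every question answered Yes except question 6 (openly licensed).
example = {q: q != 6 for q in range(1, 11)}
print(category_score(example))  # 90.0
print(is_open(example))         # False
```

Under these placeholder weights the example scores 90 yet fails the openness test, because the "open" judgement is an all-or-nothing check on questions 3 to 6 rather than a threshold on the score.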


Map data

The description for this category is: "A detailed digital map of the country provided by a national mapping agency and kept updated with key features such as official administrative borders, roads and other important infrastructure. Please look for maps of at least a scale of 1:250,000 or better (1cm = 2.5km)."

The ODB survey has concluded that the UK does not have open data that fits this description. This is a red flag for the ODB's methodology, since the availability of open mapping data is well established in the UK except at the most detailed level.

The researcher has identified one source for UK map data: "Ordnance Survey (ONS), the UK's mapping agency".

OS is the national mapping agency for Great Britain rather than the UK. Both OS and ONS publish open map data at better than 1:250,000 scale, as does OSNI for Northern Ireland.

The researcher has chosen OS Open Names as a representative mapping dataset. OS OpenMap – Local or Boundary-Line would have been better choices.

However, the main problem is that the researcher has answered the licensing question based on the terms of use for OS Maps, an OS digital mapping application. A more relevant response would refer to the terms for OS OpenData products (and equivalent data in NI), all available under the Open Government Licence.


Land ownership data

The description for this category is: "A dataset that provides national level information on land ownership. This will usually be held by a land registration agency, and usually relies on the existence of a national land registration database."

The ODB Research Handbook is vague on the level of granularity required, but its notes suggest a requirement for a mechanism that facilitates access to land registry data "without requiring knowing the owner or other details".

In the UK land ownership records are available from HM Land Registry and other sources, but in most cases only on commercial terms.

Responses from the ODB researcher focus on the UK House Price Index, a statistical dataset, and Price Paid Data, a transaction level dataset for England and Wales. Neither dataset contains information on land ownership.


Detailed census data

The description for this category is: "Key national statistics such as demographic and economic indicators (GDP, unemployment, population, etc), often provided by a National Statistics Agency. Aggregate data (e.g. GDP for whole country at a quarterly level, or population at an annual level) is considered acceptable for this category."

There is a mismatch between the indicator name ("Detailed census data") and the category description in the ODB Research Handbook, which permits virtually any key national statistic. Consequently the scoring in this category depends very much on the search approach taken by the researcher. In this case the researcher has chosen the UK's quarterly national accounts as a representative dataset.


Company register

The description for this category is: "A list of registered (limited liability) companies in the country including name, unique identifier and additional information such as address, registered activities. The data in this category does not need to include detailed financial data such as balance sheet etc."

Despite that description, the researcher has focused on Companies House's Free Accounts Data Product instead of the more pertinent Free Company Data Product.

The researcher has concluded that there is no open data available for the UK in this category.

The problem here is the ODB's methodology, which requires open data to have an explicit licence and does not accommodate open datasets, such as this one, that are made public under statutory authority. Companies House has said elsewhere that it does not publish a licence for the Free Company Data Product because it does not claim any intellectual property rights in the data.


Public transport timetables

The description for this category is: "Details of when and where public transport services such as buses and rail services are expected to run. Please provide details for both bus and rail services if applicable. If no national data is available, please check and provide details related to the capital city."

The researcher's responses focus on NaPTAN, an open database of public transport access nodes managed by the Department for Transport and updated by local authorities. This is a useful dataset but it does not contain timetable information.

It may have been better to base the responses on TfL's timetable data since the criteria allow for that, though the availability of transport data for London is hardly representative of the UK.


Public Contracts

The description for this category is: "Details of the contracts issued by the national government."

This is another red flag for the ODB's methodology. In the wake of this year's Carillion collapse the UK Government has been criticised from all sides for the poor availability of data on public contracts. Yet ODB's survey identifies this as one of the UK's highest scoring categories.

The researcher's responses focus on Contracts Finder, a central repository of data on public contracts. However use of Contracts Finder is inconsistent across government.

The researcher also concludes incorrectly that the Contracts Finder data is available as a bulk download, one of the minimum criteria for open data. While users of Contracts Finder can download CSVs based on search results, access the data via an API, or get downloads of records added each day, bulk downloads of the whole dataset remain unavailable.


What does this mean for the credibility of the Open Data Barometer?

Responses in the other categories are broadly fine within the criteria provided by the Research Handbook, though some of the scores are driven very much by the researcher's selection of the representative dataset for each category.

The core underlying problem with the ODB dataset survey is that the responses are almost entirely a function of the online discoverability of data by researchers who are not very familiar with UK government sources.

Data discovery is an important topic that should be studied, but it cannot be measured from the search strategies of a small number of users. The ODB approach fails to disaggregate discovery from availability of open data, which means that many data sources have been missed or misunderstood. Consequently the UK survey results are inaccurate.

I haven't looked in any detail at the ODB results or evidence for Canada or any other governments, so I won't venture an opinion on whether better research would have put the UK in a different place on the leader board. However the dataset survey is ostensibly the least subjective component of the Open Data Barometer's global measure.

In general I am sceptical of the value of comparing open data performance at the national level, except perhaps as a prompt for further discussion. However, if the Web Foundation is going to continue with this project, I think we should expect a more rigorous approach to gathering evidence on key national datasets.