The Department for Levelling Up, Housing & Communities (DLUHC) publishes Energy Performance of Buildings Data for England and Wales at epc.opendatacommunities.org.
The majority of the data is from the 22 million or so domestic Energy Performance Certificates (EPCs) that have been lodged on the public register between 2008 and the end of 2021. The data contains EPC ratings for residential properties, along with a range of building attributes related to energy efficiency.
Recently, DLUHC has added Unique Property Reference Numbers (UPRNs) to the EPC data.
UPRNs are identifiers derived from Ordnance Survey's AddressBase. AddressBase itself is a commercial product, but the UPRNs and their point coordinates are open data.
Inclusion of the UPRNs means the EPC records can be geolocated at address level and combined with open data (including postcodes) from other sources. UPRNs increase the re-usability of the EPC dataset because the identifiers can be used in place of the address fields, which are subject to restrictive Royal Mail terms.
But how successful has DLUHC been in its effort to allocate UPRNs to the EPC records?
Allocation of UPRNs is incomplete
The most recent quarterly data release contains 22,243,396 records for domestic EPCs lodged up to the end of 2021.
UPRNs have been allocated to 20,551,337 records, or 92% of the total.
1,716,164 records contain UPRNs that have been added by the energy assessors who submitted the records to the register. 18,835,173 records contain UPRNs that have been allocated by matching the address information in the EPCs to AddressBase.
Currently, that means about 85% of the EPC records contain UPRNs from address matching, though we may expect the proportion of UPRNs added by energy assessors to increase over time.
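As a quick check, the percentages above can be reproduced from the record counts. This is only a small arithmetic sketch; the variable and function names are mine and are not part of the EPC data.

```python
# Record counts quoted above for domestic EPCs lodged up to the end of 2021.
total_records = 22_243_396
assessor_uprns = 1_716_164    # UPRNs supplied by energy assessors
matched_uprns = 18_835_173    # UPRNs allocated by address matching

# Records with a UPRN from either source.
with_uprn = assessor_uprns + matched_uprns


def share(part, whole):
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


print(with_uprn)                              # records with any UPRN
print(share(with_uprn, total_records))        # share with any UPRN
print(share(matched_uprns, total_records))    # share from address matching
print(share(assessor_uprns, total_records))   # share from energy assessors
```

The two sources sum exactly to the 20,551,337 records with UPRNs, so no record is counted under both.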
DLUHC has not explained its method in any detail, but says in a blog post:
"To supplement missing UPRNs the department has used an address matching algorithm to provide additional UPRN coverage for records dating back to 2008.
"The address-matching algorithm uses a combination of rules-based and machine-learning approaches using data from AddressBase. In this way, we can map addresses to UPRNs with a high degree of reliability."
We don't really know how many properties have EPCs
The EPC dataset does not cover all housing stock in England and Wales. An EPC is only required when a property is built, sold or rented. EPCs are valid for ten years but remain on the register after they expire.
We do not have any reliable count of the number of unique properties on the EPC register. That may seem strange, and arguably it points to weaknesses in the design of the register.
Allocation of UPRNs will, in theory, tell us how many properties have EPCs. A UPRN is, by definition, unique to a property. A property may have multiple EPCs, but should have only one UPRN.
The domestic EPC dataset contains 15,678,307 unique UPRNs, of which 25% are allocated to more than one EPC record. That's useful because it enables us to track changes to the attributes of a property over time.
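A count of this kind could be produced along the following lines. This is a minimal sketch rather than the method used for the figures above: it assumes each record is a dict with a "UPRN" field (an assumed field name), treats empty values as unallocated, and uses made-up sample rows.

```python
from collections import Counter


def uprn_summary(records):
    """Count distinct UPRNs and the share attached to more than one record.

    `records` is an iterable of dicts. The "UPRN" key is an assumed field
    name; empty or missing values are treated as unallocated and skipped.
    """
    counts = Counter(r["UPRN"] for r in records if r.get("UPRN"))
    unique = len(counts)
    repeated = sum(1 for n in counts.values() if n > 1)
    share_repeated = repeated / unique if unique else 0.0
    return unique, repeated, share_repeated


# Made-up sample rows, for illustration only.
sample = [
    {"LMK_KEY": "a1", "UPRN": "100"},
    {"LMK_KEY": "a2", "UPRN": "100"},
    {"LMK_KEY": "a3", "UPRN": "200"},
    {"LMK_KEY": "a4", "UPRN": ""},
    {"LMK_KEY": "a5", "UPRN": "300"},
    {"LMK_KEY": "a6", "UPRN": "400"},
]
print(uprn_summary(sample))
```

Grouping the records by UPRN in the same way would also give the sequence of certificates for each property, which is what makes tracking changes over time possible.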
However, allocation of UPRNs to the whole EPC dataset may not be an achievable task, for various reasons.
Even complete allocation of UPRNs will not tell us how many existing properties have EPCs. There is no direct mechanism for updating the register when a property is demolished or subdivided into new properties.
Those events can be detected through matching to other information – AddressBase includes metadata that shows which UPRNs are 'historic' – but that information is not available as open data.
Why are there missing UPRNs?
Although UPRNs have been allocated to 92% of EPC records, that's an average across all records lodged on the register from October 2008 to December 2021.
In some recent quarters, UPRNs have been allocated to as few as 85% of EPC records.
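A per-quarter allocation rate can be calculated by grouping records on their lodgement date. The sketch below assumes each record carries an ISO-format "LODGEMENT_DATE" and a "UPRN" field (both assumed field names) and uses made-up rows; it is not necessarily how the quarterly figures here were derived.

```python
from collections import defaultdict
from datetime import date


def allocation_rate_by_quarter(records):
    """Return {(year, quarter): share of records that have a UPRN}.

    "LODGEMENT_DATE" (YYYY-MM-DD) and "UPRN" are assumed field names;
    an empty or missing UPRN counts as unallocated.
    """
    totals = defaultdict(int)
    with_uprn = defaultdict(int)
    for r in records:
        lodged = date.fromisoformat(r["LODGEMENT_DATE"])
        key = (lodged.year, (lodged.month - 1) // 3 + 1)
        totals[key] += 1
        if r.get("UPRN"):
            with_uprn[key] += 1
    return {key: with_uprn[key] / totals[key] for key in sorted(totals)}


# Made-up sample rows, for illustration only.
sample = [
    {"LODGEMENT_DATE": "2021-02-10", "UPRN": "100"},
    {"LODGEMENT_DATE": "2021-03-01", "UPRN": ""},
    {"LODGEMENT_DATE": "2021-11-30", "UPRN": "200"},
    {"LODGEMENT_DATE": "2021-12-15", "UPRN": "300"},
]
print(allocation_rate_by_quarter(sample))
```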
There are several common reasons why a UPRN might not have been allocated to an EPC record:
- the property no longer exists,
- the lodged address was incomplete or in a non-standard format, or
- the UPRN exists but is not yet in OS AddressBase.
DLUHC says:
"We shall only assign UPRNs from the address matching algorithm that pass our confidence score. This is something we shall routinely monitor to improve the quality and coverage of UPRNs allocated to records.
"The address matching algorithm requires a contemporary version of AddressBase and as some EPC’s may be submitted on properties yet to be included in AddressBase we may not find a reliable match. For this reason, some dwellings such as new build properties may have UPRNs allocated in a later publication."
It's unclear which releases of OS AddressBase DLUHC has used for address matching of EPCs. However, it appears the low rates of matching for EPCs lodged in 2021 are mainly due to the lag between creation of UPRNs for new dwellings and their availability in AddressBase.
Is more complete matching of UPRNs to EPCs achievable?
DLUHC has said it will continue address matching of EPCs to UPRNs.
However, I note that only 45 of the 1,635,380 EPC records that were unmatched in the Q3 2021 release were matched to UPRNs in the Q4 2021 release. Either DLUHC has not yet put in the resource to backfill UPRNs for existing records, or the remaining records are genuinely difficult to match.
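A comparison between releases like this one can be made by pairing records across the two releases and looking for those that gain a UPRN. The sketch below assumes each record has a stable identifier that persists between releases (such as an LMK_KEY field, which is my assumption here) and that each release has been loaded as a mapping from that identifier to its UPRN. The sample values are made up.

```python
def newly_matched(previous, current):
    """Return identifiers with no UPRN in `previous` but a UPRN in `current`.

    Both arguments map a record identifier to its UPRN, with "" or None
    meaning unallocated. Records that appear only in `current` are ignored,
    because they were not unmatched in the earlier release.
    """
    return sorted(
        key
        for key, uprn in previous.items()
        if not uprn and current.get(key)
    )


# Made-up identifiers and UPRNs, for illustration only.
q3 = {"a1": "100", "a2": "", "a3": None, "a4": ""}
q4 = {"a1": "100", "a2": "200", "a3": None, "a4": "", "a5": "500"}
print(newly_matched(q3, q4))
```

Dividing the length of that list by the number of unmatched records in the earlier release gives the backfill rate between the two releases.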
I hope DLUHC will persevere with this work. Allocation of UPRNs by address matching is necessary and welcome progress towards maximising the re-usability of the EPC dataset as open data.