High-value data collection

The Sustainable Energy Research Group have built a portfolio of high-quality primary research datasets collected through a number of studies, experiments and large-scale trials. These include high-resolution energy demand and environmental monitoring data, resulting in very large, high-value datasets.

The group was instrumental in producing the largest (n > 4,000) representative household sample in the UK of linked high-resolution electricity demand, household characteristics and time-use survey data (SAVE). More recently we have applied our experience with SAVE to the UK-wide Smart Energy Research Laboratory (SERL), which provides UK academics with access to residential smart meter data (and linked EPC, climate and occupant data).

Analytics and capacity building

Alongside the collection of cutting-edge datasets, we have developed extensive analytical capacity to address the challenges of working with such large datasets throughout the research process, from collection to pre-processing, aggregation and resampling, statistical modelling, publishing and finally archiving. To do this efficiently and at speed we make regular use of the University’s internal and external cloud-based analytic platforms (R/RStudio, MATLAB, etc.) as well as the University’s High-Performance Computing Iridis Compute Cluster.

We are committed to building future analytic capacity for both research and industry, and run the Data Analysis & Experimental Methods for Civil and Environmental Engineering module, which is core to a number of our engineering MSc programmes. Students from these programmes regularly progress to further study through PhD projects at the intersection of energy, climate change and high-volume data analytics.

Reproducible research and collaboration

We use industry-standard tools to develop code and analytics software, embedding version control and collaboration tools via the University’s Git service (for example https://git.soton.ac.uk/SERG/saveData). We are also committed to best practice for open and reproducible research and are active members of the local branch of the UK Reproducibility Network and the Southampton Research Software Community. As part of this we use git.soton.ac.uk/SERG to support our community of practice and hold regular short courses and tutorials on best-practice data analysis workflows using R and Git.

Datasets

Solent Achieving Value through Efficiency (SAVE)
The data comprises electrical power demand, electrical energy consumption and a range of survey data on a large (n > 4,000), representative sample of households in the South East of England. The project involved a randomised controlled trial to test a number of energy efficiency and behavioural interventions to reduce household power demand during evening peak hours. The data consists of power demand at 10-second intervals and electrical energy consumption at 15-minute intervals, collected over two years (2017 to 2018). This is linked to socio-demographic survey data collected from participating households, providing a uniquely detailed dataset of both high-resolution electricity demand and household characteristics. An anonymised version of the 15-minute, survey and time-use data can be accessed via the UK Data Service and comprises a 2.3 GB data package. The 10-second power data, which can be linked to the survey and time-use data, is held at the University and exceeds a terabyte in size. We have developed a suite of R packages to support analysis of both datasets.
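The relationship between the two resolutions can be illustrated with a minimal sketch. It is written in Python for illustration only and is not part of the group's R packages. It assumes that each 10-second sample is a power reading in watts representing the mean over the following 10 seconds, and that missing samples contribute no energy. The function name, the units and the input layout are hypothetical, and the published 15-minute SAVE data may have been derived differently.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def power_to_energy_kwh(readings, sample_seconds=10, window_minutes=15):
    """Aggregate (timestamp, watts) samples into kWh per time window.

    Assumption: each sample is the mean power over the next
    `sample_seconds`; gaps in the data simply add no energy.
    """
    window = timedelta(minutes=window_minutes)
    totals = defaultdict(float)
    for ts, watts in readings:
        # Floor the timestamp to the start of its window.
        midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
        index = (ts - midnight) // window  # whole windows since midnight
        start = midnight + index * window
        # watts * seconds = joules; 3.6 million joules = 1 kWh
        totals[start] += watts * sample_seconds / 3_600_000
    return dict(sorted(totals.items()))


if __name__ == "__main__":
    # Hypothetical input: a constant 1 kW load for 30 minutes.
    t0 = datetime(2017, 1, 1, 18, 0, 0)
    samples = [(t0 + timedelta(seconds=10 * i), 1000.0) for i in range(180)]
    for start, kwh in power_to_energy_kwh(samples).items():
        print(start.isoformat(), round(kwh, 3))
```

A constant 1 kW load over a full 15-minute window corresponds to 0.25 kWh, which is a convenient sanity check when comparing the two data streams.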

Liveable Cities
This data was collected via the installation of bespoke environmental monitoring kits in 145 households in the city of Southampton, UK. The kits, based on Raspberry Pi units, incorporate electrical power monitoring, temperature and humidity monitoring, and carbon dioxide concentration monitoring at high granularity. The same kits were used to collect similar data in 100 households in Xi’an, China; 20 households in Portsmouth, UK; and one large detached villa in Jeddah, Saudi Arabia. In total, this amounts to over 112 million records of 2-3 variables, stored in a 2.3 GB MySQL InnoDB database.

Smart Energy Research Laboratory
The Smart Energy Research Laboratory, in which we are a partner, provides access to GB smart meter data for academic research users. This data comprises half-hourly electricity and gas usage data for an Observatory of up to 10,000 households, linked to Energy Performance Certificate (EPC), household attribute and local climate data. SERL also provides a Laboratory function where new projects can obtain access to smart meter data for households they have recruited independently. In all cases the data can only be accessed by accredited researchers via SERL’s secure Amazon Workspace Service (AWS), although code for use in data processing and analysis is openly available.

Energy for Development (E4D)
SERG has been continuously monitoring PV mini-grids in East Africa since 2012: the first was installed in 2012 and the remaining four in 2015. Power produced by the PV, consumed by the businesses in the mini-grids and stored in the batteries has been monitored at 1-minute resolution, along with a host of other variables including temperature, irradiance and rainfall runoff from the PV canopy. Since October 2018, over nine million records, each with over 50 variables, have been stored across the sites (1.2 GB MySQL InnoDB).

Solar shading car park, Jeddah, Saudi Arabia
As part of an experiment comparing different methods of cleaning solar panels in dusty climates, a series of strings of PV modules were instrumented for voltage, current, temperature and water volume (where the modules were cleaned with water). The measurements have been taken continuously at 1-minute resolution from September 2018 until the present. A weather station has also been operating for the same period at 2-minute resolution. The total data volume exceeds six million records of over 70 variables per record (1.3 GB MySQL InnoDB).