BuildingsBench: A Large-Scale Dataset of 900K Buildings and Benchmark for Short-Term Load Forecasting

Metadata Updated: January 11, 2024

The BuildingsBench datasets consist of:

  • Buildings-900K: A large-scale dataset of 900K buildings for pretraining models on the task of short-term load forecasting (STLF). Buildings-900K is statistically representative of the entire U.S. building stock.
  • 7 real residential and commercial building datasets for benchmarking two downstream tasks evaluating generalization: zero-shot STLF and transfer learning for STLF.

Buildings-900K can be used for pretraining models on day-ahead STLF for residential and commercial buildings. It fills a specific gap: the lack of time series datasets that are large and diverse enough for studying pretraining and finetuning with scalable machine learning models. Buildings-900K consists of synthetically generated energy consumption time series. It is derived from the NREL End-Use Load Profiles (EULP) dataset (see the link to this database further below). However, the EULP was not originally developed for the purpose of STLF. Rather, it was developed to "...help electric utilities, grid operators, manufacturers, government entities, and research organizations make critical decisions about prioritizing research and development, utility resource and distribution system planning, and state and local energy planning and regulation."

Like the EULP, Buildings-900K is a collection of Parquet files, and it follows nearly the same Parquet dataset organization as the EULP. Because Buildings-900K contains only a single energy consumption time series per building, it is much smaller than the EULP (~110 GB).
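
As an illustration of the day-ahead STLF task described above, the sketch below slices one building's load series into (context, target) pairs. It is not taken from the BuildingsBench code. The function name day_ahead_windows, the hourly resolution, the 168-hour context, and the 24-hour stride are all assumptions made for this example; only the day-ahead (24-hour) target reflects the task as stated.

```python
from typing import List, Sequence, Tuple


def day_ahead_windows(
    load: Sequence[float],
    context_hours: int = 168,  # assumed: one week of history
    horizon_hours: int = 24,   # day-ahead target
    stride_hours: int = 24,    # assumed: one window per day
) -> List[Tuple[Sequence[float], Sequence[float]]]:
    """Split an hourly load series into (context, target) pairs."""
    windows = []
    last_start = len(load) - context_hours - horizon_hours
    # range() is empty when the series is too short for a single window.
    for start in range(0, last_start + 1, stride_hours):
        split = start + context_hours
        windows.append((load[start:split], load[split:split + horizon_hours]))
    return windows


# Synthetic stand-in for two weeks of hourly load from one building.
series = [float(hour % 24) for hour in range(14 * 24)]
pairs = day_ahead_windows(series)
```

Each pair holds the history a model would see and the following day it would be asked to predict. A series shorter than one context plus one horizon yields an empty list.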

BuildingsBench also provides an evaluation benchmark: a collection of open source datasets of real residential and commercial building energy consumption. The evaluation datasets, which are provided alongside Buildings-900K below, are collections of CSV files containing annual energy consumption. Altogether, the evaluation datasets total less than 1 GB. They are listed below:

  1. ElectricityLoadDiagrams20112014
  2. Building Data Genome Project-2
  3. Individual household electric power consumption (Sceaux)
  4. Borealis
  5. SMART
  6. IDEAL
  7. Low Carbon London

A README file providing details about how the data is stored and describing the organization of the datasets can be found within each data lake version under BuildingsBench.
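
The README is the authority on the actual file layout. Purely as a sketch of how a CSV from one of the evaluation datasets might be scored, the example below parses a load column and computes a normalized RMSE for a naive persistence forecast. The column names timestamp and load_kwh, the inline sample data, and the choice of metric and baseline are assumptions for illustration, not the benchmark's documented schema or protocol.

```python
import csv
import io
import math


def read_load_column(csv_text, column):
    """Parse CSV text and return one column as a list of floats."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [float(row[column]) for row in reader]


def nrmse(actual, predicted):
    """Root mean squared error divided by the mean of the actual values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse) / (sum(actual) / len(actual))


# Tiny inline sample; the header names are hypothetical.
sample_csv = """timestamp,load_kwh
2014-01-01 00:00,1.0
2014-01-01 01:00,2.0
2014-01-01 02:00,3.0
2014-01-01 03:00,2.0
"""

loads = read_load_column(sample_csv, "load_kwh")
# Persistence baseline: predict each value as the one before it.
score = nrmse(actual=loads[1:], predicted=loads[:-1])
```

For a real file, the text passed to read_load_column would come from opening the CSV on disk, and the column name would follow whatever the README specifies.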

Access & Use Information

Public: This dataset is intended for public access and use.
License: Creative Commons Attribution

Dates

Metadata Created Date June 23, 2023
Metadata Updated Date January 11, 2024

Metadata Source

Harvested from OpenEI data.json

Additional Metadata

Resource Type Dataset
Publisher National Renewable Energy Laboratory
Doi 10.25984/1986147
Identifier https://data.openei.org/submissions/5859
Data First Published 2018-12-31T07:00:00Z
Data Last Modified 2024-01-11T07:00:01Z
Public Access Level public
Bureau Code 019:20
Metadata Context https://openei.org/data.json
Metadata Catalog ID https://openei.org/data.json
Schema Version https://project-open-data.cio.gov/v1.1/schema
Catalog Describedby https://project-open-data.cio.gov/v1.1/schema/catalog.json
Data Quality True
Harvest Object Id 483b9528-cf79-4e05-9a29-8eefc970cdde
Harvest Source Id 7cbf9085-0290-4e9f-bec1-91653baeddfd
Harvest Source Title OpenEI data.json
Homepage URL https://data.openei.org/submissions/5859
License https://creativecommons.org/licenses/by/4.0/
Old Spatial {"type":"Polygon","coordinates":-180,-83,180,-83,180,83,-180,83,-180,-83}
Program Code 019:002, 019:023, 019:000
Projectnumber 08GO28308
Projecttitle Laboratory Directed Research and Development (LDRD)
Source Datajson Identifier True
Source Hash ce86e39d956a223b8d8dbe9b207149763fb963d7f60b7615f08d311c0f5d39e1
Source Schema Version 1.1
Spatial {"type":"Polygon","coordinates":-180,-83,180,-83,180,83,-180,83,-180,-83}