Statistical models can only be as good as the data put into them. The volume of data about energy consumption continues to grow, particularly on its non-technical aspects, but the same variables are often interpreted differently across disciplines, datasets, and contexts. Selecting key variables and interactions is therefore an important step toward more accurate predictions, better interpretation, and the identification of key subgroups for further analysis.
This paper makes two main contributions to the modeling and analysis of energy consumption in buildings. First, it introduces regularization, also known as penalized regression, for the principled selection of variables and interactions. Second, it demonstrates this approach by applying it to a comprehensive dataset of energy consumption for commercial office and multifamily buildings in New York City. Using cross-validation, this paper finds that a newly developed method, hierarchical group-lasso regularization, significantly outperforms ridge, lasso, elastic net, and ordinary least squares approaches in prediction accuracy. It also develops a parsimonious model for large New York City buildings and identifies several interactions between technical and non-technical parameters for further analysis, policy development, and targeting. The method is generalizable to other local contexts and is likely to be useful for modeling other sectors of energy consumption as well.
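For orientation, the baseline estimators compared above can be written in their standard textbook forms. This is a sketch under assumed notation, not notation taken from the paper: the response vector $y$, design matrix $X$, coefficient vector $\beta$, penalty weight $\lambda \ge 0$, and mixing parameter $\alpha \in [0, 1]$ are all assumptions. The elastic net is shown in one common parameterization, and software packages scale its terms differently. The hierarchical group-lasso penalty is not written out here because the abstract does not specify its form.

```latex
\begin{align}
  \hat\beta_{\text{OLS}}   &= \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2 \\
  \hat\beta_{\text{ridge}} &= \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2
                              + \lambda \lVert \beta \rVert_2^2 \\
  \hat\beta_{\text{lasso}} &= \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2
                              + \lambda \lVert \beta \rVert_1 \\
  \hat\beta_{\text{enet}}  &= \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2
                              + \lambda \bigl( \alpha \lVert \beta \rVert_1
                              + (1 - \alpha) \lVert \beta \rVert_2^2 \bigr)
\end{align}
```

In each penalized form, $\lambda$ is typically chosen by cross-validation, and setting $\lambda = 0$ recovers ordinary least squares. The $\ell_1$ term in the lasso and elastic net can shrink coefficients exactly to zero, which is what makes these penalties usable for variable selection.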
Available at: http://works.bepress.com/david_hsu/4/