
Energy and Buildings 75 (2014) 109–118


Data mining in building automation system for improving building operational performance

Fu Xiao*, Cheng Fan
Department of Building Services Engineering, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong

* Corresponding author. Tel.: +852 2766 4194; fax: +852 2765 7198. E-mail address: linda.xiao@polyu.edu.hk (F. Xiao).
http://dx.doi.org/10.1016/j.enbuild.2014.02.005

Article info

Article history:
Received 11 July 2013
Received in revised form 18 October 2013
Accepted 5 February 2014
Keywords:
Data mining
Building automation system
Feature extraction
Clustering analysis
Association rule mining
Recursive partitioning

Abstract
Today's building automation system (BAS) provides us with a tremendous amount of data on actual
building operation. Buildings are becoming not only energy-intensive, but also information-intensive.
Data mining (DM) is an emerging powerful technique with great potential to discover hidden knowledge
in large data sets. This study investigates the use of DM for analyzing the large data sets in BAS with the
aim of improving building operational performance. An applicable framework for mining BAS database
is proposed. The framework is implemented to mine the BAS database of the tallest building in Hong
Kong. After data preparation, clustering analysis is performed to identify the typical power consumption
patterns of the building. Then, association rule mining is adopted to unveil the associations among power
consumptions of major components in each cluster. Lastly, post-mining is conducted to interpret the
rules. In total, 457 rules are obtained from association rule mining, the majority of which can be easily deduced from domain knowledge and are hence ignored in this study. Four of the rules are used for improving
building performance. This study shows that DM techniques are valuable for knowledge discovery in
BAS database; however, solid domain knowledge is still needed to apply the knowledge discovered to
achieve better building operational performance.
© 2014 Elsevier B.V. All rights reserved.

1. Introduction
Buildings have great impacts on human life and global sustainability. They consume a large amount of energy to build a comfortable, healthy, safe and productive environment for human beings. Buildings consume 41% of primary energy in the United States, which exceeds the transportation sector (29%) and the industry sector (30%) [1]. In Hong Kong, buildings contribute to nearly 90% of the total electric energy consumption and around 60% of greenhouse gas emissions [2]. Buildings consume energy over their whole life cycles. Normally, the energy use during the operation stage accounts for 80–90% of their life-cycle energy use [3]. Improving building operational performance is therefore of significant importance for energy saving in the building sector.
Modern buildings are usually integrated with a diversity of advanced technologies. Building automation system (BAS) is a typical example, which integrates technologies from information science, computing science, control theory, etc. BAS enables modern buildings to be more intelligent through real-time automatic monitoring and control. A huge number of records of


temperature, humidity, flow rate, pressure, power, control signals, states of equipment, etc., are stored in the BAS database. However,
the data in BAS are rarely fully interpreted and utilized. The reason is twofold: the poor quality of the data and the lack of effective and convenient tools for analyzing the large data sets. BAS data usually contain significant missing values and outliers. Modern BASs can only perform simple data analysis and visualization functions, such as historical data tracking, moving averages and alarming of simple abnormalities. They are not capable of systematically analyzing the massive data sets in their databases. The building automation industry needs powerful tools to analyze the massive operational data to obtain knowledge for improving building operational performance.

Data mining (DM) is an emerging powerful technology with great potential to discover hidden knowledge in large data sets. In recent years, DM has been gaining increasing interest in various industries, such as banking and financial services, retail, healthcare, telecommunication and counter-terrorism [4]. The use of data mining techniques in the building field also yields encouraging outcomes in energy saving and improving the indoor environment. Generally speaking, DM has been used for load prediction, fault detection and diagnosis as well as optimal control in the building field. Dong et al. [5] used support vector regression models to predict the monthly building energy bills. The research results validated the feasibility and applicability of support vector regression

in the building load forecasting area. Amin-Naseri and Soroush [6] presented a hybrid neural network model combined with a clustering analysis algorithm to predict the daily electrical peak load. It was shown that, compared to statistical methods such as linear regression, DM-based methods had significant superiority in prediction accuracy. Kusiak et al. [7] developed ensemble models of the energy consumption of heating, ventilation and air conditioning (HVAC) components and adopted the particle swarm optimization algorithm to search for the optimal set points of HVAC components. It was reported that a 7% reduction in HVAC energy consumption could be achieved by using the proposed method. Ahmed et al. [8] investigated the impacts of building characteristics and climate conditions on indoor thermal comfort and indoor illuminance levels using classification techniques. Three methods, i.e., the Naïve Bayes, decision tree, and support vector machine, were developed. It was claimed that DM could be used as a decision aid to facilitate building operational processes. Yu et al. [9] established a decision tree model to predict the building energy use intensity. The results showed that the proposed method was able to accurately predict the energy use intensity (93% for training data, and 92% for testing data). Yu et al. [10] adopted association rule mining to save building operational energy. The frequent-pattern growth algorithm was used to generate rules among variables of the HVAC air-side system. It was shown that the discovered association rules can be used to identify energy waste, detect equipment faults, and gain insight into building operation. Cabrera and Zareipour [11] demonstrated the capability of association rule mining in identifying lighting energy waste patterns. The research results showed that effective energy saving measures could be generated using the association rules discovered. The simulation results showed that significant savings, as high as 70% of current energy use, were achievable.
Although DM techniques have already been used in the building field, the previous research seldom took full advantage of DM techniques in discovering knowledge underlying massive data sets. The main purpose of using DM techniques in previous work was to develop more accurate, reliable and computationally efficient models, for example, SVM [5], neural networks [6], and decision trees [9]. The number of input variables of the models in the studies mentioned above was still relatively small, and the inputs were usually pre-defined based on domain knowledge. For example, domain knowledge in the building field tells us that the building cooling load is affected by the outdoor temperature, humidity, solar radiation, etc. Therefore, these influential variables were selected as inputs of the model and the output was the cooling load [7]. DM techniques, such as neural networks and support vector machines [4,12], were then used to map the inputs to the output, which produced black-box models. Meanwhile, the previous research work seldom targeted the massive data sets in building automation systems. One obstacle to applying DM in BASs is that most DM techniques are so sophisticated that few building automation professionals are able to master them. The other obstacle is that DM itself cannot tell the value or the significance of the knowledge discovered, and therefore domain knowledge in the building field is still needed to interpret the knowledge for applications in BASs. Therefore, interdisciplinary research should be performed to make a breakthrough in effectively utilizing the massive data sets in BASs by using advanced DM techniques.
A number of DM techniques are available nowadays and more are emerging with the development of technology. There is a lack of a generic approach to mining BAS databases using DM techniques. This paper proposes an applicable framework for mining BAS databases using typical DM techniques. The widely used DM techniques, including clustering analysis [4,12], association rule mining [4,12] and recursive partitioning [13], are adopted in the framework. Besides, the framework is extensible so that other DM techniques can be integrated into the framework gradually.

A case study of implementing the framework in mining the data sets retrieved from the BAS of the tallest building in Hong Kong is conducted. The DM algorithms adopted in this study were run on a Macintosh computer (Mac OS X 10.6.8) with a 2.2 GHz processor (Intel Core i7) and 8 GB of memory. The resulting computation time is shorter than 120 s for each step.
2. Overview of the framework and DM techniques
Fig. 1 illustrates the framework proposed for mining BAS data sets using DM techniques. The framework consists of five steps, i.e. data preparation, clustering analysis, association rule mining (ARM), post-mining and application. Data preparation, or data preprocessing, is performed to clean the data sets, reduce data dimension and transform them into suitable formats for data mining. Clustering analysis classifies the large data sets into several clusters according to typical patterns identified in the clustering analysis. Clustering analysis helps to reduce the distance among the data sets and enhance the similarity of the data sets in each cluster. Performing clustering analysis can enhance the reliability of the knowledge discovered in the next step.

ARM is employed in this study to discover knowledge in the format of rules. A rule generally takes the form of "If A, then B", where A is called the antecedent and B the consequent. Rules are established by exploring relationships between variables in data sets. As an example, a rule {Shoes = High heels} ⇒ {Gender = Female} states that if one wears high heels, then that person should be female. Other DM techniques may be adopted in the framework for various purposes, as shown in Fig. 1. The post-mining stage focuses on rule selection and rule interpretation. Finally, the knowledge discovered is used to improve building operational performance. Data preparation, post-mining and application of discovered knowledge usually require rich domain knowledge; however, clustering analysis and ARM are almost independent of the system concerned and mainly involve mathematical algorithms. The framework and DM techniques adopted in this study are described as follows.
2.1. Data preparation
Data preparation, or data preprocessing, is an essential step in the process of knowledge discovery using DM. Previous experience showed that data preparation might take 80% of the total data mining effort [14]. Actually, data preparation is equally important in model-based methods due to unreliable measurements [15] and process dynamics [16]. The accuracy and reliability of the mining results are largely determined by the data quality. Data preparation is essential to mining BAS data sets because, firstly, the data quality in BAS is usually low due to measurement noises, uncertainties, sensor faults, and insufficient calibration. Two typical problems with BAS data sets are missing values and outliers, and they may negatively affect the data mining performance. It was reported that when more than 15% of the training data are outliers, the decrease in neural network model accuracy is statistically significant even with small magnitudes of outlierness [17]. Secondly, most DM techniques have special requirements on the data format. BAS data consist of both numerical (quantitative) and categorical (qualitative) data. Typical examples of numerical data are the measurements of temperature, flow rate, power and pressure. Typical examples of categorical data are the ON/OFF control and state signals as well as date and time. Normally, ARM is applied to handle categorical data, and therefore numerical data should be transformed to categorical data before conducting ARM. For instance, the power consumption of a chiller is numerical data. Discretization methods can be applied to transform the numeric data


[Fig. 1. Framework for mining BAS data sets using DM techniques: raw data in BAS → data preparation (data cleaning, data transformation, data reduction, etc.) → clustering analysis (partitioning clustering, hierarchical clustering, etc.) → association rule mining (Apriori, Eclat, FP-growth algorithms, etc.) and applications of other data mining techniques (predictions, classifications, sequential pattern mining, etc.) → post-mining (rule selection, rule interpretation, etc.) → application of discovered knowledge (performance prediction and evaluation, abnormality detection, optimization, etc.).]

to 2-level categorical data, e.g., Low and High. The scales of BAS data are also very different due to the different units used. For instance, a typical control signal usually changes from 0 to 1; the temperature measurements may change from 0 °C to 40 °C; and the power measurements may change from 0 kW to 5000 kW. Some predictive data mining techniques (e.g. support vector machine) perform better if the input data have similar scales. Therefore, scaling methods, such as normalization or standardization, should be performed in data preparation.

Generally speaking, data preparation fulfils three tasks, including data cleaning, data transformation and data reduction. Data cleaning handles missing values, resolves inconsistencies, and detects and removes outliers. Missing values can be filled in using a global constant, moving averages, imputation or inference-based models [18]. Outliers can be identified using unsupervised clustering, supervised classification and semi-supervised recognition [4]. Data transformation includes scaling of data sets and transformation of data attribute or type. Commonly used scaling methods include max–min normalization, Z-score normalization and decimal point normalization [18]. Attribute transformation prepares the data into the suitable format required by a DM algorithm, for instance, transforming numerical data to categorical data. Popular methods of attribute transformation include feature extraction, equal-frequency binning, equal-interval binning and entropy-based discretization [18]. Data reduction aims to reduce the dimension of the data sets so as to improve the calculation efficiency. Data sets retrieved from BAS can usually form a matrix with each row representing an observation set at a specific time instant and each column representing a variable or an item, such as the chiller power consumption and the space temperature. Sampling techniques, such as random sampling and stratified sampling, can be applied to reduce the row number (i.e. the number of observation sets). Three methods are commonly used to reduce the column number. Firstly, one may use one's domain expertise to select the most relevant variables. Secondly, one may create a few representative variables using a linear combination of the original variables (e.g. principal component analysis [12]). Thirdly, one may use heuristic methods, such as the step-wise forward selection and step-wise backward elimination methods, to reduce the column number [18].
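To make the scaling and data-reduction steps above concrete, the short Python sketch below applies max–min normalization and principal component analysis to a toy BAS-like matrix. The column names and value ranges are hypothetical, and the snippet only illustrates the shape of these operations; it is not the preprocessing pipeline used in the case study, which was implemented in R.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Toy BAS-like matrix: rows are 15-min observations, columns are variables
# (hypothetical names and ranges, for illustration only).
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "chiller_power_kw": rng.uniform(0, 5000, 500),
    "outdoor_temp_c": rng.uniform(0, 40, 500),
    "control_signal": rng.uniform(0, 1, 500),
})

# Max-min normalization: rescale every column to the [0, 1] range.
scaled = (data - data.min()) / (data.max() - data.min())

# Data reduction: keep the principal components explaining 95% of the variance.
pca = PCA(n_components=0.95)
reduced = pca.fit_transform(scaled)
print(reduced.shape, pca.explained_variance_ratio_)
```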

2.2. Clustering analysis


Clustering analysis is conducted in such a manner that objects with similar characteristics are grouped within the same cluster. The similarities between any pairs of observations are normally evaluated using distance-based metrics, such as the Manhattan and Euclidean metrics. Clustering analysis aims to maximize observation similarities within the same cluster and minimize similarities between different clusters. It is realized that conventional clustering analysis may fail to reveal the true underlying membership if the data dimension becomes too high, as the distance-based metrics may become meaningless in high-dimensional space. To overcome this problem, some advanced clustering algorithms, e.g., subspace clustering, have been proposed [18]. Clustering analysis has been successfully used to preprocess large data sets, identify outliers and discover underlying patterns [4,12,18]. In this study, the entropy-weighted k-means (EWKM) method [19] is adopted to identify the typical building operational patterns. Three parameters should be specified to perform the EWKM algorithm, including the cluster number (k), the weight distribution parameter, and the convergence threshold. The optimal parameter values can be determined using either internal validation methods (e.g., Davies–Bouldin index, Silhouette index and Dunn index) or external validation methods (e.g., purity, F-measure and normalized mutual information) [4]. The details of the EWKM algorithm can be found in [19].
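EWKM is not part of the common Python libraries, so the sketch below uses standard k-means together with the Silhouette width as a rough stand-in, only to show how an internal validation index can guide the choice of the cluster number k. The data are synthetic and the snippet is an illustration under those assumptions, not the EWKM procedure of [19].

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Toy daily feature matrix: one row per day, twelve extracted features.
rng = np.random.default_rng(1)
X = rng.normal(size=(192, 12))

# Scan candidate cluster numbers and keep the one with the best Silhouette width.
best_k, best_score = None, -1.0
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    score = silhouette_score(X, labels)   # ranges from -1 to 1, larger is better
    if score > best_score:
        best_k, best_score = k, score
print(best_k, round(best_score, 3))
```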


2.3. Association rule mining


Association rule mining (ARM) is also an unsupervised learning process. It was firstly applied to perform market basket analysis, which aims to identify customer purchase behaviors. Later, ARM has been widely used to analyze large data sets in various fields, such as retail, bioinformatics and sociology [18]. ARM normally requires the data sets to be mined to be categorical; therefore, data transformation is usually needed for mining BAS data sets using ARM. A brief explanation of ARM is given below.

Let I be a non-empty item set. An association rule is a statement of the form A ⇒ B, where A, B ⊆ I and A ∩ B = ∅. The set A is called the antecedent of the rule while the set B is called the consequent of the rule. Association rules are derived from a large number of observation sets (T), which are known as transaction sets in the DM field. Each variable or item in T belongs to I.
Let P(A) denote the probability that set A appears in the data set T and P(A, B) denote the probability that sets A and B coincide in the data set T. The conditional probability of B given A is defined by:

P(B|A) = P(A, B) / P(A)

The interestingness of a rule is evaluated using three parameters, i.e. support, confidence and lift:

Support(A ⇒ B) = P(A, B)

Confidence(A ⇒ B) = P(B|A)

Lift(A ⇒ B) = P(B|A) / P(B) = P(A, B) / (P(A) P(B))
ARM aims to find out all rules satisfying the user-specified minimum support and minimum confidence. The support of a rule is the joint probability of the antecedent and the consequent. Confidence is the conditional probability of the consequent, given the antecedent. Support and confidence are normally used to determine whether a rule is statistically significant or not. Lift is a measure of the dependence and correlation between the antecedent and the consequent. If the lift equals 1, the antecedent and the consequent are independent of each other, and hence the discovered knowledge has little value. A lift larger than 1 indicates positive correlation, which means that the probability of the consequent is positively affected by the occurrence of the antecedent. In contrast, a lift smaller than 1 indicates negative correlation. Therefore, desired association rules should have lift values deviating from 1.

Many algorithms are available to perform ARM, including Apriori, ECLAT and FP-growth. In this study, the Apriori algorithm [4] is employed. The key assumption is that any subset of a frequent item set should also be frequent, which ensures efficiency in generating candidate frequent sets. More specifically, the Apriori algorithm first generates candidate frequent item sets from the original large data sets. Then, by comparing the user-specified threshold for support with the frequency counts, frequent item sets can be selected. Association rules are then derived within each frequent item set considering the user-specified threshold for confidence.
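The following minimal Python sketch computes support, confidence and lift directly from the definitions above for a single candidate rule on a toy categorical table. The item labels mirror those used later in Table 1, but the data themselves are invented for illustration.

```python
import pandas as pd

# Toy transaction table: each row is an observation, each column a categorical item.
df = pd.DataFrame({
    "Pwr.PCHWP": ["4th", "4th", "3rd", "4th", "2nd", "4th"],
    "Pwr.CDWP":  ["3rd", "3rd", "2nd", "3rd", "2nd", "2nd"],
})

A = df["Pwr.PCHWP"] == "4th"          # antecedent: PCHWP power at the 4th level
B = df["Pwr.CDWP"] == "3rd"           # consequent: CDWP power at the 3rd level

support = (A & B).mean()              # P(A, B)
confidence = (A & B).sum() / A.sum()  # P(B | A)
lift = confidence / B.mean()          # P(B | A) / P(B)
print(support, confidence, lift)
```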
2.4. Recursive partitioning for post-mining
Recursive partitioning is a supervised, nonparametric method used to develop tree-structured models for prediction or classification. Recursive partitioning models are self-explanatory and easy to follow, enabling users to conveniently understand the underlying reasoning process. The tree-structured model normally consists of a root node, internal nodes and terminal nodes. A root node only has outgoing edges while a terminal node only has incoming edges. An internal node has both incoming and outgoing edges. A simple tree model for classifying chiller energy consumption levels based on outdoor temperature and occupancy is shown in Fig. 2. The first splitting variable is the outdoor temperature, which appears in the root node, or Node 1 in the figure. If the outdoor temperature is higher than 24 °C, the chiller energy consumption level should be High. Otherwise, the occupancy level should be considered. The occupancy level is selected as the splitting variable in the internal node (i.e., Node 2). For instance, if the outdoor temperature is no more than 24 °C and the occupancy level is larger than 0.5, then the chiller energy consumption level should be High. In this tree model, Nodes 3, 4 and 5 are terminal nodes, showing the classification results with proportions. In this example, the classification accuracy is 100%, because all the proportions of the resultant chiller energy consumption levels are 1 (or 100%) at the three terminal nodes. Recursive partitioning has been extensively used in analyzing problems in genetics, clinical medicine, and bioinformatics [20].

There are many algorithms to perform recursive partitioning, such as the conditional inference tree method [13], CART and C4.5 [18]. The conditional inference tree method, which has shown effective diagnostic capability, is employed in the post-mining step to develop the tree-structured model for analyzing the abnormalities detected by the association rules.
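As a rough illustration of tree-structured classification, the sketch below fits a CART-style decision tree (scikit-learn's DecisionTreeClassifier) to synthetic data that follow the logic of Fig. 2. Note that this is not the conditional inference tree algorithm [13] used in the post-mining step; it is only a convenient stand-in with invented data.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic data reproducing the logic of Fig. 2: chiller energy level is "High"
# when the outdoor temperature exceeds 24 degC or the occupancy level exceeds 0.5.
rng = np.random.default_rng(2)
temp = rng.uniform(15, 35, 300)
occ = rng.uniform(0, 1, 300)
X = np.column_stack([temp, occ])
y = np.where((temp > 24) | (occ > 0.5), "High", "Low")

# A depth-2 tree is enough to recover the two splitting rules of the example.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["outdoor_temp", "occupancy"]))
```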
3. Mining BAS data sets retrieved from a real building
3.1. Description of the raw BAS data sets
The data sets concerned in this study were collected from the tallest commercial building in Hong Kong [20]. The building was designated as an Intelligent Building of 2011 by the Asian Institute of Intelligent Buildings. An advanced BAS is installed in this building. Over 500 power meters record the real-time power consumption of various components such as chillers, pumps, fans, lifts and lighting devices. Eight months of data (from January 2012 to August 2012) were collected at an interval of 15 min, resulting in 22,974 observation sets in total. The collected data consist of the date and time (i.e. year, month, day, hour, minute, weekday), measurements of indoor and outdoor variables (e.g. outdoor temperature and relative humidity, indoor CO2 concentration) and the power consumption of various sub-systems and components, including essential power, normal power, plumbing & drainage, lift & escalator, mechanical ventilation systems, air-handling units, primary air units, chillers, cooling towers, primary chilled water pumps (PCHWP), secondary chilled water pumps (SCHWP) and condenser water pumps (CDWP). All the data are considered in this study.
3.2. Data preparation
3.2.1. Data cleaning
The raw BAS data sets contain significant missing values and outliers. Discarding low-quality data will enhance the reliability of the mining results. In this study, missing values are handled using a simple moving average method with a window size of 5 samples. The raw BAS data sets also contain some dead values, which do not change over a long period of time. In this study, if a variable does not change for one hour, the corresponding observation set is discarded. Outliers are detected using a simple filter, the interquartile range rule [15]. The interquartile range is the difference between the third quartile (Q3) and the first quartile (Q1). The lower limit and the upper limit are defined as Q1 - 1.5(Q3 - Q1) and

Fig. 2. A simple example of the tree model.

Q3 + 1.5(Q3 - Q1), respectively. Any value lying beyond the range defined by these two limits is regarded as an outlier and discarded. After data cleaning, 19,962 out of 22,974 observation sets are retained for DM.
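A minimal Python sketch of the two cleaning operations described above, i.e., filling missing values with a moving average and applying the interquartile range rule, is given below. The series is a toy example and details such as the centred window are assumptions rather than the exact implementation used in this study.

```python
import numpy as np
import pandas as pd

# Toy 15-min power series with one gap and one obvious outlier.
s = pd.Series([300.0, 310.0, np.nan, 305.0, 298.0, 302.0, 9000.0, 301.0, 299.0])

# Fill missing values with a centred moving average (window of 5 samples).
filled = s.fillna(s.rolling(window=5, center=True, min_periods=1).mean())

# Interquartile-range rule: keep values within [Q1 - 1.5 IQR, Q3 + 1.5 IQR].
q1, q3 = filled.quantile(0.25), filled.quantile(0.75)
iqr = q3 - q1
cleaned = filled[(filled >= q1 - 1.5 * iqr) & (filled <= q3 + 1.5 * iqr)]
print(cleaned)
```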
3.2.2. Data transformation for clustering analysis
In this study, clustering analysis is used to identify the typical building operational patterns in terms of daily building power consumption. The total building power consumption is grouped on a daily basis. Since the data are collected at an interval of 15 min, each day should have 96 power consumption values. In total, 192 sets of complete daily data (i.e. 18,432 observation sets) are used for further analysis. Data scaling is performed to derive the relative trend of energy usage within each day. Considering that 96 values per day create a relatively sparse matrix for analysis, feature extraction is performed to enhance the clustering efficiency. Feature extraction can be regarded as a special form of data reduction. It aims to transform the data into a set of features which can best represent the contained information [4]. In this study, three modes are defined based on the usage patterns of typical commercial buildings: morning (07:00–12:00), afternoon (13:00–19:00) and night (20:00–06:00). Four commonly used statistics (i.e., mean, maximum, minimum and standard deviation) are calculated for each mode. Consequently, the resulting twelve features are used to represent the daily building power consumption. The dimension of the data for clustering is thus reduced from 96 to 12.
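This feature extraction step can be sketched as follows: group the 15-min total power readings by day and by mode of the day, then compute the four statistics per mode, yielding twelve features per day. The data, the exact boundary handling of the three modes and the pandas-based implementation are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Toy total building power at 15-min intervals over four days.
idx = pd.date_range("2012-01-01", periods=4 * 96, freq="15min")
power = pd.Series(np.random.default_rng(3).uniform(500, 3000, len(idx)), index=idx)

# Assign each reading to a mode of the day (night covers 20:00-06:00).
def mode_of_day(hour: int) -> str:
    if 7 <= hour <= 12:
        return "morning"
    if 13 <= hour <= 19:
        return "afternoon"
    return "night"

modes = pd.Series([mode_of_day(h) for h in idx.hour], index=idx)

# Twelve features per day: mean, max, min and std for each of the three modes.
features = power.groupby([idx.date, modes]).agg(["mean", "max", "min", "std"]).unstack()
print(features.shape)   # (number of days, 12)
```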
3.2.3. Data transformation for association rule mining
Association rule mining is adopted to discover association rules in the operational data. Most association rule mining algorithms require the data to be categorical (such as high, medium and low) rather than numerical. However, most of the raw data, except for the date and time, are numerical. Therefore, data transformation should be carried out on the BAS data sets before mining association rules.

A number of methods are available to discretize data from numeric to categorical. The equal-width method and the equal-frequency method have been widely used due to their simplicity and reliability. The equal-width binning method divides the data into m intervals of equal size, while the equal-frequency method divides the data into m groups which contain approximately the same number of observations. Transformation results can greatly affect the ARM performance. For instance, if the observation number in one category is too small, this category will be regarded as an infrequent event. As a result, it may be very difficult to discover rules related to this category under a high support setting. There is no universally applicable guideline on how to select the optimal transformation method for a specific problem. It is recommended to examine the distribution of the numeric data first, and then integrate domain knowledge to select a suitable method for data transformation. For instance, the power consumption of a two-speed fan can be easily categorized into three categories: low (corresponding to zero power consumption when the fan stops), medium (low speed) and high (high speed). Generally speaking, the more categories are used, the smaller the relative frequency of each category will be. Consequently, the support threshold should be set lower to cater for less frequent relationships when performing association rule mining.
In this study, the power consumption data and weather data are all numerical and should be transformed into categorical data before mining association rules. Considering the climate conditions in Hong Kong, the outdoor air temperature is categorized into 6 levels with an interval of 5 °C, from below 10 °C to above 30 °C, and the outdoor air relative humidity is categorized into 6 levels with an interval of 5%, from below 70% to above 90%.

The equal-frequency binning method, which results in an equal size of each category, is used to categorize all the power consumption data, except for the power consumption data of the PCHWPs and CDWPs. The PCHWPs and CDWPs are constant-speed pumps and their power consumption remains constant when the pumps run. Therefore, the power consumption data of the PCHWPs and CDWPs are categorized according to the number of running pumps; for example, "2nd" means 2 pumps are running. The remaining power consumption data are categorized into 3 categories using the equal-frequency binning method, as they generally have a continuous distribution across their ranges. The three categories are defined as low, medium and high.
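The two discretization schemes described above can be sketched in Python as follows, using fixed 5 °C bins for the outdoor temperature and equal-frequency (tertile) bins for a power variable; the synthetic data and bin labels are for illustration only.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
outdoor_temp = pd.Series(rng.uniform(5, 35, 1000))
chiller_power = pd.Series(rng.uniform(0, 3000, 1000))

# Outdoor temperature: six fixed-width levels, 5 degC apart, open-ended at both ends.
temp_levels = pd.cut(
    outdoor_temp,
    bins=[-np.inf, 10, 15, 20, 25, 30, np.inf],
    labels=["<10", "10-15", "15-20", "20-25", "25-30", ">30"],
)

# Chiller power: equal-frequency binning into three categories of roughly equal size.
power_levels = pd.qcut(chiller_power, q=3, labels=["Low", "Medium", "High"])

print(temp_levels.value_counts().sort_index())
print(power_levels.value_counts())
```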
3.3. Identification of power consumption patterns using clustering analysis
Clustering analysis was employed to investigate the daily power consumption patterns. The open-source software R [21] was used to perform all the DM techniques used in this study.

Two internal validation indices, the Dunn index and the Silhouette width [4], were used to select the clustering algorithm and the optimal cluster number. The Dunn index measures the


[Fig. 3. Clustering validation results: Dunn index and Silhouette width plotted against the number of clusters for the hierarchical, k-means and PAM algorithms.]

ratio of the minimal intra-cluster distance to the maximal inter-cluster distance; therefore, the Dunn index should be maximized. The Silhouette width is the average of each observation's Silhouette value, which reflects the confidence level of the clustering of a particular observation. The Silhouette width ranges from -1 to 1 and it should also be maximized.

Three popular clustering algorithms, i.e. the agglomerative hierarchical, k-means and PAM algorithms, were compared. The searching range of the cluster number was set from 2 to 7. The clustering validation results are shown in Fig. 3. Both indices show that the k-means algorithm with 3 clusters gives the best clustering results. Therefore, the k-means algorithm was selected with k equal to 3. The entropy-weighted k-means (EWKM) algorithm, which is an extension of the k-means algorithm, was applied to perform the clustering analysis. EWKM can obtain not only the clustering results, but also the relative importance (RI) of each variable. The Dunn index and Silhouette width were used to find the optimal parameters. As a result, the weight distribution parameter and the convergence threshold were set as 0.2 and 0.0001, respectively.
The EWKM clustering result is shown in Fig. 4. Nearly all the daily building power consumption feature data in Cluster 1 come from weekdays (i.e., Monday to Friday). Cluster 2 mainly consists of data from Saturdays, while the majority of observations in Cluster 3 come from Sundays. This result indicates that the power consumption patterns of weekdays are similar to each other, while the power consumption patterns of weekdays, Saturdays and Sundays are typical and very different from each other.

Fig. 5 is a heat map showing the relative importance of the extracted features in each cluster. It can be found from Fig. 5 that four features, i.e. the morning peak power, afternoon peak power, afternoon minimum power and night power standard deviation, are crucial in determining all three clusters. Further analysis will be conducted on the three clusters separately.
3.4. Association rule mining
As described in Section 3.3, three typical building operating patterns were identified using clustering analysis. The transformed data sets, as described in Section 3.2, were divided into three separate data sets corresponding to the three clusters for weekdays, Saturdays and Sundays, respectively. Association rule mining was conducted on each cluster. Apriori was selected as the mining algorithm.

Two key parameters, i.e., the minimum support and confidence, should be determined to carry out the ARM. In this study, the minimum support is set relatively low, i.e., 0.2, to capture associations
[Fig. 4. EWKM clustering result: cluster ID of each daily observation, labeled by day of week (Mon–Sun).]

[Fig. 5. Relative importance of features.]


between infrequent events. By contrast, the minimum confidence threshold is set relatively high, i.e., 0.85, to ensure the reliability of the obtained rules. Considering that the interpretability of discovered rules decreases as the item number increases, the minimum and maximum item numbers (i.e. the total number of antecedents and consequents) in a rule were set to 2 and 5, respectively. Redundant rules were removed by comparing their lift values. For instance, assume that Rule A and Rule B have the same consequent, and Rule A's antecedent is a superset of Rule B's. If Rule A has the same or a lower lift value, Rule A is redundant and removed.
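The mining itself was performed in R; as a rough, hedged Python analogue of the parameter choices above (minimum support 0.2, minimum confidence 0.85, and filtering on lift), the sketch below enumerates simple one-item rules on a tiny one-hot table of invented transactions. A full Apriori run would also consider larger antecedents up to the 5-item limit used in the study.

```python
from itertools import combinations
import pandas as pd

# One-hot encoded transactions: each column is an item such as "Pwr.PCHWP=4th".
# The data here are invented; in the study each row would be a 15-min observation.
t = pd.DataFrame({
    "Pwr.PCHWP=4th":  [True, True, False, True, True, False],
    "Pwr.CDWP=3rd":   [True, True, False, True, False, False],
    "Pwr.SCHWP=High": [True, True, False, True, True, False],
})

MIN_SUPPORT, MIN_CONFIDENCE = 0.2, 0.85

rules = []
for a, b in combinations(t.columns, 2):
    for ante, cons in ((a, b), (b, a)):
        support = (t[ante] & t[cons]).mean()       # P(A, B)
        if support < MIN_SUPPORT:
            continue
        confidence = support / t[ante].mean()      # P(B | A)
        lift = confidence / t[cons].mean()         # P(B | A) / P(B)
        if confidence >= MIN_CONFIDENCE and abs(lift - 1) > 0.1:
            rules.append((ante, cons, round(support, 2),
                          round(confidence, 2), round(lift, 2)))

print(pd.DataFrame(rules, columns=["antecedent", "consequent",
                                   "support", "confidence", "lift"]))
```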
In total, 257, 78 and 122 rules were derived from the data sets of Weekday, Saturday and Sunday, respectively. Most of the rules can be easily obtained from domain knowledge and are hence ignored in this study. For example, Rule 1 in Table 1 describes that, if the outdoor temperature on Saturday is between 15 °C and 20 °C, the chiller power consumption is Low. This can be easily understood, as a low outdoor temperature and a low occupancy level on Saturday always lead to a small cooling load and hence a low chiller power consumption. Rule 2 states that, if the power consumption of the primary air handling units is High, the power consumption of lifts is High. This rule can also be easily interpreted, as the power consumption of both the primary air handling units and the lifts is closely related to the occupancy level. A higher power consumption of the primary air handling units normally indicates a higher occupancy level and hence, more people need to use the lifts for vertical transportation. Four representative rules, which are either against common experience or of particular value, are analyzed in detail.

4. Applications of association rules for improving building operational performance
4.1. Application of association rules I: deficit flow
Rules 3–5 are interesting because they disobey one simple design principle. In the design, each chiller is associated with one constant-speed primary chilled water pump (PCHWP) and one constant-speed condenser water pump (CDWP). The running numbers of the PCHWPs and the CDWPs should be the same as the number of chillers in operation, the so-called one-to-one operation strategy. Rule 3 indicates that on weekdays, if the PCHWP power consumption is at the 4th level (i.e., 4 PCHWPs are running), the CDWP power consumption is at the 3rd level (i.e., 3 CDWPs are running). Rule 4 and Rule 5 state similar phenomena on Saturdays and Sundays: if the PCHWP power consumption is at the 3rd level, the CDWP power consumption is at the 2nd level. There is always one more PCHWP in operation, which may cause significant energy waste. Therefore, the operational strategy of the PCHWPs should be investigated.

Fig. 6 shows the relative frequency of the cases when the same numbers of PCHWPs and CDWPs are in operation in each month. The relative frequency is the number of events concerned divided by the total number of observations. It can be found that from May, the numbers of PCHWPs and CDWPs in operation are different for more than 90% of the time. After checking with the operation staff, the reason was found. To prevent deficit flow, one extra PCHWP was started to compensate the flow rate in the primary loop. This deficit flow prevention strategy was implemented occasionally before May; however, it has been used consistently since May.

Deficit flow is a commonly encountered problem in primary–secondary chilled water systems with a decoupled bypass line. It normally takes place when the required flow rate of the secondary loop exceeds that provided by the primary loop. Severe operational problems can be caused, such as a high supply chilled


Fig. 6. Relative frequency of one-to-one operation conditions.

water temperature, over-supplied chilled water, and increased energy consumption of the secondary pumps [22].

It is obvious that the current operation strategy is not energy-efficient due to the operation of one extra PCHWP. However, it seems necessary to operate one more PCHWP to prevent deficit flow. The question is whether the operation strategy can effectively prevent deficit flow, and whether the energy cost is worthwhile. To answer this question, recursive partitioning was applied to evaluate the effectiveness of such a strategy. All the observation sets under the condition that the number of running PCHWPs equals the number of running CDWPs plus 1 were extracted for further analysis. In total, 13,004 observation sets were obtained. The flow rate in the bypass line was selected as the indicator of deficit flow: if it is negative, deficit flow occurs, and vice versa. A tree model was built using recursive partitioning. The model output is either deficit flow or normal condition. The confidence level was set to 95% in determining the splitting variable. To optimize the configuration of the tree model, two parameters, i.e., the minimum number of observations in a node to perform splitting and the minimum number of observations in a terminal node, were specified. These two parameters were determined by cross-validation using the classification purity as the evaluation criterion. As a result, these two parameters were set as 3000 and 1000, respectively.
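For readers who want to reproduce this kind of analysis, the sketch below sets up an analogous but entirely synthetic classification with scikit-learn's CART-based tree, using the reported minimum node sizes. The variable names, the label-generation rule and the use of CART instead of a conditional inference tree are all assumptions made purely for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical extract: observations where running PCHWPs = running CDWPs + 1.
rng = np.random.default_rng(5)
n = 13004
df = pd.DataFrame({
    "n_pchwp": rng.integers(2, 6, n),
    "outdoor_temp": rng.uniform(10, 35, n),
    "chiller_power": rng.uniform(200, 2500, n),
    "month": rng.integers(1, 9, n),
})
# Invented label: deficit flow ("D") when few pumps run and the weather is hot,
# normal condition ("N") otherwise; the real label comes from the bypass flow rate.
y = np.where((df["n_pchwp"] == 2) & (df["outdoor_temp"] > 22.95), "D", "N")

tree = DecisionTreeClassifier(
    min_samples_split=3000,   # minimum observations in a node to attempt a split
    min_samples_leaf=1000,    # minimum observations in a terminal node
    random_state=0,
).fit(df, y)
print(export_text(tree, feature_names=list(df.columns)))
```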
The developed conditional inference tree is shown in Fig. 7. Each terminal node shows the proportions of the classified items, D for deficit flow and N for normal condition. The rated powers of each PCHWP and CDWP are 126 kW and 202 kW, respectively. The first two terminal nodes, Nodes 3 and 4, indicate that deficit flow still occurs frequently when the number of running PCHWPs is 2. More specifically, if the outdoor temperature is relatively high, i.e., higher than 22.95 °C, deficit flow always occurs. By contrast, if the outdoor temperature is relatively low, i.e., lower than 22.95 °C, the chance of deficit flow decreases. Therefore, when only 1 chiller is in operation, deficit flow cannot be prevented effectively by running one extra PCHWP. In addition, Node 10 also indicates that the operation strategy cannot effectively prevent deficit flow in the corresponding situation. It shows that when the number of running PCHWPs is 3 and the chiller power consumption is larger than 1861.7 kW, which is between the capacities of one and two chillers, deficit flow occurred in 60% of the operation time.

The other three terminal nodes, Nodes 8, 9 and 11, indicate a good performance, as deficit flow can be prevented effectively. It is

Table 1
Summary of interesting association rules.

No.  Antecedent         Consequent          Supp.  Conf.  Lift  Cluster
1    Out.T = (15,20)    Pwr.Chiller = Low   0.25   0.88   2.10  Saturday
2    Pwr.PAU = High     Pwr.Lift = High     0.35   0.86   1.76  Weekday
3    Pwr.PCHWP = 4th    Pwr.CDWP = 3rd      0.27   0.99   2.73  Weekday
4    Pwr.PCHWP = 3rd    Pwr.CDWP = 2nd      0.32   0.88   1.78  Saturday
5    Pwr.PCHWP = 3rd    Pwr.CDWP = 2nd      0.34   0.89   1.69  Sunday
6    Pwr.PCHWP = 4th    Pwr.SCHWP = High    0.24   0.89   2.83  Weekday

noticed that the variable "month" is selected as the splitting variable for Node 7. It is observed that before May, when the running number of chillers is 2 and the running number of PCHWPs is 3, no deficit flow occurs. By contrast, starting from May, under the same condition, deficit flow may occur with around 15% probability. The potential affecting factors can be climate, system set points, etc. Node 11 shows that when the running number of PCHWPs is larger than 4, no deficit flow occurs. It can be concluded that the current operation strategy is effective in preventing deficit flow when 3 or more chillers are in operation.

To sum up, the recursive partitioning model reveals that the current operation strategy for preventing deficit flow is not effective, particularly when only one chiller is in operation, i.e. under low cooling load conditions. It is recommended to develop different control strategies for low cooling load conditions to effectively overcome the problem of deficit flow and save energy.
4.2. Application of association rules II: detection of abnormal operation
Rule 6 in Table 1 is derived from the weekday data sets. It says that if the PCHWP power consumption is at the 4th level, the secondary chilled water pump (SCHWP) power consumption is High. Rule 6 shows a reasonable relationship between the energy consumption of the primary pumps and the secondary pumps. As the cooling load increases, the required secondary chilled water flow rate increases and the power consumption of the SCHWPs increases, too. When the cooling load increases significantly, one more chiller and hence one more PCHWP are started. Therefore, more PCHWPs in operation mean a greater cooling load and hence more power consumption of the SCHWPs. Although this rule can be easily understood with domain knowledge, the quantitative description of the rule is not that straightforward. ARM provides an applicable rule for detecting abnormal operation of the primary and secondary pumps.

Using this rule to examine the raw data, it was found that the weekday data sets have 438 abnormal observations. It was also found that the majority of these abnormal observations are sparsely distributed over different days, resulting in fewer than 5 (i.e., 75 min of) continuous abnormal observations on any specific day. Since the HVAC system may experience transient situations during the On–Off control of major components like chillers and pumps, these sparse abnormal observations can be ignored. However, if a large number of abnormal observations occur continuously, as described in the example below, it is reasonable to believe that the operation has some problems.
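A possible way to operationalize this screening is sketched below: flag every observation that violates Rule 6 and keep only runs of at least five consecutive violations (at least 75 min) as abnormal operation. The toy data and the run-length logic are illustrative assumptions, not the procedure actually used by the authors.

```python
import pandas as pd

# Hypothetical categorized weekday observations at 15-min intervals.
obs = pd.DataFrame({
    "Pwr.PCHWP": ["4th"] * 10 + ["3rd"] * 4,
    "Pwr.SCHWP": ["High"] * 3 + ["Medium"] * 7 + ["Medium"] * 4,
})

# Rule 6 violation: PCHWP at the 4th level while SCHWP is not High.
violation = (obs["Pwr.PCHWP"] == "4th") & (obs["Pwr.SCHWP"] != "High")

# Length of each consecutive run of violations; runs of 5 or more observations
# (>= 75 min) are treated as abnormal operation rather than a transient state.
run_id = (violation != violation.shift()).cumsum()
run_len = violation.groupby(run_id).transform("sum")
abnormal = violation & (run_len >= 5)
print(abnormal.sum(), "observations flagged as abnormal")
```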
Fig. 8 shows the abnormal primary–secondary pump operation on one weekday found in the raw data sets. Starting from 11:00, the SCHWP power consumption undergoes a rapid increase and its running condition changes from Medium to High. At the same time, the PCHWP power consumption rises to around 500 kW, which corresponds to the power consumption of 4 PCHWPs (i.e. 4 × 126 kW = 504 kW). The CDWP power consumption also rises to around 600 kW, which corresponds to the operation of 3 CDWPs (i.e., 3 × 202 kW = 606 kW). Three hours later, i.e., at

Fig. 7. Developed conditional inference tree.

Fig. 8. Abnormal running condition.

Fig. 9. Normal running condition.

14:00, the SCHWP power consumption drops back to the Medium level and never reaches the High level again during the rest of the day. Nevertheless, no action was taken for the PCHWPs and CDWPs, which kept running at high intensity until 20:00. The operation from 14:00 to 20:00 does not satisfy Rule 6 and can be diagnosed as abnormal operation. In this case, energy was wasted during these 6 h; the extra energy cost is around 2000 kWh (i.e., (126 + 202) × 6 = 1968 kWh) on that day.

By contrast, Fig. 9 shows the normal operation on one weekday. The SCHWP power consumption reaches the High level at 08:00 and drops back to the Medium level around 2 h later. Corresponding to these changes, it is observed that one more PCHWP and CDWP are switched on at around 08:00 and switched off at 10:00.
5. Conclusions

This paper proposes an applicable framework for mining BAS data sets using popular DM techniques. The framework consists of five major steps: data preparation, clustering analysis, association rule mining, post-mining, and application of discovered knowledge. Data preparation is performed to improve the data quality and transform the data into formats suitable for data mining. Clustering analysis is carried out to identify the typical building operational patterns, based on which more reliable mining results can be obtained. Association rule mining is performed to extract the hidden knowledge in the form of rules. Post-mining helps to select and interpret potentially useful rules. Two cases are presented which demonstrate the applicability of the framework in enhancing building operational performance. The experience gained in this study shows that DM is a powerful tool to discover knowledge underlying the large amount of BAS data; however, rich domain knowledge in the building field is still necessary for the application of the knowledge discovered.

The framework proposed in this study is flexible and extensible. Further study will be conducted to improve the BAS data mining framework. More advanced DM techniques will be investigated and suitable ones will be integrated into the framework for various purposes, including prediction, optimization and diagnosis.

Acknowledgements

The authors gratefully acknowledge the support of this research by The Hong Kong Polytechnic University (project No. G-YM86). We would also like to thank Professor Shengwei Wang for his invaluable advice and help.

References

[1] 2011 Building Energy Data Book, U.S. Department of Energy, March 2012.
[2] Hong Kong Energy End-use Data 2012, Hong Kong Electrical & Mechanical Services Department, September 2012.
[3] T. Ramesh, R. Prakash, K.K. Shukla, Life cycle energy analysis of buildings: an overview, Energy and Buildings 42 (10) (2010) 1592–1600.
[4] O. Maimon, L. Rokach, Data Mining and Knowledge Discovery Handbook, 2nd ed., Springer, New York, 2010.
[5] B. Dong, C. Cao, S.E. Lee, Applying support vector machines to predict building energy consumption in tropical region, Energy and Buildings 37 (2005) 545–553.
[6] M.R. Amin-Naseri, A.R. Soroush, Combined use of unsupervised and supervised learning for daily peak load forecasting, Energy Conversion and Management 49 (2008) 1302–1308.
[7] A. Kusiak, M.Y. Li, F. Tang, Modeling and optimization of HVAC energy consumption, Applied Energy 87 (2010) 3092–3102.
[8] A. Ahmed, N.E. Korres, J. Ploennigs, H. Elhadi, K. Menzel, Mining building performance data for energy-efficient operation, Advanced Engineering Informatics 25 (2011) 341–354.
[9] Z. Yu, F. Haghighat, C.M. Fung, H. Yoshino, A decision tree method for building energy demand modeling, Energy and Buildings 42 (2010) 1637–1646.
[10] Z. Yu, F. Haghighat, C.M. Fung, L. Zhou, A novel methodology for knowledge discovery through mining associations between building operational data, Energy and Buildings 47 (2012) 430–440.
[11] D.F.M. Cabrera, H. Zareipour, Data association mining for identifying lighting energy waste patterns in educational institutes, Energy and Buildings (2013), http://dx.doi.org/10.1016/j.enbuild.2013.02.049.
[12] D.L. Olson, D. Delen, Advanced Data Mining Techniques, Springer-Verlag, Berlin, Heidelberg, 2008.
[13] T. Hothorn, K. Hornik, A. Zeileis, Unbiased recursive partitioning: a conditional inference framework, Journal of Computational and Graphical Statistics 15 (2006) 651–674.
[14] S.C. Zhang, C.Q. Zhang, Q. Yang, Data preparation for data mining, Applied Artificial Intelligence 17 (2003) 375–381.
[15] S.W. Wang, Q. Zhou, F. Xiao, A system-level fault detection and diagnosis strategy for HVAC systems involving sensor faults, Energy and Buildings 42 (2010) 477–490.
[16] Z.J. Ma, S.W. Wang, Supervisory and optimal control of central chiller plants using simplified adaptive models and genetic algorithm, Applied Energy 88 (2011) 198–211.
[17] A. Khamis, Z. Ismail, K. Haron, A.T. Mohammed, The effects of outliers data on neural network performance, Journal of Applied Science 5 (8) (2005) 1394–1398.
[18] T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning: Data Mining, Inference and Prediction, 2nd ed., Springer Series in Statistics, New York, USA, 2009.


[19] L.P. Jing, M.K. Ng, J.Z. Huang, An entropy-weighting k-means algorithm for subspace clustering of high-dimensional sparse data, IEEE Transactions on Knowledge and Data Engineering 19 (2007) 1026–1041.
[20] C. Strobl, J. Malley, G. Tutz, An introduction to recursive partitioning: rationale, application, and characteristics of classification and regression trees, bagging, and random forests, Psychological Methods 14 (2009) 323–348.
[21] R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2013. http://www.R-project.org
[22] D.C. Gao, S.W. Wang, Y.J. Sun, A fault-tolerant and energy efficient control strategy for primary–secondary chilled water systems in buildings, Energy and Buildings 43 (2011) 3646–3656.
