Using artificial intelligence to control building systems - Consulting

Energy savings in buildings can be achieved when artificial intelligence and building automation systems are connected to achieve HVAC goals

Learning Objectives

Understand how basic machine learning algorithms used in the control and optimization of HVAC systems work, and what type of data is needed to train them.
Learn about the impact that a BAS network has on the data that is being used by machine learning algorithms.
Know what metrics can be used to analyze the performance of machine learning algorithms that are being used for the control or optimization of HVAC systems.

In building automation and controls, there is no shortage of information about artificial intelligence and the potential energy savings that AI algorithms can achieve when they are used to control building systems. Similarly, there is no shortage of building automation system vendors that offer some sort of AI solution in addition to their standard BAS solution. Lastly, there is an ever-growing pool of third-party providers of AI-based controls solutions that claim to be able to work with any BAS.

In a heating, ventilation and air conditioning controls design process consistent with industry standards, the design engineer produces a set of sequences of operations, which describe the intent for how the systems are to operate, while stopping short of providing 100% of the information required for a controls vendor to program the sequences.

The experience and expertise of the controls vendor are then drawn upon to translate the intent provided by the design engineer into computer code entered into the BAS. This process gives the design engineer control over the general manner in which the systems will operate while delegating the final details, such as time delays, proportional-integral-derivative loop gains and general code structure.

When it comes to AI-based control solutions, the design engineer is often forced to delegate a significantly larger portion of the controls design to the AI solution provider due to the proprietary nature and wider scope of the AI solution. This reduction in control can be unsettling for design engineers, particularly when the concepts and vocabulary used to describe the nature of the AI solution are foreign to most HVAC design engineers, and when the claims made by various solution providers are difficult to prove or normalize to a particular application.

Although building controls use various established communication protocols (e.g., BACnet, Modbus and LonWorks), the examples in this article are based on the BACnet communication protocol.

In general, most cloud-based solutions sit on top of a BAS (see Figure 1). In these cases, there needs to be a “translator/communicator” between the cloud-based solution and the BAS. This is where an application programming interface comes into play; the API “moves” the data (e.g., data from sensors and actuators, and commands from the cloud to the BAS) between the cloud and the BAS.

The API can be provided by the lower-level BAS provider or the cloud-based solution provider. The BAS comprises all the BACnet building-level controllers (B-BC), BACnet advanced-application controllers (B-AAC), BACnet application-specific controllers (B-ASC), BACnet advanced-operator workstations (B-AWS) and all other wiring, sensors, actuators, servers, etc.

ANSI/ASHRAE Standard 135, BACnet: A Data Communication Protocol for Building Automation and Control Networks, Annex L establishes the performance criteria that a BAS and its components must meet for the BAS to be classified as BACnet compliant.

The B-AWS provides complete configuration, monitoring, modification and operation of the entire direct digital control system by advanced building operators and technicians.

The B-BC is typically used for large (i.e., more than 40 control points and sophisticated sequences of operations) air handling units, chilled water plants and heating plants.

The B-AAC is typically used for smaller (i.e., fewer than 30 control points and less sophisticated sequences of operations) AHUs.

The B-ASC is typically used for the control of terminal units (e.g., variable air volume boxes and fan coil units).

Establishing the proper communication link between the local BAS and the cloud solution is perhaps the most important part of designing the BAS network infrastructure. This is due to:

Protocol translation. In general, cloud-based solutions do not natively speak any of the standard HVAC controls communication protocols (e.g., BACnet and Modbus). This is where the API comes into the picture; it converts the information from the cloud-based solution into a standard communication protocol so that the B-BCs, B-AACs and B-ASCs can understand what is being asked of them. Similarly, the API receives information from the B-BCs, B-AACs and B-ASCs and converts it into whatever (proprietary) communication protocol is required by the cloud-based solution. The more complex the cloud-based solution, the more complex and costly the API becomes.
Frequency of polling. Polling frequency can be defined as how often (usually expressed as an interval in minutes) the BAS vendor allows a cloud-based solution to talk to the various BAS controllers. With too much polling, the BAS network could become overwhelmed. Infrequent polling limits the data available to the cloud-based solution and thus negatively impacts its effectiveness.
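
The translation role described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the PointReading class, the to_cloud_payload function and the JSON field names are hypothetical and do not correspond to any vendor's API. The sketch assumes a value has already been read from a controller and only shows the repackaging step.

```python
import json
from dataclasses import dataclass


@dataclass
class PointReading:
    """One value read from a BAS controller (hypothetical structure)."""
    device_id: int         # BACnet device instance number
    object_type: str       # for example "analog-input"
    object_instance: int   # object instance within the device
    present_value: float   # value of the object's present-value property
    units: str
    timestamp: str         # ISO 8601 string


def to_cloud_payload(reading: PointReading) -> str:
    """Repackage a BAS reading as JSON; the schema is an assumption."""
    return json.dumps({
        "point": f"{reading.device_id}:{reading.object_type}:{reading.object_instance}",
        "value": reading.present_value,
        "units": reading.units,
        "time": reading.timestamp,
    }, sort_keys=True)


reading = PointReading(1201, "analog-input", 3, 6.7, "degC", "2024-07-01T14:05:00Z")
print(to_cloud_payload(reading))
```

A real API would also handle the reverse direction (commands from the cloud written back to controller points), authentication and error handling, none of which is shown here.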

How can the BAS network become overwhelmed?

Unfortunately, a significant portion of existing and even new BAS networks are built on what could be considered an old technology: the BACnet MS/TP network and communication protocol. From the mid-to-late 1990s to the present, the BACnet MS/TP network and the associated communication protocol have given owners a much-needed benchmark to compare various BAS vendors and, at the same time, a more cost-effective way of having a BAS.

Figure 1: A BACnet network architecture that shows the hierarchy of various BACnet controls devices and the interface point with a cloud-based solution. Courtesy: SmithGroup

A standard set of trend data from a standard BAS includes data polled typically at 5-to-15-minute intervals. As shown in Figure 1 (red lines), the BACnet MS/TP section of the network typically resides downstream of B-BCs. One could consider the actual speed of the BACnet MS/TP network as the equivalent of the speed of a dial-up internet modem from the late 1990s. Even if the local B-ASCs have the processing speed to allow for more aggressive polling times (e.g., every minute), it is the BACnet MS/TP network and the associated communication protocol that will most likely limit the polling time.
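
To see why a serial trunk limits polling, a back-of-the-envelope estimate can help. The sketch below is a rough lower bound, not a model of the MS/TP protocol. Every number in it is an assumption made for illustration: 50 bytes on the wire per point read (request plus response), 10 bits transmitted per byte, and only half of the trunk's bandwidth available for polling. The 38,400 bits-per-second rate is used as one example of a baud rate found on MS/TP trunks.

```python
def min_poll_seconds(points: int, bytes_per_point: int, baud: int,
                     bits_per_byte: int = 10, utilization: float = 0.5) -> float:
    """Rough lower bound on the time needed to read every point once.

    utilization is the assumed fraction of trunk bandwidth left for polling
    after token passing and normal control traffic.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    total_bits = points * bytes_per_point * bits_per_byte
    return total_bits / (baud * utilization)


# 1,000 points on one trunk at 38,400 bits per second (illustrative values)
print(round(min_poll_seconds(1000, 50, 38400), 1), "seconds per full poll")
```

Under these assumptions, one pass over 1,000 points takes roughly 26 seconds, so polling every point once per minute would already consume a large share of the trunk.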

Machine learning is a subset of AI. An important concept to understand is that ML-based control algorithms do not understand the cause — the “why” behind their own predictions.

ML algorithms typically fall under three categories: supervised learning (classification, regression), unsupervised learning (clustering, dimensionality reduction, recommendation) and reinforcement learning (reward maximization). The first and third categories are used most in the control of building systems. It is important to note that in both categories (i.e., supervised learning and reinforcement learning), the algorithms must first see historical data to be able to fit a function.

The output of an algorithm that uses classification is typically binary. This type of algorithm is typically used to detect anomalies in the operation of HVAC systems. For example, is the chiller about to fail?

As it relates to typical HVAC systems and BASs that use continuous variables (e.g., chilled water supply temperature) and not discrete variables (e.g., day of the week), ML algorithms are trained by giving them a set of data to which they are asked to fit a function; they then use the fitted function to make predictions on new data.

As an example of attempting to use an ML-based control algorithm to predict the performance of a pump, see Table 1. Inputs include the flow rate, entering water temperature and differential pressure; the output is the pump energy consumption in kilowatt-hours. When the correct inputs are chosen based on real-world data that can approximate the energy consumption of a pump, the parameters of the fitted function can be developed.
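
The idea of fitting a function to pump data and then predicting on new data can be sketched with ordinary least squares. This is a deliberately simplified illustration, not the method used for Table 1: it combines flow and differential pressure into a single input (their product), ignores entering water temperature and fits a straight line. The data values are made up and chosen to lie exactly on a line so the result is easy to check.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up training data (not from Table 1)
flow_lps = [10, 15, 20, 25, 30]           # flow rate, liters per second
dp_kpa = [100, 120, 140, 160, 180]        # differential pressure, kilopascals
pump_kwh = [2.5, 4.1, 6.1, 8.5, 11.3]     # energy use per trend interval

# Single engineered input: flow times differential pressure
feature = [q * dp for q, dp in zip(flow_lps, dp_kpa)]
slope, intercept = fit_line(feature, pump_kwh)

# Prediction for a new operating point inside the training range
new_feature = 22 * 150
print(round(slope * new_feature + intercept, 2), "kWh predicted")
```

A production model would use more inputs and typically a library implementation, but the train-then-predict pattern is the same.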

Training data for an algorithm can be generated by gathering BAS trend data and then sending that data through the API to the cloud-based solution (see Figure 2).

Figure 2: A sample time-based trend graph that a standard building automation system can provide. The data is captured every 2.5 minutes. Courtesy: SmithGroup

Before sending new data to the algorithm and asking it to make predictions, one must analyze the performance of the algorithm on the training data, i.e., the data that was sent to it to estimate the function. Typically, there are at least three scenarios that unfold once we get the first predictions from the algorithm (see Figure 3). Note that Figure 3 represents a multivariate function based on background variables that are not indicated in the chart.

High bias or underfitting: bias is the amount of error introduced by approximating a highly nonlinear function with an almost linear function.
High variance or overfitting: this happens when the algorithm overlearns from the training data, i.e., it sees patterns that are not there in reality.
Just right: the algorithm has low variance and low bias.
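
One common way to tell these three scenarios apart is to compare the error on the training data with the error on data held back from training. The sketch below encodes that comparison as a simple rule of thumb. The function name and both thresholds (acceptable_error and max_gap) are assumptions that would have to be chosen for each application; the rule is a heuristic, not a formal test.

```python
def diagnose_fit(train_error: float, validation_error: float,
                 acceptable_error: float, max_gap: float) -> str:
    """Classify a model using training and held-out errors (heuristic)."""
    # Poor accuracy even on data the model has already seen: underfitting
    if train_error > acceptable_error:
        return "high bias (underfitting)"
    # Good on training data but much worse on unseen data: overfitting
    if validation_error - train_error > max_gap:
        return "high variance (overfitting)"
    return "just right"


# Illustrative error values, in the units of the predicted variable
print(diagnose_fit(5.0, 5.2, acceptable_error=1.0, max_gap=0.5))
print(diagnose_fit(0.1, 3.0, acceptable_error=1.0, max_gap=0.5))
print(diagnose_fit(0.4, 0.6, acceptable_error=1.0, max_gap=0.5))
```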

Figure 3: A sample comparison between the actual data and possible predictions made by an artificial intelligence algorithm. Courtesy: SmithGroup

Evaluating BAS metrics

When evaluating the potential use of an ML solution, judging the potential performance and separating claims from facts can be difficult. One approach to evaluating the experience and expertise of the solution provider is to request examples of historical metrics. The performance of regression algorithms is established by using the following metrics:

The mean absolute error, or MAE, is the average of the absolute differences between predictions and actual values; it gives an indication of how far off target the predictions were. This metric gives an indication of the magnitude of the error, but no indication of whether the algorithm is underfitting or overfitting.
The mean squared error, or MSE, is the average of the squared differences between predictions and actual values. It is similar to the MAE in that it gives an indication of the magnitude of the prediction error, but squaring penalizes large errors more heavily.
The root mean squared error, or RMSE, is the square root of the MSE, which converts the units back to the original units of the output variable.
The R-squared metric provides an indication of the goodness of fit. R-squared is typically displayed as a percentage between 0 and 100 or as a fraction between 0 and 1. In general, the higher the R-squared, the better the model used by the algorithm fits the real data.
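
The four metrics can be computed directly from paired lists of actual and predicted values. The sketch below uses the standard textbook definitions; the function name and the sample numbers are made up for illustration.

```python
import math


def regression_metrics(actual, predicted):
    """Return MAE, MSE, RMSE and R-squared for paired values."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_actual = sum(actual) / n
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    ss_residual = sum(e * e for e in errors)
    r2 = 1.0 - ss_residual / ss_total  # undefined if all actual values are equal
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}


# Illustrative values, for example pump energy in kilowatt-hours
actual = [10, 12, 14, 16]
predicted = [11, 12, 13, 18]
print(regression_metrics(actual, predicted))
```

For these sample values the MAE is 1.0, the MSE is 1.5, the RMSE is about 1.22 and R-squared is 0.7. When requesting historical metrics from a provider, it is worth asking whether they were computed on the training data or on data the algorithm had not seen.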

All metrics described above tend to get worse when the following occur:

Not enough data: For an algorithm to improve its predictions, it needs data, and a lot of it. The more data it sees, the lower the errors described above become. For example, to start receiving valid predictions from an ML-based control algorithm, one would need a minimum of 10 weeks of data with the variables described in Table 1. This means that the chilled water plant must be running on its standard sequence of operations before one uses the output from ML-based control algorithms to make decisions regarding the future operation of the chilled water plant.

Data out of range: ML-based control algorithms perform poorly when asked to make predictions on data outside the range used to train them. Take, for example, the data in Table 1: the minimum value of the pump speed is 33 hertz and the maximum is 58 hertz. If the algorithm starts seeing data outside this range (i.e., below 33 hertz or above 58 hertz) and uses that data to make predictions, those predictions will be grossly off. This is because ML-based algorithms are almost incapable of extrapolating. To compensate, one will need to add the out-of-range data to the training set, retrain the algorithm and reanalyze the metrics described above.
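
A simple guard can flag inputs that fall outside the range seen during training before a prediction is trusted. The sketch below uses the 33-to-58-hertz pump speed range mentioned above; the function name and the dictionary layout are hypothetical.

```python
def out_of_range_inputs(sample, training_ranges):
    """Return the names of inputs outside the range seen in training."""
    flagged = []
    for name, value in sample.items():
        low, high = training_ranges[name]
        if not low <= value <= high:
            flagged.append(name)
    return flagged


# Range of pump speed in the training data, in hertz
training_ranges = {"pump_speed_hz": (33.0, 58.0)}

print(out_of_range_inputs({"pump_speed_hz": 45.0}, training_ranges))  # inside
print(out_of_range_inputs({"pump_speed_hz": 60.0}, training_ranges))  # outside
```

Flagged samples are candidates for the retraining step described above rather than for immediate use in control decisions.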

Data spread too thin across range: Algorithms prefer smaller ranges and repeatable inputs within those ranges. Even if a significant amount of training data is available, the density of training data within various portions of the performance map can be insufficient. An ML algorithm relying on weather data is an example of an algorithm that may struggle with this issue. The possible combinations of outdoor air temperature, humidity ratio, wind direction and wind speed are enormous — and seasonally impacted.

If attempting to predict wintertime heating loads using weather data, the data collection process can become even further hampered by the occupancy status of the building. When relying on an ML algorithm to make informed predictions for a building, it is vital to ensure the full map of possible combinations can be populated at a sufficient density.
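
Checking the density of training data can start with something as simple as counting samples per bin. The sketch below does this for a single variable (outdoor air temperature); a real check would bin combinations of variables, which grows quickly in size. The function name, bin width, minimum sample count and temperature values are all illustrative assumptions.

```python
import math
from collections import Counter


def sparse_bins(values, low, high, bin_width, min_samples):
    """Return (bin_start, count) for bins in [low, high) with too few samples."""
    counts = Counter(
        int((v - low) // bin_width) for v in values if low <= v < high
    )
    n_bins = math.ceil((high - low) / bin_width)
    return [
        (low + i * bin_width, counts[i])
        for i in range(n_bins)
        if counts[i] < min_samples
    ]


# Made-up outdoor air temperatures in degrees Celsius
outdoor_temps = [1, 2, 3, 12, 13, 14, 15, 27]
print(sparse_bins(outdoor_temps, low=0, high=30, bin_width=10, min_samples=3))
```

In this made-up data set, the 20-to-30-degree bin holds a single sample, so predictions for warm weather would rest on very little evidence.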

Confounding factors not addressed: The performance of HVAC systems is rife with confounding factors that can impair the training process of an ML algorithm. These confounding factors must be accounted for to ensure the training data set is clean and relatively consistent with the physics of the system.

For example, if an ML algorithm were attempting to predict performance of a remotely located AHU cooling coil using a chilled water supply temperature sensor located within a central plant as an input, the temperature of the chilled water entering the cooling coil may temporarily differ from the sensed value within the plant, particularly after a change in setpoint as the cooler or warmer chilled water takes time to move through the piping system and the heat transfer process at the cooling coil takes time to reach equilibrium. This process creates false data and can impair the training process.

Confounding factors can be reduced by ensuring training data uses real, measured data as opposed to system setpoints, through the intelligent deployment of sensors throughout the system and the use of intelligent time delays for data collection.
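
One way to apply a time delay during data collection is to pair each downstream reading with the upstream reading taken a fixed number of samples earlier. The sketch below assumes the delay is already known and constant; in practice it would have to be estimated (for example from pipe volume and flow) and may vary with flow. The function name and the sample values are hypothetical.

```python
def align_with_delay(plant_temps, coil_outputs, delay_steps):
    """Pair each coil reading with the plant reading delay_steps samples earlier."""
    if delay_steps < 0:
        raise ValueError("delay_steps must be non-negative")
    n = min(len(plant_temps), len(coil_outputs))
    return [
        (plant_temps[i - delay_steps], coil_outputs[i])
        for i in range(delay_steps, n)
    ]


# Made-up trends: plant supply temperature drops at sample 2,
# and the coil output responds two samples later
plant_temps = [7.0, 7.0, 6.0, 6.0, 6.0]
coil_outputs = [50, 50, 50, 50, 55]
print(align_with_delay(plant_temps, coil_outputs, delay_steps=2))
```

Without the shift, the training set would pair the new 6.0-degree supply temperature with coil outputs that still reflect the old temperature.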

Data collection

Because engineers, building owners, facilities managers and building engineers are not versed in this AI language, what is their responsibility in ensuring a successful ML application? Let us go back to the BACnet MS/TP part of a BACnet-based BAS network. The longer the MS/TP network is, the longer it takes for the data to be transmitted to and from the cloud-based solution. The network speed limits the maximum frequency of polling, which significantly impacts the amount of data that can be generated in a given period of time and increases the potential interference of confounding factors.

These algorithms must see the data first to be properly trained. If they can’t see the data, they can’t extrapolate. Figure 4 represents a multivariate ML function plotted on a time scale. Note that the output data is impacted by a number of other background variables that are not indicated in the chart. If polling is set up to occur every five minutes, then the algorithm will have less data to work with and can potentially miss valuable data.
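
The effect of the polling interval on what the algorithm sees can be illustrated by downsampling a finer-grained trend. The values below are made up: a one-minute series with two short peaks, polled every five minutes.

```python
def poll(series, every):
    """Keep every Nth sample, starting with the first."""
    return series[::every]


# Made-up one-minute readings with short peaks at minutes 3 and 8
one_minute = [40, 41, 42, 58, 43, 41, 40, 42, 57, 41, 40]
five_minute = poll(one_minute, 5)

print("samples seen:", len(one_minute), "vs", len(five_minute))
print("peak seen:", max(one_minute), "vs", max(five_minute))
```

With five-minute polling, the algorithm receives three samples instead of 11 and never sees either peak.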

Figure 4: Sample graph showing how polling or lack thereof can impact the amount of data that is being used by a machine learning algorithm. In this graph, the data points “missed” by the polling are peak values that can negatively affect the performance of artificial intelligence algorithms. Courtesy: SmithGroup

Figure 5: A proposed BACnet architecture in which the communication wiring between the majority of the BACnet devices is Ethernet cable. This approach will allow for a significant increase in polling rates. Courtesy: SmithGroup

How can one mitigate this risk of having incomplete data? For an existing building that has an extensive BACnet MS/TP infrastructure, one cannot simply decrease the polling interval to 30 seconds; an alternative is to decrease the polling interval as much as possible while continuously storing data and retraining the algorithms across all seasons.

In the case of a new design, implementing a network that is entirely based on the BACnet/IP (internet protocol) communication protocol can provide the network speeds required for intelligent building solutions (see Figure 5). This means that the cabling between the controllers and other controllers or routers is Ethernet cable and not MS/TP twisted pair.