Validation Parameters Tab
The Validation Parameters tab defines validation routines that identify anomalous data. When anomalous data is located, its data status flag is changed to Bad, Load Control, or Load Event. Parameters are set for each meter. The data validation routines are run when the Validate Data action (Meter Actions) is invoked. When multiple validation methods are selected, the union of the method results defines the anomalous data; that is, a value is anomalous if any selected method identifies it.
The following methods may be used to validate the data.
- Disallow Zero.
- Disallow Negative Values.
- Enable Level.
- Enable Repeated Values.
- Enable Spike Detection.
- Enable Model Filter.
- Enable Delta.
The "Status to be Set" parameter in the tab's upper left corner determines the data status marking to apply based on the criteria specified on the Validation Parameters form.
Copy Settings and Paste Settings in the tab's lower left corner may be used to quickly apply settings to the other meters or statuses. To copy a setting between statuses, select the “Copy Settings” button, then move to another status and select the “Paste Settings” button. To copy settings between meters, select the “Copy Settings” button (do not close the Meter Properties dialog), then select another meter or set of meters on the List tab and select the “Paste Settings” button on the new Validation Parameters tab. Clear Settings clears the settings for the currently selected meter and status.
Disallow Zero
By default, zero values are permitted as valid data. When this option is selected, zero values are identified as anomalous.
Disallow Negative Values
By default, negative values are permitted as valid data. When this option is selected, negative values are identified as anomalous.
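The two checks above, together with the union rule for combining methods, can be sketched as follows. This is a minimal illustration, not MetrixIDR code; the function names `flag_zero`, `flag_negative`, and `union_flags` are hypothetical.

```python
def flag_zero(values):
    """Flag readings that are exactly zero (Disallow Zero)."""
    return [v == 0 for v in values]


def flag_negative(values):
    """Flag readings below zero (Disallow Negative Values)."""
    return [v < 0 for v in values]


def union_flags(*flag_lists):
    """A reading is anomalous if any selected method flags it."""
    return [any(flags) for flags in zip(*flag_lists)]


values = [5.0, 0.0, -1.2, 3.4]
anomalous = union_flags(flag_zero(values), flag_negative(values))
```

Here the second and third readings are flagged, one by each method.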
Enable Level
The Enable Level method defines the allowable monthly maximum and minimum values for the selected meter. For example, if the January maximum is set to 100 and the minimum is set to 50, any January value below 50 or above 100 is identified as anomalous. To activate this validation, check the Enable Level box.
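The level check can be sketched as a per-month range test. This is an illustrative sketch only; `flag_level` and the `limits` dictionary are hypothetical names, and treating values exactly equal to the minimum or maximum as valid is an assumption drawn from the January example.

```python
def flag_level(readings, limits):
    """readings: list of (month, value); limits: month -> (minimum, maximum)."""
    flags = []
    for month, value in readings:
        minimum, maximum = limits[month]
        # Assumption: values equal to a limit are still valid.
        flags.append(value < minimum or value > maximum)
    return flags


limits = {1: (50.0, 100.0)}  # January: minimum 50, maximum 100
readings = [(1, 49.9), (1, 50.0), (1, 75.0), (1, 100.0), (1, 100.1)]
level_flags = flag_level(readings, limits)
```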
Enable Repeated Values
The Enable Repeated Values method marks a series of repeated values as anomalous. The Tolerance value defines how many repeated values are valid. For example, if the Tolerance is set to 2.0, then two repeated values are considered valid, and a third repeated value triggers the validation. When triggered, all consecutive repeated values except the first are marked with the selected status flag. To activate this validation, check the Enable Repeated Values box.
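One possible reading of this rule is sketched below. It assumes that the Tolerance counts the total length of a run of identical values, so a run longer than the Tolerance is flagged except for its first value. The function name `flag_repeated` is hypothetical, and the product may count repeats differently.

```python
def flag_repeated(values, tolerance):
    """Flag all but the first value of any run longer than `tolerance`."""
    flags = [False] * len(values)
    start = 0  # index where the current run of identical values begins
    for i in range(1, len(values) + 1):
        run_ended = i == len(values) or values[i] != values[start]
        if run_ended:
            if i - start > tolerance:
                for j in range(start + 1, i):  # keep the first value unflagged
                    flags[j] = True
            start = i
    return flags


repeat_flags = flag_repeated([7, 7, 3, 3, 3, 9], tolerance=2)
```

With a Tolerance of 2, the pair of 7s is accepted, while the second and third 3s are flagged.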
Enable Spike Detection
The Enable Spike Detection method checks the percentage difference between the highest value and the third highest value for a rolling 24-hour period. The Threshold is the lowest value at which the method is activated. For instance, if the January Threshold is set to 40, then the method will not apply to data lower than 40. The Tolerance defines the allowable percentage difference between the period peak and the third highest value. When the period peak exceeds the third highest value by more than the Tolerance, the data will be marked as anomalous. For example, a Tolerance set to 2 is interpreted as a 200% difference between the peak and the third highest value. To activate this validation, check the Enable Spike Detection box.
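The spike test for a single 24-hour window might look like the sketch below. Several details are assumptions made for illustration: the percentage difference is taken relative to the third highest value, the Threshold is compared against the window peak, and only the peak is reported. The function name `spike_in_window` is hypothetical.

```python
def spike_in_window(window, threshold, tolerance):
    """Return the index of the peak if it qualifies as a spike, else None."""
    if len(window) < 3:
        return None
    ranked = sorted(window, reverse=True)
    peak, third = ranked[0], ranked[2]
    # Assumption: the method is skipped when the peak is below the Threshold.
    if peak < threshold or third <= 0:
        return None
    # A Tolerance of 2 means the peak may exceed the third highest by 200%.
    if (peak - third) / third > tolerance:
        return window.index(peak)
    return None


flat_day = [40.0] * 23
spike_index = spike_in_window(flat_day + [130.0], threshold=40, tolerance=2)
no_spike = spike_in_window(flat_day + [120.0], threshold=40, tolerance=2)
```

In the first window the peak is 225% above the third highest value, so it is reported; in the second the difference is exactly 200%, which does not exceed the Tolerance.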
Enable Model Filter
The Enable Model Filter method validates data against a model result based on a variance method and validation range. To enable this method, check the Enable Model Filter box and define the following parameters.
Model. A model from the Models Module or the Five-Minute Models Module that is used to estimate the expected value and variance of the meter's data values.
Variance Method. This is the data grouping used to construct the tolerance bands. Groupings are defined below and include Constant, Day Type, Day of Week, and Extended Day Type.
Validation Range (In Standard Errors). The Validation Range parameters (Up and Down) determine the number of model standard errors to add and subtract from the model expected value.
Up. The number of model standard errors to add to the model expected value.
Down. The number of model standard errors to subtract from the model expected value.
Filter Load Control. This parameter allows the user to apply the algorithms to data marked as Load Control.
Filter Load Event. This parameter allows the user to apply the algorithms to data marked as Load Event.
The Model Filter uses the model predicted values and residual statistics to construct a validation interval for each observation. The filter logic is described assuming an hourly model is used to validate an hourly meter. The model's predicted value for an hour (h) is the starting point for constructing the validation interval for that hour. The validation boundaries are given by:
MaxVal(h) = Pred(h) + NUp x StdErr(h)
MinVal(h) = Pred(h) - NDown x StdErr(h)
In these expressions, Pred(h) is the predicted value from the model for hour h. NUp is the Validation Range setting that determines the number of standard errors allowed in the upward direction. Similarly, NDown is the number of standard errors allowed in the downward direction.
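The boundary formulas translate directly into code. The sketch below uses hypothetical function names and illustrative numbers; treating a value that lies exactly on a boundary as valid is an assumption.

```python
def validation_interval(pred, std_err, n_up, n_down):
    """Return (MinVal, MaxVal) for one hour."""
    min_val = pred - n_down * std_err
    max_val = pred + n_up * std_err
    return min_val, max_val


def is_anomalous(value, pred, std_err, n_up, n_down):
    """True when the observed value falls outside the validation interval."""
    min_val, max_val = validation_interval(pred, std_err, n_up, n_down)
    return value < min_val or value > max_val


# Illustrative numbers: Pred(h) = 100, StdErr(h) = 5, NUp = 3, NDown = 2.
interval = validation_interval(100.0, 5.0, n_up=3, n_down=2)
```

With these inputs the interval runs from 90 to 115, so an observed value of 116 would be flagged and a value of 95 would not.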
The standard error value for an hour, StdErr(h), is constructed from the validation model’s estimation residuals. The residuals are grouped, and the standard error for each group and hour is set equal to the standard deviation of the residual values in that group. The Variance Method defines the groups as listed below.
- Constant. In this method, a single standard error value is calculated for each hour, regardless of day type. The value for an hour is the standard deviation of all estimation residuals for that hour. For the model, these values can be viewed in the Models Module on the Estimation tab.
- Day of Week. In this method, residuals are grouped according to the day of the week (Monday, Tuesday… Saturday, and Sunday/Holiday). For each of the seven day of week categories, the estimation residuals for each hour are used to compute a standard deviation for that category and hour. These values are applied based on the day and hour being validated. These values are not available in the MetrixIDR user interface.
- Day Type. This method is similar to the Day of Week method, except that only two day of week groups are used. The first group consists of Saturdays, Sundays, and Holidays. The second group consists of the remaining days. The computed standard error values are applied based on the day-type and hour being validated. These values are not available in the MetrixIDR user interface.
- Extended Day Type. This method is similar to the Day Type method, except that separate statistics are computed for each month and additional day-types are used. For each month, five day-type groups are defined. These groups are (1) Monday, (2) Tuesday/Wednesday/Thursday, (3) Friday, (4) Saturday, and (5) Sunday/Holiday. The computed standard error values are applied based on the month, extended day-type, and hour being validated. These values are not available in the MetrixIDR user interface.
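The four groupings can be illustrated by mapping each residual to a group key and taking the standard deviation within each group. This is a sketch under assumptions: the names `group_key` and `std_errors` are hypothetical, holidays are supplied by the caller as a flag, and the population standard deviation (`statistics.pstdev`) is used because the text does not say whether a sample or population formula applies.

```python
from collections import defaultdict
from datetime import datetime
from statistics import pstdev


def group_key(method, ts, is_holiday=False):
    """Return the residual group for a timestamp under a Variance Method."""
    weekday = ts.weekday()  # Monday = 0 ... Sunday = 6
    if method == "Constant":
        return (ts.hour,)
    if method == "Day of Week":
        # Holidays share the Sunday category.
        return (6 if is_holiday else weekday, ts.hour)
    if method == "Day Type":
        weekend = is_holiday or weekday >= 5
        return ("Sat/Sun/Holiday" if weekend else "Other", ts.hour)
    if method == "Extended Day Type":
        if is_holiday or weekday == 6:
            day_type = "Sun/Holiday"
        elif weekday == 5:
            day_type = "Sat"
        elif weekday == 4:
            day_type = "Fri"
        elif weekday == 0:
            day_type = "Mon"
        else:
            day_type = "Tue-Thu"
        return (ts.month, day_type, ts.hour)
    raise ValueError(f"unknown variance method: {method}")


def std_errors(residuals, method):
    """residuals: iterable of (timestamp, residual, is_holiday) tuples."""
    groups = defaultdict(list)
    for ts, resid, is_holiday in residuals:
        groups[group_key(method, ts, is_holiday)].append(resid)
    # Assumption: population standard deviation of each group's residuals.
    return {key: pstdev(vals) for key, vals in groups.items()}


residuals = [
    (datetime(2024, 1, 2, 10), 1.0, False),   # Tuesday
    (datetime(2024, 1, 3, 10), -1.0, False),  # Wednesday
    (datetime(2024, 1, 6, 10), 3.0, False),   # Saturday
    (datetime(2024, 1, 7, 10), 5.0, False),   # Sunday
]
by_day_type = std_errors(residuals, "Day Type")
```

Under the Day Type method, the four residuals above fall into two groups for hour 10, one for the weekdays and one for the weekend.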
When the selected meter has a finer frequency than the validation model, the predicted values, Pred(h), are interpolated. For example, when an hourly model is used to validate 5-minute data, the standard error values for intervals within each hour are calculated as described above based on the Variance Method, but the hourly predicted values are interpolated. The interpolation method is halfway between linear and quadratic interpolation. As a result, the 5-minute predicted values depend on the predicted hourly values for the past hour, current hour, and next hour. The weights vary as the hour proceeds. For example, the weights for the first 5-minute value are about 40% on the past hour and 60% on the current hour (weights are .396, .666, -.062). For the middle values, the weights are close to 100% on the current hour (weights are .032, .978, -.001 for the 6th 5-minute interval in the hour). And for the last 5-minute value in the hour, the weights are close to 60% on the current hour and 40% on the next hour (weights are -.062, .666, .396).
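Applying the weights is a three-term weighted sum over the past, current, and next hourly predictions. The sketch below uses only the first-interval and last-interval weights quoted above; the function name `interpolate` and the hourly predictions of 100, 110, and 120 are illustrative assumptions.

```python
# Weights on (past hour, current hour, next hour), as quoted in the text.
FIRST_INTERVAL_WEIGHTS = (0.396, 0.666, -0.062)
LAST_INTERVAL_WEIGHTS = (-0.062, 0.666, 0.396)


def interpolate(weights, pred_past, pred_current, pred_next):
    """Weighted combination of three consecutive hourly predictions."""
    w_past, w_current, w_next = weights
    return w_past * pred_past + w_current * pred_current + w_next * pred_next


# Hypothetical hourly predictions rising from 100 to 120.
first_value = interpolate(FIRST_INTERVAL_WEIGHTS, 100.0, 110.0, 120.0)
last_value = interpolate(LAST_INTERVAL_WEIGHTS, 100.0, 110.0, 120.0)
```

For a rising load, the first 5-minute prediction lands below the hourly value of 110 (about 105.42) and the last lands above it (about 114.58), which smooths the transition between hours.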
In the case of a 5-minute model validating 5-minute data, the method includes a ramp rate component. In this case, the logic described above is applied to the data levels as well as to the first differences (ramp rates) of the data based on calculated standard errors from the first difference (ramp rate) models.
Enable Delta
The Enable Delta method validates data based on the difference (ramp rate) between two consecutive intervals. The input validation parameters apply to specific time ranges within each month. To enable this method, check the Enable Delta box and define the validation parameters using buckets. Buckets are managed using the following commands.
- Add Bucket. This button inserts a row (bucket) in the Enable Delta validation box. A bucket defines the validation period and delta setting.
- Remove Bucket. This button removes an existing bucket from the list.
Bucket characteristics define the month, time range, and validation rule. These characteristics must be entered and are defined below.
- Month. The month for the validation parameter.
- StartHour. The start hour for the validation parameter.
- EndHour. The end hour for the validation parameter.
- Delta. The allowable deviation from interval to interval. When a delta exceeds this numerical value within the month and hour range, the data are identified as anomalous.
- MaxBadIntervals. The maximum allowable number of consecutive anomalous intervals. When the number of anomalous intervals exceeds the MaxBadIntervals value, the data are not marked as anomalous.
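The bucket logic can be sketched as two passes: first flag intervals whose change from the previous interval exceeds the bucket's Delta, then clear any run of flags longer than MaxBadIntervals. This is an illustration under assumptions, not the product's algorithm: the `Bucket` class and `flag_delta` function are hypothetical, the delta is taken as an absolute difference, EndHour is treated as inclusive, the first matching bucket wins, and a run uses the MaxBadIntervals of the bucket at its first interval.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Bucket:
    month: int
    start_hour: int
    end_hour: int
    delta: float
    max_bad_intervals: int


def flag_delta(readings, buckets):
    """readings: time-ordered list of (timestamp, value)."""
    flags = [False] * len(readings)
    limits = [0] * len(readings)
    # Pass 1: flag intervals whose change exceeds the matching bucket's Delta.
    for i in range(1, len(readings)):
        ts, value = readings[i]
        previous = readings[i - 1][1]
        for b in buckets:
            if ts.month == b.month and b.start_hour <= ts.hour <= b.end_hour:
                if abs(value - previous) > b.delta:
                    flags[i] = True
                    limits[i] = b.max_bad_intervals
                break
    # Pass 2: clear runs of flags longer than MaxBadIntervals.
    i = 0
    while i < len(flags):
        if not flags[i]:
            i += 1
            continue
        j = i
        while j < len(flags) and flags[j]:
            j += 1
        if j - i > limits[i]:
            for k in range(i, j):
                flags[k] = False
        i = j
    return flags


bucket = Bucket(month=1, start_hour=0, end_hour=23, delta=10.0, max_bad_intervals=2)
values = [100, 102, 130, 128, 127, 160, 195, 230, 232]
readings = [(datetime(2024, 1, 1, h), v) for h, v in enumerate(values)]
delta_flags = flag_delta(readings, [bucket])
```

In this example the isolated jump to 130 stays flagged, while the three consecutive large steps up to 230 exceed MaxBadIntervals and are therefore left unmarked.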