Notes on Collector and Archive Compression
This section describes the behavior of collector and archive compression. Understanding these two Historian features will help you apply them appropriately to reduce the storage of unnecessary data. Smaller archives are easier to maintain and allow you to keep a greater time span of historical data online.
Collector Compression:
Collector compression applies a smoothing filter, inside the collector, to data retrieved from the data source. Small changes in value that fall within a deadband centered on the last reported value are ignored, so only significant changes are reported to the archiver. Fewer reported samples means less work for the archiver and less archive storage space used.
You define what counts as a significant change by setting the collector compression deadband value. For convenience, Historian Administrator calculates and shows the deadband in engineering units if you enter a deadband percentage. If you later change the high and low EGU limits, the deadband remains a percentage, but of the new limits. For example, a 20% deadband on a 0 to 500 EGU span is 100 engineering units; if you then change the limits to 100 and 200, the same 20% deadband becomes 20 engineering units.
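The percentage-to-engineering-units relationship described above can be sketched as a small calculation (a hypothetical helper for illustration, not a Historian API):

```python
def deadband_in_egu(deadband_pct: float, egu_lo: float, egu_hi: float) -> float:
    # The deadband is always a percentage of the *current* EGU span,
    # so changing the limits changes the deadband in engineering units.
    return (egu_hi - egu_lo) * deadband_pct / 100.0

print(deadband_in_egu(20, 0, 500))    # 100.0 EGU on a 0..500 span
print(deadband_in_egu(20, 100, 200))  # 20.0 EGU after the limits change
```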
The deadband is centered on the last reported sample, not simply added to or subtracted from it. If your intent is to have a deadband of 1 unit between reported samples, set a compression deadband of 2 so that 1 unit lies on each side of the last reported sample. For example, on a 0 to 500 EGU range with a 20% deadband, the deadband is 100 units, and a value must change by more than 50 units from the last reported value to be reported. Changes in data quality from good to bad, or bad to good, automatically exceed collector compression and are reported to the archiver. Any data that arrives at the collector out of time order also automatically exceeds collector compression.
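The centered-deadband behavior, including the rule that quality changes always exceed compression, can be sketched as follows. This is an illustrative simplification under assumed names, not Historian's actual implementation:

```python
def make_collector_filter(deadband_pct: float, egu_lo: float, egu_hi: float):
    """Return a function deciding whether an incoming sample is reported.

    The full deadband is a percentage of the EGU span; half of it lies
    on each side of the last *reported* value (centered deadband).
    """
    half = (egu_hi - egu_lo) * deadband_pct / 100.0 / 2.0
    state = {"value": None, "good": None}

    def report(value: float, good: bool = True) -> bool:
        first = state["value"] is None
        # A good-to-bad or bad-to-good quality change always exceeds compression.
        quality_change = (not first) and good != state["good"]
        significant = first or quality_change or abs(value - state["value"]) > half
        if significant:
            state["value"], state["good"] = value, good
        return significant

    return report

# 0..500 EGU span, 20% deadband -> 100 EGU total, i.e. 50 on each side.
report = make_collector_filter(20, 0, 500)
print(report(250))              # True: first sample is always reported
print(report(290))              # False: changed by 40 (< 50), filtered out
print(report(301))              # True: changed by 51 from 250 (> 50)
print(report(301, good=False))  # True: quality change exceeds compression
```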
It is possible for collected tags with no compression to appear in Historian as if the collector or archive compression options were enabled. If collector compression occurs, the Compression value rises above 0% in the Collectors panel of the System Statistics page in Historian Administrator. When archive compression occurs, the Archive Compression value and status bar change on the System Statistics page.
For all collectors except the File collector, you may observe collector compression occurring for your collected data (even though it is not enabled) if bad quality data samples appear in succession. When a succession of bad quality samples appears, Historian collects only the first sample in the series; no new samples are collected until the data quality changes. Historian does not collect the redundant bad quality samples, and this is reflected in the Collector Compression percentage statistic.
For a Calculation or Server-to-Server collector, you may observe collector compression (even though it is not enabled) when calculations fail, producing no results or bad quality data. The Collector Compression Timeout causes the collector to behave, for one poll cycle, as if collector compression were not being used: the sample collected from the data source is sent to the archiver. Compression is then turned back on, as configured, for the next poll cycle, with new samples being compared to the value sent to the archiver.
Archive Compression:
Archive compression can be used to reduce the number of samples stored when data values for a tag form a straight line in any direction. For a horizontal line (a non-changing value), the behavior is similar to collector compression. In archive compression, however, it is not the values themselves that are compared to a deadband, but the slope of the line those values produce when plotted as value against time. Archive compression logic is executed in the data archiver and can therefore be applied to tags populated by methods other than collectors.
Archive compression can be used on tags whose data is added by migration, by the File collector, or by an SDK program, for instance. Each time the archiver receives a new value for a tag, it computes a line between the incoming data point and the last archived value.
The deadband is calculated as a tolerance centered on the slope of this line. The slope is tested to see whether it falls within the deadband tolerance calculated for the previous point. If the new point does not exceed the tolerance, it is held by the archiver rather than being archived to disk. This process repeats with subsequent points. When an incoming value exceeds the tolerance, the value held by the archiver is written to disk and the incoming sample becomes the new held sample.
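The hold-and-flush cycle described above can be sketched as follows. This is an assumed simplification of slope-based compression, with illustrative names and a single tolerance band centered on the previously computed slope; Historian's internal algorithm is not documented here:

```python
def make_archive_compressor(slope_tolerance: float):
    """Return a function add(t, v) that yields the sample to archive,
    or None while the incoming sample is merely held by the archiver."""
    state = {"archived": None, "held": None, "slope": None}
    half = slope_tolerance / 2.0  # centered: half the band on each side

    def add(t: float, v: float):
        if state["archived"] is None:
            state["archived"] = (t, v)   # the first point is always archived
            return (t, v)
        t0, v0 = state["archived"]
        slope = (v - v0) / (t - t0)      # line from the last archived value
        if state["held"] is None or abs(slope - state["slope"]) <= half:
            # Within tolerance: hold this sample instead of writing it.
            state["held"], state["slope"] = (t, v), slope
            return None
        # Tolerance exceeded: flush the held sample to disk,
        # then hold the incoming sample against the new archived point.
        out = state["held"]
        state["archived"] = out
        t0, v0 = out
        state["held"] = (t, v)
        state["slope"] = (v - v0) / (t - t0)
        return out

    return add

comp = make_archive_compressor(0.2)
print(comp(0, 0.0))   # (0, 0.0): first point archived
print(comp(1, 1.0))   # None: held, slope 1.0
print(comp(2, 2.0))   # None: still on the same line, held
print(comp(3, 10.0))  # (2, 2.0): slope changed, held sample is flushed
```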
The effect of the archive compression timeout is that the incoming sample is automatically considered to have exceeded compression: the held sample is archived to disk and the incoming sample becomes the new held sample. If the Archive Compression value on the System Statistics page indicates that archive compression is occurring, and you did not enable archive compression for your tags, the cause may be internal statistics tags that have archive compression enabled.