About Collector and Archive Compression
Collector Compression
Collector compression applies a smoothing filter to data retrieved from the data source. By ignoring small changes in values that fall within a deadband centered around the last reported value, only significant changes are reported to the archiver. Fewer samples reported yields less work for the archiver and less archive storage space used.
You can specify the deadband value. For convenience, if you enter a deadband percentage, Historian Administrator shows the deadband in engineering units. For example, if you specify a 20% deadband on 0 to 500 EGU span, it is calculated and shown as 100 engineering units. If you later change the limits to 100 and 200, the 20% deadband is now calculated as 20 engineering units.
The deadband is centered around the last reported sample, not simply added to it or subtracted. If your intent is to have a deadband of 1 unit between reported samples, you must enter a compression deadband of 2 so that it is one to each side of the last reported sample. In the previous example of 0 to 500 EGU range, with a deadband of 20%, the deadband is 100 units; This means that only if the value changes by more than 50 units, it is reported.
Changes in data quality from good to bad, or bad to good, automatically exceed collector compression and are reported to the archiver. Any data that comes to the collector out of time order will also automatically exceed collector compression.
It is possible for collected tags with no compression to appear in Historian as if the collector or archive compression options are enabled. If collector compression occurs, you will notice an increase in the percentage of the compression value in the Collectors section of the System Statistics page in Historian Administrator. When archive compression occurs, you will notice the archive compression value and status bar change on the System Statistics page.
For instructions on setting collector compression, refer to Access/Modify a Tag.
- When a succession of bad data quality samples appears, Historian collects only the first sample in the series. No new samples are collected until the data quality changes. Historian does not collect the redundant bad data quality samples, and this is reflected in the collector compression percentage.
- For a Calculation or Server-to-Server collector, when calculations fail, producing no results or bad quality data, collector compression is used. The effect of Collector Compression Timeout is to behave, for one poll cycle, as if the collector compression feature is not being used. The sample collected from the data source is sent to the archiver. Then the compression is turned back on, as configured, for the next poll cycle with new samples being compared to the value sent to the archiver.
Handling Value Step Changes with Collector Data Compression
BigDiff=abs(HI_EGU-LO_EGU)*(CompressionDeadbandPercent/(100.0*2.0))*4.0
If ( Collector Compression is Enabled )
If ( Elapsed time since LastReportedValue>=( SampleInterval * 5 ) )
If ( abs(CurrentValue-LastReportedValue) > BigDiff )
Write LastReportedValue,Timestamp=(CurrentTime-SampleInterval)
In the
example above, if a new value was not reported for at least the last 4 sample
intervals, and the new input value is at least 4 deltas away from the old value
(where a single delta is equal to half of the compression deadband), then a marker
value is written. Value Spike with Collector Compression
Time | X Value |
---|---|
0:00:00 | 10.0 (steady state value) |
0:00:01 | 10.0 |
0:00:02 | 10.0 |
0:00:03 | 10.0 |
0:00:04 | 10.0 |
0:00:05 | 10.0 |
0:00:06 | 10.0 |
0:00:07 | 10.0 |
0:00:08 | 10.0 |
0:00:09 | 10.0 |
0:00:10 | 20.0 (new value after step change) |
Time | X Value |
---|---|
0:00:00 | 10.0 (steady state value) |
0:00:10 | 20.0 (new value after step change) |
Time | X Value |
---|---|
0:00:00 | 10.0 (steady state value) |
0:00:09 | 10.0 (inserted Marker value) |
0:00:10 | 20.0 (new value after step change) |
Evaluating and Controlling Data Compression
You can achieve optimum performance in Historian by carefully controlling the volume of dynamic data it collects and archives. You need enough information to tell you how the process is running, but you do not need to collect and store redundant or non-varying data values that provide no useful information.
Control Data Flow
You can control the amount of online or dynamic data the system handles at a given time by adjusting certain system parameters. The general principle is to control the flow of data into the archive either by adjusting the rate at which the collectors gather data or by adjusting the degree of filtering (compression) the system applies to the data collected.
- Reduce the polling rate by increasing the collection interval for unsolicited and polled collection.
- Enable collector compression and optionally use compression timeout.
- Set the compression deadband on the collectors to a wider value.
- Use the collector compression timeout.
- Enable archive (trend) compression.
- Set the archive compression deadband to a wider value.
- Where possible, use the scaled data type and enable input scaling on selected tags.
- Where possible, select milliseconds or microseconds rather than seconds for time resolution. Seconds is optimum for most common devices. This affects disk space.
Evaluate Data Compression Performance
You can determine how effectively data compression is functioning at any given time by examining the system statistics displayed on the System Statistics page of Historian Administrator.
The compression field at the top of the page shows the current effect of archive compression. Values for this parameter should typically range from 0 to 9%. If the value is zero, it indicates that compression is either ineffective or turned off. If it shows a value other than zero, it indicates that archive compression is operating and effective. The value itself indicates how well it is functioning. To increase the effect of data compression, increase the value of archive compression deadband so that compression becomes more active.
Archive Compression
Archive compression is used to reduce the number of samples stored when data values for a tag form a straight line in any direction. For a horizontal line (non-changing value), the behavior is similar to collector compression. But, in archive compression, it is not the values that are being compared to a deadband, but the slope of line those values produce when plotted value against time. Archive compression logic is executed in the data archiver and, therefore, can be applied to tags populated by methods other than collectors.
You can use archive compression on tags where data is being added to a tag by migration. Each time the archiver receives a new value for a tag, the archiver computes a line between this incoming data point and the last archived value.
The deadband is calculated as a tolerance centered about the slope of this line. The slope is tested to see if it falls within the deadband tolerance calculated for the previous point. If the new point does not exceed the tolerance, it is not stored in the archive. This process repeats with subsequent points. When an incoming value exceeds the tolerance, the value held by the archiver is written to disk and the incoming sample is withheld.
The effect of the archive compression timeout is that the incoming sample is automatically considered to have exceeded compression. The withheld sample is archived to disk and the incoming sample becomes the new withheld sample. If the Archive Compression value on the System Statistics page indicates that archive compression is occurring, and you did not enable archive compression for the tags, the reason could be because of internal statistics tags with archive compression enabled.
For instructions on setting archive compression, refer to Access/Modify a Tag.