About Proficy Historian Message Queue Object
In any server software, there are a number of queues. Most of the time, all the queues should ideally have 0 items, which implies that the server is keeping up with its workload. The read and write counters of the Overview object tell you how many read and write operations were performed. The queue counters, however, tell you how much work is still waiting to happen and whether a user had to wait for a response.
Measuring system performance through queues is an excellent way to determine whether the server has reached its steady-state performance limit. It can also tell you whether the usage comes in bursts and needs to be spread out more evenly over time.
When using the queues for measurement, you should think about what the “items” on each queue are. The “items” or “messages” here are read calls or write calls: one read call can name multiple tags, and one write call can carry multiple data samples (a small sketch after the list below makes this concrete).
- Write Queue: Data writes from collectors and non-collectors.
- Read Queue: Any data request that is not a write. This is not just data reads; it also includes tag browses.
- Msgs Queue: Anything other than the read queue and write queue. You can practically ignore this queue, as it accounts for only a tiny part of the activity and is not considered in this document.
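To make the item granularity concrete, here is a minimal sketch of what a read call and a write call look like as queue items. The class and field names (and the tag names) are illustrative only, not the Data Archiver's internal types.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Illustrative only: one queue "item" is a whole call, not a single tag or sample.
@dataclass
class ReadCall:
    tag_names: List[str]          # one read call can request many tags
    start_time: str
    end_time: str

@dataclass
class WriteCall:
    # (tag name, timestamp, value) triples; one write call can carry many samples
    samples: List[Tuple[str, str, float]] = field(default_factory=list)

write_queue = [
    WriteCall(samples=[("FIC101.PV", "2024-01-01 00:00:00", 42.0),
                       ("FIC101.PV", "2024-01-01 00:00:01", 42.5)]),
]
read_queue = [ReadCall(["FIC101.PV", "TIC200.PV"], "2024-01-01", "2024-01-02")]

# Count (Total) counts calls, not samples or tags.
print(len(write_queue), len(read_queue))   # 1 1
```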
You can get basic or very detailed information from the queue counters. At a basic level, if the queues are non-zero at a point in time, you are demanding too much work at that point in time. If your queues are always non-zero, then you are always demanding too much and have reached your performance limit (a sketch of this basic check follows the list below). For a more detailed picture, you can also examine:
- Last Read time vs Average Read Time
- Variability of the current queue counts
- Variability of the processed rate of read or write queue
- Number of samples per write
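As a minimal sketch of the basic check, the function below takes periodic Count (Total) readings, however you collected them, and reports how often the queue had a backlog. The interpretation thresholds are an assumption based on the description above, not a documented rule.

```python
from typing import Sequence

def queue_pressure(count_total_readings: Sequence[int]) -> str:
    """Classify queue behaviour from periodic Count (Total) samples.

    Interpretation (an assumption based on the text above):
      - never non-zero  -> the archiver is keeping up
      - sometimes       -> bursty load
      - always non-zero -> at or past the steady-state performance limit
    """
    if not count_total_readings:
        return "no data"
    nonzero = sum(1 for c in count_total_readings if c > 0)
    ratio = nonzero / len(count_total_readings)
    if ratio == 0.0:
        return "keeping up (queue always empty)"
    if ratio < 1.0:
        return f"bursty load ({ratio:.0%} of samples had a backlog)"
    return "always backlogged: likely at the performance limit"

# Example: one reading per minute for ten minutes
print(queue_pressure([0, 0, 3, 12, 0, 0, 0, 5, 0, 0]))
```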
Basic Queue Counters
These counters represent concepts that apply to queue usage in any server software. There is one set of these counters for the read queue and another set for the write queue.
Counter Name | Description |
---|---|
Count (Max) | The highest number reached by the Count (Total). |
Count (Total) | Number of messages currently on the queue. |
Processed Count | Number of messages processed from the queue since Data Archiver startup. This number will wrap around and reset to zero if the Data Archiver runs for a long time. |
Processed Rate (msgs/min) | Number of messages processed from the queue in the last minute. |
Processing Time (Ave) | Average time (in milliseconds) to process a message, averaged since the Data Archiver startup. |
Processing Time (Last) | Time (in milliseconds) to process the most recently processed message. |
Processing Time (Max) | Highest number the Processing Time (Last) reached since the Data Archiver startup. |
Recv Count (msgs) | Number of messages received into the queue since the Data Archiver startup. |
Recv Rate (msgs/min) | Rate at which messages were received into the queue over the last minute. |
If your Processed Rate (msgs/min) keeps pace with your Recv Rate (msgs/min), then your Count (Total) will be zero because the Data Archiver is keeping up with the incoming requests.
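That relationship is simple queue arithmetic. The sketch below steps a queue length forward one interval at a time from received and processed message counts; the numbers are illustrative, not measured data.

```python
def step_queue(count_total: int, recv: int, processed: int) -> int:
    """Advance Count (Total) by one interval: arrivals minus completions, never negative."""
    return max(0, count_total + recv - processed)

count = 0
# (Recv Count, Processed Count) per minute; in minute 3 processing stalls.
per_minute = [(600, 600), (600, 600), (600, 0), (600, 0), (600, 1800)]
for recv, processed in per_minute:
    count = step_queue(count, recv, processed)
    print(count)   # 0, 0, 600, 1200, 0
```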
The current values of these counters are displayed at all times in the Performance Monitor report view. You can also log these counters to a Performance Monitor group file so that the logged times can be matched up with periods of slow performance.
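If you do log the counters, a minimal sketch like the one below can pull out the times when a queue had a backlog. It assumes a CSV log such as Performance Monitor or typeperf produces; the counter path in COLUMN is hypothetical and must be matched to the actual column header in your own log file.

```python
import csv

# Hypothetical counter path; replace with the column header from your own log.
COLUMN = r"\\MYHISTORIAN\Historian Message Queues(Write Queue)\Count (Total)"

def backlogged_times(csv_path: str, column: str = COLUMN):
    """Return the timestamps at which the logged Count (Total) was non-zero."""
    times = []
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            raw = row.get(column, "").strip()
            if raw and float(raw) > 0:
                # The first column of a Performance Monitor CSV log is the sample time.
                times.append(next(iter(row.values())))
    return times

# print(backlogged_times("historian_queues.csv"))
```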
Detailed Queue Counters
These counters require a detailed understanding of how the queues are used.
There is no single read or write queue in memory. Each is a virtual queue that is the sum total of all the client queues. Each connection from a client uses a socket, and each socket is monitored by a thread called a client thread. Between each client thread and the pool of threads that access the IHA files there is a queue, which can be called a client queue. No client thread goes directly to the IHA files. A fixed number of threads monitor all the client queues and read and write the IHA files.
A default system has one write thread and four read threads. If you have 20 collectors and 35 clients connected to the Data Archiver, that is 20 + 35 = 55 client threads, and therefore 55 x 2 = 110 client queues, because each client thread has one read queue and one write queue.
The four read threads will monitor the 55 client read queues, most of which are empty most of the time. The one write thread monitors the 55 client write queues.
The Count (Total) on the Read Queue instance or the Write Queue instance is the sum total of all the items on the corresponding 55 client queues (for example, all 55 client read queues for the Read Queue instance).
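Here is a toy model of that layout using the illustrative numbers from the example above (20 collectors + 35 clients). The variable names are not Data Archiver internals; the point is that the reported Count (Total) is just a sum across per-client queues.

```python
# Toy model of the virtual queues described above (names are illustrative).
collectors, clients = 20, 35
client_threads = collectors + clients            # 55 client threads
client_queues = client_threads * 2               # 110 queues: one read + one write each

# Pending message counts per client queue; most are empty most of the time.
read_queues  = [0] * client_threads
write_queues = [0] * client_threads
read_queues[3] = 2     # one client with two pending reads
write_queues[7] = 40   # one collector with a burst of pending writes

# What the Read Queue / Write Queue instances report as Count (Total):
read_count_total  = sum(read_queues)
write_count_total = sum(write_queues)
print(client_threads, client_queues, read_count_total, write_count_total)  # 55 110 2 40
```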
Counter Name | Description |
---|---|
Threads | Number of configured threads that go to the IHAs. This number will not change at runtime. It defaults to one write thread and four read threads. |
Threads Working | Number of configured queue processing worker threads that are currently processing a message. If there is not much work to do, there will be idle threads, and this number will be much less than the Threads counter, possibly zero. |
Time In Queue (Ave) | Average of the Time In Queue (Last), averaged since the Data Archiver startup. |
Time In Queue (Last) | Time (in milliseconds) that the most recent message waited in the queue before a thread started processing it. This should be near zero, meaning the archiver is keeping up with the reads and writes. |
Time In Queue (Max) | Highest value the Time In Queue (Last) has reached since the Data Archiver startup. |
ClientQueues with Msgs | The number of client queues with messages on them. In the previous example, this is how many of the 55 client read queues have at least one item on them. It does not matter how many items are on a client queue, only that it has at least one. The number would be between 0 and 55. This counter gives some idea of how balanced the incoming load is and how evenly the clients are being serviced. You don’t want any single client doing so many reads or writes that other clients have to wait. |
The time to process one read or one write would be the Time In Queue (Last) + the Processing Time (Last). These are not visible per call, however, because they are overall system-wide counters, so they are not a way to troubleshoot one read or one client. The Time In Queue (Last) increases when Threads Working equals Threads, meaning all threads are busy.
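As a worked example of that sum, with made-up values:

```python
# Hypothetical values for one call, in milliseconds.
time_in_queue_last   = 120.0   # waited because all worker threads were busy
processing_time_last = 35.0    # time a worker spent on the call itself

total_response_time_ms = time_in_queue_last + processing_time_last
print(total_response_time_ms)  # 155.0 ms for that one call
```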
Example: Comparing current to average processing time
Every system is different and has its own “normal” data rate. You can measure whether your current rate is above or below normal. To determine whether the Recv Rate or Processed Rate is above or below normal, you must look at the number over a longer period of time, perhaps 1 hour or 24 hours.
To determine whether the processing time is taking longer than normal, you can trend the Processing Time (Last) against the Processing Time (Ave) over the same time range. One line will be above the other, showing whether the current value is above or below normal.
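A minimal sketch of that comparison, assuming you have exported the two counters as aligned time series (the sample data is illustrative):

```python
def above_normal_fraction(last_ms, ave_ms):
    """Fraction of samples where Processing Time (Last) exceeds Processing Time (Ave)."""
    assert len(last_ms) == len(ave_ms)
    above = sum(1 for last, ave in zip(last_ms, ave_ms) if last > ave)
    return above / len(last_ms)

# Illustrative one-hour trend sampled every 10 minutes (milliseconds).
processing_time_last = [12, 15, 48, 52, 14, 13]
processing_time_ave  = [16, 16, 16, 17, 17, 17]
print(f"{above_normal_fraction(processing_time_last, processing_time_ave):.0%} of samples above normal")
# -> 33% of samples above normal
```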
Example: Measuring the variability of Queue Count Total
This example demonstrates that the Count (Total) can change. The number rises and falls based on the balance between the Recv Count and the Processed Count.
The Write Queue Recv Rate is usually consistent, but you may see it increase during a Store and Forward flush from a collector. The Write Queue Processed Count will vary more, and that will cause the Write Queue Count (Total) to vary as well. Consider an archive backup done at midnight each day. During a backup, the writes have to stop. The Write Queue Recv Rate will stay the same because collectors are still writing, but nothing is processed from the write queue during the backup, so the Write Queue Count (Total) will grow.
The same happens if there are long reads in progress. If reads are being processed, the writes have to wait and the Write Queue Count (Total) will grow. The Overview object Read Raw Rate should be busy, however, indicating that the Data Archiver is busy doing work, just not writes.
If the writes are out of time order, the exact same number and bundle size of raw samples can take longer to write. The exact same number of raw samples can take longer to read if there are cache misses and the Data Archiver does file I/O.
Reads are unlike writes because collectors will keep sending writes, even if they don’t get responses. A client that does a read will wait for the response before sending the next read. The reads will not queue up in the Data Archiver. In general, the Read Queue Count (Total) will not grow as high as the Write Queue Count (Total) unless you have many read clients.
You can measure how much your Read and Write Queue Count (Total) vary over a 24 hour period, and understand that Count (Total) variability is caused by the variability of the Recv Rate and Processed Rate. The variability of those is caused by the variability of the sizes of the reads and writes combined with whatever else is happening on the machine.
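One way to quantify that variability is sketched below. The readings are made up (the spike mimics writes backing up during a nightly archive backup), and the statistics are computed with Python's standard library.

```python
import statistics

def variability(count_total_readings):
    """Mean, standard deviation, and peak of periodic Count (Total) readings."""
    mean = statistics.mean(count_total_readings)
    stdev = statistics.pstdev(count_total_readings)
    return mean, stdev, max(count_total_readings)

# Illustrative Write Queue Count (Total), one reading per hour over 24 hours.
readings = [0, 0, 0, 0, 0, 2, 0, 0, 1, 0, 0, 0,
            0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 350, 900]
mean, stdev, peak = variability(readings)
print(f"mean={mean:.1f} stdev={stdev:.1f} peak={peak}")
```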
Example: Computing the number of samples per write
The Overview object has a Read Calls counter but does not have a Write Calls counter, so you cannot directly read the number of write calls or compute the number of samples per write call from it. However, since each message received into the Write Queue is one write call, you can use the Write Queue Recv Count instead.
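The arithmetic is a simple ratio over a chosen interval, as sketched below. The source of the total sample count is left open here (use whichever Overview counter reports your write sample throughput); the numbers are illustrative.

```python
def samples_per_write(samples_written: int, write_msgs_received: int) -> float:
    """Average number of data samples carried by each write call.

    samples_written      - total samples written over an interval (from whichever
                           Overview counter you use for sample throughput)
    write_msgs_received  - increase in Write Queue Recv Count over the same interval
                           (one received message == one write call)
    """
    if write_msgs_received == 0:
        return 0.0
    return samples_written / write_msgs_received

# Illustrative numbers over one minute: 60,000 samples arrived in 400 write calls.
print(samples_per_write(60_000, 400))   # 150.0 samples per write
```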