The goals of performance monitoring that I always keep in mind are:
- Health indicator: it shows whether your application farm is currently in good shape, and lets you predict the trend;
- Bottleneck detection: it points out the bottleneck of your system from a high level, which gives engineers a good starting point for fixing the issue;
- Troubleshooting: the counters point us in the right direction from different angles, so we find the root cause instead of making a "smart guess";
- Operations management: capacity planning and risk management, so that performance or stability issues are resolved before they occur.
| Counters for OS | Explanation |
| Server Uptime | the elapsed time since the server last started |
| Processor--Total CPU% | the percentage of elapsed time that the processor spends executing non-idle threads. |
| Processor--% User Time | the percentage of elapsed time that the processor spends in user mode. |
| System--Processor Queue Length | the number of threads in the processor queue. There is a single queue for processor time even on computers with multiple processors. Therefore, if a computer has multiple processors, you need to divide this value by the number of processors servicing the workload. A sustained processor queue of fewer than 10 threads per processor is normally acceptable, depending on the workload. |
| Memory--% MEM in use | the ratio of Memory\Committed Bytes to Memory\Commit Limit. |
| Memory--Pages/sec | the rate at which pages are read from or written to disk to resolve hard page faults. |
| Disk Space--DISK C: | the percentage of used space on disk C |
| Disk Space--DISK D: | the percentage of used space on disk D |
| Disk Space--DISK E: | the percentage of used space on disk E |
| Disk IO--Avg. Disk Queue Length | the average number of read and write requests that were queued for the selected disk during the sample interval. |
| Disk IO--% Disk Time | the percentage of elapsed time that the selected disk drive was busy servicing read or write requests. |
| Disk IO--Avg. Disk sec/Read | the average time, in seconds, of a read of data from the disk. |
| Disk IO--Avg. Disk sec/Write | the average time, in seconds, of a write of data to the disk. |
| Disk IO--Disk Reads/sec | the rate of read operations on the disk. |
| Disk IO--Disk Writes/sec | the rate of write operations on the disk. |
| Network IO--Packets Sent/sec | the rate at which packets are sent on the network interface. |
| Network IO--Packets Received/sec | the rate at which packets are received on the network interface. |
| TCP--TCP Connections EST | the number of TCP connections whose current state is either ESTABLISHED or CLOSE-WAIT |
| Total Process cnt | the total number of processes on the server |
| | |
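The per-processor rule for Processor Queue Length and the ratio behind % MEM in use can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not a collector: the function names and sample values are hypothetical, and the limit of 10 threads per processor is simply the rule of thumb quoted in the table.

```python
def queue_length_per_processor(queue_length, processor_count):
    """Divide the single system-wide queue by the number of processors."""
    if processor_count <= 0:
        raise ValueError("processor_count must be positive")
    return queue_length / processor_count


def queue_is_acceptable(queue_length, processor_count, limit_per_processor=10):
    """Apply the rule of thumb: fewer than 10 queued threads per processor."""
    per_processor = queue_length_per_processor(queue_length, processor_count)
    return per_processor < limit_per_processor


def committed_memory_percent(committed_bytes, commit_limit_bytes):
    """Ratio of Committed Bytes to Commit Limit, expressed as a percentage."""
    if commit_limit_bytes <= 0:
        raise ValueError("commit_limit_bytes must be positive")
    return 100.0 * committed_bytes / commit_limit_bytes
```

For example, a queue length of 24 on an 8-processor server is 3 threads per processor, which is well inside the acceptable range, while a queue length of 100 on 4 processors is not.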
| Counters for a specific process instance | Explanation |
| Apache HTTP Server -- Apache process | |
| Process Uptime | the elapsed time since the Apache process last started |
| Process CPU% | the percentage of elapsed time that the processor spends on the Apache process. |
| Process Memory in use | the memory used by the Apache process |
| Busy workers cnt | the number of worker threads that are currently serving requests |
| Idle workers cnt | the number of worker threads that are not serving any request |
| requests/sec | the request throughput of Apache HTTP Server |
| KB/sec | the data throughput of Apache HTTP Server |
| | |
| Application Server -- Java process | |
| Process Uptime | the elapsed time since the Java process last started |
| Process CPU% | the percentage of elapsed time that the processor spends on the Java process. |
| Private Bytes | the current size, in bytes, of memory that this process has allocated that cannot be shared with other processes. |
| Working Set | the current size, in bytes, of the working set of this process. The working set is the set of memory pages touched recently by the threads in the process |
| Used Heap Size | the amount of JVM heap that is currently in use |
| Live Threads cnt | the number of threads currently active in this process |
| Accumulated GC Time | the total time spent on GC activity |
| | |
| Async Server -- JMS MQ process | |
| Process Uptime | the elapsed time since the JMS process last started |
| Process CPU% | the percentage of elapsed time that the processor spends on the JMS process. |
| Private Bytes | the current size, in bytes, of memory that this process has allocated that cannot be shared with other processes. |
| Working Set | the current size, in bytes, of the working set of this process. The working set is the set of memory pages touched recently by the threads in the process |
| Message Queue Depth | the current number of messages that are waiting on the queue |
| DLQ cnt | the current number of messages in the Dead Letter Queue |
| Live Threads cnt | the number of threads currently active in this process |
| DB Connections cnt | the current number of open database connections used by this process |
| | |
| DB Server -- SQL Server process | |
| Process Uptime | the elapsed time since the SQL Server process last started |
| Process CPU% | the percentage of elapsed time that the processor spends on the SQL Server process. |
| Process Memory in use | the memory used by the SQL Server process |
| Free Space in Temp DB | the free space in tempdb, in kilobytes |
| User Connections | the current number of user connections to the server |
| Buffer Cache Hit Ratio | how often SQL Server finds data in the buffer cache instead of reading it from the hard disk. It should stay above 90% |
| Full Scans/sec | the number of unrestricted full scans per second. These can be either base table scans or full index scans. |
| Transactions/sec | the number of transactions started for the database per second |
| Average Wait Time | the average wait time, in milliseconds, for each lock request that resulted in a wait |
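Several of the per-process counters are easier to read once they are turned into ratios. The sketch below shows one way to do that in Python. The function names and sample values are hypothetical, worker utilization and GC overhead are derived values that are not counters in the table themselves, and the 90% floor is the guideline given for Buffer Cache Hit Ratio above.

```python
def worker_utilization_percent(busy_workers, idle_workers):
    """Share of Apache workers that are busy, from the busy and idle counts."""
    total = busy_workers + idle_workers
    if total == 0:
        return 0.0
    return 100.0 * busy_workers / total


def gc_overhead_percent(accumulated_gc_ms, process_uptime_ms):
    """Share of the Java process uptime that was spent in GC activity."""
    if process_uptime_ms <= 0:
        raise ValueError("process_uptime_ms must be positive")
    return 100.0 * accumulated_gc_ms / process_uptime_ms


def buffer_cache_hit_ok(hit_ratio_percent, minimum_percent=90.0):
    """Check Buffer Cache Hit Ratio against the 'larger than 90%' guideline."""
    return hit_ratio_percent > minimum_percent
```

With 15 busy and 5 idle workers the utilization is 75%, and 3 seconds of accumulated GC time over 60 seconds of uptime is a GC overhead of 5%.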
The more counters you add and the finer the granularity you choose, the more overhead monitoring brings to your system, so you have to make trade-offs when selecting counters and the sample interval. There is no silver bullet after all...
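To get a rough feel for this trade-off, you can estimate how many samples a given counter set produces per day. This is a back-of-the-envelope sketch in Python: the function name and the numbers are hypothetical, and it counts samples only, so it says nothing about the actual CPU or I/O cost of collecting each one.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def samples_per_day(counter_count, sample_interval_seconds):
    """Number of samples one server produces per day for a set of counters."""
    if sample_interval_seconds <= 0:
        raise ValueError("sample_interval_seconds must be positive")
    samples_per_counter = SECONDS_PER_DAY // sample_interval_seconds
    return counter_count * samples_per_counter
```

With 60 counters, moving from a 15-second to a 60-second sample interval cuts the volume from 345,600 to 86,400 samples per server per day.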