TCollector
tcollector is a client-side process that gathers data from local collectors and pushes the data to OpenTSDB. You run it on all your hosts, and it does the work of sending each host's data to the TSD.
OpenTSDB is designed to make it easy to collect and write data to it. It has a simple protocol, simple enough for even a shell script to start sending data. However, to do so reliably and consistently is a bit harder. What do you do when your TSD server is down? How do you make sure your collectors stay running? This is where tcollector comes in.
Tcollector does several things for you:
- Runs all of your data collectors and gathers their data
- Does all of the connection management work of sending data to the TSD
- You don't have to embed all of this code in every collector you write
- Does de-duplication of repeated values
- Handles all of the wire protocol work for you, as well as future enhancements
Deduplication
Typically you want to gather data about everything in your system. This generates a lot of datapoints, the majority of which don't change very often over time (if ever). However, you want fine-grained resolution when they do change. Tcollector remembers the last value and timestamp that was sent for all of the time series for all of the collectors it manages. If the value doesn't change between sample intervals, it suppresses sending that datapoint. Once the value does change (or 10 minutes have passed), it sends the last suppressed value and timestamp, plus the current value and timestamp. In this way all of your graphs and such are correct. Deduplication typically reduces the number of datapoints TSD needs to collect by a large fraction. This reduces network load and storage in the backend. A future OpenTSDB release however will improve on the storage format by using RLE (among other things), making it essentially free to store repeated values.
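A rough sketch of that suppression logic, assuming a send(metric, timestamp, value, tags) callback (the class and names here are illustrative, not tcollector's actual code):

class Deduper(object):
    def __init__(self, send):
        self.send = send        # callable(metric, ts, value, tags)
        self.last_sent = {}     # series key -> (ts, value) of the last point actually sent
        self.suppressed = {}    # series key -> (ts, value) of the newest suppressed point

    def put(self, metric, ts, value, tags=""):
        key = (metric, tags)
        sent = self.last_sent.get(key)
        if sent and sent[1] == value and ts - sent[0] < 600:
            # Value unchanged and a point was sent less than 10 minutes ago:
            # remember this point but don't send it.
            self.suppressed[key] = (ts, value)
            return
        # Value changed or 10 minutes elapsed: first flush the last suppressed
        # point (so the graph stays flat up to the change), then send this one.
        if key in self.suppressed:
            old_ts, old_value = self.suppressed.pop(key)
            self.send(metric, old_ts, old_value, tags)
        self.send(metric, ts, value, tags)
        self.last_sent[key] = (ts, value)

The key point is that the last suppressed point is flushed just before the changed value, so a long flat stretch is still drawn correctly even though most of its points were never sent.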
Collecting lots of metrics with tcollector
Collectors in tcollector can be written in any language. They just need to be executable and output the data to stdout. Tcollector will handle the rest. The collectors are placed in the collectors directory. Tcollector iterates over every directory named with a number in that directory and runs all the collectors in each directory. If you name the directory 60, then tcollector will try to run every collector in that directory every 60 seconds. Use the directory 0 for any collectors that are long-lived and run continuously. Tcollector will read their output and respawn them if they die. Generally you want to write long-lived collectors since that has less overhead. OpenTSDB is designed to have lots of datapoints for each metric (for most metrics we send datapoints every 15 seconds).
If there are any non-numeric named directories in the collectors directory, they are ignored. We've included a lib and an etc directory for library and config data used by all collectors.
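For illustration, a hypothetical collector dropped into collectors/60 could be as small as the following; the example.loadavg.* metric names are made up for this sketch, and tcollector would simply re-run the script every 60 seconds and forward whatever it prints:

#!/usr/bin/env python
# Hypothetical collectors/60/example_loadavg.py: prints one batch of datapoints
# in the "metric timestamp value [tag=value ...]" format and exits.
import sys
import time

def main():
    with open("/proc/loadavg") as f:
        one, five, fifteen = f.read().split()[:3]
    now = int(time.time())
    print("example.loadavg.1min %d %s" % (now, one))
    print("example.loadavg.5min %d %s" % (now, five))
    print("example.loadavg.15min %d %s" % (now, fifteen))
    sys.stdout.flush()

if __name__ == "__main__":
    main()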
Installation of tcollector
You need to clone tcollector from GitHub:
git clone git://github.com/OpenTSDB/tcollector.git
and edit the 'tcollector/startstop' script to set the following variable: TSD_HOST=dns.name.of.tsd
To avoid having to run mkmetric for every metric that tcollector tracks, you can start TSD with the --auto-metric flag. This is useful to get started quickly, but it's not recommended to keep this flag on in the long term, to avoid accidental metric creation.
Collectors bundled with tcollector
The following are the collectors we've included as part of the base package, together with all of the metric names they report on and what they mean. If you have any others you'd like to contribute, we'd love to hear about them so we can reference them or include them with your permission in a future release.
General collectors
0/dfstat.py
These stats are similar to the ones provided by the /usr/bin/df utility.
- df.bytes.total
total size of data
- df.bytes.used
bytes used
- df.bytes.free
bytes free
- df.inodes.total
total number of inodes
- df.inodes.used
number of inodes used
- df.inodes.free
number of inodes free
These metrics include time series tagged with each mount point and the filesystem's fstype. This collector filters out any cgroup, debugfs, devtmpfs, rpc_pipefs, and rootfs filesystems, as well as any mountpoints mounted under /dev/, /sys/, /proc/, and /lib/.
With these tags you can select to graph just a specific filesystem, or all filesystems with a particular fstype (e.g. ext3).
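For reference, the df.bytes.* values above can be derived from statvfs(); a simplified sketch (this is not the bundled dfstat.py, and it omits the mount discovery and filesystem filtering described above):

import os
import time

def df_bytes(mount, fstype):
    st = os.statvfs(mount)
    now = int(time.time())
    total = st.f_blocks * st.f_frsize
    free = st.f_bfree * st.f_frsize
    tags = "mount=%s fstype=%s" % (mount, fstype)
    print("df.bytes.total %d %d %s" % (now, total, tags))
    print("df.bytes.used %d %d %s" % (now, total - free, tags))
    print("df.bytes.free %d %d %s" % (now, free, tags))

df_bytes("/", "ext3")   # mount point and fstype would normally come from /proc/mounts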
Sample output:
[root@etch171 mars171 0]# ./dfstat.py
df.bytes.total 1413306095 4159016960 mount=/ fstype=ext3
df.bytes.used 1413306095 3396472832 mount=/ fstype=ext3
df.bytes.percentused 1413306095 81.6652796722 mount=/ fstype=ext3
df.bytes.free 1413306095 762544128 mount=/ fstype=ext3
df.inodes.total 1413306095 1048576 mount=/ fstype=ext3
df.inodes.used 1413306095 74363 mount=/ fstype=ext3
df.inodes.percentused 1413306095 7.09180831909 mount=/ fstype=ext3
df.inodes.free 1413306095 974213 mount=/ fstype=ext3
df.bytes.total 1413306095 241564782592 mount=/data1 fstype=ext3
df.bytes.used 1413306095 202218672128 mount=/data1 fstype=ext3
df.bytes.percentused 1413306095 83.7119839896 mount=/data1 fstype=ext3
df.bytes.free 1413306095 39346110464 mount=/data1 fstype=ext3
df.inodes.total 1413306095 60882944 mount=/data1 fstype=ext3
df.inodes.used 1413306095 645826 mount=/data1 fstype=ext3
df.inodes.percentused 1413306095 1.06076670668 mount=/data1 fstype=ext3
df.inodes.free 1413306095 60237118 mount=/data1 fstype=ext3
......
0/ifstat.py
These stats are from /proc/net/dev.
- proc.net.bytes
(rate) Bytes in/out
- proc.net.packets
(rate) Packets in/out
- proc.net.errs
(rate) Packet errors in/out
- proc.net.dropped
(rate) Dropped packets in/out
These are interface counters, tagged with the interface, iface=, and direction= in or out. Only ethN interfaces are tracked. We intentionally exclude bondN interfaces, because bonded interfaces still keep counters on their child ethN interfaces and we don't want to double-count a box's network traffic if you don't select on iface=.
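A rough sketch of how such a collector can read /proc/net/dev and emit the per-interface, per-direction counters above (field offsets assume the standard two-header-line /proc/net/dev layout; the bundled ifstat.py handles more fields and loops continuously):

import time

def dump_net_dev():
    now = int(time.time())
    with open("/proc/net/dev") as f:
        lines = f.readlines()[2:]                 # skip the two header lines
    for line in lines:
        iface, counters = line.split(":", 1)
        iface = iface.strip()
        if not iface.startswith("eth"):           # only ethN, as described above
            continue
        fields = counters.split()
        rx = fields[0:4]                          # bytes, packets, errs, drop (receive)
        tx = fields[8:12]                         # bytes, packets, errs, drop (transmit)
        for name, rx_val, tx_val in zip(("bytes", "packets", "errs", "dropped"), rx, tx):
            print("proc.net.%s %d %s iface=%s direction=in" % (name, now, rx_val, iface))
            print("proc.net.%s %d %s iface=%s direction=out" % (name, now, tx_val, iface))

dump_net_dev()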
Sample output:
proc.net.fifo.errs 1413338912 0 iface=eth0 direction=in
proc.net.frame.errs 1413338912 0 iface=eth0 direction=in
proc.net.compressed 1413338912 0 iface=eth0 direction=in
proc.net.multicast 1413338912 6869312 iface=eth0 direction=in
proc.net.bytes 1413338912 1064085376 iface=eth0 direction=out
proc.net.packets 1413338912 7305051 iface=eth0 direction=out
proc.net.errs 1413338912 0 iface=eth0 direction=out
proc.net.dropped 1413338912 0 iface=eth0 direction=out
proc.net.fifo.errs 1413338912 0 iface=eth0 direction=out
proc.net.collisions 1413338912 0 iface=eth0 direction=out
proc.net.carrier.errs 1413338912 0 iface=eth0 direction=out
proc.net.compressed 1413338912 0 iface=eth0 direction=out
proc.net.bytes 1413338912 100779466516 iface=eth1 direction=in
proc.net.packets 1413338912 862873063 iface=eth1 direction=in
proc.net.errs 1413338912 124 iface=eth1 direction=in
proc.net.dropped 1413338912 0 iface=eth1 direction=in
proc.net.fifo.errs 1413338912 0 iface=eth1 direction=in
proc.net.frame.errs 1413338912 124 iface=eth1 direction=in
proc.net.compressed 1413338912 0 iface=eth1 direction=in
proc.net.multicast 1413338912 781541 iface=eth1 direction=in
proc.net.bytes 1413338912 90765358317 iface=eth1 direction=out
proc.net.packets 1413338912 976995995 iface=eth1 direction=out
proc.net.errs 1413338912 0 iface=eth1 direction=out
proc.net.dropped 1413338912 0 iface=eth1 direction=out
0/iostat.py
Data is from /proc/diskstats.
- iostat.disk.*
per-disk stats
- iostat.part.*
per-partition stats (see the note below; the metrics differ depending on whether your 2.6 kernel is older or newer than 2.6.25)
See iostats.txt
[root@typhoeus79 ice_test_m avaliables]# more /proc/diskstats
1 0 ram0 0 0 0 0 0 0 0 0 0 0 0
1 1 ram1 0 0 0 0 0 0 0 0 0 0 0
1 2 ram2 0 0 0 0 0 0 0 0 0 0 0
1 3 ram3 0 0 0 0 0 0 0 0 0 0 0
1 4 ram4 0 0 0 0 0 0 0 0 0 0 0
1 5 ram5 0 0 0 0 0 0 0 0 0 0 0
1 6 ram6 0 0 0 0 0 0 0 0 0 0 0
1 7 ram7 0 0 0 0 0 0 0 0 0 0 0
1 8 ram8 0 0 0 0 0 0 0 0 0 0 0
1 9 ram9 0 0 0 0 0 0 0 0 0 0 0
1 10 ram10 0 0 0 0 0 0 0 0 0 0 0
1 11 ram11 0 0 0 0 0 0 0 0 0 0 0
1 12 ram12 0 0 0 0 0 0 0 0 0 0 0
1 13 ram13 0 0 0 0 0 0 0 0 0 0 0
1 14 ram14 0 0 0 0 0 0 0 0 0 0 0
1 15 ram15 0 0 0 0 0 0 0 0 0 0 0
8 0 sda 194745 287649 6810384 578134 68316366 101295831 1361828191 887830852 0 157754620 888373328
8 1 sda1 5048 1768 178050 23162 130155 188328 2548968 2781202 0 722873 2804315
8 2 sda2 1100 5771 53512 4594 506 24646 201216 10205 0 7300 14798
8 3 sda3 53769 7820 1194332 125424 13980592 15361786 234893716 116798689 0 55437523 116913333
8 4 sda4 2 0 4 34 0 0 0 0 0 34 34
8 5 sda5 5325 158518 165019 5156 7897969 16671932 196569642 65590406 0 28552584 65588961
8 6 sda6 67094 57688 2033043 200871 42956415 34695714 621323346 634120014 0 86205636 634303960
8 7 sda7 62381 56041 3185872 218802 3350729 34353425 306291303 68530336 0 11667279 68747830
3 0 hda 0 0 0 0 0 0 0 0 0 0 0
9 0 md0 0 0 0 0 0 0 0 0 0 0 0

/proc/diskstats has 11 stats for a given physical device. These are all rate counters, except ios_in_progress.
The 11 per-device fields are:
.read_requests        Number of reads completed
.read_merged          Number of reads merged
.read_sectors         Number of sectors read
.msec_read            Time in msec spent reading
.write_requests       Number of writes completed
.write_merged         Number of writes merged
.write_sectors        Number of sectors written
.msec_write           Time in msec spent writing
.ios_in_progress      Number of I/O operations in progress
.msec_total           Time in msec doing I/O
.msec_weighted_total  Weighted time doing I/O (multiplied by ios_in_progress)

In 2.6.25 and later, by-partition stats are reported the same as disks.
Note
In 2.6 kernels before 2.6.25, partitions have only 4 stats per partition:
.read_issued
.read_sectors
.write_issued
.write_sectors
For partitions, these *_issued are counters collected before requests are merged, so they aren't the same as *_requests (which is post-merge, and more closely represents the actual number of disk transactions).
Given that diskstats provides both per-disk and per-partition data, for TSDB purposes we put them under different metrics (versus the same metric and different tags). Otherwise, if you look at a given metric, the data for a given box will be double-counted, since a given operation will increment both the disk series and the partition series. To fix this, we output by-disk data to iostat.disk.* and by-partition data to iostat.part.*.
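To make the split concrete, a sketch of how one whole-disk line of /proc/diskstats could be turned into iostat.disk.* datapoints (assuming the 11-field layout listed above; the whole-disk check here is deliberately crude and only for illustration):

import time

FIELDS = ("read_requests", "read_merged", "read_sectors", "msec_read",
          "write_requests", "write_merged", "write_sectors", "msec_write",
          "ios_in_progress", "msec_total", "msec_weighted_total")

def dump_disks():
    now = int(time.time())
    with open("/proc/diskstats") as f:
        for line in f:
            values = line.split()
            dev = values[2]
            # crude whole-disk check for the example: sdX with no partition digit
            if not dev.startswith("sd") or dev[-1].isdigit():
                continue
            for name, value in zip(FIELDS, values[3:14]):
                print("iostat.disk.%s %d %s dev=%s" % (name, now, value, dev))

dump_disks()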
0/netstat.py
The collector reads the following files:
sockstat = open("/proc/net/sockstat")
netstat = open("/proc/net/netstat")
snmp = open("/proc/net/snmp")
Example:
[root@eos176 data1]# cat /proc/net/sockstat
sockets: used 200
TCP: inuse 88 orphan 2 tw 290 alloc 89 mem 39
UDP: inuse 8 mem 2
UDPLITE: inuse 0
RAW: inuse 0
FRAG: inuse 0 memory 0
Socket allocation and network statistics.
Metrics from /proc/net/sockstat; a minimal parsing sketch follows this list.
- net.sockstat.num_sockets
Number of sockets allocated (only TCP)
- net.sockstat.num_timewait
Number of TCP sockets currently in TIME_WAIT state
- net.sockstat.sockets_inuse
Number of sockets in use (TCP/UDP/raw)
- net.sockstat.num_orphans
Number of orphan TCP sockets (not attached to any file descriptor)
- net.sockstat.memory
Memory allocated for this socket type (in bytes)
- net.sockstat.ipfragqueues
Number of IP flows for which there are currently fragments queued for reassembly
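A minimal sketch of how a few of these values can be pulled out of /proc/net/sockstat, assuming the key/value layout shown in the example above (the mapping of fields to metric names is illustrative; the bundled netstat.py covers many more fields and the other files as well):

import time

def dump_sockstat():
    now = int(time.time())
    stats = {}
    with open("/proc/net/sockstat") as f:
        for line in f:
            proto, rest = line.split(":", 1)
            fields = rest.split()
            # lines look like "TCP: inuse 88 orphan 2 tw 290 alloc 89 mem 39"
            stats[proto] = dict(zip(fields[0::2], fields[1::2]))
    print("net.sockstat.num_sockets %d %s" % (now, stats["TCP"]["alloc"]))
    print("net.sockstat.num_timewait %d %s" % (now, stats["TCP"]["tw"]))
    print("net.sockstat.num_orphans %d %s" % (now, stats["TCP"]["orphan"]))

dump_sockstat()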
Metrics from /proc/net/netstat (the netstat -s command).
- net.stat.tcp.abort
Number of connections that the kernel had to abort. type=memory is especially bad: the kernel had to drop a connection due to having too many orphaned sockets. Other types are normal (e.g. timeout)
- net.stat.tcp.abort.failed
Number of times the kernel failed to abort a connection because it didn't even have enough memory to reset it (bad)
- net.stat.tcp.congestion.recovery
Number of times the kernel detected spurious retransmits and was able to recover part or all of the CWND
- net.stat.tcp.delayedack
Number of delayed ACKs sent of different types.
- net.stat.tcp.failed_accept
Number of times a connection had to be dropped after the 3WHS. reason=full_acceptq indicates that the application isn't accepting connections fast enough. You should see SYN cookies too
- net.stat.tcp.invalid_sack
Number of invalid SACKs we saw, of different types (requires Linux v2.6.24-rc1 or newer)
- net.stat.tcp.memory.pressure
Number of times a socket entered the "memory pressure" mode (not great).
- net.stat.tcp.memory.prune
Number of times a socket had to discard received data due to low memory conditions (bad)
- net.stat.tcp.packetloss.recovery
Number of times we recovered from packet loss by type of recovery (e.g. fast retransmit vs SACK)
- net.stat.tcp.receive.queue.full
Number of times a received packet had to be dropped because the socket's receive queue was full. (requires Linux v2.6.34-rc2 or newer)
- net.stat.tcp.reording
Number of times we detected re-ordering and how
- net.stat.tcp.syncookies
SYN cookies (both sent & received)
Sample output (from 0/iostat.py):
[root@typhoeus79 ice_test_m avaliables]# ./iostat.py
iostat.disk.read_requests 1413341296 194745 dev=sda
iostat.disk.read_merged 1413341296 287649 dev=sda
iostat.disk.read_sectors 1413341296 6810384 dev=sda
iostat.disk.msec_read 1413341296 578134 dev=sda
iostat.disk.write_requests 1413341296 68320119 dev=sda
iostat.disk.write_merged 1413341296 101301793 dev=sda
iostat.disk.write_sectors 1413341296 1361905911 dev=sda
iostat.disk.msec_write 1413341296 887834437 dev=sda
iostat.disk.ios_in_progress 1413341296 0 dev=sda
iostat.disk.msec_total 1413341296 157756976 dev=sda
iostat.disk.msec_weighted_total 1413341296 888376910 dev=sda
0/nfsstat.py
These stats are from /proc/net/rpc/nfs.
- nfs.client.rpc.stats
RPC stats counter
It is tagged with the type (type=) of operation. There are 3 operations: authrefrsh - number of times the authentication information was refreshed, calls - number of calls conducted, and retrans - number of retransmissions
- nfs.client.rpc
RPC calls counter
It is tagged with the version (version=) of the NFS server that conducted the operation, and the name of the operation (op=)
Descriptions of the operations can be found in the appropriate RFC: NFS ver. 3 RFC1813, NFS ver. 4 RFC3530, NFS ver. 4.1 RFC5661.
0/procnettcp.py
These stats are all from /proc/net/tcp{,6}. (Note that if IPv6 is enabled, some IPv4 connections seem to get put into /proc/net/tcp6.) The collector sleeps 60 seconds between intervals. Due in part to a kernel performance issue in older kernels and in part to systems with many TCP connections, this collector can sometimes take 5 minutes or more to run one interval, so the frequency of datapoints can be highly variable depending on the system.
- proc.net.tcp
Number of TCP connections
For each run of the collector, we classify each connection and generate subtotals. TSD will automatically total these up when displaying the graph, but you can drill down for each possible total or a particular one. Each connection is broken down with a tag for user=username (with a fixed list of users we care about, or put under "other" if not in the list). It is also broken down into state with state= (established, time_wait, etc). It is also broken down into services with service= (http, mysql, memcache, etc). Note that once a connection is closed, Linux seems to forget who opened/handled the connection. For connections in time_wait, for example, they will always show user=root. This collector does generate a large amount of datapoints, as the number of points is (S*(U+1)*V), where S=number of TCP states, U=number of users you track, and V=number of services (collections of ports). The deduper does dedup this down very well, as only 3 of the 10 TCP states are generally ever seen. On a typical server this can dedup down to under 10 values per interval.
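As a made-up example of that formula: with S=10 TCP states, U=4 tracked users (plus "other"), and V=6 services, each interval classifies connections into 10 * (4+1) * 6 = 300 possible time series; since only a few states are ever populated, deduplication usually collapses this to well under 10 datapoints actually sent.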
0/procstats.py
Miscellaneous stats from /proc.
- proc.stat.cpu
(rate) CPU counters (jiffies), tagged by cpu type (type=user, nice, system, idle, iowait, irq, softirq, etc). As a rate they should aggregate up to approximately 100*numcpu per host. Best viewed as type=* or maybe type={user|nice|system|iowait|irq}
- proc.stat.intr
(rate) Number of interrupts
- proc.stat.ctxt
(rate) Number of context switches
See http://www.linuxhowtos.org/System/procstat.htm
- proc.vmstat.*
A subset of VM stats from /proc/vmstat (mix of rate and non-rate). See http://www.linuxinsight.com/proc_vmstat.html.
- proc.meminfo.*
Memory usage stats from /proc/meminfo. See the Linux kernel documentation.
- proc.loadavg.*
1min, 5min, 15min, runnable, total_threads metrics from /proc/loadavg
- proc.uptime.total
(rate) Seconds since boot
- proc.uptime.now
(rate) Seconds since boot that the system has been idle
- proc.kernel.entropy_avail
Amount of entropy (in bits) available in the input pool (the one that's cryptographically strong and backing /dev/random, among other things). Watch this value on your frontend servers that do SSL unwrapping; if it gets too low, your SSL performance will suffer
- sys.numa.zoneallocs
Number of pages allocated from the preferred node (type=hit) or not (type=miss)
- sys.numa.foreign_allocs
Number of pages this node allocated because the preferred node didn't have a free page to accommodate the request
- sys.numa.allocation
Number of pages allocated locally (type=local) or remotely (type=remote) for processes executing on this node
- sys.numa.interleave
Number of pages allocated successfully by the interleave strategy
0/smart-stats.py
Stats from SMART disks.
- smart.raw_read_error_rate
Data related to the rate of hardware read errors that occurred when reading data from a disk surface. The raw value has different structure for different vendors and is often not meaningful as a decimal number. (vendor specific)
- smart.throughput_performance
Overall throughput performance of a hard disk drive
- smart.spin_up_time
Average time of spindle spin up (from zero RPM to fully operational [millisecs])
- smart.start_stop_count
A tally of spindle start/stop cycles
- smart.reallocated_sector_ct
Count of reallocated sectors
- smart.seek_error_rate
Rate of seek errors of the magnetic heads. (vendor specific)
- smart.seek_time_performance
Average performance of seek operations of the magnetic heads
- smart.power_on_hours
Count of hours in power-on state, shows total count of hours (or minutes, or seconds) in power-on state. (vendor specific)
- smart.spin_retry_count
Count of retry of spin start attempts
- smart.recalibration_retries
The count that recalibration was requested (under the condition that the first attempt was unsuccessful)
- smart.power_cycle_count
The count of full hard disk power on/off cycles
- smart.soft_read_error_rate
Uncorrected read errors reported to the operating system
- smart.program_fail_count_chip
Total number of Flash program operation failures since the drive was deployed
- smart.erase_fail_count_chip
"Pre-Fail" Attribute
- smart.wear_leveling_count
The maximum number of erase operations performed on a single flash memory block
- smart.used_rsvd_blk_cnt_chip
The number of a chip’s used reserved blocks
- smart.used_rsvd_blk_cnt_tot
"Pre-Fail" Attribute (at least HP devices)
- smart.unused_rsvd_blk_cnt_tot
"Pre-Fail" Attribute (at least Samsung devices)
- smart.program_fail_cnt_total
Total number of Flash program operation failures since the drive was deployed
- smart.erase_fail_count_total
"Pre-Fail" Attribute
- smart.runtime_bad_block
The total count of all read/program/erase failures
- smart.end_to_end_error
The count of parity errors which occur in the data path to the media via the drive's cache RAM (at least Hewlett-Packard)
- smart.reported_uncorrect
The count of errors that could not be recovered using hardware ECC
- smart.command_timeout
The count of aborted operations due to HDD timeout
- smart.high_fly_writes
HDD producers implement a Fly Height Monitor that attempts to provide additional protections for write operations by detecting when a recording head is flying outside its normal operating range. If an unsafe fly height condition is encountered, the write process is stopped, and the information is rewritten or reallocated to a safe region of the hard drive. This attribute indicates the count of these errors detected over the lifetime of the drive
- smart.airflow_temperature_celsius
Airflow temperature
- smart.g_sense_error_rate
The count of errors resulting from externally induced shock & vibration
- smart.power-off_retract_count
The count of times the heads are loaded off the media
- smart.load_cycle_count
Count of load/unload cycles into head landing zone position
- smart.temperature_celsius
Current internal temperature
- smart.hardware_ecc_recovered
The count of errors that were recovered using hardware ECC
- smart.reallocated_event_count
Count of remap operations. The raw value of this attribute shows the total count of attempts to transfer data from reallocated sectors to a spare area
- smart.current_pending_sector
Count of "unstable" sectors (waiting to be remapped, because of unrecoverable read errors)
- smart.offline_uncorrectable
The total count of uncorrectable errors when reading/writing a sector
- smart.udma_crc_error_count
The count of errors in data transfer via the interface cable as determined by ICRC (Interface Cyclic Redundancy Check)
- smart.write_error_rate
The total count of errors when writing a sector
- smart.media_wearout_indicator
The normalized value starts at 100 (when the SSD is new) and declines to a minimum value of 1
- smart.transfer_error_rate
Count of times the link is reset during a data transfer
- smart.total_lba_writes
Total count of LBAs written
- smart.total_lba_read
Total count of LBAs read
Descriptions of the metrics can be found in the S.M.A.R.T. article on Wikipedia. The best way to understand/find a metric is to look at the producer's specification.
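These attributes are typically read from smartctl's attribute table; a hedged sketch of the idea (the exact invocation, attribute naming, and tags used by the bundled smart-stats.py may differ):

import subprocess
import time

def dump_smart(device):
    now = int(time.time())
    out = subprocess.check_output(["smartctl", "-A", device]).decode()
    for line in out.splitlines():
        fields = line.split()
        # attribute rows start with a numeric ID and end with the raw value
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name = fields[1].lower()       # e.g. Raw_Read_Error_Rate -> raw_read_error_rate
        raw_value = fields[9]
        print("smart.%s %d %s disk=%s" % (name, now, raw_value, device.split("/")[-1]))

dump_smart("/dev/sda")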
Other collectors
0/couchbase.py
0/elasticsearch.py
0/hadoop_datanode_jmx.py
0/haproxy.py
0/hbase_regionserver_jmx.py
0/mongo.py
0/mysql.py
Stats from MySQL (relational database).
Refer to the following documentation for metric descriptions: InnoDB - InnoDB monitors, Global - Show status, Engine - Show engine, Slave - Show slave status, Process list - Show process list.
0/postgresql.py
0/redis-stats.py
Stats from Redis (key-value store).
Refer to the following documentation for metric descriptions: Redis info commands.
0/riak.py
0/varnishstat.py
Stats from Varnish (HTTP accelerator).
0/zookeeper.py
Stats from Zookeeper (centralized service for distributed synchronization).
Refer to the following documentation for metric descriptions: Zookeeper admin commands.
The code layout of tcollector's collectors directory is as follows:
[root@etch171 mars171 collectors]# tree -L 2
.
|-- 0
|   |-- couchbase.py
|   |-- dfstat.py
|   |-- elasticsearch.py
|   |-- graphite_bridge.py
|   |-- hadoop_datanode.py
|   |-- hadoop_namenode.py
|   |-- haproxy.py
|   |-- hbase_master.py
|   |-- hbase_regionserver.py
|   |-- ifstat.py
|   |-- iostat.py
|   |-- mongo.py
|   |-- mysql.py
|   |-- netstat.py
|   |-- nfsstat.py
|   |-- opentsdb.sh
|   |-- postgresql.py
|   |-- procnettcp.py
|   |-- procstats.py
|   |-- redis-stats.py
|   |-- riak.py
|   |-- smart-stats.py
|   |-- udp_bridge.py
|   |-- varnishstat.py
|   |-- zabbix_bridge.py
|   |-- zfsiostats.py
|   |-- zfskernstats.py
|   `-- zookeeper.py
|-- __init__.py
|-- etc
|   |-- __init__.py
|   |-- config.py
|   |-- graphite_bridge_conf.py
|   |-- mysqlconf.py
|   |-- postgresqlconf.py
|   |-- udp_bridge_conf.py
|   `-- zabbix_bridge_conf.py
`-- lib
    |-- __init__.py
    |-- hadoop_http.py
    `-- utils.py
Reposted from: https://www.cnblogs.com/gsblog/p/4025482.html