A Brief Introduction to mlx RDMA NIC Counters
- Overview
- hw_counter
- counter
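Overview
The mlx5 driver exposes a set of Mellanox NIC parameters and counters in Linux sysfs, under /sys/class/infiniband/mlx5_x/ports/1/counters and /sys/class/infiniband/mlx5_x/ports/1/hw_counters (where mlx5_x is the device name, e.g. mlx5_0, and 1 is the port number). Each counter records how many times a particular type of event has occurred, such as a specific kind of error or the number of packets received. Understanding these counters helps us understand the running state of an mlx NIC, and monitoring them lets us locate the root cause of RDMA errors more quickly.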
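As a minimal sketch, assuming Python 3.9+, read access to sysfs, and that each counter file holds a single integer, the two directories above can be read into one dictionary. The function name read_port_counters and the default device mlx5_0 are illustrative only, not part of any driver API; the root parameter exists so the function can also be pointed at a copy of the directory tree.

```python
from pathlib import Path


def read_port_counters(device: str = "mlx5_0", port: int = 1,
                       root: str = "/sys/class/infiniband") -> dict[str, dict[str, int]]:
    """Read every file under ports/<port>/counters and ports/<port>/hw_counters.

    Returns {"counters": {name: value}, "hw_counters": {name: value}}.
    Files that cannot be read or do not contain a plain integer are skipped.
    """
    port_dir = Path(root) / device / "ports" / str(port)
    result: dict[str, dict[str, int]] = {}
    for group in ("counters", "hw_counters"):
        values: dict[str, int] = {}
        group_dir = port_dir / group
        if group_dir.is_dir():  # a missing directory yields an empty group
            for f in sorted(group_dir.iterdir()):
                if not f.is_file():
                    continue
                try:
                    values[f.name] = int(f.read_text().strip())
                except (OSError, ValueError):
                    continue  # e.g. permission denied or non-numeric content
        result[group] = values
    return result


if __name__ == "__main__":
    # Print one line per counter, e.g. "hw_counters/out_of_buffer = 0".
    snapshot = read_port_counters("mlx5_0", 1)
    for group, values in snapshot.items():
        for name, value in values.items():
            print(f"{group}/{name} = {value}")
```

Reading the whole directory instead of a fixed list of names keeps the sketch working across driver versions that expose different counter sets.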
hw_counter
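- rnr_nak_retry_err: with this host acting as the sender, the number of RNR (Receiver Not Ready) NAK packets received from the peer. This counter increases when the receiving QP has no posted receive WQE available, for example when its SRQ has no free entries left.
- out_of_buffer: with this host acting as the receiver, the number of packets that arrived when no receive buffer was available. This counter increases when the local QP's SRQ has been used up (no posted receive WQEs left).
- out_of_sequence: the number of packets received out of sequence (out of order).
- local_ack_timeout_err: the number of times an RDMA request sent by this host timed out waiting for an ACK.
- packet_seq_err: the number of NAK packets received by this host that report a packet sequence error.
- req_cqe_error: the number of CQEs completed with an error on this host's requester side.
- duplicate_request: the number of duplicate requests received by this host.
- np_ecn_marked_roce_packets: the number of ECN-marked RoCE packets received by this host, acting as the notification point (NP) in RoCE congestion control; a rising value indicates congestion in the network.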
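Because these are cumulative counts, what usually matters when troubleshooting is how much a counter grew over an interval. The sketch below compares two hw_counters snapshots and reports which of the counters listed above increased. The helper names and the HW_COUNTER_HINTS table are hypothetical, the hint strings paraphrase the list above rather than official documentation, and the snapshot function is assumed to be any callable that returns {counter_name: value}.

```python
import time
from collections.abc import Callable

# Hypothetical hint table paraphrasing the hw_counter descriptions above.
HW_COUNTER_HINTS = {
    "rnr_nak_retry_err": "sender got RNR NAKs: peer QP had no posted receive WQE",
    "out_of_buffer": "receiver had no receive buffer (SRQ used up)",
    "out_of_sequence": "packets received out of order",
    "local_ack_timeout_err": "requests sent by this host timed out waiting for ACK",
    "packet_seq_err": "received NAKs reporting a packet sequence error",
    "req_cqe_error": "requester saw CQEs completed with error",
    "duplicate_request": "received duplicate requests",
    "np_ecn_marked_roce_packets": "received ECN-marked RoCE packets (congestion)",
}


def counter_deltas(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Per-counter change between two snapshots, for names present in both."""
    return {name: after[name] - before[name] for name in after if name in before}


def increased_hw_counters(before: dict[str, int],
                          after: dict[str, int]) -> list[tuple[str, int, str]]:
    """Return (name, delta, hint) for every listed hw_counter that went up."""
    deltas = counter_deltas(before, after)
    return [(name, deltas[name], hint)
            for name, hint in HW_COUNTER_HINTS.items()
            if deltas.get(name, 0) > 0]  # ignore unchanged or reset counters


def watch_once(read_snapshot: Callable[[], dict[str, int]],
               interval_s: float = 1.0) -> list[tuple[str, int, str]]:
    """Take two snapshots interval_s seconds apart and report increases."""
    before = read_snapshot()
    time.sleep(interval_s)
    return increased_hw_counters(before, read_snapshot())
```

For instance, a growing out_of_buffer on the receiver together with a growing rnr_nak_retry_err on the sender points at the receive side not posting enough receive WQEs.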
counter
- port_rcv_data: Total number of data octets, divided by 4 (lanes), received on all VLs. This is a 64-bit counter.
- port_rcv_packets: Total number of packets received (this may include packets containing errors). This is a 64-bit counter.
- port_xmit_data: Total number of data octets, divided by 4 (lanes), transmitted on all VLs. This is a 64-bit counter.
- port_xmit_packets: Total number of packets transmitted on all VLs from this port. This may include packets with errors.
- unicast_rcv_packets: Total number of unicast packets received, including unicast packets containing errors.
- unicast_xmit_packets: Total number of unicast packets transmitted on all VLs from the port. This may include unicast packets with errors.
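Since port_rcv_data and port_xmit_data count octets divided by 4, the byte count is the counter value multiplied by 4. The minimal sketch below, with hypothetical helper names, turns two samples taken interval_s seconds apart into an average rate, assuming the value follows the "divided by 4" definition quoted above and treating the counter as a free-running 64-bit value.

```python
COUNTER_BITS = 64  # port_rcv_data / port_xmit_data are described as 64-bit


def data_counter_to_bytes(value: int) -> int:
    """port_*_data counts octets divided by 4 (lanes), so multiply by 4."""
    return value * 4


def counter_delta(before: int, after: int, bits: int = COUNTER_BITS) -> int:
    """Increase of a free-running counter, tolerating a single wrap-around."""
    return (after - before) % (1 << bits)


def throughput_gbps(before: int, after: int, interval_s: float) -> float:
    """Average rate in Gbit/s between two port_*_data samples."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    delta_bytes = data_counter_to_bytes(counter_delta(before, after))
    return delta_bytes * 8 / interval_s / 1e9
```

For example, an increase of 312,500,000 in port_xmit_data over 1 s corresponds to 312,500,000 × 4 = 1,250,000,000 bytes, i.e. 10 Gbit/s.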