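How Nova Counts Node Hardware Resources
Introduction
Cloud platforms built on OpenStack usually give the overview page a prominent area for the cluster's current resource usage: the total, used and remaining amounts of CPU, memory, disk and other resources. Whenever we scale out the cluster, the totals on that page grow automatically. Nova, the OpenStack compute service, manages these compute resources. Have you ever wondered how Nova obtains them?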
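How Nova Counts Resources
Collecting resource statistics is an internal mechanism of the Nova service. Later operations, such as creating virtual machines or disks, depend on these statistics. We can therefore infer that the mechanism must run before the service starts handling any other work.
From this simple analysis, plus some necessary debugging, we found the following.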
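The mechanism is triggered in nova.service.Service.start, the start method of the RPC service that nova-compute runs as. That method calls self.manager.pre_start_hook() before the service starts listening on its RPC queues.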
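The ordering is the point here: resources are collected before the service accepts any work. The sketch below models that start-up order with simplified classes. ToyService and ToyManager are invented for illustration and are not Nova's real code. The "listen_on_rpc" entry is only a placeholder for the moment the real service starts consuming messages.

```python
class ToyManager:
    """Hypothetical stand-in for a Nova manager; records hook calls."""

    def __init__(self):
        self.calls = []

    def init_host(self):
        self.calls.append("init_host")

    def pre_start_hook(self):
        # In the compute manager, this is where update_available_resource() runs.
        self.calls.append("pre_start_hook")

    def post_start_hook(self):
        self.calls.append("post_start_hook")


class ToyService:
    """Hypothetical, simplified model of a service's start() sequence."""

    def __init__(self, manager):
        self.manager = manager

    def start(self):
        self.manager.init_host()
        # Resources are gathered before the service listens for work...
        self.manager.pre_start_hook()
        # ...and only then would it start consuming RPC messages (placeholder).
        self.manager.calls.append("listen_on_rpc")
        self.manager.post_start_hook()


manager = ToyManager()
ToyService(manager).start()
print(manager.calls)
# ['init_host', 'pre_start_hook', 'listen_on_rpc', 'post_start_hook']
```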
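The call self.manager.pre_start_hook() is what gathers the resource information. On a compute node it resolves to nova.compute.manager.ComputeManager.pre_start_hook, which in turn calls update_available_resource: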
```python
def pre_start_hook(self):
    """After the service is initialized, but before we fully bring
    the service up by listening on RPC queues, make sure to update
    our available resources (and indirectly our available nodes).
    """
    self.update_available_resource(nova.context.get_admin_context())

...

@periodic_task.periodic_task
def update_available_resource(self, context):
    """See driver.get_available_resource()

    Periodic process that keeps that the compute host's understanding of
    resource availability and usage in sync with the underlying hypervisor.

    :param context: security context
    """
    new_resource_tracker_dict = {}
    nodenames = set(self.driver.get_available_nodes())
    for nodename in nodenames:
        rt = self._get_resource_tracker(nodename)
        rt.update_available_resource(context)
        new_resource_tracker_dict[nodename] = rt

    # Delete orphan compute node not reported by driver but still in db
    compute_nodes_in_db = self._get_compute_nodes_in_db(context,
                                                        use_slave=True)
    for cn in compute_nodes_in_db:
        if cn.hypervisor_hostname not in nodenames:
            LOG.audit(_("Deleting orphan compute node %s") % cn.id)
            cn.destroy()

    self._resource_tracker_dict = new_resource_tracker_dict
```

Note the @periodic_task.periodic_task decorator. Besides being called once from pre_start_hook at start-up, update_available_resource also runs periodically afterwards. Each compute node therefore reports its resources as soon as it starts and keeps them current, which is how newly added nodes end up in the totals. In the code above, rt.update_available_resource() resolves to nova.compute.resource_tracker.ResourceTracker.update_available_resource():
```python
def update_available_resource(self, context):
    """Override in-memory calculations of compute node resource usage based
    on data audited from the hypervisor layer.

    Add in resource claims in progress to account for operations that have
    declared a need for resources, but not necessarily retrieved them from
    the hypervisor layer yet.
    """
    LOG.audit(_("Auditing locally available compute resources"))
    resources = self.driver.get_available_resource(self.nodename)

    if not resources:
        # The virt driver does not support this function
        LOG.audit(_("Virt driver does not support "
                    "'get_available_resource' Compute tracking is disabled."))
        self.compute_node = None
        return
    resources['host_ip'] = CONF.my_ip

    # TODO(berrange): remove this once all virt drivers are updated
    # to report topology
    if "numa_topology" not in resources:
        resources["numa_topology"] = None

    self._verify_resources(resources)

    self._report_hypervisor_resource_view(resources)

    return self._update_available_resource(context, resources)
```

In the code above, self._update_available_resource() reconciles the database record with the resources actually in use on the compute node; we won't go into it here. self.driver.get_available_resource() is what actually collects the node's hardware resource information. For the libvirt driver it resolves to:
```python
class LibvirtDriver(driver.ComputeDriver):

    def get_available_resource(self, nodename):
        """Retrieve resource information.

        This method is called when nova-compute launches, and
        as part of a periodic task that records the results in the DB.

        :param nodename: will be put in PCI device
        :returns: dictionary containing resource info
        """

        # Temporary: convert supported_instances into a string, while keeping
        # the RPC version as JSON. Can be changed when RPC broadcast is removed
        stats = self.get_host_stats(refresh=True)
        stats['supported_instances'] = jsonutils.dumps(
            stats['supported_instances'])
        return stats

    def get_host_stats(self, refresh=False):
        """Return the current state of the host.

        If 'refresh' is True, run update the stats first.
        """
        return self.host_state.get_host_stats(refresh=refresh)

    def _get_vcpu_total(self):
        """Get available vcpu number of physical computer.

        :returns: the number of cpu core instances can be used.
        """
        if self._vcpu_total != 0:
            return self._vcpu_total

        try:
            total_pcpus = self._conn.getInfo()[2]
        except libvirt.libvirtError:
            LOG.warn(_LW("Cannot get the number of cpu, because this "
                         "function is not implemented for this platform. "))
            return 0

        if CONF.vcpu_pin_set is None:
            self._vcpu_total = total_pcpus
            return self._vcpu_total

        available_ids = hardware.get_vcpu_pin_set()
        if sorted(available_ids)[-1] >= total_pcpus:
            raise exception.Invalid(_("Invalid vcpu_pin_set config, "
                                      "out of hypervisor cpu range."))
        self._vcpu_total = len(available_ids)
        return self._vcpu_total

    ...
```
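get_available_resource always passes refresh=True, so the host state recomputes its stats on every call. It then turns supported_instances into a JSON string. The sketch below models those two steps. ToyHostState and the sample stats are hypothetical. Unlike the real method, this sketch copies the stats dict before rewriting supported_instances, so the cached dict is left as a list.

```python
import json


class ToyHostState:
    """Hypothetical model of HostState's cache-with-refresh behaviour."""

    def __init__(self, collect):
        self._collect = collect  # callable returning a fresh stats dict
        self._stats = {}
        self.refresh_count = 0
        self.update_status()  # like HostState, collect once on creation

    def update_status(self):
        self._stats = self._collect()
        self.refresh_count += 1

    def get_host_stats(self, refresh=False):
        # Recollect only when asked to, or when nothing is cached yet.
        if refresh or not self._stats:
            self.update_status()
        return self._stats


def get_available_resource(host_state):
    """Sketch of the two steps in LibvirtDriver.get_available_resource."""
    # Copy so this sketch does not mutate the cached dict.
    stats = dict(host_state.get_host_stats(refresh=True))
    # supported_instances is sent as a JSON string, not a list.
    stats["supported_instances"] = json.dumps(stats["supported_instances"])
    return stats


# Illustrative sample values only.
host = ToyHostState(lambda: {"vcpus": 16,
                             "supported_instances": [["x86_64", "kvm", "hvm"]]})
res = get_available_resource(host)
print(res["supported_instances"])  # [["x86_64", "kvm", "hvm"]]
print(host.refresh_count)          # 2: once at creation, once for refresh=True
```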
get_host_stats() delegates to self.host_state, an instance of HostState:

```python
class HostState(object):
    """Manages information about the compute node through libvirt."""
    def __init__(self, driver):
        super(HostState, self).__init__()
        self._stats = {}
        self.driver = driver
        self.update_status()

    def get_host_stats(self, refresh=False):
        """Return the current state of the host.

        If 'refresh' is True, run update the stats first.
        """
        if refresh or not self._stats:
            self.update_status()
        return self._stats

    def update_status(self):
        """Retrieve status info from libvirt."""
        ...
        data["vcpus"] = self.driver._get_vcpu_total()
        data["memory_mb"] = self.driver._get_memory_mb_total()
        data["local_gb"] = disk_info_dict['total']
        data["vcpus_used"] = self.driver._get_vcpu_used()
        data["memory_mb_used"] = self.driver._get_memory_mb_used()
        data["local_gb_used"] = disk_info_dict['used']
        data["hypervisor_type"] = self.driver._get_hypervisor_type()
        data["hypervisor_version"] = self.driver._get_hypervisor_version()
        data["hypervisor_hostname"] = self.driver._get_hypervisor_hostname()
        data["cpu_info"] = self.driver._get_cpu_info()
        data['disk_available_least'] = _get_disk_available_least()
        ...
```

Note the docstring of get_available_resource: "This method is called when nova-compute launches, and as part of a periodic task that records the results in the DB." This matches our initial inference exactly.

From here on we follow the collection flow for vcpus alone. self.driver._get_vcpu_total resolves to LibvirtDriver._get_vcpu_total, shown in the code above. If vcpu_pin_set is not configured, _vcpu_total is simply self._conn.getInfo()[2]. self._conn can be thought of as libvirt's adapter: it represents an abstract connection to the underlying hypervisor, such as KVM or QEMU. getInfo() is a thin wrapper around libvirtmod.virNodeGetInfo. It returns a list whose third element (index 2) is the number of active CPUs on the host, and Nova takes that number as the vcpu count. We can stop here, because anything further down is libvirt's C code rather than Python.
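Indexing getInfo()[2] is easier to follow once the positions have names. In libvirt-python the list follows libvirt's virNodeInfo layout: CPU model, memory (in MiB in the Python binding), active CPUs, CPU frequency in MHz, NUMA nodes, sockets per node, cores per socket, threads per core. The sketch below only attaches those names to a list. NodeInfo and parse_node_info are hypothetical helpers, and the sample values are invented. A real list would come from a call such as libvirt.openReadOnly(None).getInfo() on a compute node.

```python
from collections import namedtuple

# Field order of virNodeInfo as returned by libvirt-python's getInfo().
NodeInfo = namedtuple(
    "NodeInfo",
    ["model", "memory_mb", "cpus", "mhz", "nodes", "sockets", "cores", "threads"],
)


def parse_node_info(raw):
    """Give names to the positional fields of a getInfo()-style list."""
    return NodeInfo(*raw)


# Invented sample values, not taken from a real host.
sample = ["x86_64", 32768, 16, 2400, 1, 1, 8, 2]
info = parse_node_info(sample)
print(info.cpus)  # 16, i.e. the same value as sample[2]
```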
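If vcpu_pin_set is configured, hardware.get_vcpu_pin_set parses it into a set of usable CPU IDs. Nova then checks that the highest ID lies within the host's CPU range and raises exception.Invalid otherwise. The size of that set becomes the final vcpus count.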
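The two branches of _get_vcpu_total condense into a pure function. This sketch makes some assumptions. parse_pin_set is a hypothetical, simplified stand-in for hardware.get_vcpu_pin_set. It handles only comma-separated IDs, inclusive ranges such as 4-11, and exclusions such as ^8. Nova's own parser may accept other forms and report errors differently. ValueError stands in for exception.Invalid.

```python
def parse_pin_set(spec):
    """Hypothetical, simplified parser for a vcpu_pin_set-style string
    such as "4-11,^8": single IDs, inclusive ranges, and ^-exclusions."""
    included, excluded = set(), set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        target = excluded if part.startswith("^") else included
        part = part.lstrip("^")
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            target.update(range(start, end + 1))
        else:
            target.add(int(part))
    return included - excluded


def vcpu_total(host_cpus, pin_set_spec=None):
    """Decision logic of LibvirtDriver._get_vcpu_total, without caching.

    host_cpus plays the role of self._conn.getInfo()[2].
    """
    if pin_set_spec is None:
        # vcpu_pin_set not configured: every host CPU counts.
        return host_cpus
    available_ids = parse_pin_set(pin_set_spec)
    # Extra guard added in this sketch: an empty set has no maximum.
    if not available_ids:
        raise ValueError("vcpu_pin_set selects no CPUs")
    # CPU IDs start at 0, so valid IDs are 0 .. host_cpus - 1.
    if max(available_ids) >= host_cpus:
        raise ValueError("Invalid vcpu_pin_set config, "
                         "out of hypervisor cpu range.")
    return len(available_ids)


print(vcpu_total(16))             # 16
print(vcpu_total(16, "4-11,^8"))  # 7
```

With "4-11,^8", IDs 4 through 11 are selected and 8 is removed, which leaves 7 usable CPUs.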
That is the whole process by which Nova counts a node's hardware resources, using vcpus as the example.
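The condensed sketch below ties the chain together: ComputeManager, then ResourceTracker, then the virt driver, with a fake driver in place of LibvirtDriver. FakeDriver, track_node and update_all are invented names, and the sample stats and IP address are illustrative. The sketch skips the database sync done by _update_available_resource. It also only reports orphan node names rather than destroying their records.

```python
class FakeDriver:
    """Hypothetical virt driver that reports fixed per-node stats."""

    def __init__(self, stats_by_node):
        self._stats = stats_by_node

    def get_available_nodes(self):
        return list(self._stats)

    def get_available_resource(self, nodename):
        return dict(self._stats[nodename])


def track_node(driver, nodename, my_ip):
    """Condensed ResourceTracker.update_available_resource (no DB sync)."""
    resources = driver.get_available_resource(nodename)
    if not resources:
        return None  # driver cannot report: tracking is disabled
    resources["host_ip"] = my_ip
    resources.setdefault("numa_topology", None)
    return resources


def update_all(driver, nodes_in_db, my_ip):
    """Condensed ComputeManager.update_available_resource."""
    nodenames = set(driver.get_available_nodes())
    tracked = {name: track_node(driver, name, my_ip) for name in nodenames}
    # Nodes still in the DB but no longer reported by the driver are orphans.
    orphans = sorted(n for n in nodes_in_db if n not in nodenames)
    return tracked, orphans


driver = FakeDriver({"node-1": {"vcpus": 16, "memory_mb": 32768}})
tracked, orphans = update_all(driver, ["node-1", "node-old"], "192.0.2.10")
print(tracked["node-1"]["host_ip"], orphans)  # 192.0.2.10 ['node-old']
```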