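Fixing Windows 10 installation hangs on oVirt VMs that use FCP thin provisioning
Problem description:
The problem only occurs in FCP thin-provisioning mode. In testing, the Win10 installation hangs at "Getting files ready for installation (13%)", and virsh shows that the VM has entered the paused state.
Logs 1 and 2 both contain the following errors: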
2018-04-18 18:37:21,556+0800 INFO (libvirt/events) [virt.vm] (vmId='13a5e5bd-f101-45df-bf0c-923da11bec67') CPU stopped: onSuspend (vm:5085)
2018-04-18 18:37:21,556+0800 INFO (libvirt/events) [virt.vm] (vmId='13a5e5bd-f101-45df-bf0c-923da11bec67') abnormal vm stop device scsi0-0-0-0 error (vm:4218)
2018-04-18 18:37:21,556+0800 INFO (libvirt/events) [virt.vm] (vmId='13a5e5bd-f101-45df-bf0c-923da11bec67') CPU stopped: onIOError (vm:5085)
2018-04-18 18:37:21,556+0800 INFO (libvirt/events) [virt.vm] (vmId='13a5e5bd-f101-45df-bf0c-923da11bec67') abnormal vm stop device scsi0-0-0-0 error enospc (vm:4218)
2018-04-18 18:37:21,556+0800 INFO (libvirt/events) [virt.vm] (vmId='13a5e5bd-f101-45df-bf0c-923da11bec67') CPU stopped: onIOError (vm:5085)
2018-04-18 18:37:21,558+0800 INFO (libvirt/events) [virt.vm] (vmId='13a5e5bd-f101-45df-bf0c-923da11bec67') No VM drives were extended (vm:4225)
2018-04-18 18:37:21,559+0800 INFO (libvirt/events) [virt.vm] (vmId='13a5e5bd-f101-45df-bf0c-923da11bec67') abnormal vm stop device scsi0-0-0-0 error enospc (vm:4218)
2018-04-18 18:37:21,559+0800 INFO (libvirt/events) [virt.vm] (vmId='13a5e5bd-f101-45df-bf0c-923da11bec67') CPU stopped: onIOError (vm:5085)
2018-04-18 18:37:21,561+0800 INFO (libvirt/events) [virt.vm] (vmId='13a5e5bd-f101-45df-bf0c-923da11bec67') No VM drives were extended (vm:4225)
2018-04-18 18:37:21,561+0800 INFO (libvirt/events) [virt.vm] (vmId='13a5e5bd-f101-45df-bf0c-923da11bec67') abnormal vm stop device scsi0-0-0-0 error enospc (vm:4218)
2018-04-18 18:37:21,561+0800 INFO (libvirt/events) [virt.vm] (vmId='13a5e5bd-f101-45df-bf0c-923da11bec67') CPU stopped: onIOError (vm:5085)
2018-04-18 18:37:21,563+0800 INFO (libvirt/events) [virt.vm] (vmId='13a5e5bd-f101-45df-bf0c-923da11bec67') No VM drives were extended (vm:4225)
2018-04-18 18:37:21,563+0800 INFO (libvirt/events) [virt.vm] (vmId='13a5e5bd-f101-45df-bf0c-923da11bec67') abnormal vm stop device scsi0-0-0-0 error enospc (vm:4218)
2018-04-18 18:37:21,563+0800 INFO (libvirt/events) [virt.vm] (vmId='13a5e5bd-f101-45df-bf0c-923da11bec67') CPU stopped: onIOError (vm:5085)
2018-04-18 18:37:21,565+0800 INFO (libvirt/events) [virt.vm] (vmId='13a5e5bd-f101-45df-bf0c-923da11bec67') No VM drives were extended (vm:4225)
2018-04-18 18:37:21,566+0800 INFO (libvirt/events) [virt.vm] (vmId='13a5e5bd-f101-45df-bf0c-923da11bec67') abnormal vm stop device scsi0-0-0-0 error enospc (vm:4218)
2018-04-18 18:37:21,566+0800 INFO (libvirt/events) [virt.vm] (vmId='13a5e5bd-f101-45df-bf0c-923da11bec67') CPU stopped: onIOError (vm:5085)
2018-04-18 18:37:21,568+0800 INFO (libvirt/events) [virt.vm] (vmId='13a5e5bd-f101-45df-bf0c-923da11bec67') No VM drives were extended (vm:4225)
2018-04-18 18:37:21,568+0800 INFO (libvirt/events) [virt.vm] (vmId='13a5e5bd-f101-45df-bf0c-923da11bec67') abnormal vm stop device scsi0-0-0-0 error enospc (vm:4218)
2018-04-18 18:37:21,568+0800 INFO (libvirt/events) [virt.vm] (vmId='13a5e5bd-f101-45df-bf0c-923da11bec67') CPU stopped: onIOError (vm:5085)
2018-04-18 18:37:21,570+0800 INFO (libvirt/events) [virt.vm] (vmId='13a5e5bd-f101-45df-bf0c-923da11bec67') No VM drives were extended (vm:4225)
In actual testing the VM gets suspended. The problem is on the vdsm side, which receives QEMU's ENOSPC error and then follows this flow:
extendDrivesIfNeeded -> extendDriveVolume
What actually goes wrong is the check physical - alloc < drive.watermarkLimit in _shouldExtendVolume. The watermarkLimit value is computed as self.VOLWM_FREE_PCT * self.volExtensionChunk / 100 and is determined by the volume_utilization_chunk_mb parameter in vdsm.conf (default 1024), which gives a default of 512 MB. When the VM disk's capacity grows by more than this limit, the extension fails. Raising the conf parameter did not help, so I removed the restriction on disk extension in the code. I later found that this change breaks thin provisioning:
Apr 21 14:34:07 Linx vdsmd: ----extend:[]
Apr 21 14:34:07 Linx vdsmd: ------out extend Drives
Apr 21 14:34:09 Linx vdsmd: ------in extend Drives
Apr 21 14:34:09 Linx vdsmd: blockInfo:[107374182400L, 0L, 10737418240L]
Apr 21 14:34:09 Linx vdsmd: ----ret:[(, 'ab0474bc-7b63-413a-8817-6f4fd4bdd871', 107374182400L, 0L, 10737418240L)]
Apr 21 14:34:09 Linx vdsmd: ----extend:[]
Apr 21 14:34:09 Linx vdsmd: ------out extend Drives
Apr 21 14:34:11 Linx vdsmd: ------in extend Drives
Apr 21 14:34:11 Linx vdsmd: blockInfo:[107374182400L, 0L, 10737418240L]
Apr 21 14:34:11 Linx vdsmd: ----ret:[(, 'ab0474bc-7b63-413a-8817-6f4fd4bdd871', 107374182400L, 0L, 10737418240L)]
Apr 21 14:34:11 Linx vdsmd: ----extend:[]
Apr 21 14:34:11 Linx vdsmd: ------out extend Drives
Apr 21 14:34:13 Linx vdsmd: ------in extend Drives
Apr 21 14:34:13 Linx vdsmd: blockInfo:[107374182400L, 0L, 10737418240L]
Apr 21 14:34:13 Linx vdsmd: ----ret:[(, 'ab0474bc-7b63-413a-8817-6f4fd4bdd871', 107374182400L, 0L, 10737418240L)]
Apr 21 14:34:13 Linx vdsmd: ----extend:[]
Apr 21 14:34:13 Linx vdsmd: ------out extend Drives
Apr 21 14:34:15 Linx vdsmd: ------in extend Drives
Apr 21 14:34:15 Linx vdsmd: blockInfo:[107374182400L, 0L, 10737418240L]
Apr 21 14:34:15 Linx vdsmd: ----ret:[(
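The main purpose of thin provisioning is to reduce how much disk space a VM occupies. Once thin provisioning stops working, the VM's disk usage doubles after every snapshot. Debug prints show that ENOSPC events keep firing while alloc size = 0. The fix in the commit simply returns True, so the disk keeps being extended until it reaches its upper limit, which is exactly what defeats thin provisioning.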
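The runaway growth described above can be illustrated with a small simulation. This is only a sketch, not vdsm code: the function `extend_until_cap`, the 1 GiB starting size and the 1 GiB chunk are assumptions made for illustration. The 100 GiB capacity is the first value of the `blockInfo` lines in the log.

```python
GiB = 1024 ** 3


def extend_until_cap(physical, capacity, chunk, should_extend, max_events=1000):
    """Grow `physical` by one chunk per ENOSPC event while should_extend() agrees.

    Hypothetical helper: it only models the decision loop, not real storage.
    """
    events = 0
    while physical < capacity and events < max_events and should_extend(physical):
        physical = min(physical + chunk, capacity)
        events += 1
    return physical, events


def always_true(physical):
    # Stands in for a check that was patched to return True unconditionally.
    return True


final, events = extend_until_cap(1 * GiB, 100 * GiB, 1 * GiB, always_true)
print(final // GiB, events)
```

With a check that always agrees, the volume ends up fully allocated even though the guest has written almost nothing, which matches the behaviour seen in the log.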
Further investigation:
The log is as follows:
Apr 21 15:42:06 Linx vdsmd: blockInfo:[107374182400L, 0L, 536870912L]
Apr 21 15:42:06 Linx vdsmd: ----ret:[(, u'83cf08db-ea47-40be-9b6f-c39b8444d369', 107374182400L, 0L, 536870912L)]
Apr 21 15:42:06 Linx vdsmd: ----false3, physical:536870912; alloc:0; watermarkLimit:268435456;
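The numbers in this log line can be plugged into the watermark check directly. The sketch below assumes the check and the formula are exactly as quoted in this article. The function names are hypothetical, and the free percentage of 50 is an assumption chosen because it reproduces the 512 MB default mentioned earlier.

```python
MiB = 1024 * 1024


def watermark_limit(free_pct, chunk_mb):
    """watermarkLimit = VOLWM_FREE_PCT * volExtensionChunk / 100, in bytes."""
    return free_pct * chunk_mb * MiB // 100


def should_extend(physical, alloc, limit):
    """The comparison quoted from _shouldExtendVolume."""
    return physical - alloc < limit


# Values taken from the log line above.
physical, alloc, limit = 536870912, 0, 268435456
print(should_extend(physical, alloc, limit))

# Assumed free_pct=50 with the default chunk of 1024 MB.
print(watermark_limit(50, 1024) // MiB)
```

Because `alloc` is reported as 0, `physical - alloc` equals the full 512 MiB and can never drop below the limit, so the check returns False no matter how much the guest has really written.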
On the vdsm side, disk extension is triggered along the following two paths:
Flow 1:
DriveWatermarkMonitor._execute -> self._vm.extendDrivesIfNeeded()
Flow 2:
onIOError (callback) -> if reason == 'ENOSPC' -> self.extendDrivesIfNeeded()
The two paths correspond to the following two situations:
Flow 1) A background monitor watches the disk watermark in real time and extends the disk when the threshold is reached.
Flow 2) When the disk reaches the threshold between two monitoring passes, QEMU pauses the VM and raises an ENOSPC error; vdsm catches that error and extends the disk.
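The difference between the two flows can be sketched with toy classes. This is an illustration only: the method names follow the ones mentioned above, but the bodies, the `paused` flag and the `trace` list are hypothetical and do not reflect the real vdsm implementation.

```python
class Vm:
    """Toy VM holding only what the two flows touch."""

    def __init__(self):
        self.paused = False
        self.trace = []  # hypothetical: records the order of calls

    def extendDrivesIfNeeded(self):
        self.trace.append('extendDrivesIfNeeded')

    def onIOError(self, reason):
        # Flow 2: by the time this callback runs, QEMU has already paused the guest.
        self.paused = True
        self.trace.append('onIOError')
        if reason == 'ENOSPC':
            self.extendDrivesIfNeeded()


class DriveWatermarkMonitor:
    """Flow 1: periodic check that runs while the guest keeps running."""

    def __init__(self, vm):
        self._vm = vm

    def _execute(self):
        self._vm.trace.append('monitor')
        self._vm.extendDrivesIfNeeded()


vm = Vm()
DriveWatermarkMonitor(vm)._execute()
print(vm.paused, vm.trace)
vm.onIOError('ENOSPC')
print(vm.paused, vm.trace)
```

Both flows end in the same extension call. The only difference is whether the guest has already been paused when it happens.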
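If we only handle the situation after QEMU has already paused the VM, the VM will pause intermittently whenever the disk is written continuously, which shows up in use as constant stuttering. What really needs fixing is that the libvirt call self._dom.blockInfo, made from _getExtendCandidates on the vdsm side, cannot obtain the disk's current actual size.
self._dom.blockInfo follows the same call path inside libvirt as domblkinfo. The actual test results are: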
virsh # domblkinfo --device sda --domain linx80-1
Capacity: 107374182400
Allocation: 0
Physical: 536870912
virsh # domstats --block --domain linx80-1
Domain: 'linx80-1'
block.count=2
block.0.name=hdc
block.0.rd.reqs=4
block.0.rd.bytes=152
block.0.rd.times=82981
block.0.wr.reqs=0
block.0.wr.bytes=0
block.0.wr.times=0
block.0.fl.reqs=0
block.0.fl.times=0
block.0.allocation=0
block.0.physical=0
block.1.name=sda
block.1.path=/rhev/data-center/a3ae667f-bd61-4b9e-903b-9f57b2e89080/572679d5-b080-425e-9e4f-f5d01988a6be/images/1796cd02-5b15-4d72-864d-4bc33ca7cc1c/8ad5f4b3-b3fd-43c5-8c47-8da9a24904e9
block.1.rd.reqs=13340
block.1.rd.bytes=404829696
block.1.rd.times=73699128640
block.1.wr.reqs=1182
block.1.wr.bytes=128020480
block.1.wr.times=128182811428
block.1.fl.reqs=157
block.1.fl.times=401870367
block.1.allocation=149946368
block.1.capacity=107374182400
block.1.physical=536870912
As the output shows, domblkinfo fails to report Allocation while domstats does report allocation, which pins the problem down to libvirt.
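The same comparison can be automated against captured virsh output. The sketch below works purely on text. The parser names are hypothetical, and the sample strings are abbreviated copies of the output shown above.

```python
def parse_domblkinfo(text):
    """Parse 'Key: value' lines of virsh domblkinfo output into a dict of ints."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(':')
        if sep and value.strip().isdigit():
            info[key.strip().lower()] = int(value)
    return info


def parse_domstats_block(text, name):
    """Return the block.<n>.* fields (as strings) of the disk called `name`."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition('=')
        if sep and key.startswith('block.'):
            fields[key] = value
    for key, value in fields.items():
        if key.endswith('.name') and value == name:
            prefix = key[:-len('name')]
            return {k[len(prefix):]: v for k, v in fields.items()
                    if k.startswith(prefix)}
    return {}


DOMBLKINFO = "Capacity: 107374182400\nAllocation: 0\nPhysical: 536870912\n"
DOMSTATS = (
    "block.count=2\n"
    "block.0.name=hdc\n"
    "block.0.allocation=0\n"
    "block.1.name=sda\n"
    "block.1.allocation=149946368\n"
    "block.1.physical=536870912\n"
)

blk = parse_domblkinfo(DOMBLKINFO)
stats = parse_domstats_block(DOMSTATS, 'sda')
print(blk['allocation'], int(stats['allocation']))
print(blk['allocation'] != int(stats['allocation']))
```

A mismatch between the two allocation values for the same disk is the symptom described here.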
The call flow of domblkinfo inside libvirt is:
domblkinfo
-> cmdDomblkinfo
-> virDomainGetBlockInfo
-> conn->driver->domainGetBlockInfo
-> qemuDomainGetBlockInfo
-> qemuMonitorGetAllBlockStatsInfo
The main problem is in this part of qemuDomainGetBlockInfo:
    if (entry->physical == 0 || info->allocation == 0 ||
        info->allocation == entry->physical) {
        info->allocation = entry->physical;
        if (info->allocation == 0)
            info->allocation = entry->physical;
        if (qemuDomainStorageUpdatePhysical(driver, cfg, vm, disk->src) < 0)
            goto endjob;
        info->physical = disk->src->physical;
    } else {
        info->physical = entry->physical;
    }
When the current disk size obtained from the monitor is 0, libvirt forcibly sets the allocated size to 0 and then re-reads the disk size from disk->src. The intent is to guarantee that the allocated size never exceeds the actual size. However, when the size read from disk->src is non-zero, the allocated size still ends up as 0. The actual change is in [1]: allocation is now set to 0 only when the disk size read from disk->src is itself 0.
Test:
Win10 was reinstalled three times, and disk allocation behaved normally throughout each installation.