How GPA/HVA Translation Works in VIRTIO Front-End and Back-End Drivers
First, a few abbreviations:

GVA - Guest Virtual Address: a virtual address inside the guest
GPA - Guest Physical Address: a physical address as seen by the guest
HVA - Host Virtual Address: a virtual address in the host process, i.e. the memory kvmtool allocates for the guest
HPA - Host Physical Address: a physical address on the host
When kvmtool boots an ARM guest (tested on a Raspberry Pi 4B), the guest physical memory layout it prepares is defined in kvmtool: arm/include/arm-common/kvm-arch.h; the relevant constants are sketched below.
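A minimal sketch of that layout. Only the two values referenced in this post (the MMIO base and the RAM base) are taken from the text; the macro names merely follow the style of arm/include/arm-common/kvm-arch.h and may differ between kvmtool versions.

    /* Guest physical address map kvmtool uses for ARM guests (sketch). */
    #define ARM_MMIO_AREA     0x0000000003000000UL  /* virtio-mmio windows start at 48M */
    #define ARM_MEMORY_AREA   0x0000000080000000UL  /* guest RAM (GPA) starts here      */

    /* With -m 1024 the guest RAM therefore spans
     * [ARM_MEMORY_AREA, ARM_MEMORY_AREA + 1024*1024*1024 - 1] = [0x80000000, 0xBFFFFFFF]. */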
When the guest is started with the following command, the "physical" memory (GPA) range it uses is 0x80000000 ~ 0xBFFFFFFF:
    lkvm run -k Image --console virtio -i rootfs.cpio.gz -c 2 -m 1024 -d /dev/ram0 --vsock 3

As the guest boot log below shows,
kvmtool also has to allocate MMIO space, IRQs and so on for the virtio devices; the MMIO space starts at GPA 0x3000000 (the 48M position in the guest memory layout):
    Info: virtio-mmio.devices=0x200@0x3000000:100
    Info: virtio-mmio.devices=0x200@0x3000200:101
    Info: virtio-mmio.devices=0x200@0x3000400:102
    Info: virtio-mmio.devices=0x200@0x3000600:103

Each entry has the form <size>@<base GPA>:<irq>, i.e. every virtio-mmio device gets a 0x200-byte register window and an interrupt line.

A GPA is a fake physical address that exists only inside the guest; every guest access to a GPA ultimately has to reach an HVA. kvmtool therefore backs the GPA range 0x80000000 ~ 0xBFFFFFFF with an HVA range of the same length, allocated with mmap or hugetlbfs (kvmtool: arm/kvm.c: kvm__arch_init). Once that is set up we end up with the data shown below, and the GPA/HVA pair is registered in kvmtool's mem_banks list (kvmtool: kvm.c: kvm__register_mem):
    kvm__init
      kvm__arch_init
        kvm->ram_size  = ram_size;
        kvm->ram_start = mmap_hugetlbfs(kvm, hugetlbfs_path, size);
                         /* or */ mmap(NULL, size, PROT_RW, MAP_ANON_NORESERVE, -1, 0);
        madvise(kvm->ram_start, kvm->ram_size, MADV_MERGEABLE); /* KSM may merge identical pages */
      kvm__init_ram
        phys_size = kvm->ram_size;
        host_mem  = kvm->ram_start;
        kvm__register_ram(kvm, phys_start, phys_size, host_mem);
        printf("GPA: %llx mapped to HVA %p with size %llx\n",
               kvm->arch.memory_guest_start, kvm->ram_start, kvm->ram_size);

For example:

    GPA: 80000000 mapped to HVA: 0x7f73a00000 with size 40000000

mem_banks is mainly used by the user-space virtio device back-ends (e.g. the virtio-console back-end handling) to translate between GPA and HVA while processing guest data. In addition, kvm__register_mem calls ioctl(kvm->vm_fd, KVM_SET_USER_MEMORY_REGION, &mem) to register the same region with the in-kernel KVM module.
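For reference, here is a minimal, self-contained sketch of what that KVM_SET_USER_MEMORY_REGION registration amounts to. It uses struct kvm_userspace_memory_region from the KVM UAPI; the function name, slot number and error handling are illustrative, not kvmtool's actual code.

    #include <linux/kvm.h>
    #include <sys/ioctl.h>

    /* Sketch: register one GPA -> HVA bank with the in-kernel KVM module.
     * guest_phys_addr / host_addr / size correspond to one mem_bank. */
    static int register_ram_bank(int vm_fd, __u32 slot,
                                 __u64 guest_phys_addr, void *host_addr, __u64 size)
    {
        struct kvm_userspace_memory_region mem = {
            .slot            = slot,
            .flags           = 0,
            .guest_phys_addr = guest_phys_addr,          /* e.g. 0x80000000   */
            .memory_size     = size,                     /* e.g. 0x40000000   */
            .userspace_addr  = (unsigned long)host_addr, /* e.g. 0x7f73a00000 */
        };

        return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &mem);
    }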
For example, when the guest has console data that it wants the host to display, it fills the buffers and then writes the VIRTIO_MMIO_QUEUE_NOTIFY register of virtio-console to kick the host; the write to that I/O register causes a vm_exit into the host. The thread kvmtool registered for the virtio-console TX queue is woken up through the eventfd bound to VIRTIO_MMIO_QUEUE_NOTIFY, and the TX queue handler (the VIRTIO_CONSOLE_TX_QUEUE callback) is invoked. The notify/eventfd wiring is sketched first, followed by the handler itself.
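A minimal sketch, assuming the back-end uses KVM's ioeventfd mechanism (as kvmtool does for virtio-mmio): an eventfd is attached to the QUEUE_NOTIFY offset of the device's MMIO window, so the guest's register write wakes the back-end thread without going back through the main VCPU loop. The function names and the no-datamatch choice are illustrative, not kvmtool's actual code.

    #include <linux/kvm.h>
    #include <stdint.h>
    #include <sys/eventfd.h>
    #include <sys/ioctl.h>
    #include <unistd.h>

    #define VIRTIO_MMIO_QUEUE_NOTIFY 0x050   /* offset of the notify register */

    /* Sketch: bind an eventfd to the QUEUE_NOTIFY register of one
     * virtio-mmio device (e.g. the window at GPA 0x3000000). */
    static int attach_notify_eventfd(int vm_fd, uint64_t mmio_base)
    {
        int notify_fd = eventfd(0, 0);
        struct kvm_ioeventfd ioevent = {
            .addr  = mmio_base + VIRTIO_MMIO_QUEUE_NOTIFY,
            .len   = 4,          /* the guest performs a 32-bit write */
            .fd    = notify_fd,
            .flags = 0,          /* no datamatch: any write fires     */
        };

        if (notify_fd < 0)
            return -1;
        if (ioctl(vm_fd, KVM_IOEVENTFD, &ioevent) < 0)
            return -1;
        return notify_fd;
    }

    /* The TX-queue thread then simply blocks on the eventfd: */
    static void wait_for_kick(int notify_fd)
    {
        uint64_t count;
        read(notify_fd, &count, sizeof(count));  /* returns once the guest kicks */
    }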
The TX queue callback:

    static void virtio_console_handle_callback(struct kvm *kvm, void *param)
    {
        struct iovec iov[VIRTIO_CONSOLE_QUEUE_SIZE];
        struct virt_queue *vq;
        u16 out, in, head;
        u32 len;

        vq = param;

        /*
         * The current Linux implementation polls for the buffer
         * to be used, rather than waiting for an interrupt.
         * So there is no need to inject an interrupt for the tx path.
         */
        while (virt_queue__available(vq)) {
            /* pull the iovecs out of the tx queue */
            head = virt_queue__get_iov(vq, iov, &out, &in, kvm);
            /* print to the terminal fd: writev(term_fds[term][TERM_FD_OUT], iov, iovcnt) */
            len = term_putc_iov(iov, out, 0);
            virt_queue__set_used_elem(vq, head, len);
        }
    }

virt_queue__get_iov() -> virt_queue__get_head_iov() walks the vring desc_chain, reads the GPA out of each descriptor and converts it to an HVA:

    do {
        /* Grab the first descriptor, and check it's OK. */
        iov[*out + *in].iov_len  = virtio_guest_to_host_u32(vq, desc[idx].len);
        /* HVA = guest_flat_to_host(kvm, GPA) */
        iov[*out + *in].iov_base = guest_flat_to_host(kvm,
                                       virtio_guest_to_host_u64(vq, desc[idx].addr));
        /* If this is an input descriptor, increment that count. */
        if (virt_desc__test_flag(vq, &desc[idx], VRING_DESC_F_WRITE))
            (*in)++;
        else
            (*out)++;
    } while ((idx = next_desc(vq, desc, idx, max)) != max);

    void *guest_flat_to_host(struct kvm *kvm, u64 offset)
    {
        struct kvm_mem_bank *bank;

        list_for_each_entry(bank, &kvm->mem_banks, list) {
            u64 bank_start = bank->guest_phys_addr;
            u64 bank_end   = bank_start + bank->size;

            /* find the bank that contains this GPA */
            if (offset >= bank_start && offset < bank_end)
                /* HVA = HVA_start + (GPA - GPA_start) */
                return bank->host_addr + (offset - bank_start);
        }

        return NULL;
    }
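To make the arithmetic concrete, here is a small self-contained demo of the same bank lookup, using the values from the log line quoted above (GPA 0x80000000 mapped to HVA 0x7f73a00000, size 0x40000000). It only illustrates the logic of guest_flat_to_host; it is not kvmtool code, and the example GPA 0x80123000 is made up.

    #include <stdint.h>
    #include <stdio.h>

    struct mem_bank {
        uint64_t guest_phys_addr;  /* GPA start */
        uint64_t size;
        uint64_t host_addr;        /* HVA start */
    };

    /* Same idea as guest_flat_to_host(), restricted to a single bank. */
    static uint64_t gpa_to_hva(const struct mem_bank *b, uint64_t gpa)
    {
        if (gpa >= b->guest_phys_addr && gpa < b->guest_phys_addr + b->size)
            return b->host_addr + (gpa - b->guest_phys_addr);
        return 0;  /* GPA not backed by this bank */
    }

    int main(void)
    {
        struct mem_bank bank = {
            .guest_phys_addr = 0x80000000ULL,
            .size            = 0x40000000ULL,
            .host_addr       = 0x7f73a00000ULL,
        };

        /* A descriptor pointing at GPA 0x80123000 resolves to
         * HVA 0x7f73a00000 + 0x123000 = 0x7f73b23000. */
        printf("GPA 0x80123000 -> HVA 0x%llx\n",
               (unsigned long long)gpa_to_hva(&bank, 0x80123000ULL));
        return 0;
    }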
If the virtio device back-end runs in vhost mode, the host processes guest data requests in kernel space, so the GPA/HVA translation has to be done there as well. For that, the mem_banks are registered with the in-kernel vhost driver; the vsock device (kvmtool: virtio/vsock.c) does it like this:

    i = 0;
    list_for_each_entry(bank, &kvm->mem_banks, list) {
        mem->regions[i] = (struct vhost_memory_region) {
            .guest_phys_addr = bank->guest_phys_addr,          /* GPA start */
            .memory_size     = bank->size,
            .userspace_addr  = (unsigned long)bank->host_addr, /* HVA start */
        };
        i++;
    }
    mem->nregions = i;

    r = ioctl(vdev->vhost_fd, VHOST_SET_MEM_TABLE, mem);
    if (r != 0)
        die_perror("VHOST_SET_MEM_TABLE failed");
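The mem variable above is a struct vhost_memory with a flexible array of regions, as defined by the vhost UAPI headers. A minimal sketch of how such a table could be allocated and sized for nr_banks banks; the function name and allocation style are illustrative, not kvmtool's actual code.

    #include <linux/vhost.h>
    #include <stdlib.h>

    /* Sketch: allocate a vhost memory table big enough for nr_banks regions. */
    static struct vhost_memory *alloc_mem_table(unsigned int nr_banks)
    {
        struct vhost_memory *mem;

        mem = calloc(1, sizeof(*mem) +
                        nr_banks * sizeof(struct vhost_memory_region));
        if (!mem)
            return NULL;

        mem->nregions = 0;   /* filled in by the loop shown above */
        return mem;
    }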
The kernel-side flow that registers these mem_banks is (kernel: drivers/vhost/vhost.c):

    vhost_dev_ioctl() -> vhost_set_memory() -> vhost_iotlb_add_range()

    int vhost_iotlb_add_range(struct vhost_iotlb *iotlb, u64 start,
                              u64 last, u64 addr, unsigned int perm)
    {
        struct vhost_iotlb_map *map;

        map = kmalloc(sizeof(*map), GFP_ATOMIC);
        if (!map)
            return -ENOMEM;

        map->start = start;              /* GPA start */
        map->size  = last - start + 1;
        map->last  = last;               /* GPA end   */
        map->addr  = addr;               /* HVA start */
        map->perm  = perm;

        iotlb->nmaps++;
        vhost_iotlb_itree_insert(map, &iotlb->root);  /* insert the map into the rb-tree */

        INIT_LIST_HEAD(&map->link);
        list_add_tail(&map->link, &iotlb->list);

        return 0;
    }
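For the single RAM bank in this example, and using the GPA/HVA values from the log line quoted earlier, the resulting call would be roughly the following. This is an illustration, not a trace; the VHOST_MAP_RW permission constant comes from the kernel's vhost iotlb header.

    /* start = GPA base, last = GPA end, addr = HVA base */
    vhost_iotlb_add_range(iotlb,
                          0x80000000ULL,                      /* start              */
                          0x80000000ULL + 0x40000000ULL - 1,  /* last = 0xBFFFFFFF  */
                          0x7f73a00000ULL,                    /* addr               */
                          VHOST_MAP_RW);                      /* perm               */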
After the vhost device driver receives the guest's kick, it reads the GPAs out of the vring desc_chain, converts them to HVAs, and can then use copy_from_iter to fetch the data the guest filled in:

    tx kick func (e.g. vhost_vsock_handle_tx_kick) -> vhost_get_vq_desc() -> translate_desc()

    static int translate_desc(struct vhost_virtqueue *vq, u64 addr, u32 len,
                              struct iovec iov[], int iov_size, int access)
    {
        const struct vhost_iotlb_map *map;

        /*
         * addr is the GPA the virtio front-end driver wrote into the vring
         * desc_chain; look up the map that contains it.
         */
        map = vhost_iotlb_itree_first(umem, addr, addr + len - 1);

        /* size = map->size - offset, where offset = addr - map->start */
        size = map->size - addr + map->start;
        iov->iov_len = min((u64)len - s, size);

        /*
         * HVA = HVA_start + (GPA - GPA_start).
         * With the HVA in hand, the vhost driver can copy_from_iter()
         * the data the guest sent.
         */
        _iov->iov_base = (void __user *)(unsigned long)
                         (map->addr + addr - map->start);
    }

The reverse direction (host-to-guest data transfer) works in the same way.
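A rough sketch of how a vhost back-end might then consume those iovecs; this is illustrative rather than the actual vhost-vsock code, and the iov_iter direction constant is ITER_SOURCE on recent kernels (older kernels passed WRITE here). buf and payload_len are hypothetical.

    #include <linux/uio.h>
    #include "vhost.h"   /* drivers/vhost/vhost.h, for struct vhost_virtqueue */

    /* Sketch: copy the first payload_len bytes the guest placed in the
     * TX descriptors into a kernel buffer. vq->iov[*].iov_base already
     * holds HVAs thanks to translate_desc(). */
    static int consume_tx_payload(struct vhost_virtqueue *vq, unsigned int out,
                                  void *buf, size_t payload_len)
    {
        struct iov_iter iter;
        size_t total = iov_length(vq->iov, out);

        iov_iter_init(&iter, ITER_SOURCE, vq->iov, out, total);

        if (copy_from_iter(buf, payload_len, &iter) != payload_len)
            return -EFAULT;   /* the descriptor chain did not hold enough data */
        return 0;
    }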
As mentioned above, what the virtio front-end driver writes into the vring desc_chain is a GPA, and this part of the flow is the same for every virtio front-end driver. Taking the split vring as an example:
    virtqueue_add_sgs -> virtqueue_add -> virtqueue_add_split

    desc[i].addr = cpu_to_virtio64(_vq->vdev, addr);   /* addr is a GPA */
    desc[i].len  = cpu_to_virtio32(_vq->vdev, sg->length);

Before the front-end driver fills the GPA into the desc_chain, the guest may first have to write data into that memory (e.g. a virtio-blk write, or a virtio-net packet transmit). That write goes: write data to GVA -> GPA -> HVA -> HPA. On ARMv8 the GPA -> HPA step is performed by Stage 2 translation, whereas the GPA/HVA conversion discussed above does not involve Stage 2 translation at all. A sketch of where the GPA itself comes from on the guest side follows.

[Figure (not reproduced here): where each of these address translations takes place.]
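As an illustration of the point above, here is a rough sketch of how a virtio front-end obtains the GPA it writes into the descriptor when the device is not behind an IOMMU, so the address stored is simply the buffer's guest physical address. This mirrors what drivers/virtio/virtio_ring.c does through the scatterlist helpers, but the function below is a simplified illustration, not the kernel's actual code.

    #include <linux/scatterlist.h>
    #include <linux/virtio_ring.h>

    /*
     * Sketch: for a buffer described by a scatterlist entry, the address the
     * split-ring code stores in desc[i].addr is (with no IOMMU / DMA API
     * translation involved) just the guest physical address of the page plus
     * the offset within it.
     */
    static u64 buffer_gpa(struct scatterlist *sg)
    {
        /* sg_phys() = page_to_phys(sg_page(sg)) + sg->offset, i.e. a GPA
         * from the guest kernel's point of view */
        return (u64)sg_phys(sg);
    }

    /* ... which then lands in the descriptor as shown above:
     *   desc[i].addr = cpu_to_virtio64(vdev, buffer_gpa(sg));
     *   desc[i].len  = cpu_to_virtio32(vdev, sg->length);
     */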
總結