如何在nova中实现一个类似cinder的volume插件

原文地址:http://aspirer2004.blog.163.com/blog/static/1067647201422841039140/

github地址:https://github.com/aspirer/docfiles/raw/master/%E5%A6%82%E4%BD%95%E5%9C%A8nova%E4%B8%AD%E5%AE%9E%E7%8E%B0%E4%B8%80%E4%B8%AA%E7%B1%BB%E4%BC%BCcinder%E7%9A%84volume%E6%8F%92%E4%BB%B6.docx

OneDrive(SkyDrive)地址:http://1drv.ms/1jBTNqK

nova 兼容Netease Block Service插件实现

1.      现状说明

本文基于havana版本nova来进行代码分析和原型实现。

实现一套新的volume插件的原因是,nbs不提供兼容cinder的api,无法使用现有的cinder插件,并且libvirt volume driver也没有nbs的实现,需要重新编写代码来完成这部分工作。

已有的实现方式并不理想:通过配置参数在API层区分是nbs的volume还是cinder的volume,如果是nbs的就走新增的挂卷、卸卷等流程(nbs支持扩容卷、修改卷QoS功能)。这样就要维护很多冗余代码,从compute/api -> compute/rpcapi -> compute/manager -> libvirt driver一整条代码链都要自己实现,而其中绝大部分代码是直接拷贝cinder的实现流程,维护起来很困难;一旦要进行大版本升级就会带来很大的工作量,比如nova从F版本升级到H版本时就花费了很长时间做代码移植和测试。此外,很多涉及到卷的代码都要考虑兼容性问题,很多周边流程需要处理,导致很多与volume相关的功能在没有修改之前都无法使用。到目前为止还有很多功能没有提供,比如unshelve、在线迁移、从volume启动虚拟机等,我们仅仅实现了基本功能(创建、删除、开机、关机、resize、cold migration等)。

2.      改进目标

基于上述原因,以及更重要的一点(为后续新功能开发做准备),我们想要实现一套新的流程,目标是最大化地复用nova已有的代码,来完成挂卷、卸卷以及更多涉及卷操作的各种虚拟机生命周期管理功能。

也即达到如下目标:

  • 最大化的与nova当前的cinder流程保持兼容,可以不经改动支持现有的各种涉及到volume的操作
  • 尽量不增加新的配置项(多使用已有的配置项,但配置项的功能有少许差异),减少SA的运维工作
  • 减少私有代码量,尽量重用nova中已有实现,以减少代码维护工作量
  • 为后续涉及到volume的新功能开发做准备,比如从volume启动虚拟机,在线迁移虚拟机等功能
  • 使用更多的nova流程也可以利用社区的力量来帮我们完成一定的开发测试工作,我们也可以把自己的代码贡献到社区

 

3.      当前实现

直接拿代码来说明,这里是F版本移植H版本首次提交gerrit记录:https://scm.service.163.org/#/c/2049/18,下面截取部分代码段进行说明:

nova/api/openstack/compute/contrib/volumes.py:

    @wsgi.serializers(xml=VolumeAttachmentTemplate)
    def create(self, req, server_id, body):
        """Attach a volume to an instance."""
        # Go to our owned process if we are attaching a nbs volume
        if CONF.ebs_backend == 'nbs':
            return self._attach_nbs_volume(req, server_id, body)

 

nova/compute/api.py:

    @check_instance_lock
    @check_instance_state(vm_state=[vm_states.ACTIVE, vm_states.PAUSED,
                                    vm_states.SUSPENDED, vm_states.STOPPED,
                                    vm_states.RESIZED, vm_states.SOFT_DELETED],
                          task_state=None)
    def attach_nbs_volume(self, context, instance, volume_id):
        """Attach an existing volume to an existing instance."""
        # NOTE(vish): This is done on the compute host because we want
        #             to avoid a race where two devices are requested at
        #             the same time. When db access is removed from
        #             compute, the bdm will be created here and we will
        #             have to make sure that they are assigned atomically.

        # Raise exception if the instance is forbidden to attach volume.
        allow_attach = self.check_allow_attach(context, instance)
        if not allow_attach:
            raise exception.NbsAttachForbidden()

        # Check volume exists and is available to attach
        # FIXME(wangpan): we just deal with single attachment status now
        try:
            volume = self.nbs_api.get(context, volume_id)['volumes'][0]
        except (IndexError, KeyError, TypeError):
            raise exception.VolumeNotFound(volume_id=volume_id)

 

nova/compute/manager.py:

    @reverts_task_state
    @wrap_instance_fault
    def attach_nbs_volume(self, context, volume_id, device, instance):
        """Attach a nbs volume to an instance."""
        # TODO(wangpan): if this host is forbidden to attach nbs volume,
        #                an exception needs to be raised.
        try:
            return self._attach_nbs_volume(context, volume_id,
                                           device, instance)
        except Exception:
            with excutils.save_and_reraise_exception():
                capi = self.conductor_api
                capi.block_device_mapping_destroy_by_instance_and_volume(
                    context, instance, volume_id)

 

nova/virt/libvirt/driver.py:

    def attach_nbs_volume(self, instance_name, device, host_dev,
                          qos_info, volume_id):
        """
        Attach a nbs volume to instance, and check the device or slot is
        in-use, return retry if in-use, if need retry, the used device is
        returned, too.
        """
        target_dev = device['mountpoint'].rpartition("/")[2]

        conf = vconfig.LibvirtConfigGuestDisk()
        conf.source_type = "block"
        conf.driver_name = libvirt_utils.pick_disk_driver_name(
            self.get_hypervisor_version(), is_block_dev=True)
        conf.driver_format = "raw"

 

nova/compute/manager.py:

    def _finish_resize():
        ......
        nbs = (CONF.ebs_backend == 'nbs')
        block_device_info = self._get_instance_volume_block_device_info(
            context, instance, refresh_conn_info=True, is_nbs=nbs)

        # re-attach nbs volumes if needed.
        if nbs:
            bdms = block_device_info.get('block_device_mapping', [])
        else:
            bdms = []
        same_host = (migration['source_compute'] == migration['dest_compute'])
        # call nbs to re-attach volumes to this host if it is not resize to
        # same host.
        if nbs and bdms and not same_host:
            host_ip = utils.get_host_ip_by_ifname(CONF.host_ip_ifname)
            for bdm in bdms:

 

4.      改进实现

还是直接拿代码来说明,这里的提交是原型试验代码,有很多细节问题没有处理,但已经可以正常工作:https://scm.service.163.org/#/c/3090/

这次的实现是增加一个类似cinder的插件。主要新增了nova/volume/nbs.py模块,它模仿同一目录下cinder.py来实现,这样nova的其他代码只要把默认的volume api从cinder改为nbs,就可以直接调用;它主要调用nova/volume/nbs_client.py,这个文件之前已经实现,用来调用nbs的api、与nbs服务打交道,完成查询卷信息、挂载卷到宿主机、从宿主机上卸载卷等各种需要nbs完成的操作。另外在nova/virt/libvirt/volume.py模块里增加了LibvirtNBSVolumeDriver,这个类主要为libvirt生成挂盘所需的xml配置;由于实际把卷挂载到宿主机这一底层操作是由nbs的agent负责的,所以这部分功能不用加在这里(其他volume服务有些是需要的)。

简单来说就是主要增加nbs交互的前端模块(与nbs api交互)和半个后端模块(与libvirt交互,缺少了nbs自己维护的agent那部分功能)。

nova/volume/__init__.py:

    _volume_opts = [
        oslo.config.cfg.StrOpt('ebs_backend',
                               default='cinder',
                               help='The backend type of ebs service, '
                                    'should be nbs or cinder'),
        oslo.config.cfg.StrOpt('volume_api_class',
                               default='nova.volume.cinder.API',
                               help='The full class name of the '
                                    'volume API class to use'),
    ]
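下面是一段示意性的选择逻辑,说明nova/volume/__init__.py中的API()工厂如何根据volume_api_class配置项加载对应的实现类(与本文后面分析nova-cinder交互流程时引用的代码是同一机制,此处仅为便于理解的简化示例;把volume_api_class配置为nova.volume.nbs.API即可切换到nbs插件):

    # 简化示例:根据配置项动态加载volume API实现类
    # 假设_volume_opts已通过CONF.register_opts()注册
    from oslo.config import cfg

    from nova.openstack.common import importutils

    CONF = cfg.CONF


    def API():
        # volume_api_class默认为nova.volume.cinder.API,
        # 改为nova.volume.nbs.API即可让整套挂卷/卸卷流程走nbs插件
        cls = importutils.import_class(CONF.volume_api_class)
        return cls()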

 

"""Handles all requests relating to volumes + nbs."""

import datetime

from nova.db import base
from nova import exception
from nova.openstack.common.gettextutils import _
from nova.openstack.common import jsonutils
from nova.openstack.common import log as logging
from nova.volume import nbs_client


LOG = logging.getLogger(__name__)


NBS_CLIENT = None


def nbsclient():
    global NBS_CLIENT
    if NBS_CLIENT is None:
        NBS_CLIENT = nbs_client.API()

    return NBS_CLIENT

 

 

def _untranslate_volume_summary_view(context, vol):
    """Maps keys for volumes summary view."""
    d = {}
    d['id'] = vol['volumeId']
    d['status'] = vol['status']
    d['size'] = vol['size']
    d['availability_zone'] = vol['availabilityZone']
    created_at = long(vol['createTime']) / 1000
    created_at = datetime.datetime.utcfromtimestamp(created_at)
    created_at = created_at.strftime("%Y-%m-%d %H:%M:%S")
    d['created_at'] = created_at

    d['attach_time'] = ""
    d['mountpoint'] = ""

    if vol['attachments']:
        att = vol['attachments'][0]
        d['attach_status'] = att['status']
        d['instance_uuid'] = att['instanceId']
        d['mountpoint'] = att['device']
        d['attach_time'] = att['attachTime']
    else:
        d['attach_status'] = 'detached'

    d['display_name'] = vol['volumeName']
    d['display_description'] = vol['volumeName']

    # FIXME(wangpan): all nbs volumes are 'share' type, so we fix here to 0
    d['volume_type_id'] = 0
    d['snapshot_id'] = vol['snapshotId']

    # NOTE(wangpan): nbs volumes don't have metadata attribute
    d['volume_metadata'] = {}
    # NOTE(wangpan): nbs volumes don't have image metadata attribute now

    return d
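为了直观说明上面的key映射关系,下面给出一个示意性的转换示例;nbs返回的字段值均为虚构,仅用来展示_untranslate_volume_summary_view的输入输出形式:

    # 假设nbs api返回的单个卷信息(字段值为虚构示例)
    nbs_vol = {
        'volumeId': 'vol-0001',
        'status': 'in-use',
        'size': 10,
        'availabilityZone': 'nova',
        'createTime': '1393430400000',        # 毫秒时间戳
        'attachments': [{'status': 'attached',
                         'instanceId': 'inst-uuid-xxxx',
                         'device': '/dev/vdb',
                         'attachTime': '2014-02-27 00:00:00'}],
        'volumeName': 'data-disk',
        'snapshotId': None,
    }

    # 转换后得到nova内部使用的卷视图,大致为:
    # {'id': 'vol-0001', 'status': 'in-use', 'size': 10,
    #  'mountpoint': '/dev/vdb', 'attach_status': 'attached', ...}
    translated = _untranslate_volume_summary_view(None, nbs_vol)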

 

 

class API(base.Base):
    """API for interacting with the volume manager."""

    def get(self, context, volume_id):
        item = nbsclient().get(context, volume_id)['volumes'][0]
        return _untranslate_volume_summary_view(context, item)

    def get_all(self, context, search_opts={}):
        items = nbsclient().get(context)['volumes']
        rval = []

        for item in items:
            rval.append(_untranslate_volume_summary_view(context, item))

        return rval

 

    def check_attached(self, context, volume):
        """Raise exception if volume not in use."""
        if volume['status'] != "in-use":
            msg = _("status must be 'in-use'")
            raise exception.InvalidVolume(reason=msg)

    def check_attach(self, context, volume, instance=None):
        # TODO(vish): abstract status checking?
        if volume['status'] != "available":
            msg = _("status must be 'available'")
            raise exception.InvalidVolume(reason=msg)
        if volume['attach_status'] in ("attached", "attachedInVM"):
            msg = _("already attached")
            raise exception.InvalidVolume(reason=msg)

    def check_detach(self, context, volume):
        # TODO(vish): abstract status checking?
        if volume['status'] == "available":
            msg = _("already detached")
            raise exception.InvalidVolume(reason=msg)
        if volume['attach_status'] not in ("attached", "attachedInVM"):
            msg = _("volume not attached")
            raise exception.InvalidVolume(reason=msg)

 

    def reserve_volume(self, context, volume_id):
        """We do not need to reserve nbs volume now."""
        pass

    def unreserve_volume(self, context, volume_id):
        """We do not need to unreserve nbs volume now."""
        pass

    def begin_detaching(self, context, volume_id):
        """We do not need to notify nbs begin detaching volume now."""
        pass

    def roll_detaching(self, context, volume_id):
        """We do not need to roll detaching nbs volume now."""
        pass

    def attach(self, context, volume_id, instance_uuid, mountpoint):
        """We do not need to change volume state now.

        We implement this operation in volume driver.
        """
        pass

    def post_attach(self, context, volume_id, instance_uuid,
                    mountpoint, host_ip):
        """Tell NBS manager attachment success."""
        device = jsonutils.loads(mountpoint)
        return nbsclient().notify_nbs_libvirt_result(context, volume_id,
            'attach', True, device=device['real_path'],
            host_ip=host_ip, instance_uuid=instance_uuid)

    def detach(self, context, volume_id):
        """We do not need to change volume state now.

        We implement this operation in volume driver.
        """
        pass

 

    def initialize_connection(self, context, volume_id, connector):
        """We do attachment of nbs volume to host in the method."""
        instance_uuid = connector['instance_uuid']
        host_ip = connector['ip']
        device = jsonutils.loads(connector['device'])
        real_path = device['real_path']

        result = nbsclient().attach(context, volume_id, instance_uuid,
                                    host_ip, real_path)
        if result is None:
            raise exception.NbsException()

        # check volume status, wait for nbs attaching finish
        succ = nbsclient().wait_for_attached(context, volume_id,
                                             instance_uuid)
        if not succ:
            raise exception.NbsTimeout()

        # get host dev path and QoS params from nbs
        return nbsclient().get_host_dev_and_qos_info(
            context, volume_id, host_ip)

    def terminate_connection(self, context, volume_id, connector):
        """We do detachment of nbs volume from host in the method."""
        host_ip = connector['ip']
        return nbsclient().detach(context, volume_id, host_ip)

 

    def migrate_volume_completion(self, context, old_volume_id, new_volume_id,
                                  error=False):
        raise NotImplementedError()

    def create(self, context, size, name, description, snapshot=None,
               image_id=None, volume_type=None, metadata=None,
               availability_zone=None):
        """We do not support create nbs volume now."""
        raise NotImplementedError()

    def delete(self, context, volume_id):
        """We do not support delete nbs volume now."""
        raise NotImplementedError()

    def update(self, context, volume_id, fields):
        raise NotImplementedError()

    def get_snapshot(self, context, snapshot_id):
        """We do not support nbs volume snapshot now."""
        raise NotImplementedError()

    def get_all_snapshots(self, context):
        """We do not support nbs volume snapshot now."""
        raise NotImplementedError()

    def create_snapshot(self, context, volume_id, name, description):
        """We do not support nbs volume snapshot now."""
        raise NotImplementedError()

    def create_snapshot_force(self, context, volume_id, name, description):
        """We do not support nbs volume snapshot now."""
        raise NotImplementedError()

    def delete_snapshot(self, context, snapshot_id):
        """We do not support nbs volume snapshot now."""
        raise NotImplementedError()

    def get_volume_encryption_metadata(self, context, volume_id):
        """We do not support for encrypting nbs volume snapshot now."""
        return {}

    def get_volume_metadata(self, context, volume_id):
        raise NotImplementedError()

    def delete_volume_metadata(self, context, volume_id, key):
        raise NotImplementedError()

    def update_volume_metadata(self, context, volume_id,
                               metadata, delete=False):
        raise NotImplementedError()

    def get_volume_metadata_value(self, volume_id, key):
        raise NotImplementedError()

    def update_snapshot_status(self, context, snapshot_id, status):
        """We do not support nbs volume snapshot now."""
        raise NotImplementedError()
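为便于理解,下面用一段简化的示意代码概括nova-compute标准挂卷流程(可对照本文后面nova\compute\manager.py中_attach_volume的分析)是如何落到上述nbs插件的各个方法上的;其中post_attach的调用位置、connector所需的额外字段都是作者补充的假设,仅用于说明各方法的分工:

    # 示意:nova-compute侧的标准挂卷流程(简化,非完整实现)
    def _attach_volume_sketch(self, context, volume_id, mountpoint, instance):
        volume = self.volume_api.get(context, volume_id)        # nbs.API.get
        self.volume_api.check_attach(context, volume)           # 状态检查
        self.volume_api.reserve_volume(context, volume_id)      # nbs下为空操作

        # 标准connector只有ip/initiator/host等字段,
        # nbs还要求带上instance_uuid、device等信息(此处省略组装过程)
        connector = self.driver.get_volume_connector(instance)

        # nbs.API.initialize_connection:真正调用nbs把卷挂到宿主机,
        # 并返回宿主机上的设备路径及QoS信息
        connection_info = self.volume_api.initialize_connection(
            context, volume_id, connector)

        # LibvirtNBSVolumeDriver.connect_volume据此生成磁盘xml并挂给虚拟机
        self.driver.attach_volume(connection_info, instance['name'], mountpoint)

        # 挂载成功后通知nbs(假设在此处调用post_attach)
        self.volume_api.post_attach(context, volume_id, instance['uuid'],
                                    mountpoint, connector['ip'])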

 

nova/virt/libvirt/driver.py:

    cfg.ListOpt('libvirt_volume_drivers',
                default=[
                    'iscsi=nova.virt.libvirt.volume.LibvirtISCSIVolumeDriver',
                    'iser=nova.virt.libvirt.volume.LibvirtISERVolumeDriver',
                    'local=nova.virt.libvirt.volume.LibvirtVolumeDriver',
                    'fake=nova.virt.libvirt.volume.LibvirtFakeVolumeDriver',
                    'rbd=nova.virt.libvirt.volume.LibvirtNetVolumeDriver',
                    'sheepdog=nova.virt.libvirt.volume.LibvirtNetVolumeDriver',
                    'nfs=nova.virt.libvirt.volume.LibvirtNFSVolumeDriver',
                    'aoe=nova.virt.libvirt.volume.LibvirtAOEVolumeDriver',
                    'glusterfs=nova.virt.libvirt.volume.LibvirtGlusterfsVolumeDriver',
                    'fibre_channel=nova.virt.libvirt.volume.LibvirtFibreChannelVolumeDriver',
                    'scality=nova.virt.libvirt.volume.LibvirtScalityVolumeDriver',
                    'nbs=nova.virt.libvirt.volume.LibvirtNBSVolumeDriver',
                ],
                help='Libvirt handlers for remote volumes.'),

 

nova/virt/libvirt/volume.py:

    class LibvirtNBSVolumeDriver(LibvirtBaseVolumeDriver):
        """Driver to attach NetEase Block Service volume to libvirt."""

        def __init__(self, connection):
            """Create back-end to NBS."""
            super(LibvirtNBSVolumeDriver,
                  self).__init__(connection, is_block_dev=False)

        def connect_volume(self, connection_info, disk_info):
            """Returns xml for libvirt."""
            conf = super(LibvirtNBSVolumeDriver,
                         self).connect_volume(connection_info,
                                              disk_info)

            conf.source_type = 'block'
            conf.source_path = connection_info['host_dev']
            conf.slot = disk_info['device']['slot']

            return conf

        def disconnect_volume(self, connection_info, disk_dev):
            """Disconnect the volume."""
            pass
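下面给出一个示意性的调用示例,说明LibvirtNBSVolumeDriver期望的输入数据形式;connection_info、disk_info中的字段值均为假设,真实内容由nbs的initialize_connection(get_host_dev_and_qos_info)以及nova的磁盘映射逻辑给出:

    # 假设initialize_connection返回的connection_info(字段值为虚构)
    connection_info = {
        'driver_volume_type': 'nbs',          # 据此路由到LibvirtNBSVolumeDriver
        'host_dev': '/dev/nbs/volume-0001',   # nbs agent在宿主机上生成的块设备
        'serial': 'vol-0001',
    }
    disk_info = {'bus': 'virtio', 'dev': 'vdb', 'type': 'disk',
                 'device': {'slot': 5}}       # 带slot号挂载是nbs新增的功能

    conf = driver.connect_volume(connection_info, disk_info)
    # conf.to_xml()大致会生成如下磁盘定义(示意):
    # <disk type="block" device="disk">
    #   <source dev="/dev/nbs/volume-0001"/>
    #   <target bus="virtio" dev="vdb"/>
    # </disk>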

 

5.      剩余工作

剩下的工作主要包括:

  • 完善nbs插件代码:主要是要确认不同操作的时候试验代码能否满足要求,比如卸载nbs卷操作是否能用试验代码来完成
  • 清理旧的实现:包括挂载卷、卸载卷两个主要功能,以及较多的保证兼容性的冗余代码
  • 细节完善:包括为nbs新增的部分功能(带slot号挂载卷、支持卷的QoS设置、相关通知操作等),以及补充相关bug修复代码到nova的各种卷操作流程(如防止频繁挂卸载卷等)
  • 外围功能验证:支持带nbs情况下resize、离线迁移、强制重启、查询虚拟机详细信息可显示挂载的卷等功能
  • 兼容性验证及完善:支持带nbs情况下的其他功能如shelve、resume等等,以及已有nbs卷的兼容性问题(要支持虚拟机上挂载已有卷的情况下,使用修改后的代码完成各种生命周期操作,以及卸载卷操作等)

 

云计算与虚拟化

PPT下载地址:

github: https://github.com/aspirer/docfiles/blob/master/%E4%BA%91%E8%AE%A1%E7%AE%97%E4%B8%8E%E8%99%9A%E6%8B%9F%E5%8C%96.pptx?raw=true

内容:
  • 什么是云计算
  • 云计算发展现状
  • 云计算面对的问题
  • 什么是虚拟化
  • 主流虚拟化技术
  • 虚拟化中间件
  • 虚拟化技术面对的问题

 

之后又补充了和VMware的对比内容:

OpenStack vs VMware

PPT下载地址:

基于qemu guest agent的openstack kvm虚拟机监控

通过libvirt在宿主机上获取kvm虚拟机内部的运行数据信息,用python编写,使用了libvirt_qemu和libvirt两个python库;除了qemu guest agent之外,不需要在虚拟机内部运行其他agent,即可获取大多数运维所需的监控项(监控项列表之后附有一个最小的调用示例)。
别的不多说,直接上源码。
github源码:
qemu guest agent相关patch(获取磁盘设备的realpath,以及获取指定目录所在磁盘的空间信息):
如果你的虚拟机是debian os,并且在虚拟机内安装了1.5.0+dfsg-5版本的qemu guest agent,可以直接用下面这个可执行的二进制文件替换掉/usr/sbin/qemu-ga,即可使用这两个patch的功能。
支持的监控项包括:
<metric name="cpuUsage" unit="Percent"/>
<metric name="memUsage" unit="Megabytes"/>
<metric name="networkReceive" unit="Kilobytes/Second"/>
<metric name="networkTransfer" unit="Kilobytes/Second"/>
<metric name="diskUsage" unit="Megabytes"/>
<metric name="diskReadRequest" unit="Count/Second"/>
<metric name="diskWriteRequest" unit="Count/Second"/>
<metric name="diskReadRate" unit="Kilobytes/Second"/>
<metric name="diskWriteRate" unit="Kilobytes/Second"/>
<metric name="diskReadDelay" unit="Milliseconds/Count"/>
<metric name="diskWriteDelay" unit="Milliseconds/Count"/>
<metric name="diskPartition" unit="all partitions infos"/>
<metric name="loadavg_5" unit="Percent"/>
<metric name="memUsageRate" unit="Percent"/>
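下面是获取这类监控数据方式的一个最小示例(示意代码,虚拟机名称与监控命令均为假设),展示如何通过libvirt_qemu.qemuAgentCommand向虚拟机内的qemu-ga发送命令并取回结果:

    import json

    import libvirt
    import libvirt_qemu

    # 连接本机libvirt并找到目标虚拟机(名称为假设)
    conn = libvirt.open('qemu:///system')
    dom = conn.lookupByName('instance-00000611')

    # 向qemu-ga发送guest-ping,超时10秒;返回值为JSON字符串
    ret = libvirt_qemu.qemuAgentCommand(
        dom, json.dumps({'execute': 'guest-ping'}), 10, 0)
    print(json.loads(ret))   # {u'return': {}} 表示agent存活

    # 其他监控项同理,例如获取虚拟机内的网络接口信息:
    ret = libvirt_qemu.qemuAgentCommand(
        dom, json.dumps({'execute': 'guest-network-get-interfaces'}), 10, 0)
    print(json.loads(ret)['return'])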

qemu-monitor-command & qemu monitor in openstack

instance-name.monitor的用途
首先我们注意到openstack中创建的每台KVM虚拟机,都会在/var/lib/libvirt/qemu/目录下生成一个instance-name.monitor socket文件
这个文件的用途是什么?
libvirt.xml里面以及用virsh dumpxml命令都看不到这个文件的配置,它是怎么创建出来的?
wangpan@xxyyzz8:~$ sudo ls /var/lib/libvirt/qemu/ -l
srwxr-xr-x 1 libvirt-qemu kvm             0 Aug 30  2012 instance-000000a4.monitor
srwxr-xr-x 1 libvirt-qemu kvm             0 Sep 17  2012 instance-000001ec.monitor
srwxr-xr-x 1 libvirt-qemu kvm             0 Oct  9  2012 instance-00000217.monitor
用途:用来查看qemu运行状态以及与qemu进行交互,管理虚拟机,提供类似qemu命令行方式启动虚拟机的与虚拟机的交互操作
怎么创建的:可以查看libvirt的运行时配置文件/var/run/libvirt/qemu/instance-name.xml,打开可以看到如下内容:
<monitor path='/var/lib/libvirt/qemu/instance-00000611.monitor' json='1' type='unix'/>
但是我们在openstack生成的libvirt.xml中又看不到这项配置,那它到底是怎么生成的?
看了libvirt的源码,在创建虚拟机的qemuProcessStart()中会先准备qemu监控字符设备(monitor chardev):
VIR_DEBUG("Preparing monitor state");
if (qemuProcessPrepareMonitorChr(driver, priv->monConfig, vm->def->name) < 0)
    goto cleanup;
这里就会增加monitor字符设备,类型为unix domain socket,文件路径为libvirt libdir+vm-name+.monitor,具体代码如下:
int
qemuProcessPrepareMonitorChr(struct qemud_driver *driver,
                             virDomainChrSourceDefPtr monConfig,
                             const char *vm)
{
    monConfig->type = VIR_DOMAIN_CHR_TYPE_UNIX;
    monConfig->data.nix.listen = true;
    if (virAsprintf(&monConfig->data.nix.path, "%s/%s.monitor",
                    driver->libDir, vm) < 0) {
        virReportOOMError();
        return -1;
    }
    return 0;
}
最后libvirt生成qemu命令行参数的时候会把这个配置加进去:
VIR_DEBUG("Building emulator command line");
if (!(cmd = qemuBuildCommandLine(conn, driver, vm->def, priv->monConfig,
                                 priv->monJSON != 0, priv->qemuCaps,
                                 migrateFrom, stdin_fd, snapshot, vmop)))
    goto cleanup;
qemuBuildCommandLine():
……
if (monitor_chr) { /* monitor_chr就是priv->monConfig,所以这里走if分支 */
    char *chrdev;
    /* Use -chardev if it's available */
    if (qemuCapsGet(qemuCaps, QEMU_CAPS_CHARDEV)) {
        virCommandAddArg(cmd, "-chardev");
        if (!(chrdev = qemuBuildChrChardevStr(monitor_chr, "monitor",
                                              qemuCaps)))
            goto error;
        virCommandAddArg(cmd, chrdev);
        VIR_FREE(chrdev);
        virCommandAddArg(cmd, "-mon");
        virCommandAddArgFormat(cmd,
                               "chardev=charmonitor,id=monitor,mode=%s",
                               monitor_json ? "control" : "readline");
    } else {
        const char *prefix = NULL;
        if (monitor_json)
            prefix = "control,";
        virCommandAddArg(cmd, "-monitor");
        if (!(chrdev = qemuBuildChrArgStr(monitor_chr, prefix)))
            goto error;
        virCommandAddArg(cmd, chrdev);
        VIR_FREE(chrdev);
    }
}
最终生成的qemu命令行信息为:
-chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/instance-00000611.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control
使用qemu-monitor-command查询、管理虚拟机
libvirt提供了qemu-monitor-command命令行及API,下面以命令行方式为例进行介绍:
$ sudo virsh help qemu-monitor-command
NAME
  qemu-monitor-command - QEMU Monitor Command
SYNOPSIS
  qemu-monitor-command <domain> [--hmp] {[--cmd] <string>}...
DESCRIPTION
  QEMU Monitor Command
OPTIONS
  [--domain] <string>  domain name, id or uuid
  --hmp                command is in human monitor protocol
  [--cmd] <string>     command
$ sudo virsh qemu-monitor-command ${instance_name} --hmp 'info commands'
info balloon   show balloon information
info block   show the block devices
info blockjobs   show progress of ongoing block device operations
info blockstats   show block device statistics
info capture   show capture information
info chardev   show the character devices
info cpus   show infos for each CPU
info history   show the command line history
info irq   show the interrupts statistics (if available)
info jit   show dynamic compiler info
info kvm   show KVM information
info mem   show the active virtual memory mappings
info mice   show which guest mouse is receiving events
info migrate   show migration status
info mtree   show memory tree
info name   show the current VM name
info network   show the network state
info numa   show NUMA information
info pci   show PCI info
info pcmcia   show guest PCMCIA status
info pic   show i8259 (PIC) state
info profile   show profiling information
info qdm   show qdev device model list
info qtree   show device tree
info registers   show the cpu registers
info roms   show roms
info snapshots   show the currently saved VM snapshots
info spice   show the spice server status
info status   show the current VM status (running|paused)
info tlb   show virtual to physical memory mappings
info traceevents   show available traceevents & their state
info usb   show guest USB devices
info usbhost   show host USB devices
info usernet   show user network stack connection states
info uuid   show the current VM UUID
info version   show the version of QEMU
info vnc   show the vnc server status
$ sudo virsh qemu-monitor-command instance-00000611 --hmp 'info vnc'
Server:
     address: 114.113.199.8:5900
        auth: none
Client: none
还有一些其他的管理类的操作,这里不一一列举,可参考:
另外在openstack Folsom版本中没有看到有用到这个monitor,没有看到这个Libvirt API被调用,不知道后续会不会有用处?个人感觉应该不会用到,因为这个monitor是libvirt自己生成的,跟openstack其实没有关系。

基于Qemu guest agent的kvm虚拟机监控调研

原文地址:http://aspirer2004.blog.163.com/blog/static/106764720136845622508/

文档地址:

Qemu guest agent调研

目录:

1. 原理分析
2. 实现程度
  2.1 已有功能
  2.2 功能扩展方式
3. 社区活跃度
4. 实现监控方案的可行性
  4.1 监控方案现状
  4.2 当前方案存在的问题
  4.3 采用qga方式的监控方案
  4.4 改为qga方式需要做的工作
  4.5 qga方式存在的问题
5. 对网易私有云项目的其他用途
  5.1 云主机内部操作系统状态检查
  5.2 冻结云主机内部文件系统
6. 类似agent比较
  6.1 ovirt-guest-agent

本文主要是对Qemu guest agent调研结果的总结,主要内容如下:

  • Qemu guest agent的原理
  • Qemu guest agent的实现程度
  • Qemu guest agent的社区活跃度
  • 基于Qemu guest agent实现监控方案的可行性分析
  • Qemu guest agent对网易私有云项目的其他用途
  • Qemu guest agent风险分析

为了叙述方便,下文中将用qga来代替Qemu guest agent。

  1. 原理分析

qga是一个运行在虚拟机内部的普通应用程序(可执行文件名称默认为qemu-ga,服务名称默认为qemu-guest-agent),其目的是实现一种宿主机和虚拟机进行交互的方式,这种方式不依赖于网络,而是依赖于virtio-serial(默认首选方式)或者isa-serial,而QEMU则提供了串口设备的模拟及数据交换的通道,最终呈现出来的是一个串口设备(虚拟机内部)和一个unix socket文件(宿主机上)。

qga通过读写串口设备与宿主机上的socket通道进行交互,宿主机上可以使用普通的unix socket读写方式对socket文件进行读写,最终实现与qga的交互,交互的协议与qmp(QEMU Monitor Protocol)相同(简单来说就是使用JSON格式进行数据交换),串口设备的速率通常都较低,所以比较适合小数据量的交换。

QEMU virtio串口设备模拟参数:

/usr/bin/kvm(QEMU) \
    ...... \
    -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x6 \
    -device isa-serial,chardev=charserial1,id=serial1 \
    -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/test.agent,server,nowait \
    -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.163.spice.0

通过上面的参数就可以在宿主机上生成一个unix socket文件,路径为:/var/lib/libvirt/qemu/test.agent,同时在虚拟机内部生成一个serial设备,名字为com.163.spice.0,设备路径为:/dev/vport0p1,映射出来的可读性比较好的路径为:/dev/virtio-ports/com.163.spice.0,可以在运行qga的时候通过-p参数指定读写这个设备。

Libvirt支持QEMU串口相关配置,所以上述参数已经可以通过libvirt进行配置,且更简单直观,配置方式如下:

<channel type='unix'>
  <source mode='bind' path='/var/lib/libvirt/qemu/test.agent'/>
  <target type='virtio' name='com.163.spice.0'/>
</channel>

需要注意的是libvirt-qemu:kvm用户要有权限读写'/var/lib/libvirt/qemu/test.agent'这个socket文件。
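按照上面的通道配置,宿主机侧可以直接读写该unix socket与qga交互;下面是一个最小的示意脚本(socket路径沿用上例,以guest-ping命令为例),展示JSON请求/响应的交互形式:

    import json
    import socket

    # 连接libvirt为该虚拟机创建的unix socket(路径沿用上文示例)
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.connect('/var/lib/libvirt/qemu/test.agent')
    sock.settimeout(5)

    # 发送guest-ping命令,协议与qmp相同:一行一个JSON对象
    sock.sendall(json.dumps({'execute': 'guest-ping'}) + '\n')

    # 读取响应,正常情况下返回 {"return": {}}
    resp = sock.recv(4096)
    print(json.loads(resp))

    sock.close()

注意:如后文"qemu guest agent研究"一节所述,如果target的name配置成org.qemu.guest_agent.0,libvirt自身会连接这个socket,外部脚本就可能读不到响应。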

参考资料:http://wiki.qemu.org/Features/QAPI/GuestAgenthttp://wiki.libvirt.org/page/Qemu_guest_agent

  2. 实现程度

2.1 已有功能

目前qga最新版本为1.5.50,linux已经实现下面的所有功能,windows仅支持加*的那些功能:

  • guest-sync-delimited*:宿主机发送一个int数字给qga,qga返回这个数字,并且在后续返回字符串响应中加入ascii码为0xff的字符,其作用是检查宿主机与qga通信的同步状态,主要用在宿主机上多客户端与qga通信的情况下客户端间切换过程的状态同步检查,比如有两个客户端A、B,qga发送给A的响应,由于A已经退出,目前B连接到qga的socket,所以这个响应可能被B收到,如果B连接到socket之后,立即发送该请求给qga,响应中加入了这个同步码就能区分是A的响应还是B的响应;在qga返回宿主机客户端发送的int数字之前,qga返回的所有响应都要忽略;
  • guest-sync*:与上面相同,只是不在响应中加入0xff字符;
  • guest-ping*:Ping the guest agent, a non-error return implies success;
  • guest-get-time*:获取虚拟机时间(返回值为相对于1970-01-01 in UTC,Time in nanoseconds.);
  • guest-set-time*:设置虚拟机时间(输入为相对于1970-01-01 in UTC,Time in nanoseconds.);
  • guest-info*:返回qga支持的所有命令;
  • guest-shutdown*:关闭虚拟机(支持halt、powerdown、reboot,默认动作为powerdown);
  • guest-file-open:打开虚拟机内的某个文件(返回文件句柄);
  • guest-file-close:关闭打开的虚拟机内的文件;
  • guest-file-read:根据文件句柄读取虚拟机内的文件内容(返回base64格式的文件内容);
  • guest-file-write:根据文件句柄写入文件内容到虚拟机内的文件;
  • guest-file-seek:Seek to a position in the file, as with fseek(), and return the current file position afterward. Also encapsulates ftell()'s functionality, just set offset=0, whence=SEEK_CUR;
  • guest-file-flush:Write file changes buffered in userspace to disk/kernel buffers;
  • guest-fsfreeze-status:Get guest fsfreeze state. error state indicates;
  • guest-fsfreeze-freeze:Sync and freeze all freezable, local guest filesystems;
  • guest-fsfreeze-thaw:Unfreeze all frozen guest filesystems;
  • guest-fstrim:Discard (or "trim") blocks which are not in use by the filesystem;
  • guest-suspend-disk*:Suspend guest to disk;
  • guest-suspend-ram*:Suspend guest to ram;
  • guest-suspend-hybrid:Save guest state to disk and suspend to ram(This command requires the pm-utils package to be installed in the guest.);
  • guest-network-get-interfaces:Get list of guest IP addresses, MAC addresses and netmasks;
  • guest-get-vcpus:Retrieve the list of the guest’s logical processors;
  • guest-set-vcpus:Attempt to reconfigure (currently: enable/disable) logical processors inside the guest。

参考资料:源码

http://git.qemu.org/?p=qemu.git;a=blob;f=qga/qapi-schema.json;h=7155b7ab55fc4ef5336fd771ca06905c485fad62;hb=refs/heads/master

2.2 功能扩展方式

qga功能扩展十分方便,只需要在qapi-schema.json文件中定义好功能名称、输入输出数据类型,然后在commands-posix.c里面增加对应的功能函数即可,下面的补丁即在qga中增加一个通过statvfs获取虚拟机磁盘空间信息的功能:

diff --git a/qga/commands-posix.c b/qga/commands-posix.c
index e199738..2f42a2f 100644
--- a/qga/commands-posix.c
+++ b/qga/commands-posix.c
@@ -21,6 +21,7 @@
 #include <stdio.h>
 #include <string.h>
 #include <sys/stat.h>
+#include <sys/statvfs.h>
 #include <inttypes.h>
 #include "qga/guest-agent-core.h"
 #include "qga-qmp-commands.h"
@@ -1467,6 +1468,36 @@ void qmp_guest_fstrim(bool has_minimum, int64_t minimum, Error **err)
 }
 #endif

+GuestFileSystemStatistics *qmp_guest_get_statvfs(const char *path, Error **errp)
+{
+    int ret;
+    GuestFileSystemStatistics *fs_stat;
+    struct statvfs *buf;
+    buf = g_malloc0(sizeof(struct statvfs));
+
+    ret = statvfs(path, buf);
+    if (ret < 0) {
+       error_setg_errno(errp, errno, "Failed to get statvfs");
+       return NULL;
+    }
+
+    fs_stat = g_malloc0(sizeof(GuestFileSystemStatistics));
+    fs_stat->f_bsize = buf->f_bsize;
+    fs_stat->f_frsize = buf->f_frsize;
+    fs_stat->f_blocks = buf->f_blocks;
+    fs_stat->f_bfree = buf->f_bfree;
+    fs_stat->f_bavail = buf->f_bavail;
+    fs_stat->f_files = buf->f_files;
+    fs_stat->f_ffree = buf->f_ffree;
+    fs_stat->f_favail = buf->f_favail;
+    fs_stat->f_fsid = buf->f_fsid;
+    fs_stat->f_flag = buf->f_flag;
+    fs_stat->f_namemax = buf->f_namemax;
+
+    return fs_stat;
+}
+
 /* register init/cleanup routines for stateful command groups */
 void ga_command_state_init(GAState *s, GACommandState *cs)
 {
diff --git a/qga/qapi-schema.json b/qga/qapi-schema.json
index 7155b7a..a071c3f 100644
--- a/qga/qapi-schema.json
+++ b/qga/qapi-schema.json
@@ -638,3 +638,52 @@
 { 'command': 'guest-set-vcpus',
   'data':    {'vcpus': ['GuestLogicalProcessor'] },
   'returns': 'int' }
+
+##
+# @GuestFileSystemStatistics:
+#
+# Information about guest file system statistics.
+#
+# @f_bsize: file system block size.
+#
+# @f_frsize: fragment size.
+#
+# @f_blocks: size of fs in f_frsize units.
+#
+# @f_bfree: free blocks.
+#
+# @f_bavail: free blocks for non-root.
+#
+# @f_files: inodes.
+#
+# @f_ffree: free inodes.
+#
+# @f_favail: free inodes for non-root.
+#
+# @f_fsid: file system id.
+#
+# @f_flag: mount flags
+#
+# @f_namemax: maximum filename length.
+#
+# Since 1.5.10(NetEase)
+##
+{ 'type': 'GuestFileSystemStatistics',
+  'data': { 'f_bsize': 'int', 'f_frsize': 'int', 'f_blocks': 'int',
+            'f_bfree': 'int', 'f_bavail': 'int', 'f_files': 'int',
+            'f_ffree': 'int', 'f_favail': 'int', 'f_fsid': 'int',
+            'f_flag': 'int', 'f_namemax': 'int'} }
+
+##
+# @guest-get-statvfs:
+#
+# Get the information about guest file system statistics by statvfs.
+#
+# Returns: @GuestFileSystemStatistics.
+#
+# Since 1.5.10(NetEase)
+##
+{ 'command': 'guest-get-statvfs',
+  'data':    { 'path': 'str' },
+  'returns': 'GuestFileSystemStatistics' }
+

中间复杂的类型定义代码,以及头文件包含关系处理都由一个python脚本在编译的时候动态生成出来,这对开发人员来说是非常方便的,开发人员在扩展功能的时候只需要关注输入、输出的数据类型,以及功能的函数内容即可。

参考资料:源码及guest-get-time功能提交记录

http://git.qemu.org/?p=qemu.git;a=commitdiff;h=6912e6a94cb0a1d650271103efbc3ac2299e4fd0

  3. 社区活跃度

QEMU社区从2011年7月20号开始在QEMU代码仓库中增加qga功能,最近一次提交在2013年5月18号,总共有100多次提交记录,代码维护人员主要来自redhat和IBM,社区的活跃度不高,但是QEMU本身的提交记录从2003年至今已有27200多条,还是比较活跃的,qga的功能及代码都比较简单,也是活跃度不高的一个重要原因。

QEMU代码仓库地址:git clone git://git.qemu-project.org/qemu.git

qga代码位于QEMU代码的根目录下的qga目录中。

参考资料:代码仓库git log

  4. 实现监控方案的可行性

4.1 监控方案现状

目前云主机监控的实现方法是:在创建云主机的过程中,注入监控脚本及其配置文件、定时任务文件以及监控信息推送配置文件,共四个文件。其中监控信息推送配置文件(/etc/vm_monitor/info)由管理平台根据云主机所属用户的注册信息以及监控平台相关配置生成,并传入创建云主机的API来完成文件注入;监控脚本(/etc/vm_monitor/send_monitor_data.py)及其配置文件(/etc/vm_monitor/monitor_settings.xml)、定时任务文件(/etc/cron.d/inject_cron_job)则包含在NVS经过base64编码后的监控脚本文件inject_files.json中。

工作模式为:在root账户增加定时任务inject_cron_job,其中有一条任务为:root su -c 'python /etc/vm_monitor/send_monitor_data.py' > /dev/null 2>&1,也即每60s收集并推送一次监控信息给监控平台。

4.2 当前方案存在的问题

  • 依赖云主机内部的python解释器
  • 云主机必须存在root账户
  • 依赖NVS文件注入功能;并且为了注入这些监控文件对nova的改动也比较大,也无法与社区同步;windows镜像也会注入这些无用的文件,可能导致一些意想不到的问题;另外如果有的镜像的操作系统不在第一个分区上,则注入的监控文件会失效
  • 已经运行的云主机内部的监控相关文件更新困难,导致新监控项的添加、推送周期、推送地址等的修改也比较困难,灵活性较差
  • Nova中base64编码的注入脚本的代码可读性很差,代码更新及维护困难
  • 定位问题一般都需要登录到云主机内部进行,对于采用密钥对登录的云主机来说定位问题比较困难

4.3 采用qga方式的监控方案

首先为每个云主机增加virtio-serial的配置,这个只需要修改生成libvirt配置文件的代码即可,并且应该可以提交给社区;其次需要在虚拟机内部安装qga服务;最后需要在宿主机上新增一个服务进程(这里暂定为monitor服务),用来通过与qga交互从云主机内部获取监控信息;总的模块交互流程如下:

图表 1 云主机创建流程中的监控相关操作

monitor服务单次监控信息获取及推送流程如下:

图表 2 单次监控数据获取及推送流程图

4.4 改为qga方式需要做的工作

  • 需要扩展qga的功能,增加获取文件系统信息的功能(1人天)
  • 为qga打包的相关工作(3~5人天)
  • 需要重构监控信息获取脚本,把读本地/proc目录相关文件改为通过qga读文件(3~5人天)
  • 计算节点新增monitor服务(5~10人天)
  • nova代码修改,包括新增virtio-serial配置生成流程以及去掉原有的监控文件注入流程(3人天)
  • 已有镜像需要更新,安装qga(1人天)

4.5 qga方式存在的问题
  • 与当前方案的兼容性问题:已创建的云主机仍然采用原有的监控方式进行数据推送,新创建的云主机则采用qga方式进行监控数据的推送(可以通过检查宿主机上没有socket文件来区分新创建还是已有云主机);监控项保持不变;
  • qga升级问题:除非需要增加读取云主机内部相关文件不能获取到相关信息的监控项,否则不需要特别对qga进行升级;如果确实需要升级,可以在云主机内部配置包含新版本qga的apt相关源,通过apt方式进行升级;
  • qga稳定性问题:已经简单对其进行过测试,文件读取功能稳定性没有发现问题(每秒读取一次,连续运行超过24小时,测试代码:https://github.com/aspirer/study/blob/master/qemu-guest-agent/poll_qemu_guest_agent.py );
  • 监控项扩展问题:在不需要对qga进行升级的情况下,只需要升级monitor服务即可(已有云主机也不需要改动);如果需要同时升级qga(这种情况比较麻烦,如果不升级qga则新的监控项无法推送);
  • 安全问题:主要包括暴露在计算节点的socket文件以及暴露在云主机内部的virtio-serial,这两个文件或设备的安全性问题;外部的socket文件与云主机镜像文件类似,都是有权限控制的,普通用户无法访问;云主机内部的virtio-serial设备可能会被用户攻击,但由于其速率较低,所以对宿主机影响不大,主要的问题可能是用户会伪造监控文件,然后通过virtio-serial设备返回给monitor服务,导致报警异常,之前的监控文件注入方式比qga方式在这一问题上更严重,因为监控完全是通过注入的脚本在云主机内部获取并推送的,用户可以做任何修改,所以这个问题也应该不是问题;
  • 用户隐私保护问题:用户可能会担心我们使用qga获取云主机内部的某些敏感文件,但这个问题其实在云计算环境是无法完全避免的,即使不通过qga,我们也可以通过拷贝云主机镜像并重新创建新的云主机的方式获取云主机内部的任意文件,所以这应该也不是问题;
  • LXC支持问题:目前不支持LXC,只支持基于kvm的虚拟机。
  5. 对网易私有云项目的其他用途

5.1 云主机内部操作系统状态检查

主要是心跳获取过程可以改为主动通过qga进行查询,而不是依赖注入的定时任务进行上报,这样会更准确及时一些。

5.2 冻结云主机内部文件系统

可以用于保证快照操作的安全性*(不影响云主机正常运行,目前有潜在问题);可以通过扩展qga功能来保证文件系统一致性(sync功能)。

  6. 类似agent比较

6.1 ovirt-guest-agent

该agent的原理与qga完全相同,是redhat公司为自己的OVirt虚拟化管理平台开发的与虚拟机交互的方案,与qga不同之处在于redhat采用了python作为其guest agent的开发语言,其支持的协议也是基于JSON格式的,并且它支持部分windows系统(但是配置起来比较复杂),另外它没有提供通用的文件读写功能,对我们的监控实现来说比较麻烦。其支持的功能列表可以在其主页查看到:http://www.ovirt.org/Guest_Agent

 

参考资料:http://www.ovirt.org/Category:Ovirt_guest_agent

qemu guest agent研究

1. qemu-guest-agent虚拟机内安装:

debian: 在/etc/apt/sources.list增加一行 deb http://ftp.cn.debian.org/debian sid main,sudo apt-get update,sudo apt-get install qemu-guest-agent
ubuntu: 在/etc/apt/sources.list增加一行 deb http://free.nchc.org.tw/ubuntu/ raring main universe,sudo apt-get update,sudo apt-get install qemu-guest-agent

2.安装卡住

  原因是:如果你先修改了libvirt的配置文件,增加了virtio-serial的配置,并且name='org.qemu.guest_agent.0',那么由于/etc/init.d/qemu-guest-agent启动脚本中没有加-d参数,qemu-guest-agent会在前台启动、无法退出,导致安装卡住。解决方法是kill掉qemu-ga进程,或者先安装qemu-guest-agent,之后再修改libvirt配置。

3.libvirt配置文件

<channel type='unix'>
  <source mode='bind' path='/var/lib/libvirt/qemu/test.agent'/>
  <target type='virtio' name='com.163.spice.0'/>
</channel>

要注意path='/var/lib/libvirt/qemu/test.agent'这个路径libvirt-qemu:kvm用户要有权限进行读写,否则虚拟机会启动失败。

4.无法与宿主机通信

要输出如下内容才基本可断定配置的serial可以通信:
root@debian:~# qemu-ga -v -p /dev/virtio-ports/com.163.spice.0
1372055252.431905: debug: received EOF
1372055252.532232: debug: received EOF
1372055252.632594: debug: received EOF
1372055252.732949: debug: received EOF
否则要查找原因。
我遇到一个特别奇怪的问题,如果我按照libvirt官方配置说明中的配置,
<channel type='unix'>
  <source mode='bind' path='/var/lib/libvirt/qemu/test.agent'/>
  <target type='virtio' name='org.qemu.guest_agent.0'/>
</channel>
会导致/dev/virtio-ports/org.qemu.guest_agent.0无法用来与宿主机通信;改为com.163.guest_agent.0、org.qemu.ga.0或者其他类似的名字,甚至不写(默认名称com.redhat.spice.0)都OK。我的libvirt版本是0.9.13,qemu版本为qemu-kvm 1.1.2+dfsg-2,虚拟机内核版本为Linux debian 3.2.0-3-amd64 / Linux ubuntu 3.2.0-29-generic。
这个问题的原因是如果不改名,libvirt就会自己连接到这个socket上,所以如果你不想让libvirt连接,就得改掉默认的名称。详见http://wiki.libvirt.org/page/Qemu_guest_agent(Configure guest agent without libvirt interference)

5. 依赖的内核模块(virtio_console)

debian wheezy 3.2内核编译处理的qemu-guest-agent:https://github.com/aspirer/study/blob/master/qemu-guest-agent/qemu-ga

—————————————————————————————-

qemu编译:

apt-get install libzip-dev libsdl1.2-dev uml-utilities dh-autoreconf bridge-utils libpixman-1-dev
可选安装包(不确定是否需要):build-essential
./configure --target-list=x86_64-softmmu --prefix=/usr --localstatedir=/var --sysconfdir=/etc --enable-debug
make,或者只编译qemu-guest-agent:make qemu-ga

nova-cinder交互流程分析


原文地址:http://aspirer2004.blog.163.com/blog/static/106764720134755131463/

本文主要调研cinder与nova的交互流程,分析了自有块存储系统与nova的整合问题。

1.    Nova现有API统计

nova已经支持的块设备API可以参考http://api.openstack.org/api-ref.html中Volume Attachments,Volume Extension to Compute两个部分的说明。

操作类(所有删除操作都是异步的,需要用户自行调用查询API进行确认):

  • 创建块设备(包括从快照恢复出块设备)(可以指定块设备AZ)(需要提供用户ID)
  • 删除块设备(需要提供用户ID和块设备ID)
  • 挂载块设备(需要指定用户ID,云主机ID,块设备ID)
  • 卸载块设备(需要指定用户ID,云主机ID,块设备ID)
  • 给块设备建快照(需要提供用户ID和块设备ID)
  • 删除快照(需要提供用户ID和快照ID)

查询类:

  • 列出云主机上挂载的块设备(需要指定用户ID和云主机ID)
  • 根据云主机ID及挂载在其上的块设备ID查询挂载详细信息(需要指定用户ID,云主机ID,块设备ID)
  • 查询用户所有的块设备(需要提供用户ID)
  • 根据块设备ID查询用户某个块设备的详细信息(需要提供用户ID和块设备ID)
  • 查询用户所有的块设备快照(需要提供用户ID)
  • 查询用户所有的块设备快照详细信息(需要提供用户ID和快照ID)

 

需要新增API:

  • 扩容API(我们这边有新增API的经验,比较容易实现)

2.    Nova-Cinder交互流程分析

这里只选择两个比较典型的交互过程进行分析。

2.1     创建块设备cinder流程

创建块设备支持从快照恢复出块设备。

API URL:POST http://localhost:8774/v1.1/{tenant_id}/os-volumes

Request parameters

    Parameter    Description
    tenant_id    The unique identifier of the tenant or account.
    volume_id    The unique identifier for a volume.
    volume       A partial representation of a volume that is used to create a volume.

Create Volume Request: JSON

{
    "volume": {
        "display_name": "vol-001",
        "display_description": "Another volume.",
        "size": 30,
        "volume_type": "289da7f8-6440-407c-9fb4-7db01ec49164",
        "metadata": {"contents": "junk"},
        "availability_zone": "us-east1"
    }
}

 

Create Volume Response: JSON

{
    "volume": {
        "id": "521752a6-acf6-4b2d-bc7a-119f9148cd8c",
        "display_name": "vol-001",
        "display_description": "Another volume.",
        "size": 30,
        "volume_type": "289da7f8-6440-407c-9fb4-7db01ec49164",
        "metadata": {"contents": "junk"},
        "availability_zone": "us-east1",
        "snapshot_id": null,
        "attachments": [],
        "created_at": "2012-02-14T20:53:07Z"
    }
}
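下面是调用该API的一个示意脚本(endpoint、token、tenant_id均为假设值,请求体沿用上面的示例),便于理解请求的组织方式:

    import json

    import requests

    # 以下endpoint、token、tenant_id均为假设值
    url = 'http://localhost:8774/v1.1/%s/os-volumes' % 'demo-tenant-id'
    headers = {'X-Auth-Token': 'a-valid-keystone-token',
               'Content-Type': 'application/json'}
    body = {'volume': {'display_name': 'vol-001',
                       'display_description': 'Another volume.',
                       'size': 30}}

    resp = requests.post(url, headers=headers, data=json.dumps(body))
    print(resp.status_code)
    print(resp.json()['volume']['id'])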

 

 

# nova\api\openstack\compute\contrib\volumes.py:

VolumeController.create()

@wsgi.serializers(xml=VolumeTemplate)

@wsgi.deserializers(xml=CreateDeserializer)

def create(self, req, body):

“””Creates a new volume.”””

context = req.environ[‘nova.context’]

authorize(context)

 

if not self.is_valid_body(body, ‘volume’):

raise exc.HTTPUnprocessableEntity()

 

vol = body[‘volume’]

# 卷类型,暂时不支持,参数不传入即可

vol_type = vol.get(‘volume_type’, None)

if vol_type:

try:

vol_type = volume_types.get_volume_type_by_name(context,

vol_type)

except exception.NotFound:

raise exc.HTTPNotFound()

 

metadata = vol.get(‘metadata’, None)

# 如果要从快照恢复卷,传入要被恢复的快照ID即可

snapshot_id = vol.get(‘snapshot_id’)

 

if snapshot_id is not None:

# 从快照恢复云硬盘需要实现如下方法,self.volume_api下面会有说明

snapshot = self.volume_api.get_snapshot(context, snapshot_id)

else:

snapshot = None

 

size = vol.get(‘size’, None)

if size is None and snapshot is not None:

size = snapshot[‘volume_size’]

 

LOG.audit(_(“Create volume of %s GB”), size, context=context)

# 卷AZ信息

availability_zone = vol.get(‘availability_zone’, None)

# 云硬盘需要实现如下方法,self.volume_api下面会有说明

new_volume = self.volume_api.create(context,

size,

vol.get(‘display_name’),

vol.get(‘display_description’),

snapshot=snapshot,

volume_type=vol_type,

metadata=metadata,

availability_zone=availability_zone

)

 

# TODO(vish): Instance should be None at db layer instead of

#             trying to lazy load, but for now we turn it into

#             a dict to avoid an error.

retval = _translate_volume_detail_view(context, dict(new_volume))

result = {‘volume’: retval}

 

location = ‘%s/%s’ % (req.url, new_volume[‘id’])

 

return wsgi.ResponseObject(result, headers=dict(location=location))

 

# self.volume_api说明
self.volume_api = volume.API()

其中volume是通过from nova import volume导入的。

# nova\volume\__init__.py:
def API():
    importutils = nova.openstack.common.importutils
    cls = importutils.import_class(nova.flags.FLAGS.volume_api_class)
    return cls()

可见self.volume_api调用的所有方法都是由配置项volume_api_class决定的,默认配置是使用nova-volume的API封装类:

cfg.StrOpt('volume_api_class',
           default='nova.volume.api.API',
           help='The full class name of the volume API class to use'),

也可以改用cinder的API封装类,只要把配置改为volume_api_class=nova.volume.cinder.API即可。cinder的API封装类通过调用cinder_client库(其中封装了创建卷等方法)来访问cinder的API;云硬盘可以实现一个类似的client库,也可以直接调用已有的API来完成相同的动作(cinder_client库本身也是对cinder API调用的封装)。云硬盘可以参考nova\volume\cinder.py开发自己的API封装类供NVS使用;由于API已经开发完成,所以只是封装API,工作量应该不是很大,需要注意的主要是认证问题。
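下面是一个云硬盘API封装类的极简骨架(示意代码,模块名、方法体均为假设),用来说明"模仿nova\volume\cinder.py"大致需要实现哪些方法:

    # 示意骨架:云硬盘(此处假设模块为nova/volume/ebs.py)的volume API封装类
    # 配置volume_api_class=nova.volume.ebs.API即可启用
    class API(object):
        """云硬盘API封装,方法签名与nova/volume/cinder.py保持一致."""

        def get(self, context, volume_id):
            # 调用云硬盘自己的client/REST API查询卷信息,
            # 并转换成nova期望的字段(id/status/size/attach_status等)
            raise NotImplementedError()

        def check_attach(self, context, volume):
            # 卷状态必须是available才允许挂载
            raise NotImplementedError()

        def reserve_volume(self, context, volume):
            # 预留要挂载的卷,防止并发挂载问题
            raise NotImplementedError()

        def initialize_connection(self, context, volume, connector):
            # 返回connection_info,交给libvirt volume driver使用
            raise NotImplementedError()

        def attach(self, context, volume, instance_uuid, mountpoint):
            # 更新卷在云硬盘侧的挂载状态
            raise NotImplementedError()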

快照相关操作及查询与上述流程没有区别,只要模仿nova\volume\cinder.py即可实现。

 

 

2.2     挂载块设备cinder流程

API URL:POST http://localhost:8774/v2/{tenant_id}/servers/{server_id}/os-volume_attachments

 

Request parameters

    Parameter            Description
    tenant_id            The ID for the tenant or account in a multi-tenancy cloud.
    server_id            The UUID for the server of interest to you.
    volumeId             ID of the volume to attach.
    device               Name of the device e.g. /dev/vdb. Use "auto" for autoassign (if supported).
    volumeAttachment     A dictionary representation of a volume attachment.

Attach Volume to Server Request: JSON

{
    "volumeAttachment": {
        "volumeId": volume_id,
        "device": device
    }
}

Attach Volume to Server Response: JSON

{
    "volumeAttachment": {
        "device": "/dev/vdd",
        "serverId": "fd783058-0e27-48b0-b102-a6b4d4057cac",
        "id": "5f800cf0-324f-4234-bc6b-e12d5816e962",
        "volumeId": "5f800cf0-324f-4234-bc6b-e12d5816e962"
    }
}

需要注意的是这个API返回是同步的,但挂载卷到虚拟机是异步的。

# nova\api\openstack\compute\contrib\volumes.py:

VolumeAttachmentController.create()

@wsgi.serializers(xml=VolumeAttachmentTemplate)

def create(self, req, server_id, body):

“””Attach a volume to an instance.”””

context = req.environ[‘nova.context’]

authorize(context)

 

if not self.is_valid_body(body, ‘volumeAttachment’):

raise exc.HTTPUnprocessableEntity()

 

volume_id = body[‘volumeAttachment’][‘volumeId’]

device = body[‘volumeAttachment’].get(‘device’)

 

msg = _(“Attach volume %(volume_id)s to instance %(server_id)s”

” at %(device)s”) % locals()

LOG.audit(msg, context=context)

 

try:

instance = self.compute_api.get(context, server_id)

# nova-compute负责挂载卷到虚拟机

device = self.compute_api.attach_volume(context, instance,

volume_id, device)

except exception.NotFound:

raise exc.HTTPNotFound()

 

# The attach is async

attachment = {}

attachment[‘id’] = volume_id

attachment[‘serverId’] = server_id

attachment[‘volumeId’] = volume_id

attachment[‘device’] = device

 

# NOTE(justinsb): And now, we have a problem…

# The attach is async, so there’s a window in which we don’t see

# the attachment (until the attachment completes).  We could also

# get problems with concurrent requests.  I think we need an

# attachment state, and to write to the DB here, but that’s a bigger

# change.

# For now, we’ll probably have to rely on libraries being smart

 

# TODO(justinsb): How do I return “accepted” here?

return {‘volumeAttachment’: attachment}

 

# nova\compute\api.py:API.attach_volume()

@wrap_check_policy

@check_instance_lock

def attach_volume(self, context, instance, volume_id, device=None):

“””Attach an existing volume to an existing instance.”””

# NOTE(vish): Fail fast if the device is not going to pass. This

#             will need to be removed along with the test if we

#             change the logic in the manager for what constitutes

#             a valid device.

if device and not block_device.match_device(device):

raise exception.InvalidDevicePath(path=device)

# NOTE(vish): This is done on the compute host because we want

#             to avoid a race where two devices are requested at

#             the same time. When db access is removed from

#             compute, the bdm will be created here and we will

#             have to make sure that they are assigned atomically.

device = self.compute_rpcapi.reserve_block_device_name(

context, device=device, instance=instance)

try:

# 云硬盘需要实现的方法,也可以参考nova\volume\cinder.py

volume = self.volume_api.get(context, volume_id)

# 检测卷是否可以挂载

self.volume_api.check_attach(context, volume)

# 预留要挂载的卷,防止并发挂载问题

self.volume_api.reserve_volume(context, volume)

# RPC Cast异步调用到虚拟机所在的宿主机的nova-compute服务进行挂载

self.compute_rpcapi.attach_volume(context, instance=instance,

volume_id=volume_id, mountpoint=device)

except Exception:

with excutils.save_and_reraise_exception():

self.db.block_device_mapping_destroy_by_instance_and_device(

context, instance[‘uuid’], device)

# API在这里返回

return device

 

# nova\compute\manager.py:ComputeManager.attach_volume()

@exception.wrap_exception(notifier=notifier, publisher_id=publisher_id())

@reverts_task_state

@wrap_instance_fault

def attach_volume(self, context, volume_id, mountpoint, instance):

“””Attach a volume to an instance.”””

try:

return self._attach_volume(context, volume_id,

mountpoint, instance)

except Exception:

with excutils.save_and_reraise_exception():

self.db.block_device_mapping_destroy_by_instance_and_device(

context, instance.get(‘uuid’), mountpoint)

 

def _attach_volume(self, context, volume_id, mountpoint, instance):

# 同上面的volume_api.get方法

volume = self.volume_api.get(context, volume_id)

context = context.elevated()

LOG.audit(_(‘Attaching volume %(volume_id)s to %(mountpoint)s’),

locals(), context=context, instance=instance)

try:

# 这里返回的是initiator信息,下面有分析

connector = self.driver.get_volume_connector(instance)

# 云硬盘需要实现的方法,下面有cinder的具体实现

connection_info = self.volume_api.initialize_connection(context,

volume,

connector)

except Exception:  # pylint: disable=W0702

with excutils.save_and_reraise_exception():

msg = _(“Failed to connect to volume %(volume_id)s ”

“while attaching at %(mountpoint)s”)

LOG.exception(msg % locals(), context=context,

instance=instance)

# 这个方法也要实现

self.volume_api.unreserve_volume(context, volume)

 

if ‘serial’ not in connection_info:

connection_info[‘serial’] = volume_id

 

try:

self.driver.attach_volume(connection_info,

instance[‘name’],

mountpoint)

except Exception:  # pylint: disable=W0702

with excutils.save_and_reraise_exception():

msg = _(“Failed to attach volume %(volume_id)s ”

“at %(mountpoint)s”)

LOG.exception(msg % locals(), context=context,

instance=instance)

self.volume_api.terminate_connection(context,

volume,

connector)

# 这个方法也要实现,作用是更新cinder数据库中的卷的状态

self.volume_api.attach(context,

volume,

instance[‘uuid’],

mountpoint)

values = {

‘instance_uuid’: instance[‘uuid’],

‘connection_info’: jsonutils.dumps(connection_info),

‘device_name’: mountpoint,

‘delete_on_termination’: False,

‘virtual_name’: None,

‘snapshot_id’: None,

‘volume_id’: volume_id,

‘volume_size’: None,

‘no_device’: None}

self.db.block_device_mapping_update_or_create(context, values)

 

# nova\virt\libvirt\driver.py:LibvirtDriver.get_volume_connector()
def get_volume_connector(self, instance):
    if not self._initiator:
        self._initiator = libvirt_utils.get_iscsi_initiator()
        if not self._initiator:
            LOG.warn(_('Could not determine iscsi initiator name'),
                     instance=instance)
    return {
        'ip': FLAGS.my_ip,              # 宿主机IP地址
        'initiator': self._initiator,
        'host': FLAGS.host              # 宿主机名
    }

# nova\virt\libvirt\utils.py:get_iscsi_initiator()
def get_iscsi_initiator():
    """Get iscsi initiator name for this machine"""
    # NOTE(vish) openiscsi stores initiator name in a file that
    #            needs root permission to read.
    contents = utils.read_file_as_root('/etc/iscsi/initiatorname.iscsi')
    for l in contents.split('\n'):
        if l.startswith('InitiatorName='):
            return l[l.index('=') + 1:].strip()

 

nova中cinder API封装实现:

# nova\volume\cinder.py:API.initialize_connection():
def initialize_connection(self, context, volume, connector):
    return cinderclient(context).\
        volumes.initialize_connection(volume['id'], connector)

 

调用的是cinder中的initialize_connection,iscsi driver的实现如下:

# cinder\volume\iscsi.py:LioAdm.initialize_connection()

def initialize_connection(self, volume, connector):

volume_iqn = volume[‘provider_location’].split(‘ ‘)[1]

 

(auth_method, auth_user, auth_pass) = \

volume[‘provider_auth’].split(‘ ‘, 3)

 

# Add initiator iqns to target ACL

try:

self._execute(‘rtstool’, ‘add-initiator’,

volume_iqn,

auth_user,

auth_pass,

connector[‘initiator’],

run_as_root=True)

except exception.ProcessExecutionError as e:

LOG.error(_(“Failed to add initiator iqn %s to target”) %

connector[‘initiator’])

raise exception.ISCSITargetAttachFailed(volume_id=volume[‘id’])

 

# nova\virt\libvirt\driver.py:LibvirtDriver.attach_volume()

@exception.wrap_exception()

def attach_volume(self, connection_info, instance_name, mountpoint):

virt_dom = self._lookup_by_name(instance_name)

mount_device = mountpoint.rpartition(“/”)[2]

# 可能需要改动,下面会分析这个方法

conf = self.volume_driver_method(‘connect_volume’,

connection_info,

mount_device)

 

if FLAGS.libvirt_type == ‘lxc’:

self._attach_lxc_volume(conf.to_xml(), virt_dom, instance_name)

else:

try:

# 挂载到虚拟机上

virt_dom.attachDevice(conf.to_xml())

except Exception, ex:

if isinstance(ex, libvirt.libvirtError):

errcode = ex.get_error_code()

if errcode == libvirt.VIR_ERR_OPERATION_FAILED:

self.volume_driver_method(‘disconnect_volume’,

connection_info,

mount_device)

raise exception.DeviceIsBusy(device=mount_device)

 

with excutils.save_and_reraise_exception():

self.volume_driver_method(‘disconnect_volume’,

connection_info,

mount_device)

 

# TODO(danms) once libvirt has support for LXC hotplug,

# replace this re-define with use of the

# VIR_DOMAIN_AFFECT_LIVE & VIR_DOMAIN_AFFECT_CONFIG flags with

# attachDevice()

# 重新define一下,以间接实现持久化的挂载

domxml = virt_dom.XMLDesc(libvirt.VIR_DOMAIN_XML_SECURE)

self._conn.defineXML(domxml)

 

# nova\virt\libvirt\driver.py:LibvirtDriver.volume_driver_method()

def volume_driver_method(self, method_name, connection_info,

*args, **kwargs):

driver_type = connection_info.get(‘driver_volume_type’)

if not driver_type in self.volume_drivers:

raise exception.VolumeDriverNotFound(driver_type=driver_type)

driver = self.volume_drivers[driver_type]

method = getattr(driver, method_name)

return method(connection_info, *args, **kwargs)

def __init__():

……

self.volume_drivers = {}

for driver_str in FLAGS.libvirt_volume_drivers:

driver_type, _sep, driver = driver_str.partition(‘=’)

driver_class = importutils.import_class(driver)

self.volume_drivers[driver_type] = driver_class(self)

volume_drivers是由配置项libvirt_volume_drivers决定的,默认配置是:

cfg.ListOpt('libvirt_volume_drivers',
            default=[
                'iscsi=nova.virt.libvirt.volume.LibvirtISCSIVolumeDriver',
                'local=nova.virt.libvirt.volume.LibvirtVolumeDriver',
                'fake=nova.virt.libvirt.volume.LibvirtFakeVolumeDriver',
                'rbd=nova.virt.libvirt.volume.LibvirtNetVolumeDriver',
                'sheepdog=nova.virt.libvirt.volume.LibvirtNetVolumeDriver'
            ],
            help='Libvirt handlers for remote volumes.'),

云硬盘可以使用已有的iscsi driver,也可以参考iscsi实现自己的driver,iscsi driver的内容为:

# nova\virt\libvirt\volume.py:LibvirtISCSIVolumeDriver:

class LibvirtISCSIVolumeDriver(LibvirtVolumeDriver):

“””Driver to attach Network volumes to libvirt.”””

 

def _run_iscsiadm(self, iscsi_properties, iscsi_command, **kwargs):

check_exit_code = kwargs.pop(‘check_exit_code’, 0)

(out, err) = utils.execute(‘iscsiadm’, ‘-m’, ‘node’, ‘-T’,

iscsi_properties[‘target_iqn’],

‘-p’, iscsi_properties[‘target_portal’],

*iscsi_command, run_as_root=True,

check_exit_code=check_exit_code)

LOG.debug(“iscsiadm %s: stdout=%s stderr=%s” %

(iscsi_command, out, err))

return (out, err)

 

def _iscsiadm_update(self, iscsi_properties, property_key, property_value,

**kwargs):

iscsi_command = (‘–op’, ‘update’, ‘-n’, property_key,

‘-v’, property_value)

return self._run_iscsiadm(iscsi_properties, iscsi_command, **kwargs)

 

@utils.synchronized(‘connect_volume’)

def connect_volume(self, connection_info, mount_device):

“””Attach the volume to instance_name”””

iscsi_properties = connection_info[‘data’]

# NOTE(vish): If we are on the same host as nova volume, the

#             discovery makes the target so we don’t need to

#             run –op new. Therefore, we check to see if the

#             target exists, and if we get 255 (Not Found), then

#             we run –op new. This will also happen if another

#             volume is using the same target.

try:

self._run_iscsiadm(iscsi_properties, ())

except exception.ProcessExecutionError as exc:

# iscsiadm returns 21 for “No records found” after version 2.0-871

if exc.exit_code in [21, 255]:

self._run_iscsiadm(iscsi_properties, (‘–op’, ‘new’))

else:

raise

 

if iscsi_properties.get(‘auth_method’):

self._iscsiadm_update(iscsi_properties,

“node.session.auth.authmethod”,

iscsi_properties[‘auth_method’])

self._iscsiadm_update(iscsi_properties,

“node.session.auth.username”,

iscsi_properties[‘auth_username’])

self._iscsiadm_update(iscsi_properties,

“node.session.auth.password”,

iscsi_properties[‘auth_password’])

 

# NOTE(vish): If we have another lun on the same target, we may

#             have a duplicate login

self._run_iscsiadm(iscsi_properties, (“–login”,),

check_exit_code=[0, 255])

 

self._iscsiadm_update(iscsi_properties, “node.startup”, “automatic”)

 

host_device = (“/dev/disk/by-path/ip-%s-iscsi-%s-lun-%s” %

(iscsi_properties[‘target_portal’],

iscsi_properties[‘target_iqn’],

iscsi_properties.get(‘target_lun’, 0)))

 

# The /dev/disk/by-path/… node is not always present immediately

# TODO(justinsb): This retry-with-delay is a pattern, move to utils?

tries = 0

while not os.path.exists(host_device):

if tries >= FLAGS.num_iscsi_scan_tries:

raise exception.NovaException(_(“iSCSI device not found at %s”)

% (host_device))

 

LOG.warn(_(“ISCSI volume not yet found at: %(mount_device)s. ”

“Will rescan & retry.  Try number: %(tries)s”) %

locals())

 

# The rescan isn’t documented as being necessary(?), but it helps

self._run_iscsiadm(iscsi_properties, (“–rescan”,))

 

tries = tries + 1

if not os.path.exists(host_device):

time.sleep(tries ** 2)

 

if tries != 0:

LOG.debug(_(“Found iSCSI node %(mount_device)s ”

“(after %(tries)s rescans)”) %

locals())

 

connection_info[‘data’][‘device_path’] = host_device

sup = super(LibvirtISCSIVolumeDriver, self)

return sup.connect_volume(connection_info, mount_device)

 

@utils.synchronized(‘connect_volume’)

def disconnect_volume(self, connection_info, mount_device):

“””Detach the volume from instance_name”””

sup = super(LibvirtISCSIVolumeDriver, self)

sup.disconnect_volume(connection_info, mount_device)

iscsi_properties = connection_info[‘data’]

# NOTE(vish): Only disconnect from the target if no luns from the

#             target are in use.

device_prefix = (“/dev/disk/by-path/ip-%s-iscsi-%s-lun-” %

(iscsi_properties[‘target_portal’],

iscsi_properties[‘target_iqn’]))

devices = self.connection.get_all_block_devices()

devices = [dev for dev in devices if dev.startswith(device_prefix)]

if not devices:

self._iscsiadm_update(iscsi_properties, “node.startup”, “manual”,

check_exit_code=[0, 255])

self._run_iscsiadm(iscsi_properties, (“–logout”,),

check_exit_code=[0, 255])

self._run_iscsiadm(iscsi_properties, (‘–op’, ‘delete’),

check_exit_code=[0, 21, 255])

也即主要实现了卷挂载到宿主机和从宿主机卸载两个方法。
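作为对照,下面给出一个云硬盘libvirt volume driver的极简骨架(示意代码,类名、connection_info中的字段名均为假设),说明需要实现的两个方法的大致形态;扩容等逻辑可按前文所述复用agent中已有的代码:

    # 示意骨架:云硬盘的libvirt volume driver(folsom代码风格)
    # 需在libvirt_volume_drivers配置中增加一项,例如
    #   'ebs=nova.virt.libvirt.volume.LibvirtEBSVolumeDriver'
    class LibvirtEBSVolumeDriver(LibvirtVolumeDriver):
        """Driver to attach private EBS volumes to libvirt."""

        def connect_volume(self, connection_info, mount_device):
            # 1. 按connection_info把卷挂载到宿主机(或由云硬盘agent完成),
            #    得到宿主机上的块设备路径(host_device字段名为假设)
            connection_info['data']['device_path'] = \
                connection_info['data']['host_device']
            # 2. 交给父类生成libvirt磁盘配置
            return super(LibvirtEBSVolumeDriver,
                         self).connect_volume(connection_info, mount_device)

        def disconnect_volume(self, connection_info, mount_device):
            # 从宿主机上卸载卷(或通知云硬盘agent卸载)
            pass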

 

 

2.3     相关代码源文件

nova\volume\cinder.py源文件(云硬盘需要实现的方法或者要封装的API都在这里面):    https://github.com/openstack/nova/blob/stable/folsom/nova/volume/cinder.py

nova\virt\libvirt\volume.py源文件(云硬盘需要实现的driver可以参考这个文件):    https://github.com/openstack/nova/blob/stable/folsom/nova/virt/libvirt/volume.py

# 默认的driver映射关系,可以看出iscsi卷使用的是LibvirtISCSIVolumeDriver

cfg.ListOpt('libvirt_volume_drivers',
            default=[
                'iscsi=nova.virt.libvirt.volume.LibvirtISCSIVolumeDriver',
                'local=nova.virt.libvirt.volume.LibvirtVolumeDriver',
                'fake=nova.virt.libvirt.volume.LibvirtFakeVolumeDriver',
                'rbd=nova.virt.libvirt.volume.LibvirtNetVolumeDriver',
                'sheepdog=nova.virt.libvirt.volume.LibvirtNetVolumeDriver'
            ],
            help='Libvirt handlers for remote volumes.'),

 

cinder处理各种API请求的抽象类源文件:    https://github.com/openstack/cinder/blob/master/cinder/volume/manager.py

上述抽象类会调用不同的driver去执行实际的动作,完成API的请求。默认的volume driver是cinder.volume.drivers.lvm.LVMISCSIDriver:

cfg.StrOpt('volume_driver',
           default='cinder.volume.drivers.lvm.LVMISCSIDriver',
           help='Driver to use for volume creation'),

其源文件为:
https://github.com/openstack/cinder/blob/master/cinder/volume/drivers/lvm.py#L304

它继承了LVMVolumeDriver和driver.ISCSIDriver两个类,其中后一个类所在的源文件为:
https://github.com/openstack/cinder/blob/master/cinder/volume/driver.py#L199
https://github.com/openstack/cinder/blob/master/cinder/volume/driver.py#L339

这里的self.tgtadm是在
https://github.com/openstack/cinder/blob/master/cinder/volume/drivers/lvm.py#L321
初始化的,调用的是
https://github.com/openstack/cinder/blob/master/cinder/volume/iscsi.py#L460
中的方法。

iscsi_helper默认使用的是tgtadm:

cfg.StrOpt('iscsi_helper',
           default='tgtadm',
           help='iscsi target user-land tool to use'),

3.    需要新增的API

  • 扩容云硬盘的API(或者直接调用云硬盘已有的API,但是推荐nova新增一个,这样云硬盘就不必对外暴露任何API了,都可以经过nova来转发处理。)

 

4.    需要注意的问题

  • 之前云硬盘agent实现的一些错误恢复、异常处理逻辑需要在nova里面实现
  • 挂载点在云主机内外看到的不一致问题(因为nova挂载动作是异步的,所以返回给用户的是libvirt看到的挂载点,不是实际的虚拟机内部的挂载点,目前考虑通过查询卷信息接口返回最终的挂载点)
  • 用户及认证问题(之前云硬盘应该用的是管理平台的用户认证逻辑,如果改为使用nova接口,需要使用keystone的用户认证,不知道可否在管理平台那一层转换一下)

 

总的来说云硬盘所需要做的改动应该不大,工作重点在于封装已有的API,提供client即可(参考https://github.com/openstack/nova/blob/stable/folsom/nova/volume/cinder.py),另外driver(参考https://github.com/openstack/nova/blob/stable/folsom/nova/virt/libvirt/volume.py)里面要实现扩容逻辑,应该可以重用agent中现有的代码。

Nova image create流程

原文地址:http://aspirer2004.blog.163.com/blog/static/10676472013215111713232/

完整文档下载地址:nova镜像生成流程  nova镜像生成流程.docx

本文主要讨论nova/virt/libvirt/driver.py:_create_image的相关流程,只讨论file磁盘,不包括EBS盘(block设备)。

  1. Resize过程的镜像拷贝优化
  • 优化之前

首先通过libvirt的XMLDesc()方法拿到虚拟机的配置文件,然后从配置文件中读取所有file类型磁盘的信息(路径、driver、qcow2的backing file);如果是不同host之间的resize,则先用qemu-img convert把base和子镜像合并为一个无backing file的qcow2镜像,再通过rsync ssh方式把合并后的镜像拷贝到新host的对应instance目录下;之后只有在 if not os.path.exists(self.path) or not os.path.exists(base): 成立时才会创建镜像,而resize过程中这个self.path已经拷贝过来了,所以不需要创建镜像,也就是什么都不做。

  • 优化之后(仅优化了resize过程,创建过程与优化之前相同)

拷贝镜像是用rsync的daemon push模式,并且不合并base和子镜像,只拷贝子镜像部分,然后在目标host上检查base是否存在,不存在则下载,扩容,最后qemu-img rebase把子镜像rebase到新的base上;目前第二块盘(disk.local)以及swap盘(disk.swap)是不拷贝的,因为如果拷贝过去的仅仅是子镜像,会导致base找不到,为disk.local、disk.swap准备base镜像这部分代码没有实现,所以在拷贝子镜像过程中忽略了disk.local,disk.swap目前没有配置,所以代码里面没有忽略,如果开启了swap的配置,则resize过程会出现问题(base找不到导致虚拟机无法启动)。
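下面用一小段示意代码说明"qemu-img rebase把子镜像rebase到新的base上"这一步(函数名与参数仿照nova中libvirt_utils的风格,属于作者的假设,并非原有实现):

    from nova import utils

    def rebase_cow_image(backing_file, target):
        """把qcow2子镜像target的backing file改为backing_file(示意).

        qemu-img rebase默认工作在安全模式,要求新老backing file都存在,
        并据此重写子镜像中受影响的数据块。
        """
        utils.execute('qemu-img', 'rebase', '-b', backing_file, target,
                      run_as_root=True)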

优化后的镜像生成流程:

# nova/virt/libvirt/driver.py:LibvirtDriver._create_image()
if snapshot_optimization \
        and not self._volume_in_mapping(self.default_root_device,
                                        block_device_info):
    self._create_snapshot_image(context, instance,
                                disk_images['image_id'],
                                basepath('disk'), size)

 

# nova/virt/libvirt/driver.py:LibvirtDriver
# use optimized snapshot image
# Optimized image creation flow for the resize path
def _create_snapshot_image(self, context, instance, image_id,
                           target, size):
    # NOTE(hzwangpan): for resize operation, the 'disk' is copied from
    # source node before _create_image(), so if we fetch the 'disk' here,
    # it will cover the 'disk' copied from source
    # Only download 'disk' when it does not exist; this is leftover code
    # from the M3 "restore from snapshot" optimization.
    # An M3 snapshot contains only the COW part, i.e. 'disk', so when an
    # instance is created 'disk' is downloaded first and then its base is
    # fetched from glance by the backing file name. This is the 'disk'
    # download step; the original community download code converts the
    # image format, which we do not want, so a new fetch_orig_image()
    # method was added.
    # During resize, 'disk' already exists in the instance directory
    # because it was copied over from the source node.
    if not os.path.exists(target):
        libvirt_utils.fetch_orig_image(context=context, target=target,
                                       image_id=image_id,
                                       user_id=instance["user_id"],
                                       project_id=instance["project_id"])

    if not os.path.exists(target):
        LOG.error(_("fetch image failed, image id: %s"), image_id,
                  instance=instance, context=context)
        raise exception.CouldNotFetchImage(image_id)

    # Look up the backing file of 'disk', i.e. find its base
    backing_file = libvirt_utils.get_disk_backing_file(target)
    if not backing_file:
        LOG.error(_("get backing file of image %s failed"), image_id,
                  instance=instance, context=context)
        raise exception.ImageUnacceptable(
            image_id=image_id,
            reason=_("%s doesn't has backing file") % target)

    virtual_size = libvirt_utils.get_disk_size(target)
    size = max(size, virtual_size)

    # get base image by backing file
    # Download the base image according to the backing file name.
    # If there are no incomplete M3-style snapshots, this step can be
    # simplified to "download the image by image id": every instance is
    # created from a complete image/snapshot, so during resize the image
    # downloaded by the instance's image id is exactly the base of 'disk'.
    base_dir = os.path.join(FLAGS.instances_path, '_base')
    if not os.path.exists(base_dir):
        utils.ensure_tree(base_dir)
    old_backing_file = os.path.join(base_dir, backing_file)
    old_size = 0
    if "_" in os.path.basename(old_backing_file):
        base_img = old_backing_file.rsplit("_", 1)[0]
        old_size = int(old_backing_file.rsplit("_", 1)[1]) * \
            (1024L * 1024L * 1024L)
    else:
        base_img = old_backing_file
    # First check whether the base without the size suffix exists; if it
    # does there is no need to download it from glance, otherwise fetch it
    if not os.path.exists(base_img):
        self._get_base_image_by_backing_file(context, instance,
                                             image_id, base_img)

    lock_path = os.path.join(FLAGS.instances_path, 'locks')

    @utils.synchronized(base_img, external=True, lock_path=lock_path)
    def copy_and_extend(base_img, target_img, size):
        if not os.path.exists(target_img):
            libvirt_utils.copy_image(base_img, target_img)
            disk.extend(target_img, size)

    # NOTE(wangpan): qemu-img rebase 'Safe mode' need the old backing file,
    #                refer to qemu-img manual for more details.
    # Copy and extend the *old* backing file of 'disk' from the plain base:
    # qemu-img rebase runs in "safe mode" by default, which requires both
    # the old and the new backing file of the COW layer to exist.
    if old_size:
        copy_and_extend(base_img, old_backing_file, old_size)
    # Copy and extend the *new* backing file of 'disk' from the plain base,
    # i.e. with the size after the resize
    new_backing_file = base_img
    if size:
        size_gb = size / (1024 * 1024 * 1024)
        new_backing_file += "_%d" % size_gb
        copy_and_extend(base_img, new_backing_file, size)

    # when old_backing_file != new_backing_file, rebase is needed
    # If the old and new backing files differ, 'disk' has to be rebased
    if old_backing_file != new_backing_file:
        libvirt_utils.rebase_cow_image(new_backing_file, target)

 

def _get_base_image_by_backing_file(self, context, instance,
                                    image_id, backing_file):
    base_image_id_sha1 = os.path.basename(backing_file)
    LOG.debug(_("image id sha1 of backing file %(backing_file)s "
                "is: %(base_image_id_sha1)s") % locals(),
              instance=instance, context=context)

    (image_service, image_id) = glance.get_remote_image_service(
        context, image_id)
    # Query glance for the image/snapshot matching the base name
    image_info = image_service.get_image_properties(context,
                                                    "image_id_sha1",
                                                    base_image_id_sha1)
    if not image_info:
        LOG.error(_("can't find base image by base_image_id_sha1 "
                    " %(base_image_id_sha1)s, snapshot image_id: "
                    "%(image_id)s") % locals(),
                  instance=instance, context=context)
        raise exception.ImageNotFound(image_id=base_image_id_sha1)
    base_image_id = str(image_info[0].get("image_id"))

    lock_path = os.path.join(FLAGS.instances_path, 'locks')

    # Download the image/snapshot that was found
    @utils.synchronized(base_image_id_sha1,
                        external=True, lock_path=lock_path)
    def fetch_base_image(context, target, image_id, user_id, project_id):
        if not os.path.exists(target):
            # Use the original image download method, which converts the
            # image format
            libvirt_utils.fetch_image(context=context,
                                      target=target,
                                      image_id=image_id,
                                      user_id=user_id,
                                      project_id=project_id)

    fetch_base_image(context, backing_file, base_image_id,
                     instance["user_id"], instance["project_id"])
    if not os.path.exists(backing_file):
        LOG.error(_("fetch base image failed, image id: %s"),
                  base_image_id, instance=instance, context=context)
        raise exception.CouldNotFetchImage(base_image_id)

  2. Common flow

The flow shared by creation and resize:

# nova/virt/libvirt/driver.py:LibvirtDriver._create_image()
# syntactic nicety (three helper functions defined for readability)
def basepath(fname='', suffix=suffix):
    return os.path.join(FLAGS.instances_path,
                        instance['name'],
                        fname + suffix)

def image(fname, image_type=FLAGS.libvirt_images_type):
    return self.image_backend.image(instance['name'],
                                    fname + suffix, image_type)

def raw(fname):
    return image(fname, image_type='raw')

# ensure directories exist and are writable
# Create the instance directory used to store the images and libvirt.xml
utils.ensure_tree(basepath(suffix=''))
# Write the libvirt.xml configuration file
libvirt_utils.write_to_file(basepath('libvirt.xml'), libvirt_xml)
# Write the console.log console output file
libvirt_utils.write_to_file(basepath('console.log', ''), '', 007)

# get image type (code added for the image flow optimization)
image_type = None
has_base_id_sha1 = False
(image_service, image_id) = glance.get_remote_image_service(
    context, disk_images['image_id'])
try:
    image_info = image_service.show(context, image_id)
    if image_info and 'properties' in image_info:
        if image_info['properties'].get('image_type') == "snapshot":
            image_type = "snapshot"
        else:  # not a snapshot, so treat it as a normal image
            image_type = "image"
        # base_image_id_sha1 exists for compatibility with M3 snapshots
        # (which only upload the COW part)
        if image_info['properties'].get('base_image_id_sha1'):
            has_base_id_sha1 = True
except Exception:
    image_type = None
    has_base_id_sha1 = None
    LOG.warn(_("get image type of %s faild") % image_id,
             context=context, instance=instance)
    pass

# Check whether the image has a backing file, i.e. whether it is only the
# COW part
backing_file = None
if os.path.exists(basepath('disk')):
    backing_file = libvirt_utils.get_disk_backing_file(
        basepath('disk'))

# The checks below decide whether our modified image flow is needed:
# snapshot_optimization == True means the modified flow is taken
snapshot_optimization = False
# check use image snapshot optimization or not
use_qcow2 = ((FLAGS.libvirt_images_type == 'default' and
              FLAGS.use_cow_images) or
             FLAGS.libvirt_images_type == 'qcow2')

# only qcow2 image may be need to optimize, and images with
# 'kernel_id' or 'ramdisk_id' shouldn't be optimized
if FLAGS.allow_image_snapshot_optimization and use_qcow2 and \
        not disk_images['kernel_id'] and not disk_images['ramdisk_id']:
    # The if statements below work out which operation on which kind of
    # image we are in, and from that whether the modified flow is needed.
    # This is fairly hand-rolled and will be painful to change later, but
    # there is no better way for now.
    # normal image, when create instance
    if image_type == "image" and backing_file is None and \
            not has_base_id_sha1:
        snapshot_optimization = False

    # normal image, when resize
    if image_type == "image" and backing_file is not None and \
            not has_base_id_sha1:
        snapshot_optimization = True

    # unbroken snapshot, when create instance
    if image_type == "snapshot" and backing_file is None and \
            not has_base_id_sha1:
        snapshot_optimization = False

    # unbroken snapshot, when resize
    if image_type == "snapshot" and backing_file is not None and \
            not has_base_id_sha1:
        snapshot_optimization = True

    # only cow part snapshot, when create instance
    # (creation from an M3-style COW-only snapshot)
    if image_type == "snapshot" and backing_file is None and \
            has_base_id_sha1:
        snapshot_optimization = True

    # only cow part snapshot, when resize
    # (resize of an M3-style COW-only snapshot)
    if image_type == "snapshot" and backing_file is not None and \
            has_base_id_sha1:
        snapshot_optimization = True

# Derive the base file name
root_fname = hashlib.sha1(str(disk_images['image_id'])).hexdigest()

  3. Creation flow
  • Overview

For a qcow2 root disk, the original flow first downloads the image (i.e. creates the base) and then generates the child image (disk) with qemu-img create. For the qcow2 second (ephemeral) disk and third (swap) disk, the base is first created with mkfs/mkswap, after which qemu-img create generates the child image (disk.local/disk.swap), as sketched below.
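
A rough sketch of the second step (creating the COW child on top of an existing base), using the same execute helper as the nova code quoted in this document; paths are illustrative and the base is assumed to already sit in the _base directory:

from nova import utils


def create_cow_child(base_path, disk_path):
    # The child starts out almost empty; reads fall through to base_path
    # until the guest writes its own clusters.
    utils.execute('qemu-img', 'create', '-f', 'qcow2',
                  '-o', 'backing_file=%s' % base_path, disk_path)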

Parameters passed in:

# nova/virt/libvirt/driver.py:LibvirtDriver.spawn()
self._create_image(context, instance, xml, network_info=network_info,
                   block_device_info=block_device_info,
                   files=injected_files,
                   admin_pass=admin_password)

  • Root disk

Unless the image is an M3-style snapshot file, the creation flow for a complete snapshot or a regular image is currently identical to the community Folsom flow: the image is downloaded by image id, then converted, copied and extended to produce and cache the base image, and finally qemu-img create creates the COW part, disk.

# nova/virt/libvirt/driver.py:LibvirtDriver._create_image()
elif not self._volume_in_mapping(self.default_root_device,
                                 block_device_info):
    # image is one of the three helper functions mentioned above; it
    # returns an object whose concrete class depends on the image backend:
    # FLAGS.libvirt_images_type defaults to 'default', and then
    # FLAGS.use_cow_images (default True) is checked; if True the object is
    # a Qcow2 instance (both options are kept at their defaults here),
    # otherwise Raw; LVM requires libvirt_images_type='lvm'.
    # The 'disk' argument is the file name of the root disk, i.e. image.path
    # with the instance directory prepended.
    # cache is a method of the image class used to cache the base image.
    # fetch_func is the method used to download the image from glance when
    # the base does not exist.
    # filename is the name of the base file.
    image('disk').cache(fetch_func=libvirt_utils.fetch_image,
                        context=context,
                        filename=root_fname,
                        size=size,
                        image_id=disk_images['image_id'],
                        user_id=instance['user_id'],
                        project_id=instance['project_id'])

# nova/virt/libvirt/imagebackend.py:Image.cache()
def cache(self, fetch_func, filename, size=None, *args, **kwargs):
    """Creates image from template.

    Ensures that template and image not already exists.
    Ensures that base directory exists.
    Synchronizes on template fetching.

    :fetch_func: Function that creates the base image
                 Should accept target argument.
    :filename: Name of the file in the image directory
    :size: Size of created image in bytes (optional)
    """
    # Lock on the base file name to prevent two concurrent creations from
    # downloading the same image at the same time and corrupting it
    @utils.synchronized(filename, external=True, lock_path=self.lock_path)
    def call_if_not_exists(target, *args, **kwargs):
        # This check is indispensable: another creation flow may already
        # have downloaded this image by the time the lock is acquired
        if not os.path.exists(target):
            fetch_func(target=target, *args, **kwargs)

    # If 'disk' already exists in the instance directory do nothing,
    # otherwise generate 'disk'
    if not os.path.exists(self.path):  # see below for how self.path is set
        base_dir = os.path.join(FLAGS.instances_path, '_base')
        if not os.path.exists(base_dir):
            utils.ensure_tree(base_dir)
        base = os.path.join(base_dir, filename)
        # Pass the download method as an argument to the disk creation
        # method
        self.create_image(call_if_not_exists, base, size,
                          *args, **kwargs)

# nova/virt/libvirt/imagebackend.py:
class Qcow2(Image):
    def __init__(self, instance, name):
        super(Qcow2, self).__init__("file", "qcow2", is_block_dev=False)
        # instance=instance['name'], name='disk'
        # self.path is the disk file under the instance directory
        self.path = os.path.join(FLAGS.instances_path,
                                 instance, name)

    def create_image(self, prepare_template, base, size, *args, **kwargs):
        # Lock to prevent the base from being deleted or modified while in
        # use
        @utils.synchronized(base, external=True, lock_path=self.lock_path)
        def copy_qcow2_image(base, target, size):
            qcow2_base = base
            if size:
                size_gb = size / (1024 * 1024 * 1024)
                qcow2_base += '_%d' % size_gb
                if not os.path.exists(qcow2_base):
                    with utils.remove_path_on_error(qcow2_base):
                        # Copy the base and extend it to the flavor size
                        libvirt_utils.copy_image(base, qcow2_base)
                        disk.extend(qcow2_base, size)
            # Create the COW part, i.e. the disk file, with qemu-img create
            libvirt_utils.create_cow_image(qcow2_base, target)

        # Download the image with the method passed in, i.e. prepare the
        # base
        prepare_template(target=base, *args, **kwargs)
        with utils.remove_path_on_error(self.path):
            copy_qcow2_image(base, self.path, size)

During the download the image is first saved under the _base directory as root_fname.part and then converted to raw; the conversion target is named root_fname.converted. When the conversion finishes, root_fname.part is deleted and root_fname.converted is renamed to root_fname; the extended copy gets the size appended to its name, e.g. root_fname_10.
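
A small sketch of this base-cache naming convention (it reproduces only the naming, not the download and conversion steps):

import hashlib
import os


def cached_base_path(instances_path, image_id, size_gb=None):
    # root_fname is the sha1 of the image id; an extended copy carries the
    # flavor size as a suffix, e.g. <sha1>_10 for a 10 GB root disk.
    root_fname = hashlib.sha1(str(image_id)).hexdigest()
    if size_gb:
        root_fname += '_%d' % size_gb
    return os.path.join(instances_path, '_base', root_fname)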

The root-disk section of the generated libvirt.xml looks like this:

<disk type='file' device='disk'>
    <driver name='qemu' type='qcow2' cache='none'/>
    <source file='/home/openstack/nova/instances/instance-000005ac/disk'/>
    <target dev='vda' bus='virtio'/>
</disk>

  • Ephemeral disk

qemu-img create first creates the base (formatted with mkfs.ext3), after which qemu-img create generates the COW part, disk.local. The libvirt configuration is the same as for the root disk, except for the file name (disk becomes disk.local) and the target dev (vda becomes vdb).

# nova/virt/libvirt/driver.py:LibvirtDriver._create_image()
ephemeral_gb = instance['ephemeral_gb']
if ephemeral_gb and not self._volume_in_mapping(
        self.default_second_device, block_device_info):
    # If there is a second disk 'disk.local', the swap disk becomes the
    # third disk vdc
    swap_device = self.default_third_device
    # Wrap the method that creates the second disk, _create_ephemeral
    fn = functools.partial(self._create_ephemeral,
                           fs_label='ephemeral0',
                           os_type=instance["os_type"])
    fname = "ephemeral_%s_%s_%s" % ("0",
                                    ephemeral_gb,
                                    instance["os_type"])
    size = ephemeral_gb * 1024 * 1024 * 1024
    # Same flow as the root disk, except that the base is created with
    # qemu-img instead of being downloaded from glance
    image('disk.local').cache(fetch_func=fn,
                              filename=fname,
                              size=size,
                              ephemeral_size=ephemeral_gb)
else:
    swap_device = self.default_second_device

# nova/virt/libvirt/driver.py:LibvirtDriver
def _create_ephemeral(self, target, ephemeral_size, fs_label, os_type):
    # Create an unformatted empty disk file
    self._create_local(target, ephemeral_size)
    # Format it as ext3
    disk.mkfs(os_type, fs_label, target)

@staticmethod
def _create_local(target, local_size, unit='G',
                  fs_format=None, label=None):
    """Create a blank image of specified size"""

    if not fs_format:
        fs_format = FLAGS.default_ephemeral_format  # defaults to None
    # Create the raw base with the qemu-img create command
    libvirt_utils.create_image('raw', target,
                               '%d%c' % (local_size, unit))
    if fs_format:  # None here, so this is not executed
        libvirt_utils.mkfs(fs_format, target, label)

  • Swap disk

The flow is the same as for the ephemeral disk, only the base format differs: qemu-img first creates the base, which is then formatted with mkswap, and afterwards qemu-img create generates the COW part, disk.swap. The libvirt configuration is the same as for the root disk, except for the file name (disk becomes disk.swap) and the target dev (vda becomes vdb or vdc, depending on whether an ephemeral disk is present). A sketch of the helper follows.
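
As a sketch, the swap counterpart to _create_ephemeral above follows the same pattern (raw base, then format); the exact helper used to run mkswap in the real code may differ.

def _create_swap(self, target, swap_mb):
    """Create a swap file of the specified size in MB (sketch)."""
    self._create_local(target, swap_mb, unit='M')
    utils.execute('mkswap', target)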

  4. Resize / cold migration flow
  • Root disk

Resize, source side:

# nova/virt/libvirt/driver.py:LibvirtDriver
@exception.wrap_exception()
def migrate_disk_and_power_off(self, context, instance, dest,
                               instance_type, network_info,
                               block_device_info=None):
    LOG.debug(_("Starting migrate_disk_and_power_off"),
              instance=instance)
    # Collect the information of all type='file' disks of this instance
    disk_info_text = self.get_instance_disk_info(instance['name'])
    disk_info = jsonutils.loads(disk_info_text)
    # Power off the instance
    self.power_off(instance)
    # Block device handling; we are not using cinder at the moment, so
    # nothing happens here
    block_device_mapping = driver.block_device_info_get_mapping(
        block_device_info)
    for vol in block_device_mapping:
        connection_info = vol['connection_info']
        mount_device = vol['mount_device'].rpartition("/")[2]
        self.volume_driver_method('disconnect_volume',
                                  connection_info,
                                  mount_device)

    # copy disks to destination
    # rename instance dir to +_resize at first for using
    # shared storage for instance dir (eg. NFS).
    # Copy the disks to the destination host
    same_host = (dest == self.get_host_ip_addr())
    inst_base = "%s/%s" % (FLAGS.instances_path, instance['name'])
    inst_base_resize = inst_base + "_resize"
    clean_remote_dir = False
    try:
        # First rename the instance directory to instance-xxxxx_resize,
        # as a backup
        utils.execute('mv', inst_base, inst_base_resize)
        if same_host:
            dest = None
            utils.execute('mkdir', '-p', inst_base)
        else:
            # Resize between different hosts
            if not FLAGS.use_rsync:
                # The old flow creates the destination instance directory
                # over ssh
                utils.execute('ssh', dest, 'mkdir', '-p', inst_base)
            else:
                # The new flow creates the directory with rsync
                libvirt_utils.make_remote_instance_dir(inst_base_resize,
                                                       dest,
                                                       instance['name'])
                clean_remote_dir = True
        # Iterate over all disks
        for info in disk_info:
            # assume inst_base == dirname(info['path'])
            img_path = info['path']
            fname = os.path.basename(img_path)

            # FIXME(wangpan): when resize, we ignore the ephemeral disk
            # The second disk 'disk.local' is skipped here and not copied
            # to the destination; the third disk 'disk.swap' should be
            # skipped as well, but it is not used yet
            if fname == "disk.local":
                LOG.debug(_("ignore disk.local when resize"),
                          instance=instance)
                continue

            from_path = os.path.join(inst_base_resize, fname)
            remote_path = "%s/%s" % (instance['name'], fname)
            if info['type'] == 'qcow2' and info['backing_file']:
                tmp_path = from_path + "_rbase"
                # Note(hzzhoushaoyu): if allow optimization, just copy
                # qcow2 to destination without merge.
                # The optimized flow copies only the COW part without
                # merging the COW layer and the base
                if FLAGS.allow_image_snapshot_optimization:
                    tmp_path = from_path
                else:
                    # merge backing file
                    # The old flow merges the COW layer and the base
                    # before copying
                    utils.execute('qemu-img', 'convert', '-f', 'qcow2',
                                  '-O', 'qcow2', from_path, tmp_path)

                if same_host and \
                        not FLAGS.allow_image_snapshot_optimization:
                    utils.execute('mv', tmp_path, img_path)
                elif same_host and FLAGS.allow_image_snapshot_optimization:
                    utils.execute('cp', tmp_path, img_path)
                else:
                    if not FLAGS.use_rsync:
                        # The old flow copies the disk files with rsync
                        # over ssh
                        libvirt_utils.copy_image(tmp_path, img_path,
                                                 host=dest)
                    else:
                        # The optimized flow copies with rsync in daemon
                        # push mode
                        libvirt_utils.copy_image_to_remote(tmp_path,
                                                           remote_path,
                                                           dest)
                if not FLAGS.allow_image_snapshot_optimization:
                    utils.execute('rm', '-f', tmp_path)

            else:  # raw or qcow2 with no backing file
                if not FLAGS.use_rsync or same_host:
                    libvirt_utils.copy_image(from_path, img_path,
                                             host=dest)
                else:
                    libvirt_utils.copy_image_to_remote(from_path,
                                                       remote_path, dest)
    except Exception, e:
        try:
            # Exception handling: clean up leftover files
            if os.path.exists(inst_base_resize):
                utils.execute('rm', '-rf', inst_base)
                if clean_remote_dir and FLAGS.use_rsync:
                    libvirt_utils.clean_remote_dir(instance['name'], dest)
                utils.execute('mv', inst_base_resize, inst_base)
                if not FLAGS.use_rsync:
                    utils.execute('ssh', dest, 'rm', '-rf', inst_base)
        except Exception:
            pass
        raise e
    # Return the disk info for the destination host to use
    return disk_info_text

 

Resize, destination side:

# nova/virt/libvirt/driver.py:LibvirtDriver
@exception.wrap_exception()
def finish_migration(self, context, migration, instance, disk_info,
                     network_info, image_meta, resize_instance,
                     block_device_info=None):
    LOG.debug(_("Starting finish_migration"), instance=instance)
    # Generate the libvirt.xml file
    xml = self.to_xml(instance, network_info,
                      block_device_info=block_device_info)
    # assume _create_image do nothing if a target file exists.
    # TODO(oda): injecting files is not necessary
    # Images are "generated" here, but in the original community flow
    # nothing is actually created because 'disk' has already been copied
    # over, so the cache() method in imagebackend.py does nothing; this
    # step mainly creates the instance directory and writes libvirt.xml
    # and console.log.
    # In our modified flow, however, the base of 'disk' is downloaded here
    # according to its backing file, and the base of the second disk plus
    # 'disk.local' are re-created. The third disk 'disk.swap' is not
    # skipped during the copy, so it is not re-created here; as a result
    # disk.swap may end up without a base and the VM fails to start.
    self._create_image(context, instance, xml,
                       network_info=network_info,
                       block_device_info=None)

    # resize disks. only "disk" and "disk.local" are necessary.
    # Resize the disks; the third disk is ignored
    disk_info = jsonutils.loads(disk_info)
    for info in disk_info:
        fname = os.path.basename(info['path'])
        if fname == 'disk':
            size = instance['root_gb']
        elif fname == 'disk.local':
            size = instance['ephemeral_gb']
        else:
            size = 0
        size *= 1024 * 1024 * 1024

        # If we have a non partitioned image that we can extend
        # then ensure we're in 'raw' format so we can extend file system.
        fmt = info['type']
        # If the image is qcow2 and can be resized, convert it to raw first
        if (size and fmt == 'qcow2' and
                disk.can_resize_fs(info['path'], size, use_cow=True)):
            path_raw = info['path'] + '_raw'
            utils.execute('qemu-img', 'convert', '-f', 'qcow2',
                          '-O', 'raw', info['path'], path_raw)
            utils.execute('mv', path_raw, info['path'])
            fmt = 'raw'
        # Resize the disk
        if size:
            disk.extend(info['path'], size)

        if fmt == 'raw' and FLAGS.use_cow_images:
            # back to qcow2 (no backing_file though) so that snapshot
            # will be available
            # If the disk is raw, or has just been converted to raw,
            # convert it back to qcow2
            path_qcow = info['path'] + '_qcow'
            utils.execute('qemu-img', 'convert', '-f', 'raw',
                          '-O', 'qcow2', info['path'], path_qcow)
            utils.execute('mv', path_qcow, info['path'])
        ### The two conversions above are very time consuming, so doing
        ### this is not recommended; fortunately all our root_gb sizes are
        ### currently identical, so no resize is actually performed.

    # Create the domain
    self._create_domain_and_network(xml, instance, network_info,
                                    block_device_info)
    # Wait for the instance to start
    timer = utils.LoopingCall(self._wait_for_running, instance)
    timer.start(interval=0.5).wait()

  • Ephemeral disk

Same as the root disk

  • Swap disk

Same as the root disk

  5. Live migration flow (block migration case)
  • Root disk

Live migration, destination side:

# nova/virt/libvirt/driver.py:LibvirtDriver
def pre_block_migration(self, ctxt, instance, disk_info_json):
    """Preparation block migration.

    :params ctxt: security context
    :params instance:
        nova.db.sqlalchemy.models.Instance object
        instance object that is migrated.
    :params disk_info_json:
        json strings specified in get_instance_disk_info

    """
    # As with resize, disk_info_json contains all type='file' disks that
    # were found
    disk_info = jsonutils.loads(disk_info_json)

    # make instance directory
    instance_dir = os.path.join(FLAGS.instances_path, instance['name'])
    if os.path.exists(instance_dir):
        raise exception.DestinationDiskExists(path=instance_dir)
    os.mkdir(instance_dir)
    # Iterate over all file disks
    for info in disk_info:
        base = os.path.basename(info['path'])
        # Get image type and create empty disk image, and
        # create backing file in case of qcow2.
        instance_disk = os.path.join(instance_dir, base)
        # If the disk has no backing file (raw, or qcow2 without a backing
        # file), simply create an empty disk with 'qemu-img create'; the
        # disk content is copied over by the live migration itself, which
        # is what block migration means.
        if not info['backing_file']:
            libvirt_utils.create_image(info['type'], instance_disk,
                                       info['disk_size'])
        else:
            # Disks with a backing file:
            # same image creation flow as when spawning an instance, i.e.
            # prepare the base first and then 'qemu-img create' the 'disk';
            # the 'disk' created here is also essentially empty and is
            # filled in by block migration.
            # Note that for an incomplete M3 snapshot this flow breaks,
            # because the base is downloaded here by image id, while for an
            # incomplete snapshot that id is the snapshot itself; what we
            # actually need is to find and download its base from the
            # snapshot id. A complete M4 snapshot behaves like a normal
            # image, so it does not have this problem.
            # Creating backing file follows same way as spawning instances.
            cache_name = os.path.basename(info['backing_file'])
            # Remove any size tags which the cache manages
            cache_name = cache_name.split('_')[0]
            # The flow below is the same as the creation flow
            image = self.image_backend.image(instance['name'],
                                             instance_disk,
                                             FLAGS.libvirt_images_type)
            image.cache(fetch_func=libvirt_utils.fetch_image,
                        context=ctxt,
                        filename=cache_name,
                        image_id=instance['image_ref'],
                        user_id=instance['user_id'],
                        project_id=instance['project_id'],
                        size=info['virt_disk_size'])

  • Ephemeral disk

Same as the root disk

  • Swap disk

Same as the root disk

 

 

 

Building libvirt-0.9.12 on debian

Dependencies that need to be installed:
apt-get install gcc make pkg-config libxml2-dev libgnutls-dev libdevmapper-dev python-dev libnl-dev libyajl-dev

Installing over the existing deb packages:
./configure --prefix=/usr --libdir=/usr/lib --localstatedir=/var --sysconfdir=/etc
make && make install

It is also possible not to overwrite the installed libvirt by running ./configure with its default parameters, but then pay attention to library linking paths.

Problems encountered when building libvirt-0.9.12 on debian:
######error: failed to get the hypervisor version
######error: internal error Cannot find suitable emulator for x86_64
Fix: install libyajl-dev and then rerun ./configure, make, make install.
./configure in 0.9.12 does not warn about the missing libyajl-dev package, yet without it the built libvirt cannot connect to the qemu-kvm hypervisor; 0.9.13 fixes this check, so installing the package up front matters. This problem has bitten me twice, which is why I am writing it down.

On debian, dget is a convenient way to download sources: find the package on the debian website, copy the link of the XXX.dsc file from the source download links on the right, install dget on the build server and run dget with the copied link. This downloads three files: the dsc file, the upstream source tarball and the debian patch tarball. Then run dpkg-source -x XXX.dsc to unpack and merge them into the full source tree; modify the code in that directory and build.