Nova and Cinder snapshot interactions with Ceph

I've been through these two code paths many times, but I keep forgetting the details, and colleagues ask about them often enough that I end up re-reading the code every time. So I decided to write down the relevant interaction flows for future reference. (Note: the source analysis is still based on the Mitaka release; I skimmed Queens and the changes are minor.)

Nova first. Nova's snapshots target the system disk (root disk); the relevant CLI commands are:

[root@vs2-compute-84 ~]# nova help backup
usage: nova backup <server> <name> <backup-type> <rotation>

Backup a server by creating a 'backup' type snapshot.

Positional arguments:
  <server>       Name or ID of server.
  <name>         Name of the backup image.
  <backup-type>  The backup type, like "daily" or "weekly".
  <rotation>     Int parameter representing how many backups to keep around.
[root@vs2-compute-84 ~]# nova help image-create
usage: nova image-create [--metadata <key=value>] [--show] [--poll]
                         <server> <name>

Create a new image by taking a snapshot of a running server.

Positional arguments:
  <server>                Name or ID of server.
  <name>                  Name of snapshot.

Optional arguments:
  --metadata <key=value>  Record arbitrary key/value metadata to
                          /meta_data.json on the metadata server. Can be
                          specified multiple times.
  --show                  Print image info.
  --poll                  Report the snapshot progress and poll until image
                          creation is complete.

From the Nova source, the only difference between backup and image (snapshot) is the rotation concept, which caps how many backups are kept (backups beyond that count are rotated out, oldest first, naturally). If you set rotation large enough, backup is effectively the same as image. The backend code is also a single path: the API entry points differ, but from the nova compute manager layer down there is no difference.

    @wrap_exception()
    @reverts_task_state
    @wrap_instance_fault
    def backup_instance(self, context, image_id, instance, backup_type,
                        rotation):
        """Backup an instance on this host.

        :param backup_type: daily | weekly
        :param rotation: int representing how many backups to keep around
        """
        self._do_snapshot_instance(context, image_id, instance, rotation)
        self._rotate_backups(context, instance, backup_type, rotation)

    @delete_image_on_error
    def _do_snapshot_instance(self, context, image_id, instance, rotation):
        self._snapshot_instance(context, image_id, instance,
                                task_states.IMAGE_BACKUP)

    @wrap_exception()
    @reverts_task_state
    @wrap_instance_fault
    @delete_image_on_error
    def snapshot_instance(self, context, image_id, instance):
        """Snapshot an instance on this host.

        :param context: security context
        :param instance: a nova.objects.instance.Instance object
        :param image_id: glance.db.sqlalchemy.models.Image.Id
        """
        ......

        self._snapshot_instance(context, image_id, instance,
                                task_states.IMAGE_SNAPSHOT)

As the code above shows, the two entry points share the same underlying implementation.
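
To make the rotation idea concrete, here is a toy sketch (simplified, made-up data shape; nova's real _rotate_backups roughly does this by listing glance images filtered by backup type and instance UUID): keep at most `rotation` backups and delete the oldest beyond that.

    def rotate_backups(backup_images, rotation):
        # backup_images: list of dicts with 'id' and 'created_at'
        # (hypothetical shape, for illustration only)
        by_age = sorted(backup_images, key=lambda img: img['created_at'])
        excess = len(by_age) - rotation
        for image in by_age[:max(excess, 0)]:
            print('rotating out backup image %s' % image['id'])

    rotate_backups(
        [{'id': 'img-1', 'created_at': '2018-01-01'},
         {'id': 'img-2', 'created_at': '2018-01-02'},
         {'id': 'img-3', 'created_at': '2018-01-03'}],
        rotation=2)  # deletes img-1, the oldest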

_snapshot_instance calls the libvirt driver's snapshot method, which distinguishes between live and cold snapshots, and also between direct snapshots and external snapshots. The Ceph backend uses direct snapshot, i.e., the snapshot is taken through Ceph's RBD image APIs:

        try:
            update_task_state(task_state=task_states.IMAGE_UPLOADING,
                              expected_state=task_states.IMAGE_PENDING_UPLOAD)
            metadata['location'] = snapshot_backend.direct_snapshot(
                context, snapshot_name, image_format, image_id,
                instance.image_ref)
            self._snapshot_domain(context, live_snapshot, virt_dom, state,
                                  instance)
            self._image_api.update(context, image_id, metadata,
                                   purge_props=False)
        except (NotImplementedError, exception.ImageUnacceptable,
                exception.Forbidden) as e:

    def direct_snapshot(self, context, snapshot_name, image_format,
                        image_id, base_image_id):
        """Creates an RBD snapshot directly.
        """
        fsid = self.driver.get_fsid()
        # NOTE(nic): Nova has zero comprehension of how Glance's image store
        # is configured, but we can infer what storage pool Glance is using
        # by looking at the parent image.  If using authx, write access should
        # be enabled on that pool for the Nova user
        parent_pool = self._get_parent_pool(context, base_image_id, fsid)

        # Snapshot the disk and clone it into Glance's storage pool.  librbd
        # requires that snapshots be set to "protected" in order to clone them
        self.driver.create_snap(self.rbd_name, snapshot_name, protect=True)
        location = {'url': 'rbd://%(fsid)s/%(pool)s/%(image)s/%(snap)s' %
                           dict(fsid=fsid,
                                pool=self.pool,
                                image=self.rbd_name,
                                snap=snapshot_name)}
        try:
            self.driver.clone(location, image_id, dest_pool=parent_pool)
            # Flatten the image, which detaches it from the source snapshot
            self.driver.flatten(image_id, pool=parent_pool)
        finally:
            # all done with the source snapshot, clean it up
            self.cleanup_direct_snapshot(location)

        # Glance makes a protected snapshot called 'snap' on uploaded
        # images and hands it out, so we'll do that too.  The name of
        # the snapshot doesn't really matter, this just uses what the
        # glance-store rbd backend sets (which is not configurable).
        self.driver.create_snap(image_id, 'snap', pool=parent_pool,
                                protect=True)
        return ('rbd://%(fsid)s/%(pool)s/%(image)s/snap' %
                dict(fsid=fsid, pool=parent_pool, image=image_id))

As you can see, the flow is:

1. create a temporary snapshot (still in the Nova root-disk pool);
2. clone the snapshot into a new RBD volume in the Glance pool (a cross-pool clone);
3. flatten the clone, detaching it from the source snapshot;
4. delete the temporary snapshot (clean up the temporary resource);
5. create a snapshot of the new RBD image in the Glance pool.

That final snapshot is the root-disk snapshot of the instance: the new image (call it a custom image, captured image, image template, whatever; in short, what nova image-create produces, visible via glance image-list). In other words, an image in Glance (whether uploaded by an admin or made by nova image-create) corresponds to an RBD snap, not an actual volume. Creating a new instance then just clones a root-disk volume from that snap; since RBD clones are COW, very little data is copied and the root disk is created very quickly (this is also why a Glance image cannot be deleted while instances are still using it).
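
Condensed into raw librbd calls, the sequence looks roughly like this; a sketch using the python rbd bindings, with made-up pool/image names and error handling trimmed (not nova's actual code):

    import rados
    import rbd

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    try:
        src = cluster.open_ioctx('vms')      # nova root-disk pool (assumed name)
        dst = cluster.open_ioctx('images')   # glance pool (assumed name)
        try:
            # 1. temporary protected snapshot on the instance's root disk
            with rbd.Image(src, 'instance-uuid_disk') as disk:
                disk.create_snap('tmp-snap')
                disk.protect_snap('tmp-snap')
            try:
                # 2. COW clone of that snapshot into the glance pool
                rbd.RBD().clone(src, 'instance-uuid_disk', 'tmp-snap',
                                dst, 'image-uuid')
                # 3. flatten, detaching the clone from the source snapshot
                with rbd.Image(dst, 'image-uuid') as image:
                    image.flatten()
            finally:
                # 4. the temporary source snapshot is no longer needed
                with rbd.Image(src, 'instance-uuid_disk') as disk:
                    disk.unprotect_snap('tmp-snap')
                    disk.remove_snap('tmp-snap')
            # 5. the protected 'snap' that new root disks will be cloned from
            with rbd.Image(dst, 'image-uuid') as image:
                image.create_snap('snap')
                image.protect_snap('snap')
        finally:
            src.close()
            dst.close()
    finally:
        cluster.shutdown()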

For how RBD snapshots work internally, see this article by a former colleague: http://www.sysnote.org/2016/02/28/ceph-rbd-snap/

In the direct + live snapshot case, the instance keeps running while the temporary snapshot is taken, so some data may still sit in the guest's in-memory disk cache without having been flushed to disk. There is therefore still some chance that the resulting root-disk snapshot is corrupted.

That covers the Ceph backend. For the snapshot flow with the local-storage backend, see my earlier post: "The Mitaka Nova live-snapshot data loss problem and its fix".

Nova actually has one more feature that interacts with cinder (ceph): boot-from-volume, i.e. booting an instance from a volume, in which case the volume shows up as bootable in cinder list. It is rarely used with the Ceph backend, so I won't cover it here.

Now the Cinder part. The commands involved should be create, backup-create and snapshot-create (I'm not sure whether there are others; probably not):

[root@vs2-compute-84 ~]# cinder help create
usage: cinder create [--consisgroup-id <consistencygroup-id>]
                     [--snapshot-id <snapshot-id>]
                     [--source-volid <source-volid>]
                     [--source-replica <source-replica>]
                     [--image-id <image-id>] [--image <image>] [--name <name>]
                     [--description <description>]
                     [--volume-type <volume-type>]
                     [--availability-zone <availability-zone>]
                     [--metadata [<key=value> [<key=value> ...]]]
                     [--hint <key=value>] [--allow-multiattach]
                     [<size>]

Creates a volume.

Positional arguments:
  <size>                Size of volume, in GiBs. (Required unless snapshot-id
                        /source-volid is specified).

Optional arguments:
  --consisgroup-id <consistencygroup-id>
                        ID of a consistency group where the new volume belongs
                        to. Default=None.
  --snapshot-id <snapshot-id>
                        Creates volume from snapshot ID. Default=None.
  --source-volid <source-volid>
                        Creates volume from volume ID. Default=None.
  --source-replica <source-replica>
                        Creates volume from replicated volume ID.
                        Default=None.
  --image-id <image-id>
                        Creates volume from image ID. Default=None.
  --image <image>       Creates a volume from image (ID or name).
                        Default=None.
  --name <name>         Volume name. Default=None.
  --description <description>
                        Volume description. Default=None.
  --volume-type <volume-type>
                        Volume type. Default=None.
  --availability-zone <availability-zone>
                        Availability zone for volume. Default=None.
  --metadata [<key=value> [<key=value> ...]]
                        Metadata key and value pairs. Default=None.
  --hint <key=value>    Scheduler hint, like in nova.
  --allow-multiattach   Allow volume to be attached more than once.
                        Default=False
[root@vs2-compute-84 ~]# cinder help backup-create
usage: cinder backup-create [--container <container>] [--name <name>]
                            [--description <description>] [--incremental]
                            [--force] [--snapshot-id <snapshot-id>]
                            <volume>

Creates a volume backup.

Positional arguments:
  <volume>              Name or ID of volume to backup.

Optional arguments:
  --container <container>
                        Backup container name. Default=None.
  --name <name>         Backup name. Default=None.
  --description <description>
                        Backup description. Default=None.
  --incremental         Incremental backup. Default=False.
  --force               Allows or disallows backup of a volume when the volume
                        is attached to an instance. If set to True, backs up
                        the volume whether its status is "available" or "in-
                        use". The backup of an "in-use" volume means your data
                        is crash consistent. Default=False.
  --snapshot-id <snapshot-id>
                        ID of snapshot to backup. Default=None.
[root@vs2-compute-84 ~]# cinder help snapshot-create
usage: cinder snapshot-create [--force [<True|False>]] [--name <name>]
                              [--description <description>]
                              [--metadata [<key=value> [<key=value> ...]]]
                              <volume>

Creates a snapshot.

Positional arguments:
  <volume>              Name or ID of volume to snapshot.

Optional arguments:
  --force [<True|False>]
                        Allows or disallows snapshot of a volume when the
                        volume is attached to an instance. If set to True,
                        ignores the current status of the volume when
                        attempting to snapshot it rather than forcing it to be
                        available. Default=False.
  --name <name>         Snapshot name. Default=None.
  --description <description>
                        Snapshot description. Default=None.
  --metadata [<key=value> [<key=value> ...]]
                        Snapshot metadata key and value pairs. Default=None.

Start with create, which creates a volume and supports several modes: a raw (empty) volume, a volume from a snapshot, a volume from an existing volume, and so on.

    def execute(self, context, volume_ref, volume_spec):
        ......
        if create_type == 'raw':
            model_update = self._create_raw_volume(volume_ref=volume_ref,
                                                   **volume_spec)
        elif create_type == 'snap':  ## create volume from a snapshot
            model_update = self._create_from_snapshot(context,
                                                      volume_ref=volume_ref,
                                                      **volume_spec)
        elif create_type == 'source_vol':  ## create new volume from an existing volume
            model_update = self._create_from_source_volume(
                context, volume_ref=volume_ref, **volume_spec)
        elif create_type == 'source_replica':
            model_update = self._create_from_source_replica(
                context, volume_ref=volume_ref, **volume_spec)
        elif create_type == 'image':
            model_update = self._create_from_image(context,
                                                   volume_ref=volume_ref,
                                                   **volume_spec)
        else:
            raise exception.VolumeTypeNotFound(volume_type_id=create_type)

        ......
        return volume_ref

A lot of taskflow plumbing is skipped above; we jump straight to cinder.volume.flows.manager.create_volume.CreateVolumeFromSpecTask#execute. The taskflows used in cinder are generally of the linear type and execute sequentially, so you can simply read through them one by one. They usually include a parameter-extraction task, e.g. cinder.volume.flows.manager.create_volume.ExtractVolumeSpecTask, whose parsed parameters are passed on to the next task. When the flow runs, each task's execute is called in order; if an exception occurs, the revert methods are called. For an introduction to OpenStack taskflow: https://docs.openstack.org/taskflow/latest/
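
To get a feel for the mechanics, here is a minimal self-contained taskflow example (hypothetical tasks, not cinder's real ones): a linear flow in which one task's result feeds the next, and revert() would run on failure:

    from taskflow import engines, task
    from taskflow.patterns import linear_flow

    class ExtractSpec(task.Task):
        default_provides = 'spec'

        def execute(self, request):
            # parse the request into a spec consumed by the next task
            return {'size': request['size']}

    class CreateVolume(task.Task):
        def execute(self, spec):
            print('creating volume of size %s' % spec['size'])

        def revert(self, spec, **kwargs):
            # called if this task or a later one raises
            print('rolling back volume of size %s' % spec['size'])

    flow = linear_flow.Flow('create_volume').add(ExtractSpec(), CreateVolume())
    engines.run(flow, store={'request': {'size': 1}})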

The create-volume API reference (v2 shown; v3 is similar): https://developer.openstack.org/api-ref/block-storage/v2/index.html#create-volume

The snapshot-related paths are mainly _create_from_snapshot and _create_from_source_volume. The first one:

    def _create_from_snapshot(self, context, volume_ref, snapshot_id,
                              **kwargs):
        volume_id = volume_ref['id']
        snapshot = objects.Snapshot.get_by_id(context, snapshot_id)
        model_update = self.driver.create_volume_from_snapshot(volume_ref,
                                                               snapshot)
        ......

    def create_volume_from_snapshot(self, volume, snapshot):
        """Creates a volume from a snapshot."""
        self._clone(volume, self.configuration.rbd_pool,
                    snapshot.volume_name, snapshot.name)
        if self.configuration.rbd_flatten_volume_from_snapshot:
            self._flatten(self.configuration.rbd_pool, volume.name)
        if int(volume.size):
            self._resize(volume)

    def _flatten(self, pool, volume_name):
        LOG.debug('flattening %(pool)s/%(img)s',
                  dict(pool=pool, img=volume_name))
        with RBDVolumeProxy(self, volume_name, pool) as vol:
            vol.flatten()

    def _clone(self, volume, src_pool, src_image, src_snap):
        LOG.debug('cloning %(pool)s/%(img)s@%(snap)s to %(dst)s',
                  dict(pool=src_pool, img=src_image, snap=src_snap,
                       dst=volume.name))

        chunk_size = self.configuration.rbd_store_chunk_size * units.Mi
        order = int(math.log(chunk_size, 2))

        with RADOSClient(self, src_pool) as src_client:
            with RADOSClient(self) as dest_client:
                self.RBDProxy().clone(src_client.ioctx,
                                      utils.convert_str(src_image),
                                      utils.convert_str(src_snap),
                                      dest_client.ioctx,
                                      utils.convert_str(volume.name),
                                      features=src_client.features,
                                      order=order)

    def _resize(self, volume, **kwargs):
        size = kwargs.get('size', None)
        if not size:
            size = int(volume.size) * units.Gi

        with RBDVolumeProxy(self, volume.name) as vol:
            vol.resize(size)

Three steps: clone a new volume from the snapshot, flatten, and resize; the last two are optional. The config option rbd_flatten_volume_from_snapshot,

    cfg.BoolOpt('rbd_flatten_volume_from_snapshot',
                default=False,
                help='Flatten volumes created from snapshots to remove '
                     'dependency from volume to snapshot'),

controls whether the volume is flattened when created from a snapshot; the default is False, i.e. no flatten.
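
The dependency the help text mentions is visible on the Ceph side: without flatten, the new volume stays a child of the source snapshot, and the snapshot (and hence the source volume) cannot be deleted while children exist. A quick check, sketched with the python rbd bindings (placeholder names):

    import rados
    import rbd

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx('volumes')
        try:
            with rbd.Image(ioctx, 'volume-src', snapshot='snapshot-xxx',
                           read_only=True) as snap:
                # e.g. [('volumes', 'volume-new')] while a clone depends on it
                print(snap.list_children())
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()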

Next, _create_from_source_volume, which calls create_cloned_volume:

    def create_cloned_volume(self, volume, src_vref):
        """Create a cloned volume from another volume.

        Since we are cloning from a volume and not a snapshot, we must first
        create a snapshot of the source volume.

        The user has the option to limit how long a volume's clone chain can be
        by setting rbd_max_clone_depth. If a clone is made of another clone
        and that clone has rbd_max_clone_depth clones behind it, the source
        volume will be flattened.
        """
        src_name = utils.convert_str(src_vref.name)
        dest_name = utils.convert_str(volume.name)
        flatten_parent = False

        # Do full copy if requested
        if self.configuration.rbd_max_clone_depth <= 0:
            with RBDVolumeProxy(self, src_name, read_only=True) as vol:
                vol.copy(vol.ioctx, dest_name)

            return

        # Otherwise do COW clone.
        with RADOSClient(self) as client:
            depth = self._get_clone_depth(client, src_name)
            # If source volume is a clone and rbd_max_clone_depth reached,
            # flatten the source before cloning. Zero rbd_max_clone_depth means
            # infinite is allowed.
            if depth == self.configuration.rbd_max_clone_depth:
                LOG.debug("maximum clone depth (%d) has been reached - "
                          "flattening source volume",
                          self.configuration.rbd_max_clone_depth)
                flatten_parent = True

            src_volume = self.rbd.Image(client.ioctx, src_name)
            try:
                # First flatten source volume if required.
                if flatten_parent:
                    _pool, parent, snap = self._get_clone_info(src_volume,
                                                               src_name)
                    # Flatten source volume
                    LOG.debug("flattening source volume %s", src_name)
                    src_volume.flatten()
                    # Delete parent clone snap
                    parent_volume = self.rbd.Image(client.ioctx, parent)
                    try:
                        parent_volume.unprotect_snap(snap)
                        parent_volume.remove_snap(snap)
                    finally:
                        parent_volume.close()

                # Create new snapshot of source volume
                clone_snap = "%s.clone_snap" % dest_name
                LOG.debug("creating snapshot='%s'", clone_snap)
                src_volume.create_snap(clone_snap)
                src_volume.protect_snap(clone_snap)
            except Exception:
                # Only close if exception since we still need it.
                src_volume.close()
                raise

            # Now clone source volume snapshot
            try:
                LOG.debug("cloning '%(src_vol)s@%(src_snap)s' to "
                          "'%(dest)s'",
                          {'src_vol': src_name, 'src_snap': clone_snap,
                           'dest': dest_name})
                self.RBDProxy().clone(client.ioctx, src_name, clone_snap,
                                      client.ioctx, dest_name,
                                      features=client.features)
            except Exception:
                src_volume.unprotect_snap(clone_snap)
                src_volume.remove_snap(clone_snap)
                raise
            finally:
                src_volume.close()

        if volume.size != src_vref.size:
            LOG.debug("resize volume '%(dst_vol)s' from %(src_size)d to "
                      "%(dst_size)d",
                      {'dst_vol': volume.name, 'src_size': src_vref.size,
                       'dst_size': volume.size})
            self._resize(volume)

        LOG.debug("clone created successfully")

This flow has more steps: it must first take a snapshot of the source volume and then clone the new volume from it, effectively embedding the create-from-snapshot flow. The config option rbd_max_clone_depth,

    cfg.IntOpt('rbd_max_clone_depth',
               default=5,
               help='Maximum number of nested volume clones that are '
                    'taken before a flatten occurs. Set to 0 to disable '
                    'cloning.'),

sets the maximum clone depth, 5 by default; once it is reached, the source volume is flattened.
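
The depth check itself boils down to walking the clone parent chain. A rough sketch of the idea (python rbd bindings; like the driver's _get_clone_depth, it assumes parents live in the same pool):

    import rbd

    def clone_depth(ioctx, name, depth=0):
        # count the clone links above an image by following parent_info()
        with rbd.Image(ioctx, name, read_only=True) as img:
            try:
                _pool, parent, _snap = img.parent_info()
            except rbd.ImageNotFound:
                return depth  # no parent: reached the root of the chain
        return clone_depth(ioctx, parent, depth + 1)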

Now the backup operation (which actually has little to do with RBD snapshots). cinder.backup.drivers.ceph.CephBackupDriver#_backup_rbd is where the work ultimately happens; I won't go through it in detail. The interesting part is the incremental backup path:

    def _rbd_diff_transfer(self, src_name, src_pool, dest_name, dest_pool,
                           src_user, src_conf, dest_user, dest_conf,
                           src_snap=None, from_snap=None):
        """Copy only extents changed between two points.

        If no snapshot is provided, the diff extents will be all those changed
        since the rbd volume/base was created, otherwise it will be those
        changed since the snapshot was created.
        """
        LOG.debug("Performing differential transfer from '%(src)s' to "
                  "'%(dest)s'",
                  {'src': src_name, 'dest': dest_name})

        # NOTE(dosaboy): Need to be tolerant of clusters/clients that do
        # not support these operations since at the time of writing they
        # were very new.

        src_ceph_args = self._ceph_args(src_user, src_conf, pool=src_pool)
        dest_ceph_args = self._ceph_args(dest_user, dest_conf, pool=dest_pool)

        cmd1 = ['rbd', 'export-diff'] + src_ceph_args
        if from_snap is not None:
            cmd1.extend(['--from-snap', from_snap])
        if src_snap:
            path = utils.convert_str("%s/%s@%s"
                                     % (src_pool, src_name, src_snap))
        else:
            path = utils.convert_str("%s/%s" % (src_pool, src_name))
        cmd1.extend([path, '-'])

        cmd2 = ['rbd', 'import-diff'] + dest_ceph_args
        rbd_path = utils.convert_str("%s/%s" % (dest_pool, dest_name))
        cmd2.extend(['-', rbd_path])

        ret, stderr = self._piped_execute(cmd1, cmd2)
        if ret:
            msg = (_("RBD diff op failed - (ret=%(ret)s stderr=%(stderr)s)") %
                   {'ret': ret, 'stderr': stderr})
            LOG.info(msg)
            raise exception.BackupRBDOperationFailed(msg)

It exports the changed extents with 'rbd export-diff' and imports them with 'rbd import-diff'. Reference: https://ceph.com/geen-categorie/incremental-snapshots-with-rbd/
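
_piped_execute just wires the two processes together. Roughly equivalent, with hypothetical snapshot/pool names and the ceph auth arguments omitted:

    import subprocess

    # stream 'rbd export-diff' straight into 'rbd import-diff' so the diff
    # never touches the local disk
    export = subprocess.Popen(
        ['rbd', 'export-diff', '--from-snap', 'backup.snap.old',
         'volumes/volume-x@backup.snap.new', '-'],
        stdout=subprocess.PIPE)
    import_ = subprocess.Popen(
        ['rbd', 'import-diff', '-', 'backups/volume-x.backup.base'],
        stdin=export.stdout)
    export.stdout.close()  # let export-diff see SIGPIPE if import-diff dies
    ret = import_.wait()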

The first backup of a volume takes the full-backup path:

    def _full_backup(self, backup_id, volume_id, src_volume, src_name, length):
        """Perform a full backup of src volume.

        First creates a base backup image in our backup location then performs
        an chunked copy of all data from source volume to a new backup rbd
        image.
        """
        backup_name = self._get_backup_base_name(volume_id, backup_id)

        with rbd_driver.RADOSClient(self, self._ceph_backup_pool) as client:
            # First create base backup image
            old_format, features = self._get_rbd_support()
            LOG.debug("Creating backup base image='%(name)s' for volume "
                      "%(volume)s.",
                      {'name': backup_name, 'volume': volume_id})
            self.rbd.RBD().create(ioctx=client.ioctx,
                                  name=backup_name,
                                  size=length,
                                  old_format=old_format,
                                  features=features,
                                  stripe_unit=self.rbd_stripe_unit,
                                  stripe_count=self.rbd_stripe_count)

            LOG.debug("Copying data from volume %s.", volume_id)
            dest_rbd = self.rbd.Image(client.ioctx, backup_name)
            try:
                rbd_meta = rbd_driver.RBDImageMetadata(dest_rbd,
                                                       self._ceph_backup_pool,
                                                       self._ceph_backup_user,
                                                       self._ceph_backup_conf)
                rbd_fd = rbd_driver.RBDImageIOWrapper(rbd_meta)
                self._transfer_data(src_volume, src_name, rbd_fd, backup_name,
                                    length)
            finally:
                dest_rbd.close()

Finally, snapshot-create. The flow is similar to volume create, so we can go straight to the final RBD call:

    def create_snapshot(self, snapshot):
        """Creates an rbd snapshot."""
        with RBDVolumeProxy(self, snapshot.volume_name) as volume:
            snap = utils.convert_str(snapshot.name)
            volume.create_snap(snap)
            volume.protect_snap(snap)

Very simple: create the snap, then protect it.
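
If you want to see the result on the Ceph side, a quick sketch with the python rbd bindings (placeholder names) lists the volume's snaps and their protection status:

    import rados
    import rbd

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx('volumes')
        try:
            with rbd.Image(ioctx, 'volume-xxx') as vol:
                for s in vol.list_snaps():
                    print(s['name'], vol.is_protected_snap(s['name']))
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()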