sqlalchemy在nova中的应用浅析




sqlalchemy介绍

sqlalchemy是一个python ORM框架,使用它的项目很多,官方网站有列出来一部分:http://www.sqlalchemy.org/organizations.html

python作为流行编程语言,开源的ORM框架有很多,这里有一篇对比文档:https://www.pythoncentral.io/sqlalchemy-vs-orms/

对sqlalchemy的个人理解(仅局限于看了一部分入门文档和官方文档,并对比了一部分OpenStack项目和云容科技项目的具体实现,所以并不一定准确),它是python里比较流行的企业级ORM框架,功能全面、性能优良,受到了开发人员的喜爱,也获得了DBA的认可。

它可以通过ORM将一张数据库表映射成一个python对象(一般为class,表的列映射到attributes),之后操作class的attributes就可以实现操作数据库表的字段的目的,对开发人员屏蔽了复杂的数据库裸sql语句,更符合python语言使用者的编码习惯,而且对于不熟悉sql语句的开发人员来说,它还可以做更多性能方面的优化工作。它支持多种数据库后端,比如SQLite, Postgresql, MySQL, Oracle, MS-SQL等,便于后端数据库的切换。它还能配合数据库版本管理工具、数据迁移工具完成数据库表结构的升级、回退工作,让软件版本发布和数据库表结构变更工作集成一起(很多公司是软件由开发部门发布,由sa维护,数据库结构由dba维护,有可能出现程序和数据库变更不一致的情况,而如果集成在程序中,可以全部交给sa维护,在升级程序的后执行一次数据库升级命令即可),防止二者脱钩引起的程序运行异常。

sqlalchemy也不是完美无缺的,我个人认为它最大的问题在于学习曲线比较陡峭,看官方文档就能发现内容比较多,要深入理解并熟练掌握它需要花费一定的时间和一定的实际项目使用,否则很难精通。当然,掌握它的前提是对关系型数据库的原理和使用有一定的了解或者比较熟悉,否则很难精通。我对关系型数据库的理解很浅,甚至可以说入门级都不到,只是会用它的一些简单功能。后续这块也要补上来。所以这篇文章也只能是浅析。

sqlalchemy包含两种架构或者两大模块,ORM和Core(官方全称是SQL Expression Language),前者属于高级抽象方式,后者属于较底层的方式,跟数据库类型强依赖,可移植性比较差。两者的用法有一定的差异,适用场景也不太一样,但整体思路是能用ORM就不要用Core,http://docs.sqlalchemy.org/en/latest/index.html,官方文档区分了两个部分。

In contrast to the ORM’s domain-centric mode of usage, the SQL Expression Language provides a schema-centric usage paradigm.

sqlalchemy相关教程和文档

sqlalchemy在nova中的应用

基于Mitaka版本nova代码进行分析

model

以Service和Instance为例:

class Service(BASE, NovaBase, models.SoftDeleteMixin):
    """Represents a running service on a host."""

    __tablename__ = 'services'
    __table_args__ = (
        schema.UniqueConstraint("host", "topic", "deleted",
                                name="uniq_services0host0topic0deleted"),
        schema.UniqueConstraint("host", "binary", "deleted",
                                name="uniq_services0host0binary0deleted")
        )

    id = Column(Integer, primary_key=True)
    host = Column(String(255))  # , ForeignKey('hosts.id'))
    binary = Column(String(255))
    topic = Column(String(255))
    report_count = Column(Integer, nullable=False, default=0)
    disabled = Column(Boolean, default=False)
    disabled_reason = Column(String(255))
    last_seen_up = Column(DateTime, nullable=True)
    forced_down = Column(Boolean, default=False)
    version = Column(Integer, default=0)

    instance = orm.relationship(
        "Instance",
        backref='services',
        primaryjoin='and_(Service.host == Instance.host,'
                    'Service.binary == "nova-compute",'
                    'Instance.deleted == 0)',
        foreign_keys=host,
    )

class Instance(BASE, NovaBase, models.SoftDeleteMixin):
    """Represents a guest VM."""
    __tablename__ = 'instances'
    __table_args__ = (
        Index('uuid', 'uuid', unique=True),
        Index('instances_project_id_deleted_idx',
              'project_id', 'deleted'),
        Index('instances_reservation_id_idx',
              'reservation_id'),
        Index('instances_terminated_at_launched_at_idx',
              'terminated_at', 'launched_at'),
        Index('instances_uuid_deleted_idx',
              'uuid', 'deleted'),
        Index('instances_task_state_updated_at_idx',
              'task_state', 'updated_at'),
        Index('instances_host_node_deleted_idx',
              'host', 'node', 'deleted'),
        Index('instances_host_deleted_cleaned_idx',
              'host', 'deleted', 'cleaned'),
        Index('instances_deleted_created_at_idx',
              'deleted', 'created_at'),
        schema.UniqueConstraint('uuid', name='uniq_instances0uuid'),
    )
    injected_files = []

    id = Column(Integer, primary_key=True, autoincrement=True)

    @property
    def name(self):
        try:
            base_name = CONF.instance_name_template % self.id
        except TypeError:
            # Support templates like "uuid-%(uuid)s", etc.
            info = {}
            # NOTE(russellb): Don't use self.iteritems() here, as it will
            # result in infinite recursion on the name property.
            for column in iter(orm.object_mapper(self).columns):
                key = column.name
                # prevent recursion if someone specifies %(name)s
                # %(name)s will not be valid.
                if key == 'name':
                    continue
                info[key] = self[key]
            try:
                base_name = CONF.instance_name_template % info
            except KeyError:
                base_name = self.uuid
        return base_name

    @property
    def _extra_keys(self):
        return ['name']

    user_id = Column(String(255))
    project_id = Column(String(255))

    image_ref = Column(String(255))
    kernel_id = Column(String(255))
    ramdisk_id = Column(String(255))
    hostname = Column(String(255))

    launch_index = Column(Integer)
    key_name = Column(String(255))
    key_data = Column(MediumText())

    power_state = Column(Integer)
    vm_state = Column(String(255))
    task_state = Column(String(255))

    memory_mb = Column(Integer)
    vcpus = Column(Integer)
    root_gb = Column(Integer)
    ephemeral_gb = Column(Integer)
    ephemeral_key_uuid = Column(String(36))

    # This is not related to hostname, above.  It refers
    #  to the nova node.
    host = Column(String(255))  # , ForeignKey('hosts.id'))
    # To identify the "ComputeNode" which the instance resides in.
    # This equals to ComputeNode.hypervisor_hostname.
    node = Column(String(255))

    # *not* flavorid, this is the internal primary_key
    instance_type_id = Column(Integer)

    user_data = Column(MediumText())

    reservation_id = Column(String(255))

    # NOTE(sbiswas7): 'scheduled_at' is still in the database
    # and can be removed in the future release.

    launched_at = Column(DateTime)
    terminated_at = Column(DateTime)

    # This always refers to the availability_zone kwarg passed in /servers and
    # provided as an API option, not at all related to the host AZ the instance
    # belongs to.
    availability_zone = Column(String(255))

    # User editable field for display in user-facing UIs
    display_name = Column(String(255))
    display_description = Column(String(255))

    # To remember on which host an instance booted.
    # An instance may have moved to another host by live migration.
    launched_on = Column(MediumText())

    # NOTE(jdillaman): locked deprecated in favor of locked_by,
    # to be removed in Icehouse
    locked = Column(Boolean)
    locked_by = Column(Enum('owner', 'admin'))

    os_type = Column(String(255))
    architecture = Column(String(255))
    vm_mode = Column(String(255))
    uuid = Column(String(36), nullable=False)

    root_device_name = Column(String(255))
    default_ephemeral_device = Column(String(255))
    default_swap_device = Column(String(255))
    config_drive = Column(String(255))

    # User editable field meant to represent what ip should be used
    # to connect to the instance
    access_ip_v4 = Column(types.IPAddress())
    access_ip_v6 = Column(types.IPAddress())

    auto_disk_config = Column(Boolean())
    progress = Column(Integer)

    # EC2 instance_initiated_shutdown_terminate
    # True: -> 'terminate'
    # False: -> 'stop'
    # Note(maoy): currently Nova will always stop instead of terminate
    # no matter what the flag says. So we set the default to False.
    shutdown_terminate = Column(Boolean(), default=False)

    # EC2 disable_api_termination
    disable_terminate = Column(Boolean(), default=False)

    # OpenStack compute cell name.  This will only be set at the top of
    # the cells tree and it'll be a full cell name such as 'api!hop1!hop2'
    cell_name = Column(String(255))
    internal_id = Column(Integer)

    # Records whether an instance has been deleted from disk
    cleaned = Column(Integer, default=0)

BASE:sqlalchemy的ORM base,所有model都应该继承它,from sqlalchemy.ext.declarative import declarative_base BASE = declarative_base() ,每个介绍sqlalchemy的教程或者文档的示例代码里都能看到它的身影(注意它仅用在ORM模式下,Core模式下不需要),参考:http://docs.sqlalchemy.org/en/latest/orm/mapping_styles.html#declarative-mapping

NovaBase:继承oslo.db的通用基类,重写了__copy__方法,用来在将ORM对象赋值给其他python对象的时候解除与数据库的联系,脱离原有session,保证所有ORM对象中的数据都刷新到数据库,禁用延时加载(获取属性时才从数据库读取出来),如果不这么做就不能安全的把ORM对象赋值给其他变量,注意:前面这些是根据代码注释的个人猜测,仅供参考,并未实际测试验证。

models是oslo.db的模块,models.ModelBase,实现基本crud方法,models.SoftDeleteMixin,实现软删除功能(标记为deleted但并不清理记录);models.TimestampMixin,实现create_at、update_at字段更新功能,参考:http://blog.csdn.net/Bill_Xiang_/article/details/78592389http://blog.csdn.net/happyAnger6/article/details/54772731

nova的mapping方式是Declarative Mapping,官方文档参考上面的BASE类的参考链接。

__tablename__:ORM class对应的数据库表名称;__table_args__:数据库表的一些参数配置,比如索引、约束条件等;

Column,对应数据库表的列:http://docs.sqlalchemy.org/en/latest/core/metadata.html#sqlalchemy.schema.Column

relationship:建立表之间的联系,一对一、一对多、多对一、多对多,参考:http://docs.sqlalchemy.org/en/latest/orm/relationships.html

http://www.cnblogs.com/mrchige/p/6389588.html

backref:是relationship的简化写法,正常要建立表之间的联系,要在两张表的mapping class里面各写一个relationship(),然后用back_populates 指定关联的另外一方的字段名,而使用backref之后,就可以只在一个mapping class里面添加relationship。参考:http://docs.sqlalchemy.org/en/latest/orm/backref.html

uselist:用来标记一对一或者一对多,一对一设置为False,一对多设置为True,也比较容易理解,多了就用list,示例代码http://docs.sqlalchemy.org/en/latest/orm/basic_relationships.html#relationships-one-to-one

primaryjoin:设置表的关联条件,http://docs.sqlalchemy.org/en/latest/orm/join_conditions.html

foreign_keys:外键约束,上面代码里Service的instance relationship的外键host,应该是指instance的host字段受service的host字段约束,参考链接跟上面一样。

query&session

用法说明: http://docs.sqlalchemy.org/en/latest/orm/tutorial.html#creating-a-session

    @staticmethod
    @db.select_db_reader_mode
    def _db_instance_get_by_uuid(context, uuid, columns_to_join,
                                 use_slave=False):
        return db.instance_get_by_uuid(context, uuid,
                                       columns_to_join=columns_to_join)

    @base.remotable_classmethod
    def get_by_uuid(cls, context, uuid, expected_attrs=None, use_slave=False):
        if expected_attrs is None:
            expected_attrs = ['info_cache', 'security_groups']
        columns_to_join = _expected_cols(expected_attrs)
        db_inst = cls._db_instance_get_by_uuid(context, uuid, columns_to_join,
                                               use_slave=use_slave)
        return cls._from_db_object(context, cls(), db_inst,
                                   expected_attrs)
@require_context
@pick_context_manager_reader_allow_async
def instance_get_by_uuid(context, uuid, columns_to_join=None):
    return _instance_get_by_uuid(context, uuid,
                                 columns_to_join=columns_to_join)


def _instance_get_by_uuid(context, uuid, columns_to_join=None):
    result = _build_instance_get(context, columns_to_join=columns_to_join).\
                filter_by(uuid=uuid).\
                first()

    if not result:
        raise exception.InstanceNotFound(instance_id=uuid)

    return result
def _build_instance_get(context, columns_to_join=None):
    query = model_query(context, models.Instance, project_only=True).\
            options(joinedload_all('security_groups.rules')).\
            options(joinedload('info_cache'))
    if columns_to_join is None:
        columns_to_join = ['metadata', 'system_metadata']
    for column in columns_to_join:
        if column in ['info_cache', 'security_groups']:
            # Already always joined above
            continue
        if 'extra.' in column:
            query = query.options(undefer(column))
        else:
            query = query.options(joinedload(column))
    # NOTE(alaski) Stop lazy loading of columns not needed.
    for col in ['metadata', 'system_metadata']:
        if col not in columns_to_join:
            query = query.options(noload(col))
    return query
def model_query(context, model,
                args=None,
                read_deleted=None,
                project_only=False):
    """Query helper that accounts for context's `read_deleted` field.

    :param context:     NovaContext of the query.
    :param model:       Model to query. Must be a subclass of ModelBase.
    :param args:        Arguments to query. If None - model is used.
    :param read_deleted: If not None, overrides context's read_deleted field.
                        Permitted values are 'no', which does not return
                        deleted values; 'only', which only returns deleted
                        values; and 'yes', which does not filter deleted
                        values.
    :param project_only: If set and context is user-type, then restrict
                        query to match the context's project_id. If set to
                        'allow_none', restriction includes project_id = None.
    """

    if read_deleted is None:
        read_deleted = context.read_deleted

    query_kwargs = {}
    if 'no' == read_deleted:
        query_kwargs['deleted'] = False
    elif 'only' == read_deleted:
        query_kwargs['deleted'] = True
    elif 'yes' == read_deleted:
        pass
    else:
        raise ValueError(_("Unrecognized read_deleted value '%s'")
                           % read_deleted)

    query = sqlalchemyutils.model_query(
        model, context.session, args, **query_kwargs)

    # We can't use oslo.db model_query's project_id here, as it doesn't allow
    # us to return both our projects and unowned projects.
    if nova.context.is_user_context(context) and project_only:
        if project_only == 'allow_none':
            query = query.\
                filter(or_(model.project_id == context.project_id,
                           model.project_id == null()))
        else:
            query = query.filter_by(project_id=context.project_id)

    return query

context.session的由来:

首先看下context的由来,可以参考之前的一篇文章:Keystone认证和授权流程分析,里面有提到token验证流程,稍微提及了context的由来,nova.api.auth.NovaKeystoneContext ,然后返回的是nova.context.RequestContext ,分析这个class发现并没有session的初始化代码,但肯定是在这里初始化的session,因为没有别的地方了,看到了class的装饰器@enginefacade.transaction_context_provider ,于是继续分析相关代码,确实是在这里添加的session属性:https://github.com/openstack/oslo.db/blob/ea1ec64274475465f35d68b93dc95b643a51830a/oslo_db/sqlalchemy/enginefacade.py#L1101-L1102

update/insert/delete

这部分代码跟上面的查询比较类似,就不一一贴代码流程了,只给出入口:

  • insert:nova.objects.instance.Instance#create
  • delete:nova.objects.instance.Instance#destroy
  • update:nova.objects.instance.Instance#save

API DB

API DB models:nova/db/sqlalchemy/api_models.py

由于我们没有启用Cells,其他几个表都是空的,目前只看到RequestSpec(对应数据库中request_specs表)这个表里面有数据库,记录的是nova-api传递给nova-scheduler的参数,目前看起来意义不大。

需要注意的是,unshelve的时候,会从request_specs这个表的spec字段中获取原始的availability-zone信息,用做解封存云主机时指定的az,所以如果原来的az被删除了,会导致unshelve失败,即使你修改了instances表里面该云主机的az记录,也有这个问题(原因很明显,unshelve过程中读取的是nova_api db request_specs表里面的instance的az,而不是nova db的instances表的az)。

看了下相关提交的commit message和bp,这个nova_api db,就是专门给cells用的,暂时用不到就先不分析了。部分参考资料:

版本管理

nova-manage db {sync, version}

这篇文章写的非常赞,大家直接看它就行了:OpenStack数据库版本控制工具简介

读写分离

nova已经实现了数据库主从读写分离,类似定时任务或者其他查询类请求都可以走slave connection(读从库),从而减轻主库的压力。

配置方法:

[database]
connection=nova:nova@mysqlmaster   ### 主库
slave_connection=nova:nova@mysqlslave   ### 从库

搜了下代码,貌似M版本只是修改了部分定时任务的数据库查询请求,master版本(Pike之后Queens之前)好像也没改动多少。

    @periodic_task.periodic_task(
        spacing=CONF.heal_instance_info_cache_interval)
    def _heal_instance_info_cache(self, context):
        """Called periodically.  On every call, try to update the
        ......
        """
        heal_interval = CONF.heal_instance_info_cache_interval
        if not heal_interval:
            return

        instance_uuids = getattr(self, '_instance_uuids_to_heal', [])
        instance = None

        LOG.debug('Starting heal instance info cache')

        if not instance_uuids:
            # The list of instances to heal is empty so rebuild it
            LOG.debug('Rebuilding the list of instances to heal')
            db_instances = objects.InstanceList.get_by_host(
                context, self.host, expected_attrs=[], use_slave=True)  ### slave
            for inst in db_instances:

如果没有配置slave_connection,应该是继续使用connection,否则像我们只用connection一个配置就乱套了。

具体实现读写分离这部分功能的代码应该是在oslo.db里面,参考:Use the new enginefacade from oslo_db,以及相关源码,这部分没具体分析。

nova-conductor

为了提升数据库安全性,所有数据库查询操作都改为走nova-conductor服务,在此之前是每个服务各自连接到数据库的,也就意味着每个部署了nova服务(并且需要连接数据库)的节点,都需要被授权访问数据库,可维护性和安全性都比较差(每次扩容节点都要修改数据库授权,每个节点都要配置数据库账户),所以社区增加了nova-conductor服务,集中处理数据库相关操作,通过MQ建立起需要查询数据库的服务和nova-conductor之间的联系(nova-api服务除外,它是直接访问数据库的,不走nova-conductor,因此一般把nova-conductor和nova-api部署在控制节点),这当然牺牲了一部分性能。现在conductor的功能已经越来越完善,比如节点调度相关任务的封装执行(跟nova-scheduler打交道)也都是在nova-conductor里面做的。

这部分也不多做介绍,参考:

sqlalchemy在云容科技的应用

云容科技为了提升云平台性能,也引入了自己的数据库,通过trigger同步OpenStack数据库相关表到自己的库,之后使用sqlalchemy的Core模块进行映射查询,之后对web服务提供查询API,无状态模式可以水平扩容,大大提升了查询响应速度,直观体现为VisionStack云平台web页面的用户响应时间比OpenStack原生horizon快很多(绝大部分页面的后台数据返回时间在毫秒级),用户体验得到了极大改善。

涉及到公司内部代码资产,这里就不分析代码了。