巧用 Protobuf 反射来优化代码，拒做 PB Boy - 哈喽比特

633次阅读 | 发布于4年以前

有些后台同学将自己称为 SQL Boy，因为负责的业务主要是对数据库进行增删改查。经常和 Proto 打交道的同学，是不是也会叫自己 PB Boy？因为大部分工作也是对 Proto 进行 SET 和 GET。面对大量重复且丑陋的代码，除了宏是否有更好的解决方法？本文结合 PB 反射给出了我在运营系统开发工作中的一些代码优化实践。

一、背景

Protobuf(下文称为 PB)是一种常见的数据序列化方式，常常用于后台微服务之间传递数据。

笔者目前主要的工作都是和表单打交道，而表单一般涉及到大量的数据输入，表单调用方一般将数据格式化为 JSON 后传给 CGI，而 CGI 和后台服务、后台服务之前会用 PB 传递数据。

在写代码时，经常会遇到一些丑陋的、圈复杂度较高、较难维护的关于 PB 的使用代码：

对字段的必填校验硬编码在代码中：如果需要变更校验规则，则需要修改代码；
一个字段一个 if 校验，圈复杂度较高：对传进来的字段每个字段都进行多种规则校验，例如长度，XSS，正则校验等，一个校验一个 if 代码，代码圈复杂度很高；
想要获取 PB 中所有的非空字段，形成一个 map<string,string>，需要大量的 if 判断和重复代码；
在后台服务间传递数据，由于模块由不同的人开发，导致相同字段的命名不一样，从一个 PB 中挑选一部分内容到另外一个 PB 中，需要大量的 GET 和 SET 代码。

是否可以有方法解决上面的几个问题呢？

答案是使用PB 反射。

二、PB 反射的使用

反射的一般定义如下：计算机程序在运行时可以访问、检测和修改它本身状态或行为。

protobuf 的类图如下：

从上图我们可以看出，Message 类继承于 MessageLite 类，业务一般自定义的 Person 类继承于 Message 类。

Descriptor 类和 Reflection 类都聚合于 Message，是弱依赖的关系。

类名	类描述
Descriptor	对 Message 进行描述，包括 message 的名字、所有字段的描述、原始 proto 文件内容等
FieldDescriptor	对 Message 中单个字段进行描述，包括字段名、字段属性、原始的 field 字段等
Reflection	提供了动态读和写 message 中单个字段能力

所以一般使用 PB 反射的步骤如下：

1. 通过Message获取单个字段的FieldDescriptor
2. 通过Message获取其Reflection
3. 通过Reflection来操作FieldDescriptor，从而动态获取或修改单个字段

获取 Descript、Reflection 的函数：

const google::protobuf::Reflection* pReflection = pMessage->GetReflection();
const google::protobuf::Descriptor* pDescriptor = pMessage->GetDescriptor();

获取 FieldDescriptor 的函数：

const google::protobuf::FieldDescriptor * pFieldDesc = pDescriptor->FindFieldByName(id);

下面分别介绍上面的三个类。

2.1 类 Descriptor 介绍

类 Descriptor 主要是对 Message 进行描述，包括 message 的名字、所有字段的描述、原始 proto 文件内容等，下面介绍该类中包含的函数。

首先是获取自身信息的函数：

const std::string & name() const; // 获取message自身名字
int field_count() const; // 获取该message中有多少字段
const FileDescriptor* file() const; // The .proto file in which this message type was defined. Never nullptr.

在类 Descriptor 中，可以通过如下方法获取类 FieldDescriptor：

const FieldDescriptor* field(int index) const; // 根据定义顺序索引获取，即从0开始到最大定义的条目
const FieldDescriptor* FindFieldByNumber(int number) const; // 根据定义的message里面的顺序值获取（option string name=3，3即为number）
const FieldDescriptor* FindFieldByName(const string& name) const; // 根据field name获取
const FieldDescriptor* Descriptor::FindFieldByLowercaseName(const std::string & lowercase_name)const; // 根据小写的field name获取
const FieldDescriptor* Descriptor::FindFieldByCamelcaseName(const std::string & camelcase_name) const; // 根据驼峰的field name获取

其中FieldDescriptor* field(int index)和FieldDescriptor* FindFieldByNumber(int number)这个函数中index和number的含义是不一样的，如下所示：

message Student{
  optional string name = 1;
  optional string gender = 2;
  optional string phone = 5;
}

其中字段phone，其index为 5，但是其number为 2。

同时还有一个我们在调试中经常使用的函数：

std::string Descriptor::DebugString(); // 将message转化成人可以识别出的string信息

2.2 类 FieldDescriptor 介绍

类 FieldDescriptor 的作用主要是对 Message 中单个字段进行描述，包括字段名、字段属性、原始的 field 字段等。

其获取获取自身信息的函数：

const std::string & name() const; // Name of this field within the message.
const std::string & lowercase_name() const; // Same as name() except converted to lower-case.
const std::string & camelcase_name() const; // Same as name() except converted to camel-case.
CppType cpp_type() const; //C++ type of this field.

其中cpp_type()函数是来获取该字段是什么类型的，在 PB 中，类型的类目如下：

enum FieldDescriptor::Type {
  TYPE_DOUBLE = = 1,
  TYPE_FLOAT = = 2,
  TYPE_INT64 = = 3,
  TYPE_UINT64 = = 4,
  TYPE_INT32 = = 5,
  TYPE_FIXED64 = = 6,
  TYPE_FIXED32 = = 7,
  TYPE_BOOL = = 8,
  TYPE_STRING = = 9,
  TYPE_GROUP = = 10,
  TYPE_MESSAGE = = 11,
  TYPE_BYTES = = 12,
  TYPE_UINT32 = = 13,
  TYPE_ENUM = = 14,
  TYPE_SFIXED32 = = 15,
  TYPE_SFIXED64 = = 16,
  TYPE_SINT32 = = 17,
  TYPE_SINT64 = = 18,
  MAX_TYPE = = 18
}

类 FieldDescriptor 中还可以判断字段是否是必填，还是选填或者重复：

bool is_required() const; // 判断字段是否是必填
bool is_optional() const; // 判断字段是否是选填
bool is_repeated() const; // 判断字段是否是重复值

类 FieldDescriptor 中还可以获取单个字段的index或者tag:

int number() const; // Declared tag number.
int index() const; //Index of this field within the message's field array, or the file or extension scope's extensions array.

类 FieldDescriptor 中还有一个支持扩展的函数，函数如下：

// Get the FieldOptions for this field.  This includes things listed in
// square brackets after the field definition.  E.g., the field:
//   optional string text = 1 [ctype=CORD];
// has the "ctype" option set.  Allowed options are defined by FieldOptions in
// descriptor.proto, and any available extensions of that message.
const FieldOptions & FieldDescriptor::options() const

具体关于该函数的讲解在 2.4 章。

2.3 类 Reflection 介绍

该类提供了动态读、写 message 中单个字段能力。

读单个字段的函数如下：

// 这里由于篇幅，省略了一部分代码，后面的代码部分也有省略，有需要的可以自行阅读源码。
int32 GetInt32(const Message & message, const FieldDescriptor * field) const

std::string GetString(const Message & message, const FieldDescriptor * field) const

const Message & GetMessage(const Message & message, const FieldDescriptor * field, MessageFactory * factory = nullptr) const // 读取单个message字段

写单个字段的函数如下：

void SetInt32(Message * message, const FieldDescriptor * field, int32 value) const

void SetString(Message * message, const FieldDescriptor * field, std::string value) const

获取重复字段的函数如下：

int32 GetRepeatedInt32(const Message & message, const FieldDescriptor * field, int index) const

std::string GetRepeatedString(const Message & message, const FieldDescriptor * field, int index) const

const Message & GetRepeatedMessage(const Message & message, const FieldDescriptor * field, int index) const

写重复字段的函数如下：

void SetRepeatedInt32(Message * message, const FieldDescriptor * field, int index, int32 value) const

void SetRepeatedString(Message * message, const FieldDescriptor * field, int index, std::string value) const

void SetRepeatedEnumValue(Message * message, const FieldDescriptor * field, int index, int value) const // Set an enum field's value with an integer rather than EnumValueDescriptor. more..

新增重复字段设计如下：

void AddInt32(Message * message, const FieldDescriptor * field, int32 value) const

void AddString(Message * message, const FieldDescriptor * field, std::string value) const

另外有一个较为重要的函数，其可以批量获取字段描述并将其放置到 vector 中：

void Reflection::ListFields(const Message & message, std::vector< const FieldDescriptor * > * output) const

2.4 options 介绍

PB 允许在 proto 中自定义选项并使用选项。在定义 message 的字段时，不仅可以定义字段内容，还可以设置字段的属性，比如校验规则，简介等，结合反射，可以实现丰富丰富多彩的应用。

下面来介绍下：

import "google/protobuf/descriptor.proto";

extend google.protobuf.FieldOptions {
  optional uint32 attr_id              = 50000; //字段id
  optional bool is_need_encrypt        = 50001 [default = false]; // 字段是否加密,0代表不加密，1代表加密
  optional string naming_conventions1  = 50002; // 商户组命名规范
  optional uint32 length_min           = 50003  [default = 0]; // 字段最小长度
  optional uint32 length_max           = 50004  [default = 1024]; // 字段最大长度
  optional string regex                = 50005; // 该字段的正则表达式
}

message SubMerchantInfo {
  // 商户名称
  optional string merchant_name = 1 [
    (attr_id) = 1,
    (is_encrypt) = 0,
    (naming_conventions1) = "company_name",
    (length_min) = 1,
    (length_max) = 80,
    (regex.field_rules) = "[a-zA-Z0-9]"
  ];

使用方法如下：

#include <google/protobuf/descriptor.h>
#include <google/protobuf/message.h>

std::string strRegex = FieldDescriptor->options().GetExtension(regex);

uint32 dwLengthMinp = FieldDescriptor->options().GetExtension(length_min);

bool bIsNeedEncrypt = FieldDescriptor->options().GetExtension(is_need_encrypt);

三、PB 反射的进阶使用

第二章给出了 PB 反射，以及具体的使用细节，在本章中，作者结合自己日常的代码，给出 PB 反射一些使用场景。并且以开发一个表单系统为例，讲一下 PB 反射在开发表单系统中的进阶使用。

3.1 获取 PB 中所有非空字段

在业务中，经常会需要获取某个 Message 中所有非空字段，形成一个 map<string,string>，使用 PB 反射写法如下：

#include "pb_util.h"

#include <sstream>

namespace comm_tools {
int PbToMap(const google::protobuf::Message &message,
            std::map<std::string, std::string> &out) {
#define CASE_FIELD_TYPE(cpptype, method, valuetype)                            \
  case google::protobuf::FieldDescriptor::CPPTYPE_##cpptype: {                 \
    valuetype value = reflection->Get##method(message, field);                 \
    std::ostringstream oss;                                                    \
    oss << value;                                                              \
    out[field->name()] = oss.str();                                            \
    break;                                                                     \
  }

#define CASE_FIELD_TYPE_ENUM()                                                 \
  case google::protobuf::FieldDescriptor::CPPTYPE_ENUM: {                      \
    int value = reflection->GetEnum(message, field)->number();                 \
    std::ostringstream oss;                                                    \
    oss << value;                                                              \
    out[field->name()] = oss.str();                                            \
    break;                                                                     \
  }

#define CASE_FIELD_TYPE_STRING()                                               \
  case google::protobuf::FieldDescriptor::CPPTYPE_STRING: {                    \
    std::string value = reflection->GetString(message, field);                 \
    out[field->name()] = value;                                                \
    break;                                                                     \
  }

  const google::protobuf::Descriptor *descriptor = message.GetDescriptor();
  const google::protobuf::Reflection *reflection = message.GetReflection();

  for (int i = 0; i < descriptor->field_count(); i++) {
    const google::protobuf::FieldDescriptor *field = descriptor->field(i);
    bool has_field = reflection->HasField(message, field);

    if (has_field) {
      if (field->is_repeated()) {
        return -1; // 不支持转换repeated字段
      }

      const std::string &field_name = field->name();
      switch (field->cpp_type()) {
        CASE_FIELD_TYPE(INT32, Int32, int);
        CASE_FIELD_TYPE(UINT32, UInt32, uint32_t);
        CASE_FIELD_TYPE(FLOAT, Float, float);
        CASE_FIELD_TYPE(DOUBLE, Double, double);
        CASE_FIELD_TYPE(BOOL, Bool, bool);
        CASE_FIELD_TYPE(INT64, Int64, int64_t);
        CASE_FIELD_TYPE(UINT64, UInt64, uint64_t);
        CASE_FIELD_TYPE_ENUM();
        CASE_FIELD_TYPE_STRING();
      default:
        return -1; // 其他异常类型
      }
    }
  }

  return 0;
}
} // namespace comm_tools

通过上面的代码，如果需要在 proto 中增加字段，不再需要修改原来的代码。

3.2 将字段校验规则放置在 Proto 中

后台服务接收到前端传来的字段后，会对字段进行校验，比如必填校验，长度校验，正则校验，xss 校验等，这些规则我们常常会硬编码在代码中。但是随着后台字段的增加，校验规则代码会变得越来越多，越来越难维护。如果我们把字段的定义和校验规则和定义放在一起，这样是不是更好的维护？

示例 proto 如下：

syntax = "proto2";

package student;

import "google/protobuf/descriptor.proto";

message FieldRule{
    optional uint32 length_min = 1; // 字段最小长度
    optional uint32 id         = 2; // 字段映射id
}

extend google.protobuf.FieldOptions{
    optional FieldRule field_rule = 50000;
}

message Student{
    optional string name   =1 [(field_rule).length_min = 5, (field_rule).id = 1];
    optional string email = 2 [(field_rule).length_min = 10, (field_rule).id = 2];
}

然后我们自己实现 xss 校验，必填校验，长度校验，选项校验等代码。

示例校验最小长度代码如下：

#include <iostream>
#include "student.pb.h"
#include <google/protobuf/descriptor.h>
#include <google/protobuf/message.h>

using namespace std;
using namespace student;
using namespace google::protobuf;

bool minLengthCheck(const std::string &strValue, const uint32_t &dwLenthMin) {
    return strValue.size() < dwLenthMin;
}

int allCheck(const google::protobuf::Message &oMessage){
    const auto *poReflect = oMessage.GetReflection();

    vector<const FieldDescriptor *> vecFD;
    poReflect->ListFields(oMessage, &vecFD);

    for (const auto &poFiled : vecFD) {
        const auto &oFieldRule = poFiled->options().GetExtension(student::field_rule);
        if (poFiled->cpp_type() == google::protobuf::FieldDescriptor::CPPTYPE_STRING && !poFiled->is_repeated()) {
            // 类型是string并且选项非重复的才会校验字段长度类型
            const std::string strValue = poReflect->GetString(oMessage, poFiled);
            const std::string strName = poFiled->name();

            if (oFieldRule.has_length_min()) {
                // 有才进行校验，没有则不进行校验
                if (minLengthCheck(strValue, oFieldRule.length_min())) {
                    cout << "the length of " << strName << " is lower than " << oFieldRule.length_min()<<endl;
                } else {
                    cout << "check min lenth pass"<<endl;
                }
            }
        }
    }
    return 0;
}

int main() {
    Student oStudent1;
    oStudent1.set_name("xiao");

    Student oStudent2;
    oStudent2.set_name("xiaowei");

    allCheck(oStudent1);
    allCheck(oStudent2);

    return 0;
}

如上，如果需要校验最大长度，必填，xss 校验，只需要使用工厂模式，扩展代码即可。

新增一个字段或者变更某个字段的校验规则，只需要修改 Proto，不需要修改代码，从而防止因变更代码导致错误。

3.3 基于 PB 反射的前端页面自动生成方案

在我们常见的运营系统中，经常会涉及到各种各样的表单页面。在前后端交互方面，当需要增加字段或者变更字段的校验规则时，需要面临如下问题：

前端：针对新字段编写 html 代码，同时需要修改前端页面；
后台：针对每个字段做接收，并进行校验。

每增加或变更一个字段，我们都需要在前端和后台进行修改，工作量大，同时频繁变更容易导致错误。有什么方法可以解决这些问题吗？答案是使用 PB 的反射能力。

通过获取 Message 中每个字段的描述然后返回给前端，前端根据字段描述来展示页面，并且对字段进行校验。同时通过这种方式，前后端可以共享一份表单校验规则。

在使用上述方案之后，当我们需要增加字段或者变更字段的校验规则时，只需要在 Proto 中修改字段，大大节省了工作量，同时避免了因发布带来的风险问题。

3.4 通用存储系统

在运营系统中，前端输入字段，传入到后台，后台校验字段之后，一般还需要把数据存储到数据库中。

对于某些运营系统来说，其希望能够快速接入一些数据，传统开发常常会面临如下问题：

如何在不增加或变更表结构的基础上，如何快速接入数据？
如何零开发实现频繁添加字段、新增渠道等需求？
如何兼容不同业务、不同数据协议（比如 PB 中的不同 message）？

答案是使用 PB 的反射，使得有结构的数据转换为非结构的数据，然后存储到非关系型数据库（在微信支付侧一般存入到 table kv）中。

以 3.2 节中的 Proto 为例，举例如下，学生类中定义了两个字段，name 和 email 字段，原始信息为：

Student oStudent;
oStudent.set_name("xiaowei");
oStudent.set_email("test@tencent.com");

通过 PB 的反射，可以转化为平铺的结构：

[{"id":"1","value":"xiaowei"},{"id":"2","value":"test@tencent.com"}]

转化为平铺结构后，可以快速存入到数据库中。如果现在学生信息里需要增加一个字段 address，则不需要修改表结构，从而完成存储动作。利用 PB 反射，可以完成有结构数据和无结构数据之间的转换，达到存储和业务解耦的特性。

四、总结

本文首先给出了 PB 的反射函数，然后再结合自己平时负责的工作，给出了 PB 的进阶使用。通过对 PB 的进阶使用，可以大大提高开发和维护的效率，同时提升代码的优雅度。有需要更进一步研究 PB 的，可以阅读其源代码，不得不说，通过阅读优秀代码能够极大的促进编程能力。

需要注意的是 PB 反射需要依赖大量计算资源，在密集使用 PB 的场景下，需要注意 CPU 的使用情况。