Protocol Buffers: Python Generated Code

Reposted from Google Developers, 2014-12-23 13:56

This page describes exactly what Python definitions the protocol buffer compiler generates for any given protocol definition. You should read the language guide before reading this document.

The Python Protocol Buffers implementation is a little different from C++ and Java. In Python, the compiler only outputs code to build descriptors for the generated classes, and a Python metaclass does the real work. This document describes what you get after the metaclass has been applied.

Compiler Invocation

The protocol buffer compiler produces Python output when invoked with the --python_out= command-line flag. The parameter to the --python_out= option is the directory where you want the compiler to write your Python output. The compiler creates a .py file for each .proto file input. The names of the output files are computed by taking the name of the .proto file and making two changes:

  • The extension (.proto) is replaced with _pb2.py.
  • The proto path (specified with the --proto_path= or -I command-line flag) is replaced with the output path (specified with the --python_out= flag).

So, for example, let's say you invoke the compiler as follows:

protoc --proto_path=src --python_out=build/gen src/foo.proto src/bar/baz.proto

The compiler will read the files src/foo.proto and src/bar/baz.proto and produce two output files: build/gen/foo_pb2.py and build/gen/bar/baz_pb2.py. The compiler will automatically create the directory build/gen/bar if necessary, but it will not create build or build/gen; they must already exist.

Note that if the .proto file or its path contains any characters which cannot be used in Python module names (for example, hyphens), they will be replaced with underscores. So, the file foo-bar.proto becomes the Python file foo_bar_pb2.py.
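The naming rules above can be modelled in a few lines. This is only a sketch of the rules as stated in this section, not protoc's own logic: the function name python_output_path is hypothetical, and treating every character outside [0-9A-Za-z_] as illegal in a module name is an assumption.

```python
import posixpath
import re


def python_output_path(proto_file, proto_path, python_out):
    """Model of the output naming rules described above (not protoc's code)."""
    # Path of the .proto file relative to the --proto_path root.
    relative = posixpath.relpath(proto_file, proto_path)
    if relative.startswith(".."):
        raise ValueError("%r is not inside %r" % (proto_file, proto_path))
    stem, ext = posixpath.splitext(relative)
    if ext != ".proto":
        raise ValueError("expected a .proto file: %r" % proto_file)
    # Assumption: any character that cannot appear in a Python identifier
    # becomes an underscore, one path component at a time.
    parts = [re.sub(r"[^0-9A-Za-z_]", "_", part) for part in stem.split("/")]
    # Swap the proto path for the output path and append the new extension.
    return posixpath.join(python_out, *parts) + "_pb2.py"


print(python_output_path("src/bar/baz.proto", "src", "build/gen"))
print(python_output_path("src/foo-bar.proto", "src", "build/gen"))
```

The two calls mirror the invocation shown earlier and the hyphenated file name from the note.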


When outputting Python code, the protocol buffer compiler's ability to output directly to ZIP archives is particularly convenient, as the Python interpreter is able to read directly from these archives if placed in the PYTHONPATH. To output to a ZIP file, simply provide an output location ending in .zip.
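To see why ZIP output is convenient, the following sketch imports a module straight from an archive. It does not run protoc: the archive is built by hand with a placeholder module, and the names gen.zip and demo_zip_pb2 are made up for illustration.

```python
import importlib
import os
import sys
import tempfile
import zipfile

# Stand-in for compiler output. A real archive would come from passing an
# output location ending in .zip to --python_out; here a placeholder module
# is written by hand so the sketch runs without protoc.
archive = os.path.join(tempfile.mkdtemp(), "gen.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("demo_zip_pb2.py", "DESCRIPTION = 'placeholder module'\n")

# Same effect as listing the archive in PYTHONPATH before starting Python.
sys.path.insert(0, archive)
importlib.invalidate_caches()

demo = importlib.import_module("demo_zip_pb2")
print(demo.DESCRIPTION)
print(demo.__file__)  # a path that points inside the ZIP archive
```

No unpacking step is needed; the interpreter's standard zip import support reads the module directly.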


The number 2 in the extension designates version 2 of Protocol Buffers. Version 1 was used primarily inside Google, though you might be able to find parts of it included in other Python code that was released before Protocol Buffers. Since version 2 of Python Protocol Buffers has a completely different interface, and since Python does not have compile-time type checking to catch mistakes, we chose to make the version number be a prominent part of generated Python file names.


Packages

The Python code generated by the protocol buffer compiler is completely unaffected by the package name defined in the .proto file. Instead, Python packages are identified by directory structure.


Messages

Given a simple message declaration:

message Foo {}

The protocol buffer compiler generates a class called Foo, which subclasses google.protobuf.Message. The class is a concrete class; no abstract methods are left unimplemented. Unlike C++ and Java, Python generated code is unaffected by the optimize_for option in the .proto file; in effect, all Python code is optimized for code size.

You should not create your own Foo subclasses. Generated classes are not designed for subclassing, and doing so may lead to "fragile base class" problems. Besides, implementation inheritance is bad design.

Python message classes have no particular public members other than those defined by the Message interface and those generated for nested fields, messages, and enum types (described below). Message provides methods you can use to check, manipulate, read, or write the entire message, including parsing from and serializing to binary strings. In addition to these methods, the Foo class defines the following static methods:

  • FromString(s): Returns a new message instance deserialized from the given string.

Note that you can also use the text_format module to work with protocol messages in text format: for example, the Merge() method lets you merge an ASCII representation of a message into an existing message.

A message can be declared inside another message. For example: message Foo { message Bar { } }

In this case, the Bar class is declared as a static member of Foo, so you can refer to it as Foo.Bar.


Fields

For each field in a message type, the corresponding class has a member with the same name as the field. How you can manipulate the member depends on its type.

As well as accessor methods, the compiler generates an integer constant for each field containing its field number. The constant name is the field name converted to upper-case followed by _FIELD_NUMBER. For example, given the field optional int32 foo_bar = 5;, the compiler will generate the constant FOO_BAR_FIELD_NUMBER = 5.

Singular Fields

If you have a singular (optional or required) field foo of any non-message type, you can manipulate the field foo as if it were a regular field. For example, if foo's type is int32, you can say:

message.foo = 123
print(message.foo)

Note that setting foo to a value of the wrong type will raise a TypeError.

If foo is read when it is not set, its value is the default value for that field. To check if foo is set, or to clear the value of foo, you must call the HasField() or ClearField() methods of the Message interface. For example:

assert not message.HasField("foo")
message.foo = 123
assert message.HasField("foo")
message.ClearField("foo")
assert not message.HasField("foo")

Singular Message Fields

Message types work slightly differently. You cannot assign a value to an embedded message field. Instead, assigning a value to any field within the child message implies setting the message field in the parent. So, for example, let's say you have the following .proto definition:

message Foo {
  optional Bar bar = 1;
}
message Bar {
  optional int32 i = 1;
}

You cannot do the following:

foo = Foo()
foo.bar = Bar()  # WRONG!

Instead, to set bar, you simply assign a value directly to a field within bar, and - presto! - foo has a bar field:

foo = Foo()
assert not foo.HasField("bar")
foo.bar.i = 1
assert foo.HasField("bar")
assert foo.bar.i == 1
foo.ClearField("bar")
assert not foo.HasField("bar")
assert foo.bar.i == 0  # Default value

Similarly, you can set bar using the Message interface's CopyFrom() method. This copies all the values from another message of the same type as bar.


Note that simply reading a field inside bar does not set the field:

foo = Foo()
assert not foo.HasField("bar")
print(foo.bar.i)  # Print i's default value
assert not foo.HasField("bar")
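One way to picture this behavior is a child object that notifies its parent whenever one of its fields is written. The sketch below is a hand-written toy model, not the generated code or the library's implementation; the on_modified hook and the private attribute names are invented for illustration.

```python
class Bar(object):
    """Toy stand-in for the generated Bar class (one int32 field, i)."""

    def __init__(self, on_modified=None):
        self._i = None                   # None models "not set"
        self._on_modified = on_modified  # hypothetical hook to tell the parent

    @property
    def i(self):
        return 0 if self._i is None else self._i  # unset reads as the default

    @i.setter
    def i(self, value):
        if not isinstance(value, int):
            raise TypeError("i must be an int")
        self._i = value
        if self._on_modified is not None:
            self._on_modified()          # a write marks the parent field as set


class Foo(object):
    """Toy stand-in for the generated Foo class (one message field, bar)."""

    def __init__(self):
        self._has_bar = False
        self._bar = Bar(on_modified=self._mark_bar_set)

    def _mark_bar_set(self):
        self._has_bar = True

    @property
    def bar(self):
        # Read-only property: there is no setter, so `foo.bar = Bar()` fails,
        # and merely reading foo.bar does not change presence.
        return self._bar

    def HasField(self, name):
        if name != "bar":
            raise ValueError("unknown field: %s" % name)
        return self._has_bar

    def ClearField(self, name):
        if name != "bar":
            raise ValueError("unknown field: %s" % name)
        self._has_bar = False
        self._bar = Bar(on_modified=self._mark_bar_set)


foo = Foo()
print(foo.bar.i, foo.HasField("bar"))  # reading leaves bar unset
foo.bar.i = 1
print(foo.bar.i, foo.HasField("bar"))  # writing sets bar in the parent
```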

Repeated Fields

Repeated fields are represented as an object that acts like a Python sequence. As with embedded messages, you cannot assign the field directly, but you can manipulate it. For example, given this message definition:

message Foo {
  repeated int32 nums = 1;
}

You can do the following:

foo = Foo()
foo.nums.append(15)        # Appends one value
foo.nums.extend([32, 47])  # Appends an entire list

assert len(foo.nums) == 3
assert foo.nums[0] == 15
assert foo.nums[1] == 32
assert foo.nums == [15, 32, 47]

foo.nums[1] = 56    # Reassigns a value
assert foo.nums[1] == 56
for i in foo.nums:  # Loops and prints
  print(i)
del foo.nums[:]     # Clears list (works just like in a Python list)

The ClearField() method of the Message interface also works, as an alternative to Python's del.

Repeated Message Fields

Repeated message fields work similarly to repeated scalar fields, except the corresponding Python object does not have an append() function. Instead, it has an add() function that creates a new message object, appends it to the list, and returns it for the caller to fill in. It also has an extend() function that appends an entire list of messages, but makes a copy of every message in the list. This is done so that messages are always owned by the parent message, which avoids circular references and other confusion that can happen when a mutable data structure has multiple owners.

For example, given this message definition:

message Foo {
  repeated Bar bars = 1;
}
message Bar {
  optional int32 i = 1;
  optional int32 j = 2;
}

You can do the following:

foo = Foo()
bar = foo.bars.add()        # Adds a Bar, then modifies it
bar.i = 15
foo.bars.add().i = 32       # Adds and modifies at the same time
new_bar = Bar()
new_bar.i = 47
foo.bars.extend([new_bar])  # Uses extend() to copy

assert len(foo.bars) == 3
assert foo.bars[0].i == 15
assert foo.bars[1].i == 32
assert foo.bars[2].i == 47
assert foo.bars[2] == new_bar      # The extended message is equal,
assert foo.bars[2] is not new_bar  # but it is a copy!

foo.bars[1].i = 56    # Modifies a single element
assert foo.bars[1].i == 56
for bar in foo.bars:  # Loops and prints
  print(bar.i)
del foo.bars[:]       # Clears list

# add() also forwards keyword arguments to the concrete class.
# For example, you can do:

foo.bars.add(i=12, j=13)
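The ownership rule behind add() and extend() can be illustrated with a small container that only ever stores objects it created or copied itself. This is a toy model written for this explanation, not the library's container class; the names RepeatedMessages and the plain-attribute Bar are assumptions.

```python
import copy


class Bar(object):
    """Toy stand-in for a generated message with int fields i and j."""

    def __init__(self, i=0, j=0):
        self.i = i
        self.j = j

    def __eq__(self, other):
        return isinstance(other, Bar) and (self.i, self.j) == (other.i, other.j)

    def __ne__(self, other):
        return not self == other


class RepeatedMessages(object):
    """Toy container that owns every element it holds."""

    def __init__(self, message_class):
        self._message_class = message_class
        self._items = []

    def add(self, **kwargs):
        # Create the element here, so the container is its only owner,
        # then hand it back for the caller to fill in.
        item = self._message_class(**kwargs)
        self._items.append(item)
        return item

    def extend(self, messages):
        # Copy each message so later changes to the caller's objects
        # cannot leak into the container.
        for message in messages:
            self._items.append(copy.deepcopy(message))

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        return self._items[index]


bars = RepeatedMessages(Bar)
bars.add(i=12, j=13)
outside = Bar(i=47)
bars.extend([outside])
outside.i = 99               # does not affect the stored copy
print(bars[1].i)
```

Because extend() stores a copy, changing the caller's object afterwards leaves the stored element untouched.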


Enumerations

In Python, enums are just integers. A set of integral constants is defined corresponding to the enum's defined values. For example, given:

message Foo {
  enum SomeEnum {
    VALUE_A = 1;
    VALUE_B = 5;
    VALUE_C = 1234;
  }
  optional SomeEnum bar = 1;
}

The constants VALUE_A, VALUE_B, and VALUE_C are defined with values 1, 5, and 1234, respectively. No type corresponding to SomeEnum is defined. If an enum is defined in the outer scope, the values are module constants; if it is defined within a message (like above), they become static members of that message class.

An enum field works just like a scalar field. It does not do any type checking in the setter or getter.

foo = Foo()
foo.bar = Foo.VALUE_A
assert foo.bar == 1
assert foo.bar == Foo.VALUE_A

Note that in C++ and Java, an enum field cannot contain a numeric value other than those defined for the enum type. If an unknown enum value is encountered while parsing, the field will be treated as if its tag number were unknown. Therefore, you should never assign an undefined value to an enum field in Python, either. A future version of the library may explicitly disallow this.

Enums have a number of utility methods for getting field names from values and vice versa, lists of fields, and so on. For example, if you have the following standalone enum in myproto.proto:

enum SomeEnum {
    VALUE_A = 1;
    VALUE_B = 5;
    VALUE_C = 1234;
}

you can do this:

self.assertEqual('VALUE_A', myproto_pb2.SomeEnum.Name(myproto_pb2.VALUE_A))
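The lookups these utility methods perform amount to a pair of dictionaries. The sketch below is a toy helper for illustration only, not the class the library provides: Name() follows the call shown above, while Value() and items() are named here after the reverse lookup and the listing that the paragraph describes.

```python
class ToyEnumWrapper(object):
    """Toy name/number lookup helper (ignores aliases)."""

    def __init__(self, **values):
        self._by_name = dict(values)
        self._by_number = {number: name for name, number in values.items()}

    def Name(self, number):
        """Returns the name defined for the given numeric value."""
        if number not in self._by_number:
            raise ValueError("no name defined for value %r" % (number,))
        return self._by_number[number]

    def Value(self, name):
        """Returns the numeric value defined for the given name."""
        if name not in self._by_name:
            raise ValueError("no value defined for name %r" % (name,))
        return self._by_name[name]

    def items(self):
        """Returns (name, number) pairs ordered by number."""
        return sorted(self._by_name.items(), key=lambda item: item[1])


# Stand-ins for what a generated module would expose for the enum above.
SomeEnum = ToyEnumWrapper(VALUE_A=1, VALUE_B=5, VALUE_C=1234)
VALUE_A, VALUE_B, VALUE_C = 1, 5, 1234

print(SomeEnum.Name(VALUE_A))
print(SomeEnum.Value("VALUE_C"))
```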


Oneof

Given a message with a oneof:

message Foo {
  oneof test_oneof {
     string name = 1;
     int32 serial_number = 2;
  }
}

The Python class corresponding to Foo will have members called name and serial_number with accessor methods just like regular fields. However, unlike regular fields, at most one of the fields in a oneof can be set at a time, which is ensured by the runtime. For example:

message = Foo()
message.name = "Bender"
assert message.HasField("name")
message.serial_number = 2716057
assert message.HasField("serial_number")
assert not message.HasField("name")

The message class also has a WhichOneof method that lets you find out which field (if any) in the oneof has been set. This method returns the name of the field that is set, or None if nothing has been set:

assert message.WhichOneof("test_oneof") is None
message.name = "Bender"
assert message.WhichOneof("test_oneof") == "name"

HasField and ClearField also accept oneof names in addition to field names:

assert not message.HasField("test_oneof")
message.name = "Bender"
assert message.HasField("test_oneof")
message.serial_number = 2716057
assert message.HasField("test_oneof")
message.ClearField("test_oneof")
assert not message.HasField("test_oneof")
assert not message.HasField("serial_number")

Note that calling ClearField on a oneof just clears the currently set field.
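The "at most one field set" rule is easy to model: the oneof stores a single (field name, value) slot, and every write overwrites it. The class below is a hand-written toy with the same method names as above, not the generated class; its internals (_active, _value, _DEFAULTS) are invented for illustration.

```python
class Foo(object):
    """Toy model of a message whose oneof test_oneof has two members."""

    _DEFAULTS = {"name": "", "serial_number": 0}

    def __init__(self):
        self._active = None  # name of the member currently set, if any
        self._value = None

    def _get(self, field):
        # Members that are not active read as their default value.
        return self._value if self._active == field else self._DEFAULTS[field]

    def _set(self, field, value):
        # One shared slot: setting a member replaces whichever was set before.
        self._active, self._value = field, value

    name = property(lambda self: self._get("name"),
                    lambda self, value: self._set("name", value))
    serial_number = property(lambda self: self._get("serial_number"),
                             lambda self, value: self._set("serial_number", value))

    def WhichOneof(self, oneof_name):
        if oneof_name != "test_oneof":
            raise ValueError("unknown oneof: %s" % oneof_name)
        return self._active

    def HasField(self, name):
        if name == "test_oneof":
            return self._active is not None
        if name not in self._DEFAULTS:
            raise ValueError("unknown field: %s" % name)
        return self._active == name

    def ClearField(self, name):
        if name != "test_oneof" and name not in self._DEFAULTS:
            raise ValueError("unknown field: %s" % name)
        if name == "test_oneof" or name == self._active:
            self._active = self._value = None


message = Foo()
message.name = "Bender"
message.serial_number = 2716057   # replaces name
print(message.WhichOneof("test_oneof"))
```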


Extensions

Given a message with an extension range:

message Foo {
  extensions 100 to 199;
}

The Python class corresponding to Foo will have a member called Extensions, which is a dictionary mapping extension identifiers to their current values.

Given an extension definition:

extend Foo {
  optional int32 bar = 123;
}

The protocol buffer compiler generates an "extension identifier" called bar. The identifier acts as a key to the Extensions dictionary. The result of looking up a value in this dictionary is exactly the same as if you accessed a normal field of the same type. So, given the above example, you could do:

foo = Foo()
foo.Extensions[bar] = 2  # bar is the generated extension identifier
assert foo.Extensions[bar] == 2

Note that you need to specify the extension identifier constant, not just a string name: this is because it's possible for multiple extensions with the same name to be specified in different scopes.

Analogous to normal fields, Extensions[...] returns a message object for singular messages and a sequence for repeated fields.

The Message interface's HasField() and ClearField() methods do not work with extensions; you must use HasExtension() and ClearExtension() instead.
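The reason for keying on identifier objects rather than names can be shown with two extensions that share the short name bar. Everything in this sketch is a toy written for illustration (ExtensionId, _ExtensionDict, and the package names are invented); only the Extensions, HasExtension() and ClearExtension() names come from the text above.

```python
class ExtensionId(object):
    """Toy extension identifier; hashed by identity, not by name."""

    def __init__(self, full_name, default):
        self.full_name = full_name
        self.default = default


class _ExtensionDict(object):
    """Toy mapping from extension identifiers to their current values."""

    def __init__(self):
        self._values = {}

    @staticmethod
    def _check(extension):
        if not isinstance(extension, ExtensionId):
            raise KeyError("expected an extension identifier, got %r" % (extension,))

    def __getitem__(self, extension):
        self._check(extension)
        return self._values.get(extension, extension.default)

    def __setitem__(self, extension, value):
        self._check(extension)
        self._values[extension] = value


class Foo(object):
    """Toy message exposing the extension-related members named above."""

    def __init__(self):
        self.Extensions = _ExtensionDict()

    def HasExtension(self, extension):
        return extension in self.Extensions._values

    def ClearExtension(self, extension):
        self.Extensions._values.pop(extension, None)


# Two extensions with the same short name, declared in different scopes.
bar = ExtensionId("package_a.bar", default=0)
other_bar = ExtensionId("package_b.bar", default=0)

foo = Foo()
foo.Extensions[bar] = 2
print(foo.Extensions[bar], foo.Extensions[other_bar])
```

A plain string key such as "bar" could not distinguish the two, which is why the toy mapping rejects it.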


Services

If the .proto file contains the following line:

option py_generic_services = true;

Then the protocol buffer compiler will generate code based on the service definitions found in the file as described in this section. However, the generated code may be undesirable as it is not tied to any particular RPC system, and thus requires more levels of indirection than code tailored to one system. If you do NOT want this code to be generated, add this line to the file:

option py_generic_services = false;

If neither of the above lines is given, the option defaults to false, as generic services are deprecated. (Note that prior to 2.4.0, the option defaulted to true.)

RPC systems based on .proto-language service definitions should provide plugins to generate code appropriate for the system. These plugins are likely to require that abstract services are disabled, so that they can generate their own classes of the same names. Plugins are new in version 2.3.0 (January 2010).

The remainder of this section describes what the protocol buffer compiler generates when abstract services are enabled.


Interface

Given a service definition:

service Foo {
  rpc Bar(FooRequest) returns(FooResponse);
}

The protocol buffer compiler will generate a class Foo to represent this service. Foo will have a method for each method defined in the service definition. In this case, the method Bar is defined as:

def Bar(self, rpc_controller, request, done)

The parameters are equivalent to the parameters of Service.CallMethod(), except that the method_descriptor argument is implied.

These generated methods are intended to be overridden by subclasses. The default implementations simply call controller.SetFailed() with an error message indicating that the method is unimplemented, then invoke the done callback. When implementing your own service, you must subclass this generated service and implement its methods as appropriate.

Foo subclasses the Service interface. The protocol buffer compiler automatically generates implementations of the methods of Service as follows:

  • GetDescriptor: Returns the service's ServiceDescriptor.
  • CallMethod: Determines which method is being called based on the provided method descriptor and calls it directly.
  • GetRequestClass and GetResponseClass: Returns the class of the request or response of the correct type for the given method.
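The shape of this interface can be sketched with hand-written classes. Nothing below is generated code: ToyController is a minimal stand-in for a controller, requests and responses are plain strings, and CallMethod here dispatches on a method name string, whereas the real method receives a method descriptor.

```python
class ToyController(object):
    """Minimal stand-in for an RPC controller; only failure is modelled."""

    def __init__(self):
        self._error = None

    def SetFailed(self, reason):
        self._error = reason

    def Failed(self):
        return self._error is not None


class Foo(object):
    """Hand-written class shaped like the service class described above."""

    def Bar(self, rpc_controller, request, done):
        # Default behavior: report "unimplemented", then invoke the callback.
        rpc_controller.SetFailed("Method Bar not implemented.")
        done(None)

    def CallMethod(self, method_name, rpc_controller, request, done):
        # Assumption: look the method up by name and call it directly.
        getattr(self, method_name)(rpc_controller, request, done)


class MyFoo(Foo):
    """A service implementation overrides the methods it supports."""

    def Bar(self, rpc_controller, request, done):
        done("echo: %s" % request)


responses = []

default_controller = ToyController()
Foo().CallMethod("Bar", default_controller, "ping", responses.append)

controller = ToyController()
MyFoo().CallMethod("Bar", controller, "ping", responses.append)

print(default_controller.Failed(), controller.Failed(), responses)
```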


Stub

The protocol buffer compiler also generates a "stub" implementation of every service interface, which is used by clients wishing to send requests to servers implementing the service. For the Foo service (above), the stub implementation Foo_Stub will be defined.

Foo_Stub is a subclass of Foo. Its constructor takes an RpcChannel as a parameter. The stub then implements each of the service's methods by calling the channel's CallMethod() method.

The Protocol Buffer library does not include an RPC implementation. However, it includes all of the tools you need to hook up a generated service class to any arbitrary RPC implementation of your choice. You need only provide implementations of RpcChannel and RpcController.
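As a rough picture of how a stub and a channel fit together, the sketch below wires a hand-written stub to an in-process "loopback" channel. All classes are toys: LoopbackChannel and EchoService are hypothetical, strings stand in for request and response messages, and a method name string stands in for the method descriptor that a real channel's CallMethod() would receive.

```python
class Foo(object):
    """Toy service interface with one method."""

    def Bar(self, rpc_controller, request, done):
        raise NotImplementedError


class Foo_Stub(Foo):
    """Hand-written stub: forwards every call to the channel."""

    def __init__(self, rpc_channel):
        self._channel = rpc_channel

    def Bar(self, rpc_controller, request, done):
        # str stands in for the response class of the method.
        self._channel.CallMethod("Bar", rpc_controller, request, str, done)


class EchoService(Foo):
    """Server-side implementation used by the loopback channel."""

    def Bar(self, rpc_controller, request, done):
        done("echo: " + request)


class LoopbackChannel(object):
    """Hypothetical channel that calls a service object in-process.

    A real channel would serialize the request, send it over some
    transport, parse the reply into response_class and then call done.
    """

    def __init__(self, service):
        self._service = service

    def CallMethod(self, method_name, rpc_controller, request,
                   response_class, done):
        getattr(self._service, method_name)(rpc_controller, request, done)


results = []
stub = Foo_Stub(LoopbackChannel(EchoService()))
stub.Bar(None, "ping", results.append)  # no controller needed in this toy
print(results)
```

Swapping LoopbackChannel for a channel backed by a real transport would leave the client code that uses the stub unchanged, which is the point of the indirection.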

Plugin Insertion Points

Code generator plugins which want to extend the output of the Python code generator may insert code of the following types using the given insertion point names.

  • imports: Import statements.
  • module_scope: Top-level declarations.
  • class_scope:TYPENAME: Member declarations that belong in a message class. TYPENAME is the full proto name, e.g. package.MessageType.
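To make the mechanism concrete, the sketch below imitates what happens to text targeted at an insertion point. It assumes the generated file marks each point with a comment containing @@protoc_insertion_point(NAME) and that inserted text goes immediately above that line with the marker's indentation; the function insert_at_point and the sample source are hypothetical, and in practice protoc performs this step itself from the plugin's output.

```python
def insert_at_point(source, point_name, new_lines):
    """Insert new_lines just above the marker for point_name (toy model)."""
    marker = "@@protoc_insertion_point(%s)" % point_name
    out = []
    found = False
    for line in source.splitlines():
        if marker in line:
            found = True
            # Reuse the marker's indentation so class-scope code lines up.
            indent = line[:len(line) - len(line.lstrip())]
            out.extend(indent + new_line for new_line in new_lines)
        out.append(line)  # the marker stays, so later insertions still work
    if not found:
        raise KeyError("no insertion point named %r" % point_name)
    return "\n".join(out) + "\n"


# Hand-written stand-in for a generated module.
generated = (
    "import sys\n"
    "# @@protoc_insertion_point(imports)\n"
    "class Foo(object):\n"
    "  # @@protoc_insertion_point(class_scope:pkg.Foo)\n"
    "# @@protoc_insertion_point(module_scope)\n"
)

patched = insert_at_point(generated, "class_scope:pkg.Foo",
                          ["def hello(self):", "  return 'hi'"])
patched = insert_at_point(patched, "module_scope", ["GREETING = 'hello'"])
print(patched)
```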


Do not generate code which relies on private class members declared by the standard code generator, as these implementation details may change in future versions of Protocol Buffers.

C++ Implementation

There is also an experimental C++ implementation for Python messages via a Python extension for better performance. The implementation type is controlled by the environment variable PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION (valid values: "cpp" and "python"). The default value is currently "python" but will be changed to "cpp" in a future release.

Note that the environment variable needs to be set before installing the protobuf library, in order to build and install the Python extension. The C++ implementation also requires a CPython platform. See python/INSTALL.txt for detailed install instructions.