Google Protocol Buffers
(Overview)
Sergey Podolsky
sergey.podolsky@gmail.com
History
Binary serialization - sucks
- Language-specific
- Not safe (see Effective Java)
- Not extensible
XML, JSON - back to 1999
- Too verbose
- Need to parse
- Slow performance
- Huge size if not compressed
- No strong types (int vs float)
- Need to store field names
Google Protocol Buffers
- Cross-language
- Schema evolution
- Compact
- Strongly typed
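
Schema evolution rests on the numeric field tags: a reader that meets a tag it does not know can use the wire type to skip the value and carry on. Below is a minimal sketch in plain Python, assuming only varint (wire type 0) and length-delimited (wire type 2) fields. The helper names and the extra field 4 are hypothetical, and this is not the protobuf library's API.

```python
def read_varint(buf, pos):
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def parse_known_fields(buf, known):
    """Return {name: [values]} for fields listed in `known`; skip the rest."""
    out = {}
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:            # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:          # length-delimited (string, bytes, ...)
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        if field_number in known:     # unknown tags are simply dropped here
            out.setdefault(known[field_number], []).append(value)
    return out


# Written by a newer schema: field 1 = "Martin", field 2 = 1337,
# plus a hypothetical field 4 (string "x") the old reader does not know.
NEW_MESSAGE = bytes.fromhex("0a064d617274696e" "10b90a" "220178")
OLD_SCHEMA = {1: "user_name", 2: "favourite_number"}

print(parse_known_fields(NEW_MESSAGE, OLD_SCHEMA))
```

The old reader recovers the two fields it knows and ignores field 4 without failing.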
Message
Person.json
{
  "userName": "Martin",
  "favouriteNumber": 1337,
  "interests": ["daydreaming", "hacking"]
}
Person.proto
message Person {
  required string user_name = 1;
  optional int64 favourite_number = 2;
  repeated string interests = 3;
}
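
To make the size difference concrete, here is a hand-rolled sketch of how the Person message could be laid out on the wire, written in plain Python without the protobuf library. The function names are hypothetical, and the sketch assumes non-negative numbers and only the three fields shown above.

```python
import json


def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)   # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_key(field_number, wire_type):
    """Field key = (field number << 3) | wire type, itself a varint."""
    return encode_varint((field_number << 3) | wire_type)


def encode_string(field_number, text):
    data = text.encode("utf-8")
    return encode_key(field_number, 2) + encode_varint(len(data)) + data


def encode_person(user_name, favourite_number, interests):
    out = encode_string(1, user_name)
    out += encode_key(2, 0) + encode_varint(favourite_number)
    for interest in interests:        # repeated field: one entry per element
        out += encode_string(3, interest)
    return out


person = {"userName": "Martin", "favouriteNumber": 1337,
          "interests": ["daydreaming", "hacking"]}
binary = encode_person(person["userName"], person["favouriteNumber"],
                       person["interests"])
compact_json = json.dumps(person, separators=(",", ":")).encode("utf-8")
print(len(binary), len(compact_json))
```

For this record the sketch produces 33 bytes, against 82 bytes for the same data as whitespace-free JSON, because field names are replaced by one-byte keys.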
Options
- Outer class
- Default value
- Deprecated
- Speed / Size
- Custom options
- descriptor.proto
Our protobuf use cases:
- Java, C++, C#
- Payload for ZeroC ICE
- IBM MQ / Solace messages
- DB raw data
- Log messages to disk
- Compress using TAR.GZIP
- Show as XML / JSON
- exe utility associated with protobuf files
Disadvantages
- No Map<K, V> / Dictionary<K, V>
- No Set<T>
- No short / int16 / uint16
- No interning
- Generated classes are immutable
- Compiler and library versions are not backwards compatible
- descriptor.proto is not backwards compatible
- Few officially supported languages
- Enums are not extensible (an unknown value resets to 0)
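
On the missing 16-bit types: int32 and int64 values are varint-encoded on the wire, so small non-negative numbers still take only one to three bytes. Negative values in plain int32/int64 fields take ten bytes, which the sint32/sint64 types avoid through ZigZag encoding. A rough sketch of the sizes, using hypothetical helper names:

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def varint_size(value):
    """Bytes needed to varint-encode a non-negative integer."""
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size


def int64_wire_size(value):
    """int32/int64 field value: negatives are sign-extended to 64 bits."""
    return varint_size(value & MASK64)


def zigzag64(value):
    """ZigZag mapping used by sint64: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    return ((value << 1) ^ (value >> 63)) & MASK64


def sint64_wire_size(value):
    return varint_size(zigzag64(value))


for n in (1, 300, 65535, -1):
    print(n, int64_wire_size(n), sint64_wire_size(n))
```

The full uint16 range fits in at most three bytes, so the missing short types cost little space on the wire. In the generated classes the value is still held as a 32-bit or wider integer.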
Google Protocol Buffers
Apache Avro
- No tag
- Schema is required
- The entire record is tagged by schema ID
- Fields are matched by name
- No optional values: union { null, long } is used instead
- Resolution rules are used for server vs client schemas
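
The name-based matching can be sketched as follows. This simplified illustration covers only the record-level rules: writer fields the reader does not declare are ignored, and reader fields the writer did not send fall back to their default. The function, the tuple layout, and the `email` field are hypothetical and not part of the Avro library.

```python
def resolve_record(writer_fields, reader_fields, values):
    """Project values written in writer order onto the reader's fields by name.

    writer_fields: field names in the order they were written.
    reader_fields: (name, has_default, default) tuples from the reader schema.
    values:        decoded values, in writer order.
    """
    written = dict(zip(writer_fields, values))
    record = {}
    for name, has_default, default in reader_fields:
        if name in written:
            record[name] = written[name]
        elif has_default:
            record[name] = default
        else:
            raise ValueError(f"no value and no default for field {name!r}")
    return record


writer = ["userName", "favouriteNumber", "interests"]
reader = [
    ("userName", False, None),
    ("interests", False, None),
    ("email", True, None),      # hypothetical field added on the reader side
]
values = ["Martin", 1337, ["daydreaming", "hacking"]]

print(resolve_record(writer, reader, values))
```

Here `favouriteNumber` is dropped because the reader does not declare it, and `email` is filled from its default.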
Apache Avro
JSON notation
{
  "type": "record",
  "name": "Person",
  "fields": [
    {"name": "userName", "type": "string"},
    {"name": "favouriteNumber", "type": ["null", "long"]},
    {"name": "interests", "type": {"type": "array", "items": "string"}}
  ]
}
IDL
record Person {
  string userName;
  union { null, long } favouriteNumber;
  array<string> interests;
}
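
Because there are no tags, an Avro record is just its field values written one after another in schema order. The sketch below hand-encodes the Person record in plain Python, without the Avro library, to show what that looks like. The function names are hypothetical, and the array is written as a single block.

```python
def zigzag_varint(value):
    """Avro int/long: ZigZag mapping, then base-128 varint (64-bit range)."""
    zz = ((value << 1) ^ (value >> 63)) & 0xFFFFFFFFFFFFFFFF
    out = bytearray()
    while True:
        byte = zz & 0x7F
        zz >>= 7
        if zz:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def avro_string(text):
    """String: byte length as a long, then the UTF-8 bytes."""
    data = text.encode("utf-8")
    return zigzag_varint(len(data)) + data


def avro_person(user_name, favourite_number, interests):
    out = avro_string(user_name)
    if favourite_number is None:
        out += zigzag_varint(0)                  # union branch 0: null
    else:
        out += zigzag_varint(1)                  # union branch 1: long
        out += zigzag_varint(favourite_number)
    if interests:
        out += zigzag_varint(len(interests))     # one block holding all items
        for item in interests:
            out += avro_string(item)
    out += zigzag_varint(0)                      # zero-count block ends the array
    return out


encoded = avro_person("Martin", 1337, ["daydreaming", "hacking"])
print(len(encoded), encoded.hex())
```

For this record the sketch yields 32 bytes. Nothing in the output identifies a field, which is why the reader needs the writer's schema to decode it.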
Apache Thrift
- One-stop shop
- RPC framework
- Different serialization formats (protocols)
Apache Thrift
struct Person {
  1: string userName,
  2: optional i64 favouriteNumber,
  3: list<string> interests
}
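
As an illustration of one of Thrift's serialization formats, the sketch below hand-encodes this struct the way the Thrift binary protocol lays it out: a type byte and a 16-bit field id per field, fixed-width integers, and a stop byte at the end. It is plain Python using only `struct`, with hypothetical function names, and does not use the Thrift library.

```python
import struct

# Type ids used by the Thrift binary protocol for the types needed here.
T_STOP, T_I64, T_STRING, T_LIST = 0, 10, 11, 15


def field_header(field_type, field_id):
    """One type byte followed by a big-endian 16-bit field id."""
    return struct.pack(">bh", field_type, field_id)


def binary_string(text):
    """32-bit big-endian byte length, then the UTF-8 bytes."""
    data = text.encode("utf-8")
    return struct.pack(">i", len(data)) + data


def thrift_person(user_name, favourite_number, interests):
    out = field_header(T_STRING, 1) + binary_string(user_name)
    if favourite_number is not None:          # optional: omitted when unset
        out += field_header(T_I64, 2) + struct.pack(">q", favourite_number)
    out += field_header(T_LIST, 3)
    out += struct.pack(">bi", T_STRING, len(interests))  # element type, count
    for item in interests:
        out += binary_string(item)
    out += struct.pack(">b", T_STOP)          # end of struct
    return out


encoded = thrift_person("Martin", 1337, ["daydreaming", "hacking"])
print(len(encoded))
```

With fixed-width lengths and integers this comes to 59 bytes for the sample record. Thrift's compact protocol uses variable-length integers to shrink that.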
Comparison
Feature | Thrift | Protobuf
--- | --- | ---
Language Bindings | Java, C++, Python, C#, Cocoa, Erlang, Haskell, OCaml, Perl, PHP, Ruby, Smalltalk | Java, C++, Python
Primitive Types | bool, byte, 16/32/64-bit integers, double, string, byte sequence, map<t1,t2>, list<t>, set<t> | bool, 32/64-bit integers, float, double, string, byte sequence, repeated properties act like lists
Enumerations | Yes | Yes
Constants | Yes | No
Composite Type | Struct | Message
Exception Handling | Yes | No
Documentation | Lacking | Good
License | Apache | BSD-style
Compiler | C++ | C++
RPC Interfaces | Yes | Yes
RPC Implementation | Yes | No
Composite Type Extensions | No | Yes
Data Versioning | Yes | Yes
Pros
Thrift:
- More languages supported out of the box
- Richer data structures than Protobuf (e.g. Map and Set)
- Includes RPC implementation for services
Protobuf:
- Slightly faster than Thrift when using "optimize_for = SPEED"
- Serialized objects slightly smaller than Thrift due to more aggressive data compression
- Better documentation
- API a bit cleaner than Thrift
Cons
Thrift:
- Good examples are hard to find
- Missing/incomplete documentation
Protobuf:
- .proto can define services, but no RPC implementation is defined (although stubs are generated for you)
