Add an encoder optimized for in-memory buffers.

git-svn-id: https://svn.kapsi.fi/jpa/nanopb-dev@1088 e3a754e5-d11d-0410-8d38-ebb782a927b9
Michael Poole
2011-12-21 04:36:10 +00:00
committed by Petteri Aimonen
parent 3979f9137f
commit accd93be8d
8 changed files with 676 additions and 48 deletions


@@ -183,16 +183,20 @@ Encoding callbacks
::
bool (*encode)(pb_ostream_t *stream, const pb_field_t *field, const void *arg);
bool (*encode_buffer)(pb_strstream_t *stream, const pb_field_t *field, const void *arg);
When encoding, the callback should write out complete fields, including the wire type and field number tag. It can write as many or as few fields as it likes. For example, if you want to write out an array as *repeated* field, you should do it all in a single call.
When encoding, the callbacks should write out complete fields, including the wire type and field number tag. The callback can write as many or as few fields as it likes. For example, if you want to write out an array as a *repeated* field, you should do it all in a single call.
Usually you can use `pb_encode_tag_for_field`_ to encode the wire type and tag number of the field. However, if you want to encode a repeated field as a packed array, you must call `pb_encode_tag`_ instead to specify a wire type of *PB_WT_STRING*.
Usually you can use `pb_encode_tag_for_field`_ (or `pb_encbuf_tag_for_field`_ for the *encode_buffer* callback) to encode the wire type and tag number of the field. However, if you want to encode a repeated field as a packed array, you must call `pb_encode_tag`_ (respectively, `pb_encbuf_tag`_) instead to specify a wire type of *PB_WT_STRING*.
If the callback is used in a submessage, it will be called multiple times during a single call to `pb_encode`_. In this case, it must produce the same amount of data every time. If the callback is directly in the main message, it is called only once.
If the callback is used in a submessage, *encode* will be called multiple times during a single call to `pb_encode`_. In this case, it must produce the same amount of data every time. If the callback is directly in the main message, or if you are using `pb_encode_buffer`_, the callback is called only once.
.. _`pb_encode`: reference.html#pb-encode
.. _`pb_encode_buffer`: reference.html#pb-encode-buffer
.. _`pb_encode_tag_for_field`: reference.html#pb-encode-tag-for-field
.. _`pb_encbuf_tag_for_field`: reference.html#pb-encbuf-tag-for-field
.. _`pb_encode_tag`: reference.html#pb-encode-tag
.. _`pb_encbuf_tag`: reference.html#pb-encbuf-tag
This callback writes out a dynamically sized string::
@@ -205,6 +209,17 @@ This callback writes out a dynamically sized string::
return pb_encode_string(stream, (uint8_t*)str, strlen(str));
}
The equivalent for in-memory buffers has to write the elements in the opposite order, because the buffer writers prepend their data::
bool write_string_buf(pb_strstream_t *stream, const pb_field_t *field, const void *arg)
{
char *str = get_string_from_somewhere();
if (!pb_encbuf_string(stream, (uint8_t*)str, strlen(str)))
return false;
return pb_encbuf_tag_for_field(stream, field);
}
Decoding callbacks
------------------
::
@@ -234,7 +249,7 @@ This callback reads multiple integers and prints them::
Message descriptor
==================
For using the *pb_encode* and *pb_decode* functions, you need a message descriptor describing the structure you wish to encode. This description is usually autogenerated from .proto file.
To use the *pb_encode*, *pb_encode_buffer* and *pb_decode* functions, you need a message descriptor describing the structure you wish to encode. This description is usually autogenerated from a .proto file.
For example this submessage in the Person.proto file::
@@ -285,9 +300,9 @@ that array; in the previous example, they are
*Person_PhoneNumber_has*, *Person_PhoneNumber_set* and
*Person_PhoneNumber_clear*.
For convenience, *pb_encode* only checks these bits for optional
fields. *pb_decode* sets the corresponding bit for every field it
decodes, whether the field is optional or not.
For convenience, *pb_encode* and *pb_encode_buffer* only check these
bits for optional fields. *pb_decode* sets the corresponding bit for
every field it decodes, whether the field is optional or not.
.. Should there be a section here on pointer fields?


@@ -14,6 +14,7 @@ Overall structure
For the runtime program, you always need *pb.h* for type declarations.
Depending on whether you want to encode, decode, or both, you also need *pb_encode.h/c* or *pb_decode.h/c*.
If you only encode into in-memory buffers, *pb_encode_buffer.h/c* should be slightly faster and smaller.
If your *.proto* file encodes submessages or other fields using pointers, you must compile *pb_decode.c* with a preprocessor macro named *MALLOC_HEADER* that is the name of a header with definitions (either as functions or macros) for *calloc()*, *realloc()* and *free()*. For a typical hosted configuration, this should be *<stdlib.h>*.
@@ -27,6 +28,7 @@ So a typical project might include these files:
- pb.h
- pb_decode.h and pb_decode.c (needed for decoding messages)
- pb_encode.h and pb_encode.c (needed for encoding messages)
- pb_encode_buffer.h and pb_encode_buffer.c (for encoding specifically into in-memory buffers)
2) Protocol description (you can have many):
- person.proto (just an example)
- person.pb.c (autogenerated, contains initializers for message descriptors)
@@ -89,6 +91,17 @@ Now in your main program do this to encode a message::
After that, buffer will contain the encoded message.
The number of bytes in the message is stored in *stream.bytes_written*.
Using the *pb_encode_buffer.h/c* interface is very similar::
Example mymessage = {42};
uint8_t buffer[10];
pb_strstream_t stream = pb_strstream_from_buffer(buffer, sizeof(buffer));
pb_encode_buffer(&stream, Example_msg, &mymessage);
The encoded message will start at *stream.last* and continue until the
end of *buffer* (that is, it has length *buffer + sizeof(buffer) - stream.last*).
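Because the message ends at the end of the buffer rather than starting at its beginning, the length has to be computed from the end. The following is a minimal sketch of that calculation; *demo_strstream_t* and *demo_message_length* are hypothetical names used only for illustration, and only the *last* member is taken from the text above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the buffer descriptor. Only the `last`
 * member is named in the documentation; this is not the real type. */
typedef struct {
    uint8_t *last; /* first byte of the data written so far */
} demo_strstream_t;

/* The message occupies [last, buffer + bufsize), so its length is the
 * distance from `last` to the end of the buffer. */
static size_t demo_message_length(const demo_strstream_t *s,
                                  const uint8_t *buffer, size_t bufsize)
{
    return (size_t)((buffer + bufsize) - s->last);
}
```

A caller would then send or store *length* bytes starting at *stream.last*, not at *buffer*.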
You can feed the message to *protoc --decode=Example message.proto* to verify its validity.
For complete examples of the simple cases, see *tests/test_decode1.c* and *tests/test_encode1.c*. For an example with network interface, see the *example* subdirectory.
@@ -112,6 +125,5 @@ This also generates a file called *breakpoints* which includes all lines returni
Wishlist
========
#) A specialized encoder for encoding to a memory buffer. Should serialize in reverse order to avoid having to determine submessage size beforehand.
#) A cleaner rewrite of the Python-based source generator.
#) Better performance for 16- and 8-bit platforms: use smaller datatypes where possible.


@@ -102,6 +102,7 @@ Part of a message structure, for fields with type PB_HTYPE_CALLBACK::
union {
bool (*decode)(pb_istream_t *stream, const pb_field_t *field, void *arg);
bool (*encode)(pb_ostream_t *stream, const pb_field_t *field, const void *arg);
bool (*encode_buffer)(pb_strstream_t *stream, const pb_field_t *field, const void *arg);
} funcs;
void *arg;
@@ -109,11 +110,11 @@ Part of a message structure, for fields with type PB_HTYPE_CALLBACK::
The *arg* is passed to the callback when calling. It can be used to store any information that the callback might need.
When calling `pb_encode`_, *funcs.encode* is used, and similarly when calling `pb_decode`_, *funcs.decode* is used. The function pointers are stored in the same memory location but are of incompatible types. You can set the function pointer to NULL to skip the field.
When calling `pb_encode`_, *funcs.encode* is used, and similarly when calling `pb_encode_buffer`_, *funcs.encode_buffer* is used, and when calling `pb_decode`_, *funcs.decode* is used. The function pointers are stored in the same memory location but are of incompatible types. You can set the function pointer to NULL to skip the field.
pb_wire_type_t
--------------
Protocol Buffers wire types. These are used with `pb_encode_tag`_. ::
Protocol Buffers wire types. These are used with `pb_encode_tag`_ and `pb_encbuf_tag`_. ::
typedef enum {
PB_WT_VARINT = 0,
@@ -311,6 +312,107 @@ In Protocol Buffers format, the submessage size must be written before the subme
If the submessage contains callback fields, the callback function might misbehave and write out a different amount of data on the second call. This situation is recognized and *false* is returned, but it is up to the caller to ensure that the receiver of the message does not interpret it as valid data.
pb_encode_buffer.h
==================
An important note about this module is that data is written from the
back of the buffer to the front. That is, when you call
*pb_buf_write()*, it will place the bytes (in the order you provide
them) before the data currently in the buffer.
pb_strstream_from_buffer
------------------------
Constructs a buffer descriptor. This is just a helper function; it doesn't do anything you couldn't do yourself in a callback function. ::
pb_strstream_t pb_strstream_from_buffer(uint8_t *buf, size_t bufsize);
:buf: Memory buffer to write into.
:bufsize: Maximum number of bytes to write.
:returns: The buffer descriptor.
The descriptor only tracks the amount of space left; it does not count how many bytes have been written.
pb_buf_write
------------
Prepends data to an in-memory buffer. Always use this function, instead of trying to manage the pointers inside the buffer descriptor. ::
bool pb_buf_write(pb_strstream_t *stream, const uint8_t *buf, size_t count);
:stream: Descriptor for buffer to write to.
:buf: Pointer to buffer with the data to be written.
:count: Number of bytes to write.
:returns: True on success, false if maximum length is exceeded.
If there is not enough space, *stream* is not modified.
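To illustrate the prepend behaviour and the guarantee that a failed write leaves the descriptor untouched, here is a minimal sketch. It is not the library's implementation: the *demo_* names and the *first* member are assumptions made for this example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical descriptor: `first` is the start of the buffer (assumed),
 * `last` is the start of the data written so far. */
typedef struct {
    uint8_t *first;
    uint8_t *last;
} demo_strstream_t;

/* An empty stream has `last` at the end of the buffer. */
static demo_strstream_t demo_from_buffer(uint8_t *buf, size_t bufsize)
{
    demo_strstream_t s = { buf, buf + bufsize };
    return s;
}

/* Prepend `count` bytes, keeping them in the order given. The space
 * check happens before any state changes, so failure has no side effects. */
static bool demo_buf_write(demo_strstream_t *s, const uint8_t *buf, size_t count)
{
    if ((size_t)(s->last - s->first) < count)
        return false;
    s->last -= count;
    memcpy(s->last, buf, count);
    return true;
}
```

Writing "cd" and then "ab" with this sketch leaves "abcd" in the buffer, which is why callers must emit the parts of a field in reverse order.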
pb_encode_buffer
----------------
Encodes the contents of a structure as a protocol buffers message and writes it to a buffer. ::
bool pb_encode_buffer(pb_strstream_t *stream, const pb_message_t *msg, const void *src_struct);
:stream: Descriptor for buffer to write to.
:msg: A message descriptor, usually autogenerated.
:src_struct: Pointer to the data that will be serialized.
:returns: True on success, false if the buffer is too small or if a field encoder returns false.
pb_encbuf_varint
----------------
Encodes an unsigned integer in the varint_ format. ::
bool pb_encbuf_varint(pb_strstream_t *stream, uint64_t value);
:stream: Descriptor for buffer to write to. 1-10 bytes will be written.
:value: Value to encode.
:returns: True on success, false on IO error.
.. _varint: http://code.google.com/apis/protocolbuffers/docs/encoding.html#varints
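As a rough illustration of how a varint can be prepended, the sketch below encodes the value into a temporary array in the usual low-bits-first order and then copies it in front of the existing data. The *demo_* names and the descriptor layout are assumptions, not the library's actual code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical descriptor, as assumed for this example only. */
typedef struct {
    uint8_t *first; /* start of the buffer */
    uint8_t *last;  /* start of the data written so far */
} demo_strstream_t;

static bool demo_encbuf_varint(demo_strstream_t *s, uint64_t value)
{
    uint8_t tmp[10]; /* a 64-bit value needs at most 10 varint bytes */
    size_t n = 0;

    /* Emit 7 bits per byte, least significant group first; the high
     * bit marks that more bytes follow. */
    do {
        uint8_t byte = (uint8_t)(value & 0x7F);
        value >>= 7;
        if (value != 0)
            byte |= 0x80;
        tmp[n++] = byte;
    } while (value != 0);

    if ((size_t)(s->last - s->first) < n)
        return false;
    s->last -= n;
    memcpy(s->last, tmp, n);
    return true;
}
```

For example, the value 300 is stored as the two bytes 0xAC 0x02.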
pb_encbuf_tag
-------------
Finishes a field in the Protocol Buffers binary format: encodes the field number and the wire type of the data. ::
bool pb_encbuf_tag(pb_strstream_t *stream, pb_wire_type_t wiretype, int field_number);
:stream: Descriptor for buffer to write to. 1-5 bytes will be written.
:wiretype: PB_WT_VARINT, PB_WT_64BIT, PB_WT_STRING or PB_WT_32BIT
:field_number: Identifier for the field, defined in the .proto file.
:returns: True on success, false on IO error.
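In the Protocol Buffers wire format, the tag is a single varint that packs the field number above the three-bit wire type. The sketch below only computes that value; *demo_tag_value* and the *DEMO_WT_* enum are hypothetical names, with the numeric wire types taken from the Protocol Buffers encoding specification.

```c
#include <stdint.h>

/* Wire type numbers as defined by the Protocol Buffers encoding. */
typedef enum {
    DEMO_WT_VARINT = 0,
    DEMO_WT_64BIT  = 1,
    DEMO_WT_STRING = 2,
    DEMO_WT_32BIT  = 5
} demo_wire_type_t;

/* Tag value = (field_number << 3) | wire_type. The result would then
 * be written as a varint. */
static uint64_t demo_tag_value(demo_wire_type_t wiretype, int field_number)
{
    return ((uint64_t)(uint32_t)field_number << 3) | (uint64_t)wiretype;
}
```

Field 1 with wire type *PB_WT_STRING* therefore yields the familiar tag byte 0x0A.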
pb_encbuf_tag_for_field
-----------------------
Same as `pb_encbuf_tag`_, except takes the parameters from a *pb_field_t* structure. ::
bool pb_encbuf_tag_for_field(pb_strstream_t *stream, const pb_field_t *field);
:stream: Descriptor for buffer to write to. 1-5 bytes will be written.
:field: Field description structure. Usually autogenerated.
:returns: True on success, false on IO error or unknown field type.
This function only considers the LTYPE of the field. You can use it from your field callbacks, because the source generator writes correct LTYPE also for callback type fields.
Wire type mapping is as follows:
========================= ============
LTYPEs Wire type
========================= ============
VARINT, SVARINT PB_WT_VARINT
FIXED64 PB_WT_64BIT
STRING, BYTES, SUBMESSAGE PB_WT_STRING
FIXED32 PB_WT_32BIT
========================= ============
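The mapping in the table above can be sketched as a simple switch. The enum and function names here are hypothetical and stand in for the library's own LTYPE constants; an unrecognised type is reported as failure, matching the "unknown field type" case in the return value description.

```c
#include <stdbool.h>

/* Hypothetical LTYPE identifiers, for illustration only. */
typedef enum {
    DEMO_LTYPE_VARINT,
    DEMO_LTYPE_SVARINT,
    DEMO_LTYPE_FIXED64,
    DEMO_LTYPE_STRING,
    DEMO_LTYPE_BYTES,
    DEMO_LTYPE_SUBMESSAGE,
    DEMO_LTYPE_FIXED32
} demo_ltype_t;

typedef enum {
    DEMO_WT_VARINT = 0,
    DEMO_WT_64BIT  = 1,
    DEMO_WT_STRING = 2,
    DEMO_WT_32BIT  = 5
} demo_wire_type_t;

/* Returns false for an LTYPE that has no wire type in the table. */
static bool demo_wire_type_for(demo_ltype_t ltype, demo_wire_type_t *out)
{
    switch (ltype) {
    case DEMO_LTYPE_VARINT:
    case DEMO_LTYPE_SVARINT:
        *out = DEMO_WT_VARINT;
        return true;
    case DEMO_LTYPE_FIXED64:
        *out = DEMO_WT_64BIT;
        return true;
    case DEMO_LTYPE_STRING:
    case DEMO_LTYPE_BYTES:
    case DEMO_LTYPE_SUBMESSAGE:
        *out = DEMO_WT_STRING;
        return true;
    case DEMO_LTYPE_FIXED32:
        *out = DEMO_WT_32BIT;
        return true;
    default:
        return false;
    }
}
```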
pb_encbuf_string
----------------
Writes the length of a string as a varint, followed by the contents of the string. Used for writing fields with wire type PB_WT_STRING. ::
bool pb_encbuf_string(pb_strstream_t *stream, const uint8_t *buffer, size_t size);
:stream: Descriptor for buffer to write to.
:buffer: Pointer to string data.
:size: Number of bytes in the string.
:returns: True on success, false on IO error.
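Since the buffer writers prepend, a length-delimited string is presumably produced by writing the contents first and the length second, so that the length ends up in front. The sketch below shows that ordering under assumed names (*demo_* helpers and descriptor layout are hypothetical); unlike the documented write function, it does not roll back a partial write on failure.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical descriptor, as assumed for this example only. */
typedef struct {
    uint8_t *first; /* start of the buffer */
    uint8_t *last;  /* start of the data written so far */
} demo_strstream_t;

/* Prepend raw bytes in front of the existing data. */
static bool demo_prepend(demo_strstream_t *s, const uint8_t *buf, size_t count)
{
    if ((size_t)(s->last - s->first) < count)
        return false;
    s->last -= count;
    memcpy(s->last, buf, count);
    return true;
}

/* Contents go in first, then the varint length is prepended, giving
 * "length, contents" when read from s->last. */
static bool demo_encbuf_string(demo_strstream_t *s, const uint8_t *data, size_t size)
{
    uint8_t len[10];
    size_t n = 0;
    uint64_t v = (uint64_t)size;

    if (!demo_prepend(s, data, size))
        return false;
    do {
        uint8_t byte = (uint8_t)(v & 0x7F);
        v >>= 7;
        if (v != 0)
            byte |= 0x80;
        len[n++] = byte;
    } while (v != 0);
    return demo_prepend(s, len, n);
}
```

A field callback would follow this with the tag write, so the finished field reads tag, length, contents.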
pb_decode.h
===========