Document the rsync wire formats

This commit is contained in:
Kovid Goyal
2023-07-26 19:48:21 +05:30
parent a74e93d241
commit 8d24bf1162

View File

@@ -342,7 +342,7 @@ Repeated transfer of large files that have only changed a little between
the receiving and sending side can be sped up significantly by transmitting
binary deltas of only the changed portions. This protocol has built-in support
for doing that. This support uses the `rsync algorithm
<https://github.com/librsync/librsync>`__. In this algorithm first the
<https://rsync.samba.org/tech_report/tech_report.html>`__. In this algorithm, first the
receiving side sends a file signature that contains hashes of blocks
in the file. Then the sending side sends only those blocks that have changed.
The receiving side applies these deltas to the file to update it till it matches
@@ -409,8 +409,80 @@ The client then uses this delta to update the file.
The format of signatures and deltas
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
These come from `librsync <https://github.com/librsync/librsync>`__. If this
specification gains wider adoption, these formats should be documented here.
In what follows, all integers must be encoded in little-endian format,
regardless of the architecture of the machines involved. The XXH3 hash family
refers to `the xxHash algorithm
<https://github.com/Cyan4973/xxHash/blob/dev/doc/xxhash_spec.md>`__.
A signature first has a 12 byte header of the form:
.. code::
uint16 version
uint16 checksum_type
uint16 strong_hash_type
uint16 weak_hash_type
uint32 block_size
These fields define the parameters to the rsync algorithm. Allowed values are
currently all zero except for ``block_size``, which is usually the square root
of the file size, but implementations are free to use any algorithm they like
to arrive at the block size.
``checksum_type`` must be ``0`` which indicates using the XXH3-128 bit hash
to verify file integrity after transmission.
``strong_hash_type`` must be ``0`` which indicates using the XXH3-64 bit hash
to identify blocks.
``weak_hash_type`` must be ``0`` which indicates using the `rsync rolling
checksum hash <https://rsync.samba.org/tech_report/node3.html>`__ to identify
blocks, weakly.
After the header comes the list of block signatures. The number of blocks is
unknown allowing for streaming, the transfer protocol takes care of indicating
end-of-stream via an ``action=end_data`` packet. Each signature in the list is of the form:
.. code::
uint64 index
uint32 weak_hash
uint64 strong_hash
Here, ``index`` is the zero-based block number. ``weak_hash`` is the weak, but easy
to calculate hash of the block and strong hash is a stronger hash of the block
that is very unlikely to collide.
The algorithms used for these hashes are specified by the signature header
above. Given the ``block_size`` from the header and ``index`` the position of a
block in the file is: ``index * block_size``.
Once the sending side receives the signature, it calculates a *delta* based on
the actual file contents and transmits that delta to the receiving side. The delta
is of the form of a list of *operations*. An operation is a single byte
denoting the operation type followed by variable length data depending on the
type. The types of operations are:
``Block (type=0)``
Followed by an 8 byte ``uint64`` that is the block index. It means copy the
specified block from the existing file to the output, unmodified.
``Data (type=1)``
Followed by a 4 byte ``uint32`` that is the size of the payload and then the
payload itself. The payload must be written to the output.
``Hash (type=2)``
Followed by a 2 byte ``uint16`` specifying the size of the hash checksum and
then the checksum itself. The checksum of the output file must match this
checksum. The algorithm used to calculate the checksum is specified in the
signature header.
``BlockRange (type=3)``
Followed by an 8 byte ``uint64`` that is the starting block index and then
a 4 byte ``uint32`` (``N``) that is the number of additional blocks. Works just
like ``Block`` above, except that after copying the block an additional (``N``) more
blocks must be copied.
Compression
--------------