mirror of
https://github.com/kovidgoyal/kitty
synced 2026-06-08 14:18:26 +02:00
Document the rsync wire formats
This commit is contained in:
@@ -342,7 +342,7 @@ Repeated transfer of large files that have only changed a little between
|
|||||||
the receiving and sending side can be sped up significantly by transmitting
|
the receiving and sending side can be sped up significantly by transmitting
|
||||||
binary deltas of only the changed portions. This protocol has built-in support
|
binary deltas of only the changed portions. This protocol has built-in support
|
||||||
for doing that. This support uses the `rsync algorithm
|
for doing that. This support uses the `rsync algorithm
|
||||||
<https://github.com/librsync/librsync>`__. In this algorithm first the
|
<https://rsync.samba.org/tech_report/tech_report.html>`__. In this algorithm, first the
|
||||||
receiving side sends a file signature that contains hashes of blocks
|
receiving side sends a file signature that contains hashes of blocks
|
||||||
in the file. Then the sending side sends only those blocks that have changed.
|
in the file. Then the sending side sends only those blocks that have changed.
|
||||||
The receiving side applies these deltas to the file to update it till it matches
|
The receiving side applies these deltas to the file to update it till it matches
|
||||||
@@ -409,8 +409,80 @@ The client then uses this delta to update the file.
|
|||||||
The format of signatures and deltas
|
The format of signatures and deltas
|
||||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
These come from `librsync <https://github.com/librsync/librsync>`__. If this
|
In what follows, all integers must be encoded in little-endian format,
|
||||||
specification gains wider adoption, these formats should be documented here.
|
regardless of the architecture of the machines involved. The XXH3 hash family
|
||||||
|
refers to `the xxHash algorithm
|
||||||
|
<https://github.com/Cyan4973/xxHash/blob/dev/doc/xxhash_spec.md>`__.
|
||||||
|
|
||||||
|
A signature first has a 12 byte header of the form:
|
||||||
|
|
||||||
|
.. code::
|
||||||
|
|
||||||
|
uint16 version
|
||||||
|
uint16 checksum_type
|
||||||
|
uint16 strong_hash_type
|
||||||
|
uint16 weak_hash_type
|
||||||
|
uint32 block_size
|
||||||
|
|
||||||
|
These fields define the parameters to the rsync algorithm. Allowed values are
|
||||||
|
currently all zero except for ``block_size``, which is usually the square root
|
||||||
|
of the file size, but implementations are free to use any algorithm they like
|
||||||
|
to arrive at the block size.
|
||||||
|
|
||||||
|
``checksum_type`` must be ``0`` which indicates using the XXH3-128 bit hash
|
||||||
|
to verify file integrity after transmission.
|
||||||
|
|
||||||
|
``strong_hash_type`` must be ``0`` which indicates using the XXH3-64 bit hash
|
||||||
|
to identify blocks.
|
||||||
|
|
||||||
|
``weak_hash_type`` must be ``0`` which indicates using the `rsync rolling
|
||||||
|
checksum hash <https://rsync.samba.org/tech_report/node3.html>`__ to identify
|
||||||
|
blocks, weakly.
|
||||||
|
|
||||||
|
After the header comes the list of block signatures. The number of blocks is
|
||||||
|
unknown allowing for streaming, the transfer protocol takes care of indicating
|
||||||
|
end-of-stream via an ``action=end_data`` packet. Each signature in the list is of the form:
|
||||||
|
|
||||||
|
.. code::
|
||||||
|
|
||||||
|
uint64 index
|
||||||
|
uint32 weak_hash
|
||||||
|
uint64 strong_hash
|
||||||
|
|
||||||
|
Here, ``index`` is the zero-based block number. ``weak_hash`` is the weak, but easy
|
||||||
|
to calculate hash of the block and strong hash is a stronger hash of the block
|
||||||
|
that is very unlikely to collide.
|
||||||
|
|
||||||
|
The algorithms used for these hashes are specified by the signature header
|
||||||
|
above. Given the ``block_size`` from the header and ``index`` the position of a
|
||||||
|
block in the file is: ``index * block_size``.
|
||||||
|
|
||||||
|
Once the sending side receives the signature, it calculates a *delta* based on
|
||||||
|
the actual file contents and transmits that delta to the receiving side. The delta
|
||||||
|
is of the form of a list of *operations*. An operation is a single byte
|
||||||
|
denoting the operation type followed by variable length data depending on the
|
||||||
|
type. The types of operations are:
|
||||||
|
|
||||||
|
``Block (type=0)``
|
||||||
|
Followed by an 8 byte ``uint64`` that is the block index. It means copy the
|
||||||
|
specified block from the existing file to the output, unmodified.
|
||||||
|
|
||||||
|
``Data (type=1)``
|
||||||
|
Followed by a 4 byte ``uint32`` that is the size of the payload and then the
|
||||||
|
payload itself. The payload must be written to the output.
|
||||||
|
|
||||||
|
``Hash (type=2)``
|
||||||
|
Followed by a 2 byte ``uint16`` specifying the size of the hash checksum and
|
||||||
|
then the checksum itself. The checksum of the output file must match this
|
||||||
|
checksum. The algorithm used to calculate the checksum is specified in the
|
||||||
|
signature header.
|
||||||
|
|
||||||
|
``BlockRange (type=3)``
|
||||||
|
Followed by an 8 byte ``uint64`` that is the starting block index and then
|
||||||
|
a 4 byte ``uint32`` (``N``) that is the number of additional blocks. Works just
|
||||||
|
like ``Block`` above, except that after copying the block an additional (``N``) more
|
||||||
|
blocks must be copied.
|
||||||
|
|
||||||
|
|
||||||
Compression
|
Compression
|
||||||
--------------
|
--------------
|
||||||
|
|||||||
Reference in New Issue
Block a user