File transfer over the TTY
===============================

There are sometimes situations where the TTY is the only convenient pipe
between two connected systems, for example, nested SSH sessions, a serial
line, etc. In such scenarios, it is useful to be able to transfer files
over the TTY.

This protocol provides the ability to transfer regular files, directories and
links (both symbolic and hard) preserving most of their metadata. It can
optionally use compression and transmit only binary diffs to speed up
transfers. However, since all data is base64 encoded for transmission over the
TTY, this protocol will never be competitive with more direct file transfer
mechanisms.

Overall design
----------------

The basic design of this protocol is around transfer "sessions". Since
untrusted software should not be able to read from or write to another
machine's filesystem, a session must be approved by the user in the terminal
emulator before any actual data is transmitted, unless a :ref:`pre-shared
password is provided <bypass_auth>`.

There can be either send or receive sessions. In send sessions, files are sent
from the remote client to the terminal emulator, and vice versa for receive
sessions. Every session consists of sending metadata for the files first and
then sending the actual data. The session is a series of commands, every command
carrying the session id (which should be a random unique-ish identifier, to
avoid conflicts). The session is bi-directional, with commands going both to and
from the terminal emulator. Every command in a session also carries an
``action`` field that specifies what the command does. The remaining fields in
the command depend on the nature of the command.

Let's look at some simple examples of sessions to get a feel for the protocol.

Sending files to the computer running the terminal emulator
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The client starts by sending a start send command::

    → action=send id=someid

It then waits for a status message from the terminal either
allowing the transfer or refusing it. Until this message is received,
the client is not allowed to send any more commands for the session.
The terminal emulator should drop a session if it receives any commands
before sending an ``OK`` response. If the user accepts the transfer,
the terminal will send::

    ← action=status id=someid status=OK

Or if the transfer is refused::

    ← action=status id=someid status=EPERM:User refused the transfer

The client then sends one or more ``file`` commands with the metadata of the
files it wants to transfer::

    → action=file id=someid file_id=f1 name=/path/to/destination
    → action=file id=someid file_id=f2 name=/path/to/destination2 file_type=directory

The terminal responds with either ``OK`` for directories or ``STARTED`` for
files::

    ← action=status id=someid file_id=f1 status=STARTED
    ← action=status id=someid file_id=f2 status=OK

If there was an error with the file, for example, if the terminal does not have
permission to write to the specified location, it will instead respond with an
error, such as::

    ← action=status id=someid file_id=f1 status=EPERM:No permission

The client sends data for files using ``data`` commands. It does not need to
wait for the ``STARTED`` from the terminal for this; the terminal must discard
data for files that are not ``STARTED``. Data for a file is sent in individual
chunks of no larger than ``4096`` bytes. For example::

    → action=data id=someid file_id=f1 data=chunk of bytes
    → action=data id=someid file_id=f1 data=chunk of bytes
    ...
    → action=end_data id=someid file_id=f1 data=chunk of bytes

The sequence of data transmission for a file is ended with an ``end_data``
command. After each data packet is received, the terminal replies with
an acknowledgement of the form::

    ← action=status id=someid file_id=f1 status=PROGRESS size=bytes written

After ``end_data`` the terminal replies with::

    ← action=status id=someid file_id=f1 status=OK size=bytes written

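The chunking of file data into ``data`` commands followed by a final
``end_data`` command can be sketched as follows. This is only an illustration:
the function name ``data_commands`` is hypothetical, the commands are plain
dictionaries rather than escape codes, and it assumes that the ``4096`` byte
limit applies to the raw bytes before base64 encoding and that an empty file
is sent as a single empty ``end_data``, neither of which is stated explicitly
above.

```python
import base64
from typing import Iterator

CHUNK_SIZE = 4096  # assumption: the limit applies to raw bytes, before base64


def data_commands(session_id: str, file_id: str, payload: bytes) -> Iterator[dict]:
    """Yield one command dict per chunk; the last one uses action=end_data."""
    chunks = [payload[i:i + CHUNK_SIZE] for i in range(0, len(payload), CHUNK_SIZE)]
    if not chunks:
        chunks = [b""]  # assumption: an empty file is a single empty end_data
    for index, chunk in enumerate(chunks):
        is_last = index == len(chunks) - 1
        yield {
            "action": "end_data" if is_last else "data",
            "id": session_id,
            "file_id": file_id,
            "data": base64.standard_b64encode(chunk).decode("ascii"),
        }
```

A client would turn each dictionary into an escape code and write it to the
TTY, reading the ``PROGRESS`` and ``OK`` replies as they arrive.
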
If an error occurs while writing the data, the terminal replies with an error
code and ignores further commands about that file, for example::

    ← action=status id=someid file_id=f1 status=EIO:Failed to write to file

Once the client has finished sending as many files as it wants to, it ends
the session with::

    → action=finish id=someid

At this point the terminal commits the session, applying file metadata,
creating links, etc. If any errors occur it responds with an error message,
such as::

    ← action=status id=someid status=Some error occurred

Receiving files from the computer running terminal emulator
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The client starts by sending a start receive command::

    → action=receive id=someid size=num_of_paths

It then sends a list of ``num_of_paths`` paths it is interested in
receiving::

    → action=file id=someid file_id=f1 name=/some/path
    → action=file id=someid file_id=f2 name=/some/path2
    ...

The client must then wait for responses from the terminal emulator. It
is an error to send any more commands to the terminal until an ``OK``
response is received from the terminal. The terminal waits for the user to
accept the request. If accepted, it sends::

    ← action=status id=someid status=OK

If permission is denied it sends::

    ← action=status id=someid status=EPERM:User refused the transfer

The terminal then sends the metadata for all requested files. If any of them
are directories, it traverses the directories recursively, listing all files.
Note that symlinks must not be followed, but sent as symlinks::

    ← action=file id=someid file_id=f1 mtime=XXX permissions=XXX name=/absolute/path status=file_id1 size=size_in_bytes file_type=type parent=file_id of parent
    ← action=file id=someid file_id=f1 mtime=XXX permissions=XXX name=/absolute/path2 status=file_id2 size=size_in_bytes file_type=type parent=file_id of parent
    ...

Here the ``file_id`` field is set to the ``file_id`` value sent from the client
and the ``status`` field is set to the actual file id for each file. This is
because a file query sent from the client can result in multiple actual files if
it is a directory. The ``parent`` field is the actual ``file_id`` of the directory
containing this file and is set for entries that are generated from client
requests that match directories. This allows the client to build an unambiguous
picture of the file tree.

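As an illustration of how a client might use the ``file_id``, ``status`` and
``parent`` fields to reconstruct the tree, here is a minimal sketch. The
function ``build_tree`` and the representation of each listing entry as a
dictionary of already decoded fields are assumptions made for this example.

```python
from collections import defaultdict


def build_tree(entries):
    """Index listing entries by actual file id and group them by parent.

    Each entry is a dict of decoded fields from one ``action=file`` response:
    ``file_id`` (the id from the client's request), ``status`` (the actual
    file id), ``name`` and, optionally, ``parent``.
    """
    by_actual_id = {}
    children = defaultdict(list)  # actual id of a directory -> ids of its entries
    roots = defaultdict(list)     # request file_id -> entries without a parent
    for entry in entries:
        actual_id = entry["status"]
        by_actual_id[actual_id] = entry
        parent = entry.get("parent")
        if parent:
            children[parent].append(actual_id)
        else:
            roots[entry["file_id"]].append(actual_id)
    return by_actual_id, dict(children), dict(roots)
```

Walking ``roots`` and then ``children`` visits every listed file exactly once,
with directories always seen before their contents.
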
Once all the files are listed, the terminal sends an ``OK`` response that also
specifies the absolute path to the home directory for the user account running
the terminal::

    ← action=status id=someid status=OK name=/path/to/home

If an error occurs while listing any of the files asked for by the client,
the terminal will send an error response like::

    ← action=status id=someid file_id=f1 status=ENOENT: Does not exist

Here, ``file_id`` is the same as was sent by the client in its initial query.

Now, the client can send requests for file data using the paths sent by the
terminal emulator::

    → action=file id=someid file_id=f1 name=/some/path
    ...

The terminal emulator replies with the data for the files, as a sequence of
``data`` commands each with a chunk of data no larger than ``4096`` bytes,
for each file (the terminal emulator should send the data for
one file at a time)::

    ← action=data id=someid file_id=f1 data=chunk of bytes
    ...
    ← action=end_data id=someid file_id=f1 data=chunk of bytes

If any errors occur reading file data, the terminal emulator sends an error
message for the file, for example::

    ← action=status id=someid file_id=f1 status=EIO:Could not read

Once the client is done reading data for all the files it expects, it
terminates the session with::

    → action=finish id=someid

Canceling a session
----------------------

A client can decide to cancel a session at any time (for example if the user
presses :kbd:`ctrl+c`). To cancel a session it sends a ``cancel`` action to the
terminal emulator::

    → action=cancel id=someid

The terminal emulator drops the session and sends a cancel acknowledgement::

    ← action=status id=someid status=CANCELED

The client **must** wait for the ``CANCELED`` response from the emulator,
discarding any other responses until it is received. If the client quits
without waiting, the pending responses might end up being printed to the screen.

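The wait for the cancel acknowledgement can be sketched as a loop that throws
away everything else. The function ``drain_until_canceled`` and the
representation of responses as dictionaries of decoded fields are hypothetical;
a real client would read and decode escape codes from the TTY instead of
iterating over a list.

```python
def drain_until_canceled(responses, session_id):
    """Consume decoded responses until the CANCELED status for this session.

    ``responses`` is any iterable of dicts with decoded fields. Returns the
    number of responses discarded, or raises if the stream ends first.
    """
    discarded = 0
    for response in responses:
        if (
            response.get("id") == session_id
            and response.get("action") == "status"
            and response.get("status") == "CANCELED"
        ):
            return discarded
        discarded += 1  # anything else is dropped, as required above
    raise ConnectionError("stream ended before the cancel was acknowledged")
```
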
Quieting responses from the terminal
-------------------------------------

The above protocol includes lots of messages from the terminal acknowledging
receipt of data, granting permission, acknowledging cancel requests, etc.
For extremely simple clients like shell scripts, it might be useful to suppress
these responses, which can be done by adding the ``quiet`` key to the start
session command::

    → action=send id=someid quiet=1

The key can take the values ``1``, meaning suppress acknowledgement responses,
or ``2``, meaning suppress all responses including errors, so that only actual
data responses are sent. Note that in particular this means acknowledgement of
permission for the transfer to go ahead is suppressed, so this is typically
useful only with :ref:`bypass_auth`.

.. _file_metadata:

File metadata
-----------------

File metadata includes file paths, permissions and modification times. These are
somewhat tricky as different operating systems support different kinds of
metadata. This specification defines a common minimum set which should work
across most operating systems.

File paths
    File paths must be valid UTF-8 encoded POSIX paths (i.e. using the forward slash
    ``/`` as a separator). Linux systems allow non UTF-8 file paths; these
    are not supported. A leading ``~/`` means a path is relative to the
    ``HOME`` directory. All paths must be either absolute (i.e. with a leading
    ``/``) or relative to the ``HOME`` directory. Individual components of the
    path must be no longer than 255 UTF-8 bytes. Total path length must be no
    more than 4096 bytes. Paths from Windows systems must use the forward slash
    as the separator, and the first path component must be the drive letter with a
    colon. For example: :file:`C:\some\file.txt` is represented as
    :file:`/C:/some/file.txt`. For maximum portability, the following
    characters *should* be omitted from paths (however implementations are free
    to try to support them, returning errors for non-representable paths)::

        \ * : < > ? | /

File modification times
    Must be represented as the number of nanoseconds since the UNIX epoch. An
    individual file system may not store file metadata with this level of
    accuracy, in which case it should use the closest possible approximation.

File permissions
    Represented as a number with the usual UNIX read, write and execute bits.
    In addition, the sticky, set-group-id and set-user-id bits may be present.
    Implementations should make a best effort to preserve as many bits as
    possible. On Windows, there is only a read-only bit. When reading file
    metadata all the ``WRITE`` bits should be set if the read-only bit is clear
    and cleared if it is set. When writing files, the read-only bit should be
    set if the bit indicating write permission for the user is clear. The other
    UNIX bits must be ignored when writing. When reading, all the ``READ`` bits
    should always be set and all the ``EXECUTE`` bits should be set if the file is
    directly executable by the Windows operating system. There is no attempt to
    map Windows ACLs to permission bits.

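The path and permission rules above can be illustrated with a few small
helpers. This is a sketch under stated assumptions, not a reference
implementation: the names ``windows_to_wire_path``, ``check_wire_path`` and
``mode_from_windows`` are invented for this example, only drive-letter Windows
paths are handled, and how a caller determines the read-only and executable
state of a Windows file is left open.

```python
import stat
from pathlib import PureWindowsPath

MAX_COMPONENT_BYTES = 255
MAX_PATH_BYTES = 4096


def windows_to_wire_path(path: str) -> str:
    """Convert an absolute Windows path with a drive letter to the wire form."""
    pure = PureWindowsPath(path)
    if not pure.drive or not pure.root:
        raise ValueError("expected an absolute path with a drive letter")
    return "/" + pure.as_posix()


def check_wire_path(path: str) -> None:
    """Raise ValueError if the path violates the limits described above."""
    encoded = path.encode("utf-8")  # fails for strings that are not valid UTF-8
    if len(encoded) > MAX_PATH_BYTES:
        raise ValueError("path longer than 4096 bytes")
    if not (path.startswith("/") or path.startswith("~/")):
        raise ValueError("path must be absolute or start with ~/")
    for component in path.split("/"):
        if len(component.encode("utf-8")) > MAX_COMPONENT_BYTES:
            raise ValueError("path component longer than 255 bytes")


def mode_from_windows(read_only: bool, executable: bool) -> int:
    """Synthesize UNIX permission bits when reading metadata on Windows."""
    mode = stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH  # READ bits are always set
    if not read_only:
        mode |= stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH
    if executable:
        mode |= stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    return mode
```

On POSIX systems the modification time in nanoseconds is available directly as
``os.stat(path).st_mtime_ns``.
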
Symbolic and hard links
---------------------------

Symbolic and hard links can be preserved by this protocol.

.. note::
   In the following, when target paths of symlinks are sent as actual paths, they must be
   encoded in the same way as discussed in :ref:`file_metadata`. It is up to
   the receiving side to translate them into appropriate paths for the local
   operating system. This may not always be possible, in which case either the
   symlink should not be created or a broken symlink should be created.

Sending links to the terminal emulator
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When sending files to the terminal emulator, the file command has the form::

    → action=file id=someid file_id=f1 name=/path/to/link file_type=link
    → action=file id=someid file_id=f2 name=/path/to/symlink file_type=symlink

Then, when the client is sending data for the files, for hard links, the data
will be the ``file_id`` of the target file (assuming the target file is also
being transmitted, otherwise the hard link should be transmitted as a plain
file)::

    → action=end_data id=someid file_id=f1 data=target_file_id_encoded_as_utf8

For symbolic links, the data is a little more complex. If the symbolic link is
to a destination being transmitted, the data has one of the forms::

    → action=end_data id=someid file_id=f1 data=fid:target_file_id_encoded_as_utf8
    → action=end_data id=someid file_id=f1 data=fid_abs:target_file_id_encoded_as_utf8

The ``fid_abs`` form is used if the symlink uses an absolute path, ``fid`` if
it uses a relative path. If the symlink is to a destination that is not being
transmitted, then the prefix ``path:`` followed by the actual path in the
symlink is transmitted.

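The three forms of symlink data can be produced by a helper along these lines.
The function ``symlink_data`` is hypothetical, and treating a target as
absolute when it starts with ``/`` is an assumption that holds for paths
encoded as described in the file metadata section.

```python
from typing import Optional


def symlink_data(target: str, target_file_id: Optional[str]) -> bytes:
    """Build the payload of the end_data command for a symbolic link.

    ``target`` is the path stored in the symlink. ``target_file_id`` is the
    file_id of the target if it is transmitted in this session, else None.
    """
    if target_file_id is None:
        return ("path:" + target).encode("utf-8")
    prefix = "fid_abs:" if target.startswith("/") else "fid:"
    return (prefix + target_file_id).encode("utf-8")
```

The returned bytes are then base64 encoded like any other ``data`` value.
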
Receiving links from the terminal emulator
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When receiving files from the terminal emulator, link data is transmitted in
two parts. First, when the emulator sends the initial file listing to the
client, the ``file_type`` is set to the link type and the ``data`` field is set
to the ``file_id`` of the target file if the target file is included in the
listing. For example::

    ← action=file id=someid file_id=f1 status=file_id1 ...
    ← action=file id=someid file_id=f1 status=file_id2 file_type=symlink data=file_id1 ...

Here the rest of the metadata has been left out for clarity. Notice that the
second file is a symlink whose ``data`` field is set to the file id of the first
file (the value of the ``status`` field of the first file). The same technique
is used for hard links.

The client should not request data for hard links, instead creating them
directly after transmission is complete. For symbolic links the terminal
must send the actual symbolic link target as a UTF-8 encoded path in the
data field. The client can use this path either as-is (when the target is not
a transmitted file) or to decide whether to create the symlink with a relative
or absolute path when the target is a transmitted file.

Transmitting binary deltas
-----------------------------

Repeated transfer of large files that have only changed a little between
the receiving and sending side can be sped up significantly by transmitting
binary deltas of only the changed portions. This protocol has built-in support
for doing that. This support uses the `rsync algorithm
<https://github.com/librsync/librsync>`__. In this algorithm, first the
receiving side sends a file signature that contains hashes of blocks
in the file. Then the sending side sends only those blocks that have changed.
The receiving side applies these deltas to the file to update it until it
matches the file on the sending side.

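To make the signature/delta/patch round trip concrete, here is a deliberately
simplified sketch. It is **not** the rsync algorithm proper and not the
librsync data format: real rsync uses a rolling checksum so that matching
blocks can be found at any offset, whereas this toy version only compares
blocks at aligned positions. The block size, the use of SHA-256 and all
function names are assumptions chosen for brevity.

```python
import hashlib

BLOCK = 4  # tiny block size so the example is easy to follow


def signature(old: bytes) -> dict:
    """Receiver side: map the hash of each block of the old file to its index."""
    return {
        hashlib.sha256(old[i:i + BLOCK]).digest(): i // BLOCK
        for i in range(0, len(old), BLOCK)
    }


def delta(new: bytes, sig: dict) -> list:
    """Sender side: reference blocks the receiver already has, else send bytes."""
    ops = []
    for i in range(0, len(new), BLOCK):
        block = new[i:i + BLOCK]
        index = sig.get(hashlib.sha256(block).digest())
        ops.append(("copy", index) if index is not None else ("literal", block))
    return ops


def patch(old: bytes, ops: list) -> bytes:
    """Receiver side: rebuild the new file from the old file and the delta."""
    out = bytearray()
    for kind, value in ops:
        out += old[value * BLOCK:(value + 1) * BLOCK] if kind == "copy" else value
    return bytes(out)
```

Only the ``literal`` operations carry file content, which is why a mostly
unchanged file produces a small delta.
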
The modification to the basic protocol consists of setting the
``transmission_type`` key to ``rsync`` when requesting a file. This triggers
transmission of signatures and deltas instead of file data. The details are
different for sending and receiving.

Sending to the terminal emulator
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When sending the metadata of the file it wants to transfer, the client adds the
``transmission_type`` key::

    → action=file id=someid file_id=f1 name=/path/to/destination transmission_type=rsync

The ``STARTED`` response from the terminal will have ``transmission_type`` set
to ``rsync`` if the file exists and the terminal is able to send signature data::

    ← action=status id=someid file_id=f1 status=STARTED transmission_type=rsync

The terminal then transmits the signature using ``data`` commands::

    ← action=data id=someid file_id=f1 data=...
    ...
    ← action=end_data id=someid file_id=f1 data=...

Once the client receives and processes the full signature, it transmits the
file delta to the terminal as ``data`` commands::

    → action=data id=someid file_id=f1 data=...
    → action=data id=someid file_id=f1 data=...
    ...
    → action=end_data id=someid file_id=f1 data=...

The terminal then uses this delta to update the file.

Receiving from the terminal emulator
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When the client requests file data from the terminal emulator, it can
add the ``transmission_type=rsync`` key to indicate it will be sending
a signature for that file::

    → action=file id=someid file_id=f1 name=/some/path transmission_type=rsync

The client then sends the signature using ``data`` commands::

    → action=data id=someid file_id=f1 data=...
    ...
    → action=end_data id=someid file_id=f1 data=...

After receiving the signature the terminal replies with the delta as a series
of ``data`` commands::

    ← action=data id=someid file_id=f1 data=...
    ...
    ← action=end_data id=someid file_id=f1 data=...

The client then uses this delta to update the file.

The format of signatures and deltas
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

These come from `librsync <https://github.com/librsync/librsync>`__. If this
specification gains wider adoption, these formats should be documented here.

Compression
--------------

Individual files can be transmitted compressed if needed.
Currently, only :rfc:`1950` ZLIB based deflate compression is
supported, which is specified using the ``compression=zlib`` key when
requesting a file. For example, when sending files to the terminal emulator,
the ``compression`` key can be specified along with the file metadata::

    → action=file id=someid file_id=f1 name=/path/to/destination compression=zlib

Similarly, when receiving files from the terminal emulator, the final file
command that the client sends to the terminal requesting the start of the
transfer of data for the file can include the ``compression`` key::

    → action=file id=someid file_id=f1 name=/some/path compression=zlib

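A sender might produce compressed chunks roughly as follows. The sketch
assumes that each file is compressed as a single zlib stream and that the
chunk size limit applies to the compressed bytes, neither of which is spelled
out above; the function name ``compressed_chunks`` is hypothetical. Python's
``zlib`` module emits the :rfc:`1950` format by default.

```python
import zlib
from typing import Iterable, Iterator


def compressed_chunks(blocks: Iterable[bytes], chunk_size: int = 4096) -> Iterator[bytes]:
    """Compress a file as one zlib stream and re-split it into bounded chunks."""
    compressor = zlib.compressobj()
    pending = bytearray()
    for block in blocks:
        pending += compressor.compress(block)
        while len(pending) >= chunk_size:
            yield bytes(pending[:chunk_size])
            del pending[:chunk_size]
    pending += compressor.flush()  # emit whatever the compressor still buffers
    while pending:
        yield bytes(pending[:chunk_size])
        del pending[:chunk_size]
```

The receiving side can feed the chunks, in order, to a single
``zlib.decompressobj()`` to recover the file.
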
.. _bypass_auth:

Bypassing explicit user authorization
------------------------------------------

In order to bypass the requirement of interactive user authorization,
this protocol has the ability to use a pre-shared secret (password).
When initiating a transfer session the client sends a hash of the password and
the session id::

    → action=send id=someid bypass=sha256:hash_value

For example, suppose that the session id is ``mysession`` and the
shared secret is ``mypassword``. Then the value of the ``bypass``
key above is ``sha256:SHA256("mysession" + ";" + "mypassword")``, which
is::

    → action=send id=mysession bypass=sha256:192bd215915eeaa8c2b2a4c0f8f851826497d12b30036d8b5b1b4fc4411caf2c

The value of ``bypass`` is of the form ``hash_function_name : hash_value``
(without spaces). Currently, only the SHA256 hash function is supported.

.. warning::
   Hashing does not effectively hide the value of the password. So this
   functionality should only be used in secure/trusted contexts. While there
   exist hash functions harder to compute than SHA256, they are unsuitable as
   they will introduce a lot of latency to starting a session and in any case
   there is no mathematical proof that **any** hash function is not brute-forceable.

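Computing the value of the ``bypass`` key takes only a few lines. In this
sketch the function name ``bypass_value`` is hypothetical, and encoding the
session id and password as UTF-8 and writing the digest as lowercase
hexadecimal are assumptions based on the example above.

```python
import hashlib


def bypass_value(session_id: str, password: str) -> str:
    """Return the bypass value: sha256: followed by SHA256(id + ";" + password)."""
    material = f"{session_id};{password}".encode("utf-8")  # assumption: UTF-8
    return "sha256:" + hashlib.sha256(material).hexdigest()
```

For the example above, the call would be
``bypass_value("mysession", "mypassword")``.
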
Encoding of transfer commands as escape codes
------------------------------------------------

Transfer commands are encoded as ``OSC`` escape codes of the form::

    <OSC> 5113 ; key=value ; key=value ... <ST>

Here ``OSC`` is the bytes ``0x1b 0x5d`` and ``ST`` is the bytes
``0x1b 0x5c``. Keys are words containing only the characters ``[a-zA-Z0-9_]``
and ``value`` is arbitrary data, whose encoding is dependent on the value of
``key``. Unknown keys **must** be ignored when decoding a command.
The number ``5113`` is a constant and is unused by any known OSC codes. It is
the numeralization of the word ``file``.

.. table:: The keys and value types for this protocol
   :align: left

   ================= ======== ============== =======================================================================
   Key               Key name Value type     Notes
   ================= ======== ============== =======================================================================
   action            ac       enum           send, file, data, end_data, receive, cancel, status, finish
   compression       zip      enum           none, zlib
   file_type         ft       enum           regular, directory, symlink, link
   transmission_type tt       enum           simple, rsync
   id                id       safe_string    A unique-ish value, to avoid collisions
   file_id           fid      safe_string    Must be unique per file in a session
   bypass            pw       safe_string    hash of the bypass password and the session id
   quiet             q        integer        0 - verbose, 1 - only errors, 2 - totally silent
   mtime             mod      integer        the modification time of file in nanoseconds since the UNIX epoch
   permissions       prm      integer        the UNIX file permissions bits
   size              sz       integer        size in bytes
   name              n        base64_string  The path to a file
   status            st       base64_string  Status messages
   parent            pr       safe_string    The file id of the parent directory
   data              d        base64_bytes   Binary data
   ================= ======== ============== =======================================================================

The ``Key name`` is the actual serialized name of the key sent in the escape
code. So for example, ``permissions=123`` is serialized as ``prm=123``. This
is done to reduce overhead.

The value types are:

enum
    One from a permitted set of values, for example::

        ac=file

safe_string
    A string consisting only of characters from the set ``[0-9a-zA-Z_:./@-]``.
    Note that the semi-colon is missing from this set.

integer
    A base-10 number composed of the characters ``[0-9]`` with a possible
    leading ``-`` sign

base64_string
    A base64 encoded UTF-8 string using the standard base64 encoding

base64_bytes
    Binary data encoded using the standard base64 encoding

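A decoder for the ``key=value`` part of an escape code, applying the value
types listed above, might look like this. It is a minimal sketch: the function
name ``decode_payload`` is invented, the payload is assumed to have already
been stripped of the ``OSC``, the ``5113`` prefix and the ``ST``, and enum
values are only checked against the safe character set rather than against
their permitted values.

```python
import base64
import re

# Serialized key name -> (full key, value type), copied from the table above.
KEYS = {
    "ac": ("action", "enum"), "zip": ("compression", "enum"),
    "ft": ("file_type", "enum"), "tt": ("transmission_type", "enum"),
    "id": ("id", "safe_string"), "fid": ("file_id", "safe_string"),
    "pw": ("bypass", "safe_string"), "pr": ("parent", "safe_string"),
    "q": ("quiet", "integer"), "mod": ("mtime", "integer"),
    "prm": ("permissions", "integer"), "sz": ("size", "integer"),
    "n": ("name", "base64_string"), "st": ("status", "base64_string"),
    "d": ("data", "base64_bytes"),
}
SAFE_STRING = re.compile(r"[0-9a-zA-Z_:./@-]+")
INTEGER = re.compile(r"-?[0-9]+")


def decode_payload(payload: str) -> dict:
    """Decode ``key=value ; key=value`` into a dict keyed by full key names."""
    command = {}
    for field in payload.split(";"):
        key, sep, value = field.strip().partition("=")
        if not sep or key not in KEYS:
            continue  # unknown keys must be ignored
        name, kind = KEYS[key]
        if kind == "integer":
            if not INTEGER.fullmatch(value):
                raise ValueError(f"bad integer for {name}")
            command[name] = int(value)
        elif kind == "base64_string":
            command[name] = base64.standard_b64decode(value).decode("utf-8")
        elif kind == "base64_bytes":
            command[name] = base64.standard_b64decode(value)
        else:  # enum and safe_string: only the character set is checked here
            if not SAFE_STRING.fullmatch(value):
                raise ValueError(f"unsafe value for {name}")
            command[name] = value
    return command
```

Because the semi-colon is not part of the safe character set and never occurs
in base64 output, splitting the payload on ``;`` is unambiguous.
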
An example of serializing an escape code is shown below::

    action=send id=test name=somefile size=3 data=01 02 03

becomes::

    <OSC> 5113 ; ac=send ; id=test ; n=c29tZWZpbGU= ; sz=3 ; d=AQID <ST>

Here ``c29tZWZpbGU=`` is the base64 encoded form of ``somefile`` and ``AQID`` is
the base64 encoded form of the bytes ``0x01 0x02 0x03``. The spaces in the encoded
form are present for clarity and should be ignored.
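
The serialization shown above can be reproduced with a short function. This is
a sketch rather than a complete encoder: ``encode_command`` is a hypothetical
name, only the keys needed for this example are mapped, and values are not
validated against the safe character set.

```python
import base64

OSC, ST = b"\x1b]", b"\x1b\\"  # 0x1b 0x5d and 0x1b 0x5c
# Full key -> serialized key name (only the keys used in this example).
SHORT_NAMES = {"action": "ac", "id": "id", "name": "n", "size": "sz", "data": "d"}
BASE64_STRING_KEYS = {"name", "status"}


def encode_command(**fields) -> bytes:
    """Serialize keyword arguments into a single OSC 5113 escape code."""
    parts = []
    for key, value in fields.items():
        if isinstance(value, bytes):  # base64_bytes
            text = base64.standard_b64encode(value).decode("ascii")
        elif key in BASE64_STRING_KEYS:  # base64_string
            text = base64.standard_b64encode(str(value).encode("utf-8")).decode("ascii")
        else:  # enum, safe_string and integer are written as-is
            text = str(value)
        parts.append(f"{SHORT_NAMES[key]}={text}")
    return OSC + ("5113;" + ";".join(parts)).encode("ascii") + ST
```

Calling ``encode_command(action="send", id="test", name="somefile", size=3,
data=b"\x01\x02\x03")`` yields the escape code from the example, without the
spaces.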