Chapter 9. The Guacamole protocol

This chapter is an overview of the Guacamole protocol, describing its design and general use. While a few instructions and their syntax will be described here, this is not an exhaustive list of all available instructions. The intent is only to list the general types and usage. If you are looking for the syntax or purpose of a specific instruction, consult the protocol reference included with the appendices.

Design

The Guacamole protocol consists of instructions. Each instruction is a comma-delimited list followed by a terminating semicolon, where the first element of the list is the instruction opcode, and all following elements are the arguments for that instruction:

OPCODE,ARG1,ARG2,ARG3,...;

Each element of the list has a non-negative decimal integer length prefix, separated from the value of the element by a period. This length denotes the number of Unicode characters in the value of the element, which is encoded in UTF-8. The length counts characters, not bytes, and an empty value has a length of zero:

LENGTH.VALUE

Any number of complete instructions make up a message which is sent from client to server or from server to client. Client to server instructions are generally control instructions (for connecting or disconnecting) and events (mouse and keyboard). Server to client instructions are generally drawing instructions (caching, clipping, drawing images), using the client as a remote display.

For example, a complete and valid instruction for setting the display size to 1024x768 would be:

4.size,1.0,4.1024,3.768;

Here, the instruction would be decoded into four elements: "size", the opcode of the size instruction, "0", the index of the default layer, "1024", the desired width in pixels, and "768", the desired height in pixels.
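The encoding described above can be sketched in a few lines of Python. This is only an illustration of the format, not the code used by Guacamole itself; the function names encode_element, encode_instruction, and decode_instruction are hypothetical. Python's len() counts Unicode code points, which is the unit assumed here for the length prefix.

```python
def encode_element(value: str) -> str:
    """Prefix a value with its length in Unicode characters (code points)."""
    return f"{len(value)}.{value}"


def encode_instruction(opcode: str, *args: str) -> str:
    """Join the length-prefixed elements with commas and terminate with ';'."""
    return ",".join(encode_element(e) for e in (opcode, *args)) + ";"


def decode_instruction(text: str) -> list[str]:
    """Decode exactly one complete instruction into its list of elements."""
    elements = []
    pos = 0
    while True:
        dot = text.index(".", pos)           # end of the length prefix
        length = int(text[pos:dot])
        start = dot + 1
        elements.append(text[start:start + length])
        terminator = text[start + length]    # must be ',' or ';'
        pos = start + length + 1
        if terminator == ";":
            break
        if terminator != ",":
            raise ValueError(f"unexpected character {terminator!r}")
    if pos != len(text):
        raise ValueError("trailing data after instruction")
    return elements


encoded = encode_instruction("size", "0", "1024", "768")
print(encoded)                      # 4.size,1.0,4.1024,3.768;
print(decode_instruction(encoded))  # ['size', '0', '1024', '768']
```

Because the decoder relies on the length prefix rather than searching for delimiters, a value may itself contain commas, periods, or semicolons without any escaping.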

The structure of the Guacamole protocol is important as it allows the protocol to be streamed while also being easily parsable by JavaScript. JavaScript does have native support for conceptually-similar structures like XML or JSON, but neither of those formats is natively supported in a way that can be streamed; JavaScript requires the entirety of the XML or JSON message to be available at the time of decoding. The Guacamole protocol, on the other hand, can be parsed as it is received, and the presence of length prefixes within each instruction element means that the parser can quickly skip around from instruction to instruction without having to iterate over every character.
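To illustrate the streaming property, here is a minimal sketch of an incremental parser in Python. It is an assumption-laden illustration rather than the parser shipped with Guacamole: the class name StreamingParser and its feed method are hypothetical. It accepts text in arbitrarily split chunks and uses each length prefix to jump straight to the end of the element.

```python
class StreamingParser:
    """Incrementally decodes instructions from arbitrarily split chunks."""

    def __init__(self) -> None:
        self._buffer = ""                  # text received but not yet parsed
        self._elements: list[str] = []     # elements of the current instruction

    def feed(self, chunk: str) -> list[list[str]]:
        """Add received text; return every instruction completed by it."""
        self._buffer += chunk
        completed = []
        pos = 0
        while True:
            dot = self._buffer.find(".", pos)
            if dot == -1:
                break                      # length prefix not fully received
            length = int(self._buffer[pos:dot])
            end = dot + 1 + length         # index of the expected terminator
            if end >= len(self._buffer):
                break                      # value or terminator not yet received
            self._elements.append(self._buffer[dot + 1:end])
            terminator = self._buffer[end]
            if terminator == ";":
                completed.append(self._elements)
                self._elements = []
            elif terminator != ",":
                raise ValueError(f"unexpected character {terminator!r}")
            pos = end + 1
        self._buffer = self._buffer[pos:]  # keep only the unparsed remainder
        return completed


parser = StreamingParser()
print(parser.feed("4.size,1.0,4.10"))     # []
print(parser.feed("24,3.768;6.sel"))      # [['size', '0', '1024', '768']]
print(parser.feed("ect,3.vnc;"))          # [['select', 'vnc']]
```

Note that the parser never scans the characters of a value; it only reads the digits of the prefix and then skips ahead by the stated length.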

Handshake phase

The handshake phase is the phase of the protocol entered immediately upon connection. It begins with a "select" instruction sent by the client which tells the server which protocol will be loaded:

6.select,3.vnc;

After receiving the "select" instruction, the server will load the associated client support and respond with a list of accepted parameter names using an "args" instruction:

4.args,8.hostname,4.port,8.password,13.swap-red-blue,9.read-only;

After receiving the list of arguments, the client is required to respond with the list of supported audio and video mimetypes, the optimal display size and resolution, and the values for all arguments available, even if blank. If any of these requirements are left out, the connection will close:

4.size,4.1024,3.768,2.96;
5.audio,9.audio/ogg;
5.video;
7.connect,9.localhost,4.5900,0.,0.,0.;

For clarity, we've put each instruction on its own line, but in the real protocol, no newlines exist between instructions. In fact, if there is anything after an instruction other than the start of a new instruction, the connection is closed.

Here, the client is specifying that the optimal display size is 1024x768 at 96 DPI and it supports Ogg Vorbis audio, but no video. It wants to connect to localhost at port 5900, and is leaving the three other parameters blank.

Once these instructions have been sent by the client, the actual interactive phase begins, and drawing and event instructions pass back and forth until the connection is closed.
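The client side of the handshake above can be sketched as follows. This is a simplified illustration, not the actual client implementation: the helper names encode, decode, and build_handshake_reply are hypothetical, and the decoder assumes it is given one well-formed instruction. The key point is that a value is sent for every name listed by "args", in the same order, with blanks for anything not configured.

```python
def encode(opcode: str, *args: str) -> str:
    """Encode one instruction as length-prefixed elements."""
    return ",".join(f"{len(e)}.{e}" for e in (opcode, *args)) + ";"


def decode(text: str) -> list[str]:
    """Decode one well-formed instruction (no validation of terminators)."""
    elements, pos = [], 0
    while pos < len(text):
        dot = text.index(".", pos)
        end = dot + 1 + int(text[pos:dot])
        elements.append(text[dot + 1:end])
        pos = end + 1                      # skip the ',' or ';'
    return elements


def build_handshake_reply(args_instruction: str, settings: dict[str, str],
                          width: int, height: int, dpi: int,
                          audio: tuple[str, ...] = (),
                          video: tuple[str, ...] = ()) -> str:
    """Build the size, audio, video, and connect instructions."""
    elements = decode(args_instruction)
    if elements[0] != "args":
        raise ValueError("expected an args instruction")
    # One value per accepted parameter name, blank if not configured.
    values = [settings.get(name, "") for name in elements[1:]]
    return "".join([
        encode("size", str(width), str(height), str(dpi)),
        encode("audio", *audio),
        encode("video", *video),
        encode("connect", *values),
    ])


reply = build_handshake_reply(
    "4.args,8.hostname,4.port,8.password,13.swap-red-blue,9.read-only;",
    {"hostname": "localhost", "port": "5900"},
    1024, 768, 96, audio=("audio/ogg",))
print(reply)
```

The printed reply is the same sequence of four instructions shown in the example above, without the newlines.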

Nesting and interleaving

The Guacamole protocol can be nested within itself, such that long instructions or independent streams of multiple instructions need not block each other; they can be multiplexed into the same stream. Nesting is accomplished with the "nest" instruction.

A nest instruction has only two parameters: an arbitrary integer index denoting what stream the data is associated with, and the instruction data itself. The integer index is important as it defines how the instruction will be reassembled. The data from nest instructions with the same stream index is reassembled by the client in the order received, and instructions within that data are executed immediately once completed.

This is particularly important when data needs to be trickled to the client rather than sent as a single atomic instruction.
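The reassembly described above might look like the following sketch. It assumes that decoded instructions arrive as lists of strings and that a nest instruction carries exactly the two parameters described above (stream index, then data); the names parse_available and NestDemultiplexer are hypothetical and are not part of any Guacamole API.

```python
from collections import defaultdict


def parse_available(buffer: str) -> tuple[list[list[str]], str]:
    """Decode every complete instruction in buffer; return them plus leftovers."""
    instructions, elements = [], []
    pos = consumed = 0
    while True:
        dot = buffer.find(".", pos)
        if dot == -1:
            break                          # length prefix incomplete
        end = dot + 1 + int(buffer[pos:dot])
        if end >= len(buffer):
            break                          # value or terminator incomplete
        if buffer[end] not in ",;":
            raise ValueError(f"unexpected character {buffer[end]!r}")
        elements.append(buffer[dot + 1:end])
        pos = end + 1
        if buffer[end] == ";":
            instructions.append(elements)
            elements = []
            consumed = pos                 # everything before here is done
    return instructions, buffer[consumed:]


class NestDemultiplexer:
    """Reassembles nested data per stream index, in the order received."""

    def __init__(self) -> None:
        self._pending: dict[str, str] = defaultdict(str)

    def handle(self, instruction: list[str]) -> list[list[str]]:
        """Return the instructions that are ready to execute."""
        if instruction[0] != "nest":
            return [instruction]
        index, data = instruction[1], instruction[2]
        completed, self._pending[index] = parse_available(
            self._pending[index] + data)
        return completed


demux = NestDemultiplexer()
print(demux.handle(["nest", "0", "4.size,1.0,4.1"]))  # []
print(demux.handle(["nest", "1", "5.video;"]))        # [['video']]
print(demux.handle(["nest", "0", "024,3.768;"]))      # [['size', '0', '1024', '768']]
```

Here the instruction on stream 1 completes and can be executed while the instruction on stream 0 is still only partially received.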

Drawing

Compositing

The Guacamole protocol provides compositing operations through the use of "channel masks". The term "channel mask" describes the mechanism used while designing the protocol to conceptualize and fully enumerate all possible compositing operations. It is based on four different sources of image data: source image data where the destination is opaque, source image data where the destination is transparent, destination image data where the source is opaque, and destination image data where the source is transparent. Assigning a binary value to each of these "channels" creates a unique integer ID for every possible compositing operation, sixteen in total, and these operations parallel the operations described by Porter and Duff in their paper on compositing digital images. As the HTML5 canvas tag also uses Porter/Duff to describe its compositing operations (as do other graphical APIs), the Guacamole protocol is conveniently similar to the compositing support already present in web browsers, though some operations are not yet supported there. The following operations are all implemented and known to work correctly in all browsers:

B out A (0x02)

Clears the destination where the source is opaque, but otherwise draws nothing. This is useful for masking.

A atop B (0x06)

Fills with the source where the destination is opaque only.

A xor B (0x0A)

As with logical XOR. Note that this is a compositing operation, not a bitwise operation. It draws the source where the destination is transparent, and draws the destination where the source is transparent.

B over A (0x0B)

What you would typically expect when drawing, but reversed. The source appears only where the destination is transparent, as if you were attempting to draw the destination over the source, rather than the source over the destination.

A over B (0x0E)

The most common and sensible compositing operation, this draws the source everywhere, but includes the destination where the source is transparent.

A + B (0x0F)

Simply adds the components of the source image to the destination image, capping the result at pure white.

The following operations are all implemented, but may work incorrectly in WebKit browsers, which always include the destination image where the source is transparent:

B in A (0x01)

Draws the destination only where the source is opaque, clearing anywhere the source or destination are transparent.

A in B (0x04)

Draws the source only where the destination is opaque, clearing anywhere the source or destination are transparent.

A out B (0x08)

Draws the source only where the destination is transparent, clearing anywhere the destination is opaque.

B atop A (0x09)

Fills with the destination where the source is opaque only.

A (0x0C)

Fills with the source, ignoring the destination entirely.

The following operations are defined, but not implemented, and do not exist as operations within the HTML5 canvas:

Clear (0x00)

Clears all existing image data in the destination.

B (0x03)

Does nothing.

A xnor B (0x05)

Adds the source to the destination where both the destination and the source are opaque, clearing anywhere the source or destination are transparent. This is similar to A + B except the aspect of transparency is also additive.

(A + B) atop B (0x07)

Adds the source to the destination where the destination is opaque, preserving the destination otherwise.

(A + B) atop A (0x0D)

Adds the destination to the source where the source is opaque, copying the source otherwise.
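The way each operation value is built from its channels can be sketched as below. The bit assigned to each channel here is inferred from the operation values listed above (for example, A over B is 0x0E and B out A is 0x02) rather than quoted from the protocol reference, and the names CHANNELS and describe_mask are hypothetical.

```python
# Bit assigned to each channel, inferred from the listed operation values.
CHANNELS = {
    0x08: "source where destination is transparent",
    0x04: "source where destination is opaque",
    0x02: "destination where source is transparent",
    0x01: "destination where source is opaque",
}


def describe_mask(mask: int) -> list[str]:
    """List the image data sources that a channel mask keeps."""
    if not 0 <= mask <= 0x0F:
        raise ValueError("channel masks are 4-bit values")
    return [name for bit, name in CHANNELS.items() if mask & bit]


# "A over B": the source everywhere, plus the destination where the
# source is transparent.
a_over_b = 0x08 | 0x04 | 0x02
print(hex(a_over_b))           # 0xe
for line in describe_mask(0x0A):   # "A xor B"
    print(line)
```

Read this way, each description in the lists above is simply the set of channels whose bit is set in the operation's value.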

Image data

The Guacamole protocol, like many remote desktop protocols, provides a method of sending an arbitrary rectangle of image data and placing it either within a buffer or in a visible rectangle of the screen. Raw image data in the Guacamole protocol is sent as PNG data using the "png" instruction, and thus benefits from the same level of image compression and color representation as PNG itself. Image updates sent in this way can be RGB or RGBA (alpha transparency) and are automatically palettized if sent using libguac.

Image data in the Guacamole protocol is sent base64-encoded, as the Guacamole protocol is entirely text-based. This works out well, because all browsers have native support for base64, and are required to at least support PNG, thus the Guacamole "png" instruction is one of the more efficient ways to stream image data to a browser.
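As a rough illustration of the base64 step only, the sketch below turns binary data into a length-prefixed protocol element. The function name encode_image_element is hypothetical, and the other arguments of the "png" instruction are deliberately left out; see the protocol reference for those. The eight bytes used as sample input are the standard PNG file signature, not a complete image.

```python
import base64


def encode_image_element(png_bytes: bytes) -> str:
    """Base64-encode binary image data and add the length prefix."""
    data = base64.b64encode(png_bytes).decode("ascii")
    return f"{len(data)}.{data}"


# The 8-byte signature that begins every PNG file (not a full image).
png_signature = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])

element = encode_image_element(png_signature)
print(element)  # 12.iVBORw0KGgo=
```

Base64 represents every three bytes of input as four characters, so image data grows by roughly one third when sent this way.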

Each chunk of image data can be sent to any specified rectangle within a layer or buffer. Sending the data to a layer means that the image becomes immediately visible, while sending the data to a buffer allows that data to be reused later.

Copying image data between layers

Image data can be copied from one layer or buffer into another layer or buffer. This is often used for scrolling (where most of the result of the graphical update is identical to the previous state) or for caching parts of an image.

Both VNC and RDP provide a means of copying a region of screen data and placing it somewhere else within the same screen. RDP provides an additional means of copying data to a cache, or recalling data from that cache and placing it on the screen. Guacamole takes this concept and reduces it further, as both on-screen and off-screen image storage is the same. The Guacamole "copy" instruction allows you to copy a rectangle of image data, and place it within another layer, whether that layer is the same as the source layer, a different visible layer, or an off-screen buffer.

Graphical primitives

The Guacamole protocol provides basic graphics operations similar to those of Cairo or the HTML5 canvas. In many cases, these primitives are useful for remote drawing, and desirable in that they take up less bandwidth than sending corresponding PNG images. Beware that excessive use of primitives leads to an increase in client-side processing, which may reduce the performance of a connected client, especially if that client is on a lower-performance machine like a mobile phone or tablet.

Buffers and layers

All drawing operations in the Guacamole protocol affect a layer, and each layer has an integer index which identifies it. When this integer is negative, the layer is not visible, and can be used for storage or caching of image data. In this case, the layer is referred to within the code and within documentation as a "buffer". Layers are created automatically when they are first referenced in an instruction.

There is one main layer which is always present called the "default layer". This layer has an index of 0. Resizing this layer resizes the entire remote display. Other layers default to the size of the default layer upon creation, while buffers are always created with a size of 0x0, automatically resizing themselves to fit their contents.
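The creation rules above can be modeled with a short sketch. This is a toy model for illustration only and does not reflect how the Guacamole client is actually written; the Surface and Display names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Surface:
    """A layer (index >= 0) or a buffer (index < 0)."""
    index: int
    width: int
    height: int

    @property
    def is_buffer(self) -> bool:
        return self.index < 0


class Display:
    """Creates layers and buffers automatically when first referenced."""

    def __init__(self, width: int, height: int) -> None:
        # The default layer (index 0) is always present.
        self.surfaces = {0: Surface(0, width, height)}

    def get(self, index: int) -> Surface:
        if index not in self.surfaces:
            if index < 0:
                size = (0, 0)              # buffers start empty
            else:
                default = self.surfaces[0]
                size = (default.width, default.height)
            self.surfaces[index] = Surface(index, *size)
        return self.surfaces[index]


display = Display(1024, 768)
print(display.get(1))    # Surface(index=1, width=1024, height=768)
print(display.get(-1))   # Surface(index=-1, width=0, height=0)
```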

Non-buffer layers can be moved and nested within each other. In this way, layers provide a simple means of hardware-accelerated compositing. If you need a window to appear above others, or you have some object which will be moving, or you need the data beneath it automatically preserved, a layer is a good way of accomplishing this. If a layer is nested within another layer, its position is relative to that of its parent. When the parent is moved or reordered, the child moves with it. If the child extends beyond the parent's bounds, it will be clipped.

Multimedia streaming and pipes

Guacamole supports transfer of both audio and video data, as well as files and arbitrary named pipes. Streams can be allocated with audio or video instructions for the sake of playing media, with file instructions for file transfer, or with "pipe" instructions for transfer of arbitrary data between client and server.

By the nature of the Guacamole protocol, you must know the duration of the audio or video data before it is sent. This is because sequential playing of multiple audio or video chunks must be carefully timed by the client to avoid gaps in playback. Even a gap of only one millisecond will produce an audible "click" in audio playback.

As far as audio and video are concerned, variance in chunk size gives a trade-off between responsiveness and perceived quality. Larger chunks will provide smoother audio and video, but have noticeable lag compared to smaller chunks. Smaller chunks provide less lag, but may produce noticeable clicking or stutter if the timing of each chunk is not perfect.
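The reason the duration must be known in advance can be seen in a small scheduling sketch. This is a simplified model, not the client's actual playback code: the function name schedule_chunks is hypothetical, and times are whole milliseconds measured from an arbitrary clock. Each chunk is scheduled to begin exactly when the previous one ends, which is only possible if every duration is known.

```python
def schedule_chunks(durations_ms: list[int], first_start_ms: int = 0) -> list[int]:
    """Return the start time of each chunk so that playback has no gaps."""
    starts = []
    next_start = first_start_ms
    for duration in durations_ms:
        starts.append(next_start)
        next_start += duration             # the next chunk begins where this ends
    return starts


print(schedule_chunks([250, 250, 500], first_start_ms=1000))  # [1000, 1250, 1500]
```

With larger chunks there are fewer boundaries at which timing can go wrong, but each chunk must be fully received before it can start, which is where the added lag comes from.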

Events

When something changes on either side, client or server, such as a key being pressed, the mouse moving, or clipboard data changing, an instruction describing the event is sent.

Disconnecting

The server and client can end the connection at any time. There is no requirement for the server or the client to communicate that the connection needs to terminate. When the client or server wishes to end the connection, and the reason is known, it can use the "disconnect" or "error" instructions.

The disconnect instruction is sent by the client when it is disconnecting. This is largely out of politeness, and the server must be written knowing that the disconnect instruction may not always be sent in time (guacd is written this way).

If the client does something wrong, or the server detects a problem with the client plugin, the server sends an error instruction, including a description of the problem in the parameters. This informs the client that the connection is being closed.
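As a sketch, the two instructions could be encoded as shown below. The encode helper is hypothetical, and the error instruction is shown with a single description parameter purely for illustration; the exact parameter list is given in the protocol reference, and the message text is made up.

```python
def encode(opcode: str, *args: str) -> str:
    """Encode one instruction as length-prefixed elements."""
    return ",".join(f"{len(e)}.{e}" for e in (opcode, *args)) + ";"


# Sent by the client when it is disconnecting.
print(encode("disconnect"))                     # 10.disconnect;

# Sent by the server; a single description parameter is assumed here.
print(encode("error", "Unsupported protocol"))  # 5.error,20.Unsupported protocol;
```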