Data Types

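Simple types contain only one type, which makes them straightforward to serialize and deserialize. Every simple type has the same layout: a first line holding the type symbol (tsymbol) followed by the payload length in bytes, then a second line holding the payload itself. Each line ends with \n.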

For a unicode string ‘sayan’, the layout of the unicode string type (+) will look like:

  1. +5\n # 'sayan' is a unicode string, so '+', and it has 5 bytes, so '5'
  2. sayan\n # the element 'sayan' itself
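The layout above can be sketched in Python. This is a minimal illustration rather than a reference implementation: the function name `encode_simple` is hypothetical, and the sketch assumes the length is the UTF-8 byte count of the payload, as the "5 bytes" comment suggests.

```python
def encode_simple(tsymbol: str, payload: str) -> bytes:
    """Encode one simple element as <tsymbol><len>\\n<payload>\\n."""
    data = payload.encode("utf-8")
    # The length counts bytes, not characters (assumption based on the text).
    header = f"{tsymbol}{len(data)}\n".encode("ascii")
    return header + data + b"\n"


print(encode_simple("+", "sayan"))  # b'+5\nsayan\n'
```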

Table

Do keep the matching for this symbol non-exhaustive since we might add more types in future revisions of the protocol.
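One way a client might keep its tsymbol matching non-exhaustive is to fall back to an "unknown" result instead of failing. The sketch below is only illustrative: `describe_tsymbol` is a hypothetical helper, and the lookup table lists only the one simple tsymbol shown in this section.

```python
def describe_tsymbol(tsymbol: str) -> str:
    """Map a type symbol to a readable name, tolerating unknown symbols."""
    known = {
        "+": "unicode string",  # the only simple tsymbol shown in this section
    }
    # Fall back instead of raising, so that tsymbols added by a later
    # protocol revision do not crash an older client.
    return known.get(tsymbol, f"unknown type ({tsymbol!r})")
```

A client built this way can still report or skip an element whose type it does not recognize.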

Compound types

Compound types are derived types — they are based on simple types, but often with some additional properties (and serialization/deserialization differences).

Type symbol (tsymbol) | Type                 | Additional notes                                              | Protocol
&                     | Array                | A recursive array                                             | 1.0
_                     | Flat array           | A non-recursive array                                         | 1.0
@                     | Typed array          | An array of a specific type, with nullable elements           | 1.1
~                     | Any array            | An array with a single type but no information about the type | 1.1
^                     | Typed non-null array | A non-recursive array with non-null elements                  | 1.1

Array

See the full discussion on arrays here.

Flat array

A flat array is like an array, except that it is non-recursive. This means that a flat array can contain all types except other compound types (hence the name ‘flat’).

So if you represent an array in a programming language like:

  1. ["hello", 12345, "world"];

then it will be serialized by Skyhash into:


Typed array

A typed array is like a flat array, except that each slot can only hold one of two things: an element of the declared type or a NULL. Because the element type is declared once in the array header, individual elements do not need their own tsymbols, unlike in a flat array.

You can think of each slot like this:

  • either there is no element (NULL)
  • or there is an element of the declared type

Say a programming language represents an array like:

  1. ["omg", NULL, "happened"]

then it will be serialized by Skyhash into:

  1. @+3\n
  2. 3\n
  3. omg\n
  4. \0\n
  5. 8\n
  6. happened\n

Line-by-line explanation:

  • @+3\n because it is a typed array, so @, the elements are unicode strings, so + and there are three elements, so 3
  • 3\n because ‘omg’ has 3 bytes
  • omg\n, the element itself
  • \0\n, NULL because there was no element
  • 8\n because ‘happened’ has 8 bytes
  • happened\n, the element itself
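Following the line-by-line explanation above, a typed array encoder could look roughly like this. It is a sketch under stated assumptions: `encode_typed_array` is a hypothetical name, Python's `None` stands in for NULL, and `\0` is assumed to mean a single NUL byte (0x00).

```python
from typing import List, Optional


def encode_typed_array(tsymbol: str, items: List[Optional[str]]) -> bytes:
    """Encode items as a typed array; None marks a NULL slot."""
    # Header: '@', the element tsymbol, then the element count.
    out = bytearray(f"@{tsymbol}{len(items)}\n".encode("ascii"))
    for item in items:
        if item is None:
            out += b"\x00\n"  # assumption: \0 in the text is one NUL byte
        else:
            data = item.encode("utf-8")
            # No per-element tsymbol: only the byte length, then the payload.
            out += f"{len(data)}\n".encode("ascii") + data + b"\n"
    return bytes(out)


print(encode_typed_array("+", ["omg", None, "happened"]))
# b'@+3\n3\nomg\n\x00\n8\nhappened\n'
```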
Any array

An AnyArray is like a typed array, but without any explicit information about the element type. Currently, all elements must be of the same type, yet the type itself is never sent. It is up to the server to convert the elements into the appropriate type for the action being run. This makes running actions simple, because clients do not have to specify the type. Despite this flexibility, AnyArrays are extremely performant. Also, no element in an AnyArray can be null.

If you have a programming language that represents a singly-typed array like:

  1. ["sayan", "is", "hiking"]

then Skyhash will serialize it into:

  1. ~3\n
  2. 5\n
  3. sayan\n
  4. 2\n
  5. is\n
  6. 6\n
  7. hiking\n

Line-by-line explanation:

  1. ~3\n because this is an AnyArray with 3 elements
  2. 5\n because ‘sayan’ has 5 bytes
  3. sayan\n, the element ‘sayan’ itself
  4. 2\n because ‘is’ has 2 bytes
  5. is\n the element ‘is’ itself
  6. 6\n because ‘hiking’ has 6 bytes
  7. hiking\n the element ‘hiking’ itself
note

An AnyArray is currently a query-specific data type: it is only sent by the client and never by the server.
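The AnyArray layout described above can be sketched as an encoder plus a matching decoder. Both function names are hypothetical, and the decoder returns raw bytes because, as noted, the receiving side decides how to interpret each element. This is an illustration of the wire format only, not how any real client or server is implemented.

```python
from typing import List


def encode_any_array(items: List[str]) -> bytes:
    """Encode items as an AnyArray: '~<count>' then length/payload pairs."""
    if any(item is None for item in items):
        raise ValueError("AnyArray elements cannot be null")
    out = bytearray(f"~{len(items)}\n".encode("ascii"))
    for item in items:
        data = item.encode("utf-8")
        out += f"{len(data)}\n".encode("ascii") + data + b"\n"
    return bytes(out)


def decode_any_array(buf: bytes) -> List[bytes]:
    """Return the raw element payloads of an encoded AnyArray."""
    if not buf.startswith(b"~"):
        raise ValueError("not an AnyArray")
    end = buf.index(b"\n")
    count = int(buf[1:end])
    pos = end + 1
    items = []
    for _ in range(count):
        end = buf.index(b"\n", pos)
        size = int(buf[pos:end])
        start = end + 1
        items.append(buf[start:start + size])
        pos = start + size + 1  # skip the payload and its trailing newline
    return items


wire = encode_any_array(["sayan", "is", "hiking"])
print(wire)  # b'~3\n5\nsayan\n2\nis\n6\nhiking\n'
print(decode_any_array(wire))  # [b'sayan', b'is', b'hiking']
```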

Typed non-null array

A typed non-null array is just like a typed array, with one difference: its elements can never be null. Say you have an array of two strings like this:

  1. ["super", "wind"]

Then it will be represented like this:

  1. ^+2\n
  2. 5\n
  3. super\n
  4. 4\n
  5. wind\n

Line-by-line explanation:

  1. ^+2\n because this is a typed non-null array, with two string elements
  2. 5\n because the first element is “super” and has 5 bytes
  3. super\n the element itself
  4. 4\n because the second element is “wind” and has 4 bytes