Airsequel Updates

December 8, 2022

Announcing Double-X-Encoding - Encode any UTF-8 string with [0-9a-zA-Z_]

We're happy to announce the release of Double-X-Encoding, a simple and effective way to encode any Unicode string using only the characters [0-9a-zA-Z_].

Double-X-Encoding is similar to URL percent-encoding, but it has been specifically designed to use only alphanumeric ASCII characters and the underscore. This makes it particularly useful for generating unique IDs in contexts that restrict the allowed characters, and especially for GraphQL IDs.

At Airsequel, it lets users choose any Unicode string as a column name while still keeping a direct mapping to our automatically generated GraphQL API.

The encoding scheme is based on a set of straightforward rules:

  1. Characters in the range [0-9A-Za-z_] are kept as is, except for the sequence XX, which is encoded as XXXXXX.

  2. All other printable characters inside the ASCII range are encoded as XX[0-9A-W] (e.g. a space becomes XX0).

  3. All other Unicode code points up to U+FFFFF (e.g. emojis) are encoded as a sequence of 7 characters: XX[a-p]{5}, where the 5 characters are the hexadecimal representation of the code point, written with an alternative hex alphabet that ranges from a to p instead of 0 to f (e.g. U+1F605 becomes XXbpgaf).

  4. All Unicode code points in the Supplementary Private Use Area-B (U+100000 to U+10FFFF) are encoded as a sequence of 9 characters: XXY[a-p]{6}.
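
As a rough illustration, the four rules could be sketched in Python as follows. This is not the library's code: the names `encode_char` and `double_x_encode` are hypothetical, and the symbol assigned to each ASCII character (taken in code point order) is an assumption inferred from the examples in this post.

```python
# Assumption: printable ASCII characters outside [0-9A-Za-z] are paired,
# in code point order, with the symbols 0-9 and A-W (33 on each side).
# "_" lands on R, but it is only escaped by the optional GraphQL rule.
SPECIALS = [chr(c) for c in range(0x20, 0x7F) if not chr(c).isalnum()]
SYMBOLS = "0123456789ABCDEFGHIJKLMNOPQRSTUVW"
ASCII_MAP = dict(zip(SPECIALS, SYMBOLS))

# Alternative hex alphabet: 0-9a-f is written as a-p.
HEX_TO_ALT = str.maketrans("0123456789abcdef", "abcdefghijklmnop")


def encode_char(ch):
    """Encode a single character according to rules 1 to 4."""
    # Rule 1: ASCII letters, digits, and the underscore stay as they are.
    if ch.isascii() and (ch.isalnum() or ch == "_"):
        return ch
    # Rule 2: other printable ASCII characters become XX + one symbol.
    if ch in ASCII_MAP:
        return "XX" + ASCII_MAP[ch]
    code_point = ord(ch)
    # Rule 3: code points up to U+FFFFF become XX + 5 alt-hex digits.
    if code_point <= 0xFFFFF:
        return "XX" + format(code_point, "05x").translate(HEX_TO_ALT)
    # Rule 4: U+100000 to U+10FFFF become XXY + 6 alt-hex digits.
    return "XXY" + format(code_point, "06x").translate(HEX_TO_ALT)


def double_x_encode(text):
    """Encode text so that it only contains [0-9A-Za-z_]."""
    # Rule 1 exception: every literal "XX" is written as "XXXXXX".
    # Splitting on "XX" first keeps those X characters out of encode_char.
    parts = text.split("XX")
    return "XXXXXX".join("".join(encode_char(c) for c in part) for part in parts)


print(double_x_encode("id with spaces"))  # idXX0withXX0spaces
print(double_x_encode("DOXXING"))         # DOXXXXXXING
```

Under these assumptions, non-printable ASCII characters such as a newline fall through to rule 3, since rule 2 only covers printable characters.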

Additionally, the Double-X-Encoding scheme offers optional support for encoding leading digits and double underscores to fulfill GraphQL's constraints regarding IDs.

  1. A leading digit can be encoded as XXZ[0-9].

  2. Double underscores (__) can be encoded as XXRXXR.
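
The two optional rules can be pictured as a post-processing step on a string that has already been encoded with the base rules. The sketch below makes that assumption; the function name `apply_graphql_options` is hypothetical, and the library may well apply these rules in a single pass instead.

```python
import re


def apply_graphql_options(encoded):
    """Escape a leading digit and double underscores in an encoded string.

    Assumes `encoded` already consists only of [0-9A-Za-z_].
    """
    # Optional rule 1: a leading digit d is written as XXZd.
    result = re.sub(r"^([0-9])", r"XXZ\1", encoded)
    # Optional rule 2: each double underscore is written as XXRXXR.
    return result.replace("__", "XXRXXR")


print(apply_graphql_options("1FileFormat"))  # XXZ1FileFormat
print(apply_graphql_options("__index__"))    # XXRXXRindexXXRXXR
```

Applying this after the base encoding is safe in this sketch because every base escape sequence starts with XX, so a leading digit or a double underscore in the encoded string can only come from the input itself.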

Here are a few examples to illustrate how Double-X-Encoding works:

| Input | Encoded |
| --- | --- |
| camelCaseId | camelCaseId |
| snake_case_id | snake_case_id |
| __Schema | __Schema |
| doxxing | doxxing |
| DOXXING | DOXXXXXXING |
| id with spaces | idXX0withXX0spaces |
| id-with.special$chars! | idXXDwithXXEspecialXX4charsXX1 |
| id_with_ümläutß | id_with_XXaaapmmlXXaaaoeutXXaaanp |
| Emoji: 😅 | EmojiXXGXX0XXbpgaf |
| Multi Byte Emoji: 👨‍🦲 | MultiXX0ByteXX0EmojiXXGXX0XXbpegiXXacaanXXbpjlc |

With encoding of leading digit and double underscore activated:

| Input | Encoded |
| --- | --- |
| 1FileFormat | XXZ1FileFormat |
| __index__ | XXRXXRindexXXRXXR |
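
Since every escape sequence starts with XX and has a fixed length for its kind, the examples above can also be read from right to left. The following decoder is a sketch under that reading, not the library's implementation: the names `ESCAPE` and `double_x_decode` are hypothetical, and the ASCII symbol table is again assumed to follow code point order.

```python
import re

# Assumed symbol table: 0-9 and A-W map back to the printable ASCII
# characters outside [0-9A-Za-z], in code point order.
SPECIALS = [chr(c) for c in range(0x20, 0x7F) if not chr(c).isalnum()]
SYMBOL_TO_CHAR = dict(zip("0123456789ABCDEFGHIJKLMNOPQRSTUVW", SPECIALS))

# Alternative hex alphabet a-p translated back to 0-9a-f.
ALT_TO_HEX = str.maketrans("abcdefghijklmnop", "0123456789abcdef")

ESCAPE = re.compile(
    r"XXXXXX"        # escaped literal XX
    r"|XXY[a-p]{6}"  # code points U+100000 to U+10FFFF
    r"|XXZ[0-9]"     # optional: escaped leading digit
    r"|XX[a-p]{5}"   # code points up to U+FFFFF
    r"|XX[0-9A-W]"   # printable ASCII (includes XXR for "_")
)


def _decode_escape(match):
    token = match.group(0)
    if token == "XXXXXX":
        return "XX"
    body = token[2:]
    if body[0] == "Y":
        return chr(int(body[1:].translate(ALT_TO_HEX), 16))
    if body[0] == "Z":
        return body[1]
    if len(body) == 5:
        return chr(int(body.translate(ALT_TO_HEX), 16))
    return SYMBOL_TO_CHAR[body]


def double_x_decode(encoded):
    """Replace every escape sequence with the text it stands for."""
    return ESCAPE.sub(_decode_escape, encoded)


print(double_x_decode("idXX0withXX0spaces"))  # id with spaces
print(double_x_decode("XXRXXRindexXXRXXR"))   # __index__
```

The alternatives in the pattern do not overlap, because X, Y, and Z lie outside the A-W symbol range and the alternative hex alphabet is lowercase only.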

We believe that Double-X-Encoding is a simple and effective solution for encoding Unicode strings and we can't wait to see what you're going to use it for!

You can find the code in the following GitHub repository:
github.com/Airsequel/double-x-encoding

Give it a try and let us know what you think!
