Announcing Double-X-Encoding - Encode any UTF-8 string with [0-9a-zA-Z_]
We're happy to announce the release of Double-X-Encoding, a simple and effective way to encode any Unicode string using only the characters `[0-9a-zA-Z_]`.
Double-X-Encoding is similar to URL percent-encoding, but it is specifically designed to use only alphanumeric ASCII characters and the underscore. This makes it particularly useful for generating unique IDs in various use cases, and especially for GraphQL IDs.
At Airsequel, it allows users to use any Unicode string as a column name and still have a direct mapping to our automatically generated GraphQL API.
The encoding scheme is based on a set of straightforward rules:
- Characters in the range `[0-9A-Za-z_]` are encoded as is, except for `XX`, which is encoded as `XXXXXX`.
- All other printable characters inside the ASCII range are encoded as `XX[0-9A-W]`.
- All other Unicode code points up to `U+fffff` (e.g. emojis) are encoded as a sequence of 7 characters: `XX[a-p]{5}`, where the 5 characters are the hexadecimal representation with an alternative hex alphabet ranging from `a` to `p` instead of `0` to `f`.
- All Unicode code points in the Supplementary Private Use Area-B (`U+100000` to `U+10ffff`) are encoded as a sequence of 9 characters: `XXY[a-p]{6}`.
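To make the rules concrete, here is a minimal Python sketch of the core encoder. It is an illustration only, not the code from the repository: the function name `double_x_encode` is hypothetical, and the symbol table assumes that the 33 non-alphanumeric printable ASCII characters map, in ASCII order, onto `0-9` followed by `A-W`. That assumption is inferred from the examples in this post (space becomes `XX0`, `-` becomes `XXD`, `:` becomes `XXG`).

```python
import re

# Assumption: the 33 non-alphanumeric printable ASCII characters, taken in
# ASCII order, map onto the codes 0-9 followed by A-W.
ASCII_SYMBOLS = " !\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~"
SYMBOL_CODES = "0123456789ABCDEFGHIJKLMNOPQRSTUVW"
SYMBOL_TO_CODE = dict(zip(ASCII_SYMBOLS, SYMBOL_CODES))

# Alternative hex alphabet: the letters a-p stand in for the digits 0-f.
HEX_TO_ALT = str.maketrans("0123456789abcdef", "abcdefghijklmnop")

# Match a literal "XX" or any single character outside [0-9A-Za-z_].
# A single underscore is never matched, so it passes through unchanged.
TOKEN = re.compile(r"XX|[^0-9A-Za-z_]")


def _encode_token(match: re.Match) -> str:
    token = match.group(0)
    if token == "XX":
        # Escape the escape sequence itself.
        return "XXXXXX"
    if token in SYMBOL_TO_CODE:
        # Printable ASCII symbol: XX plus one code character.
        return "XX" + SYMBOL_TO_CODE[token]
    code_point = ord(token)
    if code_point <= 0xFFFFF:
        # Five hex digits, written in the a-p alphabet.
        return "XX" + format(code_point, "05x").translate(HEX_TO_ALT)
    # U+100000 to U+10ffff: marker Y plus six hex digits in the a-p alphabet.
    return "XXY" + format(code_point, "06x").translate(HEX_TO_ALT)


def double_x_encode(text: str) -> str:
    """Encode text so that the result only contains [0-9A-Za-z_]."""
    return TOKEN.sub(_encode_token, text)
```

With this sketch, `double_x_encode("id with spaces")` returns `idXX0withXX0spaces`, which agrees with the examples in this post.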
Additionally, the Double-X-Encoding scheme offers optional support for encoding leading digits and double underscores to fulfill GraphQL's constraints regarding IDs.
- A leading digit can be encoded as `XXZ[0-9]`.
- Double underscores (`__`) can be encoded as `XXRXXR`.
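The two optional rules can be sketched as a small post-processing step. This is only an illustration under stated assumptions: the function name `apply_graphql_options` and its two flags are hypothetical, and the input is assumed to be a string that already consists solely of `[0-9A-Za-z_]` (for example, the output of the core encoding). The actual library may well apply these rules in the same pass as the rest of the encoding.

```python
import re


def apply_graphql_options(
    encoded: str,
    encode_leading_digit: bool = True,
    encode_double_underscore: bool = True,
) -> str:
    """Apply the optional GraphQL-related rules to an already encoded string.

    Hypothetical helper: assumes `encoded` only contains [0-9A-Za-z_].
    """
    result = encoded
    if encode_double_underscore:
        # Every pair of underscores becomes two encoded underscores.
        result = result.replace("__", "XXRXXR")
    if encode_leading_digit:
        # Prefix a digit at the very start of the string with the marker XXZ.
        result = re.sub(r"\A([0-9])", r"XXZ\1", result)
    return result
```

Both flags default to on here purely for the sake of the example. Single underscores are left alone, so `snake_case_id` passes through unchanged.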
Here are a few examples to illustrate how Double-X-Encoding works:
Input | Encoded |
---|---|
`camelCaseId` | `camelCaseId` |
`snake_case_id` | `snake_case_id` |
`__Schema` | `__Schema` |
`doxxing` | `doxxing` |
`DOXXING` | `DOXXXXXXING` |
`id with spaces` | `idXX0withXX0spaces` |
`id-with.special$chars!` | `idXXDwithXXEspecialXX4charsXX1` |
`id_with_ümläutß` | `id_with_XXaaapmmlXXaaaoeutXXaaanp` |
`Emoji: 😅` | `EmojiXXGXX0XXbpgaf` |
`Multi Byte Emoji: 👨‍🦲` | `MultiXX0ByteXX0EmojiXXGXX0XXbpegiXXacaanXXbpjlc` |
With encoding of leading digit and double underscore activated:
Input | Encoded |
---|---|
`1FileFormat` | `XXZ1FileFormat` |
`__index__` | `XXRXXRindexXXRXXR` |
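The examples above can also be read in the other direction. The following Python sketch reverses the rules. It is not taken from the repository: the name `double_x_decode` is hypothetical, the symbol table is the same inferred assumption (the 33 non-alphanumeric printable ASCII characters map, in ASCII order, onto `0-9` and `A-W`), and the input is assumed to be well-formed.

```python
# Assumption: codes 0-9 and A-W map back to the 33 non-alphanumeric printable
# ASCII characters in ASCII order (inferred from the examples in this post).
ASCII_SYMBOLS = " !\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~"
SYMBOL_CODES = "0123456789ABCDEFGHIJKLMNOPQRSTUVW"
CODE_TO_SYMBOL = dict(zip(SYMBOL_CODES, ASCII_SYMBOLS))

# Translate the alternative hex alphabet a-p back to 0-f.
ALT_TO_HEX = str.maketrans("abcdefghijklmnop", "0123456789abcdef")


def _from_alt_hex(chunk: str) -> str:
    """Turn a run of a-p characters back into the character it encodes."""
    return chr(int(chunk.translate(ALT_TO_HEX), 16))


def double_x_decode(encoded: str) -> str:
    """Decode a well-formed Double-X-Encoded string (hypothetical helper)."""
    out = []
    i = 0
    while i < len(encoded):
        if encoded.startswith("XXXXXX", i):
            # Escaped literal "XX".
            out.append("XX")
            i += 6
        elif encoded.startswith("XXY", i):
            # Six alternative hex digits: U+100000 to U+10ffff.
            out.append(_from_alt_hex(encoded[i + 3:i + 9]))
            i += 9
        elif encoded.startswith("XXZ", i):
            # Encoded leading digit.
            out.append(encoded[i + 3])
            i += 4
        elif encoded.startswith("XX", i) and encoded[i + 2:i + 3] in CODE_TO_SYMBOL:
            # Printable ASCII symbol.
            out.append(CODE_TO_SYMBOL[encoded[i + 2]])
            i += 3
        elif encoded.startswith("XX", i) and "a" <= encoded[i + 2:i + 3] <= "p":
            # Five alternative hex digits: code points up to U+fffff.
            out.append(_from_alt_hex(encoded[i + 2:i + 7]))
            i += 7
        else:
            # Plain character, copied through unchanged.
            out.append(encoded[i])
            i += 1
    return "".join(out)
```

For instance, `double_x_decode("XXRXXRindexXXRXXR")` returns `__index__`. The sketch does no validation, so a truncated sequence such as a trailing `XXY` raises an exception instead of being reported cleanly.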
We believe that Double-X-Encoding is a simple and effective solution for encoding Unicode strings and we can't wait to see what you're going to use it for!
You can find the code in the following GitHub repo:
github.com/Airsequel/double-x-encoding
Give it a try and let us know what you think!