#6 - Magical Macros
It's the start of the month again, and that means another newsletter! ... Okay, it's not exactly the start of the month, but we were deliberately avoiding April 1st because... well. While we enjoy a good Mom Joke or Dad Pun here, sometimes people think it's alright to just be edgy and cruel with their humor!
On the bright side, there's no joke here because it's finally going to be the end of the 3-part Newsletter series! This one teaches you how to make a somewhat nice-feeling interface in C for handling various different kinds of arguments. It has a few holes that may be patched in future versions of the standard, but we're pretty excited for it! But first, let's go over a little recap.
Previously...
In the last 2 letters, we established that we could:
- Use "va_args" (the ... at the end of a function declaration) to handle any number of arguments;
- Use a special "last" argument marker to know when to stop processing arguments, rather than depending on, say, a format string or an explicit count passed in as an argument;
- And properly capture string literals and other data using l-value conversion.
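To make the first two points of the recap concrete, here is a minimal sketch. It is not the code from the earlier newsletters: sum_until_sentinel and SUM_SENTINEL are hypothetical names, and it only handles int arguments so that the "last argument" check stays simple.

```c
#include <stdarg.h>

/* Hypothetical marker value: the caller passes it last to say "stop here". */
#define SUM_SENTINEL (-1)

/* Hypothetical helper: adds up int arguments until it reads SUM_SENTINEL. */
int sum_until_sentinel(int first, ...) {
    va_list args;
    int total = 0;
    va_start(args, first);
    /* No format string and no explicit count: the marker ends the walk. */
    for (int value = first; value != SUM_SENTINEL; value = va_arg(args, int)) {
        total += value;
    }
    va_end(args);
    return total;
}
```

Calling sum_until_sentinel(1, 2, 3, SUM_SENTINEL) walks three values, stops at the marker, and returns 6. The obvious cost is that -1 can no longer be passed as data, which is part of why the series moved to a dedicated "last" arg value instead.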
With all of that under our belts, it's time to get started on the real magic that would make the Macromancer himself proud!
Let's handle an arbitrary sequence of arguments/types in a function-like macro using ... and __VA_ARGS__ in the preprocessor.
__VA_ARGS__
While the mechanism for handling a variable number of arguments in C is clumsy and generally type-unsafe (see: the number of compiler warnings around using printf with the wrong format specifiers for e.g. an int or a double), we can force that type safety by falling back onto a technique called Macro Generic Programming, or MGP for short. MGP allows us to offer a simple syntax and bury the type checking and other shenanigans in the preprocessor, giving us flexible interfaces even in a programming language like C that lacks the strong generic programming support found in other languages.
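As a small taste of MGP, separate from the GET_ARG machinery, the sketch below uses _Generic to pick a function based on the type of the argument. The names TWICE, twice_int, and twice_double are made up for illustration and do not appear in the newsletter's code.

```c
/* Hypothetical typed implementations hidden behind one macro. */
int twice_int(int value) { return value * 2; }
double twice_double(double value) { return value * 2.0; }

/* _Generic selects a function by the type of x; the chosen function
   is then called with x. There is deliberately no default case. */
#define TWICE(x) _Generic((x), \
    int: twice_int,            \
    double: twice_double       \
)(x)
```

TWICE(21) calls twice_int and TWICE(1.5) calls twice_double. Something like TWICE("nope") fails to compile because no association matches a char*, which is exactly the kind of type checking we want the preprocessor layer to carry for us.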
We have already been using some MGP to create our GET_ARG macro:
/* ... */
#define GET_ARG_TYPE(value) _Generic(value, \
int: arg_type_int, \
double: arg_type_double, \
char*: arg_type_ptr_char, \
const char*: arg_type_ptr_char, \
void*: arg_type_ptr_void, \
/* and so on, and so forth... */ \
default: arg_type_shrug \
)
#define GET_ARG_DATA(value) \
((__typeof((void)0, (value))[]){ (value) })
#define GET_ARG(value) \
(arg){ GET_ARG_TYPE(value), GET_ARG_DATA(value) }
The GET_ARG macro has been producing both the type and the data. Now, we just need to use it on the arguments we receive, in a recursive manner! As mentioned in the last newsletter, the obvious spelling is -- unfortunately -- not something that works:
#define process_args(...) \
process_arg(GET_ARG(__VA_ARGS__))
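This does not call GET_ARG for each value. GET_ARG takes exactly one parameter, so handing it the whole comma-separated list at once is an error for anything but a single argument. So we need to call GET_ARG on each value, one at a time. We re-worked our base function to be capable of this in the last newsletter: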
void process_all_args (int marker_argument, ...);
And now it's time to use {insert some magical implementation technique here} to allow for something that looks like this:
#define process_args(...) \
process_all_args(/* marker */ 0xAAAA, \
FOR_ALL_ARGS(GET_ARG, __VA_ARGS__))
So... let's do it!
FOR_ALL_ARGS, in the Preprocessor
We won't call it FOR_ALL_ARGS, because we're not going to create a "general purpose" invoker of macros. What we will do, though, is write something that expands all the tokens out for us, and then start working from there to invoke our GET_ARG one at a time. We do not really have the power to invoke things in a for loop or a while loop, since no such construct exists in the preprocessor. But, you can fake it with a pseudo-recursion. We say "pseudo" here, because it's not really true recursion. Instead, it's kind of like a hand-unrolled loop or a sequentially daisy-chained set of wires, really:
#define PROCESS_ARG_LAST() (arg){ arg_type_shrug, NULL }
#define PROCESS_ARG0() PROCESS_ARG_LAST() // and we're done!!
#define PROCESS_ARG1(val) GET_ARG(val), PROCESS_ARG0()
#define PROCESS_ARG2(val, ...) GET_ARG(val), PROCESS_ARG1(__VA_ARGS__)
#define PROCESS_ARG3(val, ...) GET_ARG(val), PROCESS_ARG2(__VA_ARGS__)
#define PROCESS_ARG4(val, ...) GET_ARG(val), PROCESS_ARG3(__VA_ARGS__)
#define PROCESS_ARG5(val, ...) GET_ARG(val), PROCESS_ARG4(__VA_ARGS__)
#define PROCESS_ARG6(val, ...) GET_ARG(val), PROCESS_ARG5(__VA_ARGS__)
#define PROCESS_ARG7(val, ...) GET_ARG(val), PROCESS_ARG6(__VA_ARGS__)
#define PROCESS_ARGS_SELECT(inv1, inv2, inv3, inv4, inv5, inv6, inv7, INVOKE_THIS_ONE, ...) \
INVOKE_THIS_ONE
#define process_args(...) \
process_all_args( 0xAAAA, PROCESS_ARGS_SELECT(__VA_ARGS__, \
PROCESS_ARG7, PROCESS_ARG6, PROCESS_ARG5, \
PROCESS_ARG4, PROCESS_ARG3, PROCESS_ARG2, \
PROCESS_ARG1, PROCESS_ARG0)(__VA_ARGS__))
"... Huh?!"
Okay, so! This looks quite a bit complicated, but I promise you it's actually a really simple concept rooted in the fact that we only support some fixed maximum number of arguments. process_args is our top-level invocation. Its job is to apply GET_ARG to everything it receives, so we can do all that compound-literal generation of the struct arg objects we need to do the printing. Now, you may be wondering: "Why do you use __VA_ARGS__ in the PROCESS_ARGS_SELECT macro invocation, AND at the end in the parentheses?". This is where the trick comes in, and why we explicitly list out the PROCESS_ARG7, PROCESS_ARG6, etc. values. The goal of the PROCESS_ARGS_SELECT call is to pick the right macro to invoke and start the chain of GET_ARGs! This is achieved by dumping the macro arguments in front of that list, and then using the resulting offset position of each PROCESS_ARG[n] macro to determine which one to call.
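The same offset trick can be seen in isolation with a smaller sketch that selects a number instead of a macro name. COUNT_SELECT and COUNT_ARGS are hypothetical names used only for this illustration; the shape mirrors PROCESS_ARGS_SELECT, with the list 7 through 0 standing in for PROCESS_ARG7 through PROCESS_ARG0.

```c
/* Seven throwaway slots, then the slot we actually keep. */
#define COUNT_SELECT(a1, a2, a3, a4, a5, a6, a7, COUNT, ...) COUNT

/* Every real argument pushes the 7..0 list one slot to the right,
   so the value that lands in the COUNT slot equals the argument count.
   The trailing 0 guarantees that "..." always receives something. */
#define COUNT_ARGS(...) \
    COUNT_SELECT(__VA_ARGS__, 7, 6, 5, 4, 3, 2, 1, 0)
```

With one argument, the literal 1 lands in the COUNT slot; with three arguments, 3 does. Note that COUNT_ARGS() with nothing in the parentheses reports 1 rather than 0 under this scheme, because the empty argument still occupies the first slot.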
An Example
Let's look at the hypothetical expansion of these, if we were a compiler, and why this would work. So, consider the call
process_args(1);
Then that would expand, at first, to something like this:
process_all_args(0xAAAA, PROCESS_ARGS_SELECT(1,
PROCESS_ARG7, PROCESS_ARG6, PROCESS_ARG5,
PROCESS_ARG4, PROCESS_ARG3, PROCESS_ARG2,
PROCESS_ARG1, PROCESS_ARG0)(1));
So far, so good. Now, how does PROCESS_ARGS_SELECT expand? Well, the goal is that it picks the right macro to expand to. If you expand out just PROCESS_ARGS_SELECT, you end up with (annotated to drive home the point):
PROCESS_ARGS_SELECT(1 /* inv1 */, PROCESS_ARG7 /* inv2 */,
PROCESS_ARG6 /* inv3 */, PROCESS_ARG5 /* inv4 */,
PROCESS_ARG4 /* inv5 */, PROCESS_ARG3 /* inv6 */,
PROCESS_ARG2 /* inv7 */,
PROCESS_ARG1 /* INVOKE_THIS_ONE */,
PROCESS_ARG0 /* ... */)
Which, removing annotations and doing the call, becomes:
process_all_args(0xAAAA, PROCESS_ARG1(1));
Aha! See, PROCESS_ARGS_SELECT essentially "pushes" the right macro to invoke (one of the PROCESS_ARG[n] macros) into the INVOKE_THIS_ONE position. That's why we call it SELECT; it selects the right macro to invoke! So, finally, the function call ends up looking like this:
process_all_args(0xAAAA,
(arg){ arg_type_int, (int[]){ 1 } },
(arg){ arg_type_shrug, NULL } );
And, when called, you should see this output:
1
Boom!
And that's it! That's all the magic! This generalizes to 2, 3, 4, 5, etc. arguments! We have achieved a type-safe, variadic call in a C API that keeps its arguments alive with the power of compound literals! Someone working with us is working on getting __typeof to be a normal part of the language rather than just a widely-implemented extension. But there are some problems with this setup that are, quite literally, impossible to solve in Standard C, even if we take __typeof for granted.
__VA_ARGS__ and zero arguments
You'll notice that nowhere during this exercise have we run through the usage of process_args(), where it's called with no arguments. This is, unfortunately, intentional, because it can't work. The biggest problem with ... and __VA_ARGS__ in macros is that the __VA_ARGS__ and the paired ... construct are not allowed to have no arguments passed to them:
"If there is a ... in the identifier-list in the macro definition, then the trailing arguments, including any separating comma preprocessing tokens, are merged to form a single item: the variable arguments. The number of arguments so combined is such that, following merger, the number of arguments is one more than the number of parameters in the macro definition (excluding the ...)."
- §6.10.3 Macro replacement, Semantics, paragraph 12, C Standard Working Draft
There must always be 1 or more arguments passed to the macro for the ... part. This unfortunate limitation has so far only been resolved in C++20, with __VA_OPT__; the C Committee still needs to move the required paper, N2610, forward. If you'd like to see this problem solved so that C is not behind C++ on the preprocessor, we do encourage you to e-mail the author with supporting words, and maybe ask to lend a helping hand!
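For a feel of what __VA_OPT__ buys us, here is a hedged sketch. It assumes a compiler that already accepts __VA_OPT__ when compiling C, which is an assumption about your toolchain rather than something Standard C guarantees at the time of writing. COUNT0_SELECT and COUNT_ARGS_OR_ZERO are hypothetical names, not part of this newsletter's code.

```c
/* Assumption: the compiler implements __VA_OPT__ in C mode. */
#define COUNT0_SELECT(a1, a2, a3, a4, a5, a6, a7, COUNT, ...) COUNT

/* __VA_OPT__(,) emits the comma only when at least one argument was
   passed, so an empty call does not leave an empty first slot behind.
   The trailing ~ is a dummy token so "..." always receives something. */
#define COUNT_ARGS_OR_ZERO(...) \
    COUNT0_SELECT(__VA_ARGS__ __VA_OPT__(,) 7, 6, 5, 4, 3, 2, 1, 0, ~)
```

Here COUNT_ARGS_OR_ZERO() selects 0, while one or three arguments select 1 or 3. The same idea, applied to the list of PROCESS_ARG[n] names, is what would let an empty process_args() land on PROCESS_ARG0.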
More Fundamental: a maximum of 7
Right now, if you use the code from this article directly, it only handles 7 arguments maximum. That's a cute number, but likely to fall apart with rigorous use. The C++ Boost.Preprocessor library typically allows for up to 64, but be careful: bigger macros can cost more at compile time. If you're coming from Rust or C++, you're likely already used to gigantic compile times, so slightly complicated macros won't make you bat an eye, and they will still be faster than procedural Rust macros, Rust traits, and/or C++ templates.
We'll have some brand new knowledge for you, next newsletter! A full code listing is available on Godbolt for you to play with, and is shown below.
Until next time!
- Shepherd's Oasis
P.S.: We would've put a joke here, but honestly we're a little "fooled out" from the exhausting, too-edgy jokes from April Fool's Day. Just have a good one, okay?
#include <stdio.h>
#include <stdarg.h>
enum arg_type {
arg_type_shrug,
arg_type_int,
arg_type_double,
arg_type_ptr_char,
arg_type_ptr_void
};
typedef struct arg_ {
enum arg_type type;
void* data;
} arg;
#define GET_ARG_TYPE(value) _Generic(value, \
int: arg_type_int, \
double: arg_type_double, \
char*: arg_type_ptr_char, \
const char*: arg_type_ptr_char, \
void*: arg_type_ptr_void, \
/* and so on, and so forth... */ \
default: arg_type_shrug \
)
#define GET_ARG_DATA(value) \
((__typeof((void)0, (value))[]){ (value) })
#define GET_ARG(value) \
(arg){ GET_ARG_TYPE(value), GET_ARG_DATA(value) }
void process_arg(arg arg0);
void process_all_args(int marker_argument, ...);
#define PROCESS_ARG_LAST() (arg){ arg_type_shrug, NULL }
#define PROCESS_ARG0() PROCESS_ARG_LAST() // and we're done!!
#define PROCESS_ARG1(val) GET_ARG(val), PROCESS_ARG0()
#define PROCESS_ARG2(val, ...) GET_ARG(val), PROCESS_ARG1(__VA_ARGS__)
#define PROCESS_ARG3(val, ...) GET_ARG(val), PROCESS_ARG2(__VA_ARGS__)
#define PROCESS_ARG4(val, ...) GET_ARG(val), PROCESS_ARG3(__VA_ARGS__)
#define PROCESS_ARG5(val, ...) GET_ARG(val), PROCESS_ARG4(__VA_ARGS__)
#define PROCESS_ARG6(val, ...) GET_ARG(val), PROCESS_ARG5(__VA_ARGS__)
#define PROCESS_ARG7(val, ...) GET_ARG(val), PROCESS_ARG6(__VA_ARGS__)
#define PROCESS_ARGS_SELECT(inv1, inv2, inv3, inv4, inv5, inv6, inv7, INVOKE_THIS_ONE, ...) \
INVOKE_THIS_ONE
#define process_args(...) \
process_all_args( 0xAAAA, PROCESS_ARGS_SELECT(__VA_ARGS__, \
PROCESS_ARG7, PROCESS_ARG6, PROCESS_ARG5, \
PROCESS_ARG4, PROCESS_ARG3, PROCESS_ARG2, \
PROCESS_ARG1, PROCESS_ARG0)(__VA_ARGS__))
int main () {
process_args( NULL, "\n", 1, "\n2\n", 3.0 );
return 0;
}
/* implementation of arg processing! */
void process_all_args(int marker_argument, ...) {
va_list all_args;
va_start(all_args, marker_argument);
while (1) {
arg current_arg = va_arg(all_args, arg);
if (current_arg.data == NULL
&& current_arg.type == arg_type_shrug) {
// exit!
break;
}
process_arg(current_arg);
}
va_end(all_args);
}
void process_arg(arg arg0) {
switch (arg0.type) {
case arg_type_int:
{
// points at a single integer
int value = *(int*)arg0.data;
printf("%d", value);
}
break;
case arg_type_double:
{
// points at a single double
double value = *(double*)arg0.data;
printf("%f", value);
}
break;
case arg_type_ptr_char:
{
// points at a character string
char* value = *(char**)arg0.data;
printf("%s", value);
}
break;
case arg_type_ptr_void:
{
// points at a single void pointer
void* value = *(void**)arg0.data;
printf("%p", value);
}
break;
/* and so on, and so forth... */
default:
{
void* value = arg0.data;
printf("(unknown type!) %p", value);
}
break;
}
}