🧠 #5 - Nerding Out, Part 2 🧠
It’s the halfway point of the month, and that means another newsletter! This one’s going to be focused exclusively on some techniques in C that leverage some macro fun to create a generic “print” kind of macro, that prints any kind of object we get. (Or, at least, a reasonable default!)
But before that…
⭐ CoSy
Trainers are being locked in! So far:
- We’ve got a handful of trainers for Rust (and are tentatively working on one for Embedded Rust);
- We’re working to confirm 3 C and C++ trainers;
- Potentially, 2 Zig trainers;
- and, two more Special Training in Systems Programming!
We’re getting pretty excited and hope that in the coming month we’ll get to release The Big Lineup™ if they confirm! That news will drop on the website with much fanfare 🎺!
And now…
Multi-argument, type-safe mechanisms in C
It’s time to get into it! First, we need to review what we picked up from a month ago.
Recap
- We learned that Compound Literals, with the form (type[]){ value }, can be used to give arguments passed to a function longevity.
- We learned that _Generic is an expression, and each branch of it must not only have the same resulting type, but every branch of the _Generic must compile perfectly for every given value placed into the _Generic.
- We learned that we can put this behind some macro calls to make it look nicer!
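As a tiny refresher on the first point, here is a minimal sketch of the compound literal trick on its own. The names read_int, POINTER_TO, and compound_literal_demo are made up for this illustration and are not part of the code from last time:

```c
/* Hypothetical callee: only receives a pointer, never the value itself. */
static int read_int(const void* data) {
    return *(const int*)data;
}

/* Wraps `value` in a one-element array compound literal. Inside a
   function body, that array lives until the end of the enclosing
   block, so the pointer stays valid for the whole call. */
#define POINTER_TO(value) ((int[]){ (value) })

static int compound_literal_demo(void) {
    /* 40 + 2 is a temporary value, but the compound literal gives it
       an address that read_int can safely read through. */
    return read_int(POINTER_TO(40 + 2));
}
```

Calling compound_literal_demo() reads the stored value back through the pointer, which is the same move GET_ARG_DATA makes for every argument.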
Here is the resulting code for what we had so far, with a quick demonstration in Godbolt if you click this link:
enum arg_type {
    arg_type_int,
    arg_type_double,
    arg_type_ptr_char,
    // ... and more...
    arg_type_shrug
};
typedef struct arg_ {
    enum arg_type type;
    void* data;
} arg;
#define GET_ARG_TYPE(value) _Generic(value, \
    int: arg_type_int, \
    double: arg_type_double, \
    char*: arg_type_ptr_char, \
    /* const char* is a distinct type, so it needs its own branch */ \
    const char*: arg_type_ptr_char, \
    /* and so on, and so forth... */ \
    default: arg_type_shrug \
)
#define GET_ARG_DATA(value) \
    ((__typeof((value))[]){ (value) })
#define GET_ARG(value) \
    (arg){ GET_ARG_TYPE(value), GET_ARG_DATA(value) }
void process_arg(arg arg0);
int main () {
    int value0 = 1;
    double value1 = 2.0;
    const char* value2 = "3";
    process_arg( GET_ARG(value0) );
    process_arg( GET_ARG(value1) );
    process_arg( GET_ARG(value2) );
    return 0;
}
#include <stdio.h>
void process_arg(arg arg0) {
    switch (arg0.type) {
    case arg_type_int:
        {
            // points at a single integer
            int value = *(int*)arg0.data;
            printf("%d", value);
        }
        break;
    case arg_type_double:
        {
            // points at a single double
            double value = *(double*)arg0.data;
            printf("%f", value);
        }
        break;
    case arg_type_ptr_char:
        {
            // points at a character string
            char* value = *(char**)arg0.data;
            printf("%s", value);
        }
        break;
    /* and so on, and so forth... */
    default:
        break;
    }
}
Now, from here, we’re going to build towards what we promised in the last January newsletter:
int main (int argc, char* argv[]) {
    process_args(1, 2.0, "3");
    process_args(0x4, '5');
    process_args((char)0x67);
    return 0;
}
Step 0: Expanding Macros, in Macros
The process_args call looks this way because it is a macro that hides the deduction of argument types and data. That part is easy: we just need to call the GET_ARG macro from what we have:
#define process_args(arg0) \
    process_arg(GET_ARG(arg0))
This is all fine and good, until we get to the stage where we need to expand it over 1 or more arguments. To do this we use the variable arguments for macros, the ... syntax:
#define process_args(...) \
    process_arg(GET_ARG(__VA_ARGS__))
While we might hope this produces 1 GET_ARG for each expanded element, what this in fact does is place all the arguments inside 1 GET_ARG(...) macro invocation, which is all then crammed into one process_arg function call. Of course, none of this works out so the compiler freaks out and we’re back to the start!
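To see the "whole blob" problem for yourself, here is a small sketch that turns the naive expansion into text instead of compiling it. STRINGIFY, NAIVE_EXPANSION, naive_expansion_text, and count_char are hypothetical helpers invented for this demonstration, and GET_ARG is deliberately left undefined so it survives as plain text:

```c
/* Hypothetical helper: turns its arguments into a string literal
   without expanding them any further. */
#define STRINGIFY(...) #__VA_ARGS__

/* Hypothetical helper: shows, as text, what a naive variadic wrapper
   around GET_ARG expands to. GET_ARG is not defined in this snippet. */
#define NAIVE_EXPANSION(...) STRINGIFY(GET_ARG(__VA_ARGS__))

/* Returns the text of the expansion for three arguments. */
static const char* naive_expansion_text(void) {
    return NAIVE_EXPANSION(1, 2.0, "3");
}

/* Counts how often `wanted` appears in `text`. */
static int count_char(const char* text, char wanted) {
    int count = 0;
    for (const char* p = text; *p != '\0'; ++p) {
        if (*p == wanted) {
            ++count;
        }
    }
    return count;
}
```

Printing naive_expansion_text() shows a single GET_ARG(...) with all three arguments inside it, and count_char on that text finds two commas: three arguments, one invocation, which is exactly the situation we need to get out of.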
So, we need a way to:
- Pass in multiple arguments (__VA_ARGS__)
- Call the macro GET_ARG on each individual element, not the whole blob
- Call the function process_arg or some equivalent for it.
This is going to require a bit of surgery, to both the function calls we have to make and the macro itself. We’ll start with the modifications we need to make to the function call.
Step 0, Redo: A Multi-Argument Function Call
First up, the function call process_arg itself. That function is fine for processing individual arguments, but we need more so we can process multiple arg structures. The way to do this is to make a top-level function that can take a variable number of arguments, using the “var_args” construct:
#include <stdarg.h>
void process_all_args (int marker_argument, ...) {
    va_list all_args;
    va_start(all_args, marker_argument);
    /* Processing Here! */
    va_end(all_args);
}
Two things make variable arguments (va_args, as they’re known) work with the va_start and va_end macros:
- Knowing exactly how many arguments you are going to process and use.
- Having an argument that is not part of the ... that can be used to “start” the list with va_start.
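For a feel of how those two requirements play out in the simplest case, here is a minimal sketch. The function sum_ints is a made-up example, not something from our arg code: its named count parameter is both the "start" argument for va_start and the way the function learns how many arguments follow.

```c
#include <stdarg.h>

/* Hypothetical example: `count` is the named parameter that va_start
   needs, and it also says how many int arguments come after it. */
static int sum_ints(int count, ...) {
    va_list args;
    va_start(args, count);
    int total = 0;
    for (int i = 0; i < count; ++i) {
        /* the caller must really have passed an int here;
           nothing checks that for us */
        total += va_arg(args, int);
    }
    va_end(args);
    return total;
}
```

A call like sum_ints(3, 1, 2, 3) walks exactly three arguments. Pass a wrong count or a non-int and the function has no way to notice, which is the weakness the next paragraphs dig into.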
When C standard library functions like printf and friends use the ... syntax and va_args, the format string is both “marker_argument” and the way to inform the va_args on what to do with each argument it comes across. This is slightly detrimental, because it results in code where you write a specific argument indicator in the format string (such as "%d") when you meant a different kind of indicator (such as "%f") and thusly invoke undefined behavior when it does not match:
int some_integer = 0xAAAA;
printf("%s", some_integer); // uh oh!
printf("%d %f", some_integer); // uh oh, not enough arguments!
Compilers have gotten good at catching these cases for printf with warnings, but this does not really scale up very well for common library developers like us who don’t have the might of the Standard Library behind us. We’ve solved this problem, however, with the arg structure and process_arg function! The type is stored in arg, and we get a compound literal holding the data we can print out. So, we just need to go over each of these arg structures, one at a time, from the va_list:
#include <stdarg.h>
void process_all_args (int marker_argument, ...) {
    va_list all_args;
    va_start(all_args, marker_argument);
    while (1) {
        // pull the next arg structure out of the list
        arg current_arg = va_arg(all_args, arg);
        process_arg(current_arg);
    }
    va_end(all_args);
}
Awesome! But…
Step 1: When Do We Stop??
One of the things about printf, formatting strings, and such functions is that there is always some indicator about how many arguments there are, or when to stop processing. Right now, we have an infinite while loop that will keep pulling arguments, forever. That’s no good! In order to get around this, we will make sure there is always a last argument in our list that is an arg, but has two special values:
- the void* data; member will be set to NULL; and,
- the arg_type will be arg_type_shrug.
This will aid us in making sure we know when to stop. And thusly, we can use it like so:
/* code from the initial
part of the article */
#include <stdarg.h>
void process_all_args (int marker_argument, ...) {
    va_list all_args;
    va_start(all_args, marker_argument);
    while (1) {
        arg current_arg = va_arg(all_args, arg);
        if (current_arg.data == NULL && current_arg.type == arg_type_shrug) {
            // exit!
            break;
        }
        process_arg(current_arg);
    }
    va_end(all_args);
}
int main (int argc, char* argv[]) {
    process_all_args(
        // marker argument
        0,
        // actual arguments
        GET_ARG(1), GET_ARG("\n2\n"), GET_ARG(3.0),
        // "no more, we are done" marker
        (arg){ arg_type_shrug, NULL }
    );
    return 0;
}
If you run this code, unmodified, you might get some printouts and then some garbage, or you might get a Program Return Code of 139.
Return Code 1…39 ???
Yes! If you get it, 139 means the program was killed by a segmentation fault: shells report 128 plus the signal number, and SIGSEGV is signal 11 on Linux. It is shorthand for “something went really wrong trying to read or write this memory”. This is mostly because we need to use a special trick called l-value conversion in order to make the string literal "\n2\n" be recognized by our GET_ARG_DATA macro. Namely, __typeof("\n2\n") deduces "\n2\n" to be a type of char[4] (in C, string literals are arrays of plain char), or a whole array.
This matters, because we use __typeof with our GET_ARG_DATA that generates a compound literal. The compound literal ends up holding the characters themselves rather than a pointer to them. When we unwind that argument on the other side inside the arg_type_ptr_char case value for our switch processing statement, we just expect a char*, so those characters get treated as an address. This essentially means we’re reading the data wrong, and shenanigans ensue! 😱
We need to somehow get an array type like char[N] to decay to a pointer type like char*, so it gets held onto properly. We also need it so we can keep the types for other things! It turns out there is a way:
Step 1.5: L-value Conversions!
Yes, it turns out the C Standard has exactly a mechanism for this. It’s called l-value conversions, and it’s a small little feature that decays arrays to pointers and strips qualifiers (const, volatile, _Atomic, etc.) from object types. One way to trigger an l-value conversion is to do a cast, but that’s a bit difficult to express in our GET_ARG_DATA macro. Another way is to trigger it through other means in the language, like invoking the decay as part of using C’s native comma operator with a void-banished expression. That ends up looking like this:
#define GET_ARG_DATA(value) \
    ((__typeof((void)0, (value))[]){ (value) })
This will prevent the Compound Literal from being stored as an array type and instead make sure the type is a pointer type (char* for our string literal). It’s a tad esoteric and a little whacky, in all honesty, but it’s the best tool we have available for this. Note that we are still using __typeof, the extension name for the Standard Feature, because there is still work to do to approve the Standard Feature for the next revision of the C Standard.
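If you want to check the difference without crashing anything, a rough sketch is to compare sizes. The two functions below are hypothetical helpers written for this aside, and they assume a compiler that supports the __typeof extension (GCC and Clang do). The literal "hello" is used instead of "\n2\n" so that the array size (6) cannot be confused with a pointer size:

```c
#include <stddef.h>

/* Type deduced directly from the string literal: an array of 6 chars
   ("hello" plus the terminating '\0'). */
static size_t element_size_without_decay(void) {
    return sizeof(__typeof("hello"));
}

/* Same literal, pushed through the comma operator first: the right-hand
   operand decays, so the deduced type is a plain char pointer. */
static size_t element_size_with_decay(void) {
    return sizeof(__typeof((void)0, "hello"));
}
```

The first function reports the size of the whole array, the second the size of a char*. That is the difference between the compound literal storing the characters and storing a pointer to them.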
“Why not do this for GET_ARG_TYPE, too?”
Good question! It turns out that _Generic, before it starts doing type-matching, will always perform an l-value conversion on the passed-in value. That means it will “decay” arrays to the pointer type (char[N] becomes char*). __typeof performs no such conversion on its operand, so for the compound literal we have to trigger the conversion ourselves so the stored data matches the deduced type.
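Here is a small sketch of that built-in conversion at work. KIND_OF and the two functions are hypothetical names for this illustration only, and the behavior shown assumes a reasonably recent GCC or Clang:

```c
/* Hypothetical macro: reports which branch _Generic picks. */
#define KIND_OF(value) _Generic((value), \
    int: 1,                              \
    char*: 2,                            \
    default: 0                           \
)

static int kind_of_const_int(void) {
    const int answer = 42;
    /* `const` is dropped before matching, so the `int` branch wins */
    return KIND_OF(answer);
}

static int kind_of_string_literal(void) {
    /* the char[6] array decays to char* before matching */
    return KIND_OF("hello");
}
```

Neither a const int nor a char[6] branch appears in the list, yet both values find a match, because _Generic converts the controlling expression before it compares types.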
WHEW, that was a lot of work and explanation! So, now we can start on the juicy part of the newsletter: working with variadic macros…
Next Time!
Ooooh, we know, so sorry! If you’ve read through all the way until now, we’re sure you were super excited for more! But, this newsletter is already mega long, and you need a break! At the end of the month, we’ll finally settle the score and make sure you have everything you need to write wonderful, nice-looking C that almost feels as good as C++ and its templates, without the shenanigans!
A complete listing of everything we’ve done, so far, is down below and at this link.
We hope you found this enlightening. We promise we’ll show you the secret to __VA_ARGS__ expansion and other things next time, for sure!!
— Shepherd’s Oasis 💙
P.S. A lot of people have been saying this month things like “March Madness”, which got us excited to get involved. Unfortunately, it was a little awkward; we expected chaos but the procession was way too organized!
#include <stdio.h>
#include <stdarg.h>
enum arg_type {
    arg_type_shrug,
    arg_type_int,
    arg_type_double,
    arg_type_ptr_char,
    arg_type_ptr_void
};
typedef struct arg_ {
    enum arg_type type;
    void* data;
} arg;
#define GET_ARG_TYPE(value) _Generic(value, \
    int: arg_type_int, \
    double: arg_type_double, \
    char*: arg_type_ptr_char, \
    const char*: arg_type_ptr_char, \
    void*: arg_type_ptr_void, \
    /* and so on, and so forth... */ \
    default: arg_type_shrug \
)
#define GET_ARG_DATA(value) \
    ((__typeof((void)0, (value))[]){ (value) })
#define GET_ARG(value) \
    (arg){ GET_ARG_TYPE(value), GET_ARG_DATA(value) }
void process_arg(arg arg0);
void process_all_args(int marker_argument, ...);
#define process_args(...) \
    process_all_args(0, \
    /* To Be Discovered Next Time!! */, \
    (arg){ arg_type_shrug, NULL })
int main () {
    process_all_args( 0, GET_ARG(1), GET_ARG("\n2\n"), GET_ARG(3.0), (arg){ arg_type_shrug, NULL } );
    return 0;
}
/* implementation of arg processing! */
void process_all_args(int marker_argument, ...) {
    va_list all_args;
    va_start(all_args, marker_argument);
    while (1) {
        arg current_arg = va_arg(all_args, arg);
        if (current_arg.data == NULL
            && current_arg.type == arg_type_shrug) {
            // exit!
            break;
        }
        process_arg(current_arg);
    }
    va_end(all_args);
}
void process_arg(arg arg0) {
    switch (arg0.type) {
    case arg_type_int:
        {
            // points at a single integer
            int value = *(int*)arg0.data;
            printf("%d", value);
        }
        break;
    case arg_type_double:
        {
            // points at a single double
            double value = *(double*)arg0.data;
            printf("%f", value);
        }
        break;
    case arg_type_ptr_char:
        {
            // points at a character string
            char* value = *(char**)arg0.data;
            printf("%s", value);
        }
        break;
    /* and so on, and so forth... */
    default:
        {
            void* value = arg0.data;
            printf("(unknown type!) %p", value);
        }
        break;
    }
}