Every day thousands of C headers are included in C++ code. At least that's what our intuition may say. Are all of
them really C headers or is there something a bit more interesting happening? Let's see it through by doing the exact
opposite: by exposing C++ to C. Step by step.
Functionality is not important here, so let's start with an identity function in identity.cpp:
namespace foo {
int identity(const int x) { return x; }
}
Next, a naive C test program, filled in as far as we can so that we have problems to solve, main.c:
#include <stdio.h>
#include <stdlib.h>
int
main(int argc, char* argv[])
{
    const int x = argc > 1 ? atoi(argv[1]) : 0;
    printf("%d -> %d\n", x, foo::identity(x));
}
This, of course, will not work. Namespaces are a C++ feature and the function was never declared. If we ignore the
namespaces for now, we know the return type and the arguments of the function, so the only missing piece for a
declaration is the name. It's not unobtainable: once we compile the object file we can take a peek with readelf(1):
$ make identity.o
c++ -c -o identity.o identity.cpp
$ readelf -s identity.o | grep FUNC
3: 0000000000000000 12 FUNC GLOBAL DEFAULT 1 _ZN3foo4identityEi
C++ uses name mangling by default hence the unsightly name. Note that the name contains the namespace, which solves
the other problem before we even attempted to approach it. We can now modify the test program, main.c:
#include <stdio.h>
#include <stdlib.h>
int _ZN3foo4identityEi(int);
int
main(int argc, char* argv[])
{
    const int x = argc > 1 ? atoi(argv[1]) : 0;
    printf("%d -> %d\n", x, _ZN3foo4identityEi(x));
}
Tweak the Makefile, compile and link everything together, and there it goes:
$ make main
cc -c -o main.o main.c
cc main.o identity.o -lstdc++ -o main
$ ./main
0 -> 0
$ ./main 42
42 -> 42
Using mangled names directly like this is questionable and is done here just as an exercise. C++ provides a syntax to
manage this behaviour, so let's use that, identity.cpp:
namespace foo {
extern "C" int identity(const int x) { return x; }
}
Now the generated symbol name will be identity. extern "C" changes the language linkage of selected extern function
or extern variable declarations. This property, among other things, affects the name mangling behaviour. We may still
write normal C++ code within the function body as long as we can express the return type and arguments in a way
understandable to C.
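As a small illustration of that last point, here is a sketch of a function whose body leans on the C++ standard
library while its signature sticks to types C understands. The function median_of is hypothetical and not part of
the running example.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

namespace foo {

// Hypothetical example: C-compatible signature, C++ implementation.
// Returns the upper median of the input, or 0 for empty input or on failure.
extern "C" int median_of(const int* values, std::size_t count)
{
    if (values == nullptr || count == 0)
        return 0;
    try {
        std::vector<int> copy(values, values + count);
        auto middle = copy.begin() + static_cast<std::ptrdiff_t>(count / 2);
        std::nth_element(copy.begin(), middle, copy.end());
        return *middle;
    } catch (...) {
        return 0; // do not let a C++ exception propagate into a C caller
    }
}

} // namespace foo
```

A C caller would only need the declaration int median_of(const int* values, size_t count); and would never see the
vector or the algorithm behind it.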
The example code has a little bit of duplication: our identity function is declared twice, first in C++ along with
its definition and a second time in C. Of course, we'll use the orthodox solution and, finally, get into the headers.
We can take two different routes starting from either side. Since this is about C++ used in C, let's start with the
C++ side, identity.hpp:
#pragma once
namespace foo {
extern "C" int identity(int x);
}
Then adjust identity.cpp accordingly:
#include "identity.hpp"
namespace foo {
int identity(const int x) { return x; }
}
Note that only the header declares identity with C language linkage; the definition in identity.cpp inherits it from
that earlier declaration. Be aware that this approach mixed with namespaces may result in name conflicts, as the
unmangled name does not contain the namespace. Anyway, this header compiles on the C++ side, but C cannot consume it
just yet. Let's backtrack and create another header, this time from the C side, identity.h:
#pragma once
int identity(int x);
Insert the include into main.c:
#include <stdio.h>
#include <stdlib.h>
#include "identity.h"
int
main(int argc, char* argv[])
{
    const int x = argc > 1 ? atoi(argv[1]) : 0;
    printf("%d -> %d\n", x, identity(x));
}
Now, if we compile both, link and all, it will work, just like with mangled names. But not only did I fail to remove
the duplication, I made it much worse by introducing two new header files, identity.h and identity.hpp, which declare
the same function twice, much more visibly!
The solution is to pretend it's not a problem. We'll merge both headers into a polyglot that is well-formed for both
C and C++ and declares the exact same thing for both. Consider, identity.h:
#pragma once
#ifdef __cplusplus
namespace foo {
extern "C" {
#endif
int identity(int x);
#ifdef __cplusplus
} // extern "C"
} // namespace foo
#endif
There are ways to make it more or less bearable. I changed the extern "C" declaration to a block one to avoid EXPORT
macros or similar. This allowed me to have only two #ifdef blocks. Nonetheless, this particular header can be included
in both languages and it gets us equivalent results as intended. At this point identity.hpp can be removed and we are
left with one header.
Now we can answer the initial question. Are these all really C headers [that are being included in C++]?
Yes, but not all of them. Some are definitely pure C headers, included either inside an explicit extern "C" block
(e.g., lua.h) or through a dedicated C++ wrapper (e.g., cstdint). Others are polyglots that use clever (or not)
preprocessor operations to appear as C to a C compiler and as C++ to a C++ compiler.
The fact that our little example works is not surprising, because the interface is really simple. There are linking
and syntax problems waiting for us once we decide to use more of the C++ standard library or start distributing our
binaries. Most of these are details that are better considered on a case-by-case basis. However, I'd like to explore
one more use case: publishing C++ classes to C.
Consider a simple counter. Its implementation is not important, so assume consecutive calls to next() will return a
sequence of integers: 0, 1, 2, 3, and so on. Declared as below, counter.hpp:
#pragma once
namespace foo {
struct Counter
{
    Counter();
    int next();
private:
    int m_count;
};
} // namespace foo
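For readers who want to compile along, here is one minimal implementation that matches the described behaviour. The
member definitions are an assumption, since only the declaration is given; the declaration is repeated so that the
sketch stands on its own.

```cpp
namespace foo {

struct Counter
{
    Counter();
    int next();
private:
    int m_count;
};

// Assumed definitions; these would normally live in a separate .cpp file.
Counter::Counter() : m_count(0) {}

// Returns the current value and then advances: 0, 1, 2, 3, ...
int Counter::next() { return m_count++; }

} // namespace foo
```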
Including counter.hpp from C would quickly fail. C has no class syntax, so trying to make a polyglot like before will
not work here. Well, there are some tricks possible. Today, we will build a special kind of adapter, counter.h:
#pragma once
#ifdef __cplusplus
namespace foo {
extern "C" {
#endif
int counter_next(void); /* (void): an empty list would leave the parameters unspecified in C */
#ifdef __cplusplus
} // extern "C"
} // namespace foo
#endif
Not extravagant but it clearly states what functionality we want to provide. What about the state and identity of
objects? With this simple interface, we can use a singleton and put it in the module state, counter.cpp:
#include "counter.h"
#include "counter.hpp"
namespace foo {
static Counter g_counter;
int counter_next() { return g_counter.next(); }
} // namespace foo
If we need more than one instance, we can hold a container with all counters and hand out references to counters
instead. This is somewhat similar to how, e.g., file descriptors work. "Somewhat" because of a simplification.
Implementing it safely and reliably can be a good exercise. A naive approach to it could possibly suffice in a
prototype. I won't attempt either now.
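For the curious, a naive prototype of that idea could look like the sketch below. Everything in it is an assumption
made for illustration: the names counter_open and counter_close, the int handle type, the -1 error value and the
Counter implementation. It is not thread-safe, and handles are reused after closing, so a stale handle can silently
refer to a newer counter.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

namespace foo {

// Assumed implementation; the article only describes Counter's behaviour.
struct Counter
{
    Counter() : m_count(0) {}
    int next() { return m_count++; }
private:
    int m_count;
};

namespace {

// Slot i holds the counter for handle i; an empty slot means "closed".
std::vector<std::unique_ptr<Counter>> g_counters;

Counter* lookup(const int handle)
{
    if (handle < 0 || static_cast<std::size_t>(handle) >= g_counters.size())
        return nullptr;
    return g_counters[static_cast<std::size_t>(handle)].get();
}

} // namespace

extern "C" {

// Returns a new handle (lowest free slot first), or -1 on failure.
int counter_open(void)
{
    try {
        for (std::size_t i = 0; i < g_counters.size(); ++i) {
            if (!g_counters[i]) {
                g_counters[i].reset(new Counter);
                return static_cast<int>(i);
            }
        }
        g_counters.push_back(std::unique_ptr<Counter>(new Counter));
        return static_cast<int>(g_counters.size() - 1);
    } catch (...) {
        return -1; // never let an exception escape into C
    }
}

// Returns 0 on success, -1 if the handle is not open.
int counter_close(const int handle)
{
    if (lookup(handle) == nullptr)
        return -1;
    g_counters[static_cast<std::size_t>(handle)].reset();
    return 0;
}

// Returns the next value, or -1 if the handle is not open.
int counter_next(const int handle)
{
    Counter* const counter = lookup(handle);
    return counter ? counter->next() : -1;
}

} // extern "C"
} // namespace foo
```

The C side would only ever see plain int handles, much like file descriptors returned by open(2).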
What other options do we have? C has pointers and structs. Of course, we could change our class so that it wouldn't
have methods and then make sure it's trivial and has standard layout as defined in C++. That wouldn't be fun. Instead, I
will go the incomplete type route,
counter.cpp:
#include "counter.h"
#include "counter.hpp"
namespace foo {
struct Counter* counter_new() { return new Counter; }
void counter_free(struct Counter* const counter) { delete counter; }
int counter_next(struct Counter* const counter) { return counter->next(); }
} // namespace foo
Nothing unusual so far. Now, back to the C adapter header, counter.h:
#pragma once
#ifdef __cplusplus
namespace foo {
extern "C" {
#endif
struct Counter;
struct Counter* counter_new(void);
void counter_free(struct Counter* counter);
int counter_next(struct Counter* counter);
#ifdef __cplusplus
} // extern "C"
} // namespace foo
#endif
Of course, we have to declare and expose the lifetime management functions for the user. We declare Counter without
content. Still not very unusual. In C++ this kind of declaration is usually called a forward declaration. The unusual
bit is that we use it on the C side without ever declaring its content. Both languages allow pointers to such an
incomplete type to be declared and passed around, as long as nothing dereferences them or needs the size of the type.
In the end, when the type is actually used, in counter.cpp, it's well defined because the C++ header is included.
Note that extern "C" doesn't affect the struct declaration, but it's there for convenience because on the C++ side it
must remain within the namespace.
Finally, from the user's perspective the flow is rather orthodox (new → use → free) and now they may have multiple
counters, main.c:
#include <stdio.h>
#include <stdlib.h>
#include "counter.h"
int main(int argc, char* argv[])
{
    const int x = argc > 1 ? atoi(argv[1]) : 0;
    const int y = argc > 2 ? atoi(argv[2]) : 0;
    int i, j;
    struct Counter* a = counter_new();
    while ((i = counter_next(a)) < x) {
        struct Counter* b = counter_new();
        while ((j = counter_next(b)) < y)
            printf("i=%d\tj=%d\n", i, j);
        counter_free(b);
    }
    counter_free(a); /* the outer counter needs releasing too */
}
This method can also be used in pure C if we want to hide some implementation details and force the user to rely on
accessor functions. All in all, it's a handy tool when dealing with classic object-oriented programming in C. The
polyglot interpretation of headers is a fun thought but, if you think about it long enough, quite obvious. From here,
I'd like to ponder more about headers as interface declarations and their various uses and misuses. I'll leave that
for another time.