Publishing C++ Interfaces to C

Every day thousands of C headers are included in C++ code. At least that's what our intuition may say. Are all of them really C headers, or is there something a bit more interesting happening? Let's find out by doing the exact opposite: exposing C++ to C, step by step.

Functionality is not important here, so let's start with an identity function in identity.cpp:

namespace foo {
int identity(const int x) { return x; }
}

Now, a naive C test program with as much filled in as we can manage, so that we have problems to solve, main.c:

#include <stdio.h>
#include <stdlib.h>

int
main(int argc, char* argv[])
{
	const int x = argc > 1 ? atoi(argv[1]) : 0;
	printf("%d -> %d\n", x, foo::identity(x));
}

This, of course, will not work. Namespaces are a C++ feature and the function was never declared. If we ignore namespaces for now, we know the return type and the arguments of the function, so the only missing piece of the declaration is a name. It's not unobtainable: once we compile the object file we can take a peek with readelf(1):

$ make identity.o
c++ -c -o identity.o identity.cpp
$ readelf -s identity.o | grep FUNC
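     3: 0000000000000000    12 FUNC    GLOBAL DEFAULT    1 _ZN3foo8identityEi

C++ mangles names by default, hence the unsightly symbol. With the Itanium C++ ABI used by GCC and Clang on Linux, each name component is prefixed with its length (3foo, 8identity) and the trailing i stands for the int parameter. Note that the name contains the namespace, which solves the other problem before we even attempted to approach it. We can now modify the test program, main.c:

#include <stdio.h>
#include <stdlib.h>

/* Mangled name of foo::identity(int), copied from the readelf output. */
int _ZN3foo8identityEi(int);


int
main(int argc, char* argv[])
{
	const int x = argc > 1 ? atoi(argv[1]) : 0;
	printf("%d -> %d\n", x, _ZN3foo8identityEi(x));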
}

Tweak the Makefile, compile and link everything together, and there it goes:

$ make main
cc -c -o main.o main.c
cc main.o identity.o -lstdc++ -o main
$ ./main
0 -> 0
$ ./main 42
42 -> 42
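
To double-check what a mangled symbol stands for, GCC and Clang expose a demangler through <cxxabi.h>; c++filt(1) does the same job on the command line. Below is a small sketch on the C++ side, assuming the Itanium C++ ABI. The helper name demangle is hypothetical and not part of the example project.

```cpp
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Hypothetical helper: turns a mangled symbol into readable C++.
// Returns the input unchanged when the demangler rejects it.
std::string demangle(const char* const mangled)
{
	int status = 0;
	// With a null output buffer the result is allocated with malloc.
	char* const text = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
	if (status != 0 || text == nullptr)
		return mangled;
	const std::string result(text);
	std::free(text);
	return result;
}
```

Calling demangle("_ZN3foo8identityEi") should give back foo::identity(int), which shows that the namespace and the parameter type are both encoded in the symbol.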

Using mangled names directly like this is questionable, since the mangling scheme depends on the compiler's ABI, and is done here just as an exercise. C++ provides syntax to control this behaviour, so let's use that in identity.cpp:

namespace foo {
extern "C" int identity(const int x) { return x; }
}

Now the generated symbol name will be identity. extern "C" changes the language linkage of the function and variable declarations it is applied to. This property affects, among other things, name mangling. We may still write normal C++ code within the function body as long as we can express the return type and arguments in a way understandable to C.
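
To illustrate that last point, here is a sketch of a function whose signature sticks to types C understands while the body uses the C++ standard library. The function median is hypothetical and exists only for this illustration; note that an allocation failure would throw, which a real C-facing interface would have to catch before returning.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

namespace foo {

// Hypothetical example: a C-compatible signature (pointer, size, int)
// with a body that is free to use C++ containers and algorithms.
// Returns the upper median, or 0 for empty input.
extern "C" int median(const int* const values, const std::size_t count)
{
	if (values == nullptr || count == 0)
		return 0;
	std::vector<int> sorted(values, values + count);
	std::sort(sorted.begin(), sorted.end());
	return sorted[count / 2];
}

}  // namespace foo
```

From C this would be declared as int median(const int* values, size_t count); and called like any other C function.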

The example has a bit of duplication: our identity function is declared twice, first in C++ along with its definition and a second time in C. Of course, we'll use the orthodox solution and, finally, get into headers. We can take two different routes, starting from either side. Since this is about C++ used in C, let's start with the C++ side, identity.hpp:

#pragma once

namespace foo {
extern "C" int identity(int x);
}

Then adjust identity.cpp accordingly:

#include "identity.hpp"

namespace foo {
int identity(const int x) { return x; }
}

Note that only the header declares identity with C language linkage; the definition in identity.cpp inherits it from that earlier declaration. Be aware that this approach mixed with namespaces may result in name conflicts, as the unmangled name does not contain the namespace. Anyway, this header compiles on the C++ side, but C cannot consume it just yet. Let's backtrack and create another header, this time from the C side, identity.h:

#pragma once

int identity(int x);

Include it in main.c:

#include <stdio.h>
#include <stdlib.h>

#include "identity.h"


int
main(int argc, char* argv[])
{
	const int x = argc > 1 ? atoi(argv[1]) : 0;
	printf("%d -> %d\n", x, identity(x));
}

Now, if we compile both, link and all, it will work, just like with mangled names. But not only did I fail to remove the duplication, I made it much worse by introducing two new header files (identity.h and identity.hpp) that declare the same function twice, much more visibly!

The solution is to pretend it's not a problem. We'll merge both headers into a polyglot that is well-formed in both C and C++ and declares the exact same thing for both. Consider identity.h:

#pragma once

#ifdef __cplusplus
namespace foo {
extern "C" {
#endif


int identity(int x);

#ifdef __cplusplus
}  // extern "C"
}  // namespace foo
#endif

There are ways to make it more or less bearable. I changed the extern "C" declaration to its block form to avoid EXPORT macros or similar, which let me get away with only two #ifdef blocks. Nonetheless, this particular header can be included from both languages and gives us equivalent results, as intended. At this point identity.hpp can be removed and we are left with one header.
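
For comparison, the macro-based style might look roughly like the sketch below. FOO_BEGIN_DECLS and FOO_END_DECLS are hypothetical names made up for this illustration, and the definition at the bottom is there only so that the sketch links on its own; normally it would stay in identity.cpp.

```cpp
// Polyglot declarations through macros: a C++ compiler sees the
// namespace and the extern "C" block, a C compiler sees nothing.
#ifdef __cplusplus
#define FOO_BEGIN_DECLS namespace foo { extern "C" {
#define FOO_END_DECLS } }
#else
#define FOO_BEGIN_DECLS
#define FOO_END_DECLS
#endif

FOO_BEGIN_DECLS
int identity(int x);
FOO_END_DECLS

// Definition, normally placed in identity.cpp.
#ifdef __cplusplus
namespace foo {
int identity(const int x) { return x; }
}
#endif
```

The declarations themselves get tidier, at the cost of hiding the braces inside macros, which is a matter of taste.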

Now we can answer the initial question. Are these all really C headers [that are being included in C++]?

Yes, but not all of them. Some are definitely pure C headers whose include is wrapped in an explicit extern "C" (e.g., lua.h) or that are reached through a dedicated C++ wrapper (e.g., cstdint). Others are polyglots that use clever (or not) preprocessor operations to appear as C to a C compiler and as C++ to a C++ compiler.

The fact that our little example works is not surprising, because the interface is really simple. There are linking and syntax problems waiting for us when we decide to use more of the C++ standard library or when we start distributing our binaries. Most of these are details that are better considered on a case-by-case basis. However, I'd like to explore one more use case: publishing C++ classes to C.

Consider a simple counter. Its implementation is not important so assume consecutive calls to next() will return a sequence of integers: 0, 1, 2, 3, and so on. Declared as below, counter.hpp:

#pragma once

namespace foo {
struct Counter
{
	Counter();
	int next();
private:
	int m_count;
};
}  // namespace foo

Including this header from C would quickly fail. C has no dedicated class syntax, and trying to make a polyglot like before will not work here. Well, there are some tricks possible. Today, we will build a special kind of adapter, counter.h:

#pragma once

#ifdef __cplusplus
namespace foo {
extern "C" {
#endif

int counter_next(void);

#ifdef __cplusplus
}  // extern "C"
}  // namespace foo
#endif

Not extravagant but it clearly states what functionality we want to provide. What about the state and identity of objects? With this simple interface, we can use a singleton and put it in the module state, counter.cpp:

#include "counter.h"
#include "counter.hpp"


namespace foo {

static Counter g_counter;

int counter_next() { return g_counter.next(); }

}  // namespace foo

If we need more than one instance, we can hold a container with all counters and hand out handles to them instead. This is somewhat similar to how, e.g., file descriptors work; "somewhat" because of a simplification. Implementing it safely and reliably can be a good exercise. A naive approach could possibly suffice in a prototype. I won't attempt either now.
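
Purely to make the idea concrete, a rough sketch of the naive variant follows. It is not the safe and reliable implementation: it is not thread-safe, a stale handle can silently end up pointing at a newer counter, and exceptions from allocation are not caught. The names counter_open and counter_close are hypothetical, and Counter is a stand-in for the class from counter.hpp.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

namespace foo {

// Stand-in for the Counter from counter.hpp, so the sketch is self-contained.
struct Counter
{
	int next() { return m_count++; }
private:
	int m_count = 0;
};

// Module state: the index of a slot is the handle; a free slot holds nullptr.
static std::vector<std::unique_ptr<Counter>> g_counters;

// Returns the counter behind a handle, or nullptr if unknown or closed.
static Counter* lookup(const int handle)
{
	if (handle < 0 || static_cast<std::size_t>(handle) >= g_counters.size())
		return nullptr;
	return g_counters[static_cast<std::size_t>(handle)].get();
}

extern "C" {

// Hands out the lowest free handle, much like open(2) does.
int counter_open()
{
	for (std::size_t i = 0; i < g_counters.size(); ++i) {
		if (!g_counters[i]) {
			g_counters[i].reset(new Counter);
			return static_cast<int>(i);
		}
	}
	g_counters.push_back(std::unique_ptr<Counter>(new Counter));
	return static_cast<int>(g_counters.size() - 1);
}

// Returns the next value, or -1 for an invalid handle.
int counter_next(const int handle)
{
	Counter* const counter = lookup(handle);
	return counter ? counter->next() : -1;
}

// Releases the counter; its slot becomes available for reuse.
void counter_close(const int handle)
{
	if (lookup(handle))
		g_counters[static_cast<std::size_t>(handle)].reset();
}

}  // extern "C"
}  // namespace foo
```

The C side only ever sees plain int handles, so no C++ type leaks through the interface at all.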

What other options do we have? C has pointers and structs. Of course, we could change our class so that it has no methods and then make sure it's trivial and has standard layout as defined in C++. That wouldn't be fun. Instead, I will go the incomplete type route, counter.cpp:

#include "counter.h"
#include "counter.hpp"


namespace foo {

struct Counter* counter_new() { return new Counter; }
void counter_free(struct Counter* const counter) { delete counter; }
int counter_next(struct Counter* const counter) { return counter->next(); }


}  // namespace foo

Nothing unusual so far. Now, back to the C adapter header, counter.h:

#pragma once

#ifdef __cplusplus
namespace foo {
extern "C" {
#endif

struct Counter;


struct Counter* counter_new(void);
void counter_free(struct Counter* counter);
int counter_next(struct Counter* counter);

#ifdef __cplusplus
}  // extern "C"
}  // namespace foo
#endif

Of course, we have to declare and expose the lifetime management functions to the user. We declare Counter without content. Still not very unusual: in C++ this kind of declaration is usually called a forward declaration. The unusual bit is that we use it on the C side without ever declaring its content. Both languages allow pointers to such incomplete types, and this is one of those cases. In the end, when the type is actually used, in counter.cpp, it's well defined because the C++ header is included.

Note that extern "C" doesn't affect the struct declaration; it sits inside the block for convenience, because on the C++ side it must remain within the namespace.

Finally, from the user's perspective the flow is rather orthodox (new → use → free) and now they may have multiple counters, main.c:

#include <stdio.h>
#include <stdlib.h>

#include "counter.h"

int main(int argc, char* argv[])
{
	const int x = argc > 1 ? atoi(argv[1]) : 0;
	const int y = argc > 2 ? atoi(argv[2]) : 0;
	int i, j;
	struct Counter* a = counter_new();
	while ((i = counter_next(a)) < x) {
		struct Counter* b = counter_new();
		while ((j = counter_next(b)) < y)
			printf("i=%d\tj=%d\n", i, j);
		counter_free(b);
	}
	counter_free(a);
}

This method can also be used in pure C if we want to hide some implementation details and force the user to rely on accessor functions. All in all, it's a handy tool when dealing with classic object-oriented programming in C. Polyglot interpretation of headers is a fun thought but, if you think about it long enough, quite obvious. From here, I'd like to ponder more about headers as interface declarations and their various uses and misuses. I'll leave that for another time.