Playing Around With Simple Interfaces

This one may come as a bit of a surprise, and to an experienced (or even intermediate) reader it may appear boring. Think of it as a "back to the fundamentals" type of article. Now that I think about it... Do I even have anything that wouldn't fall into this category? Oh, yeah, rants.

I had an opportunity to give lectures at the local university about programming and one of the topics was loosely related to persistence. We were playing around with a hardware simulator and I asked a question that we could use as a plot device:

What is the simplest way to put several files into an arbitrary memory?

Now, this probably wouldn't be a blog post if it weren't for the fact that not only the students were surprised by the answer, but also my colleagues at work.

How to Store Something in Memory?

We were using a simulator that I prepared for the lectures. It has a rather modern and straightforward interface for writing data into the memory:

bool hwd::memory::write(std::vector<char> data, std::size_t offset);

In practice you will probably use something closer to:

ssize_t write(int fd, const void * buf, size_t count);
ssize_t pwrite(int fd, const void * buf, size_t count, off_t offset);

A regular write can be offset by first calling lseek on the file descriptor.

Whatever your case is, it's almost a given that there will be at least some similarities. I'll continue to use my simulator as the example: it's simple, you can build it yourself, and it should translate easily.

Let's use it! We need some data and then we just push it to the memory:

std::vector<char> data {'H', 'e', 'l', 'p', 0};
bool ok = hwd::memory::write(data, 0);

Nice, we wrote our call for help to the memory with a terminating null byte. How does one receive it?

How to Read Something From Memory?

There are similar functions provided:

std::vector<char> hwd::memory::read(std::size_t length, std::size_t offset);
ssize_t read(int fd, void * buf, size_t count);
ssize_t pread(int fd, void * buf, size_t count, off_t offset);

For any questions, refer to your friendly manual page. If you are not familiar with this convention: in read and pread the content read from the memory is written to the buf that was passed as an argument, and the function result is just the count of bytes that were read.

With these functions, we can read our message back:

auto data = hwd::memory::read(5, 0);

But that's only because we know the length of the data beforehand. And the truth is: a program needs to know something before it starts reading and then using the data. That something is usually a standard: a memory layout or a format. In this case, we could assume that the program needs to store only one null-terminated string. This kind of knowledge would be enough for the program to write and read arbitrary strings as long as they are null-terminated.
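Reading a null-terminated string without knowing its length could be sketched like this. The function takes any callable with the same shape as hwd::memory::read, so it doesn't depend on the simulator itself. The names read_fn and read_c_string and the chunk size of 16 are my assumptions for the sake of the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Same shape as hwd::memory::read: (length, offset) -> bytes.
using read_fn = std::function<std::vector<char>(std::size_t, std::size_t)>;

// Hypothetical helper: reads chunk by chunk from offset 0 until a null byte
// shows up or `limit` bytes were scanned. The terminator is not included.
std::string read_c_string(const read_fn & read, std::size_t limit, std::size_t chunk = 16)
{
	std::string result;
	if (chunk == 0)
		return result;
	for (std::size_t offset = 0; offset < limit; offset += chunk) {
		const std::size_t want = std::min(chunk, limit - offset);
		const auto part = read(want, offset);
		const auto end = std::find(part.begin(), part.end(), '\0');
		result.append(part.begin(), end);
		if (end != part.end() || part.size() < want)
			break; // found the terminator, or the memory gave us less than asked
	}
	return result;
}
```

Note the cost of this format: the reader has to scan the data to find out where it ends, and the data itself can never contain a null byte.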

Since we are using strings as an example, there is also another implementation of them: a length coupled with an array. This kind of format allows the string to contain null bytes, which is convenient when storing slightly more arbitrary data.

Assume that you know the size of the memory (in the case of this simulator it's 10KB). Figure out the number of bytes you need to store the maximum length of the data (2 bytes here, since one byte only counts up to 255), and then just say that there is an integer in the first X bytes of that memory. When writing, put the length of the data into these bytes. When reading, read the length from them and then read that many bytes from the memory that follows.

More or less it looks something like this:

if (data.size() <= 9998) {
	prepend_uint16_to_vector(data, static_cast<std::uint16_t>(data.size()));
	hwd::memory::write(data, 0);
}

Where prepend_uint16_to_vector does exactly what you would expect from it in a well-defined way (e.g., the byte order is the same on every platform). Then, to read, reverse the process:

auto data = hwd::memory::read(2, 0);
auto length = read_uint16_from_vector(data);
data = hwd::memory::read(length, 2);

In this case read_uint16_from_vector also does what you would expect from it. Note that this sample does two reads in total: the first to get the length, the second to get the data of the given length.
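The simulator doesn't ship these two helpers, so here is one way they might look. This is a sketch under my own assumptions: the length is stored big-endian (most significant byte first), and reading from a vector that is too short throws.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Puts `value` in front of `data` as two bytes, most significant byte first.
// The byte order is fixed here, so it doesn't depend on the platform.
void prepend_uint16_to_vector(std::vector<char> & data, std::uint16_t value)
{
	const char header[2] = {
		static_cast<char>(static_cast<unsigned char>(value >> 8)),
		static_cast<char>(static_cast<unsigned char>(value & 0xFF)),
	};
	data.insert(data.begin(), header, header + 2);
}

// Decodes the first two bytes of `data` as a big-endian 16-bit integer.
std::uint16_t read_uint16_from_vector(const std::vector<char> & data)
{
	if (data.size() < 2)
		throw std::invalid_argument("need at least two bytes");
	// Go through unsigned char so that bytes >= 0x80 are not sign-extended.
	const auto high = static_cast<unsigned char>(data[0]);
	const auto low = static_cast<unsigned char>(data[1]);
	return static_cast<std::uint16_t>((high << 8) | low);
}
```

Which byte order you pick doesn't matter much. What matters is that the writer and the reader agree on it, and that it's written down somewhere.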

How to Do the Same With Multiple Files

Now you might start wondering how to apply this knowledge to store multiple independent files. The thing is, you are already there. Well, almost. First, let the writing program initialize the data from standard input:

std::vector<char> data {std::istreambuf_iterator<char>{std::cin}, {}};

Of course, there are better ways to do it, but for the sake of the length of this post, let's not discuss them. In case this looks like Elvish to you, trust me: this reads standard input into data. After you have this, the process remains the same: get the length, put it in the first two bytes of the memory, and then put the data itself.

You have the input stream saved in memory. Now read it back using our method and write it to the standard output stream:

for (const auto & byte : data)
	std::cout << byte;

Or use any other method, as long as it doesn't add any clutter to the output.

By now you either got it or you started wondering where this is going. We are not storing any files in the memory! I'm lying! That's a fair accusation. We are not doing it, but we created an interface that allows us to do it. Think about it:

$ tar cf - directory/ | ./our_memory_write
$ ./our_memory_read | tar tf -
directory/file_a
directory/file_b

Ha! Assuming you have enough space, you can even create an image, format it, and then write it to memory:

$ fallocate -l 9998 filesystem.img
$ mkfs.bfs filesystem.img
$ ./our_memory_write < filesystem.img

Quite satisfying, but that's not the point of this post.

Design Simple Interfaces

Now, technically speaking, this whole example can be limiting in a couple of aspects. The primary thing you may not like about it is that it focuses on shell usage, and for some that's an outdated approach. One way to solve it would be to follow the Hurd way and implement the interface as a filesystem in userspace.

However, technicalities weren't meant to be the primary topic; it just so happened because they were fun to write about. What I want you to take from this example is that designing interfaces, following conventions or standards, and choosing points and levels of contact are all important. You may think that it doesn't matter when you build just a single program, but in such cases it matters even more: if the program ends up being used long term, sooner or later it will be integrated with other programs.

This isn't about UNIX philosophy. This isn't about standardization of the entire world. No. It's all about you, your creations, and the things that you integrate with. It sounds all hippie-dippie-unicorns, but that's just how it is. Think about what you are building. Think about what you will need and what you will integrate with. Find common generic interfaces that will last and build upon them. Be conscious about it. This method of saving files into memory works only because it builds on a well-established and incredibly simple convention. Look at the all-popular modern Web APIs: they are built on top of HTTP so well that it's hard to believe that there are any other protocols out there on the net.