Deconstructing Web Browsers

Welcome to one of my little experiments! The theme is simple: to deconstruct a web browser and create several utilities with distinct and clear responsibilities in its stead. This is not your regular blog post. Honestly, I'm not yet sure what it is, but I'll figure it out at some point. Expect this page to be updated and extended. (Well, just like my regular posts, for some reason, I need to rethink it one more time...)

Motivation and History

The idea started to sprout in my mind a few years ago. In its early stages it wasn't really directed at web browsers; instead, it focused on markdown. After some more thought, its target changed to web browsers. Along the way it also started to draw from the Unix philosophy and my general aversion towards IDEs, or rather, this kind of modularity became visible as the real core of the motivation that drives the idea. I haven't really touched or explored it yet. I didn't try to discredit it either. Hopefully, once I reach that point, it will stand its ground.

Last year, I explored this idea a bit in a two-part text series and within a small project called browse. I naively split the responsibilities between programs and had some fun writing very simple scripts that did the work. And they did the work surprisingly well, but the functionality constraints had to be extremely strict. Recently, I came back to it, read my own stuff, looked at my own code, and I could still relate to it. Instead of removing everything like I sometimes do, I decided to develop a new utility and write this new summary and project status.

Experimenting From a Terminal

Rather than jumping into design or development work straight away, let's see how far we can get using only the shell and the usual utilities you can find in every shed. To access a webpage, one could potentially eat it raw:

$ curl -sL https://ignore.pl/ | less
...

Now, that's raw! With a page like this one, it's possible. I write them by hand and comply with my own rules that make it possible for the reader to consume them as plain text. However, this approach is not very useful in general, considering how astoundingly obfuscated modern HTML pages can get.

It's not only extremely complex HTML hierarchies that we need to deal with. Another great opponent is web applications that pretend to be webpages. Separating the two will prove useful. Not only that, it will also open up new possibilities. Consider a dead simple script that acts similarly to a regular opener:

#!/bin/sh
# Fetch the address into a temporary file, then pick a viewer by content type.
TMP=$(mktemp -p /dev/shm) &&
	{ TYPE=$(curl -sLw "%{content_type}\n" "$@" -o "$TMP") &&
		case "$TYPE" in
			application/pdf) zathura "$TMP";;
			image/*) sxiv "$TMP";;
			text/html*) html_viewer "$TMP";;
			text/markdown*) markdown_viewer "$TMP";;
			text/*) less "$TMP";;
			*) echo "$TMP";;
		esac }
rm -f "$TMP"

You use it like this:

$ ./script https://ignore.pl/

It shows the requested content using a program that's selected based on its MIME type. Here, the difference between a webpage and a web application is blurred. Hypothetically, using the MIME type or some other means, we could add switch cases like these:

web-application/html+js) fork_of_chromium_or_something "$TMP";;
web-application/lua) lua_gui_sandbox "$TMP";;

The ability to support multiple competing frameworks for running seamlessly loaded, sandboxed applications (so, web applications) is what really interests me.

That's not the only thing though. As you can see, in this example markdown and HTML are completely separated. Markdown is no longer a format that's supposed to generate HTML; instead, it becomes a stand-alone hypertext format. Because the content requests are meant to run through such a demultiplexer, hyperlinks can lead from one hypertext format to another. This allows new formats and new ways of expression to grow and compete, hopefully breathing some life into an ecosystem that's currently driven by monolithic giants.
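To make the demultiplexing step a bit more tangible, here is a minimal sketch that separates choosing a viewer from running it. The function name pick_viewer is my own invention for this illustration; the viewer names are simply the ones from the script above. It assumes that any parameters in the content type (such as a charset) can be ignored when picking a viewer.

```shell
#!/bin/sh
# Hypothetical helper: print the name of the viewer for a content type.
# Viewer names are taken from the opener script earlier on this page.
pick_viewer() {
	# Drop parameters such as "; charset=utf-8" before matching.
	ctype=${1%%;*}
	case "$ctype" in
		application/pdf) echo zathura;;
		image/*) echo sxiv;;
		text/html) echo html_viewer;;
		text/markdown) echo markdown_viewer;;
		text/*) echo less;;
		*) return 1;;	# no viewer known for this type
	esac
}
```

With this in place, pick_viewer "text/markdown; charset=utf-8" prints markdown_viewer, and an unknown type makes the function fail so the caller can decide what to do with it.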

Browser That’s Part of Your Environment

Of course, a single script like the example above is not the way to go, but it's a good start as it gives insight into data flow and responsibilities. At first, just by looking at it, I decided to naively distinguish four components:

navigator: Takes the address of the request from the user and forwards it to a protocol daemon. The retrieved content is then pushed to the opener.
protocol daemon: Acquires and caches data using a single protocol, e.g., HTTP.
opener: Chooses viewers based on content type.
viewer: Presents content to the user and allows them to interact with it.
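The data flow between these four roles could be sketched roughly as below. This is not the actual implementation from the browse project; every function name here is a hypothetical stand-in for what would be a separate program, html_viewer is the same placeholder as in the earlier script, and caching is left out entirely.

```shell
#!/bin/sh
# All function names are hypothetical stand-ins for separate programs.

# protocol daemon: acquires data using a single protocol (HTTP here).
# Writes the body to the file named by $2 and prints the content type.
http_daemon() {
	curl -sLw '%{content_type}\n' -o "$2" "$1"
}

# opener: chooses a viewer based on content type.
opener() {
	case "$1" in
		text/html*) echo html_viewer;;
		text/*) echo less;;
		*) return 1;;
	esac
}

# navigator: picks a protocol daemon by scheme, then pushes the
# retrieved content to whatever viewer the opener selects.
navigator() {
	case "$1" in
		http://*|https://*) daemon=http_daemon;;
		*) echo "no protocol daemon for: $1" >&2; return 1;;
	esac
	tmp=$(mktemp) || return 1
	ctype=$("$daemon" "$1" "$tmp") && viewer=$(opener "$ctype") && "$viewer" "$tmp"
	status=$?
	rm -f "$tmp"
	return "$status"
}
```

Calling navigator https://ignore.pl/ would then walk through all four roles in order. Adding another protocol would mean adding another daemon and another scheme pattern in navigator, without touching opener or the viewers.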

I found it to be a decent starting point and played around with it, getting encouraging results. All the predicted obstacles made their appearances, and thanks to working prototypes the shortcomings of each role were shown. In the second iteration I wanted to divide navigator into several stand-alone parts, but in the end I never committed to it.

Based on the description above, it doesn't seem as if navigator would require such a division. Actually, the current implementation doesn't clearly show such a need either. The only hints are the -f option in navigator and opener, and direct calls to the protocol daemon by viewers to retrieve auxiliary content (e.g., a stylesheet or an embedded image). This means navigator is hiding a plumbing-capable request resolver below the porcelain interface that's dedicated to the user.

More than that, navigator may also be hiding functionality meant to support browsing history, which I haven't explored at all yet. Combining it with graphical interfaces, session management, or tabs is still a question mark.

Obviously, the responsibilities of the components are not the only matter to think about. Interfaces in every form are also important. I'm talking here about communication between the components of the browser, interchangeability, communication between the browser and the rest of the environment it runs in, and integration with graphical user interfaces and window managers.

For now, I plan to split navigator and look into an equivalent of an address bar.