Hunt for Lex and Yacc, the Dinosaur

Everything is text, we were told. Imagine planning to take on a dinosaur armed with nothing but that assumption. Indeed, this is one of the greatest jokes in the history of programming. The Lex manual page said there is an asteroid that will kill it for us. There is none; the dinosaur is something we created ourselves, and it is what we are made of.

Human-computer interaction is quite limited. It doesn't help that we express ourselves in very inefficient and quite bizarre ways. Language is the basis of much of our communication, and using it, in the form of text, to communicate with computers was undeniably a sound choice, perhaps even a natural one. Designing languages solely to express what a computer is supposed to do was the expected next step. The creation of tools like Lex and Yacc was equally predictable, and there is obviously nothing wrong with that.

With an introduction like this, it feels like I have nothing left to write. At this level of abstraction, yeah, it's not like we can easily change ourselves as human beings. However, if we change the level, we have something to discuss. That thing, or rather those things, are the form our written language takes, the intermediate abstractions we use, and the tools or interfaces that sit between us and the machine.

I'll focus here on a tiny, selected part of the problem. If you are interested in the whole thing, check out, e.g., Bret Victor's talks, especially The Humane Representation of Thought.

Now then, to me it looks like source code is predominantly structured like a book. It has a table of contents, maybe an index, and the content itself. Sometimes there might be some annotations or references to other books. In general, the content is one sequence. It could be divided into chapters or paragraphs, but it's still one book. Source code behaves the same way: it's one thing presented to the system that processes it. In some cases, the source is structured using files as a unit, but the strictness of this approach varies, and the goal of such structuring is to help the programmer understand it. In the end, the computer usually receives the source as a whole anyway (like: "here, the program consists of these files; parse them, compile, link, whatever").

The abstract structure of the program is pretty much always explained to the computer using the features present in the language. It sounds like the obvious thing to do. The thing is, we have more ways of expressing complex structures or hierarchies to a computer than plain text.

Files and databases. More generally, a dedicated thing that expresses only the structure of the program. Yes, even a text file, if you must. It doesn't matter. What's important is to have a dedicated way of structuring the program (not just the source), readable by both the programmer and the machine, that is closer to a graph resembling the abstract semantic graph. The closer we get to this graph representation, the easier it gets to maintain the source and understand the program it describes. Renaming, reordering parameters, splitting classes into smaller pieces, moving functions from one entity to another, extracting an entire feature into an external module: all of these, and possibly more, become either trivial or non-existent tasks.
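One way to read the claim in the previous paragraph: if references point at nodes of a graph rather than at names repeated throughout the text, a rename touches exactly one node. The Python sketch below illustrates that idea with a toy, hypothetical `ProgramGraph` (not an existing tool or format). The node IDs such as `"f1"` are arbitrary, and text is treated only as a rendered view of the graph.

```python
from dataclasses import dataclass, field


@dataclass
class Function:
    """A node in a hypothetical program graph; `name` is only a label."""
    name: str
    params: list[str]                               # parameter names, in order
    calls: list[str] = field(default_factory=list)  # IDs of callee nodes


class ProgramGraph:
    def __init__(self) -> None:
        self.nodes: dict[str, Function] = {}

    def add(self, node_id: str, fn: Function) -> None:
        self.nodes[node_id] = fn

    def rename(self, node_id: str, new_name: str) -> None:
        # Callers store the callee's ID, not its name, so only one node changes.
        self.nodes[node_id].name = new_name

    def render(self) -> str:
        """Produce a plain-text view of the graph for a human reader."""
        lines = []
        for fn in self.nodes.values():
            callees = ", ".join(self.nodes[c].name for c in fn.calls)
            params = ", ".join(fn.params)
            lines.append(f"def {fn.name}({params})  # calls: {callees or '-'}")
        return "\n".join(lines)


g = ProgramGraph()
g.add("f1", Function("parse", ["text"]))
g.add("f2", Function("main", [], calls=["f1"]))
g.rename("f1", "parse_source")
print(g.render())
```

Note that `main` never had to be edited: its call edge stores the ID `"f1"`, and the new name only shows up when the graph is rendered back into text.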

Of course, some of the mentioned methods are less capable than others. Filesystems aren't really a tool for building graphs, and text files are terrible at referencing. Additionally, a language that wants to be structured like this should provide tools for developers. These tools should be simple and specialized, with interfaces that let them be easily integrated into larger, more comprehensive toolsets. One example of something similar is Smalltalk and its environment.

You may ask now: "Isn't that what IDEs are?" Similar, but not quite the same. Modern IDEs are the essence of the law of the instrument: "I'm a text editor and the source is all text, therefore everything I do is change the text!" Some of them just got better at it. The other problem is that they are external to the environment of the language: an intruder that has to dig up information on its own. Look at LLVM and the amazing things that came out of it exposing ever smaller pieces to developers.

The goal is to extract part of the programming language into something new. The key is to find a balance between the representation, readability, ease of integration, and the tooling. Modern programming languages try to accomplish that through verbose text editing, which sooner or later might become a dead end. Exposing the representations that compilers and interpreters use internally to external tools and to the user, either as data or through small specialized tools, may help us avoid that fate. Allowing the user to interact meaningfully with more abstract representations will prove beneficial.
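Python already exposes one such internal representation: its standard `ast` module hands the interpreter's abstract syntax tree to user code as data. As a rough sketch, the hypothetical `RenameFunction` transformer below renames a function by editing tree nodes instead of text, then turns the tree back into source with `ast.unparse` (available since Python 3.9). It is deliberately naive and matches by name only, so treat it as an illustration of working with the structure, not as a safe refactoring tool.

```python
import ast


class RenameFunction(ast.NodeTransformer):
    """Rename a function definition and every plain-name reference to it.

    Naive on purpose: it matches by name only and ignores scoping,
    shadowing, `async def`, and attribute access such as `obj.old_name`.
    """

    def __init__(self, old: str, new: str) -> None:
        self.old, self.new = old, new

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.FunctionDef:
        if node.name == self.old:
            node.name = self.new
        self.generic_visit(node)  # keep walking into the function body
        return node

    def visit_Name(self, node: ast.Name) -> ast.Name:
        if node.id == self.old:
            node.id = self.new
        return node


source = """
def area(w, h):
    return w * h

print(area(2, 3))
"""

tree = RenameFunction("area", "rect_area").visit(ast.parse(source))
out = ast.unparse(tree)
print(out)
```

A real refactoring tool would also need scope information to avoid touching unrelated bindings that happen to share the name.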

We are not here to kill the dinosaur. That is impossible for us as of now. We are here to reduce it to a smaller animal. Perhaps a chicken. It will live on our farm, we will take care of it, and in exchange it will give us some eggs. I believe we have more than just one way to describe the abstract programs that sit in our heads.