Hunt for Lex and Yacc, the Dinosaur
Published on 2020-06-20 23:36:00+02:00
Everything is text, we were told. Imagine that, armed with this assumption, we plan to take on a dinosaur. Indeed, this
is one of the greatest jokes ever made in the history of programming. The Lex manual page said that there is an asteroid
that will kill it for us. There is none, and the dinosaur is something we created ourselves and what we are made of.
Human-computer interaction is quite limited. It doesn't help that we express ourselves in very inefficient and quite
bizarre ways. The basis for much of our communication is language, and using it, in combination with text, to
communicate with computers was undeniably a good choice, and perhaps even a natural one to some extent. Designing and
creating languages solely to express what a computer is supposed to do was the expected next step. The creation of
tools like Lex and Yacc was also predictable, and obviously there is nothing wrong with it.
With an introduction like this, it feels like there is nothing left to write. At this level of abstraction, yeah, it's
not like we can easily change ourselves as human beings. However, if we change the level, we have something to discuss.
That thing, or rather those things, are the form our written language takes, the intermediate abstractions we use, and
the tools or interfaces that sit between us and the machine.
I'll focus here on a tiny, selected bit of the problem. If you are interested in the whole thing, check out e.g.
Bret Victor's talks, especially The Humane Representation of Thought.
Now then, to me it looks like source code is predominantly structured like a book. It has a table of contents, maybe
an index, and the content itself. Sometimes there are annotations or references to other books. In general, the content
is one sequence. It may be divided into chapters or paragraphs, but it's still one book. Source code behaves the same
way: it's one thing presented to the system that processes it. In some cases, the source is structured using files as
units, but the strictness of this approach varies, and the goal of such structuring is to help the programmer
understand it better. In the end, the computer usually receives the source as a whole anyway (like: "here, the program
consists of these files; parse them, compile, link, whatever").
The abstract structure of a program is pretty much always explained to the computer using the features present in the
language. It sounds obvious to do so. The thing is, we have more ways of expressing complex structures or hierarchies to
a computer than just plain text.
Files and databases. More generally, a dedicated thing that expresses only the structure of the program.
Yes, even a text file, if you must. It doesn't matter. What's important is to have a dedicated way of structuring the
program (not just the source), readable by both the programmer and the machine, that is closer to a graph, similar to
the abstract semantic graph. The closer we get to this graph representation, the easier it becomes to maintain the
source and understand the program it describes. Changing a name or a parameter position, dividing classes into smaller
pieces, moving functions from one entity to another, separating an entire piece of functionality into an external
module: all of those, and possibly more, become either trivial or non-existent tasks.
Of course, some of the mentioned methods are less capable than others. Filesystems aren't really a tool for creating
graphs, and text files are terrible at referencing. Additionally, a language that wants to be structured like this
should provide tools for developers. They should be simple and specialized, with interfaces that let them be easily
integrated into larger toolsets. One example that tried something similar is Smalltalk with its environment.
You may ask now: "Isn't that what IDEs are?" Similar, but not quite the same. Modern IDEs are the essence of the law
of the instrument: "I'm a text editor and the source is all text, therefore everything I do is change the text!" Some of
them have just gotten better at it. The other problem is that they are external to the environment of the language: an
intruder that seeks information on its own. Look at LLVM and the amazing things it produced by exposing smaller and smaller things to
The goal is to extract part of the programming language into something new. The key is to find a balance between
representation, readability, ease of integration, and tooling. Modern programming languages try to accomplish that
through verbose text editing, which sooner or later may become a dead end. Exposing the representations used internally
by compilers and interpreters to external tools and to the user, either as data or through small specialized tools,
may help us avoid such a fate. Allowing the user to interact with more abstract representations in a meaningful way
will prove beneficial.
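Python's standard `ast` module is one existing case of a language handing its parser's internal representation to the user as plain data; the article doesn't mention it, and the `function_calls` helper below is a hypothetical tool written for illustration. It is a minimal sketch of a small, specialized tool that answers a structural question ("which functions does each function call?") by querying the syntax tree instead of searching text. It only counts calls to plain names, so method calls like `text.split()` are deliberately ignored.

```python
import ast

SOURCE = """
def parse(text):
    return text.split()

def main():
    print(parse("a b c"))
"""


def function_calls(source: str) -> dict[str, list[str]]:
    """Map each top-level function to the plain names it calls,
    using the syntax tree Python exposes rather than the raw text."""
    tree = ast.parse(source)
    result: dict[str, list[str]] = {}
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            result[node.name] = [
                call.func.id
                for call in ast.walk(node)
                # Keep only calls like f(...), not obj.method(...).
                if isinstance(call, ast.Call) and isinstance(call.func, ast.Name)
            ]
    return result


print(function_calls(SOURCE))  # {'parse': [], 'main': ['print', 'parse']}
```

The tool never touches characters or line numbers; it works on the same tree the interpreter builds, which is the kind of exposure the paragraph above argues for.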
We are not here to kill the dinosaur. That's impossible for us as of now. We are here to reduce it to a smaller
animal. Perhaps a chicken. It will live on our farm, we will take care of it, and in exchange it will give us some eggs.
I believe we have more than just one way to describe the abstract programs that sit in our heads.