Structured source in examples

Let's consider a simple three-level example as a follow-up to Hunt for Lex (and Yacc), the dinosaur. I'll write in Lua. It's simple enough to understand without getting too much into the syntax, and the metatables will allow me to actually run the stuff I write. Please note that the examples are not the very exact thing I aim for. They are going to be read and interpreted at least twice from the text form, or they might be used to generate code as text to run it.

Now, now. Let's say that we have the following function written in Lua:

function fold (list, func, a)
	local key, value
	if nil == a then
		key, a = next(list)
	end
	key, value = next(list, key)
	while key do
		a = func(a, value)
		key, value = next(list, key)
	end
	return a
end

function sum (a, b)
	return a + b
end

function main (args)
	local t = {}
	for i=1, args[1] or 5 do
		t[i] = math.random(1, 100)
	end
	local s = fold(t, sum, 0)
	print(s)
end

It should be understandable: a sample implementation of fold function, and a main function that populates table with some random integers, then prints their sum. On my machine without seeding the result of main{5}: 376.

This was our regular looking source code. Let's make it less readable now. Again, please note, that my aim here is to express the structure and abstract, not how it is written down using Lua tables. Hopefully, next paragraphs will explain the idea better.

Let's say that we have our program saved as a following table in some kind of source image:

program = {
	fold = func {
		arguments = {"list", "func", "a"},
		upvalues = {"next"},
		impl = [[
			local key, value
			if nil == a then
				key, a = next(list)
			end
			key, value = next(list, key)
			while key do
				a = func(a, value)
				key, value = next(list, key)
			end
			return a
		]],
	},
	sum = func {
		arguments = {"a", "b"},
		impl = [[
			return a + b
		]],
	},
	main = func {
		arguments = {"args"},
		upvalues = {"math", "fold", "sum", "print"},
		impl = [[
			local t = {}
			for i=1, args[1] or 5 do
				t[i] = math.random(1, 100)
			end
			local s = fold(t, sum, 0)
			print(s)
		]],
	},
}

With some manipulation of metatables and environments, if I run main{5}, the result will be 376. Surprising, isn't it, we meet the bare minimum perfectly.

fold the map

What does it give us? In such small shift of representation the benefit is actually almost unnoticeable. I know, sounds like treason; I pretty much promised you neverending source of oil and honey. Take it slow, hear me out. That little difference is in how we can manage the program source. With a simple utility and even a naive approach, one could swiftly get such information as: what functions exist in there, and where are they used (through the upvalues). We can also extend the model to include the documentation, so that such image could be browsed by an utility as some kind of manual viewer.

Another example of a simple utility that could enhance the process of managing the code base is diff program. Assuming, we would get it work with this format it could produce something like:

diff = {
	main = func {
		impl = [[
			@3-3,3-3
			-	t[i] = math.random(1, 100)
			+	t[i] = math.random(50, 75)
		]],
	},
}

The information we get from this kind of representation is slightly more verbose than just pure text. Imagine what would happen if we were to push the representation even further. Like: "oh, those two parameters in function call have been changed." Imagine that diff spells exactly that.

Of course, all of those are already accomplished with the traditional approach. Sadly, I feel like I need to acknowledge it quite often to not get ridiculed into oblivion. The traditional approach restrains us and enforces to treat the source code as text. What I want to avoid is Maslow's hammer fallacy. Let's represent the program as a program, not text describing the program. Let's aim for that.

I think that describing the program in a more structured way is actually the trend in here. Take a look at the tools you can refactor your source code with. They are most likely integrated into a bigger tool-set such as IDE or i.e. some clang-stuff. Both of them use their internal representations to do the changes. They rarely operate solely on the text. Be it some cached symbols or whatever. It's still there.

Let me repeat myself yet again: the goal is to make such representation standardized and available externally. The details are up to individual decisions: I would aim for minimizing heavy parsing requirements, and letting proper referencing. In other words: normalizing the representation as if it's a database (shifting the normal form).

Let's say we get step further:

program = {
	fold = func {
		arguments = {"list", "func", "a"},
		upvalues = {"next"},
		impl = {
			locals {"key", "value"},
			ifstmt {
				cond = {nil, "equal", argument "a"},
				then = {
					assign {
						{local_ "key", argument "a"},
						call {upvalue "next", {argument "list", nil}}
					}
				}
			},
			assign {
				{local_ "key", local_ "value"},
				call {upvalue "next", {argument "list", local_ "key"}}
			},
			whilelp {
				cond = local_ "key",
				do_ = {
					assign {
						local_ "a",
						call {argument "func", {local_ "a", local_ "value"}}
					},
					assign {
						{local_ "key", local_ "value"},
						call {upvalue "next", {argument "list", local_ "key"}}
					},
				}
			},
			return_ {local_ "a"}
		}
	}
}

Just one function, but it should give an overview. That's an unreadable mess. That's because it's an extreme example. It doesn't need to be like that. Moreover, the way it is represented to the programmer may be different. We can afford lightly parsing the source before saving it to the image of the structure. Assuming this is what we make available to the external utilities, we could expect even more verbose diff tool:

diff = {
	fold = func {
		upvalues = append "new_next",
		impl = {
			[3] = _ {_, _ {upvalue "new_next", _}}
		}
	}
}

It reads as: changes in fold implementation, in third statement, first element of selected nested table. Of course, this example might hide a little bit too much information, but you should get the idea. I could extend it even more by adding even better references, so that the names of variables or functions won't matter at all.

The possibilities are there. What's left for me now is to explore the idea. I think creating a simple implementation of a complete system that revolves around this idea is the best course of action. Wish me luck and dedication. I would prefer to have some domain-specific need to fulfil with it, rather than creating some general-purpose programming language as there are plenty of those now, and how they end up, we all know. Maybe someone else will get inspired; that would be cool, too.