why languages matter (or: why that guy doesn’t understand the meaning of “syntactic sugar”)

I’ve been thinking more about one of my coworker’s many (strong) opinions: that programming languages are nothing more than syntax. I worry that we get too taken in by this argument, that we start to think of languages as the brand of hammer we use to pound the nail instead of the tools we use to solve the problem. Then again, languages are even more than the tools we use to solve the problem; they are the problem.

Now, I’m pretty sure I know what you are thinking, only because I performed an ugly bit of language myself, there. You see, when I say languages are the problem, I don’t mean that they are a problem, or that they are something that needs to be solved. This somewhat goes without saying, and it isn’t really the deeper point I want to make here.

Let’s play with an analogy then, and forgive how poorly fitting this is. Let’s pretend that programming languages are like scientific disciplines.

Now, we can fairly firmly draw an analogy between assembly language and atomic physics. Both are “atomic”, in the sense that they are the smallest units of whatever we are talking about. Both can describe anything we would want, and the problems of both are similar: we really can’t describe large-scale things (as in, say, visible stuff) in terms of atomic physics, because complexity ends up kicking us in the ass. Likewise, we have difficulty scaling pieces of assembly past a certain point – there is a reason Linux wasn’t written in assembly, beyond the obvious one of running it on different processors. People like things like named variables, looping constructs, and shorthand for conditional jumps.

In the world of science, these “syntactic sugars” are like the other sciences. Chemistry is really just a certain area of atomic physics, on a much grander scale. We amass bodies of knowledge as they relate to this tiny piece of atomic physics, and it isn’t just syntactic sugar. It tames complexity in a way that you cannot really achieve without it. Biology is another complexity tamer, ecology even more so. Each of these is an attempt to construct abstractions on top of physics that we can use to tackle certain problems.

We do the same thing in the programming world. We built C, which is just a thin layer of syntax on top of assembly (I would go so far as to say that it is an assembler, something like the least common denominator of all assembly languages, plus the nice little shortcuts). With C, we are mostly communicating in the same language as assembly.

Mostly.

You see, C gives us a few really, really, I mean really important semantic constructs (and by “gives us”, I don’t mean that it was historically the first, just that it is now the prominent “base language” for all our Algol-based languages). Here’s the most important one: the variable.

That’s right. Give away everything else – looping isn’t too bad, just some gotos here or there. Functions? Well, hey, just jumping to some pieces of code and jumping back. Maybe I could say that those are just some pieces of “syntactic sugar” – after all, we can accomplish the same thing in assembly code, right?
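To make the “just some gotos” claim concrete, here is a minimal sketch in C. The function names (sum_with_loop, sum_with_goto) are made up for illustration; the second one spells the loop out as labels and jumps, roughly the shape you would write by hand in assembly.

```c
#include <stddef.h>

/* Sum an array using the structured loop that C provides. */
int sum_with_loop(const int *values, size_t count) {
    int total = 0;
    for (size_t i = 0; i < count; i++) {
        total += values[i];
    }
    return total;
}

/* The same computation with the loop written as labels and jumps. */
int sum_with_goto(const int *values, size_t count) {
    int total = 0;
    size_t i = 0;
top:
    if (i >= count) goto done;   /* conditional jump out of the "loop" */
    total += values[i];
    i++;
    goto top;                    /* unconditional jump back to the test */
done:
    return total;
}
```

Both functions return the same total for the same input, which is the sense in which the for loop really is sugar: nothing about what the program can say has changed, only how pleasant it is to say it.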

The one thing you absolutely cannot ever do in machine code is have variables. Sure, you have memory locations, and registers, and all the other addressable storage, but you cannot hold the concept of a variable. This of course doesn’t mean that you can’t tell the processor the exact same thing in either language; after all, in a very important sense, assembly language is more powerful, because I have the freedom to tell the processor exactly what steps to take.

But machine code doesn’t have the idea of variables. You fake it – you write comments, you put human readable instructions outside the language, but it ends up just that – out-of-band communication.
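Here is a rough sketch of that difference, written in C for both halves so the two can sit side by side. The first function only imitates machine-style storage with an anonymous array of slots; it is not real machine code, and the names (area_from_slots, area_from_variables) and the numbers are invented for the example.

```c
/* "Machine" style: a block of anonymous storage. What each slot means
   lives only in the comments, outside the language. */
int area_from_slots(void) {
    int mem[3];
    mem[0] = 6;               /* slot 0: width (only the comment says so) */
    mem[1] = 7;               /* slot 1: height */
    mem[2] = mem[0] * mem[1]; /* slot 2: area */
    return mem[2];
}

/* Same steps, but the meaning is part of the program text itself. */
int area_from_variables(void) {
    int width = 6;
    int height = 7;
    int area = width * height;
    return area;
}
```

Delete the comments and the first function still does exactly the same thing, but nothing left in it says what it is about. The second one carries its meaning in the names.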

This, then, is where we must truly understand what “syntactic sugar” means. Most people use it in the derogatory sense, as in, “Lambda-what? It’s just syntactic sugar! Who needs it?” Of course, those same people wouldn’t call variables “syntactic sugar” (except to make a point); you don’t hear anyone screaming “Variables?? Syntactic sugar! Get rid of ‘em!”.

Part of what Turing-completeness means is that all such languages can “do” the same thing. What it doesn’t imply is that all languages can “mean” the same thing. It’s subtle, and it is certainly arguable, as most low-level lovers will tell you, or at least imply, that what something does is what it means (Raymond Chen is somewhat famous for saying something along the lines of: we shouldn’t have source-level debuggers, just assembly debuggers).

But this isn’t really true. A variable named “fileName” doesn’t “mean” the contents of the register or memory location that its variable reference points to, and it certainly doesn’t “mean” the various charges on the circuits in your computer. It means that it is some type of moniker to identify a file, which is a logical entity that contains information that somebody can probably understand and parse.

Is a file just syntactic sugar? Why don’t we all just write random bits to disk and parse them out ourselves?

In the end, a large part of it is complexity. Most of what “higher” level languages give us is a handle on complexity. A simple piece of C# code, say, 20 lines, might eventually bloom into 100 assembly instructions, maybe 200, maybe 1000 (maybe 10, who knows :) ). In C#, you can “do” all the same things you could “do” in assembly, in the Turing-complete sense. But in C#, you can “mean” something different than the Turing-complete sense.

Anyways, this was just to start off my thought process on this. I’m sure there will be a sequel to this, soon. After all, how can we talk about “higher-level” languages without discussing Lisp? Would I still be Noah if I didn’t bring up Lisp any time anyone discussed anything related to languages? Also, for next time (in 5 o’clock newscaster voice): why the language you choose can affect the way that you work, in ways that you cannot even immediately see or comprehend, until it is too late.

Ah, almost forgot.  When I said “languages are the problem”, I meant that the problem, as we understand it, exists in the language we use to describe it.  In other words, both the problem and its solution are affected by the language(s) you use.  I’ll actually explain that, next time.

Stay tuned.
