
Syntax Cycle

I’m a delinquent correspondent. Perhaps I have been too busy. Perhaps I’ve just been telling myself that I’m busy because I haven’t been able to think of anything worth your time to read.

I have been traveling quite a lot and that’s usually what prompts me to write. I’ve been around the world with a Starlink terminal this year, and on the road with laptops and phones galore. I’ve purchased Raspberries Pi and travel routers in a number of local currencies and bought and lost so many charging bricks and batteries that I have lost track completely.

All I can really share from traveling is that computing ‘at the edge’ means computing without vibe coding, without reliable Google, without any more power and information and compute than you could bring with you in a carry-on bag. That’s… that’s where the edge is. Edge-stuff, like all matter, is more gorge than bridge.

#vanlifers and business travelers in lay-flat airline beds have more in common than I think either would care to admit, by the way.

If I learned anything from this year’s travel, those lessons will take years to consolidate. In the meantime, I’ve left another set of threads dangling here that may be worth completing.

We worked part of the way around a syntax/flavor cycle last year, ending on sour. (Catch up on sour, umami, and sweet.)

Today we’ll do salt. Interpreter files are not always well understood. Most UNIX users know that a ‘shell script’ starts with ‘#!’, but understanding goes downhill from there. This combination is sometimes called 'shebang', but not by me. I say 'hashbang'.

Hash and bang are just bytes like any other. Put them together at the start of a file and the kernel can tell that you have an interpreter file. It will fish the name of the interpreter out of the rest of the first line of the file and exec() it, passing the path of the ‘#!’ file as an argument after anything else on that line (and ahead of any arguments you gave the script itself). The kernel can’t search the PATH, can’t expand variables on the command line, can’t do any of that. What’s more, the ‘#!’ line at the start of the file had better be legal in whatever language you’re writing. In /bin/sh language, ‘#’ denotes a comment and the rest of the line is ignored. The UNIX gods were crafty.

One of the sharp edges of these interpreter files is that you have to know exactly where in the filesystem the interpreter lives. Easy maybe for an interpreter like /bin/sh, harder for an interpreter that you have built yourself and which may appear in different places depending on where a remote filesystem is mounted.

Plan 9 laughs at this problem. It’s as easy to arrange the filesystem to suit an individual process as it is to set a PATH variable. /bin/sh is whatever program you choose for it to be for however long you wish it.

Do you remember that contestants on Jeopardy used to receive the home version of the game as a consolation? Anyhow, that’s Docker. Punch out enough little cardboard game pieces and you can recapture the experience of losing, right in your living room. Assemble enough scaffolding around your interpreter file and you can pretend that you had the same control over /bin/sh as the cool kids.

There is a middle way that’s more like a neighborhood pub quiz. That’s /usr/bin/env. /usr/bin/env is a handy utility that lets you modify the environment of a subordinate program, but it also works as a hacky stand-in interpreter that gives you room to dynamically indirect to an interpreter of your choice. #!/usr/bin/python calls whatever python is in /usr/bin. #!/usr/bin/env python uses whichever program called ‘python’ comes first in your PATH.

You can’t choose which version of env you get, but it would seem not to matter: the contracts with the kernel and with env are so simple. Alas, it does. Consider the following C program, which prints every argument it receives:

#include <stdio.h>

/* Print each argument this program receives, one per line, with its index. */
int
main(int argc, char **argv) {
  int x;
  for(x = 0 ; x < argc ; x++) {
    printf("ARGV[%d] = %s\n", x, argv[x]);
  }
  return 0;
}

Compile it to an executable named fake_interpreter and use it as the interpreter in a ‘#!’ file on macOS:

$ uname -s
Darwin
$ cat fake_hashbang 
#!./fake_interpreter these are additional arguments
$ ./fake_hashbang
ARGV[0] = ./fake_interpreter
ARGV[1] = these
ARGV[2] = are
ARGV[3] = additional
ARGV[4] = arguments
ARGV[5] = ./fake_hashbang

And we use it as the supplied interpreter on Linux:

$ uname -s
Linux
$ cat fake_hashbang 
#!./fake_interpreter these are additional arguments
$ ./fake_hashbang
ARGV[0] = ./fake_interpreter
ARGV[1] = these are additional arguments
ARGV[2] = ./fake_hashbang

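Hahahahaha. The semantics of interpreter files differ between these operating systems: macOS splits everything after the interpreter name into separate arguments at each space, while Linux hands it over as one single argument. Let's put env in front of our fake interpreter.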

Mac:

$ uname -s
Darwin  
$ cat env_hashbang 
#!/usr/bin/env ./fake_interpreter these are additional arguments
$ ./env_hashbang
ARGV[0] = ./fake_interpreter
ARGV[1] = these
ARGV[2] = are
ARGV[3] = additional
ARGV[4] = arguments
ARGV[5] = ./env_hashbang

Linux:

$ uname -s
Linux
$ cat env_hashbang 
#!/usr/bin/env ./fake_interpreter these are additional arguments
$ ./env_hashbang 
env: ‘./fake_interpreter these are additional arguments’: No such file or directory
env: use -[v]S to pass options in shebang lines  

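On Linux, env received the whole rest of the line as a single argument and went looking for a program literally named ‘./fake_interpreter these are additional arguments’. Am I "laughing, crying" or laughing, crying? Let's take env's hint, try -S, and see. On both systems we get the same output: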

$ cat env_s_hashbang 
#!/usr/bin/env -S ./fake_interpreter these are additional arguments
$ ./env_s_hashbang 
ARGV[0] = ./fake_interpreter
ARGV[1] = these
ARGV[2] = are
ARGV[3] = additional
ARGV[4] = arguments
ARGV[5] = ./env_s_hashbang  

That works pretty well right up until you run on a system where env is supplied by BusyBox and provides no '-S'. That's exactly where I recently found myself. I needed to run a script with the JavaScript interpreter 'node', and I also needed to pass some command-line arguments directly to node. /usr/bin/env won't work, but a little salty syntax will:

#!/bin/sh
// 2> /dev/null || exec node --v8-pool-size=20 "$0" "$@"; exit 1
console.log("hello there");

Run it on macOS, Linux, or BusyBox Linux and you get:

$ cat foojs
#!/bin/sh
// 2> /dev/null || exec node --v8-pool-size=20 "$0" "$@"; exit 1
console.log("hello there");
$ ./foojs
hello there  

Why does this work? It's essentially the same trick the kernel plays with the shell's comment syntax. Our interpreter is /bin/sh, which runs foojs as a shell program. "//" is a comment in JavaScript, but in shell it's an attempt to run the root directory as if it were a program. That fails (the complaint goes to /dev/null), so "||" moves on to exec the program you want: node, found by the shell's ordinary PATH search, invoked on this very file ("$0") with the script's original arguments ("$@"). Because exec replaces the shell, the "exit 1" only runs if node can't be started. Once node gets around to the file, it tolerates the first line because it knows how to be launched as an interpreter file, and then it ignores the second line as a comment. The rest of the file is plainly not shell syntax, but that's OK. The shell reads and evaluates as it goes, so as long as it execs or exits before reaching the first line of non-shell syntax, it doesn't matter.

So why is this syntax salty? Well, it's certainly flavorful. I think it's salty because it works as a preservative: it captures some of what it takes to actually run the program and binds it to the program in one portable file. Is it bad for you, like too much salt? After all, it starts a whole copy of the shell just to get out of the way.

Here's the cost on my laptop:

$ cat foojs
#!/bin/sh
// 2> /dev/null || exec node --v8-pool-size=20 "$0" "$@"; exit 1
console.log("hello there");  
$ time ./foojs
hello there

real	0m0.049s
user	0m0.029s
sys	0m0.015s
$ time node --v8-pool-size=20 ./foojs
hello there

real	0m0.048s
user	0m0.031s
sys	0m0.013s
$ time env node --v8-pool-size=20 ./foojs
hello there

real	0m0.048s
user	0m0.031s
sys	0m0.014s

What looks like a perceptible millisecond or two of difference blurs away in repeated runs. It certainly costs something. So too does /usr/bin/env. The remedy is maybe to add a fat and make sure you're running programs worth the time it takes to start them up. You can run /bin/true in just two milliseconds on my laptop, and it's worth what you pay for it.