(2023-05-04) If stuck with Busybox but need to run Forth, use... Subleq
-----------------------------------------------------------------------
It might sound even crazier than running Brainfuck on top of VTL-2 on top of
AWK, but yeah, there is an implementation of eForth ([1]) for a specific
variant of the Subleq architecture. I'll share its details here shortly. For
now, it's enough to know that programs for it are distributed as .dec files,
where a .dec file is just a list of signed decimal integers separated by
arbitrary whitespace. And yes, this format looks like a perfect target for
AWK, so I couldn't resist writing my own Subleq implementation in this
language. To be honest, considering all the workarounds AWK requires where
plain C offers straightforward solutions, 26 SLOC is not a lot for such a
useful single-instruction computer architecture.

Now, I'm going to share the ready-made .awk file on the main page as usual,
but I want to go through every significant line of it here to explain what's
going on. The program accepts a single .dec file as input, invoked as
[busybox] awk -f subleq.awk program.dec, and processes it line by line. But
first, we need to define two helper functions:

function L(v) { # cast any value to unsigned 16-bit integer
  v = int(v)
  while(v < 0) v += 65536
  return int(v%65536)
}

function getchar(c, cmd) { # getchar emulation via the shell's read builtin
  (cmd="c='';IFS= read -r -n 1 -d $'\\0' c;printf '%u' \"'$c\"") | getline c
  close(cmd)
  return int(c)
}

These functions are mostly self-explanatory, but I have something to add. The
L() function could be simplified if we could use bitwise operations, but
POSIX AWK doesn't have them, so I decided to stay on the POSIX side and
emulated everything with a loop and the % operator. This also ensures the
result stays an integer before and after the conversion. The getchar()
function is necessary to emulate Subleq's input logic: AWK offers no
straightforward way to read a single character from standard input, so,
unlike the C version, it requires some external shell processing and is
therefore quite slow. For read and printf, I tried to stick to widely
available options, although, strictly speaking, read -n and -d and the
$'\0' quoting go beyond classic POSIX sh (Busybox ash supports them, as does
bash). Here, we read a single character (which can be a newline, hence the
null delimiter) and then print its decimal character code, which is cast to
an integer on the AWK side and returned to the caller after the subprocess
is closed. Now, we can initialize our 64K virtual Subleq memory:

BEGIN {
  for(pc=0;pc<65536;pc++) MEM[pc] = 0 # init the memory array
  pc = a = b = c = 0 # reset the program counter and other vars
}
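For comparison, here is a minimal C sketch of the same memory model. It is
not taken from howerj's C code: the names mem, to_u16 and to_u16_mod are
hypothetical. It assumes a 64K array of 16-bit words. In C, converting any
integer to an unsigned 16-bit type is already defined as reduction modulo
65536, which is exactly what L() has to emulate in POSIX AWK.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_SIZE 65536

/* 64K words of Subleq memory. Static storage is zero-initialized in C,
   which is what the AWK BEGIN block does with its explicit loop. */
static uint16_t mem[MEM_SIZE];

/* Hypothetical counterpart of the AWK L() helper. C defines conversion
   to an unsigned type as reduction modulo 2^N, so a plain cast maps
   -1 to 65535, -32768 to 32768, 70000 to 4464 and so on. */
static uint16_t to_u16(long v) {
    return (uint16_t)v;
}

/* The same reduction spelled out with % and a sign fix, closer in
   spirit to L(), which cannot use bitwise operations. C's % truncates
   toward zero, so a negative v leaves a negative remainder that has
   to be shifted up by 65536. */
static uint16_t to_u16_mod(long v) {
    long r = v % 65536;
    if (r < 0) r += 65536;
    assert(r >= 0 && r < 65536); /* r is now a valid 16-bit word */
    return (uint16_t)r;
}
```

Both functions return the same value for every input, so the choice is only
about readability; the cast is what a C implementation would normally use.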

Once we've done this, we can start matching the integers in the file and
filling our memory with the actual values:

{ for(i=1;i<=NF;i++) if($i ~ /^-?[0-9]+$/) MEM[pc++] = L($i) }

Here, the logic is like this. This block runs for every input line, which
can contain any number of fields. The default field delimiters are fine, but
we need to iterate over every field we encounter. If the field matches the
regex for _signed_ integers (an optional leading -, followed by one or more
digits), we cast it into a 16-bit unsigned value using our L() function,
store it in the current memory cell and advance the pointer to the next one.
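A C sketch of the same loading step might look like this. The function
names are hypothetical, and the signed-integer check is written by hand
because standard C has no regex support. It assumes that no token is longer
than 63 characters and that every value fits in a long; unlike the AWK
array, it also stops after 64K cells.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define MEM_SIZE 65536

/* Returns 1 if tok is an optional '-' followed by one or more digits,
   i.e. what the ERE ^-?[0-9]+$ accepts, and 0 otherwise. */
static int is_signed_int(const char *tok) {
    if (*tok == '-') tok++;
    if (!isdigit((unsigned char)*tok)) return 0;
    while (isdigit((unsigned char)*tok)) tok++;
    return *tok == '\0';
}

/* Reads whitespace-separated tokens from f and stores every signed
   integer, reduced modulo 2^16, into consecutive cells of mem starting
   at 0. Other tokens are skipped. Returns the number of cells filled. */
static size_t load_dec(FILE *f, uint16_t *mem) {
    char tok[64];
    size_t pc = 0;
    assert(f != NULL && mem != NULL);
    while (pc < MEM_SIZE && fscanf(f, "%63s", tok) == 1) {
        if (is_signed_int(tok))
            mem[pc++] = (uint16_t)strtol(tok, NULL, 10);
    }
    return pc;
}
```

Fed with the text "6 -1 3 0 0 -1 72 - foo 12x", this loads seven cells and
ignores the last three tokens.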

Finally, once the entire file has been read and parsed, we can start the
actual execution process in the END block:

END {
  for(pc=0;pc<32768;) {
    a = MEM[pc++]; b = MEM[pc++]; c = MEM[pc++] # fill the cell addresses
    if(a == 65535) MEM[b] = L(getchar())
    else if(b == 65535) printf("%c", MEM[a]%256)
    else {
      MEM[b] = L(MEM[b] - MEM[a]) # subtract the first 2 cells and cast
      if(MEM[b] == 0 || (MEM[b] > 32767)) pc = c # jump if result <=0 
    }
  }
}

Here's how it works. First, we reset our program counter PC to 0 once again
and start sequentially reading three values in a loop: A, B and C. These are
the cell addresses our OISC operates on every cycle. Now, and this is the
first quirk of this particular Subleq variant, we have two special cases: if
A is set to -1 (which becomes 65535 as an unsigned 16-bit value), we input a
character from standard input and store it in the cell at address B, and if
B is set to -1, we output the contents of the cell at address A as a
character to standard output (which is much easier in AWK and doesn't need a
helper function). In both cases, C is ignored and execution simply continues
with the next instruction. If neither of these special cases applies, we run
the general Subleq logic: subtract the cell at address A from the cell at
address B, write the result to the cell at address B, then jump to address C
if this result is less than or equal to zero. However, you may notice that
the code doesn't express the condition exactly like this. What's going on?

The thing is, and here is the second quirk, that this specific VM stores
every value as an unsigned 16-bit word, so all negative subtraction results
(from -32768 to -1) must be mapped to the upper half of the 16-bit range
(from 32768 to 65535), just like in two's complement. And programs like
eForth actually do check this to determine whether they are running on the
correct Subleq VM version. For the same reason, we only run the program
counter from 0 to 32767, as program code can only reside in the lower 32K of
virtual memory: higher PC values would be treated as negative and thus
invalid. This is also how a program halts: it simply jumps to a "negative"
address such as -1, which ends the loop. So, we change our <=0 condition to
check whether the result is zero or above 32767. This way, everything works
as expected.
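To see the whole cycle in one place, here is a minimal C sketch of the
execution loop described above. It assumes mem already holds the loaded
program; the names run, leq_zero and IO_ADDR are hypothetical and not taken
from howerj's code. Storing EOF as 65535 is also my assumption, since the
text above doesn't say what the input case should do at end of input.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define MEM_SIZE 65536
#define IO_ADDR 65535u /* -1 as an unsigned 16-bit word */

/* "Less than or equal to zero" for a 16-bit two's complement word:
   either zero or anything in the upper half (32768..65535). */
static int leq_zero(uint16_t r) {
    return r == 0 || r > 32767;
}

/* Executes instructions until PC leaves the lower 32K of memory. */
static void run(uint16_t *mem, FILE *in, FILE *out) {
    uint32_t pc = 0;
    assert(mem != NULL && in != NULL && out != NULL);
    while (pc < 32768) {
        uint16_t a = mem[pc], b = mem[pc + 1], c = mem[pc + 2];
        pc += 3;
        if (a == IO_ADDR) {
            /* input: store one character in cell B (EOF -> 65535) */
            mem[b] = (uint16_t)fgetc(in);
        } else if (b == IO_ADDR) {
            /* output: write the low byte of cell A */
            fputc(mem[a] & 0xFF, out);
        } else {
            /* general case: B -= A modulo 2^16, jump to C if B <= 0 */
            mem[b] = (uint16_t)(mem[b] - mem[a]);
            if (leq_zero(mem[b])) pc = c;
        }
    }
}
```

As a quick check, the seven-word program 6 -1 3 0 0 -1 72 should print H and
halt. The first instruction outputs cell 6 (72, the code of H); the second
subtracts cell 0 from itself, gets zero and jumps to -1, i.e. 65535, which
lies outside the lower 32K. Saved as a .dec file, the same program should
behave identically under the AWK version.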

And guess what, the eForth .dec file does run under Busybox AWK too.
Extremely slowly, but surely. I recommend the version of subleq.dec found in
the JS version of howerj's project, because the one from the repo is even
slower despite its smaller size. So, even if you can't compile anything for
the target system (whatever it might be) but have an awk command there, you
can run eForth programs with nothing but this Subleq emulator. Ain't it
wonderful?

--- Luxferre ---

[1]: https://howerj.github.io/subleq.htm