proxy70

Tabular Programming: Roman Numerals

Machines needn't compute as humans do; machines shouldn't compute as humans do. Metaprogramming has
shown me that all automatic reckoning eventually explicitly states a subset of the possible answers,
somewhere; I'd find myself trying to build the advanced metaprograms I'd thought were possible, only
to find increasingly advanced templating as the only reasonable solution; structured metaprogramming
is superior; it's clear to me the majority of useful automated program synthesis isn't unstructured.

Computation is disgusting where it be unnecessary, and I've noticed a great deal can be unnecessary.

Working with my machine text representation has had me thinking about tables so often, and of tables
referencing tables, and of using but the barest algorithms to compose their saved answers into more.
It's uncommon to work backwards from tables of answers, but I now believe this approach is superior.

Representing numbers as Roman numerals is easy for men to figure. A machine can use better methods.
Programs men would use for this are useful here for creating the initial tables, but won't be shown.
I first wrote a program to generate all Roman numerals, and compared the table of answers to another
implementation to increase my confidence. Analyzing my table makes apparent some important details:
the largest Roman numeral is fifteen letters long; the range of values fits in three bits, being one
of I, V, X, L, C, D, or M, and leaving room for an eighth value; and the table hosts thirty-thousand
letters, though a representation using the same amount of storage for each would nearly double this.

The table is a complete definition of the function, but such definition is akin to knowledge without
wisdom. It's trivial to use a simple composition rule and smaller tables which can be made equal to
the former table, and it's necessary for using this approach with greater and more complex problems.

The obvious way to simplify this single large table is with two smaller tables:

four-by-ten table of bigger numbers

C CC CCC CD D DC DCC DCCC CM M
MC MCC MCCC MCD MD MDC MDCC MDCCC MCM MM
MMC MMCC MMCCC MMCD MMD MMDC MMDCC MMDCCC MMCM MMM
MMMC MMMCC MMMCCC MMMCD MMMD MMMDC MMMDCC MMMDCCC MMMCM

ten-by-ten table of smaller numbers

I II III IV V VI VII VIII IX X
XI XII XIII XIV XV XVI XVII XVIII XIX XX
XXI XXII XXIII XXIV XXV XXVI XXVII XXVIII XXIX XXX
XXXI XXXII XXXIII XXXIV XXXV XXXVI XXXVII XXXVIII XXXIX XL
XLI XLII XLIII XLIV XLV XLVI XLVII XLVIII XLIX L
LI LII LIII LIV LV LVI LVII LVIII LIX LX
LXI LXII LXIII LXIV LXV LXVI LXVII LXVIII LXIX LXX
LXXI LXXII LXXIII LXXIV LXXV LXXVI LXXVII LXXVIII LXXIX LXXX
LXXXI LXXXII LXXXIII LXXXIV LXXXV LXXXVI LXXXVII LXXXVIII LXXXIX XC
XCI XCII XCIII XCIV XCV XCVI XCVII XCVIII XCIX C

The domain of the former is one-to-thirty-nine and of the latter is one-to-one-hundred. The largest
value in the former table is seven letters long, and the largest value in the latter table is eight.

The composition rule is itself also easily expressed as a table:

Let B be the input divided by one hundred, rounding down.
Let S be the input modulo one hundred.
Index the abstract table accordingly.
0 ...
0 S
... B B+S
In the case where B be zero, only the table of smaller numbers be used, indexed by S.
In the case where S be zero, only that table of bigger numbers be used, indexed by B.
In the other case, both be used, with their results concatenated.
The case in which both be zero is impossible due to the domain not including zero.

These two tables and their composition rule can be proven correct trivially, merely by comparing the
results for each value of the domain with the larger and complete table; programming this so is thus
equivalent to intelligently-guided compression. Analyzing these tables makes clear how their ranges
are nearly-disjoint, and can thus be encoded more efficiently, alongside other details; importantly,
such a result is evidence that this is a particularly good and natural segmentation of this problem.
The range of values in that former table are C, D, and M; those of the latter are I, V, X, L, and C.

That larger table is always suspect, as a single error within is damning. This new approach reduces
manually checking one below four thousand entries to checking a mere one hundred and thirty-nine and
the validity of the rule. The table of bigger numbers can be further reduced trivially, by noticing
the predictable repetition of the letter M, but I don't find the added complexity of another rule to
be worthwhile in that; I'm much more interested in reducing the size of the table of smaller numbers
and I believe it to be better to have fewer, larger tables using simple rules, than the alternation.

Simply splitting symmetrically shows simple similarities:

I LI
II LII
III LIII
IV LIV
V LV
VI LVI
VII LVII
VIII LVIII
IX LIX
X LX
XI LXI
XII LXII
XIII LXIII
XIV LXIV
XV LXV
XVI LXVI
XVII LXVII
XVIII LXVIII
XIX LXIX
XX LXX
XXI LXXI
XXII LXXII
XXIII LXXIII
XXIV LXXIV
XXV LXXV
XXVI LXXVI
XXVII LXXVII
XXVIII LXXVIII
XXIX LXXIX
XXX LXXX
XXXI LXXXI
XXXII LXXXII
XXXIII LXXXIII
XXXIV LXXXIV
XXXV LXXXV
XXXVI LXXXVI
XXXVII LXXXVII
XXXVIII LXXXVIII
XXXIX LXXXIX
XL XC
XLI XCI
XLII XCII
XLIII XCIII
XLIV XCIV
XLV XCV
XLVI XCVI
XLVII XCVII
XLVIII XCVIII
XLIX XCIX
L C

The first half of that table contains no instances of the letter C, and the latter half only does so
in the last eleven entries, but the position of the letter L in the latter half is consistent. It's
clear this table's size could be halved, by another simple composition rule. It's also obvious that
last entry is unused even by the original composition rule, but it makes this symmetry more obvious:

five-by-ten table partially representing the table of smaller numbers

I II III IV V VI VII VIII IX X
XI XII XIII XIV XV XVI XVII XVIII XIX XX
XXI XXII XXIII XXIV XXV XXVI XXVII XXVIII XXIX XXX
XXXI XXXII XXXIII XXXIV XXXV XXXVI XXXVII XXXVIII XXXIX X?
X?I X?II X?III X?IV X?V X?VI X?VII X?VIII X?IX ?

The composition rule will be first expressed without tables:

Let I be the the index modulo fifty.
If the index be in the former half, let ? be the letter L.
If the index be in the latter half, let ? be the letter C.
Return the contents of the table indexed by I; the index of zero will result in a ?.
If the index be in the latter half, but not the final eleven, append a letter L to the result.

Sans the division, this can nicely be expressed with the following tables; an additional composition
rule can reduce one of the tables in size from one hundred to two, and from one hundred to fifty, as
determining which half can be done by index division by fifty, and appending an L done through using
a table with fifty entries, use thereof determined by that other, and indexed by index modulo fifty:

four-by-two table mapping the former half

I I
V V
X X
? L

four-by-two table mapping the latter half

I I
V V
X X
? C

five-by-ten table of positions for the letter L to be appended, with blanks represented as N

L L L L L L L L L L
L L L L L L L L L L
L L L L L L L L L L
L L L L L L L L L N
N N N N N N N N N N

two table of indications as to whether the index be in the former (F) or latter (L) half

F L

While this could halve the number of entries to check, in comparison to the earlier table of smaller
numbers, it's not necessarily easier; it's better to check the earlier table, and then to check that
this newer representation be equivalent. Importantly, modifying this table requires no checking but
with that table which it represents, thus checking one hundred entries and not nearly four thousand.

A Latin book I'm reading uses CCCC instead of CD and DCCCC instead of CM. These changes are easy to
make by modifying the corresponding table entries, and this is an advantage of programming this way:

four-by-ten table of bigger numbers with modifications

C CC CCC CCCC D DC DCC DCCC DCCCC M
MC MCC MCCC MCCCC MD MDC MDCC MDCCC MDCCCC MM
MMC MMCC MMCCC MMCCCC MMD MMDC MMDCC MMDCCC MMDCCCC MMM
MMMC MMMCC MMMCCC MMMCCCC MMMD MMMDC MMMDCC MMMDCCC MMMDCCCC

Importantly, the largest value in this table of bigger numbers is now eight letters long, not seven.
Furthermore, the largest Roman numeral becomes sixteen letters long, not fifteen, with the addition.

It's very important to recalculate such information in the cases in which it could change.

These changes can be checked for correctness quite trivially by comparing only the changed elements:

CCCC CD
MCCCC MCD
MMCCCC MMCD
MMMCCCC MMMCD

DCCCC CM
MDCCCC MCM
MMDCCCC MMCM
MMMDCCCC MMMCM

An issue with compressing these tables well is adding special cases. I've seen an irregular form of
eighteen, XIIX rather than XVIII, which would be interesting to add. However, changing the table of
smaller numbers wouldn't only influence numbers ending in eighteen, but those ending in sixty-eight.
The twenty-second Roman Legion would apparently write twenty-two as IIXX, and it certainly shouldn't
apply to any other number ending in twenty-two. It would be trivial to store such exceptions within
the full table, but that defeats my purpose in optimizing it; the solution is to have a sparse table
hold these rare exceptions; a reasonably short such table needs no special mechanisms, linear search
would work without issue. It would also be nice if this program would return N, or NIHIL, for zero.

It's particularly stupid to extend the domain of a function through exceptions alone.

It's best to extend the domains of the two tables to include zero, as a blank value, and this allows
for greatly simplifying the composition rule, which truly has no place being represented as a table:

B+S

Domain extended, it's also possible to improve the compression of the table of smaller numbers; with
a granularity of ten made feasible, the fifty-entry table can be reduced to ten entries, though that
two-entry table would be extended to ten. These would be indexed by dividing the index by ten; this
lone division serves to index either table equally well; the tables could even be combined for such:

ten table of positions for the letter L to be appended, with blanks represented as N

N N N N N L L L L N

ten table of indications as to whether the index be in the former (F) or latter (L) half

F F F F F L L L L L

That final result is a very simple program which indexes four tables, and all computation happens in
manipulating the indices or in combining the intermediate results into the final answer. There's no
need for the program to operate in any particular order, either; the value slots of the final answer
can be decided by the worst case, by the longest result, meaning the value of each slot can be given
independently, and an additional composition rule could normalize it to result in the proper answer.

That most efficient representation of the tables could be determined mechanically. For the table of
bigger numbers, two bits suffice to hold one letter, and seven or eight could be stored easily, with
that fourth potential letter used to indicate blanks. The smaller table of smaller numbers could be
represented with two bits for each letter, and the length of the entry with three bits. Those other
two tables could be combined with each other, or with another table. There are possibilities there.

Rather than represent code as data, it's better to eliminate the code entirely, I've grown to think.
A table can be modified while a machine uses it, without any issue whatsoever; a special environment
could help in this, but merely seeing the tables of passive data is as interactive as anything else.

This style is trivial to implement in most any language, and makes it easy to write complex programs
in machine code. As the basis of a programming tool, a program to generate machine code could use a
similar approach and exhaustively list different fragments implementing indexing with various units.

I was satisfied with this, but it occurred to me the inverse problem is also interesting, converting
from the Roman numeral representation to numbers. This is again trivial with the full table, merely
requiring a search, but my current approach doesn't support such a simple search; a composition rule
is easily added for this, however. The typical approach to the problem would be called ``parsing'',
which disgusts me, and requires encoding some model of understanding; this is unnecessary. Analysis
has shown the ranges of the tables of bigger and smaller numbers are nearly-disjoint, and conversion
can occur based on splitting the input when a letter from the latter table is found, to then search.

While this works in all cases, it's not clear that it would avoid error. The composition rule isn't
yet proven, and the domain is extremely large. However, the other composition rules have been shown
to be correct, so they may be used for this. A valid result will result in the same output if given
to the composition rule for conversion to the Roman numeral representation; checking for equivalency
between these suffices to prove the result this rule gives correct or incorrect. Lastly, the sparse
exceptions table poses an issue, but it's trivial to search that first, and any result it gives will
agree with the other composition rules, as they similarly treat it specially. Particularly, if this
weren't done, the exceptions would prove exceptions to the equivalency test, giving different forms.

Follows is the composition rule:

Search the sparse exceptions table, and return any match.
Search the input for the first occurence of I, V, X, or L, and split the input in twain at the seam.
Search that table of bigger numbers for the former half, and return the index of the match.
Search the table of smaller numbers for the latter half, and return the index of the match.
If either search fails, the input be invalid.
Let S be the sum of the former index, multiplied by one hundred, with the latter index.
Convert S to a Roman numeral and ensure it be equal to the input, lest the input be invalid.
Otherwise, S be the result.

Searching the table of smaller numbers can be done by checking whether the latter half begins with L
or not, later ignoring the first letter and adding fifty to the resulting index in this case, and by
permitting that ? to match either of L or C, adding fifty to the resulting index in the latter case.

There are alternative methods for encoding such things, but this suffices; the length of the longest
Roman numeral places an upper bound on the length of the input, giving this domain a clear boundary.
The general solution, simply enumerating the possibilities and matching against them, works well for
such repetitive and unstructured data. I see no better way to test than checking every combination.

Taking advantage of such automated, exhaustive proof provides more time for when this be infeasible.

I'll be removing this sentence once I've an Ada version of this simple program available.

Tabular programming is applicable to many problems, and I want to begin heavily using it. I believe
most programs likely could be and should be written in this style. In writing this, I sparsely used
my machines; sans generating that initial full table and checking my proofs exhaustively, I found it
mostly unnecessary, and it guided none of my thinking about the problem, or how to approach it. The
translation of integers to ordinal or cardinal Roman numbers will make a lovely demonstration, next.