Tabular Programming: Roman Numerals Machines needn't compute as humans do; machines shouldn't compute as humans do. Metaprogramming has shown me that all automatic reckoning eventually explicitly states a subset of the possible answers, somewhere; I'd find myself trying to build the advanced metaprograms I'd thought were possible, only to find increasingly advanced templating as the only reasonable solution; structured metaprogramming is superior; it's clear to me the majority of useful automated program synthesis isn't unstructured. Computation is disgusting where it be unnecessary, and I've noticed a great deal can be unnecessary. Working with my machine text representation has had me thinking about tables so often, and of tables referencing tables, and of using but the barest algorithms to compose their saved answers into more. It's uncommon to work backwards from tables of answers, but I now believe this approach is superior. Representing numbers as Roman numerals is easy for men to figure. A machine can use better methods. Programs men would use for this are useful here for creating the initial tables, but won't be shown. I first wrote a program to generate all Roman numerals, and compared the table of answers to another implementation to increase my confidence. Analyzing my table makes apparent some important details: the largest Roman numeral is fifteen letters long; the range of values fits in three bits, being one of I, V, X, L, C, D, or M, and leaving room for an eighth value; and the table hosts thirty-thousand letters, though a representation using the same amount of storage for each would nearly double this. The table is a complete definition of the function, but such definition is akin to knowledge without wisdom. It's trivial to use a simple composition rule and smaller tables which can be made equal to the former table, and it's necessary for using this approach with greater and more complex problems. The obvious way to simplify this single large table is with two smaller tables: four-by-ten table of bigger numbers C CC CCC CD D DC DCC DCCC CM M MC MCC MCCC MCD MD MDC MDCC MDCCC MCM MM MMC MMCC MMCCC MMCD MMD MMDC MMDCC MMDCCC MMCM MMM MMMC MMMCC MMMCCC MMMCD MMMD MMMDC MMMDCC MMMDCCC MMMCM ten-by-ten table of smaller numbers I II III IV V VI VII VIII IX X XI XII XIII XIV XV XVI XVII XVIII XIX XX XXI XXII XXIII XXIV XXV XXVI XXVII XXVIII XXIX XXX XXXI XXXII XXXIII XXXIV XXXV XXXVI XXXVII XXXVIII XXXIX XL XLI XLII XLIII XLIV XLV XLVI XLVII XLVIII XLIX L LI LII LIII LIV LV LVI LVII LVIII LIX LX LXI LXII LXIII LXIV LXV LXVI LXVII LXVIII LXIX LXX LXXI LXXII LXXIII LXXIV LXXV LXXVI LXXVII LXXVIII LXXIX LXXX LXXXI LXXXII LXXXIII LXXXIV LXXXV LXXXVI LXXXVII LXXXVIII LXXXIX XC XCI XCII XCIII XCIV XCV XCVI XCVII XCVIII XCIX C The domain of the former is one-to-thirty-nine and of the latter is one-to-one-hundred. The largest value in the former table is seven letters long, and the largest value in the latter table is eight. The composition rule is itself also easily expressed as a table: Let B be the input divided by one hundred, rounding down. Let S be the input modulo one hundred. Index the abstract table accordingly. 0 ... 0 S ... B B+S In the case where B be zero, only the table of smaller numbers be used, indexed by S. In the case where S be zero, only that table of bigger numbers be used, indexed by B. In the other case, both be used, with their results concatenated. The case in which both be zero is impossible due to the domain not including zero. These two tables and their composition rule can be proven correct trivially, merely by comparing the results for each value of the domain with the larger and complete table; programming this so is thus equivalent to intelligently-guided compression. Analyzing these tables makes clear how their ranges are nearly-disjoint, and can thus be encoded more efficiently, alongside other details; importantly, such a result is evidence that this is a particularly good and natural segmentation of this problem. The range of values in that former table are C, D, and M; those of the latter are I, V, X, L, and C. That larger table is always suspect, as a single error within is damning. This new approach reduces manually checking one below four thousand entries to checking a mere one hundred and thirty-nine and the validity of the rule. The table of bigger numbers can be further reduced trivially, by noticing the predictable repetition of the letter M, but I don't find the added complexity of another rule to be worthwhile in that; I'm much more interested in reducing the size of the table of smaller numbers and I believe it to be better to have fewer, larger tables using simple rules, than the alternation. Simply splitting symmetrically shows simple similarities: I LI II LII III LIII IV LIV V LV VI LVI VII LVII VIII LVIII IX LIX X LX XI LXI XII LXII XIII LXIII XIV LXIV XV LXV XVI LXVI XVII LXVII XVIII LXVIII XIX LXIX XX LXX XXI LXXI XXII LXXII XXIII LXXIII XXIV LXXIV XXV LXXV XXVI LXXVI XXVII LXXVII XXVIII LXXVIII XXIX LXXIX XXX LXXX XXXI LXXXI XXXII LXXXII XXXIII LXXXIII XXXIV LXXXIV XXXV LXXXV XXXVI LXXXVI XXXVII LXXXVII XXXVIII LXXXVIII XXXIX LXXXIX XL XC XLI XCI XLII XCII XLIII XCIII XLIV XCIV XLV XCV XLVI XCVI XLVII XCVII XLVIII XCVIII XLIX XCIX L C The first half of that table contains no instances of the letter C, and the latter half only does so in the last eleven entries, but the position of the letter L in the latter half is consistent. It's clear this table's size could be halved, by another simple composition rule. It's also obvious that last entry is unused even by the original composition rule, but it makes this symmetry more obvious: five-by-ten table partially representing the table of smaller numbers I II III IV V VI VII VIII IX X XI XII XIII XIV XV XVI XVII XVIII XIX XX XXI XXII XXIII XXIV XXV XXVI XXVII XXVIII XXIX XXX XXXI XXXII XXXIII XXXIV XXXV XXXVI XXXVII XXXVIII XXXIX X? X?I X?II X?III X?IV X?V X?VI X?VII X?VIII X?IX ? The composition rule will be first expressed without tables: Let I be the the index modulo fifty. If the index be in the former half, let ? be the letter L. If the index be in the latter half, let ? be the letter C. Return the contents of the table indexed by I; the index of zero will result in a ?. If the index be in the latter half, but not the final eleven, append a letter L to the result. Sans the division, this can nicely be expressed with the following tables; an additional composition rule can reduce one of the tables in size from one hundred to two, and from one hundred to fifty, as determining which half can be done by index division by fifty, and appending an L done through using a table with fifty entries, use thereof determined by that other, and indexed by index modulo fifty: four-by-two table mapping the former half I I V V X X ? L four-by-two table mapping the latter half I I V V X X ? C five-by-ten table of positions for the letter L to be appended, with blanks represented as N L L L L L L L L L L L L L L L L L L L L L L L L L L L L L L L L L L L L L L L N N N N N N N N N N N two table of indications as to whether the index be in the former (F) or latter (L) half F L While this could halve the number of entries to check, in comparison to the earlier table of smaller numbers, it's not necessarily easier; it's better to check the earlier table, and then to check that this newer representation be equivalent. Importantly, modifying this table requires no checking but with that table which it represents, thus checking one hundred entries and not nearly four thousand. A Latin book I'm reading uses CCCC instead of CD and DCCCC instead of CM. These changes are easy to make by modifying the corresponding table entries, and this is an advantage of programming this way: four-by-ten table of bigger numbers with modifications C CC CCC CCCC D DC DCC DCCC DCCCC M MC MCC MCCC MCCCC MD MDC MDCC MDCCC MDCCCC MM MMC MMCC MMCCC MMCCCC MMD MMDC MMDCC MMDCCC MMDCCCC MMM MMMC MMMCC MMMCCC MMMCCCC MMMD MMMDC MMMDCC MMMDCCC MMMDCCCC Importantly, the largest value in this table of bigger numbers is now eight letters long, not seven. Furthermore, the largest Roman numeral becomes sixteen letters long, not fifteen, with the addition. It's very important to recalculate such information in the cases in which it could change. These changes can be checked for correctness quite trivially by comparing only the changed elements: CCCC CD MCCCC MCD MMCCCC MMCD MMMCCCC MMMCD DCCCC CM MDCCCC MCM MMDCCCC MMCM MMMDCCCC MMMCM An issue with compressing these tables well is adding special cases. I've seen an irregular form of eighteen, XIIX rather than XVIII, which would be interesting to add. However, changing the table of smaller numbers wouldn't only influence numbers ending in eighteen, but those ending in sixty-eight. The twenty-second Roman Legion would apparently write twenty-two as IIXX, and it certainly shouldn't apply to any other number ending in twenty-two. It would be trivial to store such exceptions within the full table, but that defeats my purpose in optimizing it; the solution is to have a sparse table hold these rare exceptions; a reasonably short such table needs no special mechanisms, linear search would work without issue. It would also be nice if this program would return N, or NIHIL, for zero. It's particularly stupid to extend the domain of a function through exceptions alone. It's best to extend the domains of the two tables to include zero, as a blank value, and this allows for greatly simplifying the composition rule, which truly has no place being represented as a table: B+S Domain extended, it's also possible to improve the compression of the table of smaller numbers; with a granularity of ten made feasible, the fifty-entry table can be reduced to ten entries, though that two-entry table would be extended to ten. These would be indexed by dividing the index by ten; this lone division serves to index either table equally well; the tables could even be combined for such: ten table of positions for the letter L to be appended, with blanks represented as N N N N N N L L L L N ten table of indications as to whether the index be in the former (F) or latter (L) half F F F F F L L L L L That final result is a very simple program which indexes four tables, and all computation happens in manipulating the indices or in combining the intermediate results into the final answer. There's no need for the program to operate in any particular order, either; the value slots of the final answer can be decided by the worst case, by the longest result, meaning the value of each slot can be given independently, and an additional composition rule could normalize it to result in the proper answer. That most efficient representation of the tables could be determined mechanically. For the table of bigger numbers, two bits suffice to hold one letter, and seven or eight could be stored easily, with that fourth potential letter used to indicate blanks. The smaller table of smaller numbers could be represented with two bits for each letter, and the length of the entry with three bits. Those other two tables could be combined with each other, or with another table. There are possibilities there. Rather than represent code as data, it's better to eliminate the code entirely, I've grown to think. A table can be modified while a machine uses it, without any issue whatsoever; a special environment could help in this, but merely seeing the tables of passive data is as interactive as anything else. This style is trivial to implement in most any language, and makes it easy to write complex programs in machine code. As the basis of a programming tool, a program to generate machine code could use a similar approach and exhaustively list different fragments implementing indexing with various units. I was satisfied with this, but it occurred to me the inverse problem is also interesting, converting from the Roman numeral representation to numbers. This is again trivial with the full table, merely requiring a search, but my current approach doesn't support such a simple search; a composition rule is easily added for this, however. The typical approach to the problem would be called ``parsing'', which disgusts me, and requires encoding some model of understanding; this is unnecessary. Analysis has shown the ranges of the tables of bigger and smaller numbers are nearly-disjoint, and conversion can occur based on splitting the input when a letter from the latter table is found, to then search. While this works in all cases, it's not clear that it would avoid error. The composition rule isn't yet proven, and the domain is extremely large. However, the other composition rules have been shown to be correct, so they may be used for this. A valid result will result in the same output if given to the composition rule for conversion to the Roman numeral representation; checking for equivalency between these suffices to prove the result this rule gives correct or incorrect. Lastly, the sparse exceptions table poses an issue, but it's trivial to search that first, and any result it gives will agree with the other composition rules, as they similarly treat it specially. Particularly, if this weren't done, the exceptions would prove exceptions to the equivalency test, giving different forms. Follows is the composition rule: Search the sparse exceptions table, and return any match. Search the input for the first occurence of I, V, X, or L, and split the input in twain at the seam. Search that table of bigger numbers for the former half, and return the index of the match. Search the table of smaller numbers for the latter half, and return the index of the match. If either search fails, the input be invalid. Let S be the sum of the former index, multiplied by one hundred, with the latter index. Convert S to a Roman numeral and ensure it be equal to the input, lest the input be invalid. Otherwise, S be the result. Searching the table of smaller numbers can be done by checking whether the latter half begins with L or not, later ignoring the first letter and adding fifty to the resulting index in this case, and by permitting that ? to match either of L or C, adding fifty to the resulting index in the latter case. There are alternative methods for encoding such things, but this suffices; the length of the longest Roman numeral places an upper bound on the length of the input, giving this domain a clear boundary. The general solution, simply enumerating the possibilities and matching against them, works well for such repetitive and unstructured data. I see no better way to test than checking every combination. Taking advantage of such automated, exhaustive proof provides more time for when this be infeasible. I'll be removing this sentence once I've an Ada version of this simple program available. Tabular programming is applicable to many problems, and I want to begin heavily using it. I believe most programs likely could be and should be written in this style. In writing this, I sparsely used my machines; sans generating that initial full table and checking my proofs exhaustively, I found it mostly unnecessary, and it guided none of my thinking about the problem, or how to approach it. The translation of integers to ordinal or cardinal Roman numbers will make a lovely demonstration, next.