proxy70

Metadata is Useful Where Trust Isn't

I like to think about problems of distributed computation where other participants or communications
channels can't be trusted. There are cases where repeatedly performing such computations results in
an increasingly-certain answer, but I'll ignore this as a solution even though it's my first choice.

I can imagine a problem such as sorting an ordered set of data where it would be easier or faster to
send the set to another machine alongside how to sort it, but that matter of verification comes into
play. Merely returning that ordered set is insufficient, as malicious interference could change its
contents, and validating both sets only differ in order could have worse performance characteristics
than the sorting itself. The APL sorting functions aren't like those in other computer systems, for
they return an array which, when used to index the input, results in the sorted input. Similarly, a
distributed sort should return such metadata. A list of indices can be verified to have no holes or
duplicate entries in linear time and space. Applying such a list to the input is easily reversible,
and checking the resulting set be properly sorted can happen in constant space and also linear time.

Using metadata so is also that basis of all proof of work systems. The easiest way to validate some
downloaded data is to have multiple sources and to ensure they give multiple identical copies, but I
see the obvious issues with this fact: It's possible for one source to masquerade as multiple; there
may be only a single source globally; and some data is simply too big to justify downloading several
times. Bittorrent solves this problem by making the trusted data a much smaller set of metadata and
then using that metadata to verify the much larger downloaded data beyond any reasonable such doubt.

Still, the use of cryptographic hash checksum functions in such ways leaves a bad taste in my mouth.
Someday, it may be shown how these important primitives can't hold to their purpose, which will then
destroy every system which uses them to otherwise achieve what may be impossible results. Still, it
may be the case that these can exist in forms proven to work to their specifications. Personally, I
believe the former case to be more likely, and would rather have a world without trapdoor functions.
Regardless, these tools provide the most popular form of trustless metadata, and now appear to work.