PREDICTIVE COMPRESSION

In my last ideas post I mentioned that I'd been thinking about 
alternative data compression and transmission schemes. In this 
respect what I meant was that I had written one of my pages of 
handwritten notes on the topic, without much consideration of prior 
work. Checking now I see that my basic idea is of course well 
established in the very actively researched field of data 
compression. This is good though - it is a significant part of the 
current PAQ compressors, on which the winners (well err, winner) of 
the Hutter Prize have based their work, and compressing that
dataset of English Wikipedia content should be reasonably similar
to compressing things like Gopher content (plus or minus some
markup and ASCII-art). That means it might be applicable to
transmission over low-bandwidth links, such as for my idea of
broadcasting Gopherspace.

gopher://gopherpedia.com:70/0/PAQ
http://prize.hutter1.net/

That said, it's always a little disappointing to find that you 
haven't really thought of anything new. I will provide a brief 
summary of my idea though, ending with one alternative approach 
that perhaps could lead to something with potential.

Basically you have an AI system process the preceding parts (maybe
also the following parts) of the data stream and then propose a
selection of probable sequences to follow it. The depth of context
around the part to be guessed would obviously help it to narrow
down the most likely outcomes. The probability of each of these
outcomes is assigned a value within a pre-defined scale, so if an
identical AI system is used for both compression and
decompression, only the value assigned to the correct outcome
needs to be transmitted - the decompressor simply has to match
that to the corresponding outcome that it calculated itself from
the other data already received. If the likelihood of the correct
outcome is too low for it to be assigned a value on the scale at
all, then that part has to be sent using more conventional
compression methods.
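
To make that a bit more concrete, here's a rough sketch of the
idea in Python. The "AI system" here is just a toy model that
ranks characters by how often they've followed the previous one,
and the escape value for unpredicted characters is my own
invention - a real implementation would use a far stronger
predictor and proper entropy coding of the ranks.

  from collections import defaultdict, Counter

  ESCAPE = 255  # marker for "not predicted" (fine for ASCII text)

  class Order1Model:
      """Toy stand-in for the AI system: ranks next characters by
      how often they've followed the previous character so far."""
      def __init__(self):
          self.counts = defaultdict(Counter)

      def ranked(self, prev, limit=16):
          # Most probable continuations first. Identical on both
          # ends because both update it with the same decoded data.
          return [c for c, _ in self.counts[prev].most_common(limit)]

      def update(self, prev, cur):
          self.counts[prev][cur] += 1

  def compress(text):
      model, out, prev = Order1Model(), [], ""
      for ch in text:
          ranked = model.ranked(prev)
          if ch in ranked:
              out.append(ranked.index(ch))   # send only the rank
          else:
              out.append(ESCAPE)             # fall back to sending
              out.append(ord(ch))            # the character itself
          model.update(prev, ch)
          prev = ch
      return out

  def decompress(codes):
      model, out, prev = Order1Model(), [], ""
      it = iter(codes)
      for code in it:
          if code == ESCAPE:
              ch = chr(next(it))             # literal character
          else:
              ch = model.ranked(prev)[code]  # look up the rank
          out.append(ch)
          model.update(prev, ch)
          prev = ch
      return "".join(out)

  text = "the theory there is that the predictor does the work"
  assert decompress(compress(text)) == text

Most of the transmitted ranks come out as small numbers, which is
what a following entropy coder would squeeze down - roughly the
prediction-plus-arithmetic-coding split that the PAQ family takes
much further.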

Obviously this is computationally expensive (slower) compared with
conventional compression schemes (the PAQ compressors confirm
this), but in the application I have in mind the main thing is
keeping the transmitted size small enough to be practical for
transmission over a slow radio link.

The only thing I'd add to this that doesn't factor into the
implementations that I've read about online (and for good reason
really) is on the subject of this probability scale itself.
Conventionally this would be a list of integers that correspond to
bit values in digital communication. But what if you represent it
within an analogue frequency or voltage range? Then granularity
becomes a question of resolution rather than simply byte size. An
analogue scale could theoretically represent the entire range of
probabilities if the measurement resolution were fine enough to
detect every single possible outcome within that range. In
practice it won't be, but perhaps there's some advantage to this
concept if, over a more restricted range, similar probable
outcomes are grouped together.

For example, say you're measuring voltage within a voltage range
that represents probable outcomes in the data stream. The range
used is only accurately measurable at close to ideal signal
conditions - very little noise. If the signal is degraded by
noise, the value received is incorrect, but because it is still
nearby on the scale the result is at least similar to the correct
outcome. This is in contrast to digital communications, where an
incorrectly received bit can cause a wildly different position
within the scale to be received (and the data is often discarded
entirely after comparing with a CRC value). That said, digital
communications basically operate this same system with only two
outcomes within their range, so they can cope with much more noise
while retaining signal integrity. But this is only a consequence
of throwing away extra resolution in the received analogue signal
that _could_ represent more information.
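
Here's a quick numerical sketch of that, assuming a made-up 0-5V
scale with sixteen outcomes arranged so that neighbours on the
scale are similar (all the numbers here are arbitrary):

  import random

  OUTCOMES = 16            # candidates, similar outcomes adjacent
  V_RANGE = 5.0            # volts available for the whole scale
  STEP = V_RANGE / OUTCOMES

  def encode(rank):
      # Centre of this outcome's slot on the voltage scale.
      return (rank + 0.5) * STEP

  def decode(voltage):
      # Nearest slot wins, so noise tends to push you onto a
      # *neighbouring* (similar) outcome rather than a random one.
      return min(OUTCOMES - 1, max(0, int(voltage / STEP)))

  random.seed(1)
  sent = 7
  for sigma in (0.05, 0.2, 0.5):
      got = decode(encode(sent) + random.gauss(0, sigma))
      print(f"noise sigma {sigma:.2f} V -> decoded rank {got}")

  # Compare with sending the same rank as 4 digital bits: one
  # flipped bit can land anywhere on the scale.
  print("digital, one bit error:", sent ^ 0b1000)   # 7 -> 15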

It is confusing to mix the analogue and digital domains - that
argument could continue in favour of dynamic number bases chosen
according to signal quality: base 2 when signal quality is poor
and only two states can be reliably detected, base 10 when it's
good enough to represent ten points within the voltage or
frequency range (which I guess is an alternative to simply
increasing the transmission speed with base 2 until the signal
quality drops to your limit). But what if it's possible to stay
within the analogue domain? Picture a form of analogue AI computer
capable of generating the analogue voltage value that corresponds
to the correct outcome within the voltage range representing the
scale of probable outcomes, and use that for compression. Then you
have another analogue computer able to reverse that process for
the decompression. Or else, for decompression, you use one that is
identical to that used for compression, but its "correct outcome"
inputs are scanned through the entire range of probabilities
within their own analogue input scale, until its output value is
identical to that generated by the compression computer, at which
time the momentary state of its inputs represents the answer.
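
I can only simulate that digitally, but here's a rough sketch of
the scanning arrangement, treating the analogue computer as a
deterministic function from (context, outcome) to a voltage on the
probability scale - the predictor table and all the names are just
placeholders of mine:

  V_RANGE = 5.0   # volts spanning the scale of probable outcomes

  def predictor(context):
      # Stand-in for the AI part: probable continuations, most
      # likely first. A real system would derive these from the
      # context rather than a hard-coded table.
      table = {"and ": ["the", "then", "there", "they", "them"]}
      return table.get(context, [])

  def analogue_net(context, outcome):
      # Stand-in for the analogue computer: the voltage at this
      # outcome's position on the scale of probable outcomes.
      candidates = predictor(context)
      if outcome not in candidates:
          return None
      return ((candidates.index(outcome) + 0.5)
              / len(candidates) * V_RANGE)

  def compress(context, correct_outcome):
      return analogue_net(context, correct_outcome)  # only this
                                                     # gets sent

  def decompress(context, received_v, tolerance=0.01):
      # Scan candidate outcomes through an identical computer
      # until its output matches what the compressor produced.
      for outcome in predictor(context):
          v = analogue_net(context, outcome)
          if v is not None and abs(v - received_v) < tolerance:
              return outcome
      return None   # no match: fall back to conventional methods

  v = compress("and ", "then")
  print(decompress("and ", v))   # -> "then"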

I can't help but think that last arrangement smells a little of
quantum computing and qubits "collapsing" to a particular state.
I'm not sure how to tie those domains of analogue and quantum
together so that an analogue voltage value within a range would
equate directly to a qubit, but perhaps it's close to something.

I can't draw this to much of a conclusion unfortunately. An
analogue AI computer at a scale to be useful in this application
is bound to be impractical by conventional means, and my equating
analogue electronics with quantum computing may well be complete
nonsense. So while these thoughts suggest to me some machine that
can process an analogue data stream with increased resolution and
data throughput compared with conventional digital techniques, it
may well be completely wrong and backward-looking. On the other
hand it's interesting to consider that the simplification of
analogue values down to just one of two states might represent
just one stage of computer development: one that leads towards
data representations within infinite scales of possibility, and
calculation using physical behaviours themselves rather than the
layered abstractions of digital electronics.

 - The Free Thinker, 2021.