----------------------------------------
Using ptx to generate one-time pads
March 15th, 2018
----------------------------------------
I was working my way through coreutils [0] recently when I came
across ptx.

  $ apropos ptx
  ptx (1) - produce a permuted index of file contents

What the hell does that mean? I know...

  $ man ptx
  PTX(1)                  User Commands                  PTX(1)

  NAME
         ptx - produce a permuted index of file contents

  SYNOPSIS
         ptx [OPTION]... [INPUT]...   (without -G)
         ptx -G [OPTION]... [INPUT [OUTPUT]]

  DESCRIPTION
         Output a permuted index, including context, of the words
         in the input files.

         With no FILE, or when FILE is -, read standard input.

         Mandatory arguments to long options are mandatory for
         short options too.

         ...

Oh, that totally clears it... nope. Still no clue.

So I asked on Mastodon and a few people had some suggestions. In
particular, someone was able to shoot me over to a blog post [1]
which tries to clear up what a 'permuted index' even is. And
that's the key. So check this out:

A while back, before we had badass search engines and hyperlinked
doom shenanigans, manually finding the reference to a word in
a document SUUUUUUUUCKED. So they made this index in the back that
listed all the key terms alphabetically in the middle column of
a page. To the left of that word it would list whatever sentence
fragment led up to it. To the right they'd list the sentence
fragment that followed the term. Finally, the page number. With
that you could jump to the page and eyeball-search it yourself.
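
If that's hard to picture, a one-line experiment helps. This is
just a sketch: the sentence is made up, and I'm assuming GNU ptx
with its default settings, which should print one index line per
word, with the context that leads up to the word on the left and
the word plus whatever follows on the right.

```shell
# Feed ptx a single made-up sentence on stdin and look at the
# permuted index it builds from it.
printf 'the quick brown fox\n' | ptx
```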

ptx has been around since System V and it's pretty much useless,
right? Well, foxy, I think I came up with a fun hobby use-case.
Pick a book with a publicly available canonical plain-text
source. Oh, I dunno, head over to Project Gutenberg [2] or
something and wrestle yourself up some Joyce (or ILLEGAL GERMAN
NOVELS!!!!! [3]). We're gonna shove that badboy into ptx like
a champ. Here we go...

  $ curl https://www.gutenberg.org/files/4300/4300-0.txt > ulysses.txt
  $ ptx ulysses.txt
  SCREEN EXPLODES WITH TEXT FOR SEVERAL MINUTES!!!!!

That's not how that works. Back to the manpage!
Hmmm...
...assumes latin-1 charset...
...ignore case, perhaps...
...[.?!][]\"')}]*\\($\\|\t\\| \\)[ \t\n]*...
...Emacs next-error, grumble...
...-w, width, ahha...
ROFF! NO FUCKING WAY!
One of the output formats for ptx is freaking roff! Synchronicity,
baby! [4] Let's try something a little smaller.

  $ curl http://www.gutenberg.org/cache/epub/1065/pg1065.txt > theraven.txt
  $ ptx -O -f -w 66 theraven.txt > theraven-index.txt

That sorta works. Ugh, but I'm getting tired. Here's the plan for
what's next:

- Figure out how to format this stuff so I can awk it
- awk so that the keyword and one more word to the right are
  the output. Two words with a space between, that's it.
- sort unique that bad-boy by each column in turn so every word
  shows up only once per column. That keeps the swap reversible.
- Use whatever words are in your first column to write a plain
  text message. If your source document is large enough that's
  virtually any word you'd like to use.
- Use awk to replace your words with the one to the right via
  a lookup file
- Send secret message to a friend. The knowledge of which book
  is your cypher is all that's necessary to repeat the process
  in reverse.
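
The first three steps could be sketched roughly like this. It
assumes GNU ptx, whose roff output (-O) puts each entry on one line
as .xx "tail" "before" "keyword_and_after" "head", so splitting on
double quotes leaves the keyword and what follows it in field 6.
build_pairs is a made-up helper name, the sample sentence is
a stand-in for a real book, and lines where the text carries its own
double quotes are simply skipped.

```shell
#!/bin/sh
# Hypothetical helper: print a two-column lookup table for the
# plain-text file given as $1.
build_pairs() {
    ptx -O -f -w 66 "$1" |
    awk -F'"' '
        NF > 9 { next }            # text had its own quotes; skip it
        {
            n = split($6, w, " ")  # keyword first, then what follows
            if (n < 2) next        # nothing to the right of the keyword
            a = tolower(w[1]); b = tolower(w[2])
            # keep only plain alphabetic words
            if (a ~ /^[a-z]+$/ && b ~ /^[a-z]+$/) print a, b
        }' |
    LC_ALL=C sort -u -k1,1 |       # each keyword at most once
    LC_ALL=C sort -u -k2,2         # each replacement at most once
}

# Demo on a throwaway sample instead of a whole book.
sample=$(mktemp)
printf 'the quick brown fox jumps over the lazy dog\n' > "$sample"
build_pairs "$sample"
rm -f "$sample"
```

Point it at theraven.txt and redirect the output to get a lookup
file. Everything is lowercased along the way, so messages have to be
written in lowercase too.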

Huzzah for secret codes.

If I get some time this weekend I'll look at writing a script to
automate this for you. Provide a book and a message and indicate
whether to encode or decode. Oh what fun that would be for some
private crypto. Thinking you could do this in Perl? Wanna show me
up? Put your illogical collection of special characters where your
mouth is, buddy!
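
For anyone taking the bait, the encode/decode half might look
something like this. It's a sketch, not a finished tool: swap_words
is a made-up name, the three-line table is hand-written to stand in
for one built from a real book, and words that aren't in the table
pass through untouched.

```shell
#!/bin/sh
# Hypothetical helper: swap words using a two-column lookup file.
#   $1  "encode" (column 1 -> column 2) or "decode" (column 2 -> column 1)
#   $2  lookup file, one "word replacement" pair per line (must not be empty)
# The message is read from stdin.
swap_words() {
    awk -v mode="$1" '
        NR == FNR {                # first input: load the lookup table
            if (mode == "encode") map[$1] = $2; else map[$2] = $1
            next
        }
        {                          # second input: the message on stdin
            for (i = 1; i <= NF; i++) if ($i in map) $i = map[$i]
            print
        }' "$2" -
}

# Demo with a tiny hand-made table.
table=$(mktemp)
printf 'meet once\nat upon\nmidnight dreary\n' > "$table"

echo 'meet at midnight' | swap_words encode "$table"
# prints: once upon dreary

echo 'once upon dreary' | swap_words decode "$table"
# prints: meet at midnight

rm -f "$table"
```

Decoding only works because each word appears once per column. If
a word showed up twice in the second column, the later pair would
silently win.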