USAGE AND RECOMPILATION DOCUMENTATION FOR:         8/29/81
     SQ.COM    1.5  File squeezer
     USQ.COM   1.5  File unsqueezer
     FLS.COM   1.1  Ambiguous file name expander

DISTRIBUTION RIGHTS:
I   allow  unrestricted  non-profit  distribution  of   this 
software  and  invite  users groups  to  spread  it  around. 
However,  any distribution for profit requires my permission 
in  advance.  This applies only to the above listed programs 
and their program source and documentation files.  I do sell  
other software.

PURPOSE:
The file squeezer,  SQ, compresses files into a more compact 
form.  This provides:
     1.   Faster transmission by modem.
     2.   Fewer diskettes to distribute a program  package. 
          (Include USQ.COM and instructions, both unsqueezed.)
     3.   Fewer diskettes for archival storage.

Any file can be squeezed,  but program source files and text 
files  benefit the most,  typically shrinking by 35%.  Files 
containing only a limited character set,  such as dictionary 
files,  may shrink as much as 48%.  Squeezed files look like 
gibberish and must be unsqueezed before they can be used.

The  unsqueezer,  USQ,  expands  squeezed files  into  exact 
duplicates  of the original or provides a quick,  unsqueezed 
display  of  the  tops  of  (or  all  of)  squeezed   files. 
Unsqueezing requires only a single pass.

Both SQ and USQ accept batches of work specified by lists of 
file  names  (with  drives  if  needed)  and   miscellaneous 
options. They accept these parameters in any of three ways:

     1. On the CP/M command line.
     2. From the console keyboard.
     3. From a file.

The  FLS program can be used (on the same command line!)  to 
expand parameter lists containing wild-card (ambiguous) file 
names into lists with the specific file names required by SQ 
and USQ.

This  combination of programs allows you to issue  a  single 
command which will produce many squeezed or unsqueezed files 
from and to various diskettes. For example, to unsqueeze all 
squeezed  ASM files on drive B and send the results to drive 
C  and also unsqueeze all squeezed TXT files on drive A  and 
send the results to drive D:
     A>fls c: b:*.aqm d: *.tqt |usq
For detailed instructions see USAGE TUTORIAL and USAGE SUMMARY.

This  DOES  run under plain old vanilla CP/M!  Many  of  the 
smarts  are buried in the COM files in the form  of  library 
routines  provided  with the BDS C package  (available  from 
Lifeboat).

The  above example simulates a "pipe" (indicated by the "|") 
by sending the "console" output of the fls.com program to  a 
temporary  file  and  then running the usq.com program  with 
options  which  cause  it to read its  parameters  from  its 
"console" input, which is really redirected to come from the 
temporary file.

Note that programs written in BDS C tend to be GOable. That is,
if you do A>save 0 GO and then run a C program (just one, no
pipes), you can rerun it without reading it from disk by using GO
as its name and giving the usual parameters. This works because
BDS C doesn't support initialized static variables. The program
has to initialize everything dynamically, so it cleans up for
each rerun.

THEORY:
The  data  in the file is treated at the byte  level  rather 
than  the word level,  and can contain absolutely  anything. 
The compression is in two stages: first repeated byte values 
are  compressed  and  then a  Huffman  code  is  dynamically 
generated  to match the properties of each particular  file. 
This requires two passes over the source data.
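
A minimal sketch of the first stage, written in modern C rather than taken from the BDS C source, is shown below. The marker byte 0x90, the shortest encoded run of 3 and the longest of 255 are assumptions chosen for illustration; the source files define the real format. A run is written as the byte, the marker and a count, and a literal marker byte is escaped as the marker followed by 0, so the decoder can always tell the two apart.

```c
#include <stddef.h>

#define RLE_MARK 0x90  /* assumed marker byte */
#define RLE_MIN  3     /* assumed shortest run worth encoding */

/* Encode src[0..n) into dst and return the number of bytes written.
   Worst case output is 2*n bytes (every input byte is the marker). */
size_t rle_encode(const unsigned char *src, size_t n, unsigned char *dst)
{
    size_t i = 0, out = 0;
    while (i < n) {
        unsigned char c = src[i];
        if (c == RLE_MARK) {               /* escape a literal marker byte */
            dst[out++] = RLE_MARK;
            dst[out++] = 0;
            i++;
            continue;
        }
        size_t run = 1;                    /* measure the run, capped at 255 */
        while (i + run < n && src[i + run] == c && run < 255)
            run++;
        if (run >= RLE_MIN) {              /* byte, marker, count */
            dst[out++] = c;
            dst[out++] = RLE_MARK;
            dst[out++] = (unsigned char)run;
        } else {                           /* short run: copy as is */
            for (size_t k = 0; k < run; k++)
                dst[out++] = c;
        }
        i += run;
    }
    return out;
}

/* Decode src[0..n) into dst and return the number of bytes written. */
size_t rle_decode(const unsigned char *src, size_t n, unsigned char *dst)
{
    size_t i = 0, out = 0;
    unsigned char prev = 0;
    while (i < n) {
        unsigned char c = src[i++];
        if (c != RLE_MARK) {               /* ordinary byte */
            dst[out++] = prev = c;
        } else if (i < n) {
            unsigned char count = src[i++];
            if (count == 0) {              /* escaped literal marker */
                dst[out++] = prev = RLE_MARK;
            } else {                       /* prev was already written once */
                for (unsigned k = 1; k < count; k++)
                    dst[out++] = prev;
            }
        }
    }
    return out;
}
```

For the input AAAAB followed by a marker byte and C, the encoder emits A, marker, 4, then B, then marker, 0, then C, and the decoder restores the original seven bytes.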

The  decoding  table is included in the  squeezed  file,  so 
squeezing  short  files can actually  lengthen  them.  Fixed 
decoding  tables  are not used because  English and  various 
computer  languages vary greatly as to upper and lower  case 
proportions  and  use of special  characters.  Much  of  the 
savings  comes  from  not  assigning codes  to  unused  byte 
values.
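
The way unused byte values drop out can be seen in a textbook Huffman construction. The sketch below (the function name huffman_lengths is hypothetical, and this is not SQ's own tree code) takes the byte counts gathered on the first pass and returns a code length for each byte value. Bytes that never occur get length 0, so they cost nothing in the table. Building the actual bit patterns and writing the decoding table are not shown.

```c
#include <stddef.h>

#define NSYM 256

/* Compute Huffman code lengths from byte counts. Unused bytes get 0.
   Uses a simple O(n^2) search for the two lightest nodes, which is
   adequate for 256 symbols. */
void huffman_lengths(const unsigned long count[NSYM], int len[NSYM])
{
    unsigned long weight[2 * NSYM];
    int parent[2 * NSYM], alive[2 * NSYM], node_of[NSYM];
    int nodes = 0;

    for (int s = 0; s < NSYM; s++) {     /* one leaf per byte that occurs */
        len[s] = 0;
        node_of[s] = -1;
        if (count[s] > 0) {
            node_of[s] = nodes;
            weight[nodes] = count[s];
            parent[nodes] = -1;
            alive[nodes] = 1;
            nodes++;
        }
    }
    if (nodes == 1) {                    /* a single distinct byte still needs 1 bit */
        for (int s = 0; s < NSYM; s++)
            if (node_of[s] >= 0)
                len[s] = 1;
        return;
    }
    for (int live = nodes; live > 1; live--) {
        int a = -1, b = -1;              /* the two lightest live nodes */
        for (int i = 0; i < nodes; i++) {
            if (!alive[i])
                continue;
            if (a < 0 || weight[i] < weight[a]) { b = a; a = i; }
            else if (b < 0 || weight[i] < weight[b]) b = i;
        }
        weight[nodes] = weight[a] + weight[b];  /* merge them under a new node */
        parent[nodes] = -1;
        alive[nodes] = 1;
        parent[a] = parent[b] = nodes;
        alive[a] = alive[b] = 0;
        nodes++;
    }
    for (int s = 0; s < NSYM; s++) {     /* code length = depth of the leaf */
        int d = 0;
        for (int i = node_of[s]; i >= 0 && parent[i] >= 0; i = parent[i])
            d++;
        len[s] = d;
    }
}
```

With counts of 4, 2 and 1 for a, b and c, the lengths come out as 1, 2 and 2 bits, and every other byte value gets no code.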

More detailed comments are included in the source files.

USAGE TUTORIAL:
As usual, you have to learn how to tell the programs what to 
do  (i.e.,  what parameters to type after the program name). 
First I will introduce the various possibilities by example. 
Then I will summarize the rules.

In  the simplest case either SQ or USQ can simply  be  given 
one or more file names (with or without drive names):
     A>sq xyz.asm
     A>sq thisfile.doc b:thatfile.doc
will   create  squeezed  files  xyz.aqm,   thisfile.dqc  and 
thatfile.dqc,  all  on the current drive,  A.  The  original 
files are not disturbed. Note that the names of the squeezed 
files are generated by rules - you don't specify them.

Likewise,
     A>usq xyz.aqm
will  create file xyz.asm on the A  drive,  overwriting  the 
original.  (The  original name is recreated from information 
stored in the squeezed version.) The squeezed version is not 
disturbed.

Each file name is processed in order,  and you can list  all 
the files you can fit in a command.  The file names given to 
SQ and USQ must be specific. You will learn below how to use 
the FLS program to expand patterns like *.asm (all files  of 
type  asm) into a list of specific names and feed them  into 
SQ or USQ.

The above examples let the destination drive default to  the 
current logged drive, which was shown in the prompt to be A. 
You can change the destination drive as often as you like in 
the parameter list. For example,
     A>sq x.asm b: y.asm z.asm c: d:s.asm
will create x.aqm on the current drive,  A,  y.aqm and z.aqm 
on the B drive and s.aqm on the C drive. Note that the first 
three originals are on drive A and the last one is on  drive 
D.  Remember  that each parameter is processed in order,  so 
you must change the destination drive before you specify the 
files to be created on that drive.
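
The in-order rule can be modelled as a walk over the parameter list that remembers the current destination drive. In this sketch (the function name destinations is hypothetical) a parameter consisting of a letter and a colon changes the drive, options beginning with - are skipped, and every other parameter is a file that goes to the drive in effect when it is reached. The drive on a source file name, as in d:s.asm, does not affect the destination.

```c
#include <ctype.h>

/* For each file parameter, record the destination drive in effect when
   it is reached. Returns the number of file parameters found. */
int destinations(int n, const char *params[], char logged, char dest[])
{
    char current = (char)toupper((unsigned char)logged);  /* starts at logged drive */
    int files = 0;
    for (int i = 0; i < n; i++) {
        const char *p = params[i];
        if (isalpha((unsigned char)p[0]) && p[1] == ':' && p[2] == '\0')
            current = (char)toupper((unsigned char)p[0]);  /* "b:" changes it */
        else if (p[0] != '-')
            dest[files++] = current;      /* a file: use the current drive */
    }
    return files;
}
```

With the parameters of the command above and A as the logged drive, the recorded drives are A, B, B and C.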

Eventually you will have diskettes with many squeezed  files 
on  them and you will wonder what is in which file.  If they 
weren't  squeezed you would use the TYPE command to look  at 
the  comments at the beginning of the  files.  But  squeezed 
files  just  make  a mess on your CRT screen when  you  TYPE 
them,  so  I have provided the required feature as a preview 
option to the USQ program.
     A>usq -10 x.bqs b:y.aqm
will  not take the time to create unsqueezed files.  Instead 
it  will  unsqueeze  the first 10 lines  of  each  file  and 
display  them  on your console.  The display from each  file 
consists of the file names, the data and a formfeed (FF).
Also,
     A>usq - c:xyz.mqc
will  unsqueeze  and display the first 65,535 lines  of  any 
files listed. That's the biggest number you can give it, and 
is intended to display the whole file.

This   preview  option  also  ensures  that  the   data   is 
displayable.  The  parity bit is stripped off (some WordStar 
files  use  it for format control) and any  unusual  control 
characters  are  converted to periods.  You'll see  some  of 
these  at  the end of the files as the CP/M end of  file  is 
treated  as  data  and  the  remainder  of  the  sector   is 
displayed.

You are now familiar with all of the operational  parameters 
of SQ and USQ.  But so far you have always typed them on the 
command line which caused the program to be run. For reasons 
which  will become apparent later,  I have also provided  an 
interactive  mode.   If  there  are  no  parameters  (except 
directed  i/o  parameters,  described later) on the  command 
line,  SQ  and USQ will prompt with an asterisk  and  accept 
parameters from the console keyboard. Each parameter must be 
followed  by  RETURN and will be processed  immediately.  An 
empty  command (just RETURN) will cause the program to  exit 
back  to  CP/M.  Try it - it will help you  understand  what 
follows.

Now let's get into directed i/o, which will be new to most of 
you,  but will save you so much work you will wonder how you 
ever got along without it.

Perhaps you frequently squeeze or unsqueeze the same list of 
files  and you would like to type the list once and be  done 
with it. Use an editor (or FLS, described below) to create a 
file  with  one  parameter per line.  For  example  call  it 
commands.lst.

Then,
A>sq <commands.lst
will  cause the command list file to be read as if you  were 
typing it!

That was redirected console input.  Now assume that you have 
a very long list of files to squeeze or unsqueeze and  while 
you  are  taking a nap the progress comments and maybe  some 
error  comments  scroll  off  the  screen.  Redirecting  the 
console   output   will  let  you  capture   the   progress 
information  in a file so you can check it later.  The error 
comments will have the screen to themselves.

For example,
A>sq <commands.lst >out
will send the progress comments to the file "out", which you 
can TYPE later.  The routine display of the program name and 
version, etc., will still go to the console.

A more practical example is to send that information to  the 
console and to the file.
A>sq <commands.lst +out
will do that.

Redirected  input  and output are independent - you  can  do 
either, both or neither.

There is one more form of redirection called a "pipe". It is 
by far the most important to you.  Recall that I promised to 
tell  you how to use ambiguous file names such as *.asm (all 
files  of  type asm on the current default drive)  or  *.?q? 
(all files having a "q" as the second letter of their type). 
That last example just happens to mean "all squeezed files", 
assuming  you don't have any other files with such  a  silly 
name (I hope).

I  have  provided  a program called FLS  which  is  intended 
primarily for use in pipes. Here is an example:
A>fls c: x.asm y*.asm >temp.$$$
will  simply  pass the first two parameters through  to  the 
console output,  which is being redirected to a file  called 
temp.$$$.  But  the third parameter will be replaced by  all 
the  files  on the current drive which are of type  asm  and 
have names beginning with y.

FLS  is  smart  enough to know that a letter followed  by  a 
colon and nothing else is a destination drive name  intended 
for  SQ or USQ.  It will also treat any parameter  beginning 
with  a  - (minus sign) as an option to be  passed  through. 
Anything  else  is considered a file name or pattern and  is 
checked against the directory of the appropriate drive.

Therefore you could use:
A>fls b: c:*.aqm *.aqm -10 stuff.dqc >temp.$$$
A>usq <temp.$$$
A>era temp.$$$
to unsqueeze all files of type aqm on drives C and A and put 
the unsqueezed files on drive B,  and then preview the first 
10 lines of file stuff.dqc.

Here  is where the pipe comes in.  The above three  commands 
can be abbreviated as:
A>fls b: c:*.aqm *.aqm -10 stuff.dqc |usq

That  little  "|" is the pipe option and it causes  the  FLS 
output to be redirected to a temporary file and when that is 
done  it  actually  runs USQ for you with the  proper  input 
redirection and then erases the temporary file.

If  that  isn't  enough,  you  can still  use  the  +  or  > 
redirection  option  at the end of that line to capture  the 
console output from USQ.
A>fls b: c:*.aqm *.aqm -10 stuff.dqc |usq >out

If you plan your comments carefully you can produce a single 
file containing an abstract of an entire library of squeezed 
files in one step!
A>fls -25 *.?q? |usq >abstract

One  final point.  Anywhere you specify a file name you  can 
specify a drive in front of it.  That applies to redirection 
as well as files to be squeezed and unsqueezed.  If a  name 
begins  with a - (minus sign) it will look like an option to 
FLS unless you put a drive name in front of it (b:-sq.077).

USAGE SUMMARY:
The previous section gradually presented the various options 
by example. This section gives a condensed and more abstract 
description  and is intended for reference.  If you couldn't 
see  the forest for the trees,  maybe this will give  you  a 
better view.

The parameter handling of these programs is straightforward. 
Parameters  fall into two classes:  directed i/o options and 
operational parameters. Note that parameters read from files 
or  from the console are not forced to upper case,  but  the 
internal  file  handling routines all treat  lower  case  as 
upper case.

When  a  file to be written already exists,  it  is  quietly 
overwritten.



Directed I/O parameters:
The  first  action  taken by these programs  is  to  process 
directed  i/o parameters from the CP/M command  line.  These 
parameters are optional and take the forms:

     <file     read console input from file
     >file     send most console output to file
     +file     send most console output to file and console
     |pgm ...  send most console output to a temporary file
               then run PGM.COM and take console input
               from the temporary file. "..." represent the
               parameters for PGM. This is called "piping".

Only  one input and one output redirection can apply to each 
program. After the program has arranged for any directed i/o 
parameters to be obeyed they are deleted from the  parameter 
list seen by the rest of the program.
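
A sketch of that first step in modern C is shown below. The struct redirect and the function strip_redirection are hypothetical; the real work is done by the directed i/o library described in DIO2.C. Each <, > or + parameter is recorded and removed, everything after a |pgm parameter is set aside as PGM's own command line, and the remaining operational parameters are packed down in their original order. A second input or output redirection is rejected, and the pipe counts as the output redirection.

```c
#include <stddef.h>

/* Hypothetical record of the redirections found on a command line. */
struct redirect {
    const char *in;      /* <file */
    const char *out;     /* >file or +file */
    int tee;             /* 1 for +file: output also goes to the console */
    char **pipe_argv;    /* |pgm ... : program name and its parameters */
    int pipe_argc;
};

/* Remove directed i/o parameters from argv, keeping operational ones in
   order. Returns the new argc, or -1 on a second input or output. */
int strip_redirection(int argc, char **argv, struct redirect *r)
{
    int kept = 1;                        /* argv[0] is the program name */
    r->in = NULL;
    r->out = NULL;
    r->tee = 0;
    r->pipe_argv = NULL;
    r->pipe_argc = 0;
    for (int i = 1; i < argc; i++) {
        char *a = argv[i];
        if (a[0] == '<') {
            if (r->in != NULL) return -1;
            r->in = a + 1;
        } else if (a[0] == '>' || a[0] == '+') {
            if (r->out != NULL) return -1;
            r->out = a + 1;
            r->tee = (a[0] == '+');
        } else if (a[0] == '|') {
            if (r->out != NULL) return -1;   /* the pipe is an output too */
            argv[i] = a + 1;                 /* drop '|' to get the program name */
            r->pipe_argv = &argv[i];
            r->pipe_argc = argc - i;         /* PGM plus its own parameters */
            break;                           /* the rest belongs to PGM */
        } else {
            argv[kept++] = a;                /* operational parameter */
        }
    }
    return kept;
}
```

For fls -10 b:*.cq |usq +saveout, the function leaves fls -10 b:*.cq for FLS itself and sets aside usq +saveout as the piped command line.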

Operational parameters:
The   program  then  checks  if  there  are  any   remaining 
parameters from the CP/M command line.  If there  are,  they 
are obeyed. If and only if there are no remaining parameters 
on  the  command line,  the program prompts for them at  the 
console.  If  console input has been directed to a file  one 
parameter  is  read and obeyed from each line of  the  file. 
Otherwise,  the  user  follows each typed parameter  with  a 
RETURN and an empty command exits the program.
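
Reading one parameter per line from the keyboard or from a redirected file might look like the sketch below. The function name next_param is hypothetical, and stopping at an empty line in a file (as well as at the end of the file) is an assumption made so that the keyboard and file cases share one loop.

```c
#include <stdio.h>
#include <string.h>

/* Read the next parameter line from `in` into buf. If `prompt` is not
   NULL, an asterisk is printed there first, as in interactive mode.
   Returns 1 if a parameter was read, 0 at end of file or on an empty line. */
int next_param(FILE *in, FILE *prompt, char *buf, size_t size)
{
    if (prompt != NULL) {
        fputc('*', prompt);
        fflush(prompt);
    }
    if (fgets(buf, (int)size, in) == NULL)
        return 0;                         /* end of file */
    buf[strcspn(buf, "\r\n")] = '\0';     /* drop the line terminator */
    return buf[0] != '\0';                /* empty command ends the list */
}
```

A caller would loop while next_param returns 1 and obey each parameter in turn, passing stdout as the prompt stream only when input comes from the keyboard.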

Each  operational parameter is obeyed without looking  ahead 
to  other  parameters,  so options should precede  the  file 
names to which they apply.

SQ operational parameters are a list of the following types:
     drive:         set the current destination drive
     filename       file to be squeezed
     drive:filename  "   "    "    "
     -              Toggle debug mode (dumps tables)

SQ does not change the files being squeezed. New, squeezed 
files are created on the destination drive (defaults to  the 
current drive) with names derived from the original name but 
with  the second letter of the file type (extension) changed 
to Q.  When there is no type, QQQ is used. The original name 
is saved in the squeezed file.
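
The naming rule can be sketched as follows; the function name squeezed_name is hypothetical. Appending Q to a one-letter type is inferred from names such as xyz.cq used elsewhere in this document. The drive prefix is dropped because the output goes to the destination drive, and the result is upper case because CP/M file names are. Saving the original name inside the squeezed file is not shown.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Build the squeezed name for an original such as "b:thatfile.doc".
   Returns 0 on success, -1 if out (outsz bytes) is too small. */
int squeezed_name(const char *in, char *out, size_t outsz)
{
    const char *colon = strchr(in, ':');
    const char *name = colon ? colon + 1 : in;       /* skip the drive */
    const char *dot = strchr(name, '.');
    size_t base = dot ? (size_t)(dot - name) : strlen(name);
    const char *type = dot ? dot + 1 : "";
    size_t tlen = strlen(type);
    char newtype[4];

    if (tlen == 0) {                     /* no type: QQQ */
        strcpy(newtype, "QQQ");
    } else if (tlen == 1) {              /* one letter: append Q (inferred) */
        newtype[0] = type[0];
        newtype[1] = 'Q';
        newtype[2] = '\0';
    } else {                             /* otherwise: second letter becomes Q */
        if (tlen > 3)
            tlen = 3;
        memcpy(newtype, type, tlen);
        newtype[tlen] = '\0';
        newtype[1] = 'Q';
    }
    if (base + 1 + strlen(newtype) + 1 > outsz)
        return -1;
    size_t k = 0;
    for (size_t i = 0; i < base; i++)
        out[k++] = (char)toupper((unsigned char)name[i]);
    out[k++] = '.';
    for (size_t i = 0; newtype[i] != '\0'; i++)
        out[k++] = (char)toupper((unsigned char)newtype[i]);
    out[k] = '\0';
    return 0;
}
```

So xyz.asm becomes XYZ.AQM, b:thatfile.doc becomes THATFILE.DQC and a file with no type gets .QQQ.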

USQ  operational  parameters  are a list  of  the  following 
types:
     drive:         set the current destination drive
     filename       file to be unsqueezed
     drive:filename  "   "    "    "
     -count         Preview (display on the console) the first
                    "count"  lines  of  each   file,   where 
                    "count" is a number from 1 to 65535.
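
Decoding the -count option is a small parsing job. In this sketch (the name parse_count is hypothetical) a bare - yields the maximum, 65535, and a return value of 0 signals a malformed or out-of-range count, since 0 is not a legal count.

```c
#include <ctype.h>

/* Parse a USQ "-count" option. Returns 1..65535, or 0 if invalid. */
unsigned parse_count(const char *opt)
{
    if (opt[0] != '-')
        return 0;                        /* not an option at all */
    if (opt[1] == '\0')
        return 65535u;                   /* bare "-": show the whole file */
    unsigned long v = 0;
    for (const char *p = opt + 1; *p != '\0'; p++) {
        if (!isdigit((unsigned char)*p))
            return 0;
        v = v * 10 + (unsigned long)(*p - '0');
        if (v > 65535ul)
            return 0;                    /* out of range */
    }
    return (unsigned)v;                  /* "-0" yields 0, i.e. invalid */
}
```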

If  the  -count  option IS NOT in effect  then  USQ  creates 
unsqueezed  versions of the listed files on the  destination 
drive,  which  defaults to the current  logged  drive.  Each 
unsqueezed  file is CRC checked against the CRC value of the 
original file, which is part of the squeezed file.
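
The verify step amounts to recomputing a 16-bit value over the unsqueezed bytes and comparing it with the value stored in the squeezed file. The exact function is in the source files and is not reproduced here. As a stand-in, this sketch uses a simple 16-bit additive checksum, which is an assumption and not necessarily what USQ computes; the names check16 and verify_unsqueezed are hypothetical.

```c
#include <stddef.h>

/* Stand-in check value (assumption): the 16-bit sum of all bytes,
   wrapping modulo 65536. */
unsigned check16(const unsigned char *data, size_t n)
{
    unsigned sum = 0;
    for (size_t i = 0; i < n; i++)
        sum = (sum + data[i]) & 0xFFFFu;
    return sum;
}

/* Return 1 if the rebuilt data matches the value stored in the squeezed file. */
int verify_unsqueezed(const unsigned char *data, size_t n, unsigned stored)
{
    return check16(data, n) == (stored & 0xFFFFu);
}
```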

The  -count  option is for  previewing  squeezed  files.  It 
allows  you  to  skim  through a group  of  squeezed  files, 
peeking  at  the first "count" lines in each.  The  >  or  + 
output  redirection  option could be used  to  capture  this 
information  in a file,  along with the corresponding  file 
names, thus forming an abstract of the files on a disk.

When  the  -count option is used the CRC check is  cancelled 
and  the  output is forced into printable form by  stripping 
the  parity bit and changing most unprintable characters  to 
periods.  The exceptions are CR,  LF, TAB and FF. The output 
from  each file is terminated by an FF.  PIP can be used  to 
strip FFs and provide formatted printing if desired. "Count" 
defaults to the maximum value,  65,535,  in case you want to 
look at a whole file.
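
The preview filter can be sketched as below; the function names are hypothetical, and treating DEL (0x7F) as unprintable is an assumption. Because the CP/M end-of-file byte (control-Z) is a control character, it is shown as a period, which is why periods appear at the end of a previewed file.

```c
#include <stddef.h>

/* Map one byte to its preview form: strip parity, keep CR, LF, TAB
   and FF, and turn other control characters (and DEL) into '.'. */
int preview_char(int c)
{
    c &= 0x7F;                           /* strip the parity bit */
    if (c == '\r' || c == '\n' || c == '\t' || c == '\f')
        return c;
    if (c < 0x20 || c == 0x7F)
        return '.';
    return c;
}

/* Copy at most `count` lines of preview text from src to dst and end
   with a formfeed. dst needs room for n + 1 bytes. Returns bytes written. */
size_t preview(const unsigned char *src, size_t n, unsigned count, char *dst)
{
    size_t out = 0;
    unsigned lines = 0;
    for (size_t i = 0; i < n && lines < count; i++) {
        int c = preview_char(src[i]);
        dst[out++] = (char)c;
        if (c == '\n')
            lines++;                     /* CP/M lines end in CR LF */
    }
    dst[out++] = '\f';                   /* each file's output ends with FF */
    return out;
}
```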

FLS operational parameters:  FLS is a "filter",  which means 
it  accepts input from the console input or command line and 
transforms the input according to a set of rules to  produce 
console  output.  That's fine for getting familiar with FLS, 
but to make it useful you "pipe" its output to the input  of 
SQ or USQ.

Any FLS parameter which is of the form:
     drive:
or   -anything
is  copied  to console output unchanged.
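
The pass-through rule, together with the test for * and ? described next, can be sketched as a classifier. The enum and function names are hypothetical, and the directory lookup that follows is not shown. A name such as b:-sq.077 is treated as a file name because it does not begin with a minus sign.

```c
#include <ctype.h>

/* Hypothetical categories for one FLS parameter. */
enum param_kind { PARAM_DRIVE, PARAM_OPTION, PARAM_NAME, PARAM_PATTERN };

enum param_kind classify_param(const char *p)
{
    /* a letter, a colon and nothing else: destination drive, passed through */
    if (isalpha((unsigned char)p[0]) && p[1] == ':' && p[2] == '\0')
        return PARAM_DRIVE;
    /* anything starting with '-': option, passed through */
    if (p[0] == '-')
        return PARAM_OPTION;
    /* otherwise a file name; * or ? makes it ambiguous */
    for (const char *q = p; *q != '\0'; q++)
        if (*q == '*' || *q == '?')
            return PARAM_PATTERN;
    return PARAM_NAME;
}
```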

Any  other  FLS operational parameter is treated as  a  file 
name and is checked against the directory of the appropriate 
drive. If it contains * or ? it is replaced by a list of all 
the files which fit the pattern.  If nothing is found in the 
directory  an error comment is sent to the console,  even if 
normal console output has been redirected to a file.
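
Matching a pattern against a directory entry follows the usual CP/M convention: both names are laid out in the 8-character name and 3-character type fields of a file control block, a * fills the rest of its field with ?, and ? matches any character, including the blank padding. The sketch below shows only the comparison; reading the directory (through the BDOS search first and search next calls on CP/M) is not shown, and the function names are hypothetical.

```c
#include <ctype.h>
#include <string.h>

/* Lay out a name like "y*.asm" as 11 FCB bytes: 8 name + 3 type,
   space padded and upper case; '*' fills the rest of its field with '?'. */
static void to_fcb(const char *s, char fcb[11])
{
    const char *colon = strchr(s, ':');
    if (colon)
        s = colon + 1;                   /* ignore any drive prefix */
    memset(fcb, ' ', 11);
    int field = 0, pos = 0, width = 8;   /* field 0 = name, 1 = type */
    for (; *s; s++) {
        if (*s == '.') {
            if (field == 1)
                break;
            field = 1;
            pos = 0;
            width = 3;
            continue;
        }
        if (pos >= width)
            continue;                    /* ignore overlong parts */
        char *base = fcb + (field ? 8 : 0);
        if (*s == '*') {
            while (pos < width)
                base[pos++] = '?';
        } else {
            base[pos++] = (char)toupper((unsigned char)*s);
        }
    }
}

/* Return 1 if a directory entry name matches a (possibly ambiguous) pattern. */
int cpm_match(const char *pattern, const char *name)
{
    char p[11], n[11];
    to_fcb(pattern, p);
    to_fcb(name, n);
    for (int i = 0; i < 11; i++)
        if (p[i] != '?' && p[i] != n[i])
            return 0;
    return 1;
}
```

Under these rules *.?q? matches XYZ.AQM and FOO.CQ but not XYZ.ASM, and y*.asm matches both Y.ASM and YACC.ASM.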

IMPORTANT:  when  using  a pipe from FLS or any other  input 
redirection to get the file list,  etc.,  on which USQ or SQ 
are  to operate you must NOT put any parameters  other  than 
redirection  following  the program  name.  The  operational 
parameters must be all together in the input parameter list. 
Example:

A>fls -10  b:*.cq |usq +saveout
is  the  proper way to preview the top (first 10  lines)  of 
each  squeezed  .C file on the B drive.  The -10  is  passed 
through  FLS  to USQ.  The results will be displayed on  the 
console  and  saved in file "saveout" on the  A  drive.  The 
saveout  file lets you confirm the list of  processed  files 
even  if  the display scrolls off the screen  while  running 
unattended.

In summary, i/o redirection parameters (those prefixed by +, 
<,  >,  or |) always follow the command to which they apply, 
but  operational parameters  (destination  drive,  -options) 
must be with the file name list.

EXAMPLES:
1. Unsqueeze all squeezed files on the current drive and put 
the resulting unsqueezed files on the same drive.
     A>fls *.?q? |usq

2.  Look  at  the first 10 lines of every squeezed  file  on 
drive B.
     A>fls -10 b:*.?Q? |usq
Note  that  since the file names for USQ came from  FLS,  the 
count option had to come from there too.

3.  Squeeze all .ASM files on the B and C drives and put the 
squeezed files on the D drive.
     A>fls d: b:*.asm c:*.asm |sq
Note that if d:  had not been first the squeezed files would 
have gone to the A drive.

4.  Squeeze file xyz.c on the A drive and put the results on 
the A drive.
     A>sq xyz.c

5.  Build  a  parameter list of all ASM files on drive C  in 
file XX.PAR and view it on the console.
     A>fls c:*.asm +xx.par

6. Use the above list to squeeze the files to the A drive.
     A>sq <xx.par

7. As above, but results to the B drive.
     A>b:
     B>a:sq <a:xx.par

8.  Squeeze  all ASM and C files on the A drive and put  the 
results on the B drive. Capture the progress comments in the 
file "out" without displaying them.
     A>fls b: *.asm *.c |sq >out

9.  Preview  the first 24 lines of each squeezed  ASM  file 
THEN unsqueeze them (unless stopped via control-C).
     A>fls -24 *.aqm a: *.aqm |usq
Note  that  specification  of a  destination  drive  cancels 
previewing.

RECOMPILATION:
These programs are written in C and the instructions are for 
the BDS C compiler. The libraries must have been adapted for 
directed i/o as described in DIO2.C.

The  procedures below indicate the various C language source 
files  (file  type .C) required to  recompile.  Those  files 
contain  #include statements which cause header files  (file 
type  .H) to be read and compiled.  The BDSCIO.H header file 
contains information about your system,  including how  much 
space  to reserve for file buffers.  You should use your own 
version of this file.

The source files DIO2.C, SQDIO.C and USQDIO.C are identical! 
If you only get one,  just use PIP to create the rest.  They 
are separate only to provide separate CRL files,  which  are 
needed  because of the different external variable  options. 
Note  that  they  do  not  include  all  the  header  files, 
therefore  the  other  source  files must  include  the  dio 
related headers first.

DIO.C is supplied with BDS C.  The above three files  differ 
from  the official version only by a change to the  dioflush 
function to ensure TEMPIN.$$$ is deleted before another file 
is renamed to that name.  (CP/M is stupid enough to make two 
files of the same name!).

The  procedure for building the SQ.COM,  USQ.COM and FLS.COM 
files  from  their source files follows.  Note that  I  have 
renamed  the  first phase of the BDS C compiler  to  CC.COM. 
Also  I will assume the BDS C package is on drive D and  the 
SQ  and USQ related files are on B along with  BDSCIO.H  and 
DIO.H.

Each  CC command produces a CRL file with specific addresses 
for  external variables.  If you recompile a file  with  the 
same  value in the -e option you don't have to recompile the 
other  files,  just  do the desired CC and then  repeat  the 
entire CLINK.

CLINK's -s option prints statistics. Top of memory means the 
current TPA. Stack space is what's left over. These programs 
require  stack  space for local  variables,  including  some 
healthy i/o buffers.  Also some functions are recursive.  If 
SQ doesn't have several K of stack space it will probably go 
crazy and do almost anything.

If you have .CQ and .HQ files instead of .C and .H files you
must use USQ, probably with FLS, as described above to make
the .C and .H files.

For SQ (note not all use -o):
D>cc b:sq.c -o -e3600
D>a:pip b:sqdio.c=b:dio2.c
D>cc b:sqdio.c -e3600
D>cc b:tr1.c -o -e3600
D>cc b:tr2.c -o -e3600
D>cc b:io.c -o -e3600
D>cc b:sqdebug.c -e3600
D>clink b:sq sqdio tr2 tr1 io sqdebug -s

The linker will display some statistics. Check that the last 
code  address is less than the start address of the external 
variables (3600 in this example).  If not,  repeat the above 
with a higher address in the -e options.

For USQ (note not all use -o):
D>cc b:usq.c -o -e2900
D>a:pip b:usqdio.c=b:dio2.c
D>cc b:usqdio.c -e2900
D>cc b:utr.c -o -e2900
D>clink b:usq usqdio utr -s

Check the addresses as described above.

For FLS:
D>cc b:fls.c
D>cc b:dio2.c
D>clink b:fls dio2

IN CASE OF TROUBLE:
I  welcome  suggestions  and  bug  reports,   but  you  must 
understand that some of the ideas I get would involve almost 
as much program development as the original package.  I have 
what I want and (I hope) what most users want,  so I am  not 
motivated  to  spend  many more  months  creating  something 
entirely  different  which  just  happens  to  involve  data 
compression. The data compression routines are probably less 
than  half of this package,  and are designed to operate  on 
large blocks of data, such as files.

The - option recently added to SQ can be used to dump critical
tables if you are having trouble and need to ask for help. Just
run the program with control-P on the command line to get hard
copy. The last table gives the lengths of the bit codes used.

		Dick Greenlaw
                251 Colony Ct.
                Gahanna, Ohio 43230
                614-475-0172 weekends and evenings