USAGE AND RECOMPILATION DOCUMENTATION FOR:        8/29/81

    SQ.COM   1.5    File squeezer
    USQ.COM  1.5    File unsqueezer
    FLS.COM  1.1    Ambiguous file name expander

DISTRIBUTION RIGHTS:

I allow unrestricted non-profit distribution of this software and invite users groups to spread it around. However, any distribution for profit requires my permission in advance. This applies only to the above listed programs and their program source and documentation files. I do sell other software.

PURPOSE:

The file squeezer, SQ, compresses files into a more compact form. This provides:

    1. Faster transmission by modem.
    2. Fewer diskettes to distribute a program package. (Include USQ.COM and instructions, both unsqueezed.)
    3. Fewer diskettes for archival storage.

Any file can be squeezed, but program source files and text files benefit the most, typically shrinking by 35%. Files containing only a limited character set, such as dictionary files, may shrink as much as 48%. Squeezed files look like gibberish and must be unsqueezed before they can be used.

The unsqueezer, USQ, expands squeezed files into exact duplicates of the original or provides a quick, unsqueezed display of the tops of (or all of) squeezed files. Unsqueezing requires only a single pass.

Both SQ and USQ accept batches of work specified by lists of file names (with drives if needed) and miscellaneous options. They accept these parameters in any of three ways:

    1. On the CP/M command line.
    2. From the console keyboard.
    3. From a file.

The FLS program can be used (on the same command line!) to expand parameter lists containing wild-card (ambiguous) file names into lists with the specific file names required by SQ and USQ.

This combination of programs allows you to issue a single command which will produce many squeezed or unsqueezed files from and to various diskettes.
For example, to unsqueeze all squeezed ASM files on drive B and send the results to drive C and also unsqueeze all squeezed TXT files on drive A and send the results to drive D:

    A>fls c: b:*.aqm d: *.tqt |usq

For detailed instructions see USAGE.

This DOES run under plain old vanilla CP/M! Many of the smarts are buried in the COM files in the form of library routines provided with the BDS C package (available from Lifeboat).

The above example simulates a "pipe" (indicated by the "|") by sending the "console" output of the fls.com program to a temporary file and then running the usq.com program with options which cause it to read its parameters from its "console" input, which is really redirected to come from the temporary file.

Note that programs written in BDS C tend to be GOable. That is, if you do

    A>save 0 GO

and run a C program (just one - no pipes) then you can rerun it without reading it from disk by using GO as its name and giving the usual parameters. This works because BDS C doesn't support initialized static variables. The program has to initialize everything dynamically, so it cleans up for each rerun.

THEORY:

The data in the file is treated at the byte level rather than the word level, and can contain absolutely anything. The compression is in two stages: first repeated byte values are compressed and then a Huffman code is dynamically generated to match the properties of each particular file. This requires two passes over the source data.

The decoding table is included in the squeezed file, so squeezing short files can actually lengthen them. Fixed decoding tables are not used because English and various computer languages vary greatly as to upper and lower case proportions and use of special characters. Much of the savings comes from not assigning codes to unused byte values.

More detailed comments are included in the source files.
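The two stages described under THEORY can be illustrated with a short sketch. This is not the SQ source (which is written in BDS C) and it does not reproduce the squeezed file format; the function names and the (value, count) representation of runs are assumptions made only for illustration. It shows the general technique: collapse runs of a repeated byte, then derive Huffman code lengths from the byte frequencies of one particular input, so that byte values which never occur get no code at all.

```python
import heapq
from collections import Counter
from itertools import groupby


def collapse_runs(data):
    """Stage 1 (illustrative): turn each run of a repeated byte into (value, count)."""
    return [(value, len(list(run))) for value, run in groupby(data)]


def huffman_code_lengths(symbols):
    """Stage 2 (illustrative): map each symbol that occurs to its code length in bits."""
    freq = Counter(symbols)          # first pass: count how often each value occurs
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): 1}
    # Heap entries are (weight, tie_breaker, {symbol: depth}); the unique
    # tie_breaker keeps the dictionaries from ever being compared.
    heap = [(weight, i, {sym: 0}) for i, (sym, weight) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)    # the two least frequent subtrees
        w2, _, right = heapq.heappop(heap)
        merged = {sym: depth + 1 for sym, depth in {**left, **right}.items()}
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


if __name__ == "__main__":
    sample = b"aaaaaaaabbbc"
    print(collapse_runs(sample))
    print(huffman_code_lengths(sample))
```

Frequent byte values end up with short codes and rare ones with long codes, which is why a table built for each file can beat a fixed table. The table itself has to be stored with the data, which is the overhead that can make a very short file grow.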

USAGE TUTORIAL:

As usual, you have to learn how to tell the programs what to do (i.e., what parameters to type after the program name). First I will introduce the various possibilities by example. Then I will summarize the rules.

In the simplest case either SQ or USQ can simply be given one or more file names (with or without drive names):

    A>sq xyz.asm
    A>sq thisfile.doc b:thatfile.doc

will create squeezed files xyz.aqm, thisfile.dqc and thatfile.dqc, all on the current drive, A. The original files are not disturbed. Note that the names of the squeezed files are generated by rules - you don't specify them.

Likewise,

    A>usq xyz.aqm

will create file xyz.asm on the A drive, overwriting the original. (The original name is recreated from information stored in the squeezed version.) The squeezed version is not disturbed.

Each file name is processed in order, and you can list all the files you can fit in a command. The file names given to SQ and USQ must be specific. You will learn below how to use the FLS program to expand patterns like *.asm (all files of type asm) into a list of specific names and feed them into SQ or USQ.

The above examples let the destination drive default to the current logged drive, which was shown in the prompt to be A. You can change the destination drive as often as you like in the parameter list. For example,

    A>sq x.asm b: y.asm z.asm c: d:s.asm

will create x.aqm on the current drive, A, y.aqm and z.aqm on the B drive and s.aqm on the C drive. Note that the first three originals are on drive A and the last one is on drive D.

Remember that each parameter is processed in order, so you must change the destination drive before you specify the files to be created on that drive.

Eventually you will have diskettes with many squeezed files on them and you will wonder what is in which file. If they weren't squeezed you would use the TYPE command to look at the comments at the beginning of the files.
But squeezed files just make a mess on your CRT screen when you TYPE them, so I have provided the required feature as a preview option to the USQ program.

    A>usq -10 x.bqs b:y.aqm

will not take the time to create unsqueezed files. Instead it will unsqueeze the first 10 lines of each file and display them on your console. The display from each file consists of the file names, the data and a formfeed (FF).

Also,

    A>usq - c:xyz.mqc

will unsqueeze and display the first 65,535 lines of any files listed. That's the biggest number you can give it, and is intended to display the whole file.

This preview option also ensures that the data is displayable. The parity bit is stripped off (some Wordstar files use it for format control) and any unusual control characters are converted to periods. You'll see some of these at the end of the files as the CP/M end of file is treated as data and the remainder of the sector is displayed.

You are now familiar with all of the operational parameters of SQ and USQ. But so far you have always typed them on the command line which caused the program to be run. For reasons which will become apparent later, I have also provided an interactive mode. If there are no parameters (except directed i/o parameters, described later) on the command line, SQ and USQ will prompt with an asterisk and accept parameters from the console keyboard. Each parameter must be followed by RETURN and will be processed immediately. An empty command (just RETURN) will cause the program to exit back to CP/M. Try it - it will help you understand what follows.

Now let's get into directed i/o, which will be new to most of you, but will save you so much work you will wonder how you ever got along without it.

Perhaps you frequently squeeze or unsqueeze the same list of files and you would like to type the list once and be done with it. Use an editor (or FLS, described below) to create a file with one parameter per line. For example call it commands.lst.
Then,

    A>sq <commands.lst

will cause the command list file to be read as if you were typing it! That was redirected console input.

Now assume that you have a very long list of files to squeeze or unsqueeze and while you are taking a nap the progress comments and maybe some error comments scroll off the screen. Redirecting the console output will let you capture the progress information in a file so you can check it later. The error comments will have the screen to themselves. For example,

    A>sq <commands.lst >out

will send the progress comments to the file "out", which you can TYPE later. The routine display of the program name and version, etc., will still go to the console.

A more practical example is to send that information to the console and to the file.

    A>sq <commands.lst +out

will do that. Redirected input and output are independent - you can do either, both or neither.

There is one more form of redirection called a "pipe". It is by far the most important to you. Recall that I promised to tell you how to use ambiguous file names such as *.asm (all files of type asm on the current default drive) or *.?q? (all files having a "q" as the second letter of their type). That last example just happens to mean "all squeezed files", assuming you don't have any other files with such a silly name (I hope).

I have provided a program called FLS which is intended primarily for use in pipes. Here is an example:

    A>fls c: x.asm y*.asm >temp.$$$

will simply pass the first two parameters through to the console output, which is being redirected to a file called temp.$$$. But the third parameter will be replaced by all the files on the current drive which are of type asm and have names beginning with y.

FLS is smart enough to know that a letter followed by a colon and nothing else is a destination drive name intended for SQ or USQ. It will also treat any parameter beginning with a - (minus sign) as an option to be passed through.
Anything else is considered a file name or pattern and is checked against the directory of the appropriate drive.

Therefore you could use:

    A>fls b: c:*.aqm *.aqm -10 stuff.dqc >temp.$$$
    A>usq <temp.$$$
    A>era temp.$$$

to unsqueeze all files of type aqm on drives C and A and put the unsqueezed files on drive B, and then preview the first 10 lines of file stuff.dqc.

Here is where the pipe comes in. The above three commands can be abbreviated as:

    A>fls b: c:*.aqm *.aqm -10 stuff.dqc |usq

That little "|" is the pipe option and it causes the FLS output to be redirected to a temporary file and when that is done it actually runs USQ for you with the proper input redirection and then erases the temporary file.

If that isn't enough, you can still use the + or > redirection option at the end of that line to capture the console output from USQ.

    A>fls b: c:*.aqm *.aqm -10 stuff.dqc |usq >out

If you plan your comments carefully you can produce a single file containing an abstract of an entire library of squeezed files in one step!

    A>fls -25 *.?q? |usq >abstract

One final point. Anywhere you specify a file name you can specify a drive in front of it. That applies to redirection as well as files to be squeezed and unsqueezed. If a name begins with a - (minus sign) it will look like an option to FLS unless you put a drive name in front of it (b:-sq.077).

USAGE SUMMARY:

The previous section gradually presented the various options by example. This section gives a condensed and more abstract description and is intended for reference. If you couldn't see the forest for the trees, maybe this will give you a better view.

The parameter handling of these programs is straightforward. Parameters fall into two classes: directed i/o options and operational parameters.

Note that parameters read from files or from the console are not forced to upper case, but the internal file handling routines all treat lower case as upper case.
When a file to be written already exists, it is quietly overwritten.

Directed I/O parameters:

The first action taken by these programs is to process directed i/o parameters from the CP/M command line. These parameters are optional and take the forms:

    <file       read console input from file
    >file       send most console output to file
    +file       send most console output to file and console
    |pgm ...    send most console output to a temporary file,
                then run PGM.COM and take console input from
                the temporary file. "..." represents the
                parameters for PGM. This is called "piping".

Only one input and one output redirection can apply to each program.

After the program has arranged for any directed i/o parameters to be obeyed they are deleted from the parameter list seen by the rest of the program.

Operational parameters:

The program then checks if there are any remaining parameters from the CP/M command line. If there are, they are obeyed.

If and only if there are no remaining parameters on the command line, the program prompts for them at the console. If console input has been directed to a file, one parameter is read and obeyed from each line of the file. Otherwise, the user follows each typed parameter with a RETURN and an empty command exits the program.

Each operational parameter is obeyed without looking ahead to other parameters, so options should precede the file names to which they apply.

SQ operational parameters are a list of the following types:

    drive:            set the current destination drive
    filename          file to be squeezed
    drive:filename    "    "  "  "
    -                 Toggle debug mode (dumps tables)

SQ does not change the files being squeezed. New, squeezed files are created on the destination drive (defaults to the current drive) with names derived from the original name but with the second letter of the file type (extension) changed to Q. When there is no type, QQQ is used. The original name is saved in the squeezed file.
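The naming rule for squeezed files can be written out as a small sketch. This is only an illustration of the rule as stated above, not code from SQ; the function name is hypothetical, and the handling of a one-letter type (xyz.c becomes xyz.cq) is inferred from the *.cq pattern used in this document's examples. Names are folded to upper case because the file handling routines treat lower case as upper case.

```python
def squeezed_name(filename):
    """Return the name SQ would be expected to give the squeezed copy (illustrative)."""
    # Drop any source drive prefix such as "b:"; the squeezed file goes to
    # the destination drive, which is tracked separately.
    name = filename.rpartition(":")[2].upper()
    base, _, ftype = name.partition(".")
    if not ftype:
        return base + ".QQQ"                 # no file type at all
    # Second letter of the type becomes Q; first and third letters are kept.
    return base + "." + ftype[0] + "Q" + ftype[2:]


if __name__ == "__main__":
    for original in ("xyz.asm", "thisfile.doc", "b:thatfile.doc", "xyz.c", "readme"):
        print(original, "->", squeezed_name(original))
```

Because the rule only touches the second letter of the type, the pattern *.?q? selects squeezed files regardless of what kind of file they were.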
USQ operational parameters are a list of the following types:

    drive:            set the current destination drive
    filename          file to be unsqueezed
    drive:filename    "    "  "  "
    -count            Preview (display on the console) the first
                      "count" lines of each file, where "count"
                      is a number from 1 to 65535.

If the -count option IS NOT in effect then USQ creates unsqueezed versions of the listed files on the destination drive, which defaults to the current logged drive. Each unsqueezed file is CRC checked against the CRC value of the original file, which is part of the squeezed file.

The -count option is for previewing squeezed files. It allows you to skim through a group of squeezed files, peeking at the first "count" lines in each. The > or + output redirection option could be used to capture this information in a file, along with the corresponding file names, thus forming an abstract of the files on a disk.

When the -count option is used the CRC check is cancelled and the output is forced into printable form by stripping the parity bit and changing most unprintable characters to periods. The exceptions are CR, LF, TAB and FF. The output from each file is terminated by an FF. PIP can be used to strip FFs and provide formatted printing if desired.

"Count" defaults to the maximum value, 65,535, in case you want to look at a whole file.

FLS operational parameters:

FLS is a "filter", which means it accepts input from the console input or command line and transforms the input according to a set of rules to produce console output. That's fine for getting familiar with FLS, but to make it useful you "pipe" its output to the input of SQ or USQ.

Any FLS parameter which is of the form:

    drive:    or    -anything

is copied to console output unchanged.

Any other FLS operational parameter is treated as a file name and is checked against the directory of the appropriate drive. If it contains * or ? it is replaced by a list of all the files which fit the pattern.
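The FLS rules just given can be sketched as follows. This is an illustration only, not the FLS source. The directory is modelled as a plain dictionary from drive letter to file names (a stand-in for the real CP/M directory), the function names are hypothetical, and the matching assumes the usual CP/M convention that a name occupies fixed 8 and 3 character fields, with * filling the rest of a field with ? characters.

```python
import sys


def to_fields(name):
    """Pad NAME.TYP to 8+3 characters; '*' fills the rest of its field with '?'."""
    base, _, ftype = name.upper().partition(".")

    def field(text, width):
        out = ""
        for ch in text:
            if ch == "*":
                return out.ljust(width, "?")[:width]
            out += ch
        return out[:width].ljust(width)

    return field(base, 8) + field(ftype, 3)


def matches(pattern, filename):
    """True if filename fits pattern, where '?' matches any single character."""
    return all(p in ("?", f) for p, f in zip(to_fields(pattern), to_fields(filename)))


def fls(params, directory, current_drive="A"):
    """Expand a parameter list the way the text describes FLS doing it."""
    out = []
    for param in params:
        is_drive = len(param) == 2 and param[0].isalpha() and param[1] == ":"
        if is_drive or param.startswith("-"):
            out.append(param)                # passed through unchanged
            continue
        drive, colon, name = param.rpartition(":")
        files = directory.get((drive or current_drive).upper(), [])
        found = [drive + colon + f for f in files if matches(name, f)]
        if not found:
            # Error comments go to the console, not to the redirected output.
            print("no match for " + param, file=sys.stderr)
        out.extend(found)
    return out


if __name__ == "__main__":
    disk = {"A": ["X.ASM", "YA.ASM", "YB.ASM", "Z.ASM"], "C": ["P.AQM"]}
    print(fls(["c:", "x.asm", "y*.asm"], disk))
```

A name such as b:-sq.077 is treated as a file name rather than an option, because only parameters that begin with the minus sign are passed through.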
If nothing is found in the directory an error comment is sent to the console, even if normal console output has been redirected to a file.

IMPORTANT: when using a pipe from FLS or any other input redirection to get the file list, etc., on which USQ or SQ are to operate you must NOT put any parameters other than redirection following the program name. The operational parameters must be all together in the input parameter list. Example:

    A>fls -10 b:*.cq |usq +saveout

is the proper way to preview the top (first 10 lines) of each squeezed .C file on the B drive. The -10 is passed through FLS to USQ. The results will be displayed on the console and saved in file "saveout" on the A drive. The saveout file lets you confirm the list of processed files even if the display scrolls off the screen while running unattended.

In summary, i/o redirection parameters (those prefixed by +, <, >, or |) always follow the command to which they apply, but operational parameters (destination drive, -options) must be with the file name list.

EXAMPLES:

1. Unsqueeze all squeezed files on the current drive and put the resulting unsqueezed files on the same drive.

    A>fls *.?q? |usq

2. Look at the first 10 lines of every squeezed file on drive B.

    A>fls -10 b:*.?Q? |usq

Note that since the file names for USQ came from FLS, the count option had to come from there too.

4. Squeeze all .ASM files on the B and C drives and put the squeezed files on the D drive.

    A>fls d: b:*.asm c:*.asm |sq

Note that if d: had not been first the squeezed files would have gone to the A drive.

5. Squeeze file xyz.c on the A drive and put the results on the A drive.

    A>sq xyz.c

6. Build a parameter list of all ASM files on drive C in file XX.PAR and view it on the console.

    A>fls c:*.asm +xx.par

7. Use the above list to squeeze the files to the A drive.

    A>sq <xx.par

8. As above, but results to the B drive.

    A>b:
    B>a:sq <a:xx.par

9. Squeeze all ASM and C files on the A drive and put the results on the B drive.
Capture the progress comments in the file "out" without displaying them.

    A>fls b: *.asm *.c |sq >out

10. Preview the first 24 lines of each squeezed ASM file THEN unsqueeze them (unless stopped via control-C).

    A>fls -24 *.aqm a: *.aqm |usq

Note that specification of a destination drive cancels previewing.

RECOMPILATION:

These programs are written in C and the instructions are for the BDS C compiler. The libraries must have been adapted for directed i/o as described in DIO2.C.

The procedures below indicate the various C language source files (file type .C) required to recompile. Those files contain #include statements which cause header files (file type .H) to be read and compiled.

The BDSCIO.H header file contains information about your system, including how much space to reserve for file buffers. You should use your own version of this file.

The source files DIO2.C, SQDIO.C and USQDIO.C are identical! If you only get one, just use PIP to create the rest. They are separate only to provide separate CRL files, which are needed because of the different external variable options. Note that they do not include all the header files, therefore the other source files must include the dio related headers first.

DIO.C is supplied with BDS C. The above three files differ from the official version only by a change to the dioflush function to ensure TEMPIN.$$$ is deleted before another file is renamed to that name. (CP/M is stupid enough to make two files of the same name!)

The procedure for building the SQ.COM, USQ.COM and FLS.COM files from their source files follows. Note that I have renamed the first phase of the BDS C compiler to CC.COM. Also I will assume the BDS C package is on drive D and the SQ and USQ related files are on B along with BDSCIO.H and DIO.H.

Each CC command produces a CRL file with specific addresses for external variables.
If you recompile a file with the same value in the -e option you don't have to recompile the other files, just do the desired CC and then repeat the entire CLINK.

CLINK's -s option prints statistics. Top of memory means the current TPA. Stack space is what's left over. These programs require stack space for local variables, including some healthy i/o buffers. Also some functions are recursive. If SQ doesn't have several K of stack space it will probably go crazy and do almost anything.

If you have .CQ and .HQ files instead of .C and .H files you must use USQ, probably with FLS, as described above to make the .C and .H files.

For SQ (note not all use -o):

    D>cc b:sq.c -o -e3600
    D>a:pip b:sqdio.c=b:dio2.c
    D>cc b:sqdio.c -e3600
    D>cc b:tr1.c -o -e3600
    D>cc b:tr2.c -o -e3600
    D>cc b:io.c -o -e3600
    D>cc b:sqdebug.c -e3600
    D>clink b:sq sqdio tr2 tr1 io sqdebug -s

The linker will display some statistics. Check that the last code address is less than the start address of the external variables (3600 in this example). If not, repeat the above with a higher address in the -e options.

For USQ (note not all use -o):

    D>cc b:usq.c -o -e2900
    D>a:pip b:usqdio.c=b:dio2.c
    D>cc b:usqdio.c -e2900
    D>cc b:utr.c -o -e2900
    D>clink b:usq usqdio utr -s

Check the addresses as described above.

For FLS:

    D>cc b:fls.c
    D>cc b:dio2.c
    D>clink b:fls dio2

IN CASE OF TROUBLE:

I welcome suggestions and bug reports, but you must understand that some of the ideas I get would involve almost as much program development as the original package. I have what I want and (I hope) what most users want, so I am not motivated to spend many more months creating something entirely different which just happens to involve data compression. The data compression routines are probably less than half of this package, and are designed to operate on large blocks of data, such as files.

The - option recently added to SQ can be used to dump critical tables if you are having trouble and need to ask for help.
Just run the program with control-P on the command line to get hard copy. The last table gives the lengths of the bit codes used.

    Dick Greenlaw
    251 Colony Ct.
    Gahanna, Ohio 43230
    614-475-0172 weekends and evenings