Files

    A file is a collection of related information.  All programs, text,
and data on your disk reside in files and each file has a unique name.
You refer to files by their file names.

    You create a file each time you enter and save data or text at your
terminal.   Files are also created when you write programs and save them
on your disks.

    The names of the files are kept in directories on a disk.  These
directories also contain information on the size of the files and may
contain the dates that they were created, updated, and accessed.

    If you want to know what files are on your disk, you can use the DIR
command.  This command tells the operating system to display all the
files in the working directory of a specific disk.


                               File Names

    Each CP/M file has a unique name consisting of one to eight
characters, optionally qualified by the drive and an extension.  The
three parts that make up file names follow:

                d:filename.ext

    The first part is the drive code and is optional.  The drive code is
a single letter followed by a colon.  The drive code specifies the disk
drive on which the file is currently to be found.  CP/M provides up to
16 disk drives, named "A" through "P".  Some systems, such as CP/M Plus and
ZCPR, allow you to specify the user area with the drive code.  Many
utility programs also allow the "drive/user" specification.  If the
drive code is not specified, the logged on drive is assumed.

    The second part is the actual name of the file.  The file name is
from one to eight characters, usually upper case alphabetic or numeric,
but some other printable characters can be used.  The characters < > . ,
; : = ? * [ ] have special meaning and may not be used.  The name should
be an abbreviated, descriptive name of what the file contains.

    The third part is the optional file type or file extension.  It
consists of up to three characters and is separated from the file name
by a period.  It is good practice to include a file type even though it
is optional.  Some programs, such as "ASM", require a specific file
type.

        EXAMPLES:

                A:FILENAME.INF
                ^ ^        ^
                | |        |
                | |        --- FILE TYPE ---  optional 1-3 characters.
                | |
                | |
                | ------------ FILE NAME ---  required 1-8 characters.
                |
                |
                -------------- DRIVE CODE --  optional 1 character.
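
    The three parts described above can be pulled apart mechanically.
The following is a minimal sketch in Python, not a CP/M program; the
function name parse_filespec is made up for illustration, and the rules
it checks are only the ones stated in the text (drive A through P, name
of 1 to 8 characters, type of up to 3 characters, no special
characters).

```python
# Characters the text lists as having special meaning in file names.
ILLEGAL = set('<>.,;:=?*[]')

def parse_filespec(spec):
    """Split 'd:filename.ext' into (drive, name, ext).

    Drive and ext come back as '' when they are omitted.
    """
    spec = spec.strip().upper()      # names are translated to upper case
    drive = ''
    if len(spec) >= 2 and spec[1] == ':':
        drive, spec = spec[0], spec[2:]
        if not 'A' <= drive <= 'P':
            raise ValueError('drive code must be A through P')
    name, _, ext = spec.partition('.')
    if not 1 <= len(name) <= 8:
        raise ValueError('file name must be 1 to 8 characters')
    if len(ext) > 3:
        raise ValueError('file type must be at most 3 characters')
    for ch in name + ext:
        # Reject special characters, blanks, and non-printable characters.
        if ch in ILLEGAL or not (' ' < ch <= '~'):
            raise ValueError('illegal character: %r' % ch)
    return drive, name, ext
```

    For example, parse_filespec('a:filename.inf') gives the drive "A",
the name "FILENAME", and the type "INF".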



    At the operating system prompt (the "A>") and from most programs,
you can enter the filename in upper or lower case letters.  The
characters will be translated to upper case.  The exception to this is
when you issue the save command from BASIC.  Lower case characters will
actually be saved in the directory.

    Attached is a list of commonly used file type designations on RCP/M
systems around the country.  There is no RULE that these have to be used
as described, but it is conventional to use them that way.

    Two special characters (called wildcards) can be used when you are
searching the files on a disk: the asterisk (*) and the question mark
(?).  The question mark (?) in a file name or extension means that any
valid character can occupy that position.  An asterisk (*) in the file
name or extension means that any character can occupy that position or
any of the remaining positions in the file name or extension.
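
    The wildcard rules can be sketched in a few lines of Python.  This
is an illustration only: it assumes names are compared as fixed-width
fields (8 characters of name, 3 of type, padded with blanks), and the
function names are invented for the example.

```python
def expand(field, width):
    """Pad one field to its fixed width; '*' fills the rest with '?'."""
    out = ''
    for ch in field.upper():
        if ch == '*':
            # '*' stands for this position and all remaining positions.
            out = out.ljust(width, '?')
            break
        out += ch
    return out.ljust(width)[:width]   # unused positions become blanks

def to_pattern(spec):
    """Turn 'NAME.EXT' into an 11-character fixed-width string."""
    name, _, ext = spec.partition('.')
    return expand(name, 8) + expand(ext, 3)

def matches(pattern_spec, filename):
    """True if filename fits the wildcard pattern."""
    pattern = to_pattern(pattern_spec)
    target = to_pattern(filename)
    return all(p == '?' or p == t for p, t in zip(pattern, target))
```

    With these definitions, matches('*.COM', 'PIP.COM') is true and
matches('LU3??.COM', 'LU310.COM') is true, while
matches('*.COM', 'PIP.DOC') is false.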

    ASCII (American Standard Code for Information Interchange) files are
printable -- readable-by-human -- files.  They consist of letters,
numbers, and a few symbols such as periods, commas, !, @, #, $, %, &, *,
etc., with which we are familiar.  An ASCII file may be "TYPE"ed, and
can be transferred over phone lines without error-checking, if desired.
An ASCII file should contain no word processor specific information.
WordStar files should be saved in non-document mode.  Each line should
end with a Carriage Return, Line Feed sequence and an EOF (hex 1A)
should pad the last block of data.
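
    The line-ending and padding rules can be shown with a short Python
sketch.  It assumes the "block" being padded is a 128-byte CP/M record;
the function names are invented for the example.

```python
RECORD = 128      # assumed CP/M record size in bytes
EOF = 0x1A        # Control-Z, the end-of-file marker named in the text

def to_cpm_text(text):
    """Return bytes with CR LF line ends, padded with EOF to a full record."""
    data = b''.join(line.encode('ascii') + b'\r\n'
                    for line in text.splitlines())
    pad = -len(data) % RECORD         # bytes needed to fill the last record
    return data + bytes([EOF]) * pad

def from_cpm_text(data):
    """Cut the data at the first EOF byte and return the text."""
    end = data.find(bytes([EOF]))
    if end != -1:
        data = data[:end]
    return data.decode('ascii').replace('\r\n', '\n')
```

    A two-line file such as "HELLO" and "WORLD" becomes 14 bytes of
text followed by 114 EOF bytes, filling one 128-byte record.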

     Many of the popular Public Domain programs and information files
are distributed in library (LBR) files.  Below is a discussion of the
structure of LBRs and the utilities needed to process them.

     A library is a group of files collected together into one file in
such a way that the individual files may be recovered intact.  A library
file can be identified by "LBR" as the extension of the file name.  LU
is a CP/M utility used to maintain libraries of files.  LU does not
perform any compression.  Because of this, most people will squeeze or
crunch files before adding them to a library if they want to save space.
If you want to remove the component files (members) from a .LBR file,
you should have a copy of LU.COM or other LBR extractor utility.  At the
end of this document is a list of the programs available on many Remote
CP/M systems and in the CP/M RoundTable Software Libraries of GEnie that
function with libraries.

     A library file usually takes up less space than the total of the
individual member files which went into it.  The reason for this is that
CP/M allocates disk space in fixed blocks or groups, typically 2k bytes
each.  Any space after the last sector of a file up to the next 2k block
boundary is wasted.  The same files in a library use only the number of
sectors they actually need, and though the library itself may have a
partially wasted block at the end, and requires some space for directory
information at the beginning, the net effect is usually a saving of
total space.  The best results are seen when many small files are
combined into one library.
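
    The arithmetic behind this saving can be sketched in Python.  The
2k block size comes from the text; the 128-byte sector and the 32-byte
library directory entry are assumptions made for the example, as are
the function names and the sample file sizes.

```python
BLOCK = 2048      # allocation block size from the text (2k bytes)
SECTOR = 128      # assumed CP/M sector size
DIR_ENTRY = 32    # assumed size of one library directory entry

def round_up(size, unit):
    """Round size up to the next multiple of unit."""
    return -(-size // unit) * unit

def separate_space(sizes):
    """Disk space used when every file is stored on its own."""
    return sum(round_up(s, BLOCK) for s in sizes)

def library_space(sizes):
    """Disk space used when the same files are members of one library."""
    # One extra entry is assumed for the directory itself.
    directory = round_up((len(sizes) + 1) * DIR_ENTRY, SECTOR)
    members = sum(round_up(s, SECTOR) for s in sizes)
    return round_up(directory + members, BLOCK)

sizes = [300, 1200, 2100, 500, 4000]     # made-up file sizes in bytes
print(separate_space(sizes), library_space(sizes))
```

    Under these assumptions the five sample files occupy 14336 bytes
separately and 10240 bytes as one library.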

     A library file makes most efficient use of the CP/M disk directory,
since it is treated as only one file by CP/M regardless of how many
members it contains.

     Libraries can aid in transferring packages of software from one
system to another using XMODEM or another file transfer protocol.  Only
one file is transferred, eliminating the need to run the XMODEM transfer
program several times, the chance of overlooking a needed file, and the
problems of naming conflicts (such as READ.ME files) among unrelated
packages.

     When members are added to a library, a CRC (Cyclic Redundancy
Check) value is calculated and stored in the directory of the library.
When the members are later extracted or the library is reorganized, the
CRC value is again calculated and checked against the value in the
directory.  If a discrepancy occurs, the operator is notified. (Caution:
This CRC validation does not occur with some public domain file
extractors and earlier versions of LU and NULU.)
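
    The check described above can be sketched in Python.  The text does
not say which CRC is used, so this example assumes the 16-bit CCITT
polynomial (hex 1021) familiar from XMODEM's CRC mode; the function
names are invented for illustration.

```python
def crc16(data, crc=0):
    """Bitwise 16-bit CRC, polynomial 0x1021, starting value 0 (assumed)."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def verify(member_bytes, stored_crc):
    """Recalculate the CRC on extraction and compare with the stored value."""
    return crc16(member_bytes) == stored_crc

member = b'This is the text of a library member.\r\n'
stored = crc16(member)                       # done when the member is added
print(verify(member, stored))                # member is intact
print(verify(member[:-1] + b'X', stored))    # one damaged byte is caught
```

    A single changed byte produces a different CRC, which is how a
damaged member is detected at extraction time.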

     Members can be added to, renamed in, and deleted from the library.
The directory information of a library is contained in the same file as
the members.  The amount of space to be allocated to the directory must
be specified by the user when a new library is created, but can be
changed when the file is reorganized.


     Recently, popular CP/M Public Domain software and information files
have begun to be distributed as ARChive files.  ARChive files are
similar to library (LBR) files in that they take a logical group of
files and put them together in a single file.  The main difference is
that the members of the "ARC" file are automatically compressed.  The
compression algorithm chosen is whichever one of three produces the
smallest file.

     ARChive files have been available in the MS-DOS and PC-DOS areas,
but have been made useful in the CP/M environment with the introduction
of the "UNARC" program.  The current version is 1.6, and is available
with assembly language source, extensive documentation, and two
executable COM files, an 8080/8085 version and a Z80 version.  The Z80
version takes advantage of the expanded Z80 (and equivalent) instruction
set for speed and size, and therefore is machine dependent.

     A CP/M utility has just recently been made available to make an
"ARC" file.  However, because of the resources required, it is still
impractical to make Archives in the CP/M environment.  ARChive files will
be made on systems using other operating systems.

     ARChive files are identified by the "ARC" as the file extension.
This is a packaging method that guarantees no growth during storage.
The files contain a "marker", followed by file information, file-data,
file information, file-data, etc.  File contents are analyzed before
storage and stored in one of the following ways:

1.  AS IS (typically files in the 1 to 200 byte range).
2.  With repeat-compression (same range as above).
3.  Using Huffman 8-bit encoding.
4.  Using Lempel-Ziv-Welch encoding (all others).
8.  Crunched - non-repeat packed (DLE encoded).
9.  New squashed files created with PKARC.
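
    The "no growth" guarantee and the repeat-compression method can be
illustrated together.  The Python sketch below is a simplified stand-in,
not the actual ARC file format: the marker byte (hex 90) and the
byte / marker / count layout are assumptions, and the function names
are invented.  It packs runs of repeated bytes and falls back to storing
the data AS IS whenever packing would not make it smaller.

```python
DLE = 0x90    # assumed marker byte introducing a repeat count

def rle_encode(data):
    """Replace runs of 3 or more equal bytes with byte, DLE, count."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        run = 1
        while i + run < len(data) and data[i + run] == b and run < 255:
            run += 1
        if b == DLE:
            out += bytes([DLE, 0]) * run      # DLE, 0 means a literal DLE
        elif run >= 3:
            out += bytes([b, DLE, run])
        else:
            out += bytes([b]) * run
        i += run
    return bytes(out)

def rle_decode(data):
    """Reverse rle_encode."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if b == DLE:
            count = data[i + 1]
            if count == 0:
                out.append(DLE)
            else:
                out += bytes([out[-1]]) * (count - 1)
            i += 2
        else:
            out.append(b)
            i += 1
    return bytes(out)

def pack(data):
    """Pick whichever form is smaller, so the result never grows."""
    packed = rle_encode(data)
    if len(packed) < len(data):
        return ('packed', packed)
    return ('stored', data)
```

    A file of 100 identical bytes packs down to 3 bytes, while a short
file with no repeats is simply stored unchanged.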

    The ARChive technique frees the user from worrying about storage
mechanisms and delivers practically all needed services (extract, store,
list, type, check, execute and re-compress using "latest" state of
compression technique).  ARC is "downward" compatible.  It is currently
heavily used in the MSDOS/PCDOS world, although usage in RCP/M systems
is starting with availability of a fast DE-ARCer.

    The MS/PC-DOS ARC utility belongs in the category of "Share-ware"
or "Free-ware" - it is copyrighted by System Enhancement Associates
(source-language C, system MSDOS).  Phil Katz is the author of PKARC and
the current version is 3.5.  UNARC was written by Bob Freed for the
Public Domain (source-language assembler, for CP/M systems).


     Some files on RCP/M systems and in the CP/M RoundTable Software
Libraries have been compressed, using one of the standard public domain
utilities, to minimize download time and save storage space.  This topic
briefly discusses these compression techniques.

     Files that have been compressed can be identified by the filetype
(the last 3 letters of a filename after the ".") that signifies the
compression.  These are:

        .?Q?   for Squeezed files (middle letter is a Q).
        .?Z?   for Crunched files (middle letter is a Z).
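
     The naming convention can be tested with a few lines of Python.
This is only a sketch and the function name is invented; it looks at
the second letter of the filetype and nothing else.

```python
def compression_kind(filename):
    """Guess 'squeezed' or 'crunched' from the filetype, else None."""
    _, _, ext = filename.upper().partition('.')
    if len(ext) >= 2:
        if ext[1] == 'Q':
            return 'squeezed'
        if ext[1] == 'Z':
            return 'crunched'
    return None

for name in ('LU300.DQC', 'DELBR11A.CQ', 'MANUAL.DZC', 'SQ111.COM'):
    print(name, compression_kind(name))
```

     The filetype is only a hint.  A type such as AZM, which is also
used for assembly source, has a Z in the middle without being crunched,
so a careful program would not rely on the name alone.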

     USQ120.COM is used to unsqueeze, or expand files that have a "Q" as
the middle letter of the filetype.  Such files have been squeezed, or
compressed with SQ111.COM or similar utility.  These programs use
Huffman Encoding to reduce the size of the target file.  Depending on
the distribution of data in a file it can be reduced in size by 5% to
60% by squeezing it.  If you download a file with a filetype indicating
that it is squeezed, you will need USQ120.COM to expand it before you
can use it.  There are other programs available that are written in
different languages or take advantage of special hardware, but USQ is
8080/8085/Z80 compatible.

     Other utilities are available that have the unsqueeze coding
embedded and function with squeezed or unsqueezed files.  There are
programs that perform file maintenance functions (NSWP), bi-directional
display utilities (BISHOW), and string search programs (SEARCH and
FINDU).  This method of compressing files has been used for some time
now and programs to uncompress the files are available for several
microprocessors and mainframe computers.


     CRUNCH uses the Lempel-Ziv-Welch (LZW) technique.  This method is
fast and offers compression ratios around 55%.  The highest compression
is achieved with graphics data, where values of 90% are typical,
followed by text at 50% and COM files at around 20%.  This method is
relatively new to the CP/M environment.  See CRUNCH24.LBR for the Z80
CRUNCH and UNCRUNCH utilities.  FCRNCH11.LBR contains the utilities for
8080/8085 compatible processors.  CRUNCH v2.0 and higher embody all of
the concepts employed in the UNIX COMPRESS / ARC512 algorithm, but are
additionally enhanced by a "metastatic code reassignment" facility.
This is one of several concepts the author, Steven Greenberg, is
developing as part of an effort to advance data compression techniques
beyond current performance limits.  He believes this is the first time
this principle has been proposed and implemented.

     Since this method of file compression is relatively new, only a few
utilities are available that process a crunched file directly.  TYPELZW,
TYPEQZ, and LT are display utilities, which also display members of
libraries and squeezed files.  SEARCH is a file searching program that
allows you to search multiple text files for various words or phrases.
SEARCH can directly search files within libraries, as well as squeezed
and crunched files.  Files may also be processed on other systems not
using the Z80 processor.

     A mini comparison of Huffman Encoding and Lempel-Ziv-Welch (LZW)
techniques follows.

     Huffman Encoding expresses each storage unit as a variable length
pointer into a frequency-ordered tree.  Compression is achieved by
choosing a "native" storage unit (where repetitions are bound to occur)
and (on the average) expressing the more frequent storage units with
shorter pointers [although less used units might be represented by
longer pointers].  The encoding process needs two passes: the first
reads all units (under CP/M and MSDOS, 8-bit bytes) to build the
frequency-ordered tree (also called the "dictionary"), and the second
translates all units into their respective pointer values.  The original
filename, dictionary, and pointer values are stored.  By convention the
second character of the filename extension is changed to Q as a reminder
of a "squeezed" file.
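
     The two passes can be sketched in Python.  This is a textbook
Huffman coder written for illustration, not the code used by SQ or USQ,
and it does not produce a squeezed file; the codes are kept as strings
of "0" and "1" for readability and the function names are invented.

```python
import heapq
from collections import Counter

def huffman_codes(data):
    """Pass 1: count the bytes and build a {byte: bit-string} table."""
    freq = Counter(data)
    # Heap entries are (weight, tie_breaker, {symbol: code_so_far}).
    heap = [(w, i, {sym: ''})
            for i, (sym, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    if not heap:
        return {}
    if len(heap) == 1:
        return {sym: '0' for sym in heap[0][2]}
    counter = len(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)    # the two least frequent groups
        w2, _, t2 = heapq.heappop(heap)
        merged = {s: '0' + c for s, c in t1.items()}
        merged.update({s: '1' + c for s, c in t2.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]

def encode(data, codes):
    """Pass 2: translate every byte into its code."""
    return ''.join(codes[b] for b in data)

def decode(bits, codes):
    """Rebuild the bytes; works because no code is a prefix of another."""
    reverse = {c: s for s, c in codes.items()}
    out, cur = bytearray(), ''
    for bit in bits:
        cur += bit
        if cur in reverse:
            out.append(reverse[cur])
            cur = ''
    return bytes(out)

data = b'AAAAAAAABBBC'
codes = huffman_codes(data)
bits = encode(data, codes)
print(len(data) * 8, 'bits before,', len(bits), 'bits after')
```

     In the sample, the frequent letter A receives a one-bit code and
the rarer letters B and C receive two-bit codes, so 96 bits of input
shrink to 16 bits (not counting the dictionary, which must also be
stored).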

     LZW expresses strings of 8-bit bytes by pointers into an "ordered"
string-table.  The rules for "constructing" the table are reversible, so
that Compressor and De-Compressor can build their table on-the-fly.  LZW
is one-pass, although achieved speed is VERY dependent on language
implementation and available physical memory (in general more than 90%
of time spent in hashing and table searching). Although early
implementations of LZW seemed to need more than 64K of physical memory,
current enhancements make a maximum of 2**11 table entries sufficient to
handle all cases.  State of the art implementations check the
compression ratio on the fly and rebuild the table if the ratio falls
below a minimum or the table overflows.
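
     A bare-bones LZW coder shows how both sides build the same table
on the fly.  This Python sketch is for illustration only and is not
compatible with CRUNCH or ARC: it returns a list of integer codes
rather than packed bits, and it leaves out the table size limit and the
table rebuilding mentioned above.  The function names are invented.

```python
def lzw_compress(data):
    """One pass: emit a code for the longest string already in the table."""
    table = {bytes([i]): i for i in range(256)}
    out = []
    current = b''
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate
        else:
            out.append(table[current])
            table[candidate] = len(table)     # learn the new string
            current = bytes([byte])
    if current:
        out.append(table[current])
    return out

def lzw_decompress(codes):
    """Rebuild the same table from the codes alone."""
    if not codes:
        return b''
    table = {i: bytes([i]) for i in range(256)}
    previous = table[codes[0]]
    out = bytearray(previous)
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            # The code refers to the string that is just being defined.
            entry = previous + previous[:1]
        else:
            raise ValueError('bad code')
        out += entry
        table[len(table)] = previous + entry[:1]
        previous = entry
    return bytes(out)

text = b'TOBEORNOTTOBEORTOBEORNOT'
codes = lzw_compress(text)
print(len(text), 'bytes in,', len(codes), 'codes out')
```

     Note that no table is stored with the output; the decompressor
reconstructs it from the codes, which is what the text means by the
rules being reversible.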

     Typical Huffman compression ratios hover around 33% (the compressed
file is about 67% of the original, whereby text is typically compressed
a little better, and executable files less).  Typical LZW compression
ratios average 55%.  The highest compression is achieved with
pixel-information, where values of 90% are typical, followed by text at
50% and executable files at around 20%.  Although the original paper on
LZW suggested implementation between CPU and peripheral devices
(terminal, disk-drives, mag-tapes), current usage encompasses
file-compression (Unix COMPRESS, MSDOS ARC, CPM UNArc), high speed
proprietary MODEM-protocols ("LZW in SILICON"), and "picture
transmission" at 1200 baud.
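
     The percentages above are savings, that is, the portion of the
original file that is removed.  The short Python sketch below shows the
arithmetic and a rough download time; the function names are invented,
and the figure of 10 bits per byte assumes one start bit and one stop
bit on the phone line.

```python
def percent_saved(original_size, compressed_size):
    """Compression ratio as used in the text: percent of the file removed."""
    return 100.0 * (original_size - compressed_size) / original_size

def transfer_seconds(size_in_bytes, baud=1200, bits_per_byte=10):
    """Rough transfer time, ignoring protocol overhead."""
    return size_in_bytes * bits_per_byte / baud

original, crunched = 30000, 15000            # made-up sizes in bytes
print(percent_saved(original, crunched))     # 50.0
print(transfer_seconds(original))            # 250.0 seconds
print(transfer_seconds(crunched))            # 125.0 seconds
```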

Thoughts on CP/M and MS-DOS filename compatibility.

     Many users now work with both CP/M and MS-DOS systems.  Many files
on the two systems share a compatible structure (ASCII text, WordStar,
dBase II, Archives, etc.), and multi-format disk utilities (Media
Master, Uniform, etc.) are available.  Unfortunately, although the file
naming conventions for each of the systems are similar, there are some
differences that demand attention if compatibility is to be assured.

     Below is a list of the LEGAL characters common to both CP/M and
MS-DOS:

        A-Z  0-9  ! # $ & ' - @ ^ ` { } ~

In ASCII sorting order (same characters):

        ! # $ & ' -  0-9  @  A-Z  ^ ` { } ~


MS-DOS illegal file names (reserved for device names):

        AUX, CON, PRN, NUL, COM1, COM2, LPT1, LPT2, LPT3


     Computer users who are interested in transferring files could
standardize on the above characters (while avoiding the reserved names
when using CP/M).  This would provide one more area of compatibility.
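
     A file name can be checked against these rules before it is
transferred.  The Python sketch below uses only the character list and
reserved names given above, plus the 8-character name and 3-character
type limits; the function name is_portable is invented.

```python
COMMON = set('ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789' "!#$&'-@^`{}~")
RESERVED = {'AUX', 'CON', 'PRN', 'NUL',
            'COM1', 'COM2', 'LPT1', 'LPT2', 'LPT3'}

def is_portable(filename):
    """True if the name is safe to use on both CP/M and MS-DOS."""
    name, _, ext = filename.upper().partition('.')
    if not 1 <= len(name) <= 8 or len(ext) > 3:
        return False
    if name in RESERVED:             # MS-DOS device names
        return False
    return all(ch in COMMON for ch in name + ext)

for name in ('README.TXT', 'TEMP.$$$', 'CON.TXT', 'A+B.DOC'):
    print(name, is_portable(name))
```

     The first two names pass; CON.TXT fails because CON is a reserved
device name, and A+B.DOC fails because "+" is not in the common list.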

                               File Types

$$$ -- Temporary file, used by PIP and other copy programs as a work file.
ACT -- ACT language source file.
ADD -- Indicating an "addition" or new update.
ADV -- Adventure game.
ALG -- ALGOL language source files.
APL -- APL language.
ARC -- ARChive files.
ARK -- ARChive files, used for CP/M files.
ART -- Article files.
ASC -- BASIC language source statements.
ASM -- Assembly Language source code, usually for 8080 assemblers.
AZM -- Assembly Language source code, used with Z80MR.
BAD -- Bad sector directory entry file.
BAK -- Backup file.
BAS -- Basic language source statements.  Normally saved as ASCII.
BBS -- Bulletin board system file.
BHB -- Heath Benton Harbor Basic language.
BIN -- Binary file.  Usually NOT a .COM file renamed.
BSE -- E BASIC source.  See also, "EBA" and "EBS".
BUG -- Bug data/information file.
C   -- C Language source.  Most often BDS C.
CAL -- Calc or spreadsheet data file.
CAT -- Catalog of file names.
CCP -- Console command processor file.
CHK -- Check file.
CMD -- Command file CP/M 86.
COB -- COBOL language source statements.
COM -- Machine language COMMAND files for CP/M 80.
CPR -- Compare file.
CRC -- CRC data file.
CRL -- C language relocatable/intermediate file.
DAT -- DATA file.
DDT -- DDT file.
DIF -- Difference file.
DIR -- Directory file.
DOC -- Documentation file.
DSK -- Disk data file.
EBA -- E BASIC source.  See also "BSE" and "EBS".
EBS -- E BASIC source.  See also "EBA" and "BSE".
ENV -- ZCPR3 Environment Descriptor file.
ERL -- Relocatable pascal module.
FCP -- ZCPR3 Flow Command Package.
FEX -- Felix language source file.
FIX -- Instructions for correcting program errors.
FMT -- Format file.
FOR -- FORTRAN language source statements.
GMR -- Grammar file.
H   -- C Language "header" source statements.
HEX -- HEX intermediate file.  Most often INTEL format.
HLP -- File intended for use with the HELP utility.
IDX -- Index file for data file.
INF -- Information files.
INP -- Input file.
INT -- Intermediate code produced by compilers such as CBASIC.
INV -- Invoice file.
IOP -- ZCPR3 Input/Output Package.
LBR -- Library file.  Use NULU, LU, LDIR, LUX, LTYPE to manipulate.
LIB -- Library file assembly source module.
LST -- Listing files, intended for printing.
LTR -- Letter/correspondence file.
M80 -- Microsoft M80 Macro assembler source.
MAC -- Macro assembly source file for M80.
MAG -- Magazine file.
MAP -- Map data file.
MEM -- Memory file.
MNU -- ZCPR3 MENU utility script.
MOD -- Modification instructions.
MSG -- Message file.  Timely, not of permanent use.
MSS -- Manuscript documents.  Input to word processors.
MUS -- Music language source file.
NAM -- Name file.
NDR -- ZCPR3 Named Directory Package.
NEW -- Indicates proposed revision to an existing program/release.
OBJ -- Object file or renamed COM.
OUT -- Output file.
OVL -- Overlay command file.
OVR -- Overlay: a "part" of a multi-part .COM file.
PAS -- PASCAL language source statements.
PAT -- Patch for customizing or fixing programs.
PGM -- Program file.
PIC -- Picture file.
PL1 -- PL/1 language source statements.
PLM -- PLM language source file.
PLT -- Pilot language source file.
PRN -- Listing output of assemblers.
PRT -- Print files, intended for printing.
PTR -- Printer file.
PUN -- Punch device file.
RAT -- Ratfor language source file.
RCP -- ZCPR3 Resident Command Package.
REF -- Reference file.
REL -- Relocatable/intermediate file.  Output from.
ROM -- Read only memory file.
RPT -- Report file.
SAM -- SAM language source file.
SET -- Setup file.
SIG -- SIG/M information file.
SRC -- Pascal source file.
SRT -- Sorted file.
STC -- STOIC language source file.
SUB -- File of commands for input to SUBMIT.
SUB -- Submit command file.
SYM -- Symbol table file.
SYS -- System file.
TEL -- Telephone number file.
TEX -- Text file.
TST -- Test file.
TXT -- Text file.
TYP -- Type file.
UTL -- Utility file.
VAR -- Variable file.
VMN -- ZCPR3 VMENU utility script.
WS  -- Text document in WordStar format.
Z3T -- ZCPR3 TCAP entry.
Z80 -- Assembly Language source code, usually for Z80 assemblers.
ZEX -- ZCPR3 ZEX utility script file.
nnn -- Used to indicate "volume serial #".
xQx -- Squeezed file.  Needs to be "unsqueezed" before use.
xZx -- Crunched file.  Needs to be "uncrunched" before use.

                             File Utilities

      File name       K    Description

     ARC-FILE.IQF     5  ARC file internal structure defined
     CPMSQV3.LBR     30  SQueeze/UnSQueeze - Turbo Pascal
     CRUNCH20.LBR    52  Data compression with LZW algorithm
     DELBR11.COM     13  LBR file extractor
     DELBR11A.CQ      6  LBR file extractor source code
     DLU12.PQS       11  A library utility in turbo pascal
     LBRDSK23.LBR    17  Treat libraries as a logical drive
     LDIR.COM         2  Directory lister for LBR files
     LDIR23.LBR      16  Lists directory of LBR file
     LRUN20.AQM      16  Run .COM files inside LBRs
     LRUN20.COM       2  Run .COM files inside LBRs
     LSTYPE.LBR       7  Print multiple files inside LBRs
     LSWEEP13.LBR    25  Library SWEEP utility extract/view
     LTYPE17.LBR     17  Types text files inside LBRs
     LU300.DQC       22  Documentation for LU
     LU310.COM       21  Library Utility version 3.10
     LU310.HLP        1  Help file for use with LU310
     LU310.UPD        3  Update info on LU310.COM
     LUDEF5.DQC      11  Internal structure of LBR files
     LZW.LBR         52  Compression/decompression Utilities
     NULU15.NOT       2  A note from the author of NULU151
     NULU15.WQ       40  Complete user's guide for NULU151
     NULU151.COM     16  Machine lang. Library Utility pgm
     NULUFIX.ASM      2  Bug fixes for NULU15.COM
     NULUTERM.AQM     2  Terminal configuration for NULU151
     SQ.PQS          13  File SQueezer
     SQ111.COM        6  Machine language SQueezer, very fast
     SQUEEZE.TXT     13  Tutorial on SQueeze/UnSQueeze
     SQUPORT2.LBR    35  Portable SQueeze/UnSQueeze in C lang
     TYPEQZ12        35  Squeezed/Crunched type utility
     UNARC-P1.NQT     2  UNARC12 patch for non-standard CP/M
     UNARC.COM        5  Z80 version of UNARChive utility
     UNARC12.LBR    108  UNARC utility for CP/M
     UNARCA.COM       5  Lists, types, extracts from ARChives for 8080
     UNCR20.COM       4  UNCRunch for CRUNCH20 and prior
     UNCR8080.COM     6  UNCRunch for 8080/8085 CPUs
     USQ.PQS          5  SQueezed file UnSQueezer
     USQ120.COM       2  Dave Rand's machine lang. UnSQueezer
     USQ120.DOC       3  Documentation for Dave Rand's USQ120
     USQFST20.LBR    28  Fast unsqueezer for Z80 computers

November 11, 1987

    This text file consists of notes taken at the November meeting of
D:KUG (The Detroit Metropolitan Kaypro Users Group).  The subject of the
meeting was files, the format of file names, and the public domain
programs available to process disk files.

B.Duerr