Files

A file is a collection of related information. All programs, text, and data on your disk reside in files, and each file has a unique name. You refer to files by their file names. You create a file each time you enter and save data or text at your terminal. Files are also created when you write programs and save them on your disks.

The names of the files are kept in directories on a disk. These directories also contain information on the size of the files and may contain the dates that they were created, updated, and accessed.

If you want to know what files are on your disk, you can use the DIR command. This command tells the operating system to display all the files in the working directory of a specific disk.

File Names

Each CP/M file has a unique name consisting of one to eight characters, optionally qualified by the drive and an extension. The three parts that make up file names follow:

     d:filename.ext

The first part is the drive code and is optional. The drive code is a single letter followed by a colon. It specifies the disk drive on which the file is to be found. CP/M provides up to 16 disk drives, named "A" through "P". Some systems, such as CP/M Plus and ZCPR, allow you to specify the user area with the drive code. Many utility programs also allow the "drive/user" specification. If the drive code is not specified, the logged-on drive is assumed.

The second part is the actual name of the file. The file name is from one to eight characters, usually upper case alphabetic or numeric, but some other printable characters can be used. The characters < > . , ; : = ? * [ ] have special meaning and may not be used. The name should be an abbreviated, descriptive name of what the file contains.

The third part is the optional file type or file extension. It is separated from the file name by a period. It is good practice to include a file type even though it is optional.
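As a rough illustration of the d:filename.ext layout described above, the following Python sketch splits a file specification into its three parts and applies the length and character rules from the text. The names parse_cpm_name, CpmName, and RESERVED are made up for this example and are not part of CP/M or of any utility mentioned here. User-area prefixes are not handled.

```python
from typing import NamedTuple, Optional

# Characters the text lists as having special meaning in file names.
RESERVED = set('<>.,;:=?*[]')


class CpmName(NamedTuple):
    drive: Optional[str]  # 'A'..'P', or None meaning "use the logged-on drive"
    name: str             # 1 to 8 characters
    ext: str              # 0 to 3 characters (the file type)


def parse_cpm_name(spec: str) -> CpmName:
    """Split 'd:filename.ext' into its parts; raise ValueError if malformed."""
    spec = spec.upper()   # names are normally held in upper case
    drive = None
    if len(spec) >= 2 and spec[1] == ':':
        drive, spec = spec[0], spec[2:]
        if not 'A' <= drive <= 'P':
            raise ValueError(f"drive must be A-P, got {drive!r}")
    # Everything after the first period is treated as the file type.
    name, _, ext = spec.partition('.')
    if not 1 <= len(name) <= 8:
        raise ValueError("file name must be 1-8 characters")
    if len(ext) > 3:
        raise ValueError("file type must be at most 3 characters")
    for ch in name + ext:
        if ch in RESERVED or ch.isspace() or not ch.isprintable():
            raise ValueError(f"illegal character {ch!r}")
    return CpmName(drive, name, ext)
```

For example, parse_cpm_name("a:filename.inf") yields the drive "A", the name "FILENAME", and the type "INF", while a name longer than eight characters is rejected.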
File types consist of up to three characters and are separated from the file name by a period. Some programs, such as "ASM", require a specific file type.

EXAMPLES:

     A:FILENAME.INF
     ^    ^      ^
     |    |      |
     |    |      --- FILE TYPE ---- optional 1-3 characters.
     |    |
     |    ------------ FILE NAME --- required 1-8 characters.
     |
     -------------- DRIVE CODE -- optional 1 character.

At the operating system prompt (the "A>") and from most programs, you can enter the filename in upper or lower case letters. The characters will be translated to upper case. The exception to this is when you issue the save command from BASIC: lower case characters will actually be saved in the directory.

Attached is a list of commonly used file type designations on RCP/M systems around the country. There is no RULE that these have to be used as described, but it is conventional to use them that way.

Two special characters (called wildcards) can be used when you are searching the files on a disk: the asterisk (*) and the question mark (?). A question mark (?) in a file name or extension means that any valid character can occupy that position. An asterisk (*) in the file name or extension means that any character can occupy that position or any of the remaining positions in the file name or extension. For example, DIR *.COM lists every file whose type is COM.

ASCII (American Standard Code for Information Interchange) files are printable, human-readable files. They consist of letters, numbers, and a few familiar symbols such as periods, commas, !, @, #, $, %, &, *, etc. An ASCII file may be "TYPE"ed, and can be transferred over phone lines without error-checking, if desired. ASCII files should contain no word-processor-specific information. WordStar files should be saved in non-document mode. Each line should end with a Carriage Return, Line Feed sequence, and EOF characters (hex 1A) should pad the last block of data.

Many of the popular Public Domain programs and information files are distributed in library (LBR) files.
Below is a discussion of the structure of LBRs and the utilities needed to process them.

A library is a group of files collected together into one file in such a way that the individual files may be recovered intact. A library file can be identified by the "LBR" file extension. LU is a CP/M utility used to maintain libraries of files. LU does not perform any compression. Because of this, most people squeeze or crunch files before adding them to a library if they want to save space.

If you want to remove the component files (members) from a .LBR file, you should have a copy of LU.COM or another LBR extractor utility. At the end of this document is a list of the programs available on many Remote CP/M systems and in the CP/M RoundTable Software Libraries of GEnie that function with libraries.

A library file usually takes up less space than the total of the individual member files which went into it. The reason for this is that CP/M allocates disk space in fixed blocks or groups, typically 2k bytes each. Any space after the last sector of a file up to the next 2k block boundary is wasted. The same files in a library use only the number of sectors they actually need, and though the library itself may have a partially wasted block at the end, and requires some space for directory information at the beginning, the net effect is usually a saving of total space. The best results are seen when many small files are combined into one library.

A library file makes the most efficient use of the CP/M disk directory, since it is treated as only one file by CP/M regardless of how many members it contains.

Libraries can aid in transferring packages of software from one system to another using XMODEM or another file transfer protocol. Only one file is transferred, eliminating the need to run the XMODEM transfer program several times, the chance of overlooking a needed file, and the problems of naming conflicts (such as READ.ME files) among unrelated packages.
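The space-saving argument above can be checked with a little arithmetic. The Python sketch below compares the disk space allocated to a set of small files stored separately with the space used when the same files are packed into one library. It assumes 128-byte sectors and the "typical" 2k allocation block from the text. The function names, the sample file sizes, and the two sectors set aside for the library directory are assumptions made only for this illustration.

```python
SECTOR = 128   # bytes per CP/M sector (record)
BLOCK = 2048   # bytes per allocation block; "typically 2k" per the text


def ceil_div(a: int, b: int) -> int:
    """Integer division rounded up."""
    return -(-a // b)


def space_as_separate_files(sizes):
    """Bytes allocated when every file is rounded up to whole blocks."""
    return sum(ceil_div(size, BLOCK) * BLOCK for size in sizes)


def space_as_library(sizes, dir_sectors=2):
    """Bytes allocated when members are packed sector by sector into one file.

    dir_sectors is the space reserved for the library's own directory;
    the default of 2 is an assumed value for this example.
    """
    sectors = dir_sectors + sum(ceil_div(size, SECTOR) for size in sizes)
    return ceil_div(sectors * SECTOR, BLOCK) * BLOCK


# Hypothetical member sizes in bytes.
sizes = [300, 900, 1500, 2100, 400]
print(space_as_separate_files(sizes))  # 12288 (six 2k blocks)
print(space_as_library(sizes))         # 6144 (three 2k blocks)
```

With these sample sizes the five separate files occupy six blocks, while the library needs 46 sectors, which fit in three blocks. The saving shrinks as the member files get larger relative to the block size.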
When members are added to a library, a CRC (Cyclic Redundancy Check) value is calculated and stored in the directory of the library. When the members are later extracted or the library is reorganized, the CRC value is calculated again and checked against the value in the directory. If a discrepancy occurs, the operator is notified. (Caution: This CRC validation does not occur with some public domain file extractors and earlier versions of LU and NULU.)

Members can be added to, renamed in, and deleted from the library. The directory information of a library is contained in the same file as the members. The amount of space to be allocated to the directory must be specified by the user when a new library is created, but can be changed when the file is reorganized.

Recently, popular CP/M Public Domain software and information files have begun to be distributed as ARChive files. ARChive files are similar to library (LBR) files in that they take a logical group of files and put them together in a single file. The main difference is that the members of the "ARC" file are automatically compressed. The compression algorithm chosen is whichever of three produces the smallest file.

ARChive files have long been available in the MS-DOS and PC-DOS areas, but have only become useful in the CP/M environment with the introduction of the "UNARC" program. The current version is 1.6, and is available with assembly language source, extensive documentation, and two executable COM files: an 8080/8085 version and a Z80 version. The Z80 version takes advantage of the expanded Z80 (and equivalent) instruction set for speed and size, and therefore is machine dependent.

A CP/M utility has just recently been made available to make an "ARC" file. However, because of the resources required, it is still impractical to make Archives in the CP/M environment. ARChive files will be made on systems using other operating systems. ARChive files are identified by "ARC" as the file extension.
This is a packaging method that guarantees no growth during storage. The files contain a "marker", followed by file information, file-data, file information, file-data, etc. File contents are analyzed before storage and stored in one of the following ways:

     1. AS IS (typically files in the 1 to 200 byte range).
     2. With repeat-compression (same range as above).
     3. Using Huffman 8-bit encoding.
     4. Using Lempel-Ziv-Welch encoding (all others).
     8. Crunched - non-repeat packed (DLE encoded).
     9. New squashed files created with PKARC.

The ARChive technique frees the user from worrying about storage mechanisms and delivers practically all needed services (extract, store, list, type, check, execute, and re-compress using the "latest" state of compression technique). ARC is "downward" compatible. It is currently heavily used in the MSDOS/PCDOS world, although usage on RCP/M systems is starting with the availability of a fast DE-ARCer.

The MS/PC-DOS ARC utility belongs in the category of "Share-ware" or "Free-ware"; it is copyrighted by System Enhancement Associates (source language C, system MSDOS). Phil Katz is the author of PKARC, and the current version is 3.5. UNARC was written by Bob Freed for the Public Domain (source language assembler, for CP/M systems).

Some files on RCP/M systems and in the CP/M RoundTable Software Libraries have been compressed, using one of the standard public domain utilities, to minimize download time and save storage space. This topic briefly discusses these compression techniques.

Files that have been compressed can be identified by the filetype (the last 3 letters of a filename after the ".") that signifies the compression. These are:

     .?Q?  for Squeezed files (middle letter is a Q).
     .?Z?  for Crunched files (middle letter is a Z).

USQ120.COM is used to unsqueeze, or expand, files that have a "Q" as the middle letter of the filetype. Such files have been squeezed, or compressed, with SQ111.COM or a similar utility.
These programs use Huffman Encoding to reduce the size of the target file. Depending on the distribution of data in a file, squeezing can reduce its size by 5% to 60%. If you download a file with a filetype indicating that it is squeezed, you will need USQ120.COM to expand it before you can use it. There are other programs available that are written in different languages or take advantage of special hardware, but USQ is 8080/8085/Z80 compatible.

Other utilities are available that have the unsqueeze coding embedded and function with squeezed or unsqueezed files. These include programs that perform file maintenance functions (NSWP), bi-directional display utilities (BISHOW), and string search programs (SEARCH and FINDU). This method of compressing files has been used for some time now, and programs to uncompress the files are available for several microprocessors and mainframe computers.

CRUNCH uses the Lempel-Ziv-Welch (LZW) technique. This method is fast and offers compression ratios around 55%. The highest compression is achieved with graphics data, where values of 90% are typical, followed by text, with 50%, and COM files, around 20%. This method is relatively new to the CP/M environment. See CRUNCH24.LBR for the Z80 CRUNCH and UNCRUNCH utilities. FCRNCH11.LBR contains the utilities for 8080/8085 compatible processors.

CRUNCH v2.0 and higher embody all of the concepts employed in the UNIX COMPRESS / ARC512 algorithm, but are additionally enhanced by a "metastatic code reassignment" facility. This is one of several concepts the author, Steven Greenberg, is developing as part of an effort to advance data compression techniques beyond current performance limits. He believes this is the first time this principle has been proposed and implemented.

Since this method of file compression is relatively new, only a few utilities are available that process a crunched file directly.
TYPELZW, TYPEQZ, and LT are display utilities, which also display members of libraries and squeezed files. SEARCH is a file searching program that allows you to search multiple text files for various words or phrases. SEARCH can directly search files within libraries, as well as squeezed and crunched files. Files may also be processed on other systems not using the Z80 processor.

A mini comparison of Huffman Encoding and Lempel-Ziv-Welch (LZW) techniques follows.

Huffman Encoding expresses each storage unit as a variable length pointer into a frequency-ordered tree. Compression is achieved by choosing a "native" storage unit (where repetitions are bound to occur) and (on the average) expressing the more frequent storage units with shorter pointers [although less used units might be represented by longer pointers]. The encoding process needs two passes: one reading all units (under CP/M and MSDOS, 8-bit bytes) to build the frequency-ordered tree (also called the "dictionary"), and one translating all units into their respective pointer values. The original filename, dictionary, and pointer values are stored. By convention the second character of the filename extension is changed to Q as a reminder of a "squeezed" file.

LZW expresses strings of 8-bit bytes by pointers into an "ordered" string-table. The rules for "constructing" the table are reversible, so that Compressor and De-Compressor can build their tables on-the-fly. LZW is one-pass, although the achieved speed is VERY dependent on language implementation and available physical memory (in general more than 90% of the time is spent in hashing and table searching). Although early implementations of LZW seemed to need more than 64K of physical memory, current enhancements make a maximum of 2**11 table entries sufficient to handle all cases. State of the art implementations check the compression ratio on the fly, and rebuild the table if the compression ratio decreases beyond a minimum or if the table overflows.
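The on-the-fly table building described above for LZW can be sketched in a few lines of Python. This is a minimal textbook version for illustration only, not the CRUNCH, ARC, or COMPRESS implementation: it returns the codes as a list of integers instead of packing them into bits, and it has no table size limit or table rebuild. The function names are made up for this example.

```python
from typing import List


def lzw_compress(data: bytes) -> List[int]:
    """Encode bytes as a list of string-table codes."""
    table = {bytes([i]): i for i in range(256)}  # start with all single bytes
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate                   # keep extending the match
        else:
            codes.append(table[current])          # emit the longest known string
            table[candidate] = len(table)         # learn the new string
            current = bytes([byte])
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes: List[int]) -> bytes:
    """Rebuild the same table from the codes alone and recover the bytes."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            # The code refers to the string that is being defined right now.
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        table[len(table)] = previous + entry[:1]
        previous = entry
    return b"".join(out)


text = b"TOBEORNOTTOBEORTOBEORNOT"
packed = lzw_compress(text)
assert lzw_decompress(packed) == text
print(len(text), "bytes in,", len(packed), "codes out")
```

Note that the decompressor never receives the table. It reconstructs each entry one step behind the compressor, which is why a single pass is enough.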
Typical Huffman compression ratios hover around 33% (the compressed file is 66% of the original; text is typically compressed a little better, and executable files less).

Typical LZW compression ratios average 55%. The highest compression is achieved with pixel information, where values of 90% are typical, followed by text, with 50%, and executable files, around 20%.

Although the original paper on LZW suggested implementation between the CPU and peripheral devices (terminal, disk drives, mag tapes), current usage encompasses file compression (Unix COMPRESS, MSDOS ARC, CPM UNArc), high speed proprietary MODEM protocols ("LZW in SILICON"), and "picture transmission" at 1200 baud.

Thoughts on CP/M and MS-DOS filename compatibility.

Many users now work with both CP/M and MS-DOS systems. The two systems share compatible file structures (ASCII text, WordStar, dBase II, Archives, etc.), and multi-format disk utilities (Media Master, Uniform, etc.) are available. Unfortunately, although the file naming conventions for each of the systems are similar, there are some differences that demand attention if compatibility is to be assured.

Below is a list of the LEGAL characters common to both CP/M and MS-DOS:

     A-Z 0-9 ! # $ & ' - @ ^ ` { } ~

In ASCII sorting order (same characters):

     ! # $ & ' - 0-9 @ A-Z ^ ` { } ~

MS-DOS illegal file names (reserved for device names):

     AUX, CON, PRN, NUL, COM1, COM2, LPT1, LPT2, LPT3

Computer users that are interested in transferring files could standardize on the above characters (while avoiding the reserved names when using CP/M). This would provide one more area of compatibility.

File Types

$$$ -- Temporary file, used by PIP and other copy programs as a work file.
ACT -- ACT language source file.
ADD -- Indicating an "addition" or new update.
ADV -- Adventure game.
ALG -- ALGOL language source files.
APL -- APL language.
ARC -- ARChive files.
ARK -- ARChive files, used for CP/M files.
ART -- Article files.
ASC -- BASIC language source statements.
ASM -- Assembly Language source code, usually for 8080 assemblers.
AZM -- Assembly Language source code, used with Z80MR.
BAD -- Bad sector directory entry file.
BAK -- Backup file.
BAS -- Basic language source statements. Normally saved as ASCII.
BBS -- Bulletin board system file.
BHB -- Heath Benton Harbor Basic language.
BIN -- Binary file. Usually NOT a .COM file renamed.
BSE -- E BASIC source. See also "EBA" and "EBS".
BUG -- Bug data/information file.
C   -- C Language source. Most often BDS C.
CAL -- Calc or spreadsheet data file.
CAT -- Catalog of file names.
CCP -- Console command processor file.
CHK -- Check file.
CMD -- Command file CP/M 86.
COB -- COBOL language source statements.
COM -- Machine language COMMAND files for CP/M 80.
CPR -- Compare file.
CRC -- CRC data file.
CRL -- C language relocatable/intermediate file.
DAT -- DATA file.
DDT -- DDT file.
DIF -- Difference file.
DIR -- Directory file.
DOC -- Documentation file.
DSK -- Disk data file.
EBA -- E BASIC source. See also "BSE" and "EBS".
EBS -- E BASIC source. See also "EBA" and "BSE".
ENV -- ZCPR3 Environment Descriptor file.
ERL -- Relocatable Pascal module.
FCP -- ZCPR3 Flow Command Package.
FEX -- Felix language source file.
FIX -- Instructions for correcting program errors.
FMT -- Format file.
FOR -- FORTRAN language source statements.
GMR -- Grammar file.
H   -- C Language "header" source statements.
HEX -- HEX intermediate file. Most often INTEL format.
HLP -- File intended for use with the HELP utility.
IDX -- Index file for data file.
INF -- Information files.
INP -- Input file.
INT -- Intermediate code produced by compilers such as CBASIC.
INV -- Invoice file.
IOP -- ZCPR3 Input/Output Package.
LBR -- Library file. Use NULU, LU, LDIR, LUX, LTYPE to manipulate.
LIB -- Library file assembly source module.
LST -- Listing files, intended for printing.
LTR -- Letter/correspondence file.
M80 -- Microsoft M80 Macro assembler source.
MAC -- Macro assembly source file for M80.
MAG -- Magazine file.
MAP -- Map data file.
MEM -- Memory file.
MNU -- ZCPR3 MENU utility script.
MOD -- Modification instructions.
MSG -- Message file. Timely, not of permanent use.
MSS -- Manuscript documents. Input to word processors.
MUS -- Music language source file.
NAM -- Name file.
NDR -- ZCPR3 Named Directory Package.
NEW -- Indicates proposed revision to an existing program/release.
OBJ -- Object file or renamed COM.
OUT -- Output file.
OVL -- Overlay command file.
OVR -- Overlay: a "part" of a multi-part .COM file.
PAS -- PASCAL language source statements.
PAT -- Patch for customizing or fixing programs.
PGM -- Program file.
PIC -- Picture file.
PL1 -- PL/1 language source statements.
PLM -- PLM language source file.
PLT -- Pilot language source file.
PRN -- Listing output of assemblers.
PRT -- Print files, intended for printing.
PTR -- Printer file.
PUN -- Punch device file.
RAT -- Ratfor language source file.
RCP -- ZCPR3 Resident Command Package.
REF -- Reference file.
REL -- Relocatable/intermediate file. Output from.
ROM -- Read only memory file.
RPT -- Report file.
SAM -- SAM language source file.
SET -- Setup file.
SIG -- SIG/M information file.
SRC -- Pascal source file.
SRT -- Sorted file.
STC -- STOIC language source file.
SUB -- File of commands for input to SUBMIT.
SUB -- Submit command file.
SYM -- Symbol table file.
SYS -- System file.
TEL -- Telephone number file.
TEX -- Text file.
TST -- Test file.
TXT -- Text file.
TYP -- Type file.
UTL -- Utility file.
VAR -- Variable file.
VMN -- ZCPR3 VMENU utility script.
WS  -- Text document in WordStar format.
Z3T -- ZCPR3 TCAP entry.
Z80 -- Assembly Language source code, usually for Z80 assemblers.
ZEX -- ZCPR3 ZEX utility script file.
nnn -- Used to indicate "volume serial #".
xQx -- Squeezed file. Needs to be "unsqueezed" before use.
xZx -- Crunched file. Needs to be "uncrunched" before use.
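The xQx and xZx conventions that close the list above lend themselves to a simple check. The Python sketch below guesses, from the file name alone, whether a file is squeezed or crunched by looking at the second letter of the file type. The function name compression_kind is made up for this example. It is only a naming heuristic: a type such as AZM, which the list above also gives as Z80MR assembly source, would be reported as crunched, so the result is a hint and not proof.

```python
def compression_kind(filename: str) -> str:
    """Guess 'squeezed', 'crunched', or 'plain' from the file type.

    Only the name is examined; the file contents are never read.
    """
    # Take everything after the first period as the file type (may be empty).
    _, _, filetype = filename.upper().partition('.')
    if len(filetype) >= 2:
        second = filetype[1]
        if second == 'Q':
            return 'squeezed'   # e.g. DQC, AQM, or the two-letter CQ
        if second == 'Z':
            return 'crunched'   # e.g. DZC, TZT
    return 'plain'


for name in ("LU300.DQC", "NULU15.WQ", "README.TZT", "LU310.COM"):
    print(name, "->", compression_kind(name))
```

A download script could use such a check to decide whether to run an unsqueezer or an uncruncher on a file before trying to display it.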
File Utilities

File name        K   Description
ARC-FILE.IQF     5   ARC file internal structure defined
CPMSQV3.LBR     30   SQueeze/UnSQueeze - Turbo Pascal
CRUNCH20.LBR    52   Data compression with LZW algorithm
DELBR11.COM     13   LBR file extractor
DELBR11A.CQ      6   LBR file extractor source code
DLU12.PQS       11   A library utility in turbo pascal
LBRDSK23.LBR    17   Treat libraries as a logical drive
LDIR.COM         2   Directory lister for LBR files
LDIR23.LBR      16   Lists directory of LBR file
LRUN20.AQM      16   Run .COM files inside LBRs
LRUN20.COM       2   Run .COM files inside LBRs
LSTYPE.LBR       7   Print multiple files inside LBRs
LSWEEP13.LBR    25   Library SWEEP utility extract/view
LTYPE17.LBR     17   Types text files inside LBRs
LU300.DQC       22   Documentation for LU
LU310.COM       21   Library Utility version 3.10
LU310.HLP        1   Help file for use with LU310
LU310.UPD        3   Update info on LU310.COM
LUDEF5.DQC      11   Internal structure of LBR files
LZW.LBR         52   Compression/decompression Utilities
NULU15.NOT       2   A note from the author of NULU151
NULU15.WQ       40   Complete user's guide for NULU151
NULU151.COM     16   Machine lang. Library Utility pgm
NULUFIX.ASM      2   Bug fixes for NULU15.COM
NULUTERM.AQM     2   Terminal configuration for NULU151
SQ.PQS          13   File SQueezer
SQ111.COM        6   Machine language SQueezer, very fast
SQUEEZE.TXT     13   Tutorial on SQueeze/UnSQueeze
SQUPORT2.LBR    35   Portable SQueeze/UnSQueeze in C lang
TYPEQZ12        35   Squeezed/Crunched type utility
UNARC-P1.NQT     2   UNARC12 patch for non-standard CP/M
UNARC.COM        5   Z80 version of UNARChive utility
UNARC12.LBR    108   UNARC utility for CP/M
UNARCA.COM       5   Lists, types, extracts from ARChives for 8080
UNCR20.COM       4   UNCRunch for CRUNCH20 and prior
UNCR8080.COM     6   UNCRunch for 8080/8085 CPUs
USQ.PQS          5   SQueezed file UnSQueezer
USQ120.COM       2   Dave Rand's machine lang. UnSQueezer
USQ120.DOC       3   Documentation for Dave Rand's USQ120
USQFST20.LBR    28   Fast unsqueezer for Z80 computers

November 11, 1987

This text file consists of notes taken at the November meeting of D:KUG (The Detroit Metropolitan Kaypro Users Group).
The subject of the meeting was files, the format of file names, and the public domain programs available to process disk files.

B.Duerr