Elements of E-Text Style

Version 1.0
9 August 1993

This file should be named ESTYLE10.TXT or estyle10.txt.

Copyright (c) 1993 by John E. Goodwin. All Rights Reserved.

You may make and distribute verbatim copies of this work for non-commercial purposes using any means, provided this copyright notice is included in all such copies.

Contact:
John E. Goodwin
P.O. Box 6022
St. Charles, IL 60174
jegoodwin@delphi.com

[John Goodwin is available to consult, write, and teach courses on E-text issues and Internetworking]

Abstract:

This manual discusses how to use electronic text (E-text) as a communications medium distinct from the print media. The manual is written in a non-technical style, such as a humanist-of-little-brain might enjoy reading.

o You can learn how to write effective E-text for personal, business, and scholarly communication.

o It includes sections on preparing forms and texts for electronic response and on writing effective and business-like E-mail letters.

o There is a brief section on Standard Generalized Markup Language, a coding standard of interest to humanists.

Just to prove how non-technical it all is, here is an exceptional lapse into technical jargon, in case you know what the Internet and FTP archives are:

This work is a companion volume to _E-Mail 101_, available free as ftp://mrcnext.cso.uiuc.edu:/etext/etext93/email025.txt.

<title> Elements of E-Text Style

=Preface= An Apology for E-Text
=Part I= Writing for an E-Text Audience
=Part II= Specific Differences of Style and Mechanics
=Part III= A Very Brief Style Manual
=Appendix A= Technical Details: Relationship to SGML and TEI
=Table I= Full Table of Contents (go to very end of this file)

<Preface>

This work grew out of my earlier course notes published under the title _EMAIL 101_. It was originally projected to be a three chapter section concerning the special needs of writers who wished their works to be transportable by the electronic networks.
The chapters were not included in the original release as they existed only in outline form.

Over the course of Summer 1993 I gradually came to realize that E-text was a communication medium in its own right, with its own needs and conventions, its own strengths and weaknesses, and not merely the bastard child of the print medium. Consequently, many questions of style, long ago settled for print media and fixed into rules in style manuals, needed to be re-examined in light of the new medium.

Since it seemed to me that no one had set out to treat the stylistic considerations of writing E-text, at least at any length, I decided to expand my three chapters into the present work. I set out to write down systematically some observations I had made concerning the differences between E-text and "ordinary" writing.

I treat E-text as a legitimate medium of expression, one that must be addressed on its own terms and without unnecessary reference to how the words might look on paper or how the work might be useful if printed out. For reasons that I will discuss at length in the first part, only a small fraction of E-text will ever see the light of print.

While paper may offer a better resolution image and a more perspicuous whole, E-text excels at ease of production and portability. It can be copied simply, transported great distances in seconds by electronic networks, and stored on magnetic media--floppy disks, hard drives, and CD-ROMs--that are less bulky and cheaper than paper.

The extraordinary growth of E-mail in the past few years, from a medium used by a few scientists and government officials to one accessible to millions, often in a humanistic or business setting, demands that we give the writing of E-text the attention it deserves. If you wish to communicate effectively, you will have to master this new medium. It is a necessary part of education--if only we knew what to teach!

Good writing is, in many respects, the same for any medium.
And the first thing any writer learns is that their** writing must fit both the audience and the medium being used. We cannot pretend any longer that we are writing for print or that our audience will be looking at anything other than a computer screen.

** I deliberately use "their" as an ambiguous pronoun throughout.

Just as the print media differ among themselves depending on the intended audience, expected lifetime of the text, and peculiarities of the medium, so E-text differs from print.

This work is organized as follows: In the first part we delineate the major differences between the print media and E-text. In the second part, we discuss specific issues such as techniques for designing a visually appealing layout, or representing characters. The third and final part is a brief style manual for writing E-text. It is not offered as a set of prescriptions, but as an example of how the principles in the second part can be realized in practice.

+ + +

In this introductory section, I would like to make a brief apology for E-text. It is not usual, in discussing the print media, to begin a manual on style with a defense of the worth of the medium; however, E-text is so new that many persons will say "Why bother with it?". They deserve an answer.

The most insidious objection to E-text is the claim that it is just printed text before it has been printed out. In effect, this claims either that (1) there is no difference between the needs of E-text and the needs of print; or (2) all text is printed out before being read.

The second premise is demonstrably false--most E-mail correspondence and anything longer than about 25 pages obtained over a computer network suffice as examples. The first premise requires a more extended answer, since it is the source of a great deal of confusion. In fact, the entire first part of this work is devoted to refuting it.
In this brief apology I will answer two simpler objections: that E-text is so esoteric that it is of no interest to ordinary persons; or that it is so commonplace as to be beneath our consideration. I call these two objections the "Ham Radio" and "Telephone" objections, respectively.

Not every communications medium is of interest to a large number of persons. Take, for example, Amateur Radio. Using short-wave radio to communicate requires a fair technical knowledge and special equipment. Because of these two investments, neither the medium nor the skills required to master it are common.

This situation is very similar to that of computers in the late '70's. Computers were not commonplace, being owned mostly by hobbyists. Communication and distribution of information was primitive, often by floppy disk passed hand to hand. And the special programs required to create and read E-text--word processors--were uncommon and required special skills.

On the other hand, some will object that E-text is now so commonplace that it needs no consideration. You don't read style manuals about how to talk on the telephone, do you? Although some scholars may discuss how telephone conversation differs from the ordinary face-to-face variety, most of us use telephones un-self-consciously. E-text is like typing a letter. Who cares?

Although the *mechanics* of talking on a telephone are trivial, the social implications are not. One can point out, for example, that to most people, their parents have become persons that they talk to on the telephone and not persons that they work with every day and see face-to-face. The social implications of this are enormous; the technology trivial.

Similarly with E-text: while the mechanics are easily mastered and perhaps of little interest, E-text together with global computer networks make possible a form of community that didn't exist prior to the medium.
The sort of community that will form around E-text is different from the kind of communities that are centered on the telephone. Rather than family or casual friends, it is likely to be a community that cares about a single issue or agenda. These communities can range from complex communities like companies or groups of scholars, to persons sharing a single, simple interest.

Already, in our society, we find that technology has allowed us to adopt a pattern of individualism never seen in the world before. Most face-to-face communication is with your immediate family, your co-workers, and perhaps a few friends. These friends are not as likely to live next to you as in a small town, and you see them less often.

E-text both carries this atomization to its extreme and simultaneously offers a way out from its worst effects. It is possible, using the medium, to form important relationships with persons you have never seen or talked to--this is individual atomism in the extreme.

At the same time, E-text provides a communications medium that can go beyond. It solves the problem, inherent in much of our society, of shallow relationships with other humans. These new, deep relationships can be business or scholarly, or just old-fashioned friendship.

Thus communicating well carries social implications that go far deeper than talking well on the telephone. How you write E-text may affect how you *appear* to potential friends, clients, and one day perhaps even family.**

** It is only a matter of time before parents of college children realize they can have a much closer relationship with their children for the 10 dollars a month it costs to open an E-mail account.

Despite the unnaturalness compared to talking, in many ways E-text is superior to the telephone as a way to "keep in touch". The telephone requires that both persons be available simultaneously. Most conversations are short and business-like, with marathon sessions being reserved for close family and a few friends.
But it is not for writing the occasional personal note that one needs a style manual. Unlike the telephone, E-mail has more serious uses--the same uses that print media have. It is used for business, persuasion, publication, and scholarship. E-mail may become as commonplace as the telephone, but it will not be approached with the same casualness.

Over the course of the past year or so I have seen collaborations of individuals in many fields spring up. These collaborations at first were of course among computer scientists. Then, in the last couple of years, the scientists have caught on. There are signs that all academic disciplines will soon have such collaborations. The cost in equipment is low and the advantages great.

Software for business "working groups" is already in the marketplace. Collaboration by E-mail--and a consequent reliance on E-text--may become the dominant social model for certain kinds of collaboration: e.g., within a company or scholarly community--wherever the persons cannot meet face to face.

There are many who say that E-text as we now know it--the typewriter-like production of character-oriented terminals--will soon give way to a new medium, multimedia. In this view, newer computers will spawn newer media and the old ones will be forgotten. In five years, ten at the most, E-text will be a thing of the past. Surely, the argument goes, we should not invest time in perfecting a medium that is little better than a fad.

Multimedia indeed shows great promise. I have no doubt that soon it will be possible to mail graphic images, audio, and video clips along with text. Printers will print not only black and white but color. And visual formatting information like font, point size, and so on will be sent alongside the basic text.

Not only that, but these capabilities will become part of every household, every phone system, cable system, and cellular communications network.
Personal computers will replace telephones as the "communications center" of the household.

The vision of multimedia is one of old media--color magazines, television, telephone, radio--being reborn in the new guise of electronics. But what do you think will be a large component of each and every multimedia message? Could it be that most of it will be E-text?

I think multimedia will turn out to be a lot like a letter to home. We may send an occasional picture, or even an audio cassette, but most of the communication will be in our writing. Ultimately, writing is easier than taking photographs or editing video clips--though not as easy as talking. It takes less time, less capital, and less effort.

Multimedia may be good for advertising, for writing textbooks, and for fun; but for just plain communicating? If it requires more thought or needs to reach more persons than a short telephone call, it will be E-text.

Multimedia will fill the niche of four color magazines, coffee table art books, the biology textbooks, and advertising. Look around you, at your bookshelves, and notice how many have no pictures. Think how many typed letters your office sends out compared to the number of four-color brochures it creates.

Most information is disseminated by the cheapest possible means. Right now, electronic text is that cheapest means. As more and more persons learn how to get it, it will become the dominant medium. E-text is the black and white print of the electronic age.

The uses of E-text are as diverse as the uses of print. The chief innovation of the new medium is the fact that it places the capability to publish in the hands of *anyone*. The capital required to spread information or ideas has been reduced to a level any person, or at least any community of persons, can afford.
The E-text revolution is that individuals are no longer dependent on institutions or even businesses to create, share, and gather information.** Every interest and splinter group, every church or synagogue, every would-be author, student, or scholar can collaborate with others, write, and share texts.

** They are still dependent on hardware, software, and telecommunications.

As E-text becomes more and more acceptable, it will become the medium of expression used by the masses. If you wish to reach them, you will have to learn to write it effectively.

Education--real education--has always been a rather solitary effort. The right conditions seem to involve access to a good library, a chance to talk with collaborators, write new material, and have it discussed by the community of interested persons. E-text can bring these necessary conditions for education out of the university to the simplest home.

E-text is at the stage the European vernaculars were at the time of the Renaissance. There were many doubters who pointed to the established Latin tongue as the medium of communication. But, in time, reality forced even the scholars to yield. A revolution was accomplished in which masses of ordinary people could own books and even on occasion produce them. The implications for society and learning were staggering.

Like that earlier time, when print was new, there is now much innovation and experimentation, and the wise practitioner will sift carefully the techniques and suggestions offered both here and by others. In time we shall have our Dantes, our Bacons, and our Shakespeares; the persons who will show us how to make this new medium not only a utilitarian one but a sublime one. For now, let us take those first hesitant steps down that path.

<Part I> Writing for an E-Text Audience: Basic Problems

Writing for an E-text audience is very much like writing for a print audience, but there are subtle differences.
Nowadays, both works destined for print and works aimed at the global networks are likely to be created on a personal computer. The advantage of being able to make incremental changes to a manuscript, and to create near print-quality works with a laser printer--not to mention the advantages of spell-checkers, automatic footnotes and the like--means that both kinds of author will be using a computer. But one will be aiming for an effective and attractive *printed* manuscript and the other will be aiming to accomplish the same end on a computer screen.

The difference between E-text and print comes down to two factors: (1) it is not currently possible to create a file that simultaneously looks good in print and on the screen, yet is universally accepted by all computer programs; and (2) the least common denominator computer screen has a lower resolution, a smaller viewing window, and a more limited repertoire of visual effects than even a typewriter.

This Chapter will address three questions:

o Why write for an E-text audience at all?

o Is it possible to write for both audiences at the same time?

o How does writing for an E-text audience differ from writing for a Print Audience?

In Part II we will explore the extent to which you can have it both ways--strategies for getting as close as possible to the Holy Grail of Electronic Communications, a file everyone can read that looks just as good on the screen and in print.

Part III presents the mechanics of creating E-text in the form of a very brief style manual. The issues of the previous two chapters are summarized in a series of suggestions for creating your own effective style.

The last Chapter of this part of the course discusses copyright issues that affect the distribution of E-text. These issues, important as they are for print media, become paramount concerns when copying your text is as easy as pressing a button.

=Section 1.1= Why Write for an E-text Audience?
=Section 1.2= Is it Possible to Write E-Text and Print at the Same Time?
=Section 1.3= Differences between E-Text and Print Media
=Section 1.4= Version Control

<Section 1.1> Why Write for an E-text Audience?

The opposing position is that computers are basically machines for creating printed text. This position, contrary to the one taken here, has a number of advantages:

(1) The resolution (appearance) of the final product is superior to anything that can be created on the screen;

(2) Print media are easier to handle and browse;

(3) Paper is universally accepted and readable--no special hardware or programs are required;

(4) The product is compatible with all the information-handling systems that have been developed for paper (files, libraries, catalogues, ... ); and

(5) The author's copyright is easier to maintain because the final text is harder to copy.

These are overwhelming advantages. I call the resulting situation--paper is the medium of storage and standard of communication while computers and printers are just tools for creating paper--the "cellulose interchange standard". It is well established; it works; and it is hard to beat. It is the norm even in the computing world. The paper bias, e.g. of word processors, is obvious.

Against this view is a reality of modern life: it is becoming very much cheaper to store information in electronic form and comparatively more expensive to store it as paper. Let's consider some facts:

o A 300 page book takes up a Megabyte of memory--around one 50 cent floppy.

o CD-ROM storage lowers the cost to a few pennies for a book. A single CD-ROM can store several hundred books.

o An 8 mm video tape can store several *thousand* books.

This means that the information in the Harvard University library system, one of the world's largest (6 million volumes), would take up a few thousand cassette tapes presently costing less than $100,000.
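The arithmetic behind that last estimate can be checked in a few lines. What follows is only a rough sketch in Python: the figure of 3,000 books per tape is my own reading of "several thousand" and is an assumption, as are all the names used; only the one-Megabyte book, the 6 million volumes, and the $100,000 ceiling come from the text above.

```python
# Back-of-the-envelope check of the storage figures quoted above.
MEGABYTES_PER_BOOK = 1      # "a 300 page book takes up a Megabyte"
VOLUMES = 6_000_000         # size of the library system, as quoted
BOOKS_PER_TAPE = 3_000      # ASSUMPTION: "several thousand" read as 3,000
BUDGET_DOLLARS = 100_000    # upper bound on total tape cost, as quoted


def tapes_needed(volumes: int, books_per_tape: int) -> int:
    """Round up, since a partly filled tape is still a whole tape."""
    return -(-volumes // books_per_tape)


tapes = tapes_needed(VOLUMES, BOOKS_PER_TAPE)
total_megabytes = VOLUMES * MEGABYTES_PER_BOOK
max_price_per_tape = BUDGET_DOLLARS / tapes

print(f"{tapes} tapes hold {total_megabytes} MB")
print(f"budget allows up to ${max_price_per_tape:.2f} per tape")
```

Under that assumption the library fits on 2,000 tapes, which is consistent with "a few thousand", and the quoted budget leaves up to $50 for each tape. A different guess at the capacity of a tape changes the numbers but not the order of magnitude.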
Given that, in addition, you can revise electronic text easily, make copies faster, send it further in less time and at less expense, store it more cheaply, print it out, send it as a fax, and convert it to other formats, it will soon be a commonplace that *even for documents that are designed to be printed out and looked at on paper* the principal means of storing and exchanging information will be in electronic format.

So let's get this straight: we are not discussing whether it is better to store and exchange information on paper or electronically. The bulk of information will soon be stored on magnetic media and exchanged electronically. What we are discussing is whether it makes more sense to prepare *electronic* documents that look good printed out or ones that look good on a screen.

Paper will become (in fact already is) a luxury reserved for the cream of the information crop--just as four color printing is reserved for Art books, glossy magazines, and advertising, while most other printed information is black and white.

You will want the 1% of your information you use most often in print form. You won't be *able* to get or afford print versions of most information, any more than you can afford to buy everything in hard cover or print every brochure in four colors.

And is this so bad? Why not have a good print library *and* good electronic text? My public library, a very good one, has around 100,000 volumes and cost several millions to build. If I want something they don't have, I have to wait a week for interlibrary loan, or a xerox copy, or a fax. In most cases I would be happier with an electronic version I can look at *today*.

The point is that paper doesn't compete with E-text; E-text is probably information you *would not have* in any other form.
It's books you wouldn't have because you can't afford 1000 books right now (but you can afford a couple of CD ROMs); it's text you can view (and download) at your public library that the library couldn't afford in paper; and it's free stuff that can't be distributed for free any other way, because paper just costs too much.

Do you have to write E-text? You do if you want your audience to include the 25 million people with E-mail access (projected to be 75 million in three years). You do if you want your message to travel as far as possible--even if it is intended to be printed at the other end.

So you will write electronic text that will never be printed because you *have to*. That means you need to learn to write effective E-text, because there really is no alternative.

Fortunately, if you can write well in the print medium, you can write well in E-text. We'll give you a few tips in a moment, but first we need to dispel the notion that it is possible to write for both media at the same time.

<Section 1.2> Is it Possible to Write for E-text and Print at the Same Time?

Here we come to the claim I made above, that it is impossible to satisfy all three of these criteria with a single file:

(1) the file can be read by any computer

(2) the file creates good looking print

(3) the file looks good on the screen

You can pick two out of three, but you can't get all three. This is unfortunate, but it is also true.

The only kind of file that is *universally* accepted is the plain text file, also called the common =ASCII= file. Actually, even this is an overstatement. ASCII, the American Standard Code for Information Interchange, is a very specific code for representing text.
A *fraction* of that code can be translated without difficulty to virtually all computers, including the fifty-two letters of the English language (both upper and lower case), the ten digits, and a handful of punctuation marks.** But the rest of ASCII--some punctuation marks and special "control" characters used by computers (including the common tab!)--are off-limits if you want your message to have a truly global reach.

** It is important to remember that some files that need the full ASCII repertoire, e.g. the source code of computer programs, may not travel well.

Anyway, if you want a file that looks good in print (criterion 2) and is also a plain text file (criterion 1), you *have* to give up criterion 3--the file will not look good on the screen.

This is because creating "laser quality" output with a "book quality" appearance requires printed commands, called =markup=, to be interspersed with text. This can be minimized, but the cost is a text without even the visual effects that are possible with a typewriter--underlining, superscripts and subscripts, and diacritic marks to name a few. If you want these effects and others typical of book-quality printing--multiple fonts, automatic footnotes, and so on--then the markup burden makes the file unpleasant to read, i.e. not effective as E-text.

Finally, the question of how to get as close as possible to satisfying all three criteria and a discussion of formats and markup is left to the next chapter. There has been some success at creating files that look good on the screen and in print--using so called WYSIWYG ("what you see is what you get") word processors or using SGML ("Standard Generalized Markup Language")--but these are not in universal use. I.e., you have to give up criterion 1 to get numbers 2 and 3.

So right now the plain truth is you can have any two out of three but not all three. Sorry.
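For readers who do know a little programming, the idea of a "safe fraction" of ASCII can be made concrete. The following Python sketch flags every character in a text that falls outside a conservative set. Please note the assumptions: this section never lists which punctuation marks make up the safe "handful", so the set named SAFE_PUNCTUATION below is purely illustrative, and the function name is my own invention.

```python
import string

# ASSUMPTION: the text does not enumerate the safe punctuation marks,
# so this particular selection is illustrative only.
SAFE_PUNCTUATION = ".,;:?!'\"()-"

# Letters (both cases), digits, the assumed punctuation, space, newline.
SAFE_CHARACTERS = set(
    string.ascii_letters + string.digits + SAFE_PUNCTUATION + " \n"
)


def unsafe_characters(text: str) -> list[tuple[int, int, str]]:
    """Return (line, column, character) for each character outside the set."""
    problems = []
    for line_number, line in enumerate(text.split("\n"), start=1):
        for column, character in enumerate(line, start=1):
            if character not in SAFE_CHARACTERS:
                problems.append((line_number, column, character))
    return problems


sample = "Plain text travels well.\n\tA tab does not."
print(unsafe_characters(sample))
```

Run on the two-line sample, the check reports the tab at the start of the second line and nothing else. Widening or narrowing the punctuation set is a judgment call that depends on how far you expect your file to travel.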
<Section 1.3> Differences Between E-text and Print Media

Creating a manuscript on a computer is quite a different process from the old fashioned method of revising a manuscript by (literally) cutting and pasting typescript, of maintaining bibliographies, and of checking spelling against a dictionary.

Most of the peculiarities of using a computer are true whether or not the output is meant to be the printed page. Nevertheless, it helps to enumerate a few, since these considerations apply in spades to producing E-text:

o Computer aided Research and Organization Methods

Note taking, creating bibliographies and databases, and gathering information now involve all the techniques discussed in my course notes, _EMAIL 101_.**

** Available free from: ftp://mrcnext.cso.uiuc.edu:/etext/etext93/email025.txt.

o Rigidity of Format and Outlining

Word Processing programs enforce a formatting and outlining discipline to a degree that would be unusual in the old style. Outlining encourages a strict hierarchical style and the automated formatting features make for a more rigorous observance of whatever conventions are built into the program used.

Rigorous formatting is a virtual requirement for E-text, since it is otherwise impossible for programs to tell where a chapter starts, say, or what portions of the text are italicized.

Spellchecking programs are another example of (welcome?) rigidity of style imposed by the new methods. Rigorous spelling is what enables the SEARCH command to find all references to a given subject.

o Incomplete drafts are more likely to be circulated

The ease of making changes leads to a more collaborative style of working in which draft after draft (not uncommonly 10 or 20) is circulated to a large group for comments. Often documents are *never* final, but are instead continuously revised.

It is useful to compare this process to the way computer programs are written: First a trial version, or "alpha" version, is circulated to a few select individuals.
Next a beta version, mostly complete and supposedly correct, is given wide circulation as a trial balloon. Finally there is a succession of ever more refined upgrades ranging from minor changes to major "releases". It is a good guess that most working documents will be produced in a similar way.

In a way, this is similar to the print industry's "editions" and "printings", except, like the manuscripts themselves, there is more consciousness of the structure of the process when computers are involved. Also, the cost of producing minor revisions is less, so there is less fanfare for a new edition--and more trouble with version control!

o Collaborative efforts are easier.

When the drafts we have been discussing are circulated by E-mail, the working style discussed above becomes even more natural.

o Backup copies are necessary

Although one might take the precaution of xeroxing an important manuscript, failing to make backups of works stored in magnetic media is sheer folly for anything that takes longer than 15 minutes to write. There is a whole new discipline of saving work frequently to disk, copying it to backup floppies or tape, and so on.

<Section 1.4> Version Control

The problem of multiple versions is a big one any time the revision process is easy or frequent. Most computer systems keep track of the date a file was last modified--so you can tell which of seven files (three on various floppies, one in a directory "/project/old" and two in directory "/project/new") is the most current. But even time stamps won't help if some files are exact copies of others--as they should be if you are doing proper backups. It helps to use version numbers like "3.1.5a" to distinguish the multiple copies.

As with any tree structure,** it is often good to use =dotted decimal= notation: Version 5.18.2 means release 2 of minor revision 18 of major revision 5. Version 0.1 is probably a rough draft.

** This concept is discussed in part III of my course, _EMAIL 101_.
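For those who keep their version labels in file names or a log, here is one way the dotted decimal notation just described might be compared by a program. It is a minimal Python sketch under stated assumptions: the function names are hypothetical, and it handles purely numeric labels only, so a label with a letter suffix such as "3.1.5a" would need extra rules that are not attempted here.

```python
def parse_version(label: str) -> tuple[int, ...]:
    """Turn a purely numeric label such as "5.18.2" into (5, 18, 2)."""
    return tuple(int(part) for part in label.split("."))


def newest(labels: list[str]) -> str:
    """Pick the label that sorts last when each field is read as a number."""
    return max(labels, key=parse_version)


copies = ["1.9", "1.10", "0.1", "1.2.3"]

print(parse_version("5.18.2"))   # major 5, minor 18, release 2
print(newest(copies))            # compares 10 against 9 as numbers
print(max(copies))               # plain text comparison, for contrast
```

The point of the last two lines is the contrast. Read field by field as numbers, "1.10" is the newest of the four copies; compared as plain text, "1.9" wins, because the character "9" sorts after the character "1". That is the sort of mistake a SEARCH or SORT command will happily make on your behalf.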
You have to be careful: this notation can either represent successive versions or divergent versions.

For example, 1.4.3 can mean the third minor change to version 1.4, which was the fourth major change to version 1. This is the most common scheme. It provides an odometer-like method of numbering the versions. It differs from an odometer in that you are not forced to increment the next place when you get to the tenth revision.

As long as the revision path is a straight line, with each version being derived from the version before it, this scheme will work. It gets into trouble if there are any branches in the revision path.

Suppose two versions, [a] and [b], are both derived from 1.2. Does 1.2.1 refer to [a] and 1.2.2 to [b]? This is a natural way to describe *branching* versions, i.e. with a tree notation, but you can't use both schemes simultaneously.

It's a good bet that Version Control software--programs that keep track of multiple versions, store them as "deltas", or difference files, to save space, and allow you to recover *any* past version or display differences between versions--will become more common and integrated in word processing software.

<Part II> Specific Differences of Style and Mechanics

This part enumerates some of the differences between E-text and print media and discusses them in a general way. Actual recommended practice is deferred until Part III, which takes the form of a conventional style manual.

In the long run, the reader will find the material in this part more valuable than the style manual. The manual, after all, is only one possible concrete realization of the principles discussed here. It is better to give thought to these principles in the context of your own writing than to slavishly follow the manual.
=Section 2.1= Differences Traceable to Physical Media
=Section 2.2= Differences in Style
=Section 2.3= Differences in Process
=Section 2.4= Differences in Repertoire
=Section 2.5= Differences in Layout
=Section 2.6= Searching and Hypertext
=Section 2.7= Copyright Issues
=Section 2.8= The Parts of a Book
=Section 2.9= The General Theory of Markup (SGML)
=Section 2.10= Summary: Basic Tricks of the Trade

<Section 2.1> Differences Traceable to Physical Media

The basic differences between E-text and print can be traced to the physical differences of the media, and to the fact that the needs of the human reader and computer must coexist:**

(1) The human's need for visual relief within a 24 line frame.

(2) The computer's need for a rigid hierarchy and consistent spelling.

(3) The limitations of the =character set= available.

(4) The limited possibilities of different renderings of the characters, e.g. by font and placement on the page; and

(5) a consequent dependence on =delimiters= and structure for rendering.

Taken together, these factors account, in the first instance, for most of the differences that there are between E-text and print. In this Part we will primarily be working out the implications for representing text and developing techniques for dealing with the limitations.

** I thank Michael Hart for pointing out this second requirement to me.

The small viewing window of E-text--commonly 24 lines and often less--has a number of consequences. Combined with the fact that moving around within a document requires one of: scrolling (moving a scroll bar with a mouse); paging (hitting a single key, such as "return", repeatedly); or searching (using special commands to find sequences of characters); we can see why E-text is hard to navigate, or, as I say, E-text is less perspicuous than print.

I think this limitation of the medium is a greater bar to its widespread acceptance than visual resolution.
This limited window and lack of perspicuity have a number of immediate consequences for writing style:

(1) paragraphs must be short enough to present at least one break in any given 24 line window. For practical purposes paragraphs much over 10 lines are anathema. This means that the flow of thought must be broken up on a finer scale than is common in print--though not, perhaps, as radically as in newspapers.

(2) E-text is much more linear than print. Signposts, such as enumeration and other cues, organization, and arrangement in sequence are much more critical. The trick is to structure your argument so that the mainline reader can read it in sequence. Side-trips have a much higher penalty for the reader than in print.

This statement, that E-text is more linear than print, seems to go against the promise of "hypertext", i.e. documents in which you can skip around to your heart's content. In fact, it is precisely because E-text is so linear that hypertext is important. It makes navigating E-text manageable.

The high penalty for skipping around (or passing through long sections) has a number of other implications:

(1) tables of contents should be distributed throughout the text, as a sort of preview of the following section. In effect, these tables become "hypertext menus", allowing the reader to locate the appropriate section with a SEARCH command. This gives as much aid as possible to the reader. However, if the text is long and there are many logical levels, then the full table of contents should be provided at the *end* of the document (not the beginning: we don't want the reader to have to scroll past a very long table to begin reading). The Table of Contents is discussed at length in =Section =.

(2) footnotes should be located immediately after the paragraph to which they refer. An E-text is logically a scroll. There is no such thing as a "page", except as an arbitrary marker added to synchronize the E-text version with a print version.
Because of the small viewing window, the only place you can put a footnote is after the paragraph. In effect, it becomes a "small print" section with added detail.

(3) bulleted lists should be relatively short and should not turn into full-fledged tables. Instead, they should be broken up into sub-lists if possible, with no more than ten items in one run.

Tables and long lists should be placed in appendices or separate files unless they are exceptionally compact or unless viewing them is necessary to the flow of ideas.

<Section 2.2> Differences in Style

The most marked characteristic of E-text style is brevity. We have already commented on brevity of paragraphs. The same can be said for the overall work. 200k, or about 70 printed pages, is already quite long. A larger work should probably be broken up into 100-150k segments.

Two other stylistic characteristics are =hierarchy= and =rigidity=. Given that computers, in popular culture, are often associated with mindless authority or fascism, these are not promising characteristics for the would-be writer of E-text.

The words "hierarchy" and "rigidity" are just convenient labels, however. We could use more complimentary terms, such as "logical organization" and "consistency of style". In any event, the hierarchy and rigidity apply to the formatting and not the ideas expressed.

Qualities such as brevity, an organizational structure that helps the reader, and consistency in spelling, grammar, punctuation, and layout, are generally accounted hallmarks of good style. In fact, there is a trend in print media towards these qualities, as well as towards shorter paragraphs, perhaps occasioned by the widespread use of computers for preparing printed text.

It might be said that the style advocated is essentially that of journalism and the classic inverted-pyramid scheme for writing newspaper articles. This is true to a point. E-text, however, is much more linear than a newspaper article.
Above the article level the typical newspaper is a jumble of many articles bundled together in a very large package. The E-text equivalent of a newspaper will almost certainly be a large number of separate files, indexed and arranged in a directory hierarchy.** Long files purporting to be E-journals are very tedious to read, precisely because they violate the brevity maxim.

** In fact this is the case with Usenet Newsgroups.

Another stylistic difference is repetition. Saying the same thing in different contexts, even verbatim, is more acceptable in E-text than in print. Since it is harder to navigate E-text, repetition** saves the reader's time looking up references. Material that is repeated in several places is a good candidate for a footnote or "small print" section.

** Repetition is a technique widely used in computer programming to save the time needed to follow up a reference. In this context it is called "in-line coding".

On top of the major stylistic differences, there are numerous minor points of grammar and markup (punctuation) that are covered in Part III. These are almost at the quirk level, and have little effect on style =per se=, so we don't consider them here.

<Section 2.3> Differences in Process

Electronic text and printed text created on computers are prepared in a different fashion from print. E-text typically passes through more stages and is in a rougher form than print. This does not prove that print is a superior medium because the product is more polished; rather, the capital investment required to produce *any* edition is so high that intermediate drafts are too expensive to circulate.

E-text creation is more collaborative and not punctuated by such monumental milestones as "first draft to printer" or "second edition". The stages tend to be so incremental as to blend into each other. Thus, "publishing" an incomplete or rough draft is appropriate for E-text. The medium seems to invite statements that "this section is under construction".
I call this the =cathedral model= of text production. A premium is placed on the execution of one's art, collaboration among successive "generations", and grand design, but the product itself is never really finished. The stages of the E-text production process are discussed at greater length in =Section 2.3=.

The pervasive sense of hierarchy in E-text affects the writing process. You might think that the rigid hierarchy leads to a top-down process in which each section is outlined in excruciating detail and the writing fills in the gaps. In fact, the actual process is a combination of this and a bottom-up one in which sections are created piecemeal and tacked together as ideas emerge.

The ideal working style, like that of building a cathedral, works from both ends. There is both a grand design (far more ambitious than what the author can produce at the moment) and whole sections that are created of a piece. Unlike cathedrals, the parts can be re-organized with ease after construction.

There is another respect in which the E-text production process differs from its print analogue. In E-text, self-publishing is the norm. The low capital investment required to create the text, in both equipment and training, points to self-publishing as the most economical distribution method.

The traditional segmentation into author, publisher, printer, and distributor follows the logic of the print production process. E-text needs only an author and a distributor--the distributor being a friendly archive site or bulletin board. Print media can use this same simplified distribution scheme *if* it is in electronic format.

It is important to differentiate between distributing E-text and distributing files that are intended to be printed. The latter are likely to have special markup, commands, or formatting codes. Often they are binary (i.e. not text) files.
Since E-text is easily copied, far more so than text locked up in proprietary formats, it presents a problem for compensating the author. There are four suggested compensation schemes:

(freeware model) no compensation--the text is either in the public domain or copyrighted but with a license for free distribution. The advantage of this model is that the work gains the widest possible distribution. Without fee, license, or undue copyright restriction, the work travels wherever it is wanted.

(shareware model) distribution is unrestricted but there is a licensing fee for use. This is an elegant solution to the compensation problem. Its reliance on the honor system has drawbacks, however.

(proprietary model) distribution is restricted by licensing and copyright. This is the common method for distributing commercial software. In effect it assimilates E-text to print media by artificially taking away the natural ease of copying E-text.

(patron model) the work is commissioned and paid for by a patron--a university, government, or other buyer. Since the work is paid for by the patron, distribution can be free or by any of the other methods.

In fact, the patron model is the common, since royalties. Thus cries that free distribution of E-text will destroy intellectual property have little merit. In fact, except in the commercial world, intellectual property has little market value and is almost always a public, not a private, good.

<Section 2.4> Differences in Repertoire

In addition to physical, stylistic, and process differences, E-text has a different repertoire of visual techniques--and consequently different problems.

The major problem is the limited number of characters. Unlike even the typewriter, E-text is limited to letters, numbers, and a few punctuation characters *in a single font*. Print-oriented word processors eliminate these restrictions, of course, but they remain for E-text. In addition, the visual effects are more limited even than the typewriter's.
Super- and subscripting are not possible, and certain layouts involving lots of vertical space are ill-advised.** Finally, graphic images are presently hard to include with text--at best they are separate files distributed with the text and viewed with difficulty--and such visual effects as parallel columns and tabular layout do not work well. They are not very robust in the E-text environment.

** More on this below, in =Section 2.5=.

The solution to the character repertoire problem is to extend the character set by a number of techniques:

o escape characters,
o delimiters, and
o tags.

An =escape character= is a rarely used character, such as the ampersand or percent sign, that indicates the next character or characters are not to be interpreted literally but as a symbol for some other character. In effect, it acts as a sort of shift key to shift the character set. Thus, "&e" might represent a Greek epsilon instead of an English [e].

=Delimiters= are pairs of characters used to mark off text. The equals signs I have been using in place of italics are delimiters. So are the asterisks I use if I *really* want to emphasize something. Delimiters are so-called because they serve to "delimit" the text they enclose. This strategy, widely used in E-text, replaces *rendering* by *delimiting*.

A final technique is =tagging=. Tagging is discussed at length in =Section 2.9= on markup. It extends the repertoire of delimiters by combining delimiters and escape characters in a construct called a <tag>. The tag is a logical unit that indicates an entity or logical unit ("element") in the text.

The character repertoire problem becomes most acute when different fonts or formulae are needed. Fonts are effectively handled by the techniques discussed above, but formulas are a very sticky problem. Probably the only solution is to realize that the notation we use for formulas grew up in the handwritten environment.
It has been brilliantly adapted to print, but its adaptation to E-text is new and awkward. All we can do is let notation for formulas occurring in E-text evolve *without reference to their print analogues*.

The solution is not (a) to give up and wait for multimedia, or (b) to use print-oriented markup as an interim solution. Programming languages have of necessity experimented with representing mathematical formulas. As E-text communication becomes more common, conventions *will* evolve that are elegant and empower, rather than hinder, communication. Some suggestions (and they are only that) for mathematical notation are contained in Part III.

<Section 2.5> Differences in Layout

Layout of text on the page is one of the major differences between E-text and print media. Naturally, this consideration is dominated by the small viewing window of E-text. In E-text, the paragraph, not the page, is the fundamental frame of reference for the reader.

o footnotes, as mentioned above, should be placed at the foot of their *paragraphs*;

o manifestations of hierarchy at the chapter level or above do not need the differentiated rendering (special indentation, typefaces, capitalization, and the like) that they have in print media. Instead high-level headings are optimized for searching, using a consistent numbering scheme such as dotted decimal (e.g. 3.5.2)--or else replaced by breaking the document into separate files.

Vertical and horizontal space is less important visually, because the reader is conceptually "closer" to the text and unable to appreciate such effects as indentation and vertical spacing. In particular:

o paragraphs should not be indented except to mark structural features such as: list items; sub-paragraphs; "small print"; and minor section breaks. Minor section breaks should have a larger indent than list items, to distinguish the two.

o vertical spaces beyond five blank lines or so are an annoyance.

o lines printed with deep indentation in print media, e.g.
letter signatures, date and place of writing, and run-on lines in poetry, should use some other device to set them off.

o unlike print, the visual effect of a block of text carries less weight in E-text. Consequently you should not go to great efforts to block text by hand like this paragraph--your efforts will be wasted in a proportional font anyway.

Just as pushed margins are to be avoided in E-text, so attempting to line up blocks of text in list items should be avoided. While this visual effect works well in print, it is actually harder to read in E-text.

On the whole, vertical and horizontal spacing merges with formal markup in E-text, so that it becomes just one more way of delimiting text. Its role in creating visually pleasing forms is very muted in E-text. Since the reader is so close to the "painting", the effect, which depends on a certain distance, is lost. E-text is not a medium that lends itself to impressionism.

Combining the "white space" role with the delimiting role is very much an art. Functionality and minimalism are the main virtues of this art. Mostly, it is a matter of being sensitive to the different needs of E-text and print, and avoiding elaborate markup that mimics print techniques that have little meaning for E-text.

Tables, multiple columns, and the like do not adapt well to E-text. Although you might think that E-text is the medium =par excellence= for tabular material, tables--perspicuous as they are in print--are very difficult to navigate in E-text. They tend to be long *and* wide. This is especially true of double-spaced tables common in typewritten text.

Also, E-text tables are difficult to transport and maintain, since whitespace is the most unstable part of E-text. Various programs may trim, condense, and reinterpret spaces, tabs, and returns.

A far better solution to viewing tables is to treat them as spreadsheets. Spreadsheet programs, unlike word processors, are optimized for viewing tables.
I would rather have a table in Comma Separated Value format that I can cut and paste into a spreadsheet program than one formatted with spaces.**

** Admittedly, some, but not all, spreadsheet programs can handle space formatting.

If you are tempted to include a long table in E-text, try to observe the following:

o Put tabular material in appendices or in a separate file so the reader is not forced to traverse it. Last ditch: tell the reader how to jump over it, if it absolutely must interrupt the flow of text.

o Redesign the data structure of the table so that it is as narrow as possible, e.g. by breaking it into several logical units--sub-tables--that can be related by an =index= or =key= column.

o Tabular material should have field delimiters other than spaces. Commas are something of a standard, as are tabs, if portability is not an issue.

Very often, a table that looks good in print has to be redesigned altogether for E-text. You should constantly ask yourself *why* the table is effective. Does it have to be a table at all, or is it really a list in disguise? Does the tabular arrangement make the right comparison? What is the main relationship a user will look for in the table?

One example of an organizing principle useful in print but less so in E-text is alphabetical ordering. An alphabetical list is very effective in print because it aids searching. It is also effective in E-text that has to be *modified by hand*. But it is not effective in E-text that is meant to be searched, because it gives up the chance for an alternate organization of the material.

Besides tables, layout effects such as =parallel columns= should be avoided altogether in E-text. The likeliest result is that the text will be corrupted and rendered unreadable by a program somewhere along the line.

E-text has very different visual needs from print. These are strongly reflected in layout design.
Writing visually appealing E-text requires a conscious effort to meet the needs of the E-text medium on its own terms. Whitespace, markup, and structure are all handled differently in this new medium.

<Section 2.6> Searching and Hypertext

We have already discussed the SEARCH capability of E-text on a number of occasions. In the present section we tie some of these strands together, the most important of which is that

   The author must be constantly aware of the need of the reader to navigate their text by searching.

This imperative leads to a pervasive tendency in E-text: all manner of references, cross-references, and indexing are replaced by a single concept, the =pointer=. The pointer is a sequence of characters that allows the reader to find the reference. This reference may be in the present file, somewhere else in the same computer system, or in print.

In print, pointers take the following forms:

o cross-references (e.g., See page 37. See also "Dinosaurs").
o glossary and index references
o bibliographic citations
o mailing and telephony addresses
o subject classifications and shelf locations

To these the electronic medium (including E-text) adds:

o network and other information retrieval references
o hypertext links and menus

In E-text, the mechanism of pointing is the same for all these categories, and =consequently the syntax should be the same also=. Merely mimicking print forms of expression, with its elaborate formatting rules for footnotes and bibliographies, obscures the underlying unity of the "pointer" notion.

In print, the visual differentiation cues the reader in to the process required to resolve (look up) the reference. In E-text, great efforts should be expended to make the lookup process the same for all manner of references. The main practical distinction is between internal references and external ones. Part III discusses this topic in greater depth.
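
To make the pointer idea concrete, here is a minimal sketch in Python. It assumes the convention this file itself uses: a pointer is written =Section 2.9= and its target is the tag <Section 2.9> at the head of the section. The function names and the sample text are made up for illustration; the only point is that a single lookup routine can serve every internal pointer, and that a pointer with no target in the file is recognizably external.

```python
import re

# Assumed convention (borrowed from this file's own markup):
#   pointer: =Section 2.9=      target: <Section 2.9>
POINTER = re.compile(r"=(Section \d+(?:\.\d+)*)=")


def find_pointers(text):
    """Return every pointer label in the text, in order of appearance."""
    return POINTER.findall(text)


def resolve(text, label):
    """Return the character offset of the target tag, or -1 if it is absent."""
    return text.find("<" + label + ">")


# Hypothetical sample: one pointer has a target in the file, one does not.
sample = (
    "Tagging is discussed in =Section 2.9= on markup.\n"
    "See also =Section 4.1=.\n"
    "<Section 2.9> The General Theory of Markup (SGML)\n"
)

for label in find_pointers(sample):
    offset = resolve(sample, label)
    if offset >= 0:
        print(label, "-> found at offset", offset)
    else:
        print(label, "-> not in this file (an external reference)")
```

This is exactly what a reader does by hand with the SEARCH command: copy the label, wrap it in angle brackets, and search.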
<Section 2.7> Copyright Issues

The most prominent characteristic of E-text, the ease with which it is copied, leads to endless copyright headaches. Even the simplest E-text is likely to sport a copyright, even if the author wishes to distribute it for free, since otherwise who will know that it's for free? Here we just present a few basic copyright concepts:

o Everything you "fix in a medium" (e.g., type into E-text) is copyrighted, whether or not you have a notice, which merely announces the fact that you have a copyright; or whether you have a registration, which is legal evidence of your rights.

o *WHO* has the copyright is complicated. Usually the author; but it could be their employer.

o Copyrights include the right to (1) copy, (2) distribute, (3) display publicly, and (4) create derivative works. For other rights, such as the right to sell these rights to other people and so on, consult a legal manual.

o Copyrights, claimed or otherwise, remain in effect for a *long* time. The "public domain" ends around 1917, with rare exceptions. If you *place* your work in the public domain, that's another matter.

o A compromise between retaining all your copyrights and placing your work in the public domain is a "freelore copyright"++ like this manual's. You retain a copyright but let others copy and distribute your work for free. This is the preferred approach unless you think your work has commercial value--or if you want to restrict distribution. It lets the work circulate widely and, most importantly, gives permission to do so without losing the work to the public domain.

You cannot use this method if you want others to be able to produce "derivative works", however. For that, the Public Domain is your only choice.**

** You could try to write an elaborate general public license, but with few exceptions it is not worth it. Software source code and educational curricula are likely exceptions to this rule.

++ The term "freelore copyright" is not a legal term.
In programming circles you will sometimes hear it called a GNU-like copyright, after the GNU project, the first programming project to make extensive use of a non-restrictive copyright for copyrighting software.

<Section 2.8> The Parts of a Book

In this section we take a brief tour of the typical book and make a few observations along the way.

The front matter of an E-text differs somewhat from its print cousin, the main virtue being brevity. No reader wants to scroll through page after page of apparatus to get to text.

In a book, it makes a great deal of sense to put tables and reference material at both ends of the book, these being the places one can find most easily. They are also easy to reach in E-text, but most likely the reader wants to begin reading quickly, so the front of the work, at least, is forbidden territory.

An E-text should have the following front matter:

o cataloguing information (the title, author's name, preferred name for the text file, subject classification, and how to get an electronic copy, since the reader may, after all, be looking at a printout);

o an advertisement, abstract, or teaser to entice the reader;

o copyright information or terms of use (if too lengthy these should be placed at the end with a pointer to them after the copyright statement itself);

and THAT'S ALL. Do not ask your reader to scroll through more than this. Tables of Contents and the like belong in appendices at the end or in another file.

Whether or not you have an official Table of Contents or other indexing material in an appendix, you should, at the beginning of each major division, have a list of the contents of that section. You can think of these as "menus".

A merged version of all these local menus is needed so the reader does not have to search through the entire document to get an overview; neither should the reader have to scroll back and forth to the beginning or end for help navigating. E-texts thus always have *two* tables of contents.
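
The merged table need not be compiled by hand. Here is a rough Python sketch, assuming each division opens with a tag at the start of a line followed by its title (the form <Section 2.8> The Parts of a Book used in this file). The function name, the three-space indent, and the sample text are my own choices for illustration, not a fixed format.

```python
import re

# Assumed heading form: a tag at the start of a line, then the title.
HEADING = re.compile(
    r"^<((?:Part|Section|Appendix) [^>]+)>[ \t]*(.*)$", re.MULTILINE
)


def full_contents(text):
    """Collect every heading tag into one table, written as pointers."""
    lines = []
    for label, title in HEADING.findall(text):
        # Indent sections under their part so the hierarchy shows.
        indent = "   " if label.startswith("Section") else ""
        lines.append("%s=%s= %s" % (indent, label, title))
    return "\n".join(lines)


# Hypothetical sample; the inline <tag> is not at the start of a line,
# so it is not mistaken for a heading.
sample = (
    "<Part II> Specific Differences of Style and Mechanics\n"
    "Some text.\n"
    "<Section 2.1> Differences Traceable to Physical Media\n"
    "More text with an inline <tag> that is not a heading.\n"
    "<Section 2.2> Differences in Style\n"
)

print(full_contents(sample))
```

Because each entry is emitted in pointer form, the reader can get from the full table back to the text with the same SEARCH habit used for the local menus.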
The body of E-text is much like that of a print work, except for comments pertaining to length, the pervasive sense of hierarchy (no more than three local levels!), and the placement of footnotes after their paragraphs.

The endmatter is likely to contain tables and bibliographies and multiple indices, the *very last* of which is the Table of Contents.

The end of an E-text file is a very special place, because it is an easy place to find; yet, unlike the front, few readers start there. It should thus be the location of the most important navigation aid for the document. Normally this is a full hierarchical list of the document's contents with pointers back to the text.

With E-text, the notion of a Table of Contents and an Index is blurred. In a book, the index really serves two purposes. It takes the place of the SEARCH command in E-text, except that not every word is indexed (barring the existence of a concordance, of course--a luxury in print). It also serves as a schematic and *alternate* representation of how the text might be organized.

Most works could have been organized profitably in more than one way. One way is fixed by the linear organization of the text. The Index provides an alternate organization.

E-texts do not really need the first form of index. Computer programs make their own search indices with lightning speed. In effect, you have a concordance for every document.

Alphabetic indices are of little use. Not even hypertext programs can navigate them well. Either they present a menu with 26 entries (too long!) or else you have to go through two levels to get to your entry ("select A-G"). Even glossaries are best arranged by topic and not alphabetically, since the alphabetical order is irrelevant to the SEARCH command.**

** Not quite true: very long indices profit from "clustering", or physical arrangement in search order. Unless there is some reason that browsing topics in alphabetical order might be interesting in itself, you shouldn't bother.
Notice that this is very different from electronic print, where the computer should always be used to create an elaborate print index in the final print-out.

The best way to think of an E-text index is as an alternate topical organization of your work. It is especially useful if there are two (or more) *hierarchical* ways to approach your subject. Your layout can only show one way--echoed in your Table of Contents. The others have to be represented by an index.

<Section 2.9> The General Theory of Markup (SGML)

The International Organization for Standardization (ISO) has developed a very flexible standard for marking text, Standard Generalized Markup Language (SGML, or ISO 8879).

SGML has a very flexible syntax for describing the logical structure of documents. Its drawback is that, like markup languages that are intended for print media, the burden of the markup makes the text unreadable.

SGML goes a long way towards creating a text that can look good on paper, on the screen, or to a program. The problem is that SGML software is not widely available, so although SGML files are portable and *potentially* useful, there is little use for them as yet. A widely available SGML

Tags are extraneous material used to mark a section of text. Along with delimiters, they comprise the markup added to a text. A program that uses the marked up text has to recognize delimiters and find the tags.

Since we are more or less following SGML, the tags themselves are delimited by angle brackets, like this:

   <outline>

The word "outline" is the =generic identifier= (GI) of the tag. The left angle bracket is the Start-Tag-Open delimiter (STAGO); the right angle bracket is the Tag-Close delimiter (TAGC). Ending tags look like

   </outline>

The sequence "</" is the End-Tag-Open delimiter (ETAGO). The end tag ends with TAGC, just like the opening tag.

Thus paired tags themselves become a sort of delimiter, albeit at a higher level than the delimiters they are built out of.
They serve as a sort of named parentheses to represent the "nesting" structure of the document. Here is the hierarchical structure of the document represented as an outline:

   outline
      chapter 1
         section 1.1
         footnote 1
      chapter 2
         section 2.1
         footnote 2
         section 2.2

In parenthesis notation (a common mathematical device), the same structure looks like this: (outline (chapter 1 (section 1.1) (footnote 1) ) (chapter 2 (section 2.1) (footnote 2) (section 2.2) ) ). Maybe this is a bit more clear:

   (outline
      (chapter 1
         (section 1.1)
         (footnote 1)
      )
      (chapter 2
         (section 2.1)
         (footnote 2)
         (section 2.2)
      )
   )

The parenthesis notation allows the tree structure of the outline, which used to be represented only by the indentations, to be faithfully represented even when the indentation is lost. I.e., we have a flexible method of representing tree-structures in *running text*.

The parentheses are delimiters whose purpose is to make clear the nesting structure of the textual =elements=. If we think of SGML as having "named parentheses" with <tag> being a left (opening) parenthesis and </tag> being the matching closing parenthesis, we have:

   <outline>
      <chapter 1>
         <section 1.1> Section 1.1 text ... </section>
         <footnote 1> Footnote 1 text ... </footnote>
      </chapter>
      <chapter 2>
         <section 2.1> Section 2.1 text ... </section>
         <footnote 2> Footnote 2 text ... </footnote>
         <section 2.2> Section 2.2 text ... </section>
      </chapter>
   </outline>

Notice the exact match between parentheses above and tags. Each text element is clearly delimited. The section and footnote numbers only appear in the opening delimiter as =attributes=. They would be redundant in the closing delimiters.

The reason for the funny names, STAGO, ETAGO, TAGC, and GI, is that SGML actually has an =abstract syntax=. The delimiters "<", ">", and "</" could be any symbols at all (within reason). The choice shown here is called the =Reference Concrete Syntax=. It is a particular choice for the abstract syntax of SGML.
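
Both ideas, tags as named parentheses and delimiters as an abstract choice, can be tried out in a few lines of Python. What follows is only a sketch for the simplified tags used in this section; it is not an SGML parser, and the function name and its parameter names (borrowed from STAGO, ETAGO, and TAGC) are my own. It ignores the text between tags and keeps only the nesting structure.

```python
def to_parens(text, stago="<", etago="</", tagc=">"):
    """Render the nesting structure of tagged text in parenthesis notation.

    The three delimiters are parameters because the syntax is abstract;
    the defaults are the reference choice "<", "</", ">".
    Text between tags is ignored: only the structure is kept.
    """
    out = []      # pieces of the result
    stack = []    # generic identifiers of the elements still open
    i = 0
    while i < len(text):
        # Test ETAGO first: with the default delimiters "</" begins with "<".
        if text.startswith(etago, i):
            end = text.index(tagc, i)
            gi = text[i + len(etago):end].strip()
            if not stack or stack.pop() != gi:
                raise ValueError("unmatched end tag: " + gi)
            out.append(")")
            i = end + len(tagc)
        elif text.startswith(stago, i):
            end = text.index(tagc, i)
            name = text[i + len(stago):end].strip()   # e.g. "section 1.1"
            stack.append(name.split()[0])             # GI only, e.g. "section"
            out.append((" (" if out else "(") + name)
            i = end + len(tagc)
        else:
            i += 1
    if stack:
        raise ValueError("unclosed element: " + stack[-1])
    return "".join(out)


doc = """
<outline>
  <chapter 1>
    <section 1.1> Section 1.1 text ... </section>
    <footnote 1> Footnote 1 text ... </footnote>
  </chapter>
</outline>
"""

print(to_parens(doc))

# A different concrete syntax, same abstract structure:
print(to_parens("[outline][chapter 1][/chapter][/outline]", "[", "[/", "]"))
```

The stack is what enforces the "matching parenthesis" rule: an end tag is accepted only if its generic identifier matches the element most recently opened.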
In practice, you will almost always see the standard choice.

In addition to the tagging of elements, SGML has a very general facility for including text and making the sort of references we discussed in =Section 2.6=. An =entity reference= is meant to be replaced either with a character or with the contents of a file. It starts with an ampersand (and-sign) and ends with a semicolon. Thus &file1; means include file1 here. And if you can't type an "e" with an acute accent on your keyboard you can use &eacute; to get the same effect.

Of course your entities have to be defined as part of your document's =entity set=. SGML provides a way to do this.

SUMMARY: We have introduced the basic ideas of SGML: representing the =logical structure= of textual =elements= using =tags= as delimiters; the various parts of an opening and closing tag; entities for external references and character substitution; and the notion of abstract vs. concrete syntax. These ideas are useful in developing notations and markup conventions.

<Section 2.10> Summary: Basic Tricks of the Trade

This part has covered a lot of ground. Creating E-text that is visually pleasing and communicates effectively is an art. Some themes, driven by the nature of the medium, recur over and over. I have summarized these as a series of Tricks of the Trade:

TRICK 1: Replace visual rendering with delimiters and other markup, but be sparing. The minimalist wins this game.

TRICK 2: Use a tree structure no more than three levels deep for the basic hierarchy. This trick goes hand in hand with the next:

TRICK 3: For more levels of hierarchy use data hiding techniques. The point here is to remember that the reader has an unnaturally narrow window on a very wide world. To avoid giving the reader the sense of being helplessly lost, you *must* make an effort to keep the relevant portion of reality small and easy to navigate.

TRICK 4: Use pointers to fill the roles of notes, cross-references, bibliographic citations, hypertext links, etc.
Pointers are a recurring theme in computer science; they serve to unify a whole series of concepts that are visually distinct in the print media. They are used to implement hierarchy and to allow "nonlinearity" in the text.

TRICK 5: Think less in terms of traditional categories like "Table of Contents" or "Index" and more in terms of data structure. This trick follows naturally from the observation that logical structure and not its rendering in a particular system should be primary. This is a prerequisite for communicating with readers using *you know not what* software or device.

TRICK 6: Use escape characters and tags to extend the character set and the delimiter repertoire respectively.

TRICK 7: Formatting, rigorous markup that looks like visual layout, can meet the needs of humans and computers. The trick is to rigorously use sequences of characters (especially "white space" like carriage returns and spaces) to create what appears to be visual formatting. This simultaneously satisfies the human and the computer. This is a nice trick, but hard to carry too far.

In all things use moderation. Being too clever or too idiosyncratic usually mars the effect for little gain. As always, the main trick is to hide the effort that goes into the art, making the difficult look easy.

<Part III> A Very Brief E-Text Style Manual

=Section 3.1= Backups and Saving Work
=Section 3.2= Compressed Files
=Section 3.3= Version Control
=Section 3.4= Use of Word Processing Features
=Section 3.5= Character Set and Font
=Section 3.6= Outlining and Hierarchies
=Section 3.7= Text Inclusions
=Section 3.8= Esoterica

This chapter is meant as a concrete example of the suggestions in the previous two chapters, in the form of a "style manual". You should take these guidelines as suggestions you may want to adopt, not as rigid rules.


<Section 3.1>  Backups and Saving Work

RULE 1.1 You should always keep two copies of any electronic text you would mind not having one day, one on your hard disk and one on a floppy.

The floppy is far more likely to fail, so you should consider keeping two floppies. A common scheme if you don't work at home is to keep two backups, one at home and one at work. Alternate which one you revise so that you will always have the most recent one at home and the next most recent at work. This is so that if a fire or other disaster destroys your work records you still have the most recent copy.

If you work at home, make sure your two sets of backup disks are in different places. That way an accident with a strong magnetic field (found near motors, in telephones, in TV monitors, etc.)--or a spilled cup of coffee--will not wipe out both copies.

RULE 1.2 (Archive copies) You should have *both* an archive copy of each important "milestone" version *and* a set of backups.

Backups are usually snapshots of your system. If you delete a file from your hard disk and then revise your backup, you will no longer have the file on your backup disk! Even if you put your backup set aside from time to time as an archive of "My System, December 1992", one day you will decide to recycle those disks and lose your copy.

The Moral: you need both an archive copy of each important project and a revolving set of backup disks. (I call mine "A" for Archives and "B" for Backups).

Checklist:

  o original working copy on hard disk
  o second copy on hard disk for really important files
  o daily archive of important work, organized by project
  o most recent revolving backup set at another location (weekly or monthly; more often for critical files)
  o second most recent backup set on site

Remember the basics: *at least* one backup and don't put your eggs all in one basket.

If you think I sound paranoid about this backup stuff, trust me. Do this or you will get burned one day. I know what I am talking about.
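
RULE 1.2 is easier to follow if making an archive copy takes a single command. Here is a minimal sketch in Python (my choice for illustration; any scripting language would do). The function name archive_copy, the "A" directory, and the file name and label in the demonstration are all hypothetical. The function copies a file into an archive directory under a name that records the milestone and the date, and it refuses to overwrite an archive copy that already exists.

```python
import shutil
import tempfile
from datetime import date
from pathlib import Path


def archive_copy(source, archive_dir, label):
    """Copy source into archive_dir under a name recording label and date."""
    source = Path(source)
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    stamp = date.today().strftime("%Y%m%d")
    target = archive_dir / f"{source.stem}-{label}-{stamp}{source.suffix}"
    if target.exists():
        # An archive copy marks a milestone; never silently replace one.
        raise FileExistsError(f"refusing to overwrite {target}")
    shutil.copy2(source, target)  # copy2 also tries to keep the timestamps
    return target


# Demonstration in a throwaway directory (hypothetical file and label).
with tempfile.TemporaryDirectory() as scratch:
    working = Path(scratch) / "estyle.txt"
    working.write_text("Elements of E-Text Style, draft\n")
    archived = archive_copy(working, Path(scratch) / "A", "v1.0")
    archived_name = archived.name
    archived_text = archived.read_text()

print(archived_name)
```

Because the date is part of the name, two milestones saved on different days never collide, and the archive directory can be copied to a disk of its own, apart from the revolving backup set.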

RULE 1.3 (Exception to Backups) An exception can be made for E-text that can be easily obtained over the network if you don't modify it and *if* getting a replacement would not be burdensome. In effect, the network is your backup copy. But beware that what is on the network today may not be there forever.

You can also "forget" about backups (but not archive copies) if your computer is on a local area network and you know for a fact that backups are made over the network on a regular basis. Many businesses, recognizing that most persons would rather risk losing a month's work than spend five minutes backing it up, make systematic backups, often using automatic systems that work at night, when the network is quiet. That is nice, but remember that you can still lose nearly an entire day's work if disaster strikes just before you go home for the day.

RULE 1.4 (Saving Your Work) Unless your word processor has an autosave and recover feature, you should develop the habit of saving your work at least every fifteen minutes and whenever you get up to leave your workstation.


<Section 3.2>  Compressed Files

It is possible to compress text files to around half their original size. Of course, you have to uncompress them before reading, but in effect you can double your hard disk size with *software*. File compression is becoming a standard feature of many programs and systems.

Compression works because text has very regular patterns that can be encoded more compactly than the standard encoding. Files that have more random bit patterns--binary files like programs or graphic images--seldom compress more than a few percent.

RULE 2.1 Never compress any file except a text file.

I would be a bit leery of compression. It trades memory, which is fairly cheap, for your time, which is expensive. Also, it complicates the strategies your software has to use--what happens if your system goes down in the middle of uncompressing an important file?
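
The claim that text compresses well while random-looking data does not is easy to check for yourself. The sketch below uses the zlib module from Python's standard library; that choice is mine, for illustration, and is not a tool discussed in this manual. The helper name compression_ratio and the sample sentence are made up. Repeating one sentence exaggerates the effect, and the exact figures depend on the data and on the compression program.

```python
import os
import zlib


def compression_ratio(data):
    """Return compressed size divided by original size (smaller is better)."""
    if not data:
        raise ValueError("need at least one byte of data")
    return len(zlib.compress(data)) / len(data)


# Hypothetical sample: ordinary English prose, repeated to make a larger file.
text = ("Compression works because text has very regular patterns "
        "that can be encoded more compactly. ") * 200

# Random bytes stand in for a file whose bit patterns have no regularity.
noise = os.urandom(len(text))

text_ratio = compression_ratio(text.encode("ascii"))
noise_ratio = compression_ratio(noise)

print(f"text:  {text_ratio:.2f} of original size")
print(f"noise: {noise_ratio:.2f} of original size")
```

Try it on a few of your own files before deciding what is worth compressing.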

Compression is here to stay, but I recommend you follow this rule:

RULE 2.2 Only compress things you keep around for archival purposes--old reports and projects, things you want at your finger tips but don't use day-to-day.

To give some further guidelines, compression makes a lot of sense in these cases:

  o compressed files make good archive copies, at least if you are keeping the file to feel safe and not because you need it regularly.

  o compressed files are good for network transfers because, for text files at least, they cut the time in half.

  o more subtly, there is a limit to the amount of hard disk you can safely use--you shouldn't use more than you can back up in 10 minutes a week or half an hour a month. If you make your own backups on floppies, that means that 80 Megabytes is about tops. More than that and you have 100 backup disks (times two sets!) to deal with. You will probably get sloppy. File compression means you can get twice as much stuff on your disk without increasing the backup burden, so you save both time and space.

SUMMARY The most important thing to remember about file compression is that there is a trade off between time and disk space. The fact that you can get twice as much on your hard disk is traded against the fact that it takes time to compress and uncompress files. Given that memory is very cheap this is not always a good trade. The most likely outcome is that by keeping too much useless stuff on disk you're setting yourself up to waste a lot of time.


<Section 3.3>  Version Control

Version control is important. It is easy to keep on top of *if* you bother. If you don't, one day you will modify the second most recent version of a long manuscript and then have to figure out the differences between two variant documents and "merge" them into your next draft. Then again, you could give up all the work you did and go back to the old version.

Or maybe you would like to follow this rule:

RULE 3.1 Versions should be numbered consecutively in "dotted decimal" notation. E.g., 2.4.1 means version 1 of sub-version 2.4 of main version 2.0.

You can add the version number to the heading of the file or make it part of the file name. This is hardest to do in DOS, where filenames look like DRAFT241.TXT meaning version 2.4.1. Versions from 0.1 up to, but not including, 1.0 are reserved for "drafts". Version 1.0 is the first public release and Version 1.0.1 its first minor revision.

RULE 3.2 In general, the primary version of your work should be in the format your word-processing program considers to be "native". Plain text files should be derived from this master copy.

Creating a plain text version and then re-importing it to the word processor will often result in problems. The word processor's native format (the one that understands all the nifty features) is proprietary; i.e., unlike plain text, it is not directly portable to other systems. Something is lost translating proprietary format to plain text and back again. The most common problems are:

  o A "hard" return is located at the end of every line, making editing difficult because you constantly have to adjust the length of each line by hand, or else use the "fill" command on each paragraph;

  o Unusual "line wraps" result from incompatible line lengths;

  o Lists of items that are supposed to be on separate lines are compressed into paragraphs;

  o Visual formatting like the spaces or tabs before an indented block quote, vertical bars alongside paragraphs, and similar things are scrambled;

  o Structures requiring elaborate spacing or tabbing like outlines, tables, or section headings are confused;

  o Double and single spacing is mixed up;

  o Special symbols and codes are no longer readable.

These problems cause severe version control headaches unless you follow the "master copy" strategy.

SUMMARY Detailed strategies for avoiding these problems are given in the next section.
But in general you can avoid them if you follow this basic strategy:

  (1) Always consider native format the "primary" version and plain text the "derived" version.

  (2) Never use any feature of your word processor that can't be easily translated into the plain text version.

The next two sections concentrate on just which features you can use.


<Section 3.4>  Use of Word Processing Features

From the standpoint of creating effective E-text it is extremely important to understand the following concepts:

hard return : a control character that signals the end of a line of text. The actual code, an ASCII character, varies from computer to computer. This is a source of many formatting problems.

filling : many word processors are capable of adjusting the length of lines automatically in a process called paragraph filling. This can either be automatic or on command. In the older method, the line "wraps" when you reach the end, but if you make editing changes you have to select the "fill" command. Newer word processors constantly refill the paragraph as you make changes, adjust margins, etc.

formatting codes and markup : in order to represent all the effects you can create on paper using a text file, it is necessary to add additional characters that control the formatting of the document--italics and underlining, fonts, superscripts, and the like. These codes can be typed letters like ".cl" or "</p>"; or they can be "invisible" on the screen but nevertheless present in the underlying file.

bit-mapped vs. character-oriented screens : The screen is represented in the computer memory as a series of black and white dots, called pixels ("picture elements"). There are two kinds of screens, those that can only represent characters and those that can draw any graphic shape (including any screen font ever devised, any lines or shapes, patterns, and complex artistic images like photographs and computer drawings).
Character-oriented terminals only have

WYSIWYG : "What You See Is What You Get" is a strategy adopted by many word processing systems that run on bit-mapped terminals. A single underlying file can create

font size and rulers : fonts in a traditional character-oriented screen are all the same size--usually 80 or 132 characters per line. In a bit-mapped system the fonts can have any size. Some fonts are fixed width, meaning that any character takes up the same width (and hence there are the same number in every line); in proportional fonts the characters have different widths. The line expands and contracts when you change letters.

With these concepts in mind, we can discuss how to create E-text that is meant to be read as E-text.

The main problem is that not every word processor can read files from every other word processor. The least common denominator is the plain text file, or ASCII file. ASCII means "American Standard Code for Information Interchange". The ASCII code includes the characters commonly found on an American typewriter keyboard plus some "control characters" representing actions like "carriage return" or "horizontal tab". The issue of which characters you can use is discussed in the next section.

When using a word processor you have to be careful because it is not always obvious which features will come out well in plain ASCII. Word processors compete on the basis of their wonderful features. Often, however, the fancy features you paid for cannot be used in the real E-text world. They are oriented towards producing pretty paper, but will *confuse* other computers unless they are running identical software. You will not be able to represent

  o bolding
  o italics
  o underscores
  o superscripts
  o subscripts
  o indenting and margins (except by spacing--not tabbing)
  o soft returns
  o multiple proportional fonts
  o double columns
  o special symbols or formulas
  o included graphics and spreadsheets

and so on and so on.

This means that you must forgo essentially anything that needs a formatting code. In newer WYSIWYG word processors, it may be hard to tell what is formatting and what isn't. In general, you have to think like a typewriter. *sigh*

RULE 4.1 Change to a non-proportional font, preferably 10 point (elite) or 12 point (pica) and 6 inch distance between margins. This works out to 72 characters for elite or 60 for pica.

To be safe, lines should not exceed 72 characters; but in no event should there be more than 80 characters without a hard return. Actually, 60 characters per line (12-point non-proportional font with a six inch ruler) is more portable, because it can be read both on standard 80 character screens and with the default settings of most word processors. If you use the 72 character line, some users may have to select the whole text and convert it to Courier-10 to read it in a WYSIWYG word processor without funny line wrapping. Not all users are that sophisticated, so you are better off using a 60 character line unless you have a special reason to go with 72. Also, short lines are easier to read, as you will learn in any speed reading course.

RULE 4.2 Don't justify the text, but keep all text "ragged right", like typescript.

RULE 4.3 Don't hyphenate words. Let the right column look uneven.

If someone is using a different screen width, your hyphenated word could end up looking like "this in the mid-dle of the text". Also, SEARCH commands choke on hyphenation. There is probably nothing you can do to prevent your word processor from breaking words that have "real" hyphens in them and happen to fall at the end of a line (remember this when *you* have to search).

RULE 4.4 Start text flush with the left margin and don't add spaces to create an indented effect. Do not use indentation, tab stops, or spreadsheet-like tables to format your text.

If you want to include spreadsheet data, use Comma Separated Value (CSV) text like this:

"January Actual","January Budget" <hard return>
23201.45,20000.00 <hard return>

You can cut and paste this into any spreadsheet program.

RULE 4.5 If you use the autofill feature to avoid having to type return, make sure your word processing program has a feature that will insert "hard" returns at the end of each line when you create your plain text output file.

In Microsoft Word, this is the "Save as Text with Linebreaks" command. If you use "Save as Text" you get returns at the end of every *paragraph*, not every *line*. Someone with an old-fashioned text editor--one that likes hard returns after every line--will see *very* long lines (and probably truncate them to boot). You may have to experiment to find the equivalent command in your system.

RULE 4.6 Don't use special characters like non-breaking spaces or optional hyphens to dictate where line breaks occur. These features are not portable.

RULE 4.7 Try to prevent your word processor from hyphenating words on its own.

It's OK to break a word that has a "hard" hyphen at the hyphen. That is, if a hyphen is a normal part of the word's spelling and the word processor decides to break the word at the hyphen, don't worry. But you should try to avoid breaking sentences that have dashes--like this--at the double dash. Sometimes the word processor will break a double dash in half.

RULE 4.8 Use single spacing with two hard returns between paragraphs.

Many WYSIWYG word processors allow single, double, or triple spacing between lines. In the text file, however, there are not necessarily two returns between each paragraph. Double spaced text *is* much easier to read on a screen, but it is hard to re-paragraph. The two returns between each line tend to make word processors think that each line is a paragraph. In general, it is easier to

RULE 4.9 Keep paragraphs short, say around 10 lines.

Paragraph breaks form a visual guide for the eye.
A book that has paragraphs spanning whole pages is hard to read. Similarly, on a 24 line screen it may be difficult to read paragraphs longer than 20 lines. Even if you naturally express yourself in paragraphs of 7 to 10 sentences, you should break your progression of thought into shorter segments after writing it down, if you want to reach your audience. If I see a paragraph that fills the whole screen, I tend to want to scroll down and skip ahead.


<Section 3.5>  Character Sets and Fonts

In order to be portable, a document must be coded as a text file. The American Standard Code for Information Interchange (ASCII) represents each character as a seven bit number. There are variant dialects of ASCII, especially for languages other than English, but the variations do not affect the subset we will be discussing. In particular, there are many extensions of ASCII to eight bits, of which Latin-1 is the most popular. These extensions are *not* portable, and hence not discussed further.

In order to be as portable as possible, ASCII text, or plain text, must observe a number of conventions:

RULE 5.1 Use only the 84 character subset of ASCII consisting of the twenty-six letters of the English alphabet (both upper and lower case), the ten digits, and twenty-two punctuation marks in Table 1 below.

Do *not* use the ten bad characters:

  o dollar sign
  o pound sign (number sign)
  o at-sign
  o caret (circumflex)
  o tilde
  o back quote
  o backslash
  o vertical bar, and
  o curly brackets

These symbols do not translate well into character sets in other countries.

TABLE 1. The 22 Legal Punctuation Marks

  o comma
  o period
  o colon
  o semicolon
  o exclamation point
  o percent sign
  o ampersand
  o asterisk
  o parentheses
  o hyphen
  o underscore
  o plus sign
  o equals sign
  o square brackets
  o apostrophe
  o double quote
  o angle brackets (less-than and greater-than signs)
  o question mark
  o and (forward) slash.
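
Rule 5.1 is mechanical enough to check by program. What follows is a minimal sketch in Python, not a standard tool: the names LEGAL_PUNCTUATION, ALLOWED, and find_bad_characters are invented for the example, and I have assumed that the space and the line-ending characters are acceptable in running text.

```python
import string

# The 22 legal punctuation marks of Table 1.
LEGAL_PUNCTUATION = "!%&*()-_=+[];:'\",.<>?/"

# 52 letters + 10 digits + 22 punctuation marks = the 84 characters of Rule 5.1.
LEGAL_CHARACTERS = set(string.ascii_letters + string.digits + LEGAL_PUNCTUATION)

# Assumption: the space and the two line-ending characters are also allowed.
ALLOWED = LEGAL_CHARACTERS | {" ", "\n", "\r"}


def find_bad_characters(text):
    """Return (line, column, character) for every character outside ALLOWED."""
    problems = []
    line_no, column = 1, 0
    for ch in text:
        if ch == "\n":
            line_no, column = line_no + 1, 0
            continue
        column += 1
        if ch not in ALLOWED:
            problems.append((line_no, column, ch))
    return problems


# Hypothetical sample text; the address is a placeholder.
sample = "Cost: $5 at user@example.com\nNothing wrong on this line."
for line_no, column, ch in find_bad_characters(sample):
    print(f"line {line_no}, column {column}: {ch!r}")
```

A check like this will also catch tabs and eight-bit characters that crept in from a word processor, since neither is in the allowed set.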

More briefly, the 22 legal punctuation marks are:

!%&*()-_=+[];:'",.<>?/

The reason why these characters are fine and others aren't is obscure.**

** There is an international standard, ISO-646, that adapts ASCII to non-English languages. Part of its character set, the "invariant subset", is the same on all keyboards. There are also obscure problems translating ASCII to its IBM mainframe equivalent, EBCDIC. Even the 22 legal characters are too many in some circles.

RULE 5.2 The only "white-space" allowed are spacebar and line endings (carriage returns). Horizontal tabs and other "control characters" are not portable.

Actually you can use tabs for text that is not going to pass through unusual or difficult conversions. For example, if you are sharing a spreadsheet by E-mail you can probably exchange a tab-formatted file rather than using the (safer) Comma Separated Value format.

Sticklers will point out that Rule 5.2 means we allow an 87 character subset of ASCII.


<Section 3.6>  Outlining and Hierarchies

RULE 6.1 Impose a relatively rigid outline (hierarchy) on your manuscript and reflect that hierarchy in a rigid formatting scheme for the section and chapter headers.

E.g., this manual uses angle brackets *plus* two spaces *plus* a section title. Each section is preceded by two blank lines, each part by three. Sections are numbered in dotted-decimal form.

Conventions like these allow casual searching, or "navigation" of the document. Unless you have a "hypertext" document that lets you skip around easily, such guideposts are necessary.

In designing markup conventions, you should keep in mind that it is more valuable to represent *logical* structure than to try to mimic the *physical* appearance of a printed page. Thus,

  o it is wasteful to use vertical space to try to mimic vertical layout of a printed page, because the resulting effect looks disconcerting on the screen. Use the number of blank lines to represent the logical structure of the document instead.

  o use flush left headers for the top levels, indent a couple of spaces for lower levels. At the very lowest logical level, just skip an extra line between paragraphs and don't bother with a separate title for the header.

  o try "tagging" important breaks with special characters like angle brackets, a row of hyphens "----------", or a decorative break like this:

                         +     +     +

  o don't overuse all caps in titles, especially for Section breaks. They scream too loudly. You can't really mimic the print-based effect of small caps on the screen.

  o don't bother developing elaborately different formats for headers that are seen infrequently (e.g. chapters, parts, or "books"). Concentrate instead on the sections, sub-sections and minor breaks. The "wide-area" structure is more simply represented by dotted-decimal and the low level structure by visual formatting.

Remember that the global appearance of the document is much less important than it is for a book, since the user never sees the document as a whole, only small local sections. In fact, the highest level logical divisions are probably not visual at all--they are the breaks between computer *files*, or even a directory hierarchy. And--ever more commonly--hierarchies of *computers*! This leads us to the next rule.

In preparing your outline, remember the following rule:

RULE 6.2 Never nest a hierarchy or outline more than three levels deep without hiding some of the structure.

There is a great deal of structure in the computer world. Countries contain domains contain networks contain companies contain individual computers (nodes) contain directories contain subdirectories contain documents contain chapters contain sections . . . (*whew* that was ten levels). You *must* try to hide some of this structure from your reader. The easy way to do this is to narrow in on the local focus and pretend that what we're looking at right now is the *only* thing in the world.
It is impossible to read a document and simultaneously think of its place in the wide world. Forget the tree structure of the whole network or computer system; let the reader focus on the local tree-structure. And, whatever you do, don't let the reader know they are more than three levels deep.


<Section 3.7>  Text Inclusions

We have already discussed basic formatting issues like paragraphing, line length, and basic layout. This section concentrates on the myriad details that bedevil the typist. We save most of the *really* technical stuff like tables, foreign languages, and formulas, for =Section 3.8=. In this section we discuss very common inclusions in text--


<Section 3.7.1>  Alternate Fonts

RULE 7.1 Use markup to represent *logical* emphasis rather than particular font effects.

Here are some typical reasons and traditional *print* renderings:

  o emphasis (italics or underlining)
  o strong emphasis (bolding, all caps)
  o interior dialog (italics)
  o editor's emphasis (italics)
  o foreign language phrase (italics)
  o book title (italics, underline)
  o article title (quotation marks)
  o new term, index term, glossary item (italics, quotes, underline)

As you see, italics are overused and the choices are not always consistent. In order to make your meaning PERFECTLY CLEAR, it is best to observe this rule:

RULE 7.2 Prefer delimiters for marking inclusions. Use different delimiters for different purposes.

A delimiter is a character or pair of characters that is used fore and aft to set off text. It is not a bad idea to develop a set of guidelines for how to render each sort of inclusion. Here is what I use:

  o now *this* is emphasis (and *strong* emphasis)
  o as I said, markup is part of =la vie=.
  o or we can introduce a new "term" like this (See "taxonomy").
  o book titles, like _Elements of Style_, are a snap.

I also have a series of conventions I use for special situations that arise in scholarly text, such as multiple languages or included math text.
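
One payoff of giving each delimiter a single job is that a program can recover the logical markup. The following is only a sketch, in Python, under some loud assumptions: the role names in ROLES are my own labels for the conventions listed above, the function name find_inclusions is made up, and the pattern ignores hard cases such as a delimiter character used in its ordinary meaning (an equals sign in a formula, say) or an inclusion that spans a line break.

```python
import re

# Assumed roles for each delimiter, following the examples in this section.
ROLES = {
    "*": "emphasis",
    "=": "foreign phrase or reference",
    "_": "title",
    '"': "term or quotation",
}

# An opening delimiter, a run of text containing no delimiter characters,
# and then the same delimiter again as the closer.
INCLUSION = re.compile(r'([*=_"])([^*=_"\n]+)\1')


def find_inclusions(text):
    """Return a list of (role, enclosed text) pairs in order of appearance."""
    return [(ROLES[m.group(1)], m.group(2)) for m in INCLUSION.finditer(text)]


sample = ("Now *this* is emphasis, =la vie= is French, "
          "and _Elements of Style_ is a book.")
for role, words in find_inclusions(sample):
    print(f"{role}: {words}")
```

The same list of pairs could drive a printing program that renders each role in whatever font the output device happens to have, which is the point of marking logical emphasis rather than font effects.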

By the way, avoid the effect that results when you try to _mimic_print_media_by_underlining_in_this_fashion_. The result is tedious and leads to long words that don't wrap well. In E-text, a pair of underlines is just another delimiter, nothing more.

RULE 7.3 In E-text, always place punctuation *outside* delimiters. Otherwise, the E-text looks "silly." Better: "silly".

In print, you put the punctuation after a quotation on the "inside." This looks good in print but terrible on the screen. If your E-text is destined for computer screens (and automated search programs) it is better to put the punctuation on the "outside". If this disturbs you, remember that in the last century the printer's rule (as I have seen in many books,) was to put *commas* inside parentheses as well as inside quotation marks. We are allowed to change these conventions from time to time.


<Section 3.7.2>  Quotations and Included Blocks of Text

There are a number of ways to include quoted or included materials. One, favored in print, is to push the margin of included text inwards, like this. You should use this technique *very* sparingly. It requires a hard return and hand spacing for each line. Reformatting (to shorten the passage, say) is very difficult.

WYSIWYG word processors let you shift margins on a per-paragraph basis. This feature is not transportable so you can't use it for E-text.

In E-mail correspondence you often see the convention that a right angle bracket in the left column sets off correspondence. Often, this continues to the point of inanity:

>>> I said I don't like the President's new policy
>> O yeah?
> Yeah.
O yeah?
> Not only that, you're an idiott
Well so are you. And you don't spell good either.

Our ability to reconstruct the whole train of correspondence is a poor trade for legibility.

Another device to avoid is the frequent use of vertical bars alongside text to indicate changes.
Although most computer keyboards have a "vbar" (usually a shift-backslash) this character does not travel well and the visual effect is lost in some fonts or if the line length changes.

An alternative to the vertical bar is to mark changed sections with double brackets:

[[Our new improved widget has a longer lifetime and higher customer satisfaction rating.]]

More elaborate schemes for marking changes are discussed in the section on Editing and Marked Sections. In summary, we have this rule:

RULE 7.4 Avoid block quotes and text with vertical lines to represent additions or changes. Just use conventional quotation marks or a special "delimiter" like double square brackets.


<Section 3.7.3>  Lists

There are two basic kinds of list, ordered** and unordered. Unordered lists often have "bullets" in front of the items.

** Also called enumerations.

RULE 7.5 Indent list items at least two spaces and make sure list items are in separate "paragraphs", i.e. with a blank line between each item.

This prevents formatting problems that occur when the word processor decides that a list is actually a paragraph and pours it, bullets and all, into a rectangular shape.

RULE 7.6 Do not use a "hanging indent" for list items. Let subsequent lines run to the left margin.

  o This is an example of a list item that looks good in print
    but is hard to re-format in E-text.

  o This second list item is more typical of E-text. You can
reformat it without deleting lots of spaces at the beginning
of each line.

Also, as mentioned in Part II, the visual effect of straight line margins is less important in E-text. You don't gain all that much visually by going for the pretty-but-hard-to-format look.


<Section 3.7.4>  Cross References, Hypertext, and Embedding

References to other parts of the text should be set off so they can be found. Cross-references are of several sorts, all related:

  o Cross-References to other parts of the document: See Section 3.4, See "UNIX" in glossary, Page 43.

These cross-references are essentially pointers that urge you to leap over the intervening text. This is easy in print media, where you have all the pages in your hand. With a computer program you have to use the comparatively clumsy method of manipulating the keyboard or mouse to move around. With plain text, the only rational approach is to use the "search" or "find" command of the word processor to locate the passage. The art comes in guessing good "strings" (sequences of letters) to "search" for.

  o Hypertext references (outline overview, hypertext menu and references)

Many word processors allow you to "navigate" a document by traversing an outline overview. In what amounts to the same thing, "hypertext" programs often implement the natural tree-structure of a document by a series of menus representing the possible "branches" available at each "node". This is the computer equivalent of the dime-store "interactive adventure book" in which you get to choose the plot developments by making choices like "If you want to rescue the damsel go to page 43; if you want to kill the ogre, go to page 136."

  o External File References

Here the point is that we can name other files and directories--and even other computers, e.g. "rtfm.mit.edu:/pub/usenet" means subdirectory "usenet" of directory "pub" on computer node "rtfm" at M.I.T.

  o "Bibliographic Citations" of print media

The Bibliographic citation, either as a hypertext link in the text (footnote) or as a list of references (menu) is a subject of great attention in print media, with all sorts of elaborate formatting rules.

  o Embedded Figures and Included Files

Very often, word processors (and long computer programs) have a master file that looks something like this:

include <Front.Matter>
include <Chapter.1>
include <Chapter.2>
include <Chapter.3>
include <Chapter.4>
include <Chapter.5>
include <Appendix.A>
include <Index>

This master document sews together a bunch of smaller files.
In advanced programs, you may be unaware that this process, called file inclusion, or embedding, is taking place.

File inclusion is especially common as a solution to the following problem: how do you include material that is "foreign" to textual matter, say a graphic image or drawing. If you just cut and paste the text, the program will mistake it for part of running text, often with dire consequences. The solution is to keep the offending material in a separate file and have only the file reference in the text itself. Then all the word processing program has to do is

You can immediately see that all these are applications of a single idea, the idea of a "pointer", or "reference". One part of the document points to another. We are supposed to imagine--and the program is supposed to make us think--that there is a bridge from one place to another, or that the reference can be expanded so that we can enter into the other file or location and get back again. Thus, the "See Reference", hypertext link, or external file reference really amount to the same thing.

The point is that in all cases what we need is some way to represent the starting point (reference or pointer) and ending point (anchor point) of the arrow.

The World Wide Web uses Hypertext Markup Language (HTML). An HTML cross-reference looks like this:

The corresponding target, or anchor, is marked this way:

One soon tires of making up unique names to allow each cross-reference to mate with its anchor. It is more natural to use the document's natural tree structure (perhaps represented by dotted decimal) for anchor identification. Admittedly, this lends itself to dangling references like "See page 25" when page 37 is the correct page for version 2.3. Correcting these references is probably less work than typing the ungainly syntax of an HTML cross-reference.

If we are not creating a source document to connect to the World Wide Web, a simpler method is to delimit the reference with equals signs, =See Index=, and the anchor point with angle brackets. This has an added advantage if you are using equals signs to delimit italic text, since glossary entries are often rendered in italics. You can see how natural this is given the section marking scheme adopted here (See =Section 26.6.4= below).

<=Index>  This is the anchor point for the index reference made in the above paragraph. The equals sign is optional. It just serves to mark the tag <index> as an anchor point.

RULE 7.7 Delimit glossary entries, index entries, See references, and so on with equals signs. Use a consistent notation, such as angle brackets, to mark the anchor points.

The World Wide Web attempts to link documents with cross-references (hypertext links) on a global scale. The notation developed for this project is called a Uniform Resource Locator (URL) and is very similar to

protocol://node:/directory/file:port

E.g.:

ftp://ftp.ncsa.uiuc.edu:/pub/education/README:80
news://comp.sys.mac

The "protocol" part has to do with the method of getting the document (and thus implicitly with the classification scheme). The examples here are File Transfer Protocol and Usenet News, two common document retrieval systems. "ftp.ncsa.uiuc.edu" is a computer, "comp.sys.mac" is a "newsgroup". "/pub/education/README" is a file in a directory called "/pub/education"; and "80" is a "port number". These details only concern the retriever, who may be just a computer program.

The URL notation is easily adapted to other hierarchical schemes used outside the computing world, especially if the syntax rules are relaxed a bit. Here are some ideas:

For Books:

dewey://stcharles.pub.lib:270.23.07:gilson:4 (St. Charles Public Library, Dewey Decimal, Author Etienne Gilson, copy 4).
     LoC://QA.22.4:  (a Library of Congress citation)
     ISBN://123-24-55

For a Journal Article:

     journal://Time:1990.23.56-69

For the Phone System:

     voice://1.708.840.8069  (A voice number)
     fax://1.708.840.8069  (A FAX number)
     internet://jgoodwin:adcalc.fnal.gov  (E-mail address)
     postal://Box.6022:St.Charles:IL:60174  (Surface mail)

Or something like that.

RULE 7.8 Use Uniform Resource Locators (URLs) for worldwide computer file references. Campaign for their extension to other obvious (paper and telephonic) information sources.

<Section 3.7.5> Editing and Marked Sections

RULE 7.9 Indicate short deletions [and additions] with square brackets. If you need to tell them apart, add a plus or minus sign in front. Indicate the version of the change by a version number (single number or dotted decimal) after the sign.

     This regulation shall apply to each +1[and every] taxpayer -2[, except members of this legislature].

We can thus reconstruct the history of this text:

     Version 0: This regulation shall apply to each taxpayer, except members of this legislature.

     Revision 1: This regulation shall apply to each and every taxpayer, except members of this legislature.

     Revision 2: This regulation shall apply to each and every taxpayer.

This principle can be extended to whole sections of text, except that it is better to use double square brackets, since the text itself may contain "innocent" brackets. -2.3[[ . . . ]] means that this section is omitted in Version 2.3.

This notation soon becomes wearisome after multiple and intricate revisions. Jim Warren has devised a visual format that collates multiple versions in tabular or outline form:

     012 This regulation shall apply to each and every taxpayer , except members of this legislature .

RULE 7.10 For complicated additions and deletions, such as those found in legal matter, use Warren format.

Here are three examples of the formats we have been discussing:

[[include example here]]

One final rule:

RULE 7.11 Don't put spaces between the dots of an ellipsis.
Instead, leave one blank space before and after: ( ... ). Word processors do not necessarily recognize ellipses as a single "thing". The gracious effect of spacing created by a typewriter seems lost on a computer screen.

<Section 3.7.6> General Style and Conventions

This section is about rules that are conventional to almost all typing. A brief list is included here for completeness:

RULE 7.12 Add two spaces after each major break (period, question mark, colon, etc.) and one space after minor pauses (comma, semicolon).

An exception is made for periods that are part of an abbreviation or initials of a name, where the rule is:

RULE 7.13 Allow one space after each initial in a name but not between initials of an abbreviation: J. E. Goodwin, St. Charles, Ill., U.S.A.

RULE 7.14 Represent a dash with two hyphens and do not allow spaces on either side of the dash--like this.

RULE 7.15 Certain Latin abbreviations do not have internal spaces, nor are they in italics: i.e., e.g., etc.

<Section 3.8> Esoterica

99% of all ordinary E-text written in English does not need this Section. But the issues discussed here greatly affect certain kinds of text:

1. Texts requiring traditional scholarly adjuncts such as citations, cross-references, indexing, bibliographies, glossaries, critical apparatus, and figures;

2. Scientific and mathematical texts that use formulas extensively;

3. Statistical text with frequent use of numbers, uncertainties (plus or minus), scientific notation, and tabular material. Such text occurs commonly in the physical and social sciences, e.g. reports of experiments;

4. Texts in one language that discuss another (language textbooks, grammars, dictionaries, commentaries, many works in the humanities).

<Section 3.8.1> Inclusions in Languages Other than English

In English, where diacritical marks are rare, foreign languages are. It is important to distinguish between =transcription= and =transliteration=.
In transcription, an attempt is made to render the word as nearly as possible using the English alphabet, with or without diacritics. Precision is not the main concern. Transliteration is an attempt to represent the *spelling* of the word in the non-English alphabet. Great effort is made, in designing the transliteration system, to make the transliteration reversible, so that the exact original text can be recovered by a knowledgeable human or program.

These two possible approaches to including non-English text lead to two different rules, depending on intent:

RULE 8.1 Set off foreign phrases with the same delimiters used in place of italics (usually equals signs).

RULE 8.2 Use special delimiters (for example plus signs or asterisks) to signal special notations used for *transliteration*.

No attempt is made to distinguish three uses of "equals-italics"--foreign language italics, cross-reference signal, and miscellaneous italics. As in print, these can usually be distinguished by context.

Beyond representing foreign phrases exactly, one might want an informal notation for representing the diacritic marks that do occasionally occur in English. Using these is probably pedantic in ordinary E-text, but from time to time they may be useful, e.g. in grammatical discussions:

RULE 8.3 In ordinary English texts it is not usual to use diacritical marks, even when the English word technically has them, such as: fac?ade, ro=le, coo%rdinate, blesse!d. If absolutely necessary, we recommend:

     acute accent: ne/e
     grave accent: blesse!d
     circumflex accent, tilde, or macron: ro=le, nolo= contendere
     diaeresis or umlaut: coo%rdinate
     cedilla: fac?ade

The choice of symbols is based on portability (which excludes, for example, a tilde or circumflex). Also, the notation is just a little ugly to discourage its overuse.

E-texts that discuss foreign languages present special problems. Here are some suggestions:

1.
The basic convention is that the primary language is unmarked, and the secondary language is delimited by asterisks, *E pluribus unum*, or by equals signs, =E pluribus unum=.

The choice of delimiter requires some thought. In Latin, asterisks should be used so that equals signs can be used to represent macrons: Ve=ni=, vi=di=, vi=ci=. Unless there are considerations like these, the asterisk is chosen for the most frequent use in the text (usually italics-for-emphasis) because it is less obtrusive and most conventional.

Since such texts do not usually contain quotations, double quotes may be used to represent translations or definitions: =E pluribus unum= means "from many, one." In printing, both the foreign text and the translations are often rendered in a different style. If italics are needed for other purposes, they should be delimited by asterisks: =E pluribus unum= is *so* Eighteenth Century.

2. If the text contains a selection of many different languages, special delimiters are used to segregate languages that use the Latin alphabet from others. In this case no effort is made to choose one secondary language as "the" secondary language. Instead, the delimiters are used to mark alphabets that differ visually from the Latin alphabet.

     = =   Languages using the Latin Alphabet, other than the primary language (in effect "language italics").
     * *   Greek
     + +   Hebrew
     / /   International Phonetic Alphabet

Other delimiters can be constructed =ad hoc=, such as &&[ ... ]&& or +/ ... /, (* ... *) and so on.

Just a reminder: the recommendations here are strictly for informal use in the context of "flat" ASCII files, e.g. for casual communication, or as character-oriented output from a program that uses a proprietary format or SGML for internal use. Any substantial work with multiple languages is probably worth the effort to use something other than E-text for the *underlying* representation. In particular, scholars should consider the Text Encoding Initiative's recommendations.
Even with an elaborate underlying markup system, however, the problem remains of how to render the foreign language text, perhaps a text that does not even use the Latin alphabet, on a character-oriented screen.

<Section 3.8.2> Footnotes, Cross-References, and Bibliographic Citations

There are two issues here: how to write the citation and where to put it. As to the first issue, citation schemes that work well in print are often cumbersome in E-text. The answer to the second issue is:

RULE 8.4 Place footnotes at the foot of the paragraph, or else gather them in an appendix at the end of the work.

Another common place to put notes, at the end of a chapter, should be avoided, since it is a relatively hard place to find compared to the end of the file.

The inclusion of footnotes in the body of the text with special delimiters, as is done by many word processors, is a concession to print-oriented production of text. It places the footnote where the *program* wants it. From the standpoint of the reader, there may as well not be a footnote at all!

RULE 8.5 The footnote mark should be as unobtrusive and short as possible: usually ** or ++, [34], or [Wells85].

     . . . as discussed in the paper by Wells.[Wells85] Another . . .

     . . . again makes this point in Ref.[36], where the bias . . .

     . . . See the Nicomachean Ethics+[NE,1150a]. . . .

Footnotes with a single asterisk could be confused with an "emphasis" delimiter. Putting asterisks in brackets, [*], seems long-winded.

RULE 8.6 Footnote sequencing should not continue across physical files. Use dotted decimal notation to refer to "long-range" footnotes: [2.15] means footnote 15 in chapter 2.

Designing a good bibliographic citation scheme for E-text means breaking away from print models. Long dashes and hanging indents are useless in E-text. Also, most readers, if they read notes at all, will synchronize two windows so that notes can be read in one and the text in another.
*Therefore* it is better to make your annotated bibliography follow chapter organization than to make it alphabetical or chronological.

In general, it is a good idea to gather bibliographic references in one place and *not* put them in footnotes, as is common in print. This is because many of the citations will be URLs (see =Section 3.7.4=), which mar the appearance of the text.**

** This assumes the E-text is not being prepared for linkage to the World Wide Web! In this context, our discussion applies more to the output of a WWW server than to its input.

<Section 3.8.3> Formulas and Statistical Text

There is a great deal of scope for developing new mathematical notations that work well with E-text. I can only make a few recommendations and observations here.

RULE 8.7 Use square brackets to set off "math italics", especially variable names embedded in ordinary text. Omit the brackets for displayed equations.

This rule is necessary to make variables stand out. Human eyes that are used to picking out subtle font differences find it hard to read text that refers to variables like a where a is the unknown. To repeat, [a], where [a] is the unknown.

RULE 8.8 Separate displayed material by one blank line before and after, and indent consistently (five spaces recommended).

Here is a well-known example:

     E = m c[2]

     E[2] = p[2]c[2] + m[2]c[4]

where [E] is the total energy, [m] is the rest mass, and [c] is the speed of light in a vacuum.

Scientific notation is a travesty in type. One commonly sees such attempts as 1e12, 2.005+/-.01, or 2 x 10 5. We recommend quoting numbers in the following fashion: 1.0E+12, 2.005(10), and 2.E+5. To my eye, at least, the following rules are useful:

RULE 8.9A Always use a sign after the "E" in exponential notation.

RULE 8.9B Always express the decimal in floating point numbers and precede a decimal point by a zero, i.e. 0.05, not .05.

RULE 8.9C Represent symmetric tolerances in parentheses after the base number.
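Rules 8.9A to 8.9C are easy to apply by program. The Python sketch below is one possible reading of them, not a standard routine. The function names e_notation and with_tolerance are made up for illustration. It also assumes that the tolerance in parentheses is counted in units of the last quoted digit, so that 2.005 plus or minus 0.01 becomes 2.005(10).

```python
def e_notation(value, decimals=1):
    """Format `value` with a signed exponent and no padding zeros, e.g. 1.0E+12."""
    mantissa, exponent = f"{value:.{decimals}E}".split("E")
    if "." not in mantissa:            # RULE 8.9B: always express the decimal
        mantissa += "."
    sign = exponent[0]                 # RULE 8.9A: Python always emits + or -
    digits = exponent[1:].lstrip("0") or "0"
    return f"{mantissa}E{sign}{digits}"


def with_tolerance(value, tolerance, decimals):
    """Format value +/- tolerance as value(tol); tol is in units of the last digit."""
    last_digit_units = round(tolerance * 10 ** decimals)
    return f"{value:.{decimals}f}({last_digit_units})"   # RULE 8.9C


print(e_notation(1e12))                 # 1.0E+12
print(e_notation(2e5, decimals=0))      # 2.E+5
print(with_tolerance(2.005, 0.01, 3))   # 2.005(10)
```

Fixed-point formatting in Python already supplies the leading zero that RULE 8.9B asks for (0.05, not .05), so no extra step is needed there.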
A little care here is considerate of the reader and helpful for subsequent typesetting.

RULE 8.10 In running text, superscripts and subscripts could be represented the same way as footnotes in the main guidelines, viz. 2+[20] = 4+[10], although the FORTRAN notation 2**20 = 4**10 is more perspicuous.

RULE 8.11 Subscripts and superscripts that do not represent powers but represent labels are conveniently handled like array subscripts: a(1,3) = b(2,4) instead of a+[1]-[3] = b+[2]-[4]. The array indices might use square brackets instead of parentheses.

RULE 8.12 For the mixed case of subscripts for labeling and superscripts for powers, we recommend: a1[2] = a2[2] or a1**2 = a2**2 or a(1)[2] = a(2)[2].

The first approach is better suited for long formulas with many powers:

     (x+y)[3] := x[3]+3x[2]y+3xy[2]+y[3]

     (x+y)**3 := x**3 + 3*x**2*y + 3*x*y**2 + y**3.

RULE 8.13 Complex expressions like summations and integrals can be handled informally as follows: (1/n)*sum(i=0,n; x(i)[2]) or int(x=0,infty;x[-2]).

RULE 8.14 Matrices, tables, and outlines are handled in a consistent fashion.

        7,  18,  19
      -43,  72,  930.1
     -1.1,  18,  100

Whereas in print vectors and matrices are represented by boldface letters, in E-text it is probably best to adopt Paul Dirac's bra-ket** notation, first developed for Quantum Mechanics. Here, the vector "v" is represented as [v>. This notation is well-developed and *can* be typed in E-text.

** The name comes from the following construction: <bra] c [ket>. The vector is called a "ket", the dual vector a "bra", and [c] is the operator matrix.

<Section 3.8.4> Verse, Drama, and Liturgy

RULE 8.15 Each line is a separate paragraph. There should be two hard returns between lines and three between stanzas. Alternatively, two returns may mark stanzas, with lines beyond the first indented by white space (one space recommended). Three returns can mark longer segments. Only one of these two methods should be used in any one work.
RULE 8.16 Do not try to mimic vertical or horizontal spacing of a printed source, unless the visual effect of the poem is the main concern.

RULE 8.17 Run-on lines (say past 80 characters) can be represented by a slash (/) at the beginning of the line.

RULE 8.18 An asterisk, *, is used to mark a caesura, pause, or breathing mark. This should be preceded and followed by a space (or return) to prevent its confusion with a footnote or emphasis delimiter.

RULE 8.19 Use asterisks to delimit stage directions or rubrics.

RULE 8.20 Use special delimiters to mark speakers, roles, or questions and answers. Follow these with two spaces.

This helps the reader skip from part to part. Ampersands and periods make unobtrusive delimiters. Brackets are visually more striking:

     &Ham.  To be or not to be.
     &Pol.  That is the question.

*or this*

     &V.  The Lord be with you.
     &R.  And with thy spirit.

*or this*

     &Q.1.5  What is LINUX?
     &A.  LINUX is a small, free UNIX-like operating system for 386 computers.

<Section 3.9> Electronic Forms and Tests

E-text is often used as a medium for distributing forms, tests, and other items to be filled out and returned. Often, these forms mimic paper counterparts at the expense of their purpose--to be easy to fill out and return. Here are some rules:

RULE 9.1 Avoid the multiple column format common on paper forms.

As soon as you start to fill out the form, the columns don't line up.

RULE 9.2 Skip a line between questions.

This avoids the dread re-formatting problem.

RULE 9.3 Place a left open bracket wherever an answer is required, but not a right closing one at the end.

In order to fill in a checkbox, you have to position the cursor exactly in the middle of the box, delete a character, and type an "x". It is easier to position a cursor at the end of the line and start typing right away.

RULE 9.4 Avoid checkboxes. Ask for a one-character typed answer instead.
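A form laid out according to RULE 9.3 is also easy for a program to read when it comes back. Here is a minimal sketch in Python, assuming one question per line with the answer typed after the first "[" on that line. The function name read_answers and the subscriber question are hypothetical, used only for illustration.

```python
def read_answers(form_text):
    """Map each question to the answer typed after its open bracket.

    Assumes one question per line and no closing bracket (RULE 9.3).
    Lines without "[" (blank lines, length cues) are skipped.
    """
    answers = {}
    for line in form_text.splitlines():
        if "[" not in line:
            continue
        question, answer = line.split("[", 1)
        answers[question.strip().rstrip(":")] = answer.strip()
    return answers


# A hypothetical returned form with two one-line answers.
returned_form = (
    "Your state or province: [IL\n"
    "\n"
    "Are you a subscriber (y/n): [y\n"
)
print(read_answers(returned_form))
# {'Your state or province': 'IL', 'Are you a subscriber (y/n)': 'y'}
```

Because there is no closing bracket to look for, everything the responder types after the open bracket is taken as the answer, however long it runs.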
RULE 9.5 Leave four hard returns (three blank lines) between "short answer" questions.

The responder begins typing at the beginning of the second blank line.

RULE 9.6 Do not use spaces or underscores to show blanks; use periods or hyphens instead. Put them on the line *below* the response area (so the responder doesn't have to erase them and lose count!).

     Your state or province: [
                              --
     Your zip or postal code: [
                               -----

This cues the responder as to the desired length of the response. Blanks are invisible, except in certain word processors, and underscores are often run together, so you can't count them easily. This sort of form is easy to fill out:

     Your city of residence (20 characters max): [Chicago, Illinois
                                                  --------------------.

<Section 3.10> The E-Mail Business Letter

The paramount rule in writing an effective E-mail business letter is brevity.

RULE 10.1 In general, you should omit as much of the traditional apparatus of the business letter as you can, since the mailing system may well add lots of unwanted detail.

An effective letter can be as short as:

     From: jegoodwin
     To: anotheruser
     Subject: E-mail
     <blank line>
     This is what I have to say. =John=

RULE 10.2 Always begin E-mail with a single blank line.

This is to allow some visual separation from the mail header.

RULE 10.3 For short (one paragraph) messages, use only the paragraph and your name, in-line with the last sentence.

Since brevity is the rule, anything beyond a one-paragraph note should be carefully trimmed. The model below is about the *maximum* you can do and still have a brief, effective letter. Feel free to omit anything unnecessary. At most, an E-mail letter will have the following parts:

1. Mail Header

Do not add a letterhead** or mailing address. The mail system will add enough garbage as it is. Your info goes at the end of the letter.

** An exception is in resumes and advertisements, where catching the reader's attention is of paramount importance.
There, lots of whitespace and visually arresting designs are welcome. The effect wears off quickly, however, so think twice before adding eye-catching effects to all your E-Mail.

2. Greeting

This is optional. "<skip one line>Dear Sir or Madam<Skip another line>" (if you don't know the sex of the person you are writing--very frequently the case, with E-mail), or "Dear George" or simply "George--".

3. Body

This follows the principles in the rest of this manual. Remember: flush left.

4. Closing and Signature

The closing is optional. "<skip line>Your Name<skip line>" is fine. If you want a closing, don't indent it a half page, as is customary in print. Suggested formal closings are "Sincerely", "[Best] Regards", and "Thanks". I generally avoid "Thanks in advance", since it implies that either you aren't thankful if the person doesn't respond (which is ungracious); or you don't plan to thank them if they do (which is churlish).

You may use special delimiters to mark your signature, but keep these light and tasteful. I sign =John Goodwin=. Other persons use two slashes before their name or add a plus (for clergy), etc., etc. This is more distinctive than a signature file.

5. Contact information

Since the reader is most likely to contact you just after reading the letter, put info here.

RULE 10.4 Keep contact information short, probably only your E-mail address and phone number (two of each, at most).

RULE 10.5 Use the international style for phone numbers: e.g., +1 708 840 8069 (work).

Note: "+1" is the Country Code for the U.S.A.

RULE 10.6 Never, NEVER, include a character-drawing or funny quote in a signature file.

     ////
     [oo] <-- This is me!!!    "Remember O Man that Dust Thou Art"
     ----

Many persons use a "dot-signature" file that is automatically appended to all their E-mail. The effect is almost invariably puerile and tasteless. If you include it twice you can add "incompetent" to the list.
Here is how it looks all together:

     To: blah blah
     From: blah blah
     Subj: blah blah
                                <--blank line required
     Dear Sir or Madam:         <-- or Dear George, or Dear Ms. Smith

     [Body of text]

     [Body of text]

     [Last paragraph]

     Sincerely,                 <--optional close

     =Your name=                <--use signature delimiters for visual effect
     [Your Contact information]

<Section 3.11> The Final Rule

And lest the reader forget,

RULE B. All Rules are Made to Be Broken.

Rules summarize experience and judgment. In this manual I have tried to reflect my own judgment as to what is appropriate, functional, and aesthetically pleasing. I have not always succeeded. If I have spurred the reader to consider their own style and refine it for their own purposes, I will have achieved my end in writing this manual.

Above all, remember, dear reader,

Question Authority. It's wrong.

+ + +

<Appendix A> Technical Details: Relationship to SGML and TEI

Many of the concerns addressed in this manual are common to participants in the Text Encoding Initiative (TEI) and other users of the ISO standard, Standard Generalized Markup Language (SGML, see =Section 2.9=). I would like to emphasize, for their benefit, that this manual describes a *presentation format* and not an encoding format.

It is perfectly possible to create an SGML- or TEI-compliant file that uses the format discussed in this manual as a visual output format. There are very distinct advantages to having a visually appealing, informal, character-oriented format, like the one advocated here, in which the logical structure (i.e. markup) is still present, but not visually intrusive.

SGML-compliant systems may well produce such a flat file at the request of a user, or the screen output may be cut from the program's display window and pasted into such a file. This style manual has tried to describe design principles that will make the resulting flat file useful and appealing to read.
Naturally, there are many uses for such a format outside SGML systems as well; and a certain uniformity, or at least attention to design principles, can only help make the texts created more useful.

The advantages of SGML or TEI encoding will only come about if word processors that hide the markup process from the casual user become commonplace and interoperable. Probably, a low-end freeware editing system will have to be created.** Until that time, welcome or not, flat ASCII is not only a visual format, but an interim interchange standard as well.

** Such a system is being created for the LINUX operating system.

Once again: this is not a new encoding or input format, nor is it primarily intended as an interchange standard; it is a suggested format for visual *output* that happens to be maximally transportable at the present moment.

+ + +

<Table I> Table of Contents

=Part I= Writing for an E-text Audience

     =Section 1.1= Why Write for an E-text Audience?
     =Section 1.2= Is it Possible to Write E-Text and Print at the Same Time?
     =Section 1.3= Differences between E-Text and Print Media
     =Section 1.4= Version Control

=Part II= Specific Differences of Style and Mechanics

     =Section 2.1= Differences Traceable to Physical Media
     =Section 2.2= Differences in Style
     =Section 2.3= Differences in Process
     =Section 2.4= Differences in Repertoire
     =Section 2.5= Differences in Layout
     =Section 2.6= Searching and Hypertext
     =Section 2.7= Copyright Issues
     =Section 2.8= The Parts of a Book
     =Section 2.9= The General Theory of Markup (SGML)
     =Section 2.10= Summary: Basic Tricks of the Trade

=Part III= A Very Brief E-Text Style Manual

     =Section 3.1= Backups and Saving Work
     =Section 3.2= Compressed Files
     =Section 3.3= Version Control
     =Section 3.4= Use of Word Processing Features
     =Section 3.5= Character Set and Font
     =Section 3.6= Outlining and Hierarchies
     =Section 3.7= Text Inclusions
          =Section 3.7.1= Alternate Fonts
          =Section 3.7.2= Quotations and Included Blocks of Text
          =Section 3.7.3= Lists
          =Section 3.7.4= Cross-References, Hypertext, and Embedding
          =Section 3.7.5= Editing and Marked Sections
          =Section 3.7.6= General Style and Conventions
     =Section 3.8= Esoterica
          =Section 3.8.1= Inclusions in Languages Other than English
          =Section 3.8.2= Footnotes, Cross-References, and Bibliographic Citations
          =Section 3.8.3= Formulas and Statistical Text
          =Section 3.8.4= Verse, Drama, and Liturgy
     =Section 3.9= Electronic Forms and Tests
     =Section 3.10= The E-Mail Business Letter
     =Section 3.11= The Final Rule

+ + +

(end of _Elements of E-Text Style_)