proxy70

14-Mar-88 03:18:32-EST,1904;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:A-PIRARD@BLIULG11.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Mon 14 Mar 88 03:18:18-EST
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Mon, 14 Mar 88 03:17:59 EST
Received: from VM1.ULG.AC.BE by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 7381; Mon, 14 Mar 88 03:17:58 EST
Received: by BLIULG11 (Mailer X1.25) id 7697; Mon, 14 Mar 88 09:15:55 +0100
Date:         Mon, 14 Mar 1988 08:45:21 +0100
From:         Andre PIRARD <A-PIRARD%BLIULG11.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      ASCII, ISO and which EBCDIC?
To:           Info-IBMPC Digest c/o Gregory Hicks COMFLEACTS
 <HICKS@WALKER-EMH.ARPA>,
              IBM-KERMIT@CU20B.COLUMBIA.EDU,
              Protocol Converter list <IBM7171@DEARN>,
              LINKFAIL@FRULM11,
              Columbia University Center for Computing Activities
 <INFO-KERMIT@CU20B.COLUMBIA.EDU>

We, ASCII or EBCDIC network users, must pay particular attention to character
code standards, now extending to international ones. Even sites not interested
in international characters will sooner or later hit the problem because,
although the situation is well defined in the ASCII world with an ISO standard,
it is far from that for EBCDIC users, who are faced with a choice of several
codes whose differences lie in a few code positions, strangely enough not the
international ones.

The subject is discussed on a mailing list set up by Edwin Hart. Joining it with:

  TELL LISTSERV AT JHUVM SUB ISO8859 user-name

Or sending a note on BITNET to: LISTSERV AT JHUVM
Containing:                     SUB ISO8859 user-name

can help the community agree on a viable single code or at least help each
site in finding its most appropriate one and save everybody's time and money.

I'll soon post a summary of the problem to that list.

Please forward this note to anybody interested.
22-Mar-88 13:31:54-EST,21373;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Tue 22 Mar 88 13:31:43-EST
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Tue, 22 Mar 88 13:32:04 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 0746; Tue, 22 Mar 88 13:32:01 EDT
Received: by BITNIC (Mailer X1.24) id 0743; Tue, 22 Mar 88 13:21:56 EDT
Date:         Tue, 15 Mar 88 11:17:07 EST
Reply-To:     Discussion list for ASCII/EBCDIC character set related issues
              <ISO8859@JHUVM>
Sender:       Discussion list for ASCII/EBCDIC character set related issues
              <ISO8859@JHUVM>
From:         Edwin Hart <HART%APLVM.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      Some Important Comments from Howard Gilbert at Yale University
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>

Enclosed are some comments Howard Gilbert wrote after the SHARE 69 meeting held
in August, 1987.  It is a good summary.  Unless otherwise stated, "EBCDIC"
means "U.S./Canada English EBCDIC".

IBM is very interested in the issues surrounding ASCII and EBCDIC character
sets, particularly as they relate to national character set issues (see the
SHARE European Association (SEAS) "White Paper on national character,
language and keyboard problems", September, 1985) and the Systems Application
Architecture.

The Michigan Terminal System (MTS) community has implemented an ISO 8859-1 to
EBCDIC Code Page 37, Version 1 conversion already.  (Brian Eliot

IBM also is trying to decide between two code pages to use as a single
EBCDIC for ISO Latin Alphabet number 1:  Code Page 37, version 1
(which is data processing oriented) and Code Page 500, version 1 (which is
word processing oriented).

Ed Hart


Date:         Thu, 03 Sep 87 11:45:39 EST
From:         Howard Gilbert <GILBERT@YALEVM>
To:           HART@APLVM

I think that what we learned at SHARE is important to a lot of people.
Besides our local users, it should be directed to:

    BITNET technical reps (for ASCII BITNET nodes)
    NOTIS groups (library automation)
    7171  protocol converter list

Here is a first draft of the kind of thing I am thinking of sending out:


    I attended the meetings of the ASCII-EBCDIC Translate Table
committee at SHARE 69 (Chicago) Aug. 23-28.  As someone who has
struggled with this question for 15 years, I was stunned by the
amount of progress which has been made almost overnight.  There
is no solution which will be satisfactory to everyone, but a
solution is now visible and requires only the willingness to
adopt it.  Because of the very problem under discussion, I will
discuss all characters by name or code value and not attempt to
print them.  I will try to make the presentation as short as
possible while making it complete.

    HISTORY: ASCII was standardized after EBCDIC.  With no
national standard in place, and requirements for BCD
compatibility, IBM extended its 6-bit character set BCD to an
8-bit code by moving bits rather than translating with a table.
Both standards made what are, in hindsight, some regrettable
mistakes.  The ASCII committee placed too much emphasis on the
TTY 33 which had no lowercase letters and lacked braces and
vertical bar.  The choice of EBCDIC printable characters was
influenced by the number of characters which could be placed on a
Selectric typewriter ball.

    The ANS committee (not IBM) recommended that ASCII
exclamation point be regarded as EBCDIC vertical bar and that
ASCII circumflex be treated as EBCDIC not sign.  These were "stylistic
differences" in the 1968 standard.  EBCDIC has a cent
sign which ASCII did not match, and ASCII has brackets, braces,
backslash, accent, and tilde which EBCDIC did not originally
position in its tables.  There were some IBM print bands (notably
TN and ALA) which included some other characters, but they do not
constitute a standard nor are they the basis for one.

    Both ASCII and EBCDIC have "national use" characters which
can be replaced in other countries by local graphics.  National
standards bodies in most European countries have chosen specific
graphics for the ASCII positions, and IBM has copied these
choices to the EBCDIC positions.  Technically "ASCII" refers to
the USA version, but everyone uses the term to refer to all the
ISO 646 standard character sets which are similar except for national
use positions.  IBM does not change its compilers to match.  Therefore, a C
program will print rather strangely in Germany where the German
standard replaces backslash with O-umlaut.  Of course, the
programmers in most countries continue to order terminals and
printers with the US character set or to make that set available
as a special printer setup.  It is interesting to note that
PASCAL was, after all, originally developed in Switzerland where
the official national languages are French, German, and Italian.

    At the time that Yale ASCII was shipped, IBM had no strong
definition as to the position of backslash, braces, and brackets
in the EBCDIC set.  Some mistakes were made (in hindsight) which
were then perpetuated in the 7171.  However, the translate tables
are installation configuration options.

    One approach to extending the character set is to form
diacritically marked characters using true overstrikes.  In other
words, there is a key marked with an umlaut.  Press the key and
an umlaut appears but the cursor does not advance.  Press "O" and
what displays is O-umlaut and the cursor advances.  This is the
model used by the CCITT for European telex and by MARC and
library systems (such as the ALA character feature on the
IBM 316x). It is also effectively used by the NOTIS and DOBIS
library systems.  It has never been accepted by data processing
equipment manufacturers, who have instead pressed for an entirely
different form of character set extension based on one character
per code position (i.e., each accented character occupies one
code position).

    TECHNOLOGY TO THE RESCUE.  There would probably never have
been a solution to this problem without the elimination of some
constraints.  It may be that there are many devices today which
will not be able to use the solution, but this is a long term
problem which will be solved over years as old devices disappear.
The important change is that microprocessor technology in
terminals and non-impact page printers makes it possible to
extend the generally available number of characters from the old
96 to 194.

    ISO (the International Organization for Standardization) has
adopted ISO 8859/1, and ANSI (the US standards organization) is
adopting the same standard under the title ANSI X3.134.2.  It
provides a specific standard set of graphics for 8-bit ASCII code
points X'A0' to X'FF'.  Of particular significance is the
assignment of the cent sign to X'A2' and the EBCDIC not sign to
X'AC'.  So suddenly ASCII has all of the true characters typically
found on an IBM terminal.  Of
course, IBM always had an 8 bit character set with many unused
positions.  However, there are only so many keys on the keyboard
or positions on the print band.  The PC and 3800 allow all of the
possible positions to be filled in.  Since IBM does pay careful
attention to standards, the ISO development made it possible for
them to create an internal standard for EBCDIC placement of the
same 194 graphic symbols in the range of X'41' to X'FE'.  IBM
started with the 38xx page printer USA DP code assignments (see
"Code Page T1GDP037" in IBM 3800 Printing Subsystem Model 3 Font
Catalog SH35-0053 which I assume everyone has in his library) and
then adjusted it with:

    middle dot at X'B4'
    copyright at X'B5'
    times/multiply at X'BF'
    special hyphen at X'CA'
    superscript 1  at X'DA' (replaces Turkish dotless small i)
    divide at X'E1'

This will be referred to as the "code page 37" table.  In addition,
IBM adjusted each country's EBCDIC to include the extra characters
by filling the empty slots in its table; these are the Country
Extended Code Pages (CECPs).  Note that while all the characters
exist in every CECP, their code positions vary from country to
country.

    The result is an implied 1-1 correspondence of 194 ASCII and
US EBCDIC printable characters which in turn implies a translate
table.  All of this is possible because the technology has moved
to microprocessors on most terminals and printers and expanded
memory allows extended character sets.
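A minimal sketch of what such a translate table looks like in practice, written in Python and assuming Python's built-in "cp037" codec as a stand-in for the code page 37 table.  That codec follows a later published revision of code page 37, so a few positions may not match the 1987 table given at the end of this note; the names ISO_TO_EBCDIC and iso_to_ebcdic are illustrative only.  The table is a 256-byte string applied in a single pass, in the spirit of the System/370 TR (translate) instruction.

```python
# Sketch only: Python's "cp037" codec stands in for the code page 37 table.
ALL_BYTES = bytes(range(256))

# ISO 8859-1 byte n is Unicode code point n, so decoding as latin-1 and
# re-encoding as cp037 yields the EBCDIC code for every ISO code.
ISO_TO_EBCDIC = bytes.maketrans(
    ALL_BYTES, ALL_BYTES.decode("latin-1").encode("cp037"))
EBCDIC_TO_ISO = bytes.maketrans(
    ALL_BYTES, ALL_BYTES.decode("cp037").encode("latin-1"))


def iso_to_ebcdic(data: bytes) -> bytes:
    """Translate ISO 8859-1 bytes to EBCDIC with a 256-entry table."""
    return data.translate(ISO_TO_EBCDIC)


def ebcdic_to_iso(data: bytes) -> bytes:
    """Translate EBCDIC bytes back to ISO 8859-1."""
    return data.translate(EBCDIC_TO_ISO)


if __name__ == "__main__":
    sample = "Price: 5\xa2, flag \xac set".encode("latin-1")
    ebcdic = iso_to_ebcdic(sample)
    print(ebcdic.hex(" "))
    # A 1-1 table means the round trip loses nothing.
    assert ebcdic_to_iso(ebcdic) == sample
```

Because each of the 256 byte values appears exactly once in the table, the reverse table can be derived mechanically; that is the practical payoff of a 1-1 correspondence.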

    MOST PEOPLE DO NOT UNDERSTAND WHAT THIS REALLY MEANS.  A code
set assigns a graphic representation to a byte value.  The
"graphic representation" is most important when a file is printed
or displayed on a terminal.  The value X'4E' can be stored in
binary in a FORTRAN program and can be copied from one variable
to another without anyone caring what it means.  Only when it is
displayed do we determine if it should be "N" (ASCII) or "+"
(EBCDIC).
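The point can be shown in a few lines of Python, assuming its built-in "latin-1" and "cp037" codecs as stand-ins for ASCII and U.S. EBCDIC: the stored value is the same, and only the chosen code set decides how it displays.

```python
# One stored byte, two possible graphic representations.
value = bytes([0x4E])

as_ascii = value.decode("latin-1")   # read as ASCII / ISO 8859-1
as_ebcdic = value.decode("cp037")    # read as U.S. EBCDIC (code page 37)

print(as_ascii, as_ebcdic)  # N +

# Copying the value does not require knowing which reading is intended.
copied = bytearray(value)
assert bytes(copied) == value
```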

    Now compilers do care about the difference between "N" and
"+".  Curiously enough, most ANSI language standards do not
specify code points.  FORTRAN, COBOL, PASCAL, PL/I, and C all
specify that "+" means addition.  But none of the standards
requires that plus have any particular binary value.  IBM
mainframes use EBCDIC "+" and most other computers use ASCII
"+", but some systems place it at another location.  VSAPL, for
example, has an internal code set called "Z code" which
rearranges characters for easier interpretation.  Even then, the
code which displays as plus still means addition.

    The problem is that graphic representation is a matter of
taste.  A certain amount of flexibility has to be left in to
allow for italics, to let zero optionally have a bar through it
or not, and to let the Europeans put a bar through their Z
(pronounced "zed" over there).  The standards allow "stylistic
differences".  In its most extreme case, however, ASCII
exclamation point was regarded as a stylistic representation of
EBCDIC vertical bar! (or should I say |) in ANSI X3.4-1966.

    These stylistic differences start to become a problem when
they affect the selection of codes accepted by compilers, command
processors, and other non-printer system components.  They have
then been allowed to gum up the translation between code sets.

    The naive user says that the standard EBCDIC code for "A" is
X'C1'.  A more correct statement is that the standard graphic
representation for X'C1' is "A".  Other graphic representations
exist for the code (look at the 3800 fonts manual for symbol
fonts and consider Japanese and other languages).  The point,
however, is that most of the compilers, editors, and systems
regard X'C1' as a letter.  Actually no compiler cares what the
human thinks that the letter is.  All right, there is a funny thing
in FORTRAN that names beginning with I-N are integers by default, but
by and large the 26 letters of the alphabet are interchangeable in
forming names.
Thus in some other country these letters could be replaced by the
local alphabet as long as letters remain letters and punctuation
remains punctuation.

    WARNINGS: So suddenly we have an "official" translate table
between U.S. EBCDIC and ASCII.  To do it, we had to go to a larger
character set on both sides.  In doing so we pick up the most
important foreign language characters (as determined by ISO, not
IBM).  This can be supported on all laser printers, PCs, and
character loadable devices.  It may not work on older printers
and terminals.  DEC, for example, supports a subset of the ISO
standard as its extended character set on its terminals.
However, it is generally possible today to load fonts into all
but the very ancient equipment.

    An IBM standard is an internal document.  Its existence will
force subsequent product developers to justify deviation from the
standard, but will not prevent such deviations when a business
case exists.  Put another way, if IBM feels that there is still a
market for band printers, technology will prevent the creation of
a 194 character set for such a device.  Given a smaller character
set, IBM may have to deviate from the larger standard.  However,
when an organization has to make code translations, this new
standard becomes the obvious starting point.

    There is no evidence that the compilers and other
applications will be ready to deal with these EBCDIC assignments.
In particular, for C and PASCAL which are defined for ASCII, the
compilers must support:

    circumflex at X'B0'
    left bracket at X'BA'
    right bracket at X'BB'

The compilers and other applications must recognize dual EBCDIC codes
for some characters.  Specific examples are:

   C:      circumflex and not for "negation"
           vertical bar and split bar for "or"
           brackets at BA/BB and AD/BD code points
           braces at 8B/9B and C0/D0 code points
           "*" and new "x" for multiplication
           "/" and new divide for division

   PASCAL: circumflex and not and tilde for "negation"
           vertical bar and split bar for "or"
           brackets at BA/BB and AD/BD code points
           braces at 8B/9B and C0/D0 code points
           "*" and new "x" for multiplication
           "/" and new divide for division

   PL/I:   circumflex and not and tilde for "negation"
           vertical bar and split bar for "or"
           "*" and new "x" for multiplication
           "/" and new divide for division

   REXX:   circumflex and not and tilde for "negation"
           vertical bar and split bar for "or"

   Query Languages:  Unknown.

   TELNET: (In DoD TCP/IP network) virtual terminal
           protocol must allow the installation to define
           the character to use for the CONTROL shift.
           Ideally, the installation would be able to
           define two code positions (e.g., cent for
           U.S. EBCDIC 3270s and left bracket for
           ASCII-7 terminals) for compatibility.  (You
           want a character that you seldom use:
           ASCII-7 terminals have no cent and EBCDIC-94
           has no brackets.)
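One way a scanner could honour the dual code points listed above is a small normalization pass before tokenizing.  The Python sketch below is hypothetical: DUAL_CODES and normalize_source are illustrative names, only the bracket, brace, and bar pairs from the list are covered, and everything else is decoded with Python's "cp037" codec as an assumed stand-in for U.S. EBCDIC.  Note the trade-off: in that codec X'AD', X'BD', X'8B', and X'9B' are assigned to other graphics, which this pass would turn into brackets and braces.

```python
# Hypothetical lexer pre-pass: fold dual EBCDIC code points onto one
# canonical character before a C or PASCAL scanner sees the text.
DUAL_CODES = {
    0xBA: "[", 0xAD: "[",   # left bracket: new and older positions
    0xBB: "]", 0xBD: "]",   # right bracket
    0xC0: "{", 0x8B: "{",   # left brace
    0xD0: "}", 0x9B: "}",   # right brace
    0x4F: "|", 0x6A: "|",   # vertical bar and split bar both mean "or"
}


def normalize_source(ebcdic: bytes) -> str:
    """Decode EBCDIC source text, folding the dual code points first."""
    chars = []
    for byte in ebcdic:
        folded = DUAL_CODES.get(byte)
        chars.append(folded if folded is not None
                     else bytes([byte]).decode("cp037"))
    return "".join(chars)


if __name__ == "__main__":
    # "if(a|b){}" written with the split bar and the older brace codes.
    src = bytes([0x89, 0x86, 0x4D, 0x81, 0x6A, 0x82, 0x5D, 0x8B, 0x9B])
    print(normalize_source(src))
```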

    There is no standard for the translation of control
characters.  There are 65 ASCII control codes (X'00'-X'1F' and
X'7F'-X'9F') and exactly the same number in EBCDIC (X'00'-X'3F'
and X'FF').  However, there is no official 1-1 translation.
In the past there was a tendency for duplicated mappings (EBCDIC
LF and NL were both commonly mapped to ASCII LF) so making
changes will not be a trivial decision.
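The arithmetic behind "exactly the same number" is easy to check.  The Python below just counts the ranges quoted above; the positional pairing at the end is an arbitrary illustration that some one-to-one mapping exists, not a proposal for which one to use.

```python
# Control-code ranges as quoted in the text.
ASCII_CONTROLS = [*range(0x00, 0x20), *range(0x7F, 0xA0)]  # 32 + 33
EBCDIC_CONTROLS = [*range(0x00, 0x40), 0xFF]               # 64 + 1

print(len(ASCII_CONTROLS), len(EBCDIC_CONTROLS))  # 65 65

# Equal counts guarantee that some 1-1 pairing exists; this one simply
# pairs the codes in numeric order and has no standing as a standard.
arbitrary_pairing = dict(zip(ASCII_CONTROLS, EBCDIC_CONTROLS))
assert len(set(arbitrary_pairing.values())) == 65
```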

ISSUES

    It is always difficult to know what the implications of a
translate table change are going to be.  It is necessary to try
it and then see what happens.  There are some old devices, like
the 6670, for which change is extremely complex.  Fortunately,
the desktop publishing revolution and PostScript printers are
making such old devices less important.

    At Yale, we have no special insight into the software.  We
will have to determine the impact of this new character mapping
on PASCAL, C, PL/I, WSCRIPT, DCF, and other character sensitive
products.

    However, we have unusual control over the communications
area.  Through YTERM 1.4 it is possible to define any PC to be an
ISO 8859 terminal.  It is also possible to load the EGA with a
font that corresponds to ISO 8859 characters (eliminating a
translation into the standard monochrome extended character set).
By changing the translate tables in the Series/1, it is possible
to build a pseudo-3270 display which supports all 194 new EBCDIC
code points and displays them on an ISO 8859 terminal (like
YTERM).

    In the near term, there will be some problems.  Older ASCII-7
terminals will not support the ISO standard and will require the
old translate tables to code PL/I.  Some host compilers will not
support the new ASCII positions and uploaded files may require a
translation pass until the compilers are upgraded.  Therefore,
these changes would be installed only for experimental access
until the full impact is determined.

    In the long run, however, these tables provide an interesting
recommendation for BITNET file transfer.  If this translation
could be adopted (with specific control code mappings) by ASCII
locations on BITNET then we could address a number of file
interchange problems.  However, even though ISO 8859 is 95%
identical to the DEC VAX extended character set, there will still
have to be a comparable period of testing on the VMS and UNIX
side to determine if the translation poses a problem on that end.

    The ISO 8859 standard has several parts.  I have been talking
about 8859/1, the part covering Roman character sets.  There are
other parts for Eastern European and presumably Russian, Arabic,
Hebrew, and Japanese.  Eventually these issues may also have to
be addressed.

IMPACT

    There are several areas of Yale activity which could be
affected by this standard:

    The Computer Center would be directly affected if the
standard is to be supported for general terminal access.
For communications and terminal support, this
involves creation of YTERM tables, changes to the frontend
processors, and changes to PCTRANS and possibly TPRINT.  At this
time there would be no intention to change line-at-a-time
communications support or the Datasouth printers since these
devices do not support extended character sets.  This is a
subject for discussion and a long range objective.  We also need
to study the impact on the existing IBM compilers, REXX, and
other applications.

    Yale will work through BITNET to get these standards adopted
throughout the community.  This will require the agreement of the
rest of the university community.

    The university library community and the NOTIS package might
consider the implications of this standard.  The current approach
based on the special ALA support for overstruck characters is
available on a limited set of devices.  This is an area of future
discussion.

    The general community of word processing programs and users
at Yale should take these code assignments into consideration
when building fonts.  This is a user activity and does not
explicitly involve Computer Center personnel.

    University users who have in the past been interested in
non-Roman character sets should investigate the implications of
the other elements of the ISO 8859 standard.

    Unfortunately, Yale has no particular forum in which to
discuss such changes.  I would like to receive comments at
GILBERT@YALEVM and will attempt to call a meeting to discuss the
implications and implementation of table changes if the response
warrants it.

HERE IT IS, WITH WARNINGS.

    The following tables are presented for the purpose of
discussion.  They have not been checked for accuracy and are
subject to amendment.  It is, for example, rather difficult for
an American to distinguish lowercase from uppercase Icelandic
"thorn", especially in two entirely different typefaces.
Still, the only way to document things is to actually provide the
tables.  A curse upon anyone who actually puts them into
production before the community as a whole agrees to them.

ASCII TO EBCDIC TABLE (WITHOUT CONTROL CODES)

*            0 1 2 3  4 5 6 7  8 9 A B  C D E F
        0    00?????? ???????? ???????? ????????   0
        1    ???????? ???????? ???????? ????????   1
        2    405A7F7B 5B6C507D 4D5D5C4E 6B604B61   2
        3    F0F1F2F3 F4F5F6F7 F8F97A5E 4C7E6E6F   3
        4    7CC1C2C3 C4C5C6C7 C8C9D1D2 D3D4D5D6   4
        5    D7D8D9E2 E3E4E5E6 E7E8E9BA E0BBB06D   5
        6    79818283 84858687 88899192 93949596   6
        7    979899A2 A3A4A5A6 A7A8A9C0 4FD0A1FF   7
        8    ???????? ???????? ???????? ????????   8
        9    ???????? ???????? ???????? ????????   9
        A    41AA4AB1 9FB26AB5 BDB49A8A 5FCAAFBC   A
        B    908FEAFA BEA0B6B3 9DDA9B8B B7B8B9AB   B
        C    64656266 63679E68 74717273 78757677   C
        D    AC69EDEE EBEFECBF 80FDFEFB FCADAE59   D
        E    44454246 43479648 54515253 58555657   E
        F    8C49CDCE CBCFCCE1 70DDDEDB DC8D8EDF   F
*            0 1 2 3  4 5 6 7  8 9 A B  C D E F


EBCDIC TO ASCII TABLE (WITHOUT CONTROL CODES)

*            0 1 2 3  4 5 6 7  8 9 A B  C D E F
        0    00?????? ???????? ???????? ????????   0
        1    ???????? ???????? ???????? ????????   1
        2    ???????? ???????? ???????? ????????   2
        3    ???????? ???????? ???????? ????????   3
        4    20A0E2E4 E0E1E3E5 E7F1A22E 3C282B7C   4
        5    26E9EAEB E8EDEEEF ECDF2124 2A293BAC   5
        6    2D2FC2C4 C0C1C3C5 C7D1A62C 254F3E2F   6
        7    F8C9CACB C8CDCECF CC6D3A23 4D273D22   7
        8    D8616263 64656667 6869ABBB F0FDFEA1   8
        9    B06A6B6C 6D6E6F70 7172AABA E6B8C6A4   9
        A    B57E7374 75767778 797AA1BF D0DDDEAE   A
        B    5EA3A5B7 A9A7B6BC BDBE5B5D AFA8B4D7   B
        C    7B414243 44454647 4849ADF4 F6F2F3F5   C
        D    7D4A4B4C 4D4E4F50 5152B9FB FCF9FAFF   D
        E    5CF75354 55565758 595AB2D4 D6D2D3D5   E
        F    30313233 34353637 3839B3D8 DCD9DA7F   F
*            0 1 2 3  4 5 6 7  8 9 A B  C D E F
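Because the tables are offered unchecked, a short script can at least test them for internal consistency.  The Python sketch below assumes the printed layout above (a row digit followed by four groups of eight hex digits, with "?" in unfilled control positions), parses the printable rows of the ASCII-to-EBCDIC table, and reports any EBCDIC code that receives more than one ASCII code.  parse_table and find_collisions are illustrative names, not part of any existing tool.

```python
from collections import defaultdict

# Printable rows of the ASCII-to-EBCDIC table above, copied verbatim.
TABLE = """
        2    405A7F7B 5B6C507D 4D5D5C4E 6B604B61   2
        3    F0F1F2F3 F4F5F6F7 F8F97A5E 4C7E6E6F   3
        4    7CC1C2C3 C4C5C6C7 C8C9D1D2 D3D4D5D6   4
        5    D7D8D9E2 E3E4E5E6 E7E8E9BA E0BBB06D   5
        6    79818283 84858687 88899192 93949596   6
        7    979899A2 A3A4A5A6 A7A8A9C0 4FD0A1FF   7
        A    41AA4AB1 9FB26AB5 BDB49A8A 5FCAAFBC   A
        B    908FEAFA BEA0B6B3 9DDA9B8B B7B8B9AB   B
        C    64656266 63679E68 74717273 78757677   C
        D    AC69EDEE EBEFECBF 80FDFEFB FCADAE59   D
        E    44454246 43479648 54515253 58555657   E
        F    8C49CDCE CBCFCCE1 70DDDEDB DC8D8EDF   F
"""


def parse_table(text: str) -> dict[int, int]:
    """Turn a printed 16-column hex grid into {source code: target code}."""
    mapping = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5 or fields[0] == "*":
            continue                      # blank or column-header line
        row = int(fields[0], 16)
        cells = "".join(fields[1:5])      # four groups of eight hex digits
        for col in range(16):
            pair = cells[2 * col:2 * col + 2]
            if "?" not in pair:           # skip unfilled control positions
                mapping[row * 16 + col] = int(pair, 16)
    return mapping


def find_collisions(mapping: dict[int, int]) -> dict[int, list[int]]:
    """Return {target: [sources]} for every target hit more than once."""
    by_target = defaultdict(list)
    for source, target in sorted(mapping.items()):
        by_target[target].append(source)
    return {t: s for t, s in by_target.items() if len(s) > 1}


if __name__ == "__main__":
    table = parse_table(TABLE)
    for target, sources in find_collisions(table).items():
        print(f"EBCDIC X'{target:02X}' <- ASCII",
              ", ".join(f"X'{s:02X}'" for s in sources))
    unused = sorted(set(range(0x40, 0x100)) - set(table.values()))
    print("EBCDIC codes never produced:", [f"X'{u:02X}'" for u in unused])
```

Run against the table as printed, it flags X'96' (claimed by both X'6F' and X'E6') and shows X'9C' as never produced, which is consistent with the E6 typo reported in the reply that follows.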
22-Mar-88 20:55:10-EST,6359;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Tue 22 Mar 88 20:55:02-EST
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Tue, 22 Mar 88 20:55:15 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 1512; Tue, 22 Mar 88 20:55:14 EDT
Received: by BITNIC (Mailer X1.24) id 3564; Tue, 22 Mar 88 20:49:19 EDT
Date:         Tue, 22 Mar 88 19:10:18 EST
Reply-To:     Discussion list for ASCII/EBCDIC character set related issues
              <ISO8859@JHUVM>
Sender:       Discussion list for ASCII/EBCDIC character set related issues
              <ISO8859@JHUVM>
From:         Mike_Alexander@um.cc.umich.edu
Subject:      Re: Some Important Comments from Howard Gilbert at Yale
              University
X-To:         ISO8859%JHUVM.BITNET@CUNYVM.CUNY.EDU
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>

I read with great interest the comments from Howard Gilbert at Yale
regarding ISO8859 to EBCDIC translation.  He captures the essence of
the problem very well.

As Edwin Hart indicated in his preface to these comments, the MTS
community has already installed an ISO8859 to EBCDIC translate table
as its standard for network access to the various machines
involved.  I compared the translate table that Howard Gilbert gave
at the end of his message with the one we use (ignoring the control
characters he didn't fill in) and found the following differences.

       ISO8859    Name in ISO8859    Gilbert      MTS

         7F        DEL                FF           07
         DE        Capital Thorn      AE           8E
         E6        Small ae diphthong 96           9C
         FE        Small Thorn        8E           AE

The code for E6 seems to be a typo, since ISO8859 code point 6F also
translates into EBCDIC code point 96.  The reverse table translates
EBCDIC 96 (which is a lower case o) into ISO8859 6F which seems
correct.

We chose to translate ISO8859 code 7F into EBCDIC 07 since that has
been defined as the DEL character in various IBM publications for
some time.  I didn't personally have much to do with this decision,
so I'll let others justify it, but it seems to make sense.

We seem to disagree about the difference between an upper case and a
lower case Icelandic Thorn.  I hope we're right, since our table is
already installed.

In case anyone is interested, here is the rest of our translate
table.  The codes for ISO8859 01 through 1F were chosen to
correspond to existing EBCDIC control characters.  I don't recall
all the discussion behind the choice of the codes for ISO8859 80 to
9F, but these codes were chosen so that the entire table is one to
one.  I can dig up some of the discussion behind these choices if
anyone cares.

In the following, the name on the left gives the ISO8859 code and
the value in quotes is the corresponding EBCDIC code.

ITOE#01  DC    X'01'        SOH   start of heading           (Ctrl-A)
ITOE#02  DC    X'02'        STX   start of text              (Ctrl-B)
ITOE#03  DC    X'03'        ETX   end of text                (Ctrl-C)
ITOE#04  DC    X'37'        EOT   end of transmission        (Ctrl-D)
ITOE#05  DC    X'2D'        ENQ   enquiry                    (Ctrl-E)
ITOE#06  DC    X'2E'        ACK   acknowledge                (Ctrl-F)
ITOE#07  DC    X'2F'        BEL   bell                       (Ctrl-G)
ITOE#08  DC    X'16'        BS    backspace                  (Ctrl-H)
ITOE#09  DC    X'05'        HT    horizontal tabulation      (Ctrl-I)
ITOE#0A  DC    X'25'        LF    line feed                  (Ctrl-J)
ITOE#0B  DC    X'0B'        VT    vertical tabulation        (Ctrl-K)
ITOE#0C  DC    X'0C'        FF    form feed                  (Ctrl-L)
ITOE#0D  DC    X'0D'        CR    carriage return            (Ctrl-M)
ITOE#0E  DC    X'0E'        SO    shift-out                  (Ctrl-N)
ITOE#0F  DC    X'0F'        SI    shift-in                   (Ctrl-O)
*
ITOE#10  DC    X'10'        DLE   data link escape           (Ctrl-P)
ITOE#11  DC    X'11'        DC1   device control 1    (X-Off, Ctrl-Q)
ITOE#12  DC    X'12'        DC2   device control 2           (Ctrl-R)
ITOE#13  DC    X'13'        DC3   device control 3     (X-On, Ctrl-S)
ITOE#14  DC    X'3C'        DC4   device control 4           (Ctrl-T)
ITOE#15  DC    X'3D'        NAK   negative acknowledge       (Ctrl-U)
ITOE#16  DC    X'32'        SYN   synchronous idle           (Ctrl-V)
ITOE#17  DC    X'26'        ETB   end of transmission block  (Ctrl-W)
ITOE#18  DC    X'18'        CAN   cancel                     (Ctrl-X)
ITOE#19  DC    X'19'        EM    end of medium              (Ctrl-Y)
ITOE#1A  DC    X'3F'        SUB   substitute character       (Ctrl-Z)
ITOE#1B  DC    X'27'        ESC   escape                     (Escape)
ITOE#1C  DC    X'1C'        FS    file separator
ITOE#1D  DC    X'1D'        GS    group separator
ITOE#1E  DC    X'1E'        RS    record separator
ITOE#1F  DC    X'1F'        US    unit separator

ITOE#80  DC    X'20'              ...
ITOE#81  DC    X'21'              ...
ITOE#82  DC    X'22'              ...
ITOE#83  DC    X'23'              ...
ITOE#84  DC    X'24'              ...
ITOE#85  DC    X'15'              ...
ITOE#86  DC    X'06'              ...
ITOE#87  DC    X'17'              ...
ITOE#88  DC    X'28'              ...
ITOE#89  DC    X'29'              ...
ITOE#8A  DC    X'2A'              ...
ITOE#8B  DC    X'2B'              ...
ITOE#8C  DC    X'2C'              ...
ITOE#8D  DC    X'09'              ...
ITOE#8E  DC    X'0A'              ...
ITOE#8F  DC    X'1B'              ...
*
ITOE#90  DC    X'30'              ...
ITOE#91  DC    X'31'              ...
ITOE#92  DC    X'1A'              ...
ITOE#93  DC    X'33'              ...
ITOE#94  DC    X'34'              ...
ITOE#95  DC    X'35'              ...
ITOE#96  DC    X'36'              ...
ITOE#97  DC    X'08'              ...
ITOE#98  DC    X'38'              ...
ITOE#99  DC    X'39'              ...
ITOE#9A  DC    X'3A'              ...
ITOE#9B  DC    X'3B'              ...
ITOE#9C  DC    X'04'              ...
ITOE#9D  DC    X'14'              ...
ITOE#9E  DC    X'3E'              ...
ITOE#9F  DC    X'FF'              ...
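The claim that these control assignments make the whole table one to one can be checked mechanically.  The Python sketch below copies the ITOE values from the listing above and adds ISO 7F to EBCDIC 07 from the message; the ISO 00 to EBCDIC 00 entry is an assumption, since the listing starts at 01.  It then confirms that the 65 ISO control codes land on the 65 EBCDIC control positions X'00'-X'3F' and X'FF', each exactly once.  The names are illustrative.

```python
# EBCDIC values for ISO 01-1F, in order, from the ITOE listing above.
C0 = [0x01, 0x02, 0x03, 0x37, 0x2D, 0x2E, 0x2F, 0x16,   # 01-08
      0x05, 0x25, 0x0B, 0x0C, 0x0D, 0x0E, 0x0F,         # 09-0F
      0x10, 0x11, 0x12, 0x13, 0x3C, 0x3D, 0x32, 0x26,   # 10-17
      0x18, 0x19, 0x3F, 0x27, 0x1C, 0x1D, 0x1E, 0x1F]   # 18-1F

# EBCDIC values for ISO 80-9F, in order, from the same listing.
C1 = [0x20, 0x21, 0x22, 0x23, 0x24, 0x15, 0x06, 0x17,   # 80-87
      0x28, 0x29, 0x2A, 0x2B, 0x2C, 0x09, 0x0A, 0x1B,   # 88-8F
      0x30, 0x31, 0x1A, 0x33, 0x34, 0x35, 0x36, 0x08,   # 90-97
      0x38, 0x39, 0x3A, 0x3B, 0x04, 0x14, 0x3E, 0xFF]   # 98-9F

# ISO 7F -> EBCDIC 07 is stated in the message; 00 -> 00 is assumed.
ITOE_CONTROLS = {0x00: 0x00, 0x7F: 0x07}
ITOE_CONTROLS.update(zip(range(0x01, 0x20), C0))
ITOE_CONTROLS.update(zip(range(0x80, 0xA0), C1))

EBCDIC_CONTROL_SET = set(range(0x00, 0x40)) | {0xFF}


def is_one_to_one_onto(mapping: dict[int, int], targets: set[int]) -> bool:
    """True if every target is hit by exactly one source."""
    values = list(mapping.values())
    return len(values) == len(set(values)) and set(values) == targets


if __name__ == "__main__":
    print(len(ITOE_CONTROLS),
          is_one_to_one_onto(ITOE_CONTROLS, EBCDIC_CONTROL_SET))
    # The inverse (EBCDIC to ISO) table follows directly.
    ETOI_CONTROLS = {e: i for i, e in ITOE_CONTROLS.items()}
    print(f"EBCDIC X'25' -> ISO X'{ETOI_CONTROLS[0x25]:02X}'")
```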
23-Mar-88 05:07:01-EST,9045;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:A-PIRARD@BLIULG11.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Wed 23 Mar 88 05:06:36-EST
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Wed, 23 Mar 88 05:06:42 EDT
Received: from VM1.ULG.AC.BE by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 2089; Wed, 23 Mar 88 05:06:40 EDT
Received: by BLIULG11 (Mailer X1.25) id 6083; Wed, 23 Mar 88 11:03:41 +0100
Date:         Wed, 23 Mar 88 11:00:18 +0100
From:         Andre PIRARD <A-PIRARD%BLIULG11.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      ASCII/ISO/which EBCDIC? summary
To:           ISO8859@JHUVM,
              Protocol Converter list <IBM7171@DEARN>,
              Columbia University Center for Computing Activities
 <INFO-KERMIT@CU20B.COLUMBIA.EDU>,
              IBM-KERMIT@CU20B.COLUMBIA.EDU

Some time ago, I raised a discussion on several mailing lists
about data communication and ASCII/ISO/EBCDIC character codes.  I
now realize my wording was very loose.  Since then, I have had
contacts with both kind people on the nets and a very
knowledgeable IBM representative.  I feel responsible for
restating the problem correctly, to avoid confusion and to pass on
the information, as I promised some of you.  I'll try to be as
short as feasible.  Please join Edwin Hart's list ISO8859 at JHUVM
for discussing details on codes etc.

We, ASCII or EBCDIC network users, must pay particular attention
to character code standards, now extending to international ones.
Even sites not interested in international characters will sooner
or later hit the problem because, although the situation is well
defined in the ASCII world with an (often overlooked) ISO
standard, it is far from that for EBCDIC users, who are faced with
a choice among several new "code pages" whose differences lie in
the positions of a few characters, strangely enough not the
extended ones.  The era of data communication raises an urgent
need for a single character code standard.
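The remark that the code pages differ only in the positions of a few characters can be illustrated with the two pages at issue here.  The Python sketch below compares its built-in "cp037" and "cp500" codecs byte by byte; those codecs follow later published versions of code pages 37 and 500 and are assumed, not known, to match the 1988 proposals.  differing_positions is an illustrative name.

```python
def differing_positions(cp_a: str, cp_b: str) -> dict[int, tuple[str, str]]:
    """Return {byte: (char in cp_a, char in cp_b)} where the pages disagree."""
    diffs = {}
    for code in range(256):
        a = bytes([code]).decode(cp_a)
        b = bytes([code]).decode(cp_b)
        if a != b:
            diffs[code] = (a, b)
    return diffs


if __name__ == "__main__":
    for code, (a, b) in differing_positions("cp037", "cp500").items():
        print(f"X'{code:02X}'  cp037 {a!r:6} cp500 {b!r}")
```

Every position it reports holds one of [ ] ! ^ and the cent, not, and vertical bar signs, precisely the characters most used in programs and commands, while the accented letters agree.  Both pages still contain the same 256 characters, only arranged differently.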

BITNET apparently had found one.  It is now silently being thrown
into question by these new code sets.  "Table 500" (below) had been
proposed to us without warning.  And it turns out that our IBM
representatives were unaware of the de facto coherence of BITNET.

ISO has done considerable work in defining the graphics necessary
for each country and assigning them codes.  For Latin-based
alphabets, this yielded ISO 8859/1 = ANSI X3.134.2 = ECMA 94,
which is wisely a superset of ISO 646 = ANSI X3.4, the well-known
ASCII.

ISO 8859/1 assigns character graphics to the code range A0-FF.
The range 80-9F has no graphics assigned and can be used for
special purposes in 8-bit storage and transmission. It is kept
free so as not to interfere with the control codes 00-1F during
7-bit transmission compatible with ISO 2022, which alternates
between the two halves of the set with the SI/SO control codes:
with the high bit stripped, 80-9F would land exactly on 00-1F.
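The byte layout just described can be illustrated with a minimal Python sketch. It assumes a simplified scheme in which, during 7-bit transmission, a byte from the upper half (A0-FF) is sent with its high bit stripped between SO (0x0E) and SI (0x0F); it ignores ISO 2022 designation escape sequences and any special treatment of A0 and FF. The names `classify` and `to_7bit` are hypothetical.

```python
SO, SI = 0x0E, 0x0F  # ISO 2022 shift-out / shift-in control codes


def classify(byte: int) -> str:
    """Classify one byte by the ISO 8859/1 code ranges described above."""
    if byte < 0x20 or byte == 0x7F:
        return "control"
    if byte < 0x80:
        return "ASCII graphic"
    if byte < 0xA0:
        return "reserved (80-9F)"
    return "ISO 8859/1 graphic"


def to_7bit(data: bytes) -> bytes:
    """Fold 8-bit ISO 8859/1 data into 7 bits, bracketing upper-half runs with SO ... SI."""
    out = bytearray()
    shifted = False
    for b in data:
        if 0x80 <= b < 0xA0:
            # With the high bit stripped these would become the controls 00-1F.
            raise ValueError(f"byte {b:02X} has no 7-bit representation")
        upper = b >= 0xA0
        if upper != shifted:
            out.append(SO if upper else SI)
            shifted = upper
        out.append(b & 0x7F)  # A0-FF become 20-7F while shifted out
    if shifted:
        out.append(SI)  # return to the default set at the end
    return bytes(out)


print(to_7bit(b"caf\xe9"))  # "cafe" with e-acute (E9) in ISO 8859/1
```

The `ValueError` branch is the point of the exercise: any graphic placed in 80-9F could not survive the trip through a 7-bit line.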

Nobody questions the value of ISO's work, and so far everything
looks ideal for avoiding a new Babel in the largest part of the world.

In conforming EBCDIC to ISO, IBM at least firmly states that any
EBCDIC extension shall contain exactly the ISO character set, so
that a reversible translation is always possible, but it allows
variations in which particular code is assigned to each ISO
character. This idea is also the origin of the IBM PC code page
850 ASCII extension and of the multiple CECP (country extended
code page) EBCDIC extensions on IBM mainframes.
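IBM's requirement amounts to saying that every translation table must be a permutation of the 256 byte values: only then does an inverse table exist. A minimal Python sketch follows, with a hypothetical `invert` helper. It uses the `cp037` codec from Python's standard library as a stand-in for one such code page, on the assumption that Python's table matches IBM's.

```python
def invert(table: bytes) -> bytes:
    """Return the inverse of a 256-byte translation table.

    Raises ValueError if the table is not a permutation, i.e. if some code
    is produced twice and another never, so no reversible translation exists.
    """
    if len(table) != 256 or len(set(table)) != 256:
        raise ValueError("table is not a permutation of 00-FF")
    inverse = bytearray(256)
    for src, dst in enumerate(table):
        inverse[dst] = src
    return bytes(inverse)


# EBCDIC (cp037) -> ISO 8859/1 table, built from Python's cp037 codec.
ebc_to_iso = bytes(range(256)).decode("cp037").encode("latin-1")
iso_to_ebc = invert(ebc_to_iso)

# A round trip through EBCDIC and back restores the ISO text exactly.
text = "Gr\u00fc\u00dfe [ISO]".encode("latin-1")
assert text.translate(iso_to_ebc).translate(ebc_to_iso) == text
```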

Why multiple? Because:
- Compatibility with previous codes governs IBM's evolution; e.g.,
code page 850 contains the ISO characters, but most of the former
cp437 characters stay in place (the missing ISO characters displace
some graphic characters).
- The former EBCDIC, restricted to some eighty characters, did not
contain all the X3.4 ASCII characters, and vice versa. (See IBM
publication GX20-1850, the "yellow book", pp. 9-12; let's call its
second column simply "EBCDIC" and its third column "TN chain".)
- Some of those EBCDIC characters not in ASCII are vital for
programming or using IBM systems and had to be typed from ASCII
terminals.
- ASCII/EBCDIC translation tables were built to accommodate these
needs rather than to map equivalent characters; they varied over
time and between systems, and they differ from those used in file
transfers.
- Habits, software and data have built up in huge amounts.
- ISO now defines the missing EBCDIC characters.
- In the end, defining a single extended EBCDIC is awkward, and the
proposed extensions tend to match the terminal tables rather than
the more stable file-transfer ones.

Never mind, says IBM. As long as a particular EBCDIC extension
conforms to ISO, GDDM will take care of that. And so we are off, on
the grounds that any conforming extension will do. These
extensions are called "Code Page XXXXX" (cpXXX for short). The
most prominent offerings are cp500 and cp037 (more on them below),
but others exist to best fit existing installations.

GDDM is an IBM product that interfaces with the operating system,
the I/O devices and the application programs in order (for our
purposes) to convert one code page to another. IBM says GDDM will
use cp500 internally as the code page to and from which
conversions are made.
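The GDDM approach is pivot conversion: every code page is translated into one internal form and back out again. GDDM's own interface is not shown here. The sketch below uses Python's codecs, with Unicode standing in for the cp500 pivot, and `convert` is a hypothetical helper.

```python
def convert(data: bytes, source: str, target: str) -> bytes:
    """Translate bytes from one code page to another through a common pivot."""
    return data.decode(source).encode(target)


# The brackets sit at BA/BB in cp037 but at 4A/5A in cp500.
print(convert(b"\xba\xbb", "cp037", "cp500").hex())  # prints 4a5a
```

Note that `convert` needs the source code page as an argument. A file arriving untagged from a site that does not use the scheme gives it no way to know which one to pass.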

I simply don't believe in (that function of) GDDM, because it can
only be effective once everything has been converted to use that
interface. Networking is a glaring example. What could GDDM do
with a file (files are supposed to be code-tagged) received from a
network site that does not use it? In my opinion, we have to
settle on a single code NOW, the sooner the better, at least for
networking, and preferably as the recommended code as well. Which
one?

In practice, certainly the one that makes the most people happy.
And BITNET users are numerous. Other reasons favour the present
code:
- It must be compatible with the former EBCDIC.
- Compatibility with the former ASCII/EBCDIC translation is vital,
because that translation is often involved in conversions whose
results are used as data critical for computation rather than as
"good-looking", human-readable text.

BTW, I think that storing ASCII data on BITNET servers is best
done in "binary" format (the ASCII file stream split into
"records" of arbitrary length, preferably 128 bytes). Too bad for
reading documents directly as EBCDIC.
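The "binary" storage suggested here simply cuts the ASCII byte stream into records, the last one possibly shorter, with no character translation applied. The following is a minimal sketch, assuming 128-byte records; `to_records` and `from_records` are hypothetical names, and real server file formats (padding, record headers) are not modelled.

```python
RECORD_LENGTH = 128  # the record length suggested in the text


def to_records(stream: bytes, size: int = RECORD_LENGTH) -> list[bytes]:
    """Cut an ASCII byte stream into records of at most `size` bytes, untranslated."""
    return [stream[i:i + size] for i in range(0, len(stream), size)]


def from_records(records: list[bytes]) -> bytes:
    """Rebuild the original stream by concatenating the records."""
    return b"".join(records)


data = b"Line one\r\nLine two\r\n" * 20   # 400 bytes of ASCII text with CR LF
records = to_records(data)
print([len(r) for r in records])          # [128, 128, 128, 16]
```

Because record boundaries ignore line ends, the CR LF pairs survive intact, which is exactly why such files no longer read as EBCDIC text.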

cp500 is simply not compatible with the former EBCDIC: it carries
on the strange habit of using an exclamation mark where a compiler
expects a vertical bar, and other such things. I am told it is
recommended to Europeans because GDDM uses it internally (???) and
on the grounds of compatibility with previous codes, but it does
not preserve their accented letters :-)

cp037 is EBCDIC-compatible and recommended to US and Canadian sites.

Neither is compatible with what I believe is the prevailing
ASCII/EBCDIC translation, that of the 7171, VM, Kermit, BITNET
gateways, ASCII tape conversion, etc., and, as IBM tells me, the
3708 and 3275:
- cp037 puts the brackets at BA and BB and cp500 puts them at 4A
and 5A, whereas the traditional conversion from ASCII is to the
TN chain positions AD and BD.
- cp500 additionally deviates for the ASCII "exclamation mark" and
"vertical bar", because of its EBCDIC discrepancies.
- The ASCII "circumflex" was uniformly translated to the EBCDIC
"not sign" 5F. There was no circumflex in EBCDIC, but its new
ISO-based definition threatens the former conversion.
- Whereas the ASCII backslash is often used to produce the cent
sign in terminal mode, file transfers keep the EBCDIC backslash.
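The discrepancies listed above can be checked against the `cp037` and `cp500` codecs in Python's standard library, assuming these reflect the IBM code pages discussed. The traditional positions (TN chain brackets, circumflex sent to the not sign) are entered by hand from the text; "--" marks characters for which the text gives no traditional position.

```python
# Traditional ASCII -> EBCDIC positions, as described in the text.
TRADITIONAL = {"[": 0xAD, "]": 0xBD, "^": 0x5F}

for ch in "[]!|^":
    in_037 = ch.encode("cp037")[0]
    in_500 = ch.encode("cp500")[0]
    trad = TRADITIONAL.get(ch)
    trad_txt = f"{trad:02X}" if trad is not None else "--"
    print(f"{ch!r}: cp037={in_037:02X} cp500={in_500:02X} traditional={trad_txt}")
```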

cp037 and cp500 differ in only 7 characters.
VM/SP 5 uses two TTY conversions: TERMINAL ASCIITBL VM1 or VM2.
VM1, the default, is "traditional" (cp037 with TN chain brackets)
and matches no code page. VM2 corresponds to cp500, but the
publication GC24-5328 describes it using the cp037 graphics. To add
to the confusion, the descriptions refer to ANSI X3.4 and X3.26
respectively.
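The claim that cp037 and cp500 differ in only 7 characters can be checked by decoding every byte value with both codecs, assuming Python's tables match IBM's. The sketch prints the ISO 8859/1 code each page assigns so that the output stays plain ASCII.

```python
ALL_BYTES = bytes(range(256))
as037 = ALL_BYTES.decode("cp037")
as500 = ALL_BYTES.decode("cp500")

diffs = [code for code in range(256) if as037[code] != as500[code]]
for code in diffs:
    # Show the ISO 8859/1 code each page assigns to this EBCDIC byte.
    print(f"EBCDIC {code:02X}: cp037 -> ISO {ord(as037[code]):02X}, "
          f"cp500 -> ISO {ord(as500[code]):02X}")
print(len(diffs), "positions differ")
```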

My experience shows that BITNET is working perfectly as it
stands. Are we going to let chance mess all that up?

And it looks as though another code page would not be hard to
obtain from IBM, and that there is "nothing defined yet as a
communication standard". I think we should strongly consider
requesting another code page that matches BITNET, and making it
the standard.

In summary:
Adopting cp037 with the brackets at AD and BD is easy. What I find
more serious is the "ASCII circumflex" to "EBCDIC not" conversion,
which makes no theoretical sense now that both characters are
defined in the other set, but which is presently used as such in
many stored binary files of character-encoded data.
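The "easy" part of the summary can be sketched as a patch to a cp037-based ISO-to-EBCDIC table that moves the brackets to the TN chain positions AD and BD. To keep the table reversible, this hypothetical sketch swaps them with the characters cp037 has there (Y-acute at AD, diaeresis at BD). Where a real BITNET-compatible code page would put those displaced characters is an assumption. The circumflex/not-sign question is deliberately left alone.

```python
def swap(table: bytearray, ch1: str, ch2: str) -> None:
    """Exchange the EBCDIC codes assigned to two ISO 8859/1 characters."""
    i, j = ord(ch1), ord(ch2)  # Latin-1 code points equal ISO 8859/1 bytes
    table[i], table[j] = table[j], table[i]


# Start from the cp037 ISO 8859/1 -> EBCDIC table (index = ISO code).
iso_to_ebc = bytearray("".join(map(chr, range(256))).encode("cp037"))

# Move the brackets to AD and BD, trading places with what cp037 puts there.
swap(iso_to_ebc, "[", "\u00dd")  # [ <-> Y-acute
swap(iso_to_ebc, "]", "\u00a8")  # ] <-> diaeresis

print(f"[ -> {iso_to_ebc[ord('[')]:02X}, ] -> {iso_to_ebc[ord(']')]:02X}")
```

Swapping pairs, rather than overwriting single entries, is what keeps the result a permutation, so the reverse table can still be derived.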

I am closing this discussion on these lists; it now belongs on
the ISO8859 list.
23-Mar-88 06:36:54-EST,1210;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:A-PIRARD@BLIULG11.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Wed 23 Mar 88 06:36:37-EST
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Wed, 23 Mar 88 06:36:37 EDT
Received: from VM1.ULG.AC.BE by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 2224; Wed, 23 Mar 88 06:36:36 EDT
Received: by BLIULG11 (Mailer X1.25) id 7865; Wed, 23 Mar 88 12:35:32 +0100
Date:         Wed, 23 Mar 88 12:21:37 +0100
From:         Andre PIRARD <A-PIRARD%BLIULG11.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      Re: Non-standard EBCDIC mappings
To:           IBM-KERMIT@CU20B.COLUMBIA.EDU
In-Reply-To:  Message of 1988 Mar 14 23:41 EST from <PEPMNT@CFAAMP>

Had the situation been well defined, I would have suggested implementing
the full ISO character set translation in the optional 8-bit table.
But with various EBCDIC versions around, and pure ISO itself rarely
used, even on the IBM PC, I think it is best to wait and see.
The present IBM Kermit translation table is probably what everyone
silently wishes were "the" standard EBCDIC. Let us avoid encouraging
exotic ones and leave the door open for compatible extensions.
23-Mar-88 15:05:14-EST,2877;000000000001
Return-Path: <@um.cc.umich.edu:Bruce_Jolliffe@mtsg.ubc.ca>
Received: from umix.cc.umich.edu by CU20B.COLUMBIA.EDU with TCP; Wed 23 Mar 88 15:04:53-EST
Received: by umix.cc.umich.edu (5.54/umix-2.0)
	id AA20587; Wed, 23 Mar 88 15:08:57 EST
Received: from MTSG.UBC.CA by um.cc.umich.edu via MTS-Net; Wed, 23 Mar 88 14:54:46 EST
Date: Wed, 23 Mar 88 11:53:14 PST
From: Bruce_Jolliffe@mtsg.ubc.ca
To: IBM-Kermit@cu20b.Columbia.edu, info-kermit@cu20b.Columbia.edu,
        iso8859%jhuvm@umix.cc.umich.edu, ibm7171%dearn@umix.cc.umich.edu
Message-Id: <972890@mtsg.ubc.ca>
Subject: ISO (ASCII) to EBCDIC Standards

 
As one of several MTS sites that have recently adopted an ISO 8859 -
Code Page 37 translation table, I found your note on the adoption of
standard ASCII-EBCDIC tables interesting. We mapped each ISO graphic
to its corresponding EBCDIC graphic. Thus we mapped the EBCDIC
logical not (5F) into the ISO logical not (AC). Similarly we mapped
the ISO circumflex into the EBCDIC circumflex (B0) and the ISO tilde
(7E) into the EBCDIC tilde (A1).
 
As you might guess, the two thorniest issues with IBM Code Page 37
were the square brackets and the logical not. As noted in another
message, the square brackets in Code Page 37 are moved from their
traditional TN positions of AD and BD to BA and BB respectively.
The second issue concerned the logical not. At most of the MTS sites
we had traditionally mapped EBCDIC logical nots into tildes. After
much debate we decided it made no sense to do cross-graphic mapping
and decided to go with a graphic-to-graphic mapping.
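The two policies can be contrasted by asking what a tilde typed on an ASCII terminal delivers to the mainframe. In the sketch below, the "old" entry encodes the cross-graphic MTS practice described here. The "new" codes come from Python's `cp037` codec, assumed to match the Code Page 37 table MTS adopted. Neither is MTS's actual table.

```python
# Cross-graphic (old MTS practice, per the text): ASCII ~ arrived as EBCDIC not 5F.
OLD_TERMINAL_INPUT = {"~": 0x5F}


def graphic_to_graphic(ch: str) -> int:
    """EBCDIC code for the same graphic, using cp037 as the target code page."""
    return ch.encode("cp037")[0]


for ch in ["~", "^", "\u00ac"]:  # tilde, circumflex, ISO not sign (AC)
    old = OLD_TERMINAL_INPUT.get(ch)
    old_txt = f"{old:02X}" if old is not None else "--"
    print(f"ISO {ord(ch):02X}: old={old_txt} new={graphic_to_graphic(ch):02X}")
```

Under the new mapping a typed tilde arrives as A1 rather than 5F, which is why applications expecting the logical not had to be changed.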
 
Many of the MTS sites provide general access to their IBM mainframes
exclusively through ASCII terminals. Thus many applications that
used the logical not as an input character had to be changed to accept
the EBCDIC tilde (we had previously mapped EBCDIC logical nots to
ASCII tildes).
 
Prior to the conversion there was a lot of apprehension about
changing to the newer standard, and we prepared for the worst. Now
that the conversion has been done, looking back, it was more of a
nuisance than a major hassle. Granted, it was not free, but with a
reasonable amount of preparation and saturation publicity the
conversion can be relatively painless.
 
The installations that have made this change include the University
of Michigan, Rensselaer Polytechnic Institute, University of
British Columbia, Simon Fraser University, University of Newcastle,
Durham University, and Wayne State University. The University of
Alberta, the other remaining major MTS site, is due to convert this
summer.
 
 
                Bruce Jolliffe
                Computing Centre
                University of British Columbia
 
                Bruce_Jolliffe@mtsg.ubc.ca
          or
                USERBDJ@UBCMTSG.BITNET
 
 
23-Mar-88 16:04:42-EST,1909;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Wed 23 Mar 88 16:04:37-EST
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Wed, 23 Mar 88 16:04:49 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 3172; Wed, 23 Mar 88 16:04:45 EDT
Received: by BITNIC (Mailer X1.24) id 2885; Wed, 23 Mar 88 15:58:39 EDT
Date:         Wed, 23 Mar 88 15:38:39 +0100
Reply-To:     Discussion list for ASCII/EBCDIC character set related issues
              <ISO8859@JHUVM>
Sender:       Discussion list for ASCII/EBCDIC character set related issues
              <ISO8859@JHUVM>
From:         "Alain FONTAINE (Postmaster - NAD)" <FNTA80%BUCLLN11.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      Re: Some Important Comments from Howard Gilbert at Yale
              University
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>
In-Reply-To:  Message of Tue, 15 Mar 88 11:17:07 EST from <HART@APLVM>

Quite important, indeed... But the tables shown are not correct: it is
easy to verify that some values are present twice, and some others not
at all. This affects six values in the EBCDIC to ASCII table, and one in
the ASCII to EBCDIC table. The replacement values given here are indeed
consistent, but that does not mean that they are correct.

EBCDIC to ASCII

      '6D' should be translated into '5F' instead of '4F'
      '6F' should be translated into '3F' instead of '2F'
      '79' should be translated into '60' instead of '6D'
      '7C' should be translated into '40' instead of '4D'
      '8F' should be translated into 'B1' instead of 'A1'
      'FB' should be translated into 'DB' instead of 'D8'

ASCII to EBCDIC

      'E6' should be translated into '9C' instead of '96'

Does anybody know better ?                                      /AF
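The check described above (a value present twice means another is missing) is easy to automate for any 256-entry table. This is a minimal sketch: `audit` is a hypothetical name, and the broken table is a deliberately damaged copy of the `cp037` decode table from Python's standard library, not Howard Gilbert's actual table.

```python
from collections import Counter


def audit(table: bytes) -> tuple[list[int], list[int]]:
    """Return (values used more than once, values never used) for a 256-byte table."""
    counts = Counter(table)
    duplicated = sorted(v for v, n in counts.items() if n > 1)
    missing = sorted(set(range(256)) - set(table))
    return duplicated, missing


good = bytes(range(256)).decode("cp037").encode("latin-1")
broken = bytearray(good)
broken[0x6D] = 0x4F  # reproduce the first reported error: 6D should give 5F

dup, miss = audit(bytes(broken))
print([f"{v:02X}" for v in dup], [f"{v:02X}" for v in miss])  # ['4F'] ['5F']
```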
23-Mar-88 16:05:24-EST,2571;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Wed 23 Mar 88 16:05:20-EST
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Wed, 23 Mar 88 16:05:38 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 3180; Wed, 23 Mar 88 16:05:34 EDT
Received: by BITNIC (Mailer X1.24) id 2907; Wed, 23 Mar 88 15:59:28 EDT
Date:         Wed, 23 Mar 88 09:56:36 CST
Reply-To:     Discussion list for ASCII/EBCDIC character set related issues
              <ISO8859@JHUVM>
Sender:       Discussion list for ASCII/EBCDIC character set related issues
              <ISO8859@JHUVM>
From:         Michael Sperberg-McQueen <U18189%UICVM.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      Thorn
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>

I have not yet had the chance to look all through Howard Gilbert's
translate table, but can answer the query about thorn:  in the EBCDIC
CECP for the US, uppercase thorn is at AE and lowercase thorn is
at 8E.  Apart from the typography (which I admit is not always real
clear for non-readers of Icelandic or Old English), these contextual
clues should tip you off:  8C-8E (lowercase) correspond to AC-AE
(uppercase), and the IBM identifying code (LT630000 and LT640000)
for uppercase letters (LT640000 in this case) is consistently
10000 higher than the code for the corresponding lowercase letters.
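The positional clue can be checked with the `cp037` codec in Python's standard library, assumed here to match the US CECP described. Each lowercase letter at 8C-8E has its uppercase form exactly 0x20 higher. The sketch prints ISO 8859/1 codes instead of the letters so that the output stays plain ASCII; the IBM LT identifiers are not modelled.

```python
for low in range(0x8C, 0x8F):          # 8C-8E: lowercase eth, y-acute, thorn
    lower = bytes([low]).decode("cp037")
    upper = bytes([low + 0x20]).decode("cp037")
    print(f"{low:02X} -> ISO {ord(lower):02X}, {low + 0x20:02X} -> ISO {ord(upper):02X}, "
          f"case pair: {lower.upper() == upper}")
```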

The typographic differences (in case anyone has to design a font
for these!) are:

    - the lowercase thorn has a descender and an ascender; its bowl
         rests on the base line.  (so it is sometimes simulated on
         non-Icelandic typewriters by overstriking 'b' and 'p', unless
         they have serifs, or by overstriking right-bracket and 'o')
    - the uppercase thorn is standard upper-case height, has no
         descender, and its bowl is at mid-letter height, like the
         bowl on a 'P' that has slipped down a bit.

Speaking of fonts -- I have designed an ISO8859 font for the IBM3163
terminal, using font design software which was unsigned but I
believe came from Penn.  It's utilitarian, not beautiful, more or
less matches the native IBM3163 fonts, and anyone who wants it
can have it if they promise to send me any improvements they make.
(It can also be downloaded and used as a start on a PC font, since
the cell sizes are similar but the base line and line thickness
are different.)

Michael Sperberg-McQueen, University of Illinois at Chicago
24-Mar-88 09:23:23-EST,2803;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Thu 24 Mar 88 09:23:17-EST
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Thu, 24 Mar 88 09:23:41 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 4191; Thu, 24 Mar 88 09:23:40 EDT
Received: by BITNIC (Mailer X1.24) id 0618; Thu, 24 Mar 88 09:18:10 EDT
Date:         Thu, 24 Mar 88 07:19:59 EST
Reply-To:     Discussion list for ASCII/EBCDIC character set related issues
              <ISO8859@JHUVM>
Sender:       Discussion list for ASCII/EBCDIC character set related issues
              <ISO8859@JHUVM>
From:         John C Klensin <KLENSIN@INFOODS.MIT.EDU>
Subject:      RE:       How to get a copy of ISO8859
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>

   The way to obtain any ISO standard is to go to one's own national
standards body and order it through them.  In various countries,
national depository libraries have them also and other libraries have at
least some on standing subscriptions.   The ISO Central Secretariat in
Geneva tries to stay out of the bookstore business and I gather will not
sell standards to individuals.
  The national standards bodies act as ISO's sales agents within their
own countries.
  Bo, this means that you have to find ISO8859 in Norway and, for the
other readers of this list, similarly.
  For readers in the USA, ISO standards are obtained through ANSI.  It
is best to call their order department at 212/642-4900 and get price and
shipping information.  If you must write, they are at 1430 Broadway, New
York City, NY 10018.  Specifying "order department" in the address will
save a bit of time.  I don't have the information on enough other
countries handy to make it worth listing them.  I recommend that people
outside the USA not try to order through ANSI for two reasons - they
might refuse to sell them to you, and, since the publications department
is a major source of funds for ANSI, their prices for ISO standards are
often significantly higher than the prices of many other national
bodies (some of which, I gather, give the things away).

  Specific warning about ISO 8859:  It is not one standard, but a whole
family of things, starting with what used to be called "eight-bit ASCII"
and is now known as "Latin alphabet-1" (ISO8859/1), and extending into a
large variety of things (many still in draft) that cover mixtures of the
simple characters the Romans used with a large assortment of specialized
graphics, embellished Roman, and the character sets of other languages.
Since they are all "part of" 8859, ordering "8859" is likely to get you
a lot of documents at a proportionately high price.

24-Mar-88 13:28:38-EST,6620;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:LISTSERV@BITNIC.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Thu 24 Mar 88 13:28:32-EST
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Thu, 24 Mar 88 13:28:33 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 5734; Thu, 24 Mar 88 13:28:28 EDT
Received: by BITNIC (Mailer X1.24) id 2911; Thu, 24 Mar 88 13:21:03 EDT
Date:         Thu, 24 Mar 1988 13:21:01 EDT
Sender:       "Revised List Processor (1.5m)" <LISTSERV%BITNIC.BITNET@CUVMA.COLUMBIA.EDU>
From:         Johan van Wingen <MOSGLA%HLERUL2.BITNET@CUVMA.COLUMBIA.EDU>
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>
Subject:      File "MOSGLA XMIT" being sent to you.

Dear list users

ECMA standards, which are generally identical with the corresponding
ones from ISO, can be ordered from: ECMA, 114 Rue du Rhone, CH-1204,
Geneve, Switzerland.
The following is a rather comprehensive list of everything available.

  INTERNATIONAL STANDARDS FOR CHARACTER CODES AND RELATED SUBJECTS

  ISO 646-1983  ISO 7-bit coded character set for information interchange
  ISO 2022-1986 ISO 7-bit and 8-bit coded character sets -
               Code extension techniques
  ISO 2047-1975 Graphical representations for the control characters of
               the 7-bit coded character set
  ISO 2375-1985 Procedure for the registration of escape sequences
  ISO 4873-1985 8-bit code for information interchange -
               Structure and rules for implementation
  ISO 5426-1983 Extension of the Latin alphabet coded character set for
               bibliographic information interchange
  ISO 5428-1984 Greek alphabet coded character set for
               bibliographic information interchange
  ISO 6429 DIS  ISO 7-bit and 8-bit coded character sets -
               additional control functions for character-imaging devices
  ISO 6862 DIS  Mathematical coded character set for
               bibliographic information interchange
  ISO 6937   Coded character sets for text communication
  ISO 6937/1-1983 General Introduction
  ISO 6937/2-1987 Latin alphabetic and non-alphabetic graphic characters
  ISO 6937/3 DIS  Control functions for page-image format
  ISO 6937/4 DP   Text-processible format
  ISO 6937/5 DP   Scientific and technical graphic characters
  ISO 6937/6 DP   Publishing and box drawing graphic characters
  ISO 6937/7 DIS  Greek graphic characters    (to be withdrawn)
  ISO 6937/8 DIS  Cyrillic graphic characters (to be withdrawn)
  ISO 7350 DIS  Text communication -
               registration of graphic character subrepertoires
  ISO 8859   8-bit single byte coded graphic characters
  ISO 8859/1-1987 Latin alphabet no. 1
  ISO 8859/2-1987 Latin alphabet no. 2
  ISO 8859/3-DIS  Latin alphabet no. 3
  ISO 8859/4-DIS  Latin alphabet no. 4
  ISO 8859/5-DIS  Latin/Cyrillic alphabet
  ISO 8859/6-1987 Latin/Arabic alphabet
  ISO 8859/7-1987 Latin/Greek alphabet
  ISO 8859/8-DIS  Latin/Hebrew alphabet
  ISO 8884 DIS  Keyboard layout for multiple Latin-alphabet languages
  ISO 9036-1987 Arabic 7-bit coded character set for information interchange

  (DIS : Draft International Standard; DP : Draft Proposal)

  Correspondence between ISO and ECMA standards
    ISO    ECMA    Registration number of escape sequence (ISO 2375)
   8859/1    94    100
   8859/2    94    101
   8859/3    94    109
   8859/4    94    110
   8859/5   113    111
   8859/6   114    127
   8859/7   118    126
   8859/8   121    138

  National Standards

  ANSI X3.04-1977 Code for Information Interchange
  GOST 19767-74--GOST 19769-74, GOST 13052-74
  Main\ v\yislitel'n\e, sistem\ obrabotki i apparatura peredayi dann\h
  (to be withdrawn, and replaced by a new version)
  CAS  GB 2312-80 Coded Chinese graphic character set for
               information interchange
  JIS  C 6226-1983 Japanese graphic character set for
               information interchange

  Some literature

  C. E. Mackenzie, Coded Character sets, History and Development, 1980

  Joan M. Smith, Transmitting Text, Ass. for Lit. and Ling. Computing,
        Bulletin, Vol. 11, no. 2, 1983
24-Mar-88 13:55:41-EST,4716;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Thu 24 Mar 88 13:55:35-EST
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Thu, 24 Mar 88 13:50:49 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 6006; Thu, 24 Mar 88 13:50:47 EDT
Received: by BITNIC (Mailer X1.24) id 3461; Thu, 24 Mar 88 13:43:58 EDT
Date:         Thu, 24 Mar 88 11:36:38 EST
Reply-To:     Discussion list for ASCII/EBCDIC character set related issues
              <ISO8859@JHUVM>
Sender:       Discussion list for ASCII/EBCDIC character set related issues
              <ISO8859@JHUVM>
From:         Edwin Hart <HART%APLVM.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      List of Character Coding Standards
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>

Enclosed is a list of character set standards I received from MOSGLA @ HLERUL2.
Dear list users

ECMA standards, which are generally identical with the corresponding
ones from ISO, can be ordered from: ECMA, 114 Rue du Rhone, CH-1204,
Geneve, Switzerland.
The following is a rather comprehensive list of everything available.


  INTERNATIONAL STANDARDS FOR CHARACTER CODES AND RELATED SUBJECTS

  ISO 646-1983  ISO 7-bit coded character set for information interchange
  ISO 2022-1986 ISO 7-bit and 8-bit coded character sets -
               Code extension techniques
  ISO 2047-1975 Graphical representations for the control characters of
               the 7-bit coded character set
  ISO 2375-1985 Procedure for the registration of escape sequences
  ISO 4873-1985 8-bit code for information interchange -
               Structure and rules for implementation
  ISO 5426-1983 Extension of the Latin alphabet coded character set for
               bibliographic information interchange
  ISO 5428-1984 Greek alphabet coded character set for
               bibliographic information interchange
  ISO 6429 DIS  ISO 7-bit and 8-bit coded character sets -
               additional control functions for character-imaging devices
  ISO 6862 DIS  Mathematical coded character set for
               bibliographic information interchange
  ISO 6937   Coded character sets for text communication
  ISO 6937/1-1983 General Introduction
  ISO 6937/2-1987 Latin alphabetic and non-alphabetic graphic characters
  ISO 6937/3 DIS  Control functions for page-image format
  ISO 6937/4 DP   Text-processible format
  ISO 6937/5 DP   Scientific and technical graphic characters
  ISO 6937/6 DP   Publishing and box drawing graphic characters
  ISO 6937/7 DIS  Greek graphic characters    (to be withdrawn)
  ISO 6937/8 DIS  Cyrillic graphic characters (to be withdrawn)
  ISO 7350 DIS  Text communication -
               registration of graphic character subrepertoires
  ISO 8859   8-bit single byte coded graphic characters
  ISO 8859/1-1987 Latin alphabet no. 1
  ISO 8859/2-1987 Latin alphabet no. 2
  ISO 8859/3-DIS  Latin alphabet no. 3
  ISO 8859/4-DIS  Latin alphabet no. 4
  ISO 8859/5-DIS  Latin/Cyrillic alphabet
  ISO 8859/6-1987 Latin/Arabic alphabet
  ISO 8859/7-1987 Latin/Greek alphabet
  ISO 8859/8-DIS  Latin/Hebrew alphabet
  ISO 8884 DIS  Keyboard layout for multiple Latin-alphabet languages
  ISO 9036-1987 Arabic 7-bit coded character set for information interchange

  (DIS : Draft International Standard; DP : Draft Proposal)

  Correspondence between ISO and ECMA standards
    ISO    ECMA    Registration number of escape sequence (ISO 2375)
   8859/1    94    100
   8859/2    94    101
   8859/3    94    109
   8859/4    94    110
   8859/5   113    111
   8859/6   114    127
   8859/7   118    126
   8859/8   121    138

  National Standards

  ANSI X3.04-1986 7-bit ASCII Code for Information Interchange
  ANSI X3.26      Punched Card Standard (ref. for IBM ASCII-EBCDIC translation)
  ANSI X3.41      7-bit ASCII character extensions, corresponds to ISO 2022
  ANSI X3.134.2   (proposed) 8-bit ASCII, corresponds to ISO 8859-1

  GOST 19767-74--GOST 19769-74, GOST 13052-74
  Main\ v\yislitel'n\e, sistem\ obrabotki i apparatura peredayi dann\h
  (to be withdrawn, and replaced by a new version)
  CAS  GB 2312-80 Coded Chinese graphic character set for
               information interchange
  JIS  C 6226-1983 Japanese graphic character set for
               information interchange

  Some literature

  C. E. Mackenzie, Coded Character sets, History and Development, 1980

  Joan M. Smith, Transmitting Text, Ass. for Lit. and Ling. Computing,
        Bulletin, Vol. 11, no. 2, 1983
25-Mar-88 02:53:09-EST,2918;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Fri 25 Mar 88 02:53:01-EST
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Fri, 25 Mar 88 02:53:18 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 6884; Fri, 25 Mar 88 02:53:16 EDT
Received: by BITNIC (Mailer X1.24) id 9708; Fri, 25 Mar 88 02:48:06 EDT
Date:         Fri, 25 Mar 88 08:37:11 +0100
Reply-To:     Discussion list for ASCII/EBCDIC character set related issues
              <ISO8859@JHUVM>
Sender:       Discussion list for ASCII/EBCDIC character set related issues
              <ISO8859@JHUVM>
From:         "Alain FONTAINE (Postmaster - NAD)" <FNTA80%BUCLLN11.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      Re: ISO (ASCII) to EBCDIC Standards
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>

I've followed the discussion and tried to keep up with all remarks and
corrections. As a result, I've produced the following two tables, which
are at least complete and coherent. This still does not mean that they
are completely right. Any help would be appreciated.     /AF

P.S. They are in REXX syntax because I used REXX to check their consistency.

/* EBCDIC -> ASCII */
asc8859 = '000102039C09867F978D8E0B0C0D0E0F'x||,
          '101112139D8508871819928F1C1D1E1F'x||,
          '80818283840A171B88898A8B8C050607'x||,
          '909116939495960498999A9B14159E1A'x||,
          '20A0E2E4E0E1E3E5E7F1A22E3C282B7C'x||,
          '26E9EAEBE8EDEEEFECDF21242A293BAC'x||,
          '2D2FC2C4C0C1C3C5C7D1A62C255F3E3F'x||,
          'F8C9CACBC8CDCECFCC603A2340273D22'x||,
          'D8616263646566676869ABBBF0FDFEB1'x||,
          'B06A6B6C6D6E6F707172AABAE6B8C6A4'x||,
          'B57E737475767778797AA1BFD0DDDEAE'x||,
          '5EA3A5B7A9A7B6BCBDBE5B5DAFA8B4D7'x||,
          '7B414243444546474849ADF4F6F2F3F5'x||,
          '7D4A4B4C4D4E4F505152B9FBFCF9FAFF'x||,
          '5CF7535455565758595AB2D4D6D2D3D5'x||,
          '30313233343536373839B3DBDCD9DA9F'x
/* ASCII -> EBCDIC */
ebc8859 = '00010203372D2E2F1605250B0C0D0E0F'x||,
          '101112133C3D322618193F271C1D1E1F'x||,
          '405A7F7B5B6C507D4D5D5C4E6B604B61'x||,
          'F0F1F2F3F4F5F6F7F8F97A5E4C7E6E6F'x||,
          '7CC1C2C3C4C5C6C7C8C9D1D2D3D4D5D6'x||,
          'D7D8D9E2E3E4E5E6E7E8E9BAE0BBB06D'x||,
          '79818283848586878889919293949596'x||,
          '979899A2A3A4A5A6A7A8A9C04FD0A107'x||,
          '202122232415061728292A2B2C090A1B'x||,
          '30311A333435360838393A3B04143EFF'x||,
          '41AA4AB19FB26AB5BDB49A8A5FCAAFBC'x||,
          '908FEAFABEA0B6B39DDA9B8BB7B8B9AB'x||,
          '6465626663679E687471727378757677'x||,
          'AC69EDEEEBEFECBF80FDFEFBFCADAE59'x||,
          '4445424643479C485451525358555657'x||,
          '8C49CDCECBCFCCE170DDDEDBDC8D8EDF'x
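The REXX consistency check itself is not shown. As a rough modern equivalent, here is a Python sketch (the helper names is_permutation and are_inverses are illustrative, not from the original) that rebuilds both 256-byte tables from the hex strings above, checks that each is a one-to-one mapping, and checks that the two undo each other.

```python
# Alain Fontaine's tables, transcribed from the REXX source above.
# asc8859[e] is the ASCII/ISO code for EBCDIC code e; ebc8859[a] is the reverse.
ASC8859_ROWS = [
    "000102039C09867F978D8E0B0C0D0E0F", "101112139D8508871819928F1C1D1E1F",
    "80818283840A171B88898A8B8C050607", "909116939495960498999A9B14159E1A",
    "20A0E2E4E0E1E3E5E7F1A22E3C282B7C", "26E9EAEBE8EDEEEFECDF21242A293BAC",
    "2D2FC2C4C0C1C3C5C7D1A62C255F3E3F", "F8C9CACBC8CDCECFCC603A2340273D22",
    "D8616263646566676869ABBBF0FDFEB1", "B06A6B6C6D6E6F707172AABAE6B8C6A4",
    "B57E737475767778797AA1BFD0DDDEAE", "5EA3A5B7A9A7B6BCBDBE5B5DAFA8B4D7",
    "7B414243444546474849ADF4F6F2F3F5", "7D4A4B4C4D4E4F505152B9FBFCF9FAFF",
    "5CF7535455565758595AB2D4D6D2D3D5", "30313233343536373839B3DBDCD9DA9F",
]
EBC8859_ROWS = [
    "00010203372D2E2F1605250B0C0D0E0F", "101112133C3D322618193F271C1D1E1F",
    "405A7F7B5B6C507D4D5D5C4E6B604B61", "F0F1F2F3F4F5F6F7F8F97A5E4C7E6E6F",
    "7CC1C2C3C4C5C6C7C8C9D1D2D3D4D5D6", "D7D8D9E2E3E4E5E6E7E8E9BAE0BBB06D",
    "79818283848586878889919293949596", "979899A2A3A4A5A6A7A8A9C04FD0A107",
    "202122232415061728292A2B2C090A1B", "30311A333435360838393A3B04143EFF",
    "41AA4AB19FB26AB5BDB49A8A5FCAAFBC", "908FEAFABEA0B6B39DDA9B8BB7B8B9AB",
    "6465626663679E687471727378757677", "AC69EDEEEBEFECBF80FDFEFBFCADAE59",
    "4445424643479C485451525358555657", "8C49CDCECBCFCCE170DDDEDBDC8D8EDF",
]

asc8859 = bytes.fromhex("".join(ASC8859_ROWS))  # EBCDIC -> ASCII/ISO
ebc8859 = bytes.fromhex("".join(EBC8859_ROWS))  # ASCII/ISO -> EBCDIC


def is_permutation(table: bytes) -> bool:
    """True if the table has 256 entries and every byte value appears once."""
    return len(table) == 256 and len(set(table)) == 256


def are_inverses(forward: bytes, backward: bytes) -> bool:
    """True if backward[forward[x]] == x for every byte value x."""
    return all(backward[forward[x]] == x for x in range(256))


if __name__ == "__main__":
    print(is_permutation(asc8859), is_permutation(ebc8859))  # True True
    print(are_inverses(ebc8859, asc8859))                    # True
    # bytes.translate applies a 256-byte table to every byte of the input.
    print(b"Hello".translate(ebc8859).hex().upper())         # C885939396
```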
25-Mar-88 05:16:23-EST,7033;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Fri 25 Mar 88 05:16:15-EST
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Fri, 25 Mar 88 05:16:30 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 6911; Fri, 25 Mar 88 05:16:29 EDT
Received: by BITNIC (Mailer X1.24) id 0258; Fri, 25 Mar 88 05:10:17 EDT
Date:         Fri, 25 Mar 88 10:52:20 +0100
Reply-To:     Discussion list for ASCII/EBCDIC character set related issues
              <ISO8859@JHUVM>
Sender:       Discussion list for ASCII/EBCDIC character set related issues
              <ISO8859@JHUVM>
From:         Andre PIRARD <A-PIRARD%BLIULG11.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      IBM official translate tables
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>

I've obtained from IBM the following translate tables, said to be official.
They translate between CECP 500 and IBM PC code page 850 or 437.
I may ask for CECP 037 and ISO8859 as well if anyone is interested.
I'll "batch the orders" and post the answer to the list.
Any comments?

-------------------------------------------------------------------------

     FROM: INTERNATL    697/500   TO:   PC           980/850

     ---------------------------------------------------------------
     -0  -1  -2  -3  -4  -5  -6  -7  -8  -9  -A  -B  -C  -D  -E  -F
     ---------------------------------------------------------------
0-   00  01  02  03  DC  09  C3  9F  CA  B2  D5  0B  0C  0D  0E  0F
1-   10  11  12  13  DB  DA  08  C1  18  19  C8  F2  1C  1D  1E  1F
2-   C4  B3  C0  D9  BF  0A  17  1B  B4  C2  C5  B0  B1  05  06  07
3-   CD  BA  16  BC  BB  C9  CC  04  B9  CB  CE  DF  14  15  FE  1A
4-   20  FF  83  84  85  A0  C6  86  87  A4  5B  2E  3C  28  2B  21
5-   26  82  88  89  8A  A1  8C  8B  8D  E1  5D  24  2A  29  3B  5E
6-   2D  2F  B6  8E  B7  B5  C7  8F  80  A5  DD  2C  25  5F  3E  3F
7-   9B  90  D2  D3  D4  D6  D7  D8  DE  60  3A  23  40  27  3D  22
8-   9D  61  62  63  64  65  66  67  68  69  AE  AF  D0  EC  E7  F1
9-   F8  6A  6B  6C  6D  6E  6F  70  71  72  A6  A7  91  F7  92  CF
A-   E6  7E  73  74  75  76  77  78  79  7A  AD  A8  D1  ED  E8  A9
B-   BD  9C  BE  FA  B8  F5  F4  AC  AB  F3  AA  7C  EE  F9  EF  9E
C-   7B  41  42  43  44  45  46  47  48  49  F0  93  94  95  A2  E4
D-   7D  4A  4B  4C  4D  4E  4F  50  51  52  FB  96  81  97  A3  98
E-   5C  F6  53  54  55  56  57  58  59  5A  FD  E2  99  E3  E0  E5
F-   30  31  32  33  34  35  36  37  38  39  FC  EA  9A  EB  E9  7F

     ---------------------------------------------------------------


     FROM: PC           980/850   TO:   INTERNATL    697/500

     ---------------------------------------------------------------
     -0  -1  -2  -3  -4  -5  -6  -7  -8  -9  -A  -B  -C  -D  -E  -F
     ---------------------------------------------------------------
0-   00  01  02  03  37  2D  2E  2F  16  05  25  0B  0C  0D  0E  0F
1-   10  11  12  13  3C  3D  32  26  18  19  3F  27  1C  1D  1E  1F
2-   40  4F  7F  7B  5B  6C  50  7D  4D  5D  5C  4E  6B  60  4B  61
3-   F0  F1  F2  F3  F4  F5  F6  F7  F8  F9  7A  5E  4C  7E  6E  6F
4-   7C  C1  C2  C3  C4  C5  C6  C7  C8  C9  D1  D2  D3  D4  D5  D6
5-   D7  D8  D9  E2  E3  E4  E5  E6  E7  E8  E9  4A  E0  5A  5F  6D
6-   79  81  82  83  84  85  86  87  88  89  91  92  93  94  95  96
7-   97  98  99  A2  A3  A4  A5  A6  A7  A8  A9  C0  BB  D0  A1  FF
8-   68  DC  51  42  43  44  47  48  52  53  54  57  56  58  63  67
9-   71  9C  9E  CB  CC  CD  DB  DD  DF  EC  FC  70  B1  80  BF  07
A-   45  55  CE  DE  49  69  9A  9B  AB  AF  BA  B8  B7  AA  8A  8B
B-   2B  2C  09  21  28  65  62  64  B4  38  31  34  33  B0  B2  24
C-   22  17  29  06  20  2A  46  66  1A  35  08  39  36  30  3A  9F
D-   8C  AC  72  73  74  0A  75  76  77  23  15  14  04  6A  78  3B
E-   EE  59  EB  ED  CF  EF  A0  8E  AE  FE  FB  FD  8D  AD  BC  BE
F-   CA  8F  1B  B9  B6  B5  E1  9D  90  BD  B3  DA  FA  EA  3E  41

     ---------------------------------------------------------------
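The CECP 500 / code page 850 pair above can be checked mechanically. The following Python sketch is a hypothetical helper, not IBM code; parse_grid assumes exactly the layout shown, where each row label gives the high-order hex digit of the source code and the 16 columns give the low-order digit. It loads both listings and confirms that each table undoes the other.

```python
# IBM's CECP 500 <-> PC code page 850 tables, copied from the listings above.
CP500_TO_CP850 = """
0-   00  01  02  03  DC  09  C3  9F  CA  B2  D5  0B  0C  0D  0E  0F
1-   10  11  12  13  DB  DA  08  C1  18  19  C8  F2  1C  1D  1E  1F
2-   C4  B3  C0  D9  BF  0A  17  1B  B4  C2  C5  B0  B1  05  06  07
3-   CD  BA  16  BC  BB  C9  CC  04  B9  CB  CE  DF  14  15  FE  1A
4-   20  FF  83  84  85  A0  C6  86  87  A4  5B  2E  3C  28  2B  21
5-   26  82  88  89  8A  A1  8C  8B  8D  E1  5D  24  2A  29  3B  5E
6-   2D  2F  B6  8E  B7  B5  C7  8F  80  A5  DD  2C  25  5F  3E  3F
7-   9B  90  D2  D3  D4  D6  D7  D8  DE  60  3A  23  40  27  3D  22
8-   9D  61  62  63  64  65  66  67  68  69  AE  AF  D0  EC  E7  F1
9-   F8  6A  6B  6C  6D  6E  6F  70  71  72  A6  A7  91  F7  92  CF
A-   E6  7E  73  74  75  76  77  78  79  7A  AD  A8  D1  ED  E8  A9
B-   BD  9C  BE  FA  B8  F5  F4  AC  AB  F3  AA  7C  EE  F9  EF  9E
C-   7B  41  42  43  44  45  46  47  48  49  F0  93  94  95  A2  E4
D-   7D  4A  4B  4C  4D  4E  4F  50  51  52  FB  96  81  97  A3  98
E-   5C  F6  53  54  55  56  57  58  59  5A  FD  E2  99  E3  E0  E5
F-   30  31  32  33  34  35  36  37  38  39  FC  EA  9A  EB  E9  7F
"""
CP850_TO_CP500 = """
0-   00  01  02  03  37  2D  2E  2F  16  05  25  0B  0C  0D  0E  0F
1-   10  11  12  13  3C  3D  32  26  18  19  3F  27  1C  1D  1E  1F
2-   40  4F  7F  7B  5B  6C  50  7D  4D  5D  5C  4E  6B  60  4B  61
3-   F0  F1  F2  F3  F4  F5  F6  F7  F8  F9  7A  5E  4C  7E  6E  6F
4-   7C  C1  C2  C3  C4  C5  C6  C7  C8  C9  D1  D2  D3  D4  D5  D6
5-   D7  D8  D9  E2  E3  E4  E5  E6  E7  E8  E9  4A  E0  5A  5F  6D
6-   79  81  82  83  84  85  86  87  88  89  91  92  93  94  95  96
7-   97  98  99  A2  A3  A4  A5  A6  A7  A8  A9  C0  BB  D0  A1  FF
8-   68  DC  51  42  43  44  47  48  52  53  54  57  56  58  63  67
9-   71  9C  9E  CB  CC  CD  DB  DD  DF  EC  FC  70  B1  80  BF  07
A-   45  55  CE  DE  49  69  9A  9B  AB  AF  BA  B8  B7  AA  8A  8B
B-   2B  2C  09  21  28  65  62  64  B4  38  31  34  33  B0  B2  24
C-   22  17  29  06  20  2A  46  66  1A  35  08  39  36  30  3A  9F
D-   8C  AC  72  73  74  0A  75  76  77  23  15  14  04  6A  78  3B
E-   EE  59  EB  ED  CF  EF  A0  8E  AE  FE  FB  FD  8D  AD  BC  BE
F-   CA  8F  1B  B9  B6  B5  E1  9D  90  BD  B3  DA  FA  EA  3E  41
"""


def parse_grid(text: str) -> bytes:
    """Build a 256-byte translate table from a 16-row listing in IBM layout."""
    table = bytearray(256)
    rows = set()
    for line in text.strip().splitlines():
        label, *cells = line.split()
        row = int(label.rstrip("-"), 16)          # high-order hex digit
        if len(cells) != 16:
            raise ValueError(f"row {label} has {len(cells)} entries")
        for col, cell in enumerate(cells):        # low-order hex digit
            table[row * 16 + col] = int(cell, 16)
        rows.add(row)
    if rows != set(range(16)):
        raise ValueError("expected rows 0- to F-")
    return bytes(table)


e2a = parse_grid(CP500_TO_CP850)  # EBCDIC (CECP 500) -> PC (code page 850)
a2e = parse_grid(CP850_TO_CP500)  # PC (code page 850) -> EBCDIC (CECP 500)

if __name__ == "__main__":
    print(all(a2e[e2a[x]] == x for x in range(256)))   # True
    print(bytes.fromhex("C885939396").translate(e2a))  # b'Hello'
```

The same procedure can be applied to the code page 437 pair by substituting those two listings.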


     FROM: INTERNATL    697/500   TO:   PC           919/437

     ---------------------------------------------------------------
     -0  -1  -2  -3  -4  -5  -6  -7  -8  -9  -A  -B  -C  -D  -E  -F
     ---------------------------------------------------------------
0-   00  01  02  03  DC  09  C3  9F  CA  B2  D5  0B  0C  0D  0E  0F
1-   10  11  12  13  DB  DA  08  C1  18  19  C8  F2  1C  1D  1E  1F
2-   C4  B3  C0  D9  BF  0A  17  1B  B4  C2  C5  B0  B1  05  06  07
3-   CD  BA  16  BC  BB  C9  CC  04  B9  CB  CE  DF  F4  F5  FE  1A
4-   20  FF  83  84  85  A0  C6  86  87  A4  5B  2E  3C  28  2B  21
5-   26  82  88  89  8A  A1  8C  8B  8D  E1  5D  24  2A  29  3B  5E
6-   2D  2F  B6  8E  B7  B5  C7  8F  80  A5  DD  2C  25  5F  3E  3F
7-   BD  90  D2  D3  D4  D6  D7  D8  DE  60  3A  23  40  27  3D  22
8-   BE  61  62  63  64  65  66  67  68  69  AE  AF  D0  EC  E7  F1
9-   F8  6A  6B  6C  6D  6E  6F  70  71  72  A6  A7  91  F7  92  CF
A-   E6  7E  73  74  75  76  77  78  79  7A  AD  A8  D1  ED  E8  A9
B-   9B  9C  9D  FA  B8  15  14  AC  AB  F3  AA  7C  EE  F9  EF  9E
C-   7B  41  42  43  44  45  46  47  48  49  F0  93  94  95  A2  E4
D-   7D  4A  4B  4C  4D  4E  4F  50  51  52  FB  96  81  97  A3  98
E-   5C  F6  53  54  55  56  57  58  59  5A  FD  E2  99  E3  E0  E5
F-   30  31  32  33  34  35  36  37  38  39  FC  EA  9A  EB  E9  7F

     ---------------------------------------------------------------


     FROM: PC           919/437   TO:   INTERNATL    697/500

     ---------------------------------------------------------------
     -0  -1  -2  -3  -4  -5  -6  -7  -8  -9  -A  -B  -C  -D  -E  -F
     ---------------------------------------------------------------
0-   00  01  02  03  37  2D  2E  2F  16  05  25  0B  0C  0D  0E  0F
1-   10  11  12  13  B6  B5  32  26  18  19  3F  27  1C  1D  1E  1F
2-   40  4F  7F  7B  5B  6C  50  7D  4D  5D  5C  4E  6B  60  4B  61
3-   F0  F1  F2  F3  F4  F5  F6  F7  F8  F9  7A  5E  4C  7E  6E  6F
4-   7C  C1  C2  C3  C4  C5  C6  C7  C8  C9  D1  D2  D3  D4  D5  D6
5-   D7  D8  D9  E2  E3  E4  E5  E6  E7  E8  E9  4A  E0  5A  5F  6D
6-   79  81  82  83  84  85  86  87  88  89  91  92  93  94  95  96
7-   97  98  99  A2  A3  A4  A5  A6  A7  A8  A9  C0  BB  D0  A1  FF
8-   68  DC  51  42  43  44  47  48  52  53  54  57  56  58  63  67
9-   71  9C  9E  CB  CC  CD  DB  DD  DF  EC  FC  B0  B1  B2  BF  07
A-   45  55  CE  DE  49  69  9A  9B  AB  AF  BA  B8  B7  AA  8A  8B
B-   2B  2C  09  21  28  65  62  64  B4  38  31  34  33  70  80  24
C-   22  17  29  06  20  2A  46  66  1A  35  08  39  36  30  3A  9F
D-   8C  AC  72  73  74  0A  75  76  77  23  15  14  04  6A  78  3B
E-   EE  59  EB  ED  CF  EF  A0  8E  AE  FE  FB  FD  8D  AD  BC  BE
F-   CA  8F  1B  B9  3C  3D  E1  9D  90  BD  B3  DA  FA  EA  3E  41

     ---------------------------------------------------------------
25-Mar-88 06:44:36-EST,5111;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Fri 25 Mar 88 06:44:26-EST
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Fri, 25 Mar 88 06:44:39 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 6929; Fri, 25 Mar 88 06:44:37 EDT
Received: by BITNIC (Mailer X1.24) id 0723; Fri, 25 Mar 88 06:39:10 EDT
Date:         Fri, 25 Mar 88 12:27:00 MET
Reply-To:     Discussion list for ASCII/EBCDIC character set related issues
              <ISO8859@JHUVM>
Sender:       Discussion list for ASCII/EBCDIC character set related issues
              <ISO8859@JHUVM>
From:         Johan van Wingen <MOSGLA%HLERUL2.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      cp37/500
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>


Dear List Subscribers
It took me some time to compare Mr. Gilbert's conversion table with
others in use. It is simply not true that there was no "official"
translate table before CP37 and CP500 turned up. There is one in VS
FORTRAN Language and Library Reference, SC26-4119-1, Appendix C, pp.
365-370. There is even a government standard exactly identical to it,
though not a US one: GOST 19768 of the USSR, issued in 1974. This is
what I use as the most authoritative reference. The combination of
this table with ISO 8859-1 produces a unique code page, which I
implemented using IEBIMAGE on our STC/Siemens laser printer (working in
IBM 3800 compatibility mode), based on DOTR. I did the same thing with
ISO 8859-2 for Eastern European languages. The only concessions to
present practice were the exchange of "logical not" with "circumflex",
and a shift between right square bracket, exclamation mark, and
vertical bar. I see no reason to invent a new table for ISO 80-FF,
which would create further confusion and could even involve changing
the VS FORTRAN compiler. CP37 and CP500 ought to be withdrawn.
Yours faithfully, Johan van Wingen

These are the tables, in ISO format (the column heading is the high-order hex
digit, the row label the low-order one; IBM sometimes mirrors this):

  CONVERSION FROM ASCII TO EBCDIC

     0. 1. 2. 3. 4. 5. 6. 7. 8. 9. A. B. C. D. E. F.

 .0  00 10 40 F0 7C D7 79 97 20 30 41 58 76 9F B8 DC

 .1  01 11 4F F1 C1 D8 81 98 21 31 42 59 77 A0 B9 DD

 .2  02 12 7F F2 C2 D9 82 99 22 1A 43 62 78 AA BA DE

 .3  03 13 7B F3 C3 E2 83 A2 23 33 44 63 80 AB BB DF

 .4  37 3C 5B F4 C4 E3 84 A3 24 34 45 64 8A AC BC EA

 .5  2D 3D 6C F5 C5 E4 85 A4 15 35 46 65 8B AD BD EB

 .6  2E 32 50 F6 C6 E5 86 A5 06 36 47 66 8C AE BE EC

 .7  2F 26 7D F7 C7 E6 87 A6 17 08 48 67 8D AF BF ED

 .8  16 18 4D F8 C8 E7 88 A7 28 38 49 68 8E B0 CA EE

 .9  05 19 5D F9 C9 E8 89 A8 29 39 51 69 8F B1 CB EF

 .A  25 3F 5C 7A D1 E9 91 A9 2A 3A 52 70 90 B2 CC FA

 .B  0B 27 4E 5E D2 4A 92 C0 2B 3B 53 71 9A B3 CD FB

 .C  0C 1C 6B 4C D3 E0 93 6A 2C 04 54 72 9B B4 CE FC

 .D  0D 1D 60 7E D4 5A 94 D0 09 14 55 73 9C B5 CF FD

 .E  0E 1E 4B 6E D5 5F 95 A1 0A 3E 56 74 9D B6 DA FE

 .F  0F 1F 61 6F D6 6D 96 07 1B E1 57 75 9E B7 DB FF


  CONVERSION FROM EBCDIC TO ASCII

     0. 1. 2. 3. 4. 5. 6. 7. 8. 9. A. B. C. D. E. F.

 .0  00 10 80 90 20 26 2D BA C3 CA D1 D8 7B 7D 5C 30

 .1  01 11 81 91 A0 A9 2F BB 61 6A 7E D9 41 4A 9F 31

 .2  02 12 82 16 A1 AA B2 BC 62 6B 73 DA 42 4B 53 32

 .3  03 13 83 93 A2 AB B3 BD 63 6C 74 DB 43 4C 54 33

 .4  9C 9D 84 94 A3 AC B4 BE 64 6D 75 DC 44 4D 55 34

 .5  09 85 0A 95 A4 AD B5 BF 65 6E 76 DD 45 4E 56 35

 .6  86 08 17 96 A5 AE B6 C0 66 6F 77 DE 46 4F 57 36

 .7  7F 87 1B 04 A6 AF B7 C1 67 70 78 DF 47 50 58 37

 .8  97 18 88 98 A7 B0 B8 C2 68 71 79 E0 48 51 59 38

 .9  8D 19 89 99 A8 B1 B9 60 69 72 7A E1 49 52 5A 39

 .A  8E 92 8A 9A 5B 5D 7C 3A C4 CB D2 E2 E8 EE F4 FA

 .B  0B 8F 8B 9B 2E 24 2C 23 C5 CC D3 E3 E9 EF F5 FB

 .C  0C 1C 8C 14 3C 2A 25 40 C6 CD D4 E4 EA F0 F6 FC

 .D  0D 1D 05 15 28 29 5F 27 C7 CE D5 E5 EB F1 F7 FD

 .E  0E 1E 06 9E 2B 3B 3E 3D C8 CF D6 E6 EC F2 F8 FE

 .F  0F 1F 07 1A 21 5E 3F 22 C9 D0 D7 E7 ED F3 F9 FF


  DEVIATIONS: ASCII TO EBCDIC           EBCDIC TO ASCII     UNPRINTABLE
                                                                      |
             21 5D 5E 09 0A 1C FF      4F 5A 5F 15 17 22 24 35 E1 FF  |
  STANDARD   4F 5A 5F 05 25 1C 00      21 5D 5E 85 87 82 84 95 9F FF
  PDP-HASP   5A 5F 4F 05 25 22 07      5E 21 5D 00 00 1C 00 1E 00 7F 00
  VAX-SNA    4F 5A 5F 40 25 1C 3F      21 5D 5E 5C 5C 5C 5C 5C 5C 5C 5C
  VAX SUBR   4F 5A 5F 05 25 1C FF      21 5D 5E 0A 1B 5C 5C 5C 5C FF 5C
  VTAM       4F 5A 5F 05 15 1C DELETED 21 5D 5E 0A 00 5C 00 5C 00 7F 00
  TSO-KERMIT 4F 5A 5F 05 25 1C 00      21 5D 5E 0A 1B 00 00 00 00 00 00
  PC-3278 AD 5A 4F 5F 05 25 1C 00      5D 21 5E 85 87 82 84 95 9F FF

  EARN/BITNET
 A  21 5B 5D 7C 85 8A D5 E3 E5 FC   E  15 2A 4A 4F 5A 6A AD BB BD FC
 E  5A AD BD 4F 2A 15 BB 4A FC 6A   A  8A 85 E3 7C 21 FC 5B D5 5D E5

 FROM  J. W. van Wingen    MOSGLA@HLERUL2   :
     Mail to                                :
 P. O. Box 486,  2300AL Leiden, Netherlands :
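In the ISO layout used above, the column heading (0. to F.) is the high-order hex digit and the row label (.0 to .F) the low-order one, the transpose of IBM's listings, so a program reading one as the other would silently swap codes. A minimal Python sketch, with the hypothetical helper parse_iso_grid, that loads both of Mr. van Wingen's tables in this layout, checks that they undo each other, and reproduces the STANDARD row of the deviation list (ASCII 21, 5D, 5E going to EBCDIC 4F, 5A, 5F):

```python
# The line labelled .R lists row R (low-order digit); its n-th value is column n.
ASCII_TO_EBCDIC = """
.0  00 10 40 F0 7C D7 79 97 20 30 41 58 76 9F B8 DC
.1  01 11 4F F1 C1 D8 81 98 21 31 42 59 77 A0 B9 DD
.2  02 12 7F F2 C2 D9 82 99 22 1A 43 62 78 AA BA DE
.3  03 13 7B F3 C3 E2 83 A2 23 33 44 63 80 AB BB DF
.4  37 3C 5B F4 C4 E3 84 A3 24 34 45 64 8A AC BC EA
.5  2D 3D 6C F5 C5 E4 85 A4 15 35 46 65 8B AD BD EB
.6  2E 32 50 F6 C6 E5 86 A5 06 36 47 66 8C AE BE EC
.7  2F 26 7D F7 C7 E6 87 A6 17 08 48 67 8D AF BF ED
.8  16 18 4D F8 C8 E7 88 A7 28 38 49 68 8E B0 CA EE
.9  05 19 5D F9 C9 E8 89 A8 29 39 51 69 8F B1 CB EF
.A  25 3F 5C 7A D1 E9 91 A9 2A 3A 52 70 90 B2 CC FA
.B  0B 27 4E 5E D2 4A 92 C0 2B 3B 53 71 9A B3 CD FB
.C  0C 1C 6B 4C D3 E0 93 6A 2C 04 54 72 9B B4 CE FC
.D  0D 1D 60 7E D4 5A 94 D0 09 14 55 73 9C B5 CF FD
.E  0E 1E 4B 6E D5 5F 95 A1 0A 3E 56 74 9D B6 DA FE
.F  0F 1F 61 6F D6 6D 96 07 1B E1 57 75 9E B7 DB FF
"""
EBCDIC_TO_ASCII = """
.0  00 10 80 90 20 26 2D BA C3 CA D1 D8 7B 7D 5C 30
.1  01 11 81 91 A0 A9 2F BB 61 6A 7E D9 41 4A 9F 31
.2  02 12 82 16 A1 AA B2 BC 62 6B 73 DA 42 4B 53 32
.3  03 13 83 93 A2 AB B3 BD 63 6C 74 DB 43 4C 54 33
.4  9C 9D 84 94 A3 AC B4 BE 64 6D 75 DC 44 4D 55 34
.5  09 85 0A 95 A4 AD B5 BF 65 6E 76 DD 45 4E 56 35
.6  86 08 17 96 A5 AE B6 C0 66 6F 77 DE 46 4F 57 36
.7  7F 87 1B 04 A6 AF B7 C1 67 70 78 DF 47 50 58 37
.8  97 18 88 98 A7 B0 B8 C2 68 71 79 E0 48 51 59 38
.9  8D 19 89 99 A8 B1 B9 60 69 72 7A E1 49 52 5A 39
.A  8E 92 8A 9A 5B 5D 7C 3A C4 CB D2 E2 E8 EE F4 FA
.B  0B 8F 8B 9B 2E 24 2C 23 C5 CC D3 E3 E9 EF F5 FB
.C  0C 1C 8C 14 3C 2A 25 40 C6 CD D4 E4 EA F0 F6 FC
.D  0D 1D 05 15 28 29 5F 27 C7 CE D5 E5 EB F1 F7 FD
.E  0E 1E 06 9E 2B 3B 3E 3D C8 CF D6 E6 EC F2 F8 FE
.F  0F 1F 07 1A 21 5E 3F 22 C9 D0 D7 E7 ED F3 F9 FF
"""


def parse_iso_grid(text: str) -> bytes:
    """Build a 256-byte translate table from a listing in ISO column/row layout."""
    table = bytearray(256)
    rows = set()
    for line in text.split("\n"):
        fields = line.split()
        if not fields or not fields[0].startswith("."):
            continue                                   # skip blank lines
        row = int(fields[0][1:], 16)                   # low-order hex digit
        cells = fields[1:]
        if len(cells) != 16:
            raise ValueError(f"row {fields[0]} has {len(cells)} entries")
        for column, cell in enumerate(cells):          # high-order hex digit
            table[column * 16 + row] = int(cell, 16)
        rows.add(row)
    if rows != set(range(16)):
        raise ValueError("expected rows .0 to .F")
    return bytes(table)


a2e = parse_iso_grid(ASCII_TO_EBCDIC)
e2a = parse_iso_grid(EBCDIC_TO_ASCII)

if __name__ == "__main__":
    print(all(e2a[a2e[x]] == x for x in range(256)))      # True
    print([f"{a2e[c]:02X}" for c in (0x21, 0x5D, 0x5E)])  # ['4F', '5A', '5F']
```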

25-Mar-88 07:31:19-EST,2977;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Fri 25 Mar 88 07:31:13-EST
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Fri, 25 Mar 88 07:31:28 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 6971; Fri, 25 Mar 88 07:31:27 EDT
Received: by BITNIC (Mailer X1.24) id 0909; Fri, 25 Mar 88 07:26:10 EDT
Date:         Fri, 25 Mar 88 11:12:03 +0100
Reply-To:     Discussion list for ASCII/EBCDIC character set related issues
              <ISO8859@JHUVM>
Sender:       Discussion list for ASCII/EBCDIC character set related issues
              <ISO8859@JHUVM>
From:         Andre PIRARD <A-PIRARD%BLIULG11.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      Re: ASCII/ISO/which EBCDIC? summary
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>
In-Reply-To:  Message of Wed, 23 Mar 88 20:10:42 EST from <VM0A61@WVNVM>

>>My  experience  shows  that  BITNET is working  perfectly  as  it
>>stands. Are we going to let a chance messing up all that?
>
>I agree that this discussion be moved to the other list, but before
>I do I can't help but point out that the above statement that BITNET
>is "working perfectly" is one of the silliest things I have heard in
>a long time, and it is a shame because this was an otherwise
>fairly reasonable note.

These words ask for a public reply.

*From context*, the statement applies to the translation of 7-bit ASCII/EBCDIC
codes in mail (through gateways, or when retrieving stored data obtained
through them) and to receiving the same codes entered at EBCDIC terminals.

*My experience* shows, for example, that we've never had any problem sending
or receiving UUENCODEd or BOOed binary data, a good test because it uses a
large share of the printable ASCII codes, including the brackets, backslash
and circumflex. It also shows that this translation matched everything I could
get my hands on. That is what an extension to 8 bits threatens.
This experience may, however, be limited to a subset of BITNET or of its uses.
This is why I first queried the net before making up my mind. All I heard
were stories of the "sometime, somewhere, somebody..." kind.

I would have liked to evaluate that numerically by sending a simple form to
be filled in by a random sample of BITNET sites. But I have no time to do this,
and the questions to ask had to wait for some discussion first. Maybe after
a while of good thinking on ISO8859, someone could undertake the project...

I agree that parity, which needlessly reduces transmissions to 7 bits, is
nonsense, that it is a pity we have to use mail to send binary data, and that
other things could be better; but none of these was the point of my note.

But that the guy next door is suddenly typing hieroglyphs for brackets because
CECP 500 has fallen upon him, and that 3 EBCDIC code sets times 3 ASCII code
sets gives us, in the best case, 9 pairs of translation tables to choose from:
*that* is really silly.
29-Mar-88 09:40:33-EST,5895;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Tue 29 Mar 88 09:40:21-EST
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Tue, 29 Mar 88 09:40:57 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 4009; Tue, 29 Mar 88 09:40:55 EDT
Received: by BITNIC (Mailer X1.24) id 4916; Tue, 29 Mar 88 09:38:49 EDT
Date:         Tue, 29 Mar 88 15:30:00 MET
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
From:         Johan van Wingen <MOSGLA%HLERUL2.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      Accented Letters
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>


Dear List Subscribers
The discussion of character codes shows that problems can be classified:
1. Problems with differing conversion tables EBCDIC/ISO8859.
2. Problems with available characters.

As for 1, there is the traditional table, as found in the FORTRAN
manual. The BITNET table deviates from it at a few places for codes
in the ASCII 00-7F range (left half). For details see my previous letter.
Then there is the table based on CP37/500, with a completely different
right half (80-FF).
As for 2, there is the national character problem, which can only be
solved using the character sets of ISO 8859.
The two issues should not be confused with each other.
Distributing the ISO 8859 characters over a code page cannot be done in
an arbitrary way. As soon as you choose the conversion table the result
is fixed, and conversely, every code page created fixes its conversion
table. So it is up to you to decide what is convenient.

From the information I received I tried to reconstruct the CP500 code
page, since SH35-0053 is not available here. Then I compared it with the
FORTRAN code page, as derived from the FORTRAN conversion table. Which
do you prefer? Are the differences really worth the confusion?
Yours faithfully, Johan van Wingen


  A COMPARISON OF FACILITIES FOR LETTERS WITH DIACRITICS

  Notation
  (descriptions taken from ISO 6937-2, additions between parentheses)
  /  acute accent
  \  grave accent
  ^  circumflex accent
  %  diaeresis (umlaut, trema)
  ~  tilde
  *  caron (hachek)
  #  breve (Romanian a)
  #  double acute accent (Hungarian o,u)
  @  ring (above: a,u)
  @  dot (above: z)
  =  macron (upper line)
  $  cedilla (c,s,t)
  $  ogonek (Polish a,e)
  $  (barred: o, eth, thorn)
  _  (underline, fraction)
  &  (ligature: ae,oe,sz)
  ?  (dot under)
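This notation can be generated mechanically today from Unicode canonical decompositions, which postdate this discussion. A minimal Python sketch using the standard unicodedata module; the choice of Unicode combining mark for each symbol, and the SPECIAL table for letters that Unicode does not decompose (stroke letters, eth, thorn, ligatures), are assumptions made to follow the list above, and notation is a hypothetical name.

```python
import unicodedata

# Unicode combining marks and the symbols used for them in the list above.
MARK_SYMBOLS = {
    "\u0301": "/",   # acute accent
    "\u0300": "\\",  # grave accent
    "\u0302": "^",   # circumflex accent
    "\u0308": "%",   # diaeresis (umlaut, trema)
    "\u0303": "~",   # tilde
    "\u030C": "*",   # caron (hachek)
    "\u0306": "#",   # breve
    "\u030B": "#",   # double acute accent
    "\u030A": "@",   # ring above
    "\u0307": "@",   # dot above
    "\u0304": "=",   # macron
    "\u0327": "$",   # cedilla
    "\u0328": "$",   # ogonek
    "\u0323": "?",   # dot below
}

# Letters with no Unicode decomposition (assumed spellings, following the list).
SPECIAL = {
    "\u00F8": "$o", "\u00D8": "$O",  # o with stroke
    "\u00F0": "$d", "\u00D0": "$D",  # eth
    "\u0111": "$d", "\u0110": "$D",  # d with stroke (ISO 8859-2)
    "\u0142": "$l", "\u0141": "$L",  # l with stroke (ISO 8859-2)
    "\u00FE": "$p", "\u00DE": "$P",  # thorn
    "\u00E6": "&a", "\u00C6": "&A",  # ae ligature
    "\u00DF": "&s",                  # sharp s
}


def notation(ch: str) -> str:
    """Two-character notation for an accented letter; other characters unchanged."""
    if ch in SPECIAL:
        return SPECIAL[ch]
    base, *marks = unicodedata.normalize("NFD", ch)  # letter, then combining marks
    if len(marks) == 1 and marks[0] in MARK_SYMBOLS:
        return MARK_SYMBOLS[marks[0]] + base
    return ch


if __name__ == "__main__":
    # EBCDIC 42-49 decoded with Python's cp500 codec.
    print([notation(bytes([b]).decode("cp500")) for b in range(0x42, 0x4A)])
```

Run on EBCDIC 42-49, this gives ^a %a \a /a ~a @a $c ~n, matching column 4. of the CP500 table that follows.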

  REPRESENTATION OF LETTERS FROM ISO 8859-1 WITH FORTRAN TABLE

     4. 5. 6. 7. 8. 9. A. B. C. D. E. F.

 .0              ~A ^E ~N $O           0
 .1               a  j    \U  A  J     1
 .2               b  k  s /U  B  K  S  2
 .3               c  l  t ^U  C  L  T  3
 .4               d  m  u %U  D  M  U  4
 .5               e  n  v /Y  E  N  V  5
 .6           \A  f  o  w $P  F  O  W  6
 .7           /A  g  p  x &s  G  P  X  7
 .8           ^A  h  q  y \a  H  Q  Y  8
 .9               i  r  z /a  I  R  Z  9
 .A              %A %E \O ^a \e ^i ^o /u
 .B              @A \I /O ~a /e %i ~o ^u
 .C              &A /I ^O %a ^e $d %o %u
 .D              $C ^I ~O @a %e ~n    /y
 .E              /E %I %O &a \i \o $o $p
 .F              \E $D    $c /i /o \u %y



  REPRESENTATION OF LETTERS FROM ISO 8859-1 WITH CP500 TABLE

     4. 5. 6. 7. 8. 9. A. B. C. D. E. F.

 .0              $o $O
 .1     /e    /E  a  j        A  J     1
 .2  ^a ^e ^A ^E  b  k  s     B  K  S  2
 .3  %a %e %A %E  c  l  t     C  L  T  3
 .4  \a \e \A \E  d  m  u     D  M  U  4
 .5  /a /i /A /I  e  n  v     E  N  V  5
 .6  ~a ^i ~A ^I  f  o  w     F  O  W  6
 .7  @a %i @A %I  g  p  x     G  P  X  7
 .8  $c \i $C \I  h  q  y     H  Q  Y  8
 .9  ~n &s ~N     i  r  z     I  R  Z  9
 .A
 .B                          ^o ^u ^O ^U
 .C              $d &a $D    %o %u %O %U
 .D              /y    /Y    \o \u \O \U
 .E              $p &A $P    /o /u /O /U
 .F                          ~o %y ~O


  REPRESENTATION OF LETTERS FROM ISO 8859-2 WITH FORTRAN TABLE

     4. 5. 6. 7. 8. 9. A. B. C. D. E. F.

 .0           $s #A $E /N *R           0
 .1     *S    *t  a  j    @U  A  J     1
 .2  $A $S    /z  b  k  s /U  B  K  S  2
 .3     *T $l     c  l  t #U  C  L  T  3
 .4  $L       *z  d  m  u %U  D  M  U  4
 .5        *l @z  e  n  v /Y  E  N  V  5
 .6  *L *Z /s /R  f  o  w $T  F  O  W  6
 .7  /S @Z    /A  g  p  x &s  G  P  X  7
 .8        *s ^A  h  q  y /r  H  Q  Y  8
 .9     $a        i  r  z /a  I  R  Z  9
 .A              %A %E *N ^a *c ^i ^o /u
 .B              /L *E /O #a /e *d #o #u
 .C              /C /I ^O %a $e $d %o %u
 .D              $C ^I #O /l %e /n    /y
 .E              /E *D %O /c *e *n *r $t
 .F     /Z       *C $D    $c /i /o @u



  REPRESENTATION OF LETTERS FROM ISO 8859-2 WITH CP500 TABLE

     4. 5. 6. 7. 8. 9. A. B. C. D. E. F.

 .0           *r *R *l
 .1     /e    /E  a  j    $L  A  J     1
 .2  ^a $e ^A $E  b  k  s *L  B  K  S  2
 .3  %a %e %A %E  c  l  t     C  L  T  3
 .4  /r *c /R *C  d  m  u *S  D  M  U  4
 .5  /a /i /A /I  e  n  v     E  N  V  5
 .6  #a ^i #A ^I  f  o  w /s  F  O  W  6
 .7  /l *d /L *D  g  p  x /z  G  P  X  7
 .8  $c *e $C *E  h  q  y     H  Q  Y  8
 .9  /n &s /N     i  r  z *z  I  R  Z  9
 .A        /S    *T $S $A       *s    $l
 .B              *t $s @z    ^o #u ^O #U
 .C              $d /c $D @Z %o %u %O %U
 .D              /y    /Y    *n @u *N @U
 .E              $T /C $T    /o /u /O /U
 .F     /Z       $a    *Z    #o %y #O



 FROM  J. W. van Wingen    MOSGLA@HLERUL2   :
     Mail to                                :
 P. O. Box 486,  2300AL Leiden, Netherlands :

29-Mar-88 12:30:03-EST,3713;000000000001
Mail-From: SY.FDC created at 29-Mar-88 12:29:58
Date: Tue 29 Mar 88 12:29:58-EST
From: Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>
Subject: For the digest...
To: sy.christine@CU20B.COLUMBIA.EDU
Message-ID: <12386232998.151.SY.FDC@CU20B.COLUMBIA.EDU>

Date: Tue, 29 Mar 88 17:54:11 +0200
From: Andre PIRARD <A-PIRARD%BLIULG11.BITNET@CUVMA.COLUMBIA.EDU>
Subject: Proposed Kermit Rule for Extended ASCII
Keywords: ASCII, Extended ASCII, ISO8859, Translation Tables

In the process of implementing the transfer of extended (national) characters
between micros and IBM mainframes, I came to the conclusion that, for the IBM
PC alone, I had to build at least 9 different tables in order to support 3
EBCDIC tables (traditional, CECP 500 and CECP 037) x 3 "ASCII" tables (code
page 437, code page 850 and ISO 8859/1).  Even without considering ISO for the
Macintosh, I've still got 3 tables to build for the IBM host and, if I
attempted Mac to IBM PC conversion, 3 more tables or so.

As we add more machine types, it all starts to look like the wheat grains on
a chessboard problem, not counting the added difficulty of knowing which table
translates what into what.

Wouldn't it be reasonable for each party to deal with its own code problems,
and for the Kermit protocol to rule which character code standard travels on
the line, as it already does for restricted ASCII? (That applies to text mode
only, of course.)

I think ISO8859/1 is there for that purpose, with the added bonus that it
keeps the 80-9F range free (but available for additional characters if
needed).  This range is indeed the one that adds the largest overhead to
8th-bit quoting.
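To make the overhead concrete, here is a simplified Python sketch of Kermit's prefix encoding. It assumes the default control prefix # and an 8th-bit prefix & agreed by both sides, and ignores repeat-count compression; encode_byte and encoded_length are illustrative names. Under these assumptions a byte in 80-9F costs three characters on the line, while most bytes in A0-FE cost two.

```python
QCTL = ord("#")  # Kermit's default control-prefix character
QBIN = ord("&")  # 8th-bit prefix; assumed to have been negotiated by both sides


def encode_byte(b: int) -> bytes:
    """Encode one data byte with control and 8th-bit prefixing (no repeat counts)."""
    out = bytearray()
    if b & 0x80:                # high bit set: send QBIN, then the low 7 bits
        out.append(QBIN)
        b &= 0x7F
    if b < 0x20 or b == 0x7F:   # control character: QCTL, then ctl(b) = b XOR 64
        out += bytes([QCTL, b ^ 0x40])
    elif b in (QCTL, QBIN):     # the prefix characters themselves must be quoted
        out += bytes([QCTL, b])
    else:
        out.append(b)
    return bytes(out)


def encoded_length(data: bytes) -> int:
    """Number of packet-data characters needed to carry data."""
    return sum(len(encode_byte(b)) for b in data)


if __name__ == "__main__":
    word = "Caf\u00e9"
    print(encoded_length(word.encode("cp850")))    # 6: e-acute is 82, sent as &#B
    print(encoded_length(word.encode("latin-1")))  # 5: e-acute is E9, sent as &i
```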

Similarly, ISO8859/1 should be used for terminal mode communication, at least
as an option. This just involves byte-to-byte conversion in 8-bit wide mode
and an additional SO/SI shifting (ISO 2022) mechanism in 7-bit mode.

The same applies to users of non-Latin alphabets, who should use their own
8859/x version in the same way.

[Ed. - Kermit was designed (in 1981) on the assumption that 7-bit ASCII was
the most common representation for text files.  In ISO terms, 7-bit ASCII
(with control-character prefixing, etc) is the presentation-layer "transfer
syntax" for text files.  But now we have a proliferation of 8-bit ASCII
character sets -- in addition to the IBM PC's, Apple's, and DEC's various
incompatible extended ASCIIs, we have the ISO 8859 variations, and then the
various translations between them and EBCDIC.

In Japan, they face a similar problem.  There are numerous character sets --
Katakana, Hiragana, Romaji, Kanji -- and there are numerous "standards" for
representing each of these (especially Kanji) in the computer.  Their
solution was to modify the Kermit programs they use to "SET FILE TYPE TEXT
<name of standard>", putting the onus on the user to specify not only the
file type but also the encoding.

As Andre suggests, it would be best if there were one single transfer syntax
for text files (at least for languages whose alphabets can be represented in
8-bit characters), and if each Kermit program translated between that and its
own local code set.  Is ISO 8859/1-1987 ("Latin Alphabet 1", = ANSI X3.134.2, =
ECMA 94) a choice that won't offend anyone?  The lower half (characters 0-127)
corresponds to US ASCII (ANSI X3.4).

If this proposal results in controversy, then does anyone have a simple
alternative proposal?  Meanwhile, it seems wise to build user-defined
translation tables into Kermit programs, such as we have in MS-Kermit 2.30
and IBM mainframe Kermit 4.0.  In MS-Kermit, it might also be desirable to
extend the translation mechanism to file transfer, in some general,
user-controllable way.  Opinions?]
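A minimal Python sketch of the single-transfer-syntax idea, assuming ISO 8859/1 on the line and using Python's built-in codecs (cp437, cp850, cp037, cp500, latin-1) as stand-ins for each side's local tables. It is an illustration, not how any Kermit program implements it, and to_transfer, from_transfer and tables_needed are hypothetical names. Each side only needs the mapping between its own code set and the transfer syntax, so n code sets need 2n tables instead of n(n-1) direct ones.

```python
TRANSFER_SYNTAX = "latin-1"  # ISO 8859-1 as the code set on the line (assumed)


def to_transfer(data: bytes, local: str) -> bytes:
    """Sender: convert text from the local code set to the transfer syntax."""
    return data.decode(local).encode(TRANSFER_SYNTAX)


def from_transfer(data: bytes, local: str) -> bytes:
    """Receiver: convert text from the transfer syntax to the local code set.

    Raises UnicodeEncodeError if the local code set lacks a character.
    """
    return data.decode(TRANSFER_SYNTAX).encode(local)


def tables_needed(n: int) -> tuple[int, int]:
    """For n code sets: (direct tables between every ordered pair,
    tables to and from one common transfer syntax)."""
    return n * (n - 1), 2 * n


if __name__ == "__main__":
    pc_text = "Caf\u00e9".encode("cp850")        # as stored on an IBM PC
    on_line = to_transfer(pc_text, "cp850")      # b'Caf\xe9'
    host_text = from_transfer(on_line, "cp500")  # EBCDIC C3 81 86 51
    print(on_line, host_text.hex())
    print(tables_needed(6))                      # (30, 12)
```

A character with no equivalent in the receiver's code set (for example a capital A with tilde, which code page 437 lacks) makes from_transfer raise UnicodeEncodeError; a real implementation would need a policy for such cases.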
-------
30-Mar-88 08:54:30-EST,1543;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Wed 30 Mar 88 08:54:23-EST
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Wed, 30 Mar 88 08:54:43 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 5247; Wed, 30 Mar 88 08:54:41 EDT
Received: by BITNIC (Mailer X1.24) id 4652; Wed, 30 Mar 88 08:53:46 EDT
Date:         Tue, 29 Mar 88 20:48:32 EST
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
From:         John C Klensin <KLENSIN@INFOODS.MIT.EDU>
Subject:      RE:       Turning the Tables:  A Standards Problem
X-To:         ASCII/EBCDIC character set related issues
              <ISO8859%JHUVM.BITNET@MITVMA.MIT.EDU>
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>

Well, it may or may not be just history from the "old" ASCII
Standard, but they are ALL that way.  Every one.  The current
ASCII standard, the ANSI standards corresponding to ISO8859, the
control code standards, and so on and so forth.  And, yes, ISO
has done "the same thing".  And so has CCITT, where you will
find character codes expressed as column/row.

Perhaps it is really an artifact, not of "old" ASCII, but of
"old" FORTRAN, which also addressed things in this order.

In any event, better just get used to it; a "correction" would
cause chaos.
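In the column/row notation, the column is the high-order four bits of a code and the row the low-order four bits, both written in decimal, so "A" (hex 41) is 4/1 and "~" (hex 7E) is 7/14. A pair of hypothetical Python helpers converting between the two forms:

```python
def to_column_row(code: int) -> str:
    """Write an 8-bit code as column/row, e.g. 0x41 -> '4/1'."""
    if not 0 <= code <= 0xFF:
        raise ValueError(f"{code} is not an 8-bit code")
    return f"{code >> 4}/{code & 0x0F}"


def from_column_row(text: str) -> int:
    """Parse column/row notation back to a code, e.g. '14/9' -> 0xE9."""
    column, row = (int(part) for part in text.split("/"))
    if not (0 <= column <= 15 and 0 <= row <= 15):
        raise ValueError(f"{text!r} is not a valid column/row pair")
    return column * 16 + row


if __name__ == "__main__":
    print(to_column_row(ord("A")), to_column_row(ord("~")))  # 4/1 7/14
    print(hex(from_column_row("14/9")))                      # 0xe9
```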

31-Mar-88 10:15:54-EST,3967;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Thu 31 Mar 88 10:15:51-EST
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Thu, 31 Mar 88 10:16:32 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 6913; Thu, 31 Mar 88 10:16:29 EDT
Received: by BITNIC (Mailer X1.24) id 9858; Thu, 31 Mar 88 10:15:22 EDT
Date:         Thu, 31 Mar 88 16:58:17 GMT
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
From:         Matthias Melcher <$28%DHDURZ1.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      Code Page Nationalities
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>

Dear list subscribers,

The comparison of code pages can be simplified if we think of each of
them as having a nationality or a "mother tongue", with some of them
also knowing foreign languages.

The mother tongue of a code page is determined by one half
of its character repertoire, a kernel which could be
- entered on display terminals
- coded with 7 bits
- mapped onto the kernels of code pages of other nationalities simply
  by replacing the 14 "national use characters"
That is the left half of an ASCII code page, and in EBCDIC roughly the
areas 4A-7F, 81-A9 and C1-F9.

The difference of CP 037 and CP 500 is not "data processing
oriented" vs. "word processing oriented" (Ed Hart), but:
- CP 037 has US nationality
- CP 500 has nationality "International", like 3274 Interface Code 14,
  and ISO 8859 itself.

In that sense, "US"-ASCII must be regarded as International rather
than US, and there is no real US ASCII code page (with, e.g., the
cent sign in the left half).


In the days when code pages did not speak foreign languages,
translations had to be done
- either ignoring the graphic representations
  (e.g. exclamation point <-> right bracket, circumflex <-> logical-not)
- or with foul compromises
  (e.g. taking brackets AD/BD from TN-chain, but not braces 7B/8B).

Today, if we want to respect the graphics, we have a choice:
(a) Map International ASCII (=ISO 8859) to International EBCDIC
    (= CP 500), i.e. kernel onto kernel (mother tongue) and extension
    onto extension (foreign languages).
(b) Map International ASCII to national EBCDICs, e.g. US (= CP 037),
    thus intermixing kernel and extension.

We must be aware that choice (b) logically consists of two translations,
ASCII to EBCDIC and International to US, and this brings a lot of
conceptual complexity and confusion which, in the long run, makes
communication cumbersome.
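Choice (b) as two translations can be sketched with Python's built-in codecs, assuming that cp500 and cp037 there stand in faithfully for IBM's International and US tables; translation_table is a hypothetical helper. Composing ISO 8859-1 to CP 500 with CP 500 to CP 037 gives the same result as translating ISO 8859-1 to CP 037 directly; the second step only moves a handful of characters, such as the square brackets.

```python
def translation_table(src: str, dst: str) -> bytes:
    """256-byte table taking each code of src to the code of the same character in dst.

    Assumes both are single-byte codecs with the same 256-character repertoire.
    """
    return bytes(range(256)).decode(src).encode(dst)


iso_to_intl = translation_table("latin-1", "cp500")  # ASCII -> International EBCDIC
intl_to_us = translation_table("cp500", "cp037")     # International -> US
iso_to_us = translation_table("latin-1", "cp037")    # choice (b) in a single step

if __name__ == "__main__":
    data = bytes(range(256))
    two_steps = data.translate(iso_to_intl).translate(intl_to_us)
    print(two_steps == data.translate(iso_to_us))  # True
    # International code points whose character sits elsewhere in US EBCDIC:
    print([f"{i:02X}" for i in range(256) if intl_to_us[i] != i])
```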

Choice (a), on the other hand, raises many migration problems,
especially as long as IBM has not completed its CECP support (such as
teaching PL/1 to recognize B0 as logical not, or teaching the 3174 to
show and accept thorns).


But I think all this will come. For example, the 3174 CECP RPQ 8Q0566
has already been shown at the CeBIT fair in Hannover and will be
released as soon as some software corequisites are done.

In the meantime, it's not too difficult to deal with Code Page 500.
Even on a US 3278 you can edit most of the CECP characters,
just by using CMS SET OUTPUT / SET INPUT with the old 5A-device codes.
(We can send you a copy of the EXEC.)

I don't know which IBM representatives still recommend CP 037 to
US users. The official recommendation explicitly states
"Standardizing on a single code page for the entire network ..."
"IBM recommends that if this is going to be done that the customer
standardize using the International CECP code page." (8Q0566 announcement)

EARN/BITNET is an international network. So I think the code page
has to be international as well, and every site must be able to
send and accept mail in this code page.

With kind regards - Matthias Melcher
 8-Apr-88 12:09:29-EST,2329;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Fri 8 Apr 88 12:09:26-EST
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Fri, 08 Apr 88 12:07:40 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 6032; Fri, 08 Apr 88 12:07:37 EDT
Received: by BITNIC (Mailer X1.25) id 6853; Fri, 08 Apr 88 12:06:10 EDT
Date:         Fri, 8 Apr 88 11:38:12 EDT
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
From:         Edwin Hart <HART%APLVM.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      Re: Code Page Nationalities
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>
In-Reply-To:  Your message of Thu, 31 Mar 88 16:58:17 GMT

The ISO 8859-1 character set is Latin Alphabet number 1 and has most characters
needed for Western Europe. Eastern Europe uses a different version of ISO 8859.

In discussing the differences between Code Pages 500 and 37, please
understand that they contain exactly the same set of characters as ISO 8859-1.
They were designed that way.  However, the code points for most of the
characters differ: each is a different code.  Code Page 37 is a 192-character
superset of the US/Canada English Data Processing 96-character
code except for (square) brackets.  Similarly, the 96-character subset of
Code Page 500 matches the ISO 646 / ANSI X3.4 characters.
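The claim "same characters, different code points" can be checked with a short Python sketch. It assumes that Python's `cp037` and `cp500` codecs faithfully reflect the IBM code pages. `repertoire` and `LATIN1_REPERTOIRE` are hypothetical names used only here.

```python
# Sketch: CP 037 and CP 500 share ISO 8859-1's repertoire but not its code points.
LATIN1_REPERTOIRE = {chr(i) for i in range(256)}

def repertoire(codec: str) -> set:
    """All characters a single-byte codec can represent."""
    return set(bytes(range(256)).decode(codec))

for codec in ("cp037", "cp500"):
    assert repertoire(codec) == LATIN1_REPERTOIRE

# The same character (e with acute accent) lands on different bytes.
for codec in ("latin-1", "cp037", "cp500"):
    print(codec, "\u00e9".encode(codec).hex())
```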

When translating characters from ISO 8859-1 codes to Code Page 500, for
example, I believe that the characters should match.  If the translation
were from ISO 8859-1 to Code Page 37, the translation would be different.
However, if we were considering another variation, ISO 8859-2, then I would
expect IBM to provide another code page with the same character set as ISO
8859-2.  I would expect that the IBM Code Page corresponding to ISO 8859-2
might require a different translation than the one defined by ISO 8859-1 to
Code Page 500.  If, however, IBM were to standardize on one code page for Latin
Alphabet number 1, then the ISO 8859-x translation to IBM code page xxx
could be held constant.  That would be desirable.

Ed Hart
 8-Apr-88 23:15:10-EST,2676;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Fri 8 Apr 88 23:15:04-EST
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Fri, 08 Apr 88 23:13:14 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 6853; Fri, 08 Apr 88 23:13:12 EDT
Received: by BITNIC (Mailer X1.25) id 4371; Fri, 08 Apr 88 23:07:15 EDT
Date:         Fri, 8 Apr 88 12:10:00 EST
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
From:         Barry D Gates <GATES%MAINE.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      Something I came across out in netnews-land...
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>

I thought the ISO8859 list might be interested in what this person has to say.
I do not necessarily support the person's views, nor do I reject them
(I'm not an expert on this, just an interested party).  I'm interested in
any opinions folks may have on this.  I noticed from previous postings here
that what we are generally referring to as ISO8859 is really ISO8859/1 for
the Western European Nations.  Does ISO intend to come out with a codepage
for the languages this poster lists in his mailfile?
Also are there any other EBCDIC mappings for the other codepages?

Anyway, here is the person's posting.  It was prompted by a posting on
"interNational Language Support" on HP computers, I believe.
The NLS has nothing to do with the SP5 IBM oddity, but more with what we are
talking about here.

------ Forwarded MAIL from comp.std.internat: International Standards Newsgroup

From: bas+@andrew.cmu.edu (Bruce Sherwood)
Newsgroups: comp.std.internat
Subject: Re: International Language Support
Message-ID: <8WKYiky00UgCM600g4@andrew.cmu.edu>
Date: 6 Apr 88 15:16:00 GMT
Organization: Carnegie Mellon University
Lines: 14
In-Reply-To: <691@kuling.UUCP>

To repeat a major complaint I have about ISO 8859 (which I'm distressed to see
is a component of NLS):

This standard is based on nations rather than languages.  So the West European
version doesn't handle Welsh or Catalan or Esperanto (which don't have their
own nations).

The older standard, ISO 6937, was based on forty Latin-alphabet-using
languages, not on nations.  So it handled just about everything (except for
Vietnamese) including Welsh and Catalan and Esperanto.

ISO 8859 is a MAJOR step backward in terms of linguistic equality.

Bruce Sherwood
--------- End of forwarded mail.
 9-Apr-88 16:19:02-EST,2716;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Sat 9 Apr 88 16:18:59-EST
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Sat, 09 Apr 88 16:17:19 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 7347; Sat, 09 Apr 88 16:17:18 EDT
Received: by BITNIC (Mailer X1.25) id 8422; Sat, 09 Apr 88 16:16:23 EDT
Date:         Sat, 9 Apr 88 16:08:00 EDT
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
From:         Chris Tanner <01696%AECLCR.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      ISO 6937 and ISO 8859
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>

This is a few remarks about ISO 6937 and ISO 8859 in reply to a recent mailing
decrying ISO 8859. I am not an expert in this field, but from the little I know,
here goes.

ISO 6937 and ISO 8859 were developed by 2 different groups within
ISO-IEC JTC1/SC2 for different purposes. ISO 6937 is designed for printers.
It creates accented characters by providing accent symbols that are
actually non-spacing characters (these are found in the G2 set). For instance,
e acute is created by the acute accent character followed by e. It also
includes the oe diphthong. This sort of thing is fine for printing but not
very good for character string comparison and sorting.

ISO 8859 is designed for use in programs (character string comparison and
sorting). It provides separate characters for all the accented characters.
It does not provide the oe diphthong since this is treated in string
comparisons as O + E. There are 8 parts to ISO 8859. If people are interested,
I can post to the list the title of each part, and the languages covered.
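The next Python sketch shows why a "non-spacing accent + letter" scheme complicates comparison. It decodes a tiny subset of ISO 6937 in which an accent byte precedes its base letter. The byte values for the diacritics come from my reading of the 6937 G2 layout and should be checked against the standard. `NONSPACING` and `decode_6937_subset` are hypothetical, and this is not a complete 6937 decoder.

```python
import unicodedata

# Hypothetical partial table: a few ISO 6937 non-spacing diacritics (0xC1-0xCF
# range) mapped to Unicode combining marks. Check against the standard before use.
NONSPACING = {
    0xC1: "\u0300",  # grave
    0xC2: "\u0301",  # acute
    0xC3: "\u0302",  # circumflex
    0xC8: "\u0308",  # diaeresis
}

def decode_6937_subset(data: bytes) -> str:
    """Decode ASCII plus the diacritics above; the accent byte precedes its letter."""
    out = []
    accent = ""
    for b in data:
        if b in NONSPACING:
            accent = NONSPACING[b]
        elif b < 0x80:
            out.append(chr(b) + accent)
            accent = ""
        else:
            raise ValueError(f"byte 0x{b:02x} is outside this sketch's subset")
    if accent:
        raise ValueError("dangling non-spacing accent")
    return unicodedata.normalize("NFC", "".join(out))

e_acute_6937 = bytes([0xC2]) + b"e"        # two bytes in ISO 6937
e_acute_8859 = "\u00e9".encode("latin-1")  # one byte in ISO 8859-1
print(len(e_acute_6937), len(e_acute_8859))
print(b"e" in e_acute_6937)  # a naive byte search for plain "e" also hits e acute
```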

SC2 has been asked by its member countries to achieve a harmonization between
these 2 standards. This has resulted in a project proposal (Document JTC1 N 156)
(balloting closes June 2, 1988), which is accompanied by a paper entitled
Co-ordination of the Development of ISO 6937 and ISO 8859. It describes the
purposes of ISO 6937, ISO 8859 and ISO 4873 (which specifies the rules and
structure for 8 bit codes), the problems with these standards, and it proposes
a structure for a family of graphic character sets for 8 bit coding. Hopefully
this project will achieve its aim.

By the way, the library/information services people have a coding standard
of their own which is similar to ISO 6937 in some ways.

Chris Tanner
Atomic Energy of Canada

My views are my own and not the views of my employer.
11-Apr-88 11:59:23-EST,1872;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Mon 11 Apr 88 11:59:18-EST
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Mon, 11 Apr 88 11:30:37 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 8303; Mon, 11 Apr 88 11:30:35 EDT
Received: by BITNIC (Mailer X1.25) id 8501; Mon, 11 Apr 88 11:29:52 EDT
Date:         Mon, 11 Apr 88 17:14:00 MET
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
From:         Johan van Wingen <MOSGLA%HLERUL2.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      6937/8859
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>

Dear list subscribers
Mr. Sherwood's remarks about ISO 8859 compared with ISO 6937 are unfair
and untrue. In ISO 8859-3 one finds characters for Catalan, Esperanto,
Galician, Maltese and Turkish. Lappish is in ISO 8859-4. ECMA-94 contains
all of ISO 8859-1,2,3,4 together. ISO 6937 is NOT a single byte coded
character set.
As a sequel to Mr. Tanner's comments, I attended the meeting of SC2/WG3
responsible for 6937 and 8859, 16-17 March 1988 in Paris. Work on
6937-5,6 will be discontinued, and DIS 6937-7,8 withdrawn. There will be
ISO 8859-9, Latin alphabet no. 5, with Icelandic eth, thorn and y acute
replaced by Turkish g breve, s cedilla and dotless i / dotted capital I.
The first draft of ISO XYZ (the harmonization) will appear in May.
ISO 5426, the bibliographic coded character set is not (yet) included in
the harmonization.
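The Latin-5 substitution described above can be inspected with a small Python sketch. It assumes that Python's `iso8859_9` codec matches the published Part 9. `differing_positions` is a hypothetical helper.

```python
import unicodedata

def differing_positions(codec_a: str, codec_b: str) -> list:
    """Byte values that decode to different characters in the two codecs."""
    diffs = []
    for i in range(256):
        a = bytes([i]).decode(codec_a)
        b = bytes([i]).decode(codec_b)
        if a != b:
            diffs.append((hex(i), unicodedata.name(a, "?"), unicodedata.name(b, "?")))
    return diffs

for pos, old, new in differing_positions("latin-1", "iso8859_9"):
    print(pos, old, "->", new)
```

The result should be six positions: capital and small eth, y acute and thorn, each replaced by a Turkish letter.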
Yours faithfully, Johan van Wingen


 FROM  J. W. van Wingen    MOSGLA@HLERUL2   :
     Mail to                                :
 P. O. Box 486,  2300AL Leiden, Netherlands :

11-Apr-88 13:19:29-EST,1919;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Mon 11 Apr 88 13:19:27-EST
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Mon, 11 Apr 88 13:17:50 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 8608; Mon, 11 Apr 88 13:17:49 EDT
Received: by BITNIC (Mailer X1.25) id 0917; Mon, 11 Apr 88 13:17:09 EDT
Date:         Mon, 11 Apr 88 12:46:17 EDT
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
From:         Edwin Hart <HART%APLVM.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      Re: Something I came across out in netnews-land...
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>
In-Reply-To:  Your message of Fri, 8 Apr 88 12:10:00 EST

The problem with ISO 6937 is that none of the computer manufacturers support
it.  ISO 6937 indeed has all of the accents and with it you can form many
characters.  ISO 6937 comes from CCITT and the standard is concerned with
teletext transmission.  However, the computer manufacturers found it
unacceptable because multiple bytes were used to form and store the characters.
For example, to form an "a" with a circumflex, you did something like:  strike
the accent, then backspace, then the character.  The manufacturers wanted to
represent each character with one code - not some with three.

ISO 8859-1 is also incomplete.  For example, it lacks the
French "oe" diphthong.  ISO 8859-1 is a compromise standard.  I understand
that when the compromise was reached, and the chairman asked if any more
changes should be made, no one said anything; because if one started, then
everyone would have "just a little change" and we would still not have a
standard.

Ed Hart
11-Apr-88 23:57:10-EST,7258;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Mon 11 Apr 88 23:57:07-EST
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Mon, 11 Apr 88 23:55:32 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 9570; Mon, 11 Apr 88 23:55:31 EDT
Received: by BITNIC (Mailer X1.25) id 7817; Mon, 11 Apr 88 23:52:55 EDT
Date:         Mon, 11 Apr 88 10:56:34 CDT
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
From:         Michael Sperberg-McQueen <U18189%UICVM.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      Single or Multiple Tables for Multiple Character Sets?
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>

If I understand his messages correctly, Johan van Wingen's main
objection to the recent discussions is that they are working on the
assumption that equivalent graphics in different character sets
should be mapped to each other.  He objects that this would lead
to multiple, incompatible translate tables being required.

Edwin Hart voices the hope that IBM may be able to do for 8859
(in all its various parts) what they did for ISO 646 (in all its
various national manifestations):  set a translation or mapping
by comparing two character sets (e.g. CP 37 or CP 500 and ISO
8859/1), and then define the various related EBCDIC code pages
by applying that mapping to the other parts of 8859.  (So a
Greek EBCDIC code page will result from applying the CP500-ISO8859/1
translation to ISO8859/7, and so on.)  That might not completely
answer Mr. van Wingen's concerns, but it would be handy.

I agree that it would be nice to keep the number of translate tables
down, where feasible.  But I fear that it's not feasible in the way
Mr. Hart suggests, and I have a number of problems with Mr. van Wingen's
idea that an arbitrary mapping should be defined, implemented *in
hardware* (!), and stuck to come hell or high water.

The fundamental question I have for anyone who will answer is:  If we
do not translate graphic-for-graphic, what is the point of translating?
Why not define our mapping as the one-to-one mapping in which each
hex code maps to itself?  Or, to protect the control-code areas,
why not just map

    ISO       EBCDIC
    0-1F      0-1F
    80-9F     20-3F
    20-7E     40-9E
    7F        FF
    A0-FE     A0-FE
    FF        9F

?

Obviously, this is *not* repeat *not* a serious suggestion.  Why?
Because it would do no one any good at all.  Similarly, applying the
mapping given in the back of the VS Fortran 2.1 manuals to either
ISO 8859/1 or any extended EBCDIC code page will give you code pages
that contain all the necessary characters, but in an arrangement that
no one at all supports.  What good would data like that do anyone?
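The point can be made concrete with a Python sketch of the "shift" mapping from the table above, built as a 256-byte translate table. The names `build_joke_table` and `JOKE_TABLE` are made up for this sketch, and Python's `cp500` codec stands in for a real EBCDIC code page. The table is a perfectly reversible permutation, yet its output is meaningless to any EBCDIC device.

```python
# The deliberately useless "shift" mapping from the table above.

def build_joke_table() -> bytes:
    table = bytearray(256)
    for src in range(0x00, 0x20):   # ISO 0-1F  -> EBCDIC 0-1F
        table[src] = src
    for src in range(0x80, 0xA0):   # ISO 80-9F -> EBCDIC 20-3F
        table[src] = src - 0x60
    for src in range(0x20, 0x7F):   # ISO 20-7E -> EBCDIC 40-9E
        table[src] = src + 0x20
    table[0x7F] = 0xFF              # ISO 7F    -> EBCDIC FF
    for src in range(0xA0, 0xFF):   # ISO A0-FE -> EBCDIC A0-FE
        table[src] = src
    table[0xFF] = 0x9F              # ISO FF    -> EBCDIC 9F
    return bytes(table)

JOKE_TABLE = build_joke_table()
assert sorted(JOKE_TABLE) == list(range(256))  # one-to-one and reversible...

# ...but read through a real EBCDIC code page, "A" comes out as a slash.
print(b"A".translate(JOKE_TABLE).decode("cp500"))
```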

The obvious desiderata for translate tables seem to be:
    - there should be as few as possible, preferably only one
    - they should translate characters correctly (i.e. graphic for
         graphic, with substitutions only where required)
    - they should preserve the collation sequence of the special
         characters or second alphabet in the code

Equally obvious, no two of these are compatible.  The US CECP
does not preserve the collating sequence of ISO 8859/1, the usual
EBCDIC version of the library character set does not preserve the
collating sequence of the ASCII version of the same set (why?!
does anyone know why?), and the mappings from ALA/ASCII to ALA/EBCDIC
are not compatible.  Similarly:  ISO 8859/7 will define a Greek
character set, and ISO 8859/8 a Hebrew character set.  Without having
seen either, I'll give ten to one odds against either set mapping
to the EBCDIC character sets for Greek and Hebrew with the same
translations as for 8859/1 and any IBM extended code page.
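The collating-sequence complaint can be tested mechanically. The Python sketch below uses the `cp037` codec as a stand-in for the US CECP, which is an assumption. It asks whether encoding characters given in ISO 8859-1 order still yields ascending bytes. `preserves_order` is a hypothetical helper.

```python
def preserves_order(codec: str, chars: list) -> bool:
    """True if encoding `chars` (given in ISO 8859-1 order) keeps them in ascending byte order."""
    codes = [c.encode(codec) for c in chars]
    return codes == sorted(codes)

upper_half = [chr(i) for i in range(0xA0, 0x100)]  # special characters and second alphabet
print(preserves_order("latin-1", upper_half))  # True by construction
print(preserves_order("cp037", upper_half))    # False
print(preserves_order("cp037", ["0", "A"]))    # False: EBCDIC digits sort after letters
```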

    ------------------------------------------------------------

What can be done?  I don't know the answers, but it seems obvious
that certain things *will* be done no matter what, and that others
*can* be done if we here will agree to do them.  I offer the following
observations as one person's tentative assessment of the facts,
probabilities, and hopes.

First:  if IBM can be persuaded to define one single EBCDIC for Western
Europe without country variations, as proposed (I think) in the SHARE
paper, they should do so.

Second:  a good character-to-character translation for one of the
IBM extended code pages (37, 500, or some CECP) to ISO 8859/1 is
going to be implemented a lot of places, along the lines of Howard
Gilbert's posting and the various revisions to it.  There is no point
in trying to stop this, but we should, as Mr. Gilbert says, all agree
and implement the same mapping, not different ones.  To implement
the same mapping, we should decide, even if IBM will not, on one
EBCDIC code page to take as basic or common.

Third:  at sites with library automation systems, a translate table for
the library character sets will also be implemented, or has already
been.  There is no point in stopping this, either, and it can't be
stopped anyway.  The hardware for library terminals defines the
required table very rigidly, and the library automation systems don't
have the flexibility to adjust to divergent translations.  (Notis, at
least, does know the difference between terminals with the library
character set and terminals without -- but it *knows* what the library
character set is, and cannot readily be told different.)

Fourth:  although some protocol converters have the memory required for
multiple translate tables (e.g. Series/1s), others (e.g. 7171s) don't.
Those running 7171s may be able to fit one or even two alternate
tables into their 7171s, but not more.  And we can only fit one:  the
rest of the room is taken up by local terminal types.  So we are going
to have to choose:  if you can only support ONE extended-character-set
translate table, which is it going to be?  Obviously, I think it
should be the one we here agree on, if we here can agree on one.
But for library machines, it's going to have to be the one defined
by the library code pages.

Fifth:  how can we support the other required translations if we cannot
put them into our protocol converters?  Matthias Melcher has the
best idea I've seen:  we use the CMS SET INPUT and SET OUTPUT commands
to simulate the translate tables actually needed by the users.  I have
only done a little work on this, but my experiments so far seem to
show that it can work.

SET INPUT and SET OUTPUT, on the other hand, only work for terminal
support.  For file transfer, we are going to have to have execs which
will post-process files uploaded with a given translate table, and
re-translate them into the proper code page.  That shouldn't be too
hard with Rexx under VM.  What other operating environments can do,
I don't know.
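The post-processing step is just one more table lookup. On VM it would be a Rexx exec built around the TRANSLATE built-in. The Python sketch below shows the idea, assuming the converter applied a CP 037 table but the host expects CP 500. `retranslate_table` and `FIX_037_TO_500` are hypothetical names.

```python
def retranslate_table(uploaded_as: str, wanted: str) -> bytes:
    """Table that re-codes text uploaded under one code page into another."""
    return bytes(range(256)).decode(uploaded_as).encode(wanted)

# Hypothetical case: the converter used a CP 037 table, the host expects CP 500.
FIX_037_TO_500 = retranslate_table("cp037", "cp500")

uploaded = "x = a[1] | b!".encode("cp037")
fixed = uploaded.translate(FIX_037_TO_500)
print(uploaded.decode("cp500"))  # brackets, bar and exclamation point come out wrong
print(fixed.decode("cp500"))     # x = a[1] | b!
```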

All of which is just one person's private opinion.  Please contradict
me where I am wrong.

Michael Sperberg-McQueen, University of Illinois at Chicago
12-Apr-88 00:56:38-EST,6367;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Tue 12 Apr 88 00:56:33-EST
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Tue, 12 Apr 88 00:54:57 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 9635; Tue, 12 Apr 88 00:54:55 EDT
Received: by BITNIC (Mailer X1.25) id 8067; Tue, 12 Apr 88 00:51:49 EDT
Date:         Mon, 11 Apr 88 23:31:39 EDT
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
From:         "Bryan, Jerry" <VM0A61%WVNVM.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      Single or Multiple Tables for Multiple Character Sets?
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>
In-Reply-To:  Message of 04/11/88 at 10:56:34 from U18189@UICVM

I have refrained from saying anything to this list so far, mostly
out of fear of saying something foolish.  It remains to be seen
how well founded that fear is, but I have finally decided to
put my two cents worth in anyway.  Speaking of which, I am writing this
on a PC running a VT100 emulator going into an IBM system
through a 7171 protocol converter, so I would have a hard time
putting in a cent sign (ignoring going into XEDIT hex mode), and
even if I did, many of you would have a hard time seeing the cent sign
on your terminal anyway, which I suppose is the whole point of this
list.  (Note to Europeans about American slang, "two cents worth" is
an opinion that may not be worth very much (or may  -- it is up
to the listener to decide).  The speaker is specifically disclaiming
any great profundity by using that phrase.)

Anyway, I have the feeling that there is too much emphasis on
EBCDIC-ASCII conversion and not enough emphasis on straightening
out EBCDIC and straightening out ASCII *as separate problems*.
The problems of EBCDIC are legion and well documented  --
the characters needed by C and PASCAL, national characters, etc.
ASCII has the same problems of missing characters and national
characters, and is even worse than EBCDIC in some ways because
it historically has been only a 7-bit code.  I offer the
following suggestions.

  1.  7-bit ASCII is a lost cause.  I realize there will be
      7-bit ASCII well into the next century, but we would
      do well to concentrate on 8-bit ASCII and getting it
      right.  One could argue that 8 bits are not enough,
      either, but 7 bits are hopeless.

  2.  Proper graphic-to-graphic mappings *within EBCDIC* and
      *within ASCII* are vital.  To the maximum extent possible,
      "proper" means "it looks the same, no matter what".  This
      goal really cannot be achieved in 8 bits, but it should
      be the goal, nevertheless.  I have had the opportunity which
      many Americans do not have of living in Europe.  It was
      irritating to receive messages from America and have characters
      not look right.  For that matter, even locally written
      things  --  SAS programs using dollar signs, for example  --
      looked awful.  I lived in Norway, which to the eye of an
      American has a curious looking alphabet, but one adapts.
      However, I once traveled to Germany and received messages from
      Norway in Norwegian on a German terminal.  My Norwegian
      is not all that good anyway, but it was doubly hard when
      many of the Norwegian characters were rendered as
      German characters.  Now, I have the same problem in America
      with Norwegian I receive here.  It seems to me that
      constancy of graphic rendering ought to be one of the highest
      goals, if not the highest, even though the goal cannot really
      be achieved in 8 bits if enough languages are considered.
      (For that matter, wouldn't it be nice to prepare something on
      a word processor with italics, send it out over BITNET, and
      have your italics characters appear as italics on the
      recipient's screen?)  The infamous problem of the square brackets
      on the TN print train is one of many examples of inconstant
      graphic rendering in EBCDIC, but as noted on this list and in
      SHARE and SEAS papers, there are many others.

   3. Having said all that, at rather too much length, let me
      further suggest that the same constancy of graphic rendering
      ought to be a goal of ASCII-EBCDIC conversion.  All the problems
      I mentioned in item 2 were EBCDIC-EBCDIC problems with
      communications between IBM VM/CMS systems. I also communicated
      between IBM and VAX systems in Norway and to the rest of BITNET,
      and the lack of graphic constancy gets even worse when ASCII
      is introduced into the equation.

   4. Clearly, graphic constancy implies that both ASCII and
      EBCDIC be as rich as possible in characters, and that the
      national character idea is not a very good idea.  An American
      dollar sign and a British pound sign, for example, need
      to be *standard* in ASCII and EBCDIC, but there are numerous
      other examples such as Western European umlauted, accented, and
      diphthonged characters.  Last I heard, the backslash was a
      national character  --  not so nice for the folks writing in C.
      However, all this still comes back to even 8 bits not being enough
      (Greek? Hebrew? Russian? Japanese? Arabic? etc.)

   5. Finally, graphic constancy implies that ASCII-EBCDIC conversions be
      fully reversible in both directions.  This is a part of my distaste
      for even dealing with 7-bit ASCII, where full conversion
      to/from EBCDIC is clearly impossible.
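Reversibility has a simple mechanical test: a 256-byte translate table can be undone only if it is a permutation. The Python sketch below illustrates this. It uses the `latin-1` and `cp037` codecs as example code pages, and `invert` is a hypothetical helper. A one-to-one ISO-to-EBCDIC table inverts cleanly. A 7-bit table that discards the high bit is many-to-one and cannot be inverted.

```python
def invert(table: bytes) -> bytes:
    """Inverse of a 256-byte translate table; fails if the table is not one-to-one."""
    if sorted(table) != list(range(256)):
        raise ValueError("table is not a permutation, so it cannot be reversed")
    inverse = bytearray(256)
    for src, dst in enumerate(table):
        inverse[dst] = src
    return bytes(inverse)

ISO_TO_EBCDIC = bytes(range(256)).decode("latin-1").encode("cp037")
EBCDIC_TO_ISO = invert(ISO_TO_EBCDIC)

sample = "\u00c6r\u00f8 \u00a35 [x]".encode("latin-1")
assert sample.translate(ISO_TO_EBCDIC).translate(EBCDIC_TO_ISO) == sample

# A 7-bit table that drops the high bit is many-to-one and cannot be inverted.
SEVEN_BIT = bytes(b & 0x7F for b in range(256))
try:
    invert(SEVEN_BIT)
except ValueError as err:
    print(err)
```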

I will finish by noting that I have tangled with this problem
for years, and never cease to be amazed by how difficult it is.
It always *seems* like it ought to be easy, but somehow it
never is.  Something always gets you.  I have had users
editing IBM PL/1 code on a VAX, submitting batch to an
IBM machine, for example.  How do you handle that? Etc. etc.
etc., as many other people have pointed out.  Be suspicious
of anybody who doesn't understand why the problem is not
trivial, and who submits his own set of translate tables to
prove it.
12-Apr-88 01:38:46-EST,1519;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Tue 12 Apr 88 01:38:41-EST
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Tue, 12 Apr 88 01:37:04 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 9645; Tue, 12 Apr 88 01:37:03 EDT
Received: by BITNIC (Mailer X1.25) id 8504; Tue, 12 Apr 88 01:36:10 EDT
Date:         Mon, 11 Apr 88 22:27:00 PDT
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
From:         Leonard D Woren <LDW%USCMVSA.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      Two more cents worth...
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>

I'm a hardcore IBM bigot, but (with some help from coworkers), I've
come to the realization that EBCDIC's design doesn't make sense.  The
alphabet isn't contiguous, and numerals sort higher than letters.
ASCII makes certain programming tasks much simpler by not having
either of these defects, which date back to EBCDIC's ancestry in BCD,
which was based on punch card codes.
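Both defects can be shown with a few lines of Python, using the `cp037` codec as a representative EBCDIC. `non_letters_inside_alphabet` is a hypothetical helper. It counts byte values between the codes for "A" and "Z" that are not letters A to Z.

```python
import string

def non_letters_inside_alphabet(codec: str) -> list:
    """Byte values between the codes for 'A' and 'Z' that are not A-Z."""
    letters = {c.encode(codec)[0] for c in string.ascii_uppercase}
    low, high = "A".encode(codec)[0], "Z".encode(codec)[0]
    return [b for b in range(low, high + 1) if b not in letters]

print(len(non_letters_inside_alphabet("latin-1")))  # 0: contiguous
print(len(non_letters_inside_alphabet("cp037")))    # 15: gaps after I and after R
print("0".encode("cp037") > "Z".encode("cp037"))     # True: numerals sort above letters
```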

This isn't intended to start a war of words, so no flames please...
This is just mentioned as something to think about:  It may be heresy,
but maybe the answer is to add some characters to 8 bit ASCII and
throw out EBCDIC.  (Yes, I realize how much work a conversion would
be.)
12-Apr-88 10:09:56-EST,1482;000000000000
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Tue 12 Apr 88 10:09:52-EST
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Tue, 12 Apr 88 10:08:15 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 0194; Tue, 12 Apr 88 10:08:11 EDT
Received: by BITNIC (Mailer X1.25) id 5511; Tue, 12 Apr 88 10:07:38 EDT
Date:         Tue, 12 Apr 88 09:11:02 EDT
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
From:         Edwin Hart <HART%APLVM.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      Re: Two more cents worth...
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>
In-Reply-To:  Your message of Mon, 11 Apr 88 22:27:00 PDT

Throw out EBCDIC?

That clearly is one of the options.  However, I believe that this would be
a larger conversion effort than to IBM Country Extended Code Pages like
37 v1 or 500 v1.  Also, ISO 8859 does not have a contiguous alphabet because
the accented characters are in the upper half of the table, so you have not
fixed one of the problems.  As soon as you have to be concerned with
accented characters and different collating sequences between countries,
then sorting becomes much more difficult (because it depends on both language
and country).
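A short Python sketch shows the sorting problem. Sorting on raw ISO 8859-1 bytes puts every accented word after "z". The `strip_accents` key is a hypothetical, crude fix that drops diacritics. It suits some languages but is wrong for others; Swedish, for example, sorts a-ring after z.

```python
import unicodedata

words = ["zebra", "\u00e9cole", "ete"]

# Sorting on raw ISO 8859-1 bytes puts every accented letter after "z".
naive = sorted(words, key=lambda w: w.encode("latin-1"))

def strip_accents(word: str) -> str:
    """Crude key: decompose and drop combining marks (not correct for every language)."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

folded = sorted(words, key=strip_accents)
print(naive)   # ['ete', 'zebra', 'école']
print(folded)  # ['école', 'ete', 'zebra']
```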

Ed Hart
12-Apr-88 10:32:00-EST,4592;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Tue 12 Apr 88 10:31:57-EST
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Tue, 12 Apr 88 10:30:20 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 0262; Tue, 12 Apr 88 10:30:18 EDT
Received: by BITNIC (Mailer X1.25) id 6198; Tue, 12 Apr 88 10:23:54 EDT
Date:         Tue, 12 Apr 88 09:08:44 EST
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
From:         Howard Gilbert <GILBERT%YALEVM.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      Re: Single or Multiple Tables for Multiple Character Sets?
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>
In-Reply-To:  Message of Mon, 11 Apr 88 10:56:34 CDT from <U18189@UICVM>

It is important in this discussion to realize that character set issues
are not separable from the purpose of the data or communications.  Some
people mix the question of ASCII data and ASCII terminals without proper
thought.  When an ASCII terminal is connected to a protocol converter to
emulate a 3270 display, it is doing device emulation and not character
translation.  You press "A" and get an "A" on the screen and at the
host.  This is not a case of simple character translation of X'41'
into X'C1'. If you are in APL mode, sending X'41' will be translated
into APL alpha and lowercase "a" goes to uppercase.  If "A" is embedded
in an ESC sequence, it is interpreted and not translated.  What, after
all, is the "EBCDIC" meaning of PFK 4 (answer: it is an AID and not a
character).  Thus the objective of the 7171 is to translate the KEY
marked "A" into EBCDIC and not the code.  Turn on the Dvorak keyboard
mode and see what happens then.

This then follows into all of the remaining discussion.  We recently
were disturbed to note that Notis cannot find the city of Lodz in
Poland.  The problem is that the L is stroked (overtype L and /).
Stroke L is an ALA special alphabetic (along with D bar, O /, and
U hook). It has lowercase and uppercase forms.  The problem is that
ordinary library users do not have the special alphabetics and type
in ordinary "L" and Notis does not alias approximately homographic
alphabetic characters when doing a search of the database.  Note that
Notis will match on "odz" but not "Lodz".

Worse, our database is inconsistent in its handling of AE and OE
diphthongs. In a large number of cases they are typed as two characters
rather than using the dipthong code.  Again, database searching is
a problem.
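How Notis actually searches is not described here. The Python sketch below is a hypothetical illustration of the aliasing that was missing. Stroked L, slashed O and barred D have no Unicode decomposition, so stripping accents alone cannot fold them. An explicit alias table is needed, and the same table can expand the AE/OE diphthongs so that both spellings match. `ALIASES`, `search_key` and `matches` are made-up names.

```python
import unicodedata

# Hypothetical alias table for letters that Unicode does not decompose
# (stroked L, slashed O, barred D) and for the AE/OE ligatures.
ALIASES = {
    "\u0141": "L", "\u0142": "l",    # L with stroke
    "\u00d8": "O", "\u00f8": "o",    # O with stroke
    "\u0110": "D", "\u0111": "d",    # D with stroke (bar)
    "\u00c6": "AE", "\u00e6": "ae",  # AE ligature
    "\u0152": "OE", "\u0153": "oe",  # OE ligature
}

def search_key(text: str) -> str:
    """Fold a string to a loose key: aliases, then accents stripped, then case-folded."""
    text = "".join(ALIASES.get(c, c) for c in text)
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()

def matches(query: str, record: str) -> bool:
    return search_key(query) in search_key(record)

record = "\u0141\u00f3d\u017a, Poland"   # Lodz with its diacritics
print(matches("Lodz", record))            # True
print(matches("Caesar", "C\u00e6sar"))    # True
```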

There are around 500 characters (including all diacritically marked
forms) enumerated in the ANSI Z39.47-1985 standard for 35 Roman
languages and 51 other Romanized forms of languages.  Unfortunately,
this does not include the Hebrew, Cyrillic, and Arabic alphabets let
alone the Far East.

In its most general form, the problem cannot be solved.  It can be
solved FOR PARTICULAR APPLICATIONS.  Not all applications will find
the same solution optimal.  The purpose of the committee is to find
if there is one solution which is applicable to a large enough family
of applications to warrant general acceptance.

There are some who would argue that we are looking for a single common
translation.  I would prefer to believe that we are looking for a
single preferred translation for the bulk of use.  Just as ISO 8859
itself cannot replace ANSI Z39.47 and stay within the 8 bit limit of
available graphic code points, so any ASCII to EBCDIC translation which
is generally suitable for Data and Word Processing will still fail
to address math-technical, traditional TN (box drawing), APL, and
other common code problems.

I have separately held that we need to make an accompanying recommendation
that ASCII-EBCDIC (and EBCDIC-EBCDIC) be systematically addressed in
operating systems, data management subsystems, and communications
subsystems.  This will allow organizations to develop standardized
approaches to special needs which are not addressed by the common
translate table.

In essence, the idea of a single common table is too easy a way out
for IBM.  They have undertaken to define such a table on at least
two previous occasions.  Changing 512 bytes of table space is rather
easy.  Addressing the general question of codes, alphabets, collating
sequences, and the like is a much larger and expensive project.
12-Apr-88 11:20:07-EST,3401;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Tue 12 Apr 88 11:20:04-EST
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Tue, 12 Apr 88 11:18:28 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 0393; Tue, 12 Apr 88 11:18:26 EDT
Received: by BITNIC (Mailer X1.25) id 8035; Tue, 12 Apr 88 11:17:35 EDT
Date:         Tue, 12 Apr 88 10:14:19 EDT
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
From:         "Bryan, Jerry" <VM0A61%WVNVM.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      Two more cents worth...
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>
In-Reply-To:  Message of 04/12/88 at 09:11:02 from HART@APLVM

>Throw out EBCDIC?

>That clearly is one of the options.  However, I believe that this would be
>a larger conversion effort than to IBM Country Extended Code Pages like
>37 v1 or 500 v1.  Also, ISO 8859 does not have a contiguous alphabet because
>the accented characters are in the upper half of the table so you have not
>fixed one of the problems.  As soon as you have to be concerned with
>accented characters and different collating sequences between countries,
>then sorting becomes much more difficult (because it depends on both language
>and country).

Notwithstanding the problems listed above, I think that throwing out
EBCDIC might ultimately be the way to go, and I, too, am a lifelong
IBM bigot.  Throwing out EBCDIC right now is clearly unthinkable.
But consider the following.  Suppose this process of rationalizing
EBCDIC and ASCII succeeds to the point that there is a well defined
graphic to graphic mapping and also a well defined and fully reversible
256 code point to 256 code point mapping established.  At that point,
aside from such trivial little problems as old data, old hardware,
sorting and collating sequences, etc., isn't the mapping between
code points and graphics somewhat arbitrary?  And if the mapping is
somewhat arbitrary, why not standardize on ASCII?

(An irrelevant aside on the sorting problem:  I do not know how the
accented or umlauted characters are sorted in German or French, for
example.  But Norwegian has one curious sorting problem which I think
no coding will solve completely.  The 29th letter in the Norwegian
alphabet has two different graphics renderings.  One is a double
"A"  -- "aa" in lower case and "AA" in upper case.  This is not a
diphthong; it is a *single* letter rendered as two characters.
The other graphics rendering
is an "A" with a circle over it.  The "AA" must be sorted as if
it were a single letter in the 29th position of the alphabet, even
though it is represented as two A's in computer memory.  The "A-with-
circle-over-it" is also sorted as if it were a single letter in the
29th position of the alphabet, and it is represented as a single
character in computer memory.  But the two distinct graphics renderings
must be maintained, so people's names will not only be spelled
correctly (either graphics rendering is a "correct" spelling),
but also will *look* right.  Are there any other sorting
problems which anybody knows about which are this severe?)
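The "aa"/"å" case can be handled with a custom sort key.  The key ranks the digraph and the single letter identically, while the strings themselves, and hence the spellings people see, are left untouched.  Below is a minimal Python sketch assuming the 29-letter Norwegian alphabet a-z, æ, ø, å.  It naively treats every "aa" as the letter å, which a real implementation would have to refine, for example in compounds where two a's meet across a word boundary.  The name norwegian_key is hypothetical.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyzæøå"  # å is the 29th letter
RANK = {ch: i for i, ch in enumerate(ALPHABET)}


def norwegian_key(word: str) -> tuple:
    """Sort key that counts the digraph 'aa' as the single letter å."""
    w = word.lower()
    key = []
    i = 0
    while i < len(w):
        if w.startswith("aa", i):
            key.append(RANK["å"])  # two characters, one letter
            i += 2
        else:
            key.append(RANK.get(w[i], len(ALPHABET)))  # unknown characters sort last
            i += 1
    return tuple(key)


names = ["Aasen", "Berg", "Åsen", "Zahl", "Øye"]
print(sorted(names, key=norwegian_key))
```

Because only the key is altered, "Aasen" and "Åsen" sort together at the end of the alphabet but are still printed exactly as they were spelled.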
12-Apr-88 12:14:23-EST,3118;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Tue 12 Apr 88 12:14:18-EST
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Tue, 12 Apr 88 12:12:36 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 0518; Tue, 12 Apr 88 12:12:35 EDT
Received: by BITNIC (Mailer X1.25) id 9299; Tue, 12 Apr 88 12:11:50 EDT
Date:         Tue, 12 Apr 88 10:25:07 CDT
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
From:         Michael Sperberg-McQueen <U18189%UICVM.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      Code translation, device emulation, Notis, and Sorting
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>

Howard Gilbert corrects, quite rightly, my oversimplified discussion
of the issues -- I will say that *within graphics strings* what is
happening sure seems like code translation to me, but he is right
that terminal emulation, data conversion, and so on are enough
different that we need to keep the differences present in mind as we
discuss them.

His example of searching problems is a good one, but lest a false
idea of Notis's capacities become widespread I should point out that
at UIC we have no trouble finding Lodz (or &odz, as it appears
in most of the records) by searching on 'lodz' -- our indexes never
contain diacritics.  Something other than Notis must be the problem.


Jerry Bryan inquires about analogues to Norwegian's 'aa' and 'å'
-- I know of only two.  The 'ij' digraph in Dutch is sorted
by itself, and the sharp s (Eszett, ß) of German sorts identically to
'ss', without being (in Germany) the same thing at all.  (In
Switzerland, sharp s is no longer used, and some Swiss refuse to
believe me when I say it is still used in Germany and Austria.)
But diacritics and umlauts also cause problems that no diddling with
collation sequence can solve.  Lists of words in French and German
have, effectively, two sort keys:  they are sorted first on the
base characters without regard to the diacritics, and secondarily
on the diacritics.  (N.B. I am describing the practice taught me
in class, and the practice I observe in dictionaries.  Perhaps one
of the European list members can say how diacritics are typically
handled in data-processing sorts.)  The concordance packages built
for literary and linguistic study, therefore (e.g. WatCon and the
Oxford Concordance Package) have special sort facilities to prepare
sort keys for the sorting.  But N.B. Howard Gilbert is quite right
that sort sequence depends both on language and on country.  Umlauts
follow 'z' in Swedish and Modern Icelandic -- so books on Old
Norse printed there sort them after 'z'.  Books on Old Norse printed
in England and North America tend to sort them either that way or
in the German fashion.  Same goes for edh and thorn.
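The two-key practice described above can be sketched as a tuple-valued sort key.  The primary key is the word with marks stripped and sharp s expanded to "ss".  The secondary key records which marks sit on which base letter, and the original word breaks any remaining tie.  This is a minimal Python illustration using Unicode decomposition, not the method used by WatCon or the Oxford Concordance Package.  The name two_level_key is hypothetical, and the sketch compares marks left to right, which not every national tradition does.

```python
import unicodedata


def two_level_key(word: str) -> tuple:
    """Primary: base letters; secondary: marks per letter; tertiary: the word."""
    bases, marks = [], []
    for ch in unicodedata.normalize("NFD", word.replace("ß", "ss")):
        if unicodedata.combining(ch) and marks:
            marks[-1] += ch  # attach the mark to the preceding base letter
        else:
            bases.append(ch.lower())
            marks.append("")
    # An empty mark string sorts before any mark, so plain letters come first.
    return ("".join(bases), tuple(marks), word)


words = ["Muller", "Müll", "Mull", "Maße", "Masse"]
print(sorted(words, key=two_level_key))
```

With these inputs the sketch orders the list as Masse, Maße, Mull, Müll, Muller.  "Maße" and "Masse" tie on the first two keys and are separated only by the last resort of comparing the original strings.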

Michael Sperberg-McQueen, University of Illinois at Chicago
12-Apr-88 13:02:15-EST,4248;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Tue 12 Apr 88 13:02:11-EST
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Tue, 12 Apr 88 13:00:32 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 0610; Tue, 12 Apr 88 13:00:30 EDT
Received: by BITNIC (Mailer X1.25) id 0349; Tue, 12 Apr 88 12:58:48 EDT
Date:         Tue, 12 Apr 88 09:45:56 EST
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
From:         John Kesich <KESICH%NYUCIMSA.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      Re: Two more cents worth...
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>
In-Reply-To:  Message of Mon, 11 Apr 88 22:27:00 PDT from <LDW@USCMVSA>

I have no doubt you will catch a lot of flames from people who don't
understand the issues; I, for one, think you are on the right track.

As I see it the whole EBCDIC/ASCII/ISO8859 issue is really 4 closely
related problems:

    1) providing the means to input/output characters which are meaningful
       to the underlying host but are not available on the terminal,
       printer, etc. being used

    2) inter-site communication (ISO8859 should be adopted as the standard
       code for all such communication)

    3) ASCII-ISO8859 migration.  While the people on this list may not be
       too concerned about this particular problem, some of us do have to
       'push from the other side' as well.  As yet, I don't know of any
       group which is pushing for UNIX support of ISO8859, for example.

    4) EBCDIC-ISO8859 migration - I'll discuss this at some length.

Suppose for a moment that EBCDIC code pages for each of the ISO8859 family
of codes were adopted.  The end result would be that IBM machines would be
using the ISO8859 characters in a different code order, and hence with a
different collating sequence.  We would also be saddled with the needless
waste of 'translating' characters between the two indefinitely.

I realize that there are MANY applications which currently use EBCDIC, and
I am not proposing to simply scrap them.  What I am proposing is that IBM
provide a means for users to migrate at their own pace from EBCDIC to
ISO8859.

There will no doubt be those who ask: why should IBM switch to ISO8859?  Why
doesn't everyone else switch to EBCDIC?  The answer is threefold.  First,
ISO8859 is a better code: it was designed for computers, not inherited from
TAB equipment.  Second, ISO8859 is an internationally accepted standard; it
may not be perfect, but everyone has agreed to use it.  Third, IBM itself is
straddling the EBCDIC-ASCII divide (PC, PS/2, AIX).  So, if we want a
standard, ISO8859 is the one to pick.

Probably few people are aware that when the 360 was first introduced, bit 12
of the PSW determined whether the machine was using ASCII or EBCDIC.  I
remember reading an interview with one of the developers a while ago in which
he stated that this feature was dropped because 'no-one wanted it'.  What a
pity: if the business DP centers of 20 years ago had had a bit more foresight,
this whole mess might have been avoided.
(Actually, I think the reason users didn't want IBM's ASCII support had to do
with the way IBM defined ASCII - I came across an old copy of Principles of
Operation which describes it.  The characters themselves were pretty much as
they are now, but the bits were laid out strangely:  76X54321 where X was 0
if the character was < @ and 1 if > ?.)
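Read literally, the 76X54321 layout has three parts.  Bits 7 and 6 of a 7-bit ASCII code stay in the top two positions.  A new bit X, which is 0 below "@" (X'40') and 1 from "@" upward, goes next.  Bits 5 through 1 stay at the bottom.  Because every 7-bit code from X'40' up has bit 7 set, X is simply a copy of bit 7.  The Python sketch below is derived only from that one-line description, not from the Principles of Operation itself, and the function name is hypothetical.

```python
def to_76x54321(code: int) -> int:
    """Re-lay a 7-bit ASCII code as 76X54321 (sketch of the described layout)."""
    if not 0 <= code <= 0x7F:
        raise ValueError("expected a 7-bit ASCII code")
    b7 = (code >> 6) & 1  # top bit of the 7-bit code
    b6 = (code >> 5) & 1
    low5 = code & 0x1F    # bits 5..1 stay where they are
    x = b7                # 0 for codes below '@' (X'40'), 1 from '@' up
    return (b7 << 7) | (b6 << 6) | (x << 5) | low5


for ch in " 09?@AZ":
    print(repr(ch), f"X'{ord(ch):02X}' -> X'{to_76x54321(ord(ch)):02X}'")
```

Under this reading the digits come out as X'50' through X'59' and the letter A as X'A1'.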
IBM should introduce an EBCDIC/ISO8859 option.  Eventually EBCDIC would then
go the way of card readers and 7-track tapes, and future programmers would
be able to marvel at how quaint this whole situation was.  Even if it took
20 years, eventually we would be rid of the problem.

If IBM does not migrate to ISO8859, how long will it be before there is a
mailing list to discuss the problems of having 2 collating sequences?

If IBM is going to migrate to ISO8859 then now is the time to start planning
for it.

A final question: is there any real benefit to having 2 distinct code
families which I have overlooked?
12-Apr-88 13:28:02-EST,1883;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Tue 12 Apr 88 13:27:55-EST
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Tue, 12 Apr 88 13:26:10 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 0654; Tue, 12 Apr 88 13:26:09 EDT
Received: by BITNIC (Mailer X1.25) id 0461; Tue, 12 Apr 88 13:06:41 EDT
Date:         Tue, 12 Apr 88 13:00:07 EDT
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
From:         "John F. Chandler" <PEPMNT%CFAAMP.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      Re: Single or Multiple Tables for Multiple Character Sets?
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>
In-Reply-To:  U18189%UICVM.BITNET@CUVMA.COLUMBIA.EDU message of Mon,
              11 Apr 88 10:56:34 CDT

I don't subscribe to this discussion list, but I was sent a copy of
this one posting.  My perspective is two-fold: file transfer and
connection of ASCII terminals to IBM mainframes.  In a way, the 2nd
is just a special case of the first -- there is a tremendous corpus of
files that have been typed in over the years.  I will restrain my
skepticism for the moment and assume that a single standard can be
(A) agreed upon in the present forum and (B) acted upon elsewhere.
That leads to my main point: having gone through this whole argument
in the context of CMS Kermit, I have come to the conclusion that, once
a site settles on a single translation scheme, that scheme should be
built into any and all file transfer mechanisms used there.  Kermit,
for example, offers a tailorable A-to-E table (on the mainframe side),
which can embody any mapping you care to define.
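A site-tailorable translation amounts to a pair of 256-byte tables: one from ASCII (here, ISO 8859-1) to EBCDIC, and its inverse.  The Python sketch below seeds the A-to-E table from Python's built-in cp037 codec.  It then applies a site override by swapping entries so the table stays one-to-one, and it refuses to build the E-to-A table if any two codes collide.  It is not Kermit's actual table format or command syntax.  build_a_to_e, override and invert are hypothetical names, and the example override of "[" to X'AD' is purely illustrative.

```python
def build_a_to_e() -> bytearray:
    """Seed table: Python's cp037 mapping for each of the 256 Latin-1 codes."""
    return bytearray(chr(i).encode("cp037")[0] for i in range(256))


def override(table: bytearray, ascii_code: int, ebcdic_code: int) -> None:
    """Point ascii_code at ebcdic_code, swapping so no EBCDIC code is used twice."""
    other = table.index(ebcdic_code)  # ASCII code that currently owns that slot
    table[ascii_code], table[other] = ebcdic_code, table[ascii_code]


def invert(table) -> bytes:
    """Build the E-to-A table; only possible if the A-to-E table is one-to-one."""
    if len(table) != 256 or len(set(table)) != 256:
        raise ValueError("A-to-E table is not reversible")
    inverse = bytearray(256)
    for a, e in enumerate(table):
        inverse[e] = a
    return bytes(inverse)


a_to_e = build_a_to_e()
override(a_to_e, ord("["), 0xAD)  # hypothetical site preference
e_to_a = invert(a_to_e)

data = b"x[1] = y"
ebcdic = data.translate(bytes(a_to_e))
assert ebcdic.translate(e_to_a) == data  # round trip is lossless
```

Swapping rather than simply overwriting is the design choice that matters.  A plain overwrite would leave two ASCII codes pointing at X'AD' and make the file transfer irreversible.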
12-Apr-88 14:15:11-EST,2194;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Tue 12 Apr 88 14:15:06-EST
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Tue, 12 Apr 88 14:13:28 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 0808; Tue, 12 Apr 88 14:13:26 EDT
Received: by BITNIC (Mailer X1.25) id 1187; Tue, 12 Apr 88 14:12:27 EDT
Date:         Tue, 12 Apr 88 13:39:46 EDT
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
From:         "Bryan, Jerry" <VM0A61%WVNVM.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      Two more cents worth...
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>
In-Reply-To:  Message of 04/12/88 at 09:45:56 from KESICH@NYUCIMSA


>As I see it the whole EBCDIC/ASCII/ISO8859 issue is really 4 closely
>related problems:

.... text deleted....

>    2) inter-site communication (ISO8859 should be adopted as the standard
>       code for all such communication)

This is a most interesting suggestion.  It would mean, for example (if I
interpret it correctly), that two IBM EBCDIC machines communicating
with each other would use ISO8859 rather than EBCDIC over the communications
path.  This sort of suggestion is philosophically in line with standards
emerging in other areas where there is an interchange standard for
graphics, for word-processing style text, for CAD/CAM drawings, etc.,
where the interchange standard does not dictate how the data is
stored in the computer, so long as the machine can convert from its
internal representation to the interchange standard and back.

If this idea were carried far enough, it could possibly become the
basis for a long term (30 year?) conversion plan to ISO8859 for
everything.  For example, one could view reading or writing to a tape
or a disk as an "interchange", so one could read or write an ISO8859
tape or disk into an EBCDIC machine or vice versa with smart controllers
performing an "interchange" rather than an I/O.
12-Apr-88 17:25:50-EST,1620;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Tue 12 Apr 88 17:25:45-EST
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Tue, 12 Apr 88 17:23:54 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 1089; Tue, 12 Apr 88 17:23:50 EDT
Received: by BITNIC (Mailer X1.25) id 3729; Tue, 12 Apr 88 17:23:03 EDT
Date:         Tue, 12 Apr 88 16:07:05 EST
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
From:         John Kesich <KESICH%NYUCIMSA.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      Re: Two more cents worth...
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>
In-Reply-To:  Message of Tue, 12 Apr 88 13:39:46 EDT from <VM0A61@WVNVM>

>>    2) inter-site communication (ISO8859 should be adopted as the standard
>>       code for all such communication)

> This is a most interesting suggestion.  It would mean, for example, and
> if I interpret it correctly, that two IBM EBCDIC machines communicating
> with each other would use ISO8859 rather than EBCDIC over the communications
> path.

In theory, yes they would use ISO8859.  In practice they could continue to
use EBCDIC between them so long as all data being passed through the link
between 3rd parties was correctly mapped from and then back to ISO8859.
But backbone network nodes could avoid all the translation overhead by
working strictly in ISO8859.
13-Apr-88 17:56:19-EST,1818;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Wed 13 Apr 88 17:56:18-EST
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Wed, 13 Apr 88 17:54:42 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 2584; Wed, 13 Apr 88 17:54:39 EDT
Received: by BITNIC (Mailer X1.25) id 7127; Wed, 13 Apr 88 17:53:41 EDT
Date:         Wed, 13 Apr 88 17:23:43 EDT
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
From:         "User Services, DCS Paul Henderson" <HENDERS%WATDCS.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      Re: Two more cents worth...
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>
In-Reply-To:  MAIL of Wed, 13 Apr 88 10:34:49 +0300

>To set the record straight, bit 12 of the PSW controlled the generation of the
>sign in the results of decimal arithmetic computations.  The bit was labelled
>ANSI (rather than ASCII) because the "standard" plus sign was X'A' and the
>minus sign X'B' rather than X'C' and X'D' respectively, which were IBM's

At the risk of sounding like a nit-picker -- I just happen to have a
Principles of Operation, Form A22-6821-7, dated September 1968.
On page 71 it discusses the PSW:
   ASCII(A): When bit 12 of the PSW is one, the codes preferred
   for the  USASCII-8 code  are generated for  decimal results.
   When the PSW  is zero, the codes preferred  for the extended
   binary-coded-decimal interchange code are generated.
Perhaps the meaning of the bit was changed to ANSI before it was dropped
but for some of us, it really was the ASCII bit.  We never used it either.
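The sign behavior quoted above can be illustrated with a small Python sketch of packed-decimal encoding.  Digits go two to a byte and the final nibble is the sign: X'C'/X'D' for plus/minus in EBCDIC mode, and X'A'/X'B' in the ASCII (or "ANSI") mode.  It only models which preferred sign codes are generated; it does not emulate any S/360 instruction, and pack_decimal is a hypothetical name.

```python
def pack_decimal(n: int, ascii_mode: bool = False) -> bytes:
    """Pack n as packed decimal, choosing the preferred sign codes by mode."""
    plus, minus = (0xA, 0xB) if ascii_mode else (0xC, 0xD)
    digits = str(abs(n))
    if len(digits) % 2 == 0:
        digits = "0" + digits  # digits plus the sign nibble must fill whole bytes
    nibbles = [int(d) for d in digits] + [minus if n < 0 else plus]
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))


print(pack_decimal(-123).hex().upper())                   # EBCDIC-mode sign
print(pack_decimal(-123, ascii_mode=True).hex().upper())  # ASCII-mode sign
```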
14-Apr-88 09:33:27-EST,5133;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Thu 14 Apr 88 09:33:24-EST
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Thu, 14 Apr 88 09:31:51 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 3213; Thu, 14 Apr 88 09:31:49 EDT
Received: by BITNIC (Mailer X1.25) id 3982; Thu, 14 Apr 88 09:30:24 EDT
Date:         Thu, 14 Apr 88 14:55:00 MET
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
From:         Johan van Wingen <MOSGLA%HLERUL2.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      national versions
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>


Dear list subscribers

Let us stop discussing red herrings like "Throw out EBCDIC". If you
have ever attended a SHARE or SEAS meeting, you ought to know what is
needed for putting a requirement to IBM.
Let us also stop using imprecise terminology, like ISO8859 without
further qualification, or ASCII meaning an 8-bit code. Because of this,
Mr. Kesich's letter is incomprehensible to me. There is no single
ISO8859, and ASCII is not identical with ISO646, although both are 7-bit codes.
8-bit ASCII does not exist under this name.

The real problem for both EBCDIC and ISO is that of the unique graphic-
code correspondence. At the risk of boring those who already know, a
short tutorial is in order. First, "ASCII".

ISO646 (1st ed. 1973, 2nd ed. 1983) specifies 7-bit codes for
characters. Of the 128 possible codes 33 are for control characters, one
for SPACE, and 94 for graphic characters. Of this last group 82 are
unique, 12 are left open. To complete the set, a "national
version" of ISO646 must be defined. An International Reference Version
(IRV) is provided where no national version is preferred. ASCII is simply
the US National Version of ISO646. It differs from the IRV only in having a $
instead of the currency sign. But German, Swedish, Danish/Norwegian and
others substitute accented letters at most of the 12 places. What does
this mean? If you send square brackets to Norway, they arrive as AE and
A-ring (braces as ae and a-ring). This practice puts a barrier between
the English and the non-English speaking world, caused by the number of
characters being limited to 94.
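A small Python sketch shows what the national versions do to a byte stream.  The 12 positions below are the ones ISO 646 leaves open: 2/3, 2/4, 4/0, 5/11 to 5/14, 6/0 and 7/11 to 7/14.  NORWEGIAN lists only six of the Danish/Norwegian assignments, the ones needed to reproduce the brackets-and-braces example above.  decode_646 is a hypothetical helper, not a library function.

```python
# The 12 code positions that ISO 646 leaves open for national use.
NATIONAL_POSITIONS = [0x23, 0x24, 0x40, 0x5B, 0x5C, 0x5D, 0x5E,
                      0x60, 0x7B, 0x7C, 0x7D, 0x7E]

# Six Danish/Norwegian assignments (the rest of the variant is omitted).
NORWEGIAN = {0x5B: "Æ", 0x5C: "Ø", 0x5D: "Å",
             0x7B: "æ", 0x7C: "ø", 0x7D: "å"}

GRAPHICS = range(0x21, 0x7F)  # the 94 graphic positions (SPACE excluded)


def decode_646(data: bytes, variant: dict) -> str:
    """Decode 7-bit bytes, letting the variant override the open positions."""
    out = []
    for b in data:
        if b > 0x7F:
            raise ValueError("ISO 646 is a 7-bit code")
        out.append(variant.get(b, chr(b)))
    return "".join(out)


print(len(GRAPHICS) - len(NATIONAL_POSITIONS))  # unique graphics: 94 - 12
print(decode_646(b"a[i] = {x}", NORWEGIAN))     # brackets become letters
```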

An 8-bit code could be a solution, by adding 96 codes. But even then,
not every character can be accommodated in a unique way. ISO 8859-1
provides 190 unique characters used in Western Europe. This shifts the
barrier to about the Iron Curtain, leaving Greece and Turkey at the
wrong side. Now, if you send a Turkish text from Ankara to Washington,
it arrives with Icelandic eth's and thorns, inserted into the Turkish
words. If you are not too concerned about excluding a NATO member from
the Western civilisation, ISO8859-1 is certainly an improvement.
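The Ankara-to-Washington effect can be reproduced with Python's standard codecs.  The sketch encodes Turkish text in ISO 8859-9 (Latin-5, the Turkish member of the family), which puts Ğ, İ, Ş and ğ, ı, ş where Latin-1 has Ð, Ý, Þ and ð, ý, þ.  It then decodes the bytes as ISO 8859-1.  The function name is hypothetical; the codecs are standard library ones.

```python
def misread_as_latin1(text: str) -> str:
    """Encode text as ISO 8859-9 (Turkish), then decode the bytes as ISO 8859-1."""
    return text.encode("iso8859_9").decode("latin-1")


# Byte values the two codes interpret differently.
diff = [b for b in range(256)
        if bytes([b]).decode("iso8859_9") != bytes([b]).decode("latin-1")]
print([f"X'{b:02X}'" for b in diff])

print(misread_as_latin1("şişe"))  # Turkish s-cedilla arrives as Icelandic thorn
```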

There is a next step, but at a price. If we take two bytes for every
character, we can accommodate much more (up to 65,536 codes), even Chinese. Only a few
mandarins who happen to know 80000 Chinese characters would be
disappointed. The design of this is the subject of SC2/WG2, meeting this
week in Boston. Thus, at some time, there may be an ISO standard for it.
I am interested to hear opinions on this idea.

As for EBCDIC the situation at present is comparable, only there are 14
positions available for national versions. You find the story in "3270
system - Display and Printer I/O Interface Codes", Figure 10-43. Still
worse is the collection of horrors in "IBM Displaywriter Host Attach
Programming Guide". p. 5-3 to 5-33. It also includes the EBCDIC/7-bit
code correspondence. There is also mentioned a distinction between
EBCDIC/Multilingual, EBCDIC/DP and EBCDIC/WP.
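How few positions separate two EBCDIC code pages can be checked mechanically.  The Python sketch below compares the standard library's cp037 and cp500 codecs (the code pages numbered 37 and 500) byte by byte.  Those codec tables follow later published Unicode mapping files, so treating them as a stand-in for the IBM documents cited above is an assumption.  The helper name is hypothetical.

```python
def differing_positions(cp_a: str, cp_b: str) -> list:
    """Byte values that two single-byte codecs decode to different characters."""
    all_bytes = bytes(range(256))
    a, b = all_bytes.decode(cp_a), all_bytes.decode(cp_b)
    return [i for i in range(256) if a[i] != b[i]]


for pos in differing_positions("cp037", "cp500"):
    one = bytes([pos])
    print(f"X'{pos:02X}': cp037 {one.decode('cp037')!r}  cp500 {one.decode('cp500')!r}")
```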

So much for the tutorial. If we want the order of things changed, we should
know what they are. But there is an important difference when attempting
changes. The ISO standards are produced by international Working Groups,
and approved by Subcommittees and Technical Committees, National Member
Bodies voting. But in many countries it is not difficult to get into
their panels, provided that you know your stuff, and are prepared to do
a lot of work, and attend the meetings.

With EBCDIC, changes are an IBM management decision that can be influenced
only to a certain extent by SHARE, SEAS or other groups' requests, and
often at a stage that is too late. Even the defining document for EBCDIC
is hard to obtain (it exists: IBM Corporate Standard CSS 3-3220 002; my
copy dates from 1970, and it has certainly been modified since then).

So, we should not only talk about what to agree, but also about the way
to achieve it.

I hope that my contribution to our list has been constructive. Let us
shed our tears elsewhere.
Yours faithfully, Johan van Wingen

 FROM  J. W. van Wingen    MOSGLA@HLERUL2   :
     Mail to                                :
 P. O. Box 486,  2300AL Leiden, Netherlands :

14-Apr-88 12:17:59-EST,4482;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Thu 14 Apr 88 12:17:53-EST
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Thu, 14 Apr 88 12:16:17 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 3540; Thu, 14 Apr 88 12:16:13 EDT
Received: by BITNIC (Mailer X1.25) id 6936; Thu, 14 Apr 88 12:12:39 EDT
Date:         Thu, 14 Apr 88 09:27:22 CDT
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
From:         Michael Sperberg-McQueen <U18189%UICVM.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      Query about implementation and use of ISO standards
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>

In discussions of character sets, I occasionally encounter people (in
person or by their writings) who wonder what all the fuss is about,
since after all ISO 2022 defines a perfectly adequate method of
switching from one seven-bit character set to another (SO) and
back (SI) -- and moreover also defines methods for specifying, as
part of the data stream, which character sets are being used as G0
and G1 sets.  (By means of escape sequences assigned by ECMA, acting
for ISO as a registration authority, to all registered sets.)

The obvious reason for the fuss, of course, is that ISO 2022 defines
a method, but does not provide the hardware or software to implement
the method.  And I wonder -- do the hardware and software exist?
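To make the mechanism concrete, here is a minimal Python sketch of a receiver that honors SO/SI locking shifts in a 7-bit stream.  Its assumptions are narrow.  G0 is ASCII.  The only designation understood is ESC - A, which designates the right-hand part of ISO 8859-1 as a 96-character G1 set.  Only positions 2/1 to 7/14 are shifted, while SPACE, DEL and controls pass through.  decode_2022 is a hypothetical name, and the sketch is nowhere near a full ISO 2022 implementation.

```python
SO, SI, ESC = 0x0E, 0x0F, 0x1B


def decode_2022(data: bytes) -> str:
    """Decode a 7-bit stream with SO/SI shifts (narrow sketch, see assumptions)."""
    out = []
    g1_latin1 = False  # set once ESC - A designates the Latin-1 right half as G1
    shifted = False    # False: G0 in use (after SI); True: G1 in use (after SO)
    i = 0
    while i < len(data):
        b = data[i]
        if b == ESC:
            if data[i + 1:i + 3] != b"-A":
                raise ValueError(f"unsupported escape sequence at offset {i}")
            g1_latin1 = True
            i += 3
            continue
        if b == SO:
            if not g1_latin1:
                raise ValueError("SO received before any G1 set was designated")
            shifted = True
        elif b == SI:
            shifted = False
        elif shifted and 0x21 <= b <= 0x7E:
            out.append(chr(b + 0x80))  # same column/row in the Latin-1 right half
        else:
            out.append(chr(b))
        i += 1
    return "".join(out)


print(decode_2022(b"caf\x1b-A\x0ei\x0f au lait"))
```

In the usage line, the escape sequence names the G1 set, SO switches to it for a single byte (0x69, which lands on é), and SI switches back to ASCII.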

So my questions are these:

1 how common (in the experience of this group, either in North America
or elsewhere) are terminals or terminal emulation programs which accept
and handle SI/SO character set switching?  I can think of:

    - Datamedia APL terminals (I have read that most ASCII APL
         terminals do SI/SO, but I have never encountered any but DM)
    - IBM 3163, 3164 terminals, with or without the ALA cartridges
    - Yterm (beginning with version 1.3)

and that's it for me.  Are there others?

2 how many of these terminals can handle G1 graphics other than the
built in set?  In my limited experience, only two:  the IBM 3163/4
and PCs -- if they have EGA or Hercules Graphics Plus or Quadvue cards.
(Or any PS/2 with a VGA.)

3 how many devices of any type can choose the character set they use
on the basis of the registered escape sequence for a set?  I don't
know of any at all.  Is that supposed to be what happens with ISO
2022, or is it intended that software somewhere along the way will
see the registered escape sequences and translate them into control
sequences that will set the terminals or printers correctly?  If
the recognition of registered escape sequences is supposed to happen
in software, then has anyone ever written, used, seen, or heard of
software that does this?

4 (forgive my ignorance, I have used mostly IBM mainframes) Do the
file structures and utilities of ASCII operating systems and editors
always / usually / sometimes / ever allow escape sequences like those
prescribed by ISO 2022 to be embedded in files?  Or will the
communications link see the escape sequence when it comes in from
a terminal, try unsuccessfully to parse it, and discard it?
For that matter, can ASCII systems embed the SI and SO in the file?
(Or IBM systems?  Yes, I know about SET HEX ON and ALTER in Xedit,
but are there simpler ways?)

In sum -- my own experience is that SI and SO are useful and (now)
possible, between a mainframe host and a terminal where both know in
advance what character sets are to be used.  I have now seen this
convention actually used in terminal-to-host communication, this year
for the first time (long after first reading about it).

But -- while it seems equally useful to be able to identify character
sets by the use of escape sequences embedded in the data stream, I
have still (fifteen years or more after ISO 2022) never seen it in use
or heard of its ever being used.  Is it used?  Or is it a nice idea
that no one has implemented, as ISO 6937/2 appears to be?

Since there seems no point in burdening the list with replies saying
"Nope, I haven't ever seen it either," replies can be sent to me,
U18189 at UICVM, and I will post results, if any, to the list.

Michael Sperberg-McQueen, University of Illinois at Chicago
14-Apr-88 12:46:25-EST,4870;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Thu 14 Apr 88 12:46:19-EST
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Thu, 14 Apr 88 12:44:42 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 3584; Thu, 14 Apr 88 12:44:40 EDT
Received: by BITNIC (Mailer X1.25) id 7306; Thu, 14 Apr 88 12:43:19 EDT
Date:         Thu, 14 Apr 88 11:24:08 EST
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
From:         John C Klensin <KLENSIN@INFOODS.MIT.EDU>
Subject:      RE:       national versions
X-To:         ASCII/EBCDIC character set related issues
              <ISO8859%JHUVM.BITNET@MITVMA.MIT.EDU>
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>

I agree with Johan's comment, but want to add one observation and
reinforce another.

1) Changing EBCDIC:  By and large, the way IBM makes decisions like this
is dictated by "marketing considerations" much more than by, e.g.,
requests from the likes of SHARE or SEAS.  Since the COBOL debacle and
the environment that spawned it ("we have these three zillion lines of
code that we would have to change, and that would cost us umpity-ump
kilos of gold and pounds of flesh..."), "marketing considerations" has
often been an abbreviation for "we are just not going to make
incompatible changes if they are going to disrupt our installed customer
base".  That is, I want to stress, a reasonable position, but it makes
"drop EBCDIC internally" about as realistic as (maybe less so than) "it
would be really nice if the 370 supported a hardware stack
architecture".
  The whatever-you-like-internally, Standard character sets in
interchange, approach is actually realistic, but this is not the right
place to debate it.  Things are moving in that direction anyway, but, if
you are going to have that plan, then you are going to need to convert
the graphics.  Somehow.  Which is where this discussion should probably
focus.

That said, the problem is a little worse than Johan's description.
First of all, the national member body representatives of some of the
countries with non-alphabetic languages stood up at an ISO/IEC JTC1/SC22
(programming languages) meeting last fall and indicated, among other
things, that "multiple byte" might well need to be more than two.
Second, they want the multiple byte sequences *embedded* in single-byte
sequences and vice versa, and got an SC22 vote imposing support for
exactly that requirement on any future programming language
standardization.  They appear to feel that translation from a data
stream that has both sets of characters and escapes into an internal
representation that uses a single (adequately long) length is
unacceptable, or only marginally acceptable - at least in part because
of the space requirements.   Consider the programming language
implications of the usual "how long is that string in 'characters'" and
"are these two strings equal" operations.
    Please do not start a discussion on this topic here, just think
about it as part of the background to any "solutions" you propose.
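The string-length and string-equality questions can be seen concretely with Python's ISO-2022-JP codec.  It is a later, widely used instance of two-byte sequences embedded, via escape sequences, in a single-byte stream.  The two byte strings below are hand-built for illustration.  Both decode to the same four characters, yet they differ in length and do not compare equal as bytes.

```python
# "a", two Hiragana "a" characters (JIS X 0208 code 0x2422), then "b".
# ESC $ B switches to two-byte JIS X 0208; ESC ( B switches back to ASCII.
compact = b'a\x1b$B$"$"\x1b(Bb'
redundant = b'a\x1b$B$"\x1b(B\x1b$B$"\x1b(Bb'  # switches out and back needlessly

text_compact = compact.decode("iso2022_jp")
text_redundant = redundant.decode("iso2022_jp")

print(len(compact), len(redundant), len(text_compact))  # bytes vs characters
print(compact == redundant, text_compact == text_redundant)
```

The byte counts (12 and 18) depend on where the escapes were placed, while the character count (4) does not.  A language that answers "how long" or "are these equal" on the raw bytes will therefore give different answers from one that first converts to a fixed-length internal form.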

Now, the other thing that has slipped past in the flurry of messages is
that there are ISO standards finished or under development that permit
switching character sets midstream.  I can, in principle, send out a
stream of characters in ISO8859/1 (Latin alphabet 1) and insert a
control sequence somewhere that says "here comes ISO8859/8", and then
send some characters which I want interpreted according to the latter
set of graphic mappings.  Then I can switch back, or switch to a third
registration set.  Now, one can perfectly well design a system that
responds to those "switch sets" controls with "I can't deal with that
nonsense", or one can be prepared to handle all of them.  But, if you
take the first position, you are better off than you were with national
variants of ISO646 only in that you *know* that you can't interpret the
characters correctly, rather than thinking that they represent your own
variant.
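The "switch sets" idea can be sketched in Python. This is a simplified reading, not an implementation of any one ISO standard, and it rests on several assumptions:

- The stream is 8-bit.
- Bytes below X'80' are ASCII.
- Bytes X'A0'-X'FF' are read through whichever 96-character set was last designated to G1 with the escape sequence ESC - F.

Only the final bytes A (the ISO 8859-1 right half) and H (the ISO 8859-8 right half) are recognized. The strict flag models the "I can't deal with that nonsense" response. With strict=False, the decoder at least knows it cannot interpret what follows.

```python
ESC = 0x1B
# Final bytes F of "ESC - F" (designate a 96-character set to G1) that this
# sketch understands, mapped to Python codec names.
G1_SETS = {ord("A"): "latin_1",      # ISO 8859-1 right-hand part
           ord("H"): "iso8859_8"}    # ISO 8859-8 right-hand part


def decode_switching(data: bytes, strict: bool = True) -> str:
    g1 = "latin_1"                   # assumed initial G1 set
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b == ESC and data[i + 1:i + 2] == b"-" and i + 2 < len(data):
            final = data[i + 2]
            if final in G1_SETS:
                g1 = G1_SETS[final]
            elif strict:
                # "I can't deal with that nonsense"
                raise ValueError(f"unsupported G1 set: ESC - {chr(final)}")
            else:
                g1 = None            # we *know* we cannot interpret what follows
            i += 3
        elif b < 0x80:
            out.append(chr(b))       # ASCII (G0) half
            i += 1
        elif b >= 0xA0 and g1 is not None:
            out.append(bytes([b]).decode(g1, errors="replace"))
            i += 1
        else:
            out.append("\ufffd")     # C1 controls, or text in an unknown set
            i += 1
    return "".join(out)


print(decode_switching(b"caf\xe9 \x1b-H\xe0"))   # Latin-1 e-acute, then Hebrew alef
```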
  But, if you decide to cope with a character set switch, then you need
to worry about the EBCDIC code pages, or other variations, to deal with
the entire range, and how you are going to switch between them (so much
for "one network, one conversion standard" or even "one host, one
conversion standard", at least for simple versions of "conversion
standard").
  Again, this is not an attempt to send people off in another irrelevant
direction -- I would strongly discourage that -- but let's also skip the
simplistic "solutions".  They either won't work now, or won't work for
very long.

15-Apr-88 14:34:57-EST,3296;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Fri 15 Apr 88 14:34:54-EST
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Fri, 15 Apr 88 14:33:07 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 0759; Fri, 15 Apr 88 14:33:03 EDT
Received: by BITNIC (Mailer X1.25) id 8865; Fri, 15 Apr 88 14:32:21 EDT
Date:         Fri, 15 Apr 88 08:54:55 EST
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
From:         John C Klensin <KLENSIN@INFOODS.MIT.EDU>
Subject:      RE:       Re: national versions
X-To:         ASCII/EBCDIC character set related issues
              <ISO8859%JHUVM.BITNET@MITVMA.MIT.EDU>
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>

>       <MSTCOM%NEUVM1.BITNET@MITVMA.MIT.EDU>
>In-Reply-To:   Message of Thu, 14 Apr 88 14:55:00 MET from <MOSGLA@HLERUL2>
>
>     I think you are right about the two-byte international char. set.
>The world isn't as big as it used to be, because of the xxxNETs. So we
>need to have ONE code, having ALL chars of the world in it. Problems
>will be solvable by mapping existing producer-dependent charsets into
>this code when transferring files. Ensuring correct printout depends on
>printers/printer software.

Sigh.  Let's assume that you can make a list of "ALL chars of the
world".  Let's assume that you can get a list of the "important"
characters in the non-alphabetic languages (Chinese and Japanese Kanji
are not the only ones, just the ones you hear about most often) and get
the people who use those languages to agree to never want to add another
character (which would require an extensible set, which works against
"ONE code").

Those assumptions are pretty unlikely to be true, but, just assume.

Then your only problem is that, given the nature of the standardization
process, it is likely to be well into the next century before that
character set can be agreed upon.  There are a number of character sets
for which, as far as I know, there aren't even coding proposals in the
international arena (Sanskrit and Thai come to mind -- if those proposals
exist, they haven't crossed my desk when I was looking).  And, if you
want "ALL characters of the world", you need to worry about some
languages that are no longer in common conversational use, since
scholars in the relevant fields want to communicate with each other -
anyone for an ISO-standard Phoenician character set?  Etruscan, perhaps?

I think one's choice is to learn to deal with an extensible system, and
hence multiple character sets, today (or soon), or to theorize and
harmonize for a *very* long time.  I think I prefer the former.

Also note that CCITT IA5, otherwise known as ISO646 Basic Version, was
an attempt at a "universal" character set, and works fairly well in
restricted applications, as does the even-more-restricted Telex
character set.  But it does not do very well for non-Latin alphabets or
lots of characters in highly-populated Latin-derived ones, which is what
this discussion is all about.


15-Apr-88 14:43:52-EST,3433;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Fri 15 Apr 88 14:43:41-EST
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Fri, 15 Apr 88 14:41:31 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 0776; Fri, 15 Apr 88 14:41:29 EDT
Received: by BITNIC (Mailer X1.25) id 8991; Fri, 15 Apr 88 14:40:27 EDT
Date:         Fri, 15 Apr 88 10:25:00 CDT
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
From:         Rick Troth <TROTH%TAMCBA.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      Throw out EBCDIC?
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>

        Thank you, John Klensin, for the "right on the money" comments.
Indeed IBM is obliged to keep a large customer base happy. Banks (for example)
don't really care about what's new or what's happening in China, they just
want what works. "What we have now is fine ... why change?" IBM will always
be slow to change because most of their customers are slow to change.
We should make the change as smooth as possible.

        I am becoming an IBM bigot myself, but there was a time when I hated
EBCDIC for incompatibility. Then came 7171's, VAXen on BITNET, and ...
whoooa ... we're actually making progress here.  Amdahl seems to recognize
the value of ASCII (ISO) and the whole idea of a more general I/O scheme.
Their version of UNIX, UTS, is quite an excellent implementation. I was quite
astonished to discover that it is an ASCII system. But behold: one need not
give up 3270's, RSCS (even NETDATA), or VM. JNET (for the VAX) and UTS are
both making inroads for ASCII (and then ISO) into the IBM mainframe world.
(Actually JNET is making an inroad for EBCDIC into the VAX world   :-)

        Since I brought up DEC at this time, I will post my "report card"
on the VT220. Having gone over the white paper from Ed Hart, I compared
the listing of ISO8859/1 to the "DEC Multinational" character set.
Multinational diverged from 8859/1 in 15 places: five of those were collisions
where DEC had defined something different from ISO, and ten were left blank
in the DEC definition. John mentioned switching character sets mid-stream.
The VT200's can do that. They can also handle SI/SO if you modify APL support
on your 7171. There are almost a dozen different "NRC sets" in the box.
I am not as enamored of DEC as I was once, but we have a lot of VAXen on
campus and thus have a lot of VT200's. I'd like one in my office.
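The kind of comparison Rick describes, sorting the differences into collisions and blanks, can be mechanized once both sets are written down as tables. Here is a minimal Python sketch. It assumes each table maps a code point to a character name. The sample entries are toys and do not reproduce either the ISO table or the DEC table.

```python
from typing import Dict, List, Optional, Tuple


def compare_tables(reference: Dict[int, str],
                   other: Dict[int, Optional[str]]) -> Tuple[List[int], List[int]]:
    """Compare `other` against `reference` position by position.

    Returns (collisions, blanks): positions where `other` assigns a different
    character, and positions it leaves unassigned (missing or None).
    """
    collisions: List[int] = []
    blanks: List[int] = []
    for code, char in sorted(reference.items()):
        theirs = other.get(code)
        if theirs is None:
            blanks.append(code)
        elif theirs != char:
            collisions.append(code)
    return collisions, blanks


# Toy entries only; not a transcription of ISO 8859-1 or DEC Multinational.
iso = {0xA0: "NO-BREAK SPACE", 0xA1: "INVERTED EXCLAMATION", 0xA4: "CURRENCY SIGN"}
dec = {0xA1: "INVERTED EXCLAMATION", 0xA4: "SOME OTHER CHARACTER"}
collisions, blanks = compare_tables(iso, dec)
print([hex(c) for c in collisions], "collide;", [hex(c) for c in blanks], "blank")
```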

        Since I mentioned UNIX (somebody on IBM-MAIN just royally flamed
UNIX), the ideas that "everything is a file" and "all I/O is performed via
device drivers" are good. MVS does this (to an extent); CMS does not, but it
could. I don't care for the idea of a two-byte character set, but I do like
the concept of mid-stream (transparent to the user) set switching.
Device-driven I/O can handle that quite well and makes the transition
(to ISO or whatever) much smoother. This is precisely how UTS as an ASCII
system can work just fine with EBCDIC 3270 tubes. I am really quite impressed;
you really should all see it. (mercy! I don't mean to advertise for Amdahl)

                                                          - Rick
15-Apr-88 16:13:01-EST,1116;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Fri 15 Apr 88 16:12:53-EST
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Fri, 15 Apr 88 16:10:47 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 0974; Fri, 15 Apr 88 16:10:46 EDT
Received: by BITNIC (Mailer X1.25) id 0919; Fri, 15 Apr 88 16:09:56 EDT
Date:         Fri, 15 Apr 88 15:53:34 EST
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
From:         John C Klensin <KLENSIN@INFOODS.MIT.EDU>
Subject:      RE:       Throw out EBCDIC?
X-To:         ASCII/EBCDIC character set related issues
              <ISO8859%JHUVM.BITNET@MITVMA.MIT.EDU>
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>

Small addendum to Rick's note:  The DEC VT300 has support for
Latin Alphabet 1.  The 200 predates it, and "DEC Multinational"
was, approximately, a best guess.

15-Apr-88 19:20:02-EST,17835;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Fri 15 Apr 88 19:19:57-EST
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Fri, 15 Apr 88 19:18:24 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 1239; Fri, 15 Apr 88 19:18:22 EDT
Received: by BITNIC (Mailer X1.25) id 3169; Fri, 15 Apr 88 19:11:48 EDT
Date:         Fri, 15 Apr 88 18:55:26 EDT
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
Comments:     Code:    CECP 500
From:         Otto Stolz +49 7531 88 2645 <RZOTTO%DKNKURZ1.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      Some clarifications
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>

Dear list subscribers,

First, let me tie together a few loose ends of the discussion that
began on 15 Mar 88 with Some Important Comments from Howard Gilbert.

I hasten to disclaim: I'm not the Network Expert of our site; rather,
my duties relate to the end-user interface (software & advice).  Hence,
the opinions and proposals presented below are my private contribution
and bear no official character.

IBM's CECPs
-----------
Following the ISO 8859-1 standard of 1987, IBM changed their Graphic
Character Set 00697 to conform with the character set of the ISO
standard.  To do so, they had to replace these 4 (four) characters:
   SC07  florin (guilder) sign
   SM10  double underline
   LI61  small letter dotless i
   SP31  numeric space
with those four characters:
   SM52  copyright sign
   SA07  multiplication sign
   ND011 one superscript
   SA06  division sign
Please, make sure that the tables you use are dated 1987 or later, and
contain the new character set.

Moreover, IBM has defined 9 (nine) Country Extended Code Pages (serving
17 languages), which contain the characters of GCS 00697 in various
permutations.  Again, you should make sure that you use the new CECPs.
Clearly, IBM had two aims in mind:
1. provide an unambiguous mapping between any pair of CECPs, and possibly
   between any CECP and ISO 8859-1;
2. avoid data conversion at their customers' sites when they switch over
   to the new CECPs.

This latter aim prevented the introduction of a single CECP: we all now
have to live with the consequences of the mistake IBM (and ISO, by the
way) made decades ago, when they started that "National Characters"
rubbish in their code pages.  In German we call this a "Treppenwitz der
Weltgeschichte" (roughly, one of history's ironic jokes).

Characters convey a meaning
---------------------------
ISO 8859-1 and the CECPs deal with coding of characters into bytes,
hence the only sensible mapping between them is via matching characters
(i.e. graphics or character descriptions, e.g. "small letter a with grave
accent" of ISO 8859-1 is mapped to "LA13 a Grave Small" of the CECPs).
It's a pity that IBM and ISO differ even in the wording of their
respective descriptions, but I guess everybody can live with that.
This mapping would allow the transfer of notes, scripts and the like
between users in all countries speaking one of these 17 languages.
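Mapping through characters rather than through code points can be sketched in Python. Python's built-in codecs stand in for the code pages: cp500 for CECP 500, cp037 for US EBCDIC, and latin_1 for ISO 8859-1. Unicode stands in for the common character descriptions. That pivot is a modern convenience, not something IBM or ISO offered at the time. The sketch assumes single-byte code pages on both sides.

```python
def build_table(source: str, target: str):
    """256-byte table for bytes.translate(), matching positions by character.

    Returns (table, unmapped). `unmapped` lists source positions whose
    character the target code page lacks; those bytes are left unchanged.
    Assumes both code pages are single-byte.
    """
    table = bytearray(range(256))
    unmapped = []
    for code in range(256):
        try:
            char = bytes([code]).decode(source)
            table[code] = char.encode(target)[0]
        except (UnicodeDecodeError, UnicodeEncodeError):
            unmapped.append(code)
    return bytes(table), unmapped


to_us, _ = build_table("cp500", "cp037")
# X'4F' ('!' in CECP 500) becomes X'5A' ('!' in US EBCDIC).
print(b"\x4f".translate(to_us).hex())
```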

As an aside, letters with diacritical marks, and German and Icelandic
National Letters, are vital for these respective languages.  They are
not just "fancy characters", but rather letters in their own right.
Recently, I saw in a Swiss newspaper an amusing example of the habit
of using "ss" instead of the Sharp-s "ß":
    Brigitte Bardot mit ihren beachtlichen Körpermassen
    (BB, and the considerable masses of her body)
whilst the writer probably intended to say
    Brigitte Bardot mit ihren beachtlichen Körpermaßen
    (BB, and her remarkable anatomical measurements).
O yes, Michael Sperberg-McQueen, most Germans and Austrians cannot
imagine how the Swiss can do without Sharp-s.

With program sources, things are similar, but a bit more complicated.
Program sources are written by human beings, and they are on this world
to be read by human beings!  (As a pleasant accompanying phenomenon,
they can also be obeyed by computers.)  If it were the other way round,
we all would enter our programs bitwise, in machine language.  Hence,
program sources must look alike in books, on screens, in listings, and
on the keyboard.  That's why programming language standards
do specify characters, and do not (and should not) specify code points
(cf. Howard Gilbert's remark).  On the other hand, they should (and
normally do) specify alternative representations to take account of
limited character sets (not code pages!), e.g. "(*" for "{" in Pascal.
And after introducing any ISO 8859 character set, compilers should cease
using characters for the wrong meanings, e.g. tilde or circumflex accent
for the not-sign.

IBM falls far short of the goal of using characters sensibly:  without
the least shame, they sell you equipment worth a couple of million
Deutschmarks (a terminal, a control unit, a computer, an operating system
and a compiler) which is not capable of translating a Pascal program, even
as simple as
       PROGRAM ebcdic (output)
         (* example of a little Pascal program *)
       ; BEGIN writeln ( 'hello!' ) END
       .
just because you happen to live in Germany, where the word for "of" in
such a comment ("für") contains a letter(!) that is interpreted by the
compiler as an end-of-comment symbol!  :-(
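The early end of the comment can be reproduced in Python. The sketch assumes, as in ISO Pascal, that "}" and "*)" both close a comment. Python's cp273 (German/Austrian EBCDIC) stands in for the terminal's code, and cp037 (US EBCDIC) for the code the compiler assumes. The German comment text and the function name are invented for illustration.

```python
def comment_end(source: bytes, codepage: str) -> int:
    """Index of the first comment closer after "(*", reading `source` via `codepage`.

    Assumes "}" and "*)" are equivalent comment closers, as in ISO Pascal.
    """
    text = source.decode(codepage)
    start = text.index("(*") + 2
    hits = [pos for pos in (text.find("}", start), text.find("*)", start)) if pos != -1]
    if not hits:
        raise ValueError("unterminated comment")
    return min(hits)


# A German comment typed on a terminal using code page 273 ...
german = "(* Beispiel für ein kleines Pascal-Programm *)".encode("cp273")
# ... read once as intended, and once by a compiler that assumes code page 037.
print(german.decode("cp273")[comment_end(german, "cp273"):])
print(german.decode("cp037")[comment_end(german, "cp037"):])
```

In the second reading, the byte that is "ü" in the German code is taken as "}", so the comment ends inside "für".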

Clearly, the next thing to demand from IBM is that they adapt their
language processors to the CECPs.  Recognizing dual EBCDIC codes for
some characters is not enough for the compilers and other applications:
as long as there are various EBCDICs (call them CECPs or what you want),
you must be able to customize them for the variant in use!  Folks,
please help convince IBM by sending in as many APARs as you have
products.  The same holds for other software suppliers.

But now, for the difference between plain text and programs.  In addition
to using characters, you may also refer to them.  This is no problem in
plain text (cf. the sharp-s example above), but in programs you normally
use a character's code point to refer to it!  And here the code page
creeps in again:  if you are going to convert a source program from
one code to another, you are forced to understand its ends and means
to a T.  Every number (be it hexadecimal or decimal) might well be a
character code, or a character code offset, or whatever you can imagine.
Hence, automatic (and reliable) code conversion of program sources is
virtually impossible.  Example: you read in a Pascal program
         'a'-'z', 'ä', 'ö', 'ü', 'ß' !
From your knowledge that this program comes from an ISO computer in
Germany, you have to infer the meaning "some small letter", and you have
to translate it into something like
         'a'-'z', ''-'' !            (* for ISO 8859-1*)
 or
         'a'-'', ')'-'', 'W', 'a'-'i'
        , 's'-'.', 'j'-'r', 'N'
        , 's'-'z', 'J'-'m', '-'-''
        !                                (* for CECP 500 *)
and similar (but different) for other CECPs.
Now, you probably understand IBM's reluctance to a single universal CECP.
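The non-contiguous letter ranges in the CECP 500 version above can be derived mechanically. The following Python sketch assumes a single-byte code page known to Python, such as cp500 for CECP 500. It groups a set of characters into the runs of consecutive code points that a program source would have to spell out.

```python
import string


def code_ranges(chars: str, codepage: str) -> list:
    """Group `chars` into runs that are contiguous in `codepage`.

    Assumes a single-byte code page. Returns (first, last) character pairs
    in code-point order.
    """
    codes = sorted({ch.encode(codepage)[0] for ch in chars})
    runs = []
    for code in codes:
        if runs and code == runs[-1][1] + 1:
            runs[-1][1] = code            # extend the current run
        else:
            runs.append([code, code])     # start a new run
    return [(bytes([lo]).decode(codepage), bytes([hi]).decode(codepage))
            for lo, hi in runs]


print(code_ranges(string.ascii_lowercase, "latin_1"))  # one run
print(code_ranges(string.ascii_lowercase, "cp500"))    # three runs
```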

FORMER CODES
------------
The CECPs face a host (80 to 300, depending on whom you ask) of more or
less established codes and practices.

1. There is such a thing as The De Facto Software Code: though no
   standard (neither ISO, nor national, nor internal) covers this
   practice, software designers seem to unanimously take English (U.S.)
   EBCDIC plus TN-style brackets minus OCR characters for "the" EBCDIC.

   A couple of months ago, I met an IBM employee who is substantially
   involved in Codes and Keyboards design.  When I told him that the
   brackets are normally assumed on code points AD & BD, he exclaimed:
   "But they never belonged there!"  Then I told him that even IBM's
   Pascal/VS compiler accepts only(!) AD and BD for the brackets.  He
   had never heard of such a thing!  (Boys, I'm not kidding; that really
   has happened here.)

   As I gather from the recent discussion on this list, the BITNET code
   (if there is such a thing) probably looks very similar.

   The character set of this code is too limited to support any language
   other than English.  So, any CECP would be an improvement.  I do
   not believe that IBM would be willing to define a tenth CECP based
   on this code (for which not even a standard exists).

2. IBM's I/O Interface Codes are selected during control unit
   customization.  The trouble: by choosing some keyboard layout, you
   implicitly choose an I/O Interface Code.  Example: if you choose the
   German keyboard, you get the "Ü" (capital U with diaeresis) on the very
   code point US EBCDIC uses for the exclamation point.  Hence, important
   sentences in notes from abroad, and in every IBM-supplied help text,
   are inevitably marked for us with U-Umlaut.

   Note that every IBM 327x (or similar) screen is capable of displaying
   all letters required for 16 languages (anything except Icelandic) and
   a lot of special characters.  It's the control unit that prevents
   you from seeing these characters -- or allows you to display them, if
   you have installed Configuration Support C, D or T.  In fact, every
   Configuration Support establishes its own EBCDIC variant;  hence,
   the nine CECPs cannot be truly upwards-compatible.

   Matthias Melcher's suggestion is based on these Configuration
   Supports.

3. The 7171 Control Unit seems to be based on a code similar to 1.,
   above.  The trouble here is that any character which is not in the
   code translation table of this ingenious device is translated into a
   colon ":".  Hence, you can only get about 90 different characters
   through this bottleneck, when you need about 190 different ones.
   The 7171 manual states that this translation table can be amended.
   Has anybody done this so far?  If so, please drop me a note describing
   your experiences!

4. Kermit has its own ideas on ASCII-EBCDIC translation.  (Very similar
   to 1., above.)  During Terminal Emulation, it's confined to 7171's
   limitations (at least in our case, where the PCs are connected via a
   7171).  For File Transfer, Kermit can be customized by a suitable
   take file; so at least in this area incompatibilities can be solved.

5. IBM PCs and clones have used their own 8-bit character set, comprising
   the national letters of the same 16 languages but different special
   characters.  All PCs, regardless of their keyboard, use identical
   codes for this character set (that's the purpose of the keyboard
   program that gets loaded when you bootstrap your PC).

   Pity that not all software designers recognize this scheme.  Notably,
   terminal emulation programs tend to bypass the keyboard program and
   hence are useless outside the USA.  The same tends to hold for software
   designed for multiple computer brands.

   I gather that IBM has already started delivering PCs with a new
   character set, a superset of ISO 8859-1 (they kept 16 classical PC
   characters, most of them semi-graphics).  The code is ISO 8859-1 (i.e.
   every code point above A0 is re-assigned) plus the additional
   characters in code points 80 to 9F.
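The 7171 bottleneck described in item 3 can be illustrated with a Python sketch of a translation table that sends every host character lacking an ASCII equivalent to ':'. The table is derived from Python's cp500 codec and is hypothetical; it is not the 7171's actual table. It also shows why such a table cannot be reversed: many different characters collapse onto the same colon.

```python
import string

# The 95 ASCII graphic characters, including space.
ASCII_GRAPHICS = set(string.printable) - set("\t\n\r\x0b\x0c")


def lossy_table(host_codepage: str, fallback: str = ":") -> bytes:
    """Host byte -> ASCII byte; anything without an ASCII equivalent becomes `fallback`."""
    table = bytearray(256)
    for code in range(256):
        char = bytes([code]).decode(host_codepage)
        table[code] = ord(char) if char in ASCII_GRAPHICS else ord(fallback)
    return bytes(table)


table = lossy_table("cp500")
print(len(set(table)), "distinct characters get through")
print("Grüß Gott!".encode("cp500").translate(table))   # umlaut and sharp s both become ':'
```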

WHAT CAN BITNET DO?
-------------------
BITNET is primarily designed for transferring messages, i.e. plain
text.  Let's set a comparatively humble goal, for the moment:
      BITNET should transmit any plain text consisting of characters
      from the ISO 8859-1 character set (i.e. GCS 00697) sensibly
      and undisturbed.
This must be our first goal, leaving out
* special handling of program sources (cf. remarks above),
* other latin based alphabets,
* non-latin left-to-right single-byte coded languages (e.g. Greek),
* right-to-left languages, and
* double byte coded languages.

Program sources require human intervention for a thorough, sensible
translation (and they must be enabled for that purpose, cf. IBM's
"National Language Information and Design Guide" series, SE09-8001,
SE09-8002, ...)

The other four require special equipment.  Throughout BITNET, English
seems to take the role of a Lingua Franca; hence even participants
in non-latin-writing countries will have to use a latin-writing
terminal for their BITNET correspondence.

BITNET is still far away from even this most modest goal;  nor does it
handle the former codes sensibly.  One more example:  Germany's primary
BITNET node, DEARN, refused to accept UDS entries or list subscriptions
containing German Sharp-s or German Umlauts in the participants' proper
names.  The subscribers had to substitute other characters (e.g. "oe" or
even "-") for such characters in their names.  That happened in a country
where you are legally entitled (96 BDSG, 96 LDSG) to having your name's
spelling corrected if it's misspelled in any database!

As stated earlier, the transition from some 300 different EBCDIC
variants to 9 EBCDIC + 1 ISO would be a major improvement.

HOW COULD IT WORK?
------------------
I suppose that every site will try to introduce one single CECP (or
ISO 8859), and do away with the old code-table mishmash.  This will take
time, as new equipment is involved (new terminals, 3174 instead of
3274, updated compilers, etc.)  Also, the old data and programs will have
to be transformed suitably.  Note that BITNET is only a small part of
the whole EDP business!

During the transition phase, MM's proposal could help smooth things.
But behold, SET INPUT and SET OUTPUT cannot be the last word.  These
commands are only available in CMS;  and they take effect only in CMS
line-mode and in XEDIT.  CP commands, and CP messages are not translated,
and most full-screen mode programs do not honour the SET INPUT and SET
OUTPUT commands.  What a pleasure, when you enter
       TELL Kurt Grüß Gott!
and CP displays to Kurt the Message
       Gr,( Gott|

After having chosen a CECP (or ISO8859-1), the site could send out its
texts in this local code.  They will have to be code marked: in the tag,
and preferably also inside the text.  For NOTES, RFC822 could be enhanced
with a "Code:" field, such as I have used in the header of this very
note.  I think, there are enough Network Experts listening to this list:
they should be able to design a suitable amendment to the network's
standards.

The price for sending out the notes in a local code variant (well, that's
the very procedure, most sites are following right now) will be the
obligation of translating incoming messages.  So, every site will use at
most 9 (of the possible 90) translation tables.  Again, this could be
done via SET INPUT and SET OUTPUT, as MM suggested (that's exactly the
way I read notes from the USA and elsewhere).  Later, the mailer, or RSCS,
or some similar software piece, would do the translating for all incoming
files, and the end-user will cease bothering with the details.

There will be a need for a special marking, say "Code: Binary", preventing
the file from being translated at all.  (André PIRARD should be able
to continue sending his files through the net.)
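Here is a Python sketch of how a mail program might act on such a field. Several details are assumptions:

- the field name "Code";
- the label spellings;
- the choice of US EBCDIC (cp037) as the default for untagged notes.

The point is the behavior. A known label selects a decoding table, "Binary" suppresses translation, and an unknown label is refused rather than guessed.

```python
from typing import Dict, Union

# Hypothetical restricted vocabulary for a "Code:" field; values are Python codec names.
CODE_FIELD = {
    "CECP 500": "cp500",
    "CECP 273": "cp273",
    "ISO8859-1": "latin_1",
}


def decode_note(headers: Dict[str, str], body: bytes,
                default: str = "cp037") -> Union[str, bytes]:
    """Decode an incoming note according to its "Code:" field.

    "Binary" means: do not translate at all. Unknown labels are refused rather
    than guessed, so the reader at least knows the text cannot be interpreted.
    """
    label = headers.get("Code", "").strip()
    if label == "Binary":
        return body
    if not label:
        return body.decode(default)      # untagged: assume the site default
    if label not in CODE_FIELD:
        raise ValueError(f"unknown Code: {label!r}")
    return body.decode(CODE_FIELD[label])


print(decode_note({"Code": "CECP 500"}, "Grüß Gott!".encode("cp500")))
```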

HOW CAN WE SIMPLIFY CODE TABLE HANDLING?
----------------------------------------
Divide et impera (divide and rule)!  Instead of devising 90 related code
tables (and painstakingly checking them for consistency), we could write
down 10 "Half-Tables".  These would each relate one code page to a
common description of the characters.  From these Half-Tables, a simple
program could build the translation tables for every desired code pair.

The ISO 8859 descriptions of the characters are a bit too long to make
a feasible common base for our half-tables.  But what about IBM's
character identifiers, accompanying the GCS and CECP tables?  Instead of
"small letter a", we could use "LA01"; instead of "small letter a with
grave accent" we say "LA13", and instead of "small diphthong a with e",
we have "LA51".

Thus, the upper part of the CECP 500 half-table would be:

         |    4    5    6    7    8    9    A    B    C    D    E    F
    -----+-------------------------------------------------------------
       0 | SP01 SM03 SP10 LO61 LO62 SM19 SM17 SC04 SM11 SM14 SM07 ND10
         |
       1 | SA06 LE11 SP12 LE12 LA01 LJ01 SD19 SC02 LA02 LJ02 SP31 ND01

Note the new CECP 500, having SA06 (Division Sign) instead of the older
SP30 (Numeric Space).

Aside: these identifiers start with the following letter(s):
       L   for Letter,
       ND  for Numerical Digit,
       NF  for Numerical Fraction,
       SA  for Arithmetical sign,
       SC  for Currency sign,
       SD  for Diacritical mark,
       SP  for Punctuation marks,
       SM  for Miscellaneous special characters.
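The half-table scheme can be sketched in Python. Each half-table maps IBM character identifiers to code points, and each translation table is built by matching identifiers. Only a few entries are shown. The CECP 500 entries are taken from the half-table in this note. The ISO 8859-1 entries are the positions of a, A, e-acute and the opening brace. The function names are illustrative. With 10 half-tables, the ordered pairs give exactly the 90 translation tables mentioned above.

```python
from itertools import permutations
from typing import Dict, Tuple

HalfTable = Dict[str, int]   # character identifier -> code point

# A few entries only; a real half-table would list every code point.
HALF_TABLES: Dict[str, HalfTable] = {
    "CECP 500":   {"LA01": 0x81, "LA02": 0xC1, "LE11": 0x51, "SM11": 0xC0},
    "ISO 8859-1": {"LA01": 0x61, "LA02": 0x41, "LE11": 0xE9, "SM11": 0x7B},
}


def pair_table(src: HalfTable, dst: HalfTable) -> Dict[int, int]:
    """Translation table for one code pair: src code point -> dst code point."""
    if len(set(src.values())) != len(src):
        raise ValueError("half-table gives one code point two identifiers")
    return {code: dst[ident] for ident, code in src.items() if ident in dst}


def all_pair_tables(half_tables: Dict[str, HalfTable]) -> Dict[Tuple[str, str], Dict[int, int]]:
    """Every ordered code pair: n half-tables yield n*(n-1) translation tables."""
    return {(a, b): pair_table(half_tables[a], half_tables[b])
            for a, b in permutations(half_tables, 2)}


tables = all_pair_tables(HALF_TABLES)
print(hex(tables[("CECP 500", "ISO 8859-1")][0xC0]))   # SM11, the opening brace
```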

I can also set up a Half-Table for my control unit's I/O interface
code, hence a simple program could generate from these two half-tables
an EXEC with suitable SET INPUT and SET OUTPUT commands to display
CECP 500 on my terminal -- simply by matching the SM11 entry in code-
point C0 (CECP 500) with the SM11 entry in code-point 75 (Austrian/
German I/O Interface Code) and generating the REXX-line
    "SET OUTPUT C0 2"; "SET INPUT 2  C0" /* SM11  */
from this match.  (The "2" byte comes out as opening brace, on my
terminal.)  Thus, the generating of MM's procedures can be mechanized
in the same way as the generating of the BITNET translation tables.
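The matching step that produces such REXX lines can be sketched in Python. The SM11 entries come from this note: X'C0' in CECP 500 and X'75' in the Austrian/German I/O interface code. The function name is hypothetical. The CMS command syntax is copied from the line above, not checked against a CMS manual. The terminal byte is written as the host-code-page character with that value, so encoding the line in the host code page reproduces the raw byte the EXEC needs.

```python
from typing import Dict, List


def rexx_lines(host: Dict[str, int], terminal: Dict[str, int],
               host_codepage: str) -> List[str]:
    """One REXX line per identifier whose host and terminal code points differ."""
    lines = []
    for ident in sorted(host.keys() & terminal.keys()):
        h, t = host[ident], terminal[ident]
        if h == t:
            continue                      # same code point: nothing to translate
        # Host-code-page character whose byte value is the terminal's byte.
        ch = bytes([t]).decode(host_codepage)
        lines.append(f'"SET OUTPUT {h:02X} {ch}"; "SET INPUT {ch} {h:02X}" /* {ident} */')
    return lines


for line in rexx_lines({"SM11": 0xC0, "LA01": 0x81},
                       {"SM11": 0x75, "LA01": 0x81}, "cp500"):
    print(line)
```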

The same holds for Kermit's Take-Files.

I would appreciate any comments on this proposition.

Regards
        Otto.
17-Apr-88 12:39:01-EST,4010;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Sun 17 Apr 88 12:38:59-EST
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Sun, 17 Apr 88 12:37:21 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 2123; Sun, 17 Apr 88 12:37:18 EDT
Received: by BITNIC (Mailer X1.25) id 0264; Sun, 17 Apr 88 12:36:31 EDT
Date:         Sun, 17 Apr 88 02:33:42 EST
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
From:         John C Klensin <KLENSIN@INFOODS.MIT.EDU>
Subject:      RE:       Some clarifications
X-To:         ASCII/EBCDIC character set related issues
              <ISO8859%JHUVM.BITNET@MITVMA.MIT.EDU>
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>

Otto,
  Well-presented, clear, and focused on what I think are the right set
of problems.  One comment/plea:
  While the local system's controller may be the "right" place to
establish the default code table binding for various supported devices,
keep in mind that transformations are not guaranteed to be reversible, at
least when going to the large [historical] collection of existing devices.
It is possible that I will have a German-capable device (i.e., one
supporting, by code table switching, German "extended" characters that
don't appear in ASCII / English) or a German-national one (i.e., supporting
those characters *instead* of the ASCII special characters) available
at a site that mostly has only US-national (i.e., ASCII-only) devices.
If I do, I want the ability to see exactly what you write if you write
to me in German, not the local interpretation of what German ought to be
spelled like in IA5/ISO646 Basic version.  I might even want to invent,
for my own use, a set of two-or-three-character conventions.  For example,
if you send me the word we translate into English as "for", and I don't
have lowercase-u-umlaut available, I might prefer that my smart terminal
show me one of those "programming language" convolutions, such as fu:r
or fu..r, rather than either fuer or fur, or showing me f|r
(that is 'f', broken-vertical-bar, 'r' on my device at the moment).

I would not expect to transmit this sort of notation convention, or
expect anyone else to read it, but it is important that the exact text
of what was sent, and the information about how it was encoded, be
available to the end user's mail-reading program or agent.
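The separation John asks for can be sketched in Python: keep the exact text, and decide only at display time how to show it. The fallback table is a hypothetical personal convention in the spirit of "fu:r"; nothing in it is meant to be transmitted.

```python
# Hypothetical local display convention; the stored message is never altered.
FALLBACK = {"ä": "a:", "ö": "o:", "ü": "u:", "Ä": "A:", "Ö": "O:", "Ü": "U:"}


def render(text: str, displayable: set) -> str:
    """What to *show* on a limited device; characters without a rule become '?'."""
    return "".join(ch if ch in displayable else FALLBACK.get(ch, "?") for ch in text)


ascii_device = {chr(c) for c in range(0x20, 0x7F)}
message = "für"                                   # exactly what the sender wrote
print(render(message, ascii_device))              # on an ASCII-only device
print(render(message, ascii_device | {"ü"}))      # on a German-capable device
```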

While you excluded those cases, the underlying problem becomes much more
important for messages that might involve non-Latin alphabets: A
reasonable site default might be to have them rendered into Latinized
transliteration (there are even ISO standards for the Latin alphabet
representation of several non-Latin alphabets).  But a local user with
the right equipment would presumably want to see whatever the message
looked like to the person who typed it.

And don't hope for changes in RFC822, for several reasons.  The most
important is that local modifications and extensions made by various
people that treat the header fields just as slightly-structured free
text comments already have made it very difficult to build an adequate
processing agent.  The introduction of "Code:" is useful, beyond a
warning that I'm not going to be able to read what follows, only if the
predicate comes from a very restricted vocabulary and is arranged so
that an agent can process the text as it specifies (as your discussion
implies).
   X.400, by contrast, has provision for such a field.  But the "red
book" version doesn't have eight-bit character sets: unless you want to
specify, e.g., Teletext encoding, you will find yourself limited to
IA5-text.  And CCITT IA5 = ISO646 = the restricted character sets from
which our current problems originated.
     john

18-Apr-88 08:53:24-EST,1241;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Mon 18 Apr 88 08:53:20-EST
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Mon, 18 Apr 88 08:50:49 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 2675; Mon, 18 Apr 88 08:50:40 EDT
Received: by BITNIC (Mailer X1.25) id 4044; Mon, 18 Apr 88 08:47:28 EDT
Date:         Mon, 18 Apr 88 08:41:57 EDT
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
From:         "Thomas D. Denier" <TOM%PENNDRLS.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      IBM Graphic character identifiers
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>

Otto Stolz states that an initial letter 'L' in an IBM character
identifier stands for 'letter'. It actually stands for 'Latin
alphabetic'. IBM has assigned other initial letters to other alphabets,
as follows:
   A  Arabic
   G  Greek
   H  Hebrew
   J  Katakana
   K  Cyrillic
Thus, for example, Latin lower-case 'a' is LA01, and Greek lower-case
alpha is GA01.
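As a small illustration of the correction above, the Python sketch below decodes only the first letter of an identifier, using the six prefixes Thomas Denier lists. The function name `alphabet_of` is hypothetical. The sketch does not interpret the rest of the identifier, and any prefix not in the list is reported as "unknown".

```python
# Initial letters of IBM graphic character identifiers, as listed in the message.
ALPHABET_BY_PREFIX = {
    "L": "Latin",
    "A": "Arabic",
    "G": "Greek",
    "H": "Hebrew",
    "J": "Katakana",
    "K": "Cyrillic",
}


def alphabet_of(identifier: str) -> str:
    """Return the alphabet named by the first letter of an identifier such as 'LA01'.

    Prefixes not in the table above are reported as 'unknown'; this sketch
    makes no attempt to cover identifiers for digits, punctuation or symbols.
    """
    return ALPHABET_BY_PREFIX.get(identifier[:1].upper(), "unknown")


if __name__ == "__main__":
    for ident in ("LA01", "GA01"):
        print(ident, alphabet_of(ident))  # prints "LA01 Latin", then "GA01 Greek"
```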
18-Apr-88 12:54:50-EST,1720;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Mon 18 Apr 88 12:54:47-EST
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Mon, 18 Apr 88 12:53:01 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 3050; Mon, 18 Apr 88 12:52:59 EDT
Received: by BITNIC (Mailer X1.25) id 6469; Mon, 18 Apr 88 12:52:13 EDT
Date:         Mon, 18 Apr 88 12:05:56 EST
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
From:         John Kesich <KESICH%NYUCIMSA.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      Re: IBM Graphic character identifiers
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>
In-Reply-To:  Message of Mon, 18 Apr 88 08:41:57 EDT from <TOM@PENNDRLS>

How does IBM's designation differ from ISO 6937/2 (Coded character sets
for text communication - Part 2: Latin alphabetic and non-alphabetic
graphic characters)?  I know this is wishful thinking, but could they
actually be one and the same (or at least one a subset of the other)?

As for the notion of adding a "code" field to mail headers, I don't
think it would buy you very much even if it were implemented.  Two
problems suggest themselves:
    1) what about non-mail transmissions
    2) what about shifting to other codes within the text
(How does IBM provide for shifting from 1 code page to another?)

There are already ISO standards which allow you to shift from code set
to code set to your heart's delight - why reinvent the wheel?
(646, 2022, 4873, 8859)
18-Apr-88 15:07:01-EST,2477;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Mon 18 Apr 88 15:06:56-EST
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Mon, 18 Apr 88 15:04:31 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 3273; Mon, 18 Apr 88 15:04:24 EDT
Received: by BITNIC (Mailer X1.25) id 7894; Mon, 18 Apr 88 15:03:47 EDT
Date:         Mon, 18 Apr 88 13:11:38 CDT
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
From:         Michael Sperberg-McQueen <U18189%UICVM.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      Header field for CODE and reinvention of wheel
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>

John Kesich asks why a Code: field is necessary, since ISO has already
standardized methods for shifting between character sets.  Perhaps a
preliminary report on answers to my earlier query about ISO 2022
implementations is in order.

The reason "Code:" would be useful, and might be necessary, is that
ISO 2022 code-page switching via SI/SO is possible only when the G0
and G1 (and C0 and C1, for that matter) sets are known in advance to
all parties.  ISO standards for identifying coded character sets by
means of registered escape sequences have no known implementation
in any automatic device.  (Possible exception:  some printers may
accept the registered escape sequences to specify G1 before SO is
used.  Certainly some use escape sequences -- whether they are the
registered sequences or not is another matter.)
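To make the point about prior agreement concrete, here is a simplified Python sketch of SI/SO decoding in a 7-bit stream. It deliberately leaves out escape-sequence designation (the part reported as unimplemented), so the G0 and G1 mappings must be supplied by the caller. Treating G1 as the right half of ISO 8859-1 is an illustrative assumption, and the helper names are hypothetical.

```python
from typing import Callable

SO = 0x0E  # Shift Out: following bytes 0x21-0x7E are taken from G1
SI = 0x0F  # Shift In: return to G0


def decode_si_so(data: bytes, g0: Callable[[int], str], g1: Callable[[int], str]) -> str:
    """Decode a 7-bit byte stream that switches between G0 and G1 with SI/SO.

    g0 and g1 map a byte in 0x21-0x7E to a character. Nothing in the stream
    says which sets they are, so both parties must agree on them in advance.
    """
    in_g1 = False
    out = []
    for b in data:
        if b == SO:
            in_g1 = True
        elif b == SI:
            in_g1 = False
        elif 0x21 <= b <= 0x7E:
            out.append(g1(b) if in_g1 else g0(b))
        else:
            out.append(chr(b))  # space and other controls pass through unchanged
    return "".join(out)


def ascii_g0(b: int) -> str:
    return chr(b)


def latin1_right_half_g1(b: int) -> str:
    # Hypothetical agreement: G1 is the right half of ISO 8859-1, so 0x69 -> 0xE9.
    return chr(b + 0x80)


if __name__ == "__main__":
    print(decode_si_so(b"caf\x0ei\x0f", ascii_g0, latin1_right_half_g1))  # café
```

With a different G1 agreed on, the same bytes decode to different text, which is why the sets have to be known in advance.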

There are a (small) number of devices which accept SI/SO (terminals
and printers only, so far -- no one has reported on successful or
regular use of SI/SO in data transmission to distant sites).  But
so far only three or four people have reported anything at all.  If
we assume that silence implies that one has not heard of any notable
use of ISO 2022, then it appears that the vast majority of sites and
devices do not use it.

Perhaps someone better informed about Bitnet can say whether the
Bitnet header can or should or cannot or should not handle a CODEPAGE
field.  I was always told "Bitnet is EBCDIC" -- maybe we should at
least be able to specify what flavor of EBCDIC?

Michael Sperberg-McQueen, University of Illinois at Chicago
18-Apr-88 19:51:24-EST,3256;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Mon 18 Apr 88 19:51:19-EST
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Mon, 18 Apr 88 19:49:16 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 3601; Mon, 18 Apr 88 19:49:14 EDT
Received: by BITNIC (Mailer X1.25) id 0347; Mon, 18 Apr 88 19:48:19 EDT
Date:         Mon, 18 Apr 88 19:43:17 EST
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
From:         John Kesich <kesich@acf4.NYU.EDU>
Subject:      punched cards, anyone?
X-To:         iso8859%jhuvm.BITNET@cimsa.nyu.edu
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>

The following list may cause one to consider the possibility that BITNET
should convert over to ISO8859:

 857  JNET      7  RNET      2  POWER     1  NOS
 729  RSCS      6  HUJI-     2  NONE      1  NJI
 133  UREP      6  ANL N     2  MUSIC     1  NJE 4
 124  ALIAS     5  PMDF      2  MTF       1  NJE 1
 117  JES2      5  OASYS     2  MAILE     1  NAM
  82  TCP/I     5  JA JN     2  INTER     1  MRJE/
  52  NJEF      4  IBM R     2  CARLE     1  MEMO
  42  NJE       3  TIELI     2  BERKH     1  MACH2
  42  HOMEB     3  RM        1  UNIX      1  JES2/
  26  ?         3  HUJI      1  TRANS     1  IX/37
  20  ANJE      3  GATE      1  SNA/N     1  HUMAI
  19  BERK      3  ANL/N     1  RTP/1     1  HASPM
  18  BITE      2  TELCO     1  RJEF      1  ETHER
  13  JES3      2  RSCSV     1  RES       1  ECF
  12  HASP      2  RJE S     1  RCOM      1  CDC
   9  MULTI     2  RHF       1  PMDF-     1  ANY
   8  DECNE     2  PRIME     1  NRV       1  AMF

This list is just a count of the different entries in the SYSTEM TYPE field
of BITNET LINKS804.  (What happens if we add in NETNORTH & EARN?)
Just counting JNET & UREP, pretty close to half the nodes are ASCII machines.
(a precise definition of my misuse of terms for those who may otherwise
become confused:
        EBCDIC - the stuff they use on IBM's
        ASCII - the stuff they use on most everything else
        ISO8859 - the family of codes which will make ISO2022 practical
        ISO8859/1 - ISO8859/1
admittedly not 100% accurate, but, hey, it works for me.)
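The "pretty close to half" estimate can be checked against the counts listed. The Python sketch below transcribes the table and computes the JNET + UREP share. Like the original, it counts SYSTEM TYPE entries rather than distinct machines, and it treats JNET and UREP as ASCII exactly as the message does. On these figures the share is roughly 41 percent, before any other ASCII system types are added.

```python
# Counts of SYSTEM TYPE entries from BITNET LINKS804, transcribed from the list
# (names truncated to five characters, as in the original listing).
SYSTEM_TYPES = {
    "JNET": 857, "RSCS": 729, "UREP": 133, "ALIAS": 124, "JES2": 117,
    "TCP/I": 82, "NJEF": 52, "NJE": 42, "HOMEB": 42, "?": 26, "ANJE": 20,
    "BERK": 19, "BITE": 18, "JES3": 13, "HASP": 12, "MULTI": 9, "DECNE": 8,
    "RNET": 7, "HUJI-": 6, "ANL N": 6, "PMDF": 5, "OASYS": 5, "JA JN": 5,
    "IBM R": 4, "TIELI": 3, "RM": 3, "HUJI": 3, "GATE": 3, "ANL/N": 3,
    "TELCO": 2, "RSCSV": 2, "RJE S": 2, "RHF": 2, "PRIME": 2,
    "POWER": 2, "NONE": 2, "MUSIC": 2, "MTF": 2, "MAILE": 2, "INTER": 2,
    "CARLE": 2, "BERKH": 2, "UNIX": 1, "TRANS": 1, "SNA/N": 1, "RTP/1": 1,
    "RJEF": 1, "RES": 1, "RCOM": 1, "PMDF-": 1, "NRV": 1,
    "NOS": 1, "NJI": 1, "NJE 4": 1, "NJE 1": 1, "NAM": 1, "MRJE/": 1,
    "MEMO": 1, "MACH2": 1, "JES2/": 1, "IX/37": 1, "HUMAI": 1, "HASPM": 1,
    "ETHER": 1, "ECF": 1, "CDC": 1, "ANY": 1, "AMF": 1,
}


def share(types: list[str]) -> float:
    """Fraction of all listed entries whose system type is in `types`."""
    total = sum(SYSTEM_TYPES.values())
    return sum(SYSTEM_TYPES[t] for t in types) / total


if __name__ == "__main__":
    print(sum(SYSTEM_TYPES.values()))         # 2408
    print(round(share(["JNET", "UREP"]), 3))  # 0.411
```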

Finally, let's not forget all the networked hosts, workstations and PCs
(IBM included) which hang off BITNET gateways and send mail through them.
How many of those do you suppose are EBCDIC?

Perhaps a survey of BITNET hosts should be made.  The 2 questions I'd like
answers to are:
        1) how would you feel about converting all BITNET links to ISO8859?
and for IBM nodes:
        2) if IBM were to announce a new code page which was code-point-to-
           code-point and graphic-to-graphic identical with ISO8859/1 and
           pledged to keep it that way, would you migrate to it?
20-Apr-88 13:37:24-EST,2093;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Wed 20 Apr 88 13:37:22-EST
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Wed, 20 Apr 88 13:37:28 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 6126; Wed, 20 Apr 88 13:37:27 EDT
Received: by BITNIC (Mailer X1.25) id 5947; Wed, 20 Apr 88 13:36:50 EDT
Date:         Wed, 20 Apr 88 11:54:00 CDT
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
From:         Rick Troth <TROTH%TAMCBA.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      Re: punched cards, anyone?
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>
In-Reply-To:  Your message of Mon 18 Apr 88 19:43:17 EST

        I think the real objective here is to find a (nearly) one-to-one
mapping between EBCDIC (North American) and ISO8859/1, with similar 1-1
mappings between other national EBCDIC's and ISO8859/whatever.

        All those ASCII machines listed in BITNET NAMES are already
performing their own translation between ASCII and EBCDIC. To switch the
whole network at once would cause many people much grief in both the short
run and the intermediate run. In the long run, we would hope that
IBM-supported products (like RSCS or whatever will someday replace it)
will be able to speak ISOxxxx, but remember that that is most likely a
21st-century long run.

        JNET (and I suppose others) have their translate tables in place.
What we are striving for is a "correct" translate table where I could do a
TELL <user> AT <vax> Talk ¢¢¢ to me   and he would see 3 hex A2's on his VT330
as per DEC Multinational. "Talk cents to me"   Kermit transfers would work
correctly in both directions (if we achieve 1-1). Translate tables are really
not so bad if we can just agree on the translation.
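A one-to-one translate table is exactly what makes the round trip work. The Python sketch below derives a 256-byte EBCDIC-to-ISO table from Python's `cp037` codec, which it uses as a stand-in for North American EBCDIC, with Latin-1 standing in for the VAX side (DEC Multinational also puts the cent sign at hex A2). It then inverts the table and sends three cent signs through. The table names are hypothetical; this is a sketch under those assumptions, not any site's actual JNET table.

```python
# Assumption: Python's "cp037" codec stands in for North American EBCDIC and
# "latin-1" for the ISO / DEC Multinational side.
EBCDIC_TO_ISO = bytes(range(256)).decode("cp037").encode("latin-1")

# Invert the table; this only works because the mapping is one-to-one.
_inverse = bytearray(256)
for ebcdic_byte, iso_byte in enumerate(EBCDIC_TO_ISO):
    _inverse[iso_byte] = ebcdic_byte
ISO_TO_EBCDIC = bytes(_inverse)


def is_one_to_one(table: bytes) -> bool:
    """A 256-entry translate table can be reversed only if no two inputs share an output."""
    return len(table) == 256 and len(set(table)) == 256


if __name__ == "__main__":
    message = "Talk ¢¢¢ to me".encode("cp037")  # the text as sent from the EBCDIC side
    on_the_vax = message.translate(EBCDIC_TO_ISO)
    print(on_the_vax[5:8].hex())                 # a2a2a2
    print(on_the_vax.translate(ISO_TO_EBCDIC) == message)  # True: round trip is lossless
```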

                Or have I completely missed something?           - Rick
20-Apr-88 15:18:47-EST,5977;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Wed 20 Apr 88 15:18:44-EST
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Wed, 20 Apr 88 15:18:49 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 6234; Wed, 20 Apr 88 15:18:47 EDT
Received: by BITNIC (Mailer X1.25) id 7154; Wed, 20 Apr 88 15:16:49 EDT
Date:         Wed, 20 Apr 88 21:05:04 GMT
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
From:         Matthias Melcher <$28%DHDURZ1.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      PC ASCII
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>

Dear list subscribers,

the following is a contribution by an IBM code specialist to ECMA
about the history and background of Code Page 850 (PC Latin 1).
The attachments mentioned are not included, since I don't have them
in machine readable form. Matthias Melcher.

A new character set for the IBM PC, by W.F.Bohn

When I was asked recently to make available to TC1 a copy of the new
IBM PC code page I felt that I could not honour that request without
a few words of explanation.

The IBM PC was developed to be a very versatile computing device,
capable also of running video games, teaching programs, etc. That
is why the original code table (identified as 437 and Attachment 1 to
this contribution) has some unorthodox features:
- the code is based on ASCII - not on EBCDIC,
- the code table positions in columns 00 and 01 have graphic characters
  assigned to them in addition to the normal 7-bit control characters,
- a graphic character was allocated to table position 07/15 in
  addition to, or as a graphical representation of, the control
  character DELETE,
- as controls beyond the normal 32 were not envisaged (actually all
  256 code table positions could be used for controls or for
  graphic characters) the right hand half of the code table was
  divided into
  . three columns with graphic characters believed to satisfy the
    requirements of the major West European languages,
  . three columns with line and box drawing characters plus other
    characters believed useful for creating diagrams, company logos,
    etc.
  . two columns with mathematical and technical symbols.

Later, when the PC was connected to other computing equipment it
turned out that its graphic character set did not match any other
existing one and that interchange of data between two different IBM
machines would have to be limited to the small number of characters
common to both installations.

The advent of the 8-bit single-byte coded character set of ECMA-94
made a solution of that dilemma possible. By changing the IBM EBCDIC as
well as the PC code to the character set of ECMA-94/1, interchange of
all characters without loss of information could be achieved.

An important decision had to be made, however. Which of the characters
of table 437 should be sacrificed and which should be taken over into
the new code page (identified as 850 and attachment 2 to this
contribution)? Furthermore, should the structure of the code table
be changed to that of ECMA-94 or should the existing structure be kept?

In the interest of compatibility with existing equipment and existing
implementations it was decided:
- to include all the graphic characters of ECMA-94/1,
- to keep the original structure of the code table,
- to leave those graphic characters in table 437 and now also in table
  850 in their original code table positions (there are two exceptions
  which need not be explained here),
- to select for the 32 positions in the right hand half of the code
  table (not needed for characters from ECMA-94/1) a useful set of the
  line drawing and other characters of table 437 and keep those
  characters also in their original positions. These characters selected
  are
  . the 11 basic line drawing characters in thin (or single line)
    rendition,
  . the same 11 characters in bold (or double line) rendition,
  . three shading characters (light, medium, heavy),
  . three block characters (full box, upper half, lower half),
  . a small solid square for different uses.
  To the remaining three positions were assigned graphic characters
  formerly in use in IBM but removed when IBM equipment changed their
  graphic character sets to that of ECMA-94/1. Backward compatibility
  with the existing equipment was the reason for this decision.
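Two of these decisions can be checked against Python's `cp437` and `cp850` codecs, which follow later published mapping tables rather than the attachments mentioned here. The sketch reads "the 11 basic line drawing characters" as four corners, four tees, a cross and the two straight lines (our interpretation). It confirms that the 29 drawing characters keep their 437 positions in 850 and that every ECMA-94/1 graphic, taken as ISO 8859-1 positions 0x20-0x7E and 0xA0-0xFF, has a place in 850.

```python
# The 29 drawing characters listed above, under our reading of "11 basic".
THIN = "┌┐└┘├┤┬┴┼─│"
DOUBLE = "╔╗╚╝╠╣╦╩╬═║"
SHADES = "░▒▓"            # light, medium, heavy
BLOCKS = "█▀▄"            # full, upper half, lower half
SMALL_SQUARE = "■"
KEPT_FROM_437 = THIN + DOUBLE + SHADES + BLOCKS + SMALL_SQUARE

# ECMA-94/1 graphics as ISO 8859-1 code points: 0x20-0x7E and 0xA0-0xFF.
ECMA94_1 = "".join(chr(c) for c in (*range(0x20, 0x7F), *range(0xA0, 0x100)))


def same_position(ch: str) -> bool:
    """True if `ch` sits at the same byte in code pages 437 and 850."""
    return ch.encode("cp437") == ch.encode("cp850")


if __name__ == "__main__":
    print(len(KEPT_FROM_437), all(same_position(c) for c in KEPT_FROM_437))  # 29 True
    print(len(ECMA94_1.encode("cp850")))  # 191: every ECMA-94/1 graphic fits in 850
```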

For interchange of the common character set between EBCDIC and PC
oriented equipment new translation correspondences were determined
which, when the traditional translation correspondence between EBCDIC
and ISO (or ASCII) code equipment is used, would lead to an orderly
arrangement of the additional characters in columns 08 and 09 of
ECMA-94. The arrangement is as similar as possible to the one proposed
for ISO 6937-6.
This may be of some importance when one day the code extension
procedures of ECMA-35 will allow additional G sets of 128 characters
instead of only 94 or 96.
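The basic idea of a translation correspondence can be sketched by routing each graphic character through ISO 8859-1: for every character in the common set, look up its byte in EBCDIC code page 500 and in PC code page 850. Python's `cp500` and `cp850` codecs stand in for the published tables, and the function name `correspondence` is hypothetical. This reproduces only the graphic-character correspondence, not IBM's placement of controls or of the additional characters in columns 08 and 09.

```python
# Printable ISO 8859-1 / ECMA-94/1 positions: 0x20-0x7E and 0xA0-0xFF.
GRAPHICS = [chr(c) for c in (*range(0x20, 0x7F), *range(0xA0, 0x100))]


def correspondence(src: str, dst: str) -> dict[int, int]:
    """Map each byte of code page `src` to the byte of `dst` carrying the same graphic."""
    return {ch.encode(src)[0]: ch.encode(dst)[0] for ch in GRAPHICS}


EBCDIC500_TO_PC850 = correspondence("cp500", "cp850")

if __name__ == "__main__":
    print(len(EBCDIC500_TO_PC850))        # 191 graphics in common
    print(hex(EBCDIC500_TO_PC850[0x4A]))  # 0x5b: '[' in both pages
```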

Information on the translation correspondence between EBCDIC, PC-ASCII,
and ECMA (or ISO) oriented coding is appended to this contribution
as Attachment 3. Also attached is a copy of the international version
of EBCDIC (identified as 500 and Attachment 4 to this contribution).
The graphic character set common to ECMA 94/1, the IBM PC and IBM
EBCDIC (identified as 697-1) is Attachment 5.

Conclusion:

The new IBM PC code table (850) is the best possible compromise
between the desire to implement the graphic character set of ECMA-94/1
and the need to create a minimum impact on the existing implementations.
Of course, the IBM PC implementation of ECMA-94/2 follows the
principles outlined above.
20-Apr-88 19:55:59-EST,2743;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Wed 20 Apr 88 19:55:55-EST
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Wed, 20 Apr 88 19:55:40 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 6598; Wed, 20 Apr 88 19:55:38 EDT
Received: by BITNIC (Mailer X1.25) id 0871; Wed, 20 Apr 88 19:54:35 EDT
Date:         Wed, 20 Apr 88 16:49:09 PDT
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
From:         June Genis <GA.JRG%STANFORD.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      BITNET's current mapping
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>

REPLY TO 04/20/88 16:33 FROM ISO8859@JHUVM.BITNET "ASCII/EBCDIC character set
related iss: BITNET's current mapping

>What is BITNET's current ASCII-EBCDIC standard?
>and where may one obtain a copy?
>Thanks in advance.

Sorry, John.  No such thing currently exists which is one reason
why data sets are trashed along the way.  Since BITNET is defined
to be an EBCDIC system, in theory any ASCII node should be
translating to/from EBCDIC for anything originating/terminating at
that node.  No standard has ever been defined as far as I know for
what translation should be used.  An even more ambiguous situation
potentially exists when an ASCII node is an intermediary node.  Can
an ASCII node be anything other than an end node?  If so, are files
passing thru it translated twice or not at all?  In the latter case,
might there be a chance that the translations are not reversible
such that the EBCDIC file emerging on the other side is different
than that which entered?

While it's clear to me that the absence of a standardized translate
table could result in things being messed up when the communication
is between an ASCII and an EBCDIC node, it is not clear to me if the
intervening nodes which just happen to be along the path can have an
impact as well.  This strikes me as the worst problem since finding
out which node trashed the file could be a real bear.
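The intermediary-node worry can be illustrated with a hypothetical ASCII node that translates incoming EBCDIC with one table and outgoing EBCDIC with another. In the Python sketch below, the codecs `cp037` and `cp500` stand in for two EBCDIC variants such a node might assume on its two sides; the function name is ours. With matching tables the file passes through intact, while with mismatched tables the brackets are silently replaced and nothing on the receiving end shows which hop did it.

```python
def ascii_intermediary(ebcdic: bytes, inbound: str, outbound: str) -> bytes:
    """A hypothetical ASCII node on the path: it translates incoming EBCDIC to
    its local ASCII/ISO form with one table, and back to EBCDIC with another."""
    local = ebcdic.decode(inbound).encode("latin-1")  # as stored on the ASCII node
    return local.decode("latin-1").encode(outbound)   # as sent on to the next hop


if __name__ == "__main__":
    sent = "a[b]".encode("cp037")
    consistent = ascii_intermediary(sent, "cp037", "cp037")
    mismatched = ascii_intermediary(sent, "cp037", "cp500")
    print(consistent == sent)          # True
    print(mismatched.decode("cp037"))  # a¢b!
```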

It's not even clear to me that a standardized translate table could be
implemented at all, as many nodes have local variations in translation
that they are committed to for one reason or another.  Can we assume
that all systems have the ability to apply one translation to their
network mail and another in other situations (say an ASCII terminal
attached to the host which is used to generate mail shipped both
locally and to the net)?

/June

To:  ISO8859@JHUVM.BITNET
21-Apr-88 09:35:00-EST,1818;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Thu 21 Apr 88 09:34:39-EST
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Thu, 21 Apr 88 09:34:13 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 7058; Thu, 21 Apr 88 09:34:12 EDT
Received: by BITNIC (Mailer X1.25) id 4432; Thu, 21 Apr 88 09:33:36 EDT
Date:         Thu, 21 Apr 88 02:03:36 EDT
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
From:         Edward_Vielmetti@um.cc.umich.edu
Subject:      ISO Latin-1 terminals
X-To:         ISO8859%JHUVM.BITNET@CUNYVM.CUNY.EDU
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>

There's an ISO Latin-1 font available for the Apollo workstations,
which Jim Rees (umix!apollo!rees) pointed out in a recent usenet
posting.  Conceptually, it's real easy for any bitmapped terminal
with a replaceable character set to make up a font like that; the
difficulties arise when the data transport paths are not 8-bit
transparent (in the all-ASCII world) or when goofy EBCDIC machines
get in the way.
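The 8-bit transparency problem is easy to demonstrate. The Python sketch below models a path that is not 8-bit transparent as one that simply clears the top bit of every byte; that is an assumption about one typical failure, not a description of any particular link. Latin-1 letters then arrive as unrelated ASCII letters, whatever font the terminal has loaded.

```python
def through_7bit_path(data: bytes) -> bytes:
    """Model a link that is not 8-bit transparent: the top (parity) bit is cleared."""
    return bytes(b & 0x7F for b in data)


if __name__ == "__main__":
    latin1 = "café Ä".encode("latin-1")
    print(through_7bit_path(latin1).decode("ascii"))  # cafi D
```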

A Latin-1 font for the Apple Macintosh would be easy to construct,
but there's the underlying problem that the typical Mac font has its
own Apple arrangement for the upper set of characters.  I think you
could still get everything to print out OK with a suitable manipulation
of PostScript.

Edward Vielmetti, U of Michigan Computing Center
USERW02S@UMICHUM   emv@umix.cc.umich.edu
(If you can think of any reason to send Bitnet mail to the UMICHUM
address, please do so - it's a new service which might fail.)
21-Apr-88 22:14:47-EST,4697;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Thu 21 Apr 88 22:14:40-EST
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Thu, 21 Apr 88 22:14:19 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 8217; Thu, 21 Apr 88 22:14:18 EDT
Received: by BITNIC (Mailer X1.25) id 5752; Thu, 21 Apr 88 22:12:17 EDT
Date:         Mon, 18 Apr 88 17:57:00 MET
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
From:         Johan van Wingen <MOSGLA%HLERUL2.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      IBM4250 etc.
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>

Dear list subscribers
I think I was too optimistic about EBCDIC in my tutorial. Since then I
have been collecting code pages, and it seems now that there is not only
a separate code page for every language, but for every piece of printing
(or imaging) hardware as well. This results in some kind of Cartesian
product: at least two parameters are required for describing an item.
Some of these code pages are in ISO/ANSI style, some mirrored. My
catalogue is not yet complete; please provide me with the missing data
(IBM numbers in particular).
"Madamina, il catalogo i questo":

IBM Corporate System Standard, CSS 3-3220-2 (Latest version, please)
IBM Technical Reference for Digitized Type G544-3516 (not avlbl. here)
IBM VS FORTRAN Language and Library Reference SC26-4119-1
IBM Displaywriter Host Attach Programming Guide
IBM DCA RFT Reference
IBM 3270 Information Display System,
         Character set Reference GA-27-2837-9
IBM GDDM (which one, the set here is not complete)
IBM 3800 Printing Subsystem Model 3 Font Catalog SH35-0053 (id.)
IBM 3800 (another programming guide)
IBM 4250 Printer

The code pages of the 4250 are particularly nice. I selected the codes
for six important characters: left square bracket, backslash, right
square bracket, left brace, vertical bar, right brace (in that order):
(the square brackets are here AD and BD; I can type them at my 3278 only
in hexadecimal.)

IBM 4250 Code Pages
              AFTC   [  \  ]  {  |  }
German        0382  63 EC FC 43 CC DC
Belgium       0383  4A 48 5A 51 DD 54
Brazil        0384  71 E0 68 CF 48 51
Canada F      0385  44 5A 79 51 DD 54
Denm./Norway  0386  9E E0 5A 9C 70 47
Sweden/Finl.  0387  B5 71 5A 43 CC 47
France        0388  90 48 B5 51 DD 54
Italy         0389  90 48 51 44 CD 54
Japan         0390  B1 B2 6A C0 4F D0
Latin Am.(Sp) 0393  4A E0 5A C0 4F D0
Portugal      0391  4A 68 5A 46 CF D0
Spain         0392  4A E0 5A C0 4F D0
UK, Aus. NZ.  0394  B1 E0 6A C0 4F D0
US, Canada E  0395  B0 E0 6A C0 4F D0
International 0361  4A E0 5A C0 6A D0
APL           0293  AD B7 BD    BF
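The "Cartesian product" can be quantified from the six columns of this table. The Python sketch below transcribes the 4250 table as given, reading the APL row's alignment as having no braces and the vertical bar at BF (our interpretation). It then asks how many different bytes one character takes and how many different characters one byte can mean. The helper names are hypothetical.

```python
# The six characters, in the column order of the 4250 table.
CHARS = "[\\]{|}"

# Transcribed from the table; "--" marks an empty cell (our reading of the APL row).
PAGES = {
    "0382 German": "63 EC FC 43 CC DC",
    "0383 Belgium": "4A 48 5A 51 DD 54",
    "0384 Brazil": "71 E0 68 CF 48 51",
    "0385 Canada F": "44 5A 79 51 DD 54",
    "0386 Denm./Norway": "9E E0 5A 9C 70 47",
    "0387 Sweden/Finl.": "B5 71 5A 43 CC 47",
    "0388 France": "90 48 B5 51 DD 54",
    "0389 Italy": "90 48 51 44 CD 54",
    "0390 Japan": "B1 B2 6A C0 4F D0",
    "0393 Latin Am.(Sp)": "4A E0 5A C0 4F D0",
    "0391 Portugal": "4A 68 5A 46 CF D0",
    "0392 Spain": "4A E0 5A C0 4F D0",
    "0394 UK, Aus. NZ.": "B1 E0 6A C0 4F D0",
    "0395 US, Canada E": "B0 E0 6A C0 4F D0",
    "0361 International": "4A E0 5A C0 6A D0",
    "0293 APL": "AD B7 BD -- BF --",
}


def codes(page: str) -> dict[str, int]:
    """Character -> byte for one code page, skipping empty cells."""
    return {ch: int(h, 16) for ch, h in zip(CHARS, PAGES[page].split()) if h != "--"}


def distinct_codes(ch: str) -> set[int]:
    """Every byte used for `ch` somewhere in the table."""
    return {codes(p)[ch] for p in PAGES if ch in codes(p)}


def meanings(byte: int) -> set[str]:
    """Every one of the six characters that `byte` stands for in some page."""
    return {ch for p in PAGES for ch, b in codes(p).items() if b == byte}


if __name__ == "__main__":
    for ch in CHARS:
        print(ch, len(distinct_codes(ch)))
    print(sorted(meanings(0x51)))  # [']', '{', '}']
```

On this transcription, for example, the left square bracket takes ten different bytes across the sixteen pages, and byte 51 means a left brace, a right brace or a right bracket depending on the page.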

"Ma in Espagna che glie mille e tre."

I suppose that the ordinary national code pages are only a little bit
less confusing. Mr. Stolz's letter arrived here in good order, only his
German jokes missed their point, because our STC/Siemens laser printer
closely follows the GT12 convention from the IBM 3800 software, and
prints mostly spaces for accented letters, and a vertical bar for the
exclamation sign (the same for Mr. Klensin's broken bar).

I have not seen IBM's latest invention according to Mr. Stolz as yet,
but I remain sceptical. I think 9 tables are still too much. One single
universal code page is what is wanted. Until it is shown that really
not all characters can be accommodated, I remain unconvinced. This code
page can be used for BITNET interchange of text. Locally it can be
converted to the historical version.

There are more things in Mr. Stolz's letter that deserve comment, but
that will come later. Just now I propose to you a little experiment.

Suppose someone in Norway is typing a text at a PC. Of course it
contains many AE, ae, A-ring and a-ring, barred O and o, and so on. Now
he transfers the text by Kermit to an IBM system, by EARN to a VAX
system, and by Kermit to the PC again. This involves conversion from
ASCII to EBCDIC and back. Now he does the same, but first to a VAX, then
again to an IBM, and again to the PC. The difference between the two he
cannot explain, but we can. It is sufficient to take the sequence:
AE O/ A0 ae o/ a0
(I suppose it is clear what I mean. German users may take A O U a o u
with umlauts.)

 FROM  J. W. van Wingen    MOSGLA@HLERUL2   :
     Mail to                                :
 P. O. Box 486,  2300AL Leiden, Netherlands :

21-Apr-88 22:47:24-EST,6910;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Thu 21 Apr 88 22:47:17-EST
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Thu, 21 Apr 88 22:47:08 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 8245; Thu, 21 Apr 88 22:47:06 EDT
Received: by BITNIC (Mailer X1.25) id 6021; Thu, 21 Apr 88 22:45:05 EDT
Date:         Tue, 19 Apr 88 14:43:00 MET
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
From:         Johan van Wingen <MOSGLA%HLERUL2.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      more comments on O. Stolz
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>

Dear list subscribers

One should never quote texts from memory. The correct version of
Leporello's aria from Don Giovanni is:

Madamina!
Il catalogo è questo
delle belle ch'amò il padron mio;
un catalogo egli è, ch'ho fatto io;
osservate, leggete con me!
In Italia sei cento e quaranta;
in Alemagna cento e trent'una;
cento in Francia,
in Turchia novant'una;
ma in Ispagna son già mille e tre!

This text in Italian contains some accented letters. The question is how
to transmit this by BITNET correctly. I put carefully (with ISPF/PDF
CHANGE) the corresponding CP500 codes at the right places, hoping that
you could at least print it. But that may not work everywhere. I also
could put in the ISO6937-2 designations (also used by IBM), preceded by
a & for identification. But this is hard to read. The other way out is
to devise a notation, that only uses the 94 character set, and is
suitable for conversion by a little program to the extended local
printer set. This is what we use here. As long as there is no universal
code, we should agree on temporary solutions. What is your proposal for
transmitting texts?

Madamina!
Il catalogo &LE13 questo
delle belle ch'am&LO13 il padron mio;
un catalogo egli &LE13, ch'ho fatto io;
osservate, leggete con me!
In Italia sei cento e quaranta;
in Alemagna cento e trent'una;
cento in Francia,
in Turchia novant'una;
ma in Ispagna son gi&LA13 mille e tre!

Madamina!
Il catalogo \e questo
delle belle ch'am\o il padron mio;
un catalogo egli \e, ch'ho fatto io;
osservate, leggete con me!
In Italia sei cento e quaranta;
in Alemagna cento e trent'una;
cento in Francia,
in Turchia novant'una;
ma in Ispagna son gi\a mille e tre!
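Either notation needs only the 94-character set plus a "little program" at the receiving end. A minimal Python sketch of such a program follows. Its lookup tables are hypothetical and cover only the three letters in the aria: LA13 is given in this thread as "small letter a with grave accent", and LE13 and LO13 are read the same way for e and o. The sketch produces Unicode text rather than codes for a particular local printer.

```python
import re

# Hypothetical lookup tables covering only the three accented letters in the aria.
BY_IDENTIFIER = {"LA13": "à", "LE13": "è", "LO13": "ò"}
BY_BACKSLASH = {"a": "à", "e": "è", "o": "ò"}


def expand_identifiers(text: str) -> str:
    """Replace '&' plus a four-character identifier (e.g. &LE13); unknown ones stay as-is."""
    return re.sub(r"&([A-Z]{2}\d{2})",
                  lambda m: BY_IDENTIFIER.get(m.group(1), m.group(0)), text)


def expand_backslashes(text: str) -> str:
    """Replace a backslash followed by a, e or o with the grave-accented letter."""
    return re.sub(r"\\([aeo])", lambda m: BY_BACKSLASH[m.group(1)], text)


if __name__ == "__main__":
    print(expand_identifiers("ma in Ispagna son gi&LA13 mille e tre!"))
    print(expand_backslashes("delle belle ch'am\\o il padron mio;"))
```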

The following comments pertain to Mr. Otto Stolz's letter:

>Moreover, IBM has defined 9 (nine) Country Extended Code Pages (serving
>17 languages), which contain the characters of GCS 00697 in various per-
>mutations.  Again, you should make sure that you use the new CECPs.

What is the relation between these and CP500, and what is the source
document?

>Clearly, the next step to be required from IBM must be adapting their
>language processors to the CECPs.  Recognizing dual EBCDIC codes for
>some characters, is not enough for the compilers and other applications:
>as long as there are various EBCDICs (call them CECPs or what you want),
>you must be able to customize them for the variant to be used|  Folks,
>please help convincing IBM by sending in as many APARs as you have pro-
>ducts.  The same holds for other software suppliers.

This is a most unfortunate idea. If people want to adapt compilers to their
own local or national conventions, it is at their own risk.

>But now, for the difference between plain text and programs.  In addition
>............................................
>Now, you probably understand IBM's reluctance to a single universal CECP.

This passage is completely incomprehensible to me, because nothing is
printed here as was presumably intended.

>BITNET is primarily designed for transferring messages, i.e. plain
>text.  Let's set a comparatively humble goal, for the moment:
>      BITNET should transmit any plain text consisting of characters
>      from the ISO 8859-1 character set (i.e. GCS 00697) sensibly
>      and undisturbed.

BITNET does not transmit characters, but bytes. On receipt, a text
has to be interpreted, shown on a screen, or printed. This requires
a key, as with every coded message. For BITNET there is a default,
94-character EBCDIC, not ASCII or ISO8859-1. Any deviations are "subject
to mutual agreement between the interchange parties", as ISO puts
it. This does not mean that use of the other bytes is
forbidden, only there is no fixed interpretation for them.  As there is
no agreement between Mr. Stolz and me about the meaning of the codes he
uses for German letters, I cannot read his German texts.
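The point that BITNET moves bytes and not characters can be shown in a few lines: the same two bytes read as different characters depending on which EBCDIC table the reader assumes. Python's `cp037` (US) and `cp500` (international) codecs serve here as two possible "keys"; the function name `read` is hypothetical.

```python
def read(data: bytes, key: str) -> str:
    """Interpret received bytes with an agreed code page (the 'key')."""
    return data.decode(key)


if __name__ == "__main__":
    received = bytes([0x4A, 0x5A])
    print(read(received, "cp037"))  # ¢!
    print(read(received, "cp500"))  # []
```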

>The price for sending out the notes in a local code variant (well, that's
>the very procedure, most sites are following right now) will be the
>obligation of translating incoming messages.  So, every site will use at
>most 9 (of the possible 90) translation tables.  Again, this could be
>done via SET INPUT and SET OUTPUT, as MM suggested (that's exactly the
>way, I read notes from USA and elsewhere).  Later, the mailer, or RSCS,
>or some similar software piece, would do the translating for all incoming
>files, and the end-user will cease bothering with the details.

Again, this is a misconception. We want to extend the default set of 94
characters by a commonly agreed code table. Just as people who cross their
national border have to speak a different language, we should change our
computer-using habits when using BITNET. We are used to speaking English
internationally; we should agree upon one single code system, be it
computer-English or computer-Esperanto. And if a single code page is
technically not possible, we should tell the recipient beforehand which
internationally accepted alternative he is going to receive.
We do not even need IBM for doing this. We only have to provide a local
facility to produce the right codes from a local text, and a printer
with all the necessary characters. All the translate tables required we
can build ourselves. (The 3800 software, even the non-APA, allows you to
construct your own character tables.)

>for a feasible common base of our half-tables.  But, what about IBM's
>character identifiers, accompanying the GCS and CECP tables?  Instead of
>"small letter a", we could use "LA01"; instead of "small letter a with
>grave accent" we say "LA13", and instead of "small diphthong a with e",
>we have "LA51".

Those identifiers are not IBM's, but those found in ISO6937-2, with a
few extensions, such as for the Dutch guilder.

Yours faithfully, Johan van Wingen

 FROM  J. W. van Wingen    MOSGLA@HLERUL2   :
     Mail to                                :
 P. O. Box 486,  2300AL Leiden, Netherlands :

22-Apr-88 09:52:59-EST,3037;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Fri 22 Apr 88 09:52:54-EST
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Fri, 22 Apr 88 09:52:45 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 8564; Fri, 22 Apr 88 09:52:44 EDT
Received: by BITNIC (Mailer X1.25) id 9098; Fri, 22 Apr 88 09:51:57 EDT
Date:         Fri, 22 Apr 88 13:19:48 +0200
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
From:         Andre PIRARD <A-PIRARD%BLIULG11.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      Re: BITNET's current mapping
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>
In-Reply-To:  Message of Wed, 20 Apr 88 19:09:20 EST from <KESICH@NYUCIMSA>

>What is BITNET's current ASCII-EBCDIC standard?
>and where may one obtain a copy?
>Thanks in advance.

ASCII/EBCDIC translation on BITNET occurs in the gateways connecting
it to ASCII networks.  For a long time, WISCVM carried a large share of
the public traffic (which eventually killed it).  Its successors have
probably inherited the same tables.  ASCII end nodes deal with their own
requirements.
I have found that the WISCVM tables match the standard CMS KERMIT tables,
dumped below, as do many other products and the ASCII data they have left
stored on servers.  Many other gateways probably use the same tables.
These tables have been tested with UUENCODED data, involving the
95 printable characters, that is, all but the control codes and DEL.
The focus is on A->E; E->A has a couple of non-reversible additions.
Sorry for giving only a raw dump; this had to be a quick answer.

André

TDUMP A (ASCII -> EBCDIC)
00010203 372D2E2F 1605250B 0C0D0E0F
10111213 3C3D3226 18193F27 1C1D1E1F
405A7F7B 5B6C507D 4D5D5C4E 6B604B61
F0F1F2F3 F4F5F6F7 F8F97A5E 4C7E6E6F
7CC1C2C3 C4C5C6C7 C8C9D1D2 D3D4D5D6
D7D8D9E2 E3E4E5E6 E7E8E9AD E0BD5F6D
79818283 84858687 88899192 93949596
979899A2 A3A4A5A6 A7A8A9C0 4FD0A107
00010203 372D2E2F 1605250B 0C0D0E0F
10111213 3C3D3226 18193F27 1C1D1E1F
405A7F7B 5B6C507D 4D5D5C4E 6B604B61
F0F1F2F3 F4F5F6F7 F8F97A5E 4C7E6E6F
7CC1C2C3 C4C5C6C7 C8C9D1D2 D3D4D5D6
D7D8D9E2 E3E4E5E6 E7E8E9AD E0BD5F6D
79818283 84858687 88899192 93949596
979899A2 A3A4A5A6 A7A8A9C0 4FD0A107

TDUMP E (EBCDIC -> ASCII)
00010203 0009007F 0000000B 0C0D0E0F
10111213 00000800 18190000 1C1D1E1F
00000000 000A171B 00000000 00050607
00001600 00000004 00000000 1415001A
20000000 00000000 00005C2E 3C282B7C
26000000 00000000 00002124 2A293B5E
2D2F0000 00000000 00007C2C 255F3E3F
00000000 00000000 00603A23 40273D22
00616263 64656667 6869007B 00000000
006A6B6C 6D6E6F70 7172007D 00000000
007E7374 75767778 797A0000 005B0000
00000000 00000000 00000000 005D0000
7B414243 44454647 48490000 00000000
7D4A4B4C 4D4E4F50 51520000 00000000
5C005354 55565758 595A0000 00000000
30313233 34353637 38397C00 00000000
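The two dumps above can be checked mechanically. The following Python sketch is an illustration, not the gateway or CMS Kermit code, and the names `A_TO_E`, `E_TO_A`, `round_trips` and `collisions` are invented here. It copies the hex rows as posted and doubles the first 128 A->E entries, because the dump repeats them for 80-FF. It then tests which ASCII codes survive ASCII -> EBCDIC -> ASCII and which EBCDIC codes collide on the way back. It treats an E->A result of 00 as "unmapped", which is an assumption about the dump.

```python
from collections import defaultdict

# Rows copied from "TDUMP A" above (ASCII 00-7F). The dump repeats these
# rows for 80-FF, i.e. the eighth bit is simply ignored.
A_TO_E_HEX = """
00010203 372D2E2F 1605250B 0C0D0E0F
10111213 3C3D3226 18193F27 1C1D1E1F
405A7F7B 5B6C507D 4D5D5C4E 6B604B61
F0F1F2F3 F4F5F6F7 F8F97A5E 4C7E6E6F
7CC1C2C3 C4C5C6C7 C8C9D1D2 D3D4D5D6
D7D8D9E2 E3E4E5E6 E7E8E9AD E0BD5F6D
79818283 84858687 88899192 93949596
979899A2 A3A4A5A6 A7A8A9C0 4FD0A107
"""

# Rows copied from "TDUMP E" above (EBCDIC 00-FF).
E_TO_A_HEX = """
00010203 0009007F 0000000B 0C0D0E0F
10111213 00000800 18190000 1C1D1E1F
00000000 000A171B 00000000 00050607
00001600 00000004 00000000 1415001A
20000000 00000000 00005C2E 3C282B7C
26000000 00000000 00002124 2A293B5E
2D2F0000 00000000 00007C2C 255F3E3F
00000000 00000000 00603A23 40273D22
00616263 64656667 6869007B 00000000
006A6B6C 6D6E6F70 7172007D 00000000
007E7374 75767778 797A0000 005B0000
00000000 00000000 00000000 005D0000
7B414243 44454647 48490000 00000000
7D4A4B4C 4D4E4F50 51520000 00000000
5C005354 55565758 595A0000 00000000
30313233 34353637 38397C00 00000000
"""

def load(hex_rows: str) -> bytes:
    """Turn whitespace-separated hex rows into a byte string."""
    return bytes.fromhex("".join(hex_rows.split()))

A_TO_E = load(A_TO_E_HEX) * 2   # second half duplicates the first
E_TO_A = load(E_TO_A_HEX)
assert len(A_TO_E) == 256 and len(E_TO_A) == 256

def round_trips(codes):
    """ASCII codes that come back unchanged after A->E followed by E->A."""
    return [c for c in codes if E_TO_A[A_TO_E[c]] == c]

def collisions():
    """Nonzero ASCII results that are reached from more than one EBCDIC code."""
    sources = defaultdict(list)
    for ebcdic, ascii_code in enumerate(E_TO_A):
        if ascii_code:          # 00 also stands for "no mapping" in this dump
            sources[ascii_code].append(ebcdic)
    return {a: es for a, es in sources.items() if len(es) > 1}

printable = range(0x20, 0x7F)
print(len(round_trips(printable)), "of", len(printable), "printable codes round-trip")
for a, es in sorted(collisions().items()):
    print(repr(chr(a)), "<-", " ".join(f"{e:02X}" for e in es))
```

Against the rows as posted, every 7-bit code, including the 95 printable ones, should come back unchanged. The only collisions (the non-reversible additions) are backslash from 4A and E0, left brace from 8B and C0, vertical bar from 4F, 6A and FA, and right brace from 9B and D0.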
23-Apr-88 22:37:33-EST,1548;000000000001
Return-Path: <protocols-request@rutgers.edu>
Received: from rutgers.edu by CU20B.COLUMBIA.EDU with TCP; Sat 23 Apr 88 22:37:28-EST
Received: by rutgers.edu (5.54/1.15) 
	id AA25258; Sat, 23 Apr 88 19:15:30 EDT
Received: by ucbvax.Berkeley.EDU (5.59/1.28)
	id AA13912; Fri, 22 Apr 88 22:27:13 PDT
Received: from USENET by ucbvax.Berkeley.EDU with netnews
	for protocols@rutgers.edu (protocols@rutgers.edu)
	(contact usenet@ucbvax.Berkeley.EDU if you have questions)
Date: 22 Apr 88 20:08:03 GMT
From: mnetor!utzoo!utgpu!water!watmath!egisin@uunet.uu.net  (Eric Gisin)
Organization: U of Waterloo, Ontario
Subject: Re: UUCP over X25 on Sun 3
Message-Id: <18471@watmath.waterloo.edu>
References: <287@tauros.UUCP>, <19772@pyramid.pyramid.com>, <20060@pyramid.pyramid.com>
Sender: protocols-request@rutgers.edu
To: protocols@rutgers.edu

In article <20060@pyramid.pyramid.com>, csg@pyramid.pyramid.com (Carl S. Gutekunst) writes:
> [...]
> The 7-bit-printable-ASCII restriction comes from international X.25 gateways,
> many of which insist on swiping the eigth bit for parity or somesuch. A few
> also do funny mappings of control characters, like munging tabs. If I set up a
> raw X.25 virtual circuit between here and West Germany, it will be 7 bits and
> there is nothing I can do about it.

It's difficult to believe CCITT would be so stupid as to allow this in X.25 VCs.
Maybe I'll have one last look at the Red Book to verify it.
What happens if one wants to run IP, DECNET, or OSI across such a gateway?
I guess you don't.
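A gateway that "swipes the eighth bit" can be modelled as masking every byte with 7F. The short Python sketch below assumes exactly that. Real gateways may also alter control characters, which is not modelled here. It shows what happens to ISO 8859-1 text sent raw across such a path: each accented letter silently turns into an unrelated 7-bit character.

```python
def strip_eighth_bit(data: bytes) -> bytes:
    """Model a 7-bit path that reuses bit 8 for parity: only bits 0-6 survive."""
    return bytes(b & 0x7F for b in data)

for word in ["Andr\u00e9", "gi\u00e0", "ASCII"]:
    sent = word.encode("latin-1")                  # raw ISO 8859-1 bytes
    received = strip_eighth_bit(sent).decode("ascii")
    print(f"{word!r:10} -> {received!r}")
```

"André" arrives as "Andri" and "già" as "gi`". Plain ASCII passes untouched, which is why anything that must carry 8-bit data needs an extra encoding layer on such links.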
26-Apr-88 12:48:20-EDT,1440;000000000001
Mail-From: SY.FDC created at 26-Apr-88 12:48:17
Date: Tue 26 Apr 88 12:48:17-EDT
From: Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>
Subject: MacKermit modifications
To: placeway@TUT.CIS.OHIO-STATE.EDU
cc: sy.christine@CU20B.COLUMBIA.EDU
Message-ID: <12393565441.48.SY.FDC@CU20B.COLUMBIA.EDU>

Paul, here are more people champing at the bit...  Sounds like they're
working on a pretty old version, but the ISO8859 stuff is a big plus for
the Europeans.  Any news?  Haven't heard from you for a while, and we're
starting to get a little anxious...  - Frank
                ---------------

To: hafro!comp-protocols-kermit@uunet.UU.NET
Path: krafla!frisk
From: mcvax!rhi.hi.is!frisk@uunet.UU.NET (Fridrik Skulason)
Newsgroups: comp.protocols.kermit
Subject: MacKermit modifications
Date: 25 Apr 88 17:27:12 GMT
Organization: University of Iceland (RHI)

Here at the University we have made a few modifications to MacKermit

	     *  #ifdef..#endif for MPW
	     *  Full 8 bit (ISO 8859/1) Terminal emulation support
		with automatic character set conversion.

Is anyone else working on similar changes? If not, do you think someone
would be interested in receiving our modifications ?

         Fridrik Skulason          University of Iceland
         UUCP  frisk@rhi.uucp      BIX  frisk
-------
29-Apr-88 11:41:39-EDT,2968;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Fri 29 Apr 88 11:41:38-EDT
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Fri, 29 Apr 88 11:38:13 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 6897; Fri, 29 Apr 88 11:37:55 EDT
Received: by BITNIC (Mailer X1.25) id 5773; Fri, 29 Apr 88 11:37:17 EDT
Date:         Thu, 28 Apr 88 22:10:00 MET
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
From:         Johan van Wingen <MOSGLA%HLERUL2.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      corrections
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>


Dear list subscribers

Here are a few corrections and additions to my previous letters.

>printer set. This is what we use here. As long there is universal code
>we should agree on temporary solutions. What is your proposal for
>transmitting texts?
 printer set. This is what we use here. As long as there is NO universal
 code we should agree on temporary solutions. What is your proposal for
 transmitting texts?

The translation of Leporello's aria is:
Madamina!                            Dear Miss!
Il catalogo \e questo                Here is the catalog
delle belle ch'am\o il padron mio;   of the beauties my master courted;
un catalogo egli \e, ch'ho fatto io; it is a catalog that I made myself;
osservate, leggete con me!           look, read with me!
In Italia sei cento e quaranta;      In Italy 640,
in Alemagna cento e trent'una;       in Germany 131,
cento in Francia,                    100 in France,
in Turchia novant'una;               in Turkey 91,
ma in Ispagna son gi\a mille e tre!  but in Spain there are already 1003!

I hope that the number of entries in IBM's Code Page Catalog is
considerably smaller.

In the following table $o and $O should be moved one column to the right

  REPRESENTATION OF LETTERS FROM ISO 8859-1 WITH CP500 TABLE

     4. 5. 6. 7. 8. 9. A. B. C. D. E. F.

 .0                 $o $O
 .1     /e    /E  a  j        A  J     1
 .2  ^a ^e ^A ^E  b  k  s     B  K  S  2
 .3  %a %e %A %E  c  l  t     C  L  T  3
 .4  \a \e \A \E  d  m  u     D  M  U  4
 .5  /a /i /A /I  e  n  v     E  N  V  5
 .6  ~a ^i ~A ^I  f  o  w     F  O  W  6
 .7  @a %i @A %I  g  p  x     G  P  X  7
 .8  $c \i $C \I  h  q  y     H  Q  Y  8
 .9  ~n &s ~N     i  r  z     I  R  Z  9
 .A
 .B                          ^o ^u ^O ^U
 .C              $d &a $D    %o %u %O %U
 .D              /y    /Y    \o \u \O \U
 .E              $p &A $P    /o /u /O /U
 .F                          ~o %y ~O
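The backslash, slash, caret and similar prefixes used in the aria and in the table above amount to a transliteration convention. Read off the table, they appear to mean: \ grave, / acute, ^ circumflex, ~ tilde, % diaeresis, @ ring. The pairs $c, $o, $d, $p, &a and &s stand for c-cedilla, o-slash, eth, thorn, ae and sharp s. The Python sketch below assumes that reading, which is not an agreed standard, and converts such text to real ISO 8859-1 letters. `untransliterate` is a hypothetical name. The sketch converts every prefix-plus-letter pair blindly, so ordinary text such as "and/or" would become "andór". A production tool would need an escape mechanism.

```python
import re
import unicodedata

# Prefixes as read from the table above (an interpretation, not a standard).
ACCENTS = {"\\": "\u0300",  # grave
           "/": "\u0301",   # acute
           "^": "\u0302",   # circumflex
           "~": "\u0303",   # tilde
           "%": "\u0308",   # diaeresis
           "@": "\u030a"}   # ring above

# Letters that are not "accent + base letter", written with $ or &.
SPECIALS = {"$c": "\u00e7", "$C": "\u00c7",   # c cedilla
            "$o": "\u00f8", "$O": "\u00d8",   # o slash
            "$d": "\u00f0", "$D": "\u00d0",   # eth
            "$p": "\u00fe", "$P": "\u00de",   # thorn
            "&a": "\u00e6", "&A": "\u00c6",   # ae ligature
            "&s": "\u00df"}                   # sharp s

PAIR = re.compile(r"[\\/^~%@$&][A-Za-z]")

def _expand(match):
    pair = match.group(0)
    if pair in SPECIALS:
        return SPECIALS[pair]
    mark = ACCENTS.get(pair[0])
    if mark is None:                  # e.g. "$x": not a known letter, keep it
        return pair
    composed = unicodedata.normalize("NFC", pair[1] + mark)
    # Accept only results that are a single ISO 8859-1 character.
    return composed if len(composed) == 1 and ord(composed) <= 0xFF else pair

def untransliterate(text: str) -> str:
    """Replace prefix+letter transliterations with ISO 8859-1 letters."""
    return PAIR.sub(_expand, text)

line = "ma in Ispagna son gi\\a mille e tre!"
print(untransliterate(line))
print(untransliterate(line).encode("latin-1"))
```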

Best regards, Johan van Wingen


 FROM  J. W. van Wingen    MOSGLA@HLERUL2   :
     Mail to                                :
 P. O. Box 486,  2300AL Leiden, Netherlands :

10-May-88 08:17:46-EDT,3091;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Tue 10 May 88 08:17:44-EDT
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Tue, 10 May 88 08:15:05 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 7650; Tue, 10 May 88 08:15:04 EDT
Received: by BITNIC (Mailer X1.25) id 7519; Tue, 10 May 88 08:16:11 EDT
Date:         Tue, 10 May 88 13:08:00 MET
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
From:         Johan van Wingen <MOSGLA%HLERUL2.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      EBCDIC-1992
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>

Dear list subscribers

After having read the whole discussion again, I think a solution
can be proposed that is realizable without asking IBM or our managements
for any important changes.

1.  We take ISO8859-1 as is, other parts are for later consideration.
2.  We agree on a single, universal and international version (code
    page) of EBCDIC, that contains all the characters from ISO8859-1.
    This we call EBCDIC-1992.
3.  We agree on a single translate table between ISO8859-1 and
    EBCDIC-1992. (It is clear that the code page determines the
    translate table, or vice versa. Which is fixed first does not really
    matter, it depends on what is the more difficult: changing the
    existing translate tables, or the code pages.)

These points I call the 1992-convention. (The details are subject to
further discussion.)  People adhering to this convention will send and
receive their EARN/BITNET files using EBCDIC-1992. This requires at an
IBM installation:

a.  a program that is able to translate files from EBCDIC-1992 to
    local EBCDIC and back (such programs I can write in SNOBOL straight
    away).
b.  a printing facility providing all the 190 characters from ISO8859-1.
c.  a typing convention enabling the user to type transliterations of
    the 190 characters, using only 47 keys on a normal keyboard.
d.  a program converting the transliterations to local EBCDIC or
    EBCDIC-1992 (see c.).
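Item a. amounts to composing two translate tables through ISO 8859-1. Since EBCDIC-1992 exists only as a proposal, the Python sketch below assumes, for illustration, that the local code page is CP 500 and that EBCDIC-1992 is CP 037. It uses Python's built-in `cp500` and `cp037` codecs, each of which maps all 256 byte values one-to-one onto ISO 8859-1. The function name `table_via_latin1` is hypothetical.

```python
def table_via_latin1(src_codepage: str, dst_codepage: str) -> bytes:
    """Build a 256-byte translate table: src EBCDIC -> ISO 8859-1 -> dst EBCDIC."""
    as_latin1 = bytes(range(256)).decode(src_codepage)  # one character per byte value
    return as_latin1.encode(dst_codepage)              # each character back to one byte

# Hypothetical assignment for illustration only:
# the site's local code page is CP 500, and "EBCDIC-1992" is taken to be CP 037.
LOCAL, EBCDIC_1992 = "cp500", "cp037"
LOCAL_TO_1992 = table_via_latin1(LOCAL, EBCDIC_1992)
FROM_1992_TO_LOCAL = table_via_latin1(EBCDIC_1992, LOCAL)

local_file = "[Gr\u00fc\u00dfe aus Li\u00e8ge!]".encode(LOCAL)
outgoing = local_file.translate(LOCAL_TO_1992)       # before sending on EARN/BITNET
incoming = outgoing.translate(FROM_1992_TO_LOCAL)    # at the receiving site
print(outgoing.hex())
print(incoming == local_file)
```

Both code pages cover the whole of ISO 8859-1, so the composed tables are permutations of the 256 byte values and nothing is lost in either direction. A local code page lacking some ISO 8859-1 characters would make `encode` raise an error instead.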

Except for the printer, this scheme does not require any new hardware on
the IBM side. (I know too little about VAX to tell what has to be done
there; in any case, the conversion table must be installed.)
One advantage of choosing a version of EBCDIC for text interchange is
that texts appearing on the screen are readable to a large extent,
because EBCDIC-1992 does not change the simple Latin letters, the digits
and many specials.

It may be that after much negotiation with IBM a better solution will
turn up, but that could take years. We need something now. I await
your reaction.

Yours faithfully, Johan van Wingen


 FROM  J. W. van Wingen    MOSGLA@HLERUL2   :
     Mail to                                :
 P. O. Box 486,  2300AL Leiden, Netherlands :

17-May-88 17:05:13-EDT,4311;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Tue 17 May 88 17:05:09-EDT
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Tue, 17 May 88 17:02:29 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 7118; Tue, 17 May 88 17:02:27 EDT
Received: by BITNIC (Mailer X1.25) id 0255; Tue, 17 May 88 16:41:08 EDT
Date:         Tue, 17 May 88 11:31:27 CDT
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
From:         Michael Sperberg-McQueen <U18189%UICVM.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      Extended character sets, translation table implemented
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>

Remembering Howard Gilbert's warning "A curse upon anyone who actually
puts them into production before the community as a whole agrees to
them," I have been careful not to put them into production yet, but we
have now successfully installed for testing the translate tables for ISO
8859/1 and the U.S. Country Extended Code Page which were discussed
here.  The translations for graphics are those proposed by Howard
Gilbert in the message posted on 15 March 1988 (original date 3
September 1987) as modified in the subsequent discussion and summarized
in Alain Fontaine's posting of 25 March 1988.  Control characters,
including DUP and FM, are handled as in the standard Yale ASCII / IBM
7171 tables (except that CR, NL, EM, and FF are *not* given their own
code points in the protocol converter:  any device that needs the
special handling provided by Yale ASCII will have to use the vanilla
tables).

So far, so good.  I had a little trouble getting the 7171 to use the
correct (modified) tables, and of course they take a Kb or so of
precious RAM, but they seem to work correctly.  When I load the upper
half of ISO 8859/1 into my IBM3163 and display a hex chart, it looks
like the U.S. Country Extended Code Page distributed last August at
SHARE.  (Will someone who knows please tell those of us who can't get
the documents whether the US CECP is Code Page 500 or Code Page 037, or
what?  If you want, I'll send you a list of code points and their
graphics, but somebody *please* tell me what code page we've
implemented!)

Results:  one problem so far.  Because ISO, against its own rules (ISO
2022), put a graphic in position FF, which cannot be mapped into a
seven-bit data stream, that graphic (y-with-diaeresis, EBCDIC DF in the
US CECP) does not show up on my screen.

Questions for the group:

1 should EBCDIC DF be returned as an illegal graphic, or what?  (I'm
letting it be, even though it doesn't display.  So far it doesn't seem
to be blowing anything up.)

2 should any compromises be made with the 94-character ASCII-to-EBCDIC
translations users have become accustomed to?  That is, should the
188-character translate tables allow users to type ASCII caret and get
EBCDIC logical not, or should they insist on the new ASCII logical not?
In other words, should the 188-character tables be a superset of the
94-character tables, or not?

(I thought about this, at the last minute, and then decided I didn't
want to have to explain, ten years from now, that we had a chance for a
1-to-1 ASCII-EBCDIC conversion and passed it up because users were used
to typing '5' and getting '^'.  That's "typing '&carat.' and getting
'&logicalnot.'," for those who aren't looking at the same kind of screen
I am.Y  And the brackets I see are BA and BB -- apologies to those with
TN print trains.Y  So I implemented the table as it was discussed here,
no compromises with the old compromise solutions.  Before we make this
public here, I expect to write a couple execs to set and clear various
input and output mappings for CMS and Xedit, so people can simulate
the 94-character translate tables in CMS and Xedit, when they don't
need the whole thing.)
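Question 2 can be made concrete by comparing the printable part of the familiar 94-character table with a strict CP 037 mapping. The 94-character table here is the CMS Kermit / WISCVM ASCII->EBCDIC table posted earlier on this list. The Python sketch below takes Python's built-in `cp037` codec as the reference. Whether that codec really is the "US CECP" is an assumption, since the thread leaves the question open. `OLD_HEX` and `differences` are names invented here.

```python
# Printable part (ASCII 20-7E) of the ASCII->EBCDIC table posted earlier on
# this list; the trailing 07 for DEL is left out.
OLD_HEX = """
405A7F7B 5B6C507D 4D5D5C4E 6B604B61
F0F1F2F3 F4F5F6F7 F8F97A5E 4C7E6E6F
7CC1C2C3 C4C5C6C7 C8C9D1D2 D3D4D5D6
D7D8D9E2 E3E4E5E6 E7E8E9AD E0BD5F6D
79818283 84858687 88899192 93949596
979899A2 A3A4A5A6 A7A8A9C0 4FD0A1
"""
OLD = bytes.fromhex("".join(OLD_HEX.split()))   # 95 bytes, index 0 = ASCII 20

def differences(codepage="cp037"):
    """Printable ASCII characters that the old table places differently."""
    diffs = {}
    for offset, old_code in enumerate(OLD):
        ch = chr(0x20 + offset)
        new_code = ch.encode(codepage)[0]
        if new_code != old_code:
            # What a user of `codepage` sees at the old table's position.
            seen = bytes([old_code]).decode(codepage)
            diffs[ch] = (old_code, new_code, seen)
    return diffs

for ch, (old_code, new_code, seen) in differences().items():
    print(f"{ch!r}: old table {old_code:02X} (shown as {seen!r}), CP 037 {new_code:02X}")
```

Against the table as posted, only '[', ']' and '^' differ. The old table sends '^' to 5F, which CP 037 reads as the logical-not sign, and that is exactly the caret/logical-not compromise in question. It also sends '[' and ']' to AD and BD instead of BA and BB.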

Anyone interested in copies of the code (host-based Series/1 and
7171 macros) can have them if you send me your name and Bitnet address.

Michael Sperberg-McQueen, University of Illinois at Chicago
18-May-88 06:44:16-EDT,4044;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Wed 18 May 88 06:44:12-EDT
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Wed, 18 May 88 06:41:33 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 7674; Wed, 18 May 88 06:41:31 EDT
Received: by BITNIC (Mailer X1.25) id 6538; Wed, 18 May 88 06:42:30 EDT
Date:         Wed, 18 May 88 11:15:18 +0200
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
Comments:     Resent-From: Andre PIRARD <A-PIRARD@BLIULG11>
Comments:     Originally-From: Andre PIRARD <A-PIRARD@BLIULG11>
From:         Andre PIRARD <A-PIRARD%BLIULG11.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      Re: Extended character sets, translation table implemented
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>
In-Reply-To:  Your message of Tue, 17 May 88 11:31:27 CDT

>precious RAM, but they seem to work correctly.  When I load the upper
>half of ISO 8859/1 into my IBM3163 and display a hex chart, it looks
>like the U.S. Country Extended Code Page distributed last August at
>SHARE.  (Will someone who knows please tell those of us who can't get
>the documents whether the US CECP is Code Page 500 or Code Page 037, or
>what?  If you want, I'll send you a list of code points and their
>graphics, but somebody *please* tell me what code page we've
>implemented!)

In theory, your US CECP is code page 037. I can do a quick check if you
like. Note that 037 has the exclamation mark at 5A, whereas 500 has the
closing bracket there. To my dismay, I am bound to implement CP 500, but
I'll make it switchable between CP 037 and CP 500 through comments.
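The 5A test can be generalized. Listing every code point where two code pages disagree shows which positions to examine on a terminal. The Python sketch below assumes that Python's built-in `cp037` and `cp500` codecs match IBM's tables. `code_page_differences` is a hypothetical helper name.

```python
def code_page_differences(a="cp037", b="cp500"):
    """Map each EBCDIC code point to (graphic in a, graphic in b) where they differ."""
    diffs = {}
    for code in range(256):
        ga = bytes([code]).decode(a)
        gb = bytes([code]).decode(b)
        if ga != gb:
            diffs[code] = (ga, gb)
    return diffs

for code, (g037, g500) in sorted(code_page_differences().items()):
    print(f"{code:02X}: CP 037 {g037!r:8} CP 500 {g500!r}")
```

With those codecs it should report seven code points (4A, 4F, 5A, 5F, B0, BA, BB), with 5A among them. Y-diaeresis at DF is the same in both.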

>Results:  one problem so far.  Because ISO, against its own rules (ISO
>2022), put a graphic in position FF, which cannot be mapped into a
>seven-bit data stream, that graphic (y-with-diaeresis, EBCDIC DF in the
>US CECP) does not show up on my screen.

You're right! Johan will probably wake up on this one. I'll forward the
question to one of our experts who is not on the list.

>1 should EBCDIC DF be returned as an illegal graphic, or what?  (I'm
>letting it be, even though it doesn't display.  So far it doesn't seem
>to be blowing anything up.)

I think you are referring to DF as the RATS code to which invalid EBCDIC
codes are translated. It is ultimately output to the terminal as whatever
the terminal table contains at offset DF. So, in my view, DF can be replaced
by anything else, unless there is some hard-coded test for that value
somewhere in the code, and I see no reason why there should be.
In fact, this is the real heart of the problem: which codes in the RATS
or terminal table have hard-coded values?
The two-level translation seems to allow any value to be chosen in the RATS,
which is the 7171's own internal, hidden business.
So choosing it to be ISO8859 is a matter of convenience.
A couple of deviations here and there won't hurt if they are clearly documented.

>2 should any compromises be made with the 94-character ASCII-to-EBCDIC
>translations users have become accustomed to?  That is, should the
>188-character translate tables allow users to type ASCII carat and get
>EBCDIC logical not, or should they insist on the new ASCII logical not?
>In other words, should the 188-character tables be a superset of the
>94-character tables, or not?

I hate it. You have to switch to APL mode to reach ISO2022, haven't you?
I prefer to implement two modes:
1) non-APL mode for older terminals, for which installation-dependent
compromises are acceptable for convenience.
2) APL mode, switched to by intelligent terminals (mostly micros), which
have their own elaborate keyboard redefinition anyway.
Otherwise, one day we will meet true ISO hardware and start the story all
over again.
A standard is a standard. And to be a little bit incompatible ...

Any comment?

André.
18-May-88 13:35:57-EDT,4516;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Wed 18 May 88 13:35:43-EDT
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Wed, 18 May 88 13:33:02 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 8234; Wed, 18 May 88 13:33:00 EDT
Received: by BITNIC (Mailer X1.25) id 2709; Wed, 18 May 88 13:33:52 EDT
Date:         Wed, 18 May 88 10:46:13 CDT
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
From:         Michael Sperberg-McQueen <U18189%UICVM.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      Further on CP 037 DF = ISO 8859/5 FF = (SO) 7F.  Also APL mode
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>

Many thanks to Andre (Andr)) Pirard for his comments on my last posting.
He is right that the special treatment of EBCDIC DF (= ISO8859/1 FF)
could reflect the special treatment of code point DF inside the Series/1
or 7171, instead of or in addition to complications for the seven-bit
terminal data stream that now must accept a 7F (in the G1 set) as a
graphic.

To test this hypothesis, I altered the translation table on our test
Series/1, from:

    EBCDIC         Series/1       Terminal
    8E (thorn)     DE             (G1) 7E (thorn) = ISO8859/1 FE
    DF (y-diaer.)  DF             (G1) 7F (y-diaer.) = ISO    FF

to:

    EBCDIC         Series/1       Terminal
    8E (thorn)     DF             (G1) 7E (thorn) = ISO8859/1 FE
    DF (y-diaer.)  DE             (G1) 7F (y-diaer.) = ISO    FF

In both cases, lowercase thorn displayed properly; y with diaeresis did
not display.  So we can infer that the internal code point DF is not a
magic number that is used by other portions of the code, and that my
trouble displaying y-diaeresis results from its position in the
ISO8859/1 table.
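The reasoning behind this experiment can be modelled as two translate tables applied in sequence: host EBCDIC -> internal Series/1 code -> terminal (G1) byte. In the toy Python sketch below, only the two entries from the tables above are filled in, and all names are hypothetical. Renaming the internal codes consistently in both tables leaves the end-to-end result unchanged. An internal value can therefore only matter if some other part of the code tests for it explicitly, which is what the experiment was designed to detect.

```python
# Toy model of a protocol converter's two-level translation:
#   host EBCDIC byte -> internal (Series/1) code -> terminal G1 byte.
# Only the two entries from the tables above are included.
HOST_TO_INTERNAL = {0x8E: 0xDE, 0xDF: 0xDF}       # thorn, y-diaeresis (CP 037)
INTERNAL_TO_TERMINAL = {0xDE: 0x7E, 0xDF: 0x7F}   # G1 positions = ISO FE, FF

def translate(host_bytes, host_to_internal, internal_to_terminal):
    """Apply both tables in sequence to each host byte."""
    return bytes(internal_to_terminal[host_to_internal[b]] for b in host_bytes)

def swap_internal(a, b, host_to_internal, internal_to_terminal):
    """Exchange internal codes a and b consistently in both tables."""
    rename = {a: b, b: a}
    h2i = {h: rename.get(i, i) for h, i in host_to_internal.items()}
    i2t = {rename.get(i, i): t for i, t in internal_to_terminal.items()}
    return h2i, i2t

sample = b"\x8e\xdf"                               # thorn, y-diaeresis
swapped = swap_internal(0xDE, 0xDF, HOST_TO_INTERNAL, INTERNAL_TO_TERMINAL)
before = translate(sample, HOST_TO_INTERNAL, INTERNAL_TO_TERMINAL)
after = translate(sample, *swapped)
print(swapped[0], before == after)
```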

It seems likely that the Series/1 is actually sending out the desired
shift-out + 7F to the terminal; I don't have a line monitor but putting
the terminal into transparent mode allows me to see that it is receiving
something, which it displays the same way as it displays a 7F.

But the terminal, in normal operation, simply ignores ASCII DEL (7F)
when it is received, and so I cannot make ISO8859/1 FF display.  When
equipment built to handle ISO8859 comes out, I assume this will not be a
problem (unless one's data line refuses to transmit DEL); a PC running
Yterm, similarly, may have no trouble with the 7F.  It will only be
devices which rely on the ISO 2022 definition of eight-bit sets which
will have trouble.
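The shift-out mechanism described above can be sketched as a small encoder. The Python sketch below illustrates the ISO 2022 7-bit locking-shift convention (SO = 0E invokes G1, SI = 0F returns to G0). It is not the Series/1 or 7171 code, and `to_seven_bit` is a hypothetical name. Each ISO 8859-1 right-half byte is sent as the byte with its top bit cleared, preceded by SO when needed. The sketch shows directly why y-diaeresis (FF) goes out as SO + 7F, the DEL code the terminal discards. The no-break space (A0) lands on 20 in the same way. C1 controls (80-9F) are simply rejected here.

```python
SO, SI = 0x0E, 0x0F   # ISO 2022 locking shifts in a 7-bit environment

def to_seven_bit(data: bytes) -> bytes:
    """Send ISO 8859-1 over a 7-bit line using SO/SI around right-half graphics."""
    out = bytearray()
    shifted = False                      # True while G1 is invoked (after SO)
    for b in data:
        if 0x80 <= b < 0xA0:
            raise ValueError(f"C1 control {b:02X} not handled in this sketch")
        if b >= 0xA0:                    # right-half graphic -> G1, top bit cleared
            if not shifted:
                out.append(SO)
                shifted = True
            out.append(b & 0x7F)
        else:
            if shifted and 0x20 <= b <= 0x7E:   # G0 graphic: shift back in first
                out.append(SI)
                shifted = False
            out.append(b)                # C0 controls and DEL pass as-is
    if shifted:
        out.append(SI)                   # leave the line in the G0 state
    return bytes(out)

print(to_seven_bit("\u00fe \u00ff".encode("latin-1")).hex())
```

A real converter might skip the SI before a space, since SPACE is the same in both sets; the sketch keeps the simpler rule.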

2.  APL mode.

> You have to switch to APL mode to reach ISO2022 haven't you?

I am not quite sure what is meant.  On the IBM3163 I must hit an ALT-CHR
key (a control-key combination) to get to the G1 set; not too hard.  In
the VM host I do *not* need to set the CP APL mode, or SET APL ON in
Xedit.  When I do issue a SET APL ON in Xedit, I hang the Series/1, for
reasons I don't understand.  Probably it has to do with the fact that we
use SI/SO, but not the standard 3278 or 3277 APL conventions.

The tables we are using do not emulate either the 3277 or the 3278 APL
support -- that is, they do not insert 1D (IFS) or 08 (Graphic Escape)
in front of the extended characters when handing characters to the host,
nor do they expect 1D or 08 in front of the characters when taking a
write from the host.  That may be a dumb way to do it, and it certainly
makes the protocol converter look different from a real 3278 or 3277.
But it is simpler to see what is going on inside the protocol converter,
and 3277s and 3278s don't support code page 037 anyhow.  So I followed
the lead of Tom Denier (sp?) at Penn, from whom we got the tables for
support of the ALA character set.  Seems to work okay, and I don't
have to issue any special Xedit commands.  (We did have to change
the Xedit module to treat hex 41 through FE as displayable characters.
When we made FF displayable, it caused terminal errors and dropped
us into line-mode Xedit, so we backed out of that.  This risks data
loss if extended-EBCDIC files are edited on real 3270 devices, and
I keep expecting data loss if they are edited on terminals which use
the standard tables.  But so far, we haven't had any data lost at all.
Knock on wood!)

Michael Sperberg-McQueen
19-May-88 10:01:05-EDT,2066;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Thu 19 May 88 10:00:44-EDT
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Thu, 19 May 88 09:50:44 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 9434; Thu, 19 May 88 09:50:43 EDT
Received: by BITNIC (Mailer X1.25) id 7390; Thu, 19 May 88 09:51:39 EDT
Date:         Thu, 19 May 88 13:55:52 +0200
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
From:         Andre PIRARD <A-PIRARD%BLIULG11.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      Re: Further on CP 037 DF = ISO 8859/5 FF = (SO) 7F.  Also APL
              mode
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>
In-Reply-To:  Message of Wed, 18 May 88 10:46:13 CDT from <U18189@UICVM>

>> You have to switch to APL mode to reach ISO2022 haven't you?
>
>I am not quite sure what is meant.  On the IBM3163 I must hit an ALT-CHR
>key (a control-key combination) to get to the G1 set; not too hard.  In
>the VM host I do *not* need to set the CP APL mode, or SET APL ON in
>Xedit.  When I do issue a SET APL ON in Xedit, I hang the Series/1, for
>reasons I don't understand.  Probably it has to do with the fact that we
>use SI/SO, but not the standard 3278 or 3277 APL conventions.

I mean activating APL mode on the 7171 or S/1 only. On the 7171, this
uses an alternative set of terminal translate tables. My hope is that
it would make two translation modes possible. My fear is that it would
trigger the host input APL escaping too. And I would neither like nor
dare to turn on APL mode on the host.
The converters switch to and from APL mode with the setup functions
invoked by <introducer (usually ESC)> "accent" "lower/upper letter A".
Is that your ALT-CHR, or is ALT-CHR the way you enter SO, which should
also be Ctrl-N, shouldn't it?

André.
19-May-88 12:45:19-EDT,1637;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Thu 19 May 88 12:45:14-EDT
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Thu, 19 May 88 12:42:36 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 0126; Thu, 19 May 88 12:42:35 EDT
Received: by BITNIC (Mailer X1.25) id 9131; Thu, 19 May 88 12:44:06 EDT
Date:         Thu, 19 May 88 17:11:00 MET
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
From:         Johan van Wingen <MOSGLA%HLERUL2.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      iso2022
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>

Dear list subscribers

Just a small comment for the time being.

>Results:  one problem so far.  Because ISO, against its own rules (ISO
>2022), put a graphic in position FF, which cannot be mapped into a
>seven-bit data stream, that graphic (y-with-diaeresis, EBCDIC DF in the
>US CECP) does not show up on my screen.

ISO8859 is NOT an 8-bit code extension of a 7-bit code, but an 8-bit
code in itself. Rules for 8-bit codes have been defined in ISO4873, not
in ISO 2022. Here either a 94- or a 96-character set may be used for
G1. If the 7171 cannot manage 8-bit codes, so much the worse for the 7171.

Yours faithfully, Johan van Wingen


 FROM  J. W. van Wingen    MOSGLA@HLERUL2   :
     Mail to                                :
 P. O. Box 486,  2300AL Leiden, Netherlands :

19-May-88 18:25:01-EDT,6767;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Thu 19 May 88 18:24:58-EDT
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Thu, 19 May 88 18:22:21 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 0575; Thu, 19 May 88 18:22:18 EDT
Received: by BITNIC (Mailer X1.25) id 4167; Thu, 19 May 88 18:22:12 EDT
Date:         Thu, 19 May 88 12:49:00 CDT
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
From:         Rick Troth <TROTH%TAMCBA.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      Headache
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>

        I think it just hit me ...

        We have at least three "schemes" (code pages) for EBCDIC here
at Texas A&M.  This does not count various ASCII character sets.

        1) IBM 3192, 3179G   (newer terminals)
           the "ISO set",  has all (most of) the ISO8859/1 characters

        2) IBM 3180, 3279G, 3179   (older terminals)
           the "3180 set"

        3) IBM 7171, TN Print Chain, JNET (VAX), Kermit, WISCNET
           the "7171 set".

        I here include a "raw EBCDIC" file with illustrations of how it
displays on these three main groups of terminals.  The big problem is not that
the 7171, WISCNET, et al. deviate from ISO; they were intended to map 7-bit
ASCII onto only "some points" of EBCDIC (and are thus excused).  The real
concern is that this 3180 set deviates everywhere.  It is weird ... it has
duplications all over the place!  What do we call these sets?  If the ISO set
is CP037, then what is the 3180 set?

* note: I used ^ in place of 5 for the sake of 7171 users.

 Raw EBCDIC    How does this display on your screen?

       0   1   2   3   4   5   6   7   8   9   A   B   C   D   E   F
   0                                                    
   1                                                
   2                  
                              
   3                                                
   4       &   a      k         b   +         .   <   (   +   |
   5   &   )   *      [   %      c   (      !   $   *   )   ;   ^
   6   -   /   _   \      ]   ^         ,   :   ,   %   _   >   ?
   7   W         0   1   2   |   V   {   `   :   #   @   '   =   "
   8   x   a   b   c   d   e   f   g   h   i      $   s   /   .   E
   9      j   k   l   m   n   o   p   q   r         N      q   ~
   A   H   ~   s   t   u   v   w   x   y   z   o   @   Z   [   r   y
   B   5   6   }   7   8   9   f   ;   <   =      Y   ?   ]   X   D
   C   {   A   B   C   D   E   F   G   H   I   K   J   >   h   l   m
   D   }   J   K   L   M   N   O   P   Q   R   !   -   u   t   #   
   E   \   g   S   T   U   V   W   X   Y   Z             i   d   Q
   F   0   1   2   3   4   5   6   7   8   9   3   w   p   z   '   

 IBM 3179G, 3192

       0   1   2   3   4   5   6   7   8   9   A   B   C   D   E   F
   0  --  --  --  --  --  -;  --  --  --  --  --  --  --  --  --  --
   1  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --
   2  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --
   3  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --
   4          ^a  %a  \a  /a  ~a  @a  $c  ~n  /c   .   <   (   +   |
   5   &  /e  ^e  %e  \e  /i  ^i  %i  \i  $s   !   $   *   )   ;   ^
   6   -   /  ^A  %A  \A  /A  ~A  @A  $C  ~N  -|   ,   %   _   >   ?
   7  $o  /E  ^E  %E  \E  /I  ^I  %I  \I   `   :   #   @   '   =   "
   8  $O   a   b   c   d   e   f   g   h   i  <<  >>  -d  /y  $P  +-
   9  ^0   j   k   l   m   n   o   p   q   r  -a  -o  $a  /,  $A  @X
   A  /u   ~   s   t   u   v   w   x   y   z  !!  ??  -D  /Y  $p  @R
   B  /\  -L  -Y  ^.  -f  /s  |P  14  12  34  |(  |)  ^_  ..  //  v=
   C   {   A   B   C   D   E   F   G   H   I   -  ^o  %o  \o  /o  ~o
   D   }   J   K   L   M   N   O   P   Q   R  ^1  ^u  %u  \u  /u  ~u
   E   \       S   T   U   V   W   X   Y   Z  ^2  ^O  %O  \O  /O  ~O
   F   0   1   2   3   4   5   6   7   8   9  ^3  ^U  %U  \U  /U  ~U

 IBM 3279G, 3179, 3180

       0   1   2   3   4   5   6   7   8   9   A   B   C   D   E   F
   0  --  --  --  --  --  -;  --  --  --  --  --  --  --  --  --  --
   1  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --
   2  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --
   3  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --
   4      |(  |)  -L  -Y  Pt  $X  $s  /s  ^-  /c   .   <   (   +   |
   5   &  ^0  \/  /\  ..  //  /,  \a  \e  \i   !   $   *   )   ;   ^
   6   -   /  \o  \u  ~a  ~o  %y  \a  \e  /e  -|   ,   %   _   >   ?
   7  \i  \o  \u  %u  $c  %a  %e  %i  %o   `   :   #   @   '   =   "
   8  %u   a   b   c   d   e   f   g   h   i  ^a  ^e  ^i  ^o  ^u  /a
   9  /e   j   k   l   m   n   o   p   q   r  /i  /o  /u  ~n  \A  \E
   A  \I   ~   s   t   u   v   w   x   y   z  \O  \U  ~A  ~O   Y   A
   B   E   E   I   O   U   Y   C  %A  %E  %I  %O  %U  ^A  ^E  ^I  ^O
   C   {   A   B   C   D   E   F   G   H   I  ^U  /A  /E  /I   l   m
   D   }   J   K   L   M   N   O   P   Q   R  /O  /U  ~N   t   #   
   E   \  $a   S   T   U   V   W   X   Y   Z  $o  @a  $c   i   d  -;
   F   0   1   2   3   4   5   6   7   8   9  $A  $O  @A  $C  -*   

 IBM 7171, TN Print Chain

       0   1   2   3   4   5   6   7   8   9   A   B   C   D   E   F
   0  --  --  --  --  --   ;  --  --  --  --  --  --  --  --  --  --
   1  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --
   2  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --
   3  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --
   4      --  --  --  --  --  --  --  --  --   \   .   <   (   +   |
   5   &  --  --  --  --  --  --  --  --  --   !   $   *   )   ;  /
   6   -   /  --  --  --  --  --  --  --  --   |   ,   %   _   >   ?
   7  --  --  --  --  --  --  --  --  --   `   :   #   @   '   =   "
   8  --   a   b   c   d   e   f   g   h   i  --  --  --  --  --  --
   9  --   j   k   l   m   n   o   p   q   r  --  --  --  --  --  --
   A  --   ~   s   t   u   v   w   x   y   z  --  --  --  |(  --  --
   B  --  --  --  --  --  --  --  --  --  --  --  --  --  |)  --  --
   C   {   A   B   C   D   E   F   G   H   I  --  --  --  --  --  --
   D   }   J   K   L   M   N   O   P   Q   R  --  --  --  --  --  --
   E   \  --   S   T   U   V   W   X   Y   Z  --  --  --  --  --  --
   F   0   1   2   3   4   5   6   7   8   9  --  --  --  --  --  --

20-May-88 08:04:17-EDT,1628;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Fri 20 May 88 08:04:14-EDT
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Fri, 20 May 88 08:01:38 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 0874; Fri, 20 May 88 08:01:37 EDT
Received: by BITNIC (Mailer X1.25) id 2067; Fri, 20 May 88 08:02:51 EDT
Date:         Fri, 20 May 88 13:38:18 +0200
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
From:         Andre PIRARD <A-PIRARD%BLIULG11.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      Re: iso2022
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>
In-Reply-To:  Message of Thu, 19 May 88 17:11:00 MET from <MOSGLA@HLERUL2>

>ISO8859 is NOT an 8-bit code extension of a 7-bit code, but an 8-bit
>code in itself. Rules for 8-bit codes have been defined in ISO4873, not
>in ISO 2022. It is allowed to take here a 94 or a 96 character set for
>G1. If the 7171 cannot manage 8-bit codes, the worse for the 7171.

Sorry Johan, but both the way ISO 8859 is laid out (80-9F left free) and
information from an IBM representative make it clear it was designed to be
transmitted 7 bits wide. Otherwise, it would have been a really lucky coincidence.
I wonder which poor people this character affects.

I agree with you that 7-bit communication is nonsense. And parity is a hoax.
But the 7171s are right there in the next room. And they are dreadfully useful.

André.
20-May-88 12:14:46-EDT,1571;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Fri 20 May 88 12:14:40-EDT
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Fri, 20 May 88 12:11:57 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 1153; Fri, 20 May 88 12:11:55 EDT
Received: by BITNIC (Mailer X1.25) id 5193; Fri, 20 May 88 12:13:03 EDT
Date:         Fri, 20 May 88 11:57:31 EDT
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
From:         Edwin Hart <HART%APLVM.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      Re: iso2022
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>
In-Reply-To:  Your message of Fri, 20 May 88 13:38:18 +0200

ISO 8859-1 is an 8-bit code.  I have not seen any standards on how it
is to be transmitted over a communications wire.  It might be 8 data plus
parity.  However, the issue you are discussing is implementation using
existing equipment.  When you try to implement ISO 8859-1 characters using
the 7171 or Yale ASCII Comm System on an IBM Series/1, then because these
controllers were programmed for 7 data bits, you must use the ISO 2022
or ANSI X3.41 protocols to use the characters in columns 10 to 15.  This
is a different problem than transmitting an 8-bit code.  How does DEC
(Digital Equipment Corporation) do it with the new VT300 series of terminals?

Ed Hart
20-May-88 13:55:09-EDT,3061;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Fri 20 May 88 13:55:07-EDT
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Fri, 20 May 88 13:52:31 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 1254; Fri, 20 May 88 13:52:27 EDT
Received: by BITNIC (Mailer X1.25) id 7552; Fri, 20 May 88 13:53:10 EDT
Date:         Fri, 20 May 88 11:53:56 CDT
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
From:         Michael Sperberg-McQueen <U18189%UICVM.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      7-bit and 8-bit sets
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>

Many thanks to Johan van Wingen for his note.  I apologize for my error
in ascribing the rules on 8-bit code structures to ISO 2022.  I once
worked at a site where we had a fairly good collection of the ISO
standards, but I don't have copies handy here, so my memory can slip.
Has ISO 4873 been revised in the last ten years or so?  And does 2022
really not talk at all about 8-bit sets?  I was very sure that the
ISO standards I read, when I studied them all a few years back,
prohibited use of A0 and FF for graphics.

The American standard which I think is the equivalent of ISO 2022
(and which I thought was *compatible* with all the relevant ISO
standards, please correct me if I'm wrong) is ANSI X3.41 - 1974
("American National Standard Code Extension Techniques for Use with the
7-Bit Coded Character Set of American National Standard Code for
Information Interchange").  ANSI X3.41- 1974 (which I do have in front
of me) does define the structure of 8-bit sets (despite its name) and
provides for a 94-graphic G1 set:  "The 8-bit code table consists of an
ordered set of controls and graphic characters grouped as follows [...]:
[...] (5) A set of ninety-four graphic characters allocated to columns
10-15, subject to the exception of positions 10/0 and 15/15"  (Section
6.2).  So even if ISO 8859/1 conforms with ISO standards, it doesn't
conform with ANSI X3.41 unless ANSI has revised it very recently.

I'm not sure I understand Ed Hart's note.  He is right, of course, that
we have to use ANSI X3.41 to transmit 8-bit codes over 7-bit wire.  But
how is that "a different problem than transmitting an 8-bit code"?  Part
of ANSI X3.41 *is* the definition of how to transmit 8-bit codes over
7-bit lines, stipulating that SO should be sent to switch from the G0
graphic set to the G1 graphic set, SI to switch back.  (Switching
between C0 and C1, the two sets of controls, seems to be handled by
escape sequences not SI/SO.)  Since A0 and FF are explicitly not part of
the G1 set, ANSI X3.41 provides (sec. 9.3) that they should be
represented, if they have to be, by a private escape sequence.
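A minimal Python sketch of the 7-bit representation discussed here, using the usual ISO 2022 convention that SO invokes the G1 set and SI returns to G0, and that a C1 control 80-9F travels as ESC followed by 40-5F. Since the private escape sequence for A0 and FF is not specified, the sketch simply rejects those two codes, and it also rejects input that already contains SO or SI. The function name `to_7bit` is hypothetical.

```python
SO, SI, ESC = 0x0E, 0x0F, 0x1B


def to_7bit(data: bytes) -> bytes:
    """Represent 8-bit codes on a 7-bit line (hypothetical sketch).

    G1 graphics A1-FE are sent as SO, (code - 80 hex)..., SI;
    C1 controls 80-9F are sent as ESC followed by (code - 40 hex).
    """
    out = bytearray()
    shifted = False                        # True while G1 is invoked by SO
    for b in data:
        if b in (SO, SI):
            raise ValueError("input already contains SO/SI")
        if b in (0xA0, 0xFF):
            # X3.41 sec. 9.3 calls for a private escape sequence here
            raise ValueError(f"{b:02X} needs a private escape sequence")
        if 0xA1 <= b <= 0xFE:              # G1 graphic
            if not shifted:
                out.append(SO)
                shifted = True
            out.append(b - 0x80)
            continue
        if shifted and 0x21 <= b <= 0x7E:  # G0 graphic: shift back first
            out.append(SI)
            shifted = False
        if 0x80 <= b <= 0x9F:              # C1 control: ESC Fe form
            out += bytes([ESC, b - 0x40])
        else:                              # C0, SP, DEL, G0: unchanged
            out.append(b)
    if shifted:                            # leave the line in the G0 state
        out.append(SI)
    return bytes(out)


print(to_7bit(b"a\xa2b"))
```

Space and the controls are passed through without shifting back, since SO/SI select only between graphic sets; a run of G1 characters therefore costs just one SO and one SI.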

Michael Sperberg-McQueen, University of Illinois at Chicago
20-May-88 14:50:08-EDT,3543;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Fri 20 May 88 14:50:02-EDT
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Fri, 20 May 88 14:47:22 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 1400; Fri, 20 May 88 14:47:19 EDT
Received: by BITNIC (Mailer X1.25) id 8740; Fri, 20 May 88 14:48:11 EDT
Date:         Fri, 20 May 88 13:03:29 CDT
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
From:         Phil Howard KA9WGN <PHIL%UIUCVMD.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      Re: 7-bit and 8-bit sets
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>
In-Reply-To:  Your message of Fri, 20 May 88 11:53:56 CDT

> From:         Michael Sperberg-McQueen <U18189@UICVM>
>                                                        .....  I once
> worked at a site where we had a fairly good collection of the ISO
> standards, but I don't have copies handy here, so my memory can slip.

This is an interesting issue.  I have found it to be difficult to get hold
of ANY ISO documentation at a reasonable price.

ANSI documents are reasonably priced and ANSI sells them directly within the
USA.  There was a company I found once in Washington DC that sold ISO
documents at exorbitantly high prices.  I was told that two major factors went
into this high price:  extremely slick production of the documents themselves
and very expensive binders to hold them.  Further, the company charged a
high markup.  Finally, documents were bundled in such a way that nothing
could be had for under $800 a few years ago.

Does anyone know a reasonable source for ISO documents without any slick
covers or excessive bundling or any form of profiteering?  (such practices
really should have no place in standardizing).

Many documents about TCP/IP networking are readily available online.  Are
there any ISO documents online?

> of ANSI X3.41 *is* the definition of how to transmit 8-bit codes over
> 7-bit lines, stipulating that SO should be sent to switch from the G0
> graphic set to the G1 graphic set, SI to switch back.  (Switching
> between C0 and C1, the two sets of controls, seems to be handled by
> escape sequences not SI/SO.)  Since A0 and FF are explicitly not part of
> the G1 set, ANSI X3.41 provides (sec. 9.3) that they should be
> represented, if they have to be, by a private escape sequence.

If the C0 and C1 sets were switched by SO and SI, that would make SO
present in C0 only, and SI present in C1 only, and the opposing codes
acting as no-op.  I guess it would have worked, but maybe someone
thought it wasteful to define 2 no-ops.

+-----------------------------------------------------------------------+
| Phil Howard, KA9WGN                   bitnet: <phil@uiucvmd.bitnet>   |
| Research Programmer                 internet: <phil@vmd.cso.uiuc.edu> |
| Computing Services Office            or unix: <phil@uxg.cso.uiuc.edu> |
| University of Illinois at U/C            mci: 10222-1-217-BIG-MAIN    |
| 1304 West Springfield Avenue            at&t: 10288-1-217-BIG-MAIN    |
| Urbana, IL  61801                     sprint: 10333-1-217-BIG-MAIN    |
| Phil's corollary: "If I was able to fix it, it must have been broke!" |
+-----------------------------------------------------------------------+
20-May-88 10:23:53-EDT,9407;000000000015
Return-Path: <@CUVMA.COLUMBIA.EDU:A-PIRARD@BLIULG11.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Fri 20 May 88 10:23:47-EDT
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Fri, 20 May 88 10:21:07 EDT
Received: from VM1.ULG.AC.BE by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 1007; Fri, 20 May 88 10:21:05 EDT
Received: by BLIULG11 (Mailer X1.25) id 1202; Fri, 20 May 88 16:15:41 +0200
Date:         Fri, 20 May 88 15:26:25 +0200
From:         Andre PIRARD <A-PIRARD%BLIULG11.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      Extended ASCII with Kermit
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>

Frank,

Here is a rewrite of the note I asked you to stop.
My main goal is to reach Kermit developers.
I leave it for you to judge if Info-Kermit is the right audience.
Are there other lists like IBM-KERMIT for each implementation?
If so, those are of course the right place, but I'd like some
feedback even though I can't subscribe to all of them.

I agree it's a big piece for Info-Kermit, and it could raise
a lot of discussion. But you will understand it's vital for many
people, and it could give Kermit a big plus for them. In fact, some
are probably already secretly halfway to terminal mode with SI/SO and G sets.

If you like, I can shorten it for I-K to somewhat more than the
conclusion.

Because of our special problems with, and treatment of, the Mac, I am
sending this note to Matthias Aebi K116430 @ CZHRZU1A and
Paul Placeway PLACEWAY @ OHIO-STATE.ARPA. I hope these addresses are
still valid.

Thanks in advance.

André.

------------------------- For publication:
Dear Kermit developers,

Abstract.

In the course of implementing our own national character sets
with Kermit for terminal mode and file transfer, my understanding
of the problem evolved from confusion to (near) simplicity and
from national to international. I think my findings will be of
much interest to anyone dealing with Spanish, French, German,
Italian, in fact with the languages of the American continent,
Western Europe and many other places. For them, this means truly
interconnecting the majority of computers in existence today.

On request, I've tried to be as short as possible at the risk of
skimming here and there. I certainly won't blame those who get
bored with the subject. They can skip to the conclusion and see
just what it takes in Kermit terms. Conversely, those who are
really interested will find more information in the standards and
in the ISO8859 list of BITNET's LISTSERV @ JHUVM and its archives.

Finally, I take this opportunity to praise all those devoting much
of their time to straightening out things that had run amok. It is
their ideas I am conveying, but I am certainly glad to help. I just
hope my limited English will carry the message precisely.


Detail.

In the process of implementing extended character transfer between
micros and IBM mainframes, I relied on the extended conversion
capabilities of Kermit 370 (thanks John!). I came to the conclusion
that, for the IBM PC alone, I would have to set up as many as 9
different tables in order to support 3 EBCDIC tables x 3 "ASCII"
tables. For the Macintosh, that's 3 more tables for the IBM host. I
was unable to have Kermit do Mac to IBM PC conversion short of
attempting the translation on the PC, which means 3 or so more
tables.

I hacked together some limited national character support for
IBM PC terminal mode through the 7171, but our Mac users were left
with a nice but dumb keyboard and a deaf screen.


Kermit implements two main file transfer modes.

Binary mode defines how to transport a continuous string of bytes
whose values need only be meaningful to the originating and final
receiving systems. No matter how the data is stored on an
intermediate system, that system should forward or return the same
byte string on the communication line. The point here is that each
node's operation is clearly defined, making binary mode the best
method when appropriate.

Text mode, in contrast, defines how to transport *records*
containing codes for "readable" characters intended to be usable
-- and stored as such -- on any system. The protocol specifies how
to stream the records on the line in a system-independent manner.
Again, every node should forward the data unaltered, that is, with
an equivalent encoding on the communication line.
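To make the text-mode idea concrete, here is a minimal Python sketch, assuming (as the Kermit protocol manual describes) that records travel on the line as ASCII text terminated by CR LF, whatever the local file format. The function names are hypothetical. Note that `encode("ascii")` fails on any extended character, which is precisely the gap discussed below.

```python
from typing import List

CRLF = b"\r\n"


def records_to_line(records: List[str]) -> bytes:
    """Sender: turn local records into the system-independent line form."""
    # Raises UnicodeEncodeError for characters outside ANSI X3.4 (ASCII)
    return b"".join(r.encode("ascii") + CRLF for r in records)


def line_to_records(stream: bytes) -> List[str]:
    """Receiver: split the line form back into records for local storage."""
    records = stream.decode("ascii").split("\r\n")
    if records and records[-1] == "":
        records.pop()    # the final CR LF terminates a record, it does not separate
    return records


print(records_to_line(["first line", "second"]))
```

Each receiving system is then free to store the records in its own format (variable-length records, newline-terminated lines, and so on), which is what keeps the line form system-independent.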

The Kermit protocol wisely says that the ANSI X3.4 (ASCII)
standard is to be used to represent these characters. It is the
code used on most computers, and those not using it (IBM,
Commodore) have to deal with code conversion themselves.

Most modern computers now implement an 8-bit extended character
set in order to support, to various extents, languages requiring
characters not found in ANSI X3.4 (I intentionally disregard the
obsolete 7-bit remapping methods). Almost every one does it its own
way, however, because there was no standard at the time they were
devised (IBM even has several within a single system).

Clearly, translation must occur somewhere to transmit extended
text usefully between them. If it is done, say, by running a
program on the receiving system, one must know and use the right
table for each sender. The at least 7 codes that I have to deal
with make for 42 one-way tables (7 x 6) in theory. In practice,
what was a crystal-clear matter as long as only X3.4 was used
becomes a real puzzle with extended codes, and the number of
tables grows with the square of the number of codes.

To a lesser extent, the same problem holds for terminal mode.
It occurs only when a computer supports remote terminals, but
there we must also cope with a 7-bit data path, an issue that the
Kermit protocol solves by itself in the file transfer case.
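Kermit's own answer to the 7-bit data path in file transfer is prefixing. The Python sketch below assumes the default '#' control prefix and '&' as the 8th-bit prefix (the character usually chosen; in practice both sides must agree on it during negotiation). Repeat-count compression, packet framing and checksums are left out, and the function names are hypothetical. Note that prefixing only makes 8-bit bytes survive the line; it says nothing about what those bytes mean, which is the translation problem.

```python
QCTL = ord("#")   # control prefix (Kermit default)
QBIN = ord("&")   # 8th-bit prefix (must be agreed on by both sides)


def prefix_encode(data: bytes) -> bytes:
    """Make arbitrary bytes safe for a 7-bit line, Kermit style."""
    out = bytearray()
    for b in data:
        if b & 0x80:                      # 8th bit set: prefix and strip it
            out.append(QBIN)
            b &= 0x7F
        if b < 0x20 or b == 0x7F:         # control: prefix, then ctl(b)
            out += bytes([QCTL, b ^ 0x40])
        elif b in (QCTL, QBIN):           # prefix characters are quoted too
            out += bytes([QCTL, b])
        else:
            out.append(b)
    return bytes(out)


def prefix_decode(data: bytes) -> bytes:
    """Undo prefix_encode."""
    out = bytearray()
    i = 0
    while i < len(data):
        high = 0
        if data[i] == QBIN:               # restore the 8th bit
            high = 0x80
            i += 1
        c = data[i]
        if c == QCTL:                     # next byte is quoted
            i += 1
            c = data[i]
            if 0x3F <= c <= 0x5F:         # ctl() of a control character
                c ^= 0x40
        out.append(c | high)
        i += 1
    return bytes(out)


print(prefix_encode(b"caf\xe9\r\n"))
```

Here the ISO 8859-1 e-acute (E9) goes out as `&i` and CR LF as `#M#J`. The cost is one or two extra characters per affected byte, which is why an 8-bit line is preferable whenever one is available.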

It is evident that the problem lies in each machine dealing with
the others' own business, and that the solution is to have them
all talk a common code on the line, as is already the case with
X3.4, even for machines that do not use it internally. Requiring
them to use that code internally is impractical, although
advisable. But having each one convert the data to/from that
common code before/after transmission reduces the above example to
a mere 6 table pairs.
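The arithmetic behind the two approaches can be checked in a few lines. This Python sketch counts one one-way table per ordered pair of codes for direct translation, and one table to plus one table from the common code for each code otherwise. The 6 pairs quoted above correspond to the assumption (not stated in the note) that one of the seven codes is itself the common line code; the function names are hypothetical.

```python
def direct_tables(n: int) -> int:
    """One one-way table for every ordered (source, target) pair."""
    return n * (n - 1)


def hub_tables(n: int, hub_is_one_of_them: bool = False) -> int:
    """A to-common and a from-common table for every code but the hub."""
    converting = n - 1 if hub_is_one_of_them else n
    return 2 * converting


for n in (3, 7, 12):
    print(n, direct_tables(n), hub_tables(n, hub_is_one_of_them=True))
```

For 7 codes this gives 42 tables for direct translation against 12 (6 pairs) through a common code; the gap widens quadratically as codes are added.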

What is striking is the technical simplicity of translating every
character data byte that flows on a communication line to a
common code everyone agrees on. What is sad is that we have to.
What is moot is which common code should be used on the line.

I strongly feel that having Kermit itself translate national
codes, to make up for its host system not using a standard, will
be *extremely* useful for people who have to use these codes. This
feature must be optional, because it is incompatible with previous
use. It would be a shame to have two Kermit implementations for
the same system corrupt data because one uses this feature and
the other lacks it.

The cause of the problem, a missing standard, no longer exists.
ISO 8859/1 = ANSI X3.134.2 = ECMA 94 has been defined and gathers
every possible character extension for Latin group 1 users, by far
the largest group, plus other common symbols satisfying many
computer brands. It looks like a very well thought out thing, and
several leading manufacturers have adopted it, or a pre-release
because they couldn't wait, or have modified their previous codes
to conform to ISO (having exactly the same graphics but at other
code points, in line with this proposal). That's IBM, DEC,
Microsoft and Lotus from what I gathered. It looks like the future
that many, international and US alike, are working toward.

Users of the upcoming ISO 8859/x parts should not be left out.
Their problem is parallel, but their codes will be untranslatable
to ours. They might be expected to start with pure ISO right away.
That holds until 16-bit (some say 32-bit) code sets are devised,
but that's probably our children's Kermit.


Conclusion.

In summary, a Kermit implementation would be much enhanced for
many people if simply:

- it optionally translated bytes during text mode file transfer
(at the file I/O or equivalent level). Nothing elaborate is
required to start this. Just a pair of null translation tables,
easily found and patched, and a couple of lines of code will cover
both the "translation" and "optional" topics.

- it did the same at the communication line I/O level during
terminal mode and, when using a 7-bit-wide data path, implemented
the ISO 2022 SO/SI feature to use the upper half of the set (shift
out) and revert to the lower one (shift in). Several already do.
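The first point really is only a few lines. A minimal Python sketch, with hypothetical names (`NULL_TABLE`, `TextTranslator`) that do not come from any Kermit release: two 256-byte tables start as identities, so the feature is inert until someone patches a table and turns it on, and the translation is applied to file data before it is packetized and after it is unpacked. A real Macintosh table would remap many more positions than the single pair patched here.

```python
NULL_TABLE = bytes(range(256))   # identity: every byte maps to itself


class TextTranslator:
    """Optional byte translation for text-mode file transfer (sketch)."""

    def __init__(self, to_line: bytes = NULL_TABLE,
                 from_line: bytes = NULL_TABLE, enabled: bool = False) -> None:
        if len(to_line) != 256 or len(from_line) != 256:
            raise ValueError("translation tables need 256 entries")
        self.to_line = to_line        # local code -> common line code
        self.from_line = from_line    # common line code -> local code
        self.enabled = enabled        # off by default: behaves as before

    def outgoing(self, data: bytes) -> bytes:
        """Apply to file data as it is read for sending."""
        return data.translate(self.to_line) if self.enabled else data

    def incoming(self, data: bytes) -> bytes:
        """Apply to received data before it is written to the file."""
        return data.translate(self.from_line) if self.enabled else data


# Patching the null table: swap Macintosh e-acute (8E) with ISO 8859-1
# e-acute (E9); a swap is its own inverse, so one table serves both ways.
patched = bytearray(NULL_TABLE)
patched[0x8E], patched[0xE9] = 0xE9, 0x8E
mac = TextTranslator(bytes(patched), bytes(patched), enabled=True)
print(mac.outgoing(b"caf\x8e"), mac.incoming(b"caf\xe9"))
```

Keeping `enabled` false by default is what makes the feature safe with older Kermits: two copies that both leave it off behave exactly as before.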


That's all. But a welcome leap further would be to:

- if a particular system does not conform to ISO (like the Mac,
which misses some of its graphics or uses others), define a
best-fit one-to-one correspondence between its character set(s)
and ISO (there should be total agreement as to which one, ideally
including the manufacturer). It must involve all 256 codes in a
reversible way.

- have systems supporting terminals do it in ISO mode, preferably
on an 8-bit-wide line.

- have these features bundled in options.
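One way to get a table that covers all 256 codes reversibly, sketched in Python under stated assumptions: Python's `mac_roman` and `cp437` codecs stand in for the Macintosh and IBM PC sets, characters present in both sets are mapped exactly, and the leftover codes are then paired off in ascending order purely to keep the mapping one-to-one. That last step is not a best fit; choosing the closest-looking partners is exactly the agreement asked for above. The function names are hypothetical.

```python
from typing import List, Optional


def reversible_table(local_codec: str, line_codec: str = "latin-1") -> bytes:
    """Map all 256 local codes one-to-one onto line codes (sketch)."""
    table: List[Optional[int]] = [None] * 256
    used = set()
    for b in range(256):
        try:
            t = bytes([b]).decode(local_codec).encode(line_codec)
        except UnicodeError:              # undefined locally or not in ISO set
            continue
        if len(t) == 1 and t[0] not in used:
            table[b] = t[0]               # same character in both sets
            used.add(t[0])
    # Pair leftover codes in ascending order: reversible, but not a best fit
    leftovers = [b for b in range(256) if table[b] is None]
    free = [c for c in range(256) if c not in used]
    for b, c in zip(leftovers, free):     # both lists have the same length
        table[b] = c
    return bytes(table)


def invert(table: bytes) -> bytes:
    """Build the reverse table, refusing anything that is not one-to-one."""
    if sorted(table) != list(range(256)):
        raise ValueError("table is not a one-to-one mapping of 256 codes")
    inverse = bytearray(256)
    for src, dst in enumerate(table):
        inverse[dst] = src
    return bytes(inverse)


mac_to_iso = reversible_table("mac_roman")
iso_to_mac = invert(mac_to_iso)
print(hex(mac_to_iso[0x8E]), hex(iso_to_mac[0xE9]))
```

Because `invert` refuses any table that is not a permutation of the 256 codes, a file sent Mac to ISO and back always arrives byte for byte identical, even for characters that have no true ISO equivalent.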


Thanks for your patience in reading.

Andr). Oops, not yet. Andre'.
20-May-88 17:09:24-EDT,2113;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Fri 20 May 88 17:09:19-EDT
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Fri, 20 May 88 17:06:41 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 1739; Fri, 20 May 88 17:06:39 EDT
Received: by BITNIC (Mailer X1.25) id 1594; Fri, 20 May 88 17:06:58 EDT
Date:         Fri, 20 May 88 13:55:00 CDT
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
From:         Rick Troth <TROTH%TAMCBA.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      Re: iso2022
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>
In-Reply-To:  Your message of Fri 20 May 88 11:57:31 EDT

Ed, et al -

                I do not have access to a VT300 series terminal,
but on the VT200 series boxes:

        1) 00-1F are the usual ASCII control codes
        2) 20-7E are the usual ASCII graphics
        3) 80-9F are new control codes, including CSI
        4) A0-FE are new graphics, defaulting to DEC Supplemental
                (looks pretty much like A0-FE in ISO 8859/1)
                if you are in "Multi-National" mode.

                On a 7-bit wire:

        3) 80-9F are represented by ESC followed by one of 40-5F
        4) A0-FE are displayed by  SO, string of 20-7E, SI
                (thus APL support on 7171 can be fudged into ISO8859 support)
                (Phil - SO/SI does not affect controls)


                Examples:

        3) The familiar (to VT100 users) cursor placement operation
           ESC [ row ; col H   (7-bit)   is equivalent to
            CSI  row ; col H   (8-bit)
                (that's "escape open-bracket ... ")

        4) The cent sign can be displayed by
            A2 (hex, 8-bit)   or with G1 as DEC Supplemental
            SO 22 (hex) SI   (7-bit)
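On the receiving side, a terminal emulator has to undo both 7-bit representations shown above. A minimal Python sketch with a hypothetical class name: it strips the parity bit, keeps the SO/SI state across reads, turns ESC 40-5F back into C1 controls (so ESC [ becomes CSI), and leaves the bytes of escape and control sequences unshifted, since SO/SI affect only graphics. Error handling for malformed sequences is omitted.

```python
ESC, SO, SI = 0x1B, 0x0E, 0x0F
LBRACKET = 0x5B   # ESC [ is the 7-bit form of CSI (9B)


class Line7to8:
    """Rebuild 8-bit codes from a 7-bit line (hypothetical sketch)."""

    def __init__(self) -> None:
        self.shifted = False     # True between SO and SI (G1 invoked)
        self.mode = "normal"     # "normal", "esc", "esc_int" or "csi"

    def feed(self, chunk: bytes) -> bytes:
        out = bytearray()
        for b in chunk:
            b &= 0x7F                            # drop the parity bit
            if self.mode == "esc":               # byte right after ESC
                if 0x40 <= b <= 0x5F:            # ESC Fe -> C1 control 80-9F
                    out.append(b + 0x40)
                    self.mode = "csi" if b == LBRACKET else "normal"
                else:                            # other escape: pass through
                    out += bytes([ESC, b])
                    self.mode = "esc_int" if 0x20 <= b <= 0x2F else "normal"
            elif self.mode == "esc_int":         # intermediates until final
                out.append(b)
                if b >= 0x30:
                    self.mode = "normal"
            elif self.mode == "csi":             # parameters until final byte
                out.append(b)
                if 0x40 <= b <= 0x7E:
                    self.mode = "normal"
            elif b == ESC:
                self.mode = "esc"                # decide on the next byte
            elif b == SO:
                self.shifted = True
            elif b == SI:
                self.shifted = False
            elif self.shifted and 0x21 <= b <= 0x7E:
                out.append(b | 0x80)             # G1 graphic A1-FE
            else:
                out.append(b)                    # controls, SP, DEL, G0
        return bytes(out)


term = Line7to8()
print(term.feed(b"\x1b[12;40H"))    # cursor placement, 7-bit form
print(term.feed(b"\x0e\x22\x0f"))   # cent sign through G1
```

The state lives in the object rather than in one call, because an SO or a lone ESC may arrive at the end of one read and take effect in the next.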

                                                        - Rick
20-May-88 22:41:04-EDT,1374;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Fri 20 May 88 22:41:01-EDT
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Fri, 20 May 88 22:38:17 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 1935; Fri, 20 May 88 22:38:15 EDT
Received: by BITNIC (Mailer X1.25) id 4267; Fri, 20 May 88 22:39:42 EDT
Date:         Fri, 20 May 88 20:05:20 EST
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
From:         John Kesich <KESICH%NYUCIMSA.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      Re: Obtaining ANSI and ISO Standards
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>
In-Reply-To:  Message of Fri, 20 May 88 15:35:30 EDT from <HART@APLVM>

Just so people have a feeling for what "reasonable" means in terms of
ISO standard prices, here is a list of standards I picked up last
December (at ANSI):

standard      price    # pages
----------    -----    -------
ISO 8859-1     $22         7
ISO 8859-2     $20         6
ISO 6937-1     $27        12
ISO 6937-2     $50        37

In short, plan on spending about 4X more than you would for a comparable
ANSI standard.
24-May-88 07:45:14-EDT,3619;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Tue 24 May 88 07:45:08-EDT
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Tue, 24 May 88 07:42:43 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 4184; Tue, 24 May 88 07:42:41 EDT
Received: by BITNIC (Mailer X1.25) id 1096; Tue, 24 May 88 07:43:26 EDT
Date:         Tue, 24 May 88 12:52:00 MET
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
From:         Johan van Wingen <MOSGLA%HLERUL2.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      Reply to 7171 change
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>

Dear list subscribers
So, Mr. Sperberg-McQueen wants to be cursed. My connections with
hellish powers are not all that, but I'll try.

I am certainly one in the "community" who does not agree, but I can
respect Mr. SMQ's experiments, not, however, a proposal that only solves
US problems.

First, what is EBCDIC? If we take the yellow card (GX20-1850), we see
two columns, one "standard", one for the T-11 and TN chains. Also, there
are the GT10 type tables for the IBM 3800 printers, and the national
variants (see 3270 manuals). The differences affect only a restricted
number of graphics.

 ID    NAME                  US   TN Intern GT10 CP500
SM06 left  square bracket    --   AD   4A   AD   BA
SM08 right square bracket    --   BD   5A   BD   BB
SM11 left  curly  bracket    C0   8B   C0   C0   C0
SM14 right curly  bracket    D0   8D   D0   D0   D0
SC04 cent sign               4A   4A   --   4A   4A
SP02 exclamation mark        5A   5A   4F   5A   5A
SM13 vertical line           4F   4F   --   4F   4F
SM65 broken vertical line    6A   --   6A   6A   6A
SM07 reverse solidus (slash) E0   --   E0   E0   E0
SD19 tilde                   A1   --   A1   A1   A1
SD13 grave accent            79   --   79   79   79

This is valid except for national variants at some of the 14 codes:
4A 5A 6A 79 5B 7B 7C 5F A1 C0 D0 E0 4F 7F ;
following US are:
Canadian Bilingual, English (UK), Hebrew, Japanese, Portuguese, Spanish;
following International are:
German, Belgian, Brazilian, Canadian French, Danish/Norwegian, Finnish/
Swedish, French, Italian, Swiss.
The best test case is 4F:
US/CP037: vertical line
Int/CP500: exclamation mark
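The test case can be checked with Python's built-in `cp037` and `cp500` codecs. These follow IBM's present-day definitions of the two code pages, so treat this as a sketch that may not match every 1988 terminal, print chain or national variant; the helper name `cp037_to_cp500` is hypothetical.

```python
def cp037_to_cp500(data: bytes) -> bytes:
    """Re-encode EBCDIC text from code page 037 (US) to 500 (International)."""
    return data.decode("cp037").encode("cp500")


for code in (0x4A, 0x4F, 0x5A, 0xBA, 0xBB):
    byte = bytes([code])
    print(f"{code:02X}  CP037 {byte.decode('cp037')!r}  CP500 {byte.decode('cp500')!r}")

# "[!]" as stored on a CP037 system, re-encoded for a CP500 system
print(cp037_to_cp500("[!]".encode("cp037")).hex())
```

The same three characters occupy BA 5A BB in CP037 but 4A 4F 5A in CP500, so text moved between the two without translation silently turns brackets and exclamation marks into other characters.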

As for the extensions, CP037 and CP500 seem to be identical. The NOT
sign is a separate problem to be discussed later on.

Second, what is ISO8859? Be warned: you do not solve anything if you
include only ISO8859-1 in the discussion (a note: it used to be
ISO8859/1, but ISO very recently changed its rules for designating the
Parts of a Standard; now it is ISO8859-1). There will very soon be a new
set, called internally ISO-XYZ, being the harmonization of ISO6937 and
8859.  SC2 will meet the week of 17 Oct. 1988 in London. Prepare your
campaign, start to lobby now!

But all this leaves the central question unanswered. Shall the code page
be adapted to the translate table or the reverse? Mr. SMQ has shown that
the translate table of the 7171 can be changed. Is that all?

It is time to discuss the merits of the code pages. I'll keep my own
opinion until my next contribution.

Yours faithfully, Johan van Wingen


 FROM  J. W. van Wingen    MOSGLA@HLERUL2   :
     Mail to                                :
 P. O. Box 486,  2300AL Leiden, Netherlands :

25-May-88 06:24:14-EDT,1288;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Wed 25 May 88 06:24:09-EDT
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Wed, 25 May 88 06:21:58 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 6244; Wed, 25 May 88 06:21:56 EDT
Received: by BITNIC (Mailer X1.25) id 0291; Wed, 25 May 88 06:22:52 EDT
Date:         Wed, 25 May 88 02:28:00 CDT
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
From:         Richard <TILLEY%UOFMCC.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      Re:Reply to 7171 change
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>

Johan van Wingen <MOSGLA@HLERUL2> says:
>you do not solve anything if you include ISO8859-1 only in the discussion,

I agree. The high-order half of ISO8859-1 is of little use to anyone.
Far better to use Adobe's "Standard Encoding" or even
Xerox's "Character Set 0" as a basis for an 8 bit ASCII.
Both of these codes store accents as separate characters instead of
trying to store all possible combinations of accents and characters.

25-May-88 09:35:49-EDT,1560;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Wed 25 May 88 09:35:47-EDT
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Wed, 25 May 88 09:33:38 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 6396; Wed, 25 May 88 09:33:35 EDT
Received: by BITNIC (Mailer X1.25) id 2455; Wed, 25 May 88 09:34:05 EDT
Date:         Wed, 25 May 88 08:56:48 EDT
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
From:         Edwin Hart <HART%APLVM.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      Usefulness of ISO 8859-1
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>

In contrast to flying accent options,
the ISO 8859-1 character set and code is very useful.  It contains the
necessary characters for over 40 countries.  Compare that to ISO 646
and the National variations (one per language).  ISO 8859-1 also has the
extra characters needed to make the US 94 character EBCDIC and US ASCII X3.4
character sets match.  ISO 8859-1 was developed because the computer
manufacturers required one character per code point.

However, when a printer implements the ISO 8859-1 code, nothing says that
internally the printer could not use flying accents to form the characters.
They do, however, need to be careful about the "i" with accents such as
the umlaut, since the dot of the "i" must then be left off.

Ed Hart
25-May-88 09:36:58-EDT,2978;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Wed 25 May 88 09:36:54-EDT
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Wed, 25 May 88 09:34:48 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 6400; Wed, 25 May 88 09:34:47 EDT
Received: by BITNIC (Mailer X1.25) id 2506; Wed, 25 May 88 09:35:17 EDT
Date:         Wed, 25 May 88 08:53:02 EST
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
From:         John C Klensin <KLENSIN@INFOODS.MIT.EDU>
Subject:      RE:       Re:Reply to 7171 change
X-To:         ASCII/EBCDIC character set related issues
              <ISO8859%JHUVM.BITNET@MITVMA.MIT.EDU>
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>

Now, come on.  "Of little use to anyone" is clearly at least a bit of an
exaggeration.  And the store-the-character, store-the-position,
store-the-accent strategy ignores two important problems:
  - For many purposes, these characters-with-extra-marks are CHARACTERS,
not simply "some other character with an accent".
  - The programming language implications of trying to cope with
characters and accents stored separately are pretty unpleasant.  I'm not
asserting that they cannot be made to work, but people keep assuming
that
  * length(string) == number of characters in it
  * if length(string1) = length(string2), then they contain the same
number of characters *and* occupy the same amount of storage (i.e., that
either string1 or string2 can be copied into the storage occupied by the
other).
  * that there is such a thing as character-width, and that characters
can be extracted from strings and stored into a character-width object.
  * that a comparison for identity between character1 and character2
will be true iff they are the same character (and not that one of them
is followed by an accent that changes its meaning).    And
  * things can be sorted into collating order using simplistic
bit-compare algorithms.
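Klensin's objections can be seen in miniature with Unicode (which postdates this thread). Unicode ended up permitting both approaches: a precomposed A-grave, or a base "A" followed by a separately stored combining grave accent. The sketch below only illustrates his list under that modern assumption. The 1988 schemes under discussion used their own byte codes.

```python
import unicodedata

# A-grave stored two ways: one code point, or the base letter "A"
# followed by a separately stored combining grave accent (U+0300).
precomposed = "\u00c0"
decomposed = "A\u0300"

# "length(string) == number of characters" fails for the second form.
print(len(precomposed), len(decomposed))   # 1 2

# Equal lengths, yet one string holds two letters and the other three.
print(len("A\u0300B"), len("ABC"))         # 3 3

# A plain identity comparison says these are different characters.
print(precomposed == decomposed)           # False

# Normalizing to the composed form (NFC) is the extra runtime work
# needed before the comparison gives the expected answer.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True

# A bit-compare sort separates the two spellings of A-grave:
# the decomposed one sorts before "B", the precomposed one after it.
ordered = sorted([precomposed, "B", decomposed])
print([s.encode("unicode_escape") for s in ordered])
```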
   I stipulate that some of those principles overlap, and that a smaller
number of rules is possible.  I also stipulate that one can design
runtime to eliminate or hide all of the problems (given careful runtime
and user programming), but suggest that such runtime would get little
assistance from current hardware and, consequently, would tend to
deliver unacceptable performance.
   character-overstrike_indicator-accent approaches are fine for page
definition languages (I note that your two examples were both of that
class), and are OK for a data communications stream that will be
printed (or displayed), but not further processed, but really fairly
poor for either information interchange or processing and text
manipulation.
   John Klensin, MIT

25-May-88 10:17:04-EDT,1907;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Wed 25 May 88 10:17:02-EDT
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Wed, 25 May 88 10:14:55 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 6458; Wed, 25 May 88 10:14:53 EDT
Received: by BITNIC (Mailer X1.25) id 3323; Wed, 25 May 88 10:10:41 EDT
Date:         Wed, 25 May 88 14:58:00 MET
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
From:         Johan van Wingen <MOSGLA%HLERUL2.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      7171 and Mr.Troth
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>

Dear list subscribers

I have been inspecting Mr. Troth's tables. The first one (Raw EBCDIC)
appears to have arrived quite correctly. If you turn HEX ON (under PDF)
you see that all 256 codes are there. I tried it on a 3278, a 3192-G,
a VT100 (by class=C71, that is by the 7171) and on a PC by KERMIT, also
by C71, and there is no difference. Only when you look at the actual
characters on the screen do you see different representations.
This implies that 8-bit codes are being transferred correctly by BITNET.
Problems arise only once you stop interpreting those codes as EBCDIC,
and that is a matter of local changes to the character interpretation
of codes. If you turn on APL at your terminal, you see yet other
characters. Why not invent an ISO8859-1 button?
That being so, I do not understand what the fuss about the 7171 is.
Who is fooling whom?
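Van Wingen's point is that the bytes arrive intact and only their interpretation differs. This can be shown with Python's standard cp037 (US) and cp500 (International) codecs standing in for terminal hardware. Those codec tables are an assumption here: they follow later published IBM mappings and may not match every 1988 device. iso8859_1_view is a hypothetical name for the "ISO8859-1 button".

```python
# Three EBCDIC bytes as they arrive over BITNET, unchanged in transit.
raw = bytes([0x4A, 0x4F, 0x5A])

# What a terminal shows depends only on the table it applies.
for codepage in ("cp037", "cp500"):
    print(codepage, raw.decode(codepage))
# cp037 ¢|!
# cp500 [!]

def iso8859_1_view(ebcdic: bytes, codepage: str) -> bytes:
    """Hypothetical "ISO8859-1 button": re-express EBCDIC bytes in ISO 8859-1."""
    return ebcdic.decode(codepage).encode("latin-1")

print(iso8859_1_view(raw, "cp500"))   # b'[!]'
```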
Yours faithfully, Johan van Wingen


 FROM  J. W. van Wingen    MOSGLA@HLERUL2   :
     Mail to                                :
 P. O. Box 486,  2300AL Leiden, Netherlands :

25-May-88 15:53:48-EDT,2527;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Wed 25 May 88 15:53:43-EDT
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Wed, 25 May 88 15:53:33 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 7079; Wed, 25 May 88 15:53:25 EDT
Received: by BITNIC (Mailer X1.25) id 1608; Wed, 25 May 88 15:54:04 EDT
Date:         Wed, 25 May 88 11:46:00 CDT
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
From:         Rick Troth <TROTH%TAMCBA.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      Re: Usefulness of ISO 8859-1
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>
In-Reply-To:  Your message of Wed 25 May 88 08:56:48 EDT

        ¡Amen to that,  Ed!   (Spanish syntax)

        The one-to-one ASCII to EBCDIC table(s) is *the strength* of the
ISO8859-1 set.  As I indicated in a recent long posting,  we suffer from
the confusion of three different EBCDIC's here at A&M.  The problem is
most clearly illustrated in the display of square brackets.

        Suppose some random user sits down at some random terminal.
He logs in and reads his Inter-Net mail with embedded brackets.
Whether he sees brackets or "something else" depends on what terminal
he is using and what code points the brackets were translated to by
whatever gateway passed the mail to BITNET.

         Y     display as left and right brackets on a 3192
        & a     display as left and right brackets on a 3180
        [ ]     display as left and right brackets on a 7171
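Troth's bracket problem can be simulated by encoding with one table at the gateway and decoding with another at the terminal. Treating each terminal model as one of Python's cp037/cp500 codecs is a simplifying assumption, since his 3192, 3180 and 7171 each behaved in their own way. through_gateway is a hypothetical helper.

```python
def through_gateway(text: str, gateway_cp: str, terminal_cp: str) -> str:
    """Hypothetical mail path: ASCII text -> EBCDIC at a gateway -> terminal."""
    ebcdic = text.encode(gateway_cp)   # code points chosen by the gateway
    return ebcdic.decode(terminal_cp)  # glyphs chosen by the terminal

mail = "a[1]"
print(through_gateway(mail, "cp037", "cp037"))  # a[1]
print(through_gateway(mail, "cp037", "cp500"))  # a¬1|
print(through_gateway(mail, "cp500", "cp037"))  # a¢1!
```

The letters and digits survive every combination. Only the handful of code points that differ between the EBCDIC variants are affected.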

        It so happens that the character set in the 3192 displays ALL
of the characters in the ISO8859-1 set.  The 3180 DOES NOT.  The 7171
DOES NOT.  If you map 7-bit ASCII to a subset of EBCDIC, then you may be
able to put up with this.  But ASCII machines are starting to use all
eight bits.  Furthermore, connectivity is the word of the day.

        Personally,  I would hope that ISO8859-2 can be mapped to
the corresponding national EBCDIC,  and likewise for ISO8859-3, etc.
I did not get the impression that Michael S-McQ or anyone else on
this list wants to "leave Europe out in the cold".  But let's take one
step at a time,  please.  How does "raw EBCDIC" display on your tube?

                                                                 - Rick
25-May-88 12:08:08-EDT,2494;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Wed 25 May 88 12:08:03-EDT
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Wed, 25 May 88 12:02:00 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 6766; Wed, 25 May 88 12:01:58 EDT
Received: by BITNIC (Mailer X1.25) id 5126; Wed, 25 May 88 12:00:08 EDT
Date:         Wed, 25 May 88 10:06:12 CDT
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
From:         Phil Howard KA9WGN <PHIL%UIUCVMD.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      Re: Usefulness of ISO 8859-1
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>
In-Reply-To:  Your message of Wed, 25 May 88 08:56:48 EDT

> However, when a printer implements the ISO 8859-1 code, nothing says that
> internally the printer could not use flying accents to form the characters.
> However, they need to be careful about the "i" character with accents like
> the umlaut.

Does this mean that "flying accents" are only formed by overstriking?

Does a backspace control character separate the base character from its
accent mark?

(((  I would think this not necessary when designing a new code with   )))
(((  the sophistication of today's computers.  The accent code could   )))
(((  be made to precede the base character, and the accent code would   )))
(((  imply a modification to the next coming base character.           )))
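Howard's proposal is an accent code that precedes and modifies the next base character, and it is straightforward to decode. The sketch assumes hypothetical one-byte accent prefixes loosely patterned on the non-spacing diacritics of ISO 6937. It uses Unicode normalization only as a convenient way to look up the composed letter. decode_prefixed is an illustrative name.

```python
import unicodedata

# Hypothetical prefix codes: each means "put this accent on the next base
# character". Values loosely follow ISO 6937's non-spacing diacritics.
ACCENTS = {
    0xC1: "\u0300",  # grave
    0xC2: "\u0301",  # acute
    0xC3: "\u0302",  # circumflex
    0xC4: "\u0303",  # tilde
    0xC8: "\u0308",  # diaeresis (umlaut)
}

def decode_prefixed(data: bytes) -> str:
    """Combine each accent prefix with the base character that follows it."""
    out = []
    pending = None                      # accent waiting for its base letter
    for b in data:
        if b in ACCENTS:
            if pending is not None:
                raise ValueError("two accent prefixes in a row")
            pending = ACCENTS[b]
            continue
        ch = chr(b)
        if pending is not None:
            # NFC yields the single precomposed letter when one exists
            ch = unicodedata.normalize("NFC", ch + pending)
            pending = None
        out.append(ch)
    if pending is not None:
        raise ValueError("accent prefix at end of data")
    return "".join(out)

print(decode_prefixed(b"caf\xc2e"))     # café
print(decode_prefixed(b"na\xc8ive"))    # naïve
```

Composing to one letter sidesteps the dotted "i" problem Hart mentions: the diaeresis on "i" becomes a single ï rather than an overstrike that leaves the dot in place.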

> character sets match.  ISO 8859-1 was developed because the computer
> manufacturers required one character per code point.

Since I don't know the actual codes ISO puts out, it is hard to make specific
comments; they may really be part of a different code.  I once looked at a number
of ways to do this myself.  I looked at many languages and collected a list of
different accents.  Then, by combining them with the Roman alphabet, I came up
with over 3000 possibilities.  Double that again for Cyrillic.  And that is
just most of Europe.  Even counting only the accented letters actually used
in the various languages, codifying them all in just 256 possible codes
would be a strain.

Just how many different languages are being codified here?  Does anyone have
a list of them?  Are these standards going to lock out certain languages?
25-May-88 12:36:07-EDT,4047;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Wed 25 May 88 12:36:04-EDT
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Wed, 25 May 88 12:25:10 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 6800; Wed, 25 May 88 12:25:08 EDT
Received: by BITNIC (Mailer X1.25) id 5801; Wed, 25 May 88 12:18:58 EDT
Date:         Wed, 25 May 88 10:18:42 CDT
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
From:         Phil Howard KA9WGN <PHIL%UIUCVMD.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      RE:       Re:Reply to 7171 change
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>
In-Reply-To:  Your message of Wed, 25 May 88 08:53:02 EST

>   - For many purposes, these characters-with-extra-marks are CHARACTERS,
> not simply "some other character with an accent".
>   - The programming language implications of trying to cope with
> characters and accents stored separately are pretty unpleasant.  I'm not
> asserting that they cannot be made to work, but people keep assuming
> that
>   * length(string) == number of characters in it
>   * if length(string1) = length(string2), then they contain the same
> number of characters *and* occupy the same amount of storage (i.e., that
> either string1 or string2 can be copied into the storage occupied by the
> other).
>   * that there is such a thing as character-width, and that characters
> can be extracted from strings and stored into a character-width object.
>   * that a comparison for identity between character1 and character2
> will be true iff they are the same character (and not that one of them
> is followed by an accent that changes its meaning).    And
>   * things can be sorted into collating order using simplistic
> bit-compare algorithms.

Is it absolutely necessary that the representation of character codes
INTERNAL to a machine, and EXTERNALLY (inter-machine communication) be
identical?   Clearly, if not, processing logic must be applied as a
gateway in and out of a machine to transpose the code sets.  Such overhead
is already typical, however, given that many communications protocols even
now include some form of Huffman or Lempel-Ziv compression.
So, overhead is a weak argument.

The last "I" in ASCII means "Interchange".  Do the same implications
apply in practice to ISO codes?

>    I stipulate that some of those principles overlap, and that a smaller
> number of rules is possible.  I also stipulate that one can design
> runtime to eliminate or hide all of the problems (given careful runtime
> and user programming), but suggest that such runtime would get little
> assistance from current hardware and, consequently, would tend to
> deliver unacceptable performance.

How about a wider character code for INTERNAL machine processing where the
convenience of fixed interval addressing is very important, and a RELATED
EXTERNAL code for "Interchanging" these codes knowing that typical uses
will involve small subsets of the overall code, making it possible to
apply an "obvious" compression of selecting code subsets.  Some data
compression techniques can actually do this for you and make a 16-bit
code set where fewer than 256 codes are typically used transmit just about
as efficiently as if the codes had been defined in an 8-bit set.
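Howard's idea of a wide internal code with a related external form that exploits small subsets can be sketched with a simple run scheme. Each run names a 256-code "page" (the high byte) once and is followed by the low bytes. The format, encode_paged and decode_paged are hypothetical illustrations, not an existing standard or compression protocol.

```python
def encode_paged(codes: list[int]) -> bytes:
    """Encode 16-bit internal codes as runs of [page, count, low bytes...]."""
    out = bytearray()
    i = 0
    while i < len(codes):
        if not 0 <= codes[i] <= 0xFFFF:
            raise ValueError("internal codes are assumed to be 16-bit")
        page = codes[i] >> 8           # which 256-code subset this run uses
        j = i
        # extend the run while codes stay in the same subset (max 255 per run)
        while j < len(codes) and codes[j] >> 8 == page and j - i < 255:
            j += 1
        out += bytes([page, j - i])
        out += bytes(c & 0xFF for c in codes[i:j])
        i = j
    return bytes(out)


def decode_paged(data: bytes) -> list[int]:
    """Inverse of encode_paged."""
    codes = []
    i = 0
    while i < len(data):
        page, count = data[i], data[i + 1]
        codes += [(page << 8) | b for b in data[i + 2 : i + 2 + count]]
        i += 2 + count
    return codes


text = [ord(c) for c in "Cr\u00e8me br\u00fbl\u00e9e"]   # all within one subset
mixed = [0x41, 0x3B1, 0x42]                       # Latin, Greek alpha, Latin
print(len(text), len(encode_paged(text)))         # 12 14
print(encode_paged(mixed).hex())                  # 0001410301b1000142
```

For text that stays within one subset, the overhead is two bytes per run of up to 255 characters. That is the "just about as efficiently" effect described above. Frequent switching between subsets, as in the mixed example, costs more.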

>    character-overstrike_indicator-accent approaches are fine for page

What's wrong with (accent_implying_zero_forward_space)-(character) coding?

> definition languages (I note that your two examples were both of that
> class), and are OK for a data communications stream that will be
> printed (or displayed), but not further processed, but really fairly
> poor for either information interchange or processing and text
> manipulation.
>    John Klensin, MIT
27-May-88 20:42:03-EDT,1210;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Fri 27 May 88 20:41:59-EDT
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Fri, 27 May 88 20:42:32 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 0543; Fri, 27 May 88 20:42:31 EDT
Received: by BITNIC (Mailer X1.25) id 0892; Fri, 27 May 88 20:42:29 EDT
Date:         Fri, 27 May 88 20:22:49 EST
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
From:         John Kesich <KESICH%NYUCIMSA.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      ECMA registered codes via DRCS?
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>

ISO2022 defines Dynamically Redefinable Character Sets (DRCS), which
essentially allow a user to define and load their own character set.

VT200's and emulators (which means most terminals of recent vintage)
support DRCS.

Are there any DRCS's available for ECMA codes?

If not, does anyone know of any software tools for creating DRCS's?
27-May-88 20:22:15-EDT,2696;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Fri 27 May 88 20:22:09-EDT
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Fri, 27 May 88 20:22:44 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 0539; Fri, 27 May 88 20:22:43 EDT
Received: by BITNIC (Mailer X1.25) id 0808; Fri, 27 May 88 20:22:29 EDT
Date:         Fri, 27 May 88 18:35:47 EST
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
From:         John Kesich <KESICH%NYUCIMSA.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      Re: Extended ASCII with Kermit
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>
In-Reply-To:  Message of Fri, 27 May 88 14:44:05 +0200 from <A-PIRARD@BLIULG11>

From my reading of ISO's 646, 2022, 4873, 8859-1 & 8859-2 I have come
to the conclusion that there is a fairly widespread misunderstanding of
ISO8859.  If I'm the one who has misunderstood I hope someone will take
the trouble to correct me.
People seem to think that you pick one of the ISO8859-x sets and then
those 256 characters are the only ones used.  However, ISO's 2022 & 4873
define a number of escape sequences for switching among different
versions (as they term character sets which conform to the standards).
What this means is that simple translation table mappings are not enough
to translate ISO to other code sets, one must also change translation
tables 'on the fly' as the escape sequences are encountered.  A somewhat
simplified example may help to illustrate the problem:

data stream
(ISO notation)   hex       comments
--------------   ---       --------
ESC 02/00 04/12  1B 20 4C  select level 1 of ISO4873
ESC 02/13 04/01  1B 2D 41  designate (and invoke) ISO8859-1's G1 set
12/00            C0        1st 'real' character - capital A, grave accent
ESC 02/13 04/02  1B 2D 42  designate (and invoke) ISO8859-2's G1 set
12/00            C0        2nd 'real' character - capital R, acute accent
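The example shows why a single static table is not enough. The same byte C0 means A-grave after ESC 02/13 04/01 and R-acute after ESC 02/13 04/02. The sketch below tracks the G1 designation while scanning an 8-bit stream. It assumes ASCII in the left half and understands only the two final bytes used in the example, which it maps to Python's latin-1 and iso8859_2 codecs. decode_8bit is a hypothetical name.

```python
# Final bytes of ESC 02/13 F that this sketch understands (assumption:
# only the two sets used in the example).
G1_SETS = {0x41: "latin-1", 0x42: "iso8859_2"}   # ESC - A, ESC - B

def decode_8bit(data: bytes) -> str:
    g1 = "latin-1"              # assumed initial G1 set
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b == 0x1B:           # ESC: intermediates 02/00-02/15, then a final byte
            j = i + 1
            while 0x20 <= data[j] <= 0x2F:
                j += 1
            intermediates, final = data[i + 1 : j], data[j]
            if intermediates == b"\x2d":   # ESC 02/13 F designates a 96-set as G1
                g1 = G1_SETS[final]
            # anything else (e.g. the ESC 02/00 04/12 announcer) is skipped
            i = j + 1
        elif b >= 0xA0:         # right half: look the byte up in the current G1
            out.append(bytes([b]).decode(g1))
            i += 1
        else:                   # left half assumed ASCII; C1 bytes passed as code values
            out.append(chr(b))
            i += 1
    return "".join(out)

stream = bytes.fromhex("1B204C 1B2D41 C0 1B2D42 C0")
print(decode_8bit(stream))      # ÀŔ
```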

Does an implementation which uses a single set of ISO8859-x characters
conform to the standard?
Even if it does, would it make any sense to standardize on a particular
ISO8859-x to the exclusion of others?
Finally, if one were to do so, how would the 2 character text in my
example be transmitted?

Any implementation which doesn't include the ISO escape sequences will
eventually have to incorporate some such mechanism.  I think the ISO
escape sequences should be a part of any standard which is adopted.
28-May-88 06:13:40-EDT,1772;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:A-PIRARD@BLIULG11.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Sat 28 May 88 06:13:30-EDT
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Sat, 28 May 88 06:13:55 EDT
Received: from VM1.ULG.AC.BE by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 0716; Sat, 28 May 88 06:13:53 EDT
Received: by BLIULG11 (Mailer X1.25) id 1548; Sat, 28 May 88 12:12:35 +0200
Date:         Sat, 28 May 88 12:11:26 +0200
From:         Andre PIRARD <A-PIRARD%BLIULG11.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      Precision to my ISO8859/1 document
To:           ISO8859@JHUVM,
              Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>,
              IBM-KERMIT@CU20B.COLUMBIA.EDU,
              Paul Placeway <PAUL@TUT.CIS.OHIO-STATE.EDU>,
              Matthias Aebi <K116430@CZHRZU1A>


In  the  document describing the ISO8859/1 and related  character
sets,  I  forgot to make the following remark to be added to  the
file. Sorry.

Andre'.

- The  character  range 80-9F is undefined in the  description of
ISO8859/1 I have.  I don't know its real status,  but this feature
is welcome for two reasons.
     First, it avoids control characters during transmission on a
7-bit  line (ISO2022:  an SO code shifts to the upper half of the
set,  an  SI code reverts to the lower one).  As an added  bonus,
this keeps Kermit overhead (8th-bit quoting) to a minimum.
     Second, it allows rearranging a previous 8-bit code set that
used this range for national characters.  These are moved to  the
ISO  positions and the expelled non-ISO characters can  be  moved
to the 80-9F range.
     What appears in my listing is the assignment made by IBM for
its graphic characters mainly.
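Pirard's first point can be made concrete. ISO 8859 puts no graphics in 80-9F, so stripping the eighth bit from the graphic upper half (A0-FF) always lands in 20-7F and never on a control character. SO and SI alone are therefore enough to carry the upper half over a 7-bit line. This sketch assumes a single 96-character set as G1 and rejects C1 bytes. to_7bit and from_7bit are hypothetical names, and the code is not Kermit's actual implementation.

```python
SO, SI = 0x0E, 0x0F

def to_7bit(data: bytes) -> bytes:
    """Send an 8-bit ISO 8859 byte string over a 7-bit line using SO/SI."""
    out = bytearray()
    shifted = False
    for b in data:
        if 0x80 <= b <= 0x9F:
            raise ValueError("C1 control bytes are not handled in this sketch")
        high = b >= 0xA0
        if high != shifted:             # switch halves only when needed
            out.append(SO if high else SI)
            shifted = high
        out.append(b & 0x7F)            # strip the 8th bit
    if shifted:
        out.append(SI)                  # leave the line in the default state
    return bytes(out)

def from_7bit(data: bytes) -> bytes:
    out = bytearray()
    shifted = False
    for b in data:
        if b == SO:
            shifted = True
        elif b == SI:
            shifted = False
        elif shifted and b >= 0x20:     # graphics come from the upper half
            out.append(b | 0x80)
        else:                           # C0 controls are never shifted
            out.append(b)
    return bytes(out)

line = "d\u00e9j\u00e0 vu".encode("latin-1")
wire = to_7bit(line)
print(wire)                             # b'd\x0ei\x0fj\x0e`\x0f vu'
```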
30-May-88 14:19:26-EDT,2798;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Mon 30 May 88 14:19:21-EDT
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Mon, 30 May 88 05:53:51 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 1929; Mon, 30 May 88 05:53:50 EDT
Received: by BITNIC (Mailer X1.25) id 0253; Mon, 30 May 88 05:53:10 EDT
Date:         Mon, 30 May 88 11:06:44 +0200
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
From:         Andre PIRARD <A-PIRARD%BLIULG11.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      Re: Extended ASCII with Kermit
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>
In-Reply-To:  Message of Fri, 27 May 88 18:35:47 EST from <KESICH@NYUCIMSA>

>People seem to think that you pick one of the ISO8859-x sets and then
>those 256 characters are the only ones used.  However, ISO's 2022 & 4873
>define a number of escape sequences for switching among different
>versions (as they term character sets which conform to the standards).
>What this means is that simple translation table mappings are not enough
>to translate ISO to other code sets, one must also change translation
>tables 'on the fly' as the escape sequences are encountered.  A somewhat
>simplified example may help to illustrate the problem:
>
>data stream
>(ISO notation)   hex       comments
>--------------   ---       --------
>ESC 02/00 04/12  1B 20 4C  select level 1 of ISO4873
>ESC 02/13 04/01  1B 2D 41  designate (and invoke) ISO8859-1's G1 set
>12/00            C0        1st 'real' character - capital A, grave accent
>ESC 02/13 04/02  1B 2D 42  designate (and invoke) ISO8859-2's G1 set
>12/00            C0        2nd 'real' character - capital R, acute accent

That's the way to build a super terminal to display data from a super
text processor that can manage all languages simultaneously.
But how will this processor store its text? Not in a plain 8-bit text
file obviously.
And that's what I was talking about: transferring today's 8-bit files
that store one version of ISO8859 and terminal support for that
one version of code. Let's first agree on how to do that.
File transfer of more elaborate data will have to encode the data for
integrity anyway. So, the ISO scheme can apply only to terminal mode.

But thanks for the information John.
By the way, could you describe in a couple of lines how ISO defines
switching between the two halves of a single 8-bit set with SI/SO
for a 7-bit line? The mechanism looks fairly obvious, but I would hate
to miss some subtle feature.

Andre'.
31-May-88 08:33:12-EDT,1095;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Tue 31 May 88 08:33:10-EDT
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Tue, 31 May 88 08:33:58 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 2888; Tue, 31 May 88 08:33:56 EDT
Received: by BITNIC (Mailer X1.25) id 0442; Tue, 31 May 88 08:33:32 EDT
Date:         Tue, 31 May 88 08:24:11 EDT
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
From:         Edwin Hart <HART%APLVM.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      Re: Precision to my ISO8859/1 document
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>
In-Reply-To:  Your message of Sat, 28 May 88 12:11:26 +0200

ISO 8859-1 columns 8 and 9 (X'80' to X'9F') are reserved for the C1 control
character set.  They may not be used for (printable) characters, only for
control characters.
31-May-88 09:15:25-EDT,3911;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Tue 31 May 88 09:15:17-EDT
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Tue, 31 May 88 09:16:08 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 2930; Tue, 31 May 88 09:16:07 EDT
Received: by BITNIC (Mailer X1.25) id 2115; Tue, 31 May 88 09:14:59 EDT
Date:         Tue, 31 May 88 15:00:00 MET
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
From:         Johan van Wingen <MOSGLA%HLERUL2.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      What is EBCDIC?
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>

Dear list subscribers
It is very difficult indeed to be clear and precise. Thus I have to
present a corrected version of the EBCDIC part of my "Reply to 7171
change". As CP037 and CP500 are not available here, please send me
any correction to these tables, in order that we know what we are
speaking about when we are discussing variants of EBCDIC.
Yours faithfully, Johan van Wingen

""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
First, what is EBCDIC? We consider for this moment the basic set with
94 characters only.

If we take the yellow card (GX20-1850), we see two columns, one
"standard", one for the T-11 and TN chains.
Also, there are the GT10 type tables for the IBM 3800 printers.
Further, there are national variants, based on "US" and "International",
(see IBM3270 Information Display System, Character Set Reference,
GA27-2837-9, Figure 10-43).
Finally, there are CP037 and CP500, containing extensions.
(The 4250 code pages, which are still more different, are left out.)

The differences affect only a restricted number of graphics.
(the CP037 and CP500 columns are only a guess; please send corrections)

                                 Interna                         ISO
 ID    NAME                  US  tional  TN   GT10  CP037 CP500 8859-1

SM06 left  square bracket    --    4A    AD    AD    BA    4A    5B
SM08 right square bracket    --    5A    BD    BD    BB    5A    5D
SM11 left  curly  bracket    C0    C0    8B    8B    C0    C0    7B
SM14 right curly  bracket    D0    D0    9B    9B    D0    D0    7D
SC04 cent sign               4A    --    4A    4A    4A    4A    A2
SP02 exclamation mark        5A    4F    5A    5A    5A    4F    21
SM13 vertical line           4F    --    4F    4F    4F    5A    7C
SM65 broken vertical line    6A    6A    --    --    6A    6A    A6
SM07 reverse solidus (slash) E0    E0    --    E0    E0    E0    5C
SD19 tilde                   A1    A1    --    --    A1    A1    7E
SD13 grave accent            79    79    --    --    79    79    60
SM66 not sign                5F    5F    5F    5F    5F    5F    AC
SD15 circumflex accent       --    --    --    --    B0    B0    5E
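The CP037 and CP500 columns were offered as guesses, so one way to cross-check them is to generate those columns from a published mapping table. The sketch uses Python's cp037 and cp500 codecs for that purpose. Those tables reflect later IBM/Unicode mappings and are an assumption here, not a statement of what every 1988 device did. GRAPHICS is an illustrative name.

```python
# Graphics from the table above, keyed by the IBM ID used there.
GRAPHICS = {
    "SM06": "[", "SM08": "]", "SM11": "{", "SM14": "}",
    "SC04": "\u00a2", "SP02": "!", "SM13": "|", "SM65": "\u00a6",
    "SM07": "\\", "SD19": "~", "SD13": "`", "SM66": "\u00ac",
    "SD15": "^",
}

print("ID    CP037 CP500 8859-1")
for gid, ch in GRAPHICS.items():
    cp037 = ch.encode("cp037").hex().upper()   # byte under the US table
    cp500 = ch.encode("cp500").hex().upper()   # byte under the International table
    iso = ch.encode("latin-1").hex().upper()   # ISO 8859-1 position
    print(f"{gid}  {cp037:>5} {cp500:>5} {iso:>6}")
```

Any row where the generated value disagrees with the hand-made column is a candidate for the corrections requested above.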

This is valid except for national variants at some of the 14 codes:
4A 5A 6A 79 5B 7B 7C 5F A1 C0 D0 E0 4F 7F ;
Variants following US are:
Canadian Bilingual, English (UK), Hebrew, Japanese, Portuguese, Spanish;
variants following International are:
German, Belgian, Brazilian, Canadian French, Danish/Norwegian, Finnish/
Swedish, French, Italian, Swiss.

The best test case for determining your set is 4F:
US/CP037: vertical line
International/CP500: exclamation mark

As for the extensions, CP037 and CP500 seem to be identical (TN and
GT10 have different extensions).

The NOT sign is a separate problem to be discussed later on.
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""


 FROM  J. W. van Wingen    MOSGLA@HLERUL2   :
     Mail to                                :
 P. O. Box 486,  2300AL Leiden, Netherlands :

31-May-88 21:48:06-EDT,4711;000000000001
Return-Path: <@CUVMA.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.COLUMBIA.EDU by CU20B.COLUMBIA.EDU with TCP; Tue 31 May 88 21:48:00-EDT
Received: from CUVMA.COLUMBIA.EDU(MAILER) by CUVMA.COLUMBIA.EDU(SMTP) ; Tue, 31 May 88 21:48:36 EDT
Received: from BITNIC.BITNET by CUVMA.COLUMBIA.EDU (Mailer X1.25) with BSMTP id
 4073; Tue, 31 May 88 21:48:35 EDT
Received: by BITNIC (Mailer X1.25) id 2182; Tue, 31 May 88 21:43:33 EDT
Date:         Tue, 31 May 88 18:36:21 EST
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.COLUMBIA.EDU>
From:         John Kesich <KESICH%NYUCIMSA.BITNET@CUVMA.COLUMBIA.EDU>
Subject:      Re: Extended ASCII with Kermit
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>
In-Reply-To:  Message of Mon, 30 May 88 11:06:44 +0200 from <A-PIRARD@BLIULG11>

> That's the way to build a super terminal to display data from a super
> text processor that can manage all languages simultaneously.
> But how will this processor store its text? Not in a plain 8-bit text
> file obviously.
> And that's what I was talking about: transferring today's 8-bit files
> that store one version of ISO8859 and terminal support for that
> one version of code. Let's first agree on how to do that.
> File transfer of more elaborate data will have to encode the data for
> integrity anyway. So, the ISO scheme can apply only to terminal mode.

By today's 8-bit codes, I can only assume that you are referring to ECMA
registered codes (such as the ISO8859 character sets).  Each of these
codes has 2 registered designation sequences (G0 and G1 character sets
are designated separately).  What I described in my previous note was not
a proposed ISO standard but something that has been around since 1973.
The only new element in the picture is ISO8859.  As I understand it,
ISO8859 represents the first set of internationally agreed upon VERSIONS
of ISO character sets.   ** perhaps someone could post a list of the other
ECMA registered character sets **
There is currently limited hardware support for these escape sequences; I
can only guess that they are more heavily used in Europe than in America.
However, even here the DRCS escape sequence defined in ISO2022 is widely
supported (as I mentioned in a previous note).  There is at least one word
processing package that I know of which uses it to provide alternate
characters (WordMARC, which provides Greek characters and math symbols).
Programs such as TeX, Troff, Script, MacWrite, etc. should be able
to do the same.  (I can't guess how much effort would be required,
but reinventing the ISO escape sequences, and I am sure they would be
reinvented, can't be easier.)

As far as I am concerned it makes no sense to adopt ISO8859 without the
related escape sequences.

> By the way, could you describe in a couple of lines how ISO defines
> switching between the two halves of a single 8-bit set with SI/SO
> for a 7-bit line? The mechanism looks fairly obvious, but I would hate
> to miss some subtle feature.
I don't claim to be an expert, and I hope others will correct any mistakes,
but here is my understanding of how it works:
There are 3 sets of escape sequences:

             designator
        94 char     96 char       invoker    single-character-invoker

G0      ESC 2/8 f                 SI
G1      ESC 2/9 f   ESC 2/13 f    SO
G2      ESC 2/10 f  ESC 2/14 f    LS2        SS2
G3      ESC 2/11 f  ESC 2/15 f    LS3        SS3

where 'f' is the code assigned by ECMA in accordance with ISO2375.
and the shift sequences are defined as follows:

SO   0/14  (called LS1 in 8-bit environments - I don't know the difference)
SI   0/15  (called LS0 in 8-bit environments - I don't know the difference)
LS2  ESC 6/14
LS3  ESC 6/15
SS2  ESC 4/14  (8/14 in 8-bit environments)
SS3  ESC 4/15  (8/15 in 8-bit environments)

So you designate your 4 graphic character sets and then use the various
shift (invoker) sequences as needed.

For example:
ESC 2/8 4/2               designate ISO8859-1 G0 as your G0
ESC 2/13 4/1              designate ISO8859-1 G1 as your G1
ESC 2/14 4/2              designate ISO8859-2 G1 as your G2
SO                        'load' G1 (ISO8859-1 G1)
4/0 6/0                   'print' A-grave a-grave
SS2                       'load' the next position (4/0) from G2 (ISO8859-2 G1)
4/0 6/0                   'print' R-acute a-grave
SI                        'load' G0 (ISO8859-1 G0)
4/0 6/0                   'print' @ `  (commercial at, grave accent)
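The designator and invoker tables above translate fairly directly into a small decoder state machine. This sketch covers only the 7-bit forms listed there: designation into G0-G3, SO/SI, LS2/LS3 and SS2/SS3. It knows only the three sets used in the example and maps them to Python codecs. decode_iso2022_7bit and the table names are hypothetical. It treats SS2 as applying to exactly one following character, which is how a single shift works.

```python
SO, SI, ESC = 0x0E, 0x0F, 0x1B

# Intermediate byte of a designation -> (G register, set size).
DESIGNATORS = {0x28: (0, 94), 0x29: (1, 94), 0x2A: (2, 94), 0x2B: (3, 94),
               0x2D: (1, 96), 0x2E: (2, 96), 0x2F: (3, 96)}

# (set size, final byte) -> Python codec. Only the sets in the example.
SETS = {(94, 0x42): "ascii",       # ESC 2/8 4/2
        (96, 0x41): "latin-1",     # right half of ISO 8859-1
        (96, 0x42): "iso8859_2"}   # right half of ISO 8859-2

# Final bytes of 7-bit shift sequences that have no intermediate byte.
SHIFTS = {0x4E: ("single", 2), 0x4F: ("single", 3),    # SS2, SS3
          0x6E: ("locking", 2), 0x6F: ("locking", 3)}  # LS2, LS3


def decode_iso2022_7bit(data: bytes) -> str:
    g = [(94, 0x42), None, None, None]  # assumed initial state: ASCII in G0
    locked = 0      # register invoked by SI/SO/LS2/LS3
    single = None   # register chosen by SS2/SS3 for one character only
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b == ESC:
            j = i + 1
            while 0x20 <= data[j] <= 0x2F:      # collect intermediate bytes
                j += 1
            inter, final = data[i + 1 : j], data[j]
            if len(inter) == 1 and inter[0] in DESIGNATORS:
                reg, size = DESIGNATORS[inter[0]]
                g[reg] = (size, final)           # designate
            elif not inter and final in SHIFTS:
                kind, reg = SHIFTS[final]
                if kind == "single":
                    single = reg
                else:
                    locked = reg
            i = j + 1                            # other sequences are ignored
            continue
        if b == SO:
            locked = 1
        elif b == SI:
            locked = 0
        elif 0x20 <= b <= 0x7F:                 # a graphic position
            reg = locked if single is None else single
            single = None
            if g[reg] is None:
                raise ValueError(f"G{reg} used before being designated")
            size, final = g[reg]
            byte = b | 0x80 if size == 96 else b   # 96-sets are 8859 right halves
            out.append(bytes([byte]).decode(SETS[(size, final)]))
        else:
            out.append(chr(b))                  # other C0 controls pass through
        i += 1
    return "".join(out)


stream = bytes([ESC, 0x28, 0x42,        # ESC 2/8 4/2  : G0 = ASCII
                ESC, 0x2D, 0x41,        # ESC 2/13 4/1 : G1 = 8859-1 right half
                ESC, 0x2E, 0x42,        # ESC 2/14 4/2 : G2 = 8859-2 right half
                SO, 0x40, 0x60,         # G1 in use: A-grave, a-grave
                ESC, 0x4E, 0x40, 0x60,  # SS2 affects only the 4/0: R-acute, a-grave
                SI, 0x40, 0x60])        # back to G0: @ `
print(decode_iso2022_7bit(stream))      # ÀàŔà@`
```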

I hope this will be helpful.
 6-Jun-88 10:20:23-EDT,3898;000000000001
Return-Path: <@CUVMA.CC.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.CC.COLUMBIA.EDU by CU20B.CC.COLUMBIA.EDU with TCP; Mon 6 Jun 88 10:20:19-EDT
Received: from CUVMA.CC.COLUMBIA.EDU(MAILER) by CUVMA.CC.COLUMBIA.EDU(SMTP) ; Mon, 06 Jun 88 10:21:15 EDT
Received: from BITNIC.BITNET by CUVMA.CC.COLUMBIA.EDU (Mailer X1.25) with BSMTP
 id 0666; Mon, 06 Jun 88 10:21:08 EDT
Received: by BITNIC (Mailer X1.25) id 3087; Mon, 06 Jun 88 09:37:30 EDT
Date:         Mon, 6 Jun 88 15:19:00 MET
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.CC.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.CC.COLUMBIA.EDU>
From:         Johan van Wingen <MOSGLA%HLERUL2.BITNET@CUVMA.CC.COLUMBIA.EDU>
Subject:      What is EBCDIC? (2nd correction)
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>

Dear list subscribers
With the help of Mr. Pirard's table I could correct the CP500 column.
Here is the second corrected version.
Yours faithfully, Johan van Wingen

########################################################################
         What is EBCDIC? (2nd Correction)
########################################################################
First, what is EBCDIC? For the moment we consider only the basic set
with 94 characters.

If we take the yellow card (GX20-1850), we see two columns, one
"standard", one for the T-11 and TN chains.
Also, there are the GT10 type tables for the IBM 3800 printers.
Further, there are national variants, based on "US" and "International",
(see IBM3270 Information Display System, Character Set Reference,
GA27-2837-9, Figure 10-43).
Finally, there are CP037 and CP500, containing extensions.
(The 4250 code pages, which are still more different, are left out.)

The differences affect only a restricted number of graphics.
A compromise between CP037 and CP500 should be possible.

                             ISO       Interna-                       My
 ID    NAME                 8859-1 US  tional  TN   GT10  CP037 CP500 prop.

SM06 left  square bracket    5B    --    4A    AD    AD    BA    4A    4A
SM08 right square bracket    5D    --    5A    BD    BD    BB    5A    5A
SM11 left  curly  bracket    7B    C0    C0    8B    8B    C0    C0    C0
SM14 right curly  bracket    7D    D0    D0    9B    9B    D0    D0    D0
SC04 cent sign               A2    4A    --    4A    4A    4A    B0    BA
SP02 exclamation mark        21    5A    4F    5A    5A    5A    4F    6A
SM13 vertical line           7C    4F    --    4F    4F    4F    BB    4F
SM65 broken vertical line    A6    6A    6A    --    --    6A    6A    BB
SM07 reverse solidus (slash) 5C    E0    E0    --    E0    E0    E0    E0
SD19 tilde                   7E    A1    A1    --    --    A1    A1    A1
SD13 grave accent            60    79    79    --    --    79    79    79
SM66 not sign                AC    5F    5F    5F    5F    5F    BA    5F
SD15 circumflex accent       5E    --    --    --    --    B0    5F    B0

This is valid except for national variants at some of the 14 codes:
4A 5A 6A 79 5B 7B 7C 5F A1 C0 D0 E0 4F 7F ;
the variants following US are:
Canadian Bilingual, English (UK), Hebrew, Japanese, Portuguese, Spanish;
those following International are:
German, Belgian, Brazilian, Canadian French, Danish/Norwegian, Finnish/
Swedish, French, Italian, Swiss.

The best test case for determining your set is 4F:
US/CP037: vertical line
International/CP500: exclamation mark

As for the extensions, CP037 and CP500 are identical (TN and GT10 have
different extensions).
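The table and the 4F test can be cross-checked against the CP037 and CP500 mappings shipped with Python's standard codecs (a modern reference, assumed here to match the code pages discussed). The sketch below lists every byte the two code pages decode differently; if the statement that the extensions are identical holds, only positions from the table above should appear. `differing_positions` and `guess_from_4f` are hypothetical helper names.

```python
def differing_positions(cp_a: str = "cp037", cp_b: str = "cp500"):
    """Return (byte, char in cp_a, char in cp_b) for every byte decoded differently."""
    diffs = []
    for b in range(256):
        ca = bytes([b]).decode(cp_a)
        cb = bytes([b]).decode(cp_b)
        if ca != cb:
            diffs.append((b, ca, cb))
    return diffs


def guess_from_4f(glyph: str) -> str:
    """The 4F test: what does byte X'4F' show on your terminal?"""
    return {"|": "US/CP037", "!": "International/CP500"}.get(glyph, "unknown")


for b, ca, cb in differing_positions():
    print(f"{b:02X}  CP037 {ca!r:6} CP500 {cb!r}")

print(guess_from_4f(bytes([0x4F]).decode("cp037")))  # US/CP037
```

Comparing the two complete 256-entry mappings, rather than only the rows in the table, is what supports the claim about the extensions: any further difference would show up in the printout.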

The NOT sign is a separate problem to be discussed later on.
########################################################################


 FROM  J. W. van Wingen    MOSGLA@HLERUL2   :
     Mail to                                :
 P. O. Box 486,  2300AL Leiden, Netherlands :

 8-Jun-88 06:35:48-EDT,2223;000000000001
Return-Path: <@CUVMA.CC.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.CC.COLUMBIA.EDU by CU20B.CC.COLUMBIA.EDU with TCP; Wed 8 Jun 88 06:35:46-EDT
Received: from CUVMA.CC.COLUMBIA.EDU(MAILER) by CUVMA.CC.COLUMBIA.EDU(SMTP) ; Wed, 08 Jun 88 06:36:44 EDT
Received: from BITNIC.BITNET by CUVMA.CC.COLUMBIA.EDU (Mailer X1.25) with BSMTP
 id 3019; Wed, 08 Jun 88 06:36:43 EDT
Received: by BITNIC (Mailer X1.25) id 4920; Wed, 08 Jun 88 06:36:38 EDT
Date:         Wed, 8 Jun 88 12:24:00 MET
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.CC.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.CC.COLUMBIA.EDU>
From:         Johan van Wingen <MOSGLA%HLERUL2.BITNET@CUVMA.CC.COLUMBIA.EDU>
Subject:      EBCDIC on screen
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>

Dear list subscribers
What shows up on your screen when looking at all 256 bytes ("raw
EBCDIC") may depend on several things.
1.  the hardware of your terminal
2.  how the control unit (3174, 3274) has been customized
3.  the presence of PS ("programmable storage")
4.  the operating system (OS/MVS/TSO or VM/CMS)
5.  the option chosen under your editor
    (with TSO/ISPF/PDF you may use PDF 0.1 setting one of 3278, 3278A,
    3278T, 3278CN, 3278KN, each giving a different screen content)
It would be helpful to know how certain effects can be achieved on the
various terminal types. I have no idea how to show CP037 on a 3192G. This
is an MVS-only site. It seems that most of the contributions came from
VM/CMS sites, producing very little that I could use directly. With all
editing done under ISPF/PDF, the best solution would be to make the GDDM
symbol sets for CP037 and CP500 (due to Mr. J. Wilhelm) accessible to
ISPF. As these can easily be supplemented by other sets, all code page
problems could then be solved. For printers, either IEBIMAGE or APA
software can do everything desirable. Can anyone report experience in
doing this?
Yours faithfully, Johan van Wingen


 FROM  J. W. van Wingen    MOSGLA@HLERUL2   :
     Mail to                                :
 P. O. Box 486,  2300AL Leiden, Netherlands :

 8-Jun-88 09:38:53-EDT,11965;000000000001
Return-Path: <@CUVMA.CC.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.CC.COLUMBIA.EDU by CU20B.CC.COLUMBIA.EDU with TCP; Wed 8 Jun 88 09:38:42-EDT
Received: from CUVMA.CC.COLUMBIA.EDU(MAILER) by CUVMA.CC.COLUMBIA.EDU(SMTP) ; Wed, 08 Jun 88 09:39:38 EDT
Received: from BITNIC.BITNET by CUVMA.CC.COLUMBIA.EDU (Mailer X1.25) with BSMTP
 id 3223; Wed, 08 Jun 88 09:39:35 EDT
Received: by BITNIC (Mailer X1.25) id 7377; Wed, 08 Jun 88 09:37:12 EDT
Date:         Wed, 8 Jun 88 15:23:00 MET
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.CC.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.CC.COLUMBIA.EDU>
From:         Johan van Wingen <MOSGLA%HLERUL2.BITNET@CUVMA.CC.COLUMBIA.EDU>
Subject:      Notation
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>

Dear list subscribers
It seems convenient to have a compact representation of the character
content of tables under discussion, enabling an immediate view on their
differences. Thus I extended the notation system I sent previously to
all characters in ISO8859-1, -2 and -9 (the last one covers Turkish and
is still to be approved).
The code used for vertical line is 4F, that for exclamation mark 5A.
Yours faithfully, Johan van Wingen




  A NOTATION SYSTEM FOR LETTERS NOT IN ASCII OR 94-EBCDIC

  The notation consists of two characters, the first being one of
  a restricted set of special characters, the second being one out of
  the common subset of ASCII and 94-EBCDIC. Thus a program can easily
  convert it into a single-byte code.
  Reference is made to the character identifications (ID) found in
  ISO6937, (these consist of two letters and two digits).

  First characters:
  (descriptions taken from ISO 6937-2, additions between parentheses,
   numbering system for identification taken from ISO6937-1, p. 7)

  /  acute accent                           11,12
  \  grave accent                           13,14
  ^  circumflex accent                      15,16
  %  diaeresis (umlaut, trema)              17,18
  ~  tilde                                  19,20
  *  caron (hachek)                         21,22
  #  breve (Rumanian a)                     23,24
  #  double acute accent (Hungarian o,u)    25,26
  @  ring (above: a,u)                      27,28
  @  dot (above: z)                         29,30
  =  macron (upper line)                    31,32
  $  cedilla (c,s,t)                        41,42
  $  ogonek (Polish a,e)                    43,44
  $  (barred: o, eth, thorn)                61....
  _  (underline, fraction)
  &  (ligature: ae,oe,sz)                   51,52
  ?  (dot below)

   REGULAR LETTERS AND DECIMAL DIGITS

 not.   ID       Name or description

     a LA01   small a
     A LA02   capital A
     :  :      :
     z LZ01   small z
     Z LZ02   capital Z
     1 ND01   digit one
     :  :      :
     9 ND09   digit nine
     0 ND10   digit zero

   VOWELS

 not.   ID       Name or description

  /a   LA11   small a with acute accent
  \a   LA13   small a with grave accent
  ^a   LA15   small a with circumflex accent
  %a   LA17   small a with diaeresis or umlaut mark
  ~a   LA19   small a with tilde
  #a   LA23   small a with breve
  @a   LA27   small a with ring
  =a   LA31   small a with macron
  $a   LA43   small a with ogonek
  &a   LA51   small ae diphthong
  /e   LE11   small e with acute accent
  \e   LE13   small e with grave accent
  ^e   LE15   small e with circumflex accent
  %e   LE17   small e with diaeresis or umlaut mark
  *e   LE21   small e with caron
  @e   LE29   small e with dot above
  =e   LE31   small e with macron
  $e   LE43   small e with ogonek
  /i   LI11   small i with acute accent
  \i   LI13   small i with grave accent
  ^i   LI15   small i with circumflex accent
  %i   LI17   small i with diaeresis
  ~i   LI19   small i with tilde
  =i   LI31   small i with macron
  $i   LI43   small i with ogonek
  &i   LI51   small ij ligature
  @i   LI61   small i without dot
  /o   LO11   small o with acute accent
  \o   LO13   small o with grave accent
  ^o   LO15   small o with circumflex accent
  %o   LO17   small o with diaeresis or umlaut mark
  ~o   LO19   small o with tilde
  #o   LO25   small o with double acute accent
  =o   LO31   small o with macron
  &o   LO51   small oe ligature
  $o   LO61   small o with slash
  /u   LU11   small u with acute accent
  \u   LU13   small u with grave accent
  ^u   LU15   small u with circumflex accent
  %u   LU17   small u with diaeresis or umlaut mark
  ~u   LU19   small u with tilde
  #u   LU25   small u with double acute accent
  @u   LU27   small u with ring
  =u   LU31   small u with macron
  $u   LU43   small u with ogonek
  /y   LY11   small y with acute accent
  \y   LY13   small y with grave accent
  ^y   LY15   small y with circumflex accent
  %y   LY17   small y with diaeresis or umlaut mark

   CONSONANTS (ISO8859-1, -2 and -9 only)

 not.   ID       Name or description

  /c   LC11   small c with acute accent
  *c   LC21   small c with caron
  $c   LC41   small c with cedilla
  *d   LD21   small d with caron
  =d   LD61   small d with stroke
  $d   LD63   small eth, Icelandic
  #g   LG23   small g with breve
  /l   LL11   small l with acute accent
  *l   LL21   small l with caron
  $l   LL61   small l with stroke
  /n   LN11   small n with acute accent
  ~n   LN19   small n with tilde
  *n   LN21   small n with caron
  /r   LR11   small r with acute accent
  *r   LR21   small r with caron
  /s   LS11   small s with acute accent
  *s   LS21   small s with caron
  $s   LS41   small s with cedilla
  &s   LS61   small sharp s, German
  $p   LT63   small thorn, Icelandic
  *t   LT21   small t with caron
  $t   LT41   small t with cedilla
  /z   LZ11   small z with acute accent
  *z   LZ21   small z with caron
  @z   LZ29   small z with dot above

  Capital letters have even numbers: the small letter's (odd) number + 1.
  But notice the following:

     i LI01   small i
  @i   LI61   small i without dot
     I LI02   capital I (without dot)
  @I   LI30   capital I with dot above
  $D   LD62   capital D with stroke, Icelandic eth

   DIGITS AND NUMBERS

 not.   ID       Name or description

  @1   NS01   superscript one
  @2   NS02   superscript two
  @3   NS03   superscript three
  _2   NF01   fraction one-half
  _4   NF04   fraction one-quarter
  _3   NF05   fraction three-quarters

   SPECIAL CHARACTERS

 not.   ID       Name or description

  =f   SC01   general currency sign
  =L   SC02   pound sign
     $ SC03   dollar sign
  =c   SC04   cent sign
  =Y   SC05   yen

     ! SP02   exclamation mark
  *!   SP03   inverted exclamation mark
     " SP04   quotation mark
     ' SP05   apostrophe
     ( SP06   left parenthesis
     ) SP07   right parenthesis
     , SP08   comma
     _ SP09   low line
     - SP10   hyphen or minus sign
     . SP11   full stop, period
     / SP12   solidus
     : SP13   colon
     ; SP14   semicolon
     ? SP15   question mark
  *?   SP16   inverted question mark
  *<   SP17   angle quotation mark left
  *>   SP18   angle quotation mark right

     + SA01   plus sign
  _+   SA02   plus or minus sign
     < SA03   less-than sign
     = SA04   equals sign
     > SA05   greater-than sign
  _:   SA06   divide sign
  _*   SA07   multiply sign

     # SM01   number sign
     % SM02   percent sign
     & SM03   ampersand
     * SM04   asterisk
     @ SM05   commercial at
  *(   SM06   left square bracket
     \ SM07   reverse solidus
  *)   SM08   right square bracket
     { SM11   left curly bracket
     | SM13   vertical line
     } SM14   right curly bracket
  #m   SM17   micro sign
  @0   SM19   degree sign
  _o   SM20   ordinal indicator masculine
  _a   SM21   ordinal indicator feminine
  #S   SM24   section sign
  #p   SM25   pilcrow
  #.   SM26   middle dot
  #c   SM52   copyright sign
  #r   SM53   registered sign
  *| : SM65   broken bar
     ^ SM66   not sign

  @/   SD11   acute accent
  @\ ` SD13   grave accent
  @^   SD15   circumflex accent
  @%   SD17   diaeresis or umlaut mark
  @$ ~ SD19   tilde
  @*   SD21   caron
  @#   SD23   breve
  @"   SD25   double acute accent
  @0   SD27   ring
  @@   SD29   dot above
  @=   SD31   macron
  _)   SD41   cedilla
  _(   SD42   ogonek

  NOTE: If necessary, the following characters will be denoted as:
  SP          space
  NB          no-break space
  SH          soft hyphen
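Because each pair is a diacritic prefix plus a character from the ASCII/EBCDIC common subset, expanding the notation is mostly a table lookup. The Python below is an illustrative sketch, assuming Unicode as the target code (my choice, not part of the proposal). Unambiguous prefixes become combining marks that are then composed with NFC; the prefixes whose meaning depends on the letter (#, @, $, &) go through an explicit table, of which only a sample is transcribed. `notation_to_text`, `COMBINING` and `SPECIAL` are hypothetical names. The tokenizer is naive: in running text it would turn "and/or" into "andór", so it suits character lists rather than prose.

```python
import unicodedata

# Prefixes that mean one diacritic on any letter (from the "First characters" list).
COMBINING = {
    "/": "\u0301",   # acute accent
    "\\": "\u0300",  # grave accent
    "^": "\u0302",   # circumflex accent
    "%": "\u0308",   # diaeresis
    "~": "\u0303",   # tilde
    "*": "\u030C",   # caron
    "=": "\u0304",   # macron
}

# Pairs whose meaning depends on the letter, or that are not accented letters.
# Only a sample; the full lists above would have to be transcribed.
SPECIAL = {
    "#a": "\u0103", "#g": "\u011F",                  # breve
    "#o": "\u0151", "#u": "\u0171",                  # double acute
    "@a": "\u00E5", "@u": "\u016F",                  # ring
    "@e": "\u0117", "@z": "\u017C", "@I": "\u0130",  # dot above
    "@i": "\u0131",                                  # dotless i
    "$c": "\u00E7", "$s": "\u015F", "$t": "\u0163",  # cedilla
    "$a": "\u0105", "$e": "\u0119",                  # ogonek
    "$o": "\u00F8", "$d": "\u00F0", "$p": "\u00FE", "$l": "\u0142",  # barred, eth, thorn
    "=d": "\u0111", "&a": "\u00E6", "&s": "\u00DF",  # d stroke, ae, sharp s
    "*!": "\u00A1", "*?": "\u00BF",                  # inverted ! and ?
}


def notation_to_text(s: str) -> str:
    """Expand two-character notation pairs into single Unicode characters."""
    out, i = [], 0
    while i < len(s):
        pair = s[i:i + 2]
        if pair in SPECIAL:
            out.append(SPECIAL[pair])
            i += 2
        elif s[i] in COMBINING and i + 1 < len(s) and s[i + 1].isalpha():
            # Compose base letter + combining mark into one code point where one exists.
            out.append(unicodedata.normalize("NFC", s[i + 1] + COMBINING[s[i]]))
            i += 2
        else:
            out.append(s[i])
            i += 1
    return "".join(out)


print(notation_to_text("Dvo*r/ak, K%oln, $l/od/z"))  # Dvořák, Köln, łódź
```

Checking the explicit table before the combining-mark rule is what lets `$o`, `#o` or `&s` override the generic prefix meanings.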



    ISO8859-1                             ISO8859-2

                                        .
    2. 3. 4. 5. 6. 7. A. B. C. D. E. F. . 2. 3. 4. 5. 6. 7. A. B. C. D. E. F.
                                        .
.0      0  @  P  `  p NB @0 \A $D \a $d .     0  @  P  `  p NB @0 /R $D /r =d .
.1   !  1  A  Q  a  q *! _+ /A ~N /a ~n .  !  1  A  Q  a  q $A $a /A /N /a /n .
.2   "  2  B  R  b  r =c @2 ^A \O ^a \o .  "  2  B  R  b  r @# _( ^A *N ^a *n .
.3   #  3  C  S  c  s =L @3 ~A /O ~a /o .  #  3  C  S  c  s $L $l #A /O #a /o .
.4   $  4  D  T  d  t =f @/ %A ^O %a ^o .  $  4  D  T  d  t =f @/ %A ^O %a ^o .
.5   %  5  E  U  e  u =Y #m @A ~O @a ~o .  %  5  E  U  e  u *L *l /L #O /l #o .
.6   &  6  F  V  f  v *| #p &A %O &a %o .  &  6  F  V  f  v /S /s /C %O /c %o .
.7   '  7  G  W  g  w #S #. $C _* $c _: .  '  7  G  W  g  w #S @* $C _* $c _: .
.8   (  8  H  X  h  x @% _) \E $O \e $o .  (  8  H  X  h  x @% _) *C *R *c *r .
.9   )  9  I  Y  i  y #c @1 /E \U /e \u .  )  9  I  Y  i  y *S *s /E @U /e @u .
.A   *  :  J  Z  j  z _a _o ^E /U ^e /u .  *  :  J  Z  j  z $S $s $E /U $e /u .
.B   +  ;  K *(  k  { *< *> %E ^U %e ^u .  +  ;  K *(  k  { *T *t %E #U %e #u .
.C   ,  <  L  \  l  |  ^ _4 \I %U \i %u .  ,  <  L  \  l  | /Z /z *E %U *e %u .
.D   -  =  M *)  m  } SH _2 /I /Y /i /y .  -  =  M *)  m  } SH @" /I /Y /i /y .
.E   .  >  N @^  n  ~ #r _3 ^I $P ^i $p .  .  >  N @^  n  ~ *Z *z ^I $T ^i $t .
.F   /  ?  O  _  o  _ @= *? %I /s %i %y .  /  ?  O  _  o  _ @Z @z *D /s *d @@ .


               CP037                    .            CP500
                                        .
                                        .
    4. 5. 6. 7. 8. 9. A. B. C. D. E. F. . 4. 5. 6. 7. 8. 9. A. B. C. D. E. F.
                                        .
.0      &  - $o $O @0 #m  ^  {  }  \  0 .     &  - $o $O @0 #m =c  {  }  \  0
.1  NB /e  / /E  a  j  ~ =L  A  J _:  1 . NB /e  / /E  a  j  ~ =L  A  J _:  1
.2  ^a ^e ^A ^E  b  k  s =Y  B  K  S  2 . ^a ^e ^A ^E  b  k  s =Y  B  K  S  2
.3  %a %e %A %E  c  l  t #.  C  L  T  3 . %a %e %A %E  c  l  t #.  C  L  T  3
.4  \a \e \A \E  d  m  u #c  D  M  U  4 . \a \e \A \E  d  m  u #c  D  M  U  4
.5  /a /i /A /I  e  n  v #S  E  N  V  5 . /a /i /A /I  e  n  v #S  E  N  V  5
.6  ~a ^i ~A ^I  f  o  w #p  F  O  W  6 . ~a ^i ~A ^I  f  o  w #p  F  O  W  6
.7  @a %i @A %I  g  p  x _4  G  P  X  7 . @a %i @A %I  g  p  x _4  G  P  X  7
.8  $c \i $C \I  h  q  y _2  H  Q  Y  8 . $c \i $C \I  h  q  y _2  H  Q  Y  8
.9  ~n &s ~N  `  i  r  z _3  I  R  Z  9 . ~n &s ~N  `  i  r  z _3  I  R  Z  9
.A  =c  ! *|  : *< _a *! *( SH @1 @2 @3 . *( *) *|  : *< _a *!  ^ SH @1 @2 @3
.B   .  $  ,  # *> _o *? *) ^o ^u ^O ^U .  .  $  ,  # *> _o *?  | ^o ^u ^O ^U
.C   <  *  %  @ $d &a $D @= %o %u %O %U .  <  *  %  @ $d &a $D @= %o %u %O %U
.D   (  )  _  ' /y _) /Y @% \o \u \O \U .  (  )  _  ' /y _) /Y @% \o \u \O \U
.E   +  ;  >  = $p &A $P @/ /o /u /O /U .  +  ;  >  = $p &A $P @/ /o /u /O /U
.F   |  ^  ?  " _+ =f #r _* ~o %y ~O    .  ! @^  ?  " _+ =f #r _* ~o %y ~O
                                        .
                                        .
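Hand-made tables like these are easy to get wrong, as the corrections posted later in this thread show, so it may help to print a column of a code page from an independent mapping. This is a minimal sketch using Python's `cp037`, `cp500`, `latin_1` and `iso8859_2` codecs, which are assumed to match the IBM and ISO tables; `column` and `describe` are hypothetical helpers.

```python
import unicodedata


def column(codec: str, high: int) -> list:
    """Characters at positions high*16 + 0 .. high*16 + 15 of a single-byte code."""
    chars = []
    for low in range(16):
        try:
            chars.append(bytes([high * 16 + low]).decode(codec))
        except UnicodeDecodeError:
            chars.append(None)            # unassigned in this code
    return chars


def describe(ch) -> str:
    """Unicode character name, or a short marker for controls/unassigned positions."""
    if ch is None:
        return "(unassigned)"
    return unicodedata.name(ch, f"<control {ord(ch):04X}>")


for code, high in (("cp037", 0xB), ("cp500", 0xB)):
    print(code)
    for low, ch in enumerate(column(code, high)):
        print(f"  {high:X}{low:X}  {describe(ch)}")
```

Printing Unicode names instead of glyphs avoids depending on how a terminal renders the bytes, which is exactly the uncertainty discussed in the "EBCDIC on screen" message.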

 FROM  J. W. van Wingen    MOSGLA@HLERUL2   :
     Mail to                                :
 P. O. Box 486,  2300AL Leiden, Netherlands :

 8-Jun-88 13:59:09-EDT,1972;000000000001
Return-Path: <@CUVMA.CC.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.CC.COLUMBIA.EDU by CU20B.CC.COLUMBIA.EDU with TCP; Wed 8 Jun 88 13:59:06-EDT
Received: from CUVMA.CC.COLUMBIA.EDU(MAILER) by CUVMA.CC.COLUMBIA.EDU(SMTP) ; Wed, 08 Jun 88 14:00:03 EDT
Received: from BITNIC.BITNET by CUVMA.CC.COLUMBIA.EDU (Mailer X1.25) with BSMTP
 id 3836; Wed, 08 Jun 88 13:59:58 EDT
Received: by BITNIC (Mailer X1.25) id 8680; Wed, 08 Jun 88 13:59:49 EDT
Date:         Wed, 8 Jun 88 13:37:58 EDT
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.CC.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.CC.COLUMBIA.EDU>
From:         Edwin Hart <HART%APLVM.BITNET@CUVMA.CC.COLUMBIA.EDU>
Subject:      ISO 8859-1, -2, -3, . . . -9 Standards and High Level Languages
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>

I am trying to understand two areas:

1.  I have copies of ISO 8859-1 through -4.  What languages are covered by
    -5 through -9?

2.  Did ISO put any restrictions on High Level (Programming) Languages with
    respect to the 8859 series of codes?  In particular I can think of two
    possibilities:

    a.  Only characters in common to all ISO 8859 codes are valid in
        high level languages for operators, etc.  From my look at 8859-1
        through -4, this means code points X'20' through X'7F' plus the
        multiplication (small x) and division symbols.

    b.  Only characters in ISO 8859-1 are valid for high level languages.
        The IBM NOT symbol "^" (X'5F' for CP 37 and X'BA' for CP 500) is
        included in 8859-1 but not in -2, -3, -4.

    Using 2.b. implies that 8859-1 must be common to terminals in use in
    several countries where the primary code is -2 through -9.  Using 2.a.
    means that the NOT symbol will be unavailable for programming languages.
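Question 2.a can be answered mechanically by comparing the code tables position by position. The sketch below assumes that Python's `iso8859_1` to `iso8859_4` codecs reflect the published standards; `common_positions` and `encodable` are hypothetical helpers. It lists every position from X'20' up that holds the same character in all four parts and checks where the NOT sign exists. Running it also reports shared positions above X'A0' besides the multiplication and division signs, for example the no-break space at X'A0'.

```python
def common_positions(codecs=("iso8859_1", "iso8859_2", "iso8859_3", "iso8859_4")):
    """Byte values X'20'-X'FF' that decode to the same character in every codec."""
    common = []
    for b in range(0x20, 0x100):
        chars = set()
        for name in codecs:
            try:
                chars.add(bytes([b]).decode(name))
            except UnicodeDecodeError:   # unassigned position (8859-3 has several)
                chars.add(None)
        if len(chars) == 1 and None not in chars:
            common.append(b)
    return common


def encodable(ch: str, codec: str) -> bool:
    """True if `ch` has a code point in the given character set."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


upper = [f"{b:02X}" for b in common_positions() if b >= 0xA0]
print("common at or above X'A0':", " ".join(upper))
print({c: encodable("\u00ac", c)
       for c in ("iso8859_1", "iso8859_2", "iso8859_3", "iso8859_4")})
```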

Thank you for your comments,
Ed Hart
 8-Jun-88 16:15:54-EDT,2586;000000000001
Return-Path: <@CUVMA.CC.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.CC.COLUMBIA.EDU by CU20B.CC.COLUMBIA.EDU with TCP; Wed 8 Jun 88 16:15:49-EDT
Received: from CUVMA.CC.COLUMBIA.EDU(MAILER) by CUVMA.CC.COLUMBIA.EDU(SMTP) ; Wed, 08 Jun 88 16:16:41 EDT
Received: from BITNIC.BITNET by CUVMA.CC.COLUMBIA.EDU (Mailer X1.25) with BSMTP
 id 4203; Wed, 08 Jun 88 16:16:40 EDT
Received: by BITNIC (Mailer X1.25) id 2403; Wed, 08 Jun 88 16:15:44 EDT
Date:         Wed, 8 Jun 88 15:42:44 EST
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.CC.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.CC.COLUMBIA.EDU>
From:         John C Klensin <KLENSIN@INFOODS.MIT.EDU>
Subject:      RE:       ISO 8859-1, -2, -3,
              . . . -9 Standards and High Level Languages
X-To:         ASCII/EBCDIC character set related issues
              <ISO8859%JHUVM.BITNET@MITVMA.MIT.EDU>
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>

  There has been some discussion about "requiring" the programming
languages to make a common statement about codes, but nothing definitive
has happened.  The strongest statements have been about support for
multi-octet character sets, which has nothing to do with ISO8859.
  There was a strong suggestion several years ago that the languages
avoid the use of any character not in the ISO646 Basic Table (i.e.,
seven-bit graphics with all national use positions excluded), but it
mostly resulted in a survey of deviants (almost everyone) and little
action.   The one noticeable consequence of that effort may well be the
exclamation mark/vertical bar confusion about which character to use for
"or", and the similar tilde/diaeresis problem with "not".
  The different programming language standards differ in how they
approach character sets.  A few say what amounts to "you will use
ASCII".  At the other extreme, at least one says "use whatever external
form you like, as long as there is an abstraction that maps it into...".
Some of those approaches are more easily consistent with ISO8859
variations than others.  There has been, as far as I know, no serious
proposal for a common ISO8859 subset for programming languages.  The
common subset is an ISO646 subset.
  Finally, it is very difficult for the working groups in one
subcommittee (e.g., character sets and codes) to require the working
groups in another subcommittee (e.g., programming languages) to do (or
not do) anything.   That is just not how the process works.
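The ISO646 Basic Table restriction is easy to state as a check on program text. This is a minimal sketch, assuming the twelve ISO 646 national-use positions 2/3, 2/4, 4/0, 5/11 to 5/14, 6/0 and 7/11 to 7/14 (shown with their ASCII graphics); `NATIONAL_USE`, `INVARIANT` and `non_invariant_chars` are hypothetical names. The exclamation mark (2/1) is invariant while the vertical line (7/12) is not, which is one root of the "or" confusion mentioned above.

```python
# ISO 646 positions reserved for national use, shown with their ASCII graphics.
NATIONAL_USE = set("#$@[\\]^`{|}~")

# Graphic characters of the ISO 646 basic (invariant) table: 94 minus 12.
INVARIANT = {chr(c) for c in range(0x21, 0x7F)} - NATIONAL_USE


def non_invariant_chars(source: str) -> set:
    """Return the characters of `source` outside the ISO 646 basic table."""
    return {ch for ch in source if ch not in INVARIANT and ch not in " \t\n"}


print(sorted(non_invariant_chars("IF A | B THEN X = ~Y;")))  # ['|', '~']
```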

12-Jun-88 08:24:22-EDT,1957;000000000001
Return-Path: <@CUVMA.CC.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMA.CC.COLUMBIA.EDU by CU20B.CC.COLUMBIA.EDU with TCP; Sun 12 Jun 88 08:24:20-EDT
Received: from CUVMA.CC.COLUMBIA.EDU(MAILER) by CUVMA.CC.COLUMBIA.EDU(SMTP) ; Sun, 12 Jun 88 08:23:36 EDT
Received: from BITNIC.BITNET by CUVMA.CC.COLUMBIA.EDU (Mailer X1.25) with BSMTP
 id 6094; Fri, 10 Jun 88 12:58:25 EDT
Received: by BITNIC (Mailer X1.25) id 5835; Fri, 10 Jun 88 12:58:15 EDT
Date:         Fri, 10 Jun 88 16:50:00 MET
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.CC.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMA.CC.COLUMBIA.EDU>
From:         Johan van Wingen <MOSGLA%HLERUL2.BITNET@CUVMA.CC.COLUMBIA.EDU>
Subject:      a few notes
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>

Dear list subscribers
A few little points this time.
1.  People who cannot get CP037 and CP500 elsewhere may consult a new
    IBM publication (Jan. 1988) which just arrived here.
    It is SC33-0554-00 GDDM Type faces and Shading Patterns.
2.  Please correct an error in the CP037 table recently mailed by me.
    At B0 "^" should be "@^".
3.  Mr. Hart is quite right in concluding that the NOT sign only occurs
    in ISO8859-1. Apparently ISO/TC97/SC2 does not expect people in
    Poland or Hungary to use PL/I, which is contrary to the facts.
4.  As most programming language standards are much older than ISO8859
    (1987), it cannot be expected that these take it into account.
    There is much more to say about the relation, but that must come at
    a later moment.
5.  A list of all the relevant standards was mailed 24 March, and is
    contained in LOG8803.
Yours faithfully, Johan van Wingen


 FROM  J. W. van Wingen    MOSGLA@HLERUL2   :
     Mail to                                :
 P. O. Box 486,  2300AL Leiden, Netherlands :

15-Jul-88 11:40:15-EDT,3236;000000000001
Return-Path: <@CUVMB.CC.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMB.CC.COLUMBIA.EDU by CU20B.CC.COLUMBIA.EDU with TCP; Fri 15 Jul 88 11:40:12-EDT
Received: from CUVMB.CC.COLUMBIA.EDU(MAILER) by CUVMB.CC.COLUMBIA.EDU(SMTP) ; Fri, 15 Jul 88 11:36:26 EDT
Received: from BITNIC.BITNET by CUVMB.CC.COLUMBIA.EDU (Mailer X1.25) with BSMTP
 id 4961; Fri, 15 Jul 88 11:36:22 EDT
Received: by BITNIC (Mailer X1.25) id 7043; Fri, 15 Jul 88 11:36:33 EDT
Date:         Fri, 15 Jul 88 17:12:00 MET
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMB.CC.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMB.CC.COLUMBIA.EDU>
From:         Johan van Wingen <MOSGLA%HLERUL2.BITNET@CUVMB.CC.COLUMBIA.EDU>
Subject:      Code switching
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>

Dear list subscribers

My congratulations to Mr. Kesich for his excellent analysis. I sent
a copy of it and of Mr Hart's mailing to the SEAS Secretary
(DECK@RCSCK11). I hope neither of you will mind.
IBM's attitude towards ISO standards is strange, since one can find IBM
people in all the key positions of the ISO
committees. Chairman of ISO/TC97, now ISO/IEC JTC1, is J. Rankine, from
IBM. Convener of SC2/WG2 (multibyte characters) is J. Andersen, IBM. In
SC2/WG3, (7-8 bit codes) you find Mr. W. F. Bohn, IBM. And this is a
far from complete list.

As for the problem of switching to another code page, I looked at CP870,
which should be the equivalent of ISO 8859-2 (Eastern Europe).
I was surprised that it is not identical with the code page
you get when you convert ISO 8859-2 with the same translate table as for
producing CP037 from ISO 8859-1. This has curious consequences.
Suppose you use ISO 2022 for table switching in an 8859 file.
Then if you have
<start with 8859-1> \a <shift to 8859-2> /r <end>
the codes for \a and /r are identical, but they produce a different
graphic as a result of the shift, ("a" with grave accent is denoted by
\a, and "r" with acute accent with /r). Now, if you translate this file
to EBCDIC with the customary translate table, all codes are converted
accordingly, equal codes remaining equal. But this does not produce a /r
any more, when the shift is translated to cause switching from CP037 to
CP870. This implies that an extra function is required during
translation: the translate table must also be switched whenever a shift
code is found. This puts an additional burden on our poor hardware and software.
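The effect described above can be reproduced with a single 256-byte translate table. In the sketch below, built on Python's `latin_1`, `iso8859_2` and `cp037` codecs (assumed to match the standards), the byte X'E0' means a-grave before the shift and r-acute after it, yet a fixed ISO 8859-1 to CP037 table turns both into the same EBCDIC code. The second part shows the alternative: interpret each segment in its own set before translating. The `segments` list is a hypothetical stand-in for a real ISO 2022 parser, and no CP870 table is shown because Python's standard codecs do not, to my knowledge, include one.

```python
# ISO 8859-1 -> CP037 as a 256-byte translate table, built from Python's codecs.
LATIN1_TO_CP037 = bytes(range(256)).decode("latin_1").encode("cp037")

# The case above: the same byte X'E0' before and after a shift to ISO 8859-2.
segments = [("latin_1", b"\xe0"), ("iso8859_2", b"\xe0")]

# 1. One fixed table, shift ignored: both bytes become the same EBCDIC code.
naive = b"".join(seg.translate(LATIN1_TO_CP037) for _, seg in segments)
print(naive.decode("cp037"))   # àà  (the r-acute has become a second a-grave)

# 2. Honour the shift: interpret each segment in its own character set first.
#    The 8859-2 part would then need its own EBCDIC table (CP870 above).
chars = "".join(seg.decode(cs) for cs, seg in segments)
print(chars)                   # àŕ
```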

A note for people who complained that ISO does not keep to its own rules
when introducing 96-character sets. It appears that there is a Third
edition of ISO 2022 (1986-05-01) which differs from the Second
(1982-12-15) edition in this respect. I became aware of this only very
recently.

A few corrections should be made in the tables I sent. In the one
headed ISO8859-1, position 7F should be blank, and DF should contain &s,
not /s. The table for ISO8859-2 contains the same errors.

Yours faithfully, Johan van Wingen


 FROM  J. W. van Wingen    MOSGLA@HLERUL2   :
     Mail to                                :
 P. O. Box 486,  2300AL Leiden, Netherlands :

15-Jul-88 13:37:31-EDT,3313;000000000001
Return-Path: <@CUVMB.CC.COLUMBIA.EDU:ISO8859@JHUVM.BITNET>
Received: from CUVMB.CC.COLUMBIA.EDU by CU20B.CC.COLUMBIA.EDU with TCP; Fri 15 Jul 88 13:37:29-EDT
Received: from CUVMB.CC.COLUMBIA.EDU(MAILER) by CUVMB.CC.COLUMBIA.EDU(SMTP) ; Fri, 15 Jul 88 13:33:45 EDT
Received: from BITNIC.BITNET by CUVMB.CC.COLUMBIA.EDU (Mailer X1.25) with BSMTP
 id 5256; Fri, 15 Jul 88 13:33:40 EDT
Received: by BITNIC (Mailer X1.25) id 8100; Fri, 15 Jul 88 13:33:23 EDT
Date:         Fri, 15 Jul 88 10:57:21 CDT
Reply-To:     ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMB.CC.COLUMBIA.EDU>
Sender:       ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@CUVMB.CC.COLUMBIA.EDU>
From:         Michael Sperberg-McQueen <U18189%UICVM.BITNET@CUVMB.CC.COLUMBIA.EDU>
Subject:      IBM and standards (PS/2 code page, ISO8859-2 translation)
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>

Footnote to John Kesich's remarks about IBM and standards:  it's not the
programs I care about (any program that hard codes non-standard
character set extensions is asking for everything it gets), but the
users' data.  You don't really think PC users should have to convert all
of their data with non-ASCII characters when they move from PCs to PSs,
do you?  You don't really think they would, even if they were supposed
to, do you?  I don't want the phone calls I would get if IBM had moved
the umlauts to their ISO 8859-1 positions.  And you don't either.

Yes, it would have been nice for the PC to have had a rational extension
of ASCII instead of the mess it actually has.  Yes, it would be nice if
the PS/2 could switch code pages in flight.  Yes, it would have been
nice for IBM to adhere fully to ISO 8859-1.  But given the original PC
character set, given the hardware the PS/2 actually has, and given IBM's
unwillingness to make users eat data conversion costs even for their own
good, the PS/2 code page does look (to me) like a step forward.  Let's
rejoice:  it's not often we see even small steps moving in the right
direction.

Footnote to J. W. van Wingen's remark about ISO 8859-2 translation:
doesn't IBM's decision to avoid data conversion problems by retaining
national versions of the extended EBCDIC code pages imply, already and
by itself, the impossibility of using the same translate table for the
various parts of ISO8859?  I agree it's a shame; it is a rather large
step in the wrong direction.  But is it a surprise?  At least part of
the mapping must be determined by the existing EBCDICs for Greece,
Israel, and so on.  What firm wants to impose immense data conversion
costs on whole countries of users, if they can avoid them by fouling
up the EBCDIC/ASCII translation problem a little bit more?  (I don't
mean just IBM -- I've seen ASCII machines screw up the translation too.)



It's depressing to think how long lists like this one are going to be
necessary.  Our poor hardware and software are going to continue to be
strained.  (By the way, thanks to JWvW for acknowledging that ISO 2022
had changed recently on the 94/96 character issue.  I thought I was
going crazy.)

All the above pessimism is my own and not the official policy of my
employer.

Michael Sperberg-McQueen, University of Illinois at Chicago
31-Aug-88  2:52:45-GMT,3038;000000000001
Received: from CU20B.CC.COLUMBIA.EDU by cunixc.cc.columbia.edu (5.54/5.10) id AA02759; Tue, 30 Aug 88 22:52:39 EDT
Received: from CUVMB.CC.COLUMBIA.EDU by CU20B.CC.COLUMBIA.EDU with TCP; Tue 30 Aug 88 22:51:54-EDT
Received: from CUVMB.CC.COLUMBIA.EDU(MAILER) by CUVMB.CC.COLUMBIA.EDU(SMTP) ; Tue, 30 Aug 88 22:51:17 EDT
Received: from BITNIC.BITNET by CUVMB.CC.COLUMBIA.EDU (Mailer X1.25) with BSMTP
 id 7711; Tue, 30 Aug 88 22:51:16 EDT
Received: by BITNIC (Mailer X1.25) id 7025; Tue, 30 Aug 88 22:53:09 EDT
Date:         Tue, 30 Aug 88 19:49:00 CDT
Reply-To: ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@cuvmb.cc.columbia.edu>
Sender: ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@cuvmb.cc.columbia.edu>
From: Richard <TILLEY%UOFMCC.BITNET@cuvmb.cc.columbia.edu>
Subject:      Re: SHARE White Paper
To: Frank da Cruz <SY.FDC@cu20b.cc.columbia.edu>

>>IBM was following an international standard to our benefit.
If this were true no mapping would be needed.
Do you receive money from these people?

>>If you do not like the character set, your complaint is with ISO--not IBM
ISO8859-1 fills the need for a multi-lingual Latin character
set very well. Many people at many installations including this one
will be happy with it.  Perhaps most people in some W. European
countries will be happy with it. I have no complaints with ISO.
However, I get the impression that some members of this list believe
this character set is suitable for *general* use in North America.
Have I misunderstood?

>>IBM finally gave us the potential to have a 1-to-1 mapping between
>>CECPs 37 and 500 and an 8-bit "ASCII" ...
It would be an excellent achievement if this effort produced an
8 bit ASCII-to-EBCDIC conversion that gained as widespread use as
the most popular of the current 7 bit tables.
Code page 500 IS *NOT* just as much EBCDIC as code page 37 is,
because the former is unlikely to gain widespread use.
The latter has some chance, although there will be a game of
musical brackets for the next decade. So what's a thousand wasted
man-years?

>>Special characters, like the ones needed for publishing,
>>will require code page switching.
Agreed. What I would like to minimize is the number of translate
tables visible to the user: preferably just one for most users.

>>If you look at Code Page 500 and the "Standard" ASCII-to-EBCDIC conversions
>>you will discover that code page 500 maps very well into 7-bit ASCII.
Does your installation use "Standard" ASCII-to-EBCDIC conversions
to attach ASCII terminals to your EBCDIC mainframe?
I can think of 3 printable things to say about this table:
 - Some people use it.
 - Most people do not use it.
 - It has produced F.U.D.

Having 2 Code Pages for ISO8859-1 has produced F.U.D.
Moving Brackets from where IBM once recommended has produced F.U.D.
Moving Braces from where IBM once recommended has produced 15 years of F.U.D.
With no obvious explanation for these changes, one begins to
   suspect that their only purpose is the production of F.U.D.


31-Aug-88  3:59:43-GMT,1785;000000000001
Received: from CU20B.CC.COLUMBIA.EDU by cunixc.cc.columbia.edu (5.54/5.10) id AA06584; Tue, 30 Aug 88 23:59:40 EDT
Received: from CUVMB.CC.COLUMBIA.EDU by CU20B.CC.COLUMBIA.EDU with TCP; Tue 30 Aug 88 23:58:56-EDT
Received: from CUVMB.CC.COLUMBIA.EDU(MAILER) by CUVMB.CC.COLUMBIA.EDU(SMTP) ; Tue, 30 Aug 88 23:58:20 EDT
Received: from BITNIC.BITNET by CUVMB.CC.COLUMBIA.EDU (Mailer X1.25) with BSMTP
 id 7745; Tue, 30 Aug 88 23:58:19 EDT
Received: by BITNIC (Mailer X1.25) id 7513; Wed, 31 Aug 88 00:00:15 EDT
Date:         Tue, 30 Aug 88 23:38:03 EST
Reply-To: ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@cuvmb.cc.columbia.edu>
Sender: ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@cuvmb.cc.columbia.edu>
From: John C Klensin <KLENSIN@infoods.mit.edu>
Subject:      RE:       Re: SHARE White Paper
X-To:         ASCII/EBCDIC character set related issues
              <ISO8859%JHUVM.BITNET@MITVMA.MIT.EDU>
To: Frank da Cruz <SY.FDC@cu20b.cc.columbia.edu>

For English and French-speaking North American purposes, since ISO8859-1
is a superset of ASCII (with all of the leading-zero-bit characters in
the same positions as the ASCII seven-bit characters), and contains, as
far as I know, an adequate set of non-ASCII characters (diacritical
markings, etc) to represent French, there appears to be no reason why it
should not be adopted for most general use when:
 - an eight bit set can be handled and processed   and
 - an ANSI/ISO character set is to be used, rather than, e.g., EBCDIC.
So, yes, it has been assumed that, in these contexts, 8859-1 is suitable
for general USA and Canadian use.  I lack the experience to know whether
it would be adequate/suitable for general Mexican use, so can't make a
general statement about North America.


31-Aug-88 17:00:15-GMT,5256;000000000001
Received: from CU20B.CC.COLUMBIA.EDU by cunixc.cc.columbia.edu (5.54/5.10) id AA02848; Wed, 31 Aug 88 12:59:57 EDT
Received: from CUVMB.CC.COLUMBIA.EDU by CU20B.CC.COLUMBIA.EDU with TCP; Wed 31 Aug 88 12:59:11-EDT
Received: from CUVMB.CC.COLUMBIA.EDU(MAILER) by CUVMB.CC.COLUMBIA.EDU(SMTP) ; Wed, 31 Aug 88 12:58:34 EDT
Received: from BITNIC.BITNET by CUVMB.CC.COLUMBIA.EDU (Mailer X1.25) with BSMTP
 id 8301; Wed, 31 Aug 88 12:58:32 EDT
Received: by BITNIC (Mailer X1.25) id 4246; Wed, 31 Aug 88 12:59:51 EDT
Date:         Wed, 31 Aug 88 09:40:27 EDT
Reply-To: ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@cuvmb.cc.columbia.edu>
Sender: ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@cuvmb.cc.columbia.edu>
From: Edwin Hart <HART%APLVM.BITNET@cuvmb.cc.columbia.edu>
Subject:      Re: SHARE White Paper
To: Frank da Cruz <SY.FDC@cu20b.cc.columbia.edu>
In-Reply-To:  Your message of Tue, 30 Aug 88 19:49:00 CDT

I am NOT getting paid by IBM to work in this area.  They could not afford it.
In fact, my manager and wife and children
have many ideas about how I could better use the time I am
spending on this effort.  90% of my week at SHARE was devoted to these
issues rather than attending other sessions which would have benefited my
installation more (in the short term).  I am working in this area because I too
am angry and frustrated.  We have tried to fix this problem for the last
15 years and have been unsuccessful.

Do I like IBM's failure to make a decision on one EBCDIC for Latin Alphabet
Number 1?  NO!|  (If you are using CP 500 you would see "|!".)
In fact, IBM has taken an internal IBM argument and made it into
an international argument by encouraging code page 500 adoption in Belgium
and Switzerland.  My words for that are unprintable.  Such actions are
irresponsible and show no regard for customers.

Code page 500 is being widely used here and abroad--especially in Belgium and
Switzerland.  One of the members of the SHARE committee was using it
in the U.S. on 3274s just to avoid the ASCII-EBCDIC translate issues.  Messages
from Belgian installations on EARN have been coded in code page 500.
A recent message to me stated that code page 37 was not up for consideration
and that IBM documentation (and he quoted it) said to use code page 500 for
international operations.  I believe he was from Germany.  Just weeks ago,
I talked to an IBM contact in the Corporate standards area.  He said that IBM
as a Corporation (meaning all of the IBM development Divisions)
has NOT decided on one EBCDIC code page for ISO Latin Alphabet Number 1.

Concerning changing the code points for brackets from the TN/T11 print train,
if we have a requirement to move them back, we can certainly tell IBM that.
(We have, in fact, drafted such a requirement.)

Part of the problem is that IBM is too big and that too many people within
IBM have no idea of the problems.  The other part is that to correct the
problems requires changes to too many IBM products, and requires customers
to convert a lot of data and programs.  IBM understands big customers.  When US
multinational banks, insurance companies, manufacturers, oil companies start
telling IBM to fix the problems, IBM might put MORE resources into the effort
than they have now.  The intent of the SHARE paper is to have it approved as
a SHAREwide position paper.  In other words, to get approval from the 30 SHARE
managers who come from not only the Universities but also Commercial accounts.
As a part time effort, it will be 9 to 12 months before we can obtain SHAREwide
status.

If you want to contribute to the effort you are welcome.  But if you are angry,
take it out on IBM--not me.  I need some help with the discussions at the SHARE
meetings.  I need feedback on the content of the paper.  Is the paper clear?
Does it read smoothly?  Does it really describe all of the problems?  Where are
the mistakes in the paper?  Solutions to the problems are going to cost IBM
millions (billions?) of dollars; what business justification do we have to
convince IBM to spend that kind of money?  If you were faced with solving all
of the problems, it would be easier to stick your head in the sand than to try
to find a solution.  It will take an IBM Corporate commitment to solve the
problems.  One IBM Division cannot do it alone.  Right now, the IBM Corporation
has a tremendous dollar commitment to Systems Application Architecture.  SAA
is the right target for this effort.  We have a window of opportunity.  If the
paper could have been handed over to IBM as a SHAREwide position paper in
August, we could have had an earlier and harder impact.

We have IBM people who work with the Committee.  They are committed to working
for solutions.  Material from SHARE European Association (SEAS) and the
SHARE white paper effort have already been used to influence IBM decision
makers to commit resources to solve the problems.  If you come to the SHARE
meeting, use some restraint when talking to the IBM people who are listening
and trying to help us.  They are people just like you and me.  They will get
very defensive if you start screaming at them.  We have a good working
relationship and I do not want to ruin it this far into the effort.

Ed Hart

 2-Sep-88 12:54:15-GMT,2369;000000000001
Received: from CU20B.CC.COLUMBIA.EDU by cunixc.cc.columbia.edu (5.54/5.10) id AA28559; Fri, 2 Sep 88 08:54:12 EDT
Received: from CUVMB.CC.COLUMBIA.EDU by CU20B.CC.COLUMBIA.EDU with TCP; Fri 2 Sep 88 08:53:26-EDT
Received: from CUVMB.CC.COLUMBIA.EDU(MAILER) by CUVMB.CC.COLUMBIA.EDU(SMTP) ; Fri, 02 Sep 88 08:47:33 EDT
Received: from BITNIC.BITNET by CUVMB.CC.COLUMBIA.EDU (Mailer X1.25) with BSMTP
 id 0170; Fri, 02 Sep 88 08:47:31 EDT
Received: by BITNIC (Mailer X1.25) id 7006; Fri, 02 Sep 88 08:45:44 EDT
Date:         Thu, 1 Sep 88 16:49:00 MET
Reply-To: ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@cuvmb.cc.columbia.edu>
Sender: ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@cuvmb.cc.columbia.edu>
From: Johan van Wingen <MOSGLA%HLERUL2.BITNET@cuvmb.cc.columbia.edu>
Subject:      SC2 meeting
To: Frank da Cruz <SY.FDC@cu20b.cc.columbia.edu>


Dear list subscribers

The paper by Mr. Hart is a very valuable contribution to the character
code discussion with IBM. But we should not forget that also ISO codes
have their imperfections. ISO standards are not conceived in an ivory
tower, and development can be influenced. The responsible subcommittee
ISO/IEC JTC1/SC2 will meet in the week of 17 October in London.
Attendance is restricted to national delegates. You can be one if you
are appointed by your national standards institute (you mostly have to
provide the money for the trip yourself). If there is a national SC2
for your country, it will nominate you. But they are often in want of
people who know the matter and are also prepared to do some work. So,
contact your NSI, ask for the names of the people charged with character
codes and tell them your interests. If you are successful, we'll see each
other in London; if not, explain your ideas to me and perhaps I can do
something with it. (For US citizens, ANSI rules are different; you have
to be backed by some organization.) Should you not know where your NSI
is, contact me.
I have just completed the first draft of a paper for SC22 and SC2, on
Coded Character Sets and Programming Languages. Its 800 lines will be
sent to Mr. Hart, to be ordered on request from JHUVM, for your comment.
Yours faithfully, Johan van Wingen

 FROM  J. W. van Wingen    MOSGLA@HLERUL2   :
     Mail to                                :
 P. O. Box 486,  2300AL Leiden, Netherlands :


 9-Sep-88 14:19:35-GMT,3809;000000000001
Received: from CU20B.CC.COLUMBIA.EDU by cunixc.cc.columbia.edu (5.54/5.10) id AA26215; Fri, 9 Sep 88 10:19:19 EDT
Received: from CUVMB.CC.COLUMBIA.EDU by CU20B.CC.COLUMBIA.EDU with TCP; Fri 9 Sep 88 10:28:00-EDT
Received: from CUVMB.CC.COLUMBIA.EDU(MAILER) by CUVMB.CC.COLUMBIA.EDU(SMTP) ; Fri, 09 Sep 88 10:26:38 EDT
Received: from BITNIC.BITNET by CUVMB.CC.COLUMBIA.EDU (Mailer X1.25) with BSMTP
 id 5539; Fri, 09 Sep 88 10:26:37 EDT
Received: by BITNIC (Mailer X1.25) id 4428; Fri, 09 Sep 88 10:28:30 EDT
Date:         Fri, 9 Sep 88 12:12:11 +0200
Reply-To: ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@cuvmb.cc.columbia.edu>
Sender: ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@cuvmb.cc.columbia.edu>
From: Andre PIRARD <A-PIRARD%BLIULG11.BITNET@cuvmb.cc.columbia.edu>
Subject:      Re: CP 37 vs CP 500 vs ?
To: Frank da Cruz <SY.FDC@cu20b.cc.columbia.edu>
In-Reply-To:  Message of Wed, 7 Sep 88 17:18:52 EST from <KESICH@NYUCIMSA>

>What is BITNET's position on all this - have they chosen an official

BITNET apparently transfers files without conversion and ignores code issues,
being mostly EBCDIC-to-EBCDIC or restricted to understanding notes.
But the problem *had* to be solved when the data goes through an ASCII-EBCDIC
gateway. These, to the best of my knowledge, converged to translating ASCII
to "that" EBCDIC which is Edwin's proposition: CECP 037 with brackets
at AD,BD. But they also convert ASCII circumflex 5E to EBCDIC 5F which is
the not sign in 037. Much EBCDIC data is consequently stored on BITNET servers
in this code. These gateways have thus established a de-facto standard.
Pressing them to change their translation to respect that of the
graphics "circumflex" and "not sign" would certainly cause acute problems.
On the other hand, when (if?) a BITNET standard code exists, it would be
most welcome if these gateways performed 8-bit translation of this code to ISO8859.
The other side is 8-bit mostly, isn't it?
And while we are at it, why not ask them to implement a "no conversion"
feature that would be specified in the RFC header?
Will other versions of ISO8859 and their corresponding EBCDICs (one each :-) )
be defined so that they use the same translation?

To be specific, I'd like to make sure the modified CECP 037 is as follows:

- 037 brackets are moved from BA to AD and BB to BD.
- the displaced characters conversely move from AD to BA and BD to BB.
- thus we have ISO-CECP 037' conversion 5B-AD 5D-BD DD-BA A8-BB.
? will "circumflex" and "not sign" still be 5E-B0 AC-5F (1)
  or as per the gateways 5E-5F and consequently AC-B0 ? (2)

I could not find a better solution than to implement (1) for terminal mode
and (2) for file transfer. (2) for terminal mode would impair either ISO8859
or CECP037 and worse, ASCII or EBCDIC.
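The modified CECP 037 described in the list above amounts to a 256-entry ISO8859-1 to EBCDIC translate table with four entries swapped, plus a choice for the circumflex and not sign. A minimal sketch in Python, assuming the built-in "cp037" codec matches IBM's unmodified CECP 037; the function name cecp037_prime and its gateway_circumflex flag are hypothetical, chosen only for illustration.

```python
def cecp037_prime(gateway_circumflex=False):
    """Return a 256-byte ISO 8859-1 -> EBCDIC translate table for "CECP 037'".

    Assumption: Python's built-in "cp037" codec stands in for unmodified CECP 037.
    gateway_circumflex=False is option (1): circumflex 5E -> B0, not sign AC -> 5F.
    gateway_circumflex=True  is option (2): circumflex 5E -> 5F, not sign AC -> B0,
    as the ASCII/EBCDIC gateways already do.
    """
    # Plain CP 037: ISO 8859-1 byte i is the Latin-1 character chr(i).
    table = bytearray(chr(i).encode("cp037")[0] for i in range(256))
    # Move the brackets from BA/BB to the TN/T11 positions AD/BD ...
    table[0x5B] = 0xAD  # [
    table[0x5D] = 0xBD  # ]
    # ... and move the displaced characters from AD/BD to BA/BB.
    table[0xDD] = 0xBA  # capital Y with acute
    table[0xA8] = 0xBB  # diaeresis
    if gateway_circumflex:
        table[0x5E] = 0x5F  # circumflex takes the CP 037 "not sign" position
        table[0xAC] = 0xB0  # not sign takes the CP 037 circumflex position
    return bytes(table)


terminal_table = cecp037_prime()                     # option (1)
file_table = cecp037_prime(gateway_circumflex=True)  # option (2)
print(b"a[i]^2".translate(terminal_table).hex())  # 81ad89bdb0f2
print(b"a[i]^2".translate(file_table).hex())      # 81ad89bd5ff2
```

Both variants remain one-to-one over all 256 values, so data can come back unchanged through a gateway using the inverse table; they differ only in where ^ and the not sign land.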

Right?  Any comment?

>      1) code page translation can be implemented automatically and
>         transparently when data traverses certain RSCS links

IBM says so, but it is not that easy.
Translating requires both source and destination codes to be known, binary
being a special case where the table is null translation.
If a receiver cares to indicate its codepage, the only way to know the source
one is to have the sender tag the file accordingly. Many can be expected not
to care for that.
A by-site code tag to be added to the BITNET tables could be imagined,
but one still has to know if the file is binary or text and really that
installation's code.
No, I think this would add, to the problem of knowing what the source code
was, that of also knowing which translation RSCS used, and keep us checking
file codes forever. And the tagging of files can take longer to install than
using a common code, which is a better long-term solution towards everyone's
promised peace of mind.

André.

 9-Sep-88 19:44:39-GMT,1699;000000000001
Received: from CU20B.CC.COLUMBIA.EDU by cunixc.cc.columbia.edu (5.54/5.10) id AA18632; Fri, 9 Sep 88 15:44:28 EDT
Received: from CUVMB.CC.COLUMBIA.EDU by CU20B.CC.COLUMBIA.EDU with TCP; Fri 9 Sep 88 15:53:27-EDT
Received: from CUVMB.CC.COLUMBIA.EDU(MAILER) by CUVMB.CC.COLUMBIA.EDU(SMTP) ; Fri, 09 Sep 88 15:52:01 EDT
Received: from BITNIC.BITNET by CUVMB.CC.COLUMBIA.EDU (Mailer X1.25) with BSMTP
 id 6067; Fri, 09 Sep 88 15:52:00 EDT
Received: by BITNIC (Mailer X1.25) id 1166; Fri, 09 Sep 88 15:52:02 EDT
Date:         Fri, 9 Sep 88 15:14:26 EDT
Reply-To: ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@cuvmb.cc.columbia.edu>
Sender: ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@cuvmb.cc.columbia.edu>
From: Roger Fajman <RAF%NIHCU.BITNET@cuvmb.cc.columbia.edu>
Subject:      Re:  CP37 code assignments
To: Frank da Cruz <SY.FDC@cu20b.cc.columbia.edu>

Getting any given program to accept multiple code points for brackets
or whatever characters is only half the battle.  It also has to be able
to produce output that will be readable on the devices you are using.
How would you like to read a listing of a C program in which all the
brackets and braces printed as blanks?

As for BITNET, it seems to me that the network should pick a single
standard code page for EBCDIC text files and nodes should be encouraged
to translate whatever local code page they are using to that standard
as the file is transmitted.  The receiver can then translate it to
whatever they use.  I don't think that intermediate nodes should be
performing translations.  At this point, however, it seems best to see
what IBM does before making a decision about which code page to use.
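The scheme proposed above (translate once at the sender into a network standard, once at the receiver out of it, and nowhere in between) can be sketched in a few lines. This is a minimal Python sketch assuming the built-in cp037, cp500 and latin-1 codecs stand in for the real tables; REFERENCE, to_reference and from_reference are hypothetical names, and cp037 is only a placeholder since no reference code page had been chosen.

```python
# Placeholder: no reference code page has been chosen, so "cp037" is only a
# stand-in for whatever BITNET might standardize on.
REFERENCE = "cp037"


def to_reference(data, local, binary=False):
    """Sender side: translate from the local code page into the reference code."""
    if binary:
        return data  # binary files get the null translation
    return data.decode(local).encode(REFERENCE)


def from_reference(data, local, binary=False):
    """Receiver side: translate from the reference code into the local code page."""
    if binary:
        return data
    return data.decode(REFERENCE).encode(local)


source = "x[i] = y | !z;"
sent = source.encode("cp500")       # a CP 500 site sends a text file
wire = to_reference(sent, "cp500")  # translated once, at the sender
print(from_reference(wire, "cp500") == sent)               # another CP 500 site
print(from_reference(wire, "latin-1").decode("latin-1"))  # an ISO 8859-1 host
```

Intermediate nodes only forward the reference bytes; the binary flag corresponds to a "no conversion" mode for files that must not be translated at all.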

10-Sep-88  8:29:24-GMT,4925;000000000001
Received: from CU20B.CC.COLUMBIA.EDU by cunixc.cc.columbia.edu (5.54/5.10) id AA22026; Sat, 10 Sep 88 04:29:20 EDT
Received: from CUVMB.CC.COLUMBIA.EDU by CU20B.CC.COLUMBIA.EDU with TCP; Sat 10 Sep 88 04:29:12-EDT
Received: from CUVMB.CC.COLUMBIA.EDU(MAILER) by CUVMB.CC.COLUMBIA.EDU(SMTP) ; Sat, 10 Sep 88 04:27:49 EDT
Received: from BITNIC.BITNET by CUVMB.CC.COLUMBIA.EDU (Mailer X1.25) with BSMTP
 id 6437; Sat, 10 Sep 88 04:27:47 EDT
Received: by BITNIC (Mailer X1.25) id 6492; Sat, 10 Sep 88 04:29:38 EDT
Date:         Sat, 10 Sep 88 03:20:00 CDT
Reply-To: ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@cuvmb.cc.columbia.edu>
Sender: ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@cuvmb.cc.columbia.edu>
From: Richard <TILLEY%UOFMCC.BITNET@cuvmb.cc.columbia.edu>
Subject:      Re: CP 37 vs CP 500 vs ?
To: Frank da Cruz <SY.FDC@cu20b.cc.columbia.edu>

>   - thus we have ISO-CECP 037' conversion 5B-AD 5D-BD DD-BA A8-BB.
>   ? will "circumflex" and "not sign" still be 5E-B0 AC-5F (1)
>     or as per the gateways 5E-5F and consequently AC-B0 ? (2)
>
>   I could not find a better solution than to implement (1) for terminal mode
>   and (2) for file transfer. (2) for terminal mode would impair either ISO8859
>   or CECP037 and worse, ASCII or EBCDIC.

A better solution is to use (2) for both terminal mode and for file transfer.
This has the effect of interchanging "not" and "circumflex" in CP 37 so that
it would agree with the current de facto standard.
This standard is widely used in software that supports ASCII terminals such as
Waterloo Script, Kermit, and SAS. We rarely have to make mods to software.
Following existing standards is the easiest way to get a "Standard" accepted,
since it already has been accepted. No changes to gateways or languages such
as PL/I would be needed. Existing EBCDIC devices would display a "not sign"
instead of a "circumflex" unless they were changed.

I believe that it is the nature of standards to evolve unless a lot of work
is done to invent new standards that do not follow the emerging one.
Such work can cause many years of problems. A good example is the original
"bit mapped" ASCII keyboard which took decades to finally expire.
IBM's decision to ignore both their own and SHARE's recommendations for
the position of "braces" is another example. In this case
it was the original standard that died.  The point is that it takes
at least a decade of "evolution" to resolve the confusion so caused.

The reason for the current relatively stable ASCII/EBCDIC standard is
that many years have elapsed since an existing standard was ignored.
I am not counting the new "Standard" in the VS Fortran manual since
it is too silly to be a threat. However CP 37 as it exists is very
much a threat. The lower half is only wrong in 3 places. This table
could very well become the next standard but only after another
decade of confusion. This confusion would not be "to our benefit."
However a modified CP 37 Version 2 as suggested above would become
a standard almost overnight.

One is tempted to wonder why, each time a standard evolves, a monkey
wrench is then thrown?  I don't believe the excuses that have been
suggested - IBM is not aware of the problem or is too big or cannot
afford to follow standards. Following standards is much easier
than ignoring them.  Who cares where the brackets and braces and
circumflex, tilde, vertical, etc. are, as long as they don't
dance around. This battle of the brackets may be a minor skirmish
in the war between ASCII and EBCDIC. The eventual winner of this
war will be the one supported by the most software, much like
the VCR war between BETA and VHS. It has been suggested that it
is easy to modify software. This is true for software that has no
explicit support for ASCII terminals. I once installed a version
of APL from Yale that had more translate tables than I could count.
Some in TCAM. Some in APL itself. Some in "auxiliary processors".
Data often went through more than one table serially, so that
changes to one table required changes to others. Unless there is
a standard translate table, it is a mistake to use ASCII terminals on
corporate mainframes.

Dedicated ASCII terminals seem to be going the way of dedicated
word processors - to be replaced with micros. The latter are able
to emulate either ASCII or EBCDIC terminals. Ten more years of
confusion will discourage developers from including ASCII support
in their software.

This current effort by SHARE to create a new "Standard" is likely
to delay the evolution of a standard since it is seeking support
from an organization that doesn't want a standard to evolve. Neither
SHARE nor BITNET could impose a standard without the help of
a rich organization. How about the US military? If they can make
COBOL a standard, then this should be trivial!

Sorry for the overall negative tone. I hope a few positive notes
crept through.

Date:         Fri, 4 Nov 88 20:23:43 GMT
Sender:       ASCII/EBCDIC character set related issues <ISO8859@JHUVM>
From:         "Matthias Melcher +49 6221 5645-23,-01" <$28@DHDURZ1>
Subject:      Code Page 037 vs. 500
To:           Frank da Cruz <SY.FDC@CU20B.COLUMBIA.EDU>

Today we received IBM's answer to our formal enquiry about their
recommendation concerning the choice between code page 500 and 037:

"The statement of Mr. Hart is correct that IBM has not decided on
a single code page with the character repertoire for the international
standard ISO 8859-1. This would not be within the meaning of the CECP
(Country Extended Code Page) concept either, which is supposed to
enable the user to undisturbedly migrate from the current to the
extended character repertoire of the respective EBCDIC version in use.

This does not alter the fact, however, that IBM declared the table 500
the "strategic" code page and - as correctly quoted by Mr. Melcher
from the CECP announcement - recommend it for international
applications.

The decision which table to use is always up to the user himself.
Regarding the internationality, however, of the network under discussion
I would consider it false and short-sighted to prefer a national version
of EBCDIC (even if it is the American one) to the international version.
This is especially true because also non-EBCDIC oriented devices and
systems will be connected to the network (7- and 8-bit ASCII). A
one-to-one correspondence of the characters of 7-bit ASCII to the
restricted EBCDIC, for instance, is given only when using table 500."

(Wilhelm Friedrich Bohn, National Requirements and Standards,
IBM Headquarters, Stuttgart, Germany)
11-Nov-88 22:13:58-GMT,1851;000000000001
Received: from CUVMB.CC.COLUMBIA.EDU by cunixc.cc.columbia.edu (5.54/5.10) id AA11215; Fri, 11 Nov 88 17:13:41 EST
Received: from CUVMB.CC.COLUMBIA.EDU(MAILER) by CUVMB.CC.COLUMBIA.EDU(SMTP) ; Fri, 11 Nov 88 17:14:28 EDT
Received: from PSUVM.BITNET by CUVMB.CC.COLUMBIA.EDU (Mailer X1.25) with BSMTP
 id 5046; Fri, 11 Nov 88 17:14:26 EDT
Received: by PSUVM (Mailer X1.25) id 5545; Fri, 11 Nov 88 17:07:53 EST
Date:         Fri, 11 Nov 88 13:23:35 CST
Reply-To: ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@cuvmb.cc.columbia.edu>
Sender: ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@cuvmb.cc.columbia.edu>
From: Rick Troth <TROTH%TAMCBA.BITNET@cuvmb.cc.columbia.edu>
Subject:      Re: Code Page 037 vs. 500
To: Frank da Cruz <SY.FDC@cunixc.cc.columbia.edu>
In-Reply-To:  Message of Fri, 4 Nov 88 20:23:43 GMT from <$28@DHDURZ1>

        That code page 500 is "strategic" conflicts with the CECP objective
of "undisturbed" migration because:  all of the gateways on this international
network (that translate EBCDIC to ASCII and vice versa) conform to an EBCDIC
code page incompatible with CP 500.  How can you "undisturbedly" migrate when
everything current is written for a not-compatible code set?

        But obviously Code Page 037 is not satisfactory either,  although it
does conform to most existing compilers (except C, TeX, SAS, etc.).  Ed, what do
we call a modified CP 037?  "Network EBCDIC" ?  "Code Page 037-M" ?
And if we just swap the brackets from CP 37 to their Kermit/WiscNet/7171
points,  what do we do about the circumflex -vs- logical not problem?

"Truth is truth"                             - Rick Troth <TROTH@TAMCBA.BITNET>
        Louis Gossett, Jr.                                 TAMCBA VM Operations
                "Enemy Mine"                      Texas A&M College of Business

12-Nov-88  3:19:46-GMT,2265;000000000001
Received: from CUVMB.CC.COLUMBIA.EDU by cunixc.cc.columbia.edu (5.54/5.10) id AA05421; Fri, 11 Nov 88 22:19:43 EST
Received: from CUVMB.CC.COLUMBIA.EDU(MAILER) by CUVMB.CC.COLUMBIA.EDU(SMTP) ; Fri, 11 Nov 88 22:20:33 EDT
Received: from BITNIC.BITNET by CUVMB.CC.COLUMBIA.EDU (Mailer X1.25) with BSMTP
 id 5301; Fri, 11 Nov 88 22:20:31 EDT
Received: by BITNIC (Mailer X1.25) id 1093; Fri, 11 Nov 88 22:19:50 EST
Date:         Fri, 11 Nov 88 21:39:41 EST
Reply-To: ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@cuvmb.cc.columbia.edu>
Sender: ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@cuvmb.cc.columbia.edu>
From: Edwin Hart <HART%APLVM.BITNET@cuvmb.cc.columbia.edu>
Subject:      Re: Code Page 037 vs. 500
To: Frank da Cruz <SY.FDC@cunixc.cc.columbia.edu>
In-Reply-To:  Your message of Fri, 11 Nov 88 13:23:35 CST

In the SHARE paper we are asking for one "Reference EBCDIC"; it may be
code page 500 v1 or code page 37 v1 (or v2, with the brackets at X'AD' and
X'BD').  I do not know.  I get conflicting requirements from discussions.

  1.  Some say give me one standard, I don't care what it is as long as
      you support it.  (I would include compiler support here.)  I will
      convert once, but don't ever ask me to do it again.
  2.  Others say give me one standard but make it code page 37 with the
      brackets in the TN/T11 code points.
  3.  Many Europeans have already converted to code page 500; they do not
      want to convert to code page 37.  I can't blame them.  However, they
      are already 99% there with code page 500.  (characters at 7 code points
      are shuffled around between CP 37 and CP 500).
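The "7 code points" figure in item 3 can be checked mechanically. A small Python sketch, assuming the built-in cp037 and cp500 codecs reproduce IBM's published tables; differing_code_points is a hypothetical helper. It decodes each of the 256 EBCDIC byte values under both code pages and reports where they disagree.

```python
import unicodedata


def differing_code_points(cp_a="cp037", cp_b="cp500"):
    """Return {EBCDIC byte: (char in cp_a, char in cp_b)} wherever the pages disagree."""
    diffs = {}
    for value in range(256):
        a = bytes([value]).decode(cp_a)
        b = bytes([value]).decode(cp_b)
        if a != b:
            diffs[value] = (a, b)
    return diffs


for value, (a, b) in sorted(differing_code_points().items()):
    print(f"X'{value:02X}'  CP 037: {unicodedata.name(a):20}  CP 500: {unicodedata.name(b)}")
```

If the built-in tables match IBM's, this lists seven positions (X'4A', X'4F', X'5A', X'5F', X'B0', X'BA', X'BB') holding the same seven characters in a different order: the brackets, exclamation mark, vertical bar, cent sign, circumflex and not sign.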

For the logical not problem, I see requirements for:

  1.  a utility to help you convert data and programs from many different
      code pages into the new "Reference EBCDIC and ASCII" code pages.
      I also want it to be able to map both the EBCDIC not (¬) and circumflex
      (^) into the logical NOT.  I also want both vertical bar (|) and split
      vertical bar (¦) to map into the logical OR.  Some also want the tilde
      to map into the logical NOT.
  2.  compilers to recognize (possibly as a user invoked option) these kinds
      of relationships
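The folding asked for in requirement 1 can be sketched as a character translation applied after decoding from whichever code page the data is in. This is a hypothetical Python illustration (operator_folding and normalize_operators are made-up names, not an existing utility). The folding is lossy, so it only makes sense for program source where these characters mean operators, not for general text.

```python
NOT_SIGN = "\u00ac"
BROKEN_BAR = "\u00a6"


def operator_folding(tilde_is_not=False):
    """Translation table folding look-alike characters onto one NOT and one OR."""
    mapping = {"^": NOT_SIGN, BROKEN_BAR: "|"}
    if tilde_is_not:  # the optional mapping "some also want"
        mapping["~"] = NOT_SIGN
    return str.maketrans(mapping)


def normalize_operators(source, code_page, tilde_is_not=False):
    """Decode program source from an EBCDIC code page and fold operator look-alikes."""
    return source.decode(code_page).translate(operator_folding(tilde_is_not))


line = "IF ^A \u00a6 B ~= C".encode("cp037")
print(ascii(normalize_operators(line, "cp037")))
print(ascii(normalize_operators(line, "cp037", tilde_is_not=True)))
```

Because decoding happens first, the same folding applies whether the file arrived in CP 037, CP 500 or ISO8859-1; requirement 2 would amount to a compiler accepting the folded forms as a user option.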

What do you think?
Ed Hart

17-Nov-88  1:15:26-GMT,3794;000000000001
Received: from CUVMB.CC.COLUMBIA.EDU by cunixc.cc.columbia.edu (5.54/5.10) id AA23325; Wed, 16 Nov 88 20:15:09 EST
Received: from CUVMB.CC.COLUMBIA.EDU(MAILER) by CUVMB.CC.COLUMBIA.EDU(SMTP) ; Wed, 16 Nov 88 20:15:13 EDT
Received: from BITNIC.BITNET by CUVMB.CC.COLUMBIA.EDU (Mailer X1.25) with BSMTP
 id 1534; Wed, 16 Nov 88 20:15:10 EDT
Received: by BITNIC (Mailer X1.25) id 3262; Wed, 16 Nov 88 20:14:30 EST
Date:         Wed, 16 Nov 88 19:35:58 +0100
Reply-To: ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@cuvmb.cc.columbia.edu>
Sender: ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@cuvmb.cc.columbia.edu>
From: Andre' PIRARD <A-PIRARD%BLIULG11.BITNET@cuvmb.cc.columbia.edu>
Subject:      Re: Code Page 037 vs. 500
To: Frank da Cruz <SY.FDC@cunixc.cc.columbia.edu>
In-Reply-To:  Message of Fri, 4 Nov 88 20:23:43 GMT from <$28@DHDURZ1>

>Today we received IBM's answer to our formal enquiry about their
>recommendation concerning the choice between code page 500 and 037:
> [text deleted]
>enable the user to undisturbedly migrate from the current to the
>extended character repertoire of the respective EBCDIC version in use.
>
>This does not alter the fact, however, that IBM declared the table 500
>the "strategic" code page and - as correctly quoted by Mr. Melcher
>from the CECP announcement - recommend it for international
>applications.
>
>The decision which table to use is always up to the user himself.

Before CECPs, we (in Belgium) decided on an ASCII/EBCDIC conversion table
when we started to use communication software. This in fact meant
removing a historical mod and adopting the standard VM tables. From
then on, every piece of software in sight, the 7171s, and joining BITNET
were all telling us we had made the right move.
Having two kinds of 3270 terminals and some funny printers was
considered as the inevitable result of the chaos of the computing
world. Without our accented letters, we were used to restrictions and
didn't care much as long as a capital A was a capital A.

That the hardware finally supporting these accented letters was of
that strange kind too was discovered by chance and it came as a shock
when I did. Only later could we raise a discussion with IBM and hear
the same tune as the one quoted above and of the existence of 037.
Only later did I learn from BITNET that our code is called a ghost
name 037 v2 and that many people love ghosts.

What undisturbed migration means to us is that I had to start CECP 500
support on our 7171s and in file transfer, in addition to 037 v2. It
took a long time, brought out a serious bug in the PC (KEYB losing
interrupts) and an annoying feature of the 7171 in APL mode
(refreshing the end of line on overstrike), and it raises embarrassing
questions from our users.

>Regarding the internationality, however, of the network under discussion
>I would consider it false and short-sighted to prefer a national version
>of EBCDIC (even if it is the American one) to the international version.

I *am* short-sighted, but still can tell an exclamation mark instead
of a vertical bar in a VM help screen.
Why does such a strategic code lack a decent font on our 3812?
Is all that software really going to be converted, or will GDDM be a
CECP 500 to 037 translator for European use only?

>This is especially true because also non-EBCDIC oriented devices and
>systems will be connected to the network (7- and 8-bit ASCII). A
>one-to-one correspondence of the characters of 7-bit ASCII to the
>restricted EBCDIC, for instance, is given only when using table 500."

I can translate any PC code page to any CECP for the simple reason
that IBM has defined translation between all these tables and ISO8859
and claims it's the rule of the game of future communication.
And I applaud this point.

André.

17-Nov-88  1:24:46-GMT,5138;000000000001
Received: from CUVMB.CC.COLUMBIA.EDU by cunixc.cc.columbia.edu (5.54/5.10) id AA24689; Wed, 16 Nov 88 20:24:42 EST
Received: from CUVMB.CC.COLUMBIA.EDU(MAILER) by CUVMB.CC.COLUMBIA.EDU(SMTP) ; Wed, 16 Nov 88 20:24:47 EDT
Received: from BITNIC.BITNET by CUVMB.CC.COLUMBIA.EDU (Mailer X1.25) with BSMTP
 id 1544; Wed, 16 Nov 88 20:24:45 EDT
Received: by BITNIC (Mailer X1.25) id 3501; Wed, 16 Nov 88 20:24:11 EST
Date:         Wed, 16 Nov 88 17:38:08 +0100
Reply-To: ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@cuvmb.cc.columbia.edu>
Sender: ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@cuvmb.cc.columbia.edu>
From: Andre' PIRARD <A-PIRARD%BLIULG11.BITNET@cuvmb.cc.columbia.edu>
Subject:      Re: Code Page 037 vs. 500
To: Frank da Cruz <SY.FDC@cunixc.cc.columbia.edu>
In-Reply-To:  Message of Mon, 14 Nov 88 09:18:18 EST from <GILBERT@YALEVM>

Two problems of codes are discussed on this list.
The first is the meaning of computer encoded data and is an intricate
one indeed. And I totally agree that software must be designed to be
code-independent.
The second is that our networks are text-based and do not transport
data undisturbed, simply because of a lack of agreement in
the field of the first problem. What should be a simple matter is
hidden behind a puzzle, despite a simple, evident solution.

Either codes A and B do not represent the same objects, and there is no
meaning attached to transcoding one to the other, or they do, and there
is no theoretical reason for more than one to exist. In practice,
however, at least ASCII and EBCDIC exist for 7-bit codes, and a host
of look-alike ones for 8-bit extensions.

To cope with N+1 codes that can fully transcode one to the other (or
whose similarity is such that an approximate transcoding can be
defined and accepted), a common single reference code is needed and
should be the favoured vehicle. Otherwise, every user of one of these must
cope with N translate tables (and find out which one to use) instead of a
single one. This amounts to a total of N + (N-1) + ... + 1 = N(N+1)/2
table pairs (loosely, N!, which some will write N| :-) instead of N.
This is what I call each minding his own business.
Had I meant a true factorial, I would have reached more than 100000.
At least, there should be no ambiguity as to the code used on a given
data path in terms of translation to the reference one, and, by
extension, to any other, when, for any reason, a code other than the
chosen reference one is used. This means the transcodings by gateways
should be coherent and the data unchanged when it returns to a path
using the same code.
Given that, anyone will be able to peacefully send his C source file
to anyone.
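A minimal Python sketch of the table count above (Python and the code labels are assumptions of this sketch; the labels are only names). With N+1 codes, direct translation needs one table pair per pair of codes, N(N+1)/2 in all, while routing everything through one reference code needs only N.

```python
from itertools import combinations


def direct_table_pairs(codes):
    """One translate-table pair for every pair of codes (each to/from each)."""
    return list(combinations(codes, 2))


def hub_table_pairs(codes, reference):
    """One translate-table pair per code, to and from the reference code only."""
    return [(code, reference) for code in codes if code != reference]


# N + 1 = 5 codes, so N = 4 (labels are illustrative only)
codes = ["ISO8859-1", "CECP 037", "CECP 500", "PC 437", "PC 850"]
print(len(direct_table_pairs(codes)))              # N(N+1)/2 = 10
print(len(hub_table_pairs(codes, "ISO8859-1")))    # N = 4
```

The gap widens quickly: with 20 codes besides the reference, direct translation needs 210 table pairs against 20.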

Transporting data that cannot meaningfully transcode to the reference
code would suggest that a so-called binary mode should be implemented,
that is, no translation across gateways, in fact the least ambiguous
translation. But if we have agreed upon the first point, we know that
user B will receive unmodified data from user A as long as the
communication lines at both ends use the same code.
If they don't and the data is meant to be usable on the receiving
system, two codes exist to represent the data.
These two codes should translate the same way as would data of the
reference code. This is the second rule of the game.
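The two rules can be sketched as a gateway function, assuming Python's `cp037`, `cp500` and `latin-1` codecs as stand-ins for CECP 037, CECP 500 and ISO 8859-1 (these are Python's tables, not necessarily the ones IBM published). The `gateway` function and its `binary` flag are hypothetical: in binary mode the bytes pass untouched; in text mode they are translated exactly as reference-code data would be, so data comes back unchanged when it returns to a path using its original code.

```python
REFERENCE = "latin-1"  # ISO 8859-1 as the reference (line) code


def gateway(data: bytes, src: str, dst: str, binary: bool = False) -> bytes:
    """Move data from a path using code `src` to a path using code `dst`."""
    if binary or src == dst:
        # Binary mode, or the same code on both paths: no translation.
        return data
    # Text mode: translate through the reference code, so every gateway
    # maps a given character the same way.
    on_line = data.decode(src).encode(REFERENCE)
    return on_line.decode(REFERENCE).encode(dst)


# A C statement typed on a CECP 037 host and read on a CECP 500 host
sent = "if (a != b) x |= y[0];".encode("cp037")
received = gateway(sent, "cp037", "cp500")
print(received.decode("cp500"))                      # same characters
print(gateway(received, "cp500", "cp037") == sent)   # True: round trip
```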

Of course, 8-bit codes will travel easily only on 8-bit paths, but I
think 8-bit communication is one thing easily at hand.
8-bit codes are limited, but just as 8-bit I/O chips are still used
with 32-bit processors, I see no near future for wider codes.
So we must admit that a single 8-bit code will not cover all needs. The
various ISO8859 versions are the most obvious example, and I am
sorry to see that we are forced to repeat with 8 bits one cause of the
present problem, which was cramming different codes into 7 bits. But this
time, we are limited by hardware.
So, unless code switching techniques are used (exactly what the
extension to 8-bit is trying to avoid), the x of ISO8859-x will be a
tag of the data. But the data lines will be independent of x.
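To illustrate the tag idea, a small Python sketch (the `TaggedText` structure is hypothetical; Python's `iso8859_N` codecs stand in for the ISO 8859 parts). The bytes on the line are the same whatever x is; only the tag tells the receiver how to read them.

```python
from dataclasses import dataclass


@dataclass
class TaggedText:
    part: int      # the x of ISO8859-x, carried alongside the data
    data: bytes    # transported unchanged, independent of x

    def text(self) -> str:
        return self.data.decode(f"iso8859_{self.part}")


# The same byte B1 read under three different tags
for part in (1, 2, 5):
    print(part, TaggedText(part, b"\xb1").text())
```

Under these codecs, B1 reads as a plus-minus sign in Part 1, a with ogonek in Part 2 and Cyrillic Be in Part 5.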

The problem of deciding which translate tables to use is complicated
by the fact that it must be discussed in terms of practical existing
codes and will favour one over another. And this raises
religious wars, apparently only in the EBCDIC world. Ironically enough,
this is only because of a 7-bit problem and a handful of code points.

ISO8859 does not appear to be controversial in the ASCII world, and it is what
I take as the reference code.
But it must be emphasised that all devices lose their ASCII label
when considered from an 8-bit point of view. An IBM PC, Macintosh or
whatever must translate its code to ISO8859 on the communication line,
be it for text file transfer or terminal mode. This requires that a
precise translate table be defined for all 256 code points. This has
been done by IBM, but I have been unable to obtain one from Apple. I
did not even try others.
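Why a full 256-point table is needed can be checked with Python's own `cp850` codec (an assumption: it is Python's table, not IBM's). Every ISO 8859-1 character exists in code page 850, but 32 of code page 850's characters (box drawing, the florin sign, dotless i, and a few others) have no ISO 8859-1 position, so a plain character-for-character mapping leaves them stranded.

```python
def unmappable(pc_code: str, iso_code: str = "latin-1") -> list[int]:
    """Byte values of `pc_code` whose character has no position in `iso_code`."""
    missing = []
    for b in range(256):
        ch = bytes([b]).decode(pc_code)
        try:
            ch.encode(iso_code)
        except UnicodeEncodeError:
            missing.append(b)
    return missing


gaps = unmappable("cp850")
print(len(gaps))                          # 32, the size of ISO's 80-9F range
print(bytes(gaps).decode("cp850")[:5])    # the first few stranded characters
```

That the count matches the 32 unassigned ISO positions 80-9F is what makes a complete, revertible table possible at all.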

If anyone knows of any, I am very interested.

All this to the best of my limited knowledge of a code called
English. But this is another problem that was forced upon us by the ages.
Let us at least make simple what we can invent.

André.

23-Nov-88  5:19:08-GMT,1710;000000000001
Received: from CUVMB.CC.COLUMBIA.EDU by cunixc.cc.columbia.edu (5.54/5.10) id AA00983; Wed, 23 Nov 88 00:19:04 EST
Received: from CUVMB.CC.COLUMBIA.EDU(MAILER) by CUVMB.CC.COLUMBIA.EDU(SMTP) ; Wed, 23 Nov 88 00:08:06 EDT
Received: from BITNIC.BITNET by CUVMB.CC.COLUMBIA.EDU (Mailer X1.25) with BSMTP
 id 0162; Wed, 23 Nov 88 00:08:05 EDT
Received: by BITNIC (Mailer X1.25) id 2703; Wed, 23 Nov 88 00:07:59 EST
Date:         Tue, 22 Nov 88 12:13:20 CST
Reply-To: ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@cuvmb.cc.columbia.edu>
Sender: ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@cuvmb.cc.columbia.edu>
From: Michael Sperberg-McQueen <U18189%UICVM.BITNET@cuvmb.cc.columbia.edu>
Subject:      TCP/IP support for ISO8859?
To: Frank da Cruz <SY.FDC@cunixc.cc.columbia.edu>

When I asked this question on the TCP list, no one answered, either
because they thought it uninteresting, or because they didn't know the
answer, or because they didn't understand the question.  This list ought
at least to understand the question.

I am using IBM's VM and PC TCP/IP products to Telnet into our 3081
running VM/CMS as a virtual 3270, across an Ethernet.  The connection is
clearly an eight-bit connection, and the code points not assigned by
94-character EBCDIC are being mapped into the eight-bit extended ASCII
of my PS/2.

The question:  does anyone know where this mapping is being done, or
what is needed to customize it (or, I should say, to correct it) so that
it maps correctly from the PS/2 modification of ISO8859-1 to one or the
other of the extended EBCDIC code pages?

Many thanks for any hints or tips.

-Michael Sperberg-McQueen
 University of Illinois at Chicago

23-Nov-88  5:19:12-GMT,1918;000000000001
Received: from CUVMB.CC.COLUMBIA.EDU by cunixc.cc.columbia.edu (5.54/5.10) id AA00987; Wed, 23 Nov 88 00:19:07 EST
Received: from CUVMB.CC.COLUMBIA.EDU(MAILER) by CUVMB.CC.COLUMBIA.EDU(SMTP) ; Wed, 23 Nov 88 00:13:13 EDT
Received: from BITNIC.BITNET by CUVMB.CC.COLUMBIA.EDU (Mailer X1.25) with BSMTP
 id 0170; Wed, 23 Nov 88 00:13:12 EDT
Received: by BITNIC (Mailer X1.25) id 2958; Wed, 23 Nov 88 00:12:57 EST
Date:         Tue, 22 Nov 88 18:19:46 +0100
Reply-To: ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@cuvmb.cc.columbia.edu>
Sender: ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@cuvmb.cc.columbia.edu>
From: Andre' PIRARD <A-PIRARD%BLIULG11.BITNET@cuvmb.cc.columbia.edu>
Subject:      Translation of ASCII DEL
To: Frank da Cruz <SY.FDC@cunixc.cc.columbia.edu>

A discussion with John Chandler raised a question.
I modified his Kermit 370 tables according to the tables I got from IBM
as the official ones (and once sent to this list), so that Kermit
translates 037 v2 to ISO8859 by default.
Modifying IBM's 037 to make it 037 v2 and applying it to Kermit's tables
only extended the Kermit tables, except that the ASCII DEL was
now translated to EBCDIC FF instead of the former 07, which is labeled
DEL in the EBCDIC chart.

What happens is that, in addition to PC graphic symbols, IBM tucked two
characters into the unassigned ISO 80-9F range:
9Fa=07e=Florin sign     LI61
9Ea=0Ae=i dotless small SC07
Other control codes that Kermit used to translate to nulls now have
a definition for a graphic in that range.
I see something good in the IBM tables: they are defined for all 256
code points and are revertible. I take it as IBM's strictest right to
define a translation of its 32 additional control characters to the 32
undefined ones of ISO8859 for that sake. Why they chose not to assign
the florin sign to FF, I don't know.
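One way to get a revertible 256-point table, sketched in Python: map every code page 850 character that ISO 8859-1 has to its ISO position, then pair the leftover PC code points with the unused ISO positions (80-9F) in ascending order. The pairing rule is this sketch's own assumption and does not reproduce IBM's actual assignments (for instance, it puts the florin sign at ISO 80); Python's `cp850` codec stands in for IBM's code page.

```python
def revertible_table(pc_code: str, iso_code: str = "latin-1") -> list[int]:
    """256-entry PC -> ISO table that is a permutation, hence invertible."""
    table = [-1] * 256
    leftovers = []
    used = set()
    for b in range(256):
        ch = bytes([b]).decode(pc_code)
        try:
            iso = ch.encode(iso_code)[0]
        except UnicodeEncodeError:
            leftovers.append(b)       # no ISO equivalent (e.g. box drawing)
            continue
        table[b] = iso
        used.add(iso)
    free = [i for i in range(256) if i not in used]   # unassigned ISO positions
    if len(free) != len(leftovers):
        raise ValueError("cannot build a one-to-one table")
    for b, iso in zip(leftovers, free):               # arbitrary but fixed pairing
        table[b] = iso
    return table


to_iso = revertible_table("cp850")
to_pc = [0] * 256
for pc, iso in enumerate(to_iso):
    to_pc[iso] = pc                                   # the reverse table

text = "Caf\u00e9 \u2554\u2550\u2557".encode("cp850")  # e-acute and box corners
on_line = bytes(to_iso[b] for b in text)
print(bytes(to_pc[b] for b in on_line) == text)       # True: nothing is lost
```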

A Florin is a Gulden, isn't it, Johan?

Any idea?

André.

23-Nov-88  5:19:06-GMT,1231;000000000001
Received: from CUVMB.CC.COLUMBIA.EDU by cunixc.cc.columbia.edu (5.54/5.10) id AA00979; Wed, 23 Nov 88 00:19:01 EST
Received: from CUVMB.CC.COLUMBIA.EDU(MAILER) by CUVMB.CC.COLUMBIA.EDU(SMTP) ; Wed, 23 Nov 88 00:03:47 EDT
Received: from BITNIC.BITNET by CUVMB.CC.COLUMBIA.EDU (Mailer X1.25) with BSMTP
 id 0160; Wed, 23 Nov 88 00:03:45 EDT
Received: by BITNIC (Mailer X1.25) id 2554; Wed, 23 Nov 88 00:03:49 EST
Date:         Tue, 22 Nov 88 15:38:09 EST
Reply-To: ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@cuvmb.cc.columbia.edu>
Sender: ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@cuvmb.cc.columbia.edu>
From: Edwin Hart <HART%APLVM.BITNET@cuvmb.cc.columbia.edu>
Subject:      Re: TCP/IP support for ISO8859?
To: Frank da Cruz <SY.FDC@cunixc.cc.columbia.edu>
In-Reply-To:  Your message of Tue, 22 Nov 88 12:13:20 CST

IBM VM TCP/IP connection to PC and translations.

You might try using code page switching with DOS 3.3 or 4.1 to use code page
850 which contains all of the characters of ISO 8859-1.  I am using it
successfully with the IBM PC 3270 Emulation Program Version 3.03 to transmit
files back and forth over a 3270 coax connection to Code Page 37, v1 on the
mainframe.

Ed Hart

29-Nov-88  2:41:21-GMT,2646;000000000001
Received: from CUVMB.CC.COLUMBIA.EDU by cunixc.cc.columbia.edu (5.54/5.10) id AA08871; Mon, 28 Nov 88 21:41:12 EST
Received: from CUVMB.CC.COLUMBIA.EDU(MAILER) by CUVMB.CC.COLUMBIA.EDU(SMTP) ; Mon, 28 Nov 88 21:40:57 EDT
Received: from BITNIC.BITNET by CUVMB.CC.COLUMBIA.EDU (Mailer X1.25) with BSMTP
 id 5924; Mon, 28 Nov 88 21:40:55 EDT
Received: by BITNIC (Mailer X1.25) id 4552; Mon, 28 Nov 88 21:36:54 EST
Date:         Wed, 23 Nov 88 12:43:09 CST
Reply-To: ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@cuvmb.cc.columbia.edu>
Sender: ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@cuvmb.cc.columbia.edu>
From: Rick Troth <TROTH%TAMCBA.BITNET@cuvmb.cc.columbia.edu>
Subject:      Re: Code Page 037 vs. 500
To: Frank da Cruz <SY.FDC@cunixc.cc.columbia.edu>
In-Reply-To:  Message of Fri, 11 Nov 88 21:39:41 EST from <HART@APLVM>

        Ed,  you answered my questions concisely.  Thank you.  I have put off
replying because you asked what I think and that means I will have to stop and
do so.

        I suggested to the networking group that Texas A&M adopt Code Page 37
and was met with  "hem ... haw"  response.  So I waited,  hoping that something
would develop from Share 71.5 or elsewhere.  Then I read the statement from
IBM in Europe (was that Germany?) supporting Code Page 500 for  "international"
use;  that bothers me since I am partial to CP 037,  (see below).

        We are a  "traditional VM EBCDIC"  site:  we have a 7171,  run Kermit
quite a bit,  have an RSCS connection to dozens of VAXen,  etc.  What I can now
call CP 37 V2 will work VERY well for us.  (Over and over again)  because of
the de facto translation in WISCNET and cousins,  CP 37 V2 is an easy extension
of  "EBCDIC".

        As I am a C fan of late,  the 7 code points that differ between CP 037 and
CP 500 are the biggest headache IBM has created lately  (second to CP R5).
In C,  !  means  "logical NOT"  and  |  means  "bitwise OR".  Furthermore,
!=  is the relation  "does not equal"  and  |=  is bitwise OR assignment.
Brackets  []  (I now use points AD and BD without fear!)  are used to sub-
script arrays.  In a world without CP 500,  multiple code points can be mapped
to a single meaning without having to toggle a compiler option switch,  as in
AD, BA, and (from "the 3180 set") 41 all being mapped to an open bracket.
But enter CP 500 and such  "universal mapping"  fails.
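The C operators in question can be looked up in Python's `cp037` and `cp500` codecs (assumed here to match the CECP assignments for these characters; note that Python's `cp037` puts the brackets at BA and BB, not at the AD/BD positions of the older de facto tables). The sketch lists their code points under both pages, finds the code points that differ, and shows what happens when CP 037 source is read as CP 500.

```python
def differing_points(a: str = "cp037", b: str = "cp500") -> list[int]:
    """Code points whose character differs between two code pages."""
    return [p for p in range(256)
            if bytes([p]).decode(a) != bytes([p]).decode(b)]


for op in ["!", "|", "!=", "|=", "[", "]"]:
    print(f"{op:3} CP037={op.encode('cp037').hex().upper():4} "
          f"CP500={op.encode('cp500').hex().upper()}")

print([hex(p) for p in differing_points()])
# Source written under CP 037 but read as CP 500: bitwise OR turns into NOT
print("x = a | b;".encode("cp037").decode("cp500"))   # x = a ! b;
```

Any lenient many-to-one mapping of code points to C tokens has to pick one of the two pages for these seven positions, which is why it breaks once both pages are in use.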

"It is free;  it is not cheap."                Rick Troth <TROTH@TAMCBA.BITNET>
 - Chris Osborne                                           TAMCBA VM Operations
                                                  Texas A&M College of Business

29-Nov-88 14:37:38-GMT,1540;000000000001
Received: from CUVMB.CC.COLUMBIA.EDU by cunixc.cc.columbia.edu (5.54/5.10) id AA13335; Tue, 29 Nov 88 09:37:34 EST
Received: from CUVMB.CC.COLUMBIA.EDU(MAILER) by CUVMB.CC.COLUMBIA.EDU(SMTP) ; Tue, 29 Nov 88 09:37:18 EDT
Received: from BITNIC.BITNET by CUVMB.CC.COLUMBIA.EDU (Mailer X1.25) with BSMTP
 id 6376; Tue, 29 Nov 88 09:37:16 EDT
Received: by BITNIC (Mailer X1.25) id 5902; Tue, 29 Nov 88 08:56:37 EST
Date:         Tue, 29 Nov 88 08:39:11 EST
Reply-To: ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@cuvmb.cc.columbia.edu>
Sender: ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@cuvmb.cc.columbia.edu>
From: Edwin Hart <HART%APLVM.BITNET@cuvmb.cc.columbia.edu>
Subject:      Re: Code Page 037 vs. 500
To: Frank da Cruz <SY.FDC@cunixc.cc.columbia.edu>
In-Reply-To:  Your message of Wed, 23 Nov 88 12:43:09 CST

I am writing the requirements in the paper now.  Basically we ask that IBM
standardize on one CECP for Latin alphabet no. 1 as defined in ISO 8859-1.
Although the wording will say SHARE prefers Code Page 37 over 500, we say
that CP 500 is acceptable if the EBCDIC compilers are modified to use CP 500
code points.  Furthermore, if IBM selects CP 37 as the base, we require that
either the IBM PASCAL and C compilers (on mainframe and midrange) be changed
to use the BA, BB brackets or that CP 37 v2 be defined with brackets in the
AD, BD code points.  In any case some kind of translation utility is required
for migration, particularly in Europe.

Wait for the exact wording.

Ed Hart

15-Dec-88 13:36:32-GMT,1876;000000000001
Received: from CUVMB.CC.COLUMBIA.EDU by cunixc.cc.columbia.edu (5.54/5.10) id AA00874; Thu, 15 Dec 88 08:36:27 EST
Received: from CUVMB.CC.COLUMBIA.EDU(MAILER) by CUVMB.CC.COLUMBIA.EDU(SMTP) ; Thu, 15 Dec 88 08:38:44 EDT
Received: from BITNIC.BITNET by CUVMB.CC.COLUMBIA.EDU (Mailer X1.25) with BSMTP
 id 6322; Thu, 15 Dec 88 08:38:42 EDT
Received: by BITNIC (Mailer X1.25) id 8918; Thu, 15 Dec 88 08:38:54 EST
Date:         Thu, 15 Dec 88 14:31:00 CET
Reply-To: ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@cuvmb.cc.columbia.edu>
Sender: ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@cuvmb.cc.columbia.edu>
From: Johan van Wingen <MOSGLA%HLERUL2.BITNET@cuvmb.cc.columbia.edu>
Subject:      sc2
To: Frank da Cruz <SY.FDC@cunixc.cc.columbia.edu>


Dear list subscribers
I have not contributed to the list for a while. Thorough comments on the
various proposals require more time than I have had so far, but some news
from the ISO JTC1/SC2 meeting in London, 17-21 Oct., may interest you. I
also attended WG2, Multiple octet coding.

Of course nothing was said about EBCDIC, but Mr. W. F. Bohn was there,
and at least two other people from IBM.  A number of resolutions were
adopted; I'll give the text in my next contribution.

Some comments I cannot leave to a later moment. I was quite perplexed
when reading that CP037 has "versions". What does this mean? Also it was
proposed to copy things from Postscript. It should be remembered that
Adobe is taking an active part in ISO standardization and may change its
code tables to those from ISO at any moment it likes. Besides that,
ISO is now busy with developing the successor of Postscript called SPDL,
Standard Page Description Language. Doing the actual work are people
from Adobe, Xerox and IBM.

FROM  J. W. van Wingen    MOSGLA@HLERUL2
Mail to
P. O. Box 486,  2300AL Leiden, Netherlands

12-Jan-89 23:47:35-GMT,1997;000000000001
Received: from CUVMB.CC.COLUMBIA.EDU by cunixc.cc.columbia.edu (5.54/5.10) id AA20709; Thu, 12 Jan 89 18:47:27 EST
Received: from CUVMB.CC.COLUMBIA.EDU(MAILER) by CUVMB.CC.COLUMBIA.EDU(SMTP) ; Thu, 12 Jan 89 18:46:25 EDT
Received: from BITNIC.BITNET by CUVMB.CC.COLUMBIA.EDU (Mailer X1.25) with BSMTP
 id 0810; Thu, 12 Jan 89 18:46:23 EDT
Received: by BITNIC (Mailer X1.25) id 8384; Thu, 12 Jan 89 18:48:25 EST
Date:         Thu, 12 Jan 89 18:44:04 EST
Reply-To: ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@cuvmb.cc.columbia.edu>
Sender: ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@cuvmb.cc.columbia.edu>
From: Brian Eliot <USERALVE%RPITSMTS.BITNET@cuvmb.cc.columbia.edu>
Subject:      IBM 3174 code page/character set description
To: Frank da Cruz <SY.FDC@cunixc.cc.columbia.edu>

A new manual I just received contains some interesting material on this
topic.  It is

  GA27-3831-0  "3174 Subsystem Control Unit Character Set Reference"

I believe this replaces the earlier manual

  GA27-2837    "IBM 3270 Information Display System Character Set Reference"

which applies to older 3270 control units.  The items I noted were

 1.  3270 national language support is described in terms of "code pages"
     and "character sets" rather than the earlier "I/O interface codes".
     Thus the description clearly distinguishes character sets, character
     generators (display hardware), and code pages.

 2.  There is a description of the values returned by a Query Reply
     (Character Sets) structured field.  This query may be used to ask
     the terminal what character set/code page combinations it supports.
     Only a few terminals support the CGCSGID field.

 3.  In conjunction with the manual GA23-0214-3 "3174 Subsystem Control
     Unit Customizing Guide" you can find out about Country Extended Code
     Page (CECP) support.

 4.  A mapping is implicitly defined for certain control codes between
     EBCDIC and ASCII-8 (a.k.a. ISO 8859).

12-Jan-89 22:16:53-GMT,1561;000000000001
Received: from CUVMB.CC.COLUMBIA.EDU by cunixc.cc.columbia.edu (5.54/5.10) id AA16573; Thu, 12 Jan 89 17:16:48 EST
Received: from CUVMB.CC.COLUMBIA.EDU(MAILER) by CUVMB.CC.COLUMBIA.EDU(SMTP) ; Thu, 12 Jan 89 17:15:46 EDT
Received: from BITNIC.BITNET by CUVMB.CC.COLUMBIA.EDU (Mailer X1.25) with BSMTP
 id 0586; Thu, 12 Jan 89 17:15:45 EDT
Received: by BITNIC (Mailer X1.25) id 0476; Thu, 12 Jan 89 17:16:55 EST
Date:         Thu, 12 Jan 89 16:21:32 EST
Reply-To: ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@cuvmb.cc.columbia.edu>
Sender: ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@cuvmb.cc.columbia.edu>
From: Frank da Cruz <fdc@cunixc.cc.columbia.edu>
Subject:      ISO8859 vs Kermit
To: Frank da Cruz <SY.FDC@cunixc.cc.columbia.edu>

We are looking into the possibility of adding ISO8859 transfer syntax to the
Kermit protocol, to allow for transfer of textual data in other than the Roman
ASCII alphabet, including the transfer of text in mixed alphabets.

Unfortunately, I have yet to see the actual 8859 documents, and I don't really
understand how one transmits (or stores) text in mixed alphabets.  Is there
some kind of meta-character or sequence that introduces an "alphabet shift",
followed by a code that designates the alphabet to be used?  If so, can anyone
describe the actual mechanism, what the alphabet codes are, etc?  (Not the
alphabets themselves!  Just the mechanism for identifying them and switching
among them.)

Any information, insights, suggestions, caveats, etc, would be most
appreciated.


16-Jan-89 15:45:20-GMT,5629;000000000001
Received: from CUVMB.CC.COLUMBIA.EDU by cunixc.cc.columbia.edu (5.54/5.10) id AA26322; Mon, 16 Jan 89 10:45:16 EST
Received: from CUVMB.CC.COLUMBIA.EDU(MAILER) by CUVMB.CC.COLUMBIA.EDU(SMTP) ; Mon, 16 Jan 89 10:44:21 EDT
Received: from BITNIC.BITNET by CUVMB.CC.COLUMBIA.EDU (Mailer X1.25) with BSMTP
 id 4417; Mon, 16 Jan 89 10:44:19 EDT
Received: by BITNIC (Mailer X1.25) id 2349; Mon, 16 Jan 89 10:45:57 EST
Date:         Mon, 16 Jan 89 14:00:17 +0100
Reply-To: ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@cuvmb.cc.columbia.edu>
Sender: ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@cuvmb.cc.columbia.edu>
From: Andre' PIRARD <A-PIRARD%BLIULG11.BITNET@cuvmb.cc.columbia.edu>
Subject:      Re: ISO8859 vs Kermit
To: Frank da Cruz <SY.FDC@cunixc.cc.columbia.edu>
In-Reply-To:  Message of Thu,
              12 Jan 89 16:21:32 EST from <fdc@CUNIXC.CC.COLUMBIA.EDU>

>We are looking into the possibility of adding ISO8859 transfer syntax to the
>Kermit protocol, to allow for transfer of textual data in other than the Roman
>ASCII alphabet, including the transfer of text in mixed alphabets.

Nice to meet you here too, Frank,

Well,  it's  true ISO and ANSI define escape mechanisms to switch
from  one character set to another and in particular between  the
G0  and G1 sets of a single version of ISO 8859 when  transmitted
over a 7-bit line.  I don't think the intent is to define a means
to store the data, which is what Kermit transmits.  It
would both be very inefficient in terms of storage space and ease
of  processing and take us back to the previous  situation  where
accented letters were stored in the form of printer-ready symbols
overstrikes,  exactly  what ISO is trying to avoid.  While  these
escape mechanisms can be used to implement a super terminal (this
may  apply to a Kermit's terminal mode) which would know all  ISO
8859  versions  and would be driven by a fancy  host,  this  host
would be better off storing its data in 16-bits or more elements.
Consequently,  Kermit  would  transmit these.  I think  that  the
ISO8859 versions are exclusive,  but that they must translate the
same  way between ANSI and EBCDIC.  IBM switches character  sets,
but does not mix them.

Codes of 16 or more bits are a final solution, but put a heavy load
upon hardware.  The only place I've read anything about them other than
theory is in the OS/2 technical manual, which speaks of DBCS in
chapter 6 ("Language DBCS environment vector of lead bytes",  how
filename  elements are not truncated in case DBCS is involved and
such faint remarks I'd like to know more about).  But DBCS  still
means  "double  byte character sets" and does not look like  true
16-bit codes.
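What not truncating in the middle of a DBCS character amounts to can be sketched in Python. The lead-byte ranges below are an assumption, taken from the Shift-JIS layout used on Japanese PCs; a real system would use the ranges its DBCS environment vector reports for the active code page. A byte in a lead range starts a two-byte character, so a cut must never fall right after it, and the scan has to start from the beginning because a trailing byte may itself look like a lead byte.

```python
# Assumed lead-byte ranges (Shift-JIS style), inclusive.
LEAD_RANGES = [(0x81, 0x9F), (0xE0, 0xFC)]


def is_lead_byte(b: int, ranges=LEAD_RANGES) -> bool:
    return any(lo <= b <= hi for lo, hi in ranges)


def truncate_dbcs(data: bytes, limit: int, ranges=LEAD_RANGES) -> bytes:
    """Cut `data` to at most `limit` bytes without splitting a 2-byte character."""
    i = 0
    while i < len(data):
        width = 2 if is_lead_byte(data[i], ranges) else 1
        if i + width > limit:
            break
        i += width
    return data[:i]


name = "ab\u65e5\u672c".encode("shift_jis")   # "ab" + two kanji, 6 bytes
print(truncate_dbcs(name, 3))   # stops before the first kanji's lead byte
```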

Does anyone know more about that?

As to Kermit dealing with ISO 8859, I've done that between IBM PC
and CMS,  and it may be interesting to explain how.  Both the CMS
(and it could be TSO) through the 7171 and the IBM PC act as  ISO
8859 host and terminal respectively,  because I assume every byte
that  travels on the communication line is (at least supposed  to
be)  coded  in ISO.  Which version is irrelevant if I'm right  in
saying  all versions translate the same between ANSI/ISO  and  an
IBM mainframe code page. The IBM world is the worst case, because
code  pages  for  a single ISO version are multiple on  the  same
machine.  Working in a different ISO version would just
involve a code page switch in terminal mode and when having DOS
process the data.

- The  7171  translate tables have been set up to  translate  the
host  code  page to/from ISO 8859/x.  Which code page for the  /x
version  is used (037v2 or 500) is selected by the answer to  the
terminal type request.
- CMS  Kermit  translate  tables have  been  modified  to  extend
ASCII/EBCDIC  translation  to ISO/CECP 037v2 to minimize  dynamic
redefinition.  E.g., selecting CECP 500 now takes a handful of SETs.
Thanks to John Chandler for a versatile file transfer translation
support.
- The program on the micro translates transferred text files from
the line's ISO code to/from a user's selectable one (437,  850 or
ISO itself which means no translation). This is super easy to add
to any Kermit (just the user interface causes problems).
- It does the same for terminal mode.  Easy too: SI/SO + a simple
translation some already do.
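The micro side of this arrangement might look like the following Python sketch. The user choices and function names are hypothetical, and Python's `cp437`/`cp850` codecs stand in for IBM's tables; unlike IBM's full 256-point tables, these codecs cannot carry every character, so this sketch substitutes `?` for anything without an equivalent.

```python
LINE_CODE = "latin-1"          # ISO 8859-1 on the communication line

# Hypothetical user-selectable local codes; None means "ISO itself".
LOCAL_CODES = {"437": "cp437", "850": "cp850", "iso": None}


def to_line(data: bytes, choice: str) -> bytes:
    """Local file bytes -> line code, before sending."""
    local = LOCAL_CODES[choice]
    if local is None:
        return data            # null translation, compatible with any Kermit
    return data.decode(local).encode(LINE_CODE, errors="replace")


def from_line(data: bytes, choice: str) -> bytes:
    """Line code -> local file bytes, after receiving."""
    local = LOCAL_CODES[choice]
    if local is None:
        return data
    return data.decode(LINE_CODE).encode(local, errors="replace")


print(to_line(b"caf\x82", "850"))     # PC e-acute (82) leaves as ISO E9
print(from_line(b"caf\xe9", "437"))   # and lands as 82 on a 437 machine
```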

This  is  in line with the idea I once developed  on  the  Kermit
lists  that  using  ISO  as  the  inter-systems  vehicle   really
simplifies the handling and user understanding of the various IBM
or  other's codes (each system deals with its own(s)).  In  fact,
I've gone a step beyond that.  In addition to the one for file
transfer,  the translations made in the program on the micro  are
made  at  the  keyboard  and screen  interfaces.  This  means  it
processes the ISO code in memory (but it could be any) and  never
does  translation of the line code.  The internal encoding of the
program's  messages is ISO;  this makes them independent  of  the
code page the system uses.  Two translations are made, one for
menu mode and the other for terminal mode.  The one for menu mode
is the translation between ISO and the system's code page at
startup.  The one for terminal mode is the user's choice and may
also imply a code page the system is asked to switch to each
time the user enters terminal mode.

Again, there are no restrictions on which code can be used.  New
ones can be added to the program by configuration,  including the
null-translation-throughout  so  that it remains compatible  with
any Kermit implementation.

I hope this will help.

André.

16-Jan-89 20:46:31-GMT,1906;000000000001
Received: from CUVMB.CC.COLUMBIA.EDU by cunixc.cc.columbia.edu (5.54/5.10) id AA08283; Mon, 16 Jan 89 15:46:28 EST
Received: from CUVMB.CC.COLUMBIA.EDU(MAILER) by CUVMB.CC.COLUMBIA.EDU(SMTP) ; Mon, 16 Jan 89 15:45:33 EDT
Received: from BITNIC.BITNET by CUVMB.CC.COLUMBIA.EDU (Mailer X1.25) with BSMTP
 id 4742; Mon, 16 Jan 89 15:45:31 EDT
Received: by BITNIC (Mailer X1.25) id 5200; Mon, 16 Jan 89 15:47:36 EST
Date:         Mon, 16 Jan 89 10:49:33 EST
Reply-To: ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@cuvmb.cc.columbia.edu>
Sender: ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@cuvmb.cc.columbia.edu>
From: Edwin Hart <HART%APLVM.BITNET@cuvmb.cc.columbia.edu>
Subject:      Re: ISO8859 vs Kermit
To: Frank da Cruz <SY.FDC@cunixc.cc.columbia.edu>
In-Reply-To:  Kermit and ISO 8859

For character set switching, see the ISO 2022 standard.  The ANSI X3.134.1
standard, 8-bit ASCII Structure and Rules appears to have the information
you will need.  ANSI X3.134.2 is the U.S. equivalent of ISO 8859-1.  However,
as I read it, it allows the 8859-1 characters to exist in the 7-bit world.
This may be of interest to you.  You should also read about IBM Country
Extended Code Pages (9 of them) which have the same character set as ISO
8859-1, and PC Multilingual Code Page 850.  (See SHARE 69 Proceedings, pp.
19-28, August, 1987.)
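For the mechanism Frank asked about, here is a much simplified Python sketch of ISO 2022 shifting on a 7-bit line. It assumes only two designations: ESC 2/13 F designates a 96-character set as G1, with F = 'A' (registration 100) for the right half of ISO 8859-1 and F = 'B' (registration 101) for ISO 8859-2; SO (0E) invokes G1, and SI (0F) returns to G0 (ASCII). It ignores G2/G3, single shifts and 8-bit operation, and leaves SPACE and DEL alone; the function name and the initial G1 choice are the sketch's own.

```python
ESC, SO, SI = 0x1B, 0x0E, 0x0F

# Final byte F of "ESC - F" (designate a 96-set to G1) -> Python codec.
# Only these two are assumed here.
G1_SETS = {ord("A"): "iso8859_1", ord("B"): "iso8859_2"}


def decode_7bit(stream: bytes) -> str:
    g1 = "iso8859_1"          # assumed initial G1 designation
    shifted = False           # True between SO and SI
    out = []
    i = 0
    while i < len(stream):
        b = stream[i]
        if b == ESC and i + 2 < len(stream) and stream[i + 1] == ord("-"):
            g1 = G1_SETS[stream[i + 2]]     # designate a new G1 set
            i += 3
            continue
        if b == SO:
            shifted = True
        elif b == SI:
            shifted = False
        elif shifted and 0x21 <= b <= 0x7E:
            # G1 invoked into GL: 7-bit byte b stands for 8-bit byte b + 0x80
            out.append(bytes([b | 0x80]).decode(g1))
        else:
            out.append(chr(b))              # G0 (ASCII), controls, SPACE, DEL
        i += 1
    return "".join(out)


print(decode_7bit(b"caf\x0ei\x0f"))      # 'i' shifted is E9, e-acute in 8859-1
print(decode_7bit(b"\x1b-B\x0e1\x0f"))   # '1' shifted is B1, a-ogonek in 8859-2
```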

With respect to ISO 8859-2, and the corresponding IBM Code Page, the translate
table for these two is DIFFERENT from the one for 8859-1 to CECPs.  I do not
know about the other 8859 and IBM code page translate tables.

If you have an IBM APA printer and SCRIPT, I can send you code tables for
ISO 8859-1, CECP 37 v1 (Data Processing Code Page), and CECP 500 v1 (Office
Systems Code Page).  The code tables print correctly except for about 5-10
characters.

ISO standards are available from ANSI which is right in New York City.

Ed Hart

24-Jan-89 12:15:21-GMT,3335;000000000001
Received: from CUVMB.CC.COLUMBIA.EDU by cunixc.cc.columbia.edu (5.54/5.10) id AA29846; Tue, 24 Jan 89 07:15:09 EST
Received: from CUVMB.CC.COLUMBIA.EDU(MAILER) by CUVMB.CC.COLUMBIA.EDU(SMTP) ; Tue, 24 Jan 89 07:13:04 EDT
Received: from BITNIC.BITNET by CUVMB.CC.COLUMBIA.EDU (Mailer X1.25) with BSMTP
 id 5066; Tue, 24 Jan 89 07:13:02 EDT
Received: by BITNIC (Mailer X1.25) id 3455; Tue, 24 Jan 89 07:14:59 EST
Date:         Tue, 24 Jan 89 12:45:00 CET
Reply-To: ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@cuvmb.cc.columbia.edu>
Sender: ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@cuvmb.cc.columbia.edu>
From: Johan van Wingen <MOSGLA%HLERUL2.BITNET@cuvmb.cc.columbia.edu>
Subject:      Parts of ISO 8859
To: Frank da Cruz <SY.FDC@cunixc.cc.columbia.edu>


Dear list subscribers
Here is the information that was wanted about ISO 8859.
  ISO 8859               8-bit single byte coded graphic character sets
  ISO 8859/1  1987-02-15  Latin alphabet no. 1
  ISO 8859/2  1987-02-15  Latin alphabet no. 2
  ISO 8859/3  1988-04-15  Latin alphabet no. 3
  ISO 8859/4  1988-04-15  Latin alphabet no. 4
  DIS 8859/5 (1988-03-15) Latin/Cyrillic alphabet
  ISO 8859/6  1988-08-15  Latin/Arabic alphabet
  ISO 8859/7  1987-11-15  Latin/Greek alphabet
  ISO 8859/8  1988-06-15  Latin/Hebrew alphabet
  DIS 8859/9 (1989-02-15) Latin alphabet no. 5
  ISO 9036    1987-04-15  Arabic 7-bit coded character set
                      for information interchange
  (date for a DIS means "voting terminates on:".)

There is a list of languages covered by each of the 9 parts, under
"Field of application". This includes:
for Part 1:
Spanish, Portuguese, Italian, French, English, Irish, German, Dutch,
Danish, Faeroese, Icelandic, Norwegian, Swedish, Finnish.
for Part 2:
English, German,
Czech, Slovak, Hungarian, Polish, Rumanian, Serbocroatian, Slovene,
Albanian.
for Part 3:
Spanish, Italian, French, English, German, Dutch,
Afrikaans, Catalan, Maltese, Turkish, Esperanto.
for Part 4:
English, German,
Danish, Greenlandic, Norwegian, Swedish, Finnish, Lappish,
Estonian, Latvian, Lithuanian.
for Part 5:
English,
Russian, Byelorussian, Ukrainian, Bulgarian, Serbocroatian, Macedonian.
for Part 9:
(as Part 1, but with Turkish instead of Icelandic)

Annex A gives: "The coded character set of this part of ISO 8859 contains
graphic characters used in at least the following countries:".
This includes:
for Part 1: all countries of North, South and Middle America, Australia,
New Zealand, Spain, Portugal, Italy, France, United Kingdom, Ireland,
Switzerland, Liechtenstein, Austria, Germany,
Belgium, The Netherlands, Luxemburg,
Denmark, Faroe Islands, Iceland, Norway, Sweden, Finland.
for Part 2:  Switzerland, Austria, Germany,
Czechoslovakia, Hungary, Poland, Romania, Yugoslavia, Albania.

Parts 1, 2, 3, 4 and 9 include MULTIPLY and DIVIDE, always at the same
code positions. Parts 5, 6, 7 and 8 do not have them at those positions.
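This is easy to check against Python's `iso8859_N` codecs, assuming they reflect the published parts (possibly revised since these dates): in those tables the multiply and divide signs sit at D7 and F7 in Parts 1-4 and 9, while Parts 5-7 lack them and Part 8 puts them elsewhere. The `position` helper is hypothetical.

```python
def position(ch: str, part: int):
    """Code of `ch` in ISO 8859-`part`, or None if the part lacks it."""
    try:
        return ch.encode(f"iso8859_{part}")[0]
    except UnicodeEncodeError:
        return None


for part in range(1, 10):
    times, divide = position("\u00d7", part), position("\u00f7", part)
    print(part, [None if p is None else hex(p) for p in (times, divide)])
```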

  Correspondence between ISO and ECMA standards
    ISO    ECMA    Registration number of escape sequence (ISO 2375)
   8859/1    94    100
   8859/2    94    101
   8859/3    94    109
   8859/4    94    110
   8859/5   113    111
   8859/6   114    127
   8859/7   118    126
   8859/8   121    138
   8859/9   128    148

FROM  J. W. van Wingen    MOSGLA@HLERUL2
Mail to
P. O. Box 486,  2300AL Leiden, Netherlands

 1-Feb-89 17:19:02-GMT,1904;000000000001
Received: from CUVMB.CC.COLUMBIA.EDU by cunixc.cc.columbia.edu (5.54/5.10) id AA29458; Wed, 1 Feb 89 12:18:28 EST
Received: from CUVMB.CC.COLUMBIA.EDU(MAILER) by CUVMB.CC.COLUMBIA.EDU(SMTP) ; Wed, 01 Feb 89 12:13:15 EDT
Received: from BITNIC.BITNET by CUVMB.CC.COLUMBIA.EDU (Mailer X1.25) with BSMTP
 id 6487; Wed, 01 Feb 89 09:33:07 EDT
Received: by BITNIC (Mailer X1.25) id 5236; Wed, 01 Feb 89 10:30:55 EST
Date:         Wed, 1 Feb 89 14:53:00 CET
Reply-To: ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@cuvmb.cc.columbia.edu>
Sender: ASCII/EBCDIC character set related issues <ISO8859%JHUVM.BITNET@cuvmb.cc.columbia.edu>
From: Johan van Wingen <MOSGLA%HLERUL2.BITNET@cuvmb.cc.columbia.edu>
Subject:      ISO 8859-5
To: Frank da Cruz <SY.FDC@cunixc.cc.columbia.edu>


Dear list subscribers
I just received ECMA Memento 1989. It includes a list of ECMA Standards,
with the remark: "Free copies of all documents listed below are
available upon request."  They are mostly identical to those of ISO.
The address is ECMA Headquarters, Rue du Rhone 114, CH-1204 GENEVA,
Switzerland, (Telex 222.88, after 1989-06-14 41.32.37). The document
numbers are in my previous mailing.
As for Cyrillic (8859-5), the code is NEW (from the USSR). Columns 11-12
now contain the 32 capitals and columns 13-14 the 32 small letters, in the CORRECT
alphabetic order. Column 10 contains the capitals of Jugocyrillic etc.,
and col. 15 the small ones. In 10 there is NBSP, E-trema, Dj, G-acc,
Ukr. E, Maced. S, I, I-trema, J, Lj, Nj, H-barred, K-acc, SHY, U-short,
Dz. In 15 there is "No" at 15/00 and "SS" (paragraph sign) at 15/13.
Note that the Jugoslav Nat. Standard is different, conforming to
the alphabetic order of the Latin transliteration, (just like the old
GOST).  DIS 8859-5.2 contains several mistakes in the letter names.
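The layout can be checked against Python's `iso8859_5` codec, assuming it follows the final standard rather than the DIS: columns 11-12 (B0-CF) hold the capitals in plain alphabetical order, columns 13-14 (D0-EF) the small letters, and column 15 holds the numero sign at 15/00 and the section sign at 15/13.

```python
table = bytes(range(0xA0, 0x100)).decode("iso8859_5")   # columns 10-15

capitals = table[0x10:0x30]      # columns 11-12 (B0-CF)
small = table[0x30:0x50]         # columns 13-14 (D0-EF)
print(capitals)
print(small)
print(table[0x50], table[0x5D])  # 15/00 and 15/13
```

Because each run follows the alphabet, a capital and its small letter are always exactly 20 (hex) apart, which the older GOST-style orders did not guarantee.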

FROM  J. W. van Wingen    MOSGLA@HLERUL2
Mail to
P. O. Box 486,  2300AL Leiden, Netherlands

10-Feb-89 14:29:42-GMT,1391;000000000001
Received: from CUVMB.CC.COLUMBIA.EDU by cunixc.cc.columbia.edu (5.54/5.10) id AA21089; Fri, 10 Feb 89 09:29:37 EST
Received: from CUVMB.CC.COLUMBIA.EDU by CUVMB.CC.COLUMBIA.EDU (IBM VM SMTP R1.2) with BSMTP id 4820; Fri, 10 Feb 89 09:27:09 EST
Received: from BITNIC.BITNET by CUVMB.CC.COLUMBIA.EDU (Mailer X1.25) with BSMTP
 id 9342; Fri, 10 Feb 89 09:27:08 EST
Received: by BITNIC (Mailer X1.25) id 8149; Fri, 10 Feb 89 10:28:02 EST
Date:         Fri, 10 Feb 89 15:03:00 CET
Reply-To: ASCII/EBCDIC character set related issues <ISO8859%JHUVM@cuvmb.cc.columbia.edu>
Sender: ASCII/EBCDIC character set related issues <ISO8859%JHUVM@cuvmb.cc.columbia.edu>
From: Johan van Wingen <MOSGLA%HLERUL2@cuvmb.cc.columbia.edu>
Subject:      ISO10646
To: Frank da Cruz <SY.FDC@cunixc.cc.columbia.edu>


Dear list subscribers
First Draft Proposal DP 10646, Multiple octet coded character set (SC2 N
1987) arrived last Monday. It is 140 pages. The voting period ends
1989-05-30. It is under the care of ISO/IEC JTC1/SC2/WG2. It will have
considerable influence on coding in the next decade. To give you an
impression of what it is about, I'll mail a copy of an Informal
Introduction to it (3 pages).  Be warned: it contains box characters
conforming to GT12 (or even TN?). All the letters, however, are orthodox.

FROM  J. W. van Wingen    MOSGLA@HLERUL2
Mail to
P. O. Box 486,  2300AL Leiden, Netherlands

10-Feb-89 15:02:53-GMT,11012;000000000001
Received: from CUVMB.CC.COLUMBIA.EDU by cunixc.cc.columbia.edu (5.54/5.10) id AA23517; Fri, 10 Feb 89 10:02:45 EST
Received: from CUVMB.CC.COLUMBIA.EDU by CUVMB.CC.COLUMBIA.EDU (IBM VM SMTP R1.2) with BSMTP id 4834; Fri, 10 Feb 89 10:00:16 EST
Received: from BITNIC.BITNET by CUVMB.CC.COLUMBIA.EDU (Mailer X1.25) with BSMTP
 id 9387; Fri, 10 Feb 89 10:00:15 EST
Received: by BITNIC (Mailer X1.25) id 8442; Fri, 10 Feb 89 10:56:02 EST
Date:         Fri, 10 Feb 89 15:27:00 CET
Reply-To: ASCII/EBCDIC character set related issues <ISO8859%JHUVM@cuvmb.cc.columbia.edu>
Sender: ASCII/EBCDIC character set related issues <ISO8859%JHUVM@cuvmb.cc.columbia.edu>
From: Johan van Wingen <MOSGLA%HLERUL2@cuvmb.cc.columbia.edu>
Subject:      Informal Introduction to ISO 10646
To: Frank da Cruz <SY.FDC@cunixc.cc.columbia.edu>



  INTERNATIONAL ORGANIZATION FOR STANDARDIZATION    ISO/IEC JTC1/SC2/WG2
  INTERNATIONAL ELECTROTECHNICAL COMMISSION                       N 274

  Joint Technical Committee 1
  Subcommittee 2 Characters and Information Coding, Working Group 2





  ======================================================================
  Introduction to ISO 10646 - Multiple-Octet Coded Character Set
  ======================================================================

  A new standard is being developed within Working Group 2 of ISO/IEC
  JTC1/SC2 for the multiple-octet coded character set. Formal drafts
  will be issued during 1989.

  Its purpose is to provide a single character code which will permit
  the written form of all present-day languages throughout the world to
  be used within computers, to be processed and interchanged. All types
  of text written in character form will be provided for, from simple
  commercial documents to publication of technical reports etc. Also the
  bibliographic requirements of librarians will be met.

  The structure of the whole code may be illustrated thus, with an octet
  of bits for each dimension:



                                           +-------------------+
                                      +-------------------+    |
                                 +-------------------+    |    |
                            +-------------------+    |    |    |
     Plane             +-------------------+    |    |    |    |
    /             +-------------------+    |    |    |    |    |
   /         +-------------------+    |    |    |    |    |    |
  +-->  +-------------------+    |    |    |    |    |    |    |
  |Cell |                   |    |    |    |    |    |    |    |
  |     |  +------+  +------+    |    |    |    |    |    |    |
  V     |  |  A00 |  |  A01 |    |    |    |    |    |    |    |
  Row   |  +------+  +------+    |    |    |    |    |    |    |
        |  |      |  |      |    |    |    |    |    |    |    |
        |  |  J1  |  |  --  |    |    |    |    |    |    |    |
        |  |      |  |      |    |    |    |    |    |    |    |
        |  +------+  +------+    |    |    |    |    |    |    |
        |                   |    |    |    |    |    |    |----+
        |  +------+  +------+    |    |    |    |    |----+
        |  |  A10 |  |  A11 |    |    |    |    |----+ (future
        |  +------+  +------+    |    |    |----+   standardization)
        |  |      |  |      |    |    |----+ (Korean)
        |  |  C1  |  |  K1  |    |----+ (Japanese)
        |  |      |  |      |----+ (Chinese)
        +--+------+--+------+ (bibliographic)

    Basic multi-lingual plane                  Supplementary planes




  The basic multi-lingual plane will contain four segments for graphic
  characters, each holding 96 * 96 characters.

  Each segment will be divided into two zones: an alphabetic zone of
  16 * 96 characters, and another zone either for the most-frequently
  used characters of the Chinese, Japanese and Korean ideographic
  scripts, or for certain special purposes.

  The shaded area outside the graphic quadrants will be used for control
  functions. All those of ISO 6429, ISO 6937 and ISO 8613 will be
  available, with the same coding.

  The supplementary planes will accommodate characters that overflow from
  the basic multi-lingual plane.
  A coded character anywhere in the code may be uniquely identified by
  means of three octets:

   m-s  +--------------+--------------+--------------+  l-s
        | Plane-octet  | Row-octet    | Cell-octet   |
        +--------------+--------------+--------------+
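Packing the three octets into one number, plane octet most significant, can be sketched as follows. This is an illustration only, assuming nothing beyond the three-octet layout above; the `CodePosition` class is hypothetical, and the plane value in the example (032) is an arbitrary choice, since this introduction does not say which plane number the basic multi-lingual plane receives.

```python
from typing import NamedTuple


class CodePosition(NamedTuple):
    """Plane, row and cell octets of one character (hypothetical helper)."""
    plane: int
    row: int
    cell: int

    def to_int(self) -> int:
        """Plane octet most significant, cell octet least significant."""
        for octet in self:
            if not 0 <= octet <= 255:
                raise ValueError(f"octet out of range: {octet}")
        return (self.plane << 16) | (self.row << 8) | self.cell

    @classmethod
    def from_int(cls, value: int) -> "CodePosition":
        if not 0 <= value <= 0xFFFFFF:
            raise ValueError("value does not fit in three octets")
        return cls((value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF)

    def __str__(self) -> str:
        """Three-digit decimal per octet, the notation used in this draft."""
        return " ".join(f"{octet:03d}" for octet in self)


pos = CodePosition(plane=32, row=40, cell=65)      # arbitrary example position
print(pos)                                         # prints 032 040 065
print(f"{pos.to_int():06X}")                       # prints 202841
print(CodePosition.from_int(pos.to_int()) == pos)  # prints True
```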

    NOTE: Sequences of characters run horizontally along the rows, not
          vertically as in previous code tables.

  The code may be used in different forms-of-use:

    a) A four-octet form, in which the three octets for the character
       are preceded by one for systems use. Three-octet coding will
       never be used.

    b) A two-octet form, restricted exclusively to a single plane.
       Especially for users with alphabetic scripts, this will
       accommodate probably 99% of their applications.

    c) A two-octet form with extension using occasional four-octets.

    d) A compacted form, permitting strings of related characters to be
       used as single-octets.
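Forms a) and b) can be sketched in Python as below. The value of the systems-use octet is not specified here, so the sketch assumes 0; the function names and sample positions are hypothetical. Forms c) and d) are omitted because their mechanisms are not described. The last function illustrates the fixed-length property noted later in this introduction: in the two-octet form, character n always starts at octet 2n, with no shift state to track.

```python
from typing import Iterable, Tuple

Position = Tuple[int, int, int]  # (plane, row, cell), each 0..255


def encode_four_octet(chars: Iterable[Position], system_octet: int = 0) -> bytes:
    """Form a): a systems-use octet, then plane, row and cell octets."""
    out = bytearray()
    for plane, row, cell in chars:
        out += bytes([system_octet, plane, row, cell])
    return bytes(out)


def encode_two_octet(chars: Iterable[Position], plane: int) -> bytes:
    """Form b): row and cell only; every character must be in `plane`."""
    out = bytearray()
    for p, row, cell in chars:
        if p != plane:
            raise ValueError(f"plane {p:03d} is not the chosen plane {plane:03d}")
        out += bytes([row, cell])
    return bytes(out)


def nth_char_two_octet(data: bytes, n: int) -> Tuple[int, int]:
    """Fixed length: character n is always at octets 2n and 2n+1."""
    return data[2 * n], data[2 * n + 1]


text = [(32, 32, 72), (32, 32, 105)]   # hypothetical positions in plane 032
print(encode_four_octet(text).hex())   # prints 0020204800202069
two = encode_two_octet(text, plane=32)
print(two.hex())                       # prints 20482069
print(nth_char_two_octet(two, 1))      # prints (32, 105)
```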

  The basic multi-lingual plane is being designed to permit easy
  inter-working with existing 8-bit codes. Generally, conversion will be
  by the table look-up technique; however, conversion with ISO 8859
  parts 1,2,5,6,7,8 may use a simple algorithm.
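The "simple algorithm" is not spelled out here. Assuming, as FIGURE 1 suggests, that the ISO 8859-1 graphic characters sit in row 032 with the cell octet equal to the 8859-1 octet value, conversion reduces to the Python sketch below. Both the row assignment and the function names are assumptions for illustration; control characters (000-031, 127-159) are rejected because they are coded outside the graphic quadrants.

```python
from typing import List, Tuple

LATIN1_ROW = 32  # ASSUMPTION: row 032 holds ISO 8859-1, cell = octet value


def is_graphic(octet: int) -> bool:
    """Graphic cell ranges shown in FIGURE 1: 032-126 and 160-255."""
    return 32 <= octet <= 126 or 160 <= octet <= 255


def latin1_to_row_cell(data: bytes) -> List[Tuple[int, int]]:
    """Hypothetical algorithmic conversion: octet b -> (row 032, cell b)."""
    pairs = []
    for octet in data:
        if not is_graphic(octet):
            raise ValueError(f"octet {octet:03d} is not a graphic character")
        pairs.append((LATIN1_ROW, octet))
    return pairs


def row_cell_to_latin1(pairs: List[Tuple[int, int]]) -> bytes:
    """Inverse conversion; anything outside row 032 has no 8859-1 octet."""
    out = bytearray()
    for row, cell in pairs:
        if row != LATIN1_ROW or not is_graphic(cell):
            raise ValueError(f"row {row:03d} cell {cell:03d} is not in ISO 8859-1")
        out.append(cell)
    return bytes(out)


data = "Stra\u00dfe".encode("latin-1")    # German word with sharp s (octet 223)
pairs = latin1_to_row_cell(data)
print(pairs[4])                           # prints (32, 223)
print(row_cell_to_latin1(pairs) == data)  # prints True
```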

  All designation, invocation and shifting as in ISO 2022 will be
  avoided.

  It is considered that the consequent simplification of software,
  especially for generalized applications in the OSI environment, will
  make this code economically attractive despite the relatively
  extravagant use of bits.

  The layout of the basic multi-lingual plane may be illustrated as in
  FIGURE 1 (below); the axes are not drawn to scale.

    NOTE: The value of any octet is shown in simple decimal notation,
          e.g.  032, 255.

  The contents of any of the rows are set out in detailed code tables.
  These are drawn on a pro-forma which shows a complete row in twelve
  strips, each of 16 graphic characters.

  Because the code is designed to be used as a whole, especially the
  basic multi-lingual plane, no significance attaches to whether certain
  characters are in the left-hand or right-hand halves of a row, or
  early or late in the code table.

  A character once included in the code table is not duplicated
  elsewhere. Therefore for any particular application characters will
  be taken from many different places in the code table. For example
  users within Greece will find Greek letters in row 040, the equivalent
  Latin letters they use for transliteration in row 032, and some
  symbols they use in row 034.

  It will be trivially easy to adapt any equipment designed for the
  Japanese or Chinese scripts to provide all the characters of the basic
  multi-lingual plane. Therefore it is expected that suitable
  cost-effective equipment will become readily available.

  The feature of fixed length coding, especially in the two-octet
  mode-of-use, will make this code very easy to use in high-level
  programming languages and other software as employed for OSI and ODA.


  Hugh McG Ross, editor.                        Revised  Oct.  1988





  FIGURE 1    ISO 10646  Structure of the basic multi-lingual plane


        /   /                      /  /                       /
  Row. /000/032   Cell-octet   126/  /160                 255/
  oct.+-----------------------------------------------------+
   000|                                                     |
      |  +-----------------------+  +-----------------------+
   032|  |   Latin script for    |  | European languages    | \
   033|  |   ISO 8859-1 and -2   |  | and ISO 6937-2        |  \
      |  +-----------------------+  +-----------------------+   \
   034|  |   Extended symbols    |  | from ISO 8879         |    \
      |  +-----------------------+  +-----------------------+     \
   035|  |   Extended Latin      |  | script for            |      \
      |  |     all world         |  | languages             |       \
      |  +-----------------------+  +-----------------------+        \
   037|  |   Special African and |  | phonetic letters      |         \
      |  +-----------------------+  +-----------------------+ Alphabetic
   038|  |   Cyrillic script for |  | major languages       |
      |  |     Cyrillic for all  |  | minority languages    |   scripts
      |  +-----------------------+  +-----------------------+          /
   040|  |       Greek script    |  | for all               |         /
      |  |          forms of     |  |  writing              |        /
      |  +-----------------------+  +-----------------------+       /
   042|  |   Arabic script for   |  | all languages         |      /
      |  +-----------------------+  +-----------------------+     /
   043|  |            Hebrew     |  | script                |    /
      |  +-----------------------+  +-----------------------+   /
   044|  |             Other     |  | scripts               |  /
      |  |                       |  |                       | /
      |  +-----------------------+  +-----------------------+
   048|  |     Japanese          |  |    Special Purpose    | Ideographs
      |  |     JIS X 0208        |  |                       |
   126|  |                       |  |                       |
      |  +-----------------------+  +-----------------------+
      |                                                     |
      |  +-----------------------+  +-----------------------+ \
   160|  |                       |  |                       |  \
      |  |             Indian    |  | scripts               |   \
      |  |                       |  |                       |
      |  +-----------------------+  +-----------------------+ Alphabetic
      |  |         Mathematical  |  | symbols               |   /
      |  +-----------------------+  +-----------------------+  /
      |  |           Oriental    |  | scripts               | /
      |  +-----------------------+  +-----------------------+
   176|  |      Chinese          |  |    Korean             | Ideographs
      |  |      GB 2312          |  |   KS C 5601           |
   255|  |                       |  |                       |
      +--+-----------------------+--+-----------------------+

15-Feb-89 13:29:51-GMT,1151;000000000001
Received: from CUVMB.CC.COLUMBIA.EDU by cunixc.cc.columbia.edu (5.54/5.10) id AA01608; Wed, 15 Feb 89 08:29:47 EST
Received: from CUVMB.CC.COLUMBIA.EDU by CUVMB.CC.COLUMBIA.EDU (IBM VM SMTP R1.2) with BSMTP id 6673; Wed, 15 Feb 89 08:29:42 EST
Received: from BITNIC.BITNET by CUVMB.CC.COLUMBIA.EDU (Mailer X1.25) with BSMTP
 id 6388; Wed, 15 Feb 89 08:29:41 EST
Received: by BITNIC (Mailer X1.25) id 0664; Wed, 15 Feb 89 08:30:46 EST
Date:         Wed, 15 Feb 89 08:25:54 EST
Reply-To: ASCII/EBCDIC character set related issues <ISO8859%JHUVM@cuvmb.cc.columbia.edu>
Sender: ASCII/EBCDIC character set related issues <ISO8859%JHUVM@cuvmb.cc.columbia.edu>
From: Edwin Hart <HART%APLVM@cuvmb.cc.columbia.edu>
Subject:      Re: Requirements Feedback/Agreements and Disagreements
To: Frank da Cruz <SY.FDC@cunixc.cc.columbia.edu>
In-Reply-To:  Translations, AECS Requirement 6

Inherent in ISO 8859-1 and the Country Extended Code Pages (EBCDIC) is a
one-to-one mapping for the characters.  We require that the one-to-one
relation be extended to control characters.  This will allow "round-trip"
integrity for all data.  See AECS Requirement 6.
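Round-trip integrity can be checked mechanically: a translation is one-to-one when its 256-entry table is a permutation of the octets 00-FF. A Python sketch, assuming Python's `cp037` codec as the EBCDIC side; its control-character assignments come from a published IBM037 mapping table and are an assumption here, not the definition used by AECS Requirement 6.

```python
def build_table(src: str, dst: str) -> bytes:
    """256-entry table: octet in code `src` -> octet of the same character in `dst`."""
    chars = bytes(range(256)).decode(src)  # one character per octet
    table = chars.encode(dst)              # raises if `dst` lacks a character
    if len(table) != 256:
        raise ValueError("mapping is not octet-for-octet")
    return table


def is_one_to_one(table: bytes) -> bool:
    """True when every octet value appears exactly once, i.e. the table inverts."""
    return sorted(table) == list(range(256))


ebcdic_to_latin1 = build_table("cp037", "latin-1")
latin1_to_ebcdic = build_table("latin-1", "cp037")

sample = bytes(range(256))                 # every octet, controls included
round_trip = sample.translate(ebcdic_to_latin1).translate(latin1_to_ebcdic)
print(is_one_to_one(ebcdic_to_latin1))     # prints True
print(round_trip == sample)                # prints True
```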

Ed Hart

15-Feb-89 17:42:38-GMT,1684;000000000001
Received: from CUVMB.CC.COLUMBIA.EDU by cunixc.cc.columbia.edu (5.54/5.10) id AA22518; Wed, 15 Feb 89 12:42:27 EST
Received: from CUVMB.CC.COLUMBIA.EDU by CUVMB.CC.COLUMBIA.EDU (IBM VM SMTP R1.2) with BSMTP id 6801; Wed, 15 Feb 89 12:42:34 EST
Received: from BITNIC.BITNET by CUVMB.CC.COLUMBIA.EDU (Mailer X1.25) with BSMTP
 id 6915; Wed, 15 Feb 89 12:42:33 EST
Received: by BITNIC (Mailer X1.25) id 8074; Wed, 15 Feb 89 12:06:41 EST
Date:         Wed, 15 Feb 89 10:37:02 EST
Reply-To: ASCII/EBCDIC character set related issues <ISO8859%JHUVM@cuvmb.cc.columbia.edu>
Sender: ASCII/EBCDIC character set related issues <ISO8859%JHUVM@cuvmb.cc.columbia.edu>
From: Edwin Hart <HART%APLVM@cuvmb.cc.columbia.edu>
Subject:      Re: Requirements Feedback/Agreements and Disagreements
To: Frank da Cruz <SY.FDC@cunixc.cc.columbia.edu>
In-Reply-To:  Your message of Wed, 15 Feb 89 08:52:52 EST

>I would agree with rick's statement above that such translation be one-to-one
>reversible.  I would add another primary requirement: the code for any
>printable ASCII character be translated to a EBCDIC code that represents
>the same printable character.   These two requirements will mean that some
>printable EBCDIC characters are lost, but that is life!

The IBM Country Extended Code Pages (CECPs) and ISO 8859-1 share the same
character set.  In other words, if a character is in a CECP, it is in 8859-1
and vice versa.  Thus, for graphic characters (those which display),
a one-to-one mapping exists.  The pieces are already in place for your
requirement IF you move to the 8-bit ASCII world of ISO 8859-1 (which uses
ANSI X3.4-1986 (U.S. ASCII) as the left half of the code table).

Ed Hart

15-Feb-89 18:13:19-GMT,3094;000000000001
Received: from CUVMB.CC.COLUMBIA.EDU by cunixc.cc.columbia.edu (5.54/5.10) id AA24860; Wed, 15 Feb 89 13:12:57 EST
Received: from CUVMB.CC.COLUMBIA.EDU by CUVMB.CC.COLUMBIA.EDU (IBM VM SMTP R1.2) with BSMTP id 6815; Wed, 15 Feb 89 13:13:04 EST
Received: from BITNIC.BITNET by CUVMB.CC.COLUMBIA.EDU (Mailer X1.25) with BSMTP
 id 6991; Wed, 15 Feb 89 13:13:03 EST
Received: by BITNIC (Mailer X1.25) id 8118; Wed, 15 Feb 89 12:07:02 EST
Date:         Wed, 15 Feb 89 16:22:00 CET
Reply-To: ASCII/EBCDIC character set related issues <ISO8859%JHUVM@cuvmb.cc.columbia.edu>
Sender: ASCII/EBCDIC character set related issues <ISO8859%JHUVM@cuvmb.cc.columbia.edu>
From: Johan van Wingen <MOSGLA%HLERUL2@cuvmb.cc.columbia.edu>
Subject:      SHARE Requirements 2
To: Frank da Cruz <SY.FDC@cunixc.cc.columbia.edu>


Dear list subscribers
Here is my next installment.
First, some small typing errors:
P. 1  prividing --> providing
P. 1  Polyas --> Pólya
P. 2  coexistance --> coexistence
P. 4  Stardards --> Standards
Now the requirements:
R8: This contradicts what is said in R1:
" ........  End users should be concerned with using  applications,  not  how
" the  character  data  is encoded.   IBM must hide the way character data is
" encoded.  How the character data is coded must be invisible  to  end  users
" and  applications developers.  However, ...................................
R22: Such a thing may be included. It shall, however, only express an
INTENTION, not act as a barrier to interpreting data differently. Of
course, this facility cannot be meant only for use DURING THE MIGRATION.
The last paragraph is far too optimistic about the issues it reflects.
R23: This is too vague for me. It should say that there should be as few
borders as possible acting as code barriers. IBM should state clearly
that national CECPs are only a short-term approach, and that a unique
EBCDIC is what is aimed at, a compromise between CP037 and CP500.
If and when that is said, we can start discussing with IBM what should
be in it.
With ISO 8859 we have only the East-West and perhaps the North-South
code barrier, and if we succeed with the 254-character set, we will even
have eliminated the Iron Curtain. A good question: how sacrosanct are
cols 0-3 of EBCDIC? We may need them for the next conversion scheme.
R25: ISO 646 is quite dead now, and will only be kept for the CCITT
Telematic Services.
R27: At present a printer prints blanks for unprintables, which
I prefer over the proposed options.
R28: IBM will say: You are knocking at the wrong door. Nothing prevents
you from going to ANSI or their counterparts with these ideas.

One thing I missed is a position on multi-byte sets. Do not overlook
that IBM included support for it in TSO/ISPF and produced the 5550 for
the Japanese market. Are we willing to code our Latin letters with two
bytes instead of one, just for mixing more alphabets and scripts in
one document in the future? Xerox has it, but that will not become the
ISO Standard.

FROM  J. W. van Wingen    MOSGLA@HLERUL2
Mail to
P. O. Box 486,  2300AL Leiden, Netherlands

16-Feb-89 11:55:05-GMT,3191;000000000001
Received: from CUVMB.CC.COLUMBIA.EDU by cunixc.cc.columbia.edu (5.54/5.10) id AA05038; Thu, 16 Feb 89 06:54:58 EST
Received: from CUVMB.CC.COLUMBIA.EDU by CUVMB.CC.COLUMBIA.EDU (IBM VM SMTP R1.2) with BSMTP id 7166; Thu, 16 Feb 89 06:51:56 EST
Received: from BITNIC.BITNET by CUVMB.CC.COLUMBIA.EDU (Mailer X1.25) with BSMTP
 id 8203; Thu, 16 Feb 89 06:51:55 EST
Received: by BITNIC (Mailer X1.25) id 4088; Thu, 16 Feb 89 06:52:19 EST
Date:         Thu, 16 Feb 89 12:48:00 CET
Reply-To: ASCII/EBCDIC character set related issues <ISO8859%JHUVM@cuvmb.cc.columbia.edu>
Sender: ASCII/EBCDIC character set related issues <ISO8859%JHUVM@cuvmb.cc.columbia.edu>
From: Johan van Wingen <MOSGLA%HLERUL2@cuvmb.cc.columbia.edu>
Subject:      SHARE req. 3
To: Frank da Cruz <SY.FDC@cunixc.cc.columbia.edu>


Dear list subscribers
As a further installment I would like to discuss the use of the term
"character set". ASCII is often called thus, but in fact the code is
meant. There are two concepts, that of the set of characters, and that
of the way these are represented by bytes. The ISO term for the first
is "repertoire", (strictly speaking it is used only in ISO 6937, not in
ISO 8859). We may introduce that term into the EBCDIC world too. Thus
ISO 8859-1, CP037 and CP500 share the same repertoire, but have
different coding, as do the several CECP's for Western Europe. CP850
contains this repertoire as a subset, with again different coding.
The ASCII repertoire (7-bit) is a subset of all those in the 9 parts of
ISO 8859, always with the same coding. The repertoire of ISO 8859-2 is
identical to that of CP870 (as far as I know; can anybody tell me in
which IBM manual it is defined?), but not with the same coding. I hope
this will be helpful.
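The distinction can be made concrete with Python's codecs, assuming their `latin-1`, `cp037` and `cp500` tables are faithful to ISO 8859-1, CP037 and CP500: the three codes yield the same set of characters (the repertoire) but assign different octets to some of them (the coding). Control characters are included in this comparison; the function names are illustrative.

```python
CODES = ("latin-1", "cp037", "cp500")


def repertoire(codec: str) -> frozenset:
    """Set of characters a single-octet code can represent (all 256 octets)."""
    return frozenset(bytes(range(256)).decode(codec))


def coding(char: str) -> dict:
    """Octet assigned to `char` in each code, as a hex string."""
    return {codec: char.encode(codec).hex().upper() for codec in CODES}


same = repertoire("latin-1") == repertoire("cp037") == repertoire("cp500")
print(same)         # prints True
print(coding("["))  # prints {'latin-1': '5B', 'cp037': 'BA', 'cp500': '4A'}
print(coding("A"))  # prints {'latin-1': '41', 'cp037': 'C1', 'cp500': 'C1'}
```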

Just as a bonus I offer the following text in German (from Goethe's
Faust), which, I hope, I have correctly coded in CP037, (I am not going
to provide a translation). It may serve as a motto to our effort, for
it is an early description of a conversion algorithm, with appropriate
comments by the Devil.

  Die Hexe
  (mit großer Emphase fängt an, aus dem Buche zu deklamieren)
  Du mußt verstehn!
          Aus Eins mach Zehn,
          Und Zwei laß gehn,
          Und Drei mach gleich,
          So bist du reich.
          Verlier die Vier!
          Aus Fünf und Sechs,
          So sagt die Hex,
          Mach Sieben und Acht,
          So ist's vollbracht:
          Und Neun ist Eins,
          Und Zehn ist keins.
          Das ist das Hexen-Einmaleins!
  Faust.  Mich dünkt, die Alte spricht im Fieber.
  Mephistopheles.  Das ist noch lange nicht vorüber,
  Ich kenn es wohl, so klingt das ganze Buch;
  Ich habe manche Zeit damit verloren,
  Denn ein vollkommner Widerspruch
  Bleibt gleich geheimnisvoll für Kluge wie für Toren.
  Mein Freund, die Kunst ist alt und neu.
  Es war die Art zu allen Zeiten,
  Durch Drei und Eins, und Eins und Drei
  Irrtum statt Wahrheit zu verbreiten.
  So schwätzt und lehrt man ungestört;
  Wer will sich mit den Narrn befassen?
  Gewöhnlich glaubt der Mensch, wenn er nur Worte hört,
  Es müsse sich dabei doch auch was denken lassen.

  Goethe, Faust Teil I, 2540-2566

FROM  J. W. van Wingen    MOSGLA@HLERUL2
Mail to
P. O. Box 486,  2300AL Leiden, Netherlands

16-Feb-89 17:40:39-GMT,8033;000000000001
Received: from CUVMB.CC.COLUMBIA.EDU by cunixc.cc.columbia.edu (5.54/5.10) id AA27630; Thu, 16 Feb 89 12:40:31 EST
Received: from CUVMB.CC.COLUMBIA.EDU by CUVMB.CC.COLUMBIA.EDU (IBM VM SMTP R1.2) with BSMTP id 7322; Thu, 16 Feb 89 12:37:39 EST
Received: from BITNIC.BITNET by CUVMB.CC.COLUMBIA.EDU (Mailer X1.25) with BSMTP
 id 9192; Thu, 16 Feb 89 12:37:38 EST
Received: by BITNIC (Mailer X1.25) id 4482; Thu, 16 Feb 89 12:09:57 EST
Date:         Thu, 16 Feb 89 08:38:37 EST
Reply-To: ASCII/EBCDIC character set related issues <ISO8859%JHUVM@cuvmb.cc.columbia.edu>
Sender: ASCII/EBCDIC character set related issues <ISO8859%JHUVM@cuvmb.cc.columbia.edu>
From: Edwin Hart <HART%APLVM@cuvmb.cc.columbia.edu>
Subject:      Re: SHARE req. 3
To: Frank da Cruz <SY.FDC@cunixc.cc.columbia.edu>
In-Reply-To:  Code Page 500 versus 37:  Compromise Needed?

(This note started as a reply to Johan van Wingen's note with the Faust quote
in German.  But once I started writing it, I was writing about an area which
concerns me.  I believe that IBM will resolve the code page 37 versus 500
issue by supporting both of them for the long-term.  To me, the political
situation dictates that kind of solution from IBM.  I would be interested in
your thoughts.)

Although I cannot read German, I know the Faust text came through correctly.
The reason is that the code points (ISO code positions (hex values)) for
code page 37 and code page 500 for the alphabet, numbers, and most other
characters are exactly the same.  They only differ for 7 characters and
code points:

  Code Point        37 V1                        500 V1

     4A         US cent ¢                  Left Square Bracket [
     4F         Vertical Bar |             Exclamation Point !
     5A         Exclamation Point !        Right Square Bracket ]
     5F         Logical Not ¬              Circumflex ^
     B0         Circumflex ^               US cent ¢
     BA         Left Square Bracket [      Logical Not ¬
     BB         Right Square Bracket ]     Vertical Bar |

(The 37 V1 column uses CP 37 characters and
the 500 V1 column uses CP 500 characters (I hope!). Ed Hart)
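Because the two code pages differ only at these seven points, converting between them needs just a 256-entry table that is the identity everywhere else. A minimal Python sketch; the dictionaries restate the table above, and the names and sample string are illustrative:

```python
# The seven code points where CP 37 and CP 500 differ, as in the table above.
# "\u00a2" is the US cent sign, "\u00ac" the logical NOT sign.
CP37_CHARS = {0x4A: "\u00a2", 0x4F: "|", 0x5A: "!", 0x5F: "\u00ac",
              0xB0: "^", 0xBA: "[", 0xBB: "]"}
CP500_CHARS = {0x4A: "[", 0x4F: "!", 0x5A: "]", 0x5F: "^",
               0xB0: "\u00a2", 0xBA: "\u00ac", 0xBB: "|"}


def swap_table(src: dict, dst: dict) -> bytes:
    """Translate table: identity except where the two code pages differ."""
    point_in_dst = {char: point for point, char in dst.items()}
    table = bytearray(range(256))
    for point, char in src.items():
        table[point] = point_in_dst[char]
    return bytes(table)


CP37_TO_CP500 = swap_table(CP37_CHARS, CP500_CHARS)
CP500_TO_CP37 = swap_table(CP500_CHARS, CP37_CHARS)

source = "a[i] = !b | c;".encode("cp037")  # sample C-like text in CP 37
converted = source.translate(CP37_TO_CP500)
print(converted.decode("cp500"))                     # prints a[i] = !b | c;
print(converted.translate(CP500_TO_CP37) == source)  # prints True
```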

I am concerned IBM will not standardize on one EBCDIC code page.
In Europe, CP 500 seems to be firmly entrenched.  In the US, Canada,
and Portugal, CP 37 is entrenched.  Given this situation, I believe IBM will
respond by narrowing from 9 CECPs to two CECPs:  CP 37 v1 and CP 500 v1.
They will do this to maintain compatibility with data customers are
already using and to avoid offending customers who have recently converted
data to CP 37 or CP 500.  Then IBM will build systems to
automatically do the translations between CP 37 and 500 for mail, etc.

An alternative to standardizing on both CP 37 and CP 500 is for ALL OF US to
find a compromise code page somewhere between CP 37 and CP 500.  The compromise
must be something we can accept--because it cannot be perfect.  Before
suggesting anything, I want to raise the following issues:

1.  Mainframe and Midrange Programming Languages depend on the US code page(s).
    Since the US Standard EBCDIC does not define code points for brackets,
    many products use the TN/T11 print train standard code points for brackets:
    X'AD' and X'BD'.
2.  EBCDIC Code Point X'5F' should be reserved for the NOT function
    because it is ingrained in the IBM products.  However, the NOT
    character (¬) is not in every part of the ISO 8859 family (ISO 8859-2
    and 8859-5, for example, lack it).  Consequently, the NOT character
    should not be allowed in programming language syntax.  However, in
    EBCDIC, the compilers use code point X'5F' for NOT.  For ASCII
    terminals, it is fairly common to map the ASCII circumflex (^) into the
    EBCDIC NOT (¬).  (This may be the result of the ASCII-1968 standard,
    which allowed the ASCII X'5E' code point to have "stylized graphics".)
    If use of the NOT character (¬) is an issue to IBM, they should change
    the compilers to accept the code points for either the NOT or
    circumflex characters.
3.  EBCDIC Code Point X'4F' should be reserved for the vertical bar character
    (|) because it is ingrained in the IBM products.  This is another code
    point and character used in programming languages, for the OR function.
4.  Brackets in CP 37 or CP 500 do not match the code points generally
    used, X'AD' and X'BD'.  The Code Page 37 assignments for brackets are not
    widely used.  Code points for brackets affect the PASCAL and C
    programming languages.
    Therefore, regardless of the code selected (CP 37 or 500),
    both PASCAL and C compilers must be changed for new code points
    for brackets.
5.  The C language uses the exclamation point character (!).
    However, because of issue number 4, the C compiler must be changed for
    brackets.  If C must be changed for brackets, changing C for a new
    code point for the exclamation point is not unreasonable.
6.  To my knowledge, mainframes do not attach any syntactic significance to
    the US EBCDIC code points X'4A' (¢) or X'B0' (^).  Therefore, character
    assignments to these code points are not as critical as the others
    mentioned earlier.

Based on these issues, I would recommend a compromise code point assignment.
This recommendation uses code point assignments for characters from both CP 37
and CP 500.


The first two code points are the most critical assignments.
  X'5F' to circumflex (issue 2)  (cp 500)
  X'4F' to vertical bar (issue 3) (cp 37)

This assignment for brackets is a recommendation:
  X'4A' and X'5A' to left and right brackets (issue 4) (cp 500)
    The reasons for this choice are:
    1. The code points X'AD' and X'BD' are unavailable in CP 37 and CP 500,
       and I believe we should focus on fixing the differences between the
       two code pages rather than creating more differences.
    2. The Code Page 37 code points for brackets are not widely used.
    3. The X'4A' and X'5A' code points are in wide use in Europe
       in Country-specific EBCDIC code pages.
    4. The code points are next to each other in the code table.

Code points for the remaining characters may be defined by IBM.  I believe
that the assignments are not critical and therefore, we would waste time
discussing assignments.  If I am wrong, tell me.

    US cent (¢)
    Exclamation point (!)
    NOT (¬)
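The bookkeeping behind this recommendation can be checked mechanically: once X'4A', X'4F', X'5A' and X'5F' are fixed, three code points and three characters remain for IBM to pair up. A Python sketch, with the dictionaries transcribing the CP 37 column of the earlier table and the four proposed assignments (names are illustrative):

```python
# CP 37 characters at the seven disputed code points ("\u00a2" cent, "\u00ac" NOT).
CP37 = {0x4A: "\u00a2", 0x4F: "|", 0x5A: "!", 0x5F: "\u00ac",
        0xB0: "^", 0xBA: "[", 0xBB: "]"}

# The four assignments recommended above (issues 2, 3 and 4).
FIXED = {0x5F: "^", 0x4F: "|", 0x4A: "[", 0x5A: "]"}

open_points = sorted(set(CP37) - set(FIXED))                 # code points still free
homeless = sorted(set(CP37.values()) - set(FIXED.values()))  # characters still unplaced

print(", ".join(f"X'{p:02X}'" for p in open_points))  # prints X'B0', X'BA', X'BB'
print(" ".join(homeless))                             # prints ! ¢ ¬
```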


What are your thoughts?

  1.  Should we continue to request one EBCDIC code page selected from
      cp 37 or cp 500?
  2.  Should we request cp 37 v2 with brackets at code points X'AD' and
      X'BD'?
  3.  Should we pursue a technical compromise similar to this one to solve
      what I perceive as very serious political problems?  This assumes that
      one EBCDIC code page for ISO Latin alphabet number 1 is so critical
      that installations will be willing to convert to it, and those
      installations who have already converted to CP 37 or CP 500 would be
      willing to change again (They might be more willing if the character
      and code point changes had minimum effect on them; that is, they
      do not use the characters affected).
  4.  Should we be prepared to accept the idea that the political situation
      will dictate a technical solution of two code pages:  37 and 500?

Before you answer, please consider what kind of changes your installation will
REALLY be willing to make to obtain one EBCDIC code for ISO Latin alphabet
number one.  Are US and Canadian installations really willing to convert
their data, documents, and source programs to one EBCDIC code page if IBM
selects code page 500 as the long-term solution?  Are installations in Europe
who have recently converted to code page 500 willing to make another
conversion to code page 37 or to some compromise code page between code page
37 and 500?


Thank you for all of your comments to date.
Sincerely,

Ed Hart

16-Feb-89 22:59:43-GMT,2459;000000000001
Received: from CUVMB.CC.COLUMBIA.EDU by cunixc.cc.columbia.edu (5.54/5.10) id AA23485; Thu, 16 Feb 89 17:59:38 EST
Received: from CUVMB.CC.COLUMBIA.EDU by CUVMB.CC.COLUMBIA.EDU (IBM VM SMTP R1.2) with BSMTP id 7471; Thu, 16 Feb 89 17:56:51 EST
Received: from BITNIC.BITNET by CUVMB.CC.COLUMBIA.EDU (Mailer X1.25) with BSMTP
 id 9809; Thu, 16 Feb 89 17:56:50 EST
Received: by BITNIC (Mailer X1.25) id 5958; Thu, 16 Feb 89 17:57:49 EST
Date:         Thu, 16 Feb 89 14:25:40 EST
Reply-To: ASCII/EBCDIC character set related issues <ISO8859%JHUVM@cuvmb.cc.columbia.edu>
Sender: ASCII/EBCDIC character set related issues <ISO8859%JHUVM@cuvmb.cc.columbia.edu>
From: John C Klensin <KLENSIN%INFOODS.MIT.EDU@cuvmb.cc.columbia.edu>
Subject:      RE:       Re: Requirements Feedback/Agreements and Disagreements
X-To:         ASCII/EBCDIC character set related issues
              <ISO8859%JHUVM.BITNET@MITVMA.MIT.EDU>
To: Frank da Cruz <SY.FDC@cunixc.cc.columbia.edu>

    Actually, the vertical bar / exclamation-point swapping is a result
of the "national character use" positions of ISO 646 and early efforts
to confine certain things, like Standards for programming languages that
were initially defined in terms of EBCDIC, to the basic version
positions.  The controversy also included some strange discussions about
whether ! (exclamation-point) "looked more" like EBCDIC "solid vertical
bar" than "|" (ASCII broken vertical bar) did.
   It has been well over a decade since the predecessor of today's ISO
character set committees started sending little notes to programming
language standards committees encouraging them (us) to clean up their
acts and use *only* the basic character set of ISO646.  Since the basic
character set does not contain | (broken vertical bar at 7/12) and does
not contain ^ (caret or circumflex at 5/14) or ~ (tilde at 7/14) either,
the "obvious" solution was to map EBCDIC vertical bar into ISO646 2/1
(exclamation mark) and to do something creative with EBCDIC not-sign,
like writing <> rather than ^= or ~=.
   And, of course, since the character set folks were willing to tell
the language folks what *not* to do, but not what to do instead, there
was no "standard" about the 'solutions'.
   Sometimes good intentions go a little astray.

   John Klensin, MIT       Klensin@INFOODS.MIT.EDU
   To identify the perspective from which the above is written:
   Chair, ANSI X3J1 (PL/I); Project Editor for PL/I, ISO/IEC JTC1/SC22


17-Feb-89 16:57:51-GMT,7823;000000000001
Received: from CUVMB.CC.COLUMBIA.EDU by cunixc.cc.columbia.edu (5.54/5.10) id AA12654; Fri, 17 Feb 89 11:57:32 EST
Received: from CUVMB.CC.COLUMBIA.EDU by CUVMB.CC.COLUMBIA.EDU (IBM VM SMTP R1.2) with BSMTP id 7759; Fri, 17 Feb 89 11:54:44 EST
Received: from BITNIC.BITNET by CUVMB.CC.COLUMBIA.EDU (Mailer X1.25) with BSMTP
 id 1152; Fri, 17 Feb 89 11:54:43 EST
Received: by BITNIC (Mailer X1.25) id 7044; Fri, 17 Feb 89 11:50:32 EST
Date:         Fri, 17 Feb 89 09:25:15 EST
Reply-To: ASCII/EBCDIC character set related issues <ISO8859%JHUVM@cuvmb.cc.columbia.edu>
Sender: ASCII/EBCDIC character set related issues <ISO8859%JHUVM@cuvmb.cc.columbia.edu>
From: John C Klensin <KLENSIN%INFOODS.MIT.EDU@cuvmb.cc.columbia.edu>
Subject:      RE:       Re: SHARE req. 3
X-To:         ASCII/EBCDIC character set related issues
              <ISO8859%JHUVM.BITNET@MITVMA.MIT.EDU>
To: Frank da Cruz <SY.FDC@cunixc.cc.columbia.edu>

I find myself strongly in agreement with Ed's main point here, that,
absent both a strong recommendation from the user community AND a
clear willingness to bear pain, IBM will "compromise" on two code pages.
That would be an improvement, but...

A few small observations and quibbles:

>2.  EBCDIC Code Point X'5F' should be reserved for the NOT function
>    because it is ingrained in the IBM products.  However,
>    the ISO 8859 family of codes does not have the NOT character (cp 37 ^/
>    cp 500 ) in any code but ISO 8859-1.  Consequently, the NOT character
>    should not be
>    allowed in programming language syntax.  However, in EBCDIC, the compilers
>    use code point X'5F' for NOT.  For ASCII terminals, it is fairly common to
>    map the ASCII circumflex (cp 37 5/cp 500 ^) into the EBCDIC NOT (cp 37 ^/
>    cp 500 5).  (This may be the result of the ASCII-1968 standard which
>    allowed the ASCII X'5E' code point to have "stylized graphics".
>    If use of the NOT character (cp 37 ^/cp 500 5) is an issue to IBM, they
>    should change the compilers to accept the code points for either the NOT
>    or circumflex characters.
   Several implementations of ISO-standard compilers permit either ASCII
caret/circumflex or ASCII tilde as the appropriate stylization of what
started out as an EBCDIC 'not'.  With the introduction of 'not' in
ISO8859-1, I expect that some vendors will decide to accept that too.
Or, worse, instead.  As I indicated in my note yesterday, parts of this
conversion mess started out in the other direction.  Unlike the custom
in some of the communications and OSI Standards (the CCITT PAD standards
are excellent examples), the ISO programming language Standards do not,
in general, specify the codings of the character sets to be used, even
in ASCII; their language is a more or less specific version of "use
characters that look like this".  That has resulted in some tough
intra-ASCII conversion problems which some vendors, responding to
perceived user needs, have resolved by mapping more than one ASCII
character onto a given language character.  All of this confuses the
'unambiguous translation between ASCII and EBCDIC' problem considerably,
since we can't unambiguously translate between ASCII and ASCII when the
semantics assigned by a programming language to a character are
considered.
  The three important examples that I know of are:
   EBCDIC NOT maps to ASCII caret/circumflex and/or ASCII tilde
   EBCDIC vertical bar maps to ASCII exclamation-mark and/or ASCII
broken vertical bar (yesterday's discussion)
   EBCDIC (single-)quote maps to ASCII quote (i.e., double quote) and/or
ASCII (acute) accent.
  Since, for all of the vendors who chose one of each of these
code/graphics pairs and some of those who chose them as alternatives,
the "other" character is permitted in strings, semantics-preserving
translation between one set of conventions and the other (and hence back
to EBCDIC code pages) has to be done by a parsing process, sometimes
with a few heuristics, rather than by character-by-character
translation in a data stream.  It makes it hard to make firm statements
about what programming languages "do" or "should do".
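The following is a minimal sketch of why a character-by-character table cannot be inverted once two ASCII graphics are accepted for one language character. The table `ASCII_TO_LANG` and the helper names are hypothetical illustrations, not any vendor's actual convention. The sketch covers only the first two examples listed above (NOT and vertical bar).

```python
from collections import defaultdict

# Hypothetical vendor convention, for illustration only: more than one ASCII
# graphic is accepted for the same language character.
ASCII_TO_LANG = {
    "^": "NOT",   # caret/circumflex accepted as NOT
    "~": "NOT",   # tilde also accepted as NOT
    "!": "OR",    # exclamation mark accepted as vertical bar (OR)
    "|": "OR",    # ASCII 7/12 vertical bar also accepted
}

def collisions(table):
    """Return each target that is reached from more than one source character."""
    inverse = defaultdict(list)
    for src, dst in table.items():
        inverse[dst].append(src)
    return {dst: sorted(srcs) for dst, srcs in inverse.items() if len(srcs) > 1}

def forward(text, table):
    """Character-by-character translation; unmapped characters pass through."""
    return [table.get(ch, ch) for ch in text]

def backward(tokens, table):
    """Naive inverse that picks the first source listed for each target."""
    first = {}
    for src, dst in table.items():
        first.setdefault(dst, src)
    return "".join(first.get(tok, tok) for tok in tokens)

if __name__ == "__main__":
    print(collisions(ASCII_TO_LANG))
    source = "a ~= b | c"
    print(source, "->", backward(forward(source, ASCII_TO_LANG), ASCII_TO_LANG))
```

Translating `a ~= b | c` forward and back yields `a ^= b ! c`. A compiler using that convention might accept the result. It would be wrong if those characters had appeared inside a string literal, and only a parser can tell the difference.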

>6.  To my knowledge, mainframes do not place any syntactic significance to the
>    US EBCDIC code points X'4A' (cp 37 /cp 500 5) or X'B0'
>    (cp 37 5/cp 500 ^).
  Probably nothing Standardized.  X'4A' is the character that is often
used as a stylization of ASCII back-slant/reverse-solidus in some
software, especially terminal emulators, and is used as a separator in
some widely-circulated applications packages (precisely because it is
not used by anything else).  I know of nothing that uses X'B0', or
anything else in column B in a critical way as a character with semantic
significance, but that may be just my lack of knowledge.

>Based on these issues, I would recommend a compromise code point assignment.
>This recommendation uses code point assignments for characters from both CP 37
>and CP 500.
>
>The first two code points are the most critical assignments.
>  X'5F' to circumflex (issue 2)  (cp 500)
>  X'4F' to vertical bar (issue 3) (cp 37)
>
>These assignment for brackets is a recommendation.
>  X'4A' and '5A' to left and right brackets (issue 4) (cp 500)
This seems technically reasonable and politically attractive.
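The sketch below lines the compromise up against the two code pages it borrows from. It assumes Python's built-in `cp037` and `cp500` codecs are fair stand-ins for IBM CP 37 and CP 500. `COMPROMISE` and the helper names are hypothetical. Positions the proposal leaves open are reported as "left to IBM" rather than guessed.

```python
# Sketch: compare IBM code pages 37 and 500 using Python's "cp037" and "cp500"
# codecs, and show the compromise assignments quoted above.

def decode_byte(b, codec):
    """Character assigned to byte value b in the given code page."""
    # errors="replace" only guards against an unmapped byte.
    return bytes([b]).decode(codec, errors="replace")

def differing_positions(cp_a="cp037", cp_b="cp500"):
    """Byte values that decode to different characters in the two code pages."""
    return [b for b in range(256) if decode_byte(b, cp_a) != decode_byte(b, cp_b)]

# Compromise assignments from the proposal (issue numbers as quoted).
COMPROMISE = {
    0x5F: "^",  # circumflex, as in cp 500 (issue 2)
    0x4F: "|",  # vertical bar, as in cp 37 (issue 3)
    0x4A: "[",  # left bracket, as in cp 500 (issue 4)
    0x5A: "]",  # right bracket, as in cp 500 (issue 4)
}

def report():
    """Print each differing position with the proposed assignment, if any."""
    for b in differing_positions():
        cp37, cp500 = decode_byte(b, "cp037"), decode_byte(b, "cp500")
        proposed = COMPROMISE.get(b, "left to IBM")
        print(f"X'{b:02X}'  cp37={cp37!r:6} cp500={cp500!r:6} proposed: {proposed}")

if __name__ == "__main__":
    report()
```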

>Code points for the remaining characters may be defined by IBM.  I believe
>that the assignments are not critical and therefore, we would waste time
>discussing assignments.  If I am wrong, tell me.
I agree, but the recommendation must stress, perhaps even more strongly
than the present text, that "defined by IBM" means "defined once, in one
place", not "IBM may define a series of alternatives".
>    Exclamation point (cp 37 !/cp 500 |)
>    NOT (cp 37 ^/cp 500 5)
Also see comments on these characters above.

From what I've seen of IBM's decision-making in other areas, they tend
to prefer leaving those who are already unhappy in that state, rather
than making, or even risking making, those who are happy less happy.
Consequently, pushing even toward two (only) code pages is going to be a
tough one.  The case will, I think, be considerably strengthened if the
people who want something are in a position to say "this change is going
to hurt us a lot too, but it is important if there is going to be a
future in which things are not worse".  Part of the argument that should
be made, and which I don't think Ed's draft makes clearly enough, is
that, if we can get
 (a) Unambiguous and reversible mappings between ISO8859-n and EBCDIC
CPm, with IBM agreement to specify the "official" 'n,m' pairs in a public
way and to increase 'm' as needed as 'n' increases.  There really is no
alternative to this, unfortunately: if 'n' has to rise above 1 because
of character set content (not just code point mapping), then the number
of code pages will have to rise above 1.
 (b) A single, standard, compromise, EBCDIC code page to be used in IBM
operating systems and products, especially programming languages and
data communications, such that alternate code pages are used the way
alternate ISO8859-n forms are used: locally or by control-sequence
introduced departures from the 'standard'.  And, as with ISO8859, the
"alternate" code pages are built up from a common core that permits
those operating systems and products to be completely standard across
code pages.  Otherwise, you just get the present chaos at a new point.
 ...then we will be satisfied, if not happy.  And, more important, IBM
will be spared a strong case for replacing EBCDIC internally with
ISO8859 at some point in the future, since no one (well, nearly no one)
should care what they do internally as long as they communicate clearly
at the boundaries.
  More than that is probably unrealistic to hope for.  On the other
hand, that is quite a lot.
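Requirement (a) can be checked mechanically for single-byte code pages. A mapping is unambiguous and reversible exactly when it is a one-to-one mapping of all 256 byte values. The sketch assumes Python's `cp037` and `latin-1` codecs stand in for EBCDIC CP 37 and ISO 8859-1. The function names are hypothetical.

```python
# Sketch: build a byte-to-byte table between an EBCDIC code page and an
# ISO 8859 part by going through the shared character, then test whether
# the table is a permutation of 0..255 (and therefore reversible).

def byte_table(ebcdic_codec, iso_codec):
    """EBCDIC byte -> ISO byte for all 256 values."""
    chars = bytes(range(256)).decode(ebcdic_codec)
    return [ch.encode(iso_codec)[0] for ch in chars]

def is_reversible(table):
    """True if every one of the 256 byte values appears exactly once."""
    return sorted(table) == list(range(256))

def invert(table):
    """Build the reverse table; only meaningful when is_reversible(table)."""
    reverse = [0] * 256
    for src, dst in enumerate(table):
        reverse[dst] = src
    return reverse

if __name__ == "__main__":
    cp37_to_8859_1 = byte_table("cp037", "latin-1")
    print(is_reversible(cp37_to_8859_1))
    # A hypothetical table that sends two EBCDIC bytes to one ISO byte fails:
    broken = list(cp37_to_8859_1)
    broken[0x5F] = broken[0xB0]
    print(is_reversible(broken))
```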

John Klensin



18-Feb-89  2:42:59-GMT,2397;000000000001
Received: from CUVMB.CC.COLUMBIA.EDU by cunixc.cc.columbia.edu (5.54/5.10) id AA05609; Fri, 17 Feb 89 21:42:56 EST
Received: from CUVMB.CC.COLUMBIA.EDU by CUVMB.CC.COLUMBIA.EDU (IBM VM SMTP R1.2) with BSMTP id 8032; Fri, 17 Feb 89 21:40:37 EST
Received: from BITNIC.BITNET by CUVMB.CC.COLUMBIA.EDU (Mailer X1.25) with BSMTP
 id 2119; Fri, 17 Feb 89 21:40:36 EST
Received: by BITNIC (Mailer X1.25) id 0444; Fri, 17 Feb 89 21:34:38 EST
Date:         Fri, 17 Feb 89 14:42:37 EST
Reply-To: ASCII/EBCDIC character set related issues <ISO8859%JHUVM@cuvmb.cc.columbia.edu>
Sender: ASCII/EBCDIC character set related issues <ISO8859%JHUVM@cuvmb.cc.columbia.edu>
From: John C Klensin <KLENSIN%INFOODS.MIT.EDU@cuvmb.cc.columbia.edu>
Subject:      RE:       Re: SHARE req. 3
X-To:         ASCII/EBCDIC character set related issues
              <ISO8859%JHUVM.BITNET@MITVMA.MIT.EDU>
To: Frank da Cruz <SY.FDC@cunixc.cc.columbia.edu>

>        Personally,  (you're all going to laugh)...
> ... in such a way that EBCDIC and ASCII be transparent to the user.
I promise not to laugh if you promise to not make me hold my breath.
I'd expect to see a distinct temperature drop in the usual hot place
sometime first.  Not that it might not be a good idea, but all of us put
together are not worth one large bank or insurance company in terms of
getting IBM to change its ways, policies, or software.

>  The day will come when type "char" in C will be
>16 bits rather than the current 8.
That will be the day that most of the C programs in the world stop
working.  Keep in mind that this change will seriously alter the
semantics of every C program that believes that 'char' == 'int' == one
eight bit byte.  Lots of stuff, including parts of the language
definition, seem to depend on that assumption.  What you might see
instead is the introduction of 'longchar', with the use of 'char'
gradually disappearing, but that is not transparent and not something
that is likely to happen soon either.

>  EBCDIC would play
>only a minor role and then go the way of card punches.
  Clearly the "right" solution.  Now let me introduce you to the guy in
the next office who has been trying to get me to attach a card
reader/punch to my VAX for the last four years so he can process his
data archive (which closely resembles a row of 24 drawer grey cabinets
in the hall).

   John Klensin, MIT  (Klensin@INFOODS.MIT.EDU)


22-Feb-89 11:20:40-GMT,9164;000000000001
Received: from CUVMB.CC.COLUMBIA.EDU by cunixc.cc.columbia.edu (5.54/5.10) id AA28035; Wed, 22 Feb 89 06:20:35 EST
Received: from CUVMB.CC.COLUMBIA.EDU by CUVMB.CC.COLUMBIA.EDU (IBM VM SMTP R1.2) with BSMTP id 9571; Wed, 22 Feb 89 06:27:49 EST
Received: from BITNIC.BITNET by CUVMB.CC.COLUMBIA.EDU (Mailer X1.25) with BSMTP
 id 7285; Wed, 22 Feb 89 06:27:47 EST
Received: by BITNIC (Mailer X1.25) id 5851; Wed, 22 Feb 89 06:28:06 EST
Date:         Wed, 22 Feb 89 12:19:00 CET
Reply-To: ASCII/EBCDIC character set related issues <ISO8859%JHUVM@cuvmb.cc.columbia.edu>
Sender: ASCII/EBCDIC character set related issues <ISO8859%JHUVM@cuvmb.cc.columbia.edu>
From: Johan van Wingen <MOSGLA%HLERUL2@cuvmb.cc.columbia.edu>
Subject:      ISO 8859 trouble spots
To: Frank da Cruz <SY.FDC@cunixc.cc.columbia.edu>


Dear list subscribers
The following document I intend to submit to ISO/JTC1/SC2/WG3 for their
next meeting. But before doing that I would like to have your comments
on it.
&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&
  THE REMAINING TROUBLE SPOTS IN ISO 8859

  0.  Introduction

  The several parts of ISO 8859 were approved a few years ago and
  are now increasingly being implemented. A lot of experience has been
  collected. In general the reaction has been that the standard is
  excellent, but some weaker points of the standard are now becoming
  visible. These should be discussed before habits grow entrenched.
  Applications in the field of programming languages have been the
  source of most of the comments.

  1.  The problem of diacritics

  There is a long tradition in the writing and printing industry for
  extending the available 26 letter Latin alphabet. Extra letters are
  created by putting a little mark over or under a letter. These are
  called "diacritical marks": accents, umlauts, cedillas and so on.
  They are also used in some languages for putting a stress on a
  syllable. (Barring a letter is not considered applying a diacritic.)
  Where the number of available characters was severely restricted, as
  with typewriters, separate diacritics provided a solution with the
  practice of overprinting. This approach was copied in ISO 646 using
  BACKSPACE, and with ISO 6937-2 using non-spacing diacritics. ISO 646
  provided only a few: underline, acute, grave and circumflex accent,
  diaeresis (umlaut), overline/tilde. These can also be used
  free-standing, that is without BACKSPACE, in which form they soon
  acquired a new meaning: low line, apostrophe, prime, upward arrow
  head, quotation mark. The comma could also be used for cedilla. This
  double use (already considerably reduced in ISO 646-1983) was not
  allowed in ISO 6937-2, where diacritics (a larger set) must occur only
  in predefined combinations with certain letters, or, exceptionally,
  with a SPACE. They are always non-spacing. In order to preserve the
  existing characters from ISO 646, ISO 6937-2 contains both a spacing
  and a non-spacing circumflex, grave accent and tilde. This introduces
  a double way of representing three characters. Astonishingly, for
  these three the standard prefers the single-byte representation; the
  other "is deprecated".

  In ISO 8859 diacritics occur again. But all characters in it are
  always spacing without exception. However, diacritics have no meaning
  in themselves. What is the use of a free-standing cedilla? One can only
  conclude that their presence is useless and a waste of valuable
  positions. Keeping them there can lead to two undesirable
  developments.  First, implementers may violate the rules of ISO 8859
  by making the diacritics non-spacing, or second, they may attach to
  them, when free-standing, a new meaning, as has been done with the
  circumflex, often used as "control". These characters deserve to be
  removed at the first opportunity. It will make it possible to include
  Turkish in ISO 8859-1.
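  The free-standing diacritics discussed above can be listed from a code
  table. The sketch below approximates "spacing diacritic" by the Unicode
  "Sk" (modifier symbol) category, which is an assumption made for
  illustration. It finds the circumflex, grave accent, diaeresis, macron,
  acute accent and cedilla in ISO 8859-1. It misses tilde (category Sm)
  and low line (category Pc).

```python
import unicodedata

# Sketch: list spacing diacritical marks in an 8-bit code, using the Unicode
# "Sk" category as the (approximate) test.

def spacing_diacritics(codec="latin-1"):
    """Return (byte value, character, Unicode name) for each Sk character."""
    found = []
    for b in range(0x20, 0x100):
        ch = bytes([b]).decode(codec)
        if unicodedata.category(ch) == "Sk":
            found.append((b, ch, unicodedata.name(ch)))
    return found

if __name__ == "__main__":
    for b, ch, name in spacing_diacritics():
        print(f"{b:02X}  {ch}  {name}")
```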

  2.  The Logical OR and the Logical NOT

  A need for characters having the meaning of the Logical OR and the
  Logical NOT was introduced by PL/I (1964). The first compilers used
  EBCDIC. Thus the problem for ASCII and ISO 646 became apparent only
  somewhat later. As there were no positions left, some way of escape
  had to be found.

  ASCII (USAS X3.4-1968) contains in 6.4:
  "No specific meaning is prescribed for any graphics in the code table
  except that which is understood by the users. Furthermore, this
  standard does not specify a type style for the printing or display of
  the various graphic characters. In specific applications, it may be
  desirable to employ distinctive styling of individual graphics to
  facilitate their use for specific purposes as, for example, to stylize
  the graphics in code positions 2/1 and 5/14 into those frequently
  associated with Logical OR and Logical NOT, respectively."
  (These graphics normally represent Exclamation Point and Circumflex.)

  In ISO R 646-1967 the text is somewhat different:
  "4.3 Interpretation of graphics
  The meaning of the graphics is not defined by this ISO Recommendation.
  It will be necessary to reach agreement on the meaning and this will
  depend upon the particular application except in cases where other ISO
  Recommendations already exist. However no interpretation may be chosen
  which is contradictory to the customary meaning. A graphic symbol can
  have more than one meaning, e.g. the graphical symbol - (minus) also
  can have the meaning of hyphen or separation mark. The font design of
  the symbol is not part of this ISO Recommendation."

  Mackenzie (2) comments on this:
  "The last sentence of Section 4.3 leaves the question of "font design"
  open; that is, a manufacturer could design Exclamation Point to look
  like Vertical Bar and Circumflex like NOT sign. The LOGICAL OR/Logical
  NOT problem had finally been solved."
  Unfortunately this was an illusion, as we shall see.

  In ISO 646-1973 we still find in 5.3:
  "The names chosen to denote graphic characters are intended to reflect
  their customary meanings. However, this International Standard does
  not define and does not restrict the meanings of graphic characters.
  In addition, it does not specify a particular style or font design for
  the graphic characters."

  In ISO 646-1983 we find at the end of 4. :
  "The names chosen to denote graphic characters are intended to reflect
  their customary meaning. However, this International Standard does
  not define and does not restrict the meanings of graphic characters.
  Neither does it specify a particular style or font design for
  the graphic characters when imaged."

  Graphic characters are distinguished by their name, not by their
  shape. In ISO 646 the Vertical Line appears, and it can be used for
  Logical OR, but that name is not given to it. Equally, Upward Arrow Head,
  Circumflex (for 5/14) is never additionally named Logical NOT. Thus a
  sound basis for using both in this way is missing. Nevertheless,
  widespread use of Vertical Line and Circumflex for OR and NOT could be
  found, just as * and / are employed for "multiply" and "divide". This
  development cannot easily be redressed. Thus it was a most unfortunate
  idea to include a new code for NOT in ISO 8859-1. Confusion was
  aggravated by not including it in ISO 8859-2. It continues to cause
  problems in attempting to establish a uniform translate table for
  EBCDIC - ISO8859.
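  Which parts of ISO 8859 carry a NOT sign can be checked with a small
  script. The sketch assumes Python's `iso8859_N` codecs. Those tables
  follow the current editions of each part, which may differ from the
  texts available in 1989. The helper names are hypothetical.

```python
# Sketch: report which ISO 8859 parts can encode the NOT sign (U+00AC).
NOT_SIGN = "\u00ac"
PARTS = [n for n in range(1, 17) if n != 12]   # there is no ISO 8859-12

def has_not_sign(part):
    """True if ISO 8859-<part>, as Python implements it, has the NOT sign."""
    try:
        NOT_SIGN.encode(f"iso8859_{part}")
        return True
    except UnicodeEncodeError:
        return False

if __name__ == "__main__":
    for part in PARTS:
        print(f"ISO 8859-{part}: {'yes' if has_not_sign(part) else 'no'}")
```

  Under these tables the NOT sign is missing from ISO 8859-2, as stated
  above. It is present in ISO 8859-9, which keeps the Latin-1 layout
  apart from six letters.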

  3.  Obsolete signs

  A compiler writer needs to know how a certain character in a program
  has to be classified, as a digit, as a letter (mostly it does not
  matter which) or a special character with a given meaning. Checking
  whether a byte is meant to be a letter would be easier if the letter
  areas of ISO 8859 had been contiguous. Instead, quite
  obsolete characters for multiply and divide, for which * and / have been
  used in programs for more than 25 years, have been inserted in the
  middle of a column. A look-up table is required to decide whether a
  character is considered a letter or not. Even if this cannot be
  avoided anyway, the introduction of unnecessary exceptions is always
  a bad thing, as is the destruction of a stable convention. It is no
  good if language designers are now going to be pressed for including
  two graphic symbols, meaning the same thing, into the syntax.
  Removing "multiply" and "divide" would make place for putting in the
  French ligature "OE" and "oe" again, which the logic of 6937 wanted
  to keep and that of 8859 wanted to go.
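  The classification problem can be sketched as follows. In ISO 8859-1,
  bytes X'C0' to X'FF' are letters except for the multiply and divide
  signs at X'D7' and X'F7'. A simple range check is therefore wrong
  there, and a 256-entry look-up table is used instead. Using Python's
  `str.isalpha()` as the definition of "letter" is an assumption of this
  sketch, and the function names are hypothetical.

```python
# Sketch: classify ISO 8859-1 bytes as letters with a look-up table, and show
# where a range check that assumes contiguous letter areas goes wrong.

def letter_table(codec="latin-1"):
    """256-entry table: True where the byte decodes to an alphabetic character."""
    return [bytes([b]).decode(codec).isalpha() for b in range(256)]

IS_LETTER = letter_table()

def is_letter(b):
    """Classify one byte with a single table look-up."""
    return IS_LETTER[b]

def range_check(b):
    """Naive test assuming contiguous letter areas (wrong at X'D7' and X'F7')."""
    return 0x41 <= b <= 0x5A or 0x61 <= b <= 0x7A or 0xC0 <= b <= 0xFF

if __name__ == "__main__":
    odd = [b for b in range(0xC0, 0x100) if not is_letter(b)]
    print([f"{b:02X} {bytes([b]).decode('latin-1')}" for b in odd])
```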

  4.  Icelandic versus Turkish

  Mixing characters from several parts of ISO 8859 requires invoking
  the help of ISO 2022, which much hardware does not support. This
  imposes a considerable cultural barrier between certain groups of
  nations. If this barrier coincides with one raised by world politics
  things are as they are. But if there is none, other priorities should
  dominate. We have now Latin alphabet no. 5 (ISO 8859-9), and it should
  be discussed whether or not one including Turkish should prevail over
  one with Icelandic. There are more than 100 times as many Turks as there
  are Icelanders.
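  The trade-off between Latin alphabet no. 1 and no. 5 can be shown by
  listing the positions where ISO 8859-1 and ISO 8859-9 differ. The
  sketch assumes Python's `latin-1` and `iso8859_9` codecs, and the
  function name is hypothetical.

```python
import unicodedata

# Sketch: list the byte positions where two 8-bit codes assign different
# characters, with the Unicode names of both.

def differences(codec_a="latin-1", codec_b="iso8859_9"):
    rows = []
    for b in range(256):
        a, z = bytes([b]).decode(codec_a), bytes([b]).decode(codec_b)
        if a != z:
            rows.append((b, unicodedata.name(a), unicodedata.name(z)))
    return rows

if __name__ == "__main__":
    for b, old, new in differences():
        print(f"{b:02X}  {old}  ->  {new}")
```

  With these tables, exactly six positions change. Each one replaces an
  Icelandic letter (eth, Y with acute, thorn) with a Turkish one (G with
  breve, dotted or dotless I, S with cedilla).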

23-Feb-89  0:47:38-GMT,1729;000000000001
Received: from CUVMB.CC.COLUMBIA.EDU by cunixc.cc.columbia.edu (5.54/5.10) id AA29033; Wed, 22 Feb 89 19:47:35 EST
Received: from CUVMB.CC.COLUMBIA.EDU by CUVMB.CC.COLUMBIA.EDU (IBM VM SMTP R1.2) with BSMTP id 0013; Wed, 22 Feb 89 19:45:28 EST
Received: from BITNIC.BITNET by CUVMB.CC.COLUMBIA.EDU (Mailer X1.25) with BSMTP
 id 8428; Wed, 22 Feb 89 19:45:27 EST
Received: by BITNIC (Mailer X1.25) id 3817; Wed, 22 Feb 89 19:25:18 EST
Date:         Wed, 22 Feb 89 14:52:39 EST
Reply-To: ASCII/EBCDIC character set related issues <ISO8859%JHUVM@cuvmb.cc.columbia.edu>
Sender: ASCII/EBCDIC character set related issues <ISO8859%JHUVM@cuvmb.cc.columbia.edu>
From: Edwin Hart <HART%APLVM@cuvmb.cc.columbia.edu>
Subject:      Re: ISO 8859 trouble spots
To: Frank da Cruz <SY.FDC@cunixc.cc.columbia.edu>
In-Reply-To:  Criticism of ISO 8859

I read through your note on ISO 8859 problems.  I agree.  I would add that
the PL/1 not symbol is only in ISO 8859-1 (and maybe -9, I have not seen -9).

Also, many sites in North America map the ASCII tilde (7/14) into the EBCDIC
Not.  Formal logic courses frequently use tilde as the Not operator.  The
courses also use V for inclusive Or, and a circumflex-like character for
logical And.  At some of my SHARE presentations, several people said "Do not
use the circumflex character to mean logical Not."

In my note about a compromise EBCDIC code page for Reference EBCDIC-1,
I proposed keeping the Not FUNCTION at EBCDIC code point X'5F'
but using the circumflex CHARACTER there because circumflex was a character
common to all of the ISO 8859 standards, and one would presume that people
using other ISO 8859 parts would want to use PL/1 or other languages which
use a Not symbol.

Ed Hart

23-Feb-89  0:54:58-GMT,2902;000000000001
Received: from CUVMB.CC.COLUMBIA.EDU by cunixc.cc.columbia.edu (5.54/5.10) id AA29402; Wed, 22 Feb 89 19:54:55 EST
Received: from CUVMB.CC.COLUMBIA.EDU by CUVMB.CC.COLUMBIA.EDU (IBM VM SMTP R1.2) with BSMTP id 0017; Wed, 22 Feb 89 19:52:48 EST
Received: from BITNIC.BITNET by CUVMB.CC.COLUMBIA.EDU (Mailer X1.25) with BSMTP
 id 8444; Wed, 22 Feb 89 19:52:47 EST
Received: by BITNIC (Mailer X1.25) id 3909; Wed, 22 Feb 89 19:30:21 EST
Date:         Wed, 22 Feb 89 16:19:27 EST
Reply-To: ASCII/EBCDIC character set related issues <ISO8859%JHUVM@cuvmb.cc.columbia.edu>
Sender: ASCII/EBCDIC character set related issues <ISO8859%JHUVM@cuvmb.cc.columbia.edu>
From: "Nelson H.F. Beebe" <Beebe%SCIENCE.UTAH.EDU@cuvmb.cc.columbia.edu>
Subject:      Comment on ISO 8859 multiply and divide
X-To:         ISO8859%JHUVM.BITNET@CUNYVM.CUNY.EDU
To: Frank da Cruz <SY.FDC@cunixc.cc.columbia.edu>
In-Reply-To:  Message from "Johan van Wingen <MOSGLA@HLERUL2.BITNET>" of Wed 22
              Feb 89 04:32:00-MST

Johan van Wingen <MOSGLA@HLERUL2.BITNET> in a posting dated Wed,
22 Feb 89 12:19:00 CET remarks:

>> ... Instead of that, quite
>> obsolete characters for multiply and divide, for which * and /
>> are used in programs for more then 25 years, have been
>> inserted in the middle of a column.
>> ... Removing "multiply" and "divide" would make place for
>> putting in the French ligature "OE" and "oe" again, which the
>> logic of 6937 wanted to keep and that of 8859 wanted to go.

I have not seen a printed representation of these two characters.
If, as I presume, they are a centered sans-serif x for multiply,
and a minus with a dot above and below for divide, then there is
another problem.  In the English-speaking world, that symbol is
used to mean division, but in Denmark (and possibly elsewhere in
Scandinavia), it means subtract!

While circumflex may have been used as a logical NOT in PL/1
environments running with ISO character sets, I would like to
point out that in the C language, exclamation point is used as a
Boolean (logical) NOT, tilde is used as a one's complement
(another kind of NOT) and circumflex as an exclusive OR.  It
would surprise me if there is not now substantially more code
extant in C than in PL/1.

Given that both EBCDIC and the ISO character sets each contain an
exclamation point, and each contain a (possibly-split) bar, it is
foolish to consider mapping exclamation point into vertical bar.
No responsible editor would permit a vertical bar to be used in
natural language text to mean exclamation point, and the heavy use of
both symbols in the C programming language for completely
different purposes (which leads to syntactically correct but
semantically wrong code when the two are exchanged, as I have
earlier pointed out on this list) requires that a mapping
between exclamation point and vertical bar be discouraged, if not
outright forbidden.
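The hazard described above can be shown by exchanging "!" and "|" in a line of C source text. The sketch uses Python's `str.translate`, and the function name is hypothetical. `if (a != b) x |= y;` becomes `if (a |= b) x != y;`. Both lines are valid C, but the second assigns to `a` inside the condition and discards the comparison.

```python
# Sketch: exchange every "!" and "|" character by character, as a careless
# translation table might.
SWAP = str.maketrans({"!": "|", "|": "!"})

def swap_bang_bar(c_source):
    """Exchange every exclamation mark and vertical bar in the text."""
    return c_source.translate(SWAP)

if __name__ == "__main__":
    before = "if (a != b) x |= y;"
    print(before)
    print(swap_bang_bar(before))
```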
-------

23-Feb-89 14:48:41-GMT,1850;000000000001
Received: from CUVMB.CC.COLUMBIA.EDU by cunixc.cc.columbia.edu (5.54/5.10) id AA21150; Thu, 23 Feb 89 09:48:33 EST
Received: from CUVMB.CC.COLUMBIA.EDU by CUVMB.CC.COLUMBIA.EDU (IBM VM SMTP R1.2) with BSMTP id 0179; Thu, 23 Feb 89 09:46:35 EST
Received: from BITNIC.BITNET by CUVMB.CC.COLUMBIA.EDU (Mailer X1.25) with BSMTP
 id 9016; Thu, 23 Feb 89 09:46:34 EST
Received: by BITNIC (Mailer X1.25) id 6564; Thu, 23 Feb 89 09:34:37 EST
Date:         Wed, 22 Feb 89 23:52:20 EST
Reply-To: ASCII/EBCDIC character set related issues <ISO8859%JHUVM@cuvmb.cc.columbia.edu>
Sender: ASCII/EBCDIC character set related issues <ISO8859%JHUVM@cuvmb.cc.columbia.edu>
From: John C Klensin <KLENSIN%INFOODS.MIT.EDU@cuvmb.cc.columbia.edu>
Subject:      RE:       Comment on ISO 8859 multiply and divide
X-To:         ASCII/EBCDIC character set related issues
              <ISO8859%JHUVM.BITNET@MITVMA.MIT.EDU>
To: Frank da Cruz <SY.FDC@cunixc.cc.columbia.edu>

In those environments that run with EBCDIC, I would strongly suspect
that there is more PL/I use than C use.  Even today.  And more COBOL use
than either.   There are also approved, cast-in-concrete ISO and ANSI
Standards for PL/I and only Draft Proposals for C; your comments could
be construed as "C should be changed prior to standardization, because
it uses too many characters in violation of the style in which other
programming languages, etc., use them".  That is not a proposal or
suggestion, serious or otherwise, just a comment about how things work.

What is more important is that this type of semi-quantitative reasoning
won't solve any problems.  What it will do is to encourage the vendor to
say "ok, different character sets for different audiences, since the
market pressures run against goring the oxen of large customers", which is
what we are trying to avoid.

  John Klensin, MIT


24-Feb-89 12:33:52-GMT,3034;000000000001
Received: from CUVMB.COLUMBIA.EDU (cuvmb.cc.columbia.edu) by cunixc.cc.columbia.edu (5.54/5.10) id AA04125; Fri, 24 Feb 89 07:33:49 EST
Received: from CUVMB.CC.COLUMBIA.EDU by CUVMB.COLUMBIA.EDU (IBM VM SMTP R1.2) with BSMTP id 0657; Fri, 24 Feb 89 07:31:58 EST
Received: from BITNIC.BITNET by CUVMB.CC.COLUMBIA.EDU (Mailer X1.25) with BSMTP
 id 0750; Fri, 24 Feb 89 07:31:57 EST
Received: by BITNIC (Mailer X1.25) id 7970; Fri, 24 Feb 89 07:14:40 EST
Date:         Thu, 23 Feb 89 01:59:06 PST
Reply-To: "Joan M. Winters" <WINTERS%SLACVM@cuvmb.cc.columbia.edu>
Sender: ASCII/EBCDIC character set related issues <ISO8859%JHUVM@cuvmb.cc.columbia.edu>
From: "Joan M. Winters" <WINTERS%SLACVM@cuvmb.cc.columbia.edu>
Subject:      Summary of Responses on Hex Codes for Curly Braces
X-Cc:         SAXTON@SLACSLD, JXH@SLACVM, WBJ@SLACVM, BEBO@CERNVM,
              COTTRELL@SLACVM
To: Frank da Cruz <SY.FDC@cunixc.cc.columbia.edu>

Folks - Finally, here's my summary on what hexadecimal codes are
actually used around the EBCDIC world to define curly braces (graphic
characters {} at my place), primarily from you on the ISO8859 list.  To
simplify, of the 20 institutions total I heard from:

16  use  X'C0' and X'D0'
 3  use  X'8B' and X'9B'
 1  uses X'C0' and X'D0' for terminals, X'8B' and X'9B' for printers

"Use" means these are the only, default, or primary codes for braces.

Of the 16, 4 mentioned that by default they print both pairs of code
points as braces, even though on input they encode braces only as
X'C0' and X'D0'.  Another provides such "bi-lingual" code sets for
printers, but not by default.  In addition to SLAC, 1 site has old
Tektronix-style plotting software that considers braces to be X'8B'
and X'9B', in spite of a general EBCDIC use of X'C0' and X'D0'.
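The "bi-lingual" arrangement described above can be sketched in a few lines. Input encodes braces only as X'C0' and X'D0', while output prints both code pairs as braces. Python's `cp037` codec stands in for U.S. EBCDIC. `LEGACY_BRACES` and the function names are hypothetical and do not describe any site's actual configuration.

```python
# Sketch: encode braces as X'C0'/X'D0' on input, but print both X'C0'/X'D0'
# and the older X'8B'/X'9B' pair as braces on output.
LEGACY_BRACES = {0x8B: "{", 0x9B: "}"}

def encode_input(text):
    """Input side: standard CP37 encoding, so braces become X'C0' and X'D0'."""
    return text.encode("cp037")

def decode_for_printer(data):
    """Output side: decode CP37, but also print the older pair as braces."""
    return "".join(LEGACY_BRACES.get(b, bytes([b]).decode("cp037")) for b in data)

if __name__ == "__main__":
    print(decode_for_printer(bytes([0xC0, 0x8B, 0xD0, 0x9B])))
```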

No organization mentioned plans to convert their code points for
braces.  However, 6 noted conversion within recent years to X'C0'
and X'D0';  3 within the last two years.  1 of the X'8B' and X'9B'
places said they may change some things to accept both code pairs.
Another seems already to have good support for both.  The site with
the X'8B' and X'9B' printer-only default has a new character set that
prints braces for both code pairs.

The places that had converted to X'C0' and X'D0' seemed basically
content with the change.  1 site said they'd never convert again;  2
said if the standard required it, they would one more time.  1
organization even made a plea for being able to re-use the X'8B' and
X'9B' code points for other characters.  Of the places that use X'8B'
and X'9B', 1 said they'd most likely convert to a standard if such
came to exist.

It's hard to classify some responses.  As usual in this area, answers
often differ within an organization, depending on the exact
circumstances.  I'm bringing the mail I got to SHARE, for those of you
who'll be there and are interested in the gory details.

I enjoyed reading your notes, in all their variations.  Thank you very
much for your help!                                 Joan Winters

 7-Mar-89 22:12:59-GMT,9208;000000000201
Return-Path: <@cuvmb.cc.columbia.edu:ISO8859@JHUVM.BITNET>
Received: from cunixc.cc.columbia.edu by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA17364; Tue, 7 Mar 89 17:12:54 EST
Message-Id: <8903072212.AA17364@watsun.cc.columbia.edu>
Received: from CUVMB.COLUMBIA.EDU (cuvmb.cc.columbia.edu) by cunixc.cc.columbia.edu (5.54/5.10) id AA28389; Tue, 7 Mar 89 17:11:14 EST
Received: from CUVMB.CC.COLUMBIA.EDU by CUVMB.COLUMBIA.EDU (IBM VM SMTP R1.2) with BSMTP id 5041; Tue, 07 Mar 89 17:09:02 EST
Received: from BITNIC.BITNET by CUVMB.CC.COLUMBIA.EDU (Mailer X1.25) with BSMTP
 id 6947; Tue, 07 Mar 89 17:09:00 EST
Received: by BITNIC (Mailer X1.25) id 4183; Tue, 07 Mar 89 17:04:18 EST
Date:         Tue, 7 Mar 89 15:27:55 EST
Reply-To: ASCII/EBCDIC character set related issues <ISO8859%JHUVM@cuvmb.cc.columbia.edu>
Sender: ASCII/EBCDIC character set related issues <ISO8859%JHUVM@cuvmb.cc.columbia.edu>
From: Edwin Hart <HART%APLVM@cuvmb.cc.columbia.edu>
Subject:      White Paper Executive Summary
To: Frank da Cruz <SY.FDC@cunixc.cc.columbia.edu>

Enclosed is a redraft of the Executive Summary of the paper.  It is exactly
two pages long on an IBM 3820 printer.  It is 4 pages on a 1403.

I would appreciate any comments by Friday (March 10).

Thank you.
Ed Hart


                             Executive Summary

          . . . Let us go down, and there confound their language,
            that they may not understand one another's speech.

                                                              Genesis  11:7

     Unless IBM resolves fundamental character set and code issues, Systems
Application  Architecture  (SAA)  will  fail  to fully meet its consistency
goal.  Inconsistencies make using IBM  equipment  unnecessarily  difficult.
People  find  it difficult (1) to exploit PS/2s with mainframe and midrange
systems, (2) to communicate business information and mail  internationally,
and  (3)  to  exploit  applications  and high-level languages.   Because of
mainframe and communications inconsistencies on  a  PS/2,  end  users  type
certain characters and are confused by the results.  Character set and code
problems  create  a human factors trap for end users, the very people SAA is
to  serve.    The inconsistencies affect not only IBM's European customers
but also IBM's U. S. and Canadian, English-speaking customers.   In  short,
the  inconsistencies  make IBM systems more difficult to use for both naive
and experienced end users, and this must change for SAA to succeed.

Character Set and Code Problems

     Since the early 1970s, end users have experienced many  problems  with
ASCII  and EBCDIC character sets and codes.  The fundamental problem occurs
because certain characters change when people move them between IBM
systems (MVS/TSO, VM/CMS, OS/400, and the PS/2).  This problem consists of
four interrelated facets.

The ASCII and EBCDIC Character Sets and Codes Are Inconsistent.

     The ASCII and EBCDIC character sets do not match.  Three ASCII
characters have no EBCDIC equivalent, and three EBCDIC characters have no
ASCII equivalent.  Moreover, the
ASCII  standard evolved but many IBM products still reflect the back level,
1968 standard, rather than the 1977 or 1986 version.   EBCDIC  is  not  one
code  but a family of codes.  People misunderstand this.  In the U. S., end
users use several EBCDIC codes (U. S. standard EBCDIC, TN/T11 print  train,
and various coded fonts for the IBM 3800 printer series, and office systems
EBCDIC).    End  users  are confused because the same character will have a
different binary value assigned in  different  EBCDIC  codes,  and  certain
binary  values  will  have  different character assignments.   As a result,
users of IBM computers must be aware of the code being used.
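A modern tool makes the point concrete. Python's standard library includes codecs for two EBCDIC code pages, cp037 (U.S./Canada) and cp500 (International). They serve here only as stand-ins, because the summary does not name these particular code pages. The sketch looks up the same character in both code pages, and then the same byte in both.

```python
# cp037 and cp500 are Python codec names for two EBCDIC code pages.
CODE_PAGES = ("cp037", "cp500")


def code_points(ch: str) -> dict:
    """Return the single-byte value of `ch` in each code page."""
    return {name: ch.encode(name)[0] for name in CODE_PAGES}


def meanings(byte: int) -> dict:
    """Return the character that `byte` stands for in each code page."""
    return {name: bytes([byte]).decode(name) for name in CODE_PAGES}


if __name__ == "__main__":
    print(code_points("!"))  # {'cp037': 90, 'cp500': 79}, i.e. X'5A' vs X'4F'
    print(meanings(0x5A))    # {'cp037': '!', 'cp500': ']'}
```

The same "!" has two binary values, and the same X'5A' has two character assignments. These are exactly the two kinds of confusion described above.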

Translations between ASCII and EBCDIC Are Inconsistent.

     Depending  on  the  computer  and communications system, people obtain
different results when certain keys are struck on a PS/2 or ASCII terminal.
MVS uses translations different  from  VM;  communication  controllers  use
different  translations  than  protocol  converters; ASCII tapes have yet a
different translation.  End users cannot understand this.

     In addition, the IBM "standard" ASCII-to-EBCDIC translation  makes  no
sense  to  English-speaking U. S. and Canadian customers, or to anyone else
for that matter.  For example, to force an end user to type the  ASCII  "!"
to  enter  an  EBCDIC "|", and the ASCII "[" to enter an EBCDIC "!", simply
makes no sense| (oops) !
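Here is one way such surprises can arise, sketched under an assumption. Suppose the ASCII-to-EBCDIC table in a controller was built for one EBCDIC code page (cp500 here), while the host interprets the bytes in another (cp037). An ASCII "!" then arrives on the host as "|". The function name is hypothetical, and this is not a claim about how any particular IBM table was actually built.

```python
def typed_vs_seen(text: str, table_codec: str, host_codec: str) -> str:
    """Translate ASCII text with one EBCDIC table, then read it with another.

    Both codec names are stand-ins for whatever tables a real controller
    and host happen to use.
    """
    ebcdic_bytes = text.encode(table_codec)  # translation on the way in
    return ebcdic_bytes.decode(host_codec)   # interpretation on the host


print(typed_vs_seen("Hello!", "cp500", "cp037"))  # Hello|
print(typed_vs_seen("Hello!", "cp037", "cp037"))  # Hello!
```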

Required Characters Are Absent from ASCII and EBCDIC.

     Characters required for modern applications and programming languages
are missing from ASCII and EBCDIC.  High-level languages require
syntactically significant characters to have specific binary values.  For
example, the NOT symbol, "¬", must be X'5F'.  To compensate, many
installations modified the translate tables.  High-level languages
frequently allow alternate, multiple-character sequences for the missing
characters.  However, end users insisted on typing just one
character -- especially when the character is on the keyboard.  Also,
because U. S. standard EBCDIC lacks bracket characters, installations
defined EBCDIC-to-EBCDIC translate tables so that IBM 3270 terminals could
be used with IBM's high-level languages.
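The trigraphs of ANSI C are a typical case of the multiple-character alternates mentioned above. C spells characters such as "[" and "|" with three-character sequences that begin with "??". Below is a minimal sketch of the replacement a C compiler performs before any other processing. The table is C's own; the function name is ours.

```python
# The nine C trigraphs and the single characters they stand for.
TRIGRAPHS = {
    "??=": "#", "??(": "[", "??/": "\\", "??)": "]", "??'": "^",
    "??<": "{", "??!": "|", "??>": "}", "??-": "~",
}


def replace_trigraphs(source: str) -> str:
    """Replace trigraphs in one left-to-right pass, as C does."""
    out = []
    i = 0
    while i < len(source):
        seq = source[i:i + 3]
        if seq in TRIGRAPHS:
            out.append(TRIGRAPHS[seq])
            i += 3
        else:
            out.append(source[i])
            i += 1
    return "".join(out)


print(replace_trigraphs("a??(i??) = b ??! c;"))  # a[i] = b | c;
```

The one-character form is what end users want to type. The trigraph form survives any translate table, because it uses only characters common to every code.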

IBM's Apparent Character Set and Code Strategy Is Inadequate.

     IBM appears to have embarked on a strategy which will resolve many of
the problems.  It seems to be based on standardizing on the character set
of the ISO (International Organization for Standardization) 8859-1
standard, which contains most of the characters required for Western
European languages.  For EBCDIC, IBM created nine Country Extended Code
Pages by expanding the language-dependent EBCDIC codes to contain the full
character set.  For the PS/2, IBM created its PC Multilingual Code.  With
these changes, the Western European character set is available on all SAA
computers.
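Suppose a Country Extended Code Page really carries the full ISO 8859-1 character set. Then its 256 byte values should decode to exactly the 256 characters of ISO 8859-1, only in a different order. The short check below assumes that Python's cp037 and cp500 codecs, which correspond to CECPs 037 and 500, are faithful to IBM's tables.

```python
LATIN_1 = {chr(i) for i in range(256)}


def covers_latin1(codec: str) -> bool:
    """True if the 256 byte values of `codec` decode to exactly U+0000..U+00FF."""
    decoded = bytes(range(256)).decode(codec)
    return set(decoded) == LATIN_1


for name in ("latin-1", "cp037", "cp500"):
    print(name, covers_latin1(name))
```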

     Although  we  are  beginning  to  see  some benefits, this strategy is
inadequate.  It was designed so customers could avoid a data conversion.
However, IBM has never officially announced any strategy.  As a result,
installations in Europe and North America are diverging by focusing on two
different Country Extended Code Pages for the long term.

Requirements

     Because  the problems and issues are interrelated, customers demand an
integrated solution.  The primary objective is to preserve the  meaning  of
character  data  across  SAA  systems.    This  objective expands into four
different requirement categories.

1.  IBM needs an architecture for character sets and codes in SAA.  Many of
    the end user problems result not from a lack of standards but from  too
    many inconsistent standards.  IBM must focus on one EBCDIC code and one
    ASCII code for the Western European character set.  The paper refers to
    these  as  "Reference EBCDIC" and "Reference ASCII".  IBM must announce
    its direction so customers can  start  planning.    Implementing  these
    requirements  will  solve many issues of the first three problem areas.
    Not implementing them will (a) put IBM at a disadvantage relative to
    competitors (like Digital Equipment Corporation) which use the ISO
    8859-1 code, (b) allow the existing proliferation of code
    inconsistencies to continue, and (c) make solving the problems later
    much worse.  However,
    merely defining standards in SAA is insufficient.

2.  IBM  SAA  products  must  exploit the "Reference EBCDIC" and "Reference
    ASCII" codes.   People use computers for  applications.    Recall  that
    current  applications  only  support  specific  codes.   Therefore, SAA
    products must use the "Reference EBCDIC" and "Reference ASCII" codes.

3.  Installations require help migrating  to  the  "Reference  EBCDIC"  and
    "Reference  ASCII" codes. The migration period will extend over several
    years because customers face both IBM and non-IBM software  conversions
    and  have inventories of older equipment.  The primary concerns are (1)
    to migrate once, (2) to minimize difficulties during migration, (3)  to
    allow  each  installation  to choose its own migration plan, and (4) to
    provide tools to assist migration.  Implementing migration requirements
    will help customers rise above the mire of present problems.
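Two such code pages contain the same characters. A migration tool can therefore be as simple as a 256-byte translate table, built once and then applied to stored data. In the sketch below, cp037 is assumed to be the installation's current code and cp500 the target, for illustration only. The summary does not say which code page "Reference EBCDIC" would be.

```python
def conversion_table(source: str, target: str) -> bytes:
    """Build a 256-byte table that re-encodes `source` bytes as `target` bytes.

    Assumes both code pages hold the same 256 characters, so every
    character decoded from `source` can be encoded in `target`.
    """
    characters = bytes(range(256)).decode(source)
    return bytes(ch.encode(target)[0] for ch in characters)


TABLE_037_TO_500 = conversion_table("cp037", "cp500")
TABLE_500_TO_037 = conversion_table("cp500", "cp037")

stored = "x[1] != y".encode("cp037")            # data as it sits today
migrated = stored.translate(TABLE_037_TO_500)  # one pass over the file
print(migrated.decode("cp500"))                 # x[1] != y
```

Because the mapping is one-to-one, the reverse table undoes the conversion exactly. An installation could therefore convert once without risk of losing characters.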

4.  SHARE must become more involved in standards issues.  This is an issue
    not  for  IBM  but for SHARE.   SHARE must influence standards to avoid
    future problems.

     This summarizes the SHARE requirements for resolving the problems  and
issues.    They will not be easy to resolve.  If they were, customers could
have resolved them years ago.  Resolution will require difficult  decisions
for IBM and its customers.  Nevertheless, the decisions must be made.  Some
in  IBM  believe that nothing need be done now.  This is untrue because the
problems become worse every day.  SAA provides a unique opportunity for IBM
and its customers to break with past problems, and make a fresh start.  But
IBM must act quickly or lose the opportunity.  Act now!
30-Mar-89 20:32:22-GMT,6876;000000000411
Return-Path: <@cuvmb.cc.columbia.edu:ISO8859@JHUVM.BITNET>
Received: from cunixc.cc.columbia.edu by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA19127; Thu, 30 Mar 89 15:32:15 EST
Message-Id: <8903302032.AA19127@watsun.cc.columbia.edu>
Received: from CUVMB.COLUMBIA.EDU (cuvmb.cc.columbia.edu) by cunixc.cc.columbia.edu (5.54/5.10) id AA22424; Thu, 30 Mar 89 15:29:26 EST
Received: from CUVMB.CC.COLUMBIA.EDU by CUVMB.COLUMBIA.EDU (IBM VM SMTP R1.2) with BSMTP id 4600; Thu, 30 Mar 89 15:27:44 EST
Received: from BITNIC.BITNET by CUVMB.CC.COLUMBIA.EDU (Mailer X1.25) with BSMTP
 id 0924; Thu, 30 Mar 89 15:27:43 EST
Received: by BITNIC (Mailer X1.25) id 0137; Thu, 30 Mar 89 15:28:38 EST
Date:         Thu, 30 Mar 89 12:47:24 CST
Reply-To: ASCII/EBCDIC character set related issues <ISO8859%JHUVM@cuvmb.cc.columbia.edu>
Sender: ASCII/EBCDIC character set related issues <ISO8859%JHUVM@cuvmb.cc.columbia.edu>
From: Michael Sperberg-McQueen <U18189%UICVM@cuvmb.cc.columbia.edu>
Subject:      query about overstruck characters in ISO 8859
To: Frank da Cruz <SY.FDC@cunixc.cc.columbia.edu>

Johan van Wingen has pointed out several times in this forum that in ISO
8859, as opposed to ISO 6937, 646, and other earlier coded character
sets, it is illegal to use backspaces to overstrike two characters as a
method of obtaining a new character.  At least, that's what I understood
him to say.  ISO 8859-1 : 1987 (E) says (paragraph 7) "The use of
control functions, such as BACKSPACE or CARRIAGE RETURN for the coded
representation of composite characters is prohibited by ISO 8859."

I have two questions:  (1) just what sorts of activities are supposed to
be forbidden here? and (2) why?

To be more specific:  if I need to print a Serbo-Croatian word
containing a 'c' with an acute accent, I could probably do any of the
following things (depending on my system environment).  Which of them
are legal, and which illegal?  And can we construct a rationale for the
legality and illegality of each?  (= *should* they be legal?)

(a) embed the sequence 'c' BACKSPACE &acute. (hex 63 08 B4) in my file
(if I'm using an editor that allows me to embed backspace characters, as
some do and some don't) and let the printer, the display, and other
devices deal with it as best they can.  The display will probably show
me the acute, and the printer will do an overstrike, unless it's a line
printer, in which case I may get a variety of things but almost
certainly not what I want.

(b) use a Script command like ".dc bs <" and then use the combination
'c<&acute.' in my file.  Script will arrange to have the acute and the
'c' overstruck, either by issuing a backspace or by doing something
else.

(c) use the same Script command, and also define a Script symbol with
".sr cacute = 'c<&acute.'" or ".sr cacute = 'c&sysbs.&acute.'" and then
in my file use "&cacute." instead of "c<&acute."

(d) use some relevant system facility (either in Script or in a
microcomputer word processor) to define the width of hex B4 as 0.  Then
send the sequence hex B4 63 to the printer.

(e) use the editor or some (imaginary) Script facilities to embed a
sequence like ESC '-' 'B' (hex 1B 2D 42) at the beginning of my file to
set up ISO 8859-2 as my G1 character set, and then in my file embed
SHIFT-OUT X'66' SHIFT-IN (hex 0E 66 0F) for the acute-accented 'c'
(position 14/6 of ISO 8859-2, sent in its 7-bit form 6/6)

(f) embed the ESC '-' 'B' sequence in some way, use Script's symbol
facility to define ".sr cacute = &x'0E660F' " and then use "&cacute."
in my file as usual.
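For reference, here is how the ISO 2022 mechanism behind approaches (e) and (f) works in a 7-bit environment:

- ESC 2/13 4/2 (ESC '-' 'B') designates the right half of ISO 8859-2 as G1.
- SHIFT-OUT (0/14) invokes G1.
- SHIFT-IN (0/15) returns to G0.

Position 14/6 of ISO 8859-2, c with acute, is then sent as 6/6. The decoder below is a minimal sketch that handles only those pieces. The two-entry table of final bytes and the default G1 are assumptions.

```python
ESC, SO, SI = 0x1B, 0x0E, 0x0F

# Final byte F of ESC 2/13 F (designate a 96-character set to G1).
# Only two registrations are listed in this sketch.
G1_SETS = {ord("A"): "iso8859_1", ord("B"): "iso8859_2"}


def decode_7bit(data: bytes) -> str:
    """Decode a 7-bit ISO 2022 stream using only ESC 2/13 F, SO and SI."""
    g1 = "iso8859_1"  # assumed G1 if the stream designates nothing
    shifted = False   # True between SHIFT-OUT and SHIFT-IN
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b == ESC and i + 2 < len(data) and data[i + 1] == 0x2D:
            g1 = G1_SETS[data[i + 2]]  # KeyError for unlisted sets
            i += 3
            continue
        if b == SO:
            shifted = True
        elif b == SI:
            shifted = False
        elif shifted and 0x20 <= b <= 0x7F:
            # 2/0..7/15 of the shifted-in 96-set = 10/0..15/15 of the 8-bit table
            out.append(bytes([b | 0x80]).decode(g1))
        else:
            out.append(chr(b))
        i += 1
    return "".join(out)


print(decode_7bit(b"\x1b-Bku\x0ef\x0fa"))  # kuća
```

This also illustrates the practical worry in the question: every program that reads the file has to carry this state machine.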

If I understand the text of paragraph 7, approach (a) is clearly in
violation of the spirit and letter of the standard.  What about approach
(b)?  In my file, I'm not using any control characters to create
composite characters:  only graphics.  I don't expect any editor to
resolve the multi-character encoding for me and display an accented 'c'.
But I am, I admit, using backspace or CR in the printer stream (or if
the printer is more sophisticated, maybe something even more devious).
Or perhaps I'm not.  I don't know what Script97 does with the Xerox
9700; all I know is that the ".sr" command given should give me
something resembling the character I want on my output.

Approach (c) is much the same as (b), except that a lot of these symbols
are already defined at installation.  Is it a violation of the standard
to use them, if they produce backspaces in the printer data stream?

Approach (d) avoids the backspace in the data stream, but probably
violates another part of paragraph 7:  "None of these characters are
'non-spacing'."

Approaches (e) and (f) sound as though they are what the standards
committee expects us to do.  But given that very few pieces of software
will handle such escape sequences, I am not sure what paragraph 7 can
mean or is supposed to mean for sites, developers, or end users.  If I
cannot use character 11/4 (acute accent) to form composite characters,
why is it there?  For use in mathematics to distinguish symbols (K and
K' = K-prime)?  In that case it would be far better to use slots 11/4,
10/8, 11/8, and 10/15 to include Turkish, and define another single
character set for all sorts of mathematical symbols.  ("Lead us
not into temptation.")

I imagine the point of paragraph 7 must be to say that extension of the
character set to handle things like accented 'c' should be done through
the extension techniques defined by other ISO standards, and not by
overstriking characters of the ISO 8859 sets.  In an ideal world, all
the equipment would support ISO 8859-1 through -9, and ISO 2022 and so
on.  But in the real world -- is it considered a violation of ISO 8859
to use non-standard code extension techniques in order to make
non-conforming equipment produce appropriate results?  Our printer
probably doesn't have a-umlaut as a separate character.  Is it a
violation of paragraph 7 to write a printer driver that reads character
14/4 from a file and sends an overstrike sequence including BACKSPACE to
the printer?  Would it be a violation if the printer driver translated
from ISO 8859 to ISO 6937?
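The printer driver imagined in the question might look roughly like this. The stored text keeps one code per character (character 14/4 for a-umlaut). Only the stream sent to the printer is expanded into base letter, BACKSPACE, accent.

- The sketch borrows Unicode canonical decomposition from Python's unicodedata module. This is a convenience that no 1989 driver would have had.
- The ASCII stand-ins for the accents are assumptions about what a simple printer can overstrike.

```python
import unicodedata

BACKSPACE = "\b"

# Assumed ASCII stand-ins a simple printer could overstrike for each accent.
ACCENT_STANDINS = {
    "\u0301": "'",  # combining acute
    "\u0300": "`",  # combining grave
    "\u0302": "^",  # combining circumflex
    "\u0303": "~",  # combining tilde
    "\u0308": '"',  # combining diaeresis (umlaut)
}


def to_overstrike(text: str) -> str:
    """Expand accented letters into base, BACKSPACE, accent for the printer."""
    out = []
    for ch in text:
        base, *marks = unicodedata.normalize("NFD", ch)
        if marks and all(m in ACCENT_STANDINS for m in marks):
            out.append(base + "".join(BACKSPACE + ACCENT_STANDINS[m] for m in marks))
        else:
            out.append(ch)  # unchanged: no decomposition, or an accent we cannot fake
    return "".join(out)


stored = b"B\xe4r"  # ISO 8859-1 text; 14/4 is a-umlaut
printer_stream = to_overstrike(stored.decode("latin-1")).encode("ascii")
print(printer_stream)  # b'Ba\x08"r'
```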

Frankly, I find the blanket prohibition against use of BACKSPACE and CR
in paragraph 7 a bit confusing and don't believe I understand the logic
behind it.

I am involved in a large international project to formulate methods for
encoding literary and linguistic data in machine-readable form.  It is
important that we be able to recommend sound practice for encoding
diacritics.  To me, that means practice which agrees with relevant
standards.  But it is also essential that the recommended practice be
something that people can actually work with using the software that
exists.  So I am particularly interested in finding out what the
character set committee had in mind when they wrote paragraph 7.

-Michael Sperberg-McQueen
 Editor in Chief, ACH / ACL / ALLC Text Encoding Initiative
 University of Illinois at Chicago

31-Mar-89  2:05:08-GMT,2089;000000000001
Return-Path: <@cuvmb.cc.columbia.edu:ISO8859@JHUVM.BITNET>
Received: from cunixc.cc.columbia.edu by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA20280; Thu, 30 Mar 89 21:05:06 EST
Message-Id: <8903310205.AA20280@watsun.cc.columbia.edu>
Received: from CUVMB.COLUMBIA.EDU (cuvmb.cc.columbia.edu) by cunixc.cc.columbia.edu (5.54/5.10) id AA19487; Thu, 30 Mar 89 21:01:50 EST
Received: from CUVMB.CC.COLUMBIA.EDU by CUVMB.COLUMBIA.EDU (IBM VM SMTP R1.2) with BSMTP id 4791; Thu, 30 Mar 89 21:00:35 EST
Received: from BITNIC.BITNET by CUVMB.CC.COLUMBIA.EDU (Mailer X1.25) with BSMTP
 id 1560; Thu, 30 Mar 89 21:00:34 EST
Received: by BITNIC (Mailer X1.25) id 8444; Thu, 30 Mar 89 21:01:34 EST
Date:         Thu, 30 Mar 89 18:53:57 EST
Reply-To: ASCII/EBCDIC character set related issues <ISO8859%JHUVM@cuvmb.cc.columbia.edu>
Sender: ASCII/EBCDIC character set related issues <ISO8859%JHUVM@cuvmb.cc.columbia.edu>
From: Frank da Cruz <fdc%WATSUN.CC.COLUMBIA.EDU@cuvmb.cc.columbia.edu>
Subject:      Re: query about overstruck characters in ISO 8859
X-To:         ASCII/EBCDIC character set related issues
              <ISO8859%JHUVM@cuvmb.cc.columbia.edu>
X-Cc:         Christine M Gianone <cmg@watsun.cc.columbia.edu>
To: Frank da Cruz <SY.FDC@cunixc.cc.columbia.edu>
In-Reply-To:  Your message of Thu, 30 Mar 89 12:47:24 CST

We share your curiosity about the ISO8859 prohibition on composite
characters.  Not that it doesn't make sense -- ISO 8859 wants a character
to be a character, so that it is possible for character and string
oriented software to deal with text in a uniform way.  Hence ISO 8859
shuns the composite "character building" allowed by ISO 646, and *required*
by CCITT T.61.  Our curiosity, like yours, is about how mixed-alphabet
data is to be stored on disk.  This relates closely to an extension to the
Kermit file transfer protocol that we're working on, for transferring text
in mixed alphabets between unlike systems.  If you'd like to read & comment
on it, or want to be added to the "isokermit" discussion group, let us
know.  - Christine Gianone and Frank da Cruz

31-Mar-89 11:09:56-GMT,7097;000000000000
Return-Path: <@mitvma.mit.edu:KLENSIN@INFOODS.MIT.EDU>
Received: from mitvma.mit.edu by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA21290; Fri, 31 Mar 89 06:09:52 EST
Received: from INFOODS.MIT.EDU by mitvma.mit.edu (IBM VM SMTP R1.2) with TCP; Fri, 31 Mar 89 06:09:44 EST
Received: by INFOODS id <00002470066@INFOODS.MIT.EDU> ;
       Fri, 31 Mar 89 06:01:57 EST
Date: Fri, 31 Mar 89 05:24:36 EST
From: John C Klensin <KLENSIN@INFOODS.MIT.EDU>
Subject: Overstruck characters and 8859
To: Frank da Cruz <@mitvma.mit.edu:fdc@watsun.cc.columbia.edu>
X-Vms-Mail-To: EXOS%"Frank da Cruz <@mitvma:fdc@watsun.cc.columbia.edu>"
Message-Id: <890331052436.00002470066@INFOODS.MIT.EDU>

Frank,
  First of all, if you have an isokermit list going, please add
me to it.  Maybe, even though the newsletters seem to have stopped 
getting through to here, and info-kermit-request respondeth not, I can
get that. 
   Klensin@INFOODS.MIT.EDU

  I'm waiting until I have a chance to study the responses to the 
original question for a bit before I put together a response of my own
(which, by then, may not be necessary) but let me provide a piece of the
answer from a standards-policy viewpoint.
  One of the big problems with this evolving standards stuff is a global
lack of coordination. We are at a sufficiently primitive point that
"coordination" means "telling other people what you are doing", and we
are not doing very well at that. ANSI has just initiated its third--in
about as many years--attempt at a system for one standards developer to
notify others when new projects are initiated.  The other two fizzled
out into nothing in short order and, in at least some respects, the
ISO/IEC/CCITT situation is worse. 
  Now, against that backdrop, ISO/IEC JTC1/SC2 and its ANSI/X3
equivalent ought to be forced to (a) make clear statements about what
each of these character set standards is *for* and how each relates to,
and can be translated to and from, any of the others and (b) understand
that more alternatives is often a vice, not a virtue.  Otherwise, they
are headed, and heading us, rapidly down the path that IEEE 802 seemed
to be going down for a while: you can "standardize" any network physical
and link level technology you like, as long as you can write a clear
specification.  Better than not writing a clear specification, I suppose.
  SC22 (ISO programming languages) has finally (after umpity years) 
established a strong liaison with SC2 and is beginning to say "look 
guys, some of these things are impossibly difficult in use, and there 
are some things you have to specify".  It is not clear that will cure 
the problem.
  Anyway, CCITT's traditional goal has been clear--to transmit the 
maximum number of character representations down a communications line, 
with minimum switching around, and a minimum requirement for really 
fancy hardware at the far end.  Hence a lot of overstrike logic.  *Some* 
of the SC2 standards follow that tack, and are standards for 
transmission of characters over communications links.  But, if you are 
trying to do a programming language system--especially if you are trying 
to compare, catenate, or overlay character strings--variable-length 
logical characters (which is what a graphic BS graphic amounts to) are a 
pain in the neck to deal with.  Even the definition of the length of a 
string gets funny when length-in-"bytes" is not equal to length-in-
logical-characters.   So 8859 comes along, and, with good intentions and 
for good reasons, they say, or try to say, "no composite characters".
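The length problem is easy to demonstrate. The overstrike method stores c-acute as three octets ('c', BACKSPACE, acute accent 11/4), while ISO 8859-2 stores it as one. A routine that wants the logical length therefore has to recognise the "graphic BS graphic" pattern. The logical_length helper is a hypothetical sketch that handles only single overstrikes.

```python
BS = 0x08

composite = b"c" + bytes([BS]) + b"\xb4"  # c, BACKSPACE, acute accent (11/4)
precoded = "\u0107".encode("iso8859_2")   # c with acute, one octet in ISO 8859-2


def logical_length(octets: bytes) -> int:
    """Count characters, treating 'graphic BS graphic' as one character."""
    count = 0
    i = 0
    while i < len(octets):
        if i + 2 < len(octets) and octets[i + 1] == BS:
            i += 3  # graphic, BACKSPACE, graphic: one composite
        else:
            i += 1
        count += 1
    return count


print(len(composite), logical_length(composite))  # 3 1
print(len(precoded), precoded)                    # 1 b'\xe6'
```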
  And, of course, someone comes along and says "but I want to have 
composite characters, how do I do it?"  In the 8859 world, you don't.  
You have a code point and, at any point, you need to know which 8859 
element that code point is to be interpreted with respect to.  That 
combination of character set and code point--I know of one experimental 
implementation in progress that simply canonicalizes all of the 
switching into and out of 8859 sets into representing each "character" 
internally with two octets, the 8859/n set and the code--gives a unique, 
testable character, under a rule that two character sets means two 
different characters, even if the graphics are the same.  If you want a 
rule that says "if the graphics and/or character names are the same, the 
characters are the same", then you need a further canonicalizer that 
prefers, for example, low-numbered 8859 sets to high-numbered ones.  And 
dealing with multiple sets requires very high tech devices, which can 
understand all of them and, presumably, bit map characters onto the 
screen.  'Taint a $400 terminal.
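The two-octet representation, and the further canonicalizer described above, can be sketched with Python's iso8859_n codecs. Those codecs are modern editions of the tables, which is an assumption here. Each character is a (part, code) pair. canonical() rewrites the pair to the lowest-numbered part among ISO 8859-1 to -9 that has the same graphic, applying the "if the graphics are the same, the characters are the same" rule. The function names are hypothetical.

```python
def decode_pair(part: int, code: int):
    """Return the graphic at `code` in ISO 8859-`part`, or None if unassigned."""
    try:
        return bytes([code]).decode(f"iso8859_{part}")
    except UnicodeDecodeError:
        return None


def canonical(part: int, code: int, parts=range(1, 10)) -> tuple:
    """Rewrite (part, code) to the lowest-numbered part with the same graphic."""
    ch = decode_pair(part, code)
    if ch is None:
        raise ValueError(f"{code:#04x} is unassigned in ISO 8859-{part}")
    for p in parts:
        try:
            return (p, ch.encode(f"iso8859_{p}")[0])
        except UnicodeEncodeError:
            continue  # graphic not present in this part; try the next
    return (part, code)


print(canonical(2, 0xC1))  # (1, 193): A-acute also exists in ISO 8859-1
print(canonical(2, 0xE6))  # (2, 230): c-acute first appears in ISO 8859-2
```

Without the canonicalizer, (2, 0xC1) and (1, 0xC1) compare as different characters. That is the stricter rule described first.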

  The kermit meta-question depends on what you are trying to do, and 
what needs you are trying to solve and, to partially repeat what I've 
said earlier, the needs and requirements are different enough that I'd 
get out my ten foot pole and use it to define a boundary between "data 
transfer" and a lot of very complex data transformation issues.  Let me 
suggest a nasty analogy.  Plus or minus a certain amount of precision 
loss, it is possible to convert any floating point number representation 
into any other.  I don't much favor the idea, but it would be possible 
to invent a way of defining floating point formats, and to define a 
"kermit-standard" floating point.  You could then fix up an attribute 
packet that would say "this here file is completely in kermit-standard 
floating point" and expect that kermits at both ends would convert 
between local representations and that format.  Problem is that either
it would work only for files that contained nothing but floating point 
numbers, or you would have to invent a mechanism for flagging which 
values were floating point and which were something else.  The number of 
"pure" floating point files drops each year, especially since people 
want to transmit, e.g., array dimensionality, with their data files. 
And, right after you headed down that slippery slope, we would be 
talking about a general kermit self-describing file.
  I would think about this as a way to describe the "thing" that is 
being transmitted--an atomic file, if you will.  "thing" descriptions 
are pretty simple: 646Text.  You-better-not-mess-with-this-"binary". 
8859-1Text.  8859-nText, where "n" is another attribute.  Now, there is 
nothing wrong with T.61Text as a "thing", as long as no one has 
delusions about conversions between graphic stylizations associated with 
T.61 and characters associated with 8859-n being performed 
automatically, especially in poor, helpless, kermit programs as distinct 
from converters with lots of user-adaptable tables and heuristics of 
their own.  T.61, if I recall, lacks even the elementary required 
canonicalization rules that made string compares work on Multics (those 
are, effectively, designed around the "if it looks the same, it is the 
same" principle, something that 8859 implicitly disavows).

   john
   Identification of hat being worn as this is written:
     Chairman, ACM Standards Committee; Member, ANSI/ISSB.
   Klensin@INFOODS.MIT.EDU


31-Mar-89 16:05:07-GMT,37228;000000000001
Return-Path: <FDCCU@cuvmb.cc.columbia.edu>
Received: from cunixc.cc.columbia.edu by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA21708; Fri, 31 Mar 89 11:05:03 EST
Message-Id: <8903311605.AA21708@watsun.cc.columbia.edu>
Received: from CUVMB.COLUMBIA.EDU (cuvmb.cc.columbia.edu) by cunixc.cc.columbia.edu (5.54/5.10) id AA27286; Fri, 31 Mar 89 11:01:38 EST
Received: from CUVMB.CC.COLUMBIA.EDU by CUVMB.COLUMBIA.EDU (IBM VM SMTP R1.2) with BSMTP id 5027; Fri, 31 Mar 89 11:00:28 EST
Received: by CUVMB (Mailer X1.25) id 2511; Fri, 31 Mar 89 11:00:25 EST
Date: 03/31 10:11:34
From: FDCCU@cuvmb.cc.columbia.edu
Subject: PUN file from RSCS - MOSGLA.MAIL
X-Tag: FILE (4053) ORIGIN HLERUL2  MAILER    3/31/89  5:15:33 E.S.T.
To: fdc@cunixc.cc.columbia.edu
Reply-To: MAILER%HLERUL2@cuvmb.cc.columbia.edu

Date:    Fri, 31 Mar 89 16:57 CET
From:    "Johan van Wingen"                          <MOSGLA@HLERUL2>
To:      "M. Sperberg-McQueen"                <U35395@UICVM>,
         "F. da Cruz"                         <FDCCU@CUVMA>,
         "E. Hart"                            <HART@APLVM>
Subject: overstruck characters

Dear Character Overstrikers
By way of attempt to convince you that there are good reasons for
prohibiting composite characters in ISO 8859 I send here the revised
version of my ISO paper (Ed, you have seen the first version). It is
670 lines long.

  INTERNATIONAL ORGANIZATION FOR STANDARDIZATION


                                               ISO/IEC JTC1/SC2  N 1961R


                                               ISO/IEC JTC1/SC22 N  578R

                                                         September 1988
                                                     Revised April 1989
                                                            VERSION 1.2

  CODED CHARACTER SETS AND PROGRAMMING LANGUAGES

  Johan W van Wingen

  Leiden, the Netherlands


  Personal contribution


  Table of Contents

  0   Introduction
  0.1   The Problem
  0.2   Terminology, notations and conventions

  1   Coded Character Sets
  1.1   The birth of ASCII
  1.2   Extension of the character set
  1.3   Composite characters
  1.4   Multiple-byte character sets

  2   Languages
  2.1   Computer data processing
  2.2   Operating system considerations
  2.3   Basic elements of the language
  2.4   Problems of character representation
  2.5   Non-English languages and Information processing
  2.5.1   Linguistic skeleton of the language
  2.5.2   Identifiers
  2.5.3   Comments
  2.5.4   Handling textual data in the program
  2.5.4.1   Unrestricted strings
  2.5.4.2   Restrictions on string content and their validation
  2.5.4.3   The type "character"

  3   Sorting considerations

  4   Conclusions
  4.1   Recommendations to SC22
  4.2   Recommendations to SC2
  4.3   Unsolved issues

  Annexes

                  Entia non sunt multiplicanda praeter necessitatem.
                  (Entities are not to be multiplied beyond necessity.)
                                                   William of Occam

  0  INTRODUCTION

  0.1  The problem

  In recent years there has been an increasing demand for computer
  facilities that do not need the English language for their expression.
  In the field of International Standards this affects in the first
  place the work of ISO/IEC JTC1/SC2, Characters and Information Coding,
  because this committee develops the elementary tools for expressing
  everything dependent on language. SC22, Languages (for Information
  Processing) is one of the important users of these tools, and at the
  same time the primary target for requirements from non-English
  speakers. At its 1987 Washington meeting SC22 adopted two resolutions
  that formulated the principles of a future policy (see SC22 N 406,
  Resolutions 85 and 86).

  Up to now several papers have been produced on the subject (SC22 N
  113, 357, 403, 410, 444, 460, 470, 509; SC22/WG10 N 130, 204, 208, 211,
  213, 214), a number of them by the SC2/SC22 Liaison, Mr. Holka. These
  showed SC22 that the SC2 subject matter is far from simple and
  difficult to explain. In reaction, to N 410 in particular, the Convener
  of SC2/WG3 complained of inaccuracies, of non-standard terminology, and
  of a general ignorance of aspects of SC2's work (N 509). To resolve the
  issues he suggested a joint meeting of SC2 and SC22 delegates, an idea
  that deserves support. The present paper is intended as a first
  contribution to the working documents for that meeting, and as a
  renewed attempt at illustrating the relations between the SC22 and SC2
  products in a clear way, while acknowledging the valuable ideas and
  suggestions from Mr. Holka.

  This paper does not express any opinion of the Netherlands Member
  Body (NNI), not because of any disagreement with the content, but
  because taking any position is considered premature at the moment.

  0.2  Terminology, notations and conventions

  The terminology in this paper is that of the ISO standards in the
  field. The terms "bit pattern", "bit combination", "byte" are used
  almost as synonyms, "bit string" is not used. "Byte" is not restricted
  to 8-bit combinations. For those, "octet" is used instead. Bytes are
  denoted with the customary hexadecimal representation, but
  occasionally also according to the ISO convention (15/15 for FF).
  Where clear from the context, "character" means "graphic character".
  All graphic characters that are not letters or digits are called
  "specials". The terms "control character" and "control function" are
  used as defined in ISO 2022. Where "language" is used, it is in the
  sense of the SC22 scope, unless it can be derived from the context
  that "natural language" is meant.
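The ISO column/row notation simply splits the byte into its high and low four bits. Below is a small conversion sketch; the function names are illustrative.

```python
def iso_notation(byte: int) -> str:
    """Write a byte as column/row, e.g. 0xFF -> '15/15'."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("not an 8-bit byte")
    column, row = divmod(byte, 16)  # high four bits, low four bits
    return f"{column}/{row}"


def from_iso_notation(text: str) -> int:
    """Inverse of iso_notation: '14/4' -> 0xE4."""
    column, row = (int(part) for part in text.split("/"))
    if not (0 <= column <= 15 and 0 <= row <= 15):
        raise ValueError("column and row must each be 0..15")
    return column * 16 + row


print(iso_notation(0xFF), iso_notation(0x7F), hex(from_iso_notation("14/4")))
# 15/15 7/15 0xe4
```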

  1  CODED CHARACTER SETS

  1.1  The birth of ASCII

  The idea of coding data is rather old. For several purposes it
  appeared necessary to represent texts or numbers in a form other than
  spoken or written. The Morse code was an important step in a long
  development, as was the Hollerith punched card. The idea of having
  holes as a unit of information, the bit, was very fruitful, and could
  be generalized for use on electronic media. As early as 1931 the 5-bit
  TELEX code (CCITT # 2) was adopted, introducing the concept of bit
  pattern, or bit combination. In the course of time, three main areas
  of application emerged for representing data with bit patterns:

  1. Storage of data.
     Numerical results of the census could be stored
     in punched cards and manipulated in a simple way.
     Sorting in particular became easy to do.
  2. Transmission of data.
     Texts could be transferred by telex in an easier way
     than was possible by Morse code.
  3. Processing data by a computer.
     When computers were developed, bit patterns played
     an essential role. Storage and registers were organized
     in "machine words", bit patterns of fixed length. The most
     popular lengths were 24, 32, 36, 48, 60 and 64 bits.

  Increasing use of electronic methods necessitated the adoption of
  standards, which had to serve the areas of application where data
  interchange was of primary importance. Thus ASCII, a 7-bit code
  (characters mapped onto 7-bit patterns), appeared in 1963. An
  excellent description of the developments leading up to ASCII is found
  in the paper by Bemer (1) and the book by Mackenzie (2).

  ASCII provided codes (assigned bit combinations) for 94 graphic
  characters (26 letters, 52 after 1968, 10 digits and 32 specials), the
  SPACE and 33 control characters for control functions. The code table
  is in FIG 1. The control characters are in columns 0 and 1, the
  capital letters in 4 and 5, small letters (after 1968) in 6 and 7,
  digits in 3, SPACE at position 2/0, DELETE at 7/15, specials in the
  positions left over.
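The layout just described can be checked mechanically. The sketch below classifies every 7-bit code by column and row, following the post-1968 table, and counts the classes. It should reproduce 33 control characters, SPACE, 10 digits, 52 letters and 32 specials.

```python
from collections import Counter


def ascii_class(code: int) -> str:
    """Classify a 7-bit code by its position in the (post-1968) ASCII table."""
    if not 0 <= code <= 0x7F:
        raise ValueError("not a 7-bit code")
    column, row = divmod(code, 16)
    if column <= 1 or code == 0x7F:
        return "control"  # columns 0 and 1, plus DELETE at 7/15
    if code == 0x20:
        return "space"    # 2/0
    if column == 3 and row <= 9:
        return "digit"    # 3/0 .. 3/9
    if 0x41 <= code <= 0x5A:
        return "capital"  # 4/1 .. 5/10
    if 0x61 <= code <= 0x7A:
        return "small"    # 6/1 .. 7/10
    return "special"      # the positions left over


counts = Counter(ascii_class(c) for c in range(128))
print(dict(counts))
```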

  The structure of ASCII was designed to serve the first two areas of
  application well.

  -- By assigning bit patterns to the letters in ascending order without
  gaps, a contiguous "collating sequence" could be defined, easily
  implementable on an electronic device. (The old telex code did not
  possess this property.)

  -- By providing codes for control functions and making them easily
  recognizable by putting them together in two columns of the code
  tables, ASCII was well suited for transmission of data, text in
  particular.
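The value of a gap-free letter sequence is clearest where it is missing. Besides the telex code, EBCDIC is a well-known case: its letters are in ascending order but split into three runs. The sketch below uses Python's ascii and cp037 codecs, with cp037 standing in for EBCDIC.

```python
import string


def letter_codes(codec: str) -> list:
    """Byte values of A..Z in the given single-byte codec."""
    return [ch.encode(codec)[0] for ch in string.ascii_uppercase]


def is_contiguous(codes: list) -> bool:
    """True if each code is exactly one more than the previous one."""
    return all(b - a == 1 for a, b in zip(codes, codes[1:]))


ascii_codes = letter_codes("ascii")   # 0x41 .. 0x5A
ebcdic_codes = letter_codes("cp037")  # A-I, J-R, S-Z in three runs

print(is_contiguous(ascii_codes), is_contiguous(ebcdic_codes))  # True False
# Position of 'J' in the alphabet by subtraction: correct only without gaps.
print(ascii_codes[9] - ascii_codes[0], ebcdic_codes[9] - ebcdic_codes[0])  # 9 16
```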

  For internal processing by a computer, ASCII was not very well suited.
  A 7-bit machine word is hardly usable. For the internal representation
  of codes, 6-bit or 8-bit "bytes" were much better, as a 6-bit byte fits
  4, 6, 8 or 10 times into a 24-, 36-, 48- or 60-bit machine word, and an
  8-bit byte 4 or 8 times into a 32- or 64-bit word. Only DEC succeeded in
  putting 5 ASCII characters into a 36-bit word. It is no surprise that
  many computer manufacturers defined their own 6- or 8-bit coded
  character sets for their specific machine use. IBM's EBCDIC became
  particularly influential (FIG 1).
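The sketch below shows the arithmetic behind this paragraph. It also shows the five-characters-per-word layout as commonly described for DEC's 36-bit machines: five 7-bit codes, left-justified, with the low-order bit unused. The packing routine is an illustrative sketch, not DEC software.

```python
def fits(word_bits: int, byte_bits: int) -> tuple:
    """(bytes per word, bits left over) for a given word and byte size."""
    return divmod(word_bits, byte_bits)


def pack_ascii_36(text: str) -> int:
    """Pack five 7-bit ASCII characters into the high 35 bits of a 36-bit word."""
    if len(text) != 5:
        raise ValueError("exactly five characters per word")
    word = 0
    for ch in text:
        code = ord(ch)
        if code > 0x7F:
            raise ValueError("not a 7-bit ASCII character")
        word = (word << 7) | code
    return word << 1  # leave the lowest bit of the word unused (zero)


def unpack_ascii_36(word: int) -> str:
    """Inverse of pack_ascii_36."""
    word >>= 1
    chars = []
    for _ in range(5):
        chars.append(chr(word & 0x7F))
        word >>= 7
    return "".join(reversed(chars))


print(fits(36, 6), fits(36, 7), fits(32, 8))  # (6, 0) (5, 1) (4, 0)
word = pack_ascii_36("HELLO")
print(f"{word:036b}", unpack_ascii_36(word))
```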

  ASCII has another important property (not present in the old TELEX
  code): every character of the set has a unique code, and every bit
  combination has a unique meaning. The presence of 8-bit bytes in a
  computer poses a new problem. If we want to transfer collections of
  these outside the computer, ASCII provides no facilities. We may
  define certain 8-bit combinations as being equivalent to ASCII codes,
  but even then we are faced with the fact that 128 combinations are
  left without a clear meaning. To interpret these we would need what
  we could call a Standard for Charactered Code Sets. In other words, a
  standard (as it is now) specifies a mapping of every character it
  includes to a single byte; what we want is that it also says how
  every byte shall be interpreted as a character.
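  The difference between a mapping of characters to bytes and a mapping
  of every byte to a character shows up directly in Python's codecs. As
  an illustration of our own choosing, ASCII decoding defines only the
  first 128 octets, while ISO 8859-1 (Python's "latin-1" codec) gives
  every one of the 256 octets an interpretation.

```python
def decode_ascii(data: bytes) -> str:
    """ASCII gives a meaning to octets 0x00..0x7F only."""
    return data.decode("ascii")      # raises UnicodeDecodeError above 0x7F


def decode_latin1(data: bytes) -> str:
    """ISO 8859-1 (Python's 'latin-1' codec) maps all 256 octets."""
    return data.decode("latin-1")
```

  decode_latin1(b"caf\xe9") yields "café", whereas decode_ascii on the
  same bytes stops at the first octet it cannot interpret.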

  The essence of data communication by transmitting characters coded
  otherwise than in ASCII is "a previous agreement between sender and
  recipient of the data". The problem with transferring computer data
  to and from storage is that it is mostly not clear who the sender and
  the recipient are. Not only the coding should be defined, but also
  the interpretation. A statement that certain "bit combinations shall
  not be used" is as sensible as saying "a program shall not contain
  errors". It is the task of the programmer to code his program
  correctly, but the task of the compiler to interpret the sequence of
  bytes without stopping at the first unrepresented byte.

  1.2 Extension of the character set

  It was clear from the start that ASCII deserved an international
  status such as could be achieved under the responsibility of ISO.
  Because countries other than the US have different requirements for
  the contents of the coded character set, the approved document ISO R
  646 contains options for a number of positions in the code table.
  Once these options are exercised, the result is a National Version,
  ASCII being the US National Version. Unfortunately this implied that
  the principle of unique code-character correspondence was abandoned.

  With the rules of ISO R 646-1968 (revised in 1973 and 1983) it became
  possible to code texts in Danish or Swedish, which use a 29-letter
  alphabet, at the price of losing 6 specials (three extra letters, each
  in capital and small form). Further needs, such as accented letters
  (for French) or additional specials, could not be satisfied.
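  The trade-off can be illustrated with a small Python sketch of a
  7-bit decoder for a national version. The Danish/Norwegian assignment
  of Æ Ø Å æ ø å to the positions of [ \ ] { | } is given from our
  understanding of those national versions and should be checked
  against the national standards; the table and function names are
  hypothetical.

```python
# Assumed Danish/Norwegian national version of ISO 646: the six
# bracket/backslash/brace/bar positions carry the three extra letters.
DANISH_NORWEGIAN = {0x5B: "Æ", 0x5C: "Ø", 0x5D: "Å",
                    0x7B: "æ", 0x7C: "ø", 0x7D: "å"}


def decode_646(data: bytes, national: dict[int, str]) -> str:
    """Decode 7-bit data; national positions override the US meaning."""
    out = []
    for b in data:
        if b > 0x7F:
            raise ValueError(f"octet {b:#x} is not a 7-bit code")
        out.append(national.get(b, chr(b)))
    return "".join(out)
```

  The same bytes b"a[i] = {x}" read as "a[i] = {x}" with the US version
  (an empty override table) but as "aÆiÅ = æxå" with the Danish one:
  the six specials are gone.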

  For this purpose an extension scheme was devised, standardized as ISO
  2022. The idea is that different characters may be coded with the
  same bit combination. To indicate which character is meant, a control
  function SHIFT is inserted (several are defined), or an ESCAPE
  sequence with analogous effects. On reading (or receiving), each time
  a SHIFT or ESCAPE sequence is detected, the "state" of the reader
  changes, and a different code table is accessed. (Possible code
  tables may be a national version of ISO 646, or one registered by the
  Registration Authority.)
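  A minimal Python sketch of such a reader, assuming only the SHIFT OUT
  (0/14) and SHIFT IN (0/15) controls and two code tables. The G1 table
  here is a hypothetical three-letter Greek fragment, not a registered
  set, and for brevity an octet missing from the active table falls
  back to its ASCII meaning; real ISO 2022 decoding also has to handle
  the ESCAPE sequences that designate the tables.

```python
SO, SI = 0x0E, 0x0F  # SHIFT OUT / SHIFT IN control characters

# Hypothetical G1 table: a tiny fragment, for illustration only.
GREEK_G1 = {ord("a"): "α", ord("b"): "β", ord("g"): "γ"}


def decode_shifted(data: bytes, g0: dict[int, str], g1: dict[int, str]) -> str:
    """Decode a 7-bit stream in which SO selects G1 and SI returns to G0."""
    table = g0                      # the reader starts in the G0 state
    out = []
    for b in data:
        if b == SO:
            table = g1              # state change: later bytes read from G1
        elif b == SI:
            table = g0              # back to the default table
        else:
            out.append(table.get(b, chr(b)))
    return "".join(out)
```

  With an empty G0 override, decode_shifted(b"ab\x0eab\x0fg", {},
  GREEK_G1) gives "abαβg": the same octets for a and b mean different
  characters depending on the state.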

  ISO 2022 provides the means for coding an almost unlimited number of
  characters, each by a single but not unique bit combination. It is
  not restricted to 7 bits, but was later extended to include 8-bit
  coded character sets, as soon as the structure of these was defined
  in ISO 4873. Because reading data encoded according to ISO 2022
  requires a finite state machine with very many states, practical use
  has never been extensive. With the advent of hardware with 8-bit
  facilities, partial solutions for the more urgent problems became
  feasible. Nevertheless, ISO 2022 supplies the general method in all
  cases where switching of code tables is unavoidable. Rules are
  defined even for multiple-byte coded sets.

  ISO 4873 specifies the structure of 8-bit coded character sets, but
  does not define a single code table. It fixes the content of some
  areas, but for the rest only options are given. For the purpose of
  ISO 2022, sets are identified by a set designation C0, C1, G0, G1.
  Control characters occupy columns 0, 1 (C0) and 8, 9 (C1). Columns
  2-7 (G0) are identical to those of ISO 646 (including its options).
  For columns 10-15 (G1), options for 94 or 96 characters are
  specified. Thus ISO 4873 is only a generic standard, providing for
  188 (94 + 94) or 190 (94 + 96) graphic characters.
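  The column structure can be expressed as a small classification
  function. This Python sketch is ours, not part of ISO 4873. It treats
  2/0 (SPACE) and 7/15 (DELETE) separately, which is where the 94
  graphic characters of G0 come from, and it assumes that with a
  94-character G1 set the corner positions 10/0 and 15/15 are unused.

```python
def area(octet: int, g1_size: int = 96) -> str:
    """Name the ISO 4873 area of an octet, decided by its column."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("not an octet")
    column = octet >> 4
    if column <= 1:
        return "C0"
    if column <= 7:
        if octet == 0x20:
            return "SPACE"
        if octet == 0x7F:
            return "DELETE"
        return "G0"
    if column <= 9:
        return "C1"
    if g1_size == 94 and octet in (0xA0, 0xFF):
        return "unused"             # corners of a 94-character G1 set
    return "G1"
```

  Counting the graphic areas over all 256 octets gives 190 with a
  96-character G1 set and 188 with a 94-character one.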

  1.3  Composite characters

  In order to restrict the complexities of coding by the ISO 2022
  method, especially where hardware does not allow midstream code table
  switching, other approaches for extending the available number of
  graphic characters were recommended. Some characters can be
  represented by combinations of several other characters. Following the
  practice of overprinting, ISO 646 allows creation of composite graphic
  characters by the use of BACKSPACE and/or CARRIAGE RETURN. But it
  warns (on p. 7):  "According to clause 5 it is permitted to use
  composite graphic characters and there is no limit to their number.
  Because of this freedom, their processing and imaging may cause
  difficulties at the receiving end. Therefore agreement between sender
  and recipient is recommended if composite characters are used."

  To avoid this pitfall, ISO 6937 follows a different approach. There
  are simple and composite graphic characters. Some characters are
  coded with a single bit combination (digits, specials, letters of the
  Latin alphabet and some additional ones). Others are coded by a
  double one: the first representing a diacritical mark (non-spacing),
  the second a Latin letter (spacing). Arbitrary composite graphic
  characters are not allowed. The graphic characters defined by ISO
  6937-2 are restricted to those occurring in a "repertoire". Likewise,
  not all "duples" are permitted, only those included in the
  repertoire. It is assumed that these duples can be displayed by
  hardware (the "character imaging device") as one single graphic
  symbol. ISO 6937-2 defines a "primary" (G0) and a "secondary" (G1)
  set, which can be combined to form the graphic character part of an
  8-bit code (popularly, the left and the right hand of the table). In
  this way a unique, but mixed single/double octet representation of
  characters is created. All European languages and several others can
  thus be represented.
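  Reading such a mixed single/double octet stream can be sketched in
  Python. Only five diacritics are included, and their positions in
  column 12 are our assumption, to be checked against ISO 6937-2. The
  real repertoire is a list in the standard; as a stand-in, the sketch
  accepts a duple only if Unicode normalization (NFC) composes it into
  one precomposed letter, which approximates but does not equal the
  ISO 6937-2 repertoire.

```python
import unicodedata

# Partial, assumed table of non-spacing diacritics (column 12), mapped
# to the Unicode combining marks used here to display them.
DIACRITICS = {
    0xC1: "\u0300",  # grave
    0xC2: "\u0301",  # acute
    0xC3: "\u0302",  # circumflex
    0xC8: "\u0308",  # diaeresis
    0xCB: "\u0327",  # cedilla
}


def is_latin_letter(b: int) -> bool:
    return 0x41 <= b <= 0x5A or 0x61 <= b <= 0x7A


def decode_6937(data: bytes) -> str:
    out, i = [], 0
    while i < len(data):
        b = data[i]
        if b in DIACRITICS:
            # A duple: non-spacing diacritic first, then the spacing letter.
            if i + 1 >= len(data) or not is_latin_letter(data[i + 1]):
                raise ValueError(f"diacritic at offset {i} lacks a base letter")
            composed = unicodedata.normalize("NFC", chr(data[i + 1]) + DIACRITICS[b])
            # Stand-in repertoire check: one precomposed letter only.
            if len(composed) != 1:
                raise ValueError(f"duple at offset {i} is not in the repertoire")
            out.append(composed)
            i += 2
        elif b <= 0x7F:
            out.append(chr(b))      # single-octet character
            i += 1
        else:
            raise ValueError(f"octet {b:#x} not handled in this sketch")
    return "".join(out)
```

  decode_6937(b"caf\xc2e") gives "café": five octets, four characters.
  A diaeresis over c is rejected, since no such letter is composed.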

  ISO 8859 was developed to provide a unique single octet
  representation of graphic characters. Because not all desired
  characters can be accommodated in a 94+96 code table, ISO 8859 is in
  several parts, each defined for a particular region of the world and
  serving the needs of a group of languages (Europe: West, East, North,
  South; Cyrillic, Greek, Arabic, Hebrew). Each part contains the 94
  graphic characters (G0) of ASCII as a subset, supplemented by a
  varying set of 96 (G1) (FIG 2; IBM followed suit by extending its
  EBCDIC to contain the same 190 graphic characters, FIG 3). Thus the
  code of ISO 8859 is unique only in a restricted sense. Where graphics
  from different regions are to be combined in a text, switching
  techniques from ISO 2022 are required.
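  The restricted uniqueness is easy to observe with the ISO 8859 codecs
  that ship with Python (our choice of tool): one octet decodes to a
  different character in each part. The names PARTS and meanings are
  hypothetical.

```python
PARTS = ("iso8859_1", "iso8859_2", "iso8859_5", "iso8859_7")


def meanings(octet: int) -> dict[str, str]:
    """Decode one octet under several parts of ISO 8859.

    Raises UnicodeDecodeError if a part leaves the position unassigned.
    """
    return {part: bytes([octet]).decode(part) for part in PARTS}
```

  meanings(0xE6) gives æ (Latin-1), ć (Latin-2), ц (Cyrillic) and ζ
  (Greek), while an octet from the shared G0 half, such as 0x41, is "A"
  in every part.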

  1.4  Multiple-byte coded character sets

  Multiple-byte character sets have attracted a lot of attention in
  recent times. This might suggest that the concept is clearly defined.
  But it is not, and it is not even a new one. So far four schemes have
  emerged:

  --- That of ISO 646. Characters may be represented by 1,3,5 or more
  bytes, by use of sequences such as "char" BACKSPACE "char", and so on.

  --- That of ISO 6937-2. Graphic characters may be formed from
  diacritic plus letter, giving a mixed single/double byte
  representation.

  --- The scheme used by the Chinese, Japanese and Korean national
  standards. A graphic character is represented by two bytes, each taken
  from a 7-bit set with 94 positions, the same as is used for the
  graphic characters in ASCII.

  --- A standard in development by SC2/WG2, to which the number DP 10646
  now has been assigned. All imaginable characters of the whole world
  (except cuneiform and hieroglyphs) are uniquely represented with 4
  bytes per character. This is the price to be paid for doing without
  ISO 2022.

  Besides these four, several schemes have been invented and used in
  Japan and China that mix single and double byte character
  representations, in a bewildering variation.
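  The third scheme rests on simple arithmetic: two octets, each drawn
  from the 94 positions 2/1 to 7/14, address a 94 x 94 table of rows
  and cells. A Python sketch, with hypothetical names:

```python
def row_cell(pair: bytes) -> tuple[int, int]:
    """Map a two-octet code to (row, cell) in a 94 x 94 table.

    Each octet must lie in 0x21..0x7E, the 94 graphic positions of a
    7-bit set; rows and cells are then numbered 1..94.
    """
    if len(pair) != 2 or not all(0x21 <= b <= 0x7E for b in pair):
        raise ValueError("need two octets, each in 0x21..0x7E")
    return pair[0] - 0x20, pair[1] - 0x20


CAPACITY = 94 * 94  # positions available in one double-octet set
```

  One such set offers 8836 positions, and any two ASCII graphic
  characters, for instance "ST", form a valid pair (row 51, cell 52).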

  2.  LANGUAGES

  2.1  Computer data processing

  Computer data processing consists, in its simplest form, of a program
  operating on data. Data has to be represented in bit patterns, that
  is, in machine words, or parts thereof that can be addressed by the
  hardware organization. Some of these parts may be considered as a
  character, or better yet, as the internal representation of a single
  character. Mostly, the bit patterns of these representations
  are not identical to those found in any ISO standard.

  A program is written in a language. It may exist on paper, or even
  only in the mind of the programmer. But as soon as it is prepared for
  input into the computer it consists of a sequence of character
  representations, perhaps divided into lines by some device. Once
  stored in the computer, there is no intrinsic difference between a
  program and data. Both need representation. After the program has been
  compiled to an executable form, its representation has changed, but is
  still expressed in the bit pattern of the machine. What makes data a
  (potential) program is the place in storage, a matter of
  interpretation by the computer.

  2.2  Operating system considerations

  Computers at present do not run single programs in isolation. It is the
  Operating System that, as one of its tasks, performs the data
  management. It assigns meaning to some data, and decides on the shape
  of output or on the validity of input, quite often outside the control
  of the program that is supposed to handle these. In fact, speaking in
  transmission terms, it acts both as "sender and recipient of the
  data". Because of that, it generally stores the description of the
  nature of the data outside the data itself, contrary to the practice
  in transmission ("announcers" being part of the data stream).

  Another aspect of the data management is the division of data into
  "records" (and sometimes "blocks"). This may be a different concept
  from that of "line", such as is defined by the language of the
  application program.

  2.3  Basic elements of languages

  A program is, according to the language definition, always built from
  basic elements. They may be called "characters" or "basic symbols",
  but they are essentially of an immaterial nature. They may look like
  letters or mathematical symbols, but some of them may never be used
  outside that particular programming language, like some APL
  characters. To be suitable for input into a computer they must be
  transliterated into sequences (or perhaps lines of sequences) of those
  characters the computer can represent. The transliteration rules, from
  the abstract character (used in the language definition) to the
  concrete character of the machine, are traditionally called "hardware
  representation". The concrete character sequences on the input medium
  are read in by the computer, and processed by the language compiler.
  This external representation of the basic element will then be
  converted again, now to the internal representation used by the
  compiler, often an integer value.

  The importance of this step is sometimes ignored by defining the
  character set of the language (if there is one so called, otherwise
  the list of all the symbols needed for expressing a program in the
  language) as identical to that of an existing standard for coded
  character sets, without provision for any extension. At this point
  the restriction of the elements to English expression creeps in. Some
  language designers have even been so silly as to use brackets and
  braces without substitutes, not realizing that this precludes the
  language's use in Scandinavia, where these characters are replaced by
  letters of the extended alphabet in use there.

  2.4  Problems of character representation

  Even with a single-byte character representation, coded programs
  generally have to be translated into the character set of the actual
  computer at input. This can at least be a simple one-to-one process.
  But if characters are represented by a varying number of bytes,
  things grow worse, and more so when the hardware representation is
  not unique. APL uses graphic characters not found in any ISO standard.
  These can be produced as composite characters using BACKSPACE
  (abbreviated to $ in what follows). Now, if we want an underlined
  capital A, we can write A$_ or _$A (with ISO 646), or _A (with
  ISO 6937-2). Which of these is acceptable is a matter of hardware
  representation: one may be accepted, or both, or several others.

  In ALGOL 60 the word symbol end (printed underlined) is a single
  basic element, which can be written as e$_n$_d$_ or _$e_$n_$d, but
  also as end$$$___ or ___$$$end, with a surprising number of other
  combinations possible (ISO 6937-2 allows only _e_n_d, which is an
  improvement). If all of this is permitted we have created the "line
  reconstruction problem", which was solved in the ALGOL 60 compiler
  for the Ferranti ATLAS (4). Few people nowadays are prepared to
  accept these complexities. Should an ISO 2022 style of coding be
  permitted, including code table switching in the middle of program
  lines, then designing an adequate hardware representation scheme
  requires real genius.
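  The core idea of line reconstruction can be sketched in Python:
  replay the line onto print positions, letting $ (BACKSPACE) move back
  one position, and record the set of characters struck at each
  position. This is our illustration of the problem, not the algorithm
  of the ATLAS compiler; the names are hypothetical, and a space is
  treated as an ordinary struck character.

```python
BS = "$"  # BACKSPACE, written as $ as in the text


def reconstruct(line: str) -> tuple[frozenset[str], ...]:
    """Return, for each print position, the set of characters struck there."""
    columns: dict[int, set[str]] = {}
    pos = 0
    for ch in line:
        if ch == BS:
            pos = max(pos - 1, 0)   # cannot back up past the margin
        else:
            columns.setdefault(pos, set()).add(ch)
            pos += 1
    width = max(columns, default=-1) + 1
    return tuple(frozenset(columns.get(i, ())) for i in range(width))
```

  All four spellings of the underlined end given above reconstruct to
  the same three positions {e,_}, {n,_}, {d,_}, so a compiler can
  compare them after this step instead of comparing raw byte sequences.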

  There is another point to consider. In certain parts of a program
  literal use of characters is needed (in strings, for example). They
  have to be stored and handled as such by the compiler, and thus an
  internal representation is required, in contrast with the external
  representation in which they are read in. If they are always coded
  with the same number of bytes, both representations can be made
  identical. Otherwise many implementers may have to resort to an
  internal mapping onto integers. Then it would be difficult to use the
  hardware of present-day octet machines efficiently. If integer
  arithmetic becomes necessary for handling characters, the clock has
  been put back to the situation of 25 years ago, when Fortran and
  Algol provided only this method for character processing.

  The conclusion must be drawn that, other than in exceptional
  situations, only coded character sets that are unmixed and unshifted,
  not permitting the use of BACKSPACE, are acceptable for coding a
  program text. Otherwise, strict and perhaps complex rules are required
  to ensure a unique hardware representation. These sets are ASCII, and
  the single parts of ISO 8859, that is ISO 4873 without shift (called
  Level 1). A consistent double-octet scheme, such as found in the
  "west-pacific" standards, may also be considered.

  2.5  Non-English languages and Information Processing

  Traditionally, the Information Processing world is English speaking
  only. Now that access to this world is no longer reserved to an
  intellectual elite, this practice has become an untenable barrier to
  large groups of people. The extent of the problems has been
  excellently summarized in the SEAS White Paper on National Language
  support (3). With programming languages we distinguish four areas
  requiring attention.

  2.5.1  Linguistic skeleton of the language

  Almost every language has a number of elements looking like words from
  the English language. Some have been assigned a fixed role, some may
  be chosen freely. Those that are fixed, together with some specials
  (brackets, separators, delimiters), constitute the linguistic skeleton
  of every program. According to the specific language definition, they
  may be called "word symbols", "reserved words" or "keywords". They are
  supposed to show a program as a running text in English. It is clear
  that it is not generally possible to translate every single word into
  another natural language without violating its syntactic rules. Some
  statements may even become ambiguous. Language definitions explicitly
  containing provision for expressing programs in languages other than
  English are scarce, that of ALGOL 68 being the most notable. The ALGOL
  68 Report has been translated successfully into Bulgarian and Chinese
  (5). But in general it may be advisable to keep the word symbols as
  they are in English.

  2.5.2  Identifiers

  For naming quantities, variables, labels and so on, "identifiers" are
  commonly used. These word-like constructs are mostly defined as
  starting with a letter, and continuing with letters, digits and,
  sometimes, the low line. The problem lies in the definition of
  "letter". In the beginning only capital letters were allowed, but even
  after adding 26 small ones the Scandinavian languages cannot be
  served. As compilers to a large extent are US products (or written
  elsewhere with an eye on the US market) they use ASCII or EBCDIC. This
  means that characters from a national version of ISO 646 are
  interpreted as specials (brackets etc.). Applying an 8-bit code like
  one from ISO 8859 is the way out. But even then, the compiler has to
  be told which part of 8859 the program makes use of, because what may
  be a letter in one part may be a special in another (FIG 4).
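  The dependence on the chosen part can be checked mechanically. The
  Python sketch below uses Unicode general categories (letter
  categories begin with "L") as a stand-in for a compiler's definition
  of "letter"; that criterion and the function name are our
  assumptions. Octet 13/7 (0xD7) is the multiplication sign in ISO
  8859-1 but a Cyrillic small letter in ISO 8859-5.

```python
import unicodedata


def is_letter(octet: int, part: str) -> bool:
    """True if the octet denotes a letter in the given ISO 8859 part."""
    ch = bytes([octet]).decode(part)
    return unicodedata.category(ch).startswith("L")
```

  is_letter(0xD7, "iso8859_1") is False, while is_letter(0xD7,
  "iso8859_5") is True: the compiler cannot decide without being told
  the part.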

  Mixing characters from several parts of ISO 8859 requires invoking
  the help of ISO 2022, which a compiler writer would not like. Checking
  whether a byte is meant to be a letter would be easier if the letter
  areas of ISO 8859 had been contiguous. Instead, the quite obsolete
  characters for multiply and divide, for which * and / have been used
  in programs for more than 25 years, have been inserted in the middle
  of a column. A look-up table is required to decide whether a
  character is considered a letter or not. This cannot be avoided
  anyway if a double-octet representation is used.
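  A sketch of such a table for ISO 8859-1, in Python: 256 booleans
  built once, then consulted per octet. A range test such as
  0xC0 <= b <= 0xFF would wrongly accept the multiplication sign (0xD7)
  and the division sign (0xF7). The identifier rule (a letter first,
  then letters, digits or low line) and all names are illustrative
  assumptions, as is the use of Unicode categories to fill the table.

```python
import unicodedata


def letter_table(part: str) -> list[bool]:
    """Build a 256-entry table: True where the octet is a letter in `part`."""
    table = []
    for octet in range(256):
        ch = bytes([octet]).decode(part)
        table.append(unicodedata.category(ch).startswith("L"))
    return table


LATIN1_LETTERS = letter_table("iso8859_1")


def is_identifier(name: bytes) -> bool:
    """Letter first, then letters, digits (0x30..0x39) or low line (0x5F)."""
    if not name or not LATIN1_LETTERS[name[0]]:
        return False
    return all(LATIN1_LETTERS[b] or 0x30 <= b <= 0x39 or b == 0x5F
               for b in name[1:])
```

  With this table, "größe" encoded in ISO 8859-1 is a valid identifier,
  while a name containing 0xD7 is not.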

  2.5.3  Comments

  In general a comment does not present problems when containing any
  byte, if only it is clear where it stops (or begins). If a ";"
  (semi-colon) is defined for stopping, and the hardware representation
  transliterates this as ".," unintended effects may occur, especially
  if spaces are ignored. Besides this, it does not matter which
  characters the bytes represent, even if some of them cannot be
  displayed properly.

  2.5.4  Handling textual data in the program

  In order to produce output of text, means to handle its elements must
  be provided. The usual method is a "string", also called "text
  constant" or "text literal". A string need not consist of single
  characters; it may even be nested, in some languages. If we take the
  simplest form, it remains to be specified what a string can contain:
  "graphic characters", or "anything that is allowed by the processor",
  say "bytes".

  -- If "graphic characters", some may be represented by a single byte,
     some by a double. There may be a SHIFT character in it, causing a
     change of character set from capital to small letters, or to a
     different national alphabet. All this can be implemented, and has
     been, as early as 1962, in the Dijkstra ALGOL 60 compiler for the
     X1. However, problems arise when operations on strings are
     introduced, which are not provided in the definition of ALGOL 60.

  -- If "anything", bytes may be all 6-bit, or all 8-bit, as with the
     48-bit word Burroughs computer, where a word may contain 8 6-bit
     bytes, or 6 8-bit bytes, necessitating the inclusion in a program
     of a more precise description of the kind of string that is meant,
     with a type indicator.

  Only if the program is coded according to a standard, that is, one
  defining a unique relationship between all bytes and all characters,
  does it become possible to have characters and bytes indiscriminately
  as elements of a string, and to introduce counting the number of
  those elements without ambiguity. Internal and external
  representation can be made identical. If not, the character count can
  differ from the byte count, and string parsing is necessary.
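  For a mixed single/double-byte scheme of the Japanese kind, counting
  characters means parsing. The Python sketch below assumes
  Shift_JIS-style lead-byte ranges (0x81-0x9F and 0xE0-0xFC); Shift_JIS
  is our choice of a representative scheme, and the function name is
  hypothetical. Python's own shift_jis codec is used only to produce
  sample data.

```python
def count_sjis_chars(data: bytes) -> int:
    """Count characters in a Shift_JIS-style byte stream.

    Assumed rule: an octet in 0x81..0x9F or 0xE0..0xFC starts a
    two-byte character; any other octet is a character by itself.
    A truncated final pair is still counted as one character.
    """
    count, i = 0, 0
    while i < len(data):
        lead = data[i]
        i += 2 if (0x81 <= lead <= 0x9F or 0xE0 <= lead <= 0xFC) else 1
        count += 1
    return count


sample = "ABC\u3042\u3044".encode("shift_jis")  # 3 Latin letters + 2 hiragana
```

  Here the byte count is 7 but the character count is 5, and the latter
  can only be found by examining every lead byte.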

  There remains the question, even with a single octet character
  representation such as that from ISO 8859, what to do with a byte
  that is not "printable", because there is no graphic character defined
  for it. It should be remembered, however, that actual printing is
  outside the control of the application program, and left to the
  supervision of the operating system, which may process options as to
  the selection of a printing font, or a coded character set. This may
  result in something quite different from that in which the program
  originally was coded (FIG 5,6).

  FIG 5 shows a little program (in SNOBOL) that converts Greek text in
  Latin transliteration to one in single Greek letters (even with
  diacritics). It is printed with the normal printing font which does
  not include these Greek letters, which are thus invisible (though
  present) in the result string (HEXALL). But exactly the same program
  can also be printed (FIG 6) with a Greek font corresponding to a Greek
  character set, which shows the contents of the string clearly, but the
  identifiers all in Greek. To the compiler it does not make a
  difference, because all the bytes are the same. FIG 7 demonstrates
  that if a proportional font is used, a table layout may be completely
  obscured. In that style a program cannot be understood from its
  printout alone.

  It is the task of the operating system as well, to deal with the
  control characters or sequences in the text. There are two aspects to
  be envisaged: presenting the program text for inspection and
  understanding, and specifying certain actions to be performed by the
  output.

  As ISO 6429 specifies, the control functions indicated may be
  disabled, and interpreted as graphic characters, by changing the
  "mode" of a device on detecting a specific control function in the
  data stream, and restoring the action by another. ISO 2047 specifies
  graphic representations for the control characters of ISO 646. Thus
  every byte of a program can be shown, if only the ISO standards have
  been implemented. Modern terminals like the DEC VT340 allow setting
  the "mode" at will, and then display the text on the screen according
  to the option chosen.
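  The effect of such a display mode can be imitated in software. The
  Python sketch below shows C0 controls and DELETE as visible symbols.
  It uses the Unicode Control Pictures block (U+2400 onward) as a
  stand-in for the ISO 2047 symbols, which it does not reproduce, and
  leaves all other octets (C1 included) as their ISO 8859-1 characters.
  The function name is hypothetical.

```python
def show_controls(data: bytes) -> str:
    """Render C0 controls and DELETE visibly, via Unicode Control Pictures."""
    out = []
    for b in data:
        if b < 0x20:
            out.append(chr(0x2400 + b))   # e.g. ESC 0x1B -> U+241B
        elif b == 0x7F:
            out.append("\u2421")          # symbol for DELETE
        else:
            out.append(chr(b))            # other octets as ISO 8859-1
    return "".join(out)
```

  show_controls(b"A\x1bB\r\n") yields "A␛B␍␊", so every byte of a line
  remains visible instead of acting on the display.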

  As for specifying actions in a program regarding the output, the
  desired effect may be realized either by putting a certain character
  sequence in an output string including control functions, or by
  calling a library function. In fact, both methods should be
  available, because neither can cover all situations. There will never
  be enough library functions to create the effects to be caused by a
  specific byte sequence. On the other hand, for example, detection of a
  NEW LINE character by the operating system may not result in the
  effect intended. Requiring transfer to a new output line by CALL
  NEWLINE may cause, in a fixed length record environment, the storing
  of the current line, padded at the right by spaces up to the required
  length. In a variable length record environment, it may put the number
  of bytes in the current line before it, and store the whole. Or it may
  simply store the current line with a new line character attached at
  the end. (The use of the first byte of a record for printer control is
  not considered here for simplicity.) All this should be kept outside
  the control of the application program, and thus not defined by the
  programming language.
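  The point that the record format belongs to the operating system can
  be sketched in Python: the same NEWLINE request stores different
  bytes in the three environments described above. The format names,
  the default record length and the two-byte big-endian length prefix
  are hypothetical choices for illustration; real variable-length
  record formats differ in detail.

```python
import struct


def store_line(line: bytes, fmt: str, record_length: int = 80) -> bytes:
    """What a NEWLINE request might store, depending on the record format."""
    if fmt == "fixed":
        if len(line) > record_length:
            raise ValueError("line longer than the fixed record length")
        return line.ljust(record_length, b" ")      # pad with spaces
    if fmt == "variable":
        return struct.pack(">H", len(line)) + line  # byte count first
    if fmt == "stream":
        return line + b"\n"                          # new line character appended
    raise ValueError(f"unknown record format {fmt!r}")
```

  The application only asks for a new line; whether b"ABC" becomes
  b"ABC     ", b"\x00\x03ABC" or b"ABC\n" is decided outside it.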

  2.5.4.1  Unrestricted strings

  If these problems have been sorted out, and a string is to contain
  octets only, without any restriction, regardless of their meaning in
  any code, operations on strings do not pose difficulties. A type
  "string" can be introduced, with string constants and string
  variables. A LENGTH function can be provided, substrings can be
  defined, starting from a given element number to another, and
  concatenation is possible. The string can behave like an octet
  array. Because any stream of octets can be produced, files can be
  prepared and sent, which can be read by the recipient according to the
  rules of ISO 6937-2, or even ISO 2022. No special provision is
  required for double-octet character sets, if these are carefully
  designed like the Chinese, Japanese and Korean sets. The string
  'STANDARD' will then be printed without hesitation by a Japanese
  printer as four Japanese characters (of course with a quite different
  meaning).
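  The 'STANDARD' example can be reproduced with Python's iso2022_jp
  codec, our choice of tool. The eight ASCII octets are wrapped in the
  escape sequences ESC $ B and ESC ( B, which switch to the JIS
  double-octet set and back to ASCII, so that each pair of letters
  becomes one row/cell position. The function name is hypothetical.

```python
def reinterpret_as_jis(ascii_text: str) -> str:
    """Read 7-bit ASCII octets as pairs in the JIS double-octet set."""
    raw = ascii_text.encode("ascii")
    if len(raw) % 2:
        raise ValueError("a double-octet set needs an even number of bytes")
    return (b"\x1b$B" + raw + b"\x1b(B").decode("iso2022_jp")
```

  reinterpret_as_jis("STANDARD") returns four kanji, none of which has
  anything to do with the English word.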

  2.5.4.2  Restrictions on content of strings and their validation

  It is only if certain restrictions are put on the contents of strings
  that things become complicated again. This may happen if it is
  required that a certain string have an even length, because
  double-byte characters will be put into it. Also, restrictions may be
  introduced regarding the validity of some octets (making illegal those
  that are not "printable"). It is conceivable to define string types
  that have the desired property, and have them checked for syntax by
  the compiler. But library functions can perform the checking for
  validity on string arguments equally well, without further
  complicating the string syntax of the language.
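  Such checking needs no new syntax. A Python sketch of a library-style
  validator follows, with two illustrative rules: an even length, and
  every octet within the 94 graphic positions 0x21-0x7E used by
  double-octet sets of the Japanese kind. The function name and the
  rules chosen are assumptions.

```python
def check_double_octet(s: bytes) -> None:
    """Raise ValueError unless `s` can hold whole double-octet characters."""
    if len(s) % 2:
        raise ValueError("odd length: cannot hold whole double-octet characters")
    bad = [i for i, b in enumerate(s) if not 0x21 <= b <= 0x7E]
    if bad:
        raise ValueError(f"non-graphic octets at offsets {bad}")
```

  The string syntax of the language stays unchanged; a program simply
  calls check_double_octet before handing a string to a double-octet
  device.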

  2.5.4.3  The type "character"

  Several languages know a type "character" for a single-byte string
  (the CHARACTER type in FORTRAN is in fact a string type). Longer
  strings may be defined as character arrays. A function ORD can be
  defined with a string argument, giving the byte as an integer value.
  Conversely, a function CHAR with an integer argument may deliver a
  character. This scheme presupposes a single-byte and unique character
  representation. A double-byte but unique code requires an appropriate
  new type, and new functions, but no special tricks. All others cause
  a mess.
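  In Python terms (a sketch with hypothetical names), ORD and CHAR for
  a single-byte unique code are a trivial bijection on 0..255, and a
  double-byte unique code needs only a second, equally simple pair of
  functions:

```python
def ord_byte(c: bytes) -> int:
    """ORD for a one-byte 'character': the octet as an integer."""
    if len(c) != 1:
        raise ValueError("ORD needs exactly one byte")
    return c[0]


def char_byte(n: int) -> bytes:
    """CHAR: the inverse of ORD for values 0..255."""
    if not 0 <= n <= 0xFF:
        raise ValueError("CHAR needs a value in 0..255")
    return bytes([n])


def ord_double(c: bytes) -> int:
    """ORD for a double-byte unique code: two octets -> 0..65535."""
    if len(c) != 2:
        raise ValueError("a double-byte character has exactly two bytes")
    return (c[0] << 8) | c[1]


def char_double(n: int) -> bytes:
    """CHAR for a double-byte unique code: the inverse of ord_double."""
    if not 0 <= n <= 0xFFFF:
        raise ValueError("value out of range for two bytes")
    return bytes([n >> 8, n & 0xFF])
```

  With a mixed or shifted code no such pair of inverse functions exists,
  because the number of bytes per character is not fixed.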

  3  SORTING CONSIDERATIONS

  The topic of sorting belongs only partially to the subject of
  programming languages. It is only because some of them know the
  concept of "collating sequence" that it is dealt with here.
  Historically, it was of utmost importance that numbers could be sorted
  on the basis of their bit representation. Also, in the period of
  capital letters only, putting words into alphabetic sequence could be
  performed based on a collating sequence defined by the code table. But
  when requirements became more subtle (as with a telephone directory or
  a lexicon) bit patterns were of little help. Thus the merit of
  having letters contiguously in a code table has become increasingly
  insignificant. Non-English languages may even have a varied order for
  the same accented letters, or assign letter combinations an unexpected
  place in their alphabet. Further discussion can be found in Mackenzie
  (2) and in the SEAS White Paper (3).

  One aspect should be pointed out, however. Many compilers are able to
  produce an alphabetically sorted identifier list from the program. If
  identifiers are to contain characters other than those from ASCII, it
  is unclear how these are to be sorted. It may be that the order from
  the chosen part of ISO 8859 is kept, but that may not correspond to
  national usage. But one should not confuse "ordering" with "sorting".
  Only with one-case Latin letters can sorting be derived directly from
  the ordering according to the numeric value of the codes; in all other
  cases a key transformation is needed. If some identifiers are to be
  read from right to left (Hebrew, Arabic) more problems turn up.
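  A key transformation for Swedish can be sketched in Python. Swedish
  places Å, Ä, Ö after Z, in that order, whereas ordering by ISO 8859-1
  code values gives Ä (0xC4), Å (0xC5), Ö (0xD6) and also puts every
  capital before every small letter. The rule below (fold to capitals,
  then rank Å Ä Ö after Z) is a deliberately simplified assumption, not
  a full national collation; the names are hypothetical.

```python
# Swedish letters that follow Z, with their rank among themselves.
SWEDISH_TAIL = {"Å": 1, "Ä": 2, "Ö": 3}


def swedish_key(word: str) -> tuple:
    """Sort key: fold case, then rank Å, Ä, Ö after every other letter."""
    key = []
    for ch in word.upper():
        if ch in SWEDISH_TAIL:
            key.append((1, SWEDISH_TAIL[ch]))
        else:
            key.append((0, ord(ch)))
    return tuple(key)
```

  Sorting ["Öl", "Ål", "Äl", "Zebra", "apple"] by code value gives
  Zebra, apple, Äl, Ål, Öl; with swedish_key it gives apple, Zebra, Ål,
  Äl, Öl.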

  4  CONCLUSIONS

  Many of the comments on the non-English or multiple-octet issue that
  one finds in the literature (even in ISO documents) are too imprecise
  or too incomplete to be really of use to the language standard
  developer. In the preceding lines an attempt has been made to clarify
  the issues which depend on coded character sets. The actual work of
  providing solutions has to be done by the SC22 Working Groups
  themselves.

  Nevertheless, some recommendations may be given for consideration
  either by SC22, or by SC2, or by both.
  < recommendations still under revision, will be released later >
  Annexes are not included, for reasons of space and because of their
  special characters.

29-May-89 16:49:25-GMT,2703;000000000001
Return-Path: <@cuvmb.cc.columbia.edu:ISO8859@JHUVM.BITNET>
Received: from cunixc.cc.columbia.edu by watsun.cc.columbia.edu (4.0/SMI-4.0)
	id AA26983; Mon, 29 May 89 12:49:23 EDT
Message-Id: <8905291649.AA26983@watsun.cc.columbia.edu>
Received: from CUVMB.COLUMBIA.EDU (cuvmb.cc.columbia.edu) by cunixc.cc.columbia.edu (5.54/5.10) id AA11207; Mon, 29 May 89 12:48:37 EDT
Received: from CUVMB.CC.COLUMBIA.EDU by CUVMB.COLUMBIA.EDU (IBM VM SMTP R1.2) with BSMTP id 0097; Mon, 29 May 89 12:48:32 EDT
Received: from PSUVM.PSU.EDU by CUVMB.CC.COLUMBIA.EDU (Mailer R2.03B) with
 BSMTP id 3204; Mon, 29 May 89 12:48:31 EDT
Received: by PSUVM (Mailer R2.03B) id 7252; Mon, 29 May 89 12:44:05 EDT
Date:         Mon, 29 May 89 17:31:00 CET
Reply-To: ASCII/EBCDIC character set related issues <ISO8859%JHUVM@cuvmb.cc.columbia.edu>
Sender: ASCII/EBCDIC character set related issues <ISO8859%JHUVM@cuvmb.cc.columbia.edu>
From: Johan van Wingen <MOSGLA%HLERUL2@cuvmb.cc.columbia.edu>
Subject:      CP850 vs ISO 8859-1
To: Frank da Cruz <SY.FDC@cunixc.cc.columbia.edu>

Date:    Mon,  1 May 89 16:15 CET
From:    "Johan van Wingen"                          <MOSGLA>
To:      "E. Hart"                            <HART@APLVM>
Subject: ISO equivalent of CP850

Dear List Subscribers
There is an issue not sufficiently discussed up to now.
It is proposed to replace CP850 by ISO 8859-1. I applaud this, because
CP850 is a miserable misconstruct. But, as with CP437, it has 256
graphic characters, where ISO 8859-1 has only 190 (SPACE not included),
with 65 positions reserved for control characters. Thus the two are not
equivalent. Even if we prefer the more logical distribution of graphics
over the code page of ISO 8859 to the chaos of CP850, we have not said
anything about filling the four empty columns (0,1,8,9).
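The difference in columns 8 and 9 can be seen with Python's codecs
(chosen here for illustration): ISO 8859-1 maps all 32 octets there to
control characters, CP850 maps them to graphics. The function name is
hypothetical.

```python
import unicodedata


def c1_positions(codec: str) -> list[int]:
    """Octets 0x80..0x9F that a codec maps to control characters."""
    return [b for b in range(0x80, 0xA0)
            if unicodedata.category(bytes([b]).decode(codec)) == "Cc"]
```

For "latin_1" the list holds all 32 octets of columns 8 and 9; for
"cp850" it is empty (octet 0x82, for instance, is é in CP850).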
Our attempt at having a 254 graphic set as an extension of ISO 8859-1
has failed for the moment. The question is what users want:
1. An additional 64 graphics
   1.1  on the PC only
   1.2  also under VM or MVS (what to do with the controls)
2. 64 controls only, either on PC or mainframe
3. bytes interpreted as controls or as graphics, depending on "mode set"
   (this is more or less available with MS/DOS, but with CP437, CP850,
   and also with DEC VT340, not with 3270 terminals as far as I know)
The question is, if we want 64 extra graphics, which should we select.
There is no guidance in any ISO standard.
I am rather concerned about this, because I am thinking of presenting
a new attempt for a 254 graphic set, and before showing it to anybody,
I would welcome comments contributing to its content.

FROM  J. W. van Wingen    MOSGLA@HLERUL2.BITNET
Mail to
P. O. Box 486,  2300AL Leiden, Netherlands