From: boutell@boutell.com (Thomas Boutell)
Newsgroups: comp.infosystems.www.servers.mac
Subject: comp.infosystems.www.servers.mac Frequently Asked Questions (FAQ)
Supersedes: <servers.mac.105@news.aa.net>
Date: 29 Jul 1996 06:57:36 GMT
Organization: Nerdsholm
Lines: 504
Distribution: world
Message-ID: <servers.mac.106@news.aa.net>
NNTP-Posting-Host: boutell.com
Keywords: FAQ



             WHAT IS THIS NEWSGROUP ABOUT? WHAT POSTS BELONG HERE?
                                       
   comp.infosystems.www.servers.mac is a forum for the discussion of
   World Wide Web servers for the Apple Macintosh. Web servers are
   programs which are used to deliver World Wide Web documents to other
   computers. If your question relates directly to Macintosh versions of
   World Wide Web servers, and is not covered in this FAQ or a document
   referenced by this FAQ, it belongs in this newsgroup.
   
   If not, consider this list of newsgroups in the comp.infosystems.www
   hierarchy and check out the most appropriate group. If possible, use
   the most specific group that relates to your topic, rather than a
   .misc group.
   
     * comp.infosystems.www.authoring.cgi
     * comp.infosystems.www.authoring.html
     * comp.infosystems.www.authoring.images
     * comp.infosystems.www.authoring.misc
     * comp.infosystems.www.browsers.misc
     * comp.infosystems.www.browsers.ms-windows
     * comp.infosystems.www.browsers.x
     * comp.infosystems.www.browsers.mac
     * comp.infosystems.www.servers.mac
     * comp.infosystems.www.servers.misc
     * comp.infosystems.www.servers.ms-windows
     * comp.infosystems.www.servers.unix
     * comp.infosystems.www.misc
   
   This posting is only an excerpt from the complete WWW FAQ. See the
   next section for information on accessing the complete FAQ once you
   have web access.
       
   

                         ABOUT THE WORLD WIDE WEB FAQ
                                       
   The World Wide Web Frequently Asked Questions (FAQ) is intended to
   answer the most common questions about the web.
   
   The FAQ is maintained by Thomas Boutell
   <URL:http://www.boutell.com/boutell/>. Copyright 1994, 1995, 1996 by
   Thomas Boutell and Boutell.Com, Inc.
   
   The complete FAQ is available from several sites. If you can, you will
   want to access it through the web. Use the site closest to you in the
   language you prefer (non-English sites are marked):
   
     * Boutell.Com, Inc., western United States (North America):
       <URL:http://www.boutell.com/faq/>
     * DBasics Software Company, western United States (North America):
       <URL:http://www.dbasic.com/users_group/wwwfaq>
     * Compusult Inc., California, USA (North America):
       <URL:http://www.compusult.nf.ca/WWW_FAQ/index.htm>
     * Seton Hall University, eastern United States (North America):
       <URL:http://www.shu.edu/about/WWWFaq/>
     * United States Military Academy, West Point (North America):
       <URL:http://www.usma.edu/mirror/WWW/faq/>
     * Oxford University, UK (Europe):
       <URL:http://info.ox.ac.uk/help/wwwfaq/index.html>
     * Poznan University of Technology, Poznan, Poland (Europe, in
       Polish):
       <URL:http://www.put.poznan.pl/hypertext/Internet/faq/www/www_pl.htm>
     * Poznan University of Technology, Poznan, Poland (Europe, in
       English):
       <URL:http://www.put.poznan.pl/hypertext/Internet/faq/www/www_en.htm>
     * New Software Technologies Service, Austria (Europe):
       <URL:http://nswt.tuwien.ac.at:8000/htdocs/boutell/>
     * Astronomical Observatory of Padova, Italy (Europe):
       <URL:http://www.pd.astro.it/faqes/www/>
     * University of Jan Evangelista Purkyne, Czech Republic (Europe):
       <URL:http://sun.ujep.cz/wwwfaq/>
     * University of Oviedo, Spain (Europe):
       <URL:http://www3.uniovi.es/~rivero/WWW/faq/>
     * Glocom, Japan (Asia):
       <URL:http://www.glocom.ac.jp/mirror/sunsite.unc.edu/boutell/faq/>
     * The University of Melbourne (Australia/Pacific):
       <URL:http://www.unimelb.edu.au/public/www-faq/>
     * Telstra Corporation, Australia (Australia/Pacific):
       <URL:http://www.telstra.com.au/docs/www-faq/>
     * Internex Online, Toronto, Canada (North America):
       <URL:http://www.io.org/faq/www/>
     * Communications Vir, Montreal, Canada (North America):
       <URL:http://www.vir.com/WWWfaq/index.html>
     * Community Access Canada, University of New Brunswick, Canada
       (North America): <URL:http://cnet.unb.ca/www/faq/>
     * Island Internet, British Columbia, Canada (North America):
       <URL:http://www.island.net/help/faq/www_faq/>
     * Acer Inc., Taipei, Taiwan (Asia, in Chinese):
       <URL:http://www.acer.net/document/cwwwfaq/>
     * Academia Sinica, Taipei, Taiwan (Asia):
       <URL:http://www.sinica.edu.tw/www/faq/boutell/index.htm>
     * Fraunhofer Institute for Computer Graphics, Darmstadt, Germany:
       <URL:http://www.igd.fhg.de/www/documents/servers/mirrors/www-faq/>
       
     * Mikomtek, CSIR (South Africa):
       <URL:http://www.mikom.csir.co.za/faq/www/index.htm> 
     * Michael Babcock at www.feldspar.com (Ontario, Canada):
       <URL:http://www.feldspar.com/~mbabcock/WWW_FAQ/>
       
   

                   HOW CAN I PROVIDE INFORMATION TO THE WEB?
                                       
   
   
   Information providers run programs that the browsers can obtain
   hypertext from. These programs can either be WWW servers that
   understand the HyperText Transfer Protocol HTTP (best if you are
   creating your information database from scratch), "gateway" programs
   that convert an existing information format to hypertext, or a
   non-HTTP server that WWW browsers can access -- anonymous FTP or
   gopher, for example.
   
   To learn more about World Wide Web servers, see the server section.
   You can also consult a WWW server primer by Nathan Torkington
   <URL:http://www.vuw.ac.nz/who/Nathan.Torkington/ideas/www-servers.html>.
   
   If you only want to provide information to local users, placing your
   information in local files is also an option. This means, however,
   that there can be no off-machine access.
   
   

                               MACINTOSH SERVERS
                                       
   WebSTAR
          WebSTAR is an "industrial-strength" commercial World Wide Web
          server from StarNine, Inc. (URL is
          <URL:http://www.starnine.com/> ).
          
   MacHTTP
          MacHTTP <URL:http://www.starnine.com/machttp/machttpsoft.html>
          is a freely available web server for the Macintosh. There is
          also a Frequently Asked Questions posting dedicated to MacHTTP:
          <URL:http://arpp1.carleton.ca/machttp/doc/>
          
   Mac Common Lisp Server
          A server written in Mac Common Lisp (URL is
          <URL:http://www.ai.mit.edu/projects/iiip/doc/cl-http/home-page.html>
          ) is now available. The Mac Common Lisp server supports
          extension of the server with object-oriented Lisp code and is
          freely available, including source.
          
   http4mac
          http4mac is a simple, free web server for the Macintosh.
          <URL:http://130.246.18.52/>
          
   NetPresenz
          NetPresenz is a very inexpensive package for the Macintosh that
          is capable of serving three protocols: FTP, HTTP, and gopher.
          CGI programming and other new features have been added
          recently. Formerly known as FTPd.
          <URL:http://www.share.com/peterlewis/>
          
   InterServer Publisher
          InterServer Publisher
          <URL:http://www.intercon.com/newpi/InterServerP.html> is a
          commercial web, FTP, and gopher server for the Macintosh. It
          emphasizes ease of configuration but also supports
          configuration through AppleScript. The server also offers a
          server-side HTML extension which supports hit counters, image
          maps, and directory listings as standard features. A 30-day
          demo is available by anonymous ftp from ftp.intercon.com in the
          /intercon/sales/Mac/Demo_Software/ directory.
          
   Enhanced Mosaic
          Enhanced Mosaic, from Spyglass, Incorporated, is the commercial
          version of NCSA Mosaic. Spyglass does not sell the browser directly
          to the public, although you can download an evaluation version
          to try it out; instead, they seek to license it to various
          OEMs. You can learn more about their licensing arrangements and
          the existing licensees from the Spyglass home page (URL is
          <URL:http://www.spyglass.com/> ).
          
   Common Lisp Hypermedia Server (CL-HTTP)
          The CL-HTTP server
          <URL:http://www.ai.mit.edu/projects/iiip/doc/cl-http/server.html>
          is a web server written entirely in Common Lisp. It is
          available on many platforms, and can be programmed at a
          remarkably high level, using Lisp code to generate much of the
          output of the server. It is an interesting option when
          development time is limited.
          
   

                  HOW FAST DOES MY NET CONNECTION NEED TO BE?
                                       
   The following response to this very-frequently-asked-question was
   provided by Mike Meyer (mwm@contessa.phone.net).
   
     The answer is "It depends." What it depends on is what kind of
     things you want to provide on your server. Here are some rules of
     thumb to use when deciding what kind of connection you need for your
     server.
     
     The first rule of thumb is:
     
     _Don't worry about simultaneous access at first._
     
     The first thing to do is make sure you've got enough bandwidth to
     send the objects you want to send in a reasonable time. That
     provides a lower bound on your line speed no matter what level of
     traffic you have.
     
     The second rule of thumb is:
     
     _It should take at most 5 seconds to send a page._
     
     The five second rule dates from command line days, when that was
     about how long people would wait before getting impatient with the
     system. It seems like a reasonable number to use now.
     
     Since external images/audio/etc. are somewhat exceptional, allow
     more time for them. If you think they should have the same
     restrictions as above, buy the bandwidth your site will need to do
     so. However, the rule of thumb for external images/audio/etc is:
     
     _It should take at most 30 seconds to send an external file._
     
     Given these rules, it's pretty straightforward to work out how large
     an HTML page and external files can be. At least, it's easy after
     you simplify things by ignoring IP overhead on the line, compression
     on modem lines, and anything that's less than 10% of the total (or
     even a little bit more than 10%).
     
     The one thing you should not ignore is the multiple packet
     round-trips it takes to get data flowing through an HTTP channel.
     For modem lines, this is nearly a second for each HTTP connection,
     which is significant. For leased lines, it's more like .1 or .2
     seconds, which is not significant.
     
     On a 14.4 line assumed to be sending 1.4K bytes of data/second, with
     a 1 second startup, you get 4 * 1.4 or 5.6K of HTML. If you want to
     include a single inline image, that's 2 seconds of startup, so
     you're down to 3 * 1.4 or 4.2K of HTML + image. This means smallish
     HTML pages, and simple inline images. For external files, you get 29
     * 1.4 or 40K, which is still a small image. If you have a 28.8 line,
     you get to double those figures; for a 9600 line, figure 2/3rds of
     that size.
     
     On a 56K leased line assumed to be sending 5K/second, you get 25K of
     HTML, or mixed HTML/data. For external images, it's 150K. That
     should cover any reasonable HTML document, and small to medium
     external files. An MPEG movie might be a bit much.
     
     With a T1 line assumed to be sending 150K/second, you get 750K of
     HTML, or 4.5 megabytes in an external file. Barring very large
     animations, this should be sufficient for anything you want to
     serve. More would be faster, but it also gets drastically more
     expensive.
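     
     The arithmetic above can be sketched in a few lines of Python. This
     is only an illustration of the rules of thumb, not a measurement
     tool: the function name size_budget_kb and its parameters are made
     up for this example, and the rates and startup delays are the rough
     figures assumed in the text.

```python
# Rough per-object size budget for a line, following the rules of thumb
# above. Names are illustrative; rates are the approximations quoted in
# the text, not measured values.

def size_budget_kb(rate_kb_per_s, time_limit_s, connections=1, startup_s=0.0):
    """Return how many kilobytes fit within time_limit_s.

    Each HTTP connection (the page itself, plus one per inline image)
    is assumed to cost startup_s seconds before data starts flowing.
    """
    usable_s = time_limit_s - connections * startup_s
    return max(usable_s, 0.0) * rate_kb_per_s


# name -> (rate in K/second, startup seconds per connection)
LINES = {
    "14.4 modem": (1.4, 1.0),
    "56K leased": (5.0, 0.0),  # startup treated as negligible
    "T1": (150.0, 0.0),        # startup treated as negligible
}

for name, (rate, startup) in LINES.items():
    page = size_budget_kb(rate, 5, connections=1, startup_s=startup)
    external = size_budget_kb(rate, 30, connections=1, startup_s=startup)
    print(f"{name}: {page:.1f}K page, {external:.1f}K external file")
```

     Passing connections=2 models a page with one inline image, which
     on the 14.4 line leaves three seconds of transfer time rather than
     four.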
     
     Now that you know the minimum bandwidth to deliver a single object
     in a timely fashion, let's consider the total throughput of your
     site. The maximum throughput is about 118 megabytes per day for a
     14.4 modem line, 422 megabytes per day for a 56K line and 12
     gigabytes per day for a T1 line.
     
     Now look at the total bandwidth you are going to use. Don't forget
     that things other than the HTTP server will be using the line, and
     some of them may require more bandwidth than the server. If you need
     more than 100% of the available bandwidth, you have to buy more
     bandwidth. If you need more than 50% of that bandwidth, you should
     probably buy more bandwidth. If you need less than 10% of the
     bandwidth, you are fine.
     
     To plug in some sample numbers, assume the average size of served
     objects is 20K. Rounding to the nearest hundred or thousand in all
     cases, we find that you are fine (under 10% of capacity) up to 600
     accesses/day on a 14.4 line, and acceptable (under 50%) up to
     3,000. For a 56K line, that's 2,300 and 11,500. For a T1, that's
     63,000 and 315,000 accesses/day. If your document sizes are
     smaller - which is likely - multiply the numbers by the
     appropriate factor.
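     
     A minimal sketch of this capacity estimate follows, assuming the
     same per-second rates used earlier (1.4K, 5K and 150K per second)
     and a 24-hour day. The function names and the exact threshold
     handling are choices made for this example. Because it works from
     unrounded rates, its results will differ somewhat from the rounded
     figures quoted above.

```python
# Estimate daily access capacity from a sustained line rate.
# Names and threshold handling are illustrative assumptions.

SECONDS_PER_DAY = 24 * 60 * 60


def accesses_per_day(rate_kb_per_s, avg_object_kb, utilization):
    """Accesses per day that would use the given fraction of the line."""
    daily_capacity_kb = rate_kb_per_s * SECONDS_PER_DAY
    return utilization * daily_capacity_kb / avg_object_kb


def verdict(utilization):
    """Classify a utilization fraction using the 10% / 50% / 100% rules."""
    if utilization > 1.0:
        return "must buy more bandwidth"
    if utilization > 0.5:
        return "should probably buy more bandwidth"
    if utilization > 0.1:
        return "acceptable"
    return "fine"


for name, rate in (("14.4 modem", 1.4), ("56K leased", 5.0), ("T1", 150.0)):
    fine = accesses_per_day(rate, 20, 0.10)
    limit = accesses_per_day(rate, 20, 0.50)
    print(f"{name}: fine up to ~{fine:.0f}/day, acceptable up to ~{limit:.0f}/day")
```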
     
     As a final note, people working well below the 50% limit for a T1
     have encountered problems with the server platform. Usually, this is
     caused by the HTTP server software encountering some system limit.
     If you are working with servers in these ranges, you need to
     consider server platform as well.
     
   

              HOW CAN I MAKE MY WEB SITE SEARCHABLE BY THE USER?
                                       
   Both free and commercial tools are available for this task. A brief
   list of such tools follows. Thanks to John K. Hinsdale for
   contributing the original list.
   
  Free Web Site Search Engines
  
   freeWAIS-sf
          The well-known freeWAIS-sf engine offers an HTTP front end,
          sf-gate, with which users can explore indexed documents on your
          site.
          <URL:http://ls6-www.informatik.uni-dortmund.de/freeWAIS-sf/freeWAIS-sf.html>
          
   glimpse
          From the University of Arizona, the glimpse engine can be used
          to easily search large numbers of HTML documents.
          <URL:http://glimpse.cs.arizona.edu:1994/index.html>
          
   Harvest
          Harvest, from the University of Colorado, is a powerful but
          somewhat complex information search and replication system.
          Used properly, Harvest can be a powerful tool to distribute
          your documents. <URL:http://harvest.cs.colorado.edu>
          
  Commercial Search Engines (Some Available Free)
  
   Excerpt
          From Alma Mater Software. An off-the-shelf indexer for SunOS
          machines. Includes web-based forms. <URL:http://www.alma.com/>
          
   Excite
          From ArchiText, Excite is expressly designed to add
          straightforward searching capabilities to existing web sites.
          <URL:http://www.excite.com/navigate>
          
   Topic
          From Verity, Inc. Topic indexes documents in a high-level
          fashion by "concept." <URL:http://www.verity.com/>
          
   WAIS
          From America Online, WAIS is a modern commercial version of the
          original WAIS system, one of the first indexing systems of this
          type. <URL:http://www.wais.com/>
          
   

       HOW CAN I SERVE [WORD DOCUMENTS, EXCEL SPREADSHEETS, DOUGHNUTS]?
                                       
   In order to deliver documents of new and different types from your
   server, you need to configure the correct "content type" for each type
   of document, and use the proper extension when naming the file on the
   server. If the document type is highly unusual, you will also need to
   see to it that users know what content type to configure their
   browsers for, and what application to launch for that content type.
   
   Presented below is a list of the better-known content types with
   commentary on those the author is familiar with. This information is
   drawn from appendix 2 of the author's book, CGI Programming in C and
   Perl <URL:http://www.boutell.com/cgibook/>. The original list of
   content types was taken from the public domain NCSA web server
   <URL:http://hoohoo.ncsa.uiuc.edu/>.
   
   Please note: new media types are coming into existence regularly. The
   official registry is often well behind actual practice. This list is
   based on that included with NCSA's public domain web server as of
   September 1995.
   
   No attempt is made here to document the format of the data associated
   with these mime types. This list is intended to make it easier to
   determine what content type should be assigned to documents produced
   by various well-known applications.

Media Content Type                      Comments

application/activemessage
application/andrew-inset
application/applefile
application/atomicmail
application/dca-rft
application/dec-dx
application/mac-binhex40
application/macwriteii                  MacWrite Document
application/msword                      Microsoft Word Document
application/news-message-id
application/news-transmission
application/octet-stream                Use for binary file downloads
application/oda
application/pdf                         Adobe Acrobat Documents
application/postscript                  Postscript
application/remote-printing
application/rtf                         Rich Text Format
application/slate
application/x-mif
application/wita
application/wordperfect5.1              WordPerfect 5.1 Documents
application/wordperfect6.0              WordPerfect 6.0 Documents
application/x-csh                       Potentially dangerous [1]
application/x-dvi                       TeX/LaTeX Output (not TeX source)
application/x-hdf
application/x-latex                     LaTeX Source
application/x-netcdf
application/x-sh                        Potentially dangerous [1]
application/x-tcl                       Potentially dangerous [1]
application/x-tex                       TeX Source
application/x-texinfo
application/x-troff                     Troff Formatter Source
application/x-troff-man                 Troff Source, -man argument assumed
application/x-troff-me                  Troff Source, -me argument assumed
application/x-troff-ms                  Troff Source, -ms argument assumed
application/x-wais-source
application/zip                         Many users have ZIP helper apps
application/x-bcpio
application/x-cpio                      cpio tape format (Unix)
application/x-gtar                      gnu tar tape format (Unix)
application/x-shar                      Potentially dangerous [1]
application/x-sv4cpio
application/x-sv4crc
application/x-ustar
audio/basic                             Sun-style .au format audio
audio/x-aiff                            Apple-format .aiff audio
audio/x-wav                             Microsoft Windows-format .wav audio
image/gif                               Compuserve GIF 8-bit lossless images
image/ief
image/jpeg                              JPEG lossy photographic images
image/png                               w3 consortium PNG lossless images
image/tiff                              TIFF format images
image/x-cmu-raster
image/x-portable-anymap                 netpbm/pbmplus images (any subtype)
image/x-portable-bitmap                 netpbm/pbmplus black and white images
image/x-portable-graymap                netpbm/pbmplus grayscale images
image/x-portable-pixmap                 netpbm/pbmplus truecolor images
image/x-rgb
image/x-xbitmap                         X Window System black and white images
image/x-xpixmap                         X Window System color images
image/x-xwindowdump                     X Window System screen dump format
message/external-body
message/news
message/partial
message/rfc822
multipart/alternative
multipart/appledouble
multipart/digest
multipart/mixed                         Server push
multipart/parallel
text/html                               HTML documents
text/x-sgml                             SGML documents, not limited to HTML
text/plain                              Plain ASCII text
text/richtext                           This is not RTF (see above)
text/tab-separated-values               Useful for spreadsheet interchange
text/x-setext
video/mpeg                              MPEG video format; common on PCs, Unix
video/quicktime                         Apple video format
video/x-msvideo                         Microsoft/Intel AVI video format
video/x-sgi-movie

   [1]: Browsers should almost never be configured to execute shell
   scripts. This is a dangerous practice as the script in question could
   simply consist of rm * or another harmful command. Those interested in
   sending code to the browser should consider safe scripting languages
   such as Java, Safe-TCL and PGP-SafePerl. 
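
   The pairing of file extensions with content types described in this
   section can be sketched as a simple lookup table. This is a
   hypothetical illustration, not the configuration format of any
   particular server: the content types come from the list above, while
   the extensions and the name content_type_for are conventional
   choices assumed for the example. Unknown extensions fall back to
   application/octet-stream, the type the list suggests for binary
   file downloads.

```python
import os

# Extension -> content type. The types are taken from the list above;
# the extensions are conventional choices assumed for this sketch.
CONTENT_TYPES = {
    ".html": "text/html",
    ".txt": "text/plain",
    ".gif": "image/gif",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".pdf": "application/pdf",
    ".doc": "application/msword",
    ".rtf": "application/rtf",
    ".zip": "application/zip",
    ".mov": "video/quicktime",
    ".mpeg": "video/mpeg",
}

DEFAULT_TYPE = "application/octet-stream"


def content_type_for(filename):
    """Return the content type for a file name, based on its extension."""
    _, ext = os.path.splitext(filename)
    return CONTENT_TYPES.get(ext.lower(), DEFAULT_TYPE)


for name in ("index.html", "Report.DOC", "unknown.xyz"):
    print(name, "->", content_type_for(name))
```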

                     HOW CAN I KEEP ROBOTS OFF MY SERVER?
                                       
   Programs that automatically traverse the web can be quite useful, but
   have the potential to make a serious mess of things. Every so often
   someone will write a "depth-first" searching robot that brings servers
   to their knees. See the section on writing robots for details.
   
   Fortunately, most robots on the web follow a simple protocol by which
   you can keep them off your server if you wish, or keep them out of
   portions of your server which are robot traps (i.e., they contain an
   infinite number of possible links). Read the document World Wide Web
   Robots, Wanderers and Spiders (URL is
   <URL:http://web.nexor.co.uk/mak/doc/robots/robots.html> ) and learn
   about the emerging standards for exclusion of robots from areas in
   which they are not wanted. You can also read about existing robots
   there, including useful cataloging robots you probably do _not_ want
   to keep off your server.
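
   As a rough illustration, the sketch below assumes the exclusion
   convention is the /robots.txt file with User-agent and Disallow
   lines, and checks paths against it using Python's standard
   urllib.robotparser module. The rules, the host www.example.com, the
   robot name ExampleBot and the helper allowed are all made up for
   the example; consult the document above for the actual convention.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that keeps all robots out of /cgi-bin/,
# a typical location for script-generated "robot trap" pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path, agent="ExampleBot"):
    """Return True if the given robot may fetch the path on this site."""
    return parser.can_fetch(agent, "http://www.example.com" + path)


for path in ("/index.html", "/cgi-bin/search"):
    print(path, "allowed" if allowed(path) else "disallowed")
```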

            HEY, I KNOW, I'LL WRITE A WWW-EXPLORING ROBOT! WHY NOT?
                                       
   Programs that automatically traverse the web can be quite useful, but
   have the potential to make a serious mess of things. Robots have been
   written which do a "breadth-first" search of the web, exploring many
   sites in a gradual fashion instead of aggressively "rooting out" the
   pages of one site at a time. Some of these robots now produce
   excellent indexes of information available on the web.
   
   But others have written simple depth-first searches which, at the
   worst, can bring servers to their knees in minutes by recursively
   downloading information from CGI script-based pages that contain an
   infinite number of possible links. (Often robots can't realize this!)
   Imagine what happens when a robot decides to "index" the CONTENTS of
   several hundred mpeg movies. Shudder.
   
   The moral: a robot that does what you want may already exist; if it
   doesn't, please study the document World Wide Web Robots, Wanderers
   and Spiders (URL is
   <URL:http://web.nexor.co.uk/mak/doc/robots/robots.html> ) and learn
   about the emerging standards for exclusion of robots from areas in
   which they are not wanted. You can also read about existing robots
   there.
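
   To make the breadth-first idea concrete, here is a minimal sketch
   that walks a toy in-memory link graph rather than the real web. The
   names breadth_first_crawl, TOY_WEB and toy_links are invented for
   the example, and the max_pages cap is one simple assumed safeguard
   against pages that generate links without end. A real robot would
   also need to honor exclusion rules and pause between requests.

```python
from collections import deque


def breadth_first_crawl(start, get_links, max_pages=100):
    """Visit pages level by level, stopping after max_pages visits."""
    seen = {start}
    queue = deque([start])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in get_links(url):
            if link not in seen:  # never queue the same page twice
                seen.add(link)
                queue.append(link)
    return visited


# A toy "web": each page maps to the pages it links to.
TOY_WEB = {
    "a/": ["a/1", "b/"],
    "a/1": ["a/2"],
    "a/2": [],
    "b/": ["b/1"],
    "b/1": [],
}


def toy_links(url):
    """Return the outgoing links of a page in the toy web."""
    return TOY_WEB.get(url, [])


print(breadth_first_crawl("a/", toy_links))

# A script-style trap that always yields one more new link; the
# max_pages cap keeps the crawl finite.
print(len(breadth_first_crawl("trap/", lambda u: [u + "x"], max_pages=5)))
```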

                                    CREDITS
                                       
   Copyright 1994, 1995, 1996 by Thomas Boutell and Boutell.Com, Inc.
   
   Maintainer (11/93 to present): Thomas Boutell, _<boutell@boutell.com>_
   
   
   Former Maintainer (until 11/93): Nathan Torkington,
   _<Nathan.Torkington@vuw.ac.nz>_

         HOW CAN TWO DIFFERENT HOME PAGES SHARE ONE PHYSICAL MACHINE?
                                       
   Dan Pritchett maintains a document detailing the process of running
   two or more servers on the same machine without end users being able
   to tell the difference (URL is
   <URL:http://www.thesphere.com/~dlp/TwoServers/> ).