Path: news1.ucsd.edu!ihnp4.ucsd.edu!swrinde!news.sgi.com!news.msfc.nasa.gov!newsfeed.internetmci.com!news.fibr.net!nntp04.primenet.com!news.shkoo.com!nntp.primenet.com!news.cais.net!van-bc!unixg.ubc.ca!news.bc.net!arclight.uoregon.edu!netnews.worldnet.att.net!news.alt.net!news1.alt.net!news.aa.net!usenet
From: boutell@boutell.com (Thomas Boutell)
Newsgroups: comp.infosystems.www.servers.mac
Subject: comp.infosystems.www.servers.mac Frequently Asked Questions (FAQ)
Supersedes: <servers.mac.105@news.aa.net>
Date: 29 Jul 1996 06:57:36 GMT
Organization: Nerdsholm
Lines: 504
Distribution: world
Message-ID: <servers.mac.106@news.aa.net>
NNTP-Posting-Host: boutell.com
Keywords: FAQ

WHAT IS THIS NEWSGROUP ABOUT? WHAT POSTS BELONG HERE?

comp.infosystems.www.servers.mac is a forum for the discussion of World Wide Web servers for the Apple Macintosh. Web servers are programs which are used to deliver World Wide Web documents to other computers.

If your question relates directly to Macintosh versions of World Wide Web servers, and is not covered in this FAQ or a document referenced by this FAQ, it belongs in this newsgroup. If not, consider this list of newsgroups in the comp.infosystems.www hierarchy and check out the most appropriate group. If possible, use the most specific group that relates to your topic, rather than a .misc group.

This posting is only an excerpt from the complete WWW FAQ. See the next section for information on accessing the complete FAQ once you have web access.

* comp.infosystems.www.authoring.cgi
* comp.infosystems.www.authoring.html
* comp.infosystems.www.authoring.images
* comp.infosystems.www.authoring.misc
* comp.infosystems.www.browsers.misc
* comp.infosystems.www.browsers.ms-windows
* comp.infosystems.www.browsers.x
* comp.infosystems.www.browsers.mac
* comp.infosystems.www.servers.mac
* comp.infosystems.www.servers.misc
* comp.infosystems.www.servers.ms-windows
* comp.infosystems.www.servers.unix
* comp.infosystems.www.misc

ABOUT THE WORLD WIDE WEB FAQ

The World Wide Web Frequently Asked Questions (FAQ) is intended to answer the most common questions about the web. The FAQ is maintained by Thomas Boutell <URL:http://www.boutell.com/boutell/>. Copyright 1994, 1995, 1996 by Thomas Boutell and Boutell.Com, Inc.

The complete FAQ is available from several sites. If you can, you will want to access it through the web. Use the site closest to you in the language you prefer (non-English sites are marked):

* Boutell.Com, Inc., western United States (North America): <URL:http://www.boutell.com/faq/>
* DBasics Software Company, western United States (North America): <URL:http://www.dbasic.com/users_group/wwwfaq>
* Compusult Inc., California, USA (North America): <URL:http://www.compusult.nf.ca/WWW_FAQ/index.htm>
* Seton Hall University, eastern United States (North America): <URL:http://www.shu.edu/about/WWWFaq/>
* United States Military Academy, West Point (North America): <URL:http://www.usma.edu/mirror/WWW/faq/>
* Oxford University, UK (Europe): <URL:http://info.ox.ac.uk/help/wwwfaq/index.html>
* Poznan University of Technology, Poznan, Poland (Europe, in Polish): <URL:http://www.put.poznan.pl/hypertext/Internet/faq/www/www_pl.htm>
* Poznan University of Technology, Poznan, Poland (Europe, in English): <URL:http://www.put.poznan.pl/hypertext/Internet/faq/www/www_en.htm>
* New Software Technologies Service, Austria (Europe): <URL:http://nswt.tuwien.ac.at:8000/htdocs/boutell/>
* Astronomical Observatory of Padova, Italy (Europe): <URL:http://www.pd.astro.it/faqes/www/>
* University of Jan Evangelista Purkyne, Czech Republic (Europe): <URL:http://sun.ujep.cz/wwwfaq/>
* University of Oviedo, Spain (Europe): <URL:http://www3.uniovi.es/~rivero/WWW/faq/>
* Glocom, Japan (Asia): <URL:http://www.glocom.ac.jp/mirror/sunsite.unc.edu/boutell/faq/>
* The University of Melbourne (Australia/Pacific): <URL:http://www.unimelb.edu.au/public/www-faq/>
* Telstra Corporation, Australia (Australia/Pacific): <URL:http://www.telstra.com.au/docs/www-faq/>
* Internex Online, Toronto, Canada (North America): <URL:http://www.io.org/faq/www/>
* Communications Vir, Montreal, Canada (North America): <URL:http://www.vir.com/WWWfaq/index.html>
* Community Access Canada, University of New Brunswick, Canada (North America): <URL:http://cnet.unb.ca/www/faq/>
* Island Internet, British Columbia, Canada (North America): <URL:http://www.island.net/help/faq/www_faq/>
* Acer Inc., Taipei, Taiwan (Asia, in Chinese): <URL:http://www.acer.net/document/cwwwfaq/>
* Academia Sinica, Taipei, Taiwan (Asia): <URL:http://www.sinica.edu.tw/www/faq/boutell/index.htm>
* Fraunhofer Institute for Computer Graphics, Darmstadt, Germany: <URL:http://www.igd.fhg.de/www/documents/servers/mirrors/www-faq/>
* Mikomtek, CSIR (South Africa): <URL:http://www.mikom.csir.co.za/faq/www/index.htm>
* Michael Babcock at www.feldspar.com (Ontario, Canada): <URL:http://www.feldspar.com/~mbabcock/WWW_FAQ/>

HOW CAN I PROVIDE INFORMATION TO THE WEB?

Information providers run programs from which browsers can obtain hypertext. These programs can be WWW servers that understand the HyperText Transfer Protocol, HTTP (best if you are creating your information database from scratch); "gateway" programs that convert an existing information format to hypertext; or non-HTTP servers that WWW browsers can access -- anonymous FTP or gopher, for example.

To learn more about World Wide Web servers, see the server section.
You can also consult a WWW server primer by Nathan Torkington, available at the URL http://www.vuw.ac.nz/who/Nathan.Torkington/ideas/www-servers.html .

If you only want to provide information to local users, placing your information in local files is also an option. This means, however, that there can be no off-machine access.

MACINTOSH SERVERS

WebSTAR
  WebSTAR is an "industrial-strength" commercial World Wide Web server from StarNine, Inc. (URL is <URL:http://www.starnine.com/>).

MacHTTP
  MacHTTP <URL:http://www.starnine.com/machttp/machttpsoft.html> is a freely available web server for the Macintosh. There is also a Frequently Asked Questions posting dedicated to MacHTTP: <URL:http://arpp1.carleton.ca/machttp/doc/>

Mac Common Lisp Server
  A server written in Mac Common Lisp (URL is <URL:http://www.ai.mit.edu/projects/iiip/doc/cl-http/home-page.html>) is now available. The Mac Common Lisp server supports extension of the server with object-oriented Lisp code and is freely available, including source.

http4mac
  http4mac is a simple, free web server for the Macintosh. <URL:http://130.246.18.52/>

NetPresenz
  NetPresenz, formerly known as FTPd, is a very inexpensive package for the Macintosh that is capable of serving three protocols: FTP, HTTP, and gopher. CGI programming and other new features have been added recently. <URL:http://www.share.com/peterlewis/>

InterServer Publisher
  InterServer Publisher <URL:http://www.intercon.com/newpi/InterServerP.html> is a commercial web, FTP, and gopher server for the Macintosh. It emphasizes ease of configuration but also supports configuration through AppleScript. The server also offers a server-side HTML extension which supports hit counters, image maps, and directory listings as standard features. A 30-day demo is available by anonymous ftp from ftp.intercon.com in the /intercon/sales/Mac/Demo_Software/ directory.

Enhanced Mosaic
  Enhanced Mosaic, from Spyglass, Incorporated, is the commercial version of NCSA Mosaic. Spyglass does not sell the browser directly to the public, although you can download an evaluation version to try it out; instead, they seek to license it to various OEMs. You can learn more about their licensing arrangements and the existing licensees from the Spyglass home page (URL is <URL:http://www.spyglass.com/>).

Common Lisp Hypermedia Server (CL-HTTP)
  The CL-HTTP server <URL:http://www.ai.mit.edu/projects/iiip/doc/cl-http/server.html> is a web server written entirely in Common Lisp. It is available on many platforms, and can be programmed at a remarkably high level, using Lisp code to generate much of the output of the server. It is an interesting option when development time is limited.

HOW FAST DOES MY NET CONNECTION NEED TO BE?

The following response to this very-frequently-asked question was provided by Mike Meyer (mwm@contessa.phone.net).

The answer is "It depends." What it depends on is what kind of things you want to provide on your server. Here are some rules of thumb to use when deciding what kind of connection you need for your server.

The first rule of thumb is: _Don't worry about simultaneous access at first._ The first thing to do is make sure you've got enough bandwidth to send the objects you want to send in a reasonable time. That provides a lower bound on your line speed no matter what level of traffic you have.

The second rule of thumb is: _It should take at most 5 seconds to send a page._ The five-second rule dates from command line days, when that was about how long people would wait before getting impatient with the system. It seems like a reasonable number to use now.

Since external images/audio/etc. are somewhat exceptional, allow more time for them. If you think they should have the same restrictions as above, buy the bandwidth your site will need to do so.
However, the rule of thumb for external images/audio/etc. is: _It should take at most 30 seconds to send an external file._

Given these rules, it's pretty straightforward to work out how large an HTML page and external files can be. At least, it's easy after you simplify things by ignoring IP overhead on the line, compression on modem lines, and anything that's less than 10% of the total (or even a little bit more than 10%).

The one factor you should not ignore is the multiple packet round-trips it takes to get data flowing through an HTTP channel. For modem lines, this is nearly a second for each HTTP connection, which is significant. For leased lines, it's more like .1 or .2 seconds, which is not significant.

On a 14.4 line assumed to be sending 1.4K bytes of data/second, with a 1 second startup, you get 4 * 1.4 or 5.6K of HTML. If you want to include a single inline image, that's 2 seconds of startup, so you're down to 3 * 1.4 or 4.2K of HTML + image. This means smallish HTML pages, and simple inline images. For external files, you get 29 * 1.4 or about 40K, which is still a small image. If you have a 28.8 line, you get to double those figures; for a 9600 line, figure 2/3rds of that size.

On a 56K leased line assumed to be sending 5K/second, you get 25K of HTML, or mixed HTML/data. For external images, it's 150K. That should cover any reasonable HTML document, and small to medium external files. An MPEG movie might be a bit much.

With a T1 line assumed to be sending 150K/second, you get 750K of HTML, or 4.5 megabytes in an external file. Barring very large animations, this should be sufficient for anything you want to serve. More would be faster, but it also gets drastically more expensive.

Now that you know the minimum bandwidth to deliver a single object in a timely fashion, let's consider the total throughput of your site. The maximum throughput per day is about 118 megabytes for a 14.4 modem line, 422 megabytes for a 56K line and 12 gigabytes for a T1 line.
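
The arithmetic behind these rules of thumb can be sketched in a few lines of Python. This is only an illustration, not part of Mike Meyer's original answer. The function names are made up for the example, the startup time for leased lines is taken as zero because the text treats it as insignificant, and 1 megabyte is taken as 1024K because that appears to reproduce the daily figures quoted above.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def max_size_k(rate_k_per_s, time_limit_s, startup_s, connections=1):
    """Largest payload (in K) that fits in time_limit_s on this line.

    Each HTTP connection (the page itself, plus one per inline image)
    costs startup_s seconds before any data flows.
    """
    usable_s = time_limit_s - startup_s * connections
    return rate_k_per_s * max(usable_s, 0)


def daily_megabytes(rate_k_per_s):
    """Upper bound on megabytes per day if the line is busy all day.

    Assumes 1 megabyte = 1024K (an assumption, see the note above).
    """
    return rate_k_per_s * SECONDS_PER_DAY / 1024


# (K bytes/second, startup seconds per connection); rates are from the text,
# the zero startup for leased lines is a simplifying assumption.
LINES = {
    "14.4 modem": (1.4, 1.0),
    "56K leased": (5.0, 0.0),
    "T1": (150.0, 0.0),
}

if __name__ == "__main__":
    for name, (rate, startup) in LINES.items():
        page = max_size_k(rate, 5, startup)        # 5-second rule
        external = max_size_k(rate, 30, startup)   # 30-second rule
        per_day = daily_megabytes(rate)
        print(f"{name}: page {page:.1f}K, external {external:.1f}K, "
              f"{per_day:.0f} megabytes/day")
```

Passing connections=2 to max_size_k models a page with one inline image, which is the 3 * 1.4 case in the text.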
Now look at the total bandwidth you are going to use. Don't forget that things other than the HTTP server will be using the line, and some of them may require more bandwidth than the server. If you need more than 100% of the available bandwidth, you have to buy more bandwidth. If you need more than 50% of that bandwidth, you should probably buy more bandwidth. If you need less than 10% of the bandwidth, you are fine.

To plug in some sample numbers, assume the average size of served objects is 20K. Rounding to the nearest hundred or thousand in all cases, we find that you are fine up to 600 accesses/day on a 14.4 line, and acceptable up to 3,000. For a 56K line, that's 2,300 and 11,500. For a T1, that's 63,000 and 315,000 accesses/day. If your document sizes are smaller - which is likely - multiply the numbers by the appropriate factor.

As a final note, people working well below the 50% limit for a T1 have encountered problems with the server platform. Usually, this is caused by the HTTP server software encountering some system limit. If you are working with servers in these ranges, you need to consider the server platform as well.

HOW CAN I MAKE MY WEB SITE SEARCHABLE BY THE USER?

Both free and commercial tools are available for this task. A brief list of such tools follows. Thanks to John K. Hinsdale for contributing the original list.

Free Web Site Search Engines

freeWAIS-sf
  The well-known freeWAIS-sf engine offers an HTTP front end, sf-gate, with which users can explore indexed documents on your site. <URL:http://ls6-www.informatik.uni-dortmund.de/freeWAIS-sf/freeWAIS-sf.html>

glimpse
  From the University of Arizona, the glimpse engine can be used to easily search large numbers of HTML documents. <URL:http://glimpse.cs.arizona.edu:1994/index.html>

Harvest
  Harvest, from the University of Colorado, is a powerful but somewhat complex information search and replication system. Used properly, Harvest can be a powerful tool to distribute your documents.
  <URL:http://harvest.cs.colorado.edu>

Commercial Search Engines (Some Available Free)

Excerpt
  From Alma Mater Software. An off-the-shelf indexer for SunOS machines. Includes web-based forms. <URL:http://www.alma.com/>

Excite
  From ArchiText, Excite is expressly designed to add straightforward searching capabilities to existing web sites. <URL:http://www.excite.com/navigate>

Topic
  From Verity, Inc. Topic indexes documents in a high-level fashion by "concept." <URL:http://www.verity.com/>

WAIS
  From America Online, WAIS is a modern commercial version of the original WAIS system, one of the first indexing systems of this type. <URL:http://www.wais.com/>

HOW CAN I SERVE [WORD DOCUMENTS, EXCEL SPREADSHEETS, DOUGHNUTS]?

In order to deliver documents of new and different types from your server, you need to configure the correct "content type" for each type of document, and use the proper extension when naming the file on the server. If the document type is highly unusual, you will also need to see to it that users know what content type to configure their browsers for, and what application to launch for that content type.

Presented below is a list of the better-known content types with commentary on those the author is familiar with. This information is drawn from appendix 2 of the author's book, CGI Programming in C and Perl <URL:http://www.boutell.com/cgibook/>. The original list of content types was taken from the public domain NCSA web server <URL:http://hoohoo.ncsa.uiuc.edu/>.

Please note: new media types are coming into existence regularly. The official registry is often well behind actual practice. This list is based on that included with NCSA's public domain web server as of September 1995.

No attempt is made here to document the format of the data associated with these MIME types. This list is intended to make it easier to determine what content type should be assigned to documents produced by various well-known applications.
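
As a rough illustration of how a server pairs file extensions with content types, here is a minimal Python sketch. It is not taken from any particular server's configuration: the content types come from the list in this section, but the extension-to-type pairings and the function name are assumptions made for the example, and a real server will have its own configuration format.

```python
import os

# Hypothetical extension table. The content types appear in this section's
# list; which extension maps to which type is an assumption of this sketch.
CONTENT_TYPES = {
    ".html": "text/html",
    ".txt": "text/plain",
    ".doc": "application/msword",
    ".pdf": "application/pdf",
    ".rtf": "application/rtf",
    ".gif": "image/gif",
    ".jpg": "image/jpeg",
    ".mov": "video/quicktime",
}

# The list suggests application/octet-stream for binary file downloads,
# so it is used here as the fallback for unknown extensions.
DEFAULT_TYPE = "application/octet-stream"


def content_type_for(filename):
    """Return the content type to send for a file, based on its extension."""
    _, extension = os.path.splitext(filename)
    return CONTENT_TYPES.get(extension.lower(), DEFAULT_TYPE)


if __name__ == "__main__":
    for name in ("report.doc", "PHOTO.JPG", "archive.xyz"):
        print(name, "->", content_type_for(name))
```

Falling back to a generic binary type means an unrecognized file is offered as a download rather than displayed as garbled text.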

Media Content Type              Comments
------------------              --------
application/activemessage
application/andrew-inset
application/applefile
application/atomicmail
application/dca-rft
application/dec-dx
application/mac-binhex40
application/macwriteii          MacWrite Document
application/msword              Microsoft Word Document
application/news-message-id
application/news-transmission
application/octet-stream        Use for binary file downloads
application/oda
application/pdf                 Adobe Acrobat Documents
application/postscript          PostScript
application/remote-printing
application/rtf                 Rich Text Format
application/slate
application/x-mif
application/wita
application/wordperfect5.1      WordPerfect 5.1 Documents
application/wordperfect6.0      WordPerfect 6.0 Documents
application/x-csh               Potentially dangerous [1]
application/x-dvi               TeX/LaTeX Output (not TeX source)
application/x-hdf
application/x-latex             LaTeX Source
application/x-netcdf
application/x-sh                Potentially dangerous [1]
application/x-tcl               Potentially dangerous [1]
application/x-tex               TeX Source
application/x-texinfo
application/x-troff             Troff Formatter Source
application/x-troff-man         Troff Source, -man argument assumed
application/x-troff-me          Troff Source, -me argument assumed
application/x-troff-ms          Troff Source, -ms argument assumed
application/x-wais-source
application/zip                 Many users have ZIP helper apps
application/x-bcpio
application/x-cpio              cpio tape format (Unix)
application/x-gtar              GNU tar tape format (Unix)
application/x-shar              Potentially dangerous [1]
application/x-sv4cpio
application/x-sv4crc
application/x-ustar
audio/basic                     Sun-style .au format audio
audio/x-aiff                    Apple-format .aiff audio
audio/x-wav                     Microsoft Windows-format .wav audio
image/gif                       CompuServe GIF 8-bit lossless images
image/ief
image/jpeg                      JPEG lossy photographic images
image/png                       W3 Consortium PNG lossless images
image/tiff                      TIFF format images
image/x-cmu-raster
image/x-portable-anymap         netpbm/pbmplus images (any subtype)
image/x-portable-bitmap         netpbm/pbmplus black and white images
image/x-portable-graymap        netpbm/pbmplus grayscale images
image/x-portable-pixmap         netpbm/pbmplus truecolor images
image/x-rgb
image/x-xbitmap                 X Window System black and white images
image/x-xpixmap                 X Window System color images
image/x-xwindowdump             X Window System screen dump format
message/external-body
message/news
message/partial
message/rfc822
multipart/alternative
multipart/appledouble
multipart/digest
multipart/mixed                 Server push
multipart/parallel
text/html                       HTML documents
text/x-sgml                     SGML documents, not limited to HTML
text/plain                      Plain ASCII text
text/richtext                   This is not RTF (see above)
text/tab-separated-values       Useful for spreadsheet interchange
text/x-setext
video/mpeg                      MPEG video format; common on PCs, Unix
video/quicktime                 Apple video format
video/x-msvideo                 Microsoft/Intel AVI video format
video/x-sgi-movie

[1]: Browsers should almost never be configured to execute shell scripts. This is a dangerous practice as the script in question could simply consist of rm * or another harmful command. Those interested in sending code to the browser should consider safe scripting languages such as Java, Safe-TCL and PGP-SafePerl.

HOW CAN I KEEP ROBOTS OFF MY SERVER?

Programs that automatically traverse the web can be quite useful, but have the potential to make a serious mess of things. Every so often someone will write a "depth-first" searching robot that brings servers to their knees. See the section on writing robots for details.

Fortunately, most robots on the web follow a simple protocol by which you can keep them off your server if you wish, or keep them out of portions of your server which are robot traps (i.e., they contain an infinite number of possible links). Read the document World Wide Web Robots, Wanderers and Spiders (URL is <URL:http://web.nexor.co.uk/mak/doc/robots/robots.html>) and learn about the emerging standards for exclusion of robots from areas in which they are not wanted.
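
To give a feel for how robot exclusion works in practice, the sketch below uses the robots.txt convention, in which the server publishes a short list of paths that robots are asked to stay out of. It is an illustration only: the paths, the robot name "ExampleBot" and the host www.example.com are all hypothetical, and the rules are checked with Python's standard urllib.robotparser module rather than with any tool mentioned in this FAQ.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents: ask all robots to stay out of the
# script directory and a calendar that generates endless links.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /calendar/
"""

# Parse the rules directly from text instead of fetching them over the net.
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(robot_name, url):
    """Return True if a well-behaved robot may fetch this URL."""
    return parser.can_fetch(robot_name, url)


if __name__ == "__main__":
    for url in ("http://www.example.com/index.html",
                "http://www.example.com/cgi-bin/search?q=1"):
        verdict = "allowed" if allowed("ExampleBot", url) else "excluded"
        print(url, "->", verdict)
```

Note that this only keeps out robots that choose to honor the rules; it is a convention, not an access control mechanism.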
You can also read about existing robots there, including useful cataloging robots you probably do _not_ want to keep off your server.

HEY, I KNOW, I'LL WRITE A WWW-EXPLORING ROBOT! WHY NOT?

Programs that automatically traverse the web can be quite useful, but have the potential to make a serious mess of things. Robots have been written which do a "breadth-first" search of the web, exploring many sites in a gradual fashion instead of aggressively "rooting out" the pages of one site at a time. Some of these robots now produce excellent indexes of information available on the web.

But others have written simple depth-first searches which, at the worst, can bring servers to their knees in minutes by recursively downloading information from CGI script-based pages that contain an infinite number of possible links. (Often robots can't realize this!) Imagine what happens when a robot decides to "index" the CONTENTS of several hundred MPEG movies. Shudder.

The moral: a robot that does what you want may already exist; if it doesn't, please study the document World Wide Web Robots, Wanderers and Spiders (URL is <URL:http://web.nexor.co.uk/mak/doc/robots/robots.html>) and learn about the emerging standards for exclusion of robots from areas in which they are not wanted. You can also read about existing robots there.

CREDITS

Copyright 1994, 1995, 1996 by Thomas Boutell and Boutell.Com, Inc.

Maintainer (11/93 to present): Thomas Boutell, _<boutell@boutell.com>_
Former Maintainer (until 11/93): Nathan Torkington, _<Nathan.Torkington@vuw.ac.nz>_

HOW CAN TWO DIFFERENT HOME PAGES SHARE ONE PHYSICAL MACHINE?

Dan Pritchett maintains a document detailing the process of running two or more servers on the same machine without end users being able to tell the difference (URL is <URL:http://www.thesphere.com/~dlp/TwoServers/>).