Running WWW on top of Gopher

paper for WWW'94, CERN, May 1994 and GopherCON '94, Minneapolis, Minn, April 1994 

Mike Potter  (http://www.lanl.gov:52271/?-l+103424) 
Network Engineering 
C-5, MS B255 
Los Alamos National Laboratory 
Los Alamos, New Mexico, USA 87545 
mep@lanl.gov


At Los Alamos National Laboratory (LANL), we wanted to set up a World-Wide-Web (WWW) server. A few months earlier we had already set up a Gopher server. Rather than maintaining two separate systems, with a plain ASCII copy of the information in one system along with an HTML version in the other, we decided to write a WWW server that would make use of the existing Gopher information structure. 

Of course, you don't need to do anything special to view Gopher information via the Web; you can simply use a URL of the form 

gopher://nodename/type/path

to point to your existing Gopher server. However, we wanted the ability to enhance the existing information using HTML, without having to duplicate all of it. The following goals led to the development of the gopherhttpd server: 

	Seamless integration of Gopher and WWW from the perspective of both the users and the information provider. If a new document is added to the Gopher system, it should automatically appear in the WWW system. 

	Provide a WWW server that installs and operates much like a Gopher server to allow existing system and network managers to install a WWW server with little additional training. 

	Provide the ability to completely override ASCII documents in Gopher with HTML documents in WWW when desired. 

	Allow easy annotation of the existing Gopherspace. 

	Provide nicely formatted HTML pages by default. 

This document describes the gopherhttpd server that achieves all of the above goals. The installation and operation of this server will be described, and examples from the LANL server will be shown. 


Overview of gopherhttpd

The easiest way to understand the power of gopherhttpd is to take a look at an example. 

Here is what the LANL Gopher server looks like.  (gopher://gopher.lanl.gov/) 

Here is what the LANL WWW Home Page looks like. (http://www.lanl.gov/) 

The LANL WWW Home Page was generated "on the fly" based upon information from the LANL Gopher server. Note that all of the Gopher menu items appear in the WWW Home Page. Each menu item is "annotated" with a short description. In addition, the entire menu is preceded by a short description, filling the role that an "About" or "README" file normally plays in Gopher. 

Note that the WWW Home Page takes full advantage of HTML, including the ability to present a logo. 


A Closer Look

Let's take a closer look at each element in the LANL WWW Home Page and see how it is generated from the data on the Gopher server. 

<Optional logo> 

This text is used to describe the current page. It is optional. The contents of this section are taken from the file README.html in the current Gopher directory. This file can contain any HTML code that is desired. 

<optional icon>First Menu Item  
	Description of the First Menu Item. This section is optional. 

<optional icon>Second Menu Item 
	Description of Second Menu Item 

________________________________________ 
Author Name or email address

Each of the Menu Items in the template shown above is taken from the Gopher menu. The gopherhttpd server reads both the .cap files and the .Links files and produces a menu just like the Gopher menu. Each menu item corresponds to a specific file or directory in Gopherspace. The description of this file or directory is read from the file filename.about.html where filename is the name of the Gopher file or directory. The optional icon is taken from the file .cap/filename.gif if it exists. 
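The lookup described above can be sketched in a few lines. This is only an illustration in Python, not the actual perl code of gopherhttpd; the function name describe_item is hypothetical, and the sketch treats the whole About file as description text, which is the simple case.

```python
import os

def describe_item(directory, filename):
    """Return (description_html, icon_path) for one Gopher menu item.

    Hypothetical helper: the name and return shape are assumptions
    made for illustration, not part of gopherhttpd itself.
    """
    about_path = os.path.join(directory, filename + ".about.html")
    icon_path = os.path.join(directory, ".cap", filename + ".gif")

    description = None
    if os.path.isfile(about_path):
        with open(about_path) as handle:
            description = handle.read().strip()

    # The icon is optional; report it only if the file exists.
    icon = icon_path if os.path.isfile(icon_path) else None
    return description, icon
```

An item with neither file simply gets no annotation and no icon, which matches the "optional" elements in the template.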

The only thing that was done to the Gopher server to make all of this work was to add the line 

ignore:		.html

to the gopherd configuration file. Thus, any file ending in .html will be ignored by the Gopher server, and handled instead by gopherhttpd. 

The Author name or email address shown at the bottom of the screen is taken from a file in the current Gopher directory called AUTHOR.html. If this file is not present, and is not found in any parent directories, default information contained in the gopherhttpd configuration file is placed here instead. Since this information can contain any HTML commands, you can provide links to your phone book entries. 
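The search through parent directories might look like the following Python sketch. The function name and the decision to stop searching at the top of the Gopher tree are assumptions; the text above only says that parent directories are checked before the configured default is used.

```python
import os

def find_author_html(gopher_root, current_dir, default_html):
    """Return the contents of the nearest AUTHOR.html, or the default.

    Hypothetical helper. Assumption: the upward search stops at the
    top of the Gopher tree (gopher_root) rather than continuing to /.
    """
    root = os.path.abspath(gopher_root)
    directory = os.path.abspath(current_dir)
    while True:
        candidate = os.path.join(directory, "AUTHOR.html")
        if os.path.isfile(candidate):
            with open(candidate) as handle:
                return handle.read().strip()
        parent = os.path.dirname(directory)
        # Stop at the Gopher root, or at the filesystem root as a safety net.
        if directory == root or parent == directory:
            return default_html
        directory = parent
```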

That's it! This is the basic template for every HTML page created by gopherhttpd from your existing Gopher menus. 


Further Customization

There are some tricks that you can use to further customize the look of your WWW server. 

Normally, gopherhttpd will automatically create the URL for each item in the menu to point to either another menu, or to the information file itself. If the file filename.html exists, then a link to this file will be generated instead. This allows you to completely override a file in Gopher with an HTML file by putting both into the same Gopher directory. Since Gopher ignores HTML files, Gopher users will only see the original data, and WWW users will only see the HTML data. 

You can prevent Gopher items from appearing in the WWW page by adding a file to the Gopher directory called filename.ignore. Of course, you then also need to tell your Gopher server to ignore files that end in .ignore. To avoid this, gopherhttpd also recognizes the file .cap/filename.ignore. Now Gopher users will see the original filename, but gopherhttpd will omit it from the WWW menu. 
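The two rules above (override with filename.html, hide with an .ignore marker) can be summarized in a short Python sketch. Both function names are hypothetical and the code only illustrates the decision logic, not the real gopherhttpd implementation.

```python
import os

def is_hidden_from_www(directory, filename):
    """True if either .ignore marker exists for this item."""
    markers = (
        os.path.join(directory, filename + ".ignore"),
        os.path.join(directory, ".cap", filename + ".ignore"),
    )
    return any(os.path.isfile(path) for path in markers)

def link_target(directory, filename):
    """Prefer filename.html over the original Gopher file when present."""
    if os.path.isfile(os.path.join(directory, filename + ".html")):
        return filename + ".html"
    return filename
```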


Extensions to the Gopher .cap files

Instead of creating description files using the filename.about.html method described above, you can achieve this same effect through some extensions to the Gopher .cap/filename and .Links files. Gopher clients ignore lines in these files that do not start with recognized keywords. Thus, we have added some new keywords that gopherhttpd will recognize, but Gopher will ignore. Here is a list of the new keywords: 

Desc 
	Allows you to specify the description to be placed under the menu item name. Only a single line description is allowed, but that line can be as long as you want, and Mosaic will wrap the long line as needed into a full paragraph. 

WWW 
	What if a system is running both a Gopher and a WWW server? Normally, gopherhttpd points to the Gopher system. You can override this using the WWW=url option in the .cap file. The url specified on this line will be used as the menu item. Also, a line in the description field will automatically be generated that says 

You can also access their Gopher server. 

which points to the Gopher system. If you put an asterisk (*) at the end of the URL, this Gopher message will be suppressed. A dollar-sign ($) in the URL is expanded from the Host field. If the protocol string is missing from the front of the URL, http:// is added by default. Thus, the string 

WWW=$/welcome.html 

will generate a menu item with a link of http://hostname/welcome.html where hostname is taken from the Host=hostname line in the .cap file. 
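The expansion rules for the WWW keyword can be written out as a small Python function. This is a sketch under assumptions: the name expand_www is hypothetical, and detecting a "protocol string" by looking for "://" is a guess at how the check could be made.

```python
def expand_www(value, host):
    """Expand a WWW= value into (url, show_gopher_note).

    Hypothetical helper illustrating the three rules in the text:
    trailing '*', '$' expansion, and the default http:// prefix.
    """
    show_gopher_note = True
    if value.endswith("*"):
        # A trailing asterisk suppresses the "You can also access
        # their Gopher server" line.
        value = value[:-1]
        show_gopher_note = False

    # '$' stands for the value of the Host= line.
    url = value.replace("$", host)

    # Assumption: a URL without "://" has no protocol string.
    if "://" not in url:
        url = "http://" + url
    return url, show_gopher_note
```

With a Host=hostname line, expand_www("$/welcome.html", "hostname") returns the URL http://hostname/welcome.html and leaves the Gopher note enabled.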

Pre 
	HTML text assigned to Pre will be output immediately before the highlighted Name of the menu item. 

Post 
	HTML text assigned to Post will be output immediately after the highlighted Name of the menu item. 

Before 
	HTML text assigned to Before will also be output before the menu item, but ahead of the <dt> that flags the start of the menu item. You can think of this text as appearing after the Description text of the previous item. A common use of this item is with Before=<hr> to put a horizontal rule before the menu item. 

NOTE: When using extended syntax in the .cap/filename file, be sure to put the new items after any existing Gopher items. Many gopher clients abort their parsing of the .cap file when they reach an unknown keyword. Thus, all standard Gopher keywords should come first, followed by the extended gopherhttpd keywords. 
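To make the placement of Pre, Post, and Before concrete, here is a Python sketch that assembles one menu item. The exact HTML that gopherhttpd emits is not shown in this paper, so the markup below is an assumption; only the relative order of the pieces follows the keyword descriptions above.

```python
def render_item(name, url, desc="", pre="", post="", before=""):
    """Assemble one definition-list entry for a menu item.

    Hypothetical helper; the precise markup is an assumption.
    Order: Before, <dt>, Pre, linked Name, Post, then <dd> + Desc.
    """
    html = before + "<dt>" + pre
    html += '<a href="' + url + '">' + name + "</a>" + post
    if desc:
        html += "\n<dd>" + desc
    return html
```

For example, render_item("Phone Book", "phone/", before="<hr>") places the horizontal rule ahead of the <dt>, so it appears between the previous item's description and this item.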


Putting .cap information into the About file

As mentioned in a previous section, you can use a file called filename.about.html, where filename is the name of the Gopher file or directory, to specify the description that appears under the menu item. This is just the default use for the About file -- you can do much more. If a line in the About file doesn't start with keyword=, then it is assumed to be a line in the menu description. A single line in this file is equivalent to putting a Desc=description line in the .cap file. 

However, in the About file (the filename.about.html file), you can specify multiple description lines. In the HTML output, a <dd> is inserted at the beginning of each line to force a line break in the description. 

You can also put any valid .cap information into the About file. Any information specified in the About file will override information in the .cap file. This allows you to further modify and customize your WWW page since you can change the Gopher information in the gopherhttpd About file for a given menu entry. 
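The About file rules can be sketched as a small parser in Python. The names are hypothetical, and the pattern used to recognize a keyword= line (letters followed by an equals sign at the start of the line) is an assumption, since the paper does not define it precisely.

```python
import re

# Assumption: a keyword is a run of letters at the start of the line.
KEYWORD_LINE = re.compile(r"^([A-Za-z]+)=(.*)$")

def parse_about(text):
    """Split About-file text into (settings, description_lines)."""
    settings = {}
    description_lines = []
    for line in text.splitlines():
        match = KEYWORD_LINE.match(line)
        if match:
            settings[match.group(1)] = match.group(2)
        elif line.strip():
            # Any other non-blank line is part of the description.
            description_lines.append(line)
    return settings, description_lines

def merge_with_cap(cap_settings, about_settings):
    """About-file values override values from the .cap file."""
    merged = dict(cap_settings)
    merged.update(about_settings)
    return merged
```

Each entry of description_lines would then be written out with a leading <dd>, while the merged settings drive the Name, link, and other fields of the menu item.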


Some real examples

Let's look at some real-life examples that make use of these advanced features. 


Separating menu items into sections

In order to organize a Gopher menu into sections, sometimes you will see gopher sites do this: 

                  Internet Gopher Information Client 2.0 pl10
 
                      Root gopher server: gopher.lanl.gov
 
 -->  1.  News Flash 7-Mar-1994: What's New in the LANL Gopher....
      2.  ---------------------LANL Information---------------------.
      3.  News and Events/
      4.  Phone Book/
      5.  Job Openings/
      6.  Library Catalogs and Information/
      7.  Computing at LANL/
      8.  Information Architecture Project/
      9.  Software Archive/
      10. Information by Division/
      11. Information by Subject/
      12. -----------------------The Internet-----------------------.
      13. About the Internet/
      14. How to get Gopher/Mosaic Software/
      15. The Internet via Gopher/Mosaic/
      16. Finding People, Places, and Information/
      17. Selected Software Archives (FTP)/
      18. Network News (USENET)/
 
Press ? for Help, q to Quit                                   Page: 1/2


What we want the WWW home page to look like is something like this: 

News Flash 7-Mar-1994: What's New in the LANL Gopher.... 

LANL Information 

News and Events 

Phone Book 

Job Openings 

Library Catalogs and Information 

Computing at LANL 

Information Architecture Project 

Software Archive 

Information by Division 

Information by Subject 

The Internet 

About the Internet 

How to get Gopher/Mosaic Software 

The Internet via Gopher/Mosaic 

Finding People, Places, and Information 

Selected Software Archives (FTP) 

Network News (USENET) 


The way to achieve this is to override the Name of the section headings using the About file. Let's concentrate on the menu item titled "LANL Information". This Gopher menu item points to a file of information about LANL, with a filename of lanl. Here are the contents of the .cap/lanl Gopher file: 

Name=---------------------LANL Information---------------------
Numb=2

Pretty simple. Now, here are the contents of the lanl.about.html file used by gopherhttpd: 

Name=LANL Information
Desc=<dl>
Before=<p>

The Name= line overrides the Gopher name that contains all of the hyphens. The Desc=<dl> line tells HTML to start a new description list for the following menu items. The Before= line adds some space between the previous menu item and the current one. gopherhttpd will automatically create a link to the existing lanl Gopher file. If you want, you could create an HTML version of this file called lanl.html, and the link would automatically point to the HTML file rather than the ASCII file. 


Creating a WWW-only menu item

We have seen how you create a Gopher-only menu item; simply create a file called .cap/filename.ignore and gopherhttpd will omit the item from the menu listing. However, what if you only want a particular item to appear in the WWW page, and not the Gopher page? 

To create a new WWW-only menu item, simply create a filename.about.html file. For example, let's say we want to add a menu item that points to the master list of WWW servers. Obviously we don't want our Gopher users to see this, since it contains a list of WWW servers, not Gopher servers. We create a file called www-list.about.html with the following contents: 

Name=Master list of WWW servers around the world
Numb=10
WWW=http://info.cern.ch/hypertext/DataSources/bySubject/Overview.html
Desc=A listing of registered World-Wide-Web servers maintained at CERN

The Name= line specifies the highlighted text of the menu item. The Numb= line specifies that this item appears in tenth place in the menu. Without the WWW= line, gopherhttpd would create a link to a file called www-list or www-list.html. By overriding this link, we can point to the server list at CERN instead. Finally, the Desc= line adds a short annotation for this menu item. 


Some LANL-specific features

gopherhttpd also supports some more exotic features that are used at LANL. By modifying the perl code for gopherhttpd, you can probably make use of these features at your own site. However, feel free to skip this section if you'd like. 


Linking author entries with the phone book

The file AUTHOR.html is used to sign the bottom of each WWW page. If the file does not exist in the current Gopher directory, or in any parent directories, default information from the gopherhttpd configuration file is used. This file can contain any HTML code. However, if the file contains a single line with the syntax: 

text,nnnnnn

or 

nnnnnn:{text}

where nnnnnn is a six-digit number, then gopherhttpd automatically creates a link to the LANL phone book. The text will be highlighted, and linked to the following URL: 

http://www.lanl.gov:52271/?-l+nnnnnn

The LANL phone book runs on port 52271 and takes a query. The -l tells the LANL phone book to output the long form of the record, and the 6-digit number represents the LANL employee number. 
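The two single-line forms can be recognized with a pair of regular expressions. The following Python sketch is illustrative only: the function name is hypothetical and the generated anchor markup is an assumption, while the phone book URL prefix is the one given above.

```python
import re

PHONE_BOOK = "http://www.lanl.gov:52271/?-l+"

def author_link(line):
    """Turn 'text,nnnnnn' or 'nnnnnn:{text}' into a phone book link.

    Returns None when the line matches neither form, in which case
    the file would be used as ordinary HTML.
    """
    line = line.strip()
    match = re.fullmatch(r"(\d{6}):\{(.*)\}", line)
    if match:
        number, text = match.group(1), match.group(2)
    else:
        match = re.fullmatch(r"(.*),(\d{6})", line)
        if not match:
            return None
        text, number = match.group(1), match.group(2)
    return '<a href="' + PHONE_BOOK + number + '">' + text + "</a>"
```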


Creating a local list of servers

When we started to create a master list of all Gopher and WWW servers at LANL, we realized that much of the effort spent writing HTML code could be automated. After all, what we wanted was a menu of servers, much like one of our Gopher menus. The difference is that we wanted to add some extra information to each item, like the status of the server, the name of the contact, etc. We could have just used the Description capability already built into gopherhttpd, but we decided to add some more extensions to make it even easier. The following additional keywords were added to the .cap/filename and filename.about.html file syntax. 

Status 
	Specifies the status of the server (under construction, production, etc.). This field can contain any HTML code. The presence of the Status keyword in the .cap or About file triggers the code in gopherhttpd that produces the special format of this menu item. 

Admin 
	Specifies the contact person in charge of the server. It can contain plain HTML text, or it can contain links to the LANL phone book using the syntax: 
	nnnnnn:{text of first contact},nnnnnn:{text of second contact}...
	Each contact will be placed on a separate line. 

GopherLink 
	Normally, the link to the Gopher server will be taken from the Host, Port, Path keywords. You can override this using the GopherLink keyword. In particular, you can use a value of none if the system is not running a Gopher server. 

WebLink 
	Normally, the link to the WWW server will be taken from the Host, Port, and Path keywords, possibly overridden with the WWW keyword. You can override this with the WebLink keyword. In particular, you can use a value of none if the system is not running a WWW server. 

Let's take a look at an example of a .cap file for a system running both a Gopher and WWW server, and how gopherhttpd formats this entry: 

Name=DOE High Performance Computing Research Center (ACL)
Type=1
Host=gopher.acl.lanl.gov
Port=70
Status=production
Path=
WWW=http://www.acl.lanl.gov/Home.html
Admin=102733:{Jerry DeLapp (jgd@acl.lanl.gov), Gopher}\n114212:{Ron Daniel (rdaniel@acl.lanl.gov), WWW}
Desc=Information about the Advanced Computing Laboratory and all of the projects that they are involved in.  ACL staff and facilities information.  Link to central LANL server.


gopherhttpd displays this menu item like this: 

DOE High Performance Computing Research Center (ACL)  (http://www.acl.lanl.gov/Home.html) 

	Status...production 

	WWW...... (http://www.acl.lanl.gov/Home.html) 

	Gopher...(gopher://gopher.acl.lanl.gov/) 

	Admin....Jerry DeLapp (jgd@acl.lanl.gov), Gopher  (http://www.lanl.gov:52271/?-l+102733) 

	.........Ron Daniel (rdaniel@acl.lanl.gov), WWW  (http://www.lanl.gov:52271/?-l+114212) 

	Information about the Advanced Computing Laboratory and all of the projects that they are involved in. ACL staff and facilities information. Link to central LANL server.


Note that all of the links are created for you so that in addition to providing information about the server, you can actually jump to their WWW server, Gopher server, or phone book entries. 
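Extracting the contacts from an Admin value is a small pattern-matching job. The Python sketch below is an assumption about how it could be done, not the actual gopherhttpd code. It searches for every nnnnnn:{text} group instead of splitting on a separator, because the contact text may itself contain commas, as in the example above.

```python
import re

PHONE_BOOK = "http://www.lanl.gov:52271/?-l+"

def admin_contacts(value):
    """Return a list of (phone_book_url, text) pairs from an Admin value.

    Hypothetical helper. Whatever separates the groups is ignored,
    so only the nnnnnn:{text} groups themselves matter.
    """
    pairs = re.findall(r"(\d{6}):\{([^}]*)\}", value)
    return [(PHONE_BOOK + number, text) for number, text in pairs]
```

Each returned pair would be rendered on its own line of the Admin field, with the text linked to the phone book entry.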

If you put this information into the Gopher .Links file, rather than using a .cap file, you will end up with menu items in your Gopher server. At LANL, we put entries in the .Links file for all servers running Gopher, then create individual filename.about.html files for servers that do not run Gopher. This way, Gopher users see a list of all LANL Gopher servers, and WWW users see a nice annotated list of all Gopher and WWW servers at the Lab. 


Installing gopherhttpd

Installation of gopherhttpd is very similar to the installation of a Gopher server. gopherhttpd is meant to be run from the Unix inetd daemon. Here are the steps involved in installation: 

Add an entry to your /etc/services file for your new WWW server. This entry should look something like: 
	http    80/tcp      # WWW server

Add an entry to your /etc/inetd.conf file. This entry should look something like: 
	http  stream  tcp  nowait  nobody  /etc/gopherhttpd gopherhttpd /gopher /etc/gopherhttpd.conf
	The meaning of the parameters will be listed in the next section. Note that this daemon is run as user nobody. This is recommended as a security precaution to prevent someone from gaining root access through unknown holes in gopherhttpd. This example is taken from a Sun Sparcstation. Some Unix systems do not allow you to specify the user id that your server runs as. 

Restart your inetd daemon. Use 
	ps -ax | grep inetd
	to determine the process id of inetd. Then issue a kill -1 pid to restart it. 

The above example assumes you put the gopherhttpd code and configuration file into your /etc directory. Feel free to use any directory you wish, and simply update the entry in inetd.conf to reflect the actual location of these files. Here is the source for gopherhttpd (file://ftp.lanl.gov/pub/unix/www/gopherhttpd/gopherhttpd) 

gopherhttpd is written in perl. perl is an interpreted language that requires a run-time interpreter. The first line in gopherhttpd points to the location of the perl interpreter. The default location is /usr/bin/perl. If your perl interpreter is located in a different place, change the first line in gopherhttpd. If you don't have perl, go get it! (http://www.lanl.gov/software/unix/perl) No Unix system should be without it. 


Command line parameters

gopherhttpd takes two parameters on the command line. The first parameter is the directory where your Gopher files are located. This should be the same as the home directory specified when loading your Gopher server. 

The second parameter is the location of the gopherhttpd configuration file. The contents of this configuration file are very similar to the contents of your Gopher configuration file. In particular, it contains information about MIME file types, access control lists, and miscellaneous information such as the node and port of your Gopher and WWW servers. The sample configuration file (file://ftp.lanl.gov/pub/unix/www/gopherhttpd/gopherhttpd.conf) is full of comments that explain each parameter. 


Security Considerations

gopherhttpd acts much like the browse option available in other WWW servers. This means that you don't have to provide a specific file name to gopherhttpd, but can simply give it a directory name and browse the files in that directory. gopherhttpd will allow access to any world-readable file or directory within your Gopher hierarchy. It will also follow any symbolic links contained in that hierarchy. It will not otherwise allow access outside of the tree specified as the first parameter to the server. 
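One common way to keep requests inside a directory tree is to normalize the requested path and check that it still lies under the root. The Python sketch below shows that general technique; it is not taken from gopherhttpd, the function name is hypothetical, and it assumes the Gopher root is not / itself. Symbolic links are deliberately not resolved here, in keeping with the statement above that links inside the hierarchy are followed.

```python
import posixpath

def resolve_request(gopher_root, requested):
    """Map a requested path into the Gopher tree, or return None.

    Illustrative only. Normalizes away '..' components without
    touching the filesystem, then rejects anything outside the root.
    """
    root = posixpath.normpath(gopher_root)
    candidate = posixpath.normpath(
        posixpath.join(root, requested.lstrip("/"))
    )
    if candidate == root or candidate.startswith(root + "/"):
        return candidate
    return None
```

A real server would additionally check that the resulting file or directory is world-readable before serving it.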


Access Control

gopherhttpd implements the same type of node-based access control as does the UMN Gopher server. In the gopherhttpd configuration file, you add lines of the form: 

access: ip-address  access

where ip-address is the full or partial IP address of the system or network you want to control access for, and access is either a + or - to allow or deny access. Alternatively, if the access field is any string beginning with an exclamation mark (!), access is denied; any other string not beginning with an ! allows access. This second form of the syntax makes the configuration file compatible with existing Gopher configuration files. 

The ip-address field can actually be any Unix regular expression. Periods (.) not followed by a * or + are automatically escaped. Missing fields in the IP address are filled with .* automatically. Thus, the ip-address 128.165 expands to the regular expression 128\.165\..*\..*, matching any node beginning with the specified numbers. 
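The expansion rule can be checked with a short Python sketch. The function names are hypothetical, and one detail is an assumption: missing fields are only filled in when the pattern is a plain dotted prefix such as 128.165, since counting fields in an arbitrary regular expression is ambiguous.

```python
import re

def expand_ip_pattern(pattern):
    """Expand an ip-address field into a regular expression string."""
    # Escape every period that is not followed by '*' or '+'.
    escaped = re.sub(r"\.(?![*+])", r"\\.", pattern)
    # Assumption: only plain numeric prefixes get padded to four fields.
    if re.fullmatch(r"\d+(\.\d+)*", pattern):
        missing = 4 - len(pattern.split("."))
        escaped += r"\..*" * max(missing, 0)
    return escaped

def grants_access(access_field):
    """'-' or any string starting with '!' denies; anything else allows."""
    return not (access_field == "-" or access_field.startswith("!"))
```

For the example above, expand_ip_pattern("128.165") yields 128\.165\..*\..*, which matches 128.165.1.2 but not 128.166.1.2. How several access: lines are combined (first match or last match) is not covered by this sketch.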


Conclusion

Running gopherhttpd in addition to a Gopher server makes your information available to the widest possible audience. The defaults are designed to get your system up and running as quickly and easily as possible. Many customization hooks are provided that allow you to fine-tune the look of your WWW pages using HTML. Using the Gopher directory hierarchy forces the system administrator to organize their information. The system administrator doesn't need to write HTML code in order to generate nice looking menus. Until everyone starts generating their information directly in HTML, using gopherhttpd is a good compromise between the Gopher ASCII information and the WWW HTML information, giving you the advantages of both systems.