Date: Wed, 12 Oct 2005 21:52:33 -0500
From: John Goerzen <jgoerzen@complete.org>
To: gopher@complete.org
Subject: [gopher] Re: New Gopher Wayback Machine Bot
Message-ID: <20051013025233.GA26984@katherina.lan.complete.org>
References: <20051012180132.GA19083@complete.org>
 <200510122345.QAA17070@floodgap.com>
MIME-Version: 1.0
Content-type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <200510122345.QAA17070@floodgap.com>
User-Agent: Mutt/1.5.11

On Wed, Oct 12, 2005 at 04:45:56PM -0700, Cameron Kaiser wrote:
> > Cameron, floodgap.com seems to have some sort of rate limiting and keeps
> > giving me a Connection refused error after a certain number of documents
> > have been spidered.
> 
> I'm a little concerned about your project since I do host a number of large
> subparts which are actually proxied services, and I think even a gentle bot
> going methodically through them would not be pleasant for the other side
> (especially if you mean to regularly update your snapshot).

Valid concern.  I had actually already marked your site off-limits
because I noticed that.  Incidentally, your robots.txt doesn't seem to
disallow anything -- might want to take a look at that ;-)

[snip]

> I do support robots.txt, see
> 
> 	gopher.floodgap.com/0/v2/help/indexer

Do you happen to have the source code for that available?  I've got
some questions for you that it could explain (or you could), such as:

1. Which would you use?  (Do you expect URLs to be percent-escaped, as
   in HTTP?)

    Disallow: /Applications and Games
    Disallow: /Applications%20and%20Games

2. Do you assume that all Disallow patterns begin with a slash as they
   do for HTTP, even if the Gopher selector doesn't?

3. Do you have any special code to handle the UMN case where
   1/foo, /foo, and foo all refer to the same document?
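
One possible set of answers to the three questions above, sketched in
Python. This is only an illustration under stated assumptions, not how
the Floodgap indexer or any existing bot behaves: it assumes Disallow
patterns are percent-decoded before comparison, that a leading slash is
added when missing, and that a UMN-style leading item type is a single
digit followed by a slash. The names normalize_selector and
is_disallowed are hypothetical.

```python
from urllib.parse import unquote


def normalize_selector(selector):
    """Reduce a selector or Disallow pattern to one canonical form.

    Assumptions (not taken from any spec): percent-escapes are decoded,
    a UMN-style leading item type ("1/foo") is dropped, and the result
    always starts with exactly one slash.
    """
    s = unquote(selector.strip())
    # UMN case: "1/foo", "/foo" and "foo" name the same document.
    if len(s) >= 2 and s[0].isdigit() and s[1] == "/":
        s = s[1:]
    return "/" + s.lstrip("/")


def is_disallowed(selector, disallow_patterns):
    """Return True if the selector falls under any Disallow prefix."""
    target = normalize_selector(selector)
    for pattern in disallow_patterns:
        if not pattern.strip():
            continue  # an empty Disallow line disallows nothing
        if target.startswith(normalize_selector(pattern)):
            return True
    return False
```

With this normalization both spellings in question 1 block the same
selectors, and "1/foo", "/foo", and "foo" compare equal. The cost is
that a real directory whose name is a single digit would be mistaken
for an item type, so a bot using this approach would need to decide
which way it prefers to err.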

I will be adding robots.txt support to my bot and restarting it shortly.
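
As a rough idea of what that support involves, the first step is
pulling the applicable Disallow lines out of the fetched file. The
sketch below assumes the conventional User-agent/Disallow record
layout and simplifies it: agent names are compared exactly (ignoring
case), and rules from a matching named record are merged with those
from the "*" record rather than overriding them. The function name
parse_disallow is hypothetical.

```python
def parse_disallow(robots_txt, agent="*"):
    """Collect Disallow values from records matching agent or "*"."""
    rules = []
    agents = []        # User-agent values of the current record
    in_rules = False   # True once the current record has a rule line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or ":" not in line:
            continue
        field, value = line.split(":", 1)
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if in_rules:
                # A User-agent line after rules starts a new record.
                agents, in_rules = [], False
            agents.append(value.lower())
        elif field == "disallow":
            in_rules = True
            # An empty Disallow value means "nothing is disallowed".
            if value and ("*" in agents or agent.lower() in agents):
                rules.append(value)
    return rules
```

The returned list can then be checked against each selector before the
bot requests it.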

Thanks,

-- John



-- 
John Goerzen
Author, Foundations of Python Network Programming
http://www.amazon.com/exec/obidos/tg/detail/-/1590593715