Gopher Proxy Blacklisting
-------------------------

As I mentioned a couple of days ago [1], I've started seeing my posts
pop up in Google searches.  On the one hand I can see that it is
generally a "good thing" for people to be able to find information
relevant to a given query regardless of how it's hosted.

On the other hand though, I tend to agree with Tomasino [2] and I
would prefer to keep my phlog posts off the Google results page.

If this were a web page, this would simply be a matter of configuring a
robots.txt file: crawlers for major search engines apparently respect
these.  But this doesn't seem to work for gopherholes visible to the
crawlers via a proxy web app. I expect that this is because, from the
point of view of the crawler, the HTML version of my gopherhole is
simply a page belonging to the proxy site, meaning the only robots.txt
that applies is the one on the website hosting that proxy app.  The
fact that _my_ gopherhole has a robots.txt selector [3] which
prohibits crawling by anything besides Veronica is irrelevant, and I
am therefore given no say in which parts of my gopherhole are slurped
up.
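
To make the scoping concrete, here is a minimal Python sketch of how a
web crawler would typically work out which robots.txt governs a page:
it only looks at the scheme and host of the URL it is fetching.  The
proxy host name and path layout are made up for illustration; real
proxies will differ.

```python
from urllib.parse import urlsplit


def robots_url_for(page_url):
    """Return the robots.txt URL a web crawler would consult for page_url."""
    parts = urlsplit(page_url)
    # Only the scheme and host matter; the gopher host buried in the
    # path is just part of an ordinary web path to the crawler.
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


# Hypothetical proxy URL exposing a gopher selector as an HTML page.
proxied = "https://proxy.example/gopher/thelambdalab.xyz/0/phlog/post.txt"
print(robots_url_for(proxied))
```

The gopherhole's own robots.txt selector never enters into the lookup.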
 
Thus I am now experimenting with maintaining a blacklist of hosts
which seem to correspond to gopher proxy servers.  (The procedure I've
used for identifying these is completely manual: search for text from
my phlog using Google, then open a connection to
gopher://thelambdalab.xyz from the offending proxy and watch the
address pop up in the log.)  In this way I've blocked requests coming
from all proxy apps which show up on the first page of the Google
search results for my phlog.
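
A minimal sketch of what such a check might look like, in Python.
This is not the actual server code: the function name is hypothetical
and the entries are placeholders taken from the address ranges
reserved for documentation, not real proxies.

```python
import ipaddress

# Placeholder entries (documentation ranges), not real proxy addresses.
BLACKLIST = [
    ipaddress.ip_network(entry)
    for entry in ("192.0.2.17/32", "198.51.100.0/24")
]


def is_blocked(client_addr):
    """Return True if client_addr falls inside any blacklisted network."""
    addr = ipaddress.ip_address(client_addr)
    return any(addr in net for net in BLACKLIST)


# A server would call this with the peer address of each new
# connection and close the socket when it returns True.
print(is_blocked("192.0.2.17"))
```

Storing networks rather than single addresses makes it possible to
block a proxy that connects from several addresses in one range, at
the cost of a higher risk of collateral damage.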

Hopefully there's no collateral damage from this move.  (If you become
aware of any, I'd be grateful for a heads up, either by email or by
leaving a guestbook comment.) Obviously the best solution would be if
the proxies were to mark the gopher pages they serve as disallowed
using their own robots.txt files; until this occurs, however, I see no
other real option. :-(
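
As a rough illustration of that preferred fix, the following Python
sketch feeds a hypothetical proxy robots.txt to the standard library's
robots.txt parser.  The /gopher/ path prefix and the host name are
assumptions; each proxy would need to use its own URL layout.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a proxy serving gopher content under
# /gopher/.  The path prefix is an assumption, not any real proxy's.
PROXY_ROBOTS = """\
User-agent: *
Disallow: /gopher/
"""

parser = RobotFileParser()
parser.parse(PROXY_ROBOTS.splitlines())


def may_crawl(url, agent="Googlebot"):
    """Return True if the rules above allow agent to fetch url."""
    return parser.can_fetch(agent, url)


base = "https://proxy.example"
print(may_crawl(base + "/gopher/thelambdalab.xyz/0/phlog/post.txt"))
print(may_crawl(base + "/about.html"))
```

With rules like these the proxied gopher pages are off limits to
well-behaved crawlers while the rest of the proxy site stays
crawlable.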

---
[1]: gopher://thelambdalab.xyz/0/phlog/2019-08-27-Email-thoughts.txt
[2]: gopher://gopher.black/1/phlog/20190524-robots-txt
[3]: gopher://thelambdalab.xyz/0/robots.txt