Gopher Proxy Blacklisting
-------------------------

As I mentioned a couple of days ago [1], I've started seeing my posts
pop up in Google searches. On the one hand I can see that it is
generally a "good thing" for people to be able to find information
relevant to a given query regardless of how it's hosted. On the other
hand though, I tend to agree with Tomasino [2] and I would prefer to
keep my phlog posts off of the Google results page.

If this were a web page this would simply be a matter of configuring
a robots.txt file: crawlers for major search engines apparently
respect these. But this doesn't seem to work for gopherholes visible
to the crawlers via a proxy web app. I expect that this is because,
from the point of view of the crawler, the HTML version of my
gopherhole is simply a page belonging to the proxy site, meaning the
only robots.txt that applies is the one on the website hosting that
proxy app. The fact that _my_ gopherhole has a robots.txt selector [3]
which prohibits crawling by anything besides Veronica is irrelevant,
and I am therefore given no say in which parts of my gopherhole are
slurped up.

Thus I am now experimenting with maintaining a blacklist of hosts
which seem to correspond to gopher proxy servers. (The procedure I've
used for identifying these is completely manual: search for text from
my phlog using Google, then open a connection to
gopher://thelambdalab.xyz from the offending proxy and watch the
address pop up in the log.) In this way I've blocked requests coming
from all proxy apps which show up on the first page of the Google
search results for my phlog.

Hopefully there's no collateral damage from this move. (If you become
aware of any, I'd be grateful for a heads up, either by email or by
leaving a guestbook comment.)

Obviously the best solution would be if the proxies were to mark the
gopher pages they serve as disallowed using their own robots.txt
files. However, until this occurs I see no other real option.
:-(

---
[1]: gopher://thelambdalab.xyz/0/phlog/2019-08-27-Email-thoughts.txt
[2]: gopher://gopher.black/1/phlog/20190524-robots-txt
[3]: gopher://thelambdalab.xyz/0/robots.txt
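
For the curious, the kind of blacklist check described above might
look roughly like the following Python sketch. It is an illustration
only: the addresses are placeholders from the reserved documentation
ranges rather than real proxy hosts, the names BLOCKED_PROXIES and
is_blocked are made up for the example, and it makes no claim about
how any particular gopher server actually implements blocking.

```python
import ipaddress

# Hypothetical entries: documentation-only addresses standing in for
# the proxy hosts identified by watching the server log.
BLOCKED_PROXIES = [
    ipaddress.ip_network("192.0.2.15/32"),    # a single IPv4 host
    ipaddress.ip_network("198.51.100.0/24"),  # a whole IPv4 range
    ipaddress.ip_network("2001:db8::/32"),    # an IPv6 range
]


def is_blocked(client_address: str) -> bool:
    """Return True if the connecting address is inside a blocked network."""
    try:
        address = ipaddress.ip_address(client_address)
    except ValueError:
        # Not a parseable IP address: do not block on malformed input.
        return False
    # Membership is False when the IP versions differ, so IPv4 and IPv6
    # entries can share one list.
    return any(address in network for network in BLOCKED_PROXIES)
```

A server would presumably call something like is_blocked() with the
peer address as each connection is accepted, and simply close the
connection when it returns True.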