# Quarry (indexer, search interface and supporting tools)

> A place, cavern, or pit where stones are dug from the earth, or
> separated, as by blasting with gunpowder, from a large mass of rock.

> Hunted or slaughtered game, or any object of eager pursuit.

Quarry contains a number of components:

1. Crawler/indexer (quarry.pl)
2. Gopher search, front end to search index (search.dcgi)
3. Wrapper for quarry.pl to process pending host index requests (indexPending.pl)
4. Sitemap generator (generateSitemap.pl)
5. Host and selector maintenance (checkHosts.pl)

Requirements:

* Perl
* curl
* MariaDB/MySQL

Try it:

    gopher://gopher.icu/1/quarry

Get it:

    git clone git://gopher.icu/quarry

## 1. Crawler/indexer

By default the indexer visits every link on a gopher site and stores the
type, link title, selector, hostname and port in the 'selectors' table. It
does this only for the types defined in HARVEST_TYPES.

The robots.txt standard file format is supported and honoured.

A bespoke sitemap file format is also supported and will be used to populate
the database if found.

There are a number of parameters which can be set at the top of the file to
change its behavior:

* DEBUG (Default 1): Display verbose status messages.
* MAX_DEPTH (Default 5): Defines the maximum number of levels of recursion.
* IGNORE_ROBOTS (Default 0): Ignores robots.txt and any directives therein.
* IGNORE_SITEMAP (Default 0): Ignores the sitemap and instead indexes the
  site by recursion.
* CRAWL_DELAY (Default 2): Default delay in seconds between requests. This
  parameter is overridden by robots.txt if found.
* HARVEST_TYPES (Default '10Igs9'): Defines the gopher types captured.
* TRAVERSE_DOMAINS (Default 0): *Best avoided*, as sitemap and robots are
  only parsed for the start domain. It is better to index each host
  individually and use alternative means of host discovery.
* REINDEX (Default 0): Removes all selectors for the host before re-indexing.

Usage:

    quarry.pl some-gopher-domain.net

The port can optionally be specified, e.g. some-gopher-domain.net:7070.

## 2. Gopher search

This provides a front end to the index generated by quarry.pl.

Features include:

* General search
* Image search
* Sound search
* Video search
* Submit site to be indexed

The current search function is basic: it tries to match the search string
against the selector or title fields and returns any records that match.
This will change once metadata is added and implemented in the search.

## 3. Wrapper

This program simply looks in the 'pending' database table for hosts
submitted to be indexed via the gopher search front end, and passes them to
quarry.pl to be indexed.

Usage:

    indexPending.pl

## 4. Sitemap generator

The sitemap generator uses data from the index generated by quarry.pl.

The reasons for the sitemap are twofold:

1. Efficiency: downloading a single index file rather than crawling.
2. The format supports additional metadata:
    * Description
    * Categories
    * Keywords

These extra metadata fields can be used to greatly enhance search results.

Example of records:

```
Type: 1
Selector: /contact.dcgi
Host: gopher.icu
Port: 70
LinkName: Contact
Description: My contact details
Categories:
Keywords:
--------
Type: 1
Selector: /gutenberg
Host: gopher.icu
Port: 70
LinkName: Gutenberg (unofficial book and audio search interface)
Description: Gopher search interface to the official Gutenberg book repository
Categories: Books
Keywords: Books
--------
```

Usage:

    generateSitemap.pl some-gopher-domain.net > sitemap.txt

## 5. Host and selector maintenance

Only basic host checking has been implemented, using 'hostcheck'.

Currently the checkHosts.pl script checks each host twice in a 24 hour
period. If the host fails two consecutive checks it is flagged as inactive
and its selectors will not display in search results. If the host has
recovered on a subsequent check, it is flagged as active again.

Hostcheck:

    git clone git://gopher.icu/hostcheck

## 6. IndexNow

IndexNow[1] is an easy way for website owners to instantly inform search
engines about content changes on their website.

It has been implemented in a basic way to allow submission of a single URI
per request:

    curl -s 'gopher://gopher.icu/7/quarry/indexnow.dcgi?url=<MY-URL>&key=<MY-KEY>'

[1](https://www.indexnow.org/)
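
A submitted URL usually contains characters such as ':' and '/', and may contain '&' or '=', which would clash with the query string of the request above. The following is a minimal Python sketch of building the submission URL with percent-encoded values. It is not part of Quarry: the helper name `build_indexnow_url` and the sample values are hypothetical, and it assumes that indexnow.dcgi decodes percent-encoded parameters, which the IndexNow protocol expects for its HTTP form but which has not been verified for this endpoint.

```python
from urllib.parse import quote

# Endpoint copied from the curl example in this section.
INDEXNOW_ENDPOINT = "gopher://gopher.icu/7/quarry/indexnow.dcgi"


def build_indexnow_url(changed_url: str, key: str) -> str:
    """Return the submission URL for a single changed URI.

    Assumption: the endpoint accepts percent-encoded parameter values.
    """
    if not changed_url or not key:
        raise ValueError("both the changed URL and the key are required")
    # safe="" also encodes ':' and '/', so nothing in the submitted URL
    # can be mistaken for the '&' and '=' separators of the query string.
    return (
        f"{INDEXNOW_ENDPOINT}"
        f"?url={quote(changed_url, safe='')}"
        f"&key={quote(key, safe='')}"
    )


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(build_indexnow_url("gopher://example.org/1/phlog", "my-secret-key"))
```

The resulting string can be passed to `curl -s` in place of the hand-written URL. If the endpoint turns out to expect raw, unencoded values, the two `quote` calls should be dropped.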