[HN Gopher] YaCy - your own search engine
___________________________________________________________________
 
YaCy - your own search engine
 
Author : modinfo
Score  : 163 points
Date   : 2022-08-25 17:47 UTC (5 hours ago)
 
web link (yacy.net)
w3m dump (yacy.net)
 
| rasulkireev wrote:
| Recently installed YaCy on my Synology via the Docker image they
| provide. Already saved about 10 GB of content interesting to me.
| Now, I have a personal Search Engine. Awesome.
 
  | BaseballPhysics wrote:
  | So what's your workflow for using it? You mentioned it's saved
  | "content interesting to me". Are you doing directed crawls
  | or...?
 
    | rasulkireev wrote:
    | Yeah, if it is just one article or a blog post I crawl at
    | depth 0, and if it is someone's personal website who I enjoy
    | reading always, no matter what they write, I do an infinite
    | crawl on that specific domain.
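
The depth-0 versus unlimited, single-domain distinction described above can be sketched roughly as follows. This is not YaCy's crawler: the `LINKS` table, the `crawl` function, and the example URLs are all hypothetical, and an in-memory link graph stands in for real HTTP fetching.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical in-memory "web": page URL -> links found on that page.
LINKS = {
    "https://blog.example/": ["https://blog.example/a", "https://other.example/x"],
    "https://blog.example/a": ["https://blog.example/b"],
    "https://blog.example/b": [],
    "https://other.example/x": [],
}

def crawl(start, max_depth=None, same_domain=True, links=LINKS):
    """Breadth-first crawl; max_depth=None means no depth limit."""
    domain = urlparse(start).netloc
    seen = {start}
    order = []
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        order.append(url)
        if max_depth is not None and depth >= max_depth:
            continue  # do not follow links past the depth limit
        for link in links.get(url, []):
            if same_domain and urlparse(link).netloc != domain:
                continue  # stay on the starting domain
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return order

# Depth 0: only the page itself is visited.
single_page = crawl("https://blog.example/a", max_depth=0)
# No depth limit, restricted to the starting domain.
whole_site = crawl("https://blog.example/")
```

With `max_depth=0` only the starting page is collected; with no limit the crawl follows every same-domain link it can reach and skips links to other hosts.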
 
  | Tijdreiziger wrote:
  | Off-topic, but how do you like Synology? I'm familiar with one
  | of their units for work, but I'm looking into a new NAS for my
  | home, and I'm trying to decide between Synology or building my
  | own and putting Nextcloud on it.
 
    | justsomehnguy wrote:
    | Greatly depends on what you are expecting from it.
    | 
    | After $300 per unit S. has only two advantages:
    | 
    | 1. Form-factor: you can build a comparable small enough unit
    | from OTC/OTS parts but usually it costs at least $200 more
    | 
    | 2. Basic functionality (i.e. file sharing, e.g. with SMB) just
    | works, with a nice webgui to configure it.
    | 
    | If you need something more...
 
      | Tijdreiziger wrote:
      | Expectations: file/photo sync, media server, ad blocking
      | (Pi-hole). I saw that Synology has first-party apps for
      | most of this (Synology Drive, Moments, Video).
 
    | rasulkireev wrote:
    | Love it, have 0 complaints! I got DS220+
 
      | chrisweekly wrote:
      | Happy w my DS-220+ too
 
    | wccrawford wrote:
    | Also not OP. I've got a Synology 918+ that I've used for
    | years, and as a file store, I'm quite pleased.
    | 
    | I've tried running apps on it, and the ones that are
    | available are decent, but I pretty quickly got to where I
    | needed to SSH in to make certain things happen, and that felt
    | weird for an appliance like this. I added Docker and ran a
    | bunch of stuff on that, and that was kind of a pain. They
    | don't make it easy to update the images and the community's
    | solution is to SSH in and install watchtower to do it.
    | 
    | I'm now just using it for network file storage and running
    | all those services on a Linux box instead.
    | 
    | I thought about just putting the drives in the Linux box, but
    | I did some network testing and the NAS was faster, and it
    | provides a lot of storage-related niceties, so I'm keeping it
    | in the mix. For instance, I recently decided to upgrade the
    | drives to faster, larger ones, and it's been pretty easy.
 
      | Tijdreiziger wrote:
      | Thanks! So are you running the first-party Synology Drive,
      | Moments, etc. for file/photo syncing, or do you run
      | something like Nextcloud on your Linux box? Or do you not
      | use software like that?
 
    | usefulcat wrote:
    | I used a small Synology NAS from 2012-2019, at which point I
    | replaced it with a small Linux box because I wanted ZFS.
    | Inability to support ZFS was really the only reason I
    | replaced it; it was still working fine.
 
      | Tijdreiziger wrote:
      | What software are you running, and how much time do you
      | spend on maintenance?
 
        | usefulcat wrote:
        | Vanilla Ubuntu 18.04 LTS. Every couple of months or so I
        | update all the packages and reboot. That's really all the
        | maintenance I've ever done on it, apart from initial
        | setup. I ought to set it up so that it can email me if a
        | zfs scrub ever detects a problem, but I haven't done that
        | yet.
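
One way the "email me if a scrub detects a problem" idea above might look, as a minimal sketch. It assumes the status text comes from `zpool status -x`, which prints "all pools are healthy" when nothing is wrong. The `build_alert` function and the example addresses are hypothetical, and both running the command (for instance from cron) and sending the message (for instance with `smtplib`) are left out.

```python
from email.message import EmailMessage

# Text `zpool status -x` prints when no pool has a problem.
HEALTHY = "all pools are healthy"

def build_alert(status_output, sender="nas@example.invalid",
                recipient="me@example.invalid"):
    """Return an EmailMessage if the status is not the healthy one-liner, else None."""
    text = status_output.strip()
    if text == HEALTHY:
        return None
    msg = EmailMessage()
    msg["Subject"] = "ZFS pool problem detected"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(text)  # include the full status report in the body
    return msg
```

Returning `None` in the healthy case keeps the caller simple: send the message only if one was built.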
 
        | Tijdreiziger wrote:
        | Thanks! That's a valuable data point for my comparison.
        | 
        | By the way, do you run software like Nextcloud, or are
        | you just using it as a storage tank?
 
    | rpdillon wrote:
    | Not OP, but I've been using a Synology NAS since 2013 and
    | it's a great product. I bought a router from them as well,
    | which is also superb. I think it's a fabulous investment.
 
| sciguy77 wrote:
| Has anyone tried LinkAce? I'd love to hear someone's thoughts on
| YaCy vs LinkAce.
| 
| This is great timing. After looking at YaCy for my Synology NAS a
| few weeks ago, I looked at some alternatives. I like the look of
| LinkAce, though it seems to be less popular and I haven't found
| much on how a setup on a Synology NAS works.
| 
| I'd love some advice, I have a massive number of bookmarks across
| dozens of folders. Something like this is exactly what I'm
| looking for.
 
  | rasulkireev wrote:
  | I did that a couple of months ago. Was planning to write
  | something up in the next month or so.
 
  | encryptluks2 wrote:
  | They serve very different purposes. While a search engine can
  | in turn archive sites, that isn't its only purpose. LinkAce is
  | designed more for bookmarking and archiving sites akin to a
  | bookmark manager, not as a search engine.
 
| AndyMcConachie wrote:
| I have about 100,000 PDFs that I want indexed and searchable.
| They're on a website and I want people to be able to visit the
| website and search through the PDFs.
| 
| Should I use YaCy or Apache Solr?
| 
| All opinions and rants welcome.
 
| dang wrote:
| Related:
| 
|  _YaCy: Decentralized Web Search_ -
| https://news.ycombinator.com/item?id=22246732 - Feb 2020 (41
| comments)
| 
|  _YaCy: a free distributed search engine_ -
| https://news.ycombinator.com/item?id=12433010 - Sept 2016 (24
| comments)
| 
|  _YaCy - Peer to Peer Search Engine_ -
| https://news.ycombinator.com/item?id=11956268 - June 2016 (3
| comments)
| 
|  _YaCy: Decentralized Web Search_ -
| https://news.ycombinator.com/item?id=8746883 - Dec 2014 (29
| comments)
| 
|  _YaCy takes on Google with open source search engine_ -
| https://news.ycombinator.com/item?id=3288586 - Nov 2011 (17
| comments)
 
| a5huynh wrote:
| Shameless self-plug, I've been building something similar that you can
| run locally as an app: https://github.com/a5huynh/spyglass
| 
| You can define some basic rules & it'll go out and crawl those
| particular sites. Or use one that someone else has built. It can
| also sync with your Chrome/Firefox bookmarks. Would love feedback
| from folks who get a chance to use it!
 
| bobajeff wrote:
| I would like to use this. However, in the past when I've tried it
| I didn't like the results. It would be nice to hear about more
| competition in the P2P information retrieval (search engine) tech
| space. YaCy seems to be the only one I've consistently heard
| about over the years.
 
| pacifika wrote:
| Use this as a personal knowledge base. Indexed my blog. Indexed a
| bookmarks export. Indexed a knowledge base. Works well. It also
| convinced me of the power-user UI.
 
  | gavmor wrote:
  | That sounds promising! How often do you export your bookmarks,
  | and in what format do you keep your knowledge base?
 
  | tecoholic wrote:
  | Self plug - If you want to skip bookmarking and go straight to
  | indexing, I have a firefox extension for it -
  | https://github.com/tecoholic/yacy-it
 
  | ThinkingGuy wrote:
  | I keep everything on my home server: photos, music, home
  | videos, movies, downloaded webpages, ebooks, instruction
  | manuals, etc., all shared out over HTTP. YaCy basically gives
  | me a centralized, private search engine for my house. Example
  | searches: "Frigidaire manual" "living room collection:Photos"
  | "London Philharmonic Orchestra collection:Music"
  | 
  | Of course, having things in an organized hierarchical file
  | system, with good metadata, helps.
 
  | pacifika wrote:
  | Export from Firefox as HTML, then point YaCy to it. My knowledge
  | base is a BookStack instance.
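
If you want to see which URLs a bookmarks export would hand to a crawler, a rough sketch using only Python's standard library could look like the following. The `BookmarkLinks` class, the `extract_urls` helper, and the `SAMPLE` text are illustrative assumptions; YaCy itself does not need this step if you point it at the exported file.

```python
from html.parser import HTMLParser

class BookmarkLinks(HTMLParser):
    """Collect http(s) link targets from a bookmarks HTML export."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            href = dict(attrs).get("href")
            if href and href.startswith(("http://", "https://")):
                self.urls.append(href)

def extract_urls(html_text):
    parser = BookmarkLinks()
    parser.feed(html_text)
    return parser.urls

# Tiny hand-written sample in the style of a browser bookmarks export.
SAMPLE = """<!DOCTYPE NETSCAPE-Bookmark-file-1>
<DL><p>
    <DT><A HREF="https://yacy.net/" ADD_DATE="1">YaCy</A>
    <DT><A HREF="place:sort=8">Most Visited</A>
</DL><p>
"""

urls = extract_urls(SAMPLE)
```

Entries that are not web links, such as the browser-internal `place:` entry in the sample, are skipped.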
 
| mtlynch wrote:
| I love the idea of this, but I tried to spin up my own instance
| and was immediately overwhelmed by the million little knobs and
| settings for it.
| 
| It seems like a lot of fun if you understand all the tuning, but
| I feel like the current state alienates most users who want to
| use it in simple scenarios.
 
  | 6510 wrote:
  | Default settings work well enough, but I agree 90% should be
  | hidden behind an advanced settings check box. (I suspect the
  | organization of features is more obvious in German.) There are
  | also lots of other cool things one can do that are not in the
  | interface but arguably should be.
  | 
  | That said, for what it is, it is pretty epic already. As a proof
  | of concept it's completely convincing.
 
  | bityard wrote:
  | There are lots of settings because it's very powerful software.
  | I don't understand the part about being overwhelmed... surely
  | the developers have chosen sane defaults for most things and
  | you can just ignore the ones you don't understand?
 
    | mtlynch wrote:
    | That wasn't my experience. YaCy didn't do what I wanted out
    | of the box, so I was just left with 100+ settings that I
    | didn't know how to adjust to get to a desired state.
 
| bityard wrote:
| It's interesting that this uses a distributed P2P index. That's a
| very good idea and one of the things that has held me back from
| even thinking about trying to build my own tech-focused search
| engine.
| 
| One thing I was hoping to see in the FAQ was how they prevent
| rogue nodes from inserting spam or other kinds of mischief into
| the public index.
 
  | viraptor wrote:
  | They don't really. You have to apply your own filtering.
 
| alxjsn wrote:
| If you haven't heard of Brave Goggles
| (https://github.com/brave/goggles-quickstart) I highly recommend
| checking it out. Just being able to create the search index is a
| massive task, so being able to apply rules server-side to their
| "expanded recall set" will give you what most people building
| search engines want, which is to control the algorithm. We
| weren't able to do that until now since applying rules client-
| side doesn't work well on a small search result set.
| 
| Related: I created a tool to create Goggles using subreddits as a
| signal source for domains:
| https://github.com/forcesunseen/narwhalizer
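
To make the idea of applying per-domain rules to a result set concrete, here is a minimal sketch. It is not the Goggles rule syntax and not how Brave applies rules server-side: the `RULES` table, the `rerank` function, and the example domains and scores are all hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical rules: domain -> score multiplier; 0 discards the result.
RULES = {"docs.example": 2.0, "spam.example": 0.0}

def rerank(results, rules=RULES):
    """results: list of (url, score). Returns URLs ordered by adjusted score."""
    adjusted = []
    for url, score in results:
        weight = rules.get(urlparse(url).netloc, 1.0)  # unlisted domains unchanged
        if weight == 0:
            continue  # rule says drop this domain entirely
        adjusted.append((score * weight, url))
    adjusted.sort(key=lambda pair: pair[0], reverse=True)
    return [url for _, url in adjusted]

ranked = rerank([
    ("https://spam.example/a", 0.9),
    ("https://news.example/b", 0.8),
    ("https://docs.example/c", 0.5),
])
```

The sketch also hints at why the size of the candidate set matters: rules like these can only reorder or drop what is already in `results`, so a short list leaves little for them to work with.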
 
  | upupandup wrote:
  | I see Brave. I close tab. I don't trust them or anybody that
  | pushes their offerings which are just crypto ponzi schemes.
 
    | hunterb123 wrote:
    | The crypto stuff is disabled by default, get a new talking
    | point.
 
      | upupandup wrote:
      | a deliberate ponzi enabling mechanism shouldn't even be
      | available
 
        | hunterb123 wrote:
        | k
 
        | 867-5309 wrote:
        | at least they put the safety on before throwing you the
        | gun
 
    | UberFly wrote:
    | It's just a different revenue model than the usual ad
    | garbage. You don't have to use it.
 
    | metalliqaz wrote:
    | I thought Brave was just a web browser with built-in adblock,
    | but after your comment I decided to look it up on wikipedia.
    | Holey moley, what a nightmare.
 
  | mimimi31 wrote:
  | Kagi (https://kagi.com) has very similar tools with their
  | "Lenses" and customizable prioritization of specific domains.
 
    | rtev wrote:
    | Kagi actually did it first, I think. Too bad everyone only
    | knows about it via Brave, Kagi is an awesome search engine
 
      | scrollaway wrote:
      | Seconding, Kagi is great. I hope they succeed...
 
    | Entinel wrote:
    | Kagi is a weird beast. I'd like to use it but I also don't
    | understand how searches are private if I have to login. Not
    | understanding that is definitely on me but I feel like it
    | should be a frequent enough question that they try to make
    | the answer obvious.
 
  | skybrian wrote:
  | Seems like you're burying the lead a bit since your "Basic
  | Usage" involves running some Docker instance for some reason
  | and you don't need to do that just to try it out?
  | 
  | It looks like Goggles are just text files hosted on GitHub or
  | GitLab and you can try them out with Brave's search engine
  | without installing anything. Some to try:
  | 
  | https://search.brave.com/goggles/discover
  | 
  | The netsec Goggle is here:
  | 
  | https://search.brave.com/goggles?goggles_id=https://github.c...
 
| 10g1k wrote:
| Copernic used to be a great way to do this. Register every search
| engine you like in the local software, apply rules, search all
| the web search engines at once. Until they went 100% corporate,
| it was awesome.
 
___________________________________________________________________
(page generated 2022-08-25 23:00 UTC)