
	[HN Gopher] An Introduction to Knowledge Graphs
	___________________________________________________________________
	An Introduction to Knowledge Graphs
	Author : umangkeshri
	Score  : 97 points
	Date   : 2021-05-22 12:02 UTC (10 hours ago)
	web link (ai.stanford.edu)
	w3m dump (ai.stanford.edu)
| julienreszka wrote:
| I wish all of you not to fall into the trap of ontologies. I worked
| very hard in this domain; my conclusion is that all ontologies
| fail to scale eventually. I would recommend people in the field
| to go towards "perspectivism".
| omginternets wrote:
| Could you (please, pretty please) elaborate?
| hnxs wrote:
| I'd like to read more about what you worked on specifically if
| you're willing to share!
| quag wrote:
| Is this[1] an example of ontological perspectivism? Can you
| point us at a good place to start?
|
| [1]:
| https://link.springer.com/article/10.1007/s11406-021-00371-1
| mistrial9 wrote:
| great start - I have presented this point of view myself.. no
| clue what "perspectivism" means really, though
| Veuxdo wrote:
| Is this just a way of saying that no relations are absolute?
| bluecerulean wrote:
| He most likely means that reasoners and databases that
| provide reasoning abilities do not scale. This makes sense,
| especially for OWL ontologies. For most OWL reasoners, if you
| feed them the ontology and a large set of instance
| data (class instances connected by edges that are labeled
| with properties defined in said ontology), they will likely
| take way more time than you would like to produce results (if
| they produce anything).
|
| The reason for that is twofold:
|
| 1. Many of the tools created for reasoning are research-first
| tools. Some papers were published about the tool and it
| really was a better and more scalable tool than anything
| before it. But every PhD student graduates and needs to find
| a job or move to the next hyped research area.
| 2. Tools are
| designed under the assumption that the whole ontology, all
| the instance data and all results fit in main memory (RAM).
| This assumption is de facto necessary for more powerful
| entailment regimes of OWL.
|
| Reason 2 has a secondary sub-reason: OWL ontologies use
| URIs (actually IRIs), which are really inefficient
| identifiers compared to 32/64-bit integers. HDT is a format
| that fixes this inefficiency for RDF (and thus is applicable
| to ontologies), but by the time it came about, nearly all
| reasoners were already abandoned, as per reason #1 above.
|
| Newer reasoners that actually scale quite a bit are RDFox [1]
| and VLog [2]. They use compact representations and try to be
| nice with the CPU cache and pipeline. However, they are
| limited to a single shared memory (even if NUMA).
|
| There are a lot of mostly academic distributed reasoners
| designed to scale horizontally instead of vertically. These
| systems technically scale, but vertically scaling the
| aforementioned centralized systems will be more efficient.
| The intrinsic problem with distributing is that (i) it is
| hard to partition the input aiming at a fair distribution of
| work and (ii) inferred facts derived at one node often are
| evidence that multiple other nodes need to know.
|
| loose from modern single-node However, the problem of
| computing all inferred edges from a knowledge graph involves
| a great deal of communication, since one inference found by
| one node is evidence required by another processing node.
|
| [1]: https://www.oxfordsemantic.tech/product
| [2]: https://github.com/karmaresearch/vlog/
| JoelJacobson wrote:
| SQL might be a good fit to model Knowledge Graphs, since FOREIGN
| KEYs can be named, using the CONSTRAINT constraint_name FOREIGN
| KEY ... syntax. We thus have support to label edges.
|
| Nodes = Tables
|
| Edges = Foreign keys
|
| Edge labels = Foreign key constraint names
| FigmentEngine wrote:
| yes, you can always map most structures into tables, or even
| excel.
|
| but I think "good fit" is a stretch.
| when designing systems
| you generally want to look at data access patterns, and pick a
| data exec approach that aligns to that.
|
| in tech, unfortunately, RDBMS are the "hammer" in "if your only
| tool is a hammer then every problem looks like a nail."
| lmeyerov wrote:
| This kind of approach is pretty common, including in compute
| engines like Spark's GraphX. I suspect a lot of teams using
| graph DBs would be better off realizing this: it's good for
| simple and small problems
|
| it does fall down for graphy tasks like multihop joins, connect
| the dots, and supernodes. So for GB/TBs of that, either you
| should do those outside the DB, or with an optimized DB.
| Likewise, not explicitly discussed in the article, modern
| knowledge graphs are often really about embedding vectors, not
| entity UUIDs, and few/no databases straddle relational queries,
| graph queries, and vector queries
| zozbot234 wrote:
| > it does fall down for graphy tasks like multihop joins,
| connect the dots, and supernodes.
|
| These can always be accomplished via recursive SQL queries.
| Of course any given implementation might be unoptimized for
| such tasks. But in practice, this kind of network analytics
| tends to be quite rare anyway.
|
| One should note that even inference tasks, which are often
| thought of as exclusive to the "semantic" or "knowledge"
| based paradigm, can be expressed very simply via SQL VIEWs.
| Of course this kind of inference often turns out to be
| infeasible in practice, or to introduce unwanted noise in the
| 'inferred' data, but this has nothing to do with SQL per se
| and is just as true of the "knowledge base" or "semantic"
| approach.
| er4hn wrote:
| Graph databases likely are more optimized for this sort of data
| storage, but you've hit it on the head that SQL databases can
| be used to represent node/edge style data.
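A minimal sketch of the "recursive SQL queries" point made above, using Python's
standard sqlite3 module. The edges(src, label, dst) table, the reachable() helper,
and the sample rows are all hypothetical, chosen only for illustration; none of
them come from the thread or the article. It shows a multihop traversal written
as a recursive common table expression, not how any particular graph database
or reasoner does it.

```python
import sqlite3


def reachable(conn, start):
    """Return the set of nodes reachable from `start` via one or more edges."""
    rows = conn.execute(
        """
        WITH RECURSIVE reach(node) AS (
            -- base case: direct neighbours of the start node
            SELECT dst FROM edges WHERE src = ?
            UNION
            -- recursive case: follow one more edge from any node found so far
            SELECT e.dst FROM edges AS e JOIN reach AS r ON e.src = r.node
        )
        SELECT node FROM reach
        """,
        (start,),
    ).fetchall()
    return {node for (node,) in rows}


# Hypothetical schema: one row per labelled edge (assumption, not from the thread).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (src TEXT, label TEXT, dst TEXT)")
conn.executemany(
    "INSERT INTO edges VALUES (?, ?, ?)",
    [
        ("alice", "knows", "bob"),
        ("alice", "works_with", "bob"),  # second label on the same ordered pair
        ("bob", "knows", "carol"),
        ("carol", "knows", "alice"),     # cycle; UNION deduplicates, so recursion stops
        ("dave", "knows", "alice"),
    ],
)

print(sorted(reachable(conn, "alice")))  # → ['alice', 'bob', 'carol']
```

UNION (rather than UNION ALL) is what keeps the query from looping forever on the
cycle. Whether this performs acceptably on large graphs or supernodes is exactly
the point the commenters disagree on, and the sketch does not settle it.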
| zozbot234 wrote:
| The definition seems faulty to me, since the pair (E: subset(N x
| N), f: E -> L) does not admit of multiple edges with different
| labels, connecting the same ordered pair of nodes. Of course this
| is most often allowed in practical KGs.
| mmarx wrote:
| Indeed multiple edges (with different labels) are quite useful,
| particularly when you want to represent RDF graphs. But since
| there is no restriction on the form of L, you can still
| represent those by, e.g., letting L be a set of sets of IRIs,
| and thus labelling your edges with sets of IRIs, which you then
| interpret as a set of RDF triples (i.e., as a set of edges).
| low_tech_love wrote:
| On a side note, I love the idea of researchers writing "articles"
| in this format. No paywall, no complex two-column format, no
| PDFs. As a researcher myself, I wish this was what my
| "productivity" was judged upon, I'd probably have a lot more fun
| and motivation to work and produce!
| wrnr wrote:
| KGs are cool, but I haven't found a practical framework for
| combining simple logical predicates with temporal facts (things
| that are true at a certain moment in time) and information
| provenance (the truthiness of information given the origin).
| There might be ways to encode this information in a hypergraph
| but they are far from practical.
| physicsyogi wrote:
| Check out Datomic. It's a temporal database that uses Datalog as
| its query language. There's also DataScript, which does the
| same thing.
| superlopuh wrote:
| Unfortunate name for a product, I can't find anything called
| Dynamic on DDG, only dynamic things with a lowercase d. Do
| you have a link to the project?
| bosie wrote:
| not dynamic but Datomic
| superlopuh wrote:
| Dyslexic moment on my part, thank you
| omginternets wrote:
| Lysdexia makes fools of us all ;)
| mmarx wrote:
| Wikidata statements (which roughly correspond to the edges in
| the Knowledge Graph) have quite a bit of metadata associated
| with them: they can refer to sources that state this
| particular bit of knowledge, they have a so-called rank that
| allows distinguishing preferred and deprecated statements, and
| they can be qualified by another statement in the graph.
| Temporal validity is encoded using a combination of rank and
| qualifiers, as for, e.g., Pluto[0], where the instance-of
| statement saying that "Pluto is a planet" is deprecated and has
| an "end time" qualifier, and the preferred statement says
| "Pluto is a dwarf planet," with a corresponding "start time"
| qualifier.
|
| In principle, all of this information is available through the
| SPARQL endpoint or as an RDF export (there is also the
| simplified export that contains only "simple" statements
| lacking all of that metadata), so reasoning over this data is
| not entirely out of reach, but the sheer size (the full RDF
| dump is a few hundred GBs) also makes it not particularly
| practical to deal with.
|
| [0] https://www.wikidata.org/wiki/Q339#P31
| gballan wrote:
| Might be worth looking at Sowa's Conceptual Graphs. E.g., [1]
| talks about time, and links to his book.
|
| [1] http://www.jfsowa.com/ontology/process.htm
| hocuspocus wrote:
| Check out Nexus, which was designed with versioning in mind
| and solves this kind of challenge at the Blue Brain Project:
|
| https://bluebrainnexus.io/
| physicsgraph wrote:
| Knowledge graphs for text (the focus of the article) seem
| narrowly-scoped since they require "objective" facts and
| relations to be practical.
| Capturing the subjective and transient
| perspective of observations made by multiple observers (which is
| what we actually have access to) is more complicated.
|
| For example, asking the same person the same question may yield
| different answers based on their mood or other environmental or
| situational factors. Who's asking the question can also matter,
| as does the specific phrasing of the question.
___________________________________________________________________
(page generated 2021-05-22 23:00 UTC)