[HN Gopher] An Introduction to Knowledge Graphs
___________________________________________________________________
 
An Introduction to Knowledge Graphs
 
Author : umangkeshri
Score  : 97 points
Date   : 2021-05-22 12:02 UTC (10 hours ago)
 
web link (ai.stanford.edu)
w3m dump (ai.stanford.edu)
 
| julienreszka wrote:
| I wish all of you not to fall into the trap of ontologies. I
| worked very hard in this domain, and my conclusion is that all
| ontologies eventually fail to scale. I would recommend people in
| the field to go towards "perspectivism".
 
  | omginternets wrote:
  | Could you (please, pretty please) elaborate?
 
  | hnxs wrote:
  | I'd like to read more about what you worked on specifically if
  | you're willing to share!
 
  | quag wrote:
  | Is this[1] an example of ontological perspectivism? Can you
  | point us at a good place to start?
  | 
  | [1]:
  | https://link.springer.com/article/10.1007/s11406-021-00371-1
 
    | mistrial9 wrote:
    | great start - I have presented this point of view myself.. no
    | clue what "perspectivism" means really, though
 
  | Veuxdo wrote:
  | Is this just a way of saying that no relations are absolute?
 
    | bluecerulean wrote:
    | He most likely means that reasoners and databases that
    | provide reasoning abilities do not scale. This makes sense,
    | especially for OWL ontologies. For most OWL reasoners, if you
    | feed them the ontology and a large set of instance data
    | (class instances connected by edges that are labeled with
    | properties defined in said ontology), it will likely take
    | way more time than you would like to produce results (if it
    | produces anything at all).
    | 
    | The reason for that is twofold:
    | 
    | 1. Many of the tools created for reasoning are research-first
    | tools. Some papers were published about the tool and it
    | really was a better and more scalable tool than anything
    | before it. But every PhD student graduates and needs to find
    | a job or move to the next hyped research area.
    | 
    | 2. Tools are designed under the assumption that the whole
    | ontology, all the instance data and all results fit in main
    | memory (RAM). This assumption is de facto necessary for more
    | powerful entailment regimes of OWL.
    | 
    | Reason 2 has a secondary sub-reason: OWL ontologies use
    | URIs (actually IRIs), which are really inefficient
    | identifiers compared to 32/64-bit integers. HDT is a format
    | that fixes this inefficiency for RDF (and thus is applicable
    | to ontologies), but by the time it came about nearly all
    | reasoners were already abandoned as per reason #1 above.
    | 
    | Newer reasoners that actually scale quite a bit are RDFox [1]
    | and VLog [2]. They use compact representations and try to be
    | nice with the CPU cache and pipeline. However, they are
    | limited to a single shared memory (even if NUMA).
    | 
    | There are a lot of mostly academic distributed reasoners
    | designed to scale horizontally instead of vertically. These
    | systems technically scale, but vertically scaling the
    | aforementioned centralized systems will be more efficient.
    | The intrinsic problem with distributing is that (i) it is
    | hard to partition the input aiming at a fair distribution of
    | work and (ii) inferred facts derived at one node often are
    | evidence that multiple other nodes need to know.
    | 
    | The problem of computing all inferred edges from a knowledge
    | graph involves a great deal of communication, since one
    | inference found by one node is evidence required by another
    | processing node.
    | 
    | [1]: https://www.oxfordsemantic.tech/product
    | [2]: https://github.com/karmaresearch/vlog/
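
The two ideas in the comment above (replacing IRIs with compact integer IDs, and computing all inferred edges in memory) can be sketched together. This is a minimal illustration only, not how RDFox, VLog, or HDT actually work: the `IriDictionary` class, the `materialize` function, the `http://example.org/` namespace, and the tiny two-rule fragment are all assumptions made for the example.

```python
class IriDictionary:
    """Maps IRIs to dense integer IDs and back (dictionary encoding)."""

    def __init__(self):
        self._ids = {}
        self._iris = []

    def encode(self, iri):
        ident = self._ids.get(iri)
        if ident is None:
            ident = len(self._iris)
            self._ids[iri] = ident
            self._iris.append(iri)
        return ident

    def decode(self, ident):
        return self._iris[ident]


def materialize(triples, sub_class_of, rdf_type):
    """Naive forward chaining to a fixpoint over integer-encoded triples.

    Rules (a tiny RDFS-like fragment, chosen for illustration):
      (x type C), (C subClassOf D)        => (x type D)
      (A subClassOf B), (B subClassOf C)  => (A subClassOf C)
    """
    facts = set(triples)
    while True:
        # Index the current subClassOf edges: class -> direct/inferred supers.
        supers = {}
        for s, p, o in facts:
            if p == sub_class_of:
                supers.setdefault(s, set()).add(o)
        # Apply both rules once to every fact.
        new = set()
        for s, p, o in facts:
            if p in (rdf_type, sub_class_of):
                for d in supers.get(o, ()):
                    new.add((s, p, d))
        if new <= facts:  # nothing new was derived: fixpoint reached
            return facts
        facts |= new


EX = "http://example.org/"  # hypothetical namespace
d = IriDictionary()
RDF_TYPE = d.encode("http://www.w3.org/1999/02/22-rdf-syntax-ns#type")
SUB_CLASS_OF = d.encode("http://www.w3.org/2000/01/rdf-schema#subClassOf")

raw = [
    (EX + "rex", d.decode(RDF_TYPE), EX + "Dog"),
    (EX + "Dog", d.decode(SUB_CLASS_OF), EX + "Mammal"),
    (EX + "Mammal", d.decode(SUB_CLASS_OF), EX + "Animal"),
]
encoded = {(d.encode(s), d.encode(p), d.encode(o)) for s, p, o in raw}
closure = materialize(encoded, SUB_CLASS_OF, RDF_TYPE)
inferred = closure - encoded
```

Every pass needs the full set of facts derived so far, which is one way to see why partitioning this work across machines means shipping inferred facts between nodes.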
 
| JoelJacobson wrote:
| SQL might be a good fit to model Knowledge Graphs, since FOREIGN
| KEYs can be named, using the CONSTRAINT constraint_name FOREIGN
| KEY ... syntax. We thus have support to label edges.
| 
| Nodes = Tables
| 
| Edges = Foreign keys
| 
| Edge labels = Foreign key constraint names
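
A rough sketch of this mapping, using Python's built-in `sqlite3` module. The `company` and `person` tables, the `works_for` constraint name, and the `schema_edges` helper are hypothetical. The sketch recovers constraint names by pattern-matching the stored DDL text, which is an assumption of this example and only handles single-column foreign keys; other database systems typically expose constraint names through their catalogs instead.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE person (
    id INTEGER PRIMARY KEY,
    name TEXT,
    employer_id INTEGER,
    CONSTRAINT works_for FOREIGN KEY (employer_id) REFERENCES company (id)
);
""")

# Matches "CONSTRAINT <label> FOREIGN KEY (<column>) REFERENCES <table>".
FK_PATTERN = re.compile(
    r"CONSTRAINT\s+(\w+)\s+FOREIGN\s+KEY\s*\(\s*(\w+)\s*\)\s*REFERENCES\s+(\w+)",
    re.IGNORECASE,
)


def schema_edges(connection):
    """Return (source_table, edge_label, target_table) for each named FK."""
    edges = []
    rows = connection.execute(
        "SELECT name, sql FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    for table, ddl in rows:
        for label, _column, target in FK_PATTERN.findall(ddl):
            edges.append((table, label, target))
    return edges


edges = schema_edges(conn)
```

Here the tables play the role of nodes and `works_for` is the label on the edge from `person` to `company`.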
 
  | FigmentEngine wrote:
  | yes, you can always map most structures into tables, or even
  | excel.
  | 
  | but I think "good fit" is a stretch. when designing systems
  | you generally want to look at data access patterns, and pick a
  | data exec approach that aligns to that.
  | 
  | in tech, unfortunately, RDBMS are the "hammer" in "if your only
  | tool is a hammer then every problem looks like a nail."
 
  | lmeyerov wrote:
  | This kind of approach is pretty common, including in compute
  | engines like Spark's GraphX. I suspect a lot of teams using
  | graph DBs would be better off realizing this: it's good for
  | simple and small problems.
  | 
  | it does fall down for graphy tasks like multihop joins, connect
  | the dots, and supernodes. So for GB/TBs of that, either you
  | should do those outside the DB, or with an optimized DB.
  | Likewise, not explicitly discussed in the article, modern
  | knowledge graphs are often really about embedding vectors, not
  | entity UUIDs, and few/no databases straddle relational queries,
  | graph queries, and vector queries
 
    | zozbot234 wrote:
    | > it does fall down for graphy tasks like multihop joins,
    | connect the dots, and supernodes.
    | 
    | These can always be accomplished via recursive SQL queries.
    | Of course any given implementation might be unoptimized for
    | such tasks. But in practice, this kind of network analytics
    | tends to be quite rare anyway.
    | 
    | One should note that even inference tasks, that are often
    | thought of as exclusive to the "semantic" or "knowledge"
    | based paradigm, can be expressed very simply via SQL VIEWs.
    | Of course this kind of inference often turns out to be
    | infeasible in practice, or to introduce unwanted noise in the
    | 'inferred' data, but this has nothing to do with SQL per se
    | and is just as true of the "knowledge base" or "semantic"
    | approach.
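
Both points can be sketched with Python's built-in `sqlite3` module: a recursive common table expression for a multihop query, and a view that expresses a simple inference rule. The `edge` table, the `parent_of` and `grandparent_of` labels, and the sample rows are assumptions made for illustration, and nothing here says anything about performance at scale.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE edge (src TEXT, label TEXT, dst TEXT);
INSERT INTO edge VALUES
    ('alice', 'parent_of', 'bob'),
    ('bob',   'parent_of', 'carol'),
    ('carol', 'parent_of', 'dave');

-- Inference as a view: two parent_of hops imply grandparent_of.
CREATE VIEW grandparent_of AS
SELECT e1.src AS src, e2.dst AS dst
FROM edge AS e1
JOIN edge AS e2 ON e1.dst = e2.src
WHERE e1.label = 'parent_of' AND e2.label = 'parent_of';
""")


def descendants(connection, start):
    """Follow parent_of edges over any number of hops from `start`."""
    rows = connection.execute(
        """
        WITH RECURSIVE reach(node) AS (
            SELECT dst FROM edge WHERE src = ? AND label = 'parent_of'
            UNION  -- UNION (not UNION ALL) drops repeats, so cycles terminate
            SELECT edge.dst
            FROM reach JOIN edge ON edge.src = reach.node
            WHERE edge.label = 'parent_of'
        )
        SELECT node FROM reach ORDER BY node
        """,
        (start,),
    )
    return [node for (node,) in rows]


alice_descendants = descendants(conn, "alice")
grandparents = conn.execute(
    "SELECT src, dst FROM grandparent_of ORDER BY src"
).fetchall()
```

The view is recomputed on every query, so the inferred edges are never stored; whether that is acceptable depends on the workload.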
 
  | er4hn wrote:
  | Graph databases likely are more optimized for this sort of data
  | storage, but you've hit it on the head that SQL databases can
  | be used to represent node/edge style data.
 
| zozbot234 wrote:
| The definition seems faulty to me, since the pair (E: subset(N x
| N), f: E -> L) does not admit multiple edges with different
| labels connecting the same ordered pair of nodes. Of course this
| is most often allowed in practical KGs.
 
  | mmarx wrote:
  | Indeed multiple edges (with different labels) are quite useful,
  | particularly when you want to represent RDF graphs. But since
  | there is no restriction on the form of L, you can still
  | represent those by, e.g., letting L be a set of sets of IRIs,
  | and thus labelling your edges with sets of IRIs, which you then
  | interpret as a set of RDF triples (i.e., as a set of edges).
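
A small sketch of this encoding, assuming L is the set of all sets of labels. The node names, the `ex:` labels, and the helper names `to_triples` and `from_triples` are hypothetical; the point is only that a single labelled edge per ordered node pair can still carry several RDF-style triples.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Node = str
Edge = Tuple[Node, Node]
Triple = Tuple[Node, str, Node]


def to_triples(f: Dict[Edge, FrozenSet[str]]) -> Set[Triple]:
    """Interpret each set-valued edge label as a set of triples."""
    return {(s, p, o) for (s, o), labels in f.items() for p in labels}


def from_triples(triples: Iterable[Triple]) -> Dict[Edge, FrozenSet[str]]:
    """Group triples by ordered node pair to recover the labelling f: E -> L."""
    grouped: Dict[Edge, Set[str]] = {}
    for s, p, o in triples:
        grouped.setdefault((s, o), set()).add(p)
    return {edge: frozenset(labels) for edge, labels in grouped.items()}


# One edge (alice, bob) whose label is a set of two properties.
f = {
    ("alice", "bob"): frozenset({"ex:knows", "ex:worksWith"}),
    ("bob", "carol"): frozenset({"ex:knows"}),
}
triples = to_triples(f)
round_trip = from_triples(triples)
```

The round trip shows the two views carry the same information: E is the set of keys of `f`, and the multigraph is recovered by expanding each label set.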
 
| low_tech_love wrote:
| On a side note, I love the idea of researchers writing "articles"
| in this format. No paywall, no complex two-column format, no
| PDFs. As a researcher myself, I wish this is what my
| "productivity" was judged upon, I'd probably have a lot more fun
| and motivation to work and produce!
 
| wrnr wrote:
| KGs are cool, but I haven't found a practical framework for
| combining simple logical predicates with temporal facts (things
| that are true at a certain moment in time) and information
| provenance (the truthiness of information given the origin).
| There might be ways to encode this information in a hypergraph
| but they are far from practical.
 
  | physicsyogi wrote:
  | Check out Datomic. It's a temporal database that uses Datalog as
  | its query language. There's also DataScript, which does the
  | same thing.
 
    | superlopuh wrote:
    | Unfortunate name for a product, I can't find anything called
    | Dynamic on DDG, only dynamic things with a lowercase d. Do
    | you have a link to the project?
 
      | bosie wrote:
      | not dynamic but Datomic
 
        | superlopuh wrote:
        | Dyslexic moment on my part, thank you
 
        | omginternets wrote:
        | Lysdexia makes fools of us all ;)
 
  | mmarx wrote:
  | Wikidata statements (which roughly correspond to the edges in
  | the Knowledge Graph) have quite a bit of metadata associated
  | with them: they can refer to sources that state this
  | particular bit of knowledge, they have a so-called rank that
  | allows distinguishing preferred and deprecated statements, and
  | they can be qualified by another statement in the graph.
  | Temporal validity is encoded using a combination of rank and
  | qualifiers, as for, e.g., Pluto[0], where the instance-of
  | statement saying that "Pluto is a planet" is deprecated and has
  | an "end time" qualifier, and the preferred statement says
  | "Pluto is a dwarf planet," with a corresponding "start time"
  | qualifier.
  | 
  | In principle, all of this information is available through the
  | SPARQL endpoint or as an RDF export (there is also the
  | simplified export that contains only "simple" statements
  | lacking all of that metadata), so reasoning over this data is
  | not entirely out of reach, but the sheer size (the full RDF
  | dump is a few hundred GBs) is also not particularly practical
  | to deal with.
  | 
  | [0] https://www.wikidata.org/wiki/Q339#P31
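
The statement model described above can be approximated with a small data structure. This is a sketch under assumptions, not Wikidata's actual data model or API: the `Statement` class, the `best_statements` helper, and the qualifier values (the year is illustrative, not copied from Wikidata) are hypothetical. The selection rule used here, prefer "preferred" statements, fall back to "normal" ones, and never return "deprecated" ones, is meant to roughly mirror how ranks are commonly used.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Lower number = better rank; deprecated statements are never selected.
RANK_ORDER = {"preferred": 0, "normal": 1, "deprecated": 2}


@dataclass
class Statement:
    subject: str
    prop: str
    value: str
    rank: str = "normal"
    qualifiers: Dict[str, str] = field(default_factory=dict)
    references: List[str] = field(default_factory=list)


def best_statements(statements, subject, prop):
    """Return the best-ranked, non-deprecated statements for subject/prop."""
    candidates = [
        s for s in statements
        if s.subject == subject and s.prop == prop and s.rank != "deprecated"
    ]
    if not candidates:
        return []
    best = min(RANK_ORDER[s.rank] for s in candidates)
    return [s for s in candidates if RANK_ORDER[s.rank] == best]


statements = [
    Statement(
        "Pluto", "instance of", "planet",
        rank="deprecated",
        qualifiers={"end time": "2006"},    # illustrative value
    ),
    Statement(
        "Pluto", "instance of", "dwarf planet",
        rank="preferred",
        qualifiers={"start time": "2006"},  # illustrative value
    ),
]

current = best_statements(statements, "Pluto", "instance of")
current_values = [s.value for s in current]
```

The deprecated statement stays in the data with its qualifier, so the history is still queryable even though the default lookup skips it.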
 
  | gballan wrote:
  | Might be worth looking at Sowa's Conceptual Graphs. E.g., [1],
  | talks about time, and links to his book.
  | 
  | [1] http://www.jfsowa.com/ontology/process.htm
 
  | hocuspocus wrote:
  | Check out Nexus, which was designed with versioning in mind
  | and solves this kind of challenge at the Blue Brain Project:
  | 
  | https://bluebrainnexus.io/
 
| physicsgraph wrote:
| Knowledge graphs for text (the focus of the article) seem
| narrowly-scoped since they require "objective" facts and
| relations to be practical. Capturing the subjective and transient
| perspective of observations made by multiple observers (which is
| what we actually have access to) is more complicated.
| 
| For example, asking the same person the same question may yield
| different answers based on their mood or other environmental or
| situational factors. Who's asking the question can also matter,
| as does the specific phrasing of the question.
 
___________________________________________________________________
(page generated 2021-05-22 23:00 UTC)