|
| julienreszka wrote:
| I hope none of you fall into the trap of ontologies. I worked
| very hard in this domain, and my conclusion is that all ontologies
| eventually fail to scale. I would recommend that people in the field
| move towards "perspectivism".
| omginternets wrote:
| Could you (please, pretty please) elaborate?
| hnxs wrote:
| I'd like to read more about what you worked on specifically if
| you're willing to share!
| quag wrote:
| Is this[1] an example of ontological perspectivism? Can you
| point us at a good place to start?
|
| [1]:
| https://link.springer.com/article/10.1007/s11406-021-00371-1
| mistrial9 wrote:
| great start; I have presented this point of view myself. No
| clue what "perspectivism" really means, though.
| Veuxdo wrote:
| Is this just a way of saying that no relations are absolute?
| bluecerulean wrote:
| He most likely means that reasoners and databases that
| provide reasoning abilities do not scale. This makes sense,
| especially for OWL ontologies. For most OWL reasoners, if you
| feed them the ontology and a large set of instance
| data (class instances connected by edges that are labeled
| with properties defined in said ontology), they will likely
| take far more time than you would like to produce results (if
| they produce anything at all).
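| To make "produce results" concrete: a reasoner combines the
| ontology with the instance data and derives new edges until
| nothing new follows. The sketch below is a hypothetical toy,
| assuming just two RDFS-style rules (far weaker than OWL) and
| naive forward chaining that rescans every fact each round;
| production reasoners use much more refined strategies (e.g.
| semi-naive evaluation and indexes), but the fixpoint idea is
| the same.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def materialize(triples):
    """Naive forward chaining to a fixpoint with two RDFS-style rules:
    (a subClassOf b), (b subClassOf c) => (a subClassOf c)
    (x type a), (a subClassOf b)       => (x type b)
    """
    facts = set(triples)
    while True:
        sub = [(s, o) for s, p, o in facts if p == SUBCLASS]
        new = set()
        # Rule 1: transitivity of subClassOf (quadratic scan, on purpose naive).
        for a, b in sub:
            for b2, c in sub:
                if b == b2:
                    new.add((a, SUBCLASS, c))
        # Rule 2: instances inherit the superclasses of their class.
        for x, p, a in facts:
            if p == TYPE:
                for a2, b in sub:
                    if a == a2:
                        new.add((x, TYPE, b))
        new -= facts
        if not new:  # fixpoint: nothing left to infer
            return facts
        facts |= new


data = {
    (":Dog", SUBCLASS, ":Mammal"),     # ontology
    (":Mammal", SUBCLASS, ":Animal"),  # ontology
    (":rex", TYPE, ":Dog"),            # instance data
}
inferred = materialize(data) - data
print(sorted(inferred))
```

| Every round touches the whole fact set, which is why keeping
| everything in RAM matters so much once instance data grows.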
|
| The reason for that is twofold:
|
| 1. Many of the tools created for reasoning are research-first
| tools. Some papers were published about the tool, and it
| really was a better and more scalable tool than anything
| before it. But every PhD student graduates and needs to find
| a job or move on to the next hyped research area, and the
| tool is left behind.
| 2. Tools are designed under the assumption that the whole
| ontology, all the instance data and all results fit in main
| memory (RAM). This assumption is de facto necessary for the
| more powerful entailment regimes of OWL.
|
| Reason 2 has a secondary sub-reason: OWL ontologies use
| URIs (actually IRIs), which are really inefficient
| identifiers compared to 32/64-bit integers. HDT is a format
| that fixes this inefficiency for RDF (and thus is applicable
| to ontologies), but by the time it came about nearly all
| reasoners were already abandoned as per reason #1 above.
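| The core idea behind replacing IRIs with integers is a
| dictionary: each distinct term is stored once and triples
| become tuples of small IDs. This is a minimal sketch of that
| idea only; HDT's actual dictionary and triple layout are
| considerably more elaborate, and the example IRIs are
| hypothetical.

```python
def encode(triples):
    """Dictionary-encode triples: each distinct IRI gets a dense integer ID."""
    term_to_id = {}
    id_to_term = []

    def lookup(term):
        # Assign the next free ID the first time a term is seen.
        if term not in term_to_id:
            term_to_id[term] = len(id_to_term)
            id_to_term.append(term)
        return term_to_id[term]

    encoded = [(lookup(s), lookup(p), lookup(o)) for s, p, o in triples]
    return encoded, id_to_term


def decode(encoded, id_to_term):
    """Map integer triples back to their IRIs."""
    return [tuple(id_to_term[i] for i in triple) for triple in encoded]


triples = [
    ("http://example.org/alice", "http://xmlns.com/foaf/0.1/knows", "http://example.org/bob"),
    ("http://example.org/bob", "http://xmlns.com/foaf/0.1/knows", "http://example.org/alice"),
]
ids, dictionary = encode(triples)
print(ids)  # [(0, 1, 2), (2, 1, 0)]
```

| Joins and comparisons then operate on fixed-width integers
| instead of long strings, which is also what makes the data
| friendlier to CPU caches.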
|
| Newer reasoners that actually scale quite a bit are RDFox [1]
| and VLog [2]. They use compact representations and try to be
| friendly to the CPU cache and pipeline. However, they are
| limited to a single shared-memory machine (even if NUMA).
|
| There are a lot of mostly academic distributed reasoners
| designed to scale horizontally instead of vertically. These
| systems technically scale, but vertically scaling the
| centralized systems mentioned above will be more efficient.
| The intrinsic problems with distributing are that (i) it is
| hard to partition the input so that work is distributed
| fairly, and (ii) facts inferred at one node are often
| evidence that multiple other nodes need to know.
|
| This is why distributed reasoners tend to lose to modern
| single-node systems: computing all inferred edges from a
| knowledge graph involves a great deal of communication, since
| an inference found by one node is evidence required by another
| processing node.
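| A rough way to see the communication problem: partition edges
| across nodes and check, for a single application of a
| transitive rule, whether the two edges being joined live on
| the same node. This is a hypothetical simulation, assuming
| hash partitioning by subject; a real distributed reasoner
| iterates to a fixpoint, and each newly inferred edge may have
| to be shipped again in the next round.

```python
import zlib


def node_of(term: str, num_nodes: int) -> int:
    """Assign a term to a node by hashing it (crc32 is stable across runs,
    unlike Python's built-in str hash)."""
    return zlib.crc32(term.encode("utf-8")) % num_nodes


def count_join_locality(edges, num_nodes):
    """One round of the transitive rule (a, b), (b, c) => (a, c), with edges
    partitioned by subject. Returns (local, remote) counts of joined pairs,
    where 'remote' means the two input edges sit on different nodes."""
    by_subject = {}
    for s, o in edges:
        by_subject.setdefault(s, []).append(o)
    local = remote = 0
    for a, b in edges:
        # (a, b) lives on node_of(a); every (b, c) lives on node_of(b).
        matches = len(by_subject.get(b, []))
        if node_of(a, num_nodes) == node_of(b, num_nodes):
            local += matches
        else:
            remote += matches
    return local, remote


chain = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
print(count_join_locality(chain, 1))  # (3, 0): a single node never communicates
print(count_join_locality(chain, 4))  # how many of the 3 joins cross nodes depends on the hash
```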
|
| [1]: https://www.oxfordsemantic.tech/product
| [2]: https://github.com/karmaresearch/vlog/
| JoelJacobson wrote:
| SQL might be a good fit for modelling Knowledge Graphs, since
| FOREIGN KEYs can be named using the CONSTRAINT constraint_name
| FOREIGN KEY ... syntax, which gives us a way to label edges.
|
| Nodes = Tables
|
| Edges = Foreign keys
|
| Edge labels = Foreign key constraint names
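| A minimal sketch of that mapping using Python's built-in
| sqlite3 module; the person/company schema and the works_for
| name are hypothetical. One caveat: SQLite accepts a constraint
| name but does not report it through PRAGMA foreign_key_list,
| so here it is only recoverable from the stored DDL (PostgreSQL,
| for comparison, exposes constraint names in
| information_schema.table_constraints).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
conn.executescript("""
CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE person (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    employer_id INTEGER,
    -- The constraint name plays the role of the edge label.
    CONSTRAINT works_for FOREIGN KEY (employer_id) REFERENCES company(id)
);
INSERT INTO company VALUES (1, 'Acme');
INSERT INTO person  VALUES (1, 'Ada', 1);
""")

# Following a works_for edge is an ordinary join.
rows = conn.execute(
    "SELECT p.name, c.name FROM person AS p JOIN company AS c ON p.employer_id = c.id"
).fetchall()
print(rows)  # [('Ada', 'Acme')]

# The label survives only in the stored CREATE TABLE text.
ddl = conn.execute("SELECT sql FROM sqlite_master WHERE name = 'person'").fetchone()[0]
print("works_for" in ddl)  # True
```

| Each edge type needs its own FK column (or a join table), so
| adding a new kind of edge means a schema change, which is part
| of why the replies below question the fit.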
| FigmentEngine wrote:
| yes, you can always map most structures into tables, or even
| excel.
|
| but I think "good fit" is a stretch. when designing systems
| you generally want to look at data access patterns, and pick a
| data execution approach that aligns with them.
|
| in tech, unfortunately, RDBMS are the "hammer" in "if your only
| tool is a hammer then every problem looks like a nail."
| lmeyerov wrote:
| This kind of approach is pretty common, including in compute
| engines like Spark's graphx. I suspect a lot of teams using
| graph DBs would be better off realizing this: it's good for
| simple and small problems.
|
| it does fall down for graphy tasks like multihop joins, connect
| the dots, and supernodes. So for GB/TBs of that, either you
| should do those outside the DB, or with an optimized DB.
| Likewise, not explicitly discussed in the article, modern
| knowledge graphs are often really about embedding vectors, not
| entity UUIDs, and few/no databases straddle relational queries,
| graph queries, and vector queries.
| zozbot234 wrote:
| > it does fall down for graphy tasks like multihop joins,
| connect the dots, and supernodes.
|
| These can always be accomplished via recursive SQL queries.
| Of course any given implementation might be unoptimized for
| such tasks. But in practice, this kind of network analytics
| tends to be quite rare anyway.
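| As an illustration of the recursive-SQL point, here is a
| multihop reachability query as a recursive CTE in SQLite, run
| from Python. The edge table and its contents are hypothetical.
| It shows that the query is expressible; it says nothing about
| how well a given engine handles supernodes or very deep
| traversals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edge (src TEXT, label TEXT, dst TEXT)")
conn.executemany(
    "INSERT INTO edge VALUES (?, ?, ?)",
    [
        ("a", "knows", "b"),
        ("b", "knows", "c"),
        ("c", "knows", "d"),
        ("d", "knows", "a"),  # a cycle back to the start
        ("x", "knows", "y"),  # unreachable from 'a'
    ],
)

# All nodes reachable from :start over 'knows' edges, any number of hops.
# UNION (rather than UNION ALL) drops rows already produced, so the cycle terminates.
MULTIHOP = """
WITH RECURSIVE reachable(node) AS (
    SELECT :start
    UNION
    SELECT e.dst
    FROM edge AS e JOIN reachable AS r ON e.src = r.node
    WHERE e.label = 'knows'
)
SELECT node FROM reachable WHERE node <> :start ORDER BY node
"""
reachable = [row[0] for row in conn.execute(MULTIHOP, {"start": "a"})]
print(reachable)  # ['b', 'c', 'd']
```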
|
| One should note that even inference tasks, which are often
| thought of as exclusive to the "semantic" or "knowledge"
| based paradigm, can be expressed very simply via SQL VIEWs.
| Of course this kind of inference often turns out to be
| infeasible in practice, or to introduce unwanted noise in the
| 'inferred' data, but this has nothing to do with SQL per se
| and is just as true of the "knowledge base" or "semantic"
| approach.
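| For example, a simple rule such as "a parent of a parent is a
| grandparent" can be written as a view. This is a minimal
| sketch in SQLite with hypothetical table and view names; the
| view is evaluated at query time, so inferences track the base
| facts without being stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE parent_of (parent TEXT, child TEXT);
INSERT INTO parent_of VALUES ('alice', 'bob'), ('bob', 'carol');

-- Rule: parent_of(x, y) AND parent_of(y, z) => grandparent_of(x, z)
CREATE VIEW grandparent_of AS
SELECT p1.parent AS grandparent, p2.child AS grandchild
FROM parent_of AS p1 JOIN parent_of AS p2 ON p1.child = p2.parent;
""")

print(conn.execute("SELECT * FROM grandparent_of").fetchall())  # [('alice', 'carol')]

# New base facts yield new inferences with no extra bookkeeping.
conn.execute("INSERT INTO parent_of VALUES ('carol', 'dan')")
inferred = sorted(conn.execute("SELECT * FROM grandparent_of").fetchall())
print(inferred)  # [('alice', 'carol'), ('bob', 'dan')]
```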
| er4hn wrote:
| Graph databases likely are more optimized for this sort of data
| storage, but you've hit it on the head that SQL databases can
| be used to represent node/edge style data.
| zozbot234 wrote:
| The definition seems faulty to me, since the pair (E ⊆ N × N,
| f: E → L) does not admit multiple edges with different
| labels connecting the same ordered pair of nodes. Of course this
| is most often allowed in practical KGs.
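| Spelled out (the symbols N, E, L and f are the ones used in the
| comment; the triple-based alternative is one common option,
| not the article's definition):

```latex
\[
  E \subseteq N \times N, \qquad f : E \to L
\]
% f is a function, so each ordered pair (u, v) \in E carries exactly one
% label f(u, v). Two edges u -p-> v and u -q-> v with p \neq q would need
% f(u, v) = p and f(u, v) = q at the same time, which is impossible.
\[
  E \subseteq N \times L \times N
\]
% With the label inside the edge, (u, p, v) and (u, q, v) are distinct
% elements of E, which matches how RDF triples behave.
```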
| mmarx wrote:
| Indeed multiple edges (with different labels) are quite useful,
| particularly when you want to represent RDF graphs. But since
| there is no restriction on the form of L, you can still
| represent those by, e.g., letting L be a set of sets of IRIs,
| and thus labelling your edges with sets of IRIs, which you then
| interpret as a set of RDF triples (i.e., as a set of edges).
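| The set-of-IRIs trick can be sketched as a round trip: group
| triples by (subject, object) into one edge whose label is the
| set of predicates, then expand each edge back into triples.
| The function names and the ex: IRIs are hypothetical.

```python
from collections import defaultdict


def to_set_labelled(triples):
    """Group triples into one edge per (s, o), labelled with the set of predicates."""
    labels = defaultdict(set)
    for s, p, o in triples:
        labels[(s, o)].add(p)
    # frozenset makes each label an immutable element of L = sets of IRIs.
    return {edge: frozenset(ps) for edge, ps in labels.items()}


def to_triples(edges):
    """Interpret each set-labelled edge as the set of triples it stands for."""
    return {(s, p, o) for (s, o), ps in edges.items() for p in ps}


triples = {
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:alice", "ex:worksWith", "ex:bob"),
}
edges = to_set_labelled(triples)
print(edges)
```

| The pair (alice, bob) now has a single edge, as the definition
| requires, yet no triple is lost.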
| low_tech_love wrote:
| On a side note, I love the idea of researchers writing "articles"
| in this format. No paywall, no complex two-column format, no
| PDFs. As a researcher myself, I wish this were what my
| "productivity" was judged on; I'd probably have a lot more fun
| and motivation to work and produce!
| wrnr wrote:
| KGs are cool, but I haven't found a practical framework for
| combining simple logical predicates with temporal facts (things
| that are true at a certain moment in time) and information
| provenance (the truthiness of information given its origin).
| There might be ways to encode this information in a hypergraph,
| but they are far from practical.
| physicsyogi wrote:
| Check out Datomic. It's a temporal database that uses Datalog as
| its query language. There's also DataScript, which does the
| same thing.
| superlopuh wrote:
| Unfortunate name for a product, I can't find anything called
| Dynamic on DDG, only dynamic things with a lowercase d. Do
| you have a link to the project?
| bosie wrote:
| not dynamic but Datomic
| superlopuh wrote:
| Dyslexic moment on my part, thank you
| omginternets wrote:
| Lysdexia makes fools of us all ;)
| mmarx wrote:
| Wikidata statements (which roughly correspond to the edges in
| the Knowledge Graph) have quite a bit of metadata associated
| with them: they can refer to sources that state this
| particular bit of knowledge, they have a so-called rank that
| allows distinguishing preferred and deprecated statements, and
| they can be qualified with further property-value pairs.
| Temporal validity is encoded using a combination of rank and
| qualifiers, as for, e.g., Pluto[0], where the instance-of
| statement saying that "Pluto is a planet" is deprecated and has
| an "end time" qualifier, and the preferred statement says
| "Pluto is a dwarf planet," with a corresponding "start time"
| qualifier.
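| A toy model of how rank and start/end qualifiers combine into
| temporal validity, using the Pluto example. This is a
| hypothetical sketch, not the Wikidata data model or API:
| values are plain labels instead of item IDs, time is reduced
| to years, and 2006 (the year of the IAU reclassification) is
| used for illustration rather than copied from the qualifier
| values. The best-rank selection is roughly how the simplified
| export picks statements.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Statement:
    subject: str
    prop: str
    value: str
    rank: str                    # "preferred", "normal" or "deprecated"
    start: Optional[int] = None  # "start time" qualifier (year), if any
    end: Optional[int] = None    # "end time" qualifier (year), if any


def values_at(statements, subject, prop, year):
    """Values whose start/end qualifiers cover `year` (end is exclusive), any rank."""
    return [
        s.value for s in statements
        if s.subject == subject and s.prop == prop
        and (s.start is None or s.start <= year)
        and (s.end is None or year < s.end)
    ]


def best_rank_values(statements, subject, prop):
    """Preferred statements if there are any, else normal ones; never deprecated."""
    matching = [s for s in statements if s.subject == subject and s.prop == prop]
    preferred = [s.value for s in matching if s.rank == "preferred"]
    return preferred or [s.value for s in matching if s.rank == "normal"]


PLUTO = [
    Statement("Q339", "P31", "planet", "deprecated", end=2006),
    Statement("Q339", "P31", "dwarf planet", "preferred", start=2006),
]
print(values_at(PLUTO, "Q339", "P31", 2000))   # ['planet']
print(best_rank_values(PLUTO, "Q339", "P31"))  # ['dwarf planet']
```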
|
| In principle, all of this information is available through the
| SPARQL endpoint or as an RDF export (there is also the
| simplified export that contains only "simple" statements
| lacking all of that metadata), so reasoning over this data is
| not entirely out of reach, but the sheer size (the full RDF
| dump is a few hundred GBs) is also not particularly practical
| to deal with.
|
| [0] https://www.wikidata.org/wiki/Q339#P31
| gballan wrote:
| Might be worth looking at Sowa's Conceptual Graphs. E.g., [1]
| talks about time and links to his book.
|
| [1] http://www.jfsowa.com/ontology/process.htm
| hocuspocus wrote:
| Check out Nexus, which was designed with versioning in mind
| and solves this kind of challenge at the Blue Brain Project:
|
| https://bluebrainnexus.io/
| physicsgraph wrote:
| Knowledge graphs for text (the focus of the article) seem
| narrowly scoped, since they require "objective" facts and
| relations to be practical. Capturing the subjective and transient
| perspective of observations made by multiple observers (which is
| what we actually have access to) is more complicated.
|
| For example, asking the same person the same question may yield
| different answers based on their mood or other environmental or
| situational factors. Who's asking the question can also matter,
| as does the specific phrasing of the question.
___________________________________________________________________
(page generated 2021-05-22 23:00 UTC) |