|
| mdavidn wrote:
| I do love PostgreSQL, and I often reach for this approach when
| working with hierarchical data, but a word of warning: Recursive
| queries will not scale to handle _very_ large edge tables. The
| recursive query will always consider _all_ edges, even when the
| outer query seeks only a single node's relationships. The
| solution is to build and index denormalized n-th order
| relationship tables.
|
| Others have already pointed out the issues of cycles.
| Nezteb wrote:
| I learned from one of the comments on the post about AGE[1], "a
| PostgreSQL extension that provides graph database
| functionality."
|
| [1] https://age.apache.org/
| afandian wrote:
| I'm sorry could you spell it out? What exactly does "recursive
| query will always consider _all_ edges" mean? A table scan? I'd
| be very grateful if you could give some pesudocode or point to
| a doc page.
| jeremyjh wrote:
| I think GP means that it has to completely expand the
| recursive part for every branch before any where condition on
| the edge nodes can be applied. Graph databases presumably can
| optimize this.
|
| I've found recursive queries to be difficult to scale in
| real-world queries past a few million edge nodes. We've
| denormalized several tree relationships so that the edge is
| connected both to its parent and to the root of its tree in
| several cases.
| afandian wrote:
| Thanks! Sounds like you're saying that brute-force
| recursive algorithms without backtracking or early
| termination aren't a good match for path finding. That's
| not a surprise.
|
| I'm using recursive algorithm on Postgres to find trees in
| a graph, but only where I'm specificaly interested in all
| of the nodes.
| cryptonector wrote:
| Cycles are not a problem: just use `UNION`, not `UNION ALL`.
|
| I myself build transitive closure tables that I update
| incrementally as needed (sometimes in a deferred way), so that
| many of the recursive queries I do can be very fast, but I only
| build transitive closures for the smallest portion I can
| (basically group nesting).
| liotier wrote:
| > Recursive queries will not scale to handle _very_ large edge
| tables
|
| What do you consider large ? We have a 12-year old telco
| application with a few million objects in Neo4J, doing fault
| impact calculations... Would PostgreSQL handle that easily
| nowadays ?
| simonw wrote:
| My hunch is that a few million objects is pretty tiny these
| days - you could probably write a script to import all of
| them into PostgreSQL in a few minutes and try it out yourself
| to see how it behaves.
| ellisv wrote:
| It depends a lot on how connected the graph is and whether
| you can fit it in memory.
| simonw wrote:
| I tried it against SQLite. I got ChatGPT to write me a
| script that would insert 1m edges and 1m nodes:
|
| https://gist.github.com/simonw/c16ce01244760e186a3a0aa3fee0
| 4...
|
| Then I ran that query again and it seems to return results
| in about 80ms:
|
| https://lite.datasette.io/?sql=https://gist.github.com/simo
| n...
| eurasiantiger wrote:
| Not very realistic example, you need to be requesting
| some actual fields across nodes and doing some filtering
| on at least strings and dates, maybe geo areas as well.
| simonw wrote:
| Go ahead and try it! I've documented all of the tools you
| need to run some detailed experiments against SQLite
| entirely in your browser here.
| ellisv wrote:
| It's not even that it always considers all edges, but that you
| can't exit the search early when a condition is met. In other
| words, you have to first find all the paths and then, outside
| the CTE, filter to the shortest. We push the node filter into
| the CTE by wrapping it in a function.
|
| > The solution is to build and index denormalized n-th order
| relationship tables.
|
| This sounds much more performant but also more difficult to
| maintain.
| switchbak wrote:
| "you can't exit the search early when a condition is met" - I
| have a DAG traversal written in a recursive CTE, and I can
| bail early out just fine when my traversal predicates no
| longer match. Not sure why I'd have to do that outside the
| (recursive part of the) CTE?
|
| Obviously maintaining a flattened version is going to perform
| better in queries, but you're trading maintenance (write)
| time for query time. Or materialization time if you decide to
| do it lazily.
| ellisv wrote:
| If you have an edge list like:
|
| A->B
|
| A->D
|
| B->C
|
| C->D
|
| Postgres will walk both paths from A to D in the recursive
| CTE and then you can filter them afterwards to keep only
| the shortest.
|
| You can use aggregate functions within the recursive CTE,
| so you can't GROUP BY your starting node and stop once you
| find the first path. There isn't a way to compare across
| multiple paths or iterations of the for-loop.
| FleaFlicker99 wrote:
| Ok but that's a little different than saying you can't
| cut the traversal short.
|
| If I'm traversing ancestors (let's say) until I find one
| that doesn't satisfy a condition, it'll bail out then. I
| get that this doesn't serve all uses cases, but it isn't
| a small thing either.
| mdavidn wrote:
| Yes, it is difficult to maintain. That might be a good moment
| to consider a proper graph database!
| ellisv wrote:
| I've considered using a column to store indicate 2nd, 3rd,
| etc degree relations instead of n-th order tables.
| YouWhy wrote:
| While the basic technique is sound, SQL recursive queries' no-
| side-effects character means a huge number of basic graph
| traversal strategies (e.g.: pruning, shortest path
| first/Dijkstra) are beyond reasonable engineering.
|
| In addition to that, Postgres is difficult as a computational
| platform - the execution strategy for recursive queries is not
| necessarily intuitive, and their performance is hard to assess
| (for example: how soon will you run out of "stack"?)
| cpa wrote:
| It's a good tool as long as your queries are simple, and your
| graph isn't too large (which is probably the case 99% of the
| time). And if that allows you to have one less tool to deal with,
| go for it!
|
| But I've tried to push it to its limits (using PostGIS, my nodes
| being land plots that were connected to each others -- the
| biggest query used joins, lateral, recCTE, windows... you name
| it) and ran into many issues: can't branch out early, hard to
| optimize, no support for complex queries (writing anything beyond
| BFS and DFS is a pita). Yet, in the end I accepted the struggle
| (and the long queries) because it was so nice to work with my
| data directly where it was kept :)
| amyres wrote:
| Using the built-in PostGIS topology tools? Or something more
| custom? I'm curious as I just started digging into and using
| PostGIS for a land parcel use case. I've wondered about the
| graph structure of parcels adjacent to others, rather than just
| a big unordered table of geom objects.
| cpa wrote:
| Just using the geography stuff and doing joins on
| ST_Intersects. I couldn't guarantee that my data formed a
| partition of the plane, which is necessary for topology iirc.
|
| Fun was had and headaches too. The biggest speedup I got was
| computing the graph online (creating a node/vertices tables
| from geometries) and then doing the recursive CTE on that
| table.
| amyres wrote:
| Cool thanks! I'm amazed at how much can be done in PostGIS
| (if my SQL queries are good enough...)
| Rapzid wrote:
| I've found this post pretty inspiring for what can be done with
| graphs in Postgres: https://www.alibabacloud.com/blog/postgresql-
| graph-search-pr...
|
| There are a LOT of use cases that will fit the capabilities and
| performance characteristics.
| stevebmark wrote:
| Is with_recursive nearly as performant as dedicated graph
| database query languages?
| vrglvrglvrgl wrote:
| [dead]
| crunchengine wrote:
| I use EdgeDB for graphs, it works well enough, I just don't know
| about performances but for sure it is sufficient for most use-
| cases.
| e1g wrote:
| EdgeDB is Postgres so the same constraints apply.
| pcthrowaway wrote:
| I imagine that's why the person you're responding to
| mentioned it. I'm surprised the author of this article didn't
| claytongulick wrote:
| It doesn't handle all cases or all queries, but one method I've
| used to deal with graph data in relational databases it to
| materialize paths and cache them on the node (this only really
| works well with DAGs).
|
| So, the path key could look something like: bob.ted.lisa.bill
| (obviously, use ids instead of names here)
|
| Then you can index this in various ways to make LIKE queries
| fast.
|
| Want do know anyone that rolls up to ted?
| select * from blah where path like 'bob.ted%'
|
| Or, how many people report to ted, but aren't under lisa?
| select count(*) from blah where path like 'bob.ted%' and path not
| like 'bob.ted.lisa%'
|
| And something a bit more complex that would typically be
| considered the domain of a graph database, like "What are the
| average number of direct reports each manager in my organization
| has?" select name, avg((select count(*) from
| blah as sub where path like '%' || name || '%' and
| (array_position(string_to_array(path,'.'), name) =
| ((array_position(string_to_array(path,'.'), sub.name) - 1))
|
| So, even with that degree of complexity, you're still avoiding
| recursion, though you might be getting into O(n^2) territory if
| you don't have an index that will handle '%' || ... || '%' well.
| You can expand on this with various techniques depending on the
| application.
|
| It's certainly no replacement for a graph database, but for basic
| stuff it's very fast, handles huge volumes, and doesn't rely on
| any sort of recursion at the database layer (unless you
| materialize in a trigger or something).
|
| The actual materialization code can be tricky, one technique to
| deal with that is to use the author's technique or something
| similar to have a recursive CTE in a view. When a graph change is
| made, you can scope the change based on the path's hierarchy and
| update the paths from the recursive view, that way you only hit
| the performance penalty at write time, not read.
| szarnyasg wrote:
| I designed and maintain several graph benchmarks in the Linked
| Data Benchmark Council, including workloads aimed for databases
| [1]. We make no restrictions on implementations, they can any
| query language like Cypher, SQL, etc.
|
| In our last benchmark aimed at analytical systems [2], we found
| that SQL queries using WITH RECURSIVE _can_ work for expressing
| reachability and even weighted shortest path queries. However,
| formulating an efficient algorithm yields very complex SQL
| queries [3] and their execution requires a system with a
| sophisticated optimizer such as Umbra developed at TU Munich [4].
| Industry SQL systems are not yet at this level but they may
| attain that sometime in the future.
|
| Another direction to include graph queries in SQL is the upcoming
| SQL/PGQ (Property Graph Queries) extension. I'm involved in a
| project at CWI Amsterdam to incorporate this language into DuckDB
| [5].
|
| [1] https://ldbcouncil.org/benchmarks/snb/
|
| [2] https://www.vldb.org/pvldb/vol16/p877-szarnyas.pdf
|
| [3]
| https://github.com/ldbc/ldbc_snb_bi/blob/main/umbra/queries/...
|
| [4] https://umbra-db.com/
|
| [5] https://www.cidrdb.org/cidr2023/slides/p66-wolde-slides.pdf
| eternalban wrote:
| Re [5]'s asssertion under "blunders" of the diminish usecases
| post sql/pgq, what do you think of [something] like Quine?
|
| https://github.com/thatdot/quine
|
| Their claim to fame is progressive incremental computation -
| each node is an actor responding to events -- and I'm not sure
| how a relational db could do that and match the latencies. That
| usecase is pretty much pattern matching and forensics and stuff
| like that.
|
| https://docs.quine.io/core-concepts/architecture.html
| hot_gril wrote:
| Postgres is powerful. I've implemented graph DB, mapreduce, huge
| sparse matrix math (which was cool), and ofc a KV store with it.
| With actual performance benefits over the alternatives. It's not
| very ergonomic for those, but it works.
|
| That said, I've had so many jobs where the team decides to fit
| our whole data model into a fully recursive graph, and it's been
| a huge mistake every time regardless of whether they use Postgres
| or a dedicated graph DB. Just because you can do it (you always
| can!) doesn't mean you should. Start with just a traditional
| n-hierarchy schema and only look at modeling things as a graph if
| you're sure it's necessary. Usually it's not. Sometimes you'll
| have a mostly non-graph schema with a couple of graph tables for
| the things that really need that.
| ape4 wrote:
| Postgres has commands to easily list conventional tables nicely.
| While I can see this would work, there are no native Postgres
| commands to list the graph.
| jzelinskie wrote:
| We used the "WITH RECURSIVE" basically the moment it shipped in
| Postgres for the original CoreOS Clair[0] data model which was a
| graph of software versions and vulnerabilities. It scaled well
| compared to a naive graph abstraction implemented outside the
| database, but when performance wasn't great, it REALLY wasn't
| great. After running that version in production for a couple
| years, we ended up throwing it out and remodeling the domain to
| be less graph-y to achieve more consistent performance.
|
| I've since worked on SpiceDB[1] which scales much better by
| taking the traditional design approach for graph databases and
| only treating Postgres as triple-store. IME, there's no short-
| cut: if you need a graph, you probably want to use a database
| optimized for graph access patterns. Most general-purpose graph
| databases are full of optimizations for common traversals that
| are uncommon operations in relation databases.
|
| [0]: https://github.com/quay/clair
|
| [1]: https://github.com/authzed/spicedb
| yooloo wrote:
| Of course relational db can act like a graph db. It's just not as
| efficient due to how things are stored and queried. Would be
| great to have a graph db plugin (and I found one
| https://github.com/apache/age)
| piyh wrote:
| Having first class pagerank and shortest path functions in
| tinkerpop gremlin vs having to roll it yourself with recursive
| CTEs feels like a graph DB win.
| sigstoat wrote:
| any suggestions for getting nice graph db-style functionality out
| of an postgres schema, where you've got the nodes and edges of a
| labelled property graph spread across a bunch of different
| tables? (AGE seems to want to create a separate graph object that
| lives alongside your other tables, and toy examples like this
| article just dump everything into special tables)
|
| in my particular case i don't actually have a lot of data so i'm
| not looking for performance, just want to save developer time.
| ghotli wrote:
| I like this idea and ultimately what I want is "is this a DAG or
| not" and if so "give me the topological sort".
|
| Anyone know of good docs for this sort of pattern with SQL that
| addresses those two common tasks?
| varunjain99 wrote:
| This is a neat data modeling exercise, but not sure why you'd
| prefer this approach over Neo4j.
|
| It's much simpler to query in native graph database land (though
| it's not exactly SQL, it's pretty close to be honest). And there
| are performance / reliability implications of forcing postgres
| into graph land.
| piyh wrote:
| This isn't just a PG thing. The wiki article isn't great, but the
| references have links to sqlite, mssql, oracle, ibm, etc.
|
| https://en.wikipedia.org/wiki/Hierarchical_and_recursive_que...
| simonw wrote:
| This same technique - WITH RECURSIVE - is also supported by
| SQLite.
|
| I ported the example in this tutorial to SQLite here - now you
| can play with it in Datasette Lite:
| https://lite.datasette.io/?sql=https%3A%2F%2Fgist.githubuser...
|
| Amusingly I ported the sample PostgreSQL code using the new
| ChatGPT "Browsing" alpha, with the following prompt:
|
| > Read this tutorial and then output equivalent create table and
| insert statements for SQLite
| https://www.dylanpaulus.com/posts/postgres-is-a-graph-databa...
| punnerud wrote:
| If we have links to other Sqlite3 databases in the SQLite
| itself, we could make an static webpage distributed graph
| database: https://news.ycombinator.com/item?id=27016630
|
| With the static database trick, combined with recursive graphs
| mharig wrote:
| IIRC, the maybe biggest graph database in the world, TAO (The
| Associations and Objects ?) of FB, is based on MySQL.
|
| Meta may have had a good reason not to choose a 'native' graph
| DB.
| its-summertime wrote:
| Handling cycles seems to be a missing point.
|
| https://www.postgresql.org/docs/current/queries-with.html#QU...
| the docs themselves do have information about it.
| ellisv wrote:
| Postgres can detect cycles; see 7.8.2.2 in the docs:
| https://www.postgresql.org/docs/current/queries-with.html#QU...
| jakewins wrote:
| Johan wrote the original Neo4j implementation as a graph layer on
| top of Postgres, random trivia. Then Java released NIO, he got
| excited and tried replacing the PG engine with one built for
| graphs from scratch.
| kaba0 wrote:
| Only remotely relevant, but I have recently come across Gremlin,
| a sort of "SQL for graphs". Does anyone have some experience with
| it they can share? I only used it on an in-memory graph database,
| and I found it quite great so far.
| inguz wrote:
| Tinkerpop/Gremlin is not really SQL-like at all. My experience
| with it has not been fun: it's difficult to reason about, has
| obscenely bad documentation, poor tooling, idiosyncratic
| library support, and simple data access patterns are hard.
| user00012-ab wrote:
| whoa.... postgres is moving quickly and taking jobs away from
| other databases. Maybe we need to hold up for 6 months on any new
| uses for postgres until we all have some time to figure this out.
| truth_seeker wrote:
| Few suggestions based on my prior experience on the same task in
| production system:-
|
| 1. Both the tables (Nodes and Edges) MUST HAVE one more column as
| "label" and then you must create partitions based on distinct
| values of "label" column. This greatly helps queries to run
| faster as search plan is narrowed when you include it in where
| clause and PG knows partitions to hit.
|
| 2. Instead of one big fat primary key column in each table use,
| consider having different uniques indexes on each partition.
|
| 3. The EDGES table can afford to have "data" column of data type
| TEXT or perhaps JSONB. PG saves it in different data disk layout
| and compression works great here. We used it to cache the result
| for previous/next nodes data or compute the result for many extra
| hops which are required on frequent basis.
|
| 4. Use PG procedures to hide the complexity of reusable DFS/BFS
| queries.
| yayr wrote:
| Can someone explain or point towards an explanation, what the
| performance impact of such a query is? How does it scale with
| number of edges or depth of recursion? Is there a special smart
| way of indexing this to be efficient?
| henryfjordan wrote:
| If you store a list of edges like (startId, endId) with an
| index on startId, then every time you want to follow an edge
| will cost O(log(n)) where n is the number of edges in your
| graph.
|
| You can get O(1) performance on similar queries using a
| different storage strategy (by storing edges as pointers to the
| other nodes in memory) but SQL is not the right tool for that,
| you would be better off with a true graph DB.
| winrid wrote:
| If you have indexes on next_node and previous_node, it'll
| preform okay.
|
| The size of the table (# of edges) won't actually matter *that*
| much performance wise with btree indexes. The performance will
| mostly depend on how deep you want to go with your recursion.
| The more nodes you check per query, the slower it'll be.
| Luckily PG will keep the frequently accessed nodes and index
| pages in the buffer pool, so it'll perform close enough to an
| in-memory graph database for most use cases that access the
| same data often - while not requiring your whole dataset to fit
| into memory all the time.
|
| MongoDB's current graph search mechanism works the same way.
|
| I think actually if you are using incremental ids and you don't
| delete edges, you could try BRIN indexes for the next/previous
| node columns. This could still get you pretty good performance
| with _much_ lower storage space requirements than a
| "traditional" index, so you could theoretically store a pretty
| large graph (thinking billions of edges) with minimal disk
| space.
| ForHackernews wrote:
| I can't believe this article doesn't mention the (first-party)
| ltree extension: https://hoverbear.org/blog/postgresql-
| hierarchical-structure...
| fbn79 wrote:
| Is not nested sets https://en.wikipedia.org/wiki/Nested_set_model
| a better solution than the one described in the article?
| elchief wrote:
| Joe Celko covers graphs in chapters 12-14 of Trees and
| Hierarchies in SQL for Smarties
|
| as well as chapter 27 of SQL for Smarties, 5th
|
| isn't ISO working on an graph extension to regular SQL? Can't
| seem to google it correctly
| garyclarke27 wrote:
| In my experience Postgres works really with a Recursive CTE
| inside a Materialized View for Graphs. Latest version can detect
| cycles. I used to add a distance column as well, to show number
| of hops apart, any 2 nodes are. Hoping Postgres adds Automatic
| Incremental Mat View Maintenance soon, it will be perfect then as
| a Graph Database.
| TOMDM wrote:
| Incremental materialised view maintenance is a killer feature.
|
| If Postgres added it, almost all of my interest in Materialize
| (the database, not the technique) would vanish immediately.
| dfragnito wrote:
| We accomplish similar with SchemafreeSQL. Currently MySQL 8 back-
| end but working on other SQL back-ends. This demo uses
| PlanetScale as the DB back-end, Fly.io handles the SFSQL
| endpoint, Netlify function handles the query, and cytoscape.js
| graph network library for the UI
|
| https://harmonious-mermaid-c4d794.netlify.app/got (sorry not
| mobile friendly but functional)
| samlambert wrote:
| This is a cool project.
| dfragnito wrote:
| Thanks !! We had hoped to use PS for our serverless backend
| and also offer a dedicated PS option. But we ran into some
| issues with Vitess. I was just thinking today I wanted to get
| back into it and see if we can resolve these issues. We had
| to go with Aurora.
|
| The issues were on deletes. This demo is read only so it
| works well with a PS backend. My experience with your CS has
| been amazing despite these issues!!
| _boffin_ wrote:
| Again, i'll post
| https://books.google.com/books/about/Joe_Celko_s_Trees_and_H...
|
| pdf:
| https://github.com/Gumplefik/pdf/blob/master/Joe%20Celko's%2...
| packetslave wrote:
| we're posting copyrighted PDFs to HN now?
| jghn wrote:
| Over time I've come to take this a step further. I feel people
| reach for a graph database way too early. Most use cases that
| most people have will work fine in a standard relational setup.
| And that brings with it the benefits of using technologies like
| Postgres that have stood the test of time. So I treat it as I do
| any performance consideration: first demonstrate that you
| actually need to go down the path less traveled, don't just
| assume you need to do so.
| rajman187 wrote:
| Completely agree. In fact, even massive graph data can make use
| of a relational database + caching [1]
|
| [1] https://engineering.fb.com/2013/06/25/core-data/tao-the-
| powe...
| cryptonector wrote:
| The reason graph DBs exist is mainly to provide easier syntax for
| recursive queries (which aren't really recursive because they're
| tail-recursive, which means iterative rather than what most
| people think of as recursive).
|
| Any questions?
|
| It's not clear how one might make SQL syntax for recursive
| queries simpler. Clearly we can have syntactic sugar for
| "dereferencing" relationships, so one could say `SELECT
| child-->name FROM parent_child WHERE ...;`, and we could have
| aggregation `SELECT array_agg(child-->name) FROM parent_child
| WHERE ...;`, and `SELECT child--->name FROM parent_child WHERE
| ...;` to say "recurse all the way through the child relation",
| but if you want more than one column then things get tricky.
| dapearce wrote:
| Also see Apache Age, a graph database extension for Postgres:
| https://github.com/apache/age
| philzook wrote:
| Very interesting. Relatedly, I've been refining simple
| lightweight ways to use a regular SQL database to execute
| seminaive datalog queries. Path reachability in a graph is the
| canonical example. One could use stored functions to implement
| the outer loop also.
|
| There are some other fun related applications like graph
| rewriting using SQL, etc.
|
| MiniLitelog: Easy Breezy SQLite Datalog -
| https://www.philipzucker.com/tiny-sqlite-datalog/
| ChicagoDave wrote:
| As a C# guy, I've figured out how to build in-memory graphs with
| generics and dsl/fluid queries. I'm working on a blog entry about
| it because it's such a powerful skill to learn and leverage. No
| data store required. I can deserialize/serialize the data
| structure to/from binary or json.
| ralusek wrote:
| I've done this on many occasions, but I actually only go with a
| single nodes table.
|
| Schema is like this id type: string (or
| enum) data: jsonb indexed: jsonb (but this one
| gets a GIN index) toId: nullable foreign key to this same
| table fromId: nullable foreign key to this same table
| createdAt: date updatedAt: date
|
| So if I had a post with a comment on it, the data would look
| something like this { id: '1111',
| type: 'users', data: { name: 'John Doe' }, }
| { id: '2222', type: 'users', data: {
| name: 'Jane Doe' }, } { id: '3333',
| type: 'posts', data: { text: 'Nice day today' },
| } { id: '4444', type: 'comments',
| data: { text: 'For you, maybe. Raining here' }, }
|
| And now for the edges: { id: '5555',
| type: '(posts < users)', toId: '3333',
| fromId: '1111', } { id: '6666',
| type: '(comments < users)', toId: '4444',
| fromId: '2222', } { id: '7777',
| type: '(posts < comments)', toId: '3333',
| fromId: '4444', }
|
| So then, for example, if I have a post, and I want to find the
| comments on it, I search for toId = , type = '(posts
| < comments)'.
| TOMDM wrote:
| This works well for nodes with at most 1 parent and child, the
| post is aimed at supporting many to many relationships though
| ralusek wrote:
| What I've described here supports many to many...
| fauigerzigerk wrote:
| Yes, but the downside of your schema is that the
| distinction between nodes and edges is somewhat muddled.
|
| Presumably, all nodes would have nulls in both fromId and
| toId, but the schema doesn't enforce that. The schema also
| allows linking edges to other edges.
|
| Don't you think this is a bit too much flexibility if the
| intention is to model a graph?
| Dopameaner wrote:
| Andy Pavlov, mentioned something similar. Where, he advocated
| Relational database to solve most of the graph database
| utilities. He also made a bet, if graph databases spawn a legit
| usecase by 2030, that he would change his CMU profile picture.
| henryfjordan wrote:
| I implemented pretty much exactly this setup once.
|
| For what it's worth, there's a lot of footguns with this
| approach. You need to be careful about infinite cycles and things
| like that. You also just do not get the ergonomics of using a
| query language that is meant for graph data. Once you get it all
| setup and experience the issues you will no doubt be able to
| build whatever you want but there's a learning curve.
|
| In the end we switched to using Neo4j and were able to build new
| features a lot more quickly than we would've been able to on
| Postgres.
|
| It's also worth mentioning that there are many ways to store
| graph data, and using an "adjacency list" like is being done here
| may or may not be the best way for your use case. Neo4j I think
| uses a different approach where nodes store info about edges they
| are connected to directly, so you don't even need to hit an index
| to follow an edge.
| mistermann wrote:
| Piggybacking on your Neo4J reference....can anyone suggest any
| resources that would be good for someone from the SQL world who
| is very uninformed on graph databases to quickly get their head
| around the key ideas, but even more importantly, how to choose
| a platform among the offerings (licensing being I think a key
| issue, I've heard Neo4J can get expensive)...assume a large,
| "social media" scale system, to be safe.
| viksit wrote:
| you can use datomic as a good example.
| https://hashrocket.com/blog/posts/using-datomic-as-a-
| graph-d...
|
| another good example using predicates -
| https://github.com/threatgrid/asami/wiki/2.-Introduction
| pottertheotter wrote:
| I was looking at graph databases/Neo4j a lot early last year
| and found Neo4j has a lot of good resources.
|
| You can get this book for free through their website. [1]
|
| This have some good documentation[2], including this page
| "Transition from relational to graph database".[3]
|
| [1] https://neo4j.com/graph-databases-book/
|
| [2] https://neo4j.com/docs/getting-started/current/
|
| [3] https://neo4j.com/docs/getting-
| started/current/appendix/grap...
| henryfjordan wrote:
| Neo4j has a self-hosted community edition that is free,
| though you need to update to Enterprise to expand your DB
| beyond one node. It's worth noting though that scaling from a
| single node to a cluster is going to have some pretty huge
| performance issues whether you are using SQL or Neo4j to
| store your graph. The power of graph databases is that it
| should be very inexpensive to follow an edge in the graph but
| when that leads to a network hop you lose a lot of
| performance.
|
| If you are comfortable fitting your data onto one box, then
| development speed is probably more important than other
| factors and I would just try a few databases out and
| especially pay attention to how good the libraries are in
| your programming language of choice. Neo4j for instance had
| high quality libraries in Java but the experience in Python
| was not as good.
|
| If you have a lot of data or need next-level performance, I
| would start by doing some research on the various ways to
| store graph data and then pick a DB that supports the
| approach you want to take.
| joshspankit wrote:
| As someone who hated being in the SQL world and couldn't
| figure out why until years later when I found out about graph
| databases, here's one big shift:
|
| Graph database store true relationships.
|
| The R in RDBMS and the concept of relationships by primary
| keys are lies (imo). They lead people away from what true
| relationships can be. Databases like Neo4j are not about
| doing any sort of PK or join or merge-before-query. Looking
| up by connection is the _fastest_ way to find information. If
| your RDBMS had one table for phone numbers, one for email
| addresses, one for first names, one for last names, one for
| street, and so-on it would take huge amounts of time to query
| to find all the detail for a single person. With a graph
| database it takes a small amount of time to find the first
| bit of info (let's say phone number since it's easily
| sorted), and then every other bit of linked data is
| essentially free.
| eurasiantiger wrote:
| Effectively, moving from RDB to a graph database just
| shifts relationship resolution from query time to
| insert/update time. Sometimes this can be an advantage. I'd
| even wager usually.
| taubek wrote:
| For some basics of graph modeling take a look at
| https://memgraph.com/docs/memgraph/tutorials/graph-modeling.
|
| Some differences between SQL and graph databases are covered
| at https://memgraph.com/blog/graph-database-vs-relational-
| datab....
| cryptonector wrote:
| > You need to be careful about infinite cycles and things like
| that.
|
| Not really, just never forget this: always use `UNION`, _and
| never_ `UNION ALL`, in recursive queries!
| revskill wrote:
| PostgreSQL is my love. Thanks for all developers.
|
| Do not blame the tool if you feel it doesn't fit your niche.
| Blame yourself first, and try to use it the right way.
| amir734jj wrote:
| Good example. Now do it with undirected graph. How will the query
| look like?
|
| So, is this basically running transitive closure on the database
| during query? it would be expensive.
| pphysch wrote:
| Small nitpick but I wish the author just made `data` a JSONB
| field rather than VARCHAR. That really shows the power of
| Postgres, being able to operate as a "NoSQL graph DB" with little
| to no hacking.
| ganderzz wrote:
| Hey, author here! Thanks for the feedback! I went back and
| forth with having `data` being a JSONB field or VARCHAR, but
| ended up with VARCHAR to show that `nodes` don't have to be
| anything crazy. Really `nodes` is just a standard table you'd
| see in any old SQL database, and what makes it a "graph" is the
| `edges`/relationship table.
| bottlepalm wrote:
| I can see how this possibly won't scale compared to dedicated
| graph dbs.
|
| I'm interested in what the fundamental difference is in those
| graph dbs because rdbms seem so close I don't see why it isn't
| just provided as an extra layer on top.
|
| As described in the article, any graph can already be built with
| an rbms, so why for example does neo4j need to be standalone and
| can't possibly be a layer on top of postgres to leverage existing
| relationships? Any good articles on that?
| emodendroket wrote:
| I don't know off the top of my head but I'd assume that one
| could adjust the storage strategy and other performance
| tweaking to optimize for the expected use case.
| samwillis wrote:
| If you are looking to utilise this with Django, there is the
| brilliant Django CTE project that enables it and other CTE
| queries with the Django ORM:
|
| https://dimagi.github.io/django-cte/#recursive-common-table-...
| taubek wrote:
| I personally prefer native graph databases such as Memgraph[1].
| Graph databases have it's advantages and disadvantages. Sometimes
| you, need RDBMS, sometimes you need NoSQL solution. So I pick my
| stack based on situation and issue that I need to solve.
|
| [1] https://memgraph.com
| rajman187 wrote:
| "Native" is a misleading or misused term in the context of
| graph databases because a graph cannot be mapped in a maximally
| consistent way with the underlying hardware (the von Neumann
| architecture represents data in a sequential manner and it is
| much faster to access it this way rather than randomly).
|
| There are generally two families in the graph database world:
| those which use underlying traditional tables of nodes and
| many-to-many edges; and index-free adjacency which just means
| each node in the graph knows the memory address of its
| connections (other side of the edges).
|
| Distributed graphs necessarily end up using the former because
| it's difficult if not impossible for a node to know the memory
| address of its connection when that crosses a physical
| boundary. So typically index-free adjacency graphs have a
| master-slave setup with multiple read replicas but a single one
| to write to.
|
| So with a "native graph" you don't rely on potentially
| expensive join operations to find neighbors of neighbors and
| can traverse complex paths easily.
| dtomicevic wrote:
| This is spot on. Though, in Memgraph's context, there is a
| combination of lock free techniques and raw pointer
| representation that can work exceptionally well for a large
| variety of use cases, and even better so compared to others
| on streaming / event driven / large volume of writes. Using
| raw pointers, there are also optimization opportunities when
| storing or rebalancing the graph to maximize the number of
| adjacent nodes in a cache line so benefiting from sequential
| representation too.
| mashroom wrote:
| Thanks for pointing me toward Memgraph. I didn't hear of it
| before and it appears to be a great alternative to neo4j.
| nilsbunger wrote:
| Sure, at its most basic a graph database is just a bunch of
| many2many relationships.
|
| The recursive query is cool, but how is the performance for graph
| queries compared to a DB optimized for this use case ?
| ellisv wrote:
| I haven't compared, but something like Kuzu is probably much
| more performant for a lot of use cases because it implements
| Worst-Case Optimal Joins.
|
| But I think Postgres is likely to get these improvements in a
| few years too.
___________________________________________________________________
(page generated 2023-03-31 23:00 UTC) |