[HN Gopher] Postgres as a graph database
___________________________________________________________________
 
Postgres as a graph database
 
Author : elorant
Score  : 378 points
Date   : 2023-03-31 13:35 UTC (9 hours ago)
 
web link (www.dylanpaulus.com)
w3m dump (www.dylanpaulus.com)
 
| mdavidn wrote:
| I do love PostgreSQL, and I often reach for this approach when
| working with hierarchical data, but a word of warning: Recursive
| queries will not scale to handle _very_ large edge tables. The
| recursive query will always consider _all_ edges, even when the
| outer query seeks only a single node's relationships. The
| solution is to build and index denormalized n-th order
| relationship tables.
| 
| Others have already pointed out the issues of cycles.
 
  | Nezteb wrote:
  | I learned from one of the comments on the post about AGE[1], "a
  | PostgreSQL extension that provides graph database
  | functionality."
  | 
  | [1] https://age.apache.org/
 
  | afandian wrote:
  | I'm sorry could you spell it out? What exactly does "recursive
  | query will always consider _all_ edges" mean? A table scan? I'd
  | be very grateful if you could give some pesudocode or point to
  | a doc page.
 
    | jeremyjh wrote:
    | I think GP means that it has to completely expand the
    | recursive part for every branch before any where condition on
    | the edge nodes can be applied. Graph databases presumably can
    | optimize this.
    | 
    | I've found recursive queries to be difficult to scale in
    | real-world queries past a few million edge nodes. We've
    | denormalized several tree relationships so that the edge is
    | connected both to its parent and to the root of its tree in
    | several cases.
 
      | afandian wrote:
      | Thanks! Sounds like you're saying that brute-force
      | recursive algorithms without backtracking or early
      | termination aren't a good match for path finding. That's
      | not a surprise.
      | 
      | I'm using recursive algorithm on Postgres to find trees in
      | a graph, but only where I'm specificaly interested in all
      | of the nodes.
 
  | cryptonector wrote:
  | Cycles are not a problem: just use `UNION`, not `UNION ALL`.
  | 
  | I myself build transitive closure tables that I update
  | incrementally as needed (sometimes in a deferred way), so that
  | many of the recursive queries I do can be very fast, but I only
  | build transitive closures for the smallest portion I can
  | (basically group nesting).
 
  | liotier wrote:
  | > Recursive queries will not scale to handle _very_ large edge
  | tables
  | 
  | What do you consider large ? We have a 12-year old telco
  | application with a few million objects in Neo4J, doing fault
  | impact calculations... Would PostgreSQL handle that easily
  | nowadays ?
 
    | simonw wrote:
    | My hunch is that a few million objects is pretty tiny these
    | days - you could probably write a script to import all of
    | them into PostgreSQL in a few minutes and try it out yourself
    | to see how it behaves.
 
      | ellisv wrote:
      | It depends a lot on how connected the graph is and whether
      | you can fit it in memory.
 
      | simonw wrote:
      | I tried it against SQLite. I got ChatGPT to write me a
      | script that would insert 1m edges and 1m nodes:
      | 
      | https://gist.github.com/simonw/c16ce01244760e186a3a0aa3fee0
      | 4...
      | 
      | Then I ran that query again and it seems to return results
      | in about 80ms:
      | 
      | https://lite.datasette.io/?sql=https://gist.github.com/simo
      | n...
 
        | eurasiantiger wrote:
        | Not very realistic example, you need to be requesting
        | some actual fields across nodes and doing some filtering
        | on at least strings and dates, maybe geo areas as well.
 
        | simonw wrote:
        | Go ahead and try it! I've documented all of the tools you
        | need to run some detailed experiments against SQLite
        | entirely in your browser here.
 
  | ellisv wrote:
  | It's not even that it always considers all edges, but that you
  | can't exit the search early when a condition is met. In other
  | words, you have to first find all the paths and then, outside
  | the CTE, filter to the shortest. We push the node filter into
  | the CTE by wrapping it in a function.
  | 
  | > The solution is to build and index denormalized n-th order
  | relationship tables.
  | 
  | This sounds much more performant but also more difficult to
  | maintain.
 
    | switchbak wrote:
    | "you can't exit the search early when a condition is met" - I
    | have a DAG traversal written in a recursive CTE, and I can
    | bail early out just fine when my traversal predicates no
    | longer match. Not sure why I'd have to do that outside the
    | (recursive part of the) CTE?
    | 
    | Obviously maintaining a flattened version is going to perform
    | better in queries, but you're trading maintenance (write)
    | time for query time. Or materialization time if you decide to
    | do it lazily.
 
      | ellisv wrote:
      | If you have an edge list like:
      | 
      | A->B
      | 
      | A->D
      | 
      | B->C
      | 
      | C->D
      | 
      | Postgres will walk both paths from A to D in the recursive
      | CTE and then you can filter them afterwards to keep only
      | the shortest.
      | 
      | You can use aggregate functions within the recursive CTE,
      | so you can't GROUP BY your starting node and stop once you
      | find the first path. There isn't a way to compare across
      | multiple paths or iterations of the for-loop.
 
        | FleaFlicker99 wrote:
        | Ok but that's a little different than saying you can't
        | cut the traversal short.
        | 
        | If I'm traversing ancestors (let's say) until I find one
        | that doesn't satisfy a condition, it'll bail out then. I
        | get that this doesn't serve all uses cases, but it isn't
        | a small thing either.
 
    | mdavidn wrote:
    | Yes, it is difficult to maintain. That might be a good moment
    | to consider a proper graph database!
 
      | ellisv wrote:
      | I've considered using a column to store indicate 2nd, 3rd,
      | etc degree relations instead of n-th order tables.
 
| YouWhy wrote:
| While the basic technique is sound, SQL recursive queries' no-
| side-effects character means a huge number of basic graph
| traversal strategies (e.g.: pruning, shortest path
| first/Dijkstra) are beyond reasonable engineering.
| 
| In addition to that, Postgres is difficult as a computational
| platform - the execution strategy for recursive queries is not
| necessarily intuitive, and their performance is hard to assess
| (for example: how soon will you run out of "stack"?)
 
| cpa wrote:
| It's a good tool as long as your queries are simple, and your
| graph isn't too large (which is probably the case 99% of the
| time). And if that allows you to have one less tool to deal with,
| go for it!
| 
| But I've tried to push it to its limits (using PostGIS, my nodes
| being land plots that were connected to each others -- the
| biggest query used joins, lateral, recCTE, windows... you name
| it) and ran into many issues: can't branch out early, hard to
| optimize, no support for complex queries (writing anything beyond
| BFS and DFS is a pita). Yet, in the end I accepted the struggle
| (and the long queries) because it was so nice to work with my
| data directly where it was kept :)
 
  | amyres wrote:
  | Using the built-in PostGIS topology tools? Or something more
  | custom? I'm curious as I just started digging into and using
  | PostGIS for a land parcel use case. I've wondered about the
  | graph structure of parcels adjacent to others, rather than just
  | a big unordered table of geom objects.
 
    | cpa wrote:
    | Just using the geography stuff and doing joins on
    | ST_Intersects. I couldn't guarantee that my data formed a
    | partition of the plane, which is necessary for topology iirc.
    | 
    | Fun was had and headaches too. The biggest speedup I got was
    | computing the graph online (creating a node/vertices tables
    | from geometries) and then doing the recursive CTE on that
    | table.
 
      | amyres wrote:
      | Cool thanks! I'm amazed at how much can be done in PostGIS
      | (if my SQL queries are good enough...)
 
| Rapzid wrote:
| I've found this post pretty inspiring for what can be done with
| graphs in Postgres: https://www.alibabacloud.com/blog/postgresql-
| graph-search-pr...
| 
| There are a LOT of use cases that will fit the capabilities and
| performance characteristics.
 
| stevebmark wrote:
| Is with_recursive nearly as performant as dedicated graph
| database query languages?
 
| vrglvrglvrgl wrote:
| [dead]
 
| crunchengine wrote:
| I use EdgeDB for graphs, it works well enough, I just don't know
| about performances but for sure it is sufficient for most use-
| cases.
 
  | e1g wrote:
  | EdgeDB is Postgres so the same constraints apply.
 
    | pcthrowaway wrote:
    | I imagine that's why the person you're responding to
    | mentioned it. I'm surprised the author of this article didn't
 
| claytongulick wrote:
| It doesn't handle all cases or all queries, but one method I've
| used to deal with graph data in relational databases it to
| materialize paths and cache them on the node (this only really
| works well with DAGs).
| 
| So, the path key could look something like: bob.ted.lisa.bill
| (obviously, use ids instead of names here)
| 
| Then you can index this in various ways to make LIKE queries
| fast.
| 
| Want do know anyone that rolls up to ted?
| select * from blah where path like 'bob.ted%'
| 
| Or, how many people report to ted, but aren't under lisa?
| select count(*) from blah where path like 'bob.ted%' and path not
| like 'bob.ted.lisa%'
| 
| And something a bit more complex that would typically be
| considered the domain of a graph database, like "What are the
| average number of direct reports each manager in my organization
| has?"                   select name, avg((select count(*) from
| blah as sub where path like  '%' || name || '%' and
| (array_position(string_to_array(path,'.'), name) =
| ((array_position(string_to_array(path,'.'), sub.name) - 1))
| 
| So, even with that degree of complexity, you're still avoiding
| recursion, though you might be getting into O(n^2) territory if
| you don't have an index that will handle '%' || ... || '%' well.
| You can expand on this with various techniques depending on the
| application.
| 
| It's certainly no replacement for a graph database, but for basic
| stuff it's very fast, handles huge volumes, and doesn't rely on
| any sort of recursion at the database layer (unless you
| materialize in a trigger or something).
| 
| The actual materialization code can be tricky, one technique to
| deal with that is to use the author's technique or something
| similar to have a recursive CTE in a view. When a graph change is
| made, you can scope the change based on the path's hierarchy and
| update the paths from the recursive view, that way you only hit
| the performance penalty at write time, not read.
 
| szarnyasg wrote:
| I designed and maintain several graph benchmarks in the Linked
| Data Benchmark Council, including workloads aimed for databases
| [1]. We make no restrictions on implementations, they can any
| query language like Cypher, SQL, etc.
| 
| In our last benchmark aimed at analytical systems [2], we found
| that SQL queries using WITH RECURSIVE _can_ work for expressing
| reachability and even weighted shortest path queries. However,
| formulating an efficient algorithm yields very complex SQL
| queries [3] and their execution requires a system with a
| sophisticated optimizer such as Umbra developed at TU Munich [4].
| Industry SQL systems are not yet at this level but they may
| attain that sometime in the future.
| 
| Another direction to include graph queries in SQL is the upcoming
| SQL/PGQ (Property Graph Queries) extension. I'm involved in a
| project at CWI Amsterdam to incorporate this language into DuckDB
| [5].
| 
| [1] https://ldbcouncil.org/benchmarks/snb/
| 
| [2] https://www.vldb.org/pvldb/vol16/p877-szarnyas.pdf
| 
| [3]
| https://github.com/ldbc/ldbc_snb_bi/blob/main/umbra/queries/...
| 
| [4] https://umbra-db.com/
| 
| [5] https://www.cidrdb.org/cidr2023/slides/p66-wolde-slides.pdf
 
  | eternalban wrote:
  | Re [5]'s asssertion under "blunders" of the diminish usecases
  | post sql/pgq, what do you think of [something] like Quine?
  | 
  | https://github.com/thatdot/quine
  | 
  | Their claim to fame is progressive incremental computation -
  | each node is an actor responding to events -- and I'm not sure
  | how a relational db could do that and match the latencies. That
  | usecase is pretty much pattern matching and forensics and stuff
  | like that.
  | 
  | https://docs.quine.io/core-concepts/architecture.html
 
| hot_gril wrote:
| Postgres is powerful. I've implemented graph DB, mapreduce, huge
| sparse matrix math (which was cool), and ofc a KV store with it.
| With actual performance benefits over the alternatives. It's not
| very ergonomic for those, but it works.
| 
| That said, I've had so many jobs where the team decides to fit
| our whole data model into a fully recursive graph, and it's been
| a huge mistake every time regardless of whether they use Postgres
| or a dedicated graph DB. Just because you can do it (you always
| can!) doesn't mean you should. Start with just a traditional
| n-hierarchy schema and only look at modeling things as a graph if
| you're sure it's necessary. Usually it's not. Sometimes you'll
| have a mostly non-graph schema with a couple of graph tables for
| the things that really need that.
 
| ape4 wrote:
| Postgres has commands to easily list conventional tables nicely.
| While I can see this would work, there are no native Postgres
| commands to list the graph.
 
| jzelinskie wrote:
| We used the "WITH RECURSIVE" basically the moment it shipped in
| Postgres for the original CoreOS Clair[0] data model which was a
| graph of software versions and vulnerabilities. It scaled well
| compared to a naive graph abstraction implemented outside the
| database, but when performance wasn't great, it REALLY wasn't
| great. After running that version in production for a couple
| years, we ended up throwing it out and remodeling the domain to
| be less graph-y to achieve more consistent performance.
| 
| I've since worked on SpiceDB[1] which scales much better by
| taking the traditional design approach for graph databases and
| only treating Postgres as triple-store. IME, there's no short-
| cut: if you need a graph, you probably want to use a database
| optimized for graph access patterns. Most general-purpose graph
| databases are full of optimizations for common traversals that
| are uncommon operations in relation databases.
| 
| [0]: https://github.com/quay/clair
| 
| [1]: https://github.com/authzed/spicedb
 
| yooloo wrote:
| Of course relational db can act like a graph db. It's just not as
| efficient due to how things are stored and queried. Would be
| great to have a graph db plugin (and I found one
| https://github.com/apache/age)
 
  | piyh wrote:
  | Having first class pagerank and shortest path functions in
  | tinkerpop gremlin vs having to roll it yourself with recursive
  | CTEs feels like a graph DB win.
 
| sigstoat wrote:
| any suggestions for getting nice graph db-style functionality out
| of an postgres schema, where you've got the nodes and edges of a
| labelled property graph spread across a bunch of different
| tables? (AGE seems to want to create a separate graph object that
| lives alongside your other tables, and toy examples like this
| article just dump everything into special tables)
| 
| in my particular case i don't actually have a lot of data so i'm
| not looking for performance, just want to save developer time.
 
| ghotli wrote:
| I like this idea and ultimately what I want is "is this a DAG or
| not" and if so "give me the topological sort".
| 
| Anyone know of good docs for this sort of pattern with SQL that
| addresses those two common tasks?
 
| varunjain99 wrote:
| This is a neat data modeling exercise, but not sure why you'd
| prefer this approach over Neo4j.
| 
| It's much simpler to query in native graph database land (though
| it's not exactly SQL, it's pretty close to be honest). And there
| are performance / reliability implications of forcing postgres
| into graph land.
 
| piyh wrote:
| This isn't just a PG thing. The wiki article isn't great, but the
| references have links to sqlite, mssql, oracle, ibm, etc.
| 
| https://en.wikipedia.org/wiki/Hierarchical_and_recursive_que...
 
| simonw wrote:
| This same technique - WITH RECURSIVE - is also supported by
| SQLite.
| 
| I ported the example in this tutorial to SQLite here - now you
| can play with it in Datasette Lite:
| https://lite.datasette.io/?sql=https%3A%2F%2Fgist.githubuser...
| 
| Amusingly I ported the sample PostgreSQL code using the new
| ChatGPT "Browsing" alpha, with the following prompt:
| 
| > Read this tutorial and then output equivalent create table and
| insert statements for SQLite
| https://www.dylanpaulus.com/posts/postgres-is-a-graph-databa...
 
  | punnerud wrote:
  | If we have links to other Sqlite3 databases in the SQLite
  | itself, we could make an static webpage distributed graph
  | database: https://news.ycombinator.com/item?id=27016630
  | 
  | With the static database trick, combined with recursive graphs
 
| mharig wrote:
| IIRC, the maybe biggest graph database in the world, TAO (The
| Associations and Objects ?) of FB, is based on MySQL.
| 
| Meta may have had a good reason not to choose a 'native' graph
| DB.
 
| its-summertime wrote:
| Handling cycles seems to be a missing point.
| 
| https://www.postgresql.org/docs/current/queries-with.html#QU...
| the docs themselves do have information about it.
 
  | ellisv wrote:
  | Postgres can detect cycles; see 7.8.2.2 in the docs:
  | https://www.postgresql.org/docs/current/queries-with.html#QU...
 
| jakewins wrote:
| Johan wrote the original Neo4j implementation as a graph layer on
| top of Postgres, random trivia. Then Java released NIO, he got
| excited and tried replacing the PG engine with one built for
| graphs from scratch.
 
| kaba0 wrote:
| Only remotely relevant, but I have recently come across Gremlin,
| a sort of "SQL for graphs". Does anyone have some experience with
| it they can share? I only used it on an in-memory graph database,
| and I found it quite great so far.
 
  | inguz wrote:
  | Tinkerpop/Gremlin is not really SQL-like at all. My experience
  | with it has not been fun: it's difficult to reason about, has
  | obscenely bad documentation, poor tooling, idiosyncratic
  | library support, and simple data access patterns are hard.
 
| user00012-ab wrote:
| whoa.... postgres is moving quickly and taking jobs away from
| other databases. Maybe we need to hold up for 6 months on any new
| uses for postgres until we all have some time to figure this out.
 
| truth_seeker wrote:
| Few suggestions based on my prior experience on the same task in
| production system:-
| 
| 1. Both the tables (Nodes and Edges) MUST HAVE one more column as
| "label" and then you must create partitions based on distinct
| values of "label" column. This greatly helps queries to run
| faster as search plan is narrowed when you include it in where
| clause and PG knows partitions to hit.
| 
| 2. Instead of one big fat primary key column in each table use,
| consider having different uniques indexes on each partition.
| 
| 3. The EDGES table can afford to have "data" column of data type
| TEXT or perhaps JSONB. PG saves it in different data disk layout
| and compression works great here. We used it to cache the result
| for previous/next nodes data or compute the result for many extra
| hops which are required on frequent basis.
| 
| 4. Use PG procedures to hide the complexity of reusable DFS/BFS
| queries.
 
| yayr wrote:
| Can someone explain or point towards an explanation, what the
| performance impact of such a query is? How does it scale with
| number of edges or depth of recursion? Is there a special smart
| way of indexing this to be efficient?
 
  | henryfjordan wrote:
  | If you store a list of edges like (startId, endId) with an
  | index on startId, then every time you want to follow an edge
  | will cost O(log(n)) where n is the number of edges in your
  | graph.
  | 
  | You can get O(1) performance on similar queries using a
  | different storage strategy (by storing edges as pointers to the
  | other nodes in memory) but SQL is not the right tool for that,
  | you would be better off with a true graph DB.
 
  | winrid wrote:
  | If you have indexes on next_node and previous_node, it'll
  | preform okay.
  | 
  | The size of the table (# of edges) won't actually matter *that*
  | much performance wise with btree indexes. The performance will
  | mostly depend on how deep you want to go with your recursion.
  | The more nodes you check per query, the slower it'll be.
  | Luckily PG will keep the frequently accessed nodes and index
  | pages in the buffer pool, so it'll perform close enough to an
  | in-memory graph database for most use cases that access the
  | same data often - while not requiring your whole dataset to fit
  | into memory all the time.
  | 
  | MongoDB's current graph search mechanism works the same way.
  | 
  | I think actually if you are using incremental ids and you don't
  | delete edges, you could try BRIN indexes for the next/previous
  | node columns. This could still get you pretty good performance
  | with _much_ lower storage space requirements than a
  | "traditional" index, so you could theoretically store a pretty
  | large graph (thinking billions of edges) with minimal disk
  | space.
 
| ForHackernews wrote:
| I can't believe this article doesn't mention the (first-party)
| ltree extension: https://hoverbear.org/blog/postgresql-
| hierarchical-structure...
 
| fbn79 wrote:
| Is not nested sets https://en.wikipedia.org/wiki/Nested_set_model
| a better solution than the one described in the article?
 
| elchief wrote:
| Joe Celko covers graphs in chapters 12-14 of Trees and
| Hierarchies in SQL for Smarties
| 
| as well as chapter 27 of SQL for Smarties, 5th
| 
| isn't ISO working on an graph extension to regular SQL? Can't
| seem to google it correctly
 
| garyclarke27 wrote:
| In my experience Postgres works really with a Recursive CTE
| inside a Materialized View for Graphs. Latest version can detect
| cycles. I used to add a distance column as well, to show number
| of hops apart, any 2 nodes are. Hoping Postgres adds Automatic
| Incremental Mat View Maintenance soon, it will be perfect then as
| a Graph Database.
 
  | TOMDM wrote:
  | Incremental materialised view maintenance is a killer feature.
  | 
  | If Postgres added it, almost all of my interest in Materialize
  | (the database, not the technique) would vanish immediately.
 
| dfragnito wrote:
| We accomplish similar with SchemafreeSQL. Currently MySQL 8 back-
| end but working on other SQL back-ends. This demo uses
| PlanetScale as the DB back-end, Fly.io handles the SFSQL
| endpoint, Netlify function handles the query, and cytoscape.js
| graph network library for the UI
| 
| https://harmonious-mermaid-c4d794.netlify.app/got (sorry not
| mobile friendly but functional)
 
  | samlambert wrote:
  | This is a cool project.
 
    | dfragnito wrote:
    | Thanks !! We had hoped to use PS for our serverless backend
    | and also offer a dedicated PS option. But we ran into some
    | issues with Vitess. I was just thinking today I wanted to get
    | back into it and see if we can resolve these issues. We had
    | to go with Aurora.
    | 
    | The issues were on deletes. This demo is read only so it
    | works well with a PS backend. My experience with your CS has
    | been amazing despite these issues!!
 
| _boffin_ wrote:
| Again, i'll post
| https://books.google.com/books/about/Joe_Celko_s_Trees_and_H...
| 
| pdf:
| https://github.com/Gumplefik/pdf/blob/master/Joe%20Celko's%2...
 
  | packetslave wrote:
  | we're posting copyrighted PDFs to HN now?
 
| jghn wrote:
| Over time I've come to take this a step further. I feel people
| reach for a graph database way too early. Most use cases that
| most people have will work fine in a standard relational setup.
| And that brings with it the benefits of using technologies like
| Postgres that have stood the test of time. So I treat it as I do
| any performance consideration: first demonstrate that you
| actually need to go down the path less traveled, don't just
| assume you need to do so.
 
  | rajman187 wrote:
  | Completely agree. In fact, even massive graph data can make use
  | of a relational database + caching [1]
  | 
  | [1] https://engineering.fb.com/2013/06/25/core-data/tao-the-
  | powe...
 
| cryptonector wrote:
| The reason graph DBs exist is mainly to provide easier syntax for
| recursive queries (which aren't really recursive because they're
| tail-recursive, which means iterative rather than what most
| people think of as recursive).
| 
| Any questions?
| 
| It's not clear how one might make SQL syntax for recursive
| queries simpler. Clearly we can have syntactic sugar for
| "dereferencing" relationships, so one could say `SELECT
| child-->name FROM parent_child WHERE ...;`, and we could have
| aggregation `SELECT array_agg(child-->name) FROM parent_child
| WHERE ...;`, and `SELECT child--->name FROM parent_child WHERE
| ...;` to say "recurse all the way through the child relation",
| but if you want more than one column then things get tricky.
 
| dapearce wrote:
| Also see Apache Age, a graph database extension for Postgres:
| https://github.com/apache/age
 
| philzook wrote:
| Very interesting. Relatedly, I've been refining simple
| lightweight ways to use a regular SQL database to execute
| seminaive datalog queries. Path reachability in a graph is the
| canonical example. One could use stored functions to implement
| the outer loop also.
| 
| There are some other fun related applications like graph
| rewriting using SQL, etc.
| 
| MiniLitelog: Easy Breezy SQLite Datalog -
| https://www.philipzucker.com/tiny-sqlite-datalog/
 
| ChicagoDave wrote:
| As a C# guy, I've figured out how to build in-memory graphs with
| generics and dsl/fluid queries. I'm working on a blog entry about
| it because it's such a powerful skill to learn and leverage. No
| data store required. I can deserialize/serialize the data
| structure to/from binary or json.
 
| ralusek wrote:
| I've done this on many occasions, but I actually only go with a
| single nodes table.
| 
| Schema is like this                   id         type: string (or
| enum)         data: jsonb         indexed: jsonb (but this one
| gets a GIN index)         toId: nullable foreign key to this same
| table         fromId: nullable foreign key to this same table
| createdAt: date         updatedAt: date
| 
| So if I had a post with a comment on it, the data would look
| something like this                  {          id: '1111',
| type: 'users',          data: { name: 'John Doe' },        }
| {          id: '2222',          type: 'users',          data: {
| name: 'Jane Doe' },        }             {          id: '3333',
| type: 'posts',          data: { text: 'Nice day today' },
| }             {          id: '4444',          type: 'comments',
| data: { text: 'For you, maybe. Raining here' },        }
| 
| And now for the edges:                   {           id: '5555',
| type: '(posts < users)',           toId: '3333',
| fromId: '1111',         }              {           id: '6666',
| type: '(comments < users)',           toId: '4444',
| fromId: '2222',         }              {           id: '7777',
| type: '(posts < comments)',           toId: '3333',
| fromId: '4444',         }
| 
| So then, for example, if I have a post, and I want to find the
| comments on it, I search for toId = , type = '(posts
| < comments)'.
 
  | TOMDM wrote:
  | This works well for nodes with at most 1 parent and child, the
  | post is aimed at supporting many to many relationships though
 
    | ralusek wrote:
    | What I've described here supports many to many...
 
      | fauigerzigerk wrote:
      | Yes, but the downside of your schema is that the
      | distinction between nodes and edges is somewhat muddled.
      | 
      | Presumably, all nodes would have nulls in both fromId and
      | toId, but the schema doesn't enforce that. The schema also
      | allows linking edges to other edges.
      | 
      | Don't you think this is a bit too much flexibility if the
      | intention is to model a graph?
 
| Dopameaner wrote:
| Andy Pavlov, mentioned something similar. Where, he advocated
| Relational database to solve most of the graph database
| utilities. He also made a bet, if graph databases spawn a legit
| usecase by 2030, that he would change his CMU profile picture.
 
| henryfjordan wrote:
| I implemented pretty much exactly this setup once.
| 
| For what it's worth, there's a lot of footguns with this
| approach. You need to be careful about infinite cycles and things
| like that. You also just do not get the ergonomics of using a
| query language that is meant for graph data. Once you get it all
| setup and experience the issues you will no doubt be able to
| build whatever you want but there's a learning curve.
| 
| In the end we switched to using Neo4j and were able to build new
| features a lot more quickly than we would've been able to on
| Postgres.
| 
| It's also worth mentioning that there are many ways to store
| graph data, and using an "adjacency list" like is being done here
| may or may not be the best way for your use case. Neo4j I think
| uses a different approach where nodes store info about edges they
| are connected to directly, so you don't even need to hit an index
| to follow an edge.
 
  | mistermann wrote:
  | Piggybacking on your Neo4J reference....can anyone suggest any
  | resources that would be good for someone from the SQL world who
  | is very uninformed on graph databases to quickly get their head
  | around the key ideas, but even more importantly, how to choose
  | a platform among the offerings (licensing being I think a key
  | issue, I've heard Neo4J can get expensive)...assume a large,
  | "social media" scale system, to be safe.
 
    | viksit wrote:
    | you can use datomic as a good example.
    | https://hashrocket.com/blog/posts/using-datomic-as-a-
    | graph-d...
    | 
    | another good example using predicates -
    | https://github.com/threatgrid/asami/wiki/2.-Introduction
 
    | pottertheotter wrote:
    | I was looking at graph databases/Neo4j a lot early last year
    | and found Neo4j has a lot of good resources.
    | 
    | You can get this book for free through their website. [1]
    | 
    | This have some good documentation[2], including this page
    | "Transition from relational to graph database".[3]
    | 
    | [1] https://neo4j.com/graph-databases-book/
    | 
    | [2] https://neo4j.com/docs/getting-started/current/
    | 
    | [3] https://neo4j.com/docs/getting-
    | started/current/appendix/grap...
 
    | henryfjordan wrote:
    | Neo4j has a self-hosted community edition that is free,
    | though you need to update to Enterprise to expand your DB
    | beyond one node. It's worth noting though that scaling from a
    | single node to a cluster is going to have some pretty huge
    | performance issues whether you are using SQL or Neo4j to
    | store your graph. The power of graph databases is that it
    | should be very inexpensive to follow an edge in the graph but
    | when that leads to a network hop you lose a lot of
    | performance.
    | 
    | If you are comfortable fitting your data onto one box, then
    | development speed is probably more important than other
    | factors and I would just try a few databases out and
    | especially pay attention to how good the libraries are in
    | your programming language of choice. Neo4j for instance had
    | high quality libraries in Java but the experience in Python
    | was not as good.
    | 
    | If you have a lot of data or need next-level performance, I
    | would start by doing some research on the various ways to
    | store graph data and then pick a DB that supports the
    | approach you want to take.
 
    | joshspankit wrote:
    | As someone who hated being in the SQL world and couldn't
    | figure out why until years later when I found out about graph
    | databases, here's one big shift:
    | 
    | Graph database store true relationships.
    | 
    | The R in RDBMS and the concept of relationships by primary
    | keys are lies (imo). They lead people away from what true
    | relationships can be. Databases like Neo4j are not about
    | doing any sort of PK or join or merge-before-query. Looking
    | up by connection is the _fastest_ way to find information. If
    | your RDBMS had one table for phone numbers, one for email
    | addresses, one for first names, one for last names, one for
    | street, and so-on it would take huge amounts of time to query
    | to find all the detail for a single person. With a graph
    | database it takes a small amount of time to find the first
    | bit of info (let's say phone number since it's easily
    | sorted), and then every other bit of linked data is
    | essentially free.
 
      | eurasiantiger wrote:
      | Effectively, moving from RDB to a graph database just
      | shifts relationship resolution from query time to
      | insert/update time. Sometimes this can be an advantage. I'd
      | even wager usually.
 
    | taubek wrote:
    | For some basics of graph modeling take a look at
    | https://memgraph.com/docs/memgraph/tutorials/graph-modeling.
    | 
    | Some differences between SQL and graph databases are covered
    | at https://memgraph.com/blog/graph-database-vs-relational-
    | datab....
 
  | cryptonector wrote:
  | > You need to be careful about infinite cycles and things like
  | that.
  | 
  | Not really, just never forget this: always use `UNION`, _and
  | never_ `UNION ALL`, in recursive queries!
 
| revskill wrote:
| PostgreSQL is my love. Thanks for all developers.
| 
| Do not blame the tool if you feel it doesn't fit your niche.
| Blame yourself first, and try to use it the right way.
 
| amir734jj wrote:
| Good example. Now do it with undirected graph. How will the query
| look like?
| 
| So, is this basically running transitive closure on the database
| during query? it would be expensive.
 
| pphysch wrote:
| Small nitpick but I wish the author just made `data` a JSONB
| field rather than VARCHAR. That really shows the power of
| Postgres, being able to operate as a "NoSQL graph DB" with little
| to no hacking.
 
  | ganderzz wrote:
  | Hey, author here! Thanks for the feedback! I went back and
  | forth with having `data` being a JSONB field or VARCHAR, but
  | ended up with VARCHAR to show that `nodes` don't have to be
  | anything crazy. Really `nodes` is just a standard table you'd
  | see in any old SQL database, and what makes it a "graph" is the
  | `edges`/relationship table.
 
| bottlepalm wrote:
| I can see how this possibly won't scale compared to dedicated
| graph dbs.
| 
| I'm interested in what the fundamental difference is in those
| graph dbs because rdbms seem so close I don't see why it isn't
| just provided as an extra layer on top.
| 
| As described in the article, any graph can already be built with
| an rbms, so why for example does neo4j need to be standalone and
| can't possibly be a layer on top of postgres to leverage existing
| relationships? Any good articles on that?
 
  | emodendroket wrote:
  | I don't know off the top of my head but I'd assume that one
  | could adjust the storage strategy and other performance
  | tweaking to optimize for the expected use case.
 
| samwillis wrote:
| If you are looking to utilise this with Django, there is the
| brilliant Django CTE project that enables it and other CTE
| queries with the Django ORM:
| 
| https://dimagi.github.io/django-cte/#recursive-common-table-...
 
| taubek wrote:
| I personally prefer native graph databases such as Memgraph[1].
| Graph databases have it's advantages and disadvantages. Sometimes
| you, need RDBMS, sometimes you need NoSQL solution. So I pick my
| stack based on situation and issue that I need to solve.
| 
| [1] https://memgraph.com
 
  | rajman187 wrote:
  | "Native" is a misleading or misused term in the context of
  | graph databases because a graph cannot be mapped in a maximally
  | consistent way with the underlying hardware (the von Neumann
  | architecture represents data in a sequential manner and it is
  | much faster to access it this way rather than randomly).
  | 
  | There are generally two families in the graph database world:
  | those which use underlying traditional tables of nodes and
  | many-to-many edges; and index-free adjacency which just means
  | each node in the graph knows the memory address of its
  | connections (other side of the edges).
  | 
  | Distributed graphs necessarily end up using the former because
  | it's difficult if not impossible for a node to know the memory
  | address of its connection when that crosses a physical
  | boundary. So typically index-free adjacency graphs have a
  | master-slave setup with multiple read replicas but a single one
  | to write to.
  | 
  | So with a "native graph" you don't rely on potentially
  | expensive join operations to find neighbors of neighbors and
  | can traverse complex paths easily.
 
    | dtomicevic wrote:
    | This is spot on. Though, in Memgraph's context, there is a
    | combination of lock free techniques and raw pointer
    | representation that can work exceptionally well for a large
    | variety of use cases, and even better so compared to others
    | on streaming / event driven / large volume of writes. Using
    | raw pointers, there are also optimization opportunities when
    | storing or rebalancing the graph to maximize the number of
    | adjacent nodes in a cache line so benefiting from sequential
    | representation too.
 
  | mashroom wrote:
  | Thanks for pointing me toward Memgraph. I didn't hear of it
  | before and it appears to be a great alternative to neo4j.
 
| nilsbunger wrote:
| Sure, at its most basic a graph database is just a bunch of
| many2many relationships.
| 
| The recursive query is cool, but how is the performance for graph
| queries compared to a DB optimized for this use case ?
 
  | ellisv wrote:
  | I haven't compared, but something like Kuzu is probably much
  | more performant for a lot of use cases because it implements
  | Worst-Case Optimal Joins.
  | 
  | But I think Postgres is likely to get these improvements in a
  | few years too.
 
___________________________________________________________________
(page generated 2023-03-31 23:00 UTC)