proxy70

	[HN Gopher] Postgres as a graph database ___________________________________________________________________ Postgres as a graph database Author : elorant Score : 378 points Date : 2023-03-31 13:35 UTC (9 hours ago)
	web link (www.dylanpaulus.com)
	w3m dump (www.dylanpaulus.com)
	\| mdavidn wrote: \| I do love PostgreSQL, and I often reach for this approach when \| working with hierarchical data, but a word of warning: Recursive \| queries will not scale to handle _very_ large edge tables. The \| recursive query will always consider _all_ edges, even when the \| outer query seeks only a single node's relationships. The \| solution is to build and index denormalized n-th order \| relationship tables. \| \| Others have already pointed out the issues of cycles. \| Nezteb wrote: \| I learned from one of the comments on the post about AGE[1], "a \| PostgreSQL extension that provides graph database \| functionality." \| \| [1] https://age.apache.org/ \| afandian wrote: \| I'm sorry could you spell it out? What exactly does "recursive \| query will always consider _all_ edges" mean? A table scan? I'd \| be very grateful if you could give some pesudocode or point to \| a doc page. \| jeremyjh wrote: \| I think GP means that it has to completely expand the \| recursive part for every branch before any where condition on \| the edge nodes can be applied. Graph databases presumably can \| optimize this. \| \| I've found recursive queries to be difficult to scale in \| real-world queries past a few million edge nodes. We've \| denormalized several tree relationships so that the edge is \| connected both to its parent and to the root of its tree in \| several cases. \| afandian wrote: \| Thanks! Sounds like you're saying that brute-force \| recursive algorithms without backtracking or early \| termination aren't a good match for path finding. That's \| not a surprise. \| \| I'm using recursive algorithm on Postgres to find trees in \| a graph, but only where I'm specificaly interested in all \| of the nodes. \| cryptonector wrote: \| Cycles are not a problem: just use `UNION`, not `UNION ALL`. \| \| I myself build transitive closure tables that I update \| incrementally as needed (sometimes in a deferred way), so that \| many of the recursive queries I do can be very fast, but I only \| build transitive closures for the smallest portion I can \| (basically group nesting). \| liotier wrote: \| > Recursive queries will not scale to handle _very_ large edge \| tables \| \| What do you consider large ? We have a 12-year old telco \| application with a few million objects in Neo4J, doing fault \| impact calculations... Would PostgreSQL handle that easily \| nowadays ? \| simonw wrote: \| My hunch is that a few million objects is pretty tiny these \| days - you could probably write a script to import all of \| them into PostgreSQL in a few minutes and try it out yourself \| to see how it behaves. \| ellisv wrote: \| It depends a lot on how connected the graph is and whether \| you can fit it in memory. \| simonw wrote: \| I tried it against SQLite. I got ChatGPT to write me a \| script that would insert 1m edges and 1m nodes: \| \| https://gist.github.com/simonw/c16ce01244760e186a3a0aa3fee0 \| 4... \| \| Then I ran that query again and it seems to return results \| in about 80ms: \| \| https://lite.datasette.io/?sql=https://gist.github.com/simo \| n... \| eurasiantiger wrote: \| Not very realistic example, you need to be requesting \| some actual fields across nodes and doing some filtering \| on at least strings and dates, maybe geo areas as well. \| simonw wrote: \| Go ahead and try it! I've documented all of the tools you \| need to run some detailed experiments against SQLite \| entirely in your browser here. \| ellisv wrote: \| It's not even that it always considers all edges, but that you \| can't exit the search early when a condition is met. In other \| words, you have to first find all the paths and then, outside \| the CTE, filter to the shortest. We push the node filter into \| the CTE by wrapping it in a function. \| \| > The solution is to build and index denormalized n-th order \| relationship tables. \| \| This sounds much more performant but also more difficult to \| maintain. \| switchbak wrote: \| "you can't exit the search early when a condition is met" - I \| have a DAG traversal written in a recursive CTE, and I can \| bail early out just fine when my traversal predicates no \| longer match. Not sure why I'd have to do that outside the \| (recursive part of the) CTE? \| \| Obviously maintaining a flattened version is going to perform \| better in queries, but you're trading maintenance (write) \| time for query time. Or materialization time if you decide to \| do it lazily. \| ellisv wrote: \| If you have an edge list like: \| \| A->B \| \| A->D \| \| B->C \| \| C->D \| \| Postgres will walk both paths from A to D in the recursive \| CTE and then you can filter them afterwards to keep only \| the shortest. \| \| You can use aggregate functions within the recursive CTE, \| so you can't GROUP BY your starting node and stop once you \| find the first path. There isn't a way to compare across \| multiple paths or iterations of the for-loop. \| FleaFlicker99 wrote: \| Ok but that's a little different than saying you can't \| cut the traversal short. \| \| If I'm traversing ancestors (let's say) until I find one \| that doesn't satisfy a condition, it'll bail out then. I \| get that this doesn't serve all uses cases, but it isn't \| a small thing either. \| mdavidn wrote: \| Yes, it is difficult to maintain. That might be a good moment \| to consider a proper graph database! \| ellisv wrote: \| I've considered using a column to store indicate 2nd, 3rd, \| etc degree relations instead of n-th order tables. \| YouWhy wrote: \| While the basic technique is sound, SQL recursive queries' no- \| side-effects character means a huge number of basic graph \| traversal strategies (e.g.: pruning, shortest path \| first/Dijkstra) are beyond reasonable engineering. \| \| In addition to that, Postgres is difficult as a computational \| platform - the execution strategy for recursive queries is not \| necessarily intuitive, and their performance is hard to assess \| (for example: how soon will you run out of "stack"?) \| cpa wrote: \| It's a good tool as long as your queries are simple, and your \| graph isn't too large (which is probably the case 99% of the \| time). And if that allows you to have one less tool to deal with, \| go for it! \| \| But I've tried to push it to its limits (using PostGIS, my nodes \| being land plots that were connected to each others -- the \| biggest query used joins, lateral, recCTE, windows... you name \| it) and ran into many issues: can't branch out early, hard to \| optimize, no support for complex queries (writing anything beyond \| BFS and DFS is a pita). Yet, in the end I accepted the struggle \| (and the long queries) because it was so nice to work with my \| data directly where it was kept :) \| amyres wrote: \| Using the built-in PostGIS topology tools? Or something more \| custom? I'm curious as I just started digging into and using \| PostGIS for a land parcel use case. I've wondered about the \| graph structure of parcels adjacent to others, rather than just \| a big unordered table of geom objects. \| cpa wrote: \| Just using the geography stuff and doing joins on \| ST_Intersects. I couldn't guarantee that my data formed a \| partition of the plane, which is necessary for topology iirc. \| \| Fun was had and headaches too. The biggest speedup I got was \| computing the graph online (creating a node/vertices tables \| from geometries) and then doing the recursive CTE on that \| table. \| amyres wrote: \| Cool thanks! I'm amazed at how much can be done in PostGIS \| (if my SQL queries are good enough...) \| Rapzid wrote: \| I've found this post pretty inspiring for what can be done with \| graphs in Postgres: https://www.alibabacloud.com/blog/postgresql- \| graph-search-pr... \| \| There are a LOT of use cases that will fit the capabilities and \| performance characteristics. \| stevebmark wrote: \| Is with_recursive nearly as performant as dedicated graph \| database query languages? \| vrglvrglvrgl wrote: \| [dead] \| crunchengine wrote: \| I use EdgeDB for graphs, it works well enough, I just don't know \| about performances but for sure it is sufficient for most use- \| cases. \| e1g wrote: \| EdgeDB is Postgres so the same constraints apply. \| pcthrowaway wrote: \| I imagine that's why the person you're responding to \| mentioned it. I'm surprised the author of this article didn't \| claytongulick wrote: \| It doesn't handle all cases or all queries, but one method I've \| used to deal with graph data in relational databases it to \| materialize paths and cache them on the node (this only really \| works well with DAGs). \| \| So, the path key could look something like: bob.ted.lisa.bill \| (obviously, use ids instead of names here) \| \| Then you can index this in various ways to make LIKE queries \| fast. \| \| Want do know anyone that rolls up to ted? \| select * from blah where path like 'bob.ted%' \| \| Or, how many people report to ted, but aren't under lisa? \| select count() from blah where path like 'bob.ted%' and path not \| like 'bob.ted.lisa%' \| \| And something a bit more complex that would typically be \| considered the domain of a graph database, like "What are the \| average number of direct reports each manager in my organization \| has?" select name, avg((select count() from \| blah as sub where path like '%' \|\| name \|\| '%' and \| (array_position(string_to_array(path,'.'), name) = \| ((array_position(string_to_array(path,'.'), sub.name) - 1)) \| \| So, even with that degree of complexity, you're still avoiding \| recursion, though you might be getting into O(n^2) territory if \| you don't have an index that will handle '%' \|\| ... \|\| '%' well. \| You can expand on this with various techniques depending on the \| application. \| \| It's certainly no replacement for a graph database, but for basic \| stuff it's very fast, handles huge volumes, and doesn't rely on \| any sort of recursion at the database layer (unless you \| materialize in a trigger or something). \| \| The actual materialization code can be tricky, one technique to \| deal with that is to use the author's technique or something \| similar to have a recursive CTE in a view. When a graph change is \| made, you can scope the change based on the path's hierarchy and \| update the paths from the recursive view, that way you only hit \| the performance penalty at write time, not read. \| szarnyasg wrote: \| I designed and maintain several graph benchmarks in the Linked \| Data Benchmark Council, including workloads aimed for databases \| [1]. We make no restrictions on implementations, they can any \| query language like Cypher, SQL, etc. \| \| In our last benchmark aimed at analytical systems [2], we found \| that SQL queries using WITH RECURSIVE _can_ work for expressing \| reachability and even weighted shortest path queries. However, \| formulating an efficient algorithm yields very complex SQL \| queries [3] and their execution requires a system with a \| sophisticated optimizer such as Umbra developed at TU Munich [4]. \| Industry SQL systems are not yet at this level but they may \| attain that sometime in the future. \| \| Another direction to include graph queries in SQL is the upcoming \| SQL/PGQ (Property Graph Queries) extension. I'm involved in a \| project at CWI Amsterdam to incorporate this language into DuckDB \| [5]. \| \| [1] https://ldbcouncil.org/benchmarks/snb/ \| \| [2] https://www.vldb.org/pvldb/vol16/p877-szarnyas.pdf \| \| [3] \| https://github.com/ldbc/ldbc_snb_bi/blob/main/umbra/queries/... \| \| [4] https://umbra-db.com/ \| \| [5] https://www.cidrdb.org/cidr2023/slides/p66-wolde-slides.pdf \| eternalban wrote: \| Re [5]'s asssertion under "blunders" of the diminish usecases \| post sql/pgq, what do you think of [something] like Quine? \| \| https://github.com/thatdot/quine \| \| Their claim to fame is progressive incremental computation - \| each node is an actor responding to events -- and I'm not sure \| how a relational db could do that and match the latencies. That \| usecase is pretty much pattern matching and forensics and stuff \| like that. \| \| https://docs.quine.io/core-concepts/architecture.html \| hot_gril wrote: \| Postgres is powerful. I've implemented graph DB, mapreduce, huge \| sparse matrix math (which was cool), and ofc a KV store with it. \| With actual performance benefits over the alternatives. It's not \| very ergonomic for those, but it works. \| \| That said, I've had so many jobs where the team decides to fit \| our whole data model into a fully recursive graph, and it's been \| a huge mistake every time regardless of whether they use Postgres \| or a dedicated graph DB. Just because you can do it (you always \| can!) doesn't mean you should. Start with just a traditional \| n-hierarchy schema and only look at modeling things as a graph if \| you're sure it's necessary. Usually it's not. Sometimes you'll \| have a mostly non-graph schema with a couple of graph tables for \| the things that really need that. \| ape4 wrote: \| Postgres has commands to easily list conventional tables nicely. \| While I can see this would work, there are no native Postgres \| commands to list the graph. \| jzelinskie wrote: \| We used the "WITH RECURSIVE" basically the moment it shipped in \| Postgres for the original CoreOS Clair[0] data model which was a \| graph of software versions and vulnerabilities. It scaled well \| compared to a naive graph abstraction implemented outside the \| database, but when performance wasn't great, it REALLY wasn't \| great. After running that version in production for a couple \| years, we ended up throwing it out and remodeling the domain to \| be less graph-y to achieve more consistent performance. \| \| I've since worked on SpiceDB[1] which scales much better by \| taking the traditional design approach for graph databases and \| only treating Postgres as triple-store. IME, there's no short- \| cut: if you need a graph, you probably want to use a database \| optimized for graph access patterns. Most general-purpose graph \| databases are full of optimizations for common traversals that \| are uncommon operations in relation databases. \| \| [0]: https://github.com/quay/clair \| \| [1]: https://github.com/authzed/spicedb \| yooloo wrote: \| Of course relational db can act like a graph db. It's just not as \| efficient due to how things are stored and queried. Would be \| great to have a graph db plugin (and I found one \| https://github.com/apache/age) \| piyh wrote: \| Having first class pagerank and shortest path functions in \| tinkerpop gremlin vs having to roll it yourself with recursive \| CTEs feels like a graph DB win. \| sigstoat wrote: \| any suggestions for getting nice graph db-style functionality out \| of an postgres schema, where you've got the nodes and edges of a \| labelled property graph spread across a bunch of different \| tables? (AGE seems to want to create a separate graph object that \| lives alongside your other tables, and toy examples like this \| article just dump everything into special tables) \| \| in my particular case i don't actually have a lot of data so i'm \| not looking for performance, just want to save developer time. \| ghotli wrote: \| I like this idea and ultimately what I want is "is this a DAG or \| not" and if so "give me the topological sort". \| \| Anyone know of good docs for this sort of pattern with SQL that \| addresses those two common tasks? \| varunjain99 wrote: \| This is a neat data modeling exercise, but not sure why you'd \| prefer this approach over Neo4j. \| \| It's much simpler to query in native graph database land (though \| it's not exactly SQL, it's pretty close to be honest). And there \| are performance / reliability implications of forcing postgres \| into graph land. \| piyh wrote: \| This isn't just a PG thing. The wiki article isn't great, but the \| references have links to sqlite, mssql, oracle, ibm, etc. \| \| https://en.wikipedia.org/wiki/Hierarchical_and_recursive_que... \| simonw wrote: \| This same technique - WITH RECURSIVE - is also supported by \| SQLite. \| \| I ported the example in this tutorial to SQLite here - now you \| can play with it in Datasette Lite: \| https://lite.datasette.io/?sql=https%3A%2F%2Fgist.githubuser... \| \| Amusingly I ported the sample PostgreSQL code using the new \| ChatGPT "Browsing" alpha, with the following prompt: \| \| > Read this tutorial and then output equivalent create table and \| insert statements for SQLite \| https://www.dylanpaulus.com/posts/postgres-is-a-graph-databa... \| punnerud wrote: \| If we have links to other Sqlite3 databases in the SQLite \| itself, we could make an static webpage distributed graph \| database: https://news.ycombinator.com/item?id=27016630 \| \| With the static database trick, combined with recursive graphs \| mharig wrote: \| IIRC, the maybe biggest graph database in the world, TAO (The \| Associations and Objects ?) of FB, is based on MySQL. \| \| Meta may have had a good reason not to choose a 'native' graph \| DB. \| its-summertime wrote: \| Handling cycles seems to be a missing point. \| \| https://www.postgresql.org/docs/current/queries-with.html#QU... \| the docs themselves do have information about it. \| ellisv wrote: \| Postgres can detect cycles; see 7.8.2.2 in the docs: \| https://www.postgresql.org/docs/current/queries-with.html#QU... \| jakewins wrote: \| Johan wrote the original Neo4j implementation as a graph layer on \| top of Postgres, random trivia. Then Java released NIO, he got \| excited and tried replacing the PG engine with one built for \| graphs from scratch. \| kaba0 wrote: \| Only remotely relevant, but I have recently come across Gremlin, \| a sort of "SQL for graphs". Does anyone have some experience with \| it they can share? I only used it on an in-memory graph database, \| and I found it quite great so far. \| inguz wrote: \| Tinkerpop/Gremlin is not really SQL-like at all. My experience \| with it has not been fun: it's difficult to reason about, has \| obscenely bad documentation, poor tooling, idiosyncratic \| library support, and simple data access patterns are hard. \| user00012-ab wrote: \| whoa.... postgres is moving quickly and taking jobs away from \| other databases. Maybe we need to hold up for 6 months on any new \| uses for postgres until we all have some time to figure this out. \| truth_seeker wrote: \| Few suggestions based on my prior experience on the same task in \| production system:- \| \| 1. Both the tables (Nodes and Edges) MUST HAVE one more column as \| "label" and then you must create partitions based on distinct \| values of "label" column. This greatly helps queries to run \| faster as search plan is narrowed when you include it in where \| clause and PG knows partitions to hit. \| \| 2. Instead of one big fat primary key column in each table use, \| consider having different uniques indexes on each partition. \| \| 3. The EDGES table can afford to have "data" column of data type \| TEXT or perhaps JSONB. PG saves it in different data disk layout \| and compression works great here. We used it to cache the result \| for previous/next nodes data or compute the result for many extra \| hops which are required on frequent basis. \| \| 4. Use PG procedures to hide the complexity of reusable DFS/BFS \| queries. \| yayr wrote: \| Can someone explain or point towards an explanation, what the \| performance impact of such a query is? How does it scale with \| number of edges or depth of recursion? Is there a special smart \| way of indexing this to be efficient? \| henryfjordan wrote: \| If you store a list of edges like (startId, endId) with an \| index on startId, then every time you want to follow an edge \| will cost O(log(n)) where n is the number of edges in your \| graph. \| \| You can get O(1) performance on similar queries using a \| different storage strategy (by storing edges as pointers to the \| other nodes in memory) but SQL is not the right tool for that, \| you would be better off with a true graph DB. \| winrid wrote: \| If you have indexes on next_node and previous_node, it'll \| preform okay. \| \| The size of the table (# of edges) won't actually matter that \| much performance wise with btree indexes. The performance will \| mostly depend on how deep you want to go with your recursion. \| The more nodes you check per query, the slower it'll be. \| Luckily PG will keep the frequently accessed nodes and index \| pages in the buffer pool, so it'll perform close enough to an \| in-memory graph database for most use cases that access the \| same data often - while not requiring your whole dataset to fit \| into memory all the time. \| \| MongoDB's current graph search mechanism works the same way. \| \| I think actually if you are using incremental ids and you don't \| delete edges, you could try BRIN indexes for the next/previous \| node columns. This could still get you pretty good performance \| with _much_ lower storage space requirements than a \| "traditional" index, so you could theoretically store a pretty \| large graph (thinking billions of edges) with minimal disk \| space. \| ForHackernews wrote: \| I can't believe this article doesn't mention the (first-party) \| ltree extension: https://hoverbear.org/blog/postgresql- \| hierarchical-structure... \| fbn79 wrote: \| Is not nested sets https://en.wikipedia.org/wiki/Nested_set_model \| a better solution than the one described in the article? \| elchief wrote: \| Joe Celko covers graphs in chapters 12-14 of Trees and \| Hierarchies in SQL for Smarties \| \| as well as chapter 27 of SQL for Smarties, 5th \| \| isn't ISO working on an graph extension to regular SQL? Can't \| seem to google it correctly \| garyclarke27 wrote: \| In my experience Postgres works really with a Recursive CTE \| inside a Materialized View for Graphs. Latest version can detect \| cycles. I used to add a distance column as well, to show number \| of hops apart, any 2 nodes are. Hoping Postgres adds Automatic \| Incremental Mat View Maintenance soon, it will be perfect then as \| a Graph Database. \| TOMDM wrote: \| Incremental materialised view maintenance is a killer feature. \| \| If Postgres added it, almost all of my interest in Materialize \| (the database, not the technique) would vanish immediately. \| dfragnito wrote: \| We accomplish similar with SchemafreeSQL. Currently MySQL 8 back- \| end but working on other SQL back-ends. This demo uses \| PlanetScale as the DB back-end, Fly.io handles the SFSQL \| endpoint, Netlify function handles the query, and cytoscape.js \| graph network library for the UI \| \| https://harmonious-mermaid-c4d794.netlify.app/got (sorry not \| mobile friendly but functional) \| samlambert wrote: \| This is a cool project. \| dfragnito wrote: \| Thanks !! We had hoped to use PS for our serverless backend \| and also offer a dedicated PS option. But we ran into some \| issues with Vitess. I was just thinking today I wanted to get \| back into it and see if we can resolve these issues. We had \| to go with Aurora. \| \| The issues were on deletes. This demo is read only so it \| works well with a PS backend. My experience with your CS has \| been amazing despite these issues!! \| _boffin_ wrote: \| Again, i'll post \| https://books.google.com/books/about/Joe_Celko_s_Trees_and_H... \| \| pdf: \| https://github.com/Gumplefik/pdf/blob/master/Joe%20Celko's%2... \| packetslave wrote: \| we're posting copyrighted PDFs to HN now? \| jghn wrote: \| Over time I've come to take this a step further. I feel people \| reach for a graph database way too early. Most use cases that \| most people have will work fine in a standard relational setup. \| And that brings with it the benefits of using technologies like \| Postgres that have stood the test of time. So I treat it as I do \| any performance consideration: first demonstrate that you \| actually need to go down the path less traveled, don't just \| assume you need to do so. \| rajman187 wrote: \| Completely agree. In fact, even massive graph data can make use \| of a relational database + caching [1] \| \| [1] https://engineering.fb.com/2013/06/25/core-data/tao-the- \| powe... \| cryptonector wrote: \| The reason graph DBs exist is mainly to provide easier syntax for \| recursive queries (which aren't really recursive because they're \| tail-recursive, which means iterative rather than what most \| people think of as recursive). \| \| Any questions? \| \| It's not clear how one might make SQL syntax for recursive \| queries simpler. Clearly we can have syntactic sugar for \| "dereferencing" relationships, so one could say `SELECT \| child-->name FROM parent_child WHERE ...;`, and we could have \| aggregation `SELECT array_agg(child-->name) FROM parent_child \| WHERE ...;`, and `SELECT child--->name FROM parent_child WHERE \| ...;` to say "recurse all the way through the child relation", \| but if you want more than one column then things get tricky. \| dapearce wrote: \| Also see Apache Age, a graph database extension for Postgres: \| https://github.com/apache/age \| philzook wrote: \| Very interesting. Relatedly, I've been refining simple \| lightweight ways to use a regular SQL database to execute \| seminaive datalog queries. Path reachability in a graph is the \| canonical example. One could use stored functions to implement \| the outer loop also. \| \| There are some other fun related applications like graph \| rewriting using SQL, etc. \| \| MiniLitelog: Easy Breezy SQLite Datalog - \| https://www.philipzucker.com/tiny-sqlite-datalog/ \| ChicagoDave wrote: \| As a C# guy, I've figured out how to build in-memory graphs with \| generics and dsl/fluid queries. I'm working on a blog entry about \| it because it's such a powerful skill to learn and leverage. No \| data store required. I can deserialize/serialize the data \| structure to/from binary or json. \| ralusek wrote: \| I've done this on many occasions, but I actually only go with a \| single nodes table. \| \| Schema is like this id type: string (or \| enum) data: jsonb indexed: jsonb (but this one \| gets a GIN index) toId: nullable foreign key to this same \| table fromId: nullable foreign key to this same table \| createdAt: date updatedAt: date \| \| So if I had a post with a comment on it, the data would look \| something like this { id: '1111', \| type: 'users', data: { name: 'John Doe' }, } \| { id: '2222', type: 'users', data: { \| name: 'Jane Doe' }, } { id: '3333', \| type: 'posts', data: { text: 'Nice day today' }, \| } { id: '4444', type: 'comments', \| data: { text: 'For you, maybe. Raining here' }, } \| \| And now for the edges: { id: '5555', \| type: '(posts < users)', toId: '3333', \| fromId: '1111', } { id: '6666', \| type: '(comments < users)', toId: '4444', \| fromId: '2222', } { id: '7777', \| type: '(posts < comments)', toId: '3333', \| fromId: '4444', } \| \| So then, for example, if I have a post, and I want to find the \| comments on it, I search for toId = , type = '(posts \| < comments)'. \| TOMDM wrote: \| This works well for nodes with at most 1 parent and child, the \| post is aimed at supporting many to many relationships though \| ralusek wrote: \| What I've described here supports many to many... \| fauigerzigerk wrote: \| Yes, but the downside of your schema is that the \| distinction between nodes and edges is somewhat muddled. \| \| Presumably, all nodes would have nulls in both fromId and \| toId, but the schema doesn't enforce that. The schema also \| allows linking edges to other edges. \| \| Don't you think this is a bit too much flexibility if the \| intention is to model a graph? \| Dopameaner wrote: \| Andy Pavlov, mentioned something similar. Where, he advocated \| Relational database to solve most of the graph database \| utilities. He also made a bet, if graph databases spawn a legit \| usecase by 2030, that he would change his CMU profile picture. \| henryfjordan wrote: \| I implemented pretty much exactly this setup once. \| \| For what it's worth, there's a lot of footguns with this \| approach. You need to be careful about infinite cycles and things \| like that. You also just do not get the ergonomics of using a \| query language that is meant for graph data. Once you get it all \| setup and experience the issues you will no doubt be able to \| build whatever you want but there's a learning curve. \| \| In the end we switched to using Neo4j and were able to build new \| features a lot more quickly than we would've been able to on \| Postgres. \| \| It's also worth mentioning that there are many ways to store \| graph data, and using an "adjacency list" like is being done here \| may or may not be the best way for your use case. Neo4j I think \| uses a different approach where nodes store info about edges they \| are connected to directly, so you don't even need to hit an index \| to follow an edge. \| mistermann wrote: \| Piggybacking on your Neo4J reference....can anyone suggest any \| resources that would be good for someone from the SQL world who \| is very uninformed on graph databases to quickly get their head \| around the key ideas, but even more importantly, how to choose \| a platform among the offerings (licensing being I think a key \| issue, I've heard Neo4J can get expensive)...assume a large, \| "social media" scale system, to be safe. \| viksit wrote: \| you can use datomic as a good example. \| https://hashrocket.com/blog/posts/using-datomic-as-a- \| graph-d... \| \| another good example using predicates - \| https://github.com/threatgrid/asami/wiki/2.-Introduction \| pottertheotter wrote: \| I was looking at graph databases/Neo4j a lot early last year \| and found Neo4j has a lot of good resources. \| \| You can get this book for free through their website. [1] \| \| This have some good documentation[2], including this page \| "Transition from relational to graph database".[3] \| \| [1] https://neo4j.com/graph-databases-book/ \| \| [2] https://neo4j.com/docs/getting-started/current/ \| \| [3] https://neo4j.com/docs/getting- \| started/current/appendix/grap... \| henryfjordan wrote: \| Neo4j has a self-hosted community edition that is free, \| though you need to update to Enterprise to expand your DB \| beyond one node. It's worth noting though that scaling from a \| single node to a cluster is going to have some pretty huge \| performance issues whether you are using SQL or Neo4j to \| store your graph. The power of graph databases is that it \| should be very inexpensive to follow an edge in the graph but \| when that leads to a network hop you lose a lot of \| performance. \| \| If you are comfortable fitting your data onto one box, then \| development speed is probably more important than other \| factors and I would just try a few databases out and \| especially pay attention to how good the libraries are in \| your programming language of choice. Neo4j for instance had \| high quality libraries in Java but the experience in Python \| was not as good. \| \| If you have a lot of data or need next-level performance, I \| would start by doing some research on the various ways to \| store graph data and then pick a DB that supports the \| approach you want to take. \| joshspankit wrote: \| As someone who hated being in the SQL world and couldn't \| figure out why until years later when I found out about graph \| databases, here's one big shift: \| \| Graph database store true relationships. \| \| The R in RDBMS and the concept of relationships by primary \| keys are lies (imo). They lead people away from what true \| relationships can be. Databases like Neo4j are not about \| doing any sort of PK or join or merge-before-query. Looking \| up by connection is the _fastest_ way to find information. If \| your RDBMS had one table for phone numbers, one for email \| addresses, one for first names, one for last names, one for \| street, and so-on it would take huge amounts of time to query \| to find all the detail for a single person. With a graph \| database it takes a small amount of time to find the first \| bit of info (let's say phone number since it's easily \| sorted), and then every other bit of linked data is \| essentially free. \| eurasiantiger wrote: \| Effectively, moving from RDB to a graph database just \| shifts relationship resolution from query time to \| insert/update time. Sometimes this can be an advantage. I'd \| even wager usually. \| taubek wrote: \| For some basics of graph modeling take a look at \| https://memgraph.com/docs/memgraph/tutorials/graph-modeling. \| \| Some differences between SQL and graph databases are covered \| at https://memgraph.com/blog/graph-database-vs-relational- \| datab.... \| cryptonector wrote: \| > You need to be careful about infinite cycles and things like \| that. \| \| Not really, just never forget this: always use `UNION`, _and \| never_ `UNION ALL`, in recursive queries! \| revskill wrote: \| PostgreSQL is my love. Thanks for all developers. \| \| Do not blame the tool if you feel it doesn't fit your niche. \| Blame yourself first, and try to use it the right way. \| amir734jj wrote: \| Good example. Now do it with undirected graph. How will the query \| look like? \| \| So, is this basically running transitive closure on the database \| during query? it would be expensive. \| pphysch wrote: \| Small nitpick but I wish the author just made `data` a JSONB \| field rather than VARCHAR. That really shows the power of \| Postgres, being able to operate as a "NoSQL graph DB" with little \| to no hacking. \| ganderzz wrote: \| Hey, author here! Thanks for the feedback! I went back and \| forth with having `data` being a JSONB field or VARCHAR, but \| ended up with VARCHAR to show that `nodes` don't have to be \| anything crazy. Really `nodes` is just a standard table you'd \| see in any old SQL database, and what makes it a "graph" is the \| `edges`/relationship table. \| bottlepalm wrote: \| I can see how this possibly won't scale compared to dedicated \| graph dbs. \| \| I'm interested in what the fundamental difference is in those \| graph dbs because rdbms seem so close I don't see why it isn't \| just provided as an extra layer on top. \| \| As described in the article, any graph can already be built with \| an rbms, so why for example does neo4j need to be standalone and \| can't possibly be a layer on top of postgres to leverage existing \| relationships? Any good articles on that? \| emodendroket wrote: \| I don't know off the top of my head but I'd assume that one \| could adjust the storage strategy and other performance \| tweaking to optimize for the expected use case. \| samwillis wrote: \| If you are looking to utilise this with Django, there is the \| brilliant Django CTE project that enables it and other CTE \| queries with the Django ORM: \| \| https://dimagi.github.io/django-cte/#recursive-common-table-... \| taubek wrote: \| I personally prefer native graph databases such as Memgraph[1]. \| Graph databases have it's advantages and disadvantages. Sometimes \| you, need RDBMS, sometimes you need NoSQL solution. So I pick my \| stack based on situation and issue that I need to solve. \| \| [1] https://memgraph.com \| rajman187 wrote: \| "Native" is a misleading or misused term in the context of \| graph databases because a graph cannot be mapped in a maximally \| consistent way with the underlying hardware (the von Neumann \| architecture represents data in a sequential manner and it is \| much faster to access it this way rather than randomly). \| \| There are generally two families in the graph database world: \| those which use underlying traditional tables of nodes and \| many-to-many edges; and index-free adjacency which just means \| each node in the graph knows the memory address of its \| connections (other side of the edges). \| \| Distributed graphs necessarily end up using the former because \| it's difficult if not impossible for a node to know the memory \| address of its connection when that crosses a physical \| boundary. So typically index-free adjacency graphs have a \| master-slave setup with multiple read replicas but a single one \| to write to. \| \| So with a "native graph" you don't rely on potentially \| expensive join operations to find neighbors of neighbors and \| can traverse complex paths easily. \| dtomicevic wrote: \| This is spot on. Though, in Memgraph's context, there is a \| combination of lock free techniques and raw pointer \| representation that can work exceptionally well for a large \| variety of use cases, and even better so compared to others \| on streaming / event driven / large volume of writes. Using \| raw pointers, there are also optimization opportunities when \| storing or rebalancing the graph to maximize the number of \| adjacent nodes in a cache line so benefiting from sequential \| representation too. \| mashroom wrote: \| Thanks for pointing me toward Memgraph. I didn't hear of it \| before and it appears to be a great alternative to neo4j. \| nilsbunger wrote: \| Sure, at its most basic a graph database is just a bunch of \| many2many relationships. \| \| The recursive query is cool, but how is the performance for graph \| queries compared to a DB optimized for this use case ? \| ellisv wrote: \| I haven't compared, but something like Kuzu is probably much \| more performant for a lot of use cases because it implements \| Worst-Case Optimal Joins. \| \| But I think Postgres is likely to get these improvements in a \| few years too. ___________________________________________________________________ (page generated 2023-03-31 23:00 UTC)