
[HN Gopher] Show HN: Cozo - new Graph DB with Datalog, embedded ...
___________________________________________________________________
Show HN: Cozo - new Graph DB with Datalog, embedded like SQLite

Hi HN, I have been building the Cozo database for the past half year, and now it is ready for public release.

My initial motivation is that I want a graph database. Lightweight and easy to use, like SQLite. Powerful and performant, like Postgres. I found none of the existing solutions good enough.

Deciding to roll my own, I needed to choose a query language. I am familiar with Cypher but consider it not much of an improvement over CTE in SQL (Cypher is sometimes notationally more convenient, but not more expressive). I like Gremlin but would prefer something more declarative. Experiments with Datomic and its clones convinced me that Datalog is the way to go.

Then I needed a data model. I find the property graph model (Neo4j, etc.) over-constraining, and the triple store model (Datomic, etc.) suffering from inherent performance problems. They also lack the most important property of the relational model: being an algebra. Non-algebraic models are not very composable: you may store data as property graphs or triples, but when you do a query, you always get back relations. So I decided to have relational algebra as the data model.

The end result, I now present to you. Let me know what you think, good or bad, and I'll do my best to address your feedback. This is the first time that I have used Rust in a significant project, and I love the experience!

Author : zh217
Score  : 278 points
Date   : 2022-11-08 12:25 UTC (10 hours ago)
	web link (github.com)
	w3m dump (github.com)
| canadiantim wrote:
| You mention how Cypher is not much of an improvement over CTE in SQL, I was wondering if you could expand on this point a bit if possible?
|
| Some part of me is considering using Apache AGE graph extension for postgres, but another part wonders whether it's worth it considering CTEs can do a lot very similarly.
|
| I'll definitely be following the progress for Cozo though, sounds great on the face of it. Definitely will have to consider potentially using Cozo as well. I wonder if it could make sense to use Postgres and Cozo together?
| zh217 wrote:
| Yes of course.
|
| Perhaps I should start by clarifying that I am talking about the range of queries the Cypher language can express, without any vendor-specific extensions, since my consideration was whether to use it as the query language for my own database. And Cypher is of course much more convenient to _type_ than SQL for expressing graph traversals - it was built for that.
|
| With that understanding, any Cypher pattern can be translated into a series of joins and projections in SQL, and any recursive query in Cypher can be translated into a recursive CTE. Theoretically, SQL with recursive CTE is not Turing complete (unless you also add in window functions in recursive CTE, which I don't think any of the Cypher databases currently provide), whereas Datalog with function symbols is. Practically, you can easily write a shortest path query in pure Datalog without recourse to built-in algorithms (an example is shown in README), and at least in Cozo it executes essentially as a variant of Dijkstra's algorithm. I'm not sure I can do that in Cypher. I don't think it is doable.
| samuell wrote:
| Does Cypher even support nested and/or recursive queries? I remember asking the Neo4j guys at a meetup about that many years ago, and they didn't even seem to understand the question.
| Might have changed since then of course.
|
| Otherwise the thing I have noticed with the datalog (as well as prolog) syntax, is you are able to build a vocabulary of re-usable queries, in a much more usable way than any of the solutions I've seen in SQL, or other similar languages.
|
| It thus allows you to raise your level of abstraction, by defining your definitions (or "classes" if you will) layer by layer with well crafted queries, that can be used for further refined classifying queries.
| zh217 wrote:
| Re Datalog syntax: yes, the "composability" is the main reason that I decided to adopt it as the query language. This is also the reason why we made storing query results back into the database very easy (no pre-declaration of "tables" necessary) so that intermediate results can be materialized in the database at will and be used by multiple subsequent queries.
| samuell wrote:
| Indeed, composability is the spot-on keyword here.
| [deleted]
| samuell wrote:
| How I have waited for this: A simple, accessible library for graph-like data with datalog (also in a statically compiled language, yay). Have even pondered using SWI-prolog for this kind of stuff, but it seems so much nicer to be able to use it embedded in more "normal" types of languages.
|
| Looking forward to playing with this!
|
| The main thing I will be wondering now is how it will scale to really large datasets. Any input on that?
| samuell wrote:
| For folks looking for documentation or getting-started examples, see:
|
| - The tutorial: https://nbviewer.org/github/cozodb/cozo-docs/blob/main/tutor...
|
| - The language documentation: https://cozodb.github.io/current/manual/
|
| - The pycozo library README for some examples on how to run this from inline python: https://github.com/cozodb/pycozo#readme
| zh217 wrote:
| Thanks for your interest in this!
|
| It currently uses RocksDB as the storage engine.
| If your server has enough resources, I believe it can store TBs of data with no problem.
|
| Running queries on datasets this big is a complicated story. Point lookups should be nearly instant, whereas running complicated graph algorithms on the whole dataset is (currently) out of the question, since all the rows a query touches must reside in memory. Also, the algorithmic complexity of some of the graph algorithms is too high for big data and there's nothing we can do about it. We aim to provide a smooth way for big data to be distilled layer by layer, but we are not there yet.
| samuell wrote:
| Many thanks for the detailed answer!
| mark_l_watson wrote:
| Thank you, this looks very useful. I will try the Python embedded mode when I have time.
|
| I especially like the Datalog query examples in your README file. I usually use RDF/RDFS and the SPARQL query language, with much less use of property graphs using Neo4J. I expect an easy ramp up learning your library.
|
| BTW, I read the discussion of your use of the AGPL license. For what it is worth, that license is fine with me. I usually release my open source projects using Apache 2, but when required libraries use GPL or AGPL, I simply use those licenses.
| dmitriid wrote:
| A nitpick for the README: consider converting examples from images to code blocks (you can even directly copy-paste them into the code blocks and they should retain their formatting)
|
| Otherwise: yes, please. I love the idea.
| mola wrote:
| Graph query over relational data, brilliant. I need this yesterday.
| OtomotO wrote:
| Awesome work, congrats.
|
| For someone who never did anything datalog I didn't see an example in the repo and the docs (docs.rs) could need some more content.
|
| I hope to see a 1.0 at some point and performance that can compete with SQLite.
|
| Would love to have an alternative, especially as I have a few pet projects that have graph data (well, in the end the whole universe can be modelled as a graph ;))
| zh217 wrote:
| I'm very happy that you like it!
|
| The "teasers" section in the repo README contains a few examples. Or you could look at the tutorial (https://nbviewer.org/github/cozodb/cozo-docs/blob/main/tutor...), which contains all sorts of examples.
|
| The Rust documentation on docs.rs could certainly be improved, will do that later!
| OtomotO wrote:
| Ah, yes, mea culpa. Was browsing on the phone and did miss that link indeed.
|
| Is it also okay to store big data that would otherwise go into another storage like e.g. blog-posts?
|
| I mean the content could also be modeled as a leaf-node and not be part of the db itself. (not sure if that would be abusing the kv storage)
| zh217 wrote:
| In short: yes, but not right now. See this issue: https://github.com/cozodb/cozo/issues/2. Also in this case you are not really using it as an embedded database anymore, which is our original motivation. We currently also provide a "cozoserver", but it is pretty primitive at the moment. "Big data" capabilities, when they arrive in Cozo, will probably go into the server instead of the embedded binaries.
| OtomotO wrote:
| Hm, why wouldn't that be embedded?
|
| How do you define embedded?
|
| One of my applications is a simple "blog-like" webservice where you can either use a SQLite db or Postgres.
|
| Personally I often prefer SQLite because it doesn't need a thousand configurations and I can just migrate all the content by copying a file.
| zh217 wrote:
| My use of "embedded" means that the whole database runs in the same process as your application. This is how SQLite works. Your application doesn't "connect" to an SQLite database in the usual sense.
| Your application simply contains SQLite as part of itself. Contrast this with Postgres, where you first need to start a Postgres server and then have your application talk to it.
| OtomotO wrote:
| Exactly.
|
| I was just curious because of your comment:
|
| > Also in this case you are not really using it as an embedded database anymore, which is our original motivation
|
| As by your (and my) definition, I am indeed using it as an embedded database. It's running inside the process and storing (and persisting) blog-posts.
| Serow225 wrote:
| I'm excited to get some more Rust docs!
|
| Even just a pointer to serde ::from_value(value).unwrap(), and ::deserialize(value), would be helpful to get people pointed in the right direction.
|
| Looks like a super cool project, congrats!
| ithrow wrote:
| _you may store data as property graphs or triples, but when you do a query, you always get back relations_
|
| Can you elaborate on this? In Datomic you can get back hierarchical data
| ekidd wrote:
| This is a really impressive piece of work! Congratulations!
|
| I note that it appears to be a library, but it's licensed under the Affero GPL. I believe this means that if I link your library into a program, and if I then allow users to interact with that combined program in any way over a network, then I have to make it possible for users to download the source code to my entire program. Is that your goal here? Were you thinking of some kind of commercial licensing model for people writing server-side apps that use your library?
|
| (I'm curious because I've been deciding whether or not to roll my own toy Datalog for a permissively-licensed open source Rust project.)
| zh217 wrote:
| No, my understanding is that if you don't make any changes to the Cozo code, you don't need to release anything to the public.
| If you do, and you cannot release your non-Cozo code, then you must dynamically link to the library (and release your changes to the Cozo code). The Python, NodeJS and Java/Clojure libraries all use dynamic linking.
|
| There is no plan for any commercial license - this is a personal project at the moment. My hope is for this project to grow into a true FOSS database with wide contributions and no company controlling it. If a community forms and after I understand the consequences a little bit more, the license may change if the community decides that it is better for the long-term good of the project. For the moment though, it is staying AGPL.
| Cu3PO42 wrote:
| Let me preface by saying that this seems like a great piece of software and it is absolutely within your right to license it as whatever you would like, no matter what any of the commenters here think.
|
| However, I don't believe your understanding of AGPL is accurate.
|
| > No, my understanding is that if you don't make any changes to the Cozo code, you don't need to release anything to the public. If you do, and you cannot release your non-Cozo code, then you must dynamically link to the library (and release your changes to the Cozo code). The Python, NodeJS and Java/Clojure libraries all use dynamic linking.
|
| This sounds like you're thinking of the LGPL, not AGPL. The LGPL is less strict than the GPL because the exception you describe above applies; the AGPL, on the other hand, is more strict. Essentially, if you use any AGPL code to provide a service to users then you must also make the source code available, even if the software itself is never delivered to users.
|
| The intention here is that you can't get around GPL by hiding any use of the GPL code behind a server, so it makes perfect sense to use it for a database. But I don't think it does what you want.
|
| Whichever way you decide to go, be it AGPL, LGPL or something else, I encourage you to make a choice before accepting any outside contributions. As soon as you have code from other authors without a CLA you will need to obtain their permission to change the license (with some exceptions).
|
| (Disclaimer: I'm not a lawyer, just interested in licenses.)
| zh217 wrote:
| It seems that I really did misunderstand the differences. It is now under LGPL. The repo still requires CLA for contribution for the moment until I am really sure.
| zh217 wrote:
| Thank you for your perspective.
|
| Maybe I was confused about the case of using an executable vs linking against a library. Let me double-check with a few friends who understand copyright laws better than me. If everything checks out, the next release will be under LGPL.
|
| About CLA: at the previous suggestion of a friend, the repo is currently locked with a CLA requirement (even though nobody outside has contributed yet). This will be lifted once the situation becomes clearer.
| [deleted]
| georgewfraser wrote:
| Licensing under AGPL will make it hard for any startup to use Cozo. Lawyers always ask about AGPL in venture financing diligence and it is considered a red flag. You can argue that they are wrong, the linking exception and so on, but you're basically shouting into the wind.
| ekidd wrote:
| > If a community forms and after I understand the consequences a little bit more, the license may change if the community decides that it is better for the long-term good of the project. For the moment though, it is staying AGPL.
|
| Yes, I do want to be clear: I encourage you to use whatever license you like. You wrote the code! I was just curious, because it would also affect the license of any hypothetical software I wrote that used the library.
|
| Here's a _super oversimplified_ version of the main license types (I am not a lawyer):
|
| - Permissive: "Do whatever you want but don't sue me."
|
| - LGPL: "If you give this library to other people, you must 'share and share alike' the source and your changes to this library."
|
| - GPL: "If you use this code in your program, you must 'share and share alike' your entire program, but only if you give people copies of the program."
|
| - AGPL: "If you use this code in your program, you must 'share and share alike' your entire program with anyone who can interact with it over a network."
|
| The AGPL makes a ton of sense for an advanced database _server,_ because otherwise AWS may make their own version and run it on their servers as a paid service, without contributing back.
|
| But like I said, I'm simplifying way too much. Take a look at the FSF's license descriptions and/or talk to a lawyer. This shouldn't be stressful. Figure out what license supports the kind of users and community you want, pick it, and don't look back. :-)
|
| (I may end up writing a super-simple non-persistent Datalog at some point for an open source project. My needs are _much_ simpler than the things you support, anyways--I only ever need to run one particular query.)
| zh217 wrote:
| I realized my mistake, as I said in the other comments. The main repo is now under LGPL. I'll see what I'll do with the bindings. Writing code is so much better than dealing with licenses!
| ekidd wrote:
| Oh, cool!
|
| And yeah, licenses can be challenging and frustrating, especially the first time you release a major project.
|
| I am really super excited by the idea of embedded Datalog in Rust. I sometimes run into situations where I need something that fits in that awkward gap between SQL and Prolog. I want more expressiveness, better composability, and better graph support than SQL.
| But I also want finite-sized results that I can materialize in bounded time.
|
| There has been some very neat work with incrementally-updated Datalog in the Rust community. For example, I think Datafrog is really neat: https://github.com/frankmcsherry/blog/blob/master/posts/2018... But it's great to see more cool projects in this space, so thank you.
| kylebarron wrote:
| If I'm not mistaken that sounds more like LGPL than the AGPL?
| zh217 wrote:
| Maybe, and maybe I need to consult a lawyer someday to get the facts straight. To tell you the truth my head hurts when I attempt to understand what these licenses say. Regardless, I intend this project to be true FOSS, the "finer detail" of which FOSS license it uses may change.
| mijoharas wrote:
| My understanding is the same as kylebarron's[0] since you lack linking protections (which you would get under LGPL), so any work that includes cozo would be a "derived work" under the (A)GPL. Interestingly there doesn't seem to be an affero LGPL license[1], which could be what you might want here.
|
| Otherwise, simplest solution provided you want a copyleft license would be to use the LGPL I think.
|
| NOTE: not a lawyer.
|
| [0] https://softwareengineering.stackexchange.com/questions/1078...
|
| [1] https://redmonk.com/dberkholz/2012/09/07/opening-the-infrast... (old link, but I couldn't find anything since then describing this kind of license?)
| wizzwizz4 wrote:
| We kinda do have it; it's just mostly useless, given the linking clause. (Not entirely useless, though, as that article sets out.)
|
| GPL and AGPL have the same layout, so you can just take the LGPL, and replace all references to 'GPL' and 'GNU General Public License' with 'AGPL' and 'GNU Affero General Public License'. Of course, you couldn't call that license 'GNU ALGPL' or 'GNU LAGPL'; you'd have to come up with your own name.
| (Disclaimer: I'm not a lawyer, and I haven't checked this as thoroughly as I would if I were going to use this for my own software.)
|
| Maybe it's worth bothering Bradley M. Kuhn (http://ebb.org/bkuhn/) again and seeing what the current status of a Lesser AGPL is?
| _frkl wrote:
| That's a fair enough stance. I'd recommend not taking any outside contributions until you are sure about the license, since it'll make it much harder to change the license if you do. Or maybe require all outside contributions to be licensed very permissively, like using the BSD license. Or you could use a CLA, but that's not something I'd recommend. Either way, licensing is hard :(. I can empathise with the head hurting.... Oh, also, check out https://tldrlegal.com/ .
| kapilvt wrote:
| It's also odd then re the python bindings being MIT, as the AGPL will convey throughout any aggregation or library usage, as would GPL. The primary delta for GPL vs AGPL is the intent of the latter for network-offered services, which in the context of an embedded library/db is odd. Rightly or wrongly, many orgs will refuse to allow usage of GPL/AGPL software due to licensing concerns around the effects on the rest of their IP. duckdb (embedded analytics sql) uses MIT, etc. So in terms of creating a "true foss" project, i.e. a community of users and contributors, it's definitely worth considering a licensing change imho, but of course dealer's choice.
| zh217 wrote:
| OP here. Nothing about the license is final yet since there are no outside contributors. I just changed the main repo to LGPL, not because what I believed in changed, but because it seems that I really misunderstood the licenses.
| dangoor wrote:
| I am not a lawyer, but I work in an open source programs office and am currently working specifically on open source license compliance.
|
| Beyond what the sibling comments have said about LGPL sounding more like what you're going for, I'll just note that if you'd like broad adoption of this while still ensuring that changes to your code remain open, you might also want to consider the Mozilla Public License.
|
| What I understand of MPL and LGPL is that MPL is better for instances where dynamic linking isn't possible. The MPL basically says that any changes _to the files you created_ must be available under the MPL, preserving their public availability.
|
| That said, most organizations are fine with the LGPL, but it just gets gnarly if there are instances where you really want to statically link something but you still fully want to support the original library's openness.
| pie_flavor wrote:
| AGPL is a variant of the GPL, not the LGPL. Meaning that dynamic linking still constitutes (according to them) a derivative work, meaning that even programs that dynamically link against it must themselves be AGPL in their entirety. Dynamic linking is also meaningfully complicated to do in Rust, and this licensure of the crates.io crate will be a footgun for anyone not using cargo-deny.
|
| I think this is a very cool project, but its use of GPL essentially ensures I'm not going to use it for anything. If you're planning on reducing it to LGPL, I'm not sure what the GPL is getting you over going with the Rust standard license set of MIT + Apache 2.0.
| jitl wrote:
| This is amazing!
|
| Have you looked at differential-datalog? It's rust-based, maintained by VMWare, and has a very rich, well-typed Datalog language. differential-datalog is in-memory only right now, but could be ideal to integrate your graph as a datastore or disk spill cache.
|
| https://github.com/vmware/differential-datalog
| abc3354 wrote:
| This looks nice!
|
| Datascript seems to be another Datalog engine (in memory only)
|
| https://github.com/tonsky/datascript
| fsiefken wrote:
| there are a few more, including ones supporting on disk databases https://en.wikipedia.org/wiki/Datalog#Systems_implementing_D...
| billylindeman wrote:
| This is amazing. I can't wait to play with it
| typon wrote:
| I have been meaning to do this exact project for 5 years at least. Congrats on making it happen - looking forward to using it
| stevesimmons wrote:
| This does look very nice!
|
| Especially (from my point of view) having the Python interface.
|
| What are the max practical graph sizes you anticipate?
| zh217 wrote:
| For the moment: you can have as much data as you want on disk as long as the RocksDB storage engine can handle it, which I believe is quite large. For any single query though, you want all the data you touch to fit in memory. The good news is that Rust is very efficient in using memory. This will be improved in future versions.
|
| For the built-in graph algorithms, you are also limited by the algorithmic complexity, which for some of them is quite high (notably betweenness centrality). There is nothing the database can do to help in this case, though we may add some approximate algorithms with lower complexities later.
| pgt wrote:
| Good job! How to transact? The examples only show queries.
| zh217 wrote:
| Transactions are described in the manual: https://cozodb.github.io/current/manual/stored.html#chaining....
|
| Sorry about the docs being all over the place at the moment! My only excuse is that Cozo is very young. The documentation (and the implementation) still needs a lot of work!
| dwenzek wrote:
| Really nice!
|
| I like the design choices of Datalog for the query language and Relations for the data model.
| This contrasts with the typical choices made for graph databases where the word graph seems to make _links_ a mandatory query and representation tool.
| philzook wrote:
| Very cool! I love the sqlite install everywhere model.
|
| Could you compare use case with Souffle? https://souffle-lang.github.io/
|
| I'd suggest putting the link to the docs more prominently on the github page
|
| Is the "traditional" datalog `path(x,z) :- edge(x,y), path(y,z).` syntax not pleasant to the modern eye? I've grown to rather like it. Or is there something that syntax can't do?
|
| I've been building a Datalog shim layer in python to bridge across a couple different datalog systems https://github.com/philzook58/snakelog (including a datalog built on top of the python sqlite bindings), so I should look into including yours
| zh217 wrote:
| I find nothing wrong with the classical syntax, but there is a very practical, even stupid reason why the syntax is the way it is now. As you can see from the tutorial (https://nbviewer.org/github/cozodb/cozo-docs/blob/main/tutor...), you can run Cozo in Jupyter notebooks and mix it with Python code. This is the main way that I myself interact with Cozo. Since I don't fancy writing an unmaintainable mess of Jupyter frontend code that may become obsolete in a few years, CozoScript had better look enough like Python so as not to completely baffle the Jupyter syntax highlighter. That's why the syntax for comments is `#`, not `//`. That's also why the syntax for stored relation is `stored`, not `&stored` or `%stored`.
|
| This was a hack from the beginning, but over time I grew to like the syntax quite a bit. And hopefully, by being superficially similar to Python or JS, it causes less confusion for new users :)
| philzook wrote:
| Ah, that's very interesting. Thank you.
| `s.add(path(x,z) <= edge(x,y) & path(y,z))` is what I chose as python syntax, but it is clunkier.
| samuell wrote:
| Interesting! I'm thinking ... perhaps a small syntax comparison for prolog/classical datalog vs cozo, would help people used to the classical syntax quickly get started.
| packetlost wrote:
| This is very similar to the goals of a project I've been working on, though I've been focusing on the raw storage format (literally a drop-in replacement for RocksDB, so this could be interesting). I think datalog databases are _far_ underrated.
___________________________________________________________________
(page generated 2022-11-08 23:00 UTC)
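For readers new to Datalog, here is a minimal sketch of the recursive rule philzook quotes in the thread, `path(x,z) :- edge(x,y), path(y,z).`, evaluated naively bottom-up in Python, next to a recursive CTE of the kind zh217 says such recursive queries can be translated into. The base rule `path(x,y) :- edge(x,y).`, the function names, the `edge(src, dst)` table layout, and the sample edges are all assumptions made for illustration. This is not Cozo's syntax and not how Cozo evaluates queries.

```python
import sqlite3


def transitive_closure(edges):
    """Naive bottom-up (fixpoint) evaluation of the two rules:

        path(x, y) :- edge(x, y).            (assumed base rule)
        path(x, z) :- edge(x, y), path(y, z).
    """
    edges = set(edges)
    path = set(edges)  # apply the base rule once
    while True:
        # Apply the recursive rule to everything derived so far.
        derived = {(x, z) for (x, y) in edges for (y2, z) in path if y == y2}
        new = derived - path
        if not new:  # fixpoint reached: no new facts can be derived
            return path
        path |= new


def transitive_closure_sql(edges):
    """The same query written as a recursive CTE in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edge (src TEXT, dst TEXT)")
    conn.executemany("INSERT INTO edge VALUES (?, ?)", list(edges))
    rows = conn.execute(
        """
        WITH RECURSIVE path(src, dst) AS (
            SELECT src, dst FROM edge
            UNION  -- UNION (not UNION ALL) drops duplicates, so cycles terminate
            SELECT edge.src, path.dst
            FROM edge JOIN path ON edge.dst = path.src
        )
        SELECT src, dst FROM path
        """
    ).fetchall()
    conn.close()
    return set(rows)


# Hypothetical sample graph: a -> b -> c -> d
edges = [("a", "b"), ("b", "c"), ("c", "d")]
closure = transitive_closure(edges)
closure_sql = transitive_closure_sql(edges)
```

Both functions compute the same set of reachable pairs. The Datalog version is two short rules, while the SQL version needs the CTE scaffolding, which is the notational (rather than expressive) difference discussed in the thread.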