|
| canadiantim wrote:
| You mention how Cypher is not much of an improvement over CTEs in
| SQL. I was wondering if you could expand on this point a bit, if
| possible?
|
| Some part of me is considering using the Apache AGE graph extension
| for Postgres, but another part wonders whether it's worth it,
| considering CTEs can do a lot very similarly.
|
| I'll definitely be following the progress for Cozo though, sounds
| great on the face of it. Definitely will have to consider
| potentially using Cozo as well. I wonder if it could make sense
| to use Postgres and Cozo together?
| zh217 wrote:
| Yes of course.
|
| Perhaps I should start by clarifying that I am talking about
| the range of queries the Cypher language can express, without
| any vendor-specific extensions, since my consideration was
| whether to use it as the query language for my own database.
| And Cypher is of course much more convenient to _type_ than SQL
| for expressing graph traversals - it was built for that.
|
| With that understanding, any Cypher pattern can be translated
| into a series of joins and projections in SQL, and any
| recursive query in Cypher can be translated into a recursive
| CTE. Theoretically, SQL with recursive CTEs is not Turing
| complete (unless you also add window functions to recursive
| CTEs, which I don't think any of the Cypher databases currently
| provide), whereas Datalog with function symbols is. Practically,
| you can easily write a shortest path query in pure Datalog
| without recourse to built-in algorithms (an example is shown in
| the README), and at least in Cozo it executes essentially as a
| variant of Dijkstra's algorithm. I'm not sure I can do that in
| Cypher. I don't think it is doable.
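| A minimal sketch of the recursive-CTE side of that translation,
| assuming SQLite through Python's built-in sqlite3 module. The
| table, data, and `reachable` function are illustrative, not from
| Cozo or any Cypher database, and the Cypher in the docstring is
| only a rough equivalent.

```python
import sqlite3


def reachable(conn, start):
    """Nodes reachable from `start` by following one or more edges.

    Roughly the Cypher pattern:
        MATCH (s {id: $start})-[*]->(n) RETURN DISTINCT n.id
    """
    rows = conn.execute(
        """
        WITH RECURSIVE reach(node) AS (
            SELECT dst FROM edge WHERE src = ?   -- first hop
            UNION                                -- UNION dedups, so cycles terminate
            SELECT e.dst FROM edge AS e JOIN reach AS r ON e.src = r.node
        )
        SELECT node FROM reach ORDER BY node
        """,
        (start,),
    ).fetchall()
    return [node for (node,) in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edge (src TEXT, dst TEXT)")
conn.executemany(
    "INSERT INTO edge VALUES (?, ?)",
    [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("x", "y")],
)
print(reachable(conn, "a"))  # ['a', 'b', 'c', 'd']
```

| Using UNION rather than UNION ALL is what keeps the cycle
| a -> b -> c -> a from recursing forever.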
| samuell wrote:
| Does Cypher even support nested and/or recursive queries? I
| remember asking the Neo4j guys at a meetup about that many
| years ago, and they didn't even seem to understand the
| question. Might have changed since then of course.
|
| Otherwise, the thing I have noticed with Datalog (as well
| as Prolog) syntax is that you are able to build a vocabulary of
| reusable queries in a much more usable way than any of the
| solutions I've seen in SQL or other similar languages.
|
| It thus allows you to raise your level of abstraction by
| defining your concepts (or "classes", if you will) layer by
| layer with well-crafted queries that can be used for further,
| more refined classifying queries.
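| A toy illustration of that layering, using Python sets as
| relations (a hypothetical example, not Datalog or Cozo itself):
| each derived relation is defined once and then reused by the
| next, with the equivalent classical Datalog rule in a comment.

```python
# Base facts: parent(p, c) means p is a parent of c.
parent = {("ann", "bob"), ("ann", "cid"), ("bob", "dan")}


def sibling(parent):
    # sibling(x, y) :- parent(p, x), parent(p, y), x != y.
    return {(x, y) for (p, x) in parent for (q, y) in parent if p == q and x != y}


def aunt_or_uncle(parent):
    # aunt_or_uncle(a, c) :- sibling(a, p), parent(p, c).
    # Reuses `sibling` rather than restating its join.
    return {(a, c) for (a, p) in sibling(parent) for (q, c) in parent if p == q}


print(sorted(sibling(parent)))  # [('bob', 'cid'), ('cid', 'bob')]
print(aunt_or_uncle(parent))    # {('cid', 'dan')}
```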
| zh217 wrote:
| Re Datalog syntax: yes, the "composability" is the main
| reason that I decided to adopt it as the query language.
| This is also the reason why we made storing query results
| back into the database very easy (no pre-declaration of
| "tables" necessary) so that intermediate results can be
| materialized in the database at will and be used by
| multiple subsequent queries.
| samuell wrote:
| Indeed, composability is the spot-on keyword here.
| samuell wrote:
| How I have waited for this: A simple, accessible library for
| graph-like data with datalog (also in a statically compiled
| language, yay). I have even pondered using SWI-Prolog for this kind
| of stuff, but it seems so much nicer to be able to use it
| embedded in more "normal" types of languages.
|
| Looking forward to playing with this!
|
| The main thing I will be wondering now is how it will scale to
| really large datasets. Any input on that?
| samuell wrote:
| For folks looking for documentation or getting-started
| examples, see:
|
| - The tutorial:
| https://nbviewer.org/github/cozodb/cozo-docs/blob/main/tutor...
|
| - The language documentation:
| https://cozodb.github.io/current/manual/
|
| - The pycozo library README for some examples on how to run
| this from inline python:
| https://github.com/cozodb/pycozo#readme
| zh217 wrote:
| Thanks for your interest in this!
|
| It currently uses RocksDB as the storage engine. If your server
| has enough resources, I believe it can store TBs of data with
| no problem.
|
| Running queries on datasets this big is a complicated story.
| Point lookups should be nearly instant, whereas running
| complicated graph algorithms on the whole dataset is
| (currently) out of the question, since all the rows a query
| touches must reside in memory. Also, the algorithmic complexity
| of some of the graph algorithms is too high for big data and
| there's nothing we can do about it. We aim to provide a smooth
| way for big data to be distilled layer by layer, but we are not
| there yet.
| samuell wrote:
| Many thanks for the detailed answer!
| mark_l_watson wrote:
| Thank you, this looks very useful. I will try the Python embedded
| mode when I have time.
|
| I especially like the Datalog query examples in your README
| file. I usually use RDF/RDFS and the SPARQL query language, with
| much less use of property graphs in Neo4j. I expect an easy
| ramp-up learning your library.
|
| BTW, I read the discussion of your use of the AGPL license. For
| what it is worth, that license is fine with me. I usually release
| my open source projects using Apache 2, but when required
| libraries use GPL or AGPL, I simply use those licenses.
| dmitriid wrote:
| A nitpick for the README: consider converting examples from
| images to code blocks (you can even directly copy-paste them into
| the code blocks and they should retain their formatting).
|
| Otherwise: yes, please. I love the idea.
| mola wrote:
| Graph query over relational data, brilliant. I need this
| yesterday.
| OtomotO wrote:
| Awesome work, congrats.
|
| As someone who has never done anything with Datalog, I didn't see
| an example in the repo, and the docs (docs.rs) could use some
| more content.
|
| I hope to see a 1.0 at some point and performance that can
| compete with SQLite.
|
| Would love to have an alternative, especially as I have a few pet
| projects that have graph data (well, in the end the whole
| universe can be modelled as a graph ;))
| zh217 wrote:
| I'm very happy that you like it!
|
| The "teasers" section in the repo README contains a few
| examples. Or you could look at the tutorial
| (https://nbviewer.org/github/cozodb/cozo-docs/blob/main/tutor...),
| which contains all sorts of examples.
|
| The Rust documentation on docs.rs could certainly be improved,
| will do that later!
| OtomotO wrote:
| Ah, yes, mea culpa. Was browsing on the phone and did miss
| that link indeed.
|
| Is it also okay to store big data that would otherwise go
| into other storage, e.g. blog posts?
|
| I mean the content could also be modeled as a leaf-node and
| not be part of the db itself. (not sure if that would be
| abusing the kv storage)
| zh217 wrote:
| In short: yes, but not right now. See this issue:
| https://github.com/cozodb/cozo/issues/2. Also in this case
| you are not really using it as an embedded database
| anymore, which is our original motivation. We currently
| also provide a "cozoserver", but it is pretty primitive at
| the moment. "Big data" capabilities, when they arrive in
| Cozo, will probably go into the server instead of the
| embedded binaries.
| OtomotO wrote:
| Hm, why wouldn't that be embedded?
|
| How do you define embedded?
|
| One of my applications is a simple "blog-like" web service
| where you can use either an SQLite db or Postgres.
|
| Personally I often prefer SQLite because it doesn't need
| a thousand configurations and I can just migrate all the
| content with copying a file.
| zh217 wrote:
| My use of "embedded" means that the whole database runs
| in the same process as your application. This is how
| SQLite works. Your application doesn't "connect" to an
| SQLite database in the usual sense. Your application
| simply contains SQLite as part of itself. Contrast this
| with Postgres, where you first need to start a Postgres
| server and then have your application talk to it.
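| To make "embedded" concrete, a small sketch using Python's
| built-in sqlite3 module (the file and table names are made up):
| there is no server to start, `connect()` just opens a file, and
| the engine runs inside the Python process. That is also why
| copying that one file moves all the content.

```python
import os
import shutil
import sqlite3
import tempfile


def copy_and_read(tmpdir):
    src = os.path.join(tmpdir, "blog.db")
    conn = sqlite3.connect(src)  # opens/creates the file; no server involved
    with conn:  # commits the transaction on success
        conn.execute("CREATE TABLE post (title TEXT, body TEXT)")
        conn.execute("INSERT INTO post VALUES (?, ?)", ("Hello", "First post"))
    conn.close()

    # "Migrating" the whole database is just copying one file.
    dst = os.path.join(tmpdir, "blog-copy.db")
    shutil.copyfile(src, dst)
    copy = sqlite3.connect(dst)
    titles = [title for (title,) in copy.execute("SELECT title FROM post")]
    copy.close()
    return titles


with tempfile.TemporaryDirectory() as d:
    print(copy_and_read(d))  # ['Hello']
```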
| OtomotO wrote:
| Exactly.
|
| I was just curious because of your comment:
|
| > Also in this case you are not really using it as an
| embedded database anymore, which is our original
| motivation
|
| By your (and my) definition, I am indeed using it as an
| embedded database. It's running inside the process and
| storing (and persisting) blog posts.
| Serow225 wrote:
| I'm excited to get some more Rust docs!
|
| Even just a pointer to serde ::from_value(value).unwrap(),
| and ::deserialize(value), would be
| helpful to get people pointed in the right direction.
|
| Looks like a super cool project, congrats!
| ithrow wrote:
| _you may store data as property graphs or triples, but when you
| do a query, you always get back relations_
|
| Can you elaborate on this? In Datomic you can get back
| hierarchical data.
| ekidd wrote:
| This is a really impressive piece of work! Congratulations!
|
| I note that it appears to be a library, but it's licensed under
| the Affero GPL. I believe this means that if I link your library
| into a program, and if I then allow users to interact with that
| combined program in any way over a network, then I have to make
| it possible for users to download the source code to my entire
| program. Is that your goal here? Were you thinking of some kind
| of commercial licensing model for people writing server-side apps
| that use your library?
|
| (I'm curious because I've been deciding whether or not to roll my
| own toy Datalog for a permissively-licensed open source Rust
| project.)
| zh217 wrote:
| No, my understanding is that if you don't make any changes to
| the Cozo code, you don't need to release anything to the
| public. If you do, and you cannot release your non-Cozo code,
| then you must dynamically link to the library (and release your
| changes to the Cozo code). The Python, NodeJS and Java/Clojure
| libraries all use dynamic linking.
|
| There is no plan for any commercial license - this is a
| personal project at the moment. My hope is for this project to
| grow into a true FOSS database with wide contributions and no
| company controlling it. If a community forms and after I
| understand the consequences a little bit more, the license may
| change if the community decides that it is better for the
| long-term good of the project. For the moment though, it is staying
| AGPL.
| Cu3PO42 wrote:
| Let me preface by saying that this seems like a great piece
| of software and it is absolutely within your right to license
| it as whatever you would like, no matter what any of the
| commenters here think.
|
| However, I don't believe your understanding of AGPL is
| accurate.
|
| > No, my understanding is that if you don't make any changes
| to the Cozo code, you don't need to release anything to the
| public. If you do, and you cannot release your non-Cozo code,
| then you must dynamically link to the library (and release
| your changes to the Cozo code). The Python, NodeJS and
| Java/Clojure libraries all use dynamic linking.
|
| This sounds like you're thinking of the LGPL, not AGPL.
| LGPL is less strict than GPL because the exception you
| describe above applies. AGPL, on the other hand, is more
| strict. Essentially, if you use any AGPL code to provide a
| service to users then you must also make the source code
| available, even if the software itself is never delivered to
| users.
|
| The intention here is that you can't get around GPL by hiding
| any use of the GPL code behind a server, so it makes perfect
| sense to use it for a database. But I don't think it does
| what you want.
|
| Whichever way you decide to go, be it AGPL, LGPL or something
| else, I encourage you to make a choice before accepting any
| outside contributions. As soon as you have code from other
| authors without a CLA you will need to obtain their
| permission to change the license (with some exceptions).
|
| (Disclaimer: I'm not a lawyer, just interested in licenses.)
| zh217 wrote:
| It seems that I really did misunderstand the differences.
| It is now under LGPL. The repo still requires CLA for
| contribution for the moment until I am really sure.
| zh217 wrote:
| Thank you for your perspective.
|
| Maybe I was confused about the case of using an executable
| vs linking against a library. Let me double-check with a
| few friends who understand copyright laws better than me.
| If everything checks out, the next release will be under
| LGPL.
|
| About the CLA: at the previous suggestion of a friend, the repo
| currently requires a CLA (even though nobody outside has
| contributed yet). This will be lifted once the situation
| becomes clearer.
| georgewfraser wrote:
| Licensing under AGPL will make it hard for any startup to use
| Cozo. Lawyers always ask about AGPL in venture financing
| diligence and it is considered a red flag. You can argue that
| they are wrong, the linking exception and so on, but you're
| basically shouting into the wind.
| ekidd wrote:
| > If a community forms and after I understand the
| consequences a little bit more, the license may change if the
| community decides that it is better for the long-term good of
| the project. For the moment though, it is staying AGPL.
|
| Yes, I do want to be clear: I encourage you to use whatever
| license you like. You wrote the code! I was just curious,
| because it would also affect the license of any hypothetical
| software I wrote that used the library.
|
| Here's a _super oversimplified_ version of the main license
| types (I am not a lawyer):
|
| - Permissive: "Do whatever you want but don't sue me."
|
| - LGPL: "If you give this library to other people, you must
| 'share and share alike' the source and your changes to this
| library."
|
| - GPL: "If you use this code in your program, you must 'share
| and share alike' your entire program, but only if you give
| people copies of the program."
|
| - AGPL: "If you use this code in your program, you must
| 'share and share alike' your entire program with anyone who
| can interact with it over a network."
|
| The AGPL makes a ton of sense for an advanced database
| _server,_ because otherwise AWS may make their own version
| and run it on their servers as a paid service, without
| contributing back.
|
| But like I said, I'm simplifying way too much. Take a look at
| the FSF's license descriptions and/or talk to a lawyer. This
| shouldn't be stressful. Figure out what license supports the
| kind of users and community you want, pick it, and don't look
| back. :-)
|
| (I may end up writing a super-simple non-persistent Datalog
| at some point for an open source project. My needs are _much_
| simpler than the things you support, anyways--I only ever
| need to run one particular query.)
| zh217 wrote:
| I realized my mistake, as I said in the other comments. The
| main repo is now under LGPL. I'll see what I'll do with the
| bindings. Writing code is so much better than dealing with
| licenses!
| ekidd wrote:
| Oh, cool!
|
| And yeah, licenses can be challenging and frustrating,
| especially the first time you release a major project.
|
| I am really super excited by the idea of embedded Datalog
| in Rust. I sometimes run into situations where I need
| something that fits in that awkward gap between SQL and
| Prolog. I want more expressiveness, better composability,
| and better graph support than SQL. But I also want
| finite-sized results that I can materialize in bounded
| time.
|
| There has been some very neat work with incrementally-updated
| Datalog in the Rust community. For example, I think Datafrog
| is really neat:
| https://github.com/frankmcsherry/blog/blob/master/posts/2018...
| But it's great to see more cool projects in this space, so
| thank you.
| kylebarron wrote:
| If I'm not mistaken that sounds more like LGPL than the AGPL?
| zh217 wrote:
| Maybe, and maybe I need to consult a lawyer someday to get
| the facts straight. To tell you the truth my head hurts
| when I attempt to understand what these licenses say.
| Regardless, I intend this project to be true FOSS; the
| "finer detail" of which FOSS license it uses may change.
| mijoharas wrote:
| My understanding is the same as kylebarron's[0] since you
| lack linking protections (which you would get under
| LGPL), so any work that includes cozo would be a "derived
| work" under the (A)GPL. Interestingly, there doesn't seem
| to be an Affero LGPL license[1], which could be what you
| want here.
|
| Otherwise, the simplest solution, provided you want a
| copyleft license, would be to use the LGPL, I think.
|
| NOTE: not a lawyer.
|
| [0] https://softwareengineering.stackexchange.com/questions/1078...
|
| [1] https://redmonk.com/dberkholz/2012/09/07/opening-the-infrast...
| (old link, but I couldn't find anything since then describing
| this kind of license?)
| wizzwizz4 wrote:
| We kinda do have it; it's just mostly useless, given the
| linking clause. (Not entirely useless, though, as that
| article sets out.)
|
| GPL and AGPL have the same layout, so you can just take
| the LGPL, and replace all references to 'GPL' and 'GNU
| General Public License' with 'AGPL' and 'GNU Affero
| General Public License'. Of course, you couldn't call
| that license 'GNU ALGPL' or 'GNU LAGPL'; you'd have to
| come up with your own name. (Disclaimer: I'm not a
| lawyer, and I haven't checked this as thoroughly as I
| would if I were going to use this for my own software.)
|
| Maybe it's worth bothering Bradley M. Kuhn
| (http://ebb.org/bkuhn/) again and seeing what the current
| status of a Lesser AGPL is?
| _frkl wrote:
| That's a fair enough stance. I'd recommend not taking any
| outside contributions until you are sure about the
| license, since it'll make it much harder to change the
| license if you do. Or maybe require all outside
| contributions to be licensed very permissively, like
| using the BSD license. Or you could use a CLA, but that's
| not something I'd recommend. Either way, licensing is
| hard :(. I can empathise with the head hurting... Oh,
| also, check out https://tldrlegal.com/.
| kapilvt wrote:
| It's also odd, then, that the Python bindings are MIT, as
| the AGPL will convey throughout any aggregation or
| library usage, as would the GPL. The primary delta for GPL vs
| AGPL is the latter's intent for network-offered
| services, which in the context of an embedded library/db
| is odd. Rightly or wrongly, many orgs will refuse to allow
| usage of GPL/AGPL software due to licensing concerns
| around the effects on the rest of their IP. DuckDB
| (embedded analytics SQL) uses MIT, etc. So in terms of
| creating a "true FOSS" project, i.e. a community of users
| and contributors, it's definitely worth considering a
| licensing change imho, but of course it's dealer's choice.
| zh217 wrote:
| OP here. Nothing about the license is final yet since
| there are no outside contributors. I just changed the
| main repo to LGPL, not because what I believed in
| changed, but because it seems that I really misunderstood
| the licenses.
| dangoor wrote:
| I am not a lawyer, but I work in an open source programs
| office and am currently working specifically on open source
| license compliance.
|
| Beyond what the sibling comments have said about LGPL
| sounding more like what you're going for, I'll just note that
| if you'd like broad adoption of this while still ensuring
| that changes to your code remain open, you might also want to
| consider the Mozilla Public License.
|
| From what I understand of the MPL and LGPL, the MPL is better
| for instances where dynamic linking isn't possible. The MPL
| basically says that any changes _to the files you created_
| must be available under the MPL, preserving their public
| availability.
|
| That said, most organizations are fine with the LGPL, but it
| just gets gnarly if there are instances where you really want
| to statically link something but you still fully want to
| support the original library's openness.
| pie_flavor wrote:
| AGPL is a variant of the GPL, not the LGPL. This means that
| dynamic linking still constitutes (according to them) a
| derivative work, so even programs that dynamically link
| against it must themselves be AGPL in their entirety.
| Dynamic linking is also meaningfully complicated to do in
| Rust, and this licensure of the crates.io crate will be a
| footgun for anyone not using cargo-deny.
|
| I think this is a very cool project, but its use of *GPL
| essentially ensures I'm not going to use it for anything. If
| you're planning on reducing it to LGPL, I'm not sure what the
| GPL is getting you over going with the Rust standard license
| set of MIT + Apache 2.0.
| jitl wrote:
| This is amazing!
|
| Have you looked at differential-datalog? It's rust-based,
| maintained by VMware, and has a very rich, well-typed Datalog
| language. differential-datalog is in-memory only right now, but
| could be ideal to integrate your graph as a datastore or disk
| spill cache.
|
| https://github.com/vmware/differential-datalog
| abc3354 wrote:
| This looks nice!
|
| Datascript seems to be another Datalog engine (in memory only)
|
| https://github.com/tonsky/datascript
| fsiefken wrote:
| there are a few more, including ones supporting on-disk
| databases
| https://en.wikipedia.org/wiki/Datalog#Systems_implementing_D...
| billylindeman wrote:
| This is amazing. I can't wait to play with it
| typon wrote:
| I have been meaning to do this exact project for 5 years at
| least. Congrats on making it happen - looking forward to using it
| stevesimmons wrote:
| This does look very nice!
|
| Especially (from my point of view) having the Python interface.
|
| What max practical graph size do you anticipate?
| zh217 wrote:
| For the moment: you can have as much data as you want on disk
| as long as the RocksDB storage engine can handle it, which I
| believe is quite large. For any single query though, you want
| all the data you touch to fit in memory. The good news is that
| Rust is very efficient in using memory. This will be improved
| in future versions.
|
| For the built-in graph algorithms, you are also limited by the
| algorithmic complexity, which for some of them is quite high
| (notably betweenness centrality). There is nothing the database
| can do to help in this case, though we may add some approximate
| algorithms with lower complexity later.
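| For a sense of why betweenness centrality is expensive: the
| standard exact method, Brandes' algorithm, runs one breadth-first
| search per node plus a backward accumulation pass, so on an
| unweighted graph it costs O(|V| * |E|) time. A minimal
| textbook-style sketch for unweighted, undirected graphs given as
| adjacency lists; this is not Cozo's implementation.

```python
from collections import deque


def betweenness(adj):
    """Brandes' algorithm; adj maps each node to a list of neighbours."""
    bc = {v: 0.0 for v in adj}
    for s in adj:  # one BFS per source node -> O(|V| * |E|) overall
        stack, preds = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Undirected graph: each pair was counted once from each endpoint.
    return {v: c / 2 for v, c in bc.items()}


print(betweenness({"a": ["b"], "b": ["a", "c"], "c": ["b"]}))
# {'a': 0.0, 'b': 1.0, 'c': 0.0}
```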
| pgt wrote:
| Good job! How do you run transactions? The examples only show queries.
| zh217 wrote:
| Transactions are described in the manual:
| https://cozodb.github.io/current/manual/stored.html#chaining....
|
| Sorry about the docs being all over the place at the moment! My
| only excuse is that Cozo is very young. The documentation (and
| the implementation) still needs a lot of work!
| dwenzek wrote:
| Really nice!
|
| I like the design choices of Datalog for the query language and
| Relations for the data model. This contrasts with the typical
| choices made for graph databases where the word graph seems to
| make _links_ a mandatory query and representation tool.
| philzook wrote:
| Very cool! I love the sqlite install everywhere model.
|
| Could you compare use cases with Souffle?
| https://souffle-lang.github.io/
|
| I'd suggest putting the link to the docs more prominently on the
| GitHub page.
|
| Is the "traditional" datalog `path(x,z) :- edge(x,y), path(y,z).`
| syntax not pleasant to the modern eye? I've grown to rather like
| it. Or is there something that syntax can't do?
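| For readers new to that notation: the rule above, together with an
| assumed base case `path(x,y) :- edge(x,y).`, computes the transitive
| closure of `edge`. A minimal sketch of semi-naive bottom-up
| evaluation in Python (a standard Datalog technique, not necessarily
| how Cozo or Souffle evaluate it), where each round joins `edge`
| only against the `path` facts that were new in the previous round:

```python
from collections import defaultdict


def transitive_closure(edges):
    """Semi-naive evaluation of
         path(x, y) :- edge(x, y).
         path(x, z) :- edge(x, y), path(y, z).
    """
    path = set(edges)   # base rule
    delta = set(edges)  # facts first derived in the last round
    while delta:
        # Index the new path facts by their first column for the join on y.
        by_src = defaultdict(list)
        for (y, z) in delta:
            by_src[y].append(z)
        derived = {(x, z) for (x, y) in edges for z in by_src.get(y, ())}
        delta = derived - path  # keep only genuinely new facts
        path |= delta
    return path


print(sorted(transitive_closure({(1, 2), (2, 3)})))  # [(1, 2), (1, 3), (2, 3)]
```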
|
| I've been building a Datalog shim layer in python to bridge
| across a couple different datalog systems
| https://github.com/philzook58/snakelog (including a datalog built
| on top of the python sqlite bindings), so I should look into
| including yours
| zh217 wrote:
| I find nothing wrong with the classical syntax, but there is a
| very practical, even stupid reason why the syntax is the way it
| is now. As you can see from the tutorial
| (https://nbviewer.org/github/cozodb/cozo-docs/blob/main/tutor...),
| you can run Cozo in Jupyter notebooks and mix it with Python
| code. This is the main way that I myself interact with Cozo.
| Since I don't fancy writing an unmaintainable mess of Jupyter
| frontend code that may become obsolete in a few years,
| CozoScript had better look enough like Python so as not to
| completely baffle the Jupyter syntax highlighter. That's why the
| syntax for comments is `#`, not `//`. That's also why the syntax
| for stored relations is `*stored`, not `&stored` or `%stored`.
|
| This was a hack from the beginning, but over time I grew to like
| the syntax quite a bit. And hopefully, by being superficially
| similar to Python or JS, it causes less confusion for new
| users :)
| philzook wrote:
| Ah, that's very interesting. Thank you. `s.add(path(x,z) <=
| edge(x,y) & path(y,z))` is what I chose as python syntax, but
| it is clunkier.
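| One way such a Python embedding can work is operator overloading:
| since `&` binds tighter than `<=` in Python, `head <= a & b` builds
| the body first and then the rule. A hypothetical minimal sketch
| (the `Atom`, `Body`, `Rule`, and `relation` names are made up here;
| this is not snakelog's actual code):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    """One predicate applied to arguments, e.g. edge(x, y)."""
    pred: str
    args: tuple

    def __and__(self, other):
        # `a & b` starts a conjunction (a rule body).
        return Body((self,)) & other

    def __le__(self, body):
        # `head <= body` builds a rule; the body was already built by `&`.
        if isinstance(body, Atom):
            body = Body((body,))
        return Rule(self, body.atoms)

    def __str__(self):
        return f"{self.pred}({', '.join(map(str, self.args))})"


@dataclass(frozen=True)
class Body:
    atoms: tuple

    def __and__(self, other):
        extra = (other,) if isinstance(other, Atom) else other.atoms
        return Body(self.atoms + extra)


@dataclass(frozen=True)
class Rule:
    head: Atom
    body: tuple

    def __str__(self):
        return f"{self.head} :- {', '.join(map(str, self.body))}."


def relation(name):
    # Hypothetical helper: relation("edge")(x, y) -> Atom("edge", (x, y)).
    return lambda *args: Atom(name, args)


edge, path = relation("edge"), relation("path")
x, y, z = "x", "y", "z"  # variables represented as plain strings here
rule = path(x, z) <= edge(x, y) & path(y, z)
print(rule)  # path(x, z) :- edge(x, y), path(y, z).
```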
| samuell wrote:
| Interesting! I'm thinking ... perhaps a small syntax
| comparison of Prolog/classical Datalog vs. Cozo would help
| people used to the classical syntax get started quickly.
| packetlost wrote:
| This is very similar to the goals of a project I've been working
| on, though I've been focusing on the raw storage format
| (literally a drop-in replacement for RocksDB, so this could be
| interesting). I think datalog databases are _far_ underrated.
___________________________________________________________________
(page generated 2022-11-08 23:00 UTC) |