[HN Gopher] Postgres Auditing in 150 lines of SQL
___________________________________________________________________
 
Postgres Auditing in 150 lines of SQL
 
Author : oliverrice
Score  : 146 points
Date   : 2022-03-09 15:26 UTC (7 hours ago)
 
web link (supabase.com)
w3m dump (supabase.com)
 
| jph wrote:
| The post links to a good Supabase Postgres auditing tool, a PG
| extension named supa_audit:
| 
| https://github.com/supabase/supa_audit
 
| simonw wrote:
| I've battled this problem many times over my career in many
| different ways - this approach looks very solid to me. I
| particularly like the way the schema design incorporates a record
| ID, old record ID, updated record and old record.
| 
| You do end up storing a lot of extra data, but it's very thorough
| - it's not going to miss anything, and figuring out what happened
| later on should be very easy.
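| 
| For concreteness, the rough shape of such an audit table (column
| names here are illustrative, not necessarily the post's exact
| DDL) is something like:
| 
| ```
| create schema if not exists audit;
| 
| -- one row per change, with both sides of the change kept as jsonb
| create table audit.record_version (
|     id            bigserial primary key,
|     record_id     uuid,        -- row identity after the change
|     old_record_id uuid,        -- row identity before the change
|     op            text,        -- INSERT / UPDATE / DELETE / TRUNCATE
|     ts            timestamptz not null default now(),
|     table_name    text,
|     record        jsonb,       -- new version of the row
|     old_record    jsonb        -- previous version of the row
| );
| ```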
 
| lichtenberger wrote:
| I'd argue that's still a lot of work to do manually. However,
| great work and detail, thanks a lot :-)
| 
| I'm working on a database system[1] in my spare time, which
| automatically retains all revisions and assigns revision
| timestamps during commits (a single timestamp in the
| RevisionRootPage). Furthermore, it is tamper-proof and the whole
| storage can be verified by comparing a single UberPage hash, as
| in ZFS.
| 
| Basically it is a persistent trie-based revision index (plus
| document and secondary indexes) mapped to durable storage, a
| simple log-structured append-only file. A second file tracks
| revision offsets to provide binary search over an in-memory map
| of timestamps. Because the root of the tree is swapped
| atomically, it does not need a WAL, which is basically another
| data file and can be omitted in this case.
| 
| Besides versioning the data itself in a binary encoding similar
| to BSON it tracks changes and writes simple JSON diff files for
| each new revision.
| 
| Furthermore, the data pages are not simply copied on write; a
| sliding snapshot algorithm ensures that, for the most part, only
| changed records have to be written. Before the page fragments
| are written to durable storage they are compressed, and in the
| future they might be encrypted.
| 
| [1] https://sirix.io | https://github.com/sirixdb/sirix
 
  | jmalicki wrote:
  | Every Google hit for "UberPage" seems to be your writing. Is
  | there standard ZFS terminology, found in the ZFS documentation,
  | for what you're referring to?
 
    | lichtenberger wrote:
    | In our case it's the root of the index tree, which is always
    | written last, after all the descendant pages have been
    | written, as a new revision is committed (during a postorder
    | traversal of the new pages).
    | 
    | In ZFS the UberPage is called the uberblock. We borrowed some
    | of the concepts, such as storing checksums in the parent pages
    | instead of in the pages themselves. In ZFS they are blocks :-)
    | 
    | Thanks for asking.
 
  | ithrow wrote:
  | Yes but sirix is not a SQL/RDBMS.
 
    | lichtenberger wrote:
    | True. In general I could add support for storing relational
    | data as well, but currently I'm focusing entirely on JSON,
    | auto-indexing for secondary indexes, and higher-order function
    | support in Brackit.
    | 
    | Of course we'd need more man-power as I'm more or less the
    | only one working on the core in my spare time (since 2012).
    | 
    | Moshe mainly works on the clients and a frontend. A new
    | frontend based on SolidJS is in the works showing the history
    | and diffs as in
    | 
    | https://raw.githubusercontent.com/sirixdb/sirix/master/Scree...
    | 
    | However, we're of course looking forward to suggestions, bug
    | reports, real world use cases and contributions :-)
 
  | lichtenberger wrote:
  | In order to provide fast audits of subtrees, the system
  | optionally stores a Merkle hash tree (a hash in each node) and
  | automatically updates the hashes of all ancestors during
  | updates.
 
  | chatmasta wrote:
  | You might like what we're doing with Splitgraph. Our command
  | line tool (sgr) installs an audit log into Postgres to track
  | changes [0]. Then `sgr commit` can write these changes to
  | delta-compressed objects [1], where each object is a columnar
  | fragment of data, addressable by the LTHash of rows
  | added/deleted by the fragment, and attached to metadata
  | describing its index [2].
  | 
  | I haven't explored sirix before, but at first glance it looks
  | like we have some similar ideas -- thanks for sharing, I'm
  | excited to learn more, especially about its application of ZFS.
  | 
  | [0] https://www.splitgraph.com/docs/working-with-data/tracking-c...
  | 
  | [1] https://www.splitgraph.com/docs/concepts/objects
  | 
  | [2]
  | https://github.com/splitgraph/splitgraph/blob/master/splitgr...
 
    | lichtenberger wrote:
    | Very interesting, thanks for pointing out :-)
 
| oliverrice wrote:
| author here (of the blog post and
| https://github.com/supabase/supa_audit)
| 
| happy to answer any questions
 
  | simonw wrote:
  | Are you at all worried about the size of the audit table?
  | Keeping a JSON copy of both the old and the new data is going
  | to add up pretty quickly for fast-changing tables.
  | 
  | I've built audit systems in the past which track just the new
  | values, with the idea being that you can replay the audit log
  | for a particular record from the beginning to reconstruct its
  | state. I'm not convinced that's a better approach than yours
  | here though, just interested in your thinking.
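  | 
  | (For illustration, with a hypothetical log that stores the full
  | new row as jsonb, reconstructing state as of a point in time is
  | just a latest-entry-per-record query; if the log stores diffs
  | instead, you'd fold them together in order.)
  | 
  | ```
  | -- hypothetical schema:
  | --   audit_log(record_id uuid, changed_at timestamptz,
  | --             new_data jsonb)
  | -- state of every record as of a given point in time:
  | select distinct on (record_id) record_id, new_data
  | from audit_log
  | where changed_at <= timestamptz '2022-03-01'
  | order by record_id, changed_at desc;
  | ```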
 
    | oliverrice wrote:
    | > Are you at all worried about the size of the audit table?
    | Keeping a JSON copy of both the old and the new data is going
    | to add up pretty quickly for fast-changing tables.
    | 
    | That's def a valid concern if you're auditing a high-frequency
    | insert/update table, or have your entire database under audit.
    | If you do have those needs, a system that logs outside of
    | postgres (like pgaudit) would be a better fit.
    | 
    | In my experience most startups and mid-size enterprises
    | sprinkle in auditing around the sensitive parts of the DB,
    | like `account` or `access_control` tables, where write load
    | isn't much of a concern.
    | 
    | > I've built audit systems in the past which track just the
    | new values
    | 
    | Yeah, that solution works great! The one place it breaks down
    | is when you apply auditing to an existing table and need to be
    | able to recover the records as they were when auditing was
    | enabled (the initial state is not snapshotted).
    | 
    | We toyed with another approach to avoid having to track
    | old_record: first check whether the `record_id` already exists
    | in the audit table, and insert a row with the OLD values and a
    | `SNAPSHOT` `operation` if it does not (rough sketch at the end
    | of this comment).
    | 
    | Even though that check was operating on an index, the
    | performance overhead was higher than we were comfortable
    | with.
    | 
    | Storage is pretty cheap these days so we opted to optimize
    | for write throughput + reduced memory usage rather than any
    | considerations wrt disk.
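    | 
    | Rough sketch of that check-then-snapshot variant, with
    | illustrative names (this is not supa_audit's actual internals;
    | it assumes the audited table's pk is a uuid `id` column):
    | 
    | ```
    | create or replace function audit.snapshot_if_first_sighting()
    | returns trigger language plpgsql as $$
    | begin
    |   if not exists (
    |     select 1
    |     from audit.record_version rv
    |     where rv.record_id = (to_jsonb(OLD) ->> 'id')::uuid
    |   ) then
    |     insert into audit.record_version (record_id, op, record)
    |     values ((to_jsonb(OLD) ->> 'id')::uuid, 'SNAPSHOT',
    |             to_jsonb(OLD));
    |   end if;
    |   -- works for both update and delete triggers
    |   return coalesce(NEW, OLD);
    | end;
    | $$;
    | ```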
 
    | jsmith99 wrote:
    | I've also used an update trigger to track only changed
    | columns, inspired by this post[0], which wraps a comparison
    | function in a new subtract operator for JSONB, similar to the
    | built-in subtract operator for HSTORE, which removes any key
    | pairs that haven't changed.
    | 
    | [0] http://8kb.co.uk/blog/2015/01/19/copying-pavel-stehules-simp...
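    | 
    | The core of that approach looks roughly like this (a sketch in
    | the spirit of the linked post, not its exact code):
    | 
    | ```
    | -- keep only the key/value pairs of "a" that differ from, or
    | -- are missing in, "b"
    | create or replace function jsonb_minus(a jsonb, b jsonb)
    | returns jsonb language sql immutable as $$
    |   select coalesce(jsonb_object_agg(key, value), '{}'::jsonb)
    |   from jsonb_each(a)
    |   where not (b @> jsonb_build_object(key, value));
    | $$;
    | 
    | create operator - (
    |   leftarg  = jsonb,
    |   rightarg = jsonb,
    |   function = jsonb_minus
    | );
    | ```
    | 
    | After that, `to_jsonb(NEW) - to_jsonb(OLD)` in the trigger
    | yields only the changed keys.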
 
      | oliverrice wrote:
      | that's a cool solution for reducing storage. nice!
 
    | craigkerstiens wrote:
    | Building a bit off Simon's question above: I'm curious whether
    | any thought was given to using JSON over JSONB. The trade-off
    | of being able to ingest and save records really fast seems
    | like it could be better than JSONB, which has to do some
    | conversion and, if there are any indexes on the JSONB columns,
    | could really dramatically slow down throughput.
 
      | oliverrice wrote:
      | > I'm curious if any thought was given to using JSON over
      | JSONB
      | 
      | For sure! There is a conversion penalty that is paid
      | incrementally (at write time). In exchange for that penalty,
      | jsonb gets you reduced storage size and dramatically faster
      | column value extraction with `->` or `->>` if you end up
      | having to filter the audit table by something other than the
      | `record_id`.
      | 
      | It is a tradeoff though and depending on the specifics of
      | your use-case JSON could be a better choice.
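      | 
      | For example, filtering by a field inside the jsonb (the
      | expression index is optional and just illustrative):
      | 
      | ```
      | create index if not exists record_version_email_idx
      |   on audit.record_version ((record ->> 'email'));
      | 
      | select *
      | from audit.record_version
      | where record ->> 'email' = 'someone@example.com';
      | ```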
 
  | netcraft wrote:
  | this is great! I've done something similar many times, but the
  | ability to do it on arbitrary tables is really nice.
  | 
  | If you have the HSTORE extension, you can also use it to get a
  | diff of the old and new: `HSTORE(OLD) - HSTORE(NEW)` and vice
  | versa, very nice to easily have the deltas.
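  | 
  | For example, in a row trigger (a sketch assuming the hstore
  | extension is installed and the trigger fires on UPDATE):
  | 
  | ```
  | create extension if not exists hstore;
  | 
  | create or replace function log_changed_cols()
  | returns trigger language plpgsql as $$
  | declare
  |   changed hstore;
  | begin
  |   -- columns whose values differ between the new and old row
  |   changed := hstore(NEW) - hstore(OLD);
  |   raise notice 'changed columns: %', changed;
  |   return NEW;
  | end;
  | $$;
  | ```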
 
  | perlgeek wrote:
  | First of all, this looks super awesome, and at least an order
  | of magnitude better than our home-grown solution (grown over
  | 25+ years on a DB that is now mariadb).
  | 
  | How well does this work when you audit multiple tables with
  | foreign key relationships between them?
  | 
  | If the user references a customer, and you want to find out
  | what customer the user belonged to at time X (but the customer
  | might have been deleted in the meantime), you have to trace
  | through the history of multiple tables to find answers. More so
  | if you have n2m relationships...
  | 
  | It sounds doable, but not trivial to get right for more
  | complicated models.
  | 
  | ---
  | 
  | Update: to clarify, I want to be able to present a somewhat
  | human-readable history of an object without doing much table-
  | specific logic.
  | 
  | If an author can publish many books, you typically have an n2m
  | mapping table, like book_authors. If you want to add newly
  | published books to the "author" history view, you have to work
  | out that you have to include parts of the history of the
  | book_authors table. I don't think you can do that through a
  | join though? Trying to think my way through this...
  | 
  | Maybe this can be dealt with more easily if you can assume each
  | table has a non-composite primary key and include that primary
  | key in the history table; then you can do joins on the history
  | more easily?
 
    | oliverrice wrote:
    | All the data would be available and joining it would be
    | possible with some jsonb gymnastics, but it's optimized more
    | for compliance-style auditing than rapid lookups for complex
    | queries (rough sketch below).
    | 
    | For that scenario, replicating the audit data to an OLAP
    | system may be a better fit.
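    | 
    | For illustration, a point-in-time lookup across two audited
    | tables could look roughly like this (sketch only; `users`,
    | `customer`, and the literals are placeholders):
    | 
    | ```
    | -- which customer did user 42 point at as of a given time?
    | with user_at as (
    |   select record
    |   from audit.record_version
    |   where table_name = 'users'
    |     and record ->> 'id' = '42'
    |     and ts <= timestamptz '2022-01-01'
    |   order by ts desc
    |   limit 1
    | )
    | select c.record as customer
    | from user_at u
    | join lateral (
    |   select record
    |   from audit.record_version
    |   where table_name = 'customer'
    |     and record ->> 'id' = u.record ->> 'customer_id'
    |     and ts <= timestamptz '2022-01-01'
    |   order by ts desc
    |   limit 1
    | ) c on true;
    | ```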
 
  | simonw wrote:
  | I'm not completely clear on the reason for both record_id and
  | old_record_id - is this so that you can capture instances where
  | one of the columns that makes up the primary key for the row is
  | itself modified for some reason?
 
    | oliverrice wrote:
    | Being able to see when the primary key changes using
    | `record_id` and `old_record_id` is a perk, but we also wanted
    | to make sure a snapshot of the existing data was available
    | when auditing is enabled on a table that already has data in
    | it. See comment ^ for more color
    | 
    | Long story short: it is avoidable if you're okay with paying
    | the penalty of querying the audit table by `record_id` before
    | every write. We decided to use more disk and keep the higher
    | write throughput
 
| bob1029 wrote:
| We have a new business system we are trying to build that has a
| _lot_ of very granular auditing and versioning requirements
| throughout, and this is where I started looking.
| 
| After a while of screwing with this kind of approach, I realized
| that the number of different types of objects involved was going
| to cause an explosion in the number of related auditing and
| versioning tables. Keeping track of not only who changed
| something but also the exact facts they changed (such that you
| could revert as needed) requires a lot of additional boilerplate
| in each instance.
| 
| So, I made a pretty big gamble and went all-in on an event-
| sourced abstraction where everything is written to 1 gigantic
| append-only log of events. Turns out, this was worth it in our
| case. As long as everything has a timestamp and you have a way to
| refer to specific log entries (i.e. by offset), you basically
| have a complete solution. The hard parts are handled by a JSON
| serializer and gzip. We use in-memory indexes that are simply
| long/long dictionaries that map an object identity to a physical
| log offset.
| 
| The only downside with what I have proposed is that there is no
| garbage collection or cleanup possible. At this time, this is
| actually a highly-desirable property of the system. Disk space is
| cheap. Losing 3 weeks of work to an accidental button click is
| not.
 
  | infogulch wrote:
  | I've done this before on smaller scales. By writing a set of
  | table-valued functions that expose the event data in relational
  | form "as of the date given by param X", and views on top of
  | that where X is set to getdate(), you get a very ergonomic way
  | to consume "current" & "historical" data. Materialized views
  | can make querying these more natural relations very cheap if
  | your engine supports them.
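  | 
  | A Postgres-flavored sketch of that pattern, with hypothetical
  | table and column names (the description above uses getdate(),
  | i.e. SQL Server):
  | 
  | ```
  | -- hypothetical event store where each event carries the full
  | -- payload
  | create table events (
  |   entity_id  bigint      not null,
  |   event_time timestamptz not null,
  |   payload    jsonb       not null
  | );
  | 
  | -- relational "as of" access to the event data
  | create function entities_as_of(p_as_of timestamptz)
  | returns table (entity_id bigint, payload jsonb)
  | language sql stable as $$
  |   select distinct on (e.entity_id) e.entity_id, e.payload
  |   from events e
  |   where e.event_time <= p_as_of
  |   order by e.entity_id, e.event_time desc
  | $$;
  | 
  | -- "current" data is the same function evaluated at now()
  | create view entities_current as
  |   select * from entities_as_of(now());
  | ```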
 
  | pstuart wrote:
  | > everything is written to 1 gigantic append-only log of events
  | 
  | Do you replicate that stream to avoid SPOF?
 
    | bob1029 wrote:
    | I am currently investigating synchronous replication of each
    | compressed log batch to a 2nd host. We are totally fine with
    | availability issues (i.e. system down because log replication
    | broken). Availability problems are far preferable to
    | consistency issues and other nasty bugs.
 
  | lichtenberger wrote:
  | As posted in another comment, that's basically what
  | https://github.com/sirixdb/sirix supports, along with easily
  | reconstructing former revisions of a JSON document (or any node
  | therein), sophisticated secondary (also versioned) indexes,
  | querying with JSONiq, and optimizations at query compile time
  | for joins and aggregates...
  | 
  | In our case we opted for three types of secondary user-defined
  | indexes (besides the main document trie index to find unique,
  | stable nodes by their 64bit ID), namely
  | 
  | 1. field indexes
  | 
  | 2. path indexes
  | 
  | 3. content-and-structure indexes which index paths and their
  | typed values
  | 
  | Furthermore a path summary keeps track of all distinct paths in
  | the JSON resource.
 
| scwoodal wrote:
| I didn't see this mentioned in the article, but with JSONB you
| can leverage operators to know what data changed.
| 
| ```
| -- record = {"id": 1, "name": "bar"}
| -- old_record = {"id": 1, "name": "foo"}
| select record - old_record where id=3;
| name => bar
| ```
 
  | netcraft wrote:
  | I don't think this is built in; I think you must have an
  | extension or custom operator providing that:
  | https://dbfiddle.uk/?rdbms=postgres_14&fiddle=273a440d043a82...
 
    | scwoodal wrote:
    | Ah yes, I was thinking of HSTORE.
    | 
    | https://dbfiddle.uk/?rdbms=postgres_14&fiddle=2f406630f367b5...
 
  | efxhoy wrote:
  | This SO answer has a good jsonb diff function:
  | https://stackoverflow.com/a/36043269
 
| craigkerstiens wrote:
| Love the detail and approach here, though (and it is mentioned at
| the bottom of the article) this shouldn't be used at very large
| scale... the estimate of about 1k transactions per second or less
| seems about right. For any larger scale you want to look at
| something like pgaudit, which can scale much, much further and be
| used with other extensions to ensure you have a full audit log
| that can't be modified (when dealing with compliance).
 
| sususu wrote:
| I really like this approach; however, I have a use case where the
| id of the application user that made the operation must be saved
| in the audit table, and unfortunately I cannot see how to do that
| with a pure SQL solution.
| 
| Has anyone done something similar with SQL only?
 
  | oliverrice wrote:
  | if `user_id` is a column on the table you're auditing you could
  | use a generated column (a plain default can't reference other
  | columns):
  | 
  | ```
  | alter table audit.record_version
  |   add column user_id bigint
  |   generated always as (
  |     (coalesce(record, old_record) ->> 'user_id')::bigint
  |   ) stored;
  | ```
  | 
  | or if you meant the SQL `current_user` you could extend the
  | trigger to track it.
  | 
  | but if the user_id is only available in application space (not
  | in the db) it will not be possible using straight SQL
 
  | timwis wrote:
  | PostgREST has an interesting approach to authentication that
  | might give you some ideas:
  | 
  | https://postgrest.org/en/stable/auth.html
  | 
  | Supabase is built on top of PostgREST, but I can't say for sure
  | it uses the same means of authentication (I think it has its
  | own Auth layer).
 
    | oliverrice wrote:
    | Supabase passes all the JWT info to PostgREST, so it is still
    | available in transaction-local config.
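    | 
    | So a trigger (or column default) can read the claims off the
    | transaction-local settings, roughly like this (setting name
    | per PostgREST's convention; older versions expose
    | request.jwt.claim.sub instead):
    | 
    | ```
    | select nullif(current_setting('request.jwt.claims', true), '')
    |          ::jsonb ->> 'sub' as user_id;
    | ```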
 
  | netcraft wrote:
  | we do that by setting a local variable before the query and
  | then reading that in the triggers:
  | 
  | ```
  | SET LOCAL foo.who_id = 'some-uuid';
  | 
  | UPDATE table
  | SET ...
  | ```
  | 
  | ```
  | -- function to make getting the setting easier
  | DROP FUNCTION IF EXISTS get_who_id (text);
  | 
  | CREATE OR REPLACE FUNCTION get_who_id (
  |   default_value text DEFAULT null::text
  | ) RETURNS text AS $get_who_id$
  | DECLARE
  |   who_id text;
  | BEGIN
  |   BEGIN
  |     who_id := current_setting('foo.who_id');
  |   EXCEPTION
  |     WHEN SQLSTATE '42704' THEN
  |       RETURN default_value;
  |   END;
  | 
  |   IF (length(who_id) = 0) THEN
  |     RETURN default_value;
  |   END IF;
  | 
  |   RETURN who_id;
  | END
  | $get_who_id$ LANGUAGE plpgsql VOLATILE;
  | ```
  | 
  | ```
  | CREATE OR REPLACE FUNCTION some_table_audit () RETURNS TRIGGER
  | AS $some_table_audit$
  | DECLARE
  |   who_id text;
  | BEGIN
  |   who_id := get_who_id(null::text);
  |   ...
  | ```
  | 
  | Identifiers changed, but hopefully this will give you the idea.
 
    | tda wrote:
    | I've experimented with a very similar solution and it felt a
    | bit dirty, but so far it seems to be working just fine. I have
    | made an integration for auditing with sqlmodel which I intend
    | to share, but it is kind of rough and I was a bit stuck
    | trying to clean it up. The idea is that you add e.g. a
    | HeroHistory model derived from HeroBase and a HistoryMixin
    | that creates the triggers and relationships to have an Audit
    | log of the Hero table.
    | 
    | If anyone is interested, give me a shout.
 
    | sususu wrote:
    | That's really interesting and gave me some (hopefully) good
    | ideas.
    | 
    | Thank you very much!
 
| raimille1 wrote:
| This is a great technique; we had it set up very similarly at a
| previous startup and it worked wonders. Thanks for sharing a
| generic way of doing it @oliverrice!
 
| ithrow wrote:
| Debezium can be used for this in embedded mode if you are on the
| JVM.
 
  | johndfsgdgdfg wrote:
  | Can you please elaborate? Last time I checked it needed Kafka
  | to work.
 
___________________________________________________________________
(page generated 2022-03-09 23:00 UTC)