[HN Gopher] Postgres Auditing in 150 lines of SQL
___________________________________________________________________
 
Postgres Auditing in 150 lines of SQL
 
Author : oliverrice
Score  : 146 points
Date   : 2022-03-09 15:26 UTC (7 hours ago)
 
web link (supabase.com)
w3m dump (supabase.com)
 
| jph wrote:
| The post links to a good Supabase Postgres auditing tool, a PG
| extension named supa_audit:
| 
| https://github.com/supabase/supa_audit
 
| simonw wrote:
| I've battled this problem many times over my career in many
| different ways - this approach looks very solid to me. I
| particularly like the way the schema design incorporates a record
| ID, old record ID, updated record and old record.
| 
| You do end up storing a lot of extra data, but it's very thorough
| - it's not going to miss anything, and figuring out what happened
| later on should be very easy.
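| 
| For concreteness, the rough shape of such an audit table (column
| names here are illustrative, not necessarily the post's exact
| DDL) is something like:
| 
| ```
| create schema if not exists audit;
| 
| -- one row per change, with both sides of the change kept as jsonb
| create table audit.record_version (
|     id            bigserial primary key,
|     record_id     uuid,        -- row identity after the change
|     old_record_id uuid,        -- row identity before the change
|     op            text,        -- INSERT / UPDATE / DELETE / TRUNCATE
|     ts            timestamptz not null default now(),
|     table_name    text,
|     record        jsonb,       -- new version of the row
|     old_record    jsonb        -- previous version of the row
| );
| ```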
 
| lichtenberger wrote:
| I'd argue that's still a lot of work to do manually. However,
| great work and detail, thanks a lot :-)
| 
| I'm working on a database system[1] in my spare time, which
| automatically retains all revisions and assigns revision
| timestamps during commits (a single timestamp in the
| RevisionRootPage). Furthermore, it is tamper-proof and the whole
| storage can be verified by comparing a single UberPage hash, as
| in ZFS.
| 
| Basically it is a persistent trie-based revision index (plus
| document and secondary indexes) mapped to durable storage, a
| simple log-structured append-only file. A second file tracks
| revision offsets to provide binary search over an in-memory map
| of timestamps. Because the root of the tree is swapped
| atomically, it does not need a WAL, which is basically another
| data file and can be omitted in this case.
| 
| Besides versioning the data itself in a binary encoding similar
| to BSON it tracks changes and writes simple JSON diff files for
| each new revision.
| 
| Furthermore, the data pages are not simply copied on write; a
| sliding snapshot algorithm ensures that, for the most part, only
| changed records have to be written. Before the page fragments
| are written to durable storage they are compressed, and in the
| future they might be encrypted.
| 
| [1] https://sirix.io | https://github.com/sirixdb/sirix
 
  | jmalicki wrote:
  | Every Google hit for "UberPage" seems to be your writing. Is
  | there standard ZFS terminology, found in the ZFS documentation,
  | for what you're referring to?
 
    | lichtenberger wrote:
    | In our case it's the root of the index tree, which is always
    | written last, after all the descendant pages have been
    | written, as a new revision is committed (during a postorder
    | traversal of the new pages).
    | 
    | In ZFS the UberPage is called the uberblock. We borrowed some
    | of the concepts, such as storing checksums in the parent pages
    | instead of in the pages themselves. In ZFS they are blocks :-)
    | 
    | Thanks for asking.
 
  | ithrow wrote:
  | Yes but sirix is not a SQL/RDBMS.
 
    | lichtenberger wrote:
    | True. In general I could add support for storing relational
    | data as well, but currently I'm focusing entirely on JSON,
    | auto-indexing for secondary indexes, and higher-order function
    | support in Brackit.
    | 
    | Of course we'd need more man-power as I'm more or less the
    | only one working on the core in my spare time (since 2012).
    | 
    | Moshe mainly works on the clients and a frontend. A new
    | frontend based on SolidJS is in the works showing the history
    | and diffs as in
    | 
    | https://raw.githubusercontent.com/sirixdb/sirix/master/Scree...
    | 
    | However, we're of course looking forward to suggestions, bug
    | reports, real world use cases and contributions :-)
 
  | lichtenberger wrote:
  | In order to provide fast audits of subtrees, the system
  | optionally stores a Merkle hash tree (a hash in each node) and
  | automatically updates the hashes of all ancestors during
  | updates.
 
  | chatmasta wrote:
  | You might like what we're doing with Splitgraph. Our command
  | line tool (sgr) installs an audit log into Postgres to track
  | changes [0]. Then `sgr commit` can write these changes to
  | delta-compressed objects [1], where each object is a columnar
  | fragment of data, addressable by the LTHash of rows
  | added/deleted by the fragment, and attached to metadata
  | describing its index [2].
  | 
  | I haven't explored sirix before, but at first glance it looks
  | like we have some similar ideas -- thanks for sharing, I'm
  | excited to learn more, especially about its application of ZFS.
  | 
  | [0] https://www.splitgraph.com/docs/working-with-data/tracking-c...
  | 
  | [1] https://www.splitgraph.com/docs/concepts/objects
  | 
  | [2]
  | https://github.com/splitgraph/splitgraph/blob/master/splitgr...
 
    | lichtenberger wrote:
    | Very interesting, thanks for pointing out :-)
 
| oliverrice wrote:
| author here (of the blog post and
| https://github.com/supabase/supa_audit)
| 
| happy to answer any questions
 
  | simonw wrote:
  | Are you at all worried about the size of the audit table?
  | Keeping a JSON copy of both the old and the new data is going
  | to add up pretty quickly for fast-changing tables.
  | 
  | I've built audit systems in the past which track just the new
  | values, with the idea being that you can replay the audit log
  | for a particular record from the beginning to reconstruct its
  | state. I'm not convinced that's a better approach than yours
  | here though, just interested in your thinking.
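  | 
  | (For illustration, with a hypothetical log that stores the full
  | new row as jsonb, reconstructing state as of a point in time is
  | just a latest-entry-per-record query; if the log stores diffs
  | instead, you'd fold them together in order.)
  | 
  | ```
  | -- hypothetical schema:
  | --   audit_log(record_id uuid, changed_at timestamptz,
  | --             new_data jsonb)
  | -- state of every record as of a given point in time:
  | select distinct on (record_id) record_id, new_data
  | from audit_log
  | where changed_at <= timestamptz '2022-03-01'
  | order by record_id, changed_at desc;
  | ```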
 
    | oliverrice wrote:
    | > Are you at all worried about the size of the audit table?
    | Keeping a JSON copy of both the old and the new data is going
    | to add up pretty quickly for fast-changing tables.
    | 
    | That's def a valid concern if you're auditing a high-frequency
    | insert/update table, or have your entire database under audit.
    | If you do have those needs, a system that logs outside of
    | postgres (like pgaudit) would be a better fit.
    | 
    | In my experience most startups and mid-size enterprises
    | sprinkle in auditing around the sensitive parts of the DB,
    | like `account` or `access_control` tables, where write load
    | isn't much of a concern.
    | 
    | > I've built audit systems in the past which track just the
    | new values
    | 
    | Yeah, that solution works great! The one place it breaks down
    | is when you apply auditing to an existing table and need to be
    | able to recover the records as they were when auditing was
    | enabled (the initial state is not snapshotted).
    | 
    | We toyed with another approach to avoid having to track
    | old_record: first check whether the `record_id` already exists
    | in the audit table, and insert a row with the OLD values and a
    | `SNAPSHOT` `operation` if it does not (rough sketch at the end
    | of this comment).
    | 
    | Even though that check was operating on an index, the
    | performance overhead was higher than we were comfortable
    | with.
    | 
    | Storage is pretty cheap these days so we opted to optimize
    | for write throughput + reduced memory usage rather than any
    | considerations wrt disk.
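    | 
    | Rough sketch of that check-then-snapshot variant, with
    | illustrative names (this is not supa_audit's actual internals;
    | it assumes the audited table's pk is a uuid `id` column):
    | 
    | ```
    | create or replace function audit.snapshot_if_first_sighting()
    | returns trigger language plpgsql as $$
    | begin
    |   if not exists (
    |     select 1
    |     from audit.record_version rv
    |     where rv.record_id = (to_jsonb(OLD) ->> 'id')::uuid
    |   ) then
    |     insert into audit.record_version (record_id, op, record)
    |     values ((to_jsonb(OLD) ->> 'id')::uuid, 'SNAPSHOT',
    |             to_jsonb(OLD));
    |   end if;
    |   -- works for both update and delete triggers
    |   return coalesce(NEW, OLD);
    | end;
    | $$;
    | ```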
 
    | jsmith99 wrote:
    | I've also used an update trigger to track only changed
    | columns, inspired by this post[0], which wraps a comparison
    | function in a new subtract operator for JSONB, similar to the
    | built-in subtract operator for HSTORE, which removes any key
    | pairs that haven't changed.
    | 
    | [0] http://8kb.co.uk/blog/2015/01/19/copying-pavel-stehules-simp...
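    | 
    | The core of that approach looks roughly like this (a sketch in
    | the spirit of the linked post, not its exact code):
    | 
    | ```
    | -- keep only the key/value pairs of "a" that differ from, or
    | -- are missing in, "b"
    | create or replace function jsonb_minus(a jsonb, b jsonb)
    | returns jsonb language sql immutable as $$
    |   select coalesce(jsonb_object_agg(key, value), '{}'::jsonb)
    |   from jsonb_each(a)
    |   where not (b @> jsonb_build_object(key, value));
    | $$;
    | 
    | create operator - (
    |   leftarg  = jsonb,
    |   rightarg = jsonb,
    |   function = jsonb_minus
    | );
    | ```
    | 
    | After that, `to_jsonb(NEW) - to_jsonb(OLD)` in the trigger
    | yields only the changed keys.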
 
      | oliverrice wrote:
      | that's a cool solution for reducing storage. nice!
 
    | craigkerstiens wrote:
    | Building a bit off Simon's question above: I'm curious whether
    | any thought was given to using JSON over JSONB. The trade-off
    | of being able to ingest and save records really fast seems
    | like it could be better than JSONB, which has to do some
    | conversion and, if there are any indexes on the JSONB columns,
    | could really dramatically slow down throughput.
 
      | oliverrice wrote:
      | > I'm curious if any thought was given to using JSON over
      | JSONB
      | 
      | For sure! There is a conversion penalty that is paid
      | incrementally (at write time). In exchange for that penalty,
      | jsonb gets you reduced storage size and dramatically faster
      | column value extraction with `->` or `->>` if you end up
      | having to filter the audit table by something other than the
      | `record_id`.
      | 
      | It is a tradeoff though and depending on the specifics of
      | your use-case JSON could be a better choice.
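      | 
      | For example, filtering by a field inside the jsonb (the
      | expression index is optional and just illustrative):
      | 
      | ```
      | create index if not exists record_version_email_idx
      |   on audit.record_version ((record ->> 'email'));
      | 
      | select *
      | from audit.record_version
      | where record ->> 'email' = 'someone@example.com';
      | ```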
 
  | netcraft wrote:
  | this is great! I've done something similar many times, but the
  | ability to do it on arbitrary tables is really nice.
  | 
  | If you have the HSTORE extension, you can also use it to get a
  | diff of the old and new: `HSTORE(OLD) - HSTORE(NEW)` and vice
  | versa, very nice to easily have the deltas.
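  | 
  | For example, in a row trigger (a sketch assuming the hstore
  | extension is installed and the trigger fires on UPDATE):
  | 
  | ```
  | create extension if not exists hstore;
  | 
  | create or replace function log_changed_cols()
  | returns trigger language plpgsql as $$
  | declare
  |   changed hstore;
  | begin
  |   -- columns whose values differ between the new and old row
  |   changed := hstore(NEW) - hstore(OLD);
  |   raise notice 'changed columns: %', changed;
  |   return NEW;
  | end;
  | $$;
  | ```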
 
  | perlgeek wrote:
  | First of all, this looks super awesome, and at least an order
  | of magnitude better than our home-grown solution (grown over
  | 25+ years on a DB that is now mariadb).
  | 
  | How well does this work when you audit multiple tables with
  | foreign key relationships between them?
  | 
  | If the user references a customer, and you want to find out
  | what customer the user belonged to at time X (but the customer
  | might have been deleted in the meantime), you have to trace
  | through the history of multiple tables to find answers. More so
  | if you have n2m relationships...
  | 
  | It sounds doable, but not trivial to get right for more
  | complicated models.
  | 
  | ---
  | 
  | Update: to clarify, I want to be able to present a somewhat
  | human-readable history of an object without doing much table-
  | specific logic.
  | 
  | If an author can publish many books, you typically have an n2m
  | mapping table, like book_authors. If you want to add newly
  | published books to the "author" history view, you have to work
  | out that you have to include parts of the history of the
  | book_authors table. I don't think you can do that through a
  | join though? Trying to think my way through this...
  | 
  | Maybe this can be dealt with more easily if you can assume each
  | table has a non-composite primary key and include that primary
  | key in the history table; then you can do joins on the history
  | more easily?
 
    | oliverrice wrote:
    | All the data would be available and joining it would be
    | possible with some jsonb gymnastics, but it's optimized more
    | for compliance-style auditing than rapid lookups for complex
    | queries (rough sketch below).
    | 
    | For that scenario, replicating the audit data to an OLAP
    | system may be a better fit.
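    | 
    | For illustration, a point-in-time lookup across two audited
    | tables could look roughly like this (sketch only; `users`,
    | `customer`, and the literals are placeholders):
    | 
    | ```
    | -- which customer did user 42 point at as of a given time?
    | with user_at as (
    |   select record
    |   from audit.record_version
    |   where table_name = 'users'
    |     and record ->> 'id' = '42'
    |     and ts <= timestamptz '2022-01-01'
    |   order by ts desc
    |   limit 1
    | )
    | select c.record as customer
    | from user_at u
    | join lateral (
    |   select record
    |   from audit.record_version
    |   where table_name = 'customer'
    |     and record ->> 'id' = u.record ->> 'customer_id'
    |     and ts <= timestamptz '2022-01-01'
    |   order by ts desc
    |   limit 1
    | ) c on true;
    | ```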
 
  | simonw wrote:
  | I'm not completely clear on the reason for both record_id and
  | old_record_id - is this so that you can capture instances where
  | one of the columns that makes up the primary key for the row is
  | itself modified for some reason?
 
    | oliverrice wrote:
    | Being able to see when the primary key changes using
    | `record_id` and `old_record_id` is a perk, but we also wanted
    | to make sure a snapshot of the existing data was available
    | when auditing is enabled on a table that already has data in
    | it. See comment ^ for more color
    | 
    | Long story short: it is avoidable if you're okay with paying
    | the penalty of querying the audit table by `record_id` before
    | every write. We decided to use more disk and keep the higher
    | write throughput
 
| bob1029 wrote:
| We have a new business system we are trying to build that has a
| _lot_ of very granular auditing and versioning requirements
| throughout, and this is where I started looking.
| 
| After a while of screwing with this kind of approach, I realized
| that the number of different types of objects involved was going
| to cause an explosion in the number of related auditing and
| versioning tables. Keeping track of not only who changed
| something but also the exact facts they changed (such that you
| could revert as needed) requires a lot of additional boilerplate
| in each instance.
| 
| So, I made a pretty big gamble and went all-in on an event-
| sourced abstraction where everything is written to 1 gigantic
| append-only log of events. Turns out, this was worth it in our
| case. As long as everything has a timestamp and you have a way to
| refer to specific log entries (i.e. by offset), you basically
| have a complete solution. The hard parts are handled by a JSON
| serializer and gzip. We use in-memory indexes that are simply
| long/long dictionaries that map an object identity to a physical
| log offset.
| 
| The only downside with what I have proposed is that there is no
| garbage collection or cleanup possible. At this time, this is
| actually a highly-desirable property of the system. Disk space is
| cheap. Losing 3 weeks of work to an accidental button click is
| not.
 
  | infogulch wrote:
  | I've done this before on smaller scales. By writing a set of
  | table-valued functions that expose the event data in relational
  | form "as of the date given by param X", and views on top of
  | that where X is set to getdate(), you get a very ergonomic way
  | to consume "current" & "historical" data. Materialized views
  | can make querying these more natural relations very cheap if
  | your engine supports them.
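  | 
  | A Postgres-flavored sketch of that pattern, with hypothetical
  | table and column names (the description above uses getdate(),
  | i.e. SQL Server):
  | 
  | ```
  | -- hypothetical event store where each event carries the full
  | -- payload
  | create table events (
  |   entity_id  bigint      not null,
  |   event_time timestamptz not null,
  |   payload    jsonb       not null
  | );
  | 
  | -- relational "as of" access to the event data
  | create function entities_as_of(p_as_of timestamptz)
  | returns table (entity_id bigint, payload jsonb)
  | language sql stable as $$
  |   select distinct on (e.entity_id) e.entity_id, e.payload
  |   from events e
  |   where e.event_time <= p_as_of
  |   order by e.entity_id, e.event_time desc
  | $$;
  | 
  | -- "current" data is the same function evaluated at now()
  | create view entities_current as
  |   select * from entities_as_of(now());
  | ```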
 
  | pstuart wrote:
  | > everything is written to 1 gigantic append-only log of events
  | 
  | Do you replicate that stream to avoid SPOF?
 
    | bob1029 wrote:
    | I am currently investigating synchronous replication of each
    | compressed log batch to a 2nd host. We are totally fine with
    | availability issues (i.e. system down because log replication
    | broken). Availability problems are far preferable to
    | consistency issues and other nasty bugs.
 
  | lichtenberger wrote:
  | As posted in another comment, that's basically what
  | https://github.com/sirixdb/sirix supports, along with easily
  | reconstructing former revisions of a JSON document (or any node
  | therein), sophisticated secondary (also versioned) indexes,
  | querying with JSONiq, and optimizations at query compile time
  | for joins and aggregates...
  | 
  | In our case we opted for three types of secondary user-defined
  | indexes (besides the main document trie index to find unique,
  | stable nodes by their 64bit ID), namely
  | 
  | 1. field indexes
  | 
  | 2. path indexes
  | 
  | 3. content-and-structure indexes which index paths and their
  | typed values
  | 
  | Furthermore a path summary keeps track of all distinct paths in
  | the JSON resource.
 
| scwoodal wrote:
| I didn't see this mentioned in the article, but with JSONB you
| can leverage operators to know what data changed.
| 
| ```
| -- record = {"id": 1, "name": "bar"}
| -- old_record = {"id": 1, "name": "foo"}
| select record - old_record where id=3;
| name => bar
| ```
 
  | netcraft wrote:
  | I don't think this is built in; I think you must have an
  | extension or custom operator providing that:
  | https://dbfiddle.uk/?rdbms=postgres_14&fiddle=273a440d043a82...
 
    | scwoodal wrote:
    | Ah yes, I was thinking of HSTORE.
    | 
    | https://dbfiddle.uk/?rdbms=postgres_14&fiddle=2f406630f367b5...
 
  | efxhoy wrote:
  | This SO answer has a good jsonb diff function:
  | https://stackoverflow.com/a/36043269
 
| craigkerstiens wrote:
| Love the detail and approach here, though (and it is mentioned at
| the bottom of the article) this shouldn't be used at very large
| scale... the estimate of about 1k transactions per second or less
| seems about right. For any larger scale you want to look at
| something like pgaudit, which can scale much, much further and be
| used with other extensions to ensure you have a full audit log
| that can't be modified (when dealing with compliance).
 
| sususu wrote:
| I really like this approach; however, I have a use case where the
| id of the application user that made the operation must be saved
| in the audit table, and unfortunately I cannot see how to do that
| with a pure SQL solution.
| 
| Has anyone done something similar with SQL only?
 
  | oliverrice wrote:
  | if `user_id` is a column on the table you're auditing you could
  | use a generated column (a plain default can't reference other
  | columns):
  | 
  | ```
  | alter table audit.record_version
  |   add column user_id bigint
  |   generated always as (
  |     (coalesce(record, old_record) ->> 'user_id')::bigint
  |   ) stored;
  | ```
  | 
  | or if you meant the SQL `current_user` you could extend the
  | trigger to track it.
  | 
  | but if the user_id is only available in application space (not
  | in the db) it will not be possible using straight SQL
 
  | timwis wrote:
  | PostgREST has an interesting approach to authentication that
  | might give you some ideas:
  | 
  | https://postgrest.org/en/stable/auth.html
  | 
  | Supabase is built on top of PostgREST, but I can't say for sure
  | it uses the same means of authentication (I think it has its
  | own Auth layer).
 
    | oliverrice wrote:
    | Supabase passes all the JWT info to PostgREST, so it is still
    | available in transaction-local config.
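    | 
    | So a trigger (or column default) can read the claims off the
    | transaction-local settings, roughly like this (setting name
    | per PostgREST's convention; older versions expose
    | request.jwt.claim.sub instead):
    | 
    | ```
    | select nullif(current_setting('request.jwt.claims', true), '')
    |          ::jsonb ->> 'sub' as user_id;
    | ```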
 
  | netcraft wrote:
  | we do that by setting a local variable before the query and
  | then reading that in the triggers:
  | 
  | ```
  | SET LOCAL foo.who_id = 'some-uuid';
  | 
  | UPDATE table
  | SET ...
  | ```
  | 
  | ```
  | -- function to make getting the setting easier
  | DROP FUNCTION IF EXISTS get_who_id (text);
  | 
  | CREATE OR REPLACE FUNCTION get_who_id (
  |   default_value text DEFAULT null::text
  | ) RETURNS text AS $get_who_id$
  | DECLARE
  |   who_id text;
  | BEGIN
  |   BEGIN
  |     who_id := current_setting('foo.who_id');
  |   EXCEPTION
  |     WHEN SQLSTATE '42704' THEN
  |       RETURN default_value;
  |   END;
  | 
  |   IF (length(who_id) = 0) THEN
  |     RETURN default_value;
  |   END IF;
  | 
  |   RETURN who_id;
  | END
  | $get_who_id$ LANGUAGE plpgsql VOLATILE;
  | ```
  | 
  | ```
  | CREATE OR REPLACE FUNCTION some_table_audit () RETURNS TRIGGER
  | AS $some_table_audit$
  | DECLARE
  |   who_id text;
  | BEGIN
  |   who_id := get_who_id(null::text);
  |   ...
  | ```
  | 
  | Identifiers changed, but hopefully this will give you the idea.
 
    | tda wrote:
    | I've experimented with a very similar solution and it felt a
    | bit dirty, but so far it seems to be working just fine. I have
    | made an integration for auditing with sqlmodel which I intend
    | to share, but it is kind of rough and I was a bit stuck
    | trying to clean it up. The idea is that you add e.g. a
    | HeroHistory model derived from HeroBase and a HistoryMixin
    | that creates the triggers and relationships to have an Audit
    | log of the Hero table.
    | 
    | If anyone is interested, give me a shout.
 
    | sususu wrote:
    | That's really interesting and gave me some (hopefully) good
    | ideas.
    | 
    | Thank you very much!
 
| raimille1 wrote:
| This is a great technique; we had it set up very similarly at a
| previous startup and it worked wonders. Thanks for sharing a
| generic way of doing it @oliverrice!
 
| ithrow wrote:
| Debezium can be used for this in embedded mode if you are on the
| JVM.
 
  | johndfsgdgdfg wrote:
  | Can you please elaborate? Last time I checked it needed Kafka
  | to work.
 
___________________________________________________________________
(page generated 2022-03-09 23:00 UTC)