[HN Gopher] Building a new vector based storage model
___________________________________________________________________
 
Building a new vector based storage model
 
Author : bluestreak
Score  : 51 points
Date   : 2021-05-11 13:51 UTC (9 hours ago)
 
web link (questdb.slab.com)
w3m dump (questdb.slab.com)
 
| bluestreak wrote:
| We launched QuestDB last summer [1, 2]. Our storage model is
| vector-based and append-only. This meant that all incoming data
| had to arrive in the correct time order. This worked well for
| some use cases but we increasingly saw real-world cases where
| data doesn't always land at the database in chronological order.
| We saw plenty of developers and users come and go specifically
| because of this technical limitation. So it became a priority to
| deal with out-of-order data.
| 
| The big decision was which direction to take to tackle the
| problem. LSM trees seemed an obvious choice, but we chose an
| alternative route so we wouldn't lose the performance we spent
| years building. Our latest release supports out-of-order
| ingestion by re-ordering data on the fly. That's what this
| article is about.
| 
| Also, we had many people asking about the differences between
| QuestDB and other open-source databases and why users should
| consider giving it a try instead of other systems. When we
| launched on HN, readers showed a lot of interest in side-by-side
| comparisons to other databases on the market. One suggestion [3]
| that we thought would be great to try out was to benchmark
| ingestion and query speeds using the Time Series Benchmark Suite
| (TSBS) [4] developed by TimescaleDB. We're super excited to share
| the results in the article.
| 
| [1] https://news.ycombinator.com/item?id=23975807
| 
| [2] https://news.ycombinator.com/item?id=23616878
| 
| [3] https://news.ycombinator.com/item?id=23977183
| 
| [4] https://github.com/timescale/tsbs
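
To make "re-ordering data on the fly" in the comment above concrete, here is a minimal sketch of the general idea: stage incoming rows, sort the staged batch by timestamp, then either append it or merge it with the rows already stored in time order. This is an illustration only, not QuestDB's actual implementation; the `Row` shape and the `commit` function are hypothetical names chosen for the example.

```python
import heapq
from typing import List, Tuple

# Hypothetical row shape for illustration: (timestamp, value).
Row = Tuple[int, float]


def commit(persisted: List[Row], staged: List[Row]) -> List[Row]:
    """Return persisted rows plus staged rows, ordered by timestamp.

    `persisted` is assumed to be sorted by timestamp already;
    `staged` may arrive in any order.
    """
    if not staged:
        return list(persisted)

    # sorted() is stable, so rows sharing a timestamp keep arrival order.
    batch = sorted(staged, key=lambda row: row[0])

    # Fast path: the whole batch is at or after the last stored
    # timestamp, so this is a plain append.
    if not persisted or batch[0][0] >= persisted[-1][0]:
        return list(persisted) + batch

    # Slow path: the batch overlaps stored data, so merge two sorted runs.
    return list(heapq.merge(persisted, batch, key=lambda row: row[0]))


stored = [(1, 1.0), (3, 3.0), (5, 5.0)]
late = [(6, 6.0), (2, 2.0), (4, 4.0)]
merged = commit(stored, late)
```

The fast path keeps in-order ingestion as cheap as an append; only batches that overlap stored data pay for the merge.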
 
  | Darkphibre wrote:
  | Oh, this is fascinating. Seven years ago I architected a true-
  | realtime telemetry pipeline with end-to-end sequential
  | guarantees (with roundtrip times <200ms excluding network
  | latencies, and cloud processing times <20ms, leveraging
  | BOND/ProtocolBuffer over AMQP over Websocket). It's still used
  | by every 1st-party game for a large publisher.
  | 
  | It allowed for non-windowed event sequence analytics, enabling
  | realtime feedback (think achievements that have multiple
  | conditions).
  | 
  | And then the requirement was dropped, and (as you've found),
  | everyone just uses it like a standard telemetry stream and is
  | OK with 5-15min bins. :P
  | 
  | I still have a passion for the space, and will definitely be
  | reading up on this. I firmly believe this is the future of
  | telemetry analytics. Congratulations on your efforts seeing the
  | light of day!!
  | 
  | Disclaimer, I currently work for Microsoft, all words here are
  | my own and do not necessarily reflect those of my employer,
  | etc. ;)
 
    | j1897 wrote:
    | Thanks for the kind words and your perspective!
 
  | [deleted]
 
| alcio wrote:
| Excited to see this new release. Seems to me this would
| (slightly?) negatively impact query performance for recent data
| (when the query touches data in both the O3 and persisted
| zones), is that the case?
 
  | bluestreak wrote:
  | Query performance would be affected insofar as ingest jobs
  | share the same thread pool as query jobs. As I am writing this
  | I am also realising that perhaps we should have an option to
  | separate these jobs... If we ignore resource usage and commit()
  | latency, query performance would remain unaffected. The reader
  | remains lockless and largely unchanged code-wise. One of our
  | major objectives was to maintain the data model as seen by the
  | readers. I hope I'm making sense here?
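
The option floated in the reply above, running ingest jobs and query jobs on separate thread pools, could look roughly like the sketch below. It is a generic illustration and does not reflect QuestDB's code or configuration; `JobPools`, its parameters, and `current_worker` are hypothetical names.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor


def current_worker() -> str:
    """Return the name of the thread running the job."""
    return threading.current_thread().name


class JobPools:
    """Hypothetical holder for ingest and query worker pools.

    With shared=True both job types compete for the same workers;
    with shared=False (the default here) ingest work cannot starve
    queries of threads.
    """

    def __init__(self, ingest_workers: int = 2, query_workers: int = 2,
                 shared: bool = False) -> None:
        self.query_pool = ThreadPoolExecutor(
            max_workers=query_workers, thread_name_prefix="query")
        if shared:
            self.ingest_pool = self.query_pool
        else:
            self.ingest_pool = ThreadPoolExecutor(
                max_workers=ingest_workers, thread_name_prefix="ingest")

    def submit_ingest(self, fn, *args) -> Future:
        return self.ingest_pool.submit(fn, *args)

    def submit_query(self, fn, *args) -> Future:
        return self.query_pool.submit(fn, *args)

    def shutdown(self) -> None:
        self.query_pool.shutdown(wait=True)
        if self.ingest_pool is not self.query_pool:
            self.ingest_pool.shutdown(wait=True)


pools = JobPools()
ingest_thread = pools.submit_ingest(current_worker).result()
query_thread = pools.submit_query(current_worker).result()
pools.shutdown()
```

Separate pools isolate threads only; both job types still share CPU, memory bandwidth, and disk, so resource usage can still affect queries.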
 
| hartem_ wrote:
| Congrats on the release! The benchmark results look really
| impressive :).
| 
| Curious to learn more about your approach to verifying the
| correctness of the implementation. Did you try testing it with
| Jepsen or something similar?
 
  | bluestreak wrote:
  | Thank you! We are not yet distributed. That's coming right up
  | along with Jepsen-style tests. We are really serious about
  | testing!
 
___________________________________________________________________
(page generated 2021-05-11 23:00 UTC)