|
| bluestreak wrote:
| We launched QuestDB last summer [1, 2]. Our storage model is
| vector-based and append-only. This meant that all incoming data
| had to arrive in the correct time order. This worked well for
| some use cases but we increasingly saw real-world cases where
| data doesn't always land at the database in chronological order.
| We saw plenty of developers and users come and go specifically
| because of this technical limitation. So it became a priority to
| deal with out-of-order data.
|
| The big decision was which direction to take to tackle the
| problem. LSM trees seemed an obvious choice, but we chose an
| alternative route so we wouldn't lose the performance we spent
| years building. Our latest release supports out-of-order
| ingestion by re-ordering data on the fly. That's what this
| article is about.
|
| Also, we had many people asking about the differences between
| QuestDB and other open-source databases and why users should
| consider giving it a try instead of other systems. When we
| launched on HN, readers showed a lot of interest in side-by-side
| comparisons to other databases on the market. One suggestion [3]
| that we thought would be great to try out was to benchmark
| ingestion and query speeds using the Time Series Benchmark Suite
| (TSBS) [4] developed by TimescaleDB. We're super excited to share
| the results in the article.
|
| [1] https://news.ycombinator.com/item?id=23975807
|
| [2] https://news.ycombinator.com/item?id=23616878
|
| [3] https://news.ycombinator.com/item?id=23977183
|
| [4] https://github.com/timescale/tsbs
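
The idea in the post above, keeping storage append-only and time-ordered while re-ordering late rows at commit time, can be illustrated with a minimal sketch. This is not QuestDB's implementation: the function name `commit_batch`, the in-memory list standing in for a partition, and the row layout `(timestamp, value)` are all assumptions made for illustration. In-order batches take a plain append path; an out-of-order batch is sorted and merged only with the tail of stored rows it overlaps.

```python
from bisect import bisect_right
from heapq import merge


def commit_batch(stored, batch):
    """Merge `batch` into `stored`, keeping `stored` sorted by timestamp.

    Hypothetical sketch: `stored` is a list of (timestamp, value) rows that
    is already sorted; `batch` may arrive in any order.
    """
    if not batch:
        return stored

    # Sort the incoming rows by timestamp (stable, so ties keep arrival order).
    batch = sorted(batch, key=lambda row: row[0])

    # Fast path: the whole batch is at or after the last stored timestamp,
    # so a plain append keeps the data ordered.
    if not stored or batch[0][0] >= stored[-1][0]:
        stored.extend(batch)
        return stored

    # Slow path: rewrite only the tail of `stored` that overlaps the batch.
    timestamps = [ts for ts, _ in stored]
    split = bisect_right(timestamps, batch[0][0])
    tail = stored[split:]
    del stored[split:]
    stored.extend(merge(tail, batch, key=lambda row: row[0]))
    return stored


rows = [(1, "a"), (2, "b"), (5, "c")]
commit_batch(rows, [(4, "y"), (3, "x")])  # late rows get merged into place
commit_batch(rows, [(6, "d")])            # in-order row takes the append path
print(rows)
# [(1, 'a'), (2, 'b'), (3, 'x'), (4, 'y'), (5, 'c'), (6, 'd')]
```

The point of the split is that rows older than the earliest late arrival are never touched, so the cost of a commit depends on how far back the late data reaches rather than on the total table size.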
| Darkphibre wrote:
| Oh, this is fascinating. Seven years ago I architected a
| true-realtime telemetry pipeline with end-to-end sequential
| guarantees (with roundtrip times <200ms excluding network
| latencies, and cloud processing times <20ms, leveraging
| BOND/ProtocolBuffer over AMQP over Websocket). It's still used
| by every 1st-party game for a large publisher.
|
| It allowed for non-windowed event sequence analytics, enabling
| realtime feedback (think achievements that have multiple
| conditions).
|
| And then the requirement was dropped, and (as you've found),
| everyone just uses it like a standard telemetry stream and is
| OK with 5-15min bins. :P
|
| I still have a passion for the space, will definitely be
| reading up on this. I firmly believe this is the future of
| telemetry analytics. Congratulations on your efforts seeing the
| light of day!!
|
| Disclaimer, I currently work for Microsoft, all words here are
| my own and do not necessarily reflect those of my employer,
| etc. ;)
| j1897 wrote:
| Thanks for the kind words and your perspective!
| [deleted]
| alcio wrote:
| Excited to see this new release. Seems to me this would
| (slightly?) negatively impact query performance for recent data
| (when the queried data is in both the O3 and persisted zones),
| is that the case?
| bluestreak wrote:
| Query performance would be affected insofar as ingest jobs
| share the same thread pool as query jobs. As I am writing this
| I am also realising that perhaps we should have an option to
| separate these jobs... If we ignore resource usage and commit()
| latency, query performance would remain unaffected. The reader
| remains lockless and largely unchanged code-wise. Maintaining
| the data model as seen by the readers was one of our major
| objectives. I hope I'm making sense here?
| hartem_ wrote:
| Congrats on the release! The benchmark results look really
| impressive :).
|
| Curious to learn more about your approach to verifying the
| correctness of the implementation. Did you try testing it with
| Jepsen or something similar?
| bluestreak wrote:
| Thank you! We are not yet distributed. That's coming right up
| along with Jepsen-style tests. We are really serious about
| testing!
___________________________________________________________________
(page generated 2021-05-11 23:00 UTC) |