	[HN Gopher] Building a new vector based storage model
	___________________________________________________________________

	Author : bluestreak
	Score  : 51 points
	Date   : 2021-05-11 13:51 UTC (9 hours ago)

	web link (questdb.slab.com)
	w3m dump (questdb.slab.com)

	| bluestreak wrote:
	| We launched QuestDB last summer [1, 2]. Our storage model is
	| vector-based and append-only. This meant that all incoming data
	| had to arrive in the correct time order. This worked well for
	| some use cases, but we increasingly saw real-world cases where
	| data doesn't always land at the database in chronological order.
	| We saw plenty of developers and users come and go specifically
	| because of this technical limitation. So it became a priority to
	| deal with out-of-order data.
	|
	| The big decision was which direction to take to tackle the
	| problem. LSM trees seemed an obvious choice, but we chose an
	| alternative route so we wouldn't lose the performance we spent
	| years building. Our latest release supports out-of-order
	| ingestion by re-ordering data on the fly. That's what this
	| article is about.
	|
	| Also, we had many people asking about the differences between
	| QuestDB and other open-source databases and why users should
	| consider giving it a try instead of other systems. When we
	| launched on HN, readers showed a lot of interest in side-by-side
	| comparisons to other databases on the market. One suggestion [3]
	| that we thought would be great to try out was to benchmark
	| ingestion and query speeds using the Time Series Benchmark Suite
	| (TSBS) [4] developed by TimescaleDB. We're super excited to share
	| the results in the article.
	|
	| [1] https://news.ycombinator.com/item?id=23975807
	|
	| [2] https://news.ycombinator.com/item?id=23616878
	|
	| [3] https://news.ycombinator.com/item?id=23977183
	|
	| [4] https://github.com/timescale/tsbs

	| Darkphibre wrote:
	| Oh, this is fascinating. Seven years ago I architected a
	| true-realtime telemetry pipeline with end-to-end sequential
	| guarantees (with roundtrip times <200ms excluding network
	| latencies, and cloud processing times <20ms, leveraging
	| BOND/ProtocolBuffer over AMQP over Websocket). It's still used
	| by every 1st-party game for a large publisher.
	|
	| It allowed for non-windowed event sequence analytics, enabling
	| realtime feedback (think achievements that have multiple
	| conditions).
	|
	| And then the requirement was dropped, and (as you've found),
	| everyone just uses it like a standard telemetry stream and is
	| OK with 5-15min bins. :P
	|
	| I still have a passion for the space, will definitely be
	| reading up on this. I firmly believe this is the future of
	| telemetry analytics. Congratulations on your efforts seeing the
	| light of day!!
	|
	| Disclaimer, I currently work for Microsoft, all words here are
	| my own and do not necessarily reflect those of my employer,
	| etc. ;)

	| j1897 wrote:
	| Thanks for the kind words and your perspective!

	| [deleted]

	| alcio wrote:
	| Excited to see this new release. Seems to me this would
	| (slightly?) negatively impact query performance for recent data
	| (when the query concerns data that is in both the O3 and
	| persisted zones), is that the case?

	| bluestreak wrote:
	| Query performance would be affected insofar as ingest jobs
	| share the same thread pool as query jobs. As I am writing this
	| I am also realising that perhaps we should have an option to
	| separate these jobs... If we ignore resource usage and commit()
	| latency, query performance would remain unaffected. The reader
	| remains lockless and largely unchanged code-wise. One of our
	| major objectives was to maintain the data model as seen by the
	| readers. I hope I'm making sense here?

	| hartem_ wrote:
	| Congrats on the release! The benchmark results look really
	| impressive :).
	|
	| Curious to learn more about your approach to verifying the
	| correctness of the implementation. Did you try testing it with
	| Jepsen or something similar?

	| bluestreak wrote:
	| Thank you! We are not yet distributed. That's coming right up
	| along with Jepsen-style tests. We are really serious about
	| testing!

___________________________________________________________________
(page generated 2021-05-11 23:00 UTC)