|
| craigching wrote:
| Integrate this into Grafana as an app plugin and you'd have me. I
| don't want to leave Grafana where I have all my other operational
| dashboards for this.
| cplli wrote:
| Now we need a tool called NeighbourhoodWatch to monitor the
| cluster monitors.
| DaiPlusPlus wrote:
| [flagged]
| donutshop wrote:
| NeighborhoodWatch for US based resources
| KevinChen6 wrote:
| [flagged]
| Exuma wrote:
| Define "Better"
| KevinChen6 wrote:
| MDX describes data through a multidimensional structure, which
| keeps the semantic model closer to the real business and makes
| complex queries easier to express on top of that model. SQL
| can provide similar capabilities, but complex queries can be
| laborious or even extremely difficult to write. MDX also has a
| disadvantage compared to SQL: thoroughly understanding the
| multidimensional data model takes more learning than
| understanding the SQL table model.
| JOnAgain wrote:
| Readme could link to or explain what clickhouse is, for those of
| us who might not know.
| linuxdude314 wrote:
| [flagged]
| mdekkers wrote:
| Clickhouse is a really cool and stupidly fast columnar database
| schoolornot wrote:
| I understand why OLAP writes are faster but is there any
| reason why OLTPs can't achieve similar read performance with
| denormalized and sharded data?
| tempest_ wrote:
| What problems can I solve with a columnar database?
|
| What type of data benefits from that type of Database?
| datatrashfire wrote:
| Row based databases are optimized for accessing complete
| rows and joins. Columnar storage is optimized for accessing
| all, or many column values across rows. This makes
| aggregates and applying transformation logic faster with
| columnar storage than row based storage. I.e., they are great
| for data warehouses and other analytical workloads.
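|
| A minimal sketch of the layout difference described above,
| assuming a toy three-row table. The column names and values
| are hypothetical and are not taken from any real system; real
| engines work on disk blocks, not Python lists.

```python
# Hypothetical toy table; the column names are illustrative only.
rows = [
    {"user_id": 1, "country": "US", "latency_ms": 120},
    {"user_id": 2, "country": "DE", "latency_ms": 95},
    {"user_id": 3, "country": "US", "latency_ms": 210},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list per column."""
    columns = {name: [] for name in rows[0]}
    for row in rows:
        for name, value in row.items():
            columns[name].append(value)
    return columns


columns = to_columns(rows)

# Row layout: the aggregate has to walk every complete row,
# even though it only needs one field from each.
row_avg = sum(row["latency_ms"] for row in rows) / len(rows)

# Column layout: the aggregate reads a single contiguous list
# and never touches user_id or country.
col_avg = sum(columns["latency_ms"]) / len(columns["latency_ms"])
```

| Both averages are the same; the difference is how much
| unrelated data each layout has to pass over to compute them.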
|
| Ps, great and still highly relevant resource covering all
| the major database system designs, their advantages and
| drawbacks: https://www.oreilly.com/library/view/designing-
| data-intensiv...
| anonacct37 wrote:
| This is an overly simplistic but also correct answer:
| clickhouse was developed for analytics on clickstreams.
|
| Technically the overall idea is that if you have lots of
| queries that only read certain columns and your database
| stores rows contiguously it's a waste to read a whole row
| and then discard columns.
|
| Also compression (such as run length or delta or even zstd)
| often works better if you give it a block of data that's
| from one column (such as a timestamp or tag value).
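|
| A rough sketch of why a single column compresses well,
| assuming a hypothetical tag column and a timestamp column
| sampled every 10 seconds. The helper names are made up for
| illustration and this is not how ClickHouse implements its
| codecs.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def run_length_decode(pairs):
    """Expand (value, count) pairs back into the original list."""
    return [value for value, count in pairs for _ in range(count)]


def delta_encode(values):
    """Keep the first value, then store each difference to the previous one."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Rebuild the original values with a running sum over the deltas."""
    out = []
    total = 0
    for delta in deltas:
        total += delta
        out.append(total)
    return out


# Hypothetical column data: a repetitive tag and evenly spaced timestamps.
tags = ["web"] * 4 + ["api"] * 3
timestamps = [1686000000 + 10 * i for i in range(7)]

encoded_tags = run_length_encode(tags)   # [("web", 4), ("api", 3)]
encoded_ts = delta_encode(timestamps)    # [1686000000, 10, 10, 10, 10, 10, 10]
```

| Seven tag values shrink to two pairs, and the timestamps turn
| into small repeated deltas that a general-purpose compressor
| handles easily. Interleaving the two columns row by row would
| break up both patterns.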
| linuxdude314 wrote:
| That's a longer subject than fits in a comment here.
|
| If you are _actually_ interested I suggest using google
| search to find some good sites that go over what a column
| oriented database does/is used for.
|
| This isn't hard; I'll get you started:
|
| https://www.kdnuggets.com/2021/02/understanding-nosql-
| databa...
| Exuma wrote:
| Or he, you know, could just ask, because that is the
| spirit of discussion.
| FridgeSeal wrote:
| Less about the data itself and more about the specific
| operations you want to do on it.
|
| Large aggregations, massive datasets, large joins, and
| workloads that are read heavy and eschew row-level
| mutations.
|
| They get used for data analysis frequently; time series
| data and associated analysis mesh quite nicely too.
| ClickHouse itself was originally built to support arbitrary
| analytical queries on clickstream data at pretty massive
| scale. Cloudflare uses it for live analytics, Uber uses it
| for logs.
| esafak wrote:
| Columnar databases let you do fast aggregations and read
| only the columns you are interested in. They are for
| analyzing data.
| cplli wrote:
| Personally tried it, it can handle logs nicely. And from
| their page, many more things
|
| https://clickhouse.com/use-cases
| craigching wrote:
| Uber wrote a blog on using Clickhouse to store logs:
| https://www.uber.com/blog/logging/
| Dachande663 wrote:
| Cloudflare uses it to ingest 6M requests per second
|
| https://blog.cloudflare.com/http-analytics-
| for-6m-requests-p...
| jgrahamc wrote:
| Way more than that now.
| pjot wrote:
| An oversimplification:
|
| Columnar stores are optimized for reads. Row stores are
| optimized for writes.
| Exuma wrote:
| Imagine you have a small business that tracks on the order
| of tens to hundreds of millions of events (pageviews, clicks,
| whatever), and you have reporting you want to run. Doing
| this in PG/MySQL would likely require materialized views so
| your reports don't take a long time to run. You could store
| your event data in CH directly, or use an ELT/ETL process to
| sync/copy it into ClickHouse just for reporting. Then your
| queries would be very fast. It's much faster (for certain
| types of queries, mainly timeseries queries or queries
| involving aggregation of many rows). It's faster because of
| how the data is stored on disk. It's NOT good for
| fetching/updating/deleting single rows, however.
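|
| A toy illustration of that trade-off, assuming a hypothetical
| event table held as one list per column. The table contents
| and function names are invented for the example and say
| nothing about ClickHouse internals.

```python
from collections import Counter

# Hypothetical event table stored as one list per column.
events = {
    "day": ["2023-06-15", "2023-06-15", "2023-06-16", "2023-06-16", "2023-06-16"],
    "kind": ["pageview", "click", "pageview", "pageview", "click"],
    "user": [1, 1, 2, 3, 2],
}


def pageviews_per_day(table):
    """Report query: scans only the two columns it needs."""
    counts = Counter()
    for day, kind in zip(table["day"], table["kind"]):
        if kind == "pageview":
            counts[day] += 1
    return dict(counts)


def fetch_row(table, index):
    """Point lookup: has to visit every column to rebuild one row."""
    return {name: values[index] for name, values in table.items()}


def delete_row(table, index):
    """Single-row delete: every column list has to be modified."""
    for values in table.values():
        del values[index]
```

| The report never reads the user column, while fetching or
| deleting a single row touches all of them. With hundreds of
| columns that gap grows in both directions.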
|
| It's originally designed to handle hundreds of columns, and
| billions of rows, but I think it can still apply to much
| smaller use cases that value performance. I'm implementing
| it currently in a similar scenario, and I'm using the Airbyte
| OSS version to ELT from Postgres. Then I'm using Tableau or
| some other BI tool to analyze that data much more
| effectively (I will be trying to perform complex
| aggregations/group by reports on 100M rows).
| ram_rar wrote:
| Love the tool, but it's not practical in the enterprise world to
| have yet another dashboard service to look at just for metrics.
| It would be great if this played well with Grafana or OTel
| collectors.
|
| OTOH, monitoring long running background jobs on a CH cluster is
| very valuable to have. It's a real pain to verify whether parent
| and child queries have executed correctly. I would suggest
| doubling down on features that users cannot readily get via
| Grafana or OTel.
| nightpool wrote:
| "not practical" for who? If you need to debug your clickhouse
| clusters, you look at the clickhouse tool. That's it. This
| isn't an alerting/monitoring solution, it's a specialized tool
| for debugging and fixing issues with running clusters.
|
| that kind of thinking (that it's too hard to learn a second
| tool) is how datadog gets away with charging $$$$ for mediocre
| versions of 10 different products that cost an order of
| magnitude more than they would individually. the benefits you
| get from combining everything into one tool are vastly
| overstated compared to the benefits you get from having the in-
| house expertise to use the right tool for the job.
___________________________________________________________________
(page generated 2023-06-17 23:01 UTC) |