[HN Gopher] HouseWatch: Open-source tool for monitoring and mana...
___________________________________________________________________
 
HouseWatch: Open-source tool for monitoring and managing ClickHouse
clusters
 
Author : yakkomajuri
Score  : 84 points
Date   : 2023-06-17 12:09 UTC (10 hours ago)
 
web link (github.com)
w3m dump (github.com)
 
| craigching wrote:
| Integrate this into Grafana as an app plugin and you'd have me. I
| don't want to leave Grafana, where I have all my other
| operational dashboards, for this.
 
| cplli wrote:
| Now we need a tool called NeighbourhoodWatch to monitor the
| cluster monitors.
 
  | DaiPlusPlus wrote:
  | [flagged]
 
  | donutshop wrote:
  | NeighborhoodWatch for US-based resources
 
| KevinChen6 wrote:
| [flagged]
 
  | Exuma wrote:
  | Define "Better"
 
    | KevinChen6 wrote:
    | MDX describes data through a multidimensional structure, so
    | the semantic model it presents is closer to the real
    | business, and that model supports more complex queries. SQL
    | models can provide similar capabilities, but achieving them
    | for complex queries can be laborious or even extremely
    | difficult. MDX also has a disadvantage compared to SQL:
    | thoroughly understanding the multidimensional data model
    | takes more learning effort than understanding the SQL table
    | model.
 
| JOnAgain wrote:
| Readme could link to or explain what clickhouse is, for those of
| us who might not know.
 
  | linuxdude314 wrote:
  | [flagged]
 
  | mdekkers wrote:
  | Clickhouse is a really cool and stupidly fast columnar database
 
    | schoolornot wrote:
    | I understand why OLAP writes are faster but is there any
    | reason why OLTPs can't achieve similar read performance with
    | denormalized and sharded data?
 
    | tempest_ wrote:
    | What problems can I solve with a columnar database?
    | 
    | What type of data benefits from that type of Database?
 
      | datatrashfire wrote:
      | Row-based databases are optimized for accessing complete
      | rows and for joins. Columnar storage is optimized for
      | accessing all, or many, column values across rows. This
      | makes aggregates and applying transformation logic faster
      | with columnar storage than with row-based storage, i.e.
      | they are great for data warehouses and other analytical
      | workloads.
      |
      | P.S. A great and still highly relevant resource covering
      | all the major database system designs, their advantages and
      | drawbacks: https://www.oreilly.com/library/view/designing-
      | data-intensiv...
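      |
      | As a rough illustration of that difference, here's a
      | minimal plain-Python sketch with made-up data (not how
      | ClickHouse lays things out internally):
      |
      |     # Row-oriented: each record is stored together, so an
      |     # aggregate over one field still walks whole records.
      |     rows = [
      |         {"ts": 1, "user": "a", "bytes": 120},
      |         {"ts": 2, "user": "b", "bytes": 300},
      |         {"ts": 3, "user": "a", "bytes": 80},
      |     ]
      |     total = sum(r["bytes"] for r in rows)
      |
      |     # Column-oriented: each column is stored contiguously,
      |     # so the same aggregate touches only the column it needs.
      |     cols = {
      |         "ts": [1, 2, 3],
      |         "user": ["a", "b", "a"],
      |         "bytes": [120, 300, 80],
      |     }
      |     total = sum(cols["bytes"])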
 
      | anonacct37 wrote:
      | This is an overly simplistic but also correct answer:
      | clickhouse was developed for analytics on clickstreams.
      | 
      | Technically the overall idea is that if you have lots of
      | queries that only read certain columns and your database
      | stores rows contiguously, it's a waste to read a whole row
      | and then discard columns.
      |
      | Also, compression (such as run-length, delta, or even zstd)
      | often works better if you give it a block of data that's
      | from one column (such as a timestamp or tag value).
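      |
      | A tiny sketch of that compression effect, in plain Python
      | with made-up data (real engines use per-column codecs like
      | delta + zstd rather than zlib):
      |
      |     import zlib
      |
      |     ts = list(range(1_700_000_000, 1_700_001_000))  # 1000 timestamps
      |
      |     # One column's values, stored contiguously.
      |     col = ",".join(str(t) for t in ts).encode()
      |     # The same values interleaved with other fields, row-style.
      |     row = ",".join(f"{t},user{t % 7},{t % 997}" for t in ts).encode()
      |
      |     for name, blob in [("column block", col), ("row block", row)]:
      |         packed = zlib.compress(blob)
      |         print(name, len(blob), "->", len(packed),
      |               f"({len(blob) / len(packed):.1f}x)")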
 
      | linuxdude314 wrote:
      | That's a longer subject than fits in a comment here.
      | 
      | If you are _actually_ interested, I suggest using Google to
      | find some good sites that go over what a column-oriented
      | database does/is used for.
      | 
      | This isn't hard; I'll get you started:
      | 
      | https://www.kdnuggets.com/2021/02/understanding-nosql-
      | databa...
 
        | Exuma wrote:
        | Or, you know, he could just ask, because that is the
        | spirit of discussion.
 
      | FridgeSeal wrote:
      | Less about the data itself and more about the specific
      | operations you want to do on it.
      | 
      | Large aggregations, massive datasets, large joins, and
      | workloads that are read-heavy and eschew row-level
      | mutations.
      | 
      | They get used for data analysis frequently, and time-series
      | data and the associated analysis mesh quite nicely too.
      | ClickHouse itself was originally built to support arbitrary
      | analytical queries on clickstream data at pretty massive
      | scale. Cloudflare uses it for live analytics, Uber uses it
      | for logs.
 
      | esafak wrote:
      | Columnar databases let you do fast aggregations and read
      | only the columns you are interested in. They are for
      | analyzing data.
 
      | cplli wrote:
      | I've tried it personally; it can handle logs nicely. And,
      | per their page, many more things:
      | 
      | https://clickhouse.com/use-cases
 
        | craigching wrote:
        | Uber wrote a blog post on using ClickHouse to store logs:
        | https://www.uber.com/blog/logging/
 
      | Dachande663 wrote:
      | Cloudflare use it to ingest 6M requests/s:
      | 
      | https://blog.cloudflare.com/http-analytics-
      | for-6m-requests-p...
 
        | jgrahamc wrote:
        | Way more than that now.
 
      | pjot wrote:
      | An oversimplification:
      | 
      | Columnar stores are optimized for reads. Row stores are
      | optimized for writes.
 
      | Exuma wrote:
      | Imagine you have a small business that tracks in the order
      | of 10's - 100's of millions of events (pageviews, clicks,
      | whatever), and you have reporting you want to run. Trying
      | to do this in PG/MySQL would likely need to use
      | materialized views so your reports don't take a long time
      | to run. You could store your event data in CH directly, or
      | use ELT/ETL process to sync/copy it into clickhouse just
      | for reporting. Then, your queries would be very fast. It's
      | must faster (for certain types of queries, mainly
      | timeseries queries or queries involving aggregation of many
      | rows). It's faster because of how the data is stored on
      | disk. It's NOT good for fetching/updating/deleting single
      | rows however.
      | 
      | It's originally designed to handle hundreds of columns, and
      | billions of rows, but I think it can still apply to much
      | smaller use cases that value performance. I'm implementing
      | it currently in a similar scenario, and I'm using AirByte
      | OSS version to ELT from postgres. Then I'm using tableau or
      | some other BI tool to analyze that data much more
      | effectively (I will be trying to perform complex
      | aggregations/group by reports on 100mm rows)
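      |
      | A minimal sketch of the reporting side, using the
      | clickhouse-connect Python client (the host, database,
      | table, and column names here are made up):
      |
      |     import clickhouse_connect
      |
      |     client = clickhouse_connect.get_client(host="localhost",
      |                                            port=8123)
      |
      |     # Daily event counts and unique users over the last 30
      |     # days; CH reads only the columns the query touches.
      |     result = client.query("""
      |         SELECT toStartOfDay(event_time) AS day,
      |                event_type,
      |                count() AS events,
      |                uniq(user_id) AS users
      |         FROM analytics.events
      |         WHERE event_time >= now() - INTERVAL 30 DAY
      |         GROUP BY day, event_type
      |         ORDER BY day
      |     """)
      |
      |     for day, event_type, events, users in result.result_rows:
      |         print(day, event_type, events, users)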
 
| ram_rar wrote:
| Love the tool, but it's not practical in the enterprise world to
| have yet another dashboard service to look at just for metrics.
| It would be great if this played well with Grafana or OTel
| collectors.
| 
| OTOH, monitoring long-running background jobs on a CH cluster is
| very valuable to have. It's a real pain to verify whether parent
| and child queries have executed correctly. I would suggest
| doubling down on features that users cannot readily get via
| Grafana or OTel.
 
  | nightpool wrote:
  | "not practical" for who? If you need to debug your clickhouse
  | clusters, you look at the clickhouse tool. That's it. This
  | isn't an alerting/monitoring solution, it's a specialized tool
  | for debugging and fixing issues with running clusters.
  | 
  | that kind of thinking (that it's too hard to learn a second
  | tool) is how datadog gets away with charging $$$$ for mediocre
  | versions of 10 different products that cost an order of
  | magnitude more than they would individually. the benefits you
  | get from combining everything into one tool are vastly
  | overstated compared to the benefits you get from having the in-
  | house expertise to use the right tool for the job.
 
___________________________________________________________________
(page generated 2023-06-17 23:01 UTC)