[HN Gopher] HouseWatch: Open-source tool for monitoring and mana...
___________________________________________________________________
 
HouseWatch: Open-source tool for monitoring and managing ClickHouse
clusters
 
Author : yakkomajuri
Score  : 84 points
Date   : 2023-06-17 12:09 UTC (10 hours ago)
 
web link (github.com)
w3m dump (github.com)
 
| craigching wrote:
| Integrate this into Grafana as an app plugin and you'd have me. I
| don't want to leave Grafana, where I have all my other
| operational dashboards, for this.
 
| cplli wrote:
| Now we need a tool called NeighbourhoodWatch to monitor the
| cluster monitors.
 
  | DaiPlusPlus wrote:
  | [flagged]
 
  | donutshop wrote:
  | NeighborhoodWatch for US-based resources
 
| KevinChen6 wrote:
| [flagged]
 
  | Exuma wrote:
  | Define "Better"
 
    | KevinChen6 wrote:
    | MDX describes data through a multidimensional structure, so
    | the semantic model it presents is closer to the real
    | business, and that model supports more complex queries. SQL
    | models can provide similar capabilities, but achieving them
    | for complex queries can be laborious or even extremely
    | difficult. MDX also has a disadvantage compared to SQL:
    | thoroughly understanding the multidimensional data model
    | takes more learning effort than understanding the SQL table
    | model.
 
| JOnAgain wrote:
| Readme could link to or explain what clickhouse is, for those of
| us who might not know.
 
  | linuxdude314 wrote:
  | [flagged]
 
  | mdekkers wrote:
  | Clickhouse is a really cool and stupidly fast columnar database
 
    | schoolornot wrote:
    | I understand why OLAP writes are faster but is there any
    | reason why OLTPs can't achieve similar read performance with
    | denormalized and sharded data?
 
    | tempest_ wrote:
    | What problems can I solve with a columnar database?
    | 
    | What type of data benefits from that type of Database?
 
      | datatrashfire wrote:
      | Row-based databases are optimized for accessing complete
      | rows and for joins. Columnar storage is optimized for
      | accessing all, or many, column values across rows. This
      | makes aggregates and applying transformation logic faster
      | with columnar storage than with row-based storage, i.e.
      | they are great for data warehouses and other analytical
      | workloads.
      |
      | P.S. A great and still highly relevant resource covering
      | all the major database system designs, their advantages and
      | drawbacks: https://www.oreilly.com/library/view/designing-
      | data-intensiv...
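      |
      | As a rough illustration of that difference, here's a
      | minimal plain-Python sketch with made-up data (not how
      | ClickHouse lays things out internally):
      |
      |     # Row-oriented: each record is stored together, so an
      |     # aggregate over one field still walks whole records.
      |     rows = [
      |         {"ts": 1, "user": "a", "bytes": 120},
      |         {"ts": 2, "user": "b", "bytes": 300},
      |         {"ts": 3, "user": "a", "bytes": 80},
      |     ]
      |     total = sum(r["bytes"] for r in rows)
      |
      |     # Column-oriented: each column is stored contiguously,
      |     # so the same aggregate touches only the column it needs.
      |     cols = {
      |         "ts": [1, 2, 3],
      |         "user": ["a", "b", "a"],
      |         "bytes": [120, 300, 80],
      |     }
      |     total = sum(cols["bytes"])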
 
      | anonacct37 wrote:
      | This is an overly simplistic but also correct answer:
      | clickhouse was developed for analytics on clickstreams.
      | 
      | Technically the overall idea is that if you have lots of
      | queries that only read certain columns and your database
      | stores rows contiguously, it's a waste to read a whole row
      | and then discard columns.
      |
      | Also, compression (such as run-length, delta, or even zstd)
      | often works better if you give it a block of data that's
      | from one column (such as a timestamp or tag value).
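      |
      | A tiny sketch of that compression effect, in plain Python
      | with made-up data (real engines use per-column codecs like
      | delta + zstd rather than zlib):
      |
      |     import zlib
      |
      |     ts = list(range(1_700_000_000, 1_700_001_000))  # 1000 timestamps
      |
      |     # One column's values, stored contiguously.
      |     col = ",".join(str(t) for t in ts).encode()
      |     # The same values interleaved with other fields, row-style.
      |     row = ",".join(f"{t},user{t % 7},{t % 997}" for t in ts).encode()
      |
      |     for name, blob in [("column block", col), ("row block", row)]:
      |         packed = zlib.compress(blob)
      |         print(name, len(blob), "->", len(packed),
      |               f"({len(blob) / len(packed):.1f}x)")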
 
      | linuxdude314 wrote:
      | That's a longer subject than fits in a comment here.
      | 
      | If you are _actually_ interested, I suggest using Google to
      | find some good sites that go over what a column-oriented
      | database does/is used for.
      | 
      | This isn't hard; I'll get you started:
      | 
      | https://www.kdnuggets.com/2021/02/understanding-nosql-
      | databa...
 
        | Exuma wrote:
        | Or, you know, he could just ask, because that is the
        | spirit of discussion.
 
      | FridgeSeal wrote:
      | Less about the data itself and more about the specific
      | operations you want to do on it.
      | 
      | Large aggregations, massive datasets, large joins, and
      | workloads that are read-heavy and eschew row-level
      | mutations.
      | 
      | They get used for data analysis frequently, and time-series
      | data and the associated analysis mesh quite nicely too.
      | ClickHouse itself was originally built to support arbitrary
      | analytical queries on clickstream data at pretty massive
      | scale. Cloudflare uses it for live analytics, Uber uses it
      | for logs.
 
      | esafak wrote:
      | Columnar databases let you do fast aggregations and read
      | only the columns you are interested in. They are for
      | analyzing data.
 
      | cplli wrote:
      | I've tried it personally; it can handle logs nicely. And,
      | per their page, many more things:
      | 
      | https://clickhouse.com/use-cases
 
        | craigching wrote:
        | Uber wrote a blog post on using ClickHouse to store logs:
        | https://www.uber.com/blog/logging/
 
      | Dachande663 wrote:
      | Cloudflare use it to ingest 6M requests/s:
      | 
      | https://blog.cloudflare.com/http-analytics-
      | for-6m-requests-p...
 
        | jgrahamc wrote:
        | Way more than that now.
 
      | pjot wrote:
      | An oversimplification:
      | 
      | Columnar stores are optimized for reads. Row stores are
      | optimized for writes.
 
      | Exuma wrote:
      | Imagine you have a small business that tracks in the order
      | of 10's - 100's of millions of events (pageviews, clicks,
      | whatever), and you have reporting you want to run. Trying
      | to do this in PG/MySQL would likely need to use
      | materialized views so your reports don't take a long time
      | to run. You could store your event data in CH directly, or
      | use ELT/ETL process to sync/copy it into clickhouse just
      | for reporting. Then, your queries would be very fast. It's
      | must faster (for certain types of queries, mainly
      | timeseries queries or queries involving aggregation of many
      | rows). It's faster because of how the data is stored on
      | disk. It's NOT good for fetching/updating/deleting single
      | rows however.
      | 
      | It's originally designed to handle hundreds of columns, and
      | billions of rows, but I think it can still apply to much
      | smaller use cases that value performance. I'm implementing
      | it currently in a similar scenario, and I'm using AirByte
      | OSS version to ELT from postgres. Then I'm using tableau or
      | some other BI tool to analyze that data much more
      | effectively (I will be trying to perform complex
      | aggregations/group by reports on 100mm rows)
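      |
      | A minimal sketch of the reporting side, using the
      | clickhouse-connect Python client (the host, database,
      | table, and column names here are made up):
      |
      |     import clickhouse_connect
      |
      |     client = clickhouse_connect.get_client(host="localhost",
      |                                            port=8123)
      |
      |     # Daily event counts and unique users over the last 30
      |     # days; CH reads only the columns the query touches.
      |     result = client.query("""
      |         SELECT toStartOfDay(event_time) AS day,
      |                event_type,
      |                count() AS events,
      |                uniq(user_id) AS users
      |         FROM analytics.events
      |         WHERE event_time >= now() - INTERVAL 30 DAY
      |         GROUP BY day, event_type
      |         ORDER BY day
      |     """)
      |
      |     for day, event_type, events, users in result.result_rows:
      |         print(day, event_type, events, users)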
 
| ram_rar wrote:
| Love the tool, but it's not practical in the enterprise world to
| have yet another dashboard service to look at just for metrics.
| It would be great if this played well with Grafana or OTel
| collectors.
| 
| OTOH, monitoring long-running background jobs on a CH cluster is
| very valuable to have. It's a real pain to verify whether parent
| and child queries have executed correctly. I would suggest
| doubling down on features that users cannot readily get via
| Grafana or OTel.
 
  | nightpool wrote:
  | "not practical" for who? If you need to debug your clickhouse
  | clusters, you look at the clickhouse tool. That's it. This
  | isn't an alerting/monitoring solution, it's a specialized tool
  | for debugging and fixing issues with running clusters.
  | 
  | that kind of thinking (that it's too hard to learn a second
  | tool) is how datadog gets away with charging $$$$ for mediocre
  | versions of 10 different products that cost an order of
  | magnitude more than they would individually. the benefits you
  | get from combining everything into one tool are vastly
  | overstated compared to the benefits you get from having the in-
  | house expertise to use the right tool for the job.
 
___________________________________________________________________
(page generated 2023-06-17 23:01 UTC)