[HN Gopher] Hutter Prize for compressing human knowledge
___________________________________________________________________
 
Hutter Prize for compressing human knowledge
 
Author : kelseyfrog
Score  : 45 points
Date   : 2023-09-13 22:03 UTC (56 minutes ago)
 
web link (prize.hutter1.net)
w3m dump (prize.hutter1.net)
 
| TheRealPomax wrote:
| Q: Why do you restrict to a single CPU core and exclude GPUs?
| A: The primary intention is to limit compute and memory to some
| generally available amount in a transparent, easy, fair, and
| measurable way. 100 hours on one i7 core with 10GB RAM seems to
| get sufficiently close to this ideal
| 
| Sorry, who are these people that don't have a GPU? Even laptops
| have GPUs. Why would you spend 100 hours on an "i7" (which
| generation? 4790K or six times faster 13700k?) CPU when you can
| achieve orders of magnitude better performance on a consumer GPU
| that literally everyone has access to?
 
  | lucb1e wrote:
  | Note that the competition is close to 20 years old.
  | 
  | ...though I also had a GPU in 2006, so idk. Then again, you
  | need to define _something_ as reference hardware, and it doesn't
  | really matter what it is. Better compression should win out
  | over less-good compression whether you run both on a
  | 100-core system or a 1-core system, I think?
 
    | TheRealPomax wrote:
    | In the category "then update your FAQ, you've had many, many
    | years to do so" =D
    | 
    | (Not to change the rules, but to explain why the rules
    | _haven't_ changed. Level playing fields are a worthwhile
    | pursuit.)
 
  | caseyavila wrote:
  | I do think it's interesting that recent submissions use nearly
  | the entire 50 hours. I wonder how much better people could do
  | if faster hardware was allowed.
 
| dang wrote:
| Related. Others?
| 
|  _Saurabh Kumar's fast-cmix wins EUR5187 Hutter Prize Award_ -
| https://news.ycombinator.com/item?id=36839446 - July 2023 (1
| comment)
| 
|  _Hutter Prize Submission 2021a: STARLIT and cmix (2021)_ -
| https://news.ycombinator.com/item?id=36745104 - July 2023 (1
| comment)
| 
|  _Hutter Prize Entry: Saurabh Kumar's "Fast Cmix" Starts 30 Day
| Comment Period_ - https://news.ycombinator.com/item?id=36154813 -
| June 2023 (5 comments)
| 
|  _Hutter Prize_ - https://news.ycombinator.com/item?id=33046194 -
| Oct 2022 (3 comments)
| 
|  _Hutter Prize_ - https://news.ycombinator.com/item?id=26562212 -
| March 2021 (48 comments)
| 
|  _500'000EUR Prize for Compressing Human Knowledge_ -
| https://news.ycombinator.com/item?id=22431251 - Feb 2020 (1
| comment)
| 
|  _Hutter Prize expanded by a factor of 10_ -
| https://news.ycombinator.com/item?id=22388359 - Feb 2020 (2
| comments)
| 
|  _Hutter Prize: up to 50k EUR for the best compression algorithm_
| - https://news.ycombinator.com/item?id=21903594 - Dec 2019 (2
| comments)
| 
|  _Hutter Prize: Compress a 100MB file to less than the current
| record of 16 MB_ - https://news.ycombinator.com/item?id=20669827
| - Aug 2019 (101 comments)
| 
|  _New Hutter Prize submission - 8 years since previous winner_ -
| https://news.ycombinator.com/item?id=14478373 - June 2017 (1
| comment)
| 
|  _Hutter Prize for Compressing Human Knowledge_ -
| https://news.ycombinator.com/item?id=7405129 - March 2014 (24
| comments)
| 
|  _Build a human-level AI by compressing Wikipedia_ -
| https://news.ycombinator.com/item?id=143704 - March 2008 (4
| comments)
 
| slashdev wrote:
| I think the mistake here is to require lossless compression.
| 
| Humans and LLMs only do lossy compression. I think lossy
| compression might be more critical to intelligence. The ability
| to forget, change your synapses or weights, is crucial to being
| able to adapt to change.
 
  | version_five wrote:
  | Yeah it makes no sense to say it's inspired by intelligence and
  | then require lossless which is definitionally rote work and not
  | intelligent.
 
    | whimsicalism wrote:
    | Not true, a smart model could be really good at lossy
    | compression and then you only have to store a small delta to
    | make it lossless.
 
      | ClassyJacket wrote:
      | I'm no mathematician but I don't believe this is true.
      | Lossless information encoding requires _all_ the original
      | information to be present.
 
        | AnotherGoodName wrote:
        | Arithmetic coding allows you to make a prediction and
        | only spend bits on corrections.
        | 
        | Have the decompressor predict the next symbol based on
        | what it has decoded so far (a statistical prediction of
        | the next symbol is lossy, as it won't always be correct).
        | If the prediction is correct, you need to spend very
        | little to confirm that. If it's incorrect, you need to
        | spend more data to correct it. Arithmetic coding is the
        | best way to make this work.
        | 
        | It's also been used by all winning entries of the Hutter
        | Prize so far.
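
The "predict, then pay only for the surprise" idea above can be made concrete by counting bits. The sketch below is a toy, not how real entries work: it assumes a simple adaptive order-0 byte model with add-one smoothing (real Hutter Prize entries such as cmix mix many far richer context models), and it computes the ideal code length, the sum of -log2 p over each byte, rather than implementing the arithmetic coder itself. An arithmetic coder driven by the same model would emit roughly this many bits plus a small constant overhead.

```python
import math
from collections import Counter


def ideal_code_length_bits(data: bytes) -> float:
    """Approximate output size (in bits) of an arithmetic coder driven by a
    toy adaptive order-0 byte model (an illustrative assumption, not cmix).

    Encoder and decoder run the same model, so both see the same prediction p
    before each byte. A well-predicted byte (p near 1) costs almost nothing;
    a surprising byte (p small) costs many bits.
    """
    counts = Counter()  # how often each byte value has been seen so far
    seen = 0
    total_bits = 0.0
    for byte in data:
        # Add-one (Laplace) smoothing: every byte value keeps a nonzero
        # probability, so an unexpected byte can still be coded, just expensively.
        p = (counts[byte] + 1) / (seen + 256)
        total_bits += -math.log2(p)
        # Update the model only after "coding" the byte, exactly as a decoder would.
        counts[byte] += 1
        seen += 1
    return total_bits


if __name__ == "__main__":
    predictable = b"a" * 1000
    varied = bytes(range(256)) * 4
    for name, data in [("predictable", predictable), ("varied", varied)]:
        bits = ideal_code_length_bits(data)
        print(f"{name}: {len(data) * 8} raw bits -> ~{bits:.0f} coded bits")
```

The "varied" input is a perfectly regular pattern that a model conditioning on the previous byte would predict almost exactly, yet this order-0 model cannot exploit it. That gap is the point of the thread: the better the model's predictions, the fewer correction bits are needed, and the result stays lossless.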
 
        | glitchc wrote:
        | Or at least reproducible. It could still be compressed.
 
        | vladf wrote:
        | What
 
      | AnotherGoodName wrote:
      | That's literally arithmetic coding which is used by all
      | winning entries in the above so far.
 
  | sytelus wrote:
  | Humans can do lossy or lossless. There are plenty of people who
  | can recite the entire Bible or Koran flawlessly.
 
    | kadoban wrote:
    | That's true, but it seems unlikely that that's a particularly
    | important part of intelligence. The vast majority of people
    | do _not_ do that type of memorization, are they still
    | intelligent?
 
    | anonylizard wrote:
    | Many can recite the Koran flawlessly; it's short and heavily
    | encouraged in education through rote repetition.
    | 
    | Far fewer can recite the Bible; it's many times longer.
    | 
    | LLMs can also recite the Bible and Koran flawlessly, given
    | how frequently the text appears in their training material.
 
    | TheRealPomax wrote:
    | This is more the equivalent of asking humans to create an
    | exact copy of the text, typesetting and all, including the
    | publishing information, page numbers, and exact linebreaks.
    | Not just recite the text, which would be a lossy encoding of
    | the original.
    | 
    | Humans are _terrible_ at lossless encoding of information; it's
    | what we invented machines for =D
 
    | Supply5411 wrote:
    | And there are humans that can jump 8ft in the air. Doesn't
    | mean it's correct to say that "humans can jump 8ft in the
    | air." Very few people are regurgitating verbatim information.
 
  | mik1998 wrote:
  | Lossy text compression has little utility.
 
    | JumpCrisscross wrote:
    | > _Lossy text compression has little utility_
    | 
    | You're describing every book you've ever read and learned
    | from.
 
| TheAlchemist wrote:
| I mean, come on man. For some reason, the nerd in me sees this
| and immediately adds it to my 'I really need to do this' list.
| 
| Just memories of old times doing some similar (albeit probably
| less challenging) competitions on TopCoder almost a decade
| ago, and also the curiosity to see how I would manage it now,
| with experience. Given that the current scores are also very far
| from what they estimate the lower bound to be, this is really
| interesting! The prize is, however, very misleading - per their
| own FAQ - the total possible payout is ~223k euros.
| 
| Definitely not thanking you for the hours I will put into this !
 
| omoikane wrote:
| 500000 EUR is the prize pool. Each winner has to achieve at least
| a 1% improvement over the previous record to claim a prize that is
| proportional to the improvement. Getting the full 500000 EUR
| prize requires a 100% improvement (i.e. compressing 1GB to zero
| bytes).
 
  | lainga wrote:
  | Ah... I had professors who graded like that
 
  | phobotics wrote:
  | Does it, or does it just require a 1% improvement over the last
  | winner, as opposed to a static additional 1% improvement over the
  | initial best "score"?
 
    | omoikane wrote:
    | It's 1% over the last winner. The latest winner has a total
    | size of 114156155, compared to the previous winner's 115352938.
    | The payout was:
    | 
    |     500000 * (1 - 114156155 / 115352938) = 5187
    | 
    | (see the table near "Baseline Enwik9 and Previous Records
    | Enwik8")
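
The payout arithmetic above can be checked with a few lines. This is a minimal sketch of the rule as described in this thread (payout proportional to the relative improvement over the previous record, with a 1% minimum); the function name `hutter_payout`, truncating to whole euros, and the exact handling of the threshold are assumptions for illustration, not the official rules text.

```python
PRIZE_POOL_EUR = 500_000  # prize pool as stated in the thread


def hutter_payout(new_size: int, previous_record: int, pool: int = PRIZE_POOL_EUR) -> int:
    """Hypothetical helper: pool * (1 - new/previous), truncated to whole euros.

    Returns 0 when the relative improvement is below the 1% minimum
    mentioned in the thread.
    """
    improvement = 1 - new_size / previous_record  # fraction of the old size saved
    if improvement < 0.01:
        return 0
    return int(pool * improvement)


if __name__ == "__main__":
    # Figures quoted in the comment above (fast-cmix vs. the previous record).
    print(hutter_payout(114_156_155, 115_352_938))
```

With the quoted sizes the improvement is about 1.037%, which clears the 1% bar and gives 5187 EUR. Setting `new_size` to 0 (a "100% improvement") returns the full 500000, matching the "compress 1GB to zero bytes" remark.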
 
  | bigyikes wrote:
  | Probably if you succeed at this, 500,000 will be worthless to
  | you
 
    | sytelus wrote:
    | Why? How does this improvement translate to more financial
    | gains?
 
      | Eduard wrote:
      | Because with that knowledge, you will be able to decompress
      | 0 dollars to infinite dollars, which the storage mafia will
      | pay you for not publishing your breakthrough in making them
      | obsolete.
 
___________________________________________________________________
(page generated 2023-09-13 23:00 UTC)