[HN Gopher] GPT Unicorn: A Daily Exploration of GPT-4's Image Ge...
___________________________________________________________________
 
GPT Unicorn: A Daily Exploration of GPT-4's Image Generation
Capabilities
 
Author : imdsm
Score  : 51 points
Date   : 2023-04-13 20:40 UTC (2 hours ago)
 
web link (adamkdean.co.uk)
w3m dump (adamkdean.co.uk)
 
| dr_dshiv wrote:
| \
|  \
|   \  ,__,
|    \ (oo)____
|      (__)    )\
|         ||--|| *
| 
| "Draw an ASCII unicorn" (GPT4)
 
| Mystery-Machine wrote:
| It would be great if these days had dates as well; "Day 69" on
| its own isn't very useful. I'd rather see "Day 69 (June 21
| 2023)".
 
| thomasfromcdnjs wrote:
| Might as well make a Twitter account! or get AutoGPT to do it.
 
| bestcoder69 wrote:
| It can draw a penis fyi
 
| MH15 wrote:
| Would be useful if the prompts used to generate the drawing code
| were included in the site.
 
  | abrichr wrote:
  | They appear to be here:
  | 
  | https://github.com/adamkdean/gpt-unicorn/blob/master/src/lib...
  | { role: 'system', content: `You are a helpful assistant that
  | generates SVG drawings. You respond only with SVG. You do not
  | respond with text.` },
  | { role: 'user', content: `Draw a unicorn in SVG format.
  | Dimensions: 500x500. Respond ONLY with a single SVG string.
  | Do not respond with conversation or codeblocks.` }
 
    | dmix wrote:
    | Sadly it outputs raw svg code so you have to save it locally
    | as .svg to see it. Or just insert it into an HTML page via
    | devtools if you're lazy like me.
 
    | bee_rider wrote:
    | "You are a helpful assistant" seems like it is always
    | included in these sort of prompts. I wonder if it really
    | helps...
 
      | LeoPanthera wrote:
      | It's quite funny to tell it that it is an unhelpful
      | assistant. During the first few responses it is amusingly
      | obstinate.
      | 
      | It always seems to revert to "helpful assistant" after a
      | few messages, whatever the prompt says.
 
      | Guillaume86 wrote:
      | It's too generic, I think; my prompt immediately gave me a
      | better result than the ones in his post:
      | You are a SVG expert, when asked by the user to draw
      | something, you reply to the best of your ability with SVG
      | code that satisfies the request.
 
| ShamelessC wrote:
| As noted in the paper that inspired this:
| GPT-4's image generation capabilities were severely diminished by
| the instruction/safety-tuning process. Unfortunately this means
| the currently available model from the API won't be very capable
| - certainly not as capable as the early version of GPT-4 that
| Microsoft had access to.
| 
| edit: I'm specifically referring to the "image generation by
| trickery (e.g. SVG)" technique being diminished. My understanding
| is that other tasks were diminished as well, though.
 
  | og_kalu wrote:
  | Image generation isn't the only thing RLHF worsens.
  | Calibration (confidence in solving a question relative to the
  | actual ability to solve it) went from excellent to
  | nonexistent, and you can see from the report that the base
  | model performed better on a number of tests. Basically a
  | dumber model.
 
    | tbalsam wrote:
    | Not dumber. More biased.
    | 
    | Important distinction, especially if we're looking to push
    | back out towards the Pareto Frontier of the problem.
    | 
    | RLHF is still very much in its infancy and does not maximize
    | the bias-variance tradeoff by a long shot, in my personal
    | experience.
 
      | og_kalu wrote:
      | No, dumber. Sure, more biased too if you want, but also
      | dumber. OpenAI has indicated as much.
 
        | psychphysic wrote:
        | Also generally less creative and insightful.
        | 
        | "No I won't do it" becomes a good option no matter what
        | if you turn safety too high.
 
      | ShamelessC wrote:
      | My understanding is that OpenAI did indeed find diminished
      | capability across a range of tasks after doing RLHF. You're
      | correct to question this though - as I believe the opposite
      | was true of GPT-3 where it improved certain tasks.
      | 
      | The benefits from a business perspective were still clear
      | however, and of course the instruction-tuned GPT-4 model
      | still outperformed GPT-3, in general.
      | 
      | There are probably some weird edge cases and nuances that
      | I'm missing - and I'd be happy to be corrected.
 
    | arthurcolle wrote:
    | Are you saying this specifically for the GPT-4 API endpoint
    | compared to idealized described GPT-4 from the paper?
 
      | og_kalu wrote:
      | yes the public api (or on paid chatGPT) vs the base model
      | from the paper
 
| Varqu wrote:
| Is anyone else also getting tired of seeing "GPT" prefix / suffix
| in the name of 90% new AI-related products?
 
  | mustacheemperor wrote:
  | Given this is a process specifically to evaluate the changing
  | performance of GPT-4 over time, it seems appropriate.
 
  | squeaky-clean wrote:
  | This isn't a new AI product. It's a (seemingly auto updating)
  | blog entry about GPT-4
 
| ansk wrote:
| This is a great rorschach test. Show these four images to someone
| hyping AI, and if they see evidence of a growing/emerging
| intelligence, you can diagnose them as being wholly unqualified
| to comment on anything related to AI.
 
  | syntaxing wrote:
  | I don't get it, wouldn't something like HuggingGPT be able to
  | command stable diffusion to do this? Just because GPT can't do
  | this natively doesn't mean it's not possible with the right
  | framework?
 
    | ansk wrote:
    | These images were all generated by an identical model. The
    | fact that this individual has convinced themself that the
    | model is improving indicates that they don't understand how
    | these models are trained and deployed. Furthermore, any
    | conclusions reached on such limited data reveal more about
    | one's predisposed opinions than anything about the nature of
    | the data. Show this person an ink blot and they very well may
    | see an image of a superintelligent AGI.
 
| einpoklum wrote:
| Perhaps you should ask it to draw you a sheep.
 
| dangond wrote:
| > The idea behind GPT Unicorn is quite simple: every day, GPT-4
| will be asked to draw a unicorn in SVG format. This daily
| interaction with the model will allow us to observe changes in
| the model over time, as reflected in the output.
| 
| Is it useful to do this every day? Correct me if I'm wrong, but
| my understanding is that OpenAI does not update the models
| available in production incrementally on a day-to-day basis.
 
  | sacred_numbers wrote:
  | They do update the model in the background, although I'm not
  | sure how often or how much. To avoid issues with this practice
  | they offer gpt-4-0314, which the documentation [0] describes
  | as follows:
  | 
  | "Snapshot of gpt-4 from March 14th 2023. Unlike gpt-4, this
  | model will not receive updates, and will only be supported for
  | a three month period ending on June 14th 2023."
  | 
  | Unfortunately this experiment is using the frozen snapshot
  | model gpt-4-0314 instead of the unfrozen gpt-4 or gpt-4-32k
  | models, so any differences are literally 100% noise. This would
  | be a somewhat interesting experiment if someone were to use an
  | unfrozen model, though. I do appreciate the author for
  | captioning the images with the exact model they used for
  | generation so that this bug could be caught quickly.
  | 
  | [0]https://platform.openai.com/docs/models/gpt-4
 
  | charcircuit wrote:
  | Similarly the quality of the model can't be judged with a
  | single sample. These end up canceling out.
 
| sp0rk wrote:
| Did you generate a bunch all at once before starting to get some
| idea of what the natural variance looks like? I would think it's
| important to verify some level of progression over time, because
| with the current four it seems entirely possible that the
| examples could have all been generated at the same time with no
| changes to the model.
 
  | gwern wrote:
  | Also unclear if he's sampling at temp=0. Looks like he doesn't
  | set a temp? https://github.com/adamkdean/gpt-
  | unicorn/blob/8ad76ec7161682... So not sure what he's really
  | doing.
 
  | ratg13 wrote:
  | Aren't they using the March 14 model like the general public?
  | 
  | It's frozen in time, there are no updates to it..
  | 
  | All of these will be drawn using the same model until they push
  | a new update, or you switch to a different GPT
  | 
  | But I already think they proved the point that the generation
  | is random enough that it would be extremely difficult to track
  | progress this way.
 
    | williamstein wrote:
    | GPT's output is by default somewhat random. If you ask the
    | same exact question several times, you'll potentially get
    | several different answers. Each successive word in the output
    | is chosen from a distribution of possibilities -- that
    | distribution is fixed, but the actual sample chosen from the
    | distribution is not fixed. See, e.g.,
    | https://platform.openai.com/docs/api-
    | reference/completions/c...
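The point above can be sketched numerically. A toy example (the logit values are made up for illustration) of how the sampling temperature reshapes a fixed next-token distribution before a sample is drawn:

```javascript
// Convert raw logits into a probability distribution, scaled by
// temperature. The distribution itself is fixed for a given prompt;
// each sample drawn from it is not.
function softmaxWithTemperature(logits, temperature) {
  const scaled = logits.map(x => x / temperature);
  const max = Math.max(...scaled);                  // subtract max for numerical stability
  const exps = scaled.map(x => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map(e => e / sum);
}

const logits = [2.0, 1.0, 0.5];
// Temperature 1.0: probability mass is spread out, so repeated
// samples of the same prompt vary.
console.log(softmaxWithTemperature(logits, 1.0));
// Temperature 0.1: nearly all mass lands on the most likely token,
// approximating deterministic output as temperature approaches 0.
console.log(softmaxWithTemperature(logits, 0.1));
```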
 
| startupsfail wrote:
| Taking a single noisy sample per day from a model that rarely
| updates hardly lives up to the claim of "daily exploration".
 
| dang wrote:
| The unicorn example is discussed at length in Bubeck's recent
| talk:
| 
| https://www.youtube.com/watch?v=qbIk7-JPB2c#t=22m6s
 
| dmix wrote:
| Why would the model change over time when asked the same
| question? Just its dataset for generating similar images? Or is
| this just tracking GPT's explicit model improvements over time?
 
  | pps wrote:
  | "GPT 5 Will be Released 'Incrementally' - 5 Points from
  | Brockman Statement" -
  | https://www.youtube.com/watch?v=1NAmLp5i4Ps
 
    | atleastoptimal wrote:
    | gpt-4-0314 is a snapshot model and won't be updated, they
    | shouldn't use that for this experiment.
 
  | tbalsam wrote:
  | The models seem to have been changing in the background,
  | though as another commenter pointed out, having a
  | variance-calibration baseline for humans would be great too.
  | :'))))
 
| m3kw9 wrote:
| Are they banking on OpenAI updating their model every day, or
| just prompting the same thing everyday wishing for a different
| outcome?
 
  | qumpis wrote:
  | In the "sparks of AGI" paper, the authors noted that the
  | unicorn shape degrades as more "alignment" is injected into
  | the model. If OpenAI adjusts the model (say, by training it
  | more), the picture should reflect it; likewise if they make
  | the model more "aligned".
  | 
  | So I'd guess the answer is the former.
 
| atleastoptimal wrote:
| If GPT-4 is updated with recent web training data, the attention
| people are now paying to the "draw a unicorn" task raises the
| chance that someone has posted a perfect SVG unicorn, leading
| the model to leverage that rather than what I imagine is the aim
| of this experiment: GPT-4's capacity to extrapolate.
| 
| EDIT: Also, it makes no sense to retry this every day on the
| gpt-4-0314 model, since OpenAI specified that it is a snapshot
| model that will not be updated.
 
___________________________________________________________________
(page generated 2023-04-13 23:00 UTC)