============
Introduction
============

Recently, I made a music video for one of my songs. I made the video
using Python 3 and Asciimatics_.

I'm interested in lo-fi videos, and this was a first experiment.
While my experiment existed solely within the bounds of a terminal
window, there were a number of lessons learned through the process.

These lessons will help me, as well as others who are interested in
performing similar lo-fi experiments.

==========================
Use real time (not frames)
==========================

Asciimatics uses a single integer "frame" counter.

Regardless of how fast the screen is updating, it's easier to deal with
simple seconds and fractional seconds. Approximate timing is okay, especially
if we can readily use timing data gathered from other sources.

For instance, there's timing data in closed captions. It's also easy to create
timing data in something like Audacity. All of these use either human time or
seconds and fractional seconds.

Leverage your other tools by dropping the notion of frames. If you
*really* need to be frame-precise, consider using a frame separator in
otherwise ordinary timestamps, maybe an '@' between the seconds
and the frames.
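
As a rough sketch of that idea, here is one way such timestamps might be
parsed into seconds. The function name ``parse_timestamp`` and the ``fps``
default are assumptions made for illustration, not part of Asciimatics::

    def parse_timestamp(stamp, fps=20):
        """Convert a timestamp to seconds as a float.

        Accepts plain numbers (5, 1.5), clock strings ('0:05', '1:02:03.5'),
        and an optional frame count after '@' ('0:05@10' is ten frames
        past 0:05).
        """
        if isinstance(stamp, (int, float)):
            return float(stamp)
        clock, _, frames = stamp.partition('@')
        seconds = 0.0
        # Each colon-separated field is worth 60 times the one after it.
        for part in clock.split(':'):
            seconds = seconds * 60 + float(part)
        if frames:
            # fps is an assumed frame rate; pass the real one if it differs.
            seconds += int(frames) / fps
        return seconds

Only the part after the '@' depends on the frame rate, so timing data
exported from other tools can be used unchanged.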

Idealized timeline
~~~~~~~~~~~~~~~~~~

A show can be thought of as a series of variable-length scenes, strung together.

In my music video, I had a start screen before the music started.

You need to be flexible with intro and outro content, while also fully supporting
binding the video to the audio's location. In most cases, you may be able to get
away with having a "starting time" for a scene which is simply subtracted from all
action times in that scene.

If I have three scenes, A, B, and C, and I know where each one nominally starts,
it could be something as simple as::

    scene_a = Scene(0)
    scene_a.at('0:05', do_it)
    # ... add 10 minutes of content (originally only 5)
    scene_a.at('5:05', part_of_scene_a)
    scene_b = Scene('5:00')
    scene_b.at('5:05', part_of_scene_b)
    # ... add 15 minutes of content
    scene_c = Scene('25:00')
    scene_c.at('25:05', do_it)
    show = [ scene_a, scene_c, scene_b ]

Since each scene knows its own starting point, it can keep the scene timing
consistent, even if the order of the scenes changes or the earlier
scenes change length.

Do you want to add a commercial? You shouldn't have to dick with timings.
Just create the scene and stick it in the show::

    commercial = Scene('3:00')
    show = [ scene_a, commercial, scene_c, scene_b ]

That sort of scene movement only works when music is bound to scenes, and not
the whole show, of course.
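
Here is a minimal sketch of how that bookkeeping might work. The ``Scene``
class below is hypothetical; only the ``at`` method mirrors the examples
above, while ``length`` and ``schedule`` are invented for illustration.
Times are plain seconds to keep it short, and a scene is assumed to end at
its last action::

    class Scene:
        def __init__(self, start=0.0):
            self.start = start
            self.actions = []  # (offset from scene start, action) pairs

        def at(self, when, action):
            # Store the offset, so the scene can be moved around freely.
            self.actions.append((when - self.start, action))

        @property
        def length(self):
            # Assumption: the scene lasts until its final action.
            return max((offset for offset, _ in self.actions), default=0.0)


    def schedule(show):
        """Yield (play time, action) for each action, in show order."""
        cursor = 0.0
        for scene in show:
            for offset, action in sorted(scene.actions, key=lambda p: p[0]):
                yield cursor + offset, action
            cursor += scene.length

Reordering the list passed to ``schedule`` changes when each scene plays,
but nothing inside a scene has to be retimed.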

Going further
-------------

For an audio book or other long-form audio stream, you should be able to grab
the audio file and split it up into separate scenes.

Markers for the splits are, as mentioned, easy enough to gather in Audacity.
But while you can split things up in Audacity, it shouldn't be required.

If you need to use Audacity anywhere to find the split-points, it's not
really a time-saver. If, however, you can gather split-points within the
application, things get more interesting.

Splitting audio at scene-breaks allows you to use scene-breaks as explicit
restart points when iterating on a scene. It's faster and easier to only
allow jumping forward and backward at scene changes, as you know the screen
will start from black.

This means that on the back-end we'll need to track the time of each scene
change in the audio anyway. If we have the data, we
should support splicing a new scene into that location.
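
As a sketch of splitting without a trip through Audacity, the standard
library ``wave`` module can cut an uncompressed WAV file at a list of
split-points given in seconds. The function name ``split_wav`` and the
output naming pattern are assumptions for illustration, and compressed
formats would need a different library::

    import wave


    def split_wav(path, split_points, out_pattern='scene_{:02d}.wav'):
        """Split a WAV file at the given times; return the new file names."""
        paths = []
        with wave.open(path, 'rb') as src:
            rate = src.getframerate()
            total = src.getnframes()
            # Convert seconds to sample-frame positions, bracketed by the
            # start and the end of the file.
            cuts = [0]
            cuts += [min(total, round(t * rate)) for t in sorted(split_points)]
            cuts.append(total)
            for index, (start, end) in enumerate(zip(cuts, cuts[1:])):
                out_path = out_pattern.format(index)
                src.setpos(start)
                with wave.open(out_path, 'wb') as dst:
                    dst.setnchannels(src.getnchannels())
                    dst.setsampwidth(src.getsampwidth())
                    dst.setframerate(rate)
                    dst.writeframes(src.readframes(end - start))
                paths.append(out_path)
        return paths

The list of cut positions is the same data needed for jumping between
scenes, so it is worth keeping around after the split.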

Features and timeline
~~~~~~~~~~~~~~~~~~~~~

mvp:

    Timeline using real time units. The line between "scene" and
    "show" can be blurry or not exist.

mvp+1:

    Shows made of series of stitched-together scenes. Scenes described
    with starting times that may not map to their play-time.

mvp+2:

    Scenes and timelines integrate with long audio tracks and
    arbitrary starting points within those tracks.

==========================
Synchronize with the audio
==========================

My first experiment used PyGame_ to run the song. This back-end is designed for
background music in games.

You need to be able to query the audio to see where it is. If the audio isn't
where it is expected to be, you need to hold everything up until it catches up.

PyGame doesn't support this. It's more of a fire-and-forget service.

Idealized timeline
~~~~~~~~~~~~~~~~~~

At a minimum, you need to delay the start of a scene until the
audio starts moving.

The disadvantage of audio running in a separate thread (as is normally
done) is that it may not be at the same place as the animation thread.
The play speed shouldn't have glitches, but the start times can be
a bit wobbly.

At the very least, you need to support pausing until the audio is
ready. Having all sense of time come from the audio goes one step
further, as it makes the primary timekeeper the audio system.
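
A minimal sketch of both steps, assuming the audio back-end can be wrapped
in a callable that reports the playback position. ``AudioClock`` and
``get_position`` are hypothetical names, and no particular audio library is
implied::

    import time


    class AudioClock:
        """Take the show's sense of time from the audio back-end."""

        def __init__(self, get_position, poll=0.005):
            # get_position is a hypothetical callable returning seconds of
            # audio played so far, or None if playback has not started.
            self.get_position = get_position
            self.poll = poll

        def now(self):
            position = self.get_position()
            return 0.0 if position is None else position

        def wait_for_start(self, timeout=5.0):
            """Block until the audio is moving; return its position."""
            deadline = time.monotonic() + timeout
            while time.monotonic() < deadline:
                position = self.now()
                if position > 0.0:
                    return position
                time.sleep(self.poll)
            raise TimeoutError('audio did not start in %.1f s' % timeout)

A scene would call ``wait_for_start()`` once before drawing anything, then
ask ``now()`` for the time on each screen update instead of counting frames.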

Some systems (such as PyGame) have a distinction between a sound effect
that is loaded entirely into memory and a streamed background music
file.

Even if you're technically dealing with background music, getting the
timing right may require loading more of the file into memory more of
the time. Accept that you may need a whole song in memory, and that
you can only reasonably change this during scene breaks.

Going further
-------------

You should be able to do your final rendering non-real-time, so you'll
always be properly synchronized.

Non-real-time is the ideal for keeping audio and video synchronized.
It allows you to bite off small pieces of audio at a time and know
that everything will line up.
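
To illustrate why it lines up, here is a sketch that pairs each video frame
with the exact slice of audio samples it covers. The function name and the
default rates are assumptions; integer arithmetic keeps the slices
contiguous even when the sample rate is not a multiple of the frame rate::

    def frame_schedule(total_samples, sample_rate=44100, fps=20):
        """Yield (frame, time in seconds, first sample, end sample).

        sample_rate and fps must be integers. The end sample is exclusive,
        so each slice starts where the previous one stopped.
        """
        frame = 0
        while frame * sample_rate // fps < total_samples:
            first = frame * sample_rate // fps
            end = min(total_samples, (frame + 1) * sample_rate // fps)
            yield frame, frame / fps, first, end
            frame += 1

A renderer would draw the frame for each time and write out the matching
samples, taking as long as it needs for either step.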

Features and timeline
~~~~~~~~~~~~~~~~~~~~~

mvp:

    Preload entire audio file into memory and avoid streaming from
    disk.

    Try to change scenes or cameras at background song boundaries.
    Keep these as isolated units, knowing they'll get stitched together
    during editing.

mvp+1:

    Start of video is delayed until the audio starts playing.

    If each scene is independent, a scene may have a pause to start. This
    can be corrected in post, as needed, but should keep audio and video
    consistent.

    It might be useful to have the video's sense of time come from the core
    background track, but I'm not sure that's 100% needed without further testing.

mvp+2:

    Non-realtime rendering ensures that audio and video are always synchronized.

    This is by far the gold standard. Ideally, you can do this at faster than
    real time.

======================
ASCII as a visual form
======================

I used big Figlet_ ASCII Art fonts for my test video.

Monitors are bigger and higher resolution than ever before, right?

But this is really what you need. Huge text, even in text mode.

Some of the viewers will watch it full-screen, sure, but a significant
population will half-distractedly watch a thumbnail instead.

If it's a silent film, folks will need to go larger to read what is
going on. So, in a silent film context a text-based Roguelike user
experience may still work? Further experiment is required.

Still, consider going for an older aesthetic and angling for 40x25 (or
thereabouts) instead of something more modern. 

Idealized visual form
~~~~~~~~~~~~~~~~~~~~~

I'm still thinking about old school RPGs.

Fixed camera at best. Top-down maps. A few fixed expressions in close-up.
Maybe a giant close-up like you find in visual novel games. 

And a dedicated section for dialog to appear.

Maybe menu-style alternative dialog. Of course, this would be just a fake,
but it would be easy flavor.

It would be mostly tile-based with a few larger graphics now and then.

It would probably be less than 40 tiles wide. The Roguelike people have
to make a lot of compromises about visible map size versus map quality,
so if you're curious about the how and why, you can always look there.

Going further
-------------

Honestly, I'd really like to have something like `The Sims`_ where instead
of semiautonomous entities, you just had actors you could control and
play and rewind their time.

There is MakeHuman_ which provides an open-source method to generate and
render humans. It has a lot of output formats.

I wouldn't mind using the entire virtual worlds of The Sims, though.
If we had the capacity to use assets from The Sims (on par with,
say, other open-source games that require commercial assets), it would
allow us to use third-party assets as well. There are a considerable
number of these, some with decent licenses.

There are other 3D games we might be able to leverage, but few
are designed for normal, ordinary world stuff like The Sims.

`Garry's Mod`_ might technically work, but modifying maps is a fair bit
more complicated, and it uses a commercial engine... Then there's the mod
community that mostly just steals stuff from commercial games and is
full of fascists... Not very appealing.

Features and timeline
~~~~~~~~~~~~~~~~~~~~~

mvp:

    Modeled after an RPG, or a text-based Roguelike. A dedicated place
    for the dialog. The right versus left, main character versus
    whoever is being talked to. It's an easy UX to write that's flexible
    for many types of stories.

mvp+1:

    It's possible to experiment with 3D without actually having a 3D
    game. The portraits can be animated 3D models, there can be
    cut-scenes. These, too, are standard components of games.

mvp+2:

    This would be a 3D video, so more like a silent cartoon. Instead
    of the interface having a dedicated place for dialog, it would be
    handled more like standard closed captions.

=========
Phase One
=========

Timeline using real time units. The line between "scene" and
"show" can be blurry or not exist.

Preload entire audio file into memory and avoid streaming from
disk.

Try to change scenes or cameras at background song boundaries.
Keep these as isolated units, knowing they'll get stitched together
during editing.

Modeled after an RPG, or a text-based Roguelike. A dedicated place
for the dialog. The right versus left, main character versus
whoever is being talked to. It's an easy UX to write that's flexible
for many types of stories.


Visual Idea
~~~~~~~~~~~

Here's an idea for a roguelike visual (since they map to documents
more easily)::

    +----------------------------------------+
    |                 ",,,,.........."       |
    |                 ",,,,.,,""""".."       |
    |                 "####'##"   "..*""*    |
    |                  #...AB#    *....."    |
    |                  #.....#    """*..*"""*|
    |                #####D######    "".....0|
    |                #..........#     ""*"""*|
    |                #...>......#            |
    |                ############            |
    |                                        |
    +----------------------------------------+
    |Betty can:                              |
    |  signal to Ada to leave, ASAP.         |
    |> ask for garlic (nicely).             <|
    |  mock the blood on his necktie.        |
    |                                        |
    +----------+-----------------------------+
    | Ada      |Dracula: Good evening!       |
    |>Betty   <|Ada: We're here to fix your  |
    |          | computers.                  |
    |          |Dracula: The basement is     |
    |          | over here!                  |
    +----------+-----------------------------+
    

    +----------------------------------------+
    |                 """"""""""""""""       |
    |                 ",,,,.........."       |
    |                 ",,,,.,,""""".."       |
    |                 "####+##"   "..*""*    |
    |                             *....."    |
    |                             """*..*"""*|
    |                                ""...AB0|
    |                                 ""*"""*|
    |                                        |
    |                                        |
    |                                        |
    |                                        |
    |                                        |
    +You see:--+Near Old House---------------+
    |0: to car |The house appears ancient    |
    |          | with fine, hand-crafted     |
    +----------+ details now falling to ruin.|
    |>Ada     <|Betty: We're lost, Ada.      |
    | Betty    | Admit it!                   |
    |          |Ada: We're not lost! We're...|
    |          | ... Alright, Betty. We're   |
    |          | lost.                       |
    +----------+-----------------------------+

Source Idea
~~~~~~~~~~~

Here's a potential source snippet leading up to the above::

    # Actor, Thing, and Scene are a hypothetical API. ambient_creep stands
    # in for an audio track that is loaded elsewhere.
    def build_welcome_scene():
        ada = Actor('Ada', player=True)
        betty = Actor('Betty', player=True)
        dracula = Actor('Dracula')
        passage = Thing('to car')
        welcome_scene = Scene('0:00', map='dracula_floor_1', audio=ambient_creep,
                              title='Near Old House',
                              place={'A': ada, 'B': betty, 'D': dracula,
                                     '0': passage})
        betty.follow(ada)
        betty.say('0:01', "We're lost, Ada. Admit it!")
        ada.say("We're not lost. We're...")
        ada.say(1, "... Alright, Betty. We're lost.")
        # Walk Ada to within three tiles of the door ('+' on the map).
        ada.move_to('0:05', welcome_scene.map.find('+'), proximity=3)
        betty.choice("Dare Ada to lie about why we're here.",
                     "Say: We're computer technicians.",
                     "Say: We're here to suck his blood!",
                     "Say: We're pest control.",
                     pick=0, delay=0.5)
        betty.emote('smiles and looks at Ada.')
        ada.emote(0.2, 'squirms. "You have an idea.'
                       " It's a bad one. That's your bad idea face.\"")
        betty.say("We should say we're here to suck his blood.")
        ada.say("What? No.")
        ada.say(0.2, "There's no reason he'd let us in if we said that.")
        betty.emote(0.2, 'nods. "You\'re right. We should do something else."')
        betty.say(0.1, "I know. I dare you to say we're computer technicians.")
        ada.say('What?')
        ada.say(0.5, "You're mean. You know that, right?")
        welcome_scene.wait(0.2)
        return welcome_scene

==========
Reflection
==========

It's interesting that nothing about my example actually needs the
background track to be sample-precise with the visual. How important is
that, really? Maybe this is something that's only really needed for the
lyric tracks and when there's explicit synchronized timing.

(For sample-precise timing to music, you might think of having a
dedicated MIDI track for the action triggers. However, that's different
from my above example.)

Even the "real timeline" thing is a bit fuzzy. Scenes start with a real
time that's used as an offset for timestamps mentioned in the scene,
yes. But what I actually use in the example are mostly relative times in
seconds.

The given example has what could be a looping ambient track for the
background. I think of it going silent and a knocking sound as part of
the transition to the scene with the door open, but... I can also see
long ambient tracks that fit multiple scenes.

This means we'd need an ``advisory_start`` which would start audio within
the file if you're jumping into it, but let it flow naturally if you're
starting at a previous scene. Ideally, this could be part of the next
bit...
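
A small sketch of that decision follows. ``SceneAudio`` and ``audio_seek``
are hypothetical names, and it assumes that two scenes share audio exactly
when they name the same track::

    from dataclasses import dataclass


    @dataclass
    class SceneAudio:
        track: str                   # file name of the background track
        advisory_start: float = 0.0  # seconds into the track for this scene


    def audio_seek(scene, previous=None):
        """Return where to start the track, or None to leave it playing."""
        if previous is not None and previous.track == scene.track:
            # Arriving from the prior scene on the same track: let it flow.
            return None
        # Jumping straight in, or switching tracks: seek to the advisory point.
        return scene.advisory_start

When iterating on one scene in isolation, ``previous`` is simply left out
and the audio starts where that scene would normally pick it up.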

Not all scenes will have fixed starting state. Sometimes state will
depend upon previous state. We still need to jump to arbitrary scenes to
aid in development. We can manage this by caching scene state at the end
of scenes when this is needed.

We could either always overwrite, or create a new file separate from
the working file and make the developer manually overwrite. I favor
always-overwrite, but user-overwrite would be more like traditional
film. (I want fast and easy. Post-processing audio, as for traditional
film, is neither of these things.)
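
A sketch of the always-overwrite approach, assuming scene state can be
expressed as plain JSON-friendly data. The function names and the
one-file-per-scene layout are assumptions for illustration::

    import json
    from pathlib import Path


    def save_scene_state(cache_dir, scene_name, state):
        """Write the state a scene ended with, replacing any earlier copy."""
        path = Path(cache_dir) / (scene_name + '.json')
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(state, indent=2, sort_keys=True))
        return path


    def load_scene_state(cache_dir, scene_name, default=None):
        """Read the cached end state of a scene, or default if none exists."""
        path = Path(cache_dir) / (scene_name + '.json')
        if not path.exists():
            return default
        return json.loads(path.read_text())

Jumping to a scene would then load the cached state of the scene before
it, falling back to a default when that scene has never been played.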

Roguelike games can easily have a dedicated region for text. My example
above was narrow, but I think that for a Roguelike, aiming for 80 or more
columns by 24 or more rows is reasonable. Probably with three panes instead
of whatever I was thinking above: one for map, one for dialog and feedback,
and another for equipment or stats or even inventory.

A design modeled after a GUI RPG allows us to have portraits, but turns
back-and-forth dialog into what is effectively a cut-scene. There's
nothing wrong with that, but it's different work from the main stuff.

Graphic RPGs will have smaller maps than the text-based games. Any
graphic RPG game that uses a "minimap" of some sort does so because
the primary view is pretty but doesn't convey enough information about
where you are in relationship to your objectives. You see this less with
third-person turn-based games than with first-person live-action games,
but this is totally fine for our particular use-case. Huge, pretty tiles
and a light-weight sketch of the neighborhood in a corner for flavor.

If a show were to mostly have back-and-forth dialog, it should
probably aim to feel more like a visual novel game and not an RPG.
This would be lots of dialog with big portraits and usually some
relationship-based questions.

.. References (inline links when rendered)

.. _Asciimatics: https://github.com/peterbrittain/asciimatics

.. _Garry's Mod: https://gmod.facepunch.com/

.. _Figlet: http://www.figlet.org/

.. _MakeHuman: http://www.makehumancommunity.org/

.. _PyGame: https://www.pygame.org/

.. _The Sims: https://www.ea.com/games/the-sims
