
This is part two of the Dictator's Handbook Colophon; part one is "Writing a Book Using Linux Tools."

When I wrote The Dictator's Handbook, I started with a LaTeX manuscript and converted it to EPUB for electronic sales and distribution. If you're doing the same, here is your situation:

You have an existing LaTeX document, and would like to create an EPUB document
from it. LaTeX creates gorgeous printed works, but predates e-books by several
decades. On the other hand, LaTeX is a markup language, and EPUB is basically
XHTML, which is also a markup language, so there is a path. This article
describes that path.

My LaTeX document didn't rely on many external packages. That was a mark in my favor, since each additional package raises the risk of some unforeseen incompatibility. For The Dictator's Handbook I needed only \lettrine for the initial caps and \epigraph for the quotes that open each chapter. I tweaked a few settings, such as the number of entries shown in the Table of Contents and the inter-paragraph spacing, but otherwise my document was fairly simple. I haven't tried this approach on more complicated documents, so I can't promise it will work for them.

Going from LaTeX to EPUB takes four steps. The graphical tools don't yet make the transition smooth, so some hand editing is unavoidable. It's annoying, but none of it is difficult.


1. Convert LaTeX to XHTML (use: htlatex, found in the tex4ht package)
2. Clean up the XHTML (use: HTML Tidy)
3. Convert XHTML to EPUB (use: Calibre)
4. Tweak the EPUB's XHTML documents (use: any text editor)

If you only want the EPUB for your own use and don't plan to sell or publish it, you can skip the tweaks in step four.

Before you do anything, though, you need to modify your LaTeX source, because e-books don't use some features of printed books. E-readers create their own table of contents, so one included in your text is superfluous and confusing. They don't need an index either, because everything is searchable. Footnotes are awkward in e-books, too: each one becomes an individual page, which inflates the page count and disrupts the flow of the book. So copy your LaTeX files into a new folder and work from there, leaving the originals intact for producing the printed book.

Checklist:


- Remove the \tableofcontents line so LaTeX doesn't produce a Table of Contents.
- Remove the index (typically the \printindex line) from your LaTeX source.
- Replace footnotes with endnotes, which worked much better for me. Load the endnotes package, search and replace every \footnote with \endnote, and put \theendnotes where you want the notes to appear. The notes get hyperlinked, so you can jump back and forth between the text and each endnote. Nice!
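If you'd rather not make these edits by hand, here is a minimal Python sketch of the checklist. It assumes UTF-8 .tex files with a single \begin{document}/\end{document} pair, and the function names are hypothetical. It comments out \tableofcontents and \printindex rather than deleting them. It also rewrites \footnote only when the command is followed by { or [, so \footnotesize and similar commands are left alone.

```python
import re
import shutil
import sys
from pathlib import Path


def prepare_for_epub(tex: str) -> str:
    """Return LaTeX source adjusted for e-book conversion (hypothetical helper)."""
    # Comment out the printed table of contents and index; the e-reader builds
    # its own TOC, and full-text search replaces the index.
    tex = re.sub(r"^([ \t]*\\(?:tableofcontents|printindex)\b)", r"%\1", tex, flags=re.M)
    # Turn \footnote{...} and \footnote[n]{...} into \endnote; \footnotesize is untouched.
    tex = re.sub(r"\\footnote(?=[\[{])", r"\\endnote", tex)
    # Load the endnotes package in the preamble (only affects the file that has one).
    if r"\usepackage{endnotes}" not in tex:
        tex = tex.replace(r"\begin{document}", "\\usepackage{endnotes}\n\\begin{document}", 1)
    # \theendnotes prints the collected notes; place it just before \end{document}.
    if r"\theendnotes" not in tex:
        tex = tex.replace(r"\end{document}", "\\theendnotes\n\\end{document}", 1)
    return tex


def main(src_dir: str, dst_dir: str) -> None:
    # Work on a copy so the originals stay intact for the printed book.
    shutil.copytree(src_dir, dst_dir)
    for path in Path(dst_dir).rglob("*.tex"):
        path.write_text(prepare_for_epub(path.read_text(encoding="utf-8")), encoding="utf-8")


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Run it with the original folder and a new folder name as arguments. Review the diff before compiling, because an automated search/replace can't know about unusual macros in your preamble.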

You now have a LaTeX file you can work with.

Convert LaTeX to XHTML

There are two tools that convert LaTeX to HTML. The first is latex2html, a Perl script that does a decent job of turning LaTeX files into a series of linked HTML files. I experimented with it for a while. I found that it struggles with complex LaTeX and doesn't handle some of the formatting, including the epigraphs and smart quotes. Still, it produces a file you can turn into a readable EPUB, so if you're looking for quick and easy, it will work!

It's as simple as:

latex2html -split 0 -nonavigation sourcefile.tex

which creates sourcefile.html. The "-split 0" flag tells it to create one large HTML file instead of breaking the output into chunks, as it normally would. "-nonavigation" turns off the navigation bar at the top of each page. After a few minutes, you'll have an HTML document you can feed into the next step.
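For repeatable builds, the latex2html call can be wrapped in a small script. This Python sketch uses hypothetical helper names. It only assembles the two flags described above and runs latex2html if the tool is on your PATH, so it adds nothing latex2html doesn't already do.

```python
import shutil
import subprocess
import sys


def latex2html_command(source: str) -> list[str]:
    """Build the latex2html command line with the flags discussed above."""
    return [
        "latex2html",
        "-split", "0",     # one large HTML file instead of many linked chunks
        "-nonavigation",   # no navigation bar at the top of the page
        source,
    ]


def run_latex2html(source: str) -> None:
    if shutil.which("latex2html") is None:
        sys.exit("latex2html is not on your PATH")
    # check=True raises CalledProcessError if latex2html exits with an error.
    subprocess.run(latex2html_command(source), check=True)


if __name__ == "__main__" and len(sys.argv) == 2:
    run_latex2html(sys.argv[1])
```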

But htlatex, part of the tex4ht set of tools (tex4ht is the package name), does a much better job, retaining the formatting of even the lettrines (initial caps). To use it, type:

htlatex source.tex

You'll now have source.html. But it's HTML, not XHTML, the stricter, tighter, more robust language required for EPUB. (HTML was intentionally left sloppy to encourage people to build web pages without fear of endless errors.) Use HTML Tidy to convert it. You'll still have to tweak the result, but you'll be much better off. In a terminal, issue something like this:

tidy -asxhtml -output seconddraft.xhtml firstdraft.html

That takes an HTML file called firstdraft.html and turns it into an XHTML file called seconddraft.xhtml. (See brainbell.com for an explanation of the difference between HTML and XHTML.) Doing this makes the next step much easier. If you skip it, the EPUB validator will find so many errors that you'll never get through them all by hand.
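The htlatex-then-tidy sequence can be scripted as well. This is a minimal Python sketch that assumes both htlatex and tidy are installed and on your PATH; the function names are hypothetical. HTML Tidy exits with status 1 when it reports only warnings, which is normal for htlatex output. The script therefore treats only status 2 (errors) as a failure.

```python
import subprocess
import sys
from pathlib import Path


def tidy_command(html_file: str, xhtml_file: str) -> list[str]:
    # -asxhtml converts to XHTML; -output names the file to write.
    return ["tidy", "-asxhtml", "-output", xhtml_file, html_file]


def tidy_succeeded(returncode: int) -> bool:
    # HTML Tidy exits with 0 (clean), 1 (warnings only) or 2 (errors).
    return returncode in (0, 1)


def latex_to_xhtml(tex_file: str, xhtml_file: str) -> None:
    tex = Path(tex_file)
    # Run htlatex from the source's folder so source.html lands next to source.tex.
    subprocess.run(["htlatex", tex.name], cwd=tex.parent, check=True)
    html = tex.with_suffix(".html")
    result = subprocess.run(tidy_command(str(html), xhtml_file))
    if not tidy_succeeded(result.returncode):
        raise RuntimeError(f"tidy reported errors in {html}; fix them and rerun")


if __name__ == "__main__" and len(sys.argv) == 3:
    latex_to_xhtml(sys.argv[1], sys.argv[2])
```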

Convert XHTML to EPUB

Calibre is the right tool for this job, and it's a lovely piece of software under intensive development. I didn't find a copy of it in my slightly old Linux distro's repositories, but no matter. It's distro-agnostic (depending mostly on Python), and it has an installer that, as far as I can tell, works on almost any Linux distro and on BSD. Once you've installed Calibre, add a new book, and when prompted to select the file, choose your XHTML file. Calibre imports it in XHTML format (that is, without converting it). Now use Calibre to set the metadata, choose a cover image, and so on, and then convert the book to EPUB format. It does this nicely, in a minute or two. If all you want is an EPUB you can read on your own Nook, that's it: you're done!

But if you want to offer your e-book for sale, you have more work to do. Calibre doesn't (yet) offer a clean way to do it graphically: you have to "explode" the EPUB package into its individual XHTML files and edit them by hand. To find out how much work you have, validate your EPUB file. Very few publishing houses accept an EPUB file unless it passes validation with no errors. You can validate your file online at a site like idpf.org. Unless you're lucky the first time, though, you'll be doing it a lot, so you might as well download the small Java app, epubcheck, from Google Code and run it yourself:

java -jar epubcheck.jar myfile.epub >> output.txt

The first time I tested my file, I got over a hundred errors and nearly passed out at my desk. Later, when common sense prevailed, I looked into the errors and realized there were only two or three distinct errors repeated ad nauseam. Fixing them took some time with a plain old text editor.
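The hundred-odd errors turned out to be a handful of errors repeated many times, so grouping the report by message helps. This rough Python sketch assumes Java is installed and epubcheck.jar is on disk. The line format it parses is an assumption based on typical epubcheck output, and it differs between versions. Adjust the regular expression if nothing matches.

```python
import re
import subprocess
import sys
from collections import Counter

# Assumed line shapes (they vary by epubcheck version):
#   ERROR(RSC-005): book.epub/OEBPS/ch01.xhtml(12,34): <message>
#   ERROR: book.epub/OEBPS/ch01.xhtml(12,34): <message>
ERROR_LINE = re.compile(
    r"^(?:ERROR|FATAL)(?:\((?P<code>[A-Z]+-\d+)\))?: "
    r"(?P<loc>[^:]*?(?:\(-?\d+,-?\d+\))?): (?P<msg>.+)$"
)


def summarize(report: str) -> Counter:
    """Count distinct errors, ignoring the file and line where each occurred."""
    counts = Counter()
    for line in report.splitlines():
        match = ERROR_LINE.match(line.strip())
        if match:
            counts[(match.group("code"), match.group("msg"))] += 1
    return counts


def check(jar: str, epub: str) -> Counter:
    # Capture both streams, since messages may be written to either one.
    result = subprocess.run(["java", "-jar", jar, epub], capture_output=True, text=True)
    return summarize(result.stdout + result.stderr)


if __name__ == "__main__" and len(sys.argv) == 3:
    for (code, msg), n in check(sys.argv[1], sys.argv[2]).most_common():
        print(f"{n:4d}  {code or '-'}  {msg}")
```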

To start, right-click the EPUB file in Calibre and select "Tweak file" from the context menu. Calibre will explode the EPUB into its components and let you open and edit each one in your text editor. I used Emacs, for reasons you'll soon see, but any text editor will do. Here's what I had to do:


- Fix the Table of Contents: Calibre had made a table of contents that was far too detailed (down to the subsection level, if I recall), whereas I wanted only the chapters in the TOC. So I removed those entries from the TOC file by hand (five minutes' work).
- Fix the Table of Contents chapter titles: again, Calibre had automatically created titles that I didn't find user-friendly, so I changed each one to something like "Chap 1: XXXXXX".
- Fix a lot of leftover code that the EPUB validator didn't like. In my case, there were about twenty links where Calibre had inserted two spans, which the validator rejected. I tested by removing one, and when that worked, I did the rest. Here's where Emacs was useful. It was a simple matter to run a macro that searched for the next instance, moved the cursor to the beginning and end, and deleted the offending structure. I would not have wanted to do that dozens of times by hand.
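The first two fixes can be scripted against the EPUB's NCX table of contents. The filename is usually toc.ncx, but that is an assumption, so check what Calibre's Tweak view shows. This Python sketch assumes Calibre placed each chapter as a top-level navPoint in navMap. It keeps only those top-level entries, renumbers playOrder, and can optionally retitle each chapter in the "Chap 1: ..." style. The function names are hypothetical. ElementTree does not preserve a DOCTYPE declaration, so run epubcheck on the result.

```python
import sys
import xml.etree.ElementTree as ET

NCX_NS = "http://www.daisy.org/z3986/2005/ncx/"
ET.register_namespace("", NCX_NS)  # serialize without an "ns0:" prefix


def q(tag: str) -> str:
    """Qualify a tag name with the NCX namespace."""
    return f"{{{NCX_NS}}}{tag}"


def prune_ncx(ncx_xml, rename=None) -> str:
    """Keep only top-level (chapter) entries; optionally retitle them."""
    root = ET.fromstring(ncx_xml)
    nav_map = root.find(q("navMap"))
    for number, point in enumerate(nav_map.findall(q("navPoint")), start=1):
        # Drop nested entries (sections, subsections) under each chapter.
        for child in point.findall(q("navPoint")):
            point.remove(child)
        # Renumber so playOrder stays sequential after the removals.
        point.set("playOrder", str(number))
        if rename is not None:
            label = point.find(f"{q('navLabel')}/{q('text')}")
            label.text = rename(number, label.text)
    # The TOC is now one level deep; keep the header metadata consistent.
    for meta in root.iter(q("meta")):
        if meta.get("name") == "dtb:depth":
            meta.set("content", "1")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__" and len(sys.argv) == 2:
    path = sys.argv[1]
    with open(path, "rb") as f:
        pruned = prune_ncx(f.read(), rename=lambda n, old: f"Chap {n}: {old}")
    with open(path, "w", encoding="utf-8") as f:
        f.write('<?xml version="1.0" encoding="utf-8"?>\n' + pruned)
```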

At the end of this process, you've got an EPUB file that will be more than fine
for Barnes and Noble, Kobo, and many other online publishers. The only one who
will still grumble is Apple.

Final Tweaks for the Apple Store

This last step applies only if you also intend to submit the EPUB to Apple for distribution through the iBookstore. That is recommended, given how many people currently read books on their iPads. Calibre adds two files to the structure that contain Calibre-specific metadata. After you've fixed every other issue, copy Calibre's EPUB file somewhere else, open it (remember, it's just a renamed zip file), and remove those two files. The EPUB will now also validate for the iBookstore.
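If you remove the files by re-zipping an unpacked copy, keep one rule in mind. The EPUB container requires the mimetype file to be the first entry in the archive and stored without compression, and many zip tools don't do that by default. This minimal Python sketch copies the archive while leaving out the entries you name. The text above doesn't name Calibre's two files, so list the archive first and pass their names yourself. The function name is hypothetical.

```python
import sys
import zipfile


def strip_entries(src: str, dst: str, unwanted: set[str]) -> list[str]:
    """Copy an EPUB, leaving out the named entries; return what was dropped."""
    removed = []
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        # EPUB requires "mimetype" to be the first entry, stored uncompressed.
        zout.writestr(zipfile.ZipInfo("mimetype"), zin.read("mimetype"),
                      compress_type=zipfile.ZIP_STORED)
        for item in zin.infolist():
            if item.filename == "mimetype":
                continue
            if item.filename in unwanted:
                removed.append(item.filename)
                continue
            info = zipfile.ZipInfo(item.filename, date_time=item.date_time)
            zout.writestr(info, zin.read(item.filename), compress_type=zipfile.ZIP_DEFLATED)
    return removed


if __name__ == "__main__" and len(sys.argv) >= 4:
    # Usage: script.py calibre.epub clean.epub NAME [NAME ...]
    # List the archive first (for example with "unzip -l") to find the names.
    print("removed:", strip_entries(sys.argv[1], sys.argv[2], set(sys.argv[3:])))
```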

Seem like a lot of work? It's really not, if you think about it. And as a bonus, instead of paying a service provider to do it for you, you've retained total control over every aspect of your publication, and you've written your book using the best software for the job. Worth it!