Wednesday 9 March 2011

ebook adventures

We've been working long and hard to get two new ebook formats into production: Amazon's Kindle format and Apple's iBooks format. They're similar, but different enough to create a whole host of file conversion problems.

Some of this post is rather technical. Those parts are aimed at anyone who is having similar problems and understands what we're babbling about.

Our first ebook release is The Pitching Bible by Paul Boross. It's a 70,000-word book with around 100 images, so it was quite a challenge to format it correctly.

ebook readers such as iPads, Kindles and a whole host of less popular devices, as well as software readers for PCs, share a fundamental design principle: they display text. Because HTML, the file format used for web pages, is a ubiquitous and simple text formatting language, it's perfect for use in ebook readers. An ebook is essentially a mini website stored locally on the ebook reader. Whilst Apple's iPad is a complex product capable of displaying many different file formats on its high definition screen, Amazon's Kindle uses e-ink technology. Its power consumption is tiny, giving you enough battery life to last through your summer holiday, but it can only display text and greyscale images.

The majority of ebooks are text only, so support for images in an ebook format is actually quite messy.

Here's the meandering and tortuous route that we took to finally get everything working.

First, Amazon. Amazon like to keep their cards close to their chest, so they let you upload a 'raw' HTML file which they kindly convert for you. A helpful hand? Maybe. Another way to look at it is that all Kindle conversions go through Amazon, which means that they have total control over distribution and therefore royalties. You can't load an ebook onto your Kindle without going through Amazon. Whilst we could debate Amazon's business practices, from a technical point of view this ebook was relatively easy to set up. The only downside is that you can't fiddle with the formatting. Once the file is uploaded, you have to wait for it to be approved before you can upload a revised one, so if the formatting isn't quite right it's easier just to leave it alone. Amazon don't make life easy when you're a perfectionist.

Apple's iBooks are far more complex. Apple use a 'standard' format called epub which, apparently, is the future of the ebook format. So what can it do that good old HTML can't? So far, we can't find anything. It is, however, much more difficult to set up.

We use OpenOffice for the actual writing and formatting, and we exported the book from it as an HTML file. We then used a piece of software called eCub to convert the HTML book to an epub file. Next, another piece of software called epubchecker told us everything that was wrong with the ebook. Finally, a piece of software called Sigil allowed us to make the changes to correct the errors.

We went through about 20 file conversions before realising that the strange and meaningless errors displayed by the iPad were caused by exporting the book from OpenOffice as HTML instead of the stricter XHTML, even though eCub is supposed to convert HTML to epub. An epub file is actually just a renamed ZIP archive, with the XHTML text and all of the supporting files such as images packed inside.
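
For anyone who wants to peek inside an epub, here is a minimal Python sketch that opens one as a ZIP archive and lists what is packed inside. It is only an illustration, not part of our actual workflow. The function names and the demo file names (OEBPS/chapter1.xhtml and so on) are made up for the example, and the demo archive is not a complete, valid book.

```python
import io
import zipfile


def describe_epub(source):
    """Return (mimetype, entry names) for an epub given a path or file-like object."""
    with zipfile.ZipFile(source) as zf:
        names = zf.namelist()
        mimetype = None
        if "mimetype" in names:
            # A real epub stores its media type in a plain-text entry called 'mimetype'.
            mimetype = zf.read("mimetype").decode("ascii").strip()
    return mimetype, names


def make_demo_epub():
    """Build a tiny in-memory archive laid out like an epub.

    The file names are hypothetical and the result is NOT a complete book;
    it only exists so that describe_epub() has something to open.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        # The mimetype entry comes first and is stored uncompressed.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", "<container/>")  # placeholder content
        zf.writestr("OEBPS/chapter1.xhtml",
                    "<html xmlns='http://www.w3.org/1999/xhtml'>"
                    "<body><p>Hello</p></body></html>")
        zf.writestr("OEBPS/images/figure1.png", b"")  # empty stand-in for an image
    buf.seek(0)
    return buf


if __name__ == "__main__":
    mimetype, names = describe_epub(make_demo_epub())
    print(mimetype)
    for name in names:
        print(name)
```

To look at a real book, pass its file name instead, for example describe_epub("book.epub"), where book.epub stands for whatever your own file is called.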

OpenOffice fills the exported XHTML files with an unbelievable amount of junk: formatting and styles, peculiar 'span' tags that only contain apostrophes, and other miscellany. This creates two problems. Firstly, all of this hidden text doubles the file size. Secondly, the hidden text isn't actually hidden. Whereas a web browser wouldn't display all of the formatting, the iPad displays lots and lots of empty space instead. So, we went through and manually took all of the unnecessary formatting out. Perfect!
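
We did this clean-up by hand, but the same idea could be sketched in a few lines of Python. This is a blunt illustration under some loud assumptions: it unwraps every 'span' tag (keeping the text inside) and removes every inline 'style' and 'class' attribute, so any bold or italic text that relies on those spans would be lost too. The function name and the sample markup are invented for the example, and regular expressions are not a proper XHTML parser.

```python
import re

# Opening and closing span tags, whatever attributes they carry.
SPAN_OPEN = re.compile(r"<span\b[^>]*>", re.IGNORECASE)
SPAN_CLOSE = re.compile(r"</span\s*>", re.IGNORECASE)
# Inline style and class attributes on any tag (assumes double-quoted values).
STYLE_ATTR = re.compile(r'\s+(?:style|class)="[^"]*"', re.IGNORECASE)


def strip_export_junk(xhtml):
    """Unwrap all span tags and drop inline style/class attributes.

    The text inside each span is kept; only the tags themselves go.
    """
    xhtml = SPAN_OPEN.sub("", xhtml)
    xhtml = SPAN_CLOSE.sub("", xhtml)
    xhtml = STYLE_ATTR.sub("", xhtml)
    return xhtml


if __name__ == "__main__":
    # Hypothetical markup in the spirit of an OpenOffice export.
    sample = ('<p class="western" style="margin-bottom: 0cm">'
              'It<span lang="en-GB">\'</span>s a '
              '<span style="font-weight: bold">pitch</span>.</p>')
    print(strip_export_junk(sample))
```

Run something like this on a copy of the file and check the result against the original before trusting it.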

The next problem was images. We create images for books using Inkscape, an SVG drawing program. Images are output in .png format and imported into OpenOffice. Being lazy, I make the images bigger than necessary and scale them down in OpenOffice so that their resolution is always more than 600dpi for printing. The problem with this is that when OpenOffice converts the file to HTML or XHTML, it exports the images at full size and relies on the 'width' and 'height' attributes of the 'img' tag to resize them. On the iPad, the images looked terrible. The first solution I tried was to resize all of the images manually and then take out the width and height attributes of the 'img' tag; however, this just resulted in the iPad not being able to display the book at all. So we bit the bullet and re-inserted all of the images into OpenOffice at the correct size, so that OpenOffice would format them at 100% of their original size. In OpenOffice, the images were tiny and most were completely illegible. Yet when exported to XHTML, they all displayed at the correct, glorious size. In future, we'll be creating images at just the right size in the original document.
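
One way to spot this problem before it reaches the iPad is to compare the width and height declared on each 'img' tag with the real pixel size of the .png file. The Python sketch below does that by reading the size from the PNG header. It is an illustration only: the function names are ours, the caller is assumed to supply the image bytes keyed by the tag's 'src' value, and the demo helper builds a fake header rather than a real image.

```python
import re
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)
ATTR = re.compile(r'(\w+)\s*=\s*"([^"]*)"')  # assumes double-quoted attribute values


def png_size(data):
    """Return (width, height) in pixels, read from a PNG's IHDR chunk."""
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # Width and height are two big-endian 32-bit integers at bytes 16 to 23.
    return struct.unpack(">II", data[16:24])


def find_scaled_images(xhtml, png_bytes_by_src):
    """List (src, declared size, actual size) for images resized by attributes."""
    mismatches = []
    for tag in IMG_TAG.findall(xhtml):
        attrs = dict(ATTR.findall(tag))
        src = attrs.get("src")
        width, height = attrs.get("width", ""), attrs.get("height", "")
        # Skip images we have no data for, or whose sizes are not plain pixel counts.
        if src not in png_bytes_by_src or not (width.isdigit() and height.isdigit()):
            continue
        declared = (int(width), int(height))
        actual = png_size(png_bytes_by_src[src])
        if declared != actual:
            mismatches.append((src, declared, actual))
    return mismatches


def demo_png_header(width, height):
    """Demo helper: just enough of a PNG file for png_size() to read (not a real image)."""
    return PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR" + struct.pack(">II", width, height)


if __name__ == "__main__":
    page = '<p><img src="fig1.png" width="400" height="300" alt="figure"/></p>'
    images = {"fig1.png": demo_png_header(2400, 1800)}
    for src, declared, actual in find_scaled_images(page, images):
        print(src, "declared", declared, "but file is", actual)
```

With real files, you would read each .png from disk in binary mode and pass the bytes in the dictionary.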

Once we had the image size issue fixed, we went through the OpenOffice - eCub - epubchecker - Sigil sausage machine again and the iPad opened the epub file perfectly.

So, finally, we have our first working epub iBook. The next challenge is to get Apple to accept it into the iBookstore, so we'll keep you posted with our continuing adventures.
