Ebook Formatting – Part One – Overview


Creating Ebooks

I’ve been doing a lot of ebook formatting lately, and with each new project, there is a new lesson to learn, or a new barrier to cross. This has given me a lot to write about when it comes to ebook formatting, so I thought I would do an overview of the process, then cover some specific issues I have run across recently.

Virtually every e-book reader platform uses some form of HTML as its markup language. Your goal, then, in creating e-books, is to generate the cleanest HTML code possible that can be read dependably by the largest number of devices. The method described below yields the cleanest, most robust HTML code that I have found.

The Genesis – every ebook in existence started in much the same way – an author recording his thoughts in some fashion. Regardless of whether that author initially wrote on paper, at some point, every ebook was entered into a word processor.

As a writer myself, I spend a lot of time in front of Microsoft Word, carefully crafting my story. I’ve been using Word for years, and I know a lot of the functionality that helps me save time and frustration when I’m writing. But once the story is written and it’s time to create an ebook, my time with Word is almost over.

There are a few good habits to remember when you are writing your book in Word, and a few things that you should avoid. My books will all become ebooks in the end, but I also create print books from my Word file. So, while I’m writing I have three goals in mind:

1) Finish the story

2) Have my Word file ready for ebook conversion

3) Have my Word file ready for print book production

When I’m writing, I do not concern myself at all with the formatting on the page. I set my line spacing to a comfortable level, set my view to page width, and I turn on “show non-printing characters” so I can see the symbols that are encoded into the document. To do so, click the “Pilcrow” button on the “Home” tab in the “Paragraph” section.


My only concern is that there is a paragraph return at the end of each paragraph, and of course, the rest of my punctuation is in place. Beyond that, I’m not concerned with page size, margins, headers, footers, page numbers, Chapter Headings, indents, or anything else.
(*A note on indents – save yourself time and trouble by NOT using spaces or tabs to indent the first lines of your paragraphs. Word has a first-line indent setting in the Paragraph dialog (Indentation, Special: “First line”) that is simple and consistent, as well as adjustable.)
That all comes later and will save me a ton of time if I ignore those things while writing the book. The only “formatting” thing I do is create a blank line before and after the chapter headings, just to create some white space around them, but even that is not necessary. Bottom line is – write the story and don’t worry about making it look pretty, or formatting it along the way.

When I’m finished writing, I’m going to check a few things. I have my copy of Word set up with all of my preferences and auto-correct features, but if I am formatting someone else’s work, I do the following:

  • Using Word’s “search and replace” feature, I’m going to find all occurrences of the straight double quote mark (") and replace them with the same double quote mark. I know it sounds silly, but as long as the “straight quotes with smart quotes” AutoFormat option is turned on, this will invoke Word’s “Smart Quotes” feature and replace all double quote marks with proper curly quotes. I get a lot of books to convert that have a mix of quote mark styles, and this will make them all consistent – and – pay dividends in future steps.
  • I will do the same thing with single quote marks.
  • I will then type three periods in the “find” box and a proper ellipsis in the “replace” box, then hit “replace all.” (There is an actual ellipsis character that looks like three dots… but it is a single character, so the dots cannot be separated. This prevents the dots from splitting from one line to the next. Details in an upcoming post.)
  • I will then “find” all double dashes (--) and “replace” them with a proper “EM Dash,” which is a single, longer dash.
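
If you ever need to do the same clean-up outside of Word, say on a plain-text copy of someone else’s manuscript, the three replacements above can be approximated in a short script. This is a minimal Python sketch, not how Word itself works: the function name normalize_punctuation is hypothetical, and the rule for deciding whether a straight quote opens or closes (it opens at the start of a line or after a space or an opening bracket) is a simple assumption that will get some edge cases wrong, such as a leading apostrophe in ’Tis.

```python
import re

def normalize_punctuation(text: str) -> str:
    """Approximate the Word clean-up: curly quotes, ellipses, em dashes.

    Assumed heuristic (not Word's AutoFormat logic): a straight quote
    opens when it starts the text or follows whitespace, "(" or "[";
    every other straight quote closes.
    """
    # Three periods -> the single ellipsis character (U+2026).
    text = text.replace("...", "\u2026")
    # Two hyphens -> a single em dash (U+2014).
    text = text.replace("--", "\u2014")
    # Opening double quotes first, then treat the rest as closing.
    text = re.sub(r'(^|[\s(\[])"', "\\1\u201c", text)
    text = text.replace('"', "\u201d")
    # Same for single quotes; apostrophes (as in don't) get the closing
    # form, which is also the correct apostrophe glyph.
    text = re.sub(r"(^|[\s(\[])'", "\\1\u2018", text)
    text = text.replace("'", "\u2019")
    return text
```

Running it on `She said "wait..." and left--fast.` gives curly quotes around the spoken word, a single ellipsis character, and an em dash in place of the double hyphen.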

Once this is done and I feel it’s ready for e-book and print book creation, I save two versions of my Word file. One will be called “My Story Ebook.docx” and the other will be called “My Story Print.docx.” At this point, we have two versions of the same file, each of which will be formatted differently from this point forward.

I will create my e-book now, from the file designated for the e-book version.

With my e-book file open in Word, I’m going to prepare the text for creating a NEW e-book source file. This file will eventually be an HTML file, but NOT converted to HTML through Word. When Word converts a file to HTML, you get a boatload of unnecessary “junk” code added to the HTML that just bogs things down and makes the HTML bloated and slow. What we are shooting for here is a “clean” HTML file that is free of all the junk tags that Word adds to it. I will then “massage” this clean HTML to create a fully functional “source” file for further e-book conversion. This file will be free of all the formatting from Word, and will perform very well and dependably.

I’ll continue with the process here, but the details have been covered very thoroughly by friend and fellow author Guido Henkel on his blog. His post gives the specifics of this step, and I also urge you to read Guido’s entire series, titled “Take Pride in your e-book formatting.” Read it. Study it. Get to understand it and you will be formatting like a pro.

If you took the time to read Guido’s excellent formatting articles, then you are up to speed. If not, I will continue with a high-level overview of the process, and include more details in future posts, expanding on some of the things I’ve learned from Guido, as well as a few things I’ve discovered on my own.

With Word still open, we are now going to add some HTML tags directly into our Word document, which we will then copy and paste into a programming editor. When we do this, we will lose ALL of the formatting created by Word – all italics, all bold, all headings, all indents – everything but the text. But don’t worry, it will be easier than you might think, and makes for a better e-book.

Again, this is covered in Part VI of Guido’s formatting series.

What we are going to do is let Word’s powerful “find and replace” feature help us prepare the HTML we need, save a bunch of time, and preserve the formatting we want to retain. We are going to give Word’s “find” feature the task of finding all italic text and wrapping it with the italics HTML tags <i></i>.

With the cursor in the Find Box, hit CTRL+i and Word will look for all instances of italic fonts.


In the “Replace with” box, we type <i>^&</i>
Then click “Replace All.”

This is a set of HTML tags for italics, and between them is Word’s ^& code, which stands for whatever text the search found, so every passage of italic text gets wrapped with the tags.

This will preserve the italics font style when we take this text into the HTML editor.

When you go looking at your text in Word, all cases of italics should now be wrapped in tags.

In Guido’s series, he suggests not wrapping bold text with the proper <b></b> tags.
He has a reason for this, which he explains in the series. Most novels do not make use of bold fonts, except in Chapter Headings. If the only bold text in your book is Chapter Headings, then follow Guido’s advice, and we will handle them differently. However, I have formatted a number of e-books that use bold fonts – in blurbs, in quotes, in references, etc. If you have bold text in your book, I suggest following the same procedure and wrapping it with tags now. Chapter headings will be handled differently anyway.

To find and wrap all bold text, type CTRL+b in the find box, and <b>^&</b> in the “replace with” box. Then click “Replace All.” Now all bold and italics are wrapped in tags, and ready for the programming / HTML editor.
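
To see what the italics and bold replacements accomplish, it may help to model a paragraph as a list of formatted text runs. The sketch below is a hypothetical stand-in (the Run tuple and runs_to_html are illustrative names, not Word’s internal model): italic runs get <i></i>, bold runs get <b></b>, and everything else is kept as plain text, which mirrors the text-plus-tags result you paste into the programming editor.

```python
from typing import List, Tuple

# Hypothetical model of a paragraph: (text, is_italic, is_bold) runs.
Run = Tuple[str, bool, bool]

def runs_to_html(runs: List[Run]) -> str:
    """Keep only the text of each run, wrapping italic runs in <i></i>
    and bold runs in <b></b> (bold outside italic when both apply)."""
    parts = []
    for text, italic, bold in runs:
        if italic:
            text = f"<i>{text}</i>"
        if bold:
            text = f"<b>{text}</b>"
        parts.append(text)
    return "".join(parts)
```

For example, runs_to_html([("It was ", False, False), ("very", True, False), (" late.", False, False)]) returns It was <i>very</i> late.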

Now, still in Word, select all text (the entire document) and copy it to the clipboard. It’s time to move to the programming editor and finish the work.

At Guido’s suggestion (since I’m on a PC), I downloaded a copy of jEdit – a free programming editor that works nicely for this task. Download and install the program, if you haven’t already. You will also want to install an available plugin called “JTidy,” which will do a lot of work for you with a single click.

In jEdit, open a new workspace/file. (File/New)

Now paste the entire text from your book into jEdit.

What you should see is that every paragraph of your book is on a single, long line. Don’t worry, in just a minute it will look fairly normal again. While it’s in this state, we need to identify each of those lines as a paragraph for the HTML file. To accomplish that, we will use the “search” features in jEdit.



Now we can quickly wrap all of these lines with HTML paragraph tags.

In jEdit, go to “search” on the menu.

Then click “find” in the drop-down menu.

This will open the “Find and Replace” dialog box, and you have a selection to make.

Below the main dialog window, you will see a series of check boxes for search options. We will want to click on the option for “Regular Expressions” for our next search string.

With the “Regular Expressions” box checked, we will now enter a regular expression in the “Search for:” box and a replacement string in the “Replace with:” box.

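In the “Search for:” box, type ^(.+)$ and in the “Replace with:” box, type <p>$1</p>

Here ^ and $ mark the start and end of a line, (.+) captures everything in between, and $1 puts that captured text back between the paragraph tags.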

Click on “Replace All” and each paragraph line is now wrapped with <p> tags so that HTML will recognize each paragraph as such.
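
The jEdit search and replace above has a direct equivalent in Python’s re module, sketched below as a hypothetical wrap_paragraphs helper. Two details differ from jEdit: Python needs the re.MULTILINE flag so that ^ and $ match at every line rather than only at the ends of the whole text, and it writes the captured group as \1 instead of $1. The sketch also converts Windows line endings first, an assumption about the input that keeps a stray carriage return out of the closing tag.

```python
import re

def wrap_paragraphs(text: str) -> str:
    """Wrap each non-empty line in <p></p>, mirroring the jEdit
    regular expression ^(.+)$ -> <p>$1</p>."""
    # Normalize Windows line endings so a stray \r doesn't end up
    # inside the closing </p> tag.
    text = text.replace("\r\n", "\n")
    # re.MULTILINE makes ^ and $ match at each line boundary; .+ needs at
    # least one character, so blank lines are left unwrapped.
    return re.sub(r"^(.+)$", r"<p>\1</p>", text, flags=re.MULTILINE)
```

Because .+ requires at least one character, empty lines between paragraphs pass through untouched, just as they do in jEdit.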

The next step is to let jEdit know that we are working with an HTML file, so we need to wrap everything with the proper HTML identifiers. One click with the JTidy plugin will do the trick. On the jEdit menu, go to “Plugins” and then select JTidy from the menu.


On the JTidy flyout, select “Tidy Current Buffer.”

The plugin has now placed the proper HTML code into the file, both above and below your ebook text.
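
For readers curious about what that surrounding code looks like, here is a rough Python sketch that places the paragraph markup inside a minimal HTML page. The helper wrap_in_document and its title parameter are hypothetical, and the plain HTML5 doctype is an assumption; the exact DOCTYPE, meta tags, and layout JTidy writes depend on its version and settings.

```python
from html import escape

def wrap_in_document(body: str, title: str = "My Story") -> str:
    """Place paragraph markup inside a minimal HTML page skeleton.

    Hypothetical helper: the HTML5 doctype and charset meta tag are
    assumptions, not a copy of what JTidy writes.
    """
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        "<head>\n"
        '<meta charset="utf-8">\n'
        f"<title>{escape(title)}</title>\n"  # escape &, <, > in the title
        "</head>\n"
        "<body>\n"
        f"{body}\n"
        "</body>\n"
        "</html>\n"
    )
```

Calling wrap_in_document("<p>It was a dark and stormy night.</p>") returns a complete page with that paragraph inside the body element.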

JTidy has also done another task for us that is greatly beneficial: it has converted all special characters into “named entities.” This means that all double quotes, single quotes, ellipses, em dashes, and other special characters are now written using their HTML names (for example, &ldquo; for an opening double quote). Using this technique yields the most dependable, cross-platform HTML file that you can create.
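
The named-entity conversion can also be approximated with Python’s standard library, whose html.entities.codepoint2name table maps code points to HTML 4 entity names. This is a sketch of the idea rather than what JTidy does internally; the function name to_named_entities is hypothetical, ASCII characters are left alone so existing tags survive, and characters without a named entity fall back to a numeric character reference.

```python
from html.entities import codepoint2name

def to_named_entities(text: str) -> str:
    """Replace non-ASCII characters with HTML named entities where an
    HTML 4 name exists, else with a numeric character reference."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            # ASCII passes through untouched so tags like <p> and <i>
            # are not turned into &lt;p&gt;.
            out.append(ch)
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")  # e.g. &ldquo;
        else:
            out.append(f"&#{cp};")  # numeric fallback
    return "".join(out)
```

On a paragraph such as <p>It’s late</p> with a curly apostrophe, the apostrophe becomes &rsquo; while the tags stay as they are.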

The next step is to save the file as HTML.

On the jEdit menu, go to “File” and then “Save As,” and select a name and location for your file. Make sure to give it an HTML extension, for example “my story.html”.

Once you hit save, you will notice another difference in the look of your jEdit file: all of the tags turn blue, and all of the named entities turn magenta. All of your HTML tags are in place, and everything is looking good and ready for the last tweaks.

Now it’s time to create your CSS “styles” as Guido describes in Part VII of his series. Read that now if you haven’t already.

CSS styles can give you an incredible amount of control over how your ebook displays on various devices, and they are well worth experimenting with to get your ebook just right. Guido does an excellent job explaining styles, and I will create another post to expand on what Guido has explained so well, with examples that I have used to make my ebook formatting the best it can be.

At this point, your HTML source file can be converted by a number of different methods into a number of formats. Again, that will be covered in a post coming soon.

