Skip to content

Converting PDFs to multiple HTML pages with pdftk and pdftohtml

June 21, 2012

As already stated on this blog, Bada OS is total crap. Scripting is a mess, T9 is missing of original versions, updating is not an available option depending on your phone (even if the phone is less than a year old). It keeps being absolutely worthless when it comes to reading PDF. No matter how, even if you feed it a specifically cropped PDF with no margins, you’ll always end up with something not really readable, too big, too small, whatever. A pain in the ass.

I soon realized it’s best, with such an appalling combination of software and hardware, to convert ebooks/PDFs to HTML. And as the provided HTML reader can’t remember what page you last read (not surprising) and, ahem, is unable to load a 3 MB page (low memory it says: even if a 30 MB PDF can be loaded by the PDF reader with no issue on the exact same phone, go figure!), it needs splitted HTML.

PDF is usually an output format, not a source format. While there’s plenty to convert to PDF, fact is there is no complete suite to convert from. pdftk is powerful but not easy to handle IMHO and pdftohtml latest released is almost 10 years old. So I ended writing a small wrapper (pdf2htmls.pl) for both theses tools to convert one PDF to multiples HTML files with basic indexes. It takes –input=file.pdf and (optional) –output=directory arguments. Asides from Perl, it requires debian packages pdftk and poppler-utils.

The indexes are über-crude. They could be improved with chapters/titles, I’ll maybe add that later.

From → Desktop

Leave a Comment

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: