Wikipedia4epub

Overview

Wikipedia4epub is a command-line application that creates an offline ebook (EPUB) from Wikipedia articles.

It does not provide a complete offline copy of Wikipedia, only selected articles:

  1. Visited articles based on Firefox history (wiki4e-mkepub-firefox)
  2. Given article and its direct children (wiki4e-mkepub-subtree)

See the example ebook generated from the EPUB article using wiki4e-mkepub-subtree.

A simple diagram of how the wiki4e-mkepub-firefox command works: Diagram

Installation

The easiest way to install is from HackageDB; GHC 6.12.1 or newer is required.

$ cabal install wikipedia4epub

Please be aware that this is still ALPHA-quality software.

Usage

The following commands are available, as listed in the Overview: wiki4e-mkepub-firefox and wiki4e-mkepub-subtree.

Example of usage (wiki4e-mkepub-firefox)

$ wiki4e-mkepub-firefox Wikipedia123
Please close your Firefox if you see this message longer than 5 seconds...
Going to connect on Firefox SQLite DB: /home/dixie/.mozilla/eclipse/places.sqlite
Going to connect on Firefox SQLite DB: /home/dixie/.mozilla/firefox/2zox86mc.default/places.sqlite
# STAGE 1/4 - Download Articles...
# STAGE 2/4 - Sanitize Articles...
# STAGE 3/4 - Download Images...
# STAGE 4/4 - Constructing EPUB...
Wikipedia123.epub constructed.
Done.
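
The log above shows the tool opening Firefox's places.sqlite history database. As a rough illustration of that step only (this is not the tool's actual Haskell code), the following Python sketch lists Wikipedia article URLs found in such a database. The moz_places table and its url column belong to Firefox's history schema; the function name wikipedia_urls and the LIKE filter are assumptions made for this example.

```python
import sqlite3


def wikipedia_urls(places_db):
    """Return distinct Wikipedia article URLs stored in a Firefox places.sqlite.

    Hypothetical helper for illustration; the real tool may select differently.
    """
    # A running Firefox may hold a lock on this file, which is presumably
    # why the tool asks you to close the browser first.
    conn = sqlite3.connect(places_db)
    try:
        rows = conn.execute(
            "SELECT DISTINCT url FROM moz_places "
            "WHERE url LIKE '%wikipedia.org/wiki/%' ORDER BY url"
        ).fetchall()
    finally:
        conn.close()
    # Each row is a 1-tuple; unpack it into a plain list of strings.
    return [url for (url,) in rows]
```

Calling wikipedia_urls with the path of a places.sqlite file from a Firefox profile returns the matching URLs in alphabetical order.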

Example of usage (wiki4e-mkepub-subtree)

$ wiki4e-mkepub-subtree http://en.wikipedia.org/wiki/EPUB
# STAGE 1/4 - Fetch starting article: http://en.wikipedia.org/wiki/EPUB
[1/1] Already cached. Skipping download. /home/dixie/.wiki4e/wiki4e_fetch/EPUB
# STAGE 2/4 - Fetch children articles: 113
[1/113] Already cached. Skipping download. /home/dixie/.wiki4e/wiki4e_fetch/EPUB
[2/113] Fetching : http://en.wikipedia.org/wiki/Filename_extension
...
[11/113] Fetching : http://en.wikipedia.org/wiki/DTBook
[12/113] Fetching : http://en.wikipedia.org/wiki/Website
...
[113/113] Already cached. Skipping download. /home/dixie/.wiki4e/wiki4e_fetch/Main_Page
# STAGE 3/4 - Sanitize articles
# STAGE 4/4 - Download images
Count = 352
[1/352] Already cached. Skipping download. /home/dixie/.wiki4e/wiki4e_images/100px-EBookreal.jpg
[2/352] Already cached. Skipping download. /home/dixie/.wiki4e/wiki4e_images/50px-Question_book-new.svg.png
...
Done.
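
To make the idea of "direct children" in the run above more concrete, here is a minimal Python sketch that collects the article links found in one downloaded page. It is an assumption about the general approach, not a copy of the tool's logic: the names WikiLinkCollector and child_articles are hypothetical, and the rule of skipping links that contain a colon (such as File: pages) is a simplification chosen for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class WikiLinkCollector(HTMLParser):
    """Collect hrefs of <a> tags that point to /wiki/ article pages."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        # Assumption: links with a colon (File:, Special:, ...) are not articles.
        if href.startswith("/wiki/") and ":" not in href:
            # Drop any #fragment so sections of one article count once.
            self.hrefs.append(href.split("#")[0])


def child_articles(base_url, html):
    """Return absolute URLs of linked articles, deduplicated, in page order."""
    collector = WikiLinkCollector()
    collector.feed(html)
    seen, children = set(), []
    for href in collector.hrefs:
        url = urljoin(base_url, href)
        if url not in seen:
            seen.add(url)
            children.append(url)
    return children
```

Given the HTML of the starting article and its URL, child_articles returns the list of pages that a subtree-style download would fetch next.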

For performance and debugging reasons, every downloaded article and image is cached. The cache has to be cleaned up manually.

The following directories are created for caching:

  • $HOME/.wiki4e/wiki4e_fetch/
  • $HOME/.wiki4e/wiki4e_sanitize/
  • $HOME/.wiki4e/wiki4e_images/
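
Since the cache is not cleaned automatically, a small helper script can do it. The Python sketch below removes the three directories listed above; the function name clear_wiki4e_cache and its optional home parameter are hypothetical and exist only to make the example easy to try against a scratch directory.

```python
import shutil
from pathlib import Path

# The three cache directories named in this document.
CACHE_SUBDIRS = ("wiki4e_fetch", "wiki4e_sanitize", "wiki4e_images")


def clear_wiki4e_cache(home=None):
    """Delete the wiki4e cache directories; return the paths actually removed."""
    root = Path(home) if home is not None else Path.home()
    removed = []
    for name in CACHE_SUBDIRS:
        path = root / ".wiki4e" / name
        if path.is_dir():
            shutil.rmtree(path)
            removed.append(str(path))
    return removed
```

Calling clear_wiki4e_cache() with no argument operates on the current user's home directory; the next run of either command then downloads everything again.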

Screenshots

The quality of the screenshots is not very good, since they were taken with a mobile phone.

Listing of the books in Reader

Book Listing. First is Wikipedia from Firefox

Table of contents of the Wikipedia articles

Table of Contents. Each Chapter is one Article

One selected Wikipedia article

Article Content, Top

The same article after scrolling

Article Content, Scrolled

Source code & bugs

Darcs repositories:

To report a bug or ask a question, please write me an email to

Changes

EPUB Format

EPUB is a standardized, open format for ebooks; basically it is a ZIP archive of XHTML pages and images with some metadata. For details, see the EPUB article on Wikipedia.
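
Because an EPUB file is a ZIP archive, it can be inspected with any ZIP library. The Python sketch below uses the standard zipfile module to list the entries of a book and to read its mimetype entry, which the EPUB specification requires to contain application/epub+zip. The function names are hypothetical, and the exact entries inside a book produced by Wikipedia4epub are not documented here.

```python
import zipfile


def list_epub_entries(epub_path):
    """Return the names of all files stored inside the EPUB archive."""
    with zipfile.ZipFile(epub_path) as book:
        return book.namelist()


def epub_mimetype(epub_path):
    """Return the content of the archive's 'mimetype' entry as text."""
    with zipfile.ZipFile(epub_path) as book:
        return book.read("mimetype").decode("ascii").strip()
```

For example, list_epub_entries("Wikipedia123.epub") would show the XHTML pages, images, and metadata files packed into the book built in the Usage section.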

It is supported by my ebook reader (PRS-505) and also by my new ebook reader, the HanLin V5.

Calibre, the "full blown" open source software for ebook management, supports EPUB too.