Shallow Rampa - Wikipedia4epub

Wikipedia4epub

by dixie on Date unknown

Tagged as: ebook, app.

Overview

Wikipedia4epub is a command-line application that creates an offline ebook from Wikipedia articles.

It doesn’t provide a complete offline Wikipedia, but it can download a given article and its direct children (wiki4e-mkepub-subtree).
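To make "direct children" concrete, here is a minimal Python sketch that collects the article links found in one article's HTML. It is only an illustration and not how wiki4e-mkepub-subtree itself is implemented; the names ChildLinkParser and direct_children are hypothetical, and the link filter (paths starting with /wiki/ and containing no colon) is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ChildLinkParser(HTMLParser):
    """Collects links to other articles found in one article's HTML."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.children = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        # Assumption: article links look like /wiki/Title; links containing
        # ':' (File:, Help:, ...) are treated as non-article pages here.
        if href.startswith("/wiki/") and ":" not in href:
            # Drop the #fragment so one article is not counted twice.
            url = urljoin(self.base_url, href.split("#")[0])
            if url not in self.children:
                self.children.append(url)


def direct_children(base_url, html):
    """Return the unique article URLs linked from the given HTML."""
    parser = ChildLinkParser(base_url)
    parser.feed(html)
    return parser.children
```

Calling direct_children with the URL of the starting article and its downloaded HTML returns the list of child article URLs in the order they appear on the page.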

Examples via wiki4e-mkepub-subtree:

Installation

The easiest way to install it is from HackageDB; GHC 6.12.1 or newer is required.

$ cabal install wikipedia4epub

Please be aware that it is still ALPHA quality software.

Usage

The following commands are available:

Example of usage (wiki4e-mkepub-subtree)

$ wiki4e-mkepub-subtree http://en.wikipedia.org/wiki/EPUB
# STAGE 1/4 - Fetch starting article: http://en.wikipedia.org/wiki/EPUB
[1/1] Already cached. Skipping download. /home/dixie/.wiki4e/wiki4e_fetch/EPUB
# STAGE 2/4 - Fetch children articles: 113
[1/113] Already cached. Skipping download. /home/dixie/.wiki4e/wiki4e_fetch/EPUB
[2/113] Fetching : http://en.wikipedia.org/wiki/Filename_extension
...
[11/113] Fetching : http://en.wikipedia.org/wiki/DTBook
[12/113] Fetching : http://en.wikipedia.org/wiki/Website
...
[113/113] Already cached. Skipping download. /home/dixie/.wiki4e/wiki4e_fetch/Main_Page
# STAGE 3/4 - Sanitize articles
# STAGE 4/4 - Download images
Count = 352
[1/352] Already cached. Skipping download. /home/dixie/.wiki4e/wiki4e_images/100px-EBookreal.jpg
[2/352] Already cached. Skipping download. /home/dixie/.wiki4e/wiki4e_images/50px-Question_book-new.svg.png
...
Done.

For performance reasons, the downloaded articles and images are cached. The cache has to be cleaned up manually.
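Since the cache is never cleaned automatically, a small helper can do it. The following Python sketch assumes the cache consists of the wiki4e_fetch and wiki4e_images directories under ~/.wiki4e, as the paths in the output above suggest; the function name clear_wiki4e_cache is hypothetical, and any other directories the tool may create are not covered.

```python
import shutil
from pathlib import Path

# Assumption: the cache directories are the ones visible in the log paths
# above (wiki4e_fetch and wiki4e_images under ~/.wiki4e).
CACHE_SUBDIRS = ("wiki4e_fetch", "wiki4e_images")


def clear_wiki4e_cache(root=None):
    """Delete the cached articles and images; return the removed paths."""
    root = Path(root) if root is not None else Path.home() / ".wiki4e"
    removed = []
    for name in CACHE_SUBDIRS:
        path = root / name
        if path.is_dir():
            shutil.rmtree(path)
            removed.append(path)
    return removed
```

Calling clear_wiki4e_cache() with no argument works on ~/.wiki4e; the next run of the tool then has to download everything again.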

For caching, directories such as wiki4e_fetch and wiki4e_images are created under ~/.wiki4e, as the paths in the output above show.

Screenshot

The screenshots are of poor quality because they were taken with a phone camera.

Listing of the books in Reader

Book Listing. First is Wikipedia from Firefox

Table of contents of the Wikipedia articles

Table of Contents. Each chapter is one article

One selected Wikipedia article

Article Content, Top

The same article after scrolling

Article Content, Scrolled

Source code & bugs

Darcs repositories:

To report a bug or ask a question, please email me at

Changes

EPUB Format

EPUB is a standardized, open format for ebooks; essentially it is a ZIP archive of XHTML pages and images with some metadata. For details, see the EPUB article on Wikipedia.
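Because an EPUB file is an ordinary ZIP archive, its contents can be inspected with any ZIP library. The Python sketch below groups the entries into pages, images and everything else (metadata); the function name describe_epub and the extension lists are assumptions made for illustration.

```python
import zipfile


def describe_epub(path):
    """Group the entries of an EPUB (a ZIP archive) by rough kind."""
    groups = {"pages": [], "images": [], "other": []}
    with zipfile.ZipFile(path) as archive:
        for name in archive.namelist():
            lower = name.lower()
            if lower.endswith((".xhtml", ".html", ".htm")):
                groups["pages"].append(name)
            elif lower.endswith((".png", ".jpg", ".jpeg", ".gif", ".svg")):
                groups["images"].append(name)
            else:
                # Metadata such as mimetype, META-INF/container.xml,
                # the .opf package file and the .ncx table of contents.
                groups["other"].append(name)
    return groups
```

For a generated book (the file name EPUB.epub here is hypothetical), describe_epub("EPUB.epub")["pages"] would list one XHTML file per article if each chapter is stored as a separate page.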

It is supported by my ebook reader (PRS-505) and also by my new ebook reader, the HanLin V5.

Calibre, the “full-blown” open source software for ebook management, supports EPUB too.