Zimbalaka – Zim file creator for Offline Wikipedia

OpenZim is a Wikimedia-developed format for offline reading of Wikipedia. Read more here. Sadly, the project was sidelined, and support for it was also removed from MediaWiki, the software that runs Wikipedia sites.

I came to know about all this from Bala Jeyaraman of Vasippu. He is planning to introduce tablets in a classroom of 6th standard students, who have exceptional comprehension levels compared to average Indian classrooms, and wants a way to load select material onto the tablets. OpenZim files have an excellent reading app called Kiwix, which also offers complete wiki sites as downloads. But tablets can’t afford to hold a huge amount of data, like the full Wikipedia, and there is no way to create a zim file with select topics yourself. One has to request the OpenZim team to do it.

Enter Zimbalaka

Zimbalaka is a project which tries to solve just that. It creates offline Wikipedia content files in the zim file format. A person can input a list of pages that need to be packed into a zim, or a Wikipedia category. Zimbalaka then downloads those pages, removes all the clutter like the sidebar, toolbox, edit links, etc., and gives a cleaned version as a zim file for download. It can be opened in Kiwix.

The zim is created with a simple welcome page that lists all the pages as links. The OpenZim format also has an inbuilt search index, and Kiwix uses this really well. So you can create zims of 100 articles and still navigate to them easily either way.
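
A welcome page of this kind could be generated roughly as follows. This is a hedged JavaScript sketch for illustration only, not Zimbalaka's actual code; the function names and the assumption that each article is stored as `Title_with_underscores.html` next to the index are mine.

```javascript
// Hypothetical sketch: render a welcome page that links to every article.
// Assumes each article is saved as "<Title_with_underscores>.html"
// in the same folder as the welcome page.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function renderWelcomePage(zimTitle, articleTitles) {
  // One list item per article, linking to its (assumed) file name.
  var items = articleTitles.map(function (title) {
    var file = encodeURIComponent(title.replace(/ /g, "_")) + ".html";
    return '  <li><a href="' + file + '">' + escapeHtml(title) + "</a></li>";
  });
  return [
    "<h1>" + escapeHtml(zimTitle) + "</h1>",
    "<ul>",
    items.join("\n"),
    "</ul>"
  ].join("\n");
}

console.log(renderWelcomePage("Solar System", ["Earth", "Mars", "Red giant"]));
```

The titles are HTML-escaped before they are placed in the page, so a title containing `&` or `<` does not break the markup.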

Zimbalaka has multi-lingual and multi-site support. That is, you can create a zim file from pages of any of the 280+ existing language editions of Wikipedia, and also from sites like Wikibooks, Wiktionary, Wikiversity and such. You can even input any custom URL like (http://sub.domain.com/); Zimbalaka will add (/wiki/Page_title) to it and download the pages.
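
The URL rule can be sketched in a few lines of JavaScript. This is only an illustration of the idea, not Zimbalaka's own code; the name `buildPageUrl` and the way spaces, trailing slashes and special characters are handled are assumptions.

```javascript
// Hypothetical helper: builds the address of a wiki page from a base URL.
// Assumes MediaWiki-style titles, where spaces are written as underscores.
function buildPageUrl(baseUrl, title) {
  // Drop trailing slashes so we never produce "//wiki/".
  var base = baseUrl.replace(/\/+$/, "");
  // Replace spaces with underscores, then percent-encode the rest.
  var slug = encodeURIComponent(title.trim().replace(/\s+/g, "_"));
  return base + "/wiki/" + slug;
}

console.log(buildPageUrl("http://sub.domain.com/", "Page title"));
// prints "http://sub.domain.com/wiki/Page_title"
```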

It is currently hosted by my good friend Srikanth (@logic) at http://srik.me/zimbalaka

Screenshots

Here is how the content looks in Kiwix for Android.

[Screenshot: navigate]

[Screenshot: multi]

Pain points

  • A small pain point is that Zimbalaka also strips the external references that occur at the end of Wikipedia articles, as I didn’t find them useful in an offline setup.
  • You cannot add a custom welcome page to the zim file. This is not a very big priority, as the current file does its work of listing all the pages.
  • You cannot include pages from multiple sites in a single zim file. The workaround is to create multiple files or use a tool called zimwriterfs, which has to be compiled from source (this is used by Zimbalaka behind the scenes).

Developers

This tool is written using Flask, a simple Python web framework, for the backend and Bootstrap for the frontend, and it uses the compiled zimwriterfs binary as the workhorse. The zimming tasks are run by Celery, whose workers are kept running by supervisord. All the coordination and message passing happens via Redis.

Do you want to peek at how it is all done? Here is the source code [https://github.com/tecoholic/Zimbalaka]. Feel free to fork, modify and host your own instance.

Update

The OpenZim team has appreciated the effort I had put in and offered to host the tool on their server at http://zimbalaka.openzim.org. They have also pointed me to the desired backend called ‘mwoffliner’ that they have developed to download and clean the HTML. I will be working on it in my free time.

Apparix – Bookmarking in terminal

When working in a large code base like Quantum GIS, or when dealing with a lot of repositories on one machine, it is always tedious to cd all the way to the folder we need. Enter apparix, an excellent Linux tool I found by googling “bookmarking in the terminal”. This blog post has the complete details of how to use it.

Yay, no longer cd goto/project/src/core/of/module1 and again cd ../../../test/number/three. I can simply do

$ bm projectsrc
$ bm test3
$ bm fancypants4

to bookmark my locations, and then jump back to any of them with

$ to projectsrc
$ to test3
$ to fancypants4

One more tool added in the arsenal to improve productivity.

Unit Testing with CasperJS

Today I sat down to create a JavaScript library. I wanted to do it the way I have long dreamed of – TDD (Test Driven Development). There is no dearth of unit testing libraries and frameworks for JavaScript, so after some reading on the internet I settled on the CasperJS and PhantomJS combination. CasperJS is just awesome for functional testing, but unit testing? Even though it supports unit testing, it is not a dedicated unit testing framework like Karma or Protractor. Read this for more information on TDD frameworks for JS libraries.

Loading plain JavaScript files in CasperJS for unit testing seems to be completely undocumented. I tend to think that is because it wasn’t meant to be used without a webpage. But a note in the docs of the tester module says:

The best way to learn how to use the Tester API and see it in action is probably to have a look at CasperJS’ own test suites.

Thanks to this pointer, I found that using the fs module one can load files from the local filesystem as modules. With that, I can now write unit tests while developing the library and later write functional tests while using the library.

Here is the file structure

library
--test
  |--unit
  |--test.js
  |--index.html
--src
  |--livetransit.js

And here are the two tests – i) uses a webpage-based approach; ii) uses the module approach
https://gist.github.com/tecoholic/937f51b6889448836db8

Live Transit Visualization

After a push from @logic, I started working on a project to show the live location of suburban trains based on transit.js. The idea is to make a map similar to this for many cities, with a choice of schedules (weekday/Sunday). But I found the jQuery plugin, transit.js, too hardcoded and disorganised to do what I wanted, and not to my taste either.

I found other projects like:

  • LiveBus which does a live status map using SVG maps based on d3.js library and GTFS feeds of the bus data.
  • TransitLive which uses LeafletJS as the map library with their own OSM tiles. The schedule data seems to come from the backend service.
  • NextBus which has a textual realtime status and route maps based completely on Google Maps.
  • King County Metro which again uses Google Maps for the map and a custom way of loading data from its servers. The best map I have seen so far and loads a ton of JS.

Each has its own technology stack solving the same problem. This blog post details the use of GTFS data, or the lack of it, for realtime visualization. So the general picture seems to be that everyone is rolling out a custom solution of their own.

So, I am going to create a simple library that can do this.

Here is the mockup usage of the library:

// Create a new LiveTransit object
var lt = new LiveTransit();

// Assign a div for the map
var divId = "map";
var mapType = "google";

lt.setupMap(divId, mapType);

// Overloaded to perform both addRoute and initiateMovement
lt.initiateMovement("chennai_velachery.kml", "chennai_velachery_weekday.json");

// ------------------------------------
// other probable cases to deal with
// -----------------------------------
// Specify a different schedule like Sunday/Holiday Schedule
lt.changeSchedule( "chennai_velachery_sunday.json" );
// change the location - city
lt.changeLocation( "new.kml", "new.json" );
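
How the library would turn a schedule into a moving marker is still open. One simple possibility is to interpolate linearly between the two stations the train is currently travelling between. The sketch below assumes a schedule format of my own invention (stops with a time in minutes after midnight and a lat/lng pair); the station names, coordinates and the function name `estimatePosition` are all hypothetical.

```javascript
// Hypothetical schedule entry: one train's timetable as a list of stops.
// Times are minutes after midnight; lat/lng are made-up station coordinates.
var sampleTrip = [
  { station: "A", time: 600, lat: 13.0, lng: 80.0 },
  { station: "B", time: 610, lat: 13.5, lng: 80.5 },
  { station: "C", time: 630, lat: 13.5, lng: 81.5 }
];

// Returns the estimated {lat, lng} of the train at time `now`,
// or null if the train is not running at that time.
function estimatePosition(trip, now) {
  if (trip.length === 0 || now < trip[0].time || now > trip[trip.length - 1].time) {
    return null;
  }
  for (var i = 0; i < trip.length - 1; i++) {
    var from = trip[i];
    var to = trip[i + 1];
    if (now >= from.time && now <= to.time) {
      var span = to.time - from.time;
      // Fraction of this leg already covered (0 at `from`, 1 at `to`).
      var f = span === 0 ? 0 : (now - from.time) / span;
      return {
        lat: from.lat + f * (to.lat - from.lat),
        lng: from.lng + f * (to.lng - from.lng)
      };
    }
  }
  // Only reached for a single-stop trip at exactly its scheduled time.
  return { lat: trip[0].lat, lng: trip[0].lng };
}

console.log(estimatePosition(sampleTrip, 605));
// lat 13.25, lng 80.25 (halfway between A and B)
```

This draws the train on a straight line between stations and ignores the actual track geometry in the KML file, so it is only a first approximation.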

Obsession

I have been observing a pattern in my life over the past few months. I am obsessed with something in the evenings and in my free time. It was books for a month, Far Cry 3 for another, and has recently turned into chess.

I am trying to understand the underlying factor which is responsible for this behaviour. After reading through some pages about the impact of games on the human brain, watching TED talks like Jane McGonigal: Gaming can make a better world and assuring myself that I am not really going crazy, I think I have a plausible answer.

Like all young people I need to have that sense of achievement.

Being an introvert, the above explanation makes a lot of sense. I am not uploading pics to Facebook, and I am not tweeting even an average of 1 tweet/day – other things that could keep me filled with the achievement and appreciation factor I am looking for.

Obsession Hacking

The word hacking is being used in a lot of places where it means “modification” or “change” or “tweak”. I am trying to use it for channeling my obsession into something that could be productive – as in work – as well as supply me the required achievement factor. One activity which I know could do that is – Coding.

Taking a look at what I have done in 2014:

[Image: github_dismal]

I think I would do what John Resig recommends – Write Code Everyday, starting from today December 1, 2014. Let me see how far the obsession hacking goes.

Update: December 20, 2014
Well. This isn’t as simple as it seemed. Gaming, reading books, chess – all have been entertaining and relaxing, because they are consumption of content. But coding is production of content, and hence has proved to be a much more difficult and straining task. I haven’t been able to get to coding at all. The experiment so far has been a big failure.