Compressed format


By default, the serialized JSON index uses a format that is compatible with the elasticlunr JavaScript library. For example, you can save the index as JSON in Java and load the same index in JavaScript. A much more compact format for the index is also supported.

Compressed format overview

This compressed format is described in a pull request on the elasticlunr project.

The following sample code exercises the format. It uses example_index.json from the demo website:

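      // "." resolves to the package in the current directory (here, an
      // elasticlunr checkout that supports the compressed format).
      const elasticlunr = require(".");
      const index = elasticlunr.Index.load(require("./example_index.json"));
      const uncompressed = JSON.stringify(index.toJSON());
      // Passing true selects the compressed format.
      const compressed = JSON.stringify(index.toJSON(true));

      // String length is used here as an approximation of the size in bytes.
      console.log("Uncompressed bytes: " + uncompressed.length);
      // 842369 bytes
      console.log("Compressed bytes: " + compressed.length);
      // 305818 bytes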

To help illustrate the changes, here's a sample index with the word "the" for a single document:

      {"docs":{},"df":0,"t":{"docs":{},"df":0,"h":{"docs":{},"df":0,
      "e":{"docs":{"12345":{"tf":1.1234567890123456}}}}}}

A large part of the reduction comes simply from removing empty docs properties and computing df on load instead of storing it. That alone got the size down to about 400k:

      {"t":{"h":{"e":{"docs":{"12345":{"tf":1.1234567890123456}}}}}}
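
As a rough illustration of this first step, the sketch below strips empty docs and df on save and rebuilds them on load. The function names `stripEmpty` and `restoreDefaults` are hypothetical and are not taken from the library. The sketch also assumes that df is simply the number of entries in docs, which matches the sample above but is not stated explicitly here.

```javascript
// Hypothetical helpers for illustration; not the library's actual code.

// Save side: drop "df" everywhere and drop "docs" when it is empty.
function stripEmpty(node) {
  const out = {};
  for (const [key, value] of Object.entries(node)) {
    if (key === "df") continue; // recomputed on load
    if (key === "docs") {
      if (Object.keys(value).length > 0) out.docs = value;
      continue;
    }
    out[key] = stripEmpty(value); // any other key is a child node
  }
  return out;
}

// Load side: put back an empty "docs" where missing and recompute "df".
// Assumption: df equals the number of documents listed in "docs".
function restoreDefaults(node) {
  const docs = node.docs || {};
  const out = { docs: docs, df: Object.keys(docs).length };
  for (const [key, value] of Object.entries(node)) {
    if (key === "docs" || key === "df") continue;
    out[key] = restoreDefaults(value);
  }
  return out;
}
```

Under that assumption, `restoreDefaults(stripEmpty(tree))` gives back a tree equivalent to the original one, so nothing is lost by leaving these properties out of the serialized form.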

Floating-point numbers are then trimmed to 8 decimal places. This loses some precision, but probably not enough to matter much:

      {"t":{"h":{"e":{"docs":{"12345":{"tf":1.12345678}}}}}}
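
One possible way to trim a value to 8 decimal places is sketched below. The name `trimTf` is hypothetical, and truncation (rather than rounding) is an assumption made here because it reproduces the sample value 1.12345678; the text above does not say which one the library uses.

```javascript
// Hypothetical helper; truncation instead of rounding is an assumption.
const TF_SCALE = 1e8; // keeps 8 digits after the decimal point

function trimTf(value) {
  // Shift the wanted digits left of the decimal point, drop the rest,
  // then shift back. Binary floating point can make the last kept
  // digit differ by one for some inputs.
  return Math.trunc(value * TF_SCALE) / TF_SCALE;
}

console.log(trimTf(1.1234567890123456)); // 1.12345678
```

Shorter numbers serialize to shorter strings, which is where the size reduction comes from.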

The tf property is then pulled out of its object, since it is the only property in that object. This removes a few more bytes:

      {"t":{"h":{"e":{"docs":{"12345":1.12345678}}}}}
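
A minimal sketch of this step, in both directions, could look as follows. The names `unwrapTf` and `wrapTf` are hypothetical; the only thing taken from the text is that each docs entry holds a single tf property.

```javascript
// Hypothetical helpers for illustration; not the library's actual code.

// Save side: {"12345": {"tf": 1.5}} becomes {"12345": 1.5}.
function unwrapTf(docs) {
  const out = {};
  for (const [ref, info] of Object.entries(docs)) {
    out[ref] = info.tf;
  }
  return out;
}

// Load side: a bare number is wrapped again; an entry that is already
// an object (regular format) is kept as it is.
function wrapTf(docs) {
  const out = {};
  for (const [ref, value] of Object.entries(docs)) {
    out[ref] = typeof value === "number" ? { tf: value } : value;
  }
  return out;
}

console.log(JSON.stringify(unwrapTf({ "12345": { tf: 1.12345678 } })));
// {"12345":1.12345678}
```

Checking the type of each entry on load is what lets a reader accept both styles in the same docs object.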

Finally, successive index nodes are combined into a single key when there is only one possible path through them:

      {"the":{"docs":{"12345":1.12345678}}}
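
The merging step can be sketched as below, under the assumption that it runs on a tree where df and empty docs have already been removed. The name `mergeChains` is hypothetical. The sketch treats "docs" as a reserved key and does not address indexed words whose merged key would itself be "docs" or "df"; how the actual implementation deals with that is not covered here.

```javascript
// Hypothetical helper for illustration; not the library's actual code.
// Assumes "df" and empty "docs" properties were removed beforehand.
function mergeChains(node) {
  const out = {};
  for (const [key, child] of Object.entries(node)) {
    if (key === "docs") {
      out.docs = child;
      continue;
    }
    let label = key;
    let current = child;
    // Keep following the chain while the node holds no documents
    // and has exactly one child.
    while (true) {
      const keys = Object.keys(current);
      if (keys.length !== 1 || keys[0] === "docs") break;
      label += keys[0];
      current = current[keys[0]];
    }
    out[label] = mergeChains(current);
  }
  return out;
}

const tree = { t: { h: { e: { docs: { "12345": 1.12345678 } } } } };
console.log(JSON.stringify(mergeChains(tree)));
// {"the":{"docs":{"12345":1.12345678}}}
```

A node that holds documents or has several children ends the chain, so the words "the" and "then" would share the key "the" and then branch into "n".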

Together, these steps bring the size down dramatically. The loading function has been modified to handle either the compressed format or the regular one, and you can safely mix and match either style starting anywhere in the index tree.
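
To make the "mix and match" behaviour concrete, here is a minimal sketch of a loader that accepts either style at any node and produces the regular structure. The name `expandNode` is hypothetical and this is not the library's loading code. It assumes that df equals the number of entries in docs, that "docs" and "df" are reserved keys, and that sibling keys at the same level never start with the same character.

```javascript
// Hypothetical loader sketch; not the library's actual code.
function expandNode(node) {
  const out = { docs: {}, df: 0 };
  for (const [key, value] of Object.entries(node)) {
    if (key === "df") continue; // recomputed below
    if (key === "docs") {
      for (const [ref, info] of Object.entries(value)) {
        // A bare number is a compressed tf value.
        out.docs[ref] = typeof info === "number" ? { tf: info } : info;
      }
      continue;
    }
    // A multi-character key stands for a chain of single-character
    // nodes; a single-character key is just a chain of length one.
    const chars = key.split("");
    let parent = out;
    for (const ch of chars.slice(0, -1)) {
      parent[ch] = { docs: {}, df: 0 };
      parent = parent[ch];
    }
    parent[chars[chars.length - 1]] = expandNode(value);
  }
  // Assumption: df is the number of documents listed at this node.
  out.df = Object.keys(out.docs).length;
  return out;
}

const compressedTree = { the: { docs: { "12345": 1.12345678 } } };
console.log(JSON.stringify(expandNode(compressedTree)));
```

Because every decision is made per key and per entry, a tree in the regular format passes through unchanged, and a tree that is compressed only in some branches is expanded only there.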

Compressed format compatibility

The compressed format is not compatible with the regular elasticlunr JavaScript library. If you only use the library from Java, this is not a problem.

A modified version of the JavaScript library, which reads both this compressed format and the regular uncompressed format, is provided in this project under the js/compressed folder. The associated elasticlunr fork is available at hervegirod/elasticlunr.js.

Copyright 2017 Wei Song. Copyright 2018 Herve Girod. All Rights Reserved. Documentation and source under the MIT licence