
Pipeline


    1  General pipeline
    2  Built-in language-specific pipeline
       2.1  Associated built-in functions
          2.1.1  Metaphone function
          2.1.2  Remove accents function
       2.2  Example
    3  Creating a custom pipeline
    4  Default pipeline
    5  Notes

Adding a document to the index runs the Pipeline on the content of each field in the document. A Pipeline is an ordered list of Pipeline.Functions, each of which has only one mandatory method:

      public String run(String token);

The output of each function in the pipeline is passed to the next function in the pipeline. To exclude a token from the index, a function should return null; the rest of the pipeline will then not be called with this token.
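The chaining rule described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the library's code: `TokenFunction`, `SketchPipeline` and `process` are hypothetical names standing in for Pipeline.Function and Pipeline. Only the run(String) signature and the null convention come from this page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a pipeline function; only run(String) is taken from this page.
interface TokenFunction {
    String run(String token);
}

// Hypothetical minimal pipeline: an ordered list of functions applied one after another.
class SketchPipeline {
    private final List<TokenFunction> functions = new ArrayList<>();

    void addFunction(TokenFunction function) {
        functions.add(function);
    }

    // Returns the transformed token, or null if one of the functions discarded it.
    String process(String token) {
        String current = token;
        for (TokenFunction function : functions) {
            current = function.run(current);
            if (current == null) {
                return null; // the remaining functions are not called for this token
            }
        }
        return current;
    }
}
```

In this sketch a function that returns null ends the processing of the token immediately, so later functions never have to handle a null input.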

General pipeline

In general[1], a Pipeline can contain functions such as a Trimmer, a StopWordFilter and a Stemmer. However, extra functions can be added to the Pipeline as needed. For example, it is possible to add a Metaphoning processor to the functions.

Note that the StopWordFilter and the Stemmer usually depend on the language.

Built-in language-specific pipeline

Passing a language id to the index at creation selects a Pipeline configured for that language. Only some languages are supported for the moment; the examples on this page use "en" (English) and "fr" (French).

Associated built-in functions

The built-in language-specific pipelines contain functions customized for the specified language, such as the StopWordFilter and the Stemmer.

Metaphone function

The Metaphone function implements a Metaphoning processor. Metaphoning indexes words by their pronunciation. The Metaphone processor depends on the language.

Remove accents function

The Remove accents function simply removes all accents from the text.
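One common way to remove accents in Java is Unicode decomposition followed by dropping the combining marks. The sketch below shows that technique only; `AccentRemover` is a hypothetical class, and this page does not say how the built-in function is actually implemented.

```java
import java.text.Normalizer;

// Hypothetical helper illustrating accent removal; not the library's implementation.
class AccentRemover {
    // Decomposes accented characters (NFD form), then drops the combining marks.
    public String run(String token) {
        String decomposed = Normalizer.normalize(token, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }
}
```

Characters that have no decomposition into a base letter plus a combining mark are left unchanged by this approach.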

Example

For example, to create a French Index:

      Index index = ElasticLunr.createIndex("fr");

Or, to create an English Index with Metaphoning:

      Index index = ElasticLunr.createIndex("en", PipelineFactory.METAPHONE_ENABLED);

Creating a custom pipeline

You can create a custom Pipeline and add functions to it with addFunction(Function). A Pipeline Function must implement only the run(String) method. This method returns the transformed token[7].

The Pipeline can be registered with a key to be used later in the index by calling the addPipeline(String, Pipeline) method of the PipelineFactory class. For example:

      // Build the Pipeline; it is an ordered list of functions
      Pipeline pipeline = new Pipeline();
      pipeline.addFunction(new Trimmer());
      pipeline.addFunction(new MyStopWordFilter());
      pipeline.addFunction(new MyStemmer());

      // Register the Pipeline under a key
      PipelineFactory factory = PipelineFactory.getInstance();
      factory.addPipeline("myKey", pipeline);

      // Create an index that uses the registered Pipeline
      Index index = ElasticLunr.createIndex("myKey");
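The example above refers to a MyStopWordFilter class without showing it. A minimal sketch of what such a function could look like follows. It makes several assumptions: `TokenFunction` is a hypothetical stand-in for the library's function type so that the block compiles on its own, and the stop-word list is a tiny illustrative one.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for the library's function type; only run(String) comes from this page.
interface TokenFunction {
    String run(String token);
}

// Illustrative stop-word filter: returns null for listed words so that they are discarded.
class MyStopWordFilter implements TokenFunction {
    // Tiny illustrative list, not the library's stop-word list.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the"));

    @Override
    public String run(String token) {
        // Compare case-insensitively, but return the token unchanged when it is kept.
        return STOP_WORDS.contains(token.toLowerCase(Locale.ROOT)) ? null : token;
    }
}
```

In real code the class would implement the library's own function type instead of the stand-in interface.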

Default pipeline

When creating an Index without an argument, an English-language Pipeline is created by default. So the following code:

      Index index = ElasticLunr.createIndex();

is equivalent to:

      Index index = ElasticLunr.createIndex("en");

Notes

  1. ^ Though it is by no means mandatory
  2. ^ For example, words like a, the, etc. in English
  3. ^ A Stemmer usually implements the Porter Stemming Algorithm.
  4. ^ When creating an index without specifying a language, the English language will be used for the Pipeline
  5. ^ For example, words like a, the, etc. in English
  6. ^ A Stemmer usually implements the Porter Stemming Algorithm.
  7. ^ Returning null means that the token is discarded and will not be passed to the next function


Copyright 2017 Wei Song. Copyright 2018 Herve Girod. All Rights Reserved. Documentation and source under the MIT licence