Adding a document to the index calls the Pipeline on the content of each field in the document. A Pipeline is an ordered list of Pipeline.Function objects, which have only one mandatory method:
public String run(String token);
The output of each function in the pipeline is passed to the next function in the pipeline. To exclude a token from the index, a function should return null; the rest of the pipeline will then not be called with this token.
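The chaining and null short-circuit described above can be sketched as follows. This is a minimal illustration, not the library's code: `PipelineSketch` and its `add` method are hypothetical names, and `UnaryOperator<String>` stands in for `Pipeline.Function`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for the library's Pipeline, for illustration only.
final class PipelineSketch {
    private final List<UnaryOperator<String>> functions = new ArrayList<>();

    // Appends a function; functions run in the order they were added.
    PipelineSketch add(UnaryOperator<String> function) {
        functions.add(function);
        return this;
    }

    // Passes the token through every function in order.
    // As soon as one function returns null, the token is dropped
    // and the remaining functions are not called.
    String run(String token) {
        String current = token;
        for (UnaryOperator<String> function : functions) {
            current = function.apply(current);
            if (current == null) {
                return null;
            }
        }
        return current;
    }
}
```

With a trimming step followed by a step that returns null for "the", the token " the " never reaches any later step, while " cat " passes through all of them.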
General pipeline
In general, though it is by no means mandatory, a Pipeline can contain the following functions:
- A trimmer which will trim the words in the text
- A StopWordFilter which will remove stop words (for example, words like "a" or "the" in English)
- A Stemmer which will transform words into their root form
However, extra functions can be added to the Pipeline as needed. For example, it is possible to add a Metaphoning processor to the functions.
Note that the StopWordFilter and the Stemmer usually depend on the language.
Built-in language-specific pipeline
By passing a language id to the index at creation, a Pipeline configured for this language will be chosen. For the moment, the following languages are supported:
- "en": English (default); when an index is created without specifying a language, English is used for the Pipeline
- "fr": French language
- "de": German language
- "es": Spanish language
Associated built-in functions
The built-in language-specific pipelines contain the following functions customized for the specified language:
- A generic trimmer which will trim the words in the text
- A language-specific StopWordFilter which will remove stop words (for example, words like "a" or "the" in English)
- A language-specific Stemmer which will transform words into their root form
- A generic Metaphone function, if Metaphoning is enabled for the index
- A generic Remove accents function, if removing accents is enabled for the index
The Metaphone function implements a Metaphoning processor. Metaphoning indexes words by their pronunciation. The Metaphone processor depends on the language.
Remove accents function
The Remove accents function simply removes all accents from the text.
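One common way to strip accents in Java is to decompose the text with `java.text.Normalizer` and then delete the combining marks. The sketch below shows that technique; it is an assumption about how such a function could work, not the library's actual implementation, and `RemoveAccentsSketch` is a hypothetical name.

```java
import java.text.Normalizer;

// Hypothetical accent-removal step, for illustration only.
final class RemoveAccentsSketch {
    static String run(String token) {
        if (token == null) {
            return null;
        }
        // NFD splits an accented letter into its base letter plus combining marks.
        String decomposed = Normalizer.normalize(token, Normalizer.Form.NFD);
        // \p{M} matches Unicode combining marks, which are removed here.
        return decomposed.replaceAll("\\p{M}+", "");
    }
}
```

Letters that have no decomposed form (for example "ø") are left unchanged by this approach.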
Example
For example, to create a French Index:
Index index = ElasticLunr.createIndex("fr");
Or, to create an English Index with Metaphoning:
Index index = ElasticLunr.createIndex("en", PipelineFactory.METAPHONE_ENABLED);
Creating a custom pipeline
You can create a custom Pipeline and add functions to it by calling addFunction(Function). A Pipeline Function only needs to implement the run(String) method. This method returns the transformed token; returning null means that the token is discarded and will not be passed to the next function.
The Pipeline can be registered with a key, to be used later in the index, by calling the addPipeline(String, Pipeline) method of the PipelineFactory class. For example:
Pipeline pipeline = new Pipeline();
pipeline.addFunction(new Trimmer());
pipeline.addFunction(new MyStopWordFilter());
pipeline.addFunction(new MyStemmer());
PipelineFactory factory = PipelineFactory.getInstance();
factory.addPipeline("myKey", pipeline);
Index index = ElasticLunr.createIndex("myKey");
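The example above uses MyStopWordFilter without showing it. A custom function of that kind might look like the sketch below. To keep the sketch compilable on its own, it declares a stand-in interface `TokenFunction` with the same `run(String)` method; in real code the class would implement the library's Pipeline.Function instead. The class name and the word list are illustrative assumptions.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Stand-in for the library's Pipeline.Function, declared only so this sketch compiles alone.
interface TokenFunction {
    String run(String token);
}

// Hypothetical stop-word filter; the word list is illustrative, not the library's.
final class MyStopWordFilterSketch implements TokenFunction {
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "of"));

    @Override
    public String run(String token) {
        // Returning null discards the token; any other value is passed on unchanged.
        boolean isStopWord = STOP_WORDS.contains(token.toLowerCase(Locale.ROOT));
        return isStopWord ? null : token;
    }
}
```

The comparison is case-insensitive here so that "The" and "the" are both discarded; whether to do this depends on where the filter sits relative to any lower-casing step in the pipeline.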
Default pipeline
When an Index is created without arguments, an English-language Pipeline is used by default. So the following code:
Index index = ElasticLunr.createIndex();
is equivalent to:
Index index = ElasticLunr.createIndex("en");