cbElasticsearch, the Elasticsearch module for the ColdBox platform, announced a major release this week with version 2.0.0. Version 2.0.0 is a major rewrite of the core Elasticsearch integration: it replaces the previous Java-based JEST client implementation with a native CFML implementation and adds the Hyper HTTP module as a dependency. In addition, support for Elasticsearch server versions earlier than 6.5 has been removed, and official support for all 6.x versions has ended.
Major features in this release include the ability to use Elasticsearch ingest pipelines to pre-process documents, and support for the media ingest capabilities, which allow text extraction from PDFs, Microsoft Word files, and all text-based document formats supported by the Apache Tika processor. More information on using the Ingest Attachment Plugin may be found in the official Elasticsearch documentation.
Creating Pipelines
Let's say we want to automatically set a field on a document when we save it. We can add a processor on the ingest of documents like so:
var myPipeline = getInstance( "Pipeline@cbelasticsearch" ).new( {
    "id" : "foo-pipeline",
    "description" : "A test pipeline",
    "version" : 1,
    "processors" : [
        {
            "set" : {
                "if" : "ctx.foo == null",
                "field" : "foo",
                "value" : "bar"
            }
        }
    ]
} );
With this pipeline, if a value of foo is not defined in the inbound document (note that ctx is the document reference in the if conditional), then the value of that field will automatically be set to 'bar'.
We can save/apply this pipeline in one of two ways.
Through the pipeline object:
myPipeline.save();
Or through the client:
getInstance( "Client@cbElasticsearch" ).applyPipeline( myPipeline );
Full CRUD operations on pipelines are also supported. For more information see the official documentation at OrtusBooks.
Using Pipelines When Saving Documents
Pipelines may be used when saving individual or multiple documents. See the Documents section for more information on document creation.
To save an individual document, with pipeline processing:
myDocument.setPipeline( 'foo-pipeline' ).save();
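Putting the pieces together, a minimal end-to-end sketch might look like the following. The index name my-index and the title property are hypothetical, and the index and properties arguments to the Document object's new() method are an assumption based on the module's Documents section rather than something shown in this post.

```cfml
// Sketch only: assumes this runs inside a ColdBox handler or model
// where getInstance() is available and cbElasticsearch 2.x is installed.
function indexWithPipeline(){
    // 1. Define and register the pipeline from the example above.
    var myPipeline = getInstance( "Pipeline@cbelasticsearch" ).new( {
        "id"          : "foo-pipeline",
        "description" : "A test pipeline",
        "version"     : 1,
        "processors"  : [
            { "set" : { "if" : "ctx.foo == null", "field" : "foo", "value" : "bar" } }
        ]
    } );
    myPipeline.save();

    // 2. Build a document that has no "foo" key.
    //    ASSUMPTION: new() accepts index and properties arguments;
    //    "my-index" and "title" are hypothetical names.
    var myDocument = getInstance( "Document@cbElasticsearch" ).new(
        index      = "my-index",
        properties = { "title" : "Hello pipelines" }
    );

    // 3. Save through the pipeline. Because "foo" is missing, the set
    //    processor should add foo = "bar" before the document is indexed.
    myDocument.setPipeline( "foo-pipeline" ).save();
}
```

A document that already carries a foo value would pass through the same pipeline unchanged, since the if condition only matches when ctx.foo is null.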
For multiple documents, the pipeline may be set on each document prior to the saveAll call. Note, however, that all documents provided in the bulk save must share the same pipeline, as Elasticsearch does not support multiple pipelines in bulk saves. Attempting to save multiple documents with different pipelines will throw an error. Alternatively, you may pass the pipeline in as a param to the saveAll call:
getInstance( "Client@cbElasticsearch" ).saveAll( documents=myDocuments, params={ "pipeline" : "foo-pipeline" } );
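As a rough illustration of the bulk case, the sketch below turns an array of plain structs into documents and saves them in one call with a shared pipeline. The function name, the records argument, and the my-index index are hypothetical, and the new() arguments on the Document object are again an assumption not confirmed in this post.

```cfml
// Sketch only: assumes a ColdBox context where getInstance() is available.
function bulkIndexWithPipeline( required array records ){
    var myDocuments = [];

    // Wrap each plain struct in a cbElasticsearch Document object.
    // ASSUMPTION: new() accepts index and properties arguments;
    // "my-index" is a hypothetical index name.
    for ( var record in arguments.records ) {
        arrayAppend(
            myDocuments,
            getInstance( "Document@cbElasticsearch" ).new(
                index      = "my-index",
                properties = record
            )
        );
    }

    // Apply one pipeline to the whole batch via the params struct,
    // so no individual document needs its own setPipeline() call.
    return getInstance( "Client@cbElasticsearch" ).saveAll(
        documents = myDocuments,
        params    = { "pipeline" : "foo-pipeline" }
    );
}
```

Passing the pipeline once in params avoids the mixed-pipeline error described above, because no document carries its own, possibly different, pipeline setting.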
What's New
- Converts default native client to HyperClient ( native CFML implementation )
- Moves previous native JEST Client to the cbelasticsearch-jest module
- Adds cbElasticsearchPreSave and cbElasticsearchPostSave interceptions when saving individual or bulk documents
- Adds the ability to create, update, read, and delete Elasticsearch pipelines
- Adds the ability to configure a pipeline for document processing ( e.g. myDocument.setPipeline( 'my-pipeline' ) )
- Adds the ability to add save query parameters when saving individual documents ( e.g. myDocument.addParam( 'refresh', true ) )
- Adds the ability to pass a struct of params to bulk save operations ( e.g. client.saveAll( documents, false, { "refresh" : true } ) )
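To give a feel for the new interception points, here is a hedged sketch of a ColdBox interceptor that listens for both events. The component name and log file name are hypothetical, the function arguments follow the conventional ColdBox interceptor signature, and because this post does not describe what the intercept data contains, the sketch only logs its key names instead of reading specific values.

```cfml
// interceptors/ElasticsearchAudit.cfc (hypothetical name and location)
component {

    // Announced before an individual or bulk save.
    function cbElasticsearchPreSave( event, interceptData, buffer, rc, prc ){
        // ASSUMPTION: interceptData is a struct; its keys are not listed
        // in this post, so we only record which keys were provided.
        writeLog(
            text = "cbElasticsearch pre-save, data keys: " & structKeyList( arguments.interceptData ),
            type = "information",
            file = "cbelasticsearch-audit"
        );
    }

    // Announced after an individual or bulk save.
    function cbElasticsearchPostSave( event, interceptData, buffer, rc, prc ){
        writeLog(
            text = "cbElasticsearch post-save, data keys: " & structKeyList( arguments.interceptData ),
            type = "information",
            file = "cbelasticsearch-audit"
        );
    }

}
```

The component would still need to be registered in the application's interceptor configuration before either method is called.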
Breaking Changes
- Removes the deleteMapping method in the main client, as it is no longer supported in ES versions 6.5 and up
- Removes support for Adobe ColdFusion 11
- Removes support for Lucee 4.x
- Ends official support for 6.x versions of Elasticsearch
Wrapping Up
cbElasticsearch v2.0 is a major advance over previous releases and moves the module to a native CFML implementation. Despite its breaking changes, version 2.0.0 loads faster, performs better, and adds tools, such as ingest pipelines and media ingest, for extending search to the contents of documents.