
Process large amounts of Elasticsearch data using TIBCO ActiveMatrix BusinessWorks 5


Sometimes you want to conduct a longitudinal study of patterns in your Elasticsearch data, analyzing the entire event stream matching your criteria. The scroll API provides a mechanism for asking Elasticsearch for every last entry matching a query and then retrieving the results in chunks that together represent the entire set of matching records.


The following is an excerpt from the Elastic webpage that explains the API:

While a search request returns a single “page” of results, the Elasticsearch scroll API can be used to retrieve large numbers of results (or even all results) from a single search request, in much the same way as you would use a cursor on a traditional database.
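The cursor-like flow the excerpt describes can be sketched in plain Python. This is a minimal sketch, not the BusinessWorks process itself: the two HTTP calls mirror what the BW REST plugin would issue. The cluster URL, index name, and page size are assumptions, and the HTTP function is injectable so the loop can be exercised without a running cluster.

```python
import json
import urllib.request

# Assumption: a local cluster at the default port.
ES_URL = "http://localhost:9200"

def _post(path, body):
    """POST a JSON body to Elasticsearch and return the decoded JSON response."""
    req = urllib.request.Request(
        ES_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def scroll_all(index, query, page_size=1000, keep_alive="1m", post=_post):
    """Yield every hit matching `query`, fetching one page per round trip."""
    # The first request opens the scroll context and returns the first page.
    page = post(f"/{index}/_search?scroll={keep_alive}",
                {"size": page_size, "query": query})
    scroll_id = page["_scroll_id"]
    hits = page["hits"]["hits"]
    while hits:
        yield from hits
        # Each follow-up passes the scroll id back and refreshes the
        # keep-alive window; an empty page means the scroll is exhausted.
        page = post("/_search/scroll",
                    {"scroll": keep_alive, "scroll_id": scroll_id})
        scroll_id = page["_scroll_id"]
        hits = page["hits"]["hits"]
```

Calling `scroll_all("my-index", {"match_all": {}})` would then iterate over every document in the (hypothetical) index, which is the same loop the BW ProcessDefinition runs with "Write To Log" as the consumer.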


On my GitHub page you can find my TIBCO ActiveMatrix BusinessWorks 5 ProcessDefinition (see the bottom of this post for an image of the definition), including the necessary XSDs. Note that I've used the REST and JSON plugin to make working with JSON in BW easier. In my process, I'm simply logging the results of the scrolled searches to a file (using the standard "Write To Log" activity).


In a real-world scenario, you would want BusinessWorks to pipe the data to a stream-processing application built with something like Kafka Streams or Apache Spark. As a basis, I used patterns found in the Elasticsearch client helpers for the Python and Java programming languages.
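To make the "pipe to a stream processor" idea concrete, here is a small sketch of the fan-out step. The `producer` here is only a stand-in for a real message producer (with kafka-python this would be a `KafkaProducer` and its `send(topic, value)` method); the topic name is an assumption, and any iterable of scrolled hits can be plugged in.

```python
# Sketch: forward each scrolled Elasticsearch hit to a downstream
# stream processor. `producer` is any object with send(topic, value),
# standing in for e.g. kafka-python's KafkaProducer.

def pipe_hits(hits, producer, topic="es-events"):
    """Send each hit's _source document to `topic`; return the count sent."""
    count = 0
    for hit in hits:
        # Only the document body is forwarded; metadata like _id could
        # be included here if the consumer needs it.
        producer.send(topic, hit["_source"])
        count += 1
    return count
```

Wired together with the scroll loop, this is the shape of the pipeline: scroll in pages on one side, emit one message per document on the other.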

If you’re solely interested in reindexing documents from one index to another, take a look at elasticdump at hub.docker.com.
Joshua Moesa
