Storm and Trident integration for Apache Solr. This package includes a bolt and a trident state that enable a Storm topology stream the contents of storm tuples to index Solr collections.
The The bolt and trident state provided use one of the supplied mappers to build a SolrRequest
object that is
responsible for making the update calls to Solr, thus updating the index of the collection specified.
In this section we provide some simple code snippets on how to build Storm and Trident topologies to index Solr. In subsequent sections we
describe in detail the two key components of the Storm Solr integration, the SolrUpdateBolt
, and the Mappers
, SolrFieldsMapper
, and SolrJsonMapper
.
new SolrUpdateBolt(solrConfig, solrMapper, solrCommitStgy)
// zkHostString for Solr 'gettingstarted' example
SolrConfig solrConfig = new SolrConfig("127.0.0.1:9983");
// JSON Mapper used to generate 'SolrRequest' requests to update the "gettingstarted" Solr collection with JSON content declared the tuple field with name "JSON"
SolrMapper solrMapper = new SolrJsonMapper.Builder("gettingstarted", "JSON").build();
// Acks every other five tuples. Setting to null acks every tuple
SolrCommitStrategy solrCommitStgy = new CountBasedCommit(5);
new SolrStateFactory(solrConfig, solrMapper);
// zkHostString for Solr 'gettingstarted' example
SolrConfig solrConfig = new SolrConfig("127.0.0.1:9983");
/* Solr Fields Mapper used to generate 'SolrRequest' requests to update the "gettingstarted" Solr collection. The Solr index is updated using the field values of the tuple fields that match static or dynamic fields declared in the schema object build using schemaBuilder */
SolrMapper solrMapper = new SolrFieldsMapper.Builder(schemaBuilder, "gettingstarted").build();
// builds the Schema object from the JSON representation of the schema as returned by the URL http://localhost:8983/solr/gettingstarted/schema/
SchemaBuilder schemaBuilder = new RestJsonSchemaBuilder("localhost", "8983", "gettingstarted")
SolrUpdateBolt
streams tuples directly into Apache Solr. The Solr index is updated usingSolrRequest
requests.
The SolrUpdateBolt
is configurable using implementations of SolrConfig
, SolrMapper
, and optionally SolrCommitStrategy
.
The data to stream onto Solr is extracted from the tuples using the strategy defined in the SolrMapper
implementation.
The SolrRquest
can be sent every tuple, or according to a strategy defined by SolrCommitStrategy
implementations.
If a SolrCommitStrategy
is in place and one of the tuples in the batch fails, the batch is not committed, and all the tuples in that
batch are marked as Fail, and retried. On the other hand, if all tuples succeed, the SolrRequest
is committed and all tuples are successfully acked.
SolrConfig
is the class containing Solr configuration to be made available to Storm Solr bolts. Any configuration needed in the bolts should be put in this class.
SorlMapper
implementations define the strategy to extract information from the tuples. The public method
toSolrRequest
receives a tuple or a list of tuples and returns a SolrRequest
object that is used to update the Solr index.
The SolrJsonMapper
creates a Solr update request that is sent to the URL endpoint defined by Solr as the resource
destination for requests in JSON format.
To create a SolrJsonMapper
the client must specify the name of the collection to update as well as the
tuple field that contains the JSON object used to update the Solr index. If the tuple does not contain the field specified,
a SolrMapperException
will be thrown when the method toSolrRequest
is called. If the field exists, its value can either
be a String with the contents in JSON format, or a Java object that will be serialized to JSON
Code snippet illustrating how to create a SolrJsonMapper
object to update the gettingstarted
Solr collection with JSON content
declared in the tuple field with name "JSON"
SolrMapper solrMapper = new SolrJsonMapper.Builder("gettingstarted", "JSON").build();
The SolrFieldsMapper
creates a Solr update request that is sent to the Solr URL endpoint that handles the updates of SolrInputDocument
objects.
To create a SolrFieldsMapper
the client must specify the name of the collection to update as well as the SolrSchemaBuilder
.
The Solr Schema
is used to extract information about the Solr schema fields and corresponding types. This metadata is used
to get the information from the tuples. Only tuple fields that match a static or dynamic Solr fields are added to the document. Tuple fields
that do not match the schema are not added to the SolrInputDocument
being prepared for indexing. A debug log message is printed for the
tuple fields that do not match the schema and hence are not indexed.
The SolrFieldsMapper
supports multivalue fields. A multivalue tuple field must be tokenized. The default token is |. Any
arbitrary token can be specified by calling the method org.apache.storm.solr.mapper.SolrFieldsMapper.Builder.setMultiValueFieldToken
that is part of the SolrFieldsMapper.Builder
builder class.
Code snippet illustrating how to create a SolrFieldsMapper
object to update the gettingstarted
Solr collection. The multivalue
field separates each value with the token % instead of the default | . To use the default token you can ommit the call to the method
setMultiValueFieldToken
.
new SolrFieldsMapper.Builder(
new RestJsonSchemaBuilder("localhost", "8983", "gettingstarted"), "gettingstarted")
.setMultiValueFieldToken("%").build();
##Working with Kerberized Solr If your topology is going to interact with kerberized Solr, your bolts/states need to be authenticated by Solr Server. We can enable authentication by distributing keytabs for solr user on all worker hosts. We can configure the solr bolt to use keytabs by setting SolrConfig.enableKerberos config property.
On worker hosts the bolt/trident-state code will use the keytab file with principal provided in the jaas config to authenticate with Solr. You need to specify a Kerberos principal for the client and a corresponding keytab in the JAAS client configuration file. Also make sure the provided principal is configured with required permissions to access solr collections.
Here’s an example JAAS config:
SolrJClient { com.sun.security.auth.module.Krb5LoginModule required useKeyTab=true keyTab="/keytabs/solr.keytab" storeKey=true useTicketCache=true debug=true principal="[email protected]"; };
To be able to run the examples you must first build the java code in the package storm-solr
,
and then generate an uber jar with all the dependencies.
mvn clean install -f REPO_HOME/storm/external/storm-solr/pom.xml
Add the following to REPO_HOME/storm/external/storm-solr/pom.xml
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>2.4.1</version>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<transformers>
<transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
<mainClass>org.apache.storm.solr.topology.SolrJsonTopology</mainClass>
</transformer>
</transformers>
</configuration>
</execution>
</executions>
</plugin>
create the uber jar by running the commmand:
mvn package -f REPO_HOME/storm/external/storm-solr/pom.xml
This will create the uber jar file with the name and location matching the following pattern:
REPO_HOME/storm/external/storm/target/storm-solr-0.11.0-SNAPSHOT.jar
Copy the file REPO_HOME/storm/external/storm-solr/target/storm-solr-0.11.0-SNAPSHOT.jar
to STORM_HOME/extlib
The code examples provided require that you first run the Solr gettingstarted example
STORM_HOME/bin/storm jar REPO_HOME/storm/external/storm-solr/target/storm-solr-0.11.0-SNAPSHOT-tests.jar org.apache.storm.solr.topology.SolrFieldsTopology
STORM_HOME/bin/storm jar REPO_HOME/storm/external/storm-solr/target/storm-solr-0.11.0-SNAPSHOT-tests.jar org.apache.storm.solr.topology.SolrJsonTopology
STORM_HOME/bin/storm jar REPO_HOME/storm/external/storm-solr/target/storm-solr-0.11.0-SNAPSHOT-tests.jar org.apache.storm.solr.trident.SolrFieldsTridentTopology
STORM_HOME/bin/storm jar REPO_HOME/storm/external/storm-solr/target/storm-solr-0.11.0-SNAPSHOT-tests.jar org.apache.storm.solr.trident.SolrJsonTridentTopology
The aforementioned Storm and Trident topologies index the Solr gettingstarted
collection with objects that have the following id
pattern:
*id_fields_test_val* for SolrFieldsTopology
and SolrFieldsTridentTopology
*json_test_val* for SolrJsonTopology
and SolrJsonTridentTopology
Querying Solr for these patterns, you will see the values that have been indexed by the Storm Apache Solr integration:
curl -X GET -H "Content-type:application/json" -H "Accept:application/json" http://localhost:8983/solr/gettingstarted_shard1_replica2/select?q=*id_fields_test_val*&wt=json&indent=true
curl -X GET -H "Content-type:application/json" -H "Accept:application/json" http://localhost:8983/solr/gettingstarted_shard1_replica2/select?q=*json_test_val*&wt=json&indent=true
You can also see the results by opening the Apache Solr UI and pasting the id
pattern in the q
textbox in the queries page
http://localhost:8983/solr/#/gettingstarted_shard1_replica2/query
Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
- Sriharsha Chintalapani ([email protected])
- P. Taylor Goetz ([email protected])