Embedded API raises exception during search after restart #18
Comments
I have a similar problem. After a restart, searching previously indexed documents with curl "http://localhost:20220/v1/indexes/idx/search?q=ipsum" returns the correct result. But if I try a more advanced search, such as curl "http://localhost:20220/v1/indexes/idx/search?q=ipsum&snippet=text", I get "Service unavailable" and the following exception in IndexTank: com.flaptor.indextank.api.IndexEngineApiException: java.lang.NullPointerException
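To reproduce both requests from code rather than curl, here is a minimal Java 11 sketch. It rebuilds the same two URLs (host, port 20220, index `idx`, query `ipsum` and snippet field `text` are taken from the curl commands above) and prints the HTTP status of each, so the plain query and the snippet query can be compared after a restart. The class name `SearchRepro` and its `searchUri` helper are hypothetical and not part of IndexTank.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

class SearchRepro {
    // Builds a URI for the /v1/indexes/<name>/search endpoint used in the curl commands.
    // Pass snippetField = null for the plain query that still works after a restart.
    static URI searchUri(String base, String index, String query, String snippetField) {
        StringBuilder sb = new StringBuilder(base)
            .append("/v1/indexes/")
            .append(URLEncoder.encode(index, StandardCharsets.UTF_8))
            .append("/search?q=")
            .append(URLEncoder.encode(query, StandardCharsets.UTF_8));
        if (snippetField != null) {
            sb.append("&snippet=").append(URLEncoder.encode(snippetField, StandardCharsets.UTF_8));
        }
        return URI.create(sb.toString());
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        String base = "http://localhost:20220";
        // First the plain query, then the same query with a snippet request.
        for (String snippet : new String[] {null, "text"}) {
            URI uri = searchUri(base, "idx", "ipsum", snippet);
            HttpResponse<String> resp = client.send(
                HttpRequest.newBuilder(uri).GET().build(),
                HttpResponse.BodyHandlers.ofString());
            System.out.println(resp.statusCode() + " " + uri);
        }
    }
}
```

Note that `URLEncoder` encodes spaces as `+`, which is the usual form for query-string parameters.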
Does anyone have a workaround for this issue? Reindexing the whole dataset after every restart is, well, a waste of resources, and it does not help availability.
I think I have tracked down the cause of this. The problem occurs because IndexTank expects the InMemoryStorage instance to be in a particular state after startup; however, depending on how the engine is bootstrapped, it may not have been initialized correctly. When starting an instance of the EmbeddedIndexEngine you MUST specify the parameter:
For example:
However, I did notice that if I had an index that was already in a "bad" state, providing this parameter resulted in a lot of other errors. It seems that if you start from scratch with this parameter, it's all good. Unfortunately, this seems like an incredibly flaky and unreliable situation: if for any reason the InMemoryStorage instance fails to load successfully, your whole index is basically useless. I'm still tracing through the code to work out how this can be made more robust. Of course, it may be that it IS a robust solution and I'm just missing the point.
Follow-up: the default implementation of the EmbeddedIndexEngine seems to allow only this InMemoryStorage instance. Snippet from EmbeddedIndexEngine:
I'm assuming IndexTank uses this document storage to keep a complete copy of each original document that was indexed, presumably because the underlying Lucene instance has been instructed only to index document fields and not to store them. Index-only would seem to be a sensible option; however, I would also assume that in almost all cases the user of IndexTank already has a document storage system and would not need IndexTank to manage this itself. Unfortunately, there also does not seem to be an easy way to instruct the engine NOT to use storage. Despite the snippet above, the engine also has this:
Of course, none of these other values will ever actually work because of the code in the first snippet. Confusing...
I need help with this issue. I have edited the file and added --load-state true, but when I start the service I receive an … I just need to start my index, add documents, stop it, start it again and be able to search the past documents.
Please keep in mind that IndexTank-engine was created as a way to make IndexTank easy to use when it was open-sourced. Originally it was part of IndexTank-service, where recovery was provided by the "LogStorage" (a component of the service that is absent from the stand-alone engine), and indexes were killed and respawned routinely by the "Nebu" component, transparently to the user. You can still use this setup if you want to venture in that direction. The stand-alone engine, however, needs to be fed all the documents again after a restart. The recovery time typically depends on the speed of the data source, since the engine can take documents much faster than a normal disk-based source can spew them. Unless you have a really large number of documents, it should take only a few seconds.
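Re-feeding the stand-alone engine after a restart could look roughly like the following Java 11 sketch. It assumes your application keeps its own copy of the documents (a hypothetical in-memory map stands in for that store here) and that the engine accepts an IndexTank-style `PUT /v1/indexes/<name>/docs` request whose body is a JSON array of `{"docid": ..., "fields": {...}}` objects; check that against the API of the engine version you actually run. The `Reindexer` class, the batch size of 100 and the single `text` field are illustrative choices, not part of the engine.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class Reindexer {
    // Minimal JSON string escaping: quotes, backslashes and control characters.
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // One {"docid": ..., "fields": {"text": ...}} object per document (assumed payload shape).
    static String docJson(String docid, String text) {
        return "{\"docid\":" + jsonString(docid) + ",\"fields\":{\"text\":" + jsonString(text) + "}}";
    }

    // Groups documents into JSON arrays holding at most batchSize documents each.
    static List<String> batches(Map<String, String> docs, int batchSize) {
        List<String> out = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (Map.Entry<String, String> e : docs.entrySet()) {
            current.add(docJson(e.getKey(), e.getValue()));
            if (current.size() == batchSize) {
                out.add("[" + String.join(",", current) + "]");
                current.clear();
            }
        }
        if (!current.isEmpty()) out.add("[" + String.join(",", current) + "]");
        return out;
    }

    // Sends every batch to the engine; intended to run once after each engine restart.
    static void reindex(String base, String index, Map<String, String> docs) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        URI uri = URI.create(base + "/v1/indexes/" + index + "/docs");
        for (String body : batches(docs, 100)) {
            HttpRequest req = HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(body))
                .build();
            HttpResponse<String> resp = client.send(req, HttpResponse.BodyHandlers.ofString());
            if (resp.statusCode() / 100 != 2) {
                throw new IllegalStateException("batch rejected: HTTP " + resp.statusCode());
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for the application's own document store.
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("doc1", "lorem ipsum dolor");
        docs.put("doc2", "sit amet");
        reindex("http://localhost:20220", "idx", docs);
    }
}
```

Sending documents in batches keeps the number of HTTP round trips low, so the recovery time stays dominated by how fast the source store can be read rather than by request overhead.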
If I start the embedded API and index some documents, I'm able to query them. If I stop and restart the embedded API and then query for any document that was previously in the index, the embedded API throws the IndextankException below. Searching for a term that wasn't previously in the index returns a correct JSON result with zero matches.
I am using the default sample-engine-config and running on OS X.
Is there something I'm doing wrong here? Do I have to do something to trigger a reload of the previously indexed documents?