A Study of Lucene (Mobile Version)

This project is a modified version of lucenestudy, it serves as an example of how you can use Lucene to index documents, search with sorting and paging, highlight search results, and provide search suggestions (also known as "autocomplete").

The sample application in this project works like a movie review search engine. A sample dataset is provided for testing the app.

Mobile version notes: This is a "mobile" version of lucenestudy that comes with the prebuilt JARs of Android Lucene. The source code is built as jar and the library can be utilized in Android project.

Sample Dataset

Subset of the Large Movie Review Dataset v1.0 prepared by Andrew L. Maas et al. as part of their 2011 paper "Learning Word Vectors for Sentiment Analysis" (PDF, BibTeX entry) is being used in this project. The original dataset is used for sentiment analysis research, and is selected from IMDb.

The subset contains 1,000 entries about 500 films, each two reviews for each film, one "positive" and one "negative" according to the Large Movie Review Dataset.

The original dataset contains two subsets, one "training" and one "testing". This is the standard division in machine learning and natural language processing. In each subset, each review takes one file, and there are separate listings of the source URL.

The sample subset here is from the "testing" dataset. I've converted the files into a JSON file in sample/acl-imdb-subset.json. The original dataset does not have the titles and the release years for the films reviewed. I've also collected them in the JSON file.

For more information about the original data set, please visit the Andrew Maas's website, on which you can find the download link to the dataset. A detailed README can also be found in the tarball.

How to Build and Run the App

To build the sample app, you'll need JDK 1.8 or above installed. The project uses Gradle, and I've included a Gradle wrapper that can bootstrap itself.

To build the app, simple run:

./gradlew build

This will compile the app, make the JAR, and also run the tests.

The built JAR is located at ./build/libs/lucenestudy.jar. Once you have the JAR, you can build the indices. Suppose you want to put the built indices under /tmp/testidx:

java -jar build/libs/lucenestudy.jar index sample/acl-imdb-subset.json /tmp/testidx

Then, to search for reviews containing the keyword "robot":

java -jar build/libs/lucenestudy.jar search /tmp/testidx robot

And to see the search suggestion in action:

java -jar build/libs/lucenestudy.jar suggest /tmp/testidx love

Boolean Searches

The sample app uses Lucene's query parser to process the query string. You can perform boolean searches with quote strings. For example, this searches for titles and reviews that contain the term "apocalypse" but not "now":

java -jar build/libs/lucenestudy.jar search /tmp/testidx "apocalypse -now"

Extending the App

This is nowhere near a comprehensive survey of Lucene. It is a big library that covers a wide range of use cases. Pull requests that extend the app are therefore more than welcome.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
libs		libs
sample		sample
src		src
LICENSE		LICENSE
LICENSE.txt		LICENSE.txt
README.md		README.md
build.gradle		build.gradle
gradlew		gradlew
gradlew.bat		gradlew.bat
proguard-rules.pro		proguard-rules.pro
settings.gradle		settings.gradle

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Licenses found

Repository files navigation

A Study of Lucene (Mobile Version)

Sample Dataset

How to Build and Run the App

Boolean Searches

Extending the App

About

Licenses found

Releases

Packages

Languages

License

Licenses found

AndroidLib/lucenestudy

Folders and files

Latest commit

History

Repository files navigation

A Study of Lucene (Mobile Version)

Sample Dataset

How to Build and Run the App

Boolean Searches

Extending the App

About

Resources

License

Licenses found

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages