Stars
Free and Open Source, Distributed, RESTful Search Engine
Apache Lucene and Solr open-source search software
The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).
JAXB-based Java library for Word docx, PowerPoint pptx, and Excel xlsx files