
File System Crawler for Elasticsearch

Welcome to the FS Crawler for Elasticsearch

This crawler helps you index binary documents such as PDF, OpenOffice, and MS Office files.

Main features:

Crawls the local file system (or a mounted drive): indexes new files, updates existing ones, and removes deleted ones.
Crawls a remote file system over SSH.
Provides a REST interface that lets you upload your binary documents to Elasticsearch.
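A minimal Python sketch of the REST upload, using only the standard library. It assumes the REST service of upstream FSCrawler, which is started with the `--rest` option and accepts a multipart form field named `file` at `http://127.0.0.1:8080/fscrawler/_upload`; check the documentation for your version, since this fork may differ. The helper names `build_multipart`, `make_upload_request`, and `upload_file` are hypothetical.

```python
import mimetypes
import os
import urllib.request
import uuid

# Assumed default endpoint of the upstream FSCrawler REST service;
# adjust host, port, and path for your own setup.
DEFAULT_UPLOAD_URL = "http://127.0.0.1:8080/fscrawler/_upload"


def build_multipart(filename: str, data: bytes, boundary: str) -> bytes:
    """Encode one file as a multipart/form-data body under the form field "file"."""
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail


def make_upload_request(
    filename: str, data: bytes, url: str = DEFAULT_UPLOAD_URL
) -> urllib.request.Request:
    """Build (but do not send) a POST request that uploads one document."""
    boundary = uuid.uuid4().hex
    body = build_multipart(filename, data, boundary)
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


def upload_file(path: str, url: str = DEFAULT_UPLOAD_URL) -> str:
    """Upload one file and return the response body as text (needs a running service)."""
    with open(path, "rb") as f:
        req = make_upload_request(os.path.basename(path), f.read(), url)
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```

For example, `upload_file("test.pdf")` sends `test.pdf` to the assumed endpoint and returns whatever the service replies. Building the request separately from sending it keeps the encoding step easy to inspect without a running crawler.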

You need to install a version matching your Elasticsearch version:

Elasticsearch    FS Crawler      Released      Docs
2.x, 5.x, 6.x    2.5-SNAPSHOT    unreleased    See below
2.x, 5.x, 6.x    2.4             2017-08-11    2.4
2.x, 5.x, 6.x    2.3             2017-07-10    2.3
1.x, 2.x, 5.x    2.2             2017-02-03    2.2
1.x, 2.x, 5.x    2.1             2016-07-26    2.1
es-2.0           2.0.0           2015-10-30    2.0.0
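To make the table concrete, here is a small Python sketch that encodes its released rows as data and picks the most recent FSCrawler release supporting a given Elasticsearch version. The function name `latest_fscrawler_for` is hypothetical, and the table above remains the reference.

```python
from typing import Optional

# Released rows of the compatibility table:
# (FSCrawler version, release date, supported Elasticsearch major versions).
# The 2.5-SNAPSHOT row is left out because it has no release date.
RELEASES = [
    ("2.4", "2017-08-11", {2, 5, 6}),
    ("2.3", "2017-07-10", {2, 5, 6}),
    ("2.2", "2017-02-03", {1, 2, 5}),
    ("2.1", "2016-07-26", {1, 2, 5}),
    ("2.0.0", "2015-10-30", {2}),
]


def latest_fscrawler_for(es_version: str) -> Optional[str]:
    """Return the newest released FSCrawler version supporting this Elasticsearch version."""
    major = int(es_version.split(".")[0])
    # ISO dates (YYYY-MM-DD) compare correctly as plain strings.
    matching = [(date, version) for version, date, majors in RELEASES if major in majors]
    if not matching:
        return None
    return max(matching)[1]
```

For instance, an Elasticsearch 1.x cluster maps to FSCrawler 2.2, while a major version that no row lists returns `None`.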

Build and Quality Status

The guide has been moved to ReadTheDocs.

License

This software is licensed under the Apache 2 license, quoted below.

Copyright 2011-2018 David Pilato

Licensed under the Apache License, Version 2.0 (the "License"); you may not
use this file except in compliance with the License. You may obtain a copy of
the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
License for the specific language governing permissions and limitations under
the License.

Incompatible 3rd party library licenses

Some libraries are not compatible with the Apache 2 license, so they are not packaged with FSCrawler. You need to download them and add them manually to the lib directory:

jbig2: com.levigo.jbig2:levigo-jbig2-imageio:2.0
tiff: com.github.jai-imageio:jai-imageio-core:1.3.1
JPEG2000: com.github.jai-imageio:jai-imageio-jpeg2000:1.3.0

See Apache PDFBox for more details.
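The entries above are Maven coordinates (`groupId:artifactId:version`). Assuming the jars are published on Maven Central under the standard Maven repository layout, a small Python sketch can turn each coordinate into a download URL and save the jar into FSCrawler's lib directory. `jar_url` and `download_to_lib` are hypothetical helpers; pass the path of the lib directory of your own FSCrawler installation.

```python
import os
import urllib.request
from typing import List

# Maven Central base URL (assumption: the jars are hosted there).
MAVEN_CENTRAL = "https://repo1.maven.org/maven2"

# Coordinates listed in this section (groupId:artifactId:version).
EXTRA_LIBS = [
    "com.levigo.jbig2:levigo-jbig2-imageio:2.0",
    "com.github.jai-imageio:jai-imageio-core:1.3.1",
    "com.github.jai-imageio:jai-imageio-jpeg2000:1.3.0",
]


def jar_url(coordinate: str, repo: str = MAVEN_CENTRAL) -> str:
    """Map groupId:artifactId:version to the jar's URL in a Maven repository."""
    parts = coordinate.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected groupId:artifactId:version, got {coordinate!r}")
    group_id, artifact_id, version = parts
    # Only the groupId's dots become path separators; artifactId stays as-is.
    group_path = group_id.replace(".", "/")
    return f"{repo}/{group_path}/{artifact_id}/{version}/{artifact_id}-{version}.jar"


def download_to_lib(lib_dir: str, coordinates: List[str] = EXTRA_LIBS) -> List[str]:
    """Download each jar into lib_dir and return the local file paths."""
    os.makedirs(lib_dir, exist_ok=True)
    paths = []
    for coord in coordinates:
        url = jar_url(coord)
        target = os.path.join(lib_dir, url.rsplit("/", 1)[1])
        urllib.request.urlretrieve(url, target)
        paths.append(target)
    return paths
```

Calling `download_to_lib("/path/to/fscrawler/lib")` fetches all three jars; restart FSCrawler afterwards so they are picked up.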

Thurdi/fscrawler-acl

Folders and files

Latest commit

History

Repository files navigation

File System Crawler for Elasticsearch

Build and Quality Status

License

Incompatible 3rd party library licenses

About

Resources

License

Code of conduct

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages