Commit
Add support for Elasticsearch 7.0
We can start supporting Elasticsearch 7.0 in FSCrawler.

TODO: replace deprecated methods (don't use types anymore).
dadoonet committed Jan 29, 2019
1 parent a962453 commit 3f6cd5e
Showing 28 changed files with 1,557 additions and 31 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -14,7 +14,7 @@ You need to install a version matching your Elasticsearch version:

| Elasticsearch | FS Crawler | Released | Docs |
|--------------------|-------------|----------|------------------------------------------------------------------------------|
| 2.x, 5.x, 6.x | 2.7-SNAPSHOT| |[2.6-SNAPSHOT](https://fscrawler.readthedocs.io/en/latest/) |
| 2.x, 5.x, 6.x, 7.x | 2.7-SNAPSHOT| |[2.7-SNAPSHOT](https://fscrawler.readthedocs.io/en/latest/) |
| 2.x, 5.x, 6.x | 2.6 |2019-01-09|[2.6](https://fscrawler.readthedocs.io/en/fscrawler-2.6) |
| 2.x, 5.x, 6.x | 2.5 |2018-08-04|[2.5](https://fscrawler.readthedocs.io/en/fscrawler-2.5) |
| 2.x, 5.x, 6.x | **2.4** |2017-08-11|[2.4](https://github.com/dadoonet/fscrawler/blob/fscrawler-2.4/README.md) |
31 changes: 31 additions & 0 deletions distribution/es7/pom.xml
@@ -0,0 +1,31 @@
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>fr.pilato.elasticsearch.crawler</groupId>
<artifactId>fscrawler-distribution</artifactId>
<version>2.7-SNAPSHOT</version>
</parent>

<artifactId>fscrawler-es7</artifactId>
<name>FSCrawler ZIP Distribution for Elasticsearch 7.x</name>

<dependencies>
<dependency>
<groupId>fr.pilato.elasticsearch.crawler</groupId>
<artifactId>fscrawler-elasticsearch-client-v7</artifactId>
</dependency>
</dependencies>

<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-assembly-plugin</artifactId>
</plugin>
</plugins>
</build>

</project>
@@ -0,0 +1,27 @@
/*
* Licensed to David Pilato (the "Author") under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. Author licenses this
* file to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/

package fr.pilato.elasticsearch.crawler.fs.cli;

/**
* Main entry point to launch FsCrawler
*/
public class FsCrawler extends FsCrawlerCli {

}
1 change: 1 addition & 0 deletions distribution/pom.xml
@@ -14,6 +14,7 @@
<packaging>pom</packaging>

<modules>
<module>es7</module>
<module>es6</module>
<module>es5</module>
</modules>
16 changes: 10 additions & 6 deletions docs/source/admin/fs/elasticsearch.rst
@@ -69,16 +69,16 @@ Mappings

When FSCrawler needs to create the doc index, it applies some default
settings and mappings which are read from
``~/.fscrawler/_default/6/_settings.json``. You can read its content
``~/.fscrawler/_default/7/_settings.json``. You can read its content
from `the
source <https://github.com/dadoonet/fscrawler/blob/master/settings/src/main/resources/fr/pilato/elasticsearch/crawler/fs/_default/6/_settings.json>`__.
source <https://github.com/dadoonet/fscrawler/blob/master/settings/src/main/resources/fr/pilato/elasticsearch/crawler/fs/_default/7/_settings.json>`__.

Settings define an analyzer named ``fscrawler_path`` which uses a `path
hierarchy
tokenizer <https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-pathhierarchy-tokenizer.html>`__.

FSCrawler applies as well a mapping automatically for the folders which can also be
read from `the source <https://github.com/dadoonet/fscrawler/blob/master/settings/src/main/resources/fr/pilato/elasticsearch/crawler/fs/_default/6/_settings_folder.json>`__.
read from `the source <https://github.com/dadoonet/fscrawler/blob/master/settings/src/main/resources/fr/pilato/elasticsearch/crawler/fs/_default/7/_settings_folder.json>`__.

You can also display the index mapping being used with Kibana:

@@ -106,6 +106,8 @@ Or fall back to the command line:
- ``5/_settings_folder.json``: for elasticsearch 5.x series folder index settings
- ``6/_settings.json``: for elasticsearch 6.x series document index settings
- ``6/_settings_folder.json``: for elasticsearch 6.x series folder index settings
- ``7/_settings.json``: for elasticsearch 7.x series document index settings
- ``7/_settings_folder.json``: for elasticsearch 7.x series folder index settings

.. note::

@@ -117,7 +119,7 @@ Creating your own mapping (analyzers)

If you want to define your own index settings and mapping to set
analyzers for example, you can either create the index and push the
mapping or define a ``~/.fscrawler/_default/6/_settings.json`` document
mapping or define a ``~/.fscrawler/_default/7/_settings.json`` document
which contains the index settings and mappings you wish **before
starting the FSCrawler**.

@@ -364,8 +366,8 @@ documents against an elasticsearch cluster running version ``6.x``.
If you create the following files, they will be picked up at job start
time instead of the :ref:`default ones <mappings>`:

- ``~/.fscrawler/{job_name}/_mappings/6/_settings.json``
- ``~/.fscrawler/{job_name}/_mappings/6/_settings_folder.json``
- ``~/.fscrawler/{job_name}/_mappings/7/_settings.json``
- ``~/.fscrawler/{job_name}/_mappings/7/_settings_folder.json``

.. tip::
You can do the same for other elasticsearch versions with:
@@ -374,6 +376,8 @@ time instead of the :ref:`default ones <mappings>`:
- ``~/.fscrawler/{job_name}/_mappings/2/_settings_folder.json`` for 2.x series (deprecated)
- ``~/.fscrawler/{job_name}/_mappings/5/_settings.json`` for 5.x series
- ``~/.fscrawler/{job_name}/_mappings/5/_settings_folder.json`` for 5.x series
- ``~/.fscrawler/{job_name}/_mappings/6/_settings.json`` for 6.x series
- ``~/.fscrawler/{job_name}/_mappings/6/_settings_folder.json`` for 6.x series

Replace existing mapping
""""""""""""""""""""""""
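The override rule described in the documentation hunks for this file (a job-level file under `_mappings/<version>/` is picked up instead of the default one) can be sketched as a small path lookup. This is only an illustration in Python, not FSCrawler's actual Java implementation: the function name `resolve_settings_file` and its parameters are hypothetical, and only the directory layout is taken from the docs.

```python
from pathlib import Path


def resolve_settings_file(config_dir: Path, job_name: str, es_major: int,
                          filename: str = "_settings.json") -> Path:
    """Return the job-specific settings file if it exists, else the default one.

    Hypothetical helper: it mirrors the layout documented above, where
    config_dir stands for ~/.fscrawler.
    """
    version = str(es_major)
    job_file = config_dir / job_name / "_mappings" / version / filename
    if job_file.is_file():
        # A job-level override wins over the shipped defaults.
        return job_file
    return config_dir / "_default" / version / filename
```

Called with `Path.home() / ".fscrawler"`, a job name and `7`, the sketch returns the job-level `7/_settings.json` when that file exists and falls back to `_default/7/_settings.json` otherwise. Passing `filename="_settings_folder.json"` covers the folder index in the same way.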
5 changes: 1 addition & 4 deletions docs/source/admin/fs/index.rst
@@ -57,10 +57,7 @@ The job file must comply to the following ``json`` specifications:
"password" : "password"
},
"rest" : {
"scheme" : "HTTP",
"host" : "127.0.0.1",
"port" : 8080,
"endpoint" : "fscrawler"
"url" : "https://127.0.0.1:8080/fscrawler"
}
}
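The hunk above replaces the four separate `rest` fields with a single `url`. A minimal sketch of how the old fields could map onto the new notation, assuming the pieces are simply concatenated; the helper `legacy_rest_to_url` is hypothetical and not part of FSCrawler. Note that the removed example declared scheme `HTTP` while the new one shows `https`; the sketch keeps whatever scheme the legacy settings declare.

```python
def legacy_rest_to_url(rest: dict) -> dict:
    """Convert legacy scheme/host/port/endpoint settings to the `url` notation.

    Hypothetical helper for illustration; all four legacy keys are required.
    """
    scheme = str(rest["scheme"]).lower()          # "HTTP" -> "http"
    endpoint = str(rest["endpoint"]).strip("/")   # tolerate "/fscrawler/"
    return {"url": f"{scheme}://{rest['host']}:{rest['port']}/{endpoint}"}
```

Applied to the removed example, this yields `http://127.0.0.1:8080/fscrawler`, so anyone migrating an existing job file should check whether the scheme still matches how the REST service is actually exposed.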
7 changes: 7 additions & 0 deletions docs/source/conf.py
@@ -236,10 +236,13 @@ def read_version(full_version=True):
.. |JPEG2000_version| replace:: jai-imageio-jpeg2000:{fmt_jpeg_version}
.. |Download_URL_V5| replace:: fscrawler-es5-{fmt_release}
.. |Download_URL_V6| replace:: fscrawler-es6-{fmt_release}
.. |Download_URL_V7| replace:: fscrawler-es7-{fmt_release}
.. |Maven_Central_V5| replace:: fscrawler-es5-*
.. |Maven_Central_V6| replace:: fscrawler-es6-*
.. |Maven_Central_V7| replace:: fscrawler-es7-*
.. |Sonatype_V5| replace:: fscrawler-es5-*
.. |Sonatype_V6| replace:: fscrawler-es6-*
.. |Sonatype_V7| replace:: fscrawler-es7-*
.. _Tika: http://tika.apache.org/{fmt_tika_version}/
.. _ES: https://www.elastic.co/products/elasticsearch
@@ -252,10 +255,13 @@ def read_version(full_version=True):
.. _JPEG2000_version: http://repo1.maven.org/maven2/com/github/jai-imageio/jai-imageio-jpeg2000/{fmt_jpeg_version}/
.. _Download_URL_V5: {fmt_downloadUrl_V5}
.. _Download_URL_V6: {fmt_downloadUrl_V6}
.. _Download_URL_V7: {fmt_downloadUrl_V7}
.. _Maven_Central_V5: https://repo1.maven.org/maven2/fr/pilato/elasticsearch/crawler/fscrawler-es5/
.. _Maven_Central_V6: https://repo1.maven.org/maven2/fr/pilato/elasticsearch/crawler/fscrawler-es6/
.. _Maven_Central_V7: https://repo1.maven.org/maven2/fr/pilato/elasticsearch/crawler/fscrawler-es7/
.. _Sonatype_V5: https://oss.sonatype.org/content/repositories/snapshots/fr/pilato/elasticsearch/crawler/fscrawler-es5/
.. _Sonatype_V6: https://oss.sonatype.org/content/repositories/snapshots/fr/pilato/elasticsearch/crawler/fscrawler-es6/
.. _Sonatype_V7: https://oss.sonatype.org/content/repositories/snapshots/fr/pilato/elasticsearch/crawler/fscrawler-es7/
""".format(
fmt_tika_version=config.get('3rdParty', 'TikaVersion'),
fmt_es_version=config.get('3rdParty', 'ElasticsearchVersion'),
@@ -264,5 +270,6 @@ def read_version(full_version=True):
fmt_jpeg_version=config.get('3rdParty', 'JpegVersion'),
fmt_downloadUrl_V5=downloadUrlV5,
fmt_downloadUrl_V6=downloadUrlV6,
fmt_downloadUrl_V7=downloadUrlV7,
fmt_release=release
)
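Each supported Elasticsearch major version adds the same six substitution and link lines to the epilog, so the pattern could also be generated rather than copied by hand. The following is a sketch of that idea, not the project's `conf.py`: the function `build_version_epilog` and its parameters are hypothetical, while the substitution names and repository URLs are the ones shown in the hunks above.

```python
MAVEN_CENTRAL = "https://repo1.maven.org/maven2/fr/pilato/elasticsearch/crawler"
SONATYPE = ("https://oss.sonatype.org/content/repositories/snapshots/"
            "fr/pilato/elasticsearch/crawler")


def build_version_epilog(versions, release, download_urls):
    """Build the per-version reST substitutions and link targets.

    Hypothetical helper: `download_urls` maps a major version to its URL.
    """
    lines = []
    for v in versions:
        artifact = f"fscrawler-es{v}"
        # Substitution definitions (the visible link text).
        lines.append(f".. |Download_URL_V{v}| replace:: {artifact}-{release}")
        lines.append(f".. |Maven_Central_V{v}| replace:: {artifact}-*")
        lines.append(f".. |Sonatype_V{v}| replace:: {artifact}-*")
        # Matching hyperlink targets.
        lines.append(f".. _Download_URL_V{v}: {download_urls[v]}")
        lines.append(f".. _Maven_Central_V{v}: {MAVEN_CENTRAL}/{artifact}/")
        lines.append(f".. _Sonatype_V{v}: {SONATYPE}/{artifact}/")
    return "\n".join(lines)
```

With such a helper, supporting a further major version would mean extending the `versions` list and the `download_urls` mapping instead of touching three separate blocks.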
14 changes: 8 additions & 6 deletions docs/source/dev/build.rst
@@ -46,6 +46,7 @@ Run tests from your IDE
To run integration tests from your IDE, you need to start tests in ``fscrawler-it-common`` module.
But you need first to specify the Maven profile to use and rebuild the project.

* ``es-7x`` for Elasticsearch 7.x
* ``es-6x`` for Elasticsearch 6.x
* ``es-5x`` for Elasticsearch 5.x

@@ -55,13 +56,14 @@ Run tests with an external cluster

To run the test suite against an elasticsearch instance running locally, just run::

mvn verify -pl fr.pilato.elasticsearch.crawler:fscrawler-it-v6
mvn verify -pl fr.pilato.elasticsearch.crawler:fscrawler-it-v7

.. tip::

If you want to run against a version 5, run::
If you want to run against a version 5 or 6, run::

mvn verify -pl fr.pilato.elasticsearch.crawler:fscrawler-it-v5
mvn verify -pl fr.pilato.elasticsearch.crawler:fscrawler-it-v6

If elasticsearch is not running yet on ``http://localhost:9200``, FSCrawler project will run a Docker instance before
the tests start.
@@ -70,7 +72,7 @@ the tests start.

If you are using a secured instance, use ``tests.cluster.user``, ``tests.cluster.pass`` and ``tests.cluster.url``::

mvn verify -pl fr.pilato.elasticsearch.crawler:fscrawler-it-v6 \
mvn verify -pl fr.pilato.elasticsearch.crawler:fscrawler-it-v7 \
-Dtests.cluster.user=elastic \
-Dtests.cluster.pass=changeme \
-Dtests.cluster.url=https://127.0.0.1:9200 \
@@ -81,14 +83,14 @@ the tests start.
`Elasticsearch service by Elastic <https://www.elastic.co/cloud/elasticsearch-service>`_,
you can also use ``tests.cluster.url`` to set where elasticsearch is running::

mvn verify -pl fr.pilato.elasticsearch.crawler:fscrawler-it-v6 \
mvn verify -pl fr.pilato.elasticsearch.crawler:fscrawler-it-v7 \
-Dtests.cluster.user=elastic \
-Dtests.cluster.pass=changeme \
-Dtests.cluster.url=https://XYZ.es.io:9243

Or even easier, you can use the ``Cloud ID`` available on your Cloud Console::

mvn verify -pl fr.pilato.elasticsearch.crawler:fscrawler-it-v6 \
mvn verify -pl fr.pilato.elasticsearch.crawler:fscrawler-it-v7 \
-Dtests.cluster.user=elastic \
-Dtests.cluster.pass=changeme \
-Dtests.cluster.cloud_id=fscrawler:ZXVyb3BlLXdlc3QxLmdjcC5jbG91ZC5lcy5pbyQxZDFlYTk5Njg4Nzc0NWE2YTJiN2NiNzkzMTUzNDhhMyQyOTk1MDI3MzZmZGQ0OTI5OTE5M2UzNjdlOTk3ZmU3Nw==
@@ -111,7 +113,7 @@ Some options are available from the command line when running the tests:

For example::

mvn install -rf :fscrawler-it -Pes-6x -Dtests.output=always
mvn install -rf :fscrawler-it -Dtests.output=always

Check for vulnerabilities (CVE)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
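The `mvn verify` invocations documented in this file differ only in the integration-test module suffix and in the optional `tests.cluster.*` properties. A small sketch that assembles the argument list, for instance for a CI script; the function `mvn_verify_args` is hypothetical, and the module name and property names are the ones from the documentation above.

```python
def mvn_verify_args(es_major, user=None, password=None, url=None, cloud_id=None):
    """Build the argv list for running the integration tests of one ES version.

    Hypothetical helper; options left as None are omitted from the command.
    """
    args = ["mvn", "verify", "-pl",
            f"fr.pilato.elasticsearch.crawler:fscrawler-it-v{es_major}"]
    options = {
        "tests.cluster.user": user,
        "tests.cluster.pass": password,
        "tests.cluster.url": url,
        "tests.cluster.cloud_id": cloud_id,
    }
    # Only pass the system properties that were actually provided.
    args += [f"-D{key}={value}" for key, value in options.items()
             if value is not None]
    return args
```

The resulting list can be handed to `subprocess.run(args, check=True)`. Building a list rather than a shell string avoids quoting problems with values such as a Cloud ID.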
10 changes: 10 additions & 0 deletions docs/source/installation.rst
@@ -6,6 +6,7 @@ Download FSCrawler
Depending on your Elasticsearch cluster version, you can download
FSCrawler |version| using the following links:

* |Download_URL_V7|_ for Elasticsearch V7.
* |Download_URL_V6|_ for Elasticsearch V6.
* |Download_URL_V5|_ for Elasticsearch V5.

@@ -16,6 +17,7 @@ Download FSCrawler
This is a **SNAPSHOT** version.
You can also download a **stable** version from Maven Central:

* |Maven_Central_V7|_ for Elasticsearch V7.
* |Maven_Central_V6|_ for Elasticsearch V6.
* |Maven_Central_V5|_ for Elasticsearch V5.

@@ -24,6 +26,7 @@ Download FSCrawler
Depending on your Elasticsearch cluster version, you can download
FSCrawler |version| using the following links:

* |Download_URL_V7|_ for Elasticsearch V7.
* |Download_URL_V6|_ for Elasticsearch V6.
* |Download_URL_V5|_ for Elasticsearch V5.

@@ -32,11 +35,13 @@ Download FSCrawler
This is a **stable** version.
You can choose another version than |version| from Maven Central:

* |Maven_Central_V7|_ for Elasticsearch V7.
* |Maven_Central_V6|_ for Elasticsearch V6.
* |Maven_Central_V5|_ for Elasticsearch V5.

You can also download a **SNAPSHOT** version from Sonatype:

* |Sonatype_V7|_ for Elasticsearch V7.
* |Sonatype_V6|_ for Elasticsearch V6.
* |Sonatype_V5|_ for Elasticsearch V5.

@@ -328,3 +333,8 @@ Upgrade to 2.6
an easier notation using ``url`` setting like ``http://127.0.0.1:9200``. You will need to modify
your existing settings and use the new notation if warned.

Upgrade to 2.7
~~~~~~~~~~~~~~

- FSCrawler comes now with an elasticsearch 7.x implementation.

@@ -46,7 +46,7 @@ public static ElasticsearchClient getInstance(Path config, FsSettings settings)

Objects.requireNonNull(settings, "settings can not be null");

for (int i = 6; i >= 1; i--) {
for (int i = 7; i >= 1; i--) {
logger.debug("Trying to find a client version {}", i);

try {
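The loop in this hunk probes client versions from the newest down to 1 and now starts at 7. With the previous upper bound of 6, a version 7 client would never have been tried. The body of the `try` block is not shown in the diff, so the following Python sketch models it as a lookup in a registry of factories; that registry, and the name `find_client`, are assumptions made for illustration and not FSCrawler's actual Java mechanism.

```python
from typing import Callable, Dict, Tuple


def find_client(factories: Dict[int, Callable[[], object]],
                newest: int = 7, oldest: int = 1) -> Tuple[int, object]:
    """Return (version, client) for the highest version that has a factory.

    Hypothetical stand-in for the version probing done in getInstance().
    """
    for version in range(newest, oldest - 1, -1):
        factory = factories.get(version)
        if factory is None:
            # No client available for this version; try the next older one.
            continue
        return version, factory()
    raise LookupError(f"no client found for versions {newest} down to {oldest}")
```

The design point is the descending order: when several client implementations are available, the most recent one wins, so raising the upper bound is all that is needed for a newly added client to be discovered.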
37 changes: 37 additions & 0 deletions elasticsearch-client/elasticsearch-client-v7/pom.xml
Original file line number Diff line number Diff line change
@@ -0,0 +1,37 @@
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<parent>
<groupId>fr.pilato.elasticsearch.crawler</groupId>
<artifactId>fscrawler-elasticsearch-client</artifactId>
<version>2.7-SNAPSHOT</version>
</parent>
<modelVersion>4.0.0</modelVersion>

<artifactId>fscrawler-elasticsearch-client-v7</artifactId>
<name>FSCrawler Elasticsearch Client V7</name>

<repositories>
<repository>
<id>elastic-lucene-snapshots</id>
<name>Elastic Lucene Snapshots</name>
<url>http://s3.amazonaws.com/download.elasticsearch.org/lucenesnapshots/774e9aefbc</url>
<releases><enabled>true</enabled></releases>
<snapshots><enabled>false</enabled></snapshots>
</repository>
</repositories>

<dependencies>
<dependency>
<groupId>fr.pilato.elasticsearch.crawler</groupId>
<artifactId>fscrawler-elasticsearch-client-base</artifactId>
</dependency>
<dependency>
<groupId>org.elasticsearch.client</groupId>
<artifactId>elasticsearch-rest-high-level-client</artifactId>
<version>${elasticsearch7.version}</version>
</dependency>
</dependencies>

</project>