forked from elastic/elasticsearch
Migrated from ES-Hadoop. Contains several improvements regarding:

* Security: takes advantage of the pluggable security in ES 2.2 and uses it to grant the necessary permissions to the Hadoop libs. Relies on a dedicated DomainCombiner to grant permissions only when needed and only to the libraries installed in the plugin folder. Adds security checks for SpecialPermission/scripting and provides out-of-the-box permissions for the latest Hadoop 1.x (1.2.1) and 2.x (2.7.1).
* Testing: uses a customized local FS to perform actual integration testing of the Hadoop stack (and thus make sure the proper permissions and ACC blocks are in place), without requiring extra permissions for testing. If needed, a MiniDFS cluster is provided (though it requires extra permissions to bind ports). Provides a RestIT test.
* Build system: picks up the build system used in ES (still Gradle).
Showing 26 changed files with 1,778 additions and 0 deletions.
@@ -0,0 +1,115 @@
[[repository-hdfs]]
=== Hadoop HDFS Repository Plugin

The HDFS repository plugin adds support for using the HDFS file system as a repository for
{ref}/modules-snapshots.html[Snapshot/Restore].

[[repository-hdfs-install]]
[float]
==== Installation

This plugin can be installed using the plugin manager:

[source,sh]
----------------------------------------------------------------
sudo bin/plugin install repository-hdfs
sudo bin/plugin install repository-hdfs-hadoop2
sudo bin/plugin install repository-hdfs-lite
----------------------------------------------------------------

The plugin must be installed on every node in the cluster, and each node must
be restarted after installation.

[[repository-hdfs-remove]]
[float]
==== Removal

The plugin can be removed with the following command:

[source,sh]
----------------------------------------------------------------
sudo bin/plugin remove repository-hdfs
sudo bin/plugin remove repository-hdfs-hadoop2
sudo bin/plugin remove repository-hdfs-lite
----------------------------------------------------------------

The node must be stopped before removing the plugin.

[[repository-hdfs-usage]]
==== Getting started with HDFS

The HDFS snapshot/restore plugin comes in three _flavors_:

* Default / Hadoop 1.x::
The default version contains the plugin jar alongside Apache Hadoop 1.x (stable) dependencies.
* YARN / Hadoop 2.x::
The `hadoop2` version contains the plugin jar plus the Apache Hadoop 2.x (also known as YARN) dependencies.
* Lite::
The `lite` version contains just the plugin jar, without any Hadoop dependencies. The user should provide these (read below).

[[repository-hdfs-flavor]]
===== What version to use?

It depends on whether Hadoop is installed locally on the Elasticsearch nodes and, if not, whether the Hadoop cluster in use is compatible with the Apache Hadoop clients.

* Are you using Apache Hadoop (or a _compatible_ distro) without having it installed on the Elasticsearch nodes?::
+
If the answer is yes, use the default `repository-hdfs` for Apache Hadoop 1, or `repository-hdfs-hadoop2` for Apache Hadoop 2.
+
* Do you have Hadoop installed locally on the Elasticsearch nodes, or are you using a certain distro?::
+
Use the `lite` version and place your Hadoop _client_ jars and their dependencies in the plugin folder under `hadoop-libs`, as sketched below.
For large deployments, it is recommended to package the libraries in the plugin zip and deploy it manually across nodes
(thus avoiding having to set up the libraries on each node).
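
For example, with a default layout this can be as simple as copying the client jars into the plugin folder (the paths below are illustrative assumptions, not fixed locations - adjust them to your Hadoop distro and Elasticsearch home):

[source,sh]
----------------------------------------------------------------
# illustrative paths - adjust to your Hadoop distro and Elasticsearch home
cp /usr/lib/hadoop/client/*.jar plugins/repository-hdfs/hadoop-libs/
----------------------------------------------------------------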

[[repository-hdfs-security]]
==== Handling JVM Security and Permissions

Out of the box, Elasticsearch runs in a JVM with the security manager turned _on_ to make sure that unsafe or sensitive actions
are allowed only from trusted code. Hadoop, however, is not designed to run under one; it does not rely on privileged blocks
to execute sensitive code, of which it uses plenty.

The `repository-hdfs` plugin provides the necessary permissions for both Apache Hadoop 1.x and 2.x (latest versions) to successfully
run in a secured JVM, as one can tell from the number of permissions required when installing the plugin.
However, using a certain Hadoop file system (outside DFS), a certain distro, or a certain operating system (in particular Windows) might require
additional permissions which are not provided by the plugin.

In this case there are several workarounds:

* add the permission to `plugin-security.policy` (available in the plugin folder); see the sketch after this list
* disable the security manager through the `es.security.manager.enabled=false` configuration setting - NOT RECOMMENDED
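
For illustration only, a hypothetical extra grant in `plugin-security.policy` could look as follows; the `SocketPermission` shown is a placeholder, and the actual permission to add is the one named by the security exception you encounter:

[source]
----------------------------------------------------------------
grant {
  // hypothetical example - grant only what the security exception asks for
  permission java.net.SocketPermission "*", "connect,resolve";
};
----------------------------------------------------------------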

If you find yourself in such a situation, please let us know what Hadoop distro version and OS you are using and what permission is missing
by raising an issue. Thank you!

[[repository-hdfs-config]]
==== Configuration Properties

Once installed, define the configuration for the `hdfs` repository through `elasticsearch.yml` or the
{ref}/modules-snapshots.html[REST API]:

[source]
----
repositories
    hdfs:
        uri: "hdfs://<host>:<port>/"    # optional - Hadoop file-system URI
        path: "some/path"               # required - path within the file-system where data is stored/loaded
        load_defaults: "true"           # optional - whether to load the default Hadoop configuration (default) or not
        conf_location: "extra-cfg.xml"  # optional - Hadoop configuration XML to be loaded (use commas for multi values)
        conf.<key>: "<value>"           # optional - 'inlined' key=value added to the Hadoop configuration
        concurrent_streams: 5           # optional - the number of concurrent streams (defaults to 5)
        compress: "false"               # optional - whether to compress the metadata or not (default)
        chunk_size: "10mb"              # optional - chunk size (disabled by default)
----
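
For instance, a minimal sketch of registering the same repository through the snapshot REST API (the repository name `my_hdfs_repository` and the host/path values are placeholders):

[source,sh]
----------------------------------------------------------------
curl -XPUT 'http://localhost:9200/_snapshot/my_hdfs_repository' -d '{
  "type": "hdfs",
  "settings": {
    "uri": "hdfs://<host>:<port>/",
    "path": "some/path"
  }
}'
----------------------------------------------------------------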

NOTE: Be careful when including paths within the `uri` setting; some implementations ignore them completely while
others consider them. In general, we recommend keeping the `uri` to a minimum and using the `path` element instead.

===== Plugging other file-systems

Any HDFS-compatible file system (like Amazon `s3://` or Google `gs://`) can be used as long as the proper Hadoop
configuration is passed to the Elasticsearch plugin. In practice, this means making sure the correct Hadoop configuration
files (`core-site.xml` and `hdfs-site.xml`) and the relevant jars are available on the plugin classpath, just as you would with any
other Hadoop client or job.
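
As a sketch (the bucket name and configuration path are placeholders), such a file system would be wired up through the same settings documented above:

[source]
----
repositories
    hdfs:
        uri: "s3://my-bucket/"                   # placeholder - any HDFS-compatible scheme
        path: "backups"
        conf_location: "/path/to/core-site.xml"  # Hadoop configuration declaring the file-system implementation
----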

Otherwise, the plugin will only read the _default_, vanilla configuration of Hadoop and will not be able to recognize
the plugged-in file system.
@@ -0,0 +1,203 @@
/*
 * Licensed to Elasticsearch under one or more contributor
 * license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright
 * ownership. Elasticsearch licenses this file to you under
 * the Apache License, Version 2.0 (the "License"); you may
 * not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied. See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */

//apply plugin: 'nebula.provided-base'

esplugin {
  description 'The HDFS repository plugin adds support for Hadoop Distributed File-System (HDFS) repositories.'
  classname 'org.elasticsearch.plugin.hadoop.hdfs.HdfsPlugin'
}

configurations {
  hadoop1
  hadoop2
}

versions << [
  'hadoop1': '1.2.1',
  'hadoop2': '2.7.1'
]

dependencies {
  provided "org.elasticsearch:elasticsearch:${versions.elasticsearch}"
  provided "org.apache.hadoop:hadoop-core:${versions.hadoop1}"

  // use Hadoop1 to compile and test things (a subset of Hadoop2)
  testCompile "org.apache.hadoop:hadoop-core:${versions.hadoop1}"
  testCompile "org.apache.hadoop:hadoop-test:${versions.hadoop1}"
  // Hadoop dependencies
  testCompile "commons-configuration:commons-configuration:1.6"
  testCompile "commons-lang:commons-lang:${versions.commonslang}"
  testCompile "commons-collections:commons-collections:3.2.2"
  testCompile "commons-net:commons-net:1.4.1"
  testCompile "org.mortbay.jetty:jetty:6.1.26"
  testCompile "org.mortbay.jetty:jetty-util:6.1.26"
  testCompile "org.mortbay.jetty:servlet-api:2.5-20081211"
  testCompile "com.sun.jersey:jersey-core:1.8"

  hadoop1("org.apache.hadoop:hadoop-core:${versions.hadoop1}") {
    exclude module: "commons-cli"
    exclude group: "com.sun.jersey"
    exclude group: "org.mortbay.jetty"
    exclude group: "tomcat"
    exclude module: "commons-el"
    exclude module: "hsqldb"
    exclude group: "org.eclipse.jdt"
    exclude module: "commons-beanutils"
    exclude module: "commons-beanutils-core"
    exclude module: "junit"
    // provided by ES itself
    exclude group: "log4j"
  }

  hadoop2("org.apache.hadoop:hadoop-client:${versions.hadoop2}") {
    exclude module: "commons-cli"
    exclude group: "com.sun.jersey"
    exclude group: "com.sun.jersey.contribs"
    exclude group: "com.sun.jersey.jersey-test-framework"
    exclude module: "guice"
    exclude group: "org.mortbay.jetty"
    exclude group: "tomcat"
    exclude module: "commons-el"
    exclude module: "hsqldb"
    exclude group: "org.eclipse.jdt"
    exclude module: "commons-beanutils"
    exclude module: "commons-beanutils-core"
    exclude module: "javax.servlet"
    exclude module: "junit"
    // provided by ES itself
    exclude group: "log4j"
  }

  hadoop2("org.apache.hadoop:hadoop-hdfs:${versions.hadoop2}") {
    exclude module: "guava"
    exclude module: "junit"
    // provided by ES itself
    exclude group: "log4j"
  }
}

configurations.all {
  resolutionStrategy {
    force "commons-codec:commons-codec:${versions.commonscodec}"
    force "commons-logging:commons-logging:${versions.commonslogging}"
    force "commons-lang:commons-lang:2.6"
    force "commons-httpclient:commons-httpclient:3.0.1"
    force "org.codehaus.jackson:jackson-core-asl:1.8.8"
    force "org.codehaus.jackson:jackson-mapper-asl:1.8.8"
    force "com.google.code.findbugs:jsr305:3.0.0"
    force "com.google.guava:guava:16.0.1"
    force "org.slf4j:slf4j-api:1.7.10"
    force "org.slf4j:slf4j-log4j12:1.7.10"
  }
}

dependencyLicenses {
  mapping from: /hadoop-core.*/, to: 'hadoop-1'
  mapping from: /hadoop-.*/, to: 'hadoop-2'
}

compileJava.options.compilerArgs << '-Xlint:-deprecation,-rawtypes'

// main jar includes just the plugin classes
jar {
  include "org/elasticsearch/plugin/hadoop/hdfs/*"
}

// hadoop jar (which actually depends on Hadoop)
task hadoopLinkedJar(type: Jar, dependsOn: jar) {
  appendix "internal"
  from sourceSets.main.output.classesDir
  // exclude plugin classes
  exclude "org/elasticsearch/plugin/hadoop/hdfs/*"
}

bundlePlugin.dependsOn hadoopLinkedJar

// configure 'bundle' as being w/o Hadoop deps
bundlePlugin {
  into("internal-libs") {
    from hadoopLinkedJar.archivePath
  }

  into("hadoop-libs") {
    from configurations.hadoop2.allArtifacts.files
    from configurations.hadoop2
  }
}

task distZipHadoop1(type: Zip, dependsOn: [hadoopLinkedJar, jar]) { zipTask ->
  from(zipTree(bundlePlugin.archivePath)) {
    include "*"
    include "internal-libs/**"
  }

  description = "Builds archive (with Hadoop1 dependencies) suitable for download page."
  classifier = "hadoop1"

  into("hadoop-libs") {
    from configurations.hadoop1.allArtifacts.files
    from configurations.hadoop1
  }
}

task distZipHadoop2(type: Zip, dependsOn: [hadoopLinkedJar, jar]) { zipTask ->
  from(zipTree(bundlePlugin.archivePath)) {
    include "*"
    include "internal-libs/**"
  }

  description = "Builds archive (with Hadoop2/YARN dependencies) suitable for download page."
  classifier = "hadoop2"

  into("hadoop-libs") {
    from configurations.hadoop2.allArtifacts.files
    from configurations.hadoop2
  }
}

task distZipNoHadoop(type: Zip, dependsOn: [hadoopLinkedJar, jar]) { zipTask ->
  from(zipTree(bundlePlugin.archivePath)) {
    exclude "hadoop-libs/**"
  }

  from sourceSets.main.output.resourcesDir

  description = "Builds archive (without any Hadoop dependencies) suitable for download page."
  classifier = "lite"
}

artifacts {
  archives bundlePlugin
  'default' bundlePlugin
  archives distZipHadoop1
  archives distZipHadoop2
  archives distZipNoHadoop
}

integTest {
  cluster {
    plugin(pluginProperties.extension.name, zipTree(distZipHadoop2.archivePath))
  }
}