This is a placeholder to put Hadoop native libraries - a component that
contains platform-specific native code which significantly speeds up data
(de)compression. Since there are no maven artifacts for this component, the
build process can't automatically download it.

These libraries are purely optional; if they are missing, Hadoop will use the
corresponding pure Java components. The impact of native compression becomes
noticeable with larger datasets and weaker CPUs - if you notice that the CPU
is routinely saturated when a job is sorting or reducing, then using these
libs may help.

Installation instructions
=========================
You can obtain the necessary files from a distribution package of Hadoop,
e.g. hadoop-0.20.2.tar.gz. Unpack this archive and copy the content of
lib/native here, so that the layout looks like this:

<Nutch home>/lib/native/Linux-amd64-64/...
<Nutch home>/lib/native/Linux-i386-32/...

Local runtime
-------------
The build process will include these native libraries when preparing the
/runtime/local environment for running in local mode. /runtime/local/bin/nutch
knows how to use these libs. If they are found and loaded correctly, that's
fine; if they are not, don't worry - Hadoop falls back to the pure Java
components, and you will just see lines like this in your logs:

15:36:02,126 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load
native-hadoop library for your platform... using builtin-java classes where
applicable
... probably quite a few more of the same ...

Distributed runtime
-------------------
If you want to use this component in an existing Hadoop cluster (when using
the /runtime/deploy artifacts), you need to make sure these files are placed
in the Hadoop/lib/native directory on each node, and then restart the cluster.
If you installed the cluster from a distribution package of Hadoop, these
libraries should already be in the right place and you shouldn't need to do
anything else.
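
Verifying the native libraries
------------------------------
If you prefer to confirm directly whether the JVM picked up the native library,
rather than scanning logs for the WARN line above, a small standalone check
along these lines can help. This is only a sketch and not part of Nutch or
Hadoop: the class name NativeCheck is hypothetical, and it assumes the Hadoop
jars are on the classpath and that java.library.path points at the platform
directory under lib/native.

  // NativeCheck.java - minimal sketch: reports whether the Hadoop native
  // library and the native zlib bindings were loaded in this JVM.
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.io.compress.zlib.ZlibFactory;
  import org.apache.hadoop.util.NativeCodeLoader;

  public class NativeCheck {
    public static void main(String[] args) {
      // True only if libhadoop was found on java.library.path and loaded.
      boolean nativeLoaded = NativeCodeLoader.isNativeCodeLoaded();
      System.out.println("native-hadoop loaded: " + nativeLoaded);

      // True if the native zlib bindings are usable for (de)compression.
      boolean zlibNative = ZlibFactory.isNativeZlibLoaded(new Configuration());
      System.out.println("native zlib usable:   " + zlibNative);
    }
  }

Run it with something like
java -cp <hadoop jars>:. -Djava.library.path=<Nutch home>/lib/native/Linux-amd64-64 NativeCheck
(adjust the paths for your platform); if it prints true, the WARN line above
should no longer appear in your logs.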