removed Sigar library dependency
This change is the first step towards more unified node statistics
across different platforms.

The following changes were made:

* network stats are deprecated and now always return `0` (as previously
when Sigar was not available)
* read/write filesystem metrics for individual disks are deprecated and
now always return `-1`
* system/user/idle/stolen values for CPU are deprecated and now always
return `-1`
* a new "used" value for CPU has been added
* system/user values for process CPU are deprecated and now always
return `-1`

On an internal level, there is now only a single concrete implementation of
ExtendedNodeInfo, and the whole Sigar loading mechanism has been
removed.
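
For client code that reads these columns, the sentinel values listed above can be filtered out before use. The following is a minimal Python sketch and not part of this commit; the helper names `metric_or_none` and `cpu_used_percent` and the dictionary standing in for an `os['cpu']` value are assumptions made for illustration.

```python
from typing import Optional

# Sentinel reported by the deprecated metrics, per the commit message.
# (Deprecated network stats report 0 instead, which cannot be told apart
# from a genuine zero, so they are not handled here.)
DEPRECATED_SENTINEL = -1


def metric_or_none(value: Optional[int]) -> Optional[int]:
    """Map the -1 sentinel (or a missing value) to None."""
    if value is None or value == DEPRECATED_SENTINEL:
        return None
    return value


def cpu_used_percent(os_cpu: dict) -> Optional[int]:
    """Read the new 'used' value instead of summing deprecated system/user."""
    return metric_or_none(os_cpu.get("used"))


# Hypothetical os['cpu'] value as a client might receive it.
sample = {"used": 12, "system": -1, "user": -1, "idle": -1, "stolen": -1}
available = {k: v for k, v in sample.items() if metric_or_none(v) is not None}
print(cpu_used_percent(sample), available)
```

Treating `-1` as "not available" rather than as a number keeps deprecated fields from leaking into dashboards or averages.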
chaudum committed Dec 12, 2017
1 parent 873504e commit 4814fd4
Showing 80 changed files with 1,474 additions and 2,030 deletions.
11 changes: 11 additions & 0 deletions CHANGES.txt
@@ -15,6 +15,14 @@ the appropriate section of the docs.
Breaking Changes
================

- Certain metrics in the ``sys.nodes`` table have been deprecated and now
always return ``-1``, except network metrics which still return ``0`` as
previously when Sigar was not available.
The affected metrics are: network metrics, read/write filesystem metrics for
individual disks, system/user/idle/stolen values for CPU, and system/user
values for process CPU.
This is due to the removal of the "Sigar" library dependency.

- The default value of the setting ``auth.host_based.enabled`` (``false``) is
overwritten with ``true`` in the ``crate.yml`` that is shipped with the
tarball and Linux distributions of CrateDB and also contains a sane default
@@ -39,6 +47,9 @@ Breaking Changes
Changes
=======

- Added ``os['cpu']['used']`` column to ``sys.nodes`` table. This replaces the
deprecated system/user/idle/stolen values.

- Subqueries which filter on primary key columns now have the same realtime
semantics as the equivalent top-level queries.

20 changes: 2 additions & 18 deletions app/build.gradle
@@ -424,21 +424,6 @@ task collectEnterpriseModules(dependsOn: [':enterprise:users:jar', ':enterprise:
collectEnterpriseModules.outputs.file ('enterprise')


// Enable dynamic loading of sigar; (There is no fixed compile-time-dependency)
// It's known to cause trouble on specific platform, so it's important
// that the JAR can be removed without breaking CrateDB
task collectOSSModules(dependsOn: [':sigar:jar']) {
doLast {
copy {
from(project(':sigar').tasks.jar.archivePath)
from(project(':sigar').projectDir.path + "/lib")
into 'oss_modules/sigar'
}
}
}
collectOSSModules.outputs.file ('oss_modules')


task downloadPlugins(
dependsOn: ['downloadAdminUI',
':es:es-repository-hdfs:jar',
@@ -514,7 +499,7 @@ task downloadAdminUI {
}


processResources.dependsOn(downloadPlugins, downloadCrash, collectEnterpriseModules, collectOSSModules)
processResources.dependsOn(downloadPlugins, downloadCrash, collectEnterpriseModules)

task(runDebug, dependsOn: 'classes', type: JavaExec) {
main = 'io.crate.bootstrap.CrateDB'
@@ -550,8 +535,7 @@ sourceSets {
}

clean.dependsOn(['cleanDownloadPlugins',
'cleanCollectEnterpriseModules',
'cleanCollectOSSModules'])
'cleanCollectEnterpriseModules'])


def extractTopfolder(File src, String trg) {
2 changes: 1 addition & 1 deletion app/src/bin/crate.bat
@@ -69,7 +69,7 @@ REM Disable netty recycler
set JAVA_OPTS=%JAVA_OPTS% -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dio.netty.recycler.maxCapacityPerThread=0

if "%CRATE_CLASSPATH%" == "" (
set CRATE_CLASSPATH=%CRATE_HOME%/lib/*;%CRATE_HOME%/lib/enterprise/*;%CRATE_HOME%/lib/sigar/*
set CRATE_CLASSPATH=%CRATE_HOME%/lib/*;%CRATE_HOME%/lib/enterprise/*
) else (
ECHO Error: Don't modify the classpath with CRATE_CLASSPATH. 1>&2
ECHO Add plugins and their dependencies into the plugins/ folder instead. 1>&2
2 changes: 1 addition & 1 deletion app/src/bin/crate.in.sh
@@ -9,7 +9,7 @@ EOF
exit 1
fi

CRATE_CLASSPATH=$CRATE_HOME/lib/*:$CRATE_HOME/lib/enterprise/*:$CRATE_HOME/lib/sigar/*
CRATE_CLASSPATH=$CRATE_HOME/lib/*:$CRATE_HOME/lib/enterprise/*

if [ "x$CRATE_MIN_MEM" = "x" ]; then
CRATE_MIN_MEM=256m
@@ -187,7 +187,7 @@ private void setup(boolean addShutdownHook, Environment environment) throws Boot
}

/*
* DISABLED setup of security manager due to policy problems with plugins (e.g. SigarPlugin will not work)
* DISABLED setup of security manager due to policy problems with plugins and dependencies.
*/
// install SM after natives, shutdown hooks, etc.
//try {
1 change: 0 additions & 1 deletion blackbox/bin/test-all
@@ -3,5 +3,4 @@ DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
$DIR/test-docs
$DIR/test-hdfs
$DIR/test-jmx
$DIR/test-sigar
$DIR/test-sqllogic
4 changes: 0 additions & 4 deletions blackbox/bin/test-sigar

This file was deleted.

5 changes: 0 additions & 5 deletions blackbox/build.gradle
@@ -53,10 +53,6 @@ task monitoringTest(type: Exec) {
commandLine "$projectDir/bin/test-jmx"
}

task sigarTest(type: Exec) {
commandLine "$projectDir/bin/test-sigar"
}

task itest(type: Exec) {
commandLine "$projectDir/bin/test-docs", '-1', '-t', '!process_test'
}
@@ -76,7 +72,6 @@ task buildDocs(type: Exec, dependsOn: bootstrap) {

hdfsTest.dependsOn(unpackDistTar, bootstrap, lessLogging, ignoreDiskThreshold,
project(':es:es-repository-hdfs').blackBoxTestJar)
sigarTest.dependsOn(unpackDistTar, bootstrap, lessLogging, ignoreDiskThreshold)
monitoringTest.dependsOn(unpackDistTar, bootstrap, lessLogging, ignoreDiskThreshold)
itest.dependsOn(unpackDistTar, bootstrap, lessLogging, ignoreDiskThreshold)
gtest.dependsOn(unpackDistTar, bootstrap, lessLogging, ignoreDiskThreshold)
58 changes: 51 additions & 7 deletions blackbox/docs/admin/system-information.txt
@@ -371,12 +371,20 @@ The table schema is as follows:
| ``fs['disks']['available']`` | Available space of the disk in bytes. | ``LONG`` |
+----------------------------------+------------------------------------------------+-------------+
| ``fs['disks']['reads']`` | Number of reads on the disk. | ``LONG`` |
| | | |
| | DEPRECATED: always returns -1 | |
+----------------------------------+------------------------------------------------+-------------+
| ``fs['disks']['bytes_read']`` | Total size of reads on the disk in bytes. | ``LONG`` |
| | | |
| | DEPRECATED: always returns -1 | |
+----------------------------------+------------------------------------------------+-------------+
| ``fs['disks']['writes']`` | Number of writes on the disk. | ``LONG`` |
| | | |
| | DEPRECATED: always returns -1 | |
+----------------------------------+------------------------------------------------+-------------+
| ``fs['disks']['bytes_written']`` | Total size of writes on the disk in bytes. | ``LONG`` |
| | | |
| | DEPRECATED: always returns -1 | |
+----------------------------------+------------------------------------------------+-------------+
| ``fs['data']`` | Information about data paths used by the node. | ``ARRAY`` |
+----------------------------------+------------------------------------------------+-------------+
@@ -421,21 +429,32 @@ The table schema is as follows:
| ``os`` | Operating system stats | ``OBJECT`` |
+-------------------------------------------------+------------------------------------------------------+-------------+
| ``os['uptime']`` | System uptime in milliseconds | ``LONG`` |
| | | |
| | Requires allowing system calls on Windows and macOS. | |
| | See notes in :ref:`os_uptime_limitations`. | |
+-------------------------------------------------+------------------------------------------------------+-------------+
| ``os['timestamp']`` | UNIX timestamp in millisecond resolution | ``LONG`` |
+-------------------------------------------------+------------------------------------------------------+-------------+
| ``os['cpu']`` | Information about CPU utilization | ``OBJECT`` |
+-------------------------------------------------+------------------------------------------------------+-------------+
| ``os['cpu']['used']`` | System CPU usage as percentage | ``SHORT`` |
+-------------------------------------------------+------------------------------------------------------+-------------+
| ``os['cpu']['system']`` | CPU time used by the system | ``SHORT`` |
| | | |
| | DEPRECATED: always returns -1 | |
+-------------------------------------------------+------------------------------------------------------+-------------+
| ``os['cpu']['user']`` | CPU time used by applications | ``SHORT`` |
| | | |
| | DEPRECATED: always returns -1 | |
+-------------------------------------------------+------------------------------------------------------+-------------+
| ``os['cpu']['idle']`` | Idle CPU time | ``SHORT`` |
+-------------------------------------------------+------------------------------------------------------+-------------+
| ``os['cpu']['used']`` | Used CPU (system + user) | ``SHORT`` |
| | | |
| | DEPRECATED: always returns -1 | |
+-------------------------------------------------+------------------------------------------------------+-------------+
| ``os['cpu']['stolen']`` | The amount of CPU 'stolen' from this virtual | ``SHORT`` |
| | machine by the hypervisor for other tasks. | |
| | | |
| | DEPRECATED: always returns -1 | |
+-------------------------------------------------+------------------------------------------------------+-------------+
| ``os['probe_timestamp']`` | Unix timestamp at the time of collection | ``LONG`` |
| | of the OS probe. | |
@@ -480,16 +499,34 @@ The table schema is as follows:
| ``os['cgroup']['mem']['limit_bytes']`` | The max. amount of user memory in the cgroup. | ``STRING`` |
+-------------------------------------------------+------------------------------------------------------+-------------+

.. note::

Cgroup metrics only work if the stats are available from ``/sys/fs/cgroup/cpu``
and ``/sys/fs/cgroup/cpuacct``.

The cpu information values are cached for 1s. They might differ from the actual
values at query time. Use the probe timestamp to get the time of collection.
When analyzing the cpu usage over time, always use ``os['probe_timestamp']`` to
calculate the time difference between 2 probes.

.. _os_cgroup_limitations:

Cgroup Limitations
..................

.. NOTE::

Cgroup metrics only work if the stats are available from
``/sys/fs/cgroup/cpu`` and ``/sys/fs/cgroup/cpuacct``.

.. _os_uptime_limitations:

Uptime Limitations
..................

.. NOTE::

``os['uptime']`` requires a system call when running CrateDB on Windows or
macOS. However, system calls are not permitted by default. If you require
this metric you need to allow system calls by setting ``bootstrap.seccomp``
to ``false``. This setting must be set in the ``crate.yml`` or via command line
argument and cannot be changed at runtime.

``os_info``
-----------

@@ -526,6 +563,9 @@ calculate the time difference between 2 probes.
``network``
-----------

Network statistics are deprecated in CrateDB 2.3 and may be removed completely
in subsequent versions. All ``LONG`` columns always return ``0``.

+--------------------------------------------------------+--------------------------------------------------------------------------------------------+-------------+
| Column Name | Description | Return Type |
+========================================================+============================================================================================+=============+
@@ -587,8 +627,12 @@ calculate the time difference between 2 probes.
| | in percent. | |
+------------------------------------------+------------------------------------------------+--------------+
| ``process['cpu']['user']`` | The process CPU user time in milliseconds. | ``LONG`` |
| | | |
| | DEPRECATED: always returns -1 | |
+------------------------------------------+------------------------------------------------+--------------+
| ``process['cpu']['system']`` | The process CPU kernel time in milliseconds. | ``LONG`` |
| | | |
| | DEPRECATED: always returns -1 | |
+------------------------------------------+------------------------------------------------+--------------+

The cpu information values are cached for 1s. They might differ from the actual
75 changes: 0 additions & 75 deletions blackbox/sigar/src/tests.py

This file was deleted.

1 change: 1 addition & 0 deletions core/src/main/java/io/crate/monitor/CgroupMemoryProbe.java
@@ -39,6 +39,7 @@
* @see <a href="github.com/elastic/elasticsearch/blob/6.1/core/src/main/java/org/elasticsearch/monitor/os/OsProbe.java">OsProbe</a>
*
*/
@Deprecated
public class CgroupMemoryProbe {

private static String readSingleLine(final Path path) throws IOException {