WIP: Documentation on performance test results (BlueBrain#263)
Documentation on performance test results
bogdanromanx authored Nov 13, 2018
1 parent 69bf94f commit afea65c
Showing 6 changed files with 100 additions and 59 deletions.
Binary file modified src/main/paradox/assets/img/performance_tests_environment.png
29 changes: 29 additions & 0 deletions src/main/paradox/docs/benchmarks/data-volume-and-scenarios.md
@@ -0,0 +1,29 @@
# Data volume and scenarios

Prior to the test execution, a reasonable amount of data was injected into the system to ensure it behaves well
under a typical volume. Specifically:

* Total number of resources: 115,690,687
* Total number of triples (edges): 2,493,134,304
* Total number of entities (nodes): 352,856,595
* Total number of property types: 74
* Total number of classes (distinct values of `@type`): 24
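
For a sense of scale, the averages implied by these totals work out as follows:

```latex
\frac{2{,}493{,}134{,}304 \text{ triples}}{115{,}690{,}687 \text{ resources}} \approx 21.6 \text{ triples per resource}
\qquad
\frac{352{,}856{,}595 \text{ nodes}}{115{,}690{,}687 \text{ resources}} \approx 3.05 \text{ nodes per resource}
```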

The data was generated by replicating a collection of _provenance patterns_, each spanning several resources:

* resource [examples](https://github.com/BlueBrain/nexus-tests/tree/master/src/main/resources/bbp)
* corresponding SHACL [schemas](https://github.com/BlueBrain/nexus-tests/tree/master/scripts/schemas)

The full collection of scenarios can be found [here](https://github.com/BlueBrain/nexus-tests/tree/master/src/it/scala/ch/epfl/bluebrain/nexus/perf),
each scenario within its own file.

Several scenarios were executed to verify the behaviour of the system, most notably:

* create resource (with validation)
* create resource (without validation)
* get resource by id
* tag resource
* get resource by revision, get resource by tag
* mixed operations: list resources, get resource by id, ElasticSearch query, BlazeGraph query, update resource
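
Each scenario is implemented as a Gatling simulation. As an illustration, below is a minimal sketch of a
fetch-by-id scenario in the Gatling 3.0.0 Scala DSL; the base URL, org/project path and feeder file are
illustrative assumptions, not the actual values used in nexus-tests.

```scala
import io.gatling.core.Predef._
import io.gatling.http.Predef._

// Minimal sketch of a "get resource by id" scenario (Gatling 3.0.0 DSL).
// The base URL, project path and resource-id feeder are hypothetical.
class FetchByIdSimulation extends Simulation {

  val httpProtocol = http
    .baseUrl("http://nexus.example.com/v1")  // assumed Nexus v1 endpoint
    .acceptHeader("application/ld+json")

  // Hypothetical CSV feeder with a single `id` column of resource identifiers.
  val ids = csv("resource-ids.csv").random

  val fetchById = scenario("GetResourceById")
    .feed(ids)
    .exec(
      http("get resource by id")
        .get("/resources/perf/catalog/_/${id}") // assumed org/project path
        .check(status.is(200))
    )

  // 16 concurrent users, matching the single-node runs in the results.
  setUp(fetchById.inject(atOnceUsers(16))).protocols(httpProtocol)
}
```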

Please head over to the @ref:[results](results.md) section for a summary of the results and conclusions.
5 changes: 4 additions & 1 deletion src/main/paradox/docs/benchmarks/deployment-configuration.md
@@ -5,5 +5,8 @@ The deployment configuration and the number of nodes assigned to each Nexus serv

![deployment configuration](../../assets/img/performance_tests_environment.png)

The benchmarks were run on an AWS EC2 `m5.2xlarge` server outside of the Kubernetes cluster.
Preliminary tests showed that the KG service is, as expected, the most critical component of the system and has the
greatest impact on its performance. Thus, during the test executions the KG service cluster was scaled to 1, 2,
4 and 6 replicas, with the number of concurrent connections scaled by the same multiplier.
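
That pairing can be captured with a trivial helper; the 16-users-per-replica baseline below is an assumption
inferred from the result tables, not a documented constant:

```scala
// Hypothetical helper pairing KG replica counts with Gatling user counts,
// scaled by the same multiplier (16 users per replica, per the result tables).
object LoadProfile {
  val usersPerReplica = 16

  def usersFor(replicas: Int): Int = replicas * usersPerReplica

  def main(args: Array[String]): Unit =
    Seq(1, 2, 4, 6).foreach(r => println(s"$r replicas -> ${usersFor(r)} users"))
}
```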
8 changes: 3 additions & 5 deletions src/main/paradox/docs/benchmarks/index.md
@@ -1,7 +1,7 @@
@@@ index

* [Deployment Configuration](deployment-configuration.md)
* [Data Volume and Scenarios](data-volume-and-scenarios.md)
* [Results](results.md)

@@@
@@ -20,11 +20,9 @@ and how they were affected by different factors, especially:
* **hardware configuration and scalability** - does assigning more hardware increase the performance of the system and can the system scale both horizontally and vertically.
* **clustering** - what's the effect of changing from a single node to clustered deployment, as well as, what's the effect of adding more nodes to the cluster.

The description of the test scenarios can be found @ref:[here](data-volume-and-scenarios.md).
The test scenarios and scripts can be found in the [nexus-tests](https://github.com/BlueBrain/nexus-tests) repository.
The results of the benchmarks are described in detail in the @ref:[Results section](results.md).

The benchmarks were run on a Kubernetes cluster deployed on AWS. For more details see @ref:[deployment configuration](deployment-configuration.md).
The tests were run against the v1 API of Nexus in November 2018 using [Gatling](https://gatling.io/) version 3.0.0.
65 changes: 64 additions & 1 deletion src/main/paradox/docs/benchmarks/results.md
@@ -1,3 +1,66 @@
# Results

## Create Simulation

| Nodes | Users | Throughput (req/s) | p50 (ms) | p75 (ms) | p95 (ms) | p99 (ms) |
|-------|-------|--------------------|----------|----------|----------|----------|
| 1 | 16 | 36 | 513 | 681 | 820 | 999 |
| 2 | 32 | 51 | 572 | 706 | 1107 | 1191 |
| 4 | 64 | 106 | 586 | 722 | 1033 | 1423 |
| 6 | 96 | 148 | 589 | 802 | 1209 | 1741 |

## Create Simulation (no validation)

| Nodes | Users | Throughput (req/s) | p50 (ms) | p75 (ms) | p95 (ms) | p99 (ms) |
|-------|-------|--------------------|----------|----------|----------|----------|
| 1 | 16 | 456 | 11 | 72 | 85 | 198 |
| 2 | 32 | 490 | 33 | 74 | 239 | 403 |
| 4 | 64 | 1063 | 26 | 77 | 180 | 366 |
| 6 | 96 | 891 | 59 | 113 | 431 | 546 |

## Fetch Simulation

| Nodes | Users | Throughput (req/s) | p50 (ms) | p75 (ms) | p95 (ms) | p99 (ms) |
|-------|-------|--------------------|----------|----------|----------|----------|
| 1 | 16 | 1849 | 4 | 4 | 55 | 68 |
| 2 | 32 | 2826 | 5 | 6 | 47 | 93 |
| 4 | 64 | 3440 | 10 | 13 | 53 | 116 |
| 6 | 96 | 3860 | 15 | 21 | 57 | 127 |

## Mixed Simulation

| Nodes | Users | Throughput (req/s) | p50 (ms) | p75 (ms) | p95 (ms) | p99 (ms) |
|-------|-------|--------------------|----------|----------|----------|----------|
| 1 | 16 | 467 | 14 | 29 | 80 | 269 |
| 2 | 32 | 566 | 15 | 25 | 89 | 1076 |
| 4 | 64 | 567 | 16 | 27 | 128 | 3552 |
| 6 | 96 | 506 | 16 | 26 | 154 | 7182 |

## Tag Simulation

| Nodes | Users | Throughput (req/s) | p50 (ms) | p75 (ms) | p95 (ms) | p99 (ms) |
|-------|-------|--------------------|----------|----------|----------|----------|
| 1 | 16 | 508 | 7 | 69 | 95 | 195 |
| 2 | 32 | 699 | 16 | 76 | 100 | 199 |
| 4 | 64 | 1107 | 15 | 80 | 163 | 276 |
| 6 | 96 | 1661 | 26 | 72 | 144 | 257 |

## GetByTag Simulation

| Nodes | Users | Throughput (req/s) | p50 (ms) | p75 (ms) | p95 (ms) | p99 (ms) |
|-------|-------|--------------------|----------|----------|----------|----------|
| 1 | 16 | 156 | 86 | 116 | 298 | 418 |
| 2 | 32 | 189 | 118 | 215 | 406 | 586 |
| 4 | 64 | 192 | 200 | 460 | 975 | 1381 |
| 6 | 96 | 285 | 202 | 477 | 988 | 1409 |

## Conclusions

The "_Create Simulation_" and "_Create Simulation (no validation)_" shows the impact of SHACL validation for resources
on the throughput an latency. Most of the CPU cycles (~ 90%) are spent running the validation.

The system scales fairly well with the number of nodes allocated, but depends on each of the operations. Although the
number of concurrent requests is generally higher with more nodes, the penalty of node to node communication can have
a fairly big impact. For example: assembling schemas (following `owl:import` and `@context` references) implies a lot of
cross node communication; for "_Create Simulation_" increasing the cluster size from 1 to 2, while it shows an increase
in the total throughput, the value is not double to that of a single node.
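
As a rough quantification derived from the "_Create Simulation_" table above, the scaling efficiency is the
throughput at N nodes divided by N times the single-node throughput:

```latex
E(N) = \frac{T(N)}{N \cdot T(1)} \qquad
E(2) = \frac{51}{2 \times 36} \approx 0.71 \qquad
E(4) = \frac{106}{4 \times 36} \approx 0.74 \qquad
E(6) = \frac{148}{6 \times 36} \approx 0.69
```

In other words, validated writes retain roughly 70% of ideal linear scaling as the cluster grows.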
52 changes: 0 additions & 52 deletions src/main/paradox/docs/benchmarks/scenarios.md

This file was deleted.
