WIP: Documentation on performance test results (BlueBrain#263)
Documentation on performance test results
bogdanromanx authored Nov 13, 2018
1 parent 69bf94f commit afea65c
Showing 6 changed files with 100 additions and 59 deletions.
Binary file modified src/main/paradox/assets/img/performance_tests_environment.png
29 changes: 29 additions & 0 deletions src/main/paradox/docs/benchmarks/data-volume-and-scenarios.md
@@ -0,0 +1,29 @@
# Data volume and scenarios

Prior to the test execution, a reasonable amount of data was injected into the system to ensure it behaves well
under a typical volume. Specifically:

* Total number of resources: 115,690,687
* Total number of triples (edges): 2,493,134,304
* Total number of entities (nodes): 352,856,595
* Total number of property types: 74
* Total number of classes (distinct values of `@type`): 24
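
For a sense of scale, the averages implied by these totals work out as follows:

```latex
\frac{2{,}493{,}134{,}304 \text{ triples}}{115{,}690{,}687 \text{ resources}} \approx 21.6 \text{ triples per resource}
\qquad
\frac{352{,}856{,}595 \text{ nodes}}{115{,}690{,}687 \text{ resources}} \approx 3.05 \text{ nodes per resource}
```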

The data was generated by replicating a collection of _provenance patterns_, each spanning several resources:

* resource [examples](https://github.com/BlueBrain/nexus-tests/tree/master/src/main/resources/bbp)
* corresponding SHACL [schemas](https://github.com/BlueBrain/nexus-tests/tree/master/scripts/schemas)

The full collection of scenarios can be found [here](https://github.com/BlueBrain/nexus-tests/tree/master/src/it/scala/ch/epfl/bluebrain/nexus/perf),
each scenario within its own file.

Several scenarios were executed to verify the behaviour of the system, most notably:

* create resource (with validation)
* create resource (without validation)
* get resource by id
* tag resource
* get resource by revision, get resource by tag
* mixed operations: list resources, get resource by id, ElasticSearch query, BlazeGraph query, update resource
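
Each scenario is implemented as a Gatling simulation. As an illustration, below is a minimal sketch of a
fetch-by-id scenario in the Gatling 3.0.0 Scala DSL; the base URL, org/project path and feeder file are
illustrative assumptions, not the actual values used in nexus-tests.

```scala
import io.gatling.core.Predef._
import io.gatling.http.Predef._

// Minimal sketch of a "get resource by id" scenario (Gatling 3.0.0 DSL).
// The base URL, project path and resource-id feeder are hypothetical.
class FetchByIdSimulation extends Simulation {

  val httpProtocol = http
    .baseUrl("http://nexus.example.com/v1")  // assumed Nexus v1 endpoint
    .acceptHeader("application/ld+json")

  // Hypothetical CSV feeder with a single `id` column of resource identifiers.
  val ids = csv("resource-ids.csv").random

  val fetchById = scenario("GetResourceById")
    .feed(ids)
    .exec(
      http("get resource by id")
        .get("/resources/perf/catalog/_/${id}") // assumed org/project path
        .check(status.is(200))
    )

  // 16 concurrent users, matching the single-node runs in the results.
  setUp(fetchById.inject(atOnceUsers(16))).protocols(httpProtocol)
}
```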

Please head over to the @ref:[results](results.md) section for a summary of the results and conclusions.
5 changes: 4 additions & 1 deletion src/main/paradox/docs/benchmarks/deployment-configuration.md
@@ -5,5 +5,8 @@ The deployment configuration and the number of nodes assigned to each Nexus serv

![deployment configuration](../../assets/img/performance_tests_environment.png)

The benchmarks were run on an AWS EC2 `m5.2xlarge` server outside of the Kubernetes cluster.
Preliminary tests showed that the KG service is, as expected, the most critical component of the system and has the
greatest impact on its performance. Thus, during the test executions the KG service cluster was scaled to 1, 2,
4 and 6 replicas, with the number of concurrent connections scaled by the same multiplier.
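
That pairing can be captured with a trivial helper; the 16-users-per-replica baseline below is an assumption
inferred from the result tables, not a documented constant:

```scala
// Hypothetical helper pairing KG replica counts with Gatling user counts,
// scaled by the same multiplier (16 users per replica, per the result tables).
object LoadProfile {
  val usersPerReplica = 16

  def usersFor(replicas: Int): Int = replicas * usersPerReplica

  def main(args: Array[String]): Unit =
    Seq(1, 2, 4, 6).foreach(r => println(s"$r replicas -> ${usersFor(r)} users"))
}
```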
8 changes: 3 additions & 5 deletions src/main/paradox/docs/benchmarks/index.md
@@ -1,7 +1,7 @@
@@@ index

* [Deployment Configuration](deployment-configuration.md)
* [Data Volume and Scenarios](data-volume-and-scenarios.md)
* [Results](results.md)

@@@
@@ -20,11 +20,9 @@ and how they were affected by different factors, especially:
* **hardware configuration and scalability** - does assigning more hardware increase the performance of the system and can the system scale both horizontally and vertically.
* **clustering** - what's the effect of changing from a single node to clustered deployment, as well as, what's the effect of adding more nodes to the cluster.

The description of the test scenarios can be found @ref:[here](data-volume-and-scenarios.md).
The test scenarios and scripts can be found in the [nexus-tests](https://github.com/BlueBrain/nexus-tests) repository.
The results of the benchmarks are described in detail in the @ref:[Results section](results.md).

The benchmarks were run on a Kubernetes cluster deployed on AWS. For more details see @ref:[deployment configuration](deployment-configuration.md).
The tests were run against the v1 API of Nexus in November 2018 using [Gatling](https://gatling.io/) version 3.0.0.
65 changes: 64 additions & 1 deletion src/main/paradox/docs/benchmarks/results.md
@@ -1,3 +1,66 @@
# Results

## Create Simulation

| Nodes | Users | Throughput (req/s) | p50 (ms) | p75 (ms) | p95 (ms) | p99 (ms) |
|-------|-------|--------------------|----------|----------|----------|----------|
| 1 | 16 | 36 | 513 | 681 | 820 | 999 |
| 2 | 32 | 51 | 572 | 706 | 1107 | 1191 |
| 4 | 64 | 106 | 586 | 722 | 1033 | 1423 |
| 6 | 96 | 148 | 589 | 802 | 1209 | 1741 |

## Create Simulation (no validation)

| Nodes | Users | Throughput (req/s) | p50 (ms) | p75 (ms) | p95 (ms) | p99 (ms) |
|-------|-------|--------------------|----------|----------|----------|----------|
| 1 | 16 | 456 | 11 | 72 | 85 | 198 |
| 2 | 32 | 490 | 33 | 74 | 239 | 403 |
| 4 | 64 | 1063 | 26 | 77 | 180 | 366 |
| 6 | 96 | 891 | 59 | 113 | 431 | 546 |

## Fetch Simulation

| Nodes | Users | Throughput (req/s) | p50 (ms) | p75 (ms) | p95 (ms) | p99 (ms) |
|-------|-------|--------------------|----------|----------|----------|----------|
| 1 | 16 | 1849 | 4 | 4 | 55 | 68 |
| 2 | 32 | 2826 | 5 | 6 | 47 | 93 |
| 4 | 64 | 3440 | 10 | 13 | 53 | 116 |
| 6 | 96 | 3860 | 15 | 21 | 57 | 127 |

## Mixed Simulation

| Nodes | Users | Throughput (req/s) | p50 (ms) | p75 (ms) | p95 (ms) | p99 (ms) |
|-------|-------|--------------------|----------|----------|----------|----------|
| 1 | 16 | 467 | 14 | 29 | 80 | 269 |
| 2 | 32 | 566 | 15 | 25 | 89 | 1076 |
| 4 | 64 | 567 | 16 | 27 | 128 | 3552 |
| 6 | 96 | 506 | 16 | 26 | 154 | 7182 |

## Tag Simulation

| Nodes | Users | Throughput (req/s) | p50 (ms) | p75 (ms) | p95 (ms) | p99 (ms) |
|-------|-------|--------------------|----------|----------|----------|----------|
| 1 | 16 | 508 | 7 | 69 | 95 | 195 |
| 2 | 32 | 699 | 16 | 76 | 100 | 199 |
| 4 | 64 | 1107 | 15 | 80 | 163 | 276 |
| 6 | 96 | 1661 | 26 | 72 | 144 | 257 |

## GetByTag Simulation

| Nodes | Users | Throughput (req/s) | p50 (ms) | p75 (ms) | p95 (ms) | p99 (ms) |
|-------|-------|--------------------|----------|----------|----------|----------|
| 1 | 16 | 156 | 86 | 116 | 298 | 418 |
| 2 | 32 | 189 | 118 | 215 | 406 | 586 |
| 4 | 64 | 192 | 200 | 460 | 975 | 1381 |
| 6 | 96 | 285 | 202 | 477 | 988 | 1409 |

## Conclusions

The "_Create Simulation_" and "_Create Simulation (no validation)_" shows the impact of SHACL validation for resources
on the throughput an latency. Most of the CPU cycles (~ 90%) are spent running the validation.

The system scales fairly well with the number of nodes allocated, but depends on each of the operations. Although the
number of concurrent requests is generally higher with more nodes, the penalty of node to node communication can have
a fairly big impact. For example: assembling schemas (following `owl:import` and `@context` references) implies a lot of
cross node communication; for "_Create Simulation_" increasing the cluster size from 1 to 2, while it shows an increase
in the total throughput, the value is not double to that of a single node.
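
As a rough quantification derived from the "_Create Simulation_" table above, the scaling efficiency is the
throughput at N nodes divided by N times the single-node throughput:

```latex
E(N) = \frac{T(N)}{N \cdot T(1)} \qquad
E(2) = \frac{51}{2 \times 36} \approx 0.71 \qquad
E(4) = \frac{106}{4 \times 36} \approx 0.74 \qquad
E(6) = \frac{148}{6 \times 36} \approx 0.69
```

In other words, validated writes retain roughly 70% of ideal linear scaling as the cluster grows.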
52 changes: 0 additions & 52 deletions src/main/paradox/docs/benchmarks/scenarios.md

This file was deleted.
