forked from ravendb/docs
-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
RDoc-2041_ShardingOverview2023_2 added Sharding root folder and overview
- Loading branch information
Showing
12 changed files
with
266 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
8 changes: 8 additions & 0 deletions
8
Documentation/6.0/Raven.Documentation.Pages/sharding/.docs.json
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
[ | ||
{ | ||
"Path": "overview.markdown", | ||
"Name": "Overview", | ||
"DiscussionId": "6069ef7d-5242-40d7-adbc-ec7bd5e68921", | ||
"Mappings": [] | ||
} | ||
] |
Binary file added
BIN
+13.4 KB
Documentation/6.0/Raven.Documentation.Pages/sharding/images/buckets-allocation.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added
BIN
+60 KB
Documentation/6.0/Raven.Documentation.Pages/sharding/images/buckets-allocation.snag
Binary file not shown.
Binary file added
BIN
+13.7 KB
Documentation/6.0/Raven.Documentation.Pages/sharding/images/buckets-population.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added
BIN
+40 KB
Documentation/6.0/Raven.Documentation.Pages/sharding/images/buckets-population.snag
Binary file not shown.
Binary file added
BIN
+13.9 KB
...on/6.0/Raven.Documentation.Pages/sharding/images/force-docs-to-share-bucket.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added
BIN
+40 KB
Documentation/6.0/Raven.Documentation.Pages/sharding/images/force-docs-to-share-bucket.snag
Binary file not shown.
Binary file added
BIN
+28.6 KB
...n/6.0/Raven.Documentation.Pages/sharding/images/sharding-replication-factor.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added
BIN
+56 KB
Documentation/6.0/Raven.Documentation.Pages/sharding/images/sharding-replication-factor.snag
Binary file not shown.
250 changes: 250 additions & 0 deletions
250
Documentation/6.0/Raven.Documentation.Pages/sharding/overview.markdown
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,250 @@ | ||
# Sharding: Overview | ||
--- | ||
|
||
{NOTE: } | ||
* **Sharding**, supported by RavenDB from version 6.0 and on, | ||
is the distribution of a database's content between autonomous | ||
**Shards**. | ||
|
||
* In most cases, sharding is implemented to allow efficient usage | ||
and management of exceptionally large databases (i.e. a 10-terabyte DB). | ||
|
||
* Sharding is managed by the RavenDB server, no special adaptation | ||
is required from clients when accessing a sharding-capable server | ||
or a sharded database. | ||
* The client API is **unchanged** under a sharded database. | ||
* Particular modifications in RavenDB features under a sharded | ||
database are documented in detail in feature-specific articles. | ||
|
||
* Each [Shard](../sharding/overview#shards) hosts and manages | ||
a **unique subset** of the database content. | ||
Documents are sorted between shards by their **document ID**. | ||
|
||
* Each RavenDB shard is hosted by at least one cluster node. | ||
Shards can be replicated over multiple nodes to increase data | ||
accessibility. | ||
|
||
* In this page: | ||
* [Sharding](../sharding/overview#sharding) | ||
* [Client-Server Communication](../sharding/overview#client-server-communication) | ||
* [When Should Sharding Be Used?](../sharding/overview#when-should-sharding-be-used) | ||
* [Shards](../sharding/overview#shards) | ||
* [Shard Replication](../sharding/overview#shard-replication) | ||
* [Buckets](../sharding/overview#buckets) | ||
* [Buckets Allocation](../sharding/overview#buckets-allocation) | ||
* [Buckets Population](../sharding/overview#buckets-population) | ||
* [Document Extensions Storage](../sharding/overview#document-extensions-storage) | ||
* [Forcing Documents to Share a Bucket](../sharding/overview#forcing-documents-to-share-a-bucket) | ||
* [Resharding](../sharding/overview#resharding) | ||
* [Creating a Sharded Database](../sharding/overview#creating-a-sharded-database) | ||
|
||
{NOTE/} | ||
|
||
--- | ||
|
||
{PANEL: Sharding} | ||
|
||
As a database grows [very large](https://en.wikipedia.org/wiki/Very_large_database), | ||
storing and managing it may become too demanding for any single node. | ||
System performance may suffer as resources like RAM, CPU, and storage are | ||
exhausted, routine chores like indexing and backup become massive tasks, | ||
responsiveness to client requests and queries slows down, and the system's | ||
throughput spreads thin serving an ever-growing number of clients. | ||
|
||
As the volume of stored data grows, the database can be scaled out by | ||
splitting it to [shards](../sharding/overview#shards), allowing it to be | ||
handled by multiple nodes and presenting practically no limit to its growth. | ||
The size of the overall database, comprised of all shards, can reach in | ||
this fashion dozens of terabytes and more while keeping the resources | ||
of each shard in check and maintaining its high performance and throughput. | ||
|
||
--- | ||
|
||
### Client-Server Communication | ||
|
||
As a client approaches a sharded database, the node it connects becomes the | ||
**session orchestrator** and manages all the communication between the client | ||
and the shards containing the documents it requires access to. | ||
The client remains unaware of this process and uses the same API as if | ||
the database wasn't sharded. | ||
The additional communication between the client and the orchestrator and | ||
between the orchestrator and the shards does, however, present an overhead | ||
over the usage of a non-sharded database. | ||
|
||
--- | ||
|
||
### When Should Sharding Be Used? | ||
|
||
While sharding solves many issues related to the storage and management | ||
of high-volume databases, the overhead it presents outweighs its benefits | ||
when the database size still poses no problem. We can postpone the transit | ||
to a sharded database when, for example, the database size is 100 GB, the | ||
server is well equipped and would comfortably handle a much larger volume, | ||
and no dramatic increase is expected in the number of potential users | ||
any time soon. | ||
|
||
{NOTE: } | ||
We recommend that you plan ahead for a transition to a sharded database when | ||
your database size is in the vicinity of 250 GB, so the transition is already well | ||
established when it reaches 500 GB. | ||
{NOTE/} | ||
|
||
{NOTE: } | ||
RavenDB 6.0 and above can **migrate** its database to a sharded database | ||
via [external replication](../server/ongoing-tasks/external-replication) | ||
or [export & import](../studio/database/tasks/export-database) operations. | ||
|
||
You cannot, however, upgrade a non-sharded database into a sharded one. | ||
To upgrade RavenDB to 6.0 and migrate the database data you will need | ||
to upgrade the server, create a new, sharded database, and replicate or | ||
export the data into it. | ||
{NOTE/} | ||
|
||
{PANEL/} | ||
|
||
--- | ||
|
||
{PANEL: Shards} | ||
|
||
While each cluster node of a non-sharded database handles a full replica | ||
of the entire database, each **shard** is assigned with a **subset** of the | ||
entire database content. | ||
{NOTE: } | ||
Take, for example, a 3-shards database, in which shard **1** is populated with | ||
documents `Users/1`..`Users/2000`, shard **2** with documents `Users/2001`..`Users/4000`, | ||
and shard **3** with documents `Users/4001`..`Users/6000`. | ||
A client that connects this database to retrieve `Users/3000` and `Users/5000` | ||
would be served by an automatically-appointed [orchestrator node](../sharding/overview#client-server-communication) | ||
that would seamlessly retrieve `Users/3000` from shard **2** and `Users/5000` from | ||
shard **3** and hand them to the client. | ||
{NOTE/} | ||
|
||
--- | ||
|
||
### Shard Replication | ||
|
||
Similarly to non-sharded databases, shards can be **replicated** by cluster nodes | ||
to ensure the continuous availability of all shards in case of a node failure, | ||
provide multiple access points, and load-balance the traffic between shard replicas. | ||
|
||
The number of nodes a shard is replicated to is determined by | ||
the **Shard Replication Factor**. | ||
|
||
!["Shard Replication"](images/sharding-replication-factor.png "Shard Replication") | ||
|
||
* In the image above, a 3-shards database is hosted by a 5-nodes cluster (where | ||
two of the nodes, **D** and **E**, are unused by this database). | ||
The Shard Replication Factor is set to 2, maintaining two replicas of each shard. | ||
|
||
{PANEL/} | ||
|
||
{PANEL: Buckets} | ||
|
||
Documents are stored in a sharded database within virtual containers named **Buckets**. | ||
The number of documents and the amount of data stored in each bucket may vary. | ||
|
||
--- | ||
|
||
### Buckets Allocation | ||
|
||
The number of buckets allocated for the whole database is fixed, always remaining | ||
**1,048,576** (1024 times 1024). | ||
Each shard is assigned with a range of buckets from this overall portion, in which | ||
documents can be stored. | ||
|
||
!["Buckets Allocation"](images/buckets-allocation.png "Buckets Allocation") | ||
|
||
--- | ||
|
||
### Buckets Population | ||
|
||
Buckets are populated with documents automatically by the cluster. | ||
A hash algorithm is executed over each document ID. The resulting | ||
hash code, a number between 0 and 1,048,576, is the number of the | ||
bucket in which the document is stored. | ||
|
||
!["Buckets Population"](images/buckets-population.png "Buckets Population") | ||
|
||
As buckets are spread among different shards, the bucket number | ||
allocated for a document also determines which shard the document | ||
will reside on. | ||
|
||
--- | ||
|
||
### Document Extensions Storage | ||
|
||
Document extensions (i.e. Attachments, Time series, Counters, and | ||
Revisions) are stored in the same bucket as the document they belong to. | ||
To make this happen, the bucket number (hash code) they are given | ||
is calculated by the ID of the document that owns them. | ||
|
||
--- | ||
|
||
### Forcing Documents to Share a Bucket | ||
|
||
The cluster can be forced to store a document in the same bucket | ||
(and therefore in the same shard) as another document. | ||
To do this, add the document name a suffix: `$` + <`ID`> | ||
The cluster will calculate the document's bucket number by | ||
the suffix ID, and store the document in this bucket. | ||
|
||
{NOTE: } | ||
E.g. - | ||
Original document ID: `Users/70` | ||
The document you want Users/70 to share a bucket with: `Users/4` | ||
Rename `Users/70` to: `Users/70$Users/4` | ||
{NOTE/} | ||
|
||
!["Forcing Documents to Share a Bucket"](images/force-docs-to-share-bucket.png "Forcing Documents to Share a Bucket") | ||
|
||
{WARNING: } | ||
Be careful not to force the storage of too many documents in the same bucket | ||
(and shard), to prevent the creation of an imbalanced database in which one | ||
of the shards is overpopulated and others are underpopulated. | ||
{WARNING/} | ||
|
||
{PANEL/} | ||
|
||
{PANEL: Resharding} | ||
|
||
**Resharding** is a process in which the content of a bucket | ||
(e.g. bucket 100000 in shard **1**) is reallocated to another | ||
bucket (e.g. bucket 350000 in shard **2**) to maintain a balanced | ||
database in which all shards handle about the same volume of data. | ||
|
||
{PANEL/} | ||
|
||
{PANEL: Creating a Sharded Database} | ||
|
||
{NOTE: } | ||
|
||
* A sharded database can be created via API or using Studio. | ||
|
||
* A RavenDB cluster can run sharded and non-sharded databases in parallel. | ||
|
||
* RavenDB (6.0 and on) supports sharding by default, no further steps are required to enable the feature. | ||
{NOTE/} | ||
|
||
To create a sharded database via API, use [CreateDatabaseOperation](../client-api/operations/server-wide/create-database) as follows. | ||
|
||
{CODE-BLOCK:csharp} | ||
store.Maintenance.Server.Send( | ||
new CreateDatabaseOperation( | ||
new DatabaseRecord(database), | ||
replicationFactor: 2, // Sharding Replication Factor | ||
shardFactor: 3)); // Sharding Factor | ||
{CODE-BLOCK/} | ||
|
||
{PANEL/} | ||
|
||
## Related articles | ||
|
||
**Client API** | ||
[Create Database](../client-api/operations/server-wide/create-database) | ||
|
||
**Server** | ||
[External Replication](../server/ongoing-tasks/external-replication) | ||
|
||
**Studio** | ||
[Export Database](../studio/database/tasks/export-database) | ||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters