support Multi-disk(Multi-path) #918

xiaobiaozhao · 2022-09-25T01:27:30Z

Search before asking

I searched the issues and found no similar ones.

Motivation

When the host has multiple disks, kvrocks could use several of them for data storage to improve performance.
Hot data can be stored on local SSDs and cold data on cloud disks.

Solution

option.db_paths = {
                     {"/disk1", 1000 * 1000 * 1000},
                     {"/disk2", 1000 * 1000 * 1000},
                     {"/disk3", 1000 * 1000 * 1000},
                     {"/disk4", 1000 * 1000 * 1000}};

https://github.com/facebook/rocksdb/blob/main/include/rocksdb/options.h#L672

Are you willing to submit a PR?

I'm willing to submit a PR!


caipengbo · 2022-09-25T01:38:12Z

Hi @xiaobiaozhao, I have two questions:

1. Why would multiple disks improve performance? Multiple paths do not seem to work in parallel.
2. How do we tell hot data from cold data in kvrocks? RocksDB simply decides where to place an SST based on when it was generated.

tanruixiang · 2022-09-25T15:46:45Z

> Hi @xiaobiaozhao, I have two questions:
>
> 1. Why would multiple disks improve performance? Multiple paths do not seem to work in parallel.
> 2. How do we tell hot data from cold data in kvrocks? RocksDB simply decides where to place an SST based on when it was generated.

According to the description of the configuration option, lower-level SSTs are stored in the earlier entries of db_paths. So we can order db_paths by the speed of the storage medium and keep the low-level SSTs on the faster medium, for example by listing the SSD first in db_paths.

In fact, the level an SST sits at reflects how hot or cold its data is. Because RocksDB uses an LSM tree, it naturally merges cold data into higher levels.

So with this feature, RocksDB can store cold data on slower media such as mechanical hard drives and hot data on faster media such as SSDs.

xiaobiaozhao · 2022-09-25T23:56:56Z

> > Hi @xiaobiaozhao, I have two questions:
> >
> > 1. Why would multiple disks improve performance? Multiple paths do not seem to work in parallel.
> > 2. How do we tell hot data from cold data in kvrocks? RocksDB simply decides where to place an SST based on when it was generated.
>
> According to the description of the configuration option, lower-level SSTs are stored in the earlier entries of db_paths. So we can order db_paths by the speed of the storage medium and keep the low-level SSTs on the faster medium, for example by listing the SSD first in db_paths.
>
> In fact, the level an SST sits at reflects how hot or cold its data is. Because RocksDB uses an LSM tree, it naturally merges cold data into higher levels.
>
> So with this feature, RocksDB can store cold data on slower media such as mechanical hard drives and hot data on faster media such as SSDs.

Yes. In my test demo, RocksDB uses only the first and last entries of the db_paths config.

option.db_paths = {
                     {"/disk1", 1000 * 1000 * 1000},
                     {"/disk2", 1000 * 1000 * 1000},
                     {"/disk3", 1000 * 1000 * 1000},
                     {"/disk4", 1000 * 1000 * 1000}};

Only disk1 and disk4 are used to write data. RocksDB also limits db_paths to at most 4 entries.
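
The first-and-last behavior reported above is consistent with a capacity-based placement rule. The sketch below is a simplified, standalone model of such a rule, not RocksDB code: `PathSpec`, `PickPathForLevel`, and the default sizes (256 MiB for L1, a 10x multiplier per level) are assumptions chosen for illustration, and the real selection logic may differ between RocksDB versions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for one db_paths entry: a directory and its target size in bytes.
struct PathSpec {
  std::string path;
  uint64_t target_size;
};

// Simplified model: walk the levels from L0 upward and charge each level's
// estimated size against the current path. When a level no longer fits, move
// on to the next path. The last path is the fallback and takes whatever is left.
size_t PickPathForLevel(const std::vector<PathSpec>& paths, int level,
                        uint64_t level_base_bytes = 256ULL << 20,  // assumed L1 size
                        uint64_t level_multiplier = 10) {          // assumed growth per level
  assert(!paths.empty());
  size_t p = 0;
  uint64_t remaining = paths[0].target_size;
  uint64_t level_size = level_base_bytes;  // L0 is estimated to be as large as L1
  int cur_level = 0;
  while (p + 1 < paths.size()) {
    if (level_size <= remaining) {
      if (cur_level == level) return p;  // the requested level fits on this path
      remaining -= level_size;
      if (cur_level > 0) level_size *= level_multiplier;
      ++cur_level;
      continue;
    }
    ++p;  // the level does not fit: try the next path with its full budget
    remaining = paths[p].target_size;
  }
  return p;  // fallback: the last path
}
```

With four 1 GB paths as in the config above, this model places L0 and L1 on /disk1 and every deeper level on /disk4, because an L2 of about 2.5 GiB fits on neither of the 1 GB intermediate paths. Under this model, giving the intermediate paths larger target sizes would be the way to get them used.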

caipengbo · 2022-09-26T01:36:14Z

> In fact, the level an SST sits at reflects how hot or cold its data is. Because RocksDB uses an LSM tree, it naturally merges cold data into higher levels.

@tanruixiang But about 90% of the data ends up in the last level of the LSM tree, so does that mean that 90% of the data is cold?

tanruixiang · 2022-09-26T05:51:05Z

> > In fact, the level an SST sits at reflects how hot or cold its data is. Because RocksDB uses an LSM tree, it naturally merges cold data into higher levels.
>
> @tanruixiang But about 90% of the data ends up in the last level of the LSM tree, so does that mean that 90% of the data is cold?

Most of the data should be cold. If data is hot, it re-enters the upper levels, and the copy in the last level may be deleted.
For example, if key1 is in the last level and we put key1 again, the new key1 returns to the upper levels once it is flushed from the memtable to an SST, and the key1 in the last level becomes invalid. Of course, if some data is only read, it should end up in the cache even though it sits in the last level.
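
The key1 example can be traced with a toy model. This is a minimal sketch and not how RocksDB is implemented: `ToyLsm` and its methods are hypothetical names, each level is a plain in-memory map, and level 0 stands in for the memtable.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy model: levels[0] plays the memtable, higher indices are deeper levels.
struct ToyLsm {
  std::vector<std::map<std::string, std::string>> levels;

  explicit ToyLsm(size_t level_count) : levels(level_count) {
    assert(level_count > 0);
  }

  // New writes always land in the shallowest level.
  void Put(const std::string& key, const std::string& value) {
    levels[0][key] = value;
  }

  // Returns the shallowest level that holds the key, or -1 if it is absent.
  int FindLevel(const std::string& key) const {
    for (size_t i = 0; i < levels.size(); ++i) {
      if (levels[i].count(key) > 0) return static_cast<int>(i);
    }
    return -1;
  }

  // Merges level `from` into the level below it; newer entries overwrite older ones.
  void CompactDown(size_t from) {
    assert(from + 1 < levels.size());
    for (const auto& entry : levels[from]) {
      levels[from + 1][entry.first] = entry.second;
    }
    levels[from].clear();
  }
};
```

In this model, a key that has been compacted down to the last level is found at level 0 again right after a new `Put`, while the stale copy stays in the last level until a later compaction overwrites it.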

jishengming1 · 2023-01-30T11:17:05Z

Hi @xiaobiaozhao, I have a question:
I have two SSDs. Is there a way to split the hot data across them?

xiaobiaozhao · 2023-01-31T07:51:06Z

> Hi @xiaobiaozhao, I have a question: I have two SSDs. Is there a way to split the hot data across them?

https://github.com/apache/incubator-kvrocks/pull/953/files#diff-e29cedc586b39d07b64be1df007101989a20f7d4452fc18fe23136f6d4ccd331R792

"/mnt/ssd 10G; /mnt/hdd 1T;"
  hot data     cold data
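
To make the format of that string concrete, here is a minimal parsing sketch. It is not the kvrocks implementation from the linked PR: `PathSpec`, `ParseSize`, and `ParseDbPaths` are hypothetical names, and treating `K`/`M`/`G`/`T` as binary units is an assumption.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result type: one storage path and its target size in bytes.
struct PathSpec {
  std::string path;
  uint64_t target_size;
};

// Converts "10G", "1T", "512M", or a plain byte count into bytes.
// Binary units are assumed. Returns false on malformed input.
bool ParseSize(const std::string& text, uint64_t* bytes) {
  size_t pos = 0;
  uint64_t value = 0;
  while (pos < text.size() && std::isdigit(static_cast<unsigned char>(text[pos]))) {
    value = value * 10 + static_cast<uint64_t>(text[pos] - '0');
    ++pos;
  }
  if (pos == 0) return false;                              // no leading digits
  if (pos == text.size()) { *bytes = value; return true; } // plain byte count
  if (pos + 1 != text.size()) return false;                // more than one suffix character
  switch (std::toupper(static_cast<unsigned char>(text[pos]))) {
    case 'K': *bytes = value << 10; return true;
    case 'M': *bytes = value << 20; return true;
    case 'G': *bytes = value << 30; return true;
    case 'T': *bytes = value << 40; return true;
    default: return false;
  }
}

// Splits "path size; path size;" into entries, skipping empty or malformed ones.
std::vector<PathSpec> ParseDbPaths(const std::string& conf) {
  std::vector<PathSpec> result;
  std::istringstream entries(conf);
  std::string entry;
  while (std::getline(entries, entry, ';')) {
    std::istringstream fields(entry);
    std::string path, size;
    uint64_t bytes = 0;
    if (!(fields >> path >> size)) continue;
    if (ParseSize(size, &bytes)) result.push_back({path, bytes});
  }
  return result;
}
```

Parsing the string above with this sketch yields two entries in order, the SSD path first and the HDD path second, which is the order that matters when the faster medium should receive the lower levels.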

jishengming1 · 2023-01-31T08:10:22Z

> > Hi @xiaobiaozhao, I have a question: I have two SSDs. Is there a way to split the hot data across them?
>
> https://github.com/apache/incubator-kvrocks/pull/953/files#diff-e29cedc586b39d07b64be1df007101989a20f7d4452fc18fe23136f6d4ccd331R792
>
> "/mnt/ssd 10G; /mnt/hdd 1T;"
>   hot data     cold data

I am not trying to separate hot data from cold data. I want both disks to carry the same load, with the hot data distributed across the two disks.

xiaobiaozhao · 2023-01-31T14:34:03Z

> > > Hi @xiaobiaozhao, I have a question: I have two SSDs. Is there a way to split the hot data across them?
> >
> > https://github.com/apache/incubator-kvrocks/pull/953/files#diff-e29cedc586b39d07b64be1df007101989a20f7d4452fc18fe23136f6d4ccd331R792
> >
> > "/mnt/ssd 10G; /mnt/hdd 1T;"
> >   hot data     cold data
>
> I am not trying to separate hot data from cold data. I want both disks to carry the same load, with the hot data distributed across the two disks.

You can try cluster mode.

jishengming1 · 2023-02-01T02:31:23Z

> > > > Hi @xiaobiaozhao, I have a question: I have two SSDs. Is there a way to split the hot data across them?
> > >
> > > https://github.com/apache/incubator-kvrocks/pull/953/files#diff-e29cedc586b39d07b64be1df007101989a20f7d4452fc18fe23136f6d4ccd331R792
> > >
> > > "/mnt/ssd 10G; /mnt/hdd 1T;"
> > >   hot data     cold data
> >
> > I am not trying to separate hot data from cold data. I want both disks to carry the same load, with the hot data distributed across the two disks.
>
> You can try cluster mode.

Thanks. Is there any other way if there is only one host?

marlboroman81 · 2023-02-02T16:35:02Z

> I have two SSDs. Is there a way to split the hot data across them?

You can build a RAID 0 array from several disks and place the data directory on it. Or you can use a ZFS pool consisting of several disks.

jishengming1 · 2023-02-03T02:06:16Z

> > I have two SSDs. Is there a way to split the hot data across them?
>
> You can build a RAID 0 array from several disks and place the data directory on it. Or you can use a ZFS pool consisting of several disks.

Thanks, I'll try.

xiaobiaozhao added the enhancement label on Sep 25, 2022

xiaobiaozhao mentioned this issue Oct 5, 2022

feat: add mutil_path to conf #953

Closed
