Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for compressor plugins #585

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

lucagiac81
Copy link
Contributor

This is a follow-up to PR #584, adding support for external compressor plugins.
The first commit in this PR will match PR #584 until that is merged.
It is equivalent to PR6717 in RocksDB.

The Compressor and related classes are made part of the public API to allow development of plugins and to expose compressors in options.

Compressor plugins can be used to easily integrate new compression algorithms into Speedb. They can also implement compression techniques tailored to specific types of data. For example, if the values in a database are of numeric type (e.g., arrays of integers) with particular distributions (e.g., limited range within each block), the values could be compressed using a lightweight compression algorithm implemented as a plugin (such as frame-of-reference or delta encoding).

Options for Compressors

New options are added to support plugin compressors.
For example, compression was previously configured by compression (of type CompressionType) and compression_opts (of type CompressionOptions). This PR adds a compressor option (pointer to Compressor) to specify a compressor object (which includes type and options).
This approach was followed for the following options:

  • ColumnFamilyOptions: compression (compressor), bottommost_compression (bottommost_compressor)
  • AdvancedColumnFamilyOptions: compression_per_level (compressor_per_level), blob_compression (blob_compressor)
  • CompactionOptions: compression (compressor)

The existing CompressionType/CompressionOptions options are preserved for backward compatibility.
If the user doesn't specify compressors (leaving them null), the CompressionType/CompressionOptions options are used as before. Otherwise, compressors override the existing options.

A new constant kPluginCompression is defined in CompressionType for plugin compressors. The SST properties block stores information about the specific compressor in the compression_name field. This is used to instantiate a suitable compressor when opening the SST.

Option String Examples

Built-in compressor (existing options)
compression=kZSTD;compression_opts={level=1}

Built-in compressor (new options)
compressor={id=ZSTD;level=1}

Plugin compressor (new options)
compressor={id=my_compressor;my_option1=value1;my_option2=value2}

Options Object Example

Built-in compressor (existing options)

Options options;
options.compression = kZSTD;
options.compression_opts.level = 1;

Built-in compressor (new options)

Options options;
ConfigOptions config_options;
  Status s = Compressor::CreateFromString(
      config_options,
      "id=ZSTD;level=1",
      &options.compressor);

Plugin compressor (new options)

Options options;
ConfigOptions config_options;
  Status s = Compressor::CreateFromString(
      config_options,
      "id=my_compressor;my_option1=value1;my_option2=value2",
      &options.compressor);

db_bench

For db_bench, compression_type and individual compression options (such as compression_level) were left unchanged for backward compatibility.
compression_type is still used with plugin compressors to specify their name. Other compressor options can be passed using compressor_options.

Built-in compressor (existing options)
--compression_type=zstd --compression_level=1

Built-in compressor (new options)
--compression_type=zstd --compressor_options="level=1"

Plugin compressor (new options)
--compression_type=my_compressor --compressor_options="my_option1=value1;my_option2=value2"

Limitations/Future Work

Compressor plugins are currently not supported for

  • WAL compression: it requires adding streaming compression to the Compressor interface, as described in PR Add Compressor interface #584
  • Compressed secondary cache
  • BlobDB: the blob_compressor option allows passing options for built-in compressors, but plugins are not supported. Supporting plugins would require additional metadata to be stored with blob files.

These limitations will be addressed by future PRs.

@lucagiac81 lucagiac81 marked this pull request as ready for review June 27, 2023 14:07
@lucagiac81 lucagiac81 force-pushed the pluggable_compression branch from 24319b2 to c0655bb Compare August 15, 2023 20:09
@ofriedma
Copy link
Contributor

@lucagiac81 please rebase

@lucagiac81 lucagiac81 force-pushed the pluggable_compression branch from c0655bb to ce92cf2 Compare October 20, 2023 00:20
@@ -995,6 +1140,43 @@ void MutableCFOptions::RefreshDerivedOptions(int num_levels,
max_file_size[i] = target_file_size_base;
}
}

if (compressor == nullptr) {
Copy link
Contributor Author

@lucagiac81 lucagiac81 Oct 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recently discovered issue to be addressed as part of the review:
when using an option string specifying the (pre-existing) “compression” option instead of the (new) “compressor” option, the "compression" option is not applied.

This is caused by the following:

  1. When the base MutableCFOptions is created, compression defaults to kSnappyCompression . This leads to compressor being set to SnappyCompressor in RefreshDerivedOptions.
  2. When the option string is then parsed, the “compression” option won’t override the existing “compressor”, as it’s not null.
  3. In other words, the code can’t differentiate whether the SnappyCompressor selection came explicitly from the user or from the default settings.

This can be addressed by

  • Finding a way to eliminate the compression/compressor option duplication (best solution), or
  • having a separate variable for a "derived" compressor in MutableCFOptions, which is computed from the compressor/compression options. In this way, user selection is never overwritten by default settings.

A test must also be added to detect the bug and confirm the fix.

Copy link
Contributor

@speedbmike speedbmike Nov 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the current status of this issue? It looks like a critical issue to me

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

@lucagiac81 lucagiac81 Nov 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a solution using the "derived" compressor approach described above (and a test to reproduce the issue and validate the fix). Now that #397 is merged, I'll try to look for a better solution before pushing an update.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lucagiac81 thank you

@mrambacher
Copy link
Contributor

#397 should be able to help solve this problem. Using it, we can set the compressor as an option and not serialize the compression+opts if the Compressor is not set. Once your PR is merged, I suggest you open another issue to outline that issue, preferably with an associated unit test. Then I can help work on a fix.

speedbmike
speedbmike previously approved these changes Nov 7, 2023
CreateIfMatches<ZSTDNotFinalCompressor>(id, result)) {
return Status::OK();
} else {
#ifndef ROCKSDB_LITE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ROCKSDB_LITE is deprecated, should not be used. But its removal can be handled in a separate PR (there are other leftovers in speedb code)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was missed when I removed ROCKSDB_LITE support. I'll remove it and submit an update.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed any leftover ROCKSDB_LITE support from the files modified by this PR (in compressor.cc and block_based_table_builder.cc).

@lucagiac81 lucagiac81 force-pushed the pluggable_compression branch from 636da34 to 0f12bbe Compare January 11, 2024 17:55
@lucagiac81
Copy link
Contributor Author

lucagiac81 commented Jan 11, 2024

Code is rebased on latest main based RocksDB 8.6.7 (the previous code was based on RocksDB 8.1.1).
I kept some of the changes in separate commits for ease of review.

Fix vector option parsing

  • A parsing error was discovered with compressor_per_level, which is a vector of Compressor (Customizable).
  • The vector is serialized with an outer set of braces. For example: {{id=NoCompression; parallel_threads=1}:{id=Snappy; parallel_threads=1}}
  • DefaultOptionsFormatter::ToVector calls NextToken which only removes the outer braces and returns one token with all the vector elements, instead of taking one element at a time.
  • This commit resolves the issue by removing the outer set of brackets.

Add test for option override issue

  • This exposes the bug described in a a previous comment in this PR (using compression and compressor options in option string).

Separate derived compression settings

  • This solves the bug exposed in the previous commit.
  • It adds a "derived" compressor field in MutableCFOptions, which is computed from the compressor/compression options. In this way, user selection is never overwritten by default settings.

I also tried to fix the bug using PR397 (with kUseBaseAddress), but I could not cover all cases.

  • With that PR397, I’m able to update “compressor” and “compression” fields in MutableCFOptions when parsing either option in the string.
  • However, I still run into issues when parsing the option file.
  • I noticed that the parse order from the option file is: compression, compressor, compression_opts
  • When parsing compression_opts, we need to apply the options only if the user specified the "compression" option. If the user specified "compressor", the options are already included.
  • We could only keep "compressor" in the option file, but the file would become incompatible with previous versions of Speedb.
  • I have the code for this solution available in a separate branch if you'd like to review it.

Copy link
Contributor

@ofriedma ofriedma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants