Change Footer version handling, Make compression dynamic (quickwit-oss#1060)

Change Footer version handling, Make compression dynamic

Change Footer version handling
Simplify version handling by switching to JSON instead of binary serialization.
fixes quickwit-oss#1058

Make compression dynamic
Instead of choosing the compression algorithm at compile time via a feature flag, multiple compression algorithms can now be enabled at once, and the one to use is selected at runtime via IndexSettings. Changing the compression algorithm on an existing index is also supported. Which algorithm was used for the doc store is recorded in the DocStoreFooter. The default is the lz4 block format.
fixes quickwit-oss#904
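As a sketch, selecting a compressor at index creation might look like the following (hedged: the `Compressor` variant name and its export path are assumed from the feature flags and the `"lz4"` JSON tag in the test below; check the `store` module for the exact enum):

```rust
use tantivy::schema::{Schema, TEXT};
use tantivy::store::Compressor;
use tantivy::{Index, IndexSettings};

let mut schema_builder = Schema::builder();
schema_builder.add_text_field("body", TEXT);
let schema = schema_builder.build();

// Pick the doc store compressor at runtime; the matching
// `*-compression` cargo feature must still be compiled in.
let settings = IndexSettings {
    docstore_compression: Compressor::Brotli, // assumed variant name
    ..Default::default()
};
let index = Index::builder()
    .schema(schema)
    .settings(settings)
    .create_in_ram();
```

The chosen compressor is persisted in the index settings, so readers do not need the feature-flag configuration of the writer to know how to decode the doc store.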

Handle merging of different compressors
Fix feature flag names
Add doc store test for all compressors
PSeitz authored May 28, 2021
1 parent 4afba00 commit 8d32c3b
Showing 26 changed files with 560 additions and 473 deletions.
8 changes: 7 additions & 1 deletion CHANGELOG.md
@@ -12,7 +12,13 @@ Tantivy 0.15.0
- Simplified positions index format (@fulmicoton) #1022
- Moved bitpacking to bitpacker subcrate and add BlockedBitpacker, which bitpacks blocks of 128 elements (@PSeitz) #1030
- Added support for more-like-this query in tantivy (@evanxg852000) #1011
- Added support for sorting an index, e.g presorting documents in an index by a timestamp field. This can heavily improve performance for certain scenarios, by utilizing the sorted data (Top-n optimizations). #1026
- Added support for sorting an index, e.g presorting documents in an index by a timestamp field. This can heavily improve performance for certain scenarios, by utilizing the sorted data (Top-n optimizations)(@PSeitz). #1026
- Add iterator over documents in doc store (@PSeitz). #1044
- Fix log merge policy (@PSeitz). #1043
- Add detection to avoid small doc store blocks on merge (@PSeitz). #1054
- Make doc store compression dynamic (@PSeitz). #1060
- Switch to json for footer version handling (@PSeitz). #1060


Tantivy 0.14.0
=========================
8 changes: 4 additions & 4 deletions Cargo.toml
@@ -21,7 +21,6 @@ regex ={ version = "1.5.4", default-features = false, features = ["std"] }
tantivy-fst = "0.3"
memmap = {version = "0.7", optional=true}
lz4_flex = { version = "0.8.0", default-features = false, features = ["checked-decode"], optional = true }
lz4 = { version = "1.23.2", optional = true }
brotli = { version = "3.3", optional = true }
snap = { version = "1.0.5", optional = true }
tempfile = { version = "3.2", optional = true }
@@ -77,12 +76,13 @@ debug-assertions = true
overflow-checks = true

[features]
default = ["mmap", "lz4-block-compression" ]
default = ["mmap", "lz4-compression" ]
mmap = ["fs2", "tempfile", "memmap"]

brotli-compression = ["brotli"]
lz4-compression = ["lz4"]
lz4-block-compression = ["lz4_flex"]
lz4-compression = ["lz4_flex"]
snappy-compression = ["snap"]

failpoints = ["fail/failpoints"]
unstable = [] # useful for benches.
wasm-bindgen = ["uuid/wasm-bindgen"]
3 changes: 2 additions & 1 deletion appveyor.yml
@@ -18,5 +18,6 @@ install:
build: false

test_script:
- REM SET RUST_LOG=tantivy,test & cargo test --all --verbose --no-default-features --features lz4-block-compression --features mmap
- REM SET RUST_LOG=tantivy,test & cargo test --all --verbose --no-default-features --features lz4-compression --features mmap
- REM SET RUST_LOG=tantivy,test & cargo test test_store --verbose --no-default-features --features lz4-compression --features snappy-compression --features brotli-compression --features mmap
- REM SET RUST_BACKTRACE=1 & cargo build --examples
2 changes: 1 addition & 1 deletion src/common/mod.rs
@@ -8,7 +8,7 @@ pub use self::bitset::BitSet;
pub(crate) use self::bitset::TinySet;
pub(crate) use self::composite_file::{CompositeFile, CompositeWrite};
pub use self::counting_writer::CountingWriter;
pub use self::serialize::{BinarySerializable, FixedSize};
pub use self::serialize::{BinarySerializable, DeserializeFrom, FixedSize};
pub use self::vint::{
read_u32_vint, read_u32_vint_no_advance, serialize_vint_u32, write_u32_vint, VInt,
};
19 changes: 19 additions & 0 deletions src/common/serialize.rs
@@ -14,6 +14,20 @@ pub trait BinarySerializable: fmt::Debug + Sized {
fn deserialize<R: Read>(reader: &mut R) -> io::Result<Self>;
}

pub trait DeserializeFrom<T: BinarySerializable> {
fn deserialize(&mut self) -> io::Result<T>;
}

/// Implement deserialize from &[u8] for all types which implement BinarySerializable.
///
/// TryFrom would actually be preferrable, but not possible because of the orphan
/// rules (not completely sure if this could be resolved)
impl<T: BinarySerializable> DeserializeFrom<T> for &[u8] {
fn deserialize(&mut self) -> io::Result<T> {
T::deserialize(self)
}
}

/// `FixedSize` marks a `BinarySerializable` as
/// always serializing to the same size.
pub trait FixedSize: BinarySerializable {
@@ -61,6 +75,11 @@ impl<Left: BinarySerializable, Right: BinarySerializable> BinarySerializable for
Ok((Left::deserialize(reader)?, Right::deserialize(reader)?))
}
}
impl<Left: BinarySerializable + FixedSize, Right: BinarySerializable + FixedSize> FixedSize
for (Left, Right)
{
const SIZE_IN_BYTES: usize = Left::SIZE_IN_BYTES + Right::SIZE_IN_BYTES;
}

impl BinarySerializable for u32 {
fn serialize<W: Write>(&self, writer: &mut W) -> io::Result<()> {
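The blanket impl added above lets a byte slice be used as a deserializer directly. A self-contained sketch of the pattern, with minimal stand-ins for tantivy's traits (the little-endian `u32` encoding is an assumption for the demo; the real impls live in `src/common/serialize.rs`):

```rust
use std::io::{self, Read};

// Simplified stand-in for tantivy's BinarySerializable trait.
trait BinarySerializable: Sized {
    fn deserialize<R: Read>(reader: &mut R) -> io::Result<Self>;
}

impl BinarySerializable for u32 {
    fn deserialize<R: Read>(reader: &mut R) -> io::Result<Self> {
        let mut buf = [0u8; 4];
        reader.read_exact(&mut buf)?;
        Ok(u32::from_le_bytes(buf)) // endianness assumed for this sketch
    }
}

// Blanket impl: any &[u8] can deserialize any BinarySerializable type.
trait DeserializeFrom<T: BinarySerializable> {
    fn deserialize(&mut self) -> io::Result<T>;
}

impl<T: BinarySerializable> DeserializeFrom<T> for &[u8] {
    fn deserialize(&mut self) -> io::Result<T> {
        T::deserialize(self)
    }
}

fn main() -> io::Result<()> {
    // `Read` for &[u8] advances the slice, so consecutive calls
    // walk through the buffer.
    let mut bytes: &[u8] = &[1, 0, 0, 0, 2, 0, 0, 0];
    let a: u32 = bytes.deserialize()?;
    let b: u32 = bytes.deserialize()?;
    assert_eq!((a, b), (1, 2));
    Ok(())
}
```

This is exactly the shape needed when parsing a fixed-size footer out of a byte buffer, which is where the trait is used in this commit.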
11 changes: 9 additions & 2 deletions src/core/index.rs
@@ -76,7 +76,7 @@ fn load_metas(
/// );
///
/// let schema = schema_builder.build();
/// let settings = IndexSettings{sort_by_field: Some(IndexSortByField{field:"number".to_string(), order:Order::Asc})};
/// let settings = IndexSettings{sort_by_field: Some(IndexSortByField{field:"number".to_string(), order:Order::Asc}), ..Default::default()};
/// let index = Index::builder().schema(schema).settings(settings).create_in_ram();
///
/// ```
@@ -173,7 +173,7 @@ impl IndexBuilder {
&directory,
)?;
let mut metas = IndexMeta::with_schema(self.get_expect_schema()?);
metas.index_settings = self.index_settings.clone();
metas.index_settings = self.index_settings;
let index = Index::open_from_metas(directory, &metas, SegmentMetaInventory::default());
Ok(index)
}
@@ -460,6 +460,13 @@ impl Index {
pub fn settings(&self) -> &IndexSettings {
&self.settings
}

/// Accessor to the index settings
///
pub fn settings_mut(&mut self) -> &mut IndexSettings {
&mut self.settings
}

/// Accessor to the index schema
///
/// The schema is actually cloned.
9 changes: 7 additions & 2 deletions src/core/index_meta.rs
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
use super::SegmentComponent;
use crate::core::SegmentId;
use crate::schema::Schema;
use crate::Opstamp;
use crate::{core::SegmentId, store::Compressor};
use census::{Inventory, TrackedObject};
use serde::{Deserialize, Serialize};
use std::path::PathBuf;
@@ -233,7 +233,11 @@ impl InnerSegmentMeta {
pub struct IndexSettings {
/// Sorts the documents by information
/// provided in `IndexSortByField`
#[serde(skip_serializing_if = "Option::is_none")]
pub sort_by_field: Option<IndexSortByField>,
/// The `Compressor` used to compress the doc store.
#[serde(default)]
pub docstore_compression: Compressor,
}
/// Settings to presort the documents in an index
///
@@ -380,6 +384,7 @@ mod tests {
field: "text".to_string(),
order: Order::Asc,
}),
..Default::default()
},
segments: Vec::new(),
schema,
@@ -389,7 +394,7 @@
let json = serde_json::ser::to_string(&index_metas).expect("serialization failed");
assert_eq!(
json,
r#"{"index_settings":{"sort_by_field":{"field":"text","order":"Asc"}},"segments":[],"schema":[{"name":"text","type":"text","options":{"indexing":{"record":"position","tokenizer":"default"},"stored":false}}],"opstamp":0}"#
r#"{"index_settings":{"sort_by_field":{"field":"text","order":"Asc"},"docstore_compression":"lz4"},"segments":[],"schema":[{"name":"text","type":"text","options":{"indexing":{"record":"position","tokenizer":"default"},"stored":false}}],"opstamp":0}"#
);
}
}
7 changes: 7 additions & 0 deletions src/directory/file_slice.rs
@@ -147,6 +147,13 @@ impl FileSlice {
self.slice(from_offset..self.len())
}

/// Returns a slice from the end.
///
/// Equivalent to `.slice(self.len() - from_offset, self.len())`
pub fn slice_from_end(&self, from_offset: usize) -> FileSlice {
self.slice(self.len() - from_offset..self.len())
}

/// Like `.slice(...)` but enforcing only the `to`
/// boundary.
///
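A quick sketch of the arithmetic behind the new `slice_from_end`, using a plain byte slice in place of `FileSlice` (the footer framing shown is illustrative only):

```rust
// Return the last `from_offset` bytes of `data`, mirroring
// `FileSlice::slice_from_end`. Panics if `from_offset > data.len()`,
// just as the range computation in the real method would.
fn slice_from_end(data: &[u8], from_offset: usize) -> &[u8] {
    &data[data.len() - from_offset..]
}

fn main() {
    // Typical use: grab a fixed-size footer from the end of a store file.
    let file: &[u8] = b"doc store bytes...FOOTER";
    assert_eq!(slice_from_end(file, 6), b"FOOTER");
}
```

This is the access pattern the DocStoreFooter relies on: the footer has a fixed size, so a reader can slice it off the end of the file without knowing anything else about the contents.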