0.9.0 index format is not compatible with the previous index format.
- MAJOR BUGFIX: Some `Mmap` objects were being leaked, and would never get released. (@fulmicoton)
- Removed most unsafe (@fulmicoton)
- Indexer memory footprint improved. (VInt compression, inlining the first block) (@fulmicoton)
- Stemming in other languages is now possible (@pentlander)
- Segments with no docs are deleted earlier (@barrotsteindev)
- Added grouped add and delete operations. They are guaranteed to happen together (i.e. they cannot be split by a commit). In addition, adds are guaranteed to happen on the same segment. (@elbow-jason)
- Removed `INT_STORED` and `INT_INDEXED`. It is now possible to use `STORED` and `INDEXED` for int fields. (@fulmicoton)
- Added DateTime field (@barrotsteindev)
- Added IndexReader. By default, index is reloaded automatically upon new commits (@fulmicoton)
- SIMD linear search within blocks (@fulmicoton)
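
The "VInt" in the indexer entry above refers to variable-length integer encoding, where small values take fewer bytes. The following is a minimal sketch of the general technique, assuming a LEB128-style layout (7 payload bits per byte, continuation flag in the high bit). The names `encode_vint` and `decode_vint` are hypothetical and are not tantivy's API; tantivy's actual byte layout may differ.

```rust
/// Appends `value` to `out` as a variable-length integer.
/// Assumed layout (for illustration only): 7 payload bits per byte,
/// least-significant group first, high bit set on every byte but the last.
fn encode_vint(mut value: u64, out: &mut Vec<u8>) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            // Last byte: high bit clear marks the end of the integer.
            out.push(byte);
            return;
        }
        // More groups follow: set the continuation bit.
        out.push(byte | 0x80);
    }
}

/// Decodes one integer from the front of `input`.
/// Returns the value and the number of bytes consumed, or `None` if the
/// input ends before a terminating byte or exceeds 10 bytes.
fn decode_vint(input: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    for (i, &byte) in input.iter().enumerate().take(10) {
        value |= u64::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}
```

With this layout, values below 128 occupy a single byte, which is why such an encoding tends to reduce memory use when most stored integers (for example, doc id deltas) are small.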
tantivy 0.9 brought some API-breaking changes. To update from tantivy 0.8, you will need to go through the following steps.
- `schema::INT_INDEXED` and `schema::INT_STORED` should be replaced by `schema::INDEXED` and `schema::STORED`.
- The index no longer holds the pool of searchers. You are required to create an intermediary object called `IndexReader` for this.

    ```rust
    // Create the reader. You typically need to create 1 reader for the entire
    // lifetime of your program.
    let reader = index.reader()?;

    // Acquiring a searcher (previously `index.searcher()`) is now written:
    let searcher = reader.searcher();

    // With the default setting of the reader, you are not required to
    // call `index.load_searchers()` anymore.
    //
    // The IndexReader will pick up that change automatically, regardless
    // of whether the update was done in a different process or not.
    // If this behavior is not wanted, you can create your reader with
    // the `ReloadPolicy::Manual`, and manually decide when to reload the index
    // by calling `reader.reload()?`.
    ```

Fixing build for x86_64 platforms. (#496) No need to update from 0.8.1 if tantivy is building on your platform.
Hotfix of #476.
Merge was reflecting deletes before commit was passed. Thanks @barrotsteindev for reporting the bug.
No change in the index format
- API Breaking change in the collector API. (@jwolfe, @fulmicoton)
- Multithreaded search (@jwolfe, @fulmicoton)
No change in the index format
- Bugfix: NGramTokenizer panicked on non-ASCII characters
- Added a space usage API
- Skip data for doc ids and positions (@fulmicoton), greatly improving performance
- Tantivy errors now rely on the `failure` crate (@drusellers)
- Added support for `AND`, `OR`, `NOT` syntax in addition to the `+`, `-` syntax
- Added a snippet generator with highlight (@vigneshsarma, @fulmicoton)
- Added a `TopFieldCollector` (@pentlander)
- Bugfix #324. GC was removing files that were still in use
- Added support for parsing AllQuery and RangeQuery via QueryParser
    - AllQuery: `*`
    - RangeQuery:
        - Inclusive `field:[startIncl to endIncl]`
        - Exclusive `field:{startExcl to endExcl}`
        - Mixed `field:[startIncl to endExcl}` and vice versa
        - Unbounded `field:[start to *]`, `field:[* to end]`

Special thanks to @drusellers and @jason-wolfe for their contributions to this release!
- Removed C code. Tantivy is now pure Rust. (@pmasurel)
- BM25 (@pmasurel)
- Approximate field norms encoded over 1 byte. (@pmasurel)
- Compiles on stable rust (@pmasurel)
- Add `&[u8]` fastfield for associating arbitrary bytes to each document (@jason-wolfe) (#270)
    - Completely uncompressed
    - Internally: One u64 fast field for indexes, one fast field for the bytes themselves.
- Add NGram token support (@drusellers)
- Add Stopword Filter support (@drusellers)
- Add a FuzzyTermQuery (@drusellers)
- Add a RegexQuery (@drusellers)
- Various performance improvements (@pmasurel)
- bugfix #274
- bugfix #280
- bugfix #289
- bugfix #254 : tantivy failed if no documents in a segment contained a specific field.
- Faceting
- RangeQuery
- Configurable tokenization pipeline
- Bugfix in PhraseQuery
- Various query optimisations
- Allowing very large indexes
    - 64 bits file address
    - Smarter encoding of the `TermInfo` objects
- Bugfix race condition when deleting files. (#198)
- Prevent usage of AVX2 instructions (#201)
- Bugfix for non-indexed fields. (#199)
- Raised the limit on the number of fields (previously 256 fields) (@fulmicoton)
- Removed u32 fields. They are replaced by u64 and i64 fields (#65) (@fulmicoton)
- Optimized skip in SegmentPostings (#130) (@lnicola)
- Replacing rustc_serialize by serde. Kudos to @KodrAus and @lnicola
- Using error-chain (@KodrAus)
- QueryParser: (@fulmicoton)
    - Explicit error returned when searching for a term that is not indexed
    - Searching for an int term via the query parser was broken `(age:1)`
    - Searching for a non-indexed field returns an explicit Error
    - Phrase queries for non-tokenized fields are not tokenized by the query parser.
- Faster/Better indexing (@fulmicoton)
    - using murmurhash2
    - faster merging
    - more memory efficient fast field writer (@lnicola)
    - better handling of collisions
    - lower memory usage
- Added API, most notably to iterate over ranges of terms (@fulmicoton)
- Fixed a bug that prevented segment files from being unmapped on index drop (@fulmicoton)
- Made the doc! macro public (@fulmicoton)
- Added an alternative implementation of the streaming dictionary (@fulmicoton)
- Expose a method to trigger files garbage collection
Special thanks to @Kodraus @lnicola @Ameobea @manuel-woelker @celaus for their contributions to this release.
Thanks also to everyone in the tantivy gitter chat for their advice and company :)
https://gitter.im/tantivy-search/tantivy
Warning:
Tantivy 0.3 is NOT backward compatible with tantivy 0.2 code and index format. You should not expect backward compatibility before tantivy 1.0.
- Delete. You can now delete documents from an index.
- Support for windows (Thanks to @lnicola)
- Added CI for Windows (https://ci.appveyor.com/project/fulmicoton/tantivy) Thanks to @KodrAus ! (#108)
- Various dependency version updates (Thanks to @Ameobea) #76
- Fixed several race conditions in `Index.wait_merge_threads`
- Fixed #72. Mmaps were never released.
- Fixed #80. Fast field used to take an amplitude of 32 bits after a merge. (Ouch!)
- Fixed #92. u32 are now encoded using big endian in the fst in order to make their enumeration consistent with the natural ordering.
- Building binary targets for tantivy-cli (Thanks to @KodrAus)
- Misc invisible bug fixes, and code cleanup.
- Use