Add support for full text queries and hybrid search queries #303
@justin-cechmanek you should rebase your branch and change this pr to be against |
7f959f1
to
7e0f24d
Compare
Left some comments, nice start.
…ept aggregate queries
Prior to RedisVL 0.4.0, we validated passed-in Redis clients when the user called `set_client()`. This PR reintroduces similar behavior by validating all clients, whether we created them or not, on first access through the lazy-client mechanism. Closes RAAE-694.
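The validate-on-first-access idea can be illustrated with a small, library-independent sketch. The class name `LazyClientHolder` and its `factory` and `validator` parameters are hypothetical and are not RedisVL's API; they only show how a client can be validated once, whether it was passed in by the user or created lazily.

```python
from typing import Any, Callable, Optional


class LazyClientHolder:
    """Hypothetical holder that validates its client on first access only."""

    def __init__(
        self,
        client: Optional[Any] = None,
        factory: Optional[Callable[[], Any]] = None,
        validator: Callable[[Any], None] = lambda client: None,
    ) -> None:
        self._client = client        # user-supplied client, if any
        self._factory = factory      # used to create a client lazily
        self._validator = validator  # should raise if the client is unusable
        self._validated = False

    @property
    def client(self) -> Any:
        # Create the client lazily when the user did not supply one.
        if self._client is None:
            if self._factory is None:
                raise ValueError("No client and no factory were provided")
            self._client = self._factory()
        # Validate exactly once, regardless of who created the client.
        if not self._validated:
            self._validator(self._client)
            self._validated = True
        return self._client
```

In this sketch the validator runs on the first read of `client` and is skipped afterwards, so repeated access does not pay the validation cost again.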
Add `batch_search` and `batch_query` methods to `SearchIndex` and `AsyncSearchIndex`. These methods run search commands in a pipeline and return correctly-parsed and post-processed results.
This PR implements a layered architecture for managing and validating searchable data in Redis, with clear separation of concerns between schema definition, data validation, and storage operations.

- `IndexSchema` provides the blueprint for data structure and constraints
  - Defines fields with specific types (TEXT, TAG, NUMERIC, GEO, VECTOR)
  - Supports different storage types (HASH, JSON) with appropriate configuration
- `SchemaModelGenerator` dynamically creates Pydantic models from schema definitions
  - Implements a caching mechanism to avoid redundant model generation
  - Maps Redis field types to appropriate Python/Pydantic types
  - Provides type-specific validators:
    - VECTOR: validates dimensions and value ranges (e.g., INT8 range checks)
    - GEO: validates geographic coordinate format
    - NUMERIC: prevents boolean values
- `BaseStorage` is the abstract class that provides the foundation for Redis operations
  - Specialized implementations (HashStorage, JsonStorage) for different Redis data types
  - Enforces schema validation during write operations when set to True
  - Implements optimized batch operations using Redis pipelines
  - Supports both synchronous and asynchronous interfaces
  - Handles key generation, preprocessing, and error handling

The `SearchIndex` contains the setting `validate_on_load`, which defaults to `False`.

On write:

- Objects are preprocessed and validated against the schema
- Objects are prepared with appropriate keys
- Batch writing occurs using Redis pipelines for efficiency
- TTL (expiration) can be applied if specified

On read:

- Keys are fetched in batches using pipelines
- Data is converted from Redis format to Python objects
- Bytes are automatically converted to appropriate types
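As a rough illustration of the type-specific checks listed above, here is a standard-library sketch. It is not the PR's Pydantic-based implementation: the function names, the `datatype` strings, and the choice to treat a GEO value as two comma-separated numbers are assumptions made for the example, and coordinate range checks are omitted.

```python
from typing import Any, List, Sequence

INT8_MIN, INT8_MAX = -128, 127


def validate_vector(value: Sequence[Any], dims: int, datatype: str = "float32") -> List[Any]:
    """Check the vector length and, for int8 vectors, the value range."""
    if len(value) != dims:
        raise ValueError(f"Expected {dims} dimensions, got {len(value)}")
    if datatype == "int8":
        for x in value:
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(x, bool) or not isinstance(x, int):
                raise ValueError(f"INT8 vectors need integer values, got {x!r}")
            if not INT8_MIN <= x <= INT8_MAX:
                raise ValueError(f"Value {x} is outside the INT8 range")
    return list(value)


def validate_numeric(value: Any) -> Any:
    """Accept ints and floats but reject booleans."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise ValueError(f"Expected a number, got {value!r}")
    return value


def validate_geo(value: str) -> str:
    """Assumed format: two comma-separated numbers, e.g. '-122.4,37.8'."""
    parts = value.split(",")
    if len(parts) != 2:
        raise ValueError(f"Expected two comma-separated coordinates, got {value!r}")
    float(parts[0])  # raises ValueError if the part is not numeric
    float(parts[1])
    return value
```

The explicit `bool` check matters because `isinstance(True, int)` is true in Python, so a plain numeric type check would let booleans through.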
Run API-dependent tests once per matrix run.
This PR accomplishes two goals:

1. Add an option for users to easily get back a similarity value between 0 and 1, which they might expect when comparing against other vector databases.
2. Fix the current bug where `distance_threshold` is validated to be between 0 and 1 when in reality it can take values between 0 and 2.

> Note: after much careful thought I believe it is best that for `0.5.0` we do **not** start enforcing all distance_thresholds between 0 and 1 and move to this option as default behavior. Ideally this metric would be consistent throughout our code and I don't love supporting this flag, but I think it provides the value that is scoped for this ticket while inflicting the least amount of pain and confusion.

Changes:

1. Adds the `normalize_vector_distance` flag to VectorQuery and VectorRangeQuery. Behavior:
   - If set to `True`, it normalizes values returned from Redis to a value between 0 and 1.
   - For cosine similarity, it applies `(2 - value)/2`.
   - For L2 distance, it applies the normalization `(1/(1+value))`.
   - For IP, it does nothing and throws a warning, since normalized IP is cosine by definition.
   - For VectorRangeQuery, if `normalize_vector_distance=True`, the distance threshold is now validated to be between 0 and 1 and denormalized for execution against the database, to keep behavior consistent.
2. Relaxes validation for semantic caching and routing to be between 0 and 2, fixing the current bug and aligning with how the database actually functions.
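The normalization formulas in this description can be written out as a short standalone sketch. The function names below are hypothetical and this is not the code in the PR. The cosine threshold denormalization is assumed to be the algebraic inverse of `(2 - value)/2`, since the description does not spell it out.

```python
import warnings


def normalize_distance(value: float, metric: str) -> float:
    """Map a raw vector distance to a 0-1 similarity using the PR's formulas."""
    metric = metric.lower()
    if metric == "cosine":
        # Cosine distance lies in [0, 2]: 0 maps to 1.0 and 2 maps to 0.0.
        return (2 - value) / 2
    if metric == "l2":
        # L2 distance lies in [0, inf): 0 maps to 1.0, large values approach 0.
        return 1 / (1 + value)
    if metric == "ip":
        # Per the description: leave the value untouched and warn.
        warnings.warn("Normalized inner product is cosine; value left unchanged")
        return value
    raise ValueError(f"Unknown distance metric: {metric}")


def denormalize_cosine_threshold(threshold: float) -> float:
    """Assumed inverse of (2 - value)/2, mapping a 0-1 threshold back to 0-2."""
    if not 0 <= threshold <= 1:
        raise ValueError("A normalized threshold must be between 0 and 1")
    return 2 - 2 * threshold
```

For example, a raw cosine distance of 0.5 normalizes to 0.75, and a normalized threshold of 0.75 maps back to a raw distance of 0.5.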
…ept aggregate queries
Looks good! Couple of small comments, but 👍
Co-authored-by: Robert Shelton <[email protected]> Co-authored-by: Andrew Brookins <[email protected]> Co-authored-by: Tyler Hutcherson <[email protected]> Co-authored-by: Robert Shelton <[email protected]>
No description provided.