Add support for full text queries and hybrid search queries #303

Merged
merged 27 commits into from
Apr 4, 2025

Conversation

justin-cechmanek
Collaborator

No description provided.

@rbs333
Collaborator

rbs333 commented Mar 26, 2025

@justin-cechmanek, you should rebase your branch and retarget this PR against 0.5.0.

@tylerhutcherson tylerhutcherson changed the base branch from main to 0.5.0 March 26, 2025 14:10
@tylerhutcherson tylerhutcherson force-pushed the feat/RAAE-227/hybrid-query branch from 7f959f1 to 7e0f24d Compare March 27, 2025 17:52
Collaborator

@tylerhutcherson tylerhutcherson left a comment

Left some comments. Nice start!

justin-cechmanek and others added 15 commits March 27, 2025 13:28
Prior to RedisVL 0.4.0, we validated passed-in Redis clients when the
user called `set_client()`. This PR reintroduces similar behavior by
validating all clients, whether we created them or not, on first access
through the lazy-client mechanism.

Closes RAAE-694.
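
The validate-on-first-access idea can be sketched roughly as follows. This is a minimal illustration, not RedisVL's implementation: `LazyClientHolder`, `FakeClient`, and the `ping()`-based check are hypothetical names and assumptions made only for the example.

```python
class FakeClient:
    """Hypothetical stand-in for a Redis client; counts how often it is pinged."""

    def __init__(self, healthy=True):
        self.healthy = healthy
        self.pings = 0

    def ping(self):
        self.pings += 1
        return self.healthy


class LazyClientHolder:
    """Hypothetical holder that validates its client once, on first access."""

    def __init__(self, client=None, factory=None):
        self._client = client      # a client passed in by the user, if any
        self._factory = factory    # used to create a client when none was passed
        self._validated = False

    def _validate(self, client):
        # Assumption: a successful ping() is enough to call the client valid.
        if not client.ping():
            raise ConnectionError("client failed validation")

    @property
    def client(self):
        if self._client is None:
            if self._factory is None:
                raise ValueError("no client and no factory configured")
            self._client = self._factory()
        # Validate every client, whether it was passed in or created here.
        if not self._validated:
            self._validate(self._client)
            self._validated = True
        return self._client
```

In this sketch, constructing the holder performs no validation; the check runs the first time `.client` is read and is skipped on later reads.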
Add `batch_search` and `batch_query` methods to `SearchIndex` and
`AsyncSearchIndex`. These methods run search commands in a pipeline
and return correctly-parsed and post-processed results.
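
As a rough sketch of what running search commands in a pipeline means, the following groups queries into batches and issues one pipeline round trip per batch. `FakePipeline` and `batch_search_sketch` are hypothetical names for this illustration; the signatures of the real `batch_search` and `batch_query` methods are not shown in this PR description and are not assumed here.

```python
class FakePipeline:
    """In-memory stand-in for a Redis pipeline: queue commands, run them together."""

    def __init__(self, run_one):
        self._run_one = run_one   # callable that answers a single query
        self._queued = []

    def search(self, query):
        self._queued.append(query)
        return self

    def execute(self):
        # A real pipeline sends all queued commands in one round trip.
        results = [self._run_one(q) for q in self._queued]
        self._queued = []
        return results


def batch_search_sketch(queries, make_pipeline, batch_size=10):
    """Run queries in batches, one pipeline execution per batch, preserving order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    queries = list(queries)
    results = []
    for start in range(0, len(queries), batch_size):
        pipe = make_pipeline()
        for query in queries[start:start + batch_size]:
            pipe.search(query)
        results.extend(pipe.execute())
    return results
```

Results come back in the same order as the input queries, so callers can pair them up with `zip(queries, results)`.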
This PR implements a layered architecture for managing and validating
searchable data in Redis, with clear separation of concerns between
schema definition, data validation, and storage operations.

- `IndexSchema` provides the blueprint for data structure and
constraints
- Defines fields with specific types (TEXT, TAG, NUMERIC, GEO, VECTOR)
- Supports different storage types (HASH, JSON) with appropriate
configuration

- `SchemaModelGenerator` dynamically creates Pydantic models from schema
definitions
- Implements a caching mechanism to avoid redundant model generation
- Maps Redis field types to appropriate Python/Pydantic types
- Provides type-specific validators:
    - VECTOR: validates dimensions and value ranges (e.g., INT8 range
    checks)
    - GEO: validates geographic coordinate format
    - NUMERIC: prevents boolean values
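
The type-specific checks and the caching of generated validators can be illustrated with a standard-library sketch. It deliberately does not use Pydantic, and every name in it (`build_validators`, `validate_record`, the field tuples) is hypothetical. The GEO check assumes a `"lon,lat"` string, which is the format Redis expects for GEO fields.

```python
from functools import lru_cache, partial

INT8_MIN, INT8_MAX = -128, 127


def validate_numeric(value):
    # bool is a subclass of int in Python, so it must be rejected explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise ValueError(f"NUMERIC value must be int or float, got {value!r}")
    return value


def validate_geo(value):
    try:
        lon_s, lat_s = value.split(",")
        lon, lat = float(lon_s), float(lat_s)
    except (AttributeError, ValueError):
        raise ValueError(f"GEO value must look like 'lon,lat', got {value!r}") from None
    if not (-180 <= lon <= 180 and -90 <= lat <= 90):
        raise ValueError(f"GEO coordinates out of range: {value!r}")
    return value


def validate_vector(value, dims, datatype):
    if len(value) != dims:
        raise ValueError(f"expected {dims} dimensions, got {len(value)}")
    if datatype == "int8":
        for x in value:
            if isinstance(x, bool) or not isinstance(x, int) or not INT8_MIN <= x <= INT8_MAX:
                raise ValueError(f"INT8 vector value out of range: {x!r}")
    return value


@lru_cache(maxsize=None)
def build_validators(fields):
    """Build validators once per schema.

    `fields` is a tuple of (name, type, dims, datatype) tuples, so it is
    hashable and repeated calls for the same schema hit the cache.
    """
    validators = {}
    for name, field_type, dims, datatype in fields:
        if field_type == "numeric":
            validators[name] = validate_numeric
        elif field_type == "geo":
            validators[name] = validate_geo
        elif field_type == "vector":
            validators[name] = partial(validate_vector, dims=dims, datatype=datatype)
    return validators


def validate_record(record, fields):
    validators = build_validators(fields)
    return {k: validators[k](v) if k in validators else v for k, v in record.items()}
```

A call such as `validate_record({"price": True}, fields)` raises `ValueError` in this sketch, mirroring the "NUMERIC: prevents boolean values" rule.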

- `BaseStorage` is the abstract class that provides the foundation for
Redis operations
- Specialized implementations (HashStorage, JsonStorage) for different
Redis data types
- Enforces schema validation during write operations when set to True
- Implements optimized batch operations using Redis pipelines
- Supports both synchronous and asynchronous interfaces
- Handles key generation, preprocessing, and error handling

The `SearchIndex` has a `validate_on_load` setting, which defaults to
`False`.

On write:
- Objects are preprocessed and validated against the schema
- Objects are prepared with appropriate keys
- Batch writing occurs using Redis pipelines for efficiency
- TTL (expiration) can be applied if specified

On read:
- Keys are fetched in batches using pipelines
- Data is converted from Redis format to Python objects
- Bytes are automatically converted to appropriate types
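
The write and read steps listed above can be sketched end to end against an in-memory stand-in. Everything here is hypothetical (`InMemoryHashStore`, `write`, `fetch`, and their parameters), and the read side only decodes bytes to `str`, which is simpler than the type conversion the commit message describes.

```python
class InMemoryHashStore:
    """Tiny stand-in for Redis hashes; values are kept as bytes, as Redis returns them."""

    def __init__(self):
        self.data = {}
        self.ttls = {}
        self.round_trips = 0   # one per simulated pipeline execution

    def write_batch(self, commands):
        self.round_trips += 1
        for key, mapping, ttl in commands:
            self.data[key] = {k.encode(): str(v).encode() for k, v in mapping.items()}
            if ttl is not None:
                self.ttls[key] = ttl

    def read_batch(self, keys):
        self.round_trips += 1
        return [self.data.get(key, {}) for key in keys]


def chunks(items, size):
    for i in range(0, len(items), size):
        yield items[i:i + size]


def write(store, objects, prefix, id_field, preprocess=None, validate=None,
          ttl=None, batch_size=100):
    """Preprocess, validate, key, and write objects in pipeline-sized batches."""
    keys = []
    for batch in chunks(list(objects), batch_size):
        commands = []
        for obj in batch:
            if preprocess is not None:
                obj = preprocess(obj)
            if validate is not None:
                validate(obj)          # expected to raise on invalid data
            key = f"{prefix}:{obj[id_field]}"
            commands.append((key, obj, ttl))
            keys.append(key)
        store.write_batch(commands)
    return keys


def fetch(store, keys, batch_size=100):
    """Fetch keys in batches and decode the raw bytes into Python strings."""
    rows = []
    for batch in chunks(list(keys), batch_size):
        for raw in store.read_batch(batch):
            rows.append({k.decode(): v.decode() for k, v in raw.items()})
    return rows
```

Note that in this sketch validation runs per batch, so a failure in a later batch leaves earlier batches already written.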
Run API-dependent tests once per matrix run.
This PR accomplishes two goals:
1. Add an option for users to easily get back a similarity value between
0 and 1 that they might expect to compare against other vector dbs.
2. Fix the current bug that `distance_threshold` is validated to be
between 0 and 1 when in reality it can take values between 0 and 2.

> Note: after careful thought, I believe that for `0.5.0` we should
**not** start enforcing all `distance_threshold` values between 0 and 1
or make this option the default behavior. Ideally this metric would be
consistent throughout our code, and I don't love supporting this flag,
but I think it provides the value scoped for this ticket while
inflicting the least pain and confusion.

Changes:

1. Adds the `normalize_vector_distance` flag to VectorQuery and
VectorRangeQuery.

Behavior:
- If set to `True` it normalizes values returned from redis to a value
between 0 and 1.
- For the cosine metric, it applies `(2 - value)/2` to the returned
distance.
- For L2 distance, it applies normalization `(1/(1+value))`.
- For IP, it does nothing and emits a warning, since normalized IP is
cosine by definition.
- For VectorRangeQuery, if `normalize_vector_distance=True`, the distance
threshold is validated to be between 0 and 1 and then denormalized
before execution against the database, so the threshold and the returned
values use the same scale.

2. Relaxes `distance_threshold` validation for semantic caching and
routing to accept values between 0 and 2, fixing the current bug and
aligning with how the database actually functions.
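
The normalization formulas described under "Behavior" can be written out as a small sketch. The two forward formulas come from the description above; the inverse (denormalization) formulas are derived here by solving those formulas for the raw distance and are an assumption about how the threshold is converted, not a copy of the library code. The function names are hypothetical.

```python
import warnings


def normalize_distance(value, metric):
    """Map a raw distance returned by Redis to a score between 0 and 1."""
    metric = metric.lower()
    if metric == "cosine":
        # Cosine distance lies in [0, 2]; 0 maps to 1.0 and 2 maps to 0.0.
        return (2 - value) / 2
    if metric == "l2":
        # L2 distance lies in [0, inf); 0 maps to 1.0, large values approach 0.
        return 1 / (1 + value)
    if metric == "ip":
        warnings.warn("IP is left unchanged: normalized IP is cosine by definition")
        return value
    raise ValueError(f"unknown metric: {metric!r}")


def denormalize_threshold(threshold, metric):
    """Convert a 0-to-1 threshold back to the raw distance scale (derived inverse)."""
    if not 0 <= threshold <= 1:
        raise ValueError("normalized threshold must be between 0 and 1")
    metric = metric.lower()
    if metric == "cosine":
        return 2 - 2 * threshold          # inverse of (2 - d) / 2
    if metric == "l2":
        if threshold == 0:
            raise ValueError("a threshold of 0 has no finite L2 distance")
        return 1 / threshold - 1          # inverse of 1 / (1 + d)
    raise ValueError(f"cannot denormalize for metric: {metric!r}")
```

For example, a normalized threshold of 0.75 under the cosine metric corresponds to a raw distance of 0.5, which falls inside the 0-to-2 range the database accepts.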
@justin-cechmanek justin-cechmanek changed the title adds TextQuery Class adds TextQuery and HybridAggregationQuery Classes Apr 2, 2025
@tylerhutcherson tylerhutcherson added the enhancement New feature or request label Apr 3, 2025
@justin-cechmanek justin-cechmanek marked this pull request as ready for review April 3, 2025 22:03
@tylerhutcherson tylerhutcherson changed the title adds TextQuery and HybridAggregationQuery Classes Add support for full text queries and hybrid search queriess Apr 3, 2025
@justin-cechmanek justin-cechmanek changed the title Add support for full text queries and hybrid search queriess Add support for full text queries and hybrid search queries Apr 3, 2025
Collaborator

@abrookins abrookins left a comment

Looks good! Couple of small comments, but 👍

@tylerhutcherson tylerhutcherson merged commit e98a7ae into 0.5.0 Apr 4, 2025
31 checks passed
@tylerhutcherson tylerhutcherson deleted the feat/RAAE-227/hybrid-query branch April 4, 2025 15:41
abrookins added a commit that referenced this pull request Apr 4, 2025
Co-authored-by: Robert Shelton <[email protected]>
Co-authored-by: Andrew Brookins <[email protected]>
Co-authored-by: Tyler Hutcherson <[email protected]>
Co-authored-by: Robert Shelton <[email protected]>
4 participants