forked from apache/airflow
Commit
Add Weaviate Provider (apache#35060)
* Add Weaviate Provider
* Fix docs and static checks
* Remove callable interface params from the operator
* Resolve conflicts
* Fix docs
* Resolve conflicts
* Update airflow/providers/weaviate/hooks/weaviate.py
  Co-authored-by: Pankaj Singh <[email protected]>
* Update airflow/providers/weaviate/operators/weaviate.py
  Co-authored-by: Pankaj Singh <[email protected]>
* Update airflow/providers/weaviate/operators/weaviate.py
  Co-authored-by: Pankaj Singh <[email protected]>
* Add security.rst to docs
* Resolve conflicts
* Address PR Comments
---------
Co-authored-by: Pankaj Singh <[email protected]>
1 parent 64c2eea commit 4fe87ea
Showing 35 changed files with 1,181 additions and 21 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -108,6 +108,7 @@ body: | |
- telegram | ||
- trino | ||
- vertica | ||
- weaviate | ||
- yandex | ||
- zendesk | ||
validations: | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,26 @@ | ||
.. Licensed to the Apache Software Foundation (ASF) under one | ||
or more contributor license agreements. See the NOTICE file | ||
distributed with this work for additional information | ||
regarding copyright ownership. The ASF licenses this file | ||
to you under the Apache License, Version 2.0 (the | ||
"License"); you may not use this file except in compliance | ||
with the License. You may obtain a copy of the License at | ||
.. http://www.apache.org/licenses/LICENSE-2.0 | ||
.. Unless required by applicable law or agreed to in writing, | ||
software distributed under the License is distributed on an | ||
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
KIND, either express or implied. See the License for the | ||
specific language governing permissions and limitations | ||
under the License. | ||
``apache-airflow-providers-weaviate`` | ||
|
||
Changelog | ||
--------- | ||
|
||
1.0.0 | ||
..... | ||
|
||
Initial version of the provider. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,24 @@ | ||
# | ||
# Licensed to the Apache Software Foundation (ASF) under one | ||
# or more contributor license agreements. See the NOTICE file | ||
# distributed with this work for additional information | ||
# regarding copyright ownership. The ASF licenses this file | ||
# to you under the Apache License, Version 2.0 (the | ||
# "License"); you may not use this file except in compliance | ||
# with the License. You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, | ||
# software distributed under the License is distributed on an | ||
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
# KIND, either express or implied. See the License for the | ||
# specific language governing permissions and limitations | ||
# under the License. | ||
# | ||
# NOTE! THIS FILE IS AUTOMATICALLY GENERATED AND WILL BE | ||
# OVERWRITTEN WHEN PREPARING DOCUMENTATION FOR THE PACKAGES. | ||
# | ||
# IF YOU WANT TO MODIFY IT, YOU SHOULD MODIFY THE TEMPLATE | ||
# `PROVIDER__INIT__PY_TEMPLATE.py.jinja2` IN the `dev/provider_packages` DIRECTORY | ||
# |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,24 @@ | ||
# | ||
# Licensed to the Apache Software Foundation (ASF) under one | ||
# or more contributor license agreements. See the NOTICE file | ||
# distributed with this work for additional information | ||
# regarding copyright ownership. The ASF licenses this file | ||
# to you under the Apache License, Version 2.0 (the | ||
# "License"); you may not use this file except in compliance | ||
# with the License. You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, | ||
# software distributed under the License is distributed on an | ||
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
# KIND, either express or implied. See the License for the | ||
# specific language governing permissions and limitations | ||
# under the License. | ||
# | ||
# NOTE! THIS FILE IS AUTOMATICALLY GENERATED AND WILL BE | ||
# OVERWRITTEN WHEN PREPARING DOCUMENTATION FOR THE PACKAGES. | ||
# | ||
# IF YOU WANT TO MODIFY IT, YOU SHOULD MODIFY THE TEMPLATE | ||
# `PROVIDER__INIT__PY_TEMPLATE.py.jinja2` IN the `dev/provider_packages` DIRECTORY | ||
# |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,177 @@ | ||
# Licensed to the Apache Software Foundation (ASF) under one | ||
# or more contributor license agreements. See the NOTICE file | ||
# distributed with this work for additional information | ||
# regarding copyright ownership. The ASF licenses this file | ||
# to you under the Apache License, Version 2.0 (the | ||
# "License"); you may not use this file except in compliance | ||
# with the License. You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, | ||
# software distributed under the License is distributed on an | ||
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
# KIND, either express or implied. See the License for the | ||
# specific language governing permissions and limitations | ||
# under the License. | ||
|
||
from __future__ import annotations | ||
|
||
from typing import Any | ||
|
||
import weaviate | ||
|
||
from airflow.hooks.base import BaseHook | ||
|
||
|
||
class WeaviateHook(BaseHook): | ||
""" | ||
Interact with a Weaviate database to store vectors. This hook uses the ``conn_id``. | ||
:param conn_id: The :ref:`connection id <howto/connection:weaviate>` to use when connecting to Weaviate. | ||
""" | ||
|
||
conn_name_attr = "conn_id" | ||
default_conn_name = "weaviate_default" | ||
conn_type = "weaviate" | ||
hook_name = "Weaviate" | ||
|
||
def __init__(self, conn_id: str = default_conn_name, *args: Any, **kwargs: Any) -> None: | ||
super().__init__(*args, **kwargs) | ||
self.conn_id = conn_id | ||
|
||
@staticmethod | ||
def get_connection_form_widgets() -> dict[str, Any]: | ||
"""Returns connection widgets to add to connection form.""" | ||
from flask_appbuilder.fieldwidgets import BS3PasswordFieldWidget | ||
from flask_babel import lazy_gettext | ||
from wtforms import PasswordField | ||
|
||
return { | ||
"token": PasswordField(lazy_gettext("Weaviate API Token"), widget=BS3PasswordFieldWidget()), | ||
} | ||
|
||
@staticmethod | ||
def get_ui_field_behaviour() -> dict[str, Any]: | ||
"""Returns custom field behaviour.""" | ||
return { | ||
"hidden_fields": ["port", "schema"], | ||
"relabeling": { | ||
"login": "OIDC Username", | ||
"password": "OIDC Password", | ||
}, | ||
} | ||
|
||
def get_client(self) -> weaviate.Client: | ||
conn = self.get_connection(self.conn_id) | ||
url = conn.host | ||
username = conn.login or "" | ||
password = conn.password or "" | ||
extras = conn.extra_dejson | ||
token = extras.pop("token", "") | ||
additional_headers = extras.pop("additional_headers", {}) | ||
scope = conn.extra_dejson.get("oidc_scope", "offline_access") | ||
|
||
if token == "" and username != "": | ||
auth_client_secret = weaviate.AuthClientPassword( | ||
username=username, password=password, scope=scope | ||
) | ||
else: | ||
auth_client_secret = weaviate.AuthApiKey(token) | ||
|
||
client = weaviate.Client( | ||
url=url, auth_client_secret=auth_client_secret, additional_headers=additional_headers | ||
) | ||
|
||
return client | ||
|
||
def test_connection(self) -> tuple[bool, str]: | ||
try: | ||
client = self.get_client() | ||
client.schema.get() | ||
return True, "Connection established!" | ||
except Exception as e: | ||
self.log.error("Error testing Weaviate connection: %s", e) | ||
return False, str(e) | ||
|
||
def create_class(self, class_json: dict[str, Any]) -> None: | ||
"""Create a new class.""" | ||
client = self.get_client() | ||
client.schema.create_class(class_json) | ||
|
||
def create_schema(self, schema_json: dict[str, Any]) -> None: | ||
""" | ||
Create a new Schema. | ||
Instead of adding classes one by one, you can upload a full schema in JSON format at once. | ||
:param schema_json: The schema to create | ||
""" | ||
client = self.get_client() | ||
client.schema.create(schema_json) | ||
|
||
def batch_data( | ||
self, class_name: str, data: list[dict[str, Any]], batch_config_params: dict[str, Any] | None = None | ||
) -> None: | ||
client = self.get_client() | ||
if not batch_config_params: | ||
batch_config_params = {} | ||
client.batch.configure(**batch_config_params) | ||
with client.batch as batch: | ||
# Batch import all data | ||
for index, data_obj in enumerate(data): | ||
self.log.debug("importing data: %s", index + 1) | ||
vector = data_obj.pop("Vector", None) | ||
if vector is not None: | ||
batch.add_data_object(data_obj, class_name, vector=vector) | ||
else: | ||
batch.add_data_object(data_obj, class_name) | ||
|
||
def delete_class(self, class_name: str) -> None: | ||
"""Delete an existing class.""" | ||
client = self.get_client() | ||
client.schema.delete_class(class_name) | ||
|
||
def query_with_vector( | ||
self, | ||
embeddings: list[float], | ||
class_name: str, | ||
*properties: list[str], | ||
certainty: float = 0.7, | ||
limit: int = 1, | ||
) -> dict[str, dict[Any, Any]]: | ||
""" | ||
Query the Weaviate database with a near-vector search. | ||
This method runs a vector search through a Get query. It uses ``with_near_vector`` to pass | ||
the query vector itself to Weaviate, which is needed when querying a Weaviate class that uses | ||
a custom, external vectorizer. Weaviate uses the supplied vector directly as the basis for the | ||
vector search, so the query text is not sent through an inference API. | ||
""" | ||
client = self.get_client() | ||
results: dict[str, dict[Any, Any]] = ( | ||
client.query.get(class_name, properties[0]) | ||
.with_near_vector({"vector": embeddings, "certainty": certainty}) | ||
.with_limit(limit) | ||
.do() | ||
) | ||
return results | ||
|
||
def query_without_vector( | ||
self, search_text: str, class_name: str, *properties: list[str], limit: int = 1 | ||
) -> dict[str, dict[Any, Any]]: | ||
""" | ||
Query using near text. | ||
This method runs a vector search through a Get query. It uses the ``nearText`` operator to pass | ||
the ``search_text`` query to Weaviate. Weaviate then converts the text into a vector through the | ||
inference API (OpenAI in this particular example) and uses that vector as the basis for the search. | ||
""" | ||
client = self.get_client() | ||
results: dict[str, dict[Any, Any]] = ( | ||
client.query.get(class_name, properties[0]) | ||
.with_near_text({"concepts": [search_text]}) | ||
.with_limit(limit) | ||
.do() | ||
) | ||
return results |
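Review note on `get_client` above: the branch that picks the authentication object can be read in isolation. The following is a minimal sketch of that branch only, not the provider's code. `choose_auth` is a hypothetical helper, and the returned dicts are stand-ins for `weaviate.AuthClientPassword` and `weaviate.AuthApiKey` so that the sketch runs without the `weaviate` package or an Airflow connection.

```python
from __future__ import annotations

from typing import Any


def choose_auth(login: str | None, password: str | None, extras: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper mirroring the auth branch in WeaviateHook.get_client."""
    token = extras.get("token", "")
    # The hook defaults the OIDC scope to "offline_access" when the extra is absent.
    scope = extras.get("oidc_scope", "offline_access")
    username = login or ""
    if token == "" and username != "":
        # Stand-in for weaviate.AuthClientPassword(username=..., password=..., scope=...)
        return {"kind": "password", "username": username, "password": password or "", "scope": scope}
    # Stand-in for weaviate.AuthApiKey(token); also reached when nothing is configured.
    return {"kind": "api_key", "api_key": token}


password_auth = choose_auth("alice", "s3cret", {})
api_key_auth = choose_auth("alice", "s3cret", {"token": "abc"})
fallthrough = choose_auth(None, None, {})
```

Two things follow from the branch as written: a token takes precedence over a login when both are set, and a connection with neither a token nor a login still ends up on the API-key path with an empty key.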
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,24 @@ | ||
# | ||
# Licensed to the Apache Software Foundation (ASF) under one | ||
# or more contributor license agreements. See the NOTICE file | ||
# distributed with this work for additional information | ||
# regarding copyright ownership. The ASF licenses this file | ||
# to you under the Apache License, Version 2.0 (the | ||
# "License"); you may not use this file except in compliance | ||
# with the License. You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, | ||
# software distributed under the License is distributed on an | ||
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
# KIND, either express or implied. See the License for the | ||
# specific language governing permissions and limitations | ||
# under the License. | ||
# | ||
# NOTE! THIS FILE IS AUTOMATICALLY GENERATED AND WILL BE | ||
# OVERWRITTEN WHEN PREPARING DOCUMENTATION FOR THE PACKAGES. | ||
# | ||
# IF YOU WANT TO MODIFY IT, YOU SHOULD MODIFY THE TEMPLATE | ||
# `PROVIDER__INIT__PY_TEMPLATE.py.jinja2` IN the `dev/provider_packages` DIRECTORY | ||
# |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,71 @@ | ||
# Licensed to the Apache Software Foundation (ASF) under one | ||
# or more contributor license agreements. See the NOTICE file | ||
# distributed with this work for additional information | ||
# regarding copyright ownership. The ASF licenses this file | ||
# to you under the Apache License, Version 2.0 (the | ||
# "License"); you may not use this file except in compliance | ||
# with the License. You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, | ||
# software distributed under the License is distributed on an | ||
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
# KIND, either express or implied. See the License for the | ||
# specific language governing permissions and limitations | ||
# under the License. | ||
|
||
from __future__ import annotations | ||
|
||
from functools import cached_property | ||
from typing import TYPE_CHECKING, Any, Sequence | ||
|
||
from airflow.models import BaseOperator | ||
from airflow.providers.weaviate.hooks.weaviate import WeaviateHook | ||
|
||
if TYPE_CHECKING: | ||
from airflow.utils.context import Context | ||
|
||
|
||
class WeaviateIngestOperator(BaseOperator): | ||
""" | ||
Operator that stores vectors in the Weaviate class. | ||
.. seealso:: | ||
For more information on how to use this operator, take a look at the guide: | ||
:ref:`howto/operator:WeaviateIngestOperator` | ||
The operator accepts input JSON, either to generate embeddings on or carrying custom vectors, | ||
and stores the resulting objects in the Weaviate class. | ||
:param conn_id: The Weaviate connection. | ||
:param class_name: The Weaviate class in which to store the data objects. | ||
:param input_json: The JSON representing Weaviate data objects to generate embeddings on (or that | ||
provides custom vectors) and to store in the Weaviate class. | ||
""" | ||
|
||
template_fields: Sequence[str] = ("input_json",) | ||
|
||
def __init__( | ||
self, | ||
conn_id: str, | ||
class_name: str, | ||
input_json: list[dict[str, Any]], | ||
**kwargs: Any, | ||
) -> None: | ||
self.batch_params = kwargs.pop("batch_params", {}) | ||
self.hook_params = kwargs.pop("hook_params", {}) | ||
super().__init__(**kwargs) | ||
self.class_name = class_name | ||
self.conn_id = conn_id | ||
self.input_json = input_json | ||
|
||
@cached_property | ||
def hook(self) -> WeaviateHook: | ||
"""Return an instance of the WeaviateHook.""" | ||
return WeaviateHook(conn_id=self.conn_id, **self.hook_params) | ||
|
||
def execute(self, context: Context) -> None: | ||
self.log.debug("Input json: %s", self.input_json) | ||
self.hook.batch_data(self.class_name, self.input_json, **self.batch_params) |
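Review note on `execute` above: it passes `input_json` to `WeaviateHook.batch_data` and spreads `batch_params` as keyword arguments, so with the `batch_data` signature in this commit the only key `batch_params` can carry is `batch_config_params`. The sketch below traces that flow under stated assumptions: `RecordingBatch` is a hypothetical stand-in for `client.batch` that records calls instead of sending them, and `batch_data` here is a free function with the batch object passed in, not the hook method itself.

```python
from __future__ import annotations

from typing import Any


class RecordingBatch:
    """Hypothetical stand-in for ``client.batch``; records calls instead of sending them."""

    def __init__(self) -> None:
        self.config = {}
        self.calls = []

    def configure(self, **params: Any) -> None:
        self.config = params

    def add_data_object(self, data_object: dict[str, Any], class_name: str, vector: Any = None) -> None:
        self.calls.append((data_object, class_name, vector))


def batch_data(
    batch: RecordingBatch,
    class_name: str,
    data: list[dict[str, Any]],
    batch_config_params: dict[str, Any] | None = None,
) -> None:
    """Same control flow as WeaviateHook.batch_data, minus the real client."""
    batch.configure(**(batch_config_params or {}))
    for data_obj in data:
        # "Vector" is popped from the object and sent separately when present.
        vector = data_obj.pop("Vector", None)
        if vector is not None:
            batch.add_data_object(data_obj, class_name, vector=vector)
        else:
            batch.add_data_object(data_obj, class_name)


# What the operator would forward: batch_params is spread into batch_data.
batch_params = {"batch_config_params": {"batch_size": 50}}
rows = [{"question": "q1", "Vector": [0.1, 0.2]}, {"question": "q2"}]
batch = RecordingBatch()
batch_data(batch, "Question", rows, **batch_params)
```

Because `pop` removes the `"Vector"` key in place, the dictionaries in `input_json` are modified by the call; a caller that reuses the same list afterwards would no longer find the vectors in it.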