Search Google Drive documents and retrieve contents #265
base: main
…part of the document content, not its name)
… 'and' operators together
…rsions of arcade-ai
Codecov Report: All modified and coverable lines are covered by tests ✅
@byrro I'm unable to run this locally. It looks like an issue with all of the double and single quotes in the …
…kdown, HTML, and JSON format options
…terface abstracting away args that are specific to the Google API
@EricGustin pushed a new implementation of the tool, refactored …
    document_contains: Optional[list[str]] = None,
    document_not_contains: Optional[list[str]] = None,
) -> str:
    query = ["mimeType = 'application/vnd.google-apps.document' and trashed = false"]
Looking ahead, we will need to search for more file types beyond documents, for example searching for a spreadsheet by name. Perhaps the MIME type could be a parameter so that we don't take on that debt in the future.
makes sense, just pushed a new version with mime_type as an argument to build the files_list query/params.
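To illustrate why a MIME-type parameter removes the hard-coded "document" assumption, here is a minimal sketch of a parameterized query builder. The names build_base_query and build_files_list_params, the page size, and the fields mask are hypothetical and not taken from the PR; only the mimeType/trashed filter mirrors the diff above.

```python
# Hypothetical helpers: the PR's actual function names and signatures may differ.
GOOGLE_DOC = "application/vnd.google-apps.document"
GOOGLE_SHEET = "application/vnd.google-apps.spreadsheet"


def build_base_query(mime_type: str = GOOGLE_DOC) -> str:
    """Return the files.list `q` filter shared by every search.

    Taking the MIME type as a parameter lets the same helper search
    spreadsheets (or any other Drive file type), not only Docs.
    """
    return f"mimeType = '{mime_type}' and trashed = false"


def build_files_list_params(mime_type: str = GOOGLE_DOC, page_size: int = 10) -> dict:
    """Assemble keyword arguments for a Drive v3 files.list request."""
    return {
        "q": build_base_query(mime_type),
        "pageSize": page_size,
        # Assumed fields mask: only request what the tool needs to return.
        "fields": "nextPageToken, files(id, name)",
    }


if __name__ == "__main__":
    print(build_files_list_params(GOOGLE_SHEET)["q"])
```

With google-api-python-client, the resulting dict could be passed as `service.files().list(**params).execute()`, assuming `service` is an authorized Drive v3 client.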
thank you!
name_contains = keyword.replace("'", "\\'")
full_text_contains = keyword.replace("'", "\\'")
keyword_query = (
    f"name contains '{name_contains}' or fullText contains '{full_text_contains}'"
we need to group the left and right sides of the "or" inside parentheses; otherwise Google interprets the query in a way that we don't intend. See Slack DM for more details.
good catch! just pushed a fix
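A minimal sketch of what a grouped keyword clause could look like, assuming the Drive query syntax accepts `\'` and `\\` escapes inside single-quoted values and `not name contains` / `not fullText contains` terms. The helper names are hypothetical, and this is not the fix actually pushed in the PR. Exclusions are written in De Morgan form so that "not in the name or the content" stays a single grouped condition.

```python
from typing import Optional


def escape_query_value(value: str) -> str:
    """Escape a value for use inside single quotes in a Drive query.

    Backslashes are escaped first so the backslashes added for quotes
    are not doubled afterwards.
    """
    return value.replace("\\", "\\\\").replace("'", "\\'")


def contains_clause(keyword: str) -> str:
    """Match the keyword in the file name OR its content, as one group."""
    value = escape_query_value(keyword)
    # The parentheses keep the `or` scoped to this pair of conditions
    # when the clause is joined with the surrounding `and` filters.
    return f"(name contains '{value}' or fullText contains '{value}')"


def not_contains_clause(keyword: str) -> str:
    """Exclude files whose name OR content matches (De Morgan form)."""
    value = escape_query_value(keyword)
    return f"(not name contains '{value}' and not fullText contains '{value}')"


def build_query(
    mime_type: str,
    document_contains: Optional[list[str]] = None,
    document_not_contains: Optional[list[str]] = None,
) -> str:
    """Join the base filters and the grouped keyword clauses with `and`."""
    clauses = [f"mimeType = '{mime_type}'", "trashed = false"]
    clauses += [contains_clause(k) for k in document_contains or []]
    clauses += [not_contains_clause(k) for k in document_not_contains or []]
    return " and ".join(clauses)
```

Escaping backslashes as well as quotes is a small extension over the diff above, which only escapes single quotes.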
Looks great!
This tool will be useful in RAG-like scenarios, where someone wants to ask questions about, or request a summary of, a set of documents related to a particular topic. Currently, to fulfill such requests, the LLM must first call list_documents and then call get_document_by_id for each document. We also implement utility functions that return documents in Markdown and HTML, since the Drive API JSON is verbose and would waste tokens unnecessarily.
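A minimal sketch of a JSON-to-Markdown utility of this kind, assuming the input is a Google Docs API document resource (body.content[].paragraph.elements[].textRun, with paragraphStyle.namedStyleType for headings). The function names are hypothetical and not the PR's implementation. Like the utilities described here, it ignores tables of contents, headers, footers, and footnotes; it also skips tables and lists.

```python
# Map Docs API named paragraph styles to Markdown heading prefixes.
HEADING_PREFIX = {
    "TITLE": "# ",
    "HEADING_1": "# ",
    "HEADING_2": "## ",
    "HEADING_3": "### ",
    "HEADING_4": "#### ",
    "HEADING_5": "##### ",
    "HEADING_6": "###### ",
}


def text_run_to_markdown(text_run: dict) -> str:
    """Wrap a textRun in bold/italic markers, keeping surrounding whitespace."""
    content = text_run.get("content", "")
    style = text_run.get("textStyle", {})
    core = content.strip()
    if not core:
        return content
    if style.get("bold"):
        core = f"**{core}**"
    if style.get("italic"):
        core = f"*{core}*"
    leading = content[: len(content) - len(content.lstrip())]
    trailing = content[len(content.rstrip()):]
    return f"{leading}{core}{trailing}"


def paragraph_to_markdown(paragraph: dict) -> str:
    """Render one paragraph; returns "" for paragraphs with no visible text."""
    text = "".join(
        text_run_to_markdown(element["textRun"])
        for element in paragraph.get("elements", [])
        if "textRun" in element
    )
    if not text.strip():
        return ""
    style = paragraph.get("paragraphStyle", {}).get("namedStyleType", "NORMAL_TEXT")
    return HEADING_PREFIX.get(style, "") + text.rstrip("\n")


def document_to_markdown(document: dict) -> str:
    """Convert the body of a Docs API document to Markdown paragraphs."""
    blocks = [
        paragraph_to_markdown(element["paragraph"])
        for element in document.get("body", {}).get("content", [])
        if "paragraph" in element  # sectionBreak, table, etc. are skipped
    ]
    return "\n\n".join(block for block in blocks if block)
```

Emitting only text, heading markers, and emphasis is what keeps the output far smaller than the raw JSON, which carries style and index metadata for every run.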
Limitations: the Markdown/HTML utilities do not handle tables of contents (which I don't think are very useful here), headers, footers, or footnotes.
This PR deprecates list_documents and implements search_documents, separately from search_and_retrieve_documents. This configuration makes it easier for LLMs to understand when to call each tool. Both tools had their interfaces refactored to remove Google API-specific arguments that sometimes confused LLMs, such as "corpora" and "support_all_drives". They now accept arguments that relate more closely to expected user requests.
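One way to hide Google API-specific arguments is to expose a couple of intuitive options and translate them into Drive v3 files.list parameters internally. The sketch below assumes two hypothetical tool arguments, shared_drive_id and include_shared_drives; the PR's actual argument names may differ. The output keys (corpora, driveId, includeItemsFromAllDrives, supportsAllDrives) are real files.list parameters.

```python
from typing import Optional


def drive_scope_params(
    shared_drive_id: Optional[str] = None,
    include_shared_drives: bool = False,
) -> dict:
    """Translate user-level search scope into Drive v3 files.list parameters.

    Hypothetical helper: the LLM only sees the two arguments above, while
    Google-specific flags such as `corpora` stay internal.
    """
    if shared_drive_id:
        # Search a single shared drive; `corpora="drive"` requires `driveId`.
        return {
            "corpora": "drive",
            "driveId": shared_drive_id,
            "includeItemsFromAllDrives": True,
            "supportsAllDrives": True,
        }
    if include_shared_drives:
        # Search the user's files plus every shared drive they can access.
        return {
            "corpora": "allDrives",
            "includeItemsFromAllDrives": True,
            "supportsAllDrives": True,
        }
    # Default: only the user's own corpus.
    return {"corpora": "user"}
```

The resulting dict could be merged with the query parameters before calling files.list, so the LLM never needs to reason about corpora at all.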