Use a BERT question-answering model to query the information available on a webpage in natural language. Uses MobileBERT (https://arxiv.org/abs/2004.02984) fine-tuned on SQuAD (https://rajpurkar.github.io/SQuAD-explorer/) via TensorFlowJS to search for and return relevant answers within the page text.
TODO: Image.
This extension is an experiment. Deep learning models like BERT are powerful but may return unpredictable and/or biased results that are tough to interpret. Please apply best judgement when analyzing search results.
Traditional search uses string matching to find information within a webpage. Although most of us have trained ourselves to find what we're looking for via string match, the matched string is sometimes only a proxy for the information we're actually trying to discover.
In our example above, imagine you're browsing the Stripe documentation page on testing (https://stripe.com/docs/testing), aiming to understand the difference between test mode and live mode. With string matching, you might search for relevant phrases such as "live mode", "test mode", or "difference" and scan through the various results. With semantic search, you can phrase your question directly: "What is the difference between live mode and test mode?". The model returns a relevant result, even though the page does not contain the term "difference".
Every time a user executes a search:
- The content script collects all `<p>`, `<ul>`, and `<ol>` elements on the page and extracts text from each.
- The background script executes the question-answering model on every element, using the query as the question and the element's text as the context.
- If a match is returned by the model, it is highlighted within the page along with the confidence score returned by the model.
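
The three steps above can be sketched roughly as follows. This is an illustration, not the extension's actual code: `collectContexts`, `searchElements`, `splitForHighlight`, and `makeStubModel` are hypothetical names, the `findAnswers` callback stands in for the TensorFlowJS model, and the answer shape (`text`, `startIndex`, exclusive `endIndex`, `score`) is an assumption. Using `textContent` to extract text and sorting by score are also assumptions.

```javascript
// Step 1 (browser only): collect the text of every <p>, <ul>, and <ol>.
// Assumption: textContent is a good enough way to "extract text".
function collectContexts(root) {
  return Array.from(root.querySelectorAll("p, ul, ol"), (el) =>
    el.textContent.trim()
  );
}

// Step 2: run the model on every element's text, using the query as the
// question and the element's text as the context.
// Assumed shape of findAnswers:
//   async (question, context) => [{ text, startIndex, endIndex, score }]
async function searchElements(question, contexts, findAnswers) {
  const matches = [];
  for (const [elementIndex, context] of contexts.entries()) {
    const answers = await findAnswers(question, context);
    for (const answer of answers) {
      matches.push({ elementIndex, context, ...answer });
    }
  }
  // Highest-confidence matches first (an assumed ordering).
  return matches.sort((a, b) => b.score - a.score);
}

// Step 3: split a context around the answer span so the caller can wrap the
// middle part in a highlight and show the score next to it.
// endIndex is treated as exclusive in this sketch.
function splitForHighlight(match) {
  const { context, startIndex, endIndex, score } = match;
  return {
    before: context.slice(0, startIndex),
    highlighted: context.slice(startIndex, endIndex),
    after: context.slice(endIndex),
    score,
  };
}

// Stub model for experimenting without TensorFlowJS: it "answers" with a
// fixed phrase whenever the context contains that phrase.
function makeStubModel(phrase, score) {
  return async (_question, context) => {
    const startIndex = context.indexOf(phrase);
    if (startIndex === -1) return [];
    return [
      { text: phrase, startIndex, endIndex: startIndex + phrase.length, score },
    ];
  };
}
```

In a content script, `collectContexts(document)` would supply the `contexts` array. Passing the model in as a callback keeps the search loop independent of how the model is loaded, which also makes it easy to exercise with a stub.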
There are three main components that interact via Message Passing to orchestrate the extension:
- Popup (`popup.js`): React application that renders the search bar and controls searching and iterating through the results.
- Content Script (`content.js`): Runs in the context of the current tab, responsible for reading from and manipulating the DOM.
- Background (`background.js`): Background script that loads and executes the TensorFlowJS model on question-context pairs.
`src/js/message_types.js` contains the messages used to interact between these three components.
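
As a rough illustration of how typed messages can tie the components together, here is a minimal sketch. The message names, the `payload` field, and the `createListener` helper are hypothetical and are not taken from `src/js/message_types.js`; the direction of each message is likewise an assumption. Only `chrome.runtime.onMessage.addListener` is the real Chrome extension API, and it is guarded so the snippet also runs outside a browser.

```javascript
// Hypothetical message names; the real ones live in src/js/message_types.js.
const MessageType = Object.freeze({
  QUERY: "QUERY", // assumed: popup -> content script, user submitted a question
  RUN_MODEL: "RUN_MODEL", // assumed: content script -> background
  QUERY_RESULT: "QUERY_RESULT", // assumed: background -> content script
});

// Build one listener that dispatches on message.type.
function createListener(handlers) {
  return (message, sender, sendResponse) => {
    const known =
      message != null &&
      Object.prototype.hasOwnProperty.call(handlers, message.type);
    if (!known) return false; // not ours; leave it to other listeners
    handlers[message.type](message.payload, sender, sendResponse);
    return true; // keep the channel open in case the handler replies later
  };
}

// Example background-side listener. A real handler would run the model;
// this one only reports how many contexts it received.
const backgroundListener = createListener({
  [MessageType.RUN_MODEL]: (payload, _sender, sendResponse) => {
    sendResponse({
      type: MessageType.QUERY_RESULT,
      payload: { received: payload.contexts.length },
    });
  },
});

// Inside an extension, register the listener with the Chrome API.
if (
  typeof chrome !== "undefined" &&
  chrome.runtime &&
  chrome.runtime.onMessage
) {
  chrome.runtime.onMessage.addListener(backgroundListener);
}
```

Keeping the type constants in one shared module means the popup, content script, and background script all agree on the same strings, so a typo surfaces as an unknown message rather than a silent mismatch.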