Adding weights and improving search ranking #41

dylanpivo · 2025-02-13T09:17:33Z

Add weights to title, keyword and abstract.
- Curation requested that the weighting order be: keyword, title, then abstract.
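As a rough illustration of the requested ordering, the sketch below builds a weighted `tsvector` expression using PostgreSQL's `setweight` and `to_tsvector`. The column names (`keywords`, `title`, `abstract`), the `english` configuration and the helper name are assumptions for illustration only, not taken from this PR's code.

```python
# Hypothetical column names; the weight labels A-D are PostgreSQL's own.
WEIGHTED_FIELDS = [
    ("keywords", "A"),  # highest priority, per the curation request
    ("title", "B"),
    ("abstract", "C"),
]


def weighted_tsvector_sql(fields=WEIGHTED_FIELDS, config="english"):
    """Build a SQL expression that concatenates one weighted tsvector per field."""
    parts = [
        f"setweight(to_tsvector('{config}', coalesce({column}, '')), '{label}')"
        for column, label in fields
    ]
    return " || ".join(parts)


print(weighted_tsvector_sql())
```

Keeping the field-to-weight mapping in one list means a later change to the curation order only touches that list.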

Fix and improve ranking at search:
- During search, the query was treating 'full_text' and 'query' as values rather than as columns.
- Normalisation: the current normalisation is set to 4 | 1, but this will be investigated further.

Fixes #9

dylanpivo · 2025-02-13T12:00:25Z

Normalization:

The list below, taken from the PostgreSQL text search documentation, outlines the normalization options.

0 (the default) ignores the document length
1 divides the rank by 1 + the logarithm of the document length
2 divides the rank by the document length
4 divides the rank by the mean harmonic distance between extents (this is implemented only by ts_rank_cd)
8 divides the rank by the number of unique words in document
16 divides the rank by 1 + the logarithm of the number of unique words in document
32 divides the rank by itself + 1

4 | 1 is currently in use.

4: ranks a record higher when the words in the search term occur closer together in the record, for instance when "climate" and "change" occur right after each other rather than at opposite ends of the document.

1: ranks a record lower when the document is longer. Taking the logarithm softens the penalty, so it grows slowly with document length.
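To make the flag arithmetic concrete, here is a simplified sketch of how the options could be combined as a bitmask. It follows the descriptions in the list above rather than PostgreSQL's actual implementation; the function name, the use of the natural logarithm, and passing the mean harmonic distance in as a parameter are all assumptions.

```python
import math

# Flag values as listed in the normalization options above.
NORM_LOG_LENGTH = 1
NORM_LENGTH = 2
NORM_EXTENT_DISTANCE = 4
NORM_UNIQUE = 8
NORM_LOG_UNIQUE = 16
NORM_SELF = 32


def normalize(rank, flags, doc_length, unique_words, mean_harmonic_distance=1.0):
    """Apply each normalization whose bit is set in `flags` (illustrative only)."""
    if (flags & NORM_LOG_LENGTH) and doc_length > 0:
        rank /= 1 + math.log(doc_length)
    if (flags & NORM_LENGTH) and doc_length > 0:
        rank /= doc_length
    if (flags & NORM_EXTENT_DISTANCE) and mean_harmonic_distance > 0:
        rank /= mean_harmonic_distance
    if (flags & NORM_UNIQUE) and unique_words > 0:
        rank /= unique_words
    if (flags & NORM_LOG_UNIQUE) and unique_words > 0:
        rank /= 1 + math.log(unique_words)
    if flags & NORM_SELF:
        rank /= rank + 1
    return rank


flags = NORM_EXTENT_DISTANCE | NORM_LOG_LENGTH  # the "4 | 1" setting, i.e. 5
short_doc = normalize(1.0, flags, doc_length=50, unique_words=40)
long_doc = normalize(1.0, flags, doc_length=200, unique_words=40)
print(short_doc > long_doc)  # the longer document is penalized more
```

With the same raw rank, the 200-word document ends up below the 50-word one, but by far less than the factor of four that option 2 would apply.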

dylanpivo · 2025-02-17T13:22:33Z

Testing:
The testing involves mocking up a list of different metadata records, publishing them and then searching. The metadata will be put together using only Lorem Ipsum mock data, arranged so as to cover the different ranking circumstances.

The list of options from which combinations will be generated is as follows:

The search term will be fixed.

Search terms in title with no harmonic distance.
Search terms in title with harmonic distance.
No search term in title.

Short length abstract. (50 words)
Long abstract. (200 words)

No/Low harmonic distance of search terms in abstract.
High harmonic distance between search terms in abstract.

Many instances of search terms in abstract. (6 instances)
Few instances of search terms in abstract. (3 instances)
Note: the number of instances does not increase if the abstract length increases. This is so that lengthening the abstract affects the ranking in isolation.

Keywords with all search terms.
Keywords with no search terms.
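The combinations described above could be enumerated along these lines. This is only a sketch: the dictionary keys and option labels are hypothetical names chosen for illustration, and the actual mock records would still need to be built from each combination.

```python
from itertools import product

# One entry per group of options listed above; labels are illustrative.
OPTIONS = {
    "title": [
        "terms, no harmonic distance",
        "terms, with harmonic distance",
        "no terms",
    ],
    "abstract_length": [50, 200],        # words
    "abstract_distance": ["low", "high"],
    "abstract_instances": [6, 3],        # occurrences of the search terms
    "keywords": ["all terms", "no terms"],
}


def generate_cases(options=OPTIONS):
    """Return one dict per combination of the option values."""
    names = list(options)
    return [dict(zip(names, combo)) for combo in product(*options.values())]


cases = generate_cases()
print(len(cases))  # 3 * 2 * 2 * 2 * 2 = 48
```

Generating the full cross product keeps each factor varied independently, which matches the aim of observing one ranking influence at a time.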

dylanpivo marked this pull request as draft February 13, 2025 09:17

weightings added to titles and search ranking method updated

d59852a

dylanpivo force-pushed the add_title_weights branch from 6e02889 to d59852a February 13, 2025 10:28
