Modifying search query so one field scores higher over others? #139

Open
AmandaUCSC opened this issue Mar 23, 2017 · 7 comments

Comments

@AmandaUCSC

AmandaUCSC commented Mar 23, 2017

Has anyone yet modified this plug-in's query so it scores a field higher over others? It should be possible according to the SolrRelevance FAQs wiki. Specifically, I want to do this:

How can I make "superman" in the title field score higher than in the subject field?

For the standard request handler, "boost" the clause on the title field:

q=title:superman^2 subject:superman

Using the dismax request handler, one can specify boosts on fields in parameters such as qf:

q=superman&qf=title^2 subject

I think what I need to do is somehow change the code in the ResultsController.php here:

    // Get the facet GET parameter
    $facet = $this->_request->facet;

    // Form the composite Solr query.
    if (!empty($facet)) {
        $query .= " AND {$facet}";
    }

    // Limit the query to public items if required
    if ($limitToPublicItems) {
        $query .= ' AND public:"true"';
    }

    return $query;

Am I right? Has anyone already done this before?

@AmandaUCSC
Author

AmandaUCSC commented Mar 24, 2017

Ah, I think I figured out where I change things... really quite simple I think. I see the DisMax query parser in the solrconfig.xml file. I believe if I just modify things to how we want them in there, everything should work.

UPDATE: Well, I modified the file and successfully reloaded it into Solr, but it still doesn't seem to work the way I want it to (the search doesn't seem to have changed at all). The field I want to add is "identifier". So here is basically what I added:

text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4 title^10.0 description^5.0 keywords^5.0 author^2.0 resourcename^1.0 identifier^10.0

I am searching for an exact match in the identifier using a specific format. For the sake of simplicity, let's say it's called "Document(1900) No. 15". So when someone types that into the search box, the document with that identifier should be the very first to come up. At the moment it's not - it's about the fifth document to come up. The other documents mention this document in the text, and they are coming up prior to this one. Why would that be?

@AmandaUCSC
Author

Hi all, so I tried boosting the identifier more (^20) and I also tried moving it to the front of the line (before text) just to see if that made a difference and it didn't. Do I also need to change another file somewhere to reflect that I'm adding the identifier field here? I couldn't tell from the schema file if that was necessary or not.

@kloor

kloor commented Mar 29, 2017

I've been working a bit on the SolrSearch_ResultsController::_getQuery() method myself, as it neither fully allows nor fully escapes the Lucene query syntax. One thing I've discovered is that most metadata fields are indexed with unintuitive names in Solr. Basically, anything that you mark as "Is Indexed?" in the Solr Search plugin Field Configuration will be indexed in a field that is named <id>_t, where <id> is the key from the omeka_solr_search_fields table.

For example in one of my installations the Identifier field has an <id> of 48, so the actual field name in Solr is 48_t. So, I guess to increase its relevance you would have to add 48_t^10.0 to that configuration file, but I have not tested this. Also, your field may have a different <id> number.

You should also be able to query that field in your query string with 48_t:"Document(1900) No. 15", but the current SolrSearch plugin is replacing colons with spaces in all queries, so that wouldn't work.
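
As a rough illustration of the naming pattern and the fielded query described above, here is a minimal sketch. The helper names `solrFieldName` and `fieldedPhraseQuery` are hypothetical and not part of the SolrSearch plugin, and the resulting query only helps if the plugin stops replacing colons with spaces.

```php
<?php
// Hypothetical helpers for illustration; they are not part of the SolrSearch plugin.

/** Solr field name for an indexed field, following the <id>_t pattern. */
function solrFieldName(int $fieldId): string
{
    return $fieldId . '_t';
}

/** Build field:"phrase", escaping backslashes and double quotes inside the phrase. */
function fieldedPhraseQuery(string $field, string $phrase): string
{
    // Backslashes are escaped first so the escapes added for quotes are not doubled.
    $escaped = str_replace(['\\', '"'], ['\\\\', '\\"'], $phrase);
    return $field . ':"' . $escaped . '"';
}

// The id 48 comes from one installation; yours may differ.
echo fieldedPhraseQuery(solrFieldName(48), 'Document(1900) No. 15'), PHP_EOL;
```

Inside a quoted phrase only the backslash and the double quote need escaping, so the parentheses in the identifier can stay as they are.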

Looking through other forks of the plugin, I did find this commit from @jajm that appears to give more intuitive names to the fields in Solr: biblibre@48ab77d

@AmandaUCSC
Author

AmandaUCSC commented Mar 29, 2017

Thanks -- and you're right. I actually figured out how to change that in the solrconfig.xml file yesterday and it worked (I was trying to do it for the Identifier field in our installation, which was 43_t). I had to modify the default /select handler and added an edismax with qf. In the end it looked like this:

<str name="defType">edismax</str>
<str name="qf">43_t^10.0 title^10.0 text^5.0</str>

I'll probably have to modify it again at some point based on additional criteria. But at least I figured out how to get this to work!
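
For anyone who would rather not edit solrconfig.xml, the same two settings can presumably be sent as request parameters instead, since `defType` and `qf` are standard Solr query parameters. The sketch below only builds the parameter values; `buildQf` is a hypothetical helper, and how the parameters get passed to Solr from the plugin is not shown here.

```php
<?php
// Hypothetical helper: format a field => boost map as a Solr qf value.
function buildQf(array $boosts): string
{
    $parts = [];
    foreach ($boosts as $field => $boost) {
        // One decimal place, so 10.0 is written as "10.0" rather than "10".
        $parts[] = $field . '^' . number_format($boost, 1, '.', '');
    }
    return implode(' ', $parts);
}

// Same boosts as the solrconfig.xml snippet above; 43_t is installation-specific.
$params = [
    'defType' => 'edismax',
    'qf'      => buildQf(['43_t' => 10.0, 'title' => 10.0, 'text' => 5.0]),
];

// URL-encoded form of the two parameters.
echo http_build_query($params), PHP_EOL;
```

Keeping the boosts in one array makes it easier to adjust them later when the criteria change.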

@kloor

kloor commented Mar 30, 2017

Thanks for pointing out the EDisMax query parser. It performs much better than the standard query parser, and gracefully handles syntax issues like the single-quote problem reported in #137.

I've updated the SolrSearch_ResultsController::_getQuery() method in my fork of the SolrSearch package to use EDisMax:
https://github.com/BGSU-LITS/SolrSearch/blob/master/controllers/ResultsController.php#L110

I specified using EDisMax in the query string instead of the solrconfig.xml file so it would work without having to change that file and reload the core. I think the qf parameter from the config file should still be respected, though.

The other changes I made to the method were to remove the parts that stripped characters from the query, and to add plus signs to the facets and public field so that they are required when using EDisMax. Without the plus signs, documents that did not match the facets could still be selected.
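
A minimal sketch of the plus-sign idea, for readers who want to see its shape without opening the fork. This is not the code from the linked file: the function name `buildEdismaxQuery`, the parentheses around the facet, and the `*:*` fallback for an empty search are all assumptions made for illustration.

```php
<?php
// Hypothetical sketch; not the code from the fork linked above.
function buildEdismaxQuery(string $userQuery, ?string $facet, bool $limitToPublicItems): string
{
    // Pass the user's query through unmodified; fall back to match-all when empty.
    $userQuery = trim($userQuery);
    $query = ($userQuery === '') ? '*:*' : $userQuery;

    // A leading plus marks the clause as required. The parentheses keep a
    // multi-clause facet together under one plus sign.
    if (!empty($facet)) {
        $query .= " +({$facet})";
    }

    // Limit the query to public items if required.
    if ($limitToPublicItems) {
        $query .= ' +public:"true"';
    }

    return $query;
}

echo buildEdismaxQuery('superman', 'itemtype:"Document"', true), PHP_EOL;
```

Compared with the original `AND` joins, the required clauses here do not depend on how the parser treats boolean operators in the rest of the user's input.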

@AmandaUCSC
Author

AmandaUCSC commented Mar 31, 2017

Excellent! So should I get rid of the solrconfig.xml code I changed and just replace the SolrSearch_ResultsController::_getQuery() with the new version? (I had no idea about the plus sign issue...) Or do you mean to leave the config file qf parameter alone... I'm presuming no other methods or files were changed? Thanks!

@kloor

kloor commented Mar 31, 2017

Right, I only changed SolrSearch_ResultsController::_getQuery(). You would probably want to keep your solrconfig.xml file for the qf parameter.

I was hesitant about specifying EDisMax in the code because, unlike the solrconfig.xml file, it wouldn't be configurable without editing the code. But for people who want better query processing, it's easier to just replace that function than to find and edit their solrconfig.xml file and reload the core.
