mapping and tuning payloads in Solr 8

Burgmans, Tom
Hi all,

In our Solr 6 setup we use string payloads to boost certain tokens (URIs). These strings are mapped to floats via a schema parameter "PayloadMapping", which is read by our custom WKSimilarity class (extending TFIDFSimilarity).

<fieldType name="uri_payload" class="solr.TextField" positionIncrementGap="100">
        <analyzer type="index">
            <tokenizer class="solr.KeywordTokenizerFactory"/>
            <filter class="solr.DelimitedPayloadTokenFilterFactory" encoder="identity" delimiter="|"/>
        </analyzer>
        <analyzer type="query">
            <tokenizer class="solr.KeywordTokenizerFactory"/>
        </analyzer>
        <similarity class="com.wolterskluwer.atlas.solr.similarities.WKSimilarityFactory">
            <str name="BM25k1a">0.4</str>
            <str name="BM25k1b">0.4</str>
            <str name="BM25b">0.5</str>
            <str name="IDFCurveFactor">0</str>
            <str name="sloppyFreqCurveFactor">0.0</str>
            <str name="PayloadBoost">10.0</str>
            <str name="PayloadImpact">3.0</str>
            <str name="PayloadCurveFactor">1.0</str>
            <str name="PayloadMapping">isAbout=15.0,coversFiscalPeriod=10.0,type=5.0,hasTheme=5.0,subject=4.0,mentions=2.0,creator=2.0</str>
        </similarity>
</fieldType>

The reason for this indirection is convenience: by storing payload strings instead of floats, we can change and tune the boosts simply by updating the schema, without having to reindex the content set.
Inside WKSimilarity each payload string is mapped to its corresponding boost value and the final boost is applied via the scorePayload method (where we could tune the boost curve via some additional schema parameters). This works well in Solr 6.
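
To make the mapping step concrete, here is a minimal sketch of how a "PayloadMapping" string like the one above could be parsed and then used to look up the boost for a payload. This is not the actual WKSimilarity code: the class name PayloadMappingSketch, the method names, and the default boost of 1.0 for unknown payloads are hypothetical, and it assumes the identity encoder stored the payload text as UTF-8 bytes. The curve parameters (PayloadBoost, PayloadImpact, PayloadCurveFactor) are left out on purpose.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not the real WKSimilarity code.
final class PayloadMappingSketch {

    // Parses "name=float,name=float,..." into a lookup table.
    static Map<String, Float> parse(String mapping) {
        Map<String, Float> boosts = new HashMap<>();
        for (String entry : mapping.split(",")) {
            if (entry.trim().isEmpty()) {
                continue; // tolerate an empty mapping or a trailing comma
            }
            String[] parts = entry.trim().split("=", 2);
            if (parts.length != 2) {
                throw new IllegalArgumentException("Malformed mapping entry: " + entry);
            }
            boosts.put(parts[0].trim(), Float.parseFloat(parts[1].trim()));
        }
        return boosts;
    }

    // Decodes the payload bytes as UTF-8 text (an assumption about how the
    // payload was encoded) and returns the mapped boost, or defaultBoost
    // when the payload string is not in the table.
    static float boostFor(Map<String, Float> boosts, byte[] payload,
                          int offset, int length, float defaultBoost) {
        String key = new String(payload, offset, length, StandardCharsets.UTF_8);
        return boosts.getOrDefault(key, defaultBoost);
    }

    public static void main(String[] args) {
        Map<String, Float> boosts = parse("isAbout=15.0,type=5.0,mentions=2.0");
        byte[] payload = "isAbout".getBytes(StandardCharsets.UTF_8);
        // Looks up the boost for the "isAbout" payload, falling back to 1.0.
        System.out.println(boostFor(boosts, payload, 0, payload.length, 1.0f));
    }
}
```

The point of the lookup table is that only the schema string changes when a boost is retuned; the indexed payload bytes stay the same.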

The problem: we are about to migrate to Solr 8, and since LUCENE-8014 it is no longer possible to override the scorePayload method in WKSimilarity (it has been removed from TFIDFSimilarity). I wonder what alternatives there are for mapping string payloads to floats and using them in a tunable formula for boosting.

Thanks,
Tom Burgmans