Get only partial match results

Balaji.A
Hi All,
   I have a specific requirement, stated below. Kindly suggest whether it can be achieved and, if so, the steps to achieve it.

I have 2 cores storing different kinds of data.

My search query should return results in the following order:

1) Exact match results from core1
2) Exact match results from core2
3) Partial match results from core1
4) Partial match results from core2

Note: I don't want exact match results to be duplicated in Partial match results.

Please suggest!

Thanks.

Re: Get only partial match results

Jonathan Rochkind
I think you're going to have trouble doing this with separate cores.
With separate cores, you'll need to issue two queries to Solr, one for
each core. And then to intermingle results from the different cores like
that, it's going to require difficult (esp. to do at all efficiently)
client-side code. Different cores really are entirely separate.
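
To make the client-side work concrete, here is a minimal sketch in Python of the merge step alone. It assumes the four result lists (exact and partial, per core) have already been fetched as lists of document IDs; the function name `merge_tiers` and the sample IDs are hypothetical. It concatenates the tiers in priority order and skips any document already emitted, which covers the "no duplicates" requirement but not paging, and paging is where doing this efficiently gets hard.

```python
def merge_tiers(tiers):
    """Merge ranked ID lists in priority order, dropping repeats.

    tiers: list of (core_name, [doc_id, ...]) pairs, highest priority first.
    Returns a list of (core_name, doc_id) pairs.
    """
    seen = set()
    merged = []
    for core, doc_ids in tiers:
        for doc_id in doc_ids:
            key = (core, doc_id)  # IDs are only unique within one core
            if key not in seen:
                seen.add(key)
                merged.append(key)
    return merged


# Hypothetical results for one query, already fetched from each core.
merged = merge_tiers([
    ("core1", ["a1"]),        # exact matches, core1
    ("core2", ["b7"]),        # exact matches, core2
    ("core1", ["a1", "a3"]),  # partial matches, core1 (a1 repeats)
    ("core2", ["b7", "b2"]),  # partial matches, core2 (b7 repeats)
])
print(merged)
```

Each tier here is one Solr request, so a single page of results can cost up to four queries.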

If you put everything in the same core, but with different solr fields
used, it will be easier to intermingle.  Conceptually, think of all the
documents you were thinking of as being indexed in 'core1' as being just
in the single core, but having their text indexed in a field called,
say, "text_core1".   Then the 'core2' documents are in that same core
actually, but indexed under "text_core2".

Now if you just wanted results from core1 followed by results from
core2, you could use dismax and boost the core1 field a lot, something like:

text_core1^1000   text_core2

Then we add in your requirement for "exact matches" first.  You need to
be more precise about what you mean by "exact" matches and "partial"
matches.  By "exact" do you mean phrase searching?  Do you mean it must
match the entire field exactly start to finish?  Do you mean un-stemmed,
where "partial" is stemmed?

Once you figure this out, one possible way to approach it is to set up a
solr field with analyzers such that it will only match "exact" matches.
For instance, if you really mean exact string match without any
tokenization, you could use the KeywordTokenizer. If you are able to set
up a solr field like this, then again using dismax, the solution is
straightforward, something like:

qf=text_core1_exact^3000  text_core2_exact^2000  text_core1_partial^1000
text_core2_partial
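
As a rough sketch of how a client might assemble that request, the Python below builds the `qf` string from a list of field/boost pairs and URL-encodes it. The field names are the ones from the example above; the helper name `dismax_params` and the sample query are hypothetical, and it assumes a Solr version that selects the parser with `defType=dismax`.

```python
from urllib.parse import urlencode


def dismax_params(user_query, field_boosts):
    """Build dismax request parameters from (field, boost) pairs.

    A boost of None leaves the field unboosted, as in "text_core2_partial".
    """
    parts = []
    for field, boost in field_boosts:
        parts.append(field if boost is None else f"{field}^{boost}")
    return {"defType": "dismax", "q": user_query, "qf": " ".join(parts)}


params = dismax_params("ryder cup", [
    ("text_core1_exact", 3000),
    ("text_core2_exact", 2000),
    ("text_core1_partial", 1000),
    ("text_core2_partial", None),
])
query_string = urlencode(params)  # spaces become "+", "^" becomes "%5E"
print(query_string)
```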

Hope this helps you think about how to approach your problem.

Jonathan


Re: Get only partial match results

Balaji.A
Thanks Jonathan. I appreciate your reply.

Though I got a few ideas for implementing my requirement, I got stuck on a few issues. It would be very helpful if you could guide me in resolving them.

As you suggested, I configured a single core with different fields.

For example the core contains the following fields:

core1_title_exact (type : text_ws)
core1_title_partial (type : text)
core1_content_exact (type : text_ws)
core1_content_partial (type : text)
core2_title_exact (type : text_ws)
core2_title_partial (type: text)
core2_content_exact (type : text_ws)
core2_content_partial (type: text)


Problems
*******
1) While doing a dismax query, I specify the query in double quotes for an exact match. This works fine, but I don't get any partial matches in the search results.

My query:
q="Ryder Cup"&qf=core1_title_exact^8000 core1_content_exact^7000 core2_title_exact^6000 core2_content_exact^5000 core1_title_partial^4000 core1_content_partial^3000 core2_title_partial^2000 core2_content_partial^1000

2) If the search term occurs more frequently in the "core2_content_exact" field, then even though the term is present at least once in the field "core1_content_exact", I get the "core2_content_exact" match as my first search result item.

For example, assume my search term is "Ryder Cup". If Ryder Cup occurs once in the core1_content_exact field and about 15 times in core2_content_exact, the search query returns the core2_content_exact match as the first result.

Is it something to do with term frequency? How do I fix this problem? The core1_content_exact field should be my topmost priority even if it matches the search term only once.


Thanks,
Balaji

RE: Get only partial match results

Jonathan Rochkind
> 1) While doing a dismax query, I specify the query in double quotes for
> exact match. This works fine but I don't get any partial matches in search
> result.

Rather than specify your query in quotes for 'exact' matches, I was suggesting configuring the analyzers differently for your fields "core1_title_exact" and "core1_title_partial". -- oops, except I don't think I meant analyzers, I mean different class types in Solr.

But again, it depends on what you mean by 'exact' -- do you mean it must match the whole string start to finish?  If so, if you make the *_exact fields in schema.xml use a "string" solr.StrField instead of a "text" solr.TextField, then queries will only match in those fields if they are _exact_, covering the whole indexed string start to finish, all punctuation and spaces etc. exactly the same. (You could use some analyzers to, say, lowercase, remove punctuation, and normalize whitespace to make it a _bit_ more forgiving.) No need for quoting the query, it'll only match if it's exact.

Oops, except I just realized this isn't necessarily true, sorry, because of the way the dismax query parser will deal with whitespace in the query. Hmm.

If what you mean by 'exact' is just a phrase search, then you don't need the separate *_exact fields in the first place; you can just use the dismax 'pf' (phrase fields) param with the right boost.

Hmm, I think for the first case, where 'exact' really does mean 'exact' (not phrase), you might be able to combine the _exact field configured as a solr.StrField with the 'pf' technique: only mention the _exact fields in the dismax 'pf', not the dismax 'qf'.
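
A minimal sketch of that last idea, written as Python that only assembles the request parameters. In dismax the phrase-boost fields are listed in the `pf` parameter (`ps` sets the phrase slop), so the sketch uses `pf`. The field names come from this thread; the helper name `exact_boost_params` and the boost values are hypothetical, and whether a StrField in `pf` behaves as hoped is untested here.

```python
def exact_boost_params(user_query):
    """Match on the *_partial fields, boost documents whose *_exact field
    equals the whole query. The query is passed unquoted so that partial
    matches are still returned."""
    return {
        "defType": "dismax",
        "q": user_query,
        # Fields that decide whether a document matches at all.
        "qf": "core1_title_partial^10 core2_title_partial",
        # Fields that only add a bonus when the whole query matches.
        "pf": "core1_title_exact^1000 core2_title_exact^100",
    }


params = exact_boost_params("Ryder Cup")
print(params["qf"])
print(params["pf"])
```

Because the *_exact fields appear only in `pf`, they can raise a document's score but cannot by themselves bring a document into the result set.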

I'm not completely sure any of this will work, just giving you some ideas of how I'd try approaching it if it were me.

> If the frequency of search term is more in "core2_content_exact" field,
> even though the search term is present at least once in the field
> "core1_content_exact" I get "core2_content_exact" as my first search result
> item.

I'm surprised this is true with such gigantic boosts, but I'm not sure what to do about it, sorry. Although I guess the boosts I suggested aren't that different from each other; they are all just multiplied by 1000, which doesn't change their ratios to one another. You could try making the boosts even more ridiculously high, each stage much higher than the last, maybe powers of 10:  ^1, ^10, ^100, ^1000, ^10000.
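
A toy calculation shows why the ratio between boosts matters more than their size. The Python sketch below assumes the classic Lucene similarity, where term frequency contributes as the square root of the frequency, and it ignores idf, length norms and everything else in the real score. The function name `toy_score` is hypothetical; the numbers are the 1 versus 15 occurrences and the ^7000 and ^5000 boosts from the question.

```python
import math


def toy_score(boost, term_freq):
    """Simplified field score: boost times sqrt(term frequency).
    Real Solr scores include more factors; this isolates boost vs. tf."""
    return boost * math.sqrt(term_freq)


# Boosts from the question: ratio 7000:5000 is only 1.4.
core1_close = toy_score(7000, 1)    # 7000.0
core2_close = toy_score(5000, 15)   # roughly 19365
print(core1_close > core2_close)    # False: core2 outranks core1

# Powers of ten: ratio 10000:1000 is 10.
core1_wide = toy_score(10000, 1)    # 10000.0
core2_wide = toy_score(1000, 15)    # roughly 3873
print(core1_wide > core2_wide)      # True: core1 stays on top
```

Under this simplified model, a tenfold gap holds only until the lower-priority field has more than 100 times the occurrences, so wider gaps give more headroom.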

Jonathan

RE: Get only partial match results

Balaji.A
Hi Jonathan,

   Once again many thanks for your guidance. I made it work this time :-)

Thanks,
Balaji.