ComplexPhraseQueryParser class question

ComplexPhraseQueryParser class question

baris.kazar
Hi,

  I hope everyone is doing great.


I have a question regarding the ComplexPhraseQueryParser class.

This class handles the following queryText case very well:


"term1 term2 abcd term3*"~2

(the last term, term3, has a * at the end, and the whole phrase has a slop of 2)


The term1, term2 and term3 are all in the Lucene index but abcd is not.

In other words, there is no "term1 term2 abcd term3" in the Lucene index,

but I would still like to find the following in my results:

"term1 term2 term3", even though the query also contains the term abcd.

How can I achieve this?


I set setInOrder(true) and setPhraseSlop(2) on the ComplexPhraseQueryParser.
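As far as I understand it, ComplexPhraseQueryParser rewrites the phrase into an in-order SpanNearQuery when inOrder is true, so every clause, abcd included, must match somewhere in the document. The simplified model below (plain Java, no Lucene; the class and helper names are hypothetical, and each term is assumed to be a single token) shows why the query finds nothing while the same phrase without abcd would match: each query term must appear after the previous one, and the positions skipped between consecutive matches must add up to no more than the slop.

```java
import java.util.List;

public class InOrderSlopSketch {
    // Hypothetical helper: a trailing '*' is treated as a prefix match (like term3*).
    static boolean termMatches(String queryTerm, String token) {
        if (queryTerm.endsWith("*")) {
            return token.startsWith(queryTerm.substring(0, queryTerm.length() - 1));
        }
        return token.equals(queryTerm);
    }

    // Simplified in-order proximity check over a non-empty query: every query term
    // must occur after the previous one, and the positions skipped between
    // consecutive matches must total at most `slop`. For a fixed start position,
    // taking the earliest next occurrence minimizes the total skipped positions.
    static boolean inOrderMatch(List<String> query, List<String> doc, int slop) {
        for (int start = 0; start < doc.size(); start++) {
            if (!termMatches(query.get(0), doc.get(start))) {
                continue;
            }
            int prev = start;
            int skipped = 0;
            boolean allFound = true;
            for (int q = 1; q < query.size() && allFound; q++) {
                int next = -1;
                for (int p = prev + 1; p < doc.size(); p++) {
                    if (termMatches(query.get(q), doc.get(p))) {
                        next = p;
                        break;
                    }
                }
                if (next < 0) {
                    allFound = false; // e.g. "abcd" never occurs in the document
                } else {
                    skipped += next - prev - 1;
                    prev = next;
                }
            }
            if (allFound && skipped <= slop) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> doc = List.of("term1", "term2", "term3x");
        System.out.println(inOrderMatch(List.of("term1", "term2", "abcd", "term3*"), doc, 2)); // false
        System.out.println(inOrderMatch(List.of("term1", "term2", "term3*"), doc, 2));         // true
    }
}
```

In this model a single missing term is enough to sink the whole phrase, no matter how large the slop is, because slop only tolerates extra positions between terms that do occur.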


Best regards

baris



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: ComplexPhraseQueryParser class question

陈志祥
The standard PhraseQuery cannot do this, but you can pre-filter the invalid term (abcd) out by using the MultiTerms API.
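A minimal sketch of that pre-filtering idea, assuming the phrase has already been split into its terms. The Set below is a hypothetical stand-in for the field's term dictionary so the example runs on its own; against a real index the lookup would go through the field's Terms (for example from MultiTerms.getTerms(reader, field)) and a TermsEnum, using seekExact for a plain term and seekCeil followed by a prefix check for a term with a trailing *. The class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class PrefilterSketch {
    // Stand-in for a term-dictionary lookup; a trailing '*' means "any indexed term with this prefix".
    static boolean isIndexed(String queryTerm, Set<String> indexedTerms) {
        if (queryTerm.endsWith("*")) {
            String prefix = queryTerm.substring(0, queryTerm.length() - 1);
            for (String t : indexedTerms) {
                if (t.startsWith(prefix)) {
                    return true;
                }
            }
            return false;
        }
        return indexedTerms.contains(queryTerm);
    }

    // Drops query terms that do not occur in the index and rebuilds the phrase query text.
    static String prefilterPhrase(List<String> terms, int slop, Set<String> indexedTerms) {
        List<String> kept = new ArrayList<>();
        for (String term : terms) {
            if (isIndexed(term, indexedTerms)) {
                kept.add(term);
            }
        }
        return "\"" + String.join(" ", kept) + "\"~" + slop;
    }

    public static void main(String[] args) {
        Set<String> indexed = Set.of("term1", "term2", "term3x");
        // prints "term1 term2 term3*"~2
        System.out.println(prefilterPhrase(List.of("term1", "term2", "abcd", "term3*"), 2, indexed));
    }
}
```

The rebuilt string can then be parsed with ComplexPhraseQueryParser as before; the trade-off is one dictionary lookup per query term before the search runs.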

Also, I have found that the "a b c"~2 phrase query does not actually match "a x x b x x c", because of how it is implemented.
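That observation is consistent with how sloppy phrase matching measures distance, as I understand it: the slop bounds the total displacement of the phrase's terms, not the gap between each adjacent pair. A worked check, assuming the usual sloppy-phrase measure where each matched term contributes its document position $p_i$ minus its offset $i$ in the query, with a, b, c at document positions 0, 3, 6:

```latex
% a, b, c at document positions p_0 = 0, p_1 = 3, p_2 = 6; query offsets i = 0, 1, 2
d = \max_i (p_i - i) - \min_i (p_i - i)
  = \max(0,\ 2,\ 4) - \min(0,\ 2,\ 4)
  = 4 > 2
```

Under this measure the document would need a slop of at least 4 to match.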

陈志祥
Alibaba, Map Engine Core Algorithm Engineer
Phone: 057128223456-81124100
Email: [hidden email]
Address: Shentong Information Plaza, Changning, Shanghai
Information Security Notice: The information contained in this mail is solely the property of the sender's organization, which holds all rights to it.
This mail communication is confidential. Recipients named above are obligated to maintain secrecy and may not disclose all or any part of the contents of this communication to any third party without the sender's written permission. This notice applies only to work email.
------------------------------------------------------------------
From: <[hidden email]>
Date: 2020-01-30 05:02:50
To: [hidden email]<[hidden email]>
Cc: baris.kazar<[hidden email]>
Subject: ComplexPhraseQueryParser class question


Re: ComplexPhraseQueryParser class question

baris.kazar
Thanks, Zhixiang. Yes, it can't find a match when there is an unrelated term in the middle that is not in the index.

Similar to what you suggested:
I can try the queryText with one term excluded at a time using ComplexPhraseQueryParser and compare the best matches.
But I'd rather have this built into a Lucene API.
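The leave-one-out workaround could be sketched like this (plain Java; the class and method names are hypothetical). Each generated string would then be handed to ComplexPhraseQueryParser.parse(...) as a separate query and the hits compared, which costs one search per query term.

```java
import java.util.ArrayList;
import java.util.List;

public class LeaveOneOutSketch {
    // Builds one phrase query string per term, each with exactly that term removed.
    static List<String> leaveOneOutPhrases(List<String> terms, int slop) {
        List<String> variants = new ArrayList<>();
        for (int skip = 0; skip < terms.size(); skip++) {
            List<String> kept = new ArrayList<>(terms);
            kept.remove(skip); // remove(int): drops the element at index `skip`
            variants.add("\"" + String.join(" ", kept) + "\"~" + slop);
        }
        return variants;
    }

    public static void main(String[] args) {
        for (String q : leaveOneOutPhrases(List.of("term1", "term2", "abcd", "term3*"), 2)) {
            System.out.println(q);
        }
    }
}
```

For the example query this yields four variants, one of which is "term1 term2 term3*"~2, the phrase that actually exists in the index.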

I am also asking whether ComplexPhraseQueryParser has a way to support partial phrase matching.

Elasticsearch has this capability, with the required match given as a percentage.

I am surprised Lucene Core does not have this; I hope I am wrong.
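The closest Lucene Core knob I am aware of is BooleanQuery.Builder#setMinimumNumberShouldMatch(int), which requires a minimum number of optional clauses to match but ignores term positions. As a rough, position-agnostic sketch of what a percentage threshold means, assuming the percentage is rounded down to a whole number of terms (with a floor of one) and that a trailing * is a prefix match (all names here are hypothetical):

```java
import java.util.List;

public class PercentMatchSketch {
    // Assumption: round the percentage down to whole terms, but require at least one.
    static int requiredTerms(int numTerms, int percent) {
        return Math.max(1, numTerms * percent / 100);
    }

    // Hypothetical helper: a trailing '*' is treated as a prefix match.
    static boolean termMatches(String queryTerm, String token) {
        if (queryTerm.endsWith("*")) {
            return token.startsWith(queryTerm.substring(0, queryTerm.length() - 1));
        }
        return token.equals(queryTerm);
    }

    // Counts query terms that occur anywhere in the document (order and slop are ignored).
    static boolean partialMatch(List<String> query, List<String> doc, int percent) {
        int matched = 0;
        for (String q : query) {
            for (String token : doc) {
                if (termMatches(q, token)) {
                    matched++;
                    break;
                }
            }
        }
        return matched >= requiredTerms(query.size(), percent);
    }

    public static void main(String[] args) {
        List<String> query = List.of("term1", "term2", "abcd", "term3*");
        List<String> doc = List.of("term1", "term2", "term3x");
        System.out.println(partialMatch(query, doc, 75));  // 3 of 4 required -> true
        System.out.println(partialMatch(query, doc, 100)); // 4 of 4 required -> false
    }
}
```

Combining such a threshold with positional (slop) constraints is the part that would need a dedicated query; this sketch only covers the counting side.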

Best regards


> On Jan 29, 2020, at 7:02 PM, 陈志祥 <[hidden email]> wrote: