How to query for 'any word' in a phrase

How to query for 'any word' in a phrase

Jeroen Lauwers
Dear all,

Is there a way to construct (spans?) a phrase search like the following:
the quick brown * jumps over the * *
where * = any word but exactly 1 word

I introduced these *’s at a specific position, so a PhraseQuery with slop of 2 is just not good enough
and the two *’s at the end must be matched as well.

Is there such a thing as a Term or BytesRef that always matches everything?

Thanks,
Jeroen

Re: How to query for 'any word' in a phrase

陈志祥
Could the slop parameter in PhraseQuery be set dynamically?

陈志祥
Alibaba, Map Engine Core Algorithm Engineer
Tel: 057128223456-81124100
Email: [hidden email]
Address: Shentong Information Plaza, Changning, Shanghai
Information Security Notice: The information contained in this mail is solely property of the sender's organization.
This mail communication is confidential. Recipients named above are obligated to maintain secrecy and are not permitted to disclose the contents of this communication to others.

RE: How to query for 'any word' in a phrase

Jeroen Lauwers
I don't understand your question.

In general, can it be set? Yes: PhraseQuery(int slop, String field, BytesRef... terms)
<https://lucene.apache.org/core/7_7_2/core/org/apache/lucene/search/PhraseQuery.html#PhraseQuery-int-java.lang.String-org.apache.lucene.util.BytesRef...->
In my specific case: also yes. I'm parsing the query myself in a custom parser, so I can set it there.

As far as I understand, the slop is not specific to a position.
Please explain how this could help.

Jeroen

Re: How to query for 'any word' in a phrase

陈志祥
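I guess that each * you use to mask a word adds 1 to the slop, while consecutive words mean slop 0. PhraseQuery takes only one slop value, which caps how many words may be skipped between the terms of the whole phrase. It is therefore a static, phrase-wide setting and cannot be set per position, which is what I meant by "dynamically set".
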
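To make the limitation described above concrete, here is a minimal sketch in plain Java. It is a toy model, not Lucene code: it assumes a simplified, ordered-only notion of slop in which the total number of skipped words across the phrase may not exceed the slop value (Lucene's real sloppy matching is more permissive and also tolerates some reordering). The class and method names are invented for illustration.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model, not Lucene code: ordered phrase matching with one phrase-wide slop budget. */
final class SlopSketch {

    /** True if the terms occur in order and at most {@code slop} words are skipped in total. */
    static boolean matchesWithSlop(List<String> doc, List<String> terms, int slop) {
        for (int start = 0; start < doc.size(); start++) {
            if (matchFrom(doc, start, terms, 0, slop)) {
                return true;
            }
        }
        return false;
    }

    private static boolean matchFrom(List<String> doc, int pos, List<String> terms,
                                     int termIndex, int budget) {
        if (termIndex == terms.size()) {
            return true; // every term has been placed
        }
        // The first term anchors the match; later terms may skip words while budget lasts.
        int maxSkip = (termIndex == 0) ? 0 : budget;
        for (int skip = 0; skip <= maxSkip && pos + skip < doc.size(); skip++) {
            if (doc.get(pos + skip).equals(terms.get(termIndex))
                    && matchFrom(doc, pos + skip + 1, terms, termIndex + 1, budget - skip)) {
                return true;
            }
        }
        return false;
    }

    /** Splits text on whitespace; a stand-in for a real analyzer. */
    static List<String> words(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("the", "quick", "brown", "jumps");
        // Intended match: exactly one word between "brown" and "jumps".
        System.out.println(matchesWithSlop(words("the quick brown fox jumps"), terms, 1));
        // Unwanted match: the extra word sits in a different position.
        System.out.println(matchesWithSlop(words("the quick red brown jumps"), terms, 1));
        // Unwanted match: no extra word at all.
        System.out.println(matchesWithSlop(words("the quick brown jumps"), terms, 1));
    }
}
```

All three calls in main report a match under this model, which is why a single phrase-wide slop cannot express "exactly one word, exactly here".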

Re: How to query for 'any word' in a phrase

陈志祥
In reply to this post by Jeroen Lauwers
To be clearer, I think you need to build a custom PhraseQuery class that can set a separate slop value between each pair of sub-terms. You would also need a special WildcardTerm that matches any term and is used only within this custom PhraseQuery.

Or just use a grep tool or a regex automaton to scan the text?

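The grep/regex suggestion can be sketched without Lucene at all. The following is a minimal example in plain Java that scans raw text rather than an index, so it gives up index speed in exchange for simplicity. It assumes words are separated by whitespace (punctuation stays attached to words, unlike with most analyzers), and the class name WildcardPhraseRegex is invented for illustration.

```java
import java.util.StringJoiner;
import java.util.regex.Pattern;

/** Sketch: turns a phrase with "*" placeholders into a regex over whitespace-separated words. */
final class WildcardPhraseRegex {

    /** Each "*" becomes exactly one run of non-whitespace; other words match literally. */
    static Pattern compile(String phrase) {
        // The prefix and suffix make sure the match starts and ends on word boundaries.
        StringJoiner words = new StringJoiner("\\s+", "(?<!\\S)", "(?!\\S)");
        for (String token : phrase.trim().split("\\s+")) {
            words.add(token.equals("*") ? "\\S+" : Pattern.quote(token));
        }
        return Pattern.compile(words.toString());
    }

    /** True if the phrase occurs somewhere in the text. */
    static boolean contains(String text, String phrase) {
        return compile(phrase).matcher(text).find();
    }

    public static void main(String[] args) {
        String phrase = "the quick brown * jumps over the * *";
        System.out.println(contains("the quick brown fox jumps over the lazy dog", phrase));
        System.out.println(contains("the quick brown fox jumps over the dog", phrase));
    }
}
```

Because every * must consume one word, the two trailing placeholders are enforced as well: a text that ends one word too early does not match.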

Re: How to query for 'any word' in a phrase

Tomoko Uchida
Hi,
did you try or consider SpanNearQuery?
You might need to insert some kind of special token (e.g., <EOS>) at the
end of the text field to match the "end of the sentence" anyway.

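The role of the end marker can be illustrated with a small position-based model in plain Java. This is only a sketch under stated assumptions, not Lucene code: it keeps a term-to-positions map, assumes whitespace tokenization with no position gaps, and appends a hypothetical <EOS> entry one position past the last word. An inner * needs no lookup because the next fixed term must sit at the right offset; a trailing * has no following term, so the end marker is what proves a word exists there. As far as I know, PhraseQuery.Builder.add(Term, int position) lets you give each term an explicit position in Lucene, which covers the inner placeholders in the same way.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy positional index, not Lucene code: fixed offsets for terms, end marker for trailing "*". */
final class PositionalWildcardSketch {
    static final String EOS = "<EOS>"; // hypothetical end marker added at index time

    /** Builds term -> positions and records the end marker one past the last word. */
    static Map<String, List<Integer>> index(String text) {
        Map<String, List<Integer>> postings = new HashMap<>();
        String[] words = text.trim().split("\\s+");
        for (int pos = 0; pos < words.length; pos++) {
            postings.computeIfAbsent(words[pos], k -> new ArrayList<>()).add(pos);
        }
        postings.computeIfAbsent(EOS, k -> new ArrayList<>()).add(words.length);
        return postings;
    }

    /** Matches a phrase in which each "*" stands for exactly one word. */
    static boolean matches(Map<String, List<Integer>> postings, String phrase) {
        String[] pattern = phrase.trim().split("\\s+");
        int eos = postings.get(EOS).get(0);
        // The whole pattern, trailing "*" included, must end before the end marker.
        for (int start = 0; start + pattern.length <= eos; start++) {
            boolean ok = true;
            for (int i = 0; i < pattern.length && ok; i++) {
                if (!pattern[i].equals("*")) {
                    List<Integer> positions = postings.get(pattern[i]);
                    ok = positions != null && positions.contains(start + i);
                }
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String phrase = "the quick brown * jumps over the * *";
        System.out.println(matches(index("the quick brown fox jumps over the lazy dog"), phrase));
        System.out.println(matches(index("the quick brown fox jumps over the dog"), phrase));
    }
}
```

In this model the second text is rejected only because the end marker sits too close to the last fixed term; without some record of where the field ends, the trailing placeholders could not be checked.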