Can't find Japanese words ending with numbers

Antonio Facciorusso
Dear all,

I'm using Jackrabbit 2.16.1 and Lucene 3.6.2.

I have a node of type "mynodetype" with a property named "description" whose value is "横浜第2センタ". I run full-text searches using "jcr:contains" in this form:

jcr:contains(., '<value>*')

The following query returns 0 results:
"//element(*,mynodetype)[(jcr:contains(., '横浜第2*'))]"

while all of the following work correctly and return at least one result:

"//element(*,mynodetype)[(jcr:contains(., '横浜第2センタ*'))]"
"//element(*,mynodetype)[(jcr:contains(., '横浜第*'))]"
"//element(*,mynodetype)[(jcr:contains(., '2センタ*'))]"
"//element(*,mynodetype)[(jcr:contains(., 'センタ*'))]"

I tried using both the default analyzer and the Japanese one (https://lucene.apache.org/core/3_6_0/api/all/org/apache/lucene/analysis/ja/JapaneseAnalyzer.html).
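
For anyone who wants to reproduce the comparison, here is a minimal sketch that generates the five query strings listed earlier in this message. The helper name contains_prefix_query is hypothetical and not part of Jackrabbit, and the quote escaping (doubling a single quote inside the XPath string literal) is an assumption that should be checked against the repository in use. It only builds strings; it does not execute anything against a repository.

```python
def contains_prefix_query(node_type: str, prefix: str) -> str:
    """Build an XPath full-text prefix query in the form used in this post."""
    # Assumption: a single quote inside an XPath string literal is escaped by doubling it.
    escaped = prefix.replace("'", "''")
    return f"//element(*,{node_type})[(jcr:contains(., '{escaped}*'))]"


# Prefixes tried in the post; per the report, only the first one returns 0 results.
prefixes = ["横浜第2", "横浜第2センタ", "横浜第", "2センタ", "センタ"]
queries = [contains_prefix_query("mynodetype", p) for p in prefixes]

if __name__ == "__main__":
    for query in queries:
        print(query)
```

Each generated string can then be passed to the repository's query manager as an XPath query, which makes it easy to rerun the same set of prefixes after changing the analyzer or the indexing configuration.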

This is the content of my indexingConfiguration.xml file:

<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "http://jackrabbit.apache.org/dtd/indexing-configuration-1.1.dtd">
<configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
    <index-rule nodeType="entity">
        <!-- simple properties -->
        <property isRegexp="true">.*:[^_]+</property>
        <!-- resources_data_xxx -->
        <property isRegexp="true">.*:resources_data_[^_]+</property>
        <!-- resources_xxx (with xxx != 'data') -->
        <property isRegexp="true">.*:resources_data[^_]+</property>
        <property isRegexp="true">.*:resources_(?!data)[^_]+</property>
        <!-- resourcesxyz_xxx -->
        <property isRegexp="true">.*:resources[^_]+_[^_]+</property>
        <!-- all other xxx_yyy (with xxx != resources) -->
        <property isRegexp="true">.*:(?!resources)[^_]+_[^_]+</property>
    </index-rule>
</configuration>

Should I use a different configuration or analyzer, or is this a bug?

Thank you.

Best regards,
Antonio.



RE: Can't find Japanese words ending with numbers

Gareth Harper
Could someone please take me off this mailing list.

-----Original Message-----
From: Antonio Facciorusso <[hidden email]>
Sent: 17 April 2019 11:05
To: [hidden email]; [hidden email]
Subject: Can't find Japanese words ending with numbers




RE: Can't find Japanese words ending with numbers

Uwe Schindler
Please check here; you have to unsubscribe yourself:
http://lucene.apache.org/core/discussion.html#java-user-list-java-userluceneapacheorg


> -----Original Message-----
> From: Gareth Harper <[hidden email]>
> Sent: Wednesday, April 17, 2019 12:45 PM
> To: [hidden email]
> Subject: RE: Can't find Japanese words ending with numbers


RE: Can't find Japanese words ending with numbers

Gareth Harper
Thank you.

-----Original Message-----
From: Uwe Schindler <[hidden email]>
Sent: 17 April 2019 12:18
To: [hidden email]
Subject: RE: Can't find Japanese words ending with numbers

