[jira] [Created] (LUCENE-4247) QueryParser doesn't call Analyzer

[jira] [Created] (LUCENE-4247) QueryParser doesn't call Analyzer

JIRA jira@apache.org
Zied Hamdi created LUCENE-4247:
----------------------------------

             Summary: QueryParser doesn't call Analyzer
                 Key: LUCENE-4247
                 URL: https://issues.apache.org/jira/browse/LUCENE-4247
             Project: Lucene - Core
          Issue Type: Bug
          Components: core/queryparser
    Affects Versions: 3.6
            Reporter: Zied Hamdi


I'm trying to fold Czech characters to ASCII through the ASCIIFoldingFilter. This works fine at indexing time, since I can retrieve the content I indexed by searching for its non-diacritic version. But searching with diacritics always returns 0 results.

In debug mode I can clearly see that the Analyzer isn't called (in addition, I put a breakpoint in my analyser to check whether it is called later, and it is never hit).


String searchText = "příLIš*";
Analyzer analyzer = (Analyzer) factory.getBean("analyzer");
Query q = new QueryParser((Version) factory.getBean("version"), DestinationPlaceProperties.NAME, analyzer).parse(searchText);


The query q has these values in debug:
prefix Term  (id=90)
        field "name" (id=100)
        text "příliš" (id=101)

--- more details ----
q PrefixQuery  (id=65)
        boost 1.0
        numberOfTerms 0
        prefix Term  (id=90)
        rewriteMethod MultiTermQuery$2  (id=92)
---------------------

My analyser is quite simple; here is its code, just for reference:

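import java.io.Reader;

import org.apache.lucene.analysis.ASCIIFoldingFilter;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.util.Version;

public class DestinationAnalyser extends Analyzer {

        private final Version luceneVersion;

        public DestinationAnalyser(Version lucene_version) {
                super();
                this.luceneVersion = lucene_version;
        }

        /*
         * (non-Javadoc)
         *
         * @see org.apache.lucene.analysis.Analyzer#tokenStream(java.lang.String,
         * java.io.Reader)
         */
        @Override
        public TokenStream tokenStream(String fieldName, Reader reader) {
                // Tokenize, lowercase, then fold diacritics to their ASCII counterparts
                TokenStream result = new StandardTokenizer(luceneVersion, reader);
                result = new StandardFilter(luceneVersion, result);
                result = new LowerCaseFilter(luceneVersion, result);
                result = new ASCIIFoldingFilter(result);
                return result;
        }
}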


--------- WORKAROUND ---------
To avoid the problem, I'm currently using this method to transform the search text:
        /**
         * Uses {@link ASCIIFoldingFilter} to transform diacritical text to its ascii
         * counterpart
         *
         * @param text
         *          to transform
         * @return ascii text
         */
        public static String foldToASCII(String text) {
                int length = text.length();
                // One input character can fold to several ASCII characters, so the
                // output buffer must hold at least 4 * length characters.
                char[] toReturn = new char[4 * length];
                int foldedLength = ASCIIFoldingFilter.foldToASCII(text.toCharArray(), 0, toReturn, 0, length);
                // Keep only the characters that were actually written
                return new String(toReturn, 0, foldedLength);
        }

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


[jira] [Resolved] (LUCENE-4247) QueryParser doesn't call Analyzer

JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/LUCENE-4247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler resolved LUCENE-4247.
-----------------------------------

    Resolution: Invalid
      Assignee: Uwe Schindler

This is not a bug, because you are using a wildcard query, which cannot use the analyzer (because the analyzer would destroy the wildcards). Without the "*" at the end, this query would be parsed as you expect.

Please ask such questions on the [hidden email] mailing list first; people there will help you with such things.
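
Since the parser leaves the text of a prefix query unanalyzed, the term has to be normalized before it is handed to the parser. The following is only a minimal sketch of that idea using the JDK alone: the class PrefixFolding and the method foldPrefix are hypothetical names, and java.text.Normalizer is swapped in for ASCIIFoldingFilter. The two are not equivalent: Normalizer only strips combining marks, so characters without a decomposition (for example "ł" or "ø") pass through unchanged, whereas ASCIIFoldingFilter folds those as well.

```java
import java.text.Normalizer;
import java.util.Locale;

/** Hypothetical helper, not part of Lucene: normalizes a prefix query string by hand. */
final class PrefixFolding {

    private PrefixFolding() {}

    /**
     * Lowercases the text and strips combining diacritical marks, leaving a
     * trailing '*' untouched so that a query parser would still see a prefix query.
     */
    static String foldPrefix(String searchText) {
        boolean isPrefix = searchText.endsWith("*");
        String term = isPrefix
                ? searchText.substring(0, searchText.length() - 1)
                : searchText;
        // NFD splits an accented letter into its base letter plus combining marks,
        // e.g. U+0159 becomes 'r' followed by U+030C (combining caron).
        String decomposed = Normalizer.normalize(term, Normalizer.Form.NFD);
        // \p{M} matches the combining marks produced by the decomposition.
        String folded = decomposed.replaceAll("\\p{M}+", "").toLowerCase(Locale.ROOT);
        return isPrefix ? folded + "*" : folded;
    }

    public static void main(String[] args) {
        // The search text from the report, written with Unicode escapes
        // (U+0159, U+00ED, U+0161) so the source file stays pure ASCII.
        System.out.println(foldPrefix("p\u0159\u00edLI\u0161*")); // prints "prilis*"
    }
}
```

The folded string can then be passed to QueryParser.parse in place of the raw search text, so the prefix matches the folded terms stored in the index.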
               

> QueryParser doesn't call Analyzer
> ---------------------------------
>
>                 Key: LUCENE-4247
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4247
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/queryparser
>    Affects Versions: 3.6
>            Reporter: Zied Hamdi
>            Assignee: Uwe Schindler
>   Original Estimate: 72h
>  Remaining Estimate: 72h


[jira] [Commented] (LUCENE-4247) QueryParser doesn't call Analyzer

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-4247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13420826#comment-13420826 ]

Uwe Schindler commented on LUCENE-4247:
---------------------------------------

Just to add some background information:
For the Solr QueryParser (see SOLR-2921) there is a new marker, "MultiTermAware", in Solr. The Solr QueryParser can handle that, but Lucene's cannot, because it lacks an IndexSchema. As a result, it does not analyze any MultiTermQueries such as Wildcard, Prefix, Fuzzy, or TermRangeQueries.
Maybe we will port the whole analysis factory infrastructure over to Lucene, and then this might be fixed, but that is not possible at the moment with what's available in Lucene.
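
To make the list of affected query types concrete, here is a rough sketch that classifies a single unquoted query term by the classic query syntax ("test*", "te?t", "roam~0.8", "[a TO b]"). The class QueryTermKind and its methods are hypothetical and are not Lucene's grammar: escaping, quoting, boosts, and field prefixes are all ignored. It only illustrates which inputs would end up as a multi-term query and therefore skip the analyzer.

```java
/** Hypothetical, simplified classifier; not Lucene's actual query grammar. */
final class QueryTermKind {

    enum Kind { RANGE, FUZZY, PREFIX, WILDCARD, PLAIN }

    private QueryTermKind() {}

    /** Classifies one unquoted, unescaped query term. */
    static Kind classify(String term) {
        String t = term.trim();
        boolean inclusive = t.startsWith("[") && t.endsWith("]");
        boolean exclusive = t.startsWith("{") && t.endsWith("}");
        if ((inclusive || exclusive) && t.contains(" TO ")) {
            return Kind.RANGE;                  // e.g. [a TO b] or {a TO b}
        }
        if (t.matches(".+~\\d*(\\.\\d+)?")) {
            return Kind.FUZZY;                  // e.g. roam~ or roam~0.8
        }
        // Find the first wildcard character, if any.
        int firstWildcard = -1;
        for (int i = 0; i < t.length(); i++) {
            char c = t.charAt(i);
            if (c == '*' || c == '?') {
                firstWildcard = i;
                break;
            }
        }
        if (firstWildcard < 0) {
            return Kind.PLAIN;                  // ordinary term
        }
        // A single '*' in the last position is a prefix query; anything else is a wildcard.
        boolean onlyTrailingStar =
                firstWildcard == t.length() - 1 && t.charAt(firstWildcard) == '*';
        return onlyTrailingStar ? Kind.PREFIX : Kind.WILDCARD;
    }

    /** Under the behaviour described above, only plain terms go through the analyzer. */
    static boolean isAnalyzed(String term) {
        return classify(term) == Kind.PLAIN;
    }

    public static void main(String[] args) {
        String[] samples = { "prilis", "prilis*", "pr?lis", "roam~0.8", "[a TO b]" };
        for (String s : samples) {
            System.out.println(s + " -> " + classify(s) + ", analyzed=" + isAnalyzed(s));
        }
    }
}
```

Under this sketch, the search text from the report is classified as PREFIX, which is consistent with the PrefixQuery seen in the debugger.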
               

