[jira] Created: (LUCENE-759) Add n-gram tokenizers to contrib/analyzers

[jira] Created: (LUCENE-759) Add n-gram tokenizers to contrib/analyzers

JIRA jira@apache.org
Add n-gram tokenizers to contrib/analyzers
------------------------------------------

                 Key: LUCENE-759
                 URL: http://issues.apache.org/jira/browse/LUCENE-759
             Project: Lucene - Java
          Issue Type: Improvement
          Components: Analysis
            Reporter: Otis Gospodnetic
            Priority: Minor


It would be nice to have some n-gram-capable tokenizers in contrib/analyzers.  Patch coming shortly.

[jira] Updated: (LUCENE-759) Add n-gram tokenizers to contrib/analyzers

JIRA jira@apache.org
     [ http://issues.apache.org/jira/browse/LUCENE-759?page=all ]

Otis Gospodnetic updated LUCENE-759:
------------------------------------

    Attachment: LUCENE-759.patch

Included:
  NGramTokenizer
  NGramTokenizerTest
  EdgeNGramTokenizer
  EdgeNGramTokenizerTest


> Add n-gram tokenizers to contrib/analyzers
> ------------------------------------------
>
>                 Key: LUCENE-759
>                 URL: http://issues.apache.org/jira/browse/LUCENE-759
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>            Reporter: Otis Gospodnetic
>            Priority: Minor
>         Attachments: LUCENE-759.patch
>
>
> It would be nice to have some n-gram-capable tokenizers in contrib/analyzers.  Patch coming shortly.

[jira] Resolved: (LUCENE-759) Add n-gram tokenizers to contrib/analyzers

JIRA jira@apache.org
     [ http://issues.apache.org/jira/browse/LUCENE-759?page=all ]

Otis Gospodnetic resolved LUCENE-759.
-------------------------------------

    Resolution: Fixed

Unit tests pass, committed.

> Add n-gram tokenizers to contrib/analyzers
> ------------------------------------------
>
>                 Key: LUCENE-759
>                 URL: http://issues.apache.org/jira/browse/LUCENE-759
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>            Reporter: Otis Gospodnetic
>            Priority: Minor
>         Attachments: LUCENE-759.patch
>
>
> It would be nice to have some n-gram-capable tokenizers in contrib/analyzers.  Patch coming shortly.

Reopened: (LUCENE-759) Add n-gram tokenizers to contrib/analyzers

JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/LUCENE-759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Otis Gospodnetic reopened LUCENE-759:
-------------------------------------

    Lucene Fields: [New, Patch Available]  (was: [New])

Reopening, because I'm bringing in Adam Hiatt's modifications that he uploaded in a patch for SOLR-81.  Adam's changes allow this tokenizer to create n-grams whose sizes are specified as a min-max range.

This patch fixes a bug in Adam's code, but introduces another bug that I don't know how to fix yet.
Adam's bug:
  input: abcde
  minGram: 1
  maxGram: 3
  output: a ab abc  -- tokenizing stopped here, which was wrong; it should have continued: b bc bcd c cd cde d de e

Otis' bug:
  input: abcde
  minGram: 1
  maxGram: 3
  output: e de cde d cd bcd c bc abc b ab -- tokenizing stops here, which is wrong; it should generate one more n-gram: a

This bug won't hurt SOLR-81, but it should be fixed.
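
For reference, a minimal standalone sketch (not the patch itself, just the intended behavior) of the enumeration a fixed NGramTokenizer should perform -- every gram of every size in the min-max range, at every start position:

    // Sketch only: enumerate all n-grams of "abcde" for sizes 1..3,
    // grouped by start position, matching the expected output above.
    public class NGramEnumerationSketch {
      public static void main(String[] args) {
        String input = "abcde";
        int minGram = 1, maxGram = 3;
        StringBuffer out = new StringBuffer();
        for (int start = 0; start < input.length(); start++) {
          for (int len = minGram;
               len <= maxGram && start + len <= input.length(); len++) {
            out.append(input.substring(start, start + len)).append(' ');
          }
        }
        System.out.println(out.toString().trim());
        // prints: a ab abc b bc bcd c cd cde d de e
      }
    }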

> Add n-gram tokenizers to contrib/analyzers
> ------------------------------------------
>
>                 Key: LUCENE-759
>                 URL: https://issues.apache.org/jira/browse/LUCENE-759
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>            Reporter: Otis Gospodnetic
>            Priority: Minor
>         Attachments: LUCENE-759.patch
>
>
> It would be nice to have some n-gram-capable tokenizers in contrib/analyzers.  Patch coming shortly.

Updated: (LUCENE-759) Add n-gram tokenizers to contrib/analyzers

JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/LUCENE-759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Otis Gospodnetic updated LUCENE-759:
------------------------------------

    Attachment: LUCENE-759.patch

The modified tokenizer and the extended unit test.

> Add n-gram tokenizers to contrib/analyzers
> ------------------------------------------
>
>                 Key: LUCENE-759
>                 URL: https://issues.apache.org/jira/browse/LUCENE-759
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>            Reporter: Otis Gospodnetic
>            Priority: Minor
>         Attachments: LUCENE-759.patch, LUCENE-759.patch
>
>
> It would be nice to have some n-gram-capable tokenizers in contrib/analyzers.  Patch coming shortly.

Commented: (LUCENE-759) Add n-gram tokenizers to contrib/analyzers

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12473851 ]

Adam Hiatt commented on LUCENE-759:
-----------------------------------

Otis: this really isn't a bug. The min/max gram code I added applies only to the EdgeNGramTokenizer, and I only want to generate _edge_ n-grams within the range of sizes provided.

For example, with the EdgeNGramTokenizer:
  input: abcde
  minGram: 1
  maxGram: 3

'a ab abc' is in fact what I intended to produce.

I think it makes more sense for the functionality to which you referred to be located in NGramTokenizer.
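
To make the contrast concrete, a standalone sketch of the edge-n-gram semantics described above -- grams are anchored at the front edge of the input, so only sizes minGram..maxGram starting at position 0 are produced:

    // Sketch only: front-edge n-grams of "abcde" for sizes 1..3.
    public class EdgeNGramSketch {
      public static void main(String[] args) {
        String input = "abcde";
        int minGram = 1, maxGram = 3;
        for (int len = minGram; len <= maxGram && len <= input.length(); len++) {
          System.out.print(input.substring(0, len) + " ");
        }
        // prints: a ab abc
      }
    }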


> Add n-gram tokenizers to contrib/analyzers
> ------------------------------------------
>
>                 Key: LUCENE-759
>                 URL: https://issues.apache.org/jira/browse/LUCENE-759
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>            Reporter: Otis Gospodnetic
>            Priority: Minor
>         Attachments: LUCENE-759.patch, LUCENE-759.patch
>
>
> It would be nice to have some n-gram-capable tokenizers in contrib/analyzers.  Patch coming shortly.

Commented: (LUCENE-759) Add n-gram tokenizers to contrib/analyzers

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12473891 ]

Otis Gospodnetic commented on LUCENE-759:
-----------------------------------------

Damn, I think you are right! :)  Once again, I'm making late-night mistakes.  When will I learn!?
But I could move my code to NGramTokenizer then, at least.
My bug remains, though... got an idea for a fix?



> Add n-gram tokenizers to contrib/analyzers
> ------------------------------------------
>
>                 Key: LUCENE-759
>                 URL: https://issues.apache.org/jira/browse/LUCENE-759
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>            Reporter: Otis Gospodnetic
>            Priority: Minor
>         Attachments: LUCENE-759.patch, LUCENE-759.patch
>
>
> It would be nice to have some n-gram-capable tokenizers in contrib/analyzers.  Patch coming shortly.

Updated: (LUCENE-759) Add n-gram tokenizers to contrib/analyzers

JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/LUCENE-759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Otis Gospodnetic updated LUCENE-759:
------------------------------------

    Attachment: LUCENE-759.patch

Here is the proper version.  This one is essentially Adam's patch from SOLR-81, adapted to the Lucene n-gram analyzers, plus some passing unit tests I wrote to exercise the new n-gram range functionality.

I'll commit this by the end of the week unless Adam spots a bug.


> Add n-gram tokenizers to contrib/analyzers
> ------------------------------------------
>
>                 Key: LUCENE-759
>                 URL: https://issues.apache.org/jira/browse/LUCENE-759
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>            Reporter: Otis Gospodnetic
>            Priority: Minor
>         Attachments: LUCENE-759.patch, LUCENE-759.patch, LUCENE-759.patch
>
>
> It would be nice to have some n-gram-capable tokenizers in contrib/analyzers.  Patch coming shortly.

Resolved: (LUCENE-759) Add n-gram tokenizers to contrib/analyzers

JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/LUCENE-759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Otis Gospodnetic resolved LUCENE-759.
-------------------------------------

       Resolution: Fixed
    Fix Version/s: 2.2
         Assignee: Otis Gospodnetic
    Lucene Fields: [New, Patch Available]  (was: [Patch Available, New])

In SVN.


> Add n-gram tokenizers to contrib/analyzers
> ------------------------------------------
>
>                 Key: LUCENE-759
>                 URL: https://issues.apache.org/jira/browse/LUCENE-759
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>            Reporter: Otis Gospodnetic
>         Assigned To: Otis Gospodnetic
>            Priority: Minor
>             Fix For: 2.2
>
>         Attachments: LUCENE-759.patch, LUCENE-759.patch, LUCENE-759.patch
>
>
> It would be nice to have some n-gram-capable tokenizers in contrib/analyzers.  Patch coming shortly.

Commented: (LUCENE-759) Add n-gram tokenizers to contrib/analyzers

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12477115 ]

Doron Cohen commented on LUCENE-759:
------------------------------------

I have two comments/questions on the n-gram tokenizers:

(1) It seems that only the first 1024 characters of the input are handled and the rest is ignored (and I think, as a result, the input stream would remain dangling open).

If you add this test case:

    /**
     * Test that no n-grams are lost, even for really long inputs.
     * @throws Exception
     */
    public void testLongerInput() throws Exception {
      int expectedNumTokens = 1024;
      int ngramLength = 2;
      // prepare a long string of 'a's: expectedNumTokens + ngramLength - 1 chars
      StringBuffer sb = new StringBuffer();
      while (sb.length() < expectedNumTokens + ngramLength - 1)
        sb.append('a');

      StringReader longStringReader = new StringReader(sb.toString());
      NGramTokenizer tokenizer = new NGramTokenizer(longStringReader, ngramLength, ngramLength);
      int numTokens = 0;
      Token token;
      while ((token = tokenizer.next()) != null) {
        numTokens++;
        assertEquals("aa", token.termText());
      }
      assertEquals("wrong number of tokens", expectedNumTokens, numTokens);
    }

With expectedNumTokens = 1023 it would pass, but any larger number would fail.

(2) It seems safer to read the characters like this:
            int n = input.read(chars);
            inStr = new String(chars, 0, n);
(This way we are not counting on String.trim(), which does work, but worries me.)
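
Going a step further, a sketch of how the whole input could be drained (assuming this runs inside the tokenizer, where input is the inherited Reader field); this would also remove the 1024-character cap:

    // Sketch only: read the Reader to EOF instead of doing a single
    // 1024-char read, so longer inputs are not silently truncated.
    char[] chars = new char[1024];
    StringBuffer buffered = new StringBuffer();
    int n;
    while ((n = input.read(chars)) != -1) {  // Reader.read returns -1 at EOF
      buffered.append(chars, 0, n);
    }
    String inStr = buffered.toString();      // exact length; no trim() needed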



> Add n-gram tokenizers to contrib/analyzers
> ------------------------------------------
>
>                 Key: LUCENE-759
>                 URL: https://issues.apache.org/jira/browse/LUCENE-759
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>            Reporter: Otis Gospodnetic
>         Assigned To: Otis Gospodnetic
>            Priority: Minor
>             Fix For: 2.2
>
>         Attachments: LUCENE-759.patch, LUCENE-759.patch, LUCENE-759.patch
>
>
> It would be nice to have some n-gram-capable tokenizers in contrib/analyzers.  Patch coming shortly.

Reopened: (LUCENE-759) Add n-gram tokenizers to contrib/analyzers

JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/LUCENE-759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Otis Gospodnetic reopened LUCENE-759:
-------------------------------------


More goodies coming.

> Add n-gram tokenizers to contrib/analyzers
> ------------------------------------------
>
>                 Key: LUCENE-759
>                 URL: https://issues.apache.org/jira/browse/LUCENE-759
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>            Reporter: Otis Gospodnetic
>         Assigned To: Otis Gospodnetic
>            Priority: Minor
>             Fix For: 2.2
>
>         Attachments: LUCENE-759.patch, LUCENE-759.patch, LUCENE-759.patch
>
>
> It would be nice to have some n-gram-capable tokenizers in contrib/analyzers.  Patch coming shortly.

Updated: (LUCENE-759) Add n-gram tokenizers to contrib/analyzers

JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/LUCENE-759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Otis Gospodnetic updated LUCENE-759:
------------------------------------

    Attachment: LUCENE-759-filters.patch

N-gram-producing TokenFilters for Karl's mom.


> Add n-gram tokenizers to contrib/analyzers
> ------------------------------------------
>
>                 Key: LUCENE-759
>                 URL: https://issues.apache.org/jira/browse/LUCENE-759
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>            Reporter: Otis Gospodnetic
>         Assigned To: Otis Gospodnetic
>            Priority: Minor
>             Fix For: 2.2
>
>         Attachments: LUCENE-759-filters.patch, LUCENE-759.patch, LUCENE-759.patch, LUCENE-759.patch
>
>
> It would be nice to have some n-gram-capable tokenizers in contrib/analyzers.  Patch coming shortly.

Commented: (LUCENE-759) Add n-gram tokenizers to contrib/analyzers

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12477593 ]

Doron Cohen commented on LUCENE-759:
------------------------------------

Hi Otis,

>  (and I think, as a result, the input stream would remain dangling open)

I take this part back - closing tokenStream would close the reader, and at least for the case that I thought of, invertDocument, the tokenStream is properly closed.

Can you comment on the input length: is it correct to handle only the first 1024 characters?

Thanks,
Doron

> Add n-gram tokenizers to contrib/analyzers
> ------------------------------------------
>
>                 Key: LUCENE-759
>                 URL: https://issues.apache.org/jira/browse/LUCENE-759
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>            Reporter: Otis Gospodnetic
>         Assigned To: Otis Gospodnetic
>            Priority: Minor
>             Fix For: 2.2
>
>         Attachments: LUCENE-759-filters.patch, LUCENE-759.patch, LUCENE-759.patch, LUCENE-759.patch
>
>
> It would be nice to have some n-gram-capable tokenizers in contrib/analyzers.  Patch coming shortly.

Commented: (LUCENE-759) Add n-gram tokenizers to contrib/analyzers

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12477649 ]

Otis Gospodnetic commented on LUCENE-759:
-----------------------------------------

Ah, I didn't see your comments here earlier, Doron.  Yes, I think you are correct about the 1024 limit - when I wrote that Tokenizer I was thinking TokenFilter, and thus thinking that the input Reader represents a single Token, which was wrong.  So I thought, "oh, 1024 chars/token, that will be enough".  I ended up needing TokenFilters for SOLR-81, so that's what I checked in.  Those operate on tokens and don't have the 1024 limitation.

Anyhow, feel free to slap your test + the fix in and thanks for checking!
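
For context, a hedged usage sketch of the filter approach; the NGramTokenFilter(TokenStream, int, int) constructor is assumed from the LUCENE-759-filters patch, so adjust if the committed signature differs. Because the filter n-grams each incoming token rather than a single Reader-sized buffer, the 1024-character cap does not apply here:

    // Sketch only: wrap an upstream TokenStream and emit 2..3-grams
    // of each token it produces.
    import java.io.StringReader;
    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.WhitespaceTokenizer;
    import org.apache.lucene.analysis.ngram.NGramTokenFilter;

    public class NGramFilterSketch {
      public static void main(String[] args) throws Exception {
        TokenStream ts = new NGramTokenFilter(
            new WhitespaceTokenizer(new StringReader("abcde fghij")), 2, 3);
        for (Token t = ts.next(); t != null; t = ts.next()) {
          System.out.println(t.termText());  // the n-grams of each input token
        }
        ts.close();
      }
    }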


> Add n-gram tokenizers to contrib/analyzers
> ------------------------------------------
>
>                 Key: LUCENE-759
>                 URL: https://issues.apache.org/jira/browse/LUCENE-759
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>            Reporter: Otis Gospodnetic
>         Assigned To: Otis Gospodnetic
>            Priority: Minor
>             Fix For: 2.2
>
>         Attachments: LUCENE-759-filters.patch, LUCENE-759.patch, LUCENE-759.patch, LUCENE-759.patch
>
>
> It would be nice to have some n-gram-capable tokenizers in contrib/analyzers.  Patch coming shortly.

Commented: (LUCENE-759) Add n-gram tokenizers to contrib/analyzers

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12479221 ]

Patrick Turcotte commented on LUCENE-759:
-----------------------------------------

Is it just me, or are the NGramTokenFilter and EdgeNGramTokenFilter classes not committed to SVN, and not in the patch either?

NGramTokenFilterTest and EdgeNGramTokenFilterTest refer to them, but I cannot seem to find them.

Thanks, and keep up the good work.

Patrick

> Add n-gram tokenizers to contrib/analyzers
> ------------------------------------------
>
>                 Key: LUCENE-759
>                 URL: https://issues.apache.org/jira/browse/LUCENE-759
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>            Reporter: Otis Gospodnetic
>         Assigned To: Otis Gospodnetic
>            Priority: Minor
>             Fix For: 2.2
>
>         Attachments: LUCENE-759-filters.patch, LUCENE-759.patch, LUCENE-759.patch, LUCENE-759.patch
>
>
> It would be nice to have some n-gram-capable tokenizers in contrib/analyzers.  Patch coming shortly.

Commented: (LUCENE-759) Add n-gram tokenizers to contrib/analyzers

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12479378 ]

Hoss Man commented on LUCENE-759:
---------------------------------

Otis's most recent attachment contains only tests ... but previous attachments had implementations, all of which have been committed under contrib/analyzers.

(Tip: if you click "All" at the top of the list of comments in Jira, you see every modification related to this issue, including subversion commits that Jira detects are related to the issue based on the commit message.)

> Add n-gram tokenizers to contrib/analyzers
> ------------------------------------------
>
>                 Key: LUCENE-759
>                 URL: https://issues.apache.org/jira/browse/LUCENE-759
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>            Reporter: Otis Gospodnetic
>         Assigned To: Otis Gospodnetic
>            Priority: Minor
>             Fix For: 2.2
>
>         Attachments: LUCENE-759-filters.patch, LUCENE-759.patch, LUCENE-759.patch, LUCENE-759.patch
>
>
> It would be nice to have some n-gram-capable tokenizers in contrib/analyzers.  Patch coming shortly.

Re: Commented: (LUCENE-759) Add n-gram tokenizers to contrib/analyzers

Patrek
> Otis's most recent attachment contains only tests ... but previous
> attachments had implementations.

I just took another look: nope, no "*Filter" class (except for the tests) in
those patches.

> all of which have been committed under contrib/analyzers

I just did a fresh checkout from svn, and the filter classes are absent there
too. I did a full-text search: they are referred to in the
contrib/analyzers/src/test tests, but not found in the
contrib/analyzers/src/java folder.

It seems the filters were not committed.

> (tip: if you click "All" at the top of the list of comments in Jira, you see
> every modification related to this issue, including subversion commits that
> Jira detects are related to the issue based on the commit message)

Thanks for the tip. It seems to confirm my doubts, as there is no "ADD" or
"MODIFY" for any TokenFilter. But the tests are present.

Again, I may be missing something somewhere, but I can't seem to find the
NGramTokenFilter class.

Thanks.

Patrick
Commented: (LUCENE-759) Add n-gram tokenizers to contrib/analyzers

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12479441 ]

Hoss Man commented on LUCENE-759:
---------------------------------

Ack! ... I'm sorry, I completely misread Patrick's question.

The n-gram *Tokenizers* have been committed -- but there are no n-gram TokenFilters ... there are tests for the TokenFilters that Otis committed on March 2, but those tests don't currently compile/run without the TokenFilters themselves.

Otis: do you have these TokenFilters in your working directory that you perhaps forgot to svn add before committing?

> Add n-gram tokenizers to contrib/analyzers
> ------------------------------------------
>
>                 Key: LUCENE-759
>                 URL: https://issues.apache.org/jira/browse/LUCENE-759
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>            Reporter: Otis Gospodnetic
>         Assigned To: Otis Gospodnetic
>            Priority: Minor
>             Fix For: 2.2
>
>         Attachments: LUCENE-759-filters.patch, LUCENE-759.patch, LUCENE-759.patch, LUCENE-759.patch
>
>
> It would be nice to have some n-gram-capable tokenizers in contrib/analyzers.  Patch coming shortly.

Commented: (LUCENE-759) Add n-gram tokenizers to contrib/analyzers

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12480236 ]

Otis Gospodnetic commented on LUCENE-759:
-----------------------------------------

Oh, look at that!

[otis@localhost contrib]$ svn st
A      analyzers/src/java/org/apache/lucene/analysis/ngram/EdgeNGramTokenFilter.java
A      analyzers/src/java/org/apache/lucene/analysis/ngram/NGramTokenFilter.java

:)
It's in the repo now.  Sorry about that!


> Add n-gram tokenizers to contrib/analyzers
> ------------------------------------------
>
>                 Key: LUCENE-759
>                 URL: https://issues.apache.org/jira/browse/LUCENE-759
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>            Reporter: Otis Gospodnetic
>         Assigned To: Otis Gospodnetic
>            Priority: Minor
>             Fix For: 2.2
>
>         Attachments: LUCENE-759-filters.patch, LUCENE-759.patch, LUCENE-759.patch, LUCENE-759.patch
>
>
> It would be nice to have some n-gram-capable tokenizers in contrib/analyzers.  Patch coming shortly.

Re: Commented: (LUCENE-759) Add n-gram tokenizers to contrib/analyzers

Otis Gospodnetic-2
My bad - I somehow forgot to commit "the meat".  It's in now.

Otis
 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.simpy.com/  -  Tag  -  Search  -  Share
