WordDelimiterFilter ignores payloads

Tricia Williams-2
Hi,

    When a WordDelimiterFilter ingests a token stream and creates a new
token (newTok) it appears to copy most of the old token attributes,
except the payload.  I believe this is a bug.  My solution is for the
WordDelimiterFilter to use the Token clone() method to create a carbon
copy and then modify the appropriate attributes (offsets and term
text).  I'm prepared to open a JIRA issue and submit a patch if others
agree with my solution.
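
As a rough sketch of that clone-then-modify idea: SketchToken below is a hypothetical stand-in, not Lucene's Token class, and its field names and the subToken helper are assumptions made purely for illustration.

```java
// Hypothetical stand-in for a token; NOT Lucene's Token class.
class SketchToken implements Cloneable {
    String term;
    int startOffset;
    int endOffset;
    String type;
    byte[] payload; // may be null

    SketchToken(String term, int startOffset, int endOffset, String type, byte[] payload) {
        this.term = term;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
        this.type = type;
        this.payload = payload;
    }

    @Override
    public SketchToken clone() {
        try {
            // Copies every field, so nothing (such as the payload) is forgotten.
            SketchToken copy = (SketchToken) super.clone();
            if (payload != null) {
                copy.payload = payload.clone(); // avoid sharing the byte array
            }
            return copy;
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e);
        }
    }
}

class CloneSubwordSketch {
    // Build a sub-token for orig.term[from, to): clone first, then
    // override only the attributes that differ (term text and offsets).
    static SketchToken subToken(SketchToken orig, int from, int to) {
        SketchToken newTok = orig.clone();
        newTok.term = orig.term.substring(from, to);
        newTok.startOffset = orig.startOffset + from;
        newTok.endOffset = orig.startOffset + to;
        return newTok;
    }
}
```

The point of starting from a clone is that any attribute added to the token later is carried over by default, rather than needing to be copied by hand in every filter.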

Tricia

Re: WordDelimiterFilter ignores payloads

Yonik Seeley-2
On Thu, Apr 3, 2008 at 11:46 AM, Tricia Williams
<[hidden email]> wrote:
>    When a WordDelimiterFilter ingests a token stream and creates a new token
> (newTok) it appears to copy most of the old token attributes, except the
> payload.  I believe this is a bug.  My solution is for the
> WordDelimiterFilter to use the Token clone() method to create a carbon copy
> and then modify the appropriate attributes (offsets and term text).  I'm
> prepared to open a JIRA issue and submit a patch if others agree with my
> solution.

That sounds right.  Are there types of payloads that we wouldn't want
duplicated for things like WDF or synonyms?

-Yonik

Re: WordDelimiterFilter ignores payloads

Tricia Williams-2
Yonik Seeley wrote:

> That sounds right.  Are there types of payloads that we wouldn't want
> duplicated for things like WDF or synonyms?

I can't think of any type of payload that we wouldn't want to duplicate,
which is why I proposed this solution.  I suppose another option would be
to make this behaviour configurable via schema.xml if such a case exists,
but I still think the default behaviour should be to copy the payload.
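
If it were made configurable, the switch might look something like the minimal sketch below. The copyPayload name and the PayloadPolicySketch class are hypothetical; no such option is known to exist in WordDelimiterFilter or schema.xml.

```java
// Hypothetical policy object; the flag name is an assumption for illustration.
class PayloadPolicySketch {
    final boolean copyPayload;

    // Default: payloads are copied onto generated tokens.
    PayloadPolicySketch() {
        this(true);
    }

    PayloadPolicySketch(boolean copyPayload) {
        this.copyPayload = copyPayload;
    }

    // Returns the payload a newly generated token should carry.
    byte[] payloadFor(byte[] original) {
        if (!copyPayload || original == null) {
            return null; // copying disabled, or nothing to copy
        }
        return original.clone(); // independent copy of the bytes
    }
}
```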

Tricia

Re: WordDelimiterFilter ignores payloads

hossman

: suppose another option would be to make this behaviour configurable via
: schema.xml if such a case exists, but  I still think the default behaviour
: should have the payload copied.

+1

-Hoss