Snippet Generation at Punctuation Marks?


Snippet Generation at Punctuation Marks?

Jack L
Snippet generation uses hl.fragsize to determine the size
of the snippets. This works very well. However, the snippets
often have half of a sentence at the beginning, and half
at the end. Is there a parameter I can use to tell the
snippet generation code to cut at punctuation marks when
possible?

--
Best regards,
Jack
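For readers unfamiliar with the parameter, here is a minimal sketch of how a highlighting request with hl.fragsize might be built. The host, handler path, and the field name "content" are assumptions for illustration only; hl, hl.fl, hl.fragsize, and hl.snippets are standard Solr highlighting parameters.

```python
from urllib.parse import urlencode


def build_highlight_query(base_url, query, field, fragsize=100, snippets=1):
    """Build a Solr select URL that asks for highlighted snippets.

    base_url and field are hypothetical placeholders; adjust them
    to match your own Solr instance and schema.
    """
    params = {
        "q": query,
        "hl": "true",             # turn highlighting on
        "hl.fl": field,           # field(s) to generate snippets from
        "hl.fragsize": fragsize,  # approximate snippet size in characters
        "hl.snippets": snippets,  # maximum snippets per field
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical local instance and field name.
url = build_highlight_query(
    "http://localhost:8983/solr/select",
    "snippet generation",
    "content",
    fragsize=150,
)
print(url)
```

Note that hl.fragsize only controls the snippet length; it says nothing about where the cut falls, which is why snippets can start or end mid-sentence.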


Re: Snippet Generation at Punctuation Marks?

Brian Whitman
On May 3, 2007, at 11:39 AM, Jack L wrote:
> Snippet generation use hl.fragsize to determine the size
> of the snippets. This works very well. However, the snippets
> often have half of a sentence at the beginning, and half
> at the end. Is there a parameter I can use to tell the
> snippet generation code to cut at punctuation marks when
> possible?


We are working on this and hope to have a Solr patch soon. Simple
splitting on punctuation requires a new fragmenter, which trunk Solr
does not support yet. But we're hoping to fix that ASAP.

-brian

Re[2]: Snippet Generation at Punctuation Marks?

Jack L
Thanks. Looking forward to it!

> We are working on this and hope to have a solr patch soon. Doing  
> simple splitting on punctuation is a new fragmenter, which trunk solr
> does not support yet. But we're hoping to fix that asap.

> -brian


Re: Snippet Generation at Punctuation Marks?

Mike Klaas
In reply to this post by Brian Whitman
On 5/3/07, Brian Whitman <[hidden email]> wrote:

> On May 3, 2007, at 11:39 AM, Jack L wrote:
> > Snippet generation use hl.fragsize to determine the size
> > of the snippets. This works very well. However, the snippets
> > often have half of a sentence at the beginning, and half
> > at the end. Is there a parameter I can use to tell the
> > snippet generation code to cut at punctuation marks when
> > possible?
>
>
> We are working on this and hope to have a solr patch soon. Doing
> simple splitting on punctuation is a new fragmenter, which trunk solr
> does not support yet. But we're hoping to fix that asap.

See http://issues.apache.org/jira/browse/SOLR-102 for my solution to
this problem. The idea is that you'd like to split at sentence
boundaries, but also not stray too far from the desired fragment size.
It would be great to get comments on/improvements to this approach.

-Mike
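
The idea Mike describes (cut at sentence boundaries, but stay close to the desired fragment size) can be sketched roughly as follows. This is not the SOLR-102 patch, just an illustration under stated assumptions: the function name, the "slop" parameter, and the regex used to detect sentence ends are all made up for this example.

```python
import re

# Assumption: a sentence ends at ., ! or ? followed by whitespace or end of text.
SENTENCE_END = re.compile(r"[.!?]+(?=\s|$)")


def fragment(text, fragsize=100, slop=0.5):
    """Split text into fragments of roughly fragsize characters.

    A cut is placed at the sentence boundary closest to the target size,
    provided it lies within fragsize * (1 +/- slop). If no boundary is in
    that window, fall back to the last space before the target size.
    """
    boundaries = [m.end() for m in SENTENCE_END.finditer(text)]
    fragments = []
    start, n = 0, len(text)
    while start < n:
        if n - start <= fragsize * (1 + slop):
            # The remainder fits in the allowed window: emit it whole.
            fragments.append(text[start:].strip())
            break
        target = start + fragsize
        lo = max(start + 1, start + int(fragsize * (1 - slop)))
        hi = start + int(fragsize * (1 + slop))
        candidates = [b for b in boundaries if lo <= b <= hi]
        if candidates:
            # Prefer the sentence boundary nearest the desired size.
            cut = min(candidates, key=lambda b: abs(b - target))
        else:
            # No sentence end nearby: cut at a word break, else hard cut.
            space = text.rfind(" ", start + 1, target)
            cut = space if space != -1 else target
        fragments.append(text[start:cut].strip())
        start = cut
    return [f for f in fragments if f]


sample = (
    "Solr highlights matches. It cuts fragments by size. "
    "Sentences get split in odd places. That reads badly."
)
for piece in fragment(sample, fragsize=50, slop=0.5):
    print(repr(piece))
```

The slop value is the trade-off knob: a larger value makes it more likely that a sentence boundary is found, at the cost of fragments that stray further from the requested size.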