Anton,
I think there are at least a couple of ways of doing this. I assume you
have a program that does sentence detection already, as Lucene does not
provide this. If not, I am sure a search of the web will find one that
has high accuracy.
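If you just need something to get started, the JDK's own java.text.BreakIterator can split text into sentences. It is rule-based and can stumble on abbreviations, so treat the sketch below as a rough first pass, not the high-accuracy detector mentioned above. The names SentenceSplitter and split are made up for illustration:

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Lucene: splits text into sentences
// using only the JDK's rule-based BreakIterator.
final class SentenceSplitter {

    static List<String> split(String text) {
        List<String> sentences = new ArrayList<>();
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(text);

        // Walk consecutive boundary offsets; each [start, end) pair is one sentence.
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }
}
```

Each returned sentence could then be handed to whichever of the approaches below you choose.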
You can:
1. Index each sentence as a separate Document. You will need a field on
the Document relating it back to the overall file so you can reconstruct it.
2. As you index, insert sentence/paragraph boundary markers into your
index and then use the SpanQuery classes to keep matches within a single
sentence or paragraph. Search this mail archive for "sentence boundary
detection" and "SpanQuery" (try the dev list too). I think there was a
discussion between me, Doug and Hoss on how to do this.
3. Search as you do now, then post-process each hit to work out which
sentence it came from. This is less efficient, since the text of every
hit has to be scanned again, but I don't know what your performance
requirements are, so it may work for you.
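To make option 3 a bit more concrete, here is a minimal sketch in plain Java. It assumes you already have the text of a matching file in hand (from a stored field or from disk) and that the JDK's BreakIterator is good enough as a sentence detector. SentenceLookup and sentencesContainingAll are hypothetical names, and nothing here touches the Lucene API:

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical post-processing helper: given the text of a hit, return
// the sentences in which every search term occurs as a whole word.
final class SentenceLookup {

    static List<String> sentencesContainingAll(String text, String... terms) {
        // One case-insensitive whole-word pattern per term.
        List<Pattern> patterns = new ArrayList<>();
        for (String term : terms) {
            patterns.add(Pattern.compile("\\b" + Pattern.quote(term) + "\\b",
                    Pattern.CASE_INSENSITIVE));
        }

        List<String> matches = new ArrayList<>();
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(text);

        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String sentence = text.substring(start, end).trim();
            if (sentence.isEmpty()) {
                continue;
            }
            // Keep the sentence only if all terms are found in it.
            boolean allFound = true;
            for (Pattern p : patterns) {
                if (!p.matcher(sentence).find()) {
                    allFound = false;
                    break;
                }
            }
            if (allFound) {
                matches.add(sentence);
            }
        }
        return matches;
    }
}
```

Calling sentencesContainingAll(text, "lucene", "span") would return only those sentences that contain both words, which is the "word pair in a sentence" case. The cost is that every hit gets split and scanned again at query time.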
There are probably other ways too.
anton feldmann wrote:
> I want to search for a word or a word pair within a sentence or a
> paragraph, and have the whole sentence returned as the result. I assume
> I need to extend Lucene to make this possible. But where do I start? I
> have no idea how I would have to change the index file, or whether such
> a change would be acceptable to the Lucene team.
>
> cheers
>
> anton feldmann
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>
--
Grant Ingersoll
Sr. Software Engineer
Center for Natural Language Processing
Syracuse University
School of Information Studies
335 Hinds Hall
Syracuse, NY 13244
http://www.cnlp.org
Voice: 315-443-5484
Fax: 315-443-6886