Word / Phrase match shown in a context

DURGA DEEP
Dear All,

I've been scouring the Lucene classes. Are there any classes that can
help me achieve the following?

We are an e-mail service provider, and we want to offer search over
e-mail messages via Lucene. So far we are able to parse and index the
e-mail, create the appropriate indexes, etc.

Now the customer wants a Google-like search capability: when they
search for a particular word, the word should be highlighted, and the
surrounding text (the context in which the word occurs) should also be
shown.

Example, when searching for the word "thread":

    ...crawler is a classic example of Thread in a poolExecutor code...

Any help greatly appreciated
+ddt
Re: Word / Phrase match shown in a context

Mark Miller-3
Look at the Highlighter in contrib. It creates fragments (context) and
highlights search terms in them (keywords).
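
To make the idea of "fragments plus highlighted keywords" concrete, here is a minimal plain-Java sketch. It is not the contrib Highlighter and uses no Lucene classes; the class name, the method name, the fixed character window and the <B> tags are all assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain-Java illustration only; this is NOT the Lucene contrib Highlighter.
final class ContextSnippetSketch {

    /**
     * Returns one snippet per case-insensitive whole-word match of term.
     * Each snippet keeps up to window characters of context on either side
     * of the match and wraps the match itself in <B>...</B>.
     */
    static List<String> snippets(String text, String term, int window) {
        List<String> out = new ArrayList<>();
        Pattern p = Pattern.compile("\\b" + Pattern.quote(term) + "\\b",
                Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(text);
        while (m.find()) {
            int from = Math.max(0, m.start() - window);
            int to = Math.min(text.length(), m.end() + window);
            StringBuilder sb = new StringBuilder();
            if (from > 0) {
                sb.append("...");            // text was cut on the left
            }
            sb.append(text, from, m.start());
            sb.append("<B>").append(m.group()).append("</B>");
            sb.append(text, m.end(), to);
            if (to < text.length()) {
                sb.append("...");            // text was cut on the right
            }
            out.add(sb.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        String body = "The crawler is a classic example of Thread in a poolExecutor code";
        for (String s : snippets(body, "thread", 20)) {
            System.out.println(s);
        }
    }
}
```

The real Highlighter works from the analyzed token stream and scores fragments rather than cutting a fixed window around each regex match, so treat this only as a picture of the output shape.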

If you want to highlight phrases correctly, check out this issue, which
adds support for Spans and PhraseQuery:

https://issues.apache.org/jira/browse/LUCENE-794

Mark



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: Word / Phrase match shown in a context

DURGA DEEP
I have a follow-up question. It seems that if I want to use highlighting,
we have to store the entire content of each document that is indexed:

    d.add(new Field(FIELD_NAME, "some text", Field.Store.YES, Field.Index.TOKENIZED));

Are there better ways of achieving this? We have a huge amount of data
that needs to be indexed.

  Thanks Much
_ddt

Re: Word / Phrase match shown in a context

Mark Miller-3
You don't necessarily need to store the data in Lucene, but yes, it does
need to be stored somewhere. Otherwise, where would the context come
from? If you are not stripping stopwords or stemming or lowercasing or
anything, I suppose you could rebuild it from the index...

To avoid having to retokenize, check out the TokenSources class, which
lets you use TermVectors to rebuild the TokenStream. You still need the
original text to fragment and highlight, though. Whether you pull that
text from a database, the filesystem, or Lucene does not matter to the
highlighter.
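
A small plain-Java sketch of why the original text is still required: token positions alone say where the matches are, but the characters have to come from somewhere. The offsets array below is a hypothetical stand-in for the character offsets a term vector can record, and the class and method names are made up for illustration; no Lucene API is used.

```java
// Plain-Java illustration only; the offsets stand in for positions that
// would otherwise be recovered from the index. No Lucene classes are used.
final class OffsetHighlightSketch {

    /**
     * Wraps each [start, end) character range of original in <B>...</B>.
     * Assumes the ranges are sorted by start and do not overlap.
     */
    static String highlight(String original, int[][] offsets) {
        StringBuilder sb = new StringBuilder();
        int pos = 0;
        for (int[] o : offsets) {
            sb.append(original, pos, o[0]);   // untouched text before the match
            sb.append("<B>").append(original, o[0], o[1]).append("</B>");
            pos = o[1];
        }
        sb.append(original, pos, original.length()); // trailing text
        return sb.toString();
    }

    public static void main(String[] args) {
        // The text could equally come from a database or the filesystem.
        String text = "Thread in a thread pool";
        int[][] matches = { {0, 6}, {12, 18} };
        System.out.println(highlight(text, matches));
    }
}
```

Nothing in the method cares where the string came from, which is the point made above: the storage location of the text is independent of the highlighting step.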

- Mark

Having 2 fields, each using different analyzers?

Itamar Syn-Hershko
Hi all,

Since the Analyzer is set per IndexWriter, and a Document with several
fields is added to that writer, I was wondering how I would store two
different fields in a Document, each passed through a different Analyzer.
The idea is to have two fields with the same content, one stemmed and one
not, both tokenized and not stored. The rationale is Hebrew indexing (yes,
again :) ).

Itamar.



RE: Having 2 fields, each using different analyzers?

steve_rowe
Hi Itamar,

On 01/31/2008 at 6:28 PM, Itamar Syn-Hershko wrote:
> Since Analyzer is set per IndexWriter, which is being added a Document,
> which has several fields, I was wondering how would I store 2 different
> fields in a Document, each being passed through a different Analyzer?
> The idea is to have 2 fields of the same content, one stemmed and one is
> not, both are tokenized and not stored. The rationale is Hebrew indexing
> (yes, again :) ).

Take a look at PerFieldAnalyzerWrapper:

<http://lucene.apache.org/java/1_4_3/api/org/apache/lucene/analysis/PerFieldAnalyzerWrapper.html>
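
The mechanism behind such a wrapper can be pictured as a lookup by field name with a fallback default. The sketch below is plain Java rather than Lucene code: the class, the field names "content" and "content_stemmed", and both toy analyzers (in particular the strip-a-trailing-s "stemmer", which has nothing to do with Hebrew) are assumptions made up for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Plain-Java illustration of per-field analyzer dispatch; not Lucene code.
final class PerFieldDispatchSketch {
    private final Function<String, List<String>> defaultAnalyzer;
    private final Map<String, Function<String, List<String>>> perField = new HashMap<>();

    PerFieldDispatchSketch(Function<String, List<String>> defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    /** Registers an analyzer to use for one specific field name. */
    void addAnalyzer(String fieldName, Function<String, List<String>> analyzer) {
        perField.put(fieldName, analyzer);
    }

    /** Analyzes text with the field's own analyzer, or the default one. */
    List<String> analyze(String fieldName, String text) {
        return perField.getOrDefault(fieldName, defaultAnalyzer).apply(text);
    }

    /** Toy analyzer: lowercase and split on whitespace. */
    static List<String> plain(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("\\s+"))
                .filter(t -> !t.isEmpty())
                .collect(Collectors.toList());
    }

    /** Toy "stemmer": drops a trailing 's' from tokens longer than 3 chars. */
    static List<String> toyStem(String text) {
        return plain(text).stream()
                .map(t -> t.length() > 3 && t.endsWith("s")
                        ? t.substring(0, t.length() - 1) : t)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        PerFieldDispatchSketch wrapper = new PerFieldDispatchSketch(PerFieldDispatchSketch::plain);
        wrapper.addAnalyzer("content_stemmed", PerFieldDispatchSketch::toyStem);
        System.out.println(wrapper.analyze("content", "Indexed Threads"));
        System.out.println(wrapper.analyze("content_stemmed", "Indexed Threads"));
    }
}
```

With a wrapper of this shape handed to the writer, the same text can be added under two field names and each copy is analyzed differently, which matches the stemmed/unstemmed setup described in the question.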

Steve
