Question on lucene sandbox highlighter


Terence Lai
Hi all,

I have a couple of questions regarding the Highlighter.

Question 1:
===========
I downloaded the Highlighter source files. When I compile the code, I get the following error:

----------------
org/apache/lucene/search/highlight/TokenSources.java [19:1] cannot resolve symbol
symbol  : class TermVectorOffsetInfo
location: package index
import org.apache.lucene.index.TermVectorOffsetInfo;
----------------

Note that I have the Lucene 1.4.2 JAR file in my classpath. However, it does not contain org.apache.lucene.index.TermVectorOffsetInfo. Does anyone know whether I am missing some other JAR files?


Question 2:
===========
I use Lucene to search HTML documents. Before I create the search index, I use another open source parser to remove all the HTML tags from the search field contents, so that the tags are not part of the searchable values.

Now I would like to apply the Highlighter to my original HTML document. Is there any way to ignore the HTML tags while I perform the highlighting? For example, if my search criterion is "html", I don't want the Highlighter to highlight the "<HTML>" tag.
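One possible approach, sketched here outside Lucene and assuming a single query term and well-formed tags: split the original HTML into tags and text runs, and mark matches only inside the text runs. This uses plain `java.util.regex`, not the Lucene Highlighter API; the class name `TagAwareHighlighter` and the `<b>` markup are hypothetical choices for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagAwareHighlighter {
    // Naive tag pattern (assumption): matches <HTML>, </p>, etc.
    // It does not handle comments, <script> bodies, or '>' inside attribute values.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    // Wraps whole-word, case-insensitive matches of `term` in <b>...</b>,
    // but only in the text between tags; tags are copied through unchanged.
    public static String highlight(String html, String term) {
        Pattern word = Pattern.compile("\\b" + Pattern.quote(term) + "\\b",
                                       Pattern.CASE_INSENSITIVE);
        StringBuilder out = new StringBuilder();
        Matcher tag = TAG.matcher(html);
        int pos = 0;
        while (tag.find()) {
            out.append(markText(html.substring(pos, tag.start()), word)); // text before the tag
            out.append(tag.group());                                      // the tag itself, untouched
            pos = tag.end();
        }
        out.append(markText(html.substring(pos), word)); // trailing text after the last tag
        return out.toString();
    }

    // Highlights every match of `word` in a tag-free text run.
    private static String markText(String text, Pattern word) {
        Matcher m = word.matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, Matcher.quoteReplacement("<b>" + m.group() + "</b>"));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(highlight("<HTML><body>An HTML page</body></HTML>", "html"));
    }
}
```

Running `main` prints `<HTML><body>An <b>HTML</b> page</body></HTML>`: the `<HTML>` tags are left alone and only the word in the body text is marked. Unlike an analyzer-based highlighter, this sketch does no tokenization or stemming, and it ignores entities such as `&amp;`.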


Thanks,
Terence

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: Question on lucene sandbox highlighter

Erik Hatcher

On Jun 10, 2005, at 11:28 AM, Terence Lai wrote:

> Hi all,
>
> I have a couple questions regarding to the Highlighter.
>
> Question 1:
> ===========
> I download the highlighter source files. When I compile the code, I  
> got the following error:
>
> ----------------
> org/apache/lucene/search/highlight/TokenSources.java [19:1] cannot  
> resolve symbol
> symbol  : class TermVectorOffsetInfo
> location: package index
> import org.apache.lucene.index.TermVectorOffsetInfo;
> ----------------
>
> Note that I have lucene 1.4.2 jar file in my class path. However,  
> it does not have org.apache.lucene.index.TermVectorOffsetInfo. Does  
> anyone know whether I am missing some other jar files?

The latest Highlighter source code is now specific to the trunk of  
the core Lucene API (which will become Lucene 1.9/2.0).  You will need to  
pull an earlier version somehow (I'm not sure whether the Subversion  
repository for contrib goes back that far, or whether you'll need to get it from  
the CVS attic for jakarta-lucene-sandbox).
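To confirm that the JAR on your classpath is the mismatch, a quick check like the following can help. It is a minimal sketch using only the standard `Class.forName` call; the class name `ClasspathCheck` is hypothetical. Run it with the same classpath you use to compile the Highlighter.

```java
public class ClasspathCheck {
    // Returns true if the named class can be loaded from the current classpath.
    // initialize=false loads the class without running its static initializers.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String name = "org.apache.lucene.index.TermVectorOffsetInfo";
        System.out.println(name + (isOnClasspath(name)
                ? " found"
                : " NOT found: this Lucene JAR does not provide the class the Highlighter imports"));
    }
}
```

If the class is reported missing, the Highlighter source and the core Lucene JAR come from different versions.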

You can get a binary of a 1.4-compatible Highlighter JAR from the  
source code that comes with Lucene in Action at  
http://www.lucenebook.com

     Erik

