How to return entire resultset which includes the highlighted keywords

syedfa
Dear Fellow Java/Lucene developers:

I am trying to use the Highlighter class to display the keywords the user searches for in bold. However, instead of returning only a fragment of the block of text where the keyword is found, I would like to return the ENTIRE block of text. Here is the code I am using:

QueryScorer scorer = new QueryScorer(query);   // query: the parsed Query (QueryScorer takes a Query, not the QueryParser)
Highlighter highlighter = new Highlighter(scorer);
for (int i = 0; i < hits.length(); i++) {
    Document doc = hits.doc(i);
    String lns = doc.get("LINES");
    String spkr = doc.get("SPEAKER");
    // Re-analyze the stored text so the highlighter can find the matching tokens
    TokenStream lines = analyser.tokenStream("LINES", new StringReader(lns));
    String highlightedLines = highlighter.getBestFragment(lines, lns);
    SearchResult resultBean = new SearchResult();
    resultBean.setNarrator(spkr);
    resultBean.setQuote(highlightedLines);
    System.out.println(resultBean.getNarrator());
    System.out.println(resultBean.getQuote());
    System.out.println();
}

I am searching through quotes taken from Shakespeare's "Hamlet", stored in XML format. If I search for the keyword "arrows", I would like to return:

HAMLET
To be, or not to be: that is the question:
Whether 'tis nobler in the mind to suffer
The slings and <b>arrows</b> of outrageous fortune,
Or to take arms against a sea of troubles,
And by opposing end them? To die: to sleep;
No more; and by a sleep to say we end
The heart-ache and the thousand natural shocks
That flesh is heir to, 'tis a consummation
Devoutly to be wish'd. To die, to sleep;
To sleep: perchance to dream: ay, there's the rub;
For in that sleep of death what dreams may come
When we have shuffled off this mortal coil,
Must give us pause: there's the respect
That makes calamity of so long life;
For who would bear the whips and scorns of time,
The oppressor's wrong, the proud man's contumely,
The pangs of despised love, the law's delay,
The insolence of office and the spurns
That patient merit of the unworthy takes,
When he himself might his quietus make
With a bare bodkin? who would fardels bear,
To grunt and sweat under a weary life,
But that the dread of something after death,
The undiscover'd country from whose bourn
No traveller returns, puzzles the will
And makes us rather bear those ills we have
Than fly to others that we know not of?
Thus conscience does make cowards of us all;
And thus the native hue of resolution
Is sicklied o'er with the pale cast of thought,
And enterprises of great pith and moment
With this regard their currents turn awry,
And lose the name of action.--Soft you now!
The fair Ophelia! Nymph, in thy orisons
Be all my sins remember'd.

and not:

HAMLET
To be, or not to be: that is the question:
Whether 'tis nobler in the mind to suffer
The slings and arrows of outrageous fortune,
Or to take arms against a sea of troubles,
And by opposing end them? To die: to sleep;
No more; and by a sleep to say

Is this possible, and if so, how?  

My sincerest thanks to each and every one who replies.

Sincerely,
Fayyaz

Re: How to return entire resultset which includes the highlighted keywords

Mark Miller-3
Check out NullFragmenter.
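To make the difference concrete, here is a small plain-Java model of the two behaviours. It is not Lucene code, and every name in it is hypothetical. highlightWhole mimics what a Highlighter configured with setTextFragmenter(new NullFragmenter()) gives you: the whole text, with each hit wrapped in <b> tags. bestFragment mimics the fragmenting behaviour you were seeing: the text is cut into pieces of roughly size characters on whitespace boundaries, and the piece with the most hits is returned (or null if nothing matches). Real Lucene scoring and tag formatting differ in detail.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy model, not Lucene code: contrasts "whole text" highlighting
// (what NullFragmenter gives) with "best fragment" highlighting.
class FragmentDemo {

    // Count case-insensitive, non-overlapping occurrences of keyword in s.
    static int countHits(String s, String keyword) {
        String ls = s.toLowerCase(Locale.ROOT);
        String lk = keyword.toLowerCase(Locale.ROOT);
        if (lk.isEmpty()) {
            return 0;
        }
        int n = 0;
        for (int i = ls.indexOf(lk); i >= 0; i = ls.indexOf(lk, i + lk.length())) {
            n++;
        }
        return n;
    }

    // Return the entire text with every occurrence of keyword wrapped in <b>...</b>.
    // Assumes lower-casing does not change the string length (true for ASCII text).
    static String highlightWhole(String text, String keyword) {
        String lt = text.toLowerCase(Locale.ROOT);
        String lk = keyword.toLowerCase(Locale.ROOT);
        if (lk.isEmpty()) {
            return text;
        }
        StringBuilder out = new StringBuilder();
        int from = 0;
        for (int hit = lt.indexOf(lk, from); hit >= 0; hit = lt.indexOf(lk, from)) {
            out.append(text, from, hit)
               .append("<b>").append(text, hit, hit + lk.length()).append("</b>");
            from = hit + lk.length();
        }
        return out.append(text.substring(from)).toString();
    }

    // Cut text into fragments of roughly `size` characters, breaking only after
    // whitespace, and return the fragment with the most hits (highlighted),
    // or null when no fragment contains the keyword.
    static String bestFragment(String text, String keyword, int size) {
        List<String> fragments = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String piece : text.split("(?<=\\s)")) { // each piece keeps its trailing whitespace
            current.append(piece);
            if (current.length() >= size) {
                fragments.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            fragments.add(current.toString());
        }
        String best = null;
        int bestHits = 0;
        for (String f : fragments) {
            int hits = countHits(f, keyword);
            if (hits > bestHits) {
                bestHits = hits;
                best = f;
            }
        }
        return best == null ? null : highlightWhole(best, keyword);
    }

    public static void main(String[] args) {
        String quote = "To be, or not to be: that is the question:\n"
                + "Whether 'tis nobler in the mind to suffer\n"
                + "The slings and arrows of outrageous fortune,\n"
                + "Or to take arms against a sea of troubles,\n";
        System.out.println(highlightWhole(quote, "arrows"));   // all four lines, hit marked
        System.out.println(bestFragment(quote, "arrows", 40)); // only the third line, hit marked
    }
}
```

In Lucene itself the switch is just the setTextFragmenter call shown above; the model only illustrates why the default output stops partway through the speech.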

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: How to return entire resultset which includes the highlighted keywords

syedfa
Thanks so much for your reply, Mr. Miller; that's exactly what I was trying to accomplish! :-)

However, I have run into another problem, and so I have a follow-up question:

My goal is to present search results the way Google does: each result shows the highlighted keyword along with about two lines of surrounding text. When the user clicks a result, the next page should show the entire block of text with the keyword highlighted. So far I can do one or the other (retrieve the entire block of text with the highlighted keyword, or just the highlighted keyword with its surrounding text), but not both at the same time.
Here is my code:

    QueryScorer scorer = new QueryScorer(query);          // query: the parsed Query
    Highlighter highlighter = new Highlighter(scorer);    // highlights the keyword in the entire block of text
    Highlighter high = new Highlighter(scorer);           // highlights the keyword in the surrounding text
    Fragmenter fragmenter = new NullFragmenter();         // returns the entire block of text
    Fragmenter fragment = new SimpleFragmenter(250);      // fragments of about 250 characters, for the surrounding text
    highlighter.setTextFragmenter(fragmenter);
    high.setTextFragmenter(fragment);

    for (int i = 0; i < hits.length(); i++) {
        Document doc = hits.doc(i);
        String lns = doc.get("LINES");
        TokenStream lines = analyser.tokenStream("LINES", new StringReader(lns));
        String highlightedLines = highlighter.getBestFragment(lines, lns);
        String highlight = high.getBestFragment(lines, lns);
        SearchResult resultBean = new SearchResult();
        resultBean.setNarrator(doc.get("SPEAKER"));
        resultBean.setQuote(highlightedLines);
        resultBean.setHitResult(highlight);
        searchResult.add(resultBean);
        System.out.println(resultBean.getNarrator());
        System.out.println(resultBean.getHitResult());
        System.out.println("");
        //System.out.println(resultBean.getQuote());
        System.out.println("");
        System.out.println("");
        System.out.println("");
    }

    System.err.println("Found " + hits.length() + " document(s) (in " + (end - start) + " milliseconds) that matched query '" + q + "':");

    return searchResult;
}



How would I accomplish this? Based on the code above, what am I doing wrong? My code retrieves the entire block of text with the highlighted keyword correctly, but I get null when I print the snippet (the highlighted keyword with its surrounding text).

Thanks again for all of your help.

Sincerely,
Fayyaz

Re: How to return entire resultset which includes the highlighted keywords

Mark Miller-3
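A TokenStream can only be read once unless you wrap it in a CachingTokenFilter and call reset() between uses. Your first getBestFragment call consumes the stream, so the second highlighter sees no tokens and returns null. So that's what you should do: cache the stream, and reset it before the second call.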
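The effect can be reproduced with a toy model in plain Java. All names here are hypothetical and this is not Lucene's API: readOnce stands in for the TokenStream returned by analyser.tokenStream(...), Caching stands in for CachingTokenFilter, and highlight stands in for getBestFragment, returning null when it sees no matching token. That is what happens to the second highlighter once the first one has drained the shared stream. With the cache, the first pass records the tokens and reset() rewinds so the second pass replays them. This mirrors the Lucene 2.x-era usage in this thread; newer Lucene releases enforce a stricter TokenStream lifecycle (reset/end/close), so check the CachingTokenFilter documentation for the version you use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Toy model, not Lucene's API: a read-once token stream, a caching wrapper
// that can be rewound (the idea behind CachingTokenFilter + reset()), and a
// stand-in for Highlighter.getBestFragment that returns null on no match.
class ReplayDemo {

    // Minimal token source: next() returns null once the tokens run out.
    interface Tokens {
        String next();
    }

    // A stream that can be consumed only once, like analyser.tokenStream(...).
    static Tokens readOnce(String text) {
        Iterator<String> it = Arrays.asList(text.split("\\s+")).iterator();
        return () -> it.hasNext() ? it.next() : null;
    }

    // Drains its input on first use, then serves tokens from memory;
    // reset() rewinds so the same tokens can be read again.
    static final class Caching implements Tokens {
        private final Tokens input;
        private List<String> cache; // null until the first read
        private int pos = 0;

        Caching(Tokens input) {
            this.input = input;
        }

        @Override
        public String next() {
            if (cache == null) {
                cache = new ArrayList<>();
                for (String t = input.next(); t != null; t = input.next()) {
                    cache.add(t);
                }
            }
            return pos < cache.size() ? cache.get(pos++) : null;
        }

        void reset() {
            pos = 0;
        }
    }

    // Reads every token, wraps matches in <b>...</b>, and keeps `context` tokens
    // on each side of the first match (context < 0 keeps everything).
    // Returns null if the keyword never appears, e.g. because the stream is empty.
    static String highlight(Tokens tokens, String keyword, int context) {
        List<String> all = new ArrayList<>();
        for (String t = tokens.next(); t != null; t = tokens.next()) {
            all.add(t);
        }
        int first = -1;
        for (int i = 0; i < all.size() && first < 0; i++) {
            if (all.get(i).equalsIgnoreCase(keyword)) {
                first = i;
            }
        }
        if (first < 0) {
            return null;
        }
        int from = context < 0 ? 0 : Math.max(0, first - context);
        int to = context < 0 ? all.size() : Math.min(all.size(), first + context + 1);
        StringBuilder sb = new StringBuilder();
        for (int i = from; i < to; i++) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            String t = all.get(i);
            sb.append(t.equalsIgnoreCase(keyword) ? "<b>" + t + "</b>" : t);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String lns = "The slings and arrows of outrageous fortune";

        // As in the original loop: one stream shared by two highlighters.
        Tokens shared = readOnce(lns);
        System.out.println(highlight(shared, "arrows", -1)); // The slings and <b>arrows</b> of outrageous fortune
        System.out.println(highlight(shared, "arrows", 1));  // null: the stream is already exhausted

        // With caching: read once, rewind, read again.
        Caching cached = new Caching(readOnce(lns));
        System.out.println(highlight(cached, "arrows", -1)); // full text, as above
        cached.reset();
        System.out.println(highlight(cached, "arrows", 1));  // and <b>arrows</b> of
    }
}
```

In the original loop, the analogous change is to wrap analyser.tokenStream(...) in a CachingTokenFilter, pass it to highlighter.getBestFragment, call reset() on it, and then pass it to high.getBestFragment.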

- Mark

Re: How to return entire resultset which includes the highlighted keywords

syedfa
Thanks for your reply. Can you give an example of how this is done?

Sincerely,
Fayyaz

Re: How to return entire resultset which includes the highlighted keywords

syedfa
Dear Mr. Miller:

I figured out how to use the CachingTokenFilter, and it worked exactly as you described. Thanks so much once again for sharing your time and expertise! :-)

All the best.
Sincerely,
Fayyaz

