Offsets-highlight newbie question

classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view

Offsets-highlight newbie question

Dear All,

I'm relatively new to Java and especially new to Lucene. I study Computational
Linguistics and this term we are required to make projects using Lucene.

My choice was to write a user-friendly application, which could allow user to
crate collections of text files and then search through them.

Lucene works great! I create index, analyzer, send query, get hits...
The problem is that one of the text of the collection is always displayed in a
JTextArea, and I want to be able to highlight, mark out in color, the words
that match the query.

Now, there is the Highlighter in Java, you need to give it offsets to highlight
a string, and I would be glad to do so if I could get the offsets!

I will give snippets of code here:
//read the index, store the terms

        try {
        inread =;
        terms = inread.terms();
        } catch (IOException e) {
        System.out.println(e.getMessage() + " IndexReader failed"); }

//then - get positions for a term

        try {
    TermPositions pos =

    while ({
    for (int i=0; i<pos.freq(); i++){
                                //add position to a list
    } catch (IOException ex) {
    System.out.println(ex.getMessage()); }

//and later

        try {
                //get text from the JTextArea
        String curText = text.getText();

                //start scanning it - okay, I guess there must be another way,
                // but this is the only i could think of
        Scanner in = new Scanner(curText);
        int counter = 0;
                int point = 0;
                //yes, the while-loop looks ugly
        while(in.hasNext() && counter!=curText.length()){
        for (int i=0; i<positions.size();i++){
        String word =;
        if (counter == positions.get(i)){
point), curText.indexOf(word, point)+wordlength,painter);
                                point = curText.indexOf(word)+l; }
        } catch (BadLocationException e) {

I had thought it should have worked, but when I give it a text, say, "blah blah
blah a little text", and query for "blah", it gives me some strange numbers, not
0,1,2 or 1,2,3, and even not 0, 5, 11, what it gives me is 1, 5, 14! Where do
these numbers come from? Where can I retrieve "real" pointers into the text,
or find another way to highlight hits?

Please, give me advice, the project is due February, 14th, so I'm really

Best Regards,

To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]