Pagination


Pagination

Lee Li Bin
Hi,

Does anyone know how to do pagination on a JSP page using the number of hits
returned? Or are there any other solutions?

Please provide some sample code if possible, or a step-by-step guide.
Sorry if I'm asking too much; I'm new to Lucene.

Thanks

 

 


Re: Pagination

chrislusf
After the search you just get a Hits object, and you can go through all of the
documents with hits.doc(i).

Pagination is controlled by you. Lucene pre-caches the first 200
documents and lazily loads the rest in batches of 200.
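
Since pagination is left to the caller, the page boundaries have to be derived from the hit count. A minimal sketch of that arithmetic, assuming a 1-based page number; the PageWindow class and its field names are hypothetical and not part of Lucene:

```java
/** Hypothetical helper: works out which hit indexes belong to a page. */
final class PageWindow {
    final int from;       // first hit index on the page (0-based, inclusive)
    final int to;         // one past the last hit index on the page (exclusive)
    final int totalPages; // number of pages needed to show every hit

    PageWindow(int totalHits, int page, int pageSize) {
        if (totalHits < 0 || page < 1 || pageSize < 1) {
            throw new IllegalArgumentException("need totalHits >= 0, page >= 1, pageSize >= 1");
        }
        // Ceiling division, done in long to avoid int overflow.
        this.totalPages = (int) ((totalHits + (long) pageSize - 1) / pageSize);
        // Clamp both ends so that a page past the last hit gives an empty window.
        this.from = (int) Math.min((long) (page - 1) * pageSize, totalHits);
        this.to = (int) Math.min((long) this.from + pageSize, totalHits);
    }

    public static void main(String[] args) {
        // 23 hits, 10 per page, third page requested.
        PageWindow w = new PageWindow(23, 3, 10);
        for (int i = w.from; i < w.to; i++) {
            // In the JSP this is where hits.doc(i) would be read and rendered.
            System.out.println("render hit " + i);
        }
        System.out.println("page " + 3 + " of " + w.totalPages);
    }
}
```

In a JSP, totalHits would come from the Hits object and the loop body would read hits.doc(i); the page number would typically arrive as a request parameter.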

--
Chris Lu
-------------------------
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes:
http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes

On 6/29/07, Lee Li Bin <[hidden email]> wrote:

> Hi,
>
> Does anyone know how to do pagination on a JSP page using the number of
> hits returned? Or are there any other solutions?
>
> Please provide some sample code if possible, or a step-by-step guide.
> Sorry if I'm asking too much; I'm new to Lucene.
>
> Thanks

RE: Pagination

Lee Li Bin
Hi,

I still have no idea how to get it done. Could you give me some more details?

The web application is in JSP, by the way.

Thanks a lot.

 
Regards,
Lee Li Bin
-----Original Message-----
From: Chris Lu [mailto:[hidden email]]
Sent: Saturday, June 30, 2007 2:21 AM
To: [hidden email]
Subject: Re: Pagination

After the search you just get a Hits object, and you can go through all of the
documents with hits.doc(i).

Pagination is controlled by you. Lucene pre-caches the first 200
documents and lazily loads the rest in batches of 200.

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: Pagination

mark harwood
In reply to this post by Lee Li Bin
The Hits class is OK but can be inefficient because it re-runs the query unnecessarily.

The class below illustrates how to efficiently retrieve a particular page of results. It lends itself to webapps where you don't want to retain server-side state (i.e. a Hits object) for each client.
It would make sense to put an upper limit on the "start" parameter (as Google etc. do) to avoid consuming too much RAM per client request.

Cheers,
Mark

[Begin code]




package lucene.pagination;

import org.apache.lucene.index.Term;
import org.apache.lucene.search.HitCollector;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.util.PriorityQueue;

/**
 * A HitCollector that retrieves a specific page of results
 * @author maharwood
 */
public class HitPageCollector extends HitCollector
{
    //Demo code showing pagination
    public static void main(String[] args) throws Exception
    {
        IndexSearcher s=new IndexSearcher("/indexes/nasa");
        HitPageCollector hpc=new HitPageCollector(1,10);
        Query q=new TermQuery(new Term("contents","sea"));
        s.search(q,hpc);
        ScoreDoc[] sd = hpc.getScores();
        System.out.println("Hits "+ hpc.getStart()+" - "+ hpc.getEnd()+" of "+hpc.getTotalAvailable());
        for (int i = 0; i < sd.length; i++)
        {
            System.out.println(sd[i].doc);
        }
        s.close();
    }

    int nDocs;
    PriorityQueue hq;
    float minScore = 0.0f;
    int totalHits = 0;
    int start;
    int maxNumHits;
    int totalInThisPage;

    public HitPageCollector(int start, int maxNumHits)
    {
        this.nDocs = start + maxNumHits;
        this.start = start;
        this.maxNumHits = maxNumHits;
        hq = new HitQueue(nDocs);
    }

    public void collect(int doc, float score)
    {
        totalHits++;
        if((hq.size()<nDocs)||(score >= minScore))
        {
            ScoreDoc scoreDoc = new ScoreDoc(doc,score);
            hq.insert(scoreDoc);              // update hit queue
            minScore = ((ScoreDoc)hq.top()).score; // reset minScore
        }
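        // Count only the hits on the requested page: the queue also holds the
        // start-1 hits ranked above the page (plus one spare slot).
        totalInThisPage = Math.max(0, Math.min(maxNumHits, hq.size() - (start - 1)));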
    }
   

    public ScoreDoc[] getScores()
    {
        //just returns the number of hits required from the required start point
        //NOTE: this pops entries off the hit queue, so call it only once per collector
        /*
            So, given hits:
                1234567890
            and a start of 2 + maxNumHits of 3 should return:
                234
            or, given hits
                12
            should return
                2
            and so, on.
        */
        if (start <= 0)
        {
            throw new IllegalArgumentException("Invalid start :" + start+" - start should be >=1");
        }
        int numReturned = Math.min(maxNumHits, (hq.size() - (start - 1)));
        if (numReturned <= 0)
        {
            return new ScoreDoc[0];
        }
        ScoreDoc[] scoreDocs = new ScoreDoc[numReturned];
        ScoreDoc scoreDoc;
        for (int i = hq.size() - 1; i >= 0; i--) // put docs in array, working backwards from lowest count
        {
            scoreDoc = (ScoreDoc) hq.pop();
            if (i < (start - 1))
            {
                break; //off the beginning of the results array
            }
            if (i < (scoreDocs.length + (start - 1)))
            {
                scoreDocs[i - (start - 1)] = scoreDoc; //within scope of results array
            }
        }
        return scoreDocs;
    }

    public int getTotalAvailable()
    {
        return totalHits;
    }

    public int getStart()
    {
        return start;
    }
   
    public int getEnd()
    {
        return start+totalInThisPage-1;
    }
   
    public class HitQueue extends PriorityQueue
    {
          public HitQueue(int size)
          {
            initialize(size);
          }
          public final boolean lessThan(Object a, Object b)
          {
            ScoreDoc hitA = (ScoreDoc)a;
            ScoreDoc hitB = (ScoreDoc)b;
            if (hitA.score == hitB.score)
              return hitA.doc > hitB.doc;
            else
              return hitA.score < hitB.score;
          }
    }
}
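
The bounded-queue idea behind the collector can be tried without a Lucene index. The following is a stand-alone sketch using java.util.PriorityQueue in place of Lucene's PriorityQueue; the class and method names are made up for illustration, and unlike HitQueue above it does not break score ties by document id:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

/** Stand-alone sketch of the bounded-queue technique; no Lucene classes involved. */
final class TopPageSketch {

    /** Returns the scores ranked start..start+pageSize-1 (start is 1-based), best first. */
    static List<Float> page(float[] scores, int start, int pageSize) {
        if (start < 1 || pageSize < 1) {
            throw new IllegalArgumentException("start and pageSize must be >= 1");
        }
        int keep = start - 1 + pageSize; // hits ranked above the page plus the page itself
        PriorityQueue<Float> queue = new PriorityQueue<>(); // min-heap: weakest kept score on top
        for (float score : scores) {
            if (queue.size() < keep) {
                queue.add(score);
            } else if (score > queue.peek()) {
                queue.poll(); // evict the weakest kept hit
                queue.add(score);
            }
        }
        // Drain weakest-first, then reverse to get best-first order.
        List<Float> ranked = new ArrayList<>(queue.size());
        while (!queue.isEmpty()) {
            ranked.add(queue.poll());
        }
        Collections.reverse(ranked);
        // Drop the hits ranked above the requested page.
        int from = Math.min(start - 1, ranked.size());
        return new ArrayList<>(ranked.subList(from, ranked.size()));
    }

    public static void main(String[] args) {
        float[] scores = {0.1f, 0.9f, 0.5f, 0.7f, 0.3f};
        // Ranks 2 and 3 of the five scores.
        System.out.println(page(scores, 2, 2));
    }
}
```

Memory use grows with start + pageSize rather than with the total number of hits, which is why capping "start" keeps the per-request cost bounded.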



----- Original Message ----
From: Lee Li Bin <[hidden email]>
To: [hidden email]
Sent: Monday, 2 July, 2007 9:59:14 AM
Subject: RE: Pagination

Hi,

I still have no idea how to get it done. Could you give me some more details?

The web application is in JSP, by the way.

Thanks a lot.

 
Regards,
Lee Li Bin

Re: Pagination

Alixandre Santana
Mark,

The ScoreDoc[] contains only the IDs of each Lucene document. What
would be the best way of getting the entire (Lucene) document?
Should I do a new search with the ID retrieved by hpc.getScores(),
e.g. searcher.doc(idDoc)?

Thanks.

Alixandre

On 7/2/07, mark harwood <[hidden email]> wrote:

> The Hits class is OK but can be inefficient because it re-runs the query unnecessarily.
>
> The class below illustrates how to efficiently retrieve a particular page of results. It lends itself to webapps where you don't want to retain server-side state (i.e. a Hits object) for each client.
> It would make sense to put an upper limit on the "start" parameter (as Google etc. do) to avoid consuming too much RAM per client request.
>
> Cheers,
> Mark


--
Alixandre Santana


RE: Pagination

Lee Li Bin
Hi,

Thanks Mark!

I do have the same question as Alixandre. How do I get the content of the
document instead of the document id?

Thanks.

Regards,
Lee Li Bin
-----Original Message-----
From: Alixandre Santana [mailto:[hidden email]]
Sent: Tuesday, July 03, 2007 12:55 AM
To: [hidden email]
Subject: Re: Pagination

Mark,

The ScoreDoc[] contains only the IDs of each Lucene document. What
would be the best way of getting the entire (Lucene) document?
Should I do a new search with the ID retrieved by hpc.getScores(),
e.g. searcher.doc(idDoc)?

Thanks.

Alixandre


RE: Pagination

Lee Li Bin
In reply to this post by Alixandre Santana
Hi Mark,

How do I display the results on the second page?
I managed to display one page using your code.

 
Regards,
Lee Li Bin
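
One way to reach later pages with the posted collector is to turn a page number into its 1-based "start" argument. A small sketch under that assumption; PageStart, MAX_START and the "page" request parameter are hypothetical names, and the cap of 1000 is an arbitrary example value:

```java
/** Hypothetical helper that maps a page number to HitPageCollector's 1-based start. */
final class PageStart {
    /** Assumed cap on how deep a client may page; pick a value that suits the application. */
    static final int MAX_START = 1000;

    /** Page 1 starts at hit 1, page 2 at hit pageSize + 1, and so on. */
    static int startFor(int page, int pageSize) {
        if (page < 1 || pageSize < 1) {
            throw new IllegalArgumentException("page and pageSize must be >= 1");
        }
        long start = (long) (page - 1) * pageSize + 1;
        if (start > MAX_START) {
            throw new IllegalArgumentException("start " + start + " exceeds limit " + MAX_START);
        }
        return (int) start;
    }

    /** Parses a raw request parameter, falling back to page 1 on missing or bad input. */
    static int parsePage(String raw) {
        try {
            int page = Integer.parseInt(raw == null ? "" : raw.trim());
            return page < 1 ? 1 : page;
        } catch (NumberFormatException e) {
            return 1;
        }
    }

    public static void main(String[] args) {
        // In a JSP the raw value would come from request.getParameter("page").
        int page = parsePage("2");
        int start = startFor(page, 10);
        // This start would then be passed on, e.g. new HitPageCollector(start, 10).
        System.out.println("page " + page + " starts at hit " + start);
    }
}
```

Each "next page" link then only needs to carry the query and the page number, and the search is re-run with a fresh collector for that page.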


RE: Pagination

MC Moisei-2
In reply to this post by Lee Li Bin
I get the ids, then I look up the items in the database using: select item.* from item where item.id in ( ids )
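
A sketch of how that lookup might be prepared, assuming the ids are database keys stored in an index field (Lucene's internal document numbers are not stable and should not be used as database keys). ItemLookupSketch and its methods are hypothetical; only the SQL text comes from the post above:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for fetching one page of items by id. */
final class ItemLookupSketch {

    /** Builds the IN query with one "?" placeholder per id, for use with a prepared statement. */
    static String inClauseSql(int idCount) {
        if (idCount < 1) {
            throw new IllegalArgumentException("at least one id is required");
        }
        StringBuilder sql = new StringBuilder("select item.* from item where item.id in (");
        for (int i = 0; i < idCount; i++) {
            sql.append(i == 0 ? "?" : ", ?");
        }
        return sql.append(')').toString();
    }

    /** Puts fetched rows back into search-rank order; an IN query guarantees no row order. */
    static <T> List<T> inRankOrder(List<Long> rankedIds, Map<Long, T> rowsById) {
        List<T> ordered = new ArrayList<>();
        for (Long id : rankedIds) {
            T row = rowsById.get(id);
            if (row != null) {
                ordered.add(row); // ids missing from the database are skipped
            }
        }
        return ordered;
    }

    public static void main(String[] args) {
        List<Long> rankedIds = Arrays.asList(42L, 7L, 19L); // ids in search-rank order
        System.out.println(inClauseSql(rankedIds.size()));
        Map<Long, String> rows = new HashMap<>();           // stands in for the query result
        rows.put(7L, "item seven");
        rows.put(42L, "item forty-two");
        System.out.println(inRankOrder(rankedIds, rows));
    }
}
```

Using placeholders rather than concatenating the ids keeps the statement safe to run with values taken from a request.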

 -------------- Original message ----------------------
From: "Lee Li Bin" <[hidden email]>

> Hi,
>
> Thanks Mark!
>
> I do have the same question as Alixandre. How do I get the content of the
> document instead of the document id?
>
> Thanks.
>
> Regards,
> Lee Li Bin
>
> On 7/2/07, mark harwood <[hidden email]> wrote:
> > The Hits class is OK but can be inefficient due to re-running the query unnecessarily.
> >
> > The class below illustrates how to efficiently retrieve a particular page of results and lends itself to webapps where you don't want to retain server side state (i.e. a Hits object) for each client.
> > It would make sense to put an upper limit on the "start" parameter (as Google etc do) to avoid consuming too much RAM per client request.
> >
> > Cheers,
> > Mark
> >
> > [Begin code]
> >
> >
> >
> >
> > package lucene.pagination;
> >
> > import org.apache.lucene.index.Term;
> > import org.apache.lucene.search.HitCollector;
> > import org.apache.lucene.search.IndexSearcher;
> > import org.apache.lucene.search.Query;
> > import org.apache.lucene.search.ScoreDoc;
> > import org.apache.lucene.search.TermQuery;
> > import org.apache.lucene.util.PriorityQueue;
> >
> > /**
> >  * A HitCollector that retrieves a specific page of results
> >  * @author maharwood
> >  */
> > public class HitPageCollector extends HitCollector
> > {
> >     //Demo code showing pagination
> >     public static void main(String[] args) throws Exception
> >     {
> >         IndexSearcher s=new IndexSearcher("/indexes/nasa");
> >         HitPageCollector hpc=new HitPageCollector(1,10);
> >         Query q=new TermQuery(new Term("contents","sea"));
> >         s.search(q,hpc);
> >         ScoreDoc[] sd = hpc.getScores();
> >         System.out.println("Hits "+ hpc.getStart()+" - "+ hpc.getEnd()+"
> of "+hpc.getTotalAvailable());
> >         for (int i = 0; i < sd.length; i++)
> >         {
> >             System.out.println(sd[i].doc);
> >         }
> >         s.close();
> >     }
> >
> >     int nDocs;
> >     PriorityQueue hq;
> >     float minScore = 0.0f;
> >     int totalHits = 0;
> >     int start;
> >     int maxNumHits;
> >     int totalInThisPage;
> >
> >     public HitPageCollector(int start, int maxNumHits)
> >     {
> >         this.nDocs = start + maxNumHits;
> >         this.start = start;
> >         this.maxNumHits = maxNumHits;
> >         hq = new HitQueue(nDocs);
> >     }
> >
> >     public void collect(int doc, float score)
> >     {
> >         totalHits++;
> >         if((hq.size()<nDocs)||(score >= minScore))
> >         {
> >             ScoreDoc scoreDoc = new ScoreDoc(doc,score);
> >             hq.insert(scoreDoc);              // update hit queue
> >             minScore = ((ScoreDoc)hq.top()).score; // reset minScore
> >         }
> >         totalInThisPage=hq.size();
> >     }
> >
> >
> >     public ScoreDoc[] getScores()
> >     {
> >         //just returns the number of hits required from the required start
> point
> >         /*
> >             So, given hits:
> >                 1234567890
> >             and a start of 2 + maxNumHits of 3 should return:
> >                 234
> >             or, given hits
> >                 12
> >             should return
> >                 2
> >             and so, on.
> >         */
> >         if (start <= 0)
> >         {
> >             throw new IllegalArgumentException("Invalid start :" + start+"
> - start should be >=1");
> >         }
> >         int numReturned = Math.min(maxNumHits, (hq.size() - (start - 1)));
> >         if (numReturned <= 0)
> >         {
> >             return new ScoreDoc[0];
> >         }
> >         ScoreDoc[] scoreDocs = new ScoreDoc[numReturned];
> >         ScoreDoc scoreDoc;
> >         for (int i = hq.size() - 1; i >= 0; i--) // put docs in array,
> working backwards from lowest count
> >         {
> >             scoreDoc = (ScoreDoc) hq.pop();
> >             if (i < (start - 1))
> >             {
> >                 break; //off the beginning of the results array
> >             }
> >             if (i < (scoreDocs.length + (start - 1)))
> >             {
> >                 scoreDocs[i - (start - 1)] = scoreDoc; //within scope of
> results array
> >             }
> >         }
> >         return scoreDocs;
> >     }
> >
> >     public int getTotalAvailable()
> >     {
> >         return totalHits;
> >     }
> >
> >     public int getStart()
> >     {
> >         return start;
> >     }
> >
> >     public int getEnd()
> >     {
> >         return start+totalInThisPage-1;
> >     }
> >
> >     public class HitQueue extends PriorityQueue
> >     {
> >           public HitQueue(int size)
> >           {
> >             initialize(size);
> >           }
> >           public final boolean lessThan(Object a, Object b)
> >           {
> >             ScoreDoc hitA = (ScoreDoc)a;
> >             ScoreDoc hitB = (ScoreDoc)b;
> >             if (hitA.score == hitB.score)
> >               return hitA.doc > hitB.doc;
> >             else
> >               return hitA.score < hitB.score;
> >           }
> >     }
> > }
> >
> >
> >
> > ----- Original Message ----
> > From: Lee Li Bin <[hidden email]>
> > To: [hidden email]
> > Sent: Monday, 2 July, 2007 9:59:14 AM
> > Subject: RE: Pagination
> >
> > Hi,
> >
> > I still have no idea of how to get it done. Can give me some details?
> >
> > The web application is in jsp btw.
> >
> > Thanks a lot.
> >
> >
> > Regards,
> > Lee Li Bin
> > -----Original Message-----
> > From: Chris Lu [mailto:[hidden email]]
> > Sent: Saturday, June 30, 2007 2:21 AM
> > To: [hidden email]
> > Subject: Re: Pagination
> >
> > After search, you will just get an object Hits, and go through all of the
> > documents by hits.doc(i).
> >
> > The pagination is controlled by you. Lucene is pre-caching first 200
> > documents and lazy loading the rest by batch size 200.
> >
> > --
> > Chris Lu
> > -------------------------
> > Instant Scalable Full-Text Search On Any Database/Application
> > site: http://www.dbsight.net
> > demo: http://search.dbsight.com
> > Lucene Database Search in 3 minutes:
> >
> http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_m
> > inutes
> >
> > On 6/29/07, Lee Li Bin <[hidden email]> wrote:
> > >
> > > Hi,
> > >
> > > does anyone knows how to do pagination on jsp page using the number of
> > > hits
> > > return? Or any other solutions?
> > >
> > >
> > >
> > > Do provide me with some sample coding if possible or a step by step
> guide.
> > > Sry if I'm asking too much, I'm new to lucene.
> > >
> > >
> > >
> > > Thanks
> > >
> > >
> > >
> > >
> > >
> > >
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [hidden email]
> > For additional commands, e-mail: [hidden email]
> >
> >
> >
> >
> >
> >
> >       ___________________________________________________________
> > Yahoo! Answers - Got a question? Someone out there knows the answer. Try
> it
> > now.
> > http://uk.answers.yahoo.com/
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [hidden email]
> > For additional commands, e-mail: [hidden email]
> >
> >
>
>
> --
> Alixandre Santana
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: Pagination

mark harwood
In reply to this post by Lee Li Bin
>>I get the ids then I do look the items in the database using select item.* from item where item.id in ( ids )

Hmm. That's likely to confuse the already confused :)
The ids referred to so far are Lucene internal document ids and are typically only meaningful to Lucene during a single IndexReader session. I wouldn't recommend storing them in a database because a Lucene document id can point to an entirely different document after deletes/updates are performed on the Lucene index and the IndexReader is reopened.

For the avoidance of further confusion I have extended the "main" method in my previous example (reposted below in full) to include examples of
1) Retrieving document content
2) Retrieving a "next" page (starting from result 11)
The values "1" and "11" passed to the HitPageCollector constructor below define where each page starts. This value is typically something you would get the client to pass to you, e.g. note the number "10" in this URL http://www.google.com/search?q=lucene&start=10 which selects results from "10" onwards. Note also that this URL http://www.google.com/search?q=lucene&&start=10000 does not work because Google has placed a restriction on the maximum value for "start" - you should too.
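
As an illustration of putting an upper limit on "start", here is a minimal sketch in plain Java. The class name PageRequest, the page size of 10 and the limit of 1000 are assumptions made up for this example; they are not part of Lucene or of the collector posted in this thread.

```java
final class PageRequest {
    static final int PAGE_SIZE = 10;    // assumed page size
    static final int MAX_START = 1000;  // assumed upper limit, pick your own

    /** Parses the 1-based "start" request parameter; bad or missing input becomes 1. */
    static int parseStart(String raw) {
        int start;
        try {
            start = Integer.parseInt(raw == null ? "" : raw.trim());
        } catch (NumberFormatException e) {
            return 1;                   // not a number (or null): show the first page
        }
        if (start < 1) {
            return 1;                   // ranks are 1-based
        }
        return Math.min(start, MAX_START); // never go past the configured limit
    }

    /** Value of "start" for the "next" link. */
    static int nextStart(int start) {
        return start + PAGE_SIZE;
    }

    /** Value of "start" for the "previous" link, never below 1. */
    static int previousStart(int start) {
        return Math.max(1, start - PAGE_SIZE);
    }
}
```

In a JSP you would typically feed request.getParameter("start") into parseStart and hand the result to the HitPageCollector constructor as its first argument.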

Cheers
Mark


package lucene.pagination;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.HitCollector;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.util.PriorityQueue;

/**
 * A HitCollector that retrieves a specific page of results
 * @author maharwood
 */
public class HitPageCollector extends HitCollector
{
    //Demo code showing pagination
    public static void main(String[] args) throws Exception
    {
        IndexSearcher s=new IndexSearcher("/indexes/nasa");
        Query q=new TermQuery(new Term("contents","sea"));

        //Retrieve page 1  (hits 1-10)
        HitPageCollector hpc=new HitPageCollector(1,10);
        s.search(q,hpc);
        ScoreDoc[] sd = hpc.getScores();
        System.out.println("Hits "+ hpc.getStart()+" - "+ hpc.getEnd()+" of "+hpc.getTotalAvailable());
        for (int i = 0; i < sd.length; i++)
        {
            Document doc=s.doc(sd[i].doc);
            System.out.println(sd[i].score +" "+doc.get("title"));
        }
       
        //Example retrieve page 2 (hits 11-20)
        hpc=new HitPageCollector(11,10);
        s.search(q,hpc);
        sd = hpc.getScores();
        System.out.println("Hits "+ hpc.getStart()+" - "+ hpc.getEnd()+" of "+hpc.getTotalAvailable());
        for (int i = 0; i < sd.length; i++)
        {
            Document doc=s.doc(sd[i].doc);
            System.out.println(sd[i].score +" "+doc.get("title"));
        }
       
       
        s.close();
    }

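    int nDocs;              // queue capacity: enough hits to reach the end of the requested page
    PriorityQueue hq;       // keeps the best nDocs hits, weakest hit on top
    float minScore = 0.0f;  // score of the weakest hit currently in the queue
    int totalHits = 0;      // every matching document, whether queued or not
    int start;              // 1-based rank of the first hit on the page
    int maxNumHits;         // page size
    int totalInThisPage;    // queue size as of the last collect() call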

    public HitPageCollector(int start, int maxNumHits)
    {
        this.nDocs = start + maxNumHits;
        this.start = start;
        this.maxNumHits = maxNumHits;
        hq = new HitQueue(nDocs);
    }

    public void collect(int doc, float score)
    {
        totalHits++;
        if((hq.size()<nDocs)||(score >= minScore))
        {
            ScoreDoc scoreDoc = new ScoreDoc(doc,score);
            hq.insert(scoreDoc);              // update hit queue
            minScore = ((ScoreDoc)hq.top()).score; // reset minScore
        }
        totalInThisPage=hq.size();
    }
   

    public ScoreDoc[] getScores()
    {
        //just returns the number of hits required from the required start point
        /*
            So, given hits:
                1234567890
            and a start of 2 + maxNumHits of 3 should return:
                234
            or, given hits
                12
            should return
                2
            and so, on.
        */
        if (start <= 0)
        {
            throw new IllegalArgumentException("Invalid start :" + start+" - start should be >=1");
        }
        int numReturned = Math.min(maxNumHits, (hq.size() - (start - 1)));
        if (numReturned <= 0)
        {
            return new ScoreDoc[0];
        }
        ScoreDoc[] scoreDocs = new ScoreDoc[numReturned];
        ScoreDoc scoreDoc;
        for (int i = hq.size() - 1; i >= 0; i--) // put docs in array, working backwards from lowest count
        {
            scoreDoc = (ScoreDoc) hq.pop();
            if (i < (start - 1))
            {
                break; //off the beginning of the results array
            }
            if (i < (scoreDocs.length + (start - 1)))
            {
                scoreDocs[i - (start - 1)] = scoreDoc; //within scope of results array
            }
        }
        return scoreDocs;
    }

    public int getTotalAvailable()
    {
        return totalHits;
    }

    public int getStart()
    {
        return start;
    }
   
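    public int getEnd()
    {
        // totalInThisPage is the queue size, which also counts the hits before
        // this page, so cap it at the rank of the last hit the page can hold
        return Math.min(start - 1 + maxNumHits, totalInThisPage);
    }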
   
    public class HitQueue extends PriorityQueue
    {
          public HitQueue(int size)
          {
            initialize(size);
          }
          public final boolean lessThan(Object a, Object b)
          {
            ScoreDoc hitA = (ScoreDoc)a;
            ScoreDoc hitB = (ScoreDoc)b;
            if (hitA.score == hitB.score)
              return hitA.doc > hitB.doc;
            else
              return hitA.score < hitB.score;
          }
    }
}
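
For readers without Lucene on the classpath, the same idea (keep only the best start-1+maxNumHits hits in a bounded heap, then slice off the requested page) can be sketched with java.util.PriorityQueue. This is not the Lucene PriorityQueue used above; TopPageSketch and Hit are names made up for the illustration, and Hit merely stands in for ScoreDoc.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class TopPageSketch {
    /** A scored hit; stands in for Lucene's ScoreDoc in this sketch. */
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    // Weakest hit first: lower score, and on equal scores the higher doc id.
    static final Comparator<Hit> WEAKEST_FIRST = (a, b) -> {
        int byScore = Float.compare(a.score, b.score);
        return byScore != 0 ? byScore : Integer.compare(b.doc, a.doc);
    };

    /** Returns the hits ranked start..start+pageSize-1 (1-based), best first. */
    static List<Hit> page(List<Hit> allHits, int start, int pageSize) {
        if (start < 1) {
            throw new IllegalArgumentException("Invalid start: " + start + " - start should be >= 1");
        }
        int keep = start - 1 + pageSize;  // hits ranked below this can never be shown
        PriorityQueue<Hit> heap = new PriorityQueue<>(Math.max(1, keep), WEAKEST_FIRST);
        for (Hit h : allHits) {
            heap.add(h);
            if (heap.size() > keep) {
                heap.poll();              // evict the weakest hit
            }
        }
        List<Hit> best = new ArrayList<>(heap);
        best.sort(WEAKEST_FIRST.reversed()); // best first
        if (start - 1 >= best.size()) {
            return Collections.emptyList();  // requested page is past the last hit
        }
        return new ArrayList<>(best.subList(start - 1, Math.min(best.size(), keep)));
    }
}
```

Memory use grows with start + pageSize rather than with the number of matches, which is why capping "start" keeps each request cheap.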








Re: Pagination

MC Moisei-2
In reply to this post by Lee Li Bin
It looks like we may have different cases.

What I do is index my items before inserting them into the database. When I do a search, I get the ids of the best matches and then look up the items in the database. So far this has worked just fine. I have 5000 rows of items, and I think it will still work fine later when I have 100K items.
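
A rough sketch of that lookup step, assuming the ids are your own database keys stored in an index field (not Lucene's internal document ids). ItemLookupSketch and both method names are hypothetical. One caveat worth handling: without an ORDER BY the database is free to return the rows in any order, not the order of the IN list, so the rows are put back into search-rank order afterwards.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

final class ItemLookupSketch {
    /** Builds "select item.* from item where item.id in (?, ?, ...)" with one placeholder per id. */
    static String inQuery(int idCount) {
        if (idCount < 1) {
            throw new IllegalArgumentException("need at least one id");
        }
        String marks = String.join(", ", Collections.nCopies(idCount, "?"));
        return "select item.* from item where item.id in (" + marks + ")";
    }

    /** Puts the fetched rows back into the order in which the search ranked their ids. */
    static <T> List<T> inRankOrder(List<Long> rankedIds, Map<Long, T> rowsById) {
        List<T> ordered = new ArrayList<>();
        for (Long id : rankedIds) {
            T row = rowsById.get(id);
            if (row != null) {
                ordered.add(row);  // ids with no matching row are skipped
            }
        }
        return ordered;
    }
}
```

The query string is meant for a PreparedStatement, with each id of the current page bound to one placeholder.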



Re: Pagination

Alixandre Santana
In reply to this post by mark harwood
Mark,

Thanks for the code.

Well, I'm doing the same thing you are:

Retrieve some Doc IDs and then use the code
- Document doc=searcher.doc(sd[i].doc) - to get the Document itself.

But in this case, we are doing a search to get the IDs, and "n"
searches to get the Documents, which is not a good practice.

Is there another way to do it?

Alixandre



--
Alixandre Santana



Re: Pagination

mark harwood
In reply to this post by Lee Li Bin
>>and "n" searches to get the Documents,
???
Where does the "n" come in? searcher.doc(id) is not a search. It is a call to IndexReader.document() to retrieve a specific document.
Try running it. It shouldn't be slow.
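
To picture the difference, here is a toy model in plain Java. It is only an illustration and not how Lucene stores anything on disk; ToyIndex and its methods are invented for this example. A search has to consult the term's list of matching documents, whereas fetching a document by id is a direct positional read.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ToyIndex {
    private final List<String> stored = new ArrayList<>();               // doc id -> stored title
    private final Map<String, List<Integer>> postings = new HashMap<>(); // term -> ids of matching docs

    /** Adds a document and returns its id (its position in the stored list). */
    int add(String title) {
        int id = stored.size();
        stored.add(title);
        for (String term : title.toLowerCase().split("\\s+")) {
            List<Integer> ids = postings.computeIfAbsent(term, t -> new ArrayList<>());
            if (ids.isEmpty() || ids.get(ids.size() - 1) != id) {
                ids.add(id);  // record each doc once per term
            }
        }
        return id;
    }

    /** The "search": looks up which documents contain the term. */
    List<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(), new ArrayList<>());
    }

    /** The "doc(id)" call: a plain lookup by position, no query involved. */
    String doc(int id) {
        return stored.get(id);
    }
}
```

So showing a page of 10 results costs one search plus 10 direct lookups, not 11 searches.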



----- Original Message ----
From: Alixandre Santana <[hidden email]>
To: [hidden email]
Sent: Tuesday, 3 July, 2007 5:17:51 PM
Subject: Re: Pagination

Mark,

Thanks for the code.

Well..I´m doing the same thing you are:

Retrieve some Doc IDs and then use the code
- Document doc=searcher.doc(sd[i].doc) - to get the Document itself.

But in this case, we are doing a search to get the IDs, and "n"
searches to get the Documents, which is not a good practice.

Is there another option of do it?

Alixandre


On 7/3/07, mark harwood <[hidden email]> wrote:

> >>I get the ids then I do look the items in the database using select item.* from item where item.id in ( ids )
>
> Hmm. That's likely to confuse the already confused :)
> The ids referred to so far are Lucene internal document ids and are typically only meaningful to Lucene during a single IndexReader session. I wouldn't recommend storing them in a database because a Lucene document id can point to an entirely different document after deletes/updates are performed on the Lucene index and the IndexReader is reopened.
>
> For the avoidance of further confusion I have extended the "main" method in my previous example (reposted below in full) to include examples of
> 1) Retrieving document content
> 2) Retrieving a "next" page (starting from result 11)
> The values "1" and "11" used below in the calls to HitPageCollector constructor define the page start. This value is typically something you would get the client to pass to you e.g. note the number "10" in this URL http://www.google.com/search?q=lucene&start=10 which is used to select results from "10" onwards. Note also that this URL http://www.google.com/search?q=lucene&&start=10000 does not work because Google have placed a restriction on the maximum value for "start" - you should too.
>
> Cheers
> Mark
>
>
> package lucene.pagination;
>
> import org.apache.lucene.document.Document;
> import org.apache.lucene.index.Term;
> import org.apache.lucene.search.HitCollector;
> import org.apache.lucene.search.IndexSearcher;
> import org.apache.lucene.search.Query;
> import org.apache.lucene.search.ScoreDoc;
> import org.apache.lucene.search.TermQuery;
> import org.apache.lucene.util.PriorityQueue;
>
> /**
>  * A HitCollector that retrieves a specific page of results
>  * @author maharwood
>  */
> public class HitPageCollector extends HitCollector
> {
>     //Demo code showing pagination
>     public static void main(String[] args) throws Exception
>     {
>         IndexSearcher s=new IndexSearcher("/indexes/nasa");
>         Query q=new TermQuery(new Term("contents","sea"));
>
>         //Retrieve page 1  (hits 1-10)
>         HitPageCollector hpc=new HitPageCollector(1,10);
>         s.search(q,hpc);
>         ScoreDoc[] sd = hpc.getScores();
>         System.out.println("Hits "+ hpc.getStart()+" - "+ hpc.getEnd()+" of "+hpc.getTotalAvailable());
>         for (int i = 0; i < sd.length; i++)
>         {
>             Document doc=s.doc(sd[i].doc);
>             System.out.println(sd[i].score +" "+doc.get("title"));
>         }
>
>         //Example retrieve page 2 (hits 11-20)
>         hpc=new HitPageCollector(11,10);
>         s.search(q,hpc);
>         sd = hpc.getScores();
>         System.out.println("Hits "+ hpc.getStart()+" - "+ hpc.getEnd()+" of "+hpc.getTotalAvailable());
>         for (int i = 0; i < sd.length; i++)
>         {
>             Document doc=s.doc(sd[i].doc);
>             System.out.println(sd[i].score +" "+doc.get("title"));
>         }
>
>
>         s.close();
>     }
>
>     int nDocs;
>     PriorityQueue hq;
>     float minScore = 0.0f;
>     int totalHits = 0;
>     int start;
>     int maxNumHits;
>     int totalInThisPage;
>
>     public HitPageCollector(int start, int maxNumHits)
>     {
>         this.nDocs = start + maxNumHits;
>         this.start = start;
>         this.maxNumHits = maxNumHits;
>         hq = new HitQueue(nDocs);
>     }
>
>     public void collect(int doc, float score)
>     {
>         totalHits++;
>         if((hq.size()<nDocs)||(score >= minScore))
>         {
>             ScoreDoc scoreDoc = new ScoreDoc(doc,score);
>             hq.insert(scoreDoc);              // update hit queue
>             minScore = ((ScoreDoc)hq.top()).score; // reset minScore
>         }
>         totalInThisPage=hq.size();
>     }
>
>
>     public ScoreDoc[] getScores()
>     {
>         // Returns at most maxNumHits hits, beginning at the 1-based position "start".
>         /*
>             So, given hits:
>                 1234567890
>             a start of 2 and a maxNumHits of 3 should return:
>                 234
>             or, given only the hits
>                 12
>             the same call should return
>                 2
>             and so on.
>         */
>         if (start <= 0)
>         {
>             throw new IllegalArgumentException("Invalid start :" + start+" - start should be >=1");
>         }
>         int numReturned = Math.min(maxNumHits, (hq.size() - (start - 1)));
>         if (numReturned <= 0)
>         {
>             return new ScoreDoc[0];
>         }
>         ScoreDoc[] scoreDocs = new ScoreDoc[numReturned];
>         ScoreDoc scoreDoc;
>         for (int i = hq.size() - 1; i >= 0; i--) // fill the array from the back: pop() returns the lowest-scoring hit first
>         {
>             scoreDoc = (ScoreDoc) hq.pop();
>             if (i < (start - 1))
>             {
>                 break; //off the beginning of the results array
>             }
>             if (i < (scoreDocs.length + (start - 1)))
>             {
>                 scoreDocs[i - (start - 1)] = scoreDoc; //within scope of results array
>             }
>         }
>         return scoreDocs;
>     }
>
>     public int getTotalAvailable()
>     {
>         return totalHits;
>     }
>
>     public int getStart()
>     {
>         return start;
>     }
>
>     public int getEnd()
>     {
>         return Math.min(totalInThisPage, start - 1 + maxNumHits); // rank of the last hit on this page
>     }
>
>     public class HitQueue extends PriorityQueue
>     {
>           public HitQueue(int size)
>           {
>             initialize(size);
>           }
>           public final boolean lessThan(Object a, Object b)
>           {
>             ScoreDoc hitA = (ScoreDoc)a;
>             ScoreDoc hitB = (ScoreDoc)b;
>             if (hitA.score == hitB.score)
>               return hitA.doc > hitB.doc;
>             else
>               return hitA.score < hitB.score;
>           }
>     }
> }
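
For anyone who wants to try the paging arithmetic without a Lucene index, here is a minimal, Lucene-free sketch of the same idea. The class and method names (PageSketch, startForPage, page, lastHitOnPage) are made up for illustration, and a plain java.util.PriorityQueue stands in for Lucene's PriorityQueue. It keeps only the best (start - 1) + pageSize scores and then returns the slice that belongs to the requested page.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical, Lucene-free sketch of the collector's paging logic.
class PageSketch {

    /** Converts a 1-based page number into the 1-based position of its first hit. */
    static int startForPage(int pageNumber, int pageSize) {
        return (pageNumber - 1) * pageSize + 1;
    }

    /** Rank of the last hit shown on the page, for a "Hits x - y of z" line. */
    static int lastHitOnPage(int totalHits, int start, int pageSize) {
        return Math.min(totalHits, start - 1 + pageSize);
    }

    /** Returns the scores for one page, best first; start is 1-based. */
    static List<Float> page(float[] scores, int start, int pageSize) {
        if (start <= 0 || pageSize <= 0) {
            throw new IllegalArgumentException("start and pageSize must be >= 1");
        }
        int needed = (start - 1) + pageSize;               // best hits we must keep
        PriorityQueue<Float> heap = new PriorityQueue<>(); // min-heap: head is the weakest kept hit
        for (float s : scores) {
            if (heap.size() < needed) {
                heap.add(s);
            } else if (s > heap.peek()) {
                heap.poll();                               // evict the weakest kept hit
                heap.add(s);
            }
        }
        List<Float> best = new ArrayList<>(heap);
        best.sort(Collections.reverseOrder());             // best score first
        int from = Math.min(start - 1, best.size());       // skip the earlier pages
        return new ArrayList<>(best.subList(from, best.size()));
    }
}
```

In a JSP, the page number from the request would be converted with startForPage and passed as the start argument, and lastHitOnPage gives the upper number for a "Hits 11 - 20 of 57" style line.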


--
Alixandre Santana



Re: Pagination

mark harwood
In reply to this post by MC Moisei-2
>>It looks that we may have different cases.

I was hoping to answer the original question which was how to retrieve
pages of matching documents from a Lucene index (no database mentioned).

 >>So far worked just fine. I have 5000 rows of items and I think will
still work fine later when I'd have 100K items.

I have a Lucene-plus-database hybrid application with 100 million
documents running a business today, taking daily updates. Trust me, I
don't store Lucene doc ids in the database because it is not a scalable
or robust approach for the reasons I outlined earlier :)
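
To illustrate the doc id point with a toy model (this is not Lucene code): Lucene's document numbers are ephemeral and can change as documents are deleted and segments are merged, so a number saved in a database row can later point at a different document. The sketch below assumes a hypothetical ToyIndex whose "doc id" is simply a list position, and hypothetical application keys such as "item-2" stored with each document.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model, not the Lucene API: a "doc id" is just a list position.
class ToyIndex {
    private final List<String> keys = new ArrayList<>();

    /** Adds a document identified by an application key and returns its current doc id. */
    int add(String key) {
        keys.add(key);
        return keys.size() - 1;
    }

    /** Deletes a document and compacts the index, which renumbers every later document. */
    void deleteAndCompact(String key) {
        keys.remove(key);
    }

    /** Returns the key of the document currently at docId, or null if there is none. */
    String keyAt(int docId) {
        return (docId >= 0 && docId < keys.size()) ? keys.get(docId) : null;
    }

    /** Finds the current doc id for a key, or -1 if the document is gone. */
    int docIdFor(String key) {
        return keys.indexOf(key);
    }
}
```

If "item-1", "item-2" and "item-3" are added in that order and "item-1" is then deleted, the doc id 1 that was saved for "item-2" resolves to "item-3", while a lookup by key still finds "item-2". In a real index the equivalent would be to store the application key as an untokenized field and look the document up with a TermQuery on that field.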

Cheers,
Mark


