How do I get all the documents in the index without searching?


Paul Tomblin
I want to iterate through all the documents that are in the crawl,
programmatically.  The only code I can find does searches.  I don't
want to search for a term, I want everything.  Is there a way to do
this?

--
http://www.linkedin.com/in/paultomblin

Re: How do I get all the documents in the index without searching?

Alex McLintock
Try looking at how the indexers work. They *do* iterate through all
the documents in the crawl (or rather, one segment at a time). However,
they do it the Hadoop way, as MapReduce jobs rather than a simple loop.



2009/8/11 Paul Tomblin <[hidden email]>:
> I want to iterate through all the documents that are in the crawl,
> programmatically.  The only code I can find does searches.  I don't
> want to search for a term, I want everything.  Is there a way to do
> this?

Re: How do I get all the documents in the index without searching?

Paul Tomblin
On Tue, Aug 11, 2009 at 2:10 PM, Paul Tomblin<[hidden email]> wrote:
> I want to iterate through all the documents that are in the crawl,
> programmatically.  The only code I can find does searches.  I don't
> want to search for a term, I want everything.  Is there a way to do
> this?

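To answer my own question, what I ended up doing was:
            // Doc IDs run from 0 to maxDoc() - 1, with gaps where documents were deleted
            IndexReader reader = IndexReader.open(indexDir.getAbsolutePath());
            for (int i = 0; i < reader.maxDoc(); i++)
            {
                if (reader.isDeleted(i)) continue;  // document(i) fails on a deleted ID
                Document doc = reader.document(i);
            }
            reader.close();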

Now that I have the Document, I have to figure out how to process it
further to get the actual contents, but I assume that I need to go
back to the segment for that.
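
A note on the loop bound: in Lucene, numDocs() counts only live
documents, while document IDs run from 0 to maxDoc() - 1 with gaps
where documents were deleted. The toy model below is plain Java with no
Lucene dependency; ToyIndex and its methods are hypothetical names that
only mimic the IndexReader calls. It is a sketch of why bounding the
loop by maxDoc() and skipping deleted IDs should visit every live
document, not a statement about how any particular index behaves.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for an index reader. NOT the Lucene API; all names are hypothetical.
public class ToyIndex {
    private final String[] slots;     // one slot per document ID
    private final boolean[] deleted;  // true where the document was deleted

    public ToyIndex(String... docs) {
        slots = docs.clone();
        deleted = new boolean[docs.length];
    }

    public void delete(int id) { deleted[id] = true; }

    // Highest document ID plus one, counting deleted slots too.
    public int maxDoc() { return slots.length; }

    // Number of live (non-deleted) documents.
    public int numDocs() {
        int live = 0;
        for (boolean d : deleted) {
            if (!d) live++;
        }
        return live;
    }

    public boolean isDeleted(int id) { return deleted[id]; }

    // Refuses deleted IDs, as a real reader might.
    public String document(int id) {
        if (deleted[id]) {
            throw new IllegalArgumentException("deleted document " + id);
        }
        return slots[id];
    }

    // Visit every live document: bound by maxDoc(), skip deleted IDs.
    public List<String> allLiveDocs() {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < maxDoc(); i++) {
            if (isDeleted(i)) continue;
            out.add(document(i));
        }
        return out;
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex("a", "b", "c", "d");
        index.delete(1);
        System.out.println(index.numDocs() + " live of " + index.maxDoc());
        System.out.println(index.allLiveDocs());
    }
}
```

In this model, a loop bounded by numDocs() would stop after ID 2, so it
would hit the deleted ID 1 and never reach ID 3.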



--
http://www.linkedin.com/in/paultomblin