getting document metadata

getting document metadata

Pablo Gomes Ludermir
Hello all,

I would like to retrieve some document metadata after the search. The
documents returned in the Hits are PDFs, and I would like to extract
some info from them using PDFBox.
However, I am not sure about indexing the path when adding the document
to the index: I do some processing with the contents of the index, and
I would like to have only one field, the body contents. Is there
another way to get the document's path if we don't index it? Or just
with magic? :)

Regards,

--
Pablo Gomes Ludermir
[hidden email]

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: getting document metadata

Luke Shannon
Hi Pablo;

Can you give a little more detail? I don't understand what you mean when you
say "indexing the path when adding the document to the index".

If you build a Lucene document using the LucenePDFDocument class
(http://www.pdfbox.org/javadoc/index.html), the returned document will
contain a field called "path", which holds the location of the file on
the system. Is this what you are after?

Luke
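
Note on the path question: in Lucene a field can be stored without being
indexed, so a path can travel with each hit while the body contents stay
the only searchable field. In the Lucene 1.4 API this is what
Field.UnIndexed(name, value) is for. The sketch below is not Lucene code;
it is a toy Java model of that idea, and the class name, method names,
and file paths are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** Toy model (not Lucene): one searchable field plus one stored-only field. */
final class StoredFieldSketch {
    // Inverted index over the "contents" field only: term -> document numbers.
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();
    // Stored path per document number; never tokenized, never searched.
    private final List<String> storedPaths = new ArrayList<>();

    /** Adds a document and returns its document number. */
    int add(String contents, String path) {
        int docNumber = storedPaths.size();
        storedPaths.add(path);
        for (String term : contents.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docNumber);
            }
        }
        return docNumber;
    }

    /** Returns the numbers of the documents whose contents contain the term. */
    List<Integer> search(String term) {
        return new ArrayList<>(postings.getOrDefault(term.toLowerCase(), new TreeSet<>()));
    }

    /** Looks up the stored path that belongs to a hit. */
    String path(int docNumber) {
        return storedPaths.get(docNumber);
    }

    public static void main(String[] args) {
        StoredFieldSketch index = new StoredFieldSketch();
        index.add("Lucene in action", "/docs/lucene.pdf");   // hypothetical path
        index.add("PDFBox user guide", "/docs/pdfbox.pdf");  // hypothetical path
        for (int hit : index.search("pdfbox")) {
            System.out.println(hit + " -> " + index.path(hit));
        }
        // The path was stored but not indexed, so it cannot be searched.
        System.out.println(index.search("docs").isEmpty());
    }
}
```

Searching for "pdfbox" finds document 1 together with its stored path,
while searching for "docs" finds nothing, because the path never entered
the term index.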

----- Original Message -----
From: "Pablo Gomes Ludermir" <[hidden email]>
To: "Lucene user list" <[hidden email]>
Sent: Tuesday, May 03, 2005 2:23 PM
Subject: getting document metadata



Fwd: getting document metadata

Pablo Gomes Ludermir
Forgot to send to the list.

---------- Forwarded message ----------
From: Pablo Gomes Ludermir <[hidden email]>
Date: May 3, 2005 9:07 PM
Subject: Re: getting document metadata
To: Luke Shannon <[hidden email]>


I would actually like to have a single field on the Document object,
named CONTENTS. My question is whether it is possible to retrieve the
document by its "number", e.g. IndexReader.document(n), where "n" is
provided by Hits.id(...) from the search results. Is that a reliable
approach?

Regards

--
Pablo Gomes Ludermir
[hidden email]
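
Note on the document-number approach: Hits.id(i) and
IndexReader.document(n) do fit together, but the IndexReader
documentation describes document numbers as ephemeral. They can change
once deleted documents are merged away, so a number is safe to use
against the same open reader but not as a long-lived key. Also note that
document(n) returns stored fields only. The sketch below is not Lucene
code; it is a toy Java model of the renumbering, with hypothetical class
and method names throughout.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (not Lucene): document numbers shift once deletions are compacted. */
final class DocNumberSketch {
    private final List<String> docs = new ArrayList<>();
    private final List<Boolean> deleted = new ArrayList<>();

    /** Adds a document and returns its current document number. */
    int add(String contents) {
        docs.add(contents);
        deleted.add(false);
        return docs.size() - 1;
    }

    /** Marks a document as deleted; other documents keep their numbers for now. */
    void delete(int docNumber) {
        deleted.set(docNumber, true);
    }

    /** Drops deleted documents, as a merge would; the survivors are renumbered. */
    void compact() {
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            if (!deleted.get(i)) {
                kept.add(docs.get(i));
            }
        }
        docs.clear();
        docs.addAll(kept);
        deleted.clear();
        for (int i = 0; i < docs.size(); i++) {
            deleted.add(false);
        }
    }

    /** Returns the document currently stored under the given number. */
    String document(int docNumber) {
        if (deleted.get(docNumber)) {
            throw new IllegalArgumentException("document " + docNumber + " is deleted");
        }
        return docs.get(docNumber);
    }

    public static void main(String[] args) {
        DocNumberSketch index = new DocNumberSketch();
        int a = index.add("first");
        int b = index.add("second");
        int c = index.add("third");
        index.delete(a);
        System.out.println(index.document(c)); // still "third": nothing merged yet
        index.compact();
        System.out.println(index.document(b)); // now "third": numbers have shifted
    }
}
```

A number remembered from an earlier search can therefore point at a
different document later. If a stable handle is needed, a stored field
such as the path or some application-level key is the safer choice.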

