Text search from List of xml files using Apache Lucene


Karthikeyan P
Hello,

I have a list of XML files in a directory. I have to parse these XML files
using Apache Lucene and index them. Once indexing is done, I want to be able
to search text inside the XML files. How can I achieve this? I am able to
search plain text files in a similar way; can someone help me with searching
XML with Lucene?


Regards,

Karthik.

Re: Text search from List of xml files using Apache Lucene

Corbin, J.D.
Hi Karthik,

Sounds like you know what you have to do. The only problem I see in your
description is the part about parsing the files with Lucene: Lucene won't
parse XML for you. You can read the files from disk (basic I/O), use a SAX
parser to extract the information you want to search against, and then build
your index from that information. Once the index is built, you can search
against it using Lucene's API.
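A minimal sketch of the read-and-extract step described above, using only the SAX parser that ships with the JDK (`javax.xml.parsers.SAXParserFactory`). The class and method names (`XmlTextExtractor`, `TextCollector`, `extractText`, `extractDirectory`) are hypothetical, and the sketch assumes that all character data in each file is what you want to search; a real handler would usually keep only the elements you care about. Each extracted string would then go into a Lucene `Document` (for example as a `TextField`) added with `IndexWriter.addDocument`; that indexing step is not shown here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class: extracts the searchable text of XML files with SAX.
class XmlTextExtractor {

    // SAX callback handler that accumulates all character data in the document.
    static class TextCollector extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            text.append(' '); // keep words from neighbouring elements apart
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            text.append(' ');
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // The parser may deliver one text node in several chunks, so append raw.
            text.append(ch, start, length);
        }

        String getText() {
            // Collapse runs of whitespace (including indentation) into single spaces.
            return text.toString().replaceAll("\\s+", " ").trim();
        }
    }

    // Parses one XML source and returns its text content.
    static String extractText(InputSource source) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            TextCollector collector = new TextCollector();
            parser.parse(source, collector);
            return collector.getText();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    static String extractText(String xml) {
        return extractText(new InputSource(new StringReader(xml)));
    }

    // Maps each *.xml file directly inside dir (keyed by file name) to its extracted text.
    static Map<String, String> extractDirectory(Path dir) {
        Map<String, String> texts = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "*.xml")) {
            for (Path file : files) {
                texts.put(file.getFileName().toString(),
                          extractText(new InputSource(file.toUri().toString())));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return texts;
    }
}
```

For example, `XmlTextExtractor.extractText("<doc><title>Lucene</title><body>full-text search</body></doc>")` returns `"Lucene full-text search"`.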

J.D.


On Tue, Mar 28, 2017 at 6:26 AM, Karthikeyan P <[hidden email]>
wrote:

> Hello,
>
> I have a list of XML files in a directory. I have to parse these XML files
> using Apache Lucene and index them. Once indexing is done, I want to be able
> to search text inside the XML files. How can I achieve this? I am able to
> search plain text files in a similar way; can someone help me with searching
> XML with Lucene?
>
>
> Regards,
>
> Karthik.
>

Re: Text search from List of xml files using Apache Lucene

Evert Wagenaar
In reply to this post by Karthikeyan P
You don't need a Lucene parser for XML (Lucene doesn't have one). Instead, use
a Java XML parser or library such as dom4j. I personally prefer DOM, because it
lets you use XPath to extract exactly what you need. SAX is an alternative to
DOM; however, SAX is not a W3C Recommendation (DOM is), and because it is an
event-based streaming API it lacks many of the extraction methods available in
DOM.
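A minimal sketch of the DOM + XPath approach. It uses the DOM and XPath implementations bundled with the JDK (`javax.xml.parsers`, `javax.xml.xpath`) rather than dom4j, and the names `XmlXPathExtractor`, `parse` and `select` are hypothetical. Each string returned by `select` could then be stored in its own Lucene field.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class: selects indexable text from an XML document via DOM + XPath.
class XmlXPathExtractor {

    // Parses the XML string into an in-memory DOM tree.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Returns the trimmed text content of every node matched by the XPath expression.
    static List<String> select(Document doc, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent().trim());
            }
            return values;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
    }
}
```

Calling `select(doc, "//book[@lang='en']/title")` returns only the titles of books whose `lang` attribute is `en`. The trade-off versus SAX is memory: DOM builds the whole document tree in memory before you query it, while SAX streams events and never holds the full tree, which matters for very large files.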

See http://www.evertwagenaar.com/index.php/2017/02/04/parsers-dom-or-sax/
for more.

