Lucene as xml store

Lucene as xml store

Namrata Kumari
 
Hi,

I am a beginner with Lucene, so kindly excuse me if the questions below seem a
bit naive.
- Can I use Lucene as an XML store + search engine?
- What I understood is that if we want to search an XML document, we need
to parse it, build indexes, and then search on the basis of fields.
- So does this mean that even if we use Lucene as an XML store (IF WE CAN!!),
we still need to parse the XML to build the indexes?

Please reply to this as soon as possible.
 
Regards,
Namrata
 
 

RE: Lucene as xml store

Namrata Kumari
Hi Otis,

But can Lucene be used as an XML store, i.e. for storing the original XML
documents as they are?

Regards,
Namrata  

-----Original Message-----
From: Otis Gospodnetic [mailto:[hidden email]]
Sent: Friday, July 22, 2005 11:50 AM
To: [hidden email]
Subject: Re: Lucene as xml store


Re: Lucene as xml store

Otis Gospodnetic-2
In reply to this post by Namrata Kumari
Hi Namrata,

Yes, you would need to parse the XML.
Here is one way to do it:
  http://www-128.ibm.com/developerworks/java/library/j-lucene/

Otis
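
For illustration, the parsing step Otis mentions could look roughly like the sketch below. It uses only the JDK's DOM parser and does not touch Lucene at all: it just turns an XML string into flat name/value pairs, which is the shape an indexer would then add as fields. The class name XmlFieldExtractor, the method extractFields, and the sample XML are hypothetical, and the rule "every leaf element becomes a field named after its tag" is an assumption made for the example, not something the linked article prescribes.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper (not part of Lucene): XML string -> flat field map.
class XmlFieldExtractor {

    static Map<String, String> extractFields(String xml) {
        try {
            Document dom = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Map<String, String> fields = new LinkedHashMap<>();
            // "*" matches every element in document order, including the root.
            NodeList all = dom.getElementsByTagName("*");
            for (int i = 0; i < all.getLength(); i++) {
                Element element = (Element) all.item(i);
                // Only leaf elements (no element descendants) carry field text.
                if (element.getElementsByTagName("*").getLength() == 0) {
                    // Repeated tags are joined with a space so no text is lost.
                    fields.merge(element.getTagName(),
                            element.getTextContent().trim(),
                            (a, b) -> a + " " + b);
                }
            }
            return fields;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample document, for demonstration only.
        String xml = "<doc><title>Sample title</title><tag>a</tag><tag>b</tag></doc>";
        extractFields(xml).forEach((name, value) ->
                System.out.println(name + " = " + value));
    }
}
```

Each entry of the returned map would then be handed to the indexer as one field of a single search document; how that is done depends on the Lucene version in use.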



RE: Lucene as xml store

Namrata Kumari
In reply to this post by Namrata Kumari
Hey Erik,

Thanks for the info.

- Well, the application I want to develop is more about storing XML files,
each of which has a different structure, and then performing searches on
them that depend on the structure of the XML document and the user's
requirements.

- Moreover, I did not exactly understand how I can store the XML
document. I went through the Javadoc and could not figure out which
APIs could be used for this purpose. Can you guide me on this?

- But the biggest question is: is Lucene a good option? [Which I now doubt,
on the basis of what I have read so far :-(]

Regards,
Namrata


-----Original Message-----
From: Erik Hatcher [mailto:[hidden email]]
Sent: Friday, July 22, 2005 2:11 PM
To: [hidden email]
Subject: Re: Lucene as xml store



Re: Lucene as xml store

Erik Hatcher
In reply to this post by Namrata Kumari

On Jul 22, 2005, at 1:07 AM, Namrata Kumari wrote:

>
> hi,
>
> I am a beginner with Lucene, so kindly excuse me if the questions below
> seem a bit naive.
> - Can I use Lucene as an XML store + search engine?
> - What I understood is that if we want to search an XML document, we need
> to parse it, build indexes, and then search on the basis of fields.
> - So does this mean that even if we use Lucene as an XML store (IF WE CAN!!),
> we still need to parse the XML to build the indexes?

Lucene is a search engine and deals only with text (Strings,
essentially). Its index is also a flat document space: it was not
designed for hierarchical queries, although they can be supported to a
limited degree depending on how the data is indexed.
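
One way to read "depending on how data is indexed" is to encode each element's position in the hierarchy into the field name, so that a flat field lookup can still distinguish, say, an author's name from an editor's name. The sketch below shows only that flattening step using the JDK's DOM parser; it is an assumption about one possible approach, not Erik's method and not Lucene code. PathFlattener, flatten, and the sample XML are hypothetical names and data.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: flattens an XML tree into path-qualified field names.
class PathFlattener {

    static Map<String, List<String>> flatten(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            Map<String, List<String>> fields = new LinkedHashMap<>();
            walk(root, root.getTagName(), fields);
            return fields;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Depth-first walk; the slash-separated path becomes the flat field name.
    private static void walk(Element element, String path,
                             Map<String, List<String>> fields) {
        boolean leaf = true;
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                leaf = false;
                Element childElement = (Element) child;
                walk(childElement, path + "/" + childElement.getTagName(), fields);
            }
        }
        if (leaf) {
            fields.computeIfAbsent(path, k -> new ArrayList<>())
                  .add(element.getTextContent().trim());
        }
    }

    public static void main(String[] args) {
        // Made-up sample: two <name> elements in different parts of the tree.
        String xml = "<book><author><name>Ann</name></author>"
                + "<editor><name>Bob</name></editor></book>";
        flatten(xml).forEach((path, values) ->
                System.out.println(path + " -> " + values));
    }
}
```

The limitation Erik points out still applies: this handles "value at a known path" lookups, but not general hierarchical queries such as arbitrary ancestor or sibling relationships.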

Yes, Lucene can store text as well as make it searchable - so you  
could store an XML document in it as well.

You have not provided any information on the types of queries you
need to support or what the user experience will be like. There are
many ways to use Lucene, and whether it is a suitable solution for your
application depends on that information. Tell us more about what
you want to do and we can guide you further.

> Please reply to this as soon as possible

That's what they all say!  :)  No need to say such a thing - if you
have well-articulated questions that are straightforward enough to
answer, you'll get responses quickly here.

     Erik




RE: Lucene as xml store

Hondros, Constantine
In reply to this post by Namrata Kumari
You are better off using an XML database like

http://xml.apache.org/xindice/
or
http://exist.sourceforge.net/

... which will allow you to perform fast XPath queries on your XML data.
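
To make "XPath queries" concrete, here is a minimal sketch that evaluates one XPath expression against a single in-memory document using the JDK's built-in javax.xml.xpath package. It does not use Xindice or eXist; those databases have their own APIs for running such expressions over stored collections, which are not shown here. XPathDemo, its evaluate method, and the sample XML are hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical demo class: evaluates an XPath expression over an XML string.
class XPathDemo {

    // Returns the string value of the first node the expression selects
    // (an empty string if nothing matches).
    static String evaluate(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath expression or XML", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample data.
        String xml = "<lib>"
                + "<book year=\"2005\"><title>A</title></book>"
                + "<book year=\"1999\"><title>B</title></book>"
                + "</lib>";
        // Structure-aware query: title of the book whose year attribute is 2005.
        System.out.println(evaluate(xml, "/lib/book[@year='2005']/title"));
    }
}
```

The query depends on where elements and attributes sit in the tree, which is the kind of question a flat full-text index is not designed to answer directly.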

-----Original Message-----
From: Namrata Kumari [mailto:[hidden email]]
Sent: 22 July 2005 10:37
To: [hidden email]
Subject: RE: Lucene as xml store



Re: Lucene as xml store

Erik Hatcher
In reply to this post by Namrata Kumari

On Jul 22, 2005, at 4:37 AM, Namrata Kumari wrote:
> - Well, the application I want to develop is more about storing XML
> files, each of which has a different structure, and then performing
> searches on them that depend on the structure of the XML document and
> the user's requirements.

That's still a pretty generic requirement.  What type of queries?  
XPath?

> - Moreover, I did not exactly understand how I can store the XML
> document. I went through the Javadoc and could not figure out which
> APIs could be used for this purpose. Can you guide me on this?

Look at the various types of fields. There is a "stored" attribute
on Field that makes Lucene keep the field's original value in the index,
so it can be read back from a search hit.
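
The difference between a stored field and an indexed (searchable) field can be sketched with a toy model. The classes below are NOT Lucene's API; ToyIndex and all of its methods are hypothetical and exist only to show the idea: the raw XML is kept as a stored-only field so it can be returned with a hit, while text extracted from it goes into an indexed-only field that can be searched. In real Lucene code the same choice is made through the Field class, whose constructors and factory methods differ between versions, so check the Javadoc for your release.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: illustrates "stored" vs "indexed"; this is not Lucene.
class ToyIndex {
    // Per-document map of field name -> original value (the "stored" part).
    private final List<Map<String, String>> storedFields = new ArrayList<>();
    // "field:token" -> ids of documents containing it (the "indexed" part).
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    int addDocument() {
        storedFields.add(new HashMap<>());
        return storedFields.size() - 1;
    }

    void addField(int docId, String name, String value, boolean stored, boolean indexed) {
        if (stored) {
            storedFields.get(docId).put(name, value); // kept verbatim
        }
        if (indexed) {
            // Very naive analysis: lower-case and split on non-word characters.
            for (String token : value.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!token.isEmpty()) {
                    postings.computeIfAbsent(name + ":" + token, k -> new TreeSet<>())
                            .add(docId);
                }
            }
        }
    }

    Set<Integer> search(String field, String term) {
        return postings.getOrDefault(field + ":" + term.toLowerCase(Locale.ROOT),
                Collections.emptySet());
    }

    String getStored(int docId, String name) {
        return storedFields.get(docId).get(name);
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        int id = index.addDocument();
        String xml = "<doc><title>Hello world</title></doc>"; // made-up sample
        index.addField(id, "xml", xml, true, false);             // stored, not searchable
        index.addField(id, "title", "Hello world", false, true); // searchable, not stored
        for (int hit : index.search("title", "hello")) {
            System.out.println(index.getStored(hit, "xml"));
        }
    }
}
```

Under this arrangement the index doubles as a simple XML store: a search runs against the extracted fields, and the original document comes back from the stored field of each hit.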

> - But the biggest question is: is Lucene a good option? [Which I now
> doubt, on the basis of what I have read so far :-(]

It really all depends.  I built a search engine for the Rossetti  
Archive (http://www.rossettiarchive.org/rose/) which indexes XML  
files like this:

     http://www.rossettiarchive.org/docs/1-1847.s244.raw.xml

XPath queries against the XML are not possible, but that is not a
use case for the system either. Highly structured queries such as this
one are supported because the indexing process extracts detailed
information from the XML files:

     http://www.rossettiarchive.org/rose/?query=%2Bgenre%3Asonnet+%2B%28author%3Arossetti+OR+author%3Adgr%29+%2Byear%3A%5B1850+TO+1870%5D
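
The query parameter in that URL is percent-encoded, which hides the actual query. The small sketch below decodes it with the JDK's URLDecoder so the structure becomes readable; QueryDecoder is a hypothetical class name, and nothing here calls Lucene or the Rossetti Archive site.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical helper: decodes a percent-encoded query-string value.
class QueryDecoder {

    static String decode(String encoded) {
        try {
            // In query strings '+' stands for a space and %XX for one byte.
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        // The value of the "query" parameter from the URL above.
        String encoded = "%2Bgenre%3Asonnet+%2B%28author%3Arossetti+OR+author%3Adgr%29"
                + "+%2Byear%3A%5B1850+TO+1870%5D";
        System.out.println(decode(encoded));
        // prints: +genre:sonnet +(author:rossetti OR author:dgr) +year:[1850 TO 1870]
    }
}
```

In Lucene's query syntax a leading + marks a required clause, field:term restricts a term to one field, and [1850 TO 1870] is an inclusive range. So the query asks for sonnets by either author name within that range of years, each clause targeting a field that was extracted from the XML at indexing time.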

I still do not have a clear-cut understanding of your needs, and so I
am still not sure whether Lucene is suitable. It is certainly a fine
choice for full-text searches, but structured queries are a
different story.

     Erik

