JdbcDirectory


JdbcDirectory

Guilherme Barile
Hello,
We're starting a new project that basically catalogs everything we
have in the department (different objects with different metadata).
As I have used Lucene before, I'm preparing a presentation for the
team, because I think it would really simplify the storage of metadata
and documents.
The system will be pretty straightforward: all items will be
cataloged, and most of them won't change much (I'll come back to
this question later).
So, here are my main concerns; I hope you can help.

1) Storing all data (index and content) wasn't recommended in the
past, as the index could become corrupted. Do I have this problem if
I use a JdbcDirectory (PostgreSQL backend)? I have already read about the
performance degradation when using a database as the main storage, but
that won't be a problem for us.

2) Lucene doesn't support incremental editing (a new Document will be
created when someone edits an item), so is it possible to manage some
kind of versioning? Has anyone ever implemented something like this?

Thanks a lot for the attention

Guilherme Barile
Prosoma Informática
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: JdbcDirectory

Askar Zaidi
1) I don't understand why the index would get corrupted. We store large
amounts of data and metadata using Lucene.
2) For this, I synced Lucene with the DB operations. If you use Hibernate,
there's an API for that. Or you could just write your own factory methods to
add/delete/edit index documents whenever a DB operation takes place (e.g. an edit).
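
A minimal sketch of the "write your own methods" option in point 2, assuming a relational table plus a search index that are updated together. It does not use the Lucene or Hibernate APIs: `ToyIndex` and `Catalog` are hypothetical names, and the in-memory SQLite table and term map only stand in for the real database and index.

```python
import sqlite3
from collections import defaultdict


class ToyIndex:
    """Hypothetical stand-in for a search index: term -> ids containing it."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.terms_by_id = {}

    def delete(self, item_id):
        # Remove the item from every posting list it appears in.
        for term in self.terms_by_id.pop(item_id, set()):
            self.postings[term].discard(item_id)

    def add(self, item_id, text):
        terms = set(text.lower().split())
        self.terms_by_id[item_id] = terms
        for term in terms:
            self.postings[term].add(item_id)

    def search(self, term):
        return sorted(self.postings.get(term.lower(), set()))


class Catalog:
    """Hypothetical facade: every write goes to the database, then the index."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, body TEXT)")
        self.index = ToyIndex()

    def save(self, item_id, body):
        with self.db:  # commits on success, rolls back on error
            self.db.execute(
                "INSERT OR REPLACE INTO items (id, body) VALUES (?, ?)",
                (item_id, body),
            )
        # An edit is expressed as a delete followed by an add.
        self.index.delete(item_id)
        self.index.add(item_id, body)

    def remove(self, item_id):
        with self.db:
            self.db.execute("DELETE FROM items WHERE id = ?", (item_id,))
        self.index.delete(item_id)
```

Routing all writes through one class is the design choice here: callers cannot update the table and forget the index. A production version would also have to decide what happens when the index update fails after the database commit.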


Re: JdbcDirectory

Guilherme Barile
> 1) I don't understand why the index would get corrupted. We store  
> huge data
> and meta-data using Lucene.

I got that information when Lucene 1.4 was the latest version; it may
have changed since. I'll trust you.

> 2) For this, I synced Lucene with the DB operations. If you use  
> Hibernate,
> theres an API for that. Or, you could just write your own factory  
> methods to
> add/delete/edit index documents when a DB operation takes place  
> (e.g edit).

You mean that every time you update the DB, you also update the index?
I'm actually planning not to use any external entity and to rely
entirely on Lucene. I wondered whether a simple query (get the latest
document for an item, for example) would solve the versioning issue.
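
A rough sketch of that "get the latest document" idea, assuming every edit is appended as a new document carrying an item id and a version number. This is plain Python rather than Lucene code; `VersionedStore` and its field names are hypothetical. In an index, the equivalent would be a query on the item id sorted by version, keeping only the first hit.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedStore:
    """Append-only store: an edit adds a new document instead of changing one."""

    docs: list = field(default_factory=list)

    def save(self, item_id, fields):
        # Next version is one past the highest version seen for this item.
        version = 1 + max(
            (d["version"] for d in self.docs if d["item_id"] == item_id),
            default=0,
        )
        self.docs.append({"item_id": item_id, "version": version, **fields})
        return version

    def history(self, item_id):
        # All versions of one item, oldest first.
        return sorted(
            (d for d in self.docs if d["item_id"] == item_id),
            key=lambda d: d["version"],
        )

    def latest(self, item_id):
        versions = self.history(item_id)
        return versions[-1] if versions else None
```

Old versions stay searchable under this scheme, so ordinary searches would need a way to exclude them, for example a flag marking the current version.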

Thanks a lot

Gui


Re: JdbcDirectory

Askar Zaidi
Yes. Every time a user updates a piece of information, you do the update in
the DB as well as in the index. If you are using Hibernate, there is an API
that does this mapping. I am not sure why you plan to store data in the
index. Storing data is the DB's job; searching is the index's job. I would
suggest you have both (a schema for your data and an index). That's what I
did. I could have stored everything in the Lucene index, but I am afraid that
as the application grows I will eventually need a DB system. I don't think
people use Lucene to "store" data the way they do in a DBMS.

-
Askar


Re: JdbcDirectory

Guilherme Barile
We'd store the data in the index mainly because it is non-structured.
We plan to implement something like the ThingDB described at
http://demo.openlibrary.org/about/tech, and thought that maybe Lucene +
JdbcDirectory could act as a backend.

gui


Data in the Index [was: JdbcDirectory]

Guilherme Barile
So, has anyone ever stored the data in the index as well? What were your
experiences?

Thanks a lot

Gui


Re: Data in the Index [was: JdbcDirectory]

Patrek
Hi,

At first, we thought we would use a "dual" approach: a Lucene index
and an RDBMS for storage.

While prototyping, for simplicity's sake, we used the Lucene index as
storage, thinking we could easily replace it later. So far, speed is
satisfactory enough that we are going to keep the data there until
retrieval performance becomes an issue.

But our data is mainly "static", with changes only a few times a week.

Our main data is an XML document.

Hope this helps.

Patrick


Re: Data in the Index [was: JdbcDirectory]

chrislusf
In reply to this post by Guilherme Barile
I store the Lucene index outside the database and run indexing periodically
to pick up the latest updates, without depending on ORM APIs.
In general, search data can be updated less promptly than the database,
unless there are real-time requirements.

Storing data in the index saves trips to the database. This usually makes a
huge difference in rendering speed compared to retrieving the rest of the
data from the database.
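
A small sketch of both points above, under assumptions that are not from this thread: the table has an `updated_at` column, the index is a plain dictionary, and `reindex_changes` and `search` are hypothetical helpers. A periodic job copies only rows changed since the last run, and each index entry keeps the fields needed to render a result.

```python
import sqlite3


def reindex_changes(db, index, since):
    """Copy rows modified after `since` into `index`; return the new watermark."""
    rows = db.execute(
        "SELECT id, title, body, updated_at FROM items "
        "WHERE updated_at > ? ORDER BY updated_at",
        (since,),
    ).fetchall()
    for item_id, title, body, updated_at in rows:
        # Keeping the display fields next to the searchable text lets a
        # result page render without a second trip to the database.
        index[item_id] = {"title": title, "text": f"{title} {body}".lower()}
        since = updated_at
    return since


def search(index, term):
    """Return (id, title) pairs for entries whose text contains the term."""
    return [
        (item_id, doc["title"])
        for item_id, doc in sorted(index.items())
        if term.lower() in doc["text"].split()
    ]
```

The returned watermark would be passed back in on the next scheduled run. Rows deleted from the table are not detected by this approach and would need separate handling.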

--
Chris Lu
-------------------------
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes:
http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
