IndexWriter: setRAMBufferSizeMB

IndexWriter: setRAMBufferSizeMB

spring
Hi,

if I understand this property correctly, every time the RAM buffer is full it
gets automatically written to disk. Something like a commit in a database.
Thus if my application dies, all docs in the buffer get lost. Right?

If so, is there any event/callback etc. which informs my application that
such a commit happened?

Thank you.


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: IndexWriter: setRAMBufferSizeMB

Michael McCandless-2
Well ... every time the RAM buffer is full, a new segment is flushed
to the Directory, but that is not necessarily a "commit": there is no
guarantee that an IndexReader would see the new segment, nor that the
segment would survive if the machine suddenly crashed.

You shouldn't rely on when specifically IndexWriter makes its changes
visible to readers.  The best way to be sure is to close the writer.

There is work underway now, in this issue:

   https://issues.apache.org/jira/browse/LUCENE-1044

that will add an explicit "commit" call, which you would use to 1)
make the changes visible to readers, and 2) sync the index such that
if the machine crashed (after commit returns) then your changes as of
the commit will survive.  But it's not committed yet ... it will be in
2.4.

One way for a reader to check if a new commit has happened is to
call the isCurrent method.  Maybe that helps?

Mike
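
Since there is no callback, the isCurrent suggestion above amounts to polling. A minimal sketch of that idea, using only the JDK: the BooleanSupplier stands in for a call to the reader's isCurrent method (which can throw a checked IOException, so real code would need to wrap it), and CommitWatcher, poll and the onNewCommit hook are hypothetical names invented for this illustration, not part of Lucene.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

/** Polls an "is my reader still current?" check and fires a callback when it is not. */
final class CommitWatcher {
    private final BooleanSupplier isCurrent; // assumption: wraps something like reader.isCurrent()
    private final Runnable onNewCommit;      // hypothetical application hook

    CommitWatcher(BooleanSupplier isCurrent, Runnable onNewCommit) {
        this.isCurrent = isCurrent;
        this.onNewCommit = onNewCommit;
    }

    /** Runs one poll; returns true if a newer commit was detected. */
    boolean poll() {
        if (isCurrent.getAsBoolean()) {
            return false; // nothing new since the reader was opened
        }
        onNewCommit.run();
        return true;
    }
}

final class CommitWatcherDemo {
    public static void main(String[] args) {
        AtomicBoolean current = new AtomicBoolean(true); // simulated reader state
        AtomicInteger notifications = new AtomicInteger();
        CommitWatcher watcher = new CommitWatcher(current::get, notifications::incrementAndGet);

        watcher.poll();     // reader is current, no notification
        current.set(false); // simulate a writer making new changes visible
        watcher.poll();     // fires the callback once

        System.out.println("notifications = " + notifications.get());
    }
}
```

In an application, poll would typically be driven by a timer, and the callback would open a fresh reader, because the old one keeps reporting that it is out of date.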


RE: IndexWriter: setRAMBufferSizeMB

spring
OK, so there is nothing in 2.3 besides IndexWriter.close to ensure that the
docs are written to disk and that the index will survive an application /
machine death?


Re: IndexWriter: setRAMBufferSizeMB

Michael McCandless-2

It's complicated.

In 2.3, you can use IW.flush to write docs to disk.  But that method
will be deprecated in 2.4 and replaced with commit.  Or, you can close.

If the application (JVM) dies or is killed, the index will be fine but
won't have any unflushed buffered docs.

If the machine dies (OS crashes, power cord pulled) then there is a real
risk that the index will become corrupt.  This is because Lucene has
never explicitly sync()'d the files to ensure they are on stable
storage.  LUCENE-1044 fixes that (adds syncs).

Mike
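
To illustrate the difference between writing a file and syncing it to stable storage, here is a small JDK-only sketch. It does not show Lucene's own code or the LUCENE-1044 patch; DurableWrite and writeAndSync are hypothetical names, and the only assumption is that FileChannel.force is the kind of sync call being discussed.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class DurableWrite {
    /** Writes data to path and returns only after asking the OS to sync it. */
    static void writeAndSync(Path path, byte[] data) throws IOException {
        try (FileChannel channel = FileChannel.open(path,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buffer = ByteBuffer.wrap(data);
            while (buffer.hasRemaining()) {
                // After write() the bytes may still sit only in the OS cache:
                // enough to survive a JVM crash, not necessarily a power loss.
                channel.write(buffer);
            }
            // force(true) asks the OS to push content and metadata to the device.
            channel.force(true);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("segment-demo", ".bin");
        writeAndSync(file, "doc-1".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
        Files.delete(file);
    }
}
```

Without the force call, the data would usually still be readable after a JVM death, which matches the distinction drawn above between an application crash and a machine crash.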


RE: IndexWriter: setRAMBufferSizeMB

spring
Thank you.
So I will call flush in 2.3 (and may lose data when the machine dies) and
commit() in 2.4+ (where a sync() will save the data).


Re: IndexWriter: setRAMBufferSizeMB

Michael McCandless-2

Exactly!

Mike
