commit with only commitData

commit with only commitData

Shai Erera
Hi

Today, calling IW.commit(userData) a second time has no effect if the index has not changed in between, even if userData is non-null (or different) in the two calls.
Is there any particular reason why we prevent that?

For instance, when working with a search index and a taxonomy index, we found it useful to store some commit
data in both indexes to keep them in sync, so that when you reopen both you can verify that the two actually
match.
However, in some indexing sessions no new categories are added to the taxonomy index, so any commit
called on TaxoWriter is silently ignored, even if commitData is passed.
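
To make the failure mode concrete, here is a minimal, self-contained Java model of the behavior described above. ToyWriter is a hypothetical stand-in and not Lucene's IndexWriter or TaxoWriter; it only mimics the "skip the commit when nothing changed" rule. The "syncId" key is also an assumption made for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a writer; NOT Lucene's IndexWriter.
class ToyWriter {
    private boolean changed = false;                              // set when documents are added
    private Map<String, String> committedData = new HashMap<>(); // user data of the last commit point
    private int commitCount = 0;

    void addDocument(String doc) {
        changed = true;
    }

    // Mimics the behavior described in the post: with no pending changes the
    // commit is skipped and the user data passed in is silently dropped.
    void commit(Map<String, String> userData) {
        if (!changed) {
            return;
        }
        committedData = new HashMap<>(userData);
        commitCount++;
        changed = false;
    }

    Map<String, String> committedData() {
        return Collections.unmodifiableMap(committedData);
    }

    int commitCount() {
        return commitCount;
    }
}

class SyncCheckSketch {
    // Two indexes count as in sync if both last commits carry the same "syncId".
    static boolean inSync(ToyWriter search, ToyWriter taxo) {
        String id = search.committedData().get("syncId");
        return id != null && id.equals(taxo.committedData().get("syncId"));
    }

    public static void main(String[] args) {
        ToyWriter search = new ToyWriter();
        ToyWriter taxo = new ToyWriter();

        // Session 1: both indexes change, both commits are written.
        search.addDocument("doc1");
        taxo.addDocument("category/a");
        search.commit(Collections.singletonMap("syncId", "1"));
        taxo.commit(Collections.singletonMap("syncId", "1"));
        System.out.println(inSync(search, taxo)); // true

        // Session 2: no new categories, so the taxonomy commit is skipped.
        search.addDocument("doc2");
        search.commit(Collections.singletonMap("syncId", "2"));
        taxo.commit(Collections.singletonMap("syncId", "2"));
        System.out.println(inSync(search, taxo)); // false
    }
}
```

In the second session the taxonomy side still carries the old value, so a reopen-time check would report a mismatch even though nothing is actually out of sync.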

I've asked around and found that other people need this too: storing some application-global information which,
for example, denotes the state of this index in the overall app. Because commitData cannot be used like that, they add a dummy
document with that info to the index, always update it, and make sure to filter it out during search.
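
The workaround can be sketched roughly as follows. This is a toy in-memory model, not Lucene code; the class name, the "__app_state__" marker field, and the method names are all hypothetical. It only illustrates the bookkeeping the workaround imposes: one extra document that must be replaced on every update and excluded on every search path.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory "index"; a hypothetical model, not a Lucene API.
class DummyDocWorkaroundSketch {
    static final String STATE_MARKER = "__app_state__"; // hypothetical marker field

    private final List<Map<String, String>> docs = new ArrayList<>();

    void addDocument(Map<String, String> doc) {
        docs.add(doc);
    }

    // Replace the single state document, so the index always has a change to commit.
    void updateState(String state) {
        docs.removeIf(d -> d.containsKey(STATE_MARKER));
        Map<String, String> stateDoc = new HashMap<>();
        stateDoc.put(STATE_MARKER, state);
        docs.add(stateDoc);
    }

    // Every search path has to remember to exclude the state document.
    List<Map<String, String>> searchAll() {
        List<Map<String, String>> hits = new ArrayList<>();
        for (Map<String, String> d : docs) {
            if (!d.containsKey(STATE_MARKER)) {
                hits.add(d);
            }
        }
        return hits;
    }

    // Document count including the dummy document.
    int rawDocCount() {
        return docs.size();
    }
}
```

Note that the raw document count is off by one compared to what a search returns, which is the kind of leak the filtering has to hide.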

I don't think that adding dummy documents to the index is good, especially not if you need to ensure they're filtered
out. Also, it's currently not possible to add dummy documents to the taxonomy index, but let's leave that aside for now.

So, why shouldn't we let someone commit by only changing userData? What would be the harm? I can see two ways to allow that:

1) Keep skipping the commit when commit() is called and nothing has changed, but always create a new commit point when commit(userData) is called.

2) Alternatively, remove userData from the commit() API (that will simplify the prepareCommit API too!) and replace it with an
   IndexWriter.setCommitData() API, which also marks IW as having pending changes, so the next commit must be written.

Maybe option #2 will make it clear to both users of IW (and us developers) that the application is requesting a transaction
on this IW instance. It also removes the duplicate commit and prepareCommit overloads.
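
A minimal sketch of what option #2 could look like, under stated assumptions: CommitDataWriterSketch is a hypothetical class, not IndexWriter, and the method shapes are only my reading of the proposal. The point is that setting commit data flips the same "changed" flag that adding a document does, so commit() needs no userData parameter.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical writer modelling option #2; NOT Lucene's IndexWriter.
class CommitDataWriterSketch {
    private boolean changed = false;
    private Map<String, String> pendingData = Collections.emptyMap();
    private Map<String, String> committedData = Collections.emptyMap();
    private int commitCount = 0;

    void addDocument(String doc) {
        changed = true;
    }

    // Setting commit data counts as a pending change,
    // so the next commit() is never skipped.
    void setCommitData(Map<String, String> data) {
        pendingData = new HashMap<>(data);
        changed = true;
    }

    // commit() takes no userData; it writes whatever was set last.
    void commit() {
        if (!changed) {
            return; // nothing pending, no new commit point
        }
        committedData = pendingData;
        commitCount++;
        changed = false;
    }

    Map<String, String> committedData() {
        return Collections.unmodifiableMap(committedData);
    }

    int commitCount() {
        return commitCount;
    }
}
```

With this shape, a plain commit() with nothing pending is still a no-op, while setCommitData() followed by commit() always produces a new commit point, even if no documents were added.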

Thoughts?

Shai

Re: commit with only commitData

Michael McCandless-2
I agree this (skipping a commit if there "seems to be" no changes) is annoying.

I think a separate API would make sense?  Then we'd just set the
changed bit in IW so that the next commit always writes it.

Mike McCandless

http://blog.mikemccandless.com


Re: commit with only commitData

Shai Erera
I will open an issue so we can continue discussing.

Shai
