Incremental Indexing.

Incremental Indexing.

장용석
Hi~.
I have a question about Lucene incremental indexing.

I want to incrementally index my product ("goods") data.
For example, I have 4 product rows with the columns
"GOOD_ID", "NAME", "PRICE", "CREATEDATE", "UPDATEDATE":

1, ipod, 30000, 2008-11-10:11:00, 2008-11-10:11:00
2, java book, 20000, 2008-11-10:11:00, 2008-11-10:11:00
3, calendar, 10000, 2008-11-10:11:00, 2008-11-10:11:00
4, lucene book, 5000, 2008-11-10:11:00, 2008-11-10:11:00

When I index these rows, each one will get its own unique Lucene doc ID.

Then I update the row with GOOD_ID "1": the PRICE column changes from 30000
to 35000, and the UPDATEDATE column changes from 2008-11-10:11:00 to
2008-11-10:12:00.

In this case, I want to update my index with the new data for GOOD_ID "1".

According to the book, if I want to update my index I should delete the
target document from the index and then add the new document.
If there is only one target document, I think that is no problem for me or
for my application.
But if there are more than 3000 target documents, the application must
repeat the delete-and-add job 3000 (or more) times.
I am worried that this will be a problem for my application.
Or is this not a problem at all?
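
For illustration, a rough sketch of the delete-then-add update described
above, written against the Lucene 2.3-era API that was current when this
thread was posted (later versions rename UN_TOKENIZED/TOKENIZED to
NOT_ANALYZED/ANALYZED); the index path and class name are hypothetical, and
GOOD_ID is indexed UN_TOKENIZED so it can serve as the unique key:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.store.FSDirectory;

    public class GoodsIndexUpdater {
        // Build a Lucene Document for one product row. GOOD_ID is stored and
        // indexed without tokenizing so it can act as the unique key.
        static Document toDocument(String id, String name, String price,
                                   String created, String updated) {
            Document doc = new Document();
            doc.add(new Field("GOOD_ID", id, Field.Store.YES, Field.Index.UN_TOKENIZED));
            doc.add(new Field("NAME", name, Field.Store.YES, Field.Index.TOKENIZED));
            doc.add(new Field("PRICE", price, Field.Store.YES, Field.Index.UN_TOKENIZED));
            doc.add(new Field("CREATEDATE", created, Field.Store.YES, Field.Index.UN_TOKENIZED));
            doc.add(new Field("UPDATEDATE", updated, Field.Store.YES, Field.Index.UN_TOKENIZED));
            return doc;
        }

        public static void main(String[] args) throws Exception {
            // false = open the existing index rather than create a new one.
            IndexWriter writer = new IndexWriter(
                    FSDirectory.getDirectory("/path/to/index"),   // hypothetical location
                    new StandardAnalyzer(), false);

            // "Updating" GOOD_ID 1 means deleting the old document by its key
            // and adding a freshly built one with the new PRICE and UPDATEDATE.
            writer.deleteDocuments(new Term("GOOD_ID", "1"));
            writer.addDocument(toDocument("1", "ipod", "35000",
                    "2008-11-10:11:00", "2008-11-10:12:00"));

            writer.close();
        }
    }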

I need your help. :-)

Many thanks.
Jang.

--
DEV용식
http://devyongsik.tistory.com

Re: Incremental Indexing.

Jason Rutherglen
Hi Jang,

I've been working on the Tag Index to address this issue.  It seems like a
popular feature, but I have not had time to fully implement it yet:
http://issues.apache.org/jira/browse/LUCENE-1292
To be technical, it handles UN_TOKENIZED fields (did this name change
recently?) and does some specialized things to allow updating parts of the
inverted index.  If you're interested in working on it, feel free to let me
know.

Cheers,
Jason

Re: Incremental Indexing.

Ian Lea
In reply to this post by 장용석
Such incremental indexing is standard practice and unlikely to cause a
problem, particularly if you are only working with a few thousand
documents.  Instead of delete/add you could use
IndexWriter.updateDocument().
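
For illustration, a minimal sketch of that suggestion, using the same
assumed field names and hypothetical index path as the earlier sketch:
updateDocument(Term, Document) first deletes every document containing the
given term and then adds the new document, so the delete/add pair becomes a
single call:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.store.FSDirectory;

    public class UpdateByKeyExample {
        public static void main(String[] args) throws Exception {
            IndexWriter writer = new IndexWriter(
                    FSDirectory.getDirectory("/path/to/index"),   // hypothetical location
                    new StandardAnalyzer(), false);

            // Rebuild the whole document for GOOD_ID "1" with the new price
            // and update date (updateDocument replaces the entire document).
            Document doc = new Document();
            doc.add(new Field("GOOD_ID", "1", Field.Store.YES, Field.Index.UN_TOKENIZED));
            doc.add(new Field("NAME", "ipod", Field.Store.YES, Field.Index.TOKENIZED));
            doc.add(new Field("PRICE", "35000", Field.Store.YES, Field.Index.UN_TOKENIZED));
            doc.add(new Field("CREATEDATE", "2008-11-10:11:00", Field.Store.YES, Field.Index.UN_TOKENIZED));
            doc.add(new Field("UPDATEDATE", "2008-11-10:12:00", Field.Store.YES, Field.Index.UN_TOKENIZED));

            // Deletes any existing documents whose GOOD_ID term is "1",
            // then adds the new document: delete and add in one call.
            writer.updateDocument(new Term("GOOD_ID", "1"), doc);
            writer.close();
        }
    }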


--
Ian.

Re: Incremental Indexing.

장용석
Thanks for your help.
I have about 400,000 documents in my index, and they are constantly being
updated (price, name, etc.).
I will try using the delete and add functions.
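
For illustration, a rough sketch of how a batch of changed rows (the
3000-or-more case from the first message) might be applied in one
IndexWriter session: select the rows whose UPDATEDATE is newer than the last
indexing run, call updateDocument() (or delete plus add) for each, and close
the writer once at the end. The hard-coded row stands in for the database
query; field names and the index path are the same assumptions as before.

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.store.FSDirectory;

    public class BatchIncrementalUpdate {
        public static void main(String[] args) throws Exception {
            // Placeholder for "SELECT ... WHERE UPDATEDATE > :lastIndexedTime".
            String[][] changedRows = {
                    {"1", "ipod", "35000", "2008-11-10:11:00", "2008-11-10:12:00"}
            };

            IndexWriter writer = new IndexWriter(
                    FSDirectory.getDirectory("/path/to/index"),   // hypothetical location
                    new StandardAnalyzer(), false);

            // One writer session handles the whole batch; each update is a
            // buffered delete-plus-add, and the writer is closed only once.
            for (String[] row : changedRows) {
                Document doc = new Document();
                doc.add(new Field("GOOD_ID", row[0], Field.Store.YES, Field.Index.UN_TOKENIZED));
                doc.add(new Field("NAME", row[1], Field.Store.YES, Field.Index.TOKENIZED));
                doc.add(new Field("PRICE", row[2], Field.Store.YES, Field.Index.UN_TOKENIZED));
                doc.add(new Field("CREATEDATE", row[3], Field.Store.YES, Field.Index.UN_TOKENIZED));
                doc.add(new Field("UPDATEDATE", row[4], Field.Store.YES, Field.Index.UN_TOKENIZED));
                writer.updateDocument(new Term("GOOD_ID", row[0]), doc);
            }
            writer.close();
        }
    }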

And Jason,
I am interested in it (and in Lucene in general),
but I am worried that I do not understand all of Lucene's core logic, and I
am not good at English.

So far I have only studied the Analyzer, in order to build a Korean Analyzer.

I wonder whether I can do something to help nevertheless. :-)

Jang.

--
DEV용식
http://devyongsik.tistory.com

Re: Incremental Indexing.

Jason Rutherglen
Hi Jang,

Yes, and I have not completed it yet... Perhaps when I do, you can use it.

Best regards,
Jason
