How?


How?

Cooper Geng
Hi guys,
     I have a problem that confuses me. I want to index some data from a
table in a database. While I create the index on this table, the search
job keeps running. How can I solve this?
By the way, the table holds around one hundred million rows.

--
Best Regards
Cooper Geng

Re: How?

Erick Erickson
You really have to tell us more about what you're trying to do
to get a meaningful reply.

What do you mean when you say you create the index on a table? Are you
using some sort of embedded SQL to query the table and then
create a Lucene index? How big is the index? What search
are you submitting? What does your search code look like?

Best
Erick

On Jan 14, 2008 10:47 PM, coolgeng coolgeng <[hidden email]> wrote:

> Hi guys,
>     I have a problem that confuses me. I want to index some data from a
> table in a database. While I create the index on this table, the search
> job keeps running. How can I solve this?
> By the way, the table holds around one hundred million rows.
>
> --
> Best Regards
> Cooper Geng
>

Re: How?

Cooper Geng
First, I submit a query like "select * from [tablename]". The table has
around 30 columns and 40,000 rows of data. I use the StandardAnalyzer to
generate the index, and in my experience it takes about 200M of disk space
to store the index.

For example, I search the "Name" field in the index and it works
perfectly. But when I create an index on the 100,000,000 rows, index
creation takes too much time. Is there a good solution for this?
Thanks


On Jan 15, 2008 10:47 PM, Erick Erickson <[hidden email]> wrote:

> You really have to tell us more about what you're trying to do
> to get a meaningful reply.
>
> What do you mean when you say you create the index on a table? Are you
> using some sort of embedded SQL to query the table and then
> create a Lucene index? How big is the index? What search
> are you submitting? What does your search code look like?
>
> Best
> Erick



--
Best Regards
Cooper Geng

RE: How?

spring
> First, I submit a query like "select * from [tablename]". The table has
> around 30 columns and 40,000 rows of data. I use the StandardAnalyzer to
> generate the index.

Why don't you use a database index?


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: How?

Erick Erickson
In reply to this post by Cooper Geng
As I read your latest post, it's not *searching* that's taking too long, but
*indexing*.

Well, 100,000,000 rows is a lot. It'll never be just a few minutes. But I
also have to ask whether most of the time is being spent actually indexing
or fetching from the database. You could time this easily by skipping the
indexing step and measuring, say, the time to fetch the first 1,000,000 rows
and comparing that against fetching *and* indexing.
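
The comparison described above could be sketched roughly as follows. This is only an illustration in plain Java: IndexTimingSketch, timeRows, and the in-memory stand-ins for the table and the index are hypothetical names, not code from this thread. In a real run, the iterator would wrap a JDBC ResultSet and the second sink would add each row to the Lucene index.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical harness: times fetching rows alone versus fetching plus indexing. */
final class IndexTimingSketch {

    /**
     * Pulls up to maxRows rows from the source, hands each one to the sink,
     * and returns the elapsed wall-clock time in nanoseconds.
     */
    static <T> long timeRows(Iterator<T> rows, Consumer<T> sink, int maxRows) {
        long start = System.nanoTime();
        int seen = 0;
        while (seen < maxRows && rows.hasNext()) {
            sink.accept(rows.next());
            seen++;
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // Stand-in for the database table (assumption: replace with a real cursor).
        List<String> fakeTable = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            fakeTable.add("row-" + i);
        }
        // Stand-in for the index (assumption: replace with real indexing calls).
        List<String> fakeIndex = new ArrayList<>();

        // Pass 1: fetch only, discarding each row.
        long fetchOnly = timeRows(fakeTable.iterator(), row -> { }, 100_000);
        // Pass 2: fetch and "index" each row.
        long fetchAndIndex = timeRows(fakeTable.iterator(), fakeIndex::add, 100_000);

        System.out.printf("fetch only: %d ms, fetch + index: %d ms%n",
                fetchOnly / 1_000_000, fetchAndIndex / 1_000_000);
    }
}
```

If the two numbers are close, the database fetch is probably the bottleneck and tuning the indexing side will not help much.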

The whole point of a search engine is to trade some up-front time
indexing for faster searching.

Otherwise, have you looked at this page?

http://wiki.apache.org/lucene-java/ImproveIndexingSpeed

Also, 200M for 40K rows seems like a lot of disk space. Are you
storing all the data in the index? And do you need to?
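
To illustrate why an index does not have to hold every column, here is a toy inverted index in plain Java. It is not Lucene's API; NameOnlyIndexSketch and its methods are made up for illustration. It indexes only the "Name" field and keeps just the primary key of each row, on the assumption that the remaining columns can be fetched from the database by key after a search.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Toy inverted index: maps each token of one field to the keys of rows containing it. */
final class NameOnlyIndexSketch {
    private final Map<String, List<Long>> postings = new HashMap<>();

    /** Indexes only the name; every other column stays in the database. */
    void add(long primaryKey, String name) {
        for (String token : name.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (token.isEmpty()) {
                continue; // split can yield an empty leading token
            }
            List<Long> keys = postings.computeIfAbsent(token, t -> new ArrayList<>());
            // Skip the key if this token already appeared earlier in the same name.
            if (keys.isEmpty() || keys.get(keys.size() - 1) != primaryKey) {
                keys.add(primaryKey);
            }
        }
    }

    /** Returns the primary keys of rows whose name contains the given token. */
    List<Long> search(String token) {
        return postings.getOrDefault(token.toLowerCase(Locale.ROOT), Collections.emptyList());
    }
}
```

The index here grows with the number of distinct tokens and keys, not with the full width of the 30-column rows.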

Best
Erick

On Jan 16, 2008 3:24 AM, coolgeng coolgeng <[hidden email]> wrote:

> First, I submit a query like "select * from [tablename]". The table has
> around 30 columns and 40,000 rows of data. I use the StandardAnalyzer to
> generate the index, and in my experience it takes about 200M of disk space
> to store the index.
>
> For example, I search the "Name" field in the index and it works
> perfectly. But when I create an index on the 100,000,000 rows, index
> creation takes too much time. Is there a good solution for this?
> Thanks

Re: How?

Cooper Geng
In reply to this post by spring
I can use a clustered index on the table, but you can create only one
clustered index per table. In this table, a lot of data needs to be
searched, so I chose Lucene to do that.


On Jan 16, 2008 6:57 PM, <[hidden email]> wrote:

> > First, I submit a query like "select * from [tablename]". The table has
> > around 30 columns and 40,000 rows of data. I use the StandardAnalyzer to
> > generate the index.
>
> Why don't you use a database index?


--
Best Regards
Cooper Geng

RE: How?

spring
> I can use a clustered index on the table, but you can create only one
> clustered index per table. In this table, a lot of data needs to be
> searched, so I chose Lucene to do that.

Why do you need a clustered index in the database?
A non-clustered index would do the job as well.



Re: How?

Cooper Geng
Non-clustered and clustered indexes can resolve the problem, but can't
Lucene do the same thing?

On Jan 16, 2008 11:44 PM, <[hidden email]> wrote:

> > I can use a clustered index on the table, but you can create only one
> > clustered index per table. In this table, a lot of data needs to be
> > searched, so I chose Lucene to do that.
>
> Why do you need a clustered index in the database?
> A non-clustered index would do the job as well.


--
Best Regards
Cooper Geng

RE: How?

spring
> Non-clustered and clustered indexes can resolve the problem, but can't
> Lucene do the same thing?

Well, I bet the database solution is the best, as long as you do not search
in big text fields or need special full-text features like fuzzy search.

Synchronizing a Lucene index with such a big database is pure overkill, as
long as the database does the job.

