how to balance index and search


how to balance index and search

James liu-2
I find that indexing HTML makes Tomcat use 100% CPU, which makes search
slow.

So how can I balance indexing and search?


For the web front end I use Apache + PHP.

For Solr I use Tomcat 6 + Java 1.6.


Any suggestions are welcome.

--
regards
jl

Re: how to balance index and search

James liu-2
Can the people from CNET explain how Solr is used at CNET.com?


2007/3/16, James liu <[hidden email]>:

>
> I find that indexing HTML makes Tomcat use 100% CPU, which makes search
> slow.
>
> So how can I balance indexing and search?
>
>
> For the web front end I use Apache + PHP.
>
> For Solr I use Tomcat 6 + Java 1.6.
>
>
> Any suggestions are welcome.
>
> --
> regards
> jl




--
regards
jl

Re: how to balance index and search

Chris Hostetter-3
In reply to this post by James liu-2

If indexing while searching is causing problems, one way to reduce
the impact is to index on a master instance and then use the replication
scripts to sync it with a slave instance (where all of your searches
happen).
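
As a rough illustration of that split, a client can send every update to the master and every query to the slave. This is only a sketch: the host names are hypothetical, and the `/update` and `/select` paths assume a default single-index Solr deployment under `/solr`.

```python
from urllib.parse import urlencode

# Hypothetical host names; substitute your own master and slave instances.
MASTER_URL = "http://solr-master.example.com:8080/solr"
SLAVE_URL = "http://solr-slave.example.com:8080/solr"


def update_url() -> str:
    """All index updates (add, delete, commit) go to the master."""
    return MASTER_URL + "/update"


def select_url(query: str, rows: int = 10) -> str:
    """All searches go to the slave, which only receives replicated snapshots."""
    return SLAVE_URL + "/select?" + urlencode({"q": query, "rows": rows})


if __name__ == "__main__":
    print(update_url())
    print(select_url("ipod"))
```

With this arrangement the CPU cost of analysis during indexing lands on the master only, so search latency on the slave should not depend on how heavy the indexing load is.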

If you are specifically seeing high CPU when indexing HTML, that's probably
because the HTML analyzers have to do a lot of complex work to strip out
the HTML. Another option might be to parse that HTML on the client side
before sending it to Solr.
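
A minimal sketch of that client-side approach, using only the Python standard library: strip the markup first, then build the XML `<add>` message that gets posted to Solr. The field names `id` and `text` are assumptions and must match whatever your schema defines; the extractor is deliberately simple and is not a full HTML cleaner.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class TextExtractor(HTMLParser):
    """Collects text content, skipping <script> and <style> bodies."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join chunks with a space and collapse whitespace left by removed tags.
        return " ".join(" ".join(self._chunks).split())


def html_to_text(html):
    """Return the visible text of an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def solr_add_xml(doc_id, html):
    """Build a Solr <add> message; "id" and "text" are assumed field names."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    ET.SubElement(doc, "field", name="id").text = doc_id
    ET.SubElement(doc, "field", name="text").text = html_to_text(html)
    return ET.tostring(add, encoding="unicode")
```

The resulting string can be posted to the update handler as before, and the field it fills can then use a plain text analyzer instead of an HTML-stripping one, which moves the parsing cost off the Solr server.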

: I find that indexing HTML makes Tomcat use 100% CPU, which makes search
: slow.
:
: So how can I balance indexing and search?
:
:
: For the web front end I use Apache + PHP.
:
: For Solr I use Tomcat 6 + Java 1.6.



-Hoss


Re: how to balance index and search

Chris Hostetter-3
In reply to this post by James liu-2

: Can the people from CNET explain how Solr is used at CNET.com?

I really don't understand your question. Here are some links to CNET.com
pages that use Solr:

http://www.cnet.com/4244-5_1-0.html?query=ipod
http://search.news.com/search?q=apple
http://reviews.cnet.com/4566-3121-0.html



-Hoss


Re: how to balance index and search

James liu-2
In reply to this post by Chris Hostetter-3
2007/3/17, Chris Hostetter <[hidden email]>:
>
> If indexing while searching is causing problems, one way to reduce
> the impact is to index on a master instance and then use the replication
> scripts to sync it with a slave instance (where all of your searches
> happen).


I think that will be a problem because we use Windows 2003, and I remember
the replication scripts also having problems on FreeBSD.

> If you are specifically seeing high CPU when indexing HTML, that's probably
> because the HTML analyzers have to do a lot of complex work to strip out
> the HTML. Another option might be to parse that HTML on the client side
> before sending it to Solr.


A spider crawls the HTML data into MS SQL Server. I just read the data from
SQL Server and send it to Solr with curl.
Tomorrow I will test this option.


> : I find that indexing HTML makes Tomcat use 100% CPU, which makes search
> : slow.
> :
> : So how can I balance indexing and search?
> :
> :
> : For the web front end I use Apache + PHP.
> :
> : For Solr I use Tomcat 6 + Java 1.6.
>
> -Hoss


--
regards
jl

Re: how to balance index and search

James liu-2
In reply to this post by Chris Hostetter-3
2007/3/17, Chris Hostetter <[hidden email]>:

>
>
> : Can the people from CNET explain how Solr is used at CNET.com?
>
> I really don't understand your question. Here are some links to CNET.com
> pages that use Solr:
>
> http://www.cnet.com/4244-5_1-0.html?query=ipod
> http://search.news.com/search?q=apple
> http://reviews.cnet.com/4566-3121-0.html
>
> -Hoss


I just want to know CNET.com's indexing and search architecture, if it can
be made public.
Many people who use Solr, or want to use it, would like to know and learn
from it.


--
regards
jl

Re: how to balance index and search

Chris Hostetter-3
In reply to this post by James liu-2

: I think that will be a problem because we use Windows 2003, and I remember
: the replication scripts also having problems on FreeBSD.

The scripts that come with Solr don't work on Windows because they rely on
hardlinks to efficiently copy only the things that have changed. But the
principle of indexing on one server, creating "snapshots" (which could be
true copies instead of hardlinks), and then replicating those snapshots out
to slave servers for searching is still a solid one.

The hooks Solr provides for triggering snapshot creation on the master and
snapshot installation on the slave make it possible for you to implement
those in any way that makes sense for your environment.
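
For instance, on a platform without usable hardlinks, the snapshot step could be a plain directory copy. The following is only a sketch under stated assumptions: the function names are hypothetical, the `snapshot.<timestamp>` directory naming is chosen to resemble what the bundled scripts produce, and it assumes the copy runs right after a commit while nothing is writing to the index.

```python
import shutil
import time
from pathlib import Path


def take_snapshot(index_dir, snapshot_root):
    """Copy index_dir to snapshot_root/snapshot.<timestamp> and return the new path."""
    index_dir = Path(index_dir)
    snapshot_root = Path(snapshot_root)
    snapshot_root.mkdir(parents=True, exist_ok=True)

    name = "snapshot." + time.strftime("%Y%m%d%H%M%S")
    final_dir = snapshot_root / name
    temp_dir = snapshot_root / (name + ".tmp")

    # Copy to a temporary name first so a slave never sees a half-written snapshot.
    shutil.copytree(index_dir, temp_dir)
    temp_dir.rename(final_dir)
    return final_dir


def latest_snapshot(snapshot_root):
    """Return the newest completed snapshot, or None if there is none."""
    candidates = sorted(
        p for p in Path(snapshot_root).glob("snapshot.*")
        if p.is_dir() and p.suffix != ".tmp"
    )
    return candidates[-1] if candidates else None
```

A full copy costs more disk space and I/O than hardlinks do, so this suits smaller indexes or infrequent snapshots. The slave side would then fetch the directory returned by `latest_snapshot` with whatever transfer tool is available and swap it in before asking Solr to open a new searcher.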



-Hoss


Re: how to balance index and search

Chris Hostetter-3
In reply to this post by James liu-2

: I just want to know CNET.com's indexing and search architecture, if it can
: be made public.
: Many people who use Solr, or want to use it, would like to know and learn
: from it.

I'm not sure what to tell you: Solr *is* our search architecture. We have a
dozen or so Solr indexes, and all of them use the master/slave model, but
they are configured in various ways based on the nature of the data and the
types of queries we do. The news collection doesn't do faceted search, and
surfacing new stories immediately is crucial, so it has small cache
configs with very low autowarming, and replication cranked up to happen
very frequently. Meanwhile, the product index, where an update latency of 20
minutes isn't the end of the world but we do want to support faceted
searching, does snapinstalls only every 15 minutes (I think), with big
caches that are 100% autowarmed.





-Hoss


Re: how to balance index and search

James liu-2
In reply to this post by Chris Hostetter-3
2007/3/19, Chris Hostetter <[hidden email]>:
>
>
> : I think that will be a problem because we use Windows 2003, and I remember
> : the replication scripts also having problems on FreeBSD.
>
> The scripts that come with Solr don't work on Windows because they rely on
> hardlinks to efficiently copy only the things that have changed. But the
> principle of indexing on one server, creating "snapshots" (which could be
> true copies instead of hardlinks), and then replicating those snapshots out
> to slave servers for searching is still a solid one.


I am now reading about cwRsync, which is rsync for Windows.

> The hooks Solr provides for triggering snapshot creation on the master and
> snapshot installation on the slave make it possible for you to implement
> those in any way that makes sense for your environment.
>
> -Hoss
>


--
regards
jl

Re: how to balance index and search

James liu-2
In reply to this post by Chris Hostetter-3
2007/3/19, Chris Hostetter <[hidden email]>:
>
>
> : I just want to know CNET.com's indexing and search architecture, if it can
> : be made public.
> : Many people who use Solr, or want to use it, would like to know and learn
> : from it.
>
> I'm not sure what to tell you: Solr *is* our search architecture.


The information below is what I wanted to learn. Thanks, Chris.

Maybe this should be added to the wiki. I think people would be happy to
read it.

> We have a dozen or so Solr indexes, and all of them use the master/slave
> model, but they are configured in various ways based on the nature of the
> data and the types of queries we do. The news collection doesn't do faceted
> search, and surfacing new stories immediately is crucial, so it has small
> cache configs with very low autowarming, and replication cranked up to
> happen very frequently. Meanwhile, the product index, where an update
> latency of 20 minutes isn't the end of the world but we do want to support
> faceted searching, does snapinstalls only every 15 minutes (I think), with
> big caches that are 100% autowarmed.
>
> -Hoss


--
regards
jl