Starting an index...

Jack L

I have played with the "example" directory for a while.
Everything seems to work well. Now I'd like to start my own
index and I have a few questions.

1. I suppose I can start by copying the whole example
directory and naming it myindex. I understand that I need
to modify the solr/conf/schema.xml to suit my data. Besides
that, is there anything else that I must/should change?
I'll take a look at stopwords.txt, etc. to see if any
changes are required. How about solr.war? Anything else I
need to customize? (I'm not a heavy Java developer.)

2. For each index, do I need to copy this directory and start
a solr instance? Is it possible to run one solr instance
for multiple indices?

3. solr comes with jetty and it seems to work pretty well.
Is there any reason that I should switch to tomcat for
production servers?

--
Thanks,
Jack

__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around
http://mail.yahoo.com 

Re: Starting an index...

Chris Hostetter-3

: 1. I suppose I can start by copying the whole example
: directory and naming it myindex. I understand that I need
: to modify the solr/conf/schema.xml to suit my data. Besides
: that, is there anything else that I must/should change?
: I'll take a look at stopwords.txt, etc. to see if any
: changes are required. How about solr.war? Anything else I
: need to customize? (I'm not a heavy Java developer.)

the only things you should need to worry about customizing are in the
solr/conf dir ... you should give a critical eye to all of those files
(there are some zany protwords.txt and synonyms.txt files that only make
sense for the example data)

you shouldn't need to customize anything else, except the configuration
for your servlet container to get it to run solr at the URL you want, and
to get it to log things the way you want.

: 2. For each index, do I need to copy this directory and start
: a solr instance? Is it possible to run one solr instance
: for multiple indices?

no, each instance manages a single schema and a single data index -- but
that schema can allow for various different types of documents that don't
need to have anything in common.

: 3. solr comes with jetty and it seems to work pretty well.
: Is there any reason that I should switch to tomcat for
: production servers?

it is entirely personal preference ... the use of Jetty shouldn't be
considered an endorsement, it's just a free, pure Java servlet container
that was the easiest to bundle into a self-contained demo.


-Hoss

Re: Starting an index...

Erik Hatcher
In reply to this post by Jack L

On Feb 21, 2007, at 4:37 PM, Jack L wrote:
> 2. For each index, do I need to copy this directory and start
> a solr instance? Is it possible to run one solr instance
> for multiple indices?

Going a bit further than what Hoss mentioned... you can share a common
configuration among multiple Solr instances, without copying the
directory, by using the system property substitutions recently added to Solr:

        <http://wiki.apache.org/solr/SolrConfigXml#head-d30367ed9b794622d81c4320aad69a7575cd03d3>

These substitutions work in both schema.xml and solrconfig.xml.
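To illustrate how such a substitution resolves, here is a minimal Python sketch of the `${name}` / `${name:default}` syntax, with a plain dict standing in for the Java system properties that would be passed to the servlet container (for example `java -Dsolr.data.dir=... -jar start.jar` with the bundled Jetty). It is a re-implementation for illustration only, not Solr's own code; the `solr.data.dir` property and both paths are example values.

```python
import re

# Matches ${name} or ${name:default}; the name may not contain ':' or '}'.
_PROP = re.compile(r"\$\{([^:}]+)(?::([^}]*))?\}")

def substitute(text, props):
    """Replace ${name} / ${name:default} using props (a dict standing in
    for Java system properties). Raises KeyError if a property has no
    value and no default."""
    def repl(m):
        name, default = m.group(1), m.group(2)
        if name in props:
            return props[name]
        if default is not None:
            return default
        raise KeyError(f"no value for property {name!r}")
    return _PROP.sub(repl, text)

# One shared config line; each instance supplies its own data directory.
config = "<dataDir>${solr.data.dir:./solr/data}</dataDir>"
print(substitute(config, {}))
# -> <dataDir>./solr/data</dataDir>
print(substitute(config, {"solr.data.dir": "/var/solr/myindex/data"}))
# -> <dataDir>/var/solr/myindex/data</dataDir>
```

The point of the default value is that a single copy of the config still works out of the box, while each extra instance only needs a different property on its command line.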

        Erik

ReStarting an index...

Jack L
In reply to this post by Chris Hostetter-3
Thanks Chris and Erik for the replies. Very helpful.

> no, each instance manages a single schema and a single data index -- but
> that schema can allow for various different types of documents that don't
> need to have anything in common.

Does this mean that as long as I have the schema for all doc
types (which essentially means a larger schema file) set up,
then I can just throw any doc types at it, provided
that there is no conflict among the field names? And
the fields are flat among different doc types?

Is there a way to specify the doc types other than having
it as one of the fields so that I can query against to get
a specific type?

--
Best regards,
Jack



Re: ReStarting an index...

Erik Hatcher

On Feb 21, 2007, at 9:29 PM, Jack L wrote:

> Thanks Chris and Erik for the replies. Very helpful.
>
>> no, each instance manages a single schema and a single data index -- but
>> that schema can allow for various different types of documents that don't
>> need to have anything in common.
>
> Does this mean that as long as I have the schema for all doc
> types (which essentially means a larger schema file) set up,
> then I can just throw any doc types to it, provided
> that there is no conflict among the field names?

Wouldn't even matter if there were field name "conflicts".  A field  
by any other name is just a field.  All document types could have a  
"title" field, for example.

> And
> the fields are flat among different doc types?

I don't understand what you mean by flat here.  By definition, a  
document in Solr/Lucene is "flat" in that it has fields, but no  
hierarchy beyond that.

> Is there a way to specify the doc types other than having
> it as one of the fields so that I can query against to get
> a specific type?

No, there isn't another way.  Solr doesn't impose any semantics on  
the _types of documents_ you index... it's up to the client to do  
that.  But adding a simple "type" field to every document facilitates  
some amazing stuff :)
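A short Python sketch of the "type" field idea, assuming a schema that declares `id`, `type`, `title`, a multi-valued `author`, and `runtime_minutes` fields (all hypothetical names). It builds a Solr XML `<add>` update message in which two documents of different types share a `title` field, plus a query string restricted to one type via the `type` field. Sending the message (for example to the example server's update handler at http://localhost:8983/solr/update) is left out.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

def add_message(docs):
    """Build a Solr XML <add> update message from flat documents.

    Each document is a dict of field name -> value (or a list of values
    for a multi-valued field); there is no nesting, matching the flat
    Solr/Lucene document model.
    """
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            values = value if isinstance(value, list) else [value]
            for v in values:
                field = ET.SubElement(doc_el, "field", name=name)
                field.text = str(v)
    return ET.tostring(add, encoding="unicode")

def typed_query(q, doc_type):
    """URL-encode a 'q' parameter restricted to one document type."""
    return urlencode({"q": f"({q}) AND type:{doc_type}"})

# Hypothetical documents of two types; both use the same "title" field.
docs = [
    {"id": "book-1", "type": "book", "title": "Searching with Solr",
     "author": ["A. Writer", "B. Writer"]},
    {"id": "movie-1", "type": "movie", "title": "Solr: The Movie",
     "runtime_minutes": 95},
]

print(add_message(docs))
print(typed_query("title:solr", "book"))
```

Since the type is just another field, it can be combined with any other query clause, which is what makes it so handy.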

        Erik

Re[4]: Starting an index...

Jack L
Hello Erik,

> Wouldn't even matter if there were field name "conflicts".  A field
> by any other name is just a field.  All document types could have a
> "title" field, for example.

That makes sense.

I wonder what happens if I change the schema after some documents
have been inserted? Is this allowed at all? Will the index become
corrupted if I add/remove some fields? Or change the field properties?

Jack


Re: Re[4]: Starting an index...

Walter Underwood, Netflix
On 2/22/07 1:37 PM, "Jack L" <[hidden email]> wrote:

> I wonder what happens if I change the schema after some documents
> have been inserted? Is this allowed at all? Will the index become
> corrupted if I add/remove some fields? Or change the field properties?

The schema just controls the input mapping. After the fields
are indexed, the schema doesn't control them.

wunder

Re: Re[4]: Starting an index...

Erik Hatcher

On Feb 22, 2007, at 4:49 PM, Walter Underwood wrote:

> On 2/22/07 1:37 PM, "Jack L" <[hidden email]> wrote:
>
>> I wonder what happens if I change the schema after some documents
>> have been inserted? Is this allowed at all? Will the index become
>> corrupted if I add/remove some fields? Or change the field properties?
>
> The schema just controls the input mapping. After the fields
> are indexed, the schema doesn't control them.

However I think things would get funky if you remove some fields from  
the schema that are in the index, and then a document with fields not  
in the schema gets returned... no?

I suggest, Jack, that you look into the dynamic field feature. You
can have fields mapped in the schema with a wildcard, such as
*_text. Then your client can add anything_text it likes. No need to
remove anything from the schema.
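In schema.xml a dynamic field is declared with a `<dynamicField>` element, e.g. `<dynamicField name="*_text" type="text" indexed="true" stored="true"/>` (assuming a `text` field type is defined, as in the example schema). The Python sketch below only illustrates how an incoming field name might be resolved against such patterns: a single `*` at the start or end, longer patterns tried first, and schema order breaking ties, which is the precedence the comments in the example schema.xml describe. It is not Solr's implementation, and the pattern list is hypothetical.

```python
def match_dynamic(field_name, patterns):
    """Return the dynamicField pattern that field_name maps to, or None.

    Each pattern has a single '*' at its start ('*_text') or end ('attr_*').
    Longer patterns are tried first; sorted() is stable (even with
    reverse=True), so equal-length patterns keep their schema order.
    """
    for pat in sorted(patterns, key=len, reverse=True):
        if pat.startswith("*") and field_name.endswith(pat[1:]):
            return pat
        if pat.endswith("*") and field_name.startswith(pat[:-1]):
            return pat
    return None

# Hypothetical patterns, in the order they would appear in schema.xml.
schema_patterns = ["*_text", "*_s", "attr_*"]

for name in ["summary_text", "color_s", "attr_size", "attr_notes_text", "title"]:
    print(name, "->", match_dynamic(name, schema_patterns))
```

Note that "attr_notes_text" matches two patterns of equal length, so the one declared first wins; "title" matches nothing and would need an explicit field declaration.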

I guess I'm wary of a mismatch between an index and the schema, and
can't say that I recommend that at this point without seeing where it
may have issues.

        Erik

Re: Re[4]: Starting an index...

Chris Hostetter-3

: I guess I'm wary of a mismatch between an index and the schema, and
: can't say that I recommend that at this point without seeing where it
: may have issues.

modifying a schema without rebuilding the index from scratch is in fact
"deep voodoo" -- some things work okay; some things break horribly;
some things are safe to change on "query slaves" (machines that only
search against an index) once the full index has been replicated over from
the master, which needed the changes and a complete deletion of the index
before indexing the new records ... etc.

enumerating all of the things you can/can't get away with is beyond the
scope of what i'm willing to try and type up right now ... suffice it to say
there are tricks, and once you get more comfortable with Solr you can
experiment with those tricks.



-Hoss