lucene indexing and field configuration or schema

lucene indexing and field configuration or schema

John Wang-9
Hi folks:

    Solr has schemas that define per-field configuration for the entire corpus, whereas Lucene determines that information from each individual document. At that level, the two are inconsistent.

    We found that having the schema information up front gives us flexibility in designing our posting lists and makes the indexing chain logic much simpler.

    Just wanted to toss out this idea. It may have been discussed before, but I was unable to google it.

Thanks

-John

Re: lucene indexing and field configuration or schema

Adrien Grand
Hi John,

On Mon, Jun 10, 2013 at 8:17 PM, John Wang <[hidden email]> wrote:
>     We found having the schema information up front allows us flexibilities
> in designing our posting list, also makes the indexingchain logic much
> simpler.

Can you give examples of the kind of decisions that you are able to
make by having the schema up-front?

--
Adrien


Re: lucene indexing and field configuration or schema

John Wang-9
Hey Adrien:

Sorry about the late reply. I somehow missed your email.

We are customizing the Lucene indexing pipeline. Here are some examples where an external configuration file helped us:

1) default payload size: we define this in our config file to avoid storing a length per posting.
2) docvalue types: we have built updatable docvalue support for fixed-length types, e.g. int, long, etc. We store the type in the configuration file, whereas with Lucene a long would be used, which could be wasteful for us.
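
The two examples above can be sketched in plain Java. This is only an illustration under stated assumptions: it does not use Lucene's API, and every name in it (SchemaSketch, FieldConfig, FixedWidthColumn, the "price" and "timestamp" fields, and the 1-byte length prefix used for comparison) is hypothetical and not taken from the pipeline described in this thread.

```java
import java.nio.ByteBuffer;
import java.util.LinkedHashMap;
import java.util.Map;

public class SchemaSketch {
    /** Fixed-width value types a schema could declare (hypothetical). */
    enum ValueType {
        INT(Integer.BYTES), LONG(Long.BYTES);
        final int bytes;
        ValueType(int bytes) { this.bytes = bytes; }
    }

    /** Per-field settings known before any document is indexed. */
    static final class FieldConfig {
        final ValueType docValueType;
        final int payloadSize; // bytes per posting, fixed by the config
        FieldConfig(ValueType docValueType, int payloadSize) {
            this.docValueType = docValueType;
            this.payloadSize = payloadSize;
        }
    }

    /** Stand-in for an external configuration file (field names are made up). */
    static final Map<String, FieldConfig> SCHEMA = new LinkedHashMap<>();
    static {
        SCHEMA.put("price", new FieldConfig(ValueType.INT, 4));
        SCHEMA.put("timestamp", new FieldConfig(ValueType.LONG, 8));
    }

    /** Payload bytes for n postings when the size comes from the schema. */
    static long fixedPayloadBytes(String field, int postings) {
        return (long) SCHEMA.get(field).payloadSize * postings;
    }

    /** Same payloads if each posting also stored an assumed 1-byte length. */
    static long lengthPrefixedPayloadBytes(String field, int postings) {
        return (long) (SCHEMA.get(field).payloadSize + 1) * postings;
    }

    /** Fixed-width doc value column; values can be overwritten in place. */
    static final class FixedWidthColumn {
        private final ValueType type;
        private final ByteBuffer data;

        FixedWidthColumn(ValueType type, int maxDoc) {
            this.type = type;
            this.data = ByteBuffer.allocate(type.bytes * maxDoc);
        }

        /** Writes the value at the slot for docId; INT rejects values that overflow. */
        void set(int docId, long value) {
            int offset = docId * type.bytes;
            if (type == ValueType.INT) {
                data.putInt(offset, Math.toIntExact(value));
            } else {
                data.putLong(offset, value);
            }
        }

        long get(int docId) {
            int offset = docId * type.bytes;
            return type == ValueType.INT ? data.getInt(offset) : data.getLong(offset);
        }

        int sizeInBytes() { return data.capacity(); }
    }

    public static void main(String[] args) {
        // Compare payload storage with and without a per-posting length.
        System.out.println(fixedPayloadBytes("price", 1000));
        System.out.println(lengthPrefixedPayloadBytes("price", 1000));

        // An int column uses half the space of a long column for the same maxDoc.
        FixedWidthColumn prices = new FixedWidthColumn(SCHEMA.get("price").docValueType, 10);
        prices.set(3, 42);
        prices.set(3, 43); // in-place update, no length change
        System.out.println(prices.get(3) + " stored in " + prices.sizeInBytes() + " bytes");
    }
}
```

The point of the sketch is that both savings depend on knowing the field's configuration before the first document arrives: a fixed payload size removes the per-posting length, and a declared int type lets the column use 4 bytes per document instead of 8 while keeping every slot at a predictable offset for updates.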

Thanks

-John

