dynamicField + copyField = dynamicCopyField?

dynamicField + copyField = dynamicCopyField?

Darren Vengroff-2
Dear Solrians,

I've been looking through the IndexSchema and DocumentBuilder code, hoping
to find something that is essentially a combination of what dynamicField and
copyField do.  Specifically, I'd like to say that any field that matches a
given pattern should be copied to a specific multivalued destination field.
For example, my config might contain something like:

    <!-- All text fields get thrown in here for default indexing. -->
    <field name="text" type="text" indexed="true" stored="false" multiValued="true"/>

    <!-- Search text by default so we get all text fields. -->
    <defaultSearchField>text</defaultSearchField>

    <!--
        Any field ending in _t is assumed to be a text field and is thrown into text,
        despite the fact that we may not have known of its specific existence when
        we created the config file.
    -->
    <dynamicCopyField source="*_t" dest="text" />

The advantage of this approach over pure copyField is that we don't have to
know the full set of text fields up front when we are writing the config
file.  If a field name matches both a dynamicField and a dynamicCopyField,
then both behaviors should probably occur, although config files will not
typically be written this way.

Has anything like this been discussed or implemented before?

Looking forward to your feedback.

Thanks,

-D



Re: dynamicField + copyField = dynamicCopyField?

Yonik Seeley
Darren,

Having copyField functionality available to dynamic fields makes total
sense... it just hasn't been implemented yet.

http://www.mail-archive.com/solr-user@.../msg00019.html

-Yonik


Re: dynamicField + copyField = dynamicCopyField?

Chris Hostetter-3
In reply to this post by Darren Vengroff-2

: copyField do.  Specifically, I'd like to say that any field that matches a
: given pattern should be copied to a specific multivalued destination field.
        ...
: Has anything like this been discussed or implemented before?


It's been discussed, which prompted a FAQ about it, but no one has had the
need/drive to implement anything beyond what's currently supported...

http://wiki.apache.org/solr/FAQ#head-6b1d9dc2c14adecfe6fc5ce86448f15fc84baab9
http://www.nabble.com/Copy-Field-and-Dynamic-Fields-t1228725.html#a3252483

It would be a fairly cool feature to have though if you're interested in
implementing it.  One thing I wasn't clear on in your email though...

: file.  If a field name matches both a dynamicField and a dynamicCopyField,
: then both behaviors should probably occur, although config files will not
: typically be written this way.

...why not?  Isn't that the most common case? (declaring a dynamicField
with the pattern "*_t" and having a dynamicCopyField that copies
anything matching "*_t" to a generic "text" field).
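
For illustration, that common case might look like the fragment below.  Note
that dynamicCopyField is only the element proposed in this thread, not an
existing Solr element; the "text" type comes from Darren's example, and the
stored="true" setting on the dynamicField is an assumption.

```xml
<!-- Existing feature: any field whose name ends in _t is indexed as text. -->
<dynamicField name="*_t" type="text" indexed="true" stored="true"/>

<!-- Catch-all destination field, as in Darren's example. -->
<field name="text" type="text" indexed="true" stored="false" multiValued="true"/>

<!-- HYPOTHETICAL: element proposed in this thread, not implemented in Solr. -->
<dynamicCopyField source="*_t" dest="text"/>
```

With both declarations in place, a document field such as title_t (a made-up
name) would be indexed under its own name via the dynamicField rule and would
also be copied into "text".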



-Hoss


RE: dynamicField + copyField = dynamicCopyField?

Darren Vengroff-2
OK, I'll take a crack at it.

I thought about the comment of mine that puzzled you.  I think you are right:
it would be typical.  I don't know why I thought otherwise.

-D



Re: dynamicField + copyField = dynamicCopyField?

Yonik Seeley
In reply to this post by Darren Vengroff-2
Darren, I'm not sure how familiar you are with Lucene, but if you are
using dynamicFields (or a lot of indexed fields), check out the
omitNorms attribute.

Even if your indexed field values are sparse (present in only a few
documents), Lucene keeps a 1-byte norm for each document in the index
for each indexed field by default.  That can really add up depending
on what you are doing.

I've considered making it the default for non-text fields in the
schema, but I'm worried that some people might try index-time boosts,
and they won't work w/o norms.
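
As a rough sketch, omitting norms on a dynamic field might look like the
fragment below.  The "*_s" pattern and the "string" type are assumptions for
illustration, and this assumes the attribute can be set on the field
declaration as well as on the field type.  With made-up numbers: 10 million
documents and 100 indexed fields at 1 byte each comes to roughly 1 GB of norms.

```xml
<!-- Assumed example: string-valued dynamic fields with norms turned off. -->
<!-- Index-time boosts on these fields would no longer take effect. -->
<dynamicField name="*_s" type="string" indexed="true" stored="true" omitNorms="true"/>
```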

-Yonik


--
-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server

RE: dynamicField + copyField = dynamicCopyField?

Darren Vengroff-2
Sounds good, I will check it out.

-D
