[jira] Created: (SOLR-21) Dynamic copying of fields (allow wildcard sources in copyField)


[jira] Created: (SOLR-21) Dynamic copying of fields (allow wildcard sources in copyField)

Jorge Spinsanti (Jira)
Dynamic copying of fields (allow wildcard sources in copyField)
---------------------------------------------------------------

         Key: SOLR-21
         URL: http://issues.apache.org/jira/browse/SOLR-21
     Project: Solr
        Type: New Feature

  Components: update  
 Environment: all
    Reporter: Darren Erik Vengroff
 Attachments: dynamicCopy.patch

It would be really nice if it were possible to use wildcards to do things like:

    <copyField source="*_t" dest="text"/>

The above example copies all fields ending in "_t" to the "text" field.

I've put together a patch to do this. If there are multiple matches, all copies are done. If there is a match in a dynamicField, then the dynamic field is also generated, subject to the existing rules that short expressions go first. I tried to stick to the spirit of the code as I saw it, and made what I thought were a minimal reasonable set of changes. The patch includes some additional tests in ConvertedLegacyTest.java to test the new functionality. That may not be the best place for new tests, but it beats no tests.
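
For readers following along, here is a rough sketch of the matching behaviour described above: every copyField rule whose source matches the incoming field name contributes its destination. This is not the code from dynamicCopy.patch; the class name, the `CopyRule` type and the rule that "*" is honoured only as a leading or trailing wildcard are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: not the code from dynamicCopy.patch. */
class WildcardCopySketch {

    /** One copyField rule (hypothetical type): a source pattern and a concrete destination. */
    static final class CopyRule {
        final String source;
        final String dest;

        CopyRule(String source, String dest) {
            this.source = source;
            this.dest = dest;
        }
    }

    /** Assumption: "*" is honoured only as the first or last character of the pattern. */
    static boolean matches(String pattern, String name) {
        if (pattern.startsWith("*")) {
            return name.endsWith(pattern.substring(1));
        }
        if (pattern.endsWith("*")) {
            return name.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        return pattern.equals(name);
    }

    /** Collects the destination of every matching rule, in declaration order. */
    static List<String> destinationsFor(String fieldName, List<CopyRule> rules) {
        List<String> dests = new ArrayList<>();
        for (CopyRule rule : rules) {
            if (matches(rule.source, fieldName)) {
                dests.add(rule.dest);
            }
        }
        return dests;
    }
}
```

With the rules `*_t -> text` and `title_t -> title_sort` (the second rule is made up), a field named `title_t` would be copied to both destinations, while `price_i` would be copied nowhere.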

I'd really like to get this, or some improved variant of it, into the codebase, as it's quite important to my application. Please review and comment/criticize as you see fit.

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Updated: (SOLR-21) Dynamic copying of fields (allow wildcard sources in copyField)

Jorge Spinsanti (Jira)
     [ http://issues.apache.org/jira/browse/SOLR-21?page=all ]

Darren Erik Vengroff updated SOLR-21:
-------------------------------------

    Attachment: dynamicCopy.patch

Here is the patch.


[jira] Commented: (SOLR-21) Dynamic copying of fields (allow wildcard sources in copyField)

Jorge Spinsanti (Jira)
    [ http://issues.apache.org/jira/browse/SOLR-21?page=comments#action_12414922 ]

Darren Erik Vengroff commented on SOLR-21:
------------------------------------------

Note that this has previously been discussed elsewhere:

http://wiki.apache.org/solr/FAQ#head-6b1d9dc2c14adecfe6fc5ce86448f15fc84baab9
http://www.nabble.com/Copy-Field-and-Dynamic-Fields-t1228725.html#a3252483
http://www.mail-archive.com/solr-user@.../msg00019.html



Re: [jira] Created: (SOLR-21) Dynamic copying of fields (allow wildcard sources in copyField)

Yonik Seeley
Thanks Darren!
I'm looking it over now.

-Yonik


Re: [jira] Created: (SOLR-21) Dynamic copying of fields (allow wildcard sources in copyField)

Yonik Seeley
Hi Darren,
I'm a bit confused about the meaning of hasExplicitField...

If I have a <copyField source="*_a" dest="*_b"/>
The dynamic fields *_a and *_b must both be defined, right?
In that case, it seems like "if it matches a field or dynamicField
declaration" would always be true, no?

+  /**
+   * Does the schema have the specified field defined explicitly, i.e.
+   * not as a result of a copyField declaration with a wildcard?  We
+   * consider it explicitly defined if it matches a field or dynamicField
+   * declaration.
+   * @param fieldName
+   * @return true if explicitly declared in the schema.
+   */
+  public boolean hasExplicitField(String fieldName) {
+    if(fields.containsKey(fieldName)) {
+      return true;
+    }
+
+    for (DynamicField df : dynamicFields) {
+      if (df.matches(fieldName)) return true;
+    }
+
+    return false;
+  }

-Yonik



RE: [jira] Created: (SOLR-21) Dynamic copying of fields (allow wildcard sources in copyField)

Darren Vengroff-2
Hi Yonik,

The purpose of hasExplicitField() is to enable

    DocumentBuilder.addField(String name, String val, float boost)

to determine whether or not it should add a single field for the name that
was given.  This should only happen if

    a) The name was specified as a field in the schema.
or
    b) The name matches a dynamic field.

If neither of these is the case, then we still might copy the value to one
or more other fields due to a wildcard match in a copyField.

The syntax you described below, where the destination contains a wildcard,
is not supported by this implementation.  The destination must be an
explicit field, meeting the conditions above.
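
A small sketch of the decision described above, as I read it: the value is stored under its own name only when `hasExplicitField` is true, and independently of that it is copied to the destination of every matching copyField. The class, the constructor and `targetsFor` are hypothetical stand-ins, not the real `DocumentBuilder` or `IndexSchema` code, and the wildcard handling (leading or trailing "*" only) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Illustrative stand-in for the decision described above; not the patch itself. */
class AddFieldSketch {
    private final Set<String> fields;          // names declared with <field>
    private final List<String> dynamicFields;  // patterns declared with <dynamicField>, e.g. "*_t"
    private final List<String[]> copyFields;   // each entry is {source pattern, destination name}

    AddFieldSketch(Set<String> fields, List<String> dynamicFields, List<String[]> copyFields) {
        this.fields = fields;
        this.dynamicFields = dynamicFields;
        this.copyFields = copyFields;
    }

    /** Assumption: "*" is honoured only as the first or last character of the pattern. */
    static boolean matches(String pattern, String name) {
        if (pattern.startsWith("*")) return name.endsWith(pattern.substring(1));
        if (pattern.endsWith("*")) return name.startsWith(pattern.substring(0, pattern.length() - 1));
        return pattern.equals(name);
    }

    /** Same test as the quoted hasExplicitField: a declared field or a dynamicField match. */
    boolean hasExplicitField(String name) {
        if (fields.contains(name)) return true;
        for (String pattern : dynamicFields) {
            if (matches(pattern, name)) return true;
        }
        return false;
    }

    /** Returns the names of the index fields that would receive the value. */
    List<String> targetsFor(String name) {
        List<String> targets = new ArrayList<>();
        if (hasExplicitField(name)) {
            targets.add(name);          // cases a) and b): store under its own name
        }
        for (String[] rule : copyFields) {
            if (matches(rule[0], name)) {
                targets.add(rule[1]);   // wildcard or exact copyField match
            }
        }
        return targets;
    }
}
```

In this sketch a name that matches neither a field nor a dynamicField can still end up in a destination field, which is the "make copyFields dynamic" behaviour under discussion.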

Does that clarify what I was trying to do?

-D


Re: [jira] Created: (SOLR-21) Dynamic copying of fields (allow wildcard sources in copyField)

Yonik Seeley
Ahh, so your copyField allows a source field that isn't even defined
in the schema... that actually goes a bit beyond extending copyField
to dynamic fields.

My mental model had been to "extend copyFields to encompass
dynamicFields", and I think yours was "make copyFields dynamic".

My original thought was that a copyField source would exactly match a
dynamicField, and that the copyFields map<String,SchemaField[]> could
be extended to handle dynamic fields too.  The key for a normal field
might be "foo", and the key for a dynamic field could be "*_i" for
example.

The upside to your approach is more flexibility... the copyField
source need not have any schema field definition, or could actually
cover multiple at once.  One of the downsides is that there isn't a
fast-path of a single hash lookup to retrieve the fields and a single
hash lookup to retrieve the dynamicFields.  Maybe it's nothing to
worry about compared to the rest of the indexing process though.
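
To make the contrast concrete, here is a rough sketch of the keyed-map idea: one map whose keys are either a plain field name ("foo") or the literal text of a dynamicField pattern ("*_i"), so that finding destinations is at most two hash lookups and no pattern scan. The class and method names are hypothetical, a list of destination names stands in for `SchemaField[]`, and the sketch assumes the caller already knows which dynamicField pattern (if any) the name matched.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: copyField sources must equal a field name or a dynamicField pattern. */
class KeyedCopyFieldSketch {
    // Key is a field name such as "foo" or a dynamicField pattern such as "*_i".
    private final Map<String, List<String>> copyFields = new HashMap<>();

    void addCopyField(String sourceKey, String dest) {
        List<String> dests = copyFields.get(sourceKey);
        if (dests == null) {
            dests = new ArrayList<>();
            copyFields.put(sourceKey, dests);
        }
        dests.add(dest);
    }

    /**
     * matchedDynamicPattern is the pattern of the dynamicField that the name
     * resolved to, or null if the name is a plain field (an assumption of this sketch).
     */
    List<String> destinationsFor(String fieldName, String matchedDynamicPattern) {
        List<String> dests = copyFields.get(fieldName);          // first hash lookup
        if (dests == null && matchedDynamicPattern != null) {
            dests = copyFields.get(matchedDynamicPattern);        // second hash lookup
        }
        if (dests == null) {
            return Collections.emptyList();
        }
        return dests;
    }
}
```

The trade-off is the one described above: this version cannot express a copyField source that is not itself a declared field or dynamicField.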

-Yonik

--
-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server

RE: [jira] Created: (SOLR-21) Dynamic copying of fields (allow wildcard sources in copyField)

Darren Vengroff-2
I appreciate your performance point.  One of the things I considered, if
performance becomes an issue, is putting a cache in that would map from a
field name to the full set of fields it should copy to.  We would only go
through the longer dynamic matching exercise once, and then put the results
in the cache.  But I wanted to hold off on that until such time as there is
evidence that the dynamic approach is too costly.
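
The cache could look roughly like the sketch below: the first request for a field name scans all copyField rules, and later requests for the same name are a single map lookup. Nothing like this is in the patch; the class name, the `scans` counter (there only to make the behaviour observable) and the wildcard rule are assumptions, and the sketch is neither thread-safe nor bounded in size.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: memoizes the wildcard scan per field name. */
class CopyDestCacheSketch {
    private final List<String[]> copyFields;                      // {source pattern, destination}
    private final Map<String, List<String>> cache = new HashMap<>();
    int scans = 0;                                                // number of slow-path scans so far

    CopyDestCacheSketch(List<String[]> copyFields) {
        this.copyFields = copyFields;
    }

    /** Assumption: "*" is honoured only as the first or last character of the pattern. */
    static boolean matches(String pattern, String name) {
        if (pattern.startsWith("*")) return name.endsWith(pattern.substring(1));
        if (pattern.endsWith("*")) return name.startsWith(pattern.substring(0, pattern.length() - 1));
        return pattern.equals(name);
    }

    List<String> destinationsFor(String fieldName) {
        List<String> cached = cache.get(fieldName);
        if (cached != null) {
            return cached;                                        // fast path: one hash lookup
        }
        scans++;
        List<String> dests = new ArrayList<>();
        for (String[] rule : copyFields) {                        // slow path: scan every rule once
            if (matches(rule[0], fieldName)) {
                dests.add(rule[1]);
            }
        }
        List<String> result = Collections.unmodifiableList(dests);
        cache.put(fieldName, result);                             // empty results are cached too
        return result;
    }
}
```

Because field names come from incoming documents, a real version would probably need a size limit and some thought about concurrent indexing threads.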

Has Solr been profiled to see where the bottlenecks really are?

-D


Re: [jira] Created: (SOLR-21) Dynamic copying of fields (allow wildcard sources in copyField)

Yonik Seeley
On 6/6/06, Darren Vengroff <[hidden email]> wrote:
> I appreciate your performance point.  One of the things I considered, if
> performance becomes an issue, is putting a cache in that would map from a
> field name to the full set of fields it should copy to.  We would only go
> through the longer dynamic matching exercise once, and then put the results
> in the cache.  But I wanted to hold off on until such time as there is
> evidence that the dynamic approach is too costly.

Very understandable.  I'm a self-admitted premature optimizer, because
I often find it a very fun part of development :-)

> Has Solr been profiled to see where the bottlenecks really are?

Only for the query side, where the bottlenecks understandably turned
out to be the Lucene queries themselves.

For the indexing side, there are one or two lucene patches I want to
apply that should have a bigger impact on indexing speed.

Anyway, I was comfortable enough that any possible bottlenecks could
be eliminated in the future that I was already working on committing
your patch.  Unfortunately incompatible changes were made to
IndexSchema in the meantime, so I'm applying it by hand.

-Yonik

Re: [jira] Created: (SOLR-21) Dynamic copying of fields (allow wildcard sources in copyField)

Chris Hostetter-3

: your patch.  Unfortunately incompatible changes were made to
: IndexSchema in the meantime, so I'm applying it by hand.

the documentation changes i committed last night should have been mostly
additive, so the easiest way to rectify the differences may be to revert
my doc additions, apply the patch from this bug, and then manually apply
the doc additions.

(i haven't had a chance to look at this patch, i'm just assuming it
involves at least as many modifications as it does additions)

-Hoss


[jira] Resolved: (SOLR-21) Dynamic copying of fields (allow wildcard sources in copyField)

Jorge Spinsanti (Jira)
     [ http://issues.apache.org/jira/browse/SOLR-21?page=all ]
     
Yonik Seeley resolved SOLR-21:
------------------------------

    Resolution: Fixed
     Assign To: Yonik Seeley

I just committed this.  Thanks Darren!

I did make a tweak or two...
- changed hasExplicitField to getFieldOrNull (a getField that returns null instead of throwing an exception) since it seems like a more general method that can avoid a hash lookup.
- throw an exception if a field being added doesn't match anything... that was the previous behavior.

do those make sense?
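
A rough sketch of how those two tweaks fit together, under stated assumptions: `getFieldOrNull` hands back the resolved field (here just a type name) or null, so the caller does not need a separate existence check followed by a second lookup, and a name that matches no field, dynamicField or copyField source is rejected. The class, the string "type" values and the use of `IllegalArgumentException` are placeholders of mine, not the committed Solr code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Illustrative only: not the committed IndexSchema or DocumentBuilder code. */
class GetFieldOrNullSketch {
    private final Map<String, String> fields;        // field name -> type name
    private final List<String[]> dynamicFields;      // {pattern, type name}
    private final List<String[]> copyFields;         // {source pattern, destination name}

    GetFieldOrNullSketch(Map<String, String> fields,
                         List<String[]> dynamicFields,
                         List<String[]> copyFields) {
        this.fields = fields;
        this.dynamicFields = dynamicFields;
        this.copyFields = copyFields;
    }

    /** Assumption: "*" is honoured only as the first or last character of the pattern. */
    static boolean matches(String pattern, String name) {
        if (pattern.startsWith("*")) return name.endsWith(pattern.substring(1));
        if (pattern.endsWith("*")) return name.startsWith(pattern.substring(0, pattern.length() - 1));
        return pattern.equals(name);
    }

    /** Like a getField that returns null instead of throwing when nothing matches. */
    String getFieldOrNull(String name) {
        String type = fields.get(name);
        if (type != null) return type;
        for (String[] df : dynamicFields) {
            if (matches(df[0], name)) return df[1];
        }
        return null;
    }

    /** Names that would receive the value; throws if the input name matches nothing at all. */
    List<String> targetsFor(String name) {
        List<String> targets = new ArrayList<>();
        if (getFieldOrNull(name) != null) {
            targets.add(name);
        }
        for (String[] rule : copyFields) {
            if (matches(rule[0], name)) targets.add(rule[1]);
        }
        if (targets.isEmpty()) {
            // Placeholder exception type; the real code would use Solr's own.
            throw new IllegalArgumentException("undefined field " + name);
        }
        return targets;
    }
}
```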


RE: [jira] Created: (SOLR-21) Dynamic copying of fields (allow wildcard sources in copyField)

Chris Hostetter-3

: If neither of these is the case, then we still might copy the value to one
: or more other fields due to a wildcard match in a copyField.

Darren, glancing at your patch, I see you added a simple test for copying
from *_t to text -- but *_t is an existing dynamicField .. could you
perhaps add some more test cases covering these new situations you've
added support for...

   <copyField source="*_notADeclaredDynamicField" dest="text"/>
   <copyField source="notADeclaredField"          dest="text"/>


Also; the paranoid part of my brain wonders if the <copyField> directive
should be left as it is, and a new <dynamicCopyField> directive should be
added to support wildcards in the source ... i worry that some weird sick
individual out there might actually want a field name that contains a "*"
in it, and we've now eliminated the ability for them to use that field
with a copyField

(this was the same motivation for the <field> <dynamicField> distinction)

: The syntax you described below, where the destination contains a wildcard,
: is not supported by this implementation.  The destination must be a an
: explicit field, meeting the conditions above.

: If I have a <copyField source="*_a" dest="*_b"/>

I'm actually in favor of this approach .. Yonik's idea once upon a time
of supporting <copyField source="*_a" dest="*_b"/> always scared me, not
only does it raise the question:

  is that legal as schema creation even if i don't have a <dynamicField
  name="*_b" ... /> but i do have a lot of fields whose names end with _b?

but it opens the door for <copyField source="*_a" dest="b_*"/> ... how
should that be treated if in addition to a dynamicField with the prefix
b_* i also have an explicitly declared field named "b_foo" .. normal matching rules
say that if i add a doc with a field named "b_foo" it's subject to the
FieldType of the explicitly declared field, but is that the "right" thing
to do when we're in a copyField relationship where the source was dynamic?

...like i said, i'm paranoid.



-Hoss


Re: [jira] Created: (SOLR-21) Dynamic copying of fields (allow wildcard sources in copyField)

Yonik Seeley
On 6/6/06, Chris Hostetter <[hidden email]> wrote:
> Also; the paranoid part of my brain wonders if the <copyField> directive
> should be left as it is, and a new <dynamicCopyField> directive should be
> added to support wildcards in the source
> ... i worry that some weird sick
> individual out there might actually want a field name that contains a "*"
> in it, and we've now eliminated the ability for them to use that field
> with a copyField
>
> (this was the same motivation for the <field> <dynamicField> distinction)

I'm fine with having "*" in copyField, and it seems pretty clear what
it does (most people should be familiar with "*" as a wildcard).

For the record, I actually preferred <field name="*_i"> to
dynamicField, but I didn't feel strongly enough to argue much.  There
can be errors made both ways... someone may put a "*" in a field
definition, expecting it to act like a dynamicField.
I also don't think it's that important to support any character in
field names... so if you can't have "*", I don't see it as a big deal.

> I'm actually in favor of this approach .. Yonik's idea once upon a time
> of supporting <copyField source="*_a" dest="*_b"/>

I still think that might be a useful case, but its implementation
can wait until someone actually needs it :-)

-Yonik

Re: [jira] Resolved: (SOLR-21) Dynamic copying of fields (allow wildcard sources in copyField)

Chris Hostetter-3

: I just committed this.  Thanks Darren!

Congrats on your first (of what I hope will be many) committed Solr
contribution Darren!

Would you like the honor of updating the FAQ on copyField...

http://wiki.apache.org/solr/FAQ#head-6b1d9dc2c14adecfe6fc5ce86448f15fc84baab9


-Hoss


RE: [jira] Resolved: (SOLR-21) Dynamic copying of fields (allow wildcard sources in copyField)

Darren Vengroff-2
Just updated the FAQ.  Thanks to you guys for building such a great product
for me to add a few little features to.

-D
