[jira] Created: (SOLR-116) StructureRequestHandler - allowing client to discover all fields in the index


[jira] Created: (SOLR-116) StructureRequestHandler - allowing client to discover all fields in the index

JIRA jira@apache.org
StructureRequestHandler - allowing client to discover all fields in the index
-----------------------------------------------------------------------------

                 Key: SOLR-116
                 URL: https://issues.apache.org/jira/browse/SOLR-116
             Project: Solr
          Issue Type: New Feature
          Components: search
            Reporter: Erik Hatcher
         Assigned To: Erik Hatcher
            Priority: Minor


This request handler returns all fields and their types.  In Ruby format (&wt=ruby), the results for the example index currently look like this:

{'responseHeader'=>{'status'=>0,'QTime'=>1},'fields'=>{'cat'=>'text_ws','includes'=>'text','id'=>'string','text'=>'text','price'=>'sfloat','features'=>'text','manu_exact'=>'string','manu'=>'text','name'=>'text','sku'=>'textTight','inStock'=>'boolean','popularity'=>'sint','weight'=>'sfloat'}}

A client wanting to introspect Solr could combine the actual fields and their types with parsing of schema.xml to glean a lot and dynamically configure itself based on what is inside an index.  Should more information per field be returned, or is simply the type name sufficient?  What else is desirable for this request handler?
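
As a rough illustration of the introspection described above, the Ruby-format response can be evaluated into a hash and the fields grouped by type. This is only a sketch: the response text is pasted from this message rather than fetched from a server, `fields_by_type` is a name made up for the example, and `eval` should only be used on output from a Solr instance you trust.

```ruby
# Response text copied verbatim from the message above. A real client would
# fetch it over HTTP from the handler with wt=ruby.
raw = "{'responseHeader'=>{'status'=>0,'QTime'=>1},'fields'=>{'cat'=>'text_ws','includes'=>'text','id'=>'string','text'=>'text','price'=>'sfloat','features'=>'text','manu_exact'=>'string','manu'=>'text','name'=>'text','sku'=>'textTight','inStock'=>'boolean','popularity'=>'sint','weight'=>'sfloat'}}"

# The Ruby response format is a Ruby hash literal, so eval turns it into a Hash.
# Only do this with output from a server you trust.
response = eval(raw)

# Group field names by their type name (fields_by_type is a hypothetical name).
fields_by_type = Hash.new { |hash, type| hash[type] = [] }
response['fields'].each { |name, type| fields_by_type[type] << name }

fields_by_type.each { |type, names| puts "#{type}: #{names.join(', ')}" }
```

A client could use such a grouping to decide, for example, which fields to offer for numeric sorting and which for full-text search.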

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] Updated: (SOLR-116) StructureRequestHandler - allowing client to discover all fields in the index

JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/SOLR-116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erik Hatcher updated SOLR-116:
------------------------------

    Attachment: structure_handler.patch


[jira] Commented: (SOLR-116) StructureRequestHandler - allowing client to discover all fields in the index

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/SOLR-116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12466226 ]

Erik Hatcher commented on SOLR-116:
-----------------------------------

The initial example was from an older example index.  From trunk, the response is this:

{'responseHeader'=>{'status'=>0,'QTime'=>2},'fields'=>{'includes'=>'text','cat'=>'text_ws','alphaNameSort'=>'alphaOnlySort','id'=>'string','text'=>'text','manu_exact'=>'string','features'=>'text','price'=>'sfloat','incubationdate_dt'=>'date','timestamp'=>'date','sku'=>'textTight','name'=>'text','nameSort'=>'string','manu'=>'text','weight'=>'sfloat','inStock'=>'boolean','popularity'=>'sint'}}

incubationdate_dt is a dynamic field, and thus could not be gleaned from simply reading schema.xml.
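
To make the point about dynamic fields concrete, here is a small sketch that compares the fields reported by the handler with a list of explicitly declared fields. The `declared` list and the `*_dt` pattern are assumptions for illustration only; they are not read from a real schema.xml.

```ruby
# Field-to-type map as reported by the handler on trunk (copied from above).
reported = {
  'includes' => 'text', 'cat' => 'text_ws', 'alphaNameSort' => 'alphaOnlySort',
  'id' => 'string', 'text' => 'text', 'manu_exact' => 'string',
  'features' => 'text', 'price' => 'sfloat', 'incubationdate_dt' => 'date',
  'timestamp' => 'date', 'sku' => 'textTight', 'name' => 'text',
  'nameSort' => 'string', 'manu' => 'text', 'weight' => 'sfloat',
  'inStock' => 'boolean', 'popularity' => 'sint'
}

# ASSUMPTION: field names a client might have read from the <field> elements
# of schema.xml. This list is illustrative, not taken from a real schema.
declared = %w[includes cat alphaNameSort id text manu_exact features price
              timestamp sku name nameSort manu weight inStock popularity]

# ASSUMPTION: the schema defines a dynamic field pattern such as "*_dt".
dynamic_patterns = ['*_dt']

# Fields present in the index but not declared explicitly.
undeclared = reported.keys - declared

# Pair each undeclared field with the first glob-style pattern it matches.
matches = undeclared.map { |name|
  [name, dynamic_patterns.find { |pattern| File.fnmatch(pattern, name) }]
}.to_h

matches.each { |name, pattern| puts "#{name} (#{reported[name]}) matches #{pattern}" }
```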


[jira] Commented: (SOLR-116) StructureRequestHandler - allowing client to discover all fields in the index

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/SOLR-116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12466247 ]

Yonik Seeley commented on SOLR-116:
-----------------------------------

Looks good, I like the fieldnames as the keys.  The only change I might make is to make it extensible by returning a map as the value.

Instead of:
  'id'=>'string'
It could be
  'id'=>{'type'=>'string'}

And then other info could optionally go in there:
  'id'=>{'type'=>'string', 'multiValued'=>'false', 'indexed'=>'true', 'stored'=>'true', 'defaultValue'=>'...'}

Hmmm, and what are the aesthetics of the XML?

<lst name="fields">
  <lst name="id">  <str name="type">string</str>  </lst>
  <lst name="text">...

Not bad...
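
For what it's worth, a client could tolerate both shapes while the format is in flux. The sketch below is an assumption about how a client might do that; `field_type` is a hypothetical helper and the attribute values are the ones from the example above.

```ruby
# Hypothetical helper: accepts either the flat form ('id'=>'string') or the
# proposed map form ('id'=>{'type'=>'string', ...}) and returns the type name.
def field_type(value)
  value.is_a?(Hash) ? value['type'] : value
end

flat   = { 'id' => 'string' }
nested = { 'id' => { 'type' => 'string', 'multiValued' => 'false',
                     'indexed' => 'true', 'stored' => 'true' } }

puts field_type(flat['id'])
puts field_type(nested['id'])

# Optional attributes can be read with a fallback when the server omits them.
puts nested['id'].fetch('defaultValue', '(none reported)')
```

The map form costs one extra lookup on the client but lets new attributes appear without breaking existing readers.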
 


[jira] Commented: (SOLR-116) StructureRequestHandler - allowing client to discover all fields in the index

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/SOLR-116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12466248 ]

Yonik Seeley commented on SOLR-116:
-----------------------------------

If you want to commit early and still mess around with the parameters and response formats,
one could add a 'NOTICE'=>'This interface is experimental and will be changing'
to the response.

As this handler returns info about the index, is this where listing of terms and docfreqs should also go?


[jira] Commented: (SOLR-116) StructureRequestHandler - allowing client to discover all fields in the index

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/SOLR-116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12466292 ]

Erik Hatcher commented on SOLR-116:
-----------------------------------

I had thought of using a Map as the value keyed by field name as well.

Terms and document frequencies make more sense from a facet handler, it seems, which you can already do with &qt=standard&facet=true&facet.field=fieldname&q=[* TO *] I believe.
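
A sketch of building that facet request from Ruby follows. The host, port, and path are placeholders for a local Solr instance, and 'cat' stands in for "fieldname"; only the parameter names come from the message above.

```ruby
require 'uri'

# ASSUMPTIONS: the base URL below is a placeholder for a local Solr
# instance; 'cat' stands in for "fieldname" in the message above.
params = {
  'qt'          => 'standard',
  'facet'       => 'true',
  'facet.field' => 'cat',
  'q'           => '[* TO *]'
}

# encode_www_form takes care of escaping the brackets and spaces in q.
query = URI.encode_www_form(params)
url   = "http://localhost:8983/solr/select?#{query}"
puts url
```

Building the query string from a hash avoids hand-escaping the range query, which is easy to get wrong.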

I'll add the Map level in there, and the notice, and commit soon so we can tinker with it in Flare as a way to provide a dynamic UI based on the fields in the index.


[jira] Commented: (SOLR-116) StructureRequestHandler - allowing client to discover all fields in the index

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/SOLR-116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12466293 ]

Yonik Seeley commented on SOLR-116:
-----------------------------------

Facets are slightly different from docFreqs: faceting is expensive, while docFreq is very cheap since it's pre-calculated by Lucene.
The disadvantage of the Lucene version is that the docFreq doesn't take deleted docs into account.

If you want to page through or download *all* terms of a full-text field, the faceting code would take forever in comparison.

Other ideas for info:

"index" : {
  "numDocs" : 10123,
  "maxDoc" : 12345,
  "age" : 2000,  #number of milliseconds the index has been open... sort of equivalent to index freshness, but not really.
  "version":123425235,  #index version.  Actually, I think this should be in responseHeader to aid in client-side caching
}

I think this stuff is useful; it's just a matter of preference whether it goes in the same handler or not.
If this *does* go in this handler, then perhaps it should be named "indexinfo" or something.  I'd be fine with this handler being only about schema too though.
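
As a small worked example of why numDocs and maxDoc are both useful: their difference is the number of deleted documents still occupying slots in the index, which also hints at how far docFreq values may be off. The numbers are the illustrative ones from the "index" map above.

```ruby
# Values copied from the example "index" map above.
index = { 'numDocs' => 10123, 'maxDoc' => 12345 }

# maxDoc counts every document slot, including deleted documents that have
# not yet been merged away; numDocs counts only live documents.
deleted       = index['maxDoc'] - index['numDocs']
deleted_ratio = deleted.to_f / index['maxDoc']

puts "deleted docs: #{deleted} (#{(deleted_ratio * 100).round(1)}% of maxDoc)"
```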


Re: [jira] Commented: (SOLR-116) StructureRequestHandler - allowing client to discover all fields in the index

Chris Hostetter-3

: Facets are slightly different than docfreq's... one is expensive, and
: one is very cheap since it's pre-calculated by lucene.

: If you want to page through or download *all* terms of a full-text
: field, the faceting code would take forever in comparison.

I was thinking the same thing.

: "index" : {
:   "numDocs" : 10123,
:   "maxDoc" : 12345,
:   "age" : 2000,  #number of milliseconds the index has been open... sort of equivalent to index freshness, but not really.
:   "version":123425235,  #index version.  Actually, I think this should be in responseHeader to aid in client-side caching
: }
:
: I think this stuff is useful, it's just a matter of preference if it
: goes in the same handler or not. If this *does* go in this handler, then
: perhaps it should be named "indexinfo" or something.  I'd be fine with
: this handler being only about schema too though.

+1 to all of that ... "IndexInfo" seems like a good thing for returning
all sorts of info, of which the field list is just one piece ... I think
adding the stuff Yonik listed above definitely makes sense in the same
request handler that returns the list of fields ... the TermEnum iteration
with docFreq might make more sense in a separate request handler, if for no
other reason than to simplify the API when paginating through terms.
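
To illustrate what paginating through terms could look like, here is a toy sketch using a cursor ("give me the next N terms after this one"). Everything in it is hypothetical: the data is made up, `terms_page` is not an existing Solr or Lucene call, and a real handler would walk the term dictionary rather than a Ruby hash.

```ruby
# Made-up term => docFreq data, sorted by term the way a term enumeration
# would walk it.
terms = { 'maxtor' => 1, 'apple' => 4, 'dell' => 3, 'belkin' => 2, 'canon' => 1 }.sort.to_h

# Hypothetical paging function: up to `limit` [term, docFreq] pairs that sort
# strictly after the `after` cursor (nil starts from the beginning).
def terms_page(terms, after, limit)
  terms.select { |term, _freq| after.nil? || term > after }.first(limit)
end

page1 = terms_page(terms, nil, 2)
page2 = terms_page(terms, page1.last.first, 2)

page1.each { |term, freq| puts "#{term}: #{freq}" }
page2.each { |term, freq| puts "#{term}: #{freq}" }
```

A cursor based on the last term seen keeps the request parameters down to two values, which is the kind of API simplicity a dedicated handler would allow.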




-Hoss


[jira] Closed: (SOLR-116) StructureRequestHandler - allowing client to discover all fields in the index

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/SOLR-116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erik Hatcher closed SOLR-116.
-----------------------------

    Resolution: Fixed

I've committed IndexInfoRequestHandler based on the feedback here.  The field information is now returned as a map, with type being the only value currently.  I also added in an "index" keyed map which contains numDocs, maxDoc, and Lucene index version.  I wasn't sure how the "age" value should be computed, so I commented that out for now.  
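
A sketch of how a client might use the index version to avoid rebuilding its field information on every response. The response shape here is an assumption reconstructed from the description above (a "fields" map whose values are maps with a type, and an "index" map), and the values are illustrative.

```ruby
# ASSUMED response shape, reconstructed from the description above.
response = {
  'fields' => { 'id' => { 'type' => 'string' }, 'price' => { 'type' => 'sfloat' } },
  'index'  => { 'numDocs' => 10123, 'maxDoc' => 12345, 'version' => 123425235 }
}

cached_version = nil
cached_types   = nil

# Rebuild the field => type map only when the index version has changed.
# Returns true if the cache was refreshed, false if it was still current.
refresh = lambda do |resp|
  version = resp['index']['version']
  if version != cached_version
    cached_version = version
    cached_types   = resp['fields'].map { |name, info| [name, info['type']] }.to_h
    true
  else
    false
  end
end

puts refresh.call(response)  # first call populates the cache
puts refresh.call(response)  # same version, so nothing is rebuilt
```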

I'm closing this issue, and tweaks to this handler can be discussed in solr-dev now.

Thanks for the feedback.


indexinfo request handler (was Re: [jira] Commented: (SOLR-116) StructureRequestHandler - allowing client to discover all fields in the index)

Erik Hatcher
In reply to this post by Chris Hostetter-3

On Jan 21, 2007, at 3:34 AM, Chris Hostetter wrote:

> : "index" : {
> :   "numDocs" : 10123,
> :   "maxDoc" : 12345,
> :   "age" : 2000,  #number of milliseconds the index has been open... sort of equivalent to index freshness, but not really.
> :   "version":123425235,  #index version.  Actually, I think this should be in responseHeader to aid in client-side caching
> : }
> :
> : I think this stuff is useful, it's just a matter of preference if it
> : goes in the same handler or not. If this *does* go in this handler, then
> : perhaps it should be named "indexinfo" or something.  I'd be fine with
> : this handler being only about schema too though.
>
> +1 to all of that ... "IndexInfo" seems like a good thing for returning
> all sorts of info, of which the field list is just one piece ... I think
> adding the stuff yonik listed above definitely makes sense in the same
> request handler that returns the list of fields ...

Done!

This particular request handler will do wonders for the Flare side of
things, just you wait and see :)

> the TermEnum iteration
> with docFreq might make more sense in a separate request handler if for no
> other reason than to simplify the API when paginating through terms

Agreed.  I'm not sure why anyone would want docFreq that included  
terms from documents that had been deleted though, so there should at  
least be a warning in the documentation that says this particular  
information is best done on an optimized index.

I'm curious what use case is envisioned for docFreqs that aren't  
"faceted" in that they aren't constrained by some query to limit the  
results.  I certainly can see it being cool to navigate all the terms  
of a field in a general sense of transparency, but narrowing down the  
terms only from a particular sub-set of documents is what my  
application needs.

        Erik


Re: indexinfo request handler (was Re: [jira] Commented: (SOLR-116) StructureRequestHandler - allowing client to discover all fields in the index)

Chris Hostetter-3

: Agreed.  I'm not sure why anyone would want docFreq that included
: terms from documents that had been deleted though, so there should at
: least be a warning in the documentation that says this particular
: information is best done on an optimized index.
:
: I'm curious what use case is envisioned for docFreqs that aren't
: "faceted" in that they aren't constrained by some query to limit the
: results.  I certainly can see it being cool to navigate all the terms
: of a field in a general sense of transparency, but narrowing down the
: terms only from a particular sub-set of documents is what my
: application needs.

Right ... I think the concept is more general visibility into the
underlying index for the purpose of generating statistics.  The docFreqs
give you a good sampling of how common a term is, even if it's not 100%
exact because of deletions.
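
A toy model of that approximation, with made-up documents: a docFreq that still counts deleted documents can overstate the live count for some terms and be exact for others. This only mimics the behaviour described in this thread and does not call Lucene.

```ruby
# Made-up documents: each has a list of terms and a deleted flag.
docs = [
  { terms: %w[solr lucene], deleted: false },
  { terms: %w[solr],        deleted: true  },
  { terms: %w[solr ruby],   deleted: false },
  { terms: %w[lucene],      deleted: false }
]

# Models a docFreq that still counts deleted documents (as described above).
doc_freq = ->(term) { docs.count { |d| d[:terms].include?(term) } }

# Exact count over live documents only.
live_freq = ->(term) { docs.count { |d| !d[:deleted] && d[:terms].include?(term) } }

%w[solr lucene ruby].each do |term|
  puts "#{term}: docFreq=#{doc_freq.call(term)} live=#{live_freq.call(term)}"
end
```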



-Hoss