SOLR-116

SOLR-116

AE-4
Hi:

After quite a bit of searching, my understanding is that the remedy for my word-count problem lies in docTermFreq and TermEnum ... as Chris Hostetter points out clearly, for statistical purposes, in the post below. (Please note I am not so familiar with Java.)

http://www.mail-archive.com/solr-dev@.../msg02347.html

Based on the discussion in SOLR-116 it seems like it is there somewhere .. Can I access it somehow via solrb, i.e. the Ruby gem? It would save me so much trouble trying to get it right in Java.

I would appreciate a clarification of whether it's possible to access it via ruby/solrb.

Regards


 

Re: SOLR-116

Chris Hostetter-3

I'm a little backlogged on mail or I would have replied to your word
count email earlier...

one thing to keep in mind is that the index doesn't deal in "words", it
deals in "terms" -- the difference being that a term has a "field" and a
"token" -- what was discussed in the mail archives leading up to the
creation of SOLR-106, SOLR-116 and SOLR-117 is that you can get the lists
of frequently used "terms" per field by doing a faceted search, and if you
want to iterate through a lot of the terms for a field you can now do so
(SOLR-106) ... or you can restrict the terms not just by a field name, but
also by a prefix (SOLR-117).

what you can't do is get a list of the counts of all the times a *word*
appears in your index -- to accomplish this you would need to have a
single field that you copyField all of your other fields into, and then
get the facet counts on it.

note that to do a faceted search, you have to search on something ... if
you want the counts across your entire index, you can search on
foo:[* TO *] where "foo" is the name of whatever field in your schema is
your uniqueKey field.
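
As a rough illustration of the approach described above, the request could be assembled along these lines. This is only a sketch using Ruby's standard library, not solrb: the field name "all_text" (a catch-all field fed by copyField), the uniqueKey name "id", the /select path and the helper name term_count_url are all assumptions for the example; facet, facet.field, facet.limit and facet.prefix are the parameter names documented on the Solr wiki.

```ruby
require 'uri'

# Hypothetical helper (not part of solrb): builds a Solr select URL that
# matches every document and asks for facet counts on one catch-all field.
# "catch_all_field" is assumed to be a field that copyField fills from all
# other fields; "unique_key" is assumed to be the schema's uniqueKey field.
def term_count_url(base, catch_all_field:, unique_key:, limit: 20, prefix: nil)
  params = {
    'q'           => "#{unique_key}:[* TO *]", # match the entire index
    'rows'        => 0,                        # only the counts are wanted
    'facet'       => 'true',
    'facet.field' => catch_all_field,
    'facet.limit' => limit
  }
  params['facet.prefix'] = prefix if prefix    # optional prefix restriction
  "#{base}/select?#{URI.encode_www_form(params)}"
end

puts term_count_url('http://localhost:8983/solr',
                    catch_all_field: 'all_text', unique_key: 'id')
```

Passing prefix: 'sol' would add a facet.prefix parameter, limiting the counted terms to those starting with that prefix.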


-Hoss


Re: SOLR-116

Erik Hatcher

On Jan 29, 2007, at 8:49 PM, Antonio Eggberg wrote:

> After doing quite a bit of searching what I understand is that the  
> medicine to my problem of word count is in docTermFreq and  
> TermEnum ... as Chris Hostetter points out clearly for statistical  
> purpose in the post below. (Please note I am not so familer with java)
>
> http://www.mail-archive.com/solr-dev@.../msg02347.html
>
> Based on the discussion in SOLR-116 it seems like it is there  
> somewhere .. Can I access it somehow view solrb i.e ruby gem.. oh  
> it would save me so much trouble trying to get it right in Java.
>
> I would appreciate a clarification if its possible to access it via  
> ruby/solrb..


solrb provides this kind of capability now, via the &qt=indexinfo  
request handler:

 > require 'solr'
 > require 'pp'  # pretty print
 > connection = Solr::Connection.new("http://localhost:8983/solr")
 > pp connection.send(Solr::Request::IndexInfo.new)
#<Solr::Response::IndexInfo:0x7dd46c
@data=
   {"NOTICE"=>"This interface is experimental and may be changing",
    "fields"=>
     {"author_text"=>{"type"=>"text"},
      "subject_genre_facet"=>{"type"=>"string"},
      "text"=>{"type"=>"text"},
      "subject_geographic_facet"=>{"type"=>"string"},
      "subject_format_facet"=>{"type"=>"string"},
      "id"=>{"type"=>"string"},
      "subject_era_facet"=>{"type"=>"string"},
      "subject_topic_facet"=>{"type"=>"string"},
      "title_text"=>{"type"=>"text"}},
    "index"=>{"numDocs"=>50000, "version"=>1168970065801,  
"maxDoc"=>50000},
    "responseHeader"=>{"status"=>0, "QTime"=>0}},
@header={"status"=>0, "QTime"=>0},
@raw_response="{'responseHeader'=>{'status'=>0,'QTime'=>0},'fields'=>
{'title_text'=>{'type'=>'text'},'subject_format_facet'=>
{'type'=>'string'},'subject_geographic_facet'=>
{'type'=>'string'},'subject_topic_facet'=>
{'type'=>'string'},'subject_genre_facet'=>{'type'=>'string'},'text'=>
{'type'=>'text'},'author_text'=>{'type'=>'text'},'subject_era_facet'=>
{'type'=>'string'},'id'=>{'type'=>'string'}},'index'=>
{'maxDoc'=>50000,'numDocs'=>50000,'version'=>1168970065801},
'NOTICE'=>'This interface is experimental and may be changing'}">

So, no, the current information provided by this handler does not  
contain frequency information.  I'd be happy to consider patches that  
allow it to provide more information, though I'd like to keep the  
basic index information request as succinct as possible, using  
additional parameters to output more details if requested.

        Erik


SV: Re: SOLR-116

AE-4


Erik Hatcher <[hidden email]> wrote:
> So, no, the current information provided by this handler does not
> contain frequency information.  I'd be happy to consider patches that
> allow it to provide more information, though I'd like to keep the
> basic index information request as succinct as possible, using
> additional parameters to output more details if requested.

Erik and Chris!

Thanks again for the clarification. It helps a lot. I am still trying to get my head around Solr and solrb. Lots of new, cool things! It would help me greatly if either of you could beef up at least the solrb RDoc, i.e. go beyond stubs and add examples etc. I know it's a bit early in the development, but it's difficult to get into APIs without docs. I would love to contribute once we have some basics there and once I get a bit more comfortable with Solr.

Thanks again for your help. Much appreciated.


 

Re: SV: Re: SOLR-116

Erik Hatcher

On Jan 30, 2007, at 5:33 AM, Antonio Eggberg wrote:

> Erik Hatcher <[hidden email]> skrev:
> So, no, the current information provided by this handler does not
> contain frequency information.  I'd be happy to consider patches that
> allow it to provide more information, though I'd like to keep the
> basic index information request as succinct as possible, using
> additional parameters to output more details if requested.
>
>  Erik and Chris!
>
> Thanks again for the clarification. It helps a lot. I am still  
> trying to get my head around Solr and Solrb. Lot of  
> new ..cool..things! It would me greatly if any of you could beef up  
> at least the solrb RDOC i.e. beside stubbs with examples etc. I  
> know its a bit early in the development but
> its difficult to get into API's without docs. I would love to  
> contribute once  we have some basics there plus when I get a bit  
> more comfortable in Solr.

solrb rdoc... indeed we'll beef it up.  If you have specific  
questions, please ask and that'll help prioritize what needs to be  
covered sooner rather than later.

However, even better than documentation is real-world working example  
code.  You'll find that in the README file and in the quite robust  
test case (100% code coverage, still!).  The unit test cases are  
sometimes contrived to exercise an edge case in the code and may not  
be useful for copy/paste, but the functional tests are end-to-end  
tests that hit a real Solr instance.

        Erik


Re: SV: Re: SOLR-116

Zaheed Haque
On 2/1/07, Erik Hatcher <[hidden email]> wrote:

>
> On Jan 30, 2007, at 5:33 AM, Antonio Eggberg wrote:
> > Erik Hatcher <[hidden email]> skrev:
> > So, no, the current information provided by this handler does not
> > contain frequency information.  I'd be happy to consider patches that
> > allow it to provide more information, though I'd like to keep the
> > basic index information request as succinct as possible, using
> > additional parameters to output more details if requested.
> >
> >  Erik and Chris!
> >
> > Thanks again for the clarification. It helps a lot. I am still
> > trying to get my head around Solr and Solrb. Lot of
> > new ..cool..things! It would me greatly if any of you could beef up
> > at least the solrb RDOC i.e. beside stubbs with examples etc. I
> > know its a bit early in the development but
> > its difficult to get into API's without docs. I would love to
> > contribute once  we have some basics there plus when I get a bit
> > more comfortable in Solr.
>
> solrb rdoc... indeed we'll beef it up.  If you have specific
> questions, please ask and that'll help prioritize what needs to be
> covered sooner rather than later.

Cool. Please see below.

> However, even better than documentation is real-world working example
> code.  You'll find that in the README file and in the quite robust
> test case (100% code coverage, still!).  The unit test cases are
> sometimes contrived to exercise an edge case in the code and may not
> be useful for copy/paste, but the functional tests are end-to-end
> tests that hit a real Solr instance.

I think trying to tackle two problems (learning Solr as well as
solrb/flare) at the same time is what is causing me trouble. While I was
writing the i18n unit test I found out about the 100% test coverage;
that's really cool. Now in terms of docs, I think it would be great to
have:

- A TODO file under client/ruby/solrb. What I mean is that a lot of the
time I find information in the wiki which applies to Solr, but I don't
know if it applies to solr-ruby-api. So it would be nice to have a TODO
file (things that are not available in solr-ruby yet). It would make my
life easier.
- More information about facets; I am completely lost there. I couldn't
follow the facet part of the code when I was looking at
test/unit/standard_request.rb and standard_response.rb. It would be nice
if there were some explanation of facets in the .rb files.

If you prefer, what I could do is comment the code as I see it, based on
the functional tests, and then you can edit it. This way we could beef up
the docs rather quickly. It's not that many files, so I could give it a
shot by end of day tomorrow. Of course, except facets :-)

Cheers
Zaheed

solrb documentation (was Re: SV: Re: SOLR-116)

Erik Hatcher

On Feb 1, 2007, at 6:47 AM, Zaheed Haque wrote:

> I think trying to tackle two problem (Learning Solr as well as
> solrb/flare) at the
> same time is giving the problem. While I was writing the i18n unit  
> test I found
> out the 100% test cov. thats really cool. Now in terms of docs.. I
> think it would
> be great if I have a
>
> - TODO file under client/ruby/solrb. What I mean is that - lot of  
> times
> I find information in wiki which applies to Solr but I don't know  
> if it applies
> to solr-ruby-api. So it would be nice to have a TODO file (Things  
> that is
> not available in solr-ruby yet. Makes my life easier.

We have our own solrb TODO list on the wiki:
<http://wiki.apache.org/solr/solrb/ToDo>, also note the "rake todo" task
that will produce a list of all the TBD/TODO/FIXME lines in the code.

> - I am completely lost in terms of facets.. I would love to have  
> some more info
> about it. I couldn't follow the facet's part of the code when i was
> looking at the test/unit/standard_request.rb and standard_response.rb,
> It would be nice if
> there were some explanation regarding facets in the .rb files.

Quite fair enough.

However, rather than reinvent the wheel, have you read these?

  - <http://wiki.apache.org/solr/SolrFacetingOverview>
  - <http://wiki.apache.org/solr/SimpleFacetParameters>

Granted, the difference is that through solrb we map more readable
names to the arguments, along with a sensible data structure of parameters:

     request = Solr::Request::Standard.new(:query => 'query',
        :facets => {
          :fields => [:genre,
                      # field that overrides the global facet parameters
                      {:year =>
                         {:limit => 50, :mincount => 0, :missing => false,
                          :sort => :term, :prefix => "199"}}],
          :queries => ["q1", "q2"],
          :prefix => "cat",
          # global facet parameters
          :limit => 5, :zeros => true, :mincount => 20, :sort => :count
         }
     )

I have to stop and laugh when I see this versus the vastly more
succinct query string this generates, and wonder if the syntactic
sugar is really worth it :)  One nicety is that nested values, like a
field-specific override of the mincount, are easier to visually parse
in the Ruby code.
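
To make that comparison concrete, here is a rough sketch of how a nested facet hash like the one above could be flattened into a query string. It is not solrb's actual implementation: facet_params is a hypothetical helper, values are passed through unchanged (the real library may translate some of them, such as the sort option), and the only assumption about Solr is the parameter naming from the SimpleFacetParameters wiki page, including the f.<field>.facet.<name> form for per-field overrides.

```ruby
require 'uri'

# Illustrative only: a hypothetical helper, not part of solrb.
# Flattens a nested facet hash into [name, value] pairs for a query string.
def facet_params(facets)
  pairs = [['facet', 'true']]
  facets.each do |key, value|
    case key
    when :fields
      value.each do |field|
        if field.is_a?(Hash)
          # A hash entry carries per-field overrides of the global settings.
          field.each do |name, overrides|
            pairs << ['facet.field', name.to_s]
            overrides.each { |k, v| pairs << ["f.#{name}.facet.#{k}", v.to_s] }
          end
        else
          pairs << ['facet.field', field.to_s]
        end
      end
    when :queries
      value.each { |q| pairs << ['facet.query', q] }
    else
      # Everything else is treated as a global facet parameter.
      pairs << ["facet.#{key}", value.to_s]
    end
  end
  pairs
end

facets = {
  :fields  => [:genre, {:year => {:limit => 50, :mincount => 0, :prefix => '199'}}],
  :queries => ['q1', 'q2'],
  :limit   => 5, :mincount => 20
}
puts URI.encode_www_form([['q', 'query']] + facet_params(facets))
```

The per-field override ends up as separate f.year.facet.* parameters scattered among the global ones, which is where the nested hash is easier on the eyes.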

So while I'm in complete agreement with you that solrb's rdocs need  
lots of improvements, I also don't want the docs to be a copy of what  
Solr docs already explain nicely.

> If you prefer, what I could do is comment the code as I see it based
> on the functional
> test and then you can edit it this way we could beef up the doc rather
> quick. Its not that
> many files so I could give it a shot by end day tomorrow. Off course
> except Facets :-)

Sure thing!  Patches are always welcome, and even more welcome when  
they are documentation patches :)

        Erik