[acts_as_solr] Few question on usage


[acts_as_solr] Few question on usage

solruser-2
Hi

Here are a few questions about integrating Solr with Ruby:

1. What alternatives to the acts_as_solr plugin are available for Ruby
integration with Solr?
2. Does the acts_as_solr plugin support highlighting?
3. Are any performance benchmarks available for the acts_as_solr plugin?


-thanks
dev

Re: [acts_as_solr] Few question on usage

Nathan  Woodhull
I am using the Solr plugin to the Searchable module.  I have made a
large number of enhancements to the library to support the application
that I am working on that I hope to contribute back to the community
soon (multiple indexes, faceted browsing, etc)  but for basic stuff,
you should find what is there to be adequate.
http://searchable.rubyforge.org/

I looked at acts-as-solr when I started this project and was not
terribly impressed with it.

-Nathan

On 4/17/07, amit rohatgi <[hidden email]> wrote:

> Hi
>
> Here are few question for solr integrating with ruby
>
> 1. What are other alternatives are available for ruby integration with solr
> other than acts-as_solr plugin.
> 2. acts_as_solr plugin - does it support highlighting feature
> 3. performance benchmark for acts_as_solr plugin available if any
>
>
> -thanks
> dev
>


--
Nathan Woodhull
blog: http://techfordemocracy.com/
aim: nathanwoodhull
cell: 518-207-6768

Re: [acts_as_solr] Few question on usage

Chris Hostetter-3
In reply to this post by solruser-2

I don't really know a lot about Ruby, but as i understand it there are more
than a few versions of something called "acts_as_solr" floating around
... the first written by Erik as a proof of concept, and then picked up and
polished a bit by someone else (whose name escapes me)

all of the "serious" ruby/solr development i know about is happening as
part of the "Flare" sub-sub project...

        http://wiki.apache.org/solr/Flare
        http://wiki.apache.org/solr/SolRuby

...most of the people working on it seem to hang out on the
ruby-dev@lucene mailing list.  as i understand it the "solr-ruby" package
is a low level ruby<->solr API, with Flare being a higher level
reusable Rails app type thingamabob.  (can you tell i don't know a lot
about Ruby or Rails? ... i'm winging it)


: Date: Tue, 17 Apr 2007 10:52:00 -0700
: From: amit rohatgi <[hidden email]>
: Reply-To: [hidden email]
: To: [hidden email]
: Subject: [acts_as_solr] Few question on usage
:
: Hi
:
: Here are few question for solr integrating with ruby
:
: 1. What are other alternatives are available for ruby integration with solr
: other than acts-as_solr plugin.
: 2. acts_as_solr plugin - does it support highlighting feature
: 3. performance benchmark for acts_as_solr plugin available if any
:
:
: -thanks
: dev
:



-Hoss


Re: [acts_as_solr] Few question on usage

Erik Hatcher
Sorry, I missed the original mail.   Hoss has got it right.

Personally I'd love to see acts_as_solr definitively come into the  
solr-ruby fold.

Regarding your questions:

> : 1. What are other alternatives are available for ruby integration  
> with solr
> : other than acts-as_solr plugin.

acts_as_solr is purely for ActiveRecord (database O/R mapping)  
integration with Solr, such that when you create/update/delete  
records they get taken care of in Solr also.

For pure Ruby access to Solr without a database, use solr-ruby.  The  
0.01 gem is available as "gem install solr-ruby", but if you can I'd  
recommend you tinker with the trunk codebase too.

> : 2. acts_as_solr plugin - does it support highlighting feature

This depends on which acts_as_solr you've grabbed.  As Hoss  
mentioned, there are various flavors of it floating around.   I've  
promised to speak about acts_as_solr at RailsConf next month, so I'll  
be working to get that under control even if that means resurrecting  
my initial hack and making it part of solr-ruby and hoping that the  
other implementations floating out there would like to collaborate on  
a definitive version built into the Solr codebase.

> : 3. performance benchmark for acts_as_solr plugin available if any

What kind of numbers are you after?  acts_as_solr searches Solr, and  
then will fetch the records from the database to bring back model  
objects, so you have to account for the database access in the  
picture as well as Solr.
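
To make the two-step cost concrete, here is a minimal sketch in plain Ruby. It is not code from acts_as_solr: search_models, solr_search, and find_by_ids are hypothetical names, and the two lambdas stand in for the real Solr query and the real database lookup. It only illustrates that each search pays for a Solr round trip plus a database fetch, and that the two can be timed separately.

```ruby
# Hypothetical sketch of the search-then-fetch flow described above.
# solr_search and find_by_ids are stand-ins, not acts_as_solr APIs.
def search_models(query, solr_search, find_by_ids)
  started = Time.now
  ids = solr_search.call(query)      # step 1: Solr returns matching ids
  after_solr = Time.now
  records = find_by_ids.call(ids)    # step 2: the database returns full records
  after_db = Time.now

  { :records      => records,
    :solr_seconds => after_solr - started,
    :db_seconds   => after_db - after_solr }
end

# Fake "index" and "database" so the sketch runs on its own.
index = { "solr" => [2, 3] }
table = { 1 => "Intro to Ruby", 2 => "Solr in Practice", 3 => "Solr and Rails" }

result = search_models("solr",
                       lambda { |q| index.fetch(q, []) },
                       lambda { |ids| ids.map { |id| table[id] } })
puts result[:records].inspect
```

A benchmark of a real setup would presumably report both timings, since either side can dominate depending on index size and database configuration.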

        Erik



On Apr 19, 2007, at 5:30 PM, Chris Hostetter wrote:



Re: [acts_as_solr] Few question on usage

solruser-2
Hi Erik,

Please find my comments under ">>>" to your queries.

> : 1. What are other alternatives are available for ruby integration  
> with solr
> : other than acts-as_solr plugin.

acts_as_solr is purely for ActiveRecord (database O/R mapping)  
integration with Solr, such that when you create/update/delete  
records they get taken care of in Solr also.

For pure Ruby access to Solr without a database, use solr-ruby.  The  
0.01 gem is available as "gem install solr-ruby", but if you can I'd  
recommend you tinker with the trunk codebase too.

>>>
Let me put it this way: considering the use of Solr with a Rails application, what is the ideal approach?


> : 2. acts_as_solr plugin - does it support highlighting feature

This depends on which acts_as_solr you've grabbed.  As Hoss  
mentioned, there are various flavors of it floating around.   I've  
promised to speak about acts_as_solr at RailsConf next month, so I'll  
be working to get that under control even if that means resurrecting  
my initial hack and making it part of solr-ruby and hoping that the  
other implementations floating out there would like to collaborate on  
a definitive version built into the Solr codebase.

>>>
Since there are many flavors floating around, which one is the most sought after and best supported? I agree that a definitive version will help the RoR community adopt Solr with a much greater level of confidence.
And since RoR applications are targeting Web 2.0, the need to search and collaborate on information is much higher. So I personally believe addressing this will definitely go a long way.

> : 3. performance benchmark for acts_as_solr plugin available if any

What kind of numbers are you after?  acts_as_solr searches Solr, and  
then will fetch the records from the database to bring back model  
objects, so you have to account for the database access in the  
picture as well as Solr.

>>>
To be specific, I am keen to know about the creation and update of indexes when you run into a large number of documents. Since the database is used to populate the models, the overall cost will definitely be the cumulative effect of retrieving documents from Solr (with Lucene), network latency (since it is a web service), and local database access (which depends on configuration).


-TIA




Re: [acts_as_solr] Few question on usage

Erik Hatcher

On Apr 20, 2007, at 2:30 PM, solruser wrote:
> For pure Ruby access to Solr without a database, use solr-ruby.  The
> 0.01 gem is available as "gem install solr-ruby", but if you can I'd
> recommend you tinker with the trunk codebase too.
>
>>>>
> Well I say, considering use of solr with rails application. What's
> the ideal
> approach?

"rails application" is a pretty broad category of applications at  
this point.  If we're talking about a database-backed application  
being searchable by Solr, I'd go for the RubyForge acts_as_solr  
first.  However, I suspect that it needs work in terms of  
facilitating access to facets, highlighting, and other types of  
custom query handlers.

If your application is backed by other datastores, like in my cases a  
bunch of MARC records in binary format, or a flat delimited file, a  
ZIP file full of RDF/XML files, or even more interestingly another  
Solr instance that we wanted to repurpose in another Solr-based  
application, then go with solr-ruby.

It's my intention to bridge this gap in the near future somehow, I
just haven't formulated an exact plan.  acts_as_solr fits nicely and
very easily on top of solr-ruby.  I envision acts_as_solr simply
being part of solr-ruby, hooking in only if you have
ActiveRecord installed; otherwise it'd be transparent, only taking up
a few tens of lines of code in an un-required .rb file.

The first step could be to patch the RubyForge acts_as_solr to use  
solr-ruby to kick start collaboration.  As for where my effort fits  
into a calendar, within the next few weeks I'll be delving into it  
deeply and can speak more definitively.


>>>>
> Since there are many flavors floating around which is most sought  
> after and
> supported. And I agree that definitive version will help ROR  
> community to
> accept solr with much larger level of confidence.
>  And since ROR application are addressing
> web2.0 the need for search and collaborate information is much  
> higher. So I
> personally believe addressing this will definitely go long way.

That's the plan!   No question about it.  I personally am running on  
all cylinders, and will make progress on these technologies as my  
real-world needs require them, which is increasing all the time.  All  
savvy SolRubyists are invited to jump in!

I've not documented this stuff on the wiki to the standards set by
the Solr engine itself, but there is some pretty amazing power going
on with solr-ruby right now.  For example, the data mapping / indexer
framework makes it easy to import a dataset into Solr using Ruby:

source = DataSource.new

mapping = {
   :id => :isbn,
   :name => :author,
   :source => "BOOKS",
   :year => Proc.new {|record| record.date[0,4] },
}

Solr::Indexer.index(source, mapper) do |orig_data, solr_document|
   solr_document[:timestamp] = Time.now
end

This showcases the simplistic data source facility (*quack* -
anything that has an #each method) [with a contrived, bogus DataSource
class], and the mapping capabilities.  The mapping is a hash of Solr
field names to value mappings.  A value mapping can be a String
("BOOKS") or a Symbol (:isbn, :author), which looks up that field on
each of the objects yielded by #each.  This lookup simply means, again
*quack*, that the data object needs a [] method defined.  The Proc
example is a bit more advanced Ruby voodoo for embedding a bit of code
in the mapping, to be executed later with the actual record passed into
it; in the example it extracts the first four characters of the
record's date property.  One more bit of Ruby coolness is the
do ... end block for the indexer method.  The indexer takes a data
source and a mapper, melds them together as described, and gives you
one final chance to affect the solr_document before it gets indexed,
with the original data object provided as well.
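
As a rough illustration of how such a mapping hash could be applied to one record, here is a self-contained sketch in plain Ruby. It is not the solr-ruby implementation: map_record is a hypothetical helper, and the record is assumed to be a plain Hash, so the Proc reads record[:date] rather than record.date.

```ruby
# Hypothetical helper, not solr-ruby code: applies a mapping hash to one
# record, following the three kinds of value mapping described above.
def map_record(mapping, record)
  mapping.each_with_object({}) do |(solr_field, rule), doc|
    doc[solr_field] =
      case rule
      when Proc   then rule.call(record)  # run the embedded code with the record
      when Symbol then record[rule]       # look the field up via the record's [] method
      else rule                           # a String is copied through unchanged
      end
  end
end

mapping = {
  :id     => :isbn,
  :name   => :author,
  :source => "BOOKS",
  :year   => Proc.new { |record| record[:date][0, 4] }
}

# Assumed sample record; any object with a [] method would do.
record = { :isbn => "0131103628", :author => "Kernighan", :date => "1988-04-01" }
doc = map_record(mapping, record)
puts doc.inspect
```

The case statement checks Proc before Symbol only for readability; the two never overlap, so the order does not change the result.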

We now already have a simple mapper, an XPath mapper, and an Hpricot
mapper available.  We also have some handy data sources including a
tab-delimited file source (obsoleted in my playbook by the CSV
importer now built in).  I'm also using a simple custom MARC binary
data source and mapper specific to ruby-marc objects, and I just put
together a SolrSource that takes a query (and filters) for one Solr
instance and, paging in a configurable way, feeds out the documents
returned from that query one after another.  Apply a mapper to that
data source and you can pipe data from one Solr to another like this:

# Source: documents matching the query and filters on the first Solr instance
solr_source = Solr::Importer::SolrSource.new("http://localhost:8420/solr",
  "*:*", ["year:[1776 TO 1918]", 'author:smith'])
count = 0
# mapper is whatever mapper you want to apply; index into the second instance
Solr::Indexer.index(solr_source, mapper, {:debug => false, :timeout => 120,
    :solr_url => "http://localhost:8983/solr"}) do |orig_data, solr_document|
  count = count + 1
  if count % 100 == 0
    puts "#{count}"
  end
end

The count junk is just to see console progress on how many records  
have been indexed.
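
For anyone wondering what a paging data source amounts to, here is a minimal sketch in plain Ruby. It is not the SolrSource code: paged_source is a hypothetical helper, and the block passed to it stands in for a real query against Solr that takes a start offset and a row count. The result is an Enumerator, which satisfies the "anything that has an #each method" requirement.

```ruby
# Hypothetical paging source, not solr-ruby code. The block is called as
# fetch_page.call(start, rows) and must return an Array of documents.
def paged_source(page_size, &fetch_page)
  Enumerator.new do |yielder|
    start = 0
    loop do
      docs = fetch_page.call(start, page_size)
      break if docs.empty?                 # no more pages
      docs.each { |doc| yielder << doc }   # feed documents out one at a time
      start += docs.size
    end
  end
end

# Fake result set standing in for the remote Solr instance.
all_docs = (1..250).map { |i| { :id => i } }
source = paged_source(100) { |start, rows| all_docs[start, rows] || [] }

count = 0
source.each do |doc|
  count += 1
  puts count if count % 100 == 0   # console progress, as in the example above
end
```

Because pages are fetched lazily inside the Enumerator, only one page of documents is held at a time, which is the point of paging through a large result set.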

So I'm working the Ruby/Solr thing as much as possible right now.
There is something to what we've got there, but it's not packaged as
nicely as needed for a community to flourish, and for that I
apologize.  But there is also enough goodness there now to lure folks
in to want to get involved.

Right now in RoR with the Flare plugin installed, you can have a  
controller that looks like this:

    class SearchController < ApplicationController
      flare
    end

And with some copy/pasting of templates (that we can build in as
defaults somehow I'm sure) you have a faceted browsing, Ajax tricked
out (well, in-place editor and Ajax suggest) experience with how many
lines of code?  (The devil is in the details though, and that is why
I don't yet recommend Flare to folks that want it to just work
and also be configurable.)  Flare cuts a lot of corners by hard-coding
some things that need to be made configurable, etc.  Typical
prototyping approach: tinker, tinker, tinker, distill.  I'm still in
the first tinker phase with Flare right now.  But folks interested in
rolling up their sleeves, who don't mind getting a little grubby with
code, are more than invited to delve into Flare now, with the
forewarning that the Flare you see today will not be at all near the
Flare that spawns from the ashes.  Pioneering spirit required.

>> : 3. performance benchmark for acts_as_solr plugin available if any
>
> What kind of numbers are you after?  acts_as_solr searches Solr, and
> then will fetch the records from the database to bring back model
> objects, so you have to account for the database access in the
> picture as well as Solr.
>
>>>>
> Well to be specific I am keen to know about creation and update of  
> indexes
> when you run into large number of documents. Since database is used to
> populate the models and definitely it will be the cumulative
> effect of
> retrieval of document from solr with lucene, network issues (since  
> its a web
> service) and locally on database (depends on configuration).

Again we need to be clear about "large".  I've got near 4M indexes
under my belt now, but many others have gone to 10M+.  Lucene and
Solr both scale very well into the tens of millions and, I've heard,
even further up into the hundreds of millions.

Certainly those other latencies you mention are valid questions, but
in my experience they've not been show-stopping concerns.  Performance
with Solr + Ruby has been more than acceptable... it's been just
fine, even with several spots for improvement in all those areas in
my applications.  First rule of optimization: Don't.  Second rule of
optimization: Don't optimize yet.

        Erik



Re: [acts_as_solr] Few question on usage

Erik Hatcher

On Apr 21, 2007, at 9:42 PM, Erik Hatcher wrote:

> source = DataSource.new
>
> mapping = {
>   :id => :isbn,
>   :name => :author,
>   :source => "BOOKS",
>   :year => Proc.new {|record| record.date[0,4] },
> }
>
> Solr::Indexer.index(source, mapper) do |orig_data, solr_document|
>   solr_document[:timestamp] = Time.now
> end

Sorry, my bad, that's what I get for contriving code without testing  
it, and then changing the implementation to suit how I wanted to  
describe it.

It should be Solr::Indexer.index(source, mapping) ....  # mappING

I just changed the implementation to allow a Hash as well as a  
Solr::Importer::Mapper object (well, really anything with a #map  
method).
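
For the curious, accepting either form could look roughly like the sketch below. This is an assumption about the shape of the idea, not the code that went into solr-ruby: to_mapper and upcasing_mapper are hypothetical names, and the Hash branch only handles Symbol lookups and literal values. Note that a Hash also responds to #map (via Enumerable), so the Hash check has to come first.

```ruby
# Hypothetical sketch, not solr-ruby code: turn either a mapping Hash or
# an object with a #map method into something callable on a record.
def to_mapper(mapping_or_mapper)
  if mapping_or_mapper.is_a?(Hash)
    mapping = mapping_or_mapper
    lambda do |record|
      mapping.each_with_object({}) do |(field, rule), doc|
        doc[field] = rule.is_a?(Symbol) ? record[rule] : rule
      end
    end
  elsif mapping_or_mapper.respond_to?(:map)
    mapping_or_mapper.method(:map)   # duck typing: any #map(record) will do
  else
    raise ArgumentError, "expected a Hash or an object with a #map method"
  end
end

# A custom mapper object, defined with a singleton method for brevity.
upcasing_mapper = Object.new
def upcasing_mapper.map(record)
  { :name => record[:author].upcase }
end

record = { :isbn => "42", :author => "smith" }
from_hash   = to_mapper({ :id => :isbn, :source => "BOOKS" }).call(record)
from_object = to_mapper(upcasing_mapper).call(record)
puts from_hash.inspect
puts from_object.inspect
```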

        Erik




Re: [acts_as_solr] Few question on usage

solruser-2
Hi Erik

Thanks for the detailed information. From it I understand that acts_as_solr is presently the best available solution for connecting a database-backed Rails application to Solr, and that you plan to bring it under solr-ruby development going forward, which I assume will happen in the next month or so.

That being the case, the acts_as_solr plugin from RubyForge is the most suitable place to start, and it can soon be expected to live under the Apache solr-ruby project, where code additions, updates, and contributions would be centralized. Please correct me if this understanding or these expectations are off.

Thanks

