Haloe (Lucene package) released!

Marcus Herou
Hi guys.

I'm glad to announce that I finally managed to move this package out of the
company code and release it to the open source community. The package contains
some neat classes which we use, for instance, when indexing and searching
through several hundred thousand blogs, and the ShardedSolrDocumentIndexer will
be used frequently now that we will index the entire Blogosphere.

Check it out at http://dev.tailsweep.com

Kindly

//Marcus

Re: Haloe (Lucene package) released!

Petite Abeille-2-2

On Sep 8, 2008, at 6:43 AM, Marcus Herou wrote:

> the ShardedSolrDocumentIndexer will be
> used frequently now that we will index the entire Blogosphere.

Yes, you will indeed need all the help you can muster! :)

blogosphere, noun
A poisonous environment of methane, self-satisfaction and other hot
gasses.
“The only creatures that can survive in the blogosphere are low-order  
molds, able to feed off the waste of others.”

http://www.eod.com/devil/archive/blogosphere.html

--
PA.
http://alt.textdrive.com/nanoki/




---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: Haloe (Lucene package) released!

Marcus Herou
:) Whoof, so much high-quality info and, at the same time, a huge amount of
useless data, splogs and spam.

/M


--
Marcus Herou CTO and co-founder Tailsweep AB
+46702561312
[hidden email]
http://www.tailsweep.com/
http://blogg.tailsweep.com/

Re: Haloe (Lucene package) released!

Petite Abeille-2-2

On Sep 8, 2008, at 7:49 PM, Marcus Herou wrote:

> :) Whoof so much high quality info and at the same time a huge  
> amount of useless data, splogs and spam.

Incidentally, if your search needs are humbler and do not require the
full firepower of mighty Lucene, SQLite provides a very handy Full
Text Search (FTS) module:

http://www.sqlite.org/cvstrac/wiki?p=FtsUsage

Rather straightforward to use as well:

(1) Create a table

     create virtual table document using fts3
     (
         name,
         content,
         tokenize porter
     )

(2) Populate it

     insert into document( name, content ) values( %s, %s )

(3) Search it

     select      document.name as name,
                 snippet( document, '<i>', '</i>', '…' ) as extract
     from        document
     where       document.content match %s

Here is an example of FTS at work:

http://svr225.stepx.com:3388/search?q=blog
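
For readers who want to try the three steps above locally, here is a minimal sketch using Python's standard sqlite3 module. It assumes the SQLite library bundled with your Python build has FTS3 compiled in (most do, but this is not guaranteed). The function names build_index and search are made up for illustration, the '%s' placeholders are swapped for sqlite3's '?' style, and the tokenizer is written in the documented tokenize=porter form.

```python
import sqlite3


def build_index(rows):
    """Create an in-memory FTS3 table and load (name, content) pairs into it."""
    conn = sqlite3.connect(":memory:")
    # Step 1: a virtual table using the Porter stemmer, as in the post.
    conn.execute(
        "create virtual table document using fts3(name, content, tokenize=porter)"
    )
    # Step 2: populate it; sqlite3 binds values through '?' placeholders.
    conn.executemany("insert into document(name, content) values (?, ?)", rows)
    conn.commit()
    return conn


def search(conn, query):
    """Return (name, extract) pairs for rows whose content matches the query."""
    sql = (
        "select document.name as name, "
        "snippet(document, '<i>', '</i>', '…') as extract "
        "from document where document.content match ?"
    )
    return conn.execute(sql, (query,)).fetchall()


if __name__ == "__main__":
    conn = build_index([
        ("a", "Indexing many blogs quickly"),
        ("b", "Nothing to see here"),
    ])
    for name, extract in search(conn, "blog"):
        print(name, extract)
```

Because of the Porter stemmer, a query for "blog" should also match documents that only contain "blogs".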

Cheers,

--
PA.
http://alt.textdrive.com/nanoki/



Re: Haloe (Lucene package) released!

Marcus Herou
Cool. Since this project is about indexing documents (java.util.Map) in
general, perhaps a SqliteDocumentIndexer could be implemented.
I am also thinking of using HBase as an indexing system since it is
sorted by nature, and perhaps adding bitmap indexing with, for example, FastBit.
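
To make the SqliteDocumentIndexer idea a bit more concrete, here is a rough sketch in Python rather than Java. Everything in it is hypothetical: the class and its index and search methods are invented for illustration and are not Haloe's actual interface. It treats a document as a plain mapping of field names to values and stores it in an SQLite FTS3 table, assuming FTS3 is available in the local SQLite build.

```python
import re
import sqlite3


class SqliteDocumentIndexer:
    """Hypothetical indexer that stores mapping-style documents in FTS3."""

    def __init__(self, fields, path=":memory:"):
        # Field names become column names, so only accept plain identifiers.
        for field in fields:
            if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", field):
                raise ValueError(f"unsupported field name: {field!r}")
        self.fields = list(fields)
        self.columns = ", ".join(self.fields)
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            f"create virtual table document using fts3({self.columns})"
        )

    def index(self, doc):
        """Index one document; missing fields are stored as empty strings."""
        values = [str(doc.get(field, "")) for field in self.fields]
        marks = ", ".join("?" for _ in self.fields)
        self.conn.execute(
            f"insert into document({self.columns}) values ({marks})", values
        )
        self.conn.commit()

    def search(self, query):
        """Return matching documents as dicts, searching across all fields."""
        cursor = self.conn.execute(
            "select * from document where document match ?", (query,)
        )
        return [dict(zip(self.fields, row)) for row in cursor]


if __name__ == "__main__":
    indexer = SqliteDocumentIndexer(["title", "body"])
    indexer.index({"title": "Haloe", "body": "sharded indexing of blogs"})
    print(indexer.search("sharded"))
```

The field list is fixed when the indexer is created because an FTS3 table declares its columns up front; a real implementation would need to decide how to handle documents with unknown keys.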

There is no end to the possibilities, just a lack of resources :)

Kindly

//Marcus
