summer of code: solr for apache mail archives


summer of code: solr for apache mail archives

Doug Cutting
Here's an idea for a Summer of Code project: build a Solr-based search
engine for Apache's email archives.  This could run on
lucene.zones.apache.org.  It would first index the full archives, and
then index new messages as they arrive.  We can set up a notification
mechanism for new messages with Apache infrastructure.  Fields indexed
would include mailing-list, sender, date, subject and contents.  The
search interface could be a form with text boxes for sender, subject and
content, and perhaps pulldowns for mailing-list and date.  Results could
be sortable by any field.  Faceted search by sender/month/mailing-list,
etc. could be useful.

http://wiki.apache.org/general/SummerOfCode2006

I'd be happy to co-mentor this with someone from Solr.

Doug
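
A minimal sketch, in Python, of how one archived message might be mapped to the fields listed above (mailing-list, sender, date, subject, contents), plus a month value for faceting. The helper name `message_to_fields`, the field names, and the use of the `List-Id` header to identify the list are assumptions made for illustration, not part of the proposal. Posting the resulting record to Solr is left out.

```python
import re
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime


def message_to_fields(raw_message):
    """Map one RFC 822 message to a flat dict of index fields.

    Hypothetical helper: field names are illustrative, and the message
    is assumed to carry a parseable Date header.
    """
    msg = message_from_string(raw_message)
    sent = parsedate_to_datetime(msg["Date"])

    # Keep only plain-text parts; attachments and HTML are ignored here.
    if msg.is_multipart():
        parts = [p for p in msg.walk() if p.get_content_type() == "text/plain"]
        body = "\n".join(p.get_payload() for p in parts)
    else:
        body = msg.get_payload()

    # Assumption: the list is identified by the bracketed part of List-Id.
    match = re.search(r"<([^>]+)>", msg.get("List-Id", ""))

    return {
        "mailing_list": match.group(1) if match else "",
        "sender": parseaddr(msg.get("From", ""))[1],
        "date": sent.isoformat(),
        "month": sent.strftime("%Y-%m"),  # candidate facet value
        "subject": msg.get("Subject", ""),
        "contents": body.strip(),
    }
```

A separate `month` field is one simple way to support the sender/month/mailing-list facets mentioned above without computing date ranges at query time.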

Re: summer of code: solr for apache mail archives

Yoav Shapira
+1, good idea, I support it.  (But I'm not volunteering to be the actual
mentor, as I already have too much on my plate for the summer.)

Yoav

On 4/19/06, Doug Cutting <[hidden email]> wrote:

> Here's an idea for a Summer of Code project: build a Solr-based search
> engine for Apache's email archives.  This could run on
> lucene.zones.apache.org.  It would first index the full archives, and
> then index new messages as they arrive.  We can set up a notification
> mechanism for new messages with Apache infrastructure.  Fields indexed
> would include mailing-list, sender, date, subject and contents.  The
> search interface could be a form with text boxes for sender, subject and
> content, and perhaps pulldowns for mailing-list and date.  Results could
> be sortable by any field.  Faceted search by sender/month/mailing-list,
> etc. could be useful.
>
> http://wiki.apache.org/general/SummerOfCode2006
>
> I'd be happy to co-mentor this with someone from Solr.
>
> Doug
>


--
Yoav Shapira
Nimalex LLC
1 Mifflin Place, Suite 310
Cambridge, MA, USA
[hidden email] / www.yoavshapira.com

Re: summer of code: solr for apache mail archives

Yonik Seeley
In reply to this post by Doug Cutting
On 4/19/06, Doug Cutting <[hidden email]> wrote:
> I'd be happy to co-mentor this with someone from Solr.

OK, I'll volunteer to be the co-mentor.

-Yonik

Re: summer of code: solr for apache mail archives

Yonik Seeley
In reply to this post by Doug Cutting
On 4/19/06, Doug Cutting <[hidden email]> wrote:
> http://wiki.apache.org/general/SummerOfCode2006
>
> I'd be happy to co-mentor this with someone from Solr.

OK, I added solr-mail-archive to the Wiki (pretty much copied your description)

-Yonik

Re: summer of code: solr for apache mail archives

Ian Holsman
If you could get it to interface with mod_mbox (what they currently
use), that would be a better solution for the ASF infrastructure, I
think.

On 4/21/06, Yonik Seeley <[hidden email]> wrote:
> On 4/19/06, Doug Cutting <[hidden email]> wrote:
> > http://wiki.apache.org/general/SummerOfCode2006
> >
> > I'd be happy to co-mentor this with someone from Solr.
>
> OK, I added solr-mail-archive to the Wiki (pretty much copied your description)
>
> -Yonik
>


--
[hidden email] -- blog: http://feh.holsman.net/ -- PH: ++61-3-9818-0132

If everything seems under control, you're not going fast enough. -
Mario Andretti

Re: summer of code: solr for apache mail archives

Doug Cutting
Ian Holsman wrote:
> If you could get it to interface with mod_mbox (what they currently
> use), that would be a better solution for the ASF infrastructure, I
> think.

I assume that search results would be displayed with mod_mbox, i.e.,
links in the hit list would point to mail-archives.apache.org.  Is that
what you mean?

Also, in my original message I said: "We can set up a notification
mechanism for new messages with Apache infrastructure."  I now note that
mod_mbox provides Atom feeds for each list.  So we can just poll those
to index new messages.  We could generate the current list of feeds by
scraping http://mail-archives.apache.org/mod_mbox/.

Doug
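
The polling idea above could be sketched roughly as follows, in Python. This assumes the feeds are Atom 1.0 (namespace `http://www.w3.org/2005/Atom`) and that each entry has a stable `<id>`; neither is confirmed in this thread. The function name `new_entries` and the in-memory `seen_ids` set are hypothetical. Fetching the feed (for example with `urllib.request.urlopen`) and scraping the list of feeds are left out.

```python
import xml.etree.ElementTree as ET

# Assumption: feeds use the Atom 1.0 namespace.
ATOM = "{http://www.w3.org/2005/Atom}"


def new_entries(feed_xml, seen_ids):
    """Return entries from feed_xml whose id is not yet in seen_ids.

    seen_ids is updated in place, so calling this again with the same
    feed text yields nothing new.
    """
    root = ET.fromstring(feed_xml)
    fresh = []
    for entry in root.iter(ATOM + "entry"):
        entry_id = entry.findtext(ATOM + "id")
        if entry_id is None or entry_id in seen_ids:
            continue
        link = entry.find(ATOM + "link")
        fresh.append({
            "id": entry_id,
            "title": entry.findtext(ATOM + "title", default=""),
            "link": link.get("href") if link is not None else None,
        })
        seen_ids.add(entry_id)
    return fresh
```

A real indexer would need to persist `seen_ids` (or a last-updated timestamp) between polls, and would fetch each linked message before indexing it.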