GSoC

GSoC

David Nemeskey
Hi all,

I have already sent this mail to Simon Willnauer, and he suggested that I post
it here for discussion.

I am David Nemeskey, a PhD student at the Eotvos Lorand University, Budapest,
Hungary. I am doing IR-related research, and we have considered using
Lucene as our search engine. We were quite satisfied with its speed and ease of
use. However, we would like to experiment with different ranking algorithms,
and this is where problems arise. Lucene only supports the vector space model
(VSM), and unfortunately its ranking architecture seems to be tailored
specifically to that model's needs.

I would be very much interested in revamping the ranking component as a GSoC
project. The following modifications should be doable in the allocated time
frame:
- a new ranking class hierarchy, which is generic enough to allow easy
implementation of new weighting schemes (at least bag-of-words ones),
- addition of state-of-the-art ranking methods, such as Okapi BM25, proximity
models and Divergence From Randomness (DFR) models,
- configuration for ranking selection, with the old method as default.
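The three points above could fit together roughly as in the following sketch. It is written in Python only for brevity (Lucene itself is Java), and every name in it (TermStats, RankingModel, ClassicTfIdf, BM25, model_for_name) is hypothetical, not part of Lucene's API. ClassicTfIdf is a simplified TF-IDF in the spirit of Lucene's classic scoring, leaving out boosts, query normalization and the coordination factor; BM25 assumes the commonly used defaults k1 = 1.2 and b = 0.75 and the log(1 + ...) idf variant.

```python
import math
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class TermStats:
    """Statistics for one query term in one document (illustrative field names)."""
    tf: int             # occurrences of the term in the document
    df: int             # number of documents containing the term
    doc_len: int        # document length in tokens (assumed > 0)
    avg_doc_len: float  # average document length in the collection
    num_docs: int       # number of documents in the collection


class RankingModel(ABC):
    """Hypothetical common base class for bag-of-words weighting schemes."""

    @abstractmethod
    def score(self, s: TermStats) -> float:
        """Contribution of a single query term to a document's score."""

    def score_document(self, terms: Sequence[TermStats]) -> float:
        # Bag-of-words assumption: the document score is the sum of term scores.
        return sum(self.score(t) for t in terms)


class ClassicTfIdf(RankingModel):
    """Simplified classic TF-IDF: sqrt(tf) * idf^2 / sqrt(doc_len)."""

    def score(self, s: TermStats) -> float:
        idf = 1.0 + math.log(s.num_docs / (s.df + 1))
        return math.sqrt(s.tf) * idf * idf / math.sqrt(s.doc_len)


class BM25(RankingModel):
    """Okapi BM25: k1 controls tf saturation, b controls length normalization."""

    def __init__(self, k1: float = 1.2, b: float = 0.75) -> None:
        self.k1 = k1
        self.b = b

    def score(self, s: TermStats) -> float:
        # The "1 +" inside the log keeps idf positive even for very common terms.
        idf = math.log(1.0 + (s.num_docs - s.df + 0.5) / (s.df + 0.5))
        norm = self.k1 * (1.0 - self.b + self.b * s.doc_len / s.avg_doc_len)
        return idf * s.tf * (self.k1 + 1.0) / (s.tf + norm)


# Configuration hook: models are selected by name; the classic one is the default.
_MODELS = {"classic": ClassicTfIdf, "bm25": BM25}


def model_for_name(name: Optional[str] = None) -> RankingModel:
    key = (name or "classic").lower()
    if key not in _MODELS:
        raise ValueError(f"unknown ranking model: {name!r}")
    return _MODELS[key]()


if __name__ == "__main__":
    stats = TermStats(tf=3, df=10, doc_len=100, avg_doc_len=120.0, num_docs=1000)
    for name in ("classic", "bm25"):
        print(f"{name}: {model_for_name(name).score(stats):.4f}")
```

In this design a new weighting scheme only has to subclass RankingModel and implement score(); making it selectable means registering it under a name, while an unset name falls back to the old scoring.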

I believe all users of Lucene would profit from such a project. It would
provide the scientific community with an even more useful research aid, while
regular users could benefit from superior ranking results.

Please let me know your opinion about this proposal.

Thank you very much,
David Nemeskey

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: GSoC

Mark Miller-3
+1 the proposal. We already have a committer digging into this area - he would make a perfect GSoC mentor! And would likely love the help.

His response likely to follow...

- Mark

On Jan 28, 2011, at 11:32 AM, David Nemeskey wrote:

- Mark Miller
lucidimagination.com

Re: GSoC

Simon Willnauer
On Fri, Jan 28, 2011 at 5:42 PM, Mark Miller <[hidden email]> wrote:
> +1 the proposal. We already have a committer digging into this area - he would make a perfect GSoC mentor! And would likely love the help.

Same here, +1. If mentoring is needed, I will be there too.
Robert, I already recommended you when David first contacted
me :)

It's all yours :)

simon

Re: GSoC

Robert Muir
In reply to this post by David Nemeskey
On Fri, Jan 28, 2011 at 11:32 AM, David Nemeskey
<[hidden email]> wrote:

Hi David, honestly this sounds fantastic.

It would be great to have someone to work with us on this issue!

So far, progress has been rather slow (minor improvements, cleanups,
additional stats here and there), so we really need all the help we
can get, especially from people who have a really good understanding
of the various models.

In case you are interested, here are some references to discussions
about adding more flexibility (with some prototypes, etc.):
http://www.lucidimagination.com/search/document/72787e0e54f798e4/baby_steps_towards_making_lucene_s_scoring_more_flexible
https://issues.apache.org/jira/browse/LUCENE-2392

Re: GSoC

David Nemeskey
Hi guys,

Mark, Robert, Simon: thanks for the support! I really hope we can work
together this summer (and before that, obviously).

According to http://www.google-melange.com/document/show/gsoc_program/google/gsoc2011/timeline,
there's still some time until the application period. So let me use this week
to finish my PhD research plan, and get back to you next week.

I am not really familiar with how the program works, i.e. how detailed the
application description should be, when mentorship is decided, etc., so I guess
we will have a lot to talk about. :)

(Actually, should we move this discussion private?)

David

Re: GSoC

Simon Willnauer
Hey David,

I saw that you added a tiny line to the GSoC Lucene wiki - thanks for that.

On Wed, Feb 2, 2011 at 10:10 AM, David Nemeskey
<[hidden email]> wrote:
> Hi guys,
>
> Mark, Robert, Simon: thanks for the support! I really hope we can work
> together this summer (and before that, obviously).
Same here!
>
> According to http://www.google-melange.com/document/show/gsoc_program/google/gsoc2011/timeline, there's
> still some time until the application period. So let me use this week to finish
> my PhD research plan, and get back to you next week.
>
> I am not really familiar with how the program works, i.e. how detailed the
> application description should be, when mentorship is decided, etc. so I guess
> we will have a lot to talk about. :)

So, from a 10,000 ft view, it works like this:

1. Write up a short proposal describing your idea.
2. Make it public! Publish an implementation plan describing how you
want to realize your proposal. Don't worry if the actual implementation
doesn't follow it 100%; it's just meant to show us that you know what
you are doing and where you want to go. Something like a one-page (A4)
rough design doc.
3. Give other people the chance to apply for the same suggestion (this
is how it works, though).
4. Let the ASF / us assign one or more possible mentors to it.
5. Let us apply for a slot in GSoC (slots are limited per organization).
6. Get accepted.
7. Rock it!

>
> (Actually, should we move this discussion private?)
No, we usually do everything in public, except for discussions within
the PMC that need to be private for legal or similar reasons. Let's
stick to the mailing list for all communication unless you have
something that should clearly not be public. This also gives other
contributors a chance to help and get interested in your work!!

simon

Re: GSoC

Grant Ingersoll-2
In reply to this post by David Nemeskey

On Feb 2, 2011, at 4:10 AM, David Nemeskey wrote:

> Hi guys,
>
> Mark, Robert, Simon: thanks for the support! I really hope we can work
> together this summer (and before that, obviously).

Sounds like a great idea.  Looking forward to the proposal.

>
> According to http://www.google-melange.com/document/show/gsoc_program/google/gsoc2011/timeline, there's
> still some time until the application period. So let me use this week to finish
> my PhD research plan, and get back to you next week.
>
> I am not really familiar with how the program works, i.e. how detailed the
> application description should be, when mentorship is decided, etc. so I guess
> we will have a lot to talk about. :)

It's pretty competitive, especially since you are competing not only against others for Lucene slots, but also against other ASF projects. I highly recommend that you, as well as interested mentors, look through Mahout's past GSoC projects:
http://www.lucidimagination.com/search/?q=GSOC#/p:mahout
http://www.lucidimagination.com/search/document/2acd6fd380feec3/thoughts_on_gsoc
https://cwiki.apache.org/confluence/display/MAHOUT/GSOC

>
> (Actually, should we move this discussion private?)

No, you shouldn't, and it would be to your detriment come the ranking process, since people won't have a track record of what you've done as it relates to your proposal. The goal of GSoC is to learn how Open Source works. Even though you have a mentor, that person is there to help you navigate the community, not to be a private tutor on technical details. I routinely tell all my students that I will help them with personal issues (vacation, emergencies, etc.), but that all technical stuff must be done on list (JIRA, IRC, dev@, patches, etc.).

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem docs using Solr/Lucene:
http://www.lucidimagination.com/search

Re: GSoC

David Nemeskey
In reply to this post by Simon Willnauer
Hey,

I have written the proposal. Please let me know if you want more / less of
certain parts. Should I upload it somewhere?

Implementation plan soon to follow.

Sorry for the late reply; I have been rather busy these past few weeks.

David

On Wednesday, February 02, 2011 10:35:55 Simon Willnauer wrote:

Attachment: proposal.pdf (115K)

Re: GSoC

Simon Willnauer
I think that is good for now. I should get started on codeawards and
wrap up our proposals. I hope I can do that this week.

simon

On Tue, Feb 22, 2011 at 3:16 PM, David Nemeskey
<[hidden email]> wrote:

Re: GSoC

Fernando Wasylyszyn
"This also give other contributors a chance to help and get interested in your work!!"
I really would love to contribute to this project!

Regards.
Fernando.


From: Simon Willnauer <[hidden email]>
To: [hidden email]
CC: David Nemeskey <[hidden email]>
Sent: Tuesday, February 22, 2011 11:22:57
Subject: Re: GSoC

I think that is good for now. I should get started on codeawards and
wrap up our proposals. I hope I can do that this week.

simon

On Tue, Feb 22, 2011 at 3:16 PM, David Nemeskey
<[hidden email]> wrote:

> Hey,
>
> I have written the proposal. Please let me know if you want more / less of
> certain parts. Should I upload it somewhere?
>
> Implementation plan soon to follow.
>
> Sorry for the late reply; I have been rather busy these past few weeks.
>
> David
>
> On Wednesday, February 02, 2011 10:35:55 Simon Willnauer wrote:
>> Hey David,
>>
>> I saw that you added a tiny line to the GSoC Lucene wiki - thanks for that.
>>
>> On Wed, Feb 2, 2011 at 10:10 AM, David Nemeskey <[hidden email]> wrote:
>> > Hi guys,
>> >
>> > Mark, Robert, Simon: thanks for the support! I really hope we can work
>> > together this summer (and before that, obviously).
>>
>> Same here!
>>
>> > According to
>> > http://www.google-melange.com/document/show/gsoc_program/google/gsoc2011/timeline ,
>> > there's still some time until the application period. So let me use this
>> > week to finish my PhD research plan, and get back to you next week.
>> >
>> > I am not really familiar with how the program works, i.e. how detailed
>> > the application description should be, when mentorship is decided, etc.,
>> > so I guess we will have a lot to talk about. :)
>>
>> So, from a 10,000 ft view, it works like this:
>>
>> 1. Write up a short proposal describing what your idea is about.
>> 2. Make it public! And publish an implementation plan: how you would
>> want to realize your proposal. If you don't follow it 100% in the
>> actual implementation, don't worry. It's just meant to give us an idea
>> that you know what you are doing and where you want to go; something
>> like a one-page (A4) rough design doc.
>> 3. Give other people the chance to apply for the same suggestion (this
>> is how it works, though).
>> 4. Let the ASF / us assign one or more possible mentors to it.
>> 5. Let us apply for a slot in GSoC (slots are limited per organization).
>> 6. Get accepted.
>> 7. Rock it!
>>
>> > (Actually, should we move this discussion private?)
>>
>> No, we usually do everything in public, except for discussions within
>> the PMC that are meant to be private for legal or similar reasons.
>> Let's stick to the mailing list for all communication unless you have
>> something that should clearly not be public. This also gives other
>> contributors a chance to help and get interested in your work!!
>>
>> simon
>>
>> > David
>> >
>> >> Hi David, honestly this sounds fantastic.
>> >>
>> >> It would be great to have someone to work with us on this issue!
>> >>
>> >> To date, progress is pretty slow-going (minor improvements, cleanups,
>> >> additional stats here and there)... but we really need all the help we
>> >> can get, especially from people who have a really good understanding
>> >> of the various models.
>> >>
>> >> In case you are interested, here are some references to discussions
>> >> about adding more flexibility (with some prototypes etc):
>> >> http://www.lucidimagination.com/search/document/72787e0e54f798e4/baby_steps_towards_making_lucene_s_scoring_more_flexible
>> >> https://issues.apache.org/jira/browse/LUCENE-2392
>> >>
>> >> On Fri, Jan 28, 2011 at 11:32 AM, David Nemeskey <[hidden email]> wrote:
>> >> > Hi all,
>> >> >
>> >> > I have already sent this mail to Simon Willnauer, and he suggested
>> >> > that I post it here for discussion.
>> >> >
>> >> > I am David Nemeskey, a PhD student at Eotvos Lorand University,
>> >> > Budapest, Hungary. I am doing IR-related research, and we have
>> >> > considered using Lucene as our search engine. We were quite satisfied
>> >> > with its speed and ease of use. However, we would like to experiment
>> >> > with different ranking algorithms, and this is where problems arise.
>> >> > Lucene only supports the vector space model (VSM), and unfortunately
>> >> > the ranking architecture seems to be tailored specifically to its needs.
>> >> >
>> >> > I would be very much interested in revamping the ranking component as
>> >> > a GSoC project. The following modifications should be doable in the
>> >> > allocated time frame:
>> >> > - a new ranking class hierarchy, which is generic enough to allow easy
>> >> > implementation of new weighting schemes (at least bag-of-words ones),
>> >> > - addition of state-of-the-art ranking methods, such as Okapi BM25,
>> >> > proximity and DFR models,
>> >> > - configuration for ranking selection, with the old method as default.
>> >> >
>> >> > I believe all users of Lucene would profit from such a project. It
>> >> > would provide the scientific community with an even more useful
>> >> > research aid, while regular users could benefit from superior ranking
>> >> > results.
>> >> >
>> >> > Please let me know your opinion about this proposal.

Re: GSoC

David Nemeskey
In reply to this post by Simon Willnauer
Please find the implementation plan attached. The word "soon" gets a new
meaning when power outages are taken into account. :)

As before, comments are welcome.

David

On Tuesday, February 22, 2011 15:22:57 Simon Willnauer wrote:

> I think that is good for now. I should get started on codeawards and
> wrap up our proposals. I hope I can do that this week.
>
> simon

Attachment: implementation_plan.pdf (66K)

Re: GSoC

Simon Willnauer
Hey David and all others who want to contribute to GSoC,

The ASF has applied for GSoC 2011 as a mentoring organization. As an
ASF project, we don't need to apply directly, but we do need to
register our ideas now. Like almost anything in the ASF, this works
through JIRA: all ideas should be recorded as JIRA tickets labeled
with "gsoc2011". Once this is done, they will show up here:
http://s.apache.org/gsoc2011tasks

Everybody who is interested in GSoC as a mentor or student should now
also read this: http://community.apache.org/gsoc.html

Thanks,

Simon

On Thu, Feb 24, 2011 at 12:14 PM, David Nemeskey
<[hidden email]> wrote:

> Please find the implementation plan attached. The word "soon" gets a new
> meaning when power outages are taken into account. :)
>
> As before, comments are welcome.
>
> David

Re: GSoC

Grant Ingersoll-2
I think we, the Lucene committers, need to identify who is willing to mentor. In my experience, it takes less than 5 hours a week. Most of the work is done as part of the community. Sometimes you have to be tough and fail someone (I did last year), but most of the time, if you take the time to interview the candidates up front, it is a good experience for everyone.

I'd add that it would be useful to have everyone put the lucene-gsoc-11 label on their issues too; that way we can quickly find the Lucene ones.

Also, feel free to label existing bugs.


On Mar 9, 2011, at 2:11 AM, Simon Willnauer wrote:

> Hey David and all others who want to contribute to GSoC,
>
> The ASF has applied for GSoC 2011 as a mentoring organization. As an
> ASF project, we don't need to apply directly, but we do need to
> register our ideas now. Like almost anything in the ASF, this works
> through JIRA: all ideas should be recorded as JIRA tickets labeled
> with "gsoc2011". Once this is done, they will show up here:
> http://s.apache.org/gsoc2011tasks
>
> Everybody who is interested in GSoC as a mentor or student should now
> also read this: http://community.apache.org/gsoc.html
>
>
> Thanks,
>
> Simon

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem docs using Solr/Lucene:
http://www.lucidimagination.com/search



Re: GSoC

Simon Willnauer
On Wed, Mar 9, 2011 at 5:48 PM, Grant Ingersoll <[hidden email]> wrote:
> I think we, the Lucene committers, need to identify who is willing to mentor. In my experience, it takes less than 5 hours a week. Most of the work is done as part of the community. Sometimes you have to be tough and fail someone (I did last year), but most of the time, if you take the time to interview the candidates up front, it is a good experience for everyone.

count me in

>
> I'd add that it would be useful to have everyone put the lucene-gsoc-11 label on their issues too; that way we can quickly find the Lucene ones.

done on at least one ;)

simon

>
> Also, feel free to label existing bugs.
>
>
> On Mar 9, 2011, at 2:11 AM, Simon Willnauer wrote:
>
>> Hey David and all others who want to contribute to GSoC,
>>
>> The ASF has applied for GSoC 2011 as a mentoring organization. As an
>> ASF project, we don't need to apply directly, but we do need to
>> register our ideas now. Like almost anything in the ASF, this works
>> through JIRA: all ideas should be recorded as JIRA tickets labeled
>> with "gsoc2011". Once this is done, they will show up here:
>> http://s.apache.org/gsoc2011tasks
>>
>> Everybody who is interested in GSoC as a mentor or student should now
>> also read this: http://community.apache.org/gsoc.html
>>
>>
>> Thanks,
>>
>> Simon
>>
>>
>>
>>
>> On Thu, Feb 24, 2011 at 12:14 PM, David Nemeskey
>> <[hidden email]> wrote:
>>> Please find the implementation plan attached. The word "soon" gets a new
>>> meaning when power outages are taken into account. :)
>>>
>>> As before, comments are welcome.
>>>
>>> David
>>>
>>> On Tuesday, February 22, 2011 15:22:57 Simon Willnauer wrote:
>>>> I think that is good for now. I should get started on codeawards and
>>>> wrap up our proposals. I hope I can do that this week.
>>>>
>>>> simon
>>>>
>>>> On Tue, Feb 22, 2011 at 3:16 PM, David Nemeskey
>>>>
>>>> <[hidden email]> wrote:
>>>>> Hey,
>>>>>
>>>>> I have written the proposal. Please let me know if you want more / less
>>>>> of certain parts. Should I upload it somewhere?
>>>>>
>>>>> Implementation plan soon to follow.
>>>>>
>>>>> Sorry for the late reply; I have been rather busy these past few weeks.
>>>>>
>>>>> David
>>>>>
>>>>> On Wednesday, February 02, 2011 10:35:55 Simon Willnauer wrote:
>>>>>> Hey David,
>>>>>>
>>>>>> I saw that you added a tiny line to the GSoC Lucene wiki - thanks for
>>>>>> that.
>>>>>>
>>>>>> On Wed, Feb 2, 2011 at 10:10 AM, David Nemeskey
>>>>>>
>>>>>> <[hidden email]> wrote:
>>>>>>> Hi guys,
>>>>>>>
>>>>>>> Mark, Robert, Simon: thanks for the support! I really hope we can work
>>>>>>> together this summer (and before that, obviously).
>>>>>>
>>>>>> Same here!
>>>>>>
>>>>>>> According to http://www.google-
>>>>>>> melange.com/document/show/gsoc_program/google/gsoc2011/timeline ,
>>>>>>> there's still some time until the application period. So let me use
>>>>>>> this week to finish my PhD research plan, and get back to you next
>>>>>>> week.
>>>>>>>
>>>>>>> I am not really familiar with how the program works, i.e. how detailed
>>>>>>> the application description should be, when mentorship is decided,
>>>>>>> etc. so I guess we will have a lot to talk about. :)
>>>>>>
>>>>>> so from a 10000ft view it work like this:
>>>>>>
>>>>>> 1. Write up a short proposal what your idea is about
>>>>>> 2. make it public! and publish a implementation plan - how you would
>>>>>> want to realize your proposal. If you don't follow that 100% in the
>>>>>> actual impl. don't worry. Its just mean to give us an idea that you
>>>>>> know what you are doing and where you want to go. something like a 1
>>>>>> A4 rough design doc.
>>>>>> 3. give other people the change to apply for the same suggestion (this
>>>>>> is how it works though)
>>>>>> 4 Let the ASF / us assign one or more possible mentors to it
>>>>>> 5. let us apply for a slot in GSoC (those are limited for organizations)
>>>>>> 6. get accepted
>>>>>> 7. rock it!
>>>>>>
>>>>>>> (Actually, should we move this discussion private?)
>>>>>>
>>>>>> no - we usually do everything in public except of discussion within
>>>>>> the PMC that are meant to be private for legal reasons or similar
>>>>>> things. Lets stick to the mailing list for all communication except
>>>>>> you have something that should clearly not be public. This also give
>>>>>> other contributors a chance to help and get interested in your work!!
>>>>>>
>>>>>> simon
>>>>>>
>>>>>>> David
>>>>>>>
>>>>>>>> Hi David, honestly this sounds fantastic.
>>>>>>>>
>>>>>>>> It would be great to have someone to work with us on this issue!
>>>>>>>>
>>>>>>>> To date, progress is pretty slow-going (minor improvements, cleanups,
>>>>>>>> additional stats here and there)... but we really need all the help
>>>>>>>> we can get, especially from people who have a really good
>>>>>>>> understanding of the various models.
>>>>>>>>
>>>>>>>> In case you are interested, here are some references to discussions
>>>>>>>> about adding more flexibility (with some prototypes etc):
>>>>>>>> http://www.lucidimagination.com/search/document/72787e0e54f798e4/baby
>>>>>>>> _st eps _towards_making_lucene_s_scoring_more_flexible
>>>>>>>> https://issues.apache.org/jira/browse/LUCENE-2392
>>>>>>>>
>>>>>>>> On Fri, Jan 28, 2011 at 11:32 AM, David Nemeskey
>>>>>>>>
>>>>>>>> <[hidden email]> wrote:
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> I have already sent this mail to Simon Willnauer, and he suggested
>>>>>>>>> me to post it here for discussion.
>>>>>>>>>
>>>>>>>>> I am David Nemeskey, a PhD student at the Eotvos Lorand University,
>>>>>>>>> Budapest, Hungary. I am doing an IR-related research, and we have
>>>>>>>>> considered using Lucene as our search engine. We were quite
>>>>>>>>> satisfied with the speed and ease of use. However, we would like
>>>>>>>>> to experiment with different ranking algorithms, and this is where
>>>>>>>>> problems arise. Lucene only supports the VSM, and unfortunately
>>>>>>>>> the ranking architecture seems to be tailored specifically to its
>>>>>>>>> needs.
>>>>>>>>>
>>>>>>>>> I would be very much interested in revamping the ranking component
>>>>>>>>> as a GSoC project. The following modifications should be doable in
>>>>>>>>> the allocated time frame:
>>>>>>>>> - a new ranking class hierarchy, which is generic enough to allow
>>>>>>>>> easy implementation of new weighting schemes (at least
>>>>>>>>> bag-of-words ones),
>>>>>>>>> - addition of state-of-the-art ranking methods, such as Okapi BM25,
>>>>>>>>> proximity and DFR models,
>>>>>>>>> - configuration for ranking selection, with the old method as
>>>>>>>>> default.
>>>>>>>>>
>>>>>>>>> I believe all users of Lucene would profit from such a project. It
>>>>>>>>> would provide the scientific community with an even more useful
>>>>>>>>> research aid, while regular users could benefit from superior
>>>>>>>>> ranking results.
>>>>>>>>>
>>>>>>>>> Please let me know your opinion about this proposal.
>>>>>>>
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem docs using Solr/Lucene:
> http://www.lucidimagination.com/search
>

Re: GSoC

David Nemeskey
OK, I have created a new issue, LUCENE-2959, for this project. I have uploaded
the PDFs and added the gsoc2011 and lucene-gsoc-2011 labels as well.

David

On 2011 March 09, Wednesday 21:58:53 Simon Willnauer wrote:

> On Wed, Mar 9, 2011 at 5:48 PM, Grant Ingersoll <[hidden email]> wrote:
> > I think we, Lucene committers, need to identify who is willing to mentor.
> > In my experience, it is less than 5 hours a week.  Most of the work
> > is done as part of the community.  Sometimes you have to be tough and
> > fail someone (I did last year) but most of the time, if you take the
> > time to interview the candidates up front, it is a good experience for
> > everyone.
>
> count me in
>
> > I'd add it would be useful to have everyone put the lucene-gsoc-11 label
> > on their issues too, that way we can quickly find the Lucene ones.
>
> done on at least one ;)
>
> simon
>
> > Also, feel free to label existing bugs.
> >
> > On Mar 9, 2011, at 2:11 AM, Simon Willnauer wrote:
> >> Hey David and all others who want to contribute to GSoC,
> >>
> >> the ASF has applied for GSoC 2011 as a mentoring organization. As an
> >> ASF project we don't need to apply directly, but we do need to
> >> register our ideas now. This works like almost anything in the ASF
> >> through JIRA. All ideas should be recorded as JIRA tickets labeled
> >> with "gsoc2011". Once this is done it will show up here:
> >> http://s.apache.org/gsoc2011tasks
> >>
> >> Everybody who is interested in GSoC as a mentor or student should now
> >> read this too http://community.apache.org/gsoc.html
> >>
> >>
> >> Thanks,
> >>
> >> Simon
> >
> > --------------------------
> > Grant Ingersoll
> > http://www.lucidimagination.com/
> >
> > Search the Lucene ecosystem docs using Solr/Lucene:
> > http://www.lucidimagination.com/search
> >

Re: GSoC

Simon Willnauer
awesome thanks!

simon

On Thu, Mar 10, 2011 at 11:54 AM, David Nemeskey
<[hidden email]> wrote:

> Ok, I have created a new issue, LUCENE-2959 for this project. I have uploaded
> the pdfs and added the gsoc2011 and lucene-gsoc-2011 labels as well.
>
> David

Re: GSoC

Michael McCandless-2
In reply to this post by Simon Willnauer
On Wed, Mar 9, 2011 at 3:58 PM, Simon Willnauer
<[hidden email]> wrote:
> On Wed, Mar 9, 2011 at 5:48 PM, Grant Ingersoll <[hidden email]> wrote:
>> I think we, Lucene committers, need to identify who is willing to mentor.    In my experience, it is less than 5 hours a week.  Most of the work is done as part of the community.  Sometimes you have to be tough and fail someone (I did last year) but most of the time, if you take the time to interview the candidates up front, it is a good experience for everyone.
>
> count me in

I'll also be a GSoC mentor!

--
Mike

http://blog.mikemccandless.com
