Re : Re : GSoC Evolutionary Algorithm Idea

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Re : Re : GSoC Evolutionary Algorithm Idea

deneche abdelhakim
> I see this is ASL,

What is ASL ???

> so that seems workable, but I'd be interested in  
> hearing more about how it integrates w/ what we are doing?


A Ga works by evoluting a population of individuals toward a desired goal. Generally, a GA needs thousands of iterations with hundreds of individuals to find a satisfiying solution. The fitness function, that calculates the closeness of a given individual to the desired goal, is called for each individual of the population for each iteration. In our case, and especially for very large dataset the fitness function can be implemented with the Map-Reduce strategy to exploit distributed computing.

The main program, will launch the GA using WatchMaker, each time the GA needs to calculate the fitness of a given indiviudal it calls a specific class given by us, this class should configure a Hadoop Job and launch its execution (Iam not sure about how to it yet, but Iam already working on that part).

My secondary goal is to implement a ready to use generic fitness function for WatchMaker that calls internally Hadoop, those allowing us to use the GA for different problems such as multiclass classification (I will do it if I don't run out of time), classifcation...

The implemented fitness function is a good mesure for binary classification, it takes a set of rules and calculates their quality over large datasets using Hadoop for the parallelism. So it can be used outside the GA.


Abdel Hakim Deneche





      _____________________________________________________________________________
Envoyez avec Yahoo! Mail. Capacité de stockage illimitée pour vos emails. http://mail.yahoo.fr
Reply | Threaded
Open this post in threaded view
|

Re: Re : Re : GSoC Evolutionary Algorithm Idea

Grant Ingersoll-2

On Mar 26, 2008, at 7:38 AM, deneche abdelhakim wrote:

>> I see this is ASL,
>
> What is ASL ???

Sorry, Apache Software License.  It means we can package the library,  
if needed, with our code.  Gotta keep the lawyers happy!

>
>
>> so that seems workable, but I'd be interested in
>> hearing more about how it integrates w/ what we are doing?
>
>
> A Ga works by evoluting a population of individuals toward a desired  
> goal. Generally, a GA needs thousands of iterations with hundreds of  
> individuals to find a satisfiying solution. The fitness function,  
> that calculates the closeness of a given individual to the desired  
> goal, is called for each individual of the population for each  
> iteration. In our case, and especially for very large dataset the  
> fitness function can be implemented with the Map-Reduce strategy to  
> exploit distributed computing.
>
> The main program, will launch the GA using WatchMaker, each time the  
> GA needs to calculate the fitness of a given indiviudal it calls a  
> specific class given by us, this class should configure a Hadoop Job  
> and launch its execution (Iam not sure about how to it yet, but Iam  
> already working on that part).
>
> My secondary goal is to implement a ready to use generic fitness  
> function for WatchMaker that calls internally Hadoop, those allowing  
> us to use the GA for different problems such as multiclass  
> classification (I will do it if I don't run out of time),  
> classifcation...
>
> The implemented fitness function is a good mesure for binary  
> classification, it takes a set of rules and calculates their quality  
> over large datasets using Hadoop for the parallelism. So it can be  
> used outside the GA.
>


Sounds good.  I look forward to the proposal.


-Grant
Reply | Threaded
Open this post in threaded view
|

Re: Re : Re : GSoC Evolutionary Algorithm Idea

Ted Dunning-3
In reply to this post by deneche abdelhakim


GA is a special case of evolutionary algorithms in general.

If we ignore cross-over for the moment, hadoop is ideal for EA in general.
The natural implementation would have each map input record represent a
single member of the population.  The mapper would mutate this member and
evaluate the fitness, outputting all records with a random key from a small
set (depends on how many reducers we want) and the combiner and reducer
would sieve out the top N members for each key value.  Multiple passes of
this would be the way to run the algorithm.  If the key set has cardinality
1, then this reduces to ordinary mutation and selection, if it is larger,
then the selection of the top n becomes a bit approximate, but should not
cause any significant problems.  A second pass could be used to reduce to an
exact selection if needed.  Depending on how the combiner words, using a
single reduce might be very fast.

Crossover requires that pairs items be brought together, roughly at random,
and might require an extra map-reduce.

My own preference with these algorithms is to avoid cross-over and focus on
meta-mutation where some of the state in the records specifies how the
mutation should proceed.  This can have dramatic positive effects in
accelerating convergence by providing somewhat of a trade-off against
convergence guarantees.  Since everybody gives up the convergence guarantees
anyway, this is a nice knob to have.  I describe on approach that can be
very effective in an old paper that can be found here:
http://arxiv.org/abs/0803.3838 .  I can provide a sample implementation in
C, but the paper is probably easier to read.





On 3/26/08 4:38 AM, "deneche abdelhakim" <[hidden email]> wrote:

>> I see this is ASL,
>
> What is ASL ???
>
>> so that seems workable, but I'd be interested in
>> hearing more about how it integrates w/ what we are doing?
>
>
> A Ga works by evoluting a population of individuals toward a desired goal.
> Generally, a GA needs thousands of iterations with hundreds of individuals to
> find a satisfiying solution. The fitness function, that calculates the
> closeness of a given individual to the desired goal, is called for each
> individual of the population for each iteration. In our case, and especially
> for very large dataset the fitness function can be implemented with the
> Map-Reduce strategy to exploit distributed computing.
>
> The main program, will launch the GA using WatchMaker, each time the GA needs
> to calculate the fitness of a given indiviudal it calls a specific class given
> by us, this class should configure a Hadoop Job and launch its execution (Iam
> not sure about how to it yet, but Iam already working on that part).
>
> My secondary goal is to implement a ready to use generic fitness function for
> WatchMaker that calls internally Hadoop, those allowing us to use the GA for
> different problems such as multiclass classification (I will do it if I don't
> run out of time), classifcation...
>
> The implemented fitness function is a good mesure for binary classification,
> it takes a set of rules and calculates their quality over large datasets using
> Hadoop for the parallelism. So it can be used outside the GA.
>
>
> Abdel Hakim Deneche
>
>
>
>
>
>      
> _____________________________________________________________________________
> Envoyez avec Yahoo! Mail. Capacité de stockage illimitée pour vos emails.
> http://mail.yahoo.fr