[VOTE] Release Apache Nutch 0.9


Re: [VOTE] Release Apache Nutch 0.9

吕召刚

Hello,

Sounds good. One question: we have already tagged release-0.9, and that
release has been available on some Nutch mirror sites, where people have
downloaded it. So there would be two different nutch-0.9 releases in the
world. How could people tell them apart?

Thanks
David

Chris Mattmann wrote:

> Folks,
>
> After discussing this with Andrzej and reading his email below, I tend to
> agree more with the procedure he describes. I would like to call for a vote
> to change the existing documented procedure (on the wiki) to: branch first,
> do testing in the branch (applying patches where needed), and then, when the
> branch is blessed (e.g., 3 binding votes from committers in favor of it),
> tag it and make a release. Sound good?
>
>  In terms of next steps with what we have now, that boils down to:
>
> 1. delete tags/release-0.9
> 2. apply patch to trunk
> 3. create branches/branch-0.9
> 4. have Dennis test again (large scale)
> 5. if all goes well, finish release process
> 6. tag tags/release-0.9
>
> Thoughts?
>
> Thanks!
>
> Cheers,
>   Chris
>
> On 3/28/07 10:35 AM, "Andrzej Bialecki" <[hidden email]> wrote:
> > Dennis Kubes wrote:
> >> Yes.  This seems to have fixed the problem.  All, do we want to create a
> >> JIRA and commit this for the 0.9 release?
> >
> > It should definitely go into the release, and we need a patch for the
> > trunk/ .
> >
> > Actually, I'm somewhat surprised that we have tags/release-0.9 but we
> > don't yet have branches/branch-0.9 ...
> >
> > I think I'm confused, or the release procedure is confused. My
> > understanding so far was that we first create a branch-0.9, we test the
> > build from that branch and if it passes all tests and the wait period is
> > over, then we copy it to tags/release-0.9 and proclaim a release - which
> > is really a read-only branch, i.e. we don't ever commit any patches to
> > it ... If that were the case, then we still wouldn't have the
> > release-0.9 tag, we could have applied the patch in branch-0.9, plus
> > possibly other patches, and then finally tag this tree as
> > tags/release-0.9.
> >
> > As it is now we are in an awkward situation that we have to patch
> > tags/release-0.9 ..
> >
> > One solution would be now to delete this tag, apply the patch to trunk,
> > create branches/branch-0.9, and continue applying any other patches that
> > may come up during this testing period - and when we are finally happy
> > with the codebase then take a snapshot into tags/release-0.9, and keep
> > it read-only.
> >
> > Another solution is to bend the rules and apply the patch to trunk/ and
> > then merge from the trunk to tags/release-0.9 .
> >
> > What do you think?

--

Re: [VOTE] Release Apache Nutch 0.9

Sami Siren-2
In reply to this post by Andrzej Białecki-2
2007/3/28, Andrzej Bialecki <[hidden email]>:
>
> Dennis Kubes wrote:
> > Yes.  This seems to have fixed the problem.  All, do we want to create a
> > JIRA and commit this for the 0.9 release?
>
> It should definitely go into the release, and we need a patch for the
> trunk/ .


+1

> Actually, I'm somewhat surprised that we have tags/release-0.9 but we
> don't yet have branches/branch-0.9 ...


IMO there's no need for a branch before a release.

> I think I'm confused, or the release procedure is confused. My
> understanding so far was that we first create a branch-0.9, we test the
> build from that branch and if it passes all tests and the wait period is
> over, then we copy it to tags/release-0.9 and proclaim a release - which
> is really a read-only branch, i.e. we don't ever commit any patches to
> it ... If that were the case, then we still wouldn't have the
> release-0.9 tag, we could have applied the patch in branch-0.9, plus
> possibly other patches, and then finally tag this tree as
> tags/release-0.9.


IMO we should have had a 0.9-rc1 tag, applied the patch to trunk, created a
0.9-rc2 tag, and so on until we were satisfied.

Then, once we are actually satisfied, create the tag for 0.9 (copied from the
rc that got promoted).

What is the benefit of using a branch before a release?

--
 Sami Siren

Re: [VOTE] Release Apache Nutch 0.9

Andrzej Białecki-2
Sami Siren wrote:

> IMO we should have had a 0.9-rc1 tag, apply patch to trunk, have
> 0.9-rc2 tag and so on until we are satisfied.
>
> Then when we're actually satisfied create tag for 0.9 (copy from rc
> that got promoted).
>
> What is the benefit of using a branch before a release?

That you don't withhold other development while waiting for rc1,
rc2, rcN, ... - other patches, including disruptive ones and those that
introduce new features, can be applied in the meantime to trunk/ .

As for bugfixes, they can be merged up or down between the branch
and trunk as needed.


--
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


Re: [VOTE] Release Apache Nutch 0.9

Sami Siren-2
2007/3/29, Andrzej Bialecki <[hidden email]>:

>
> Sami Siren wrote:
>
> > IMO we should have had a 0.9-rc1 tag, apply patch to trunk, have
> > 0.9-rc2 tag and so on until we are satisfied.
> >
> > Then when we're actually satisfied create tag for 0.9 (copy from rc
> > that got promoted).
> >
> > What is the benefit of using a branch before a release?
>
> That you don't withhold other development while waiting for rc1,
> rc2, rcN, ... - other patches, including disruptive ones and those that
> introduce new features, can be applied in the meantime to trunk/ .


Nutch trunk had "kind of" been in a state where no modifications were
allowed for quite a long time before Chris started his work. If the
branch is justified by those arguments, it should have been created a
long time ago.

--
 Sami Siren

Re: [VOTE] Release Apache Nutch 0.9

Andrzej Białecki-2
Sami Siren wrote:

> 2007/3/29, Andrzej Bialecki <[hidden email]>:
>>
>> Sami Siren wrote:
>>
>> > IMO we should have had a 0.9-rc1 tag, apply patch to trunk, have
>> > 0.9-rc2 tag and so on until we are satisfied.
>> >
>> > Then when we're actually satisfied create tag for 0.9 (copy from rc
>> > that got promoted).
>> >
>> > What is the benefit of using a branch before a release?
>>
>> That you don't withhold other development while waiting for rc1,
>> rc2, rcN, ... - other patches, including disruptive ones and those that
>> introduce new features, can be applied in the meantime to trunk/ .
>
>
> Nutch trunk has "kind of" been in a state where no modifications are
> allowed for quite a long time already before Chris started his work. If the
> branch is done based on those arguments it should have been done a
> long time ago.

I'm not arguing about the past - the project had so few active people at
that time that it didn't matter. However, now that the project has become
more active again, and there are people willing and able to work on new
functionality, we shouldn't hinder their work. We are discussing what is
the best course of action for future releases, in order to avoid
such long freeze periods on the trunk/ .


--
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


Re: [VOTE] Release Apache Nutch 0.9

Sami Siren-2
2007/3/29, Andrzej Bialecki <[hidden email]>:

>
> Sami Siren wrote:
> > 2007/3/29, Andrzej Bialecki <[hidden email]>:
> >>
> >> Sami Siren wrote:
> >>
> >> > IMO we should have had a 0.9-rc1 tag, apply patch to trunk, have
> >> > 0.9-rc2 tag and so on until we are satisfied.
> >> >
> >> > Then when we're actually satisfied create tag for 0.9 (copy from rc
> >> > that got promoted).
> >> >
> >> > What is the benefit of using a branch before a release?
> >>
> >> That you don't withhold other development while waiting for rc1,
> >> rc2, rcN, ... - other patches, including disruptive ones and those that
> >> introduce new features, can be applied in the meantime to trunk/ .
> >
> >
> > Nutch trunk has "kind of" been in a state where no modifications are
> > allowed for quite a long time already before Chris started his work. If
> > the branch is done based on those arguments it should have been done a
> > long time ago.
>
> I'm not arguing about the past - the project had so few active people at
> that time that it didn't matter. However, now that the project became
> more active again, and there are people willing and able to work on new
> functionality, we shouldn't hinder their work. We are discussing what is
> the best course of the action for future releases, in order to avoid
> such long freeze periods on the trunk/ .


Now you are not getting me. I am not arguing about the past either.
I am just saying that the proposed process does not deliver any of the
benefits you described if the trunk is frozen long before branching.
The branch should have been created when we more or less fixed the
feature set for 0.9, not when the first rc is cut.

--
 Sami Siren



Re: [VOTE] Release Apache Nutch 0.9

Andrzej Białecki-2
Sami Siren wrote:

> 2007/3/29, Andrzej Bialecki <[hidden email]>:
>>
>> Sami Siren wrote:
>> > 2007/3/29, Andrzej Bialecki <[hidden email]>:
>> >>
>> >> Sami Siren wrote:
>> >>
>> >> > IMO we should have had a 0.9-rc1 tag, apply patch to trunk, have
>> >> > 0.9-rc2 tag and so on until we are satisfied.
>> >> >
>> >> > Then when we're actually satisfied create tag for 0.9 (copy from rc
>> >> > that got promoted).
>> >> >
>> >> > What is the benefit of using a branch before a release?
>> >>
>> >> That you don't withhold other development while waiting for rc1,
>> >> rc2, rcN, ... - other patches, including disruptive ones and those
>> >> that introduce new features, can be applied in the meantime to trunk/ .
>> >
>> >
>> > Nutch trunk has "kind of" been in a state where no modifications are
>> > allowed for quite a long time already before Chris started his work. If
>> > the branch is done based on those arguments it should have been done a
>> > long time ago.
>>
>> I'm not arguing about the past - the project had so few active people at
>> that time that it didn't matter. However, now that the project became
>> more active again, and there are people willing and able to work on new
>> functionality, we shouldn't hinder their work. We are discussing what is
>> the best course of the action for future releases, in order to avoid
>> such long freeze periods on the trunk/ .
>
>
> Now you are not getting me. I am not arguing about the past either.

Ok :)

> I am just saying that the proposed process does not give any
> benefit in regard of those arguments you gave if the trunk is frozen long
> before branching.

Ah, but the whole point is that we should be able _not_ to freeze the
trunk for so long in future releases, and the procedure I proposed
allows us to do this - whereas if we stick to the current procedure we
will have to continue with the long freezes - which, I think we both
agree, are not beneficial.


> The branch should have been done when we kind of fixed
> the features for 0.9, not when the first rc is cut.

That's the past for me. We're discussing how we should modify the
procedure to get the best results in the future, given the team and the
momentum we have now.

--
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


Re: [VOTE] Release Apache Nutch 0.9

Sami Siren-2
> > The branch should have been done when we kind fixed
> > the features for 0.9. not when the first rc is cut.
>
> That's the past for me. We're discussing how we should modify the
> procedure to get the best results in the future, given the team and the
> momentum we have now.


I think we're discussing the same thing (improving the process); I
just don't think 0.9 is out yet :)


But to wrap it up for me:

+1 for creating a 0.9 branch after fixing the bug (and removing the tag),
creating a new rc, and starting a vote.


I still propose that we discuss a bit more (in a separate thread) before
rewriting the "how to release" page in the wiki.

--
 Sami Siren

Re: [VOTE] Release Apache Nutch 0.9

Andrzej Białecki-2
Sami Siren wrote:

>> > The branch should have been done when we kind fixed
>> > the features for 0.9. not when the first rc is cut.
>>
>> That's the past for me. We're discussing how we should modify the
>> procedure to get the best results in the future, given the team and the
>> momentum we have now.
>
>
> I think we're discussing about the same thing(improving the process), I
> just don't think 0.9 is out yet :)
>
>
> But to wrap it up for me:
>
> +1 for creating 0.9 branch after fixing the bug (and removing the tag),
> creating new rc
> and starting a vote.


+1.


> I still propose that we discuss a bit more (in a separate thread) before
> rewriting the how to release
> page in wiki.

I agree - the current release process didn't fare too well in this
particular situation ...


--
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


Re: [VOTE] Release Apache Nutch 0.9

chrismattmann
Hi Guys,

>> I think we're discussing about the same thing(improving the process), I
>> just don't think 0.9 is out yet :)
>>
>>
>> But to wrap it up for me:
>>
>> +1 for creating 0.9 branch after fixing the bug (and removing the tag),
>> creating new rc
>> and starting a vote.
>
>
> +1.

+1.

So, that's 3 binding votes to change the process. It looks like we have
enough to get started. I will begin work tonight (my time, Los Angeles, PST)
on removing the tag, and starting the process over again.

In the meantime, Dennis, do you have the patch that fixes the issue with
Hadoop? If so, could you commit it to the trunk ASAP? Once that's done,
I'll remove the tag, start the release process over again, and get an RC
out for a vote. Then we can move forward from there.

Thanks, guys!

Cheers,
  Chris

>
>
>> I still propose that we discuss a bit more (in a separate thread) before
>> rewriting the how to release
>> page in wiki.
>
> I agree - the current release process didn't fare too well in this
> particular situation ...
>

______________________________________________
Chris A. Mattmann
[hidden email]
Staff Member
Modeling and Data Management Systems Section (387)
Data Management Systems and Technologies Group

_________________________________________________
Jet Propulsion Laboratory            Pasadena, CA
Office: 171-266B                        Mailstop:  171-246
_______________________________________________________

Disclaimer:  The opinions presented within are my own and do not reflect
those of either NASA, JPL, or the California Institute of Technology.



Re: [VOTE] Release Apache Nutch 0.9

Dennis Kubes


Chris Mattmann wrote:

> Hi Guys,
>
>>> I think we're discussing about the same thing(improving the process), I
>>> just don't think 0.9 is out yet :)
>>>
>>>
>>> But to wrap it up for me:
>>>
>>> +1 for creating 0.9 branch after fixing the bug (and removing the tag),
>>> creating new rc
>>> and starting a vote.
>>
>> +1.
>
> +1.
>
> So, that's 3 binding votes to change the process. It looks like we have
> enough to get started. I will begin work tonight (my time, Los Angeles, PST)
> on removing the tag, and starting the process over again.
>
> In the meanwhile, Dennis, do you have the patch that fixes the issue with
> Hadoop? If so, ,could you commit it ASAP to the trunk. Once that's done,
> I'll remove the tag, and star the release process over again, and get an RC
> out for a vote. Then, we can move forward from there.

I will do this immediately.

Dennis Kubes

>
> Thanks, guys!
>
> Cheers,
>   Chris
>
>>
>>> I still propose that we discuss a bit more (in a separate thread) before
>>> rewriting the how to release
>>> page in wiki.
>> I agree - the current release process didn't fare too well in this
>> particular situation ...
>>
>
> ______________________________________________
> Chris A. Mattmann
> [hidden email]
> Staff Member
> Modeling and Data Management Systems Section (387)
> Data Management Systems and Technologies Group
>
> _________________________________________________
> Jet Propulsion Laboratory            Pasadena, CA
> Office: 171-266B                        Mailstop:  171-246
> _______________________________________________________
>
> Disclaimer:  The opinions presented within are my own and do not reflect
> those of either NASA, JPL, or the California Institute of Technology.
>
>

[VOTE] Release Apache Nutch 0.9

chrismattmann
In reply to this post by chrismattmann
Hi Folks,
 
I have posted a candidate for the Apache Nutch 0.9 release at
 
 http://people.apache.org/~mattmann/nutch_0.9/rc2/
 
See the included CHANGES-0.9.txt file for details on release
contents and latest changes. The release was made from the 0.9-dev trunk,
including the recent patch applied by Dennis. I've also created a branch for
this release candidate at:
http://svn.apache.org/repos/asf/lucene/nutch/branches/branch-0.9.
 
Please vote on releasing these packages as Apache Nutch 0.9.
The vote is open for the next 72 hours. Only votes from Nutch
committers are binding, but everyone is welcome to check the release
candidate and voice their approval or disapproval. The vote  passes if
at least three binding +1 votes are cast.
 
[ ] +1 Release the packages as Apache Nutch 0.9
[ ] -1 Do not release the packages because...
 
Thanks!
 
Cheers,

 Chris
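
The passing rule stated above can be written down as a small check. This is only a sketch of the rule as worded in this email (at least three binding +1 votes); the Vote class and its fields are hypothetical, and any additional rules a project may apply to release votes are not modeled.

```java
import java.util.List;

// Hypothetical model of the vote rule from this email; not an official tool.
class VoteTallySketch {

    static final class Vote {
        final String voter;
        final int value;       // +1 or -1
        final boolean binding; // true if cast by a committer

        Vote(String voter, int value, boolean binding) {
            this.voter = voter;
            this.value = value;
            this.binding = binding;
        }
    }

    // Counts +1 votes that are binding; non-binding votes are ignored.
    static int bindingPlusOnes(List<Vote> votes) {
        int count = 0;
        for (Vote v : votes) {
            if (v.binding && v.value == 1) {
                count++;
            }
        }
        return count;
    }

    // The rule as stated: passes with at least three binding +1 votes.
    static boolean passes(List<Vote> votes) {
        return bindingPlusOnes(votes) >= 3;
    }
}
```

Non-binding votes are still worth casting, since they tell the committers whether the candidate works for other people, but they do not change the result of passes().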






Re: [VOTE] Release Apache Nutch 0.9

chrismattmann
Folks,

 As an FYI, here is a link to the log of the steps that I followed to get to
this point in the release:

http://people.apache.org/~mattmann/NUTCH_0.9_release_log_v2.doc

Cheers,
  Chris



On 4/2/07 10:52 PM, "Chris Mattmann" <[hidden email]> wrote:

> Hi Folks,
>  
> I have posted a candidate for the Apache Nutch 0.9 release at
>  
>  http://people.apache.org/~mattmann/nutch_0.9/rc2/
>  
> See the included CHANGES-0.9.txt file for details on release
> contents and latest changes. The release was made from the 0.9-dev trunk,
> including the recent patch applied by Dennis. I've also created a branch for
> this release candidate at:
> http://svn.apache.org/repos/asf/lucene/nutch/branches/branch-0.9.
>  
> Please vote on releasing these packages as Apache Nutch 0.9.
> The vote is open for the next 72 hours. Only votes from Nutch
> committers are binding, but everyone is welcome to check the release
> candidate and voice their approval or disapproval. The vote  passes if
> at least three binding +1 votes are cast.
>  
> [ ] +1 Release the packages as Apache Nutch 0.9
> [ ] -1 Do not release the packages because...
>  
> Thanks!
>  
> Cheers,
>
>  Chris
>
>
>
>
>



Re: [VOTE] Release Apache Nutch 0.9

Dennis Kubes
Ok, I ran some bigger test crawls (> 150K) with the 0.9 RC.  Everything
worked fine (inject, generate, fetch, updatedb, readdb, linkdb,
mergesegs, mergedb, merge, index) except delete duplicates, where I am
getting the error below when running against segment indexes on the DFS.

Because of the way I am automating some of my crawls (sorting names
alphabetically and only running part of the list), only one segment
part-xxxxx had results and the others had 0 results.  I don't know if that
would cause this, and I don't think this bug is critical for the 0.9 release,
but I wanted to bring it up.

My guess would be that this is a small bug within the lucene libraries
when the directories have 0 results.  What is everyone's opinion on this
in terms of the release?  My vote would be to move forward with the release.

Dennis Kubes

Task Id : task_0027_m_000003_3, Status : FAILED
task_0027_m_000003_3: Error running child
task_0027_m_000003_3: java.lang.ArrayIndexOutOfBoundsException: -1
task_0027_m_000003_3:   at org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:113)
task_0027_m_000003_3:   at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:176)
task_0027_m_000003_3:   at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157)
task_0027_m_000003_3:   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
task_0027_m_000003_3:   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
task_0027_m_000003_3:   at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445)
DeleteDuplicates: java.io.IOException: Job failed!

Chris Mattmann wrote:

> Folks,
>
>  As an FYI, here is a link to the log of the steps that I followed to get to
> this point in the release:
>
> http://people.apache.org/~mattmann/NUTCH_0.9_release_log_v2.doc
>
> Cheers,
>   Chris
>
>
>
> On 4/2/07 10:52 PM, "Chris Mattmann" <[hidden email]> wrote:
>
>> Hi Folks,
>>  
>> I have posted a candidate for the Apache Nutch 0.9 release at
>>  
>>  http://people.apache.org/~mattmann/nutch_0.9/rc2/
>>  
>> See the included CHANGES-0.9.txt file for details on release
>> contents and latest changes. The release was made from the 0.9-dev trunk,
>> including the recent patch applied by Dennis. I've also created a branch for
>> this release candidate at:
>> http://svn.apache.org/repos/asf/lucene/nutch/branches/branch-0.9.
>>  
>> Please vote on releasing these packages as Apache Nutch 0.9.
>> The vote is open for the next 72 hours. Only votes from Nutch
>> committers are binding, but everyone is welcome to check the release
>> candidate and voice their approval or disapproval. The vote  passes if
>> at least three binding +1 votes are cast.
>>  
>> [ ] +1 Release the packages as Apache Nutch 0.9
>> [ ] -1 Do not release the packages because...
>>  
>> Thanks!
>>  
>> Cheers,
>>
>>  Chris
>>
>>
>>
>>
>>
>
>

Re: [VOTE] Release Apache Nutch 0.9

Andrzej Białecki-2
Dennis Kubes wrote:

> Ok, I ran some bigger test crawls > 150K with the 0.9RC.  Everything
> worked fine (inject, generate, fetch, updatedb, readdb, linkdb,
> mergesegs, mergdb, merge, index) except delete duplicates on which I am
> getting this error when running against segment indexes on the DFS.
>
> Because of the way I am automating some of my crawls (sorting names by
> alpha and only running part of the list), only one segment part-xxxxx
> had results and then others had 0 results.  I don't know if that would
> cause this and I don't think this bug is critical for the 0.9 release
> but I wanted to bring it up.

Please try the patch included at the end.


>
> My guess would be that this is a small bug within the lucene libraries
> when the directories have 0 results.  What is everyone's opinion on this
> in terms of the release?  My vote would be to move forward with the
> release.

I think we should move forward.


--
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



Index: DeleteDuplicates.java
===================================================================
--- DeleteDuplicates.java       (revision 521176)
+++ DeleteDuplicates.java       (working copy)
@@ -158,19 +158,28 @@
      public class DDRecordReader implements RecordReader {

        private IndexReader indexReader;
-      private int maxDoc;
-      private int doc;
+      private int maxDoc = 0;
+      private int doc = 0;
        private Text index;

        public DDRecordReader(FileSplit split, JobConf job,
            Text index) throws IOException {
-        indexReader = IndexReader.open(new FsDirectory(FileSystem.get(job), split.getPath(), false, job));
-        maxDoc = indexReader.maxDoc();
+        try {
+          indexReader = IndexReader.open(new FsDirectory(FileSystem.get(job), split.getPath(), false, job));
+          maxDoc = indexReader.maxDoc();
+        } catch (IOException ioe) {
+          LOG.warn("Can't open index at " + split + ", skipping. (" + ioe.getMessage() + ")");
+          indexReader = null;
+        }
          this.index = index;
        }

        public boolean next(Writable key, Writable value)
          throws IOException {
+
+        // skip empty indexes
+        if (indexReader == null || maxDoc <= 0)
+          return false;

          // skip deleted documents
          while (indexReader.isDeleted(doc) && doc < maxDoc) doc++;
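
To see why the guard added in this patch matters, here is a small self-contained sketch. It does not use Lucene: FakeReader is a hypothetical stand-in that, like the reader in the stack trace, throws when asked about a document in an empty index. It assumes the failure comes from calling isDeleted(doc) before checking doc < maxDoc, which is how the loop in the last context line of the patch is ordered.

```java
// Hypothetical illustration of the empty-index guard; not the Nutch or Lucene code.
class EmptyIndexGuardSketch {

    // Stand-in for an index reader: one "deleted" flag per document.
    static final class FakeReader {
        private final boolean[] deleted;

        FakeReader(boolean[] deleted) {
            this.deleted = deleted;
        }

        int maxDoc() {
            return deleted.length;
        }

        // Throws ArrayIndexOutOfBoundsException for an out-of-range doc.
        boolean isDeleted(int doc) {
            return deleted[doc];
        }
    }

    // Same condition order as the patch's context line: isDeleted runs first.
    static int nextLiveUnguarded(FakeReader reader, int doc) {
        while (reader.isDeleted(doc) && doc < reader.maxDoc()) doc++;
        return doc;
    }

    // Guarded variant: bail out on a missing or empty index, and check bounds first.
    // Returns -1 when there is no live document at or after doc.
    static int nextLiveGuarded(FakeReader reader, int doc) {
        if (reader == null || reader.maxDoc() <= 0) {
            return -1;
        }
        while (doc < reader.maxDoc() && reader.isDeleted(doc)) doc++;
        return doc < reader.maxDoc() ? doc : -1;
    }
}
```

With an empty FakeReader the unguarded method throws on its very first isDeleted call, while the guarded one returns -1, which corresponds to next() returning false for empty indexes in the patch.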

Re: [VOTE] Release Apache Nutch 0.9

Andrzej Białecki-2
In reply to this post by chrismattmann
Chris Mattmann wrote:
[..]
> [ ] +1 Release the packages as Apache Nutch 0.9
> [ ] -1 Do not release the packages because...

+1.


--
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


Re: [VOTE] Release Apache Nutch 0.9

Dennis Kubes
[X] +1 Release the packages as Apache Nutch 0.9
[ ] -1 Do not release the packages because...

Andrzej Bialecki wrote:
> Chris Mattmann wrote:
> [..]
>> [ ] +1 Release the packages as Apache Nutch 0.9
>> [ ] -1 Do not release the packages because...
>
> +1.
>
>

Re: [VOTE] Release Apache Nutch 0.9

Dennis Kubes
In reply to this post by Andrzej Białecki-2
That works.  I created the JIRA and attached your patch.  It passes all
build tests and works on my 150K run across my 5 machine dev cluster.
Should we go ahead and commit this?

Dennis

Andrzej Bialecki wrote:

> Dennis Kubes wrote:
>> Ok, I ran some bigger test crawls > 150K with the 0.9RC.  Everything
>> worked fine (inject, generate, fetch, updatedb, readdb, linkdb,
>> mergesegs, mergdb, merge, index) except delete duplicates on which I
>> am getting this error when running against segment indexes on the DFS.
>>
>> Because of the way I am automating some of my crawls (sorting names by
>> alpha and only running part of the list), only one segment part-xxxxx
>> had results and then others had 0 results.  I don't know if that would
>> cause this and I don't think this bug is critical for the 0.9 release
>> but I wanted to bring it up.
>
> Please try the patch included at the end.
>
>
>>
>> My guess would be that this is a small bug within the lucene libraries
>> when the directories have 0 results.  What is everyone's opinion on
>> this in terms of the release?  My vote would be to move forward with
>> the release.
>
> I think we should move forward.
>
>

Re: [VOTE] Release Apache Nutch 0.9

Andrzej Białecki-2
Dennis Kubes wrote:
> That works.  I created the JIRA and attached your patch.  It passes all
> build tests and works on my 150K run across my 5 machine dev cluster.
> Should we go ahead and commit this?

Well, we certainly have to commit it to trunk, but not to branch-0.9 -
as we keep testing and discovering new issues, and patching them, we
will never make a release ... I think for issues that are not critical
or blocker we should press forward, otherwise we will have to wait
another 72 hours, and another, and another ...


--
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


Re: [VOTE] Release Apache Nutch 0.9

Sami Siren-2
In reply to this post by chrismattmann
Chris Mattmann wrote:
> Hi Folks,
>  
> I have posted a candidate for the Apache Nutch 0.9 release at
>  
>  http://people.apache.org/~mattmann/nutch_0.9/rc2/
> Please vote on releasing these packages as Apache Nutch 0.9.

+1

--
 Sami Siren