number of block duplicated

Stefan Groschupf-2
Hi,
I remember hearing that it is possible to change the number of block
duplicates somehow, but I didn't find any such property in
nutch-default.xml.
Can someone point me to this property?

Thanks.
Stefan

Re: number of block duplicated

Dominik Friedrich
There is a property ndfs.replication in ndfs/FSNamesystem.java, so I added

<property>
  <name>ndfs.replication</name>
  <value>2</value>
  <description>Number of replications of fs blocks</description>
</property>

to my nutch-site.xml. I noticed that this file and others also use properties
that are not listed in nutch-default.xml, and some settings are not read from
NutchConf at all but are set directly in the source code, e.g. in
mapred/MRConstants.java. I could go through the current code and create
an updated nutch-site.xml with all properties some day next week.
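
To illustrate the idea of a site file overriding a default file, here is a minimal sketch in plain Java. It is not NutchConf's actual code: the class ConfOverrideSketch and its methods parse and getInt are hypothetical names made up for this example, and the lookup order (site value first, then default, then a hard-coded fallback) is an assumption about how such an override would typically behave.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, NOT part of Nutch: shows a site file overriding defaults.
public class ConfOverrideSketch {

    /** Collects every <property><name/><value/></property> entry into a map. */
    public static Map<String, String> parse(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Map<String, String> props = new HashMap<>();
        NodeList nodes = doc.getElementsByTagName("property");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element p = (Element) nodes.item(i);
            String name = p.getElementsByTagName("name").item(0).getTextContent().trim();
            String value = p.getElementsByTagName("value").item(0).getTextContent().trim();
            props.put(name, value);
        }
        return props;
    }

    /** Assumed lookup order: site value, then default value, then the fallback. */
    public static int getInt(Map<String, String> defaults, Map<String, String> site,
                             String name, int fallback) {
        String v = site.containsKey(name) ? site.get(name) : defaults.get(name);
        return v == null ? fallback : Integer.parseInt(v);
    }

    public static void main(String[] args) throws Exception {
        // Stand-ins for nutch-default.xml and nutch-site.xml contents.
        String defaultsXml = "<nutch-conf><property><name>ndfs.replication</name>"
            + "<value>3</value></property></nutch-conf>";
        String siteXml = "<nutch-conf><property><name>ndfs.replication</name>"
            + "<value>2</value></property></nutch-conf>";
        int replication = getInt(parse(defaultsXml), parse(siteXml), "ndfs.replication", 3);
        System.out.println("ndfs.replication = " + replication);
    }
}
```

With the two stand-in files above, the site value is the one that gets used; a property missing from both files falls back to the hard-coded value, which matches the case of settings that only exist in the source code.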

regards,
Dominik


Stefan Groschupf wrote:
> Hi,
> I remember that I heard it is possible to change the number of block
> duplicates somehow, but didn't find any property in the
> nutch-default.xml.
> Can someone give the lost link to this property?
>
> Thanks.
> Stefan
>



Re: number of block duplicated

Stefan Groschupf-2
>
> to my nutch-site.xml. I saw in this file and other files also  
> properties that are not in nutch-default.xml and some are not even  
> read from NutchConf but set directly in the source code, e.g. in  
> mapred/MRConstants.java. I could go through the current code and  
> create an updated nutch-site.xml with all properties some day next  
> week.

That would be great!!!

Thanks!


Re: number of block duplicated

Dominik Friedrich
Patch attached. I haven't added any properties from plugins or test classes.

best regards,
Dominik

Stefan Groschupf wrote:

>>
>> to my nutch-site.xml. I saw in this file and other files also
>> properties that are not in nutch-default.xml and some are not even
>> read from NutchConf but set directly in the source code, e.g. in
>> mapred/MRConstants.java. I could go through the current code and
>> create an updated nutch-site.xml with all properties some day next week.
>
> That would be great! !!
>
> Thanks!
>
>
>

Index: nutch-default.xml
===================================================================
--- nutch-default.xml (revision 369731)
+++ nutch-default.xml (working copy)
@@ -379,6 +379,14 @@
   exception.</description>
 </property>
   
+<property>
+  <name>io.map.index.skip</name>
+  <value>0</value>
+  <description>Number of index entries to skip between each entry.  Zero by default.
+   Setting this to values larger than zero can facilitate opening large map
+   files using less memory.</description>
+</property>
+
 <!-- file system properties -->
 
 <property>
@@ -412,6 +420,39 @@
   directories, typically on different devices.</description>
 </property>
 
+<property>
+  <name>ndfs.replication</name>
+  <value>3</value>
+  <description>How many copies we try to have at all times. The actual number of
+  replicas is at most the number of datanodes in the cluster.</description>
+</property>
+
+<property>
+  <name>ndfs.max-repl-streams</name>
+  <value>2</value>
+  <description>How many outgoing replication streams a given node should
+  have at one time.</description>
+</property>
+
+<property>
+  <name>ndfs.availability.allocation</name>
+  <value>false</value>
+  <description>Whether we should use disk-availability info when
+  determining target.</description>
+</property>
+
+<property>
+  <name>ndfs.namenode.handler.count</name>
+  <value>10</value>
+  <description>Number of handler threads.</description>
+</property>
+
+<property>
+  <name>test.ndfs.same.host.targets.allowed</name>
+  <value>false</value>
+  <description>Purpose undocumented. Do not change this unless you know what you are doing.</description>
+</property>
+
 <!-- map/reduce properties -->
 
 <property>
@@ -511,6 +552,13 @@
   child processes.</description>
 </property>
 
+<property>
+  <name>mapred.combine.buffer.size</name>
+  <value>100000</value>
+  <description>The number of entries the combining collector caches before
+  combining them and writing to disk.</description>
+</property>
+
 <!-- indexer properties -->
 
 <property>
@@ -926,4 +974,13 @@
   </description>
 </property>
 
+<!-- PruneIndexTool -->
+
+<property>
+  <name>prune.index.tool.queries</name>
+  <value></value>
+  <description>The file that PruneIndexTool uses when it is called without
+  the -queries parameter.</description>
+</property>
+
 </nutch-conf>
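
The io.map.index.skip description in the patch says that skipping index entries lets large map files be opened with less memory. A minimal sketch of that trade-off follows. It is not the actual map file reader: IndexSkipSketch and loadIndex are hypothetical names, and the assumed semantics are that one index entry is kept and then the next `skip` entries are dropped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration, NOT the real map file reader.
public class IndexSkipSketch {

    /**
     * Keeps one index key, then skips the next `skip` keys, and repeats.
     * skip = 0 keeps every key (the default in the patch).
     */
    public static List<String> loadIndex(List<String> onDiskKeys, int skip) {
        if (skip < 0) {
            throw new IllegalArgumentException("skip must be >= 0");
        }
        List<String> inMemory = new ArrayList<>();
        for (int i = 0; i < onDiskKeys.size(); i += skip + 1) {
            inMemory.add(onDiskKeys.get(i));
        }
        return inMemory;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("a", "b", "c", "d", "e");
        // With skip = 1 roughly half of the index entries are held in memory.
        System.out.println(loadIndex(keys, 1));
    }
}
```

Under these assumptions a larger skip shrinks the in-memory index by about a factor of skip + 1, at the cost of scanning more records after each index lookup.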

Re: number of block duplicated

Doug Cutting-2
Dominik Friedrich wrote:
> Patch attached.

Thanks!

I've added some of these to nutch-default.xml.  I didn't add them all,
since putting them in nutch-default is like declaring something public:
we'll need to keep supporting that feature in the future.  Some of these
are things that users should not change, or are things which might soon
be removed.

Cheers,

Doug

Link index & page content obtained separately.

Pashabhai
Hi,

Can anybody please explain how indexes and page
content are linked together?

In my situation, a third party sends only the indexes,
not the page content.

Can I fetch the page content using Fetcher and then
link the index and the pages together?

At present I am able to show search results, but without
snippets and cached pages.


Thanks
P



 
