Index parts of xml file separately



Hi all,

I have Nutch running to index XML files hosted on a web server.

The XML files are structured into several sections, each called <tag1>. I'm hoping I can index each section as a separate document.


index this
do not index this
index this separately

Is this possible by using or writing my own plugins, and if so, can you give me a high-level view of how to do it?
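Whatever plugin route is taken, one step it would need is pulling each <tag1> section out of a fetched XML file as its own unit of text. The following is a minimal sketch of only that splitting step, using the JDK's standard DOM parser. It is not Nutch's plugin API: the class name `XmlSectionSplitter`, the method `splitSections`, the `<doc>`/`<other>` wrapper tags, and the `#section-N` ID scheme are hypothetical, and how the resulting sections would be handed to Nutch as separate documents is not shown.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: splits one XML file into per-section text blocks.
public class XmlSectionSplitter {

    /** Returns the trimmed text content of every element named sectionTag, in document order. */
    public static List<String> splitSections(String xml, String sectionTag) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Reject DOCTYPE declarations, a common hardening step (against XXE)
        // when parsing content fetched from a web server.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        // Every matching element anywhere in the tree, in document order.
        NodeList nodes = doc.getElementsByTagName(sectionTag);
        List<String> sections = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            // getTextContent() concatenates all text below the element,
            // so each section becomes one candidate document body.
            sections.add(nodes.item(i).getTextContent().trim());
        }
        return sections;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical input mirroring the example above; <other> stands in
        // for whatever element holds the text that should not be indexed.
        String xml = "<doc>"
                + "<tag1>index this</tag1>"
                + "<other>do not index this</other>"
                + "<tag1>index this separately</tag1>"
                + "</doc>";
        List<String> sections = splitSections(xml, "tag1");
        for (int i = 0; i < sections.size(); i++) {
            // Hypothetical ID scheme: source file name plus a section index.
            System.out.println("doc.xml#section-" + i + ": " + sections.get(i));
        }
    }
}
```

Because only <tag1> elements are selected, the text in other elements is skipped. Giving each section a distinct ID derived from the source URL would keep the indexed sections from overwriting one another, assuming the index keys documents by ID.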

