Index parts of xml file separately

andrew.foyer
Hi all,

I have Nutch running to index XML files hosted on a web server.

Each XML file contains several sections called <tag1>, and I'm hoping I can index each of those sections separately.

For example:

<tag1>
index this
</tag1>
<tag2>
do not index this
</tag2>
<tag1>
index this separately
</tag1>
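
To make the goal concrete, the split I'm after can be sketched in plain Java as below. This is only a rough illustration under my own assumptions, not Nutch plugin code: the class name Tag1Splitter, the method splitSections, and the <root> wrapper element are all hypothetical, and it uses nothing beyond the standard JDK DOM parser.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Nutch: splits one XML file into
// one text chunk per <tag1> section and ignores everything else.
public class Tag1Splitter {

    // Returns the trimmed text of every <tag1> element, in document order.
    public static List<String> splitSections(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        NodeList nodes = doc.getElementsByTagName("tag1");
        List<String> sections = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            sections.add(nodes.item(i).getTextContent().trim());
        }
        return sections;
    }

    public static void main(String[] args) throws Exception {
        // Assumption: the real files have a single root element;
        // <root> is just a placeholder for it.
        String xml = "<root>"
                + "<tag1>index this</tag1>"
                + "<tag2>do not index this</tag2>"
                + "<tag1>index this separately</tag1>"
                + "</root>";
        // Prints one line per <tag1> section.
        for (String section : splitSections(xml)) {
            System.out.println(section);
        }
    }
}
```

Each string returned here is what I would like to end up as its own entry in the index, with the <tag2> content left out.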

Is this possible by using or writing my own plugins, and if so, can you give me a high-level view of how I can do this?

A F