Multiple index fields using XMLParser plugin for Nutch

Jayant Kumar Gandhi
After some effort and help from you guys, especially Rida, I was able
to get the plugin to run on Nutch 0.8.1.

I am using the XMLParser plugin from
http://issues.apache.org/jira/browse/NUTCH-185 .
I have an XML document that I need to index:

<tags>
 <tag>tag1</tag>
 <tag>tag2</tag>
 <tag>tag3</tag>
</tags>

I wish to index each of those tags separately. XMLParser seems to
merge their content and index it as a single value.

I want it to index as:
tag = tag1
tag = tag2
tag = tag3

whereas it is currently indexing as:
tag = tag1 tag2 tag3


I have the field 'tag' defined as keyword, so searching for tag:tag1
doesn't return any results. Also, I cannot have the field 'tag' as text.
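
To illustrate why the lookup fails, here is a minimal sketch in plain Python (not Nutch or Lucene code). It assumes that a keyword field is compared as one whole, untokenized value; the helper name keyword_match is hypothetical and exists only for this illustration.

```python
def keyword_match(stored_values, query):
    """Exact comparison: the query must equal a whole stored value."""
    return any(value == query for value in stored_values)


# What the plugin currently stores: one merged value for the field.
merged_field = ["tag1 tag2 tag3"]

# What the question asks for: one value per <tag> element.
separate_fields = ["tag1", "tag2", "tag3"]

merged_hit = keyword_match(merged_field, "tag1")
separate_hit = keyword_match(separate_fields, "tag1")

print("merged value matches tag1:   ", merged_hit)
print("separate values match tag1:  ", separate_hit)
```

Under that assumption, the merged value only matches a query for the full string "tag1 tag2 tag3", which is consistent with tag:tag1 returning nothing.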

Could someone please guide me on how to achieve that? The XPath
expression I am using is:
//tags/tag
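
As a rough sketch of the difference between the two outcomes, the snippet below uses Python's standard xml.etree.ElementTree rather than the XMLParser plugin itself. The sample document is reproduced with a closing </tags>, and because ElementTree supports only a subset of XPath, the relative path "tag" from the <tags> root stands in for //tags/tag. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

XML = """<tags>
 <tag>tag1</tag>
 <tag>tag2</tag>
 <tag>tag3</tag>
</tags>"""


def extract_tag_values(xml_text):
    """Return the text of each <tag> child as a separate value."""
    root = ET.fromstring(xml_text)  # root is the <tags> element
    return [el.text.strip() for el in root.findall("tag") if el.text]


def merge_values(values):
    """Join all values into one string, mimicking the merged field."""
    return " ".join(values)


values = extract_tag_values(XML)
merged = merge_values(values)

print(values)   # one entry per matched node
print(merged)   # single merged value
```

The path expression selects three distinct nodes; whether they end up as one field value or three depends on what is done with the node list afterwards, not on the expression.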

Thanks and Best Regards,
Jayant Gandhi

--
www.jkg.in | http://www.jkg.in/contact-me/
Jayant Kr. Gandhi

Re: Multiple index fields using XMLParser plugin for Nutch

Rida Benjelloun
Hi Kumar,
Nutch doesn't support multi-valued fields, so I decided to merge the content
into a single field. If you want to search the field, you should index it as
"Text" instead of "keyword".
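
A small sketch of what that suggestion amounts to, written in plain Python rather than against the Nutch or Lucene API. It assumes a text field is split into lowercase, whitespace-separated terms at index time; real analyzers do more than this, and the names tokenize and text_match are hypothetical.

```python
def tokenize(value):
    """Very simplified stand-in for an analyzer: lowercase, split on whitespace."""
    return value.lower().split()


def text_match(stored_value, query):
    """A query term matches if it equals any single token of the stored value."""
    return query.lower() in tokenize(stored_value)


merged = "tag1 tag2 tag3"
hits = [text_match(merged, q) for q in ("tag1", "tag2", "tag3", "tag4")]
print(hits)

# Side effect of tokenizing: boundaries between values are lost, so a
# multi-word tag also matches on each of its individual words.
partial_hit = text_match("open source", "open")
print(partial_hit)
```

With tokenization the merged value becomes searchable term by term, at the cost of no longer distinguishing where one original tag ended and the next began.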
Best regards.


On 11/11/06, Jayant Kumar Gandhi <[hidden email]> wrote:
> [...]

-----------------------------------------------------------
Rida Benjelloun,
DocuLibre inc.
-----------------------------------------------------------