Index Content Removing the HTML Tags.


Fiz Newyorker
Hello Solr Group,

Good morning!

I am working with Solr 6.5 and am trying to index data from MongoDB 3.2.5.

I have a content collection in MongoDB with a body field that contains
HTML tags.
I want to index the body field without the HTML tags.

*Please see the sample body field data from MongoDB below:*

"<body><p>i cant hear the other side but they can hear me we are both using
same android software and Note 4 what seems to be the problem on her phone
that i cant hear her on messenger</p></body>"

I want to index only the text content; I don't want the HTML tags to be
indexed or searchable.

Please let me know how to go about this.


Thanks
Fiz Ahmed.

Re: Index Content Removing the HTML Tags.

Erick Erickson
Have you tried HTMLStripCharFilterFactory?
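
For anyone finding this thread later, here is a minimal sketch of how that char filter might be wired into the schema. The field name body comes from the question; the field type name text_html_stripped is hypothetical, and the tokenizer and lowercase filter are illustrative choices rather than anything stated in this thread.

```xml
<!-- managed-schema / schema.xml fragment (a sketch, not a tested config) -->

<!-- Hypothetical field type: strips HTML markup before tokenizing -->
<fieldType name="text_html_stripped" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Char filters run before the tokenizer, so tags like <body> and <p>
         never become searchable tokens -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <!-- Assumed tokenizer and filter; substitute whatever your analysis chain uses -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- The body field from the question, mapped to the type above -->
<field name="body" type="text_html_stripped" indexed="true" stored="true"/>
```

Note that a char filter only affects the indexed terms. The stored value returned in search results would still contain the original HTML. If the stored value should be tag-free as well, one option is to strip the markup before the document reaches the analyzer, for example with solr.HTMLStripFieldUpdateProcessorFactory in an update request processor chain, or in the code that reads from MongoDB.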

On Mon, Dec 4, 2017 at 12:37 PM, Fiz Newyorker wrote:

> I want to index only the text content; I don't want the HTML tags to be
> indexed or searchable.