Filtering on blank fields

Manepalli, Kalyan
Hi,

I want to fetch only the documents that have a certain field.
For this I am using an fq like this:

fq=rev.comments:[* TO *]

The rev.comments field is of type string.

The filter works correctly, but I am seeing a performance degradation:
without the fq, the QTime is around 300 ms; with the fq, the QTime
jumps to 850 ms.

Is there any known issue with range queries on string fields?
Is there a more efficient way to do this?

Any suggestions in this regard will be very helpful.

Thanks,
Kalyan Manepalli

Re: Filtering on blank fields

Mike Klaas

On 20-Nov-08, at 12:23 PM, Manepalli, Kalyan wrote:

> Hi,
>
> I want to fetch only the documents which have a certain field.
>
> For this I am using a fq query like this
>
> fq=rev.comments:[* TO *]
>
> rev.comments fields is of type string.
>
> The functionality works correctly but I am seeing a performance
> degradation
>
> Without the above fq, the QTime is around 300ms
>
> With fq, the QTime jumps to 850ms
>
> Is there any known issue with range query on String fields
>
> Is there any other efficient way to do this.

This is an inverted index at its worst, unfortunately: whether you are
looking for an empty or a non-empty field, you are enumerating every
possible value of that field and collecting (or excluding) the docs
containing it.
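
To make the cost concrete, here is a toy model in Python of what a filter like rev.comments:[* TO *] has to do. It is only an illustration under simplifying assumptions, not how Lucene is actually implemented: the function names and the sample documents are made up for this sketch. The point it shows is that the work grows with the number of distinct values in the field, which for a free-text string field is close to the number of documents.

```python
from collections import defaultdict


def build_index(docs, field):
    """Map each distinct value (term) of `field` to the ids of docs containing it."""
    postings = defaultdict(set)
    for doc_id, doc in docs.items():
        value = doc.get(field)
        if value is not None:
            postings[value].add(doc_id)
    return postings


def docs_with_any_value(postings):
    """Toy version of field:[* TO *]: visit every term and union its postings."""
    matched = set()
    terms_visited = 0
    for doc_ids in postings.values():
        matched |= doc_ids
        terms_visited += 1
    return matched, terms_visited


# Hypothetical sample data: doc 2 has no rev.comments value at all.
docs = {
    1: {"rev.comments": "great stay"},
    2: {},
    3: {"rev.comments": "noisy room"},
    4: {"rev.comments": "would return"},
}

postings = build_index(docs, "rev.comments")
matched, terms_visited = docs_with_any_value(postings)
print(sorted(matched), terms_visited)  # prints: [1, 3, 4] 3
```

Three matching documents required three term visits here; with millions of distinct comments the same loop visits millions of terms.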

The solution is to store a token indicating that the field is empty,
such as "<nocomment>" (I think that "" works too). Then change your
fq to

fq=-rev.comments:"<nocomment>"

It should be much faster.

-Mike
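
A minimal client-side sketch of the sentinel approach in Python, assuming documents are prepared as plain dicts before they are sent to Solr and that the field is rev.comments as in the original question. The helper names `with_empty_marker` and `present_filter` are hypothetical, and the actual indexing call is left out because it depends on the client library in use.

```python
from urllib.parse import urlencode

# Sentinel suggested in the thread; any value that cannot occur in real data works.
EMPTY_TOKEN = "<nocomment>"


def with_empty_marker(doc, field="rev.comments", token=EMPTY_TOKEN):
    """Return a copy of `doc` with the sentinel stored when the field is missing or blank."""
    out = dict(doc)
    if not out.get(field):
        out[field] = token
    return out


def present_filter(field="rev.comments", token=EMPTY_TOKEN):
    """Build the negative filter query that keeps only docs with a real value."""
    return f'-{field}:"{token}"'


# Hypothetical sample documents: the second one has no comment.
docs = [{"id": "1", "rev.comments": "great stay"}, {"id": "2"}]
prepared = [with_empty_marker(d) for d in docs]

# URL-encoded request parameters for a search that applies the filter.
params = urlencode({"q": "*:*", "fq": present_filter()})

print(prepared[1]["rev.comments"])  # prints: <nocomment>
print(present_filter())             # prints: -rev.comments:"<nocomment>"
```

The filter now needs a single term lookup for the sentinel instead of a walk over every distinct comment. Existing documents have to be reindexed for the sentinel to be present.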

RE: Filtering on blank fields

Manepalli, Kalyan
Hi Mike,

Thanks for the suggestion. I will test it out and post the results.

Thanks,
Kalyan Manepalli
If you are not the intended recipient of this e-mail message, please notify the sender
and delete all copies immediately. The sender believes this message and any attachments
were sent free of any virus, worm, Trojan horse, and other forms of malicious code.
This message and its attachments could have been infected during transmission. The
recipient opens any attachments at the recipient's own risk, and in so doing, the
recipient accepts full responsibility for such actions and agrees to take protective
and remedial action relating to any malicious code. Travelport is not liable for any
loss or damage arising from this message or its attachments.


RE: Filtering on blank fields

Lance Norskog-2
The problem with a zero-length string "" is that it is also returned by
field:[* TO *], so you can't tell whether you're doing this right or not.
For those of us who cannot reindex at the drop of a hat, this is a big
deal. We went with -1.

Lance
