Searching substring starting at a fixed position

luther.blisset
hi folks,
I'm new to Lucene and I'm looking for a way to search for a substring that starts at a fixed position.
It isn't a classic substring search; it's a bit unusual.
I indexed a field that represents the availability of a room in a hostel over one year.
The field is made up of 365 digits: each digit represents one day and is set to 0 (available) or 1 (not available). So a string like this:
110001111...(continues with ones up to position 365)
means that the hostel is available from the 3rd to the 5th of January.
And now the problem...
Suppose I want to check whether a hostel is available from the 3rd to the 5th of January.
I'd have to use wildcards and search for a string like this:
??000????...(continues with ? up to position 365)
I don't think this is a good approach, and I expect it to hurt the application's performance.
Is there a way to ask Lucene to search starting from a fixed position?
For instance, in the example above, could I search for 000 starting at the third position?
I hope this is clear and that you can help me, because I have no idea how to solve the problem.
Thanks a lot!

Luther
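For reference, the check being asked about ("000 starting at the third position") is simple to express in plain Java for a single availability string; the open question is how to do it efficiently across a Lucene index. The sketch below is not Lucene code: the class and method names are hypothetical, days are assumed to be 1-based, and String.repeat needs Java 11 or later. It also rebuilds the wildcard pattern from the question and an equivalent regular expression.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch: the fixed-position availability check done directly on one string. */
public class FixedPositionCheck {

    /** True if every day in [fromDay, toDay] (1-based, inclusive) is marked '0' (available). */
    public static boolean isAvailable(String availability, int fromDay, int toDay) {
        int length = toDay - fromDay + 1;
        // startsWith(prefix, offset) compares only the characters starting at the offset,
        // so days outside the requested period are never examined.
        return availability.startsWith("0".repeat(length), fromDay - 1);
    }

    /** Builds the wildcard pattern from the question, e.g. "??000?????" for 10 days. */
    public static String wildcardPattern(int totalDays, int fromDay, int toDay) {
        StringBuilder sb = new StringBuilder(totalDays);
        for (int day = 1; day <= totalDays; day++) {
            sb.append(day >= fromDay && day <= toDay ? '0' : '?');
        }
        return sb.toString();
    }

    /** The same test as a regular expression that must match the whole string. */
    public static boolean isAvailableRegex(String availability, int fromDay, int toDay) {
        String regex = ".{" + (fromDay - 1) + "}0{" + (toDay - fromDay + 1) + "}.*";
        return Pattern.matches(regex, availability);
    }

    public static void main(String[] args) {
        // 365 days, available (0) only on January 3rd to 5th, as in the example.
        String availability = "11000" + "1".repeat(360);
        System.out.println(isAvailable(availability, 3, 5));      // true
        System.out.println(isAvailable(availability, 2, 4));      // false
        System.out.println(wildcardPattern(10, 3, 5));            // ??000?????
        System.out.println(isAvailableRegex(availability, 3, 5)); // true
    }
}
```

The performance worry in the question is reasonable: a pattern that begins with a wildcard gives Lucene no fixed prefix to seek to, so it generally has to enumerate every term in the field, and with one 365-character term per room that is close to one term per document.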

Re: Searching substring starting at a fixed position

Karsten F.-2
Hi Luther,

Your question:
"Is there a way to ask Lucene to search starting from a fixed position?"

The answer: no, not with a standard search.

But you don't need this field for scoring, so it is a field for filtering results.
You could easily adapt RangeFilter for this purpose, but the new filter would have to read all tokens of this field (which is slow).
So you should cache the filter with CachingWrapperFilter (at least for the most frequently requested periods).
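To make the filter-and-cache idea concrete without depending on a particular Lucene version's Filter API, here is a plain-Java sketch under stated assumptions: documents are numbered by list index, the availability strings are held in memory (a hypothetical stand-in for reading the field), and a HashMap keyed by period plays the role that CachingWrapperFilter plays in Lucene. The first request for a period scans every document; repeated requests reuse the cached result.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: a period filter that scans all documents once per period, then caches. */
public class CachedPeriodFilter {
    private final List<String> availabilityByDoc;              // doc id = list index (assumption)
    private final Map<String, BitSet> cache = new HashMap<>();  // "from-to" -> matching docs
    private int scans = 0;                                       // counts full scans, for illustration

    public CachedPeriodFilter(List<String> availabilityByDoc) {
        this.availabilityByDoc = availabilityByDoc;
    }

    /** Docs free on every day in [fromDay, toDay] (1-based); scans only on a cache miss. */
    public BitSet bits(int fromDay, int toDay) {
        BitSet cached = cache.computeIfAbsent(fromDay + "-" + toDay, key -> scan(fromDay, toDay));
        return (BitSet) cached.clone(); // copy so callers cannot corrupt the cached bitset
    }

    // The slow part: every document's value is read once per distinct period.
    private BitSet scan(int fromDay, int toDay) {
        scans++;
        String wanted = "0".repeat(toDay - fromDay + 1);
        BitSet result = new BitSet(availabilityByDoc.size());
        for (int doc = 0; doc < availabilityByDoc.size(); doc++) {
            if (availabilityByDoc.get(doc).startsWith(wanted, fromDay - 1)) {
                result.set(doc);
            }
        }
        return result;
    }

    public int scans() {
        return scans;
    }

    /** Drops all cached results, e.g. after availability data changes. */
    public void invalidate() {
        cache.clear();
    }
}
```

The invalidate() method points at the trade-off raised next in the thread: if this field changes often, cached results go stale quickly and the expensive scans come back.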

If you have a lot of changes in (only) this field, you should consider building your own Lucene Filter without the help of the Lucene index (e.g. using bitsets directly, or a database).
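A sketch of the bitset approach, assuming availability is tracked outside the index with one java.util.BitSet per day, where bit n set means document (room) n is free that day. The class name is hypothetical, and turning the result into a Lucene Filter is not shown, since that API differs between Lucene versions.

```java
import java.util.BitSet;

/** Hypothetical sketch: availability kept outside the index as one bitset per day. */
public class DayBitsets {
    private final BitSet[] availableOnDay; // index 0 = day 1

    public DayBitsets(int days) {
        availableOnDay = new BitSet[days];
        for (int d = 0; d < days; d++) {
            availableOnDay[d] = new BitSet();
        }
    }

    /** A booking or cancellation flips a single bit; nothing has to be re-indexed. */
    public void setAvailable(int doc, int day, boolean available) {
        availableOnDay[day - 1].set(doc, available);
    }

    /** Docs available on every day in [fromDay, toDay] (1-based, inclusive): AND of the day bitsets. */
    public BitSet availableFor(int fromDay, int toDay) {
        BitSet result = (BitSet) availableOnDay[fromDay - 1].clone();
        for (int day = fromDay + 1; day <= toDay; day++) {
            result.and(availableOnDay[day - 1]);
        }
        return result;
    }
}
```

With this layout an update touches one bit and a period query costs one AND per day of the stay, which is why it suits data that changes often.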
If you don't have many changes but do have many different periods to search for, you should change your data model,
because you can easily use Lucene with one field and 365 different tokens (20080101, 20080102, ... 20081231).
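As an illustration of that data-model change, the hypothetical helper below converts the existing 0/1 string into one yyyyMMdd token per available day; those tokens could then be indexed in a single field (for example as a space-separated value). It assumes the string starts on 1 January 2008.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: turn the 0/1 availability string into one date token per free day. */
public class DateTokens {

    /** Returns yyyyMMdd tokens for every '0' (available) position, counting from firstDay. */
    public static List<String> availableDayTokens(String availability, LocalDate firstDay) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < availability.length(); i++) {
            if (availability.charAt(i) == '0') {
                // BASIC_ISO_DATE formats a LocalDate as yyyyMMdd, e.g. 20080103.
                tokens.add(firstDay.plusDays(i).format(DateTimeFormatter.BASIC_ISO_DATE));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String availability = "11000" + "1".repeat(360);
        System.out.println(availableDayTokens(availability, LocalDate.of(2008, 1, 1)));
        // [20080103, 20080104, 20080105]
    }
}
```

Note that 2008 was a leap year with 366 days, so a 365-digit string counted from 1 January ends on 30 December; the string length should match the calendar period it is meant to cover.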

Best regards
  Karsten

Re: Searching substring starting at a fixed position

xinxin zhou
In reply to this post by luther.blisset
It's a question of math and arithmetic, not a question about Lucene. There
are other good ways to deal with it.

Re: Searching substring starting at a fixed position

Ian Lea
In reply to this post by luther.blisset
Luther


RegexQuery might work. Or how about splitting the digit string out into dates
and searching for them? For example, 110001111 could be stored as
"avail: jan03 jan04 jan05", and a search for
+avail:jan03 +avail:jan04 +avail:jan05 would get a hit.
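On the query side of this suggestion, the hypothetical helper below builds a query string in the form shown above, with one required (+) clause per day of the stay. The field name avail and the jan03-style tokens follow the example; the class name, the date loop, and the use of java.time are assumptions for illustration.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;
import java.util.StringJoiner;

/** Hypothetical sketch: build a query that requires one day token per day of the stay. */
public class StayQuery {
    // "MMMdd" in English gives e.g. "Jan03"; it is lower-cased below to match the index tokens.
    private static final DateTimeFormatter DAY_TOKEN =
            DateTimeFormatter.ofPattern("MMMdd", Locale.ENGLISH);

    /** Returns e.g. "+avail:jan03 +avail:jan04 +avail:jan05" for from..to inclusive. */
    public static String requiredDays(String field, LocalDate from, LocalDate to) {
        StringJoiner query = new StringJoiner(" ");
        for (LocalDate day = from; !day.isAfter(to); day = day.plusDays(1)) {
            query.add("+" + field + ":" + day.format(DAY_TOKEN).toLowerCase(Locale.ENGLISH));
        }
        return query.toString();
    }

    public static void main(String[] args) {
        System.out.println(requiredDays("avail", LocalDate.of(2008, 1, 3), LocalDate.of(2008, 1, 5)));
        // +avail:jan03 +avail:jan04 +avail:jan05
    }
}
```

Because these tokens carry no year, the scheme assumes the index covers a single year, as in the original question.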


--
Ian.


On Thu, Sep 11, 2008 at 12:34 PM, luther blisset <[hidden email]> wrote:

> View this message in context: http://www.nabble.com/Searching-substring-starting-at-a-fixed-position-tp19432922p19432922.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: Searching substring starting at a fixed position

xinxin zhou
that's ok.

Re: Searching substring starting at a fixed position

luther.blisset
In reply to this post by Karsten F.-2
Many thanks, Karsten and Ian Lea!!
You gave me some very useful solutions.
I'm going to try Karsten's last suggestion:

"Because you can easily use Lucene with one field and 365 different tokens (20080101, 20080102, ... 20081231)."

Ian Lea's solution also looks very good, though, and I'll try it too.
Thanks a lot! I really appreciate your help.

luther



