Index / Query IP Address as number.

SolrUser1543
This question has been raised here a few times, but no final solution was provided.

I'm using a combination of ClassicTokenizer and WordDelimiterFilterFactory in my query/index chain.

As a result, an IP like 192.168.1.3 is indexed as:

192 - pos1
168 - pos2
1    - pos3
3    - pos4
19216813 - pos5


So searching for a similar but different address, like 192.168.1.4, returns the wrong item because the first three positions match.

So the question is: what is the best way to index/query an IP as a number while still using ClassicTokenizer and WordDelimiter?


Actually, I would like to have the IP as a number, without breaking it into parts (only 19216813).

Thanks.


Re: Index / Query IP Address as number.

Walter Underwood
Use a PatternReplaceCharFilterFactory to map the periods to empty strings, then use a KeywordTokenizer and a string field type. If you want to sort it or do range queries, you might use an integer field.

wunder
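
A minimal sketch of what that suggestion might look like in schema.xml. The field type name `ip_compact` and the field name `ip` are hypothetical. Because `solr.StrField` does not run an analysis chain, the sketch assumes a `solr.TextField` with a keyword tokenizer to get string-like behavior.

```xml
<!-- Hypothetical field type: strip the dots, then keep the whole value as a single token. -->
<fieldType name="ip_compact" class="solr.TextField" omitNorms="true">
  <analyzer>
    <!-- Runs before tokenization: 192.168.1.3 becomes 19216813 -->
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\." replacement=""/>
    <!-- Emits the entire input as one token, so nothing is split into parts -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical field using the type above -->
<field name="ip" type="ip_compact" indexed="true" stored="true"/>
```

One caveat with the dot-stripped form: it is not unique, since 192.168.1.3 and 19.216.81.3 both collapse to 19216813. If exact matching matters, keeping the dots and using only the keyword tokenizer may be the safer variant.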

On May 18, 2014, at 12:20 PM, SolrUser1543 <[hidden email]> wrote:

> actually I would like to have the IP as num , without breaking it on parts .
> ( have only 19216813 )

Re: Index / Query IP Address as number.

Jack Krupansky-2
In reply to this post by SolrUser1543
What are you using for your default query operator, and do you have
autoGeneratePhraseQueries set to "true" for your field type?

I mean, a query for 192.168.1.4 shouldn't match 192.168.1.3 - unless you
have autoGeneratePhraseQueries set to "false" (the default.)

-- Jack Krupansky
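
For reference, a sketch of where that setting lives, assuming the `text_general` field type the poster shares later in the thread. The filter attributes are abbreviated here, and whether this alone fixes the matching depends on the rest of the chain and the default query operator.

```xml
<!-- Sketch only: autoGeneratePhraseQueries is an attribute of the fieldType element. -->
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100"
           autoGeneratePhraseQueries="true">
  <analyzer>
    <tokenizer class="solr.ClassicTokenizerFactory"/>
    <!-- When one query term splits into several tokens (192, 168, 1, 4),
         the setting above asks the query parser to build a phrase query from them -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateNumberParts="1"
            catenateNumbers="1"/>
  </analyzer>
</fieldType>
```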

--
View this message in context:
http://lucene.472066.n3.nabble.com/Index-Query-IP-Address-as-number-tp4136760.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Index / Query IP Address as number.

SolrUser1543
I don't have autoGeneratePhraseQueries set to true. I tried both false and true for it, but nothing changed.

[Attachment: Capture.JPG]

The same chain is defined for both query and index:

    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.ClassicTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory"
                splitOnCaseChange="0"
                splitOnNumerics="1"
                stemEnglishPossessive="0"
                generateWordParts="1"
                generateNumberParts="1"
                catenateWords="0"
                catenateNumbers="1"
                catenateAll="1"
                preserveOriginal="0"/>
      </analyzer>
      <!-- the query analyzer (not shown) uses the same chain -->
    </fieldType>
Re: Index / Query IP Address as number.

SolrUser1543
In reply to this post by Walter Underwood
I have a text field containing a large piece of mixed text, like:

test test 12/12/2001 12345 192.168.1.1 1234324


I need to create a copy field that captures only the IPs from the text (there may be more than one IP).

What would be the best way to do this?

I don't see any option to keep WordDelimiter from breaking up the IP, so as an alternative I will use a copy field.
Re: Index / Query IP Address as number.

Jack Krupansky-2
Consider an update processor - either raw Java or a snippet of JavaScript
with the stateless script update processor. The update processor could be
hard-coded or take parameters as to which source value to examine and what
field to output. It could use a simple regex to extract only IP addresses.
And then you could output to multiple fields - one for the raw string for
wildcard matches, say, and one as an integer for proximity or range checks.

-- Jack Krupansky
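
A minimal sketch of how the script-based variant might be wired into solrconfig.xml, assuming Solr 4.x's StatelessScriptUpdateProcessorFactory. The chain name, the script file name `extract-ips.js`, and the parameter names `sourceField` and `destField` are all hypothetical. The script itself (not shown) would hold the regex and add the extracted addresses to the output field.

```xml
<!-- Hypothetical chain: runs a script over each incoming document before it is indexed. -->
<updateRequestProcessorChain name="extract-ips">
  <processor class="solr.StatelessScriptUpdateProcessorFactory">
    <!-- Script file in the core's conf directory (name is an assumption) -->
    <str name="script">extract-ips.js</str>
    <!-- Values exposed to the script as the "params" variable -->
    <lst name="params">
      <str name="sourceField">text</str>
      <str name="destField">ip</str>
    </lst>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

Update requests would then select the chain with the `update.chain=extract-ips` request parameter. Passing the field names as parameters keeps the script reusable for other source and output fields.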

-----Original Message-----
From: SolrUser1543
Sent: Monday, May 19, 2014 3:04 PM
To: [hidden email]
Subject: Re: Index / Query IP Address as number.

I need to  create a copy field which will capture only all IPs from the text
( may be more than one IP ) .

What will be the best way to do ?
