N-dimensional Point Indexing

Luís Filipe Nassif
Hi all,

Is Lucene able to index generic n-dimensional points for efficient
similarity or nearest-neighbor search? I have looked at the spatial package
in the past, but it seems to be specific to geo points. The use case is to
index image feature vectors to search for similar images in a corpus.

Currently we are using Lucene for text search, and we would like to avoid
managing two different index structures, synchronizing commits, and so on.

Thank you,
Luis Nassif

Re: N-dimensional Point Indexing

Luís Filipe Nassif
Sorry, I was looking in the wrong place. Should I use BinaryPoint
(https://lucene.apache.org/core/6_0_0/core/org/apache/lucene/document/BinaryPoint.html)?

2018-02-06 14:17 GMT-02:00 Luís Filipe Nassif <[hidden email]>:

> Hi all,
>
> Is Lucene able to index generic n-dimensional points for efficient
> similarity or nearest-neighbor search? I have looked at the spatial package
> in the past, but it seems to be specific to geo points. The use case is to
> index image feature vectors to search for similar images in a corpus.
>
> Currently we are using Lucene for text search, and we would like to avoid
> managing two different index structures, synchronizing commits, and so on.
>
> Thank you,
> Luis Nassif
>

Re: N-dimensional Point Indexing

Luís Filipe Nassif
Is it limited to 8 dimensions, as described at
https://www.elastic.co/blog/lucene-points-6.0?

2018-02-06 15:35 GMT-02:00 Luís Filipe Nassif <[hidden email]>:

> Sorry, I was looking in the wrong place. Should I use BinaryPoint
> (https://lucene.apache.org/core/6_0_0/core/org/apache/lucene/document/BinaryPoint.html)?

Re: N-dimensional Point Indexing

Luís Filipe Nassif
Hi Lucene community,

Is BinaryPoint limited to 8 dimensions?

Thanks,
Luis

On Feb 6, 2018, at 16:07, "Luís Filipe Nassif" <[hidden email]> wrote:

> Is it limited to 8 dimensions, as described at
> https://www.elastic.co/blog/lucene-points-6.0?

Re: N-dimensional Point Indexing

Adrien Grand
Yes, it is.

On Tue, Feb 27, 2018 at 00:03, Luís Filipe Nassif <[hidden email]> wrote:

> Hi Lucene community,
>
> Is BinaryPoint limited to 8 dimensions?
>
> Thanks,
> Luis

Re: N-dimensional Point Indexing

Luís Filipe Nassif
Thank you, Adrien.

On Feb 26, 2018, at 21:19, "Adrien Grand" <[hidden email]> wrote:

> Yes, it is.

Re: N-dimensional Point Indexing

kkrugler
I’ve been looking at directly storing feature vectors and providing scoring/filtering support.

This is for vectors consisting of (typically 300 - 2048) floats or doubles.

It’s following the same pattern as geospatial support - so a new field type and query/parser, plus plumbing to hook it into Solr.

Before I go much further, is there anything like this already done, or in the works?

Thanks,

— Ken


> On Feb 26, 2018, at 4:24 PM, Luís Filipe Nassif <[hidden email]> wrote:
>
> Thank you, Adrien.

--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
Custom big data solutions & training
Flink, Solr, Hadoop, Cascading & Cassandra

Re: N-dimensional Point Indexing

Adrien Grand
If you need them for scoring, then the natural choice would be to
encode them in a BinaryDocValuesField. How do you plan to filter on
these feature vectors? That is too many dimensions for points, and doc
values are not good at filtering.
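
As a rough illustration of the encoding idea (a sketch, not code from this
thread): a float vector can be packed into a byte array, which is the kind of
value a binary doc-values field holds, and decoded again at scoring time. The
class name VectorCodec, its method names, and the choice of cosine similarity
are all assumptions made for the example. It uses only the JDK, so the Lucene
wiring itself is left out.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper (not a Lucene class): float vector <-> byte[] plus cosine scoring. */
final class VectorCodec {
    private VectorCodec() {}

    /** Packs each float as 4 big-endian bytes. */
    static byte[] encode(float[] vector) {
        ByteBuffer buf = ByteBuffer.allocate(vector.length * Float.BYTES);
        for (float v : vector) {
            buf.putFloat(v);
        }
        return buf.array();
    }

    /** Reverses encode(); assumes the byte length is a multiple of 4. */
    static float[] decode(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        float[] vector = new float[bytes.length / Float.BYTES];
        for (int i = 0; i < vector.length; i++) {
            vector[i] = buf.getFloat();
        }
        return vector;
    }

    /** Cosine similarity of two equal-length vectors; returns 0 if either has zero norm. */
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vector lengths differ: " + a.length + " vs " + b.length);
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / Math.sqrt(normA * normB);
    }
}
```

In Lucene, the byte array from encode() would typically be wrapped in a
BytesRef and passed to the BinaryDocValuesField at index time. At query time
the stored bytes would be decoded and compared against the query vector for
each candidate document, which is why this approach helps with scoring but
not with filtering.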

On Thu, Oct 18, 2018 at 2:32 AM Ken Krugler <[hidden email]> wrote:

>
> I’ve been looking at directly storing feature vectors and providing scoring/filtering support.
>
> This is for vectors consisting of (typically 300 - 2048) floats or doubles.
>
> It’s following the same pattern as geospatial support - so a new field type and query/parser, plus plumbing to hook it into Solr.
>
> Before I go much further, is there anything like this already done, or in the works?
>
> Thanks,
>
> — Ken


--
Adrien
