Re: Lucene Indexing structure

Re: Lucene Indexing structure

Aaron Schon
Take a look at the Lire project:

http://www.semanticmetadata.net/lire/


2008/4/26 Vaijanath N. Rao <[hidden email]>:

> Hi Lucene-user and Lucene-dev,
>
>  I want to use Lucene as a backend for image search (content-based
> image retrieval).
>
>  Indexing Mechanism:
>  a) Get the image properties, such as Texture Tamura (TT), Texture Edge
> Histogram (TE), Color Coherence Vector (CCV), Color Histogram (CH) and
> Color Correlogram (CC).
>  b) Convert each of these vectors into a String and index it into Lucene as
> a field; thus each image (a document in Lucene terms) consists of 6 fields:
> image name, TT field, TE field, CCV field, CH field and CC field.
>
>  Searching Mechanism:
>  a) For the search image, convert the image into the above 5 properties.
>  b) For every field and for every value within the field, construct the
> query. For example, let's say the user wants to search only on Color Histogram
> based similarity and the query image has 3 1 4 5 as the CH value; the query
> will look like:
>    query = "CH:3 CH:1 CH:4 CH:5"
>  c) For the results returned, convert all the field values back into floats,
> do the distance computation, and re-rank the documents with the lowest
> distance at the top and the largest distance at the bottom.
>  For example:
>    For the above query, assume that the output has two documents,
>    one having CH as "1 3 5 4" and the other having CH as "3 1 5 4", so
> the distance computation will rank the second document higher than the
> first.
>
>  Obviously there is something wrong with the above approach (to get the
> correct document we need to fetch all the documents and then do the required
> distance calculation), but that's due to my lack of knowledge of Lucene and
> Lucene's index storage.
>
>  What I want to know is how to improve upon the existing architecture, other
> than making the number of fields in Lucene equal to the total number of
> features * the size of each feature.
>
>  Any other pointers will be welcome. Is there any range tree
> implementation within Lucene which I can use for this operation?
>
>  --Thanks and Regards
>  Vaijanath N. Rao
>
>  ---------------------------------------------------------------------
>  To unsubscribe, e-mail: [hidden email]
>  For additional commands, e-mail: [hidden email]
>
>
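
The search steps quoted above (build one clause per feature value, then re-rank the hits by distance) can be sketched roughly as follows. This is a minimal illustration in plain Python, not Lucene code: the helper names `build_query`, `euclidean` and `rerank` are hypothetical, the `hits` dictionary stands in for whatever Lucene returns, and Euclidean distance is an assumption, since the original message does not name a metric.

```python
import math

def build_query(field, values):
    """Build one 'field:value' clause per feature value, space-separated."""
    return " ".join(f"{field}:{v}" for v in values)

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rerank(query_vec, hits):
    """Sort hits by distance to the query, smallest distance first.

    hits maps a document name to its stored field string, e.g. "1 3 5 4".
    """
    scored = []
    for name, stored in hits.items():
        vec = [float(tok) for tok in stored.split()]
        scored.append((euclidean(query_vec, vec), name))
    return sorted(scored)

query_vec = [3, 1, 4, 5]
print(build_query("CH", query_vec))

# The two example documents from the message; "doc1"/"doc2" are made-up names.
hits = {"doc1": "1 3 5 4", "doc2": "3 1 5 4"}
for dist, name in rerank(query_vec, hits):
    print(name, round(dist, 3))
```

With the example values, the second document differs from the query in only two positions, so it sorts ahead of the first, which matches the ranking described in the message. The sketch also shows the cost being complained about: every returned document has to be parsed and scored.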



Re: Lucene Indexing structure

Vaijanath N. Rao-2
Hi Aaron,

I looked into http://www.semanticmetadata.net/lire/ and have already
mailed Mathias, who is the author of the tool. The problem with the tool
is that it iterates over each document in linear fashion. One solution I
have found is to cluster the images outside Lucene using
either a SOM (self-organizing map) or any other clustering/classification
algorithm, and then index the images and their features in Lucene along
with the cluster id.

So now when a search happens, I first retrieve the cluster id of the query
image and then search in Lucene for all the images having this cluster id.
Once I have all the images within that cluster, I re-rank them based on
distance (let's say Euclidean), which reduces the computation time.
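
A rough sketch of this cluster-then-re-rank idea, in plain Python rather than Lucene code. Everything here is an assumption made for illustration: the centroids are made-up values standing in for the output of the external clustering step, nearest-centroid assignment is swapped in for a SOM, and a dictionary keyed by cluster id stands in for a Lucene index with a cluster-id field.

```python
import math
from collections import defaultdict

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_cluster(vec, centroids):
    """Return the index of the centroid closest to vec."""
    return min(range(len(centroids)), key=lambda i: euclidean(vec, centroids[i]))

# Hypothetical centroids, assumed to come from clustering done outside Lucene.
centroids = [[0.0, 0.0], [10.0, 10.0]]

# Hypothetical image feature vectors.
images = {"a.jpg": [1.0, 0.5], "b.jpg": [0.2, 0.1], "c.jpg": [9.0, 11.0]}

# "Index time": store each image under its cluster id.
index = defaultdict(list)
for name, vec in images.items():
    index[nearest_cluster(vec, centroids)].append((name, vec))

def search(query_vec):
    """Fetch only the query's cluster, then re-rank those candidates by distance."""
    cid = nearest_cluster(query_vec, centroids)
    candidates = index.get(cid, [])
    return sorted(candidates, key=lambda item: euclidean(query_vec, item[1]))

print([name for name, _ in search([0.0, 0.0])])
```

Only the images in the matching cluster are scored, so the re-ranking cost depends on the cluster size rather than the collection size. One trade-off of this sketch is that a close image assigned to a neighbouring cluster is never considered.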

The above design is also scalable, since at any point in time I know there
will be only a few clusters and I would have to iterate over only those
images that are within one cluster. But yes, it might still have a
bottleneck. Any suggestions for making this better are welcome.

I will also look into what Glen suggested, though I am not sure how to go
about it. It's definitely worth a try.

--Thanks and Regards
Vaiajanth


