Get distinct fields values from lucene index

Amol Suryawanshi
Hello,

I am using Lucene in my organization and want to know how I can get the distinct values of a field from a Lucene index. I have tried the “GroupingSearch” API, but it doesn’t serve the purpose: it returns all the documents that contain the distinct values rather than just the values themselves. I used the code below.


// Group all matching documents by the value of groupField.
final GroupingSearch groupingSearch = new GroupingSearch(groupField);

// Order the documents inside each group by the same field.
Sort sort = new Sort(new SortField(groupField, SortField.Type.STRING_VAL, false));
groupingSearch.setSortWithinGroup(sort);
Query query = new MatchAllDocsQuery();
TopGroups<BytesRef> topGroups = null;

try {
    // Fetch up to 10 groups, starting at group offset 0.
    topGroups = groupingSearch.search(searcher, query, 0, 10);
} catch (final IOException e) {
    System.out.println("Can't execute group search because of an IOException. " + e);
}

Re: Get distinct fields values from lucene index

Michael Sokolov-4
In Solr and ES this is done with faceting and aggregations,
respectively, based on Lucene's low-level APIs. Have you looked at
TermsEnum? You can use that to get all distinct terms for a segment,
and then it is up to you to coalesce terms across segments ("leaves").
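
The coalescing step described above can be sketched without Lucene on the classpath. This is only an illustration under stated assumptions: each segment is modelled as an already sorted list of strings (standing in for the terms a per-segment TermsEnum would return), and the class and method names (SegmentTermMerge, distinctTerms) are hypothetical. It uses a k-way merge with a priority queue, so duplicates across segments end up adjacent and can be skipped. Java's String ordering is used here as a stand-in; Lucene itself orders terms by their bytes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

class SegmentTermMerge {

    /** One segment's position: the term it is on, plus the rest of its sorted terms. */
    private static final class Cursor {
        String current;
        final Iterator<String> rest;

        Cursor(Iterator<String> terms) {
            this.rest = terms;
            this.current = terms.next();
        }

        /** Moves to the next term; returns false once the segment is exhausted. */
        boolean advance() {
            if (!rest.hasNext()) {
                return false;
            }
            current = rest.next();
            return true;
        }
    }

    /** Merges the sorted term lists of all segments into one sorted list without duplicates. */
    static List<String> distinctTerms(List<List<String>> segments) {
        PriorityQueue<Cursor> queue =
            new PriorityQueue<>((a, b) -> a.current.compareTo(b.current));
        for (List<String> segment : segments) {
            if (!segment.isEmpty()) {
                queue.add(new Cursor(segment.iterator()));
            }
        }

        List<String> distinct = new ArrayList<>();
        while (!queue.isEmpty()) {
            Cursor smallest = queue.poll();
            // Equal terms from different segments come out back to back; keep only the first.
            if (distinct.isEmpty()
                    || !distinct.get(distinct.size() - 1).equals(smallest.current)) {
                distinct.add(smallest.current);
            }
            if (smallest.advance()) {
                queue.add(smallest);
            }
        }
        return distinct;
    }

    public static void main(String[] args) {
        List<List<String>> segments = Arrays.asList(
            Arrays.asList("berlin", "london", "paris"),
            Arrays.asList("london", "madrid"),
            Arrays.asList("berlin", "zurich"));
        System.out.println(distinctTerms(segments));
    }
}
```

For a small number of distinct values, adding every term to a sorted set is simpler; the merge only pays off when the values should be streamed in order without holding them all in memory.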

On Thu, Nov 21, 2019 at 1:15 AM Amol Suryawanshi
<[hidden email]> wrote:

> I want to know how I can get distinct values from a Lucene index. [...]

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

RE: Get distinct fields values from lucene index

Amol Suryawanshi
Hello Michael,

Thanks for the response.

I tried the approach you suggested (TermsEnum), but it is not working for me. I used the code below.


String field = "address";
try (IndexReader reader = Utils.getIndexReader(indexDirectoryPath))
{
    List<LeafReaderContext> leaves = reader.leaves();
    for (LeafReaderContext leaf : leaves) {
        Terms _terms = leaf.reader().terms(field);
        if (_terms == null) {
            continue;
        }

        TermsEnum termsEnum = _terms.iterator();
        System.out.println(termsEnum);
    }
} catch (IOException e) {
    e.printStackTrace();
}
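
A likely reason the snippet above shows nothing useful is that it prints the TermsEnum object itself instead of stepping through it. In Lucene, TermsEnum.next() returns the next term as a BytesRef, or null once the enum is exhausted, and BytesRef.utf8ToString() turns a term into a String; the returned BytesRef may be reused between calls, so it is safest to convert or copy it before storing it. The sketch below shows only the shape of that loop. It is not Lucene code: TermCursor is a hypothetical stand-in for TermsEnum, cursorOver stands in for leaf.reader().terms(field).iterator(), and the class name is made up for illustration.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

class TermsEnumLoopSketch {

    /** Hypothetical stand-in for TermsEnum: next() yields a term, or null at the end. */
    interface TermCursor {
        String next();
    }

    /** Builds a cursor over fixed terms, standing in for one segment's term iterator. */
    static TermCursor cursorOver(String... terms) {
        Iterator<String> it = Arrays.asList(terms).iterator();
        return () -> it.hasNext() ? it.next() : null;
    }

    /** Exhausts every per-segment cursor and collects the distinct terms in sorted order. */
    static SortedSet<String> collectDistinct(List<TermCursor> leaves) {
        SortedSet<String> distinct = new TreeSet<>();
        for (TermCursor cursor : leaves) {
            String term;
            // The step missing from the snippet: keep calling next() until it returns null.
            while ((term = cursor.next()) != null) {
                distinct.add(term); // the set drops terms already seen in another segment
            }
        }
        return distinct;
    }

    public static void main(String[] args) {
        List<TermCursor> leaves = Arrays.asList(
            cursorOver("berlin", "london"),
            cursorOver("london", "madrid"));
        System.out.println(collectDistinct(leaves));
    }
}
```

Collecting into a sorted set handles the coalescing across segments in the simplest way, at the cost of keeping every distinct value in memory.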


________________________________
From: Michael Sokolov <[hidden email]>
Sent: Friday, November 22, 2019 8:11:25 PM
To: [hidden email] <[hidden email]>
Subject: Re: Get distinct fields values from lucene index

Have you looked at TermsEnum? [...]

RE: Get distinct fields values from lucene index

Amol Suryawanshi
Thanks Michael,

Appreciate your feedback. It’s working for me now.

________________________________
From: Amol Suryawanshi <[hidden email]>
Sent: Monday, November 25, 2019 7:05:36 PM
To: [hidden email] <[hidden email]>
Subject: RE: Get distinct fields values from lucene index

I have tried the approach suggested by you (TermsEnum) but it is not working for me. [...]