Getting list of unique values in a field

Steven White
Hi everyone,

One of my indexed fields is defined as follows:

    <field name="CC_FILE_EXT" type="string" docValues="true"
           multiValued="false" indexed="true" required="true" stored="false"/>

It holds the file extension of each file I index.  For example, after
indexing 10 million files, every document carries its file extension in
CC_FILE_EXT.  In my case there are only about 300 distinct extensions.

Using SolrJ, is there a fast way to get back all the unique values this
field has across all of my documents?  I don't want to, and cannot, scan
all 10 million indexed documents in Solr to build that list.  That would be
very inefficient.

Thanks,

Steven

Re: Getting list of unique values in a field

David Hastings
Just use a facet on the field. That should work, yes?
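
For reference, a field facet is requested with ordinary query parameters. The sketch below only assembles those parameters with the JDK, so no Solr client is involved. The class and method names are made up for illustration, and it assumes the standard /select handler. facet.limit=-1 asks for every value instead of the default top 100, and rows=0 suppresses the documents themselves.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper: builds the parameters of a facet-only request.
final class FacetParams {

    /** Returns the URL-encoded query string for a field facet on {@code field}. */
    static String facetQueryString(String field) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "*:*");             // match every document
        params.put("rows", "0");            // return facet counts only, no documents
        params.put("facet", "true");        // turn faceting on
        params.put("facet.field", field);   // the field whose values we want
        params.put("facet.limit", "-1");    // no cap on the number of values returned
        params.put("facet.mincount", "1");  // skip values with a zero count
        return params.entrySet().stream()
                .map(e -> e.getKey() + "="
                        + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        System.out.println("/select?" + facetQueryString("CC_FILE_EXT"));
    }
}
```

Running it prints /select?q=*%3A*&rows=0&facet=true&facet.field=CC_FILE_EXT&facet.limit=-1&facet.mincount=1, which can be appended to the URL of the core or collection being queried.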

On Fri, Jul 12, 2019 at 9:39 AM Steven White <[hidden email]> wrote:

Re: Getting list of unique values in a field

Steven White
Thanks, David.  Is there any SolrJ sample code showing how to do this?  I
need to see an example, or at least the API, so I know how to make the call.

Steven

On Fri, Jul 12, 2019 at 9:42 AM David Hastings <[hidden email]> wrote:

Re: Getting list of unique values in a field

David Hastings
I found this:

https://stackoverflow.com/questions/14485031/faceting-using-solrj-and-solr4

and this:

https://www.programcreek.com/java-api-examples/?api=org.apache.solr.client.solrj.response.FacetField

Both came up from a quick Google search.
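
A minimal SolrJ sketch of a field-facet request, reading the result through the FacetField class named in the second link. It assumes SolrJ 8.x on the classpath (HttpSolrClient.Builder) and a reachable Solr instance. The core URL, class name, and method name are placeholders for illustration, not anything taken from this thread.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.client.solrj.response.QueryResponse;

// Hypothetical helper class; only the SolrJ calls themselves are real API.
final class UniqueFieldValues {

    /** Returns every distinct value of {@code field}, as reported by a field facet. */
    static List<String> uniqueValues(SolrClient client, String field)
            throws SolrServerException, IOException {
        SolrQuery query = new SolrQuery("*:*"); // match every document
        query.setRows(0);                       // facet counts only, no documents
        query.setFacet(true);
        query.addFacetField(field);
        query.setFacetLimit(-1);                // return all values, not just the top ones
        query.setFacetMinCount(1);              // skip values with a zero count

        QueryResponse response = client.query(query);
        List<String> values = new ArrayList<>();
        FacetField facet = response.getFacetField(field);
        if (facet != null) {
            for (FacetField.Count count : facet.getValues()) {
                values.add(count.getName());    // count.getCount() holds the document count
            }
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // Placeholder URL: point this at your own core or collection.
        String coreUrl = "http://localhost:8983/solr/mycore";
        try (SolrClient client = new HttpSolrClient.Builder(coreUrl).build()) {
            uniqueValues(client, "CC_FILE_EXT").forEach(System.out::println);
        }
    }
}
```

With roughly 300 distinct extensions the response stays small, and Solr computes the counts on the server side, so the client never iterates over the indexed documents.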

On Fri, Jul 12, 2019 at 9:46 AM Steven White <[hidden email]> wrote:

Re: Getting list of unique values in a field

Jason Gerlowski
The Solr ref-guide has examples which show how to do this too.  Take a
look at some of the faceting examples here:
https://lucene.apache.org/solr/guide/8_1/json-facet-api.html#bucketing-facet-example
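
As a rough illustration of that style of request, the sketch below builds a JSON request body with a single terms facet on the field from this thread. It uses only the JDK. The class name, method name, and the facet label "file_extensions" are made up for illustration, and the field name is assumed to contain no characters that need JSON escaping.

```java
// Hypothetical helper: builds a JSON request body containing one terms facet.
final class JsonFacetRequest {

    /** Returns a request body asking for all distinct values of {@code field}. */
    static String termsFacetBody(String field) {
        return String.format(
                "{\"query\":\"*:*\",\"limit\":0,"          // match everything, return no documents
                + "\"facet\":{\"file_extensions\":{"       // arbitrary label for this facet
                + "\"type\":\"terms\",\"field\":\"%s\","   // bucket by the field's values
                + "\"limit\":-1}}}",                       // -1 lifts the default cap of 10 buckets
                field);
    }

    public static void main(String[] args) {
        System.out.println(termsFacetBody("CC_FILE_EXT"));
    }
}
```

The printed body can be POSTed to a core's /query handler with Content-Type application/json. The distinct values then come back as the val entries of facets.file_extensions.buckets in the response.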

Best,

Jason

On Fri, Jul 12, 2019 at 9:50 AM David Hastings <[hidden email]> wrote: