Why Lucene's Suggest API can ONLY load field terms which is Store.YES?


小鱼儿-2
I have a document field `category` whose value is a string delimited by any
of the separators "|,;". At indexing time I manually split the value into
atomic terms and index each one as a StringField. I also add a StoredField
with the same name that holds the original value:

    List<String> terms = analyzer.analysis((String) fieldValue);
    for (String term : terms) {
        doc.add(new StringField(fieldName, term, Store.NO));
    }
    doc.add(new StoredField(fieldName, (String) fieldValue));

Then I use the Suggest API to load all of this field's terms:

    Set<String> terms = new HashSet<String>();
    DocumentDictionary dict = new DocumentDictionary(this.indexReader, fieldName, null);
    InputIterator it;
    try {
        it = dict.getEntryIterator();
        BytesRef byteRef = null;
        while ((byteRef = it.next()) != null) {
            String term = byteRef.utf8ToString();
            terms.add(term);
        }
    } catch (IOException e) {
        e.printStackTrace();
        log.error(e.getMessage(), e);
    }

To my surprise, the iterator seems to return only the STORED value, i.e. the
original form, whereas I expected the terms I put into each StringField!

Is this a design oversight or an implementation limitation?

Re: Why Lucene's Suggest API can ONLY load field terms which is Store.YES?

Mikhail Khludnev-2
Hello,
It's by design: StringFields are searchable and hold the analysis output,
while StoredFields hold the input values that are returned with results.
That's it.
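
To make the distinction concrete, here is a minimal toy model in plain Java. It is not Lucene code, and the class and method names (`FieldStorageSketch`, `indexedTerms`, `storedValues`) are made up for illustration. It only sketches the idea that one field name can refer to two independent structures: a set of indexed terms that searches match against, and a per-document stored value that is handed back unchanged.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

/** Toy model (not Lucene code): one field name, two independent structures. */
public class FieldStorageSketch {
    // Indexed side: the unique terms that searches match against.
    private final SortedSet<String> indexedTerms = new TreeSet<>();
    // Stored side: the raw value kept per document and returned with hits.
    private final List<String> storedValues = new ArrayList<>();

    /** Adds one document: split terms go to the index, the raw value to storage. */
    public void addDocument(String rawValue) {
        for (String term : rawValue.split("[|,;]")) {
            if (!term.isEmpty()) {
                indexedTerms.add(term);
            }
        }
        storedValues.add(rawValue);
    }

    /** What a dictionary built on indexed terms would enumerate. */
    public SortedSet<String> indexedTerms() {
        return indexedTerms;
    }

    /** What a dictionary built on stored fields would enumerate. */
    public List<String> storedValues() {
        return storedValues;
    }

    public static void main(String[] args) {
        FieldStorageSketch category = new FieldStorageSketch();
        category.addDocument("books|fiction");
        category.addDocument("fiction;poetry");
        System.out.println(category.indexedTerms()); // [books, fiction, poetry]
        System.out.println(category.storedValues()); // [books|fiction, fiction;poetry]
    }
}
```

In this picture, DocumentDictionary corresponds to the stored side (its documentation notes that the term field has to be stored), which would explain why the original post saw the unsplit values.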



--
Sincerely yours
Mikhail Khludnev

Re: Why Lucene's Suggest API can ONLY load field terms which is Store.YES?

小鱼儿-2
But I am very confused by this design: if I can search on an indexed field,
the terms must be stored somewhere, so shouldn't I be able to get these
terms as a Dictionary?

The Lucene docs say the same field name is used for both kinds of index data
when Store.YES is set, which seems to treat them as the same thing. Here I
have to use two field names to work around this confusing, self-contradictory
design...


Re: Why Lucene's Suggest API can ONLY load field terms which is Store.YES?

Mikhail Khludnev-2
On Fri, Dec 27, 2019 at 12:12 PM 小鱼儿 <[hidden email]> wrote:

> But I am very confused by this design: if I can search on an indexed field,
> the terms must be stored somewhere, so shouldn't I be able to get these
> terms as a Dictionary?
>
Right. Here you go: MultiTerms.getTerms()
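
As a rough sketch of what that could look like, assuming Lucene 8.x (where `MultiTerms.getTerms(IndexReader, String)` is available) and using a hypothetical helper class named `IndexedTermsLister`, the indexed terms of a field can be enumerated with a `TermsEnum`:

```java
import java.io.IOException;
import java.util.Set;
import java.util.TreeSet;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiTerms;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.util.BytesRef;

/** Hypothetical helper (not part of Lucene): lists a field's indexed terms. */
public final class IndexedTermsLister {
    private IndexedTermsLister() {}

    /** Collects the unique indexed terms of a field across all segments. */
    public static Set<String> listIndexedTerms(IndexReader reader, String fieldName)
            throws IOException {
        Set<String> result = new TreeSet<>();
        Terms terms = MultiTerms.getTerms(reader, fieldName);
        if (terms == null) {
            // The field does not exist or has no indexed terms.
            return result;
        }
        TermsEnum termsEnum = terms.iterator();
        BytesRef bytes;
        while ((bytes = termsEnum.next()) != null) {
            result.add(bytes.utf8ToString());
        }
        return result;
    }
}
```

Called as `IndexedTermsLister.listIndexedTerms(indexReader, fieldName)`, this should return the split terms added as StringFields rather than the stored original value. If the goal is to feed a suggester, `LuceneDictionary` from the suggest module is probably the closer fit, since it exposes a field's indexed terms through the same Dictionary interface that DocumentDictionary implements.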


>
> The Lucene docs say the same field name is used for both kinds of index data
> when Store.YES is set, which seems to treat them as the same thing. Here I
> have to use two field names to work around this confusing,
> self-contradictory design...
>
It might seem so. Almost everyone has gone through these doubts. I like to
quote this talk: https://www.youtube.com/watch?v=T5RmMNDR5XI




