regarding FieldSelector

regarding FieldSelector

is_maximum
Hi all,

Can anyone explain what FieldSelector is, and what the usage or benefits of
this structure are? I read the javadocs, but I can't work out what purpose
it serves in Lucene.

Thanks in advance

--
Regards,
Mohammad
--------------------------
see my blog: http://brainable.blogspot.com/
another in Persian: http://fekre-motefavet.blogspot.com/

Re: regarding FieldSelector

Grant Ingersoll-2
Hi Mohammad,

The typical use cases are:
1. You have several small fields used in a results display and one or
two large fields (e.g. the original document), and you don't want to
pay the cost of loading the large fields for results display because
most of them won't be chosen. When a result is chosen, the lazily
loaded field will be retrieved.

2. You only want to load certain fields, or the first field, or you  
just want to know the size of a field.

Basically, it gives you control over how fields are loaded from disk  
in Lucene.
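
To make the idea concrete, here is a minimal plain-Java sketch of what a field selector does. It does not use Lucene: `Decision`, `Selector`, and `loadDocument` are hypothetical stand-ins, and each `Supplier` merely simulates a disk read. In Lucene itself the corresponding pieces are the `FieldSelector` interface, the `FieldSelectorResult` constants (such as `LOAD`, `LAZY_LOAD`, and `NO_LOAD`), and `IndexReader.document(int, FieldSelector)`; check the javadocs of your version for the exact behaviour.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Plain-Java model of selective field loading; none of these names are Lucene's. */
class FieldSelectorSketch {

    /** Hypothetical per-field decision: read now, read on demand, or skip. */
    enum Decision { LOAD, LAZY_LOAD, NO_LOAD }

    /** Hypothetical selector: asked once per stored field name. */
    interface Selector {
        Decision accept(String fieldName);
    }

    /**
     * Builds the in-memory view of one document.
     * "stored" stands in for the stored fields on disk; calling a Supplier simulates a read.
     */
    static Map<String, Supplier<String>> loadDocument(
            Map<String, Supplier<String>> stored, Selector selector) {
        Map<String, Supplier<String>> doc = new LinkedHashMap<>();
        for (Map.Entry<String, Supplier<String>> e : stored.entrySet()) {
            switch (selector.accept(e.getKey())) {
                case LOAD: {
                    String value = e.getValue().get();  // read immediately
                    doc.put(e.getKey(), () -> value);
                    break;
                }
                case LAZY_LOAD:
                    doc.put(e.getKey(), e.getValue());  // read only if someone asks
                    break;
                case NO_LOAD:
                    break;                              // field is left out entirely
            }
        }
        return doc;
    }

    public static void main(String[] args) {
        int[] reads = {0};
        Map<String, Supplier<String>> stored = new LinkedHashMap<>();
        stored.put("title", () -> { reads[0]++; return "A short title"; });
        stored.put("body", () -> { reads[0]++; return "A very large original document"; });
        stored.put("internal", () -> { reads[0]++; return "not needed for display"; });

        Selector forResultsList = name ->
                name.equals("title") ? Decision.LOAD
              : name.equals("body") ? Decision.LAZY_LOAD
              : Decision.NO_LOAD;

        Map<String, Supplier<String>> doc = loadDocument(stored, forResultsList);
        System.out.println("reads after loading the document: " + reads[0]);
        doc.get("body").get();
        System.out.println("reads after asking for the body: " + reads[0]);
    }
}
```

In this model the large field costs nothing until a caller asks for it, and the skipped field never appears in the returned document at all.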

See my ApacheCon Europe presentation
http://cnlp.org/presentations/slides/AdvancedLuceneEU.pdf
for a few slides (towards the end) on FieldSelector.

--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ




Re: regarding FieldSelector

is_maximum
Hi Grant,
Thank you very much for your nice document about advanced Lucene; it was
very useful to me.

As I understand it, we can mark some large fields to be loaded lazily. My
question is: when are they actually loaded? It would make sense for a field
to be loaded from the index when we call doc.get("field_name"). Am I right?

In my application, I've provided a result set structure to navigate between
results and documents. It provides a get(String fieldname) method, just like
java.sql.ResultSet.getString(), and it also implements HitCollector in order
to collect my own ID rather than Lucene's document ID. So I think I can set
my ID field to always be loaded and the other fields to be loaded lazily.
Does this improve the search process?

Again, thank you very much indeed.



Re: regarding FieldSelector

Erick Erickson
Well, it depends on what "improve the search process" means
in your context <G>..

But I had a case similar to yours that I wrote up in the Wiki where
my search times improved about 10X by using lazy loading. You
might want to read that entry here...

http://wiki.apache.org/lucene-java/FieldSelectorPerformance

Note the peculiar characteristics of my data set; I really suspect
that a 10x improvement in retrieval speed is atypical...

As for when lazily-loaded fields actually get loaded, I didn't really
have to explore it very fully, but a short experiment should do it
for you.....
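
One way to set up such an experiment is to count how often the underlying storage is touched. The following is a minimal plain-Java model, not Lucene code: `LazyValue` is a hypothetical stand-in for a lazily loaded field, and caching the value after the first read is an assumption of this sketch. Against a real index you would do the equivalent with a document obtained through a `FieldSelector` that returns `FieldSelectorResult.LAZY_LOAD`, and watch in a debugger when the read actually happens.

```java
import java.util.function.Supplier;

/** Plain-Java model of a lazily loaded value; LazyValue is a hypothetical name. */
class LazyLoadExperiment {

    static final class LazyValue {
        private final Supplier<String> reader; // simulates the disk read
        private String cached;
        private boolean loaded;
        private int loadCount;

        LazyValue(Supplier<String> reader) {
            this.reader = reader;
        }

        /** Reads on the first call only; later calls reuse the cached value. */
        String get() {
            if (!loaded) {
                cached = reader.get();
                loaded = true;
                loadCount++;
            }
            return cached;
        }

        /** How many times the simulated read has run so far. */
        int loadCount() {
            return loadCount;
        }
    }

    public static void main(String[] args) {
        LazyValue body = new LazyValue(() -> "large field contents");
        System.out.println("loads before access: " + body.loadCount());
        body.get();
        body.get();
        System.out.println("loads after two accesses: " + body.loadCount());
    }
}
```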

Best
Erick


Re: regarding FieldSelector

is_maximum
Thanks.
From what I saw in the documentation, we can only use this great field
selector with the IndexReader.document() method. The problem is that my
result set structure holds a Searcher, and when the client calls
getString("a_field_name") I invoke
searcher.doc(current_doc_id).get("a_field_name"); by then I have already
collected the result IDs. So in my case, I can't use FieldSelector.

Do I have to revise the way I retrieve documents in my code?




Re: regarding FieldSelector

Erick Erickson
Do you have any evidence that you're having a performance issue? If
not, I'd just do the simple thing and ignore the rest. The performance
issues I found were because I was spinning through many, many
documents. If you're only worrying about one document at a time,
it may not be an issue.

If you *are* having performance issues, I'd *strongly* recommend
that you measure to find out where the problem is before trying
a solution. Otherwise you'll optimize code that isn't the problem.
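
A minimal way to measure, sketched in plain Java: time the suspect step in isolation before changing anything. `averageNanos` and the `retrieveFields` placeholder are hypothetical names; in a real test the placeholder would be your own retrieval call, and a profiler will likely give a more reliable picture than a hand-rolled timer like this one.

```java
/** Crude timing harness; a starting point, not a substitute for a profiler. */
class RetrievalTiming {

    /** Runs the task a few times to warm up, then returns the mean time per run in nanoseconds. */
    static long averageNanos(Runnable task, int warmupRuns, int measuredRuns) {
        if (measuredRuns <= 0) {
            throw new IllegalArgumentException("measuredRuns must be positive");
        }
        for (int i = 0; i < warmupRuns; i++) {
            task.run();
        }
        long start = System.nanoTime();
        for (int i = 0; i < measuredRuns; i++) {
            task.run();
        }
        return (System.nanoTime() - start) / measuredRuns;
    }

    public static void main(String[] args) {
        // Hypothetical placeholder for the step under suspicion,
        // e.g. fetching the display fields for one page of hits.
        Runnable retrieveFields = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 1_000; i++) {
                sb.append(i);
            }
        };
        long nanos = averageNanos(retrieveFields, 100, 1_000);
        System.out.println("mean time per run: " + nanos + " ns");
    }
}
```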

Best
Erick


Re: regarding FieldSelector

is_maximum
Well, actually, I have 5 index directories, and that number will increase in
the future. Each document has about 20 fields on average. Considering that
many users may connect to the system (we anticipate 500 users at this time),
I want to know whether this will cause a performance issue or not.

We provide a feature that lets users select which fields they want displayed,
so I know that only 5 or 6 fields are important to my users. What I don't
know is whether the approach I described in my last email,
searcher.doc(doc_id).get("field_name"), makes Lucene load all fields of the
document or only the named one. If all the fields are loaded, I think it's
better to make them lazy.

What do you suggest?

Thanks



Re: regarding FieldSelector

Erick Erickson
I'm not entirely sure. So what I'd do if I were you is write a
little test program and step through it in the debugger and
see <G>....

But, if you're only allowing the user to fetch a single document
at a time, I don't think it matters enough to worry about. If, on the
other hand, you're allowing the user to display some combination
of, say, 5 fields for a *list* of documents, I'd make them all lazy
and then you can write a HitCollector to get the list "lazily".
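
A plain-Java sketch of that idea for a paged results list: keep only the hit IDs from the search, then fetch display fields just for the page being shown. Nothing here is Lucene API; `FieldLoader` and `loadPage` are hypothetical, and the loader stands in for whatever call retrieves a restricted set of fields for one document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model: collect hit IDs first, load display fields one page at a time. */
class PagedFieldLoading {

    /** Hypothetical hook: loads only the requested fields for one document ID. */
    interface FieldLoader {
        Map<String, String> load(int docId, List<String> fieldNames);
    }

    /** Returns the rows for one zero-based page; documents on other pages are never loaded. */
    static List<Map<String, String>> loadPage(List<Integer> hitIds, int page, int pageSize,
                                               List<String> displayFields, FieldLoader loader) {
        if (page < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("page must be >= 0 and pageSize > 0");
        }
        List<Map<String, String>> rows = new ArrayList<>();
        long from = (long) page * pageSize;
        if (from >= hitIds.size()) {
            return rows; // page lies past the last hit
        }
        int start = (int) from;
        int end = (int) Math.min((long) hitIds.size(), from + pageSize);
        for (int docId : hitIds.subList(start, end)) {
            rows.add(loader.load(docId, displayFields));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Integer> hitIds = Arrays.asList(7, 3, 42, 19, 8);      // as gathered by a collector
        List<String> displayFields = Arrays.asList("id", "title");  // the fields users picked
        FieldLoader fake = (docId, names) -> {
            Map<String, String> row = new HashMap<>();
            row.put("id", String.valueOf(docId));
            row.put("title", "doc " + docId);
            return row;
        };
        List<Map<String, String>> secondPage = loadPage(hitIds, 1, 2, displayFields, fake);
        System.out.println(secondPage.size() + " rows, first id = " + secondPage.get(0).get("id"));
    }
}
```

The point of the design is that the cost of loading fields scales with the page size rather than with the total number of hits.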

Best
Erick


Re: regarding FieldSelector

is_maximum
actually, I show the result with pagination support, and they have option to
choose the number of records per page. and yes, I should provide a test
program, but about the HitCollector, I already created one, and collect all
lucene's document id and also my needed ID that stored in the index

>> you can write a HitCollector to get the list "lazily".

Do you mean that, by writing a HitCollector, the whole list will be loaded
lazily and there is no need to use FieldSelector?
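
A minimal sketch of one way to read the suggestion quoted above, assuming the collector only records document IDs during the search and stored fields are fetched afterwards, for just the page being displayed. This is plain Java, not Lucene code: `IdCollector`, `PagingDemo`, `page` and the loader callback are hypothetical names invented for illustration, and `collect(int, float)` only mirrors the shape of the HitCollector callback.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

/** Stand-in for a hit collector: records matching document ids, loads nothing. */
class IdCollector {
    private final List<Integer> docIds = new ArrayList<>();

    /** Called once per matching document during the search. */
    void collect(int docId, float score) {
        docIds.add(docId);
    }

    /** Loads stored values only for the hits on the requested zero-based page. */
    <T> List<T> page(int pageIndex, int pageSize, IntFunction<T> loader) {
        if (pageIndex < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("pageIndex must be >= 0 and pageSize > 0");
        }
        List<T> rows = new ArrayList<>();
        int from = Math.min(pageIndex * pageSize, docIds.size());
        int to = Math.min(from + pageSize, docIds.size());
        for (int i = from; i < to; i++) {
            rows.add(loader.apply(docIds.get(i)));
        }
        return rows;
    }
}

class PagingDemo {
    /** Returns how many documents were actually loaded to show one page. */
    static int loadsForPage(int hits, int pageIndex, int pageSize) {
        IdCollector collector = new IdCollector();
        for (int doc = 0; doc < hits; doc++) {
            collector.collect(doc, 1.0f);        // the search phase touches ids only
        }
        int[] loads = {0};
        collector.page(pageIndex, pageSize, docId -> {
            loads[0]++;                          // stands in for fetching the stored document
            return "doc-" + docId;
        });
        return loads[0];
    }
}
```

In this sketch the collector decides which documents are fetched, while a field selector would decide which fields of each fetched document are read, so the two can be used together rather than one replacing the other.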


thanks

On 9/13/07, Erick Erickson <[hidden email]> wrote:

>
> I'm not entirely sure. So what I'd do if I were you is write a
> little test program and step through it in the debugger and
> see <G>....
>
> But, if you're only allowing the user to fetch a single document
> at a time, I don't think it matters enough to worry about. If, on the
> other hand, you're allowing the user to display some combination
> of, say, 5 fields for a *list* of documents, I'd make them all lazy
> and then you can write a HitCollector to get the list "lazily".
>
> Best
> Erick
>
>

Re: regarding FieldSelector

Grant Ingersoll-4
In reply to this post by is_maximum
Searcher is a Searchable and Searchable defines the doc() method with  
FieldSelector, but I suppose we could add an abstract declaration of  
it to Searcher, since it has to be implemented on all derived  
classes anyway due to it being on the Searchable interface.

So, you can either cast to a known Searcher or I suppose you can  
figure out a way to get the IndexReader.  What kind of Searcher are  
you using?

-Grant

On Sep 13, 2007, at 4:50 AM, Mohammad Norouzi wrote:

> Thanks.
> From what I saw in the documentation, we can only use this great field
> selector in the IndexReader.document() method. The problem is that I have
> a Searcher in my result set structure, and when the client calls
> getString("a_field_name"), at that time I invoke
> searcher.doc(current_doc_id).get("a_field_name").
> I have already collected the result IDs, so in my case I can't use
> FieldSelector.
>
> Do I have to revise the way I retrieve documents in my code?
>
>
>
> On 9/12/07, Erick Erickson <[hidden email]> wrote:
>>
>> Well, it depends on what "improve the search process" means
>> in your context <G>..
>>
>> But I had a case similar to yours that I wrote up in the Wiki where
>> my search times improved about 10X by using lazy loading. You
>> might want to read that entry here...
>>
>> http://wiki.apache.org/lucene-java/FieldSelectorPerformance
>>
>> Note the peculiar characteristics of my data set, I really suspect
>> that a 10x improvement in retrieval speed is atypical...
>>
>> As for when lazily-loaded fields actually get loaded, I didn't really
>> have to explore it very fully, but a short experiment should do it
>> for you.....
>>
>> Best
>> Erick
>>
>> On 9/12/07, Mohammad Norouzi <[hidden email]> wrote:
>>>
>>> Hi Grant,
>>> Thank you very much for your nice document about advanced Lucene. It
>>> was very useful for me.
>>>
>>> As I understand it, we can set some large fields to be lazily loaded.
>>> Now my question is: when will they be loaded? It makes sense that when
>>> we call doc.get("field_name") the field will be loaded from the index.
>>> Am I right?
>>>
>>> In my application, I've provided a result set structure to navigate
>>> between results and documents. It provides a get(String fieldname)
>>> method, just like the java.sql.ResultSet.getString() method, and this
>>> result set also implements HitCollector in order to collect my own ID
>>> rather than Lucene's document id. So I think I can set my ID field to
>>> always be loaded and the other fields to be lazily loaded. Does this
>>> improve the search process?
>>>
>>> Again, thank you very much indeed.

------------------------------------------------------
Grant Ingersoll
http://www.grantingersoll.com/
http://lucene.grantingersoll.com
http://www.paperoftheweek.com/




multiple Tokens in a Tokengroup not matching

Dave Schneider
Hi,

I've inherited some Lucene 1.9.1 code, and have run into the following
problem:

I have a TokenGroup with multiple tokens in it, and a query that should
match against multiple tokens (e.g. X and Y) in the TokenGroup.  
However, when I look in the Hit that results, I see that one of the
Tokens in the TokenGroup has a weight of 1.0, while all the rest have a
weight of 0.  If I run a search with just X in the query, it matches the
TokenGroup, and when I run a search with just Y in the query, it also
matches the TokenGroup, just as I'd expect.  But a query that includes
both X and Y looks like only X matches.  I need to see that both X and Y
matched in order to get my highlighting to work correctly.

Can anyone provide any hints as to what might be going on here, and what
I might do to fix it?  We have a vague suspicion that the weight of 1.0
on the matching token is causing Lucene to not bother with any other
tokens (because the weight for the TokenGroup is already as high as it
can be), but it's just a suspicion.

Thanks,

Dave Schneider



Re: regarding FieldSelector

is_maximum
In reply to this post by Grant Ingersoll-4
Well, I can't see any doc() method with a FieldSelector argument; perhaps
it is only provided in the nightly builds of Lucene. I am currently using
Lucene v2.1.0. My searcher is an org.apache.lucene.search.Searcher, which I
instantiate with new IndexSearcher(a_directory).





Re: regarding FieldSelector

hossman

: well, I can't see any doc() method with FieldSelector argument, perhaps this
: is provided in nightly builds of Lucene, currently I am using Lucene v2.1.0

2.2 was released in June; in it, the Searchable interface defines a doc
method which takes a FieldSelector...

http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/search/Searchable.html#doc(int,%20org.apache.lucene.document.FieldSelector)
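
To make the idea behind that method concrete, here is a rough, self-contained model of per-field loading: a selector is asked about each field name and answers load now, load lazily, or skip. It is only a sketch in plain Java. `LoadMode`, `Selector`, `ToyStore` and `ToyDoc` are stand-ins defined here for illustration, not Lucene classes, and the `reads` counter merely represents disk reads.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Per-field decision: read now, read on first access, or skip. */
enum LoadMode { LOAD, LAZY_LOAD, NO_LOAD }

/** Stand-in for a field selector: maps a field name to a loading decision. */
interface Selector {
    LoadMode accept(String fieldName);
}

/** Toy store holding one document and counting how many field values were read. */
class ToyStore {
    private final Map<String, String> stored = new HashMap<>();
    int reads = 0;

    void put(String field, String value) {
        stored.put(field, value);
    }

    private String read(String field) {
        reads++;                                  // each call represents one disk read
        return stored.get(field);
    }

    /** Builds a document view, reading each field eagerly, lazily or not at all. */
    ToyDoc doc(Selector selector) {
        Map<String, Supplier<String>> fields = new HashMap<>();
        for (String name : stored.keySet()) {
            LoadMode mode = selector.accept(name);
            if (mode == LoadMode.LOAD) {
                String value = read(name);        // read immediately
                fields.put(name, () -> value);
            } else if (mode == LoadMode.LAZY_LOAD) {
                fields.put(name, new Supplier<String>() {
                    private String cached;
                    private boolean loaded;

                    public String get() {         // read on first access only
                        if (!loaded) {
                            cached = read(name);
                            loaded = true;
                        }
                        return cached;
                    }
                });
            }                                     // NO_LOAD: the field is left out
        }
        return new ToyDoc(fields);
    }
}

/** Document view returned by the toy store. */
class ToyDoc {
    private final Map<String, Supplier<String>> fields;

    ToyDoc(Map<String, Supplier<String>> fields) {
        this.fields = fields;
    }

    /** Returns the field value, or null if the field was not selected. */
    String get(String name) {
        Supplier<String> field = fields.get(name);
        return field == null ? null : field.get();
    }
}
```

With a selector that loads a small ID field eagerly and a large body field lazily, building the document costs one read, and the body is read only if and when `get("body")` is called. That corresponds to the results-display use case described earlier in the thread.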


-Hoss



Re: multiple Tokens in a Tokengroup not matching

Erick Erickson
In reply to this post by Dave Schneider
I really can't tell much from your e-mail. What I'd recommend is
that you get a copy of Luke and examine your index (including
your query and its results). Also, try query.toString() to see what
the actual query submitted to Lucene is; that may give you some
clues as to what's going on.

If the results make no sense, think about posting the results of
the toString and/or the Luke explain to the list and I suspect
you'll get a more helpful response.

Imagine that you had been given your e-mail while knowing nothing
about the application. There's not much "there" there <G>. More
detail would help us give you a more helpful response....

That said, it's tough getting a bunch of code dropped on your
head and being told "fix it". Been there, done that. Not much fun.

Best
Erick

