
How to get documents efficiently, or FieldCache example

How to get documents efficiently, or FieldCache example

neerajshah84
Hello everyone,

I am using Lucene 3.6 and have to index around 60k documents. After performing a search, retrieving the matching documents with searcher.doc(docId) slows the whole search down. Is there any other way to get the documents?

Also, can anyone give me an end-to-end example of a working FieldCache? While implementing the cache I have:

int[] fieldIds = FieldCache.DEFAULT.getInts(indexMultiReader, "id");

but I don't know how to use fieldIds from here to improve the search. Please give me an end-to-end example.

Thanks,
Neeraj
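One common Lucene 3.x pattern for the FieldCache question above is to read the cached per-segment array inside a custom Collector, instead of calling searcher.doc(docId) once per hit. The sketch below is illustrative only (it is not from the thread) and assumes an indexed, single-token integer field named "id"; it targets the Lucene 3.x API:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.FieldCache;
import org.apache.lucene.search.Scorer;

// Collects the cached "id" value of every hit without loading stored fields.
public class IdCollector extends Collector {
    private int[] ids;  // FieldCache array for the current segment
    private final List<Integer> collectedIds = new ArrayList<Integer>();

    @Override
    public void setScorer(Scorer scorer) {
        // scores are not needed here
    }

    @Override
    public void setNextReader(IndexReader reader, int docBase) throws IOException {
        // Cheap after the first call: the array is cached per segment reader.
        ids = FieldCache.DEFAULT.getInts(reader, "id");
    }

    @Override
    public void collect(int doc) {
        collectedIds.add(ids[doc]);  // doc is segment-relative, matching the array
    }

    @Override
    public boolean acceptsDocsOutOfOrder() {
        return true;
    }

    public List<Integer> getCollectedIds() {
        return collectedIds;
    }
}
```

A query can then be run as `searcher.search(query, collector)`, and the ids read back from the collector without touching stored fields at all.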

Re: How to get documents efficiently, or FieldCache example

Adrien Grand
IndexSearcher.doc is the right way to retrieve documents. If it is slowing
things down for you, I wonder whether you might be fetching too many
results?


Re: How to get documents efficiently, or FieldCache example

neerajshah84
Yes, I am fetching around 5 lakh (500,000) results from the index searcher.
I am also indexing each line of each file, because when searching I need
all the lines of a file that contain the matched term.
Please tell me whether I am doing this right.
{code}
BufferedReader bufr = new BufferedReader(new InputStreamReader(
        new BufferedInputStream(new FileInputStream(file))));
try {
    // One Lucene document per line of the file.
    String inputLine;
    while ((inputLine = bufr.readLine()) != null) {
        Document doc = new Document();
        doc.add(new Field("contents", inputLine, Field.Store.YES,
                Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
        doc.add(new Field("title", section, Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.add(new Field("fieldsort", rem, Field.Store.YES, Field.Index.ANALYZED));
        doc.add(new Field("fieldsort2",
                rem.toLowerCase().replaceAll("-", "").replaceAll(" ", ""),
                Field.Store.YES, Field.Index.ANALYZED));
        doc.add(new Field("field1", Author, Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.add(new Field("field2", Book, Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.add(new Field("field3", sec, Field.Store.YES, Field.Index.NOT_ANALYZED));
        writer.addDocument(doc);
    }
} finally {
    bufr.close();
}
{code}


Re: How to get documents efficiently, or FieldCache example

Adrien Grand
Lucene is not designed for retrieving that many results. What are you doing
with those 5 lakh documents? I suspect that is too many to display, so you
probably perform some computation on them? If so, maybe you could move that
computation into Lucene, e.g. using facets. If that does not work, I'm
afraid Lucene is not the right tool for your problem.


Re: How to get documents efficiently, or FieldCache example

neerajshah84
Then which is the right tool for text searching in files? Can you please
suggest one?



RE: How to get documents efficiently, or FieldCache example

Uwe Schindler
Hi,

For full-text search, Lucene is the right tool. The problem is that inverted indexes, and the software built on top of them (like Lucene), are optimized to return the best-ranking results very fast. That is what users normally want, e.g. when they search Google: you get a page with 10 or 20 results displayed. This can be done very fast, so Lucene quickly collects those 20 documents, and retrieving the values from their stored fields is cheap.

The problem arises when you want to get all results! In most cases that is also not really what you want: the low-ranking results at the end are usually not interesting, so you won't fetch them from the index. Retrieving the first 10 or 20 is fast.

FYI, try it out on Google: you can page through the results, but at some point it will not let you dig deeper into the result set (it is impossible to show results after offset 200 / page 20). Lucene is similar: fetching all results is discouraged, because it gets slower and slower the deeper you dive.

Lucene has some workarounds like "searchAfter", but they do not solve the problem that fetching the stored values is slow.

Uwe

-----
Uwe Schindler
Achterdiek 19, D-28357 Bremen
http://www.thetaphi.de
eMail: [hidden email]
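The "searchAfter" workaround mentioned above looks roughly like the sketch below (illustrative only; IndexSearcher.searchAfter is available since Lucene 3.5, and the `searcher` and `query` objects are assumed to already exist):

```java
import java.io.IOException;

import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

// Sketch: deep paging with IndexSearcher.searchAfter. Each call resumes
// collecting after the previous page's last hit, so earlier pages are not
// re-collected on every request.
public class DeepPaging {
    static void pageThrough(IndexSearcher searcher, Query query) throws IOException {
        ScoreDoc last = null;
        while (true) {
            TopDocs page = (last == null)
                    ? searcher.search(query, 100)
                    : searcher.searchAfter(last, query, 100);
            if (page.scoreDocs.length == 0) {
                break;  // no more hits
            }
            for (ScoreDoc sd : page.scoreDocs) {
                // process sd.doc here; calling searcher.doc(sd.doc) for every
                // hit is still the expensive part when the result set is huge
            }
            last = page.scoreDocs[page.scoreDocs.length - 1];
        }
    }
}
```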



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: How to get documents efficiently, or FieldCache example

Chris Hostetter-3
In reply to this post by neerajshah84

: then which one is right tool for text searching in files. please can you
: suggest me?

so far all you've done is show us your *indexing* code; and said that
after you do a search, calling searcher.doc(docid) on 500,000 documents is
slow.

But you still haven't described the use case you are trying to solve -- ie:
*WHY* do you want these 500,000 results from your search? Once you get
those Documents back, *WHAT* are you going to do with them?

If you show us some code, and talk us through your goal, then we can help
you -- otherwise all we can do is warn you that the specific
searcher.doc(docid) API isn't designed to be efficient at that large a
scale.  Other APIs in Lucene are designed to be efficient at large scale,
but we don't really know what to suggest w/o knowing what you're trying to
do...

https://people.apache.org/~hossman/#xyproblem
XY Problem

Your question appears to be an "XY Problem" ... that is: you are dealing
with "X", you are assuming "Y" will help you, and you are asking about "Y"
without giving more details about the "X" so that we can understand the
full issue.  Perhaps the best solution doesn't involve "Y" at all?
See Also: http://www.perlmonks.org/index.pl?node_id=542341


PS: please, Please PLEASE upgrade to Lucene 6.x.  3.6 is more than 5 years
old, and completely unsupported -- any advice you are given on this list
is likely to refer to APIs that are completely different from the version
of Lucene you are working with.


-Hoss
http://www.lucidworks.com/



Re: How to get documents efficiently, or FieldCache example

neerajshah84
Hello,
Let me explain my case:
- Suppose I am searching for ("pain" (in same chapter) "head"). This is my query.
Now what I need to do is first search for "pain", then search for "head" separately, and then find the file names common to both search results.
The criteria are as follows. Suppose:

FileA - Chapter A - has only the word "*pain*"
FileB - Chapter B - has both the words "*head*" and "*pain*"
FileC - Chapter A - has only the word "*head*"
FileD - Chapter D - has only the word "*head*"
FileE - Chapter A - has only the word "*pain*"

Then the result should be:
FileA - Chapter A - has only the word "*pain*"
FileB - Chapter B - has both the words "*head*" and "*pain*"
FileC - Chapter A - has only the word "*head*"
FileE - Chapter A - has only the word "*pain*"

FileD - Chapter D - will not appear in the search result, because the
chapter name "Chapter D" does not match any chapter that contains both
search words.
In short, I have to show only chapters whose name is shared with a chapter
containing both search words; within those chapters, a file qualifies if it
has both search words or at least one of them. But the chapter name must be
the same.

That requirement is why I was parsing all hits for "pain" and "head"
separately, then collecting the common "title" (chapter name) from both
results, plus the results that have at least one search word and the same
chapter name.
In my data, the word "pain" alone has 5 lakh (500,000) results, and "head"
has 60k results.

Please suggest another approach if you have one in mind.

Thanks,
Neeraj
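The post-processing described here (collect hits for each word, then keep only files whose chapter name has hits for both words) can be sketched in plain Java, independently of Lucene. The `Hit` type below is hypothetical, standing in for whatever per-hit data is read back from the index:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ChapterFilter {
    // A search hit: the file it came from and the chapter name stored in "title".
    static final class Hit {
        final String file;
        final String chapter;
        Hit(String file, String chapter) { this.file = file; this.chapter = chapter; }
    }

    // Keep only hits whose chapter name has at least one hit for EACH word
    // (possibly in different files), one result entry per file.
    static List<Hit> commonChapters(List<Hit> painHits, List<Hit> headHits) {
        Set<String> painChapters = new HashSet<String>();
        for (Hit h : painHits) painChapters.add(h.chapter);
        Set<String> headChapters = new HashSet<String>();
        for (Hit h : headHits) headChapters.add(h.chapter);

        // Chapter names present in both result sets.
        Set<String> common = new HashSet<String>(painChapters);
        common.retainAll(headChapters);

        // Keep every hit (from either word) in a common chapter, deduped by file.
        Map<String, Hit> byFile = new LinkedHashMap<String, Hit>();
        for (Hit h : painHits) if (common.contains(h.chapter)) byFile.put(h.file, h);
        for (Hit h : headHits) if (common.contains(h.chapter)) byFile.put(h.file, h);
        return new ArrayList<Hit>(byFile.values());
    }
}
```

On the FileA..FileE example above, the common chapters are A and B, so FileA, FileB, FileC, and FileE survive while FileD (Chapter D, "head" only) is dropped.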







Re: How to get documents efficiently, or FieldCache example

Jacques Uber
Have you considered indexing chapters as documents? Using your example, you
would have three documents corresponding to your three chapters: A, B, and
D. Once you have that structure, the query "pain AND head" returns only
chapters A and B. Using the information gained from this new chapter index,
you could then query your existing line-level index with "pain AND head AND
(chapter:A OR chapter:B)".
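A rough sketch of that two-pass idea in Lucene 3.x query-API terms. All names here are illustrative: `chapterSearcher` searches the hypothetical chapter-level index, and the line-level index stores the chapter name in "title" as in the indexing code earlier in the thread. The second pass uses "pain OR head" inside the chapter filter, since the stated requirement keeps lines matching either word within a qualifying chapter:

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;

public class ChapterTwoPass {
    static BooleanQuery buildLineQuery(IndexSearcher chapterSearcher) throws Exception {
        // Pass 1: chapter names containing BOTH words, from the chapter-level index.
        BooleanQuery bothWords = new BooleanQuery();
        bothWords.add(new TermQuery(new Term("contents", "pain")), BooleanClause.Occur.MUST);
        bothWords.add(new TermQuery(new Term("contents", "head")), BooleanClause.Occur.MUST);
        TopDocs chapterHits = chapterSearcher.search(bothWords, 1000);

        // Pass 2: restrict the line-level search to those chapter names.
        BooleanQuery chapterFilter = new BooleanQuery();
        for (ScoreDoc sd : chapterHits.scoreDocs) {
            String chapter = chapterSearcher.doc(sd.doc).get("title");
            chapterFilter.add(new TermQuery(new Term("title", chapter)),
                    BooleanClause.Occur.SHOULD);
        }

        BooleanQuery eitherWord = new BooleanQuery();
        eitherWord.add(new TermQuery(new Term("contents", "pain")), BooleanClause.Occur.SHOULD);
        eitherWord.add(new TermQuery(new Term("contents", "head")), BooleanClause.Occur.SHOULD);

        BooleanQuery lineQuery = new BooleanQuery();
        lineQuery.add(eitherWord, BooleanClause.Occur.MUST);
        lineQuery.add(chapterFilter, BooleanClause.Occur.MUST);
        // Searching the line-level index with lineQuery now returns only lines
        // from chapters that contain both search words.
        return lineQuery;
    }
}
```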
