Question regarding the index files

yaishb
Hello,

I'm copying a question I asked on the users list, but I think it requires some patching effort, so maybe this list is more suitable. The question is as follows.

I'm using Lucene 2.4. I'm developing a web application that uses Lucene (via Compass) to do the searches.
I intend to deploy the application on Google App Engine (http://code.google.com/appengine/), which limits file sizes to less than 10 MB. I've read about the various policies supported by Lucene to limit file sizes, but no matter which policy and parameters I used, the index files still grew to far more than 10 MB. Looking at the code, I've managed to limit the .cfs files (by predicting the file size in CompoundFileWriter before closing the file). I guess that will degrade performance, but it's OK for now. But now the .fdt files are becoming huge (about 60 MB) and I can't identify a way to limit those files.

Is there a built-in, correct way to limit the size of these files? If not, can someone please direct me on how to tweak the source code to achieve that?

Thanks for any help.

Re: Question regarding the index files

Michael McCandless-2
I answered on java-user.  I think it can be done without
source code changes to Lucene.
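
The java-user answer is not reproduced in this thread, so the following is only an assumption about what a no-patch approach could look like: split each logical file at the storage layer so that no physical part exceeds the limit. The `ChunkedFile` class and its methods are hypothetical and are not part of Lucene; in a real setup the same idea would sit behind a custom storage implementation.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: stores one logical file as parts of at most maxPartSize bytes. */
class ChunkedFile {
    private final int maxPartSize;
    private final List<ByteArrayOutputStream> parts = new ArrayList<ByteArrayOutputStream>();

    ChunkedFile(int maxPartSize) {
        if (maxPartSize <= 0) {
            throw new IllegalArgumentException("maxPartSize must be positive");
        }
        this.maxPartSize = maxPartSize;
    }

    /** Appends bytes, starting a new part whenever the current one is full. */
    void write(byte[] data) {
        int offset = 0;
        while (offset < data.length) {
            ByteArrayOutputStream current = parts.isEmpty() ? null : parts.get(parts.size() - 1);
            if (current == null || current.size() == maxPartSize) {
                current = new ByteArrayOutputStream();
                parts.add(current);
            }
            int n = Math.min(maxPartSize - current.size(), data.length - offset);
            current.write(data, offset, n);
            offset += n;
        }
    }

    int partCount() {
        return parts.size();
    }

    int partSize(int index) {
        return parts.get(index).size();
    }

    /** Reads one byte at a logical position by locating the part that holds it. */
    byte readAt(long position) {
        int part = (int) (position / maxPartSize);
        int within = (int) (position % maxPartSize);
        // Copies the part on every call; acceptable for a sketch, not for production.
        return parts.get(part).toByteArray()[within];
    }
}
```

With a 10 MB part size, a 60 MB stored-fields file would end up as six parts, while readers still address it by a single logical offset.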

Mike

On Thu, Sep 10, 2009 at 2:39 AM, Dvora <[hidden email]> wrote:

>
> Hello,
>
> I'm copying a question I asked on the users list, but I think it requires
> some patching effort, so maybe this list is more suitable. The question
> is as follows.
>
> I'm using Lucene 2.4. I'm developing a web application that uses Lucene (via
> Compass) to do the searches.
> I intend to deploy the application on Google App Engine
> (http://code.google.com/appengine/), which limits file sizes to less
> than 10 MB. I've read about the various policies supported by Lucene to limit
> file sizes, but no matter which policy and parameters I used, the
> index files still grew to far more than 10 MB. Looking at the code, I've
> managed to limit the .cfs files (by predicting the file size in
> CompoundFileWriter before closing the file). I guess that will degrade
> performance, but it's OK for now. But now the .fdt files are becoming huge
> (about 60 MB) and I can't identify a way to limit those files.
>
> Is there a built-in, correct way to limit the size of these files? If not,
> can someone please direct me on how to tweak the source code to achieve
> that?
>
> Thanks for any help.

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


LowerCaseFilter, is there a reason why the class is final?

Daniel Shane-2
In reply to this post by yaishb
Hi all,

I was wondering why LowerCaseFilter is declared final. In my code, I
would like to extend it, but apparently it's not possible. I'm just
wondering why extending this type of class is considered evil.

Daniel Shane


RE: LowerCaseFilter, is there a reason why the class is final?

Uwe Schindler
See https://issues.apache.org/jira/browse/LUCENE-1753

In general, if you want to add functionality, plug another filter into the
chain. At least the implementations (next/incrementToken) should be final.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [hidden email]


> -----Original Message-----
> From: Daniel Shane [mailto:[hidden email]]
> Sent: Thursday, September 10, 2009 4:06 PM
> To: [hidden email]
> Subject: LowerCaseFilter, is there a reason why the class is final?
>
> Hi all,
>
> I was wondering why LowerCaseFilter is declared final. In my code, I
> would like to extend it, but apparently it's not possible. I'm just
> wondering why extending this type of class is considered evil.
>
> Daniel Shane

RE: LowerCaseFilter, is there a reason why the class is final?

Uwe Schindler
I forgot to mention: this is known as the "Decorator Pattern":
http://en.wikipedia.org/wiki/Decorator_pattern
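
As a rough illustration of the pattern applied to filter chains, here is a minimal sketch that does not use Lucene's classes. `TermStream`, `ListStream`, `LowerCaseStream` and `MinLengthStream` are hypothetical names made up for this example; the point is only that a second wrapper adds behaviour while the first one stays final.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.Locale;

/** Hypothetical stand-in for a token stream: yields terms until exhausted. */
interface TermStream {
    /** Returns the next term, or null at end of stream. */
    String next();
}

/** Source stream over a fixed list of terms. */
final class ListStream implements TermStream {
    private final Iterator<String> terms;

    ListStream(String... terms) {
        this.terms = Arrays.asList(terms).iterator();
    }

    public String next() {
        return terms.hasNext() ? terms.next() : null;
    }
}

/** Decorator that lower-cases each term; final, like the filter discussed here. */
final class LowerCaseStream implements TermStream {
    private final TermStream input;

    LowerCaseStream(TermStream input) {
        this.input = input;
    }

    public String next() {
        String term = input.next();
        return term == null ? null : term.toLowerCase(Locale.ROOT);
    }
}

/** Second decorator: adds behaviour (dropping short terms) without subclassing the first. */
final class MinLengthStream implements TermStream {
    private final TermStream input;
    private final int minLength;

    MinLengthStream(TermStream input, int minLength) {
        this.input = input;
        this.minLength = minLength;
    }

    public String next() {
        String term = input.next();
        // Skip terms that are too short; stop at end of stream.
        while (term != null && term.length() < minLength) {
            term = input.next();
        }
        return term;
    }
}
```

A chain is then built by nesting constructors, for example `new MinLengthStream(new LowerCaseStream(new ListStream("The", "A", "Quick")), 2)`.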

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [hidden email]


> -----Original Message-----
> From: Uwe Schindler [mailto:[hidden email]]
> Sent: Thursday, September 10, 2009 4:09 PM
> To: [hidden email]
> Subject: RE: LowerCaseFilter, is there a reason why the class is final?
>
> See https://issues.apache.org/jira/browse/LUCENE-1753
>
> In general, if you want to add functionality plug another filter into the
> chain. At least the implementations should be final (next/incrementToken).

Re: LowerCaseFilter, is there a reason why the class is final?

Daniel Shane-2
In reply to this post by Uwe Schindler
With the current API of TokenStream (incrementToken), I really do not see how I could handle the following scenario with the Decorator Pattern:

I have a case where I would like to execute the LowerCaseFilter only if the token is of type "word". I do not want to execute the LowerCaseFilter if the token is of type "number", for example.

Unfortunately, I don't see how this is possible given the current API and the fact that the filters are final.

The only thing I can do is add a filter before the LowerCaseFilter that would pass all the non-word tokens to the next filter, but it seems really complicated for a case where a simple extend would do the job.

Or, if the API had two methods, one which increments the stream and another which processes the current "token" or attributes, then it would be possible to do what I have in mind using the Decorator Pattern.
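
To make the two-method idea concrete, here is a minimal sketch of what such a split could look like. None of these types exist in Lucene: `Token`, `TokenProcessor`, `LowerCaseProcessor`, `TypeGuard` and `ProcessingStream` are hypothetical names used only to illustrate the proposal.

```java
import java.util.Iterator;
import java.util.Locale;

/** Hypothetical token: mutable text plus a type such as "word" or "number". */
final class Token {
    String text;
    final String type;

    Token(String text, String type) {
        this.text = text;
        this.type = type;
    }
}

/** Hypothetical "process the current token" half of the proposed API. */
interface TokenProcessor {
    void process(Token token);
}

/** Can stay final and still be reused, because it only processes and never iterates. */
final class LowerCaseProcessor implements TokenProcessor {
    public void process(Token token) {
        token.text = token.text.toLowerCase(Locale.ROOT);
    }
}

/** Decorator: delegates only for tokens of the wanted type, leaves others untouched. */
final class TypeGuard implements TokenProcessor {
    private final String type;
    private final TokenProcessor delegate;

    TypeGuard(String type, TokenProcessor delegate) {
        this.type = type;
        this.delegate = delegate;
    }

    public void process(Token token) {
        if (type.equals(token.type)) {
            delegate.process(token);
        }
    }
}

/** Hypothetical "increment the stream" half: advances, then applies the processor. */
final class ProcessingStream {
    private final Iterator<Token> input;
    private final TokenProcessor processor;

    ProcessingStream(Iterator<Token> input, TokenProcessor processor) {
        this.input = input;
        this.processor = processor;
    }

    /** Returns the next processed token, or null at end of stream. */
    Token next() {
        if (!input.hasNext()) {
            return null;
        }
        Token token = input.next();
        processor.process(token);
        return token;
    }
}
```

With this split, `new TypeGuard("word", new LowerCaseProcessor())` lower-cases word tokens only, and the lower-casing code is called rather than copied.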

Does anyone else see a way of doing this that is simple?

Daniel Shane


RE: LowerCaseFilter, is there a reason why the class is final?

Uwe Schindler
> The only thing I can do is add a filter before the LowerCaseFilter that
> would pass all the non-word tokens to the next filter, but it seems really
> complicated for a case where a simple extend would do the job.

This is the way to go!
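
A minimal sketch of such a by-pass filter, written against a simplified pull-based model rather than Lucene's real TokenStream API. All names here (`SimpleToken`, `SimpleStream`, `LowerCasing`, `BypassFilter`) are hypothetical, and the sketch assumes the wrapped filter emits exactly one token for each token it consumes.

```java
import java.util.Locale;

/** Hypothetical immutable token: text plus a type such as "word" or "number". */
final class SimpleToken {
    final String text;
    final String type;

    SimpleToken(String text, String type) {
        this.text = text;
        this.type = type;
    }
}

/** Hypothetical pull-based stream: returns null when exhausted. */
interface SimpleStream {
    SimpleToken next();
}

/** Stand-in for a final library filter that cannot be subclassed. */
final class LowerCasing implements SimpleStream {
    private final SimpleStream input;

    LowerCasing(SimpleStream input) {
        this.input = input;
    }

    public SimpleToken next() {
        SimpleToken t = input.next();
        return t == null ? null : new SimpleToken(t.text.toLowerCase(Locale.ROOT), t.type);
    }
}

/** Routes tokens of one type through a wrapped filter; all other tokens bypass it. */
final class BypassFilter implements SimpleStream {
    /** Builds the wrapped filter on top of the input it is given. */
    interface Wrapper {
        SimpleStream wrap(SimpleStream input);
    }

    private final SimpleStream input;
    private final String type;
    private final SimpleStream wrapped;
    private SimpleToken pending; // token the wrapped filter receives on its next pull

    BypassFilter(SimpleStream input, String type, Wrapper wrapper) {
        this.input = input;
        this.type = type;
        // The wrapped filter reads from a one-slot source fed by this class.
        this.wrapped = wrapper.wrap(new SimpleStream() {
            public SimpleToken next() {
                SimpleToken t = pending;
                pending = null;
                return t;
            }
        });
    }

    public SimpleToken next() {
        SimpleToken t = input.next();
        if (t == null || !type.equals(t.type)) {
            return t; // end of stream, or a token that skips the wrapped filter
        }
        pending = t;
        return wrapped.next();
    }
}
```

The wrapped filter is reused as-is: the by-pass filter decides per token whether to hand it over, so nothing is copied out of the final class.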

Uwe



Re: LowerCaseFilter, is there a reason why the class is final?

Ted Dunning
In reply to this post by Daniel Shane-2

Copy/paste.  Clearly Uwe and others were worried that users wouldn't be able to extend these classes compatibly. 

My own opinion is that this causes worse problems with back compatibility because people wind up copying code instead of calling it.  You may be able to extend an abstract class to minimize your work.

On Fri, Sep 11, 2009 at 5:33 AM, Daniel Shane <[hidden email]> wrote:
Does anyone else see a way of doing this that is simple?



--
Ted Dunning, CTO
DeepDyve


Re: LowerCaseFilter, is there a reason why the class is final?

Mark Miller-3
I think that's true, but it's also interesting to note that LowerCaseFilter
has been final since it was put into svn in '01.

--
- Mark

http://www.lucidimagination.com




Re: LowerCaseFilter, is there a reason why the class is final?

Ted Dunning

Lucene has always had way too much use of final for my taste.  I have often had to resort to chicanery to get around this.

In many cases, there wasn't even an alternative abstract class.  Writing test cases involving IndexWriters was a case that sticks in my memory.

On Fri, Sep 11, 2009 at 7:13 AM, Mark Miller <[hidden email]> wrote:
I think thats true - but its also interesting to note: LowerCaseFilter
has been final since it was put into svn in 01.



--
Ted Dunning, CTO
DeepDyve


Re: LowerCaseFilter, is there a reason why the class is final?

Daniel Shane-2
In reply to this post by Ted Dunning
IMHO, if I'm forced to write a by-pass filter to re-use a filter instead of copy/pasting it, I think we are getting way off the Decorator Pattern. It's not simple anymore. I bet that 9 times out of 10 a developer will copy/paste that code rather than write a by-pass filter.

Extending the functionality of a filter should not be difficult, and having everyone write their own by-pass filter seems really annoying. Imagine all those people having to write the by-pass filter.

We should include such a filter in Lucene natively and mention in the Javadocs of the filters that they can be extended with it, so that people don't copy/paste code.

If you want I can cook up a draft to get things started.

Daniel Shane
