Re: svn-commit: 168449 FSDirectory

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Re: svn-commit: 168449 FSDirectory

Bernhard Messer
Hi Daniel,

i just had a look at the new implementation that FSDirectory deletes
lucene related files only. I like the patch, but i think we left some
room for optimization. In the current implementation, it's necessary to
run thru all known Lucene extensions (13 for the moment), for each call
of "LuceneFileFilter.accept()". If creating an index in a directory
which contains several hundred files, this definitly will be a
bottleneck. So creating a new Index in a directory containing 100 files,
we will endup with 1300 calls to
"if (name.endsWith("."+IndexReader.FILENAME_EXTENSIONS[i]))" which
always needs to create a new StringBuffer to merge the two strings.

Therefore i would like to propose two changes:
1) we should store the extension in a hash and not in String[] to have a
faster lookup
2) check for the file extension only without using the "."

any thoughts

Bernhard


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: svn-commit: 168449 FSDirectory

Doug Cutting
Bernhard Messer wrote:
> Therefore i would like to propose two changes:
> 1) we should store the extension in a hash and not in String[] to have a
> faster lookup

Do you mean to use something like:

String lastDot = name.lastIndexOf('.');
if (lastDot >= 0) {
   String nameExt = name.substring(lastDot+1, name.length());
   if (FILENAME_EXTENSIONS.get(nameExt)) {
     ...
   }
}

That would allocate a new String in each case, which would be
substantially faster.  Is that what you meant?

> 2) check for the file extension only without using the "."

Do you mean something like:

   String ext = IndexReader.FILENAME_EXTENSIONS[i];
   if (name.endsWith(ext)
       && name.length >= ext.length()+1
       && '.' == name.charAt(name.length()-(ext.length()+1))) {
     ...
   }

I don't see how this works with (1) above.  My guess is that (1) alone
would be fastest, even though it still allocates objects.

Also, as mentioned in another message, I think this class, along with
other index file name logic, should all move to a single place.

Doug

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: svn-commit: 168449 FSDirectory

Bernhard Messer
Doug Cutting wrote:

> Bernhard Messer wrote:
>
>> Therefore i would like to propose two changes:
>> 1) we should store the extension in a hash and not in String[] to
>> have a faster lookup
>
>
> Do you mean to use something like:
>
> String lastDot = name.lastIndexOf('.');
> if (lastDot >= 0) {
>   String nameExt = name.substring(lastDot+1, name.length());
>   if (FILENAME_EXTENSIONS.get(nameExt)) {
>     ...
>   }
> }
>
> That would allocate a new String in each case, which would be
> substantially faster.  Is that what you meant?

exactly, that's what i had in mind. I know that we have to allocate a
new string object for every call, but this must be much cheaper than the
current implementation which has to walk thru the whole array every time
the method is called.

Bernhard


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: svn-commit: 168449 FSDirectory

Chris Nokleberg
On Tue, 07 Jun 2005 09:13:10 +0200, Bernhard Messer wrote:
> exactly, that's what i had in mind. I know that we have to allocate a
> new string object for every call, but this must be much cheaper than the
> current implementation which has to walk thru the whole array every time
> the method is called.

If you're still concerned about performance you can use a "trie" data
structure, which takes a little effort to build but can match very quickly
(with no allocations). I've attached an example implementation that works
for filename extensions which only use the letters 'a'-'z' (minimally
tested):

  private static final ExtensionMatcher MATCHER =
    new ExtensionMatcher(new String[]{
      "cfs", "fnm", "fdx", "fdt", "tii",
      "tis", "frq", "prx", "del", "tvx",
      "tvd", "tvf", "tvp",
    });

  MATCHER.match("foo.cfs"); // returns true

BTW, I think it is a bad idea to have FILENAME_EXTENSIONS as a public
array. It being final does not prevent code from changing the contents of
the array. If the extensions must be public I would recommend either an
accessor that returns a copy of the array, or an unmodifiable
collection/set/list.

Chris

////////////////////////////////////////////////////////////////////////

import java.util.*;

public class ExtensionMatcher
{
    private Object[] tree;
   
    public ExtensionMatcher(String[] exts)
    {
        tree = create(Arrays.asList(exts), 0);
    }

    private static Object[] create(List exts, int index)
    {
        Object[] array = new Object[27];
        Map byLetter = new HashMap();
        for (Iterator it = exts.iterator(); it.hasNext();) {
            String ext = (String)it.next();
            int length = ext.length();
            if (length > index) {
                Character c = new Character(ext.charAt(length - 1 - index));
                List list = (List)byLetter.get(c);
                if (list == null)
                    byLetter.put(c, list = new ArrayList());
                list.add(ext);
            } else if (length == index) {
                array[26] = Boolean.TRUE;
            }
        }
        for (Iterator it = byLetter.keySet().iterator(); it.hasNext();) {
            Character c = (Character)it.next();
            char val = c.charValue();
            if (val < 'a' || val > 'z')
                throw new IllegalArgumentException("Extension must be between 'a' and 'z'");
            array[val - 'a'] = create((List)byLetter.get(c), index + 1);
        }
        return array;
    }

    public boolean match(String file)
    {
        int index = 0;
        int length = file.length();
        Object[] array = tree;
        for (;;) {
            if (index >= length)
                return false;
            char val = file.charAt(length - 1 - index);
            if (val == '.' && array[26] != null)
                return true;
            if (val < 'a' || val > 'z')
                return false;
            array = (Object[])array[val - 'a'];
            if (array == null)
                return false;
            index++;
        }
    }
}



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]