Directory.list() deprecation

Directory.list() deprecation

Daniel Noll-3-2
Hi all.

I am trying to clean up some deprecated calls which are showing up on
upgrading to 2.9.0 (from 2.3.2...), and I have just come across
Directory.list(), which says this:

> Deprecated. For some Directory implementations (FSDirectory, and its subclasses), this method silently filters its results to include only index files. Please use listAll instead, which does no filtering.

  * We have files in there which aren't Lucene's so obviously
listAll() will not work.
  * We can't use FSDirectory directly because our tests rely on the
Directory abstraction so that they can use a RAMDirectory.

Given this, what is the suggested replacement for this method once it goes away?

I'm not sure I understand the motivation for the change in list(), but
I do think it was inconsistent for Directory implementations to
perform different filtering (they should have at least all used the
same filter.)

Daniel

--
Daniel Noll                            Forensic and eDiscovery Software
Senior Developer                              The world's most advanced
Nuix                                                email data analysis
http://nuix.com/                                and eDiscovery software

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: Directory.list() deprecation

Michael McCandless-2
Well... you can use
org.apache.lucene.index.IndexFileNameFilter.getFilter() to filter for
only the Lucene index files, or you could filter out the additional
files you know you've placed in the index directory?
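
The second option could be sketched roughly as below. This is only an illustration under stated assumptions: the class name KnownFileExclusion, the method withoutAppFiles and the file names in APP_FILES are all hypothetical, and a plain String[] stands in for whatever Directory.listAll() returns, so the sketch runs without Lucene.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class KnownFileExclusion {
    // Hypothetical names of files the application itself keeps next to the index.
    public static final Set<String> APP_FILES =
            new HashSet<String>(Arrays.asList("app.properties", "notes.txt"));

    /** Returns, in order, the names from allNames that the application did not create. */
    public static List<String> withoutAppFiles(String[] allNames) {
        List<String> results = new ArrayList<String>();
        for (String name : allNames) {
            if (!APP_FILES.contains(name)) {
                results.add(name);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // Stand-in for the array a Directory.listAll() call would return.
        String[] allNames = {"segments_2", "_0.cfs", "app.properties", "notes.txt"};
        System.out.println(withoutAppFiles(allNames)); // prints [segments_2, _0.cfs]
    }
}
```

Excluding known names keeps the application independent of which files Lucene (or a future codec) chooses to write, at the cost of having to maintain the list of application files.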

The motivation for this change was that Directory is the wrong place
to have "smarts" about what is & isn't an index file: it's too
low-level (and, different Directory impls were inconsistent).
Especially with flexible indexing coming, where a codec can write
whatever files it wants, the Directory has no way to know.

Some details are in http://issues.apache.org/jira/browse/LUCENE-1468.

Mike


Re: Directory.list() deprecation

Daniel Noll-3-2
On Fri, Nov 6, 2009 at 20:26, Michael McCandless
<[hidden email]> wrote:
> Well... you can use oal.index.IndexFileNameFilter.getFilter() to
> filter for only the Lucene index files, or, you could filter for the
> additional files you know you've placed in the index directory?

This is the workaround we're currently using, but it's pretty obvious
why it's less than ideal:

    import java.io.FilenameFilter;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.lucene.index.IndexFileNameFilter;

    // Keep only the names that Lucene's own filter recognises as index files.
    FilenameFilter filter = IndexFileNameFilter.getFilter();
    List<String> results = new ArrayList<String>();
    for (String candidate : dir.listAll()) {
        if (filter.accept(null, candidate)) {  // <--
            results.add(candidate);
        }
    }

The biggest issue here is that FilenameFilter forces us to provide
a File for the first parameter even though the index may not even be
stored on disk.  We can pass null and hope that the filter won't have
an issue with that, which works ... *for now*.

> The motivation for this change was that Directory is the wrong place
> to have "smarts" about what is & isn't an index file: it's too
> low-level (and, different Directory impls were inconsistent).
> Especially with flexible indexing coming, where a codec can write
> whatever files it wants, the Directory has no way to know.

This seems reasonable, but it would have been nice to have a list
method which accepted a filter so that there would at least be a
replacement for the old behaviour.  The way it is now, Lucene has
deprecated a method people were using while providing no replacement
except for "write it yourself", the same as what happened when Hits
got canned.
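
A list method that accepts a filter, of the kind wished for above, might look like the following sketch. The class name FilteredListing and its list method are hypothetical and not part of Lucene; a plain String[] stands in for the result of listAll(), and the "segments" prefix filter is just a stand-in so the example runs without Lucene on the classpath.

```java
import java.io.File;
import java.io.FilenameFilter;
import java.util.ArrayList;
import java.util.List;

public class FilteredListing {
    /**
     * Returns the names accepted by the filter, in their original order.
     * The directory argument is passed as null because the names need not
     * correspond to files on disk.
     */
    public static String[] list(String[] allNames, FilenameFilter filter) {
        List<String> results = new ArrayList<String>();
        for (String name : allNames) {
            if (filter.accept(null, name)) {
                results.add(name);
            }
        }
        return results.toArray(new String[results.size()]);
    }

    public static void main(String[] args) {
        // Stand-in filter for illustration only; it ignores the directory argument.
        FilenameFilter segmentsOnly = new FilenameFilter() {
            public boolean accept(File dir, String name) {
                return name.startsWith("segments");
            }
        };
        String[] allNames = {"segments_2", "segments.gen", "notes.txt"};
        System.out.println(list(allNames, segmentsOnly).length); // prints 2
    }
}
```

With Lucene available, the call site would presumably become something like FilteredListing.list(dir.listAll(), IndexFileNameFilter.getFilter()), which still relies on that filter tolerating a null directory.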

Daniel



Re: Directory.list() deprecation

Michael McCandless-2
On Sun, Nov 8, 2009 at 4:58 PM, Daniel Noll <[hidden email]> wrote:

>> Well... you can use oal.index.IndexFileNameFilter.getFilter() to
>> filter for only the Lucene index files, or, you could filter for the
>> additional files you know you've placed in the index directory?
>
> This is the workaround we're currently using, but it's pretty obvious
> why it's less than ideal:
>
>    FilenameFilter filter = IndexFileNameFilter.getFilter();
>    List<String> results = new ArrayList<String>();
>    for (String candidate : dir.listAll()) {
>        if (filter.accept(null, candidate)) {  // <--
>            results.add(candidate);
>        }
>    }
>
> The biggest issue here is that the FilenameFilter forces us to provide
> a File for the first parameter even though the index may not even be
> stored on disk.  We can pass null and hope that the filter won't have
> an issue with that, which works ... *for now*.

I don't expect IndexFileNameFilter will ever look at that (File
directory) argument.  Lucene itself does the same thing (passes null),
internally, whenever it uses IndexFileNameFilter.

>> The motivation for this change was that Directory is the wrong place
>> to have "smarts" about what is & isn't an index file: it's too
>> low-level (and, different Directory impls were inconsistent).
>> Especially with flexible indexing coming, where a codec can write
>> whatever files it wants, the Directory has no way to know.
>
> This seems reasonable, but it would have been nice to have a list
> method which accepted a filter so that there would at least be a
> replacement for the old behaviour.  The way it is now, Lucene has
> deprecated a method people were using while providing no replacement
> except for "write it yourself", the same as what happened when Hits
> got canned.

Honestly I thought the effort to write the above code was trivial
enough that preserving this inside Lucene was not necessary.  But I
guess it would have been good to include such a code fragment in the
javadocs for list().

Stepping back, since presumably your app knows what it's storing in
the directory, can't you filter for files you know you've created?
What's the larger use case here?

Mike


Re: Directory.list() deprecation

Daniel Noll-3-2
On Tue, Nov 10, 2009 at 00:44, Michael McCandless
<[hidden email]> wrote:
> Stepping back, since presumably your app knows what it's storing in
> the directory, can't you filter for files you know you've created?
> What's the larger use case here?

The exact use case where we were using list() is to determine whether
the index had data in it, without having to open it and do a
docCount() (well, there were also calls to it in the unit tests, but
those were entirely replaceable with listAll()).

This was previously a one-liner:

    boolean containsData = directory.list().length > 1;

Maybe there is another newer API which will return this to being a
one-liner -- at the time it was written this seemed to be the best
option.

By the way, when I say "there is no data in it", I mean the index
exists but has 0 documents.  Detecting that the index itself does not
exist is somewhat simpler.

Daniel



Re: Directory.list() deprecation

Michael McCandless-2
On Mon, Nov 9, 2009 at 7:53 PM, Daniel Noll <[hidden email]> wrote:

> On Tue, Nov 10, 2009 at 00:44, Michael McCandless
> <[hidden email]> wrote:
>> Stepping back, since presumably your app knows what it's storing in
>> the directory, can't you filter for files you know you've created?
>> What's the larger use case here?
>
> The exact use case where we were using list() is to determine whether
> the index had data in it, without having to open it and do a
> docCount() (well, there were also calls to it in the unit tests, but
> those were entirely replaceable with listAll()).
>
> This was previously a one-liner:
>
>    boolean containsData = directory.list().length > 1;
>
> Maybe there is another newer API which will return this to being a
> one-liner -- at the time it was written this seemed to be the best
> option.
>
> By the way, when I mean "there is no data in it", I mean the index
> exists but has 0 documents.  Detecting that the index itself does not
> exist is somewhat simpler.

I see.

There's IndexReader.indexExists(), but it sounds like that's not what
you want because you want to check whether in fact it has > 0 docs in
it.

Otherwise, I think something like this (requires 2.9, since prior to
that SegmentInfos isn't public) should work:

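    SegmentInfos sis = new SegmentInfos();
    try {
      sis.read(dir);
    } catch (IOException ioe) {
      // presumably no index exists; sis stays empty
    }
    // Sum the per-segment doc counts.  The index-based loop works
    // even if SegmentInfos is a raw (non-generic) Vector.
    int totDocCount = 0;
    for (int i = 0; i < sis.size(); i++) {
      totDocCount += sis.info(i).docCount;
    }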

It's not a one-liner, but it's fast to run since it just reads the
segments file.  But remember that SegmentInfos reserves the right to
break back-compat ("subject to change suddenly in the next release")!

Mike
