Delete entire index

Matt Mitchell-2
Hi,
Is there a way to have Solr completely remove the current index?  
<deleteAll/> ?

We're still in development and so our schema is wavering. Anytime we  
make a change and want to re-index we first have to:

stop tomcat (or the solr webapp)
manually remove the data/index
restart tomcat (or the solr webapp)

The removing of the data/index directory is where we have the most  
trouble, because of the file permissions. The data/index directory is  
owned by tomcat/tomcat so in order to remove it, we have to issue  
sudo rm which we'd like to avoid.

Ideally, if we could just tell Solr to delete all data without having
to do any more manual work, it'd be great! : )

Something else that would help is if we could tell Tomcat/Solr which
user/group and/or permissions to use on the data/index directory when
it's created.

Any thoughts on this?

Matt

Re: Delete entire index

Thiago Jackiw-2
Matt,

I could be wrong, but I think you can send a "delete by query" syntax:
<delete><query>*:*</query></delete>

--
Thiago Jackiw
acts_as_solr => http://acts-as-solr.railsfreaks.com



Re: Delete entire index

Yonik Seeley-2
In reply to this post by Matt Mitchell-2
I agree... and the fix really belongs at the Lucene level:
https://issues.apache.org/jira/browse/LUCENE-932

-Yonik


Re: Delete entire index

Chris Hostetter-3
In reply to this post by Thiago Jackiw-2

: I could be wrong, but I think you can send a "delete by query" syntax:
: <delete><query>*:*</query></delete>

correct ... deleting *:* followed by doing an <optimize/> should be
functionally equivalent to stopping the servlet container, removing the
directory and starting the Solr container ... but it won't be quite as
efficient.

if someone wants to open a Jira issue for supporting a
<delete><all/></delete> command (or even just a special case optimization
when <delete><query>*:*</query></delete> is used) I'm certainly in favor
of the idea ... getting a patch to implement it is another matter :)
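
A minimal sketch of the delete-then-optimize sequence described above, in Java. The update URL (http://localhost:8080/solr/update), the class name, and the helper names are assumptions for illustration and do not come from the thread; adjust them to your deployment. With no arguments the program only prints the messages it would send.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

class DeleteAllSketch {

    // The XML update messages, in the order they would be sent.
    static List<String> deleteAllMessages() {
        return Arrays.asList(
            "<delete><query>*:*</query></delete>", // delete by query, matching every document
            "<optimize/>");                        // merge segments so the deleted documents are dropped
    }

    // POSTs one XML message to the update handler and returns the HTTP status code.
    static int post(String updateUrl, String xml) throws IOException {
        HttpURLConnection conn =
            (HttpURLConnection) URI.create(updateUrl).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(xml.getBytes(StandardCharsets.UTF_8));
        }
        int status = conn.getResponseCode();
        conn.disconnect();
        return status;
    }

    public static void main(String[] args) throws IOException {
        for (String xml : deleteAllMessages()) {
            if (args.length == 0) {
                // Dry run: no URL given, so only show what would be sent.
                System.out.println(xml);
            } else {
                // args[0] is the update URL, e.g. http://localhost:8080/solr/update (assumed).
                System.out.println(xml + " -> HTTP " + post(args[0], xml));
            }
        }
    }
}
```

Passing the update URL as the only argument sends the messages for real. Nothing here touches the data/index directory on disk, so no sudo is involved.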




-Hoss


Re: Delete entire index

Yonik Seeley-2
On 6/13/07, Chris Hostetter <[hidden email]> wrote:
>
> : I could be wrong, but I think you can send a "delete by query" syntax:
> : <delete><query>*:*</query></delete>
>
> correct ... deleting *:* followed by doing an <optimize/> should be
> functionally equivalent to stopping the servlet container, removing the
> directory and starting the Solr container ... but it won't be quite as
> efficient.

Actually, it's not quite equivalent if there was a schema change.
There are some "sticky" field properties that are per-segment global.
For example, if you added omitNorms="true" to a field, then did
delete *:* and reindexed, you would most likely still end up with
norms in your index because of how segments are merged.

-Yonik

Re: Delete entire index

Chris Hostetter-3
: > correct ... deleting *:* followed by doing an <optimize/> should be
: > functionally equivalent to stopping the servlet container, removing the
: > directory and starting the Solr container ... but it won't be quite as
: > efficient.
:
: Actually, it's not quite equivalent if there was a schema change.
: There are some "sticky" field properties that are per-segment global.
: For example, if you added omitNorms="true" to a field, then did

Hmmm... I thought the optimize would take care of that?  Doesn't merging
segments remove the fieldinfo for fields not in use ... and with all docs
deleted, none of the fields are in use, so wouldn't all of the
fieldinfo be deleted?


-Hoss


Re: Delete entire index

Yonik Seeley-2
On 6/13/07, Chris Hostetter <[hidden email]> wrote:

> : > correct ... deleting *:* followed by doing an <optimize/> should be
> : > functionally equivalent to stopping the servlet container, removing the
> : > directory and starting the Solr container ... but it won't be quite as
> : > efficient.
> :
> : Actually, it's not quite equivalent if there was a schema change.
> : There are some "sticky" field properties that are per-segment global.
> : For example, if you added omitNorms="true" to a field, then did
>
> Hmmm... I thought the optimize would take care of that?

Oh yes, sorry, I was thinking about optimize after you reindexed.  If
you forget to do optimize, you get a different index though...
definitely spooky stuff to someone not expecting it.

-Yonik

Re: Delete entire index

Ryan McKinley
>> :
>> : Actually, it's not quite equivalent if there was a schema change.
>> : There are some "sticky" field properties that are per-segment global.
>> : For example, if you added omitNorms="true" to a field, then did
>>
>> Hmmm... I thought the optimize would take care of that?
>
> Oh yes, sorry, I was thinking about optimize after you reindexed.  If
> you forget to do optimize, you get a different index though...
> definitely spooky stuff to someone not expecting it.
>


Is there an easy way to check if the lucene per/field properties are out
of sync with the solr schema?  If so, maybe we should display it on the
admin page.

Are there other sticky field properties besides omitNorms?

I know I have made changes to a production server where I:
  1. change the field definition for a field
  2. get the last indexed time for a document of that type
  3. index all documents of that type
  4. delete everything of that type not added since the start time
  5. optimize

It appeared to work fine...
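
Step 4 of the procedure above could be sketched in Java as a delete-by-query on a type and an index-time range. The field names ("type", "timestamp"), the type value ("book"), and the cutoff time are hypothetical and do not come from the thread; the sketch assumes the schema stores an index-time date on every document. It only builds the messages and does not send them.

```java
import java.time.Instant;

class StaleDocumentCleanup {

    // Minimal XML escaping for text placed inside an element.
    static String escapeXml(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Builds the delete message for step 4: everything of the given type whose
    // index time is at or before the cutoff captured in step 2.
    static String deleteOlderThan(String typeField, String type,
                                  String timeField, Instant cutoff) {
        String query = typeField + ":" + type
            + " AND " + timeField + ":[* TO " + cutoff + "]";
        return "<delete><query>" + escapeXml(query) + "</query></delete>";
    }

    public static void main(String[] args) {
        // Hypothetical cutoff: the last indexed time read before re-indexing began.
        Instant cutoff = Instant.parse("2007-06-13T15:40:52Z");
        System.out.println(deleteOlderThan("type", "book", "timestamp", cutoff));
        System.out.println("<optimize/>"); // step 5
    }
}
```

Because the re-indexed documents carry a later index time than the cutoff, they fall outside the range and survive the delete.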


Re: Delete entire index

Yonik Seeley-2
On 6/13/07, Ryan McKinley <[hidden email]> wrote:

> >> :
> >> : Actually, it's not quite equivalent if there was a schema change.
> >> : There are some "sticky" field properties that are per-segment global.
> >> : For example, if you added omitNorms="true" to a field, then did
> >>
> >> Hmmm... I thought the optimize would take care of that?
> >
> > Oh yes, sorry, I was thinking about optimize after you reindexed.  If
> > you forget to do optimize, you get a different index though...
> > definitely spooky stuff to someone not expecting it.
> >
>
>
> Is there an easy way to check if the lucene per/field properties are out
> of sync with the solr schema?  If so, maybe we should display it on the
> admin page.

It would be rare, and give false assurances I think.  There are *many*
types of schema changes for which we would not be able to detect the
incompatibility.  If people have any doubt, they should re-index.

> Are there other sticky field properties besides omitNorms?

Yes, this function is what merges the FieldInfo from different
segments (from FieldInfos.java):
  public FieldInfo add(String name, boolean isIndexed, boolean storeTermVector,
                       boolean storePositionWithTermVector,
                       boolean storeOffsetWithTermVector,
                       boolean omitNorms, boolean storePayloads) {
    FieldInfo fi = fieldInfo(name);
    if (fi == null) {
      return addInternal(name, isIndexed, storeTermVector,
                         storePositionWithTermVector, storeOffsetWithTermVector,
                         omitNorms, storePayloads);
    } else {
      if (fi.isIndexed != isIndexed) {
        fi.isIndexed = true;                      // once indexed, always index
      }
      if (fi.storeTermVector != storeTermVector) {
        fi.storeTermVector = true;                // once vector, always vector
      }
      if (fi.storePositionWithTermVector != storePositionWithTermVector) {
        fi.storePositionWithTermVector = true;    // once vector, always vector
      }
      if (fi.storeOffsetWithTermVector != storeOffsetWithTermVector) {
        fi.storeOffsetWithTermVector = true;      // once vector, always vector
      }
      if (fi.omitNorms != omitNorms) {
        fi.omitNorms = false;                     // once norms are stored, always store
      }
      if (fi.storePayloads != storePayloads) {
        fi.storePayloads = true;
      }
    }
    return fi;
  }
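
To make the "sticky" behavior concrete, here is a small standalone Java model of the merge rule in the excerpt above. It is a sketch only: the Flags class and the merge method are hypothetical stand-ins, not Lucene classes, and only two of the flags are modeled. It shows the omitNorms case raised earlier in the thread.

```java
class StickyFieldFlags {

    // Simplified stand-in for Lucene's FieldInfo, keeping two flags only.
    static final class Flags {
        final boolean isIndexed;
        final boolean omitNorms;

        Flags(boolean isIndexed, boolean omitNorms) {
            this.isIndexed = isIndexed;
            this.omitNorms = omitNorms;
        }
    }

    // Same outcome as the add() logic: when the two sides disagree,
    // isIndexed becomes true and omitNorms becomes false.
    static Flags merge(Flags existing, Flags incoming) {
        boolean isIndexed = existing.isIndexed || incoming.isIndexed; // once indexed, always indexed
        boolean omitNorms = existing.omitNorms && incoming.omitNorms; // once norms are stored, always stored
        return new Flags(isIndexed, omitNorms);
    }

    public static void main(String[] args) {
        Flags oldSegment = new Flags(true, false); // written before omitNorms="true" was added
        Flags newSegment = new Flags(true, true);  // written after the schema change
        Flags merged = merge(oldSegment, newSegment);
        System.out.println("merged omitNorms = " + merged.omitNorms); // prints "merged omitNorms = false"
    }
}
```

As long as a segment with the old field settings takes part in the merge, the merged field keeps its norms, which is why the new schema setting does not take effect.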

-Yonik

Re: Delete entire index

Otis Gospodnetic-2
In reply to this post by Matt Mitchell-2
I think this would be useful.  The other day I hit this problem of fq=.... not working.  It turned out that the schema had been changed (some non-indexed fields were made indexed) and the bulk upload was done, but that bulk upload left the old index files in place, so we ended up with a "double index" within the same index dir.

Otis


