[jira] Created: (SOLR-830) snappuller picks bad snapshot name

classic Classic list List threaded Threaded
15 messages Options
Reply | Threaded
Open this post in threaded view
|

[jira] Created: (SOLR-830) snappuller picks bad snapshot name

Hudson (Jira)
snappuller picks bad snapshot name
----------------------------------

                 Key: SOLR-830
                 URL: https://issues.apache.org/jira/browse/SOLR-830
             Project: Solr
          Issue Type: Bug
          Components: replication
            Reporter: Hoss Man


as mentioned on the mailing list...

http://www.nabble.com/FileNotFoundException-on-slave-after-replication---script-bug--to20111313.html#a20111313
{quote}
We're seeing strange behavior on one of our slave nodes after replication.
When the new searcher is created we see FileNotFoundExceptions in the log
and the index is strangely invalid/corrupted.

We may have identified the root cause but wanted to run it by the community.
We figure there is a bug in the snappuller shell script, line 181:

snap_name=`ssh -o StrictHostKeyChecking=no ${master_host} "ls
${master_data_dir}|grep 'snapshot\.'|grep -v wip|sort -r|head -1"`

This line determines the directory name of the latest snapshot to download
to the slave from the master.  Problem with this line is that it grab the
temporary work directory of a snapshot in progress.  Those temporary
directories are prefixed with  "temp" and as far as I can tell should never
get pulled from the master so its easy to disambiguate.  It seems that this
temp directory, if it exists will be the newest one so if present it will be
the one replicated: FAIL.
{quote}

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply | Threaded
Open this post in threaded view
|

[jira] Updated: (SOLR-830) snappuller picks bad snapshot name

Hudson (Jira)

     [ https://issues.apache.org/jira/browse/SOLR-830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man updated SOLR-830:
--------------------------

    Description:
as mentioned on the mailing list...

http://www.nabble.com/FileNotFoundException-on-slave-after-replication---script-bug--to20111313.html#a20111313
{noformat}
We're seeing strange behavior on one of our slave nodes after replication.
When the new searcher is created we see FileNotFoundExceptions in the log
and the index is strangely invalid/corrupted.

We may have identified the root cause but wanted to run it by the community.
We figure there is a bug in the snappuller shell script, line 181:

snap_name=`ssh -o StrictHostKeyChecking=no ${master_host} "ls
${master_data_dir}|grep 'snapshot\.'|grep -v wip|sort -r|head -1"`

This line determines the directory name of the latest snapshot to download
to the slave from the master.  Problem with this line is that it grab the
temporary work directory of a snapshot in progress.  Those temporary
directories are prefixed with  "temp" and as far as I can tell should never
get pulled from the master so its easy to disambiguate.  It seems that this
temp directory, if it exists will be the newest one so if present it will be
the one replicated: FAIL.
{noformat}

  was:
as mentioned on the mailing list...

http://www.nabble.com/FileNotFoundException-on-slave-after-replication---script-bug--to20111313.html#a20111313
{quote}
We're seeing strange behavior on one of our slave nodes after replication.
When the new searcher is created we see FileNotFoundExceptions in the log
and the index is strangely invalid/corrupted.

We may have identified the root cause but wanted to run it by the community.
We figure there is a bug in the snappuller shell script, line 181:

snap_name=`ssh -o StrictHostKeyChecking=no ${master_host} "ls
${master_data_dir}|grep 'snapshot\.'|grep -v wip|sort -r|head -1"`

This line determines the directory name of the latest snapshot to download
to the slave from the master.  Problem with this line is that it grab the
temporary work directory of a snapshot in progress.  Those temporary
directories are prefixed with  "temp" and as far as I can tell should never
get pulled from the master so its easy to disambiguate.  It seems that this
temp directory, if it exists will be the newest one so if present it will be
the one replicated: FAIL.
{quote}


> snappuller picks bad snapshot name
> ----------------------------------
>
>                 Key: SOLR-830
>                 URL: https://issues.apache.org/jira/browse/SOLR-830
>             Project: Solr
>          Issue Type: Bug
>          Components: replication
>            Reporter: Hoss Man
>
> as mentioned on the mailing list...
> http://www.nabble.com/FileNotFoundException-on-slave-after-replication---script-bug--to20111313.html#a20111313
> {noformat}
> We're seeing strange behavior on one of our slave nodes after replication.
> When the new searcher is created we see FileNotFoundExceptions in the log
> and the index is strangely invalid/corrupted.
> We may have identified the root cause but wanted to run it by the community.
> We figure there is a bug in the snappuller shell script, line 181:
> snap_name=`ssh -o StrictHostKeyChecking=no ${master_host} "ls
> ${master_data_dir}|grep 'snapshot\.'|grep -v wip|sort -r|head -1"`
> This line determines the directory name of the latest snapshot to download
> to the slave from the master.  Problem with this line is that it grab the
> temporary work directory of a snapshot in progress.  Those temporary
> directories are prefixed with  "temp" and as far as I can tell should never
> get pulled from the master so its easy to disambiguate.  It seems that this
> temp directory, if it exists will be the newest one so if present it will be
> the one replicated: FAIL.
> {noformat}

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply | Threaded
Open this post in threaded view
|

[jira] Commented: (SOLR-830) snappuller picks bad snapshot name

Hudson (Jira)
In reply to this post by Hudson (Jira)

    [ https://issues.apache.org/jira/browse/SOLR-830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12643512#action_12643512 ]

Hoss Man commented on SOLR-830:
-------------------------------

while i haven't noticed this bug myself, and i'm not very "shell script smart", it does seem like it would be a problem ... prior to r529471 snappuller used a find command that required the filenames start with "snapshot." where as the current grep just requires that it contain "snapshot."

the original poster had the following suggested fix...

{noformat}
We've tweaked the line to exclude any directories starting with "temp":
snap_name=`ssh -o StrictHostKeyChecking=no ${master_host} "ls
${master_data_dir}|grep 'snapshot\.'|grep -v wip|grep -v temp|sort -r|head
-1"`
{noformat}

...my first reaction would be to change the grep to use an anchored regex, but i'm not sure how portable that is - so perhaps the "grep -v temp" is the way to go.  I'll leave it to people who understand shell scripts better.


> snappuller picks bad snapshot name
> ----------------------------------
>
>                 Key: SOLR-830
>                 URL: https://issues.apache.org/jira/browse/SOLR-830
>             Project: Solr
>          Issue Type: Bug
>          Components: replication
>            Reporter: Hoss Man
>
> as mentioned on the mailing list...
> http://www.nabble.com/FileNotFoundException-on-slave-after-replication---script-bug--to20111313.html#a20111313
> {noformat}
> We're seeing strange behavior on one of our slave nodes after replication.
> When the new searcher is created we see FileNotFoundExceptions in the log
> and the index is strangely invalid/corrupted.
> We may have identified the root cause but wanted to run it by the community.
> We figure there is a bug in the snappuller shell script, line 181:
> snap_name=`ssh -o StrictHostKeyChecking=no ${master_host} "ls
> ${master_data_dir}|grep 'snapshot\.'|grep -v wip|sort -r|head -1"`
> This line determines the directory name of the latest snapshot to download
> to the slave from the master.  Problem with this line is that it grab the
> temporary work directory of a snapshot in progress.  Those temporary
> directories are prefixed with  "temp" and as far as I can tell should never
> get pulled from the master so its easy to disambiguate.  It seems that this
> temp directory, if it exists will be the newest one so if present it will be
> the one replicated: FAIL.
> {noformat}

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply | Threaded
Open this post in threaded view
|

[jira] Updated: (SOLR-830) snappuller picks bad snapshot name

Hudson (Jira)
In reply to this post by Hudson (Jira)

     [ https://issues.apache.org/jira/browse/SOLR-830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man updated SOLR-830:
--------------------------

    Affects Version/s: 1.2
                       1.3

> snappuller picks bad snapshot name
> ----------------------------------
>
>                 Key: SOLR-830
>                 URL: https://issues.apache.org/jira/browse/SOLR-830
>             Project: Solr
>          Issue Type: Bug
>          Components: replication (scripts)
>    Affects Versions: 1.2, 1.3
>            Reporter: Hoss Man
>
> as mentioned on the mailing list...
> http://www.nabble.com/FileNotFoundException-on-slave-after-replication---script-bug--to20111313.html#a20111313
> {noformat}
> We're seeing strange behavior on one of our slave nodes after replication.
> When the new searcher is created we see FileNotFoundExceptions in the log
> and the index is strangely invalid/corrupted.
> We may have identified the root cause but wanted to run it by the community.
> We figure there is a bug in the snappuller shell script, line 181:
> snap_name=`ssh -o StrictHostKeyChecking=no ${master_host} "ls
> ${master_data_dir}|grep 'snapshot\.'|grep -v wip|sort -r|head -1"`
> This line determines the directory name of the latest snapshot to download
> to the slave from the master.  Problem with this line is that it grab the
> temporary work directory of a snapshot in progress.  Those temporary
> directories are prefixed with  "temp" and as far as I can tell should never
> get pulled from the master so its easy to disambiguate.  It seems that this
> temp directory, if it exists will be the newest one so if present it will be
> the one replicated: FAIL.
> {noformat}

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply | Threaded
Open this post in threaded view
|

[jira] Commented: (SOLR-830) snappuller picks bad snapshot name

Hudson (Jira)
In reply to this post by Hudson (Jira)

    [ https://issues.apache.org/jira/browse/SOLR-830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644004#action_12644004 ]

Bill Au commented on SOLR-830:
------------------------------


I think we should use a regular expression and match against the naming convention of the snapshot (snapshot.YYYYmmddHHMMss).  Here is what I propose:

snap_name=`ssh -o StrictHostKeyChecking=no ${master_host} "ls
${master_data_dir}|egrep '^snapshot\.[123456789][0-9]{13}$'|sort -r|head
-1"`

> snappuller picks bad snapshot name
> ----------------------------------
>
>                 Key: SOLR-830
>                 URL: https://issues.apache.org/jira/browse/SOLR-830
>             Project: Solr
>          Issue Type: Bug
>          Components: replication (scripts)
>    Affects Versions: 1.2, 1.3
>            Reporter: Hoss Man
>
> as mentioned on the mailing list...
> http://www.nabble.com/FileNotFoundException-on-slave-after-replication---script-bug--to20111313.html#a20111313
> {noformat}
> We're seeing strange behavior on one of our slave nodes after replication.
> When the new searcher is created we see FileNotFoundExceptions in the log
> and the index is strangely invalid/corrupted.
> We may have identified the root cause but wanted to run it by the community.
> We figure there is a bug in the snappuller shell script, line 181:
> snap_name=`ssh -o StrictHostKeyChecking=no ${master_host} "ls
> ${master_data_dir}|grep 'snapshot\.'|grep -v wip|sort -r|head -1"`
> This line determines the directory name of the latest snapshot to download
> to the slave from the master.  Problem with this line is that it grab the
> temporary work directory of a snapshot in progress.  Those temporary
> directories are prefixed with  "temp" and as far as I can tell should never
> get pulled from the master so its easy to disambiguate.  It seems that this
> temp directory, if it exists will be the newest one so if present it will be
> the one replicated: FAIL.
> {noformat}

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply | Threaded
Open this post in threaded view
|

[jira] Assigned: (SOLR-830) snappuller picks bad snapshot name

Hudson (Jira)
In reply to this post by Hudson (Jira)

     [ https://issues.apache.org/jira/browse/SOLR-830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bill Au reassigned SOLR-830:
----------------------------

    Assignee: Bill Au

> snappuller picks bad snapshot name
> ----------------------------------
>
>                 Key: SOLR-830
>                 URL: https://issues.apache.org/jira/browse/SOLR-830
>             Project: Solr
>          Issue Type: Bug
>          Components: replication (scripts)
>    Affects Versions: 1.2, 1.3
>            Reporter: Hoss Man
>            Assignee: Bill Au
>
> as mentioned on the mailing list...
> http://www.nabble.com/FileNotFoundException-on-slave-after-replication---script-bug--to20111313.html#a20111313
> {noformat}
> We're seeing strange behavior on one of our slave nodes after replication.
> When the new searcher is created we see FileNotFoundExceptions in the log
> and the index is strangely invalid/corrupted.
> We may have identified the root cause but wanted to run it by the community.
> We figure there is a bug in the snappuller shell script, line 181:
> snap_name=`ssh -o StrictHostKeyChecking=no ${master_host} "ls
> ${master_data_dir}|grep 'snapshot\.'|grep -v wip|sort -r|head -1"`
> This line determines the directory name of the latest snapshot to download
> to the slave from the master.  Problem with this line is that it grab the
> temporary work directory of a snapshot in progress.  Those temporary
> directories are prefixed with  "temp" and as far as I can tell should never
> get pulled from the master so its easy to disambiguate.  It seems that this
> temp directory, if it exists will be the newest one so if present it will be
> the one replicated: FAIL.
> {noformat}

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply | Threaded
Open this post in threaded view
|

[jira] Commented: (SOLR-830) snappuller picks bad snapshot name

Hudson (Jira)
In reply to this post by Hudson (Jira)

    [ https://issues.apache.org/jira/browse/SOLR-830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644011#action_12644011 ]

Hoss Man commented on SOLR-830:
-------------------------------

that's kind of what i thinking, but i wasn't sure how portable egrep is.

> snappuller picks bad snapshot name
> ----------------------------------
>
>                 Key: SOLR-830
>                 URL: https://issues.apache.org/jira/browse/SOLR-830
>             Project: Solr
>          Issue Type: Bug
>          Components: replication (scripts)
>    Affects Versions: 1.2, 1.3
>            Reporter: Hoss Man
>            Assignee: Bill Au
>
> as mentioned on the mailing list...
> http://www.nabble.com/FileNotFoundException-on-slave-after-replication---script-bug--to20111313.html#a20111313
> {noformat}
> We're seeing strange behavior on one of our slave nodes after replication.
> When the new searcher is created we see FileNotFoundExceptions in the log
> and the index is strangely invalid/corrupted.
> We may have identified the root cause but wanted to run it by the community.
> We figure there is a bug in the snappuller shell script, line 181:
> snap_name=`ssh -o StrictHostKeyChecking=no ${master_host} "ls
> ${master_data_dir}|grep 'snapshot\.'|grep -v wip|sort -r|head -1"`
> This line determines the directory name of the latest snapshot to download
> to the slave from the master.  Problem with this line is that it grab the
> temporary work directory of a snapshot in progress.  Those temporary
> directories are prefixed with  "temp" and as far as I can tell should never
> get pulled from the master so its easy to disambiguate.  It seems that this
> temp directory, if it exists will be the newest one so if present it will be
> the one replicated: FAIL.
> {noformat}

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply | Threaded
Open this post in threaded view
|

[jira] Commented: (SOLR-830) snappuller picks bad snapshot name

Hudson (Jira)
In reply to this post by Hudson (Jira)

    [ https://issues.apache.org/jira/browse/SOLR-830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644037#action_12644037 ]

Steven Rowe commented on SOLR-830:
----------------------------------

From the egrep(1) man page:

{noformat}
Traditional egrep did not support the { metacharacter, and some egrep
implementations  support \{ instead, so portable scripts should avoid {
in egrep patterns and should use [{] to match a literal {.
{noformat}

Perl is pretty portable, and has a stable regex language - how about (untested):

{code}
snap_name=`ssh -o StrictHostKeyChecking=no ${master_host} \
"perl -e 'chdir q/${master_data_dir}/; print ((sort grep {/^snapshot[.][1-9][0-9]{13}\$/} <*>)[-1])'"`
{code}


> snappuller picks bad snapshot name
> ----------------------------------
>
>                 Key: SOLR-830
>                 URL: https://issues.apache.org/jira/browse/SOLR-830
>             Project: Solr
>          Issue Type: Bug
>          Components: replication (scripts)
>    Affects Versions: 1.2, 1.3
>            Reporter: Hoss Man
>            Assignee: Bill Au
>
> as mentioned on the mailing list...
> http://www.nabble.com/FileNotFoundException-on-slave-after-replication---script-bug--to20111313.html#a20111313
> {noformat}
> We're seeing strange behavior on one of our slave nodes after replication.
> When the new searcher is created we see FileNotFoundExceptions in the log
> and the index is strangely invalid/corrupted.
> We may have identified the root cause but wanted to run it by the community.
> We figure there is a bug in the snappuller shell script, line 181:
> snap_name=`ssh -o StrictHostKeyChecking=no ${master_host} "ls
> ${master_data_dir}|grep 'snapshot\.'|grep -v wip|sort -r|head -1"`
> This line determines the directory name of the latest snapshot to download
> to the slave from the master.  Problem with this line is that it grab the
> temporary work directory of a snapshot in progress.  Those temporary
> directories are prefixed with  "temp" and as far as I can tell should never
> get pulled from the master so its easy to disambiguate.  It seems that this
> temp directory, if it exists will be the newest one so if present it will be
> the one replicated: FAIL.
> {noformat}

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply | Threaded
Open this post in threaded view
|

[jira] Updated: (SOLR-830) snappuller picks bad snapshot name

Hudson (Jira)
In reply to this post by Hudson (Jira)

     [ https://issues.apache.org/jira/browse/SOLR-830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Otis Gospodnetic updated SOLR-830:
----------------------------------

    Fix Version/s: 1.3.1

> snappuller picks bad snapshot name
> ----------------------------------
>
>                 Key: SOLR-830
>                 URL: https://issues.apache.org/jira/browse/SOLR-830
>             Project: Solr
>          Issue Type: Bug
>          Components: replication (scripts)
>    Affects Versions: 1.2, 1.3
>            Reporter: Hoss Man
>            Assignee: Bill Au
>             Fix For: 1.3.1
>
>
> as mentioned on the mailing list...
> http://www.nabble.com/FileNotFoundException-on-slave-after-replication---script-bug--to20111313.html#a20111313
> {noformat}
> We're seeing strange behavior on one of our slave nodes after replication.
> When the new searcher is created we see FileNotFoundExceptions in the log
> and the index is strangely invalid/corrupted.
> We may have identified the root cause but wanted to run it by the community.
> We figure there is a bug in the snappuller shell script, line 181:
> snap_name=`ssh -o StrictHostKeyChecking=no ${master_host} "ls
> ${master_data_dir}|grep 'snapshot\.'|grep -v wip|sort -r|head -1"`
> This line determines the directory name of the latest snapshot to download
> to the slave from the master.  Problem with this line is that it grab the
> temporary work directory of a snapshot in progress.  Those temporary
> directories are prefixed with  "temp" and as far as I can tell should never
> get pulled from the master so its easy to disambiguate.  It seems that this
> temp directory, if it exists will be the newest one so if present it will be
> the one replicated: FAIL.
> {noformat}

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply | Threaded
Open this post in threaded view
|

[jira] Commented: (SOLR-830) snappuller picks bad snapshot name

Hudson (Jira)
In reply to this post by Hudson (Jira)

    [ https://issues.apache.org/jira/browse/SOLR-830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12647331#action_12647331 ]

Bill Au commented on SOLR-830:
------------------------------

Steve, thanks for the perl code.  I need to get rid of the "\" before the "$" in order to get it to work for me:

perl -e 'chdir q/${master_data_dir}/; print ((sort grep {/^snapshot[.][1-9][0-9]{13}$/} <*>)[-1])'


I have tested this on Linux and FreeBSD.  I will test on Mac OS X tonight.  It will be good if someone can do a quick test on Solaris.  You really don't need a full brown Solr installation to test it.  Just create some dummy directory with various names like:

snapshot.00080527124131
snapshot.20080527124131
snapshot.20080527124131-wip
snapshot.20080527140518
snapshot.20080527140610
snapshot.20081113113700
snapshot.2080527124131
temp-snapshot.20080527124131

and then run the perl command to make sure the right one is returned.  With the data set above, you should get:

snapshot.20081113113700

> snappuller picks bad snapshot name
> ----------------------------------
>
>                 Key: SOLR-830
>                 URL: https://issues.apache.org/jira/browse/SOLR-830
>             Project: Solr
>          Issue Type: Bug
>          Components: replication (scripts)
>    Affects Versions: 1.2, 1.3
>            Reporter: Hoss Man
>            Assignee: Bill Au
>             Fix For: 1.3.1
>
>
> as mentioned on the mailing list...
> http://www.nabble.com/FileNotFoundException-on-slave-after-replication---script-bug--to20111313.html#a20111313
> {noformat}
> We're seeing strange behavior on one of our slave nodes after replication.
> When the new searcher is created we see FileNotFoundExceptions in the log
> and the index is strangely invalid/corrupted.
> We may have identified the root cause but wanted to run it by the community.
> We figure there is a bug in the snappuller shell script, line 181:
> snap_name=`ssh -o StrictHostKeyChecking=no ${master_host} "ls
> ${master_data_dir}|grep 'snapshot\.'|grep -v wip|sort -r|head -1"`
> This line determines the directory name of the latest snapshot to download
> to the slave from the master.  Problem with this line is that it grab the
> temporary work directory of a snapshot in progress.  Those temporary
> directories are prefixed with  "temp" and as far as I can tell should never
> get pulled from the master so its easy to disambiguate.  It seems that this
> temp directory, if it exists will be the newest one so if present it will be
> the one replicated: FAIL.
> {noformat}

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply | Threaded
Open this post in threaded view
|

[jira] Commented: (SOLR-830) snappuller picks bad snapshot name

Hudson (Jira)
In reply to this post by Hudson (Jira)

    [ https://issues.apache.org/jira/browse/SOLR-830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12647345#action_12647345 ]

Steven Rowe commented on SOLR-830:
----------------------------------

Bill, I ran the following script on a Linux box against a remote Solaris 2.8 box, with Perl 5.8.0 installed, on which I created the set of empty directories you listed:

{code}
master_host=remotesolaris
master_data_dir=/tmp/solrtest
snap_name=`ssh -o StrictHostKeyChecking=no ${master_host} \
"perl -e 'chdir q|${master_data_dir}|; print ((sort grep {/^snapshot[.][1-9][0-9]{13}$/} <*>)[-1])'"`
echo ${snap_name}
{code}

N.B.: I had to change the quoting boundary character around ${master_data_dir}, from '/' to '|', since ${master_data_dir} contains '/' characters -- I think '|' is really unlikely to be a pathname component.

It worked for me both with and without the '\' before the '$' -- I get the following output from the script:

snapshot.20081113113700

> snappuller picks bad snapshot name
> ----------------------------------
>
>                 Key: SOLR-830
>                 URL: https://issues.apache.org/jira/browse/SOLR-830
>             Project: Solr
>          Issue Type: Bug
>          Components: replication (scripts)
>    Affects Versions: 1.2, 1.3
>            Reporter: Hoss Man
>            Assignee: Bill Au
>             Fix For: 1.3.1
>
>
> as mentioned on the mailing list...
> http://www.nabble.com/FileNotFoundException-on-slave-after-replication---script-bug--to20111313.html#a20111313
> {noformat}
> We're seeing strange behavior on one of our slave nodes after replication.
> When the new searcher is created we see FileNotFoundExceptions in the log
> and the index is strangely invalid/corrupted.
> We may have identified the root cause but wanted to run it by the community.
> We figure there is a bug in the snappuller shell script, line 181:
> snap_name=`ssh -o StrictHostKeyChecking=no ${master_host} "ls
> ${master_data_dir}|grep 'snapshot\.'|grep -v wip|sort -r|head -1"`
> This line determines the directory name of the latest snapshot to download
> to the slave from the master.  Problem with this line is that it grab the
> temporary work directory of a snapshot in progress.  Those temporary
> directories are prefixed with  "temp" and as far as I can tell should never
> get pulled from the master so its easy to disambiguate.  It seems that this
> temp directory, if it exists will be the newest one so if present it will be
> the one replicated: FAIL.
> {noformat}

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply | Threaded
Open this post in threaded view
|

[jira] Commented: (SOLR-830) snappuller picks bad snapshot name

Hudson (Jira)
In reply to this post by Hudson (Jira)

    [ https://issues.apache.org/jira/browse/SOLR-830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12647384#action_12647384 ]

Bill Au commented on SOLR-830:
------------------------------

Steven, thanks for testing on Solaris.  It looks like on Linux and FreeBSD that the '\' in front of '$' escape the special meaning of '$' so it is trying to match against the literal '$' after all the digits (ie snapshot.20080527124131$).  Unless this does not work on Mac OS X, I will go with perl without the '\' before the '$'.  I will attach a patch here after I test on my Mac tonight.

> snappuller picks bad snapshot name
> ----------------------------------
>
>                 Key: SOLR-830
>                 URL: https://issues.apache.org/jira/browse/SOLR-830
>             Project: Solr
>          Issue Type: Bug
>          Components: replication (scripts)
>    Affects Versions: 1.2, 1.3
>            Reporter: Hoss Man
>            Assignee: Bill Au
>             Fix For: 1.3.1
>
>
> as mentioned on the mailing list...
> http://www.nabble.com/FileNotFoundException-on-slave-after-replication---script-bug--to20111313.html#a20111313
> {noformat}
> We're seeing strange behavior on one of our slave nodes after replication.
> When the new searcher is created we see FileNotFoundExceptions in the log
> and the index is strangely invalid/corrupted.
> We may have identified the root cause but wanted to run it by the community.
> We figure there is a bug in the snappuller shell script, line 181:
> snap_name=`ssh -o StrictHostKeyChecking=no ${master_host} "ls
> ${master_data_dir}|grep 'snapshot\.'|grep -v wip|sort -r|head -1"`
> This line determines the directory name of the latest snapshot to download
> to the slave from the master.  Problem with this line is that it grab the
> temporary work directory of a snapshot in progress.  Those temporary
> directories are prefixed with  "temp" and as far as I can tell should never
> get pulled from the master so its easy to disambiguate.  It seems that this
> temp directory, if it exists will be the newest one so if present it will be
> the one replicated: FAIL.
> {noformat}

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply | Threaded
Open this post in threaded view
|

[jira] Commented: (SOLR-830) snappuller picks bad snapshot name

Hudson (Jira)
In reply to this post by Hudson (Jira)

    [ https://issues.apache.org/jira/browse/SOLR-830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12648260#action_12648260 ]

Bill Au commented on SOLR-830:
------------------------------

It works on Mac OS X.

> snappuller picks bad snapshot name
> ----------------------------------
>
>                 Key: SOLR-830
>                 URL: https://issues.apache.org/jira/browse/SOLR-830
>             Project: Solr
>          Issue Type: Bug
>          Components: replication (scripts)
>    Affects Versions: 1.2, 1.3
>            Reporter: Hoss Man
>            Assignee: Bill Au
>             Fix For: 1.3.1
>
>         Attachments: solr-830.patch
>
>
> as mentioned on the mailing list...
> http://www.nabble.com/FileNotFoundException-on-slave-after-replication---script-bug--to20111313.html#a20111313
> {noformat}
> We're seeing strange behavior on one of our slave nodes after replication.
> When the new searcher is created we see FileNotFoundExceptions in the log
> and the index is strangely invalid/corrupted.
> We may have identified the root cause but wanted to run it by the community.
> We figure there is a bug in the snappuller shell script, line 181:
> snap_name=`ssh -o StrictHostKeyChecking=no ${master_host} "ls
> ${master_data_dir}|grep 'snapshot\.'|grep -v wip|sort -r|head -1"`
> This line determines the directory name of the latest snapshot to download
> to the slave from the master.  Problem with this line is that it grab the
> temporary work directory of a snapshot in progress.  Those temporary
> directories are prefixed with  "temp" and as far as I can tell should never
> get pulled from the master so its easy to disambiguate.  It seems that this
> temp directory, if it exists will be the newest one so if present it will be
> the one replicated: FAIL.
> {noformat}

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply | Threaded
Open this post in threaded view
|

[jira] Updated: (SOLR-830) snappuller picks bad snapshot name

Hudson (Jira)
In reply to this post by Hudson (Jira)

     [ https://issues.apache.org/jira/browse/SOLR-830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bill Au updated SOLR-830:
-------------------------

    Attachment: solr-830.patch

patch to use perl regular expression to improve accuracy in finding latest snapshot.

> snappuller picks bad snapshot name
> ----------------------------------
>
>                 Key: SOLR-830
>                 URL: https://issues.apache.org/jira/browse/SOLR-830
>             Project: Solr
>          Issue Type: Bug
>          Components: replication (scripts)
>    Affects Versions: 1.2, 1.3
>            Reporter: Hoss Man
>            Assignee: Bill Au
>             Fix For: 1.3.1
>
>         Attachments: solr-830.patch
>
>
> as mentioned on the mailing list...
> http://www.nabble.com/FileNotFoundException-on-slave-after-replication---script-bug--to20111313.html#a20111313
> {noformat}
> We're seeing strange behavior on one of our slave nodes after replication.
> When the new searcher is created we see FileNotFoundExceptions in the log
> and the index is strangely invalid/corrupted.
> We may have identified the root cause but wanted to run it by the community.
> We figure there is a bug in the snappuller shell script, line 181:
> snap_name=`ssh -o StrictHostKeyChecking=no ${master_host} "ls
> ${master_data_dir}|grep 'snapshot\.'|grep -v wip|sort -r|head -1"`
> This line determines the directory name of the latest snapshot to download
> to the slave from the master.  Problem with this line is that it grab the
> temporary work directory of a snapshot in progress.  Those temporary
> directories are prefixed with  "temp" and as far as I can tell should never
> get pulled from the master so its easy to disambiguate.  It seems that this
> temp directory, if it exists will be the newest one so if present it will be
> the one replicated: FAIL.
> {noformat}

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply | Threaded
Open this post in threaded view
|

[jira] Resolved: (SOLR-830) snappuller picks bad snapshot name

Hudson (Jira)
In reply to this post by Hudson (Jira)

     [ https://issues.apache.org/jira/browse/SOLR-830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bill Au resolved SOLR-830.
--------------------------

    Resolution: Fixed

Patch committed:

Sending        CHANGES.txt
Sending        src/scripts/snappuller
Transmitting file data ..
Committed revision 719233.


> snappuller picks bad snapshot name
> ----------------------------------
>
>                 Key: SOLR-830
>                 URL: https://issues.apache.org/jira/browse/SOLR-830
>             Project: Solr
>          Issue Type: Bug
>          Components: replication (scripts)
>    Affects Versions: 1.2, 1.3
>            Reporter: Hoss Man
>            Assignee: Bill Au
>             Fix For: 1.3.1
>
>         Attachments: solr-830.patch
>
>
> as mentioned on the mailing list...
> http://www.nabble.com/FileNotFoundException-on-slave-after-replication---script-bug--to20111313.html#a20111313
> {noformat}
> We're seeing strange behavior on one of our slave nodes after replication.
> When the new searcher is created we see FileNotFoundExceptions in the log
> and the index is strangely invalid/corrupted.
> We may have identified the root cause but wanted to run it by the community.
> We figure there is a bug in the snappuller shell script, line 181:
> snap_name=`ssh -o StrictHostKeyChecking=no ${master_host} "ls
> ${master_data_dir}|grep 'snapshot\.'|grep -v wip|sort -r|head -1"`
> This line determines the directory name of the latest snapshot to download
> to the slave from the master.  Problem with this line is that it grab the
> temporary work directory of a snapshot in progress.  Those temporary
> directories are prefixed with  "temp" and as far as I can tell should never
> get pulled from the master so its easy to disambiguate.  It seems that this
> temp directory, if it exists will be the newest one so if present it will be
> the one replicated: FAIL.
> {noformat}

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.