EmbeddedSolrServer and StreamingUpdateSolrServer


pcrao
Hi,

I am using EmbeddedSolrServer for full indexing (Multi core)
and StreamingUpdateSolrServer for incremental indexing.
The steps involved are mentioned below.

Full indexing (Daily)
1) Start EmbeddedSolrServer
2) Delete all docs
3) Add all docs
4) Commit and optimize collection
5) Stop EmbeddedSolrServer
6) Reload core
    http://localhost:7070/solr/admin/cores?action=RELOAD&core=docs
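
Step 6 above can also be issued from code. This is a minimal sketch, assuming plain `java.net` classes rather than SolrJ. The class and method names (`CoreReload`, `reloadUrl`, `reload`) are made up for illustration; only the URL itself comes from the steps above.

```java
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

// Hypothetical helper: builds and sends the CoreAdmin RELOAD request from step 6.
class CoreReload {

    // Builds the reload URL for one core under the given Solr base URL.
    static String reloadUrl(String solrBase, String core) {
        try {
            return solrBase + "/admin/cores?action=RELOAD&core="
                    + URLEncoder.encode(core, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to be available on every JVM.
            throw new IllegalStateException(e);
        }
    }

    // Sends the request and returns the HTTP status code reported by the server.
    static int reload(String solrBase, String core) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) new URL(reloadUrl(solrBase, core)).openConnection();
        try {
            conn.setRequestMethod("GET");
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }
}
```

With the values from this post, `CoreReload.reload("http://localhost:7070/solr", "docs")` would request the same URL as step 6.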

Incremental Indexing (Hourly)
1) Start StreamingUpdateSolrServer
2) Add/Delete docs
3) Commit collection

Now, the issue is the index is getting corrupted if we do Full indexing
and incremental indexing one after the other without restarting
the Tomcat web server (localhost:7070). There is no issue if we
restart the Tomcat after each of the indexing processes (Full and incremental).

Please let me know how we can avoid corrupting the index without restarting
Tomcat. I am fairly new to Solr, so I may be missing something here.

Below are some details about our Solr Installation.
1) JVM                       OpenJDK 64-Bit Server VM (19.0-b09)
2) solr-spec-version     4.0.0.2011.12.08.06.33.52
3) solr-impl-version      4.0-SNAPSHOT 1211898 - root - 2011-12-08 06:33:52
4) lucene-spec-version 4.0-SNAPSHOT
5) lucene-impl-version  4.0-SNAPSHOT 1211898 - root - 2011-12-08 06:24:12
6) OS                        Red Hat Enterprise Linux Server release 6.1 (Santiago)

Thanks,
PC Rao.

Re: EmbeddedSolrServer and StreamingUpdateSolrServer

pcrao
Hi,

Any update on this?
Please let me know if you need additional information on this.

Thanks,
PC Rao.

Re: EmbeddedSolrServer and StreamingUpdateSolrServer

Mikhail Khludnev
Hi,

It's hard to help until you tell us why you think the index is corrupted.
Logs, steps, and stack traces are useful.

Regards

On Wed, Apr 11, 2012 at 2:56 PM, pcrao wrote:

> Hi,
>
> Any update on this?
> Please let me know if you need additional information on this.
>
> Thanks,
> PC Rao.

--
Sincerely yours
Mikhail Khludnev

Re: EmbeddedSolrServer and StreamingUpdateSolrServer

pcrao
Hi Mikhail Khludnev,

Thank you for the reply.
I think the index is getting corrupted because StreamingUpdateSolrServer keeps references
to some index files that EmbeddedSolrServer deletes during the commit/optimize process.
As a result, when I run a full index using EmbeddedSolrServer and then an incremental index using StreamingUpdateSolrServer, the incremental run fails with a FileNotFoundException.
A special note: we don't optimize the index after incremental indexing (StreamingUpdateSolrServer), but we do optimize it after the full index (EmbeddedSolrServer). Please see the log below and let me know
if you need further information.
-------------------------------------------------------------------------------------------
Mar 29, 2012 12:05:03 AM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {add=[035405]} 0 28
Mar 29, 2012 12:05:03 AM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/update/extract params={stream.type=text/html&literal.stream_source_info=/snps/docs/customer/q_and_a/html/035405.html&literal.stream_name=035405.html&wt=javabin&collectionName=docs&version=2} status=0 QTime=28
Mar 29, 2012 12:05:03 AM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start commit(optimize=false,waitSearcher=true,expungeDeletes=false,softCommit=false)
Mar 29, 2012 12:05:03 AM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {commit=} 0 10
Mar 29, 2012 12:05:03 AM org.apache.solr.common.SolrException log
SEVERE: java.io.FileNotFoundException: /opt/solr/home/data/docs_index/index/_3d.cfs (No such file or directory)
        at java.io.RandomAccessFile.open(Native Method)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
        at org.apache.lucene.store.MMapDirectory.createSlicer(MMapDirectory.java:229)
        at org.apache.lucene.store.CompoundFileDirectory.<init>(CompoundFileDirectory.java:65)
        at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:82)
        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:112)
        at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:700)
        at org.apache.lucene.index.BufferedDeletesStream.applyDeletes(BufferedDeletesStream.java:263)
        at org.apache.lucene.index.IndexWriter.applyAllDeletes(IndexWriter.java:2852)
        at org.apache.lucene.index.IndexWriter.maybeApplyDeletes(IndexWriter.java:2843)
        at org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:2616)
        at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2731)
        at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2719)
        at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2703)
        at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:325)
        at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:84)
        at org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:154)
        at org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:107)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:52)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1477)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:353)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
-----------------------------------------------------------------------------------------------------
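
To illustrate the suspicion described before the log: this is a minimal sketch using only the JDK, not Lucene or Solr code. It assumes one long-running writer that remembers a segment file name and a second, separate writer that rewrites the directory in between. The class name `StaleSegmentDemo` and the new file name `_5b.cfs` are made up; `_3d.cfs` is borrowed from the stack trace above.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration of a writer acting on an outdated view of a directory.
class StaleSegmentDemo {

    // Returns true if opening a remembered, since-deleted file fails
    // with FileNotFoundException.
    static boolean staleOpenFails() {
        try {
            Path indexDir = Files.createTempDirectory("index");
            indexDir.toFile().deleteOnExit();

            // "Writer A" (the long-running process) remembers a segment file.
            File remembered = Files.createFile(indexDir.resolve("_3d.cfs")).toFile();

            // "Writer B" (a separate process) rewrites the index and drops the old segment.
            Files.delete(remembered.toPath());
            Path fresh = Files.createFile(indexDir.resolve("_5b.cfs"));
            fresh.toFile().deleteOnExit();

            // Writer A now tries to open the file it still believes exists.
            try (RandomAccessFile raf = new RandomAccessFile(remembered, "r")) {
                return false; // not reached: the file is gone
            } catch (FileNotFoundException expected) {
                return true;
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Whether this is what actually happens inside the Tomcat JVM here is only my guess; the sketch just shows that a stale file list is enough to produce the same exception type.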

Thanks,
PC Rao.

Re: EmbeddedSolrServer and StreamingUpdateSolrServer

Shawn Heisey-4
On 4/12/2012 4:52 AM, pcrao wrote:

> I think the index is getting corrupted because StreamingUpdateSolrServer is
> keeping reference
> to some index files that are being deleted by EmbeddedSolrServer during
> commit/optimize process.
> As a result when I Index(Full) using EmbeddedSolrServer and then do
> Incremental index using StreamingUpdateSolrServer it fails with a
> FileNotFound exception.
>   A special note: we don't optimize the index after Incremental
> indexing(StreamingUpdateSolrServer) but we do optimize it after the Full
> index(EmbeddedSolrServer). Please see the below log and let me know
> if you need further information.

I am a relative newbie to all this, and I've never used
EmbeddedSolrServer, only CommonsHttpSolrServer and
StreamingUpdateSolrServer.  I'm not even sure the embedded object is an
option unless your program is running in the same JVM as Solr.  Mine is
separate.

If I am right about ESS needing to be in the same JVM as Solr, then that
means it can do a more direct interaction with Solr and therefore might
not be coordinated with the HTTP access that SUSS uses.  I have read
multiple times that the developers don't recommend using ESS.  If you
are going to use it, you probably have to do everything with it.

SUSS does everything in the background, so you have no guarantees as to
when it will happen, as well as no ability to check for completion or
errors.  Because of the lack of error detection, I had to stop using SUSS.

Thanks,
Shawn


Re: EmbeddedSolrServer and StreamingUpdateSolrServer

pcrao
Hi Shawn,

Thanks for sharing your opinion.

Mikhail Khludnev, what do you think of Shawn's opinion?

Thanks,
PC Rao.

Re: EmbeddedSolrServer and StreamingUpdateSolrServer

Mikhail Khludnev
Did I get it right that you have two separate processes (different applications) accessing
the same Lucene Directory simultaneously? In that case I suggest reading
about the locking mechanism. I'm not really experienced with it.
You showed logs from the streaming update failure; that part is clear. Can you show logs
from the embedded server commit, which is supposed to be successful?
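
For readers unfamiliar with the idea, here is a minimal sketch of lock-file based writer exclusion using only `java.nio`. It is an illustration under assumptions, not Lucene's implementation: the class name `WriteLockDemo` is made up, and which lock type a given Solr install actually uses depends on its configuration, which is not shown in this thread.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration: only one writer at a time may hold the lock file.
class WriteLockDemo {

    // Returns true if a second writer is refused while the first holds the lock.
    static boolean secondWriterIsRefused() {
        try {
            Path dir = Files.createTempDirectory("index");
            dir.toFile().deleteOnExit();
            Path lockFile = dir.resolve("write.lock");
            lockFile.toFile().deleteOnExit();

            try (RandomAccessFile first = new RandomAccessFile(lockFile.toFile(), "rw");
                 RandomAccessFile second = new RandomAccessFile(lockFile.toFile(), "rw")) {
                FileLock held = first.getChannel().tryLock();
                if (held == null) {
                    return false; // could not even take the first lock
                }
                try {
                    // A writer in another process would get null here.
                    FileLock other = second.getChannel().tryLock();
                    return other == null;
                } catch (OverlappingFileLockException sameJvm) {
                    // Inside one JVM the refusal is reported as an exception instead.
                    return true;
                } finally {
                    held.release();
                }
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Note that a lock of this kind only guards against writers that are open at the same time; it says nothing about a writer that stays open and keeps an old view of the directory.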

On Fri, Apr 13, 2012 at 9:34 AM, pcrao wrote:

> Hi Shawn,
>
> Thanks for sharing your opinion.
>
> Mikhail Khludnev, what do you think of Shawn's opinion?
>
> Thanks,
> PC Rao.

--
Sincerely yours
Mikhail Khludnev

Re: EmbeddedSolrServer and StreamingUpdateSolrServer

pcrao
Hi Mikhail Khludnev,

You are partially right: we have two separate processes accessing the same Lucene Directory,
but they do not run simultaneously. They run one after the other, and the second starts only after the first has
completed. The commit from the EmbeddedSolrServer is successful, and I am posting the log below.

---------------------------------------------------------------------------------------

INFO: [] webapp=null path=/update/extract params={stream.type=text%2Fhtml&collectionName=docs} status=0 QTime=5
Apr 5, 2012 7:28:34 AM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start commit(optimize=false,waitSearcher=true,expungeDeletes=false,softCommit=false)
Apr 5, 2012 7:28:34 AM org.apache.solr.core.SolrDeletionPolicy onCommit
INFO: SolrDeletionPolicy.onCommit: commits:num=2
        commit{dir=/opt/solr/home/data/docs_index/index,segFN=segments_4v,version=1333471748253,generation=175,filenames=[_5a.fdt, _5a_0.tip, _5a.fdx, _5a.tvf, _5a.tvx, segments_4v, _5a.tvd, _5a_0.prx, _5a.per, _5a_0.frq, _5a_0.tim, _5a.fnm]
        commit{dir=/opt/solr/home/data/docs_index/index,segFN=segments_4w,version=1333471748256,generation=176,filenames=[_5b.fnm, _5b.tvd, _5b.tvf, _5b_0.tip, _5b.nrm, _5b_0.tim, _5b.fdx, _5b_0.prx, _5b_0.frq, segments_4w, _5b.tvx, _5b.per, _5b.fdt]
Apr 5, 2012 7:28:34 AM org.apache.solr.core.SolrDeletionPolicy updateCommits
INFO: newest commit = 1333471748256
Apr 5, 2012 7:28:34 AM org.apache.solr.search.SolrIndexSearcher <init>
INFO: Opening Searcher@17c232ee main
Apr 5, 2012 7:28:34 AM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: end_commit_flush
Apr 5, 2012 7:28:34 AM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming Searcher@17c232ee main{DirectoryReader(segments_4w:1333471748256 _5b(4.0):Cv1000)} from Searcher@658f7386 main{DirectoryReader(segments_4v:1333471748253 _5a(4.0):Cv16787)}
        fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Apr 5, 2012 7:28:34 AM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for Searcher@17c232ee main{DirectoryReader(segments_4w:1333471748256 _5b(4.0):Cv1000)}
        fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Apr 5, 2012 7:28:34 AM org.apache.solr.core.SolrCore registerSearcher
INFO: [] Registered new searcher Searcher@17c232ee main{DirectoryReader(segments_4w:1333471748256 _5b(4.0):Cv1000)}
Apr 5, 2012 7:28:34 AM org.apache.solr.search.SolrIndexSearcher close
INFO: Closing Searcher@658f7386 main
        fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Apr 5, 2012 7:28:34 AM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {commit=} 0 344
Apr 5, 2012 7:28:34 AM org.apache.solr.core.SolrCore execute
INFO: [] webapp=null path=/update params={commit=true&waitSearcher=true} status=0 QTime=344
Apr 5, 2012 7:28:34 AM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {add=[009658]} 0 9
Apr 5, 2012 7:28:34 AM org.apache.solr.core.SolrCore execute
INFO: [] webapp=null path=/update/extract params={stream.type=text%2Fhtml&collectionName=docs} status=0 QTime=9
-------------------------------------------------------------------------------------------------

Please let me know your thoughts.

Thanks,
PC Rao.

Re: EmbeddedSolrServer and StreamingUpdateSolrServer

pcrao
Hi,

Any update?
Thanks,
PC Rao

Re: EmbeddedSolrServer and StreamingUpdateSolrServer

Mikhail Khludnev
To be honest, I have no idea. Can you try shutting down the first process's JVM
after it completes indexing and starting the second JVM only after that? Does
that work?
Which version of Solr are you running?

On Fri, Apr 20, 2012 at 8:14 AM, pcrao wrote:

> Hi,
>
> Any update?
> Thanks,
> PC Rao

--
Sincerely yours
Mikhail Khludnev

Re: EmbeddedSolrServer and StreamingUpdateSolrServer

pcrao
Hi Mikhail Khludnev,

Thank you for your help.

Let me explain the JVM scenario.
The JVM in which Tomcat is running is not restarted each time StreamingUpdateSolrServer
runs, whereas the EmbeddedSolrServer runs in a fresh JVM instance (a new process) every time.
In this scenario the index gets corrupted.

If I restart Tomcat (i.e. restart the JVM that handles the StreamingUpdateSolrServer requests) after each indexing run
completes, the index doesn't get corrupted. However, this is not a viable option for us because Solr would
not be available to users during the restart.

Let me know if you have any more thoughts on this.
In case you don't, can you also let me know how I can seek help from others?

Thanks again,
PC Rao.

Re: EmbeddedSolrServer and StreamingUpdateSolrServer

pcrao
Hi,

Any more thoughts??

Thanks,
PC Rao.

Re: EmbeddedSolrServer and StreamingUpdateSolrServer

Ryan McKinley
In general, I would not suggest mixing EmbeddedSolrServer with a
different style (unless the other instances are read-only).  If you
have multiple instances writing to the same files on disk, you are
asking for problems.

Have you tried just using StreamingUpdateSolrServer for the daily update?
I would suspect that it would be faster than EmbeddedSolrServer
anyway.
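
A rough sketch of what a daily full reindex through the running Solr could look like when everything goes over HTTP. It uses only the JDK and Solr's XML update messages instead of SolrJ, so it is not the StreamingUpdateSolrServer API itself. Assumptions: the update URL `http://localhost:7070/solr/docs/update` (derived from the host and core name mentioned earlier in the thread), a field called `id`, and the made-up class name `HttpFullReindex`. The thread's setup actually posts HTML to `/update/extract`, which this sketch does not cover.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: full reindex as a sequence of XML update messages.
class HttpFullReindex {

    // Escapes the characters that are special in XML text and attributes.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Builds one <add> message for a single document.
    static String addMessage(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> f : fields.entrySet()) {
            sb.append("<field name=\"").append(escape(f.getKey())).append("\">")
              .append(escape(f.getValue())).append("</field>");
        }
        return sb.append("</doc></add>").toString();
    }

    // Delete everything, add all documents, then commit and optimize,
    // in the same order as the full-indexing steps in the first post.
    static List<String> fullReindexMessages(List<Map<String, String>> docs) {
        List<String> messages = new ArrayList<String>();
        messages.add("<delete><query>*:*</query></delete>");
        for (Map<String, String> doc : docs) {
            messages.add(addMessage(doc));
        }
        messages.add("<commit/>");
        messages.add("<optimize/>");
        return messages;
    }

    // Posts one message to the (assumed) update URL and returns the HTTP status.
    static int post(String updateUrl, String xml) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(updateUrl).openConnection();
        try {
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
            try (OutputStream out = conn.getOutputStream()) {
                out.write(xml.getBytes(StandardCharsets.UTF_8));
            }
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }
}
```

Because every write then goes through the one Solr instance inside Tomcat, there is only a single writer on the index files. As far as I know, searches keep seeing the old index until the commit, assuming no autoCommit fires in between.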

ryan



On Wed, Apr 25, 2012 at 11:32 PM, pcrao wrote:

> Hi,
>
> Any more thoughts??
>
> Thanks,
> PC Rao.

Re: EmbeddedSolrServer and StreamingUpdateSolrServer

pcrao
Hi Ryan,

I see.

Yes, for incremental indexing (hourly) we use StreamingUpdateSolrServer,
and it is faster than EmbeddedSolrServer.

We also use the embedded server for full indexing on a daily basis;
it is efficient for full indexing because it handles a large number of documents
better.

Thanks,
PC Rao.

Re: EmbeddedSolrServer and StreamingUpdateSolrServer

pcrao
Hi,

Can someone officially confirm that the current Solr version does not support
using both EmbeddedSolrServer (for full indexing) and StreamingUpdateSolrServer (for incremental indexing)
to update the same index?

How can I request an enhancement in the next version?
I think this requirement is valid and very useful. Any disagreements?

Thanks,
PC Rao.