Initial import problems

Andrew Nagy-2
Hello, I am new to Solr but very excited about its possibilities.

I am having some difficulties with my data import, which I hope can be
solved easily.
First I wrote an XSLT stylesheet to transform my XML into the Solr schema and
modified schema.xml to match the fields that I created.  I then ran
post.sh on my 492,000 records.  Near the end of the
process, records stopped being added because of a Java heap memory error; I
obviously exhausted the memory allotted for the import.  Next time I will
import less at a time!
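Andrew did the transform with an XSLT stylesheet. As an alternative sketch (a swapped-in technique, not his stylesheet), the same conversion can be written with Python's xml.etree.ElementTree. The <record> element name and the FIELD_MAP entries are hypothetical; the real names would come from the source data and schema.xml. The output follows Solr's XML update format, <add><doc><field name="...">...</field></doc></add>.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from source element names to Solr field names;
# real field names would come from the source XML and schema.xml.
FIELD_MAP = {"identifier": "id", "title": "title", "creator": "author"}


def record_to_doc(record):
    """Convert one hypothetical <record> element into a Solr <doc> element."""
    doc = ET.Element("doc")
    for child in record:
        name = FIELD_MAP.get(child.tag)
        if name is None or child.text is None:
            continue  # skip elements the schema does not define, and empty ones
        field = ET.SubElement(doc, "field", name=name)
        field.text = child.text.strip()
    return doc


def records_to_add(source_xml):
    """Wrap every <record> in source_xml into a single <add> update message."""
    root = ET.fromstring(source_xml)
    add = ET.Element("add")
    for record in root.iter("record"):
        add.append(record_to_doc(record))
    # encoding="unicode" returns a str without an XML declaration;
    # special characters such as "&" are escaped automatically.
    return ET.tostring(add, encoding="unicode")
```

For example, records_to_add("<records><record><identifier>1</identifier><title>Moby Dick</title></record></records>") produces one <add> message holding one <doc> with an id and a title field.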

I then posted a commit.  I went to my Solr admin site and
looked at the statistics.  It said roughly 372,000 records were stored
and 1 commit had been made.  But no matter what I search for, I
get 0 results.  I even tried title:"the" (assuming stop words are not
being filtered out, it should return something!).

It appears to me that the search is not searching any records.  Any idea
as to what I might need to do, or should I start over from scratch and
re-import my records in smaller chunks?

Thanks!
Andrew

Re: Initial import problems

Yonik Seeley-2
On 12/5/06, Andrew Nagy <[hidden email]> wrote:

> Hello, I am new to SOLR but very excited for its possibilities.
>
> I am having some difficulties with my data import which I hope can be
> solved very easily.
> First I wrote an xslt to transform my xml into the solr schema and
> modified the schema.xml to match the fields that I created.  I then ran
> the post.sh on my 492,000 records that I have.  Near the end of the
> process the records stopped being added due to a memory heap error.  I
> obviously maxed the allotted memory for the import.  Next time I will
> import less at a time!

Did you increase the JVM heap size?

> I then posted a commit statement.

Correct operation of the server after an OutOfMemoryError (OOM) isn't really
guaranteed: the error may be thrown in any thread, in any library,
including the app server's own code.

>  I went to my solr admin site and
> looked at the statistics.  It said 372,000 records (roughly) were stored
> and 1 commit.  I tried to do a search but no matter what I search for I
> get 0 results.  I even tried title:"the" (assuming it is not blocking
> the stop word, it should return something!).
>
> It appears to me that the search is not searching any records.  Any idea
> as to what I might need to do, or should I start over from scratch and
> re-import my records in smaller chunks?

That might help, but may not be sufficient if you don't have enough heap memory.

-Yonik

Re: Initial import problems

Mike Klaas
In reply to this post by Andrew Nagy-2
On 12/5/06, Andrew Nagy <[hidden email]> wrote:

> Hello, I am new to SOLR but very excited for its possibilities.
>
> I am having some difficulties with my data import which I hope can be
> solved very easily.
> First I wrote an xslt to transform my xml into the solr schema and
> modified the schema.xml to match the fields that I created.  I then ran
> the post.sh on my 492,000 records that I have.  Near the end of the
> process the records stopped being added due to a memory heap error.  I
> obviously maxed the allotted memory for the import.  Next time I will
> import less at a time!

Yeah, committing more frequently should help this case.
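As a minimal sketch of importing in smaller chunks and committing more often, the generator below splits the documents into <add> messages of at most batch_size documents and emits a <commit/> after every commit_every documents, plus a final one. The batch and commit sizes are arbitrary assumptions, and actually POSTing each message to the Solr update URL (as post.sh does) is left out. Smaller requests may still not be enough if the JVM heap itself is too small (it is set with java's -Xmx option), as Yonik points out.

```python
from itertools import islice
from xml.sax.saxutils import escape, quoteattr


def add_message(docs):
    """Build one <add> message from a list of {field_name: value} dicts."""
    parts = ["<add>"]
    for doc in docs:
        parts.append("<doc>")
        for name, value in doc.items():
            # quoteattr adds the surrounding quotes; escape handles &, < and >.
            parts.append(f"<field name={quoteattr(name)}>{escape(str(value))}</field>")
        parts.append("</doc>")
    parts.append("</add>")
    return "".join(parts)


def update_messages(docs, batch_size=1000, commit_every=10000):
    """Yield <add> messages of at most batch_size docs, a <commit/> after
    every commit_every docs, and a final <commit/> for any remainder."""
    it = iter(docs)
    since_commit = 0
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            break
        yield add_message(batch)
        since_commit += len(batch)
        if since_commit >= commit_every:
            yield "<commit/>"
            since_commit = 0
    if since_commit:
        yield "<commit/>"  # commit whatever was added after the last commit
```

Because it is a generator, records can be streamed from disk without holding all 492,000 of them in memory on the client side either.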

> I then posted a commit statement.  I went to my solr admin site and
> looked at the statistics.  It said 372,000 records (roughly) were stored
> and 1 commit.  I tried to do a search but no matter what I search for I
> get 0 results.  I even tried title:"the" (assuming it is not blocking
> the stop word, it should return something!).

The schema in the example does include a stop word filter--are you
sure that you aren't blocking stop words?
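To see why title:"the" can return nothing, here is a deliberately simplified model of an analyzer chain: whitespace split, lowercasing, and stop-word removal. The stop list is a hypothetical stand-in for the example schema's stopwords.txt, and Solr's real "text" field type does more (word splitting, stemming). If every query term is removed as a stop word, no term is left to search for, so nothing matches.

```python
# Hypothetical stop list; the example schema ships its own stopwords.txt.
STOP_WORDS = {"a", "an", "and", "the", "of", "to"}


def analyze(text, stop_words=STOP_WORDS):
    """Rough model of tokenize -> lowercase -> stop filter (not Solr code)."""
    return [t for t in text.lower().split() if t not in stop_words]


def matches(query, field_value):
    """True if the analyzed query is non-empty and all its terms occur in the field."""
    query_terms = analyze(query)
    if not query_terms:
        return False  # the whole query was stop words: nothing left to match
    doc_terms = set(analyze(field_value))
    return all(t in doc_terms for t in query_terms)
```

With this model, matches("the", "The Old Man and the Sea") is False even though the title plainly contains "the", while matches("old sea", ...) is True.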

-Mike

Re: Initial import problems

maustin
I'm getting slow performance from my Solr index and I'm not sure what to do, so I
need some suggestions on what to try. I updated all my records in the
last couple of days. I'm not sure how much performance degraded because of that, but
a search now takes about 3 seconds. My cache statistics don't look very
good either.

Also, I'm not sure whether I should have done a couple of things:
    - I optimized the index through Luke using the compound format, then noticed
that useCompoundFile is set to false in solrconfig.xml.
    - I changed one of the fields in the schema from text_ws to string
    - I added a field (type="text" indexed="false" stored="true")
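The switch from text_ws to string changes what gets indexed. In the example schema, text_ws only splits on whitespace (no lowercasing), while string indexes the whole value as a single untokenized term. The rough model below (not Solr code) shows what a single-term query can match under each type. Documents indexed before the schema change also keep their old terms until they are reindexed.

```python
def text_ws_terms(value):
    """Model of a whitespace-tokenized field: one term per token, case kept."""
    return set(value.split())


def string_terms(value):
    """Model of an untokenized string field: the whole value is one term."""
    return {value}


def term_query_matches(term, indexed_terms):
    """A single-term query matches only if that exact term was indexed."""
    return term in indexed_terms
```

So a query for York finds "New York Times" in a text_ws field but not in a string field, where only the exact full value matches.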

My schema and solrconfig are the same as the example except for a few
extra fields. My PC runs Windows XP and has 2 GB of RAM. Below are some stats from
the Solr admin statistics page.

Thanks!


caching : true
numDocs : 1185814
maxDoc : 2070472
readerImpl : MultiReader
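In Lucene, updating a document deletes the old copy and adds a new one, and deleted documents still count toward maxDoc until their segments are merged away (an optimize does this). The gap between maxDoc and numDocs therefore estimates how many deleted documents remain after every record was updated:

```python
def deleted_docs(num_docs, max_doc):
    """Deleted-but-not-yet-merged documents, and their share of maxDoc."""
    deleted = max_doc - num_docs
    return deleted, deleted / max_doc


# Values copied from the admin statistics above.
deleted, fraction = deleted_docs(1185814, 2070472)
print(f"{deleted} deleted docs ({fraction:.1%} of maxDoc)")
```

For these numbers that is 884,658 deleted documents, roughly 43% of maxDoc.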

      name:  filterCache
      class:  org.apache.solr.search.LRUCache
      version:  1.0
      description:  LRU Cache(maxSize=512, initialSize=512, autowarmCount=256, regenerator=org.apache.solr.search.SolrIndexSearcher$1@d55986)
      stats:  lookups : 658446
      hits : 30
      hitratio : 0.00
      inserts : 658420
      evictions : 657908
      size : 512
      cumulative_lookups : 658446
      cumulative_hits : 30
      cumulative_hitratio : 0.00
      cumulative_inserts : 658420
      cumulative_evictions : 657908


      name:  queryResultCache
      class:  org.apache.solr.search.LRUCache
      version:  1.0
      description:  LRU Cache(maxSize=512, initialSize=512, autowarmCount=256, regenerator=org.apache.solr.search.SolrIndexSearcher$2@1b4c1d7)
      stats:  lookups : 88
      hits : 83
      hitratio : 0.94
      inserts : 6
      evictions : 0
      size : 5
      cumulative_lookups : 88
      cumulative_hits : 83
      cumulative_hitratio : 0.94
      cumulative_inserts : 6
      cumulative_evictions : 0


      name:  documentCache
      class:  org.apache.solr.search.LRUCache
      version:  1.0
      description:  LRU Cache(maxSize=512, initialSize=512)
      stats:  lookups : 780
      hits : 738
      hitratio : 0.94
      inserts : 42
      evictions : 0
      size : 42
      cumulative_lookups : 780
      cumulative_hits : 738
      cumulative_hitratio : 0.94
      cumulative_inserts : 42
      cumulative_evictions : 0
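The cache stats can be checked with a little arithmetic. The sketch below recomputes each hit ratio, truncated to two decimals (which reproduces the figures above), and applies a hypothetical thrashing heuristic: a full cache with a low hit ratio that evicts nearly everything it inserts. Only filterCache trips it, with 657,908 evictions out of 658,420 inserts and 30 hits in 658,446 lookups. That suggests the searches use many more distinct filters than its maxSize of 512 can hold; raising the size attribute of <filterCache> in solrconfig.xml is one thing to try, memory permitting.

```python
from dataclasses import dataclass


@dataclass
class CacheStats:
    name: str
    max_size: int
    lookups: int
    hits: int
    inserts: int
    evictions: int
    size: int

    def hit_ratio(self):
        """Hit ratio truncated (not rounded) to two decimals."""
        if self.lookups == 0:
            return 0.0
        return (self.hits * 100 // self.lookups) / 100

    def is_thrashing(self, min_ratio=0.5):
        """Hypothetical heuristic: full, rarely hit, evicting ~all inserts."""
        return (self.size >= self.max_size
                and self.hit_ratio() < min_ratio
                and self.evictions >= 0.9 * self.inserts)


# Values copied from the admin statistics above.
caches = [
    CacheStats("filterCache", 512, 658446, 30, 658420, 657908, 512),
    CacheStats("queryResultCache", 512, 88, 83, 6, 0, 5),
    CacheStats("documentCache", 512, 780, 738, 42, 0, 42),
]
for c in caches:
    print(c.name, c.hit_ratio(), c.is_thrashing())
```

The queryResultCache and documentCache look healthy by this measure; they are far from full and hit about 94% of the time.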