Add doc limit - Follow Up

sangraal
Hey guys,
You might remember a bunch of emails going back and forth between me and the
very helpful Solr folks a few weeks back. I just wanted to let you know
what I've learned about the problem in the last week or so.

The problem was that I would run into a hard limit on how many documents I
could add to Solr in a single post on Tomcat. It was usually around 5000-6000
documents per add before the system would hang indefinitely. I tried the
Java Solr client and the problem was exactly the same. I'm still unsure of
what exactly causes the problem, but I have now been able to work around
it.

The problem only occurs when adding docs that contain CDATA sections
(<![CDATA[...]]>) in the body of the <field> tag. The limit also seems to
apply only to an individual post. I limited the size of my HTTP posts to 5000
documents per post, and the problem never showed up. Contrary to what I
previously thought, you do not need to do a commit after each batch.
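
A minimal Python sketch of that workaround, assuming each document is a dict of field name to string value. It only builds the XML request bodies: batches of at most 5000 docs, each wrapped in Solr's <add><doc><field> format with CDATA field bodies, followed by a single <commit/>. Sending each body as its own HTTP POST to the update handler (for example http://localhost:8080/solr/update, a hypothetical Tomcat URL) is left to whatever client you use. The names MAX_DOCS_PER_POST, cdata, add_message, batched_add_messages and update_bodies are illustrative, not part of Solr.

```python
from typing import Dict, Iterable, Iterator, List
from xml.sax.saxutils import quoteattr

# Hypothetical constant: the per-post batch size that avoided the hang above.
MAX_DOCS_PER_POST = 5000


def cdata(text: str) -> str:
    # "]]>" cannot appear inside a CDATA section, so split it across two sections.
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def add_message(docs: List[Dict[str, str]]) -> str:
    """Build one <add> body containing every doc in `docs`."""
    parts = ["<add>"]
    for doc in docs:
        parts.append("<doc>")
        for name, value in doc.items():
            parts.append(f"<field name={quoteattr(name)}>{cdata(value)}</field>")
        parts.append("</doc>")
    parts.append("</add>")
    return "".join(parts)


def batched_add_messages(
    docs: Iterable[Dict[str, str]], batch_size: int = MAX_DOCS_PER_POST
) -> Iterator[str]:
    """Yield one <add> body per batch of at most `batch_size` docs."""
    batch: List[Dict[str, str]] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == batch_size:
            yield add_message(batch)
            batch = []
    if batch:
        yield add_message(batch)


def update_bodies(
    docs: Iterable[Dict[str, str]], batch_size: int = MAX_DOCS_PER_POST
) -> Iterator[str]:
    """All request bodies in order: the batched adds, then one final commit."""
    yield from batched_add_messages(docs, batch_size)
    yield "<commit/>"  # a single commit at the end, not one per batch
```

For 12,001 docs this yields three <add> bodies (5000, 5000 and 2001 docs) followed by one <commit/>, so each POST stays under the size that hung on Tomcat.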

So, like I said, I'm still unsure of what causes this problem. It seems to
happen only on Tomcat; I've verified that the doc limit does not show up
when running on Jetty. It looks like some sort of problem when Solr
attempts to write to the response, but it doesn't seem to be an issue with
Solr itself. Again, it only occurs if your XML contains CDATA sections.

A strange one indeed, but I hope this helps if any of you run into the same
problem.

-Sangraal

Re: Add doc limit - Follow Up

Yonik Seeley-2
On 8/29/06, sangraal aiken <[hidden email]> wrote:
> The problem only occurs when adding docs that contain <![CDATA[]]> tags in
> the body of the <field> tag. The problem also only seems to cause an add
> limit on an individual post. I limited the size of my HTTP posts to 5000
> documents per post, and the problem never showed up. You do not need to do a
> commit after each batch as I previously thought.

That's very interesting... it sounds like perhaps an XPP (the XML
parser) bug that Tomcat manages to tickle.
I looked through the XPP changelogs quickly, and there's no mention of a
problem like this being fixed.

-Yonik