Solr for Content Management


Moenieb Davids-2
Hi All,

Background:
I am currently testing a deployment of a content management framework where
I am trying to punt Solr as the tool of choice for ingestion and searching.

Current status:
I have deployed SolrCloud across multiple servers with multiple shards and
a replication factor of 2.
In terms of collections, I have a person collection that contains details
of individuals, including address and high-level portfolio info. Structurally,
this collection nests documents down to the great-grandchild level.
Then I have a few collections that deal with content. For now, content is
just emails and documents with a maximum size of 2MB, with certain user
exceptions that can go higher than 2MB.
The actual content is indexed twice: first as a binary/stream field and then
as readable text. Metadata is negligible.
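
To make the layout above concrete, here is a minimal sketch in Python. It
only builds a Collections API CREATE URL and an example nested document; it
does not contact a server. The host, the shard count, the collection name
`person` and every field name are assumptions for illustration, not the
actual configuration.

```python
from urllib.parse import urlencode

# Assumed base URL; replace with a real SolrCloud node.
SOLR_BASE = "http://localhost:8983/solr"


def create_collection_url(name, num_shards, replication_factor=2):
    """Build a Collections API CREATE request URL (nothing is sent)."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    }
    return f"{SOLR_BASE}/admin/collections?{urlencode(params)}"


def person_doc():
    """Hypothetical person document nested down to great-grandchild level.

    `_childDocuments_` is the key Solr's JSON update format uses for
    nested child documents; all other field names are made up.
    """
    return {
        "id": "person-1",
        "doc_type": "person",
        "_childDocuments_": [
            {
                "id": "person-1-address-1",
                "doc_type": "address",
                "_childDocuments_": [
                    {
                        "id": "person-1-portfolio-1",
                        "doc_type": "portfolio",
                        "_childDocuments_": [
                            {
                                "id": "person-1-portfolio-1-item-1",
                                "doc_type": "portfolio_item",
                            }
                        ],
                    }
                ],
            }
        ],
    }


def nesting_depth(doc):
    """Count levels in a nested document: the root alone has depth 1."""
    children = doc.get("_childDocuments_", [])
    return 1 + max((nesting_depth(child) for child in children), default=0)
```

A depth of 4 here corresponds to parent, child, grandchild and
great-grandchild.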


Challenges:
When performing full-text searches without concurrently executing updates,
Solr seems to do well. Running updates on their own also does OK given the
nature of the transaction. However, when I run searches and updates
simultaneously, performance drops quite significantly. I have played with
field properties, analyzers, tokenizers, shard sizes, etc.
Any advice?
I would like to know if anyone has done something similar. Please excuse the
long-winded message.
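
One way to put numbers on the slowdown described above is to time the same
queries with and without a background update loop. The following is a
minimal sketch in Python, not the test that was actually run: `search_fn`
and `update_fn` are hypothetical callables that would wrap a request to a
collection's /select and /update handlers respectively.

```python
import statistics
import threading
import time


def measure_search_latency(search_fn, update_fn=None, n_queries=50):
    """Time n_queries calls to search_fn, in seconds.

    If update_fn is given, it is called in a loop on a background
    thread for the whole duration of the measurement.
    """
    stop = threading.Event()
    worker = None
    if update_fn is not None:
        def updater():
            while not stop.is_set():
                update_fn()
        worker = threading.Thread(target=updater, daemon=True)
        worker.start()

    latencies = []
    try:
        for _ in range(n_queries):
            start = time.perf_counter()
            search_fn()
            latencies.append(time.perf_counter() - start)
    finally:
        # Always stop the updater, even if a query raised.
        stop.set()
        if worker is not None:
            worker.join()
    return latencies


def summarize(latencies):
    """Return median, 95th percentile (nearest rank) and max latency."""
    ordered = sorted(latencies)
    # Nearest-rank index computed with integer arithmetic.
    p95_index = (95 * len(ordered) + 99) // 100 - 1
    return {
        "median": statistics.median(ordered),
        "p95": ordered[p95_index],
        "max": ordered[-1],
    }
```

Comparing `summarize(measure_search_latency(search_fn))` against
`summarize(measure_search_latency(search_fn, update_fn))` shows how far the
median and tail latencies move once updates run at the same time.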


Fwd: Solr for Content Management

Moenieb Davids-2
---------- Forwarded message ----------
From: *Moenieb Davids* <[hidden email]>
Date: Thursday, June 7, 2018
Subject: Solr for Content Management
To: "[hidden email]" <[hidden email]>, "[hidden email]" <[hidden email]>