How to split a merged index larger than 2GB into multiple shards of the same size


arghya.it87
I am experimenting with SolrCloud 5.2.1. I have 3 shards, and I used
IndexMergeTool to merge their indexes into a single directory. I have now
created a new collection with 4 shards, and I want to split the merged index
across these 4 shards.
Is there an IndexSplit tool available, or some other way to do this?



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Re: How to split a merged index larger than 2GB into multiple shards of the same size

Erick Erickson
There’s the collections API command SPLITSHARD.
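
For reference, a minimal sketch of how a SPLITSHARD request URL could be assembled. The base URL http://localhost:8983/solr, the collection name newcoll, and the shard name shard1 are placeholders rather than values from this thread, and the class name is made up for illustration. Note that SPLITSHARD divides one shard of an existing collection into sub-shards (two by default); it is not a tool for distributing an externally merged index.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class SplitShardUrl {
    // Builds the Collections API URL for splitting one shard of a collection.
    // solrBaseUrl is assumed to end in "/solr" with no trailing slash.
    static URI splitShardUri(String solrBaseUrl, String collection, String shard) {
        String query = "action=SPLITSHARD"
                + "&collection=" + URLEncoder.encode(collection, StandardCharsets.UTF_8)
                + "&shard=" + URLEncoder.encode(shard, StandardCharsets.UTF_8);
        return URI.create(solrBaseUrl + "/admin/collections?" + query);
    }

    public static void main(String[] args) {
        // Placeholder values; substitute your own node, collection, and shard.
        System.out.println(splitShardUri("http://localhost:8983/solr", "newcoll", "shard1"));
    }
}
```

The resulting URL can be issued with any HTTP client, or pasted into a browser while experimenting.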

But is this for functional reasons, or are you just experimenting for background information? 2G is a tiny index by recent standards; I routinely see 200G indexes on a replica.

And merging indexes in SolrCloud is a bit tricky: you have to be sure to merge docs in sub-indexes such that the hash value of the ID (or routing) field of every doc falls within the “range” recorded for its shard in the state.json file. This assumes compositeId routing.
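
To make the range constraint concrete, here is a simplified sketch. It assumes the 32-bit hash space is divided evenly among the shards and that a document's hash has already been computed (the compositeId router derives it with MurmurHash3, which is not reimplemented here). The class and method names are illustrative only, and Solr's own partitioning may round the boundaries differently for shard counts that are not powers of two.

```java
import java.util.ArrayList;
import java.util.List;

class HashRanges {
    /** Inclusive range of signed 32-bit hash values, printed in hex as in state.json. */
    static final class Range {
        final int min;
        final int max;

        Range(int min, int max) {
            this.min = min;
            this.max = max;
        }

        boolean includes(int hash) {
            return hash >= min && hash <= max;
        }

        @Override
        public String toString() {
            return Integer.toHexString(min) + "-" + Integer.toHexString(max);
        }
    }

    // Splits the full signed 32-bit space into `shards` contiguous, near-equal ranges.
    static List<Range> partition(int shards) {
        long total = 1L << 32;
        long start = Integer.MIN_VALUE;
        List<Range> ranges = new ArrayList<>();
        for (int i = 1; i <= shards; i++) {
            long end = (i == shards)
                    ? Integer.MAX_VALUE
                    : Integer.MIN_VALUE + total * i / shards - 1;
            ranges.add(new Range((int) start, (int) end));
            start = end + 1;
        }
        return ranges;
    }

    // Returns the index of the range that contains the given hash.
    static int shardIndexFor(List<Range> ranges, int hash) {
        for (int i = 0; i < ranges.size(); i++) {
            if (ranges.get(i).includes(hash)) {
                return i;
            }
        }
        throw new IllegalStateException("ranges do not cover hash " + hash);
    }

    public static void main(String[] args) {
        int hash = 0x50000000; // stand-in for a precomputed document hash
        System.out.println("4 shards: " + partition(4));
        System.out.println("index with 3 shards: " + shardIndexFor(partition(3), hash));
        System.out.println("index with 4 shards: " + shardIndexFor(partition(4), hash));
    }
}
```

With four shards this yields the ranges 80000000-bfffffff, c0000000-ffffffff, 0-3fffffff and 40000000-7fffffff. The same hash lands in a different position under the 3-shard and 4-shard layouts, which is why a merged index cannot simply be cut into four pieces by size.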

So unless you thoroughly understand the implications, including and especially the above paragraph, I’d strongly recommend you don’t go into production with indexes you’ve merged.

Best,
Erick

> On Mar 14, 2019, at 11:02 PM, arghya.it87 <[hidden email]> wrote:

Solr or SolrJ Atomic Update

THIERRY BOUCHENY
Hello,

I have spent a few hours trying to understand why I get this error.

RunUpdateProcessor has received an AddUpdateCommand containing a document that appears to still contain Atomic document update operations, most likely because DistributedUpdateProcessorFactory was explicitly disabled from this updateRequestProcessorChain

I am trying to do a partial update of a record using either Solr (through a POST update) or SolrJ; in both cases I get the same error. It must be a configuration problem, but I can’t find out what it is, even after having spent a few hours on the web.

I run Solr on 2 different servers, one with Solr 5.4.1 and another with Solr 7.5.0, with the same problem.

Adding a new document works fine, but trying to update the text field always comes back with this error.

My SolrJ code is quite simple.

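    import java.util.HashMap;
    import java.util.Map;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.impl.XMLResponseParser;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocument;
    import org.apache.solr.common.SolrDocumentList;
    import org.apache.solr.common.SolrInputDocument;

    // baseurl (the core URL) and _data (the text value) are assumed to be defined elsewhere.
    HttpSolrClient client = new HttpSolrClient.Builder(baseurl).build();
    client.setParser(new XMLResponseParser());

    // Look up the existing document for this user.
    SolrQuery query = new SolrQuery();
    query.set("q", "userid:18278456");
    QueryResponse response = client.query(query);

    SolrDocumentList docList = response.getResults();
    System.out.println("docList: " + docList.size());
    if (docList.isEmpty())
    {
        // No match: add a new document.
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("userid", "18278456");
        doc.addField("text", _data);
        client.add(doc);
        client.commit();
    }
    else
    {
        // Match found: send an atomic "set" update for the first hit only.
        SolrDocument doc = docList.get(0);
        System.out.println("existing doc id: " + doc.get("id"));
        SolrInputDocument _updatedoc = new SolrInputDocument();
        Map<String, String> partialUpdate = new HashMap<String, String>();
        partialUpdate.put("set", _data);
        _updatedoc.addField("id", doc.get("id"));
        _updatedoc.addField("text", partialUpdate);
        client.add(_updatedoc);
        client.commit();
    }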

And my POST Solr requests:

For an add (this works), Content-Type: application/json

{{solrdomainurl}}/testtika/update?commit=true

With the raw body [{"userid":"18278456","text":"test"}]

For a partial update (responds with the above error), Content-Type: application/json

With the raw body [{"id":"e841a2b5-a48d-47ef-b019-8f8d41e92655","text":{"set":"other"}}]
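
The same partial-update request can be sketched with the JDK's own HTTP classes, which helps rule out the client tool as a cause. This is only an illustration: the base URL http://localhost:8983/solr stands in for {{solrdomainurl}}, the class and method names are made up, and the tiny quoting helper handles only backslashes and double quotes.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class AtomicUpdateRequest {
    // Escapes backslashes and double quotes; enough for the plain strings in this sketch only.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Builds a JSON body that asks Solr to "set" one field of the document with the given id.
    static String setFieldBody(String id, String field, String newValue) {
        return "[{\"id\":" + quote(id) + "," + quote(field)
                + ":{\"set\":" + quote(newValue) + "}}]";
    }

    // Builds (but does not send) the POST request to the core's update handler.
    static HttpRequest updateRequest(String solrBaseUrl, String core, String jsonBody) {
        return HttpRequest.newBuilder(URI.create(solrBaseUrl + "/" + core + "/update?commit=true"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    public static void main(String[] args) {
        String body = setFieldBody("e841a2b5-a48d-47ef-b019-8f8d41e92655", "text", "other");
        HttpRequest request = updateRequest("http://localhost:8983/solr", "testtika", body);
        System.out.println(request.method() + " " + request.uri());
        System.out.println(body);
    }
}
```

Sending the request would be a matter of passing it to java.net.http.HttpClient; that step is left out so the sketch runs without a Solr server.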


id is my unique ID field.

It is defined as follows in my schema.xml:

    <field name="id" type="uuid" indexed="true" stored="true" default="NEW" />

    <uniquekey>id</uniquekey>

    <fieldType name="uuid" class="solr.UUIDField" indexed="true" />

I won’t put the solrconfig.xml in the email, but after trying my custom one without success, I took the one in the example/example-DIH/solr folder, with the same result.

Any idea what I might be doing wrong is welcome! Thanks in advance.

Thierry

Re: Solr or SolrJ Atomic Update

Mikhail Khludnev-2
Maybe you can share the error you are talking about?

On Fri, Mar 15, 2019 at 10:03 PM THIERRY BOUCHENY <[hidden email]>
wrote:


--
Sincerely yours
Mikhail Khludnev
Re: Solr or SolrJ Atomic Update

THIERRY BOUCHENY
Hi Mikhail,

The error was in the first lines of my email. Here it is:

RunUpdateProcessor has received an AddUpdateCommand containing a document that appears to still contain Atomic document update operations, most likely because DistributedUpdateProcessorFactory was explicitly disabled from this updateRequestProcessorChain

Best regards

Thierry

> On 16 Mar 2019, at 19:34, Mikhail Khludnev <[hidden email]> wrote:

Re: Solr or SolrJ Atomic Update

THIERRY BOUCHENY
Hi all,

OK, I found my problem. It was a silly one, as I expected! In my schema, “uniqueKey” was spelled with a lowercase k.

Thierry
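
XML element names are case-sensitive, so a uniquekey element is not the uniqueKey element the schema expects. A small sketch of a check that could catch this, using only the JDK's DOM parser; the class and method names are made up for illustration, and this is not how Solr itself loads a schema.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class UniqueKeyCheck {
    // Returns the field named by a correctly cased <uniqueKey> element,
    // or null when the schema contains no such element.
    static String findUniqueKey(String schemaXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(schemaXml.getBytes(StandardCharsets.UTF_8)));
            // The tag-name lookup is case-sensitive for XML documents.
            NodeList nodes = doc.getElementsByTagName("uniqueKey");
            return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
        } catch (Exception e) {
            throw new IllegalArgumentException("schema is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String broken = "<schema><uniquekey>id</uniquekey></schema>";
        String fixed = "<schema><uniqueKey>id</uniqueKey></schema>";
        System.out.println("broken schema: " + findUniqueKey(broken));
        System.out.println("fixed schema: " + findUniqueKey(fixed));
    }
}
```

For the misspelled element the lookup finds nothing, while the corrected spelling returns the field name id.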

> On 15 Mar 2019, at 19:03, THIERRY BOUCHENY <[hidden email]> wrote: