Custom update processor not kicking in

Custom update processor not kicking in

Rahul Goswami
Hello,

I am using Solr 7.2.1 in standalone mode. I created a custom update
request processor and placed it between the distributed processor and the run
update processor in my chain. I know the chain is invoked, since I see
log lines from the getInstance() method of my processor factory, but I
don’t see any log lines from the processAdd() method.

Any ideas why the processor is being skipped when placed after the
distributed processor?

Thanks,
Rahul

RE: Custom update processor not kicking in

Markus Jelsma-2
Hello Rahul,

I don't know why you don't see your log lines, but if I remember correctly, you must put all custom processors above Log, Distributed, and Run; at least, I remember reading that somewhere a long time ago.

We put all our custom processors on top of the three default processors and they run just fine.

Try it.

Regards,
Markus
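For concreteness, a chain following that advice would look something like the sketch below in solrconfig.xml. The custom factory's class name is a placeholder, not something from this thread:

```xml
<!-- Sketch only: the custom processor sits above the Log/Distributed/Run
     defaults. com.example.MyUpdateProcessorFactory is a hypothetical name. -->
<updateRequestProcessorChain name="my-chain">
  <processor class="com.example.MyUpdateProcessorFactory"/>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

The chain is then selected per request with the update.chain request parameter or set as the default on the update handler.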
 

Re: Custom update processor not kicking in

Erick Erickson
It Depends (tm). This is a little confused. Why do you have the
distributed processor in stand-alone Solr at all? Stand-alone doesn't, well,
distribute updates, so that seems odd. Do try switching it around and
putting your processor on top; that should be OK, since distributed is irrelevant.

You can also just set a breakpoint and step through; see, for instance, the
instructions in the "IntelliJ" section here:
https://cwiki.apache.org/confluence/display/solr/HowToContribute

One thing I'd do is make very, very sure that my jar file was being
found. IIRC, the -v startup option will log exactly where Solr looks
for jar files. Be sure your custom jar is in one of those locations and is
picked up. I've pointed a lib directive at one place, only to discover that
there's an old copy lying around somewhere else....

Best,
Erick
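As an aside, the lib directive Erick mentions lives in solrconfig.xml; a typical form (the path here is illustrative, relative to the core's instanceDir) is:

```xml
<!-- Illustrative: load all jars matching the regex from the core's ./lib directory. -->
<lib dir="./lib" regex=".*\.jar"/>
```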

On Wed, Sep 18, 2019 at 5:08 PM Markus Jelsma
<[hidden email]> wrote:


Re: Custom update processor not kicking in

Rahul Goswami
Erick, Markus,
Thank you for your inputs. I made sure that the jar file is found correctly,
since the core reloads fine and I also see the log lines from my processor
during an update request (from the getInstance() method of the update
factory). The reason I want to insert the processor between the distributed
update processor (DUP) and the run update processor (RUP) is that certain
fields were indexed against a dynamic field “*”, and the schema was later
patched to remove the * field, causing atomic updates to fail for such
documents. Reindexing is not an option, since the index has nearly 200 million
docs. My understanding is that an atomic update is stitched back into a
complete document in the DUP before being reindexed by the RUP. Hence, if I
can access the document before it is indexed and check for fields that are
not defined in the schema, I can remove them from the stitched-back
document so that the atomic update succeeds for such docs.
The documentation below mentions that even if I don’t include the DUP in my
chain, it is automatically inserted just before the RUP.

https://lucene.apache.org/solr/guide/7_2/update-request-processors.html#custom-update-request-processor-chain


I tried both approaches, viz. explicitly specifying my processor after the DUP
in the chain, and using the “post-processor” option of the chain to have the
custom processor execute after the DUP. Either way, it looks like the
processor is just short-circuited. My logic is in the processAdd() of the
processor. Is this expected behavior?

Regards,
Rahul
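Solr's API aside, the filtering step described here (drop any stitched-in field the schema no longer defines) can be sketched in plain, self-contained Java. In a real UpdateRequestProcessor this logic would run inside processAdd(), with the field set taken from the request's schema; here the document is just a map and the schema just a set, and all names are illustrative:

```java
import java.util.*;

public class UndefinedFieldFilter {
    // Drop any document field whose name is not defined in the schema.
    // In a real UpdateRequestProcessor this would run inside processAdd()
    // on the stitched-back SolrInputDocument; here the "document" is a map
    // and the "schema" a plain set of field names.
    public static Map<String, Object> strip(Map<String, Object> doc,
                                            Set<String> schemaFields) {
        Map<String, Object> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : doc.entrySet()) {
            if (schemaFields.contains(e.getKey())) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }
}
```

A real implementation would also need to honor dynamic-field patterns still present in the schema, not just exact field names.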


On Wed, Sep 18, 2019 at 5:28 PM Erick Erickson <[hidden email]>
wrote:


Re: Custom update processor not kicking in

Erick Erickson
_Why_ is reindexing not an option? 200M docs isn't that many.
Since you have atomic updates working, you could easily
write a little program that pulls the docs from your existing
collection and pushes them to a new one with the new schema.

Do use CursorMark if you try that.... You have to be ready to
reindex as time passes anyway, either to upgrade to a major version
two greater than the one you're using now, or because the requirements
change yet again.

Best,
Erick
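For anyone unfamiliar with it, CursorMark deep paging works by sorting on a unique key and resuming each request from an opaque mark returned by the previous page, rather than from a numeric offset. The loop below mimics that contract in plain, self-contained Java; the in-memory id list stands in for a Solr collection, and this is deliberately not SolrJ code:

```java
import java.util.*;

public class CursorPaging {
    // Mimic cursorMark-style deep paging: walk ids sorted on a unique key,
    // resuming each page from the last key seen instead of a numeric offset.
    public static List<List<String>> pageAll(List<String> ids, int pageSize) {
        List<String> sorted = new ArrayList<>(ids);
        Collections.sort(sorted);                 // cursor paging requires a stable sort
        List<List<String>> pages = new ArrayList<>();
        String mark = "";                         // like "*": start before the first key
        while (true) {
            List<String> page = new ArrayList<>();
            for (String id : sorted) {
                if (id.compareTo(mark) > 0 && page.size() < pageSize) {
                    page.add(id);
                }
            }
            if (page.isEmpty()) break;            // mark stopped advancing: done
            pages.add(page);
            mark = page.get(page.size() - 1);     // resume after the last id returned
        }
        return pages;
    }
}
```

With real SolrJ the loop shape is the same, but the mark is an opaque token passed via the cursorMark request parameter and read back from the response, stopping when it stops changing.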

On Thu, Sep 19, 2019 at 12:36 AM Rahul Goswami <[hidden email]> wrote:


Re: Custom update processor not kicking in

Rahul Goswami
Erick,
The 200 million docs are all large, as they are content-indexed. It would
also be hard to convince the customer to rebuild their index. But more than
that, I want to clear up my understanding of this topic: is it expected
behavior for the distributed update processor to not call any further custom
processors, other than the run update processor, in standalone mode?
Alternatively, is there a way I can get a handle on the complete document
once it’s reconstructed from an atomic update?

Thanks,
Rahul

On Thu, Sep 19, 2019 at 7:06 AM Erick Erickson <[hidden email]>
wrote:
