Automating Solr

Automating Solr

Craig Hoffman
Simple question:
What is the best way to automate re-indexing Solr? Set up a cron job / curl script?

Thanks,
Craig
--
Craig Hoffman
w: http://www.craighoffmanphotography.com
FB: www.facebook.com/CraigHoffmanPhotography
TW: https://twitter.com/craiglhoffman

Re: Automating Solr

Alexandre Rafalovitch
You don't "reindex Solr". You reindex data into Solr. So this depends
on where your data is coming from and how often it changes. If the data
does not change, there is no point re-indexing it. And how do you get
the data into Solr in the first place?

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853

Re: Automating Solr

Craig Hoffman
Right, of course. The data changes every few days. According to this
article, you can run a cron job to create a new index:
http://www.finalconcept.com.au/article/view/apache-solr-hints-and-tips

--
______________________________________
Craig Hoffman
iChat / AIM:mountain.do
______________________________________

Re: Automating Solr

Craig Hoffman
The data gets into Solr via MySQL script.
--
Craig Hoffman
w: http://www.craighoffmanphotography.com
FB: www.facebook.com/CraigHoffmanPhotography
TW: https://twitter.com/craiglhoffman

Re: Automating Solr

Håvard Wahl Kongsgård
Then you have to run it again and again.
On 30 Oct 2014 at 19:18, "Craig Hoffman" <[hidden email]> wrote:

> The data gets into Solr via MySQL script.

Re: Automating Solr

Alexandre Rafalovitch
In reply to this post by Craig Hoffman
Do you mean DataImportHandler? If so, you can create full and
incremental queries and trigger them from cron as often as you
would like, e.g. at 1am nightly.
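
A rough sketch of that cron-driven setup, under a few assumptions that do not come from this thread: a core named articles on localhost, a full import once a week, and delta imports on the other nights. The script name, the --run flag, and the pick_import_command helper are made up for illustration.

```shell
#!/bin/sh
# solr-nightly.sh (hypothetical name): choose a DataImportHandler command by weekday.
# Assumed crontab entry, 1am every night:
#   0 1 * * * /usr/local/bin/solr-nightly.sh --run

# Assumed host and core; override through the environment.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr/articles}"

# Print the DIH command for an ISO weekday number (1 = Monday ... 7 = Sunday).
pick_import_command() {
    if [ "$1" -eq 7 ]; then
        echo "full-import"    # assumption: rebuild everything once a week
    else
        echo "delta-import"   # other nights: only rows changed since the last run
    fi
}

# Only contact Solr when explicitly asked to, so the file can be sourced safely.
if [ "${1:-}" = "--run" ]; then
    dih_command=$(pick_import_command "$(date +%u)")
    # The URL is quoted so the shell leaves ? and & alone.
    curl -fsS "${SOLR_URL}/dataimport?command=${dih_command}" >/dev/null
fi
```

With the crontab entry from the header comment, cron calls the script once a night and the weekday decides which command is sent.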

Regards,
   Alex.
On 30 October 2014 14:17, Craig Hoffman <[hidden email]> wrote:
> The data gets into Solr via MySQL script.

Re: Automating Solr

Ramzi Alqrainy
In reply to this post by Craig Hoffman
Simply add this line to your crontab with the crontab -e command:

0,30 * * * * /usr/bin/wget http://<solr_host>:8983/solr/<core_name>/dataimport?command=full-import

This will run a full import every 30 minutes. Replace <solr_host> and <core_name> with the values for your setup.

Using delta-import command

A delta-import operation can be started by requesting the URL http://localhost:8983/solr/dataimport?command=delta-import. The operation runs in a new thread, and the status attribute in the response should now show busy. Depending on the size of your data set, the operation may take some time. At any time, you can request http://localhost:8983/solr/dataimport to check the status flag.

When the delta-import command is executed, it reads the start time stored in conf/dataimport.properties. It uses that timestamp to run the delta queries and, after completion, updates the timestamp in conf/dataimport.properties.
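
The start-then-poll cycle described above could be scripted roughly as follows. This is only a sketch: is_busy and run_delta_import are hypothetical helper names, the 10-second polling interval is an arbitrary choice, and the grep pattern assumes the response contains the word busy right after the status field, as the text above describes.

```shell
#!/bin/sh
# Sketch: start a delta-import, then poll the status URL until it is no longer busy.

# URL taken from the post; adjust host and core to your setup.
DIH_URL="${DIH_URL:-http://localhost:8983/solr/dataimport}"

# Succeeds (exit status 0) if the DIH response read from stdin reports status "busy".
is_busy() {
    grep -q 'status[^[:alnum:]]*busy'
}

# Hypothetical helper: trigger the import, then check the status every 10 seconds.
run_delta_import() {
    curl -fsS "${DIH_URL}?command=delta-import" >/dev/null || return 1
    while curl -fsS "$DIH_URL" | is_busy; do
        sleep 10
    done
}
```

Calling run_delta_import from a cron-driven script returns once the import has finished, which makes it easier to chain a follow-up step after it.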

Note: there is an alternative approach for updating documents in Solr, which is in many cases more efficient and also requires less configuration. It is explained on the DataImportHandlerDeltaQueryViaFullImport page.

Delta-Import Example

We will use the same example database used in the full import example. Note that the database schema has been updated and each table contains an additional column, last_modified, of timestamp type. You may want to download the database again since it has been updated recently. We use this timestamp field to determine which rows in each table have changed since the last indexed time.

Take a look at the following data-config.xml


<dataConfig>
    <dataSource driver="org.hsqldb.jdbcDriver" url="jdbc:hsqldb:/temp/example/ex" user="sa" />
    <document name="products">
        <entity name="item" pk="ID"
                query="select * from item"
                deltaImportQuery="select * from item where ID='${dih.delta.id}'"
                deltaQuery="select id from item where last_modified &gt; '${dih.last_index_time}'">
            <entity name="feature" pk="ITEM_ID"
                    query="select description as features from feature where item_id='${item.ID}'">
            </entity>
            <entity name="item_category" pk="ITEM_ID, CATEGORY_ID"
                    query="select CATEGORY_ID from item_category where ITEM_ID='${item.ID}'">
                <entity name="category" pk="ID"
                       query="select description as cat from category where id = '${item_category.CATEGORY_ID}'">
                </entity>
            </entity>
        </entity>
    </document>
</dataConfig>

Pay attention to the deltaQuery attribute, which has an SQL statement capable of detecting changes in the item table. Note the variable ${dih.last_index_time} (also available under the older name ${dataimporter.last_index_time}). The DataImportHandler exposes a variable called last_index_time, a timestamp value denoting the last time a full-import or delta-import was run. You can use this variable anywhere in the SQL you write in data-config.xml, and it will be replaced by its value during processing.
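
To make the substitution concrete, here is a small illustration that mimics it outside Solr. It is not how DataImportHandler is implemented; render_delta_query is a hypothetical helper and the timestamp is an invented example value. Inside data-config.xml the comparison operator is written as &gt;, while the SQL the database receives contains a plain >.

```shell
#!/bin/sh
# Illustration only: mimic how the ${dih.last_index_time} placeholder is filled in.
# render_delta_query is a hypothetical helper, not part of Solr.
render_delta_query() {
    # $1 = timestamp of the last import, e.g. '2014-10-30 01:00:00'
    template="select id from item where last_modified > '\${dih.last_index_time}'"
    # [$] and [.] match the literal characters in the placeholder.
    printf '%s\n' "$template" | sed 's/[$]{dih[.]last_index_time}/'"$1"'/'
}

render_delta_query '2014-10-30 01:00:00'
# prints: select id from item where last_modified > '2014-10-30 01:00:00'
```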


Re: Automating Solr

Craig Hoffman
Thanks! One more question: wget seems to be choking on my URL, in particular on the # and & characters. What’s the best method of escaping them?

http://<My Host>:8983/solr/#/articles/dataimport//dataimport?command=full-import&clean=true&optimize=true
--
Craig Hoffman
w: http://www.craighoffmanphotography.com
FB: www.facebook.com/CraigHoffmanPhotography
TW: https://twitter.com/craiglhoffman

Re: Automating Solr

Michael Della Bitta-2
You probably just need to put double quotes around the URL.


On 10/30/14 15:27, Craig Hoffman wrote:

> Thanks! One more question: wget seems to be choking on my URL, in particular on the # and & characters. What’s the best method of escaping them?
>
> http://<My Host>:8983/solr/#/articles/dataimport//dataimport?command=full-import&clean=true&optimize=true

Re: Automating Solr

Shawn Heisey-2
In reply to this post by Craig Hoffman
On 10/30/2014 1:27 PM, Craig Hoffman wrote:
> Thanks! One more question: wget seems to be choking on my URL, in particular on the # and & characters. What’s the best method of escaping them?
>
> http://<My Host>:8983/solr/#/articles/dataimport//dataimport?command=full-import&clean=true&optimize=true

Putting the URL in quotes would work ... but if you are calling a Solr
URL with /#/ in it, you're doing it wrong.

URLs with /#/ in them are specifically for the admin UI.  They only work
properly in a browser, where JavaScript and AJAX are available.  They
will NOT work like you expect with wget, even if you get the URL escaped
properly.

See the cron example that Ramzi Alqrainy gave you for the proper way of
requesting a full-import.
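
For illustration, the difference between the two URL forms can be sketched as a small shell helper. Everything here is an assumption layered on the thread: to_handler_url is a hypothetical function, localhost stands in for the real host, and the rewrite simply drops the /#/ part and the doubled /dataimport segment so the result matches the shape of the cron example.

```shell
#!/bin/sh
# Hypothetical helper: rewrite an admin-UI address (containing /#/) into the
# request-handler form used in the cron example.
to_handler_url() {
    printf '%s\n' "$1" | sed -e 's|/#/|/|' -e 's|/dataimport//dataimport|/dataimport|'
}

# Single quotes keep the shell from treating & as "run in the background".
ui_url='http://localhost:8983/solr/#/articles/dataimport//dataimport?command=full-import&clean=true&optimize=true'
to_handler_url "$ui_url"
# prints: http://localhost:8983/solr/articles/dataimport?command=full-import&clean=true&optimize=true
```

The resulting URL can then be passed to wget in quotes, for example wget -q -O /dev/null "$(to_handler_url "$ui_url")". The -O /dev/null part is optional; without it wget saves each response to a file in the current directory.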

Thanks,
Shawn


Re: Automating Solr

Craig Hoffman
Thanks everyone. I got it working.
--
Craig Hoffman
w: http://www.craighoffmanphotography.com
FB: www.facebook.com/CraigHoffmanPhotography
TW: https://twitter.com/craiglhoffman