A working example to play with Naive Bayes classifier


A working example to play with Naive Bayes classifier

Tomas Ramanauskas
Hi everyone,


Would someone be able to share a working, step-by-step example that demonstrates the use of the Naive Bayes classifier in Solr?


I followed this Blog post:
https://alexbenedetti.blogspot.co.uk/2015/07/solr-document-classification-part-1.html?showComment=1464358093048#c2489902302085000947

And this tutorial:
http://yonik.com/solr-tutorial/

And this JIRA ticket:
https://issues.apache.org/jira/browse/SOLR-7739



So this is my configuration file (only what I added or modified):

  <initParams path="/update/**">
    <lst name="defaults">
      <str name="update.chain">classification</str>
    </lst>
  </initParams>


  <updateRequestProcessorChain name="classification">
    <processor class="solr.ClassificationUpdateProcessorFactory">
      <str name="inputFields">title_t,author_s</str>
      <str name="classField">cat_s</str>
      <str name="algorithm">bayes</str>
    </processor>
  </updateRequestProcessorChain>



If I modify an existing record, I think the functionality works:


$ curl http://localhost:8983/solr/demo/update -d '
[
{"id" : "book1",
"title_t":["The Way of Kings"],
"author_s":"Brandon Sanderson",
"cat_s":"",
"pubyear_i":2010,
"ISBN_s":"978-0-7653-2635-5"
}
]'
{"responseHeader":{"status":0,"QTime":8}}
$ curl http://localhost:8983/solr/demo/get?id=book1
{
  "doc":
  {
    "id":"book1",
    "title_t":["The Way of Kings"],
    "author_s":"Brandon Sanderson",
    "cat_s":"fantasy",
    "pubyear_i":2010,
    "ISBN_s":"978-0-7653-2635-5",
    "_version_":1535488016326328320}}




If I add a new document, something isn’t quite working:

$ curl http://localhost:8983/solr/demo/update -d '
[
{"id" : "book7",
"title_t":["The Way of Kings"],
"author_s":"Brandon Sanderson",
"cat_s":"",
"pubyear_i":2010,
"ISBN_s":"978-0-7653-2635-5"
}
]'
{"responseHeader":{"status":0,"QTime":0}}
$ curl http://localhost:8983/solr/demo/get?id=book7
{
  "doc":null}





Re: A working example to play with Naive Bayes classifier

Tomas Ramanauskas
P.S. The version I use:

6.1.0-68

Also, earlier I said “If I modify an existing record, I think the functionality works:”, but I think it doesn’t work for me at all.

$ curl http://localhost:8983/solr/demo/get?id=book1
{
  "doc":
  {
    "id":"book1",
    "title_t":["The Way of Kings"],
    "author_s":"Brandon Sanderson",
    "cat_s":"fantasy",
    "pubyear_i":2010,
    "ISBN_s":"978-0-7653-2635-5",
    "_version_":1535488016326328320}}

$ curl http://localhost:8983/solr/demo/update -d '
[
{"id" : "book1",
"title_t":["The Way of Kings"],
"author_s":"Brandon Sanderson",
"cat_s":"aaa",
"pubyear_i":2010,
"ISBN_s":"978-0-7653-2635-5"
}
]'
{"responseHeader":{"status":0,"QTime":0}}

$ curl http://localhost:8983/solr/demo/get?id=book1
{
  "doc":
  {
    "id":"book1",
    "title_t":["The Way of Kings"],
    "author_s":"Brandon Sanderson",
    "cat_s":"fantasy",
    "pubyear_i":2010,
    "ISBN_s":"978-0-7653-2635-5",
    "_version_":1535488016326328320}}


Tomas


On 22 Jun 2016, at 12:47, Tomas Ramanauskas <[hidden email]> wrote:



Re: A working example to play with Naive Bayes classifier

Tomas Ramanauskas
I also tried this configuration, but couldn't get the feature to work:



  <initParams path="/update/">
    <lst name="defaults">
      <str name="update.chain">classification</str>
    </lst>
  </initParams>


  <updateRequestProcessorChain name="classification">
    <processor class="solr.ClassificationUpdateProcessorFactory">
      <str name="inputFields">title_t,author_s</str>
      <str name="classField">cat_s</str>
      <str name="algorithm">bayes</str>
    </processor>
  </updateRequestProcessorChain>


Tomas

On 22 Jun 2016, at 13:46, Tomas Ramanauskas <[hidden email]> wrote:



Re: A working example to play with Naive Bayes classifier

Alessandro Benedetti
Hi Tomas,
first consideration:
an empty string is different from a NULL string.
This is controversial, but I would suggest you never use the empty string, as
this can cause other side effects.
Apart from that, the plugin will add the class only if the class field has
no value:

> Object documentClass = doc.getFieldValue(classFieldName);
> if (documentClass == null) {

That said, I would suggest you build a sample index with some
documents and then try to classify.
If this doesn't solve your issue, I can help you further.
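
The difference between the two cases can be sketched in plain Java. This is only an illustration: a `Map` stands in for the Solr input document, and `ClassFieldCheck` and `needsClassification` are hypothetical names, not part of Solr. It assumes the processor applies exactly the null check quoted above.

```java
import java.util.HashMap;
import java.util.Map;

public class ClassFieldCheck {

    // Mirrors the quoted check: classify only when the class field has no value.
    // The Map is a stand-in for the Solr document (an assumption of this sketch).
    public static boolean needsClassification(Map<String, ?> doc, String classFieldName) {
        Object documentClass = doc.get(classFieldName);
        return documentClass == null;
    }

    public static void main(String[] args) {
        // Field sent as an empty string: it still counts as a value.
        Map<String, Object> emptyString = new HashMap<>();
        emptyString.put("cat_s", "");

        // Field left out of the document entirely.
        Map<String, Object> missing = new HashMap<>();

        System.out.println(needsClassification(emptyString, "cat_s")); // false
        System.out.println(needsClassification(missing, "cat_s"));     // true
    }
}
```

Under this check, a document sent with `"cat_s":""` would be skipped by the classifier, while one that leaves `cat_s` out would be a candidate for classification.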

Cheers

On Wed, Jun 22, 2016 at 3:45 PM, Tomas Ramanauskas <
[hidden email]> wrote:


--
--------------------------

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England
---------------
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io

Re: A working example to play with Naive Bayes classifier

Tomas Ramanauskas
Thanks for the response, Alessandro.

I tried this and it didn’t work either:



$ curl http://localhost:8983/solr/demo/update -d '
[
{"id" : "book14",
"title_t":["The Way of Kings"],
"author_s":"Brandon Sanderson",
"cat_s": null,
"pubyear_i":2010,
"ISBN_s":"978-0-7653-2635-5"
}
]'

{"responseHeader":{"status":0,"QTime":2}}

$ curl http://localhost:8983/solr/demo/get?id=book14
{
  "doc":
  {
    "id":"book14",
    "title_t":["The Way of Kings"],
    "author_s":"Brandon Sanderson",
    "pubyear_i":2010,
    "ISBN_s":"978-0-7653-2635-5",
    "_version_":1537854598189940736}}


I don’t see “cat_s” field in the results at all.


Tomas


On 22 Jun 2016, at 16:39, Alessandro Benedetti <[hidden email]> wrote:



Re: A working example to play with Naive Bayes classifier

Tomas Ramanauskas

I also tried with this config (adding **):


  <initParams path="/update/**">
    <lst name="defaults">
      <str name="update.chain">classification</str>
    </lst>
  </initParams>





And I get the error:



$ curl http://localhost:8983/solr/demo/update -d '
[
{"id" : "book15",
"title_t":["The Way of Kings"],
"author_s":"Brandon Sanderson",
"cat_s": null,
"pubyear_i":2010,
"ISBN_s":"978-0-7653-2635-5"
}
]'
{"responseHeader":{"status":500,"QTime":29},"error":{"trace":"java.lang.NullPointerException\n\tat org.apache.lucene.classification.document.SimpleNaiveBayesDocumentClassifier.getTokenArray(SimpleNaiveBayesDocumentClassifier.java:202)\n\tat org.apache.lucene.classification.document.SimpleNaiveBayesDocumentClassifier.analyzeSeedDocument(SimpleNaiveBayesDocumentClassifier.java:162)\n\tat org.apache.lucene.classification.document.SimpleNaiveBayesDocumentClassifier.assignNormClasses(SimpleNaiveBayesDocumentClassifier.java:121)\n\tat org.apache.lucene.classification.document.SimpleNaiveBayesDocumentClassifier.assignClass(SimpleNaiveBayesDocumentClassifier.java:81)\n\tat org.apache.solr.update.processor.ClassificationUpdateProcessor.processAdd(ClassificationUpdateProcessor.java:94)\n\tat org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.handleAdds(JsonLoader.java:474)\n\tat org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.processUpdate(JsonLoader.java:138)\n\tat org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.load(JsonLoader.java:114)\n\tat org.apache.solr.handler.loader.JsonLoader.load(JsonLoader.java:77)\n\tat org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)\n\tat org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:69)\n\tat org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:155)\n\tat org.apache.solr.core.SolrCore.execute(SolrCore.java:2036)\n\tat org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:657)\n\tat org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:464)\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208)\n\tat org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)\n\tat 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1160)\n\tat org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)\n\tat org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1092)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)\n\tat org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)\n\tat org.eclipse.jetty.server.Server.handle(Server.java:518)\n\tat org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308)\n\tat org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:244)\n\tat org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)\n\tat org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)\n\tat org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)\n\tat org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:246)\n\tat org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:156)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)\n\tat java.lang.Thread.run(Thread.java:745)\n","code":500}}


Tomas


On 22 Jun 2016, at 17:22, Tomas Ramanauskas <[hidden email]> wrote:



Re: A working example to play with Naive Bayes classifier

Alessandro Benedetti-4
This is better! At least the classifier is invoked!
How many docs in the index have the class assigned?
Take a look at the stack trace and you should find the cause!
I am on mobile now; I will check the code tomorrow!
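
One way to answer the question about how many documents already have a class is to query for documents where `cat_s` has any value and read `numFound` from the response. The sketch below only builds the request URL. The host and core name (`demo`) are taken from the earlier commands in this thread, `/select` is assumed to be the core's search handler, and `LabelledDocsQuery` and `buildCountUrl` are hypothetical names. It needs Java 10 or later for this `URLEncoder.encode` overload.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class LabelledDocsQuery {

    // Builds a Solr select URL matching every document with a value in classField.
    // rows=0 requests the count only; numFound in the response is the number of labelled docs.
    public static String buildCountUrl(String coreUrl, String classField) {
        String query = classField + ":[* TO *]"; // range query: the field has any value
        return coreUrl + "/select?q="
                + URLEncoder.encode(query, StandardCharsets.UTF_8)
                + "&rows=0&wt=json";
    }

    public static void main(String[] args) {
        // Core URL and field name as used elsewhere in this thread.
        System.out.println(buildCountUrl("http://localhost:8983/solr/demo", "cat_s"));
    }
}
```

The printed URL can be passed to curl in quotes. If `numFound` comes back as 0, the classifier presumably has no labelled examples to learn from.
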
Cheers
On 22 Jun 2016 5:26 pm, "Tomas Ramanauskas" <[hidden email]>
wrote:

>
> I also tried with this config (adding **):
>
>
>   <initParams path="/update/**">
>     <lst name="defaults">
>       <str name="update.chain">classification</str>
>     </lst>
>   </initParams>
>
>
>
>
>
> And I get the error:
>
>
>
> $ curl http://localhost:8983/solr/demo/update -d '
> [
> {"id" : "book15",
> "title_t":["The Way of Kings"],
> "author_s":"Brandon Sanderson",
> "cat_s": null,
> "pubyear_i":2010,
> "ISBN_s":"978-0-7653-2635-5"
> }
> ]'
> {"responseHeader":{"status":500,"QTime":29},"error":{"trace":"java.lang.NullPointerException\n\tat
> org.apache.lucene.classification.document.SimpleNaiveBayesDocumentClassifier.getTokenArray(SimpleNaiveBayesDocumentClassifier.java:202)\n\tat
> org.apache.lucene.classification.document.SimpleNaiveBayesDocumentClassifier.analyzeSeedDocument(SimpleNaiveBayesDocumentClassifier.java:162)\n\tat
> org.apache.lucene.classification.document.SimpleNaiveBayesDocumentClassifier.assignNormClasses(SimpleNaiveBayesDocumentClassifier.java:121)\n\tat
> org.apache.lucene.classification.document.SimpleNaiveBayesDocumentClassifier.assignClass(SimpleNaiveBayesDocumentClassifier.java:81)\n\tat
> org.apache.solr.update.processor.ClassificationUpdateProcessor.processAdd(ClassificationUpdateProcessor.java:94)\n\tat
> org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.handleAdds(JsonLoader.java:474)\n\tat
> org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.processUpdate(JsonLoader.java:138)\n\tat
> org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.load(JsonLoader.java:114)\n\tat
> org.apache.solr.handler.loader.JsonLoader.load(JsonLoader.java:77)\n\tat
> org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)\n\tat
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:69)\n\tat
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:155)\n\tat
> org.apache.solr.core.SolrCore.execute(SolrCore.java:2036)\n\tat
> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:657)\n\tat
> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:464)\n\tat
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)\n\tat
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208)\n\tat
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)\n\tat
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)\n\tat
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)\n\tat
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1160)\n\tat
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)\n\tat
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\n\tat
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1092)\n\tat
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)\n\tat
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)\n\tat
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)\n\tat
> org.eclipse.jetty.server.Server.handle(Server.java:518)\n\tat
> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308)\n\tat
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:244)\n\tat
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)\n\tat
> org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)\n\tat
> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)\n\tat
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:246)\n\tat
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:156)\n\tat
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654)\n\tat
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)\n\tat
> java.lang.Thread.run(Thread.java:745)\n","code":500}}
>
>
> Tomas
>
>
> On 22 Jun 2016, at 17:22, Tomas Ramanauskas <[hidden email]> wrote:
>
> Thanks for the response, Alessandro.
>
> I tried this and it didn’t work either:
>
>
>
> $ curl http://localhost:8983/solr/demo/update -d '
> [
> {"id" : "book14",
> "title_t":["The Way of Kings"],
> "author_s":"Brandon Sanderson",
> "cat_s": null,
> "pubyear_i":2010,
> "ISBN_s":"978-0-7653-2635-5"
> }
> ]'
>
> {"responseHeader":{"status":0,"QTime":2}}
>
> $ curl http://localhost:8983/solr/demo/get?id=book14
> {
>   "doc":
>   {
>     "id":"book14",
>     "title_t":["The Way of Kings"],
>     "author_s":"Brandon Sanderson",
>     "pubyear_i":2010,
>     "ISBN_s":"978-0-7653-2635-5",
>     "_version_":1537854598189940736}}
>
>
> I don’t see “cat_s” field in the results at all.
>
>
> Tomas
>
>
> On 22 Jun 2016, at 16:39, Alessandro Benedetti <[hidden email]> wrote:
>
> Hi Tomas,
> a first consideration: an empty string is different from a null value.
> This is controversial, but I would suggest that you never use the empty
> string, as it can cause other side effects.
> Apart from that, the plugin assigns a class only if the class field has
> no value:
>
> Object documentClass = doc.getFieldValue(classFieldName);
> if (documentClass == null) {
>
> That said, I would suggest building a sample index with some documents
> and then trying to classify.
> If this doesn't solve your issue, I can help you further.
>
> Cheers
>
> On Wed, Jun 22, 2016 at 3:45 PM, Tomas Ramanauskas <[hidden email]> wrote:
>
> I also tried this configuration, but couldn't get the feature to work:
>
>
>
>  <initParams path="/update/">
>    <lst name="defaults">
>      <str name="update.chain">classification</str>
>    </lst>
>  </initParams>
>
>
>  <updateRequestProcessorChain name="classification">
>    <processor class="solr.ClassificationUpdateProcessorFactory">
>      <str name="inputFields">title_t,author_s</str>
>      <str name="classField">cat_s</str>
>      <str name="algorithm">bayes</str>
>    </processor>
>  </updateRequestProcessorChain>
>
>
> Tomas
>
> On 22 Jun 2016, at 13:46, Tomas Ramanauskas <[hidden email]> wrote:
>
> P.S. The version I use:
>
> 6.1.0-68
>
> Also, earlier I said “If I modify an existing record, I think the
> functionality works:”, but I think it doesn’t work for me at all.
>
> $ curl http://localhost:8983/solr/demo/get?id=book1
> {
>  "doc":
>  {
>    "id":"book1",
>    "title_t":["The Way of Kings"],
>    "author_s":"Brandon Sanderson",
>    "cat_s":"fantasy",
>    "pubyear_i":2010,
>    "ISBN_s":"978-0-7653-2635-5",
>    "_version_":1535488016326328320}}
>
> $ curl http://localhost:8983/solr/demo/update -d '
> [
> {"id" : "book1",
> "title_t":["The Way of Kings"],
> "author_s":"Brandon Sanderson",
> "cat_s":"aaa",
> "pubyear_i":2010,
> "ISBN_s":"978-0-7653-2635-5"
> }
> ]'
> {"responseHeader":{"status":0,"QTime":0}}
>
> $ curl http://localhost:8983/solr/demo/get?id=book1
> {
>  "doc":
>  {
>    "id":"book1",
>    "title_t":["The Way of Kings"],
>    "author_s":"Brandon Sanderson",
>    "cat_s":"fantasy",
>    "pubyear_i":2010,
>    "ISBN_s":"978-0-7653-2635-5",
>    "_version_":1535488016326328320}}
>
>
> Tomas
>
>

Re: A working example to play with Naive Bayes classifier

Alessandro Benedetti-4
Can you share an example of your schema, and can you run a simple query
against your index? I am curious to see how the input fields are analyzed.
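
A rough sketch of how that information could be collected, assuming the core is called demo and listens on localhost:8983 as in the commands earlier in the thread, and assuming the /analysis/field handler is available in this Solr version. The solr_get helper and the SOLR_CORE_URL variable are made up for this example:

```shell
#!/bin/sh
# Core URL as used earlier in the thread; override SOLR_CORE_URL if yours differs.
SOLR_CORE_URL="${SOLR_CORE_URL:-http://localhost:8983/solr/demo}"

# Hypothetical helper: GET a path under the core with URL-encoded parameters.
# It reports a failure on stderr instead of aborting when Solr is unreachable.
solr_get() {
  path="$1"
  shift
  curl -sS --max-time 10 -G "$SOLR_CORE_URL/$path" "$@" \
    || echo "request to $SOLR_CORE_URL/$path failed" >&2
}

# 1. How many indexed documents already have a class?
#    The facet counts on cat_s show the labelled documents per class.
solr_get select \
  --data-urlencode 'q=*:*' \
  --data-urlencode 'rows=0' \
  --data-urlencode 'facet=true' \
  --data-urlencode 'facet.field=cat_s' \
  --data-urlencode 'wt=json'

# 2. Which tokens does the analysis chain produce for one of the input fields?
solr_get analysis/field \
  --data-urlencode 'analysis.fieldname=title_t' \
  --data-urlencode 'analysis.fieldvalue=The Way of Kings' \
  --data-urlencode 'wt=json'
```

If the second request returns no tokens for title_t or author_s, that would be worth including in the reply together with the field type definitions from the schema.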

Cheers

On Wed, Jun 22, 2016 at 6:05 PM, Alessandro Benedetti <
[hidden email]> wrote:

> This is better! At least the classifier is invoked!
> How many docs in the index have the class assigned?
> Take a look at the stack trace and you should find the cause!
> I am on mobile now; I will check the code tomorrow!
> Cheers


--
--------------------------

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England

Re: A working example to play with Naive Bayes classifier

Tomas Ramanauskas
Hi, Alessandro,

sorry for the delay. What do you mean?


As I mentioned earlier, I followed a super simple set of steps.

1. Download Solr
2. Configure classification
3. Create some documents using curl over HTTP.


Is it difficult to reproduce the steps / problem?


Tomas





Re: A working example to play with Naive Bayes classifier

Alessandro Benedetti
But how big is your index? Are you expecting Solr to classify your
documents automatically, without any knowledge base to learn from?
Please attach an example of your schema; there was a reason I asked for it :)
The error seems related to the fact that we get no tokens from the text
analysis.
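
To illustrate the point about the knowledge base, here is a minimal sketch that first indexes a few documents that already carry a class and only then sends a document without one. The ids and all book data except "The Way of Kings" are invented for illustration, post_json is a hypothetical helper, and the commit=true parameter assumes the classifier only sees committed documents. It is not a confirmed fix for the NullPointerException above:

```shell
#!/bin/sh
# Core URL as used earlier in the thread; override SOLR_CORE_URL if yours differs.
SOLR_CORE_URL="${SOLR_CORE_URL:-http://localhost:8983/solr/demo}"

# Labelled documents for the classifier to learn from (illustrative data).
TRAINING_DOCS='[
  {"id":"train1","title_t":["The Way of Kings"],"author_s":"Brandon Sanderson","cat_s":"fantasy"},
  {"id":"train2","title_t":["Words of Radiance"],"author_s":"Brandon Sanderson","cat_s":"fantasy"},
  {"id":"train3","title_t":["A Brief History of Time"],"author_s":"Stephen Hawking","cat_s":"science"}
]'

# The document to classify: the class field is left out entirely,
# rather than sent as an empty string.
UNLABELLED_DOC='[
  {"id":"unlabelled1","title_t":["The Way of Kings"],"author_s":"Brandon Sanderson"}
]'

# Hypothetical helper: POST a JSON array of documents and commit immediately.
post_json() {
  curl -sS --max-time 10 "$SOLR_CORE_URL/update?commit=true" \
    -H 'Content-Type: application/json' -d "$1" \
    || echo "update request to $SOLR_CORE_URL failed" >&2
}

post_json "$TRAINING_DOCS"
post_json "$UNLABELLED_DOC"

# Check whether a class was assigned to the unlabelled document.
curl -sS --max-time 10 "$SOLR_CORE_URL/get?id=unlabelled1" \
  || echo "get request to $SOLR_CORE_URL failed" >&2
```

Three documents are far too few for a meaningful prediction; the sketch only shows the order of operations.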

Cheers

On Fri, Jul 15, 2016 at 12:11 PM, Tomas Ramanauskas <
[hidden email]> wrote:

> Hi, Alessandro,
>
> sorry for the delay. What do you mean?
>
>
> As I mentioned earlier, I followed a super simple set of steps.
>
> 1. Download Solr
> 2. Configure classification
> 3. Create some documents using curl over HTTP.
>
>
> Is it difficult to reproduce the steps / problem?
>
>
> Tomas


--
--------------------------

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England
---------------
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io

Re: A working example to play with Naive Bayes classifier

koponk
Hi, I have a problem implementing this Solr classification.

This is my schema:

<field name="pagetext_mlt" type="text_mlt" indexed="true" stored="true" required="false" multiValued="false" termVectors="true"/>
<field name="knn_tags" type="string" indexed="true" stored="true" required="false" multiValued="true"/>
<fieldType name="string" class="solr.StrField" sortMissingLast="true" docValues="true" useDocValuesAsStored="true"/>

<fieldType name="text_mlt" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_id.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

And this is my solrconfig:

<requestHandler name="/update" class="solr.UpdateRequestHandler">
    <lst name="defaults">
      <str name="update.chain">classi</str>
    </lst>
  </requestHandler>

  <updateRequestProcessorChain name="classi">
    <processor class="solr.ClassificationUpdateProcessorFactory">
      <str name="inputFields">pagetext_mlt</str>
      <str name="classField">knn_tags</str>
      <str name="predictedClassField">prebayes_tags</str>
      <field name="prebayes_tags" type="string" indexed="true" stored="true" required="false" multiValued="true"/>
      <str name="algorithm">bayes</str>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory" />
    <processor class="solr.RunUpdateProcessorFactory" />
  </updateRequestProcessorChain>


But this is not working. My steps:
1. Insert document A with pagetext_mlt="something A" and knn_tags="aaa"
2. Insert document B with pagetext_mlt="something B" and knn_tags="bbb"
3. Insert document C with pagetext_mlt="something B" and knn_tags=null

The prebayes_tags field is always empty (I can't see it in the results, even
though the field is stored). Is there something I am missing?

Thanks,


Alessandro Benedetti wrote
> But how big is your index? Are you expecting Solr to automatically
> classify your documents without any knowledge base of already-classified
> documents to learn from?
> Please attach an example of your schema.
> There was a reason I asked for it :)
> It seems related to the fact that we get no tokens from the text analysis.
>
> Cheers





--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html