solr 7.0.1: exception running post to crawl simple website

Kevin Layer
I want to use Solr to index a markdown website.  The files
are in native markdown, but they are served as HTML (by markserv).

Here's what I did:

docker run --name solr -d -p 8983:8983 -t solr
docker exec -it --user=solr solr bin/solr create_core -c handbook

Then, to crawl the site:

quadra[git:master]$ docker exec -it --user=solr solr bin/post -c handbook http://quadra.franz.com:9091/index.md -recursive 10 -delay 0 -filetypes md
/docker-java-home/jre/bin/java -classpath /opt/solr/dist/solr-core-7.0.1.jar -Dauto=yes -Drecursive=10 -Ddelay=0 -Dfiletypes=md -Dc=handbook -Ddata=web org.apache.solr.util.SimplePostTool http://quadra.franz.com:9091/index.md
SimplePostTool version 5.0.0
Posting web pages to Solr url http://localhost:8983/solr/handbook/update/extract
Entering auto mode. Indexing pages with content-types corresponding to file endings md
SimplePostTool: WARNING: Never crawl an external web site faster than every 10 seconds, your IP will probably be blocked
Entering recursive mode, depth=10, delay=0s
Entering crawl at level 0 (1 links total, 1 new)
Exception in thread "main" java.lang.NullPointerException
        at org.apache.solr.util.SimplePostTool$PageFetcher.readPageFromUrl(SimplePostTool.java:1138)
        at org.apache.solr.util.SimplePostTool.webCrawl(SimplePostTool.java:603)
        at org.apache.solr.util.SimplePostTool.postWebPages(SimplePostTool.java:563)
        at org.apache.solr.util.SimplePostTool.doWebMode(SimplePostTool.java:365)
        at org.apache.solr.util.SimplePostTool.execute(SimplePostTool.java:187)
        at org.apache.solr.util.SimplePostTool.main(SimplePostTool.java:172)
quadra[git:master]$


Any ideas on what I did wrong?

Thanks.

Kevin

Re: solr 7.0.1: exception running post to crawl simple website

Amrit Sarkar
Kevin,

You are getting NPE at:

String type = rawContentType.split(";")[0]; //HERE - rawContentType is NULL

// related code

String rawContentType = conn.getContentType();

public String getContentType() {
    return getHeaderField("content-type");
}

HttpURLConnection conn = (HttpURLConnection) u.openConnection();

Can you check that your web server's response headers are set properly
and include a "Content-Type" header?
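
A minimal sketch of how that line could be guarded against a missing header. The class ContentTypes and the method baseType are hypothetical names used for illustration only; they are not part of SimplePostTool.

```java
// Hypothetical helper, not part of SimplePostTool: a null-safe way to get
// the media type that rawContentType.split(";")[0] is meant to produce.
final class ContentTypes {
    private ContentTypes() {}

    /** Returns the media type without parameters, or null if the header is absent. */
    static String baseType(String rawContentType) {
        if (rawContentType == null) {
            return null; // the server sent no Content-Type header
        }
        int semi = rawContentType.indexOf(';');
        String type = semi >= 0 ? rawContentType.substring(0, semi) : rawContentType;
        return type.trim();
    }

    public static void main(String[] args) {
        // A missing header no longer throws a NullPointerException.
        System.out.println(baseType(null));
        System.out.println(baseType("text/html;charset=utf-8"));
    }
}
```

A caller would still have to decide what to do with a null result, for example skip the URL with a warning.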


Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Wed, Oct 11, 2017 at 9:08 PM, Kevin Layer <[hidden email]> wrote:

> I want to use solr to index a markdown website. [...]

Re: solr 7.0.1: exception running post to crawl simple website

Kevin Layer
Amrit Sarkar wrote:

>> Can you check at your webpage level headers are properly set and it
>> has key "content-type".

Amrit, this is markserv, and I just used wget to confirm you are
correct: there is no Content-Type header.

Thanks for the help!  I'll see if I can hack markserv to add that, and
try again.

Kevin

Re: solr 7.0.1: exception running post to crawl simple website

Kevin Layer
In reply to this post by Amrit Sarkar
OK, so I hacked markserv to add Content-Type text/html, but now I get

SimplePostTool: WARNING: Skipping URL with unsupported type text/html

What is it expecting?

$ docker exec -it --user=solr solr bin/post -c handbook http://quadra:9091/index.md -recursive 10 -delay 0 -filetypes md
/docker-java-home/jre/bin/java -classpath /opt/solr/dist/solr-core-7.0.1.jar -Dauto=yes -Drecursive=10 -Ddelay=0 -Dfiletypes=md -Dc=handbook -Ddata=web org.apache.solr.util.SimplePostTool http://quadra:9091/index.md
SimplePostTool version 5.0.0
Posting web pages to Solr url http://localhost:8983/solr/handbook/update/extract
Entering auto mode. Indexing pages with content-types corresponding to file endings md
SimplePostTool: WARNING: Never crawl an external web site faster than every 10 seconds, your IP will probably be blocked
Entering recursive mode, depth=10, delay=0s
Entering crawl at level 0 (1 links total, 1 new)
SimplePostTool: WARNING: Skipping URL with unsupported type text/html
SimplePostTool: WARNING: The URL http://quadra:9091/index.md returned a HTTP result status of 415
0 web pages indexed.
COMMITting Solr index changes to http://localhost:8983/solr/handbook/update/extract...
Time spent: 0:00:03.882
$

Thanks.

Kevin

Re: solr 7.0.1: exception running post to crawl simple website

Amrit Sarkar
Strange.

Can you set the header to "text/html;charset=utf-8"? That is the
Content-Type that wiki.apache.org pages return. Let's see what it says now.


On Fri, Oct 13, 2017 at 6:44 PM, Kevin Layer <[hidden email]> wrote:

> OK, so I hacked markserv to add Content-Type text/html, but now I get
>
> SimplePostTool: WARNING: Skipping URL with unsupported type text/html
>
> What is it expecting? [...]

Re: solr 7.0.1: exception running post to crawl simple website

Kevin Layer
Amrit Sarkar wrote:

>> Strange,
>>
>> Can you add: "text/html;charset=utf-8". This is wiki.apache.org page's
>> Content-Type. Let's see what it says now.

Same thing.  Verified Content-Type:

quadra[git:master]$ wget -S -O /dev/null http://quadra:9091/index.md |& grep Content-Type
  Content-Type: text/html;charset=utf-8
quadra[git:master]$

quadra[git:master]$ docker exec -it --user=solr solr bin/post -c handbook http://quadra:9091/index.md -recursive 10 -delay 0 -filetypes md
/docker-java-home/jre/bin/java -classpath /opt/solr/dist/solr-core-7.0.1.jar -Dauto=yes -Drecursive=10 -Ddelay=0 -Dfiletypes=md -Dc=handbook -Ddata=web org.apache.solr.util.SimplePostTool http://quadra:9091/index.md
SimplePostTool version 5.0.0
Posting web pages to Solr url http://localhost:8983/solr/handbook/update/extract
Entering auto mode. Indexing pages with content-types corresponding to file endings md
SimplePostTool: WARNING: Never crawl an external web site faster than every 10 seconds, your IP will probably be blocked
Entering recursive mode, depth=10, delay=0s
Entering crawl at level 0 (1 links total, 1 new)
SimplePostTool: WARNING: Skipping URL with unsupported type text/html
SimplePostTool: WARNING: The URL http://quadra:9091/index.md returned a HTTP result status of 415
0 web pages indexed.
COMMITting Solr index changes to http://localhost:8983/solr/handbook/update/extract...
Time spent: 0:00:00.531
quadra[git:master]$

Kevin


Re: solr 7.0.1: exception running post to crawl simple website

Amrit Sarkar
Ah!

The only supported type is: text/html; encoding=utf-8

I am not confident about this either :) but it should work.

See the code-snippet below:

......

if(res.httpStatus == 200) {
  // Raw content type of form "text/html; encoding=utf-8"
  String rawContentType = conn.getContentType();
  String type = rawContentType.split(";")[0];
  if(typeSupported(type) || "*".equals(fileTypes)) {
    String encoding = conn.getContentEncoding();

....
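
For what it is worth, the snippet above discards everything after the first semicolon before calling typeSupported, so the parameter that follows text/html (charset or encoding) should not change the outcome. A small sketch of just that step, where StripParametersDemo and typeOf are hypothetical names for illustration:

```java
// Illustrative only: isolates the split(";")[0] step quoted from SimplePostTool.
final class StripParametersDemo {
    private StripParametersDemo() {}

    /** Same expression as in the quoted snippet; rawContentType must be non-null. */
    static String typeOf(String rawContentType) {
        return rawContentType.split(";")[0];
    }

    public static void main(String[] args) {
        String[] headers = {
            "text/html",
            "text/html;charset=utf-8",
            "text/html; encoding=utf-8"
        };
        for (String header : headers) {
            // Each header is reduced to the bare media type before the type check.
            System.out.println(header + " -> " + typeOf(header));
        }
    }
}
```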



On Fri, Oct 13, 2017 at 6:51 PM, Kevin Layer <[hidden email]> wrote:

> Same thing.  Verified Content-Type: [...]

Re: solr 7.0.1: exception running post to crawl simple website

Amrit Sarkar
Kevin,

Just add "html" to -filetypes too and give it a shot. These are the types it is expecting:

mimeMap = new HashMap<>();
mimeMap.put("xml", "application/xml");
mimeMap.put("csv", "text/csv");
mimeMap.put("json", "application/json");
mimeMap.put("jsonl", "application/json");
mimeMap.put("pdf", "application/pdf");
mimeMap.put("rtf", "text/rtf");
mimeMap.put("html", "text/html");
mimeMap.put("htm", "text/html");
mimeMap.put("doc", "application/msword");
mimeMap.put("docx", "application/vnd.openxmlformats-officedocument.wordprocessingml.document");
mimeMap.put("ppt", "application/vnd.ms-powerpoint");
mimeMap.put("pptx", "application/vnd.openxmlformats-officedocument.presentationml.presentation");
mimeMap.put("xls", "application/vnd.ms-excel");
mimeMap.put("xlsx", "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");
mimeMap.put("odt", "application/vnd.oasis.opendocument.text");
mimeMap.put("ott", "application/vnd.oasis.opendocument.text");
mimeMap.put("odp", "application/vnd.oasis.opendocument.presentation");
mimeMap.put("otp", "application/vnd.oasis.opendocument.presentation");
mimeMap.put("ods", "application/vnd.oasis.opendocument.spreadsheet");
mimeMap.put("ots", "application/vnd.oasis.opendocument.spreadsheet");
mimeMap.put("txt", "text/plain");
mimeMap.put("log", "text/plain");

The keys are the supported file types, i.e. the values that -filetypes is checked against.



On Fri, Oct 13, 2017 at 6:56 PM, Amrit Sarkar <[hidden email]> wrote:

> Ah!
>
> Only supported type is: text/html; encoding=utf-8 [...]

Re: solr 7.0.1: exception running post to crawl simple website

Amrit Sarkar
Reference to the code:

.....

String rawContentType = conn.getContentType();
String type = rawContentType.split(";")[0];
if(typeSupported(type) || "*".equals(fileTypes)) {
  String encoding = conn.getContentEncoding();

.....

protected boolean typeSupported(String type) {
  for(String key : mimeMap.keySet()) {
    if(mimeMap.get(key).equals(type)) {
      if(fileTypes.contains(key))
        return true;
    }
  }
  return false;
}

.....

There is another check, against fileTypes. I can see that the page you are
indexing ends with .md and not .html. Let's hope this is not the issue
now.
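
The interaction between the Content-Type and fileTypes can be tried in isolation. The following is a stand-alone sketch, not the real tool: the class FileTypeCheck, its constructor, and the accepts method are hypothetical, and only a few mimeMap entries are copied from the list posted earlier. It assumes fileTypes holds the raw -filetypes string, as the "*".equals(fileTypes) comparison suggests.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-alone sketch of the quoted check; names here are hypothetical.
final class FileTypeCheck {
    private final Map<String, String> mimeMap = new HashMap<>();
    private final String fileTypes; // assumed to be the -filetypes value, e.g. "md"

    FileTypeCheck(String fileTypes) {
        this.fileTypes = fileTypes;
        // A few entries copied from the mimeMap posted in the thread.
        mimeMap.put("html", "text/html");
        mimeMap.put("htm", "text/html");
        mimeMap.put("txt", "text/plain");
        mimeMap.put("pdf", "application/pdf");
    }

    /** Mirrors typeSupported(): the MIME type must map to a key found in fileTypes. */
    boolean typeSupported(String type) {
        for (Map.Entry<String, String> e : mimeMap.entrySet()) {
            if (e.getValue().equals(type) && fileTypes.contains(e.getKey())) {
                return true;
            }
        }
        return false;
    }

    /** Mirrors the outer condition: a supported type, or the "*" wildcard. */
    boolean accepts(String type) {
        return typeSupported(type) || "*".equals(fileTypes);
    }

    public static void main(String[] args) {
        System.out.println("md   -> " + new FileTypeCheck("md").accepts("text/html"));
        System.out.println("html -> " + new FileTypeCheck("html").accepts("text/html"));
        System.out.println("*    -> " + new FileTypeCheck("*").accepts("text/html"));
    }
}
```

Under these assumptions a text/html response is rejected when fileTypes is "md" and accepted when it is "html" or "*". Whether changing -filetypes resolves the crawl in practice is not confirmed at this point in the thread.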


On Fri, Oct 13, 2017 at 7:04 PM, Amrit Sarkar <[hidden email]> wrote:

> Kevin,
>
> Just put "html" too and give it a shot. [...]

Re: solr 7.0.1: exception running post to crawl simple website

Kevin Layer
Amrit Sarkar wrote:

>> It has another check for fileTypes, I can see the page ending with .md
>> (which you are indexing) and not .html. Let's hope now this is not the
>> issue.

Did you see the "-filetypes md" at the end of the post command line?
Shouldn't that handle it?

Kevin

>>
>> Amrit Sarkar
>> Search Engineer
>> Lucidworks, Inc.
>> 415-589-9269
>> www.lucidworks.com
>> Twitter http://twitter.com/lucidworks
>> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>>
>> On Fri, Oct 13, 2017 at 7:04 PM, Amrit Sarkar <[hidden email]>
>> wrote:
>>
>> > Kevin,
>> >
>> > Just put "html" too and give it a shot. These are the types it is
>> > expecting:
>> >
>> > mimeMap = new HashMap<>();
>> > mimeMap.put("xml", "application/xml");
>> > mimeMap.put("csv", "text/csv");
>> > mimeMap.put("json", "application/json");
>> > mimeMap.put("jsonl", "application/json");
>> > mimeMap.put("pdf", "application/pdf");
>> > mimeMap.put("rtf", "text/rtf");
>> > mimeMap.put("html", "text/html");
>> > mimeMap.put("htm", "text/html");
>> > mimeMap.put("doc", "application/msword");
>> > mimeMap.put("docx", "application/vnd.openxmlformats-officedocument.wordprocessingml.document");
>> > mimeMap.put("ppt", "application/vnd.ms-powerpoint");
>> > mimeMap.put("pptx", "application/vnd.openxmlformats-officedocument.presentationml.presentation");
>> > mimeMap.put("xls", "application/vnd.ms-excel");
>> > mimeMap.put("xlsx", "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");
>> > mimeMap.put("odt", "application/vnd.oasis.opendocument.text");
>> > mimeMap.put("ott", "application/vnd.oasis.opendocument.text");
>> > mimeMap.put("odp", "application/vnd.oasis.opendocument.presentation");
>> > mimeMap.put("otp", "application/vnd.oasis.opendocument.presentation");
>> > mimeMap.put("ods", "application/vnd.oasis.opendocument.spreadsheet");
>> > mimeMap.put("ots", "application/vnd.oasis.opendocument.spreadsheet");
>> > mimeMap.put("txt", "text/plain");
>> > mimeMap.put("log", "text/plain");
>> >
>> > The keys are the types supported.
>> >
>> >
>> > Amrit Sarkar
>> > Search Engineer
>> > Lucidworks, Inc.
>> > 415-589-9269
>> > www.lucidworks.com
>> > Twitter http://twitter.com/lucidworks
>> > LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>> >
>> > On Fri, Oct 13, 2017 at 6:56 PM, Amrit Sarkar <[hidden email]>
>> > wrote:
>> >
>> >> Ah!
>> >>
>> >> Only supported type is: text/html; encoding=utf-8
>> >>
>> >> I am not confident of this either :) but this should work.
>> >>
>> >> See the code-snippet below:
>> >>
>> >> ......
>> >>
>> >> if(res.httpStatus == 200) {
>> >>   // Raw content type of form "text/html; encoding=utf-8"
>> >>   String rawContentType = conn.getContentType();
>> >>   String type = rawContentType.split(";")[0];
>> >>   if(typeSupported(type) || "*".equals(fileTypes)) {
>> >>     String encoding = conn.getContentEncoding();
>> >>
>> >> ....
>> >>
>> >>
>> >> Amrit Sarkar
>> >> Search Engineer
>> >> Lucidworks, Inc.
>> >> 415-589-9269
>> >> www.lucidworks.com
>> >> Twitter http://twitter.com/lucidworks
>> >> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>> >>
>> >> On Fri, Oct 13, 2017 at 6:51 PM, Kevin Layer <[hidden email]> wrote:
>> >>
>> >>> Amrit Sarkar wrote:
>> >>>
>> >>> >> Strange,
>> >>> >>
>> >>> >> Can you add: "text/html;charset=utf-8". This is wiki.apache.org page's
>> >>> >> Content-Type. Let's see what it says now.
>> >>>
>> >>> Same thing.  Verified Content-Type:
>> >>>
>> >>> quadra[git:master]$ wget -S -O /dev/null http://quadra:9091/index.md |& grep Content-Type
>> >>>   Content-Type: text/html;charset=utf-8
>> >>> quadra[git:master]$ ]
>> >>>
>> >>> quadra[git:master]$ docker exec -it --user=solr solr bin/post -c handbook http://quadra:9091/index.md -recursive 10 -delay 0 -filetypes md
>> >>> /docker-java-home/jre/bin/java -classpath /opt/solr/dist/solr-core-7.0.1.jar -Dauto=yes -Drecursive=10 -Ddelay=0 -Dfiletypes=md -Dc=handbook -Ddata=web org.apache.solr.util.SimplePostTool http://quadra:9091/index.md
>> >>> SimplePostTool version 5.0.0
>> >>> Posting web pages to Solr url http://localhost:8983/solr/handbook/update/extract
>> >>> Entering auto mode. Indexing pages with content-types corresponding to file endings md
>> >>> SimplePostTool: WARNING: Never crawl an external web site faster than every 10 seconds, your IP will probably be blocked
>> >>> Entering recursive mode, depth=10, delay=0s
>> >>> Entering crawl at level 0 (1 links total, 1 new)
>> >>> SimplePostTool: WARNING: Skipping URL with unsupported type text/html
>> >>> SimplePostTool: WARNING: The URL http://quadra:9091/index.md returned a HTTP result status of 415
>> >>> 0 web pages indexed.
>> >>> COMMITting Solr index changes to http://localhost:8983/solr/handbook/update/extract...
>> >>> Time spent: 0:00:00.531
>> >>> quadra[git:master]$
>> >>>
>> >>> Kevin
>> >>>
>> >>> >>
>> >>> >> Amrit Sarkar
>> >>> >> Search Engineer
>> >>> >> Lucidworks, Inc.
>> >>> >> 415-589-9269
>> >>> >> www.lucidworks.com
>> >>> >> Twitter http://twitter.com/lucidworks
>> >>> >> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>> >>> >>
>> >>> >> On Fri, Oct 13, 2017 at 6:44 PM, Kevin Layer <[hidden email]> wrote:
>> >>> >>
>> >>> >> > OK, so I hacked markserv to add Content-Type text/html, but now I get
>> >>> >> >
>> >>> >> > SimplePostTool: WARNING: Skipping URL with unsupported type text/html
>> >>> >> >
>> >>> >> > What is it expecting?
>> >>> >> >
>> >>> >> > $ docker exec -it --user=solr solr bin/post -c handbook http://quadra:9091/index.md -recursive 10 -delay 0 -filetypes md
>> >>> >> > /docker-java-home/jre/bin/java -classpath /opt/solr/dist/solr-core-7.0.1.jar -Dauto=yes -Drecursive=10 -Ddelay=0 -Dfiletypes=md -Dc=handbook -Ddata=web org.apache.solr.util.SimplePostTool http://quadra:9091/index.md
>> >>> >> > SimplePostTool version 5.0.0
>> >>> >> > Posting web pages to Solr url http://localhost:8983/solr/handbook/update/extract
>> >>> >> > Entering auto mode. Indexing pages with content-types corresponding to file endings md
>> >>> >> > SimplePostTool: WARNING: Never crawl an external web site faster than every 10 seconds, your IP will probably be blocked
>> >>> >> > Entering recursive mode, depth=10, delay=0s
>> >>> >> > Entering crawl at level 0 (1 links total, 1 new)
>> >>> >> > SimplePostTool: WARNING: Skipping URL with unsupported type text/html
>> >>> >> > SimplePostTool: WARNING: The URL http://quadra:9091/index.md returned a HTTP result status of 415
>> >>> >> > 0 web pages indexed.
>> >>> >> > COMMITting Solr index changes to http://localhost:8983/solr/handbook/update/extract...
>> >>> >> > Time spent: 0:00:03.882
>> >>> >> > $
>> >>> >> >
>> >>> >> > Thanks.
>> >>> >> >
>> >>> >> > Kevin
>> >>> >> >
>> >>>
>> >>
>> >>
>> >
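
A note on the snippet quoted above: `URLConnection.getContentType()` returns null when the content type is not known, for example when the server sends no Content-Type header, so calling `split` on the result directly would throw a NullPointerException. That may be what the stack trace in `readPageFromUrl` at the start of this thread was reporting, but this is an assumption; the thread does not confirm the exact line. A minimal sketch of a null-safe parse follows. The class and method names are hypothetical and not part of SimplePostTool.

```java
import java.util.Locale;

public class ContentTypeSketch {

    /**
     * Returns the media type without its parameters ("text/html" for
     * "text/html;charset=utf-8"), or null when the header is missing or blank.
     * Hypothetical helper, for illustration only.
     */
    public static String baseType(String rawContentType) {
        if (rawContentType == null) {
            return null; // no Content-Type header was sent
        }
        // Cut at the first ';' instead of using split(), which returns an
        // empty array for an input such as ";" and would then fail on [0].
        int semi = rawContentType.indexOf(';');
        String type = (semi >= 0 ? rawContentType.substring(0, semi) : rawContentType)
                .trim()
                .toLowerCase(Locale.ROOT);
        return type.isEmpty() ? null : type;
    }

    public static void main(String[] args) {
        System.out.println(baseType("text/html;charset=utf-8"));
        System.out.println(baseType("text/html; encoding=utf-8"));
        System.out.println(baseType(null));
    }
}
```

A caller would then treat a null result as "type unknown" and skip the page with a warning rather than crash.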

Re: solr 7.0.1: exception running post to crawl simple website

Kevin Layer
In reply to this post by Amrit Sarkar
Amrit Sarkar wrote:

>> Kevin,
>>
>> Just put "html" too and give it a shot. These are the types it is expecting:

Same thing.

>>
>> mimeMap = new HashMap<>();
>> mimeMap.put("xml", "application/xml");
>> mimeMap.put("csv", "text/csv");
>> mimeMap.put("json", "application/json");
>> mimeMap.put("jsonl", "application/json");
>> mimeMap.put("pdf", "application/pdf");
>> mimeMap.put("rtf", "text/rtf");
>> mimeMap.put("html", "text/html");
>> mimeMap.put("htm", "text/html");
>> mimeMap.put("doc", "application/msword");
>> mimeMap.put("docx", "application/vnd.openxmlformats-officedocument.wordprocessingml.document");
>> mimeMap.put("ppt", "application/vnd.ms-powerpoint");
>> mimeMap.put("pptx", "application/vnd.openxmlformats-officedocument.presentationml.presentation");
>> mimeMap.put("xls", "application/vnd.ms-excel");
>> mimeMap.put("xlsx", "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");
>> mimeMap.put("odt", "application/vnd.oasis.opendocument.text");
>> mimeMap.put("ott", "application/vnd.oasis.opendocument.text");
>> mimeMap.put("odp", "application/vnd.oasis.opendocument.presentation");
>> mimeMap.put("otp", "application/vnd.oasis.opendocument.presentation");
>> mimeMap.put("ods", "application/vnd.oasis.opendocument.spreadsheet");
>> mimeMap.put("ots", "application/vnd.oasis.opendocument.spreadsheet");
>> mimeMap.put("txt", "text/plain");
>> mimeMap.put("log", "text/plain");
>>
>> The keys are the types supported.

Re: solr 7.0.1: exception running post to crawl simple website

Amrit Sarkar
Hi Kevin,

Can you post the Solr log in this mail thread? At first glance at the code,
I don't think it handles .md by itself.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
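
To illustrate the point about .md: the map quoted in this thread has no `md` key, so no MIME type is associated with that file ending. The sketch below is a simplified, hypothetical reconstruction of such a check. The class name, the method signature, and the trimmed-down map are assumptions made for illustration, not the actual SimplePostTool source.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TypeFilterSketch {

    // A few entries copied from the map quoted in this thread; note there is no "md".
    private static final Map<String, String> MIME_MAP = new LinkedHashMap<>();
    static {
        MIME_MAP.put("html", "text/html");
        MIME_MAP.put("htm", "text/html");
        MIME_MAP.put("txt", "text/plain");
        MIME_MAP.put("pdf", "application/pdf");
    }

    /**
     * Returns true when the given MIME type belongs to one of the
     * comma-separated file endings the user asked for (hypothetical logic).
     */
    public static boolean typeSupported(String type, String fileTypes) {
        for (String ending : fileTypes.split(",")) {
            String mime = MIME_MAP.get(ending.trim());
            if (mime != null && mime.equals(type)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // "md" maps to nothing, so text/html cannot match it.
        System.out.println(typeSupported("text/html", "md"));
        // "html" maps to text/html, so the page would be accepted.
        System.out.println(typeSupported("text/html", "html"));
    }
}
```

In this sketch a page served as text/html is accepted only when `html` or `htm` is among the requested endings. The real tool may behave differently, as the "Same thing" result reported after adding `html` suggests.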


Re: solr 7.0.1: exception running post to crawl simple website

Kevin Layer
Amrit Sarkar wrote:

>> Hi Kevin,
>>
>> Can you post the solr log in the mail thread. I don't think it handled the
>> .md by itself by first glance at code.

How do I extract the log you want?



Re: solr 7.0.1: exception running post to crawl simple website

Kevin Layer
In reply to this post by Amrit Sarkar
Amrit Sarkar wrote:

>> Hi Kevin,
>>
>> Can you post the solr log in the mail thread. I don't think it handled the
>> .md by itself by first glance at code.

Note that when I use the admin web interface, and click on "Logging"
on the left, I just see a spinner that implies it's trying to retrieve
the logs (I see headers "Time (Local) Level Core Logger Message"),
but no log entries.  It's been like this for 10 minutes.


Re: solr 7.0.1: exception running post to crawl simple website

Amrit Sarkar
In reply to this post by Kevin Layer
Ah, Docker. The logs are placed under [solr-home]/server/log/solr/log on
the machine. I haven't played much with Docker; is there any way you can get
that file from that location?

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Fri, Oct 13, 2017 at 8:08 PM, Kevin Layer <[hidden email]> wrote:

> Amrit Sarkar wrote:
>
> >> Hi Kevin,
> >>
> >> Can you post the solr log in the mail thread? Judging by a first glance
> >> at the code, I don't think it handled the .md by itself.
>
> How do I extract the log you want?
>
>
> >>
> >> Amrit Sarkar
> >> Search Engineer
> >> Lucidworks, Inc.
> >> 415-589-9269
> >> www.lucidworks.com
> >> Twitter http://twitter.com/lucidworks
> >> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> >>
> >> On Fri, Oct 13, 2017 at 7:42 PM, Kevin Layer <[hidden email]> wrote:
> >>
> >> > Amrit Sarkar wrote:
> >> >
> >> > >> Kevin,
> >> > >>
> >> > >> Just put "html" too and give it a shot. These are the types it is
> >> > expecting:
> >> >
> >> > Same thing.
> >> >
> >> > >>
> >> > >> mimeMap = new HashMap<>();
> >> > >> mimeMap.put("xml", "application/xml");
> >> > >> mimeMap.put("csv", "text/csv");
> >> > >> mimeMap.put("json", "application/json");
> >> > >> mimeMap.put("jsonl", "application/json");
> >> > >> mimeMap.put("pdf", "application/pdf");
> >> > >> mimeMap.put("rtf", "text/rtf");
> >> > >> mimeMap.put("html", "text/html");
> >> > >> mimeMap.put("htm", "text/html");
> >> > >> mimeMap.put("doc", "application/msword");
> >> > >> mimeMap.put("docx", "application/vnd.openxmlformats-officedocument.wordprocessingml.document");
> >> > >> mimeMap.put("ppt", "application/vnd.ms-powerpoint");
> >> > >> mimeMap.put("pptx", "application/vnd.openxmlformats-officedocument.presentationml.presentation");
> >> > >> mimeMap.put("xls", "application/vnd.ms-excel");
> >> > >> mimeMap.put("xlsx", "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");
> >> > >> mimeMap.put("odt", "application/vnd.oasis.opendocument.text");
> >> > >> mimeMap.put("ott", "application/vnd.oasis.opendocument.text");
> >> > >> mimeMap.put("odp", "application/vnd.oasis.opendocument.presentation");
> >> > >> mimeMap.put("otp", "application/vnd.oasis.opendocument.presentation");
> >> > >> mimeMap.put("ods", "application/vnd.oasis.opendocument.spreadsheet");
> >> > >> mimeMap.put("ots", "application/vnd.oasis.opendocument.spreadsheet");
> >> > >> mimeMap.put("txt", "text/plain");
> >> > >> mimeMap.put("log", "text/plain");
> >> > >>
> >> > >> The keys are the types supported.
> >> > >>
> >> > >>
> >> > >> Amrit Sarkar
> >> > >> Search Engineer
> >> > >> Lucidworks, Inc.
> >> > >> 415-589-9269
> >> > >> www.lucidworks.com
> >> > >> Twitter http://twitter.com/lucidworks
> >> > >> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> >> > >>
> >> > >> On Fri, Oct 13, 2017 at 6:56 PM, Amrit Sarkar <
> [hidden email]>
> >> > >> wrote:
> >> > >>
> >> > >> > Ah!
> >> > >> >
> >> > >> > Only supported type is: text/html; encoding=utf-8
> >> > >> >
> >> > >> > I am not confident of this either :) but this should work.
> >> > >> >
> >> > >> > See the code-snippet below:
> >> > >> >
> >> > >> > ......
> >> > >> >
> >> > >> > if(res.httpStatus == 200) {
> >> > >> >   // Raw content type of form "text/html; encoding=utf-8"
> >> > >> >   String rawContentType = conn.getContentType();
> >> > >> >   String type = rawContentType.split(";")[0];
> >> > >> >   if(typeSupported(type) || "*".equals(fileTypes)) {
> >> > >> >     String encoding = conn.getContentEncoding();
> >> > >> >
> >> > >> > ....
> >> > >> >
> >> > >> >
> >> > >> > Amrit Sarkar
> >> > >> > Search Engineer
> >> > >> > Lucidworks, Inc.
> >> > >> > 415-589-9269
> >> > >> > www.lucidworks.com
> >> > >> > Twitter http://twitter.com/lucidworks
> >> > >> > LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> >> > >> >
> >> > >> > On Fri, Oct 13, 2017 at 6:51 PM, Kevin Layer <[hidden email]>
> wrote:
> >> > >> >
> >> > >> >> Amrit Sarkar wrote:
> >> > >> >>
> >> > >> >> >> Strange,
> >> > >> >> >>
> >> > >> >> >> Can you add: "text/html;charset=utf-8". This is wiki.apache.org page's
> >> > >> >> >> Content-Type. Let's see what it says now.
> >> > >> >>
> >> > >> >> Same thing.  Verified Content-Type:
> >> > >> >>
> >> > >> >> quadra[git:master]$ wget -S -O /dev/null http://quadra:9091/index.md |& grep Content-Type
> >> > >> >>   Content-Type: text/html;charset=utf-8
> >> > >> >> quadra[git:master]$
> >> > >> >>
> >> > >> >> quadra[git:master]$ docker exec -it --user=solr solr bin/post -c handbook http://quadra:9091/index.md -recursive 10 -delay 0 -filetypes md
> >> > >> >> /docker-java-home/jre/bin/java -classpath /opt/solr/dist/solr-core-7.0.1.jar -Dauto=yes -Drecursive=10 -Ddelay=0 -Dfiletypes=md -Dc=handbook -Ddata=web org.apache.solr.util.SimplePostTool http://quadra:9091/index.md
> >> > >> >> SimplePostTool version 5.0.0
> >> > >> >> Posting web pages to Solr url http://localhost:8983/solr/handbook/update/extract
> >> > >> >> Entering auto mode. Indexing pages with content-types corresponding to file endings md
> >> > >> >> SimplePostTool: WARNING: Never crawl an external web site faster than every 10 seconds, your IP will probably be blocked
> >> > >> >> Entering recursive mode, depth=10, delay=0s
> >> > >> >> Entering crawl at level 0 (1 links total, 1 new)
> >> > >> >> SimplePostTool: WARNING: Skipping URL with unsupported type text/html
> >> > >> >> SimplePostTool: WARNING: The URL http://quadra:9091/index.md returned a HTTP result status of 415
> >> > >> >> 0 web pages indexed.
> >> > >> >> COMMITting Solr index changes to http://localhost:8983/solr/handbook/update/extract...
> >> > >> >> Time spent: 0:00:00.531
> >> > >> >> quadra[git:master]$
> >> > >> >>
> >> > >> >> Kevin
> >> > >> >>
> >> > >> >> >>
> >> > >> >> >> Amrit Sarkar
> >> > >> >> >> Search Engineer
> >> > >> >> >> Lucidworks, Inc.
> >> > >> >> >> 415-589-9269
> >> > >> >> >> www.lucidworks.com
> >> > >> >> >> Twitter http://twitter.com/lucidworks
> >> > >> >> >> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> >> > >> >> >>
> >> > >> >> >> On Fri, Oct 13, 2017 at 6:44 PM, Kevin Layer <
> [hidden email]>
> >> > wrote:
> >> > >> >> >>
> >> > >> >> >> > OK, so I hacked markserv to add Content-Type text/html, but now I get
> >> > >> >> >> >
> >> > >> >> >> > SimplePostTool: WARNING: Skipping URL with unsupported type text/html
> >> > >> >> >> >
> >> > >> >> >> > What is it expecting?
> >> > >> >> >> >
> >> > >> >> >> > $ docker exec -it --user=solr solr bin/post -c handbook http://quadra:9091/index.md -recursive 10 -delay 0 -filetypes md
> >> > >> >> >> > /docker-java-home/jre/bin/java -classpath /opt/solr/dist/solr-core-7.0.1.jar -Dauto=yes -Drecursive=10 -Ddelay=0 -Dfiletypes=md -Dc=handbook -Ddata=web org.apache.solr.util.SimplePostTool http://quadra:9091/index.md
> >> > >> >> >> > SimplePostTool version 5.0.0
> >> > >> >> >> > Posting web pages to Solr url http://localhost:8983/solr/handbook/update/extract
> >> > >> >> >> > Entering auto mode. Indexing pages with content-types corresponding to file endings md
> >> > >> >> >> > SimplePostTool: WARNING: Never crawl an external web site faster than every 10 seconds, your IP will probably be blocked
> >> > >> >> >> > Entering recursive mode, depth=10, delay=0s
> >> > >> >> >> > Entering crawl at level 0 (1 links total, 1 new)
> >> > >> >> >> > SimplePostTool: WARNING: Skipping URL with unsupported type text/html
> >> > >> >> >> > SimplePostTool: WARNING: The URL http://quadra:9091/index.md returned a HTTP result status of 415
> >> > >> >> >> > 0 web pages indexed.
> >> > >> >> >> > COMMITting Solr index changes to http://localhost:8983/solr/handbook/update/extract...
> >> > >> >> >> > Time spent: 0:00:03.882
> >> > >> >> >> > $
> >> > >> >> >> >
> >> > >> >> >> > Thanks.
> >> > >> >> >> >
> >> > >> >> >> > Kevin
> >> > >> >> >> >
> >> > >> >>
> >> > >> >
> >> > >> >
> >> >
>
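
The quoted snippet and mimeMap suggest how the crawler might decide whether a page is "supported": strip any parameters from the Content-Type header, then see whether that MIME type belongs to one of the extensions passed via -filetypes. Below is a minimal sketch of that idea, not the actual SimplePostTool source. The class name ContentTypeCheck, both method signatures, and the assumption that -filetypes is matched as a comma-separated extension list are illustrative. It also guards against a missing Content-Type header, which is one plausible way a split() on the raw header could throw a NullPointerException.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Illustrative sketch only; NOT the real SimplePostTool implementation. */
final class ContentTypeCheck {
    // Small subset of the extension-to-MIME map quoted in the thread.
    static final Map<String, String> MIME_MAP = new HashMap<>();
    static {
        MIME_MAP.put("html", "text/html");
        MIME_MAP.put("htm", "text/html");
        MIME_MAP.put("txt", "text/plain");
        MIME_MAP.put("pdf", "application/pdf");
    }

    /** Drops parameters such as ";charset=utf-8"; returns null when no header was sent. */
    static String baseType(String rawContentType) {
        if (rawContentType == null) {
            return null; // avoid calling split() on a missing header
        }
        return rawContentType.split(";")[0].trim().toLowerCase(Locale.ROOT);
    }

    /** True if "*" was requested or some requested extension maps to the given MIME type. */
    static boolean typeSupported(String type, String fileTypes) {
        if (type == null) {
            return false;
        }
        if ("*".equals(fileTypes)) {
            return true;
        }
        for (String ext : fileTypes.split(",")) {
            // MIME_MAP.get returns null for unknown extensions such as "md".
            if (type.equals(MIME_MAP.get(ext.trim().toLowerCase(Locale.ROOT)))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String type = baseType("text/html;charset=utf-8");
        System.out.println(typeSupported(type, "md"));    // false: "md" is not in the map
        System.out.println(typeSupported(type, "html"));  // true
        System.out.println(typeSupported(baseType(null), "html")); // false: no header
    }
}
```

Under these assumptions, a page served as text/html would be skipped when only "md" is requested, since the map has no "md" key. This sketch does not explain Kevin's report that adding "html" gave the same result, so the real tool may behave differently.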

Re: solr 7.0.1: exception running post to crawl simple website

Amrit Sarkar
Pardon, I meant: [solr-home]/server/log/solr.log



Re: solr 7.0.1: exception running post to crawl simple website

Kevin Layer
In reply to this post by Amrit Sarkar
Amrit Sarkar wrote:

>> ah oh, dockers. They are placed under [solr-home]/server/log/solr/log in
>> the machine. I haven't played much with docker, any way you can get that
>> file from that location.

I see these files:

/opt/solr/server/logs/archived
/opt/solr/server/logs/solr_gc.log.0.current
/opt/solr/server/logs/solr.log
/opt/solr/server/solr/handbook/data/tlog

The third one (solr.log) has very little info.  Attached:


2017-10-11 15:28:09.564 INFO  (main) [   ] o.e.j.s.Server jetty-9.3.14.v20161028
2017-10-11 15:28:10.668 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter  ___      _       Welcome to Apache Solr™ version 7.0.1
2017-10-11 15:28:10.669 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter / __| ___| |_ _   Starting in standalone mode on port 8983
2017-10-11 15:28:10.670 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter \__ \/ _ \ | '_|  Install dir: /opt/solr, Default config dir: /opt/solr/server/solr/configsets/_default/conf
2017-10-11 15:28:10.707 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter |___/\___/_|_|    Start time: 2017-10-11T15:28:10.674Z
2017-10-11 15:28:10.747 INFO  (main) [   ] o.a.s.c.SolrResourceLoader Using system property solr.solr.home: /opt/solr/server/solr
2017-10-11 15:28:10.763 INFO  (main) [   ] o.a.s.c.SolrXmlConfig Loading container configuration from /opt/solr/server/solr/solr.xml
2017-10-11 15:28:11.062 INFO  (main) [   ] o.a.s.c.SolrResourceLoader [null] Added 0 libs to classloader, from paths: []
2017-10-11 15:28:12.514 INFO  (main) [   ] o.a.s.c.CorePropertiesLocator Found 0 core definitions underneath /opt/solr/server/solr
2017-10-11 15:28:12.635 INFO  (main) [   ] o.e.j.s.Server Started @4304ms
2017-10-11 15:29:00.971 INFO  (qtp1911006827-13) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system params={wt=json} status=0 QTime=108
2017-10-11 15:29:01.080 INFO  (qtp1911006827-18) [   ] o.a.s.c.TransientSolrCoreCacheDefault Allocating transient cache for 2147483647 transient cores
2017-10-11 15:29:01.083 INFO  (qtp1911006827-18) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores params={core=handbook&action=STATUS&wt=json} status=0 QTime=5
2017-10-11 15:29:01.194 INFO  (qtp1911006827-19) [   ] o.a.s.h.a.CoreAdminOperation core create command name=handbook&action=CREATE&instanceDir=handbook&wt=json
2017-10-11 15:29:01.342 INFO  (qtp1911006827-19) [   x:handbook] o.a.s.c.SolrResourceLoader [handbook] Added 51 libs to classloader, from paths: [/opt/solr/contrib/clustering/lib, /opt/solr/contrib/extraction/lib, /opt/solr/contrib/langid/lib, /opt/solr/contrib/velocity/lib, /opt/solr/dist]
2017-10-11 15:29:01.504 INFO  (qtp1911006827-19) [   x:handbook] o.a.s.c.SolrConfig Using Lucene MatchVersion: 7.0.1
2017-10-11 15:29:01.969 INFO  (qtp1911006827-19) [   x:handbook] o.a.s.s.IndexSchema [handbook] Schema name=default-config
2017-10-11 15:29:03.678 INFO  (qtp1911006827-19) [   x:handbook] o.a.s.s.IndexSchema Loaded schema default-config/1.6 with uniqueid field id
2017-10-11 15:29:03.806 INFO  (qtp1911006827-19) [   x:handbook] o.a.s.c.CoreContainer Creating SolrCore 'handbook' using configuration from instancedir /opt/solr/server/solr/handbook, trusted=true
2017-10-11 15:29:03.853 INFO  (qtp1911006827-19) [   x:handbook] o.a.s.c.SolrCore solr.RecoveryStrategy.Builder
2017-10-11 15:29:03.866 INFO  (qtp1911006827-19) [   x:handbook] o.a.s.c.SolrCore [[handbook] ] Opening new SolrCore at [/opt/solr/server/solr/handbook], dataDir=[/opt/solr/server/solr/handbook/data/]
2017-10-11 15:29:04.180 INFO  (qtp1911006827-19) [   x:handbook] o.a.s.r.XSLTResponseWriter xsltCacheLifetimeSeconds=5
2017-10-11 15:29:05.100 INFO  (qtp1911006827-19) [   x:handbook] o.a.s.u.UpdateHandler Using UpdateLog implementation: org.apache.solr.update.UpdateLog
2017-10-11 15:29:05.101 INFO  (qtp1911006827-19) [   x:handbook] o.a.s.u.UpdateLog Initializing UpdateLog: dataDir= defaultSyncLevel=FLUSH numRecordsToKeep=100 maxNumLogsToKeep=10 numVersionBuckets=65536
2017-10-11 15:29:05.150 INFO  (qtp1911006827-19) [   x:handbook] o.a.s.u.CommitTracker Hard AutoCommit: if uncommited for 15000ms;
2017-10-11 15:29:05.151 INFO  (qtp1911006827-19) [   x:handbook] o.a.s.u.CommitTracker Soft AutoCommit: disabled
2017-10-11 15:29:05.199 INFO  (qtp1911006827-19) [   x:handbook] o.a.s.s.SolrIndexSearcher Opening [Searcher@2b9fd97b[handbook] main]
2017-10-11 15:29:05.229 INFO  (qtp1911006827-19) [   x:handbook] o.a.s.r.ManagedResourceStorage File-based storage initialized to use dir: /opt/solr/server/solr/handbook/conf
2017-10-11 15:29:05.266 INFO  (qtp1911006827-19) [   x:handbook] o.a.s.h.c.SpellCheckComponent Initializing spell checkers
2017-10-11 15:29:05.283 INFO  (qtp1911006827-19) [   x:handbook] o.a.s.s.DirectSolrSpellChecker init: {name=default,field=_text_,classname=solr.DirectSolrSpellChecker,distanceMeasure=internal,accuracy=0.5,maxEdits=2,minPrefix=1,maxInspections=5,minQueryLength=4,maxQueryFrequency=0.01}
2017-10-11 15:29:05.318 INFO  (qtp1911006827-19) [   x:handbook] o.a.s.h.ReplicationHandler Commits will be reserved for  10000
2017-10-11 15:29:05.434 INFO  (searcherExecutor-7-thread-1-processing-x:handbook) [   x:handbook] o.a.s.c.QuerySenderListener QuerySenderListener sending requests to Searcher@2b9fd97b[handbook] main{ExitableDirectoryReader(UninvertingDirectoryReader())}
2017-10-11 15:29:05.439 INFO  (searcherExecutor-7-thread-1-processing-x:handbook) [   x:handbook] o.a.s.c.QuerySenderListener QuerySenderListener done.
2017-10-11 15:29:05.440 INFO  (searcherExecutor-7-thread-1-processing-x:handbook) [   x:handbook] o.a.s.h.c.SpellCheckComponent Loading spell index for spellchecker: default
2017-10-11 15:29:05.447 INFO  (qtp1911006827-19) [   x:handbook] o.a.s.u.UpdateLog Could not find max version in index or recent updates, using new clock 1580975517016784896
2017-10-11 15:29:05.468 INFO  (qtp1911006827-19) [   x:handbook] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores params={name=handbook&action=CREATE&instanceDir=handbook&wt=json} status=0 QTime=4275
2017-10-11 15:29:05.494 INFO  (searcherExecutor-7-thread-1-processing-x:handbook) [   x:handbook] o.a.s.c.SolrCore [handbook] Registered new searcher Searcher@2b9fd97b[handbook] main{ExitableDirectoryReader(UninvertingDirectoryReader())}
2017-10-11 15:36:24.537 INFO  (qtp1911006827-14) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores params={indexInfo=false&wt=json&_=1507736184190} status=0 QTime=1
2017-10-11 15:36:24.579 INFO  (qtp1911006827-20) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system params={wt=json&_=1507736184191} status=0 QTime=38
2017-10-11 15:36:27.810 INFO  (qtp1911006827-14) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores params={indexInfo=false&wt=json&_=1507736184190} status=0 QTime=0
2017-10-11 15:36:27.846 INFO  (qtp1911006827-13) [   x:handbook] o.a.s.c.S.Request [handbook]  webapp=/solr path=/admin/ping params={action=status&wt=json&_=1507736184191&ts=1507736184191} status=503 QTime=8
2017-10-11 15:36:27.852 INFO  (qtp1911006827-14) [   x:handbook] o.a.s.c.S.Request [handbook]  webapp=/solr path=/admin/luke params={numTerms=0&show=index&wt=json&_=1507736187772} status=0 QTime=35
2017-10-11 15:36:27.866 INFO  (qtp1911006827-18) [   x:handbook] o.a.s.c.S.Request [handbook]  webapp=/solr path=/replication params={wt=json&command=details&_=1507736187773} status=0 QTime=53
2017-10-11 15:36:27.893 INFO  (qtp1911006827-16) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system params={wt=json&_=1507736184191} status=0 QTime=84
2017-10-11 15:36:27.894 INFO  (qtp1911006827-11) [   x:handbook] o.a.s.c.S.Request [handbook]  webapp=/solr path=/admin/system params={wt=json&_=1507736187773} status=0 QTime=64
2017-10-11 15:36:33.015 INFO  (qtp1911006827-13) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores params={indexInfo=false&wt=json&_=1507736184190} status=0 QTime=0
2017-10-11 15:36:33.033 INFO  (qtp1911006827-20) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system params={wt=json&_=1507736184191} status=0 QTime=18
2017-10-11 15:36:35.199 INFO  (qtp1911006827-14) [   x:handbook] o.a.s.c.S.Request [handbook]  webapp=/solr path=/select params={q=*:*&_=1507736184481} hits=0 status=0 QTime=54
2017-10-13 13:10:43.480 INFO  (qtp1911006827-19) [   x:handbook] o.a.s.c.PluginBag Going to create a new requestHandler with {type = requestHandler,name = /update/extract,class = solr.extraction.ExtractingRequestHandler,attributes = {startup=lazy, name=/update/extract, class=solr.extraction.ExtractingRequestHandler},args = {defaults={lowernames=true,fmap.meta=ignored_,fmap.content=_text_,df=_text_}}}
2017-10-13 13:10:46.287 INFO  (qtp1911006827-19) [   x:handbook] o.a.s.u.DirectUpdateHandler2 start commit{_version_=1581148008618131456,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
2017-10-13 13:10:46.288 INFO  (qtp1911006827-19) [   x:handbook] o.a.s.u.DirectUpdateHandler2 No uncommitted changes. Skipping IW.commit.
2017-10-13 13:10:46.374 INFO  (qtp1911006827-19) [   x:handbook] o.a.s.u.DirectUpdateHandler2 end_commit_flush
2017-10-13 13:10:46.375 INFO  (qtp1911006827-19) [   x:handbook] o.a.s.u.p.LogUpdateProcessorFactory [handbook]  webapp=/solr path=/update/extract params={commit=true}{commit=} 0 2947
2017-10-13 13:20:09.424 INFO  (qtp1911006827-11) [   x:handbook] o.a.s.u.DirectUpdateHandler2 start commit{_version_=1581148599141531648,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
2017-10-13 13:20:09.447 INFO  (qtp1911006827-11) [   x:handbook] o.a.s.u.DirectUpdateHandler2 No uncommitted changes. Skipping IW.commit.
2017-10-13 13:20:09.450 INFO  (qtp1911006827-11) [   x:handbook] o.a.s.u.DirectUpdateHandler2 end_commit_flush
2017-10-13 13:20:09.451 INFO  (qtp1911006827-11) [   x:handbook] o.a.s.u.p.LogUpdateProcessorFactory [handbook]  webapp=/solr path=/update/extract params={commit=true}{commit=} 0 27
2017-10-13 13:21:29.872 INFO  (qtp1911006827-17) [   x:handbook] o.a.s.u.DirectUpdateHandler2 start commit{_version_=1581148683498422272,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
2017-10-13 13:21:29.873 INFO  (qtp1911006827-17) [   x:handbook] o.a.s.u.DirectUpdateHandler2 No uncommitted changes. Skipping IW.commit.
2017-10-13 13:21:29.874 INFO  (qtp1911006827-17) [   x:handbook] o.a.s.u.DirectUpdateHandler2 end_commit_flush
2017-10-13 13:21:29.876 INFO  (qtp1911006827-17) [   x:handbook] o.a.s.u.p.LogUpdateProcessorFactory [handbook]  webapp=/solr path=/update/extract params={commit=true}{commit=} 0 4
2017-10-13 14:12:16.157 INFO  (qtp1911006827-15) [   x:handbook] o.a.s.u.DirectUpdateHandler2 start commit{_version_=1581151877759762432,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
2017-10-13 14:12:16.158 INFO  (qtp1911006827-15) [   x:handbook] o.a.s.u.DirectUpdateHandler2 No uncommitted changes. Skipping IW.commit.
2017-10-13 14:12:16.161 INFO  (qtp1911006827-15) [   x:handbook] o.a.s.u.DirectUpdateHandler2 end_commit_flush
2017-10-13 14:12:16.162 INFO  (qtp1911006827-15) [   x:handbook] o.a.s.u.p.LogUpdateProcessorFactory [handbook]  webapp=/solr path=/update/extract params={commit=true}{commit=} 0 6
2017-10-13 14:34:13.809 INFO  (qtp1911006827-17) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores params={indexInfo=false&wt=json&_=1507905253481} status=0 QTime=42
2017-10-13 14:34:14.006 INFO  (qtp1911006827-20) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system params={wt=json&_=1507905253483} status=0 QTime=239
2017-10-13 14:34:14.063 INFO  (qtp1911006827-18) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system params={wt=json&_=1507905253483} status=0 QTime=28
2017-10-13 14:34:17.720 INFO  (qtp1911006827-15) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores params={indexInfo=false&wt=json&_=1507905253481} status=0 QTime=0
2017-10-13 14:34:17.767 INFO  (qtp1911006827-15) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=43
2017-10-13 14:34:17.773 INFO  (qtp1911006827-17) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system params={wt=json&_=1507905253483} status=0 QTime=54
2017-10-13 14:34:27.726 INFO  (qtp1911006827-11) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:34:37.719 INFO  (qtp1911006827-19) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:34:41.174 INFO  (qtp1911006827-18) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores params={indexInfo=false&wt=json&_=1507905253481} status=0 QTime=0
2017-10-13 14:34:41.222 INFO  (qtp1911006827-20) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system params={wt=json&_=1507905253483} status=0 QTime=48
2017-10-13 14:34:41.287 INFO  (qtp1911006827-18) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system params={wt=json&_=1507905253483} status=0 QTime=17
2017-10-13 14:34:42.737 INFO  (qtp1911006827-13) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores params={indexInfo=false&wt=json&_=1507905253481} status=0 QTime=0
2017-10-13 14:34:42.745 INFO  (qtp1911006827-13) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:34:42.763 INFO  (qtp1911006827-14) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system params={wt=json&_=1507905253483} status=0 QTime=25
2017-10-13 14:34:52.980 INFO  (qtp1911006827-15) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:35:02.976 INFO  (qtp1911006827-11) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:35:12.976 INFO  (qtp1911006827-19) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:35:22.977 INFO  (qtp1911006827-16) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:35:32.981 INFO  (qtp1911006827-20) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:35:42.986 INFO  (qtp1911006827-20) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:35:52.986 INFO  (qtp1911006827-17) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:36:02.988 INFO  (qtp1911006827-13) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:36:12.994 INFO  (qtp1911006827-14) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:36:22.994 INFO  (qtp1911006827-15) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:36:33.002 INFO  (qtp1911006827-11) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:36:43.010 INFO  (qtp1911006827-16) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:36:52.995 INFO  (qtp1911006827-20) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:37:02.997 INFO  (qtp1911006827-17) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:37:13.002 INFO  (qtp1911006827-17) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:37:23.014 INFO  (qtp1911006827-14) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:37:24.960 INFO  (qtp1911006827-15) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores params={indexInfo=false&wt=json&_=1507905253481} status=0 QTime=0
2017-10-13 14:37:25.004 INFO  (qtp1911006827-11) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system params={wt=json&_=1507905253483} status=0 QTime=19
2017-10-13 14:37:25.112 INFO  (qtp1911006827-18) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696} status=0 QTime=76
2017-10-13 14:38:07.403 INFO  (qtp1911006827-19) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores params={indexInfo=false&wt=json&_=1507905253481} status=0 QTime=0
2017-10-13 14:38:07.440 INFO  (qtp1911006827-13) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:38:07.451 INFO  (qtp1911006827-20) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system params={wt=json&_=1507905253483} status=0 QTime=18
2017-10-13 14:38:17.391 INFO  (qtp1911006827-18) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:38:27.393 INFO  (qtp1911006827-16) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:38:37.403 INFO  (qtp1911006827-14) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:38:47.395 INFO  (qtp1911006827-14) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:38:57.399 INFO  (qtp1911006827-14) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:39:07.400 INFO  (qtp1911006827-14) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:39:17.404 INFO  (qtp1911006827-13) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:39:27.406 INFO  (qtp1911006827-13) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:39:37.408 INFO  (qtp1911006827-13) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:39:47.415 INFO  (qtp1911006827-17) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:39:57.416 INFO  (qtp1911006827-16) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:40:07.431 INFO  (qtp1911006827-19) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:40:17.421 INFO  (qtp1911006827-15) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:40:27.421 INFO  (qtp1911006827-11) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:40:37.422 INFO  (qtp1911006827-11) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:40:47.422 INFO  (qtp1911006827-20) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:40:57.428 INFO  (qtp1911006827-18) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:41:07.431 INFO  (qtp1911006827-13) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:41:17.422 INFO  (qtp1911006827-17) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:41:27.423 INFO  (qtp1911006827-17) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:41:37.423 INFO  (qtp1911006827-19) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:41:47.426 INFO  (qtp1911006827-15) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:41:57.441 INFO  (qtp1911006827-14) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:42:07.434 INFO  (qtp1911006827-11) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:42:17.434 INFO  (qtp1911006827-20) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:42:27.435 INFO  (qtp1911006827-18) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:42:37.439 INFO  (qtp1911006827-13) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:42:47.697 INFO  (qtp1911006827-16) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:42:57.804 INFO  (qtp1911006827-17) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:43:08.323 INFO  (qtp1911006827-19) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:43:18.653 INFO  (qtp1911006827-15) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:43:28.813 INFO  (qtp1911006827-14) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:43:38.816 INFO  (qtp1911006827-14) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:43:48.815 INFO  (qtp1911006827-20) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:43:58.817 INFO  (qtp1911006827-20) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:44:08.813 INFO  (qtp1911006827-13) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:44:18.820 INFO  (qtp1911006827-16) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:44:28.818 INFO  (qtp1911006827-17) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:44:38.821 INFO  (qtp1911006827-19) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:44:48.823 INFO  (qtp1911006827-15) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:44:58.819 INFO  (qtp1911006827-11) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:45:08.824 INFO  (qtp1911006827-14) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:45:18.820 INFO  (qtp1911006827-18) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:45:28.824 INFO  (qtp1911006827-20) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:45:38.823 INFO  (qtp1911006827-13) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:45:48.824 INFO  (qtp1911006827-16) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:45:58.819 INFO  (qtp1911006827-17) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:46:08.822 INFO  (qtp1911006827-17) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:46:18.820 INFO  (qtp1911006827-17) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:46:28.820 INFO  (qtp1911006827-11) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:46:38.826 INFO  (qtp1911006827-14) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:46:48.823 INFO  (qtp1911006827-18) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:46:58.825 INFO  (qtp1911006827-20) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:47:08.827 INFO  (qtp1911006827-13) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:47:18.846 INFO  (qtp1911006827-13) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:47:28.825 INFO  (qtp1911006827-19) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:47:38.826 INFO  (qtp1911006827-15) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:47:50.183 INFO  (qtp1911006827-17) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=1356
2017-10-13 14:47:58.828 INFO  (qtp1911006827-11) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:48:08.828 INFO  (qtp1911006827-14) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:48:18.885 INFO  (qtp1911006827-18) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:48:28.827 INFO  (qtp1911006827-20) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:48:38.831 INFO  (qtp1911006827-16) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:48:48.833 INFO  (qtp1911006827-13) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:48:58.833 INFO  (qtp1911006827-13) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:49:08.834 INFO  (qtp1911006827-15) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:49:18.832 INFO  (qtp1911006827-17) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:49:28.835 INFO  (qtp1911006827-11) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:49:38.861 INFO  (qtp1911006827-14) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=14
2017-10-13 14:49:48.853 INFO  (qtp1911006827-18) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:49:58.837 INFO  (qtp1911006827-20) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0
2017-10-13 14:50:08.833 INFO  (qtp1911006827-16) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=0




>>
>> Amrit Sarkar
>> Search Engineer
>> Lucidworks, Inc.
>> 415-589-9269
>> www.lucidworks.com
>> Twitter http://twitter.com/lucidworks
>> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>>
>> On Fri, Oct 13, 2017 at 8:08 PM, Kevin Layer <[hidden email]> wrote:
>>
>> > Amrit Sarkar wrote:
>> >
>> > >> Hi Kevin,
>> > >>
>> > >> Can you post the solr log in the mail thread. I don't think it handled
>> > the
>> > >> .md by itself by first glance at code.
>> >
>> > How do I extract the log you want?
>> >
>> >
>> > >>
>> > >> Amrit Sarkar
>> > >> Search Engineer
>> > >> Lucidworks, Inc.
>> > >> 415-589-9269
>> > >> www.lucidworks.com
>> > >> Twitter http://twitter.com/lucidworks
>> > >> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>> > >>
>> > >> On Fri, Oct 13, 2017 at 7:42 PM, Kevin Layer <[hidden email]> wrote:
>> > >>
>> > >> > Amrit Sarkar wrote:
>> > >> >
>> > >> > >> Kevin,
>> > >> > >>
>> > >> > >> Just put "html" too and give it a shot. These are the types it is
>> > >> > expecting:
>> > >> >
>> > >> > Same thing.
>> > >> >
>> > >> > >>
>> > >> > >> mimeMap = new HashMap<>();
>> > >> > >> mimeMap.put("xml", "application/xml");
>> > >> > >> mimeMap.put("csv", "text/csv");
>> > >> > >> mimeMap.put("json", "application/json");
>> > >> > >> mimeMap.put("jsonl", "application/json");
>> > >> > >> mimeMap.put("pdf", "application/pdf");
>> > >> > >> mimeMap.put("rtf", "text/rtf");
>> > >> > >> mimeMap.put("html", "text/html");
>> > >> > >> mimeMap.put("htm", "text/html");
>> > >> > >> mimeMap.put("doc", "application/msword");
>> > >> > >> mimeMap.put("docx",
>> > >> > >> "application/vnd.openxmlformats-officedocument.
>> > >> > wordprocessingml.document");
>> > >> > >> mimeMap.put("ppt", "application/vnd.ms-powerpoint");
>> > >> > >> mimeMap.put("pptx",
>> > >> > >> "application/vnd.openxmlformats-officedocument.
>> > >> > presentationml.presentation");
>> > >> > >> mimeMap.put("xls", "application/vnd.ms-excel");
>> > >> > >> mimeMap.put("xlsx",
>> > >> > >> "application/vnd.openxmlformats-officedocument.
>> > spreadsheetml.sheet");
>> > >> > >> mimeMap.put("odt", "application/vnd.oasis.opendocument.text");
>> > >> > >> mimeMap.put("ott", "application/vnd.oasis.opendocument.text");
>> > >> > >> mimeMap.put("odp", "application/vnd.oasis.
>> > opendocument.presentation");
>> > >> > >> mimeMap.put("otp", "application/vnd.oasis.
>> > opendocument.presentation");
>> > >> > >> mimeMap.put("ods", "application/vnd.oasis.
>> > opendocument.spreadsheet");
>> > >> > >> mimeMap.put("ots", "application/vnd.oasis.
>> > opendocument.spreadsheet");
>> > >> > >> mimeMap.put("txt", "text/plain");
>> > >> > >> mimeMap.put("log", "text/plain");
>> > >> > >>
>> > >> > >> The keys are the types supported.
>> > >> > >>
>> > >> > >>
>> > >> > >> Amrit Sarkar
>> > >> > >> Search Engineer
>> > >> > >> Lucidworks, Inc.
>> > >> > >> 415-589-9269
>> > >> > >> www.lucidworks.com
>> > >> > >> Twitter http://twitter.com/lucidworks
>> > >> > >> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>> > >> > >>
>> > >> > >> On Fri, Oct 13, 2017 at 6:56 PM, Amrit Sarkar <
>> > [hidden email]>
>> > >> > >> wrote:
>> > >> > >>
>> > >> > >> > Ah!
>> > >> > >> >
>> > >> > >> > Only supported type is: text/html; encoding=utf-8
>> > >> > >> >
>> > >> > >> > I am not confident of this either :) but this should work.
>> > >> > >> >
>> > >> > >> > See the code-snippet below:
>> > >> > >> >
>> > >> > >> > ......
>> > >> > >> >
>> > >> > >> > if(res.httpStatus == 200) {
>> > >> > >> >   // Raw content type of form "text/html; encoding=utf-8"
>> > >> > >> >   String rawContentType = conn.getContentType();
>> > >> > >> >   String type = rawContentType.split(";")[0];
>> > >> > >> >   if(typeSupported(type) || "*".equals(fileTypes)) {
>> > >> > >> >     String encoding = conn.getContentEncoding();
>> > >> > >> >
>> > >> > >> > ....
>> > >> > >> >
>> > >> > >> >
>> > >> > >> > Amrit Sarkar
>> > >> > >> > Search Engineer
>> > >> > >> > Lucidworks, Inc.
>> > >> > >> > 415-589-9269
>> > >> > >> > www.lucidworks.com
>> > >> > >> > Twitter http://twitter.com/lucidworks
>> > >> > >> > LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>> > >> > >> >
>> > >> > >> > On Fri, Oct 13, 2017 at 6:51 PM, Kevin Layer <[hidden email]>
>> > wrote:
>> > >> > >> >
>> > >> > >> >> Amrit Sarkar wrote:
>> > >> > >> >>
>> > >> > >> >> >> Strange,
>> > >> > >> >> >>
>> > >> > >> >> >> Can you add: "text/html;charset=utf-8". This is
>> > wiki.apache.org
>> > >> > page's
>> > >> > >> >> >> Content-Type. Let's see what it says now.
>> > >> > >> >>
>> > >> > >> >> Same thing.  Verified Content-Type:
>> > >> > >> >>
>> > >> > >> >> quadra[git:master]$ wget -S -O /dev/null
>> > http://quadra:9091/index.md
>> > >> > |&
>> > >> > >> >> grep Content-Type
>> > >> > >> >>   Content-Type: text/html;charset=utf-8
>> > >> > >> >> quadra[git:master]$ ]
>> > >> > >> >>
>> > >> > >> >> quadra[git:master]$ docker exec -it --user=solr solr bin/post -c
>> > >> > handbook
>> > >> > >> >> http://quadra:9091/index.md -recursive 10 -delay 0 -filetypes
>> > md
>> > >> > >> >> /docker-java-home/jre/bin/java -classpath
>> > >> > /opt/solr/dist/solr-core-7.0.1.jar
>> > >> > >> >> -Dauto=yes -Drecursive=10 -Ddelay=0 -Dfiletypes=md -Dc=handbook
>> > >> > -Ddata=web
>> > >> > >> >> org.apache.solr.util.SimplePostTool http://quadra:9091/index.md
>> > >> > >> >> SimplePostTool version 5.0.0
>> > >> > >> >> Posting web pages to Solr url http://localhost:8983/solr/han
>> > >> > >> >> dbook/update/extract
>> > >> > >> >> Entering auto mode. Indexing pages with content-types
>> > corresponding
>> > >> > to
>> > >> > >> >> file endings md
>> > >> > >> >> SimplePostTool: WARNING: Never crawl an external web site
>> > faster than
>> > >> > >> >> every 10 seconds, your IP will probably be blocked
>> > >> > >> >> Entering recursive mode, depth=10, delay=0s
>> > >> > >> >> Entering crawl at level 0 (1 links total, 1 new)
>> > >> > >> >> SimplePostTool: WARNING: Skipping URL with unsupported type
>> > text/html
>> > >> > >> >> SimplePostTool: WARNING: The URL http://quadra:9091/index.md
>> > >> > returned a
>> > >> > >> >> HTTP result status of 415
>> > >> > >> >> 0 web pages indexed.
>> > >> > >> >> COMMITting Solr index changes to http://localhost:8983/solr/han
>> > >> > >> >> dbook/update/extract...
>> > >> > >> >> Time spent: 0:00:00.531
>> > >> > >> >> quadra[git:master]$
>> > >> > >> >>
>> > >> > >> >> Kevin
>> > >> > >> >>
>> > >> > >> >> >>
>> > >> > >> >> >> Amrit Sarkar
>> > >> > >> >> >> Search Engineer
>> > >> > >> >> >> Lucidworks, Inc.
>> > >> > >> >> >> 415-589-9269
>> > >> > >> >> >> www.lucidworks.com
>> > >> > >> >> >> Twitter http://twitter.com/lucidworks
>> > >> > >> >> >> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>> > >> > >> >> >>
>> > >> > >> >> >> On Fri, Oct 13, 2017 at 6:44 PM, Kevin Layer <
>> > [hidden email]>
>> > >> > wrote:
>> > >> > >> >> >>
>> > >> > >> >> >> > OK, so I hacked markserv to add Content-Type text/html,
>> > but now
>> > >> > I get
>> > >> > >> >> >> >
>> > >> > >> >> >> > SimplePostTool: WARNING: Skipping URL with unsupported type
>> > >> > text/html
>> > >> > >> >> >> >
>> > >> > >> >> >> > What is it expecting?
>> > >> > >> >> >> >
>> > >> > >> >> >> > $ docker exec -it --user=solr solr bin/post -c handbook
>> > >> > >> >> >> > http://quadra:9091/index.md -recursive 10 -delay 0
>> > -filetypes
>> > >> > md
>> > >> > >> >> >> > /docker-java-home/jre/bin/java -classpath
>> > >> > >> >> /opt/solr/dist/solr-core-7.0.1.jar
>> > >> > >> >> >> > -Dauto=yes -Drecursive=10 -Ddelay=0 -Dfiletypes=md
>> > -Dc=handbook
>> > >> > >> >> -Ddata=web
>> > >> > >> >> >> > org.apache.solr.util.SimplePostTool
>> > http://quadra:9091/index.md
>> > >> > >> >> >> > SimplePostTool version 5.0.0
>> > >> > >> >> >> > Posting web pages to Solr url http://localhost:8983/solr/
>> > >> > >> >> >> > handbook/update/extract
>> > >> > >> >> >> > Entering auto mode. Indexing pages with content-types
>> > >> > corresponding
>> > >> > >> >> to
>> > >> > >> >> >> > file endings md
>> > >> > >> >> >> > SimplePostTool: WARNING: Never crawl an external web site
>> > >> > faster than
>> > >> > >> >> >> > every 10 seconds, your IP will probably be blocked
>> > >> > >> >> >> > Entering recursive mode, depth=10, delay=0s
>> > >> > >> >> >> > Entering crawl at level 0 (1 links total, 1 new)
>> > >> > >> >> >> > SimplePostTool: WARNING: Skipping URL with unsupported type
>> > >> > text/html
>> > >> > >> >> >> > SimplePostTool: WARNING: The URL
>> > http://quadra:9091/index.md
>> > >> > >> >> returned a
>> > >> > >> >> >> > HTTP result status of 415
>> > >> > >> >> >> > 0 web pages indexed.
>> > >> > >> >> >> > COMMITting Solr index changes to
>> > http://localhost:8983/solr/
>> > >> > >> >> >> > handbook/update/extract...
>> > >> > >> >> >> > Time spent: 0:00:03.882
>> > >> > >> >> >> > $
>> > >> > >> >> >> >
>> > >> > >> >> >> > Thanks.
>> > >> > >> >> >> >
>> > >> > >> >> >> > Kevin
>> > >> > >> >> >> >
>> > >> > >> >>
>> > >> > >> >
>> > >> > >> >
>> > >> >
>> >

Re: solr 7.0.1: exception running post to crawl simple website

Amrit Sarkar
Kevin,

I am not able to replicate the issue on my system, which is a bit annoying
for me. Try this one last time:

docker exec -it --user=solr solr bin/post -c handbook
http://quadra.franz.com:9091/index.md -recursive 10 -delay 0 -filetypes html

and set the Content-Type to "html" and then to "text/html"; try with both.

If you get past this hurdle, let me know.
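
Here is a minimal sketch of why the -filetypes value might matter. It assumes the check works roughly like the snippet quoted earlier in this thread: the Content-Type header is cut at the first ";" and the result is compared against the MIME types of the extensions passed in -filetypes. The class name ContentTypeCheck and both helper methods are hypothetical, and the table is only a subset of the quoted mimeMap. This is a simplified model, not the actual SimplePostTool source.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ContentTypeCheck {
    // Subset of the extension-to-MIME table quoted earlier in the thread.
    static final Map<String, String> MIME_MAP = new HashMap<>();
    static {
        MIME_MAP.put("html", "text/html");
        MIME_MAP.put("htm", "text/html");
        MIME_MAP.put("txt", "text/plain");
        MIME_MAP.put("pdf", "application/pdf");
    }

    // Drop parameters such as ";charset=utf-8".
    // Returns null when no Content-Type header was sent, instead of
    // calling split() on null (which would throw a NullPointerException).
    static String baseType(String rawContentType) {
        if (rawContentType == null) {
            return null;
        }
        return rawContentType.split(";")[0].trim();
    }

    // Assumed rule: a type is accepted only if one of the requested
    // extensions maps to it, or if the caller asked for "*".
    static boolean typeSupported(String type, String fileTypes) {
        if (type == null) {
            return false;
        }
        if ("*".equals(fileTypes)) {
            return true;
        }
        List<String> wanted = Arrays.asList(fileTypes.split(","));
        for (Map.Entry<String, String> e : MIME_MAP.entrySet()) {
            if (e.getValue().equals(type) && wanted.contains(e.getKey())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String type = baseType("text/html;charset=utf-8");
        System.out.println(typeSupported(type, "md"));             // false
        System.out.println(typeSupported(type, "html"));           // true
        System.out.println(typeSupported(baseType(null), "html")); // false
    }
}
```

Under these assumptions, "md" has no entry in the table, so a page served as text/html is skipped when -filetypes is md and accepted when it is html.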

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Fri, Oct 13, 2017 at 8:22 PM, Kevin Layer <[hidden email]> wrote:

> Amrit Sarkar wrote:
>
> >> ah oh, dockers. They are placed under [solr-home]/server/log/solr/log
> in
> >> the machine. I haven't played much with docker, any way you can get that
> >> file from that location.
>
> I see these files:
>
> /opt/solr/server/logs/archived
> /opt/solr/server/logs/solr_gc.log.0.current
> /opt/solr/server/logs/solr.log
> /opt/solr/server/solr/handbook/data/tlog
>
> The 3rd one has very little info.  Attached:
>
>
> 2017-10-11 15:28:09.564 INFO  (main) [   ] o.e.j.s.Server
> jetty-9.3.14.v20161028
> 2017-10-11 15:28:10.668 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter
> ___      _       Welcome to Apache Solr™ version 7.0.1
> 2017-10-11 15:28:10.669 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter /
> __| ___| |_ _   Starting in standalone mode on port 8983
> 2017-10-11 15:28:10.670 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter \__
> \/ _ \ | '_|  Install dir: /opt/solr, Default config dir:
> /opt/solr/server/solr/configsets/_default/conf
> 2017-10-11 15:28:10.707 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter
> |___/\___/_|_|    Start time: 2017-10-11T15:28:10.674Z
> 2017-10-11 15:28:10.747 INFO  (main) [   ] o.a.s.c.SolrResourceLoader
> Using system property solr.solr.home: /opt/solr/server/solr
> 2017-10-11 15:28:10.763 INFO  (main) [   ] o.a.s.c.SolrXmlConfig Loading
> container configuration from /opt/solr/server/solr/solr.xml
> 2017-10-11 15:28:11.062 INFO  (main) [   ] o.a.s.c.SolrResourceLoader
> [null] Added 0 libs to classloader, from paths: []
> 2017-10-11 15:28:12.514 INFO  (main) [   ] o.a.s.c.CorePropertiesLocator
> Found 0 core definitions underneath /opt/solr/server/solr
> 2017-10-11 15:28:12.635 INFO  (main) [   ] o.e.j.s.Server Started @4304ms
> 2017-10-11 15:29:00.971 INFO  (qtp1911006827-13) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system
> params={wt=json} status=0 QTime=108
> 2017-10-11 15:29:01.080 INFO  (qtp1911006827-18) [   ] o.a.s.c.TransientSolrCoreCacheDefault
> Allocating transient cache for 2147483647 transient cores
> 2017-10-11 15:29:01.083 INFO  (qtp1911006827-18) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores
> params={core=handbook&action=STATUS&wt=json} status=0 QTime=5
> 2017-10-11 15:29:01.194 INFO  (qtp1911006827-19) [   ]
> o.a.s.h.a.CoreAdminOperation core create command
> name=handbook&action=CREATE&instanceDir=handbook&wt=json
> 2017-10-11 15:29:01.342 INFO  (qtp1911006827-19) [   x:handbook]
> o.a.s.c.SolrResourceLoader [handbook] Added 51 libs to classloader, from
> paths: [/opt/solr/contrib/clustering/lib, /opt/solr/contrib/extraction/lib,
> /opt/solr/contrib/langid/lib, /opt/solr/contrib/velocity/lib,
> /opt/solr/dist]
> 2017-10-11 15:29:01.504 INFO  (qtp1911006827-19) [   x:handbook]
> o.a.s.c.SolrConfig Using Lucene MatchVersion: 7.0.1
> 2017-10-11 15:29:01.969 INFO  (qtp1911006827-19) [   x:handbook]
> o.a.s.s.IndexSchema [handbook] Schema name=default-config
> 2017-10-11 15:29:03.678 INFO  (qtp1911006827-19) [   x:handbook]
> o.a.s.s.IndexSchema Loaded schema default-config/1.6 with uniqueid field id
> 2017-10-11 15:29:03.806 INFO  (qtp1911006827-19) [   x:handbook]
> o.a.s.c.CoreContainer Creating SolrCore 'handbook' using configuration from
> instancedir /opt/solr/server/solr/handbook, trusted=true
> 2017-10-11 15:29:03.853 INFO  (qtp1911006827-19) [   x:handbook]
> o.a.s.c.SolrCore solr.RecoveryStrategy.Builder
> 2017-10-11 15:29:03.866 INFO  (qtp1911006827-19) [   x:handbook]
> o.a.s.c.SolrCore [[handbook] ] Opening new SolrCore at
> [/opt/solr/server/solr/handbook], dataDir=[/opt/solr/server/
> solr/handbook/data/]
> 2017-10-11 15:29:04.180 INFO  (qtp1911006827-19) [   x:handbook]
> o.a.s.r.XSLTResponseWriter xsltCacheLifetimeSeconds=5
> 2017-10-11 15:29:05.100 INFO  (qtp1911006827-19) [   x:handbook]
> o.a.s.u.UpdateHandler Using UpdateLog implementation:
> org.apache.solr.update.UpdateLog
> 2017-10-11 15:29:05.101 INFO  (qtp1911006827-19) [   x:handbook]
> o.a.s.u.UpdateLog Initializing UpdateLog: dataDir= defaultSyncLevel=FLUSH
> numRecordsToKeep=100 maxNumLogsToKeep=10 numVersionBuckets=65536
> 2017-10-11 15:29:05.150 INFO  (qtp1911006827-19) [   x:handbook]
> o.a.s.u.CommitTracker Hard AutoCommit: if uncommited for 15000ms;
> 2017-10-11 15:29:05.151 INFO  (qtp1911006827-19) [   x:handbook]
> o.a.s.u.CommitTracker Soft AutoCommit: disabled
> 2017-10-11 15:29:05.199 INFO  (qtp1911006827-19) [   x:handbook]
> o.a.s.s.SolrIndexSearcher Opening [Searcher@2b9fd97b[handbook] main]
> 2017-10-11 15:29:05.229 INFO  (qtp1911006827-19) [   x:handbook]
> o.a.s.r.ManagedResourceStorage File-based storage initialized to use dir:
> /opt/solr/server/solr/handbook/conf
> 2017-10-11 15:29:05.266 INFO  (qtp1911006827-19) [   x:handbook]
> o.a.s.h.c.SpellCheckComponent Initializing spell checkers
> 2017-10-11 15:29:05.283 INFO  (qtp1911006827-19) [   x:handbook]
> o.a.s.s.DirectSolrSpellChecker init: {name=default,field=_text_,
> classname=solr.DirectSolrSpellChecker,distanceMeasure=internal,
> accuracy=0.5,maxEdits=2,minPrefix=1,maxInspections=5,minQueryLength=4,
> maxQueryFrequency=0.01}
> 2017-10-11 15:29:05.318 INFO  (qtp1911006827-19) [   x:handbook]
> o.a.s.h.ReplicationHandler Commits will be reserved for  10000
> 2017-10-11 15:29:05.434 INFO  (searcherExecutor-7-thread-1-processing-x:handbook)
> [   x:handbook] o.a.s.c.QuerySenderListener QuerySenderListener sending
> requests to Searcher@2b9fd97b[handbook] main{ExitableDirectoryReader(
> UninvertingDirectoryReader())}
> 2017-10-11 15:29:05.439 INFO  (searcherExecutor-7-thread-1-processing-x:handbook)
> [   x:handbook] o.a.s.c.QuerySenderListener QuerySenderListener done.
> 2017-10-11 15:29:05.440 INFO  (searcherExecutor-7-thread-1-processing-x:handbook)
> [   x:handbook] o.a.s.h.c.SpellCheckComponent Loading spell index for
> spellchecker: default
> 2017-10-11 15:29:05.447 INFO  (qtp1911006827-19) [   x:handbook]
> o.a.s.u.UpdateLog Could not find max version in index or recent updates,
> using new clock 1580975517016784896
> 2017-10-11 15:29:05.468 INFO  (qtp1911006827-19) [   x:handbook]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores
> params={name=handbook&action=CREATE&instanceDir=handbook&wt=json}
> status=0 QTime=4275
> 2017-10-11 15:29:05.494 INFO  (searcherExecutor-7-thread-1-processing-x:handbook)
> [   x:handbook] o.a.s.c.SolrCore [handbook] Registered new searcher
> Searcher@2b9fd97b[handbook] main{ExitableDirectoryReader(
> UninvertingDirectoryReader())}
> 2017-10-11 15:36:24.537 INFO  (qtp1911006827-14) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores
> params={indexInfo=false&wt=json&_=1507736184190} status=0 QTime=1
> 2017-10-11 15:36:24.579 INFO  (qtp1911006827-20) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system
> params={wt=json&_=1507736184191} status=0 QTime=38
> 2017-10-11 15:36:27.810 INFO  (qtp1911006827-14) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores
> params={indexInfo=false&wt=json&_=1507736184190} status=0 QTime=0
> 2017-10-11 15:36:27.846 INFO  (qtp1911006827-13) [   x:handbook]
> o.a.s.c.S.Request [handbook]  webapp=/solr path=/admin/ping
> params={action=status&wt=json&_=1507736184191&ts=1507736184191}
> status=503 QTime=8
> 2017-10-11 15:36:27.852 INFO  (qtp1911006827-14) [   x:handbook]
> o.a.s.c.S.Request [handbook]  webapp=/solr path=/admin/luke
> params={numTerms=0&show=index&wt=json&_=1507736187772} status=0 QTime=35
> 2017-10-11 15:36:27.866 INFO  (qtp1911006827-18) [   x:handbook]
> o.a.s.c.S.Request [handbook]  webapp=/solr path=/replication
> params={wt=json&command=details&_=1507736187773} status=0 QTime=53
> 2017-10-11 15:36:27.893 INFO  (qtp1911006827-16) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system
> params={wt=json&_=1507736184191} status=0 QTime=84
> 2017-10-11 15:36:27.894 INFO  (qtp1911006827-11) [   x:handbook]
> o.a.s.c.S.Request [handbook]  webapp=/solr path=/admin/system
> params={wt=json&_=1507736187773} status=0 QTime=64
> 2017-10-11 15:36:33.015 INFO  (qtp1911006827-13) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores
> params={indexInfo=false&wt=json&_=1507736184190} status=0 QTime=0
> 2017-10-11 15:36:33.033 INFO  (qtp1911006827-20) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system
> params={wt=json&_=1507736184191} status=0 QTime=18
> 2017-10-11 15:36:35.199 INFO  (qtp1911006827-14) [   x:handbook]
> o.a.s.c.S.Request [handbook]  webapp=/solr path=/select
> params={q=*:*&_=1507736184481} hits=0 status=0 QTime=54
> 2017-10-13 13:10:43.480 INFO  (qtp1911006827-19) [   x:handbook]
> o.a.s.c.PluginBag Going to create a new requestHandler with {type =
> requestHandler,name = /update/extract,class =
> solr.extraction.ExtractingRequestHandler,attributes = {startup=lazy,
> name=/update/extract, class=solr.extraction.ExtractingRequestHandler},args
> = {defaults={lowernames=true,fmap.meta=ignored_,
> fmap.content=_text_,df=_text_}}}
> 2017-10-13 13:10:46.287 INFO  (qtp1911006827-19) [   x:handbook]
> o.a.s.u.DirectUpdateHandler2 start
> commit{_version_=1581148008618131456,optimize=false,openSearcher=true,waitSearcher=true,
> expungeDeletes=false,softCommit=false,prepareCommit=false}
> 2017-10-13 13:10:46.288 INFO  (qtp1911006827-19) [   x:handbook]
> o.a.s.u.DirectUpdateHandler2 No uncommitted changes. Skipping IW.commit.
> 2017-10-13 13:10:46.374 INFO  (qtp1911006827-19) [   x:handbook]
> o.a.s.u.DirectUpdateHandler2 end_commit_flush
> 2017-10-13 13:10:46.375 INFO  (qtp1911006827-19) [   x:handbook] o.a.s.u.p.LogUpdateProcessorFactory
> [handbook]  webapp=/solr path=/update/extract params={commit=true}{commit=}
> 0 2947
> 2017-10-13 13:20:09.424 INFO  (qtp1911006827-11) [   x:handbook]
> o.a.s.u.DirectUpdateHandler2 start
> commit{_version_=1581148599141531648,optimize=false,openSearcher=true,waitSearcher=true,
> expungeDeletes=false,softCommit=false,prepareCommit=false}
> 2017-10-13 13:20:09.447 INFO  (qtp1911006827-11) [   x:handbook]
> o.a.s.u.DirectUpdateHandler2 No uncommitted changes. Skipping IW.commit.
> 2017-10-13 13:20:09.450 INFO  (qtp1911006827-11) [   x:handbook]
> o.a.s.u.DirectUpdateHandler2 end_commit_flush
> 2017-10-13 13:20:09.451 INFO  (qtp1911006827-11) [   x:handbook] o.a.s.u.p.LogUpdateProcessorFactory
> [handbook]  webapp=/solr path=/update/extract params={commit=true}{commit=}
> 0 27
> 2017-10-13 13:21:29.872 INFO  (qtp1911006827-17) [   x:handbook]
> o.a.s.u.DirectUpdateHandler2 start
> commit{_version_=1581148683498422272,optimize=false,openSearcher=true,waitSearcher=true,
> expungeDeletes=false,softCommit=false,prepareCommit=false}
> 2017-10-13 13:21:29.873 INFO  (qtp1911006827-17) [   x:handbook]
> o.a.s.u.DirectUpdateHandler2 No uncommitted changes. Skipping IW.commit.
> 2017-10-13 13:21:29.874 INFO  (qtp1911006827-17) [   x:handbook]
> o.a.s.u.DirectUpdateHandler2 end_commit_flush
> 2017-10-13 13:21:29.876 INFO  (qtp1911006827-17) [   x:handbook] o.a.s.u.p.LogUpdateProcessorFactory
> [handbook]  webapp=/solr path=/update/extract params={commit=true}{commit=}
> 0 4
> 2017-10-13 14:12:16.157 INFO  (qtp1911006827-15) [   x:handbook]
> o.a.s.u.DirectUpdateHandler2 start
> commit{_version_=1581151877759762432,optimize=false,openSearcher=true,waitSearcher=true,
> expungeDeletes=false,softCommit=false,prepareCommit=false}
> 2017-10-13 14:12:16.158 INFO  (qtp1911006827-15) [   x:handbook]
> o.a.s.u.DirectUpdateHandler2 No uncommitted changes. Skipping IW.commit.
> 2017-10-13 14:12:16.161 INFO  (qtp1911006827-15) [   x:handbook]
> o.a.s.u.DirectUpdateHandler2 end_commit_flush
> 2017-10-13 14:12:16.162 INFO  (qtp1911006827-15) [   x:handbook] o.a.s.u.p.LogUpdateProcessorFactory
> [handbook]  webapp=/solr path=/update/extract params={commit=true}{commit=}
> 0 6
> 2017-10-13 14:34:13.809 INFO  (qtp1911006827-17) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores
> params={indexInfo=false&wt=json&_=1507905253481} status=0 QTime=42
> 2017-10-13 14:34:14.006 INFO  (qtp1911006827-20) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system
> params={wt=json&_=1507905253483} status=0 QTime=239
> 2017-10-13 14:34:14.063 INFO  (qtp1911006827-18) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system
> params={wt=json&_=1507905253483} status=0 QTime=28
> 2017-10-13 14:34:17.720 INFO  (qtp1911006827-15) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores
> params={indexInfo=false&wt=json&_=1507905253481} status=0 QTime=0
> 2017-10-13 14:34:17.767 INFO  (qtp1911006827-15) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=43
> 2017-10-13 14:34:17.773 INFO  (qtp1911006827-17) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system
> params={wt=json&_=1507905253483} status=0 QTime=54
> 2017-10-13 14:34:27.726 INFO  (qtp1911006827-11) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:34:37.719 INFO  (qtp1911006827-19) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:34:41.174 INFO  (qtp1911006827-18) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores
> params={indexInfo=false&wt=json&_=1507905253481} status=0 QTime=0
> 2017-10-13 14:34:41.222 INFO  (qtp1911006827-20) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system
> params={wt=json&_=1507905253483} status=0 QTime=48
> 2017-10-13 14:34:41.287 INFO  (qtp1911006827-18) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system
> params={wt=json&_=1507905253483} status=0 QTime=17
> 2017-10-13 14:34:42.737 INFO  (qtp1911006827-13) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores
> params={indexInfo=false&wt=json&_=1507905253481} status=0 QTime=0
> 2017-10-13 14:34:42.745 INFO  (qtp1911006827-13) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:34:42.763 INFO  (qtp1911006827-14) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system
> params={wt=json&_=1507905253483} status=0 QTime=25
> 2017-10-13 14:34:52.980 INFO  (qtp1911006827-15) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:35:02.976 INFO  (qtp1911006827-11) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:35:12.976 INFO  (qtp1911006827-19) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:35:22.977 INFO  (qtp1911006827-16) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:35:32.981 INFO  (qtp1911006827-20) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:35:42.986 INFO  (qtp1911006827-20) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:35:52.986 INFO  (qtp1911006827-17) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:36:02.988 INFO  (qtp1911006827-13) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:36:12.994 INFO  (qtp1911006827-14) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:36:22.994 INFO  (qtp1911006827-15) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:36:33.002 INFO  (qtp1911006827-11) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:36:43.010 INFO  (qtp1911006827-16) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:36:52.995 INFO  (qtp1911006827-20) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:37:02.997 INFO  (qtp1911006827-17) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:37:13.002 INFO  (qtp1911006827-17) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:37:23.014 INFO  (qtp1911006827-14) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:37:24.960 INFO  (qtp1911006827-15) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores
> params={indexInfo=false&wt=json&_=1507905253481} status=0 QTime=0
> 2017-10-13 14:37:25.004 INFO  (qtp1911006827-11) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system
> params={wt=json&_=1507905253483} status=0 QTime=19
> 2017-10-13 14:37:25.112 INFO  (qtp1911006827-18) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696} status=0 QTime=76
> 2017-10-13 14:38:07.403 INFO  (qtp1911006827-19) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores
> params={indexInfo=false&wt=json&_=1507905253481} status=0 QTime=0
> 2017-10-13 14:38:07.440 INFO  (qtp1911006827-13) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:38:07.451 INFO  (qtp1911006827-20) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system
> params={wt=json&_=1507905253483} status=0 QTime=18
> 2017-10-13 14:38:17.391 INFO  (qtp1911006827-18) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:38:27.393 INFO  (qtp1911006827-16) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:38:37.403 INFO  (qtp1911006827-14) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:38:47.395 INFO  (qtp1911006827-14) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:38:57.399 INFO  (qtp1911006827-14) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:39:07.400 INFO  (qtp1911006827-14) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:39:17.404 INFO  (qtp1911006827-13) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:39:27.406 INFO  (qtp1911006827-13) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:39:37.408 INFO  (qtp1911006827-13) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:39:47.415 INFO  (qtp1911006827-17) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:39:57.416 INFO  (qtp1911006827-16) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:40:07.431 INFO  (qtp1911006827-19) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:40:17.421 INFO  (qtp1911006827-15) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:40:27.421 INFO  (qtp1911006827-11) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:40:37.422 INFO  (qtp1911006827-11) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:40:47.422 INFO  (qtp1911006827-20) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:40:57.428 INFO  (qtp1911006827-18) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:41:07.431 INFO  (qtp1911006827-13) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:41:17.422 INFO  (qtp1911006827-17) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:41:27.423 INFO  (qtp1911006827-17) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:41:37.423 INFO  (qtp1911006827-19) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:41:47.426 INFO  (qtp1911006827-15) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:41:57.441 INFO  (qtp1911006827-14) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:42:07.434 INFO  (qtp1911006827-11) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:42:17.434 INFO  (qtp1911006827-20) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:42:27.435 INFO  (qtp1911006827-18) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:42:37.439 INFO  (qtp1911006827-13) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:42:47.697 INFO  (qtp1911006827-16) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:42:57.804 INFO  (qtp1911006827-17) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:43:08.323 INFO  (qtp1911006827-19) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:43:18.653 INFO  (qtp1911006827-15) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:43:28.813 INFO  (qtp1911006827-14) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:43:38.816 INFO  (qtp1911006827-14) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:43:48.815 INFO  (qtp1911006827-20) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:43:58.817 INFO  (qtp1911006827-20) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:44:08.813 INFO  (qtp1911006827-13) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:44:18.820 INFO  (qtp1911006827-16) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:44:28.818 INFO  (qtp1911006827-17) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:44:38.821 INFO  (qtp1911006827-19) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:44:48.823 INFO  (qtp1911006827-15) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:44:58.819 INFO  (qtp1911006827-11) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:45:08.824 INFO  (qtp1911006827-14) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:45:18.820 INFO  (qtp1911006827-18) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:45:28.824 INFO  (qtp1911006827-20) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:45:38.823 INFO  (qtp1911006827-13) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:45:48.824 INFO  (qtp1911006827-16) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:45:58.819 INFO  (qtp1911006827-17) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:46:08.822 INFO  (qtp1911006827-17) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:46:18.820 INFO  (qtp1911006827-17) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:46:28.820 INFO  (qtp1911006827-11) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:46:38.826 INFO  (qtp1911006827-14) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:46:48.823 INFO  (qtp1911006827-18) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:46:58.825 INFO  (qtp1911006827-20) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:47:08.827 INFO  (qtp1911006827-13) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:47:18.846 INFO  (qtp1911006827-13) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:47:28.825 INFO  (qtp1911006827-19) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:47:38.826 INFO  (qtp1911006827-15) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:47:50.183 INFO  (qtp1911006827-17) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=1356
> 2017-10-13 14:47:58.828 INFO  (qtp1911006827-11) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:48:08.828 INFO  (qtp1911006827-14) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:48:18.885 INFO  (qtp1911006827-18) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:48:28.827 INFO  (qtp1911006827-20) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:48:38.831 INFO  (qtp1911006827-16) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:48:48.833 INFO  (qtp1911006827-13) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:48:58.833 INFO  (qtp1911006827-13) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:49:08.834 INFO  (qtp1911006827-15) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:49:18.832 INFO  (qtp1911006827-17) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:49:28.835 INFO  (qtp1911006827-11) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:49:38.861 INFO  (qtp1911006827-14) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=14
> 2017-10-13 14:49:48.853 INFO  (qtp1911006827-18) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:49:58.837 INFO  (qtp1911006827-20) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> 2017-10-13 14:50:08.833 INFO  (qtp1911006827-16) [   ]
> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>
>
>
>
> >>
> >> Amrit Sarkar
> >> Search Engineer
> >> Lucidworks, Inc.
> >> 415-589-9269
> >> www.lucidworks.com
> >> Twitter http://twitter.com/lucidworks
> >> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> >>
> >> On Fri, Oct 13, 2017 at 8:08 PM, Kevin Layer <[hidden email]> wrote:
> >>
> >> > Amrit Sarkar wrote:
> >> >
> >> > >> Hi Kevin,
> >> > >>
> >> > >> Can you post the solr log in the mail thread. I don't think it handled
> >> > >> the .md by itself by first glance at code.
> >> >
> >> > How do I extract the log you want?
> >> >
> >> >
> >> > >>
> >> > >> On Fri, Oct 13, 2017 at 7:42 PM, Kevin Layer <[hidden email]> wrote:
> >> > >>
> >> > >> > Amrit Sarkar wrote:
> >> > >> >
> >> > >> > >> Kevin,
> >> > >> > >>
> >> > >> > >> Just put "html" too and give it a shot. These are the types it is
> >> > >> > >> expecting:
> >> > >> >
> >> > >> > Same thing.
> >> > >> >
> >> > >> > >>
> >> > >> > >> mimeMap = new HashMap<>();
> >> > >> > >> mimeMap.put("xml", "application/xml");
> >> > >> > >> mimeMap.put("csv", "text/csv");
> >> > >> > >> mimeMap.put("json", "application/json");
> >> > >> > >> mimeMap.put("jsonl", "application/json");
> >> > >> > >> mimeMap.put("pdf", "application/pdf");
> >> > >> > >> mimeMap.put("rtf", "text/rtf");
> >> > >> > >> mimeMap.put("html", "text/html");
> >> > >> > >> mimeMap.put("htm", "text/html");
> >> > >> > >> mimeMap.put("doc", "application/msword");
> >> > >> > >> mimeMap.put("docx", "application/vnd.openxmlformats-officedocument.wordprocessingml.document");
> >> > >> > >> mimeMap.put("ppt", "application/vnd.ms-powerpoint");
> >> > >> > >> mimeMap.put("pptx", "application/vnd.openxmlformats-officedocument.presentationml.presentation");
> >> > >> > >> mimeMap.put("xls", "application/vnd.ms-excel");
> >> > >> > >> mimeMap.put("xlsx", "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");
> >> > >> > >> mimeMap.put("odt", "application/vnd.oasis.opendocument.text");
> >> > >> > >> mimeMap.put("ott", "application/vnd.oasis.opendocument.text");
> >> > >> > >> mimeMap.put("odp", "application/vnd.oasis.opendocument.presentation");
> >> > >> > >> mimeMap.put("otp", "application/vnd.oasis.opendocument.presentation");
> >> > >> > >> mimeMap.put("ods", "application/vnd.oasis.opendocument.spreadsheet");
> >> > >> > >> mimeMap.put("ots", "application/vnd.oasis.opendocument.spreadsheet");
> >> > >> > >> mimeMap.put("txt", "text/plain");
> >> > >> > >> mimeMap.put("log", "text/plain");
> >> > >> > >>
> >> > >> > >> The keys are the types supported.
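
A rough sketch of how such a table could gate the crawl, assuming the tool
maps each extension given to -filetypes to a MIME type and compares that
against the Content-Type of the fetched page. FileTypeCheckSketch and its
typeSupported helper are hypothetical names for illustration, not the
actual SimplePostTool code, and only a few entries of the table are copied:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class FileTypeCheckSketch {
    // A few entries copied from the extension-to-MIME table quoted above.
    static final Map<String, String> MIME_MAP = new HashMap<>();
    static {
        MIME_MAP.put("html", "text/html");
        MIME_MAP.put("htm", "text/html");
        MIME_MAP.put("txt", "text/plain");
        MIME_MAP.put("pdf", "application/pdf");
    }

    // Hypothetical check: a content type is accepted only if one of the
    // requested extensions is a key in the table AND maps to that type.
    static boolean typeSupported(String contentType, String fileTypes) {
        Set<String> requested = new HashSet<>(Arrays.asList(fileTypes.split(",")));
        for (Map.Entry<String, String> entry : MIME_MAP.entrySet()) {
            if (entry.getValue().equals(contentType) && requested.contains(entry.getKey())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // "md" is not a key in the table, so text/html is rejected.
        System.out.println(typeSupported("text/html", "md"));
        // Adding "html" to the requested extensions makes text/html acceptable.
        System.out.println(typeSupported("text/html", "md,html"));
    }
}
```

Under that assumption, a page served as text/html is skipped when only
"md" is requested, because "md" is not one of the keys.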
> >> > >> > >>
> >> > >> > >>
> >> > >> > >>
> >> > >> > >> On Fri, Oct 13, 2017 at 6:56 PM, Amrit Sarkar <[hidden email]> wrote:
> >> > >> > >>
> >> > >> > >> > Ah!
> >> > >> > >> >
> >> > >> > >> > Only supported type is: text/html; encoding=utf-8
> >> > >> > >> >
> >> > >> > >> > I am not confident of this either :) but this should work.
> >> > >> > >> >
> >> > >> > >> > See the code-snippet below:
> >> > >> > >> >
> >> > >> > >> > ......
> >> > >> > >> >
> >> > >> > >> > if(res.httpStatus == 200) {
> >> > >> > >> >   // Raw content type of form "text/html; encoding=utf-8"
> >> > >> > >> >   String rawContentType = conn.getContentType();
> >> > >> > >> >   String type = rawContentType.split(";")[0];
> >> > >> > >> >   if(typeSupported(type) || "*".equals(fileTypes)) {
> >> > >> > >> >     String encoding = conn.getContentEncoding();
> >> > >> > >> >
> >> > >> > >> > ....
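
A small sketch of the header parsing in that snippet, with a null guard
added. URLConnection.getContentType() is documented to return null when
the content type is not known, and calling split() on null would throw a
NullPointerException. ContentTypeSketch and baseType are hypothetical
names used for illustration, not part of SimplePostTool:

```java
import java.util.Locale;

public class ContentTypeSketch {
    // Returns the media type without parameters, e.g. "text/html" for
    // "text/html;charset=utf-8", or null when no header value is available.
    static String baseType(String rawContentType) {
        if (rawContentType == null) {
            return null; // caller decides how to treat a missing Content-Type
        }
        return rawContentType.split(";")[0].trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(baseType("text/html;charset=utf-8"));
        System.out.println(baseType(null));
    }
}
```

Whether a missing Content-Type is what produced the original stack trace
is not established in this thread; the guard only shows one way to make
that case explicit.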
> >> > >> > >> >
> >> > >> > >> >
> >> > >> > >> >
> >> > >> > >> > On Fri, Oct 13, 2017 at 6:51 PM, Kevin Layer <[hidden email]> wrote:
> >> > >> > >> >
> >> > >> > >> >> Amrit Sarkar wrote:
> >> > >> > >> >>
> >> > >> > >> >> >> Strange,
> >> > >> > >> >> >>
> >> > >> > >> >> >> Can you add: "text/html;charset=utf-8". This is wiki.apache.org page's
> >> > >> > >> >> >> Content-Type. Let's see what it says now.
> >> > >> > >> >>
> >> > >> > >> >> Same thing.  Verified Content-Type:
> >> > >> > >> >>
> >> > >> > >> >> quadra[git:master]$ wget -S -O /dev/null http://quadra:9091/index.md |& grep Content-Type
> >> > >> > >> >>   Content-Type: text/html;charset=utf-8
> >> > >> > >> >> quadra[git:master]$ ]
> >> > >> > >> >>
> >> > >> > >> >> quadra[git:master]$ docker exec -it --user=solr solr bin/post -c handbook http://quadra:9091/index.md -recursive 10 -delay 0 -filetypes md
> >> > >> > >> >> /docker-java-home/jre/bin/java -classpath /opt/solr/dist/solr-core-7.0.1.jar -Dauto=yes -Drecursive=10 -Ddelay=0 -Dfiletypes=md -Dc=handbook -Ddata=web org.apache.solr.util.SimplePostTool http://quadra:9091/index.md
> >> > >> > >> >> SimplePostTool version 5.0.0
> >> > >> > >> >> Posting web pages to Solr url http://localhost:8983/solr/handbook/update/extract
> >> > >> > >> >> Entering auto mode. Indexing pages with content-types corresponding to file endings md
> >> > >> > >> >> SimplePostTool: WARNING: Never crawl an external web site faster than every 10 seconds, your IP will probably be blocked
> >> > >> > >> >> Entering recursive mode, depth=10, delay=0s
> >> > >> > >> >> Entering crawl at level 0 (1 links total, 1 new)
> >> > >> > >> >> SimplePostTool: WARNING: Skipping URL with unsupported type text/html
> >> > >> > >> >> SimplePostTool: WARNING: The URL http://quadra:9091/index.md returned a HTTP result status of 415
> >> > >> > >> >> 0 web pages indexed.
> >> > >> > >> >> COMMITting Solr index changes to http://localhost:8983/solr/handbook/update/extract...
> >> > >> > >> >> Time spent: 0:00:00.531
> >> > >> > >> >> quadra[git:master]$
> >> > >> > >> >>
> >> > >> > >> >> Kevin
> >> > >> > >> >>
> >> > >> > >> >> >>
> >> > >> > >> >> >> Amrit Sarkar
> >> > >> > >> >> >>
> >> > >> > >> >> >> On Fri, Oct 13, 2017 at 6:44 PM, Kevin Layer <[hidden email]> wrote:
> >> > >> > >> >> >>
> >> > >> > >> >> >> > OK, so I hacked markserv to add Content-Type text/html, but now I get
> >> > >> > >> >> >> >
> >> > >> > >> >> >> > SimplePostTool: WARNING: Skipping URL with unsupported type text/html
> >> > >> > >> >> >> >
> >> > >> > >> >> >> > What is it expecting?
> >> > >> > >> >> >> >
> >> > >> > >> >> >> > $ docker exec -it --user=solr solr bin/post -c handbook http://quadra:9091/index.md -recursive 10 -delay 0 -filetypes md
> >> > >> > >> >> >> > /docker-java-home/jre/bin/java -classpath /opt/solr/dist/solr-core-7.0.1.jar -Dauto=yes -Drecursive=10 -Ddelay=0 -Dfiletypes=md -Dc=handbook -Ddata=web org.apache.solr.util.SimplePostTool http://quadra:9091/index.md
> >> > >> > >> >> >> > SimplePostTool version 5.0.0
> >> > >> > >> >> >> > Posting web pages to Solr url http://localhost:8983/solr/handbook/update/extract
> >> > >> > >> >> >> > Entering auto mode. Indexing pages with content-types corresponding to file endings md
> >> > >> > >> >> >> > SimplePostTool: WARNING: Never crawl an external web site faster than every 10 seconds, your IP will probably be blocked
> >> > >> > >> >> >> > Entering recursive mode, depth=10, delay=0s
> >> > >> > >> >> >> > Entering crawl at level 0 (1 links total, 1 new)
> >> > >> > >> >> >> > SimplePostTool: WARNING: Skipping URL with unsupported type text/html
> >> > >> > >> >> >> > SimplePostTool: WARNING: The URL http://quadra:9091/index.md returned a HTTP result status of 415
> >> > >> > >> >> >> > 0 web pages indexed.
> >> > >> > >> >> >> > COMMITting Solr index changes to http://localhost:8983/solr/handbook/update/extract...
> >> > >> > >> >> >> > Time spent: 0:00:03.882
> >> > >> > >> >> >> > $
> >> > >> > >> >> >> >
> >> > >> > >> >> >> > Thanks.
> >> > >> > >> >> >> >
> >> > >> > >> >> >> > Kevin
> >> > >> > >> >> >> >
> >> > >> > >> >>
> >> > >> > >> >
> >> > >> > >> >
> >> > >> >
> >> >
>
>
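
For reference, the SimplePostTool snippet quoted above calls rawContentType.split(";")[0] directly on the result of conn.getContentType(). URLConnection.getContentType() returns null when the content type is not known, for example when the server sends no Content-Type header. That would be consistent with the NullPointerException seen before markserv was changed to send one. A minimal null-safe sketch of the same step follows. ContentTypeSketch and baseType are hypothetical names for illustration and are not part of SimplePostTool.

```java
import java.util.Locale;

// Hypothetical helper for illustration only; not part of SimplePostTool.
final class ContentTypeSketch {

    // Returns the media type without parameters (e.g. "text/html"),
    // or null when no Content-Type value is available.
    static String baseType(String rawContentType) {
        if (rawContentType == null) {
            return null; // no header: the caller should skip the URL instead of crashing
        }
        int semicolon = rawContentType.indexOf(';');
        String type = semicolon < 0 ? rawContentType : rawContentType.substring(0, semicolon);
        return type.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(baseType("text/html;charset=utf-8"));
        System.out.println(baseType("text/html; encoding=utf-8"));
        System.out.println(baseType(null));
    }
}
```

With a check like this, a missing header could be reported as a skipped URL with a warning rather than ending the crawl with an exception.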

Re: solr 7.0.1: exception running post to crawl simple website

Kevin Layer
Amrit Sarkar wrote:

>> Kevin,
>>
>> I am not able to replicate the issue on my system, which is a bit annoying
>> for me. Try this out one last time:
>>
>> docker exec -it --user=solr solr bin/post -c handbook
>> http://quadra.franz.com:9091/index.md -recursive 10 -delay 0 -filetypes html
>>
>> and set the Content-Type to "html" and then to "text/html"; try with both.

With text/html and your command I get

quadra[git:master]$ docker exec -it --user=solr solr bin/post -c handbook http://quadra.franz.com:9091/index.md -recursive 10 -delay 0 -filetypes html
/docker-java-home/jre/bin/java -classpath /opt/solr/dist/solr-core-7.0.1.jar -Dauto=yes -Drecursive=10 -Ddelay=0 -Dfiletypes=html -Dc=handbook -Ddata=web org.apache.solr.util.SimplePostTool http://quadra.franz.com:9091/index.md
SimplePostTool version 5.0.0
Posting web pages to Solr url http://localhost:8983/solr/handbook/update/extract
Entering auto mode. Indexing pages with content-types corresponding to file endings html
SimplePostTool: WARNING: Never crawl an external web site faster than every 10 seconds, your IP will probably be blocked
Entering recursive mode, depth=10, delay=0s
Entering crawl at level 0 (1 links total, 1 new)
POSTed web resource http://quadra.franz.com:9091/index.md (depth: 0)
[Fatal Error] :1:1: Content is not allowed in prolog.
Exception in thread "main" java.lang.RuntimeException: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; Content is not allowed in prolog.
        at org.apache.solr.util.SimplePostTool$PageFetcher.getLinksFromWebPage(SimplePostTool.java:1252)
        at org.apache.solr.util.SimplePostTool.webCrawl(SimplePostTool.java:616)
        at org.apache.solr.util.SimplePostTool.postWebPages(SimplePostTool.java:563)
        at org.apache.solr.util.SimplePostTool.doWebMode(SimplePostTool.java:365)
        at org.apache.solr.util.SimplePostTool.execute(SimplePostTool.java:187)
        at org.apache.solr.util.SimplePostTool.main(SimplePostTool.java:172)
Caused by: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; Content is not allowed in prolog.
        at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:257)
        at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:339)
        at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:121)
        at org.apache.solr.util.SimplePostTool.makeDom(SimplePostTool.java:1061)
        at org.apache.solr.util.SimplePostTool$PageFetcher.getLinksFromWebPage(SimplePostTool.java:1232)
        ... 5 more
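
The trace above ends in makeDom, which hands the fetched page to javax.xml.parsers.DocumentBuilder. That is a strict XML parser, so a response body that does not start with markup is rejected at line 1, column 1. The sketch below shows this behavior using only the JDK parser. It assumes, without confirmation from this thread, that the body markserv returned was not well-formed XML. PrologSketch and parsesAsXml are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

// Hypothetical demo class; it only mirrors the DocumentBuilder.parse call seen in the trace.
final class PrologSketch {

    // Returns true when the JDK XML parser accepts the body as a well-formed document.
    static boolean parsesAsXml(String body) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new ByteArrayInputStream(body.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            // Raised for any input that is not well-formed XML, including
            // text that appears before the root element.
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Raw markdown: rejected, because text precedes any root element.
        System.out.println(parsesAsXml("# Handbook\n\nPlain markdown text"));
        // Well-formed markup: accepted.
        System.out.println(parsesAsXml("<html><body><a href=\"a.md\">a</a></body></html>"));
    }
}
```

If this is what happened here, the page was posted to Solr successfully (the "POSTed web resource" line) and only the link extraction for the recursive crawl failed.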


When I use "-filetypes md" I'm back to the regular output that doesn't scan
anything.
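
One possible reading of this behavior, based on the mimeMap snippet quoted earlier in the thread ("The keys are the types supported"): -filetypes selects file extensions, each extension maps to a MIME type, and a page is accepted only when its Content-Type equals the MIME type of a selected extension. Under that reading, md has no entry, so text/html is skipped for "-filetypes md" but accepted for "-filetypes html". The sketch below illustrates the idea only. It is not the SimplePostTool source, and the html entry and all names in it are assumptions.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of an extension-to-MIME filter; not the SimplePostTool source.
final class FileTypeFilterSketch {

    // Extension -> MIME type. The txt and log entries appear in the thread;
    // the html entry is an assumption made for this sketch.
    private static final Map<String, String> MIME_MAP = new HashMap<>();
    static {
        MIME_MAP.put("txt", "text/plain");
        MIME_MAP.put("log", "text/plain");
        MIME_MAP.put("html", "text/html");
    }

    // True when contentType is the MIME type of one of the comma-separated
    // extensions in fileTypes, or when fileTypes is the wildcard "*".
    static boolean typeSupported(String contentType, String fileTypes) {
        if ("*".equals(fileTypes)) {
            return true;
        }
        Set<String> wanted = new HashSet<>(Arrays.asList(fileTypes.split(",")));
        for (Map.Entry<String, String> entry : MIME_MAP.entrySet()) {
            if (wanted.contains(entry.getKey()) && entry.getValue().equals(contentType)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(typeSupported("text/html", "md"));
        System.out.println(typeSupported("text/html", "html"));
        System.out.println(typeSupported("text/html", "*"));
    }
}
```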


>>
>> If you get past this hurdle, let me know.
>>
>> Amrit Sarkar
>>
>> On Fri, Oct 13, 2017 at 8:22 PM, Kevin Layer <[hidden email]> wrote:
>>
>> > Amrit Sarkar wrote:
>> >
>> > >> Ah, Docker. The logs are placed under [solr-home]/server/log/solr/log on
>> > >> the machine. I haven't played much with Docker; is there any way you can
>> > >> get that file from that location?
>> >
>> > I see these files:
>> >
>> > /opt/solr/server/logs/archived
>> > /opt/solr/server/logs/solr_gc.log.0.current
>> > /opt/solr/server/logs/solr.log
>> > /opt/solr/server/solr/handbook/data/tlog
>> >
>> > The 3rd one has very little info.  Attached:
>> >
>> >
>> > 2017-10-11 15:28:09.564 INFO  (main) [   ] o.e.j.s.Server
>> > jetty-9.3.14.v20161028
>> > 2017-10-11 15:28:10.668 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter  ___      _       Welcome to Apache Solr™ version 7.0.1
>> > 2017-10-11 15:28:10.669 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter / __| ___| |_ _   Starting in standalone mode on port 8983
>> > 2017-10-11 15:28:10.670 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter \__ \/ _ \ | '_|  Install dir: /opt/solr, Default config dir: /opt/solr/server/solr/configsets/_default/conf
>> > 2017-10-11 15:28:10.707 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter |___/\___/_|_|    Start time: 2017-10-11T15:28:10.674Z
>> > 2017-10-11 15:28:10.747 INFO  (main) [   ] o.a.s.c.SolrResourceLoader
>> > Using system property solr.solr.home: /opt/solr/server/solr
>> > 2017-10-11 15:28:10.763 INFO  (main) [   ] o.a.s.c.SolrXmlConfig Loading
>> > container configuration from /opt/solr/server/solr/solr.xml
>> > 2017-10-11 15:28:11.062 INFO  (main) [   ] o.a.s.c.SolrResourceLoader
>> > [null] Added 0 libs to classloader, from paths: []
>> > 2017-10-11 15:28:12.514 INFO  (main) [   ] o.a.s.c.CorePropertiesLocator
>> > Found 0 core definitions underneath /opt/solr/server/solr
>> > 2017-10-11 15:28:12.635 INFO  (main) [   ] o.e.j.s.Server Started @4304ms
>> > 2017-10-11 15:29:00.971 INFO  (qtp1911006827-13) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system
>> > params={wt=json} status=0 QTime=108
>> > 2017-10-11 15:29:01.080 INFO  (qtp1911006827-18) [   ] o.a.s.c.TransientSolrCoreCacheDefault
>> > Allocating transient cache for 2147483647 transient cores
>> > 2017-10-11 15:29:01.083 INFO  (qtp1911006827-18) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores
>> > params={core=handbook&action=STATUS&wt=json} status=0 QTime=5
>> > 2017-10-11 15:29:01.194 INFO  (qtp1911006827-19) [   ]
>> > o.a.s.h.a.CoreAdminOperation core create command
>> > name=handbook&action=CREATE&instanceDir=handbook&wt=json
>> > 2017-10-11 15:29:01.342 INFO  (qtp1911006827-19) [   x:handbook]
>> > o.a.s.c.SolrResourceLoader [handbook] Added 51 libs to classloader, from
>> > paths: [/opt/solr/contrib/clustering/lib, /opt/solr/contrib/extraction/lib,
>> > /opt/solr/contrib/langid/lib, /opt/solr/contrib/velocity/lib,
>> > /opt/solr/dist]
>> > 2017-10-11 15:29:01.504 INFO  (qtp1911006827-19) [   x:handbook]
>> > o.a.s.c.SolrConfig Using Lucene MatchVersion: 7.0.1
>> > 2017-10-11 15:29:01.969 INFO  (qtp1911006827-19) [   x:handbook]
>> > o.a.s.s.IndexSchema [handbook] Schema name=default-config
>> > 2017-10-11 15:29:03.678 INFO  (qtp1911006827-19) [   x:handbook]
>> > o.a.s.s.IndexSchema Loaded schema default-config/1.6 with uniqueid field id
>> > 2017-10-11 15:29:03.806 INFO  (qtp1911006827-19) [   x:handbook]
>> > o.a.s.c.CoreContainer Creating SolrCore 'handbook' using configuration from
>> > instancedir /opt/solr/server/solr/handbook, trusted=true
>> > 2017-10-11 15:29:03.853 INFO  (qtp1911006827-19) [   x:handbook]
>> > o.a.s.c.SolrCore solr.RecoveryStrategy.Builder
>> > 2017-10-11 15:29:03.866 INFO  (qtp1911006827-19) [   x:handbook]
>> > o.a.s.c.SolrCore [[handbook] ] Opening new SolrCore at
>> > [/opt/solr/server/solr/handbook], dataDir=[/opt/solr/server/solr/handbook/data/]
>> > 2017-10-11 15:29:04.180 INFO  (qtp1911006827-19) [   x:handbook]
>> > o.a.s.r.XSLTResponseWriter xsltCacheLifetimeSeconds=5
>> > 2017-10-11 15:29:05.100 INFO  (qtp1911006827-19) [   x:handbook]
>> > o.a.s.u.UpdateHandler Using UpdateLog implementation:
>> > org.apache.solr.update.UpdateLog
>> > 2017-10-11 15:29:05.101 INFO  (qtp1911006827-19) [   x:handbook]
>> > o.a.s.u.UpdateLog Initializing UpdateLog: dataDir= defaultSyncLevel=FLUSH
>> > numRecordsToKeep=100 maxNumLogsToKeep=10 numVersionBuckets=65536
>> > 2017-10-11 15:29:05.150 INFO  (qtp1911006827-19) [   x:handbook]
>> > o.a.s.u.CommitTracker Hard AutoCommit: if uncommited for 15000ms;
>> > 2017-10-11 15:29:05.151 INFO  (qtp1911006827-19) [   x:handbook]
>> > o.a.s.u.CommitTracker Soft AutoCommit: disabled
>> > 2017-10-11 15:29:05.199 INFO  (qtp1911006827-19) [   x:handbook]
>> > o.a.s.s.SolrIndexSearcher Opening [Searcher@2b9fd97b[handbook] main]
>> > 2017-10-11 15:29:05.229 INFO  (qtp1911006827-19) [   x:handbook]
>> > o.a.s.r.ManagedResourceStorage File-based storage initialized to use dir:
>> > /opt/solr/server/solr/handbook/conf
>> > 2017-10-11 15:29:05.266 INFO  (qtp1911006827-19) [   x:handbook]
>> > o.a.s.h.c.SpellCheckComponent Initializing spell checkers
>> > 2017-10-11 15:29:05.283 INFO  (qtp1911006827-19) [   x:handbook]
>> > o.a.s.s.DirectSolrSpellChecker init: {name=default,field=_text_,
>> > classname=solr.DirectSolrSpellChecker,distanceMeasure=internal,
>> > accuracy=0.5,maxEdits=2,minPrefix=1,maxInspections=5,minQueryLength=4,
>> > maxQueryFrequency=0.01}
>> > 2017-10-11 15:29:05.318 INFO  (qtp1911006827-19) [   x:handbook]
>> > o.a.s.h.ReplicationHandler Commits will be reserved for  10000
>> > 2017-10-11 15:29:05.434 INFO  (searcherExecutor-7-thread-1-processing-x:handbook)
>> > [   x:handbook] o.a.s.c.QuerySenderListener QuerySenderListener sending
>> > requests to Searcher@2b9fd97b[handbook] main{ExitableDirectoryReader(
>> > UninvertingDirectoryReader())}
>> > 2017-10-11 15:29:05.439 INFO  (searcherExecutor-7-thread-1-processing-x:handbook)
>> > [   x:handbook] o.a.s.c.QuerySenderListener QuerySenderListener done.
>> > 2017-10-11 15:29:05.440 INFO  (searcherExecutor-7-thread-1-processing-x:handbook)
>> > [   x:handbook] o.a.s.h.c.SpellCheckComponent Loading spell index for
>> > spellchecker: default
>> > 2017-10-11 15:29:05.447 INFO  (qtp1911006827-19) [   x:handbook]
>> > o.a.s.u.UpdateLog Could not find max version in index or recent updates,
>> > using new clock 1580975517016784896
>> > 2017-10-11 15:29:05.468 INFO  (qtp1911006827-19) [   x:handbook]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores
>> > params={name=handbook&action=CREATE&instanceDir=handbook&wt=json}
>> > status=0 QTime=4275
>> > 2017-10-11 15:29:05.494 INFO  (searcherExecutor-7-thread-1-processing-x:handbook)
>> > [   x:handbook] o.a.s.c.SolrCore [handbook] Registered new searcher
>> > Searcher@2b9fd97b[handbook] main{ExitableDirectoryReader(
>> > UninvertingDirectoryReader())}
>> > 2017-10-11 15:36:24.537 INFO  (qtp1911006827-14) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores
>> > params={indexInfo=false&wt=json&_=1507736184190} status=0 QTime=1
>> > 2017-10-11 15:36:24.579 INFO  (qtp1911006827-20) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system
>> > params={wt=json&_=1507736184191} status=0 QTime=38
>> > 2017-10-11 15:36:27.810 INFO  (qtp1911006827-14) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores
>> > params={indexInfo=false&wt=json&_=1507736184190} status=0 QTime=0
>> > 2017-10-11 15:36:27.846 INFO  (qtp1911006827-13) [   x:handbook]
>> > o.a.s.c.S.Request [handbook]  webapp=/solr path=/admin/ping
>> > params={action=status&wt=json&_=1507736184191&ts=1507736184191}
>> > status=503 QTime=8
>> > 2017-10-11 15:36:27.852 INFO  (qtp1911006827-14) [   x:handbook]
>> > o.a.s.c.S.Request [handbook]  webapp=/solr path=/admin/luke
>> > params={numTerms=0&show=index&wt=json&_=1507736187772} status=0 QTime=35
>> > 2017-10-11 15:36:27.866 INFO  (qtp1911006827-18) [   x:handbook]
>> > o.a.s.c.S.Request [handbook]  webapp=/solr path=/replication
>> > params={wt=json&command=details&_=1507736187773} status=0 QTime=53
>> > 2017-10-11 15:36:27.893 INFO  (qtp1911006827-16) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system
>> > params={wt=json&_=1507736184191} status=0 QTime=84
>> > 2017-10-11 15:36:27.894 INFO  (qtp1911006827-11) [   x:handbook]
>> > o.a.s.c.S.Request [handbook]  webapp=/solr path=/admin/system
>> > params={wt=json&_=1507736187773} status=0 QTime=64
>> > 2017-10-11 15:36:33.015 INFO  (qtp1911006827-13) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores
>> > params={indexInfo=false&wt=json&_=1507736184190} status=0 QTime=0
>> > 2017-10-11 15:36:33.033 INFO  (qtp1911006827-20) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system
>> > params={wt=json&_=1507736184191} status=0 QTime=18
>> > 2017-10-11 15:36:35.199 INFO  (qtp1911006827-14) [   x:handbook]
>> > o.a.s.c.S.Request [handbook]  webapp=/solr path=/select
>> > params={q=*:*&_=1507736184481} hits=0 status=0 QTime=54
>> > 2017-10-13 13:10:43.480 INFO  (qtp1911006827-19) [   x:handbook]
>> > o.a.s.c.PluginBag Going to create a new requestHandler with {type =
>> > requestHandler,name = /update/extract,class = solr.extraction.ExtractingRequestHandler,
>> > attributes = {startup=lazy, name=/update/extract, class=solr.extraction.ExtractingRequestHandler},
>> > args = {defaults={lowernames=true,fmap.meta=ignored_,fmap.content=_text_,df=_text_}}}
>> > 2017-10-13 13:10:46.287 INFO  (qtp1911006827-19) [   x:handbook]
>> > o.a.s.u.DirectUpdateHandler2 start commit{_version_=
>> > 1581148008618131456,optimize=false,openSearcher=true,waitSearcher=true,
>> > expungeDeletes=false,softCommit=false,prepareCommit=false}
>> > 2017-10-13 13:10:46.288 INFO  (qtp1911006827-19) [   x:handbook]
>> > o.a.s.u.DirectUpdateHandler2 No uncommitted changes. Skipping IW.commit.
>> > 2017-10-13 13:10:46.374 INFO  (qtp1911006827-19) [   x:handbook]
>> > o.a.s.u.DirectUpdateHandler2 end_commit_flush
>> > 2017-10-13 13:10:46.375 INFO  (qtp1911006827-19) [   x:handbook] o.a.s.u.p.LogUpdateProcessorFactory
>> > [handbook]  webapp=/solr path=/update/extract params={commit=true}{commit=}
>> > 0 2947
>> > 2017-10-13 13:20:09.424 INFO  (qtp1911006827-11) [   x:handbook]
>> > o.a.s.u.DirectUpdateHandler2 start commit{_version_=
>> > 1581148599141531648,optimize=false,openSearcher=true,waitSearcher=true,
>> > expungeDeletes=false,softCommit=false,prepareCommit=false}
>> > 2017-10-13 13:20:09.447 INFO  (qtp1911006827-11) [   x:handbook]
>> > o.a.s.u.DirectUpdateHandler2 No uncommitted changes. Skipping IW.commit.
>> > 2017-10-13 13:20:09.450 INFO  (qtp1911006827-11) [   x:handbook]
>> > o.a.s.u.DirectUpdateHandler2 end_commit_flush
>> > 2017-10-13 13:20:09.451 INFO  (qtp1911006827-11) [   x:handbook] o.a.s.u.p.LogUpdateProcessorFactory
>> > [handbook]  webapp=/solr path=/update/extract params={commit=true}{commit=}
>> > 0 27
>> > 2017-10-13 13:21:29.872 INFO  (qtp1911006827-17) [   x:handbook]
>> > o.a.s.u.DirectUpdateHandler2 start commit{_version_=
>> > 1581148683498422272,optimize=false,openSearcher=true,waitSearcher=true,
>> > expungeDeletes=false,softCommit=false,prepareCommit=false}
>> > 2017-10-13 13:21:29.873 INFO  (qtp1911006827-17) [   x:handbook]
>> > o.a.s.u.DirectUpdateHandler2 No uncommitted changes. Skipping IW.commit.
>> > 2017-10-13 13:21:29.874 INFO  (qtp1911006827-17) [   x:handbook]
>> > o.a.s.u.DirectUpdateHandler2 end_commit_flush
>> > 2017-10-13 13:21:29.876 INFO  (qtp1911006827-17) [   x:handbook] o.a.s.u.p.LogUpdateProcessorFactory
>> > [handbook]  webapp=/solr path=/update/extract params={commit=true}{commit=}
>> > 0 4
>> > 2017-10-13 14:12:16.157 INFO  (qtp1911006827-15) [   x:handbook]
>> > o.a.s.u.DirectUpdateHandler2 start commit{_version_=
>> > 1581151877759762432,optimize=false,openSearcher=true,waitSearcher=true,
>> > expungeDeletes=false,softCommit=false,prepareCommit=false}
>> > 2017-10-13 14:12:16.158 INFO  (qtp1911006827-15) [   x:handbook]
>> > o.a.s.u.DirectUpdateHandler2 No uncommitted changes. Skipping IW.commit.
>> > 2017-10-13 14:12:16.161 INFO  (qtp1911006827-15) [   x:handbook]
>> > o.a.s.u.DirectUpdateHandler2 end_commit_flush
>> > 2017-10-13 14:12:16.162 INFO  (qtp1911006827-15) [   x:handbook] o.a.s.u.p.LogUpdateProcessorFactory
>> > [handbook]  webapp=/solr path=/update/extract params={commit=true}{commit=}
>> > 0 6
>> > 2017-10-13 14:34:13.809 INFO  (qtp1911006827-17) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores
>> > params={indexInfo=false&wt=json&_=1507905253481} status=0 QTime=42
>> > 2017-10-13 14:34:14.006 INFO  (qtp1911006827-20) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system
>> > params={wt=json&_=1507905253483} status=0 QTime=239
>> > 2017-10-13 14:34:14.063 INFO  (qtp1911006827-18) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system
>> > params={wt=json&_=1507905253483} status=0 QTime=28
>> > 2017-10-13 14:34:17.720 INFO  (qtp1911006827-15) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores
>> > params={indexInfo=false&wt=json&_=1507905253481} status=0 QTime=0
>> > 2017-10-13 14:34:17.767 INFO  (qtp1911006827-15) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=43
>> > 2017-10-13 14:34:17.773 INFO  (qtp1911006827-17) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system
>> > params={wt=json&_=1507905253483} status=0 QTime=54
>> > 2017-10-13 14:34:27.726 INFO  (qtp1911006827-11) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:34:37.719 INFO  (qtp1911006827-19) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:34:41.174 INFO  (qtp1911006827-18) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores
>> > params={indexInfo=false&wt=json&_=1507905253481} status=0 QTime=0
>> > 2017-10-13 14:34:41.222 INFO  (qtp1911006827-20) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system
>> > params={wt=json&_=1507905253483} status=0 QTime=48
>> > 2017-10-13 14:34:41.287 INFO  (qtp1911006827-18) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system
>> > params={wt=json&_=1507905253483} status=0 QTime=17
>> > 2017-10-13 14:34:42.737 INFO  (qtp1911006827-13) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores
>> > params={indexInfo=false&wt=json&_=1507905253481} status=0 QTime=0
>> > 2017-10-13 14:34:42.745 INFO  (qtp1911006827-13) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:34:42.763 INFO  (qtp1911006827-14) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system
>> > params={wt=json&_=1507905253483} status=0 QTime=25
>> > 2017-10-13 14:34:52.980 INFO  (qtp1911006827-15) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:35:02.976 INFO  (qtp1911006827-11) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:35:12.976 INFO  (qtp1911006827-19) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:35:22.977 INFO  (qtp1911006827-16) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:35:32.981 INFO  (qtp1911006827-20) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:35:42.986 INFO  (qtp1911006827-20) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:35:52.986 INFO  (qtp1911006827-17) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:36:02.988 INFO  (qtp1911006827-13) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:36:12.994 INFO  (qtp1911006827-14) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:36:22.994 INFO  (qtp1911006827-15) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:36:33.002 INFO  (qtp1911006827-11) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:36:43.010 INFO  (qtp1911006827-16) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:36:52.995 INFO  (qtp1911006827-20) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:37:02.997 INFO  (qtp1911006827-17) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:37:13.002 INFO  (qtp1911006827-17) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:37:23.014 INFO  (qtp1911006827-14) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:37:24.960 INFO  (qtp1911006827-15) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores
>> > params={indexInfo=false&wt=json&_=1507905253481} status=0 QTime=0
>> > 2017-10-13 14:37:25.004 INFO  (qtp1911006827-11) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system
>> > params={wt=json&_=1507905253483} status=0 QTime=19
>> > 2017-10-13 14:37:25.112 INFO  (qtp1911006827-18) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696} status=0 QTime=76
>> > 2017-10-13 14:38:07.403 INFO  (qtp1911006827-19) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores
>> > params={indexInfo=false&wt=json&_=1507905253481} status=0 QTime=0
>> > 2017-10-13 14:38:07.440 INFO  (qtp1911006827-13) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:38:07.451 INFO  (qtp1911006827-20) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system
>> > params={wt=json&_=1507905253483} status=0 QTime=18
>> > 2017-10-13 14:38:17.391 INFO  (qtp1911006827-18) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:38:27.393 INFO  (qtp1911006827-16) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:38:37.403 INFO  (qtp1911006827-14) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:38:47.395 INFO  (qtp1911006827-14) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:38:57.399 INFO  (qtp1911006827-14) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:39:07.400 INFO  (qtp1911006827-14) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:39:17.404 INFO  (qtp1911006827-13) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:39:27.406 INFO  (qtp1911006827-13) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:39:37.408 INFO  (qtp1911006827-13) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:39:47.415 INFO  (qtp1911006827-17) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:39:57.416 INFO  (qtp1911006827-16) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:40:07.431 INFO  (qtp1911006827-19) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:40:17.421 INFO  (qtp1911006827-15) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:40:27.421 INFO  (qtp1911006827-11) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:40:37.422 INFO  (qtp1911006827-11) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:40:47.422 INFO  (qtp1911006827-20) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:40:57.428 INFO  (qtp1911006827-18) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:41:07.431 INFO  (qtp1911006827-13) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:41:17.422 INFO  (qtp1911006827-17) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:41:27.423 INFO  (qtp1911006827-17) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:41:37.423 INFO  (qtp1911006827-19) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:41:47.426 INFO  (qtp1911006827-15) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:41:57.441 INFO  (qtp1911006827-14) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:42:07.434 INFO  (qtp1911006827-11) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:42:17.434 INFO  (qtp1911006827-20) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:42:27.435 INFO  (qtp1911006827-18) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:42:37.439 INFO  (qtp1911006827-13) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:42:47.697 INFO  (qtp1911006827-16) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:42:57.804 INFO  (qtp1911006827-17) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:43:08.323 INFO  (qtp1911006827-19) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:43:18.653 INFO  (qtp1911006827-15) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:43:28.813 INFO  (qtp1911006827-14) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:43:38.816 INFO  (qtp1911006827-14) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:43:48.815 INFO  (qtp1911006827-20) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:43:58.817 INFO  (qtp1911006827-20) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:44:08.813 INFO  (qtp1911006827-13) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:44:18.820 INFO  (qtp1911006827-16) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:44:28.818 INFO  (qtp1911006827-17) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:44:38.821 INFO  (qtp1911006827-19) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:44:48.823 INFO  (qtp1911006827-15) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:44:58.819 INFO  (qtp1911006827-11) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:45:08.824 INFO  (qtp1911006827-14) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:45:18.820 INFO  (qtp1911006827-18) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:45:28.824 INFO  (qtp1911006827-20) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:45:38.823 INFO  (qtp1911006827-13) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:45:48.824 INFO  (qtp1911006827-16) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:45:58.819 INFO  (qtp1911006827-17) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:46:08.822 INFO  (qtp1911006827-17) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:46:18.820 INFO  (qtp1911006827-17) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:46:28.820 INFO  (qtp1911006827-11) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:46:38.826 INFO  (qtp1911006827-14) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:46:48.823 INFO  (qtp1911006827-18) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:46:58.825 INFO  (qtp1911006827-20) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:47:08.827 INFO  (qtp1911006827-13) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:47:18.846 INFO  (qtp1911006827-13) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:47:28.825 INFO  (qtp1911006827-19) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:47:38.826 INFO  (qtp1911006827-15) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:47:50.183 INFO  (qtp1911006827-17) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=1356
>> > 2017-10-13 14:47:58.828 INFO  (qtp1911006827-11) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:48:08.828 INFO  (qtp1911006827-14) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:48:18.885 INFO  (qtp1911006827-18) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:48:28.827 INFO  (qtp1911006827-20) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:48:38.831 INFO  (qtp1911006827-16) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:48:48.833 INFO  (qtp1911006827-13) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:48:58.833 INFO  (qtp1911006827-13) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:49:08.834 INFO  (qtp1911006827-15) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:49:18.832 INFO  (qtp1911006827-17) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:49:28.835 INFO  (qtp1911006827-11) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:49:38.861 INFO  (qtp1911006827-14) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=14
>> > 2017-10-13 14:49:48.853 INFO  (qtp1911006827-18) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:49:58.837 INFO  (qtp1911006827-20) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> > 2017-10-13 14:50:08.833 INFO  (qtp1911006827-16) [   ]
>> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
>> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
>> >
>> >
>> >
>> >
>> > >>
>> > >> Amrit Sarkar
>> > >> Search Engineer
>> > >> Lucidworks, Inc.
>> > >> 415-589-9269
>> > >> www.lucidworks.com
>> > >> Twitter http://twitter.com/lucidworks
>> > >> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>> > >>
>> > >> On Fri, Oct 13, 2017 at 8:08 PM, Kevin Layer <[hidden email]> wrote:
>> > >>
>> > >> > Amrit Sarkar wrote:
>> > >> >
>> > >> > >> Hi Kevin,
>> > >> > >>
>> > >> > >> Can you post the Solr log in the mail thread? I don't think it handled
>> > >> > >> the .md by itself, at first glance at the code.
>> > >> >
>> > >> > How do I extract the log you want?
>> > >> >
>> > >> >
>> > >> > >>
>> > >> > >> Amrit Sarkar
>> > >> > >> Search Engineer
>> > >> > >> Lucidworks, Inc.
>> > >> > >> 415-589-9269
>> > >> > >> www.lucidworks.com
>> > >> > >> Twitter http://twitter.com/lucidworks
>> > >> > >> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>> > >> > >>
>> > >> > >> On Fri, Oct 13, 2017 at 7:42 PM, Kevin Layer <[hidden email]>
>> > wrote:
>> > >> > >>
>> > >> > >> > Amrit Sarkar wrote:
>> > >> > >> >
>> > >> > >> > >> Kevin,
>> > >> > >> > >>
>> > >> > >> > >> Just put "html" too and give it a shot. These are the types it is
>> > >> > >> > >> expecting:
>> > >> > >> >
>> > >> > >> > Same thing.
>> > >> > >> >
>> > >> > >> > >>
>> > >> > >> > >> mimeMap = new HashMap<>();
>> > >> > >> > >> mimeMap.put("xml", "application/xml");
>> > >> > >> > >> mimeMap.put("csv", "text/csv");
>> > >> > >> > >> mimeMap.put("json", "application/json");
>> > >> > >> > >> mimeMap.put("jsonl", "application/json");
>> > >> > >> > >> mimeMap.put("pdf", "application/pdf");
>> > >> > >> > >> mimeMap.put("rtf", "text/rtf");
>> > >> > >> > >> mimeMap.put("html", "text/html");
>> > >> > >> > >> mimeMap.put("htm", "text/html");
>> > >> > >> > >> mimeMap.put("doc", "application/msword");
>> > >> > >> > >> mimeMap.put("docx", "application/vnd.openxmlformats-officedocument.wordprocessingml.document");
>> > >> > >> > >> mimeMap.put("ppt", "application/vnd.ms-powerpoint");
>> > >> > >> > >> mimeMap.put("pptx", "application/vnd.openxmlformats-officedocument.presentationml.presentation");
>> > >> > >> > >> mimeMap.put("xls", "application/vnd.ms-excel");
>> > >> > >> > >> mimeMap.put("xlsx", "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");
>> > >> > >> > >> mimeMap.put("odt", "application/vnd.oasis.opendocument.text");
>> > >> > >> > >> mimeMap.put("ott", "application/vnd.oasis.opendocument.text");
>> > >> > >> > >> mimeMap.put("odp", "application/vnd.oasis.opendocument.presentation");
>> > >> > >> > >> mimeMap.put("otp", "application/vnd.oasis.opendocument.presentation");
>> > >> > >> > >> mimeMap.put("ods", "application/vnd.oasis.opendocument.spreadsheet");
>> > >> > >> > >> mimeMap.put("ots", "application/vnd.oasis.opendocument.spreadsheet");
>> > >> > >> > >> mimeMap.put("txt", "text/plain");
>> > >> > >> > >> mimeMap.put("log", "text/plain");
>> > >> > >> > >>
>> > >> > >> > >> The keys are the types supported.
>> > >> > >> > >>
>> > >> > >> > >>
>> > >> > >> > >> Amrit Sarkar
>> > >> > >> > >> Search Engineer
>> > >> > >> > >> Lucidworks, Inc.
>> > >> > >> > >> 415-589-9269
>> > >> > >> > >> www.lucidworks.com
>> > >> > >> > >> Twitter http://twitter.com/lucidworks
>> > >> > >> > >> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>> > >> > >> > >>
>> > >> > >> > >> On Fri, Oct 13, 2017 at 6:56 PM, Amrit Sarkar <
>> > >> > [hidden email]>
>> > >> > >> > >> wrote:
>> > >> > >> > >>
>> > >> > >> > >> > Ah!
>> > >> > >> > >> >
>> > >> > >> > >> > Only supported type is: text/html; encoding=utf-8
>> > >> > >> > >> >
>> > >> > >> > >> > I am not confident of this either :) but this should work.
>> > >> > >> > >> >
>> > >> > >> > >> > See the code-snippet below:
>> > >> > >> > >> >
>> > >> > >> > >> > ......
>> > >> > >> > >> >
>> > >> > >> > >> > if(res.httpStatus == 200) {
>> > >> > >> > >> >   // Raw content type of form "text/html; encoding=utf-8"
>> > >> > >> > >> >   String rawContentType = conn.getContentType();
>> > >> > >> > >> >   String type = rawContentType.split(";")[0];
>> > >> > >> > >> >   if(typeSupported(type) || "*".equals(fileTypes)) {
>> > >> > >> > >> >     String encoding = conn.getContentEncoding();
>> > >> > >> > >> >
>> > >> > >> > >> > ....
>> > >> > >> > >> >
>> > >> > >> > >> >
>> > >> > >> > >> > Amrit Sarkar
>> > >> > >> > >> > Search Engineer
>> > >> > >> > >> > Lucidworks, Inc.
>> > >> > >> > >> > 415-589-9269
>> > >> > >> > >> > www.lucidworks.com
>> > >> > >> > >> > Twitter http://twitter.com/lucidworks
>> > >> > >> > >> > LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>> > >> > >> > >> >
>> > >> > >> > >> > On Fri, Oct 13, 2017 at 6:51 PM, Kevin Layer <
>> > [hidden email]>
>> > >> > wrote:
>> > >> > >> > >> >
>> > >> > >> > >> >> Amrit Sarkar wrote:
>> > >> > >> > >> >>
>> > >> > >> > >> >> >> Strange,
>> > >> > >> > >> >> >>
>> > >> > >> > >> >> >> Can you add: "text/html;charset=utf-8". This is
>> > >> > wiki.apache.org
>> > >> > >> > page's
>> > >> > >> > >> >> >> Content-Type. Let's see what it says now.
>> > >> > >> > >> >>
>> > >> > >> > >> >> Same thing.  Verified Content-Type:
>> > >> > >> > >> >>
>> > >> > >> > >> >> quadra[git:master]$ wget -S -O /dev/null
>> > >> > http://quadra:9091/index.md
>> > >> > >> > |&
>> > >> > >> > >> >> grep Content-Type
>> > >> > >> > >> >>   Content-Type: text/html;charset=utf-8
>> > >> > >> > >> >> quadra[git:master]$ ]
>> > >> > >> > >> >>
>> > >> > >> > >> >> quadra[git:master]$ docker exec -it --user=solr solr
>> > bin/post -c
>> > >> > >> > handbook
>> > >> > >> > >> >> http://quadra:9091/index.md -recursive 10 -delay 0
>> > -filetypes
>> > >> > md
>> > >> > >> > >> >> /docker-java-home/jre/bin/java -classpath
>> > >> > >> > /opt/solr/dist/solr-core-7.0.1.jar
>> > >> > >> > >> >> -Dauto=yes -Drecursive=10 -Ddelay=0 -Dfiletypes=md
>> > -Dc=handbook
>> > >> > >> > -Ddata=web
>> > >> > >> > >> >> org.apache.solr.util.SimplePostTool
>> > http://quadra:9091/index.md
>> > >> > >> > >> >> SimplePostTool version 5.0.0
>> > >> > >> > >> >> Posting web pages to Solr url
>> > http://localhost:8983/solr/han
>> > >> > >> > >> >> dbook/update/extract
>> > >> > >> > >> >> Entering auto mode. Indexing pages with content-types
>> > >> > corresponding
>> > >> > >> > to
>> > >> > >> > >> >> file endings md
>> > >> > >> > >> >> SimplePostTool: WARNING: Never crawl an external web site
>> > >> > faster than
>> > >> > >> > >> >> every 10 seconds, your IP will probably be blocked
>> > >> > >> > >> >> Entering recursive mode, depth=10, delay=0s
>> > >> > >> > >> >> Entering crawl at level 0 (1 links total, 1 new)
>> > >> > >> > >> >> SimplePostTool: WARNING: Skipping URL with unsupported type
>> > >> > text/html
>> > >> > >> > >> >> SimplePostTool: WARNING: The URL
>> > http://quadra:9091/index.md
>> > >> > >> > returned a
>> > >> > >> > >> >> HTTP result status of 415
>> > >> > >> > >> >> 0 web pages indexed.
>> > >> > >> > >> >> COMMITting Solr index changes to
>> > http://localhost:8983/solr/han
>> > >> > >> > >> >> dbook/update/extract...
>> > >> > >> > >> >> Time spent: 0:00:00.531
>> > >> > >> > >> >> quadra[git:master]$
>> > >> > >> > >> >>
>> > >> > >> > >> >> Kevin
>> > >> > >> > >> >>
>> > >> > >> > >> >> >>
>> > >> > >> > >> >> >> Amrit Sarkar
>> > >> > >> > >> >> >> Search Engineer
>> > >> > >> > >> >> >> Lucidworks, Inc.
>> > >> > >> > >> >> >> 415-589-9269
>> > >> > >> > >> >> >> www.lucidworks.com
>> > >> > >> > >> >> >> Twitter http://twitter.com/lucidworks
>> > >> > >> > >> >> >> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>> > >> > >> > >> >> >>
>> > >> > >> > >> >> >> On Fri, Oct 13, 2017 at 6:44 PM, Kevin Layer <
>> > >> > [hidden email]>
>> > >> > >> > wrote:
>> > >> > >> > >> >> >>
>> > >> > >> > >> >> >> > OK, so I hacked markserv to add Content-Type
>> > text/html,
>> > >> > but now
>> > >> > >> > I get
>> > >> > >> > >> >> >> >
>> > >> > >> > >> >> >> > SimplePostTool: WARNING: Skipping URL with
>> > unsupported type
>> > >> > >> > text/html
>> > >> > >> > >> >> >> >
>> > >> > >> > >> >> >> > What is it expecting?
>> > >> > >> > >> >> >> >
>> > >> > >> > >> >> >> > $ docker exec -it --user=solr solr bin/post -c
>> > handbook
>> > >> > >> > >> >> >> > http://quadra:9091/index.md -recursive 10 -delay 0
>> > >> > -filetypes
>> > >> > >> > md
>> > >> > >> > >> >> >> > /docker-java-home/jre/bin/java -classpath
>> > >> > >> > >> >> /opt/solr/dist/solr-core-7.0.1.jar
>> > >> > >> > >> >> >> > -Dauto=yes -Drecursive=10 -Ddelay=0 -Dfiletypes=md
>> > >> > -Dc=handbook
>> > >> > >> > >> >> -Ddata=web
>> > >> > >> > >> >> >> > org.apache.solr.util.SimplePostTool
>> > >> > http://quadra:9091/index.md
>> > >> > >> > >> >> >> > SimplePostTool version 5.0.0
>> > >> > >> > >> >> >> > Posting web pages to Solr url
>> > http://localhost:8983/solr/
>> > >> > >> > >> >> >> > handbook/update/extract
>> > >> > >> > >> >> >> > Entering auto mode. Indexing pages with content-types
>> > >> > >> > corresponding
>> > >> > >> > >> >> to
>> > >> > >> > >> >> >> > file endings md
>> > >> > >> > >> >> >> > SimplePostTool: WARNING: Never crawl an external web
>> > site
>> > >> > >> > faster than
>> > >> > >> > >> >> >> > every 10 seconds, your IP will probably be blocked
>> > >> > >> > >> >> >> > Entering recursive mode, depth=10, delay=0s
>> > >> > >> > >> >> >> > Entering crawl at level 0 (1 links total, 1 new)
>> > >> > >> > >> >> >> > SimplePostTool: WARNING: Skipping URL with
>> > unsupported type
>> > >> > >> > text/html
>> > >> > >> > >> >> >> > SimplePostTool: WARNING: The URL
>> > >> > http://quadra:9091/index.md
>> > >> > >> > >> >> returned a
>> > >> > >> > >> >> >> > HTTP result status of 415
>> > >> > >> > >> >> >> > 0 web pages indexed.
>> > >> > >> > >> >> >> > COMMITting Solr index changes to
>> > >> > http://localhost:8983/solr/
>> > >> > >> > >> >> >> > handbook/update/extract...
>> > >> > >> > >> >> >> > Time spent: 0:00:03.882
>> > >> > >> > >> >> >> > $
>> > >> > >> > >> >> >> >
>> > >> > >> > >> >> >> > Thanks.
>> > >> > >> > >> >> >> >
>> > >> > >> > >> >> >> > Kevin
>> > >> > >> > >> >> >> >
>> > >> > >> > >> >>
>> > >> > >> > >> >
>> > >> > >> > >> >
>> > >> > >> >
>> > >> >
>> >
>> >


Re: solr 7.0.1: exception running post to crawl simple website

Amrit Sarkar
Kevin,

md is not a file type that SimplePostTool recognizes. Anyway, moving
on.
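
To illustrate why "-filetypes md" ends in "Skipping URL with unsupported type text/html", here is a minimal sketch. It assumes the check works roughly like the snippet quoted earlier in this thread (split the Content-Type header on ";" and compare against the MIME types derived from -filetypes). The class and method names (ContentTypeCheck, baseType, typeAllowed) are made up for this example and are not SimplePostTool's own API; the map is only a subset of the real one. The null guard reflects that URLConnection.getContentType() returns null when the server sends no Content-Type, which is likely what triggered the original NullPointerException.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class ContentTypeCheck {
    // Subset of the extension-to-MIME map quoted in this thread (illustrative only).
    static final Map<String, String> MIME_MAP = new HashMap<>();
    static {
        MIME_MAP.put("html", "text/html");
        MIME_MAP.put("htm", "text/html");
        MIME_MAP.put("txt", "text/plain");
        MIME_MAP.put("pdf", "application/pdf");
    }

    // Returns the bare MIME type ("text/html"), or null when the header is missing.
    static String baseType(String rawContentType) {
        if (rawContentType == null) {
            return null;
        }
        return rawContentType.split(";")[0].trim();
    }

    // Hypothetical check: is the page's type among those implied by -filetypes?
    static boolean typeAllowed(String rawContentType, String fileTypes) {
        if ("*".equals(fileTypes)) {
            return true;
        }
        String type = baseType(rawContentType);
        if (type == null) {
            return false;
        }
        Set<String> allowed = new HashSet<>();
        for (String ext : fileTypes.split(",")) {
            String mime = MIME_MAP.get(ext.trim());
            if (mime != null) {
                allowed.add(mime); // "md" has no entry, so it adds nothing
            }
        }
        return allowed.contains(type);
    }

    public static void main(String[] args) {
        System.out.println(typeAllowed("text/html;charset=utf-8", "md"));
        System.out.println(typeAllowed("text/html;charset=utf-8", "html"));
        System.out.println(typeAllowed(null, "html"));
    }
}
```

Under these assumptions, a page served as text/html only passes when -filetypes names an extension that maps to text/html, such as html.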

The exception above is a SAXParseException, rethrown as a RuntimeException.
Nothing can be done on the Solr side; you will have to curate your own data.
Some helpful links:
https://stackoverflow.com/questions/2599919/java-parsing-xml-document-gives-content-not-allowed-in-prolog-error
https://stackoverflow.com/questions/3030903/content-is-not-allowed-in-prolog-when-parsing-perfectly-valid-xml-on-gae
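
The error can be reproduced outside Solr. This is a minimal sketch, assuming the page body reaches the JDK's XML parser as-is, as the DocumentBuilder.parse frames in the stack trace suggest. PrologDemo and parseError are hypothetical names for this example, and the sample strings are invented, not taken from the real site.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXParseException;

public class PrologDemo {
    // Parses the body as XML; returns null on success or the parser's message on failure.
    static String parseError(String body) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try {
            builder.parse(new ByteArrayInputStream(body.getBytes(StandardCharsets.UTF_8)));
            return null;
        } catch (SAXParseException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        // Well-formed markup parses cleanly.
        System.out.println(parseError("<html><body><p>ok</p></body></html>"));
        // Text before the root element (here raw markdown) is rejected by the parser.
        System.out.println(parseError("# Handbook\nSome text"));
    }
}
```

If the served page starts with anything other than markup (raw markdown, a stray byte-order mark, or similar), a strict XML parser gives up at line 1, column 1, which matches the lineNumber and columnNumber in the trace.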

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Fri, Oct 13, 2017 at 8:48 PM, Kevin Layer <[hidden email]> wrote:

> Amrit Sarkar wrote:
>
> >> Kevin,
> >>
> >> I am not able to replicate the issue on my system, which is a bit annoying
> >> for me. Try this one last time:
> >>
> >> docker exec -it --user=solr solr bin/post -c handbook
> >> http://quadra.franz.com:9091/index.md -recursive 10 -delay 0
> -filetypes html
> >>
> >> and have Content-Type: "html" and "text/html", try with both.
>
> With text/html and your command I get
>
> quadra[git:master]$ docker exec -it --user=solr solr bin/post -c handbook
> http://quadra.franz.com:9091/index.md -recursive 10 -delay 0 -filetypes
> html
> /docker-java-home/jre/bin/java -classpath /opt/solr/dist/solr-core-7.0.1.jar
> -Dauto=yes -Drecursive=10 -Ddelay=0 -Dfiletypes=html -Dc=handbook
> -Ddata=web org.apache.solr.util.SimplePostTool
> http://quadra.franz.com:9091/index.md
> SimplePostTool version 5.0.0
> Posting web pages to Solr url http://localhost:8983/solr/
> handbook/update/extract
> Entering auto mode. Indexing pages with content-types corresponding to
> file endings html
> SimplePostTool: WARNING: Never crawl an external web site faster than
> every 10 seconds, your IP will probably be blocked
> Entering recursive mode, depth=10, delay=0s
> Entering crawl at level 0 (1 links total, 1 new)
> POSTed web resource http://quadra.franz.com:9091/index.md (depth: 0)
> [Fatal Error] :1:1: Content is not allowed in prolog.
> Exception in thread "main" java.lang.RuntimeException:
> org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; Content is
> not allowed in prolog.
>         at org.apache.solr.util.SimplePostTool$PageFetcher.
> getLinksFromWebPage(SimplePostTool.java:1252)
>         at org.apache.solr.util.SimplePostTool.webCrawl(
> SimplePostTool.java:616)
>         at org.apache.solr.util.SimplePostTool.postWebPages(
> SimplePostTool.java:563)
>         at org.apache.solr.util.SimplePostTool.doWebMode(
> SimplePostTool.java:365)
>         at org.apache.solr.util.SimplePostTool.execute(
> SimplePostTool.java:187)
>         at org.apache.solr.util.SimplePostTool.main(
> SimplePostTool.java:172)
> Caused by: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1;
> Content is not allowed in prolog.
>         at com.sun.org.apache.xerces.internal.parsers.DOMParser.
> parse(DOMParser.java:257)
>         at com.sun.org.apache.xerces.internal.jaxp.
> DocumentBuilderImpl.parse(DocumentBuilderImpl.java:339)
>         at javax.xml.parsers.DocumentBuilder.parse(
> DocumentBuilder.java:121)
>         at org.apache.solr.util.SimplePostTool.makeDom(
> SimplePostTool.java:1061)
>         at org.apache.solr.util.SimplePostTool$PageFetcher.
> getLinksFromWebPage(SimplePostTool.java:1232)
>         ... 5 more
>
>
> When I use "-filetypes md", I am back to the regular output that doesn't scan
> anything.
>
>
> >>
> >> If you get past this hurdle, let me know.
> >>
> >> Amrit Sarkar
> >> Search Engineer
> >> Lucidworks, Inc.
> >> 415-589-9269
> >> www.lucidworks.com
> >> Twitter http://twitter.com/lucidworks
> >> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> >>
> >> On Fri, Oct 13, 2017 at 8:22 PM, Kevin Layer <[hidden email]> wrote:
> >>
> >> > Amrit Sarkar wrote:
> >> >
> >> > >> ah oh, dockers. They are placed under [solr-home]/server/log/solr/
> log
> >> > in
> >> > >> the machine. I haven't played much with docker, any way you can
> get that
> >> > >> file from that location.
> >> >
> >> > I see these files:
> >> >
> >> > /opt/solr/server/logs/archived
> >> > /opt/solr/server/logs/solr_gc.log.0.current
> >> > /opt/solr/server/logs/solr.log
> >> > /opt/solr/server/solr/handbook/data/tlog
> >> >
> >> > The 3rd one has very little info.  Attached:
> >> >
> >> >
> >> > 2017-10-11 15:28:09.564 INFO  (main) [   ] o.e.j.s.Server
> >> > jetty-9.3.14.v20161028
> >> > 2017-10-11 15:28:10.668 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter
> >> > ___      _       Welcome to Apache Solr™ version 7.0.1
> >> > 2017-10-11 15:28:10.669 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter
> /
> >> > __| ___| |_ _   Starting in standalone mode on port 8983
> >> > 2017-10-11 15:28:10.670 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter
> \__
> >> > \/ _ \ | '_|  Install dir: /opt/solr, Default config dir:
> >> > /opt/solr/server/solr/configsets/_default/conf
> >> > 2017-10-11 15:28:10.707 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter
> >> > |___/\___/_|_|    Start time: 2017-10-11T15:28:10.674Z
> >> > 2017-10-11 15:28:10.747 INFO  (main) [   ] o.a.s.c.SolrResourceLoader
> >> > Using system property solr.solr.home: /opt/solr/server/solr
> >> > 2017-10-11 15:28:10.763 INFO  (main) [   ] o.a.s.c.SolrXmlConfig
> Loading
> >> > container configuration from /opt/solr/server/solr/solr.xml
> >> > 2017-10-11 15:28:11.062 INFO  (main) [   ] o.a.s.c.SolrResourceLoader
> >> > [null] Added 0 libs to classloader, from paths: []
> >> > 2017-10-11 15:28:12.514 INFO  (main) [   ]
> o.a.s.c.CorePropertiesLocator
> >> > Found 0 core definitions underneath /opt/solr/server/solr
> >> > 2017-10-11 15:28:12.635 INFO  (main) [   ] o.e.j.s.Server Started
> @4304ms
> >> > 2017-10-11 15:29:00.971 INFO  (qtp1911006827-13) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system
> >> > params={wt=json} status=0 QTime=108
> >> > 2017-10-11 15:29:01.080 INFO  (qtp1911006827-18) [   ] o.a.s.c.
> TransientSolrCoreCacheDefault
> >> > Allocating transient cache for 2147483647 transient cores
> >> > 2017-10-11 15:29:01.083 INFO  (qtp1911006827-18) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores
> >> > params={core=handbook&action=STATUS&wt=json} status=0 QTime=5
> >> > 2017-10-11 15:29:01.194 INFO  (qtp1911006827-19) [   ]
> >> > o.a.s.h.a.CoreAdminOperation core create command
> >> > name=handbook&action=CREATE&instanceDir=handbook&wt=json
> >> > 2017-10-11 15:29:01.342 INFO  (qtp1911006827-19) [   x:handbook]
> >> > o.a.s.c.SolrResourceLoader [handbook] Added 51 libs to classloader,
> from
> >> > paths: [/opt/solr/contrib/clustering/lib,
> /opt/solr/contrib/extraction/lib,
> >> > /opt/solr/contrib/langid/lib, /opt/solr/contrib/velocity/lib,
> >> > /opt/solr/dist]
> >> > 2017-10-11 15:29:01.504 INFO  (qtp1911006827-19) [   x:handbook]
> >> > o.a.s.c.SolrConfig Using Lucene MatchVersion: 7.0.1
> >> > 2017-10-11 15:29:01.969 INFO  (qtp1911006827-19) [   x:handbook]
> >> > o.a.s.s.IndexSchema [handbook] Schema name=default-config
> >> > 2017-10-11 15:29:03.678 INFO  (qtp1911006827-19) [   x:handbook]
> >> > o.a.s.s.IndexSchema Loaded schema default-config/1.6 with uniqueid
> field id
> >> > 2017-10-11 15:29:03.806 INFO  (qtp1911006827-19) [   x:handbook]
> >> > o.a.s.c.CoreContainer Creating SolrCore 'handbook' using
> configuration from
> >> > instancedir /opt/solr/server/solr/handbook, trusted=true
> >> > 2017-10-11 15:29:03.853 INFO  (qtp1911006827-19) [   x:handbook]
> >> > o.a.s.c.SolrCore solr.RecoveryStrategy.Builder
> >> > 2017-10-11 15:29:03.866 INFO  (qtp1911006827-19) [   x:handbook]
> >> > o.a.s.c.SolrCore [[handbook] ] Opening new SolrCore at
> >> > [/opt/solr/server/solr/handbook], dataDir=[/opt/solr/server/
> >> > solr/handbook/data/]
> >> > 2017-10-11 15:29:04.180 INFO  (qtp1911006827-19) [   x:handbook]
> >> > o.a.s.r.XSLTResponseWriter xsltCacheLifetimeSeconds=5
> >> > 2017-10-11 15:29:05.100 INFO  (qtp1911006827-19) [   x:handbook]
> >> > o.a.s.u.UpdateHandler Using UpdateLog implementation:
> >> > org.apache.solr.update.UpdateLog
> >> > 2017-10-11 15:29:05.101 INFO  (qtp1911006827-19) [   x:handbook]
> >> > o.a.s.u.UpdateLog Initializing UpdateLog: dataDir=
> defaultSyncLevel=FLUSH
> >> > numRecordsToKeep=100 maxNumLogsToKeep=10 numVersionBuckets=65536
> >> > 2017-10-11 15:29:05.150 INFO  (qtp1911006827-19) [   x:handbook]
> >> > o.a.s.u.CommitTracker Hard AutoCommit: if uncommited for 15000ms;
> >> > 2017-10-11 15:29:05.151 INFO  (qtp1911006827-19) [   x:handbook]
> >> > o.a.s.u.CommitTracker Soft AutoCommit: disabled
> >> > 2017-10-11 15:29:05.199 INFO  (qtp1911006827-19) [   x:handbook]
> >> > o.a.s.s.SolrIndexSearcher Opening [Searcher@2b9fd97b[handbook] main]
> >> > 2017-10-11 15:29:05.229 INFO  (qtp1911006827-19) [   x:handbook]
> >> > o.a.s.r.ManagedResourceStorage File-based storage initialized to use
> dir:
> >> > /opt/solr/server/solr/handbook/conf
> >> > 2017-10-11 15:29:05.266 INFO  (qtp1911006827-19) [   x:handbook]
> >> > o.a.s.h.c.SpellCheckComponent Initializing spell checkers
> >> > 2017-10-11 15:29:05.283 INFO  (qtp1911006827-19) [   x:handbook]
> >> > o.a.s.s.DirectSolrSpellChecker init: {name=default,field=_text_,
> >> > classname=solr.DirectSolrSpellChecker,distanceMeasure=internal,
> >> > accuracy=0.5,maxEdits=2,minPrefix=1,maxInspections=5,
> minQueryLength=4,
> >> > maxQueryFrequency=0.01}
> >> > 2017-10-11 15:29:05.318 INFO  (qtp1911006827-19) [   x:handbook]
> >> > o.a.s.h.ReplicationHandler Commits will be reserved for  10000
> >> > 2017-10-11 15:29:05.434 INFO  (searcherExecutor-7-thread-1-
> processing-x:handbook)
> >> > [   x:handbook] o.a.s.c.QuerySenderListener QuerySenderListener
> sending
> >> > requests to Searcher@2b9fd97b[handbook] main{ExitableDirectoryReader(
> >> > UninvertingDirectoryReader())}
> >> > 2017-10-11 15:29:05.439 INFO  (searcherExecutor-7-thread-1-
> processing-x:handbook)
> >> > [   x:handbook] o.a.s.c.QuerySenderListener QuerySenderListener done.
> >> > 2017-10-11 15:29:05.440 INFO  (searcherExecutor-7-thread-1-
> processing-x:handbook)
> >> > [   x:handbook] o.a.s.h.c.SpellCheckComponent Loading spell index for
> >> > spellchecker: default
> >> > 2017-10-11 15:29:05.447 INFO  (qtp1911006827-19) [   x:handbook]
> >> > o.a.s.u.UpdateLog Could not find max version in index or recent
> updates,
> >> > using new clock 1580975517016784896
> >> > 2017-10-11 15:29:05.468 INFO  (qtp1911006827-19) [   x:handbook]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores
> >> > params={name=handbook&action=CREATE&instanceDir=handbook&wt=json}
> >> > status=0 QTime=4275
> >> > 2017-10-11 15:29:05.494 INFO  (searcherExecutor-7-thread-1-
> processing-x:handbook)
> >> > [   x:handbook] o.a.s.c.SolrCore [handbook] Registered new searcher
> >> > Searcher@2b9fd97b[handbook] main{ExitableDirectoryReader(
> >> > UninvertingDirectoryReader())}
> >> > 2017-10-11 15:36:24.537 INFO  (qtp1911006827-14) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores
> >> > params={indexInfo=false&wt=json&_=1507736184190} status=0 QTime=1
> >> > 2017-10-11 15:36:24.579 INFO  (qtp1911006827-20) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system
> >> > params={wt=json&_=1507736184191} status=0 QTime=38
> >> > 2017-10-11 15:36:27.810 INFO  (qtp1911006827-14) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores
> >> > params={indexInfo=false&wt=json&_=1507736184190} status=0 QTime=0
> >> > 2017-10-11 15:36:27.846 INFO  (qtp1911006827-13) [   x:handbook]
> >> > o.a.s.c.S.Request [handbook]  webapp=/solr path=/admin/ping
> >> > params={action=status&wt=json&_=1507736184191&ts=1507736184191}
> >> > status=503 QTime=8
> >> > 2017-10-11 15:36:27.852 INFO  (qtp1911006827-14) [   x:handbook]
> >> > o.a.s.c.S.Request [handbook]  webapp=/solr path=/admin/luke
> >> > params={numTerms=0&show=index&wt=json&_=1507736187772} status=0
> QTime=35
> >> > 2017-10-11 15:36:27.866 INFO  (qtp1911006827-18) [   x:handbook]
> >> > o.a.s.c.S.Request [handbook]  webapp=/solr path=/replication
> >> > params={wt=json&command=details&_=1507736187773} status=0 QTime=53
> >> > 2017-10-11 15:36:27.893 INFO  (qtp1911006827-16) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system
> >> > params={wt=json&_=1507736184191} status=0 QTime=84
> >> > 2017-10-11 15:36:27.894 INFO  (qtp1911006827-11) [   x:handbook]
> >> > o.a.s.c.S.Request [handbook]  webapp=/solr path=/admin/system
> >> > params={wt=json&_=1507736187773} status=0 QTime=64
> >> > 2017-10-11 15:36:33.015 INFO  (qtp1911006827-13) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores
> >> > params={indexInfo=false&wt=json&_=1507736184190} status=0 QTime=0
> >> > 2017-10-11 15:36:33.033 INFO  (qtp1911006827-20) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system
> >> > params={wt=json&_=1507736184191} status=0 QTime=18
> >> > 2017-10-11 15:36:35.199 INFO  (qtp1911006827-14) [   x:handbook]
> >> > o.a.s.c.S.Request [handbook]  webapp=/solr path=/select
> >> > params={q=*:*&_=1507736184481} hits=0 status=0 QTime=54
> >> > 2017-10-13 13:10:43.480 INFO  (qtp1911006827-19) [   x:handbook]
> >> > o.a.s.c.PluginBag Going to create a new requestHandler with {type = requestHandler,name = /update/extract,class = solr.extraction.ExtractingRequestHandler,attributes = {startup=lazy, name=/update/extract, class=solr.extraction.ExtractingRequestHandler},args = {defaults={lowernames=true,fmap.meta=ignored_,fmap.content=_text_,df=_text_}}}
> >> > 2017-10-13 13:10:46.287 INFO  (qtp1911006827-19) [   x:handbook]
> >> > o.a.s.u.DirectUpdateHandler2 start commit{_version_=1581148008618131456,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
> >> > 2017-10-13 13:10:46.288 INFO  (qtp1911006827-19) [   x:handbook]
> >> > o.a.s.u.DirectUpdateHandler2 No uncommitted changes. Skipping IW.commit.
> >> > 2017-10-13 13:10:46.374 INFO  (qtp1911006827-19) [   x:handbook]
> >> > o.a.s.u.DirectUpdateHandler2 end_commit_flush
> >> > 2017-10-13 13:10:46.375 INFO  (qtp1911006827-19) [   x:handbook]
> >> > o.a.s.u.p.LogUpdateProcessorFactory [handbook]  webapp=/solr path=/update/extract params={commit=true}{commit=} 0 2947
> >> > 2017-10-13 13:20:09.424 INFO  (qtp1911006827-11) [   x:handbook]
> >> > o.a.s.u.DirectUpdateHandler2 start commit{_version_=1581148599141531648,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
> >> > 2017-10-13 13:20:09.447 INFO  (qtp1911006827-11) [   x:handbook]
> >> > o.a.s.u.DirectUpdateHandler2 No uncommitted changes. Skipping IW.commit.
> >> > 2017-10-13 13:20:09.450 INFO  (qtp1911006827-11) [   x:handbook]
> >> > o.a.s.u.DirectUpdateHandler2 end_commit_flush
> >> > 2017-10-13 13:20:09.451 INFO  (qtp1911006827-11) [   x:handbook]
> >> > o.a.s.u.p.LogUpdateProcessorFactory [handbook]  webapp=/solr path=/update/extract params={commit=true}{commit=} 0 27
> >> > 2017-10-13 13:21:29.872 INFO  (qtp1911006827-17) [   x:handbook]
> >> > o.a.s.u.DirectUpdateHandler2 start commit{_version_=1581148683498422272,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
> >> > 2017-10-13 13:21:29.873 INFO  (qtp1911006827-17) [   x:handbook]
> >> > o.a.s.u.DirectUpdateHandler2 No uncommitted changes. Skipping IW.commit.
> >> > 2017-10-13 13:21:29.874 INFO  (qtp1911006827-17) [   x:handbook]
> >> > o.a.s.u.DirectUpdateHandler2 end_commit_flush
> >> > 2017-10-13 13:21:29.876 INFO  (qtp1911006827-17) [   x:handbook]
> >> > o.a.s.u.p.LogUpdateProcessorFactory [handbook]  webapp=/solr path=/update/extract params={commit=true}{commit=} 0 4
> >> > 2017-10-13 14:12:16.157 INFO  (qtp1911006827-15) [   x:handbook]
> >> > o.a.s.u.DirectUpdateHandler2 start commit{_version_=1581151877759762432,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
> >> > 2017-10-13 14:12:16.158 INFO  (qtp1911006827-15) [   x:handbook]
> >> > o.a.s.u.DirectUpdateHandler2 No uncommitted changes. Skipping IW.commit.
> >> > 2017-10-13 14:12:16.161 INFO  (qtp1911006827-15) [   x:handbook]
> >> > o.a.s.u.DirectUpdateHandler2 end_commit_flush
> >> > 2017-10-13 14:12:16.162 INFO  (qtp1911006827-15) [   x:handbook]
> >> > o.a.s.u.p.LogUpdateProcessorFactory [handbook]  webapp=/solr path=/update/extract params={commit=true}{commit=} 0 6
> >> > 2017-10-13 14:34:13.809 INFO  (qtp1911006827-17) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores
> >> > params={indexInfo=false&wt=json&_=1507905253481} status=0 QTime=42
> >> > 2017-10-13 14:34:14.006 INFO  (qtp1911006827-20) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system
> >> > params={wt=json&_=1507905253483} status=0 QTime=239
> >> > 2017-10-13 14:34:14.063 INFO  (qtp1911006827-18) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system
> >> > params={wt=json&_=1507905253483} status=0 QTime=28
> >> > 2017-10-13 14:34:17.720 INFO  (qtp1911006827-15) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores
> >> > params={indexInfo=false&wt=json&_=1507905253481} status=0 QTime=0
> >> > 2017-10-13 14:34:17.767 INFO  (qtp1911006827-15) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=43
> >> > 2017-10-13 14:34:17.773 INFO  (qtp1911006827-17) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system
> >> > params={wt=json&_=1507905253483} status=0 QTime=54
> >> > 2017-10-13 14:34:27.726 INFO  (qtp1911006827-11) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:34:37.719 INFO  (qtp1911006827-19) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:34:41.174 INFO  (qtp1911006827-18) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores
> >> > params={indexInfo=false&wt=json&_=1507905253481} status=0 QTime=0
> >> > 2017-10-13 14:34:41.222 INFO  (qtp1911006827-20) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system
> >> > params={wt=json&_=1507905253483} status=0 QTime=48
> >> > 2017-10-13 14:34:41.287 INFO  (qtp1911006827-18) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system
> >> > params={wt=json&_=1507905253483} status=0 QTime=17
> >> > 2017-10-13 14:34:42.737 INFO  (qtp1911006827-13) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores
> >> > params={indexInfo=false&wt=json&_=1507905253481} status=0 QTime=0
> >> > 2017-10-13 14:34:42.745 INFO  (qtp1911006827-13) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:34:42.763 INFO  (qtp1911006827-14) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system
> >> > params={wt=json&_=1507905253483} status=0 QTime=25
> >> > 2017-10-13 14:34:52.980 INFO  (qtp1911006827-15) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:35:02.976 INFO  (qtp1911006827-11) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:35:12.976 INFO  (qtp1911006827-19) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:35:22.977 INFO  (qtp1911006827-16) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:35:32.981 INFO  (qtp1911006827-20) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:35:42.986 INFO  (qtp1911006827-20) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:35:52.986 INFO  (qtp1911006827-17) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:36:02.988 INFO  (qtp1911006827-13) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:36:12.994 INFO  (qtp1911006827-14) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:36:22.994 INFO  (qtp1911006827-15) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:36:33.002 INFO  (qtp1911006827-11) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:36:43.010 INFO  (qtp1911006827-16) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:36:52.995 INFO  (qtp1911006827-20) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:37:02.997 INFO  (qtp1911006827-17) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:37:13.002 INFO  (qtp1911006827-17) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:37:23.014 INFO  (qtp1911006827-14) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:37:24.960 INFO  (qtp1911006827-15) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores
> >> > params={indexInfo=false&wt=json&_=1507905253481} status=0 QTime=0
> >> > 2017-10-13 14:37:25.004 INFO  (qtp1911006827-11) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system
> >> > params={wt=json&_=1507905253483} status=0 QTime=19
> >> > 2017-10-13 14:37:25.112 INFO  (qtp1911006827-18) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696} status=0 QTime=76
> >> > 2017-10-13 14:38:07.403 INFO  (qtp1911006827-19) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores
> >> > params={indexInfo=false&wt=json&_=1507905253481} status=0 QTime=0
> >> > 2017-10-13 14:38:07.440 INFO  (qtp1911006827-13) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:38:07.451 INFO  (qtp1911006827-20) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system
> >> > params={wt=json&_=1507905253483} status=0 QTime=18
> >> > 2017-10-13 14:38:17.391 INFO  (qtp1911006827-18) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:38:27.393 INFO  (qtp1911006827-16) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:38:37.403 INFO  (qtp1911006827-14) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:38:47.395 INFO  (qtp1911006827-14) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:38:57.399 INFO  (qtp1911006827-14) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:39:07.400 INFO  (qtp1911006827-14) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:39:17.404 INFO  (qtp1911006827-13) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:39:27.406 INFO  (qtp1911006827-13) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:39:37.408 INFO  (qtp1911006827-13) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:39:47.415 INFO  (qtp1911006827-17) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:39:57.416 INFO  (qtp1911006827-16) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:40:07.431 INFO  (qtp1911006827-19) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:40:17.421 INFO  (qtp1911006827-15) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:40:27.421 INFO  (qtp1911006827-11) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:40:37.422 INFO  (qtp1911006827-11) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:40:47.422 INFO  (qtp1911006827-20) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:40:57.428 INFO  (qtp1911006827-18) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:41:07.431 INFO  (qtp1911006827-13) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:41:17.422 INFO  (qtp1911006827-17) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:41:27.423 INFO  (qtp1911006827-17) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:41:37.423 INFO  (qtp1911006827-19) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:41:47.426 INFO  (qtp1911006827-15) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:41:57.441 INFO  (qtp1911006827-14) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:42:07.434 INFO  (qtp1911006827-11) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:42:17.434 INFO  (qtp1911006827-20) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:42:27.435 INFO  (qtp1911006827-18) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:42:37.439 INFO  (qtp1911006827-13) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:42:47.697 INFO  (qtp1911006827-16) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:42:57.804 INFO  (qtp1911006827-17) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:43:08.323 INFO  (qtp1911006827-19) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:43:18.653 INFO  (qtp1911006827-15) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:43:28.813 INFO  (qtp1911006827-14) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:43:38.816 INFO  (qtp1911006827-14) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:43:48.815 INFO  (qtp1911006827-20) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:43:58.817 INFO  (qtp1911006827-20) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:44:08.813 INFO  (qtp1911006827-13) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:44:18.820 INFO  (qtp1911006827-16) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:44:28.818 INFO  (qtp1911006827-17) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:44:38.821 INFO  (qtp1911006827-19) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:44:48.823 INFO  (qtp1911006827-15) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:44:58.819 INFO  (qtp1911006827-11) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:45:08.824 INFO  (qtp1911006827-14) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:45:18.820 INFO  (qtp1911006827-18) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:45:28.824 INFO  (qtp1911006827-20) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:45:38.823 INFO  (qtp1911006827-13) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:45:48.824 INFO  (qtp1911006827-16) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:45:58.819 INFO  (qtp1911006827-17) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:46:08.822 INFO  (qtp1911006827-17) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:46:18.820 INFO  (qtp1911006827-17) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:46:28.820 INFO  (qtp1911006827-11) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:46:38.826 INFO  (qtp1911006827-14) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:46:48.823 INFO  (qtp1911006827-18) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:46:58.825 INFO  (qtp1911006827-20) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:47:08.827 INFO  (qtp1911006827-13) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:47:18.846 INFO  (qtp1911006827-13) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:47:28.825 INFO  (qtp1911006827-19) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:47:38.826 INFO  (qtp1911006827-15) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:47:50.183 INFO  (qtp1911006827-17) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=1356
> >> > 2017-10-13 14:47:58.828 INFO  (qtp1911006827-11) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:48:08.828 INFO  (qtp1911006827-14) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:48:18.885 INFO  (qtp1911006827-18) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:48:28.827 INFO  (qtp1911006827-20) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:48:38.831 INFO  (qtp1911006827-16) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:48:48.833 INFO  (qtp1911006827-13) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:48:58.833 INFO  (qtp1911006827-13) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:49:08.834 INFO  (qtp1911006827-15) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:49:18.832 INFO  (qtp1911006827-17) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:49:28.835 INFO  (qtp1911006827-11) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:49:38.861 INFO  (qtp1911006827-14) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=14
> >> > 2017-10-13 14:49:48.853 INFO  (qtp1911006827-18) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:49:58.837 INFO  (qtp1911006827-20) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> > 2017-10-13 14:50:08.833 INFO  (qtp1911006827-16) [   ]
> >> > o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging
> >> > params={wt=json&_=1507905257696&since=0} status=0 QTime=0
> >> >
> >> >
> >> >
> >> >
> >> > >>
> >> > >> Amrit Sarkar
> >> > >> Search Engineer
> >> > >> Lucidworks, Inc.
> >> > >> 415-589-9269
> >> > >> www.lucidworks.com
> >> > >> Twitter http://twitter.com/lucidworks
> >> > >> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> >> > >>
> >> > >> On Fri, Oct 13, 2017 at 8:08 PM, Kevin Layer <[hidden email]> wrote:
> >> > >>
> >> > >> > Amrit Sarkar wrote:
> >> > >> >
> >> > >> > >> Hi Kevin,
> >> > >> > >>
> >> > >> > >> Can you post the Solr log in the mail thread? At first glance at the
> >> > >> > >> code, I don't think it handles .md by itself.
> >> > >> >
> >> > >> > How do I extract the log you want?
> >> > >> >
> >> > >> >
> >> > >> > >>
> >> > >> > >>
> >> > >> > >> On Fri, Oct 13, 2017 at 7:42 PM, Kevin Layer <[hidden email]> wrote:
> >> > >> > >>
> >> > >> > >> > Amrit Sarkar wrote:
> >> > >> > >> >
> >> > >> > >> > >> Kevin,
> >> > >> > >> > >>
> >> > >> > >> > >> Just put "html" too and give it a shot. These are the types it is
> >> > >> > >> > >> expecting:
> >> > >> > >> >
> >> > >> > >> > Same thing.
> >> > >> > >> >
> >> > >> > >> > >>
> >> > >> > >> > >> mimeMap = new HashMap<>();
> >> > >> > >> > >> mimeMap.put("xml", "application/xml");
> >> > >> > >> > >> mimeMap.put("csv", "text/csv");
> >> > >> > >> > >> mimeMap.put("json", "application/json");
> >> > >> > >> > >> mimeMap.put("jsonl", "application/json");
> >> > >> > >> > >> mimeMap.put("pdf", "application/pdf");
> >> > >> > >> > >> mimeMap.put("rtf", "text/rtf");
> >> > >> > >> > >> mimeMap.put("html", "text/html");
> >> > >> > >> > >> mimeMap.put("htm", "text/html");
> >> > >> > >> > >> mimeMap.put("doc", "application/msword");
> >> > >> > >> > >> mimeMap.put("docx", "application/vnd.openxmlformats-officedocument.wordprocessingml.document");
> >> > >> > >> > >> mimeMap.put("ppt", "application/vnd.ms-powerpoint");
> >> > >> > >> > >> mimeMap.put("pptx", "application/vnd.openxmlformats-officedocument.presentationml.presentation");
> >> > >> > >> > >> mimeMap.put("xls", "application/vnd.ms-excel");
> >> > >> > >> > >> mimeMap.put("xlsx", "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");
> >> > >> > >> > >> mimeMap.put("odt", "application/vnd.oasis.opendocument.text");
> >> > >> > >> > >> mimeMap.put("ott", "application/vnd.oasis.opendocument.text");
> >> > >> > >> > >> mimeMap.put("odp", "application/vnd.oasis.opendocument.presentation");
> >> > >> > >> > >> mimeMap.put("otp", "application/vnd.oasis.opendocument.presentation");
> >> > >> > >> > >> mimeMap.put("ods", "application/vnd.oasis.opendocument.spreadsheet");
> >> > >> > >> > >> mimeMap.put("ots", "application/vnd.oasis.opendocument.spreadsheet");
> >> > >> > >> > >> mimeMap.put("txt", "text/plain");
> >> > >> > >> > >> mimeMap.put("log", "text/plain");
> >> > >> > >> > >>
> >> > >> > >> > >> The keys are the types supported.
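As a rough illustration of why `-filetypes md` may match nothing, the sketch below resolves a comma-separated extension list against a small subset of the table quoted above. The class and method names (`FileTypeFilter`, `acceptedMimeTypes`) are hypothetical and this is not the actual SimplePostTool code; it only assumes that unknown extensions such as `md` have no entry in the map, which would be consistent with the "unsupported type text/html" warning seen in this thread.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Illustrative only: mirrors the idea of the mimeMap lookup, not Solr's code. */
class FileTypeFilter {
    // Small subset of the extension -> MIME type table quoted in the thread.
    private static final Map<String, String> MIME_MAP = new HashMap<>();
    static {
        MIME_MAP.put("html", "text/html");
        MIME_MAP.put("htm", "text/html");
        MIME_MAP.put("txt", "text/plain");
        MIME_MAP.put("pdf", "application/pdf");
    }

    /** Resolves a comma-separated extension list to the MIME types it would accept. */
    static Set<String> acceptedMimeTypes(String fileTypes) {
        Set<String> accepted = new LinkedHashSet<>();
        for (String ext : fileTypes.split(",")) {
            String mime = MIME_MAP.get(ext.trim().toLowerCase(Locale.ROOT));
            if (mime != null) { // extensions without an entry contribute nothing
                accepted.add(mime);
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        System.out.println(acceptedMimeTypes("md"));      // prints []
        System.out.println(acceptedMimeTypes("md,html")); // prints [text/html]
    }
}
```

Under these assumptions a page served as `text/html` would only pass the filter if an extension mapped to `text/html` is in the list; whether that alone is enough for the real tool is not established here.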
> >> > >> > >> > >>
> >> > >> > >> > >>
> >> > >> > >> > >>
> >> > >> > >> > >> On Fri, Oct 13, 2017 at 6:56 PM, Amrit Sarkar <[hidden email]> wrote:
> >> > >> > >> > >>
> >> > >> > >> > >> > Ah!
> >> > >> > >> > >> >
> >> > >> > >> > >> > Only supported type is: text/html; encoding=utf-8
> >> > >> > >> > >> >
> >> > >> > >> > >> > I am not confident of this either :) but this should work.
> >> > >> > >> > >> >
> >> > >> > >> > >> > See the code-snippet below:
> >> > >> > >> > >> >
> >> > >> > >> > >> > ......
> >> > >> > >> > >> >
> >> > >> > >> > >> > if(res.httpStatus == 200) {
> >> > >> > >> > >> >   // Raw content type of form "text/html; encoding=utf-8"
> >> > >> > >> > >> >   String rawContentType = conn.getContentType();
> >> > >> > >> > >> >   String type = rawContentType.split(";")[0];
> >> > >> > >> > >> >   if(typeSupported(type) || "*".equals(fileTypes)) {
> >> > >> > >> > >> >     String encoding = conn.getContentEncoding();
> >> > >> > >> > >> >
> >> > >> > >> > >> > ....
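The snippet above calls `split(";")` directly on the value of `conn.getContentType()`, and `URLConnection.getContentType()` is documented to return null when the content type is not known. Whether that is what caused the NullPointerException reported at the start of this thread is an assumption, not something the thread confirms. A minimal null-safe variant of the header parsing, with hypothetical names (`ContentTypeParser`, `baseType`), might look like this:

```java
import java.util.Optional;

/** Illustrative helper; the class and method names are hypothetical. */
class ContentTypeParser {
    /**
     * Returns the media type without parameters, e.g. "text/html" for
     * "text/html;charset=utf-8", or empty if the header is missing or blank.
     */
    static Optional<String> baseType(String rawContentType) {
        if (rawContentType == null) {
            return Optional.empty();
        }
        // Limit 2 keeps a leading empty token, so ";" cannot yield an empty array.
        String type = rawContentType.split(";", 2)[0].trim();
        return type.isEmpty() ? Optional.empty() : Optional.of(type);
    }

    public static void main(String[] args) {
        System.out.println(baseType("text/html;charset=utf-8").orElse("<none>")); // prints text/html
        System.out.println(baseType(null).orElse("<none>"));                      // prints <none>
    }
}
```

A caller could then treat an empty result as an unsupported page and skip it instead of failing the whole crawl.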
> >> > >> > >> > >> >
> >> > >> > >> > >> >
> >> > >> > >> > >> >
> >> > >> > >> > >> > On Fri, Oct 13, 2017 at 6:51 PM, Kevin Layer <[hidden email]> wrote:
> >> > >> > >> > >> >
> >> > >> > >> > >> >> Amrit Sarkar wrote:
> >> > >> > >> > >> >>
> >> > >> > >> > >> >> >> Strange,
> >> > >> > >> > >> >> >>
> >> > >> > >> > >> >> >> Can you add: "text/html;charset=utf-8". This is wiki.apache.org page's
> >> > >> > >> > >> >> >> Content-Type. Let's see what it says now.
> >> > >> > >> > >> >>
> >> > >> > >> > >> >> Same thing.  Verified Content-Type:
> >> > >> > >> > >> >>
> >> > >> > >> > >> >> quadra[git:master]$ wget -S -O /dev/null http://quadra:9091/index.md |& grep Content-Type
> >> > >> > >> > >> >>   Content-Type: text/html;charset=utf-8
> >> > >> > >> > >> >> quadra[git:master]$ ]
> >> > >> > >> > >> >>
> >> > >> > >> > >> >> quadra[git:master]$ docker exec -it --user=solr solr bin/post -c handbook http://quadra:9091/index.md -recursive 10 -delay 0 -filetypes md
> >> > >> > >> > >> >> /docker-java-home/jre/bin/java -classpath /opt/solr/dist/solr-core-7.0.1.jar -Dauto=yes -Drecursive=10 -Ddelay=0 -Dfiletypes=md -Dc=handbook -Ddata=web org.apache.solr.util.SimplePostTool http://quadra:9091/index.md
> >> > >> > >> > >> >> SimplePostTool version 5.0.0
> >> > >> > >> > >> >> Posting web pages to Solr url http://localhost:8983/solr/handbook/update/extract
> >> > >> > >> > >> >> Entering auto mode. Indexing pages with content-types corresponding to file endings md
> >> > >> > >> > >> >> SimplePostTool: WARNING: Never crawl an external web site faster than every 10 seconds, your IP will probably be blocked
> >> > >> > >> > >> >> Entering recursive mode, depth=10, delay=0s
> >> > >> > >> > >> >> Entering crawl at level 0 (1 links total, 1 new)
> >> > >> > >> > >> >> SimplePostTool: WARNING: Skipping URL with unsupported type text/html
> >> > >> > >> > >> >> SimplePostTool: WARNING: The URL http://quadra:9091/index.md returned a HTTP result status of 415
> >> > >> > >> > >> >> 0 web pages indexed.
> >> > >> > >> > >> >> COMMITting Solr index changes to http://localhost:8983/solr/handbook/update/extract...
> >> > >> > >> > >> >> Time spent: 0:00:00.531
> >> > >> > >> > >> >> quadra[git:master]$
> >> > >> > >> > >> >>
> >> > >> > >> > >> >> Kevin
> >> > >> > >> > >> >>
> >> > >> > >> > >> >> >>
> >> > >> > >> > >> >> >> Amrit Sarkar
> >> > >> > >> > >> >> >> Search Engineer
> >> > >> > >> > >> >> >> Lucidworks, Inc.
> >> > >> > >> > >> >> >> 415-589-9269
> >> > >> > >> > >> >> >> www.lucidworks.com
> >> > >> > >> > >> >> >> Twitter http://twitter.com/lucidworks
> >> > >> > >> > >> >> >> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> >> > >> > >> > >> >> >>
> >> > >> > >> > >> >> >> On Fri, Oct 13, 2017 at 6:44 PM, Kevin Layer <[hidden email]> wrote:
> >> > >> > >> > >> >> >>
> >> > >> > >> > >> >> >> > OK, so I hacked markserv to add Content-Type text/html, but now I get
> >> > >> > >> > >> >> >> >
> >> > >> > >> > >> >> >> > SimplePostTool: WARNING: Skipping URL with unsupported type text/html
> >> > >> > >> > >> >> >> >
> >> > >> > >> > >> >> >> > What is it expecting?
> >> > >> > >> > >> >> >> >
> >> > >> > >> > >> >> >> > $ docker exec -it --user=solr solr bin/post -c handbook http://quadra:9091/index.md -recursive 10 -delay 0 -filetypes md
> >> > >> > >> > >> >> >> > /docker-java-home/jre/bin/java -classpath /opt/solr/dist/solr-core-7.0.1.jar -Dauto=yes -Drecursive=10 -Ddelay=0 -Dfiletypes=md -Dc=handbook -Ddata=web org.apache.solr.util.SimplePostTool http://quadra:9091/index.md
> >> > >> > >> > >> >> >> > SimplePostTool version 5.0.0
> >> > >> > >> > >> >> >> > Posting web pages to Solr url http://localhost:8983/solr/handbook/update/extract
> >> > >> > >> > >> >> >> > Entering auto mode. Indexing pages with content-types corresponding to file endings md
> >> > >> > >> > >> >> >> > SimplePostTool: WARNING: Never crawl an external web site faster than every 10 seconds, your IP will probably be blocked
> >> > >> > >> > >> >> >> > Entering recursive mode, depth=10, delay=0s
> >> > >> > >> > >> >> >> > Entering crawl at level 0 (1 links total, 1 new)
> >> > >> > >> > >> >> >> > SimplePostTool: WARNING: Skipping URL with unsupported type text/html
> >> > >> > >> > >> >> >> > SimplePostTool: WARNING: The URL http://quadra:9091/index.md returned a HTTP result status of 415
> >> > >> > >> > >> >> >> > 0 web pages indexed.
> >> > >> > >> > >> >> >> > COMMITting Solr index changes to http://localhost:8983/solr/handbook/update/extract...
> >> > >> > >> > >> >> >> > Time spent: 0:00:03.882
> >> > >> > >> > >> >> >> > $
> >> > >> > >> > >> >> >> >
> >> > >> > >> > >> >> >> > Thanks.
> >> > >> > >> > >> >> >> >
> >> > >> > >> > >> >> >> > Kevin
> >> > >> > >> > >> >> >> >
> >> > >> > >> > >> >>
> >> > >> > >> > >> >
> >> > >> > >> > >> >
> >> > >> > >> >
> >> > >> >
> >> >
> >> >
>
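For anyone reading the quoted snippet later: it calls split(";") directly on the value of conn.getContentType(), and HttpURLConnection.getContentType() returns null when the server sends no Content-Type header. That would be consistent with the NullPointerException in the first post, though I have not confirmed that this is the exact line involved. Below is a minimal sketch of a null-safe version of the same check. The class ContentTypeCheck, its methods, and the supportedTypes set are hypothetical names for illustration only; this is not the SimplePostTool code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration; not part of SimplePostTool.
final class ContentTypeCheck {

    // Returns the media type without parameters, e.g. "text/html" for
    // "text/html;charset=utf-8". Returns null when the header is missing.
    static String baseType(String rawContentType) {
        if (rawContentType == null) {
            return null; // server sent no Content-Type header
        }
        int semi = rawContentType.indexOf(';');
        String type = semi >= 0 ? rawContentType.substring(0, semi) : rawContentType;
        return type.trim().toLowerCase(Locale.ROOT);
    }

    // Mirrors the shape of the quoted condition: accept the page when the
    // type is in the supported set, or when the file types filter is "*".
    // supportedTypes stands in for whatever typeSupported() consults.
    static boolean accepted(String rawContentType, Set<String> supportedTypes, String fileTypes) {
        String type = baseType(rawContentType);
        if (type == null) {
            return false; // skip the page instead of throwing
        }
        return supportedTypes.contains(type) || "*".equals(fileTypes);
    }

    public static void main(String[] args) {
        // Assumed example set; the real tool derives its own list.
        Set<String> supported = new HashSet<>(Arrays.asList("text/plain"));
        System.out.println(accepted(null, supported, "md"));
        System.out.println(accepted("text/html;charset=utf-8", supported, "md"));
        System.out.println(accepted("text/html;charset=utf-8", supported, "*"));
    }
}
```

The second branch of the condition suggests that a file types filter of "*" bypasses the type check entirely, which may be why text/html is skipped when only md is requested. Treat that as a reading of the snippet, not a verified description of the tool.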