What is the correct URL for POSTing new data?

Christopher Schultz
All,

I've recently been encountering some frustrations with Solr 7.3 after
configuring TLS; since the command-line tools (which are a breeze to use
when you have a "toy" Solr installation) stop working when TLS is
enabled, I'm finding myself having to perform the following tasks in
order to get bin/post to work:

1. patch bin/post:

234,235c234,235
< echo "$JAVA" -classpath "${TOOL_JAR[0]}" "${PROPS[@]}" org.apache.solr.util.SimplePostTool "${PARAMS[@]}"
< "$JAVA" -classpath "${TOOL_JAR[0]}" "${PROPS[@]}" org.apache.solr.util.SimplePostTool "${PARAMS[@]}"
---
> echo "$JAVA" -classpath "${TOOL_JAR[0]}" "${PROPS[@]}" ${SOLR_POST_OPTS} org.apache.solr.util.SimplePostTool "${PARAMS[@]}"
> "$JAVA" -classpath "${TOOL_JAR[0]}" "${PROPS[@]}" ${SOLR_POST_OPTS} org.apache.solr.util.SimplePostTool "${PARAMS[@]}"


2. Run the command with lots of manual options:

$ SOLR_POST_OPTS="-Djavax.net.ssl.trustStore=/etc/solr/solr-client.p12 \
-Djavax.net.ssl.trustStorePassword=whatevs \
-Djavax.net.ssl.trustStoreType=PKCS12" /usr/local/solr/bin/post -c \
new_core https://localhost:8983/solr/new_core

[time passes while bin/post uploads a very large file]

SimplePostTool version 5.0.0
Posting files to [base] url https://localhost:8983/solr/new_core...
Entering auto mode. File endings considered are
xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
POSTing file new_core.json (application/json) to [base]/json/docs
SimplePostTool: WARNING: Solr returned an error #404 (Not Found) for
url: https://localhost:8983/solr/new_core/json/docs
SimplePostTool: WARNING: Response: <html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>Error 404 Not Found</title>
</head>
<body><h2>HTTP ERROR 404</h2>
<p>Problem accessing /solr/new_core/json/docs. Reason:
<pre>    Not Found</pre></p>
</body>
</html>
SimplePostTool: WARNING: IOException while reading response:
java.io.FileNotFoundException:
https://localhost:8983/solr/new_core/json/docs
1 files indexed.
COMMITting Solr index changes to https://localhost:8983/solr/new_core...
SimplePostTool: WARNING: Solr returned an error #404 (Not Found) for
url: https://localhost:8983/solr/new_core?commit=true
SimplePostTool: WARNING: Response: <html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>Error 404 Not Found</title>
</head>
<body><h2>HTTP ERROR 404</h2>
<p>Problem accessing /solr/new_core. Reason:
<pre>    Not Found</pre></p>
</body>
</html>
Time spent: 0:00:04.710
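
For anyone repeating step 1: the patch leaves ${SOLR_POST_OPTS} unquoted on purpose, so that the shell splits it into one JVM argument per -D option. Here is a minimal sketch of that behavior, assuming a POSIX shell. The count_args helper is hypothetical and only reports how many arguments a command would receive.

```shell
# Hypothetical helper: print how many arguments it was called with.
count_args() {
  echo "$#"
}

# Same options as in step 2, kept on one line for clarity.
SOLR_POST_OPTS="-Djavax.net.ssl.trustStore=/etc/solr/solr-client.p12 -Djavax.net.ssl.trustStorePassword=whatevs -Djavax.net.ssl.trustStoreType=PKCS12"

# Unquoted: word splitting yields one argument per option (what the patch relies on).
unquoted=$(count_args ${SOLR_POST_OPTS})

# Quoted: the whole string arrives as a single argument, so the three
# properties would not be set individually.
quoted=$(count_args "${SOLR_POST_OPTS}")

echo "unquoted=${unquoted} quoted=${quoted}"
```

A trust store path containing whitespace would be split as well, so this approach assumes there is none.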

I'm guessing that I just don't know what the URL is supposed to be for
that core. When browsing the web UI, I can examine the core here:

https://localhost:8983/solr/#/~cores/new_core

Solr reports:

    startTime:        a day ago
    instanceDir:        /var/solr/data/new_core
    dataDir:        /var/solr/data/new_core/data/

Index
    lastModified:        -
    version:        2
    numDocs:        0
    maxDoc:        0
    deletedDocs:        0
    current: [check-mark]


So the core is there. I suspect I'm simply not addressing it correctly.
How should I modify the URL I pass on the command-line so that bin/post
can inject a new batch of data?

Thanks,
-chris

Re: What is the correct URL for POSTing new data?

Shawn Heisey-2
On 4/13/2018 7:49 AM, Christopher Schultz wrote:

> $ SOLR_POST_OPTS="-Djavax.net.ssl.trustStore=/etc/solr/solr-client.p12
> -Djavax.net.ssl.trustStorePassword=whatevs
> -Djavax.net.ssl.trustStoreType=PKCS12" /usr/local/solr/bin/post -c
> new_core https://localhost:8983/solr/new_core
>
> [time passes while bin/post uploads a very large file]
>
> SimplePostTool version 5.0.0
> Posting files to [base] url https://localhost:8983/solr/new_core...
> Entering auto mode. File endings considered are
> xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
> POSTing file new_core.json (application/json) to [base]/json/docs
> SimplePostTool: WARNING: Solr returned an error #404 (Not Found) for
> url: https://localhost:8983/solr/new_core/json/docs

The URL path (beyond the core name) it's ending up with is /json/docs,
when it should be /update/json/docs.
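
To make the difference concrete, here is a minimal sketch of how the endpoints in the log appear to be derived from the base URL: /json/docs is appended for the JSON file and ?commit=true for the commit. The helper names are hypothetical, and the derivation is inferred from the log output shown in this thread, not taken from the SimplePostTool source.

```shell
# Hypothetical helpers mirroring what the log output shows.
json_docs_url() {
  # Endpoint used for JSON files in auto mode: [base]/json/docs
  printf '%s/json/docs\n' "$1"
}

commit_url() {
  # Endpoint used for the final commit: [base]?commit=true
  printf '%s?commit=true\n' "$1"
}

core_url="https://localhost:8983/solr/new_core"

# Base URL without the handler path: these are the two URLs that returned 404.
json_docs_url "$core_url"
commit_url "$core_url"

# Base URL including the /update handler path.
json_docs_url "$core_url/update"
commit_url "$core_url/update"
```

In other words, the base URL passed to the tool would be https://localhost:8983/solr/new_core/update.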

If you hadn't given the command a specific URL, it probably would have
figured out the correct URL on its own.  The base URL for the post tool
normally includes the /update path, which is different than the base URL
for something like HttpSolrClient (in the SolrJ library).  Changing the
handler path is done differently in SolrJ than it is with the post tool.

I know, we've violated the principle of least surprise again. :)

The bin/post tool is a *simple* tool.  The java class that it calls is
even named "SimplePostTool".  It is expected that most users will
outgrow its functionality quickly and write their own indexing software
that does whatever custom processing they require.  The tool doesn't get
a lot of improvements because we don't intend it to be used as a
production indexing mechanism.  If it does what you need, there's
nothing wrong with production usage, but you need to be aware that it
doesn't have robust error handling, which is usually pretty important
for production.

Thanks,
Shawn

Re: What is the correct URL for POSTing new data?

Christopher Schultz
Shawn,

On 4/13/18 6:02 PM, Shawn Heisey wrote:
> On 4/13/2018 7:49 AM, Christopher Schultz wrote:
>> $ SOLR_POST_OPTS="-Djavax.net.ssl.trustStore=/etc/solr/solr-client.p12
>> -Djavax.net.ssl.trustStorePassword=whatevs
>> -Djavax.net.ssl.trustStoreType=PKCS12" /usr/local/solr/bin/post -c
>> new_core https://localhost:8983/solr/new_core
>>
>> [time passes while bin/post uploads a very large file]
>>
>> SimplePostTool version 5.0.0
>> Posting files to [base] url https://localhost:8983/solr/new_core...
>> Entering auto mode. File endings considered are
>> xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
>> POSTing file new_core.json (application/json) to [base]/json/docs
>> SimplePostTool: WARNING: Solr returned an error #404 (Not Found) for
>> url: https://localhost:8983/solr/new_core/json/docs
>
> The URL path (beyond the core name) it's ending up with is
> /json/docs, when it should be /update/json/docs.

Looks like that worked. I couldn't find that anywhere in the documentation.

> If you hadn't given the command a specific URL, it probably would
> have figured out the correct URL on its own.

No, it wouldn't have. It doesn't read any configuration files and
guesses its way through everything. Simply adding HTTPS support
required me to modify the script and manually specify the URL. That's
why I went through the trouble of explaining that in my initial post.
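
Since the tool reads no configuration, one way to avoid retyping everything is a small wrapper that assembles the options and the URL. This is a minimal sketch under stated assumptions: the variable and function names are hypothetical, the values are the ones from my initial post, and the /update suffix follows the correction above.

```shell
# Hypothetical wrapper settings; adjust to the local installation.
TRUST_STORE="/etc/solr/solr-client.p12"
TRUST_STORE_PASSWORD="whatevs"
TRUST_STORE_TYPE="PKCS12"
SOLR_BASE="https://localhost:8983/solr"

# Build the JVM options that the patched bin/post picks up from SOLR_POST_OPTS.
build_post_opts() {
  printf '%s %s %s\n' \
    "-Djavax.net.ssl.trustStore=${TRUST_STORE}" \
    "-Djavax.net.ssl.trustStorePassword=${TRUST_STORE_PASSWORD}" \
    "-Djavax.net.ssl.trustStoreType=${TRUST_STORE_TYPE}"
}

# Build the URL for a core, including the /update request handler path.
build_update_url() {
  printf '%s/%s/update\n' "${SOLR_BASE}" "$1"
}

# Print the values instead of running bin/post, so the sketch works without Solr.
echo "SOLR_POST_OPTS=$(build_post_opts)"
echo "URL=$(build_update_url new_core)"
```

The two values would then be handed to the patched bin/post in the same way as in the original command.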

> The base URL for the post tool normally includes the /update path,
> which is different than the base URL for something like
> HttpSolrClient (in the SolrJ library).  Changing the handler path
> is done differently in SolrJ than it is with the post tool.
>
> I know, we've violated the principle of least surprise again. :)

;)

I don't mind all surprises. It's the ones that have zero documentation
that are the most surprising.

> The bin/post tool is a *simple* tool.  The java class that it calls
> is even named "SimplePostTool".  It is expected that most users
> will outgrow its functionality quickly and write their own indexing
> software that does whatever custom processing they require.  The
> tool doesn't get a lot of improvements because we don't intend it
> to be used as a production indexing mechanism.

I'm using it as a bulk-loading operation. I have no need in production
to completely bootstrap a document collection unless the existing one
has been trashed for some reason. Why bother writing my own client
that does the equivalent of "SELECT * FROM table" and then loops over
the ResultSet calling SolrJ's add-document method?

The SimplePostTool should be able to handle that for me, and if it
did, I'd have less code to babysit in perpetuity.

> If it does what you need, there's nothing wrong with production
> usage, but you need to be aware that it doesn't have robust error
> handling, which is usually pretty important for production.

I'm okay with terse error messages.

-chris

Re: What is the correct URL for POSTing new data?

Shawn Heisey-2
On 4/15/2018 2:24 PM, Christopher Schultz wrote:
> No, it wouldn't have. It doesn't read any configuration files and
> guesses its way through everything. Simply adding HTTPS support
> required me to modify the script and manually specify the URL. That's
> why I went through the trouble of explaining so in my initial post.

Gotcha.  I haven't used SSL with Solr myself.  Nobody can get directly
to the Solr servers, so we don't need it.  If somebody is able to
penetrate our systems to the point where they can sniff Solr traffic,
they will already have full access to things far more sensitive than our
search index.

I'll see what I can do about the documentation to make it clear that the
URL given to the post tool needs the request handler path.

Thanks,
Shawn

[OT] Re: What is the correct URL for POSTing new data?

Christopher Schultz
Shawn,

On 4/15/18 4:33 PM, Shawn Heisey wrote:

> On 4/15/2018 2:24 PM, Christopher Schultz wrote:
>> No, it wouldn't have. It doesn't read any configuration files
>> and guesses its way through everything. Simply adding HTTPS
>> support required me to modify the script and manually specify the
>> URL. That's why I went through the trouble of explaining so in my
>> initial post.
>
> Gotcha.  I haven't used SSL with Solr myself.  Nobody can get
> directly to the Solr servers, so we don't need it.  If somebody is
> able to penetrate our systems to the point where they can sniff
> Solr traffic, they will already have full access to things far more
> sensitive than our search index.

Not necessarily, but that depends entirely upon your environment. We
have a policy of "no privileged network positions" so we don't even
trust our "private networks". Someone at the data center could
inadvertently configure a switch port to suddenly join our VLAN or a
network plug might be incorrectly assigned, etc. So we don't want our
data flying around in a way that can be intercepted.

> I'll see what I can do about the documentation to make it clear
> that the URL given to the post tool needs the request handler
> path.

That would be great. Even poking around in the Solr web UI doesn't
reveal that path because of all the JavaScript magic in the interface.

It's unreasonable to expect everyone to read source code in order to
learn how to use tools that don't require direct programming.

Let me take a step back and say that Solr in fact has great
documentation. There are evidently some things it lacks for the
uninitiated.

Thanks,
-chris