how to deal with URLDatasource which needs authorization?


deniz
Hi all,

I am trying to index a page that returns an XML file, but I don't want that page to be accessible to anyone else. The page will check for authorization, such as a username and password.

For example, the page is:

www.blablabla.com/xyz

I would like to index the data from there without letting anyone else access it.

How do I give Solr the authorization information so that it can index the data?
Smart, but doesn't work... He'd manage if he did...

RE: how to deal with URLDatasource which needs authorization?

Jaeger, Jay - DOT
You could run the HTML import from Tika (see the tutorial on the Solr website). The job that runs Tika would need the username and password for the site being indexed, but Solr would not. (You might have to write a little script to fetch the HTML page using curl, wget, or Nutch.)

Users could then search the resulting index without having access to the actual website, which I think is what you are asking for.

But beware: depending on what and how you index, the index may reveal information that you did not intend to reveal.
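
The "little script" mentioned above might look something like the following Python sketch. It is only an illustration under several assumptions: that the site uses HTTP Basic authentication, that the page already returns XML in Solr's update format (otherwise a conversion step is needed), and that Solr's XML update handler is reachable at a URL such as http://localhost:8983/solr/update. The function names and the credentials are made up.

```python
import base64
import urllib.request


def build_fetch_request(url, user, password):
    """Build a GET request that carries HTTP Basic credentials (assumed scheme)."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def build_solr_update_request(solr_update_url, xml_body):
    """Build a POST request that sends an XML document to a Solr update handler."""
    return urllib.request.Request(
        solr_update_url,
        data=xml_body,
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


def fetch_and_index(source_url, user, password, solr_update_url):
    """Fetch the protected XML, then push it to Solr. Performs network I/O."""
    with urllib.request.urlopen(build_fetch_request(source_url, user, password)) as resp:
        xml_body = resp.read()
    with urllib.request.urlopen(build_solr_update_request(solr_update_url, xml_body)) as resp:
        return resp.status


# Hypothetical usage (hosts, credentials, and handler path are assumptions):
# fetch_and_index(
#     "http://www.blablabla.com/xyz", "solr", "secret",
#     "http://localhost:8983/solr/update?commit=true",
# )
```

With this arrangement only the script knows the site's password; Solr itself never contacts the protected page.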

-----Original Message-----
From: deniz [mailto:[hidden email]]
Sent: Wednesday, August 24, 2011 4:38 AM
To: [hidden email]
Subject: how to deal with URLDatasource which needs authorization?

RE: how to deal with URLDatasource which needs authorization?

deniz
Well, let me explain the problem in more detail.

I have a website, www.blablabla.com, on which users can have profiles containing any kind of information. Each user has an ID, something like user_xyz. So www.blablabla.com/user_xyz shows the user's profile, and www.blablabla.com/solr/index/user_xyz returns an XML file holding all of the static information about the user. Solr uses www.blablabla.com/solr/index/user_xyz to index the data.

Currently www.blablabla.com/solr/index/user_xyz is accessible to everyone, both users and non-users of the site.

I would like to add some kind of security that allows only Solr to access www.blablabla.com/solr/index/user_xyz and prevents both users and non-users from accessing it, so that the link becomes a 'Solr only' link.

Are there any options other than restricting access to this link by IP address, or is that the only option?

RE: how to deal with URLDatasource which needs authorization?

Jaeger, Jay - DOT
So the question then seems to be: is there a way to place credentials in the URLDataSource?

There doesn't seem to be an explicit user ID or password setting ( http://wiki.apache.org/solr/DataImportHandler#Configuration_of_URLDataSource_or_HttpDataSource ), but perhaps you can include them in the URL itself:

http://user:password@host/yadayada

(See http://www.cs.rutgers.edu/~watrous/user-pass-url.html ).

If that doesn't work, I guess you will have to get the data some way other than DIH/URLDataSource (such as Tika, which does support passwords).
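
For reference, the user:password@host form is only URL syntax. Whether it has any effect depends on the HTTP client turning the user-info part into an Authorization header, and I have not verified that URLDataSource does so. The Python sketch below shows what that translation amounts to for HTTP Basic authentication; the function name is made up for illustration.

```python
import base64
from urllib.parse import unquote, urlsplit, urlunsplit


def split_credentials(url):
    """Return (url_without_userinfo, basic_auth_header_value_or_None)."""
    parts = urlsplit(url)
    if parts.username is None:
        # No user-info part in the URL, so there is nothing to translate.
        return url, None
    user = unquote(parts.username)
    password = unquote(parts.password or "")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    # Keep only the host[:port] portion that follows the last '@'.
    host = parts.netloc.rpartition("@")[2]
    clean = urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))
    return clean, f"Basic {token}"


clean_url, auth_header = split_credentials("http://user:password@host/yadayada")
print(clean_url)    # http://host/yadayada
print(auth_header)  # Basic dXNlcjpwYXNzd29yZA==
```

If the client ignores the user-info part, the server just sees an unauthenticated request, so it is worth checking the server's access log to see what actually arrives.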

JRJ

-----Original Message-----
From: deniz [mailto:[hidden email]]
Sent: Thursday, August 25, 2011 8:17 PM
To: [hidden email]
Subject: RE: how to deal with URLDatasource which needs authorization?
