|
Hi all,
I am trying to index a page that returns an XML file, but I don't want that page to be accessible to anyone else. The page will check for authorization, such as a username and password. For example, the page is www.blablabla.com/xyz. I would like to index the data from there without letting anyone else access it. How can I add authorization information to Solr so that it can index the data?
Smart, but doesn't work... If he worked, he could do it...
|
|
You could run the HTML import from Tika (see the Solr tutorial on the Solr website). The job that runs Tika would need the user/password of the site to be indexed, but Solr would not. (You might have to write a little script to get the HTML page using curl, wget, or Nutch.)
Users could then search the index created this way without having access to the actual web site, which I think is what you are asking for. But beware: depending on what and how you index, you may end up revealing information in the index that you did not intend to reveal.
-----Original Message-----
From: deniz [mailto:[hidden email]]
Sent: Wednesday, August 24, 2011 4:38 AM
To: [hidden email]
Subject: how to deal with URLDatasource which needs authorization?
--
View this message in context: http://lucene.472066.n3.nabble.com/how-to-deal-with-URLDatasource-which-needs-authorization-tp3280515p3280515.html
Sent from the Solr - User mailing list archive at Nabble.com.
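To make the "little script" idea concrete, here is a minimal Python sketch of a job that holds the credentials itself, fetches the protected page, and pushes the result to Solr. Everything specific in it is an assumption for illustration: the function names are hypothetical, the site is assumed to use HTTP Basic authentication, the Solr update URL is assumed to look like http://localhost:8983/solr/update, and the page is assumed to already return XML in Solr's <add><doc> update format. If the XML has another shape it would need to be transformed first.

```python
import base64
import urllib.request


def basic_auth_header(user, password):
    """Return the value of an HTTP Basic 'Authorization' header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def build_fetch_request(url, user, password):
    """Build a GET request for the protected page, carrying the credentials."""
    req = urllib.request.Request(url)
    req.add_header("Authorization", basic_auth_header(user, password))
    return req


def build_update_request(solr_update_url, xml_bytes):
    """Build a POST request that sends XML to a Solr update handler."""
    return urllib.request.Request(
        solr_update_url,
        data=xml_bytes,
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


def reindex(source_url, user, password, solr_update_url):
    """Fetch the protected XML and post it to Solr; returns Solr's HTTP status.

    Assumes the fetched XML is already in Solr's <add><doc> update format.
    """
    with urllib.request.urlopen(build_fetch_request(source_url, user, password)) as resp:
        xml_bytes = resp.read()
    update = build_update_request(solr_update_url + "?commit=true", xml_bytes)
    with urllib.request.urlopen(update) as resp:
        return resp.status
```

A call would look like reindex("http://www.blablabla.com/xyz", "someuser", "somepassword", "http://localhost:8983/solr/update"). Only this job ever sees the password; Solr just receives the documents.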
|
Well, let me explain the problem in detail...
I have a website, www.blablabla.com, on which users can have profiles containing any kind of information. Each user has an id, something like user_xyz. So www.blablabla.com/user_xyz shows the user's profile, and www.blablabla.com/solr/index/user_xyz shows an XML file holding all of the static information about the user. Solr uses www.blablabla.com/solr/index/user_xyz to index the data. Currently www.blablabla.com/solr/index/user_xyz is accessible to everyone, both users and non-users of the site. I would like to add some kind of security that allows only Solr to access www.blablabla.com/solr/index/user_xyz and prevents both users and non-users from accessing it, so that the link becomes a 'Solr only' link. Are there any options other than restricting access to this link by IP address, or is that the only option?
Smart, but doesn't work... If he worked, he could do it...
|
|
So the question seems to be: is there a way to place credentials in the URLDataSource?
There doesn't seem to be an explicit user ID or password setting ( http://wiki.apache.org/solr/DataImportHandler#Configuration_of_URLDataSource_or_HttpDataSource ), but perhaps you can include them in the URL itself: http://user:password@host/yadayada (see http://www.cs.rutgers.edu/~watrous/user-pass-url.html ). If that doesn't work, I guess you will have to get the data some way other than DIH/URLDataSource (such as Tika, which does support passwords).
JRJ
-----Original Message-----
From: deniz [mailto:[hidden email]]
Sent: Thursday, August 25, 2011 8:17 PM
To: [hidden email]
Subject: RE: how to deal with URLDatasource which needs authorization?
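For reference, here is a small Python sketch of what the user:password@host form amounts to on the wire, assuming the site uses HTTP Basic authentication. It does not show how URLDataSource treats such a URL (that is untested here), and split_credentials is a hypothetical helper name. Not every HTTP client turns the user-info part of a URL into an Authorization header on its own, so it can help to know the equivalent header when testing the protected page by hand.

```python
import base64
from urllib.parse import unquote, urlsplit, urlunsplit


def split_credentials(url):
    """Split 'scheme://user:password@host/path' into (clean_url, auth_header).

    Returns (url, None) when the URL carries no user-info part.
    Note: the host is lowercased, and IPv6 brackets are not restored.
    """
    parts = urlsplit(url)
    if parts.username is None:
        return url, None
    host = parts.hostname
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    clean_url = urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))
    # Credentials embedded in a URL may be percent-encoded; decode them first.
    user = unquote(parts.username)
    password = unquote(parts.password or "")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return clean_url, f"Basic {token}"
```

Calling split_credentials("http://user:password@host/yadayada") gives the URL without credentials plus the header value a Basic-auth client would send. Keep in mind that Basic credentials are only base64-encoded, not encrypted, so this is only reasonably safe over HTTPS or on a trusted network.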
