Deliver static html content via solr


Deliver static html content via solr

Matthias Geiger
Hello,
I have a web application that delivers static HTML content to the user.

I have been thinking about delivering this content from
Solr instead of from the filesystem.
This would avoid storing the content twice (HTML files on the
filesystem plus additional Solr cores).

Is this a viable approach or a no-go?
If it is a no-go, why do you think it is wrong?

If you would suggest a NoSQL database instead, what makes NoSQL superior to
Solr?

Regards, and thanks for your time

Re: Deliver static html content via solr

Walter Underwood
Why would you even consider putting static HTML in a search engine? You don’t want to search it.

1. Filesystems are very fast, and operating systems are very good at caching them.
2. Files can be pre-compressed for some web servers (Apache, at least), which saves the CPU cost of compressing them on each request.
3. Solr is not a repository, so you need a copy of the files somewhere, maybe in the file system. You cannot get around the “double” copy by keeping the originals in Solr.
4. Filesystems are much, much more reliable than Solr. Solr is very good, but much more complicated than filesystems.
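
Point 2 can be illustrated with a small sketch. This is Python for illustration only, not Apache configuration: the helper names `precompress` and `pick_variant` are made up here, and a real web server performs this negotiation itself.

```python
import gzip
from pathlib import Path


def precompress(path: Path) -> Path:
    """Write a gzip copy next to the file once, e.g. at deploy time."""
    gz_path = path.with_name(path.name + ".gz")
    gz_path.write_bytes(gzip.compress(path.read_bytes()))
    return gz_path


def pick_variant(path: Path, accept_encoding: str) -> Path:
    """Return the pre-compressed copy if the client accepts gzip and it exists."""
    gz_path = path.with_name(path.name + ".gz")
    # Simplified check: q-values such as "gzip;q=0" are not honoured here.
    encodings = [e.split(";")[0].strip() for e in accept_encoding.split(",")]
    if "gzip" in encodings and gz_path.exists():
        return gz_path
    return path
```

The compression cost is paid once per file rather than once per request, which is the saving the list item refers to.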

If you really want to fetch blobs by ID and don’t want to use a filesystem, use a database designed for that. That was the original focus of MySQL, for example.
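
As a rough sketch of "fetch blobs by ID" from a database: the example below uses SQLite from the Python standard library purely as a stand-in (the post mentions MySQL; the pattern is the same). The table name `pages` and the function names are assumptions made for this illustration.

```python
import sqlite3


def open_store(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open the store and create the (hypothetical) pages table if needed."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages (id TEXT PRIMARY KEY, html BLOB NOT NULL)"
    )
    return conn


def put_page(conn: sqlite3.Connection, page_id: str, html: bytes) -> None:
    """Insert or overwrite one page; the with-block commits the transaction."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO pages (id, html) VALUES (?, ?)",
            (page_id, html),
        )


def get_page(conn: sqlite3.Connection, page_id: str):
    """Return the stored bytes for page_id, or None if there is no such page."""
    row = conn.execute(
        "SELECT html FROM pages WHERE id = ?", (page_id,)
    ).fetchone()
    return row[0] if row is not None else None
```

A primary-key lookup like this is all that serving a page by ID needs; no search index is involved.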

Solr is not a database. Solr is not a repository. A design using Solr for primary storage of data is a broken design.

wunder
Walter Underwood
[hidden email]
http://observer.wunderwood.org/  (my blog)


> On Jan 4, 2018, at 8:19 AM, Matthias Geiger <[hidden email]> wrote:
>
> Hello,
> i have a web application that delivers static html content to the user.
>
> I have been thinking about the possibility to deliver this content from
> solr instead of delivering it from the filesystem.
> This would prevent the "double" stored content (html files on file
> systems + additional solr cores)
>
> Is this a viable approach or a no go?
> In case of a no go why do you think it is wrong
>
> In case of the suggestion of a nosql database, what makes noSql superior to
> solr?
>
> Regards and Thanks for your time


Re: Deliver static html content via solr

David Hastings
It's really easy, I find, for people to start going down this road.  I always
have to remind myself of the hammer-and-nail analogy: use each tool for its
purpose.

On Thu, Jan 4, 2018 at 11:27 AM, Walter Underwood <[hidden email]>
wrote:

> Why would you even consider putting static HTML in a search engine? You
> don’t want to search it.
>
> 1. Filesystems are very fast, and operating systems are very good at
> caching them.
> 2. Files can be pre-compressed for some web servers (Apache, at least)
> saving CPU for compression
> 3. Solr is not a repository, so you need a copy of the files somewhere,
> maybe in the file system. You cannot get around the “double” copy by
> keeping the originals in Solr.
> 4. Filesystems are much, much more reliable than Solr. Solr is very good,
> but much more complicated than filesystems.
>
> If you really want to fetch blobs by ID and don’t want to use a
> filesystem, use a database designed for that. That was the original focus
> of MySQL, for example.
>
> Solr is not a database. Solr is not a repository. A design using Solr for
> primary storage of data is a broken design.
>
> wunder
> Walter Underwood
> [hidden email]
> http://observer.wunderwood.org/  (my blog)

Re: Deliver static html content via solr

Erik Hatcher-4
In reply to this post by Matthias Geiger
All judgements aside on whether this is a preferred way to go, have a look at /browse and the VelocityResponseWriter (wt=velocity).  It can serve static resources.

I’ve built several prototypes this way that have been effective and business generating.  

   Erik


Re: Deliver static html content via solr

Rick Leir-2
Using Velocity, you can have some results-driven HTML served by Solr and
all your JS, CSS, etc. 'assets' served by Apache from /var/www/html.
Warning: the Velocity learning curve is steep, and you still need a
separate front-end web app for security because Velocity is only a
templating output filter. Erik, please correct me!

cheers -- Rick


On 01/04/2018 11:45 AM, Erik Hatcher wrote:

> All judgements aside on whether this is a preferred way to go, have a look at /browse and the VelocityResponseWriter (wt=velocity).  It can serve static resources.
>
> I’ve built several prototypes this way that have been effective and business generating.
>
>     Erik

Re: Deliver static html content via solr

Erik Hatcher-4
Rick - fair enough, indeed.

However, for a “static” resource, no Velocity syntax or learning curve needed.   In fact, correcting myself, VelocityResponseWriter isn’t even part of the picture for serving a static resource.

Have a look at example/files - https://github.com/apache/lucene-solr/tree/master/solr/example/files

The <head> of each page (from head.vm) pulls a “static” resource like this:

     <script type="text/javascript" src="#{url_for_solr}/admin/file?file=/velocity/js/jquery.tx3-tag-cloud.js&contentType=text/javascript"></script>

The /admin/file handler will serve the bytes of any resource in config.  
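
For illustration, a URL of the shape used in that <script> tag can be assembled like this in Python. The `file` and `contentType` parameters come from the snippet above; the base URL (host, port, and a core named "files") and the helper name `admin_file_url` are assumptions for this sketch.

```python
from urllib.parse import urlencode


def admin_file_url(core_url: str, file: str, content_type: str) -> str:
    """Build an /admin/file URL for a resource in the core's config.

    core_url plays the role of #{url_for_solr} in the template, e.g.
    "http://localhost:8983/solr/files" (an assumed local setup).
    """
    # urlencode percent-encodes the values, so "/" becomes "%2F".
    query = urlencode({"file": file, "contentType": content_type})
    return f"{core_url.rstrip('/')}/admin/file?{query}"


url = admin_file_url(
    "http://localhost:8983/solr/files",
    "/velocity/js/jquery.tx3-tag-cloud.js",
    "text/javascript",
)
```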

As for a separate front-end app: I always recommend one for real(!) applications, but for internal, one-off, quick-and-dirty, prototyping, showing-off, or handy-utility kinds of things I’m not opposed to doing the Simplest Possible Thing That Works.

As for security: VelocityResponseWriter doesn’t add any security concerns of its own to Solr. It just transforms the Solr response into some textual (often HTML) format instead of JSON or XML. What you need to do to secure Solr proper is a different story, but that is independent of whether wt=velocity is in the mix.

It can actually be handy to use wt=velocity from inside a real app: it has been used for generating e-mails in production systems, and for simply returning something formatted textually the way you want without an app template tier having to do so. And Velocity, true to name, ain’t slow.

For more on /browse, VrW, and example/files usage of those, check out https://lucidworks.com/2015/12/08/browse-new-improved-solr-5/

        Erik



> On Jan 5, 2018, at 4:19 AM, Rick Leir <[hidden email]> wrote:
>
> Using Velocity, you can have some results-driven HTML served by Solr and all your JS, CSS etc 'assets' served by Apache from /var/www/html. Warning: the Velocity learning curve is steep and you still need a separate front-end web app for security because Velocity is a templating output filter. Eric, please correct me!
>
> cheers -- Rick

Re: Deliver static html content via solr

Rick Leir-2
Erik, sorry, I didn't mean to say Velocity has a security problem. I am just thinking that people will see it in action and think it is a full answer to a front-end web app, though it has no input filtering or range checking (as an output template system, natch).
What do you recommend for a very basic input filter in front of Solr with Velocity?
Thanks
Rick

On January 5, 2018 10:11:31 AM EST, Erik Hatcher <[hidden email]> wrote:

>Rick - fair enough, indeed.
>
>However, for a “static” resource, no Velocity syntax or learning curve
>needed.   In fact, correcting myself, VelocityResponseWriter isn’t even
>part of the picture for serving a static resource.
>
>Have a look at example/files -
>https://github.com/apache/lucene-solr/tree/master/solr/example/files
>
>The <head> of each page (from head.vm) pulls a “static” resource like
>this:
>
><script type="text/javascript"
>src="#{url_for_solr}/admin/file?file=/velocity/js/jquery.tx3-tag-cloud.js&contentType=text/javascript"></script>
>
>The /admin/file handler will serve the bytes of any resource in config.
>
>
>As for separate front-end app - always recommended by me, to be sure
>for real(!) applications, but for internal, one-off, quick and dirty,
>prototyping, showing off, or handy utility kinda things I’m not opposed
>to doing the Simplest Possible Thing That Works.    As for security -
>VelocityResponseWriter doesn’t itself add any additional security
>concerns to Solr - it just transforms the Solr response into some
>textual (often HTML) format, instead of JSON or XML - so it itself
>isn’t a security concern.   What you need to do for Solr proper for
>security is a different story, but that is irrelevant to whether
>wt=velocity is in the mix.   It can actually be handy to use
>wt=velocity from inside a real app - it has been used it for generating
>e-mails in production systems and simply returning something formatted
>textually the way you want without an app template tier having to do
>so.   And Velocity, true to name, ain’t slow.
>
>For more on /browse, VrW, and example/files usage of those, check out
>https://lucidworks.com/2015/12/08/browse-new-improved-solr-5/
>
> Erik

--
Sorry for being brief. Alternate email is rickleir at yahoo dot com

Re: Deliver static html content via solr

Shawn Heisey-2
On 1/5/2018 6:26 PM, Rick Leir wrote:
> Erik, Sorry I didn't mean to say Velocity has a security problem. I am just thinking that people will see it in action and think it is a full answer to a front end web app, though it has no input filtering or range checking ( as an output template system, natcch).
> What do you recommend for a very basic input filter in front of Solr with Velocity?

One thing to keep in mind is that Solr should not be exposed to end users.

The velocity implementation that ships with Solr as the /browse handler
requires the user to have direct access to Solr, because the requests to
Solr are made by the user's browser.  The /browse handler is a good
demonstration of what Solr can do, but it is not suitable for production.

I'm not familiar with velocity at all, but I do think anything that
requires exposing Solr to an end user is a possible security problem.

Thanks,
Shawn

Re: Deliver static html content via solr

Rick Leir-2
Shawn
The easy solution is to put something like solr-security-proxy [1] in front of a Solr/Velocity app, and this is working for me. However, it uses a blacklist of Solr parameters, and I think it should use a whitelist instead. Also, it does not check ranges or filter characters. Is this proxy adequate for use on the open internet? In particular, what character filtering should I add to it?
Thanks
Rick

[1] https://github.com/dergachev/solr-security-proxy
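
The whitelist-plus-range-check idea from this message could look roughly like the sketch below. It is not how solr-security-proxy works; the allowed parameter set, the `MAX_ROWS` cap, and the function name `filter_params` are all assumptions chosen for illustration, and the open question about character filtering inside `q` is not addressed.

```python
import re

# Illustrative whitelist: only these request parameters are passed on to Solr.
ALLOWED_PARAMS = {"q", "fq", "start", "rows", "sort"}
MAX_ROWS = 100  # assumed upper bound on page size


def filter_params(params: dict) -> dict:
    """Keep whitelisted parameters only and range-check the numeric ones."""
    clean = {}
    for name, value in params.items():
        if name not in ALLOWED_PARAMS:
            continue  # anything not explicitly allowed is dropped
        if name in ("start", "rows"):
            if not re.fullmatch(r"[0-9]+", value):
                continue  # non-numeric paging values are dropped
            number = int(value)
            if name == "rows":
                number = min(number, MAX_ROWS)
            value = str(number)
        clean[name] = value
    return clean
```

With a whitelist, a parameter that nobody thought about is rejected by default, whereas a blacklist lets it through until someone adds it.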

On January 6, 2018 11:55:35 AM EST, Shawn Heisey <[hidden email]> wrote:

>On 1/5/2018 6:26 PM, Rick Leir wrote:
>> Erik, Sorry I didn't mean to say Velocity has a security problem. I
>am just thinking that people will see it in action and think it is a
>full answer to a front end web app, though it has no input filtering or
>range checking ( as an output template system, natcch).
>> What do you recommend for a very basic input filter in front of Solr
>with Velocity?
>
>One thing to keep in mind is that Solr should not be exposed to end
>users.
>
>The velocity implementation that ships with Solr as the /browse handler
>
>requires the user to have direct access to Solr, because the requests
>to
>Solr are made by the user's browser.  The /browse handler is a good
>demonstration of what Solr can do, but it is not suitable for
>production.
>
>I'm not familiar with velocity at all, but I do think anything that
>requires exposing Solr to an end user is a possible security problem.
>
>Thanks,
>Shawn


Re: Deliver static html content via solr

Shawn Heisey-2
On 1/7/2018 1:30 PM, Rick Leir wrote:
> The easy solution is to put something like solr-security-proxy [1] in front of a Solr/Velocity app, and this is working for me. However, this has a blacklist for Solr parms and I think it should have a whitelist instead. Also, it does not check ranges or filter chars. Is this proxy adequate for use on the open internet? In particular, what character filtering should I add to it?

I don't have information like that readily available.  With any proxy
software, I would be worried that something important had been
forgotten, opening the door to index changes or letting
denial-of-service requests through.

My recommendation is to never expose Solr to the Internet, or to anybody
who is not responsible for its care.  There should always be some kind
of front end server-side software that handles searching on behalf of
the user.

Even with those precautions, clever users will probably be able to
figure out how to send denial of service queries, but without direct
access to Solr's API, it would not be as vulnerable.
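
A minimal sketch of that kind of front end, in Python: the user supplies only search text and a page number, and the server-side code decides every Solr parameter itself. The limits and the function name `build_solr_params` are assumptions for illustration; the query text is passed through as typed, so escaping of query syntax is out of scope here.

```python
MAX_QUERY_CHARS = 200  # assumed limit on the length of the search text
PAGE_SIZE = 10         # fixed by the application, never by the user
MAX_PAGE = 50          # assumed cap to avoid very deep paging


def build_solr_params(user_text: str, page: int) -> dict:
    """Translate two user inputs into the parameters sent to Solr."""
    text = user_text.strip()[:MAX_QUERY_CHARS]
    page = max(1, min(page, MAX_PAGE))  # clamp to the range 1..MAX_PAGE
    return {
        "q": text or "*:*",  # empty input falls back to match-all
        "rows": str(PAGE_SIZE),
        "start": str((page - 1) * PAGE_SIZE),
    }
```

Because the browser never talks to Solr directly, request handlers and parameters that the application does not use are simply unreachable from outside.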

Thanks,
Shawn