Re: [Nutch-general] Cached.jsp for image content type (OFF TOPIC, LONGISH)



Thomas Delnoij-3
Ok, here we go..

1) Using ImageJ

Requires latest ImageJ upgrade, available here:

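import java.io.File;
import java.net.URL;
import java.util.UUID;

import ij.ImagePlus;
import ij.io.FileSaver;
import ij.plugin.ContrastEnhancer;
import ij.process.ImageProcessor;

/**
 * Stores a thumbnail of an image on the file system under a randomly
 * generated name. Returns the name of the thumb.
 * LOG, THUMB_WIDTH and THUMB_HEIGHT are fields of the enclosing class
 * (not shown in this post).
 * @param imageUrl The URL of the image
 * @param imageDir The directory the thumbnail is stored in
 * @return The name of the generated thumbnail, or null on failure
 */
public static String saveThumbnail(URL imageUrl, File imageDir) {
    LOG.fine("creating thumbnail from image: " + imageUrl);
    String fileName = UUID.randomUUID().toString();
    // Take the extension from the URL text; lower-case it once so the
    // name on disk and the returned name always match.
    String url = imageUrl.toString();
    String extension = url.substring(url.lastIndexOf(".") + 1).toLowerCase();
    String outfile = imageDir.toString() + File.separator + fileName + "."
                                + extension;
    try {
        ImagePlus ip = new ImagePlus(url);
        ImageProcessor imp = ip.getProcessor();
        ImageProcessor imp2 = imp.resize(THUMB_WIDTH, THUMB_HEIGHT);
        (new ContrastEnhancer()).stretchHistogram(imp2, 0.05);
        ImagePlus ip2 = new ImagePlus("Resized Image", imp2);
        FileSaver fs = new FileSaver(ip2);
        // ASSUMPTION: the line that writes the file is missing from the
        // archived post. saveAsJpeg is one plausible choice; use the
        // FileSaver.saveAs* method that matches the extension.
        fs.saveAsJpeg(outfile);
    } catch (RuntimeException e) {
        LOG.warning("RuntimeException thrown: " + e);
        return null;
    }
    // Test if the thumbnail was actually written to the file system
    File file = new File(outfile);
    if (file.exists()) {
        LOG.fine("Successfully created thumbnail " + fileName + "."
                + extension);
        return (fileName + "." + extension);
    } else {
        LOG.warning("Failed to create thumbnail for URL " + imageUrl);
        return null;
    }
}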

The fileName that is returned is added to the metadata and later
stored in the index for later serving the thumb from an Apache

This can probably be improved in many ways. One thing I was thinking
of adding was creating and saving the thumb in a separate thread, using
the thread pool facility in JDK 1.5 (java.util.concurrent), so as to
minimize the impact on the fetch speed.
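
A minimal sketch of that thread pool idea, assuming a fixed-size pool from java.util.concurrent. The class name ThumbnailQueue and its methods are made up for illustration; none of this is part of Nutch or ImageJ. The fetcher hands a job to the pool and carries on, and the thumbnail name can be picked up from the returned Future later.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: runs thumbnail jobs on a small pool of worker
// threads so the calling (fetch) thread does not wait for image work.
class ThumbnailQueue {
    private final ExecutorService pool;

    ThumbnailQueue(int threads) {
        this.pool = Executors.newFixedThreadPool(threads);
    }

    // Queues one job and returns immediately. The Future later yields
    // whatever the job returned (e.g. the thumbnail file name, or null).
    Future<String> submit(Callable<String> job) {
        return pool.submit(job);
    }

    // Stops accepting new jobs and waits up to one minute for queued
    // jobs to finish. Returns true if everything completed in time.
    boolean shutdown() throws InterruptedException {
        pool.shutdown();
        return pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}
```

With something like this in place, a job could wrap the method above in a Callable whose call() returns saveThumbnail(imageUrl, imageDir). The pool size is a tuning knob: a small pool keeps image processing from competing too hard with the fetcher threads.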

2) Using Gimp.

Attached is a Gimp module that can be used to generate thumbnails from
images stored in the filesystem. It should be started in a separate
process, using something like this:

for x in *.JPG
do
    if [ -f "$x" ] ; then
        echo "..processing $x"
        # generate the thumbnail
        # (newer Gimp releases may want a separate -b before each batch command)
        gimp -c -d -i -b "(script-fu-image-process \"$x\" 100 100)" "(gimp-quit 0)"
    fi
done

This generates very nice quality thumbs but, as I said, should be
started separately from the Nutch process.
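
If the Gimp run is to be launched from Java code rather than from a shell, a rough sketch could look like the following. It assumes gimp is on the PATH and that the attached script registers script-fu-image-process as used in the loop above. The class GimpThumbnailer and its methods are hypothetical names, and the argument list simply mirrors the shell command.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: starts Gimp as a separate OS process for one image.
class GimpThumbnailer {

    // Builds the same argument list as the shell loop. Backslashes and
    // double quotes in the path are escaped so that the path stays a
    // valid Scheme string literal inside the batch command.
    static List<String> buildCommand(String imagePath, int width, int height) {
        String escaped = imagePath.replace("\\", "\\\\").replace("\"", "\\\"");
        String batch = "(script-fu-image-process \"" + escaped + "\" "
                + width + " " + height + ")";
        return Arrays.asList("gimp", "-c", "-d", "-i", "-b", batch, "(gimp-quit 0)");
    }

    // Runs Gimp and blocks until it exits; returns the process exit code.
    static int run(String imagePath, int width, int height)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(buildCommand(imagePath, width, height))
                .inheritIO()   // show Gimp's console messages in our output
                .start();
        return p.waitFor();
    }
}
```

Passing the arguments as a list avoids a second layer of shell quoting, so only the Scheme string needs escaping.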

HTH Thomas

On 6/16/06, [hidden email] <[hidden email]> wrote:

> Just a +1 for sending your thumbnail-creating code.
> Otis
> ----- Original Message ----
> From: TDLN <[hidden email]>
> To: [hidden email]
> Sent: Wednesday, June 14, 2006 5:07:34 PM
> Subject: Re: [Nutch-general] Cached.jsp for image content type
> Take a look at the ImageJ library;
> I don't have access to my repository now but as soon as I have I will
> send you the code I am using to create thumbnails.
> Rgrds, Thomas
> On 6/12/06, Marco Pereira <[hidden email]> wrote:
> > Hi everybody,
> >
> >  As I said in another message, I'm trying to get Nutch to search for
> > images.
> >  So far it's searching alt and title tags and indexing the image content
> > (what you see when you open an image in Notepad, for example).
> >  Now that I've indexed almost 3 million images, I am trying to generate the
> > thumbnails for them. So I started looking at cached.jsp and changed some code
> > to make it show the image. As a test, I set contentType="image/gif" and
> > changed the "if"s in the JSP so that it could be shown. But I'm just getting
> > errors. I think that maybe the indexing process corrupted the image content, or
> > that some transformation needs to be performed on the content in order
> > to show the image.
> >  Anyway, do you have any ideas on that?
> >  As for what I did, I think it would be better to generate the
> > thumbnails before doc.add for the content field. But as I hadn't thought of
> > that before, and as I still don't know whether that would work, I didn't do that.
> >  As I also said in the other message, I planned on using PHP and the GD
> > library to generate the thumbnails. But if I have the content already
> > indexed, why not use it (if it can be done)?
> >
> > Thanks in advance.
> > Marco Vanossi
> >
> >
> _______________________________________________
> Nutch-general mailing list
> [hidden email]

Attachment: gimp_image_process.scm (1K)