existing feature for Lucene ?

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

existing feature for Lucene ?

Romain Péchard
Hi,

Can the Lucene software do the following thing with some features : search
selected keywords in selected websites (forums, blogs, and others), store
the URLs where the keywords are found sorted by date and then show them in
tabular form ?

Full description of the feature I'm looking for :

The software has to run at least once a day but the better would be a real
time scanning.

Once the fist scan of a given website done, the next run has to check if the
pages have been updated and then to check if there's any keyword in that new
part. It also needs to return if there's a simple update or if there's a
keyword in it (with tag or colour).

The user front-end would need 2 parts. The 1st to tag the given websites
with the chosen keywords (there would be multiple tags for a given website
and a keywords can be associated with many websites). The 2nd to show the
results in 3 tabular forms:

   1. Website result page: number of time the keywords of a given website
   appears (and the list of the URLs) in table and in pie chart form.
   2. Keywords result page: number of time the keyword appears in all the
   websites associated (and the list of the URLs) in table and in histogram.
   3. Global result page: ratio of the different keywords for all the
   websites (% and number of time) in pie chart and histogram forms.

Best,

--
Romain Péchard
Reply | Threaded
Open this post in threaded view
|

Re: existing feature for Lucene ?

Otis Gospodnetic-2
Hi Romain,

The short answer is: no.
The long answer is: Lucene is a library/toolkit for text indexing and searching.  The functionality that you described is what your application would have to do, and Lucene would provide only indexing and searching.

You may also want to look at Nutch, a web-crawling and searching application that is part of the Lucene project.

Otis

----- Original Message ----
From: Romain Péchard <[hidden email]>
To: [hidden email]
Sent: Friday, May 5, 2006 1:51:40 PM
Subject: existing feature for Lucene ?

Hi,

Can the Lucene software do the following thing with some features : search
selected keywords in selected websites (forums, blogs, and others), store
the URLs where the keywords are found sorted by date and then show them in
tabular form ?

Full description of the feature I'm looking for :

The software has to run at least once a day but the better would be a real
time scanning.

Once the fist scan of a given website done, the next run has to check if the
pages have been updated and then to check if there's any keyword in that new
part. It also needs to return if there's a simple update or if there's a
keyword in it (with tag or colour).

The user front-end would need 2 parts. The 1st to tag the given websites
with the chosen keywords (there would be multiple tags for a given website
and a keywords can be associated with many websites). The 2nd to show the
results in 3 tabular forms:

   1. Website result page: number of time the keywords of a given website
   appears (and the list of the URLs) in table and in pie chart form.
   2. Keywords result page: number of time the keyword appears in all the
   websites associated (and the list of the URLs) in table and in histogram.
   3. Global result page: ratio of the different keywords for all the
   websites (% and number of time) in pie chart and histogram forms.

Best,

--
Romain Péchard




---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]