[jira] Created: (NUTCH-294) Topic-maps of related searchwords

Michael Gibney (Jira)
Topic-maps of related searchwords
---------------------------------

         Key: NUTCH-294
         URL: http://issues.apache.org/jira/browse/NUTCH-294
     Project: Nutch
        Type: New Feature

  Components: searcher  
    Reporter: Stefan Neufeind


Would it be possible to offer users "topic-maps"? That is, when you search for something, you also get topic-related words that might be of interest to you. I wonder whether that's somehow possible with the ngram index for "did you mean" (see the separate feature-enhancement issue for this), but we'd need a relation between words (the context in which they occur).

For the web frontend, trees are usually used, which for some users offer quite impressive eye-candy :-) For example, see this advertisement by Novell, where I've just seen a similar "topic-map":
http://www.novell.com/de-de/company/advertising/defineyouropen.html

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira

[jira] Commented: (NUTCH-294) Topic-maps of related searchwords

Michael Gibney (Jira)
    [ http://issues.apache.org/jira/browse/NUTCH-294?page=comments#action_12414597 ]

Chris A. Mattmann commented on NUTCH-294:
-----------------------------------------

Hi Stefan,

I'm wondering whether this issue is related to the existing clustering-carrot2 plugin submitted by D. Weiss. As far as I understand it, that plugin clusters the web pages returned by a query into groups organized by what could be considered a topic (I may not understand exactly what Carrot2 does, but that was my impression). Does this plugin fulfill your requirements for such a capability?

Thanks,
 Chris


[jira] Commented: (NUTCH-294) Topic-maps of related searchwords

Michael Gibney (Jira)
In reply to this post by Michael Gibney (Jira)
    [ http://issues.apache.org/jira/browse/NUTCH-294?page=comments#action_12414653 ]

Stefan Neufeind commented on NUTCH-294:
---------------------------------------

I'm not sure. On a quick run I wasn't able to get the "clustering-carrot2" plugin to work, though I thought I simply needed to include it.
Maybe somebody else has already worked with it and could comment on whether that plugin is within the scope of this feature request.
From what I found about Carrot2, it's also used to cluster data from multiple search engines; I'm not sure how that relates to topic clusters.

[jira] Commented: (NUTCH-294) Topic-maps of related searchwords

Michael Gibney (Jira)
In reply to this post by Michael Gibney (Jira)
    [ http://issues.apache.org/jira/browse/NUTCH-294?page=comments#action_12414960 ]

Dawid Weiss commented on NUTCH-294:
-----------------------------------

Ehm, sorry I'm so late with this -- tons of work.

1) Stefan, if you can't get it working, say what is not working (exceptions? anything else?). The only thing you need to do is enable the clustering plugin in your configuration. There should then be a checkbox next to your search box; tick it and you should see clustered results when you perform a query.

2) Now, having said that, I don't think that's what you're after. Carrot2 performs clustering of search results based solely on the information contained in snippets retrieved from documents (in other words, there is NO ontology and NO predefined information, everything is constructed dynamically). If you're looking for topic-maps then I guess you're after a certain type of classification engine that could pick relevant categories and display them along with search results. It's not what (the open source) Carrot2 does.

[jira] Commented: (NUTCH-294) Topic-maps of related searchwords

Michael Gibney (Jira)
In reply to this post by Michael Gibney (Jira)
    [ http://issues.apache.org/jira/browse/NUTCH-294?page=comments#action_12414962 ]

Stefan Neufeind commented on NUTCH-294:
---------------------------------------

1) I enabled it in plugins.include and restarted Tomcat, but there is no checkbox for me.

2) My "idea" was that maybe an index of top keywords (possibly from the "did you mean" plugin?) could be used, and a query could be run on it like "the term we searched for appeared in NNN pages, where the top 10 keywords are ...". Wouldn't that work as a topic-map?
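
The idea in 2) can be sketched roughly as follows. This is only a toy illustration in Python, not Nutch code: the in-memory `pages` list, the `STOPWORDS` set and the `related_keywords` function are hypothetical stand-ins for a real index of top keywords, and counting each keyword once per page is an assumption about how "top keywords" would be ranked.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real index would use a proper analyzer.
STOPWORDS = {"a", "an", "and", "for", "in", "is", "of", "on", "the", "to", "with"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def related_keywords(pages, query, top_n=10):
    """Return (hit_count, top keywords) for the pages containing the query term.

    Each keyword is counted at most once per matching page, so the count
    is the number of matching pages in which the keyword co-occurs.
    """
    term = query.lower()
    counts = Counter()
    hits = 0
    for text in pages:
        tokens = set(tokenize(text))
        if term not in tokens:
            continue
        hits += 1
        counts.update(t for t in tokens if t != term and t not in STOPWORDS)
    return hits, counts.most_common(top_n)


# Made-up sample pages, only for demonstration.
pages = [
    "Nutch is a crawler built on Lucene",
    "Lucene index search with Nutch",
    "Tomcat serves the Nutch search frontend",
    "Carrot2 clusters search results",
]
hits, keywords = related_keywords(pages, "nutch", top_n=3)
print(f"'nutch' appeared in {hits} pages; top keywords: {keywords}")
```

Whether such co-occurrence counts are good enough to serve as a topic-map is exactly the open question in this issue; the sketch only shows that the query itself is simple once a keyword index exists.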

[jira] Commented: (NUTCH-294) Topic-maps of related searchwords

Michael Gibney (Jira)
In reply to this post by Michael Gibney (Jira)
    [ http://issues.apache.org/jira/browse/NUTCH-294?page=comments#action_12415094 ]

Dawid Weiss commented on NUTCH-294:
-----------------------------------

Well, you certainly have something wrong in your configuration then. I just tried with the head revision. My nutch-site.xml looks like this:

[...]
<property>
  <name>plugin.includes</name>
  <value>clustering-carrot2|protocol-http|urlfilter-regex|parse-(text|html|js)|index-basic|query-(basic|site|url)|summary-basic|scoring-opic</value>
  <description>Regular expression naming plugin directory names to
  [...]
  </description>
</property>
[...]
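
Since the value of plugin.includes is a regular expression over plugin directory names, a quick way to see which names it would select is to test it outside Nutch. The sketch below uses Python's `re.fullmatch` and assumes that the whole directory name has to match the expression; that matching rule, the `included` helper and the list of candidate names are assumptions made for illustration, not taken from the Nutch source.

```python
import re

# The <value> from the configuration above, copied verbatim.
PLUGIN_INCLUDES = (
    "clustering-carrot2|protocol-http|urlfilter-regex|parse-(text|html|js)|"
    "index-basic|query-(basic|site|url)|summary-basic|scoring-opic"
)


def included(plugin_dir, pattern=PLUGIN_INCLUDES):
    """Return True if the whole directory name matches the expression."""
    return re.fullmatch(pattern, plugin_dir) is not None


# Hypothetical directory names to try, including a near-miss spelling.
for name in ["clustering-carrot2", "clustering-carrot", "parse-html", "parse-pdf"]:
    print(name, included(name))
```

Under that assumption, a small spelling difference such as a missing "2" is enough for the clustering plugin to be left out silently.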

Start Tomcat and issue any query that returns results. Look in the log files for:

2006-06-07 09:29:35 org.apache.nutch.plugin.PluginRepository displayStatus
INFO: Online Search Results Clustering using Carrot2's Lingo component (clustering-carrot2)

2006-06-07 09:29:35 org.apache.nutch.clustering.OnlineClustererFactory getOnlineClusterer
INFO: Using the first clustering extension found: Carrot2-Lingo

OK, the results page should show a "clustering" option next to the "Search" button (it does on my installation). Select it and rerun the query. On the right side you'll see the clusters (the title and three sample documents from each cluster are shown).
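
The log check described above can also be scripted. The following is a minimal sketch in Python; the `missing_markers` helper is hypothetical, and the two marker strings are simply substrings of the log lines quoted above. Where Tomcat writes the log depends on the installation and is not stated here.

```python
import io

# Substrings taken from the two log lines quoted above.
EXPECTED = (
    "(clustering-carrot2)",
    "Using the first clustering extension found",
)


def missing_markers(lines, expected=EXPECTED):
    """Return the expected substrings that never appear in the given lines."""
    remaining = list(expected)
    for line in lines:
        remaining = [marker for marker in remaining if marker not in line]
        if not remaining:
            break
    return remaining


# Stand-in for a real log file: only the first of the two messages is present.
sample_log = io.StringIO(
    "2006-06-07 09:29:35 org.apache.nutch.plugin.PluginRepository displayStatus\n"
    "INFO: Online Search Results Clustering using Carrot2's Lingo component (clustering-carrot2)\n"
)
print(missing_markers(sample_log))
```

An empty result means both messages were found. For a real installation, pass an open file object for the Tomcat log instead of `sample_log`.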

As for your idea, I still don't think Lingo is what you need... Of course you can try feeding it with unrelated keywords and then see what comes out, but I don't think it's the right approach.

