Hi
I am at the start of my university dissertation on interactive query expansion. At the moment I am using an Ajax framework and WordNet to suggest alternative or additional search terms based on the user's original query, and the web page is updated as the user types. I now wish to integrate my system into a search engine, and Nutch seems suitable. I have successfully completed the whole-web crawl tutorial. I have two questions:

1. I wish to formulate a Boolean query using the OR operator to search on all of the alternative search terms WordNet has suggested. I have found no documentation either in the wiki or in the mailing list archive. Are Boolean queries possible in Nutch?

2. How do I extract all index terms from Nutch, and possibly their tf/idf scores too? I intend to use this information for a function similar to Google Suggest: as you type, suggested terms appear based on terms actually in the index. I would want to put the terms and their associated scores into a database such as PostgreSQL.

Any pointers would be much appreciated!

Regards,
Nick.
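Question 2 amounts to building a term dictionary with per-term statistics and doing prefix lookups on it. The following is a minimal, self-contained sketch of that idea in Python, not Nutch or Lucene code: it assumes the documents are already tokenized lists of terms, and the helper names `build_term_table` and `suggest` are hypothetical. The idf formula is modelled on the shape used by Lucene's DefaultSimilarity, 1 + ln(numDocs / (docFreq + 1)); treat the scores as illustrative.

```python
import bisect
import math
from collections import Counter


def build_term_table(docs):
    """Map each term to (collection_freq, doc_freq, idf).

    docs is a list of already-tokenized documents (lists of terms).
    The idf shape is modelled on Lucene's DefaultSimilarity:
    1 + ln(num_docs / (doc_freq + 1)).
    """
    num_docs = len(docs)
    cf = Counter()  # total occurrences of each term across all documents
    df = Counter()  # number of documents containing each term
    for tokens in docs:
        counts = Counter(tokens)
        cf.update(counts)
        df.update(counts.keys())
    return {
        term: (cf[term], df[term], 1.0 + math.log(num_docs / (df[term] + 1)))
        for term in cf
    }


def suggest(table, prefix, limit=5):
    """Return up to `limit` indexed terms that start with `prefix`,
    most frequent first (ties broken alphabetically)."""
    terms = sorted(table)  # sketch only: a real term dictionary is kept sorted
    start = bisect.bisect_left(terms, prefix)
    matches = []
    for term in terms[start:]:
        if not term.startswith(prefix):
            break  # sorted order: no later term can share the prefix
        matches.append(term)
    matches.sort(key=lambda t: (-table[t][0], t))
    return matches[:limit]
```

Each entry of the table maps naturally onto a database row (term, collection frequency, document frequency, idf), and the prefix lookup would then roughly correspond to a `WHERE term LIKE 'se%'` query ordered by frequency.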
Hi Nick,
To implement Boolean OR queries, you will have to write your own plugin. Look at the code of query-basic and query-site for examples of how to write a new query plugin, and see the javadocs of org.apache.lucene.search.BooleanQuery for details. Marking a clause as non-required gives you OR behaviour. (The API for adding a new query clause is add(Query query, boolean required, boolean prohibited), so you would pass false for required, and also false for prohibited.)

For your second question, you might want to start by looking at this email:
http://mail-archives.apache.org/mod_mbox/jakarta-lucene-dev/200309-incomplete.mbox/%3C3F6A2EEC.8010007@...%3E

Regards,
Praveen.

On 7/11/05, Nick Rowlands <[hidden email]> wrote:
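To make the required/prohibited semantics concrete, here is a small in-memory sketch in Python. It is not Lucene code and says nothing about how a Nutch query plugin is wired up; the `Clause` class and `boolean_match` function are hypothetical names, and the index is assumed to be a plain dict from term to a set of document ids. It only models matching (no scoring): required clauses are ANDed, prohibited clauses are subtracted, and when no clause is required a document must match at least one optional clause, which is the OR behaviour described above. The sketch also rejects a clause that is both required and prohibited.

```python
from dataclasses import dataclass


@dataclass
class Clause:
    """One clause of a boolean query over a single term (hypothetical type)."""
    term: str
    required: bool = False
    prohibited: bool = False

    def __post_init__(self):
        if self.required and self.prohibited:
            raise ValueError("a clause cannot be both required and prohibited")


def boolean_match(index, clauses):
    """Return the sorted ids of documents matching `clauses`.

    index maps term -> set of document ids. Required clauses are ANDed,
    prohibited clauses are subtracted, and when no clause is required a
    document must match at least one optional clause (OR behaviour).
    """
    required = [index.get(c.term, set()) for c in clauses if c.required]
    optional = [index.get(c.term, set()) for c in clauses
                if not c.required and not c.prohibited]
    prohibited = [index.get(c.term, set()) for c in clauses if c.prohibited]

    if required:
        hits = set.intersection(*required)  # new set; index is not mutated
    elif optional:
        hits = set.union(*optional)
    else:
        hits = set()  # only prohibited clauses: nothing to match
    for docs in prohibited:
        hits -= docs
    return sorted(hits)
```

For WordNet expansion, each suggested synonym would become an optional clause, e.g. `[Clause("car"), Clause("automobile"), Clause("motor")]`, so any document containing at least one of them matches.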
Dear List,
How can I determine how many real (indexed, not deleted) pages are in a segment? If we have several backends, we need to balance the segments between them. I first tried using the number of fetched pages, but that does not give a real balance. I used the lukeall.jar tool on my Windows XP client, but the servers can't run graphical interfaces.

Regards,
Ferenc
You can use two tools:

1. nutch segread -list: this gives you the total number of records in a segment. Note, however, that this also includes pages that failed to be fetched or parsed.
2. LuCli (in lucene/contrib), a command-line frontend to Lucene.

-- 
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com
Contact: info at sigram dot com
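For balancing, the number that matters is the count of live documents rather than the raw record count. In Lucene, IndexReader.maxDoc() includes the slots of deleted documents, while numDocs() counts only the non-deleted ones. The Python sketch below assumes you have already read those numbers for each segment's index (for example with LuCli); the helper names `live_docs` and `balance_segments` are hypothetical, and the greedy "largest segment onto the currently lightest backend" rule is one simple heuristic, not something Nutch does for you.

```python
import heapq


def live_docs(max_doc, deleted_ids):
    """Searchable documents in one segment index: like Lucene's
    numDocs(), this is maxDoc() minus the deleted documents."""
    return max_doc - len(set(deleted_ids))


def balance_segments(segment_sizes, num_backends):
    """Assign segments to backends, largest first, always onto the
    currently lightest backend (a greedy heuristic, not an optimum).

    segment_sizes maps segment name -> live document count.
    Returns one (total_docs, [segment names]) pair per backend.
    """
    heap = [(0, i) for i in range(num_backends)]  # (load, backend index)
    loads = [0] * num_backends
    assignment = [[] for _ in range(num_backends)]
    by_size = sorted(segment_sizes.items(), key=lambda kv: (-kv[1], kv[0]))
    for name, size in by_size:
        load, i = heapq.heappop(heap)  # lightest backend so far
        assignment[i].append(name)
        loads[i] = load + size
        heapq.heappush(heap, (loads[i], i))
    return list(zip(loads, assignment))
```

Sorting by size first matters: placing the big segments early leaves the small ones to even out the remaining difference between backends.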
Hi
Does anyone know a way to get the "real" number of documents shown to the user for a particular search when the persite variable is active (not 0), as opposed to the total number of documents returned? I hope it's clear what I mean.
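To illustrate the distinction being asked about, here is a hypothetical Python sketch (not Nutch's code) of a per-site limit applied to a ranked hit list. It assumes each hit carries its site, and that a limit of 0 means "no limit", matching the description of the persite variable above. In this sketch the number actually shown is simply the length of the filtered list, which means the exact figure requires examining every matching hit rather than just reading the raw total.

```python
from collections import Counter


def apply_site_limit(ranked_hits, per_site):
    """Keep at most `per_site` hits from each site, preserving rank order.

    ranked_hits is a list of (url, site) pairs, best first.
    per_site == 0 disables the limit, so every hit is kept.
    """
    if per_site == 0:
        return list(ranked_hits)
    seen = Counter()  # hits already kept per site
    shown = []
    for url, site in ranked_hits:
        if seen[site] < per_site:
            seen[site] += 1
            shown.append((url, site))
    return shown
```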
