Search Particulars

Vanderdray, Jacob
        I've written some extensions that allow you to define meta tags
that you would like included in nutch indexing and searching.  The meta
tag names are defined in nutch-site.xml.  In general this seems to be
working, but I'm seeing some problems with searching.

        I've added the keywords meta tag as something I want to pay
attention to.  If I do a search on a term that appears both in the
content of the page and in the keywords meta tag, I find the page in the
list of results.  Choosing "explain" shows that the term was found in
the keywords field and that influenced the ranking.  That's how I want
it to work so far.

        The problem is when I search for a term that appears in the
keywords field, but not in any other field.  Then the page doesn't get
returned.  Is there a setting that requires certain fields to contain
matches?  If so where is that setting?

Thanks,
Jake.

Re: Search Particulars

Jérôme Charron
>         The problem is when I search for a term that appears in the
> keywords field, but not in any other field.  Then the page doesn't get
> returned.  Is there a setting that requires certain fields to contain
> matches?  If so where is that setting?


The solution may be in this thread:
http://www.nabble.com/developing-a-parse--index--query--plugin-set-t413270.html#a1134289

Regards

Jérôme

--
http://motrech.free.fr/
http://www.frutch.org/

RE: Search Particulars

Vanderdray, Jacob
In reply to this post by Vanderdray, Jacob

Jérôme,

        Thanks for the link you sent earlier.  I've read through that discussion and I think my problem must be related, but I'm still banging my head against it.  I'm pretty sure that there's something I'm missing about how the queries get added together.

        As it stands my query filter seems to modify the ranking, but doesn't affect the results returned.  I've pasted my code below and I'd appreciate it if someone could take a look and let me know if they see the problem.

Thanks,
Jake.

public class MetaQueryFilter implements QueryFilter {

  private static final Logger LOG = LogFormatter
    .getLogger(MetaQueryFilter.class.getName());

  /**
   * Need to pull out the list of meta tags from the configuration
   */
  private static String[] META_TAGS =
    NutchConf.get().getStrings("meta.names");

  /**
   * We're going to go through and create search filters for each of the
   * meta-tags we were asked to index.
   */
  public BooleanQuery filter(Query input, BooleanQuery output) {
    // If no meta-tags were specified in the conf file, then don't bother
    // wasting cycles
    if (META_TAGS == null) {
      return output;
    }

    addTerms(input, output);
    return output;
  }

  private static void addTerms(Query input, BooleanQuery output) {
    Clause[] clauses = input.getClauses();
    for (int x = 0; x < clauses.length; x++) {
      Clause c = clauses[x];
      if (!c.getField().equals(Clause.DEFAULT_FIELD))
        continue; // skip non-default fields

      // These are the fields we're interested in indexing
      String[] tagsToIndex = META_TAGS;

      for (int i = 0; i < tagsToIndex.length; ++i) {
        LOG.info("Meta Query Filter: Adding a search for " + tagsToIndex[i]);

        Term term = new Term(tagsToIndex[i], c.getTerm().toString());

        // add a lucene PhraseQuery for this tag
        PhraseQuery metaQuery = new PhraseQuery();
        metaQuery.setSlop(0);
        metaQuery.add(term);

        // set boost
        metaQuery.setBoost(2.0f);

        // add it as an optional clause (not required, not prohibited)
        output.add(metaQuery, false, false);
      }
    }
  }

}
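
A note on the last call in the code above: in the Lucene API of that period, the two booleans passed to BooleanQuery.add are "required" and "prohibited", so (false, false) adds an optional clause. In a boolean query that also contains a required clause, optional clauses contribute to the score but do not select a document on their own. Whether the output query here already carries required clauses from other query filters is an assumption; nothing in this thread confirms it. The toy model below is plain Java with no Lucene or Nutch calls, and every class and helper name in it is made up for illustration. It only sketches the selection rule:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of boolean-query clause semantics; it does not call Lucene or Nutch. */
class ClauseSemanticsSketch {

    /** One clause: a term looked up in one field, either required or optional. */
    static final class Clause {
        final String field;
        final String term;
        final boolean required;
        final float boost;

        Clause(String field, String term, boolean required, float boost) {
            this.field = field;
            this.term = term;
            this.required = required;
            this.boost = boost;
        }
    }

    /** A document is modeled as field name -> set of indexed terms. */
    static boolean hit(Map<String, Set<String>> doc, Clause c) {
        Set<String> terms = doc.get(c.field);
        return terms != null && terms.contains(c.term);
    }

    /**
     * Every required clause must hit. If there are no required clauses,
     * at least one optional clause must hit instead.
     */
    static boolean selects(Map<String, Set<String>> doc, List<Clause> clauses) {
        boolean anyRequired = false;
        boolean anyOptionalHit = false;
        for (Clause c : clauses) {
            boolean h = hit(doc, c);
            if (c.required) {
                anyRequired = true;
                if (!h) {
                    return false;
                }
            } else if (h) {
                anyOptionalHit = true;
            }
        }
        return anyRequired || anyOptionalHit;
    }

    /** Simplified score: the sum of the boosts of all clauses that hit. */
    static float score(Map<String, Set<String>> doc, List<Clause> clauses) {
        float s = 0f;
        for (Clause c : clauses) {
            if (hit(doc, c)) {
                s += c.boost;
            }
        }
        return s;
    }

    /** Hypothetical query: term required in "content", optional (boost 2.0) in "keywords". */
    static List<Clause> sampleQuery(String term) {
        List<Clause> q = new ArrayList<>();
        q.add(new Clause("content", term, true, 1.0f));
        q.add(new Clause("keywords", term, false, 2.0f));
        return q;
    }

    static Map<String, Set<String>> doc(String[] content, String[] keywords) {
        Map<String, Set<String>> d = new HashMap<>();
        d.put("content", new HashSet<>(Arrays.asList(content)));
        d.put("keywords", new HashSet<>(Arrays.asList(keywords)));
        return d;
    }

    public static void main(String[] args) {
        List<Clause> q = sampleQuery("nutch");
        Map<String, Set<String>> both = doc(new String[] {"nutch"}, new String[] {"nutch"});
        Map<String, Set<String>> metaOnly = doc(new String[] {"crawler"}, new String[] {"nutch"});
        System.out.println(selects(both, q) + " score=" + score(both, q));
        System.out.println(selects(metaOnly, q));
    }
}
```

In this model the document with the term in both fields is selected and the keywords clause raises its score, while the document with the term only in keywords is not selected. That matches the symptom described, but it remains a model, not a diagnosis.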

Re: Search Particulars

Raghavendra Prabhu
Okay, I am new to this topic.

But do you add the meta tags to a particular field?

If so, shouldn't that field also appear among the fields that are searched?

Stock Nutch may not look at that field at all. Maybe this is
the reason?

Unless you name the metadata field explicitly and search it for the keyword.

Rgds
Prabhu



RE: Search Particulars

Vanderdray, Jacob
In reply to this post by Vanderdray, Jacob
        I'm not sure I understand what you're getting at.  In this case
I've added a comma-separated list of names of meta tags that I want to
index and search against.  I've written a parse filter, an index filter
and this query filter that all read in that list of meta tags from the
nutch-site.xml file.

        That much seems to work.  In the explain link I can see that the
fields are in the index and the ranking of pages is affected by them,
but if I search for a term which is in one of the meta tags, but not in
any other fields I get 0 results.

Thanks,

Jake.




Re: Search Particulars

Raghavendra Prabhu
Let us say you have meta1 and meta2 as the meta tags in nutch-site.xml.

The code you have written attempts to find this metadata (either meta1
or meta2) in Nutch fields.

When you index data, the index filter probably writes the metadata it
extracts into some field. Do you write the metadata you gather into a
separate field?

For example, define a field called METADATAFIELD and store it there.

Then the query should be extended by a query-METADATAFIELD filter.

Try this out. Search for
METADATAFIELD:meta1 (this should fetch you some results)

If this works, the content is there in the field. (You can check in
this manner whether your implementation is right.)


The BasicQueryFilter, I guess, looks in only four fields:
URL, title, content and anchor.

I was wondering whether you should add this field to the BasicQueryFilter
fields as well.

I do not know whether I am steering you in the right direction, but this is
my view.

Hope this helps

Rgds
Prabhu
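
The field-scoped check suggested above ("METADATAFIELD:meta1") can be sketched as a lookup against a per-field index. This is a toy model in plain Java, not the Nutch query parser or a Lucene index; the class name, the sample field names and the document id are all hypothetical:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy per-field index used to illustrate a "field:term" lookup. */
class FieldLookupSketch {

    /** Records that the given document contains the given terms in the given field. */
    static void add(Map<String, Map<String, Set<Integer>>> index,
                    int docId, String field, String... terms) {
        Map<String, Set<Integer>> postings = index.get(field);
        if (postings == null) {
            postings = new HashMap<>();
            index.put(field, postings);
        }
        for (String term : terms) {
            Set<Integer> docs = postings.get(term);
            if (docs == null) {
                docs = new HashSet<>();
                postings.put(term, docs);
            }
            docs.add(docId);
        }
    }

    /** Looks up "field:term" and returns the ids of documents with that term in that field. */
    static Set<Integer> search(Map<String, Map<String, Set<Integer>>> index, String fieldQuery) {
        int colon = fieldQuery.indexOf(':');
        if (colon <= 0 || colon == fieldQuery.length() - 1) {
            throw new IllegalArgumentException("expected field:term, got " + fieldQuery);
        }
        String field = fieldQuery.substring(0, colon);
        String term = fieldQuery.substring(colon + 1);
        Map<String, Set<Integer>> postings = index.get(field);
        if (postings == null || !postings.containsKey(term)) {
            return Collections.emptySet();
        }
        return postings.get(term);
    }

    /** Hypothetical sample: document 7 has "metadata" only in its keywords field. */
    static Map<String, Map<String, Set<Integer>>> sampleIndex() {
        Map<String, Map<String, Set<Integer>>> index = new HashMap<>();
        add(index, 7, "content", "nutch", "crawler");
        add(index, 7, "keywords", "metadata", "search");
        return index;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Set<Integer>>> index = sampleIndex();
        System.out.println(search(index, "keywords:metadata"));
        System.out.println(search(index, "content:metadata"));
    }
}
```

If the field-scoped lookup finds the document while a lookup in the default fields does not, the data is in the index and the remaining question is which fields the query is expanded to.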



Re: Search Particulars

Jack.Tang
In reply to this post by Vanderdray, Jacob
Hey

The simplest way is to copy the BasicQueryFilter class and rename it, then
modify FIELDS/FIELD_BOOSTS by replacing them with your meta tags
from the Nutch config. And don't forget the configuration in your query
filter's plugin.xml.

Good luck!
/Jack
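
The idea behind a per-term field list can be sketched as follows: each query term has to match in at least one of the listed fields, so which fields are on the list decides whether a document can be selected at all. This is a toy model in plain Java, not BasicQueryFilter itself. The basic field names are the four mentioned earlier in the thread, and the class name and the merged list are assumptions made for illustration:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model: every query term must appear in at least one of the listed fields. */
class FieldExpansionSketch {

    // Field lists are illustrative; a real filter would read the meta names from configuration.
    static final String[] BASIC_FIELDS = {"url", "anchor", "content", "title"};
    static final String[] META_FIELDS = {"keywords"};

    /** Concatenates two field lists into one. */
    static String[] merged(String[] a, String[] b) {
        String[] all = Arrays.copyOf(a, a.length + b.length);
        System.arraycopy(b, 0, all, a.length, b.length);
        return all;
    }

    /** A document is modeled as field name -> set of indexed terms. */
    static boolean matches(Map<String, Set<String>> doc, List<String> terms, String[] fields) {
        for (String term : terms) {
            boolean found = false;
            for (String field : fields) {
                Set<String> indexed = doc.get(field);
                if (indexed != null && indexed.contains(term)) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                return false; // this term is in none of the listed fields
            }
        }
        return true;
    }

    /** Hypothetical document whose term "metadata" occurs only in the keywords field. */
    static Map<String, Set<String>> sampleDoc() {
        Map<String, Set<String>> doc = new HashMap<>();
        doc.put("content", new HashSet<>(Arrays.asList("nutch", "crawler")));
        doc.put("keywords", new HashSet<>(Arrays.asList("metadata", "search")));
        return doc;
    }

    public static void main(String[] args) {
        List<String> query = Collections.singletonList("metadata");
        System.out.println(matches(sampleDoc(), query, BASIC_FIELDS));
        System.out.println(matches(sampleDoc(), query, merged(BASIC_FIELDS, META_FIELDS)));
    }
}
```

Whether the renamed copy should replace the stock basic filter or run alongside it is not addressed in this thread; the sketch only shows the effect of the field list itself.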



--
Keep Discovering ... ...
http://www.jroller.com/page/jmars

Re: Search Particulars

Doug Cutting
In reply to this post by Vanderdray, Jacob
Vanderdray, Jacob wrote:

> I'm not sure I understand what you're getting at.  In this case
> I've added a comma separated list of names of meta tags that I want to
> index and search against.  I've written a parse filter, an index filter
> and this query filter that all read in that list of meta tags from the
> nutch-site.xml file.  
>
> That much seems to work.  In the explain link I can see that the
> fields are in the index and the ranking of pages are affected by them,
> but if I search for a term which is in one of the meta tags, but not in
> any other fields I get 0 results.

Are you using RawFieldQueryFilter?  If so, are you specifying a non-zero
boost to the constructor?  RawFieldQueryFilter defaults to a zero boost.
Query terms with a zero boost are automatically converted into
filters.  And filters cannot select documents, only remove them.

Doug
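
The last point, that a filter can only remove documents, can be sketched with a toy model. This is plain Java, not the Lucene Filter API; the class name, helper names and document ids are hypothetical:

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.IntPredicate;

/** Toy model of a filter applied to an existing set of hits. */
class FilterSketch {

    /** Keeps only the hits the filter accepts; documents outside the hit set are never added. */
    static Set<Integer> applyFilter(Set<Integer> hits, IntPredicate accepts) {
        Set<Integer> kept = new LinkedHashSet<>();
        for (int docId : hits) {
            if (accepts.test(docId)) {
                kept.add(docId);
            }
        }
        return kept;
    }

    /** Builds a hit set from document ids, preserving order. */
    static Set<Integer> hitsOf(int... ids) {
        Set<Integer> hits = new LinkedHashSet<>();
        for (int id : ids) {
            hits.add(id);
        }
        return hits;
    }

    public static void main(String[] args) {
        // The scoring query selected documents 1, 2 and 3.
        Set<Integer> hits = hitsOf(1, 2, 3);
        // The filter would accept documents 2, 3 and 4, but 4 was never a hit.
        Set<Integer> filtered = applyFilter(hits, id -> id >= 2);
        System.out.println(filtered);
    }
}
```

Document 4 satisfies the filter but is absent from the result, because the filter only narrows what the scoring part of the query has already selected.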

RE: Search Particulars

Vanderdray, Jacob
In reply to this post by Vanderdray, Jacob
Doug,

        I'm actually implementing a QueryFilter directly instead of
extending one of the others.  I'm setting the boost to 2.0.  Here's the
code:

public class MetaQueryFilter implements QueryFilter {

  private static final Logger LOG = LogFormatter
    .getLogger(MetaQueryFilter.class.getName());

  /**
   * Need to pull out the list of meta tags from the configuration
   */
  private static String[] META_TAGS =
    NutchConf.get().getStrings("meta.names");

  /**
   * We're going to go through and create search filters for each of the
   * meta-tags we were asked to index.
   */
  public BooleanQuery filter(Query input, BooleanQuery output) {
    // If no meta-tags were specified in the conf file, then don't bother
    // wasting cycles
    if (META_TAGS == null) {
      return output;
    }

    addTerms(input, output);
    return output;
  }

  private static void addTerms(Query input, BooleanQuery output) {
    Clause[] clauses = input.getClauses();
    for (int x = 0; x < clauses.length; x++) {
      Clause c = clauses[x];
      if (!c.getField().equals(Clause.DEFAULT_FIELD))
        continue; // skip non-default fields

      // These are the fields we're interested in indexing
      String[] tagsToIndex = META_TAGS;

      for (int i = 0; i < tagsToIndex.length; ++i) {
        LOG.info("Meta Query Filter: Adding a search for " + tagsToIndex[i]);

        Term term = new Term(tagsToIndex[i], c.getTerm().toString());

        // add a lucene PhraseQuery for this tag
        PhraseQuery metaQuery = new PhraseQuery();
        metaQuery.setSlop(0);
        metaQuery.add(term);

        // set boost
        metaQuery.setBoost(2.0f);

        // add it as an optional clause (not required, not prohibited)
        output.add(metaQuery, false, false);
      }
    }
  }

}


Re: Search Particulars

Raghavendra Prabhu
Hi

The code you sent is only for the query filter.

In the parse filter, and especially in the index filter, do you add the data
to a new field that you define?

What I do is store any data I want to keep in a new field
(created by me).

I guess the index filter must be storing it in such a field.

So you have to use a FieldQueryFilter with this new field (like
query-url and similar plugins).

Rgds
Prabhu

