Why would a record be in the database but not show up in the results?

Honda-Search Administrator
Does anyone have an idea why a record would be in the database but not show up in the results?

I have 400+ pages from a certain domain in my database (checked using bin/nutch admin), yet when I search for the domain, for titles of certain pages from the domain, or for unique URLs from the domain, no results come up.

I was thinking it might be the regex-urlfilter, but if the pages are already in the database, wouldn't that rule out regex-urlfilter as the culprit?

BTW, all of my URLs are fetched by creating fetchlists using FreeFetchlistTool.

Honda-Search Administrator
Asking again, hoping that someone can help me out.

I have a number of pages from a certain domain in my database.  I can verify
this when I use the command:

bin/nutch admin crawl/db -textdump text

I then look at the text.pages file and it has nearly 800 pages from that
domain in my database.

Yet when I search for content from that domain, nothing comes up.  Can anyone
tell me why this would happen?

Re: Why would a record be in the database but not show up in the results?

Thomas Delnoij-3
Matt,

It's the index that is used for searching, not the webdb.

What is the status of these pages in webdb? Likely they are not
fetched yet (DB_UNFETCHED), and thus can never be in your index.
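
One way to narrow this down is to compare the URLs the webdb knows about with
the URLs that actually made it into the index. The following is only a minimal
sketch under stated assumptions: it assumes each page record in the text dump
has a line of the form "URL: <url>" (check your own text.pages file, the layout
may differ), and it assumes you can produce a list of indexed URLs by some other
means. The function names urls_from_dump and known_but_not_indexed are
hypothetical helpers, not part of Nutch.

```python
from urllib.parse import urlparse


def urls_from_dump(lines, domain):
    """Collect the URLs for one domain from lines of a webdb text dump.

    Assumption: each page record contains a line like 'URL: <url>'.
    """
    found = set()
    for line in lines:
        line = line.strip()
        if not line.startswith("URL:"):
            continue  # skip every other field of the record
        url = line[len("URL:"):].strip()
        host = urlparse(url).hostname or ""
        # Match the domain itself and any of its subdomains.
        if host == domain or host.endswith("." + domain):
            found.add(url)
    return found


def known_but_not_indexed(webdb_urls, indexed_urls):
    """Return URLs that are in the webdb but missing from the index."""
    return sorted(set(webdb_urls) - set(indexed_urls))


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample_dump = [
        "URL: http://www.example.com/a",
        "URL: http://www.example.com/b",
        "URL: http://other.org/x",
    ]
    indexed = ["http://www.example.com/a"]  # hypothetical list of indexed URLs
    in_db = urls_from_dump(sample_dump, "example.com")
    for url in known_but_not_indexed(in_db, indexed):
        print(url)
```

Any URL this reports is known to the webdb but cannot be returned by a search,
because the search only consults the index.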

These articles give a very nice basic explanation of the different concepts:

http://today.java.net/pub/a/today/2006/01/10/introduction-to-nutch-1.html
http://today.java.net/pub/a/today/2006/02/16/introduction-to-nutch-2.html

HTH Thomas

On 7/22/06, Matt Timion <[hidden email]> wrote:

> Asking again hoping that someone can help me out.
>
> I have a number of pages from a certain domain in my database.  I can verify
> this when I use the command:
>
> bin/nutch admin crawl/db -textdump text
>
> I then look at the text.pages file and it has nearly 800 pages from that
> domain in my database.
>
> yet when I search for content from that domain nothing comes up.  Can anyone
> tell me why this would happen?