Issue with index-more and query-more plugins

Issue with index-more and query-more plugins

Jonathan Reichhold-2
I saw mention of the fact that the query-more plugin was giving results
back out of order, and just found it happening in my own results.

The cause of the problem is that the MoreIndexingFilter indexes
"date" but does not store it.  For display, more.jsp uses the field
"lastModified", not "date".  I.e. we are querying by a range on "date"
but displaying "lastModified".
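
To illustrate how the two fields can disagree, here is a minimal sketch of one possible source of mismatch. The thread does not confirm that this is the specific cause. It assumes the indexed "date" is a yyyyMMdd string formatted in GMT (as in MoreIndexingFilter before the patch), while the displayed value is built from the "lastModified" millisecond timestamp with a Calendar in the viewer's time zone (as in more.jsp before the patch). The class and method names, and the sample instant, are hypothetical.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.TimeZone;

class DateFieldMismatch {

  // Hypothetical sample instant: 2005-11-10 00:30:00 GMT.
  static long sampleTime() {
    Calendar cal = new GregorianCalendar(TimeZone.getTimeZone("GMT"));
    cal.clear();
    cal.set(2005, Calendar.NOVEMBER, 10, 0, 30, 0);
    return cal.getTimeInMillis();
  }

  // Builds a yyyyMMdd string the way the indexed "date" value is built.
  static String indexedDate(long time, TimeZone zone) {
    SimpleDateFormat sdf = new SimpleDateFormat("yyyyMMdd");
    sdf.setTimeZone(zone);
    return sdf.format(new Date(time));
  }

  // Builds year.month.day from a millisecond timestamp, the way the
  // displayed "lastModified" value is built.
  static String displayedLastModified(long time, TimeZone zone) {
    Calendar cal = new GregorianCalendar(zone);
    cal.setTimeInMillis(time);
    return cal.get(Calendar.YEAR)
        + "." + (1 + cal.get(Calendar.MONTH)) // Calendar.MONTH is 0-based
        + "." + cal.get(Calendar.DAY_OF_MONTH);
  }

  public static void main(String[] args) {
    long time = sampleTime();
    // Value that a range query on "date" would match against.
    System.out.println(indexedDate(time, TimeZone.getTimeZone("GMT")));
    // Value a viewer in another zone would see for the same document.
    System.out.println(
        displayedLastModified(time, TimeZone.getTimeZone("America/Los_Angeles")));
  }
}
```

For an instant shortly after midnight GMT, the queried value and the displayed value fall on different calendar days, so a result list ordered or filtered by one field can look wrong when labelled with the other.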

For consistency, we should probably store "date" as well as index it.
It could then be used for display for consistency.  I can make the
changes, but who has checkin rights?  I.e. how do I submit a patch?

Jonathan Reichhold

Re: Issue with index-more and query-more plugins

Doug Cutting-2
Jonathan Reichhold wrote:
> For consistency, we should probably store "date" as well as index it.  

That makes sense to me.

> It could then be used for display for consistency.  I can make the
> changes, but who has checkin rights?  I.e. how do I submit a patch?

Use 'svn diff > my.patch' then either attach it to a message sent to
this list, or attach it to a bug report.

Doug

Re: Issue with index-more and query-more plugins

Jonathan Reichhold-2
Here is the patch...

Doug Cutting wrote:

> Jonathan Reichhold wrote:
>
>> For consistency, we should probably store "date" as well as index it.  
>
>
> That makes sense to me.
>
>> It could then be used for display for consistency.  I can make the
>> changes, but who has checkin rights?  I.e. how do I submit a patch?
>
>
> Use 'svn diff > my.patch' then either attach it to a message sent to
> this list, or attach it to a bug report.
>
> Doug
>
>
>

Index: C:/Documents and Settings/jreichhold/workspace/nutch/trunk/src/web/jsp/more.jsp
===================================================================
--- C:/Documents and Settings/jreichhold/workspace/nutch/trunk/src/web/jsp/more.jsp (revision 332367)
+++ C:/Documents and Settings/jreichhold/workspace/nutch/trunk/src/web/jsp/more.jsp (working copy)
@@ -29,24 +29,25 @@
       contentLength = "";
     }
 
-    // Last-Modified
-    String lastModified = detail.getValue("lastModified");
-    if (lastModified != null) {
-      Calendar cal = new GregorianCalendar();
-      cal.setTimeInMillis(new Long(lastModified).longValue());
-      lastModified = cal.get(Calendar.YEAR)
-                  + "." + (1+cal.get(Calendar.MONTH)) // it is 0-based
-                  + "." + cal.get(Calendar.DAY_OF_MONTH);
+    // date
+    String date = detail.getValue("date");
+    if (date != null) {    
+      String year = date.substring(0,4);
+      String month = date.substring(4,6);
+      String day = date.substring(6,8);
+      date = year
+      + "." + month // already 1-based in the yyyyMMdd string
+      + "." + day;
       showMore = true;
     } else {
-      lastModified = "";
+      date = "";
     }
 %>
 
 <% if (showMore) {
     if ("text".equalsIgnoreCase(primaryType)) { %>
-    <br><font size=-1><nobr><%=contentType%> <%=contentLength%> <%=lastModified%></nobr></font>
+    <br><font size=-1><nobr><%=contentType%> <%=contentLength%> <%=date%></nobr></font>
 <%  } else { %>
-    <br><font size=-1><nobr><%=contentType%> <%=contentLength%> <%=lastModified%> - <a href="../text.jsp?<%=id%>"><i18n:message key="viewAsText"/></a></nobr></font>
+    <br><font size=-1><nobr><%=contentType%> <%=contentLength%> <%=date%> - <a href="../text.jsp?<%=id%>"><i18n:message key="viewAsText"/></a></nobr></font>
 <%  }
   } %>
Index: C:/Documents and Settings/jreichhold/workspace/nutch/trunk/src/plugin/index-more/src/java/org/apache/nutch/indexer/more/MoreIndexingFilter.java
===================================================================
--- C:/Documents and Settings/jreichhold/workspace/nutch/trunk/src/plugin/index-more/src/java/org/apache/nutch/indexer/more/MoreIndexingFilter.java (revision 332367)
+++ C:/Documents and Settings/jreichhold/workspace/nutch/trunk/src/plugin/index-more/src/java/org/apache/nutch/indexer/more/MoreIndexingFilter.java (working copy)
@@ -109,7 +109,7 @@
     if (lastModified != null) {                   // try parse last-modified
       time = getTime(lastModified,url);           // use as time
                                                   // store as string
-      doc.add(Field.UnIndexed("lastModified", new Long(time).toString()));
+      doc.add(new Field("lastModified", Long.toString(time), Field.Store.YES, Field.Index.NO));
     }
 
     if (time == -1) {                             // if no last-modified
@@ -119,11 +119,11 @@
     // add support for query syntax date:
     // query filter is implemented in DateQueryFilter.java
     SimpleDateFormat sdf = new SimpleDateFormat("yyyyMMdd");
-    sdf.setTimeZone(TimeZone.getTimeZone("GMT"));
+    //sdf.setTimeZone(TimeZone.getTimeZone("GMT"));
     String dateString = sdf.format(new Date(time));
 
     // un-stored, indexed and un-tokenized
-    doc.add(new Field("date", dateString, false, true, false));
+    doc.add(new Field("date", dateString, Field.Store.YES, Field.Index.UN_TOKENIZED));
 
     return doc;
   }
@@ -173,7 +173,7 @@
     String contentLength = metaData.getProperty("content-length");
 
     if (contentLength != null)
-      doc.add(Field.UnIndexed("contentLength", contentLength));
+      doc.add(new Field("contentLength", contentLength, Field.Store.YES, Field.Index.NO));
 
     return doc;
   }
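
As a standalone illustration of the display logic the more.jsp change introduces, the following is a minimal sketch that turns a stored yyyyMMdd "date" value into the dotted form shown on the results page. The class name and the helper formatIndexedDate are hypothetical and not part of the patch. The length and digit checks are an added assumption: the patch itself calls substring directly and would throw on a shorter value.

```java
class IndexedDateDisplay {

  // Formats a stored yyyyMMdd string as yyyy.MM.dd.
  // Returns an empty string for null or malformed input, matching the
  // empty-string fallback used in more.jsp when the field is absent.
  static String formatIndexedDate(String date) {
    if (date == null || date.length() != 8) {
      return "";
    }
    for (int i = 0; i < date.length(); i++) {
      if (!Character.isDigit(date.charAt(i))) {
        return "";
      }
    }
    String year = date.substring(0, 4);
    String month = date.substring(4, 6); // already 1-based, zero-padded
    String day = date.substring(6, 8);
    return year + "." + month + "." + day;
  }

  public static void main(String[] args) {
    System.out.println(formatIndexedDate("20051103"));
  }
}
```

Note that the substring approach keeps the zero padding of the stored value, whereas the old Calendar-based display code concatenated plain integers and so did not pad the month or day.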