Re: svn commit: r582674 - in /incubator/tika/trunk: ./ src/main/java/org/apache/tika/config/ src/main/java/org/apache/tika/parser/ src/main/java/org/apache/tika/parser/html/ src/main/java/org/apache/tika/parser/msexcel/ src/main/java/org/apache/tika/parse

chrismattmann
Hi Jukka,

On 10/7/07 1:01 PM, "[hidden email]" <[hidden email]> wrote:

> Author: jukka
> Date: Sun Oct  7 13:01:46 2007
> New Revision: 582674
>
> URL: http://svn.apache.org/viewvc?rev=582674&view=rev
> Log:
> TIKA-46 - Use Metadata in Parser
>     - With improvements by Chris Mattmann
>
 
> Modified:
> incubator/tika/trunk/src/main/java/org/apache/tika/utils/ParseUtils.java
> URL:
> http://svn.apache.org/viewvc/incubator/tika/trunk/src/main/java/org/apache/tik
> a/utils/ParseUtils.java?rev=582674&r1=582673&r2=582674&view=diff
> ==============================================================================
> --- incubator/tika/trunk/src/main/java/org/apache/tika/utils/ParseUtils.java
> (original)
> +++ incubator/tika/trunk/src/main/java/org/apache/tika/utils/ParseUtils.java

I'm not sure I get these changes for this file: did you just remove it and
add it back? Was it formatting that changed?


> Modified: incubator/tika/trunk/src/test/java/org/apache/tika/TestParsers.java
> URL:
> http://svn.apache.org/viewvc/incubator/tika/trunk/src/test/java/org/apache/tik
> a/TestParsers.java?rev=582674&r1=582673&r2=582674&view=diff
> ==============================================================================
> --- incubator/tika/trunk/src/test/java/org/apache/tika/TestParsers.java
> (original)
> +++ incubator/tika/trunk/src/test/java/org/apache/tika/TestParsers.java Sun

> -        assertEquals("Sample Powerpoint Slide", contents.get("title")
> -                .getValue());
> +        assertEquals("Sample Powerpoint Slide", metadata.get("title"));

Your commit didn't include my updates to the lines above, which changed the
assertion to use Metadata.TITLE instead of the literal string "title".
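
To illustrate the idea, here is a minimal, self-contained sketch of looking up the title through a named constant rather than a string literal. The nested Metadata class below is a hypothetical stand-in, not the real Tika class, and both the value of TITLE and the helper titleOf are assumptions made only for this example.

```java
import java.util.HashMap;
import java.util.Map;

public class MetadataKeyExample {

    // Hypothetical stand-in for Tika's Metadata class; NOT the real API.
    public static class Metadata {
        // Assumed value; the real constant's value is not shown in this thread.
        public static final String TITLE = "title";

        private final Map<String, String> values = new HashMap<>();

        public void set(String name, String value) {
            values.put(name, value);
        }

        public String get(String name) {
            return values.get(name);
        }
    }

    // Hypothetical helper: reads the title via the constant, so a typo in the
    // key becomes a compile error instead of a silently failing lookup.
    public static String titleOf(Metadata metadata) {
        return metadata.get(Metadata.TITLE);
    }

    public static void main(String[] args) {
        Metadata metadata = new Metadata();
        metadata.set(Metadata.TITLE, "Sample Word Document");
        System.out.println(titleOf(metadata));
    }
}
```

In the tests, the assertion would then read along the lines of assertEquals("Sample Word Document", metadata.get(Metadata.TITLE)), assuming the constant is defined as described.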

>      }
>  
>      public void testWORDxtraction() throws Exception {
> @@ -130,15 +131,16 @@
>          assertEquals(s1, s2);
>          ParserConfig config = tc.getParserConfig("application/msword");
>          Parser parser = ParserFactory.getParser(config);
> -        Map<String, Content> contents = config.getContents();
> +        Collection<Content> contents = config.getContents();
>          assertNotNull(contents);
> +        Metadata metadata = new Metadata();
>          InputStream stream = new FileInputStream(file);
>          try {
> -            parser.parse(stream, contents.values());
> +            parser.parse(stream, contents, metadata);
>          } finally {
>              stream.close();
>          }
> -        assertEquals("Sample Word Document",
> contents.get("title").getValue());
> +        assertEquals("Sample Word Document", metadata.get("title"));

Same here.

>      }
>  
>      public void testEXCELExtraction() throws Exception {
> @@ -156,15 +158,16 @@
>                  .contains(expected));
>          ParserConfig config = tc.getParserConfig("application/vnd.ms-excel");
>          Parser parser = ParserFactory.getParser(config);
> -        Map<String, Content> contents = config.getContents();
> +        Collection<Content> contents = config.getContents();
>          assertNotNull(contents);
> +        Metadata metadata = new Metadata();
>          InputStream stream = new FileInputStream(file);
>          try {
> -            parser.parse(stream, contents.values());
> +            parser.parse(stream, contents, metadata);
>          } finally {
>              stream.close();
>          }
> -        assertEquals("Simple Excel document",
> contents.get("title").getValue());
> +        assertEquals("Simple Excel document", metadata.get("title"));

And here.

>      }
>  
>      public void testOOExtraction() throws Exception {
> @@ -185,18 +188,18 @@
>          Parser parser = ParserFactory.getParser(config);
>          assertNotNull(parser);
>  
> -        Map<String, Content> contents = config.getContents();
> +        Collection<Content> contents = config.getContents();
>          assertNotNull(contents);
> +        Metadata metadata = new Metadata();
>          InputStream stream = new FileInputStream(file);
>          try {
> -            parser.parse(stream, contents.values());
> +            parser.parse(stream, contents, metadata);
>          } finally {
>              stream.close();
>          }
> -        assertEquals("Title : Test Indexation Html", contents.get("title")
> -                .getValue());
> +        assertEquals("Title : Test Indexation Html", metadata.get("title"));

And here.

Probably just an omission, but could you update it?

Thanks!

Cheers,
 Chris



______________________________________________
Chris Mattmann, Ph.D.
[hidden email]
Cognizant Development Engineer
Early Detection Research Network Project

_________________________________________________
Jet Propulsion Laboratory            Pasadena, CA
Office: 171-266B                     Mailstop:  171-246
_______________________________________________________

Disclaimer:  The opinions presented within are my own and do not reflect
those of either NASA, JPL, or the California Institute of Technology.