Re: svn commit: r584595 - in /incubator/tika/trunk: ./ src/main/java/org/apache/tika/config/ src/main/java/org/apache/tika/mime/ src/test/java/org/apache/tika/mime/

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Re: svn commit: r584595 - in /incubator/tika/trunk: ./ src/main/java/org/apache/tika/config/ src/main/java/org/apache/tika/mime/ src/test/java/org/apache/tika/mime/

chrismattmann
Jukka,

On 10/14/07 1:07 PM, "[hidden email]" <[hidden email]> wrote:

> Author: jukka
> Date: Sun Oct 14 13:07:20 2007
> New Revision: 584595
>
> URL: http://svn.apache.org/viewvc?rev=584595&view=rev
> Log:
> TIKA-66 - Use Java 5 features in org.apache.tika.mime
>     - Use Java 5 generics and foreach constructs to simplify code
>     - Removed some unused variables and method parameters
>     - Other minor cleanups

> Modified:
> incubator/tika/trunk/src/main/java/org/apache/tika/config/TikaConfig.java
> URL:
> http://svn.apache.org/viewvc/incubator/tika/trunk/src/main/java/org/apache/tik
> a/config/TikaConfig.java?rev=584595&r1=584594&r2=584595&view=diff
> ==============================================================================
> --- incubator/tika/trunk/src/main/java/org/apache/tika/config/TikaConfig.java
> (original)
> +++ incubator/tika/trunk/src/main/java/org/apache/tika/config/TikaConfig.java
> Sun Oct 14 13:07:20 2007
> @@ -72,8 +72,7 @@
>      public TikaConfig(Element element) throws JDOMException {
>          Element mtr = element.getChild("mimeTypeRepository");
>          String mimeTypeRepoResource = mtr.getAttributeValue("resource");
> -        boolean magic = Boolean.valueOf(mtr.getAttributeValue("magic"));
> -        mimeTypeRepo = new MimeUtils(mimeTypeRepoResource, magic);
> +        mimeTypeRepo = new MimeUtils(mimeTypeRepoResource);

What happened to the "magic" parameter here?


>  
>          for (Object parser : XPath.selectNodes(element, "//parser")) {
>              ParserConfig config = new ParserConfig((Element) parser);
>
> Modified:
> incubator/tika/trunk/src/main/java/org/apache/tika/mime/MagicMatch.java
> URL:
> http://svn.apache.org/viewvc/incubator/tika/trunk/src/main/java/org/apache/tik
> a/mime/MagicMatch.java?rev=584595&r1=584594&r2=584595&view=diff
> ==============================================================================
> --- incubator/tika/trunk/src/main/java/org/apache/tika/mime/MagicMatch.java
> (original)
> +++ incubator/tika/trunk/src/main/java/org/apache/tika/mime/MagicMatch.java
> Sun Oct 14 13:07:20 2007
> @@ -155,8 +155,6 @@
>      }
>  
>      public boolean eval(byte[] data) {
> -
> -        boolean ok = false;
>          for (int i = offsetStart; i <= offsetEnd; i++) {
>              if (data.length < (this.length + i)) {
>                  // Not enough data...
> @@ -181,37 +179,8 @@
>      }
>  
>      public String toString() {
> -        return new StringBuffer().append("[").append(offsetStart).append(":")
> -                
> .append(offsetEnd).append("(").append(type).append(")").append(
> -                
> "-").append(mask).append("#").append(value).append("]")
> -                .toString();
> +        return "[" + offsetStart + ":" + offsetEnd
> +            + "(" + type + ")-" + mask + "#" + value + "]";
>      }
>  
> -    private final static boolean equals(byte[] b1, byte[] b2) {
> -        if ((b1 != null) && (b2 != null)) {
> -            if (b1.length != b2.length) {
> -                return false;
> -            }
> -            for (int i = 0; i < b1.length; i++) {
> -                if (b1[i] != b2[i]) {
> -                    return false;
> -                }
> -            }
> -            return true;
> -        }
> -        if ((b1 == null) && (b2 == null)) {
> -            return true;
> -        }
> -        return false;
> -    }
> -
> -    private final static String toHexString(byte[] bytes) {
> -        StringBuffer buf = new StringBuffer();
> -        for (int i = 0; i < bytes.length; i++) {
> -            String str = Integer.toHexString(bytes[i]);
> -            buf.append((str.length() > 2) ? str.substring(str.length() - 2)
> -                    : str);
> -        }
> -        return buf.toString();
> -    }
>  }

Can you explain the reason why these methods were removed?


> Modified:
> incubator/tika/trunk/src/main/java/org/apache/tika/mime/MimeUtils.java
> URL:
> http://svn.apache.org/viewvc/incubator/tika/trunk/src/main/java/org/apache/tik
> a/mime/MimeUtils.java?rev=584595&r1=584594&r2=584595&view=diff
> ==============================================================================
> --- incubator/tika/trunk/src/main/java/org/apache/tika/mime/MimeUtils.java
> (original)
> +++ incubator/tika/trunk/src/main/java/org/apache/tika/mime/MimeUtils.java Sun
> Oct 14 13:07:20 2007
> @@ -38,18 +38,11 @@
>      private final static Logger LOG = Logger.getLogger(MimeUtils.class
>              .getName());
>  
> -    /** The key used to cache the mime repository in conf */
> -    private final static String KEY = MimeUtils.class.getName();
> -
> -    /** A flag that tells if magic resolution must be performed */
> -    private boolean magic = true;
> -
>      /** The MimeTypes repository instance */
>      private MimeTypes repository = null;
>  
>      /** Creates a new instance of MimeUtils */
> -    public MimeUtils(String resPath, boolean magic) {
> -        this.magic = magic;
> +    public MimeUtils(String resPath) {
>          if(repository == null){
>              repository = load(resPath);
>          }
>
> Modified:
> incubator/tika/trunk/src/main/java/org/apache/tika/mime/Patterns.java
> URL:
> http://svn.apache.org/viewvc/incubator/tika/trunk/src/main/java/org/apache/tik
> a/mime/Patterns.java?rev=584595&r1=584594&r2=584595&view=diff
> ==============================================================================
> --- incubator/tika/trunk/src/main/java/org/apache/tika/mime/Patterns.java
> (original)
> +++ incubator/tika/trunk/src/main/java/org/apache/tika/mime/Patterns.java Sun
> Oct 14 13:07:20 2007
> @@ -19,50 +19,45 @@
>  // JDK imports
>  import java.util.ArrayList;
>  import java.util.HashMap;
> -import java.util.Iterator;
>  import java.util.Map;
>  
>  /**
>   * Defines a MimeType pattern.
> - *
> - *
>   */
>  class Patterns {
>  
> -    private static Map escapeMap = new HashMap();
> +    private static Map<Character, String> escapeMap =
> +        new HashMap<Character, String>();
> +
>      static {
> -        escapeMap.put("\\", "\\\\");
> -        escapeMap.put("?", "\\?");
> -        escapeMap.put("[", "\\[");
> -        escapeMap.put("]", "\\]");
> -        escapeMap.put("^", "\\^");
> -        escapeMap.put(".", "\\.");
> -        escapeMap.put("-", "\\-");
> -        escapeMap.put("$", "\\$");
> -        escapeMap.put("+", "\\+");
> -        escapeMap.put("(", "\\(");
> -        escapeMap.put(")", "\\)");
> -        escapeMap.put("{", "\\{");
> -        escapeMap.put("}", "\\}");
> -        escapeMap.put("|", "\\|");
> -        escapeMap.put("*", ".*");
> +        escapeMap.put('\\', "\\\\");
> +        escapeMap.put('?', "\\?");
> +        escapeMap.put('[', "\\[");
> +        escapeMap.put(']', "\\]");
> +        escapeMap.put('^', "\\^");
> +        escapeMap.put('.', "\\.");
> +        escapeMap.put('-', "\\-");
> +        escapeMap.put('$', "\\$");
> +        escapeMap.put('+', "\\+");
> +        escapeMap.put('(', "\\(");
> +        escapeMap.put(')', "\\)");
> +        escapeMap.put('{', "\\{");
> +        escapeMap.put('}', "\\}");
> +        escapeMap.put('|', "\\|");
> +        escapeMap.put('*', ".*");
>      }



Formatting change?

>
> Modified:
> incubator/tika/trunk/src/test/java/org/apache/tika/mime/TestMimeUtils.java
> URL:
> http://svn.apache.org/viewvc/incubator/tika/trunk/src/test/java/org/apache/tik
> a/mime/TestMimeUtils.java?rev=584595&r1=584594&r2=584595&view=diff
> ==============================================================================
> --- incubator/tika/trunk/src/test/java/org/apache/tika/mime/TestMimeUtils.java
> (original)
> +++ incubator/tika/trunk/src/test/java/org/apache/tika/mime/TestMimeUtils.java
> Sun Oct 14 13:07:20 2007
> @@ -37,8 +37,6 @@
>  
>      private static final String tikaMimeFile =
> "org/apache/tika/mime/tika-mimetypes.xml";
>      
> -    private static final boolean magic = false;
> -
>      private static URL u;
>  
>      static {
> @@ -54,7 +52,7 @@
>      private MimeUtils utils;
>  
>      public TestMimeUtils() {
> -        utils = new MimeUtils(tikaMimeFile, magic);
> +        utils = new MimeUtils(tikaMimeFile);

Again, why was the "magic" parameter removed?

Cheers,
  Chris


>          assertNotNull(utils);
>      }
>  
>
>

______________________________________________
Chris Mattmann, Ph.D.
[hidden email]
Cognizant Development Engineer
Early Detection Research Network Project

_________________________________________________
Jet Propulsion Laboratory            Pasadena, CA
Office: 171-266B                     Mailstop:  171-246
_______________________________________________________

Disclaimer:  The opinions presented within are my own and do not reflect
those of either NASA, JPL, or the California Institute of Technology.


Reply | Threaded
Open this post in threaded view
|

Re: svn commit: r584595 - in /incubator/tika/trunk: ./ src/main/java/org/apache/tika/config/ src/main/java/org/apache/tika/mime/ src/test/java/org/apache/tika/mime/

chrismattmann
Ah, wait. Ok, it seems that the "magic" parameter was never used I'm
guessing, right? Well, I understand you removing it then. Perhaps we should
make MimeUtils have a parameter called "magic" in it, and then use magic to
determine whether or not we call MimeTypes.getMimeType(byte [] data) versus
calling the other methods. What do you think about this?

Cheers,
  Chris

______________________________________________
Chris Mattmann, Ph.D.
[hidden email]
Cognizant Development Engineer
Early Detection Research Network Project

_________________________________________________
Jet Propulsion Laboratory            Pasadena, CA
Office: 171-266B                     Mailstop:  171-246
_______________________________________________________

Disclaimer:  The opinions presented within are my own and do not reflect
those of either NASA, JPL, or the California Institute of Technology.


Reply | Threaded
Open this post in threaded view
|

Re: svn commit: r584595 - in /incubator/tika/trunk: ./ src/main/java/org/apache/tika/config/ src/main/java/org/apache/tika/mime/ src/test/java/org/apache/tika/mime/

Jukka Zitting
Hi,

On 10/14/07, Chris Mattmann <[hidden email]> wrote:
> Ah, wait. Ok, it seems that the "magic" parameter was never used I'm
> guessing, right? Well, I understand you removing it then.

Correct, the parameter was never used. The same goes for the methods I removed.

We can add them back if we have an actual use for them, but until that
I think we're better off without them.

> Perhaps we should make MimeUtils have a parameter called "magic" in it, and
> then use magic to determine whether or not we call
> MimeTypes.getMimeType(byte [] data) versus calling the other methods.
> What do you think about this?

Currently we only have the following method in MimeUtils:

    String getType(String typeName, String url, byte[] data)

Since we already have the data array available, we might just as well
always do the MIME magic checks. If we want to support non-magic
checks, then we should add another signature for that:

    String getType(String typeName, String url)

BR,

Jukka Zitting
Reply | Threaded
Open this post in threaded view
|

Re: svn commit: r584595 - in /incubator/tika/trunk: ./ src/main/java/org/apache/tika/config/ src/main/java/org/apache/tika/mime/ src/test/java/org/apache/tika/mime/

chrismattmann
Hi Jukka,

>
> We can add them back if we have an actual use for them, but until that
> I think we're better off without them.

Me too actually thinking about it more. +1

>
>> Perhaps we should make MimeUtils have a parameter called "magic" in it, and
>> then use magic to determine whether or not we call
>> MimeTypes.getMimeType(byte [] data) versus calling the other methods.
>> What do you think about this?
>
> Currently we only have the following method in MimeUtils:
>
>     String getType(String typeName, String url, byte[] data)
>
> Since we already have the data array available, we might just as well
> always do the MIME magic checks. If we want to support non-magic
> checks, then we should add another signature for that:
>
>     String getType(String typeName, String url)

Yep, the way I envision it is that MimeUtils is really a decorator around
MimeTypes. MimeUtils should provide construction facilities and management
around the MimeTypes repo, so it probably makes sense to add a few more
method signatures to it, and expose a bit more of the functionality (and at
the same time, remove the getRepository() function).

Sound good?

Cheers,
  Chris


>
> BR,
>
> Jukka Zitting

______________________________________________
Chris Mattmann, Ph.D.
[hidden email]
Cognizant Development Engineer
Early Detection Research Network Project

_________________________________________________
Jet Propulsion Laboratory            Pasadena, CA
Office: 171-266B                     Mailstop:  171-246
_______________________________________________________

Disclaimer:  The opinions presented within are my own and do not reflect
those of either NASA, JPL, or the California Institute of Technology.