message/news; charset=windows-1252 -> message/rfc822


Allison, Timothy B.
All,
  With the new mime patterns, we're seeing quite a few files that used to be detected as message/news now being identified as message/rfc822.  An example is:

http://162.242.228.174/docs/commoncrawl2/DA/DALFSFPD6FX4GGZ6EEJQA6RABA7OXIF5<http://162.242.228.174/docs/commoncrawl2/VG/VGXYD2ISNSDJAVMK6CK7DHB3KI6ZHB6L>

We should correct this, right?  Any recommendations?

       Best,

                  Tim



Timothy B. Allison, Ph.D.
Principal Artificial Intelligence Engineer
T835/Human Language Technology
The MITRE Corporation
7515 Colshire Drive, McLean, VA  22102
703-983-2473 (phone); 703-983-1379 (fax)



Re: message/news; charset=windows-1252 -> message/rfc822

Nick Burch-2
On Wed, 28 Mar 2018, Allison, Timothy B. wrote:
>  With the new mime patterns, we've gotten quite a few changes of
> message/news being identified as message/rfc822.  An example is:
>
> http://162.242.228.174/docs/commoncrawl2/DA/DALFSFPD6FX4GGZ6EEJQA6RABA7OXIF5<http://162.242.228.174/docs/commoncrawl2/VG/VGXYD2ISNSDJAVMK6CK7DHB3KI6ZHB6L>

That looks like a regression to me; the file really is news.

> We should correct this, right?  Any recommendations?

I think it's the Message-ID header it's matching on. I'd suggest we bump
the news magics up from 50 (same as rfc822) to 60, so the news ones take
precedence.

Nick
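
To illustrate why raising the priority would change the result, here is a minimal Python sketch of priority-based magic matching. It is not Tika's code or its real magic definitions: the `Magic` class, the two header patterns, and the `detect` function are hypothetical, and the tie-breaking rule (the earlier-listed rule wins at equal priority) is an assumption made only for this example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Magic:
    mime_type: str
    priority: int
    header: bytes  # header name that must start some line of the input


# Hypothetical rules for illustration only; not Tika's actual magic definitions.
RULES = [
    Magic("message/rfc822", 50, b"Message-ID:"),
    Magic("message/news", 50, b"Newsgroups:"),
]


def detect(data: bytes, rules) -> str:
    """Return the type of the highest-priority rule whose header appears."""
    lines = data.splitlines()
    matches = [r for r in rules if any(l.startswith(r.header) for l in lines)]
    if not matches:
        return "application/octet-stream"
    # max() keeps the first maximal element, so at equal priority the
    # earlier-listed rule wins (an assumption of this sketch).
    return max(matches, key=lambda r: r.priority).mime_type


# A news article carries both headers, so both rules match.
SAMPLE = (
    b"Path: news.example.com!not-for-mail\r\n"
    b"Newsgroups: comp.lang.java\r\n"
    b"Message-ID: <1@example.com>\r\n"
    b"\r\n"
    b"body\r\n"
)

# Same rules, with the news priority bumped from 50 to 60.
BUMPED = [
    replace(r, priority=60) if r.mime_type == "message/news" else r
    for r in RULES
]

print(detect(SAMPLE, RULES))   # both at 50: the tie goes to message/rfc822
print(detect(SAMPLE, BUMPED))  # news at 60: message/news wins
```

With both rules at 50, the result depends on whatever tie-breaking the detector happens to use. Once the news rule is at 60, the outcome no longer depends on that ordering.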

Re: message/news; charset=windows-1252 -> message/rfc822

Chris Mattmann
+1

From: Nick Burch <[hidden email]>
Reply-To: "[hidden email]" <[hidden email]>
Date: Wednesday, March 28, 2018 at 8:01 AM
To: "[hidden email]" <[hidden email]>
Subject: Re: message/news; charset=windows-1252 -> message/rfc822

On Wed, 28 Mar 2018, Allison, Timothy B. wrote:
>  With the new mime patterns, we've gotten quite a few changes of
> message/news being identified as message/rfc822.  An example is:
>
> http://162.242.228.174/docs/commoncrawl2/DA/DALFSFPD6FX4GGZ6EEJQA6RABA7OXIF5<http://162.242.228.174/docs/commoncrawl2/VG/VGXYD2ISNSDJAVMK6CK7DHB3KI6ZHB6L>

That looks like a regression to me, it's really news

> We should correct this, right?  Any recommendations?

I think it's the Message-ID header it's matching on. I'd suggest we bump
the news magics up from 50 (same as rfc822) to 60, so the news ones take
preference

Nick