Codespell report for Tika 1.23

Fossies Administrator
Hi,

The FOSS server fossies.org offers a new feature, "Source code misspelling
reports":

  https://fossies.org/features.html#codespell

Although such reports are normally only generated on request, as Fossies
administrator I have just created (for testing purposes) an analysis for
the current release Tika 1.23:

  https://fossies.org/linux/misc/tika/codespell.html

That version-independent URL should always redirect to the latest report
(if available), currently to

  https://fossies.org/linux/misc/tika-1.23-src.zip/codespell.html
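
The redirect from the version-independent URL to the versioned one can be
pictured with a small Python sketch. It is only an illustration under
assumptions: the function names, the sample version list, and the rule
"pick the highest version number" are hypothetical and say nothing about
how Fossies actually implements the redirect.

```python
def version_key(version):
    """Turn '1.23' into (1, 23) so that versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def latest_report_url(package, versions, folder="/linux/misc"):
    """Build the versioned report URL for the newest known release.

    Returns None if no release (and therefore no report) is known.
    The URL pattern is copied from the two URLs quoted above.
    """
    if not versions:
        return None
    newest = max(versions, key=version_key)
    return f"https://fossies.org{folder}/{package}-{newest}-src.zip/codespell.html"


# The version list is made up for illustration; only 1.23 is from the post.
print(latest_report_url("tika", ["1.9", "1.22", "1.23"]))
# prints https://fossies.org/linux/misc/tika-1.23-src.zip/codespell.html
```

Comparing versions as integer tuples rather than as strings keeps "1.9"
from sorting after "1.23".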

Although some obviously wrong matches ("false positives") are already
filtered out (ignored), please let me know if you find more of them so that
I can rerun an improved check if applicable.

For information, there are also two supplementary pages:

  https://fossies.org/linux/misc/tika/codespell_conf.html

showing the "codespell" configuration used, and

  https://fossies.org/linux/misc/tika/codespell_fps.html

showing all resulting obvious "false positives".

Regards

Jens

--
FOSSIES - The Fresh Open Source Software archive
mainly for Internet, Engineering and Science
https://fossies.org/

Re: Codespell report for Tika 1.23

Tim Allison
Ooooooo....dig it...fixing now.  Thank you, Jens!

(In reply to the message from Fossies Administrator of Tue, Dec 10, 2019, 10:52 AM.)

Re: Codespell report for Tika 1.23

Tilman Hausherr
In reply to this post by Fossies Administrator
On 10.12.2019 at 16:51, Fossies Administrator wrote:
> Although such reports are normally only generated on request


Hello, can we also get this for Apache PDFBox? I've corrected typos when
I hit them, but I can't look everywhere.

https://github.com/apache/pdfbox/

or

https://svn.apache.org/repos/asf/pdfbox/

PDFBox is used by the Tika project, and some people work on both
projects.

Tilman


Re: Codespell report for Tika 1.23

Fossies Administrator
Hi Tilman,

> On 10.12.2019 at 16:51, Fossies Administrator wrote:
>>  Although such reports are normally only generated on request
>
>
> Hello, can we also get this for Apache PDFBox? I've corrected typos when I
> hit them, but I can't look everywhere.
>
> https://github.com/apache/pdfbox/
>
> or
>
> https://svn.apache.org/repos/asf/pdfbox/
>
> PDFBox is used by the Tika project, and some people work on both
> projects.

Although Fossies can now also create such reports in a special test folder
that isn't integrated into the standard Fossies services and hopefully isn't
accessible to search engines, the package is now included in the main
Fossies folder "/linux/misc":

  https://fossies.org/linux/misc/pdfbox-2.0.17-src.zip/

The corresponding codespell URLs are

  https://fossies.org/linux/misc/pdfbox/codespell.html

currently redirecting to

  https://fossies.org/linux/misc/pdfbox-2.0.17-src.zip/codespell.html

and

  https://fossies.org/linux/misc/pdfbox-2.0.17-src.zip/codespell_conf.html
  https://fossies.org/linux/misc/pdfbox-2.0.17-src.zip/codespell_fps.html

If a codespell check of, e.g., the "trunk" version would be useful, let me
know and I can do that in the test folder mentioned above ("/linux/test").

Regards

Jens


Re: Codespell report for Tika 1.23

Tilman Hausherr
Hello Jens,

Thank you! I've now corrected all typos except those related to variable
/ method names (I want to keep the API stable), "Cloneable
<https://fossies.org/linux/misc/pdfbox-2.0.17-src.zip/codespell.html#Cloneable>"
(that one is in Java itself, LOL) and a few that are in resource files (these
are text extractions, i.e. the typos are in the original PDF, e.g.
PDFBOX-3044-010197-p5-ligatures.pdf).

Yes, I would like to have a report for the trunk too, although I don't
expect many new typos.

Thanks
Tilman

(In reply to the message from Fossies Administrator of 11.12.2019, 21:50.)

Re: Codespell report for Tika 1.23

Fossies Administrator
Hi Tilman,

> Thank you! I've now corrected all typos except those related to variable /
> method names (I want to keep the API stable), "Cloneable
> <https://fossies.org/linux/misc/pdfbox-2.0.17-src.zip/codespell.html#Cloneable>"
> (that one is in Java itself, LOL) and a few that are in resource files (these
> are text extractions, i.e. the typos are in the original PDF, e.g.
> PDFBOX-3044-010197-p5-ligatures.pdf).

Oops, I overlooked that file; "Cloneable" is now ignored as well.

> Yes, I would like to have a report for the trunk too, although I don't expect
> many new typos.

A new "false positive" word, "hIST", is now ignored, but for better
comparability I have left everything else unchanged.

Here are the main URLs for trunk, checked out today (Sunday) at 14:59 CET:

  https://fossies.org/linux/test/pdfbox-trunk.zip/codespell.html
  https://fossies.org/linux/test/pdfbox-trunk.191215_1459.zip/codespell.html

Looks much better!

Regards

Jens


Re: Codespell report for Tika 1.23

Tilman Hausherr
Hello Jens,
Thank you again. I have corrected everything I wanted to and created one
issue for a false positive:
https://github.com/codespell-project/codespell/issues/1399
Tilman

(In reply to the message from Fossies Administrator of 15.12.2019, 16:33.)

Re: Codespell report for Tika 1.23

Fossies Administrator
On Wed, 25 Dec 2019, Tilman Hausherr wrote:

> Hello Jens,
> Thank you again. I have corrected everything I wanted to and created one
> issue for a false positive:
> https://github.com/codespell-project/codespell/issues/1399
> Tilman

Yes, that is a false positive, but I assume the issue isn't easy to solve:
"codespell" claims to be "designed primarily for checking misspelled words
in source code", but its context recognition currently seems to leave room
for improvement.

So it's more my error in manually pre-checking for false positives.
I now also ignore "endianess" and "instanciate", and the current result
(with the very good rating grade "A") can be found here:

  https://fossies.org/linux/test/pdfbox-trunk.zip/codespell.html
  https://fossies.org/linux/test/pdfbox-trunk-a6bc826.191225.zip/codespell.html
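
The ignore-list filtering discussed in this thread can be sketched in a few
lines of Python. This is a minimal illustration, not the actual Fossies or
codespell implementation: the function name, the shape of a "finding", and
the sample findings are all hypothetical; only the four ignored words come
from the thread.

```python
# Words treated as false positives; these four are the ones named in this thread.
IGNORED_WORDS = {"Cloneable", "hIST", "endianess", "instanciate"}


def drop_false_positives(findings, ignored=IGNORED_WORDS):
    """Keep only findings whose word is not on the ignore list.

    Each finding is a (file, line_number, word) tuple; that shape is
    made up for this example. Matching is exact and case-sensitive.
    """
    return [finding for finding in findings if finding[2] not in ignored]


# Made-up findings, purely for illustration.
findings = [
    ("Foo.java", 12, "instanciate"),
    ("Bar.java", 40, "recieve"),
    ("Baz.java", 7, "Cloneable"),
]
print(drop_false_positives(findings))
# prints [('Bar.java', 40, 'recieve')]
```

Keeping the ignore list as a set makes each lookup cheap and makes it easy
to extend when a new false positive is reported.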

Regards

Jens
