[jira] [Updated] (NUTCH-2586) Add a fallback mechanism for missing meta tags

David Pilato (Jira)

     [ https://issues.apache.org/jira/browse/NUTCH-2586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebastian Nagel updated NUTCH-2586:
    Affects Version/s: 1.15

> Add a fallback mechanism for missing meta tags
> ----------------------------------------------
>                 Key: NUTCH-2586
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2586
>             Project: Nutch
>          Issue Type: New Feature
>    Affects Versions: 1.15
>            Reporter: Gerard Bouchar
>            Priority: Major
>             Fix For: 1.17
> While using Nutch, we faced the following issue: some web pages lack a "description" meta tag but include an "og:description" meta tag (using the [open graph protocol|http://ogp.me/]).
> Here are two examples:
> * http://imagenesdelavirgenmaria.com/17-imagenes-de-la-virgen-maria-de-guadalupe/
> * http://mixcdsource.com/product/dj-arson-dj-sin-cerothe-hit-list-18-5-reggaeton-edition/
> It would be nice to have a configurable list of fallback meta tags to use when the main meta tag is absent. Such a list would let us state in the configuration: "when the 'description' meta tag is missing, use 'og:description'; when 'title' is missing, use 'og:title'; and so on."
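Below is a minimal sketch of how such a fallback list could behave. It is written in Java because Nutch is a Java project. The configuration format is hypothetical and is not an existing Nutch property: entries look like "description=og:description,title=og:title", and "|" separates several fallbacks for the same tag. The class and method names (MetaTagFallback, parseFallbacks, resolve) are also illustrative and do not come from the Nutch API. The sketch assumes the parser has already collected the page's meta tags into a map keyed by lower-cased tag name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Nutch.
public class MetaTagFallback {

    // Parses a hypothetical config value such as
    // "description=og:description,title=og:title|twitter:title"
    // into a map from primary tag name to an ordered list of fallback names.
    public static Map<String, List<String>> parseFallbacks(String config) {
        Map<String, List<String>> fallbacks = new LinkedHashMap<>();
        if (config == null || config.isBlank()) {
            return fallbacks;
        }
        for (String entry : config.split(",")) {
            String[] parts = entry.split("=", 2);
            if (parts.length != 2) {
                continue; // skip malformed entries instead of failing
            }
            String primary = parts[0].trim().toLowerCase();
            List<String> chain = new ArrayList<>();
            // Fallbacks separated by '|' are tried in the order given.
            for (String name : parts[1].split("\\|")) {
                String trimmed = name.trim().toLowerCase();
                if (!trimmed.isEmpty()) {
                    chain.add(trimmed);
                }
            }
            fallbacks.put(primary, chain);
        }
        return fallbacks;
    }

    // Returns the primary tag's value if it is present and non-blank,
    // otherwise the first non-blank fallback value, otherwise null.
    // Assumes the keys of metaTags are already lower-cased.
    public static String resolve(String primary, Map<String, String> metaTags,
                                 Map<String, List<String>> fallbacks) {
        String value = metaTags.get(primary);
        if (value != null && !value.isBlank()) {
            return value;
        }
        for (String fallback : fallbacks.getOrDefault(primary, List.of())) {
            String candidate = metaTags.get(fallback);
            if (candidate != null && !candidate.isBlank()) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, List<String>> fallbacks =
                parseFallbacks("description=og:description,title=og:title");
        // A page with no "description" meta tag but with "og:description".
        Map<String, String> metaTags = Map.of(
                "og:description", "Open Graph description",
                "title", "Page title");
        System.out.println(resolve("description", metaTags, fallbacks)); // Open Graph description
        System.out.println(resolve("title", metaTags, fallbacks));       // Page title
    }
}
```

Each tag keeps an ordered chain of fallbacks, so a page missing 'title' can fall back to 'og:title' and then to a further source without needing a separate configuration key for every step.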

This message was sent by Atlassian Jira