[jira] [Commented] (NUTCH-2676) Update to the latest selenium and add code to use chrome and firefox headless mode with the remote web driver

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/NUTCH-2676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16740472#comment-16740472 ]

Stas Batururimi commented on NUTCH-2676:
----------------------------------------

Hi, [~wastl-nagel]
Could you point me in the right direction? I want to follow the redirects of the URLs in the initial list, but not the (internal/external) links found on the pages themselves.
I experimented with

{code:java}
db.ignore.also.redirects
db.ignore.external.links
db.ignore.internal.links
{code}

and looked at
https://issues.apache.org/jira/browse/NUTCH-2216
but could not get it to work.

I always end up with one of the following:
- redirects are followed, but so are a lot of other links (not specified in the initial URL list)
- redirects are not followed, but are saved as db_redir_temp and db_redir_perm (for later use, as described somewhere)
How can I combine the two: fetch the links from db_redir_temp/db_redir_perm, but not the internal/external links present in the web pages?

> Update to the latest selenium and add code to use chrome and firefox headless mode with the remote web driver
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-2676
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2676
>             Project: Nutch
>          Issue Type: New Feature
>          Components: protocol
>    Affects Versions: 1.15
>            Reporter: Stas Batururimi
>            Priority: Major
>             Fix For: 1.16
>
>         Attachments: Screenshot 2018-11-16 at 18.15.52.png
>
>
> * Selenium needs to be updated
> * missing remote web driver for chrome
> * necessity to add headless mode for both remote WebDriverBase Firefox & Chrome
> * use case with Selenium grid using docker (1 hub docker container, several nodes in different docker containers, Nutch in another docker container, streaming to Apache Solr in docker container, that is at least 4 different docker containers)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)